Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

Zane Bitter Thu, 13 Nov 2014 05:56:55 -0800

On 13/11/14 03:29, Murugan, Visnusaran wrote:

Hi all,


Convergence-POC distributes stack operations by sending resource actions
over RPC for any heat-engine to execute. The entire stack lifecycle will be
controlled by worker/observer notifications. This distributed model has
its own advantages and disadvantages.

Any stack operation has a timeout, and a single engine is responsible
for it. If that engine goes down, the timeout is lost along with it. The
traditional fix is for other engines to recreate the timeout from
scratch. Also, a missed resource action notification will be detected
only when the stack operation times out.

To overcome this, we will need the following capability:

1. Resource timeout (can be used for retry)

I don't believe this is strictly needed for phase 1 (essentially we don't have it now, so nothing gets worse).

For phase 2, yes, we'll want it. One thing we haven't discussed much is that if we used Zaqar for this then the observer could claim a message but not acknowledge it until it had processed it, so we could have guaranteed delivery.
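
To make the claim-then-acknowledge idea concrete, here is a minimal sketch in Python. It is a toy in-memory queue, not Zaqar's API: ClaimQueue, post, claim and ack are hypothetical names, and the injectable clock is an assumption made only so the behaviour can be exercised without sleeping. The point it illustrates is that a claimed message becomes claimable again once the claim expires, unless the consumer has acknowledged (deleted) it in the meantime.

```python
import time
import uuid


class ClaimQueue:
    """Toy queue with claim/acknowledge semantics (hypothetical, not Zaqar)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._messages = {}  # message id -> body
        self._claims = {}    # message id -> time at which the claim expires

    def post(self, body):
        """Store a message and return its id."""
        msg_id = uuid.uuid4().hex
        self._messages[msg_id] = body
        return msg_id

    def claim(self, ttl):
        """Claim one message that is unclaimed or whose claim has expired."""
        now = self._clock()
        for msg_id, body in self._messages.items():
            expiry = self._claims.get(msg_id)
            if expiry is None or expiry <= now:
                self._claims[msg_id] = now + ttl
                return msg_id, body
        return None

    def ack(self, msg_id):
        """Acknowledge a message by deleting it once processing is done."""
        self._messages.pop(msg_id, None)
        self._claims.pop(msg_id, None)
```

If an observer claims a notification and then dies before calling ack, the claim simply expires and another observer can pick the same message up, which is the delivery guarantee described above.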

2. Recover from engine failure (loss of stack timeout, resource action
notification)

Suggestion:

1. Use a task queue like Celery to host timeouts for both stack and resource.

I believe Celery is more or less a non-starter as an OpenStack dependency because it uses Kombu directly to talk to the queue, vs. oslo.messaging which is an abstraction layer over Kombu, Qpid, ZeroMQ and maybe others in the future. i.e. requiring Celery means that some users would be forced to install Rabbit for the first time.

One option would be to fork Celery and replace Kombu with oslo.messaging as its abstraction layer. Good luck getting that maintained though, since Celery _invented_ Kombu to be its abstraction layer.

2. Poll the database for engine failures and restart timers / retrigger
resource retry (IMHO: this would be the traditional approach, and a heavyweight one)

3. Migrate Heat to use TaskFlow. (Too many code changes)

If it's just handling timed triggers (maybe this is closer to #2) and not migrating the whole code base, then I don't see why it would be a big change (or even a change at all - it's basically new functionality). I'm not sure if TaskFlow has something like this already. If not we could also look at what Mistral is doing with timed tasks and see if we could spin some of it out into an Oslo library.
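
As a rough illustration of what "just handling timed triggers" might amount to, here is a minimal Python sketch. It is not TaskFlow or Mistral code, and TimeoutStore, schedule and pop_due are hypothetical names. The in-memory heap stands in for shared storage such as a database table (an assumption). The idea is that deadlines are recorded as absolute times rather than as timers living inside one engine process, so whichever engine polls next can fire anything that is overdue.

```python
import heapq


class TimeoutStore:
    """Toy store of absolute deadlines (hypothetical; stands in for a shared DB)."""

    def __init__(self):
        self._heap = []  # entries are (deadline, stack_id), earliest first

    def schedule(self, stack_id, deadline):
        """Record that stack_id times out at the absolute time `deadline`."""
        heapq.heappush(self._heap, (deadline, stack_id))

    def pop_due(self, now):
        """Remove and return every stack id whose deadline is at or before `now`."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, stack_id = heapq.heappop(self._heap)
            due.append(stack_id)
        return due
```

Because nothing here depends on the engine that called schedule still being alive, losing that engine does not lose the timeout; any surviving engine calling pop_due would see it.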


cheers,
Zane.

I am not suggesting we use TaskFlow. Using Celery will require only
minimal code changes (decorating the appropriate functions).

Your thoughts.

-Vishnu

IRC: ckmvishnu



_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



