On 13/11/14 03:29, Murugan, Visnusaran wrote:
Hi all,

Convergence-POC distributes stack operations by sending resource actions
over RPC for any heat-engine to execute. The entire stack lifecycle will
be controlled by worker/observer notifications. This distributed model
has its own advantages and disadvantages.

Any stack operation has a timeout, and a single engine is responsible
for it. If that engine goes down, the timeout is lost along with it. The
traditional fix is for other engines to recreate the timeout from
scratch. Also, a missed resource action notification will be detected
only when the stack operation timeout fires.

To overcome this, we will need the following capability:

1. Resource timeout (can be used for retry)

I don't believe this is strictly needed for phase 1 (essentially we don't have it now, so nothing gets worse).

For phase 2, yes, we'll want it. One thing we haven't discussed much is that if we used Zaqar for this then the observer could claim a message but not acknowledge it until it had processed it, so we could have guaranteed delivery.
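To make the claim-then-acknowledge idea concrete, here is a minimal in-memory sketch. It does not use Zaqar or its client API; the `ClaimQueue` class, its method names, and the explicit `now`/`ttl` arguments are all hypothetical and exist only to illustrate the semantics: a claimed message stays in the queue until it is acknowledged, so if the observer dies mid-processing the claim expires and another observer can pick the message up.

```python
import itertools


class ClaimQueue:
    """Hypothetical in-memory stand-in for a queue with claim/ack semantics."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._messages = {}  # message id -> [body, claim expiry or None]

    def post(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, None]
        return msg_id

    def claim(self, now, ttl):
        """Claim the oldest message that is unclaimed or whose claim expired."""
        for msg_id, entry in sorted(self._messages.items()):
            body, expires = entry
            if expires is None or expires <= now:
                entry[1] = now + ttl
                return msg_id, body
        return None

    def ack(self, msg_id):
        """Delete a message once it has been fully processed."""
        self._messages.pop(msg_id, None)


queue = ClaimQueue()
queue.post("resource A: CREATE_COMPLETE")

# Observer 1 claims the notification at t=0 but crashes before acknowledging.
first = queue.claim(now=0, ttl=30)

# While the claim is live, no other observer can see the message.
during = queue.claim(now=10, ttl=30)

# After the claim expires, observer 2 picks it up and acknowledges it.
second = queue.claim(now=31, ttl=30)
queue.ack(second[0])
remaining = queue.claim(now=100, ttl=30)
```

The notification is only removed by `ack`, so a crash between claim and acknowledgement delays processing by at most the claim TTL instead of losing the message.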

2. Recover from engine failure (loss of stack timeout, resource action
notification)

Suggestion:

1. Use a task queue like Celery to host timeouts for both stack and resource.

I believe Celery is more or less a non-starter as an OpenStack dependency because it uses Kombu directly to talk to the queue, whereas oslo.messaging is an abstraction layer over Kombu, Qpid, ZeroMQ and maybe others in the future. In other words, requiring Celery means that some users would be forced to install Rabbit for the first time.

One option would be to fork Celery and replace Kombu with oslo.messaging as its abstraction layer. Good luck getting that maintained though, since Celery _invented_ Kombu to be its abstraction layer.

2. Poll the database for engine failures and restart timers / retrigger
resource retries (IMHO: this would be the traditional approach, and a
heavyweight one)

3. Migrate Heat to use TaskFlow. (Too many code changes)

If it's just handling timed triggers (maybe this is closer to #2) and not migrating the whole code base, then I don't see why it would be a big change (or even a change at all - it's basically new functionality). I'm not sure if TaskFlow has something like this already. If not we could also look at what Mistral is doing with timed tasks and see if we could spin some of it out into an Oslo library.
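For comparison, the polling approach in option 2 could be sketched roughly as follows. This is not Heat code: the `StackOp` record, the `heartbeats` mapping, and the `max_silence` threshold are hypothetical stand-ins for whatever the database would actually store. The point is that if the operation's start time and timeout are persisted, a surviving engine can re-arm the timer with the time remaining rather than starting from scratch.

```python
from dataclasses import dataclass


@dataclass
class StackOp:
    stack_id: str
    engine_id: str      # engine currently responsible for the timeout
    started_at: float   # persisted when the operation began
    timeout: float      # seconds allowed for the whole operation


def recover_orphans(ops, heartbeats, now, my_engine_id, max_silence=60.0):
    """Take over operations whose engine stopped heartbeating.

    Returns {stack_id: seconds remaining} for each operation adopted;
    a value of 0 means the operation has already timed out.
    """
    adopted = {}
    for op in ops:
        last_seen = heartbeats.get(op.engine_id)
        alive = last_seen is not None and now - last_seen <= max_silence
        if alive:
            continue
        op.engine_id = my_engine_id
        adopted[op.stack_id] = max(0.0, op.started_at + op.timeout - now)
    return adopted


ops = [
    StackOp("s1", "engine-1", started_at=1000.0, timeout=3600.0),
    StackOp("s2", "engine-2", started_at=1000.0, timeout=3600.0),
]
# engine-1 was last seen 400 seconds ago; engine-2 only 10 seconds ago.
heartbeats = {"engine-1": 1100.0, "engine-2": 1490.0}
remaining = recover_orphans(ops, heartbeats, now=1500.0, my_engine_id="engine-2")
```

Here only `s1` is adopted, and its timer would be restarted with the remaining time rather than the full hour. A real implementation would also need an atomic update so that two engines polling at once do not both adopt the same stack.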

cheers,
Zane.

I am not suggesting we use TaskFlow. Using Celery would require only
minimal code changes (decorating the appropriate functions).

Your thoughts?

-Vishnu

IRC: ckmvishnu



_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


