Hi all,

Convergence-POC distributes stack operations by sending resource actions over RPC for any heat-engine to execute. The entire stack lifecycle will be controlled by worker/observer notifications. This distributed model has its own advantages and disadvantages.
Any stack operation has a timeout, and a single engine is responsible for it. If that engine goes down, the timeout is lost along with it. The traditional fix is for another engine to recreate the timeout from scratch. Also, a missed resource action notification will be detected only when the stack operation timeout fires.

To overcome this, we will need the following capabilities:
1. Resource timeout (can be used for retry)
2. Recovery from engine failure (loss of stack timeout, resource action notification)

Suggestions:
1. Use a task queue like Celery to host timeouts for both stack and resource.
2. Poll the database for engine failures and restart timers / retrigger resource retry (IMHO: this would be the traditional approach and weighs heavy).
3. Migrate Heat to use TaskFlow (too many code changes).

I am not suggesting we use TaskFlow. Using Celery will need very minimal code change (decorate the appropriate functions).

Your thoughts?

-Vishnu
IRC: ckmvishnu
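To make the two capabilities above a bit more concrete: both come down to keeping deadlines somewhere that outlives any single engine, so that whichever engine is still alive can pick them up. Below is a rough sketch of that idea only, not Heat code. It uses SQLite from the Python standard library as a stand-in for whatever store ends up hosting the timeouts (a Celery broker, the Heat database, ...). The `timeouts` table and the helpers `open_store`, `arm_timeout`, `cancel_timeout` and `claim_expired` are hypothetical names made up for illustration.

```python
import sqlite3
import time

# Hypothetical schema: one row per pending timeout, not tied to any engine.
SCHEMA = """
CREATE TABLE IF NOT EXISTS timeouts (
    kind TEXT NOT NULL,       -- 'stack' or 'resource'
    target_id TEXT NOT NULL,  -- stack or resource identifier
    deadline REAL NOT NULL,   -- absolute UNIX time at which the timeout fires
    PRIMARY KEY (kind, target_id)
)
"""


def open_store(path=":memory:"):
    """Open the shared store (an in-memory SQLite DB in this sketch)."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def arm_timeout(conn, kind, target_id, seconds, now=None):
    """Record or reset a deadline.

    The deadline is stored as an absolute time, so an engine that takes
    over later resumes the original timeout instead of starting from zero.
    """
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO timeouts (kind, target_id, deadline) "
        "VALUES (?, ?, ?)",
        (kind, target_id, now + seconds),
    )
    conn.commit()


def cancel_timeout(conn, kind, target_id):
    """Drop a deadline once the stack or resource action has completed."""
    conn.execute(
        "DELETE FROM timeouts WHERE kind = ? AND target_id = ?",
        (kind, target_id),
    )
    conn.commit()


def claim_expired(conn, now=None):
    """Return and remove every timeout whose deadline has passed.

    Single-process sketch: with several engines polling concurrently, the
    select-then-delete below would need row locking or an atomic claim.
    """
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        rows = conn.execute(
            "SELECT kind, target_id FROM timeouts "
            "WHERE deadline <= ? ORDER BY deadline",
            (now,),
        ).fetchall()
        conn.executemany(
            "DELETE FROM timeouts WHERE kind = ? AND target_id = ?", rows
        )
    return rows


if __name__ == "__main__":
    store = open_store()
    arm_timeout(store, "stack", "stack-1", seconds=3600, now=0)
    arm_timeout(store, "resource", "server-1", seconds=60, now=0)
    # At t=61 only the resource timeout has expired, so it can be retried
    # long before the stack timeout would have noticed anything.
    print(claim_expired(store, now=61))
```

With a task queue, the polling loop would presumably go away: scheduling a delayed task (Celery's `apply_async` accepts a `countdown` argument, for example) would play the role of `arm_timeout`, and the broker would hold the deadline instead of the table.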
_______________________________________________
OpenStack-dev mailing list
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev