Hi,

The intention is not to transfer the workload of a failed engine onto an active one. The convergence implementation we are working on will be able to recover from a failure, provided a timeout notification reaches heat-engine. All I want is a safe holding area for my timeout tasks, where a timeout can be either a stack timeout or a resource timeout.
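One way such a holding area could work is sketched below. It assumes timeouts are persisted in a shared database table, with sqlite3 standing in for Heat's database, instead of being held in the memory of the engine that created them. The `TimeoutStore` class, its table layout, and its method names are hypothetical, not Heat code. Because each entry outlives the engine that registered it, any surviving engine can pick up expired stack or resource timeouts.

```python
import sqlite3
import time


class TimeoutStore:
    """Hypothetical durable holding area for stack and resource timeouts."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS timeouts ("
            " id INTEGER PRIMARY KEY,"
            " kind TEXT NOT NULL,"        # 'stack' or 'resource'
            " target TEXT NOT NULL,"      # stack or resource identifier
            " deadline REAL NOT NULL,"    # absolute expiry time (epoch seconds)
            " fired INTEGER NOT NULL DEFAULT 0)"
        )

    def register(self, kind, target, timeout_secs, now=None):
        """Persist a timeout so it survives the engine that registered it."""
        if kind not in ("stack", "resource"):
            raise ValueError("kind must be 'stack' or 'resource'")
        now = time.time() if now is None else now
        cur = self.conn.execute(
            "INSERT INTO timeouts (kind, target, deadline) VALUES (?, ?, ?)",
            (kind, target, now + timeout_secs),
        )
        self.conn.commit()
        return cur.lastrowid

    def cancel(self, timeout_id):
        """Drop a timeout once its stack/resource action finishes in time."""
        self.conn.execute("DELETE FROM timeouts WHERE id = ?", (timeout_id,))
        self.conn.commit()

    def claim_expired(self, now=None):
        """Mark expired, unfired timeouts as fired and return them.

        Any engine may call this; it does not matter which engine
        registered the timeout originally.
        """
        now = time.time() if now is None else now
        rows = self.conn.execute(
            "SELECT id, kind, target FROM timeouts"
            " WHERE fired = 0 AND deadline <= ? ORDER BY deadline",
            (now,),
        ).fetchall()
        claimed = []
        for row_id, kind, target in rows:
            # The 'fired = 0' guard means only one caller marks each row fired.
            cur = self.conn.execute(
                "UPDATE timeouts SET fired = 1 WHERE id = ? AND fired = 0",
                (row_id,),
            )
            if cur.rowcount == 1:
                claimed.append((kind, target))
        self.conn.commit()
        return claimed
```

For example, registering a 60-second resource timeout at time 0 and calling `claim_expired(now=600)` from any engine returns `[('resource', 'volume-1')]` for that entry, while a cancelled timeout or a one-hour stack timeout is not returned yet.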
By "code change" :) I meant that posting to a job queue would just be a matter of decorating the timeout method and firing it for delayed execution. I felt that we need not use TaskFlow just to post a delayed execution (a timer, in our case). Correct me if I'm wrong.

-Vishnu

From: Joshua Harlow [mailto:[email protected]]
Sent: Thursday, November 13, 2014 2:15 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

A question: how is using something like Celery in Heat vs. TaskFlow in Heat (or at least the concept [1]) 'too many code changes'? Both seem like changes of similar levels ;-) What was your metric for determining the code change either would need (out of curiosity)?

Perhaps you should look at [2], although I'm unclear on what the desired functionality is here.

Do you want the single engine to transfer its work to another engine when it 'goes down'? If so, then the jobboard model + ZooKeeper inherently does this.

Or maybe you want something else? I'm probably confused because you seem to be asking for resource timeouts + recovery from engine failure (which seems like a liveness issue and not a resource timeout one); those two things seem separable.

[1] http://docs.openstack.org/developer/taskflow/jobs.html
[2] http://docs.openstack.org/developer/taskflow/examples.html#jobboard-producer-consumer-simple

On Nov 13, 2014, at 12:29 AM, Murugan, Visnusaran <[email protected]> wrote:

Hi all,

Convergence-POC distributes stack operations by sending resource actions over RPC for any heat-engine to execute. The entire stack lifecycle will be controlled by worker/observer notifications. This distributed model has its own advantages and disadvantages.

Any stack operation has a timeout, and a single engine is responsible for it. If that engine goes down, the timeout is lost along with it. So the traditional approach is for the other engines to recreate the timeout from scratch.
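To illustrate what "decorating the timeout method and firing it for delayed execution" could look like, here is a stdlib-only sketch of a hypothetical in-process delayed queue. With Celery, the real counterparts would be the `@app.task` decorator and `Task.apply_async(countdown=...)`. `DelayedQueue`, `apply_later`, `run_due`, and `on_stack_timeout` are invented names for illustration only. Unlike a broker-backed queue, this sketch keeps pending jobs in memory, so it would not survive the loss of the process holding them.

```python
import functools
import heapq
import itertools
import time


class DelayedQueue:
    """Hypothetical in-memory stand-in for a broker-backed task queue."""

    def __init__(self):
        self._heap = []                  # entries: (run_at, seq, func, args, kwargs)
        self._seq = itertools.count()    # tie-breaker so funcs are never compared

    def task(self, func):
        """Decorator: func stays directly callable and gains apply_later()."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        # 'now' is reserved for testing; tasks should not take a 'now' kwarg.
        def apply_later(countdown, *args, now=None, **kwargs):
            now = time.monotonic() if now is None else now
            heapq.heappush(
                self._heap, (now + countdown, next(self._seq), func, args, kwargs)
            )

        wrapper.apply_later = apply_later
        return wrapper

    def run_due(self, now=None):
        """Run every job whose countdown has elapsed; return their results."""
        now = time.monotonic() if now is None else now
        results = []
        while self._heap and self._heap[0][0] <= now:
            _, _, func, args, kwargs = heapq.heappop(self._heap)
            results.append(func(*args, **kwargs))
        return results


queue = DelayedQueue()


@queue.task
def on_stack_timeout(stack_id):
    # Hypothetical handler: a real engine would mark the stack operation failed.
    return "timeout fired for %s" % stack_id
```

Calling `on_stack_timeout.apply_later(30, "stack-1")` schedules the handler, and a worker loop calling `queue.run_due()` fires it once 30 seconds have passed. In Celery the equivalent call would be `on_stack_timeout.apply_async(args=("stack-1",), countdown=30)`, with the broker, not the engine, holding the pending job.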
Also, a missed resource action notification will be detected only when the stack operation timeout happens. To overcome this, we will need the following capabilities:

1. Resource timeout (can be used for retry)
2. Recovery from engine failure (loss of stack timeout, resource action notification)

Suggestions:

1. Use a task queue like Celery to host timeouts for both stacks and resources.
2. Poll the database for engine failures and restart timers / retrigger resource retries. (IMHO, this would be the traditional approach and weighs heavy.)
3. Migrate Heat to use TaskFlow. (Too many code changes.)

I am not suggesting we use TaskFlow. Using Celery would need only a minimal code change (decorate the appropriate functions).

Your thoughts?

-Vishnu
IRC: ckmvishnu

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
