Intension is not to transfer work load of a failed engine onto an active one. 
Convergence implementation that we are working on will be able to recover from 
a failure, provided a timeout notification hits heat-engine. All I want is a 
safe holding area for my timeout tasks. Timeout can be a stack timeout or a 
resource timeout.

By code change :) I meant posting to a job queue will be a matter of decorating 
timeout method and firing it for a delayed execution. Felt that we need not use 
taskflow just for posting a delayed execution(timer in our case).

Correct me if I'm wrong.


From: Joshua Harlow [mailto:harlo...@outlook.com]
Sent: Thursday, November 13, 2014 2:15 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

A question;

How is using something like celery in heat vs taskflow in heat (or at least 
concept [1]) 'to many code change'.

Both seem like change of similar levels ;-)

What was your metric for determining the code change either would have (out of 

Perhaps u should look at [2], although I'm unclear on what the desired 
functionality is here.

Do u want the single engine to transfer its work to another engine when it 
'goes down'? If so then the jobboard model + zookeper inherently does this.

Or maybe u want something else? I'm probably confused because u seem to be 
asking for resource timeouts + recover from engine failure (which seems like a 
liveness issue and not a resource timeout one), those 2 things seem separable.

[1] http://docs.openstack.org/developer/taskflow/jobs.html


On Nov 13, 2014, at 12:29 AM, Murugan, Visnusaran 
<visnusaran.muru...@hp.com<mailto:visnusaran.muru...@hp.com>> wrote:

Hi all,

Convergence-POC distributes stack operations by sending resource actions over 
RPC for any heat-engine to execute. Entire stack lifecycle will be controlled 
by worker/observer notifications. This distributed model has its own advantages 
and disadvantages.

Any stack operation has a timeout and a single engine will be responsible for 
it. If that engine goes down, timeout is lost along with it. So a traditional 
way is for other engines to recreate timeout from scratch. Also a missed 
resource action notification will be detected only when stack operation timeout 

To overcome this, we will need the following capability:
1.       Resource timeout (can be used for retry)
2.       Recover from engine failure (loss of stack timeout, resource action 

1.       Use task queue like celery to host timeouts for both stack and 
2.       Poll database for engine failures and restart timers/ retrigger 
resource retry (IMHO: This would be a traditional and weighs heavy)
3.       Migrate heat to use TaskFlow. (Too many code change)

I am not suggesting we use Task Flow. Using celery will have very minimum code 
change. (decorate appropriate functions)

Your thoughts.

IRC: ckmvishnu
OpenStack-dev mailing list

OpenStack-dev mailing list

Reply via email to