Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

Murugan, Visnusaran Thu, 13 Nov 2014 01:30:13 -0800

Hi,

Intension is not to transfer work load of a failed engine onto an active one. 
Convergence implementation that we are working on will be able to recover from 
a failure, provided a timeout notification hits heat-engine. All I want is a 
safe holding area for my timeout tasks. Timeout can be a stack timeout or a 
resource timeout.


By code change :) I meant posting to a job queue will be a matter of decorating 
timeout method and firing it for a delayed execution. Felt that we need not use 
taskflow just for posting a delayed execution(timer in our case).

Correct me if I'm wrong.

-Vishnu

From: Joshua Harlow [mailto:[email protected]]
Sent: Thursday, November 13, 2014 2:15 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

A question;

How is using something like celery in heat vs taskflow in heat (or at least 
concept [1]) 'to many code change'.

Both seem like change of similar levels ;-)

What was your metric for determining the code change either would have (out of 
curiosity)?

Perhaps u should look at [2], although I'm unclear on what the desired 
functionality is here.

Do u want the single engine to transfer its work to another engine when it 
'goes down'? If so then the jobboard model + zookeper inherently does this.

Or maybe u want something else? I'm probably confused because u seem to be 
asking for resource timeouts + recover from engine failure (which seems like a 
liveness issue and not a resource timeout one), those 2 things seem separable.

[1] http://docs.openstack.org/developer/taskflow/jobs.html

[2] 
http://docs.openstack.org/developer/taskflow/examples.html#jobboard-producer-consumer-simple

On Nov 13, 2014, at 12:29 AM, Murugan, Visnusaran 
<[email protected]<mailto:[email protected]>> wrote:


Hi all,

Convergence-POC distributes stack operations by sending resource actions over 
RPC for any heat-engine to execute. Entire stack lifecycle will be controlled 
by worker/observer notifications. This distributed model has its own advantages 
and disadvantages.

Any stack operation has a timeout and a single engine will be responsible for 
it. If that engine goes down, timeout is lost along with it. So a traditional 
way is for other engines to recreate timeout from scratch. Also a missed 
resource action notification will be detected only when stack operation timeout 
happens.

To overcome this, we will need the following capability:
1.       Resource timeout (can be used for retry)
2.       Recover from engine failure (loss of stack timeout, resource action 
notification)


Suggestion:
1.       Use task queue like celery to host timeouts for both stack and 
resource.
2.       Poll database for engine failures and restart timers/ retrigger 
resource retry (IMHO: This would be a traditional and weighs heavy)
3.       Migrate heat to use TaskFlow. (Too many code change)

I am not suggesting we use Task Flow. Using celery will have very minimum code 
change. (decorate appropriate functions)


Your thoughts.

-Vishnu
IRC: ckmvishnu
_______________________________________________
OpenStack-dev mailing list
[email protected]<mailto:[email protected]>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Heat] Using Job Queues for timeout ops

Reply via email to