Re: [openstack-dev] [Mistral][TaskFlow] Long running actions

Renat Akhmerov Tue, 01 Apr 2014 21:39:07 -0700

On 02 Apr 2014, at 06:00, Joshua Harlow <[email protected]> wrote:


> More inline.
> 
> From: Dmitri Zimine <[email protected]>
> Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
> <[email protected]>
> Date: Tuesday, April 1, 2014 at 2:59 PM
> To: "OpenStack Development Mailing List (not for usage questions)" 
> <[email protected]>
> Subject: Re: [openstack-dev] [Mistral][TaskFlow] Long running actions
> 
>> 
>> On Apr 1, 2014, at 3:43 AM, Renat Akhmerov <[email protected]> wrote:
>>> On 25 Mar 2014, at 01:51, Joshua Harlow <[email protected]> wrote:
>>> 
>>>> The first execution model I would call the local execution model. This 
>>>> model involves forming tasks and flows and then executing them inside an 
>>>> application; that application is running for the duration of the workflow 
>>>> (although if it crashes it can re-establish the tasks and flows that it was 
>>>> doing and attempt to resume them). This could also be what OpenStack 
>>>> projects would call the 'conductor' approach, where nova, ironic, trove 
>>>> have a conductor which manages these long-running actions (the conductor 
>>>> is alive/running throughout the duration of these workflows, although it 
>>>> may be restarted while running). The restarting + resuming part is 
>>>> something that OpenStack hasn't handled so gracefully so far, typically 
>>>> requiring some type of cleanup at restart (or by operations). With 
>>>> taskflow using this model, the resumption part makes it possible to resume 
>>>> from the last saved state (this connects into the persistence model that 
>>>> taskflow uses, the state transitions, how execution itself occurs...). 
>>>> 
>>>> The second execution model is an extension of the first, whereby there is 
>>>> still a type of 'conductor' that is managing the lifetime of the 
>>>> workflow, but instead of locally executing tasks in the conductor itself, 
>>>> tasks are now executed on remote workers (see http://tinyurl.com/lf3yqe4). 
>>>> The engine currently is still 'alive' for the lifetime of the 
>>>> execution, although the work that it is doing is relatively minimal (since 
>>>> it's not actually executing any task code, but proxying those requests to 
>>>> other workers). The engine while running does the conducting of the 
>>>> remote workers (saving persistence details, doing state transitions, 
>>>> getting results, sending requests to workers…).
>>> 
>>> These two execution models are special cases of what you call the “lazy 
>>> execution model” (or passive, as we call it). To illustrate this idea, take 
>>> a look at the first sequence diagram at [0]; we basically see the 
>>> following interaction:
>>> 
>>> 1) engine --(task)--> queue --(task)--> worker
>>> 2) execute task
>>> 3) worker --(result)--> queue --(result)--> engine
>>> 
>>> This is how the TaskFlow worker-based model works.
>>> 
>>> If we loosen the requirement in 3) and assume that not only a worker can 
>>> send a task result back to the engine, we get our passive model. Instead of 
>>> a worker it can be anything else (some external system) that knows how to 
>>> make this call. The particular mechanism is not too important: it can be a 
>>> direct message or it can be hidden behind an API method. In Mistral it’s 
>>> now a REST API method; however, we’re about to decouple the engine from the 
>>> REST API so that the engine is a standalone process that listens to a 
>>> queue. So the worker-based model is basically the same, with the one strict 
>>> requirement that only a worker sends a result back.
>>> 
>>> In order to implement the local execution model on top of the “lazy 
>>> execution model” we just need to abstract the transport (queue) so that we 
>>> can use an in-process transport. That’s it. It’s what Mistral already has 
>>> implemented. Again, we see that the “lazy execution model” is more universal.
>>> 
>>> IMO this “lazy execution model” should be the main execution model that 
>>> TaskFlow supports; the others can easily be implemented on top of it. But 
>>> the opposite does not hold. IMO this is the most important obstacle in all 
>>> our discussions, the reason why we don’t always understand each other well 
>>> enough. I know it may be a lot of work to shift the paradigm in the TaskFlow 
>>> team, but if we did that we would get enough freedom to use TaskFlow in lots 
>>> of cases.
>>> 
>>> Let me know what you think. I might have missed something.
>> 
>> DZ: Interesting idea! So that other models of execution are based on lazy 
>> execution model? TaskFlow implements this, we can use it, and for other 
>> clients more convenient higher level execution models are provided? 
>> Interesting. Makes sense.
>> @Joshua? @Kirill? Others? 
> 
> 
> I think this is likely possible, which is similar to what's in 
> http://tinyurl.com/k3s2gmy: engine types can be built from each other (and if 
> we wanted to alter the structure that exists in taskflow, then sure). But see 
> that message for more of my concerns around exposing that engine API to 
> library users (I think it could have its usage in mistral to expose this, but 
> I'm not sure it's useful elsewhere, and once it's a public engine API, it's 
> public for a very long time).

What are we waiting for? Let’s code it up! :)
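
To make the idea concrete, here is a minimal sketch of the passive model from the thread above, under stated assumptions. All names in it (PassiveEngine, on_task_result, run_local) are hypothetical and are not Mistral or TaskFlow APIs. It only models a linear list of tasks and uses a plain queue.Queue as the in-process transport. The engine puts a task on the transport and then does nothing until someone reports a result; who that someone is does not matter to the engine.

```python
import queue


class PassiveEngine:
    """Hypothetical engine for a linear workflow; not a Mistral/TaskFlow API."""

    def __init__(self, transport, tasks):
        self.transport = transport   # anything with put(); here a queue.Queue
        self.tasks = list(tasks)     # task names, handled strictly in order
        self.results = {}
        self._pos = 0                # index of the task awaiting a result

    @property
    def finished(self):
        return self._pos == len(self.tasks)

    def start(self):
        self._dispatch()

    def _dispatch(self):
        # Step 1: engine --(task)--> queue. The engine never runs task code.
        if not self.finished:
            self.transport.put(self.tasks[self._pos])

    def on_task_result(self, task, result):
        # Step 3, loosened: any caller may report the result (a worker,
        # an external system, an API handler).
        if self.finished or task != self.tasks[self._pos]:
            raise ValueError("unexpected result for task %r" % (task,))
        self.results[task] = result
        self._pos += 1
        self._dispatch()


def run_local(engine, actions):
    """Local execution model: drain the in-process transport ourselves."""
    engine.start()
    while not engine.finished:
        task = engine.transport.get_nowait()
        engine.on_task_result(task, actions[task]())


if __name__ == "__main__":
    # Local model built on top of the passive engine.
    local = PassiveEngine(queue.Queue(), ["create_vm", "attach_volume"])
    run_local(local, {"create_vm": lambda: "vm-1",
                      "attach_volume": lambda: "vol-1"})
    print(local.results)

    # Passive model: the result comes from "something else" later on.
    passive = PassiveEngine(queue.Queue(), ["wait_for_approval"])
    passive.start()
    pending = passive.transport.get_nowait()
    passive.on_task_result(pending, "approved")
    print(passive.finished)
```

In this sketch run_local is just one possible caller of on_task_result, which is the point of the argument: the local and worker-based models differ only in who drains the transport and who reports the result.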

Renat Akhmerov
@ Mirantis Inc.

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
