Re: [openstack-dev] [Mistral] How Mistral handling long running delegate tasks

Joshua Harlow Thu, 27 Mar 2014 17:17:33 -0700

Thanks for the description!

The steps here seem very much like what a taskflow engine does (which is good).


To connect this to how I think could work in taskflow.

  1.  Someone creates tasks/flows describing the work to-be-done (converting a 
DSL -> taskflow tasks/flows/retry[1] objects…)
  2.  On execute(workflow) engine creates a new workflow execution, computes 
the first batch of tasks, creates executor for those tasks (remote, local…) and 
executes those tasks.
  3.  Waits for response back from 
futures<http://docs.python.org/dev/library/concurrent.futures.html> returned 
from executor.
  4.  Receives futures responses (or receives new response DELAY, for example), 
or exceptions…
  5.  Continues sending out batches of tasks that can be still be executing 
(aka tasks that don't have dependency on output of delayed tasks).
  6.  If any delayed tasks after repeating #2-5 as many times as it can, the 
engine will shut itself down (see http://tinyurl.com/l3x3rrb).
  7.  On delay task finishing some API/webhook/other (the mechanism imho 
shouldn't be tied to webhooks, at least not in taskflow, but should be left up 
to the user of taskflow to decide how to accomplish this) will be/must be 
responsible for resuming the engine and setting the result for the previous 
delayed task.
  8.  Repeat 2 -> 7 until all tasks have executed/failed.
  9.  Profit!

This seems like it could be accomplished, although there are race conditions in 
the #6 (what if multiple delayed requests are received at the same time)? What 
locking is done to ensure that this doesn't cause conflicts? Does the POC solve 
that part (no simultaneous step #5 from below)? There was a mention of a 
watch-dog (ideally to ensure that delayed tasks can't just sit around forever), 
was that implemented?

[1] https://wiki.openstack.org/wiki/TaskFlow#Retries (new feature!)

From: Dmitri Zimine <[email protected]<mailto:[email protected]>>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<[email protected]<mailto:[email protected]>>
Date: Thursday, March 27, 2014 at 4:43 PM
To: "OpenStack Development Mailing List (not for usage questions)" 
<[email protected]<mailto:[email protected]>>
Subject: [openstack-dev] [Mistral] How Mistral handling long running delegate 
tasks

Following up on http://tinyurl.com/l8gtmsw and http://tinyurl.com/n3v9lt8: this 
explains how Mistral handles long running delegate tasks. Note that a 'passive' 
workflow engine can handle both normal tasks and delegates the same way. I'll 
also put that on ActionDesign wiki, after discussion.

Diagram:
https://docs.google.com/a/stackstorm.com/drawings/d/147_EpdatpN_sOLQ0LS07SWhaC3N85c95TkKMAeQ_a4c/edit?usp=sharing

1. On start(workflow), engine creates a new workflow execution, computes the 
first batch of tasks, sends them to ActionRunner [1].
2. ActionRunner creates an action and calls action.run(input)
3. Action does the work (compute (10!)), produce the results,  and return the 
results to executor. If it returns, status=SUCCESS. If it fails it throws 
exception, status=ERROR.
4. ActionRunner notifies Engine that the task is complete task_done(execution, 
task, status, results)[2]
5. Engine computes the next task(s) ready to trigger, according to control flow 
and data flow, and sends them to ActionRunner.
6. Like step 2: ActionRunner calls the action's run(input)
7. A delegate action doesn't produce results: it calls out the 3rd party 
system, which is expected to make a callback to a workflow service with the 
results. It returns to ActionRunner without results, "immediately".
8. ActionRunner marks status=RUNNING [?]
9. 3rd party system takes 'long time' == longer then any system component can 
be assumed to stay alive.
10. 3rd party component calls Mistral WebHook which resolves to 
engine.task_complete(workbook, id, status, results)

Comments:
* One Engine handles multiple executions of multiple workflows. It exposes two 
main operations: start(workflow) and task_complete(execution, task, status, 
results), and responsible for defining the next batch of tasks based on control 
flow and data flow. Engine is passive - it runs in a hosts' thread. Engine and 
ActionRunner communicate via task queues asynchronously, for details, see  
https://wiki.openstack.org/wiki/Mistral/POC

* Engine doesn't distinct sync and async actions, it doesn't deal with Actions 
at all. It only reacts to task completions, handling the results, updating the 
state, and queuing next set of tasks.

* Only Action can know and define if it is a delegate or not. Some protocol 
required to let ActionRunner know that the action is not returning the results 
immediately. A convention of returning None may be sufficient.

* Mistral exposes  engine.task_done in the REST API so 3rd party systems can 
call a web hook.

DZ.

[1]  I use ActionRunner instead of Executor (current name) to avoid confusion: 
it is Engine which is responsible for executions, and ActionRunner only runs 
actions. We should rename it in the code.

[2] I use task_done for briefly and out of pure spite, in the code it is 
conveny_task_results.

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Mistral] How Mistral handling long running delegate tasks

Reply via email to