Re: User Defined Workflow Execution Framework

2018-09-21 Thread Yasas Gunarathne
Hi Marcus and Sudhakar,

Thank you for the detailed answers, but I still have a few issues. Let me
explain a little more about the architecture of the Airavata Workflow
implementation.

[image: workflow.png]

Currently, the Orchestrator is only capable of submitting single-application
Experiments. But to support different types of workflows we need more
control over the processes (it is a bit more complicated than submitting a
set of Processes). For that, we decided to use *Helix* at the Orchestrator
level.

Since the current experiment implementation cannot be used in such a
situation, we decided to use a separate set of models and APIs that enable
submitting and launching workflows. [1]

Workflow execution is also managed by *Helix* at the *Orchestrator* level.
These workflows are composed of *Helix tasks*, which control the flow of
the workflow.

*i. Flow Starter Task*

This task is responsible for starting a specific branch of the Airavata
Workflow. In a single Airavata Workflow, there can be multiple starting
points. Flow starter is the only component which can accept input in the
standard InputDataObjectType.

*ii. Flow Terminator Task*

This task is responsible for terminating a specific branch of the Airavata
workflow. In a single workflow, there can be multiple terminating points.
Flow terminator is the only component which can output in the standard
OutputDataObjectType.

*iii. Flow Barrier Task*

This task works as a waiting component in the middle of a workflow. For
example, if there are two applications running and the results of both are
required to continue the workflow, the barrier waits for both applications
to complete before continuing.
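The barrier semantics can be sketched as follows (a minimal Python illustration only; the actual implementation would be a Helix task in Java, and the branch function and values here are made up): two branches run concurrently, and the barrier blocks until both results are available before the workflow continues.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def run_branch(value):
    # Stand-in for an application task running on one workflow branch.
    return value * 2

# Two branches run concurrently; the barrier continues only when both finish.
with ThreadPoolExecutor(max_workers=2) as pool:
    branch_a = pool.submit(run_branch, 10)
    branch_b = pool.submit(run_branch, 20)
    done, _ = wait([branch_a, branch_b])  # the "barrier": block until both complete

# Both results are now available to feed the next task in the workflow.
merged_inputs = [branch_a.result(), branch_b.result()]
```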

*iv. Flow Divider Task*

This task opens up new branches in the middle of a workflow.

*v. Condition Handler Task*

This task is the path selection component of the workflow. It works
similarly to an if statement.
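The if-like path selection could look roughly like this (a hypothetical sketch; the function name, branch names, and threshold are illustrative, not Airavata's API): the handler evaluates a condition on an upstream result and picks which branch of the workflow to run next.

```python
def condition_handler(result, condition, on_true, on_false):
    """Select the next branch of the workflow, like an if statement."""
    return on_true if condition(result) else on_false

# Illustrative use: route on an upstream task's result.
next_branch = condition_handler(
    result=0.93,
    condition=lambda r: r > 0.9,
    on_true="high-accuracy-branch",
    on_false="retrain-branch",
)
```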

*vi. Foreach Loop Task*

This task divides the input into specified portions and executes the task
loop in parallel over those portions.
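A minimal sketch of the foreach behavior (illustrative Python, not the Helix task itself; the splitting scheme and loop body are assumptions): the input is divided into portions and the loop body runs on each portion in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def split(inputs, n):
    # Divide the input into n roughly equal portions.
    k, m = divmod(len(inputs), n)
    return [inputs[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]

def loop_body(portion):
    # Stand-in for the task loop applied to one portion.
    return sum(portion)

inputs = list(range(1, 9))   # [1 .. 8]
portions = split(inputs, 4)  # [[1, 2], [3, 4], [5, 6], [7, 8]]
with ThreadPoolExecutor() as pool:
    # Each portion is processed concurrently, like the parallel task loop.
    partials = list(pool.map(loop_body, portions))
```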

*vii. Do While Loop Task*

This task is capable of re-running a specified loop of tasks until the
result meets a given condition.
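The do-while semantics, sketched in Python (a hypothetical illustration; the loop body and stopping condition are invented for the example): the loop body runs at least once and repeats until the condition on the result holds.

```python
def do_while(run_loop, condition, state):
    """Re-run the task loop until the result meets the condition."""
    while True:
        state = run_loop(state)   # the loop body always runs at least once
        if condition(state):
            return state

# Illustrative use: keep refining until the error drops below a threshold.
result = do_while(
    run_loop=lambda err: err / 2,     # each pass halves the error
    condition=lambda err: err < 0.1,  # stop once it is small enough
    state=1.0,
)
```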


In addition to these flow handler tasks, there is a task type called
*ApplicationTask*, which is responsible for executing an application within
a workflow (a workflow contains multiple *application tasks* connected by *flow
handler tasks*).

Within these ApplicationTasks, we need to perform an operation similar to
what the *Orchestrator* currently executes for a single *Experiment*: creating
a Process (which has a set of tasks to be executed) and submitting it for
execution.
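In simplified pseudocode-like Python (hypothetical names throughout; Airavata's actual Process model and submission path are richer than this), the ApplicationTask's job boils down to building a Process from a task list and handing it to the execution layer, with no experiment id attached:

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    # Simplified, hypothetical model: a process is just an ordered task list.
    process_id: str
    tasks: list = field(default_factory=list)

def submit_process(process, queue):
    # Stand-in for handing the process to the execution layer. Note that no
    # experiment id is required, so the process could run independently.
    queue.append(process.process_id)
    return process.process_id

submission_queue = []
proc = Process("proc-1", tasks=["env-setup", "input-staging", "job-submission"])
submitted = submit_process(proc, submission_queue)
```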

I had previously planned to use, within the *ApplicationTask*, the same
approach that the Orchestrator currently follows when launching an
experiment, but later realized that this cannot be done, since Process
execution performs many experiment-specific activities. That is why I
raised this issue and proposed making Process execution independent.

Output data staging (*saving output files*) is planned to be done within the
*ApplicationTask* after the Process completes its execution (after
receiving the Process completion message). This needs to happen at the
Orchestrator level, since outputs are used as inputs to other *application
tasks* within a workflow. (Outputs are persisted using the DataBlock table;
a DataBlock is responsible for maintaining the data flow within the
workflow.)
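The DataBlock idea can be sketched like this (an illustrative in-memory analogue only; the class name comes from the mail above, but the methods, task ids, and file paths are invented, and the real table would live in the registry database): an upstream task persists its outputs keyed by its id, and a downstream application task resolves its inputs from that block.

```python
class DataBlock:
    """In-memory analogue of the DataBlock table: persists one task's
    outputs so downstream tasks in the workflow can consume them."""

    def __init__(self):
        self._blocks = {}

    def put(self, producer_task_id, outputs):
        # Upstream application task stores its outputs after completion.
        self._blocks[producer_task_id] = outputs

    def get(self, producer_task_id):
        # Downstream task resolves its inputs from the upstream block.
        return self._blocks[producer_task_id]

blocks = DataBlock()
blocks.put("app-task-1", {"energy.out": "/data/run1/energy.out"})
downstream_inputs = blocks.get("app-task-1")
```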

I hope I have explained the exact issue clearly now, and I look forward to
hearing from you again. Thank you again for the continuous support.

Regards

[1] https://github.com/apache/airavata/pull/203


Re: User Defined Workflow Execution Framework

2018-09-21 Thread Christie, Marcus Aaron


On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne wrote:

In the beginning, I tried to use the current ExperimentModel to implement 
workflows since it has workflow related characteristics as you have mentioned. 
It seemed to be designed at first keeping the workflow as a primary focus 
including even ExperimentType.WORKFLOW. But, apart from that and the database 
level one-to-many relationship with processes, there is no significant support 
provided for workflows.

I believe processes should be capable of executing independently at their level 
of abstraction. But, in the current architecture processes execute some 
experiment related parts going beyond their scope. For example, saving 
experiment output along with process output after completing the process, which 
is not required for workflows. Here, submitting a message to indicate the 
process status should be enough.


I think Sudhakar addressed a lot of your questions, but here are some 
additional thoughts:

Processes just execute a set of tasks, which are specified by the Orchestrator. 
For workflows I would expect the Orchestrator to create a list of processes 
that each have a set of tasks that make sense for the running of the workflow.  
For example, regarding saving experiment output, the Orchestrator could either 
create a process to save the experiment output or have the terminal process in 
the workflow have a final task to save the experiment output.

If processes can execute independently, a process doesn't need to keep an 
experiment_id within its table. Isn't it the responsibility of the outer layer 
(Experiment/Workflow) to keep this mapping? WDYT? :)

Possibly. I wonder how this relates to the recent data parsing efforts.  It 
does make sense that we might want processes to execute independently because 
we do have the use case of running task dags separate from any experiment-like 
context.

As you have mentioned, we can keep an additional Experiment within the Workflow 
Application to keep the current Process execution unchanged. (Here the 
experiment is still executing a single application.) Is that what you meant?


Not quite. I was suggesting that the Experiment is the workflow instance, 
having a list of processes where each process executes an application 
(corresponding roughly to nodes in the workflow dag).

Thanks,

Marcus