Hi All,

After the discussion with Upeksha, we decided to implement the Airavata Workflow as part of the current Experiment. With that, we need to modify the current way of saving output files and reporting execution errors at the Process execution level.
I modified the thrift models and database models from the previous workflow implementation to align with the new requirements. [1]

Regards

[1] https://github.com/apache/airavata/pull/207

On Fri, Sep 21, 2018 at 10:24 PM Yasas Gunarathne <yasasgunarat...@gmail.com> wrote:

> Hi Marcus and Sudhakar,
>
> Thank you for the detailed answers, but I still have a few issues. Let me
> explain a little more about the architecture of the Airavata Workflow
> implementation.
>
> [image: workflow.png]
>
> Currently, the Orchestrator is only capable of submitting
> single-application Experiments. To support different types of workflows,
> we need more control over the processes (it is a little more complicated
> than submitting a set of Processes). For that, we decided to use *Helix*
> at the Orchestrator level.
>
> Since the current experiment implementation cannot be used in such a
> situation, we decided to use a separate set of models and APIs that
> enable submitting and launching workflows. [1]
>
> Workflow execution is also managed by *Helix*, at the *Orchestrator*
> level. These workflows contain *Helix tasks* which are responsible for
> handling the flow:
>
> *i. Flow Starter Task*
>
> This task starts a specific branch of the Airavata Workflow. A single
> Airavata Workflow can have multiple starting points. The Flow Starter is
> the only component that can accept input in the standard
> InputDataObjectType.
>
> *ii. Flow Terminator Task*
>
> This task terminates a specific branch of the Airavata Workflow. A
> single workflow can have multiple terminating points. The Flow
> Terminator is the only component that can produce output in the standard
> OutputDataObjectType.
>
> *iii. Flow Barrier Task*
>
> This task acts as a waiting component in the middle of a workflow.
> For example, if there are two applications running and the results of
> both are required to continue the workflow, the barrier waits for both
> applications to complete before continuing.
>
> *iv. Flow Divider Task*
>
> This task opens up new branches in the middle of a workflow.
>
> *v. Condition Handler Task*
>
> This task is the path-selection component of the workflow. It works
> similarly to an if statement.
>
> *vi. Foreach Loop Task*
>
> This task divides the input into specified portions and executes the
> task loop in parallel for those input portions.
>
> *vii. Do While Loop Task*
>
> This task re-runs a specified task loop until the result meets a
> specified condition.
>
> Other than these flow handler tasks, there is a type of task called
> *ApplicationTask*, which is responsible for executing an application
> within a workflow (a workflow contains multiple *application tasks*
> connected by *flow handler tasks*).
>
> Within these ApplicationTasks, we need to perform an operation similar
> to what is currently executed within the *Orchestrator* for a single
> *Experiment*: creating a Process (which has a set of tasks to be
> executed) and submitting it for execution.
>
> I had previously planned to use, within the *ApplicationTask*, the same
> approach the Orchestrator currently follows when launching an
> experiment, but later realized that this cannot be done, since Process
> execution performs many experiment-specific activities. That is why I
> raised this issue and proposed making Process execution independent.
>
> Output data staging (*saving output files*) is planned to be done within
> the *ApplicationTask* after the Process completes its execution (after
> receiving the Process completion message). This needs to happen at the
> Orchestrator level, since outputs are used as inputs to other
> *application tasks* within the workflow.
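The control-flow semantics of the flow handler tasks described above can be sketched in plain Python. This is only an illustrative sketch of the described behavior; the function names and signatures here are invented for the example and are not part of the Airavata or Helix codebases.

```python
from concurrent.futures import ThreadPoolExecutor

def flow_barrier(*branch_results):
    # Barrier: continue only once every incoming branch has produced a
    # result (branches are already computed here; a real barrier blocks).
    return list(branch_results)

def condition_handler(value, predicate, if_branch, else_branch):
    # Condition handler: select the next branch, like an if statement.
    return if_branch(value) if predicate(value) else else_branch(value)

def foreach_loop(portions, task, workers=4):
    # Foreach: run the task loop in parallel over the input portions.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, portions))

def do_while_loop(value, task, condition):
    # Do-while: re-run the task loop until the result meets the condition.
    while True:
        value = task(value)
        if condition(value):
            return value
```

For example, `foreach_loop([1, 2, 3], lambda x: x * 2)` fans the three portions out to worker threads and collects the results in input order.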
> (Outputs are persisted using the DataBlock table - the DataBlock is
> responsible for maintaining the data flow within the workflow.)
>
> I think I have now been clear about the exact issue and look forward to
> hearing from you again. Thank you again for the continuous support.
>
> Regards
>
> [1] https://github.com/apache/airavata/pull/203
>
>
> On Fri, Sep 21, 2018 at 9:03 PM Christie, Marcus Aaron <machr...@iu.edu>
> wrote:
>
>> On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne
>> <yasasgunarat...@gmail.com> wrote:
>>
>> In the beginning, I tried to use the current ExperimentModel to
>> implement workflows, since it has workflow-related characteristics as
>> you have mentioned. It seems to have been designed with workflows as a
>> primary focus, even including ExperimentType.WORKFLOW. But apart from
>> that, and the database-level one-to-many relationship with processes,
>> there is no significant support provided for workflows.
>>
>> I believe processes should be capable of executing independently at
>> their level of abstraction. But in the current architecture, processes
>> execute some experiment-related parts, going beyond their scope. For
>> example, saving experiment output along with process output after
>> completing the process, which is not required for workflows. Here,
>> submitting a message indicating the process status should be enough.
>>
>> I think Sudhakar addressed a lot of your questions, but here are some
>> additional thoughts:
>>
>> Processes just execute a set of tasks, which are specified by the
>> Orchestrator. For workflows I would expect the Orchestrator to create a
>> list of processes that each have a set of tasks that make sense for
>> running the workflow. For example, regarding saving experiment output,
>> the Orchestrator could either create a process to save the experiment
>> output or have the terminal process in the workflow include a final
>> task to save the experiment output.
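Marcus's suggestion above (the Orchestrator assembling, for each workflow, a list of processes whose task sets fit that workflow, with the terminal process carrying a final output-saving task) could be sketched as follows. The data structures and task names are hypothetical, chosen only to illustrate the idea, and do not come from the Airavata codebase.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Process:
    # A process is just an ordered set of tasks chosen by the Orchestrator.
    process_id: str
    tasks: List[str] = field(default_factory=list)

def build_workflow_processes(app_nodes: List[str]) -> List[Process]:
    # One process per application node in the workflow DAG; the terminal
    # process gets an extra task that saves the experiment output
    # (one of the two options Marcus describes).
    processes = [
        Process(node, ["input-staging", "job-submission", "output-staging"])
        for node in app_nodes
    ]
    if processes:
        processes[-1].tasks.append("save-experiment-output")
    return processes
```

With `build_workflow_processes(["node-a", "node-b"])`, only the process for `node-b` ends with the output-saving task; the alternative option would instead append a dedicated save-output process to the list.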
>> If processes can execute independently, a process doesn't need to keep
>> the experiment_id within itself in the table. Isn't it the
>> responsibility of the outer layer (Experiment/Workflow) to keep this
>> mapping? WDYT? :)
>>
>> Possibly. I wonder how this relates to the recent data parsing efforts.
>> It does make sense that we might want processes to execute
>> independently, because we do have the use case of running task DAGs
>> separate from any experiment-like context.
>>
>> As you have mentioned, we can keep an additional Experiment within the
>> Workflow Application to keep the current Process execution unchanged.
>> (Here the experiment still executes a single application.) Is that what
>> you meant?
>>
>> Not quite. I was suggesting that the Experiment is the workflow
>> instance, having a list of processes where each process executes an
>> application (corresponding roughly to nodes in the workflow DAG).
>>
>> Thanks,
>>
>> Marcus
>
>
> --
> *Yasas Gunarathne*
> Undergraduate at Department of Computer Science and Engineering
> Faculty of Engineering - University of Moratuwa, Sri Lanka
> LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub
> <https://github.com/yasgun> | Mobile : +94 77 4893616

--
*Yasas Gunarathne*
Undergraduate at Department of Computer Science and Engineering
Faculty of Engineering - University of Moratuwa, Sri Lanka
LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub
<https://github.com/yasgun> | Mobile : +94 77 4893616