Hi Marcus and Sudhakar,
Thank you for the detailed answers, but I still have a few issues. Let me
explain a little more about the architecture of the Airavata workflow
implementation.
[image: workflow.png]
Currently, the Orchestrator is only capable of submitting single-application
Experiments. To support different types of workflows, we need more control
over the processes (it is a little more complicated than submitting a set of
Processes). For that, we decided to use *Helix* at the Orchestrator level.
Since the current experiment implementation cannot be used in such a
situation, we decided to use a separate set of models and APIs that enable
submitting and launching workflows. [1]
Workflow execution is also managed by *Helix*, at the *Orchestrator* level.
These workflows contain *Helix tasks* that are responsible for handling the
flow.
*i. Flow Starter Task*
This task is responsible for starting a specific branch of the Airavata
workflow. A single Airavata workflow can have multiple starting points. The
flow starter is the only component that can accept input in the standard
InputDataObjectType.
*ii. Flow Terminator Task*
This task is responsible for terminating a specific branch of the Airavata
workflow. A single workflow can have multiple termination points. The flow
terminator is the only component that can produce output in the standard
OutputDataObjectType.
*iii. Flow Barrier Task*
This task works as a waiting component in the middle of a workflow. For
example, if two applications are running and the results of both are
required to continue the workflow, the barrier waits for both applications
to complete before continuing.
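To illustrate the barrier semantics described above, here is a minimal sketch using plain Java futures. This is only a hypothetical model of the behavior; the class and method names are mine, not Airavata's actual Helix task code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of Flow Barrier semantics: wait until every
// upstream branch has produced its result before continuing.
public class FlowBarrierSketch {
    // Blocks until all branch futures complete, then returns their
    // results in branch order so the rest of the workflow can consume them.
    public static List<String> awaitBranches(List<CompletableFuture<String>> branches) {
        CompletableFuture.allOf(branches.toArray(new CompletableFuture[0])).join();
        return branches.stream().map(CompletableFuture::join).toList();
    }
}
```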
*iv. Flow Divider Task*
This task opens up new branches in the middle of a workflow.
*v. Condition Handler Task*
This task is the path selection component of the workflow. It works
similarly to an if statement.
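In sketch form (hypothetical names, not Airavata code), the path selection amounts to evaluating a predicate over the incoming value and routing to one of two branches:

```java
import java.util.function.Predicate;

// Hypothetical sketch of Condition Handler semantics: choose the next
// branch of the workflow based on a condition, like an if/else.
public class ConditionHandlerSketch {
    public static <T> String selectBranch(T value, Predicate<T> condition,
                                          String thenBranch, String elseBranch) {
        return condition.test(value) ? thenBranch : elseBranch;
    }
}
```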
*vi. Foreach Loop Task*
This task divides the input into specified portions and executes the task
loop in parallel for those input portions.
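A minimal sketch of the portioning and parallel execution described above. The names are hypothetical, and a local parallel stream only stands in for what would really be parallel Helix task runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of Foreach Loop semantics: split the input into
// fixed-size portions and run the loop body on each portion in parallel.
public class ForeachLoopSketch {
    public static <T, R> List<R> runParallel(List<T> input, int portionSize,
                                             Function<List<T>, R> body) {
        List<List<T>> portions = new ArrayList<>();
        for (int i = 0; i < input.size(); i += portionSize) {
            portions.add(input.subList(i, Math.min(i + portionSize, input.size())));
        }
        // Results are collected back in input order.
        return portions.parallelStream().map(body).toList();
    }
}
```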
*vii. Do While Loop Task*
This task is capable of re-running a specified task loop until the result
meets a specified condition.
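The do-while semantics can be sketched as follows (again, purely illustrative names; the real loop body would be a chain of Helix tasks rather than a Java function):

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch of Do While Loop semantics: re-run the loop body
// on its own output until the exit condition holds.
public class DoWhileLoopSketch {
    public static <T> T runUntil(T input, UnaryOperator<T> body, Predicate<T> done) {
        T result = input;
        do {
            result = body.apply(result);
        } while (!done.test(result));
        return result;
    }
}
```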
In addition to these flow handler tasks, there is a task type called
*ApplicationTask*, which is responsible for executing an application within
a workflow (a workflow consists of multiple *application tasks* connected by
*flow handler tasks*).
Within these ApplicationTasks, we need to perform an operation similar to
what the *Orchestrator* currently executes for a single *Experiment*: that
is, creating a Process (which has a set of tasks to be executed) and
submitting it for execution.
I had previously planned to use, within the *ApplicationTask*, the same
approach the Orchestrator currently follows when launching an experiment,
but I later realized that this cannot be done, since Process execution
performs many experiment-specific activities. That is why I raised this
issue and proposed making Process execution independent.
Output data staging (*saving output files*) is planned to be done within the
*ApplicationTask* after the Process completes its execution (i.e., after
receiving the Process completion message). This needs to happen at the
Orchestrator level, since outputs are used as inputs to other *application
tasks* within the workflow. (Outputs are persisted using the DataBlock
table; DataBlock is responsible for maintaining the data flow within the
workflow.)
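Conceptually, the DataBlock acts as a keyed store carrying one task's output to the next task's input. The following is a minimal in-memory model of that role only; the real DataBlock is a database table, and the class and method names here are invented for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the DataBlock's role: an upstream
// ApplicationTask persists its output under a block id, and a downstream
// task reads the same block as its input.
public class DataBlockSketch {
    private final Map<String, String> blocks = new ConcurrentHashMap<>();

    public void putOutput(String blockId, String value) {
        blocks.put(blockId, value);
    }

    public String getInput(String blockId) {
        return blocks.get(blockId);
    }
}
```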
I hope the exact issue is clear now, and I look forward to hearing from you
again. Thank you again for the continuous support.
Regards
[1] https://github.com/apache/airavata/pull/203
On Fri, Sep 21, 2018 at 9:03 PM Christie, Marcus Aaron
wrote:
>
>
> On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne
> wrote:
>
> In the beginning, I tried to use the current ExperimentModel to implement
> workflows since it has workflow related characteristics as you have
> mentioned. It seemed to be designed at first keeping the workflow as a
> primary focus including even ExperimentType.WORKFLOW. But, apart from that
> and the database level one-to-many relationship with processes, there is no
> significant support provided for workflows.
>
> I believe processes should be capable of executing independently at their
> level of abstraction. But, in the current architecture processes execute
> some experiment related parts going beyond their scope. For example, saving
> experiment output along with process output after completing the process,
> which is not required for workflows. Here, submitting a message to indicate
> the process status should be enough.
>
>
> I think Sudhakar addressed a lot of your questions, but here are some
> additional thoughts:
>
> Processes just execute a set of tasks, which are specified by the
> Orchestrator. For workflows I would expect the Orchestrator to create a
> list of processes that each have a set of tasks that make sense for the
>