Re: User Defined Workflow Execution Framework

2018-09-28 Thread Yasas Gunarathne
Hi All,

After the discussion with Upeksha, we decided to implement Airavata
Workflow as a part of the current Experiment. With that, we need to modify
the current way of saving output files and reporting execution errors at
the Process execution level.

I modified the thrift models and database models from the previous workflow
implementation to align with the new requirements. [1]

Regards

[1] https://github.com/apache/airavata/pull/207

On Fri, Sep 21, 2018 at 10:24 PM Yasas Gunarathne 
wrote:

> Hi Marcus and Sudhakar,
>
> Thank you for the detailed answers, but I still have a few issues. Let me
> explain a little more about the architecture of the Airavata Workflow
> implementation.
>
> [image: workflow.png]
>
> Currently, the Orchestrator is only capable of submitting single-application
> Experiments. But to support different types of workflows, we need to have
> more control over the processes (it is a little more complicated than
> submitting a set of Processes). For that, we decided to use *Helix* at
> the Orchestrator level.
>
> Since the current experiment implementation cannot be used in such a
> situation, we decided to use a separate set of models and APIs that
> enable submitting and launching workflows. [1]
>
> Workflow execution is also managed by *Helix* at the *Orchestrator*
> level. These workflows are built from *Helix tasks*, which are
> responsible for handling the flow.
>
> *i. Flow Starter Task*
>
> This task is responsible for starting a specific branch of the Airavata
> Workflow. In a single Airavata Workflow, there can be multiple starting
> points. Flow starter is the only component which can accept input in the
> standard InputDataObjectType.
>
> *ii. Flow Terminator Task*
>
> This task is responsible for terminating a specific branch of the Airavata
> workflow. In a single workflow, there can be multiple terminating points.
> Flow terminator is the only component which can output in the standard
> OutputDataObjectType.
>
> *iii. Flow Barrier Task*
>
> This task works as a waiting component in the middle of a workflow. For
> example, if there are two applications running and the results of both
> applications are required to continue the workflow, barrier waits for both
> applications to be completed before continuing.
>
> *iv. Flow Divider Task*
>
> This task opens up new branches in the middle of a workflow.
>
> *v. Condition Handler Task*
>
> This task is the path selection component of the workflow. It works
> like an if statement.
>
> *vi. Foreach Loop Task*
>
> This task divides the input into specified portions and executes the task
> loop in parallel for those input portions.
>
> *vii. Do While Loop Task*
>
> This task is capable of re-running a specified task loop until the result
> meets a specified condition.
>
>
> Besides these flow handler tasks, there is a task type called
> *ApplicationTask*, which is responsible for executing an application
> within a workflow (a workflow contains multiple *application tasks*
> connected with *flow handler tasks*).
>
> Within these ApplicationTasks, we need to perform an operation similar to
> the one the *Orchestrator* currently executes for a single *Experiment*:
> creating a Process (which has a set of tasks to be executed) and
> submitting it for execution.
>
> I previously planned to use, within the *ApplicationTask* as well, the
> same approach that the Orchestrator currently follows when launching an
> experiment, but later realized that it cannot be done, since Process
> execution performs many experiment-specific activities. That is the
> reason why I raised this issue and proposed making Process execution
> independent.
>
> Output data staging (*saving output files*) is planned to be done within
> the *ApplicationTask* after the Process completes its execution (after
> receiving the Process completion message). This has to be done at
> the Orchestrator level, since outputs are used as inputs to other *application
> tasks* within a workflow. (Outputs are persisted using the DataBlock
> table; DataBlock is responsible for maintaining the data flow within the
> workflow.)
>
> I think I have made the exact issue clear now, and I am waiting to hear
> from you again. Thank you again for the continuous support.
>
> Regards
>
> [1] https://github.com/apache/airavata/pull/203
>
>
> On Fri, Sep 21, 2018 at 9:03 PM Christie, Marcus Aaron 
> wrote:
>
>>
>>
>> On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne 
>> wrote:
>>
>> In the beginning, I tried to use the current ExperimentModel to implement
>> workflows, since it has the workflow-related characteristics you have
>> mentioned. It seems to have been designed with the workflow as a primary
>> focus, even including ExperimentType.WORKFLOW. But apart from that and the
>> database-level one-to-many relationship with processes, there is no
>> significant support provided for workflows.
>>
>> I believe processes should be capable of executing independently at their
>> level of abstraction. [...]

Re: User Defined Workflow Execution Framework

2018-09-21 Thread Yasas Gunarathne
Hi Marcus and Sudhakar,

Thank you for the detailed answers, but I still have a few issues. Let me
explain a little more about the architecture of the Airavata Workflow
implementation.

[image: workflow.png]

Currently, the Orchestrator is only capable of submitting single-application
Experiments. But to support different types of workflows, we need to have
more control over the processes (it is a little more complicated than
submitting a set of Processes). For that, we decided to use *Helix* at the
Orchestrator level.

Since the current experiment implementation cannot be used in such a
situation, we decided to use a separate set of models and APIs that enable
submitting and launching workflows. [1]

Workflow execution is also managed by *Helix* at the *Orchestrator* level.
These workflows are built from *Helix tasks*, which are responsible for
handling the flow.

*i. Flow Starter Task*

This task is responsible for starting a specific branch of the Airavata
Workflow. In a single Airavata Workflow, there can be multiple starting
points. Flow starter is the only component which can accept input in the
standard InputDataObjectType.

*ii. Flow Terminator Task*

This task is responsible for terminating a specific branch of the Airavata
workflow. In a single workflow, there can be multiple terminating points.
Flow terminator is the only component which can output in the standard
OutputDataObjectType.

*iii. Flow Barrier Task*

This task works as a waiting component in the middle of a workflow. For
example, if there are two applications running and the results of both
applications are required to continue the workflow, barrier waits for both
applications to be completed before continuing.

*iv. Flow Divider Task*

This task opens up new branches in the middle of a workflow.

*v. Condition Handler Task*

This task is the path selection component of the workflow. It works like
an if statement.

*vi. Foreach Loop Task*

This task divides the input into specified portions and executes the task
loop in parallel for those input portions.

*vii. Do While Loop Task*

This task is capable of re-running a specified task loop until the result
meets a specified condition.


Besides these flow handler tasks, there is a task type called
*ApplicationTask*, which is responsible for executing an application within
a workflow (a workflow contains multiple *application tasks* connected with
*flow handler tasks*).
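
As a rough illustration of the shape of these tasks: each one can be a plain
Helix Task registered through a TaskFactory. A minimal sketch of an
*ApplicationTask* (the two helper methods are hypothetical placeholders for
the behaviour discussed in this thread, not existing Airavata code):

import java.util.Map;

import org.apache.helix.task.Task;
import org.apache.helix.task.TaskCallbackContext;
import org.apache.helix.task.TaskResult;

public class ApplicationTask implements Task {

    private final Map<String, String> taskConfig;

    public ApplicationTask(TaskCallbackContext context) {
        // per-task key-value configuration set when the workflow was built
        this.taskConfig = context.getTaskConfig().getConfigMap();
    }

    @Override
    public TaskResult run() {
        try {
            // create an Airavata Process for this workflow node and submit it
            String processId = createAndSubmitProcess(taskConfig);
            // block until the Process completion message arrives
            waitForProcessCompletionMessage(processId);
            return new TaskResult(TaskResult.Status.COMPLETED, processId);
        } catch (Exception e) {
            return new TaskResult(TaskResult.Status.FAILED, e.getMessage());
        }
    }

    @Override
    public void cancel() {
        // ask the Orchestrator to cancel the running Process
    }

    private String createAndSubmitProcess(Map<String, String> config) {
        throw new UnsupportedOperationException("hypothetical placeholder");
    }

    private void waitForProcessCompletionMessage(String processId) {
        // hypothetical placeholder: e.g. wait on the process status message
    }
}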

Within these ApplicationTasks, we need to perform an operation similar to
the one the *Orchestrator* currently executes for a single *Experiment*:
creating a Process (which has a set of tasks to be executed) and submitting
it for execution.

I previously planned to use, within the *ApplicationTask* as well, the same
approach that the Orchestrator currently follows when launching an
experiment, but later realized that it cannot be done, since Process
execution performs many experiment-specific activities. That is the reason
why I raised this issue and proposed making Process execution independent.

Output data staging (*saving output files*) is planned to be done within the
*ApplicationTask* after the Process completes its execution (after receiving
the Process completion message). This has to be done at the Orchestrator
level, since outputs are used as inputs to other *application tasks* within
a workflow. (Outputs are persisted using the DataBlock table; DataBlock is
responsible for maintaining the data flow within the workflow.)
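
For reference, a hypothetical shape for such a DataBlock row (entity and
column names here are illustrative only, not the actual schema in the PR):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Lob;
import javax.persistence.Table;

// Illustrative only: one row per staged output, keyed so that downstream
// application tasks in the same workflow can pick the output up as input.
@Entity
@Table(name = "DATA_BLOCK")
public class DataBlock {

    @Id
    @Column(name = "DATA_BLOCK_ID")
    private String dataBlockId;

    @Column(name = "WORKFLOW_ID")
    private String workflowId;     // workflow this block belongs to

    @Column(name = "SOURCE_TASK_ID")
    private String sourceTaskId;   // application task that produced the output

    @Lob
    @Column(name = "VALUE")
    private String value;          // replica catalog reference / serialized output

    // getters and setters omitted
}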

I think I have made the exact issue clear now, and I am waiting to hear
from you again. Thank you again for the continuous support.

Regards

[1] https://github.com/apache/airavata/pull/203


On Fri, Sep 21, 2018 at 9:03 PM Christie, Marcus Aaron 
wrote:

>
>
> On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne 
> wrote:
>
> In the beginning, I tried to use the current ExperimentModel to implement
> workflows, since it has the workflow-related characteristics you have
> mentioned. It seems to have been designed with the workflow as a primary
> focus, even including ExperimentType.WORKFLOW. But apart from that and the
> database-level one-to-many relationship with processes, there is no
> significant support provided for workflows.
>
> I believe processes should be capable of executing independently at their
> level of abstraction. But in the current architecture, processes execute
> some experiment-related parts, going beyond their scope; for example, saving
> experiment output along with process output after completing the process,
> which is not required for workflows. Here, submitting a message to indicate
> the process status should be enough.
>
>
> I think Sudhakar addressed a lot of your questions, but here are some
> additional thoughts:
>
> Processes just execute a set of tasks, which are specified by the
> Orchestrator. For workflows, I would expect the Orchestrator to create a
> list of processes that each have a set of tasks that make sense for the
> running of the workflow. [...]

Re: User Defined Workflow Execution Framework

2018-09-21 Thread Christie, Marcus Aaron


On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne
<yasasgunarat...@gmail.com> wrote:

In the beginning, I tried to use the current ExperimentModel to implement 
workflows, since it has the workflow-related characteristics you have mentioned. 
It seems to have been designed with the workflow as a primary focus, even 
including ExperimentType.WORKFLOW. But apart from that and the database-level 
one-to-many relationship with processes, there is no significant support 
provided for workflows.

I believe processes should be capable of executing independently at their level 
of abstraction. But in the current architecture, processes execute some 
experiment-related parts, going beyond their scope; for example, saving 
experiment output along with process output after completing the process, which 
is not required for workflows. Here, submitting a message to indicate the 
process status should be enough.


I think Sudhakar addressed a lot of your questions, but here are some 
additional thoughts:

Processes just execute a set of tasks, which are specified by the Orchestrator. 
For workflows, I would expect the Orchestrator to create a list of processes 
that each have a set of tasks that make sense for the running of the workflow. 
For example, regarding saving experiment output, the Orchestrator could either 
create a process to save the experiment output or have the terminal process in 
the workflow include a final task that saves the experiment output.
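
That second option could look roughly like the following with the
thrift-generated models (a sketch, assuming ProcessModel, TaskModel and
TaskTypes expose these members):

import java.util.Arrays;

import org.apache.airavata.model.process.ProcessModel;
import org.apache.airavata.model.task.TaskModel;
import org.apache.airavata.model.task.TaskTypes;

public class TerminalProcessSketch {

    // The terminal process of the workflow gets a final task that stages the
    // experiment output, instead of baking that step into process execution.
    public static ProcessModel buildTerminalProcess(String experimentId) {
        TaskModel jobSubmission = new TaskModel();
        jobSubmission.setTaskType(TaskTypes.JOB_SUBMISSION);

        TaskModel saveExperimentOutput = new TaskModel();
        saveExperimentOutput.setTaskType(TaskTypes.OUTPUT_FETCHING);

        ProcessModel process = new ProcessModel();
        process.setExperimentId(experimentId);
        process.setTasks(Arrays.asList(jobSubmission, saveExperimentOutput));
        return process;
    }
}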

If processes can execute independently, a process doesn't need to keep the 
experiment_id within its own table. Isn't it the responsibility of the outer 
layer (Experiment/Workflow) to keep this mapping? WDYT? :)

Possibly. I wonder how this relates to the recent data parsing efforts. It 
does make sense that we might want processes to execute independently, because 
we do have the use case of running task DAGs separate from any experiment-like 
context.

As you have mentioned, we can keep an additional Experiment within the Workflow 
Application to keep the current Process execution unchanged. (Here the 
experiment is still executing a single application.) Is that what you meant?


Not quite. I was suggesting that the Experiment is the workflow instance, 
having a list of processes where each process executes an application 
(corresponding roughly to nodes in the workflow DAG).

Thanks,

Marcus


Re: User Defined Workflow Execution Framework

2018-09-20 Thread Pamidighantam, Sudhakar
Yasas and Marcus:

Please see below..
On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne
<yasasgunarat...@gmail.com> wrote:

Hi Marcus,

Thank you very much for the suggestion.

In the beginning, I tried to use the current ExperimentModel to implement 
workflows, since it has the workflow-related characteristics you have mentioned. 
It seems to have been designed with the workflow as a primary focus, even 
including ExperimentType.WORKFLOW. But apart from that and the database-level 
one-to-many relationship with processes, there is no significant support 
provided for workflows.

I believe processes should be capable of executing independently at their level 
of abstraction. But in the current architecture, processes execute some 
experiment-related parts, going beyond their scope; for example, saving 
experiment output along with process output after completing the process, which 
is not required for workflows. Here, submitting a message to indicate the 
process status should be enough.


This can be implemented as an option, keeping the saving of files at the end 
of each step of the workflow as the default.



If processes can execute independently, a process doesn't need to keep the 
experiment_id within its own table. Isn't it the responsibility of the outer 
layer (Experiment/Workflow) to keep this mapping? WDYT? :)


An Experiment can have multiple jobs. That way the experiment ID becomes 
equivalent to a workflow ID, but multiple job IDs could be associated among 
the processes.


As you have mentioned, we can keep an additional Experiment within the Workflow 
Application to keep the current Process execution unchanged. (Here the 
experiment is still executing a single application.) Is that what you meant?


A different application could be associated with each job, and the requirement 
that we start with an Application needs to be modified.

All of this assumes we keep the experimentID at the top level and provide a 
way to add more application-based jobs as processes below, along with data 
staging steps.

Thanks,
Sudhakar.

Regards


On Thu, Sep 20, 2018 at 6:22 AM Christie, Marcus Aaron
<machr...@iu.edu> wrote:


On Sep 8, 2018, at 9:11 AM, Yasas Gunarathne
<yasasgunarat...@gmail.com> wrote:

To represent the workflow application -> process relationship, it is required 
to fill the experiment_id field of the process with null, then add an entry to 
an intermediate table to keep that relationship, and modify the logic a little 
to handle null experiment ids (which seems like a bad way to do it).

Hi Yasas,

Sorry for the late reply. Not sure if this helps, but I would think of an 
experiment as an execution of a workflow instead of simply being one-to-one 
with a process. The ExperimentModel already features a list of Processes, 
although recently, without workflows, there is generally only one Process. Maybe 
the workflow application can reference an Experiment that keeps the 
relationship with the processes for the workflow?

Thanks,

Marcus


--
Yasas Gunarathne
Undergraduate at Department of Computer Science and Engineering
Faculty of Engineering - University of Moratuwa Sri Lanka
LinkedIn | 
GitHub | Mobile : +94 77 4893616



Re: User Defined Workflow Execution Framework

2018-09-19 Thread Yasas Gunarathne
Hi Marcus,

Thank you very much for the suggestion.

In the beginning, I tried to use the current ExperimentModel to implement
workflows, since it has the workflow-related characteristics you have
mentioned. It seems to have been designed with the workflow as a primary
focus, even including ExperimentType.WORKFLOW. But apart from that and the
database-level one-to-many relationship with processes, there is no
significant support provided for workflows.

I believe processes should be capable of executing independently at their
level of abstraction. But in the current architecture, processes execute
some experiment-related parts, going beyond their scope; for example, saving
experiment output along with process output after completing the process,
which is not required for workflows. Here, submitting a message to indicate
the process status should be enough.

If processes can execute independently, a process doesn't need to keep the
experiment_id within its own table. Isn't it the responsibility of the outer
layer (Experiment/Workflow) to keep this mapping? WDYT?
:)

As you have mentioned, we can keep an additional Experiment within the
Workflow Application to keep the current Process execution unchanged. (Here
the experiment is still executing a single application.) Is that what you meant?

Regards


On Thu, Sep 20, 2018 at 6:22 AM Christie, Marcus Aaron 
wrote:

>
>
> On Sep 8, 2018, at 9:11 AM, Yasas Gunarathne 
> wrote:
>
> To represent the workflow application -> process relationship, it is
> required to fill the experiment_id field of the process with null, then add
> an entry to an intermediate table to keep that relationship, and modify the
> logic a little to handle null experiment ids (which seems like a bad way to
> do it).
>
> Hi Yasas,
>
> Sorry for the late reply. Not sure if this helps, but I would think of an
> experiment as an execution of a workflow instead of simply being one-to-one
> with a process. The ExperimentModel already features a list of Processes,
> although recently, without workflows, there is generally only one Process.
> Maybe the workflow application can reference an Experiment that keeps the
> relationship with the processes for the workflow?
>
> Thanks,
>
> Marcus
>


-- 
*Yasas Gunarathne*
Undergraduate at Department of Computer Science and Engineering
Faculty of Engineering - University of Moratuwa Sri Lanka
LinkedIn  | GitHub
 | Mobile : +94 77 4893616


Re: User Defined Workflow Execution Framework

2018-09-19 Thread Christie, Marcus Aaron


On Sep 8, 2018, at 9:11 AM, Yasas Gunarathne
<yasasgunarat...@gmail.com> wrote:

To represent the workflow application -> process relationship, it is required 
to fill the experiment_id field of the process with null, then add an entry to 
an intermediate table to keep that relationship, and modify the logic a little 
to handle null experiment ids (which seems like a bad way to do it).

Hi Yasas,

Sorry for the late reply. Not sure if this helps, but I would think of an 
experiment as an execution of a workflow instead of simply being one-to-one 
with a process. The ExperimentModel already features a list of Processes, 
although recently, without workflows, there is generally only one Process. Maybe 
the workflow application can reference an Experiment that keeps the 
relationship with the processes for the workflow?

Thanks,

Marcus


Re: User Defined Workflow Execution Framework

2018-07-07 Thread Yasas Gunarathne
Hi Upeksha,

I will change my implementation [1] accordingly.

[1]
https://github.com/apache/airavata/compare/develop...yasgun:ochestrator-refactoring

Thank You

On Sat, Jul 7, 2018 at 10:09 PM DImuthu Upeksha 
wrote:

> Hi Yasas,
>
> My preference is to go with the first approach. It looks simple and clean.
> Having two notions, Workflow and Experiment, might also confuse the users.
> I agree that the APIs should be changed, but we can preserve the old APIs
> for some time by manually mapping them to the new APIs. Can you share the
> fork you are currently working on with this thread as well?
>
> Thanks
> Dimuthu
>
> On Sat, Jul 7, 2018 at 12:50 AM, Yasas Gunarathne <
> yasasgunarat...@gmail.com> wrote:
>
>> Hi All,
>>
>> Thank you for the information. I will consider the scenarios explained
>> here in the process of modifying the Orchestrator with workflow
>> capabilities.
>>
>> Apart from that, I have a few issues to be clarified regarding the
>> API-level implementation. *ExperimentModel* has an *ExperimentType* enum
>> which includes two basic types: *SINGLE_APPLICATION* and *WORKFLOW*.
>> According to this, an experiment can be a single application or a workflow
>> (which may include multiple applications). But the other parameters in the
>> experiment model are defined treating it only as a single application.
>> Therefore, the actual meaning of "experiment" needs to be clarified in
>> order to continue with the API-level implementation.
>>
>> There are two basic options,
>> 1. Modifying *ExperimentModel* to support workflows (which causes all
>> client side implementations to be modified)
>> [image: 1.png]
>> ​2. Defining a separate *WorkflowModel* for workflow execution and
>> removing *ExperimentType* parameter from *ExperimentModel* to avoid
>> confusion.
>> [image: 2.png]
>> ​Please provide any suggestions regarding these two options or any other
>> alternative, if any. For the moment, I am working on creating a separate
>> *WorkflowModel* (which is a little bit similar to the XBaya
>> *WorkflowModel*).
>>
>> Regards
>>
>> On Mon, Jun 4, 2018 at 8:41 PM Pamidighantam, Sudhakar 
>> wrote:
>>
>>> Sometimes the workflow crashes and/or ends unfinished, which is
>>> probably more like scenario 2. In those cases too, one has to restart the
>>> workflow from the point where it stopped.
>>> So a workflow state needs to be maintained, along with the data needed
>>> and where it might be available when a restart is required. It is not
>>> strictly cloning and rerunning an old workflow, but restarting in the
>>> middle of an execution.
>>>
>>> Thanks,
>>> Sudhakar.
>>>
>>> On Jun 4, 2018, at 10:43 AM, DImuthu Upeksha 
>>> wrote:
>>>
>>> Hi Yasas,
>>>
>>> Thanks for the summary. Now that you have a clear idea about what you
>>> have to do, let's move on to implementing a prototype that validates your
>>> workflow blocks so that we can give our feedback constructively.
>>>
>>> Hi Sudhakar,
>>>
>>> Based on your question, I can imagine two scenarios.
>>>
>>> 1. The workflow is paused in the middle and resumed when required.
>>> This is straightforward if we use the Helix API directly.
>>>
>>> 2. The workflow is stopped permanently and a fresh restart of the
>>> workflow is done.
>>> As far as I understand, Helix currently does not have a workflow
>>> cloning capability, so we might have to clone it on our side and instruct
>>> Helix to run it as a new workflow. Or we can extend the Helix API to
>>> support workflow cloning, which is the cleaner and ideal way; however, it
>>> might need some understanding of the Helix code base and proper testing.
>>> So for the time being, let's go with the first approach.
>>>
>>> Thanks
>>> Dimuthu
>>>
>>> On Sun, Jun 3, 2018 at 7:35 AM, Pamidighantam, Sudhakar wrote:
>>>
 Is there a chance to include a workflow restarter (restarting where it was
 stopped earlier) in the tasks?

 Thanks,
 Sudhakar.

 On Jun 2, 2018, at 11:52 PM, Yasas Gunarathne <
 yasasgunarat...@gmail.com> wrote:

 Hi Suresh and Dimuthu,

 Thank you very much for the clarifications and suggestions. Based on
 them and other Helix related factors encountered during the implementation
 process, I updated and simplified the structure of workflow execution
 framework.

 *1. Airavata Workflow Manager*

 Airavata Workflow Manager is responsible for accepting the workflow
 information provided by the user, creating a Helix workflow with task
 dependencies, and submitting it for execution.


 *2. Airavata Workflow Data Blocks*

 Airavata Workflow Data Blocks are saved in JSON format as user contents
 in Helix workflow scope. These blocks contain the links of input data of
 the user, replica catalog entries of output data, and other information
 that are required for the workflow execution.


 *3. Airavata Workflow Tasks*
 *3.1. Operator Tasks*

 *i. Flow Starter Task*

 Flow Starter Task is responsible for starting a specific branch of the
 Airavata Workflow. In a single Airavata Workflow there can be multiple
 starting points. [...]

Re: User Defined Workflow Execution Framework

2018-07-06 Thread Yasas Gunarathne
Hi All,

Thank you for the information. I will consider the scenarios explained here
in the process of modifying the Orchestrator with workflow capabilities.

Apart from that, I have a few issues to be clarified regarding the API-level
implementation. *ExperimentModel* has an *ExperimentType* enum which includes
two basic types: *SINGLE_APPLICATION* and *WORKFLOW*. According to this, an
experiment can be a single application or a workflow (which may include
multiple applications). But the other parameters in the experiment model
are defined treating it only as a single application. Therefore, the actual
meaning of "experiment" needs to be clarified in order to continue with the
API-level implementation.
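
For reference, the enum in question boils down to these two values (the
authoritative definition lives in the Airavata thrift models):

// As the thrift-generated enum looks from the Java side.
public enum ExperimentType {
    SINGLE_APPLICATION,   // the experiment runs exactly one application
    WORKFLOW              // the experiment chains multiple applications
}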

There are two basic options,
1. Modifying *ExperimentModel* to support workflows (which causes all
client side implementations to be modified)
[image: 1.png]
​2. Defining a separate *WorkflowModel* for workflow execution and removing
*ExperimentType* parameter from *ExperimentModel* to avoid confusion.
[image: 2.png]
​Please provide any suggestions regarding these two options or any other
alternative, if any. For the moment, I am working on creating a separate
*WorkflowModel* (which is a little bit similar to the XBaya *WorkflowModel*).
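
A rough sketch of what a separate *WorkflowModel* might carry (all field and
type names here are hypothetical; the real definition would live in the
thrift models):

import java.util.List;

public class WorkflowModel {

    // Hypothetical nested types standing in for thrift structs.
    public static class WorkflowApplication {
        public String id;
        public String applicationInterfaceId;  // which application this node runs
    }

    public static class WorkflowConnection {
        public String fromId;                  // producing node / flow handler
        public String toId;                    // consuming node / flow handler
    }

    private String workflowId;
    private String gatewayId;
    private List<WorkflowApplication> applications;  // nodes of the workflow
    private List<WorkflowConnection> connections;    // edges / data flow

    // getters and setters omitted
}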

Regards

On Mon, Jun 4, 2018 at 8:41 PM Pamidighantam, Sudhakar 
wrote:

> Sometimes the workflow crashes and/or ends unfinished, which is probably
> more like scenario 2. In those cases too, one has to restart the workflow
> from the point where it stopped.
> So a workflow state needs to be maintained, along with the data needed and
> where it might be available when a restart is required. It is not strictly
> cloning and rerunning an old workflow, but restarting in the middle of an
> execution.
>
> Thanks,
> Sudhakar.
>
> On Jun 4, 2018, at 10:43 AM, DImuthu Upeksha 
> wrote:
>
> Hi Yasas,
>
> Thanks for the summary. Now that you have a clear idea about what you have
> to do, let's move on to implementing a prototype that validates your workflow
> blocks so that we can give our feedback constructively.
>
> Hi Sudhakar,
>
> Based on your question, I can imagine two scenarios.
>
> 1. The workflow is paused in the middle and resumed when required.
> This is straightforward if we use the Helix API directly.
>
> 2. The workflow is stopped permanently and a fresh restart of the workflow
> is done.
> As far as I understand, Helix currently does not have a workflow
> cloning capability, so we might have to clone it on our side and instruct
> Helix to run it as a new workflow. Or we can extend the Helix API to support
> workflow cloning, which is the cleaner and ideal way; however, it might need
> some understanding of the Helix code base and proper testing. So for the
> time being, let's go with the first approach.
>
> Thanks
> Dimuthu
>
> On Sun, Jun 3, 2018 at 7:35 AM, Pamidighantam, Sudhakar 
> wrote:
>
>> Is there a chance to include a workflow restarter (restarting where it was
>> stopped earlier) in the tasks?
>>
>> Thanks,
>> Sudhakar.
>>
>> On Jun 2, 2018, at 11:52 PM, Yasas Gunarathne 
>> wrote:
>>
>> Hi Suresh and Dimuthu,
>>
>> Thank you very much for the clarifications and suggestions. Based on them
>> and other Helix related factors encountered during the implementation
>> process, I updated and simplified the structure of workflow execution
>> framework.
>>
>> *1. Airavata Workflow Manager*
>>
>> Airavata Workflow Manager is responsible for accepting the workflow
>> information provided by the user, creating a Helix workflow with task
>> dependencies, and submitting it for execution.
>>
>>
>> *2. Airavata Workflow Data Blocks*
>>
>> Airavata Workflow Data Blocks are saved in JSON format as user contents
>> in Helix workflow scope. These blocks contain the links of input data of
>> the user, replica catalog entries of output data, and other information
>> that are required for the workflow execution.
>>
>>
>> *3. Airavata Workflow Tasks*
>> *3.1. Operator Tasks*
>>
>> *i. Flow Starter Task*
>>
>> Flow Starter Task is responsible for starting a specific branch of the
>> Airavata Workflow. In a single Airavata Workflow there can be multiple
>> starting points.
>>
>> *ii. Flow Terminator Task*
>>
>> Flow Terminator Task is responsible for terminating a specific branch of
>> the Airavata workflow. In a single workflow there can be multiple
>> terminating points.
>>
>> *iii. Flow Barrier Task*
>>
>> Flow Barrier Task works as a waiting component in the middle of a workflow.
>> For example, if there are two experiments running and the results of both
>> experiments are required to continue the workflow, the barrier waits for both
>> experiments to be completed before continuing.
>>
>> *iv. Flow Divider Task*
>>
>> Flow Divider Task opens up new branches of the workflow.
>>
>> *v. Condition Handler Task*
>>
>> Condition Handler Task is the path selection component of the workflow.
>>
>>
>> *3.2. Processor Tasks*
>>
>> These components are responsible for triggering the Orchestrator to perform
>> specific processes (ex: experiments / data processing activities). [...]

Re: User Defined Workflow Execution Framework

2018-06-04 Thread Pamidighantam, Sudhakar
Sometimes the workflow crashes and/or ends unfinished, which is probably more 
like scenario 2. In those cases too, one has to restart the workflow from the 
point where it stopped.
So a workflow state needs to be maintained, along with the data needed and where 
it might be available when a restart is required. It is not strictly cloning and 
rerunning an old workflow, but restarting in the middle of an execution.

Thanks,
Sudhakar.
On Jun 4, 2018, at 10:43 AM, DImuthu Upeksha
<dimuthu.upeks...@gmail.com> wrote:

Hi Yasas,

Thanks for the summary. Now that you have a clear idea about what you have to 
do, let's move on to implementing a prototype that validates your workflow 
blocks so that we can give our feedback constructively.

Hi Sudhakar,

Based on your question, I can imagine two scenarios.

1. The workflow is paused in the middle and resumed when required.
This is straightforward if we use the Helix API directly.

2. The workflow is stopped permanently and a fresh restart of the workflow is
done.
As far as I understand, Helix currently does not have a workflow cloning 
capability, so we might have to clone it on our side and instruct Helix to 
run it as a new workflow. Or we can extend the Helix API to support workflow 
cloning, which is the cleaner and ideal way; however, it might need some 
understanding of the Helix code base and proper testing. So for the time being, 
let's go with the first approach.

Thanks
Dimuthu

On Sun, Jun 3, 2018 at 7:35 AM, Pamidighantam, Sudhakar
<pamid...@iu.edu> wrote:
Is there a chance to include a workflow restarter (restarting where it was 
stopped earlier) in the tasks?

Thanks,
Sudhakar.

On Jun 2, 2018, at 11:52 PM, Yasas Gunarathne
<yasasgunarat...@gmail.com> wrote:

Hi Suresh and Dimuthu,

Thank you very much for the clarifications and suggestions. Based on them and 
other Helix related factors encountered during the implementation process, I 
updated and simplified the structure of workflow execution framework.

1. Airavata Workflow Manager
Airavata Workflow Manager is responsible for accepting the workflow information 
provided by the user, creating a Helix workflow with task dependencies, and 
submitting it for execution.

2. Airavata Workflow Data Blocks
Airavata Workflow Data Blocks are saved in JSON format as user contents in 
Helix workflow scope. These blocks contain the links of input data of the user, 
replica catalog entries of output data, and other information that are required 
for the workflow execution.

3. Airavata Workflow Tasks
3.1. Operator Tasks
i. Flow Starter Task
Flow Starter Task is responsible for starting a specific branch of the Airavata 
Workflow. In a single Airavata Workflow there can be multiple starting points.
ii. Flow Terminator Task
Flow Terminator Task is responsible for terminating a specific branch of the 
Airavata workflow. In a single workflow there can be multiple terminating 
points.
iii. Flow Barrier Task
Flow Barrier Task works as a waiting component in the middle of a workflow. For 
example, if there are two experiments running and the results of both 
experiments are required to continue the workflow, the barrier waits for both 
experiments to be completed before continuing.
iv. Flow Divider Task
Flow Divider Task opens up new branches of the workflow.
v. Condition Handler Task
Condition Handler Task is the path selection component of the workflow.

3.2. Processor Tasks
These components are responsible for triggering the Orchestrator to perform 
specific processes (ex: experiments / data processing activities).

3.3. Loop Tasks
i. Foreach Loop Task
ii. Do While Loop Task

Regards

On Mon, May 21, 2018 at 4:01 PM Suresh Marru
<sma...@apache.org> wrote:
Hi Yasas,

This is good detail. I haven't digested it all, but here is some quick feedback. 
Instead of connecting multiple experiments within a workflow, which will be 
confusing from a user's point of view, can you use the following terminology:

* A computational experiment may have a single application execution or 
multiple (a workflow).

** So an experiment may correspond to a single application execution, multiple 
application executions, or even multiple workflows nested amongst them 
(hierarchical workflows). To avoid any confusion, let's call these units of 
execution Processes.

A process is an abstract notion for a unit of execution, without going into 
implementation details, and it describes the inputs and outputs. For an 
experiment with a single application, experiment and process have a one-to-one 
correspondence, but within a workflow, each step is a Process.

Tasks are the implementation detail of a Process.

So the change in your architecture will be to chain multiple processes together 
within an experiment, and not to chain multiple experiments. Does that make 
sense? You can also refer to the attached figure, which illustrates these from 
a data model perspective.
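
In code terms, the containment described here is roughly the following
(simplified from the thrift-generated ExperimentModel, ProcessModel and
TaskModel; names shortened for illustration):

import java.util.List;

// Simplified view: an experiment chains processes; tasks are the
// implementation detail of each process.
class Experiment {
    String experimentId;
    List<WorkflowProcess> processes;  // one per workflow step
}

class WorkflowProcess {
    String processId;
    List<WorkflowTask> tasks;  // env setup, data staging, job submission, ...
}

class WorkflowTask {
    String taskId;
    String taskType;
}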

Suresh

P.S. Over all, great going in mailing list communications, keep’em coming.


On May 21, 2018, at 1:25 AM, Yasas Gunarathne wrote: [...]

Re: User Defined Workflow Execution Framework

2018-06-04 Thread DImuthu Upeksha
Hi Yasas,

Thanks for the summary. Now that you have a clear idea about what you have
to do, let's move on to implementing a prototype that validates your workflow
blocks so that we can give our feedback constructively.

Hi Sudhakar,

Based on your question, I can imagine two scenarios.

1. The workflow is paused in the middle and resumed when required.
This is straightforward if we use the Helix API directly (see the sketch
after these two scenarios).

2. The workflow is stopped permanently and a fresh restart of the workflow
is done.
As far as I understand, Helix currently does not have a workflow cloning
capability, so we might have to clone it on our side and instruct Helix to
run it as a new workflow. Or we can extend the Helix API to support workflow
cloning, which is the cleaner and ideal way; however, it might need some
understanding of the Helix code base and proper testing. So for the time
being, let's go with the first approach.
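
A minimal sketch of scenario 1 against the Helix API, assuming an
already-connected HelixManager:

import org.apache.helix.HelixManager;
import org.apache.helix.task.TaskDriver;

public class WorkflowPauseResume {

    // stop() pauses the workflow (running tasks finish, nothing new is
    // scheduled); resume() continues it from where it stopped.
    public static void pauseAndResume(HelixManager manager, String workflowName)
            throws Exception {
        TaskDriver driver = new TaskDriver(manager);
        driver.stop(workflowName);
        // ... later, when required ...
        driver.resume(workflowName);
    }
}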

Thanks
Dimuthu

On Sun, Jun 3, 2018 at 7:35 AM, Pamidighantam, Sudhakar 
wrote:

> Is there a chance to include a workflow restarter (restarting where it was
> stopped earlier) in the tasks?
>
> Thanks,
> Sudhakar.
>
> On Jun 2, 2018, at 11:52 PM, Yasas Gunarathne 
> wrote:
>
> Hi Suresh and Dimuthu,
>
> Thank you very much for the clarifications and suggestions. Based on them
> and other Helix related factors encountered during the implementation
> process, I updated and simplified the structure of workflow execution
> framework.
>
> *1. Airavata Workflow Manager*
>
> Airavata Workflow Manager is responsible for accepting the workflow
> information provided by the user, creating a Helix workflow with task
> dependencies, and submitting it for execution.
>
>
> *2. Airavata Workflow Data Blocks*
>
> Airavata Workflow Data Blocks are saved in JSON format as user contents in
> Helix workflow scope. These blocks contain the links of input data of the
> user, replica catalog entries of output data, and other information that
> are required for the workflow execution.
>
>
> *3. Airavata Workflow Tasks*
> *3.1. Operator Tasks*
>
> *i. Flow Starter Task*
>
> Flow Starter Task is responsible for starting a specific branch of the
> Airavata Workflow. In a single Airavata Workflow there can be multiple
> starting points.
>
> *ii. Flow Terminator Task*
>
> Flow Terminator Task is responsible for terminating a specific branch of
> the Airavata workflow. In a single workflow there can be multiple
> terminating points.
>
> *iii. Flow Barrier Task*
>
> Flow Barrier Task works as a waiting component in the middle of a workflow.
> For example, if there are two experiments running and the results of both
> experiments are required to continue the workflow, the barrier waits for both
> experiments to be completed before continuing.
>
> *iv. Flow Divider Task*
>
> Flow Divider Task opens up new branches of the workflow.
>
> *v. Condition Handler Task*
>
> Condition Handler Task is the path selection component of the workflow.
>
>
> *3.2. Processor Tasks*
>
> These components are responsible for triggering the Orchestrator to perform
> specific processes (ex: experiments / data processing activities).
>
>
> *3.3. Loop Tasks*
>
> *i. Foreach Loop Task*
> *ii. Do While Loop Task*
>
>
> Regards
>
> On Mon, May 21, 2018 at 4:01 PM Suresh Marru  wrote:
>
>> Hi Yasas,
>>
>> This is good detail. I haven't digested it all, but here is some quick
>> feedback. Instead of connecting multiple experiments within a workflow,
>> which will be confusing from a user's point of view, can you use the
>> following terminology:
>>
>> * A computational experiment may have a single application execution or
>> multiple (a workflow).
>>
>> ** So an experiment may correspond to a single application execution,
>> multiple application executions, or even multiple workflows nested amongst
>> them (hierarchical workflows). To avoid any confusion, let's call these
>> units of execution Processes.
>>
>> A process is an abstract notion for a unit of execution, without going
>> into implementation details, and it describes the inputs and outputs. For
>> an experiment with a single application, experiment and process have a
>> one-to-one correspondence, but within a workflow, each step is a Process.
>>
>> Tasks are the implementation detail of a Process.
>>
>> So the change in your architecture will be to chain multiple processes
>> together within an experiment, and not to chain multiple experiments. Does
>> that make sense? You can also refer to the attached figure, which
>> illustrates these from a data model perspective.
>>
>> Suresh
>>
>> P.S. Over all, great going in mailing list communications, keep’em
>> coming.
>>
>>
>> On May 21, 2018, at 1:25 AM, Yasas Gunarathne 
>> wrote:
>>
>> Hi Upeksha,
>>
>> Thank you for the information. I have identified the components that
>> need to be included in the workflow execution framework. Please add
>> anything that is missing.
>>
>> *1. Airavata Workflow Message Context*
>>
>> Airavata Workflow Message Context is the common data structure that
>> passes through all Airavata workflow components. [...]

Re: User Defined Workflow Execution Framework

2018-06-03 Thread Pamidighantam, Sudhakar
Is there a chance to include a workflow restarter (restarting where it was 
stopped earlier) in the tasks?

Thanks,
Sudhakar.
On Jun 2, 2018, at 11:52 PM, Yasas Gunarathne
<yasasgunarat...@gmail.com> wrote:

Hi Suresh and Dimuthu,

Thank you very much for the clarifications and suggestions. Based on them and 
other Helix related factors encountered during the implementation process, I 
updated and simplified the structure of workflow execution framework.

1. Airavata Workflow Manager
Airavata Workflow Manager is responsible for accepting the workflow information 
provided by the user, creating a Helix workflow with task dependencies, and 
submitting it for execution.

2. Airavata Workflow Data Blocks
Airavata Workflow Data Blocks are saved in JSON format as user contents in 
Helix workflow scope. These blocks contain the links of input data of the user, 
replica catalog entries of output data, and other information that are required 
for the workflow execution.

3. Airavata Workflow Tasks
3.1. Operator Tasks
i. Flow Starter Task
Flow Starter Task is responsible for starting a specific branch of the Airavata 
Workflow. In a single Airavata Workflow there can be multiple starting points.
ii. Flow Terminator Task
Flow Terminator Task is responsible for terminating a specific branch of the 
Airavata workflow. In a single workflow there can be multiple terminating 
points.
iii. Flow Barrier Task
Flow Barrier Task works as a waiting component in the middle of a workflow. For 
example, if there are two experiments running and the results of both 
experiments are required to continue the workflow, the barrier waits for both 
experiments to be completed before continuing.
iv. Flow Divider Task
Flow Divider Task opens up new branches of the workflow.
v. Condition Handler Task
Condition Handler Task is the path selection component of the workflow.

3.2. Processor Tasks
These components are responsible for triggering the Orchestrator to perform 
specific processes (ex: experiments / data processing activities).

3.3. Loop Tasks
i. Foreach Loop Task
ii. Do While Loop Task

Regards

On Mon, May 21, 2018 at 4:01 PM Suresh Marru
<sma...@apache.org> wrote:
Hi Yasas,

This is good detail. I haven't digested it all, but here is some quick feedback. 
Instead of connecting multiple experiments within a workflow, which will be 
confusing from a user's point of view, can you use the following terminology:

* A computational experiment may have a single application execution or 
multiple (a workflow).

** So an experiment may correspond to a single application execution, multiple 
application executions, or even multiple workflows nested amongst them 
(hierarchical workflows). To avoid any confusion, let's call these units of 
execution Processes.

A process is an abstract notion for a unit of execution, without going into 
implementation details, and it describes the inputs and outputs. For an 
experiment with a single application, experiment and process have a one-to-one 
correspondence, but within a workflow, each step is a Process.

Tasks are the implementation detail of a Process.

So the change in your architecture will be to chain multiple processes together 
within an experiment, and not to chain multiple experiments. Does that make 
sense? You can also refer to the attached figure, which illustrates these from 
a data model perspective.

Suresh

P.S. Over all, great going in mailing list communications, keep’em coming.


On May 21, 2018, at 1:25 AM, Yasas Gunarathne
<yasasgunarat...@gmail.com> wrote:

Hi Upeksha,

Thank you for the information. I have identified the components that need to 
be included in the workflow execution framework. Please add anything that is 
missing.

1. Airavata Workflow Message Context
Airavata Workflow Message Context is the common data structure that passes 
through all Airavata workflow components. The Airavata Workflow Message Context 
includes the following.

  *   Airavata Workflow Messages - These contain the actual data that needs to 
be transferred through the workflow. The content of the Airavata Workflow 
Messages can be modified at Airavata Workflow Components. A single Airavata 
Workflow Message Context can hold multiple Airavata Workflow Messages, and they 
will be stored as key-value pairs keyed by the component id of the last-modified 
component. (This is required for the Airavata Flow Barrier.)
  *   Flow Monitoring Information - Flow Monitoring Information contains the 
current status and progress of the workflow.
  *   Parent Message Contexts - Parent Message Contexts include the preceding 
Airavata Workflow Message Contexts if the current message context was created in 
the middle of the workflow. For example, Airavata Flow Barriers and Airavata 
Flow Dividers create new message contexts, combining and copying messages 
respectively. In such cases the new message contexts will include their parent 
message context(s) in this section.
  *   Child Message Contexts - Child Message Contexts include the succeeding 
Airavata Workflow Message Contexts if other message contexts are created in the 
middle of the workflow. [...]

Re: User Defined Workflow Execution Framework

2018-06-02 Thread Yasas Gunarathne
Hi Suresh and Dimuthu,

Thank you very much for the clarifications and suggestions. Based on them
and other Helix-related factors encountered during the implementation
process, I updated and simplified the structure of the workflow execution
framework.

*1. Airavata Workflow Manager*

Airavata Workflow Manager is responsible for accepting the workflow
information provided by the user, creating a Helix workflow with task
dependencies, and submitting it for execution.
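
Concretely, the submission step can be as thin as handing the built Helix
workflow to a TaskDriver; a sketch assuming an already-connected HelixManager
and a workflow already translated from the user's description:

import org.apache.helix.HelixManager;
import org.apache.helix.task.TaskDriver;
import org.apache.helix.task.TaskState;
import org.apache.helix.task.Workflow;

public class AiravataWorkflowManagerSketch {

    private final TaskDriver driver;

    public AiravataWorkflowManagerSketch(HelixManager manager) {
        this.driver = new TaskDriver(manager);
    }

    // Submit the translated workflow and, optionally, block until it
    // reaches a terminal state.
    public void submit(Workflow helixWorkflow) throws Exception {
        driver.start(helixWorkflow);
        driver.pollForWorkflowState(helixWorkflow.getName(),
                TaskState.COMPLETED, TaskState.FAILED);
    }
}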


*2. Airavata Workflow Data Blocks*

Airavata Workflow Data Blocks are saved in JSON format as user content in
Helix workflow scope. These blocks contain links to the user's input data,
replica catalog entries of the output data, and other information that is
required for the workflow execution.
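
The workflow-scoped user content mentioned here maps naturally onto Helix's
UserContentStore; a minimal sketch (the data block keys and JSON payloads are
ours to define):

import org.apache.helix.task.Task;
import org.apache.helix.task.UserContentStore;

// Tasks extending UserContentStore can share the JSON data blocks with
// every other task in the same Helix workflow.
public abstract class DataBlockAwareTask extends UserContentStore implements Task {

    protected void saveDataBlock(String blockId, String json) {
        putUserContent(blockId, json, Scope.WORKFLOW);  // workflow-scoped
    }

    protected String loadDataBlock(String blockId) {
        return getUserContent(blockId, Scope.WORKFLOW);
    }
}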


*3. Airavata Workflow Tasks*
*3.1. Operator Tasks*

*i. Flow Starter Task*

Flow Starter Task is responsible for starting a specific branch of the
Airavata Workflow. In a single Airavata Workflow there can be multiple
starting points.

*ii. Flow Terminator Task*

Flow Terminator Task is responsible for terminating a specific branch of
the Airavata workflow. In a single workflow there can be multiple
terminating points.

*iii. Flow Barrier Task*

Flow Barrier Task works as a waiting component in the middle of a workflow.
For example, if there are two experiments running and the results of both
experiments are required to continue the workflow, the barrier waits for both
experiments to be completed before continuing.

*iv. Flow Divider Task*

Flow Divider Task opens up new branches of the workflow.

*v. Condition Handler Task*

Condition Handler Task is the path selection component of the workflow.


*3.2. Processor Tasks*

These components are responsible for triggering the Orchestrator to perform
specific processes (ex: experiments / data processing activities).


*3.3. Loop Tasks*

*i. Foreach Loop Task*
*ii. Do While Loop Task*
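
Of these, the Foreach Loop Task has a natural Helix mapping: one job with one
task per input portion, which Helix can then schedule in parallel. A sketch
("LoopBodyTask" is a hypothetical task type that would be registered with the
participants):

import java.util.ArrayList;
import java.util.List;

import org.apache.helix.task.JobConfig;
import org.apache.helix.task.TaskConfig;

public class ForeachFanOutSketch {

    // Build one Helix job containing one task per input portion; the tasks
    // can then be scheduled in parallel across participants.
    public static JobConfig.Builder buildForeachJob(List<String> inputPortions) {
        List<TaskConfig> tasks = new ArrayList<>();
        int index = 0;
        for (String portion : inputPortions) {
            tasks.add(new TaskConfig.Builder()
                    .setTaskId("loop-body-" + index++)
                    .setCommand("LoopBodyTask")
                    .addConfig("input", portion)
                    .build());
        }
        return new JobConfig.Builder().addTaskConfigs(tasks);
    }
}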


Regards

On Mon, May 21, 2018 at 4:01 PM Suresh Marru  wrote:

> Hi Yasas,
>
> This is good detail. I haven't digested it all, but here is some quick
> feedback. Instead of connecting multiple experiments within a workflow,
> which will be confusing from a user's point of view, can you use the
> following terminology:
>
> * A computational experiment may have a single application execution or
> multiple (a workflow).
>
> ** So an experiment may correspond to a single application execution,
> multiple application executions, or even multiple workflows nested amongst
> them (hierarchical workflows). To avoid any confusion, let's call these
> units of execution Processes.
>
> A process is an abstract notion for a unit of execution, without going into
> implementation details, and it describes the inputs and outputs. For an
> experiment with a single application, experiment and process have a
> one-to-one correspondence, but within a workflow, each step is a Process.
>
> Tasks are the implementation detail of a Process.
>
> So the change in your architecture will be to chain multiple processes
> together within an experiment, and not to chain multiple experiments. Does
> that make sense? You can also refer to the attached figure, which
> illustrates these from a data model perspective.
>
> Suresh
>
> P.S. Over all, great going in mailing list communications, keep’em coming.
>
>
> On May 21, 2018, at 1:25 AM, Yasas Gunarathne 
> wrote:
>
> Hi Upeksha,
>
> Thank you for the information. I have identified the components that need
> to be included in the workflow execution framework. Please add anything
> that is missing.
>
> *1. Airavata Workflow Message Context*
>
> Airavata Workflow Message Context is the common data structure that passes
> through all Airavata workflow components. The Airavata Workflow Message
> Context includes the following.
>
>
>    - *Airavata Workflow Messages* - These contain the actual data that
>    needs to be transferred through the workflow. The content of the Airavata
>    Workflow Messages can be modified at Airavata Workflow Components. A single
>    Airavata Workflow Message Context can hold multiple Airavata Workflow
>    Messages, and they will be stored as key-value pairs keyed by the component
>    id of the last-modified component. (This is required for the Airavata Flow
>    Barrier.)
>    - *Flow Monitoring Information* - Flow Monitoring Information contains
>    the current status and progress of the workflow.
>    - *Parent Message Contexts* - Parent Message Contexts include the
>    preceding Airavata Workflow Message Contexts if the current message context
>    was created in the middle of the workflow. For example, Airavata Flow
>    Barriers and Airavata Flow Dividers create new message contexts, combining
>    and copying messages respectively. In such cases the new message contexts
>    will include their parent message context(s) in this section.
>    - *Child Message Contexts* - Child Message Contexts include the
>    succeeding Airavata Workflow Message Contexts if other message contexts are
>    created in the middle of the workflow. [...]

Re: User Defined Workflow Execution Framework

2018-03-25 Thread DImuthu Upeksha
Hi Yasas,

I'm not an expert in XBaya design and use cases, but I think Suresh can shed
some light on it. However, we no longer use XBaya for workflow
interpretation, so don't confuse the workflows defined in XBaya with the
description provided in the JIRA ticket. Let's try to make the concepts
clear. We need two levels of workflows:

1. To run a single experiment of an Application. We call this a DAG. A
DAG is statically defined. It can have a set of environment setup tasks,
data staging tasks, and a job submission task. For example, a DAG is created
to run a Gaussian experiment on a compute host.
2. To make a chain of Applications. This is what we call an actual
Workflow. In a workflow you can have a Gaussian experiment running, followed
by a Lammps experiment. So this is a dynamic workflow: users can come up
with different combinations of Applications as a workflow (see the sketch
below).
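
In Helix terms, the second case is essentially a parent-child dependency
between two jobs; a sketch using the Helix task framework (task command names
are hypothetical):

import org.apache.helix.task.JobConfig;
import org.apache.helix.task.Workflow;

public class ApplicationChainSketch {

    // Chain two applications: the Lammps job starts only after the
    // Gaussian job completes.
    public static Workflow build() {
        Workflow.Builder builder = new Workflow.Builder("gaussian-then-lammps");
        builder.addJob("gaussian",
                new JobConfig.Builder().setCommand("GaussianTask"));
        builder.addJob("lammps",
                new JobConfig.Builder().setCommand("LammpsTask"));
        builder.addParentChildDependency("gaussian", "lammps");
        return builder.build();
    }
}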

However, your point about pausing and restarting workflows is valid. Whether
it is a statically defined DAG or a dynamic workflow, we should be able to
do those operations.

I understand that some of the words and terminology in those resources are
confusing and unclear, so please feel free to let us know if you need
anything clarified.

Thanks
Dimuthu

On Sun, Mar 25, 2018 at 2:45 AM, Yasas Gunarathne  wrote:

> Hi All,
>
> I have a few questions to be clarified regarding the user-defined workflow
> execution in Apache Airavata. Here I am talking about the high-level
> workflows that are used to chain together multiple applications. This
> relates to issue AIRAVATA-2717 [1].
>
> The documentation [2] says that the workflow interpreter that
> worked with XBaya provided an interpreted workflow execution framework
> rather than a compiled workflow execution environment, which allowed
> users to pause the execution of the workflow as necessary, update the
> DAG's execution states or even the DAG itself, and resume execution.
>
> I want to know the actual requirement for having an interpreted workflow
> execution at this level. Is there any domain-level advantage in allowing
> users to modify the order of the workflow at runtime?
>
> I think we can have pause, resume, restart, and stop commands available
> even in a compiled workflow execution environment, as long as we don't need
> to change the workflow.
>
> [1] https://issues.apache.org/jira/browse/AIRAVATA-2717
> [2] http://airavata.apache.org/architecture/workflow.html
>
> Regards
> --
> *Yasas Gunarathne*
> Undergraduate at Department of Computer Science and Engineering
> Faculty of Engineering - University of Moratuwa Sri Lanka
> LinkedIn  | GitHub
>  | Mobile : +94 77 4893616
> <+94%2077%20489%203616>
>