Hi All,

After the discussion with Upeksha, we decided to implement the Airavata Workflow as part of the current Experiment. With that, we need to modify the current way of saving output files and reporting execution errors at the Process execution level.
I modified the thrift models and database models from the previous workflow implementation to align with the new requirements. [1]

Regards

[1] https://github.com/apache/airavata/pull/207

On Fri, Sep 21, 2018 at 10:24 PM Yasas Gunarathne <yasasgunarat...@gmail.com> wrote:

> Hi Marcus and Sudhakar,
>
> Thank you for the detailed answers, but I still have a few issues. Let me
> explain a little more about the architecture of the Airavata Workflow
> implementation.
>
> [image: workflow.png]
>
> Currently, the Orchestrator is only capable of submitting
> single-application Experiments. To support different types of workflows,
> we need more control over the processes (it is a little more complicated
> than submitting a set of Processes). For that, we decided to use *Helix*
> at the Orchestrator level.
>
> Since the current experiment implementation cannot be used in such a
> situation, we decided to use a separate set of models and APIs that
> enable submitting and launching workflows. [1]
>
> Workflow execution is also managed by *Helix*, at the *Orchestrator*
> level. These workflows contain *Helix tasks* which are responsible for
> handling the flow:
>
> *i. Flow Starter Task*
>
> This task starts a specific branch of the Airavata Workflow. A single
> Airavata Workflow can have multiple starting points. The Flow Starter is
> the only component that can accept input in the standard
> InputDataObjectType.
>
> *ii. Flow Terminator Task*
>
> This task terminates a specific branch of the Airavata Workflow. A
> single workflow can have multiple terminating points. The Flow
> Terminator is the only component that can produce output in the standard
> OutputDataObjectType.
>
> *iii. Flow Barrier Task*
>
> This task acts as a waiting component in the middle of a workflow.
> For example, if there are two applications running and the results of
> both are required to continue the workflow, the barrier waits for both
> applications to complete before continuing.
>
> *iv. Flow Divider Task*
>
> This task opens up new branches in the middle of a workflow.
>
> *v. Condition Handler Task*
>
> This task is the path-selection component of the workflow. It works
> similarly to an if statement.
>
> *vi. Foreach Loop Task*
>
> This task divides the input into specified portions and executes the
> task loop in parallel for those input portions.
>
> *vii. Do While Loop Task*
>
> This task re-runs a specified task loop until the result meets a
> specified condition.
>
> Other than these flow handler tasks, there is a type of task called
> *ApplicationTask*, which is responsible for executing an application
> within a workflow (a workflow contains multiple *application tasks*
> connected by *flow handler tasks*).
>
> Within these ApplicationTasks, we need to perform an operation similar
> to what is currently executed within the *Orchestrator* for a single
> *Experiment*: creating a Process (which has a set of tasks to be
> executed) and submitting it for execution.
>
> I had previously planned to use, within the *ApplicationTask*, the same
> approach the Orchestrator currently follows when launching an
> experiment, but later realized that this cannot be done, since Process
> execution performs many experiment-specific activities. That is why I
> raised this issue and proposed making Process execution independent.
>
> Output data staging (*saving output files*) is planned to be done within
> the *ApplicationTask* after the Process completes its execution (after
> receiving the Process completion message). This needs to happen at the
> Orchestrator level, since outputs are used as inputs to other
> *application tasks* within the workflow.
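The control-flow semantics of the flow handler tasks described above can be sketched in plain Python. This is only an illustrative sketch of the described behavior; the function names and signatures here are invented for the example and are not part of the Airavata or Helix codebases.

```python
from concurrent.futures import ThreadPoolExecutor

def flow_barrier(*branch_results):
    # Barrier: continue only once every incoming branch has produced a
    # result (branches are already computed here; a real barrier blocks).
    return list(branch_results)

def condition_handler(value, predicate, if_branch, else_branch):
    # Condition handler: select the next branch, like an if statement.
    return if_branch(value) if predicate(value) else else_branch(value)

def foreach_loop(portions, task, workers=4):
    # Foreach: run the task loop in parallel over the input portions.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, portions))

def do_while_loop(value, task, condition):
    # Do-while: re-run the task loop until the result meets the condition.
    while True:
        value = task(value)
        if condition(value):
            return value
```

For example, `foreach_loop([1, 2, 3], lambda x: x * 2)` fans the three portions out to worker threads and collects the results in input order.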
> (Outputs are persisted using the DataBlock table - the DataBlock is
> responsible for maintaining the data flow within the workflow.)
>
> I think I have now been clear about the exact issue and look forward to
> hearing from you again. Thank you again for the continuous support.
>
> Regards
>
> [1] https://github.com/apache/airavata/pull/203
>
>
> On Fri, Sep 21, 2018 at 9:03 PM Christie, Marcus Aaron <machr...@iu.edu>
> wrote:
>
>> On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne
>> <yasasgunarat...@gmail.com> wrote:
>>
>> In the beginning, I tried to use the current ExperimentModel to
>> implement workflows, since it has workflow-related characteristics as
>> you have mentioned. It seems to have been designed with workflows as a
>> primary focus, even including ExperimentType.WORKFLOW. But apart from
>> that, and the database-level one-to-many relationship with processes,
>> there is no significant support provided for workflows.
>>
>> I believe processes should be capable of executing independently at
>> their level of abstraction. But in the current architecture, processes
>> execute some experiment-related parts, going beyond their scope. For
>> example, saving experiment output along with process output after
>> completing the process, which is not required for workflows. Here,
>> submitting a message indicating the process status should be enough.
>>
>> I think Sudhakar addressed a lot of your questions, but here are some
>> additional thoughts:
>>
>> Processes just execute a set of tasks, which are specified by the
>> Orchestrator. For workflows I would expect the Orchestrator to create a
>> list of processes that each have a set of tasks that make sense for
>> running the workflow. For example, regarding saving experiment output,
>> the Orchestrator could either create a process to save the experiment
>> output or have the terminal process in the workflow include a final
>> task to save the experiment output.
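Marcus's suggestion above (the Orchestrator assembling, for each workflow, a list of processes whose task sets fit that workflow, with the terminal process carrying a final output-saving task) could be sketched as follows. The data structures and task names are hypothetical, chosen only to illustrate the idea, and do not come from the Airavata codebase.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Process:
    # A process is just an ordered set of tasks chosen by the Orchestrator.
    process_id: str
    tasks: List[str] = field(default_factory=list)

def build_workflow_processes(app_nodes: List[str]) -> List[Process]:
    # One process per application node in the workflow DAG; the terminal
    # process gets an extra task that saves the experiment output
    # (one of the two options Marcus describes).
    processes = [
        Process(node, ["input-staging", "job-submission", "output-staging"])
        for node in app_nodes
    ]
    if processes:
        processes[-1].tasks.append("save-experiment-output")
    return processes
```

With `build_workflow_processes(["node-a", "node-b"])`, only the process for `node-b` ends with the output-saving task; the alternative option would instead append a dedicated save-output process to the list.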
>> If processes can execute independently, a process doesn't need to keep
>> the experiment_id within itself in the table. Isn't it the
>> responsibility of the outer layer (Experiment/Workflow) to keep this
>> mapping? WDYT? :)
>>
>> Possibly. I wonder how this relates to the recent data parsing efforts.
>> It does make sense that we might want processes to execute
>> independently, because we do have the use case of running task DAGs
>> separate from any experiment-like context.
>>
>> As you have mentioned, we can keep an additional Experiment within the
>> Workflow Application to keep the current Process execution unchanged.
>> (Here the experiment still executes a single application.) Is that what
>> you meant?
>>
>> Not quite. I was suggesting that the Experiment is the workflow
>> instance, having a list of processes where each process executes an
>> application (corresponding roughly to nodes in the workflow DAG).
>>
>> Thanks,
>>
>> Marcus
>
>
> --
> *Yasas Gunarathne*
> Undergraduate at Department of Computer Science and Engineering
> Faculty of Engineering - University of Moratuwa, Sri Lanka
> LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub
> <https://github.com/yasgun> | Mobile : +94 77 4893616

--
*Yasas Gunarathne*
Undergraduate at Department of Computer Science and Engineering
Faculty of Engineering - University of Moratuwa, Sri Lanka
LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub
<https://github.com/yasgun> | Mobile : +94 77 4893616