Hi all,

I am currently investigating Airflow as a solution for our experiment analysis workflow, and I have some questions related to our specific requirements:

1. We run multiple experiments on our reactor each day, and every
   experiment needs to be analyzed by multiple analysis nodes (ANs).
    1. Should I use dynamic task mapping to spawn many identical
       "subgraphs" (differing only in the experiment id), or is it
       better to represent a local file or database as an
       airflow.Dataset? (Sketches 1 and 2 after this list show what
       I have in mind.)
2. We want to be able to set the priority of an analysis, so that we
   get quick access to the results of important experiments.
    1. Is it advisable to build an Operator subclass that updates the
       priorities of the DAG based on user input? (See sketch 3
       below.)
3. In particular, whenever a new experiment has just finished, we
   want to prioritize its analysis so that we can quickly plan the
   next experiments.
    1. Is it possible at all to keep searching for recently finished
       experiments while the DAG is running? (See sketch 4 below.)
4. We need to track which experiments have already been analyzed.
    1. Does the Airflow database track XCom values? In our case the
       experiment id needs to be passed via XCom, and it would be
       good to be able to filter all finished DAG runs by their
       experiment id. (See sketch 5 below.)
5. When an AN changes its version, all experiments have to be
   reprocessed, starting from that AN and triggering all downstream
   ANs for that experiment.
    1. I found the priority_weight parameter, but I have no clue how
       to globally prioritize experiments and ANs if I use
       airflow.Dataset with two DAGs. (Sketches 2 and 3 below are my
       current understanding.)
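
Sketch 1 (question 1, dynamic task mapping): a minimal version of
what I have in mind. The DAG and task names and the hard-coded
experiment ids are made up; list_experiments() would really query
our experiment store.

from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def experiment_analysis():

    @task
    def list_experiments() -> list[str]:
        # Placeholder: would query our experiment store for today's runs.
        return ["exp-001", "exp-002", "exp-003"]

    @task
    def analyze(experiment_id: str) -> str:
        # One mapped task instance per experiment; the AN code would
        # run here.
        return experiment_id

    # expand() spawns one "analyze" instance per experiment id.
    analyze.expand(experiment_id=list_experiments())


experiment_analysis()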
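
Sketch 2 (questions 1 and 5, airflow.Dataset across two DAGs): the
manifest path and DAG names are made up; the idea is that a producer
DAG marks the Dataset as updated and a consumer DAG is scheduled on
it.

from datetime import datetime

from airflow import Dataset
from airflow.decorators import dag, task

# Hypothetical file the reactor side writes when an experiment finishes.
experiments = Dataset("file:///data/experiments/manifest.json")


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def register_experiment():

    @task(outlets=[experiments])
    def publish():
        # Completing this task marks the Dataset as updated.
        ...

    publish()


@dag(schedule=[experiments], start_date=datetime(2024, 1, 1), catchup=False)
def analyze_experiments():

    @task
    def run_analysis_nodes():
        # Runs whenever the Dataset above is updated.
        ...

    run_analysis_nodes()


register_experiment()
analyze_experiments()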
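
Sketch 3 (questions 2 and 5, priority_weight): as far as I
understand, priority_weight only orders tasks that compete for the
same slots, e.g. in a pool, and is fixed at task definition time,
which is exactly why I am unsure how to reprioritize per run. The
pool name is made up.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_analysis():
    ...


with DAG(
    dag_id="priority_demo",
    schedule=None,
    start_date=datetime(2024, 1, 1),
    catchup=False,
):
    PythonOperator(
        task_id="analyze_urgent",
        python_callable=run_analysis,
        pool="analysis_nodes",  # hypothetical pool limiting concurrent ANs
        priority_weight=10,     # higher weight is scheduled first
    )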
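
Sketch 4 (question 3, polling for finished experiments): the closest
thing I found is a sensor in reschedule mode; the check itself is a
placeholder.

from datetime import datetime

from airflow import DAG
from airflow.sensors.python import PythonSensor


def new_experiment_finished() -> bool:
    # Placeholder: would ask our experiment store for newly finished runs.
    return False


with DAG(
    dag_id="watch_experiments",
    schedule=None,
    start_date=datetime(2024, 1, 1),
    catchup=False,
):
    PythonSensor(
        task_id="wait_for_experiment",
        python_callable=new_experiment_finished,
        poke_interval=60,      # check once a minute
        mode="reschedule",     # free the worker slot between checks
    )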
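
Sketch 5 (question 4, finding analyzed experiments): if we put the
experiment id into each run's conf when triggering, I assume we
could list finished runs over the REST API and filter client-side;
host, credentials, and the experiment_id key are made up.

import requests

resp = requests.get(
    "http://localhost:8080/api/v1/dags/experiment_analysis/dagRuns",
    params={"state": "success"},
    auth=("admin", "admin"),
)
analyzed = [
    run["conf"].get("experiment_id")
    for run in resp.json()["dag_runs"]
]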

Generally:

1. Is Airflow the best-suited tool for these requirements?


I hope these questions are clear.

Thanks for this nice project and your time,

Daniel
