Hi all,
I am currently investigating Airflow as a solution for our experiment
analysis workflow, and I have some questions related to our specific
requirements:
1. We run multiple experiments with our reactor each day, and every
experiment needs to be analyzed by multiple analysis nodes (ANs).
    1. Should I use dynamic task mapping to spawn many identical
       "subgraphs" (differing only in the experiment ID), or is it
       better to use a local file or database as an airflow.Dataset?
       (Sketch 1 below shows the mapping variant I have in mind.)
2. We want to be able to set the priority of an analysis so that we
get quick access to the results of important experiments.
    1. Is it advisable to build an Operator subclass that updates the
       priorities of the DAG based on user input? (Sketch 2 below
       shows the priority_weight knob I found.)
3. In particular, whenever a new experiment has just finished, we want
to prioritize its analysis so that we can quickly plan the next
experiments.
    1. Is it possible at all to keep searching for recently finished
       experiments while the DAG is running? (Sketch 3 below shows the
       separate "watcher" DAG I am considering as an alternative.)
4. We need to track which experiments have already been analyzed.
    1. Does the Airflow database track XCom values? In our case the
       experiment ID needs to be passed via XCom, and it would be good
       to be able to filter all finished DAG runs by their experiment
       ID. (Sketch 3 also touches on this.)
5. When an AN changes its version, all experiments have to be
reprocessed, starting from that AN and triggering all downstream ANs
for each experiment.
    1. I found the priority_weight parameter, but I have no idea how
       to prioritize experiments and ANs globally if I use
       airflow.Dataset with two DAGs.
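
To make question 1 concrete, sketch 1 below is roughly the dynamic
task mapping variant I have in mind (a minimal sketch assuming
Airflow 2.4+; fetch_experiment_ids and analyze are placeholder names
for our lab-database query and a single AN):

```python
# Sketch 1: one mapped "subgraph" per experiment via dynamic task mapping.
import pendulum
from airflow.decorators import dag, task

@dag(schedule="@hourly", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def experiment_analysis():
    @task
    def fetch_experiment_ids() -> list[str]:
        # Placeholder: would query our lab database for unanalyzed experiments.
        return ["exp-001", "exp-002"]

    @task
    def analyze(experiment_id: str) -> str:
        # Placeholder for a single analysis node (AN).
        return experiment_id

    # One analyze task instance is created per experiment ID at runtime.
    analyze.expand(experiment_id=fetch_experiment_ids())

experiment_analysis()
```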
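
Regarding question 2, sketch 2 shows the priority_weight knob as I
currently understand it. As far as I can tell it is fixed per task at
definition time, which is exactly why I am unsure how to change it per
experiment at runtime (the pool name and command are placeholders):

```python
# Sketch 2: static per-task priority; only takes effect when pool slots
# are contended.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="priority_sketch", start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False):
    BashOperator(
        task_id="analyze_important",
        bash_command="echo analyze",  # placeholder for one AN's work
        priority_weight=10,           # higher weight -> scheduled first
        weight_rule="absolute",       # use this weight as-is, do not sum downstream weights
        pool="analysis_pool",         # assumed pool; must be created in Airflow first
    )
```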
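
For questions 3 and 4, instead of searching from inside a running DAG,
I am considering a second, frequently scheduled "watcher" DAG (sketch
3) that polls for freshly finished experiments and triggers one
analysis run per experiment, putting the experiment ID into the run
conf so that finished DAG runs can be filtered later without digging
through XComs. find_new_experiments is again a placeholder for our
lab-database query:

```python
# Sketch 3: watcher DAG that triggers one analysis run per new experiment.
from datetime import datetime
from airflow.decorators import dag, task
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

@dag(schedule="*/5 * * * *", start_date=datetime(2024, 1, 1), catchup=False)
def experiment_watcher():
    @task
    def find_new_experiments() -> list[str]:
        # Placeholder: query the lab database for freshly finished experiments.
        return ["exp-003"]

    @task
    def to_conf(experiment_ids: list[str]) -> list[dict]:
        # The conf is stored on the DagRun row, so runs can later be
        # filtered by experiment ID via the UI, REST API, or database.
        return [{"experiment_id": eid} for eid in experiment_ids]

    # One trigger task instance per experiment.
    TriggerDagRunOperator.partial(
        task_id="trigger_analysis",
        trigger_dag_id="experiment_analysis",  # the DAG from sketch 1
    ).expand(conf=to_conf(find_new_experiments()))

experiment_watcher()
```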
Generally:
1. Is Airflow the best-suited tool for these requirements?
I hope these questions are clear.
Thanks for this nice project and your time,
Daniel