Hi, this is a question about best practices as we build our Airflow
instance and establish coding conventions.

We have a few jobs that follow this pattern:

   - An external API defines a list of items.  Calls to this API are slow,
   let's say on the order of minutes.
   - For each item in this list, we want to launch a sequence of tasks.

From reading about and experimenting with Airflow so far, we figure this
might be a good approach:

   1. A separate "Generator" DAG calls the API and generates a config file
   with the list of items.
   2. At DAG parsing time, the "Actual" DAG file reads the config file
   and generates its tasks dynamically, one chain per item.
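To make the two halves concrete, here is a minimal sketch of the pattern in plain Python. The file path, item names, and function names are all hypothetical stand-ins; in the real "Actual" DAG file, the parse-time loop would instantiate operators (e.g. extract >> transform >> load per item) instead of returning task ids.

```python
import json
import os
import tempfile

# Hypothetical location where the Generator DAG drops the config file.
CONFIG_PATH = os.path.join(tempfile.gettempdir(), "items_config.json")

# --- "Generator" side: called from a task in the Generator DAG. ---
def write_config(items):
    """Persist the slow API's item list so the other DAG never calls the API."""
    with open(CONFIG_PATH, "w") as f:
        json.dump({"items": items}, f)

# --- "Actual" DAG side: runs at DAG *parsing* time. ---
def build_task_ids():
    """Read the config and derive one task chain per item.

    In the real DAG file, this loop would create operators inside a DAG
    object rather than return a list of ids.
    """
    if not os.path.exists(CONFIG_PATH):
        return []  # config not generated yet; the DAG parses with no tasks
    with open(CONFIG_PATH) as f:
        items = json.load(f)["items"]
    return [f"process_{item}" for item in items]

write_config(["alpha", "beta"])
print(build_task_ids())  # -> ['process_alpha', 'process_beta']
```

One design note on this split: because the "Actual" DAG only reads a local file at parse time, the scheduler's frequent re-parsing stays fast, and the minutes-long API call happens only on the Generator DAG's schedule.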

Are there other preferred ways to do this kind of thing?  Thanks in advance!


Dan Andreescu
Wikimedia Foundation
