Hi Airflow Users: I have what I think is a fairly common scenario: my scheduler went down for N hours, and I have a DAG that I'd like to re-run for the missing data intervals. This DAG is currently on an @hourly data interval, so I can *mostly* achieve this manually from the UI by clicking "Trigger w/ Config" and manually setting the logical date. But this poses a few issues:
1. It's tedious. If I'm missing 6, 7, 8, etc. hours, I need to do this over and over in order to catch up. If I have multiple DAGs affected by the same issue (probable), then this really becomes a time drag.

2. It leads to gaps. Say I have successful DAG runs for 12:00 to 13:00, and 14:00 to 15:00. If I try to manually trigger my DAG with a logical date of 14:00 (in order to process the prior hour), Airflow complains that this logical date already exists. So I'm missing an hour's worth of data.

It seems there are a couple of recommended solutions:

* Use the `airflow dags backfill`[1] command. However, the start and end parameters to this command are _dates_, not _datetimes_. So it seems like trying to backfill for, say, 4 hours might lead to some unexpected behavior.

* Create a separate DAG [2]. This seems to be the generally accepted paradigm, though in the linked GitHub discussion, there is a view that it should be possible to manually override data interval parameters for an existing DAG when triggering a manual run.

Am I missing something? Is there another way to catch up on missing data intervals in this scenario, or should I go ahead and create a separate DAG to handle this job?

Thanks,
Ben

[1]: https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#backfill
[2]: https://github.com/apache/airflow/discussions/22232?sort=top#discussioncomment-3159129

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@airflow.apache.org
For additional commands, e-mail: users-h...@airflow.apache.org
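P.S. For anyone curious, the catch-up logic I'm describing (one trigger per missing hour, skipping logical dates that already have a run, so the 12:00 and 14:00 runs from my example are left alone) can be sketched roughly like this. The dates, the DAG id `my_hourly_dag`, and the `-e`/`--exec-date` flag spelling are illustrative assumptions, not a tested recipe:

```python
from datetime import datetime, timedelta

def missing_hours(start, end, existing):
    """Yield hourly logical dates in [start, end) with no existing DAG run."""
    t = start
    while t < end:
        if t not in existing:
            yield t
        t += timedelta(hours=1)

# Hypothetical outage window and the already-successful runs from my example:
start = datetime(2024, 1, 1, 12)
end = datetime(2024, 1, 1, 16)
existing = {datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 14)}

for ts in missing_hours(start, end, existing):
    # Print (rather than run) one trigger command per gap, for review first.
    # Assumption: `airflow dags trigger` accepts a logical date via -e/--exec-date.
    print(f"airflow dags trigger -e {ts.isoformat()} my_hourly_dag")
```

Printing the commands first avoids tripping over the "logical date already exists" error from issue 2, since the existing runs are filtered out up front.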