Hi Airflow Users:

I have what I think is a fairly common scenario: my scheduler went
down for N hours, and I have a DAG that I'd like to re-run for the
missing data intervals. This DAG is currently on an @hourly schedule,
so I can *mostly* achieve this manually from the UI by clicking
"Trigger w/ Config" and setting the logical date by hand. But this
poses a few issues:

1. It's tedious. If I'm missing six, seven, eight or more hours, I
   need to repeat this over and over to catch up. If multiple DAGs
   are affected by the same outage (which is likely), this quickly
   becomes a real time sink.

2. It leads to gaps. Say I have successful DAG runs for 12:00 to
   13:00 and for 14:00 to 15:00. If I try to manually trigger my DAG
   with a logical date of 14:00 (in order to process the prior hour,
   13:00 to 14:00), Airflow rejects it because a run with that logical
   date already exists. So I'm left missing an hour's worth of data.
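To make the gap in point 2 concrete, here is a small, hypothetical Python sketch that lists which hourly intervals in a window have no successful run. It is plain `datetime` arithmetic, not part of any Airflow API; the function name `missing_hourly_intervals`, the assumption that each run is identified by the start of the hour it covers, and the example dates are all made up for illustration.

```python
from datetime import datetime, timedelta


def missing_hourly_intervals(start, end, completed):
    """Return (interval_start, interval_end) pairs in [start, end) with no run.

    Hypothetical helper: `completed` is assumed to be a set of interval
    start datetimes that already have successful runs.
    """
    missing = []
    current = start
    while current < end:
        if current not in completed:
            # No successful run covers this hour, so it needs a catch-up run.
            missing.append((current, current + timedelta(hours=1)))
        current += timedelta(hours=1)
    return missing


if __name__ == "__main__":
    # Made-up example matching point 2: runs exist for 12:00-13:00 and 14:00-15:00.
    done = {datetime(2022, 7, 20, 12), datetime(2022, 7, 20, 14)}
    gaps = missing_hourly_intervals(
        datetime(2022, 7, 20, 12), datetime(2022, 7, 20, 15), done
    )
    for s, e in gaps:
        print(f"{s:%H:%M} to {e:%H:%M}")
```

With the sample data this reports only the 13:00 to 14:00 interval, which is exactly the hour that cannot be filled by triggering at 14:00.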

There seem to be a couple of recommended solutions:

* Use the `airflow dags backfill` command [1]. However, the start and
  end parameters to this command appear to be _dates_, not
  _datetimes_, so I'm unsure how backfilling just a few hours (say,
  four) would behave.

* Create a separate DAG [2]. This seems to be the generally accepted
  approach, though in the linked GitHub discussion there is a view
  that it should be possible to override the data interval
  parameters of an existing DAG when triggering a manual run.

Am I missing something? Is there another way to catch up on
missing data intervals in this scenario, or should I go ahead and
create a separate DAG to handle this job?

Thanks,

Ben

[1]: https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#backfill

[2]: https://github.com/apache/airflow/discussions/22232?sort=top#discussioncomment-3159129
