This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new 02ce3a0238 Edited timetable docs (#38505)
02ce3a0238 is described below

commit 02ce3a0238ef0c170b2121a3ce80b77f54fa1071
Author: Laura Zdanski <25642903+lzdan...@users.noreply.github.com>
AuthorDate: Thu Apr 4 07:00:09 2024 -0400

    Edited timetable docs (#38505)
    
    ---------
    
    Co-authored-by: Collin McNulty <collin.mcnu...@gmail.com>
    Co-authored-by: Tzu-ping Chung <uranu...@gmail.com>
---
 .../authoring-and-scheduling/timetable.rst         | 135 +++++++++++----------
 1 file changed, 72 insertions(+), 63 deletions(-)

diff --git a/docs/apache-airflow/authoring-and-scheduling/timetable.rst b/docs/apache-airflow/authoring-and-scheduling/timetable.rst
index 78234910a6..9ca7d436db 100644
--- a/docs/apache-airflow/authoring-and-scheduling/timetable.rst
+++ b/docs/apache-airflow/authoring-and-scheduling/timetable.rst
@@ -19,9 +19,9 @@
 Timetables
 ==========
 
-For DAGs with time-based schedules (as opposed to event-driven), the scheduling
-decisions are driven by its internal "timetable".  The timetable also
-determines the data interval and the logical date of each run created for the DAG.
+For a DAG with a time-based schedule (as opposed to event-driven), the DAG's internal "timetable"
+drives scheduling.  The timetable also determines the data interval and the logical date of
+each run created for the DAG.
 
 DAGs scheduled with a cron expression or ``timedelta`` object are
 internally converted to always use a timetable.
@@ -29,39 +29,39 @@ internally converted to always use a timetable.
 If a cron expression or ``timedelta`` is sufficient for your use case, you don't need
 to worry about writing a custom timetable because Airflow has default timetables that handle those cases.
 But for more complicated scheduling requirements,
-you may create your own timetable class and pass that to the DAG's ``schedule`` argument.
+you can create your own timetable class and pass that to the DAG's ``schedule`` argument.
 
-Here are some examples of when custom timetable implementations are useful:
+Some examples of when custom timetable implementations are useful:
 
-* Data intervals with "holes" between. (Instead of continuous, as both the cron
-  expression and ``timedelta`` schedules represent.)
-* Run tasks at different times each day. For example, an astronomer may find it
+* Task runs that occur at different times each day. For example, an astronomer might find it
   useful to run a task at dawn to process data collected from the previous
   night-time period.
-* Schedules not following the Gregorian calendar. For example, create a run for
+* Schedules that don't follow the Gregorian calendar. For example, create a run for
   each month in the `Traditional Chinese Calendar`_. This is conceptually
-  similar to the sunset case above, but for a different time scale.
-* Rolling windows, or overlapping data intervals. For example, one may want to
+  similar to the sunrise case, but for a different time scale.
+* Rolling windows, or overlapping data intervals. For example, you might want to
   have a run each day, but make each run cover the period of the previous seven
-  days. It is possible to "hack" this with a cron expression, but a custom data
-  interval would be a more natural representation.
+  days. It is possible to hack this with a cron expression, but a custom data
+  interval provides a more natural representation.
+* Data intervals with "holes" between intervals instead of a continuous interval, as both the cron
+  expression and ``timedelta`` schedules represent continuous intervals. See :ref:`data-interval`.
 
 .. _`Traditional Chinese Calendar`: https://en.wikipedia.org/wiki/Chinese_calendar
 
-As such, Airflow allows for custom timetables to be written in plugins and used by
-DAGs. An example demonstrating a custom timetable can be found in the
+Airflow allows you to write custom timetables in plugins and use them in
+DAGs. You can find an example demonstrating a custom timetable in the
 :doc:`/howto/timetable` how-to guide.
 
 .. note::
 
-    As a general rule, always access Variables, Connections etc or anything that would access
+    As a general rule, always access Variables, Connections, or anything else that needs access to
     the database as late as possible in your code. See :ref:`best_practices/timetables`
     for more best practices to follow.
 
 Built-in Timetables
 -------------------
 
-Airflow comes with several common timetables built in to cover the most common use cases. Additional timetables
+Airflow comes with several common timetables built-in to cover the most common use cases. Additional timetables
 may be available in plugins.
 
 .. _CronTriggerTimetable:
@@ -82,9 +82,8 @@ A timetable that accepts a cron expression, and triggers DAG runs according to i
     def example_dag():
         pass
 
-It is also possible to provide a static data interval to the timetable. The optional ``interval`` argument
-must be a :class:`datetime.timedelta` or ``dateutil.relativedelta.relativedelta``. If given, a triggered DAG
-run's data interval would span the specified duration, and *ends* with the trigger time.
+You can also provide a static data interval to the timetable. The optional ``interval`` argument
+must be a :class:`datetime.timedelta` or ``dateutil.relativedelta.relativedelta``. When you provide this argument, a triggered DAG run's data interval spans the specified duration, and *ends* with the trigger time.
 
 .. code-block:: python
 
@@ -111,11 +110,11 @@ run's data interval would span the specified duration, and *ends* with the trigg
 DeltaDataIntervalTimetable
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Schedules data intervals with a time delta. Can be selected by providing a
+A timetable that schedules data intervals with a time delta. You can select it by providing a
 :class:`datetime.timedelta` or ``dateutil.relativedelta.relativedelta`` to the ``schedule`` parameter of a DAG.
 
-This timetable is more focused on the data interval value and does not necessarily align execution dates with
-arbitrary bounds such as start of day or of hour.
+This timetable focuses on the data interval value and does not necessarily align execution dates with
+arbitrary bounds, such as the start of day or of hour.
 
 .. seealso:: `Differences between the cron and delta data interval timetables`_
 
@@ -136,8 +135,8 @@ trigger points, and triggers a DAG run at the end of each data interval.
 .. seealso:: `Differences between the two cron timetables`_
 .. seealso:: `Differences between the cron and delta data interval timetables`_
 
-This can be selected by providing a string that is a valid cron expression to the ``schedule``
-parameter of a DAG as described in the :doc:`../core-concepts/dags` documentation.
+Select this timetable by providing a valid cron expression as a string to the ``schedule``
+parameter of a DAG, as described in the :doc:`../core-concepts/dags` documentation.
 
 .. code-block:: python
 
@@ -148,13 +147,13 @@ parameter of a DAG as described in the :doc:`../core-concepts/dags` documentatio
 EventsTimetable
 ^^^^^^^^^^^^^^^
 
-Simply pass a list of ``datetime``\s for the DAG to run after. Useful for timing based on sporting
-events, planned communication campaigns, and other schedules that are arbitrary and irregular but predictable.
+Pass a list of ``datetime``\s for the DAG to run after. This can be useful for timing based on sporting
+events, planned communication campaigns, and other schedules that are arbitrary and irregular, but predictable.
 
-The list of events must be finite and of reasonable size as it must be loaded every time the DAG is parsed. Optionally,
-the ``restrict_to_events`` flag can be used to force manual runs of the DAG to use the time of the most recent (or very
-first) event for the data interval, otherwise manual runs will run with a ``data_interval_start`` and
-``data_interval_end`` equal to the time at which the manual run was begun. You can also name the set of events using the
+The list of events must be finite and of reasonable size as it must be loaded every time the DAG is parsed. Optionally, use
+the ``restrict_to_events`` flag to force manual runs of the DAG to use the time of the most recent, or very
+first, event for the data interval. Otherwise, manual runs begin with a ``data_interval_start`` and
+``data_interval_end`` equal to the time at which the manual run started. You can also name the set of events using the
 ``description`` parameter, which will be displayed in the Airflow UI.
 
 .. code-block:: python
@@ -181,9 +180,9 @@ first) event for the data interval, otherwise manual runs will run with a ``data
 
 Dataset event based scheduling with time based scheduling
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Combining conditional dataset expressions with time-based schedules enhances scheduling flexibility:
+Combining conditional dataset expressions with time-based schedules enhances scheduling flexibility.
 
-The ``DatasetOrTimeSchedule`` is a specialized timetable allowing for the scheduling of DAGs based on both time-based schedules and dataset events. It facilitates the creation of scheduled runs (as per traditional timetables) and dataset-triggered runs, which operate independently.
+The ``DatasetOrTimeSchedule`` is a specialized timetable that allows for the scheduling of DAGs based on both time-based schedules and dataset events. It also facilitates the creation of both scheduled runs, as per traditional timetables, and dataset-triggered runs, which operate independently.
 
 This feature is particularly useful in scenarios where a DAG needs to run on dataset updates and also at periodic intervals. It ensures that the workflow remains responsive to data changes and consistently runs regular checks or updates.
 
@@ -210,52 +209,62 @@ Here's an example of a DAG using ``DatasetOrTimeSchedule``:
 Timetables comparisons
 ----------------------
 
-
 .. _Differences between the two cron timetables:
 
 Differences between the two cron timetables
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-There are two timetables `CronTriggerTimetable`_ and `CronDataIntervalTimetable`_ that accepts a cron expression.
-There are some differences between the two:
-- `CronTriggerTimetable`_ does not take care of *Data Interval*, while `CronDataIntervalTimetable`_ does.
-- The time when a DAG run is triggered by `CronTriggerTimetable`_ is more intuitive and more similar to what people
-expect cron to behave than that of `CronDataIntervalTimetable`_ (when ``catchup`` is ``False``).
+Airflow has two timetables, `CronTriggerTimetable`_ and `CronDataIntervalTimetable`_, that accept a cron expression.
+
+However, there are differences between the two:
+- `CronTriggerTimetable`_ does not address *Data Interval*, while `CronDataIntervalTimetable`_ does.
+- The timestamp in the ``run_id`` (the ``logical_date``) is defined differently for `CronTriggerTimetable`_ and `CronDataIntervalTimetable`_, based on how they handle the data interval, as described in :ref:`timetables_run_id_logical_date`.
 
 Whether taking care of *Data Interval*
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-`CronTriggerTimetable`_ *does not* care the idea of *data interval*. It means the value of ``data_interval_start``,
-``data_interval_end`` and legacy ``execution_date`` are the same - the time when a DAG run is triggered.
+`CronTriggerTimetable`_ *does not* include *data interval*. This means that the values of ``data_interval_start`` and
+``data_interval_end`` (and the legacy ``execution_date``) are the same: the time when a DAG run is triggered.
+
+However, `CronDataIntervalTimetable`_ *does* include *data interval*. This means the values of
+``data_interval_start`` and ``data_interval_end`` (and legacy ``execution_date``) are different. They are the start
+and end of the interval, respectively, and the DAG run is triggered once the interval ends.
+
+*Catchup* behavior
+~~~~~~~~~~~~~~~~~~
+
+Whether you're using `CronTriggerTimetable`_ or `CronDataIntervalTimetable`_, there is no difference when ``catchup`` is ``True``.
 
-On the other hand, `CronDataIntervalTimetable`_ *does* care the idea of *data 
interval*. It means the value of
-``data_interval_start`` and ``data_interval_end`` (and legacy 
``execution_date``) are different. They are the start
-and end of the interval respectively.
+You might want to use ``False`` for ``catchup`` for certain scenarios, to prevent running unnecessary DAGs:
+- If you create a new DAG with a start date in the past, and don't want to run DAGs for the past. If ``catchup`` is ``True``, Airflow runs all DAGs that would have run in that time interval.
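+- If you pause an existing DAG, and then restart it at a later date, and don't want to run DAGs for the period it was paused. If ``catchup`` is ``True``, Airflow runs all DAGs that would have run in that time interval.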
+
+In these scenarios, the ``logical_date`` in the ``run_id`` is based on how `CronTriggerTimetable`_ or `CronDataIntervalTimetable`_ handles the data interval.
+
+See :ref:`dag-catchup` for more information about how DAG runs are triggered when using ``catchup``.
+
+.. _timetables_run_id_logical_date:
 
 The time when a DAG run is triggered
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-There is no difference between the two when ``catchup`` is ``True``. :ref:`dag-catchup` tells you how DAG runs are
-triggered when ``catchup`` is ``True``.
+`CronTriggerTimetable`_ and `CronDataIntervalTimetable`_ trigger DAG runs at the same time. However, the timestamp for the ``run_id`` is different for each.
+
+- `CronTriggerTimetable`_ has a ``run_id`` timestamp, the ``logical_date``, showing when the DAG run is able to start.
+- `CronDataIntervalTimetable`_ has a ``run_id`` timestamp, the ``logical_date``, showing the beginning of the data interval that the DAG run covers.
 
-When ``catchup`` is ``False``, there is difference in how a new DAG run is triggered. `CronTriggerTimetable`_ triggers
-a new DAG run *after* the current time, while `CronDataIntervalTimetable`_ does *before* the current time (assuming
-the value of ``start_date`` is past time).
+For example, suppose there is a cron expression ``@daily`` or ``0 0 * * *``, which is scheduled to run at 12AM every day. If you enable DAGs using the two timetables at 3PM on January
+31st,
+- `CronTriggerTimetable`_ triggers a new DAG run at 12AM on February 1st. The ``run_id`` timestamp is midnight, on February 1st.
+- `CronDataIntervalTimetable`_ immediately triggers a new DAG run, because the DAG run that was due at 12AM on January 31st did not occur yet. The ``run_id`` timestamp is midnight, on January 30th, since that is the beginning of the data interval that ended at 12AM on January 31st.
 
-Here is an example showing how the first DAG run is triggered. Supposes there is a cron expression ``@daily`` or
-``0 0 * * *``, which is aimed to run at 12AM every day. If you enable DAGs using the two timetables at 3PM on January
-31st, `CronTriggerTimetable`_ will trigger a new DAG run at 12AM on February 1st. `CronDataIntervalTimetable`_, on the other
-hand, will immediately trigger a new DAG run which is supposed to trigger at 12AM on January 31st if the DAG had been
-enabled beforehand.
+This is another example showing the difference in the case of skipping DAG runs.
 
-This is another example showing the difference in the case of skipping DAG runs. Suppose there are two running DAGs
-using the two timetables with a cron expression ``@daily`` or ``0 0 * * *``. If you pause the DAGs at 3PM on January
-31st and re-enable them at 3PM on February 2nd, `CronTriggerTimetable`_ skips the DAG runs which are supposed to
-trigger on February 1st and 2nd. The next DAG run will be triggered at 12AM on February 3rd. `CronDataIntervalTimetable`_,
-on the other hand, skips the DAG runs which are supposed to trigger on February 1st only. A DAG run for February 2nd
-is immediately triggered after you re-enable the DAG.
+Suppose there are two running DAGs with a cron expression ``@daily`` or ``0 0 * * *`` that use the two different timetables. If you pause the DAGs at 3PM on January 31st and re-enable them at 3PM on February 2nd,
+- `CronTriggerTimetable`_ skips the DAG runs that were supposed to trigger on February 1st and 2nd. The next DAG run will be triggered at 12AM on February 3rd.
+- `CronDataIntervalTimetable`_ skips the DAG runs that were supposed to trigger on February 1st only. A DAG run for February 2nd is immediately triggered after you re-enable the DAG.
 
-By these examples, you see how `CronTriggerTimetable`_ triggers DAG runs is more intuitive and more similar to what
+In these examples, you can see that how `CronTriggerTimetable`_ triggers DAG runs is more intuitive and more similar to what
 people expect cron to behave than how `CronDataIntervalTimetable`_ does.
 
 
@@ -265,8 +274,8 @@ Differences between the cron and delta data interval timetables:
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Choosing between `DeltaDataIntervalTimetable`_ and `CronDataIntervalTimetable`_ depends on your use case.
-If you enable a DAG at 01:05 on February 1st, the following table summarizes the DAG runs created (and the
-data interval that they cover), depending on 3 arguments: ``schedule``, ``start_date`` and ``catchup``.
+If you enable a DAG at 01:05 on February 1st, the following table summarizes the DAG runs created and the
+data interval that they cover, depending on 3 arguments: ``schedule``, ``start_date`` and ``catchup``.
 
 .. list-table::
    :header-rows: 1
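
Illustration (not part of the commit): the pause and re-enable example in the patch above can be sketched with plain datetime arithmetic. This is a simplified model of an ``@daily`` schedule with ``catchup=False``, not Airflow's implementation. The helper names are hypothetical, the year 2024 is arbitrary, and time zones are ignored.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def last_midnight(now: datetime) -> datetime:
    """Most recent ``0 0 * * *`` tick at or before ``now`` (hypothetical helper)."""
    return now.replace(hour=0, minute=0, second=0, microsecond=0)


def trigger_style_next_run(now: datetime) -> datetime:
    """Trigger-style model: wait for the first tick after ``now``."""
    return last_midnight(now) + DAY


def interval_style_next_run(now: datetime) -> tuple[datetime, datetime]:
    """Data-interval-style model: run the latest interval that has already ended.

    Returns (data_interval_start, data_interval_end). The run was due at
    data_interval_end, which is in the past, so it is triggered immediately.
    """
    end = last_midnight(now)
    return end - DAY, end


# DAGs paused at 3PM on January 31st and re-enabled at 3PM on February 2nd.
resumed_at = datetime(2024, 2, 2, 15, 0)

next_trigger = trigger_style_next_run(resumed_at)
interval_start, interval_end = interval_style_next_run(resumed_at)

print(next_trigger)                  # 2024-02-03 00:00:00
print(interval_start, interval_end)  # 2024-02-01 00:00:00 2024-02-02 00:00:00
```

In this model the trigger-style schedule waits until 12AM on February 3rd, while the data-interval-style schedule immediately picks up the run that was due at 12AM on February 2nd, matching the two bullets in the patch.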
