[ https://issues.apache.org/jira/browse/AIRFLOW-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jarek Potiuk closed AIRFLOW-3285.
---------------------------------
Resolution: Won't Fix

I am closing some old issues that are no longer relevant. Please let me know if you want to reopen it.

> lazy marking of upstream_failed task state
> ------------------------------------------
>
>                 Key: AIRFLOW-3285
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3285
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Kevin McHale
>            Priority: Minor
>
> Airflow aggressively applies the {{upstream_failed}} task state: as soon as a
> task fails, all of its downstream dependencies are marked. This sometimes
> creates problems for us at Etsy.
> In particular, our Hadoop Airflow DAGs follow a pattern along these lines:
> # the DAG creates a Hadoop cluster in GCP/Dataproc
> # the DAG executes its tasks on the cluster
> # the DAG deletes the cluster once all tasks are done
> In some cases, the tasks immediately upstream of the cluster-delete step get
> marked as {{upstream_failed}}, triggering the cluster-delete step even while
> other tasks continue to execute without problems on the cluster. The
> cluster-delete step of course kills all of the running tasks, requiring all
> of them to be re-run once the problem with the failed task is mitigated.
> As an example, a DAG that looks like this can exhibit the problem:
> {code:python}
> Cluster = ClusterCreateOperator(...)
> A = Job1Operator(...)
> Cluster >> A
> B = Job2Operator(...)
> Cluster >> B
> C = Job3Operator(...)
> A >> C
> B >> C
> ClusterDelete = DeleteClusterOperator(trigger_rule="all_done", ...)
> C >> ClusterDelete
> {code}
> In a DAG like this, suppose task A fails while task B is running. Task C
> will immediately be marked {{upstream_failed}}, which will cause
> ClusterDelete to run while task B is still running, causing task B to fail
> as well.
> Our solution to this problem has been to implement something like
> [this diff|https://github.com/mchalek/incubator-airflow/commit/585349018656cd9b2e3e3e113db6412345485dde],
> which applies the {{upstream_failed}} state lazily: a task is marked only
> once all of its upstream tasks have completed.
> In terms of the example above, task C will not be marked
> {{upstream_failed}} in response to task A failing until task B completes,
> ensuring that the cluster is not deleted while any upstream tasks are still
> running.
> We have found this to have no adverse effects on our Airflow instances, so
> we run all of them with the lazy-marking feature enabled. However, we
> recognize that a behavior change like this is something existing users may
> want to opt in to, so the diff includes a config flag that defaults to the
> original behavior.
> We would appreciate your consideration of incorporating this diff, or
> something like it, so that this behavior can be configured in unmodified,
> upstream Airflow.
> Thanks!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
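Editor's note: the eager-versus-lazy marking distinction the ticket describes can be sketched in plain Python. This is a hypothetical illustration, not the linked diff or Airflow's actual scheduler code; the `State` enum and the `eager_mark`/`lazy_mark` helpers are names invented for this sketch. The point is the extra condition in the lazy variant: a downstream task is only marked {{upstream_failed}} once every upstream task has finished.

```python
from enum import Enum


class State(str, Enum):
    """Simplified task states (a hypothetical subset of Airflow's states)."""
    SUCCESS = "success"
    FAILED = "failed"
    RUNNING = "running"
    UPSTREAM_FAILED = "upstream_failed"


# States in which a task has finished and will not change further.
FINISHED = {State.SUCCESS, State.FAILED, State.UPSTREAM_FAILED}
# States that propagate failure downstream.
FAILED_LIKE = {State.FAILED, State.UPSTREAM_FAILED}


def eager_mark(upstream_states):
    """Default behavior: mark upstream_failed as soon as any upstream fails."""
    if any(s in FAILED_LIKE for s in upstream_states):
        return State.UPSTREAM_FAILED
    return None  # no state change yet


def lazy_mark(upstream_states):
    """Lazy variant: mark upstream_failed only after ALL upstreams finish."""
    if all(s in FINISHED for s in upstream_states) and any(
        s in FAILED_LIKE for s in upstream_states
    ):
        return State.UPSTREAM_FAILED
    return None  # no state change yet
```

In the ticket's scenario, task C's upstreams are `[State.FAILED, State.RUNNING]` (A failed, B still running): `eager_mark` returns `State.UPSTREAM_FAILED` immediately, which trips the `all_done` trigger rule on ClusterDelete, while `lazy_mark` returns `None` until B reaches a finished state.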