dimberman opened a new issue #7939: DagRuns are marked as failed as soon as one task fails
URL: https://github.com/apache/airflow/issues/7939
 
 
   
   
   **Apache Airflow version**: 1.7.1.2
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**:
   - **OS** (e.g. from /etc/os-release):
   - **Kernel** (e.g. `uname -a`):
   - **Install tools**:
   - **Others**:

   **What happened**:
   
   https://github.com/apache/incubator-airflow/pull/1514 added a `verify_integrity` function that eagerly creates `TaskInstance` objects for every task in a DAG.
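   To make the change in behavior concrete, here is a rough, self-contained sketch (hypothetical names, not the code from PR #1514): after the change, a task-instance record exists for every task in the DAG on every scheduler pass, even for tasks that have never run.

   ```python
   # Rough sketch only -- hypothetical names, not the code from PR #1514.
   # A task-instance record is backfilled for every task in the DAG,
   # including tasks that have never been scheduled.

   def verify_integrity_sketch(dag_task_ids, task_instances):
       """Ensure every task in the DAG has a corresponding task-instance state."""
       for task_id in dag_task_ids:
           task_instances.setdefault(task_id, "none")  # "none" = never run
       return task_instances

   tis = verify_integrity_sketch(
       ["extract", "transform", "load"],
       {"extract": "failed"},            # only one task has actually run
   )
   print(len(tis))  # 3 -- the set of task instances is now always "complete"
   ```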
   
   This does not interact well with the assumptions in the new `update_state` function. The `if len(tis) == len(dag.active_tasks)` guard is no longer effective: with lazily created task instances, that branch only ran once every task in the DAG had run, but now it runs on every pass, so as soon as one task in a dag run fails the whole DagRun is marked failed. This is bad because the scheduler stops processing the DagRun after that.
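   A standalone illustration of the failure mode (illustration only, not the actual Airflow source): because the task-instance list is always complete, the length check passes on every pass, and a single failure fails the run while its siblings are still queued or scheduled.

   ```python
   # Illustration of the described problem -- not the actual Airflow code.
   # With eagerly created task instances the length check always passes,
   # so one failed task fails the whole run even though other tasks are
   # still queued or scheduled.

   def update_state_old(task_states, active_task_count):
       if len(task_states) == active_task_count:   # always true now
           if "failed" in task_states:
               return "failed"
           if all(s == "success" for s in task_states):
               return "success"
       return "running"

   # one failure, two tasks merely queued/scheduled -> whole DagRun marked failed
   print(update_state_old(["failed", "queued", "scheduled"], 3))  # -> failed
   ```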
   
   In retrospect, the old code was also buggy: if a DAG ends with several queued tasks, the DagRun could still be marked as failed prematurely.
   
   I suspect the fix is to update the guard to consider only tasks whose state is `success` or `failed`; otherwise we are evaluating (and failing) the DAG run based on tasks that are still `up_for_retry`, `queued`, or `scheduled`.
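   A hedged sketch of that suggested guard (illustration only, not an actual patch): judge the run only once every active task has reached a terminal state.

   ```python
   # Hedged sketch of the suggested fix -- illustration only, not an actual
   # patch. Count only tasks in a terminal state before judging the run, so
   # queued/scheduled/up_for_retry tasks can never trigger an early failure.
   FINISHED_STATES = {"success", "failed"}

   def update_state_fixed(task_states, active_task_count):
       finished = [s for s in task_states if s in FINISHED_STATES]
       if len(finished) < active_task_count:
           return "running"          # unfinished tasks remain; don't judge yet
       return "failed" if "failed" in finished else "success"

   # one failure plus queued/scheduled tasks no longer fails the run early
   print(update_state_fixed(["failed", "queued", "scheduled"], 3))  # -> running
   print(update_state_fixed(["failed", "success", "success"], 3))   # -> failed
   ```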
   
   **What you expected to happen**:
   
   
   **How to reproduce it**:
   
   
   **Anything else we need to know**:
   
   Moved here from https://issues.apache.org/jira/browse/AIRFLOW-441
       
