[ https://issues.apache.org/jira/browse/AIRFLOW-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942326#comment-16942326 ]
Michael A Perez commented on AIRFLOW-5191:
------------------------------------------

Hello Oliver,

I recently had to address this issue with a DAG that attempts a CloudSQLImport and falls back to a bulk upsert if the import task fails.

Basically, when a task instance fails ([https://github.com/apache/airflow/blob/master/airflow/models/taskinstance.py#L1047]) it announces the failure and sets itself to failed. I've found that updating the task's state after the fact, like in a BranchPythonOperator, doesn't do the trick; however, overriding {{on_failure_callback}} does!

The function I pass as my task's {{on_failure_callback}} parameter looks like:
{code:python}
from airflow.utils import state


def unfail(context):
    """Needed to prevent the task failure from propagating."""
    # Replace the task instance's FAILED state with SKIPPED.
    context['ti'].set_state(state.State.SKIPPED)
{code}
TL;DR: you gotta change the failing task's state using its {{on_failure_callback}} parameter.

> SubDag is marked failed
> ------------------------
>
>                 Key: AIRFLOW-5191
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5191
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, DagRun
>    Affects Versions: 1.10.4
>         Environment: CentOS 7, Maria-DB, python 3.6.7, Airflow 1.10.4
>            Reporter: Oliver Ricken
>            Priority: Blocker
>
> Dear all,
> after having upgraded from Airflow version 1.10.2 to 1.10.4, we experience
> strange and very problematic behaviour of SubDags (which are crucial for our
> environment and used frequently).
> Tasks inside the SubDag failing and awaiting retry ("up-for-retry") mark the
> SubDag "failed" (while in 1.10.2, the SubDag was still in "running"-state).
> This is particularly problematic for downstream tasks depending on the state
> of the SubDag. Since we have downstream tasks triggered on "all_done", the
> downstream task is triggered by the "failed" SubDag although a
> SubDag-internal task is awaiting retry and might (in our case: most likely)
> yield successfully processed data. This data is thus not available to the
> prematurely triggered task downstream of the SubDag.
> This is a severe problem for us and worth rolling back to 1.10.2 if there is
> no quick solution or work-around to this issue!
> We urgently need help on this matter.
> Thanks a lot in advance, any suggestions and input are highly appreciated!
> Cheers
> Oliver



--
This message was sent by Atlassian Jira
(v8.3.4#803005)