[jira] [Commented] (AIRFLOW-1419) Trigger Rule not respected downstream of BranchPythonOperator

2018-11-15 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688037#comment-16688037
 ] 

Iuliia Volkova commented on AIRFLOW-1419:
-

[~conradlee], without a dummy task you don't have a branch; you just have the 
task confluence_op, which depends on branch_op. That means confluence_op is a 
branch by itself: it does not depend on some branch, it is the branch. Airflow 
has no edges of which you can say "this edge is a branch without a task". 

In your case, in your picture, confluence_op is a branch of the branch 
operator that is never returned by the branch operator, and it also depends on 
the result of another branch. 
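
The point above can be sketched with a toy model in plain Python. This is not 
Airflow's scheduler code. It only assumes that a branch task marks every 
direct child it did not return as skipped, and that a task with ALL_DONE runs 
once all of its parents have finished. The function run_dag and the task name 
skip_op are hypothetical and exist only for this illustration.

```python
# Toy model of branch skipping; NOT Airflow's scheduler code.
ALL_SUCCESS = "all_success"
ALL_DONE = "all_done"


def run_dag(upstream, trigger_rule, branch_task, chosen):
    """Return the final state of every task in a tiny DAG.

    upstream: dict of task_id -> list of parent task_ids, given in
              topological order (parents before children).
    trigger_rule: dict of task_id -> rule; missing tasks use ALL_SUCCESS.
    branch_task: task_id of the branching task.
    chosen: task_id that the branching task returns.
    """
    state = {}
    for task, parents in upstream.items():
        if task in state:
            # Already skipped directly by the branch task.
            continue
        parent_states = [state[p] for p in parents]
        rule = trigger_rule.get(task, ALL_SUCCESS)
        if rule == ALL_DONE or all(s == "success" for s in parent_states):
            state[task] = "success"
        else:
            state[task] = "skipped"
        if task == branch_task and state[task] == "success":
            # Assumption: every direct child that was not chosen is skipped,
            # regardless of that child's trigger rule.
            for child, child_parents in upstream.items():
                if task in child_parents and child != chosen:
                    state[child] = "skipped"
    return state


rules = {"confluence_op": ALL_DONE}

# DAG from the issue: confluence_op is a direct child of branch_op.
original = run_dag(
    {"branch_op": [], "work_op": ["branch_op"],
     "confluence_op": ["branch_op", "work_op"]},
    rules, branch_task="branch_op", chosen="work_op")

# Same DAG with a hypothetical dummy task (skip_op) on the empty branch.
with_dummy = run_dag(
    {"branch_op": [], "work_op": ["branch_op"], "skip_op": ["branch_op"],
     "confluence_op": ["skip_op", "work_op"]},
    rules, branch_task="branch_op", chosen="work_op")

print(original["confluence_op"])    # skipped
print(with_dummy["confluence_op"])  # success
```

In this model the dummy task absorbs the skip, so confluence_op is no longer 
a direct child of the branch task and its ALL_DONE rule gets a chance to 
apply.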

> Trigger Rule not respected downstream of BranchPythonOperator
> -
>
> Key: AIRFLOW-1419
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1419
> Project: Apache Airflow
> Issue Type: Bug
> Affects Versions: 1.8.2
> Reporter: Conrad Lee
> Priority: Major
>
> Let's consider the following DAG:
> {noformat}
>            ___________
>           /           \
> branch_op              confluence_op
>           \__work_op__/
> {noformat}
> This is implemented in the following code:
> {code:python}
> import airflow
> from airflow.operators.python_operator import BranchPythonOperator
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.utils.trigger_rule import TriggerRule
> from airflow.models import DAG
>
> args = {
>     'owner': 'airflow',
>     'start_date': airflow.utils.dates.days_ago(2)
> }
>
> dag = DAG(
>     dag_id='branch_skip_problem',
>     default_args=args,
>     schedule_interval="@daily")
>
> # branch_op always selects work_op
> branch_op = BranchPythonOperator(
>     task_id='branch_op',
>     python_callable=lambda: 'work_op',
>     dag=dag)
> work_op = DummyOperator(task_id='work_op', dag=dag)
> confluence_op = DummyOperator(
>     task_id='confluence_op',
>     dag=dag,
>     trigger_rule=TriggerRule.ALL_DONE)
>
> # confluence_op is downstream of both branch_op and work_op
> branch_op.set_downstream(confluence_op)
> branch_op.set_downstream(work_op)
> work_op.set_downstream(confluence_op)
> {code}
> Note that branch_op is a BranchPythonOperator, work_op and confluence_op are 
> DummyOperators, and that confluence_op has its trigger_rule set to ALL_DONE.
> In DAG runs where branch_op chooses to activate work_op as its child, 
> confluence_op never runs. This doesn't seem right, because confluence_op has 
> two parents and a trigger_rule set so that it runs as soon as all of its 
> parents are done (whether or not they are skipped).
> I know this example seems contrived and that in practice there are better 
> ways of conditionally executing work_op. However, this is the minimal code to 
> illustrate the problem. You can imagine that this problem might actually 
> crop up in practice where originally there was a good reason to use the 
> BranchPythonOperator, and then time passes and someone modifies one of the 
> branches so that it doesn't really contain any children anymore, thus 
> resembling the example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1419) Trigger Rule not respected downstream of BranchPythonOperator

2018-11-15 Thread Conrad Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688014#comment-16688014
 ] 

Conrad Lee commented on AIRFLOW-1419:
-

[~xnuinside] thanks for having a look. Also, thanks for finding the bug in the 
example code; I've fixed that.

I'm not sure this should be closed, though. As I recall, before 1.8.2 no dummy 
operator was required at all, because task skips propagated differently. When 
1.8.2 came along, a dummy suddenly became necessary; the question is whether 
this is desired.

I much preferred the previous behavior. Why should a dummy operator be 
necessary at all? If one of the child tasks has a trigger rule that stops the 
propagation of task skipping (such as ALL_DONE), then IMHO it should never be 
skipped.
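
The behavior proposed above could look roughly like the following plain-Python 
sketch. It is purely hypothetical: children_to_skip is not an Airflow 
function, and the set of rules treated as stopping skip propagation is an 
assumption (only ALL_DONE is named in this thread).

```python
# Hypothetical sketch of the proposal; NOT an Airflow API.
ALL_SUCCESS = "all_success"
ALL_DONE = "all_done"

# Assumption: trigger rules that stop the propagation of skips.
# Only ALL_DONE is mentioned in the thread.
SKIP_STOPPING_RULES = {ALL_DONE}


def children_to_skip(direct_children, chosen, trigger_rule):
    """List the direct children a branch task would skip under the proposal.

    A child is skipped only if it was not chosen AND its trigger rule does
    not stop skip propagation. Children without an entry in trigger_rule
    are assumed to use ALL_SUCCESS.
    """
    return [
        child for child in direct_children
        if child != chosen
        and trigger_rule.get(child, ALL_SUCCESS) not in SKIP_STOPPING_RULES
    ]


children = ["confluence_op", "work_op"]

# confluence_op uses ALL_DONE, so under the proposal it is left alone.
proposed = children_to_skip(children, "work_op", {"confluence_op": ALL_DONE})

# With the default rule it would still be skipped.
default_rule = children_to_skip(children, "work_op", {})

print(proposed)      # []
print(default_rule)  # ['confluence_op']
```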



