[jira] [Resolved] (AIRFLOW-7059) Pass hive_conf to get_pandas_df in HiveServer2Hook

2020-04-21 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-7059.

Fix Version/s: 1.10.11
   Resolution: Fixed

> Pass hive_conf to get_pandas_df in HiveServer2Hook
> --
>
> Key: AIRFLOW-7059
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7059
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.9
>Reporter: Ping Zhang
>Priority: Minor
> Fix For: 1.10.11
>
>
> code: 
> [https://github.com/apache/airflow/blob/97a429f9d0cf740c5698060ad55f11e93cb57b55/airflow/providers/apache/hive/hooks/hive.py#L973]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4421) task related stats should be optional

2020-01-03 Thread Chao-Han Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007559#comment-17007559
 ] 

Chao-Han Tsai commented on AIRFLOW-4421:


I see then I think should fix the issue.

> task related stats should be optional
> -
>
> Key: AIRFLOW-4421
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4421
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently Airflow emits task stats by default. The number of tasks can be huge 
> and add a heavy load to the stats backend. This ticket therefore aims to turn 
> these stats off by default and allow them to be enabled via a flag.
> https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/docs/metrics.rst
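
The proposal in the description can be sketched roughly as follows. This is only an illustration, not Airflow code: the class `StatsSketch`, the flag `emit_task_stats`, and the prefixes in `TASK_STAT_PREFIXES` are hypothetical names chosen for the example.

```python
from typing import List, Tuple

# Hypothetical metric-name prefixes treated as "task related" in this sketch.
TASK_STAT_PREFIXES: Tuple[str, ...] = ("ti_", "operator_")


class StatsSketch:
    """Toy stats client that drops task-level metrics unless enabled."""

    def __init__(self, emit_task_stats: bool = False) -> None:
        # Hypothetical flag: task stats are off unless explicitly turned on.
        self.emit_task_stats = emit_task_stats
        self.sent: List[str] = []  # stands in for the real stats backend

    def incr(self, name: str) -> None:
        # Skip task-related metrics when the flag is off.
        if name.startswith(TASK_STAT_PREFIXES) and not self.emit_task_stats:
            return
        self.sent.append(name)
```

With the flag left at its default, only non-task metrics reach the backend; passing `emit_task_stats=True` restores the current behaviour.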





[jira] [Closed] (AIRFLOW-4421) task related stats should be optional

2020-01-03 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai closed AIRFLOW-4421.
--
Resolution: Won't Fix

> task related stats should be optional
> -
>
> Key: AIRFLOW-4421
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4421
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently Airflow emits task stats by default. The number of tasks can be huge 
> and add a heavy load to the stats backend. This ticket therefore aims to turn 
> these stats off by default and allow them to be enabled via a flag.
> https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/docs/metrics.rst





[jira] [Commented] (AIRFLOW-4421) task related stats should be optional

2020-01-03 Thread Chao-Han Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007510#comment-17007510
 ] 

Chao-Han Tsai commented on AIRFLOW-4421:


[~kamil.bregula] just want to clarify: `statsd_allow_list` either enables all 
the stats or none, right?

> task related stats should be optional
> -
>
> Key: AIRFLOW-4421
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4421
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently Airflow emits task stats by default. The number of tasks can be huge 
> and add a heavy load to the stats backend. This ticket therefore aims to turn 
> these stats off by default and allow them to be enabled via a flag.
> https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/docs/metrics.rst





[jira] [Resolved] (AIRFLOW-6073) Move Qubole Operator Link class to qubole_operator.py

2019-11-26 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-6073.

Resolution: Fixed

> Move Qubole Operator Link class to qubole_operator.py
> -
>
> Key: AIRFLOW-6073
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6073
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: contrib
>Affects Versions: 2.0.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 2.0.0
>
>
> The OperatorLink should be in the file where Operator is defined for 
> consistency





[jira] [Resolved] (AIRFLOW-5683) Add propagate_skipped_state to SubDagOperator

2019-11-05 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-5683.

Fix Version/s: 2.0.0
   Resolution: Fixed

> Add propagate_skipped_state to SubDagOperator
> -
>
> Key: AIRFLOW-5683
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5683
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Affects Versions: 1.10.6
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Major
> Fix For: 2.0.0
>
>
> Currently there is no way of telling the parent dag of a sub dag that an 
> essential task has been skipped. This PR addresses this issue by adding a new 
> propagate_skipped_state option to the SubDagOperator.
> Background story: 
> https://lists.apache.org/thread.html/0eefd459a502c5100d792416f8ba720302aa49a8906fe6ea4ec8fca4@%3Cdev.airflow.apache.org%3E
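
The decision described above can be sketched as follows. This is a minimal illustration under assumptions, not the operator's actual code: the enum `SkipPropagation`, its two values, and the helper `parent_should_skip` are hypothetical names, and the states are plain strings.

```python
from enum import Enum
from typing import Iterable, Optional


class SkipPropagation(Enum):
    # Hypothetical option values for this sketch.
    ALL_LEAVES = "all_leaves"
    ANY_LEAF = "any_leaf"


def parent_should_skip(
    leaf_states: Iterable[str], option: Optional[SkipPropagation]
) -> bool:
    """Decide whether the parent task should be marked skipped."""
    flags = [state == "skipped" for state in leaf_states]
    # No option set (the pre-existing behaviour) or no leaves: never skip.
    if option is None or not flags:
        return False
    if option is SkipPropagation.ALL_LEAVES:
        return all(flags)
    return any(flags)
```

Leaving the option unset keeps the old behaviour, so existing sub DAGs would not change unless the new argument is passed.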





[jira] [Updated] (AIRFLOW-5442) Druid broker hook get pandas DataFrame implementation

2019-10-29 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-5442:
---
Fix Version/s: (was: 1.10.7)
   1.10.6

> Druid broker hook get pandas DataFrame implementation
> -
>
> Key: AIRFLOW-5442
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5442
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.5
>Reporter: Sayed Mohammad Hossein Torabi
>Assignee: Sayed Mohammad Hossein Torabi
>Priority: Minor
> Fix For: 1.10.6
>
>
> The  *get_pandas_df* of *DruidDbApiHook* returns NotImplementedError.





[jira] [Resolved] (AIRFLOW-5442) Druid broker hook get pandas DataFrame implementation

2019-10-29 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-5442.

Resolution: Fixed

> Druid broker hook get pandas DataFrame implementation
> -
>
> Key: AIRFLOW-5442
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5442
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.5
>Reporter: Sayed Mohammad Hossein Torabi
>Assignee: Sayed Mohammad Hossein Torabi
>Priority: Minor
> Fix For: 1.10.6
>
>
> The  *get_pandas_df* of *DruidDbApiHook* returns NotImplementedError.





[jira] [Updated] (AIRFLOW-5442) Druid broker hook get pandas DataFrame implementation

2019-10-29 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-5442:
---
Fix Version/s: 1.10.7

> Druid broker hook get pandas DataFrame implementation
> -
>
> Key: AIRFLOW-5442
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5442
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.5
>Reporter: Sayed Mohammad Hossein Torabi
>Assignee: Sayed Mohammad Hossein Torabi
>Priority: Minor
> Fix For: 1.10.7
>
>
> The  *get_pandas_df* of *DruidDbApiHook* returns NotImplementedError.





[jira] [Updated] (AIRFLOW-5710) Optionally error on unused operator arguments

2019-10-22 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-5710:
---
Fix Version/s: 1.10.6

> Optionally error on unused operator arguments
> -
>
> Key: AIRFLOW-5710
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5710
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.10.5
>Reporter: Joshua Carp
>Assignee: Joshua Carp
>Priority: Trivial
> Fix For: 1.10.6
>
>
> Airflow 2.0 will error when operators are instantiated with unused keyword 
> arguments, but for now unused arguments just raise a warning. My team has 
> passed unused arguments to operators and not noticed the warning a few times, 
> and it would be useful to be able to opt in to airflow 2.0 validation early. 
> I propose adding a configuration flag that causes tasks to error on unused 
> arguments.
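
A rough sketch of the proposed opt-in, for illustration only. The class `BaseOperatorSketch`, the `strict` argument (standing in for the proposed configuration flag), and `UnusedArgumentError` are hypothetical names, not Airflow's API.

```python
import warnings


class UnusedArgumentError(Exception):
    """Raised in strict mode when an operator receives unknown arguments."""


class BaseOperatorSketch:
    def __init__(self, task_id: str, strict: bool = False, **kwargs) -> None:
        # Anything left in kwargs was not consumed by the operator.
        if kwargs:
            message = (
                f"Invalid arguments were passed to {type(self).__name__} "
                f"(task_id: {task_id}): {sorted(kwargs)}"
            )
            if strict:
                # Opt-in: fail early instead of only warning.
                raise UnusedArgumentError(message)
            warnings.warn(message, category=PendingDeprecationWarning, stacklevel=2)
        self.task_id = task_id
```

In a real deployment the `strict` value would presumably come from configuration rather than from each operator call.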





[jira] [Resolved] (AIRFLOW-5710) Optionally error on unused operator arguments

2019-10-22 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-5710.

Resolution: Fixed

> Optionally error on unused operator arguments
> -
>
> Key: AIRFLOW-5710
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5710
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.10.5
>Reporter: Joshua Carp
>Assignee: Joshua Carp
>Priority: Trivial
>
> Airflow 2.0 will error when operators are instantiated with unused keyword 
> arguments, but for now unused arguments just raise a warning. My team has 
> passed unused arguments to operators and not noticed the warning a few times, 
> and it would be useful to be able to opt in to airflow 2.0 validation early. 
> I propose adding a configuration flag that causes tasks to error on unused 
> arguments.





[jira] [Resolved] (AIRFLOW-5714) Collect SLA miss emails only from tasks missed SLA

2019-10-22 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-5714.

Fix Version/s: 1.10.7
   Resolution: Fixed

> Collect SLA miss emails only from tasks missed SLA
> --
>
> Key: AIRFLOW-5714
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5714
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.7
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 1.10.7
>
>
> Currently, when a task in the DAG misses its SLA, Airflow traverses all the 
> tasks in the DAG and collects all the task-level emails. Airflow then sends 
> an SLA miss email to all of those addresses, which adds unnecessary noise for 
> owners of tasks that did not contribute to the SLA miss.
> This change makes the code collect emails only from the tasks that missed the 
> SLA.





[jira] [Created] (AIRFLOW-5714) Collect SLA miss emails only from tasks missed SLA

2019-10-21 Thread Chao-Han Tsai (Jira)
Chao-Han Tsai created AIRFLOW-5714:
--

 Summary: Collect SLA miss emails only from tasks missed SLA
 Key: AIRFLOW-5714
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5714
 Project: Apache Airflow
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 1.10.7
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Currently, when a task in the DAG misses its SLA, Airflow traverses all the 
tasks in the DAG and collects all the task-level emails. Airflow then sends an 
SLA miss email to all of those addresses, which adds unnecessary noise for 
owners of tasks that did not contribute to the SLA miss.

This change makes the code collect emails only from the tasks that missed the 
SLA.
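
The intended filtering can be sketched as below. This is an illustration under assumptions, not the scheduler's code: `Task` is a hypothetical stand-in with only the fields needed here, and `collect_sla_miss_emails` is a made-up helper name.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Task:
    # Hypothetical stand-in for an Airflow task; only the fields used here.
    task_id: str
    email: List[str] = field(default_factory=list)


def collect_sla_miss_emails(tasks: Iterable[Task], missed_task_ids: Set[str]) -> List[str]:
    """Return the de-duplicated, sorted emails of tasks that missed their SLA."""
    emails = set()
    for task in tasks:
        # Only tasks that actually missed the SLA contribute recipients.
        if task.task_id in missed_task_ids:
            emails.update(task.email)
    return sorted(emails)
```

Owners of tasks outside `missed_task_ids` are never added, which is the noise reduction the ticket asks for.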





[jira] [Created] (AIRFLOW-5666) Create a BaseTransferToS3Operator

2019-10-15 Thread Chao-Han Tsai (Jira)
Chao-Han Tsai created AIRFLOW-5666:
--

 Summary: Create a BaseTransferToS3Operator
 Key: AIRFLOW-5666
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5666
 Project: Apache Airflow
  Issue Type: Improvement
  Components: aws
Affects Versions: 1.10.5
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Create a BaseTransferToS3Operator so that operators such as 
DynamodbToS3Operator can inherit and share common code logic to upload files to 
S3.
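
One possible shape for such a base class, as a sketch only. `BaseTransferToS3Sketch`, its `records` hook, and the injected `upload` callable are hypothetical; a real operator would presumably extend Airflow's operator base class and upload through an S3 hook instead.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from typing import Callable, Iterable


class BaseTransferToS3Sketch(ABC):
    """Shared 'write records to a temp file, then upload' logic."""

    def __init__(self, s3_key: str, upload: Callable[[str, str], None]) -> None:
        self.s3_key = s3_key
        # upload(local_path, s3_key) is injected so the sketch needs no AWS client.
        self._upload = upload

    @abstractmethod
    def records(self) -> Iterable[dict]:
        """Subclasses yield the source rows (DynamoDB items, query results, ...)."""

    def execute(self) -> None:
        with tempfile.NamedTemporaryFile("w", suffix=".json") as handle:
            for record in self.records():
                handle.write(json.dumps(record) + "\n")
            handle.flush()  # make sure everything is on disk before uploading
            self._upload(handle.name, self.s3_key)
```

A subclass then only has to implement `records`; the file handling and upload stay in one place.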





[jira] [Updated] (AIRFLOW-5338) Add a RedshiftToS3Operator

2019-08-28 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-5338:
---
Description: Create an Airflow operator that queries Redshift and persists 
the results to S3. We should be able to leverage the existing code in 
https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py
 to handle the flush-to-S3 logic. We should abstract that logic into a base class 
and let RedshiftToS3Operator and DynamodbToS3Operator inherit from that base class.  
(was: Create an Airflow operator that queries Postgres and persists the results 
to S3. We should be able to leverage the existing code in 
https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py
 to handle the flush-to-S3 logic. We should abstract that logic into a base class 
and let PostgresToS3Operator and DynamodbToS3Operator inherit from that base class)

> Add a RedshiftToS3Operator
> --
>
> Key: AIRFLOW-5338
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5338
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: aws
>Affects Versions: 1.10.5
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Create an Airflow operator that queries Redshift and persists the results to 
> S3. We should be able to leverage the existing code in 
> https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py
>  to handle the flush-to-S3 logic. We should abstract that logic into a base 
> class and let RedshiftToS3Operator and DynamodbToS3Operator inherit from that 
> base class.





[jira] [Created] (AIRFLOW-5337) Add a PostgresToS3Operator

2019-08-28 Thread Chao-Han Tsai (Jira)
Chao-Han Tsai created AIRFLOW-5337:
--

 Summary: Add a PostgresToS3Operator
 Key: AIRFLOW-5337
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5337
 Project: Apache Airflow
  Issue Type: New Feature
  Components: aws
Affects Versions: 1.10.5
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Create an Airflow operator that queries Postgres and persists the results to 
S3. We should be able to leverage the existing code in 
https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py
 to handle the flush-to-S3 logic. We should abstract that logic into a base class 
and let PostgresToS3Operator and DynamodbToS3Operator inherit from that base class.





[jira] [Created] (AIRFLOW-5338) Add a RedshiftToS3Operator

2019-08-28 Thread Chao-Han Tsai (Jira)
Chao-Han Tsai created AIRFLOW-5338:
--

 Summary: Add a RedshiftToS3Operator
 Key: AIRFLOW-5338
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5338
 Project: Apache Airflow
  Issue Type: New Feature
  Components: aws
Affects Versions: 1.10.5
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Create an Airflow operator that queries Postgres and persists the results to 
S3. We should be able to leverage the existing code in 
https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py
 to handle the flush-to-S3 logic. We should abstract that logic into a base class 
and let PostgresToS3Operator and DynamodbToS3Operator inherit from that base class.





[jira] [Closed] (AIRFLOW-4940) DynamoDB to S3 backup operator

2019-08-28 Thread Chao-Han Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai closed AIRFLOW-4940.
--
Resolution: Fixed

> DynamoDB to S3 backup operator
> --
>
> Key: AIRFLOW-4940
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4940
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: aws
>Affects Versions: 1.10.4
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Add an Airflow operator that backs up a DynamoDB table to S3.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (AIRFLOW-4138) [AIP] Introduce DAG manifest

2019-08-05 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-4138.

Resolution: Fixed

> [AIP] Introduce DAG manifest
> 
>
> Key: AIRFLOW-4138
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4138
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently Airflow traverses all the files under $AIRFLOW_HOME/dags and finds 
> the DAGs by inspecting the code. We should instead specify explicitly which 
> files contain DAGs.





[jira] [Updated] (AIRFLOW-5094) Make airflow conn prefix configurable

2019-08-05 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-5094:
---
Description: Currently Airflow picks up connection strings from 
environment variables whose names start with `AIRFLOW_CONN_`. 
This ticket proposes making the prefix configurable.  (was: Make airflow conn 
prefix configurable)

> Make airflow conn prefix configurable
> -
>
> Key: AIRFLOW-5094
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5094
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.10.5
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently Airflow picks up connection strings from environment variables 
> whose names start with `AIRFLOW_CONN_`. This ticket proposes making the 
> prefix configurable.
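
A minimal sketch of what a configurable prefix could look like. The function `connections_from_env` and its parameters are hypothetical, and lower-casing the connection id is an assumption made for the example; only the `AIRFLOW_CONN_` default comes from the ticket.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_CONN_PREFIX = "AIRFLOW_CONN_"


def connections_from_env(
    prefix: str = DEFAULT_CONN_PREFIX,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Map connection ids to URIs for every variable that starts with prefix."""
    env = os.environ if environ is None else environ
    return {
        # Strip the prefix; lower-casing the id is an assumption of this sketch.
        name[len(prefix):].lower(): uri
        for name, uri in env.items()
        if name.startswith(prefix) and len(name) > len(prefix)
    }
```

For example, calling `connections_from_env("MYORG_CONN_")` would pick up `MYORG_CONN_MY_DB` instead of `AIRFLOW_CONN_MY_DB`.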



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (AIRFLOW-5095) airflow.cfg merger

2019-08-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai closed AIRFLOW-5095.
--
Resolution: Fixed

> airflow.cfg merger
> --
>
> Key: AIRFLOW-5095
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5095
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.10.5
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> A CLI that can read an old airflow.cfg and merge it with the latest version 
> of airflow.cfg.





[jira] [Commented] (AIRFLOW-5095) airflow.cfg merger

2019-08-02 Thread Chao-Han Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899366#comment-16899366
 ] 

Chao-Han Tsai commented on AIRFLOW-5095:


Say a later version of Airflow introduces a new flag in airflow.cfg that 
defaults to a certain value. When users upgrade Airflow, this tool would add 
the flag to their existing airflow.cfg.

[~ash] what is your experience with upgrading a user's airflow.cfg to a later version?

> airflow.cfg merger
> --
>
> Key: AIRFLOW-5095
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5095
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.10.5
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> A CLI that can read an old airflow.cfg and merge it with the latest version 
> of airflow.cfg.





[jira] [Commented] (AIRFLOW-1482) Error when try to backfill the example_trigger_controller_dag

2019-08-01 Thread Chao-Han Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898572#comment-16898572
 ] 

Chao-Han Tsai commented on AIRFLOW-1482:


At first glance this is more likely to be a problem in TriggerDagRunOperator 
than in backfill.

> Error when try to backfill the example_trigger_controller_dag
> -
>
> Key: AIRFLOW-1482
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1482
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: backfill
>Affects Versions: 1.8.1, 1.8.2
> Environment: Ubuntu: 16.04
> Python: 2.7
> CeleryExecutor
> Broker: Redis
>Reporter: Timothee N
>Priority: Blocker
> Attachments: airflow_1.png, airflow_2.png, airflow_3.png
>
>
> Hello,
> Running a backfill command for the 
> {noformat}example_trigger_controller_dag{noformat} example DAG results in the 
> failed task {noformat}test_trigger_dagrun{noformat}.
> It seems to me that the problem comes from the TriggerDagRunOperator in the 
> example_trigger_controller_dag?
> Backfill command: {noformat}airflow backfill -s 2017-07-10 -e 2017-07-13 
> --pool backfill example_trigger_controller_dag{noformat}
> Tested in 1.8.1 and 1.8.2rc1
> Here is the output log from the backfill command :
> {noformat}
> [2017-08-02 13:53:00,844] {__init__.py:57} INFO - Using executor 
> CeleryExecutor
> [2017-08-02 13:53:00,888] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/Grammar.txt
> [2017-08-02 13:53:00,902] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
> /var/lib/airflow/local/lib/python2.7/site-packages/airflow/www/app.py:23: 
> FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been renamed to 
> "CSRFProtect" and will be removed in 1.0.
>   csrf = CsrfProtect()
> [2017-08-02 13:53:01,033] {models.py:168} INFO - Filling up the DagBag from 
> /var/lib/airflow/dags
> [2017-08-02 13:53:01,332] {models.py:1128} INFO - Dependencies all met for 
>  00:00:00 [scheduled]>
> [2017-08-02 13:53:01,337] {base_executor.py:50} INFO - Adding to queue: 
> airflow run example_trigger_controller_dag test_trigger_dagrun 
> 2017-07-10T00:00:00 --pickle 1 --local --pool backfill
> [2017-08-02 13:53:06,267] {celery_executor.py:81} INFO - [celery] queuing 
> (u'example_trigger_controller_dag', u'test_trigger_dagrun', 
> datetime.datetime(2017, 7, 10, 0, 0)) through celery, queue=default
> [2017-08-02 13:53:06,330] {models.py:4164} INFO - Updating state for  example_trigger_controller_dag @ 2017-07-10 00:00:00: 
> backfill_2017-07-10T00:00:00, externally triggered: False> considering 1 
> task(s)
> [2017-08-02 13:53:06,334] {jobs.py:2020} INFO - [backfill progress] | 
> finished run 0 of 1 | tasks waiting: 0 | succeeded: 0 | kicked_off: 1 | 
> failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0
> [2017-08-02 13:53:11,273] {jobs.py:1743} ERROR - Executor reports task 
> instance  2017-07-10 00:00:00 [queued]> finished (failed) although the task says its 
> queued. Was the task killed externally?
> [2017-08-02 13:53:11,273] {models.py:1433} ERROR - Executor reports task 
> instance  2017-07-10 00:00:00 [queued]> finished (failed) although the task says its 
> queued. Was the task killed externally?
> None
> [2017-08-02 13:53:11,273] {models.py:1457} INFO - Marking task as FAILED.
> [2017-08-02 13:53:11,279] {models.py:1478} ERROR - Executor reports task 
> instance  2017-07-10 00:00:00 [queued]> finished (failed) although the task says its 
> queued. Was the task killed externally?
> [2017-08-02 13:53:11,281] {jobs.py:1694} ERROR - Task instance  example_trigger_controller_dag.test_trigger_dagrun 2017-07-10 00:00:00 
> [failed]> failed
> [2017-08-02 13:53:11,283] {models.py:4164} INFO - Updating state for  example_trigger_controller_dag @ 2017-07-10 00:00:00: 
> backfill_2017-07-10T00:00:00, externally triggered: False> considering 1 
> task(s)
> [2017-08-02 13:53:11,285] {models.py:4204} INFO - Marking run  example_trigger_controller_dag @ 2017-07-10 00:00:00: 
> backfill_2017-07-10T00:00:00, externally triggered: False> failed
> [2017-08-02 13:53:11,298] {jobs.py:2020} INFO - [backfill progress] | 
> finished run 1 of 1 | tasks waiting: 0 | succeeded: 0 | kicked_off: 0 | 
> failed: 1 | skipped: 0 | deadlocked: 0 | not ready: 0
> Traceback (most recent call last):
>   File "/var/lib/airflow/bin/airflow", line 28, in 
> args.func(args)
>   File 
> "/var/lib/airflow/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 
> 167, in backfill
> pool=args.pool)
>   File 
> "/var/lib/airflow/local/lib/python2.7/site-packages/airflow/models.py", line 
> 3373, in run
> job.run()
>   File "/var/lib/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", 
> line 201, in run
> 

[jira] [Created] (AIRFLOW-5095) airflow.cfg merger

2019-08-01 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-5095:
--

 Summary: airflow.cfg merger
 Key: AIRFLOW-5095
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5095
 Project: Apache Airflow
  Issue Type: Improvement
  Components: cli
Affects Versions: 1.10.5
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


A CLI that can read an old airflow.cfg and merge it with the latest version of 
airflow.cfg.
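
The merge could work roughly as sketched below, using Python's standard `configparser`. The function name `merge_cfg` and the rule "user values win, new defaults fill the gaps" are assumptions for illustration; the ticket does not specify the merge policy.

```python
import configparser


def merge_cfg(old_text: str, new_default_text: str) -> configparser.ConfigParser:
    """Start from the new defaults, then overlay the user's existing values."""
    old = configparser.ConfigParser(interpolation=None)
    old.read_string(old_text)

    merged = configparser.ConfigParser(interpolation=None)
    merged.read_string(new_default_text)

    for section in old.sections():
        if not merged.has_section(section):
            merged.add_section(section)
        for key, value in old.items(section):
            # Assumed policy: values the user already set take precedence.
            merged.set(section, key, value)
    return merged
```

Options that exist only in the newer file keep their default values, so a newly introduced flag ends up in the merged result without touching the user's settings.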





[jira] [Created] (AIRFLOW-5094) Make airflow conn prefix configurable

2019-08-01 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-5094:
--

 Summary: Make airflow conn prefix configurable
 Key: AIRFLOW-5094
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5094
 Project: Apache Airflow
  Issue Type: Improvement
  Components: core
Affects Versions: 1.10.5
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Make airflow conn prefix configurable





[jira] [Updated] (AIRFLOW-4941) default_args not applied when dag is assigned to task through setter

2019-07-12 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4941:
---
Description: When the DAG is set on the task via the 
[setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501],
 `default_args` won't be applied to the task. This change adds a warning message 
to let users know about that.  (was: When the DAG is set to the task via 
[setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501],
 `default_args` won't be applied to the task)

> default_args not applied when dag is assigned to task through setter
> 
>
> Key: AIRFLOW-4941
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4941
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.4
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> When the DAG is set on the task via the 
> [setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501],
>  `default_args` won't be applied to the task. This change adds a warning 
> message to let users know about that.
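
The behaviour and the proposed warning can be illustrated with a toy model. `DagSketch` and `TaskSketch` are hypothetical stand-ins, not Airflow classes; the sketch only assumes that defaults are read at construction time and not when the DAG is assigned later.

```python
import warnings


class DagSketch:
    def __init__(self, dag_id, default_args=None):
        self.dag_id = dag_id
        self.default_args = default_args or {}


class TaskSketch:
    def __init__(self, task_id, dag=None, **kwargs):
        # default_args are only consulted here, at construction time.
        defaults = dag.default_args if dag is not None else {}
        params = {**defaults, **kwargs}
        self.task_id = task_id
        self.retries = params.get("retries", 0)
        self._dag = dag

    @property
    def dag(self):
        return self._dag

    @dag.setter
    def dag(self, dag):
        # Assigning the DAG afterwards does not re-apply default_args,
        # so warn instead of silently ignoring them.
        if dag.default_args:
            warnings.warn(
                f"default_args of DAG {dag.dag_id!r} are not applied to "
                f"task {self.task_id!r} when the DAG is assigned via the setter"
            )
        self._dag = dag
```

Passing `dag=` to the constructor picks up `retries` from `default_args`, while assigning `task.dag = dag` later leaves the task's own value in place and only emits the warning.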





[jira] [Updated] (AIRFLOW-4941) default_args not applied when dag is assigned to task through setter

2019-07-12 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4941:
---
Summary: default_args not applied when dag is assigned to task through 
setter  (was: Apply default_args when dag is assigned to task through setter)

> default_args not applied when dag is assigned to task through setter
> 
>
> Key: AIRFLOW-4941
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4941
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.4
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> When the DAG is set to the task via 
> [setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501],
>  `default_args` won't be applied to the task





[jira] [Created] (AIRFLOW-4941) Apply default_args when dag is assigned to task through setter

2019-07-12 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4941:
--

 Summary: Apply default_args when dag is assigned to task through 
setter
 Key: AIRFLOW-4941
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4941
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Affects Versions: 1.10.4
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


When the DAG is set to the task via 
[setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501],
 `default_args` won't be applied to the task





[jira] [Created] (AIRFLOW-4940) DynamoDB to S3 backup operator

2019-07-12 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4940:
--

 Summary: DynamoDB to S3 backup operator
 Key: AIRFLOW-4940
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4940
 Project: Apache Airflow
  Issue Type: New Feature
  Components: aws
Affects Versions: 1.10.4
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Add an Airflow operator that backs up a DynamoDB table to S3.





[jira] [Created] (AIRFLOW-4915) Support backfill through Airflow UI

2019-07-08 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4915:
--

 Summary: Support backfill through Airflow UI
 Key: AIRFLOW-4915
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4915
 Project: Apache Airflow
  Issue Type: New Feature
  Components: backfill, ui
Affects Versions: 2.0.0
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Support a way to backfill DAGs through Airflow UI.





[jira] [Created] (AIRFLOW-4914) Submit backfill request through REST endpoint

2019-07-08 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4914:
--

 Summary: Submit backfill request through REST endpoint
 Key: AIRFLOW-4914
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4914
 Project: Apache Airflow
  Issue Type: New Feature
  Components: api, backfill
Affects Versions: 2.0.0
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Support submitting backfill request through REST endpoint.





[jira] [Created] (AIRFLOW-4913) Backfill through scheduler

2019-07-08 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4913:
--

 Summary: Backfill through scheduler
 Key: AIRFLOW-4913
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4913
 Project: Apache Airflow
  Issue Type: New Feature
  Components: backfill
Affects Versions: 2.0.0
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Currently Airflow backfill has its own scheduling logic, separate from the core 
scheduler. Since the backfill process might not be long-running, we often find 
out only later that it failed and exited midway. The core scheduler, however, is 
guaranteed (?) to be long-running, so it would be great to have a way to 
backfill through the core scheduler.

Later we can let users submit backfill requests through a REST endpoint and 
then through the Airflow UI.





[jira] [Created] (AIRFLOW-4591) Tag tasks with default pool

2019-05-29 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4591:
--

 Summary: Tag tasks with default pool
 Key: AIRFLOW-4591
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4591
 Project: Apache Airflow
  Issue Type: New Feature
  Components: core
Affects Versions: 2.0.0
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Currently, tasks without a pool specified are limited by 
`non_pooled_task_slot_count`. That setting limits the number of tasks launched 
per scheduler loop but does not limit the number of tasks running in parallel.

This ticket proposes that we assign tasks without a pool specified to a default 
pool, which limits the number of tasks running in parallel.





[jira] [Resolved] (AIRFLOW-4535) Breaks job.py into multiple files

2019-05-23 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-4535.

Resolution: Fixed

> Breaks job.py into multiple files
> -
>
> Key: AIRFLOW-4535
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4535
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>






[jira] [Created] (AIRFLOW-4535) Breaks job.py into multiple files

2019-05-18 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4535:
--

 Summary: Breaks job.py into multiple files
 Key: AIRFLOW-4535
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4535
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai








[jira] [Created] (AIRFLOW-4509) SubDagOperator using scheduler instead of backfill

2019-05-13 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4509:
--

 Summary: SubDagOperator using scheduler instead of backfill
 Key: AIRFLOW-4509
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4509
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Make SubDagOperator use Airflow scheduler instead of backfill to schedule tasks.





[jira] [Assigned] (AIRFLOW-2372) SubDAGs should share dag_concurrency of parent DAG

2019-05-05 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-2372:
--

Assignee: Chao-Han Tsai

> SubDAGs should share dag_concurrency of parent DAG
> --
>
> Key: AIRFLOW-2372
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2372
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: subdag
>Affects Versions: 1.9.0
> Environment: 1.9.0
> a local scheduler and LocalExecutor, and parallelism = 32, dag_concurrency = 
> 16
>Reporter: Xiao Zhu
>Assignee: Chao-Han Tsai
>Priority: Major
>
> It seems like right now subDAGs are scheduled just like normal DAGs, so if a 
> DAG has a lot of (parallel) subDAGs with each having a lot of operators, 
> triggering that DAG means those subDAGs will get triggered as normal DAGs, 
> and they can easily take all the resources (limited by parallelism) of the 
> scheduler, and other DAGs have to wait for those subDAGs.
> For example, if I have this DAG, with a local scheduler and LocalExecutor, 
> and parallelism = 32, dag_concurrency = 16
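> {code:python}
> import logging
> from datetime import datetime
>
> from airflow import DAG
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.operators.python_operator import PythonOperator
> from airflow.operators.subdag_operator import SubDagOperator
>
> log = logging.getLogger(__name__)
>
> NUM_SUBDAGS = 20
> NUM_OPS_PER_SUBDAG = 10
> # Airflow requires a start_date on tasks; this value is an arbitrary
> # placeholder that was not part of the original report.
> DEFAULT_ARGS = {'start_date': datetime(2018, 1, 1)}
>
> def logging_func(id):
>     log.info("Now running id: {}".format(id))
>
> def build_dag(dag_id, num_ops):
>     # One subDAG: a start task fanning out to num_ops parallel tasks.
>     dag = DAG(dag_id, default_args=DEFAULT_ARGS)
>     start_op = DummyOperator(task_id='start', dag=dag)
>     for i in range(num_ops):
>         op = PythonOperator(
>             task_id=str(i),
>             python_callable=logging_func,
>             op_args=[i],
>             dag=dag
>         )
>         start_op >> op
>     return dag
>
> parent_id = 'consistent_failure'
> with DAG(
>     parent_id,
>     default_args=DEFAULT_ARGS
> ) as dag:
>     start_op = DummyOperator(task_id='start')
>     # Fan out to NUM_SUBDAGS subDAGs that all become runnable at once.
>     for i in range(NUM_SUBDAGS):
>         task_id = "subdag_{}".format(i)
>         op = SubDagOperator(
>             task_id=task_id,
>             subdag=build_dag("{}.{}".format(parent_id, task_id), NUM_OPS_PER_SUBDAG)
>         )
>         start_op >> op
> {code}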
> When I trigger this DAG, Airflow tries to run a lot of the subDAGs at the 
> same time, and since they don't share the dag_concurrency with their parent 
> DAG, each of them tries to run all their operators in parallel at the same 
> time too, which results in 500+ python processes created by Airflow.
> Ideally those subDAGs should share dag_concurrency with their parent DAG (and 
> thus with each other too), so when I trigger this DAG, at any time only up to 
> 16 operators, including the ones in the subDAGs, are running.





[jira] [Assigned] (AIRFLOW-3653) Pausing a Dag does not pause its SubDags

2019-05-05 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-3653:
--

Assignee: Chao-Han Tsai

> Pausing a Dag does not pause its SubDags
> 
>
> Key: AIRFLOW-3653
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3653
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Håvard Wall
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Pausing a DAG has no effect on running SubDags; they are allowed to complete 
> fully.
> Expected behavior: A running subdag will complete its currently running task, 
> but not start a new one.





[jira] [Assigned] (AIRFLOW-1077) Subdags can deadlock

2019-05-05 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-1077:
--

Assignee: Chao-Han Tsai

> Subdags can deadlock
> 
>
> Key: AIRFLOW-1077
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1077
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Given a concurrency of n, if all n running tasks are Subdags, the subdags 
> block any of their tasks from executing, leading to deadlock
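
The condition described above can be illustrated with a toy model. This is not 
Airflow code: the function names are hypothetical, and it assumes that each 
running SubDagOperator holds one slot while it waits and that its child tasks 
need a free slot from the same limit.

```python
def free_slots_for_children(concurrency, running_subdag_operators):
    """Slots left for child tasks when each running SubDagOperator holds one."""
    return max(concurrency - running_subdag_operators, 0)


def is_deadlocked(concurrency, running_subdag_operators):
    """True when every slot is held by a SubDagOperator waiting on its children."""
    return free_slots_for_children(concurrency, running_subdag_operators) == 0


n = 4
# All n slots are held by SubDagOperators: no child task can ever start.
all_slots_taken = is_deadlocked(concurrency=n, running_subdag_operators=n)
# One slot remains, so child tasks can still make progress.
one_slot_free = is_deadlocked(concurrency=n, running_subdag_operators=n - 1)
```

Under these assumptions, the deadlock appears only when the number of running 
SubDagOperators reaches the concurrency limit.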





[jira] [Created] (AIRFLOW-4420) Backfill respects task_concurrency

2019-04-26 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4420:
--

 Summary: Backfill respects task_concurrency
 Key: AIRFLOW-4420
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4420
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Airflow backfill should respect 
[task_concurrency|https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/airflow/models/baseoperator.py#L195-L197].





[jira] [Created] (AIRFLOW-4422) Stats about pool utilization

2019-04-26 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4422:
--

 Summary: Stats about pool utilization
 Key: AIRFLOW-4422
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4422
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Currently we have stats on the number of starving tasks in each pool. We should 
also add stats on each pool's used_slots and open_slots.





[jira] [Created] (AIRFLOW-4421) task related stats should be optional

2019-04-26 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4421:
--

 Summary: task related stats should be optional
 Key: AIRFLOW-4421
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4421
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Currently Airflow emits task stats by default. The number of tasks can be huge 
and put a heavy load on the stats backend. This ticket therefore aims to turn 
these stats off by default and allow them to be turned on via a flag.

https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/docs/metrics.rst





[jira] [Closed] (AIRFLOW-4254) Create a Presto operator

2019-04-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai closed AIRFLOW-4254.
--
Resolution: Won't Fix

> Create a Presto operator
> 
>
> Key: AIRFLOW-4254
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4254
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Create a presto operator





[jira] [Assigned] (AIRFLOW-1022) Subdag can't receive templated fields

2019-04-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-1022:
--

Assignee: (was: Chao-Han Tsai)

> Subdag can't receive templated fields
> -
>
> Key: AIRFLOW-1022
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1022
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: subdag
>Reporter: Marcos Takahashi
>Priority: Major
>  Labels: easyfix
>
> Subdags can't receive any templated fields because the operator's 
> template_fields is set to tuple() 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/operators/subdag_operator.py#L24)
>  instead of naming templated fields as PythonOperator does 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/operators/python_operator.py#L52).
> That makes it impossible to get important values like execution_date.





[jira] [Closed] (AIRFLOW-2552) Improve description for airflow backfill -c

2019-04-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai closed AIRFLOW-2552.
--
Resolution: Won't Fix

> Improve description for airflow backfill -c
> ---
>
> Key: AIRFLOW-2552
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2552
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Improve the description for {code}airflow backfill -c{code}
> Example: JSON string that gets pickled into the DagRun / backfill's conf 
> attribute





[jira] [Closed] (AIRFLOW-4190) Add instrumentation of schedule delay

2019-04-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai closed AIRFLOW-4190.
--
Resolution: Won't Fix

> Add instrumentation of schedule delay
> -
>
> Key: AIRFLOW-4190
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4190
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Measures the delay between the scheduled DAG start time (e.g. 
> next_execution_date) and the wall clock time when first task executes.





[jira] [Assigned] (AIRFLOW-2528) Airflow cli does not allow disabled stdin

2019-04-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-2528:
--

Assignee: (was: Chao-Han Tsai)

> Airflow cli does not allow disabled stdin
> -
>
> Key: AIRFLOW-2528
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2528
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.9.0
>Reporter: Brent Johnson
>Priority: Major
>
> So basically, I am trying to automate regression testing by executing an 
> airflow dag. I 
>  
> Using the cli I can successfully run the following command:
> {code:java}
> ./airflow run regression-testing regression-ingestion 2018-05-25{code}
> Unfortunately, I want to trigger this against our staging instance in 
> production on AWS.  
> I figured an easy way to do this would be to use [AWS Systems 
> Manager|https://docs.aws.amazon.com/systems-manager/latest/userguide/run-command.html];
>  unfortunately, any airflow command I call returns:
> {code:java}
> the input device is not a TTY
> {code}
> I was able to recreate this running the following command locally by piping 
> stdin to anywhere:
> {code:java}
> ./airflow run regression-testing regression-ingestion 2018-05-25 
> 0>/dev/null{code}
> This is of course an extreme example but it feels like a bug for a cli to 
> require stdin to be open.
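
As a general illustration of the point above, a CLI can check whether stdin is 
an interactive terminal before relying on it. This is a minimal sketch using 
only the Python standard library. The helper name `can_prompt` is hypothetical, 
and the sketch does not reproduce the reported error, whose origin the ticket 
does not identify.

```python
import io
import sys


def can_prompt(stream=None):
    """Return True only when the stream is an interactive terminal."""
    stream = sys.stdin if stream is None else stream
    # A closed or missing stdin must not crash the check.
    if stream is None or stream.closed:
        return False
    return stream.isatty()


# A non-interactive stream, standing in for stdin under a remote runner.
piped = io.StringIO("")
interactive = can_prompt(piped)
```

A command written this way can skip prompts and continue when `can_prompt` 
returns False instead of failing outright.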





[jira] [Closed] (AIRFLOW-4189) Airflow table retention

2019-04-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai closed AIRFLOW-4189.
--
Resolution: Won't Fix

> Airflow table retention
> ---
>
> Key: AIRFLOW-4189
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4189
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Airflow scheduler cleans up records out of retention window in the Airflow 
> table.





[jira] [Commented] (AIRFLOW-4402) Update super() calls for nvd3

2019-04-23 Thread Chao-Han Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824823#comment-16824823
 ] 

Chao-Han Tsai commented on AIRFLOW-4402:


Hmm, I have seen changes on those files.

> Update super() calls for nvd3
> -
>
> Key: AIRFLOW-4402
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4402
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 2.0.0
>
>
> In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} 
> by {{super().__init__(...)}}
> Similarly for any other {{super}} calls for other methods.
> (In Python 3 {{super(__class__, self) == super()}})





[jira] [Updated] (AIRFLOW-4402) Update super() calls for nvd3

2019-04-23 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4402:
---
Description: 
In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} by 
{{super().__init__(...)}}

Similarly for any other {{super}} calls for other methods.

(In Python 3 {{super(__class__, self) == super()}})

  was:
In all classes, replace {{super(__class__, self).__init__(...)}} by 
{{super().__init__(...)}}

Similarly for any other {{super}} calls for other methods.

(In Python 3 {{super(__class__, self) == super()}})


> Update super() calls for nvd3
> -
>
> Key: AIRFLOW-4402
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4402
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 2.0.0
>
>
> In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} 
> by {{super().__init__(...)}}
> Similarly for any other {{super}} calls for other methods.
> (In Python 3 {{super(__class__, self) == super()}})





[jira] [Created] (AIRFLOW-4402) Update super() calls for nvd3

2019-04-23 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4402:
--

 Summary: Update super() calls for nvd3
 Key: AIRFLOW-4402
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4402
 Project: Apache Airflow
  Issue Type: Sub-task
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai
 Fix For: 2.0.0


In all classes, replace {{super(__class__, self).__init__(...)}} by 
{{super().__init__(...)}}

Similarly for any other {{super}} calls for other methods.

(In Python 3 {{super(__class__, self) == super()}})
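
A minimal sketch of the replacement described above. The class names (`Chart`, 
`OldStyleLineChart`, `NewStyleLineChart`) are hypothetical and are not actual 
nvd3 classes.

```python
class Chart:
    def __init__(self, name):
        self.name = name


class OldStyleLineChart(Chart):
    def __init__(self, name):
        # Python 2 compatible form: class and instance are passed explicitly.
        super(OldStyleLineChart, self).__init__(name)
        self.kind = "line"


class NewStyleLineChart(Chart):
    def __init__(self, name):
        # Python 3 form: the zero-argument call resolves the same class
        # and instance automatically.
        super().__init__(name)
        self.kind = "line"


old = OldStyleLineChart("latency")
new = NewStyleLineChart("latency")
```

Both forms initialize the instance identically, so the change is a 
readability cleanup rather than a behavior change.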





[jira] [Created] (AIRFLOW-4398) Extra links include try_number

2019-04-23 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4398:
--

 Summary: Extra links include try_number
 Key: AIRFLOW-4398
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4398
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Currently airflow supports constructing extra links using task and 
execution_date.
https://github.com/apache/airflow/blob/master/airflow/models/baseoperator.py#L977

We should also include the try_number into the scope.





[jira] [Assigned] (AIRFLOW-3407) BaseOperator and LoggingMixin do not call super().__init__

2019-04-22 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-3407:
--

Assignee: Chao-Han Tsai

> BaseOperator and LoggingMixin do not call super().__init__
> --
>
> Key: AIRFLOW-3407
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3407
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.1
>Reporter: adam hitchcock
>Assignee: Chao-Han Tsai
>Priority: Major
>
> The {{BaseOperator}} is not necessarily the last class in the MRO; usually it 
> is best practice to always call {{super().__init__(*args, **kwargs)}}
>  to make sure that every class gets its chance to {{__init__}}.
> Is there a specific reason {{BaseOperator}} doesn't call super?
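
The MRO concern above can be sketched in plain Python. All class names here are 
hypothetical stand-ins, not Airflow's actual `BaseOperator` or `LoggingMixin`. 
The sketch only shows what happens when a class earlier in the MRO does not 
call `super().__init__`.

```python
class LoggingMixinSketch:
    def __init__(self, *args, **kwargs):
        self.logger_ready = True
        super().__init__(*args, **kwargs)


class NonCooperativeBase:
    def __init__(self, task_id):
        # No super().__init__() call: classes later in the MRO are skipped.
        self.task_id = task_id


class CooperativeBase:
    def __init__(self, task_id, **kwargs):
        self.task_id = task_id
        # Passing control on lets every remaining class in the MRO initialize.
        super().__init__(**kwargs)


class Broken(NonCooperativeBase, LoggingMixinSketch):
    pass


class Working(CooperativeBase, LoggingMixinSketch):
    pass


broken = Broken(task_id="t1")
working = Working(task_id="t1")
```

Here `working` has the `logger_ready` attribute set by the mixin, while 
`broken` does not, because the mixin's `__init__` never ran.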





[jira] [Updated] (AIRFLOW-4362) Fix test_execution_limited_parallelism logic

2019-04-20 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4362:
---
Description: 
- We should use `assertEqual` to compare two items instead of using 
`assertTrue`.
- Ensure that all queues are empty after the executor ends. So when 
executor.end() is called, it ensures that the queues are empty with: 
https://github.com/apache/airflow/blob/master/airflow/executors/local_executor.py#L195-L196
 
https://github.com/apache/airflow/blob/master/airflow/executors/local_executor.py#L205
 So I am adding assertion in the test to ensure that.

  was:
- We should use `assertEqual` to compare two items instead of using 
`assertTrue`.
- Ensure that all queues are empty after the executor ends.


> Fix test_execution_limited_parallelism logic
> 
>
> Key: AIRFLOW-4362
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4362
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> - We should use `assertEqual` to compare two items instead of using 
> `assertTrue`.
> - Ensure that all queues are empty after the executor ends. So when 
> executor.end() is called, it ensures that the queues are empty with: 
> https://github.com/apache/airflow/blob/master/airflow/executors/local_executor.py#L195-L196
>  
> https://github.com/apache/airflow/blob/master/airflow/executors/local_executor.py#L205
>  So I am adding assertion in the test to ensure that.
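
The first point can be illustrated with the standard `unittest` module. The 
ticket does not show the original test code, so the values below are 
hypothetical. The sketch assumes the pitfall is passing two values to 
`assertTrue`, whose second argument is the failure message, not a value to 
compare against.

```python
import unittest


class AssertionStyleTest(unittest.TestCase):
    def test_assert_true_hides_mismatch(self):
        # The second argument of assertTrue is the failure message, so this
        # passes even though 3 != 5: only the truthiness of 3 is checked.
        self.assertTrue(3, 5)

    def test_assert_equal_detects_mismatch(self):
        # assertEqual compares both values and fails on a mismatch.
        with self.assertRaises(AssertionError):
            self.assertEqual(3, 5)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertionStyleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests pass, which shows that `assertTrue` accepted mismatched values while 
`assertEqual` rejected them.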





[jira] [Updated] (AIRFLOW-4362) Fix test_execution_limited_parallelism logic

2019-04-20 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4362:
---
Description: 
- We should use `assertEqual` to compare two items instead of using 
`assertTrue`.
- Ensure that all queues are empty after the executor ends.

> Fix test_execution_limited_parallelism logic
> 
>
> Key: AIRFLOW-4362
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4362
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> - We should use `assertEqual` to compare two items instead of using 
> `assertTrue`.
> - Ensure that all queues are empty after the executor ends.





[jira] [Assigned] (AIRFLOW-4362) Fix test_execution_limited_parallelism logic

2019-04-20 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-4362:
--

Assignee: Chao-Han Tsai

> Fix test_execution_limited_parallelism logic
> 
>
> Key: AIRFLOW-4362
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4362
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>






[jira] [Created] (AIRFLOW-4361) Fix flaky test_integration_run_dag_with_scheduler_failure

2019-04-19 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4361:
--

 Summary: Fix flaky test_integration_run_dag_with_scheduler_failure
 Key: AIRFLOW-4361
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4361
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


test_integration_run_dag_with_scheduler_failure often fails with
{code}
ConnectionError: HTTPConnectionPool(host='10.20.3.19', port=30809): Max retries 
exceeded with url: 
/api/experimental/dags/example_kubernetes_executor_config/paused/false (Caused 
by NewConnectionError(': Failed to establish a new connection: [Errno 111] 
Connection refused',))
{code}





[jira] [Created] (AIRFLOW-4348) Add a GCP console link in BigQueryOperator

2019-04-17 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4348:
--

 Summary: Add a GCP console link in BigQueryOperator
 Key: AIRFLOW-4348
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4348
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Add a GCP console link in BigQueryOperator





[jira] [Resolved] (AIRFLOW-4191) Add stats about pool utilization

2019-04-14 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-4191.

Resolution: Fixed

> Add stats about pool utilization
> 
>
> Key: AIRFLOW-4191
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4191
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Add stats about utilization of each pool (open_slots, used_slots, 
> queued_slots).





[jira] [Updated] (AIRFLOW-4307) Backfill respect concurrency limit

2019-04-14 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4307:
---
Description: Currently backfill respects the `pool` limit and 
`max_active_runs`. It is probably a good idea to make it respect the concurrency 
limit too, so that we won't launch a big backfill that occupies all the resources.

> Backfill respect concurrency limit
> --
>
> Key: AIRFLOW-4307
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4307
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently backfill respects the `pool` limit and `max_active_runs`. It is 
> probably a good idea to make it respect the concurrency limit too, so that we 
> won't launch a big backfill that occupies all the resources.





[jira] [Created] (AIRFLOW-4307) Backfill respect concurrency limit

2019-04-13 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4307:
--

 Summary: Backfill respect concurrency limit
 Key: AIRFLOW-4307
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4307
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai








[jira] [Created] (AIRFLOW-4306) Global operator extra links

2019-04-13 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4306:
--

 Summary: Global operator extra links
 Key: AIRFLOW-4306
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4306
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


A way to register global operator extra links that are shared by all the 
operators.





[jira] [Created] (AIRFLOW-4303) Send email report in the end of backfill

2019-04-12 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4303:
--

 Summary: Send email report in the end of backfill
 Key: AIRFLOW-4303
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4303
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Send an email report at the end of an Airflow backfill when the user runs 
backfill with the `--email` flag.





[jira] [Created] (AIRFLOW-4254) Create a Presto operator

2019-04-07 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4254:
--

 Summary: Create a Presto operator
 Key: AIRFLOW-4254
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4254
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Create a presto operator





[jira] [Created] (AIRFLOW-4251) Instrument DagRun schedule delay

2019-04-05 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4251:
--

 Summary: Instrument DagRun schedule delay
 Key: AIRFLOW-4251
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4251
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Instrument DagRun schedule delay - time between expected DagRun start date and 
the actual DagRun start date.
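
The metric described above is a simple time difference. The following is a 
minimal sketch with the standard library only. The function name 
`schedule_delay` and the timestamps are hypothetical, and the ticket does not 
specify how Airflow would record the value.

```python
from datetime import datetime, timedelta


def schedule_delay(expected_start, actual_start):
    """Return how long after its expected start a DagRun actually started."""
    return actual_start - expected_start


# Hypothetical timestamps: the run was expected at midnight and started
# 42 seconds later.
expected = datetime(2019, 4, 5, 0, 0, 0)
actual = datetime(2019, 4, 5, 0, 0, 42)
delay = schedule_delay(expected, actual)
```

The result is a `timedelta`, so `delay.total_seconds()` gives a number that a 
stats backend could consume.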





[jira] [Assigned] (AIRFLOW-4217) Remove six package in project

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-4217:
--

Assignee: Chao-Han Tsai

> Remove six package in project
> -
>
> Key: AIRFLOW-4217
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4217
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: zhongjiajie
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 2.0.0
>
>
> Remove all uses of the six package in Airflow



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-4215) Remove mock from the setup.py and move to internal unittest.mock

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-4215:
--

Assignee: Chao-Han Tsai

> Remove mock from the setup.py and move to internal unittest.mock
> 
>
> Key: AIRFLOW-4215
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4215
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Fokko Driesprong
>Assignee: Chao-Han Tsai
>Priority: Major
>






[jira] [Assigned] (AIRFLOW-4208) Replace @abstractproperty by @abstractmethod

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-4208:
--

Assignee: Chao-Han Tsai

> Replace @abstractproperty by @abstractmethod
> 
>
> Key: AIRFLOW-4208
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4208
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Fokko Driesprong
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 2.0.0
>
>
> Replace @abstractproperty by @abstractmethod (see 
> https://docs.python.org/3/library/abc.html#abc.abstractproperty)
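
A minimal sketch of the replacement, following the pattern in the linked `abc` 
documentation: stack `@property` on top of `@abstractmethod`. The class and 
attribute names are hypothetical and are not taken from the Airflow code base.

```python
from abc import ABC, abstractmethod


class BaseSensorSketch(ABC):
    # Replaces the deprecated @abstractproperty decorator.
    @property
    @abstractmethod
    def poke_interval(self):
        """Seconds between checks; subclasses must override."""


class FiveSecondSensor(BaseSensorSketch):
    @property
    def poke_interval(self):
        return 5


sensor = FiveSecondSensor()
```

Instantiating `BaseSensorSketch` directly still raises `TypeError`, so the 
abstract contract is preserved.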





[jira] [Assigned] (AIRFLOW-4204) Update super() calls

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai reassigned AIRFLOW-4204:
--

Assignee: Chao-Han Tsai

> Update super() calls 
> -
>
> Key: AIRFLOW-4204
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4204
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Fokko Driesprong
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 2.0.0
>
>
> In all classes, replace {{super(__class__, self).__init__(...)}} by 
> {{super().__init__(...)}}
> Similarly for any other {{super}} calls for other methods.
> (In Python 3 {{super(__class__, self) == super()}})





[jira] [Updated] (AIRFLOW-4190) Add instrumentation of schedule delay

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4190:
---
Summary: Add instrumentation of schedule delay  (was: Add a schedule delay 
monitoring DAG)

> Add instrumentation of schedule delay
> -
>
> Key: AIRFLOW-4190
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4190
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> The DAG measures the delay between the scheduled DAG start time (e.g. 
> next_execution_date) and the wall clock time when first task executes.





[jira] [Updated] (AIRFLOW-4189) Airflow table retention

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4189:
---
Description: Airflow scheduler cleans up records out of retention window in 
the Airflow table.  (was: Create an Airflow DAG that cleans up the records that 
are out of retention period in the metastore. 

- We probably need to first modify each table to record the `last_modfiy_date` 
to support retention period.
- User will specify the retention_period in the airflow.cfg)

> Airflow table retention
> ---
>
> Key: AIRFLOW-4189
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4189
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Airflow scheduler cleans up records out of retention window in the Airflow 
> table.





[jira] [Updated] (AIRFLOW-4189) Airflow table retention

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4189:
---
Summary: Airflow table retention  (was: Add an airflowdb retention DAG)

> Airflow table retention
> ---
>
> Key: AIRFLOW-4189
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4189
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Create an Airflow DAG that cleans up the records that are out of retention 
> period in the metastore. 
> - We probably need to first modify each table to record the 
> `last_modfiy_date` to support retention period.
> - User will specify the retention_period in the airflow.cfg
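The retention idea in this description could be sketched roughly as follows. This is only an illustration under assumptions: the record layout, the `last_modified` key and the `select_expired` helper are hypothetical, and the retention period is passed in directly rather than read from airflow.cfg.

```python
from datetime import datetime, timedelta, timezone


def select_expired(records, retention_period, now):
    """Return the records last modified before the retention cutoff.

    `records` is assumed to be a list of dicts carrying a hypothetical
    `last_modified` datetime; a real table would need such a column first.
    """
    cutoff = now - retention_period
    return [r for r in records if r["last_modified"] < cutoff]


now = datetime(2019, 4, 2, tzinfo=timezone.utc)
records = [
    {"id": 1, "last_modified": now - timedelta(days=40)},
    {"id": 2, "last_modified": now - timedelta(days=5)},
]
expired = select_expired(records, timedelta(days=30), now)
```

Only the record older than the 30-day window is selected for cleanup here.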





[jira] [Updated] (AIRFLOW-4191) Add stats about pool utilization

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4191:
---
Summary: Add stats about pool utilization  (was: Add a pool utilization 
instrument DAG)

> Add stats about pool utilization
> 
>
> Key: AIRFLOW-4191
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4191
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> This DAG measures the utilization of each pool (open_slots, used_slots, 
> queued_slots).





[jira] [Updated] (AIRFLOW-4191) Add stats about pool utilization

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4191:
---
Description: Add stats about utilization of each pool (open_slots, 
used_slots, queued_slots).  (was: This DAG measures the utilization of each 
pool (open_slots, used_slots, queued_slots).)

> Add stats about pool utilization
> 
>
> Key: AIRFLOW-4191
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4191
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Add stats about utilization of each pool (open_slots, used_slots, 
> queued_slots).
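A rough sketch of how the three numbers named above might be derived for one pool. The function name and the plain-string task states are hypothetical, and the sketch assumes that queued tasks count against the open slots, which may not match the scheduler's actual accounting.

```python
from collections import Counter


def pool_utilization(total_slots, task_states):
    """Compute per-pool stats from the states of the tasks assigned to it."""
    counts = Counter(task_states)
    used = counts["running"]
    queued = counts["queued"]
    return {
        # Assumption: both running and queued tasks occupy a slot.
        "open_slots": total_slots - used - queued,
        "used_slots": used,
        "queued_slots": queued,
    }


stats = pool_utilization(5, ["running", "running", "queued", "success"])
```

Each value could then be emitted as a gauge to the stats backend.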





[jira] [Updated] (AIRFLOW-4190) Add instrumentation of schedule delay

2019-04-02 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4190:
---
Description: Measures the delay between the scheduled DAG start time (e.g. 
next_execution_date) and the wall clock time when first task executes.  (was: 
The DAG measures the delay between the scheduled DAG start time (e.g. 
next_execution_date) and the wall clock time when first task executes.)

> Add instrumentation of schedule delay
> -
>
> Key: AIRFLOW-4190
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4190
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Measures the delay between the scheduled DAG start time (e.g. 
> next_execution_date) and the wall clock time when first task executes.
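The measurement described above amounts to a difference of two timestamps. A minimal sketch, assuming both values are timezone-aware datetimes; the `schedule_delay_seconds` helper is hypothetical and not part of Airflow.

```python
from datetime import datetime, timezone


def schedule_delay_seconds(scheduled_start, first_task_start):
    """Seconds between the scheduled start and the first task's actual start."""
    return (first_task_start - scheduled_start).total_seconds()


scheduled = datetime(2019, 4, 2, 0, 0, 0, tzinfo=timezone.utc)
started = datetime(2019, 4, 2, 0, 0, 42, tzinfo=timezone.utc)
delay = schedule_delay_seconds(scheduled, started)
```

The resulting number of seconds could be reported as a timing metric per DAG.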





[jira] [Updated] (AIRFLOW-4194) set dag_run state to failed when user terminate backfill

2019-03-31 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4194:
---
Description: Reset the dag_run state to failed if the user terminates the backfill. 
Otherwise the dag_run will stay in the running state, which consumes 
max_active_dagruns.  (was: Currently, when the user terminates the backfill, we set 
the task_instance state to failed. We should also set the dag_run state to 
failed.)

> set dag_run state to failed when user terminate backfill
> 
>
> Key: AIRFLOW-4194
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4194
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Reset the dag_run state to failed if the user terminates the backfill. Otherwise the 
> dag_run will stay in the running state, which consumes max_active_dagruns.
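The state reset proposed here might look roughly like the sketch below. `DagRunRecord` and `fail_running_dag_runs` are hypothetical stand-ins used only for illustration; they are not Airflow's models or backfill code.

```python
from dataclasses import dataclass

RUNNING = "running"
FAILED = "failed"


@dataclass
class DagRunRecord:
    run_id: str
    state: str


def fail_running_dag_runs(dag_runs):
    """Mark every still-running dag run as failed; return the changed run ids."""
    changed = []
    for run in dag_runs:
        if run.state == RUNNING:
            run.state = FAILED
            changed.append(run.run_id)
    return changed


runs = [DagRunRecord("backfill_1", RUNNING), DagRunRecord("backfill_2", "success")]
changed = fail_running_dag_runs(runs)
```

Runs that already finished are left untouched, so only the interrupted ones stop counting as active.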





[jira] [Commented] (AIRFLOW-4194) set dag_run state to failed when user terminate backfill

2019-03-31 Thread Chao-Han Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806071#comment-16806071
 ] 

Chao-Han Tsai commented on AIRFLOW-4194:


[~TaoFeng] I remember that you contributed something related before but I can't 
find it in the code.

> set dag_run state to failed when user terminate backfill
> 
>
> Key: AIRFLOW-4194
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4194
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently when user terminate the backfill, we set the task_instance state to 
> failed. We should also set the dag_run state to failed.





[jira] [Created] (AIRFLOW-4194) set dag_run state to failed when user terminate backfill

2019-03-31 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4194:
--

 Summary: set dag_run state to failed when user terminate backfill
 Key: AIRFLOW-4194
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4194
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Currently when user terminate the backfill, we set the task_instance state to 
failed. We should also set the dag_run state to failed.





[jira] [Commented] (AIRFLOW-4190) Add a schedule delay monitoring DAG

2019-03-30 Thread Chao-Han Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805890#comment-16805890
 ] 

Chao-Han Tsai commented on AIRFLOW-4190:


[~TaoFeng] I am thinking that we can probably create another category of 
utility DAGs that are used just for the purpose of maintaining the Airflow cluster. 
Examples include Airflow cluster monitoring and log retention. People can choose 
whether to load these DAGs through airflow.cfg, just like how we 
control the example_dags.

> Add a schedule delay monitoring DAG
> ---
>
> Key: AIRFLOW-4190
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4190
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> The DAG measures the delay between the scheduled DAG start time (e.g. 
> next_execution_date) and the wall clock time when first task executes.





[jira] [Updated] (AIRFLOW-4138) [AIP] Introduce DAG manifest

2019-03-29 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4138:
---
Summary: [AIP] Introduce DAG manifest  (was: Introduce DAG manifest)

> [AIP] Introduce DAG manifest
> 
>
> Key: AIRFLOW-4138
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4138
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently Airflow traverses all the files under $AIRFLOW_HOME/dags and finds 
> the DAGs by inspecting the code. We should explicitly specify which files 
> are DAGs.





[jira] [Closed] (AIRFLOW-4119) Instrument dagrun start time delay

2019-03-29 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai closed AIRFLOW-4119.
--
Resolution: Won't Fix

> Instrument dagrun start time delay
> --
>
> Key: AIRFLOW-4119
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4119
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Priority: Major
>






[jira] [Updated] (AIRFLOW-4139) [AIP] DAG versioning

2019-03-29 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4139:
---
Summary: [AIP] DAG versioning  (was: DAG versioning)

> [AIP] DAG versioning
> 
>
> Key: AIRFLOW-4139
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4139
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently, running DagRun will be impacted if we change the DAG file in the 
> middle of the run. After we have 
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
> and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
> ready, we can start saving each version of the DAG file on the remote system 
> and the running tasks should refer to a specific version of DAG instead of 
> the latest DAG.
> How is it different from 
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB?
> Please see 
> https://issues.apache.org/jira/browse/AIRFLOW-4139?focusedCommentId=16799264=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16799264





[jira] [Created] (AIRFLOW-4191) Add a pool utilization instrument DAG

2019-03-29 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4191:
--

 Summary: Add a pool utilization instrument DAG
 Key: AIRFLOW-4191
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4191
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


This DAG measures the utilization of each pool (open_slots, used_slots, 
queued_slots).





[jira] [Created] (AIRFLOW-4190) Add a schedule delay monitoring DAG

2019-03-29 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4190:
--

 Summary: Add a schedule delay monitoring DAG
 Key: AIRFLOW-4190
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4190
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


The DAG measures the delay between the scheduled DAG start time (e.g. 
next_execution_date) and the wall clock time when first task executes.





[jira] [Created] (AIRFLOW-4189) Add an airflowdb retention DAG

2019-03-29 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4189:
--

 Summary: Add an airflowdb retention DAG
 Key: AIRFLOW-4189
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4189
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Create an Airflow DAG that cleans up the records that are out of retention 
period in the metastore. 

- We probably need to first modify each table to record the `last_modfiy_date` 
to support retention period.
- User will specify the retention_period in the airflow.cfg





[jira] [Resolved] (AIRFLOW-4118) Instrument dagrun duration

2019-03-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai resolved AIRFLOW-4118.

Resolution: Fixed

> Instrument dagrun duration
> --
>
> Key: AIRFLOW-4118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4118
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>






[jira] [Updated] (AIRFLOW-4163) IntervalCheckOperator support relative difference ratio and can ignore zero

2019-03-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4163:
---
Description: 
- IntervalCheckOperator takes the max/min ratio of two values for a metric and 
returns true if it is less than the threshold. Currently, if one of the values is 0, 
it assigns the ratio as None. In Python 2, the comparison None < Number is always 
true. We should add an option to fail the task if one of the values is 0.

- Currently it only supports max/min. It would be useful to support calculating 
the ratio as a relative difference.

  was:
IntervalCheckOperator takes max/min ratio of two values for a metric and 
returns true if it is less than threshold.

The bug is if one of the values is 0, it assigns the ratio as None. In python 
comparison None < Number is always true. We should change that to fail the test

Also we should support calculating ratio with relative difference.


>  IntervalCheckOperator support relative difference ratio and can ignore zero
> 
>
> Key: AIRFLOW-4163
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4163
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> - IntervalCheckOperator takes the max/min ratio of two values for a metric and 
> returns true if it is less than the threshold. Currently, if one of the values is 
> 0, it assigns the ratio as None. In Python 2, the comparison None < Number is always 
> true. We should add an option to fail the task if one of the values is 0.
> - Currently it only supports max/min. It would be useful to support 
> calculating the ratio as a relative difference.
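The two requests above could be sketched as follows. This is an illustrative sketch, not the operator's actual code: `check_ratio`, `RATIO_FORMULAS` and the `ignore_zero` flag are assumed names. It also avoids comparing None with a number, which evaluates to True under Python 2 and raises TypeError under Python 3.

```python
def max_over_min(current, reference):
    # Ratio of the larger value to the smaller one.
    return float(max(current, reference)) / min(current, reference)


def relative_diff(current, reference):
    # Absolute difference relative to the reference value.
    return float(abs(current - reference)) / reference


RATIO_FORMULAS = {"max_over_min": max_over_min, "relative_diff": relative_diff}


def check_ratio(current, reference, threshold,
                formula="max_over_min", ignore_zero=True):
    """Return True when the check passes for one metric."""
    if current == 0 or reference == 0:
        # No ratio can be computed; the flag decides pass or fail explicitly.
        return ignore_zero
    return RATIO_FORMULAS[formula](current, reference) < threshold
```

With `ignore_zero=False`, a zero value fails the check instead of passing silently.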





[jira] [Updated] (AIRFLOW-4163) IntervalCheckOperator support relative difference ratio and can ignore zero

2019-03-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4163:
---
Summary:  IntervalCheckOperator support relative difference ratio and can 
ignore zero  (was:  IntervalCheckOperator support relative difference ratio)

>  IntervalCheckOperator support relative difference ratio and can ignore zero
> 
>
> Key: AIRFLOW-4163
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4163
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> IntervalCheckOperator takes max/min ratio of two values for a metric and 
> returns true if it is less than threshold.
> The bug is if one of the values is 0, it assigns the ratio as None. In python 
> comparison None < Number is always true. We should change that to fail the 
> test
> Also we should support calculating ratio with relative difference.





[jira] [Updated] (AIRFLOW-4163) IntervalCheckOperator support relative difference ratio

2019-03-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4163:
---
Description: 
IntervalCheckOperator takes max/min ratio of two values for a metric and 
returns true if it is less than threshold.

The bug is if one of the values is 0, it assigns the ratio as None. In python 
comparison None < Number is always true. We should change that to fail the test

Also we should support calculating ratio with relative difference.

  was:
IntervalCheckOperator takes max/min ratio of two values for a metric and 
returns true if it is less than threshold.

The bug is if one of the values is 0, it assigns the ratio as None. In python 
comparison None < Number is always true. We should change that to fail the test


>  IntervalCheckOperator support relative difference ratio
> 
>
> Key: AIRFLOW-4163
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4163
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> IntervalCheckOperator takes max/min ratio of two values for a metric and 
> returns true if it is less than threshold.
> The bug is if one of the values is 0, it assigns the ratio as None. In python 
> comparison None < Number is always true. We should change that to fail the 
> test
> Also we should support calculating ratio with relative difference.





[jira] [Updated] (AIRFLOW-4163) IntervalCheckOperator support relative difference ratio

2019-03-26 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4163:
---
Summary:  IntervalCheckOperator support relative difference ratio  (was:  
IntervalCheckOperator comparison doesn't work for null values)

>  IntervalCheckOperator support relative difference ratio
> 
>
> Key: AIRFLOW-4163
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4163
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> IntervalCheckOperator takes max/min ratio of two values for a metric and 
> returns true if it is less than threshold.
> The bug is if one of the values is 0, it assigns the ratio as None. In python 
> comparison None < Number is always true. We should change that to fail the 
> test





[jira] [Updated] (AIRFLOW-4139) DAG versioning

2019-03-22 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4139:
---
Description: Currently, running DagRun will be impacted if we change the 
DAG file in the middle of the run. After we have   (was: Currently, running 
DagRun will be impacted if we change the DAG file. Existing running DagRun 
should not be impacted.)

> DAG versioning
> --
>
> Key: AIRFLOW-4139
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4139
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently, running DagRun will be impacted if we change the DAG file in the 
> middle of the run. After we have 





[jira] [Commented] (AIRFLOW-4139) DAG versioning

2019-03-22 Thread Chao-Han Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799264#comment-16799264
 ] 

Chao-Han Tsai commented on AIRFLOW-4139:


I took a quick look at 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB 
and I think that they are only persisting the DAG graph for each DAG version in 
the metastore, just for the purpose of making the webserver stateless. That is not 
enough information to run a task at a given DAG version. We will need to 
version the code in a way that the Airflow worker can refer to.

What I am proposing here is that once we have 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
ready, we will have the DAG file stored in versioned storage. Each running 
TaskInstance would then run a specific version of the code.

> DAG versioning
> --
>
> Key: AIRFLOW-4139
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4139
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently, running DagRun will be impacted if we change the DAG file. 
> Existing running DagRun should not be impacted.





[jira] [Updated] (AIRFLOW-4139) DAG versioning

2019-03-22 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4139:
---
Description: 
Currently, running DagRun will be impacted if we change the DAG file in the 
middle of the run. After we have 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
ready, we can start saving each version of the DAG file on the remote system 
and the running tasks should refer to a specific version of DAG instead of the 
latest DAG.

How is it different from 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB?
Please see 
https://issues.apache.org/jira/browse/AIRFLOW-4139?focusedCommentId=16799264=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16799264

  was:
Currently, running DagRun will be impacted if we change the DAG file in the 
middle of the run. After we have 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
ready, we can start saving each version of the DAG file on the remote system 
and the running tasks should refer to a specific version of DAG instead of the 
latest DAG.

How is it different from 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB?



> DAG versioning
> --
>
> Key: AIRFLOW-4139
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4139
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently, running DagRun will be impacted if we change the DAG file in the 
> middle of the run. After we have 
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
> and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
> ready, we can start saving each version of the DAG file on the remote system 
> and the running tasks should refer to a specific version of DAG instead of 
> the latest DAG.
> How is it different from 
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB?
> Please see 
> https://issues.apache.org/jira/browse/AIRFLOW-4139?focusedCommentId=16799264=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16799264





[jira] [Updated] (AIRFLOW-4139) DAG versioning

2019-03-22 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4139:
---
Description: 
Currently, running DagRun will be impacted if we change the DAG file in the 
middle of the run. After we have 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
ready, we can start saving each version of the DAG file on the remote system 
and the running tasks should refer to a specific version of DAG instead of the 
latest DAG.

How is it different from 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB?


  was:
Currently, running DagRun will be impacted if we change the DAG file in the 
middle of the run. After we have 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
ready, we can start saving each version of the DAG file on the remote system 
and the running tasks should refer to a specific version of DAG instead of the 
latest DAG.

How is it different from 


> DAG versioning
> --
>
> Key: AIRFLOW-4139
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4139
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently, running DagRun will be impacted if we change the DAG file in the 
> middle of the run. After we have 
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
> and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
> ready, we can start saving each version of the DAG file on the remote system 
> and the running tasks should refer to a specific version of DAG instead of 
> the latest DAG.
> How is it different from 
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB?





[jira] [Updated] (AIRFLOW-4139) DAG versioning

2019-03-22 Thread Chao-Han Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao-Han Tsai updated AIRFLOW-4139:
---
Description: 
Currently, running DagRun will be impacted if we change the DAG file in the 
middle of the run. After we have 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
ready, we can start saving each version of the DAG file on the remote system 
and the running tasks should refer to a specific version of DAG instead of the 
latest DAG.

How is it different from 

  was:Currently, running DagRun will be impacted if we change the DAG file in 
the middle of the run. After we have 


> DAG versioning
> --
>
> Key: AIRFLOW-4139
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4139
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently, running DagRun will be impacted if we change the DAG file in the 
> middle of the run. After we have 
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher 
> and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest 
> ready, we can start saving each version of the DAG file on the remote system 
> and the running tasks should refer to a specific version of DAG instead of 
> the latest DAG.
> How is it different from 





[jira] [Commented] (AIRFLOW-4139) DAG versioning

2019-03-22 Thread Chao-Han Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799265#comment-16799265
 ] 

Chao-Han Tsai commented on AIRFLOW-4139:


I think I had better add more description, as it must be very confusing what 
we are aiming for with this ticket.

> DAG versioning
> --
>
> Key: AIRFLOW-4139
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4139
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Currently, running DagRun will be impacted if we change the DAG file. 
> Existing running DagRun should not be impacted.





[jira] [Created] (AIRFLOW-4139) DAG versioning

2019-03-22 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4139:
--

 Summary: DAG versioning
 Key: AIRFLOW-4139
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4139
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Currently, running DagRun will be impacted if we change the DAG file. Existing 
running DagRun should not be impacted.





[jira] [Created] (AIRFLOW-4138) Introduce DAG manifest

2019-03-22 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4138:
--

 Summary: Introduce DAG manifest
 Key: AIRFLOW-4138
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4138
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai


Currently Airflow traverses all the files under $AIRFLOW_HOME/dags and finds 
the DAGs by inspecting the code. We should explicitly specify which files are 
DAGs.





[jira] [Created] (AIRFLOW-4119) Instrument dagrun start time delay

2019-03-18 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4119:
--

 Summary: Instrument dagrun start time delay
 Key: AIRFLOW-4119
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4119
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai








[jira] [Created] (AIRFLOW-4118) Instrument dagrun duration

2019-03-18 Thread Chao-Han Tsai (JIRA)
Chao-Han Tsai created AIRFLOW-4118:
--

 Summary: Instrument dagrun duration
 Key: AIRFLOW-4118
 URL: https://issues.apache.org/jira/browse/AIRFLOW-4118
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Chao-Han Tsai
Assignee: Chao-Han Tsai







