[jira] [Resolved] (AIRFLOW-7059) Pass hive_conf to get_pandas_df in HiveServer2Hook
[ https://issues.apache.org/jira/browse/AIRFLOW-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-7059. Fix Version/s: 1.10.11 Resolution: Fixed > Pass hive_conf to get_pandas_df in HiveServer2Hook > -- > > Key: AIRFLOW-7059 > URL: https://issues.apache.org/jira/browse/AIRFLOW-7059 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Affects Versions: 1.10.9 >Reporter: Ping Zhang >Priority: Minor > Fix For: 1.10.11 > > > code: > [https://github.com/apache/airflow/blob/97a429f9d0cf740c5698060ad55f11e93cb57b55/airflow/providers/apache/hive/hooks/hive.py#L973] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4421) task related stats should be optional
[ https://issues.apache.org/jira/browse/AIRFLOW-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007559#comment-17007559 ] Chao-Han Tsai commented on AIRFLOW-4421: I see then I think should fix the issue. > task related stats should be optional > - > > Key: AIRFLOW-4421 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4421 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently Airflow emits task stats by default. The number of tasks can be huge > and put a heavy load on the stats backend. This ticket aims to turn these > stats off by default and let them be turned on via a flag. > https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/docs/metrics.rst -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (AIRFLOW-4421) task related stats should be optional
[ https://issues.apache.org/jira/browse/AIRFLOW-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai closed AIRFLOW-4421. -- Resolution: Won't Fix > task related stats should be optional > - > > Key: AIRFLOW-4421 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4421 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently Airflow emits task stats by default. The number of tasks can be huge > and put a heavy load on the stats backend. This ticket aims to turn these > stats off by default and let them be turned on via a flag. > https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/docs/metrics.rst -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-4421) task related stats should be optional
[ https://issues.apache.org/jira/browse/AIRFLOW-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007510#comment-17007510 ] Chao-Han Tsai commented on AIRFLOW-4421: [~kamil.bregula] Just to clarify: `statsd_allow_list` either enables all the stats or none, right? > task related stats should be optional > - > > Key: AIRFLOW-4421 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4421 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently Airflow emits task stats by default. The number of tasks can be huge > and put a heavy load on the stats backend. This ticket aims to turn these > stats off by default and let them be turned on via a flag. > https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/docs/metrics.rst -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (AIRFLOW-6073) Move Qubole Operator Link class to qubole_operator.py
[ https://issues.apache.org/jira/browse/AIRFLOW-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-6073. Resolution: Fixed > Move Qubole Operator Link class to qubole_operator.py > - > > Key: AIRFLOW-6073 > URL: https://issues.apache.org/jira/browse/AIRFLOW-6073 > Project: Apache Airflow > Issue Type: Sub-task > Components: contrib >Affects Versions: 2.0.0 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 2.0.0 > > > The OperatorLink should be in the file where Operator is defined for > consistency -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (AIRFLOW-5683) Add propagate_skipped_state to SubDagOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-5683. Fix Version/s: 2.0.0 Resolution: Fixed > Add propagate_skipped_state to SubDagOperator > - > > Key: AIRFLOW-5683 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5683 > Project: Apache Airflow > Issue Type: New Feature > Components: operators >Affects Versions: 1.10.6 >Reporter: Felix Uellendall >Assignee: Felix Uellendall >Priority: Major > Fix For: 2.0.0 > > > Currently there is no way of telling the parent dag of a sub dag that an > essential task has been skipped. This PR addresses this issue by adding a new > propagate_skipped_state option to the SubDagOperator. > Background story: > https://lists.apache.org/thread.html/0eefd459a502c5100d792416f8ba720302aa49a8906fe6ea4ec8fca4@%3Cdev.airflow.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (AIRFLOW-5442) Druid broker hook get pandas DataFrame implementation
[ https://issues.apache.org/jira/browse/AIRFLOW-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-5442: --- Fix Version/s: (was: 1.10.7) 1.10.6 > Druid broker hook get pandas DataFrame implementation > - > > Key: AIRFLOW-5442 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5442 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Affects Versions: 1.10.5 >Reporter: Sayed Mohammad Hossein Torabi >Assignee: Sayed Mohammad Hossein Torabi >Priority: Minor > Fix For: 1.10.6 > > > The *get_pandas_df* of *DruidDbApiHook* returns NotImplementedError. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (AIRFLOW-5442) Druid broker hook get pandas DataFrame implementation
[ https://issues.apache.org/jira/browse/AIRFLOW-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-5442. Resolution: Fixed > Druid broker hook get pandas DataFrame implementation > - > > Key: AIRFLOW-5442 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5442 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Affects Versions: 1.10.5 >Reporter: Sayed Mohammad Hossein Torabi >Assignee: Sayed Mohammad Hossein Torabi >Priority: Minor > Fix For: 1.10.6 > > > The *get_pandas_df* of *DruidDbApiHook* returns NotImplementedError. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (AIRFLOW-5442) Druid broker hook get pandas DataFrame implementation
[ https://issues.apache.org/jira/browse/AIRFLOW-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-5442: --- Fix Version/s: 1.10.7 > Druid broker hook get pandas DataFrame implementation > - > > Key: AIRFLOW-5442 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5442 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Affects Versions: 1.10.5 >Reporter: Sayed Mohammad Hossein Torabi >Assignee: Sayed Mohammad Hossein Torabi >Priority: Minor > Fix For: 1.10.7 > > > The *get_pandas_df* of *DruidDbApiHook* returns NotImplementedError. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (AIRFLOW-5710) Optionally error on unused operator arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-5710: --- Fix Version/s: 1.10.6 > Optionally error on unused operator arguments > - > > Key: AIRFLOW-5710 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5710 > Project: Apache Airflow > Issue Type: Improvement > Components: core >Affects Versions: 1.10.5 >Reporter: Joshua Carp >Assignee: Joshua Carp >Priority: Trivial > Fix For: 1.10.6 > > > Airflow 2.0 will error when operators are instantiated with unused keyword > arguments, but for now unused arguments just raise a warning. My team has > passed unused arguments to operators and not noticed the warning a few times, > and it would be useful to be able to opt in to airflow 2.0 validation early. > I propose adding a configuration flag that causes tasks to error on unused > arguments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (AIRFLOW-5710) Optionally error on unused operator arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-5710. Resolution: Fixed > Optionally error on unused operator arguments > - > > Key: AIRFLOW-5710 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5710 > Project: Apache Airflow > Issue Type: Improvement > Components: core >Affects Versions: 1.10.5 >Reporter: Joshua Carp >Assignee: Joshua Carp >Priority: Trivial > > Airflow 2.0 will error when operators are instantiated with unused keyword > arguments, but for now unused arguments just raise a warning. My team has > passed unused arguments to operators and not noticed the warning a few times, > and it would be useful to be able to opt in to airflow 2.0 validation early. > I propose adding a configuration flag that causes tasks to error on unused > arguments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
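The opt-in behaviour requested in AIRFLOW-5710 can be illustrated with a minimal sketch. This is not the code that was merged: `check_unused_arguments`, its `allow_unused` parameter, and `InvalidArgumentsError` are hypothetical names chosen for illustration, and the real configuration flag is not named in the ticket text above.

```python
import warnings


class InvalidArgumentsError(Exception):
    """Hypothetical error type raised when strict validation is enabled."""


def check_unused_arguments(kwargs, allow_unused=True):
    """Warn or raise when an operator receives keyword arguments it does not use.

    `allow_unused` stands in for the proposed configuration flag:
    True keeps the warning-only behaviour, False opts in to the stricter
    behaviour the ticket says Airflow 2.0 will have.
    """
    if not kwargs:
        return
    message = "Invalid arguments were passed: {}".format(sorted(kwargs))
    if allow_unused:
        # Current behaviour described in the ticket: only a warning.
        warnings.warn(message, category=PendingDeprecationWarning, stacklevel=2)
    else:
        # Opt-in behaviour: fail fast so the mistake cannot go unnoticed.
        raise InvalidArgumentsError(message)
```

Calling `check_unused_arguments({"typo_arg": 1}, allow_unused=False)` raises immediately, which is the early validation the reporter asks for; with the default it only emits a warning.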
[jira] [Resolved] (AIRFLOW-5714) Collect SLA miss emails only from tasks missed SLA
[ https://issues.apache.org/jira/browse/AIRFLOW-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-5714. Fix Version/s: 1.10.7 Resolution: Fixed > Collect SLA miss emails only from tasks missed SLA > -- > > Key: AIRFLOW-5714 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5714 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Affects Versions: 1.10.7 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 1.10.7 > > > Currently, when a task in the DAG misses its SLA, Airflow traverses > all the tasks in the DAG and collects all the task-level emails. Airflow > then sends an SLA miss email to every collected address, which adds > unnecessary noise for owners of tasks that did not contribute to the SLA miss. > This change collects emails only from the tasks that missed the > SLA. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (AIRFLOW-5714) Collect SLA miss emails only from tasks missed SLA
Chao-Han Tsai created AIRFLOW-5714: -- Summary: Collect SLA miss emails only from tasks missed SLA Key: AIRFLOW-5714 URL: https://issues.apache.org/jira/browse/AIRFLOW-5714 Project: Apache Airflow Issue Type: Improvement Components: scheduler Affects Versions: 1.10.7 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently, when a task in the DAG misses its SLA, Airflow traverses all the tasks in the DAG and collects all the task-level emails. Airflow then sends an SLA miss email to every collected address, which adds unnecessary noise for owners of tasks that did not contribute to the SLA miss. This change collects emails only from the tasks that missed the SLA. -- This message was sent by Atlassian Jira (v8.3.4#803005)
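The change described in AIRFLOW-5714 amounts to filtering the recipient list by the tasks that actually missed their SLA. The sketch below shows that idea only; `Task` and `collect_sla_miss_emails` are hypothetical stand-ins, not the scheduler's real classes or functions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Union


@dataclass
class Task:
    """Hypothetical stand-in for an Airflow task; only the fields used here."""
    task_id: str
    email: Union[str, List[str], None] = None


def collect_sla_miss_emails(tasks: Iterable[Task], missed_task_ids: Set[str]) -> List[str]:
    """Return the sorted, de-duplicated emails of tasks that missed their SLA."""
    emails = set()
    for task in tasks:
        # Skip tasks that did not contribute to the SLA miss.
        if task.task_id not in missed_task_ids:
            continue
        if not task.email:
            continue
        if isinstance(task.email, str):
            emails.add(task.email)
        else:
            emails.update(task.email)
    return sorted(emails)
```

The previous behaviour described in the ticket corresponds to dropping the `missed_task_ids` check, so every task owner in the DAG would be notified.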
[jira] [Created] (AIRFLOW-5666) Create a BaseTransferToS3Operator
Chao-Han Tsai created AIRFLOW-5666: -- Summary: Create a BaseTransferToS3Operator Key: AIRFLOW-5666 URL: https://issues.apache.org/jira/browse/AIRFLOW-5666 Project: Apache Airflow Issue Type: Improvement Components: aws Affects Versions: 1.10.5 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Create a BaseTransferToS3Operator so that operators such as DynamodbToS3Operator can inherit and share common code logic to upload files to S3. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (AIRFLOW-5338) Add a RedsfhitToS3Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-5338: --- Description: Create an Airflow operator that queries Redshift and persists the results to S3. We should be able to leverage the existing code in https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py to handle the flush to s3 logic. We should abstract that logic to a base class and let RedshiftToS3Operator and DynamodbToS3Operator inherits that base class (was: Create an Airflow operator that queries Postgres and persists the results to S3. We should be able to leverage the existing code in https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py to handle the flush to s3 logic. We should abstract that logic to a base class and let PostgresToS3Operator and DynamodbToS3Operator inherits that base class) > Add a RedsfhitToS3Operator > -- > > Key: AIRFLOW-5338 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5338 > Project: Apache Airflow > Issue Type: New Feature > Components: aws >Affects Versions: 1.10.5 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Create an Airflow operator that queries Redshift and persists the results to > S3. We should be able to leverage the existing code in > https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py > to handle the flush to s3 logic. We should abstract that logic to a base > class and let RedshiftToS3Operator and DynamodbToS3Operator inherits that > base class -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (AIRFLOW-5337) Add a PostgresToS3Operator
Chao-Han Tsai created AIRFLOW-5337: -- Summary: Add a PostgresToS3Operator Key: AIRFLOW-5337 URL: https://issues.apache.org/jira/browse/AIRFLOW-5337 Project: Apache Airflow Issue Type: New Feature Components: aws Affects Versions: 1.10.5 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Create an Airflow operator that queries Postgres and persists the results to S3. We should be able to leverage the existing code in https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py to handle the flush to s3 logic. We should abstract that logic to a base class and let PostgresToS3Operator and DynamodbToS3Operator inherits that base class -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (AIRFLOW-5338) Add a RedsfhitToS3Operator
Chao-Han Tsai created AIRFLOW-5338: -- Summary: Add a RedsfhitToS3Operator Key: AIRFLOW-5338 URL: https://issues.apache.org/jira/browse/AIRFLOW-5338 Project: Apache Airflow Issue Type: New Feature Components: aws Affects Versions: 1.10.5 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Create an Airflow operator that queries Postgres and persists the results to S3. We should be able to leverage the existing code in https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dynamodb_to_s3.py to handle the flush to s3 logic. We should abstract that logic to a base class and let PostgresToS3Operator and DynamodbToS3Operator inherits that base class -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Closed] (AIRFLOW-4940) DynamoDB to S3 backup operator
[ https://issues.apache.org/jira/browse/AIRFLOW-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai closed AIRFLOW-4940. -- Resolution: Fixed > DynamoDB to S3 backup operator > -- > > Key: AIRFLOW-4940 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4940 > Project: Apache Airflow > Issue Type: New Feature > Components: aws >Affects Versions: 1.10.4 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Add an Airflow operator that backs up a DynamoDB table to S3. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (AIRFLOW-4138) [AIP] Introduce DAG manifest
[ https://issues.apache.org/jira/browse/AIRFLOW-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-4138. Resolution: Fixed > [AIP] Introduce DAG manifest > > > Key: AIRFLOW-4138 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4138 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently Airflow traverses all the files under $AIRFLOW_HOME/dags and finds > the DAGs by inspecting the code. We should explicitly specify which files > contain DAGs. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (AIRFLOW-5094) Make airflow conn prefix configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-5094: --- Description: Currently Airflow picks up connection strings from environment variables whose names start with `AIRFLOW_CONN_`. This ticket proposes making the prefix configurable. (was: Make airflow conn prefix configurable) > Make airflow conn prefix configurable > - > > Key: AIRFLOW-5094 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5094 > Project: Apache Airflow > Issue Type: Improvement > Components: core >Affects Versions: 1.10.5 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently Airflow picks up connection strings from environment variables > whose names start with `AIRFLOW_CONN_`. This ticket proposes > making the prefix configurable. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
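As a rough illustration of what a configurable prefix could look like, the lookup can take the prefix as a parameter instead of hard-coding it. `get_conn_uri_from_env` and its `prefix` parameter are assumptions made for this sketch and are not Airflow's actual API; only the `AIRFLOW_CONN_` prefix itself comes from the ticket.

```python
import os
from typing import Mapping, Optional

# The prefix named in the ticket; used here as the default.
DEFAULT_CONN_PREFIX = "AIRFLOW_CONN_"


def get_conn_uri_from_env(
    conn_id: str,
    prefix: str = DEFAULT_CONN_PREFIX,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Look up a connection URI in the environment under a configurable prefix.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return env.get(prefix + conn_id.upper())
```

With `prefix="ACME_CONN_"`, the same connection id `my_db` would resolve from `ACME_CONN_MY_DB` rather than `AIRFLOW_CONN_MY_DB`.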
[jira] [Closed] (AIRFLOW-5095) airflow.cfg merger
[ https://issues.apache.org/jira/browse/AIRFLOW-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai closed AIRFLOW-5095. -- Resolution: Fixed > airflow.cfg merger > -- > > Key: AIRFLOW-5095 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5095 > Project: Apache Airflow > Issue Type: Improvement > Components: cli >Affects Versions: 1.10.5 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > A CLI that reads an old airflow.cfg and merges it with the latest version of > airflow.cfg -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (AIRFLOW-5095) airflow.cfg merger
[ https://issues.apache.org/jira/browse/AIRFLOW-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899366#comment-16899366 ] Chao-Han Tsai commented on AIRFLOW-5095: Say a later version of Airflow introduces a new flag in airflow.cfg that defaults to a certain value. When users upgrade their Airflow, this tool would add the flag to their existing airflow.cfg. [~ash] what is your experience with upgrading a user's airflow.cfg to a later version? > airflow.cfg merger > -- > > Key: AIRFLOW-5095 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5095 > Project: Apache Airflow > Issue Type: Improvement > Components: cli >Affects Versions: 1.10.5 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > A CLI that reads an old airflow.cfg and merges it with the latest version of > airflow.cfg -- This message was sent by Atlassian JIRA (v7.6.14#76016)
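The merge behaviour described in this comment can be sketched with the standard library's `configparser`: start from the new defaults and overlay the user's existing values, so new flags appear with their defaults while existing settings survive. `merge_configs` is a hypothetical helper for illustration, not the proposed CLI, and the section and option names in the usage note are made up.

```python
import configparser


def merge_configs(old_text: str, new_text: str) -> configparser.ConfigParser:
    """Overlay the user's old settings on top of the new default config.

    Interpolation is disabled so that literal '%' characters in values
    are read and written unchanged.
    """
    old = configparser.ConfigParser(interpolation=None)
    old.read_string(old_text)

    merged = configparser.ConfigParser(interpolation=None)
    merged.read_string(new_text)

    for section in old.sections():
        if not merged.has_section(section):
            # Keep sections that only exist in the user's file.
            merged.add_section(section)
        for key, value in old.items(section):
            # The user's existing value wins over the new default.
            merged.set(section, key, value)
    return merged
```

For example, merging an old file containing `[core] parallelism = 8` with a new default file containing `[core] parallelism = 32` and `new_flag = True` keeps `parallelism = 8` and adds `new_flag = True`. Note that this sketch discards comments and does not remove options that a later version has dropped.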
[jira] [Commented] (AIRFLOW-1482) Error when try to backfill the example_trigger_controller_dag
[ https://issues.apache.org/jira/browse/AIRFLOW-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898572#comment-16898572 ] Chao-Han Tsai commented on AIRFLOW-1482: At first glance this is more likely to be a problem in TriggerDagOperator instead of backfill.. > Error when try to backfill the example_trigger_controller_dag > - > > Key: AIRFLOW-1482 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1482 > Project: Apache Airflow > Issue Type: Bug > Components: backfill >Affects Versions: 1.8.1, 1.8.2 > Environment: Ubuntu: 16.04 > Python: 2.7 > CeleryExecutor > Broker: Redis >Reporter: Timothee N >Priority: Blocker > Attachments: airflow_1.png, airflow_2.png, airflow_3.png > > > Hello, > Running a backfill command for the > {noformat}example_trigger_controller_dag{noformat} example dag, result in the > failed task {noformat}test_trigger_dagrun{noformat} > It seems to me that the problem comes from the TriggerDagRunOperator in the > example_trigger_controller_dag ? > Backfill command: {noformat}airflow backfill -s 2017-07-10 -e 2017-07-13 > --pool backfill example_trigger_controller_dag{noformat} > Tested in 1.8.1 and 1.8.2rc1 > Here is the output log from the backfill command : > {noformat} > [2017-08-02 13:53:00,844] {__init__.py:57} INFO - Using executor > CeleryExecutor > [2017-08-02 13:53:00,888] {driver.py:120} INFO - Generating grammar tables > from /usr/lib/python2.7/lib2to3/Grammar.txt > [2017-08-02 13:53:00,902] {driver.py:120} INFO - Generating grammar tables > from /usr/lib/python2.7/lib2to3/PatternGrammar.txt > /var/lib/airflow/local/lib/python2.7/site-packages/airflow/www/app.py:23: > FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been renamed to > "CSRFProtect" and will be removed in 1.0. 
> csrf = CsrfProtect() > [2017-08-02 13:53:01,033] {models.py:168} INFO - Filling up the DagBag from > /var/lib/airflow/dags > [2017-08-02 13:53:01,332] {models.py:1128} INFO - Dependencies all met for > 00:00:00 [scheduled]> > [2017-08-02 13:53:01,337] {base_executor.py:50} INFO - Adding to queue: > airflow run example_trigger_controller_dag test_trigger_dagrun > 2017-07-10T00:00:00 --pickle 1 --local --pool backfill > [2017-08-02 13:53:06,267] {celery_executor.py:81} INFO - [celery] queuing > (u'example_trigger_controller_dag', u'test_trigger_dagrun', > datetime.datetime(2017, 7, 10, 0, 0)) through celery, queue=default > [2017-08-02 13:53:06,330] {models.py:4164} INFO - Updating state for example_trigger_controller_dag @ 2017-07-10 00:00:00: > backfill_2017-07-10T00:00:00, externally triggered: False> considering 1 > task(s) > [2017-08-02 13:53:06,334] {jobs.py:2020} INFO - [backfill progress] | > finished run 0 of 1 | tasks waiting: 0 | succeeded: 0 | kicked_off: 1 | > failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0 > [2017-08-02 13:53:11,273] {jobs.py:1743} ERROR - Executor reports task > instance 2017-07-10 00:00:00 [queued]> finished (failed) although the task says its > queued. Was the task killed externally? > [2017-08-02 13:53:11,273] {models.py:1433} ERROR - Executor reports task > instance 2017-07-10 00:00:00 [queued]> finished (failed) although the task says its > queued. Was the task killed externally? > None > [2017-08-02 13:53:11,273] {models.py:1457} INFO - Marking task as FAILED. > [2017-08-02 13:53:11,279] {models.py:1478} ERROR - Executor reports task > instance 2017-07-10 00:00:00 [queued]> finished (failed) although the task says its > queued. Was the task killed externally? 
> [2017-08-02 13:53:11,281] {jobs.py:1694} ERROR - Task instance example_trigger_controller_dag.test_trigger_dagrun 2017-07-10 00:00:00 > [failed]> failed > [2017-08-02 13:53:11,283] {models.py:4164} INFO - Updating state for example_trigger_controller_dag @ 2017-07-10 00:00:00: > backfill_2017-07-10T00:00:00, externally triggered: False> considering 1 > task(s) > [2017-08-02 13:53:11,285] {models.py:4204} INFO - Marking run example_trigger_controller_dag @ 2017-07-10 00:00:00: > backfill_2017-07-10T00:00:00, externally triggered: False> failed > [2017-08-02 13:53:11,298] {jobs.py:2020} INFO - [backfill progress] | > finished run 1 of 1 | tasks waiting: 0 | succeeded: 0 | kicked_off: 0 | > failed: 1 | skipped: 0 | deadlocked: 0 | not ready: 0 > Traceback (most recent call last): > File "/var/lib/airflow/bin/airflow", line 28, in > args.func(args) > File > "/var/lib/airflow/local/lib/python2.7/site-packages/airflow/bin/cli.py", line > 167, in backfill > pool=args.pool) > File > "/var/lib/airflow/local/lib/python2.7/site-packages/airflow/models.py", line > 3373, in run > job.run() > File "/var/lib/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 201, in run >
[jira] [Created] (AIRFLOW-5095) airflow.cfg merger
Chao-Han Tsai created AIRFLOW-5095: -- Summary: airflow.cfg merger Key: AIRFLOW-5095 URL: https://issues.apache.org/jira/browse/AIRFLOW-5095 Project: Apache Airflow Issue Type: Improvement Components: cli Affects Versions: 1.10.5 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai A CLI that reads an old airflow.cfg and merges it with the latest version of airflow.cfg -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (AIRFLOW-5094) Make airflow conn prefix configurable
Chao-Han Tsai created AIRFLOW-5094: -- Summary: Make airflow conn prefix configurable Key: AIRFLOW-5094 URL: https://issues.apache.org/jira/browse/AIRFLOW-5094 Project: Apache Airflow Issue Type: Improvement Components: core Affects Versions: 1.10.5 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Make airflow conn prefix configurable -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (AIRFLOW-4941) default_args not applied when dag is assigned to task through setter
[ https://issues.apache.org/jira/browse/AIRFLOW-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4941: --- Description: When the DAG is set to the task via [setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501], `default_args` won't be applied to the task. Adding a warning message to let user know about that. (was: When the DAG is set to the task via [setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501], `default_args` won't be applied to the task) > default_args not applied when dag is assigned to task through setter > > > Key: AIRFLOW-4941 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4941 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Affects Versions: 1.10.4 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > When the DAG is set to the task via > [setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501], > `default_args` won't be applied to the task. Adding a warning message to let > user know about that. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (AIRFLOW-4941) default_args not applied when dag is assigned to task through setter
[ https://issues.apache.org/jira/browse/AIRFLOW-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4941: --- Summary: default_args not applied when dag is assigned to task through setter (was: Apply default_args when dag is assigned to task through setter) > default_args not applied when dag is assigned to task through setter > > > Key: AIRFLOW-4941 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4941 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Affects Versions: 1.10.4 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > When the DAG is set to the task via > [setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501], > `default_args` won't be applied to the task -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (AIRFLOW-4941) Apply default_args when dag is assigned to task through setter
Chao-Han Tsai created AIRFLOW-4941: -- Summary: Apply default_args when dag is assigned to task through setter Key: AIRFLOW-4941 URL: https://issues.apache.org/jira/browse/AIRFLOW-4941 Project: Apache Airflow Issue Type: Bug Components: operators Affects Versions: 1.10.4 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai When the DAG is set to the task via [setter|https://github.com/apache/airflow/blob/526c65a57204022596fb69e9478c5515ad0b880e/airflow/models/baseoperator.py#L501], `default_args` won't be applied to the task -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (AIRFLOW-4940) DynamoDB to S3 backup operator
Chao-Han Tsai created AIRFLOW-4940: -- Summary: DynamoDB to S3 backup operator Key: AIRFLOW-4940 URL: https://issues.apache.org/jira/browse/AIRFLOW-4940 Project: Apache Airflow Issue Type: New Feature Components: aws Affects Versions: 1.10.4 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Add an Airflow operator that backs up a DynamoDB table to S3. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (AIRFLOW-4915) Support backfill through Airflow UI
Chao-Han Tsai created AIRFLOW-4915: -- Summary: Support backfill through Airflow UI Key: AIRFLOW-4915 URL: https://issues.apache.org/jira/browse/AIRFLOW-4915 Project: Apache Airflow Issue Type: New Feature Components: backfill, ui Affects Versions: 2.0.0 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Support a way to backfill DAGs through Airflow UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4914) Submit backfill request through REST endpoint
Chao-Han Tsai created AIRFLOW-4914: -- Summary: Submit backfill request through REST endpoint Key: AIRFLOW-4914 URL: https://issues.apache.org/jira/browse/AIRFLOW-4914 Project: Apache Airflow Issue Type: New Feature Components: api, backfill Affects Versions: 2.0.0 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Support submitting backfill request through REST endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4913) Backfill through scheduler
Chao-Han Tsai created AIRFLOW-4913: -- Summary: Backfill through scheduler Key: AIRFLOW-4913 URL: https://issues.apache.org/jira/browse/AIRFLOW-4913 Project: Apache Airflow Issue Type: New Feature Components: backfill Affects Versions: 2.0.0 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently Airflow backfill has its own scheduling logic, separate from the core scheduler. Since the backfill process is not necessarily long-running, we often find out only later that it failed and exited partway through. The core scheduler, however, is guaranteed (?) to be long-running, so it would be great to have a way to backfill through the core scheduler. Later we can support submitting backfill requests through a REST endpoint and then through the Airflow UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4591) Tag tasks with default pool
Chao-Han Tsai created AIRFLOW-4591: -- Summary: Tag tasks with default pool Key: AIRFLOW-4591 URL: https://issues.apache.org/jira/browse/AIRFLOW-4591 Project: Apache Airflow Issue Type: New Feature Components: core Affects Versions: 2.0.0 Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently the number of running tasks without a pool specified will be limited by `non_pooled_task_slot_count`. It limits the number of tasks launched per scheduler loop but does not limit the number of tasks running in parallel. This ticket proposes that we assign tasks without a pool specified to default pool which limits the number of running tasks in parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
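The proposal in AIRFLOW-4591 can be pictured as slot accounting in which a task without an explicit pool is counted against a default pool, so the limit applies to tasks running in parallel rather than to tasks launched per scheduler loop. The sketch below is only an illustration of that accounting; `DEFAULT_POOL_NAME` and `tasks_to_launch` are hypothetical names, not scheduler code.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical name for the default pool proposed in the ticket.
DEFAULT_POOL_NAME = "default_pool"


def tasks_to_launch(
    queued: List[Tuple[str, Optional[str]]],
    running_pools: List[Optional[str]],
    pool_slots: Dict[str, int],
) -> List[str]:
    """Pick queued tasks that fit into the remaining slots of their pool.

    `queued` holds (task_id, pool) pairs and `running_pools` holds the pool
    of every task currently running; a pool of None means "no pool given"
    and is mapped to the default pool.
    """
    used = Counter(pool or DEFAULT_POOL_NAME for pool in running_pools)
    launched = []
    for task_id, pool in queued:
        pool = pool or DEFAULT_POOL_NAME
        # A task starts only if its pool still has a free slot.
        if used[pool] < pool_slots.get(pool, 0):
            used[pool] += 1
            launched.append(task_id)
    return launched
```

Because already-running tasks count against the pool, the number of un-pooled tasks running at once never exceeds the default pool's slot count, regardless of how many scheduler loops have run.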
[jira] [Resolved] (AIRFLOW-4535) Breaks job.py into multiple files
[ https://issues.apache.org/jira/browse/AIRFLOW-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-4535. Resolution: Fixed > Breaks job.py into multiple files > - > > Key: AIRFLOW-4535 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4535 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4535) Breaks job.py into multiple files
Chao-Han Tsai created AIRFLOW-4535: -- Summary: Breaks job.py into multiple files Key: AIRFLOW-4535 URL: https://issues.apache.org/jira/browse/AIRFLOW-4535 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4509) SubDagOperator using scheduler instead of backfill
Chao-Han Tsai created AIRFLOW-4509: -- Summary: SubDagOperator using scheduler instead of backfill Key: AIRFLOW-4509 URL: https://issues.apache.org/jira/browse/AIRFLOW-4509 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Make SubDagOperator use Airflow scheduler instead of backfill to schedule tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2372) SubDAGs should share dag_concurrency of parent DAG
[ https://issues.apache.org/jira/browse/AIRFLOW-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-2372: -- Assignee: Chao-Han Tsai > SubDAGs should share dag_concurrency of parent DAG > -- > > Key: AIRFLOW-2372 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2372 > Project: Apache Airflow > Issue Type: Wish > Components: subdag >Affects Versions: 1.9.0 > Environment: 1.9.0, a local scheduler and LocalExecutor, and parallelism = 32, dag_concurrency = 16 >Reporter: Xiao Zhu >Assignee: Chao-Han Tsai >Priority: Major
>
> It seems that subDAGs are currently scheduled just like normal DAGs. If a DAG has many (parallel) subDAGs, each with many operators, triggering that DAG triggers those subDAGs as normal DAGs. They can easily take all the resources (limited by parallelism) of the scheduler, and other DAGs have to wait for those subDAGs.
> For example, take this DAG, with a local scheduler and LocalExecutor, and parallelism = 32, dag_concurrency = 16:
> {code:python}
> import logging
>
> from airflow import DAG
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.operators.python_operator import PythonOperator
> from airflow.operators.subdag_operator import SubDagOperator
>
> log = logging.getLogger(__name__)
>
> NUM_SUBDAGS = 20
> NUM_OPS_PER_SUBDAG = 10
>
> def logging_func(id):
>     log.info("Now running id: {}".format(id))
>
> def build_dag(dag_id, num_ops):
>     # Build one subDAG: a start task fanning out to num_ops parallel tasks.
>     dag = DAG(dag_id)
>     start_op = DummyOperator(task_id='start', dag=dag)
>     for i in range(num_ops):
>         op = PythonOperator(
>             task_id=str(i),
>             python_callable=logging_func,
>             op_args=[i],
>             dag=dag
>         )
>         start_op >> op
>     return dag
>
> parent_id = 'consistent_failure'
> with DAG(
>     parent_id
> ) as dag:
>     start_op = DummyOperator(task_id='start')
>     # The parent DAG fans out to NUM_SUBDAGS parallel subDAGs.
>     for i in range(NUM_SUBDAGS):
>         task_id = "subdag_{}".format(i)
>         op = SubDagOperator(
>             task_id=task_id,
>             subdag=build_dag("{}.{}".format(parent_id, task_id), NUM_OPS_PER_SUBDAG)
>         )
>         start_op >> op
> {code}
> When I trigger this DAG, Airflow tries to run many of the subDAGs at the same time. Since they don't share dag_concurrency with their parent DAG, each of them also tries to run all of its operators in parallel, which results in 500+ python processes created by Airflow.
> Ideally those subDAGs should share dag_concurrency with their parent DAG (and thus with each other too), so that when I trigger this DAG, at most 16 operators, including the ones in the subDAGs, are running at any time.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3653) Pausing a Dag does not pause its SubDags
[ https://issues.apache.org/jira/browse/AIRFLOW-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-3653: -- Assignee: Chao-Han Tsai > Pausing a Dag does not pause its SubDags > > > Key: AIRFLOW-3653 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3653 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Håvard Wall >Assignee: Chao-Han Tsai >Priority: Major > > Pausing a DAG has no effect on its running SubDags; they are allowed to run to completion. > Expected behavior: a running SubDag completes its currently running task but does not start a new one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1077) Subdags can deadlock
[ https://issues.apache.org/jira/browse/AIRFLOW-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-1077: -- Assignee: Chao-Han Tsai > Subdags can deadlock > > > Key: AIRFLOW-1077 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1077 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alex Guziel >Assignee: Chao-Han Tsai >Priority: Major > > Given a concurrency of n, if all n running tasks are Subdags, the subdags > block any of their tasks from executing, leading to deadlock -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4420) Backfill respects task_concurrency
Chao-Han Tsai created AIRFLOW-4420: -- Summary: Backfill respects task_concurrency Key: AIRFLOW-4420 URL: https://issues.apache.org/jira/browse/AIRFLOW-4420 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Airflow backfill should respect [task_concurrency|https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/airflow/models/baseoperator.py#L195-L197]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4422) Stats about pool utilization
Chao-Han Tsai created AIRFLOW-4422: -- Summary: Stats about pool utilization Key: AIRFLOW-4422 URL: https://issues.apache.org/jira/browse/AIRFLOW-4422 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently we have stats on the number of starving tasks in the pool. We should also add pool stats for used_slots and open_slots. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4421) task related stats should be optional
Chao-Han Tsai created AIRFLOW-4421: -- Summary: task related stats should be optional Key: AIRFLOW-4421 URL: https://issues.apache.org/jira/browse/AIRFLOW-4421 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently Airflow emits task stats by default. The number of tasks can be huge and put a heavy load on the stats backend. This ticket therefore aims to turn these stats off by default and allow them to be turned on via a flag. https://github.com/apache/airflow/blob/af3090786b170baf32c75fbd03c5f277c3ffaef8/docs/metrics.rst -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-4254) Create a Presto operator
[ https://issues.apache.org/jira/browse/AIRFLOW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai closed AIRFLOW-4254. -- Resolution: Won't Fix > Create a Presto operator > > > Key: AIRFLOW-4254 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4254 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Create a presto operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1022) Subdag can't receive templated fields
[ https://issues.apache.org/jira/browse/AIRFLOW-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-1022: -- Assignee: (was: Chao-Han Tsai) > Subdag can't receive templated fields > - > > Key: AIRFLOW-1022 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1022 > Project: Apache Airflow > Issue Type: Improvement > Components: subdag >Reporter: Marcos Takahashi >Priority: Major > Labels: easyfix > > SubDags can't receive any templated fields because the operator's templated fields are set to tuple() (https://github.com/apache/incubator-airflow/blob/master/airflow/operators/subdag_operator.py#L24) rather than to a templated dict as on PythonOperator (https://github.com/apache/incubator-airflow/blob/master/airflow/operators/python_operator.py#L52). > That makes it impossible to get some important values like execution_date. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-2552) Improve description for airflow backfill -c
[ https://issues.apache.org/jira/browse/AIRFLOW-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai closed AIRFLOW-2552. -- Resolution: Won't Fix > Improve description for airflow backfill -c > --- > > Key: AIRFLOW-2552 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2552 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Improve the description for {code}airflow backfill -c{code} > Example: JSON string that gets pickled into the DagRun / backfill 's conf > attribute -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-4190) Add instrumentation of schedule delay
[ https://issues.apache.org/jira/browse/AIRFLOW-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai closed AIRFLOW-4190. -- Resolution: Won't Fix > Add instrumentation of schedule delay > - > > Key: AIRFLOW-4190 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4190 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Measures the delay between the scheduled DAG start time (e.g. > next_execution_date) and the wall clock time when first task executes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2528) Airflow cli does not allow disabled stdin
[ https://issues.apache.org/jira/browse/AIRFLOW-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-2528: -- Assignee: (was: Chao-Han Tsai) > Airflow cli does not allow disabled stdin > - > > Key: AIRFLOW-2528 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2528 > Project: Apache Airflow > Issue Type: Bug > Components: cli >Affects Versions: 1.9.0 >Reporter: Brent Johnson >Priority: Major > > So basically, I am trying to automate regression testing by executing an airflow dag. I > > Using the cli I can successfully run the following command: > {code:java} ./airflow run regression-testing regression-ingestion 2018-05-25{code} > Unfortunately, I want to trigger this against our staging instance in production on AWS. > I figured an easy way to do this would be to use [AWS Systems Manager|https://docs.aws.amazon.com/systems-manager/latest/userguide/run-command.html]. Unfortunately, any airflow command I call returns: > {code:java} the input device is not a TTY {code} > I was able to recreate this locally by running the following command, which redirects stdin elsewhere: > {code:java} ./airflow run regression-testing regression-ingestion 2018-05-25 0>/dev/null{code} > This is of course an extreme example, but it feels like a bug for a cli to require stdin to be open. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-4189) Airflow table retention
[ https://issues.apache.org/jira/browse/AIRFLOW-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai closed AIRFLOW-4189. -- Resolution: Won't Fix > Airflow table retention > --- > > Key: AIRFLOW-4189 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4189 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Airflow scheduler cleans up records out of retention window in the Airflow > table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4402) Update super() calls for nvd3
[ https://issues.apache.org/jira/browse/AIRFLOW-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824823#comment-16824823 ] Chao-Han Tsai commented on AIRFLOW-4402: Hmm i have seen changes on those files. > Update super() calls for nvd3 > - > > Key: AIRFLOW-4402 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4402 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} > by {{super().__init__(...)}} > Similarly for any other {{super}} calls for other methods. > (In Python 3 {{super(__class__, self) == super()}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4402) Update super() calls for nvd3
[ https://issues.apache.org/jira/browse/AIRFLOW-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4402: --- Description: In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} by {{super().__init__(...)}} Similarly for any other {{super}} calls for other methods. (In Python 3 {{super(__class__, self) == super()}}) was: In all classes, replace {{super(__class__, self).__init__(...)}} by {{super().__init__(...)}} Similarly for any other {{super}} calls for other methods. (In Python 3 {{super(__class__, self) == super()}}) > Update super() calls for nvd3 > - > > Key: AIRFLOW-4402 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4402 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} > by {{super().__init__(...)}} > Similarly for any other {{super}} calls for other methods. > (In Python 3 {{super(__class__, self) == super()}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4402) Update super() calls for nvd3
Chao-Han Tsai created AIRFLOW-4402: -- Summary: Update super() calls for nvd3 Key: AIRFLOW-4402 URL: https://issues.apache.org/jira/browse/AIRFLOW-4402 Project: Apache Airflow Issue Type: Sub-task Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Fix For: 2.0.0 In all classes, replace {{super(__class__, self).__init__(...)}} by {{super().__init__(...)}} Similarly for any other {{super}} calls for other methods. (In Python 3 {{super(__class__, self) == super()}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
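The replacement described in the ticket can be sketched as follows. This is a minimal illustration only: the class names `Chart`, `LegacyLineChart`, and `LineChart` are hypothetical and are not taken from the nvd3 code.

```python
class Chart:
    def __init__(self, name):
        self.name = name


class LegacyLineChart(Chart):
    def __init__(self, name):
        # Python 2 compatible spelling: class and instance are passed explicitly.
        super(LegacyLineChart, self).__init__(name)
        self.kind = "line"


class LineChart(Chart):
    def __init__(self, name):
        # Python 3 zero-argument form; it resolves to the same parent method.
        super().__init__(name)
        self.kind = "line"


legacy = LegacyLineChart("cpu")
modern = LineChart("cpu")
```

Both instances end up with the same attributes, so the change is purely a spelling cleanup once Python 2 support is dropped.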
[jira] [Created] (AIRFLOW-4398) Extra links include try_number
Chao-Han Tsai created AIRFLOW-4398: -- Summary: Extra links include try_number Key: AIRFLOW-4398 URL: https://issues.apache.org/jira/browse/AIRFLOW-4398 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently airflow supports constructing extra links using task and execution_date. https://github.com/apache/airflow/blob/master/airflow/models/baseoperator.py#L977 We should also include the try_number into the scope. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3407) BaseOperator and LoggingMixin do not call super().__init__
[ https://issues.apache.org/jira/browse/AIRFLOW-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-3407: -- Assignee: Chao-Han Tsai > BaseOperator and LoggingMixin do not call super().__init__ > -- > > Key: AIRFLOW-3407 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3407 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Affects Versions: 1.10.1 >Reporter: adam hitchcock >Assignee: Chao-Han Tsai >Priority: Major > > The {{BaseOperator}} is not necessarily the last class in the MRO; usually it > is best practice to always call {{super().__init__(*args, **kwargs)}} > to make sure that every class gets it chance to {{__init__}}. > Is there a specific reason {{BaseOperator}} doesn't call super? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
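The MRO concern raised in this ticket can be illustrated with a small sketch. All class names below are hypothetical stand-ins; this is not Airflow's actual `BaseOperator` or `LoggingMixin` code.

```python
class NonCooperativeBase:
    def __init__(self, *args, **kwargs):
        # Does not call super().__init__, so the MRO chain stops here.
        self.base_ready = True


class CooperativeBase:
    def __init__(self, *args, **kwargs):
        # Passes control to the next class in the MRO.
        super().__init__(*args, **kwargs)
        self.base_ready = True


class AuditMixin:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.audit_ready = True


class BrokenOperator(NonCooperativeBase, AuditMixin):
    pass


class WorkingOperator(CooperativeBase, AuditMixin):
    pass


broken = BrokenOperator()
working = WorkingOperator()
```

In `BrokenOperator`, `AuditMixin.__init__` never runs because the class ahead of it in the MRO does not delegate, which is the situation the reporter describes for classes placed after `BaseOperator`.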
[jira] [Updated] (AIRFLOW-4362) Fix test_execution_limited_parallelism logic
[ https://issues.apache.org/jira/browse/AIRFLOW-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4362: --- Description: - We should use `assertEqual` to compare two items instead of using `assertTrue`. - Ensure that all queues are empty after the executor ends. When executor.end() is called, it ensures that the queues are empty with: https://github.com/apache/airflow/blob/master/airflow/executors/local_executor.py#L195-L196 https://github.com/apache/airflow/blob/master/airflow/executors/local_executor.py#L205 So I am adding an assertion in the test to verify that. was: - We should use `assertEqual` to compare two items instead of using `assertTrue`. - Ensure that all queues are empty after the executor ends. > Fix test_execution_limited_parallelism logic > > > Key: AIRFLOW-4362 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4362 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > - We should use `assertEqual` to compare two items instead of using `assertTrue`. > - Ensure that all queues are empty after the executor ends. When executor.end() is called, it ensures that the queues are empty with: > https://github.com/apache/airflow/blob/master/airflow/executors/local_executor.py#L195-L196 > https://github.com/apache/airflow/blob/master/airflow/executors/local_executor.py#L205 > So I am adding an assertion in the test to verify that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
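One common reason to prefer `assertEqual` is sketched below. It assumes the pitfall is passing two values to `assertTrue`; the ticket does not say exactly how the original test misused it, and the test class here is hypothetical.

```python
import unittest


class AssertionPitfall(unittest.TestCase):
    def test_assert_true_ignores_second_value(self):
        # The second positional argument of assertTrue is the failure message,
        # so this passes even though 2 != 3.
        self.assertTrue(2, 3)

    def test_assert_equal_compares_values(self):
        # assertEqual actually compares the two values and fails on mismatch.
        with self.assertRaises(AssertionError):
            self.assertEqual(2, 3)


def run_suite():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertionPitfall)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` runs both tests; the first one passing is the point of the illustration.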
[jira] [Updated] (AIRFLOW-4362) Fix test_execution_limited_parallelism logic
[ https://issues.apache.org/jira/browse/AIRFLOW-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4362: --- Description: - We should use `assertEqual` to compare two items instead of using `assertTrue`. - Ensure that all queues are empty after the executor ends. > Fix test_execution_limited_parallelism logic > > > Key: AIRFLOW-4362 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4362 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > - We should use `assertEqual` to compare two items instead of using `assertTrue`. > - Ensure that all queues are empty after the executor ends. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-4362) Fix test_execution_limited_parallelism logic
[ https://issues.apache.org/jira/browse/AIRFLOW-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-4362: -- Assignee: Chao-Han Tsai > Fix test_execution_limited_parallelism logic > > > Key: AIRFLOW-4362 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4362 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4361) Fix flaky test_integration_run_dag_with_scheduler_failure
Chao-Han Tsai created AIRFLOW-4361: -- Summary: Fix flaky test_integration_run_dag_with_scheduler_failure Key: AIRFLOW-4361 URL: https://issues.apache.org/jira/browse/AIRFLOW-4361 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai test_integration_run_dag_with_scheduler_failure often fails with {code} ConnectionError: HTTPConnectionPool(host='10.20.3.19', port=30809): Max retries exceeded with url: /api/experimental/dags/example_kubernetes_executor_config/paused/false (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',)) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4348) Add a GCP console link in BigQueryOperator
Chao-Han Tsai created AIRFLOW-4348: -- Summary: Add a GCP console link in BigQueryOperator Key: AIRFLOW-4348 URL: https://issues.apache.org/jira/browse/AIRFLOW-4348 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Add a GCP console link in BigQueryOperator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-4191) Add stats about pool utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-4191. Resolution: Fixed > Add stats about pool utilization > > > Key: AIRFLOW-4191 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4191 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Add stats about utilization of each pool (open_slots, used_slots, > queued_slots). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4307) Backfill respect concurrency limit
[ https://issues.apache.org/jira/browse/AIRFLOW-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4307: --- Description: Currently backfill respects the `pool` limit and `max_active_runs`. It is probably a good idea to make it respect the concurrency limit as well, so that we won't launch a big backfill that occupies all the resources. > Backfill respect concurrency limit > -- > > Key: AIRFLOW-4307 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4307 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently backfill respects the `pool` limit and `max_active_runs`. It is probably a good idea to make it respect the concurrency limit as well, so that we won't launch a big backfill that occupies all the resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4307) Backfill respect concurrency limit
Chao-Han Tsai created AIRFLOW-4307: -- Summary: Backfill respect concurrency limit Key: AIRFLOW-4307 URL: https://issues.apache.org/jira/browse/AIRFLOW-4307 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4306) Global operator extra links
Chao-Han Tsai created AIRFLOW-4306: -- Summary: Global operator extra links Key: AIRFLOW-4306 URL: https://issues.apache.org/jira/browse/AIRFLOW-4306 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai A way to register global operator extra links that are shared by all the operators. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4303) Send email report in the end of backfill
Chao-Han Tsai created AIRFLOW-4303: -- Summary: Send email report in the end of backfill Key: AIRFLOW-4303 URL: https://issues.apache.org/jira/browse/AIRFLOW-4303 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Send an email report at the end of an airflow backfill when the user runs the backfill with the `--email` flag. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4254) Create a Presto operator
Chao-Han Tsai created AIRFLOW-4254: -- Summary: Create a Presto operator Key: AIRFLOW-4254 URL: https://issues.apache.org/jira/browse/AIRFLOW-4254 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Create a presto operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4251) Instrument DagRun schedule delay
Chao-Han Tsai created AIRFLOW-4251: -- Summary: Instrument DagRun schedule delay Key: AIRFLOW-4251 URL: https://issues.apache.org/jira/browse/AIRFLOW-4251 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Instrument DagRun schedule delay - time between expected DagRun start date and the actual DagRun start date. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
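The metric described in this ticket is a plain time difference. A minimal sketch, assuming both timestamps are available as `datetime` values; the function name `schedule_delay_seconds` is hypothetical and is not an Airflow API:

```python
from datetime import datetime


def schedule_delay_seconds(expected_start, actual_start):
    """Return the delay between expected and actual DagRun start, in seconds."""
    return (actual_start - expected_start).total_seconds()


# A run expected at midnight that actually started five minutes later.
delay = schedule_delay_seconds(
    datetime(2019, 4, 1, 0, 0, 0),
    datetime(2019, 4, 1, 0, 5, 0),
)
```

Here `delay` is `300.0`. How the value would be emitted to a stats backend is left out, since the ticket does not specify it.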
[jira] [Assigned] (AIRFLOW-4217) Remove six package in project
[ https://issues.apache.org/jira/browse/AIRFLOW-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-4217: -- Assignee: Chao-Han Tsai > Remove six package in project > - > > Key: AIRFLOW-4217 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4217 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: zhongjiajie >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > Remove all six package in Airflow -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-4215) Remove mock from the setup.py and move to internal unittest.mock
[ https://issues.apache.org/jira/browse/AIRFLOW-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-4215: -- Assignee: Chao-Han Tsai > Remove mock from the setup.py and move to internal unittest.mock > > > Key: AIRFLOW-4215 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4215 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Fokko Driesprong >Assignee: Chao-Han Tsai >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
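For context, the standard-library `unittest.mock` module offers the same `Mock` interface as the third-party `mock` package on Python 3. The sketch below shows the import style the ticket title points to; `fetch_status` and `fake_client` are hypothetical names used only for illustration.

```python
from unittest import mock  # replaces the third-party "import mock"


def fetch_status(client):
    # Code under test: calls a collaborator and normalizes the result.
    return client.get_status().upper()


fake_client = mock.Mock()
fake_client.get_status.return_value = "ok"
status = fetch_status(fake_client)
```

With the standard-library import in place, the separate `mock` dependency in setup.py is no longer needed for tests written this way.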
[jira] [Assigned] (AIRFLOW-4208) Replace @abstractproperty by @abstractmethod
[ https://issues.apache.org/jira/browse/AIRFLOW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-4208: -- Assignee: Chao-Han Tsai > Replace @abstractproperty by @abstractmethod > > > Key: AIRFLOW-4208 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4208 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Fokko Driesprong >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > Replace @abstractproperty by @abstractmethod (see > https://docs.python.org/3/library/abc.html#abc.abstractproperty) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
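`abc.abstractproperty` has been deprecated since Python 3.3 in favor of stacking `@property` on top of `@abstractmethod`. A minimal sketch of the replacement, using hypothetical class names rather than Airflow's own:

```python
from abc import ABC, abstractmethod


class BaseExample(ABC):
    @property
    @abstractmethod
    def conn_name(self):
        """Subclasses must provide this read-only property."""


class ConcreteExample(BaseExample):
    @property
    def conn_name(self):
        return "default"


def can_instantiate(cls):
    # Abstract classes raise TypeError on instantiation.
    try:
        cls()
    except TypeError:
        return False
    return True
```

`BaseExample` cannot be instantiated until a subclass overrides `conn_name`, which is the same guarantee `@abstractproperty` used to give.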
[jira] [Assigned] (AIRFLOW-4204) Update super() calls
[ https://issues.apache.org/jira/browse/AIRFLOW-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai reassigned AIRFLOW-4204: -- Assignee: Chao-Han Tsai > Update super() calls > - > > Key: AIRFLOW-4204 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4204 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Fokko Driesprong >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > In all classes, replace {{super(__class__, self).__init__(...)}} by > {{super().__init__(...)}} > Similarly for any other {{super}} calls for other methods. > (In Python 3 {{super(__class__, self) == super()}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4190) Add instrumentation of schedule delay
[ https://issues.apache.org/jira/browse/AIRFLOW-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4190: --- Summary: Add instrumentation of schedule delay (was: Add a schedule delay monitoring DAG) > Add instrumentation of schedule delay > - > > Key: AIRFLOW-4190 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4190 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > The DAG measures the delay between the scheduled DAG start time (e.g. > next_execution_date) and the wall clock time when first task executes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4189) Airflow table retention
[ https://issues.apache.org/jira/browse/AIRFLOW-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4189: --- Description: Airflow scheduler cleans up records out of retention window in the Airflow table. (was: Create an Airflow DAG that cleans up the records that are out of retention period in the metastore. - We probably need to first modify each table to record the `last_modfiy_date` to support retention period. - User will specify the retention_period in the airflow.cfg) > Airflow table retention > --- > > Key: AIRFLOW-4189 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4189 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Airflow scheduler cleans up records out of retention window in the Airflow > table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4189) Airflow table retention
[ https://issues.apache.org/jira/browse/AIRFLOW-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4189: --- Summary: Airflow table retention (was: Add an airflowdb retention DAG) > Airflow table retention > --- > > Key: AIRFLOW-4189 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4189 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Create an Airflow DAG that cleans up the records that are out of retention > period in the metastore. > - We probably need to first modify each table to record the > `last_modfiy_date` to support retention period. > - User will specify the retention_period in the airflow.cfg -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4191) Add stats about pool utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4191: --- Summary: Add stats about pool utilization (was: Add a pool utilization instrument DAG) > Add stats about pool utilization > > > Key: AIRFLOW-4191 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4191 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > This DAG measures the utilization of each pool (open_slots, used_slots, > queued_slots). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4191) Add stats about pool utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4191: --- Description: Add stats about utilization of each pool (open_slots, used_slots, queued_slots). (was: This DAG measures the utilization of each pool (open_slots, used_slots, queued_slots).) > Add stats about pool utilization > > > Key: AIRFLOW-4191 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4191 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Add stats about utilization of each pool (open_slots, used_slots, > queued_slots). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4190) Add instrumentation of schedule delay
[ https://issues.apache.org/jira/browse/AIRFLOW-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4190: --- Description: Measures the delay between the scheduled DAG start time (e.g. next_execution_date) and the wall clock time when first task executes. (was: The DAG measures the delay between the scheduled DAG start time (e.g. next_execution_date) and the wall clock time when first task executes.) > Add instrumentation of schedule delay > - > > Key: AIRFLOW-4190 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4190 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Measures the delay between the scheduled DAG start time (e.g. > next_execution_date) and the wall clock time when first task executes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4194) set dag_run state to failed when user terminate backfill
[ https://issues.apache.org/jira/browse/AIRFLOW-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4194: --- Description: Reset the dag_run state to failed if the user terminates the backfill. Otherwise the dag_run stays in the running state, which counts against max_active_dagruns. (was: Currently, when the user terminates the backfill, we set the task_instance state to failed. We should also set the dag_run state to failed.) > set dag_run state to failed when user terminate backfill > > > Key: AIRFLOW-4194 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4194 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Reset the dag_run state to failed if the user terminates the backfill. Otherwise the dag_run stays in the running state, which counts against max_active_dagruns. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4194) set dag_run state to failed when user terminate backfill
[ https://issues.apache.org/jira/browse/AIRFLOW-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806071#comment-16806071 ] Chao-Han Tsai commented on AIRFLOW-4194: [~TaoFeng] I remember that you contributed something related before but I can't find it in the code. > set dag_run state to failed when user terminate backfill > > > Key: AIRFLOW-4194 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4194 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently when user terminate the backfill, we set the task_instance state to > failed. We should also set the dag_run state to failed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4194) set dag_run state to failed when user terminate backfill
Chao-Han Tsai created AIRFLOW-4194: -- Summary: set dag_run state to failed when user terminate backfill Key: AIRFLOW-4194 URL: https://issues.apache.org/jira/browse/AIRFLOW-4194 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently when user terminate the backfill, we set the task_instance state to failed. We should also set the dag_run state to failed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4190) Add a schedule delay monitoring DAG
[ https://issues.apache.org/jira/browse/AIRFLOW-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805890#comment-16805890 ] Chao-Han Tsai commented on AIRFLOW-4190: [~TaoFeng] I am thinking that we can probably create another category of utility DAGs that are used just for the purpose of maintaining an Airflow cluster, for example cluster monitoring and log retention. People can choose through airflow.cfg whether to load these DAGs, just like how we control the example_dags. > Add a schedule delay monitoring DAG > --- > > Key: AIRFLOW-4190 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4190 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > The DAG measures the delay between the scheduled DAG start time (e.g. > next_execution_date) and the wall clock time when first task executes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4138) [AIP] Introduce DAG manifest
[ https://issues.apache.org/jira/browse/AIRFLOW-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4138: --- Summary: [AIP] Introduce DAG manifest (was: Introduce DAG manifest) > [AIP] Introduce DAG manifest > > > Key: AIRFLOW-4138 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4138 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently Airflow traverses all the files under $AIRFLOW_HOME/dags and find > out the DAGs by inspecting the code. We should explicitly specify which files > are DAG. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-4119) Instrument dagrun start time delay
[ https://issues.apache.org/jira/browse/AIRFLOW-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai closed AIRFLOW-4119. -- Resolution: Won't Fix > Instrument dagrun start time delay > -- > > Key: AIRFLOW-4119 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4119 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4139) [AIP] DAG versioning
[ https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4139: --- Summary: [AIP] DAG versioning (was: DAG versioning) > [AIP] DAG versioning > > > Key: AIRFLOW-4139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4139 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently, running DagRun will be impacted if we change the DAG file in the > middle of the run. After we have > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher > and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest > ready, we can start saving each version of the DAG file on the remote system > and the running tasks should refer to a specific version of DAG instead of > the latest DAG. > How is it different from > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB? > Please see > https://issues.apache.org/jira/browse/AIRFLOW-4139?focusedCommentId=16799264=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16799264 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4191) Add a pool utilization instrument DAG
Chao-Han Tsai created AIRFLOW-4191: -- Summary: Add a pool utilization instrument DAG Key: AIRFLOW-4191 URL: https://issues.apache.org/jira/browse/AIRFLOW-4191 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai This DAG measures the utilization of each pool (open_slots, used_slots, queued_slots). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4190) Add a schedule delay monitoring DAG
Chao-Han Tsai created AIRFLOW-4190: -- Summary: Add a schedule delay monitoring DAG Key: AIRFLOW-4190 URL: https://issues.apache.org/jira/browse/AIRFLOW-4190 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai The DAG measures the delay between the scheduled DAG start time (e.g. next_execution_date) and the wall clock time when first task executes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4189) Add an airflowdb retention DAG
Chao-Han Tsai created AIRFLOW-4189: -- Summary: Add an airflowdb retention DAG Key: AIRFLOW-4189 URL: https://issues.apache.org/jira/browse/AIRFLOW-4189 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Create an Airflow DAG that cleans up the records that are out of retention period in the metastore. - We probably need to first modify each table to record the `last_modfiy_date` to support retention period. - User will specify the retention_period in the airflow.cfg -- This message was sent by Atlassian JIRA (v7.6.3#76005)
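The cleanup step proposed in AIRFLOW-4189 amounts to selecting rows whose last modification time is older than a cutoff. Here is a minimal sketch under stated assumptions: records are plain dictionaries, the field name `last_modify_date` is hypothetical (the ticket only proposes such a column), and the retention period is passed in directly rather than read from airflow.cfg.

```python
from datetime import datetime, timedelta


def expired_records(records, now, retention_period):
    """Return the records that fall outside the retention period."""
    cutoff = now - retention_period
    # Hypothetical field name; the ticket proposes adding such a column to each table.
    return [r for r in records if r["last_modify_date"] < cutoff]


# Example data for illustration only.
now = datetime(2019, 4, 1)
rows = [
    {"id": 1, "last_modify_date": datetime(2019, 1, 1)},
    {"id": 2, "last_modify_date": datetime(2019, 3, 25)},
]
print([r["id"] for r in expired_records(rows, now, timedelta(days=30))])  # [1]
```

A real retention DAG would issue a DELETE against the metastore instead of filtering in memory; the in-memory filter only shows the cutoff logic.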
[jira] [Resolved] (AIRFLOW-4118) Instrument dagrun duration
[ https://issues.apache.org/jira/browse/AIRFLOW-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-4118. Resolution: Fixed > Instrument dagrun duration > -- > > Key: AIRFLOW-4118 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4118 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4163) IntervalCheckOperator support relative difference ratio and can ignore zero
[ https://issues.apache.org/jira/browse/AIRFLOW-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4163: --- Description: - IntervalCheckOperator takes max/min ratio of two values for a metric and returns true if it is less than threshold. Currently if one of the values is 0, it assigns the ratio as None. In python comparison None < Number is always true. We should add an option to fail the task if one of the value is 0. - Currently it only supports Max/Min. It would be useful to support calculating ratio with relative difference. was: IntervalCheckOperator takes max/min ratio of two values for a metric and returns true if it is less than threshold. The bug is if one of the values is 0, it assigns the ratio as None. In python comparison None < Number is always true. We should change that to fail the test Also we should support calculating ratio with relative difference. > IntervalCheckOperator support relative difference ratio and can ignore zero > > > Key: AIRFLOW-4163 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4163 > Project: Apache Airflow > Issue Type: Bug >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > - IntervalCheckOperator takes max/min ratio of two values for a metric and > returns true if it is less than threshold. Currently if one of the values is > 0, it assigns the ratio as None. In python comparison None < Number is always > true. We should add an option to fail the task if one of the value is 0. > - Currently it only supports Max/Min. It would be useful to support > calculating ratio with relative difference. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
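To make the two ratio definitions in AIRFLOW-4163 concrete, here is a small standalone sketch. It is not the operator's implementation: the function names, the `ignore_zero` flag, and the choice to return None for a zero value are assumptions for illustration. Note that `None < number` evaluates to True only in Python 2; in Python 3 the same comparison raises TypeError, so the None case is handled explicitly.

```python
def max_over_min(current, reference):
    """Ratio of the larger value to the smaller one; None if either value is 0."""
    if current == 0 or reference == 0:
        return None
    return max(current, reference) / min(current, reference)


def relative_diff(current, reference):
    """Absolute difference relative to the reference value; None if the reference is 0."""
    if reference == 0:
        return None
    return abs(current - reference) / abs(reference)


def check_passes(ratio, threshold, ignore_zero=True):
    """Pass when the ratio is below the threshold.

    A None ratio (a zero value was seen) passes only if ignore_zero is True,
    instead of relying on an implicit None < number comparison.
    """
    if ratio is None:
        return ignore_zero
    return ratio < threshold


print(check_passes(max_over_min(10, 5), threshold=1.5))                    # False
print(check_passes(relative_diff(12, 10), threshold=1.5))                  # True
print(check_passes(max_over_min(0, 5), threshold=1.5, ignore_zero=False))  # False
```

With `ignore_zero=False` a zero value fails the check, which is the option the ticket asks for.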
[jira] [Updated] (AIRFLOW-4163) IntervalCheckOperator support relative difference ratio and can ignore zero
[ https://issues.apache.org/jira/browse/AIRFLOW-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4163: --- Summary: IntervalCheckOperator support relative difference ratio and can ignore zero (was: IntervalCheckOperator support relative difference ratio) > IntervalCheckOperator support relative difference ratio and can ignore zero > > > Key: AIRFLOW-4163 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4163 > Project: Apache Airflow > Issue Type: Bug >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > IntervalCheckOperator takes max/min ratio of two values for a metric and > returns true if it is less than threshold. > The bug is if one of the values is 0, it assigns the ratio as None. In python > comparison None < Number is always true. We should change that to fail the > test > Also we should support calculating ratio with relative difference. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4163) IntervalCheckOperator support relative difference ratio
[ https://issues.apache.org/jira/browse/AIRFLOW-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4163: --- Description: IntervalCheckOperator takes max/min ratio of two values for a metric and returns true if it is less than threshold. The bug is if one of the values is 0, it assigns the ratio as None. In python comparison None < Number is always true. We should change that to fail the test Also we should support calculating ratio with relative difference. was: IntervalCheckOperator takes max/min ratio of two values for a metric and returns true if it is less than threshold. The bug is if one of the values is 0, it assigns the ratio as None. In python comparison None < Number is always true. We should change that to fail the test > IntervalCheckOperator support relative difference ratio > > > Key: AIRFLOW-4163 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4163 > Project: Apache Airflow > Issue Type: Bug >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > IntervalCheckOperator takes max/min ratio of two values for a metric and > returns true if it is less than threshold. > The bug is if one of the values is 0, it assigns the ratio as None. In python > comparison None < Number is always true. We should change that to fail the > test > Also we should support calculating ratio with relative difference. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4163) IntervalCheckOperator support relative difference ratio
[ https://issues.apache.org/jira/browse/AIRFLOW-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4163: --- Summary: IntervalCheckOperator support relative difference ratio (was: IntervalCheckOperator comparison doesn't work for null values) > IntervalCheckOperator support relative difference ratio > > > Key: AIRFLOW-4163 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4163 > Project: Apache Airflow > Issue Type: Bug >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > IntervalCheckOperator takes max/min ratio of two values for a metric and > returns true if it is less than threshold. > The bug is if one of the values is 0, it assigns the ratio as None. In python > comparison None < Number is always true. We should change that to fail the > test -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4139) DAG versioning
[ https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4139: --- Description: Currently, running DagRun will be impacted if we change the DAG file in the middle of the run. After we have (was: Currently, running DagRun will be impacted if we change the DAG file. Existing running DagRun should not be impacted.) > DAG versioning > -- > > Key: AIRFLOW-4139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4139 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently, running DagRun will be impacted if we change the DAG file in the > middle of the run. After we have -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4139) DAG versioning
[ https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799264#comment-16799264 ] Chao-Han Tsai commented on AIRFLOW-4139: I took a quick scan of https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB and I think that they are only persisting the DAG graph for each DAG version in the metastore, just for the purpose of making the webserver stateless. That is not enough information to run a task at a given DAG version. We will need to version the code in a way that an Airflow worker can refer to. What I am proposing here is that once we have https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest ready, we will have the DAG file stored in versioned storage, and each running TaskInstance would run a specific version of the code. > DAG versioning > -- > > Key: AIRFLOW-4139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4139 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently, running DagRun will be impacted if we change the DAG file. > Existing running DagRun should not be impacted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4139) DAG versioning
[ https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4139: --- Description: Currently, running DagRun will be impacted if we change the DAG file in the middle of the run. After we have https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest ready, we can start saving each version of the DAG file on the remote system and the running tasks should refer to a specific version of DAG instead of the latest DAG. How is it different from https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB? Please see https://issues.apache.org/jira/browse/AIRFLOW-4139?focusedCommentId=16799264=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16799264 was: Currently, running DagRun will be impacted if we change the DAG file in the middle of the run. After we have https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest ready, we can start saving each version of the DAG file on the remote system and the running tasks should refer to a specific version of DAG instead of the latest DAG. How is it different from https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB? > DAG versioning > -- > > Key: AIRFLOW-4139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4139 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently, running DagRun will be impacted if we change the DAG file in the > middle of the run. 
After we have > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher > and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest > ready, we can start saving each version of the DAG file on the remote system > and the running tasks should refer to a specific version of DAG instead of > the latest DAG. > How is it different from > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB? > Please see > https://issues.apache.org/jira/browse/AIRFLOW-4139?focusedCommentId=16799264=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16799264 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4139) DAG versioning
[ https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4139: --- Description: Currently, running DagRun will be impacted if we change the DAG file in the middle of the run. After we have https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest ready, we can start saving each version of the DAG file on the remote system and the running tasks should refer to a specific version of DAG instead of the latest DAG. How is it different from https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB? was: Currently, running DagRun will be impacted if we change the DAG file in the middle of the run. After we have https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest ready, we can start saving each version of the DAG file on the remote system and the running tasks should refer to a specific version of DAG instead of the latest DAG. How is it different from > DAG versioning > -- > > Key: AIRFLOW-4139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4139 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently, running DagRun will be impacted if we change the DAG file in the > middle of the run. After we have > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher > and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest > ready, we can start saving each version of the DAG file on the remote system > and the running tasks should refer to a specific version of DAG instead of > the latest DAG. > How is it different from > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB? 
[jira] [Updated] (AIRFLOW-4139) DAG versioning
[ https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4139: --- Description: Currently, running DagRun will be impacted if we change the DAG file in the middle of the run. After we have https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest ready, we can start saving each version of the DAG file on the remote system and the running tasks should refer to a specific version of DAG instead of the latest DAG. How is it different from was:Currently, running DagRun will be impacted if we change the DAG file in the middle of the run. After we have > DAG versioning > -- > > Key: AIRFLOW-4139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4139 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently, running DagRun will be impacted if we change the DAG file in the > middle of the run. After we have > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher > and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest > ready, we can start saving each version of the DAG file on the remote system > and the running tasks should refer to a specific version of DAG instead of > the latest DAG. > How is it different from -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4139) DAG versioning
[ https://issues.apache.org/jira/browse/AIRFLOW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799265#comment-16799265 ] Chao-Han Tsai commented on AIRFLOW-4139: I think I had better add more description, as it must be very confusing what we are aiming for with this ticket. > DAG versioning > -- > > Key: AIRFLOW-4139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4139 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently, running DagRun will be impacted if we change the DAG file. > Existing running DagRun should not be impacted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4139) DAG versioning
Chao-Han Tsai created AIRFLOW-4139: -- Summary: DAG versioning Key: AIRFLOW-4139 URL: https://issues.apache.org/jira/browse/AIRFLOW-4139 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently, running DagRun will be impacted if we change the DAG file. Existing running DagRun should not be impacted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4138) Introduce DAG manifest
Chao-Han Tsai created AIRFLOW-4138: -- Summary: Introduce DAG manifest Key: AIRFLOW-4138 URL: https://issues.apache.org/jira/browse/AIRFLOW-4138 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently Airflow traverses all the files under $AIRFLOW_HOME/dags and find out the DAGs by inspecting the code. We should explicitly specify which files are DAG. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4119) Instrument dagrun start time delay
Chao-Han Tsai created AIRFLOW-4119: -- Summary: Instrument dagrun start time delay Key: AIRFLOW-4119 URL: https://issues.apache.org/jira/browse/AIRFLOW-4119 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4118) Instrument dagrun duration
Chao-Han Tsai created AIRFLOW-4118: -- Summary: Instrument dagrun duration Key: AIRFLOW-4118 URL: https://issues.apache.org/jira/browse/AIRFLOW-4118 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai -- This message was sent by Atlassian JIRA (v7.6.3#76005)