[GitHub] zhongjiajie commented on issue #4773: [AIRFLOW-3767] Correct bulk insert function
zhongjiajie commented on issue #4773: [AIRFLOW-3767] Correct bulk insert function URL: https://github.com/apache/airflow/pull/4773#issuecomment-468174775 CI tests failed many, many times with no detailed reason :sob: :sob: :sob: This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ryanyuan commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor
ryanyuan commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor URL: https://github.com/apache/airflow/pull/4786#discussion_r261073067

## File path: airflow/contrib/sensors/bigquery_sensor.py
## @@ -50,7 +50,7 @@ def __init__(self,
                  project_id,
                  dataset_id,
                  table_id,
-                 bigquery_conn_id='bigquery_default_conn',
+                 bigquery_conn_id='bigquery_default',

Review comment: @mik-laj Cool! :thumbsup:
[GitHub] XD-DENG commented on issue #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes issue in 2.6)
XD-DENG commented on issue #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes issue in 2.6) URL: https://github.com/apache/airflow/pull/4801#issuecomment-468151503 @feng-tao @Fokko PTAL.
[jira] [Resolved] (AIRFLOW-2767) Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE
[ https://issues.apache.org/jira/browse/AIRFLOW-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Feng resolved AIRFLOW-2767.
Resolution: Fixed
Assignee: (was: Siddharth Anand)

> Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE
>
> Key: AIRFLOW-2767
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2767
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Siddharth Anand
> Priority: Major
>
> Refer to the moderate-severity CVE in gunicorn 19.4.5 (apparently fixed in 19.5.0):
> [https://nvd.nist.gov/vuln/detail/CVE-2018-1000164]
> Currently, apache airflow's setup.py allows 19.4.0

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2767) Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE
[ https://issues.apache.org/jira/browse/AIRFLOW-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780144#comment-16780144 ]
ASF subversion and git services commented on AIRFLOW-2767:

Commit 71140dd2dfb63f16254420b8ba3a4a62b5919f45 in airflow's branch refs/heads/master from RosterIn
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=71140dd ]
[AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severity CVE (#4795)

> Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE
[GitHub] feng-tao commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…
feng-tao commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit… URL: https://github.com/apache/airflow/pull/4795#issuecomment-468149763 thanks @RosterIn
[jira] [Commented] (AIRFLOW-2767) Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE
[ https://issues.apache.org/jira/browse/AIRFLOW-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780141#comment-16780141 ]
ASF GitHub Bot commented on AIRFLOW-2767:

feng-tao commented on pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit… URL: https://github.com/apache/airflow/pull/4795

> Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE
[GitHub] feng-tao merged pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…
feng-tao merged pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit… URL: https://github.com/apache/airflow/pull/4795
[GitHub] codecov-io commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…
codecov-io commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit… URL: https://github.com/apache/airflow/pull/4795#issuecomment-468144716

# [Codecov](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=h1) Report
> Merging [#4795](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/b0c4d37fb5cbc471097f2383ac2f1c4f37a5c859?src=pr=desc) will **increase** coverage by `0.77%`.
> The diff coverage is `n/a`.

```diff
@@            Coverage Diff             @@
##           master    #4795      +/-   ##
==========================================
+ Coverage   74.44%   75.22%    +0.77%
==========================================
  Files         450      450
  Lines       28970    30099     +1129
==========================================
+ Hits        21567    22641     +1074
- Misses       7403     7458       +55
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| airflow/task/task\_runner/base\_task\_runner.py | `78.57% <0%> (-0.74%)` | :arrow_down: |
| ...irflow/contrib/example\_dags/example\_gcp\_spanner.py | `0% <0%> (ø)` | :arrow_up: |
| .../kubernetes\_request\_factory/pod\_request\_factory.py | `100% <0%> (ø)` | :arrow_up: |
| airflow/ti\_deps/dep\_context.py | `100% <0%> (ø)` | :arrow_up: |
| airflow/models/taskreschedule.py | `100% <0%> (ø)` | :arrow_up: |
| airflow/operators/python\_operator.py | `96.63% <0%> (+0.8%)` | :arrow_up: |
| airflow/models/\_\_init\_\_.py | `94.05% <0%> (+1.4%)` | :arrow_up: |
| airflow/contrib/utils/gcp\_field\_validator.py | `93.67% <0%> (+2.14%)` | :arrow_up: |
| airflow/utils/dates.py | `85.71% <0%> (+2.2%)` | :arrow_up: |
| airflow/contrib/hooks/gcp\_vision\_hook.py | `86.36% <0%> (+3.5%)` | :arrow_up: |
| ... and 1 more | | |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=footer). Last update [b0c4d37...66f5152](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] codecov-io commented on issue #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes issue in 2.6)
codecov-io commented on issue #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes issue in 2.6) URL: https://github.com/apache/airflow/pull/4801#issuecomment-468137177

# [Codecov](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=h1) Report
> Merging [#4801](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e?src=pr=desc) will **increase** coverage by `<.01%`.
> The diff coverage is `n/a`.

```diff
@@            Coverage Diff            @@
##           master    #4801     +/-   ##
=========================================
+ Coverage   74.44%   74.44%   +<.01%
=========================================
  Files         450      450
  Lines       28970    28970
=========================================
+ Hits        21566    21567       +1
+ Misses       7404     7403       -1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| airflow/models/\_\_init\_\_.py | `92.64% <0%> (+0.05%)` | :arrow_up: |

Last update [2ade912...ab1bf06](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=lastupdated).
[GitHub] fenglu-g commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling
fenglu-g commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling URL: https://github.com/apache/airflow/pull/4769#discussion_r261043400

## File path: airflow/jobs.py
## @@ -2168,147 +2168,149 @@ def _process_backfill_task_instances(self,
         # or leaf to root, as otherwise tasks might be
         # determined deadlocked while they are actually
         # waiting for their upstream to finish
+    @provide_session

Review comment: Any other concerns? @Fokko @ashb
[GitHub] fenglu-g commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling
fenglu-g commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling URL: https://github.com/apache/airflow/pull/4769#discussion_r261043313

## File path: airflow/jobs.py
## @@ -2168,147 +2168,149 @@ def _process_backfill_task_instances(self,
         # or leaf to root, as otherwise tasks might be
         # determined deadlocked while they are actually
         # waiting for their upstream to finish
+    @provide_session

Review comment: I think SQLAlchemy does the pooling but is un-opinionated about how sessions are managed. The following access pattern is recommended per https://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it, which is what Airflow follows: https://github.com/apache/airflow/blob/c50a85146373bafb0cbf86850f834d63bd4dede8/airflow/utils/db.py#L37.
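The session lifecycle described in that review comment (open a session per unit of work, commit on success, roll back on error, always close) can be sketched as a decorator in the spirit of Airflow's `provide_session`. This is a minimal illustration only: the `Session` class below is a stand-in for a real SQLAlchemy session, and all names are hypothetical rather than Airflow's actual implementation.

```python
from functools import wraps


class Session:
    """Stand-in for a SQLAlchemy session, just to show the lifecycle."""

    def __init__(self):
        self.committed = False
        self.rolled_back = False
        self.closed = False

    def commit(self):
        self.committed = True

    def rollback(self):
        self.rolled_back = True

    def close(self):
        self.closed = True


def provide_session(func):
    """If the caller did not pass a session, create one, commit on
    success, roll back on error, and always close it afterwards."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if kwargs.get("session") is not None:
            # The caller owns the session's lifecycle; don't interfere.
            return func(*args, **kwargs)
        session = Session()
        try:
            result = func(*args, session=session, **kwargs)
            session.commit()
            return result
        except Exception:
            session.rollback()
            raise
        finally:
            session.close()
    return wrapper


@provide_session
def double(x, session=None):
    # A trivial "query" standing in for real ORM work.
    return x * 2
```

The key property, matching the linked SQLAlchemy guidance, is that transaction scope is decided at the outermost call site rather than inside the data-access code.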
[GitHub] XD-DENG opened a new pull request #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes issue in 2.6)
XD-DENG opened a new pull request #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes issue in 2.6) URL: https://github.com/apache/airflow/pull/4801 There was an issue in `cryptography` 2.6, and https://github.com/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e pinned its version to avoid the issue. However, `cryptography` 2.6.1 was released very quickly to fix this issue, so we can remove the pin on `cryptography`.
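For illustration, the pin-then-unpin described above has roughly this shape in a `setup.py` requirements list. The version floor here is a hypothetical stand-in, not Airflow's actual constraint:

```python
# Hypothetical setup.py fragment showing the shape of the change.
install_requires = [
    # While 2.6 was broken, the dependency was capped below it:
    # 'cryptography>=1.0, <2.6',
    # After 2.6.1 shipped with the fix, only the floor remains:
    'cryptography>=1.0',
]
```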
[GitHub] codecov-io edited a comment on issue #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task
codecov-io edited a comment on issue #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task URL: https://github.com/apache/airflow/pull/4781#issuecomment-467492161

# [Codecov](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=h1) Report
> Merging [#4781](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e?src=pr=desc) will **increase** coverage by `<.01%`.
> The diff coverage is `100%`.

```diff
@@            Coverage Diff            @@
##           master    #4781     +/-   ##
=========================================
+ Coverage   74.44%   74.44%   +<.01%
=========================================
  Files         450      450
  Lines       28970    28970
=========================================
+ Hits        21566    21567       +1
+ Misses       7404     7403       -1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| airflow/jobs.py | `76.46% <100%> (ø)` | :arrow_up: |
| airflow/models/\_\_init\_\_.py | `92.64% <0%> (+0.05%)` | :arrow_up: |

Last update [2ade912...690aed5](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=lastupdated).
[GitHub] codecov-io commented on issue #4799: [AIRFLOW-3975] Handle null inputs in attribute renderers.
codecov-io commented on issue #4799: [AIRFLOW-3975] Handle null inputs in attribute renderers. URL: https://github.com/apache/airflow/pull/4799#issuecomment-468114720

# [Codecov](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=h1) Report
> Merging [#4799](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e?src=pr=desc) will **increase** coverage by `0.01%`.
> The diff coverage is `100%`.

```diff
@@            Coverage Diff            @@
##           master    #4799     +/-   ##
=========================================
+ Coverage   74.44%   74.45%   +0.01%
=========================================
  Files         450      450
  Lines       28970    28969      -1
=========================================
+ Hits        21566    21569      +3
+ Misses       7404     7400      -4
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| airflow/www/utils.py | `75.39% <100%> (+1.43%)` | :arrow_up: |
| airflow/models/\_\_init\_\_.py | `92.64% <0%> (+0.05%)` | :arrow_up: |

Last update [2ade912...3e65674](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=lastupdated).
[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780002#comment-16780002 ]
ASF GitHub Bot commented on AIRFLOW-3853:

samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload URL: https://github.com/apache/airflow/pull/4675

Make sure you have checked _all_ steps below.

### Jira
- [X] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-3853) issues and references them in the PR title. For example, "\[AIRFLOW-3853\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3853

### Description
- [X] We've recently started to see duplicate logs in S3. After digging into it, we discovered that this was due to our use of the new `reschedule` mode on our sensors. Because the same `try_number` is used when a task reschedules, the local log file frequently contains results from previous attempts. Additionally, because the `s3_task_helper.py` always tries to `append` the local log file to the remote log file, this can result in massive logs (we found one that was 400 MB). To fix this, we'd like to remove the local log after a successful upload. Because the file is uploaded to S3, no data will be lost.

### Tests
- [X] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: I've modified the following unit tests to cover the change to `s3_write`: `test_write`, `test_write_existing`, `test_write_raises`.

### Commits
- [X] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [X] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
  - All the public functions and the classes in the PR contain docstrings that explain what it does

### Code Quality
- [X] Passes `flake8`

> Duplicate Logs appearing in S3
>
> Key: AIRFLOW-3853
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3853
> Project: Apache Airflow
> Issue Type: Bug
> Components: logging
> Affects Versions: 1.10.2
> Reporter: Sam Bock
> Assignee: Sam Bock
> Priority: Major
>
> We've recently started to see duplicate logs in S3. After digging into it, we discovered that this was due to our use of the new `reschedule` mode on our sensors. Because the same `try_number` is used when a task reschedules, the local log file frequently contains results from previous attempts. Additionally, because the `s3_task_helper.py` always tries to `append` the local log file to the remote log file, this can result in massive logs (we found one that was 400 MB).
> To fix this, we'd like to remove the local log after a successful upload.
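The fix described in that PR (delete the local log only once the remote upload succeeds, so a rescheduled attempt reusing the same `try_number` cannot re-append stale content) can be sketched like this. The function name and the injected `upload` callable are hypothetical stand-ins for the real `s3_write` logic, not Airflow's actual API:

```python
import os


def upload_then_delete_local(local_path, upload, delete_local_copy=True):
    """Read the local log, push it to remote storage via `upload`, and
    remove the local copy on success. If `upload` raises, the local file
    is kept, so no log data is ever lost."""
    with open(local_path) as f:
        log = f.read()
    upload(log)  # may raise; deletion only happens after success
    if delete_local_copy:
        os.remove(local_path)
    return log
```

Because the content now lives only in remote storage, the next attempt starts from an empty local file and the append-on-upload behavior can no longer duplicate earlier attempts.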
[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780001#comment-16780001 ]
ASF GitHub Bot commented on AIRFLOW-3853:

samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload URL: https://github.com/apache/airflow/pull/4675

> Duplicate Logs appearing in S3
[GitHub] samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload
samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload URL: https://github.com/apache/airflow/pull/4675
[GitHub] samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload
samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload URL: https://github.com/apache/airflow/pull/4675
[GitHub] zhongjiajie commented on a change in pull request #4773: [AIRFLOW-3767] Correct bulk insert function
zhongjiajie commented on a change in pull request #4773: [AIRFLOW-3767] Correct bulk insert function URL: https://github.com/apache/airflow/pull/4773#discussion_r261014804

## File path: airflow/hooks/oracle_hook.py
## @@ -199,12 +199,20 @@ def bulk_insert_rows(self, table, rows, target_fields=None, commit_every=5000):
             Default 5000. Set greater than 0. Set 1 to insert each row in each transaction
         :type commit_every: int
         """
+        if not rows:
+            raise ValueError("parameter rows could not be None or empty iterable")
         conn = self.get_conn()
         cursor = conn.cursor()
-        values = ', '.join(':%s' % i for i in range(1, len(target_fields) + 1))
-        prepared_stm = 'insert into {tablename} ({columns}) values ({values})'.format(
+        if target_fields:
+            columns = ', '.join(target_fields)

Review comment: @Fokko Changed the code as you suggested; waiting for CI to pass, PTAL.
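To make the intent of the diff concrete, here is a standalone sketch of how the prepared statement ends up being built: the column list is included only when `target_fields` is given, and the `:1, :2, …` bind placeholders are counted from `target_fields` when present, otherwise from the width of the first row. The function name is illustrative, not Airflow's API:

```python
def build_bulk_insert_sql(table, rows, target_fields=None):
    """Build an Oracle-style prepared INSERT following the diff's logic."""
    if not rows:
        raise ValueError("parameter rows could not be None or empty iterable")
    if target_fields:
        placeholder_count = len(target_fields)
        columns = "({})".format(", ".join(target_fields))
    else:
        # Without an explicit column list, placeholders must match row width.
        placeholder_count = len(rows[0])
        columns = None
    values = ", ".join(":%s" % i for i in range(1, placeholder_count + 1))
    parts = ["insert into", table]
    if columns:
        parts.append(columns)
    parts.append("values ({})".format(values))
    return " ".join(parts)
```

This is why the pre-diff code broke when `target_fields` was `None`: it unconditionally called `len(target_fields)` to count placeholders.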
[GitHub] feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib
feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib URL: https://github.com/apache/airflow/pull/4800#issuecomment-468094775 @XD-DENG, let's keep it as it is. If anyone confirms that the issue has been fixed, we could unpin later.
[GitHub] zhongjiajie commented on issue #4699: [AIRFLOW-3881] Correct to_csv row number
zhongjiajie commented on issue #4699: [AIRFLOW-3881] Correct to_csv row number URL: https://github.com/apache/airflow/pull/4699#issuecomment-468094463 @Fokko You are welcome.
[GitHub] codecov-io commented on issue #4797: [AIRFLOW-3973] Run each Alembic migration in its own transaction
codecov-io commented on issue #4797: [AIRFLOW-3973] Run each Alembic migration in its own transaction URL: https://github.com/apache/airflow/pull/4797#issuecomment-468092563

# [Codecov](https://codecov.io/gh/apache/airflow/pull/4797?src=pr=h1) Report
> Merging [#4797](https://codecov.io/gh/apache/airflow/pull/4797?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

```diff
@@           Coverage Diff           @@
##           master    #4797   +/-   ##
=======================================
  Coverage   74.44%   74.44%
=======================================
  Files         450      450
  Lines       28970    28970
=======================================
  Hits        21566    21566
  Misses       7404     7404
```

Last update [2ade912...ad67c68](https://codecov.io/gh/apache/airflow/pull/4797?src=pr=lastupdated).
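For context on PR #4797's title: Alembic exposes per-migration transactions through the `transaction_per_migration` flag of `context.configure` in `env.py`. The fragment below is a hedged sketch of the general shape of such a change, not Airflow's actual `env.py`:

```python
# env.py fragment (sketch): transaction_per_migration=True asks Alembic to
# wrap each migration script in its own transaction instead of running the
# whole upgrade series inside a single one, so one failing migration does
# not roll back the ones that already succeeded.
from alembic import context


def run_migrations_online(connectable, target_metadata):
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            transaction_per_migration=True,
        )
        with context.begin_transaction():
            context.run_migrations()
```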
[GitHub] XD-DENG commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib
XD-DENG commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib URL: https://github.com/apache/airflow/pull/4800#issuecomment-468075302 Hi @feng-tao, 2.6.1 of cryptography was released a few minutes ago. I'm afk. Do you want to try unpinning it?
[GitHub] feng-tao merged pull request #4800: [AIRFLOW-XXX] Fix CI for broken lib
feng-tao merged pull request #4800: [AIRFLOW-XXX] Fix CI for broken lib URL: https://github.com/apache/airflow/pull/4800
[GitHub] feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib
feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib URL: https://github.com/apache/airflow/pull/4800#issuecomment-468072275 CI fixed. merged now.
[GitHub] feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib
feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib URL: https://github.com/apache/airflow/pull/4800#issuecomment-468062616 PTAL @Fokko @kaxil @ashb @XD-DENG
[GitHub] ryanyuan commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor
ryanyuan commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor URL: https://github.com/apache/airflow/pull/4786#discussion_r260972377

## File path: airflow/contrib/sensors/bigquery_sensor.py ##

```diff
@@ -50,7 +50,7 @@ def __init__(self,
                  project_id,
                  dataset_id,
                  table_id,
-                 bigquery_conn_id='bigquery_default_conn',
+                 bigquery_conn_id='bigquery_default',
```

Review comment: @mik-laj @Fokko Agreed. @mik-laj Nice initiative for your changes. Will you have time to continue working on it? If not, I would love to take the whole task. Cheers.
[jira] [Commented] (AIRFLOW-3975) Handle null values in attr renderers
[ https://issues.apache.org/jira/browse/AIRFLOW-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779794#comment-16779794 ] ASF GitHub Bot commented on AIRFLOW-3975: - jmcarp commented on pull request #4799: [AIRFLOW-3975] Handle null inputs in attribute renderers. URL: https://github.com/apache/airflow/pull/4799 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3975 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. 
- All the public functions and the classes in the PR contain docstrings that explain what it does ### Code Quality - [x] Passes `flake8` > Handle null values in attr renderers > > > Key: AIRFLOW-3975 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3975 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Josh Carp >Assignee: Josh Carp >Priority: Trivial > > Some renderers in `attr_renderers` raise unhandled exceptions when given null > inputs. For example, the `python_callable` renderer raises an error if passed > `None`. Some operators allow null values for this attribute, such as > `TriggerDagRunOperator`. I think all renderers should handle null input by > returning the empty string and not raising an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jmcarp opened a new pull request #4799: [AIRFLOW-3975] Handle null inputs in attribute renderers.
jmcarp opened a new pull request #4799: [AIRFLOW-3975] Handle null inputs in attribute renderers. URL: https://github.com/apache/airflow/pull/4799 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3975 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. - All the public functions and the classes in the PR contain docstrings that explain what it does ### Code Quality - [x] Passes `flake8`
[jira] [Created] (AIRFLOW-3975) Handle null values in attr renderers
Josh Carp created AIRFLOW-3975: -- Summary: Handle null values in attr renderers Key: AIRFLOW-3975 URL: https://issues.apache.org/jira/browse/AIRFLOW-3975 Project: Apache Airflow Issue Type: Improvement Reporter: Josh Carp Assignee: Josh Carp Some renderers in `attr_renderers` raise unhandled exceptions when given null inputs. For example, the `python_callable` renderer raises an error if passed `None`. Some operators allow null values for this attribute, such as `TriggerDagRunOperator`. I think all renderers should handle null input by returning the empty string and not raising an exception.
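The fix described in this ticket amounts to guarding every renderer against `None`. A minimal sketch of the idea, using a hypothetical `wrap_null_safe` helper (the names here are illustrative; Airflow's actual `attr_renderers` wiring differs):

```python
def wrap_null_safe(renderer):
    """Wrap a renderer so that None maps to '' instead of raising."""
    def safe(value):
        if value is None:
            return ""
        return renderer(value)
    return safe

# A renderer that would raise AttributeError if handed None directly:
def render_python_callable(func):
    return func.__name__

safe_render = wrap_null_safe(render_python_callable)
print(repr(safe_render(None)))              # '' -- no exception
print(safe_render(render_python_callable))  # render_python_callable
```

Applying such a wrapper uniformly gives every renderer the "return empty string on null input" behavior the ticket proposes, without touching each renderer individually.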
[GitHub] tayloramurphy opened a new pull request #4798: Add GitLab to list of organizations using Airflow
tayloramurphy opened a new pull request #4798: Add GitLab to list of organizations using Airflow URL: https://github.com/apache/airflow/pull/4798 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Adds GitLab as an organization using Airflow. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. 
- All the public functions and the classes in the PR contain docstrings that explain what it does ### Code Quality - [ ] Passes `flake8`
[GitHub] lucafuji edited a comment on issue #4783: [AIRFLOW-578] Fix check return code
lucafuji edited a comment on issue #4783: [AIRFLOW-578] Fix check return code URL: https://github.com/apache/airflow/pull/4783#issuecomment-468023066 > thanks @lucafuji for the fix. Just curious: does uber use Airflow as well? If that's the case, it would be great to put the company under "who uses Airflow" section :) BTW, very cool for your team's aresdb project :) Thanks, @feng-tao. We have an internal workflow management system. I'm not very familiar with that now but if you want to touch base, I can connect you with their manager.
[GitHub] lucafuji commented on issue #4783: [AIRFLOW-578] Fix check return code
lucafuji commented on issue #4783: [AIRFLOW-578] Fix check return code URL: https://github.com/apache/airflow/pull/4783#issuecomment-468025010 BTW @feng-tao, it seems the fix does not integrate well with the impersonation test. I took a look at the tests but I'm not sure what I'm doing wrong. Can you help validate whether treating a non-zero return code as a failure will break the impersonation? If that's the case, it would be better for someone from Airflow to fix https://issues.apache.org/jira/browse/AIRFLOW-578 another way.
[GitHub] lucafuji edited a comment on issue #4783: [AIRFLOW-578] Fix check return code
lucafuji edited a comment on issue #4783: [AIRFLOW-578] Fix check return code URL: https://github.com/apache/airflow/pull/4783#issuecomment-468023066 > thanks @lucafuji for the fix. Just curious: does uber use Airflow as well? If that's the case, it would be great to put the company under "who uses Airflow" section :) BTW, very cool for your team's aresdb project :) Thanks, @feng-tao. We have an internal workflow management tool which was forked from airflow 3 years ago. I'm not very familiar with that now but if you want to touch base, I can connect you with their manager.
[GitHub] lucafuji commented on a change in pull request #4783: [AIRFLOW-578] Fix check return code
lucafuji commented on a change in pull request #4783: [AIRFLOW-578] Fix check return code URL: https://github.com/apache/airflow/pull/4783#discussion_r260934052

## File path: airflow/jobs.py ##

```diff
@@ -2559,7 +2569,13 @@ def signal_handler(signum, frame):
         while True:
             # Monitor the task to see if it's done
             return_code = self.task_runner.return_code()
+            if return_code is not None:
+                if return_code != 0:
+                    msg = ("LocalTaskJob process exited with non zero status "
+                           "{}".format(return_code))
+                    raise AirflowException(msg)
```

Review comment: That's exactly the issue in [AIRFLOW-578](https://issues.apache.org/jira/browse/AIRFLOW-578). BaseJob ignores the return code of the spawned process, so even if that process is killed or exits abnormally, the job thinks it finished successfully. Raising an exception here makes the job finish with a failure.
[GitHub] lucafuji commented on issue #4783: [AIRFLOW-578] Fix check return code
lucafuji commented on issue #4783: [AIRFLOW-578] Fix check return code URL: https://github.com/apache/airflow/pull/4783#issuecomment-468023066 > thanks @lucafuji for the fix. Just curious: does uber use Airflow as well? If that's the case, it would be great to put the company under "who uses Airflow" section :) BTW, very cool for your team's aresdb project :) Thanks, @feng-tao. We have an internal workflow management tool which was forked from airflow 3 years ago. I'm not very familiar with that now but if you want to touch base, I can connect you with their manager.
[GitHub] lucafuji commented on a change in pull request #4783: [AIRFLOW-578] Fix check return code
lucafuji commented on a change in pull request #4783: [AIRFLOW-578] Fix check return code URL: https://github.com/apache/airflow/pull/4783#discussion_r260934052

## File path: airflow/jobs.py ##

```diff
@@ -2559,7 +2569,13 @@ def signal_handler(signum, frame):
         while True:
             # Monitor the task to see if it's done
             return_code = self.task_runner.return_code()
+            if return_code is not None:
+                if return_code != 0:
+                    msg = ("LocalTaskJob process exited with non zero status "
+                           "{}".format(return_code))
+                    raise AirflowException(msg)
```

Review comment: That's exactly the issue in [AIRFLOW-578](https://issues.apache.org/jira/browse/AIRFLOW-578). BaseJob ignores the return code of the spawned process, so even if that process is killed or exits abnormally, the job thinks it finished successfully.
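The pattern in the diff above — poll the spawned process and fail the job on a non-zero exit — can be sketched in isolation. This is a simplified stand-alone version, not Airflow's actual `LocalTaskJob` code; `TaskFailedError` and `monitor` are hypothetical names standing in for `AirflowException` and the monitoring loop:

```python
import subprocess
import sys
import time

class TaskFailedError(Exception):
    """Stand-in for AirflowException in this sketch."""

def monitor(proc, poll_interval=0.05):
    """Poll a child process; raise if it exits with a non-zero status."""
    while True:
        return_code = proc.poll()  # None while the child is still running
        if return_code is not None:
            if return_code != 0:
                raise TaskFailedError(
                    "process exited with non zero status {}".format(return_code))
            return  # clean exit: the job may be marked successful
        time.sleep(poll_interval)

failing = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])
try:
    monitor(failing)
except TaskFailedError as exc:
    print(exc)  # process exited with non zero status 3
```

The key point, as the review comment says, is that without the `return_code != 0` branch a killed or crashing child is indistinguishable from a successful one.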
[jira] [Created] (AIRFLOW-3974) Having task with `trigger_rule='one_success'` causes failed dag to be marked successful
David created AIRFLOW-3974: -- Summary: Having task with `trigger_rule='one_success'` causes failed dag to be marked successful Key: AIRFLOW-3974 URL: https://issues.apache.org/jira/browse/AIRFLOW-3974 Project: Apache Airflow Issue Type: Bug Components: scheduler Reporter: David The following dag will be marked successful and the failure callback will not run

{code:java}
import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from util.slack.callback import post_error_to_slack_callback

dag = DAG('a_slack_dag',
          schedule_interval=None,
          start_date=datetime.datetime.now(),
          on_failure_callback=post_error_to_slack_callback)

with dag:
    succeed = DummyOperator(task_id='will_succeed')

    def raise_it():
        raise Exception('raised')

    fail = PythonOperator(task_id='branch_operator_fail', python_callable=raise_it)
    option_one = DummyOperator(task_id='option_one')
    option_two = DummyOperator(task_id='option_two')
    final_task = DummyOperator(task_id='final_task', trigger_rule='one_success')

    succeed >> fail >> option_one >> final_task
    fail >> option_two >> final_task
{code}

However, if the `one_success` rule is removed from `final_task`, the dag will correctly be marked as failed. While the example doesn't explicitly show it, the failing task is a branch python operator and only one of the option tasks will ever be run, hence the requirement for the `one_success` rule.
[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779742#comment-16779742 ] ASF GitHub Bot commented on AIRFLOW-3853: - samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload URL: https://github.com/apache/airflow/pull/4675 > Duplicate Logs appearing in S3 > -- > > Key: AIRFLOW-3853 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3853 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.10.2 >Reporter: Sam Bock >Assignee: Sam Bock >Priority: Major > > We've recently started to see duplicate logs in S3. After digging into it, we > discovered that this was due to our use of the new `reschedule` mode on our > sensors. Because the same `try_number` is used when a task reschedules, the > local log file frequently contains results from previous attempts. > Additionally, because the `s3_task_helper.py` always tries to `append` the > local log file to the remote log file, this can result in massive logs (we > found one that was 400 MB). > To fix this, we'd like to remove the local log after a successful upload.
[GitHub] samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload
samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload URL: https://github.com/apache/airflow/pull/4675
[GitHub] samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload
samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload URL: https://github.com/apache/airflow/pull/4675 Make sure you have checked _all_ steps below. ### Jira - [X] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-3853) issues and references them in the PR title. For example, "\[AIRFLOW-3853\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3853 ### Description - [X] We've recently started to see duplicate logs in S3. After digging into it, we discovered that this was due to our use of the new `reschedule` mode on our sensors. Because the same `try_number` is used when a task reschedules, the local log file frequently contains results from previous attempts. Additionally, because the `s3_task_helper.py` always tries to `append` the local log file to the remote log file, this can result in massive logs (we found one that was 400 MB). To fix this, we'd like to remove the local log after a successful upload. Because the file is uploaded to S3, no data will be lost. ### Tests - [X] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: I've modified the following unit tests to cover the change to `s3_write`: `test_write`, `test_write_existing`, `test_write_raises`. ### Commits - [X] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. 
Body explains "what" and "why", not "how" ### Documentation - [X] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. - All the public functions and the classes in the PR contain docstrings that explain what it does ### Code Quality - [X] Passes `flake8`
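The change this PR describes — upload the local log, then delete it so a rescheduled attempt (which reuses the same `try_number` and log path) doesn't re-append stale content — can be sketched as follows. `write_remote_and_clean` and the `upload` callable are hypothetical stand-ins for the S3 write in `s3_task_handler.py`:

```python
import os
import tempfile

def write_remote_and_clean(local_path, upload):
    """Upload a local log file, then delete it so the next attempt starts fresh.

    `upload` is any callable taking the log text. The local file is removed
    only after upload returns, so a failed upload preserves the local copy
    and no data is lost.
    """
    with open(local_path) as f:
        log = f.read()
    upload(log)            # raises on failure -> local file is kept
    os.remove(local_path)

# Usage with an in-memory "remote" store standing in for S3:
remote = []
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("attempt 1 log\n")
write_remote_and_clean(path, remote.append)
print(remote)                # ['attempt 1 log\n']
print(os.path.exists(path))  # False
```

Deleting only after a successful upload is the property the PR's modified tests (`test_write`, `test_write_raises`) would need to cover.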
[GitHub] galak75 edited a comment on issue #4743: [AIRFLOW-3871] render Operators template fields recursively
galak75 edited a comment on issue #4743: [AIRFLOW-3871] render Operators template fields recursively URL: https://github.com/apache/airflow/pull/4743#issuecomment-468013507 @Fokko: I could not decide between: - a recursive template rendering over inner attributes approach (as done in this pull request) - a duck typing custom rendering approach (like in [this comment](https://issues.apache.org/jira/browse/AIRFLOW-2508?focusedCommentId=16654887=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16654887)) (see https://issues.apache.org/jira/browse/AIRFLOW-3871) Is this recursive solution accepted? Is it preferred to a duck typing solution?
[GitHub] galak75 commented on issue #4743: [AIRFLOW-3871] render Operators template fields recursively
galak75 commented on issue #4743: [AIRFLOW-3871] render Operators template fields recursively URL: https://github.com/apache/airflow/pull/4743#issuecomment-468013507 @Fokko: I could not decide between: - a recursive template rendering over inner attributes approach (as done in this pull request) - a duck typing custom rendering approach (like in [this comment](https://issues.apache.org/jira/browse/AIRFLOW-2508?focusedCommentId=16654887=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16654887)) (see https://issues.apache.org/jira/browse/AIRFLOW-3871) Is this recursive solution accepted?
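The recursive approach the PR discusses — walk nested containers inside an operator's template fields and render every string leaf — might look like the sketch below. It uses plain `str.format` as the rendering step purely for illustration; Airflow's real implementation renders with Jinja2, and `render_recursive` is a hypothetical name:

```python
def render_recursive(value, context):
    """Recursively render template strings inside nested lists/dicts/tuples."""
    if isinstance(value, str):
        return value.format(**context)
    if isinstance(value, dict):
        return {k: render_recursive(v, context) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(render_recursive(v, context) for v in value)
    return value  # non-template leaf (int, None, ...): leave untouched

print(render_recursive({"bash_command": ["echo", "{ds}"]}, {"ds": "2019-02-28"}))
# {'bash_command': ['echo', '2019-02-28']}
```

The duck-typing alternative mentioned in the linked comment would instead let each object decide how to render itself (e.g. via a `render` method), trading the fixed container walk for per-type extensibility.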
[jira] [Updated] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Shugerman updated AIRFLOW-3973: --- Description: h2. Notes: * This does not occur if the database is already initialized. If it is, run `resetdb` instead to observe the bug. * This does not occur with the default SQLite database. h2. Example {{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context cursor, statement, parameters, context File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute cursor.execute(statement, parameters) psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM variable}} h2. Explanation The first thing {{airflow initdb}} does is run the Alembic migrations. All migrations are run in one transaction. Most tables, including the {{variable}} table, are defined in the initial migration. A [later migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py] imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} calls its {{collect_dags}} method, which scans the DAGs directory and attempts to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it will query the database to see if that {{Variable}} is defined in the {{variable}} table. It's not clear to me how exactly the connection for that query is created, but I think it is apparent that it does _not_ use the same transaction that is used to run the migrations. 
Since the migrations are not yet complete, and all migrations are run in one transaction, the migration that creates the {{variable}} table has not yet been committed, and therefore the table does not exist to any other connection/transaction. This raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}. h2. Proposed Solution Run each Alembic migration in its own transaction. I will open a pull request which accomplishes this shortly. was: h2. Notes: * This does not occur if the database is already initialized. If it is, run `resetdb` instead to observe the bug. * This does not occur with the default SQLite database. h2. Example {{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context cursor, statement, parameters, context File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute cursor.execute(statement, parameters) psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM variable}} h2. Explanation The first thing {{airflow initdb}} does is run the Alembic migrations. All migrations are run in one transaction. Most tables, including the {{variable}} table, are defined in the initial migration. A [later migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py] imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} calls its {{collect_dags}} method, which scans the DAGs directory and attempts to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it will query the database to see if that {{Variable}} is defined in the {{variable}} table. 
It's not clear to me how exactly the connection for that query is created, but I think it is a fair assumption that it does _not_ use the same transaction that is used to run the migrations. Since the migrations are not yet complete, and all migrations are run in one transaction, the migration that creates the {{variable}} table has not yet been committed, and therefore the table does not exist to any other connection/transaction. This raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}. h2. Proposed Solution Run each Alembic migration in its own transaction. I will open a pull request which accomplishes this shortly. > `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is > used for the internal database > --- > > Key: AIRFLOW-3973 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3973 > Project: Apache Airflow > Issue Type: Bug >Reporter: Elliott Shugerman >Assignee: Elliott Shugerman >Priority: Minor > > h2. Notes: > * This does not occur if the database is
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779731#comment-16779731 ] ASF GitHub Bot commented on AIRFLOW-3973: - eeshugerman commented on pull request #4797: [AIRFLOW-3973] problem: `initdb` spams log with errors | solution: run each migration in its own transaction URL: https://github.com/apache/airflow/pull/4797 ### Jira https://issues.apache.org/jira/browse/AIRFLOW-3973 ### Description If `Variable`s are used in DAGs, and Postgres is used for the internal database, a fresh `$ airflow initdb` (or `$ airflow resetdb`) spams the logs with error messages (but does not fail). This commit corrects this by running each migration in a separate transaction. See Jira ticket for more details. I have tested this change with the default SQLite database and, of course, with Postgres. ### Tests No tests included as this is a one line change which adds no functionality whatsoever. > `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is > used for the internal database > --- > > Key: AIRFLOW-3973 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3973 > Project: Apache Airflow > Issue Type: Bug >Reporter: Elliott Shugerman >Assignee: Elliott Shugerman >Priority: Minor > > h2. Notes: > * This does not occur if the database is already initialized. If it is, run > `resetdb` instead to observe the bug. > * This does not occur with the default SQLite database. > h2. 
Example > {{ERROR [airflow.models.DagBag] Failed to import: > /home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): > File > "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", > line 1236, in _execute_context cursor, statement, parameters, context File > "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", > line 536, in do_execute cursor.execute(statement, parameters) > psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM > variable}} > h2. Explanation > The first thing {{airflow initdb}} does is run the Alembic migrations. All > migrations are run in one transaction. Most tables, including the > {{variable}} table, are defined in the initial migration. A [later > migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py] > imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} > calls its {{collect_dags}} method, which scans the DAGs directory and > attempts to load all DAGs it finds. When it loads a DAG that uses a > {{Variable}}, it will query the database to see if that {{Variable}} is > defined in the {{variable}} table. It's not clear to me how exactly the > connection for that query is created, but I think it is apparent that it does > _not_ use the same transaction that is used to run the migrations. Since the > migrations are not yet complete, and all migrations are run in one > transaction, the migration that creates the {{variable}} table has not yet > been committed, and therefore the table does not exist to any other > connection/transaction. This raises {{ProgrammingError}}, which is caught and > logged by {{collect_dags}}. > > h2. Proposed Solution > Run each Alembic migration in its own transaction. I will open a pull request > which accomplishes this shortly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] eeshugerman opened a new pull request #4797: [AIRFLOW-3973] problem: `initdb` spams log with errors | solution: run each migration in its own transaction
eeshugerman opened a new pull request #4797: [AIRFLOW-3973] problem: `initdb` spams log with errors | solution: run each migration in its own transaction URL: https://github.com/apache/airflow/pull/4797
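The one-line fix referred to in PR #4797 can be sketched as follows. This is a hedged illustration of the kind of change, not the exact Airflow diff: it assumes Alembic's `transaction_per_migration` option to `EnvironmentContext.configure`, which commits after each migration script instead of wrapping the whole upgrade in a single transaction, so tables created by earlier migrations become visible to other connections while later migrations run. Config fragment only (it requires a live Alembic environment):

```python
# Sketch of an Alembic env.py (illustrative, not Airflow's actual file).
from alembic import context

def run_migrations_online(connectable, target_metadata):
    """Run migrations with one transaction per migration script."""
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            # Commit after every migration rather than once at the end,
            # so e.g. the `variable` table exists to other connections
            # before later migrations import DagBag and scan DAGs.
            transaction_per_migration=True,
        )
        with context.begin_transaction():
            context.run_migrations()
```

With the default (`transaction_per_migration=False`), `begin_transaction()` opens one transaction for the full series, which is exactly the behavior the Jira ticket describes.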
[GitHub] galak75 commented on issue #4691: [AIRFLOW-1814] : templatize PythonOperator op_args and op_kwargs fields
galak75 commented on issue #4691: [AIRFLOW-1814] : templatize PythonOperator op_args and op_kwargs fields URL: https://github.com/apache/airflow/pull/4691#issuecomment-468008932 Thanks a lot @Fokko!
[GitHub] lucafuji removed a comment on issue #4783: [AIRFLOW-578] Fix check return code
lucafuji removed a comment on issue #4783: [AIRFLOW-578] Fix check return code URL: https://github.com/apache/airflow/pull/4783#issuecomment-467673428 @ddavydov Sorry for bothering you, but it seems my newly introduced tests broke your impersonation tests. I took a look but have no idea why they broke. Would you mind taking a look? Thanks.
[jira] [Updated] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Shugerman updated AIRFLOW-3973:
---------------------------------------
Description:

h2. Notes:
* This does not occur if the database is already initialized. If it is, run `resetdb` instead to observe the bug.
* This does not occur with the default SQLite database.

h2. Example
{{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py
Traceback (most recent call last):
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "variable" does not exist
LINE 2: FROM variable}}

h2. Explanation
The first thing {{airflow initdb}} does is run the Alembic migrations. All migrations are run in one transaction. Most tables, including the {{variable}} table, are defined in the initial migration. A [later migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py] imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} calls its {{collect_dags}} method, which scans the DAGs directory and attempts to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it queries the database to see whether that {{Variable}} is defined in the {{variable}} table. It's not clear to me how exactly the connection for that query is created, but I think it is a fair assumption that it does _not_ use the same transaction that is used to run the migrations. Since the migrations are not yet complete, and all migrations are run in one transaction, the migration that creates the {{variable}} table has not yet been committed, so the table does not exist for any other connection/transaction. This raises a {{ProgrammingError}}, which is caught and logged by {{collect_dags}}.

h2. Proposed Solution
Run each Alembic migration in its own transaction. I will open a pull request which accomplishes this shortly.
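The transaction-visibility point in the explanation above can be demonstrated in a standalone sketch. This uses Python's built-in sqlite3 purely to illustrate the generic isolation behavior (the Airflow bug itself involves Postgres, and the issue notes it does not reproduce with Airflow's default SQLite setup): a table created inside an open, uncommitted transaction on one connection is invisible to a second connection.

```python
import os
import sqlite3
import tempfile

# Illustration only (not Airflow code): a table created in an uncommitted
# transaction on one connection is not visible to a second connection.
db = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(db, isolation_level=None)  # manual transaction control
writer.execute("BEGIN")
writer.execute("CREATE TABLE variable (id INTEGER PRIMARY KEY, key TEXT)")
# Transaction still open -- no COMMIT yet, like the mid-migration state.

reader = sqlite3.connect(db)  # a second, independent connection
try:
    reader.execute("SELECT * FROM variable")
    outcome = "visible"
except sqlite3.OperationalError as exc:
    outcome = str(exc)  # the table does not exist for this connection

print(outcome)
writer.execute("ROLLBACK")
```

This mirrors the failure mode: the migration transaction has created `variable` but not committed it, so `collect_dags`'s separate connection raises an error when it queries the table.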
[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779683#comment-16779683 ]

ASF GitHub Bot commented on AIRFLOW-3853:
-----------------------------------------

samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675

> Duplicate Logs appearing in S3
> ------------------------------
>
> Key: AIRFLOW-3853
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3853
> Project: Apache Airflow
> Issue Type: Bug
> Components: logging
> Affects Versions: 1.10.2
> Reporter: Sam Bock
> Assignee: Sam Bock
> Priority: Major
>
> We've recently started to see duplicate logs in S3. After digging into it, we discovered that this was due to our use of the new `reschedule` mode on our sensors. Because the same `try_number` is used when a task reschedules, the local log file frequently contains results from previous attempts. Additionally, because `s3_task_handler.py` always tries to `append` the local log file to the remote log file, this can result in massive logs (we found one that was 400 MB).
> To fix this, we'd like to remove the local log after a successful upload.
[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3
[ https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779684#comment-16779684 ]

ASF GitHub Bot commented on AIRFLOW-3853:
-----------------------------------------

samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675

Make sure you have checked _all_ steps below.

### Jira
- [X] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-3853) issue and references it in the PR title.
  - https://issues.apache.org/jira/browse/AIRFLOW-3853

### Description
- [X] We've recently started to see duplicate logs in S3. After digging into it, we discovered that this was due to our use of the new `reschedule` mode on our sensors. Because the same `try_number` is used when a task reschedules, the local log file frequently contains results from previous attempts. Additionally, because `s3_task_handler.py` always tries to `append` the local log file to the remote log file, this can result in massive logs (we found one that was 400 MB). To fix this, we'd like to remove the local log after a successful upload. Because the file is uploaded to S3, no data will be lost.

### Tests
- [X] I've modified the following unit tests to cover the change to `s3_write`: `test_write`, `test_write_existing`, `test_write_raises`.

### Commits
- [X] My commits all reference Jira issues in their subject lines, I have squashed multiple commits where they address the same issue, and my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)".

### Documentation
- [X] In case of new functionality, my PR adds documentation that describes how to use it.

### Code Quality
- [X] Passes `flake8`
[GitHub] samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload
samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload URL: https://github.com/apache/airflow/pull/4675
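The core of the fix described above — upload the accumulated local log, then delete the local copy so a rescheduled attempt (which reuses the same `try_number` and hence the same log path) starts clean — can be sketched in isolation. This is a hypothetical helper, not Airflow's actual `s3_write`; the upload step is injected as a callable so the sketch stays S3-free:

```python
import os
import tempfile

def upload_then_clean(local_path, upload_fn, delete_local=True):
    """Upload a local log file via upload_fn, then optionally delete it.

    Deleting the local file after a successful remote upload prevents a
    rescheduled task from re-appending old content to the remote log.
    Hypothetical helper for illustration; not Airflow's actual s3_write.
    """
    with open(local_path, "r") as f:
        log = f.read()
    upload_fn(log)             # e.g. append to the remote (S3) log object
    if delete_local:
        os.remove(local_path)  # only reached if upload_fn did not raise
    return log

# Usage with a fake "remote" in place of S3:
fd, path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "w") as f:
    f.write("attempt output\n")
remote = []
uploaded = upload_then_clean(path, remote.append)
```

Because the delete only happens after `upload_fn` returns, a failed upload leaves the local file in place, which matches the "after a successful upload" wording in the issue.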
[GitHub] samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload
samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload URL: https://github.com/apache/airflow/pull/4675
[jira] [Updated] (AIRFLOW-3964) Reduce duplicated tasks and optimize with scheduler embedded sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingbo Wang updated AIRFLOW-3964:
---------------------------------
Description:

h2. Problem
h3. Airflow Sensor:
Sensors are a type of operator that keeps running until a certain criterion is met. Examples include a specific file landing in HDFS or S3, a partition appearing in Hive, or a specific time of day. Sensors are derived from BaseSensorOperator and run a poke method at a specified poke_interval until it returns True.

Sensor duplication is a common problem for large-scale Airflow projects: the same partitions often need to be detected from the same or different DAGs. At Airbnb, 88 boxes run four different types of sensors every day, and the number of running sensor tasks ranges from 8k to 16k, which takes a great amount of resources. Although the Airflow team has redirected all sensors to a dedicated queue to allocate relatively few resources to them, there is still large room to reduce the number of workers and relieve DB pressure by optimizing the sensor mechanism.

The existing sensor implementation creates an individual task for each sensor with a specific dag_id, task_id, and execution_date. This task is responsible for repeatedly querying the DB until the specified partition exists. Even if two tasks are waiting for the same partition, they create two connections to the DB and check the status in two separate processes. On one hand, the DB needs to run duplicate queries in multiple processes, which takes both CPU and memory resources; at the same time, Airflow needs to maintain a process for each sensor to query and wait for the partition/table to be created.

To optimize the sensor, add a hashcode for each partition, determined by the tuple (conn_id, schema, table, partition). Add dependencies between qualified sensors and partitions, and use a single entry for each sensed partition to query the DB, avoiding duplication in Airflow.

Add a sensor scheduling part to the scheduler to:
# Check partition status to enable downstream sensor success and trigger sensor downstream tasks
# Select all pending partitions in the DB, including:
## Newly arrived partition sensor requests
## Existing sensor requests that are still waiting
# With a time interval:
## Create the set of tasks for sensing all pending partitions
## Kill previous sensor tasks
# For the tasks mentioned in 3: each task should check many partitions. We can introduce a sensor chunk number here as the maximum number of partitions one task should handle. The sensors keep updating partition status in the Airflow DB while running.
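The deduplication idea in the description above — key each sensed partition by (conn_id, schema, table, partition) so that many waiting sensors share a single DB poll — can be sketched as follows. This is an illustrative registry, not the actual Airflow implementation; names like `SensorRegistry` and `partition_key` are invented for the example:

```python
import hashlib

def partition_key(conn_id, schema, table, partition):
    """Stable hashcode for a sensed partition, per the
    (conn_id, schema, table, partition) tuple in the proposal."""
    raw = "|".join([conn_id, schema, table, partition])
    return hashlib.sha1(raw.encode()).hexdigest()

class SensorRegistry:
    """Hypothetical registry: sensor tasks waiting on the same partition
    collapse to a single pending entry, so the DB is polled once per
    partition instead of once per sensor task."""

    def __init__(self):
        self.pending = {}  # key -> set of (dag_id, task_id, execution_date)

    def register(self, key, task_ref):
        self.pending.setdefault(key, set()).add(task_ref)

    def poll(self, partition_exists):
        """partition_exists: callable(key) -> bool (one DB query per key).
        Returns the task refs whose partition has now landed."""
        done = []
        for key in list(self.pending):
            if partition_exists(key):
                done.extend(self.pending.pop(key))
        return done

# Two sensors in different DAGs waiting on the same Hive partition:
reg = SensorRegistry()
k = partition_key("hive_default", "core", "events", "ds=2019-02-28")
reg.register(k, ("dag_a", "wait_events", "2019-02-28"))
reg.register(k, ("dag_b", "wait_events", "2019-02-28"))

queries = []
def exists(key):
    queries.append(key)  # count actual DB round-trips
    return True

released = reg.poll(exists)  # both sensors released by one query
```

The design choice here matches the proposal: the scheduler-side poll loop owns the DB connection, and individual sensor tasks become lightweight entries keyed by the partition hash.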
[jira] [Work started] (AIRFLOW-3964) Reduce duplicated tasks and optimize with scheduler embedded sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AIRFLOW-3964 started by Yingbo Wang.

> Reduce duplicated tasks and optimize with scheduler embedded sensor
> -------------------------------------------------------------------
>
> Key: AIRFLOW-3964
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3964
> Project: Apache Airflow
> Issue Type: Improvement
> Components: dependencies, operators, scheduler
> Reporter: Yingbo Wang
> Assignee: Yingbo Wang
> Priority: Critical
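For context, the sensor behavior the issue keeps referring to — run a poke method every poke_interval seconds until it returns True — reduces to a loop like the following. This is a generic sketch of the semantics, not Airflow's BaseSensorOperator; the timing hooks are injectable so the example runs instantly:

```python
import time

def run_sensor(poke, poke_interval=60, timeout=3600,
               sleep=time.sleep, clock=time.monotonic):
    """Generic sketch of sensor semantics: call poke() every poke_interval
    seconds until it returns True or the timeout elapses. Not Airflow's
    BaseSensorOperator, just the behavior it implements."""
    start = clock()
    while True:
        if poke():
            return True
        if clock() - start > timeout:
            raise TimeoutError("sensor timed out")
        sleep(poke_interval)

# Simulated criterion that is met on the third poke:
attempts = []
result = run_sensor(lambda: attempts.append(1) or len(attempts) >= 3,
                    poke_interval=0, timeout=10)
```

Each such loop is a dedicated long-lived process in the existing implementation, which is exactly the per-sensor cost the proposal wants to amortize across duplicated partitions.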
[jira] [Updated] (AIRFLOW-3964) Reduce duplicated tasks and optimize with scheduler embedded sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingbo Wang updated AIRFLOW-3964: - Description: h2. Problem h3. Airflow Sensor: Sensors are a certain type of operator that will keep running until a certain criterion is met. Examples include a specific file landing in HDFS or S3, a partition appearing in Hive, or a specific time of the day. Sensors are derived from BaseSensorOperator and run a poke method at a specified poke_interval until it returns True. Airflow Sensor duplication is a normal problem for large scale airflow project. There are duplicated partitions needing to be detected from same/different DAG. In Airbnb there are 88 boxes running four different types of sensors everyday. The number of running sensor tasks ranges from 8k to 16k, which takes great amount of resources. Although Airflow team had redirected all sensors to a specific queue to allocate relatively minor resource, there is still large room to reduce the number of workers and relief DB pressure by optimizing the sensor mechanism. Existing sensor implementation creates an identical task for any sensor task with specific dag_id, task_id and execution_date. This task is responsible of keeping querying DB until the specified partitions exists. Even if two tasks are waiting for same partition in DB, they are creating two connections with the DB and checking the status in two separate processes. In one hand, DB need to run duplicate jobs in multiple processes which will take both cpu and memory resources. At the same time, Airflow need to maintain a process for each sensor to query and wait for the partition/table to be created. To optimize the sensor, add a hashcode for each partition decided by the set of (conn_id, schema, table, partition). Add dependencies between qualified sensors and partitions. Use a single entry for each sensor to query DB and avoid duplication in Airflow. 
Add a sensor scheduling part in scheduler to: # Check partitions status to enable downstream sensor success and trigger sensor downstream tasks # Selecting all pending partitions in DB including: ## New coming partition sensor request ## Existing sensor request that is still waiting ## With a time interval: ### Create the set of tasks for sensing all pending partitions. ### Kill previous sensor tasks # For the task mentioned in 3: Each task should check many partitions. We can introduce the sensor chunk number here for a maximum number of partitions one task should handle. The sensors keep updating partition status in Airflow DB during running.
[jira] [Updated] (AIRFLOW-3964) Reduce duplicated tasks and optimize with scheduler embedded sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingbo Wang updated AIRFLOW-3964: - Description: h2. Problem h3. Airflow Sensor: Sensors are a certain type of operator that will keep running until a certain criterion is met. Examples include a specific file landing in HDFS or S3, a partition appearing in Hive, or a specific time of the day. Sensors are derived from BaseSensorOperator and run a poke method at a specified poke_interval until it returns True. Airflow Sensor duplication is a normal problem for large scale airflow project. There are duplicated partitions needing to be detected from same/different DAG. In Airbnb there are 88 boxes running four different types of sensors everyday. The number of running sensor tasks ranges from 8k to 16k, which takes great amount of resources. Although Airflow team had redirected all sensors to a specific queue to allocate relatively minor resource, there is still large room to reduce the number of workers and relief DB pressure by optimizing the sensor mechanism. Existing sensor implementation creates an identical task for any sensor task with specific dag_id, task_id and execution_date. This task is responsible of keeping querying DB until the specified partitions exists. Even if two tasks are waiting for same partition in DB, they are creating two connections with the DB and checking the status in two separate processes. In one hand, DB need to run duplicate jobs in multiple processes which will take both cpu and memory resources. At the same time, Airflow need to maintain a process for each sensor to query and wait for the partition/table to be created. To optimize the sensor, add a hashcode for each partition decided by the set of (conn_id, schema, table, partition). Add dependencies between qualified sensors and partitions. Use a single entry for each sensor to query DB and avoid duplication in Airflow. 
Add a sensor scheduling part to the scheduler to:
1. Check partition status to enable downstream sensor success and trigger sensor downstream tasks.
2. Select all pending partitions in the DB, including:
   - newly arriving partition sensor requests
   - existing sensor requests that are still waiting
3. On a time interval:
   - create the set of tasks for sensing all pending partitions
   - kill previous sensor tasks
For the tasks mentioned in 3: each task should check many partitions. We can introduce a sensor chunk number here as the maximum number of partitions one task should handle. The sensors keep updating partition status in the Airflow DB while running.
was:
h2. Problem
h3. Airflow Sensor:
Sensors are a certain type of operator that will keep running until a certain criterion is met. Examples include a specific file landing in HDFS or S3, a partition appearing in Hive, or a specific time of the day. Sensors are derived from BaseSensorOperator and run a poke method at a specified poke_interval until it returns True.
Sensor duplication is a common problem in large-scale Airflow projects: duplicate partitions need to be detected from the same or different DAGs. At Airbnb, 88 boxes run four different types of sensors every day, and the number of running sensor tasks ranges from 8k to 16k, which consumes a great amount of resources. Although the Airflow team has redirected all sensors to a dedicated queue that is allocated relatively few resources, there is still large room to reduce the number of workers and relieve DB pressure by optimizing the sensor mechanism.
The existing sensor implementation creates a separate task for every sensor with a specific dag_id, task_id, and execution_date. This task is responsible for querying the DB repeatedly until the specified partition exists. Even if two tasks are waiting for the same partition, they create two connections to the DB and check the status in two separate processes.
On one hand, the DB needs to run duplicate jobs in multiple processes, which takes both CPU and memory resources. At the same time, Airflow needs to maintain a process for each sensor to query and wait for the partition/table to be created.
h3. Airflow Scheduler:
The Airflow scheduler is responsible for parsing DAGs and scheduling Airflow tasks. The jobs.process_file function processes every Python file that contains "airflow" and "Dag":
1. Execute the file and look for DAG objects in the namespace.
2. Pickle the DAG and save it to the DB (if necessary).
3. For each DAG, see what tasks should run and create appropriate task instances in the DB.
4. Record any errors importing the file into ORM.
5. Kill (in ORM) any task instances belonging to the DAGs that haven't issued a heartbeat in a while.
This function returns a list of SimpleDag objects that represent the DAGs found in the file. There are some issues with the existing Airflow scheduler:
1. Multiple parsing: Scheduler will parse a
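The deduplication described in the update above can be sketched in a few self-contained lines (hypothetical names; this is not Airflow's implementation): every sensor request is keyed by a hashcode of (conn_id, schema, table, partition), so any number of tasks waiting on the same partition collapse into a single entry to poke.

```python
# Minimal sketch of the proposed dedup idea (assumed names, not Airflow code):
# sensor requests waiting on the same partition share one DB-poke entry,
# keyed by the (conn_id, schema, table, partition) tuple.
from collections import defaultdict


def partition_key(conn_id, schema, table, partition):
    """Stable key identifying one partition, regardless of which DAG asked."""
    return hash((conn_id, schema, table, partition))


class SensorRegistry:
    def __init__(self):
        # key -> set of (dag_id, task_id, execution_date) tuples waiting on it
        self.waiting = defaultdict(set)

    def register(self, conn_id, schema, table, partition, task):
        """Record that `task` is waiting on the given partition."""
        key = partition_key(conn_id, schema, table, partition)
        self.waiting[key].add(task)
        return key

    def pending_keys(self):
        # One DB poke per key, however many tasks are waiting on it.
        return list(self.waiting)
```

When the single poke for a key succeeds, every task registered under that key can be marked successful at once, which is the source of the worker and DB savings the issue describes.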
[jira] [Updated] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Shugerman updated AIRFLOW-3973: --- Description:
h2. Example
{{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py
Traceback (most recent call last):
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "variable" does not exist
LINE 2: FROM variable}}
h2. Explanation
The first thing {{airflow initdb}} does is run the Alembic migrations. All migrations are run in one transaction. Most tables, including the {{variable}} table, are defined in the initial migration. A [later migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py] imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} calls its {{collect_dags}} method, which scans the DAGs directory and attempts to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it will query the database to see if that {{Variable}} is defined in the {{variable}} table. It's not clear to me how exactly the connection for that query is created, but I think it is a fair assumption that it does _not_ use the same transaction that is used to run the migrations. Since the migrations are not yet complete, and all migrations are run in one transaction, the migration that creates the {{variable}} table has not yet been committed, and therefore the table does not exist to any other connection/transaction. This raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}. NOTE: This does not occur with the default SQLite database.
h2.
Proposed Solution
Run each Alembic migration in its own transaction. I will open a pull request which accomplishes this shortly.
> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
> Key: AIRFLOW-3973
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Elliott Shugerman
> Assignee: Elliott Shugerman
> Priority: Minor
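The proposed per-migration-transaction fix maps naturally onto Alembic's `transaction_per_migration` option (available in recent Alembic releases; worth checking against the version Airflow pins). The fragment below is a rough sketch of the kind of `env.py` change involved, an assumption about the eventual PR rather than a quote from it:

```python
# Sketch of an Alembic env.py fragment (assumed names; not necessarily the
# actual Airflow change). transaction_per_migration=True makes Alembic commit
# after each migration instead of wrapping the whole upgrade in one
# transaction, so tables created by early migrations become visible to any
# other connection opened mid-upgrade (e.g. by DagBag loading a DAG).
from alembic import context

def run_migrations_online(connectable):
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=None,  # Airflow would pass its model metadata here
            transaction_per_migration=True,  # one COMMIT per migration
        )
        with context.begin_transaction():
            context.run_migrations()
```

With this in place, the `variable` table created by the initial migration is committed before the later migration that initializes `models.DagBag` runs, which removes the `relation "variable" does not exist` errors described above.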
[jira] [Created] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
Elliott Shugerman created AIRFLOW-3973: -- Summary: `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database Key: AIRFLOW-3973 URL: https://issues.apache.org/jira/browse/AIRFLOW-3973 Project: Apache Airflow Issue Type: Bug Reporter: Elliott Shugerman Assignee: Elliott Shugerman h2. Example {{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context cursor, statement, parameters, context File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute cursor.execute(statement, parameters) psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM variable}} h2. Explanation The first thing {{airflow initdb}} does is run the Alembic migrations. All migrations are run in one transaction. Most tables, including the {{variable}} table, are defined in the initial migration. A [later migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py] imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} calls its {{collect_dags}} method, which scans the DAGs directory and attempts to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it will query the database to see if that {{Variable}} is defined in the {{variable}} table. It's not clear to me how exactly the connection for that query is created, but I think it is a fair assumption that it does _not_ use the same transaction that is used to run the migrations. Since the migrations are not yet complete, and all migrations are run in one transaction, the migration that creates the {{variable}} table has not yet been committed, and therefore the table does not exist to any other connection/transaction. 
This raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}. NOTE: This does not occur with the default SQLite database. h2. Proposed Solution Run each Alembic migration in its own transaction. I will be opening a pull request which accomplishes this shortly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3972) Http Operators sets Connection model's schema property to the scheme of the uri
[ https://issues.apache.org/jira/browse/AIRFLOW-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-3972: --- Fix Version/s: 2.0.0 Description: The HttpOperator is expecting there to be a *schema* property on the Connection model after it parses through a URI env connection string here: [https://github.com/apache/airflow/blob/6b38649fa6cdf16055c7f5458050c70f39cac8fd/airflow/hooks/http_hook.py#L67] However, the Connection model uses a subset of the URI path to set the `schema` property on itself. The HttpOperator sets the url to *schema + '://' + host*, and if the schema isn't set, it uses http by default, which prevents us from hitting https endpoints. The HTTP operator is assuming the schema property is a scheme, and the Connection model doesn't include a scheme in the model.
> Http Operators sets Connection model's schema property to the scheme of the uri
> Key: AIRFLOW-3972
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3972
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks, models
> Affects Versions: 1.10.2
> Reporter: Kamla
> Priority: Major
> Fix For: 2.0.0
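The schema-vs-scheme confusion above can be demonstrated without Airflow at all. The snippet below is a self-contained illustration (hypothetical helper names, standard library only) of how a URI-style connection string splits into a scheme and a path-derived `schema`, and what happens when the two are conflated the way the hook does:

```python
# Self-contained illustration (not Airflow code) of the mismatch described in
# AIRFLOW-3972: in a connection URI, the stored `schema` comes from the URI
# *path*, while the leading "https"/"http" is the *scheme* -- two different
# things that the HTTP hook conflates.
from urllib.parse import urlparse

def parse_conn_uri(uri):
    """Split a connection URI into the fields a Connection-like model keeps."""
    parsed = urlparse(uri)
    return {
        "scheme": parsed.scheme,            # e.g. "https" -- dropped by the model
        "host": parsed.hostname,            # e.g. "example.com"
        "schema": parsed.path.lstrip("/"),  # path segment, e.g. a DB schema name
    }

def build_base_url(conn):
    # Mirrors the hook's logic: treat `schema` as if it were a scheme,
    # falling back to plain http when it is empty.
    scheme = conn["schema"] or "http"
    return scheme + "://" + conn["host"]

conn = parse_conn_uri("https://example.com")
# `schema` is empty (no path), so the hook falls back to "http" and the
# https endpoint is contacted over plain http.
print(build_base_url(conn))  # prints: http://example.com
```

Note that the real scheme ("https") is parsed but never reaches the hook, which is exactly the gap the issue describes.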
[jira] [Updated] (AIRFLOW-3918) Adding a parameter in Airflow-kubernetes config to support git-sync with SSH credential
[ https://issues.apache.org/jira/browse/AIRFLOW-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Mateus Pires updated AIRFLOW-3918: - External issue URL: https://github.com/apache/airflow/pull/4777
> Adding a parameter in Airflow-kubernetes config to support git-sync with SSH credential
> Key: AIRFLOW-3918
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3918
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Daniel Mateus Pires
> Assignee: Daniel Mateus Pires
> Priority: Minor
>
> It's the preferred pattern in my workplace to integrate deployment systems with GitHub using the SSH deploy key feature, which can easily be scoped to read-only on a single repository.
> I would like to support this by adding a "git_ssh_key_file" parameter in the kubernetes section of the config, which would be an alternate authentication method to the already supported git_user + git_password.
> It will use the following feature: https://github.com/kubernetes/git-sync/blob/7bb3262084ac1ad64321856c1e769358cf18f67d/cmd/git-sync/main.go#L88
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] vardancse commented on a change in pull request #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task
vardancse commented on a change in pull request #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task URL: https://github.com/apache/airflow/pull/4781#discussion_r260836841
## File path: tests/test_jobs.py ##
@@ -2665,6 +2665,22 @@ def test_scheduler_do_not_schedule_too_early(self):
         queue.put.assert_not_called()
 
+    def test_scheduler_do_not_schedule_without_tasks(self):
+        dag = DAG(
+            dag_id='test_scheduler_do_not_schedule_without_tasks',
+            start_date=DEFAULT_DATE)
+
+        with create_session() as session:
+            orm_dag = DagModel(dag_id=dag.dag_id)
+            session.merge(orm_dag)
+            session.commit()
Review comment: Thanks for catching that, removed suggested lines.
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] mik-laj commented on issue #4787: [AIRFLOW-3967] Extract Jinja directive from Javascript
mik-laj commented on issue #4787: [AIRFLOW-3967] Extract Jinja directive from Javascript URL: https://github.com/apache/airflow/pull/4787#issuecomment-467932679 I finished my work. It looks good to me. However, I encourage you to submit your suggestions. @ashb Can I ask for review?
[GitHub] mik-laj commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor
mik-laj commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor URL: https://github.com/apache/airflow/pull/4786#discussion_r260830804
## File path: airflow/contrib/sensors/bigquery_sensor.py ##
@@ -50,7 +50,7 @@ def __init__(self,
                  project_id,
                  dataset_id,
                  table_id,
-                 bigquery_conn_id='bigquery_default_conn',
+                 bigquery_conn_id='bigquery_default',
Review comment: I started working in this direction, but I did not have time to finish it. If you want, you can base your work on this change. https://github.com/PolideaInternal/airflow/pull/42/files
[GitHub] mik-laj commented on a change in pull request #4784: [AIRFLOW-XXX][WIP]Enforce order in imports
mik-laj commented on a change in pull request #4784: [AIRFLOW-XXX][WIP]Enforce order in imports URL: https://github.com/apache/airflow/pull/4784#discussion_r260828711
## File path: setup.py ##
@@ -233,6 +233,8 @@ def write_version(filename=os.path.join(*['airflow',
 devel = [
     'click==6.7',
+    'flake8-import-order-0.18',
Review comment:
```suggestion
    'flake8-import-order>=0.18',
```
[GitHub] Fokko commented on a change in pull request #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task
Fokko commented on a change in pull request #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task URL: https://github.com/apache/airflow/pull/4781#discussion_r260806069
## File path: tests/test_jobs.py ##
@@ -2665,6 +2665,22 @@ def test_scheduler_do_not_schedule_too_early(self):
         queue.put.assert_not_called()
 
+    def test_scheduler_do_not_schedule_without_tasks(self):
+        dag = DAG(
+            dag_id='test_scheduler_do_not_schedule_without_tasks',
+            start_date=DEFAULT_DATE)
+
+        with create_session() as session:
+            orm_dag = DagModel(dag_id=dag.dag_id)
+            session.merge(orm_dag)
+            session.commit()
Review comment: `.commit()` and `.close()` can be omitted, since they are part of `create_session`.
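For context on the review comment, here is a rough standard-library sketch of a `create_session`-style context manager that commits on successful exit and rolls back on error; Airflow's real helper differs in details (for one, it builds the SQLAlchemy session internally rather than taking a factory argument):

```python
# Rough sketch (assumed shape, not Airflow's actual implementation) of why
# the explicit .commit() in the reviewed test is redundant: a
# create_session-style context manager commits on successful exit, rolls
# back on error, and always closes -- callers only merge/add.
from contextlib import contextmanager

@contextmanager
def create_session(session_factory):
    session = session_factory()
    try:
        yield session
        session.commit()      # committed here; no manual commit needed
    except Exception:
        session.rollback()    # undo partial work on any error
        raise
    finally:
        session.close()
```

Inside the `with` block, callers can therefore write just `session.merge(orm_dag)` and let the context manager handle the transaction lifecycle.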
[GitHub] Fokko commented on issue #4699: [AIRFLOW-3881] Correct to_csv row number
Fokko commented on issue #4699: [AIRFLOW-3881] Correct to_csv row number URL: https://github.com/apache/airflow/pull/4699#issuecomment-467909116 Thanks @zhongjiajie
[GitHub] Fokko merged pull request #4699: [AIRFLOW-3881] Correct to_csv row number
Fokko merged pull request #4699: [AIRFLOW-3881] Correct to_csv row number URL: https://github.com/apache/airflow/pull/4699
[GitHub] Fokko commented on a change in pull request #4699: [AIRFLOW-3881] Correct to_csv row number
Fokko commented on a change in pull request #4699: [AIRFLOW-3881] Correct to_csv row number URL: https://github.com/apache/airflow/pull/4699#discussion_r260804565
## File path: tests/hooks/test_hive_hook.py ##
@@ -451,7 +452,23 @@ def test_get_results_data(self):
         results = hook.get_results(query, schema=self.database)
         self.assertListEqual(results['data'], [(1, 1), (2, 2)])
 
-    def test_to_csv(self):
+    @unittest.skipIf(NOT_ASSERTLOGS_VERSION < 3.4, 'assertLogs not support before python 3.4')
Review comment: Ah check, thanks!
[jira] [Commented] (AIRFLOW-3795) provide_context is not a passable parameter for PythonVirtualenvOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779440#comment-16779440 ] ASF subversion and git services commented on AIRFLOW-3795: -- Commit 217c940d0e82c0b8bf0d43c26d69297d2d374107 in airflow's branch refs/heads/master from Sergio Soto [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=217c940 ] [AIRFLOW-3795] provide_context param is now used (#4735) * provide_context param is now used * Fixed new PythonVirtualenvOperator test
> provide_context is not a passable parameter for PythonVirtualenvOperator
> Key: AIRFLOW-3795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3795
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Reporter: Susannah Doss
> Assignee: Sergio Soto Núñez
> Priority: Trivial
>
> `PythonVirtualenvOperator` does not allow me to specify `provide_context=True`: https://github.com/apache/airflow/blob/83cb9c3acdd3b4eeadf1cab3cb45d644c3e9ede0/airflow/operators/python_operator.py#L242
> However, I am able to do so when I use the plain `PythonOperator`. I can't see a reason why I wouldn't be allowed to have it be set to `True` when using a `PythonVirtualenvOperator`.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3795) provide_context is not a passable parameter for PythonVirtualenvOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779439#comment-16779439 ] ASF GitHub Bot commented on AIRFLOW-3795: - Fokko commented on pull request #4735: [AIRFLOW-3795] provide_context param is now used URL: https://github.com/apache/airflow/pull/4735
[jira] [Resolved] (AIRFLOW-3795) provide_context is not a passable parameter for PythonVirtualenvOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-3795. --- Resolution: Fixed Fix Version/s: 2.0.0
[GitHub] Fokko merged pull request #4735: [AIRFLOW-3795] provide_context param is now used
Fokko merged pull request #4735: [AIRFLOW-3795] provide_context param is now used URL: https://github.com/apache/airflow/pull/4735
[GitHub] Fokko commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor
Fokko commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor URL: https://github.com/apache/airflow/pull/4786#discussion_r260800642
## File path: airflow/contrib/sensors/bigquery_sensor.py ##
@@ -50,7 +50,7 @@ def __init__(self,
                  project_id,
                  dataset_id,
                  table_id,
-                 bigquery_conn_id='bigquery_default_conn',
+                 bigquery_conn_id='bigquery_default',
Review comment: Agree with @mik-laj :-)
[GitHub] mik-laj commented on issue #4788: [AIRFLOW-3811][3/3] Add automatic generation of API Reference
mik-laj commented on issue #4788: [AIRFLOW-3811][3/3] Add automatic generation of API Reference URL: https://github.com/apache/airflow/pull/4788#issuecomment-467904933 @Fokko This is corrected, but in an earlier PR. Please accept the changes in order. I have divided all the changes into a few PRs to increase transparency.
[GitHub] RosterIn opened a new pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…
RosterIn opened a new pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit… URL: https://github.com/apache/airflow/pull/4795
### Jira
- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-2767/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-2767
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
### Description
- [ ] Here are some details about my PR, including screenshots of any UI changes:
### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
### Commits
- [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"
### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
  - All the public functions and the classes in the PR contain docstrings that explain what it does
### Code Quality
- [ ] Passes `flake8`
[GitHub] Fokko commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…
Fokko commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit… URL: https://github.com/apache/airflow/pull/4795#issuecomment-467902825 I've restarted the Kubernetes tests
[GitHub] Fokko closed pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…
Fokko closed pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit… URL: https://github.com/apache/airflow/pull/4795
[GitHub] Fokko commented on issue #4788: [AIRFLOW-3811][3/3] Add automatic generation of API Reference
Fokko commented on issue #4788: [AIRFLOW-3811][3/3] Add automatic generation of API Reference URL: https://github.com/apache/airflow/pull/4788#issuecomment-467899614 Fully agree with @potiuk, this was overdue for a long time! Awesome work @mik-laj
[jira] [Commented] (AIRFLOW-3961) Centos 7 + mysql 8.0 - initdb - Incorrect datetime value
[ https://issues.apache.org/jira/browse/AIRFLOW-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779415#comment-16779415 ] Florian FERREIRA commented on AIRFLOW-3961: --- Hello, I have reinstalled my VM and I can't reproduce. Sorry for the inconvenience.
> Centos 7 + mysql 8.0 - initdb - Incorrect datetime value
> Key: AIRFLOW-3961
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3961
> Project: Apache Airflow
> Issue Type: Bug
> Affects Versions: 1.10.2
> Reporter: Florian FERREIRA
> Priority: Major
>
> Hello, I have some problems with the initialization of the backend database.
> How to reproduce:
> Airflow: 1.10.2
> Mysql: mysql Ver 8.0.15 for Linux on x86_64 (MySQL Community Server - GPL)
> Mysql user:
> {code}
> CREATE USER 'airflow'@'%' IDENTIFIED WITH mysql_native_password BY '';
> GRANT ALL PRIVILEGES ON airflow.* TO 'airflow'@'%';
> {code}
> My.cnf file:
> {code}
> [mysqld]
> # Remove leading # and set to the amount of RAM for the most important data
> # cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
> # innodb_buffer_pool_size = 128M
> # Remove the leading "# " to disable binary logging
> # Binary logging captures changes between backups and is enabled by
> # default. It's default setting is log_bin=binlog
> # disable_log_bin
> # Remove leading # to set options mainly useful for reporting servers.
> # The server defaults are faster for transactions and fast SELECTs.
> # Adjust sizes as needed, experiment to find the optimal values.
> # join_buffer_size = 128M
> # sort_buffer_size = 2M
> # read_rnd_buffer_size = 2M
> # Remove leading # to revert to previous value for default_authentication_plugin,
> # this will increase compatibility with older clients. For background, see:
> # https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_default_authentication_plugin
> # default-authentication-plugin=mysql_native_password
> datadir=/var/lib/mysql
> socket=/var/lib/mysql/mysql.sock
> log-error=/var/log/mysqld.log
> pid-file=/var/run/mysqld/mysqld.pid
> explicit_defaults_for_timestamp=1
> {code}
> When I launch `airflow initdb` or `airflow resetdb` on an *empty* database, I get the following error:
> {code}
> airflow resetdb
> /usr/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.24.1) or chardet (2.2.1) doesn't match a supported version! RequestsDependencyWarning)
> [2019-02-26 14:29:25,280] {settings.py:174} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=12078
> [2019-02-26 14:29:25,526] {__init__.py:51} INFO - Using executor LocalExecutor
> DB: mysql://airflow:***@airflow.bboxdata-dev.lan.oxv.fr/airflow
> This will drop existing tables if they exist. Proceed? (y/n)y
> [2019-02-26 14:29:26,693] {db.py:358} INFO - Dropping tables that exist
> [2019-02-26 14:29:27,238] {migration.py:116} INFO - Context impl MySQLImpl.
> [2019-02-26 14:29:27,238] {migration.py:121} INFO - Will assume non-transactional DDL.
> [2019-02-26 14:29:27,265] {db.py:338} INFO - Creating tables
> INFO [alembic.runtime.migration] Context impl MySQLImpl.
> INFO [alembic.runtime.migration] Will assume non-transactional DDL.
> INFO [alembic.runtime.migration] Running upgrade -> e3a246e0dc1, current schema
> INFO [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
> INFO [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
> INFO [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_instance
> INFO [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
> INFO [alembic.runtime.migration] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log
> INFO [alembic.runtime.migration] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun
> INFO [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration
> INFO [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config
> INFO [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user
> INFO [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end
> INFO [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss
> INFO [alembic.runtime.migration] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection
> INFO [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add
[GitHub] Fokko commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling
Fokko commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling URL: https://github.com/apache/airflow/pull/4769#discussion_r260790149 ## File path: airflow/jobs.py ## @@ -2168,147 +2168,149 @@ def _process_backfill_task_instances(self, # or leaf to root, as otherwise tasks might be # determined deadlocked while they are actually # waiting for their upstream to finish +@provide_session Review comment: Personally I would prefer to have a `create_session`, since we commit the result on the last line anyway. If we do this properly, we shouldn't have to do `refresh_from_db` so often.
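For readers following the `create_session` vs. `provide_session` discussion above: the sketch below illustrates the lifecycle semantics being debated. This is a minimal stand-in using a fake session object, not Airflow's actual implementation — only the commit/rollback/close behavior and the keyword-injection pattern are shown.

```python
from contextlib import contextmanager
from functools import wraps


class FakeSession:
    """Stand-in for a SQLAlchemy session, just to illustrate the lifecycle."""

    def __init__(self):
        self.committed = False
        self.rolled_back = False
        self.closed = False

    def commit(self):
        self.committed = True

    def rollback(self):
        self.rolled_back = True

    def close(self):
        self.closed = True


@contextmanager
def create_session():
    """Open a session, commit on success, roll back on error, always close."""
    session = FakeSession()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


def provide_session(func):
    """Inject a `session` keyword argument if the caller did not pass one."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if kwargs.get("session") is not None:
            # Caller manages the session's lifecycle; don't commit for them.
            return func(*args, **kwargs)
        with create_session() as session:
            return func(*args, session=session, **kwargs)
    return wrapper
```

The review comment's point is the difference in who owns the commit: with `create_session` the commit happens exactly once, at the end of the `with` block, so there is no need to `refresh_from_db` a stale object mid-flight.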
[jira] [Commented] (AIRFLOW-3967) Avoid mixin Jinja and Javascript
[ https://issues.apache.org/jira/browse/AIRFLOW-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779397#comment-16779397 ] ASF GitHub Bot commented on AIRFLOW-3967: - mik-laj commented on pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja directive from Javascript URL: https://github.com/apache/airflow/pull/4787 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3967 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description The current state is unexpected because: * When Javascript is generated by the template engine, we do metaprogramming, which raises the complexity of the problem. * It does not allow testing the code in isolation. * Jinja is an HTML template engine. Using it for JS code may pose a security risk. As a next step, I would like to: * move the JS code to a separate file * introduce linting for all JS/HTML files. * extract inlined CSS to a separate file For a long-term goal, I would like to: * introduce visual regression and snapshot testing. If the JS code is in separate files, this will be relatively simple. * update a dependency - Bootstrap 4, including dropping glyphicons @jmcarp I saw that you changed HTML / JS files recently. What do you think about the change and the plans for the future? Does what I want to do in the future make sense to you? ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. 
In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. - All the public functions and the classes in the PR contain docstrings that explain what it does ### Code Quality - [ ] Passes `flake8` > Avoid mixin Jinja and Javascript > > > Key: AIRFLOW-3967 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3967 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Kamil Bregula >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
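On the security point about mixing Jinja into JavaScript: a common alternative is to render the server-side data as JSON into the page and keep the JS itself in a static, lintable file. The sketch below is a hypothetical illustration using only the Python standard library — the function names are made up, and a real Airflow template would more likely use Jinja's `tojson` filter than hand-rolled rendering.

```python
import json


def render_json_for_script_tag(data):
    """Serialize data for embedding in a <script type="application/json"> tag.

    Escaping "</" prevents a value such as "</script>" from terminating
    the tag early -- that premature close is exactly the injection risk
    of inlining server data into JavaScript.
    """
    return json.dumps(data).replace("</", "<\\/")


def render_page(dag_data):
    # The external JS file reads the element's textContent and
    # JSON.parse()s it -- no Jinja directives inside the .js file.
    payload = render_json_for_script_tag(dag_data)
    return (
        '<script id="dag-data" type="application/json">{}</script>\n'
        '<script src="/static/dag_view.js"></script>'
    ).format(payload)
```

Because `\/` is a valid JSON escape for `/`, the escaped payload still round-trips through `JSON.parse` unchanged on the client side.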
[jira] [Commented] (AIRFLOW-3967) Avoid mixin Jinja and Javascript
[ https://issues.apache.org/jira/browse/AIRFLOW-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779396#comment-16779396 ] ASF GitHub Bot commented on AIRFLOW-3967: - mik-laj commented on pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja directive from Javascript URL: https://github.com/apache/airflow/pull/4787 > Avoid mixin Jinja and Javascript > > > Key: AIRFLOW-3967 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3967 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Kamil Bregula >Priority: Major >
[GitHub] Fokko commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling
Fokko commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling URL: https://github.com/apache/airflow/pull/4769#discussion_r260789285 ## File path: airflow/jobs.py ## @@ -2168,147 +2168,149 @@ def _process_backfill_task_instances(self, # or leaf to root, as otherwise tasks might be # determined deadlocked while they are actually # waiting for their upstream to finish +@provide_session Review comment: In general I think we should let SQLAlchemy do the pooling and close the sessions that we don't use anymore, instead of keeping them open and passing them around all the time.
[GitHub] Fokko commented on issue #4637: [AIRFLOW-3793] Decommission configuration items for Flask-Admin web UI & related codes
Fokko commented on issue #4637: [AIRFLOW-3793] Decommission configuration items for Flask-Admin web UI & related codes URL: https://github.com/apache/airflow/pull/4637#issuecomment-467893390 Restarted the failing tests
[GitHub] Fokko commented on a change in pull request #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task
Fokko commented on a change in pull request #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task URL: https://github.com/apache/airflow/pull/4781#discussion_r260786130 ## File path: tests/test_jobs.py ##
@@ -2665,6 +2665,22 @@ def test_scheduler_do_not_schedule_too_early(self):
         queue.put.assert_not_called()

+    def test_scheduler_do_not_schedule_without_tasks(self):
+        dag = DAG(
+            dag_id='test_scheduler_do_not_schedule_without_tasks',
+            start_date=DEFAULT_DATE)
+
+        session = settings.Session()
Review comment: Please use `create_session()`
[jira] [Commented] (AIRFLOW-3543) rescheduled tasks block DAG deletion
[ https://issues.apache.org/jira/browse/AIRFLOW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779404#comment-16779404 ] ASF subversion and git services commented on AIRFLOW-3543: -- Commit 078ff765dbde1a47a0f9bcbd605c711e96201f79 in airflow's branch refs/heads/master from Stefan Seelmann [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=078ff76 ] AIRFLOW-3543: Fix deletion of DAG with rescheduled tasks (#4646) > rescheduled tasks block DAG deletion > > > Key: AIRFLOW-3543 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3543 > Project: Apache Airflow > Issue Type: Bug > Components: cli, database > Environment: postgres 10 database >Reporter: Christopher >Assignee: Stefan Seelmann >Priority: Critical > > This applies to current master branch after > [AIRFLOW-2747|https://github.com/apache/incubator-airflow/commit/dc59d7e2750aa90e099afad8689f2646f18f92a6] > was merged. > Once a sensor task is rescheduled, the task cannot be deleted from the DB due > to a foreign key constraint. This prevents deletion of tasks and DAGS. This > occurs regardless of whether the DAG is still running or whether the sensor > is actually rescheduled to run in the future or not (ie the task may complete > successfully but its entry still resides as a row in the task_reschedule > table. > > I am running a postgres-backed airflow instance. 
> 
> {code}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
>     context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.IntegrityError: update or delete on table "task_instance" violates foreign key constraint "task_reschedule_dag_task_date_fkey" on table "task_reschedule"
> DETAIL: Key (task_id, dag_id, execution_date)=(check_images_ready_11504, flight5105_v0.0.1, 2018-12-13 00:00:00+00) is still referenced from table "task_reschedule".
> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) update or delete on table "task_instance" violates foreign key constraint "task_reschedule_dag_task_date_fkey" on table "task_reschedule"
> {code}
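The merged fix (#4646) is not reproduced here, but the shape of the problem and its remedy — delete the child `task_reschedule` rows before the parent `task_instance` rows — can be demonstrated with a minimal, hypothetical schema (far simpler than Airflow's real one), using SQLite in place of Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("""
    CREATE TABLE task_instance (
        task_id TEXT, dag_id TEXT, execution_date TEXT,
        PRIMARY KEY (task_id, dag_id, execution_date))""")
conn.execute("""
    CREATE TABLE task_reschedule (
        task_id TEXT, dag_id TEXT, execution_date TEXT,
        FOREIGN KEY (task_id, dag_id, execution_date)
            REFERENCES task_instance (task_id, dag_id, execution_date))""")
conn.execute("INSERT INTO task_instance VALUES ('t1', 'd1', '2018-12-13')")
conn.execute("INSERT INTO task_reschedule VALUES ('t1', 'd1', '2018-12-13')")


def delete_dag(conn, dag_id):
    # Child rows first, then parent rows, so the FK is never violated.
    conn.execute("DELETE FROM task_reschedule WHERE dag_id = ?", (dag_id,))
    conn.execute("DELETE FROM task_instance WHERE dag_id = ?", (dag_id,))
    conn.commit()


delete_dag(conn, "d1")
```

Deleting from `task_instance` while a matching `task_reschedule` row still exists raises the integrity error quoted above; ordering the deletes (or declaring `ON DELETE CASCADE` on the FK) avoids it.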
[jira] [Resolved] (AIRFLOW-3543) rescheduled tasks block DAG deletion
[ https://issues.apache.org/jira/browse/AIRFLOW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-3543. --- Resolution: Fixed Fix Version/s: 2.0.0 > rescheduled tasks block DAG deletion > > > Key: AIRFLOW-3543 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3543 > Project: Apache Airflow > Issue Type: Bug > Components: cli, database > Environment: postgres 10 database >Reporter: Christopher >Assignee: Stefan Seelmann >Priority: Critical > Fix For: 2.0.0 > > > This applies to current master branch after > [AIRFLOW-2747|https://github.com/apache/incubator-airflow/commit/dc59d7e2750aa90e099afad8689f2646f18f92a6] > was merged. > Once a sensor task is rescheduled, the task cannot be deleted from the DB due > to a foreign key constraint. This prevents deletion of tasks and DAGS. This > occurs regardless of whether the DAG is still running or whether the sensor > is actually rescheduled to run in the future or not (ie the task may complete > successfully but its entry still resides as a row in the task_reschedule > table. > > I am running a postgres-backed airflow instance. 
> 
> {code}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
>     context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.IntegrityError: update or delete on table "task_instance" violates foreign key constraint "task_reschedule_dag_task_date_fkey" on table "task_reschedule"
> DETAIL: Key (task_id, dag_id, execution_date)=(check_images_ready_11504, flight5105_v0.0.1, 2018-12-13 00:00:00+00) is still referenced from table "task_reschedule".
> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) update or delete on table "task_instance" violates foreign key constraint "task_reschedule_dag_task_date_fkey" on table "task_reschedule"
> {code}
[GitHub] Fokko merged pull request #4646: AIRFLOW-3543: Fix deletion of DAG with rescheduled tasks
Fokko merged pull request #4646: AIRFLOW-3543: Fix deletion of DAG with rescheduled tasks URL: https://github.com/apache/airflow/pull/4646
[GitHub] Fokko commented on issue #4646: AIRFLOW-3543: Fix deletion of DAG with rescheduled tasks
Fokko commented on issue #4646: AIRFLOW-3543: Fix deletion of DAG with rescheduled tasks URL: https://github.com/apache/airflow/pull/4646#issuecomment-467892646 Thanks again @seelmann
[GitHub] mik-laj opened a new pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja directive from Javascript
mik-laj opened a new pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja directive from Javascript URL: https://github.com/apache/airflow/pull/4787 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3967 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description The current state is unexpected because: * When Javascript is generated by the template engine, we do metaprogramming, which raises the complexity of the problem. * It does not allow testing the code in isolation. * Jinja is an HTML template engine. Using it for JS code may pose a security risk. As a next step, I would like to: * move the JS code to a separate file * introduce linting for all JS/HTML files. * extract inlined CSS to a separate file For a long-term goal, I would like to: * introduce visual regression and snapshot testing. If the JS code is in separate files, this will be relatively simple. * update a dependency - Bootstrap 4, including dropping glyphicons @jmcarp I saw that you changed HTML / JS files recently. What do you think about the change and the plans for the future? Does what I want to do in the future make sense to you? ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. 
Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. - All the public functions and the classes in the PR contain docstrings that explain what it does ### Code Quality - [ ] Passes `flake8`
[GitHub] mik-laj closed pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja directive from Javascript
mik-laj closed pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja directive from Javascript URL: https://github.com/apache/airflow/pull/4787
[GitHub] Fokko commented on issue #4768: [AIRFLOW-3800] run a dag at the beginning of the scheduled interval
Fokko commented on issue #4768: [AIRFLOW-3800] run a dag at the beginning of the scheduled interval URL: https://github.com/apache/airflow/pull/4768#issuecomment-467889897 I also agree with @XD-DENG, if we want this, this should preferably be on Airflow instance level.
[GitHub] Fokko commented on issue #4768: [AIRFLOW-3800] run a dag at the beginning of the scheduled interval
Fokko commented on issue #4768: [AIRFLOW-3800] run a dag at the beginning of the scheduled interval URL: https://github.com/apache/airflow/pull/4768#issuecomment-467889502 I see where this comes from, but I think it might only increase the confusion. If you explain the philosophy behind kicking off the first DAG run after dag_start + interval, it makes perfect sense.
[jira] [Commented] (AIRFLOW-1847) Webhook Sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779369#comment-16779369 ] gsemet commented on AIRFLOW-1847: - Yes, but since the web server is already a web server, I would like it (or a mini-backend behind the extension) to provide a webhook for that. Also, burst handling needs to be addressed somehow, so we would need a kind of queue. Overall, this very classic scheme should be handled by the webhook sensor proposal. > Webhook Sensor > -- > > Key: AIRFLOW-1847 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1847 > Project: Apache Airflow > Issue Type: Improvement > Components: core, operators >Reporter: gsemet >Assignee: gsemet >Priority: Minor > Labels: api, sensors, webhook > Attachments: airflow-webhook-proposal.png > > > h1. Webhook sensor > May require a hook in the experimental API > Register an API endpoint and wait for input on each. > It is different from the {{dag_runs}} API in that the format is not Airflow-specific; it is just a callback web URL called by an external system on some event, with its application-specific content. The content is really important and needs to be sent to the DAG (as XCom?) > Use Case: > - A DAG registers a WebHook sensor named {{}} > - A custom endpoint is exposed at > {{http://myairflow.server/api/experimental/webhook/}}. > - I set this URL in the external system I wish to use the webhook from. Ex: > github/gitlab project webhook > - When the external application performs a request to this URL, it is > automatically sent to the WebHook sensor. For simplicity, we can have a > JsonWebHookSensor that would be able to carry any kind of JSON content. > - The sensor's only job would normally be to trigger the execution of a DAG, > providing it with the JSON content as XCom. > If there are several requests at the same time, the system should be scalable > enough to not die or slow down the web UI. 
It is also possible to > instantiate an independent flask/gunicorn server to split the load. It would > mean it runs on another port, but this could be just an option in the > configuration file or even a completely independent application ({{airflow > webhookserver}}). I saw recent changes integrated gunicorn in airflow core; > I guess it can help this use case. > To support the load, I think it is good that the part in the API just posts > the received request to an internal queue so the Sensor can handle them later > without risk of missing one. > Documentation would be updated to describe the classic scheme to implement > this use case, which would look like: > !airflow-webhook-proposal.png! > I think it is good to split it into 2 DAGs, one for linear handling of the > messages and triggering a new DAG, and the processing DAG that might be > executed in parallel. > h2. Example usage in Sensor DAG: trigger a DAG on GitHub Push Event > {code} > sensor = JsonWebHookSensor( > task_id='my_task_id', > name="on_github_push" > ) > .. user is responsible for triggering the processing DAG himself. > {code} > In my GitHub project, I register the following URL in the webhook page: > {code} > http://airflow.myserver.com/api/experimental/webhook/on_github_push > {code} > From now on, on push, GitHub will send a [json with this > format|https://developer.github.com/v3/activity/events/types/#pushevent] to > the previous URL. > The {{JsonWebHookSensor}} receives the payload, and a new DAG is triggered in > this Sensing DAG. > h2. Documentation update > - add new item in the [scheduling > documentation|https://pythonhosted.org/airflow/scheduler.html] about how to > trigger a DAG using a webhook > - describe the sensing DAG + processing DAG scheme and provide the GitHub use > case as a real-life example > h2. Possible evolutions > - use an external queue (redis, amqp) to handle lots of events > - subscribe to a pub/sub system such as WAMP? 
> - allow batch processing (trigger processing DAG on n events or after a > timeout, gathering n messages altogether) > - for higher throughput, kafka? > - Security, authentication and other related subjects might be addressed in > another ticket.
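The "post the received request to an internal queue so the sensor can handle it later" part of the proposal can be sketched as follows. All names here are hypothetical (nothing like this exists in Airflow), the queue is in-memory only, and — as the proposal itself notes — a real deployment would want Redis/AMQP for durability and burst handling.

```python
import json
import queue

# One queue per registered webhook name. An in-memory queue loses events
# on restart; the proposal suggests Redis or AMQP for a real deployment.
_webhook_queues = {}


def register_webhook(name):
    """Called when a DAG declares a webhook sensor with this name."""
    _webhook_queues[name] = queue.Queue()


def handle_webhook_request(name, raw_body):
    """Endpoint side: enqueue the payload and return immediately,
    so a burst of incoming events does not block the web server."""
    _webhook_queues[name].put(json.loads(raw_body))


def poke(name):
    """Sensor side: non-blocking check, returning one payload or None.

    In the proposal this is where the sensor would trigger the
    processing DAG, passing the payload along as XCom."""
    try:
        return _webhook_queues[name].get_nowait()
    except queue.Empty:
        return None
```

Decoupling the HTTP handler from the sensor via the queue is what makes the "sensing DAG + processing DAG" split in the attached diagram scalable: the endpoint only enqueues, and the linear sensing DAG drains the queue at its own pace.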
[jira] [Resolved] (AIRFLOW-3867) Unification of the subpackage's name for GCP
[ https://issues.apache.org/jira/browse/AIRFLOW-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-3867. --- Resolution: Fixed Fix Version/s: 2.0.0 > Unification of the subpackage's name for GCP > > > Key: AIRFLOW-3867 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3867 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Kamil Bregula >Assignee: Kamil Bregula >Priority: Minor > Fix For: 2.0.0 > > > The names for packages for Azure and AWS have been standardized. Google Cloud > Platform should follow the trend.
[jira] [Commented] (AIRFLOW-3970) Misleading navigation on the DAG screen.
[ https://issues.apache.org/jira/browse/AIRFLOW-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779355#comment-16779355 ] ASF GitHub Bot commented on AIRFLOW-3970: - mik-laj commented on pull request #4796: [AIRFLOW-3970] Pull out the action buttons from the tabs URL: https://github.com/apache/airflow/pull/4796 Make sure you have checked _all_ steps below. ### Jira - https://issues.apache.org/jira/browse/AIRFLOW-3970 ### Description Assigning actions to a bookmark element is misleading. Now, bookmarks are used only for browsing and buttons are used to perform actions. Before: ![localhost_8004_tree_dag_id example_gcp_vision 1](https://user-images.githubusercontent.com/12058428/53496242-e81ef780-3aa1-11e9-8082-4d57463e135a.png) After: ![localhost_8004_tree_dag_id example_gcp_vision](https://user-images.githubusercontent.com/12058428/53496105-9aa28a80-3aa1-11e9-87c1-3574b0f4badc.png) @ashb Can you look at it? ### Tests Not applicable ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. - All the public functions and the classes in the PR contain docstrings that explain what it does ### Code Quality - [ ] Passes `flake8` > Misleading navigation on the DAG screen. > > > Key: AIRFLOW-3970 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3970 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Kamil Bregula >Assignee: Kamil Bregula >Priority: Major >
[jira] [Commented] (AIRFLOW-3867) Unification of the subpackage's name for GCP
[ https://issues.apache.org/jira/browse/AIRFLOW-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779360#comment-16779360 ] ASF GitHub Bot commented on AIRFLOW-3867: - Fokko commented on pull request #4690: [AIRFLOW-3867] Rename GCP's subpackage URL: https://github.com/apache/airflow/pull/4690 > Unification of the subpackage's name for GCP > > > Key: AIRFLOW-3867 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3867 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Kamil Bregula >Assignee: Kamil Bregula >Priority: Minor > > The names for packages for Azure and AWS have been standardized. Google Cloud > Platform should follow the trend.
[GitHub] Fokko merged pull request #4690: [AIRFLOW-3867] Rename GCP's subpackage
Fokko merged pull request #4690: [AIRFLOW-3867] Rename GCP's subpackage URL: https://github.com/apache/airflow/pull/4690