[GitHub] [airflow] feng-tao commented on issue #5160: [AIRFLOW-4394] Don't double test behaviour of BackfillJob from CLI tests
feng-tao commented on issue #5160: [AIRFLOW-4394] Don't double test behaviour of BackfillJob from CLI tests URL: https://github.com/apache/airflow/pull/5160#issuecomment-486077899 Nice, do we know how much we expect to save ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-4402) Update super() calls for nvd3
[ https://issues.apache.org/jira/browse/AIRFLOW-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824830#comment-16824830 ] Tao Feng commented on AIRFLOW-4402: --- [~milton0825] didn't quite follow all the prs recently. But I found what you are talking about. Just raise the questions in that pr as I am not 100% sure. > Update super() calls for nvd3 > - > > Key: AIRFLOW-4402 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4402 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} > by {{super().__init__(...)}} > Similarly for any other {{super}} calls for other methods. > (In Python 3 {{super(__class__, self) == super()}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] codecov-io edited a comment on issue #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue
codecov-io edited a comment on issue #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue URL: https://github.com/apache/airflow/pull/5167#issuecomment-486068982 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=h1) Report > Merging [#5167](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **increase** coverage by `0.01%`. > The diff coverage is `91.66%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5167/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5167 +/- ## == + Coverage 78.44% 78.46% +0.01% == Files 466 467 +1 Lines 2973529762 +27 == + Hits2332723354 +27 Misses 6408 6408 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/executors/base\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvYmFzZV9leGVjdXRvci5weQ==) | `95.58% <ø> (-0.07%)` | :arrow_down: | | [airflow/executors/local\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvbG9jYWxfZXhlY3V0b3IucHk=) | `81.25% <100%> (+0.19%)` | :arrow_up: | | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.22% <100%> (+0.1%)` | :arrow_up: | | [airflow/contrib/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4ZWN1dG9ycy9rdWJlcm5ldGVzX2V4ZWN1dG9yLnB5) | `63.38% <100%> (+0.1%)` | :arrow_up: | | [airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `78.53% <50%> (+0.01%)` | :arrow_up: | | [airflow/utils/synchronized\_queue.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zeW5jaHJvbml6ZWRfcXVldWUucHk=) | `92.3% <92.3%> (ø)` | | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.59% <0%> (+0.17%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=footer). Last update [de1795c...9494571](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io edited a comment on issue #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue
codecov-io edited a comment on issue #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue URL: https://github.com/apache/airflow/pull/5167#issuecomment-486068982 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=h1) Report > Merging [#5167](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **increase** coverage by `0.01%`. > The diff coverage is `91.66%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5167/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5167 +/- ## == + Coverage 78.44% 78.46% +0.01% == Files 466 467 +1 Lines 2973529762 +27 == + Hits2332723354 +27 Misses 6408 6408 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/executors/base\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvYmFzZV9leGVjdXRvci5weQ==) | `95.58% <ø> (-0.07%)` | :arrow_down: | | [airflow/executors/local\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvbG9jYWxfZXhlY3V0b3IucHk=) | `81.25% <100%> (+0.19%)` | :arrow_up: | | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.22% <100%> (+0.1%)` | :arrow_up: | | [airflow/contrib/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4ZWN1dG9ycy9rdWJlcm5ldGVzX2V4ZWN1dG9yLnB5) | `63.38% <100%> (+0.1%)` | :arrow_up: | | [airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `78.53% <50%> (+0.01%)` | :arrow_up: | | [airflow/utils/synchronized\_queue.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zeW5jaHJvbml6ZWRfcXVldWUucHk=) | `92.3% <92.3%> (ø)` | | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.59% <0%> (+0.17%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=footer). Last update [de1795c...9494571](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao edited a comment on issue #5019: [AIRFLOW-4199][AIP-3-step3] Remove all sys version info
feng-tao edited a comment on issue #5019: [AIRFLOW-4199][AIP-3-step3] Remove all sys version info URL: https://github.com/apache/airflow/pull/5019#issuecomment-486072340 @kaxil, @potiuk , @ashb, @bolkedebruin are we supposed to change the vendor files? I thought files under vendor dir are meant for copy paste from upstream without any change? Or I understand incorrectly? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5019: [AIRFLOW-4199][AIP-3-step3] Remove all sys version info
feng-tao commented on issue #5019: [AIRFLOW-4199][AIP-3-step3] Remove all sys version info URL: https://github.com/apache/airflow/pull/5019#issuecomment-486072369 cc @milton0825 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5019: [AIRFLOW-4199][AIP-3-step3] Remove all sys version info
feng-tao commented on issue #5019: [AIRFLOW-4199][AIP-3-step3] Remove all sys version info URL: https://github.com/apache/airflow/pull/5019#issuecomment-486072340 @kaxil, @potiuk , @ashb, @bolkedebruin are we supposed to change the vendor files(https://github.com/apache/airflow/commit/e36bdef0b34c16def20ecbb8248950070eb5fa33)? I thought files under vendor dir are meant for copy paste from upstream? Or I understand incorrectly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-4402) Update super() calls for nvd3
[ https://issues.apache.org/jira/browse/AIRFLOW-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824823#comment-16824823 ] Chao-Han Tsai commented on AIRFLOW-4402: Hmm i have seen changes on those files. > Update super() calls for nvd3 > - > > Key: AIRFLOW-4402 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4402 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} > by {{super().__init__(...)}} > Similarly for any other {{super}} calls for other methods. > (In Python 3 {{super(__class__, self) == super()}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] codecov-io edited a comment on issue #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue
codecov-io edited a comment on issue #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue URL: https://github.com/apache/airflow/pull/5167#issuecomment-486068982 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=h1) Report > Merging [#5167](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **increase** coverage by `0.01%`. > The diff coverage is `91.66%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5167/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5167 +/- ## == + Coverage 78.44% 78.46% +0.01% == Files 466 467 +1 Lines 2973529762 +27 == + Hits2332723353 +26 - Misses 6408 6409 +1 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/executors/base\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvYmFzZV9leGVjdXRvci5weQ==) | `95.58% <ø> (-0.07%)` | :arrow_down: | | [airflow/executors/local\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvbG9jYWxfZXhlY3V0b3IucHk=) | `81.25% <100%> (+0.19%)` | :arrow_up: | | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.22% <100%> (+0.1%)` | :arrow_up: | | [airflow/contrib/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4ZWN1dG9ycy9rdWJlcm5ldGVzX2V4ZWN1dG9yLnB5) | `63.38% <100%> (+0.1%)` | :arrow_up: | | [airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `78.53% <50%> (+0.01%)` | :arrow_up: | | [airflow/utils/synchronized\_queue.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zeW5jaHJvbml6ZWRfcXVldWUucHk=) | `92.3% <92.3%> (ø)` | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=footer). Last update [de1795c...9494571](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] milton0825 commented on issue #5143: [AIRFLOW-4204] Update super() calls to PY3
milton0825 commented on issue #5143: [AIRFLOW-4204] Update super() calls to PY3 URL: https://github.com/apache/airflow/pull/5143#issuecomment-486068967 Removed the second commit and created a separate PR: https://github.com/apache/airflow/pull/5168 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue
codecov-io commented on issue #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue URL: https://github.com/apache/airflow/pull/5167#issuecomment-486068982 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=h1) Report > Merging [#5167](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **increase** coverage by `0.01%`. > The diff coverage is `91.66%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5167/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5167 +/- ## == + Coverage 78.44% 78.46% +0.01% == Files 466 467 +1 Lines 2973529762 +27 == + Hits2332723353 +26 - Misses 6408 6409 +1 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/executors/base\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvYmFzZV9leGVjdXRvci5weQ==) | `95.58% <ø> (-0.07%)` | :arrow_down: | | [airflow/executors/local\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvbG9jYWxfZXhlY3V0b3IucHk=) | `81.25% <100%> (+0.19%)` | :arrow_up: | | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.22% <100%> (+0.1%)` | :arrow_up: | | [airflow/contrib/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4ZWN1dG9ycy9rdWJlcm5ldGVzX2V4ZWN1dG9yLnB5) | `63.38% <100%> (+0.1%)` | :arrow_up: | | [airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `78.53% <50%> (+0.01%)` | :arrow_up: | | [airflow/utils/synchronized\_queue.py](https://codecov.io/gh/apache/airflow/pull/5167/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zeW5jaHJvbml6ZWRfcXVldWUucHk=) | `92.3% <92.3%> (ø)` | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=footer). Last update [de1795c...9494571](https://codecov.io/gh/apache/airflow/pull/5167?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-4402) Update super() calls for nvd3
[ https://issues.apache.org/jira/browse/AIRFLOW-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824820#comment-16824820 ] Tao Feng commented on AIRFLOW-4402: --- hold on, I don't think we should change any of the files under vendor folder as all the file under that folder meant to be copy paste from upstream dependency. > Update super() calls for nvd3 > - > > Key: AIRFLOW-4402 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4402 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} > by {{super().__init__(...)}} > Similarly for any other {{super}} calls for other methods. > (In Python 3 {{super(__class__, self) == super()}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] milton0825 opened a new pull request #5168: [AIRFLOW-4204] Update super() calls to PY3 for nvd3
milton0825 opened a new pull request #5168: [AIRFLOW-4204] Update super() calls to PY3 for nvd3 URL: https://github.com/apache/airflow/pull/5168 (cherry picked from commit 52edaebf1a043120f8a9882b768fea0cf2270db1) Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [ ] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-4401) multiprocessing.Queue.empty() is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-4401: -- Description: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. It turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another process. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] Also we seemed to experienced that in Airflow before. In jobs.py years ago (31.07.2016) - we can see the comment below (but we used multiprocessing.queue empty() nevertheless): {code:java} # Not using multiprocessing.Queue() since it's no longer a separate # process and due to some unusual behavior. (empty() incorrectly # returns true?){code} The solution available in [https://bugs.python.org/issue23582] using qsize() was working on Linux but is not really acceptable because qsize() does not work on MacOS (throws NotImplementedError). The working solution is to implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has to be changed for python 3.4+ as described in [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. Luckily we are now Python3.5+ We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue in similar way in the future. was: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. It turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another process. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] Also we seemed to experienced that in Airflow before. In jobs.py years ago (31.07.2016) - we can see the comment below (but we used multiprocessing.queue empty() nevertheless): {code:java} # Not using multiprocessing.Queue() since it's no longer a separate # process and due to some unusual behavior. (empty() incorrectly # returns true?){code} The solution available in [https://bugs.python.org/issue23582] using qsize() was not usable because qsize() does not work on MacOS (throws NotImplementedError). The working solution is to implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has to be changed for python 3.4+ as described in [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. Luckily we are now Python3.5+ We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue in similar way in the future. > multiprocessing.Queue.empty() is unreliable > --- > > Key: AIRFLOW-4401 > URL:
[jira] [Updated] (AIRFLOW-4402) Update super() calls for nvd3
[ https://issues.apache.org/jira/browse/AIRFLOW-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-4402: --- Description: In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} by {{super().__init__(...)}} Similarly for any other {{super}} calls for other methods. (In Python 3 {{super(__class__, self) == super()}}) was: In all classes, replace {{super(__class__, self).__init__(...)}} by {{super().__init__(...)}} Similarly for any other {{super}} calls for other methods. (In Python 3 {{super(__class__, self) == super()}}) > Update super() calls for nvd3 > - > > Key: AIRFLOW-4402 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4402 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > In all classes under nvd3, replace {{super(__class__, self).__init__(...)}} > by {{super().__init__(...)}} > Similarly for any other {{super}} calls for other methods. > (In Python 3 {{super(__class__, self) == super()}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4402) Update super() calls for nvd3
Chao-Han Tsai created AIRFLOW-4402: -- Summary: Update super() calls for nvd3 Key: AIRFLOW-4402 URL: https://issues.apache.org/jira/browse/AIRFLOW-4402 Project: Apache Airflow Issue Type: Sub-task Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Fix For: 2.0.0 In all classes, replace {{super(__class__, self).__init__(...)}} by {{super().__init__(...)}} Similarly for any other {{super}} calls for other methods. (In Python 3 {{super(__class__, self) == super()}}) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4401) multiprocessing.Queue.empty() is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-4401: -- Description: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. It turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another process. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] Also we seemed to experienced that in Airflow before. In jobs.py years ago (31.07.2016) - we can see the comment below (but we used multiprocessing.queue empty() nevertheless): {code:java} # Not using multiprocessing.Queue() since it's no longer a separate # process and due to some unusual behavior. (empty() incorrectly # returns true?){code} The solution available in [https://bugs.python.org/issue23582] using qsize() was not usable because qsize() does not work on MacOS (throws NotImplementedError). The working solution is to implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has to be changed for python 3.4+ as described in [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. Luckily we are now Python3.5+ We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue in similar way in the future. was: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. It turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another process. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] Also we seemed to experienced that in Airflow before. In jobs.py year ago (31.07.2016) - we can see the comment below (but we used multiprocessing.queue empty() nevertheless): {code:java} # Not using multiprocessing.Queue() since it's no longer a separate # process and due to some unusual behavior. (empty() incorrectly # returns true?){code} The solution available in [https://bugs.python.org/issue23582] using qsize() was not usable because qsize() does not work on MacOS (throws NotImplementedError). The working solution is to implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has to be changed for python 3.4+ as described in [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. Luckily we are now Python3.5+ We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue in similar way in the future. > multiprocessing.Queue.empty() is unreliable > --- > > Key: AIRFLOW-4401 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4401 > Project: Apache
[GitHub] [airflow] milton0825 commented on issue #5041: [AIRFLOW-4208] Replace @abstractproperty by @abstractmethod
milton0825 commented on issue #5041: [AIRFLOW-4208] Replace @abstractproperty by @abstractmethod URL: https://github.com/apache/airflow/pull/5041#issuecomment-486066778 Shall we merge this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] jmcarp commented on issue #4821: [AIRFLOW-3998] Use nested commands in cli.
jmcarp commented on issue #4821: [AIRFLOW-3998] Use nested commands in cli. URL: https://github.com/apache/airflow/pull/4821#issuecomment-486065962 Updated documentation to match the new CLI structure. What's the next step here? A vote on the AIP? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-4401) multiprocessing.Queue.empty() is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-4401: -- Description: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. It turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another process. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] Also we seemed to experienced that in Airflow before. In jobs.py year ago (31.07.2016) - we can see the comment below (but we used multiprocessing.queue empty() nevertheless): {code:java} # Not using multiprocessing.Queue() since it's no longer a separate # process and due to some unusual behavior. (empty() incorrectly # returns true?){code} The solution available in [https://bugs.python.org/issue23582] using qsize() was not usable because qsize() does not work on MacOS (throws NotImplementedError). The working solution is to implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has to be changed for python 3.4+ as described in [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. Luckily we are now Python3.5+ We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue in similar way in the future. was: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. Itt turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another thread. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] Also we seemed to experienced that in Airflow before. In jobs.py year ago (31.07.2016) - we can see the comment below (but we used multiprocessing.queue empty() nevertheless): {code:java} # Not using multiprocessing.Queue() since it's no longer a separate # process and due to some unusual behavior. (empty() incorrectly # returns true?){code} The solution available in [https://bugs.python.org/issue23582] using qsize() was not usable because qsize() does not work on MacOS (throws NotImplementedError). The working solution is to implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has to be changed for python 3.4+ as described in [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. Luckily we are now Python3.5+ We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue in similar way in the future. > multiprocessing.Queue.empty() is unreliable > --- > > Key: AIRFLOW-4401 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4401 > Project: Apache
[GitHub] [airflow] potiuk commented on a change in pull request #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor
potiuk commented on a change in pull request #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor URL: https://github.com/apache/airflow/pull/5166#discussion_r277952601 ## File path: airflow/contrib/sensors/gcs_sensor.py ## @@ -160,3 +162,105 @@ def poke(self, context): google_cloud_storage_conn_id=self.google_cloud_conn_id, delegate_to=self.delegate_to) return bool(hook.list(self.bucket, prefix=self.prefix)) + + +class GoogleCloudStorageUploadSessionCompleteSensor(BaseSensorOperator): +""" +Checks for changes in the number of files at prefix in Google Cloud Storage +bucket and returns True if the inactivity period has passed with no +increase in the number of files. Note, it is recommended to use reschedule +mode if you expect this sensor to run for hours. + +:param bucket: The Google cloud storage bucket where the objects are. +expected. +:type bucket: str +:param prefix: The name of the prefix to check in the Google cloud +storage bucket. +:param inactivity_period: The total seconds of inactivity to designate +an upload session is over. Note, this mechanism is not real time and +this operator may not return until a poke_interval after this period +has passed with no additional files sensed. +:type inactivity_period: int +:param min_files: The minimum number of files needed for upload session +to be considered valid. +:type min_files: int +:param previous_num_files: The previous number of files before the next +iteration. +:type previous_num_files: int +:param inactivity_seconds: The current seconds of the inactivity period. +:type inactivity_seconds: int +:param google_cloud_conn_id: The connection ID to use when connecting +to Google cloud storage. +:type google_cloud_conn_id: str +:param delegate_to: The account to impersonate, if any. For this to work, +the service account making the request must have domain-wide +delegation enabled. +:type delegate_to: str +""" + +template_fields = ('bucket', 'prefix') +ui_color = '#f0eee4' + +@apply_defaults +def __init__(self, + bucket, + prefix, + inactivity_period=60 * 60, + min_files=1, + previous_num_files=0, + google_cloud_conn_id='google_cloud_default', + delegate_to=None, + *args, **kwargs): + +super(GoogleCloudStorageUploadSessionCompleteSensor, self).__init__(*args, **kwargs) + +self.bucket = bucket +self.prefix = prefix +self.google_cloud_conn_id = google_cloud_conn_id +self.inactivity_period = inactivity_period +self.min_files = min_files +self.previous_num_files = previous_num_files +self.inactivity_seconds = 0 +self.google_cloud_conn_id = google_cloud_conn_id +self.delegate_to = delegate_to + +def is_bucket_updated(self, current_num_files): +""" +Checks whether new files have been uploaded and the inactivity_period +has passed and updates the state of the sensor accordingly. + +:param current_num_files: number of files in bucket during last poke. +:type current_num_files: int +""" + +if current_num_files > self.previous_num_files: +# When new files arrived, reset the inactivity_seconds +# previous_num_files for the next poke. +self.inactivity_seconds = 0 +self.previous_num_files = current_num_files +else: +self.inactivity_seconds = self.inactivity_seconds \ ++ self.poke_interval Review comment: I think it would have been better to use the actual system time for inactivity seconds calculation (current_time - last_activity_time). Using poke_interval - especially multiple times - does not take into account potential scheduling delays. There is no guarantee that the "is_bucket_updated" will be always called exactly after poke_interval seconds. It will be a bit more difficult for unit testing but it can be done with mocking the current_time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] jmcarp commented on issue #5030: [AIRFLOW-4227] Use python3-style type annotations.
jmcarp commented on issue #5030: [AIRFLOW-4227] Use python3-style type annotations. URL: https://github.com/apache/airflow/pull/5030#issuecomment-486064887 This should be ready to go now that #5090 is finished. cc @Fokko @kaxil This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-4401) multiprocessing.Queue is unreliable
Jarek Potiuk created AIRFLOW-4401: - Summary: multiprocessing.Queue is unreliable Key: AIRFLOW-4401 URL: https://issues.apache.org/jira/browse/AIRFLOW-4401 Project: Apache Airflow Issue Type: Bug Reporter: Jarek Potiuk After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. Itt turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another thread. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] The solution available in [https://bugs.python.org/issue23582] using qsize() was not usable because qsize() does not work on MacOS (throws NotImplementedError). The working solution is to implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has to be changed for python 3.4+ as described in [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. Luckily we are now Python3.5+ We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] potiuk edited a comment on issue #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor
potiuk edited a comment on issue #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor URL: https://github.com/apache/airflow/pull/5159#issuecomment-486063506 Hey @BasPH @ashb . I woke up this morning and could not stop thinking about that one and I took a dep dive. It was great idea @BasPH to make it reproducible locally, that has helped me a lot. I added a lot of logging and eventually it boiled down to cases where queue.empty() returned True where queue.qsize() returned 1. I thought I lost my mind for a while, but well ... It turned out that this this is INTENDED BEHAVIOUR of multiprocessing.Queue. See the issue https://bugs.python.org/issue23582 (resolved as "not a bug"). I have no words for it, but well, we needed to solve it. Unfortunately simple solution does not work on MacOS (qsize() does not work on MacOS and throws NotImplemented) ... So at the end I had to do what others did in the same case - have a custom SynchronizedQueue deriving from multiprocessing.Queue and add extra synchronisation/counter to put/get methods :( I described all details in https://issues.apache.org/jira/browse/AIRFLOW-4401 and created a PR to fix it here: https://github.com/apache/airflow/pull/5167 Speaking about productive morning :). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor
potiuk commented on issue #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor URL: https://github.com/apache/airflow/pull/5159#issuecomment-486063506 Hey @BasPH @ashb . I woke up this morning and could not stop thinking about that one and I took a dep dive. It was great idea @BasPH to make it reproducible locally, that has helped me a lot. I added a lot of logging. Eventually it boiled down to cases where queue.empty() returned True where queue.qsize() returned 1. I thought I lost my mind for a while, but well ... It turned out that this this is INTENDED BEHAVIOUR of multiprocessing.Queue. See the issue https://bugs.python.org/issue23582 (resolved as "not a bug"). I have no words for it, but well, we needed to solve it. Unfortunately simple solution does not work on MacOS (qsize() does not work on MacOS and throws NotImplemented) ... So at the end I had to do what others did in the same case - have a custom SynchronizedQueue deriving from multiprocessing.Queue and add extra synchronisation/counter to put/get methods :( I described all details in https://issues.apache.org/jira/browse/AIRFLOW-4401 and created a PR to fix it here: https://github.com/apache/airflow/pull/5167 Speaking about productive morning :). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk opened a new pull request #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue
potiuk opened a new pull request #5167: Reliable SynchronizedQueue used instead of multiprocessing.Queue URL: https://github.com/apache/airflow/pull/5167 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-4401 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: It solves unreliable behaviour of multiprocessing.Queue.empty() - sometimes returning True when the queue is not really empty. ### Tests - [x] My PR does not need testing for this extremely good reason: We already have LocalExecutor and other tests that will just become less flaky. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-4401) multiprocessing.Queue is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-4401: -- Description: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. Itt turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another thread. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] Also we seemed to experienced that in Airflow before. In jobs.py year ago (31.07.2016) - we can see the comment below (but we used multiprocessing.queue empty() nevertheless): {code:java} # Not using multiprocessing.Queue() since it's no longer a separate # process and due to some unusual behavior. (empty() incorrectly # returns true?){code} The solution available in [https://bugs.python.org/issue23582] using qsize() was not usable because qsize() does not work on MacOS (throws NotImplementedError). The working solution is to implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has to be changed for python 3.4+ as described in [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. Luckily we are now Python3.5+ We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue in similar way in the future. was: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. Itt turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another thread. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] The solution available in [https://bugs.python.org/issue23582] using qsize() was not usable because qsize() does not work on MacOS (throws NotImplementedError). The working solution is to implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has to be changed for python 3.4+ as described in [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. Luckily we are now Python3.5+ We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue > multiprocessing.Queue is unreliable > --- > > Key: AIRFLOW-4401 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4401 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jarek Potiuk >Priority: Major > > After discussing with [~ash] and [~BasPH] potential reasons for flakiness of > LocalExecutor tests, I took a deeper dive into the problem and what I found > raised the remaining hair on top of my head. > We had a number of flaky tests in the local executor that resulted in > result_queue not being empty where it should have been emptied
[jira] [Updated] (AIRFLOW-4401) multiprocessing.Queue.empty() is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-4401: -- Summary: multiprocessing.Queue.empty() is unreliable (was: multiprocessing.Queue is unreliable) > multiprocessing.Queue.empty() is unreliable > --- > > Key: AIRFLOW-4401 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4401 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jarek Potiuk >Priority: Major > > After discussing with [~ash] and [~BasPH] potential reasons for flakiness of > LocalExecutor tests, I took a deeper dive into the problem and what I found > raised the remaining hair on top of my head. > We had a number of flaky tests in the local executor that resulted in > result_queue not being empty where it should have been emptied a moment > before. More details and discussion can be found in > [https://github.com/apache/airflow/pull/5159] > The problem turned out to be ... unreliability of multiprocessing.Queue > empty() implementation. Itt turned out that multiprocessing.Queue.empty() > implementation is not fully synchronized and it might return True even if > put() operation has been already completed in another thread. What's more - > empty() might return True even if qsize() of the queue returns > 0 (!) > It's a bit mind-boggling but it is "as intended' as documented in > [https://bugs.python.org/issue23582] (resolved as "not a bug" ) > A few people have stumbled upon this problem. For example > [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] > and [https://github.com/keras-team/autokeras/issues/368] > Also we seemed to experienced that in Airflow before. In jobs.py year ago > (31.07.2016) - we can see the comment below (but we used > multiprocessing.queue empty() nevertheless): > {code:java} > # Not using multiprocessing.Queue() since it's no longer a separate > # process and due to some unusual behavior. (empty() incorrectly > # returns true?){code} > The solution available in [https://bugs.python.org/issue23582] using qsize() > was not usable because qsize() does not work on MacOS (throws > NotImplementedError). > The working solution is to implement a reliable queue (SynchronizedQueue) > based on > [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] > (butwith a twist that __init__ of class deriving from Queue has to be > changed for python 3.4+ as described in > [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. > Luckily we are now Python3.5+ > We should replace all usages of multiprocessing.Queue where empty() is used > with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue > in similar way in the future. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] codecov-io edited a comment on issue #4821: [AIRFLOW-3998] Use nested commands in cli.
codecov-io edited a comment on issue #4821: [AIRFLOW-3998] Use nested commands in cli. URL: https://github.com/apache/airflow/pull/4821#issuecomment-468981700 # [Codecov](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=h1) Report > Merging [#4821](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `82.92%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/4821/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=tree) ```diff @@Coverage Diff @@ ## master#4821 +/- ## == + Coverage 78.44% 78.45% +<.01% == Files 466 466 Lines 2973529706 -29 == - Hits2332723305 -22 + Misses 6408 6401 -7 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=tree) | Coverage Δ | | |---|---|---| | [...ow/contrib/example\_dags/example\_qubole\_operator.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX3F1Ym9sZV9vcGVyYXRvci5weQ==) | `0% <ø> (ø)` | :arrow_up: | | [...flow/contrib/example\_dags/example\_qubole\_sensor.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX3F1Ym9sZV9zZW5zb3IucHk=) | `0% <ø> (ø)` | :arrow_up: | | [...le\_dags/example\_passing\_params\_via\_test\_command.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvZXhhbXBsZV9wYXNzaW5nX3BhcmFtc192aWFfdGVzdF9jb21tYW5kLnB5) | `100% <ø> (ø)` | :arrow_up: | | [airflow/models/crypto.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvY3J5cHRvLnB5) | `79.41% <ø> (ø)` | :arrow_up: | | [...trib/example\_dags/example\_azure\_cosmosdb\_sensor.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX2F6dXJlX2Nvc21vc2RiX3NlbnNvci5weQ==) | `0% <ø> (ø)` | :arrow_up: | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.59% <100%> (+0.17%)` | :arrow_up: | | [airflow/bin/cli.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `66.23% <82.84%> (-0.4%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=footer). Last update [de1795c...5b67478](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io edited a comment on issue #4821: [AIRFLOW-3998] Use nested commands in cli.
codecov-io edited a comment on issue #4821: [AIRFLOW-3998] Use nested commands in cli. URL: https://github.com/apache/airflow/pull/4821#issuecomment-468981700 # [Codecov](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=h1) Report > Merging [#4821](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `82.92%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/4821/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=tree) ```diff @@Coverage Diff @@ ## master#4821 +/- ## == + Coverage 78.44% 78.45% +<.01% == Files 466 466 Lines 2973529706 -29 == - Hits2332723305 -22 + Misses 6408 6401 -7 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=tree) | Coverage Δ | | |---|---|---| | [...ow/contrib/example\_dags/example\_qubole\_operator.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX3F1Ym9sZV9vcGVyYXRvci5weQ==) | `0% <ø> (ø)` | :arrow_up: | | [...flow/contrib/example\_dags/example\_qubole\_sensor.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX3F1Ym9sZV9zZW5zb3IucHk=) | `0% <ø> (ø)` | :arrow_up: | | [...le\_dags/example\_passing\_params\_via\_test\_command.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvZXhhbXBsZV9wYXNzaW5nX3BhcmFtc192aWFfdGVzdF9jb21tYW5kLnB5) | `100% <ø> (ø)` | :arrow_up: | | [airflow/models/crypto.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvY3J5cHRvLnB5) | `79.41% <ø> (ø)` | :arrow_up: | | [...trib/example\_dags/example\_azure\_cosmosdb\_sensor.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX2F6dXJlX2Nvc21vc2RiX3NlbnNvci5weQ==) | `0% <ø> (ø)` | :arrow_up: | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.59% <100%> (+0.17%)` | :arrow_up: | | [airflow/bin/cli.py](https://codecov.io/gh/apache/airflow/pull/4821/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `66.23% <82.84%> (-0.4%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=footer). Last update [de1795c...5b67478](https://codecov.io/gh/apache/airflow/pull/4821?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] jaketf opened a new pull request #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor
jaketf opened a new pull request #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor URL: https://github.com/apache/airflow/pull/5166 This commit add a GoogleCloudStorageUploadSessionCompleteSensor to address the use case of accepting files from a third party vendor who refuses to send a success indicator when providing data files into a bucket and waiting until an inactivity period has passed to indicate the end of an upload session. Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow-4937](https://issues.apache.org/jira/browse/AIRFLOW-4397) issues and references them in the PR title. ### Description - [x] This PR add a GCS sensor to poke a bucket until some inactivity period has passed with no new files added to the bucket. ### Tests - [x] My PR adds the following unit tests: `tests/contrib/sensors/test_gcs_upload_session_sensor.py` ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5165: [AIRFLOW-4399] Avoid duplicated os.path.isfile() check in models.dagbag
codecov-io commented on issue #5165: [AIRFLOW-4399] Avoid duplicated os.path.isfile() check in models.dagbag URL: https://github.com/apache/airflow/pull/5165#issuecomment-486059506 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5165?src=pr=h1) Report > Merging [#5165](https://codecov.io/gh/apache/airflow/pull/5165?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **not change** coverage. > The diff coverage is `100%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5165/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5165?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#5165 +/- ## === Coverage 78.44% 78.44% === Files 466 466 Lines 2973529735 === Hits2332723327 Misses 6408 6408 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5165?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/models/dagbag.py](https://codecov.io/gh/apache/airflow/pull/5165/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnYmFnLnB5) | `91.82% <100%> (ø)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5165?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5165?src=pr=footer). Last update [de1795c...7966d38](https://codecov.io/gh/apache/airflow/pull/5165?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-4400) Wrong filemode on three files
[ https://issues.apache.org/jira/browse/AIRFLOW-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beau Barker updated AIRFLOW-4400: - Description: Three files inside the models directory have the wrong file mode. __init__.py, taskfail.py and taskreschedule.py. {noformat} $ ls -l airflow/models total 232K -rwxr-xr-x 1 beau staff 176K Apr 8 10:38 __init__.py* -rw-r--r-- 1 beau staff 1.2K Apr 5 13:47 base.py -rw-r--r-- 1 beau staff 12K Apr 8 09:28 connection.py -rw-r--r-- 1 beau staff 1.9K Apr 5 13:47 dagpickle.py -rw-r--r-- 1 beau staff 1.2K Apr 5 13:47 errors.py -rw-r--r-- 1 beau staff 2.8K Apr 5 13:47 kubernetes.py -rw-r--r-- 1 beau staff 2.2K Apr 5 13:47 log.py -rw-r--r-- 1 beau staff 2.6K Apr 5 13:47 skipmixin.py -rw-r--r-- 1 beau staff 1.8K Apr 5 13:47 slamiss.py -rwxr-xr-x 1 beau staff 2.0K Apr 5 13:47 taskfail.py* -rwxr-xr-x 1 beau staff 3.3K Apr 5 13:47 taskreschedule.py* -rw-r--r-- 1 beau staff 7.5K Apr 8 09:28 xcom.py {noformat} was: Three files inside the models directory have the wrong file mode. `__init__.py`, `taskfail.py` and `taskreschedule.py`. ``` $ ls -l airflow/models total 232K -rwxr-xr-x 1 beau staff 176K Apr 8 10:38 __init__.py* -rw-r--r-- 1 beau staff 1.2K Apr 5 13:47 base.py -rw-r--r-- 1 beau staff 12K Apr 8 09:28 connection.py -rw-r--r-- 1 beau staff 1.9K Apr 5 13:47 dagpickle.py -rw-r--r-- 1 beau staff 1.2K Apr 5 13:47 errors.py -rw-r--r-- 1 beau staff 2.8K Apr 5 13:47 kubernetes.py -rw-r--r-- 1 beau staff 2.2K Apr 5 13:47 log.py -rw-r--r-- 1 beau staff 2.6K Apr 5 13:47 skipmixin.py -rw-r--r-- 1 beau staff 1.8K Apr 5 13:47 slamiss.py -rwxr-xr-x 1 beau staff 2.0K Apr 5 13:47 taskfail.py* -rwxr-xr-x 1 beau staff 3.3K Apr 5 13:47 taskreschedule.py* -rw-r--r-- 1 beau staff 7.5K Apr 8 09:28 xcom.py ``` > Wrong filemode on three files > - > > Key: AIRFLOW-4400 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4400 > Project: Apache Airflow > Issue Type: Bug > Components: models >Affects Versions: 1.10.3 >Reporter: Beau Barker >Assignee: Beau Barker >Priority: Trivial > > Three files inside the models directory have the wrong file mode. > __init__.py, taskfail.py and taskreschedule.py. > {noformat} > $ ls -l airflow/models > total 232K > -rwxr-xr-x 1 beau staff 176K Apr 8 10:38 __init__.py* > -rw-r--r-- 1 beau staff 1.2K Apr 5 13:47 base.py > -rw-r--r-- 1 beau staff 12K Apr 8 09:28 connection.py > -rw-r--r-- 1 beau staff 1.9K Apr 5 13:47 dagpickle.py > -rw-r--r-- 1 beau staff 1.2K Apr 5 13:47 errors.py > -rw-r--r-- 1 beau staff 2.8K Apr 5 13:47 kubernetes.py > -rw-r--r-- 1 beau staff 2.2K Apr 5 13:47 log.py > -rw-r--r-- 1 beau staff 2.6K Apr 5 13:47 skipmixin.py > -rw-r--r-- 1 beau staff 1.8K Apr 5 13:47 slamiss.py > -rwxr-xr-x 1 beau staff 2.0K Apr 5 13:47 taskfail.py* > -rwxr-xr-x 1 beau staff 3.3K Apr 5 13:47 taskreschedule.py* > -rw-r--r-- 1 beau staff 7.5K Apr 8 09:28 xcom.py > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4400) Wrong filemode on three files
Beau Barker created AIRFLOW-4400: Summary: Wrong filemode on three files Key: AIRFLOW-4400 URL: https://issues.apache.org/jira/browse/AIRFLOW-4400 Project: Apache Airflow Issue Type: Bug Components: models Affects Versions: 1.10.3 Reporter: Beau Barker Assignee: Beau Barker Three files inside the models directory have the wrong file mode. `__init__.py`, `taskfail.py` and `taskreschedule.py`. ``` $ ls -l airflow/models total 232K -rwxr-xr-x 1 beau staff 176K Apr 8 10:38 __init__.py* -rw-r--r-- 1 beau staff 1.2K Apr 5 13:47 base.py -rw-r--r-- 1 beau staff 12K Apr 8 09:28 connection.py -rw-r--r-- 1 beau staff 1.9K Apr 5 13:47 dagpickle.py -rw-r--r-- 1 beau staff 1.2K Apr 5 13:47 errors.py -rw-r--r-- 1 beau staff 2.8K Apr 5 13:47 kubernetes.py -rw-r--r-- 1 beau staff 2.2K Apr 5 13:47 log.py -rw-r--r-- 1 beau staff 2.6K Apr 5 13:47 skipmixin.py -rw-r--r-- 1 beau staff 1.8K Apr 5 13:47 slamiss.py -rwxr-xr-x 1 beau staff 2.0K Apr 5 13:47 taskfail.py* -rwxr-xr-x 1 beau staff 3.3K Apr 5 13:47 taskreschedule.py* -rw-r--r-- 1 beau staff 7.5K Apr 8 09:28 xcom.py ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-4324) Partial search in RBAC UI no longer works
[ https://issues.apache.org/jira/browse/AIRFLOW-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaodong DENG resolved AIRFLOW-4324. Resolution: Resolved > Partial search in RBAC UI no longer works > - > > Key: AIRFLOW-4324 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4324 > Project: Apache Airflow > Issue Type: Bug > Components: ui >Affects Versions: 1.10.3 >Reporter: Robin Edwards >Assignee: sumous >Priority: Minor > Fix For: 1.10.4 > > > In airflow 1.10.2 RBAC UI searching for part of a dag name worked. For > example if I had multiple dags named: > foo_bar_1 > foo_bar_2 > foo_bar_3 > Searching for "foo" would return all 3 dags. As of upgrading to 1.10.3 the > same search yields no results. On the result page after no results are > returned the search autocomplete feature no longer suggest dags as I type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] XD-DENG edited a comment on issue #5131: [AIRFLOW-4324] fix search in RBAC UI
XD-DENG edited a comment on issue #5131: [AIRFLOW-4324] fix search in RBAC UI URL: https://github.com/apache/airflow/pull/5131#issuecomment-486053307 On the other hand, we have the same issue in `master` branch https://github.com/apache/airflow/blob/master/airflow/www/views.py#L239 @sumous , May you raise another PR to fix it in master as well? Thanks (you may use the same JIRA ticket number) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-4324) Partial search in RBAC UI no longer works
[ https://issues.apache.org/jira/browse/AIRFLOW-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824784#comment-16824784 ] ASF GitHub Bot commented on AIRFLOW-4324: - XD-DENG commented on pull request #5131: [AIRFLOW-4324] fix search in RBAC UI URL: https://github.com/apache/airflow/pull/5131 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Partial search in RBAC UI no longer works > - > > Key: AIRFLOW-4324 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4324 > Project: Apache Airflow > Issue Type: Bug > Components: ui >Affects Versions: 1.10.3 >Reporter: Robin Edwards >Assignee: sumous >Priority: Minor > Fix For: 1.10.4 > > > In airflow 1.10.2 RBAC UI searching for part of a dag name worked. For > example if I had multiple dags named: > foo_bar_1 > foo_bar_2 > foo_bar_3 > Searching for "foo" would return all 3 dags. As of upgrading to 1.10.3 the > same search yields no results. On the result page after no results are > returned the search autocomplete feature no longer suggest dags as I type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4324) Partial search in RBAC UI no longer works
[ https://issues.apache.org/jira/browse/AIRFLOW-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824785#comment-16824785 ] ASF subversion and git services commented on AIRFLOW-4324: -- Commit 57013ecf01acc3f5e95e2c9668181b415a1ab47d in airflow's branch refs/heads/v1-10-stable from sumous [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=57013ec ] [AIRFLOW-4324] fix DAG fuzzy search in RBAC UI (#5131) > Partial search in RBAC UI no longer works > - > > Key: AIRFLOW-4324 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4324 > Project: Apache Airflow > Issue Type: Bug > Components: ui >Affects Versions: 1.10.3 >Reporter: Robin Edwards >Assignee: sumous >Priority: Minor > Fix For: 1.10.4 > > > In airflow 1.10.2 RBAC UI searching for part of a dag name worked. For > example if I had multiple dags named: > foo_bar_1 > foo_bar_2 > foo_bar_3 > Searching for "foo" would return all 3 dags. As of upgrading to 1.10.3 the > same search yields no results. On the result page after no results are > returned the search autocomplete feature no longer suggest dags as I type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] XD-DENG merged pull request #5131: [AIRFLOW-4324] fix search in RBAC UI
XD-DENG merged pull request #5131: [AIRFLOW-4324] fix search in RBAC UI URL: https://github.com/apache/airflow/pull/5131 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Assigned] (AIRFLOW-3143) Support auto-zone in DataprocClusterCreateOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Croteau reassigned AIRFLOW-3143: - Assignee: Joel Croteau (was: fengya li) > Support auto-zone in DataprocClusterCreateOperator > -- > > Key: AIRFLOW-3143 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3143 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Reporter: Wilson Lian >Assignee: Joel Croteau >Priority: Minor > > [Dataproc > Auto-zone|https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/auto-zone] > allows users to omit the zone when creating a cluster, and the service will > pick a zone in the chosen region. > Providing an empty string or None for `zone` would match up with how users > would request auto-zone via direct API access, but as-is the > DataprocClusterCreateOperator makes a bad API request when such values are > passed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4399) To avoid duplicated os.path.isfile(filepath) checks in models.dagbag
[ https://issues.apache.org/jira/browse/AIRFLOW-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824766#comment-16824766 ] ASF GitHub Bot commented on AIRFLOW-4399: - XD-DENG commented on pull request #5165: [AIRFLOW-4399] Avoid duplicated os.path.isfile() check in models.dagbag URL: https://github.com/apache/airflow/pull/5165 ### Jira - https://issues.apache.org/jira/browse/AIRFLOW-4399 ### Description The check which is removed in this commit always returns `True` for sure, because the same check has already been done earlier in this function body https://github.com/apache/airflow/blob/de1795c8f1e65a5f4a848e88ef3848ba51aab449/airflow/models/dagbag.py#L157-L158 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > To avoid duplicated os.path.isfile(filepath) checks in models.dagbag > > > Key: AIRFLOW-4399 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4399 > Project: Apache Airflow > Issue Type: Improvement > Components: models >Affects Versions: 1.10.3 >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > Fix For: 1.10.4 > > > There are duplicated os.path.isfile(filepath) checks, among which the 2nd one > is not necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] XD-DENG opened a new pull request #5165: [AIRFLOW-4399] Avoid duplicated os.path.isfile() check in models.dagbag
XD-DENG opened a new pull request #5165: [AIRFLOW-4399] Avoid duplicated os.path.isfile() check in models.dagbag URL: https://github.com/apache/airflow/pull/5165 ### Jira - https://issues.apache.org/jira/browse/AIRFLOW-4399 ### Description The check which is removed in this commit always returns `True` for sure, because the same check has already been done earlier in this function body https://github.com/apache/airflow/blob/de1795c8f1e65a5f4a848e88ef3848ba51aab449/airflow/models/dagbag.py#L157-L158 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-4399) To avoid duplicated os.path.isfile(filepath) checks in models.dagbag
[ https://issues.apache.org/jira/browse/AIRFLOW-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaodong DENG updated AIRFLOW-4399: --- Summary: To avoid duplicated os.path.isfile(filepath) checks in models.dagbag (was: To remove duplicated os.path.isfile(filepath) checks in models.dagbag) > To avoid duplicated os.path.isfile(filepath) checks in models.dagbag > > > Key: AIRFLOW-4399 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4399 > Project: Apache Airflow > Issue Type: Improvement > Components: models >Affects Versions: 1.10.3 >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > Fix For: 1.10.4 > > > There are duplicated os.path.isfile(filepath) checks, among which the 2nd one > is not necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4399) To remove duplicated os.path.isfile(filepath) checks in models.dagbag
Xiaodong DENG created AIRFLOW-4399: -- Summary: To remove duplicated os.path.isfile(filepath) checks in models.dagbag Key: AIRFLOW-4399 URL: https://issues.apache.org/jira/browse/AIRFLOW-4399 Project: Apache Airflow Issue Type: Improvement Components: models Affects Versions: 1.10.3 Reporter: Xiaodong DENG Assignee: Xiaodong DENG Fix For: 1.10.4 There are duplicated os.path.isfile(filepath) checks, among which the 2nd one is not necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] milton0825 commented on a change in pull request #5094: [AIRFLOW-4306] Global operator extra links
milton0825 commented on a change in pull request #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#discussion_r277924790 ## File path: docs/plugins.rst ## @@ -202,6 +212,13 @@ definitions in Airflow. def stat_name_dummy_handler(stat_name): return stat_name +# A global operator extra link that redirect you to google +class GoogleLink(BaseOperatorLink): Review comment: This is for the testing purpose so I want to make it as simple as possible This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5094: [AIRFLOW-4306] Global operator extra links
feng-tao commented on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486027034 @KevinYang21 , not sure how to DM you, but you may want to check your email :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] KevinYang21 edited a comment on issue #5094: [AIRFLOW-4306] Global operator extra links
KevinYang21 edited a comment on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486025570 @feng-tao Took a quick look at the code and I think that's because we start to construct the links from operator instead of from task instance. But since we already have the `execution_date` we can always construct the TI just like other endpoints and use whatever info is inside. While getting the TI fields is easy, my thinking is still more on the form of the link. The fact that we can just construct a link button right now is probly not going to look very nice when we need some other format, like when we want the download log look for our kibana log links. EDIT: (hit comment button too soon) I'm not trying to scope out this PR, I think there might be use cases that a link button is enough. If it is not trivial to create such a framework for more complex pluggable components in the dialog it makes sense to start another PR around that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao edited a comment on issue #5094: [AIRFLOW-4306] Global operator extra links
feng-tao edited a comment on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486026307 yeah, FORM/Design/UI is always a tough question BTW @KevinYang21 , is yrql...@gmail.com your email ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5094: [AIRFLOW-4306] Global operator extra links
feng-tao commented on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486026307 yeah, FORM/Design/UI is always a tough question BTW, @KevinYang21 , is yrql...@gmail.com your email ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] zhongjiajie commented on a change in pull request #5153: [AIRFLOW-XXX] Name default config_file param in KubernetesPodOperator docstring
zhongjiajie commented on a change in pull request #5153: [AIRFLOW-XXX] Name default config_file param in KubernetesPodOperator docstring URL: https://github.com/apache/airflow/pull/5153#discussion_r277922027 ## File path: airflow/contrib/operators/kubernetes_pod_operator.py ## @@ -71,7 +71,8 @@ class KubernetesPodOperator(BaseOperator): :type affinity: dict :param node_selectors: A dict containing a group of scheduling rules :type node_selectors: dict -:param config_file: The path to the Kubernetes config file +:param config_file: The path to the Kubernetes config file. +If not specified, defaults to ``~/.kube/config`` Review comment: Looking great. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] KevinYang21 commented on issue #5094: [AIRFLOW-4306] Global operator extra links
KevinYang21 commented on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486025570 @feng-tao Took a quick look at the code and I think that's because we start to construct the links from operator instead of from task instance. But since we already have the `execution_date` we can always construct the TI just like other endpoints and use whatever info is inside. While getting the TI fields is easy, my thinking is still more on the form of the link. The fact that we can just construct a link button right now is probly not going to look very nice when we need some other format, like when we want the download log look for our kibana log links. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] milton0825 commented on issue #5094: [AIRFLOW-4306] Global operator extra links
milton0825 commented on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486025327 @feng-tao we exposed the operator so all the following fields should be available to user: https://github.com/apache/airflow/blob/8be65327e8cd1f4cf7dedcb90b8eacc4fa5f635d/airflow/models/baseoperator.py#L241-L274 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] milton0825 commented on a change in pull request #5094: [AIRFLOW-4306] Global operator extra links
milton0825 commented on a change in pull request #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#discussion_r277920779 ## File path: tests/plugins/test_plugin.py ## @@ -100,6 +114,10 @@ class AirflowTestPlugin(AirflowPlugin): appbuilder_views = [v_appbuilder_package] appbuilder_menu_items = [appbuilder_mitem] stat_name_handler = staticmethod(stat_name_dummy_handler) +global_operator_extra_links = [ +GoogleLink(), Review comment: Yeah I think that make more sense to let the user know our intent This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] milton0825 commented on a change in pull request #5094: [AIRFLOW-4306] Global operator extra links
milton0825 commented on a change in pull request #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#discussion_r277920575 ## File path: airflow/models/baseoperator.py ## @@ -973,6 +986,25 @@ class BaseOperatorLink: __metaclass__ = ABCMeta +@property +@abstractmethod +def name(self): +# type: () -> str +""" +Name of the link. This will be the button name on the task UI. + +:return: link name +""" +pass + @abstractmethod def get_link(self, operator, dttm): +# type: (BaseOperator, datetime) -> str +""" +Link to external system. + +:param operator: airflow operator Review comment: I have a type hint in: https://github.com/apache/airflow/pull/5094/files/8be65327e8cd1f4cf7dedcb90b8eacc4fa5f635d#diff-b5ec97b8301ab688b7e045426f24d485R1002 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-4398) Extra links include try_number
Chao-Han Tsai created AIRFLOW-4398: -- Summary: Extra links include try_number Key: AIRFLOW-4398 URL: https://issues.apache.org/jira/browse/AIRFLOW-4398 Project: Apache Airflow Issue Type: New Feature Reporter: Chao-Han Tsai Assignee: Chao-Han Tsai Currently airflow supports constructing extra links using task and execution_date. https://github.com/apache/airflow/blob/master/airflow/models/baseoperator.py#L977 We should also include the try_number into the scope. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] milton0825 edited a comment on issue #5094: [AIRFLOW-4306] Global operator extra links
milton0825 edited a comment on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486023240 @KevinYang21 @kunalyogenshah I think it totally make sense to include the try-number into the scope. I am thinking about addressing that in a separate PR to limit the scope of changes in this PR and here is the jira to track that: https://issues.apache.org/jira/browse/AIRFLOW-4398 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] milton0825 commented on issue #5094: [AIRFLOW-4306] Global operator extra links
milton0825 commented on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486023240 @KevinYang21 @kunalyogenshah I think it totally make sense to include the try-number into the scope. I am thinking about addressing that in a separate PR and here is the jira to track that: https://issues.apache.org/jira/browse/AIRFLOW-4398 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-4397) Add GCSUploadSessionCompleteSensor
Jacob Ferriero created AIRFLOW-4397: --- Summary: Add GCSUploadSessionCompleteSensor Key: AIRFLOW-4397 URL: https://issues.apache.org/jira/browse/AIRFLOW-4397 Project: Apache Airflow Issue Type: New Feature Components: contrib Reporter: Jacob Ferriero Assignee: Jacob Ferriero I'd like to contribute a Sensor for Google Cloud Storage that can poke a bucket until there has been sufficient time without a new file drop. Often times there are cases where a third party vendor drops data to a bucket but don't send a success flag when they are done. This sensor would allow you to poke every n minutes to check if more files have been added since the last poke, and if there had been `inactivity_period` minutes without a new file drop, return `True`. This could allow SLA misses if data did not arrive by an expected time, and have a configurable deadline past which the sensor would fail. Optionally the user could specify a minimum number of files for the sensor to succeed. This would be my first time contributing to an OSS project, so please let me know if this is not the appropriate place to start. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] KevinYang21 commented on issue #5094: [AIRFLOW-4306] Global operator extra links
KevinYang21 commented on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486021540 Nice idea to make it pluggable. We do also have the need to put a Kibana link on the dialog but as @kunalyogenshah mentioned, for task logs I think it makes more sense to provide different links for different `try_number`. Possible that we can make it so flexible that we can have plugin section just like the download log section? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] kunalyogenshah commented on issue #5094: [AIRFLOW-4306] Global operator extra links
kunalyogenshah commented on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486019322 @milton0825 Got it. Constructing it this way is certainly doable. But somehow it does seem like introducing the Kibana log URLs in a more central way like the Download log links / View log links could be more semantically appropriate than adding them through global extra links? Specially if you consider adding try number to the mix, in which case a Kibana global should return multi-links instead of one link. Just a thought. + @KevinYang21 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] milton0825 commented on issue #5094: [AIRFLOW-4306] Global operator extra links
milton0825 commented on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486015757 @kunalyogenshah so currently the operator instance and the `execution_time` are exposed to you like in: https://github.com/apache/airflow/pull/5094/files#diff-2fdb519812514e29589938d977d68ea0R94 and you should have the information (dag_id, task_id,...) to formulate the URL with granularity to each task_instance. If you need information such as log id that is out of the scope of airflow, you can do whatever you need in the `get_link` to get that information. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-4346) Kubernetes Executor Fails for Large Wide DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824629#comment-16824629 ] Vincent Castaneda commented on AIRFLOW-4346: Has anyone been looking into this? > Kubernetes Executor Fails for Large Wide DAGs > - > > Key: AIRFLOW-4346 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4346 > Project: Apache Airflow > Issue Type: Bug > Components: DAG, executor, kubernetes >Affects Versions: 1.10.2, 1.10.3 >Reporter: Vincent Castaneda >Priority: Blocker > Attachments: configmap-airflow-share.yaml, sched_logs.txt, > wide_dag_bash_test.py, wide_dag_test_100_300.py, wide_dag_test_300_300.py > > > When running large DAGs–those with parallelism of over 100 task instances to > be running concurrently--several tasks fail on the executor and are reported > to the database, but the scheduler is never aware of them failing. > Attached are: > - A test DAG that we can use to replicate the issue. > - The configmap-airflow.yaml file > I will be available to answer any other questions that are raised about our > configuration. We are running this on GKE and giving the scheduler and web > pod a base 100m for execution. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] kunalyogenshah commented on issue #5094: [AIRFLOW-4306] Global operator extra links
kunalyogenshah commented on issue #5094: [AIRFLOW-4306] Global operator extra links URL: https://github.com/apache/airflow/pull/5094#issuecomment-486008366 Quick question about this implementation. Could you help provide an example usage for let's say Kibana logs that require arguments to be passed to it. For example, if I want to add a global Kibana link for all tasks, I need to be able to specify something like a log ID to the URL to redirect to the right place. Does this implementation allow that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] kunalyogenshah commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI.
kunalyogenshah commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. URL: https://github.com/apache/airflow/pull/5164#issuecomment-486007872 Perfect. Thanks @milton0825 . I'll move my questions over to that PR for better context :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] milton0825 commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI.
milton0825 commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. URL: https://github.com/apache/airflow/pull/5164#issuecomment-486006581 @kunalyogenshah I think this is what you are looking for: https://github.com/apache/airflow/pull/5094 We can register an extra link that is global to all the operators. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] kunalyogenshah commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI.
kunalyogenshah commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. URL: https://github.com/apache/airflow/pull/5164#issuecomment-486006137 @feng-tao : A quick naive clarification question. Does this mean that the DAG operator itself must specify the link to Kibana as an extra link in its operator definition? Or is there a way to make that universal for all DAGs and tasks? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] kunalyogenshah commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI.
kunalyogenshah commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. URL: https://github.com/apache/airflow/pull/5164#issuecomment-486004575 Ah. Thanks @feng-tao . This seems to be exactly similar to what we've been trying to do. Let me look into using extra links instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao edited a comment on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI.
feng-tao edited a comment on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. URL: https://github.com/apache/airflow/pull/5164#issuecomment-486001330 And we have already did that at Lyft in prod for quite some time(https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff). cc @milton0825 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI.
feng-tao commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. URL: https://github.com/apache/airflow/pull/5164#issuecomment-486001330 And we have already did that at Lyft in prod for quite some time. cc @milton0825 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI.
feng-tao commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. URL: https://github.com/apache/airflow/pull/5164#issuecomment-486001239 @kunalyogenshah , not sure if this is the right way to show the UI. I think another approach(Lyft) is to leverage https://github.com/apache/airflow/pull/5059 and https://github.com/apache/airflow/pull/5094 which allows you to customize your task instance panel in generic way. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5161: [AIRFLOW-4395] Remove pickle_info view
codecov-io commented on issue #5161: [AIRFLOW-4395] Remove pickle_info view URL: https://github.com/apache/airflow/pull/5161#issuecomment-486000901 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5161?src=pr=h1) Report > Merging [#5161](https://codecov.io/gh/apache/airflow/pull/5161?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **decrease** coverage by `0.02%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5161/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5161?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5161 +/- ## == - Coverage 78.44% 78.42% -0.03% == Files 466 466 Lines 2973529722 -13 == - Hits2332723308 -19 - Misses 6408 6414 +6 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5161?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/www/views.py](https://codecov.io/gh/apache/airflow/pull/5161/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `76.18% <ø> (-0.16%)` | :arrow_down: | | [airflow/www/security.py](https://codecov.io/gh/apache/airflow/pull/5161/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvc2VjdXJpdHkucHk=) | `92.3% <ø> (ø)` | :arrow_up: | | [airflow/models/dag.py](https://codecov.io/gh/apache/airflow/pull/5161/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnLnB5) | `92.1% <0%> (-1.27%)` | :arrow_down: | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5161/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.59% <0%> (+0.17%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5161?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5161?src=pr=footer). Last update [de1795c...78ab648](https://codecov.io/gh/apache/airflow/pull/5161?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] KevinYang21 commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI.
KevinYang21 commented on issue #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. URL: https://github.com/apache/airflow/pull/5164#issuecomment-485994653 This is nice, thank you @kunalyogenshah ! Users using Airflow with Elasticsearch backend do likely want to utilize more those aggregation features Elasticsearch frontends, e.g. Kibana, provide. Before looking into the code, my opinion on the UI change is that we make the layout similar to how we do the log downloading--we allow users to choose the try number of the task instances. Do you think that make sense? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-3951) No easy way to get query_execution_id from AWSAthenaOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824570#comment-16824570 ] Rob Dinoff commented on AIRFLOW-3951: - yes you may close this ... thanks, Rob On Tue, Apr 23, 2019 at 5:04 PM Ash Berlin-Taylor (JIRA) > No easy way to get query_execution_id from AWSAthenaOperator > - > > Key: AIRFLOW-3951 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3951 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib, operators >Affects Versions: 1.10.2 >Reporter: Rob Dinoff >Priority: Major > > There does not seem to be an easy way to get the query_execution_id from the > AWSAthensOperator. This should be pushed as an xcom variable. Need this so > query_execution_id can be used in the AthenaSensor. > The following should be added to the end of the AWSAthenaOperator.execute > {code:java} > context["task_instance"].xcom_push(key="query_execution_id", > value=self.query_execution_id){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4396) Add a link to Elasticsearch frontend for logs if available
[ https://issues.apache.org/jira/browse/AIRFLOW-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824571#comment-16824571 ] ASF GitHub Bot commented on AIRFLOW-4396: - kunalyogenshah commented on pull request #5164: [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. URL: https://github.com/apache/airflow/pull/5164 ### Jira - [ ] My PR addresses the following: - https://issues.apache.org/jira/browse/AIRFLOW-4396 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: This PR adds a new button to the Airflow UI next to "View log" ("View log in ES"). It provides an external URL for visualizing the log, and renders the latest try for the Task, DAG and execution being looked up. This button is controlled by a new `elasticsearch_frontend` conf param, which provides the template URL for viewing the corresponding log. Screenshot below : ![Screen Shot 2019-04-23 at 2 54 15 PM](https://user-images.githubusercontent.com/5075200/56618520-b4f77100-65d7-11e9-99a8-37c4c9577407.png) Documentation update is pending code review. Will update documentation after. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Unit tested the UI and checked the console for errors in the JS. Button doesn't show up if the param is not defined. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [ ] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add a link to Elasticsearch frontend for logs if available > -- > > Key: AIRFLOW-4396 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4396 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Kunal Shah >Assignee: Kunal Shah >Priority: Minor > > If elasticsearch backend is being used for logs collection, then there may > exist a frontend to redirect for the client to view their logs. In those > cases, we can provide a link to the external UI which may allow visualizing > the logs in a more queryable fashion. > The idea is to add a new conf param for elasticsearch for a template URL to > the frontend, which can formatted with the elasticsearch log ID template. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-4396) Add a link to Elasticsearch frontend for logs if available
Kunal Shah created AIRFLOW-4396: --- Summary: Add a link to Elasticsearch frontend for logs if available Key: AIRFLOW-4396 URL: https://issues.apache.org/jira/browse/AIRFLOW-4396 Project: Apache Airflow Issue Type: Improvement Components: webserver Reporter: Kunal Shah Assignee: Kunal Shah If elasticsearch backend is being used for logs collection, then there may exist a frontend to redirect for the client to view their logs. In those cases, we can provide a link to the external UI which may allow visualizing the logs in a more queryable fashion. The idea is to add a new conf param for elasticsearch for a template URL to the frontend, which can formatted with the elasticsearch log ID template. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] kaxil edited a comment on issue #5143: [AIRFLOW-4204] Update super() calls to PY3
kaxil edited a comment on issue #5143: [AIRFLOW-4204] Update super() calls to PY3 URL: https://github.com/apache/airflow/pull/5143#issuecomment-485983744 We should have a separate PR for modifying nvd3 stuff - ideally with a new JIRA issue. It becomes easy to revert if need be. Or create sub-task under a parent JIRA This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5163: WIP: [AIRFLOW-XXX] Remove hard-coded logging level
codecov-io commented on issue #5163: WIP: [AIRFLOW-XXX] Remove hard-coded logging level URL: https://github.com/apache/airflow/pull/5163#issuecomment-485984210 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5163?src=pr=h1) Report > Merging [#5163](https://codecov.io/gh/apache/airflow/pull/5163?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **not change** coverage. > The diff coverage is `100%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5163/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5163?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#5163 +/- ## === Coverage 78.44% 78.44% === Files 466 466 Lines 2973529735 === Hits2332723327 Misses 6408 6408 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5163?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/settings.py](https://codecov.io/gh/apache/airflow/pull/5163/diff?src=pr=tree#diff-YWlyZmxvdy9zZXR0aW5ncy5weQ==) | `83.87% <100%> (ø)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5163?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5163?src=pr=footer). Last update [de1795c...d7ed667](https://codecov.io/gh/apache/airflow/pull/5163?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] kaxil commented on issue #5143: [AIRFLOW-4204] Update super() calls to PY3
kaxil commented on issue #5143: [AIRFLOW-4204] Update super() calls to PY3 URL: https://github.com/apache/airflow/pull/5143#issuecomment-485983744 @potiuk - We don't use tool any longer (at least I don't since we moved to GitBox). Use **Squash and merge** from the UI as shown in the image below. Remove extra details from the commit message and resolve the Jira issue once merged. ![image](https://user-images.githubusercontent.com/8811558/56617494-ec860d80-6617-11e9-9b3d-d63681a2f57a.png) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] kaxil closed pull request #5163: WIP: [AIRFLOW-XXX] Remove hard-coded logging level
kaxil closed pull request #5163: WIP: [AIRFLOW-XXX] Remove hard-coded logging level URL: https://github.com/apache/airflow/pull/5163 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] BasPH edited a comment on issue #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor
BasPH edited a comment on issue #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor URL: https://github.com/apache/airflow/pull/5159#issuecomment-485977458 I have very limited time this week but tried digging into it. At least I managed to reproduce locally: I set an optional breakpoint before `self.assertEqual(len(executor.running), 0)`: ```python if len(executor.running) != 0: import ipdb ipdb.set_trace() self.assertEqual(len(executor.running), 0) ``` Let this run a few times and after just a few attempts I hit the breakpoint: ```bash for n in {1..100}; do nosetests tests/executors/test_local_executor.py:LocalExecutorTest.test_execution_limited_parallelism -s; done ``` For the record, at this point: - `len(executor.running)` is 1 at this point in time and `executor.running` contains `{'fail': True}` - `executor.queue.empty()` returns `True` - `executor.result_queue.get()` returns `('fail', 'failed')` (I would expect this thing to be empty because `sync()` is called which loops over the result_queue, clearing out all items...) Thinking out loud: my understanding is there's a race condition where the asserts are already being called before `executor.end()` finishes. I imagine there's something iffy about multiple workers and passing around the JoinableQueue to all workers, but unsure exactly what's causing the issue. Maybe `change_state()` might not finish fast enough, and the result_queue is already empty by then, and as a result the `end()` function is finished? Hope this helps so far :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-3951) No easy way to get query_execution_id from AWSAthenaOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824525#comment-16824525 ] Ash Berlin-Taylor commented on AIRFLOW-3951: "but no way to tell if it succeeded or failed" was a bug that has been fixed in 1.10.4 - if the Query fails the Operator will now also fail. Given that I think we can close this? > No easy way to get query_execution_id from AWSAthenaOperator > - > > Key: AIRFLOW-3951 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3951 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib, operators >Affects Versions: 1.10.2 >Reporter: Rob Dinoff >Priority: Major > > There does not seem to be an easy way to get the query_execution_id from the > AWSAthensOperator. This should be pushed as an xcom variable. Need this so > query_execution_id can be used in the AthenaSensor. > The following should be added to the end of the AWSAthenaOperator.execute > {code:java} > context["task_instance"].xcom_push(key="query_execution_id", > value=self.query_execution_id){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] kaxil opened a new pull request #5163: WIP: [AIRFLOW-XXX] Remove hard-coded logging level
kaxil opened a new pull request #5163: WIP: [AIRFLOW-XXX] Remove hard-coded logging level URL: https://github.com/apache/airflow/pull/5163 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [ ] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-3951) No easy way to get query_execution_id from AWSAthenaOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824490#comment-16824490 ] Rob Dinoff commented on AIRFLOW-3951: - There is no way to get the query status returned from poll_query_status or the execution id returned from the hook outside of the operator so was not sure how AthenaSensor would work. But yes just using AWSAthenaOperator will work but no way to tell if it succeeded or failed > No easy way to get query_execution_id from AWSAthenaOperator > - > > Key: AIRFLOW-3951 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3951 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, contrib, operators >Affects Versions: 1.10.2 >Reporter: Rob Dinoff >Priority: Major > > There does not seem to be an easy way to get the query_execution_id from the > AWSAthensOperator. This should be pushed as an xcom variable. Need this so > query_execution_id can be used in the AthenaSensor. > The following should be added to the end of the AWSAthenaOperator.execute > {code:java} > context["task_instance"].xcom_push(key="query_execution_id", > value=self.query_execution_id){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] potiuk commented on a change in pull request #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor
potiuk commented on a change in pull request #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor URL: https://github.com/apache/airflow/pull/5159#discussion_r277838641 ## File path: tests/executors/test_local_executor.py ## @@ -49,11 +48,7 @@ def execution_parallelism(self, parallelism=0): executor.running['fail'] = True Review comment: I think this also should be moved before execute_async above. I also do not understand why we need the try/except This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io edited a comment on issue #5153: [AIRFLOW-XXX] Name default config_file param in KubernetesPodOperator docstring
codecov-io edited a comment on issue #5153: [AIRFLOW-XXX] Name default config_file param in KubernetesPodOperator docstring URL: https://github.com/apache/airflow/pull/5153#issuecomment-485495821 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=h1) Report > Merging [#5153](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5153/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#5153 +/- ## === Coverage 78.44% 78.44% === Files 466 466 Lines 2973529735 === Hits2332723327 Misses 6408 6408 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=tree) | Coverage Δ | | |---|---|---| | [...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/5153/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==) | `98.63% <ø> (ø)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=footer). Last update [de1795c...469bbeb](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io edited a comment on issue #5153: [AIRFLOW-XXX] Name default config_file param in KubernetesPodOperator docstring
codecov-io edited a comment on issue #5153: [AIRFLOW-XXX] Name default config_file param in KubernetesPodOperator docstring URL: https://github.com/apache/airflow/pull/5153#issuecomment-485495821 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=h1) Report > Merging [#5153](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/de1795c8f1e65a5f4a848e88ef3848ba51aab449?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5153/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#5153 +/- ## === Coverage 78.44% 78.44% === Files 466 466 Lines 2973529735 === Hits2332723327 Misses 6408 6408 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=tree) | Coverage Δ | | |---|---|---| | [...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/5153/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==) | `98.63% <ø> (ø)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=footer). Last update [de1795c...469bbeb](https://codecov.io/gh/apache/airflow/pull/5153?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] Fokko commented on issue #5158: [AIRFLOW-XXX] Speed up tests for PythonSensor by 60s
Fokko commented on issue #5158: [AIRFLOW-XXX] Speed up tests for PythonSensor by 60s URL: https://github.com/apache/airflow/pull/5158#issuecomment-485910763 Love it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] ashb commented on issue #5162: [AIRFLOW-4358] Speed up test_jobs by not running tasks
ashb commented on issue #5162: [AIRFLOW-4358] Speed up test_jobs by not running tasks URL: https://github.com/apache/airflow/pull/5162#issuecomment-485898431 Looks like I might have some ordering in the tests to sort out. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5162: [AIRFLOW-4358] Speed up test_jobs by not running tasks
feng-tao commented on issue #5162: [AIRFLOW-4358] Speed up test_jobs by not running tasks URL: https://github.com/apache/airflow/pull/5162#issuecomment-485891838 Nice! It is always good to speed up the CI! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-4358) Decouple running actual tasks from test_jobs.py
[ https://issues.apache.org/jira/browse/AIRFLOW-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824325#comment-16824325 ] ASF GitHub Bot commented on AIRFLOW-4358: - ashb commented on pull request #5162: [AIRFLOW-4358] Speed up test_jobs by not running tasks URL: https://github.com/apache/airflow/pull/5162 Make sure you have checked _all_ steps below. ### Jira - [x] https://issues.apache.org/jira/browse/AIRFLOW-4358 ### Description - [ ] We don't care about the behaviour of the tasks (we test those elsewhere) - we just care that the task is "run" and yeilds a state, which we can all do more directly. On my laptop this takes the time for SchedulerJobTest from 133s to 35s, and reduces the BackfillJobTests from >5mins to mere seconds! I think this approach still tests the behaviour well enough ### Tests - [x] Changed tests to check events that are run via executor, but it won't run it ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Decouple running actual tasks from test_jobs.py > --- > > Key: AIRFLOW-4358 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4358 > Project: Apache Airflow > Issue Type: Sub-task > Components: tests >Reporter: Ash Berlin-Taylor >Assignee: Ash Berlin-Taylor >Priority: Major > > In tests/test_jobs.py we have a lot of slow, fragile tests that test the > behaviour of the "Jobs" - BackfillJob and SchedulerJob being the main two. > Since we have tests that check that the executors will actually _run_ tasks > we should "stub" out the executor and replace the executor in those tests > with one that just records what tasks it was asked to run. > This will hopefully speed up the tests and make them less fragile. > A possible starting point: > {code:title=test_jobs.diff} > diff --git a/tests/test_jobs.py b/tests/test_jobs.py > index e3ae8be1..aa919f90 100644 > --- a/tests/test_jobs.py > +++ b/tests/test_jobs.py > @@ -1384,6 +1384,14 @@ class LocalTaskJobTest(unittest.TestCase): > class SchedulerJobTest(unittest.TestCase): > +class NullExecutor(airflow.executors.base_executor.BaseExecutor): > +def heartbeat(self, *args, **kwargs): > +# This would normally pop tasks of the queue and run them. > +pass > + > +def end(): > +pass > + > def setUp(self): > clear_db_runs() > clear_db_pools() > @@ -1391,6 +1399,10 @@ class SchedulerJobTest(unittest.TestCase): > clear_db_sla_miss() > clear_db_errors() > +# Speed up some tests by not running the tasks, just look at what we > +# enqueue! > +self.null_exec = self.NullExecutor() > + > @classmethod > def setUpClass(cls): > cls.dagbag = DagBag() > @@ -1409,8 +1421,7 @@ class SchedulerJobTest(unittest.TestCase): > def tearDownClass(cls): > cls.patcher.stop() > -@staticmethod > -def run_single_scheduler_loop_with_no_dags(dags_folder): > +def run_single_scheduler_loop_with_no_dags(self, dags_folder): > """ > Utility function that runs a single scheduler loop without actually > changing/scheduling any dags. This is useful to simulate the other > side effects of > @@ -1421,6 +1432,7 @@ class SchedulerJobTest(unittest.TestCase): > :type directory: str > """ > scheduler = SchedulerJob( > +
[GitHub] [airflow] ashb opened a new pull request #5162: [AIRFLOW-4358] Speed up test_jobs by not running tasks
ashb opened a new pull request #5162: [AIRFLOW-4358] Speed up test_jobs by not running tasks URL: https://github.com/apache/airflow/pull/5162 Make sure you have checked _all_ steps below. ### Jira - [x] https://issues.apache.org/jira/browse/AIRFLOW-4358 ### Description - [ ] We don't care about the behaviour of the tasks (we test those elsewhere) - we just care that the task is "run" and yeilds a state, which we can all do more directly. On my laptop this takes the time for SchedulerJobTest from 133s to 35s, and reduces the BackfillJobTests from >5mins to mere seconds! I think this approach still tests the behaviour well enough ### Tests - [x] Changed tests to check events that are run via executor, but it won't run it ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] ashb commented on a change in pull request #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor
ashb commented on a change in pull request #5159: [AIRFLOW-XXX] Attempt to remove flakeyness of LocalExecutor URL: https://github.com/apache/airflow/pull/5159#discussion_r23566 ## File path: tests/executors/test_local_executor.py ## @@ -49,11 +48,7 @@ def execution_parallelism(self, parallelism=0): executor.running['fail'] = True -if parallelism == 0: Review comment: The sentinel trick didn't immediately work, likely because we are using multiprocessing. I'll leave that as a different issue then. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work started] (AIRFLOW-4358) Decouple running actual tasks from test_jobs.py
[ https://issues.apache.org/jira/browse/AIRFLOW-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-4358 started by Ash Berlin-Taylor. -- > Decouple running actual tasks from test_jobs.py > --- > > Key: AIRFLOW-4358 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4358 > Project: Apache Airflow > Issue Type: Sub-task > Components: tests >Reporter: Ash Berlin-Taylor >Assignee: Ash Berlin-Taylor >Priority: Major > > In tests/test_jobs.py we have a lot of slow, fragile tests that test the > behaviour of the "Jobs" - BackfillJob and SchedulerJob being the main two. > Since we have tests that check that the executors will actually _run_ tasks > we should "stub" out the executor and replace the executor in those tests > with one that just records what tasks it was asked to run. > This will hopefully speed up the tests and make them less fragile. > A possible starting point: > {code:title=test_jobs.diff} > diff --git a/tests/test_jobs.py b/tests/test_jobs.py > index e3ae8be1..aa919f90 100644 > --- a/tests/test_jobs.py > +++ b/tests/test_jobs.py > @@ -1384,6 +1384,14 @@ class LocalTaskJobTest(unittest.TestCase): > class SchedulerJobTest(unittest.TestCase): > +class NullExecutor(airflow.executors.base_executor.BaseExecutor): > +def heartbeat(self, *args, **kwargs): > +# This would normally pop tasks of the queue and run them. > +pass > + > +def end(): > +pass > + > def setUp(self): > clear_db_runs() > clear_db_pools() > @@ -1391,6 +1399,10 @@ class SchedulerJobTest(unittest.TestCase): > clear_db_sla_miss() > clear_db_errors() > +# Speed up some tests by not running the tasks, just look at what we > +# enqueue! > +self.null_exec = self.NullExecutor() > + > @classmethod > def setUpClass(cls): > cls.dagbag = DagBag() > @@ -1409,8 +1421,7 @@ class SchedulerJobTest(unittest.TestCase): > def tearDownClass(cls): > cls.patcher.stop() > -@staticmethod > -def run_single_scheduler_loop_with_no_dags(dags_folder): > +def run_single_scheduler_loop_with_no_dags(self, dags_folder): > """ > Utility function that runs a single scheduler loop without actually > changing/scheduling any dags. This is useful to simulate the other > side effects of > @@ -1421,6 +1432,7 @@ class SchedulerJobTest(unittest.TestCase): > :type directory: str > """ > scheduler = SchedulerJob( > +executor=self.null_exec, > dag_id='this_dag_doesnt_exist', # We don't want to actually run > anything > num_runs=1, > subdir=os.path.join(dags_folder)) > @@ -2483,6 +2495,7 @@ class SchedulerJobTest(unittest.TestCase): > self.assertTrue(dag.start_date > datetime.datetime.utcnow()) > scheduler = SchedulerJob(dag_id, > + executor=self.null_exec, > subdir=os.path.join(TEST_DAG_FOLDER, > 'test_scheduler_dags.py'), > num_runs=1) > scheduler.run() > @@ -2491,6 +2504,7 @@ class SchedulerJobTest(unittest.TestCase): > self.assertEqual( > len(session.query(TI).filter(TI.dag_id == dag_id).all()), 0) > session.commit() > +self.assertEqual({}, self.null_exec.queued_tasks) > # previously, running this backfill would kick off the Scheduler > # because it would take the most recent run and start from there > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4358) Decouple running actual tasks from test_jobs.py
[ https://issues.apache.org/jira/browse/AIRFLOW-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-4358: --- Issue Type: Sub-task (was: Test) Parent: AIRFLOW-2488 > Decouple running actual tasks from test_jobs.py > --- > > Key: AIRFLOW-4358 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4358 > Project: Apache Airflow > Issue Type: Sub-task > Components: tests >Reporter: Ash Berlin-Taylor >Priority: Major > > In tests/test_jobs.py we have a lot of slow, fragile tests that test the > behaviour of the "Jobs" - BackfillJob and SchedulerJob being the main two. > Since we have tests that check that the executors will actually _run_ tasks > we should "stub" out the executor and replace the executor in those tests > with one that just records what tasks it was asked to run. > This will hopefully speed up the tests and make them less fragile. > A possible starting point: > {code:title=test_jobs.diff} > diff --git a/tests/test_jobs.py b/tests/test_jobs.py > index e3ae8be1..aa919f90 100644 > --- a/tests/test_jobs.py > +++ b/tests/test_jobs.py > @@ -1384,6 +1384,14 @@ class LocalTaskJobTest(unittest.TestCase): > class SchedulerJobTest(unittest.TestCase): > +class NullExecutor(airflow.executors.base_executor.BaseExecutor): > +def heartbeat(self, *args, **kwargs): > +# This would normally pop tasks of the queue and run them. > +pass > + > +def end(): > +pass > + > def setUp(self): > clear_db_runs() > clear_db_pools() > @@ -1391,6 +1399,10 @@ class SchedulerJobTest(unittest.TestCase): > clear_db_sla_miss() > clear_db_errors() > +# Speed up some tests by not running the tasks, just look at what we > +# enqueue! > +self.null_exec = self.NullExecutor() > + > @classmethod > def setUpClass(cls): > cls.dagbag = DagBag() > @@ -1409,8 +1421,7 @@ class SchedulerJobTest(unittest.TestCase): > def tearDownClass(cls): > cls.patcher.stop() > -@staticmethod > -def run_single_scheduler_loop_with_no_dags(dags_folder): > +def run_single_scheduler_loop_with_no_dags(self, dags_folder): > """ > Utility function that runs a single scheduler loop without actually > changing/scheduling any dags. This is useful to simulate the other > side effects of > @@ -1421,6 +1432,7 @@ class SchedulerJobTest(unittest.TestCase): > :type directory: str > """ > scheduler = SchedulerJob( > +executor=self.null_exec, > dag_id='this_dag_doesnt_exist', # We don't want to actually run > anything > num_runs=1, > subdir=os.path.join(dags_folder)) > @@ -2483,6 +2495,7 @@ class SchedulerJobTest(unittest.TestCase): > self.assertTrue(dag.start_date > datetime.datetime.utcnow()) > scheduler = SchedulerJob(dag_id, > + executor=self.null_exec, > subdir=os.path.join(TEST_DAG_FOLDER, > 'test_scheduler_dags.py'), > num_runs=1) > scheduler.run() > @@ -2491,6 +2504,7 @@ class SchedulerJobTest(unittest.TestCase): > self.assertEqual( > len(session.query(TI).filter(TI.dag_id == dag_id).all()), 0) > session.commit() > +self.assertEqual({}, self.null_exec.queued_tasks) > # previously, running this backfill would kick off the Scheduler > # because it would take the most recent run and start from there > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2488) Speed up the CI
[ https://issues.apache.org/jira/browse/AIRFLOW-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-2488: --- Description: Some of the tests take far to long: {noformat} [success] 10.48% tests.test_jobs.BackfillJobTest.test_backfill_examples: 177.9813s [success] 6.24% tests.test_jobs.BackfillJobTest.test_backfill_multi_dates: 106.0157s [success] 3.59% tests.transplant_class..C.test_backfill: 61.0503s [success] 3.55% tests.contrib.sensors.test_python_sensor.PythonSensorTests.test_python_sensor_false: 60.2099s [success] 2.66% tests.test_impersonation.ImpersonationTest.test_impersonation_subdag: 45.1117s [success] 1.86% tests.test_jobs.SchedulerJobTest.test_dagrun_root_after_dagrun_unfinished: 31.5533s [success] 1.77% tests.executors.test_celery_executor.CeleryExecutorTest.test_celery_integration_1_redis_redis_6379_0: 30.0873s [success] 1.48% tests.operators.test_subdag_operator.SubDagOperatorTests.test_subdag_deadlock: 25.0747s [success] 1.40% tests.test_jobs.SchedulerJobTest.test_new_import_error_replaces_old: 23.7093s [success] 1.36% tests.test_jobs.SchedulerJobTest.test_dagrun_success: 23.1645s [success] 1.33% tests.test_jobs.BackfillJobTest.test_cli_backfill_depends_on_past_backwards: 22.5161s [success] 1.30% tests.test_jobs.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past: 22.1402s [success] 1.30% tests.test_jobs.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past_advance_ex_date: 22.1153s [success] 1.28% tests.test_jobs.BackfillJobTest.test_cli_backfill_depends_on_past: 21.7944s [success] 1.27% tests.test_jobs.SchedulerJobTest.test_dagrun_root_fail_unfinished: 21.5010s [success] 1.25% tests.test_jobs.BackfillJobTest.test_run_ignores_all_dependencies: 21.3037s [success] 1.25% tests.test_jobs.SchedulerJobTest.test_dagrun_root_fail: 21.1455s [success] 1.20% tests.test_impersonation.ImpersonationTest.test_impersonation_custom: 20.3233s [success] 1.19% tests.executors.test_celery_executor.CeleryExecutorTest.test_celery_integration_0_amqp_guest_guest_rabbitmq_5672: 20.1648s [success] 1.14% tests.test_jobs.BackfillJobTest.test_backfill_run_backwards: 19.4043s [success] 1.06% tests.cli.test_cli.TestCLI.test_next_execution: 17.9290s {noformat} (List updated April 23, 2019) These tests should be refactored to make them faster. was: Some of the tests take far to long: {{[success] 14.51% tests.BackfillJobTest.test_backfill_examples: 259.7899s [success] 4.29% tests.hooks.test_s3_hook.TestS3Hook.test_list_keys_paged: 76.7397s [success] 4.18% tests.hooks.test_s3_hook.TestS3Hook.test_list_prefixes_paged: 74.8025s [success] 2.58% tests.BackfillJobTest.test_backfill_multi_dates: 46.1206s [success] 2.38% tests.SchedulerJobTest.test_scheduler_start_date: 42.5719s [success] 1.95% tests.CoreTest.test_schedule_dag_no_end_date_up_to_today_only: 34.9399s [success] 1.71% tests.CliTests.test_backfill: 30.5825s [success] 1.70% tests.SchedulerJobTest.test_scheduler_multiprocessing: 30.3580s [success] 1.68% tests.contrib.hooks.test_gcp_mlengine_hook.TestMLEngineHook.test_create_mlengine_job: 30.0609s [success] 1.68% tests.SchedulerJobTest.test_new_import_error_replaces_old: 29.9968s}} These tests should be refactored to make them faster. > Speed up the CI > --- > > Key: AIRFLOW-2488 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2488 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Priority: Major > > Some of the tests take far to long: > {noformat} > [success] 10.48% tests.test_jobs.BackfillJobTest.test_backfill_examples: > 177.9813s > [success] 6.24% tests.test_jobs.BackfillJobTest.test_backfill_multi_dates: > 106.0157s > [success] 3.59% tests.transplant_class..C.test_backfill: 61.0503s > [success] 3.55% > tests.contrib.sensors.test_python_sensor.PythonSensorTests.test_python_sensor_false: > 60.2099s > [success] 2.66% > tests.test_impersonation.ImpersonationTest.test_impersonation_subdag: 45.1117s > [success] 1.86% > tests.test_jobs.SchedulerJobTest.test_dagrun_root_after_dagrun_unfinished: > 31.5533s > [success] 1.77% > tests.executors.test_celery_executor.CeleryExecutorTest.test_celery_integration_1_redis_redis_6379_0: > 30.0873s > [success] 1.48% > tests.operators.test_subdag_operator.SubDagOperatorTests.test_subdag_deadlock: > 25.0747s > [success] 1.40% > tests.test_jobs.SchedulerJobTest.test_new_import_error_replaces_old: 23.7093s > [success] 1.36% tests.test_jobs.SchedulerJobTest.test_dagrun_success: 23.1645s > [success] 1.33% > tests.test_jobs.BackfillJobTest.test_cli_backfill_depends_on_past_backwards: > 22.5161s > [success] 1.30% > tests.test_jobs.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past: > 22.1402s > [success] 1.30% >
[jira] [Commented] (AIRFLOW-4395) Remove pickle_info view
[ https://issues.apache.org/jira/browse/AIRFLOW-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824298#comment-16824298 ] ASF GitHub Bot commented on AIRFLOW-4395: - mik-laj commented on pull request #5161: [AIRFLOW-4395] Remove pickle_info view URL: https://github.com/apache/airflow/pull/5161 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-4395\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-4395 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Description As for me, this view is not currently used in the project. The view was introduced in https://github.com/apache/airflow/commit/300951c23ac079e13101c6dd11d6e42b60912ebb but it does not contain references in UI. This is a development tool that can be added using a plugin when someone misses this feature. ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [ ] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove pickle_info view > --- > > Key: AIRFLOW-4395 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4395 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Kamil Bregula >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] mik-laj opened a new pull request #5161: [AIRFLOW-4395] Remove pickle_info view
mik-laj opened a new pull request #5161: [AIRFLOW-4395] Remove pickle_info view URL: https://github.com/apache/airflow/pull/5161 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-4395\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-4395 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Description As for me, this view is not currently used in the project. The view was introduced in https://github.com/apache/airflow/commit/300951c23ac079e13101c6dd11d6e42b60912ebb but it does not contain references in UI. This is a development tool that can be added using a plugin when someone misses this feature. ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [ ] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5160: [AIRFLOW-4394] Don't double test behaviour of BackfillJob from CLI tests
codecov-io commented on issue #5160: [AIRFLOW-4394] Don't double test behaviour of BackfillJob from CLI tests URL: https://github.com/apache/airflow/pull/5160#issuecomment-485876624 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5160?src=pr=h1) Report > Merging [#5160](https://codecov.io/gh/apache/airflow/pull/5160?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/37fdee3f5be3cdd7c416d4e13dff592bb757bda4?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5160/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5160?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#5160 +/- ## === Coverage 78.44% 78.44% === Files 466 466 Lines 2973529735 === Hits2332723327 Misses 6408 6408 ``` -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5160?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5160?src=pr=footer). Last update [37fdee3...3fe35e3](https://codecov.io/gh/apache/airflow/pull/5160?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-4395) Remove pickle_info view
Kamil Bregula created AIRFLOW-4395: -- Summary: Remove pickle_info view Key: AIRFLOW-4395 URL: https://issues.apache.org/jira/browse/AIRFLOW-4395 Project: Apache Airflow Issue Type: New Feature Reporter: Kamil Bregula -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] feluelle commented on a change in pull request #5127: [AIRFLOW-4343] Show warning in UI if scheduler is not running
feluelle commented on a change in pull request #5127: [AIRFLOW-4343] Show warning in UI if scheduler is not running URL: https://github.com/apache/airflow/pull/5127#discussion_r277758349 ## File path: airflow/config_templates/default_airflow.cfg ## @@ -464,11 +464,6 @@ dag_dir_list_interval = 300 # How often should stats be printed to the logs print_stats_interval = 30 -# If the last scheduler heartbeat happened more than scheduler_health_check_threshold ago (in seconds), -# scheduler is considered unhealthy. -# This is used by the health check in the "/health" endpoint -scheduler_health_check_threshold = 30 Review comment: > Does anyone thing we need an UPDATING note for this? I think it is not necessary, but please don't forget to update the docs accordingly to your changes: https://github.com/apache/airflow/blob/master/docs/howto/check-health.rst > Should I replace it with something that controls the grace_multiplier for SchedulerJob? (or keep this setting and have SchedulerJob use a fixed timeout instead) @feluelle @XD-DENG thoughts? I prefer a fixed value over a multiplier, because it is clearer to me. But I would also be fine with having a grace_multiplier as long as it is documented properly :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services