[jira] [Updated] (AIRFLOW-4401) multiprocessing.Queue.empty() is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-4401: -- Description: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be unreliability of multiprocessing.Queue empty() implementation. It turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another process. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) and it is described in [https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues] when you go details of how data is synchronized between processes. A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] Also we seemed to experienced that in Airflow before. In jobs.py years ago (31.07.2016) - we can see the comment below (but we used multiprocessing.queue empty() nevertheless): {code:java} # Not using multiprocessing.Queue() since it's no longer a separate # process and due to some unusual behavior. (empty() incorrectly # returns true?){code} The solution available in [https://bugs.python.org/issue23582] using qsize() was working on Linux but is not really acceptable because qsize() does not work on MacOS (throws NotImplementedError). The proposed solution 1): Implement a more reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (but we have to addjust initialisation to match 2.7 and 3.5+ syntax (since we want to backport to stable v1.10 releasE). We should replace all usages of multiprocessing.Queue where empty() is used with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue in similar way in the future. The solution 2) Seems that this unreliable behaviour of Queue is only happening if the Queue is instantiated directly and the small delays between processes are gone when Shared Manager is used (because then Queue is a proxy to a central Queue started in a separate process - thus synchronisation is implemented in this single queue: [https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues] . Switching to Managed queue should solve the problem as well. was: After discussing with [~ash] and [~BasPH] potential reasons for flakiness of LocalExecutor tests, I took a deeper dive into the problem and what I found raised the remaining hair on top of my head. We had a number of flaky tests in the local executor that resulted in result_queue not being empty where it should have been emptied a moment before. More details and discussion can be found in [https://github.com/apache/airflow/pull/5159] The problem turned out to be ... unreliability of multiprocessing.Queue empty() implementation. It turned out that multiprocessing.Queue.empty() implementation is not fully synchronized and it might return True even if put() operation has been already completed in another process. What's more - empty() might return True even if qsize() of the queue returns > 0 (!) It's a bit mind-boggling but it is "as intended' as documented in [https://bugs.python.org/issue23582] (resolved as "not a bug" ) A few people have stumbled upon this problem. For example [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] and [https://github.com/keras-team/autokeras/issues/368] Also we seemed to experienced that in Airflow before. In jobs.py years ago (31.07.2016) - we can see the comment below (but we used multiprocessing.queue empty() nevertheless): {code:java} # Not using multiprocessing.Queue() since it's no longer a separate # process and due to some unusual behavior. (empty() incorrectly # returns true?){code} The solution available in [https://bugs.python.org/issue23582] using qsize() was working on Linux but is not really acceptable because qsize() does not work on MacOS (throws NotImplementedError). The solution 1) Implement a reliable queue (SynchronizedQueue) based on [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] (butwith a twist that __init__ of class deriving from Queue has
[GitHub] [airflow] benbenbang commented on issue #5137: [AIRFLOW-4363] Add try-catch for retrieving `status` from cli in docker operator
benbenbang commented on issue #5137: [AIRFLOW-4363] Add try-catch for retrieving `status` from cli in docker operator URL: https://github.com/apache/airflow/pull/5137#issuecomment-487452686 All done, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #5201: [Airlfow-XXXX] Add docs for Google Cloud Storage Sensors
potiuk commented on issue #5201: [Airlfow-] Add docs for Google Cloud Storage Sensors URL: https://github.com/apache/airflow/pull/5201#issuecomment-487449971 Just the last commit is needed since the sensor is already merged. Can you please rebase just the last on on top of master This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] jmcarp commented on issue #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator.
jmcarp commented on issue #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator. URL: https://github.com/apache/airflow/pull/5196#issuecomment-487444578 @feluelle: I updated the tests and docs like you suggested. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] zhongjiajie commented on issue #4890: [AIRFLOW-4048] HttpSensor provide-context to response_check
zhongjiajie commented on issue #4890: [AIRFLOW-4048] HttpSensor provide-context to response_check URL: https://github.com/apache/airflow/pull/4890#issuecomment-487434271 @raphaelauv BTW, we are already remove `py2` in our CI, and you failed test is in `py2.7` part. Maybe we should check CI log after you rebase on master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] zhongjiajie commented on issue #4890: [AIRFLOW-4048] HttpSensor provide-context to response_check
zhongjiajie commented on issue #4890: [AIRFLOW-4048] HttpSensor provide-context to response_check URL: https://github.com/apache/airflow/pull/4890#issuecomment-487434079 @raphaelauv You could use `git push -f` re-submit commit and the failed CI will restart. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] TV4Fun commented on issue #5169: [AIRFLOW-3143] Support Auto-Zone in DataprocClusterCreateOperator
TV4Fun commented on issue #5169: [AIRFLOW-3143] Support Auto-Zone in DataprocClusterCreateOperator URL: https://github.com/apache/airflow/pull/5169#issuecomment-487424425 @fenglu-g @potiuk This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io edited a comment on issue #5201: [Airlfow-XXXX] Add docs for Google Cloud Storage Sensors
codecov-io edited a comment on issue #5201: [Airlfow-] Add docs for Google Cloud Storage Sensors URL: https://github.com/apache/airflow/pull/5201#issuecomment-487417345 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=h1) Report > Merging [#5201](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9daad7ecd14fabfc3f441074670f4b38aa93e8b0?src=pr=desc) will **decrease** coverage by `<.01%`. > The diff coverage is `93.61%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5201/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5201 +/- ## == - Coverage 78.54% 78.54% -0.01% == Files 469 469 Lines 2989629896 == - Hits2348323482 -1 - Misses 6413 6414 +1 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/contrib/sensors/gcs\_sensor.py](https://codecov.io/gh/apache/airflow/pull/5201/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvZ2NzX3NlbnNvci5weQ==) | `69.47% <93.61%> (ø)` | :arrow_up: | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5201/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=footer). Last update [9daad7e...84b45b8](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5201: [Airlfow-XXXX] Add docs for Google Cloud Storage Sensors
codecov-io commented on issue #5201: [Airlfow-] Add docs for Google Cloud Storage Sensors URL: https://github.com/apache/airflow/pull/5201#issuecomment-487417345 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=h1) Report > Merging [#5201](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9daad7ecd14fabfc3f441074670f4b38aa93e8b0?src=pr=desc) will **decrease** coverage by `<.01%`. > The diff coverage is `93.61%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5201/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5201 +/- ## == - Coverage 78.54% 78.54% -0.01% == Files 469 469 Lines 2989629896 == - Hits2348323482 -1 - Misses 6413 6414 +1 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/contrib/sensors/gcs\_sensor.py](https://codecov.io/gh/apache/airflow/pull/5201/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvZ2NzX3NlbnNvci5weQ==) | `69.47% <93.61%> (ø)` | :arrow_up: | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5201/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=footer). Last update [9daad7e...84b45b8](https://codecov.io/gh/apache/airflow/pull/5201?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] jaketf commented on issue #5201: [Airlfow-XXXX] Add docs for Google Cloud Storage Sensors
jaketf commented on issue #5201: [Airlfow-] Add docs for Google Cloud Storage Sensors URL: https://github.com/apache/airflow/pull/5201#issuecomment-487414475 @potiuk @OmerJog hopefully this addresses the documentation gap in `integrations.rst` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] jaketf opened a new pull request #5201: [Airlfow-XXXX] Add docs for Google Cloud Storage Sensors
jaketf opened a new pull request #5201: [Airlfow-] Add docs for Google Cloud Storage Sensors URL: https://github.com/apache/airflow/pull/5201 This PR adds documentation for the Google Cloud Storage sensors to `integrations.rst` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] andriisoldatenko commented on a change in pull request #5048: [AIRFLOW-3370] Add stdout output options to Elasticsearch task log handler
andriisoldatenko commented on a change in pull request #5048: [AIRFLOW-3370] Add stdout output options to Elasticsearch task log handler URL: https://github.com/apache/airflow/pull/5048#discussion_r279211175 ## File path: airflow/utils/log/es_task_handler.py ## @@ -119,7 +146,10 @@ def _read(self, ti, try_number, metadata=None): if offset != next_offset or 'last_log_timestamp' not in metadata: metadata['last_log_timestamp'] = str(cur_ts) -message = '\n'.join([log.message for log in logs]) +# If we hit the end of the log, remove the actual end_of_log message Review comment: @KevinYang21 can you please demonstrate example? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] jaketf commented on issue #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor
jaketf commented on issue #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor URL: https://github.com/apache/airflow/pull/5166#issuecomment-487412971 @potiuk @OmerJog Sure I'll add a PR for adding the docs for the GCS sensors. It looks like I inherited the lack of docs from the the other GCS sensors too :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io edited a comment on issue #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator.
codecov-io edited a comment on issue #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator. URL: https://github.com/apache/airflow/pull/5196#issuecomment-487343692 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5196?src=pr=h1) Report > Merging [#5196](https://codecov.io/gh/apache/airflow/pull/5196?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9daad7ecd14fabfc3f441074670f4b38aa93e8b0?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `95.23%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5196/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5196?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5196 +/- ## == + Coverage 78.54% 78.55% +<.01% == Files 469 469 Lines 2989629898 +2 == + Hits2348323486 +3 + Misses 6413 6412 -1 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5196?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/contrib/operators/mysql\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/5196/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9teXNxbF90b19nY3MucHk=) | `92.59% <95.23%> (+1.61%)` | :arrow_up: | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5196/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5196?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5196?src=pr=footer). Last update [9daad7e...cadae0a](https://codecov.io/gh/apache/airflow/pull/5196?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-4401) multiprocessing.Queue.empty() is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828780#comment-16828780 ] Jarek Potiuk commented on AIRFLOW-4401: --- I had to also add some shutdowns to clean-up the extra processes running in 5200 > multiprocessing.Queue.empty() is unreliable > --- > > Key: AIRFLOW-4401 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4401 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jarek Potiuk >Priority: Major > Fix For: 1.10.4 > > > After discussing with [~ash] and [~BasPH] potential reasons for flakiness of > LocalExecutor tests, I took a deeper dive into the problem and what I found > raised the remaining hair on top of my head. > We had a number of flaky tests in the local executor that resulted in > result_queue not being empty where it should have been emptied a moment > before. More details and discussion can be found in > [https://github.com/apache/airflow/pull/5159] > The problem turned out to be ... unreliability of multiprocessing.Queue > empty() implementation. It turned out that multiprocessing.Queue.empty() > implementation is not fully synchronized and it might return True even if > put() operation has been already completed in another process. What's more - > empty() might return True even if qsize() of the queue returns > 0 (!) > It's a bit mind-boggling but it is "as intended' as documented in > [https://bugs.python.org/issue23582] (resolved as "not a bug" ) > A few people have stumbled upon this problem. For example > [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] > and [https://github.com/keras-team/autokeras/issues/368] > Also we seemed to experienced that in Airflow before. In jobs.py years ago > (31.07.2016) - we can see the comment below (but we used > multiprocessing.queue empty() nevertheless): > {code:java} > # Not using multiprocessing.Queue() since it's no longer a separate > # process and due to some unusual behavior. (empty() incorrectly > # returns true?){code} > The solution available in [https://bugs.python.org/issue23582] using qsize() > was working on Linux but is not really acceptable because qsize() does not > work on MacOS (throws NotImplementedError). > The working solution is to implement a reliable queue (SynchronizedQueue) > based on > [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] > (butwith a twist that __init__ of class deriving from Queue has to be > changed for python 3.4+ as described in > [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. > Luckily we are now Python3.5+ > We should replace all usages of multiprocessing.Queue where empty() is used > with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue > in similar way in the future. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] potiuk commented on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used.
potiuk commented on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used. URL: https://github.com/apache/airflow/pull/5199#issuecomment-487394715 Seems that https://github.com/apache/airflow/pull/5200/ solves it in a simpler way. Let's wait for the build to finish, but it looks good from local hosting. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk edited a comment on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used.
potiuk edited a comment on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used. URL: https://github.com/apache/airflow/pull/5199#issuecomment-487394715 Seems that https://github.com/apache/airflow/pull/5200/ solves it in a simpler way. Let's wait for the build to finish, but it looks good from local testing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io edited a comment on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used.
codecov-io edited a comment on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used. URL: https://github.com/apache/airflow/pull/5199#issuecomment-487392721 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=h1) Report > Merging [#5199](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9daad7ecd14fabfc3f441074670f4b38aa93e8b0?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `81.48%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5199/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5199 +/- ## == + Coverage 78.54% 78.55% +<.01% == Files 469 470 +1 Lines 2989629932 +36 == + Hits2348323514 +31 - Misses 6413 6418 +5 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/executors/base\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvYmFzZV9leGVjdXRvci5weQ==) | `95.58% <ø> (-0.07%)` | :arrow_down: | | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.22% <100%> (+0.1%)` | :arrow_up: | | [airflow/contrib/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4ZWN1dG9ycy9rdWJlcm5ldGVzX2V4ZWN1dG9yLnB5) | `63.27% <100%> (ø)` | :arrow_up: | | [airflow/executors/local\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvbG9jYWxfZXhlY3V0b3IucHk=) | `80.41% <40%> (-0.65%)` | :arrow_down: | | [airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `78.53% <50%> (+0.01%)` | :arrow_up: | | [airflow/utils/synchronized\_queue.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zeW5jaHJvbml6ZWRfcXVldWUucHk=) | `88.57% <88.57%> (ø)` | | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=footer). Last update [9daad7e...462d7cd](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io edited a comment on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used.
codecov-io edited a comment on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used. URL: https://github.com/apache/airflow/pull/5199#issuecomment-487392721 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=h1) Report > Merging [#5199](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9daad7ecd14fabfc3f441074670f4b38aa93e8b0?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `81.48%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5199/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5199 +/- ## == + Coverage 78.54% 78.55% +<.01% == Files 469 470 +1 Lines 2989629932 +36 == + Hits2348323514 +31 - Misses 6413 6418 +5 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/executors/base\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvYmFzZV9leGVjdXRvci5weQ==) | `95.58% <ø> (-0.07%)` | :arrow_down: | | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.22% <100%> (+0.1%)` | :arrow_up: | | [airflow/contrib/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4ZWN1dG9ycy9rdWJlcm5ldGVzX2V4ZWN1dG9yLnB5) | `63.27% <100%> (ø)` | :arrow_up: | | [airflow/executors/local\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvbG9jYWxfZXhlY3V0b3IucHk=) | `80.41% <40%> (-0.65%)` | :arrow_down: | | [airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `78.53% <50%> (+0.01%)` | :arrow_up: | | [airflow/utils/synchronized\_queue.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zeW5jaHJvbml6ZWRfcXVldWUucHk=) | `88.57% <88.57%> (ø)` | | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=footer). Last update [9daad7e...462d7cd](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used.
codecov-io commented on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used. URL: https://github.com/apache/airflow/pull/5199#issuecomment-487392721 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=h1) Report > Merging [#5199](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9daad7ecd14fabfc3f441074670f4b38aa93e8b0?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `81.48%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5199/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5199 +/- ## == + Coverage 78.54% 78.55% +<.01% == Files 469 470 +1 Lines 2989629932 +36 == + Hits2348323514 +31 - Misses 6413 6418 +5 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/executors/base\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvYmFzZV9leGVjdXRvci5weQ==) | `95.58% <ø> (-0.07%)` | :arrow_down: | | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.22% <100%> (+0.1%)` | :arrow_up: | | [airflow/contrib/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4ZWN1dG9ycy9rdWJlcm5ldGVzX2V4ZWN1dG9yLnB5) | `63.27% <100%> (ø)` | :arrow_up: | | [airflow/executors/local\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvbG9jYWxfZXhlY3V0b3IucHk=) | `80.41% <40%> (-0.65%)` | :arrow_down: | | [airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `78.53% <50%> (+0.01%)` | :arrow_up: | | [airflow/utils/synchronized\_queue.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zeW5jaHJvbml6ZWRfcXVldWUucHk=) | `88.57% <88.57%> (ø)` | | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=footer). Last update [9daad7e...462d7cd](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk opened a new pull request #5200: [aIRFLOW-4401] Use managers for Queue synchronization
potiuk opened a new pull request #5200: [aIRFLOW-4401] Use managers for Queue synchronization URL: https://github.com/apache/airflow/pull/5200 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-4401 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: It is a known problem https://bugs.python.org/issue23582 that multiprocessing.Queue empty() method is not reliable - sometimes it might return True even if another process already put something in the queue. This resulted in some of the tasks not picked up when sync() methods were called (in AirflowKubernetesScheduler, LocalExecutor, DagFileProcessor). This was less of a problem if the method was called in sync() - as the remaining jobs/files could be processed in next pass but it was a problem in tests and when graceful shutdown was executed (some tasks could be still unprocessed while the shutdown occured). We switched to Managers() managed queues to handle that - the queue in this case is run in a separate subprocess and each process using it uses a proxy to access this shared queue. All the cases impacted follow the same pattern now: while not queue.empty(): res = queue.get() This loop runs always in single (main) process so it is safe to run it this way - there is no risk that some other process will retrieve the data from the queue in between empty() and get(). In all these cases overhead for inter-processing locking is negligible comparing to the action executed (Parsing DAG, executing job) so it appears it should be safe to merge the change. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Tests were there (but flaky) ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-4401) multiprocessing.Queue.empty() is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828045#comment-16828045 ] Jarek Potiuk commented on AIRFLOW-4401: --- And we have now a competing (simpler) implementation [https://github.com/apache/airflow/pull/5200] following discussion with the mysterious @airflowuser . > multiprocessing.Queue.empty() is unreliable > --- > > Key: AIRFLOW-4401 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4401 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jarek Potiuk >Priority: Major > Fix For: 1.10.4 > > > After discussing with [~ash] and [~BasPH] potential reasons for flakiness of > LocalExecutor tests, I took a deeper dive into the problem and what I found > raised the remaining hair on top of my head. > We had a number of flaky tests in the local executor that resulted in > result_queue not being empty where it should have been emptied a moment > before. More details and discussion can be found in > [https://github.com/apache/airflow/pull/5159] > The problem turned out to be ... unreliability of multiprocessing.Queue > empty() implementation. It turned out that multiprocessing.Queue.empty() > implementation is not fully synchronized and it might return True even if > put() operation has been already completed in another process. What's more - > empty() might return True even if qsize() of the queue returns > 0 (!) > It's a bit mind-boggling but it is "as intended' as documented in > [https://bugs.python.org/issue23582] (resolved as "not a bug" ) > A few people have stumbled upon this problem. For example > [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] > and [https://github.com/keras-team/autokeras/issues/368] > Also we seemed to experienced that in Airflow before. In jobs.py years ago > (31.07.2016) - we can see the comment below (but we used > multiprocessing.queue empty() nevertheless): > {code:java} > # Not using multiprocessing.Queue() since it's no longer a separate > # process and due to some unusual behavior. (empty() incorrectly > # returns true?){code} > The solution available in [https://bugs.python.org/issue23582] using qsize() > was working on Linux but is not really acceptable because qsize() does not > work on MacOS (throws NotImplementedError). > The working solution is to implement a reliable queue (SynchronizedQueue) > based on > [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] > (butwith a twist that __init__ of class deriving from Queue has to be > changed for python 3.4+ as described in > [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. > Luckily we are now Python3.5+ > We should replace all usages of multiprocessing.Queue where empty() is used > with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue > in similar way in the future. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] codecov-io edited a comment on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used.
codecov-io edited a comment on issue #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used. URL: https://github.com/apache/airflow/pull/5199#issuecomment-487392721 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=h1) Report > Merging [#5199](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/9daad7ecd14fabfc3f441074670f4b38aa93e8b0?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `81.48%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5199/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5199 +/- ## == + Coverage 78.54% 78.55% +<.01% == Files 469 470 +1 Lines 2989629932 +36 == + Hits2348323514 +31 - Misses 6413 6418 +5 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/executors/base\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvYmFzZV9leGVjdXRvci5weQ==) | `95.58% <ø> (-0.07%)` | :arrow_down: | | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.22% <100%> (+0.1%)` | :arrow_up: | | [airflow/contrib/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4ZWN1dG9ycy9rdWJlcm5ldGVzX2V4ZWN1dG9yLnB5) | `63.27% <100%> (ø)` | :arrow_up: | | [airflow/executors/local\_executor.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvbG9jYWxfZXhlY3V0b3IucHk=) | `80.41% <40%> (-0.65%)` | :arrow_down: | | [airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `78.53% <50%> (+0.01%)` | :arrow_up: | | [airflow/utils/synchronized\_queue.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zeW5jaHJvbml6ZWRfcXVldWUucHk=) | `88.57% <88.57%> (ø)` | | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5199/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=footer). Last update [9daad7e...462d7cd](https://codecov.io/gh/apache/airflow/pull/5199?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-4382) Limited Parallelism test is flaky
[ https://issues.apache.org/jira/browse/AIRFLOW-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-4382: -- Description: -The SFTPOperator tests seems to fail many times.- Limited parallelism test is flaky. Previously the diagnosis of that test indicated that the problem is with SFTPOperator, but in fact all those log files there are expected. All the builds below fail because of limited parallelism test: This is the failure message: {code:java} == 49) FAIL: test_execution_limited_parallelism (tests.executors.test_local_executor.LocalExecutorTest) -- Traceback (most recent call last): tests/executors/test_local_executor.py line 77 in test_execution_limited_parallelism self.execution_parallelism(parallelism=test_parallelism) tests/executors/test_local_executor.py line 61 in execution_parallelism self.assertEqual(len(executor.running), 0) AssertionError: 1 != 0 {code} Examples for PRs where this test fails: [https://travis-ci.org/apache/airflow/jobs/522789194] [https://travis-ci.org/apache/airflow/jobs/522931911] [https://travis-ci.org/apache/airflow/jobs/523348590] THIS IS NOT AN error: (mistakenly identified as one): {code:java} INFO [airflow.task] Dependencies all met for INFO [airflow.task] INFO [airflow.task] Starting attempt 1 of 1 INFO [airflow.task] INFO [airflow.task] Executing on 2019-04-22 06:02:24.255945+00:00 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_7.2p2) INFO [paramiko.transport] Authentication (publickey) successful! INFO [paramiko.transport.sftp] [chan 0] Opened sftp connection (server version 3) INFO [airflow.task.operators] Starting to transfer from /tmp/test_remote_file to /tmp/tmp2/test_local_file ERROR [airflow.task] Error while transferring from /tmp/test_remote_file to /tmp/tmp2/test_local_file, error: [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file' Traceback (most recent call last): File "/app/airflow/contrib/operators/sftp_operator.py", line 138, in execute sftp_client.get(self.remote_filepath, self.local_filepath) File "/app/.tox/py35-backend_sqlite-env_docker/lib/python3.5/site-packages/paramiko/sftp_client.py", line 801, in get with open(localpath, "wb") as fl: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/app/airflow/models/taskinstance.py", line 888, in _run_raw_task result = task_copy.execute(context=context) File "/app/airflow/contrib/operators/sftp_operator.py", line 155, in execute .format(file_msg, str(e))) airflow.exceptions.AirflowException: Error while transferring from /tmp/test_remote_file to /tmp/tmp2/test_local_file, error: [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file'{code} was: The SFTPOperator tests seems to fail many times. Examples for PRs where this test fails: [https://travis-ci.org/apache/airflow/jobs/522789194] [https://travis-ci.org/apache/airflow/jobs/522931911] [https://travis-ci.org/apache/airflow/jobs/523348590] {code:java} INFO [airflow.task] Dependencies all met for INFO [airflow.task] INFO [airflow.task] Starting attempt 1 of 1 INFO [airflow.task] INFO [airflow.task] Executing on 2019-04-22 06:02:24.255945+00:00 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_7.2p2) INFO [paramiko.transport] Authentication (publickey) successful! INFO [paramiko.transport.sftp] [chan 0] Opened sftp connection (server version 3) INFO [airflow.task.operators] Starting to transfer from /tmp/test_remote_file to /tmp/tmp2/test_local_file ERROR [airflow.task] Error while transferring from /tmp/test_remote_file to /tmp/tmp2/test_local_file, error: [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file' Traceback (most recent call last): File "/app/airflow/contrib/operators/sftp_operator.py", line 138, in execute sftp_client.get(self.remote_filepath, self.local_filepath) File "/app/.tox/py35-backend_sqlite-env_docker/lib/python3.5/site-packages/paramiko/sftp_client.py", line 801, in get with open(localpath, "wb") as fl: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/app/airflow/models/taskinstance.py", line 888, in
[jira] [Updated] (AIRFLOW-4382) Limited Parallelism test is flaky
[ https://issues.apache.org/jira/browse/AIRFLOW-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-4382: -- Summary: Limited Parallelism test is flaky (was: SFTPOperator tests are flaky) > Limited Parallelism test is flaky > - > > Key: AIRFLOW-4382 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4382 > Project: Apache Airflow > Issue Type: Wish > Components: tests >Reporter: jack >Priority: Major > Fix For: 1.10.4 > > > The SFTPOperator tests seems to fail many times. > Examples for PRs where this test fails: > [https://travis-ci.org/apache/airflow/jobs/522789194] > [https://travis-ci.org/apache/airflow/jobs/522931911] > [https://travis-ci.org/apache/airflow/jobs/523348590] > > {code:java} > > INFO [airflow.task] Dependencies all met for unit_teststest_schedule_dag_once.test_sftp 2019-04-22 06:02:24.255945+00:00 > [None]> INFO [airflow.task] > > INFO [airflow.task] Starting attempt 1 of 1 INFO [airflow.task] > > INFO [airflow.task] Executing on 2019-04-22 > 06:02:24.255945+00:00 INFO [paramiko.transport] Connected (version 2.0, > client OpenSSH_7.2p2) INFO [paramiko.transport] Authentication (publickey) > successful! INFO [paramiko.transport.sftp] [chan 0] Opened sftp connection > (server version 3) INFO [airflow.task.operators] Starting to transfer from > /tmp/test_remote_file to /tmp/tmp2/test_local_file ERROR [airflow.task] Error > while transferring from /tmp/test_remote_file to /tmp/tmp2/test_local_file, > error: [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file' > Traceback (most recent call last): File > "/app/airflow/contrib/operators/sftp_operator.py", line 138, in execute > sftp_client.get(self.remote_filepath, self.local_filepath) File > "/app/.tox/py35-backend_sqlite-env_docker/lib/python3.5/site-packages/paramiko/sftp_client.py", > line 801, in get with open(localpath, "wb") as fl: FileNotFoundError: > [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file' During > handling of the above exception, another exception occurred: Traceback > (most recent call last): File "/app/airflow/models/taskinstance.py", line > 888, in _run_raw_task result = task_copy.execute(context=context) File > "/app/airflow/contrib/operators/sftp_operator.py", line 155, in execute > .format(file_msg, str(e))) airflow.exceptions.AirflowException: Error while > transferring from /tmp/test_remote_file to /tmp/tmp2/test_local_file, error: > [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file'{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4401) multiprocessing.Queue.empty() is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828027#comment-16828027 ] Jarek Potiuk commented on AIRFLOW-4401: --- [~BasPH] [~ash] -> I think the new PR [https://github.com/apache/airflow/pull/5199] is likely to solve the issue finally :). > multiprocessing.Queue.empty() is unreliable > --- > > Key: AIRFLOW-4401 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4401 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jarek Potiuk >Priority: Major > Fix For: 1.10.4 > > > After discussing with [~ash] and [~BasPH] potential reasons for flakiness of > LocalExecutor tests, I took a deeper dive into the problem and what I found > raised the remaining hair on top of my head. > We had a number of flaky tests in the local executor that resulted in > result_queue not being empty where it should have been emptied a moment > before. More details and discussion can be found in > [https://github.com/apache/airflow/pull/5159] > The problem turned out to be ... unreliability of multiprocessing.Queue > empty() implementation. It turned out that multiprocessing.Queue.empty() > implementation is not fully synchronized and it might return True even if > put() operation has been already completed in another process. What's more - > empty() might return True even if qsize() of the queue returns > 0 (!) > It's a bit mind-boggling but it is "as intended' as documented in > [https://bugs.python.org/issue23582] (resolved as "not a bug" ) > A few people have stumbled upon this problem. For example > [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] > and [https://github.com/keras-team/autokeras/issues/368] > Also we seemed to experienced that in Airflow before. In jobs.py years ago > (31.07.2016) - we can see the comment below (but we used > multiprocessing.queue empty() nevertheless): > {code:java} > # Not using multiprocessing.Queue() since it's no longer a separate > # process and due to some unusual behavior. (empty() incorrectly > # returns true?){code} > The solution available in [https://bugs.python.org/issue23582] using qsize() > was working on Linux but is not really acceptable because qsize() does not > work on MacOS (throws NotImplementedError). > The working solution is to implement a reliable queue (SynchronizedQueue) > based on > [https://github.com/vterron/lemon/commit/9ca6b4b1212228dbd4f69b88aaf88b12952d7d6f] > (butwith a twist that __init__ of class deriving from Queue has to be > changed for python 3.4+ as described in > [https://stackoverflow.com/questions/24941359/ctx-parameter-in-multiprocessing-queue]. > Luckily we are now Python3.5+ > We should replace all usages of multiprocessing.Queue where empty() is used > with the SynchronizedQueue. And make sure we do not use multiprocessing.Queue > in similar way in the future. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-4382) Limited Parallelism test is flaky
[ https://issues.apache.org/jira/browse/AIRFLOW-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-4382: -- Description: -The SFTPOperator tests seems to fail many times.- Limited parallelism test is flaky. AIRFLOW-4401 is the root cause. Previously the diagnosis of that test indicated that the problem is with SFTPOperator, but in fact all those log files there are expected. All the builds below fail because of limited parallelism test: This is the failure message: {code:java} == 49) FAIL: test_execution_limited_parallelism (tests.executors.test_local_executor.LocalExecutorTest) -- Traceback (most recent call last): tests/executors/test_local_executor.py line 77 in test_execution_limited_parallelism self.execution_parallelism(parallelism=test_parallelism) tests/executors/test_local_executor.py line 61 in execution_parallelism self.assertEqual(len(executor.running), 0) AssertionError: 1 != 0 {code} Examples for PRs where this test fails: [https://travis-ci.org/apache/airflow/jobs/522789194] [https://travis-ci.org/apache/airflow/jobs/522931911] [https://travis-ci.org/apache/airflow/jobs/523348590] THIS IS NOT AN error: (mistakenly identified as one): {code:java} INFO [airflow.task] Dependencies all met for INFO [airflow.task] INFO [airflow.task] Starting attempt 1 of 1 INFO [airflow.task] INFO [airflow.task] Executing on 2019-04-22 06:02:24.255945+00:00 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_7.2p2) INFO [paramiko.transport] Authentication (publickey) successful! INFO [paramiko.transport.sftp] [chan 0] Opened sftp connection (server version 3) INFO [airflow.task.operators] Starting to transfer from /tmp/test_remote_file to /tmp/tmp2/test_local_file ERROR [airflow.task] Error while transferring from /tmp/test_remote_file to /tmp/tmp2/test_local_file, error: [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file' Traceback (most recent call last): File "/app/airflow/contrib/operators/sftp_operator.py", line 138, in execute sftp_client.get(self.remote_filepath, self.local_filepath) File "/app/.tox/py35-backend_sqlite-env_docker/lib/python3.5/site-packages/paramiko/sftp_client.py", line 801, in get with open(localpath, "wb") as fl: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/app/airflow/models/taskinstance.py", line 888, in _run_raw_task result = task_copy.execute(context=context) File "/app/airflow/contrib/operators/sftp_operator.py", line 155, in execute .format(file_msg, str(e))) airflow.exceptions.AirflowException: Error while transferring from /tmp/test_remote_file to /tmp/tmp2/test_local_file, error: [Errno 2] No such file or directory: '/tmp/tmp2/test_local_file'{code} was: -The SFTPOperator tests seems to fail many times.- Limited parallelism test is flaky. Previously the diagnosis of that test indicated that the problem is with SFTPOperator, but in fact all those log files there are expected. All the builds below fail because of limited parallelism test: This is the failure message: {code:java} == 49) FAIL: test_execution_limited_parallelism (tests.executors.test_local_executor.LocalExecutorTest) -- Traceback (most recent call last): tests/executors/test_local_executor.py line 77 in test_execution_limited_parallelism self.execution_parallelism(parallelism=test_parallelism) tests/executors/test_local_executor.py line 61 in execution_parallelism self.assertEqual(len(executor.running), 0) AssertionError: 1 != 0 {code} Examples for PRs where this test fails: [https://travis-ci.org/apache/airflow/jobs/522789194] [https://travis-ci.org/apache/airflow/jobs/522931911] [https://travis-ci.org/apache/airflow/jobs/523348590] THIS IS NOT AN error: (mistakenly identified as one): {code:java} INFO [airflow.task] Dependencies all met for INFO [airflow.task] INFO [airflow.task] Starting attempt 1 of 1 INFO [airflow.task] INFO [airflow.task] Executing on 2019-04-22 06:02:24.255945+00:00 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_7.2p2) INFO [paramiko.transport] Authentication
[jira] [Commented] (AIRFLOW-2337) Broken Import Variables
[ https://issues.apache.org/jira/browse/AIRFLOW-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828018#comment-16828018 ] jack commented on AIRFLOW-2337: --- Can not reproduce with 1.10.3 - Export and Import works fine for me. > Broken Import Variables > --- > > Key: AIRFLOW-2337 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2337 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0 >Reporter: Daniel Lamblin >Priority: Major > > Importing variables that were encrypted seems to have produced a far too long > parameter value. > Using the UI in v1.8.2 I selected all variables, then with selected exported > them. > This produced a json file. it is only 1617 bytes in size total with 26 > variables. > Using the UI in v1.9.0 I selected the file and clicked to import variables. > …/admin/airflow/varimport > gave me the error: > h2. Ooops. > {code:bash} > / ( () ) \___ > /( ( ( ) _)) ) )\ > Can this be briefer and cheerier? >(_((__(_(__(( ( ( | ) ) ) )_))__))_)___) >((__)\\||lll|l||/// \_)) > ( /(/ ( ) ) )\ ) > (( ( ( | | ) ) )\ ) >( /(| / ( )) ) ) )) ) > ( ( _(|)_) ) > ( ||\(|(|)|/|| ) > (|(||(||)) > ( //|/l|||)|\\ \ ) > (/ / // /|//\\ \ \ \ _) > --- > Node: 90f7f5d06c61 > --- > Traceback (most recent call last): > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1988, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1641, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1544, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1639, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1625, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 69, > in inner > return self._run_view(f, *args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line > 368, in _run_view > return fn(self, *args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 758, in > decorated_view > return func(*args, **kwargs) > File "/pang/service/airflow/airflow-src/airflow/www/utils.py", line 262, in > wrapper > return f(*args, **kwargs) > File "/pang/service/airflow/airflow-src/airflow/www/views.py", line 1787, > in varimport > models.Variable.set(k, v, serialize_json=isinstance(v, dict)) > File "/pang/service/airflow/airflow-src/airflow/utils/db.py", line 55, in > wrapper > result = func(*args, **kwargs) > File "/pang/service/airflow/airflow-src/airflow/models.py", line 4031, in > set > session.query(cls).filter(cls.key == key).delete() > File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line > 3236, in delete > delete_op.exec_() > File > "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line > 1326, in exec_ > self._do_exec() > File > "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line > 1518, in _do_exec > self._execute_stmt(delete_stmt) > File > "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line > 1333, in _execute_stmt > mapper=self.mapper) > File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", > line 1176, in execute > bind, close_with_result=True).execute(clause, params or {}) > File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", > line 1040, in _connection_for_bind > engine, execution_options) > File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", > line 388, in _connection_for_bind > self._assert_active() > File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", > line 276, in _assert_active > % self._rollback_exception > InvalidRequestError: This Session's transaction has been rolled back due to a > previous exception during flush. To begin a new transaction with this >
[jira] [Commented] (AIRFLOW-1832) xcom_push is not reliable
[ https://issues.apache.org/jira/browse/AIRFLOW-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828022#comment-16828022 ] jack commented on AIRFLOW-1832: --- There has been significant work for the xcom_push including unification done in https://issues.apache.org/jira/browse/AIRFLOW-3249 Please check against newer airflow version and report back > xcom_push is not reliable > - > > Key: AIRFLOW-1832 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1832 > Project: Apache Airflow > Issue Type: Bug >Reporter: Michelle Huang >Priority: Major > > we have a few ETL jobs that are hitting a third party API, extracting data > onto the Airflow server and then transforming and inserting that data into > Redshift. we're using xcom_push in the extract function to store an > identifier appended to the filename to be referenced in the transform > function so we can grab the right file to transform. I'm noticing that > xcom_push is not successfully pushing the value and xcom_pull is returning > "None". this is happening across many of our scheduled jobs regularly and > only just started happening on 11/16. manual triggers of the jobs run > successfully to completion without issue. this only affects our scheduled > runs and doesn't happen to all of them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4401) multiprocessing.Queue.empty() is unreliable
[ https://issues.apache.org/jira/browse/AIRFLOW-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828017#comment-16828017 ] ASF GitHub Bot commented on AIRFLOW-4401: - potiuk commented on pull request #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used. URL: https://github.com/apache/airflow/pull/5199 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-4401 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: It is a known problem https://bugs.python.org/issue23582 that multiprocessing.Queue empty() method is not reliable - sometimes it might return True even if another process already put something in the queue. This resulted in some of the tasks not picked up when sync() methods were called (in AirflowKubernetesScheduler, LocalExecutor, DagFileProcessor). This was less of a problem if the method was called in sync() - as the remaining jobs/files could be processed in next pass but it was a problem in tests and when graceful shutdown was executed (some tasks could be still unprocessed while the shutdown occured). All the cases impacted follow the same pattern now: while not queue.empty(): res = queue.get() This loop runs always in single (main) process so it is safe to run it this way - there is no risk that some other process will retrieve the data from the queue in between empty() and get(). Note that unlike in the standard multiprocessing.Queue, you cannot rely on data being immediately available after empty() is False. You should be prepared that subsequent get_nowait() raises Empty, or (better) use get() to retrieve the data. In all these cases overhead for inter-processing locking is negligible comparing to the action executed (Parsing DAG, executing job) so it appears it should be safe to merge the change. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: No need. Lots of tests for that already (flaky ones). ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > multiprocessing.Queue.empty() is unreliable > --- > > Key: AIRFLOW-4401 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4401 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jarek Potiuk >Priority: Major > Fix For: 1.10.4 > > > After discussing with [~ash] and [~BasPH] potential reasons for flakiness of > LocalExecutor tests, I took a deeper dive into the problem and what I found > raised the remaining hair on top
[GitHub] [airflow] potiuk opened a new pull request #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used.
potiuk opened a new pull request #5199: [AIRFLOW-4401] SynchronizedQueue used where empty() is used. URL: https://github.com/apache/airflow/pull/5199 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-4401 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: It is a known problem https://bugs.python.org/issue23582 that multiprocessing.Queue empty() method is not reliable - sometimes it might return True even if another process already put something in the queue. This resulted in some of the tasks not picked up when sync() methods were called (in AirflowKubernetesScheduler, LocalExecutor, DagFileProcessor). This was less of a problem if the method was called in sync() - as the remaining jobs/files could be processed in next pass but it was a problem in tests and when graceful shutdown was executed (some tasks could be still unprocessed while the shutdown occured). All the cases impacted follow the same pattern now: while not queue.empty(): res = queue.get() This loop runs always in single (main) process so it is safe to run it this way - there is no risk that some other process will retrieve the data from the queue in between empty() and get(). Note that unlike in the standard multiprocessing.Queue, you cannot rely on data being immediately available after empty() is False. You should be prepared that subsequent get_nowait() raises Empty, or (better) use get() to retrieve the data. In all these cases overhead for inter-processing locking is negligible comparing to the action executed (Parsing DAG, executing job) so it appears it should be safe to merge the change. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: No need. Lots of tests for that already (flaky ones). ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to a appropriate release ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-2911) Add job cancellation capability to Dataflow hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827986#comment-16827986 ] jack commented on AIRFLOW-2911: --- [~pabloem] any progress? > Add job cancellation capability to Dataflow hook > > > Key: AIRFLOW-2911 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2911 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, Dataflow, gcp >Reporter: Wilson Lian >Assignee: Pablo Estrada >Priority: Minor > > The hook currently only supports starting and waiting on a job. One might > want to cancel a job when, for example, it exceeds a certain timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (AIRFLOW-3061) Update Outdated (4 years old) Screenshots
[ https://issues.apache.org/jira/browse/AIRFLOW-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jack updated AIRFLOW-3061: -- Comment: was deleted (was: Wasn't this updated in 1.10.2? ) > Update Outdated (4 years old) Screenshots > -- > > Key: AIRFLOW-3061 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3061 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Siddharth Anand >Assignee: Siddharth Anand >Priority: Minor > > The following images are still pretty old (almost 4 years old) and need an > update > * > ** > adhoc.png > ** > chart.png > ** > chart_form.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3061) Update Outdated (4 years old) Screenshots
[ https://issues.apache.org/jira/browse/AIRFLOW-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827977#comment-16827977 ] jack commented on AIRFLOW-3061: --- Though these files were marked to be updated in: [https://github.com/apache/airflow/pull/3865] They were removed in [https://github.com/apache/airflow/pull/4599] [~TaoFeng] is this ticket still needed? > Update Outdated (4 years old) Screenshots > -- > > Key: AIRFLOW-3061 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3061 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Siddharth Anand >Assignee: Siddharth Anand >Priority: Minor > > The following images are still pretty old (almost 4 years old) and need an > update > * > ** > adhoc.png > ** > chart.png > ** > chart_form.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-4364) Integrate Pylint
[ https://issues.apache.org/jira/browse/AIRFLOW-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bas Harenslak reassigned AIRFLOW-4364: -- Assignee: Bas Harenslak > Integrate Pylint > > > Key: AIRFLOW-4364 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4364 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Bas Harenslak >Assignee: Bas Harenslak >Priority: Major > > After a [vote on the mailing > list|https://lists.apache.org/thread.html/f4940d36e98ded96a2473bb2ccdfa4cc648faa2c1334b2aa901c0bba@%3Cdev.airflow.apache.org%3E] > everybody voted for pylint integration. It involves a big change to the > codebase, so let's do it in 2 steps: > # Check pylint only on changed code in the CI. > # After a while we should have a good pylint config, and the remaining > non-checked code should be made compatible with pylint, i.e. enable pylint in > the CI on the full project. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-4364) Integrate Pylint
[ https://issues.apache.org/jira/browse/AIRFLOW-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bas Harenslak reassigned AIRFLOW-4364: -- Assignee: (was: Bas Harenslak) > Integrate Pylint > > > Key: AIRFLOW-4364 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4364 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Bas Harenslak >Priority: Major > > After a [vote on the mailing > list|https://lists.apache.org/thread.html/f4940d36e98ded96a2473bb2ccdfa4cc648faa2c1334b2aa901c0bba@%3Cdev.airflow.apache.org%3E] > everybody voted for pylint integration. It involves a big change to the > codebase, so let's do it in 2 steps: > # Check pylint only on changed code in the CI. > # After a while we should have a good pylint config, and the remaining > non-checked code should be made compatible with pylint, i.e. enable pylint in > the CI on the full project. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4094) XCom not pushed into database when size of content is greater than 65535 bytes
[ https://issues.apache.org/jira/browse/AIRFLOW-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827942#comment-16827942 ] jack commented on AIRFLOW-4094: --- Actually Airflow enforce max size of 49344 bytes. [https://github.com/apache/airflow/blob/master/airflow/models/xcom.py#L37] > XCom not pushed into database when size of content is greater than 65535 bytes > -- > > Key: AIRFLOW-4094 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4094 > Project: Apache Airflow > Issue Type: Bug > Environment: airflow version: 1.10.0 > python: 3.6.5 > postgresql: 9.6.1 >Reporter: Sai Varun Reddy Daram >Priority: Major > > I happen to use airflow with postgres as backend, one of the things I tried > recently is use kubernetes pod operator, which will run and return some data > to next task via XCOMs, one thing I observed is, when the XCOM length is > greater than 65535 bytes it's not getting pushed into XCOM. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3157) Variable.set shouldn't accept types other than string when JSON mode is False
[ https://issues.apache.org/jira/browse/AIRFLOW-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jack updated AIRFLOW-3157: -- Fix Version/s: 1.10.4 > Variable.set shouldn't accept types other than string when JSON mode is False > - > > Key: AIRFLOW-3157 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3157 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: jack >Priority: Minor > Fix For: 1.10.4 > > > Assume the following code (My xcom contains: {color:#33}7764615{color} ): > {code:java} > def set_last_orderid_variable(ds, **kwargs): > ti = kwargs['ti'] > new_orderid = ti.xcom_pull(task_ids='get_max_order_id') > str = "Changed variable from {0} to > {1}".format(LAST_IMPORTED_ORDER_ID,new_orderid) > Variable.set('last_order_id_imported', new_orderid) > logging.info(str) > return{code} > > > Log shows: > {code:java} > [2018-10-04 12:49:48,186] {base_task_runner.py:98} INFO - Subtask: > [2018-10-04 12:49:38,169] {orders_dag.py:400} INFO - Changed variable from 10 > to 7764615{code} > > However When I go to variable > {code:java} > 'last_order_id_imported' {code} > I see empty cell on the Value. *It changed it from 10 to nothing*. > > > If I'll change my code to : > {code:java} > def set_last_orderid_variable(ds, **kwargs): > ti = kwargs['ti'] > new_orderid = ti.xcom_pull(task_ids='get_max_order_id') > str = "Changed variable from {0} to > {1}".format(LAST_IMPORTED_ORDER_ID,new_orderid) > x = '{0}'.format(new_orderid) > logging.info(type(new_orderid)) > Variable.set('last_order_id_imported', x) > logging.info(str) > return{code} > > > This code works. > When I go to variable > {code:java} > 'last_order_id_imported' {code} > I see {color:#33}7764615{color} > The log shows me: > {code:java} > [2018-10-04 12:53:58,002] {base_task_runner.py:98} INFO - Subtask: > [2018-10-04 12:53:58,002] {orders_dag.py:399} INFO - > [2018-10-04 12:53:58,032] {base_task_runner.py:98} INFO - Subtask: > [2018-10-04 12:53:58,031] {orders_dag.py:401} INFO - Changed variable from 10 > to 7764615{code} > > > As you can see the type of new_orderid is > {code:java} > long{code} > . Apparently the > {code:java} > Variable.set {code} > can't handle it (let me remind that this value was received from xcom). > > > If by design Variable.set can accept specific types (strings/ json etc...) it > should raise exception for others (long etc...) . It shouldn't deicide that > it puts empty string on it's own and mark the task as success. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4098) gcp_api dependency conflicts
[ https://issues.apache.org/jira/browse/AIRFLOW-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827941#comment-16827941 ] jack commented on AIRFLOW-4098: --- [~kaxilnaik] have you encountered this? > gcp_api dependency conflicts > > > Key: AIRFLOW-4098 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4098 > Project: Apache Airflow > Issue Type: Bug > Components: gcp >Affects Versions: 1.10.2 > Environment: Ubuntu 16.04.6 LTS > Python 2.7 / Python 3.5 / Python 3.6 >Reporter: Manuel Rodríguez Guimeráns >Priority: Major > > The {{gcp_api}} extra contains inconsistencies in its subdependencies. When > installed with {{pip}} it completes with a warning, but {{pipenv}} and > {{poetry}} are not so forgiving and it cannot be installed at all. > Note that on the master branch as of 419e30 (where {{gcp_api}} is now called > just {{gcp}}) there are still dependency errors as well. > h3. pip with python 2.7 > {code:java} > $ mkvirtualenv -p /usr/bin/python2.7 airflow_pip > ... > $ pip --version > pip 19.0.3 from > /home/manu/.local/share/virtualenvs/airflow_pip/local/lib/python2.7/site-packages/pip > (python 2.7) > $ SLUGIFY_USES_TEXT_UNIDECODE=yes pip install > 'apache-airflow[gcp_api]==1.10.2' > ... > google-cloud-bigquery 1.10.0 has requirement > google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 0.28.1 > which is incompatible. > google-cloud-spanner 1.8.0 has requirement > google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 0.28.1 > which is incompatible. > ... > {code} > h3. pipenv with python 3.6 > {code:java} > $ mkdir airflow_pipenv > $ cd airflow_pipenv > $ pipenv --version > pipenv, version 2018.11.26 > $ pipenv --python 3.6 > Creating a virtualenv for this project... > Pipfile: /tmp/airflow_pipenv/Pipfile > Using /home/manu/.pyenv/versions/3.6.6/bin/python3.6 (3.6.6) to create > virtualenv... > ⠋ Creating virtual environment...Already using interpreter > /home/manu/.pyenv/versions/3.6.6/bin/python3.6 > Using base prefix '/home/manu/.pyenv/versions/3.6.6' > New python executable in > /home/manu/.local/share/virtualenvs/airflow_pipenv-ePWbOumO/bin/python3.6 > Also creating executable in > /home/manu/.local/share/virtualenvs/airflow_pipenv-ePWbOumO/bin/python > Installing setuptools, pip, wheel...done. > ✔ Successfully created virtual environment! > Virtualenv location: > /home/manu/.local/share/virtualenvs/airflow_pipenv-ePWbOumO > $ airflow_pipenv pipenv install 'apache-airflow[gcp_api]==1.10.2' > Installing apache-airflow[gcp_api]==1.10.2... > Adding apache-airflow to Pipfile's [packages]... > ✔ Installation Succeeded > Pipfile.lock (e3de4b) out of date, updating to (ca72e7)... > Locking [dev-packages] dependencies... > Locking [packages] dependencies... > ✘ Locking Failed! > [pipenv.exceptions.ResolutionFailure]: File > "/home/manu/.pyenv/versions/3.6.6/lib/python3.6/site-packages/pipenv/resolver.py", > line 69, in resolve > [pipenv.exceptions.ResolutionFailure]: req_dir=requirements_dir > [pipenv.exceptions.ResolutionFailure]: File > "/home/manu/.pyenv/versions/3.6.6/lib/python3.6/site-packages/pipenv/utils.py", > line 726, in resolve_deps > [pipenv.exceptions.ResolutionFailure]: req_dir=req_dir, > [pipenv.exceptions.ResolutionFailure]: File > "/home/manu/.pyenv/versions/3.6.6/lib/python3.6/site-packages/pipenv/utils.py", > line 480, in actually_resolve_deps > [pipenv.exceptions.ResolutionFailure]: resolved_tree = > resolver.resolve() > [pipenv.exceptions.ResolutionFailure]: File > "/home/manu/.pyenv/versions/3.6.6/lib/python3.6/site-packages/pipenv/utils.py", > line 395, in resolve > [pipenv.exceptions.ResolutionFailure]: raise > ResolutionFailure(message=str(e)) > [pipenv.exceptions.ResolutionFailure]: > pipenv.exceptions.ResolutionFailure: ERROR: ERROR: Could not find a version > that matches google-cloud-core<0.29dev,<0.30dev,>=0.28.0,>=0.29.0 > [pipenv.exceptions.ResolutionFailure]: Tried: 0.20.0, 0.20.0, 0.21.0, > 0.21.0, 0.22.0, 0.22.0, 0.22.1, 0.22.1, 0.23.0, 0.23.0, 0.23.1, 0.23.1, > 0.24.0, 0.24.0, 0.24.1, 0.24.1, 0.25.0, 0.25.0, 0.26.0, 0.26.0, 0.27.0, > 0.27.0, 0.27.1, 0.27.1, 0.28.0, 0.28.0, 0.28.1, 0.28.1, 0.29.0, 0.29.0, > 0.29.1, 0.29.1 > [pipenv.exceptions.ResolutionFailure]: Warning: Your dependencies could not > be resolved. You likely have a mismatch in your sub-dependencies. > First try clearing your dependency cache with $ pipenv lock --clear, then > try the original command again. > Alternatively, you can use $ pipenv install --skip-lock to bypass this > mechanism, then run $ pipenv graph to inspect the situation. > Hint: try $ pipenv lock --pre if it is a pre-release dependency. > ERROR: ERROR: Could not
[jira] [Commented] (AIRFLOW-4403) UI search query in main view should filters by either dag_id or owner
[ https://issues.apache.org/jira/browse/AIRFLOW-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827931#comment-16827931 ] ASF subversion and git services commented on AIRFLOW-4403: -- Commit 0fcc83c1a16e94d8f0754f740dbc49e2041176f4 in airflow's branch refs/heads/v1-10-test from Jarek Potiuk [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=0fcc83c ] fixup! [AIRFLOW-4403] search by `dag_id` or `owners` in UI (#5184) > UI search query in main view should filters by either dag_id or owner > - > > Key: AIRFLOW-4403 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4403 > Project: Apache Airflow > Issue Type: Bug > Components: ui >Affects Versions: 1.10.3 >Reporter: Damien Avrillon >Assignee: Vladimir Gavrilenko >Priority: Minor > Fix For: 1.10.4 > > > In airflow 1.10.2, In main dags view, UI search was filtering not only 'DAG' > but also 'Owner' column. > For example, if there are multiple dags / owners: > dag_id_1 / owner_id_1 > dag_id_2 / owner_id_2 > dag_id_3 / owner_id_2 > Searching for '_2' would return 2 dags: 'dag_id_2' and 'dag_id_3' because > 'Owner' matched. > As upgrading to 1.10.3, the same search yields one single result: 'dag_id_2'. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3720) GoogleCloudStorageToS3Operator - incorrect folder compare
[ https://issues.apache.org/jira/browse/AIRFLOW-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827917#comment-16827917 ] ASF subversion and git services commented on AIRFLOW-3720: -- Commit 3d6f1ae7e47644d5fb7823a6fd1a9f861e1c077b in airflow's branch refs/heads/v1-10-stable from Rodrigo Chaparro Plata Hernandez [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=3d6f1ae ] [AIRFLOW-3720] Fix missmatch while comparing GCS and S3 files (#4766) (cherry picked from commit 60b9023ed92b31a75dbdf8b33ce7e9c2bc3637d1) > GoogleCloudStorageToS3Operator - incorrect folder compare > -- > > Key: AIRFLOW-3720 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3720 > Project: Apache Airflow > Issue Type: Bug > Components: aws >Affects Versions: 1.10.0 >Reporter: Chaim >Assignee: Chaim >Priority: Major > Fix For: 1.10.4, 2.0.0 > > > the code that compares folders from gcp to s3 is incorrect. > the code is: > files = set(files) - set(existing_files) > but the list from gcp has a "/" to the name, for example: "myfolder/", while > in s3 it does not have "/" so the folder is "myfolder" > the result is that the code tries to recopy the folder name but fails since > it already exists -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3720) GoogleCloudStorageToS3Operator - incorrect folder compare
[ https://issues.apache.org/jira/browse/AIRFLOW-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827922#comment-16827922 ] ASF subversion and git services commented on AIRFLOW-3720: -- Commit fcfc8e963c78f31e71369246a5eb67099b2dd4a7 in airflow's branch refs/heads/v1-10-test from Rodrigo Chaparro Plata Hernandez [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=fcfc8e9 ] [AIRFLOW-3720] Fix missmatch while comparing GCS and S3 files (#4766) (cherry picked from commit 60b9023ed92b31a75dbdf8b33ce7e9c2bc3637d1) > GoogleCloudStorageToS3Operator - incorrect folder compare > -- > > Key: AIRFLOW-3720 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3720 > Project: Apache Airflow > Issue Type: Bug > Components: aws >Affects Versions: 1.10.0 >Reporter: Chaim >Assignee: Chaim >Priority: Major > Fix For: 1.10.4, 2.0.0 > > > the code that compares folders from gcp to s3 is incorrect. > the code is: > files = set(files) - set(existing_files) > but the list from gcp has a "/" to the name, for example: "myfolder/", while > in s3 it does not have "/" so the folder is "myfolder" > the result is that the code tries to recopy the folder name but fails since > it already exists -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4397) Add GCSUploadSessionCompleteSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827921#comment-16827921 ] ASF subversion and git services commented on AIRFLOW-4397: -- Commit 97c31a1cb45d4ecee672b5257e055ca4f9effd95 in airflow's branch refs/heads/v1-10-test from Jacob Ferriero [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=97c31a1 ] [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor (#5166) * [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor This commit add a GoogleCloudStorageUploadSessionCompleteSensor to address the use case of accepting files from a third party vendor who refuses to send a success indicator when providing data files into a bucket and waiting until an inactivity period has passed to indicate the end of an upload session. (cherry picked from commit 2bb79197cdba49b43a5a821674af1f11c0279d75) > Add GCSUploadSessionCompleteSensor > -- > > Key: AIRFLOW-4397 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4397 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Labels: beginner, newbie > Fix For: 1.10.4, 2.0.0 > > > I'd like to contribute a Sensor for Google Cloud Storage that can poke a > bucket until there has been sufficient time without a new file drop. Often > times there are cases where a third party vendor drops data to a bucket but > don't send a success flag when they are done. This sensor would allow you to > poke every n minutes to check if more files have been added since the last > poke, and if there had been `inactivity_period` minutes without a new file > drop, return `True`. This could allow SLA misses if data did not arrive by an > expected time, and have a configurable deadline past which the sensor would > fail. Optionally the user could specify a minimum number of files for the > sensor to succeed. This would be my first time contributing to an OSS > project, so please let me know if this is not the appropriate place to start. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4168) Create Google Cloud Video Intelligence Operators
[ https://issues.apache.org/jira/browse/AIRFLOW-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827923#comment-16827923 ] ASF subversion and git services commented on AIRFLOW-4168: -- Commit 3b64582156cd9681ad6ce9305218e39d6dce6641 in airflow's branch refs/heads/v1-10-test from Antoni Smoliński [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=3b64582 ] [AIRFLOW-4168] Create Google Cloud Video Intelligence Operators (#4985) (cherry picked from commit df18b02d2abc98df2ef850f25b82f8202106a147) > Create Google Cloud Video Intelligence Operators > > > Key: AIRFLOW-4168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4168 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Antoni Smoliński >Assignee: Antoni Smoliński >Priority: Major > Fix For: 1.10.4, 2.0.0 > > > Create Video Intelligence operators: > CloudVideoIntelligenceDetectVideoLabelsOperator, > CloudVideoIntelligenceDetectVideoExplicitContentOperator, > CloudVideoIntelligenceDetectVideoShotsOperator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4397) Add GCSUploadSessionCompleteSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827920#comment-16827920 ] ASF subversion and git services commented on AIRFLOW-4397: -- Commit 97c31a1cb45d4ecee672b5257e055ca4f9effd95 in airflow's branch refs/heads/v1-10-test from Jacob Ferriero [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=97c31a1 ] [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor (#5166) * [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor This commit add a GoogleCloudStorageUploadSessionCompleteSensor to address the use case of accepting files from a third party vendor who refuses to send a success indicator when providing data files into a bucket and waiting until an inactivity period has passed to indicate the end of an upload session. (cherry picked from commit 2bb79197cdba49b43a5a821674af1f11c0279d75) > Add GCSUploadSessionCompleteSensor > -- > > Key: AIRFLOW-4397 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4397 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Labels: beginner, newbie > Fix For: 1.10.4, 2.0.0 > > > I'd like to contribute a Sensor for Google Cloud Storage that can poke a > bucket until there has been sufficient time without a new file drop. Often > times there are cases where a third party vendor drops data to a bucket but > don't send a success flag when they are done. This sensor would allow you to > poke every n minutes to check if more files have been added since the last > poke, and if there had been `inactivity_period` minutes without a new file > drop, return `True`. This could allow SLA misses if data did not arrive by an > expected time, and have a configurable deadline past which the sensor would > fail. Optionally the user could specify a minimum number of files for the > sensor to succeed. This would be my first time contributing to an OSS > project, so please let me know if this is not the appropriate place to start. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4397) Add GCSUploadSessionCompleteSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827916#comment-16827916 ] ASF subversion and git services commented on AIRFLOW-4397: -- Commit f5f07de46bdc493dae60a479c90a95c8361b8827 in airflow's branch refs/heads/v1-10-stable from Jacob Ferriero [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=f5f07de ] [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor (#5166) * [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor This commit add a GoogleCloudStorageUploadSessionCompleteSensor to address the use case of accepting files from a third party vendor who refuses to send a success indicator when providing data files into a bucket and waiting until an inactivity period has passed to indicate the end of an upload session. (cherry picked from commit 2bb79197cdba49b43a5a821674af1f11c0279d75) > Add GCSUploadSessionCompleteSensor > -- > > Key: AIRFLOW-4397 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4397 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Labels: beginner, newbie > Fix For: 1.10.4, 2.0.0 > > > I'd like to contribute a Sensor for Google Cloud Storage that can poke a > bucket until there has been sufficient time without a new file drop. Often > times there are cases where a third party vendor drops data to a bucket but > don't send a success flag when they are done. This sensor would allow you to > poke every n minutes to check if more files have been added since the last > poke, and if there had been `inactivity_period` minutes without a new file > drop, return `True`. This could allow SLA misses if data did not arrive by an > expected time, and have a configurable deadline past which the sensor would > fail. Optionally the user could specify a minimum number of files for the > sensor to succeed. This would be my first time contributing to an OSS > project, so please let me know if this is not the appropriate place to start. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4397) Add GCSUploadSessionCompleteSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827915#comment-16827915 ] ASF subversion and git services commented on AIRFLOW-4397: -- Commit f5f07de46bdc493dae60a479c90a95c8361b8827 in airflow's branch refs/heads/v1-10-stable from Jacob Ferriero [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=f5f07de ] [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor (#5166) * [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor This commit add a GoogleCloudStorageUploadSessionCompleteSensor to address the use case of accepting files from a third party vendor who refuses to send a success indicator when providing data files into a bucket and waiting until an inactivity period has passed to indicate the end of an upload session. (cherry picked from commit 2bb79197cdba49b43a5a821674af1f11c0279d75) > Add GCSUploadSessionCompleteSensor > -- > > Key: AIRFLOW-4397 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4397 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Labels: beginner, newbie > Fix For: 1.10.4, 2.0.0 > > > I'd like to contribute a Sensor for Google Cloud Storage that can poke a > bucket until there has been sufficient time without a new file drop. Often > times there are cases where a third party vendor drops data to a bucket but > don't send a success flag when they are done. This sensor would allow you to > poke every n minutes to check if more files have been added since the last > poke, and if there had been `inactivity_period` minutes without a new file > drop, return `True`. This could allow SLA misses if data did not arrive by an > expected time, and have a configurable deadline past which the sensor would > fail. Optionally the user could specify a minimum number of files for the > sensor to succeed. This would be my first time contributing to an OSS > project, so please let me know if this is not the appropriate place to start. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4168) Create Google Cloud Video Intelligence Operators
[ https://issues.apache.org/jira/browse/AIRFLOW-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827918#comment-16827918 ] ASF subversion and git services commented on AIRFLOW-4168: -- Commit b4b48b53b6de778cf957ee68b5ab08d291e8e3ee in airflow's branch refs/heads/v1-10-stable from Antoni Smoliński [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=b4b48b5 ] [AIRFLOW-4168] Create Google Cloud Video Intelligence Operators (#4985) (cherry picked from commit df18b02d2abc98df2ef850f25b82f8202106a147) > Create Google Cloud Video Intelligence Operators > > > Key: AIRFLOW-4168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4168 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Antoni Smoliński >Assignee: Antoni Smoliński >Priority: Major > Fix For: 1.10.4, 2.0.0 > > > Create Video Intelligence operators: > CloudVideoIntelligenceDetectVideoLabelsOperator, > CloudVideoIntelligenceDetectVideoExplicitContentOperator, > CloudVideoIntelligenceDetectVideoShotsOperator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] OmerJog commented on issue #4830: [AIRFLOW-3929] Use anchor tag for dag filter link.
OmerJog commented on issue #4830: [AIRFLOW-3929] Use anchor tag for dag filter link. URL: https://github.com/apache/airflow/pull/4830#issuecomment-487368134 @ashb There is a pending pr to fix the zoom in issue: https://github.com/apache/airflow/pull/4473 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor
potiuk commented on issue #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor URL: https://github.com/apache/airflow/pull/5166#issuecomment-487364370 @jaketf - indeed @OmerJog is right. Most of the docs are now auto-generated in Reference but integration still needs to be updated. Thanks for noticing @OmerJog. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] OmerJog commented on issue #5176: [AIRFLOW-2636] Fix Nonetype error for task failure instance duration
OmerJog commented on issue #5176: [AIRFLOW-2636] Fix Nonetype error for task failure instance duration URL: https://github.com/apache/airflow/pull/5176#issuecomment-487362582 There is a pending previous PR https://github.com/apache/airflow/pull/4032 and @ashb requested unit tests before the PR can be merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] OmerJog commented on issue #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor
OmerJog commented on issue #5166: [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor URL: https://github.com/apache/airflow/pull/5166#issuecomment-487362261 @jaketf The PR is missing an edit to docs/integration.rst would you mind adding the it in a new PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-1860) When adding variables in Web GUI they are case-insensitive, but everywhere else they are case-sensitive
[ https://issues.apache.org/jira/browse/AIRFLOW-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jack updated AIRFLOW-1860: -- Attachment: aIRFLOW11.PNG > When adding variables in Web GUI they are case-insensitive, but everywhere > else they are case-sensitive > --- > > Key: AIRFLOW-1860 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1860 > Project: Apache Airflow > Issue Type: Bug > Components: webapp >Affects Versions: 1.8.0 >Reporter: Zelko Nikolic >Priority: Minor > Attachments: aIRFLOW11.PNG > > > 1. Open Web GUI -> Variables > 2. Add a variable named "ABCD" and enter some value for it. > 3. Now add a variable named "abcd", enter some value and save it. Oh, you > can't. It says that variable already exists. As if variable names are > case-insensitive. But they are not. > 4. Because if you try accessing the first variable from the code, using > Variable.get("abcd"), it will say that variable doesn't exist. > So, from the code variables are case-sensitive, but when Web GUI is adding > them, it treats them case-insensitive. To make things even weirder, when you > sort the variables by name in Web GUI, it sorts uppercase first, then > lowercase. Obviously, even Web GUI isn't consistent with itself, because it's > treating them sometimes case-sensitive, sometimes case-insensitive. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] feluelle commented on a change in pull request #5162: [AIRFLOW-4358] Speed up test_jobs by not running tasks
feluelle commented on a change in pull request #5162: [AIRFLOW-4358] Speed up test_jobs by not running tasks URL: https://github.com/apache/airflow/pull/5162#discussion_r279185479 ## File path: tests/executors/test_executor.py ## @@ -16,47 +16,73 @@ # KIND, either express or implied. See the License for the # specific language governing permissions and limitations # under the License. + +from collections import defaultdict + from airflow.executors.base_executor import BaseExecutor from airflow.utils.state import State - -from airflow import settings +from airflow.utils.db import create_session class TestExecutor(BaseExecutor): """ TestExecutor is used for unit testing purposes. """ -def __init__(self, do_update=False, *args, **kwargs): +def __init__(self, do_update=True, *args, **kwargs): self.do_update = do_update self._running = [] + +# A list of "batches" of tasks self.history = [] +# All the tasks, in a stable sort order +self.sorted_tasks = [] +self.mock_task_results = defaultdict(lambda: State.SUCCESS) super().__init__(*args, **kwargs) -def execute_async(self, key, command, queue=None): -self.log.debug("{} running task instances".format(len(self.running))) -self.log.debug("{} in queue".format(len(self.queued_tasks))) - def heartbeat(self): -session = settings.Session() -if self.do_update: +if not self.do_update: Review comment: Don't you want to _mock_ the `heartbeat` here? If so you could remove that extra statement and just patch the `TestExecutor.heartbeat` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] RosterIn commented on issue #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator.
RosterIn commented on issue #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator. URL: https://github.com/apache/airflow/pull/5196#issuecomment-487360020 My previous PR is somehow related as this one also address TIME field. I suggest going through the comments of the mysql lib commiter which kindly gave input: https://github.com/apache/airflow/pull/4802 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-1860) When adding variables in Web GUI they are case-insensitive, but everywhere else they are case-sensitive
[ https://issues.apache.org/jira/browse/AIRFLOW-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827864#comment-16827864 ] jack commented on AIRFLOW-1860: --- Can not reproduce on Airflow 1.10.3: !aIRFLOW11.PNG! > When adding variables in Web GUI they are case-insensitive, but everywhere > else they are case-sensitive > --- > > Key: AIRFLOW-1860 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1860 > Project: Apache Airflow > Issue Type: Bug > Components: webapp >Affects Versions: 1.8.0 >Reporter: Zelko Nikolic >Priority: Minor > Attachments: aIRFLOW11.PNG > > > 1. Open Web GUI -> Variables > 2. Add a variable named "ABCD" and enter some value for it. > 3. Now add a variable named "abcd", enter some value and save it. Oh, you > can't. It says that variable already exists. As if variable names are > case-insensitive. But they are not. > 4. Because if you try accessing the first variable from the code, using > Variable.get("abcd"), it will say that variable doesn't exist. > So, from the code variables are case-sensitive, but when Web GUI is adding > them, it treats them case-insensitive. To make things even weirder, when you > sort the variables by name in Web GUI, it sorts uppercase first, then > lowercase. Obviously, even Web GUI isn't consistent with itself, because it's > treating them sometimes case-sensitive, sometimes case-insensitive. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3791) Multiple jobs in run & check if already running
[ https://issues.apache.org/jira/browse/AIRFLOW-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827863#comment-16827863 ] Chaim commented on AIRFLOW-3791: how can i get my pull request to go in? > Multiple jobs in run & check if already running > --- > > Key: AIRFLOW-3791 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3791 > Project: Apache Airflow > Issue Type: Improvement > Components: Dataflow >Affects Versions: 1.10.2 >Reporter: Chaim >Assignee: Chaim >Priority: Major > Fix For: 1.10.4 > > > # Support to check if job is already running before starting java job > # In case dataflow creates more than one job, we need to track all jobs for > status -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] tssujt commented on issue #5176: [AIRFLOW-2636] Fix Nonetype error for task failure instance duration
tssujt commented on issue #5176: [AIRFLOW-2636] Fix Nonetype error for task failure instance duration URL: https://github.com/apache/airflow/pull/5176#issuecomment-487359241 Rebased @feng-tao This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feluelle commented on a change in pull request #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator.
feluelle commented on a change in pull request #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator. URL: https://github.com/apache/airflow/pull/5196#discussion_r279184388 ## File path: tests/contrib/operators/test_mysql_to_gcs_operator.py ## @@ -106,7 +121,7 @@ def _assert_upload(bucket, obj, tmp_filename, mime_type=None): op.execute(None) mysql_hook_mock_class.assert_called_once_with(mysql_conn_id=MYSQL_CONN_ID) - mysql_hook_mock.get_conn().cursor().execute.assert_called_once_with(SQL) +mysql_hook_mock.get_conn().cursor().execute.assert_called_with(SQL) Review comment: Please test that you executed the `tz_query` before you execute the actual `sql` statement. You can test that via [assert_has_calls](https://docs.python.org/3.6/library/unittest.mock.html#unittest.mock.Mock.assert_has_calls). Also see this https://github.com/apache/airflow/pull/5101#discussion_r275438792 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feluelle commented on a change in pull request #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator.
feluelle commented on a change in pull request #5196: [AIRFLOW-4423] Improve date handling in mysql to gcs operator. URL: https://github.com/apache/airflow/pull/5196#discussion_r279183742 ## File path: airflow/contrib/operators/mysql_to_gcs.py ## @@ -265,27 +288,31 @@ def _upload_to_gcs(self, files_to_upload): tmp_file.get('file_handle').name, mime_type=tmp_file.get('file_mime_type')) -@staticmethod -def _convert_types(schema, col_type_dict, row): +@classmethod +def _convert_types(cls, schema, col_type_dict, row): +return [ +cls._convert_type(value, col_type_dict.get(name)) +for name, value in zip(schema, row) +] + +@classmethod +def _convert_type(cls, value, schema_type): Review comment: The documentation for this conversion is good, but in my opinion we should always document the parameters as well + to have a cleaner sphinx generated documentation. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-4425) Add FacebookAdsHook
jack created AIRFLOW-4425: - Summary: Add FacebookAdsHook Key: AIRFLOW-4425 URL: https://issues.apache.org/jira/browse/AIRFLOW-4425 Project: Apache Airflow Issue Type: Wish Components: hooks Reporter: jack Fix For: 2.0.0 Add hook to interact with FacebookAds -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] OmerJog commented on issue #4546: AIRFLOW-3720
OmerJog commented on issue #4546: AIRFLOW-3720 URL: https://github.com/apache/airflow/pull/4546#issuecomment-487355614 This can be closed due to the merge of https://github.com/apache/airflow/pull/4766 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5195: [AIRFLOW-4348] Add GCP console link in BigQueryOperator
codecov-io commented on issue #5195: [AIRFLOW-4348] Add GCP console link in BigQueryOperator URL: https://github.com/apache/airflow/pull/5195#issuecomment-487352417 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5195?src=pr=h1) Report > Merging [#5195](https://codecov.io/gh/apache/airflow/pull/5195?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/75fca09e6d691a3efec66af37cca855332558203?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `100%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5195/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5195?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5195 +/- ## == + Coverage 78.54% 78.55% +<.01% == Files 469 469 Lines 2989629905 +9 == + Hits2348323491 +8 - Misses 6413 6414 +1 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5195?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/contrib/operators/bigquery\_operator.py](https://codecov.io/gh/apache/airflow/pull/5195/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9iaWdxdWVyeV9vcGVyYXRvci5weQ==) | `93.95% <100%> (+0.38%)` | :arrow_up: | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5195/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5195?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5195?src=pr=footer). Last update [75fca09...6be12da](https://codecov.io/gh/apache/airflow/pull/5195?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5194: [AIRFLOW-4419] Refine concurrency check in scheduler
feng-tao commented on issue #5194: [AIRFLOW-4419] Refine concurrency check in scheduler URL: https://github.com/apache/airflow/pull/5194#issuecomment-487352281 will take a look at the pr tomorrow This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] KevinYang21 edited a comment on issue #5177: [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch
KevinYang21 edited a comment on issue #5177: [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch URL: https://github.com/apache/airflow/pull/5177#issuecomment-487350990 @cong-zhu Do we have unit test for this feature before? If so we'll still need to update it so we capture the use case which this PR is fixing, if not we might want to add that test. Also the CI is failing now, if you're still working on it then you can mark the PR as a WIP PR by putting a `[WIP]` into the PR title. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] KevinYang21 commented on issue #5177: [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch
KevinYang21 commented on issue #5177: [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch URL: https://github.com/apache/airflow/pull/5177#issuecomment-487350990 @cong-zhu Do we have unit test for this feature before? If so we'll still need to update it so we capture the use case which this PR is fixing, if not we might want to add that test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Resolved] (AIRFLOW-4228) DatabricksRunNowOperator does not show up under airflow docs
[ https://issues.apache.org/jira/browse/AIRFLOW-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Feng resolved AIRFLOW-4228. --- Resolution: Fixed Fix Version/s: 2.0.0 > DatabricksRunNowOperator does not show up under airflow docs > > > Key: AIRFLOW-4228 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4228 > Project: Apache Airflow > Issue Type: Bug >Reporter: Thomas Dziedzic >Assignee: Thomas Elvey >Priority: Trivial > Fix For: 2.0.0 > > > [https://airflow.apache.org/_modules/airflow/contrib/operators/databricks_operator.html] > contains DatabricksRunNowOperator but when you visit > [https://airflow.apache.org/integration.html?highlight=databricks#databricks] > there is no mention of the DatabricksRunNowOperator even though it is > available and working. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] feng-tao commented on issue #5177: [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch
feng-tao commented on issue #5177: [AIRFLOW-4084] Fix bug downloading incomplete logs from ElasticSearch URL: https://github.com/apache/airflow/pull/5177#issuecomment-487350721 cc @KevinYang21 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5195: [AIRFLOW-4348] Add GCP console link in BigQueryOperator
feng-tao commented on issue #5195: [AIRFLOW-4348] Add GCP console link in BigQueryOperator URL: https://github.com/apache/airflow/pull/5195#issuecomment-487350649 cc GCP folks @fenglu-g @potiuk This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao merged pull request #5182: [AIRFLOW-4361] Fix flaky test_integration_run_dag_with_scheduler_failure
feng-tao merged pull request #5182: [AIRFLOW-4361] Fix flaky test_integration_run_dag_with_scheduler_failure URL: https://github.com/apache/airflow/pull/5182 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5182: [AIRFLOW-4361] Fix flaky test_integration_run_dag_with_scheduler_failure
feng-tao commented on issue #5182: [AIRFLOW-4361] Fix flaky test_integration_run_dag_with_scheduler_failure URL: https://github.com/apache/airflow/pull/5182#issuecomment-487350412 lgtm This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao merged pull request #5171: [AIRFLOW-4228] DatabricksRunNowOperator missing from documentation
feng-tao merged pull request #5171: [AIRFLOW-4228] DatabricksRunNowOperator missing from documentation URL: https://github.com/apache/airflow/pull/5171 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-4361) Fix flaky test_integration_run_dag_with_scheduler_failure
[ https://issues.apache.org/jira/browse/AIRFLOW-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827844#comment-16827844 ] ASF GitHub Bot commented on AIRFLOW-4361: - feng-tao commented on pull request #5182: [AIRFLOW-4361] Fix flaky test_integration_run_dag_with_scheduler_failure URL: https://github.com/apache/airflow/pull/5182 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix flaky test_integration_run_dag_with_scheduler_failure > - > > Key: AIRFLOW-4361 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4361 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > test_integration_run_dag_with_scheduler_failure often fails with > {code} > ^[[33m ^[[33m^[[1m^[[33mConnectionError^[[0m^[[0m^[[33m: > ^[[0m^[[33mHTTPConnectionPool(host='10.20.3.19', port= 30809): Max > retries exceeded with url: > /api/experimental/dags/example_kubernetes_executor_config/paused/false (Ca > used by NewConnectionError(' 0x7fb80986f208>: Failed to establish a n ew connection: [Errno 111] > Connection refused',))^[[0m^M > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4361) Fix flaky test_integration_run_dag_with_scheduler_failure
[ https://issues.apache.org/jira/browse/AIRFLOW-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827845#comment-16827845 ] ASF subversion and git services commented on AIRFLOW-4361: -- Commit 9daad7ecd14fabfc3f441074670f4b38aa93e8b0 in airflow's branch refs/heads/master from Chao-Han Tsai [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=9daad7e ] [AIRFLOW-4361] Fix flaky test_integration_run_dag_with_scheduler_failure (#5182) > Fix flaky test_integration_run_dag_with_scheduler_failure > - > > Key: AIRFLOW-4361 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4361 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 2.0.0 > > > test_integration_run_dag_with_scheduler_failure often fails with > {code} > ^[[33m ^[[33m^[[1m^[[33mConnectionError^[[0m^[[0m^[[33m: > ^[[0m^[[33mHTTPConnectionPool(host='10.20.3.19', port= 30809): Max > retries exceeded with url: > /api/experimental/dags/example_kubernetes_executor_config/paused/false (Ca > used by NewConnectionError(' 0x7fb80986f208>: Failed to establish a n ew connection: [Errno 111] > Connection refused',))^[[0m^M > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4228) DatabricksRunNowOperator does not show up under airflow docs
[ https://issues.apache.org/jira/browse/AIRFLOW-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827842#comment-16827842 ] ASF subversion and git services commented on AIRFLOW-4228: -- Commit e27492436aae56bf2119e953cf882723411e4667 in airflow's branch refs/heads/master from Tomme [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=e274924 ] [AIRFLOW-4228] DatabricksRunNowOperator does not show up under airflow docs (#5171) * Add DatabricksRunNowOperator to integration.rst documentation > DatabricksRunNowOperator does not show up under airflow docs > > > Key: AIRFLOW-4228 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4228 > Project: Apache Airflow > Issue Type: Bug >Reporter: Thomas Dziedzic >Assignee: Thomas Elvey >Priority: Trivial > > [https://airflow.apache.org/_modules/airflow/contrib/operators/databricks_operator.html] > contains DatabricksRunNowOperator but when you visit > [https://airflow.apache.org/integration.html?highlight=databricks#databricks] > there is no mention of the DatabricksRunNowOperator even though it is > available and working. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-4228) DatabricksRunNowOperator does not show up under airflow docs
[ https://issues.apache.org/jira/browse/AIRFLOW-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827841#comment-16827841 ] ASF GitHub Bot commented on AIRFLOW-4228: - feng-tao commented on pull request #5171: [AIRFLOW-4228] DatabricksRunNowOperator missing from documentation URL: https://github.com/apache/airflow/pull/5171 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > DatabricksRunNowOperator does not show up under airflow docs > > > Key: AIRFLOW-4228 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4228 > Project: Apache Airflow > Issue Type: Bug >Reporter: Thomas Dziedzic >Assignee: Thomas Elvey >Priority: Trivial > > [https://airflow.apache.org/_modules/airflow/contrib/operators/databricks_operator.html] > contains DatabricksRunNowOperator but when you visit > [https://airflow.apache.org/integration.html?highlight=databricks#databricks] > there is no mention of the DatabricksRunNowOperator even though it is > available and working. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [airflow] feng-tao merged pull request #5198: [AIRFLOW-XXX] Update readme for Lyft
feng-tao merged pull request #5198: [AIRFLOW-XXX] Update readme for Lyft URL: https://github.com/apache/airflow/pull/5198 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5198: [AIRFLOW-XXX] Update readme for Lyft
feng-tao commented on issue #5198: [AIRFLOW-XXX] Update readme for Lyft URL: https://github.com/apache/airflow/pull/5198#issuecomment-487348695 PTAL @milton0825 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao opened a new pull request #5198: [AIRFLOW-XXX] Update readme for Lyft
feng-tao opened a new pull request #5198: [AIRFLOW-XXX] Update readme for Lyft URL: https://github.com/apache/airflow/pull/5198 Add a few more folks from Lyft. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] feng-tao commented on issue #5197: [AIRFLOW-XXX] Fix CVE-2019-11358
feng-tao commented on issue #5197: [AIRFLOW-XXX] Fix CVE-2019-11358 URL: https://github.com/apache/airflow/pull/5197#issuecomment-487348001 thanks @XD-DENG :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5197: [AIRFLOW-XXX] Fix CVE-2019-11358
codecov-io commented on issue #5197: [AIRFLOW-XXX] Fix CVE-2019-11358 URL: https://github.com/apache/airflow/pull/5197#issuecomment-487347426 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=h1) Report > Merging [#5197](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/df18b02d2abc98df2ef850f25b82f8202106a147?src=pr=desc) will **increase** coverage by `0.26%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5197/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5197 +/- ## == + Coverage 78.27% 78.54% +0.26% == Files 469 469 Lines 2989629896 == + Hits2340223482 +80 + Misses 6494 6414 -80 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | | [airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5) | `88.79% <0%> (+0.86%)` | :arrow_up: | | [airflow/models/connection.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==) | `65.73% <0%> (+1.12%)` | :arrow_up: | | [airflow/hooks/hive\_hooks.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9oaXZlX2hvb2tzLnB5) | `74.93% <0%> (+1.86%)` | :arrow_up: | | [airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5) | `80.95% <0%> (+4.76%)` | :arrow_up: | | [airflow/operators/mysql\_operator.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfb3BlcmF0b3IucHk=) | `100% <0%> (+100%)` | :arrow_up: | | [airflow/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfdG9faGl2ZS5weQ==) | `100% <0%> (+100%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=footer). Last update [df18b02...b9beb58](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] XD-DENG merged pull request #5197: [AIRFLOW-XXX] Fix CVE-2019-11358
XD-DENG merged pull request #5197: [AIRFLOW-XXX] Fix CVE-2019-11358 URL: https://github.com/apache/airflow/pull/5197 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #5197: [AIRFLOW-XXX] Fix CVE-2019-11358
codecov-io commented on issue #5197: [AIRFLOW-XXX] Fix CVE-2019-11358 URL: https://github.com/apache/airflow/pull/5197#issuecomment-487347427 # [Codecov](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=h1) Report > Merging [#5197](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=desc) into [master](https://codecov.io/gh/apache/airflow/commit/df18b02d2abc98df2ef850f25b82f8202106a147?src=pr=desc) will **increase** coverage by `0.26%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5197/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=tree) ```diff @@Coverage Diff @@ ## master#5197 +/- ## == + Coverage 78.27% 78.54% +0.26% == Files 469 469 Lines 2989629896 == + Hits2340223482 +80 + Misses 6494 6414 -80 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `92.42% <0%> (-0.18%)` | :arrow_down: | | [airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5) | `88.79% <0%> (+0.86%)` | :arrow_up: | | [airflow/models/connection.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==) | `65.73% <0%> (+1.12%)` | :arrow_up: | | [airflow/hooks/hive\_hooks.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9oaXZlX2hvb2tzLnB5) | `74.93% <0%> (+1.86%)` | :arrow_up: | | [airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5) | `80.95% <0%> (+4.76%)` | :arrow_up: | | [airflow/operators/mysql\_operator.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfb3BlcmF0b3IucHk=) | `100% <0%> (+100%)` | :arrow_up: | | [airflow/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/5197/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfdG9faGl2ZS5weQ==) | `100% <0%> (+100%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=footer). Last update [df18b02...b9beb58](https://codecov.io/gh/apache/airflow/pull/5197?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services