[GitHub] yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-08-31 Thread GitBox
yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor
URL: 
https://github.com/apache/incubator-airflow/pull/3830#issuecomment-417831376
 
 
   @feng-tao Not sure what our timeline is for 1.10.1, but there are two 
other PRs on the same topic coming in the next week, and they are less 
mature (in terms of how long they have been running in our cluster). If 
this makes it into 1.10.1, I would hope all three of them go in together. 
(With all three together, Airflow should conservatively be able to handle 
30k concurrently running tasks and 4k DAG files while keeping scheduling 
latency within 5 minutes, even in extreme cases where all 30k tasks need 
to be scheduled at the same time.)
   
   @aoen @Fokko @bolkedebruin @mistercrunch @afernandez @saguziel @YingboWang 
PTAL
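   For context on what "parallelize" means here: rather than publishing one 
task message to the broker at a time, submissions are fanned out over a pool. 
The sketch below is purely illustrative and assumes a stand-in 
`send_task_to_broker`, not Airflow's or Celery's real API:

```python
from concurrent.futures import ThreadPoolExecutor

def send_task_to_broker(task_id):
    # Stand-in for an individual (network-bound) Celery publish call;
    # hypothetical, not the real Airflow/Celery API.
    return 'queued:%s' % task_id

def submit_serial(tasks):
    # One blocking publish at a time: round-trip latency adds up linearly.
    return [send_task_to_broker(t) for t in tasks]

def submit_parallel(tasks, workers=16):
    # Overlap the round-trips across a pool; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_task_to_broker, tasks))

tasks = ['task_%d' % i for i in range(100)]
assert submit_parallel(tasks) == submit_serial(tasks)
```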


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-08-31 Thread GitBox
feng-tao commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor
URL: 
https://github.com/apache/incubator-airflow/pull/3830#issuecomment-417830505
 
 
   Nice job! Are we aiming to include this improvement in 1.10.1?




[GitHub] codecov-io commented on issue #3832: [AIRFLOW-2979] Make celery_result_backend conf Backwards compatible

2018-08-31 Thread GitBox
codecov-io commented on issue #3832: [AIRFLOW-2979] Make celery_result_backend 
conf Backwards compatible
URL: 
https://github.com/apache/incubator-airflow/pull/3832#issuecomment-417827204
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3832?src=pr=h1)
 Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@f279151`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3832/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3832?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #3832   +/-   ##
   =========================================
     Coverage          ?   77.43%
   =========================================
     Files             ?      203
     Lines             ?    15846
     Branches          ?        0
   =========================================
     Hits              ?    12271
     Misses            ?     3575
     Partials          ?        0
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3832?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/configuration.py](https://codecov.io/gh/apache/incubator-airflow/pull/3832/diff?src=pr=tree#diff-YWlyZmxvdy9jb25maWd1cmF0aW9uLnB5)
 | `84.07% <ø> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3832?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3832?src=pr=footer).
 Last update 
[f279151...cbed29f](https://codecov.io/gh/apache/incubator-airflow/pull/3832?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   






[GitHub] codecov-io edited a comment on issue #3831: [AIRFLOW-2981] Fix TypeError in dataflow operators

2018-08-31 Thread GitBox
codecov-io edited a comment on issue #3831: [AIRFLOW-2981] Fix TypeError in 
dataflow operators
URL: 
https://github.com/apache/incubator-airflow/pull/3831#issuecomment-417826829
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3831?src=pr=h1)
 Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@f279151`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3831/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3831?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #3831   +/-   ##
   =========================================
     Coverage          ?   77.43%
   =========================================
     Files             ?      203
     Lines             ?    15846
     Branches          ?        0
   =========================================
     Hits              ?    12271
     Misses            ?     3575
     Partials          ?        0
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3831?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3831?src=pr=footer).
 Last update 
[f279151...8aa8eb4](https://codecov.io/gh/apache/incubator-airflow/pull/3831?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   






[jira] [Resolved] (AIRFLOW-2949) Syntax Highlight for Single Quote

2018-08-31 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-2949.
-
   Resolution: Fixed
Fix Version/s: 1.10.1

Resolved by https://github.com/apache/incubator-airflow/pull/3795

> Syntax Highlight for Single Quote
> -
>
> Key: AIRFLOW-2949
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2949
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Tomas Zulberti
>Priority: Major
> Fix For: 1.10.1
>
> Attachments: image-2018-08-23-16-16-59-375.png
>
>
> When viewing the code of any DAG, double-quoted strings are highlighted but 
> single-quoted strings are not. Pygments generates a special CSS class for 
> them, but no color is assigned to it.
>  
> !image-2018-08-23-16-16-59-375.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2953) Installing postgres and gcp_api doesn't install PostgresToGoogleCloudStorageOperator

2018-08-31 Thread Kaxil Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599498#comment-16599498
 ] 

Kaxil Naik commented on AIRFLOW-2953:
-

Try updating apache-airflow to 1.10.0. This operator was added in `1.10.0`.

> Installing postgres and gcp_api doesn't install 
> PostgresToGoogleCloudStorageOperator
> 
>
> Key: AIRFLOW-2953
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2953
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: robert kelly
>Priority: Minor
>
> I have gone through the installation process several times in Python 2 and 
> Python 3. I'm wondering how to handle this without manually copying the 
> class.





[jira] [Commented] (AIRFLOW-2972) Can't see the dag logs in the browser

2018-08-31 Thread Kaxil Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599478#comment-16599478
 ] 

Kaxil Naik commented on AIRFLOW-2972:
-

[~dbar] It is lazily loaded. Scroll to the bottom of the page, and double-check 
that JavaScript is enabled in your browser.

> Can't see the dag logs in the browser
> -
>
> Key: AIRFLOW-2972
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2972
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10
>Reporter: Dedi
>Priority: Major
> Attachments: empty log.jpg
>
>
>  
>  # I installed a clean Airflow 1.10
>  # Triggered the tutorial DAG
>  # Clicked on the DAG logs
> [http://192.168.56.13:8080/admin/airflow/log?task_id=print_date_id=tutorial_date=2018-08-28T14%3A54%3A34.326568%2B00%3A00]
> Expected:
> to see the print date
>  
> Actual:
> Empty; see attached image





[jira] [Work started] (AIRFLOW-2979) Deprecated Celery Option not in Options list

2018-08-31 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2979 started by Kaxil Naik.
---
> Deprecated Celery Option not in Options list 
> -
>
> Key: AIRFLOW-2979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2979
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: 1.10.0
>Reporter: Micheal Ascah
>Assignee: Kaxil Naik
>Priority: Critical
>
> References AIRFLOW-1840
> In airflow/configuration.py
> {code:python}
> # A two-level mapping of (section -> new_name -> old_name). When reading
> # new_name, the old_name will be checked to see if it exists. If it does, a
> # DeprecationWarning will be issued and the old name will be used instead
> deprecated_options = {
>     'celery': {
>         # Remove these keys in Airflow 1.11
>         'worker_concurrency': 'celeryd_concurrency',
>         'broker_url': 'celery_broker_url',
>         'ssl_active': 'celery_ssl_active',
>         'ssl_cert': 'celery_ssl_cert',
>         'ssl_key': 'celery_ssl_key',
>     }
> }
> {code}
> This block is missing the renaming of celery_result_backend to just 
> result_backend.
>  
> When setting this through an environment variable, the deprecated config name 
> is not used; instead the default value in the file is used. This can be 
> remedied by reading UPDATING.md and setting the new name, but the change has 
> broken backwards compatibility as far as I can tell.
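The fallback mechanism described above can be sketched in a self-contained way, 
with the missing `result_backend` -> `celery_result_backend` pair added. 
`get_option` and the config dicts below are hypothetical stand-ins, not 
Airflow's actual configuration code:

```python
import warnings

# The two-level (section -> new_name -> old_name) mapping described above,
# with the missing 'result_backend' -> 'celery_result_backend' pair added.
deprecated_options = {
    'celery': {
        'worker_concurrency': 'celeryd_concurrency',
        'broker_url': 'celery_broker_url',
        'result_backend': 'celery_result_backend',  # the missing entry
        'ssl_active': 'celery_ssl_active',
        'ssl_cert': 'celery_ssl_cert',
        'ssl_key': 'celery_ssl_key',
    }
}

def get_option(config, section, new_name):
    """Read new_name; fall back to the deprecated old name with a warning."""
    values = config.get(section, {})
    if new_name in values:
        return values[new_name]
    old_name = deprecated_options.get(section, {}).get(new_name)
    if old_name is not None and old_name in values:
        warnings.warn('%s is deprecated; use %s instead' % (old_name, new_name),
                      DeprecationWarning)
        return values[old_name]
    return None

# A config that still uses the pre-rename key keeps working:
cfg = {'celery': {'celery_result_backend': 'db+postgresql://airflow'}}
print(get_option(cfg, 'celery', 'result_backend'))
```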





[jira] [Commented] (AIRFLOW-2979) Deprecated Celery Option not in Options list

2018-08-31 Thread Kaxil Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599477#comment-16599477
 ] 

Kaxil Naik commented on AIRFLOW-2979:
-

Created a PR at https://github.com/apache/incubator-airflow/pull/3832

> Deprecated Celery Option not in Options list 
> -
>
> Key: AIRFLOW-2979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2979
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: 1.10.0
>Reporter: Micheal Ascah
>Assignee: Kaxil Naik
>Priority: Critical
>
> References AIRFLOW-1840
> In airflow/configuration.py
> {code:python}
> # A two-level mapping of (section -> new_name -> old_name). When reading
> # new_name, the old_name will be checked to see if it exists. If it does, a
> # DeprecationWarning will be issued and the old name will be used instead
> deprecated_options = {
>     'celery': {
>         # Remove these keys in Airflow 1.11
>         'worker_concurrency': 'celeryd_concurrency',
>         'broker_url': 'celery_broker_url',
>         'ssl_active': 'celery_ssl_active',
>         'ssl_cert': 'celery_ssl_cert',
>         'ssl_key': 'celery_ssl_key',
>     }
> }
> {code}
> This block is missing the renaming of celery_result_backend to just 
> result_backend.
>  
> When setting this through an environment variable, the deprecated config name 
> is not used; instead the default value in the file is used. This can be 
> remedied by reading UPDATING.md and setting the new name, but the change has 
> broken backwards compatibility as far as I can tell.





[jira] [Commented] (AIRFLOW-2979) Deprecated Celery Option not in Options list

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599475#comment-16599475
 ] 

ASF GitHub Bot commented on AIRFLOW-2979:
-

kaxil opened a new pull request #3832: [AIRFLOW-2979] Make 
celery_result_backend conf Backwards compatible
URL: https://github.com/apache/incubator-airflow/pull/3832
 
 
   
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2979
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   PR #2806 renamed `celery_result_backend` to `result_backend`, which broke 
backwards compatibility.

   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Deprecated Celery Option not in Options list 
> -
>
> Key: AIRFLOW-2979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2979
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: 1.10.0
>Reporter: Micheal Ascah
>Assignee: Kaxil Naik
>Priority: Critical
>
> References AIRFLOW-1840
> In airflow/configuration.py
> {code:python}
> # A two-level mapping of (section -> new_name -> old_name). When reading
> # new_name, the old_name will be checked to see if it exists. If it does, a
> # DeprecationWarning will be issued and the old name will be used instead
> deprecated_options = {
>     'celery': {
>         # Remove these keys in Airflow 1.11
>         'worker_concurrency': 'celeryd_concurrency',
>         'broker_url': 'celery_broker_url',
>         'ssl_active': 'celery_ssl_active',
>         'ssl_cert': 'celery_ssl_cert',
>         'ssl_key': 'celery_ssl_key',
>     }
> }
> {code}
> This block is missing the renaming of celery_result_backend to just 
> result_backend.
>  
> When setting this through an environment variable, the deprecated config name 
> is not used; instead the default value in the file is used. This can be 
> remedied by reading UPDATING.md and setting the new name, but the change has 
> broken backwards compatibility as far as I can tell.





[GitHub] kaxil opened a new pull request #3832: [AIRFLOW-2979] Make celery_result_backend conf Backwards compatible

2018-08-31 Thread GitBox
kaxil opened a new pull request #3832: [AIRFLOW-2979] Make 
celery_result_backend conf Backwards compatible
URL: https://github.com/apache/incubator-airflow/pull/3832
 
 
   
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2979
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   PR #2806 renamed `celery_result_backend` to `result_backend`, which broke 
backwards compatibility.

   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Assigned] (AIRFLOW-2979) Deprecated Celery Option not in Options list

2018-08-31 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik reassigned AIRFLOW-2979:
---

Assignee: Kaxil Naik

> Deprecated Celery Option not in Options list 
> -
>
> Key: AIRFLOW-2979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2979
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: 1.10.0
>Reporter: Micheal Ascah
>Assignee: Kaxil Naik
>Priority: Critical
>
> References AIRFLOW-1840
> In airflow/configuration.py
> {code:python}
> # A two-level mapping of (section -> new_name -> old_name). When reading
> # new_name, the old_name will be checked to see if it exists. If it does, a
> # DeprecationWarning will be issued and the old name will be used instead
> deprecated_options = {
>     'celery': {
>         # Remove these keys in Airflow 1.11
>         'worker_concurrency': 'celeryd_concurrency',
>         'broker_url': 'celery_broker_url',
>         'ssl_active': 'celery_ssl_active',
>         'ssl_cert': 'celery_ssl_cert',
>         'ssl_key': 'celery_ssl_key',
>     }
> }
> {code}
> This block is missing the renaming of celery_result_backend to just 
> result_backend.
>  
> When setting this through an environment variable, the deprecated config name 
> is not used; instead the default value in the file is used. This can be 
> remedied by reading UPDATING.md and setting the new name, but the change has 
> broken backwards compatibility as far as I can tell.





[GitHub] codecov-io edited a comment on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-08-31 Thread GitBox
codecov-io edited a comment on issue #3830: [AIRFLOW-2156] Parallelize Celery 
Executor
URL: 
https://github.com/apache/incubator-airflow/pull/3830#issuecomment-417825080
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3830?src=pr=h1)
 Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@f279151`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `69.49%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3830/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3830?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #3830   +/-   ##
   =========================================
     Coverage          ?   77.43%
   =========================================
     Files             ?      203
     Lines             ?    15880
     Branches          ?        0
   =========================================
     Hits              ?    12296
     Misses            ?     3584
     Partials          ?        0
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3830?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/executors/celery\_executor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3830/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvY2VsZXJ5X2V4ZWN1dG9yLnB5)
 | `80% <69.49%> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3830?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3830?src=pr=footer).
 Last update 
[f279151...23faf39](https://codecov.io/gh/apache/incubator-airflow/pull/3830?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   






[jira] [Commented] (AIRFLOW-2981) TypeError in dataflow operators when using GCS jar or py_file

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599472#comment-16599472
 ] 

ASF GitHub Bot commented on AIRFLOW-2981:
-

kaxil opened a new pull request #3831: [AIRFLOW-2981] Fix TypeError in dataflow 
operators
URL: https://github.com/apache/incubator-airflow/pull/3831
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2981
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   The `GoogleCloudBucketHelper.google_cloud_to_local` function attempts to 
compare a list to an int, resulting in the TypeError, with:
   
   ```
   ...
   path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/')
   if path_components < 2:
   ```
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   - `GoogleCloudBucketHelperTest.test_invalid_object_path`
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




>  TypeError in dataflow operators when using GCS jar or py_file
> --
>
> Key: AIRFLOW-2981
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2981
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, Dataflow
>Affects Versions: 1.9.0, 1.10
>Reporter: Jeffrey Payne
>Assignee: Kaxil Naik
>Priority: Major
>
> The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to 
> compare a list to an int, resulting in the TypeError, with:
> {noformat}
> ...
> path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/')
> if path_components < 2:
> ...
> {noformat}
> This should be {{if len(path_components) < 2:}}.
> Also, fix {{if file_size > 0:}} in the same function...
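A minimal, self-contained sketch of the comparison bug and its fix (the constant and function names below are illustrative stand-ins, not the actual `GoogleCloudBucketHelper` code):

```python
# Standalone sketch of the bug described above; names are illustrative.
GCS_PREFIX_LENGTH = len('gs://')

def split_object_path(file_name):
    # 'gs://bucket/dir/file.jar' -> ['bucket', 'dir', 'file.jar']
    path_components = file_name[GCS_PREFIX_LENGTH:].split('/')
    # The buggy version compared the list itself (`if path_components < 2:`),
    # which raises TypeError on Python 3. Compare the list's length instead.
    if len(path_components) < 2:
        raise ValueError('Invalid GCS object path: {}'.format(file_name))
    return path_components
```

Note that on Python 2 a list compares greater than any int, so the buggy check always evaluated to False and the invalid-path branch never fired; on Python 3 it raises the TypeError reported here.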



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] kaxil commented on issue #3831: [AIRFLOW-2981] Fix TypeError in dataflow operators

2018-08-31 Thread GitBox
kaxil commented on issue #3831: [AIRFLOW-2981] Fix TypeError in dataflow 
operators
URL: 
https://github.com/apache/incubator-airflow/pull/3831#issuecomment-417824933
 
 
   cc @fenglu-g @tswast 




With regards,
Apache Git Services


[GitHub] kaxil opened a new pull request #3831: [AIRFLOW-2981] Fix TypeError in dataflow operators

2018-08-31 Thread GitBox
kaxil opened a new pull request #3831: [AIRFLOW-2981] Fix TypeError in dataflow 
operators
URL: https://github.com/apache/incubator-airflow/pull/3831
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2981
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   The `GoogleCloudBucketHelper.google_cloud_to_local` function attempts to 
compare a list to an int, resulting in the TypeError, with:
   
   ```
   ...
   path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/')
   if path_components < 2:
   ```
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   - `GoogleCloudBucketHelperTest.test_invalid_object_path`
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Assigned] (AIRFLOW-2981) TypeError in dataflow operators when using GCS jar or py_file

2018-08-31 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik reassigned AIRFLOW-2981:
---

Assignee: Kaxil Naik  (was: Jeffrey Payne)

>  TypeError in dataflow operators when using GCS jar or py_file
> --
>
> Key: AIRFLOW-2981
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2981
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, Dataflow
>Affects Versions: 1.9.0, 1.10
>Reporter: Jeffrey Payne
>Assignee: Kaxil Naik
>Priority: Major
>
> The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to 
> compare a list to an int, resulting in the TypeError, with:
> {noformat}
> ...
> path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/')
> if path_components < 2:
> ...
> {noformat}
> This should be {{if len(path_components) < 2:}}.
> Also, fix {{if file_size > 0:}} in the same function...





[jira] [Commented] (AIRFLOW-2156) Parallelize Celery Executor

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599435#comment-16599435
 ] 

ASF GitHub Bot commented on AIRFLOW-2156:
-

yrqls21 opened a new pull request #3830: [AIRFLOW-2156] Parallelize Celery 
Executor
URL: https://github.com/apache/incubator-airflow/pull/3830
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-2156) issues and references 
them in the PR title.
 - https://issues.apache.org/jira/browse/AIRFLOW-2156
   
   ### Description
   This change is mostly authored by @aoen; I am merely doing the tests, a small 
fix, and the PR publishing due to his job change.
   
   The change enables Airflow to meet a 5-minute scheduling SLA with 30k running 
tasks, according to Airbnb production traffic and stress tests. The performance 
of the celery querying step is improved by more than 15x with 16 processors and 
can potentially be faster with more processors.
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   ![screen shot 2018-06-13 at 12 01 17 am 
copy](https://user-images.githubusercontent.com/7818710/44940493-2bdbb300-ad44-11e8-943c-c3d2d3c5907b.png)
   
   https://user-images.githubusercontent.com/7818710/44940495-36964800-ad44-11e8-9a4a-3ff7790d14a9.png
   
   Notes:
   
   Syncing no longer happens at the end of the celery executor's execution (e.g. 
if the scheduler shuts down). The sync does not actually guarantee that tasks 
finished anyway, and it prolongs the shutdown protocol.
   There is no timeout on the subprocesses, but that wasn't the case before this 
change either.
   What about logging for the multiprocessing tasks? It's OK to skip it; those 
tasks aren't currently logged either.
   
   ### Tests
   
   - [x] My PR adds the following unit tests:
   
tests/executors/test_celery_executor.py:CeleryExecutorTest.test_exception_propagation
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Parallelize Celery Executor
> ---
>
> Key: AIRFLOW-2156
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2156
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: celery
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>Priority: Major
>
> The CeleryExecutor doesn't currently support parallel execution to check task 
> states since Celery does not support this. This can greatly slow down the 
> Scheduler loops since each request to check a task's state is a network 
> request.
>  
> The Celery Executor should parallelize these requests.
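A hedged sketch of the parallelization idea (not the PR's actual implementation, which uses multiple processes; `fetch_state` here is a stand-in for the per-task network call to the celery result backend):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_state(task_id):
    # Stand-in for one network round trip, e.g. checking a celery
    # AsyncResult's state against the result backend.
    return (task_id, 'SUCCESS')

def fetch_states_in_parallel(task_ids, max_workers=16):
    # Issue all per-task state lookups concurrently instead of one by
    # one; for network-bound calls this shrinks the scheduler loop's
    # wait time roughly in proportion to the worker count.
    if not task_ids:
        return {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(task_ids))) as ex:
        return dict(ex.map(fetch_state, task_ids))
```

The point is only that the lookups are independent network requests, so they can be issued concurrently without changing the scheduler's logic.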





[GitHub] yrqls21 opened a new pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-08-31 Thread GitBox
yrqls21 opened a new pull request #3830: [AIRFLOW-2156] Parallelize Celery 
Executor
URL: https://github.com/apache/incubator-airflow/pull/3830
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-2156) issues and references 
them in the PR title.
 - https://issues.apache.org/jira/browse/AIRFLOW-2156
   
   ### Description
   This change is mostly authored by @aoen; I am merely doing the tests, a small 
fix, and the PR publishing due to his job change.
   
   The change enables Airflow to meet a 5-minute scheduling SLA with 30k running 
tasks, according to Airbnb production traffic and stress tests. The performance 
of the celery querying step is improved by more than 15x with 16 processors and 
can potentially be faster with more processors.
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   ![screen shot 2018-06-13 at 12 01 17 am 
copy](https://user-images.githubusercontent.com/7818710/44940493-2bdbb300-ad44-11e8-943c-c3d2d3c5907b.png)
   
   https://user-images.githubusercontent.com/7818710/44940495-36964800-ad44-11e8-9a4a-3ff7790d14a9.png
   
   Notes:
   
   Syncing no longer happens at the end of the celery executor's execution (e.g. 
if the scheduler shuts down). The sync does not actually guarantee that tasks 
finished anyway, and it prolongs the shutdown protocol.
   There is no timeout on the subprocesses, but that wasn't the case before this 
change either.
   What about logging for the multiprocessing tasks? It's OK to skip it; those 
tasks aren't currently logged either.
   
   ### Tests
   
   - [x] My PR adds the following unit tests:
   
tests/executors/test_celery_executor.py:CeleryExecutorTest.test_exception_propagation
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Resolved] (AIRFLOW-2984) Cannot convert naive_datetime when task has a naive start_date/end_date

2018-08-31 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-2984.
-
Resolution: Fixed
  Assignee: Bolke de Bruin

Resolved by https://github.com/apache/incubator-airflow/pull/3822

> Cannot convert naive_datetime when task has a naive start_date/end_date
> ---
>
> Key: AIRFLOW-2984
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2984
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
>Priority: Major
> Fix For: 1.10.1
>
>
> Task can have a start_date / end_date separately from the DAG.





[jira] [Updated] (AIRFLOW-2995) Controlling access via AD security group

2018-08-31 Thread Art Fotinich (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Art Fotinich updated AIRFLOW-2995:
--
Description: 
Hi Folks,

I had the following task, which seemed to be quite logical: restrict user 
access to Airflow UI by using an AD security group with a select list of a few 
users (for a company that has 100's of people in its AD). It looks to me like 
regular airflow ldap configuration parameters do not provide this option.

The few examples available online showed security groups being used for 
group_filter and superuser_filter, which do not fully restrict Web UI access 
but only limit some of its features. The most obvious parameter, user_filter, 
is expected to hold something as simple as *objectClass=user*; a properly 
constructed LDAP search query that includes the security group makes it error 
out...

After reviewing *ldap_auth.py*, it looks to me like user authentication is 
controlled by this string: "(&({0})({1}={2}))".format(user_filter, 
user_name_att, username). As a result, the security group may be included in 
the authentication process by formatting the user_filter parameter as follows: 

*objectClass=user)(memberOf=cn=airflow-users,ou=Security 
Groups,dc=company_division,dc=company_name,dc=domain*

_Note the missing parentheses around the string_. This string would allow 
restricting Airflow access to the members of the security group, which was the 
goal. This works for us but looks like a hack to me.
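The composition can be sketched as plain string formatting (the DN components below are illustrative, not a real directory):

```python
def build_search_filter(user_filter, user_name_attr, username):
    # Mirrors the "(&({0})({1}={2}))".format(...) pattern from
    # ldap_auth.py described above.
    return "(&({0})({1}={2}))".format(user_filter, user_name_attr, username)

# A plain user_filter yields a balanced filter:
plain = build_search_filter("objectClass=user", "sAMAccountName", "jdoe")
# -> (&(objectClass=user)(sAMAccountName=jdoe))

# The workaround deliberately leaves inner parentheses unbalanced,
# relying on the template's outer pair to close them:
hack = ("objectClass=user)(memberOf=cn=airflow-users,"
        "ou=Security Groups,dc=example,dc=com")
combined = build_search_filter(hack, "sAMAccountName", "jdoe")
# The result is still a well-balanced AND filter that also matches
# on group membership.
```

This makes it clear why the parentheses must be omitted from user_filter: the template supplies the surrounding pair.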

It looks like Airflow would benefit from restructuring the LDAP queries in 
ldap_auth.py to include a couple of new parameters (named something like 
*object_class_filter* and *security_group_filter*) instead of just one, 
user_filter. This would make the LDAP authentication process clearer and more 
self-explanatory. 

Please let me know if this makes sense. Our team would be willing to contribute 
the required change unless it is already in the works.

Thanks 

 

  was:
Hi Folks,

I had the following task, which seemed to be quite logical: restrict user 
access to Airflow UI by using an AD security group with a select list of a few 
users (for a company that has 100's of people in its AD). It looks to me like 
regular airflow ldap configuration parameters do not provide this option.

The few examples available online showed usage of security groups for 
group_filter and superuser_filter, which do not full restrict the Web UI access 
but only limit some features thereof. The most obvious parameter, user_filter, 
would have expected value as something simple like: *objectClass=user* and 
would error out on a properly configures ldap search query that included the 
security group.

After reviewing the *ldap_auth.py* it looks to me like the user authentication 
is controlled by these strings: "(&(\{0})(\{1}=\{2}))".format(user_filter, 
user_name_att, username). As a result, the security group may be included into 
the authentication process by formatting the user_filter parameter as follows: 

*objectClass=user)(memberOf=cn=airflow-users,ou=Security 
Groups,dc=company_division,dc=company_name,dc=domain*

_Note the missing parentheses around the string_. This string would allow 
restricting Airflow access to the members of the security group, which was the 
goal of this hack. This works for us but looks like a hack to me.

It looks like *ldap_auth.py* would ** benefit from restructuring the ldap 
queries to include a couple of parameters (named like object_class_filter and 
security_group_filter) instead of just one, user_filter. This would make ldap 
authentication process much more clear and self explanatory. 

Please let me know if it makes sense. Our team would be willing to contribute 
the required change unless it is already in works.

Thanks 

 


> Controlling access via AD security group
> 
>
> Key: AIRFLOW-2995
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2995
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: authentication, security
>Affects Versions: 1.9.0
>Reporter: Art Fotinich
>Priority: Minor
>
> Hi Folks,
> I had the following task, which seemed to be quite logical: restrict user 
> access to Airflow UI by using an AD security group with a select list of a 
> few users (for a company that has 100's of people in its AD). It looks to me 
> like regular airflow ldap configuration parameters do not provide this option.
> The few examples available online showed usage of security groups for 
> group_filter and superuser_filter, which do not fully restrict the Web UI 
> access but only limit some features thereof. The most obvious parameter, 
> user_filter, is expected to hold something as simple as 
> *objectClass=user* and would error out on a properly configured LDAP search 
> query that included the 

[jira] [Created] (AIRFLOW-2995) Controlling access via AD security group

2018-08-31 Thread Art Fotinich (JIRA)
Art Fotinich created AIRFLOW-2995:
-

 Summary: Controlling access via AD security group
 Key: AIRFLOW-2995
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2995
 Project: Apache Airflow
  Issue Type: Improvement
  Components: authentication, security
Affects Versions: 1.9.0
Reporter: Art Fotinich


Hi Folks,

I had the following task, which seemed to be quite logical: restrict user 
access to Airflow UI by using an AD security group with a select list of a few 
users (for a company that has 100's of people in its AD). It looks to me like 
regular airflow ldap configuration parameters do not provide this option.

The few examples available online showed security groups being used for 
group_filter and superuser_filter, which do not fully restrict Web UI access 
but only limit some of its features. The most obvious parameter, user_filter, 
is expected to hold something as simple as *objectClass=user*; a properly 
constructed LDAP search query that includes the security group makes it error 
out.

After reviewing *ldap_auth.py*, it looks to me like user authentication is 
controlled by this string: "(&({0})({1}={2}))".format(user_filter, 
user_name_att, username). As a result, the security group may be included in 
the authentication process by formatting the user_filter parameter as follows: 

*objectClass=user)(memberOf=cn=airflow-users,ou=Security 
Groups,dc=company_division,dc=company_name,dc=domain*

_Note the missing parentheses around the string_. This string allows 
restricting Airflow access to the members of the security group, which was the 
goal. It works for us but looks like a hack to me.

It looks like *ldap_auth.py* would benefit from restructuring the LDAP 
queries to include a couple of parameters (named something like 
object_class_filter and security_group_filter) instead of just one, 
user_filter. This would make the LDAP authentication process much clearer and 
more self-explanatory. 

Please let me know if this makes sense. Our team would be willing to contribute 
the required change unless it is already in the works.

Thanks 

 





[GitHub] kaxil commented on issue #2338: [AIRFLOW-1262] Allow configuration of email alert subject and body

2018-08-31 Thread GitBox
kaxil commented on issue #2338: [AIRFLOW-1262] Allow configuration of email 
alert subject and body
URL: 
https://github.com/apache/incubator-airflow/pull/2338#issuecomment-417816505
 
 
   @alekstorm Can you rebase with master and resolve conflicts?




[jira] [Commented] (AIRFLOW-2145) Deadlock after clearing a running task

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599411#comment-16599411
 ] 

ASF GitHub Bot commented on AIRFLOW-2145:
-

kaxil closed pull request #3657: [AIRFLOW-2145] fix deadlock on clearing 
running task instance
URL: https://github.com/apache/incubator-airflow/pull/3657
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/utils/state.py b/airflow/utils/state.py
index 9da98510eb..a351df07b9 100644
--- a/airflow/utils/state.py
+++ b/airflow/utils/state.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -101,7 +101,6 @@ def finished(cls):
 """
 return [
 cls.SUCCESS,
-cls.SHUTDOWN,
 cls.FAILED,
 cls.SKIPPED,
 ]
@@ -117,5 +116,6 @@ def unfinished(cls):
 cls.SCHEDULED,
 cls.QUEUED,
 cls.RUNNING,
+cls.SHUTDOWN,
 cls.UP_FOR_RETRY
 ]
diff --git a/tests/models.py b/tests/models.py
index 1c88ea47f7..529ae56454 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -801,7 +801,26 @@ def test_dagrun_deadlock(self):
 dr.update_state()
 self.assertEqual(dr.state, State.FAILED)
 
-def test_dagrun_no_deadlock(self):
+def test_dagrun_no_deadlock_with_shutdown(self):
+session = settings.Session()
+dag = DAG('test_dagrun_no_deadlock_with_shutdown',
+  start_date=DEFAULT_DATE)
+with dag:
+op1 = DummyOperator(task_id='upstream_task')
+op2 = DummyOperator(task_id='downstream_task')
+op2.set_upstream(op1)
+
+dr = dag.create_dagrun(run_id='test_dagrun_no_deadlock_with_shutdown',
+   state=State.RUNNING,
+   execution_date=DEFAULT_DATE,
+   start_date=DEFAULT_DATE)
+upstream_ti = dr.get_task_instance(task_id='upstream_task')
+upstream_ti.set_state(State.SHUTDOWN, session=session)
+
+dr.update_state()
+self.assertEqual(dr.state, State.RUNNING)
+
+def test_dagrun_no_deadlock_with_depends_on_past(self):
 session = settings.Session()
 dag = DAG('test_dagrun_no_deadlock',
   start_date=DEFAULT_DATE)
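
In effect, the diff above reclassifies SHUTDOWN from a finished state to an unfinished one. A hedged toy model of why that resolves the deadlock (the constants and function are illustrative stand-ins for `airflow.utils.state.State` and the dagrun update logic, not the actual code):

```python
# After the fix: a shut-down (cleared) task still counts as
# "unfinished", so the dagrun keeps waiting for it to be rescheduled
# instead of concluding that no task can make progress and marking
# the run as deadlocked/failed.
FINISHED = {'success', 'failed', 'skipped'}
UNFINISHED = {'scheduled', 'queued', 'running', 'shutdown', 'up_for_retry'}

def dagrun_state(task_states):
    if any(s in UNFINISHED for s in task_states):
        return 'running'   # something may still make progress
    if all(s == 'success' for s in task_states):
        return 'success'
    return 'failed'
```

With the old classification, 'shutdown' fell into FINISHED, the downstream task's trigger rule saw a permanently non-successful upstream, and the run was marked failed as deadlocked.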


 




> Deadlock after clearing a running task
> --
>
> Key: AIRFLOW-2145
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2145
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: George Roldugin
>Priority: Minor
> Fix For: 1.10.1
>
> Attachments: image-2018-02-23-18-59-11-828.png, 
> image-2018-02-23-19-00-37-741.png, image-2018-02-23-19-00-55-630.png, 
> image-2018-02-23-19-01-45-012.png, image-2018-02-23-19-01-57-498.png, 
> image-2018-02-23-19-02-18-837.png
>
>
> TL;DR The essence of the issue is that whenever a currently running task is 
> cleared, the dagrun enters a deadlocked state and fails.
>  
> We see this in production with Celery executors and {{TimeDeltaSensor}}, and 
> I've been able to reproduce it locally with both {{TimeDeltaSensor}} and 
> {{WebHDFSSensor}}.
> Here's the minimal example:
> {code:java}
> from datetime import datetime, timedelta
> import airflow
> from airflow.operators.sensors import TimeDeltaSensor
> from airflow.operators.dummy_operator import DummyOperator
> with airflow.DAG(
> 'foo',
> schedule_interval='@daily',
> start_date=datetime(2018, 1, 1)) as dag:
> wait_for_upstream_sla = TimeDeltaSensor(
> task_id="wait_for_upstream_sla",
> delta=timedelta(days=365*10)
> )
> do_work = DummyOperator(task_id='do_work')
> dag >> wait_for_upstream_sla >> do_work
> {code}
>  
> Sequence of actions, relevant DEBUG level logs, and some UI screenshots
> {code:java}
> airflow clear foo -e 2018-02-22 --no_confirm && airflow backfill foo -s 
> 2018-02-22 

[jira] [Resolved] (AIRFLOW-2145) Deadlock after clearing a running task

2018-08-31 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-2145.
-
   Resolution: Fixed
Fix Version/s: 1.10.1

Resolved by https://github.com/apache/incubator-airflow/pull/3657

> Deadlock after clearing a running task
> --
>
> Key: AIRFLOW-2145
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2145
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: George Roldugin
>Priority: Minor
> Fix For: 1.10.1
>
> Attachments: image-2018-02-23-18-59-11-828.png, 
> image-2018-02-23-19-00-37-741.png, image-2018-02-23-19-00-55-630.png, 
> image-2018-02-23-19-01-45-012.png, image-2018-02-23-19-01-57-498.png, 
> image-2018-02-23-19-02-18-837.png
>
>
> TL;DR The essence of the issue is that whenever a currently running task is 
> cleared, the dagrun enters a deadlocked state and fails.
>  
> We see this in production with Celery executors and {{TimeDeltaSensor}}, and 
> I've been able to reproduce it locally with both {{TimeDeltaSensor}} and 
> {{WebHDFSSensor}}.
> Here's the minimal example:
> {code:java}
> from datetime import datetime, timedelta
> import airflow
> from airflow.operators.sensors import TimeDeltaSensor
> from airflow.operators.dummy_operator import DummyOperator
> with airflow.DAG(
> 'foo',
> schedule_interval='@daily',
> start_date=datetime(2018, 1, 1)) as dag:
> wait_for_upstream_sla = TimeDeltaSensor(
> task_id="wait_for_upstream_sla",
> delta=timedelta(days=365*10)
> )
> do_work = DummyOperator(task_id='do_work')
> dag >> wait_for_upstream_sla >> do_work
> {code}
>  
> Sequence of actions, relevant DEBUG level logs, and some UI screenshots
> {code:java}
> airflow clear foo -e 2018-02-22 --no_confirm && airflow backfill foo -s 
> 2018-02-22 -e 2018-02-22{code}
> {code:java}
> [2018-02-23 17:17:45,983] {__init__.py:45} INFO - Using executor 
> SequentialExecutor
> [2018-02-23 17:17:46,069] {models.py:189} INFO - Filling up the DagBag from 
> /Users/grol/Drive/dev/reporting/dags
> ...
> [2018-02-23 17:17:47,563] {jobs.py:2180} DEBUG - Task instance to run 
>  
> state scheduled
> ...
> {code}
> !image-2018-02-23-18-59-11-828.png|width=418,height=87!
> Now we clear all DAG's tasks externally:
> {code:java}
> airflow clear foo -e 2018-02-22 --no_confirm
> {code}
> This causes the following:
> {code:java}
> [2018-02-23 17:17:55,258] {base_task_runner.py:98} INFO - Subtask: 
> [2018-02-23 17:17:55,258] {sensors.py:629} INFO - Checking if the time 
> (2018-02-23 16:19:00) has come
> [2018-02-23 17:17:58,844] {jobs.py:184} DEBUG - [heart] Boom.
> [2018-02-23 17:18:03,848] {jobs.py:184} DEBUG - [heart] Boom.
> [2018-02-23 17:18:08,856] {jobs.py:2585} WARNING - State of this instance has 
> been externally set to shutdown. Taking the poison pill.
> [2018-02-23 17:18:08,874] {helpers.py:266} DEBUG - There are no descendant 
> processes to kill
> [2018-02-23 17:18:08,875] {jobs.py:184} DEBUG - [heart] Boom.
> [2018-02-23 17:18:08,900] {helpers.py:266} DEBUG - There are no descendant 
> processes to kill
> [2018-02-23 17:18:08,922] {helpers.py:266} DEBUG - There are no descendant 
> processes to kill
> [2018-02-23 17:18:09,005] {sequential_executor.py:47} ERROR - Failed to 
> execute task Command 'airflow run foo wait_for_upstream_sla 
> 2018-02-22T00:00:00 --local -sd DAGS_FOLDER/foo.py' returned non-zero exit 
> status 1.
> [2018-02-23 17:18:09,012] {jobs.py:2004} DEBUG - Executor state: failed task 
> 
> [2018-02-23 17:18:09,018] {models.py:4584} INFO - Updating state for  foo @ 2018-02-22 00:00:00: backfill_2018-02-22T00:00:00, externally 
> triggered: False> considering 2 task(s)
> [2018-02-23 17:18:09,021] {models.py:1215} DEBUG -  2018-02-22 00:00:00 [None]> dependency 'Previous Dagrun State' PASSED: True, 
> The task did not have depends_on_past set.
> [2018-02-23 17:18:09,021] {models.py:1215} DEBUG -  2018-02-22 00:00:00 [None]> dependency 'Not In Retry Period' PASSED: True, 
> The context specified that being in a retry period was permitted.
> [2018-02-23 17:18:09,027] {models.py:1215} DEBUG -  2018-02-22 00:00:00 [None]> dependency 'Trigger Rule' PASSED: False, Task's 
> trigger rule 'all_success' requires all upstream tasks to have succeeded, but 
> found 1 non-success(es). upstream_tasks_state={'skipped': 0, 'successes': 0, 
> 'failed': 0, 'upstream_failed': 0, 'done': 0, 'total': 1}, 
> upstream_task_ids=['wait_for_upstream_sla']
> [2018-02-23 17:18:09,029] {models.py:4643} INFO - Deadlock; marking run 
>  triggered: False> failed
> [2018-02-23 17:18:09,045] {jobs.py:2125} INFO - [backfill progress] | 
> finished run 1 of 1 | tasks waiting: 1 | succeeded: 0 | kicked_off: 1 | 
> failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 1
> [2018-02-23 

[GitHub] kaxil closed pull request #3657: [AIRFLOW-2145] fix deadlock on clearing running task instance

2018-08-31 Thread GitBox
kaxil closed pull request #3657: [AIRFLOW-2145] fix deadlock on clearing 
running task instance
URL: https://github.com/apache/incubator-airflow/pull/3657
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/utils/state.py b/airflow/utils/state.py
index 9da98510eb..a351df07b9 100644
--- a/airflow/utils/state.py
+++ b/airflow/utils/state.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -101,7 +101,6 @@ def finished(cls):
 """
 return [
 cls.SUCCESS,
-cls.SHUTDOWN,
 cls.FAILED,
 cls.SKIPPED,
 ]
@@ -117,5 +116,6 @@ def unfinished(cls):
 cls.SCHEDULED,
 cls.QUEUED,
 cls.RUNNING,
+cls.SHUTDOWN,
 cls.UP_FOR_RETRY
 ]
diff --git a/tests/models.py b/tests/models.py
index 1c88ea47f7..529ae56454 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -801,7 +801,26 @@ def test_dagrun_deadlock(self):
 dr.update_state()
 self.assertEqual(dr.state, State.FAILED)
 
-def test_dagrun_no_deadlock(self):
+def test_dagrun_no_deadlock_with_shutdown(self):
+session = settings.Session()
+dag = DAG('test_dagrun_no_deadlock_with_shutdown',
+  start_date=DEFAULT_DATE)
+with dag:
+op1 = DummyOperator(task_id='upstream_task')
+op2 = DummyOperator(task_id='downstream_task')
+op2.set_upstream(op1)
+
+dr = dag.create_dagrun(run_id='test_dagrun_no_deadlock_with_shutdown',
+   state=State.RUNNING,
+   execution_date=DEFAULT_DATE,
+   start_date=DEFAULT_DATE)
+upstream_ti = dr.get_task_instance(task_id='upstream_task')
+upstream_ti.set_state(State.SHUTDOWN, session=session)
+
+dr.update_state()
+self.assertEqual(dr.state, State.RUNNING)
+
+def test_dagrun_no_deadlock_with_depends_on_past(self):
 session = settings.Session()
 dag = DAG('test_dagrun_no_deadlock',
   start_date=DEFAULT_DATE)
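
For readers skimming the archive, the effect of the state.py hunk above — moving `SHUTDOWN` out of `State.finished()` and into `State.unfinished()` — can be sketched with a toy stand-in. Names mirror `airflow/utils/state.py`, but this simplified class is illustrative, not the real implementation:

```python
class State:
    """Simplified stand-in for airflow.utils.state.State (illustration only)."""
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    SHUTDOWN = "shutdown"
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    UP_FOR_RETRY = "up_for_retry"

    @classmethod
    def finished(cls):
        # After the patch, SHUTDOWN is no longer treated as a terminal state.
        return [cls.SUCCESS, cls.FAILED, cls.SKIPPED]

    @classmethod
    def unfinished(cls):
        return [cls.SCHEDULED, cls.QUEUED, cls.RUNNING,
                cls.SHUTDOWN, cls.UP_FOR_RETRY]


# A run that still has a SHUTDOWN task counts as unfinished, so the
# deadlock check in the new test keeps the dag run in RUNNING:
task_states = [State.SUCCESS, State.SHUTDOWN]
still_running = any(s in State.unfinished() for s in task_states)
print(still_running)  # True
```

This is exactly what `test_dagrun_no_deadlock_with_shutdown` asserts: an upstream task in `SHUTDOWN` leaves the dag run in `State.RUNNING` instead of marking it failed.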


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-2951) dag_run end_date Null after a dag is finished

2018-08-31 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-2951.
-
   Resolution: Fixed
Fix Version/s: 1.10.1

Resolved by https://github.com/apache/incubator-airflow/pull/3798

> dag_run end_date Null after a dag is finished
> -
>
> Key: AIRFLOW-2951
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2951
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DagRun
>Reporter: Yingbo Wang
>Assignee: Yingbo Wang
>Priority: Major
> Fix For: 1.10.1
>
>
> dag_run table should have an end_date updated when a dag is finished. 
> Currently only a user-activated dag termination request from the UI changes 
> the "end_date" in the dag_run table. All scheduled dags automatically run by 
> airflow leave a NULL value after they reach a "success" or "failed" state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2951) dag_run end_date Null after a dag is finished

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599403#comment-16599403
 ] 

ASF GitHub Bot commented on AIRFLOW-2951:
-

kaxil closed pull request #3798: [AIRFLOW-2951] Update dag_run table end_date 
when state change
URL: https://github.com/apache/incubator-airflow/pull/3798
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/api/common/experimental/mark_tasks.py b/airflow/api/common/experimental/mark_tasks.py
index 88c5275f5a..2fac1254cd 100644
--- a/airflow/api/common/experimental/mark_tasks.py
+++ b/airflow/api/common/experimental/mark_tasks.py
@@ -208,6 +208,7 @@ def _set_dag_run_state(dag_id, execution_date, state, session=None):
 dr.state = state
 if state == State.RUNNING:
 dr.start_date = timezone.utcnow()
+dr.end_date = None
 else:
 dr.end_date = timezone.utcnow()
 session.commit()
diff --git a/airflow/models.py b/airflow/models.py
index 55badf4828..6c8031c18c 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4840,6 +4840,8 @@ def get_state(self):
 def set_state(self, state):
 if self._state != state:
 self._state = state
+self.end_date = timezone.utcnow() if self._state in State.finished() else None
+
 if self.dag_id is not None:
 # FIXME: Due to the scoped_session factor we we don't get a clean
 # session here, so something really weird goes on:
@@ -5063,7 +5065,7 @@ def update_state(self, session=None):
 if (not unfinished_tasks and
 any(r.state in (State.FAILED, State.UPSTREAM_FAILED) for r in roots)):
 self.log.info('Marking run %s failed', self)
-self.state = State.FAILED
+self.set_state(State.FAILED)
 dag.handle_callback(self, success=False, reason='task_failure',
 session=session)
 
@@ -5071,20 +5073,20 @@ def update_state(self, session=None):
 elif not unfinished_tasks and all(r.state in (State.SUCCESS, State.SKIPPED)
   for r in roots):
 self.log.info('Marking run %s successful', self)
-self.state = State.SUCCESS
+self.set_state(State.SUCCESS)
 dag.handle_callback(self, success=True, reason='success', 
session=session)
 
 # if *all tasks* are deadlocked, the run failed
 elif (unfinished_tasks and none_depends_on_past and
   none_task_concurrency and no_dependencies_met):
 self.log.info('Deadlock; marking run %s failed', self)
-self.state = State.FAILED
+self.set_state(State.FAILED)
 dag.handle_callback(self, success=False, 
reason='all_tasks_deadlocked',
 session=session)
 
 # finally, if the roots aren't done, the dag is still running
 else:
-self.state = State.RUNNING
+self.set_state(State.RUNNING)
 
 # todo: determine we want to use with_for_update to make sure to lock the run
 session.merge(self)
diff --git a/tests/models.py b/tests/models.py
index a1fd1e9912..7adeb3acdd 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -896,6 +896,124 @@ def on_failure_callable(context):
 updated_dag_state = dag_run.update_state()
 self.assertEqual(State.FAILED, updated_dag_state)
 
+def test_dagrun_set_state_end_date(self):
+session = settings.Session()
+
+dag = DAG(
+'test_dagrun_set_state_end_date',
+start_date=DEFAULT_DATE,
+default_args={'owner': 'owner1'})
+
+dag.clear()
+
+now = timezone.utcnow()
+dr = dag.create_dagrun(run_id='test_dagrun_set_state_end_date',
+   state=State.RUNNING,
+   execution_date=now,
+   start_date=now)
+
+# Initial end_date should be NULL
+# State.SUCCESS and State.FAILED are all ending state and should set end_date
+# State.RUNNING set end_date back to NULL
+session.add(dr)
+session.commit()
+self.assertIsNone(dr.end_date)
+
+dr.set_state(State.SUCCESS)
+session.merge(dr)
+session.commit()
+
+dr_database = session.query(DagRun).filter(
+DagRun.run_id == 'test_dagrun_set_state_end_date'
+).one()
+self.assertIsNotNone(dr_database.end_date)
+self.assertEqual(dr.end_date, dr_database.end_date)
+
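
The `set_state` hunk above boils down to a tri-state rule: entering a finished state stamps `end_date`, and returning to `RUNNING` clears it back to NULL. A hedged, self-contained sketch of that rule (the `DagRunLike` class is a hypothetical toy, not Airflow's `DagRun`):

```python
from datetime import datetime, timezone

FINISHED = {"success", "failed", "skipped"}  # stand-in for State.finished()


class DagRunLike:
    """Toy model of the patched DagRun.set_state (names are illustrative)."""

    def __init__(self):
        self._state = "running"
        self.end_date = None

    def set_state(self, state):
        if self._state != state:
            self._state = state
            # Finished states stamp end_date; anything else resets it to NULL,
            # which is what lets a re-run clear a stale end_date.
            self.end_date = (datetime.now(timezone.utc)
                             if state in FINISHED else None)


dr = DagRunLike()
dr.set_state("success")
print(dr.end_date is not None)  # True: terminal state stamps the end
dr.set_state("running")
print(dr.end_date is None)      # True: back to RUNNING resets end_date
```

This mirrors what `test_dagrun_set_state_end_date` checks against the database: `end_date` is non-NULL after `SUCCESS`/`FAILED` and NULL again after `RUNNING`.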
+

[GitHub] kaxil closed pull request #3798: [AIRFLOW-2951] Update dag_run table end_date when state change

2018-08-31 Thread GitBox
kaxil closed pull request #3798: [AIRFLOW-2951] Update dag_run table end_date 
when state change
URL: https://github.com/apache/incubator-airflow/pull/3798
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/api/common/experimental/mark_tasks.py b/airflow/api/common/experimental/mark_tasks.py
index 88c5275f5a..2fac1254cd 100644
--- a/airflow/api/common/experimental/mark_tasks.py
+++ b/airflow/api/common/experimental/mark_tasks.py
@@ -208,6 +208,7 @@ def _set_dag_run_state(dag_id, execution_date, state, session=None):
 dr.state = state
 if state == State.RUNNING:
 dr.start_date = timezone.utcnow()
+dr.end_date = None
 else:
 dr.end_date = timezone.utcnow()
 session.commit()
diff --git a/airflow/models.py b/airflow/models.py
index 55badf4828..6c8031c18c 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4840,6 +4840,8 @@ def get_state(self):
 def set_state(self, state):
 if self._state != state:
 self._state = state
+self.end_date = timezone.utcnow() if self._state in State.finished() else None
+
 if self.dag_id is not None:
 # FIXME: Due to the scoped_session factor we we don't get a clean
 # session here, so something really weird goes on:
@@ -5063,7 +5065,7 @@ def update_state(self, session=None):
 if (not unfinished_tasks and
 any(r.state in (State.FAILED, State.UPSTREAM_FAILED) for r in roots)):
 self.log.info('Marking run %s failed', self)
-self.state = State.FAILED
+self.set_state(State.FAILED)
 dag.handle_callback(self, success=False, reason='task_failure',
 session=session)
 
@@ -5071,20 +5073,20 @@ def update_state(self, session=None):
 elif not unfinished_tasks and all(r.state in (State.SUCCESS, State.SKIPPED)
   for r in roots):
 self.log.info('Marking run %s successful', self)
-self.state = State.SUCCESS
+self.set_state(State.SUCCESS)
 dag.handle_callback(self, success=True, reason='success', 
session=session)
 
 # if *all tasks* are deadlocked, the run failed
 elif (unfinished_tasks and none_depends_on_past and
   none_task_concurrency and no_dependencies_met):
 self.log.info('Deadlock; marking run %s failed', self)
-self.state = State.FAILED
+self.set_state(State.FAILED)
 dag.handle_callback(self, success=False, 
reason='all_tasks_deadlocked',
 session=session)
 
 # finally, if the roots aren't done, the dag is still running
 else:
-self.state = State.RUNNING
+self.set_state(State.RUNNING)
 
 # todo: determine we want to use with_for_update to make sure to lock the run
 session.merge(self)
diff --git a/tests/models.py b/tests/models.py
index a1fd1e9912..7adeb3acdd 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -896,6 +896,124 @@ def on_failure_callable(context):
 updated_dag_state = dag_run.update_state()
 self.assertEqual(State.FAILED, updated_dag_state)
 
+def test_dagrun_set_state_end_date(self):
+session = settings.Session()
+
+dag = DAG(
+'test_dagrun_set_state_end_date',
+start_date=DEFAULT_DATE,
+default_args={'owner': 'owner1'})
+
+dag.clear()
+
+now = timezone.utcnow()
+dr = dag.create_dagrun(run_id='test_dagrun_set_state_end_date',
+   state=State.RUNNING,
+   execution_date=now,
+   start_date=now)
+
+# Initial end_date should be NULL
+# State.SUCCESS and State.FAILED are all ending state and should set end_date
+# State.RUNNING set end_date back to NULL
+session.add(dr)
+session.commit()
+self.assertIsNone(dr.end_date)
+
+dr.set_state(State.SUCCESS)
+session.merge(dr)
+session.commit()
+
+dr_database = session.query(DagRun).filter(
+DagRun.run_id == 'test_dagrun_set_state_end_date'
+).one()
+self.assertIsNotNone(dr_database.end_date)
+self.assertEqual(dr.end_date, dr_database.end_date)
+
+dr.set_state(State.RUNNING)
+session.merge(dr)
+session.commit()
+
+dr_database = session.query(DagRun).filter(
+DagRun.run_id == 'test_dagrun_set_state_end_date'
+).one()
+
+

[GitHub] kaxil commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-31 Thread GitBox
kaxil commented on a change in pull request #3733: [AIRFLOW-491] Add cache 
parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r214496087
 
 

 ##
 File path: airflow/contrib/operators/bigquery_operator.py
 ##
 @@ -118,7 +118,8 @@ def __init__(self,
  query_params=None,
  labels=None,
  priority='INTERACTIVE',
- time_partitioning={},
+ time_partitioning=None,
+ api_resource_configs=None,
 
 Review comment:
   Add this parameter in docstring with example.
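
Beyond the docstring request, the companion change in this hunk (`time_partitioning={}` → `time_partitioning=None`) also sidesteps Python's shared-mutable-default pitfall: a `{}` default is created once and reused across every call. A small illustration (the function names here are made up for the example, not Airflow code):

```python
def bad(time_partitioning={}):
    # The same dict object is reused across calls, so mutations leak.
    time_partitioning.setdefault("type", "DAY")
    return time_partitioning


def good(time_partitioning=None):
    # A fresh dict per call; None also doubles as "not configured".
    time_partitioning = dict(time_partitioning or {})
    time_partitioning.setdefault("type", "DAY")
    return time_partitioning


a = bad()
b = bad()
print(a is b)  # True: both calls share one dict

c = good()
d = good()
print(c is d)  # False: each call gets its own dict
```

The `None` sentinel is the idiomatic fix and is what the PR adopts for both `time_partitioning` and the new `api_resource_configs` parameter.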




[GitHub] codecov-io edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'

2018-08-31 Thread GitBox
codecov-io edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter 
in BigQuery query method - with 'api_resource_configs'
URL: 
https://github.com/apache/incubator-airflow/pull/3733#issuecomment-413105867
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=h1)
 Report
   > Merging 
[#3733](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/98a787485ba311dafad13aab74f0d5f11d6c7856?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3733/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3733   +/-   ##
   ===
 Coverage   77.44%   77.44%   
   ===
 Files 203  203   
 Lines   1584415844   
   ===
 Hits1227012270   
 Misses   3574 3574
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=footer).
 Last update 
[98a7874...bc9f7d6](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] yrqls21 commented on issue #3798: [AIRFLOW-2951] Update dag_run table end_date when state change

2018-08-31 Thread GitBox
yrqls21 commented on issue #3798: [AIRFLOW-2951] Update dag_run table end_date 
when state change
URL: 
https://github.com/apache/incubator-airflow/pull/3798#issuecomment-417808101
 
 
   @Fokko Nope, it looks good to me, tyvm.
   @YingboWang Good job.




[GitHub] codecov-io edited a comment on issue #3661: [AIRFLOW-2780] Adds IMAP Hook to interact with a mail server

2018-08-31 Thread GitBox
codecov-io edited a comment on issue #3661: [AIRFLOW-2780] Adds IMAP Hook to 
interact with a mail server
URL: 
https://github.com/apache/incubator-airflow/pull/3661#issuecomment-408640089
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=h1)
 Report
   > Merging 
[#3661](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/98a787485ba311dafad13aab74f0d5f11d6c7856?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3661/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#3661  +/-   ##
   ==
   - Coverage   77.44%   77.43%   -0.01% 
   ==
 Files 203  203  
 Lines   1584415844  
   ==
   - Hits1227012269   -1 
   - Misses   3574 3575   +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3661/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.74% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=footer).
 Last update 
[98a7874...f4a0bf8](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] codecov-io edited a comment on issue #3661: [AIRFLOW-2780] Adds IMAP Hook to interact with a mail server

2018-08-31 Thread GitBox
codecov-io edited a comment on issue #3661: [AIRFLOW-2780] Adds IMAP Hook to 
interact with a mail server
URL: 
https://github.com/apache/incubator-airflow/pull/3661#issuecomment-408640089
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=h1)
 Report
   > Merging 
[#3661](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/98a787485ba311dafad13aab74f0d5f11d6c7856?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3661/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3661   +/-   ##
   ===
 Coverage   77.44%   77.44%   
   ===
 Files 203  203   
 Lines   1584415844   
   ===
 Hits1227012270   
 Misses   3574 3574
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=footer).
 Last update 
[98a7874...f4a0bf8](https://codecov.io/gh/apache/incubator-airflow/pull/3661?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose

2018-08-31 Thread GitBox
dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + 
docker-compose
URL: 
https://github.com/apache/incubator-airflow/pull/3797#issuecomment-417791464
 
 
   @gerardo ok, further progress. The main issue left is that it keeps attempting 
to compile the s3 tests even though it claims there is no moto (this is after I 
attempted to install moto both in tox and in travis).




[GitHub] danmactough commented on a change in pull request #3435: [AIRFLOW-2539] OS Environment variable for logging FILENAME_TEMPLATE

2018-08-31 Thread GitBox
danmactough commented on a change in pull request #3435: [AIRFLOW-2539] OS 
Environment variable for logging FILENAME_TEMPLATE
URL: https://github.com/apache/incubator-airflow/pull/3435#discussion_r214463797
 
 

 ##
 File path: airflow/config_templates/default_airflow.cfg
 ##
 @@ -61,9 +62,15 @@ logging_level = INFO
 logging_config_class =
 
 # Log format
+# we need to escape the curly braces by adding an additional curly brace
 log_format = [%%(asctime)s] {{%%(filename)s:%%(lineno)d}} %%(levelname)s - 
%%(message)s
 simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s
 
+# Log filename format
+# we need to escape the curly braces by adding an additional curly brace
+log_filename_template = {{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log
+log_processor_filename_template = {{{{ filename }}}}.log
 
 Review comment:
   @NielsZeilemaker I think these additional curly braces on LL71-72 (changing 
from double curly braces to 4 curly braces) are incorrect. In my environment, 
they break Jinja2 parsing. The additional curly braces on L540 (changing from 
single curly braces to double curly braces) are correct. I suspect this was 
just a mistake when making the changes, like maybe a wayward sed command.
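
The disagreement comes down to how Python's `str.format` collapses braces: every doubled pair becomes one literal brace, so the correct brace count in a template file depends on how many formatting passes it goes through before Jinja2 sees it. A neutral demonstration of just that mechanic (plain Python, not Airflow's actual config loader):

```python
# str.format halves doubled braces: '{{' -> '{' and '}}' -> '}'.
four = "{{{{ ti.dag_id }}}}/{{{{ try_number }}}}.log"   # 4 braces in the template
two = "{{ ti.dag_id }}/{{ try_number }}.log"            # 2 braces in the template

print(four.format())  # {{ ti.dag_id }}/{{ try_number }}.log  -> still valid Jinja
print(two.format())   # { ti.dag_id }/{ try_number }.log      -> no longer valid Jinja
```

So a 4-brace value survives one `.format()` pass as valid Jinja, while a 2-brace value does not; whether a given line needs 2 or 4 braces hinges on whether that line is run through such a pass, which is presumably the source of the confusion above.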




[jira] [Resolved] (AIRFLOW-2994) flatten_results in BigQueryOperator/BigQueryHook should default to None

2018-08-31 Thread Chris Riccomini (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-2994.
--
Resolution: Fixed

> flatten_results in BigQueryOperator/BigQueryHook should default to None
> ---
>
> Key: AIRFLOW-2994
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2994
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.0
>Reporter: Chris Riccomini
>Priority: Major
> Fix For: 1.10.1
>
>
> Upon upgrading to 1.10, we began seeing issues with our queries that were 
> using allow_large_results. They began failing because flatten_results now 
> defaults to False. This should default to unset (None), as it did before.
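
The fix relies on the usual tri-state default: `False` explicitly disables flattening, while `None` means "omit the key and let BigQuery decide", restoring the pre-1.10 behavior. A hedged sketch of that forwarding pattern (this helper and its dict layout are hypothetical, not the actual hook code):

```python
def build_query_config(sql, allow_large_results=False, flatten_results=None):
    """Only include flattenResults in the job config when explicitly set."""
    config = {"query": sql, "allowLargeResults": allow_large_results}
    if flatten_results is not None:
        # True and False are both meaningful requests; None means
        # "don't send the key, use the service default".
        config["flattenResults"] = flatten_results
    return config


print(build_query_config("SELECT 1"))
# {'query': 'SELECT 1', 'allowLargeResults': False}
print(build_query_config("SELECT 1", allow_large_results=True,
                         flatten_results=False))
# {'query': 'SELECT 1', 'allowLargeResults': True, 'flattenResults': False}
```

With a `False` default, every query silently sent `flattenResults: false`, which is what broke `allow_large_results` queries after the 1.10 upgrade.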





[GitHub] jgao54 commented on issue #3829: [AIRFLOW-2994] Fix flatten_results for BigQueryOperator

2018-08-31 Thread GitBox
jgao54 commented on issue #3829: [AIRFLOW-2994] Fix flatten_results for 
BigQueryOperator
URL: 
https://github.com/apache/incubator-airflow/pull/3829#issuecomment-417771057
 
 
   lgtm!




[jira] [Commented] (AIRFLOW-2994) flatten_results in BigQueryOperator/BigQueryHook should default to None

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599204#comment-16599204
 ] 

ASF GitHub Bot commented on AIRFLOW-2994:
-

jgao54 closed pull request #3829: [AIRFLOW-2994] Fix flatten_results for 
BigQueryOperator
URL: https://github.com/apache/incubator-airflow/pull/3829
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index e4c0653bfe..aee84e9eef 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -476,7 +476,7 @@ def run_query(self,
   destination_dataset_table=False,
   write_disposition='WRITE_EMPTY',
   allow_large_results=False,
-  flatten_results=False,
+  flatten_results=None,
   udf_config=False,
   use_legacy_sql=None,
   maximum_billing_tier=None,
diff --git a/airflow/contrib/operators/bigquery_operator.py b/airflow/contrib/operators/bigquery_operator.py
index bda0b08c23..a8fdc66ec9 100644
--- a/airflow/contrib/operators/bigquery_operator.py
+++ b/airflow/contrib/operators/bigquery_operator.py
@@ -106,7 +106,7 @@ def __init__(self,
  destination_dataset_table=False,
  write_disposition='WRITE_EMPTY',
  allow_large_results=False,
- flatten_results=False,
+ flatten_results=None,
  bigquery_conn_id='bigquery_default',
  delegate_to=None,
  udf_config=False,


 




> flatten_results in BigQueryOperator/BigQueryHook should default to None
> ---
>
> Key: AIRFLOW-2994
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2994
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.0
>Reporter: Chris Riccomini
>Priority: Major
> Fix For: 1.10.1
>
>
> Upon upgrading to 1.10, we began seeing issues with our queries that were 
> using allow_large_results. They began failing because flatten_results now 
> defaults to False. This should default to unset (None), as it did before.





[GitHub] jgao54 closed pull request #3829: [AIRFLOW-2994] Fix flatten_results for BigQueryOperator

2018-08-31 Thread GitBox
jgao54 closed pull request #3829: [AIRFLOW-2994] Fix flatten_results for 
BigQueryOperator
URL: https://github.com/apache/incubator-airflow/pull/3829
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index e4c0653bfe..aee84e9eef 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -476,7 +476,7 @@ def run_query(self,
   destination_dataset_table=False,
   write_disposition='WRITE_EMPTY',
   allow_large_results=False,
-  flatten_results=False,
+  flatten_results=None,
   udf_config=False,
   use_legacy_sql=None,
   maximum_billing_tier=None,
diff --git a/airflow/contrib/operators/bigquery_operator.py b/airflow/contrib/operators/bigquery_operator.py
index bda0b08c23..a8fdc66ec9 100644
--- a/airflow/contrib/operators/bigquery_operator.py
+++ b/airflow/contrib/operators/bigquery_operator.py
@@ -106,7 +106,7 @@ def __init__(self,
  destination_dataset_table=False,
  write_disposition='WRITE_EMPTY',
  allow_large_results=False,
- flatten_results=False,
+ flatten_results=None,
  bigquery_conn_id='bigquery_default',
  delegate_to=None,
  udf_config=False,


 




[GitHub] codecov-io edited a comment on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators

2018-08-31 Thread GitBox
codecov-io edited a comment on issue #3828: [AIRFLOW-2993] s3_to_sftp and 
sftp_to_s3 operators
URL: 
https://github.com/apache/incubator-airflow/pull/3828#issuecomment-417764201
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=h1)
 Report
   > Merging 
[#3828](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/ba27fcad69fa914cc55c530f1716b676974c7bd5?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3828/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3828   +/-   ##
   ===
 Coverage   77.44%   77.44%   
   ===
 Files 203  203   
 Lines   1584415844   
   ===
 Hits1227012270   
 Misses   3574 3574
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=footer).
 Last update 
[ba27fca...15d5dbd](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] feluelle edited a comment on issue #3796: [AIRFLOW-2824] - Add config to disable default conn creation

2018-08-31 Thread GitBox
feluelle edited a comment on issue #3796: [AIRFLOW-2824] - Add config to 
disable default conn creation
URL: 
https://github.com/apache/incubator-airflow/pull/3796#issuecomment-417759659
 
 
   My opinion on this is that I cannot initialize my database by using 
`airflow upgradedb` just because of the name. It does not make sense to me. I 
mean, how can I `upgrade` a database when I did not even initialize it?!
   
   > RBAC and KET should probably be in upgradedb. It might be worth merging 
them into a single command behind a flag `airflow initdb --with-examples`, which 
is more obvious/deliberate than having to know that upgradedb exists when all 
the tutorials just talk about initdb. Thoughts?
   
   I also don't know of any case where I would want the example DAGs but not 
the default connections, or vice versa. But I probably would not remove the 
possibility to do that. Would you? :/
   
   So in general `airflow initdb --with-examples` looks good to me, but if you 
want to be able to set both options separately it does not work.. :/
   
   I think we should get more opinions on this.
   




[GitHub] feluelle edited a comment on issue #3796: [AIRFLOW-2824] - Add config to disable default conn creation

2018-08-31 Thread GitBox
feluelle edited a comment on issue #3796: [AIRFLOW-2824] - Add config to 
disable default conn creation
URL: 
https://github.com/apache/incubator-airflow/pull/3796#issuecomment-417759659
 
 
   My opinion on this is that I cannot initialize my database by using 
`airflow upgradedb` just because of the name. It does not make sense to me. I 
mean, how can I `upgrade` a database when I did not even initialize it?!
   
   > RBAC and KET should probably be in upgradedb. It might be worth merging 
them into a single command behind a flag `airflow initdb --with-examples`, which 
is more obvious/deliberate than having to know that upgradedb exists when all 
the tutorials just talk about initdb. Thoughts?
   
   I also don't know of any case where I would want the example DAGs but not 
the default connections, or vice versa. So `airflow initdb --with-examples` 
looks good to me.
   




[GitHub] codecov-io commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators

2018-08-31 Thread GitBox
codecov-io commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 
operators
URL: 
https://github.com/apache/incubator-airflow/pull/3828#issuecomment-417764201
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=h1)
 Report
   > Merging 
[#3828](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/ba27fcad69fa914cc55c530f1716b676974c7bd5?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3828/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3828   +/-   ##
   ===
 Coverage   77.44%   77.44%   
   ===
 Files 203  203   
 Lines   1584415844   
   ===
 Hits1227012270   
 Misses   3574 3574
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=footer).
 Last update 
[ba27fca...15d5dbd](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] feng-tao commented on issue #3796: [AIRFLOW-2824] - Add config to disable default conn creation

2018-08-31 Thread GitBox
feng-tao commented on issue #3796: [AIRFLOW-2824] - Add config to disable 
default conn creation
URL: 
https://github.com/apache/incubator-airflow/pull/3796#issuecomment-417762067
 
 
   The current RBAC model files are not available in the Airflow code base. I 
wonder whether we could do the alembic migration without the model file.




[GitHub] feluelle commented on issue #3796: [AIRFLOW-2824] - Add config to disable default conn creation

2018-08-31 Thread GitBox
feluelle commented on issue #3796: [AIRFLOW-2824] - Add config to disable 
default conn creation
URL: 
https://github.com/apache/incubator-airflow/pull/3796#issuecomment-417759659
 
 
   My opinion on this is that I cannot initialize my database by using 
`airflow upgradedb` just because of the name. It does not make sense to me.
   
   > RBAC and KET should probably be in upgradedb. It might be worth merging 
them into a single command behind a flag `airflow initdb --with-examples`, which 
is more obvious/deliberate than having to know that upgradedb exists when all 
the tutorials just talk about initdb. Thoughts?
   
   `airflow initdb --with-examples` looks good to me.  
   




[GitHub] codecov-io commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption.

2018-08-31 Thread GitBox
codecov-io commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS 
encryption.
URL: 
https://github.com/apache/incubator-airflow/pull/3805#issuecomment-417758770
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3805?src=pr=h1)
 Report
   > Merging 
[#3805](https://codecov.io/gh/apache/incubator-airflow/pull/3805?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/ba27fcad69fa914cc55c530f1716b676974c7bd5?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `83.47%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3805/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3805?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#3805  +/-   ##
   ==
   - Coverage   77.44%   77.43%   -0.01% 
   ==
 Files 203  204   +1 
 Lines   1584415891  +47 
   ==
   + Hits1227012305  +35 
   - Misses   3574 3586  +12
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3805?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3805/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5)
 | `64.78% <100%> (ø)` | :arrow_up: |
   | 
[airflow/hooks/base\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3805/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9iYXNlX2hvb2sucHk=)
 | `83.33% <100%> (-8.83%)` | :arrow_down: |
   | 
[airflow/hooks/kmsapi\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3805/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9rbXNhcGlfaG9vay5weQ==)
 | `75% <75%> (ø)` | |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3805/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.76% <83.16%> (-0.03%)` | :arrow_down: |
   | 
[airflow/www\_rbac/utils.py](https://codecov.io/gh/apache/incubator-airflow/pull/3805/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy91dGlscy5weQ==)
 | `67.1% <0%> (-1.84%)` | :arrow_down: |
   | 
[airflow/www/utils.py](https://codecov.io/gh/apache/incubator-airflow/pull/3805/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdXRpbHMucHk=)
 | `88.75% <0%> (-0.6%)` | :arrow_down: |
   | 
[airflow/operators/s3\_file\_transform\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3805/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfZmlsZV90cmFuc2Zvcm1fb3BlcmF0b3IucHk=)
 | `93.87% <0%> (ø)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3805?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3805?src=pr=footer).
 Last update 
[ba27fca...b056873](https://codecov.io/gh/apache/incubator-airflow/pull/3805?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] feng-tao commented on a change in pull request #3533: [AIRFLOW-161] New redirect route and extra links

2018-08-31 Thread GitBox
feng-tao commented on a change in pull request #3533: [AIRFLOW-161] New 
redirect route and extra links
URL: https://github.com/apache/incubator-airflow/pull/3533#discussion_r214437723
 
 

 ##
 File path: airflow/www_rbac/views.py
 ##
 @@ -1705,6 +1715,68 @@ def gantt(self, session=None):
 root=root,
 )
 
+@expose('/extra_links')
+@has_access
 
 Review comment:
   need to add @has_dag_access something decorator for DAG level access




[GitHub] feng-tao commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators

2018-08-31 Thread GitBox
feng-tao commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 
operators
URL: 
https://github.com/apache/incubator-airflow/pull/3828#issuecomment-417748517
 
 
   Please squash commit and fix the conflict error.




[jira] [Commented] (AIRFLOW-2511) Subdag failed by scheduler deadlock

2018-08-31 Thread Trevor Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599093#comment-16599093
 ] 

Trevor Edwards commented on AIRFLOW-2511:
-

[https://github.com/apache/incubator-airflow/blob/f18e2550543e455c9701af0995bc393ee6a97b47/airflow/jobs.py#L2254]
 This commit seems to be the most likely culprit, as it looks like the only 
code that would generate the Original exception from the trace provided. SubDag 
operator creates a BackfillJob for its execution.

> Subdag failed by scheduler deadlock
> ---
>
> Key: AIRFLOW-2511
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2511
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Yohei Onishi
>Priority: Major
>
> I am using subdag and sometimes main dag marked failed because of the 
> following error. In this case, tasks in the subdag stopped.
> {code:java}
> hourly_dag = DAG(
>   hourly_dag_name,
>   default_args=dag_default_args,
>   params=dag_custom_params,
>   schedule_interval=config_values.hourly_job_interval,
>   max_active_runs=2)
> hourly_subdag = SubDagOperator(
>   task_id='s3_to_hive',
>   subdag=LoadFromS3ToHive(
>   hourly_dag,
>   's3_to_hive'),
>   dag=hourly_dag)
> {code}
> I got this error in main dag. bug in scheduler?
> {code:java}
> [2018-05-22 21:52:19,683] {models.py:1595} ERROR - This Session's transaction 
> has been rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, first issue Session.rollback(). Original 
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found 
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table', 
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive', 
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at: 
> http://sqlalche.me/e/e3q8)
> Traceback (most recent call last):
> sqlalchemy.exc.InvalidRequestError: This Session's transaction has been 
> rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, first issue Session.rollback(). Original 
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found 
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table', 
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive', 
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at: 
> http://sqlalche.me/e/e3q8)
> [2018-05-22 21:52:19,687] {models.py:1624} INFO - Marking task as FAILED.
> [2018-05-22 21:52:19,688] {base_task_runner.py:98} INFO - Subtask: 
> [2018-05-22 21:52:19,688] {slack_hook.py:143} INFO - Message is prepared: 
> [2018-05-22 21:52:19,688] {base_task_runner.py:98} INFO - Subtask: 
> {"attachments": [{"color": "danger", "text": "", "fields": [{"title": "DAG", 
> "value": 
> "",
>  "short": true}, {"title": "Owner", "value": "airflow", "short": true}, 
> {"title": "Task", "value": "s3_to_hive", "short": false}, {"title": "Status", 
> "value": "FAILED", "short": false}, {"title": "Execution Time", "value": 
> "2018-05-07T05:02:00", "short": true}, {"title": "Duration", "value": 
> "826.305929", "short": true}, {"value": 
> "  Task Log>", "short": false}]}]}
> [2018-05-22 21:52:19,688] {models.py:1638} ERROR - Failed at executing 
> callback
> [2018-05-22 21:52:19,688] {models.py:1639} ERROR - This Session's transaction 
> has been rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, first issue Session.rollback(). Original 
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found 
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table', 
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive', 
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at: 
> http://sqlalche.me/e/e3q8)
> 

[GitHub] jakahn commented on a change in pull request #3805: [AIRFLOW-2062] Add per-connection KMS encryption.

2018-08-31 Thread GitBox
jakahn commented on a change in pull request #3805: [AIRFLOW-2062] Add 
per-connection KMS encryption.
URL: https://github.com/apache/incubator-airflow/pull/3805#discussion_r214426737
 
 

 ##
 File path: airflow/models.py
 ##
 @@ -122,44 +123,47 @@ class NullFernet(object):
 """
 is_encrypted = False
 
+def __init__(self, k):
+LoggingMixin().log.warn(
 
 Review comment:
   Fixed!




[jira] [Commented] (AIRFLOW-2994) flatten_results in BigQueryOperator/BigQueryHook should default to None

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599039#comment-16599039
 ] 

ASF GitHub Bot commented on AIRFLOW-2994:
-

criccomini opened a new pull request #3829: [AIRFLOW-2994] Fix flatten_results 
for BigQueryOperator
URL: https://github.com/apache/incubator-airflow/pull/3829
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-2994\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2994
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-2994\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> flatten_results in BigQueryOperator/BigQueryHook should default to None
> ---
>
> Key: AIRFLOW-2994
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2994
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.0
>Reporter: Chris Riccomini
>Priority: Major
> Fix For: 1.10.1
>
>
> Upon upgrading to 1.10, we began seeing issues with our queries that were 
> using allow_large_results. They began failing because flatten_results now 
> defaults to False. This should default to unset (None), as it did before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] criccomini opened a new pull request #3829: [AIRFLOW-2994] Fix flatten_results for BigQueryOperator

2018-08-31 Thread GitBox
criccomini opened a new pull request #3829: [AIRFLOW-2994] Fix flatten_results 
for BigQueryOperator
URL: https://github.com/apache/incubator-airflow/pull/3829
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-2994\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2994
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-2994\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Created] (AIRFLOW-2994) flatten_results in BigQueryOperator/BigQueryHook should default to None

2018-08-31 Thread Chris Riccomini (JIRA)
Chris Riccomini created AIRFLOW-2994:


 Summary: flatten_results in BigQueryOperator/BigQueryHook should 
default to None
 Key: AIRFLOW-2994
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2994
 Project: Apache Airflow
  Issue Type: Bug
  Components: gcp
Affects Versions: 1.10.0
Reporter: Chris Riccomini
 Fix For: 1.10.1


Upon upgrading to 1.10, we began seeing issues with our queries that were using 
allow_large_results. They began failing because flatten_results now defaults to 
False. This should default to unset (None), as it did before.





[jira] [Commented] (AIRFLOW-2511) Subdag failed by scheduler deadlock

2018-08-31 Thread Trevor Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599024#comment-16599024
 ] 

Trevor Edwards commented on AIRFLOW-2511:
-

Doing some digging into this issue, it seems like this kind of error should 
only happen if sessions are being misused: 
http://docs.sqlalchemy.org/en/latest/faq/sessions.html#this-session-s-transaction-has-been-rolled-back-due-to-a-previous-exception-during-flush-or-similar

 

Reviewing 1.10 (though 1.9.0 is similar) 
https://github.com/apache/incubator-airflow/blob/1.10.0/airflow/models.py#L1602-L1673,
 it seems like there are several calls to methods which call session.commit() 
but do not catch exceptions and rollback. The broad except at the end of the 
highlighted region in my link also attempts to do a session.commit() in the 
method call, meaning if we swallow a sql exception in order to get to that 
line, we'll get this exception. I'm not convinced this is what caused this 
specific issue, but it is at least related and probably an issue as well. I'll 
keep digging.
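
The pattern the linked FAQ calls for can be sketched with a toy stand-in for a session (`MiniSession` and `guarded_commit` are illustrative, not SQLAlchemy's API): once a flush fails, every later operation raises until someone calls `rollback()`, so each `commit()` that can fail needs a matching rollback in its error path:

```python
class MiniSession:
    """Toy model of the invalidated-until-rollback state machine."""
    def __init__(self):
        self._invalid = False

    def commit(self, fail=False):
        if self._invalid:
            # Mirrors "This Session's transaction has been rolled back..."
            raise RuntimeError("session is invalid; call rollback() first")
        if fail:
            self._invalid = True
            raise RuntimeError("Deadlock found when trying to get lock")

    def rollback(self):
        self._invalid = False


def guarded_commit(session, fail=False):
    # Catch the failure, roll the session back so it is usable again,
    # then re-raise instead of swallowing the exception.
    try:
        session.commit(fail=fail)
    except RuntimeError:
        session.rollback()
        raise
```

A bare `session.commit()` with no rollback in its `except` path leaves the session in the invalidated state the traceback above describes.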

> Subdag failed by scheduler deadlock
> ---
>
> Key: AIRFLOW-2511
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2511
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Yohei Onishi
>Priority: Major
>
> I am using subdag and sometimes main dag marked failed because of the 
> following error. In this case, tasks in the subdag stopped.
> {code:java}
> hourly_dag = DAG(
>   hourly_dag_name,
>   default_args=dag_default_args,
>   params=dag_custom_params,
>   schedule_interval=config_values.hourly_job_interval,
>   max_active_runs=2)
> hourly_subdag = SubDagOperator(
>   task_id='s3_to_hive',
>   subdag=LoadFromS3ToHive(
>   hourly_dag,
>   's3_to_hive'),
>   dag=hourly_dag)
> {code}
> I got this error in main dag. bug in scheduler?
> {code:java}
> [2018-05-22 21:52:19,683] {models.py:1595} ERROR - This Session's transaction 
> has been rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, first issue Session.rollback(). Original 
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found 
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table', 
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive', 
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at: 
> http://sqlalche.me/e/e3q8)
> Traceback (most recent call last):
> sqlalchemy.exc.InvalidRequestError: This Session's transaction has been 
> rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, first issue Session.rollback(). Original 
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found 
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table', 
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive', 
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at: 
> http://sqlalche.me/e/e3q8)
> [2018-05-22 21:52:19,687] {models.py:1624} INFO - Marking task as FAILED.
> [2018-05-22 21:52:19,688] {base_task_runner.py:98} INFO - Subtask: 
> [2018-05-22 21:52:19,688] {slack_hook.py:143} INFO - Message is prepared: 
> [2018-05-22 21:52:19,688] {base_task_runner.py:98} INFO - Subtask: 
> {"attachments": [{"color": "danger", "text": "", "fields": [{"title": "DAG", 
> "value": 
> "",
>  "short": true}, {"title": "Owner", "value": "airflow", "short": true}, 
> {"title": "Task", "value": "s3_to_hive", "short": false}, {"title": "Status", 
> "value": "FAILED", "short": false}, {"title": "Execution Time", "value": 
> "2018-05-07T05:02:00", "short": true}, {"title": "Duration", "value": 
> "826.305929", "short": true}, {"value": 
> "  Task Log>", "short": false}]}]}
> [2018-05-22 21:52:19,688] {models.py:1638} ERROR - Failed at executing 
> callback
> [2018-05-22 21:52:19,688] {models.py:1639} ERROR - This Session's transaction 
> has been rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, 

[GitHub] eyaltrabelsi commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow

2018-08-31 Thread GitBox
eyaltrabelsi commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to 
trigger job from Airflow
URL: 
https://github.com/apache/incubator-airflow/pull/2708#issuecomment-417707790
 
 
   I cannot currently (I am on my honeymoon :) ).
   If you want and have time @ndmar I would love that, and I am here if you 
need guidance :) If not, I will pick it up in early October.
   
   @fokko thank you for the help




[GitHub] wmorris75 opened a new pull request #3828: Salesforce and sftp operators with s3

2018-08-31 Thread GitBox
wmorris75 opened a new pull request #3828: Salesforce and sftp operators with s3
URL: https://github.com/apache/incubator-airflow/pull/3828
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Created] (AIRFLOW-2993) Addition of S3_to_SFTP and SFTP_to_S3 Operators

2018-08-31 Thread Wayne Morris (JIRA)
Wayne Morris created AIRFLOW-2993:
-

 Summary: Addition of S3_to_SFTP  and SFTP_to_S3 Operators
 Key: AIRFLOW-2993
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2993
 Project: Apache Airflow
  Issue Type: New Feature
  Components: operators
Affects Versions: 1.9.0
Reporter: Wayne Morris
Assignee: Wayne Morris
 Fix For: Airflow 1.9.0


These new features enable transferring files or data from S3 to an SFTP remote 
path and from an SFTP path to S3.





[GitHub] ashb commented on issue #2946: [AIRFLOW-1927] Convert naive datetimes for TaskInstances

2018-08-31 Thread GitBox
ashb commented on issue #2946: [AIRFLOW-1927] Convert naive datetimes for 
TaskInstances
URL: 
https://github.com/apache/incubator-airflow/pull/2946#issuecomment-417675330
 
 
   If the TZ is always UTC (which it is - we have an SQLA class that "inflates" 
things to ensure that date times are always converted to UTC on reading out of 
the DB) then I don't think a `ts_tz` macro would be useful - though one to 
convert to local (or specified) TZ could be useful?
   
   Logging config for task instance logs is the other place this affects things 
https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#change-of-per-task-log-path




[jira] [Created] (AIRFLOW-2992) Google oauth redirect

2018-08-31 Thread Nitin Jain (JIRA)
Nitin Jain created AIRFLOW-2992:
---

 Summary: Google oauth redirect 
 Key: AIRFLOW-2992
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2992
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib
Affects Versions: 1.10.0
Reporter: Nitin Jain


airflow.contrib.auth.backends.google_auth

The Google OAuth redirect URL is set to 
http://web_server_host:web_server_port/oauth_callback_route

This makes it impossible to use Google auth while keeping Airflow behind a reverse 
proxy. If we instead build the redirect from base_url/oauth_callback_route, it will work. 
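A minimal sketch of the fix suggested above (function and route names here are assumptions, not the actual `airflow.contrib.auth.backends.google_auth` code): derive the OAuth callback URI from the configured `base_url` rather than from `web_server_host`/`web_server_port`, so the redirect matches what the reverse proxy exposes.

```python
# Hypothetical sketch: build the OAuth callback from base_url so the
# redirect survives a reverse proxy. Names are illustrative only.
from urllib.parse import urljoin

def oauth_redirect_uri(base_url, callback_route='/oauth2callback'):
    # base_url is the externally visible URL, e.g.
    # https://airflow.example.com/airflow
    return urljoin(base_url.rstrip('/') + '/', callback_route.lstrip('/'))
```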





[GitHub] mascah commented on issue #2946: [AIRFLOW-1927] Convert naive datetimes for TaskInstances

2018-08-31 Thread GitBox
mascah commented on issue #2946: [AIRFLOW-1927] Convert naive datetimes for 
TaskInstances
URL: 
https://github.com/apache/incubator-airflow/pull/2946#issuecomment-417672549
 
 
   Not a huge deal, and hopefully this is just an edge case that really doesn't 
affect many people. Considering I still need the HH:MM:SS data, I've updated 
everything in my environment to use `{{ ts_nodash.split('+')[0] }}` for the 
time being.
   
   I'm not sure what the right thing is to do either since this is already 
released, but I imagine something along the lines of adding a new macro `{{ 
ts_tz }}` that behaves like it does now with the TZ info, and stripping it from 
the existing ones would have made sense.
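The workaround quoted above can be shown concretely (the example timestamp is an assumption about the rendered 1.10 `ts_nodash` format, where timezone-aware execution dates gained a UTC offset suffix):

```python
# split('+')[0] strips the UTC offset suffix while keeping the
# HH:MM:SS part of the timestamp.
ts_nodash = '20180831T120000+0000'  # illustrative rendered value
naive = ts_nodash.split('+')[0]
print(naive)  # 20180831T120000
```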




[jira] [Work started] (AIRFLOW-2991) Log path to driver output after Dataproc job is completed

2018-08-31 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2991 started by Miłosz Szymczak.

> Log path to driver output after Dataproc job is completed
> -
>
> Key: AIRFLOW-2991
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2991
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.1
>Reporter: Miłosz Szymczak
>Assignee: Miłosz Szymczak
>Priority: Minor
>  Labels: pull-request-available
>
> It's difficult to keep track of logs from Google Cloud Dataproc jobs invoked 
> by Airflow. Dataproc API provides an output property 
> [driverOutputResourceUri|https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#Job]
>  containing path to stdout of job's driver. It's very convenient to be able 
> to quickly check the output log in case of errors without opening Google 
> Cloud Console.
> The change involves printing INFO-level log with value of 
> *driverOutputResourceUri* after Dataproc job is finished.





[jira] [Commented] (AIRFLOW-2991) Log path to driver output after Dataproc job is completed

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598715#comment-16598715
 ] 

ASF GitHub Bot commented on AIRFLOW-2991:
-

miloszszymczak opened a new pull request #3827: [AIRFLOW-2991] Log path to 
driver output after Dataproc job is completed
URL: https://github.com/apache/incubator-airflow/pull/3827
 
 
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2991
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   It's difficult to keep track of logs from Google Cloud Dataproc jobs invoked 
by Airflow. Dataproc API provides an output property driverOutputResourceUri 
containing path to stdout of job's driver. It's very convenient to be able to 
quickly check the output log in case of errors without opening Google Cloud 
Console. The change involves printing INFO-level log with value of 
driverOutputResourceUri after Dataproc job is finished.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Log-only changes, no tests required.
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Log path to driver output after Dataproc job is completed
> -
>
> Key: AIRFLOW-2991
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2991
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.1
>Reporter: Miłosz Szymczak
>Assignee: Miłosz Szymczak
>Priority: Minor
>  Labels: pull-request-available
>
> It's difficult to keep track of logs from Google Cloud Dataproc jobs invoked 
> by Airflow. Dataproc API provides an output property 
> [driverOutputResourceUri|https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#Job]
>  containing path to stdout of job's driver. It's very convenient to be able 
> to quickly check the output log in case of errors without opening Google 
> Cloud Console.
> The change involves printing INFO-level log with value of 
> *driverOutputResourceUri* after Dataproc job is finished.





[GitHub] miloszszymczak opened a new pull request #3827: [AIRFLOW-2991] Log path to driver output after Dataproc job is completed

2018-08-31 Thread GitBox
miloszszymczak opened a new pull request #3827: [AIRFLOW-2991] Log path to 
driver output after Dataproc job is completed
URL: https://github.com/apache/incubator-airflow/pull/3827
 
 
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2991
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   It's difficult to keep track of logs from Google Cloud Dataproc jobs invoked 
by Airflow. Dataproc API provides an output property driverOutputResourceUri 
containing path to stdout of job's driver. It's very convenient to be able to 
quickly check the output log in case of errors without opening Google Cloud 
Console. The change involves printing INFO-level log with value of 
driverOutputResourceUri after Dataproc job is finished.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Log-only changes, no tests required.
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Commented] (AIRFLOW-2986) Airflow Worker does not reach sqs

2018-08-31 Thread Shivakumar Gopalakrishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598709#comment-16598709
 ] 

Shivakumar Gopalakrishnan commented on AIRFLOW-2986:


I did some digging into this and found the following:
there is a curl POST request sent to get messages; this API call does not take 
credentials and is not signed. Could this be the issue?

> Airflow Worker does not reach sqs
> -
>
> Key: AIRFLOW-2986
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2986
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: amazon linux
>Reporter: Shivakumar Gopalakrishnan
>Priority: Major
>
> I am running the airflow worker service. The service is not able to connect 
> to the sqs
> The scheduler is able to reach and write to the queue
> Proxies are fine; I have implemented this in both python 2.7 and 3.5 same 
> issue
> Copy of the log is below
> {code}
> starting airflow-worker...
> /data/share/airflow
> /data/share/airflow/airflow.cfg
> [2018-08-30 15:41:44,367] \{settings.py:146} DEBUG - Setting up DB connection 
> pool (PID 12304)
> [2018-08-30 15:41:44,367] \{settings.py:174} INFO - setting.configure_orm(): 
> Using pool settings. pool_size=5, pool_recycle=1800
> [2018-08-30 15:41:44,468] \{__init__.py:42} DEBUG - Cannot import due to 
> doesn't look like a module path
> [2018-08-30 15:41:44,875] \{__init__.py:51} INFO - Using executor 
> CeleryExecutor
> [2018-08-30 15:41:44,886] \{cli_action_loggers.py:40} DEBUG - Adding 
>  to pre execution callback
> [2018-08-30 15:41:44,995] \{cli_action_loggers.py:64} DEBUG - Calling 
> callbacks: []
> [2018-08-30 15:41:45,768] \{settings.py:146} DEBUG - Setting up DB connection 
> pool (PID 12308)
> [2018-08-30 15:41:45,768] \{settings.py:174} INFO - setting.configure_orm(): 
> Using pool settings. pool_size=5, pool_recycle=1800
> [2018-08-30 15:41:45,883] \{__init__.py:42} DEBUG - Cannot import due to 
> doesn't look like a module path
> [2018-08-30 15:41:46,345] \{__init__.py:51} INFO - Using executor 
> CeleryExecutor
> [2018-08-30 15:41:46,358] \{cli_action_loggers.py:40} DEBUG - Adding 
>  to pre execution callback
> [2018-08-30 15:41:46,476] \{cli_action_loggers.py:64} DEBUG - Calling 
> callbacks: []
> Starting flask
> [2018-08-30 15:41:46,519] \{_internal.py:88} INFO - * Running on 
> http://0.0.0.0:8793/ (Press CTRL+C to quit)
> [2018-08-30 15:43:58,779: CRITICAL/MainProcess] Unrecoverable error: 
> Exception('Request Empty body HTTP 599 Failed to connect to 
> eu-west-1.queue.amazonaws.com port 443: Connection timed out (None)',)
> Traceback (most recent call last):
>  File "/usr/local/lib/python3.5/site-packages/celery/worker/worker.py", line 
> 207, in start
>  self.blueprint.start(self)
>  File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, 
> in start
>  step.start(parent)
>  File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 370, 
> in start
>  return self.obj.start()
>  File 
> "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", 
> line 316, in start
>  blueprint.start(self)
>  File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, 
> in start
>  step.start(parent)
>  File 
> "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", 
> line 592, in start
>  c.loop(*c.loop_args())
>  File "/usr/local/lib/python3.5/site-packages/celery/worker/loops.py", line 
> 91, in asynloop
>  next(loop)
>  File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/hub.py", 
> line 354, in create_loop
>  cb(*cbargs)
>  File 
> "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", 
> line 114, in on_writable
>  return self._on_event(fd, _pycurl.CSELECT_OUT)
>  File 
> "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", 
> line 124, in _on_event
>  self._process_pending_requests()
>  File 
> "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", 
> line 132, in _process_pending_requests
>  self._process(curl, errno, reason)
>  File 
> "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", 
> line 178, in _process
>  buffer=buffer, effective_url=effective_url, error=error,
>  File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 150, in 
> __call__
>  svpending(*ca, **ck)
>  File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in 
> __call__
>  return self.throw()
>  File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 140, in 
> __call__
>  retval = fun(*final_args, **final_kwargs)
>  File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 100, in 
> _transback
>  return callback(ret)
>  File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in 
> __call__
>  

[jira] [Created] (AIRFLOW-2991) Log path to driver output after Dataproc job is completed

2018-08-31 Thread JIRA
Miłosz Szymczak created AIRFLOW-2991:


 Summary: Log path to driver output after Dataproc job is completed
 Key: AIRFLOW-2991
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2991
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Affects Versions: 1.10.1
Reporter: Miłosz Szymczak
Assignee: Miłosz Szymczak


It's difficult to keep track of logs from Google Cloud Dataproc jobs invoked by 
Airflow. Dataproc API provides an output property 
**[driverOutputResourceUri|https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#Job]
 containing path to stdout of job's driver. It's very convenient to be able to 
quickly check the output log in case of errors without opening Google Cloud 
Console.

The change involves printing INFO-level log with value of 
*driverOutputResourceUri* after Dataproc job is finished.
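The described change can be sketched as follows (the helper name and job-dict shape are assumptions for illustration; the actual PR modifies the Dataproc hook):

```python
# Hedged sketch: after a Dataproc job completes, emit an INFO log with
# the driver output URI the Jobs API reports, if present.
import logging

log = logging.getLogger(__name__)

def log_driver_output(job):
    # ``job`` is assumed to be the dict returned by the Dataproc
    # projects.regions.jobs.get API call.
    uri = job.get('driverOutputResourceUri')
    if uri:
        log.info('Driver output located at: %s', uri)
    return uri
```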





[jira] [Updated] (AIRFLOW-2991) Log path to driver output after Dataproc job is completed

2018-08-31 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miłosz Szymczak updated AIRFLOW-2991:
-
Description: 
It's difficult to keep track of logs from Google Cloud Dataproc jobs invoked by 
Airflow. Dataproc API provides an output property 
[driverOutputResourceUri|https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#Job]
 containing path to stdout of job's driver. It's very convenient to be able to 
quickly check the output log in case of errors without opening Google Cloud 
Console.

The change involves printing INFO-level log with value of 
*driverOutputResourceUri* after Dataproc job is finished.

  was:
It's difficult to keep track of logs from Google Cloud Dataproc jobs invoked by 
Airflow. Dataproc API provides an output property 
**[driverOutputResourceUri|https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#Job]
 containing path to stdout of job's driver. It's very convenient to be able to 
quickly check the output log in case of errors without opening Google Cloud 
Console.

The change involves printing INFO-level log with value of 
*driverOutputResourceUri* after Dataproc job is finished.


> Log path to driver output after Dataproc job is completed
> -
>
> Key: AIRFLOW-2991
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2991
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.1
>Reporter: Miłosz Szymczak
>Assignee: Miłosz Szymczak
>Priority: Minor
>  Labels: pull-request-available
>
> It's difficult to keep track of logs from Google Cloud Dataproc jobs invoked 
> by Airflow. Dataproc API provides an output property 
> [driverOutputResourceUri|https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#Job]
>  containing path to stdout of job's driver. It's very convenient to be able 
> to quickly check the output log in case of errors without opening Google 
> Cloud Console.
> The change involves printing INFO-level log with value of 
> *driverOutputResourceUri* after Dataproc job is finished.





[GitHub] codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying

2018-08-31 Thread GitBox
codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 
object copying
URL: 
https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417296502
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=h1)
 Report
   > Merging 
[#3823](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/ba27fcad69fa914cc55c530f1716b676974c7bd5?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3823/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #3823   +/-   ##
   =======================================
     Coverage   77.44%   77.44%           
   =======================================
     Files         203      203           
     Lines       15844    15844           
   =======================================
     Hits        12270    12270           
     Misses       3574     3574
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=footer).
 Last update 
[ba27fca...217274a](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] XD-DENG commented on issue #3820: [AIRFLOW-2990] Fix Docstrings for Hooks/Operators

2018-08-31 Thread GitBox
XD-DENG commented on issue #3820: [AIRFLOW-2990] Fix Docstrings for 
Hooks/Operators
URL: 
https://github.com/apache/incubator-airflow/pull/3820#issuecomment-417624840
 
 
   Thanks @kaxil for helping fix the typo `parame` (which was made by me...)
   
   Thanks!
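   The fix this PR makes can be illustrated with a minimal example (the
   function below is a simplified stand-in for the hooks touched by the diff):
   Sphinx needs a blank line between the summary and the `:param` field list,
   and the fields must read `:param name:`/`:type name:` with no stray colon
   after `param`.

   ```python
   def get_cluster(name):
       """
       Gets details of the specified cluster

       :param name: The name of the cluster to retrieve
       :type name: str
       :return: The cluster details
       """
       return {'name': name}
   ```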




[jira] [Commented] (AIRFLOW-2990) Docstrings for Hooks/Operators are in incorrect format

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598424#comment-16598424
 ] 

ASF GitHub Bot commented on AIRFLOW-2990:
-

kaxil closed pull request #3820: [AIRFLOW-2990] Fix Docstrings for 
Hooks/Operators
URL: https://github.com/apache/incubator-airflow/pull/3820
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/gcp_container_hook.py 
b/airflow/contrib/hooks/gcp_container_hook.py
index e5fbda138e..0047b8dbeb 100644
--- a/airflow/contrib/hooks/gcp_container_hook.py
+++ b/airflow/contrib/hooks/gcp_container_hook.py
@@ -48,6 +48,7 @@ def __init__(self, project_id, location):
 def _dict_to_proto(py_dict, proto):
 """
 Converts a python dictionary to the proto supplied
+
 :param py_dict: The dictionary to convert
 :type py_dict: dict
 :param proto: The proto object to merge with dictionary
@@ -63,6 +64,7 @@ def wait_for_operation(self, operation):
 """
 Given an operation, continuously fetches the status from Google Cloud 
until either
 completion or an error occurring
+
 :param operation: The Operation to wait for
 :type operation: A google.cloud.container_V1.gapic.enums.Operator
 :return: A new, updated operation fetched from Google Cloud
@@ -83,6 +85,7 @@ def wait_for_operation(self, operation):
 def get_operation(self, operation_name):
 """
 Fetches the operation from Google Cloud
+
 :param operation_name: Name of operation to fetch
 :type operation_name: str
 :return: The new, updated operation from Google Cloud
@@ -196,6 +199,7 @@ def create_cluster(self, cluster, retry=DEFAULT, 
timeout=DEFAULT):
 def get_cluster(self, name, retry=DEFAULT, timeout=DEFAULT):
 """
 Gets details of specified cluster
+
 :param name: The name of the cluster to retrieve
 :type name: str
 :param retry: A retry object used to retry requests. If None is 
specified,
diff --git a/airflow/contrib/operators/awsbatch_operator.py 
b/airflow/contrib/operators/awsbatch_operator.py
index 50c6c2c319..4008c90c47 100644
--- a/airflow/contrib/operators/awsbatch_operator.py
+++ b/airflow/contrib/operators/awsbatch_operator.py
@@ -42,18 +42,20 @@ class AWSBatchOperator(BaseOperator):
 :type job_definition: str
 :param job_queue: the queue name on AWS Batch
 :type job_queue: str
-:param: overrides: the same parameter that boto3 will receive on
-containerOverrides (templated):
-
http://boto3.readthedocs.io/en/latest/reference/services/batch.html#submit_job
-:type: overrides: dict
-:param max_retries: exponential backoff retries while waiter is not 
merged, 4200 = 48 hours
+:param overrides: the same parameter that boto3 will receive on
+containerOverrides (templated).
+
http://boto3.readthedocs.io/en/latest/reference/services/batch.html#submit_job
+:type overrides: dict
+:param max_retries: exponential backoff retries while waiter is not merged,
+4200 = 48 hours
 :type max_retries: int
 :param aws_conn_id: connection id of AWS credentials / region name. If 
None,
-credential boto3 strategy will be used
-(http://boto3.readthedocs.io/en/latest/guide/configuration.html).
+credential boto3 strategy will be used
+(http://boto3.readthedocs.io/en/latest/guide/configuration.html).
 :type aws_conn_id: str
 :param region_name: region name to use in AWS Hook.
 Override the region_name in connection (if provided)
+:type region_name: str
 """
 
 ui_color = '#c3dae0'
diff --git a/airflow/contrib/operators/bigquery_check_operator.py 
b/airflow/contrib/operators/bigquery_check_operator.py
index a9c493f4fd..3eba0771db 100644
--- a/airflow/contrib/operators/bigquery_check_operator.py
+++ b/airflow/contrib/operators/bigquery_check_operator.py
@@ -56,7 +56,7 @@ class BigQueryCheckOperator(CheckOperator):
 :param bigquery_conn_id: reference to the BigQuery database
 :type bigquery_conn_id: string
 :param use_legacy_sql: Whether to use legacy SQL (true)
-or standard SQL (false).
+or standard SQL (false).
 :type use_legacy_sql: boolean
 """
 
@@ -83,7 +83,7 @@ class BigQueryValueCheckOperator(ValueCheckOperator):
 :param sql: the sql to be executed
 :type sql: string
 :param use_legacy_sql: Whether to use legacy SQL (true)
-or standard SQL (false).
+or standard SQL (false).
 :type use_legacy_sql: boolean
 """
 
@@ -125,7 +125,7 @@ class BigQueryIntervalCheckOperator(IntervalCheckOperator):
   

[GitHub] kaxil closed pull request #3820: [AIRFLOW-2990] Fix Docstrings for Hooks/Operators

2018-08-31 Thread GitBox
kaxil closed pull request #3820: [AIRFLOW-2990] Fix Docstrings for 
Hooks/Operators
URL: https://github.com/apache/incubator-airflow/pull/3820
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/gcp_container_hook.py 
b/airflow/contrib/hooks/gcp_container_hook.py
index e5fbda138e..0047b8dbeb 100644
--- a/airflow/contrib/hooks/gcp_container_hook.py
+++ b/airflow/contrib/hooks/gcp_container_hook.py
@@ -48,6 +48,7 @@ def __init__(self, project_id, location):
 def _dict_to_proto(py_dict, proto):
 """
 Converts a python dictionary to the proto supplied
+
 :param py_dict: The dictionary to convert
 :type py_dict: dict
 :param proto: The proto object to merge with dictionary
@@ -63,6 +64,7 @@ def wait_for_operation(self, operation):
 """
 Given an operation, continuously fetches the status from Google Cloud 
until either
 completion or an error occurring
+
 :param operation: The Operation to wait for
 :type operation: A google.cloud.container_V1.gapic.enums.Operator
 :return: A new, updated operation fetched from Google Cloud
@@ -83,6 +85,7 @@ def wait_for_operation(self, operation):
 def get_operation(self, operation_name):
 """
 Fetches the operation from Google Cloud
+
 :param operation_name: Name of operation to fetch
 :type operation_name: str
 :return: The new, updated operation from Google Cloud
@@ -196,6 +199,7 @@ def create_cluster(self, cluster, retry=DEFAULT, 
timeout=DEFAULT):
 def get_cluster(self, name, retry=DEFAULT, timeout=DEFAULT):
 """
 Gets details of specified cluster
+
 :param name: The name of the cluster to retrieve
 :type name: str
 :param retry: A retry object used to retry requests. If None is 
specified,
diff --git a/airflow/contrib/operators/awsbatch_operator.py 
b/airflow/contrib/operators/awsbatch_operator.py
index 50c6c2c319..4008c90c47 100644
--- a/airflow/contrib/operators/awsbatch_operator.py
+++ b/airflow/contrib/operators/awsbatch_operator.py
@@ -42,18 +42,20 @@ class AWSBatchOperator(BaseOperator):
 :type job_definition: str
 :param job_queue: the queue name on AWS Batch
 :type job_queue: str
-:param: overrides: the same parameter that boto3 will receive on
-containerOverrides (templated):
-
http://boto3.readthedocs.io/en/latest/reference/services/batch.html#submit_job
-:type: overrides: dict
-:param max_retries: exponential backoff retries while waiter is not 
merged, 4200 = 48 hours
+:param overrides: the same parameter that boto3 will receive on
+containerOverrides (templated).
+
http://boto3.readthedocs.io/en/latest/reference/services/batch.html#submit_job
+:type overrides: dict
+:param max_retries: exponential backoff retries while waiter is not merged,
+4200 = 48 hours
 :type max_retries: int
 :param aws_conn_id: connection id of AWS credentials / region name. If 
None,
-credential boto3 strategy will be used
-(http://boto3.readthedocs.io/en/latest/guide/configuration.html).
+credential boto3 strategy will be used
+(http://boto3.readthedocs.io/en/latest/guide/configuration.html).
 :type aws_conn_id: str
 :param region_name: region name to use in AWS Hook.
 Override the region_name in connection (if provided)
+:type region_name: str
 """
 
 ui_color = '#c3dae0'
diff --git a/airflow/contrib/operators/bigquery_check_operator.py 
b/airflow/contrib/operators/bigquery_check_operator.py
index a9c493f4fd..3eba0771db 100644
--- a/airflow/contrib/operators/bigquery_check_operator.py
+++ b/airflow/contrib/operators/bigquery_check_operator.py
@@ -56,7 +56,7 @@ class BigQueryCheckOperator(CheckOperator):
 :param bigquery_conn_id: reference to the BigQuery database
 :type bigquery_conn_id: string
 :param use_legacy_sql: Whether to use legacy SQL (true)
-or standard SQL (false).
+or standard SQL (false).
 :type use_legacy_sql: boolean
 """
 
@@ -83,7 +83,7 @@ class BigQueryValueCheckOperator(ValueCheckOperator):
 :param sql: the sql to be executed
 :type sql: string
 :param use_legacy_sql: Whether to use legacy SQL (true)
-or standard SQL (false).
+or standard SQL (false).
 :type use_legacy_sql: boolean
 """
 
@@ -125,7 +125,7 @@ class BigQueryIntervalCheckOperator(IntervalCheckOperator):
 between the current day, and the prior days_back.
 :type metrics_threshold: dict
 :param use_legacy_sql: Whether to use legacy SQL (true)
-or standard SQL (false).
+or standard SQL (false).
 :type use_legacy_sql: 
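The PR above reformats operator docstrings into the Sphinx field-list style that Airflow's docs build expects: a `:param name:` line immediately followed by its `:type name:` line, with continuation lines indented. A minimal sketch of that convention, using a hypothetical operator:

```python
class ExampleCheckOperator:
    """
    Performs a simple check against a SQL backend.

    :param sql: the sql to be executed
    :type sql: string
    :param use_legacy_sql: Whether to use legacy SQL (true)
        or standard SQL (false). Continuation lines are indented
        so Sphinx keeps them attached to the field above.
    :type use_legacy_sql: boolean
    """

    def __init__(self, sql, use_legacy_sql=True):
        self.sql = sql
        self.use_legacy_sql = use_legacy_sql
```

Unindented continuation lines (as in the `-` lines of the diff) break the field list, which is what the PR corrects.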

[GitHub] kaxil commented on issue #3820: [AIRFLOW-2990] Fix Docstrings for Hooks/Operators

2018-08-31 Thread GitBox
kaxil commented on issue #3820: [AIRFLOW-2990] Fix Docstrings for 
Hooks/Operators
URL: 
https://github.com/apache/incubator-airflow/pull/3820#issuecomment-417593219
 
 
   Good Spot  @Fokko . I have fixed the issue, it had a missing `type` for 
`region_name` in docstrings. I have also created a Jira issue  


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-2990) Docstrings for Hooks/Operators are in incorrect format

2018-08-31 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-2990:

Issue Type: Improvement  (was: New Feature)

> Docstrings for Hooks/Operators are in incorrect format
> --
>
> Key: AIRFLOW-2990
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2990
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, Documentation
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.1
>
>
> Some of the docstrings have missing values and some have improper parameters 
> formatting



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2990) Docstrings for Hooks/Operators are in incorrect format

2018-08-31 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2990:
---

 Summary: Docstrings for Hooks/Operators are in incorrect format
 Key: AIRFLOW-2990
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2990
 Project: Apache Airflow
  Issue Type: New Feature
  Components: docs, Documentation
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 1.10.1


Some of the docstrings have missing values and some have improper parameters 
formatting





[GitHub] Fokko commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor

2018-08-31 Thread GitBox
Fokko commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor
URL: 
https://github.com/apache/incubator-airflow/pull/3702#issuecomment-417584317
 
 
   @andscoop Why would this make it more complex? 




[jira] [Resolved] (AIRFLOW-2948) Arg checking & better doc for SSHOperator and SFTPOperator

2018-08-31 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2948.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Arg checking & better doc for SSHOperator and SFTPOperator
> --
>
> Key: AIRFLOW-2948
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2948
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
> Fix For: 2.0.0
>
>
> There may be different combinations of arguments, and some processings are 
> being done 'silently', while users may not be fully aware of them.
>  
>  For example
>  - User only needs to provide either `ssh_hook` or `ssh_conn_id`, while this 
> is not clear in doc
>  - if both provided, `ssh_conn_id` will be ignored.
>  - if `remote_host` is provided, it will replace the `remote_host` which 
> was defined in `ssh_hook` or predefined in the connection of `ssh_conn_id`
>  
>  These should be documented clearly to ensure it's transparent to the users.
>  
>  log.info() should also be used to remind users and provide clear logs.





[GitHub] Fokko commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …

2018-08-31 Thread GitBox
Fokko commented on issue #3813: [AIRFLOW-1998] Implemented 
DatabricksRunNowOperator for jobs/run-now …
URL: 
https://github.com/apache/incubator-airflow/pull/3813#issuecomment-417581950
 
 
   @isknight Can you rebase onto master? There are merge conflicts :(




[GitHub] Fokko commented on issue #2946: [AIRFLOW-1927] Convert naive datetimes for TaskInstances

2018-08-31 Thread GitBox
Fokko commented on issue #2946: [AIRFLOW-1927] Convert naive datetimes for 
TaskInstances
URL: 
https://github.com/apache/incubator-airflow/pull/2946#issuecomment-417581377
 
 
  Yes, this sounds like a breaking change that I've missed. A workaround would 
be `{{ ds_nodash }}T00`. @bolkedebruin Any thoughts on this?
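The suggested workaround rebuilds the hour-00 timestamp prefix from `ds_nodash` instead of relying on the old naive datetime rendering. A minimal sketch of what `{{ ds_nodash }}T00` expands to (the helper name here is illustrative, not an Airflow internal):

```python
from datetime import datetime

def render_ds_nodash_t00(execution_date: datetime) -> str:
    # ds_nodash is the execution date formatted as YYYYMMDD;
    # appending "T00" reproduces the hour-00 timestamp prefix.
    return execution_date.strftime("%Y%m%d") + "T00"

print(render_ds_nodash_t00(datetime(2018, 8, 31)))  # 20180831T00
```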




[GitHub] Fokko commented on a change in pull request #3805: [AIRFLOW-2062] Add per-connection KMS encryption.

2018-08-31 Thread GitBox
Fokko commented on a change in pull request #3805: [AIRFLOW-2062] Add 
per-connection KMS encryption.
URL: https://github.com/apache/incubator-airflow/pull/3805#discussion_r214268210
 
 

 ##
 File path: airflow/models.py
 ##
 @@ -122,44 +123,47 @@ class NullFernet(object):
 """
 is_encrypted = False
 
+def __init__(self, k):
+LoggingMixin().log.warn(
 
 Review comment:
  Please add the logging mixin to the class itself: `class NullFernet(LoggingMixin):`
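The review comment suggests inheriting the logging mixin rather than instantiating it inside `__init__`. A minimal pure-Python sketch of that mixin pattern (simplified; not the actual Airflow `LoggingMixin` implementation):

```python
import logging

class LoggingMixin:
    # Expose a logger named after the concrete class, resolved lazily.
    @property
    def log(self):
        return logging.getLogger(self.__class__.__name__)

class NullFernet(LoggingMixin):
    is_encrypted = False

    def __init__(self, k):
        # self.log now resolves through the mixin instead of
        # constructing a throwaway LoggingMixin() on every call.
        self.log.warning("cryptography not installed; using NullFernet.")

NullFernet(None)  # emits the warning through the mixin-provided logger
```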




[jira] [Commented] (AIRFLOW-2974) Databricks Cluster Operations

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598367#comment-16598367
 ] 

ASF GitHub Bot commented on AIRFLOW-2974:
-

Fokko closed pull request #3817: [AIRFLOW-2974] Extended Databricks hook with 
cluster operation
URL: https://github.com/apache/incubator-airflow/pull/3817
 
 
   


diff --git a/airflow/contrib/hooks/databricks_hook.py 
b/airflow/contrib/hooks/databricks_hook.py
index 5b97a0eba0..cb2ba9bd00 100644
--- a/airflow/contrib/hooks/databricks_hook.py
+++ b/airflow/contrib/hooks/databricks_hook.py
@@ -33,6 +33,9 @@
 except ImportError:
 import urlparse
 
+RESTART_CLUSTER_ENDPOINT = ("POST", "api/2.0/clusters/restart")
+START_CLUSTER_ENDPOINT = ("POST", "api/2.0/clusters/start")
+TERMINATE_CLUSTER_ENDPOINT = ("POST", "api/2.0/clusters/delete")
 
 SUBMIT_RUN_ENDPOINT = ('POST', 'api/2.0/jobs/runs/submit')
 GET_RUN_ENDPOINT = ('GET', 'api/2.0/jobs/runs/get')
@@ -189,6 +192,15 @@ def cancel_run(self, run_id):
 json = {'run_id': run_id}
 self._do_api_call(CANCEL_RUN_ENDPOINT, json)
 
+def restart_cluster(self, json):
+self._do_api_call(RESTART_CLUSTER_ENDPOINT, json)
+
+def start_cluster(self, json):
+self._do_api_call(START_CLUSTER_ENDPOINT, json)
+
+def terminate_cluster(self, json):
+self._do_api_call(TERMINATE_CLUSTER_ENDPOINT, json)
+
 
 def _retryable_error(exception):
 return isinstance(exception, requests_exceptions.ConnectionError) \
diff --git a/setup.cfg b/setup.cfg
index 622cc1303a..881fe0107d 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -14,6 +14,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
 [metadata]
 name = Airflow
 summary = Airflow is a system to programmatically author, schedule and monitor 
data pipelines.
@@ -34,4 +35,3 @@ all_files = 1
 upload-dir = docs/_build/html
 
 [easy_install]
-
diff --git a/tests/contrib/hooks/test_databricks_hook.py 
b/tests/contrib/hooks/test_databricks_hook.py
index a022431899..04a7c8dc3c 100644
--- a/tests/contrib/hooks/test_databricks_hook.py
+++ b/tests/contrib/hooks/test_databricks_hook.py
@@ -52,6 +52,7 @@
 'node_type_id': 'r3.xlarge',
 'num_workers': 1
 }
+CLUSTER_ID = 'cluster_id'
 RUN_ID = 1
 HOST = 'xx.cloud.databricks.com'
 HOST_WITH_SCHEME = 'https://xx.cloud.databricks.com'
@@ -93,6 +94,26 @@ def cancel_run_endpoint(host):
 return 'https://{}/api/2.0/jobs/runs/cancel'.format(host)
 
 
+def start_cluster_endpoint(host):
+"""
+Utility function to generate the start cluster endpoint given the host.
+"""
+return 'https://{}/api/2.0/clusters/start'.format(host)
+
+
+def restart_cluster_endpoint(host):
+"""
+Utility function to generate the restart cluster endpoint given the host.
+"""
+return 'https://{}/api/2.0/clusters/restart'.format(host)
+
+
+def terminate_cluster_endpoint(host):
+"""
+Utility function to generate the terminate cluster endpoint given the host.
+"""
+return 'https://{}/api/2.0/clusters/delete'.format(host)
+
 def create_valid_response_mock(content):
 response = mock.MagicMock()
 response.json.return_value = content
@@ -293,6 +314,54 @@ def test_cancel_run(self, mock_requests):
 headers=USER_AGENT_HEADER,
 timeout=self.hook.timeout_seconds)
 
+@mock.patch('airflow.contrib.hooks.databricks_hook.requests')
+def test_start_cluster(self, mock_requests):
+mock_requests.codes.ok = 200
+mock_requests.post.return_value.json.return_value = {}
+status_code_mock = mock.PropertyMock(return_value=200)
+type(mock_requests.post.return_value).status_code = status_code_mock
+
+self.hook.start_cluster({"cluster_id": CLUSTER_ID})
+
+mock_requests.post.assert_called_once_with(
+start_cluster_endpoint(HOST),
+json={'cluster_id': CLUSTER_ID},
+auth=(LOGIN, PASSWORD),
+headers=USER_AGENT_HEADER,
+timeout=self.hook.timeout_seconds)
+
+@mock.patch('airflow.contrib.hooks.databricks_hook.requests')
+def test_restart_cluster(self, mock_requests):
+mock_requests.codes.ok = 200
+mock_requests.post.return_value.json.return_value = {}
+status_code_mock = mock.PropertyMock(return_value=200)
+type(mock_requests.post.return_value).status_code = status_code_mock
+
+self.hook.restart_cluster({"cluster_id": CLUSTER_ID})
+
+mock_requests.post.assert_called_once_with(
+restart_cluster_endpoint(HOST),
+json={'cluster_id': CLUSTER_ID},
+auth=(LOGIN, 
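The hook methods in the diff above all follow the same `(HTTP verb, path)` endpoint-tuple convention. A standalone sketch of how such a tuple can be expanded into the pieces an HTTP client needs (simplified; the real `_do_api_call` also handles auth, headers, and retries):

```python
# Endpoint tuples as introduced by the PR.
RESTART_CLUSTER_ENDPOINT = ("POST", "api/2.0/clusters/restart")
START_CLUSTER_ENDPOINT = ("POST", "api/2.0/clusters/start")
TERMINATE_CLUSTER_ENDPOINT = ("POST", "api/2.0/clusters/delete")

def build_request(host, endpoint, json):
    # Unpack the (method, path) tuple into method, full URL, and payload.
    method, path = endpoint
    return method, "https://{}/{}".format(host, path), json

method, url, payload = build_request(
    "xx.cloud.databricks.com", START_CLUSTER_ENDPOINT, {"cluster_id": "abc-123"})
# method == "POST"
# url == "https://xx.cloud.databricks.com/api/2.0/clusters/start"
```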

[jira] [Assigned] (AIRFLOW-2963) Error parsing AIRFLOW_CONN_ URI

2018-08-31 Thread Casandra julie mitchell (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Casandra julie mitchell reassigned AIRFLOW-2963:


Assignee: (was: Casandra julie mitchell)

> Error parsing AIRFLOW_CONN_ URI
> ---
>
> Key: AIRFLOW-2963
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2963
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: boto3, configuration
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Leonardo de Campos Almeida
>Priority: Minor
>  Labels: easyfix
>
> I'm using the environment variable AIRFLOW_CONN_ to define my connection to 
> AWS, but my AWS secret access key has a slash on it.
>  e.g.:
> {code:java}
> s3://login:pass/word@bucket
> {code}
>  The problem is that the method *BaseHook._get_connection_from_env* doesn't 
> accept this URI as a valid URI. When it finds the / it is assuming that the 
> path starts there, so it is returning:
>  * host: login
>  * port: pass
>  * path: word
> And ignoring the rest, so I get an error, because pass is not a valid port 
> number.
> So, I tried to pass the URI quoted
> {code:java}
> s3://login:pass%2Fword@bucket
> {code}
> But them, the values are not being unquoted correctly, and the AwsHook is 
> trying to use pass%2Fword as the secret access key.
>  I took a look at the method that parses the URI, and it is only unquoting 
> the host, manually.
> {code:java}
> def parse_from_uri(self, uri):
> temp_uri = urlparse(uri)
> hostname = temp_uri.hostname or ''
> if '%2f' in hostname:
> hostname = hostname.replace('%2f', '/').replace('%2F', '/')
> conn_type = temp_uri.scheme
> if conn_type == 'postgresql':
> conn_type = 'postgres'
> self.conn_type = conn_type
> self.host = hostname
> self.schema = temp_uri.path[1:]
> self.login = temp_uri.username
> self.password = temp_uri.password
> self.port = temp_uri.port
> {code}
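The core of the report: `urlparse` splits on the raw `/` in the password, and the percent-encoded form parses structurally but is never unquoted. A self-contained demonstration of both behaviors:

```python
from urllib.parse import urlparse, unquote

# Raw slash: everything after "/" becomes the path, so the netloc is
# just "login:pass" and no userinfo is recognized at all.
raw = urlparse("s3://login:pass/word@bucket")
print(raw.password)   # None -- the credential is mangled

# Percent-encoded slash: the structure parses, but the value stays quoted.
quoted = urlparse("s3://login:pass%2Fword@bucket")
print(quoted.password)           # pass%2Fword
print(unquote(quoted.password))  # pass/word -- the missing unquote step
```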





[GitHub] Fokko closed pull request #3817: [AIRFLOW-2974] Extended Databricks hook with cluster operation

2018-08-31 Thread GitBox
Fokko closed pull request #3817: [AIRFLOW-2974] Extended Databricks hook with 
cluster operation
URL: https://github.com/apache/incubator-airflow/pull/3817
 
 
   


diff --git a/airflow/contrib/hooks/databricks_hook.py 
b/airflow/contrib/hooks/databricks_hook.py
index 5b97a0eba0..cb2ba9bd00 100644
--- a/airflow/contrib/hooks/databricks_hook.py
+++ b/airflow/contrib/hooks/databricks_hook.py
@@ -33,6 +33,9 @@
 except ImportError:
 import urlparse
 
+RESTART_CLUSTER_ENDPOINT = ("POST", "api/2.0/clusters/restart")
+START_CLUSTER_ENDPOINT = ("POST", "api/2.0/clusters/start")
+TERMINATE_CLUSTER_ENDPOINT = ("POST", "api/2.0/clusters/delete")
 
 SUBMIT_RUN_ENDPOINT = ('POST', 'api/2.0/jobs/runs/submit')
 GET_RUN_ENDPOINT = ('GET', 'api/2.0/jobs/runs/get')
@@ -189,6 +192,15 @@ def cancel_run(self, run_id):
 json = {'run_id': run_id}
 self._do_api_call(CANCEL_RUN_ENDPOINT, json)
 
+def restart_cluster(self, json):
+self._do_api_call(RESTART_CLUSTER_ENDPOINT, json)
+
+def start_cluster(self, json):
+self._do_api_call(START_CLUSTER_ENDPOINT, json)
+
+def terminate_cluster(self, json):
+self._do_api_call(TERMINATE_CLUSTER_ENDPOINT, json)
+
 
 def _retryable_error(exception):
 return isinstance(exception, requests_exceptions.ConnectionError) \
diff --git a/setup.cfg b/setup.cfg
index 622cc1303a..881fe0107d 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -14,6 +14,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
 [metadata]
 name = Airflow
 summary = Airflow is a system to programmatically author, schedule and monitor 
data pipelines.
@@ -34,4 +35,3 @@ all_files = 1
 upload-dir = docs/_build/html
 
 [easy_install]
-
diff --git a/tests/contrib/hooks/test_databricks_hook.py 
b/tests/contrib/hooks/test_databricks_hook.py
index a022431899..04a7c8dc3c 100644
--- a/tests/contrib/hooks/test_databricks_hook.py
+++ b/tests/contrib/hooks/test_databricks_hook.py
@@ -52,6 +52,7 @@
 'node_type_id': 'r3.xlarge',
 'num_workers': 1
 }
+CLUSTER_ID = 'cluster_id'
 RUN_ID = 1
 HOST = 'xx.cloud.databricks.com'
 HOST_WITH_SCHEME = 'https://xx.cloud.databricks.com'
@@ -93,6 +94,26 @@ def cancel_run_endpoint(host):
 return 'https://{}/api/2.0/jobs/runs/cancel'.format(host)
 
 
+def start_cluster_endpoint(host):
+"""
+Utility function to generate the start cluster endpoint given the host.
+"""
+return 'https://{}/api/2.0/clusters/start'.format(host)
+
+
+def restart_cluster_endpoint(host):
+"""
+Utility function to generate the restart cluster endpoint given the host.
+"""
+return 'https://{}/api/2.0/clusters/restart'.format(host)
+
+
+def terminate_cluster_endpoint(host):
+"""
+Utility function to generate the terminate cluster endpoint given the host.
+"""
+return 'https://{}/api/2.0/clusters/delete'.format(host)
+
 def create_valid_response_mock(content):
 response = mock.MagicMock()
 response.json.return_value = content
@@ -293,6 +314,54 @@ def test_cancel_run(self, mock_requests):
 headers=USER_AGENT_HEADER,
 timeout=self.hook.timeout_seconds)
 
+@mock.patch('airflow.contrib.hooks.databricks_hook.requests')
+def test_start_cluster(self, mock_requests):
+mock_requests.codes.ok = 200
+mock_requests.post.return_value.json.return_value = {}
+status_code_mock = mock.PropertyMock(return_value=200)
+type(mock_requests.post.return_value).status_code = status_code_mock
+
+self.hook.start_cluster({"cluster_id": CLUSTER_ID})
+
+mock_requests.post.assert_called_once_with(
+start_cluster_endpoint(HOST),
+json={'cluster_id': CLUSTER_ID},
+auth=(LOGIN, PASSWORD),
+headers=USER_AGENT_HEADER,
+timeout=self.hook.timeout_seconds)
+
+@mock.patch('airflow.contrib.hooks.databricks_hook.requests')
+def test_restart_cluster(self, mock_requests):
+mock_requests.codes.ok = 200
+mock_requests.post.return_value.json.return_value = {}
+status_code_mock = mock.PropertyMock(return_value=200)
+type(mock_requests.post.return_value).status_code = status_code_mock
+
+self.hook.restart_cluster({"cluster_id": CLUSTER_ID})
+
+mock_requests.post.assert_called_once_with(
+restart_cluster_endpoint(HOST),
+json={'cluster_id': CLUSTER_ID},
+auth=(LOGIN, PASSWORD),
+headers=USER_AGENT_HEADER,
+timeout=self.hook.timeout_seconds)
+
+@mock.patch('airflow.contrib.hooks.databricks_hook.requests')
+def test_terminate_cluster(self, mock_requests):
+

[jira] [Resolved] (AIRFLOW-2974) Databricks Cluster Operations

2018-08-31 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2974.
---
Resolution: Fixed

> Databricks Cluster Operations
> -
>
> Key: AIRFLOW-2974
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2974
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks
>Affects Versions: 1.9.0
>Reporter: Wayne Morris
>Assignee: Wayne Morris
>Priority: Major
>  Labels: features
> Fix For: 1.9.0
>
>
> This extends the current databricks hook for adding the functionality of 
> starting, restarting or terminating clusters in databricks.





[jira] [Updated] (AIRFLOW-2974) Databricks Cluster Operations

2018-08-31 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong updated AIRFLOW-2974:
--
Fix Version/s: (was: 1.9.0)
   1.10.1

> Databricks Cluster Operations
> -
>
> Key: AIRFLOW-2974
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2974
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks
>Affects Versions: 1.9.0
>Reporter: Wayne Morris
>Assignee: Wayne Morris
>Priority: Major
>  Labels: features
> Fix For: 1.10.1
>
>
> This extends the current databricks hook for adding the functionality of 
> starting, restarting or terminating clusters in databricks.





[jira] [Commented] (AIRFLOW-2989) No Parameter to change bootDiskType for DataprocClusterCreateOperator

2018-08-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598357#comment-16598357
 ] 

ASF GitHub Bot commented on AIRFLOW-2989:
-

Fokko closed pull request #3825: [AIRFLOW-2989] Add param to set bootDiskType 
in Dataproc Op
URL: https://github.com/apache/incubator-airflow/pull/3825
 
 
   


diff --git a/airflow/contrib/operators/dataproc_operator.py 
b/airflow/contrib/operators/dataproc_operator.py
index 69073f67de..9b6cae5282 100644
--- a/airflow/contrib/operators/dataproc_operator.py
+++ b/airflow/contrib/operators/dataproc_operator.py
@@ -75,10 +75,20 @@ class DataprocClusterCreateOperator(BaseOperator):
 :type properties: dict
 :param master_machine_type: Compute engine machine type to use for the 
master node
 :type master_machine_type: string
+:param master_disk_type: Type of the boot disk for the master node
+(default is ``pd-standard``).
+Valid values: ``pd-ssd`` (Persistent Disk Solid State Drive) or
+``pd-standard`` (Persistent Disk Hard Disk Drive).
+:type master_disk_type: string
 :param master_disk_size: Disk size for the master node
 :type master_disk_size: int
 :param worker_machine_type: Compute engine machine type to use for the 
worker nodes
 :type worker_machine_type: string
+:param worker_disk_type: Type of the boot disk for the worker node
+(default is ``pd-standard``).
+Valid values: ``pd-ssd`` (Persistent Disk Solid State Drive) or
+``pd-standard`` (Persistent Disk Hard Disk Drive).
+:type worker_disk_type: string
 :param worker_disk_size: Disk size for the worker nodes
 :type worker_disk_size: int
 :param num_preemptible_workers: The # of preemptible worker nodes to spin 
up
@@ -141,8 +151,10 @@ def __init__(self,
  image_version=None,
  properties=None,
  master_machine_type='n1-standard-4',
+ master_disk_type='pd-standard',
  master_disk_size=500,
  worker_machine_type='n1-standard-4',
+ worker_disk_type='pd-standard',
  worker_disk_size=500,
  num_preemptible_workers=0,
  labels=None,
@@ -171,8 +183,10 @@ def __init__(self,
 self.image_version = image_version
 self.properties = properties
 self.master_machine_type = master_machine_type
+self.master_disk_type = master_disk_type
 self.master_disk_size = master_disk_size
 self.worker_machine_type = worker_machine_type
+self.worker_disk_type = worker_disk_type
 self.worker_disk_size = worker_disk_size
 self.labels = labels
 self.zone = zone
@@ -272,6 +286,7 @@ def _build_cluster_data(self):
 'numInstances': 1,
 'machineTypeUri': master_type_uri,
 'diskConfig': {
+'bootDiskType': self.master_disk_type,
 'bootDiskSizeGb': self.master_disk_size
 }
 },
@@ -279,6 +294,7 @@ def _build_cluster_data(self):
 'numInstances': self.num_workers,
 'machineTypeUri': worker_type_uri,
 'diskConfig': {
+'bootDiskType': self.worker_disk_type,
 'bootDiskSizeGb': self.worker_disk_size
 }
 },
@@ -292,6 +308,7 @@ def _build_cluster_data(self):
 'numInstances': self.num_preemptible_workers,
 'machineTypeUri': worker_type_uri,
 'diskConfig': {
+'bootDiskType': self.worker_disk_type,
 'bootDiskSizeGb': self.worker_disk_size
 },
 'isPreemptible': True
@@ -401,7 +418,7 @@ class DataprocClusterScaleOperator(BaseOperator):
 cluster_name='cluster-1',
 num_workers=10,
 num_preemptible_workers=10,
-graceful_decommission_timeout='1h'
+graceful_decommission_timeout='1h',
 dag=dag)
 
 .. seealso::
diff --git a/tests/contrib/operators/test_dataproc_operator.py 
b/tests/contrib/operators/test_dataproc_operator.py
index e5cc770321..5b403a86ba 100644
--- a/tests/contrib/operators/test_dataproc_operator.py
+++ b/tests/contrib/operators/test_dataproc_operator.py
@@ -61,8 +61,10 @@
 IMAGE_VERSION = '1.1'
 MASTER_MACHINE_TYPE = 'n1-standard-2'
 MASTER_DISK_SIZE = 100
+MASTER_DISK_TYPE = 'pd-standard'
 WORKER_MACHINE_TYPE = 'n1-standard-2'
 WORKER_DISK_SIZE = 100
+WORKER_DISK_TYPE = 
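The diff above threads the new `master_disk_type`/`worker_disk_type` parameters into the `diskConfig` blocks of the Dataproc cluster request. A standalone sketch of the resulting payload shape (field names follow the Dataproc REST API; the sizes and machine types here are illustrative):

```python
def build_disk_config(disk_type="pd-standard", disk_size_gb=500):
    # Mirrors the diskConfig block the operator now emits per node group.
    return {"bootDiskType": disk_type, "bootDiskSizeGb": disk_size_gb}

cluster_data = {
    "masterConfig": {
        "numInstances": 1,
        "machineTypeUri": "n1-standard-4",
        "diskConfig": build_disk_config("pd-ssd", 100),
    },
    "workerConfig": {
        "numInstances": 2,
        "machineTypeUri": "n1-standard-4",
        "diskConfig": build_disk_config(),  # defaults to pd-standard
    },
}
print(cluster_data["masterConfig"]["diskConfig"]["bootDiskType"])  # pd-ssd
```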

[GitHub] Fokko commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow

2018-08-31 Thread GitBox
Fokko commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger 
job from Airflow
URL: 
https://github.com/apache/incubator-airflow/pull/2708#issuecomment-417577274
 
 
   The CI is failing. @etrabelsi can you pick this up? 




[GitHub] Fokko closed pull request #3825: [AIRFLOW-2989] Add param to set bootDiskType in Dataproc Op

2018-08-31 Thread GitBox
Fokko closed pull request #3825: [AIRFLOW-2989] Add param to set bootDiskType 
in Dataproc Op
URL: https://github.com/apache/incubator-airflow/pull/3825
 
 
   


diff --git a/airflow/contrib/operators/dataproc_operator.py 
b/airflow/contrib/operators/dataproc_operator.py
index 69073f67de..9b6cae5282 100644
--- a/airflow/contrib/operators/dataproc_operator.py
+++ b/airflow/contrib/operators/dataproc_operator.py
@@ -75,10 +75,20 @@ class DataprocClusterCreateOperator(BaseOperator):
 :type properties: dict
 :param master_machine_type: Compute engine machine type to use for the 
master node
 :type master_machine_type: string
+:param master_disk_type: Type of the boot disk for the master node
+(default is ``pd-standard``).
+Valid values: ``pd-ssd`` (Persistent Disk Solid State Drive) or
+``pd-standard`` (Persistent Disk Hard Disk Drive).
+:type master_disk_type: string
 :param master_disk_size: Disk size for the master node
 :type master_disk_size: int
 :param worker_machine_type: Compute engine machine type to use for the 
worker nodes
 :type worker_machine_type: string
+:param worker_disk_type: Type of the boot disk for the worker node
+(default is ``pd-standard``).
+Valid values: ``pd-ssd`` (Persistent Disk Solid State Drive) or
+``pd-standard`` (Persistent Disk Hard Disk Drive).
+:type worker_disk_type: string
 :param worker_disk_size: Disk size for the worker nodes
 :type worker_disk_size: int
 :param num_preemptible_workers: The # of preemptible worker nodes to spin 
up
@@ -141,8 +151,10 @@ def __init__(self,
  image_version=None,
  properties=None,
  master_machine_type='n1-standard-4',
+ master_disk_type='pd-standard',
  master_disk_size=500,
  worker_machine_type='n1-standard-4',
+ worker_disk_type='pd-standard',
  worker_disk_size=500,
  num_preemptible_workers=0,
  labels=None,
@@ -171,8 +183,10 @@ def __init__(self,
 self.image_version = image_version
 self.properties = properties
 self.master_machine_type = master_machine_type
+self.master_disk_type = master_disk_type
 self.master_disk_size = master_disk_size
 self.worker_machine_type = worker_machine_type
+self.worker_disk_type = worker_disk_type
 self.worker_disk_size = worker_disk_size
 self.labels = labels
 self.zone = zone
@@ -272,6 +286,7 @@ def _build_cluster_data(self):
 'numInstances': 1,
 'machineTypeUri': master_type_uri,
 'diskConfig': {
+'bootDiskType': self.master_disk_type,
 'bootDiskSizeGb': self.master_disk_size
 }
 },
@@ -279,6 +294,7 @@ def _build_cluster_data(self):
 'numInstances': self.num_workers,
 'machineTypeUri': worker_type_uri,
 'diskConfig': {
+'bootDiskType': self.worker_disk_type,
 'bootDiskSizeGb': self.worker_disk_size
 }
 },
@@ -292,6 +308,7 @@ def _build_cluster_data(self):
 'numInstances': self.num_preemptible_workers,
 'machineTypeUri': worker_type_uri,
 'diskConfig': {
+'bootDiskType': self.worker_disk_type,
 'bootDiskSizeGb': self.worker_disk_size
 },
 'isPreemptible': True
@@ -401,7 +418,7 @@ class DataprocClusterScaleOperator(BaseOperator):
 cluster_name='cluster-1',
 num_workers=10,
 num_preemptible_workers=10,
-graceful_decommission_timeout='1h'
+graceful_decommission_timeout='1h',
 dag=dag)
 
 .. seealso::
diff --git a/tests/contrib/operators/test_dataproc_operator.py 
b/tests/contrib/operators/test_dataproc_operator.py
index e5cc770321..5b403a86ba 100644
--- a/tests/contrib/operators/test_dataproc_operator.py
+++ b/tests/contrib/operators/test_dataproc_operator.py
@@ -61,8 +61,10 @@
 IMAGE_VERSION = '1.1'
 MASTER_MACHINE_TYPE = 'n1-standard-2'
 MASTER_DISK_SIZE = 100
+MASTER_DISK_TYPE = 'pd-standard'
 WORKER_MACHINE_TYPE = 'n1-standard-2'
 WORKER_DISK_SIZE = 100
+WORKER_DISK_TYPE = 'pd-standard'
 NUM_PREEMPTIBLE_WORKERS = 2
 GET_INIT_ACTION_TIMEOUT = "600s"  # 10m
 LABEL1 = {}
@@ -125,8 +127,10 @@ def setUp(self):
 storage_bucket=STORAGE_BUCKET,
 image_version=IMAGE_VERSION,
   

[jira] [Resolved] (AIRFLOW-2989) No Parameter to change bootDiskType for DataprocClusterCreateOperator

2018-08-31 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-2989.
---
Resolution: Fixed

> No Parameter to change bootDiskType for DataprocClusterCreateOperator 
> --
>
> Key: AIRFLOW-2989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2989
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, gcp
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.1
>
>
> Currently, we cannot set the primary disk type for the master and workers to 
> `pd-ssd` in DataprocClusterCreateOperator.
> Google API: 
> https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#diskconfig
> Related StackOverflow Issue: 
> https://stackoverflow.com/questions/52090315/airflow-dataprocclustercreateoperator/52092942#52092942



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
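The `diskConfig` addition resolved above (see the diff earlier in this digest) can be sketched as follows. This is a hedged illustration, not Airflow's actual `_build_cluster_data` code; the `build_disk_config` helper name is invented here, while the `bootDiskType` / `bootDiskSizeGb` keys and the `pd-standard` / `pd-ssd` values come from the diff and the Dataproc DiskConfig API.

```python
# Illustrative sketch of the diskConfig payload the operator now builds.
# 'bootDiskType' accepts the Dataproc values 'pd-standard' (default) or 'pd-ssd'.
def build_disk_config(boot_disk_type, boot_disk_size_gb):
    """Mirror of the diskConfig block added for the master and worker groups."""
    return {
        'bootDiskType': boot_disk_type,
        'bootDiskSizeGb': boot_disk_size_gb,
    }

# Master and worker groups each get their own disk type and size, as in the diff.
master_config = {
    'numInstances': 1,
    'machineTypeUri': 'n1-standard-2',
    'diskConfig': build_disk_config('pd-ssd', 100),
}
```

The same `diskConfig` shape is reused for the worker and preemptible-worker groups, with the preemptible group additionally setting `'isPreemptible': True`.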


[GitHub] Fokko commented on issue #3820: [AIRFLOW-XXX] Fix Docstrings for Hooks/Operators

2018-08-31 Thread GitBox
Fokko commented on issue #3820: [AIRFLOW-XXX] Fix Docstrings for Hooks/Operators
URL: 
https://github.com/apache/incubator-airflow/pull/3820#issuecomment-417576288
 
 
   
![image](https://user-images.githubusercontent.com/1134248/44898447-3aec4200-acff-11e8-9254-81d18b7a4167.png)
   
   Why is this `region_name` not annotated with `(str)`? It looks good in the 
docstring.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] r39132 closed pull request #1875: [AIRFLOW-620] Add log refresh button to TI's log view page

2018-08-31 Thread GitBox
r39132 closed pull request #1875: [AIRFLOW-620] Add log refresh button to TI's 
log view page
URL: https://github.com/apache/incubator-airflow/pull/1875
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/www/templates/airflow/log.html 
b/airflow/www/templates/airflow/log.html
new file mode 100644
index 00..51cce6b98c
--- /dev/null
+++ b/airflow/www/templates/airflow/log.html
@@ -0,0 +1,76 @@
+{#
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+#}
+{% extends "airflow/task_instance.html" %}
+{% block title %}Airflow - DAGs{% endblock %}
+
+{% block body %}
+{{ super() }}
+
+
+
+Oops.
+
+
+
+
+Log
+
+{% if log %}
+{{ log }}
+
+
+
+
+{% endif %}
+{% endblock %}
+
+{% block tail %}
+{{ super() }}
+
+function error(msg) {
+$('#error_msg').html(msg);
+$('#error').show();
+reset();
+}
+
+function reset() {
+$('#loading').css('display', 'none');
+$('pre#log_block').show();
+$(".refresh_button").removeClass('disabled');
+}
+
+$(".refresh_button").on("click", function() {
+$("#loading").css('display', 'block');
+$("pre#log_block").hide();
+$('#error').hide();
+$(".refresh_button").addClass('disabled');
+$.get(decodeURIComponent(window.location.href)
+).done(
+function(data) {
+$("pre#log_block").html(data);
+reset();
+window.scrollTo(0, document.body.scrollHeight);
+}
+).fail(function(jqxhr, textStatus, err) {
+error(textStatus + ': ' + err);
+});
+});
+
+
+{% endblock %}
diff --git a/airflow/www/views.py b/airflow/www/views.py
index de33843945..b85f7841b1 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -791,9 +791,14 @@ def log(self):
 if PY2 and not isinstance(log, unicode):
 log = log.decode('utf-8')
 
+if request.is_xhr:
+return log
+
+title = "Log"
+
 return self.render(
-'airflow/ti_code.html',
-code=log, dag=dag, title="Log", task_id=task_id,
+'airflow/log.html',
+log=log, dag=dag, title=title, task_id=task_id,
 execution_date=execution_date, form=form)
 
 @expose('/task')
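The views.py hunk above boils down to one branch: an XHR (AJAX) request from the refresh button gets the raw log text, while a normal page load gets the log wrapped in the rendered template. A framework-free sketch of that pattern, assuming a hypothetical `render_log_response` helper and an illustrative HTML wrapper rather than Airflow's real `self.render` call:

```python
# Sketch of the XHR branch added in views.py: AJAX refreshes receive the bare
# log so the page's $.get() handler can drop it straight into pre#log_block;
# regular requests receive the full wrapped page.
def render_log_response(log, is_xhr):
    if is_xhr:
        # request.is_xhr branch: return only the log text
        return log
    # Normal page load: wrap the log for the template (illustrative markup)
    return '<pre id="log_block">{}</pre>'.format(log)
```

This keeps one endpoint serving both the initial page and the in-place refresh, which is why the template's JavaScript can simply re-GET `window.location.href`.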


 




[GitHub] r39132 commented on issue #1875: [AIRFLOW-620] Add log refresh button to TI's log view page

2018-08-31 Thread GitBox
r39132 commented on issue #1875: [AIRFLOW-620] Add log refresh button to TI's 
log view page
URL: 
https://github.com/apache/incubator-airflow/pull/1875#issuecomment-417568453
 
 
   Closing for now. Reopen to update and rebase.




[GitHub] feng-tao commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro

2018-08-31 Thread GitBox
feng-tao commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and 
next_ds_nodash macro
URL: 
https://github.com/apache/incubator-airflow/pull/3821#issuecomment-417564477
 
 
   Thanks @r39132 . Will keep in mind next time.




[GitHub] r39132 commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro

2018-08-31 Thread GitBox
r39132 commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and 
next_ds_nodash macro
URL: 
https://github.com/apache/incubator-airflow/pull/3821#issuecomment-417561701
 
 
   FYI, don't forget to close JIRA issues when you merge. For future 
reference, I just closed https://issues.apache.org/jira/browse/AIRFLOW-2983 



