[GitHub] zhongjiajie commented on issue #4773: [AIRFLOW-3767] Correct bulk insert function

2019-02-27 Thread GitBox
zhongjiajie commented on issue #4773: [AIRFLOW-3767] Correct bulk insert 
function
URL: https://github.com/apache/airflow/pull/4773#issuecomment-468174775
 
 
   CI tests failed many, many, many times with no detailed reason :sob: :sob: :sob:


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ryanyuan commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor

2019-02-27 Thread GitBox
ryanyuan commented on a change in pull request #4786: [AIRFLOW-3966] Correct 
default bigquery_conn_id in BigQueryTableSensor
URL: https://github.com/apache/airflow/pull/4786#discussion_r261073067
 
 

 ##
 File path: airflow/contrib/sensors/bigquery_sensor.py
 ##
 @@ -50,7 +50,7 @@ def __init__(self,
                  project_id,
                  dataset_id,
                  table_id,
-                 bigquery_conn_id='bigquery_default_conn',
+                 bigquery_conn_id='bigquery_default',
 
 Review comment:
   @mik-laj Cool! :thumbsup:


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on issue #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes issue in 2.6)

2019-02-27 Thread GitBox
XD-DENG commented on issue #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes 
issue in 2.6)
URL: https://github.com/apache/airflow/pull/4801#issuecomment-468151503
 
 
   @feng-tao @Fokko PTAL.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-2767) Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE

2019-02-27 Thread Tao Feng (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Feng resolved AIRFLOW-2767.
---
Resolution: Fixed
  Assignee: (was: Siddharth Anand)

> Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE
> 
>
> Key: AIRFLOW-2767
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2767
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Siddharth Anand
>Priority: Major
>
> Refer to the moderate-severity CVE in gunicorn 19.4.5 (apparently fixed in 
> 19.5.0)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-1000164] 
> Currently, apache airflow's setup.py allows 19.4.0
> -s
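
For context, the fix is a one-line version-bound change in setup.py. A minimal
hedged sketch of such a constraint (the surrounding entries and exact bounds
are assumptions, not Airflow's actual setup.py):

{code:python}
# Hypothetical setup.py excerpt; neighbors and exact bounds are assumptions.
install_requires = [
    'gunicorn>=19.5.0',  # previously 19.4.0 was allowed, which is
                         # affected by CVE-2018-1000164
]
{code}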



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2767) Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE

2019-02-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780144#comment-16780144
 ] 

ASF subversion and git services commented on AIRFLOW-2767:
--

Commit 71140dd2dfb63f16254420b8ba3a4a62b5919f45 in airflow's branch 
refs/heads/master from RosterIn
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=71140dd ]

[AIRFLOW-2767] - Upgrade gunicorn to 19.5.0 to avoid moderate-severity CVE 
(#4795)

Upgrade gunicorn to 19.5.0 to avoid moderate-severity CVE

> Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE
> 
>
> Key: AIRFLOW-2767
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2767
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Major
>
> Refer to the moderate-severity CVE in gunicorn 19.4.5 (apparently fixed in 
> 19.5.0)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-1000164] 
> Currently, apache airflow's setup.py allows 19.4.0
> -s



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] feng-tao commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…

2019-02-27 Thread GitBox
feng-tao commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to 
avoid moderate-severit…
URL: https://github.com/apache/airflow/pull/4795#issuecomment-468149763
 
 
   thanks @RosterIn 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2767) Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780141#comment-16780141
 ] 

ASF GitHub Bot commented on AIRFLOW-2767:
-

feng-tao commented on pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 
19.5.0 to avoid moderate-severit…
URL: https://github.com/apache/airflow/pull/4795
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Upgrade gunicorn to 19.5.0 or greater to avoid moderate-severity CVE
> 
>
> Key: AIRFLOW-2767
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2767
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Major
>
> Refer to the moderate-severity CVE in gunicorn 19.4.5 (apparently fixed in 
> 19.5.0)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-1000164] 
> Currently, apache airflow's setup.py allows 19.4.0
> -s



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] feng-tao merged pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…

2019-02-27 Thread GitBox
feng-tao merged pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 
to avoid moderate-severit…
URL: https://github.com/apache/airflow/pull/4795
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…

2019-02-27 Thread GitBox
codecov-io commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 
to avoid moderate-severit…
URL: https://github.com/apache/airflow/pull/4795#issuecomment-468144716
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=h1) 
Report
   > Merging 
[#4795](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/b0c4d37fb5cbc471097f2383ac2f1c4f37a5c859?src=pr=desc)
 will **increase** coverage by `0.77%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/4795/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #4795      +/-   ##
   ==========================================
   + Coverage   74.44%   75.22%    +0.77%
   ==========================================
     Files         450      450
     Lines       28970    30099    +1129
   ==========================================
   + Hits        21567    22641    +1074
   - Misses       7403     7458      +55
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/task/task\_runner/base\_task\_runner.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy90YXNrL3Rhc2tfcnVubmVyL2Jhc2VfdGFza19ydW5uZXIucHk=)
 | `78.57% <0%> (-0.74%)` | :arrow_down: |
   | 
[...irflow/contrib/example\_dags/example\_gcp\_spanner.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2V4YW1wbGVfZGFncy9leGFtcGxlX2djcF9zcGFubmVyLnB5)
 | `0% <0%> (ø)` | :arrow_up: |
   | 
[.../kubernetes\_request\_factory/pod\_request\_factory.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2t1YmVybmV0ZXMva3ViZXJuZXRlc19yZXF1ZXN0X2ZhY3RvcnkvcG9kX3JlcXVlc3RfZmFjdG9yeS5weQ==)
 | `100% <0%> (ø)` | :arrow_up: |
   | 
[airflow/ti\_deps/dep\_context.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcF9jb250ZXh0LnB5)
 | `100% <0%> (ø)` | :arrow_up: |
   | 
[airflow/models/taskreschedule.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza3Jlc2NoZWR1bGUucHk=)
 | `100% <0%> (ø)` | :arrow_up: |
   | 
[airflow/operators/python\_operator.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcHl0aG9uX29wZXJhdG9yLnB5)
 | `96.63% <0%> (+0.8%)` | :arrow_up: |
   | 
[airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=)
 | `94.05% <0%> (+1.4%)` | :arrow_up: |
   | 
[airflow/contrib/utils/gcp\_field\_validator.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL3V0aWxzL2djcF9maWVsZF92YWxpZGF0b3IucHk=)
 | `93.67% <0%> (+2.14%)` | :arrow_up: |
   | 
[airflow/utils/dates.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYXRlcy5weQ==)
 | `85.71% <0%> (+2.2%)` | :arrow_up: |
   | 
[airflow/contrib/hooks/gcp\_vision\_hook.py](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2djcF92aXNpb25faG9vay5weQ==)
 | `86.36% <0%> (+3.5%)` | :arrow_up: |
   | ... and [1 
more](https://codecov.io/gh/apache/airflow/pull/4795/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=footer). 
Last update 
[b0c4d37...66f5152](https://codecov.io/gh/apache/airflow/pull/4795?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io commented on issue #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes issue in 2.6)

2019-02-27 Thread GitBox
codecov-io commented on issue #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 
fixes issue in 2.6)
URL: https://github.com/apache/airflow/pull/4801#issuecomment-468137177
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=h1) 
Report
   > Merging 
[#4801](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/4801/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #4801      +/-   ##
   ==========================================
   + Coverage   74.44%   74.44%    +<.01%
   ==========================================
     Files         450      450
     Lines       28970    28970
   ==========================================
   + Hits        21566    21567        +1
   + Misses       7404     7403        -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/airflow/pull/4801/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=)
 | `92.64% <0%> (+0.05%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=footer). 
Last update 
[2ade912...ab1bf06](https://codecov.io/gh/apache/airflow/pull/4801?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] fenglu-g commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling

2019-02-27 Thread GitBox
fenglu-g commented on a change in pull request #4769: [AIRFLOW-2511] Fix 
improper failed session commit handling
URL: https://github.com/apache/airflow/pull/4769#discussion_r261043400
 
 

 ##
 File path: airflow/jobs.py
 ##
 @@ -2168,147 +2168,149 @@ def _process_backfill_task_instances(self,
             # or leaf to root, as otherwise tasks might be
             # determined deadlocked while they are actually
             # waiting for their upstream to finish
+            @provide_session
 
 Review comment:
   Any other concerns? @Fokko @ashb 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] fenglu-g commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling

2019-02-27 Thread GitBox
fenglu-g commented on a change in pull request #4769: [AIRFLOW-2511] Fix 
improper failed session commit handling
URL: https://github.com/apache/airflow/pull/4769#discussion_r261043313
 
 

 ##
 File path: airflow/jobs.py
 ##
 @@ -2168,147 +2168,149 @@ def _process_backfill_task_instances(self,
             # or leaf to root, as otherwise tasks might be
             # determined deadlocked while they are actually
             # waiting for their upstream to finish
+            @provide_session
 
 Review comment:
   I think SQLAlchemy does the pooling but is un-opinionated about how sessions 
are managed. The following access pattern is recommended per 
https://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it,
 which is what Airflow follows: 
https://github.com/apache/airflow/blob/c50a85146373bafb0cbf86850f834d63bd4dede8/airflow/utils/db.py#L37.
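
For readers of the archive, that recommended pattern in a minimal
self-contained sketch (a `sessionmaker`-based factory as in the SQLAlchemy
docs; the names are illustrative, not Airflow's exact `provide_session` code):

```python
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite://")  # connection pooling lives on the engine
Session = sessionmaker(bind=engine)  # session factory bound to the engine


@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    session = Session()
    try:
        yield session
        session.commit()    # commit once the unit of work succeeds
    except Exception:
        session.rollback()  # roll back on any failure
        raise
    finally:
        session.close()     # return the connection to the pool
```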
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG opened a new pull request #4801: [AIRFLOW-XXX] Unpin cryptography (2.6.1 fixes issue in 2.6)

2019-02-27 Thread GitBox
XD-DENG opened a new pull request #4801: [AIRFLOW-XXX] Unpin cryptography 
(2.6.1 fixes issue in 2.6)
URL: https://github.com/apache/airflow/pull/4801
 
 
   There was an issue in `cryptography` 2.6, and 
https://github.com/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e
 pinned its version to avoid the issue.
   
   `cryptography` 2.6.1 was released very quickly to fix this issue, so we can 
now remove the pin on `cryptography`.
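
A hedged sketch of the change: dropping the temporary upper bound so pip can
pick up 2.6.1 (the exact constraint strings in Airflow's `setup.py` are
assumptions here, not the real entries):

```python
# Hypothetical setup.py excerpt; exact bounds are assumptions.
install_requires = [
    # 'cryptography>=2.0.0,<2.6',  # temporary pin while 2.6 was broken
    'cryptography>=2.0.0',         # unpinned now that 2.6.1 ships the fix
]
```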


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task

2019-02-27 Thread GitBox
codecov-io edited a comment on issue #4781: [AIRFLOW-3962] Added graceful 
handling for creation of dag_run of a dag which doesn't have any task
URL: https://github.com/apache/airflow/pull/4781#issuecomment-467492161
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=h1) 
Report
   > Merging 
[#4781](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/4781/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #4781      +/-   ##
   ==========================================
   + Coverage   74.44%   74.44%    +<.01%
   ==========================================
     Files         450      450
     Lines       28970    28970
   ==========================================
   + Hits        21566    21567        +1
   + Misses       7404     7403        -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/4781/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5)
 | `76.46% <100%> (ø)` | :arrow_up: |
   | 
[airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/airflow/pull/4781/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=)
 | `92.64% <0%> (+0.05%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=footer). 
Last update 
[2ade912...690aed5](https://codecov.io/gh/apache/airflow/pull/4781?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io commented on issue #4799: [AIRFLOW-3975] Handle null inputs in attribute renderers.

2019-02-27 Thread GitBox
codecov-io commented on issue #4799: [AIRFLOW-3975] Handle null inputs in 
attribute renderers.
URL: https://github.com/apache/airflow/pull/4799#issuecomment-468114720
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=h1) 
Report
   > Merging 
[#4799](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e?src=pr=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/4799/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #4799      +/-   ##
   ==========================================
   + Coverage   74.44%   74.45%    +0.01%
   ==========================================
     Files         450      450
     Lines       28970    28969       -1
   ==========================================
   + Hits        21566    21569       +3
   + Misses       7404     7400       -4
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/utils.py](https://codecov.io/gh/apache/airflow/pull/4799/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdXRpbHMucHk=)
 | `75.39% <100%> (+1.43%)` | :arrow_up: |
   | 
[airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/airflow/pull/4799/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=)
 | `92.64% <0%> (+0.05%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=footer). 
Last update 
[2ade912...3e65674](https://codecov.io/gh/apache/airflow/pull/4799?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780002#comment-16780002
 ] 

ASF GitHub Bot commented on AIRFLOW-3853:
-

samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete 
local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3853) issues and references 
them in the PR title. For example, "\[AIRFLOW-3853\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3853
   
   ### Description
   
   - [X] We've recently started to see duplicate logs in S3. After digging into 
it, we discovered that this was due to our use of the new `reschedule` mode on 
our sensors. Because the same `try_number` is used when a task reschedules, the 
local log file frequently contains results from previous attempts. 
Additionally, because the `s3_task_helper.py` always tries to `append` the 
local log file to the remote log file, this can result in massive logs (we 
found one that was 400 MB).
   
   To fix this, we'd like to remove the local log after a successful upload. 
Because the file is uploaded to S3, no data will be lost. (A hedged sketch of 
this behavior follows the checklist below.)
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason: I've modified the following unit tests to cover the 
change to `s3_write`: `test_write`, `test_write_existing`, `test_write_raises`.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [X] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Duplicate Logs appearing in S3
> --
>
> Key: AIRFLOW-3853
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3853
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.2
>Reporter: Sam Bock
>Assignee: Sam Bock
>Priority: Major
>
> We've recently started to see duplicate logs in S3. After digging into it, we 
> discovered that this was due to our use of the new `reschedule` mode on our 
> sensors. Because the same `try_number` is used when a task reschedules, the 
> local log file frequently contains results from previous attempts. 
> Additionally, because the `s3_task_helper.py` always tries to `append` the 
> local log file to the remote log file, this can result in massive logs (we 
> found one that was 400 MB).
> To fix this, we'd like to remove the local log after a successful upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780001#comment-16780001
 ] 

ASF GitHub Bot commented on AIRFLOW-3853:
-

samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete 
local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Duplicate Logs appearing in S3
> --
>
> Key: AIRFLOW-3853
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3853
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.2
>Reporter: Sam Bock
>Assignee: Sam Bock
>Priority: Major
>
> We've recently started to see duplicate logs in S3. After digging into it, we 
> discovered that this was due to our use of the new `reschedule` mode on our 
> sensors. Because the same `try_number` is used when a task reschedules, the 
> local log file frequently contains results from previous attempts. 
> Additionally, because the `s3_task_helper.py` always tries to `append` the 
> local log file to the remote log file, this can result in massive logs (we 
> found one that was 400 MB).
> To fix this, we'd like to remove the local log after a successful upload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload

2019-02-27 Thread GitBox
samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local 
logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload

2019-02-27 Thread GitBox
samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete 
local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3853) issues and references 
them in the PR title. For example, "\[AIRFLOW-3853\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3853
   
   ### Description
   
   - [X] We've recently started to see duplicate logs in S3. After digging into 
it, we discovered that this was due to our use of the new `reschedule` mode on 
our sensors. Because the same `try_number` is used when a task reschedules, the 
local log file frequently contains results from previous attempts. 
Additionally, because the `s3_task_helper.py` always tries to `append` the 
local log file to the remote log file, this can result in massive logs (we 
found one that was 400 MB).
   
   To fix this, we'd like to remove the local log after a successful upload. 
Because the file is uploaded to S3, no data will be lost.
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason: I've modified the following unit tests to cover the 
change to `s3_write`: `test_write`, `test_write_existing`, `test_write_raises`.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [X] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] zhongjiajie commented on a change in pull request #4773: [AIRFLOW-3767] Correct bulk insert function

2019-02-27 Thread GitBox
zhongjiajie commented on a change in pull request #4773: [AIRFLOW-3767] Correct 
bulk insert function
URL: https://github.com/apache/airflow/pull/4773#discussion_r261014804
 
 

 ##
 File path: airflow/hooks/oracle_hook.py
 ##
 @@ -199,12 +199,20 @@ def bulk_insert_rows(self, table, rows, target_fields=None, commit_every=5000):
             Default 5000. Set greater than 0. Set 1 to insert each row in each transaction
             :type commit_every: int
         """
+        if not rows:
+            raise ValueError("parameter rows could not be None or empty iterable")
         conn = self.get_conn()
         cursor = conn.cursor()
-        values = ', '.join(':%s' % i for i in range(1, len(target_fields) + 1))
-        prepared_stm = 'insert into {tablename} ({columns}) values ({values})'.format(
+        if target_fields:
+            columns = ', '.join(target_fields)
 
 Review comment:
   @Fokko Changed the code as you suggested; waiting for the CI to pass. PTAL.
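
For context, a runnable sketch of the statement-building logic under review,
using hypothetical standalone names (the real code lives inside
`OracleHook.bulk_insert_rows`):

```python
def build_bulk_insert_statement(table, row_width, target_fields=None):
    """Build an Oracle INSERT with one bind placeholder per column."""
    values = ', '.join(':%s' % i for i in range(1, row_width + 1))
    if target_fields:
        columns = ', '.join(target_fields)
        return 'insert into {} ({}) values ({})'.format(table, columns, values)
    # Without target_fields, insert positionally into all table columns.
    return 'insert into {} values ({})'.format(table, values)


# build_bulk_insert_statement('people', 2, ['name', 'age'])
# -> 'insert into people (name, age) values (:1, :2)'
```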


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib

2019-02-27 Thread GitBox
feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib
URL: https://github.com/apache/airflow/pull/4800#issuecomment-468094775
 
 
   @XD-DENG, let's keep it as it is. If anyone confirms that the issue has 
been fixed, we can unpin it later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] zhongjiajie commented on issue #4699: [AIRFLOW-3881] Correct to_csv row number

2019-02-27 Thread GitBox
zhongjiajie commented on issue #4699: [AIRFLOW-3881] Correct to_csv row number
URL: https://github.com/apache/airflow/pull/4699#issuecomment-468094463
 
 
   @Fokko You are welcome.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io commented on issue #4797: [AIRFLOW-3973] Run each Alembic migration in its own transaction

2019-02-27 Thread GitBox
codecov-io commented on issue #4797: [AIRFLOW-3973] Run each Alembic migration 
in its own transaction
URL: https://github.com/apache/airflow/pull/4797#issuecomment-468092563
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/4797?src=pr=h1) 
Report
   > Merging 
[#4797](https://codecov.io/gh/apache/airflow/pull/4797?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/2ade9126588cef252cc7406a4729976f95e1c66e?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/4797/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/4797?src=pr=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #4797   +/-   ##
   =======================================
     Coverage   74.44%   74.44%
   =======================================
     Files         450      450
     Lines       28970    28970
   =======================================
     Hits        21566    21566
     Misses       7404     7404
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/4797?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/4797?src=pr=footer). 
Last update 
[2ade912...ad67c68](https://codecov.io/gh/apache/airflow/pull/4797?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib

2019-02-27 Thread GitBox
XD-DENG commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib
URL: https://github.com/apache/airflow/pull/4800#issuecomment-468075302
 
 
   Hi @feng-tao, cryptography 2.6.1 was released a few minutes ago.
   
   I'm afk. Do you want to try unpinning it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao merged pull request #4800: [AIRFLOW-XXX] Fix CI for broken lib

2019-02-27 Thread GitBox
feng-tao merged pull request #4800: [AIRFLOW-XXX] Fix CI for broken lib
URL: https://github.com/apache/airflow/pull/4800
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib

2019-02-27 Thread GitBox
feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib
URL: https://github.com/apache/airflow/pull/4800#issuecomment-468072275
 
 
   CI fixed. merged now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib

2019-02-27 Thread GitBox
feng-tao commented on issue #4800: [AIRFLOW-XXX] Fix CI for broken lib
URL: https://github.com/apache/airflow/pull/4800#issuecomment-468062616
 
 
   PTAL @Fokko @kaxil @ashb @XD-DENG 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ryanyuan commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor

2019-02-27 Thread GitBox
ryanyuan commented on a change in pull request #4786: [AIRFLOW-3966] Correct 
default bigquery_conn_id in BigQueryTableSensor
URL: https://github.com/apache/airflow/pull/4786#discussion_r260972377
 
 

 ##
 File path: airflow/contrib/sensors/bigquery_sensor.py
 ##
 @@ -50,7 +50,7 @@ def __init__(self,
                  project_id,
                  dataset_id,
                  table_id,
-                 bigquery_conn_id='bigquery_default_conn',
+                 bigquery_conn_id='bigquery_default',
 
 Review comment:
   @mik-laj @Fokko Agreed.
   
   @mik-laj Nice initiative with these changes. Will you have time to continue 
working on them? If not, I would love to take over the whole task. Cheers.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3975) Handle null values in attr renderers

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779794#comment-16779794
 ] 

ASF GitHub Bot commented on AIRFLOW-3975:
-

jmcarp commented on pull request #4799: [AIRFLOW-3975] Handle null inputs in 
attribute renderers.
URL: https://github.com/apache/airflow/pull/4799
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3975
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement 
Proposal([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Handle null values in attr renderers
> 
>
> Key: AIRFLOW-3975
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3975
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Josh Carp
>Assignee: Josh Carp
>Priority: Trivial
>
> Some renderers in `attr_renderers` raise unhandled exceptions when given null 
> inputs. For example, the `python_callable` renderer raises an error if passed 
> `None`. Some operators allow null values for this attribute, such as 
> `TriggerDagRunOperator`. I think all renderers should handle null input by 
> returning the empty string and not raising an exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] jmcarp opened a new pull request #4799: [AIRFLOW-3975] Handle null inputs in attribute renderers.

2019-02-27 Thread GitBox
jmcarp opened a new pull request #4799: [AIRFLOW-3975] Handle null inputs in 
attribute renderers.
URL: https://github.com/apache/airflow/pull/4799
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3975
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement 
Proposal([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3975) Handle null values in attr renderers

2019-02-27 Thread Josh Carp (JIRA)
Josh Carp created AIRFLOW-3975:
--

 Summary: Handle null values in attr renderers
 Key: AIRFLOW-3975
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3975
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Josh Carp
Assignee: Josh Carp


Some renderers in `attr_renderers` raise unhandled exceptions when given null 
inputs. For example, the `python_callable` renderer raises an error if passed 
`None`. Some operators allow null values for this attribute, such as 
`TriggerDagRunOperator`. I think all renderers should handle null input by 
returning the empty string and not raising an exception.
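
A minimal sketch of the proposed behavior, using a hypothetical `null_safe`
wrapper (not the actual `attr_renderers` code):

{code:python}
import inspect


def null_safe(renderer):
    """Wrap a renderer so a None attribute renders as '' instead of raising."""
    def wrapped(value):
        if value is None:
            return ''
        return renderer(value)
    return wrapped


# Hypothetical example: a python_callable renderer that would otherwise
# crash on inspect.getsource(None).
render_python_callable = null_safe(inspect.getsource)
assert render_python_callable(None) == ''
{code}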



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] tayloramurphy opened a new pull request #4798: Add GitLab to list of organizations using Airflow

2019-02-27 Thread GitBox
tayloramurphy opened a new pull request #4798: Add GitLab to list of 
organizations using Airflow
URL: https://github.com/apache/airflow/pull/4798
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement 
Proposal([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Adds GitLab as an organization using Airflow.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lucafuji edited a comment on issue #4783: [AIRFLOW-578] Fix check return code

2019-02-27 Thread GitBox
lucafuji edited a comment on issue #4783: [AIRFLOW-578] Fix check return code 
URL: https://github.com/apache/airflow/pull/4783#issuecomment-468023066
 
 
   > thanks @lucafuji for the fix. Just curious: does uber use Airflow as well? 
If that's the case, it would be great to put the company under "who uses 
Airflow" section :) BTW, very cool for your team's aresdb project :)
   
   Thanks, @feng-tao.  We have an internal workflow management system. I'm not 
very familiar with that now but if you want to touch base, I can connect you 
with their manager. 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lucafuji commented on issue #4783: [AIRFLOW-578] Fix check return code

2019-02-27 Thread GitBox
lucafuji commented on issue #4783: [AIRFLOW-578] Fix check return code 
URL: https://github.com/apache/airflow/pull/4783#issuecomment-468025010
 
 
   BTW @feng-tao, it seems the fix does not integrate well with the 
impersonation test.
   I took a look at the tests, but I'm not sure what I'm doing wrong. Can you 
help validate whether treating a non-zero return code as a failure will break 
impersonation? If that's the case, it would be better for someone from Airflow 
to fix https://issues.apache.org/jira/browse/AIRFLOW-578 another way.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lucafuji edited a comment on issue #4783: [AIRFLOW-578] Fix check return code

2019-02-27 Thread GitBox
lucafuji edited a comment on issue #4783: [AIRFLOW-578] Fix check return code 
URL: https://github.com/apache/airflow/pull/4783#issuecomment-468023066
 
 
   > thanks @lucafuji for the fix. Just curious: does uber use Airflow as well? 
If that's the case, it would be great to put the company under "who uses 
Airflow" section :) BTW, very cool for your team's aresdb project :)
   
   Thanks, @feng-tao.  We have an internal workflow management tool which was 
forked from airflow 3 years ago. I'm not very familiar with that now but if you 
want to touch base, I can connect you with their manager. 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lucafuji commented on a change in pull request #4783: [AIRFLOW-578] Fix check return code

2019-02-27 Thread GitBox
lucafuji commented on a change in pull request #4783: [AIRFLOW-578] Fix check 
return code 
URL: https://github.com/apache/airflow/pull/4783#discussion_r260934052
 
 

 ##
 File path: airflow/jobs.py
 ##
 @@ -2559,7 +2569,13 @@ def signal_handler(signum, frame):
         while True:
             # Monitor the task to see if it's done
             return_code = self.task_runner.return_code()
+
             if return_code is not None:
+                if return_code != 0:
+                    msg = ("LocalTaskJob process exited with non zero status "
+                           "{}".format(return_code))
+                    raise AirflowException(msg)
 
 Review comment:
   That's exactly the issue in 
[AIRFLOW-578](https://issues.apache.org/jira/browse/AIRFLOW-578): BaseJob 
ignores the return code of the spawned process, so even when that process is 
killed or exits abnormally, the job thinks it finished successfully.
   Raising an exception here makes the job finish as failed.
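
A toy illustration of that failure mode, with hypothetical names (not
Airflow's actual LocalTaskJob loop):

```python
import subprocess


def run_and_check(cmd):
    """Spawn a child process and refuse to treat a non-zero exit as success."""
    return_code = subprocess.Popen(cmd).wait()
    if return_code != 0:
        # Without this check, a killed or crashed child would be
        # silently reported as a successful run.
        raise RuntimeError(
            "process exited with non zero status {}".format(return_code))


# run_and_check(["false"])  # would raise RuntimeError
```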


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lucafuji commented on issue #4783: [AIRFLOW-578] Fix check return code

2019-02-27 Thread GitBox
lucafuji commented on issue #4783: [AIRFLOW-578] Fix check return code 
URL: https://github.com/apache/airflow/pull/4783#issuecomment-468023066
 
 
   > thanks @lucafuji for the fix. Just curious: does uber use Airflow as well? 
If that's the case, it would be great to put the company under "who uses 
Airflow" section :) BTW, very cool for your team's aresdb project :)
   
   Thanks, @feng-tao.  We have an internal workflow management tool which was 
forked from airflow 3 years ago. I'm not very familiar with that now but if you 
want to touch base, I can connect you with their manager. 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lucafuji commented on a change in pull request #4783: [AIRFLOW-578] Fix check return code

2019-02-27 Thread GitBox
lucafuji commented on a change in pull request #4783: [AIRFLOW-578] Fix check 
return code 
URL: https://github.com/apache/airflow/pull/4783#discussion_r260934052
 
 

 ##
 File path: airflow/jobs.py
 ##
 @@ -2559,7 +2569,13 @@ def signal_handler(signum, frame):
         while True:
             # Monitor the task to see if it's done
             return_code = self.task_runner.return_code()
+
             if return_code is not None:
+                if return_code != 0:
+                    msg = ("LocalTaskJob process exited with non zero status "
+                           "{}".format(return_code))
+                    raise AirflowException(msg)
 
 Review comment:
   That's exactly the issue in 
[AIRFLOW-578](https://issues.apache.org/jira/browse/AIRFLOW-578): BaseJob 
ignores the return code of the spawned process, so even when that process is 
killed or exits abnormally, the job thinks it finished successfully.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3974) Having task with `trigger_rule='one_success'` causes failed dag to be marked succesful

2019-02-27 Thread David (JIRA)
David created AIRFLOW-3974:
--

 Summary: Having task with `trigger_rule='one_success'` causes 
failed dag to be marked succesful
 Key: AIRFLOW-3974
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3974
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Reporter: David


 

The following DAG will be marked successful, and the failure callback will not 
run:
{code:java}
import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from util.slack.callback import post_error_to_slack_callback

dag = DAG('a_slack_dag', schedule_interval=None,
          start_date=datetime.datetime.now(),
          on_failure_callback=post_error_to_slack_callback)

with dag:
    succeed = DummyOperator(task_id='will_succeed')

    def raise_it():
        raise Exception('raised')

    fail = PythonOperator(task_id='branch_operator_fail', python_callable=raise_it)
    option_one = DummyOperator(task_id='option_one')
    option_two = DummyOperator(task_id='option_two')
    final_task = DummyOperator(task_id='final_task', trigger_rule='one_success')

    succeed >> fail >> option_one >> final_task
    fail >> option_two >> final_task
{code}
 

However, if the `one_success` rule is removed from `final_task`, the DAG will 
correctly be marked as failed. While the example doesn't explicitly show it, 
the failing task is a branch Python operator and only one of the option tasks 
will ever run, hence the requirement for the `one_success` rule.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779742#comment-16779742
 ] 

ASF GitHub Bot commented on AIRFLOW-3853:
-

samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete 
local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Duplicate Logs appearing in S3
> --
>
> Key: AIRFLOW-3853
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3853
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.2
>Reporter: Sam Bock
>Assignee: Sam Bock
>Priority: Major
>
> We've recently started to see duplicate logs in S3. After digging into it, we 
> discovered that this was due to our use of the new `reschedule` mode on our 
> sensors. Because the same `try_number` is used when a task reschedules, the 
> local log file frequently contains results from previous attempts. 
> Additionally, because the `s3_task_helper.py` always tries to `append` the 
> local log file to the remote log file, this can result in massive logs (we 
> found one that was 400 MB).
> To fix this, we'd like to remove the local log after a successful upload.





[GitHub] samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload

2019-02-27 Thread GitBox
samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local 
logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   




[GitHub] samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload

2019-02-27 Thread GitBox
samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete 
local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3853) issues and references 
them in the PR title. For example, "\[AIRFLOW-3853\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3853
   
   ### Description
   
   - [X] We've recently started to see duplicate logs in S3. After digging into 
it, we discovered that this was due to our use of the new `reschedule` mode on 
our sensors. Because the same `try_number` is used when a task reschedules, the 
local log file frequently contains results from previous attempts. 
Additionally, because `s3_task_handler.py` always tries to `append` the 
local log file to the remote log file, this can result in massive logs (we 
found one that was 400 MB).
   
   To fix this, we'd like to remove the local log after a successful upload. 
Because the file is uploaded to S3, no data will be lost.
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason: I've modified the following unit tests to cover the 
change to `s3_write`: `test_write`, `test_write_existing`, `test_write_raises`.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [X] Passes `flake8`
   




[GitHub] galak75 edited a comment on issue #4743: [AIRFLOW-3871] render Operators template fields recursively

2019-02-27 Thread GitBox
galak75 edited a comment on issue #4743: [AIRFLOW-3871] render Operators 
template fields recursively
URL: https://github.com/apache/airflow/pull/4743#issuecomment-468013507
 
 
   @Fokko: I could not decide between:
   
   - a recursive template rendering over inner attributes approach (as done in 
this pull request)
   - a duck-typing custom rendering approach (like in [this 
comment](https://issues.apache.org/jira/browse/AIRFLOW-2508?focusedCommentId=16654887=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16654887))
   
   (see https://issues.apache.org/jira/browse/AIRFLOW-3871)
   Is this recursive solution accepted? Is it preferred to a duck-typing 
solution?
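
   For reference, a hedged sketch of what the recursive approach could look like (simplified; not this PR's actual diff):
```python
def render_recursively(content, context, jinja_env):
    # Render string leaves with Jinja; recurse into containers and into the
    # attributes of plain objects. Simplified sketch, not the PR's code.
    if isinstance(content, str):
        return jinja_env.from_string(content).render(**context)
    if isinstance(content, (list, tuple)):
        return type(content)(render_recursively(v, context, jinja_env) for v in content)
    if isinstance(content, dict):
        return {k: render_recursively(v, context, jinja_env) for k, v in content.items()}
    if hasattr(content, '__dict__'):  # the "inner attributes" case
        for name, value in vars(content).items():
            setattr(content, name, render_recursively(value, context, jinja_env))
    return content
```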
   




[GitHub] galak75 commented on issue #4743: [AIRFLOW-3871] render Operators template fields recursively

2019-02-27 Thread GitBox
galak75 commented on issue #4743: [AIRFLOW-3871] render Operators template 
fields recursively
URL: https://github.com/apache/airflow/pull/4743#issuecomment-468013507
 
 
   @Fokko: I could not decide between:
   
   - a recursive template rendering over inner attributes approach (as done in 
this pull request)
   - a duck-typing custom rendering approach (like in [this 
comment](https://issues.apache.org/jira/browse/AIRFLOW-2508?focusedCommentId=16654887=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16654887))
   
   (see https://issues.apache.org/jira/browse/AIRFLOW-3871)
   Is this recursive solution accepted?
   




[jira] [Updated] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database

2019-02-27 Thread Elliott Shugerman (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Shugerman updated AIRFLOW-3973:
---
Description: 
h2. Notes:
 * This does not occur if the database is already initialized. If it is, run 
`resetdb` instead to observe the bug.
 * This does not occur with the default SQLite database.

h2. Example

{{ERROR [airflow.models.DagBag] Failed to import: 
/home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
 line 1236, in _execute_context cursor, statement, parameters, context File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
 line 536, in do_execute cursor.execute(statement, parameters) 
psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM 
variable}}
h2. Explanation

The first thing {{airflow initdb}} does is run the Alembic migrations. All 
migrations are run in one transaction. Most tables, including the {{variable}} 
table, are defined in the initial migration. A [later 
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
 imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
calls its {{collect_dags}} method, which scans the DAGs directory and attempts 
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it 
will query the database to see if that {{Variable}} is defined in the 
{{variable}} table. It's not clear to me how exactly the connection for that 
query is created, but I think it is apparent that it does _not_ use the same 
transaction that is used to run the migrations. Since the migrations are not 
yet complete, and all migrations are run in one transaction, the migration that 
creates the {{variable}} table has not yet been committed, and therefore the 
table does not exist to any other connection/transaction. This raises 
{{ProgrammingError}}, which is caught and logged by {{collect_dags}}.

 
h2. Proposed Solution

Run each Alembic migration in its own transaction. I will open a pull request 
which accomplishes this shortly.
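
A minimal sketch of that change, assuming Alembic's `transaction_per_migration` option and the existing `airflow/migrations/env.py` (where `connection` and `target_metadata` are already defined):
{code:python}
from alembic import context

context.configure(
    connection=connection,            # defined earlier in env.py
    target_metadata=target_metadata,  # defined earlier in env.py
    transaction_per_migration=True,   # commit each migration separately
)
{code}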

  was:
h2. Notes:
 * This does not occur if the database is already initialized. If it is, run 
`resetdb` instead to observe the bug.
 * This does not occur with the default SQLite database.

h2. Example

{{ERROR [airflow.models.DagBag] Failed to import: 
/home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
 line 1236, in _execute_context cursor, statement, parameters, context File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
 line 536, in do_execute cursor.execute(statement, parameters) 
psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM 
variable}}
h2. Explanation

The first thing {{airflow initdb}} does is run the Alembic migrations. All 
migrations are run in one transaction. Most tables, including the {{variable}} 
table, are defined in the initial migration. A [later 
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
 imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
calls its {{collect_dags}} method, which scans the DAGs directory and attempts 
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it 
will query the database to see if that {{Variable}} is defined in the 
{{variable}} table. It's not clear to me how exactly the connection for that 
query is created, but I think it is a fair assumption that it does _not_ use 
the same transaction that is used to run the migrations. Since the migrations 
are not yet complete, and all migrations are run in one transaction, the 
migration that creates the {{variable}} table has not yet been committed, and 
therefore the table does not exist to any other connection/transaction. This 
raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}.

 
h2. Proposed Solution

Run each Alembic migration in its own transaction. I will open a pull request 
which accomplishes this shortly.


> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is 
> used for the internal database
> ---
>
> Key: AIRFLOW-3973
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Elliott Shugerman
>Assignee: Elliott Shugerman
>Priority: Minor
>
> h2. Notes:
>  * This does not occur if the database is 

[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779731#comment-16779731
 ] 

ASF GitHub Bot commented on AIRFLOW-3973:
-

eeshugerman commented on pull request #4797: [AIRFLOW-3973] problem: `initdb` 
spams log with errors | solution: run each migration in its own transaction
URL: https://github.com/apache/airflow/pull/4797
 
 
   ### Jira
   https://issues.apache.org/jira/browse/AIRFLOW-3973
   
   ### Description
   If `Variable`s are used in DAGs, and Postgres is used for the internal 
database, a fresh `$ airflow initdb` (or `$ airflow resetdb`) spams the logs 
with error messages (but does not fail).
   
   This commit corrects this by running each migration in a separate 
transaction.
   
   See Jira ticket for more details.
   
   I have tested this change with the default SQLite database and, of course, 
with Postgres.
   ### Tests
   
   No tests included as this is a one line change which adds no functionality 
whatsoever.
 



> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is 
> used for the internal database
> ---
>
> Key: AIRFLOW-3973
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Elliott Shugerman
>Assignee: Elliott Shugerman
>Priority: Minor
>
> h2. Notes:
>  * This does not occur if the database is already initialized. If it is, run 
> `resetdb` instead to observe the bug.
>  * This does not occur with the default SQLite database.
> h2. Example
> {{ERROR [airflow.models.DagBag] Failed to import: 
> /home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): 
> File 
> "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
>  line 1236, in _execute_context cursor, statement, parameters, context File 
> "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
>  line 536, in do_execute cursor.execute(statement, parameters) 
> psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM 
> variable}}
> h2. Explanation
> The first thing {{airflow initdb}} does is run the Alembic migrations. All 
> migrations are run in one transaction. Most tables, including the 
> {{variable}} table, are defined in the initial migration. A [later 
> migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
>  imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
> calls its {{collect_dags}} method, which scans the DAGs directory and 
> attempts to load all DAGs it finds. When it loads a DAG that uses a 
> {{Variable}}, it will query the database to see if that {{Variable}} is 
> defined in the {{variable}} table. It's not clear to me how exactly the 
> connection for that query is created, but I think it is apparent that it does 
> _not_ use the same transaction that is used to run the migrations. Since the 
> migrations are not yet complete, and all migrations are run in one 
> transaction, the migration that creates the {{variable}} table has not yet 
> been committed, and therefore the table does not exist to any other 
> connection/transaction. This raises {{ProgrammingError}}, which is caught and 
> logged by {{collect_dags}}.
>  
> h2. Proposed Solution
> Run each Alembic migration in its own transaction. I will open a pull request 
> which accomplishes this shortly.





[GitHub] eeshugerman opened a new pull request #4797: [AIRFLOW-3973] problem: `initdb` spams log with errors | solution: run each migration in its own transaction

2019-02-27 Thread GitBox
eeshugerman opened a new pull request #4797: [AIRFLOW-3973] problem: `initdb` 
spams log with errors | solution: run each migration in its own transaction
URL: https://github.com/apache/airflow/pull/4797
 
 
   ### Jira
   https://issues.apache.org/jira/browse/AIRFLOW-3973
   
   ### Description
   If `Variable`s are used in DAGs, and Postgres is used for the internal 
database, a fresh `$ airflow initdb` (or `$ airflow resetdb`) spams the logs 
with error messages (but does not fail).
   
   This commit corrects this by running each migration in a separate 
transaction.
   
   See Jira ticket for more details.
   
   I have tested this change with the default SQLite database and, of course, 
with Postgres.
   ### Tests
   
   No tests included as this is a one line change which adds no functionality 
whatsoever.




[GitHub] galak75 commented on issue #4691: [AIRFLOW-1814] : templatize PythonOperator op_args and op_kwargs fields

2019-02-27 Thread GitBox
galak75 commented on issue #4691: [AIRFLOW-1814] : templatize PythonOperator 
op_args and op_kwargs fields
URL: https://github.com/apache/airflow/pull/4691#issuecomment-468008932
 
 
   Thanks a lot @Fokko! 




[GitHub] lucafuji removed a comment on issue #4783: [AIRFLOW-578] Fix check return code

2019-02-27 Thread GitBox
lucafuji removed a comment on issue #4783: [AIRFLOW-578] Fix check return code 
URL: https://github.com/apache/airflow/pull/4783#issuecomment-467673428
 
 
   @ddavydov Sorry to bother you, but it seems my newly introduced tests broke 
your impersonation tests. I took a look but have no idea why they broke. Would 
you mind taking a look? Thanks.




[jira] [Updated] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database

2019-02-27 Thread Elliott Shugerman (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Shugerman updated AIRFLOW-3973:
---
Description: 
h2. Notes:
 * This does not occur if the database is already initialized. If it is, run 
`resetdb` instead to observe the bug.
 * This does not occur with the default SQLite database.

h2. Example

{{ERROR [airflow.models.DagBag] Failed to import: 
/home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
 line 1236, in _execute_context cursor, statement, parameters, context File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
 line 536, in do_execute cursor.execute(statement, parameters) 
psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM 
variable}}
h2. Explanation

The first thing {{airflow initdb}} does is run the Alembic migrations. All 
migrations are run in one transaction. Most tables, including the {{variable}} 
table, are defined in the initial migration. A [later 
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
 imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
calls its {{collect_dags}} method, which scans the DAGs directory and attempts 
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it 
will query the database to see if that {{Variable}} is defined in the 
{{variable}} table. It's not clear to me how exactly the connection for that 
query is created, but I think it is a fair assumption that it does _not_ use 
the same transaction that is used to run the migrations. Since the migrations 
are not yet complete, and all migrations are run in one transaction, the 
migration that creates the {{variable}} table has not yet been committed, and 
therefore the table does not exist to any other connection/transaction. This 
raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}.

 
h2. Proposed Solution

Run each Alembic migration in its own transaction. I will open a pull request 
which accomplishes this shortly.

  was:
h2. Example

{{ERROR [airflow.models.DagBag] Failed to import: 
/home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
 line 1236, in _execute_context cursor, statement, parameters, context File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
 line 536, in do_execute cursor.execute(statement, parameters) 
psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM 
variable}}
h2. Explanation

The first thing {{airflow initdb}} does is run the Alembic migrations. All 
migrations are run in one transaction. Most tables, including the {{variable}} 
table, are defined in the initial migration. A [later 
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
 imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
calls its {{collect_dags}} method, which scans the DAGs directory and attempts 
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it 
will query the database to see if that {{Variable}} is defined in the 
{{variable}} table. It's not clear to me how exactly the connection for that 
query is created, but I think it is a fair assumption that it does _not_ use 
the same transaction that is used to run the migrations. Since the migrations 
are not yet complete, and all migrations are run in one transaction, the 
migration that creates the {{variable}} table has not yet been committed, and 
therefore the table does not exist to any other connection/transaction. This 
raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}.

NOTE: This does not occur with the default SQLite database.
h2. Proposed Solution

Run each Alembic migration in its own transaction. I will open a pull request 
which accomplishes this shortly.


> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is 
> used for the internal database
> ---
>
> Key: AIRFLOW-3973
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Elliott Shugerman
>Assignee: Elliott Shugerman
>Priority: Minor
>
> h2. Notes:
>  * This does not occur if the database is already initialized. If it is, run 
> `resetdb` instead to observe the bug.
>  * This does not occur with the default 

[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779683#comment-16779683
 ] 

ASF GitHub Bot commented on AIRFLOW-3853:
-

samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete 
local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   
 



> Duplicate Logs appearing in S3
> --
>
> Key: AIRFLOW-3853
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3853
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.2
>Reporter: Sam Bock
>Assignee: Sam Bock
>Priority: Major
>
> We've recently started to see duplicate logs in S3. After digging into it, we 
> discovered that this was due to our use of the new `reschedule` mode on our 
> sensors. Because the same `try_number` is used when a task reschedules, the 
> local log file frequently contains results from previous attempts. 
> Additionally, because `s3_task_handler.py` always tries to `append` the 
> local log file to the remote log file, this can result in massive logs (we 
> found one that was 400 MB).
> To fix this, we'd like to remove the local log after a successful upload.





[jira] [Commented] (AIRFLOW-3853) Duplicate Logs appearing in S3

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779684#comment-16779684
 ] 

ASF GitHub Bot commented on AIRFLOW-3853:
-

samuelwbock commented on pull request #4675: [AIRFLOW-3853] Default to delete 
local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3853) issues and references 
them in the PR title. For example, "\[AIRFLOW-3853\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3853
   
   ### Description
   
   - [X] We've recently started to see duplicate logs in S3. After digging into 
it, we discovered that this was due to our use of the new `reschedule` mode on 
our sensors. Because the same `try_number` is used when a task reschedules, the 
local log file frequently contains results from previous attempts. 
Additionally, because `s3_task_handler.py` always tries to `append` the 
local log file to the remote log file, this can result in massive logs (we 
found one that was 400 MB).
   
   To fix this, we'd like to remove the local log after a successful upload. 
Because the file is uploaded to S3, no data will be lost.
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason: I've modified the following unit tests to cover the 
change to `s3_write`: `test_write`, `test_write_existing`, `test_write_raises`.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [X] Passes `flake8`
   
 



> Duplicate Logs appearing in S3
> --
>
> Key: AIRFLOW-3853
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3853
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.10.2
>Reporter: Sam Bock
>Assignee: Sam Bock
>Priority: Major
>
> We've recently started to see duplicate logs in S3. After digging into it, we 
> discovered that this was due to our use of the new `reschedule` mode on our 
> sensors. Because the same `try_number` is used when a task reschedules, the 
> local log file frequently contains results from previous attempts. 
> Additionally, because `s3_task_handler.py` always tries to `append` the 
> local log file to the remote log file, this can result in massive logs (we 
> found one that was 400 MB).
> To fix this, we'd like to remove the local log after a successful upload.





[GitHub] samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload

2019-02-27 Thread GitBox
samuelwbock opened a new pull request #4675: [AIRFLOW-3853] Default to delete 
local logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3853) issues and references 
them in the PR title. For example, "\[AIRFLOW-3853\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3853
   
   ### Description
   
   - [X] We've recently started to see duplicate logs in S3. After digging into 
it, we discovered that this was due to our use of the new `reschedule` mode on 
our sensors. Because the same `try_number` is used when a task reschedules, the 
local log file frequently contains results from previous attempts. 
Additionally, because `s3_task_handler.py` always tries to `append` the 
local log file to the remote log file, this can result in massive logs (we 
found one that was 400 MB).
   
   To fix this, we'd like to remove the local log after a successful upload. 
Because the file is uploaded to S3, no data will be lost.
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason: I've modified the following unit tests to cover the 
change to `s3_write`: `test_write`, `test_write_existing`, `test_write_raises`.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [X] Passes `flake8`
   




[GitHub] samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local logs after remote upload

2019-02-27 Thread GitBox
samuelwbock closed pull request #4675: [AIRFLOW-3853] Default to delete local 
logs after remote upload
URL: https://github.com/apache/airflow/pull/4675
 
 
   




[jira] [Updated] (AIRFLOW-3964) Reduce duplicated tasks and optimize with scheduler embedded sensor

2019-02-27 Thread Yingbo Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingbo Wang updated AIRFLOW-3964:
-
Description: 
h2. Problem
h3. Airflow Sensor:

Sensors are a certain type of operator that will keep running until a certain 
criterion is met. Examples include a specific file landing in HDFS or S3, a 
partition appearing in Hive, or a specific time of the day. Sensors are derived 
from BaseSensorOperator and run a poke method at a specified poke_interval 
until it returns True.

Airflow sensor duplication is a common problem for large-scale Airflow 
projects: the same partitions need to be detected from the same or different 
DAGs. At Airbnb there are 88 boxes running four different types of sensors 
every day, and the number of running sensor tasks ranges from 8k to 16k, which 
takes a great amount of resources. Although the Airflow team has redirected 
all sensors to a specific queue with relatively few resources allocated, there 
is still large room to reduce the number of workers and relieve DB pressure by 
optimizing the sensor mechanism.

The existing sensor implementation creates a separate task for every sensor 
with a specific dag_id, task_id and execution_date. This task is responsible 
for repeatedly querying the DB until the specified partition exists. Even if 
two tasks are waiting for the same partition, they create two connections to 
the DB and check its status in two separate processes. On one hand, the DB 
needs to run duplicate jobs in multiple processes, which takes both CPU and 
memory resources. At the same time, Airflow needs to maintain a process for 
each sensor to query and wait for the partition/table to be created.

To optimize the sensors, add a hashcode for each partition, determined by the 
tuple (conn_id, schema, table, partition). Add dependencies between qualified 
sensors and partitions. Use a single entry for each sensor to query the DB and 
avoid duplication in Airflow.
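
A hedged sketch of that dedup key (illustrative only; every name below is hypothetical):
{code:python}
def partition_hashcode(conn_id, schema, table, partition):
    # Sensors waiting on the same partition collapse to one key.
    return hash((conn_id, schema, table, partition))

# hypothetical registry: one DB-poking entry per key, many waiting task instances
pending = {}

def register_sensor(conn_id, schema, table, partition, ti_key):
    key = partition_hashcode(conn_id, schema, table, partition)
    pending.setdefault(key, []).append(ti_key)
{code}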

Add a sensor scheduling part to the scheduler to:
 # Check partition status to enable downstream sensor success and trigger 
sensor downstream tasks
 # Select all pending partitions in the DB, including:
 ## Newly arriving partition sensor requests
 ## Existing sensor requests that are still waiting
 # At a time interval:
 ## Create the set of tasks for sensing all pending partitions.
 ## Kill previous sensor tasks
 # For the tasks mentioned in 3: each task should check many partitions. We 
can introduce a sensor chunk number here as the maximum number of partitions 
one task should handle. The sensors keep updating partition status in the 
Airflow DB while running.

  was:
h2. Problem
h3. Airflow Sensor:

Sensors are a certain type of operator that will keep running until a certain 
criterion is met. Examples include a specific file landing in HDFS or S3, a 
partition appearing in Hive, or a specific time of the day. Sensors are derived 
from BaseSensorOperator and run a poke method at a specified poke_interval 
until it returns True.

Airflow Sensor duplication is a normal problem for large scale airflow project. 
There are duplicated partitions needing to be detected from same/different DAG. 
In Airbnb there are 88 boxes running four different types of sensors everyday. 
The number of running sensor tasks ranges from 8k to 16k, which takes great 
amount of resources. Although Airflow team had redirected all sensors to a 
specific queue to allocate relatively minor resource, there is still large room 
to reduce the number of workers and relief DB pressure by optimizing the sensor 
mechanism.

Existing sensor implementation creates an identical task for any sensor task 
with specific dag_id, task_id and execution_date. This task is responsible of 
keeping querying DB until the specified partitions exists. Even if two tasks 
are waiting for same partition in DB, they are creating two connections with 
the DB and checking the status in two separate processes. In one hand, DB need 
to run duplicate jobs in multiple processes which will take both cpu and memory 
resources. At the same time, Airflow need to maintain a process for each sensor 
to query and wait for the partition/table to be created.

To optimize the sensor, add a hashcode for each partition decided by the set of 
(conn_id, schema, table, partition). Add dependencies between qualified sensors 
and partitions. Use a single entry for each sensor to query DB and avoid 
duplication in Airflow.

Add a sensor scheduling part in scheduler to:
 # Check partitions status to enable downstream sensor success and trigger 
sensor downstream tasks
 # Selecting all pending partitions in DB including:
 ## New coming partition sensor request
 ## Existing sensor request that is still waiting
 ## With a time interval:
 ### Create the set of tasks for sensing all pending partitions.
 ### Kill previous sensor tasks

 # For the task mentioned in 3: Each task should check many partitions. 

[jira] [Work started] (AIRFLOW-3964) Reduce duplicated tasks and optimize with scheduler embedded sensor

2019-02-27 Thread Yingbo Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3964 started by Yingbo Wang.

> Reduce duplicated tasks and optimize with scheduler embedded sensor 
> 
>
> Key: AIRFLOW-3964
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3964
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: dependencies, operators, scheduler
>Reporter: Yingbo Wang
>Assignee: Yingbo Wang
>Priority: Critical
>
> h2. Problem
> h3. Airflow Sensor:
> Sensors are a certain type of operator that will keep running until a certain 
> criterion is met. Examples include a specific file landing in HDFS or S3, a 
> partition appearing in Hive, or a specific time of the day. Sensors are 
> derived from BaseSensorOperator and run a poke method at a specified 
> poke_interval until it returns True.
> Airflow sensor duplication is a common problem for large-scale Airflow 
> projects: the same partitions need to be detected from the same or different 
> DAGs. At Airbnb there are 88 boxes running four different types of sensors 
> every day, and the number of running sensor tasks ranges from 8k to 16k, 
> which takes a great amount of resources. Although the Airflow team has 
> redirected all sensors to a specific queue with relatively few resources 
> allocated, there is still large room to reduce the number of workers and 
> relieve DB pressure by optimizing the sensor mechanism.
> The existing sensor implementation creates a separate task for every sensor 
> with a specific dag_id, task_id and execution_date. This task is responsible 
> for repeatedly querying the DB until the specified partition exists. Even if 
> two tasks are waiting for the same partition, they create two connections to 
> the DB and check its status in two separate processes. On one hand, the DB 
> needs to run duplicate jobs in multiple processes, which takes both CPU and 
> memory resources. At the same time, Airflow needs to maintain a process for 
> each sensor to query and wait for the partition/table to be created.
> To optimize the sensors, add a hashcode for each partition, determined by 
> the tuple (conn_id, schema, table, partition). Add dependencies between 
> qualified sensors and partitions. Use a single entry for each sensor to 
> query the DB and avoid duplication in Airflow.
> Add a sensor scheduling part to the scheduler to:
>  # Check partition status to enable downstream sensor success and trigger 
> sensor downstream tasks
>  # Select all pending partitions in the DB, including:
>  ## Newly arriving partition sensor requests
>  ## Existing sensor requests that are still waiting
>  # At a time interval:
>  ## Create the set of tasks for sensing all pending partitions.
>  ## Kill previous sensor tasks
>  # For the tasks mentioned in 3: each task should check many partitions. We 
> can introduce a sensor chunk number here as the maximum number of partitions 
> one task should handle. The sensors keep updating partition status in the 
> Airflow DB while running.





[jira] [Updated] (AIRFLOW-3964) Reduce duplicated tasks and optimize with scheduler embedded sensor

2019-02-27 Thread Yingbo Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingbo Wang updated AIRFLOW-3964:
-
Description: 
h2. Problem
h3. Airflow Sensor:

Sensors are a certain type of operator that will keep running until a certain 
criterion is met. Examples include a specific file landing in HDFS or S3, a 
partition appearing in Hive, or a specific time of the day. Sensors are derived 
from BaseSensorOperator and run a poke method at a specified poke_interval 
until it returns True.

Airflow sensor duplication is a common problem for large-scale Airflow 
projects: the same partitions need to be detected from the same or different 
DAGs. At Airbnb there are 88 boxes running four different types of sensors 
every day, and the number of running sensor tasks ranges from 8k to 16k, which 
takes a great amount of resources. Although the Airflow team has redirected 
all sensors to a specific queue with relatively few resources allocated, there 
is still large room to reduce the number of workers and relieve DB pressure by 
optimizing the sensor mechanism.

The existing sensor implementation creates a separate task for every sensor 
with a specific dag_id, task_id and execution_date. This task is responsible 
for repeatedly querying the DB until the specified partition exists. Even if 
two tasks are waiting for the same partition, they create two connections to 
the DB and check its status in two separate processes. On one hand, the DB 
needs to run duplicate jobs in multiple processes, which takes both CPU and 
memory resources. At the same time, Airflow needs to maintain a process for 
each sensor to query and wait for the partition/table to be created.

To optimize the sensors, add a hashcode for each partition, determined by the 
tuple (conn_id, schema, table, partition). Add dependencies between qualified 
sensors and partitions. Use a single entry for each sensor to query the DB and 
avoid duplication in Airflow.

Add a sensor scheduling part in scheduler to:
 # Check partitions status to enable downstream sensor success and trigger 
sensor downstream tasks
 # Selecting all pending partitions in DB including:
 ## New coming partition sensor request
 ## Existing sensor request that is still waiting
 ## With a time interval:
 ### Create the set of tasks for sensing all pending partitions.
 ### Kill previous sensor tasks

 # For the task mentioned in 3: Each task should check many partitions. We can 
introduce the sensor chunk number here for a maximum number of partitions one 
task should handle. The sensors keep updating partition status in Airflow DB 
during running

  was:
h2. Problem
h3. Airflow Sensor:

Sensors are a certain type of operator that will keep running until a certain 
criterion is met. Examples include a specific file landing in HDFS or S3, a 
partition appearing in Hive, or a specific time of the day. Sensors are derived 
from BaseSensorOperator and run a poke method at a specified poke_interval 
until it returns True.

Airflow Sensor duplication is a normal problem for large scale airflow project. 
There are duplicated partitions needing to be detected from same/different DAG. 
In Airbnb there are 88 boxes running four different types of sensors everyday. 
The number of running sensor tasks ranges from 8k to 16k, which takes great 
amount of resources. Although Airflow team had redirected all sensors to a 
specific queue to allocate relatively minor resource, there is still large room 
to reduce the number of workers and relief DB pressure by optimizing the sensor 
mechanism.

Existing sensor implementation creates an identical task for any sensor task 
with specific dag_id, task_id and execution_date. This task is responsible of 
keeping querying DB until the specified partitions exists. Even if two tasks 
are waiting for same partition in DB, they are creating two connections with 
the DB and checking the status in two separate processes. In one hand, DB need 
to run duplicate jobs in multiple processes which will take both cpu and memory 
resources. At the same time, Airflow need to maintain a process for each sensor 
to query and wait for the partition/table to be created.

To optimize the sensor, add a hashcode for each partition decided by the set of 
(conn_id, schema, table, partition). Add dependencies between qualified sensors 
and partitions. Use a single entry for each sensor to query DB and avoid 
duplication in Airflow.

Add a sensor scheduling part in scheduler to:
 # Check partitions status to enable downstream sensor success and trigger 
sensor downstream tasks
 # Selecting all pending partitions in DB including:
 # New coming partition sensor request
 # Existing sensor request that is still waiting


 # With a time interval:
 # Create the set of tasks for sensing all pending partitions.
 # Kill previous sensor tasks


 # For the task mentioned in 3: Each task should check many partitions. 

[jira] [Updated] (AIRFLOW-3964) Reduce duplicated tasks and optimize with scheduler embedded sensor

2019-02-27 Thread Yingbo Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingbo Wang updated AIRFLOW-3964:
-
Description: 
h2. Problem
h3. Airflow Sensor:

Sensors are a certain type of operator that will keep running until a certain 
criterion is met. Examples include a specific file landing in HDFS or S3, a 
partition appearing in Hive, or a specific time of the day. Sensors are derived 
from BaseSensorOperator and run a poke method at a specified poke_interval 
until it returns True.

Airflow sensor duplication is a common problem for large-scale Airflow 
projects: the same partitions need to be detected from the same or different 
DAGs. At Airbnb there are 88 boxes running four different types of sensors 
every day, and the number of running sensor tasks ranges from 8k to 16k, which 
takes a great amount of resources. Although the Airflow team has redirected 
all sensors to a specific queue with relatively few resources allocated, there 
is still large room to reduce the number of workers and relieve DB pressure by 
optimizing the sensor mechanism.

The existing sensor implementation creates a separate task for every sensor 
with a specific dag_id, task_id and execution_date. This task is responsible 
for repeatedly querying the DB until the specified partition exists. Even if 
two tasks are waiting for the same partition, they create two connections to 
the DB and check its status in two separate processes. On one hand, the DB 
needs to run duplicate jobs in multiple processes, which takes both CPU and 
memory resources. At the same time, Airflow needs to maintain a process for 
each sensor to query and wait for the partition/table to be created.

To optimize the sensors, add a hashcode for each partition, determined by the 
tuple (conn_id, schema, table, partition). Add dependencies between qualified 
sensors and partitions. Use a single entry for each sensor to query the DB and 
avoid duplication in Airflow.

Add a sensor scheduling part in scheduler to:
 # Check partitions status to enable downstream sensor success and trigger 
sensor downstream tasks
 # Selecting all pending partitions in DB including:
 # New coming partition sensor request
 # Existing sensor request that is still waiting


 # With a time interval:
 # Create the set of tasks for sensing all pending partitions.
 # Kill previous sensor tasks


 # For the task mentioned in 3: Each task should check many partitions. We can 
introduce the sensor chunk number here for a maximum number of partitions one 
task should handle. The sensors keep updating partition status in Airflow DB 
during running

  was:
h2. Problem
h3. Airflow Sensor:

Sensors are a certain type of operator that will keep running until a certain 
criterion is met. Examples include a specific file landing in HDFS or S3, a 
partition appearing in Hive, or a specific time of the day. Sensors are derived 
from BaseSensorOperator and run a poke method at a specified poke_interval 
until it returns True.

Airflow Sensor duplication is a normal problem for large scale airflow project. 
There are duplicated partitions needing to be detected from same/different DAG. 
In Airbnb there are 88 boxes running four different types of sensors everyday. 
The number of running sensor tasks ranges from 8k to 16k, which takes great 
amount of resources. Although Airflow team had redirected all sensors to a 
specific queue to allocate relatively minor resource, there is still large room 
to reduce the number of workers and relief DB pressure by optimizing the sensor 
mechanism.

Existing sensor implementation creates an identical task for any sensor task 
with specific dag_id, task_id and execution_date. This task is responsible of 
keeping querying DB until the specified partitions exists. Even if two tasks 
are waiting for same partition in DB, they are creating two connections with 
the DB and checking the status in two separate processes. In one hand, DB need 
to run duplicate jobs in multiple processes which will take both cpu and memory 
resources. At the same time, Airflow need to maintain a process for each sensor 
to query and wait for the partition/table to be created. 

Airflow Scheduler: 

Airflow scheduler is responsible of parsing DAGs and scheduling airflow tasks. 
The jobs.process_file function process all python file that have “airflow” and 
“Dag”:
 # Execute the file and look for DAG objects in the namespace.
 # Pickle the DAG and save it to the DB (if necessary).
 # For each DAG, see what tasks should run and create appropriate task 
instances in the DB.
 # Record any errors importing the file into ORM
 # Kill (in ORM) any task instances belonging to the DAGs that haven't issued a 
heartbeat in a while.

This function returns a list of SimpleDag objects that represent the DAGs found 
in the file

There are some issues with existing Airflow scheduler:
 # Multiple parsing: Scheduler will parse a 

[jira] [Updated] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database

2019-02-27 Thread Elliott Shugerman (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Shugerman updated AIRFLOW-3973:
---
Description: 
h2. Example

{{ERROR [airflow.models.DagBag] Failed to import: 
/home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
 line 1236, in _execute_context cursor, statement, parameters, context File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
 line 536, in do_execute cursor.execute(statement, parameters) 
psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM 
variable}}
h2. Explanation

The first thing {{airflow initdb}} does is run the Alembic migrations. All 
migrations are run in one transaction. Most tables, including the {{variable}} 
table, are defined in the initial migration. A [later 
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
 imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
calls its {{collect_dags}} method, which scans the DAGs directory and attempts 
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it 
will query the database to see if that {{Variable}} is defined in the 
{{variable}} table. It's not clear to me how exactly the connection for that 
query is created, but I think it is a fair assumption that it does _not_ use 
the same transaction that is used to run the migrations. Since the migrations 
are not yet complete, and all migrations are run in one transaction, the 
migration that creates the {{variable}} table has not yet been committed, and 
therefore the table does not exist to any other connection/transaction. This 
raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}.

NOTE: This does not occur with the default SQLite database.
h2. Proposed Solution

Run each Alembic migration in its own transaction. I will open a pull request 
which accomplishes this shortly.

  was:
h2. Example

{{ERROR [airflow.models.DagBag] Failed to import: 
/home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
 line 1236, in _execute_context cursor, statement, parameters, context File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
 line 536, in do_execute cursor.execute(statement, parameters) 
psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM 
variable}}
h2. Explanation

The first thing {{airflow initdb}} does is run the Alembic migrations. All 
migrations are run in one transaction. Most tables, including the {{variable}} 
table, are defined in the initial migration. A [later 
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
 imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
calls its {{collect_dags}} method, which scans the DAGs directory and attempts 
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it 
will query the database to see if that {{Variable}} is defined in the 
{{variable}} table. It's not clear to me how exactly the connection for that 
query is created, but I think it is a fair assumption that it does _not_ use 
the same transaction that is used to run the migrations. Since the migrations 
are not yet complete, and all migrations are run in one transaction, the 
migration that creates the {{variable}} table has not yet been committed, and 
therefore the table does not exist to any other connection/transaction. This 
raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}.

NOTE: This does not occur with the default SQLite database.
h2. Proposed Solution

Run each Alembic migration in its own transaction. I will be opening a pull 
request which accomplishes this shortly.


> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is 
> used for the internal database
> ---
>
> Key: AIRFLOW-3973
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Elliott Shugerman
>Assignee: Elliott Shugerman
>Priority: Minor
>
> h2. Example
> {{ERROR [airflow.models.DagBag] Failed to import: 
> /home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): 
> File 
> "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
>  line 1236, in _execute_context cursor, 

[jira] [Created] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database

2019-02-27 Thread Elliott Shugerman (JIRA)
Elliott Shugerman created AIRFLOW-3973:
--

 Summary: `airflow initdb` logs errors when `Variable` is used in 
DAGs and Postgres is used for the internal database
 Key: AIRFLOW-3973
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Elliott Shugerman
Assignee: Elliott Shugerman


h2. Example

{{ERROR [airflow.models.DagBag] Failed to import: 
/home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
 line 1236, in _execute_context cursor, statement, parameters, context File 
"/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
 line 536, in do_execute cursor.execute(statement, parameters) 
psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM 
variable}}
h2. Explanation

The first thing {{airflow initdb}} does is run the Alembic migrations. All 
migrations are run in one transaction. Most tables, including the {{variable}} 
table, are defined in the initial migration. A [later 
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
 imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
calls its {{collect_dags}} method, which scans the DAGs directory and attempts 
to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it 
will query the database to see if that {{Variable}} is defined in the 
{{variable}} table. It's not clear to me how exactly the connection for that 
query is created, but I think it is a fair assumption that it does _not_ use 
the same transaction that is used to run the migrations. Since the migrations 
are not yet complete, and all migrations are run in one transaction, the 
migration that creates the {{variable}} table has not yet been committed, and 
therefore the table does not exist to any other connection/transaction. This 
raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}.

NOTE: This does not occur with the default SQLite database.
h2. Proposed Solution

Run each Alembic migration in its own transaction. I will be opening a pull 
request which accomplishes this shortly.
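
A minimal sketch of one way to get there, assuming the change lands in the Alembic {{env.py}}: Alembic's documented {{transaction_per_migration}} flag gives one transaction per revision (the actual PR may implement this differently).

{code}
from alembic import context
from sqlalchemy import engine_from_config, pool


def run_migrations_online():
    connectable = engine_from_config(
        context.config.get_section(context.config.config_ini_section),
        prefix='sqlalchemy.',
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            # commit after each revision instead of once at the very end,
            # so tables such as `variable` are visible to other connections
            # before a later migration imports DagBag
            transaction_per_migration=True,
        )
        with context.begin_transaction():
            context.run_migrations()
{code}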



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3972) Http Operators sets Connection model's schema property to the scheme of the uri

2019-02-27 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3972:
---
Fix Version/s: 2.0.0
  Description: 
The HttpOperator is expecting there to be a *schema* property on the Connection 
model after it parses through a URI env connection string here: 
[https://github.com/apache/airflow/blob/6b38649fa6cdf16055c7f5458050c70f39cac8fd/airflow/hooks/http_hook.py#L67]

However the Connection model uses a subset of the uri path to set the `schema` 
property on itself. The HttpOperator sets the url to *schema + '://' + host* 
and if the schema isn't set, it uses http by default which prevents us from 
hitting https endpoints.  

The HTTP operator is assuming the schema property is a scheme, but the 
Connection model doesn't include a scheme field. 

  was:

The HttpOperator is expecting there to be a *schema* property on the Connection 
model after it parses through a URI env connection string here: 
[https://github.com/apache/airflow/blob/6b38649fa6cdf16055c7f5458050c70f39cac8fd/airflow/hooks/http_hook.py#L67]

However the Connection model uses a subset of the uri path to set the `schema` 
property on itself. The HttpOperator sets the url to *schema + '://' + host* 
and if the schema isn't set, it uses http by default which prevents us from 
hitting https endpoints.  

The HTTP operator is assuming the schema property is a scheme, but the 
Connection model doesn't include a scheme field. 


> Http Operators sets Connection model's schema property to the scheme of the 
> uri
> ---
>
> Key: AIRFLOW-3972
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3972
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks, models
>Affects Versions: 1.10.2
>Reporter: Kamla
>Priority: Major
> Fix For: 2.0.0
>
>
> The HttpOperator is expecting there to be a *schema* property on the 
> Connection model after it parses through a URI env connection string here: 
> [https://github.com/apache/airflow/blob/6b38649fa6cdf16055c7f5458050c70f39cac8fd/airflow/hooks/http_hook.py#L67]
> However the Connection model uses a subset of the uri path to set the 
> `schema` property on itself. The HttpOperator sets the url to *schema + '://' 
> + host* and if the schema isn't set, it uses http by default which prevents 
> us from hitting https endpoints.  
> The HTTP operator is assuming the schema property is a scheme, but the 
> Connection model doesn't include a scheme field. 
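
A tiny runnable paraphrase of the behaviour described (this mirrors the logic around http_hook.py#L67 but is not the actual Airflow code):

{code}
class Connection:
    def __init__(self, host, schema=None):
        self.host = host
        self.schema = schema  # meant as a DB schema, but used as a URI scheme


def build_base_url(conn):
    scheme = conn.schema if conn.schema else 'http'  # falls back to http
    return scheme + '://' + conn.host


print(build_base_url(Connection('example.com')))  # -> http://example.com
# an https endpoint is reachable only if `schema` happens to hold 'https'
{code}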



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3918) Adding a parameter in Airflow-kubernetes config to support git-sync with SSH credential

2019-02-27 Thread Daniel Mateus Pires (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Mateus Pires updated AIRFLOW-3918:
-
External issue URL: https://github.com/apache/airflow/pull/4777

> Adding a parameter in Airflow-kubernetes config to support git-sync with SSH 
> credential
> ---
>
> Key: AIRFLOW-3918
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3918
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Daniel Mateus Pires
>Assignee: Daniel Mateus Pires
>Priority: Minor
>
> It's the preferred pattern in my workplace to integrate deployment systems 
> with GitHub using the SSH deploy key feature, which can easily be scoped to 
> read-only access on a single repository.
> I would like to support this feature by adding a "git_ssh_key_file" 
> parameter in the kubernetes section of the config, which would be an 
> alternate authentication method to the already supported git_user + 
> git_password.
> It will use the following feature: 
> https://github.com/kubernetes/git-sync/blob/7bb3262084ac1ad64321856c1e769358cf18f67d/cmd/git-sync/main.go#L88
>  
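
A hedged sketch of how this might look in {{airflow.cfg}} (the parameter name comes from the description above; the final name and semantics are defined by the PR, and the repository and key path are illustrative):

{code}
[kubernetes]
git_repo = git@github.com:example/dags.git
git_branch = master
# alternative to git_user + git_password; value is an illustrative path
git_ssh_key_file = /etc/airflow/secrets/git_sync_ssh_key
{code}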



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] vardancse commented on a change in pull request #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task

2019-02-27 Thread GitBox
vardancse commented on a change in pull request #4781: [AIRFLOW-3962] Added 
graceful handling for creation of dag_run of a dag which doesn't have any task
URL: https://github.com/apache/airflow/pull/4781#discussion_r260836841
 
 

 ##
 File path: tests/test_jobs.py
 ##
 @@ -2665,6 +2665,22 @@ def test_scheduler_do_not_schedule_too_early(self):
 
 queue.put.assert_not_called()
 
+def test_scheduler_do_not_schedule_without_tasks(self):
+dag = DAG(
+dag_id='test_scheduler_do_not_schedule_without_tasks',
+start_date=DEFAULT_DATE)
+
+with create_session() as session:
+orm_dag = DagModel(dag_id=dag.dag_id)
+session.merge(orm_dag)
+session.commit()
 
 Review comment:
   Thanks for catching that, removed suggested lines.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mik-laj commented on issue #4787: [AIRFLOW-3967] Extract Jinja directive from Javascript

2019-02-27 Thread GitBox
mik-laj commented on issue #4787: [AIRFLOW-3967] Extract Jinja directive from 
Javascript
URL: https://github.com/apache/airflow/pull/4787#issuecomment-467932679
 
 
   I finished my work.  It looks good to me. However, I encourage you to submit 
your suggestions. 
   
   @ashb Can I ask for review?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mik-laj commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor

2019-02-27 Thread GitBox
mik-laj commented on a change in pull request #4786: [AIRFLOW-3966] Correct 
default bigquery_conn_id in BigQueryTableSensor
URL: https://github.com/apache/airflow/pull/4786#discussion_r260830804
 
 

 ##
 File path: airflow/contrib/sensors/bigquery_sensor.py
 ##
 @@ -50,7 +50,7 @@ def __init__(self,
  project_id,
  dataset_id,
  table_id,
- bigquery_conn_id='bigquery_default_conn',
+ bigquery_conn_id='bigquery_default',
 
 Review comment:
   I started working in this direction, but I did not have time to finish it. 
If you want, you can base your work on this change.
   https://github.com/PolideaInternal/airflow/pull/42/files 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mik-laj commented on a change in pull request #4784: [AIRFLOW-XXX][WIP]Enforce order in imports

2019-02-27 Thread GitBox
mik-laj commented on a change in pull request #4784: [AIRFLOW-XXX][WIP]Enforce 
order in imports
URL: https://github.com/apache/airflow/pull/4784#discussion_r260828711
 
 

 ##
 File path: setup.py
 ##
 @@ -233,6 +233,8 @@ def write_version(filename=os.path.join(*['airflow',
 
 devel = [
 'click==6.7',
+'flake8-import-order-0.18',
 
 Review comment:
   ```suggestion
   'flake8-import-order>=0.18',
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task

2019-02-27 Thread GitBox
Fokko commented on a change in pull request #4781: [AIRFLOW-3962] Added 
graceful handling for creation of dag_run of a dag which doesn't have any task
URL: https://github.com/apache/airflow/pull/4781#discussion_r260806069
 
 

 ##
 File path: tests/test_jobs.py
 ##
 @@ -2665,6 +2665,22 @@ def test_scheduler_do_not_schedule_too_early(self):
 
 queue.put.assert_not_called()
 
+def test_scheduler_do_not_schedule_without_tasks(self):
+dag = DAG(
+dag_id='test_scheduler_do_not_schedule_without_tasks',
+start_date=DEFAULT_DATE)
+
+with create_session() as session:
+orm_dag = DagModel(dag_id=dag.dag_id)
+session.merge(orm_dag)
+session.commit()
 
 Review comment:
   `.commit()` and `.close()` can be omitted since they are part of the 
`create_session`
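
A short sketch of the pattern being suggested, assuming `create_session` from `airflow.utils.db` (it commits on clean exit and always closes the session):

```python
from airflow.models import DagModel
from airflow.utils.db import create_session

# `dag` as constructed in the test above
with create_session() as session:
    session.merge(DagModel(dag_id=dag.dag_id))
    # no explicit session.commit() / session.close() needed:
    # create_session commits on success and closes in all cases
```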


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4699: [AIRFLOW-3881] Correct to_csv row number

2019-02-27 Thread GitBox
Fokko commented on issue #4699: [AIRFLOW-3881] Correct to_csv row number
URL: https://github.com/apache/airflow/pull/4699#issuecomment-467909116
 
 
   Thanks @zhongjiajie 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko merged pull request #4699: [AIRFLOW-3881] Correct to_csv row number

2019-02-27 Thread GitBox
Fokko merged pull request #4699: [AIRFLOW-3881] Correct to_csv row number
URL: https://github.com/apache/airflow/pull/4699
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #4699: [AIRFLOW-3881] Correct to_csv row number

2019-02-27 Thread GitBox
Fokko commented on a change in pull request #4699: [AIRFLOW-3881] Correct 
to_csv row number
URL: https://github.com/apache/airflow/pull/4699#discussion_r260804565
 
 

 ##
 File path: tests/hooks/test_hive_hook.py
 ##
 @@ -451,7 +452,23 @@ def test_get_results_data(self):
 results = hook.get_results(query, schema=self.database)
 self.assertListEqual(results['data'], [(1, 1), (2, 2)])
 
-def test_to_csv(self):
+@unittest.skipIf(NOT_ASSERTLOGS_VERSION < 3.4, 'assertLogs not support 
before python 3.4')
 
 Review comment:
   Ah check, thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3795) provide_context is not a passable parameter for PythonVirtualenvOperator

2019-02-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779440#comment-16779440
 ] 

ASF subversion and git services commented on AIRFLOW-3795:
--

Commit 217c940d0e82c0b8bf0d43c26d69297d2d374107 in airflow's branch 
refs/heads/master from Sergio Soto
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=217c940 ]

[AIRFLOW-3795] provide_context param is now used (#4735)

* provide_context param is now used

* Fixed new PythonVirtualenvOperator test


> provide_context is not a passable parameter for PythonVirtualenvOperator
> 
>
> Key: AIRFLOW-3795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3795
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Susannah Doss
>Assignee: Sergio Soto Núñez
>Priority: Trivial
>
> `PythonVirtualenvOperator` does not allow me to specify 
> `provide_context=True`: 
> https://github.com/apache/airflow/blob/83cb9c3acdd3b4eeadf1cab3cb45d644c3e9ede0/airflow/operators/python_operator.py#L242
> However, I am able to do so when I use the plain `PythonOperator`. I can't 
> see a reason why I wouldn't be allowed to have it be set to `True` when using 
> a `PythonVirtualenvOperator`.
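
A minimal usage sketch after this change (hedged: it assumes a {{dag}} object in scope and that the operator now forwards {{provide_context}} just like the plain {{PythonOperator}}):

{code}
from airflow.operators.python_operator import PythonVirtualenvOperator


def print_execution_date(**context):
    print(context['execution_date'])


venv_task = PythonVirtualenvOperator(
    task_id='venv_task',
    python_callable=print_execution_date,
    provide_context=True,  # accepted after AIRFLOW-3795
    dag=dag,  # assumed to be defined elsewhere
)
{code}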



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3795) provide_context is not a passable parameter for PythonVirtualenvOperator

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779439#comment-16779439
 ] 

ASF GitHub Bot commented on AIRFLOW-3795:
-

Fokko commented on pull request #4735: [AIRFLOW-3795] provide_context param is 
now used
URL: https://github.com/apache/airflow/pull/4735
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> provide_context is not a passable parameter for PythonVirtualenvOperator
> 
>
> Key: AIRFLOW-3795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3795
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Susannah Doss
>Assignee: Sergio Soto Núñez
>Priority: Trivial
>
> `PythonVirtualenvOperator` does not allow me to specify 
> `provide_context=True`: 
> https://github.com/apache/airflow/blob/83cb9c3acdd3b4eeadf1cab3cb45d644c3e9ede0/airflow/operators/python_operator.py#L242
> However, I am able to do so when I use the plain `PythonOperator`. I can't 
> see a reason why I wouldn't be allowed to have it be set to `True` when using 
> a `PythonVirtualenvOperator`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3795) provide_context is not a passable parameter for PythonVirtualenvOperator

2019-02-27 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-3795.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> provide_context is not a passable parameter for PythonVirtualenvOperator
> 
>
> Key: AIRFLOW-3795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3795
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Susannah Doss
>Assignee: Sergio Soto Núñez
>Priority: Trivial
> Fix For: 2.0.0
>
>
> `PythonVirtualenvOperator` does not allow me to specify 
> `provide_context=True`: 
> https://github.com/apache/airflow/blob/83cb9c3acdd3b4eeadf1cab3cb45d644c3e9ede0/airflow/operators/python_operator.py#L242
> However, I am able to do so when I use the plain `PythonOperator`. I can't 
> see a reason why I wouldn't be allowed to have it be set to `True` when using 
> a `PythonVirtualenvOperator`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko merged pull request #4735: [AIRFLOW-3795] provide_context param is now used

2019-02-27 Thread GitBox
Fokko merged pull request #4735: [AIRFLOW-3795] provide_context param is now 
used
URL: https://github.com/apache/airflow/pull/4735
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #4786: [AIRFLOW-3966] Correct default bigquery_conn_id in BigQueryTableSensor

2019-02-27 Thread GitBox
Fokko commented on a change in pull request #4786: [AIRFLOW-3966] Correct 
default bigquery_conn_id in BigQueryTableSensor
URL: https://github.com/apache/airflow/pull/4786#discussion_r260800642
 
 

 ##
 File path: airflow/contrib/sensors/bigquery_sensor.py
 ##
 @@ -50,7 +50,7 @@ def __init__(self,
  project_id,
  dataset_id,
  table_id,
- bigquery_conn_id='bigquery_default_conn',
+ bigquery_conn_id='bigquery_default',
 
 Review comment:
   Agree with @mik-laj :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mik-laj commented on issue #4788: [AIRFLOW-3811][3/3] Add automatic generation of API Reference

2019-02-27 Thread GitBox
mik-laj commented on issue #4788: [AIRFLOW-3811][3/3] Add automatic generation 
of API Reference
URL: https://github.com/apache/airflow/pull/4788#issuecomment-467904933
 
 
   @Fokko This is corrected, but in an earlier PR. Please accept the changes 
in order. I have divided all the changes into a few PRs to increase 
transparency.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] RosterIn opened a new pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…

2019-02-27 Thread GitBox
RosterIn opened a new pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 
19.5.0 to avoid moderate-severit…
URL: https://github.com/apache/airflow/pull/4795
 
 
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-2767/) issues and 
references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2767
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…

2019-02-27 Thread GitBox
Fokko commented on issue #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to 
avoid moderate-severit…
URL: https://github.com/apache/airflow/pull/4795#issuecomment-467902825
 
 
   I've restarted the Kubernetes tests


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko closed pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to avoid moderate-severit…

2019-02-27 Thread GitBox
Fokko closed pull request #4795: [AIRFLOW-2767] Upgrade gunicorn to 19.5.0 to 
avoid moderate-severit…
URL: https://github.com/apache/airflow/pull/4795
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4788: [AIRFLOW-3811][3/3] Add automatic generation of API Reference

2019-02-27 Thread GitBox
Fokko commented on issue #4788: [AIRFLOW-3811][3/3] Add automatic generation of 
API Reference
URL: https://github.com/apache/airflow/pull/4788#issuecomment-467899614
 
 
   Fully agree with @potiuk, this was overdue for a long time!
   
   Awesome work @mik-laj 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3961) Centos 7 + mysql 8.0 - initdb - Incorrect datetime value

2019-02-27 Thread Florian FERREIRA (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779415#comment-16779415
 ] 

Florian FERREIRA commented on AIRFLOW-3961:
---

Hello,

I have reinstalled my VM and I can't reproduce.

Sorry for the inconvenience.

> Centos 7 + mysql 8.0 - initdb - Incorrect datetime value
> 
>
> Key: AIRFLOW-3961
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3961
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.2
>Reporter: Florian FERREIRA
>Priority: Major
>
> Hello, I have some problems with the initialization of the backend database. 
> How to reproduce: 
> Airflow : 1.10.2
> Mysql : mysql Ver 8.0.15 for Linux on x86_64 (MySQL Community Server - GPL)
> Mysql user : 
> {code}
> CREATE USER 'airflow'@'%'  IDENTIFIED WITH mysql_native_password BY ''; GRANT 
> ALL PRIVILEGES ON airflow.* TO 'airflow'@'%';
> {code}
> My.cnf file:
> {code}
> [mysqld]
> #
> # Remove leading # and set to the amount of RAM for the most important data
> # cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
> # innodb_buffer_pool_size = 128M
> #
> # Remove the leading "# " to disable binary logging
> # Binary logging captures changes between backups and is enabled by
> # default. It's default setting is log_bin=binlog
> # disable_log_bin
> #
> # Remove leading # to set options mainly useful for reporting servers.
> # The server defaults are faster for transactions and fast SELECTs.
> # Adjust sizes as needed, experiment to find the optimal values.
> # join_buffer_size = 128M
> # sort_buffer_size = 2M
> # read_rnd_buffer_size = 2M
> #
> # Remove leading # to revert to previous value for 
> default_authentication_plugin,
> # this will increase compatibility with older clients. For background, see:
> # 
> https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_default_authentication_plugin
> # default-authentication-plugin=mysql_native_password
> datadir=/var/lib/mysql
> socket=/var/lib/mysql/mysql.sock
> log-error=/var/log/mysqld.log
> pid-file=/var/run/mysqld/mysqld.pid
> explicit_defaults_for_timestamp=1
> {code}
> When I launch `airflow initdb` or `airflow resetdb` on an *empty* database,
> I have the following error:
> {code}
> airflow resetdb
> /usr/lib/python2.7/site-packages/requests/__init__.py:91: 
> RequestsDependencyWarning: urllib3 (1.24.1) or chardet (2.2.1) doesn't match 
> a supported version!
>   RequestsDependencyWarning)
> [2019-02-26 14:29:25,280] {settings.py:174} INFO - settings.configure_orm(): 
> Using pool settings. pool_size=5, pool_recycle=1800, pid=12078
> [2019-02-26 14:29:25,526] {__init__.py:51} INFO - Using executor LocalExecutor
> DB: mysql://airflow:***@airflow.bboxdata-dev.lan.oxv.fr/airflow
> This will drop existing tables if they exist. Proceed? (y/n)y
> [2019-02-26 14:29:26,693] {db.py:358} INFO - Dropping tables that exist
> [2019-02-26 14:29:27,238] {migration.py:116} INFO - Context impl MySQLImpl.
> [2019-02-26 14:29:27,238] {migration.py:121} INFO - Will assume 
> non-transactional DDL.
> [2019-02-26 14:29:27,265] {db.py:338} INFO - Creating tables
> INFO  [alembic.runtime.migration] Context impl MySQLImpl.
> INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current 
> schema
> INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 
> 1507a7289a2f, create is_encrypted
> INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 
> 13eb55f81627, maintain history for compatibility with earlier migrations
> INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 
> 338e90f54d61, More logging into task_instance
> INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 
> 52d714495f0, job_id indices
> INFO  [alembic.runtime.migration] Running upgrade 52d714495f0 -> 
> 502898887f84, Adding extra to Log
> INFO  [alembic.runtime.migration] Running upgrade 502898887f84 -> 
> 1b38cef5b76e, add dagrun
> INFO  [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 
> 2e541a1dcfed, task_duration
> INFO  [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 
> 40e67319e3a9, dagrun_config
> INFO  [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 
> 561833c1c74b, add password column to user
> INFO  [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, 
> dagrun start end
> INFO  [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, 
> Add notification_sent column to sla_miss
> INFO  [alembic.runtime.migration] Running upgrade bbc73705a13e -> 
> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field 
> in connection
> INFO  [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 
> 1968acfc09e3, add 

[GitHub] Fokko commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling

2019-02-27 Thread GitBox
Fokko commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper 
failed session commit handling
URL: https://github.com/apache/airflow/pull/4769#discussion_r260790149
 
 

 ##
 File path: airflow/jobs.py
 ##
 @@ -2168,147 +2168,149 @@ def _process_backfill_task_instances(self,
 # or leaf to root, as otherwise tasks might be
 # determined deadlocked while they are actually
 # waiting for their upstream to finish
+@provide_session
 
 Review comment:
   Personally I would prefer to have a `create_session`, since we commit the 
result on the last line anyway. If we do this properly, we shouldn't have to do 
`refresh_from_db` so often.
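
For contrast, a hedged sketch of the two session patterns being weighed here (both helpers live in `airflow.utils.db` at this point in time):

```python
from airflow.utils.db import create_session, provide_session

@provide_session
def update_state(ti, session=None):
    # the caller may pass in a long-lived session; when it commits
    # is decided elsewhere, so stale objects are possible
    session.merge(ti)

def update_state_scoped(ti):
    with create_session() as session:
        # committed and closed when the block exits, so ORM objects
        # stay fresh and refresh_from_db is rarely needed
        session.merge(ti)
```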


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3967) Avoid mixin Jinja and Javascript

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779397#comment-16779397
 ] 

ASF GitHub Bot commented on AIRFLOW-3967:
-

mik-laj commented on pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja 
directive from Javascript
URL: https://github.com/apache/airflow/pull/4787
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3967
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   The current state is unexpected because:
   
   * When Javascript is generated by the template engine, we are doing 
metaprogramming, which raises the complexity of the problem.
   * It does not allow testing the code in isolation.
   * Jinja is an HTML template engine. Using it for JS code may create a 
security risk.
   
   As a next step, I would like to:
   * move the JS code to a separate file
   * introduce linting for all JS/HTML files
   * extract inlined CSS to a separate file
   
   As a long-term goal, I would like to:
   * introduce visual regression and snapshot testing. If the JS code is in 
separate files, this will be relatively simple.
   * update a dependency: Bootstrap 4, including dropping glyphicons
   
   @jmcarp I saw that you changed HTML/JS files recently. What do you think 
about this change and the plans for the future? Does what I want to do make 
sense to you?
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Avoid mixin Jinja and Javascript
> 
>
> Key: AIRFLOW-3967
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3967
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kamil Bregula
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3967) Avoid mixin Jinja and Javascript

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779396#comment-16779396
 ] 

ASF GitHub Bot commented on AIRFLOW-3967:
-

mik-laj commented on pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja 
directive from Javascript
URL: https://github.com/apache/airflow/pull/4787
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Avoid mixin Jinja and Javascript
> 
>
> Key: AIRFLOW-3967
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3967
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kamil Bregula
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper failed session commit handling

2019-02-27 Thread GitBox
Fokko commented on a change in pull request #4769: [AIRFLOW-2511] Fix improper 
failed session commit handling
URL: https://github.com/apache/airflow/pull/4769#discussion_r260789285
 
 

 ##
 File path: airflow/jobs.py
 ##
 @@ -2168,147 +2168,149 @@ def _process_backfill_task_instances(self,
 # or leaf to root, as otherwise tasks might be
 # determined deadlocked while they are actually
 # waiting for their upstream to finish
+@provide_session
 
 Review comment:
   In general I think we should let SQLAlchemy do the pooling and close the 
sessions that we don't use anymore, instead of keeping them open and passing 
them around all the time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4637: [AIRFLOW-3793] Decommission configuration items for Flask-Admin web UI & related codes

2019-02-27 Thread GitBox
Fokko commented on issue #4637: [AIRFLOW-3793] Decommission configuration items 
for Flask-Admin web UI & related codes
URL: https://github.com/apache/airflow/pull/4637#issuecomment-467893390
 
 
   Restarted the failing tests


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #4781: [AIRFLOW-3962] Added graceful handling for creation of dag_run of a dag which doesn't have any task

2019-02-27 Thread GitBox
Fokko commented on a change in pull request #4781: [AIRFLOW-3962] Added 
graceful handling for creation of dag_run of a dag which doesn't have any task
URL: https://github.com/apache/airflow/pull/4781#discussion_r260786130
 
 

 ##
 File path: tests/test_jobs.py
 ##
 @@ -2665,6 +2665,22 @@ def test_scheduler_do_not_schedule_too_early(self):
 
 queue.put.assert_not_called()
 
+def test_scheduler_do_not_schedule_without_tasks(self):
+dag = DAG(
+dag_id='test_scheduler_do_not_schedule_without_tasks',
+start_date=DEFAULT_DATE)
+
+session = settings.Session()
 
 Review comment:
   Please use the `create_session()`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3543) rescheduled tasks block DAG deletion

2019-02-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779404#comment-16779404
 ] 

ASF subversion and git services commented on AIRFLOW-3543:
--

Commit 078ff765dbde1a47a0f9bcbd605c711e96201f79 in airflow's branch 
refs/heads/master from Stefan Seelmann
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=078ff76 ]

AIRFLOW-3543: Fix deletion of DAG with rescheduled tasks (#4646)



> rescheduled tasks block DAG deletion
> 
>
> Key: AIRFLOW-3543
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3543
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, database
> Environment: postgres 10 database
>Reporter: Christopher
>Assignee: Stefan Seelmann
>Priority: Critical
>
> This applies to current master branch after 
> [AIRFLOW-2747|https://github.com/apache/incubator-airflow/commit/dc59d7e2750aa90e099afad8689f2646f18f92a6]
>  was merged. 
> Once a sensor task is rescheduled, the task cannot be deleted from the DB due 
> to a foreign key constraint. This prevents deletion of tasks and DAGS. This 
> occurs regardless of whether the DAG is still running or whether the sensor 
> is actually rescheduled to run in the future or not (ie the task may complete 
> successfully but its entry still resides as a row in the task_reschedule 
> table.
>  
> I am running a postgres-backed airflow instance.
>  
> {code}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
>     context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.IntegrityError: update or delete on table "task_instance" violates foreign key constraint "task_reschedule_dag_task_date_fkey" on table "task_reschedule"
> DETAIL: Key (task_id, dag_id, execution_date)=(check_images_ready_11504, flight5105_v0.0.1, 2018-12-13 00:00:00+00) is still referenced from table "task_reschedule".
> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) update or delete on table "task_instance" violates foreign key constraint "task_reschedule_dag_task_date_fkey" on table "task_reschedule"
> {code}
>  
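
A hedged sketch of the shape of the fix (delete dependent {{task_reschedule}} rows before {{task_instance}} rows; the model names come from the traceback above, `dag_id` is illustrative, and the exact change is in PR #4646):

{code}
from airflow.models import TaskInstance, TaskReschedule
from airflow.utils.db import create_session

with create_session() as session:
    # remove the child rows first so the foreign key is never violated
    session.query(TaskReschedule).filter(
        TaskReschedule.dag_id == dag_id).delete(synchronize_session=False)
    session.query(TaskInstance).filter(
        TaskInstance.dag_id == dag_id).delete(synchronize_session=False)
{code}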



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3543) rescheduled tasks block DAG deletion

2019-02-27 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-3543.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> rescheduled tasks block DAG deletion
> 
>
> Key: AIRFLOW-3543
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3543
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, database
> Environment: postgres 10 database
>Reporter: Christopher
>Assignee: Stefan Seelmann
>Priority: Critical
> Fix For: 2.0.0
>
>
> This applies to current master branch after 
> [AIRFLOW-2747|https://github.com/apache/incubator-airflow/commit/dc59d7e2750aa90e099afad8689f2646f18f92a6]
>  was merged. 
> Once a sensor task is rescheduled, the task cannot be deleted from the DB due 
> to a foreign key constraint. This prevents deletion of tasks and DAGS. This 
> occurs regardless of whether the DAG is still running or whether the sensor 
> is actually rescheduled to run in the future or not (ie the task may complete 
> successfully but its entry still resides as a row in the task_reschedule 
> table.
>  
> I am running a postgres-backed airflow instance.
>  
> {code}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
>     context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.IntegrityError: update or delete on table "task_instance" violates foreign key constraint "task_reschedule_dag_task_date_fkey" on table "task_reschedule"
> DETAIL: Key (task_id, dag_id, execution_date)=(check_images_ready_11504, flight5105_v0.0.1, 2018-12-13 00:00:00+00) is still referenced from table "task_reschedule".
> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) update or delete on table "task_instance" violates foreign key constraint "task_reschedule_dag_task_date_fkey" on table "task_reschedule"
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko merged pull request #4646: AIRFLOW-3543: Fix deletion of DAG with rescheduled tasks

2019-02-27 Thread GitBox
Fokko merged pull request #4646: AIRFLOW-3543: Fix deletion of DAG with 
rescheduled tasks
URL: https://github.com/apache/airflow/pull/4646
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4646: AIRFLOW-3543: Fix deletion of DAG with rescheduled tasks

2019-02-27 Thread GitBox
Fokko commented on issue #4646: AIRFLOW-3543: Fix deletion of DAG with 
rescheduled tasks
URL: https://github.com/apache/airflow/pull/4646#issuecomment-467892646
 
 
   Thanks again @seelmann 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mik-laj opened a new pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja directive from Javascript

2019-02-27 Thread GitBox
mik-laj opened a new pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja 
directive from Javascript
URL: https://github.com/apache/airflow/pull/4787
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3967
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   The current state is unexpected because:
   
   * When Javascript is generated by the template engine, we are doing 
metaprogramming, which raises the complexity of the problem.
   * It does not allow testing the code in isolation.
   * Jinja is an HTML template engine. Using it for JS code may create a 
security risk.
   
   As a next step, I would like to:
   * move the JS code to a separate file
   * introduce linting for all JS/HTML files
   * extract inlined CSS to a separate file
   
   As a long-term goal, I would like to:
   * introduce visual regression and snapshot testing. If the JS code is in 
separate files, this will be relatively simple.
   * update a dependency: Bootstrap 4, including dropping glyphicons
   
   @jmcarp I saw that you changed HTML/JS files recently. What do you think 
about this change and the plans for the future? Does what I want to do make 
sense to you?
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mik-laj closed pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja directive from Javascript

2019-02-27 Thread GitBox
mik-laj closed pull request #4787: [AIRFLOW-3967][WIP] Extract Jinja directive 
from Javascript
URL: https://github.com/apache/airflow/pull/4787
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4768: [AIRFLOW-3800] run a dag at the beginning of the scheduled interval

2019-02-27 Thread GitBox
Fokko commented on issue #4768: [AIRFLOW-3800] run a dag at the beginning of 
the scheduled interval
URL: https://github.com/apache/airflow/pull/4768#issuecomment-467889897
 
 
   I also agree with @XD-DENG: if we want this, it should preferably be at the 
Airflow instance level.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4768: [AIRFLOW-3800] run a dag at the beginning of the scheduled interval

2019-02-27 Thread GitBox
Fokko commented on issue #4768: [AIRFLOW-3800] run a dag at the beginning of 
the scheduled interval
URL: https://github.com/apache/airflow/pull/4768#issuecomment-467889502
 
 
   I see where this comes from, but I think it might only increase the 
confusion. If you explain the philosophy behind kicking off the first DAG after 
dag_start + interval, it makes perfect sense.
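
An illustrative sketch (plain Python, not Airflow code) of the semantics being defended:

```python
from datetime import datetime, timedelta

start_date = datetime(2019, 1, 1)
schedule_interval = timedelta(days=1)

# a daily DAG first runs once the first full interval has passed;
# its execution_date labels the *start* of that interval
first_execution_date = start_date                      # 2019-01-01
first_run_happens_at = start_date + schedule_interval  # 2019-01-02
print(first_execution_date, first_run_happens_at)
```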


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-1847) Webhook Sensor

2019-02-27 Thread gsemet (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779369#comment-16779369
 ] 

gsemet commented on AIRFLOW-1847:
-

Yes, but since the web server is already a web server, I would like it (or a 
mini backend behind the extension) to provide a webhook for that. 
Also, burst treatment needs to be addressed somehow, so we would need a kind of 
queue. Overall, this very classic scheme should be handled by the webhook 
sensor proposal.
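
A hedged Python sketch of that queue-backed scheme (every name below is hypothetical; none of this exists in Airflow yet):

{code}
import queue

webhook_queue = queue.Queue()  # in-process stand-in for redis/amqp


def webhook_endpoint(payload):
    # called by the web server for each incoming request; kept cheap
    # so bursts do not slow down the webui
    webhook_queue.put(payload)


def poke():
    # sensor side: drain one payload per poke and hand it to the
    # processing DAG (trigger_processing_dag is a hypothetical helper)
    try:
        payload = webhook_queue.get_nowait()
    except queue.Empty:
        return False
    trigger_processing_dag(payload)
    return True
{code}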

> Webhook Sensor
> --
>
> Key: AIRFLOW-1847
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1847
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Reporter: gsemet
>Assignee: gsemet
>Priority: Minor
>  Labels: api, sensors, webhook
> Attachments: airflow-webhook-proposal.png
>
>
> h1. Webhook sensor
> May require a hook in the experimental API
> Register an api endpoint and wait for input on each.
> It is different from the {{dag_runs}} api in that the format is not 
> airflow-specific: it is just a callback web URL called by an external system 
> on some event, with its application-specific content. The content is really 
> important and needs to be sent to the dag (as XCom?)
> Use Case:
> - A Dag registers a WebHook sensor named {{}}
> - A custom endpoint is exposed at 
> {{http://myairflow.server/api/experimental/webhook/}}.
> - I set this URL in the external system I wish to use the webhook from. Ex: 
> github/gitlab project webhook
> - when the external application performs a request to this URL, this is 
> automatically sent to the WebHook sensor. For simplicity, we can have a 
> JsonWebHookSensor that would be able to carry any kind of json content.
> - the sensor's only job would normally be to trigger the execution of a DAG, 
> providing it with the json content as xcom.
> If there are several requests at the same time, the system should be scalable 
> enough not to die or slow down the webui. It is also possible to 
> instantiate an independent flask/gunicorn server to split the load. It would 
> mean it runs on another port, but this could be just an option in the 
> configuration file or even a completely independent application ({{airflow 
> webhookserver}}). I saw recent changes that integrated gunicorn into airflow 
> core; I guess it can help this use case.
> To handle the load, I think it is good that the API part just posts 
> the received request to an internal queue so the Sensor can handle them later 
> without risk of missing one.
> Documentation would be updated to describe the classic scheme to implement 
> this use case, which would look like:
> !airflow-webhook-proposal.png!
> I think it is good to split it into 2 DAGs: one for linear handling of the 
> messages and triggering a new DAG, and the processing DAG that might be 
> executed in parallel.
> h2. Example usage in Sensor DAG: trigger a DAG on GitHub Push Event
> {code}
> sensor = JsonWebHookSensor(
> task_id='my_task_id',
> name="on_github_push"
> )
> .. the user is responsible for triggering the processing DAG himself.
> {code}
> In my github project, I register the following URL in webhook page:
> {code}
> http://airflow.myserver.com/api/experimental/webhook/on_github_push
> {code}
> From now on, on push, github will send a [json with this 
> format|https://developer.github.com/v3/activity/events/types/#pushevent] to 
> the previous URL.
> The {{JsonWebHookSensor}} receives the payload, and a new dag is triggered in 
> this Sensing Dag.
> h2. Documentation update
> - add a new item in the [scheduling 
> documentation|https://pythonhosted.org/airflow/scheduler.html] about how to 
> trigger a DAG using a webhook
> - describe the sensing dag + processing dag scheme and provide the github use 
> case as a real-life example
> h2. Possible evolutions
> - use an external queue (redis, amqp) to handle lot of events
> - subscribe in a pub/sub system such as WAMP?
> - allow batch processing (trigger the processing DAG on n events or after a 
> timeout, gathering n messages altogether)
> - for higher throughput, kafka?
> - Security, authentication and other related subjects might be addressed in 
> another ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3867) Unification of the subpackage's name for GCP

2019-02-27 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-3867.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Unification of the subpackage's name for GCP
> 
>
> Key: AIRFLOW-3867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3867
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kamil Bregula
>Assignee: Kamil Bregula
>Priority: Minor
> Fix For: 2.0.0
>
>
> The names for packages for Azure and AWS have been standardized. Google Cloud 
> Platform should follow the trend.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3970) Misleading navigation on the DAG screen.

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779355#comment-16779355
 ] 

ASF GitHub Bot commented on AIRFLOW-3970:
-

mik-laj commented on pull request #4796: [AIRFLOW-3970] Pull out the action 
buttons from the tabs
URL: https://github.com/apache/airflow/pull/4796
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
 - https://issues.apache.org/jira/browse/AIRFLOW-3970
   
   ### Description
   
   Assigning actions to a tab element is misleading. Now, tabs are used only 
for browsing, and buttons are used to perform actions.
   
   Before:
   ![localhost_8004_tree_dag_id example_gcp_vision 
1](https://user-images.githubusercontent.com/12058428/53496242-e81ef780-3aa1-11e9-8082-4d57463e135a.png)
   
   After:
   ![localhost_8004_tree_dag_id 
example_gcp_vision](https://user-images.githubusercontent.com/12058428/53496105-9aa28a80-3aa1-11e9-87c1-3574b0f4badc.png)
   
   @ashb Can you look at it?
   
   ### Tests
   
   No applicable
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Misleading navigation on the DAG screen.
> 
>
> Key: AIRFLOW-3970
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3970
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kamil Bregula
>Assignee: Kamil Bregula
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3867) Unification of the subpackage's name for GCP

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779360#comment-16779360
 ] 

ASF GitHub Bot commented on AIRFLOW-3867:
-

Fokko commented on pull request #4690: [AIRFLOW-3867] Rename GCP's subpackage
URL: https://github.com/apache/airflow/pull/4690
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unification of the subpackage's name for GCP
> 
>
> Key: AIRFLOW-3867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3867
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kamil Bregula
>Assignee: Kamil Bregula
>Priority: Minor
>
> The names for packages for Azure and AWS have been standardized. Google Cloud 
> Platform should follow the trend.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko merged pull request #4690: [AIRFLOW-3867] Rename GCP's subpackage

2019-02-27 Thread GitBox
Fokko merged pull request #4690: [AIRFLOW-3867] Rename GCP's subpackage
URL: https://github.com/apache/airflow/pull/4690
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

