[jira] [Commented] (AIRFLOW-3326) High Sierra Complaining 'in progress in another thread when fork() was called'

2018-11-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695621#comment-16695621
 ] 

Iuliia Volkova commented on AIRFLOW-3326:
-

[~ryan.yuan] is this all the code of the plugin? 

from airflow.contrib.hooks.bigquery_hook import BigQueryHook

class BQHook(BigQueryHook):
    pass 
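
The 'in progress in another thread when fork() was called' messages quoted below 
come from macOS High Sierra's Objective-C fork-safety check, which aborts 
gunicorn's forked web-server workers. A commonly cited workaround (an assumption, 
not something confirmed in this thread) is to disable the check before starting 
the webserver:

{code:bash}
# Hypothetical workaround for macOS >= 10.13 fork-safety aborts; use with care.
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
airflow webserver
{code}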

> High Sierra Complaining 'in progress in another thread when fork() was called'
> --
>
> Key: AIRFLOW-3326
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3326
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
> Environment: macOS High Sierra 10.13.6 (17G65)
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Blocker
>
> Inside the plugins folder, I have a hook that is a child class of 
> BigQueryHook. 
> {code:java}
> // code
> from airflow.contrib.hooks.bigquery_hook import BigQueryHook
> class BQHook(BigQueryHook):
>     pass{code}
> When I run the airflow server, it keeps throwing messages complaining 'in 
> progress in another thread when fork() was called', and I can't use the web 
> server UI at all.
> {code:java}
> // messages from terminal
> objc[15098]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called.
> objc[15098]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called. We cannot safely call it or ignore it 
> in the fork() child process. Crashing instead. Set a breakpoint on 
> objc_initializeAfterForkError to debug.
> [2018-11-12 14:03:40 +1100] [15102] [INFO] Booting worker with pid: 15102
> [2018-11-12 14:03:40,792] {__init__.py:51} INFO - Using executor 
> SequentialExecutor
> [2018-11-12 14:03:40,851] {base_hook.py:83} INFO - Using connection to: 
> https://custom-data-z00100-dev.appspot.com/
> objc[15099]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called.
> objc[15099]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called. We cannot safely call it or ignore it 
> in the fork() child process. Crashing instead. Set a breakpoint on 
> objc_initializeAfterForkError to debug.
> [2018-11-12 14:03:40 +1100] [15103] [INFO] Booting worker with pid: 15103
> [2018-11-12 14:03:40,902] {base_hook.py:83} INFO - Using connection to: 
> https://custom-data-z00100-dev.appspot.com/
> objc[15101]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called.
> objc[15101]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called. We cannot safely call it or ignore it 
> in the fork() child process. Crashing instead. Set a breakpoint on 
> objc_initializeAfterForkError to debug.
> [2018-11-12 14:03:40 +1100] [15104] [INFO] Booting worker with pid: 15104
> [2018-11-12 14:03:40,948] {base_hook.py:83} INFO - Using connection to: 
> https://custom-data-z00100-dev.appspot.com/
> objc[15100]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called.
> objc[15100]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called. We cannot safely call it or ignore it 
> in the fork() child process. Crashing instead. Set a breakpoint on 
> objc_initializeAfterForkError to debug.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3118) DAGs not successful on new installation

2018-11-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695620#comment-16695620
 ] 

Iuliia Volkova commented on AIRFLOW-3118:
-

[~huyanhvn], it makes sense to open a new PR for this ticket: the previously 
mentioned PR is very old, needs to be rebased, and its author would have to return to it. 

> DAGs not successful on new installation
> ---
>
> Key: AIRFLOW-3118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3118
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.10.0
> Environment: Ubuntu 18.04
> Python 3.6
>Reporter: Brylie Christopher Oxley
>Assignee: Huy Nguyen
>Priority: Blocker
> Fix For: 1.10.2
>
> Attachments: Screenshot_20180926_161837.png, 
> image-2018-09-26-12-39-03-094.png
>
>
> When trying out Airflow, on localhost, none of the DAG runs are getting to 
> the 'success' state. They are getting stuck in 'running', or I manually label 
> them as failed:
> !image-2018-09-26-12-39-03-094.png!
> h2. Steps to reproduce
>  # create new conda environment
>  ** conda create -n airflow
>  ** source activate airflow
>  # install airflow
>  ** pip install apache-airflow
>  # initialize Airflow db
>  ** airflow initdb
>  # disable default paused setting in airflow.cfg
>  ** dags_are_paused_at_creation = False
>  # {color:#6a8759}run airflow and airflow scheduler (in separate 
> terminal){color}
>  ** {color:#6a8759}airflow scheduler{color}
>  ** {color:#6a8759}airflow webserver{color}
>  # {color:#6a8759}unpause example_bash_operator{color}
>  ** {color:#6a8759}airflow unpause example_bash_operator{color}
>  # {color:#6a8759}log in to Airflow UI{color}
>  # {color:#6a8759}turn on example_bash_operator{color}
>  # {color:#6a8759}click "Trigger DAG" in `example_bash_operator` row{color}
> h2. {color:#6a8759}Observed result{color}
> {color:#6a8759}The `example_bash_operator` never leaves the "running" 
> state.{color}
> h2. {color:#6a8759}Expected result{color}
> {color:#6a8759}The `example_bash_operator` would quickly enter the "success" 
> state{color}
>  
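
The PR referenced in this ticket (#2635, per the GitBox messages later in this 
digest) is titled "Fix scheduler to pick up example DAGs without other DAGs", so a 
plausible workaround (an assumption, not verified in this thread) is to place at 
least one real DAG file in the configured dags_folder, e.g.:

{code:python}
# minimal_dag.py -- hypothetical file dropped into the configured dags_folder
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id='minimal_dag',
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
)

DummyOperator(task_id='noop', dag=dag)
{code}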



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] xnuinside edited a comment on issue #2635: [AIRFLOW-1561] Fix scheduler to pick up example DAGs without other DAGs

2018-11-21 Thread GitBox
xnuinside edited a comment on issue #2635: [AIRFLOW-1561] Fix scheduler to pick 
up example DAGs without other DAGs
URL: 
https://github.com/apache/incubator-airflow/pull/2635#issuecomment-440936233
 
 
   @mrkm4ntr, will you update the PR and squash the commits? 
   @kaxil, @ashb, can you review this PR? It relates to several tickets, 
including https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-3118 
   
   If the author does not answer, maybe it makes sense to just reopen it as a PR 
for https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-3118 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] xnuinside commented on issue #2635: [AIRFLOW-1561] Fix scheduler to pick up example DAGs without other DAGs

2018-11-21 Thread GitBox
xnuinside commented on issue #2635: [AIRFLOW-1561] Fix scheduler to pick up 
example DAGs without other DAGs
URL: 
https://github.com/apache/incubator-airflow/pull/2635#issuecomment-440936233
 
 
   @mrkm4ntr, will you update the PR and squash the commits? 
   @kaxil, @ashb, can you review this PR? It relates to several tickets, 
including https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-3118 
   
   If the author does not answer, maybe it makes sense to just reopen it as a PR 
for https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-3118 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3382) Fix incorrect docstring in DatastoreHook

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695461#comment-16695461
 ] 

ASF GitHub Bot commented on AIRFLOW-3382:
-

ryanyuan opened a new pull request #4222: [AIRFLOW-3382] Fix incorrect 
docstring in DatastoreHook
URL: https://github.com/apache/incubator-airflow/pull/4222
 
 
   Correct docstring in DatastoreHook
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following 
[Airflow-3382](https://issues.apache.org/jira/browse/AIRFLOW-3382) issues and 
references them in the PR title. 
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   Changed 'Google Cloud Storage' to 'Google Cloud Datastore' in 
DatastoreHook
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   No tests for this
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix incorrect docstring in DatastoreHook
> 
>
> Key: AIRFLOW-3382
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3382
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Minor
>
> One of the docstrings in DatastoreHook incorrectly states 'Google Cloud 
> Storage' instead of 'Google Cloud Datastore'.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] ryanyuan opened a new pull request #4222: [AIRFLOW-3382] Fix incorrect docstring in DatastoreHook

2018-11-21 Thread GitBox
ryanyuan opened a new pull request #4222: [AIRFLOW-3382] Fix incorrect 
docstring in DatastoreHook
URL: https://github.com/apache/incubator-airflow/pull/4222
 
 
   Correct docstring in DatastoreHook
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following 
[Airflow-3382](https://issues.apache.org/jira/browse/AIRFLOW-3382) issues and 
references them in the PR title. 
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   Changed 'Google Cloud Storage' to 'Google Cloud Datastore' in 
DatastoreHook
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   No tests for this
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3382) Fix incorrect docstring in DatastoreHook

2018-11-21 Thread Ryan Yuan (JIRA)
Ryan Yuan created AIRFLOW-3382:
--

 Summary: Fix incorrect docstring in DatastoreHook
 Key: AIRFLOW-3382
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3382
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Ryan Yuan
Assignee: Ryan Yuan


One of the docstrings in DatastoreHook incorrectly states 'Google Cloud 
Storage' instead of 'Google Cloud Datastore'.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (AIRFLOW-3382) Fix incorrect docstring in DatastoreHook

2018-11-21 Thread Ryan Yuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3382 started by Ryan Yuan.
--
> Fix incorrect docstring in DatastoreHook
> 
>
> Key: AIRFLOW-3382
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3382
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Minor
>
> One of the docstrings in DatastoreHook incorrectly states 'Google Cloud 
> Storage' instead of 'Google Cloud Datastore'.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] codecov-io edited a comment on issue #4207: [AIRFLOW-3367] Run celery integration test with redis broker.

2018-11-21 Thread GitBox
codecov-io edited a comment on issue #4207: [AIRFLOW-3367] Run celery 
integration test with redis broker.
URL: 
https://github.com/apache/incubator-airflow/pull/4207#issuecomment-439668923
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4207?src=pr&el=h1) Report
   > Merging [#4207](https://codecov.io/gh/apache/incubator-airflow/pull/4207?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/6dee66f4662ac3750d902e2047c004c528bfb917?src=pr&el=desc) will **increase** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4207/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4207?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master    #4207      +/-   ##
   ==========================================
   + Coverage   77.82%   77.82%    +<.01%     
   ==========================================
     Files         201      201              
     Lines       16341    16341              
   ==========================================
   + Hits        12717    12718        +1     
   + Misses       3624     3623        -1
   ```
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4207?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/4207/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `92.33% <0%> (+0.04%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4207?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4207?src=pr&el=footer). Last update [6dee66f...0c4b7a4](https://codecov.io/gh/apache/incubator-airflow/pull/4207?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (AIRFLOW-3318) Add a function to BigQueryHook to check the existence of a dataset.

2018-11-21 Thread Ryan Yuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Yuan closed AIRFLOW-3318.
--
Resolution: Won't Do

> Add a function to BigQueryHook to check the existence of a dataset.
> ---
>
> Key: AIRFLOW-3318
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3318
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, gcp, hooks
>Affects Versions: 1.10.0
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Major
>
> To check the existence of a dataset in BigQuery, the existing BigQueryHook only 
> supports either 1) using get_datasets_list() to get all the datasets and then 
> searching for the target dataset in the list; or 2) using get_dataset().
> However, with get_dataset(), it raises AirflowException whenever an HttpError is 
> received, so it has no way to determine whether the dataset exists or not.
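
For reference, a minimal sketch of workaround 1 described above, using 
get_datasets_list() (the connection id and project/dataset names here are assumed):

{code:python}
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

# Assumes a configured 'bigquery_default' connection.
cursor = BigQueryHook(bigquery_conn_id='bigquery_default').get_conn().cursor()
datasets = cursor.get_datasets_list(project_id='my-project') or []
dataset_exists = any(
    d['datasetReference']['datasetId'] == 'my_dataset' for d in datasets
)
{code}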



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3381) KubernetesPodOperator: Use secretKeyRef or configMapKeyRef in env_vars

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695407#comment-16695407
 ] 

ASF GitHub Bot commented on AIRFLOW-3381:
-

abrenaut opened a new pull request #4221: [AIRFLOW-3381] Allow use of 
secretKeyRef or configMapKeyRef in env_vars
URL: https://github.com/apache/incubator-airflow/pull/4221
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3381
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Updated the KubernetesRequestFactory.extract_env_and_secrets() static method 
to support `valueFrom` for environment variables.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> KubernetesPodOperator: Use secretKeyRef or configMapKeyRef in env_vars
> --
>
> Key: AIRFLOW-3381
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3381
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: kubernetes
>Affects Versions: 1.10.0
>Reporter: Arthur Brenaut
>Priority: Major
>
> The env_vars attribute of the KubernetesPodOperator allows passing 
> environment variables as strings, but it doesn't allow passing a value from a 
> configmap or a secret.
> I'd like to be able to do
> {code:java}
> modeling = KubernetesPodOperator(
>     ...
>     env_vars={
>         'MY_ENV_VAR': {
>             'valueFrom': {
>                 'secretKeyRef': {
>                     'name': 'an-already-existing-secret',
>                     'key': 'key',
>                 }
>             }
>         },
>     },
>     ...
> )
> {code}
> Right now, if I do that, Airflow generates the following config
> {code:java}
> - name: MY_ENV_VAR
>   value:
>     valueFrom:
>       configMapKeyRef:
>         name: an-already-existing-secret
>         key: key
> {code}
> instead of 
> {code:java}
> - name: MY_ENV_VAR
>   valueFrom:
>     configMapKeyRef:
>       name: an-already-existing-secret
>       key: key
> {code}
> The _extract_env_and_secrets_ method of the _KubernetesRequestFactory_ could 
> check if the value is a dictionary and use it directly.
>  
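
A minimal sketch of the dictionary check proposed above (hypothetical, not the 
merged patch; the names loosely follow KubernetesRequestFactory but are assumed):

{code:python}
def extract_env(env_vars):
    # Hypothetical sketch: pass dict values through as full specs,
    # e.g. {'valueFrom': {'secretKeyRef': ...}}; wrap plain strings.
    env = []
    for name, value in env_vars.items():
        if isinstance(value, dict):
            env.append(dict(name=name, **value))
        else:
            env.append({'name': name, 'value': value})
    return env
{code}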



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] abrenaut opened a new pull request #4221: [AIRFLOW-3381] Allow use of secretKeyRef or configMapKeyRef in env_vars

2018-11-21 Thread GitBox
abrenaut opened a new pull request #4221: [AIRFLOW-3381] Allow use of 
secretKeyRef or configMapKeyRef in env_vars
URL: https://github.com/apache/incubator-airflow/pull/4221
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3381
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Updated the KubernetesRequestFactory.extract_env_and_secrets() static method 
to support `valueFrom` for environment variables.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-3213) Create ADLS to GCS operator

2018-11-21 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-3213.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Resolved by https://github.com/apache/incubator-airflow/pull/4134

> Create ADLS to GCS operator 
> 
>
> Key: AIRFLOW-3213
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3213
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp, operators
>Reporter: Brandon Kvarda
>Assignee: Brandon Kvarda
>Priority: Minor
> Fix For: 2.0.0
>
>
> Create ADLS to GCS operator that supports copying of files from ADLS to GCS



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3318) Add a function to BigQueryHook to check the existence of a dataset.

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695359#comment-16695359
 ] 

ASF GitHub Bot commented on AIRFLOW-3318:
-

ryanyuan closed pull request #4164: [AIRFLOW-3318] BigQueryHook check if 
dataset exists
URL: https://github.com/apache/incubator-airflow/pull/4164
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index d300dbe6b7..1c6f7329cd 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -1515,6 +1515,26 @@ def get_datasets_list(self, project_id=None):
 
         return datasets_list
 
+    def dataset_exists(self, project_id, dataset_id):
+        """
+        Checks for the existence of a dataset in Google BigQuery.
+
+        :param project_id: The Google cloud project in which to look for the
+            dataset. The connection supplied to the hook must provide access to
+            the specified project.
+        :type project_id: str
+        :param dataset_id: The name of the dataset to check the existence of.
+        :type dataset_id: str
+        """
+        try:
+            self.service.datasets().get(
+                projectId=project_id, datasetId=dataset_id).execute()
+            return True
+        except errors.HttpError as e:
+            if e.resp['status'] == '404':
+                return False
+            raise
+
 
 class BigQueryCursor(BigQueryBaseCursor):
     """
diff --git a/tests/contrib/hooks/test_bigquery_hook.py b/tests/contrib/hooks/test_bigquery_hook.py
index 8f350ff2ee..ad0bef1694 100644
--- a/tests/contrib/hooks/test_bigquery_hook.py
+++ b/tests/contrib/hooks/test_bigquery_hook.py
@@ -22,13 +22,15 @@
 
 from google.auth.exceptions import GoogleAuthError
 import mock
-
+from apiclient import errors
 from airflow.contrib.hooks import bigquery_hook as hook
 from airflow.contrib.hooks.bigquery_hook import _cleanse_time_partitioning, \
     _validate_value, _api_resource_configs_duplication_check
 
 bq_available = True
 
+EMPTY_CONTENT = ''.encode('utf8')
+
 try:
     hook.BigQueryHook().get_service()
 except GoogleAuthError:
@@ -401,6 +403,49 @@ def test_get_datasets_list(self):
             project_id=project_id)
         self.assertEqual(result, expected_result['datasets'])
 
+    def test_check_dataset_exists(self):
+        dataset_id = "dataset_test"
+        project_id = "project-test"
+        dataset_result = {
+            "kind": "bigquery#dataset",
+            "location": "US",
+            "id": "{}:{}".format(project_id, dataset_id),
+            "datasetReference": {
+                "projectId": project_id,
+                "datasetId": dataset_id
+            }
+        }
+
+        mocked = mock.Mock()
+        with mock.patch.object(
+            hook.BigQueryBaseCursor(mocked, project_id).service, "datasets"
+        ) as mock_service:
+            mock_service.return_value.get(
+                datasetId=dataset_id, projectId=project_id
+            ).execute.return_value = dataset_result
+            result = hook.BigQueryBaseCursor(
+                mocked, "test_check_dataset_exists"
+            ).dataset_exists(dataset_id=dataset_id, project_id=project_id)
+            self.assertTrue(result)
+
+    def test_check_dataset_exists_not_exist(self):
+        dataset_id = "dataset_test"
+        project_id = "project_test"
+
+        mocked = mock.Mock()
+        with mock.patch.object(
+            hook.BigQueryBaseCursor(mocked, project_id).service, "datasets"
+        ) as mock_service:
+            (
+                mock_service.return_value.get(
+                    dataset_id=dataset_id, project_id=project_id
+                ).execute.side_effect
+            ) = errors.HttpError(resp={"status": "404"}, content=EMPTY_CONTENT)
+            result = hook.BigQueryBaseCursor(
+                mocked, "test_check_dataset_exists_not_found"
+            ).dataset_exists(dataset_id=dataset_id, project_id=project_id)
+            self.assertFalse(result)
+
 
 class TestTimePartitioningInRunJob(unittest.TestCase):
     @mock.patch("airflow.contrib.hooks.bigquery_hook.LoggingMixin")


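For context, a short sketch of how the proposed dataset_exists() method would be 
used had the patch above been applied (note the JIRA was closed as "Won't Do"; 
the connection id and project/dataset names here are assumed):

{code:python}
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

cursor = BigQueryHook(bigquery_conn_id='bigquery_default').get_conn().cursor()
if not cursor.dataset_exists(project_id='my-project', dataset_id='my_dataset'):
    raise ValueError("Dataset my-project:my_dataset does not exist")
{code}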
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add a function to BigQueryHook to check the existence of a dataset.
> ---
>
>   

[GitHub] ryanyuan closed pull request #4164: [AIRFLOW-3318] BigQueryHook check if dataset exists

2018-11-21 Thread GitBox
ryanyuan closed pull request #4164: [AIRFLOW-3318] BigQueryHook check if 
dataset exists
URL: https://github.com/apache/incubator-airflow/pull/4164
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index d300dbe6b7..1c6f7329cd 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -1515,6 +1515,26 @@ def get_datasets_list(self, project_id=None):
 
         return datasets_list
 
+    def dataset_exists(self, project_id, dataset_id):
+        """
+        Checks for the existence of a dataset in Google BigQuery.
+
+        :param project_id: The Google cloud project in which to look for the
+            dataset. The connection supplied to the hook must provide access to
+            the specified project.
+        :type project_id: str
+        :param dataset_id: The name of the dataset to check the existence of.
+        :type dataset_id: str
+        """
+        try:
+            self.service.datasets().get(
+                projectId=project_id, datasetId=dataset_id).execute()
+            return True
+        except errors.HttpError as e:
+            if e.resp['status'] == '404':
+                return False
+            raise
+
 
 class BigQueryCursor(BigQueryBaseCursor):
     """
diff --git a/tests/contrib/hooks/test_bigquery_hook.py b/tests/contrib/hooks/test_bigquery_hook.py
index 8f350ff2ee..ad0bef1694 100644
--- a/tests/contrib/hooks/test_bigquery_hook.py
+++ b/tests/contrib/hooks/test_bigquery_hook.py
@@ -22,13 +22,15 @@
 
 from google.auth.exceptions import GoogleAuthError
 import mock
-
+from apiclient import errors
 from airflow.contrib.hooks import bigquery_hook as hook
 from airflow.contrib.hooks.bigquery_hook import _cleanse_time_partitioning, \
     _validate_value, _api_resource_configs_duplication_check
 
 bq_available = True
 
+EMPTY_CONTENT = ''.encode('utf8')
+
 try:
     hook.BigQueryHook().get_service()
 except GoogleAuthError:
@@ -401,6 +403,49 @@ def test_get_datasets_list(self):
             project_id=project_id)
         self.assertEqual(result, expected_result['datasets'])
 
+    def test_check_dataset_exists(self):
+        dataset_id = "dataset_test"
+        project_id = "project-test"
+        dataset_result = {
+            "kind": "bigquery#dataset",
+            "location": "US",
+            "id": "{}:{}".format(project_id, dataset_id),
+            "datasetReference": {
+                "projectId": project_id,
+                "datasetId": dataset_id
+            }
+        }
+
+        mocked = mock.Mock()
+        with mock.patch.object(
+            hook.BigQueryBaseCursor(mocked, project_id).service, "datasets"
+        ) as mock_service:
+            mock_service.return_value.get(
+                datasetId=dataset_id, projectId=project_id
+            ).execute.return_value = dataset_result
+            result = hook.BigQueryBaseCursor(
+                mocked, "test_check_dataset_exists"
+            ).dataset_exists(dataset_id=dataset_id, project_id=project_id)
+            self.assertTrue(result)
+
+    def test_check_dataset_exists_not_exist(self):
+        dataset_id = "dataset_test"
+        project_id = "project_test"
+
+        mocked = mock.Mock()
+        with mock.patch.object(
+            hook.BigQueryBaseCursor(mocked, project_id).service, "datasets"
+        ) as mock_service:
+            (
+                mock_service.return_value.get(
+                    dataset_id=dataset_id, project_id=project_id
+                ).execute.side_effect
+            ) = errors.HttpError(resp={"status": "404"}, content=EMPTY_CONTENT)
+            result = hook.BigQueryBaseCursor(
+                mocked, "test_check_dataset_exists_not_found"
+            ).dataset_exists(dataset_id=dataset_id, project_id=project_id)
+            self.assertFalse(result)
+
 
 class TestTimePartitioningInRunJob(unittest.TestCase):
     @mock.patch("airflow.contrib.hooks.bigquery_hook.LoggingMixin")


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4197: [AIRFLOW-3348] update run statistics on dag refresh

2018-11-21 Thread GitBox
feng-tao commented on issue #4197:  [AIRFLOW-3348] update run statistics on dag 
refresh
URL: 
https://github.com/apache/incubator-airflow/pull/4197#issuecomment-440849220
 
 
   @ms32035, do you need to do it for the old UI as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on issue #4190: [AIRFLOW-3368] Remove {table} parameter on COPY command

2018-11-21 Thread GitBox
kaxil commented on issue #4190: [AIRFLOW-3368] Remove {table} parameter on COPY 
command
URL: 
https://github.com/apache/incubator-airflow/pull/4190#issuecomment-440848674
 
 
   Please follow the commit guidelines


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3371) BigQueryHook's Ability to Create View

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695349#comment-16695349
 ] 

ASF GitHub Bot commented on AIRFLOW-3371:
-

kaxil closed pull request #4213: [AIRFLOW-3371] BigQueryHook's Ability to 
Create View
URL: https://github.com/apache/incubator-airflow/pull/4213
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index a03429e155..c7324adde4 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -218,10 +218,11 @@ def create_empty_table(self,
                            table_id,
                            schema_fields=None,
                            time_partitioning=None,
-                           labels=None
-                           ):
+                           labels=None,
+                           view=None):
         """
         Creates a new, empty table in the dataset.
+        To create a view, which is defined by a SQL query, parse a dictionary to 'view' kwarg
 
         :param project_id: The project to create the table into.
         :type project_id: str
@@ -246,6 +247,17 @@ def create_empty_table(self,
         .. seealso::
             https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#timePartitioning
         :type time_partitioning: dict
+        :param view: [Optional] A dictionary containing definition for the view.
+            If set, it will create a view instead of a table:
+            https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#view
+        :type view: dict
+
+        **Example**: ::
+
+            view = {
+                "query": "SELECT * FROM `test-project-id.test_dataset_id.test_table_prefix*` LIMIT 1000",
+                "useLegacySql": False
+            }
 
         :return:
         """
@@ -267,6 +279,9 @@ def create_empty_table(self,
         if labels:
             table_resource['labels'] = labels
 
+        if view:
+            table_resource['view'] = view
+
         self.log.info('Creating Table %s:%s.%s',
                       project_id, dataset_id, table_id)
 
diff --git a/tests/contrib/hooks/test_bigquery_hook.py b/tests/contrib/hooks/test_bigquery_hook.py
index 9099dcbbb7..8c59116c85 100644
--- a/tests/contrib/hooks/test_bigquery_hook.py
+++ b/tests/contrib/hooks/test_bigquery_hook.py
@@ -22,6 +22,7 @@
 
 from google.auth.exceptions import GoogleAuthError
 import mock
+from apiclient.errors import HttpError
 
 from airflow.contrib.hooks import bigquery_hook as hook
 from airflow.contrib.hooks.bigquery_hook import _cleanse_time_partitioning, \
@@ -344,6 +345,48 @@ def test_insert_all_fail(self, run_with_config):
             cursor.insert_all(project_id, dataset_id, table_id,
                               rows, fail_on_error=True)
 
+    @mock.patch.object(hook.BigQueryBaseCursor, 'run_with_configuration')
+    def test_create_view_fails_on_exception(self, run_with_config):
+        project_id = 'bq-project'
+        dataset_id = 'bq_dataset'
+        table_id = 'bq_table_view'
+        view = {
+            'incorrect_key': 'SELECT * FROM `test-project-id.test_dataset_id.test_table_prefix*`',
+            "useLegacySql": False
+        }
+
+        mock_service = mock.Mock()
+        method = (mock_service.tables.return_value.insert)
+        method.return_value.execute.side_effect = HttpError(
+            resp={'status': '400'}, content=b'Query is required for views')
+        cursor = hook.BigQueryBaseCursor(mock_service, project_id)
+        with self.assertRaises(Exception):
+            cursor.create_empty_table(project_id, dataset_id, table_id,
+                                      view=view)
+
+    @mock.patch.object(hook.BigQueryBaseCursor, 'run_with_configuration')
+    def test_create_view(self, run_with_config):
+        project_id = 'bq-project'
+        dataset_id = 'bq_dataset'
+        table_id = 'bq_table_view'
+        view = {
+            'query': 'SELECT * FROM `test-project-id.test_dataset_id.test_table_prefix*`',
+            "useLegacySql": False
+        }
+
+        mock_service = mock.Mock()
+        method = (mock_service.tables.return_value.insert)
+        cursor = hook.BigQueryBaseCursor(mock_service, project_id)
+        cursor.create_empty_table(project_id, dataset_id, table_id,
+                                  view=view)
+        body = {
+            'tableReference': {
+                'tableId': table_id
+            },
+            'view': view
+        }
+        method.assert_called_once_with(projectId=project_id, datasetId=dataset_id, body=body)
+
 
 class 

[GitHub] kaxil closed pull request #4134: [AIRFLOW-3213] Create ADLS to GCS operator

2018-11-21 Thread GitBox
kaxil closed pull request #4134: [AIRFLOW-3213] Create ADLS to GCS operator
URL: https://github.com/apache/incubator-airflow/pull/4134
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/adls_to_gcs.py b/airflow/contrib/operators/adls_to_gcs.py
new file mode 100644
index 0000000000..affbd45626
--- /dev/null
+++ b/airflow/contrib/operators/adls_to_gcs.py
@@ -0,0 +1,146 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+from tempfile import NamedTemporaryFile
+
+from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
+from airflow.contrib.operators.adls_list_operator import AzureDataLakeStorageListOperator
+from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook, _parse_gcs_url
+from airflow.utils.decorators import apply_defaults
+
+
+class AdlsToGoogleCloudStorageOperator(AzureDataLakeStorageListOperator):
+    """
+    Synchronizes an Azure Data Lake Storage path with a GCS bucket
+
+    :param src_adls: The Azure Data Lake path to find the objects (templated)
+    :type src_adls: str
+    :param dest_gcs: The Google Cloud Storage bucket and prefix to
+        store the objects. (templated)
+    :type dest_gcs: str
+    :param replace: If true, replaces same-named files in GCS
+    :type replace: bool
+    :param azure_data_lake_conn_id: The connection ID to use when
+        connecting to Azure Data Lake Storage.
+    :type azure_data_lake_conn_id: str
+    :param google_cloud_storage_conn_id: The connection ID to use when
+        connecting to Google Cloud Storage.
+    :type google_cloud_storage_conn_id: str
+    :param delegate_to: The account to impersonate, if any.
+        For this to work, the service account making the request must have
+        domain-wide delegation enabled.
+    :type delegate_to: str
+
+    **Examples**:
+        The following Operator would copy a single file named
+        ``hello/world.avro`` from ADLS to the GCS bucket ``mybucket``. Its full
+        resulting gcs path will be ``gs://mybucket/hello/world.avro`` ::
+            copy_single_file = AdlsToGoogleCloudStorageOperator(
+                task_id='copy_single_file',
+                src_adls='hello/world.avro',
+                dest_gcs='gs://mybucket',
+                replace=False,
+                azure_data_lake_conn_id='azure_data_lake_default',
+                google_cloud_storage_conn_id='google_cloud_default'
+            )
+
+        The following Operator would copy all parquet files from ADLS
+        to the GCS bucket ``mybucket``. ::
+            copy_all_files = AdlsToGoogleCloudStorageOperator(
+                task_id='copy_all_files',
+                src_adls='*.parquet',
+                dest_gcs='gs://mybucket',
+                replace=False,
+                azure_data_lake_conn_id='azure_data_lake_default',
+                google_cloud_storage_conn_id='google_cloud_default'
+            )
+
+        The following Operator would copy all parquet files from ADLS
+        path ``/hello/world`` to the GCS bucket ``mybucket``. ::
+            copy_world_files = AdlsToGoogleCloudStorageOperator(
+                task_id='copy_world_files',
+                src_adls='hello/world/*.parquet',
+                dest_gcs='gs://mybucket',
+                replace=False,
+                azure_data_lake_conn_id='azure_data_lake_default',
+                google_cloud_storage_conn_id='google_cloud_default'
+            )
+    """
+    template_fields = ('src_adls', 'dest_gcs')
+    ui_color = '#f0eee4'
+
+    @apply_defaults
+    def __init__(self,
+                 src_adls,
+                 dest_gcs,
+                 azure_data_lake_conn_id,
+                 google_cloud_storage_conn_id,
+                 delegate_to=None,
+                 replace=False,
+                 *args,
+                 **kwargs):
+
+        super(AdlsToGoogleCloudStorageOperator, self).__init__(
+            path=src_adls,
+

[jira] [Commented] (AIRFLOW-3213) Create ADLS to GCS operator

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695352#comment-16695352
 ] 

ASF GitHub Bot commented on AIRFLOW-3213:
-

kaxil closed pull request #4134: [AIRFLOW-3213] Create ADLS to GCS operator
URL: https://github.com/apache/incubator-airflow/pull/4134
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/adls_to_gcs.py b/airflow/contrib/operators/adls_to_gcs.py
new file mode 100644
index 0000000000..affbd45626
--- /dev/null
+++ b/airflow/contrib/operators/adls_to_gcs.py
@@ -0,0 +1,146 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+from tempfile import NamedTemporaryFile
+
+from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
+from airflow.contrib.operators.adls_list_operator import AzureDataLakeStorageListOperator
+from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook, _parse_gcs_url
+from airflow.utils.decorators import apply_defaults
+
+
+class AdlsToGoogleCloudStorageOperator(AzureDataLakeStorageListOperator):
+    """
+    Synchronizes an Azure Data Lake Storage path with a GCS bucket
+
+    :param src_adls: The Azure Data Lake path to find the objects (templated)
+    :type src_adls: str
+    :param dest_gcs: The Google Cloud Storage bucket and prefix to
+        store the objects. (templated)
+    :type dest_gcs: str
+    :param replace: If true, replaces same-named files in GCS
+    :type replace: bool
+    :param azure_data_lake_conn_id: The connection ID to use when
+        connecting to Azure Data Lake Storage.
+    :type azure_data_lake_conn_id: str
+    :param google_cloud_storage_conn_id: The connection ID to use when
+        connecting to Google Cloud Storage.
+    :type google_cloud_storage_conn_id: str
+    :param delegate_to: The account to impersonate, if any.
+        For this to work, the service account making the request must have
+        domain-wide delegation enabled.
+    :type delegate_to: str
+
+    **Examples**:
+        The following Operator would copy a single file named
+        ``hello/world.avro`` from ADLS to the GCS bucket ``mybucket``. Its full
+        resulting gcs path will be ``gs://mybucket/hello/world.avro`` ::
+            copy_single_file = AdlsToGoogleCloudStorageOperator(
+                task_id='copy_single_file',
+                src_adls='hello/world.avro',
+                dest_gcs='gs://mybucket',
+                replace=False,
+                azure_data_lake_conn_id='azure_data_lake_default',
+                google_cloud_storage_conn_id='google_cloud_default'
+            )
+
+        The following Operator would copy all parquet files from ADLS
+        to the GCS bucket ``mybucket``. ::
+            copy_all_files = AdlsToGoogleCloudStorageOperator(
+                task_id='copy_all_files',
+                src_adls='*.parquet',
+                dest_gcs='gs://mybucket',
+                replace=False,
+                azure_data_lake_conn_id='azure_data_lake_default',
+                google_cloud_storage_conn_id='google_cloud_default'
+            )
+
+        The following Operator would copy all parquet files from ADLS
+        path ``/hello/world`` to the GCS bucket ``mybucket``. ::
+            copy_world_files = AdlsToGoogleCloudStorageOperator(
+                task_id='copy_world_files',
+                src_adls='hello/world/*.parquet',
+                dest_gcs='gs://mybucket',
+                replace=False,
+                azure_data_lake_conn_id='azure_data_lake_default',
+                google_cloud_storage_conn_id='google_cloud_default'
+            )
+    """
+    template_fields = ('src_adls', 'dest_gcs')
+    ui_color = '#f0eee4'
+
+    @apply_defaults
+    def __init__(self,
+                 src_adls,
+                 dest_gcs,
+                 azure_data_lake_conn_id,
+                 google_cloud_storage_conn_id,
[GitHub] kaxil commented on issue #4164: [AIRFLOW-3318] BigQueryHook check if dataset exists

2018-11-21 Thread GitBox
kaxil commented on issue #4164: [AIRFLOW-3318] BigQueryHook check if dataset 
exists
URL: 
https://github.com/apache/incubator-airflow/pull/4164#issuecomment-440845912
 
 
   Correct @xnuinside 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-3371) BigQueryHook's Ability to Create View

2018-11-21 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-3371.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Resolved by https://github.com/apache/incubator-airflow/pull/4213

> BigQueryHook's Ability to Create View
> -
>
> Key: AIRFLOW-3371
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3371
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Major
> Fix For: 2.0.0
>
>
> Modify *BigQueryBaseCursor.create_empty_table()* to take in an optional 
> 'view' parameter to create a view in BigQuery.
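
A short usage sketch of the new kwarg (the cursor setup and project/dataset/table 
names here are assumed), mirroring the docstring example added in PR #4213:

{code:python}
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

cursor = BigQueryHook(bigquery_conn_id='bigquery_default').get_conn().cursor()
cursor.create_empty_table(
    project_id='my-project',
    dataset_id='my_dataset',
    table_id='my_view',
    view={
        "query": "SELECT * FROM `my-project.my_dataset.my_table` LIMIT 1000",
        "useLegacySql": False,
    },
)
{code}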



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3381) KubernetesPodOperator: Use secretKeyRef or configMapKeyRef in env_vars

2018-11-21 Thread Arthur Brenaut (JIRA)
Arthur Brenaut created AIRFLOW-3381:
---

 Summary: KubernetesPodOperator: Use secretKeyRef or 
configMapKeyRef in env_vars
 Key: AIRFLOW-3381
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3381
 Project: Apache Airflow
  Issue Type: Improvement
  Components: kubernetes
Affects Versions: 1.10.0
Reporter: Arthur Brenaut


The env_vars attribute of the KubernetesPodOperator allows passing environment 
variables as strings, but it doesn't allow passing a value from a configmap or a 
secret.

I'd like to be able to do
{code:java}
modeling = KubernetesPodOperator(
    ...
    env_vars={
        'MY_ENV_VAR': {
            'valueFrom': {
                'secretKeyRef': {
                    'name': 'an-already-existing-secret',
                    'key': 'key',
                }
            }
        },
    },
    ...
)
{code}
Right now, if I do that, Airflow generates the following config
{code:java}
- name: MY_ENV_VAR
  value:
    valueFrom:
      configMapKeyRef:
        name: an-already-existing-secret
        key: key
{code}
instead of 
{code:java}
- name: MY_ENV_VAR
  valueFrom:
    configMapKeyRef:
      name: an-already-existing-secret
      key: key
{code}
The _extract_env_and_secrets_ method of the _KubernetesRequestFactory_ could 
check if the value is a dictionary and use it directly.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


svn commit: r31028 - /release/incubator/airflow/1.10.1-incubating/

2018-11-21 Thread ash
Author: ash
Date: Wed Nov 21 22:18:50 2018
New Revision: 31028

Log:
Apache Airflow (Incubating) 1.10.1 release

Added:
release/incubator/airflow/1.10.1-incubating/

release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz
   (with props)

release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz.asc

release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz.sha512

release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-source.tar.gz
   (with props)

release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-source.tar.gz.asc

release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-source.tar.gz.sha512

Added: 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz
==
Binary file - no diff available.

Propchange: 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz
--
svn:mime-type = application/octet-stream

Added: 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz.asc
==
--- 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz.asc
 (added)
+++ 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz.asc
 Wed Nov 21 22:18:50 2018
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJDBAABCAAtFiEEXMrqx1jtZMoyPwU7gHxzGoyCoJUFAlvsnuAPHGFzaEBhcGFj
+aGUub3JnAAoJEIB8cxqMgqCV+M8P/20FSycvdJIWqDMCWvziWc8RHuWgu7C/aeZJ
+OnMMutpMGaK51cSVc5dTKMxMbHzL61Uh7nUypAHwHN4sk6oukbBPF+AazVgT1fMP
+Mi0LujgUyiXyH0fF0/vY1Rf+G9cCX7kGWPEpx2jbQcPqtHUlp2UUk/4WAf+3rMmY
+xLQqdkGzFG/+ZgHRCiVfGTynLV3BFEy1kMFqlVwqRmDs0F6JnxLzN3V9ESA4H15T
+PNQ3WLZnAslSWgxxl2DiuzQzJBW4Y6zra1TrOC1Ag0UpxowYzwwd/BlcaAfaQtPX
+VS6lMJ25GgvG+/8H3viGHjaVpagFC/abnJJ/ZPLNaqtFne5hZkpjks54ET2X56Sw
+ZcduNamLjKLe89vXrny/4UkqRKi43rCRmxrEEMoTH2u4u+vZfhbTo2NiXXsHiXur
+Cu1ItjD/bGi3ybOUXbwSOJJBkjLbqU7GS05MP8TVLDvpeyD2HhK00w8Dq3DZju2H
+q3ofIi7f2dDkB9vzw8UxNsx9pw5kBqB+z8BylcDmdtIz0WJHwCKNk+/iRQpVpl6B
+phD8zjYQ1YkUTt/4AJFRj0Z5+P6gf5yQHFlVybW9yyxl1uSbVI0bnq642yNdEWpb
+mpns1wjHn5OLv7et7R/uSbIYVbIRAdX8nq3gWXD0U3xg9hCOPOxAVk5VnjKasU7L
+gbGXY/FC
+=cDD/
+-END PGP SIGNATURE-

Added: 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz.sha512
==
--- 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz.sha512
 (added)
+++ 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-bin.tar.gz.sha512
 Wed Nov 21 22:18:50 2018
@@ -0,0 +1,3 @@
+apache-airflow-1.10.1rc2+incubating-bin.tar.gz: 
+82160600 2ABEB75F 6D6EA432 117E6803 EFCC8F70 A5DA1994 D22C4DBA 95CFFB71 
E0DFACF6
+ 81E26F71 DBF0764C C1C5A77C DC8ABD0C D579F488 E8178AAC C11A9156

Added: 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-source.tar.gz
==
Binary file - no diff available.

Propchange: 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-source.tar.gz
--
svn:mime-type = application/octet-stream

Added: 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-source.tar.gz.asc
==
--- 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-source.tar.gz.asc
 (added)
+++ 
release/incubator/airflow/1.10.1-incubating/apache-airflow-1.10.1+incubating-source.tar.gz.asc
 Wed Nov 21 22:18:50 2018
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJDBAABCAAtFiEEXMrqx1jtZMoyPwU7gHxzGoyCoJUFAlvsnuQPHGFzaEBhcGFj
+aGUub3JnAAoJEIB8cxqMgqCV1JEP/2XEjoxRi/C9E9RyHsZr7TC8byJnySzdm/7V
+abVhdlHgPknHT8kGKmBw9hA6PDZJeFl+ne/AMDlIhpibjaGnl8+74K54qejSu/bP
+k6dUMkwkuLRTt5b8a9TwoqVkU0kbNiFEaS9yuI9YRDkscadNWdduvS0FibKRwgXw
+WiyeqBT8rcQ3iUKKJe9RZRYN1kgfWxjQah3Onq4ruqWkV6bANdYWMoRdUeYalP44
+M7AKVyBWJKn+wzDUlq8RXh5Et/Of8Vwn63FvUjevwb6J4ZzWg8HVLxihktF5SWuY
+CWnhQdFB9cj/YWI0DZf/qJhGS17fICPR1kARHmKRykXPt2z3LmN6OKOkaVskiipC
+Inz2KTgTo250MIzfdf6nPDbC2GHlhWVjxRnmIT4BRMjtk0bojgJZHcKXapu4f/3q
+h2QyeyNDQzS/f3Hn7nggnMt1fvttuqKLNcE0aw9ZnDXAcls+xjziNU55bJK2kGpL
+ScY016rCzuxiE4UFnAZRikq7a7g3nokJGmxksqPKJv7UHGGAQHIFxvyGGGc0FdiM
+i9UImBzKXW3x4n/V8BSXumptA/s8zHhOtxro+ZVOP+xK6w1e9oNga4HEQ9SMy6FQ
+NTwow6G/xpPmuJoDXHteFNbJQsV3ZNZnAkWBUdCOghU6L7siq/WTsTeoHCk6yhyj
+pkfrvkqz
+=ja6Q
+-END PGP SIGNATURE-

Added: 

[GitHub] feng-tao closed pull request #4220: [AIRFLOW-XXX] Update NOTICE file per suggestion

2018-11-21 Thread GitBox
feng-tao closed pull request #4220: [AIRFLOW-XXX] Update NOTICE file per 
suggestion
URL: https://github.com/apache/incubator-airflow/pull/4220
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/NOTICE b/NOTICE
index 99069f7a40..a642546efb 100644
--- a/NOTICE
+++ b/NOTICE
@@ -6,13 +6,6 @@ Foundation (http://www.apache.org/).
 
 ===
 
-Apache Airflow contains subcomponents with separate copyright notices and
-license terms. Your use of the source code for the these subcomponents
-is subject to the terms and conditions of their respective licenses.
-
-See the LICENSE file for a list of subcomponents and dependencies and
-their respective licenses.
-
 airflow.contrib.auth.backends.github_enterprise_auth:
 -
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4220: [AIRFLOW-XXX] Update NOTICE file per suggestion

2018-11-21 Thread GitBox
feng-tao commented on issue #4220: [AIRFLOW-XXX] Update NOTICE file per 
suggestion
URL: 
https://github.com/apache/incubator-airflow/pull/4220#issuecomment-440825535
 
 
   Thanks @ashb for running the release.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4220: [AIRFLOW-XXX] Update NOTICE file per suggestion

2018-11-21 Thread GitBox
feng-tao commented on issue #4220: [AIRFLOW-XXX] Update NOTICE file per 
suggestion
URL: 
https://github.com/apache/incubator-airflow/pull/4220#issuecomment-440816019
 
 
   PTAL @ashb 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao opened a new pull request #4220: [AIRFLOW-XXX] Update NOTICE file per suggestion

2018-11-21 Thread GitBox
feng-tao opened a new pull request #4220: [AIRFLOW-XXX] Update NOTICE file per 
suggestion
URL: https://github.com/apache/incubator-airflow/pull/4220
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation, you can prepend your 
commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Per a suggestion by Justin from the IPMC in 
(http://mail-archives.apache.org/mod_mbox/incubator-general/201811.mbox/%3cf50b2c71-9706-4f96-86d4-63776bd7d...@apache.org%3E),
 we should remove the following lines from the NOTICE file, as they are not needed.
   
   PS: here is the comment:
   ```Hi,
   
   +1 (binding)
   
   I checked:
   - incubating in name
   - signatures and hashes good
   - DISCLAIMER exists
   - LICENSE and NOTICE correct
   - No unexpected binary files
   - All ASF source code has ASF headers
   
   I don’t have the setup to test if it compiles.
   
   One minor thing: I’d remove the "Apache Airflow contains subcomponents …” and 
"See the LICENSE file …” from the NOTICE file as I don’t think they are needed.
   
   Thanks,
   Justin```
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3380) Metrics documentation

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695020#comment-16695020
 ] 

ASF GitHub Bot commented on AIRFLOW-3380:
-

feng-tao closed pull request #4219: [AIRFLOW-3380] Metrics documentation
URL: https://github.com/apache/incubator-airflow/pull/4219
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/index.rst b/docs/index.rst
index 4c18ce5ce6..efd0a8b78d 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -103,6 +103,7 @@ Content
 timezone
 api
 integration
+metrics
 lineage
 faq
 code
diff --git a/docs/metrics.rst b/docs/metrics.rst
new file mode 100644
index 00..29819c03e6
--- /dev/null
+++ b/docs/metrics.rst
@@ -0,0 +1,67 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Metrics
+=======
+
+Configuration
+-------------
+Airflow can be set up to send metrics to `StatsD <https://github.com/etsy/statsd>`__:
+
+.. code-block:: bash
+
+    [scheduler]
+    statsd_on = True
+    statsd_host = localhost
+    statsd_port = 8125
+    statsd_prefix = airflow
+
+Counters
+--------
+
+===================================  =================================================================
+Name                                 Description
+===================================  =================================================================
+<job_name>_start                     Number of started <job_name> job, ex. SchedulerJob, LocalTaskJob
+<job_name>_end                       Number of ended <job_name> job, ex. SchedulerJob, LocalTaskJob
+operator_failures_<operator_name>    Operator <operator_name> failures
+operator_successes_<operator_name>   Operator <operator_name> successes
+ti_failures                          Overall task instances failures
+ti_successes                         Overall task instances successes
+zombies_killed                       Zombie tasks killed
+scheduler_heartbeat                  Scheduler heartbeats
+===================================  =================================================================
+
+Gauges
+------
+
+====================  ======================================
+Name                  Description
+====================  ======================================
+collect_dags          Seconds taken to scan and import DAGs
+dagbag_import_errors  DAG import errors
+dagbag_size           DAG bag size
+====================  ======================================
+
+Timers
+------
+
+================================  =======================================
+Name                              Description
+================================  =======================================
+dagrun.dependency-check.<dag_id>  Seconds taken to check DAG dependencies
+================================  =======================================


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Metrics documentation
> -
>
> Key: AIRFLOW-3380
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3380
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Bartosz Ługowski
>Assignee: Bartosz Ługowski
>Priority: Trivial
>
> At the moment there is no documentation about Airflow metrics:
>  * how to enable it
>  * which metrics are generated
>  * description of each metric



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
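
For illustration, a minimal sketch (not from the PR) of how to watch these
metrics arrive, assuming the [scheduler] statsd settings from the diff above
(statsd_host = localhost, statsd_port = 8125, prefix "airflow"):

{code}
# Bare-bones StatsD sink: print every metric packet Airflow emits over UDP.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 8125))  # statsd_host/statsd_port from airflow.cfg
while True:
    packet, _ = sock.recvfrom(1024)
    print(packet.decode())  # e.g. "airflow.scheduler_heartbeat:1|c"
{code}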


[GitHub] feng-tao commented on issue #4219: [AIRFLOW-3380] Metrics documentation

2018-11-21 Thread GitBox
feng-tao commented on issue #4219: [AIRFLOW-3380] Metrics documentation
URL: 
https://github.com/apache/incubator-airflow/pull/4219#issuecomment-440753554
 
 
   Thanks, LGTM. I am not sure whether the list is comprehensive for all the 
stats, but we can add the missing ones later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao closed pull request #4219: [AIRFLOW-3380] Metrics documentation

2018-11-21 Thread GitBox
feng-tao closed pull request #4219: [AIRFLOW-3380] Metrics documentation
URL: https://github.com/apache/incubator-airflow/pull/4219
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/index.rst b/docs/index.rst
index 4c18ce5ce6..efd0a8b78d 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -103,6 +103,7 @@ Content
 timezone
 api
 integration
+metrics
 lineage
 faq
 code
diff --git a/docs/metrics.rst b/docs/metrics.rst
new file mode 100644
index 00..29819c03e6
--- /dev/null
+++ b/docs/metrics.rst
@@ -0,0 +1,67 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Metrics
+=======
+
+Configuration
+-------------
+Airflow can be set up to send metrics to `StatsD <https://github.com/etsy/statsd>`__:
+
+.. code-block:: bash
+
+    [scheduler]
+    statsd_on = True
+    statsd_host = localhost
+    statsd_port = 8125
+    statsd_prefix = airflow
+
+Counters
+--------
+
+===================================  =================================================================
+Name                                 Description
+===================================  =================================================================
+<job_name>_start                     Number of started <job_name> job, ex. SchedulerJob, LocalTaskJob
+<job_name>_end                       Number of ended <job_name> job, ex. SchedulerJob, LocalTaskJob
+operator_failures_<operator_name>    Operator <operator_name> failures
+operator_successes_<operator_name>   Operator <operator_name> successes
+ti_failures                          Overall task instances failures
+ti_successes                         Overall task instances successes
+zombies_killed                       Zombie tasks killed
+scheduler_heartbeat                  Scheduler heartbeats
+===================================  =================================================================
+
+Gauges
+------
+
+====================  ======================================
+Name                  Description
+====================  ======================================
+collect_dags          Seconds taken to scan and import DAGs
+dagbag_import_errors  DAG import errors
+dagbag_size           DAG bag size
+====================  ======================================
+
+Timers
+------
+
+================================  =======================================
+Name                              Description
+================================  =======================================
+dagrun.dependency-check.<dag_id>  Seconds taken to check DAG dependencies
+================================  =======================================


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3375) Support returning multiple tasks with BranchPythonOperator

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694810#comment-16694810
 ] 

ASF GitHub Bot commented on AIRFLOW-3375:
-

Fokko closed pull request #4215: [AIRFLOW-3375] Support returning multiple 
tasks with BranchPythonOperator
URL: https://github.com/apache/incubator-airflow/pull/4215
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/python_operator.py 
b/airflow/operators/python_operator.py
index 9b31838b0c..a92cb86642 100644
--- a/airflow/operators/python_operator.py
+++ b/airflow/operators/python_operator.py
@@ -114,14 +114,14 @@ def execute_callable(self):
 
 class BranchPythonOperator(PythonOperator, SkipMixin):
 """
-Allows a workflow to "branch" or follow a single path following the
-execution of this task.
+Allows a workflow to "branch" or follow a path following the execution
+of this task.
 
 It derives the PythonOperator and expects a Python function that returns
-the task_id to follow. The task_id returned should point to a task
-directly downstream from {self}. All other "branches" or
-directly downstream tasks are marked with a state of ``skipped`` so that
-these paths can't move forward. The ``skipped`` states are propageted
+a single task_id or list of task_ids to follow. The task_id(s) returned
+should point to a task directly downstream from {self}. All other 
"branches"
+or directly downstream tasks are marked with a state of ``skipped`` so that
+these paths can't move forward. The ``skipped`` states are propagated
 downstream to allow for the DAG state to fill up and the DAG run's state
 to be inferred.
 
@@ -133,13 +133,15 @@ class BranchPythonOperator(PythonOperator, SkipMixin):
 """
 def execute(self, context):
 branch = super(BranchPythonOperator, self).execute(context)
+if isinstance(branch, str):
+branch = [branch]
 self.log.info("Following branch %s", branch)
 self.log.info("Marking other directly downstream tasks as skipped")
 
 downstream_tasks = context['task'].downstream_list
 self.log.debug("Downstream task_ids %s", downstream_tasks)
 
-skip_tasks = [t for t in downstream_tasks if t.task_id != branch]
+skip_tasks = [t for t in downstream_tasks if t.task_id not in branch]
 if downstream_tasks:
 self.skip(context['dag_run'], context['ti'].execution_date, 
skip_tasks)
 
diff --git a/docs/concepts.rst b/docs/concepts.rst
index 2896010248..8753958af3 100644
--- a/docs/concepts.rst
+++ b/docs/concepts.rst
@@ -500,8 +500,8 @@ that happened in an upstream task. One way to do this is by 
using the
 ``BranchPythonOperator``.
 
 The ``BranchPythonOperator`` is much like the PythonOperator except that it
-expects a python_callable that returns a task_id. The task_id returned
-is followed, and all of the other paths are skipped.
+expects a python_callable that returns a task_id (or list of task_ids). The
+task_id returned is followed, and all of the other paths are skipped.
 The task_id returned by the Python function has to be referencing a task
 directly downstream from the BranchPythonOperator task.
 
diff --git a/tests/operators/test_python_operator.py 
b/tests/operators/test_python_operator.py
index afc2a1383a..dd830b899c 100644
--- a/tests/operators/test_python_operator.py
+++ b/tests/operators/test_python_operator.py
@@ -183,15 +183,9 @@ def setUp(self):
'owner': 'airflow',
'start_date': DEFAULT_DATE},
schedule_interval=INTERVAL)
-self.branch_op = BranchPythonOperator(task_id='make_choice',
-  dag=self.dag,
-  python_callable=lambda: 
'branch_1')
 
 self.branch_1 = DummyOperator(task_id='branch_1', dag=self.dag)
-self.branch_1.set_upstream(self.branch_op)
 self.branch_2 = DummyOperator(task_id='branch_2', dag=self.dag)
-self.branch_2.set_upstream(self.branch_op)
-self.dag.clear()
 
 def tearDown(self):
 super(BranchOperatorTest, self).tearDown()
@@ -206,6 +200,13 @@ def tearDown(self):
 
 def test_without_dag_run(self):
 """This checks the defensive against non existent tasks in a dag run"""
+self.branch_op = BranchPythonOperator(task_id='make_choice',
+  dag=self.dag,
+  python_callable=lambda: 
'branch_1')
+self.branch_1.set_upstream(self.branch_op)
+

[jira] [Resolved] (AIRFLOW-3375) Support returning multiple tasks with BranchPythonOperator

2018-11-21 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-3375.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Support returning multiple tasks with BranchPythonOperator
> --
>
> Key: AIRFLOW-3375
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3375
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bas Harenslak
>Assignee: Bas Harenslak
>Priority: Major
> Fix For: 2.0.0
>
>
> I hit a case where I'm using the BranchPythonOperator and want to branch to 
> multiple tasks, so I added support for returning a list of task ids.
> Both a single task id (string type) and list of task ids are supported.
> PR: https://github.com/apache/incubator-airflow/pull/4215



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
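
For illustration, a minimal DAG sketch of the new behaviour (the DAG and task
names are made up; this is not code from the PR's test suite):

{code}
# With AIRFLOW-3375, the python_callable may return a list of task_ids;
# every listed branch is followed and the rest are skipped.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator

dag = DAG("branch_list_example", start_date=datetime(2018, 11, 1),
          schedule_interval=None)

branch = BranchPythonOperator(
    task_id="make_choice",
    python_callable=lambda: ["branch_1", "branch_2"],  # a list, not a string
    dag=dag,
)

for task_id in ["branch_1", "branch_2", "branch_3"]:
    branch >> DummyOperator(task_id=task_id, dag=dag)
# branch_1 and branch_2 run; branch_3 is marked 'skipped'.
{code}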


[GitHub] Fokko closed pull request #4215: [AIRFLOW-3375] Support returning multiple tasks with BranchPythonOperator

2018-11-21 Thread GitBox
Fokko closed pull request #4215: [AIRFLOW-3375] Support returning multiple 
tasks with BranchPythonOperator
URL: https://github.com/apache/incubator-airflow/pull/4215
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/python_operator.py 
b/airflow/operators/python_operator.py
index 9b31838b0c..a92cb86642 100644
--- a/airflow/operators/python_operator.py
+++ b/airflow/operators/python_operator.py
@@ -114,14 +114,14 @@ def execute_callable(self):
 
 class BranchPythonOperator(PythonOperator, SkipMixin):
 """
-Allows a workflow to "branch" or follow a single path following the
-execution of this task.
+Allows a workflow to "branch" or follow a path following the execution
+of this task.
 
 It derives the PythonOperator and expects a Python function that returns
-the task_id to follow. The task_id returned should point to a task
-directly downstream from {self}. All other "branches" or
-directly downstream tasks are marked with a state of ``skipped`` so that
-these paths can't move forward. The ``skipped`` states are propageted
+a single task_id or list of task_ids to follow. The task_id(s) returned
+should point to a task directly downstream from {self}. All other 
"branches"
+or directly downstream tasks are marked with a state of ``skipped`` so that
+these paths can't move forward. The ``skipped`` states are propagated
 downstream to allow for the DAG state to fill up and the DAG run's state
 to be inferred.
 
@@ -133,13 +133,15 @@ class BranchPythonOperator(PythonOperator, SkipMixin):
 """
 def execute(self, context):
 branch = super(BranchPythonOperator, self).execute(context)
+if isinstance(branch, str):
+branch = [branch]
 self.log.info("Following branch %s", branch)
 self.log.info("Marking other directly downstream tasks as skipped")
 
 downstream_tasks = context['task'].downstream_list
 self.log.debug("Downstream task_ids %s", downstream_tasks)
 
-skip_tasks = [t for t in downstream_tasks if t.task_id != branch]
+skip_tasks = [t for t in downstream_tasks if t.task_id not in branch]
 if downstream_tasks:
 self.skip(context['dag_run'], context['ti'].execution_date, 
skip_tasks)
 
diff --git a/docs/concepts.rst b/docs/concepts.rst
index 2896010248..8753958af3 100644
--- a/docs/concepts.rst
+++ b/docs/concepts.rst
@@ -500,8 +500,8 @@ that happened in an upstream task. One way to do this is by 
using the
 ``BranchPythonOperator``.
 
 The ``BranchPythonOperator`` is much like the PythonOperator except that it
-expects a python_callable that returns a task_id. The task_id returned
-is followed, and all of the other paths are skipped.
+expects a python_callable that returns a task_id (or list of task_ids). The
+task_id returned is followed, and all of the other paths are skipped.
 The task_id returned by the Python function has to be referencing a task
 directly downstream from the BranchPythonOperator task.
 
diff --git a/tests/operators/test_python_operator.py 
b/tests/operators/test_python_operator.py
index afc2a1383a..dd830b899c 100644
--- a/tests/operators/test_python_operator.py
+++ b/tests/operators/test_python_operator.py
@@ -183,15 +183,9 @@ def setUp(self):
'owner': 'airflow',
'start_date': DEFAULT_DATE},
schedule_interval=INTERVAL)
-self.branch_op = BranchPythonOperator(task_id='make_choice',
-  dag=self.dag,
-  python_callable=lambda: 
'branch_1')
 
 self.branch_1 = DummyOperator(task_id='branch_1', dag=self.dag)
-self.branch_1.set_upstream(self.branch_op)
 self.branch_2 = DummyOperator(task_id='branch_2', dag=self.dag)
-self.branch_2.set_upstream(self.branch_op)
-self.dag.clear()
 
 def tearDown(self):
 super(BranchOperatorTest, self).tearDown()
@@ -206,6 +200,13 @@ def tearDown(self):
 
 def test_without_dag_run(self):
 """This checks the defensive against non existent tasks in a dag run"""
+self.branch_op = BranchPythonOperator(task_id='make_choice',
+  dag=self.dag,
+  python_callable=lambda: 
'branch_1')
+self.branch_1.set_upstream(self.branch_op)
+self.branch_2.set_upstream(self.branch_op)
+self.dag.clear()
+
 self.branch_op.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE)
 
 session = Session()
@@ -226,7 +227,48 @@ def test_without_dag_run(self):
 else:
 raise
 
+def 

[jira] [Assigned] (AIRFLOW-3375) Support returning multiple tasks with BranchPythonOperator

2018-11-21 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong reassigned AIRFLOW-3375:
-

Assignee: Bas Harenslak

> Support returning multiple tasks with BranchPythonOperator
> --
>
> Key: AIRFLOW-3375
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3375
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bas Harenslak
>Assignee: Bas Harenslak
>Priority: Major
>
> I hit a case where I'm using the BranchPythonOperator and want to branch to 
> multiple tasks, so I added support for returning a list of task ids.
> Both a single task id (string type) and list of task ids are supported.
> PR: https://github.com/apache/incubator-airflow/pull/4215



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] codecov-io edited a comment on issue #4218: [AIRFLOW-3378] KubernetesPodOperator does not delete on timeout failure

2018-11-21 Thread GitBox
codecov-io edited a comment on issue #4218: [AIRFLOW-3378] 
KubernetesPodOperator does not delete on timeout failure
URL: 
https://github.com/apache/incubator-airflow/pull/4218#issuecomment-440673139
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=h1)
 Report
   > Merging 
[#4218](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/e56e625152da98c20b453b67b1333fb2b8597194?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4218/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=tree)
   
   ```diff
    @@           Coverage Diff           @@
    ##           master    #4218   +/-   ##
    =======================================
      Coverage   77.82%   77.82%          
    =======================================
      Files         201      201          
      Lines       16339    16339          
    =======================================
      Hits        12716    12716          
      Misses       3623     3623          
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=footer).
 Last update 
[e56e625...2248c6e](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io commented on issue #4218: [AIRFLOW-3378] KubernetesPodOperator does not delete on timeout failure

2018-11-21 Thread GitBox
codecov-io commented on issue #4218: [AIRFLOW-3378] KubernetesPodOperator does 
not delete on timeout failure
URL: 
https://github.com/apache/incubator-airflow/pull/4218#issuecomment-440673139
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=h1)
 Report
   > Merging 
[#4218](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/e56e625152da98c20b453b67b1333fb2b8597194?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4218/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=tree)
   
   ```diff
    @@           Coverage Diff           @@
    ##           master    #4218   +/-   ##
    =======================================
      Coverage   77.82%   77.82%          
    =======================================
      Files         201      201          
      Lines       16339    16339          
    =======================================
      Hits        12716    12716          
      Misses       3623     3623          
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=footer).
 Last update 
[e56e625...2248c6e](https://codecov.io/gh/apache/incubator-airflow/pull/4218?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (AIRFLOW-1856) How to allow airflow dags for concrete user(s) only?

2018-11-21 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1856:
--

Assignee: Lokesh Chinnaga

> How to allow airflow dags for concrete user(s) only?
> 
>
> Key: AIRFLOW-1856
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1856
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, ui, webapp
>Reporter: Ikar Pohorsky
>Assignee: Lokesh Chinnaga
>Priority: Major
>
> The problem is pretty simple. I need to restrict airflow web users to seeing 
> and executing only certain DAGs and tasks.
> If possible, I'd prefer not to use 
> [Kerberos|https://airflow.incubator.apache.org/security.html#kerberos] nor 
> [OAuth|https://airflow.incubator.apache.org/security.html#oauth-authentication].
> The 
> [Multi-tenancy|https://airflow.incubator.apache.org/security.html#multi-tenancy]
>  option seems like the way to go, but I couldn't make it work the way I 
> expect.
> My current setup:
> * added airflow web users _test_ and _ikar_ via [Web Authentication / 
> Password|https://airflow.incubator.apache.org/security.html#password]
> * my unix username is _ikar_ with a home in _/home/ikar_
> * no _test_ unix user
> * airflow _1.8.2_ is installed in _/home/ikar/airflow_
> * added two DAGs with one task:
> ** one with _owner_ set to _ikar_
> ** one with _owner_ set to _test_
> * airflow.cfg:
> {code}
> [core]
> # The home folder for airflow, default is ~/airflow
> airflow_home = /home/ikar/airflow
> # The folder where your airflow pipelines live, most likely a
> # subfolder in a code repository
> # This path must be absolute
> dags_folder = /home/ikar/airflow-test/dags
> # The folder where airflow should store its log files
> # This path must be absolute
> base_log_folder = /home/ikar/airflow/logs
> # Airflow can store logs remotely in AWS S3 or Google Cloud Storage. Users
> # must supply a remote location URL (starting with either 's3://...' or
> # 'gs://...') and an Airflow connection id that provides access to the storage
> # location.
> remote_base_log_folder =
> remote_log_conn_id =
> # Use server-side encryption for logs stored in S3
> encrypt_s3_logs = False
> # DEPRECATED option for remote log storage, use remote_base_log_folder 
> instead!
> s3_log_folder =
> # The executor class that airflow should use. Choices include
> # SequentialExecutor, LocalExecutor, CeleryExecutor
> executor = SequentialExecutor
> # The SqlAlchemy connection string to the metadata database.
> # SqlAlchemy supports many different database engine, more information
> # their website
> sql_alchemy_conn = sqlite:home/ikar/airflow/airflow.db
> # The SqlAlchemy pool size is the maximum number of database connections
> # in the pool.
> sql_alchemy_pool_size = 5
> # The SqlAlchemy pool recycle is the number of seconds a connection
> # can be idle in the pool before it is invalidated. This config does
> # not apply to sqlite.
> sql_alchemy_pool_recycle = 3600
> # The amount of parallelism as a setting to the executor. This defines
> 

[jira] [Assigned] (AIRFLOW-1856) How to allow airflow dags for concrete user(s) only?

2018-11-21 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-1856:
--

Assignee: (was: Lokesh Chinnaga)

> How to allow airflow dags for concrete user(s) only?
> 
>
> Key: AIRFLOW-1856
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1856
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, ui, webapp
>Reporter: Ikar Pohorsky
>Priority: Major
>
> The problem is pretty simple. I need to restrict airflow web users to seeing 
> and executing only certain DAGs and tasks.
> If possible, I'd prefer not to use 
> [Kerberos|https://airflow.incubator.apache.org/security.html#kerberos] nor 
> [OAuth|https://airflow.incubator.apache.org/security.html#oauth-authentication].
> The 
> [Multi-tenancy|https://airflow.incubator.apache.org/security.html#multi-tenancy]
>  option seems like the way to go, but I couldn't make it work the way I 
> expect.
> My current setup:
> * added airflow web users _test_ and _ikar_ via [Web Authentication / 
> Password|https://airflow.incubator.apache.org/security.html#password]
> * my unix username is _ikar_ with a home in _/home/ikar_
> * no _test_ unix user
> * airflow _1.8.2_ is installed in _/home/ikar/airflow_
> * added two DAGs with one task:
> ** one with _owner_ set to _ikar_
> ** one with _owner_ set to _test_
> * airflow.cfg:
> {code}
> [core]
> # The home folder for airflow, default is ~/airflow
> airflow_home = /home/ikar/airflow
> # The folder where your airflow pipelines live, most likely a
> # subfolder in a code repository
> # This path must be absolute
> dags_folder = /home/ikar/airflow-test/dags
> # The folder where airflow should store its log files
> # This path must be absolute
> base_log_folder = /home/ikar/airflow/logs
> # Airflow can store logs remotely in AWS S3 or Google Cloud Storage. Users
> # must supply a remote location URL (starting with either 's3://...' or
> # 'gs://...') and an Airflow connection id that provides access to the storage
> # location.
> remote_base_log_folder =
> remote_log_conn_id =
> # Use server-side encryption for logs stored in S3
> encrypt_s3_logs = False
> # DEPRECATED option for remote log storage, use remote_base_log_folder 
> instead!
> s3_log_folder =
> # The executor class that airflow should use. Choices include
> # SequentialExecutor, LocalExecutor, CeleryExecutor
> executor = SequentialExecutor
> # The SqlAlchemy connection string to the metadata database.
> # SqlAlchemy supports many different database engine, more information
> # their website
> sql_alchemy_conn = sqlite:home/ikar/airflow/airflow.db
> # The SqlAlchemy pool size is the maximum number of database connections
> # in the pool.
> sql_alchemy_pool_size = 5
> # The SqlAlchemy pool recycle is the number of seconds a connection
> # can be idle in the pool before it is invalidated. This config does
> # not apply to sqlite.
> sql_alchemy_pool_recycle = 3600
> # The amount of parallelism as a setting to the executor. This defines
> # the max number of task 
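
One hedged pointer for the question above: Airflow 1.8.x already ships a coarse
[webserver] option, filter_by_owner, which hides DAGs whose owner does not match
the logged-in user. This assumes the Password auth backend mentioned in the
issue, and it is visibility filtering by DAG owner, not full per-user
authorization:

{code}
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
# Show each non-superuser only the DAGs whose 'owner' matches their username.
filter_by_owner = True
{code}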

[jira] [Closed] (AIRFLOW-3279) Documentation for Google Logging unclear

2018-11-21 Thread Paul Velthuis (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Velthuis closed AIRFLOW-3279.
--
Assignee: Paul Velthuis

> Documentation for Google Logging unclear
> 
>
> Key: AIRFLOW-3279
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3279
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration, Documentation, gcp, logging
>Reporter: Paul Velthuis
>Assignee: Paul Velthuis
>Priority: Blocker
>
> The documentation of how to set up logging to a Google Cloud bucket is 
> unclear.
> I am now following the tutorial on the airflow page:
> [https://airflow.apache.org/howto/write-logs.html]
> Here I find it unclear what part of the 'logger' I have to adjust in the 
> `{{airflow/config_templates/airflow_local_settings.py}}`.
>  
> The adjustment states:
>  
>  # Update the airflow.task and airflow.task_runner blocks to be 'gcs.task' 
> instead of 'file.task'. 'loggers': { 'airflow.task': ... }
>  
> However what I find in the template is:
> 'loggers': {
>     'airflow.processor': {
>         'handlers': ['processor'],
>         'level': LOG_LEVEL,
>         'propagate': False,
>     },
>     'airflow.task': {
>         'handlers': ['task'],
>         'level': LOG_LEVEL,
>         'propagate': False,
>     },
>     'flask_appbuilder': {
>         'handler': ['console'],
>         'level': FAB_LOG_LEVEL,
>         'propagate': True,
>     }
> },
>  
> Since it is very important for me to get this right the first time, I hope 
> some clarity can be provided on what has to be adjusted in the logger. Is it 
> only 'airflow.task', or more?
> Furthermore, at step 6 it is a little unclear what remote_log_conn_id means. 
> I would propose adding a little more information to make this clearer.
>  
> The current error I am facing is:
> Traceback (most recent call last):
>  File "/usr/local/bin/airflow", line 16, in 
>  from airflow import configuration
>  File "/usr/local/lib/python2.7/site-packages/airflow/__init__.py", line 31, 
> in 
>  from airflow import settings
>  File "/usr/local/lib/python2.7/site-packages/airflow/settings.py", line 198, 
> in 
>  configure_logging()
>  File "/usr/local/lib/python2.7/site-packages/airflow/logging_config.py", 
> line 71, in configure_logging
>  dictConfig(logging_config)
>  File "/usr/local/lib/python2.7/logging/config.py", line 794, in dictConfig
>  dictConfigClass(config).configure()
>  File "/usr/local/lib/python2.7/logging/config.py", line 568, in configure
>  handler = self.configure_handler(handlers[name])
>  File "/usr/local/lib/python2.7/logging/config.py", line 733, in 
> configure_handler
>  result = factory(**kwargs)
>  File 
> "/usr/local/lib/python2.7/site-packages/airflow/utils/log/gcs_task_handler.py",
>  line 30, in __init__
>  super(GCSTaskHandler, self).__init__(base_log_folder, filename_template)
>  File 
> "/usr/local/lib/python2.7/site-packages/airflow/utils/log/file_task_handler.py",
>  line 46, in __init__
>  self.filename_jinja_template = Template(self.filename_template)
>  File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line 
> 926, in __new__
>  return env.from_string(source, template_class=cls)
>  File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line 
> 862, in from_string
>  return cls.from_code(self, self.compile(source), globals, None)
>  File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line 
> 565, in compile
>  self.handle_exception(exc_info, source_hint=source_hint)
>  File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line 
> 754, in handle_exception
>  reraise(exc_type, exc_value, tb)
>  File "", line 1, in template
> jinja2.exceptions.TemplateSyntaxError: expected token ':', got '}'
> Error in atexit._run_exitfuncs:
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
>  func(*targs, **kargs)
>  File "/usr/local/lib/python2.7/logging/__init__.py", line 1676, in shutdown
>  h.close()
>  File 
> "/usr/local/lib/python2.7/site-packages/airflow/utils/log/gcs_task_handler.py",
>  line 73, in close
>  if self.closed:
> AttributeError: 'GCSTaskHandler' object has no attribute 'closed'
> Error in sys.exitfunc:
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
>  func(*targs, **kargs)
>  File "/usr/local/lib/python2.7/logging/__init__.py", line 1676, in shutdown
>  h.close()
>  File 
> "/usr/local/lib/python2.7/site-packages/airflow/utils/log/gcs_task_handler.py",
>  line 73, in close
>  if self.closed:
> AttributeError: 'GCSTaskHandler' object has no attribute 'closed'
>  If I look at the Airflow code I see the following 
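
To make the howto's instruction concrete, a hedged sketch (not the official
template) of the loggers change it asks for: point the task loggers at the
'gcs.task' handler instead of the file-based one. The handler definitions are
assumed to exist elsewhere in airflow_local_settings.py:

{code}
LOG_LEVEL = "INFO"

LOGGING_CONFIG = {
    # ... formatters and handlers (including 'gcs.task') defined above ...
    "loggers": {
        "airflow.task": {
            "handlers": ["gcs.task"],   # was ['task'] (file-based)
            "level": LOG_LEVEL,
            "propagate": False,
        },
        "airflow.task_runner": {
            "handlers": ["gcs.task"],   # was the file-based handler as well
            "level": LOG_LEVEL,
            "propagate": True,
        },
    },
}
{code}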

[GitHub] codecov-io edited a comment on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12

2018-11-21 Thread GitBox
codecov-io edited a comment on issue #3723: [AIRFLOW-2876] Update Tenacity to 
4.12
URL: 
https://github.com/apache/incubator-airflow/pull/3723#issuecomment-411565604
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=h1)
 Report
   > Merging 
[#3723](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/e56e625152da98c20b453b67b1333fb2b8597194?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3723/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=tree)
   
   ```diff
    @@           Coverage Diff           @@
    ##           master    #3723   +/-   ##
    =======================================
      Coverage   77.82%   77.82%          
    =======================================
      Files         201      201          
      Lines       16339    16339          
    =======================================
      Hits        12716    12716          
      Misses       3623     3623          
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=footer).
 Last update 
[e56e625...b648a15](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12

2018-11-21 Thread GitBox
Fokko commented on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12
URL: 
https://github.com/apache/incubator-airflow/pull/3723#issuecomment-440646588
 
 
   @villasv Currently I'm quite busy, feel free to pick this up if you like.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4217: [AIRFLOW-3376] Upgrade tenacity to 5.0.2

2018-11-21 Thread GitBox
Fokko commented on issue #4217: [AIRFLOW-3376] Upgrade tenacity to 5.0.2
URL: 
https://github.com/apache/incubator-airflow/pull/4217#issuecomment-440646399
 
 
   @villasv The K8s tests only execute a simple DAG, and that DAG might not 
contain an operator that uses Tenacity. This is exactly the reason why we have 
the different test suites :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] villasv edited a comment on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12

2018-11-21 Thread GitBox
villasv edited a comment on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12
URL: 
https://github.com/apache/incubator-airflow/pull/3723#issuecomment-440635722
 
 
   @Fokko do you still intend to see this PR through? Did you reproduce 
@r39132's error?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] villasv commented on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12

2018-11-21 Thread GitBox
villasv commented on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12
URL: 
https://github.com/apache/incubator-airflow/pull/3723#issuecomment-440635722
 
 
   @Fokko do you still intend to see this PR through?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3376) Required tenacity version doesn't support python 3.7

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694591#comment-16694591
 ] 

ASF GitHub Bot commented on AIRFLOW-3376:
-

villasv closed pull request #4217: [AIRFLOW-3376] Upgrade tenacity to 5.0.2
URL: https://github.com/apache/incubator-airflow/pull/4217
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/setup.py b/setup.py
index e651f5a66e..499ea5e5f7 100644
--- a/setup.py
+++ b/setup.py
@@ -324,7 +324,7 @@ def do_setup():
 'setproctitle>=1.1.8, <2',
 'sqlalchemy>=1.1.15, <1.2.0',
 'tabulate>=0.7.5, <=0.8.2',
-'tenacity==4.8.0',
+'tenacity==5.0.2',
 'thrift>=0.9.2',
 'tzlocal>=1.4',
 'unicodecsv>=0.14.1',


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Required tenacity version doesn't support python 3.7
> 
>
> Key: AIRFLOW-3376
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3376
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 1.10.0
>Reporter: Victor Villas Bôas Chaves
>Priority: Major
>
> HttpHook uses tenacity, which is installed with the version pinned at 4.8. 
> Versions 5 and above are compatible with Python 3.7, so the pin should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
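
For context, an illustrative sketch (not HttpHook's actual code) of the kind of
tenacity usage the pin affects; the endpoint and retry policy here are made up,
and the decorator API shown works on both the 4.x and 5.x lines:

{code}
import requests
import tenacity


@tenacity.retry(
    wait=tenacity.wait_exponential(multiplier=1, max=10),
    stop=tenacity.stop_after_attempt(5),
    retry=tenacity.retry_if_exception_type(requests.exceptions.ConnectionError),
)
def fetch_health():
    # Retried with exponential backoff on connection errors, up to 5 attempts.
    return requests.get("https://example.com/health")
{code}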


[GitHub] villasv commented on issue #4217: [AIRFLOW-3376] Upgrade tenacity to 5.0.2

2018-11-21 Thread GitBox
villasv commented on issue #4217: [AIRFLOW-3376] Upgrade tenacity to 5.0.2
URL: 
https://github.com/apache/incubator-airflow/pull/4217#issuecomment-440635438
 
 
   Hmm. I think I'll give up on this PR and just force update `tenacity` on my 
py37 environment, though I'll leave the JIRA issue open. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] villasv closed pull request #4217: [AIRFLOW-3376] Upgrade tenacity to 5.0.2

2018-11-21 Thread GitBox
villasv closed pull request #4217: [AIRFLOW-3376] Upgrade tenacity to 5.0.2
URL: https://github.com/apache/incubator-airflow/pull/4217
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/setup.py b/setup.py
index e651f5a66e..499ea5e5f7 100644
--- a/setup.py
+++ b/setup.py
@@ -324,7 +324,7 @@ def do_setup():
 'setproctitle>=1.1.8, <2',
 'sqlalchemy>=1.1.15, <1.2.0',
 'tabulate>=0.7.5, <=0.8.2',
-'tenacity==4.8.0',
+'tenacity==5.0.2',
 'thrift>=0.9.2',
 'tzlocal>=1.4',
 'unicodecsv>=0.14.1',


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3372) Unable to start airflow scheduler

2018-11-21 Thread MADHANKUMAR C (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694503#comment-16694503
 ] 

MADHANKUMAR C commented on AIRFLOW-3372:


+*Below is the output I am getting while installing Airflow using Helm:*+

[root@kubernetes-cpal-master-0 kube-airflow]# make helm-install 
NAMESPACE=yournamespace HELM_VALUES=airflow/values.yaml
helm upgrade -f airflow/values.yaml \
 --install \
 --debug \
 airflow \
 ./airflow
[debug] Created tunnel using local port: '45999'

[debug] SERVER: "127.0.0.1:45999"

Release "airflow" does not exist. Installing it now.
[debug] CHART PATH: /home/centos/madhan3/kube-airflow/airflow

NAME: airflow
REVISION: 1
RELEASED: Wed Nov 21 09:52:49 2018
CHART: airflow-v0.2.1
USER-SUPPLIED VALUES:
airflow:
 config: {}
 dag_path: /dags
 fernet_key: ""
 image: rcmadhankumar/docker-airflow
 image_pull_policy: IfNotPresent
 imageTag: 1.9.0
 init_retry_loop: null
 scheduler_num_runs: "-1"
 service:
 type: ClusterIP
 url_prefix: /airflow
celery:
 num_workers: 1
dags:
 git_branch: master
 git_repo: null
 git_sync_debug: false
 git_sync_enabled: false
 load_examples: true
 pickle_dag: true
 poll_interval_sec: 60
flower:
 url_prefix: /flower
ingress:
 annotations:
 flower: null
 web: null
 enabled: false
 host: ""
 path:
 flower: /airflow/flower
 web: /airflow
persistence:
 accessMode: ReadWriteOnce
 enabled: false
 size: 1Gi
postgresql:
 enabled: true
 persistence:
 enabled: true
 postgresDatabase: airflow
 postgresPassword: airflow
 postgresUser: airflow
redis:
 enabled: true
 persistence:
 enabled: true
 redisPassword: redis

COMPUTED VALUES:
airflow:
 config: {}
 dag_path: /dags
 fernet_key: ""
 image: rcmadhankumar/docker-airflow
 image_pull_policy: IfNotPresent
 imageTag: 1.9.0
 init_retry_loop: null
 scheduler_num_runs: "-1"
 service:
 type: ClusterIP
 url_prefix: /airflow
celery:
 num_workers: 1
dags:
 git_branch: master
 git_repo: null
 git_sync_debug: false
 git_sync_enabled: false
 load_examples: true
 pickle_dag: true
 poll_interval_sec: 60
flower:
 url_prefix: /flower
ingress:
 annotations:
 flower: null
 web: null
 enabled: false
 host: ""
 path:
 flower: /airflow/flower
 web: /airflow
persistence:
 accessMode: ReadWriteOnce
 enabled: false
 size: 1Gi
postgresql:
 affinity: {}
 enabled: true
 global: {}
 image: postgres
 imageTag: 9.6.2
 metrics:
 enabled: false
 image: wrouesnel/postgres_exporter
 imagePullPolicy: IfNotPresent
 imageTag: v0.1.1
 resources:
 requests:
 cpu: 100m
 memory: 256Mi
 networkPolicy:
 allowExternal: true
 enabled: false
 nodeSelector: {}
 persistence:
 accessMode: ReadWriteOnce
 enabled: true
 mountPath: /var/lib/postgresql/data/pgdata
 size: 8Gi
 subPath: postgresql-db
 postgresDatabase: airflow
 postgresPassword: airflow
 postgresUser: airflow
 resources:
 requests:
 cpu: 100m
 memory: 256Mi
 service:
 externalIPs: []
 port: 5432
 type: ClusterIP
 tolerations: []
redis:
 enabled: true
 global: {}
 image: bitnami/redis:4.0.8-r0
 imagePullPolicy: IfNotPresent
 metrics:
 annotations:
 prometheus.io/port: "9121"
 prometheus.io/scrape: "true"
 enabled: false
 image: oliver006/redis_exporter
 imagePullPolicy: IfNotPresent
 imageTag: v0.11
 resources: {}
 networkPolicy:
 allowExternal: true
 enabled: false
 nodeSelector: {}
 persistence:
 accessMode: ReadWriteOnce
 enabled: true
 path: /bitnami
 size: 8Gi
 subPath: ""
 podAnnotations: {}
 podLabels: {}
 redisPassword: redis
 resources:
 requests:
 cpu: 100m
 memory: 256Mi
 securityContext:
 enabled: true
 fsGroup: 1001
 runAsUser: 1001
 service:
 annotations: {}
 loadBalancerIP: null
 serviceType: ClusterIP
 tolerations: []
 usePassword: true

HOOKS:
MANIFEST:

---
# Source: airflow/charts/postgresql/templates/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
 name: airflow-postgresql
 labels:
 app: airflow-postgresql
 chart: "postgresql-0.8.12"
 release: "airflow"
 heritage: "Tiller"
type: Opaque
data:

postgres-password: "YWlyZmxvdw=="
---
# Source: airflow/charts/redis/templates/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
 name: airflow-redis
 labels:
 app: airflow-redis
 chart: "redis-1.1.12"
 release: "airflow"
 heritage: "Tiller"
type: Opaque
data:
 redis-password: "cmVkaXM="
---
# Source: airflow/charts/postgresql/templates/pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
 name: airflow-postgresql
 labels:
 app: airflow-postgresql
 chart: "postgresql-0.8.12"
 release: "airflow"
 heritage: "Tiller"
spec:
 accessModes:
 - "ReadWriteOnce"
 resources:
 requests:
 storage: "8Gi"
---
# Source: airflow/charts/redis/templates/pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
 name: airflow-redis
 labels:
 app: airflow-redis
 chart: "redis-1.1.12"
 release: "airflow"
 heritage: "Tiller"
spec:
 accessModes:
 - "ReadWriteOnce"
 resources:
 requests:
 storage: "8Gi"
---
# Source: airflow/charts/postgresql/templates/svc.yaml
apiVersion: v1
kind: Service
metadata:
 name: 

[jira] [Commented] (AIRFLOW-3380) Metrics documentation

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694507#comment-16694507
 ] 

ASF GitHub Bot commented on AIRFLOW-3380:
-

blugowski opened a new pull request #4219: [AIRFLOW-3380] Metrics documentation
URL: https://github.com/apache/incubator-airflow/pull/4219
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3380) issues and references 
them in the PR title.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR adds metrics documentation:
   * how to enable metrics
   * list of generated metrics
   * description of each metric
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Changes in documentation.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Metrics documentation
> -
>
> Key: AIRFLOW-3380
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3380
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Bartosz Ługowski
>Assignee: Bartosz Ługowski
>Priority: Trivial
>
> At the moment there is no documentation about Airflow metrics:
>  * how to enable it
>  * which metrics are generated
>  * description of each metric



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Work started] (AIRFLOW-3380) Metrics documentation

2018-11-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3380 started by Bartosz Ługowski.
-
> Metrics documentation
> -
>
> Key: AIRFLOW-3380
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3380
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Bartosz Ługowski
>Assignee: Bartosz Ługowski
>Priority: Trivial
>
> At the moment there is no documentation about Airflow metrics:
>  * how to enable them
>  * which metrics are generated
>  * a description of each metric



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3380) Metrics documentation

2018-11-21 Thread JIRA
Bartosz Ługowski created AIRFLOW-3380:
-

 Summary: Metrics documentation
 Key: AIRFLOW-3380
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3380
 Project: Apache Airflow
  Issue Type: Improvement
  Components: Documentation
Reporter: Bartosz Ługowski
Assignee: Bartosz Ługowski


At the moment there is no documentation about Airflow metrics:
 * how to enable them
 * which metrics are generated
 * a description of each metric



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3372) Unable to start airflow scheduler

2018-11-21 Thread MADHANKUMAR C (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694479#comment-16694479
 ] 

MADHANKUMAR C commented on AIRFLOW-3372:


Actually, I am installing Airflow using a Helm chart. It installs the worker, 
scheduler and other pods. Since I copied the Airflow configuration file into 
each container during image creation, I logged in to a container and checked 
the configuration file. Even though I changed the max_threads value to 1, it 
shows as 2. In a plain Docker deployment of the same image the value is still 
1, but in the Helm chart deployment the value is changed and I am facing this 
issue. So where can I change the max_threads configuration in the Helm chart?

 

Here is the GitHub link to my whole experiment: 
[https://github.com/mumoshu/kube-airflow]

In the above repository, the folder *airflow* contains all the *Helm chart* 
information.
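
One likely explanation: environment variables of the form 
AIRFLOW__{SECTION}__{KEY} take precedence over airflow.cfg, and Helm charts 
commonly set such variables on the pods they deploy, so the file baked into 
the image is not the last word. A minimal sketch of pinning the value on the 
scheduler container (where exactly this env block goes depends on the chart's 
templates, so treat it as an assumption):

{code}
# Sketch: force max_threads back to 1 with an environment variable, since
# AIRFLOW__{SECTION}__{KEY} overrides the value in airflow.cfg.
# The exact place for this env block in the chart is an assumption.
env:
  - name: AIRFLOW__SCHEDULER__MAX_THREADS
    value: "1"
{code}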

 

 

> Unable to start airflow scheduler
> -
>
> Key: AIRFLOW-3372
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3372
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker, kubernetes, scheduler
>Affects Versions: 1.9.0
> Environment: Kubernetes,docker
>Reporter: MADHANKUMAR C
>Priority: Blocker
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> *I have installed Airflow in a Kubernetes cluster. When I install Airflow, 
> I am unable to start the scheduler. Below is the log of the scheduler 
> container.*
> [2018-11-20 12:02:40,860] {{__init__.py:51}} INFO - Using executor 
> SequentialExecutor
>  [2018-11-20 12:02:40,973] {{cli_action_loggers.py:69}} ERROR - Failed on 
> pre-execution callback using 
>  Traceback (most recent call last):
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
>  context)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
>  cursor.execute(statement, parameters)
>  sqlite3.OperationalError: no such table: log
> The above exception was the direct cause of the following exception:
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python3.5/dist-packages/airflow/utils/cli_action_loggers.py", 
> line 67, in on_pre_execution
>  cb(**kwargs)
>  File 
> "/usr/local/lib/python3.5/dist-packages/airflow/utils/cli_action_loggers.py", 
> line 99, in default_action_log
>  session.commit()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 927, in commit
>  self.transaction.commit()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 467, in commit
>  self._prepare_impl()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 447, in _prepare_impl
>  self.session.flush()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 2209, in flush
>  self._flush(objects)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 2329, in _flush
>  transaction.rollback(_capture_exception=True)
>  File 
> "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/langhelpers.py", line 
> 66, in __exit__
>  compat.reraise(exc_type, exc_value, exc_tb)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/compat.py", 
> line 187, in reraise
>  raise value
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 2293, in _flush
>  flush_context.execute()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/unitofwork.py", 
> line 389, in execute
>  rec.execute(self)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/unitofwork.py", 
> line 548, in execute
>  uow
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/persistence.py", 
> line 181, in save_obj
>  mapper, table, insert)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/persistence.py", 
> line 835, in _emit_insert_statements
>  execute(statement, params)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 945, in execute
>  return meth(self, multiparams, params)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/sql/elements.py", 
> line 263, in _execute_on_connection
>  return connection._execute_clauseelement(self, multiparams, params)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 1053, in _execute_clauseelement
>  compiled_sql, distilled_params
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 1189, in _execute_context
>  context)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 1402, in _handle_dbapi_exception
>  exc_info
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/compat.py", 
> line 203, in raise_from_cause
>  reraise(type(exception), 
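
The root error in the quoted log, sqlite3.OperationalError: no such table: 
log, usually means the metadata database in that container was never 
initialized. Assuming the default SQLite setup shown in the log, the usual 
first step is to initialize it before starting the scheduler:

{code}
# Initialize (or upgrade) the Airflow metadata database in the container.
airflow initdb
{code}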

[jira] [Commented] (AIRFLOW-3374) KubernetesPodOperator gets stuck on failure

2018-11-21 Thread Victor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694436#comment-16694436
 ] 

Victor commented on AIRFLOW-3374:
-

Actually, I don't think this is related to the KubernetesPodOperator; maybe 
it is related to the scheduler?

> KubernetesPodOperator gets stuck on failure
> ---
>
> Key: AIRFLOW-3374
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3374
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: kubernetes
>Affects Versions: 1.10.1
>Reporter: Victor
>Priority: Major
>
> I am running Airflow 1.10.1rc2 on Kubernetes with the LocalExecutor and DAGs 
> using the KubernetesPodOperator, and when the execution fails (in this case 
> because Kubernetes can't download the image), the logs tell me that it 
> failed, but the task stays marked as running and nothing happens.
> Also, the pod doesn't get deleted, so I suppose something is happening in 
> launch.run_pod…
> This is the DAG operator call:
> {code:java}
> with DAG(
> 'demo',
> default_args=default_args,
> # since we always take the latest upload available,
> # we don't have to catchup (i.e., execute every run between start_date 
> and now)
> # but run only once.
> catchup=False,
> # for now, we don't schedule the DAG daily
> schedule_interval=None
> ) as dag:
> datapipe = kubernetes_pod_operator.KubernetesPodOperator(
> task_id='datapipe',
> name='datapipe',
> namespace='default',
> image='my-image:master',
> cmds=['python3'],
> arguments=['-m', 'fb'],
> in_cluster=True,
> is_delete_operator_pod=True,
> # TODO add image_pull_secrets (in 1.10.2, see 
> https://github.com/apache/incubator-airflow/pull/4188)
> )
> {code}
> Those are the logs:
> {noformat}
> *** Reading local file: 
> /airflow/logs/demo/datapipe/2018-11-20T15:51:31.604882+00:00/1.log
> [2018-11-20 15:51:35,483] {models.py:1361} INFO - Dependencies all met for 
> 
> [2018-11-20 15:51:35,497] {models.py:1361} INFO - Dependencies all met for 
> 
> [2018-11-20 15:51:35,497] {models.py:1573} INFO -
> 
> Starting attempt 1 of 1
> 
> [2018-11-20 15:51:35,534] {models.py:1595} INFO - Executing 
>  on 2018-11-20T15:51:31.604882+00:00
> [2018-11-20 15:51:35,535] {base_task_runner.py:118} INFO - Running: ['bash', 
> '-c', 'airflow run demo datapipe 2018-11-20T15:51:31.604882+00:00 --job_id 3 
> --raw -sd /usr/local/airflow/dags_volume/..data/demo.py --cfg_path 
> /tmp/tmpsf0htmc0']
> [2018-11-20 15:51:36,799] {base_task_runner.py:101} INFO - Job 3: Subtask 
> datapipe [2018-11-20 15:51:36,795] {settings.py:174} INFO - 
> setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
> [2018-11-20 15:51:37,425] {base_task_runner.py:101} INFO - Job 3: Subtask 
> datapipe [2018-11-20 15:51:37,424] {__init__.py:51} INFO - Using executor 
> LocalExecutor
> [2018-11-20 15:51:37,779] {base_task_runner.py:101} INFO - Job 3: Subtask 
> datapipe [2018-11-20 15:51:37,779] {models.py:271} INFO - Filling up the 
> DagBag from /usr/local/airflow/dags_volume/..data/demo.py
> [2018-11-20 15:51:38,379] {base_task_runner.py:101} INFO - Job 3: Subtask 
> datapipe [2018-11-20 15:51:38,378] {cli.py:484} INFO - Running <TaskInstance: 
> demo.datapipe 2018-11-20T15:51:31.604882+00:00 [running]> on host 
> infra-airflow-6d78c56489-r9trl
> [2018-11-20 15:51:38,452] {logging_mixin.py:95} INFO - [2018-11-20 
> 15:51:38,451] {pod_launcher.py:121} INFO - Event: datapipe-2a76c2a2 had an 
> event of type Pending
> [2018-11-20 15:51:39,458] {logging_mixin.py:95} INFO - [2018-11-20 
> 15:51:39,458] {pod_launcher.py:121} INFO - Event: datapipe-2a76c2a2 had an 
> event of type Pending
> [2018-11-20 15:51:40,467] {logging_mixin.py:95} INFO - [2018-11-20 
> 15:51:40,466] {pod_launcher.py:121} INFO - Event: datapipe-2a76c2a2 had an 
> event of type Pending
> [2018-11-20 15:51:41,473] {logging_mixin.py:95} INFO - [2018-11-20 
> 15:51:41,473] {pod_launcher.py:121} INFO - Event: datapipe-2a76c2a2 had an 
> event of type Pending
> [2018-11-20 15:51:42,479] {logging_mixin.py:95} INFO - [2018-11-20 
> 15:51:42,479] {pod_launcher.py:121} INFO - Event: datapipe-2a76c2a2 had an 
> event of type Pending
> [2018-11-20 15:51:43,485] {logging_mixin.py:95} INFO - [2018-11-20 
> 15:51:43,485] {pod_launcher.py:121} INFO - Event: datapipe-2a76c2a2 had an 
> event of type Pending
> [2018-11-20 15:51:44,492] {logging_mixin.py:95} INFO - [2018-11-20 
> 15:51:44,492] {pod_launcher.py:121} INFO - Event: datapipe-2a76c2a2 had an 
> event of type Pending
> [2018-11-20 15:51:45,498] {logging_mixin.py:95} INFO - [2018-11-20 
> 15:51:45,498] {pod_launcher.py:121} 

[jira] [Commented] (AIRFLOW-3378) KubernetesPodOperator does not delete on timeout failure

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694416#comment-16694416
 ] 

ASF GitHub Bot commented on AIRFLOW-3378:
-

victornoel opened a new pull request #4218: [AIRFLOW-3378] 
KubernetesPodOperator does not delete on timeout failure
URL: https://github.com/apache/incubator-airflow/pull/4218
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3378
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Basically, I changed the code so that the pod is always deleted if a failure 
happens via an exception.
   I'm open to a nicer way to do that though.
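
   Conceptually that amounts to wrapping the pod run in try/finally so cleanup 
   always happens. A minimal sketch of that shape (names, signatures and the 
   build_pod helper below are illustrative, not the actual diff; see the PR 
   for the real change):

{code:java}
# Sketch of "always delete the pod on failure" via try/finally.
# build_pod is a hypothetical helper; parameter names are approximate.
from airflow.contrib.kubernetes import pod_launcher
from airflow.exceptions import AirflowException
from airflow.utils.state import State

def execute(self, context):
    pod = self.build_pod()
    launcher = pod_launcher.PodLauncher(kube_client=self.client)
    try:
        final_state, result = launcher.run_pod(
            pod, startup_timeout=self.startup_timeout_seconds)
        if final_state != State.SUCCESS:
            raise AirflowException('Pod returned a failure: %s' % final_state)
        return result
    finally:
        # Runs whether run_pod succeeded, timed out, or raised.
        if self.is_delete_operator_pod:
            launcher.delete_pod(pod)
{code}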
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Add a simple test to ensure pod deletion is triggered when failure happens 
via an exception.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> KubernetesPodOperator does not delete on timeout failure
> 
>
> Key: AIRFLOW-3378
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3378
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: kubernetes
>Affects Versions: 1.10.1
>Reporter: Victor
>Priority: Major
>
> I am running Airflow 1.10.1rc2 on Kubernetes with the LocalExecutor and the 
> KubernetesPodOperator to run pods.
>  
> When a failure happens because the pod can't be created (see logs below), the 
> whole operator fails, and even though is_delete_operator_pod is set to True, 
> the pod is NOT deleted as expected.
>  
> Logs:
> {noformat}
> *** Reading local file: 
> /airflow/logs/demo/datapipe/2018-11-21T08:29:57.206456+00:00/1.log
> [2018-11-21 08:30:02,027] {models.py:1361} INFO - Dependencies all met for 
> 
> [2018-11-21 08:30:02,040] {models.py:1361} INFO - Dependencies all met for 
> 
> [2018-11-21 08:30:02,041] {models.py:1573} INFO -
> 
> Starting attempt 1 of 1
> 
> [2018-11-21 08:30:02,066] {models.py:1595} INFO - Executing 
>  on 2018-11-21T08:29:57.206456+00:00
> [2018-11-21 08:30:02,067] {base_task_runner.py:118} INFO - Running: ['bash', 
> '-c', 'airflow run demo datapipe 2018-11-21T08:29:57.206456+00:00 --job_id 4 
> --raw -sd /usr/local/airflow/dags_volume/..data/demo.py --cfg_path 
> /tmp/tmp5qa5qyhs']
> [2018-11-21 08:30:03,065] {base_task_runner.py:101} INFO - Job 4: Subtask 
> datapipe [2018-11-21 08:30:03,065] {settings.py:174} INFO - 
> setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
> [2018-11-21 08:30:03,734] {base_task_runner.py:101} INFO - Job 4: Subtask 
> datapipe [2018-11-21 08:30:03,732] {__init__.py:51} INFO - Using executor 
> LocalExecutor
> [2018-11-21 08:30:04,038] {base_task_runner.py:101} INFO - Job 4: Subtask 
> datapipe [2018-11-21 08:30:04,038] {models.py:271} INFO - Filling up the 
> DagBag from /usr/local/airflow/dags_volume/..data/demo.py
> [2018-11-21 08:30:04,590] {base_task_runner.py:101} INFO - Job 4: Subtask 
> datapipe [2018-11-21 08:30:04,590] {cli.py:484} INFO - Running  demo.datapipe 


[jira] [Created] (AIRFLOW-3379) Support for AWS triggers (S3, SNS, SQS, CloudWatch, etc.)

2018-11-21 Thread Brylie Christopher Oxley (JIRA)
Brylie Christopher Oxley created AIRFLOW-3379:
-

 Summary: Support for AWS triggers (S3, SNS, SQS, CloudWatch, etc.)
 Key: AIRFLOW-3379
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3379
 Project: Apache Airflow
  Issue Type: Improvement
  Components: aws
Reporter: Brylie Christopher Oxley


We would like to build a 'reactive', or event-driven, data pipeline. From what 
I can gather, the primary Airflow DAG trigger is a timer (cron). However, it 
would be useful to trigger DAGs on external events, namely AWS events such as 
an S3 file upload, SQS, SNS, and/or CloudWatch.

I note there is an experimental API, which could be called from an AWS 
Lambda, but that would add boilerplate and brittleness to the data pipeline.

What are our options for triggering Airflow DAGS from external AWS events?
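
For reference, the Lambda route would look roughly like the sketch below, 
POSTing an S3 event to the experimental endpoint that creates a DAG run (the 
webserver address, DAG id, and the absence of auth are assumptions for 
illustration):

{code:java}
# Hypothetical AWS Lambda handler: trigger an Airflow DAG from an S3 event
# via the experimental REST API available in Airflow 1.10.
import json
import urllib.request

AIRFLOW_URL = 'http://airflow-webserver:8080'  # assumed webserver address
DAG_ID = 'process_s3_upload'                   # hypothetical DAG id

def handler(event, context):
    for record in event.get('Records', []):
        conf = {'s3_key': record['s3']['object']['key']}
        req = urllib.request.Request(
            '%s/api/experimental/dags/%s/dag_runs' % (AIRFLOW_URL, DAG_ID),
            data=json.dumps({'conf': conf}).encode('utf-8'),
            headers={'Content-Type': 'application/json'})
        urllib.request.urlopen(req)
{code}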



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)