[jira] [Commented] (AIRFLOW-3561) Improve some views

2018-12-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728013#comment-16728013
 ] 

ASF GitHub Bot commented on AIRFLOW-3561:
-

ffinfo commented on pull request #4368: AIRFLOW-3561 - improve queries
URL: https://github.com/apache/incubator-airflow/pull/4368
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-3561\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3561
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve some views
> --
>
> Key: AIRFLOW-3561
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3561
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Peter van 't Hof
>Assignee: Peter van 't Hof
>Priority: Minor
>
> Some views interact with the DagBag even though it is not needed for the
> query.
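
A minimal sketch of the kind of change implied here (illustrative only, not the PR's diff): a view that only needs run counts can query the metadata database directly instead of loading DAGs through the DagBag.

```python
# Illustrative sketch, not the PR's code: count running DagRuns with a plain
# metadata query, so no DagBag (and no DAG file parsing) is involved.
from airflow.models import DagRun
from airflow.utils.db import provide_session
from airflow.utils.state import State


@provide_session
def running_dagrun_count(dag_id, session=None):
    return (session.query(DagRun)
            .filter(DagRun.dag_id == dag_id,
                    DagRun.state == State.RUNNING)
            .count())
```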



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727999#comment-16727999
 ] 

ASF GitHub Bot commented on AIRFLOW-3551:
-

feluelle commented on pull request #4367: [AIRFLOW-3551] Improve BashOperator 
Test Coverage
URL: https://github.com/apache/incubator-airflow/pull/4367
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3551
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   - adds test case for xcom_push=True
   - refactoring
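
   A minimal sketch of such a test case (names and values are illustrative, not the PR's actual test):

```python
# Illustrative test for xcom_push=True; not the PR's actual test code.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator


def test_bash_operator_xcom_push():
    dag = DAG('test_bash_xcom', start_date=datetime(2018, 12, 1))
    task = BashOperator(task_id='echo', bash_command='echo hello',
                        xcom_push=True, dag=dag)
    # With xcom_push=True, execute() returns the last line written to stdout,
    # which Airflow then stores as the task's XCom return value.
    assert task.execute(context={}) == 'hello'
```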
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve BashOperator Test Coverage
> --
>
> Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727998#comment-16727998
 ] 

ASF GitHub Bot commented on AIRFLOW-3551:
-

feluelle commented on pull request #4366: [AIRFLOW-3551] Improve BashOperator 
Test Coverage
URL: https://github.com/apache/incubator-airflow/pull/4366
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve BashOperator Test Coverage
> --
>
> Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3552) Add ImapToS3TransferOperator

2018-12-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727993#comment-16727993
 ] 

ASF GitHub Bot commented on AIRFLOW-3552:
-

feluelle commented on pull request #4366: [AIRFLOW-3552] Improve BashOperator 
Test Coverage
URL: https://github.com/apache/incubator-airflow/pull/4366
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3551
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   - adds test case for xcom_push=True
   - refactoring
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add ImapToS3TransferOperator
> 
>
> Key: AIRFLOW-3552
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3552
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Major
>
> This operator transfers mail attachments from a mail server to an Amazon S3
> bucket.
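
A rough sketch of the transfer idea only (a hypothetical helper, not the operator added by the PR); the host, folder, and key names are illustrative, and the upload uses the existing S3Hook:

```python
# Illustrative sketch, not the PR's operator: fetch a named attachment over
# IMAP with the standard library and upload it to S3 via the existing S3Hook.
import email
import imaplib

from airflow.hooks.S3_hook import S3Hook


def mail_attachment_to_s3(host, user, password, attachment_name, bucket, key):
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select('INBOX')
    _, data = conn.search(None, 'ALL')
    for num in data[0].split():
        _, msg_data = conn.fetch(num, '(RFC822)')
        message = email.message_from_bytes(msg_data[0][1])
        for part in message.walk():
            if part.get_filename() == attachment_name:
                S3Hook(aws_conn_id='aws_default').load_bytes(
                    part.get_payload(decode=True),
                    key=key, bucket_name=bucket, replace=True)
    conn.logout()
```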



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3558) Have tox flake8 skip ignored and hidden directories

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727826#comment-16727826
 ] 

ASF GitHub Bot commented on AIRFLOW-3558:
-

bolkedebruin commented on pull request #4361: [AIRFLOW-3558] Improve default 
tox flake8 excludes
URL: https://github.com/apache/incubator-airflow/pull/4361
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Have tox flake8 skip ignored and hidden directories
> ---
>
> Key: AIRFLOW-3558
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3558
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
>
> By default, if you run tox with the flake8 target in Airflow it checks all of
> the directories, including .eggs, env, etc., all of which are ignored by
> our gitignore but caught by flake8, which reports a bunch of errors for
> non-Airflow code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1684) Branching based on XCOM variable

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727813#comment-16727813
 ] 

ASF GitHub Bot commented on AIRFLOW-1684:
-

eladkal commented on pull request #4365: [AIRFLOW-1684] - Branching based on 
XCom variable (Docs)
URL: https://github.com/apache/incubator-airflow/pull/4365
 
 
   Elaborate how to use branching with XComs
   
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-1684) issues and references 
them in the PR title.
   
   ### Description
   
   - Elaborate how to use branching with XComs
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Branching based on XCOM variable
> 
>
> Key: AIRFLOW-1684
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1684
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: xcom
>Affects Versions: 1.7.0
> Environment: Centos 7, Airflow1.7
>Reporter: Virendhar Sivaraman
>Assignee: Elad
>Priority: Major
>
> I would like to branch my DAG based on an XCom variable.
> Steps:
> 1. Populate the XCom in bash
> 2. Pull the XCom variable in a BranchPythonOperator and branch based on its
> value
> I've tried the documentation and researched on the internet, but haven't been
> successful.
> This feature will be helpful if it's not available yet.
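
A minimal sketch of the two steps described above (task ids and values are illustrative):

```python
# Illustrative sketch: push a value from bash, then branch on it.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator

dag = DAG('branch_on_xcom', start_date=datetime(2018, 12, 1),
          schedule_interval=None)

# 1. Populate an XCom from bash: with xcom_push=True the last line of stdout
#    is pushed as the task's XCom return value.
push = BashOperator(task_id='push', bash_command='echo large',
                    xcom_push=True, dag=dag)


# 2. Pull the XCom in a BranchPythonOperator and return the task_id to follow.
def choose_branch(**context):
    value = context['ti'].xcom_pull(task_ids='push')
    return 'big_path' if value == 'large' else 'small_path'


branch = BranchPythonOperator(task_id='branch', python_callable=choose_branch,
                              provide_context=True, dag=dag)

big_path = DummyOperator(task_id='big_path', dag=dag)
small_path = DummyOperator(task_id='small_path', dag=dag)

push >> branch >> [big_path, small_path]
```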



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3550) GKEClusterHook doesn't use gcp_conn_id

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727807#comment-16727807
 ] 

ASF GitHub Bot commented on AIRFLOW-3550:
-

jmcarp commented on pull request #4364: [AIRFLOW-3550] Standardize GKE hook.
URL: https://github.com/apache/incubator-airflow/pull/4364
 
 
   Refactor `GKEClusterHook` to subclass `GoogleCloudBaseHook` and
   authenticate with connection credentials.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3550
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Refactor `GKEClusterHook` to subclass `GoogleCloudBaseHook` and authenticate 
with connection credentials.
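
   A rough sketch of the refactoring idea (not the PR's actual class): subclassing GoogleCloudBaseHook lets gcp_conn_id drive authentication instead of the machine's default service account.

```python
# Illustrative sketch only; the PR's implementation differs in detail.
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook


class GKEClusterHookSketch(GoogleCloudBaseHook):

    def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None):
        super(GKEClusterHookSketch, self).__init__(gcp_conn_id=gcp_conn_id,
                                                   delegate_to=delegate_to)

    def get_client(self):
        # Build the GKE client from the connection's credentials rather than
        # whatever default service account happens to be available.
        from google.cloud import container_v1
        return container_v1.ClusterManagerClient(
            credentials=self._get_credentials())
```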
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> GKEClusterHook doesn't use gcp_conn_id
> --
>
> Key: AIRFLOW-3550
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3550
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Wilson Lian
>Priority: Major
>
> The hook doesn't inherit from GoogleCloudBaseHook. API calls are made using 
> the default service account (if present).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3560) Add Sensor that polls until a day of the week

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727764#comment-16727764
 ] 

ASF GitHub Bot commented on AIRFLOW-3560:
-

kaxil commented on pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek 
Sensors
URL: https://github.com/apache/incubator-airflow/pull/4363
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3560
   
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   One of the use cases we had is wanting to run certain tasks only on 
weekends or on certain days of the week. Along the way, I have seen more people 
asking for the same.
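
   A minimal sketch of the underlying check such a sensor performs (the class name and parameter are illustrative, not the PR's actual code):

```python
# Illustrative sketch: keep poking until today matches one of the requested
# ISO weekday numbers (e.g. {6, 7} for the weekend).
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils import timezone
from airflow.utils.decorators import apply_defaults


class DayOfWeekSensorSketch(BaseSensorOperator):

    @apply_defaults
    def __init__(self, week_days, *args, **kwargs):
        super(DayOfWeekSensorSketch, self).__init__(*args, **kwargs)
        self.week_days = set(week_days)

    def poke(self, context):
        return timezone.utcnow().isoweekday() in self.week_days
```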
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   * `DayOfWeekSensorTests`
   * `WeekEndSensorTests`
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Sensor that polls until a day of the week
> -
>
> Key: AIRFLOW-3560
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3560
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.2
>
>
> One of the use cases we have is wanting to run certain tasks only on weekends



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1077) Subdags can deadlock

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727537#comment-16727537
 ] 

ASF GitHub Bot commented on AIRFLOW-1077:
-

stale[bot] closed pull request #2367: [AIRFLOW-1077] Warn about subdag deadlock 
case
URL: https://github.com/apache/incubator-airflow/pull/2367
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/concepts.rst b/docs/concepts.rst
index 33a6ea44c7..56fbd2a531 100644
--- a/docs/concepts.rst
+++ b/docs/concepts.rst
@@ -457,10 +457,10 @@ Not like this, where the join task is skipped
 
 .. image:: img/branch_bad.png
 
-SubDAGs
+SubDags
 =======
 
-SubDAGs are perfect for repeating patterns. Defining a function that returns a
+SubDags are perfect for repeating patterns. Defining a function that returns a
 DAG object is a nice design pattern when using Airflow.
 
 Airbnb uses the *stage-check-exchange* pattern when loading data. Data is 
staged
@@ -472,13 +472,13 @@ As another example, consider the following DAG:
 
 .. image:: img/subdag_before.png
 
-We can combine all of the parallel ``task-*`` operators into a single SubDAG,
+We can combine all of the parallel ``task-*`` operators into a single SubDag,
 so that the resulting DAG resembles the following:
 
 .. image:: img/subdag_after.png
 
-Note that SubDAG operators should contain a factory method that returns a DAG
-object. This will prevent the SubDAG from being treated like a separate DAG in
+Note that SubDag operators should contain a factory method that returns a DAG
+object. This will prevent the SubDag from being treated like a separate DAG in
 the main UI. For example:
 
 .. code:: python
@@ -503,7 +503,7 @@ the main UI. For example:
 
 return dag
 
-This SubDAG can then be referenced in your main DAG file:
+This SubDag can then be referenced in your main DAG file:
 
 .. code:: python
 
@@ -531,29 +531,36 @@ This SubDAG can then be referenced in your main DAG file:
   )
 
 You can zoom into a SubDagOperator from the graph view of the main DAG to show
-the tasks contained within the SubDAG:
+the tasks contained within the SubDag:
 
 .. image:: img/subdag_zoom.png
 
-Some other tips when using SubDAGs:
+Some other tips when using SubDags:
 
--  by convention, a SubDAG's ``dag_id`` should be prefixed by its parent and
+-  by convention, a SubDag's ``dag_id`` should be prefixed by its parent and
a dot. As in ``parent.child``
--  share arguments between the main DAG and the SubDAG by passing arguments to
-   the SubDAG operator (as demonstrated above)
--  SubDAGs must have a schedule and be enabled. If the SubDAG's schedule is
-   set to ``None`` or ``@once``, the SubDAG will succeed without having done
+-  share arguments between the main DAG and the SubDag by passing arguments to
+   the SubDag operator (as demonstrated above)
+-  SubDags must have a schedule and be enabled. If the SubDag's schedule is
+   set to ``None`` or ``@once``, the SubDag will succeed without having done
anything
 -  clearing a SubDagOperator also clears the state of the tasks within
 -  marking success on a SubDagOperator does not affect the state of the tasks
within
--  refrain from using ``depends_on_past=True`` in tasks within the SubDAG as
+-  refrain from using ``depends_on_past=True`` in tasks within the SubDag as
this can be confusing
--  it is possible to specify an executor for the SubDAG. It is common to use
-   the SequentialExecutor if you want to run the SubDAG in-process and
+-  it is possible to specify an executor for the SubDag. It is common to use
+   the SequentialExecutor if you want to run the SubDag in-process and
effectively limit its parallelism to one. Using LocalExecutor can be
problematic as it may over-subscribe your worker, running multiple tasks in
a single slot
+-  do not create more SubDags than your concurrency limit or the scheduler
+   will deadlock. Each SubDag counts towards your concurrency limit. For
+   example, if you have a concurrency limit of 16 and you have 25 SubDags,
+   the first 16 SubDags will be scheduled, effectively blocking any of the tasks
+   within the given SubDags. You can work around this by setting the SubDag's
+   executor to SequentialExecutor. This allows multiple SubDags to run
+   concurrently without locking the tasks within the SubDag
 
 See ``airflow/example_dags`` for a demonstration.
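
A minimal sketch of the workaround mentioned in the added bullet (the DAG and factory below are illustrative, not part of the PR's diff):

```python
# Illustrative sketch: run a SubDag with SequentialExecutor so its tasks do not
# compete with the SubDagOperator itself for executor slots.
from datetime import datetime

from airflow import DAG
from airflow.executors.sequential_executor import SequentialExecutor
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

default_args = {'start_date': datetime(2018, 12, 1)}
parent = DAG('parent', default_args=default_args, schedule_interval='@daily')


def subdag_factory(parent_id, child_id, args):
    # By convention the SubDag's dag_id is "<parent>.<child>".
    sub = DAG('%s.%s' % (parent_id, child_id), default_args=args,
              schedule_interval='@daily')
    DummyOperator(task_id='do_work', dag=sub)
    return sub


section = SubDagOperator(
    task_id='section-1',
    subdag=subdag_factory('parent', 'section-1', default_args),
    executor=SequentialExecutor(),
    dag=parent,
)
```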
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> 

[jira] [Commented] (AIRFLOW-3559) Add missing options to DatadogHook

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727533#comment-16727533
 ] 

ASF GitHub Bot commented on AIRFLOW-3559:
-

jmcarp opened a new pull request #4362: [AIRFLOW-3559] Add missing options to 
DatadogHook.
URL: https://github.com/apache/incubator-airflow/pull/4362
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3559
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Adds missing arguments to `DatadogHook`.
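
   For context, a minimal sketch of how the hook is typically called (illustrative values; the new options added by the PR are not shown, since their exact names come from the PR itself):

```python
# Illustrative usage of DatadogHook with its existing arguments.
from airflow.contrib.hooks.datadog_hook import DatadogHook

hook = DatadogHook(datadog_conn_id='datadog_default')
hook.send_metric(metric_name='airflow.job.duration', datapoint=42.0,
                 tags=['dag:example', 'task:load'])
```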
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Backfills missing tests for `DatadogHook`
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add missing options to DatadogHook
> --
>
> Key: AIRFLOW-3559
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3559
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Josh Carp
>Assignee: Josh Carp
>Priority: Trivial
>
> The DataDog hook is missing a few options for creating events and metrics. 
> I'll add those options and backfill unit tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3558) Have tox flake8 skip ignored and hidden directories

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727526#comment-16727526
 ] 

ASF GitHub Bot commented on AIRFLOW-3558:
-

holdenk opened a new pull request #4361: [AIRFLOW-3558] Improve default tox 
flake8 excludes
URL: https://github.com/apache/incubator-airflow/pull/4361
 
 
   Right now our gitignore skips a bunch of temporary Python directories
   but our flake8 config will still test against them, leading to
   unnecessary error messages. This changes the excludes
   to skip the common directories that can cause false flake8 failures.
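
   A sketch of the kind of exclude list meant here (illustrative, not the PR's exact setting):

```ini
# Illustrative flake8 excludes; the PR's actual list may differ.
[flake8]
exclude = .git,.eggs,.tox,env,venv,build,dist,*.egg-info
```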
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ X ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ X ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Right now our gitignore skips a bunch of temporary Python directories
   but our flake8 config will still test against them, leading to
   unnecessary error messages. This changes the excludes
   to skip the common directories that can cause false flake8 failures.
   
   This should not impact end users.
   
   ### Tests
   
   - [ X ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   The existing flake8 env still runs from tox
   
   ### Commits
   
   - [ X ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ X ] In case of new functionality, my PR adds documentation that 
describes how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   No new functionality.
   
   ### Code Quality
   
   - [ X ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Have tox flake8 skip ignored and hidden directories
> ---
>
> Key: AIRFLOW-3558
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3558
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
>
> By default, if you run tox with the flake8 target in Airflow it checks all of
> the directories, including .eggs, env, etc., all of which are ignored by
> our gitignore but caught by flake8, which reports a bunch of errors for
> non-Airflow code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1191) Contrib Spark Submit hook should permit override of spark-submit cmd

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727522#comment-16727522
 ] 

ASF GitHub Bot commented on AIRFLOW-1191:
-

holdenk opened a new pull request #4360: [AIRFLOW-1191] Simplify override of 
spark submit command
URL: https://github.com/apache/incubator-airflow/pull/4360
 
 
   This will better support distros which ship spark 1 & 2 ( and eventually 3)
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ X ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ X ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Adds a spark_binary param to the spark submit operator to allow folks to 
more easily configure the operator to use a different binary, as is needed for 
some distros of the Hadoop ecosystem which ship multiple versions of Spark.
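
   A hypothetical usage sketch of that parameter (the keyword name spark_binary follows the PR description; other values are illustrative):

```python
# Illustrative usage: point the operator at the Spark 2 parcel's binary.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

dag = DAG('spark2_example', start_date=datetime(2018, 12, 1),
          schedule_interval=None)

submit = SparkSubmitOperator(
    task_id='submit_job',
    application='/path/to/job.py',
    conn_id='spark_default',
    spark_binary='spark2-submit',  # instead of the default spark-submit
    dag=dag,
)
```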
   
   ### Tests
   
   - [ X ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Updates the existing test_spark_submit_operator to check for override
   
   ### Commits
   
   - [ X ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ X ] In case of new functionality, my PR adds documentation that 
describes how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   docstring update
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Contrib Spark Submit hook should permit override of spark-submit cmd
> 
>
> Key: AIRFLOW-1191
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1191
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, hooks
>Affects Versions: 1.8.1
> Environment: Cloudera based Spark parcel
>Reporter: Vianney FOUCAULT
>Assignee: Vianney FOUCAULT
>Priority: Major
> Fix For: 1.10.0
>
>
> Using a Cloudera-based cluster with the Spark 2 parcel that renames
> spark-submit to spark2-submit.
> It should be possible to change the spark-submit cmd without specifying an env
> var.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-850) Airflow should support a general purpose PythonSensor

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727516#comment-16727516
 ] 

ASF GitHub Bot commented on AIRFLOW-850:


kaxil closed pull request #4349: [AIRFLOW-850] Add a PythonSensor
URL: https://github.com/apache/incubator-airflow/pull/4349
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/sensors/python_sensor.py 
b/airflow/contrib/sensors/python_sensor.py
new file mode 100644
index 00..68bc7497ea
--- /dev/null
+++ b/airflow/contrib/sensors/python_sensor.py
@@ -0,0 +1,81 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class PythonSensor(BaseSensorOperator):
+    """
+    Waits for a Python callable to return True.
+
+    User could put input argument in templates_dict
+    e.g. templates_dict = {'start_ds': 1970}
+    and access the argument by calling `kwargs['templates_dict']['start_ds']`
+    in the callable
+
+    :param python_callable: A reference to an object that is callable
+    :type python_callable: python callable
+    :param op_kwargs: a dictionary of keyword arguments that will get unpacked
+        in your function
+    :type op_kwargs: dict
+    :param op_args: a list of positional arguments that will get unpacked when
+        calling your callable
+    :type op_args: list
+    :param provide_context: if set to true, Airflow will pass a set of
+        keyword arguments that can be used in your function. This set of
+        kwargs correspond exactly to what you can use in your jinja
+        templates. For this to work, you need to define `**kwargs` in your
+        function header.
+    :type provide_context: bool
+    :param templates_dict: a dictionary where the values are templates that
+        will get templated by the Airflow engine sometime between
+        ``__init__`` and ``execute`` takes place and are made available
+        in your callable's context after the template has been applied.
+    :type templates_dict: dict of str
+    """
+
+    template_fields = ('templates_dict',)
+    template_ext = tuple()
+
+    @apply_defaults
+    def __init__(
+            self,
+            python_callable,
+            op_args=None,
+            op_kwargs=None,
+            provide_context=False,
+            templates_dict=None,
+            *args, **kwargs):
+        super(PythonSensor, self).__init__(*args, **kwargs)
+        self.python_callable = python_callable
+        self.op_args = op_args or []
+        self.op_kwargs = op_kwargs or {}
+        self.provide_context = provide_context
+        self.templates_dict = templates_dict
+
+    def poke(self, context):
+        if self.provide_context:
+            context.update(self.op_kwargs)
+            context['templates_dict'] = self.templates_dict
+            self.op_kwargs = context
+
+        self.log.info("Poking callable: " + str(self.python_callable))
+        return_value = self.python_callable(*self.op_args, **self.op_kwargs)
+        return bool(return_value)
diff --git a/docs/code.rst b/docs/code.rst
index 61414ecbd6..e890adffec 100644
--- a/docs/code.rst
+++ b/docs/code.rst
@@ -256,6 +256,7 @@ Sensors
 .. autoclass:: 
airflow.contrib.sensors.imap_attachment_sensor.ImapAttachmentSensor
 .. autoclass:: airflow.contrib.sensors.jira_sensor.JiraSensor
 .. autoclass:: airflow.contrib.sensors.pubsub_sensor.PubSubPullSensor
+.. autoclass:: airflow.contrib.sensors.python_sensor.PythonSensor
 .. autoclass:: airflow.contrib.sensors.qubole_sensor.QuboleSensor
 .. autoclass:: airflow.contrib.sensors.redis_key_sensor.RedisKeySensor
 .. autoclass:: 
airflow.contrib.sensors.sagemaker_base_sensor.SageMakerBaseSensor
diff --git a/tests/contrib/sensors/test_python_sensor.py 
b/tests/contrib/sensors/test_python_sensor.py
new file mode 100644
index 
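
A minimal usage sketch of the sensor added by this PR (values are illustrative):

```python
# Illustrative usage of the new PythonSensor.
import os
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.python_sensor import PythonSensor

dag = DAG('wait_for_flag_file', start_date=datetime(2018, 12, 1),
          schedule_interval=None)

wait_for_flag = PythonSensor(
    task_id='wait_for_flag',
    python_callable=lambda: os.path.exists('/tmp/_SUCCESS'),
    poke_interval=30,  # re-run the callable every 30 seconds until it returns True
    dag=dag,
)
```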

[jira] [Commented] (AIRFLOW-3557) Various typos

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727513#comment-16727513
 ] 

ASF GitHub Bot commented on AIRFLOW-3557:
-

kaxil closed pull request #4357: [AIRFLOW-3557] Fix various typos
URL: https://github.com/apache/incubator-airflow/pull/4357
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/CHANGELOG.txt b/CHANGELOG.txt
index abb0563d71..98a1103792 100644
--- a/CHANGELOG.txt
+++ b/CHANGELOG.txt
@@ -24,7 +24,7 @@ Improvements:
 [AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
 [AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes 
executor/operator
 [AIRFLOW-2709] Improve error handling in Databricks hook
-[AIRFLOW-2723] Update lxml dependancy to >= 4.0.
+[AIRFLOW-2723] Update lxml dependency to >= 4.0.
 [AIRFLOW-2763] No precheck mechanism in place during worker initialisation for 
the connection to metadata database
 [AIRFLOW-2789] Add ability to create single node cluster to 
DataprocClusterCreateOperator
 [AIRFLOW-2797] Add ability to create Google Dataproc cluster with custom image
@@ -269,7 +269,7 @@ AIRFLOW 1.10.0, 2018-08-03
 [AIRFLOW-2429] Make Airflow flake8 compliant
 [AIRFLOW-2491] Resolve flask version conflict
 [AIRFLOW-2484] Remove duplicate key in MySQL to GCS Op
-[ARIFLOW-2458] Add cassandra-to-gcs operator
+[AIRFLOW-2458] Add cassandra-to-gcs operator
 [AIRFLOW-2477] Improve time units for task duration and landing times charts 
for RBAC UI
 [AIRFLOW-2474] Only import snakebite if using py2
 [AIRFLOW-48] Parse connection uri querystring
@@ -1504,7 +1504,7 @@ AIRFLOW 1.8.0, 2017-03-12
 [AIRFLOW-784] Pin funcsigs to 1.0.0
 [AIRFLOW-624] Fix setup.py to not import airflow.version as version
 [AIRFLOW-779] Task should fail with specific message when deleted
-[AIRFLOW-778] Fix completey broken MetastorePartitionSensor
+[AIRFLOW-778] Fix completely broken MetastorePartitionSensor
 [AIRFLOW-739] Set pickle_info log to debug
 [AIRFLOW-771] Make S3 logs append instead of clobber
 [AIRFLOW-773] Fix flaky datetime addition in api test
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 556a5d847b..2a60f1dc3c 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -166,10 +166,10 @@ There are three ways to setup an Apache Airflow 
development environment.
   tox -e py35-backend_mysql
   ```
 
-  If you wish to run individual tests inside of docker enviroment you can do 
as follows:
+  If you wish to run individual tests inside of Docker environment you can do 
as follows:
 
   ```bash
-# From the container (with your desired enviroment) with druid hook
+# From the container (with your desired environment) with druid hook
 tox -e py35-backend_mysql -- tests/hooks/test_druid_hook.py
  ```
 
diff --git a/airflow/contrib/hooks/bigquery_hook.py 
b/airflow/contrib/hooks/bigquery_hook.py
index 5cab013b28..30a16305db 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -1594,7 +1594,7 @@ def insert_all(self, project_id, dataset_id, table_id,
 self.log.info('All row(s) inserted successfully: 
{}:{}.{}'.format(
 dataset_project_id, dataset_id, table_id))
 else:
-error_msg = '{} insert error(s) occured: {}:{}.{}. Details: 
{}'.format(
+error_msg = '{} insert error(s) occurred: {}:{}.{}. Details: 
{}'.format(
 len(resp['insertErrors']),
 dataset_project_id, dataset_id, table_id, 
resp['insertErrors'])
 if fail_on_error:
diff --git a/airflow/contrib/hooks/emr_hook.py 
b/airflow/contrib/hooks/emr_hook.py
index f9fd3f04de..fcdf4ac848 100644
--- a/airflow/contrib/hooks/emr_hook.py
+++ b/airflow/contrib/hooks/emr_hook.py
@@ -23,7 +23,7 @@
 
 class EmrHook(AwsHook):
 """
-Interact with AWS EMR. emr_conn_id is only neccessary for using the
+Interact with AWS EMR. emr_conn_id is only necessary for using the
 create_job_flow method.
 """
 
diff --git a/airflow/executors/celery_executor.py 
b/airflow/executors/celery_executor.py
index 98ce6efba7..10694ea4b7 100644
--- a/airflow/executors/celery_executor.py
+++ b/airflow/executors/celery_executor.py
@@ -74,7 +74,7 @@ def execute_command(command_to_exec):
 
 class ExceptionWithTraceback(object):
 """
-Wrapper class used to propogate exceptions to parent processes from 
subprocesses.
+Wrapper class used to propagate exceptions to parent processes from 
subprocesses.
 :param exception: The exception to wrap
 :type exception: Exception
 :param traceback: The stacktrace to wrap
diff --git a/airflow/sensors/base_sensor_operator.py 

[jira] [Commented] (AIRFLOW-3150) Make execution_date a template field in TriggerDagRunOperator

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727510#comment-16727510
 ] 

ASF GitHub Bot commented on AIRFLOW-3150:
-

kaxil opened a new pull request #4359: [AIRFLOW-3150] Make execution_date 
templated in TriggerDagRunOperator
URL: https://github.com/apache/incubator-airflow/pull/4359
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3150
   
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   * `test_trigger_dagrun_with_str_execution_date`
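
   A minimal sketch of what the templated field enables (values are illustrative):

```python
# Illustrative usage: pass a Jinja expression instead of a datetime object.
from datetime import datetime

from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator

dag = DAG('trigger_example', start_date=datetime(2018, 12, 1),
          schedule_interval='@daily')

trigger = TriggerDagRunOperator(
    task_id='trigger_downstream',
    trigger_dag_id='downstream_dag',
    # With execution_date in template_fields, a string template like this is
    # rendered at runtime.
    execution_date='{{ execution_date }}',
    dag=dag,
)
```
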
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make execution_date a template field in TriggerDagRunOperator
> -
>
> Key: AIRFLOW-3150
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3150
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Kyle Hamlin
>Assignee: Kaxil Naik
>Priority: Minor
>  Labels: easy-fix
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3556) Add a "cross join" function for setting dependencies between two lists of tasks

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727313#comment-16727313
 ] 

ASF GitHub Bot commented on AIRFLOW-3556:
-

BasPH opened a new pull request #4356: [AIRFLOW-3556] Add cross join set 
downstream function
URL: https://github.com/apache/incubator-airflow/pull/4356
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3556
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Add function to set "cross join style" downstream dependencies between two 
list of tasks. For example:
   
   ```
   cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
   
   Sets dependencies:
   t1 --> t4
  \ /
   t2 -X> t5
  / \
   t3 --> t6
   
   Equivalent to:
   t1.set_downstream(t4)
   t1.set_downstream(t5)
   t1.set_downstream(t6)
   t2.set_downstream(t4)
   t2.set_downstream(t5)
   t2.set_downstream(t6)
   t3.set_downstream(t4)
   t3.set_downstream(t5)
   t3.set_downstream(t6)
   ```
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   HelpersTest.test_cross_downstream()
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add a "cross join" function for setting dependencies between two lists of 
> tasks
> ---
>
> Key: AIRFLOW-3556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3556
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Bas Harenslak
>Priority: Major
>
> Similar to airflow.utils.helpers.chain(), it would be useful to have a helper 
> function that sets downstream dependencies in a cross join fashion between 
> two lists of tasks.
> For example:
> {code}
> cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
> Sets dependencies:
> t1 --> t4
>\ /
> t2 -X> t5
>/ \
> t3 --> t6
> Equivalent to:
> t1.set_downstream(t4)
> t1.set_downstream(t5)
> t1.set_downstream(t6)
> t2.set_downstream(t4)
> t2.set_downstream(t5)
> t2.set_downstream(t6)
> t3.set_downstream(t4)
> t3.set_downstream(t5)
> t3.set_downstream(t6){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3557) Various typos

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727311#comment-16727311
 ] 

ASF GitHub Bot commented on AIRFLOW-3557:
-

BasPH opened a new pull request #4357: [AIRFLOW-3557] Fix various typos
URL: https://github.com/apache/incubator-airflow/pull/4357
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3557
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   
   Check source code with [misspell](https://github.com/client9/misspell) and 
fixed various typos.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   No changes to code.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   No new functionality.
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Various typos
> -
>
> Key: AIRFLOW-3557
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3557
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bas Harenslak
>Priority: Major
>
> Fix various typos, checked with 
> [misspell|https://github.com/client9/misspell].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3556) Add a "cross join" function for setting dependencies between two lists of tasks

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727310#comment-16727310
 ] 

ASF GitHub Bot commented on AIRFLOW-3556:
-

BasPH closed pull request #4356: [AIRFLOW-3556] Add cross join set downstream 
function
URL: https://github.com/apache/incubator-airflow/pull/4356
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/utils/helpers.py b/airflow/utils/helpers.py
index 328147c1cf..5f8c88879c 100644
--- a/airflow/utils/helpers.py
+++ b/airflow/utils/helpers.py
@@ -169,6 +169,37 @@ def chain(*tasks):
 up_task.set_downstream(down_task)
 
 
+def cross_downstream(from_tasks, to_tasks):
+    """
+    Set downstream dependencies for all tasks in from_tasks to all tasks in to_tasks.
+    E.g.: cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
+    Is equivalent to:
+
+    t1 --> t4
+       \ /
+    t2 -X> t5
+       / \
+    t3 --> t6
+
+    t1.set_downstream(t4)
+    t1.set_downstream(t5)
+    t1.set_downstream(t6)
+    t2.set_downstream(t4)
+    t2.set_downstream(t5)
+    t2.set_downstream(t6)
+    t3.set_downstream(t4)
+    t3.set_downstream(t5)
+    t3.set_downstream(t6)
+
+    :param from_tasks: List of tasks to start from.
+    :type from_tasks: List[airflow.models.BaseOperator]
+    :param to_tasks: List of tasks to set as downstream dependencies.
+    :type to_tasks: List[airflow.models.BaseOperator]
+    """
+    for task in from_tasks:
+        task.set_downstream(to_tasks)
+
+
 def pprinttable(rows):
 """Returns a pretty ascii table from tuples
 
diff --git a/tests/utils/test_helpers.py b/tests/utils/test_helpers.py
index 4cb3e1a1fc..837a79acba 100644
--- a/tests/utils/test_helpers.py
+++ b/tests/utils/test_helpers.py
@@ -20,11 +20,16 @@
 import logging
 import multiprocessing
 import os
-import psutil
 import signal
 import time
 import unittest
+from datetime import datetime
+
+import psutil
+import six
 
+from airflow import DAG
+from airflow.operators.dummy_operator import DummyOperator
 from airflow.utils import helpers
 
 
@@ -210,6 +215,16 @@ def test_is_container(self):
 # Pass an object that is not iter nor a string.
 self.assertFalse(helpers.is_container(10))
 
+    def test_cross_downstream(self):
+        """Test if all dependencies between tasks are all set correctly."""
+        dag = DAG(dag_id="test_dag", start_date=datetime.now())
+        start_tasks = [DummyOperator(task_id="t{i}".format(i=i), dag=dag)
+                       for i in range(1, 4)]
+        end_tasks = [DummyOperator(task_id="t{i}".format(i=i), dag=dag)
+                     for i in range(4, 7)]
+        helpers.cross_downstream(from_tasks=start_tasks, to_tasks=end_tasks)
+
+        for start_task in start_tasks:
+            six.assertCountEqual(
+                self, start_task.get_direct_relatives(upstream=False), end_tasks)
+
 
 if __name__ == '__main__':
 unittest.main()


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add a "cross join" function for setting dependencies between two lists of 
> tasks
> ---
>
> Key: AIRFLOW-3556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3556
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Bas Harenslak
>Priority: Major
>
> Similar to airflow.utils.helpers.chain(), it would be useful to have a helper 
> function that sets downstream dependencies in a cross join fashion between 
> two lists of tasks.
> For example:
> {code}
> cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
> Sets dependencies:
> t1 --> t4
>\ /
> t2 -X> t5
>/ \
> t3 --> t6
> Equivalent to:
> t1.set_downstream(t4)
> t1.set_downstream(t5)
> t1.set_downstream(t6)
> t2.set_downstream(t4)
> t2.set_downstream(t5)
> t2.set_downstream(t6)
> t3.set_downstream(t4)
> t3.set_downstream(t5)
> t3.set_downstream(t6){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3153) send dag last_run to statsd

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727178#comment-16727178
 ] 

ASF GitHub Bot commented on AIRFLOW-3153:
-

stale[bot] closed pull request #3997: [AIRFLOW-3153] send dag last_run to statsd
URL: https://github.com/apache/incubator-airflow/pull/3997
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/jobs.py b/airflow/jobs.py
index da1089d690..94ec4458d8 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -580,7 +580,8 @@ def __init__(
 self.using_sqlite = False
 if 'sqlite' in conf.get('core', 'sql_alchemy_conn'):
 if self.max_threads > 1:
-self.log.error("Cannot use more than 1 thread when using 
sqlite. Setting max_threads to 1")
+self.log.error("Cannot use more than 1 thread when using 
sqlite. "
+   "Setting max_threads to 1")
 self.max_threads = 1
 self.using_sqlite = True
 
@@ -1026,7 +1027,8 @@ def _change_state_for_tis_without_dagrun(self,
 
 if tis_changed > 0:
 self.log.warning(
-"Set %s task instances to state=%s as their associated DagRun 
was not in RUNNING state",
+"Set %s task instances to state=%s "
+"as their associated DagRun was not in RUNNING state",
 tis_changed, new_state
 )
 
@@ -1201,7 +1203,8 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
  " this task has been reached.", task_instance)
 continue
 else:
-task_concurrency_map[(task_instance.dag_id, task_instance.task_id)] += 1
+task_concurrency_map[(task_instance.dag_id,
+  task_instance.task_id)] += 1
 
 if self.executor.has_task(task_instance):
 self.log.debug(
@@ -1505,6 +1508,8 @@ def _log_file_processing_stats(self,
"Last Run"]
 
 rows = []
+dags_folder = conf.get('core', 'dags_folder').rstrip(os.sep)
+
 for file_path in known_file_paths:
 last_runtime = processor_manager.get_last_runtime(file_path)
 processor_pid = processor_manager.get_pid(file_path)
@@ -1513,6 +1518,16 @@ def _log_file_processing_stats(self,
if processor_start_time else None)
 last_run = processor_manager.get_last_finish_time(file_path)
 
+file_name = file_path[len(dags_folder) + 1:]
+dag_name = os.path.splitext(file_name)[0].replace(os.sep, '.')
+if last_runtime is not None:
+Stats.gauge('last_runtime.{}'.format(dag_name), last_runtime)
+if last_run is not None:
+unixtime = last_run.strftime("%s")
+seconds_ago = (timezone.utcnow() - last_run).total_seconds()
+Stats.gauge('last_run.unixtime.{}'.format(dag_name), unixtime)
+Stats.gauge('last_run.seconds_ago.{}'.format(dag_name), seconds_ago)
+
 rows.append((file_path,
  processor_pid,
  runtime,


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
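
As a rough illustration of what the diff above emits: for a DAG file located at $AIRFLOW_HOME/dags/etl/daily_load.py, and assuming the statsd prefix is left at its default of "airflow", the new gauges would be named roughly:

```
airflow.last_runtime.etl.daily_load
airflow.last_run.unixtime.etl.daily_load
airflow.last_run.seconds_ago.etl.daily_load
```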


> send dag last_run to statsd
> ---
>
> Key: AIRFLOW-3153
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3153
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tao Feng
>Assignee: Tao Feng
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3556) Add a "cross join" function for setting dependencies between two lists of tasks

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727090#comment-16727090
 ] 

ASF GitHub Bot commented on AIRFLOW-3556:
-

BasPH opened a new pull request #4356: [AIRFLOW-3556] Add cross join set 
dependency function
URL: https://github.com/apache/incubator-airflow/pull/4356
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3556
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Add function to set "cross join style" downstream dependencies between two 
list of tasks. For example:
   
   ```
   cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
   
   Sets dependencies:
   t1 --> t4
  \ /
   t2 -X> t5
  / \
   t3 --> t6
   
   Equivalent to:
   t1.set_downstream(t4)
   t1.set_downstream(t5)
   t1.set_downstream(t6)
   t2.set_downstream(t4)
   t2.set_downstream(t5)
   t2.set_downstream(t6)
   t3.set_downstream(t4)
   t3.set_downstream(t5)
   t3.set_downstream(t6)
   ```
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   HelpersTest.test_cross_downstream()
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add a "cross join" function for setting dependencies between two lists of 
> tasks
> ---
>
> Key: AIRFLOW-3556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3556
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Bas Harenslak
>Priority: Major
>
> Similar to airflow.utils.helpers.chain(), it would be useful to have a helper 
> function that sets downstream dependencies in a cross join fashion between 
> two lists of tasks.
> For example:
> {code}
> cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
> Sets dependencies:
> t1 --> t4
>\ /
> t2 -X> t5
>/ \
> t3 --> t6
> Equivalent to:
> t1.set_downstream(t4)
> t1.set_downstream(t5)
> t1.set_downstream(t6)
> t2.set_downstream(t4)
> t2.set_downstream(t5)
> t2.set_downstream(t6)
> t3.set_downstream(t4)
> t3.set_downstream(t5)
> t3.set_downstream(t6){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3446) Add operators for Google Cloud BigTable

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726969#comment-16726969
 ] 

ASF GitHub Bot commented on AIRFLOW-3446:
-

DariuszAniszewski opened a new pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following 
[AIRFLOW-3446](https://issues.apache.org/jira/browse/AIRFLOW-3446/) issues and 
references them in the PR title. 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   The new operators allow:
   * creating and deleting instance
   * creating and deleting table
   * updating cluster
   * waiting for table replication (sensor)
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests:
   * tests/contrib/operators/test_gcp_bigtable_operator.py
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add operators for Google Cloud BigTable
> ---
>
> Key: AIRFLOW-3446
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3446
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Dariusz Aniszewski
>Assignee: Dariusz Aniszewski
>Priority: Major
>
> Proposed operators:
>  * BigTableInstanceCreateOperator
>  * BigTableInstanceDeleteOperator
>  * BigTableTableCreateOperator
>  * BigTableTableDeleteOperator
>  * BigTableClusterUpdateOperator
>  * BigTableTableWaitForReplicationSensor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3480) Google Cloud Spanner Instance Database Deploy/Update/Delete

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726905#comment-16726905
 ] 

ASF GitHub Bot commented on AIRFLOW-3480:
-

potiuk opened a new pull request #4353: [AIRFLOW-3480] Added Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/AIRFLOW-3480) issue and 
references them in the PR title. 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Added Database Deploy/Update/Delete operators for Google Cloud Spanner
   
   ### Tests
   
   - [x] My PR adds the following unit tests:
   
   - test_database_create
   - test_database_create_with_pre_existing_db
   - test_database_create_ex_if_param_missing(parameterised)
   - test_database_update
   - test_database_update_ex_if_param_missing(parameterised)
   - test_database_update_ex_if_database_not_exist
   - test_database_delete
   - test_database_delete_exits_and_succeeds_if_database_does_not_exist
   - test_database_delete_ex_if_param_missing (parameterised)

   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Google Cloud Spanner Instance Database Deploy/Update/Delete
> ---
>
> Key: AIRFLOW-3480
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3480
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: gcp
>Reporter: Jarek Potiuk
>Assignee: Jarek Potiuk
>Priority: Minor
>
> We need to have operators to implement Instance management operations:
>  * InstanceDeploy (create database if it does not exist, succeed if already 
> created(
>  * Update (run update_ddl method changing database structure)
>  * Delete (delete the database)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3555) Remove lxml dependency

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726898#comment-16726898
 ] 

ASF GitHub Bot commented on AIRFLOW-3555:
-

jcao219 opened a new pull request #4352: [AIRFLOW-3555] Remove lxml dependency
URL: https://github.com/apache/incubator-airflow/pull/4352
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3555
   
   ### Description
   
   - [x] The lxml dependency is no longer needed except for when running tests.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason: Dependency clean up
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove lxml dependency
> --
>
> Key: AIRFLOW-3555
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3555
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
>Reporter: Jimmy Cao
>Assignee: Jimmy Cao
>Priority: Major
>
> In this PR: 
> [https://github.com/apache/incubator-airflow/pull/1712/files#diff-948e87b4f8f644b3ad8c7950958df033]
>  lxml was added to airflow/www/views.py, and then in this following PR: 
> [https://github.com/apache/incubator-airflow/pull/1722]  the lxml package was 
> added to the list of core dependencies.
> However, months later in this commit: 
> [https://github.com/apache/incubator-airflow/commit/1accb54ff561b8d745277308447dd6f9d3e9f8d5#diff-948e87b4f8f644b3ad8c7950958df033]
>  the lxml import was removed from airflow/www/views.py so it is no longer 
> needed except in the devel extras because it's still used in tests.
> It should be removed from the install_requires list.
>  
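
For illustration, the kind of packaging change being described looks like the following setup.py sketch (a generic example, not Airflow's actual setup.py; the package name and pins are made up):

{code}
from setuptools import setup, find_packages

setup(
    name='example-package',
    version='0.1',
    packages=find_packages(),
    # lxml is no longer listed here: it is not needed at runtime
    install_requires=[
        'requests',
    ],
    extras_require={
        # keep lxml available for development and test environments only
        'devel': ['lxml'],
    },
)
{code}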



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3554) Remove contrib folder from being omitted by code cov

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726772#comment-16726772
 ] 

ASF GitHub Bot commented on AIRFLOW-3554:
-

feluelle opened a new pull request #4351: [AIRFLOW-3554] Remove contrib folder 
from code cov omit list
URL: https://github.com/apache/incubator-airflow/pull/4351
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3554
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Currently the `contrib` folder is not being processed by codecov.
   That means that contributors won't see code coverage for the code they implement in this folder.
   To generally improve code/test coverage for this project, I would recommend enabling coverage for the `contrib` folder, too.
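   
   For illustration only, an omit list of this kind is typically kept in a coverage.py config file such as .coveragerc (the exact file and patterns used by the Airflow repo may differ):
   
   ```
   [run]
   # Before this change, the pattern below kept contrib out of coverage reports;
   # the PR removes it so contrib code is measured like everything else.
   omit =
       airflow/contrib/*
   ```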
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove contrib folder from being omitted by code cov
> 
>
> Key: AIRFLOW-3554
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3554
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3527) Cloud SQL proxy with UNIX sockets might lead to too long socket path

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726752#comment-16726752
 ] 

ASF GitHub Bot commented on AIRFLOW-3527:
-

potiuk opened a new pull request #4350: [AIRFLOW-3527] Cloud SQL Proxy has 
shorter path for UNIX socket
URL: https://github.com/apache/incubator-airflow/pull/4350
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3527)
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   There is a limitation of UNIX socket path length as described in
   
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html#tag_13_67_04
   
   Cloud SQL Proxy uses a generated path which can exceed the limit,
   especially in the case of POSTGRES connections (POSTGRES adds a few
   characters of its own). The error returned by sqlproxy in this case is
   pretty vague (invalid path), which makes it difficult for the user to
   understand the problem.
   
   This commit fixes it in two ways:
   * makes it less likely that the path length will be exceeded, by generating
   a shorter random string for the socket directory.
   * raises an error in case the calculated path is too long.
   ### Tests
   
   - [x] My PR adds the following unit tests:
   CloudSqlQueryValidationTest:
   * test_create_operator_with_too_long_unix_socket_path
   * test_create_operator_with_not_too_long_unix_socket_path
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] No documentation update is needed.
   
   ### Code Quality
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Cloud SQL proxy with UNIX sockets might lead to too long socket path
> 
>
> Key: AIRFLOW-3527
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3527
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jarek Potiuk
>Priority: Major
>
> Currently Cloud SQL Proxy with UNIX sockets creates the proxy dir in
> /tmp/\{UDID1}/folder - which in case of postgres and long instance names
> might lead to a UNIX socket name that is too long (the socket path length is
> limited to 108 characters in Linux).
> [http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html#tag_13_67_04]
> However, if the instance name is long enough, this leads to a path that is
> too long (and the limit turns out to be fairly short - instance names can
> often exceed 20-30 characters) and a cryptic "invalid path name" error.
> Therefore we need to
> 1) generate the path with a shorter random number prefix. 8 characters should
> be random enough + we can check whether the generated path already exists
> and generate another one if that's the case.
> 2) fail validation in case the generated path is too long and propose a 
> solution (shorter names or switching to TCP).
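
A minimal sketch of the two fixes described above (this is not the actual gcp_sql_hook implementation; the directory naming and region are illustrative, and the Postgres suffix assumes the standard ".s.PGSQL.5432" socket file created under the socket directory):

{code}
import os
import random
import string

# Linux limits sockaddr_un.sun_path to 108 bytes.
UNIX_PATH_MAX = 108


def generate_socket_dir(base_dir='/tmp'):
    # 1) shorter random prefix: 8 characters keeps the overall path short
    suffix = ''.join(random.choice(string.ascii_lowercase + string.digits)
                     for _ in range(8))
    return os.path.join(base_dir, 'cloudsql-' + suffix)


def validate_socket_path(socket_dir, project, region, instance, postgres=False):
    # The proxy creates a socket named <project>:<region>:<instance> in the dir;
    # Postgres clients additionally expect <that path>/.s.PGSQL.5432.
    path = os.path.join(socket_dir, '{}:{}:{}'.format(project, region, instance))
    if postgres:
        path = os.path.join(path, '.s.PGSQL.5432')
    # 2) fail early with an actionable message instead of a cryptic proxy error
    if len(path) >= UNIX_PATH_MAX:
        raise ValueError(
            'UNIX socket path {} exceeds the {}-character limit; use a shorter '
            'instance name or switch to TCP.'.format(path, UNIX_PATH_MAX))
    return path
{code}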



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-850) Airflow should support a general purpose PythonSensor

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726474#comment-16726474
 ] 

ASF GitHub Bot commented on AIRFLOW-850:


feng-tao closed pull request #2058: [AIRFLOW-850] Add a PythonSensor
URL: https://github.com/apache/incubator-airflow/pull/2058
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/sensors.py b/airflow/operators/sensors.py
index 44a97e00c1..bf02335a95 100644
--- a/airflow/operators/sensors.py
+++ b/airflow/operators/sensors.py
@@ -679,3 +679,57 @@ def poke(self, context):
 raise ae
 
 return True
+
+class PythonSensor(BaseSensorOperator):
+"""
+Waits for a Python callable to return True
+
+:param python_callable: A reference to an object that is callable
+:type python_callable: python callable
+:param op_kwargs: a dictionary of keyword arguments that will get unpacked
+in your function
+:type op_kwargs: dict
+:param op_args: a list of positional arguments that will get unpacked when
+calling your callable
+:type op_args: list
+:param provide_context: if set to true, Airflow will pass a set of
+keyword arguments that can be used in your function. This set of
+kwargs correspond exactly to what you can use in your jinja
+templates. For this to work, you need to define `**kwargs` in your
+function header.
+:type provide_context: bool
+:param templates_dict: a dictionary where the values are templates that
+will get templated by the Airflow engine sometime between
+``__init__`` and ``execute`` takes place and are made available
+in your callable's context after the template has been applied
+:type templates_dict: dict of str
+"""
+
+template_fields = ('templates_dict',)
+template_ext = tuple()
+
+def __init__(
+self,
+python_callable,
+op_args=None,
+op_kwargs=None,
+provide_context=False,
+templates_dict=None,
+*args, **kwargs):
+super(PythonSensor, self).__init__(*args, **kwargs)
+self.python_callable = python_callable
+self.op_args = op_args or []
+self.op_kwargs = op_kwargs or {}
+self.provide_context = provide_context
+self.templates_dict = templates_dict
+
+
+def poke(self, context):
+if self.provide_context:
+context.update(self.op_kwargs)
+context['templates_dict'] = self.templates_dict
+self.op_kwargs = context
+
+logging.info("Poking callable: " + str(self.python_callable))
+return_value = self.python_callable(*self.op_args, **self.op_kwargs)
+return bool(return_value)
diff --git a/tests/operators/sensors.py b/tests/operators/sensors.py
index e77216b580..2633e4c41b 100644
--- a/tests/operators/sensors.py
+++ b/tests/operators/sensors.py
@@ -22,7 +22,7 @@
 from datetime import datetime, timedelta
 
 from airflow import DAG, configuration
-from airflow.operators.sensors import HttpSensor, BaseSensorOperator, HdfsSensor
+from airflow.operators.sensors import HttpSensor, BaseSensorOperator, HdfsSensor, PythonSensor
 from airflow.utils.decorators import apply_defaults
 from airflow.exceptions import (AirflowException,
 AirflowSensorTimeout,
@@ -181,3 +181,38 @@ def test_legacy_file_does_not_exists(self):
 # Then
 with self.assertRaises(AirflowSensorTimeout):
 task.execute(None)
+
+class PythonSensorTests(unittest.TestCase):
+
+def setUp(self):
+configuration.load_test_config()
+args = {
+'owner': 'airflow',
+'start_date': DEFAULT_DATE
+}
+dag = DAG(TEST_DAG_ID, default_args=args)
+self.dag = dag
+
+def test_python_sensor_true(self):
+t = PythonSensor(
+task_id='python_sensor_check_true',
+python_callable=lambda: True,
+dag=self.dag)
+t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)
+
+def test_python_sensor_false(self):
+t = PythonSensor(
+task_id='python_sensor_check_false',
+timeout=1,
+python_callable=lambda: False,
+dag=self.dag)
+with self.assertRaises(AirflowSensorTimeout):
+t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)
+
+def test_python_sensor_raise(self):
+t = PythonSensor(
+task_id='python_sensor_check_raise',
+python_callable=lambda: 1/0,
+dag=self.dag)
+with 

[jira] [Commented] (AIRFLOW-850) Airflow should support a general purpose PythonSensor

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726472#comment-16726472
 ] 

ASF GitHub Bot commented on AIRFLOW-850:


feng-tao opened a new pull request #4349: [AIRFLOW-850] Add a PythonSensor
URL: https://github.com/apache/incubator-airflow/pull/4349
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-850
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   A general purpose PythonSensor which allows an arbitrary Python callable to
delay task execution until the callable returns True. This is based on a stale
PR (https://github.com/apache/incubator-airflow/pull/2058).
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   yes
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow should support a general purpose PythonSensor
> -
>
> Key: AIRFLOW-850
> URL: https://issues.apache.org/jira/browse/AIRFLOW-850
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.8.0
>Reporter: Daniel Gies
>Assignee: Daniel Gies
>Priority: Major
>
> Today I found myself trying to use a sensor to postpone execution until data 
> for the current execution date appeared in a file.  It occurred to me that 
> having a general purpose PythonSensor would allow developers to use the 
> sensor paradigm with arbitrary code.
> We should add a PythonSensor to the core sensors module which takes a 
> python_callable and optional args like the PythonOperator does.
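
A short usage sketch of such a sensor (the import path below follows the older stale PR, which placed it in airflow.operators.sensors; the location in the merged version may differ, and the file path is illustrative):

{code}
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.sensors import PythonSensor


def _data_available(**context):
    # Keep poking until the partition for the current execution date has landed.
    return os.path.exists('/data/incoming/{}.csv'.format(context['ds']))


dag = DAG('python_sensor_example', start_date=datetime(2018, 12, 1),
          schedule_interval='@daily')

wait_for_data = PythonSensor(
    task_id='wait_for_data',
    python_callable=_data_available,
    provide_context=True,   # pass the template context as keyword arguments
    poke_interval=60,
    timeout=60 * 60,
    dag=dag)
{code}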



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3163) Add set table description operator to BigQuery operators

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725931#comment-16725931
 ] 

ASF GitHub Bot commented on AIRFLOW-3163:
-

stale[bot] closed pull request #4003: [AIRFLOW-3163] add operator to enable 
setting table description in BigQuery table
URL: https://github.com/apache/incubator-airflow/pull/4003
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index dd77df1283..ccbb36dbd4 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -135,6 +135,34 @@ def table_exists(self, project_id, dataset_id, table_id):
 return False
 raise
 
+def set_table_description(self, dataset_id, table_id, description, project_id=None):
+"""
+Sets the description for the given table
+
+:param project_id: The Google cloud project in which to look for the
+table. The connection supplied to the hook must provide access to
+the specified project.
+:type project_id: string
+:param dataset_id: The name of the dataset in which to look for the
+table.
+:type dataset_id: string
+:param table_id: The name of the table to set the description for.
+:type table_id: string
+:param description: The description to set
+:type description: string
+"""
+service = self.get_service()
+project_id = project_id if project_id is not None else self._get_field('project')
+table = service.tables().get(
+projectId=project_id, datasetId=dataset_id,
+tableId=table_id).execute()
+table['description'] = description
+service.tables().patch(
+projectId=project_id,
+datasetId=dataset_id,
+tableId=table_id,
+body=table).execute()
+
 
 class BigQueryPandasConnector(GbqConnector):
 """
diff --git a/airflow/contrib/operators/bigquery_operator.py b/airflow/contrib/operators/bigquery_operator.py
index 9386e57c07..1ad19a7aa0 100644
--- a/airflow/contrib/operators/bigquery_operator.py
+++ b/airflow/contrib/operators/bigquery_operator.py
@@ -629,3 +629,57 @@ def execute(self, context):
 project_id=self.project_id,
 dataset_id=self.dataset_id,
 dataset_reference=self.dataset_reference)
+
+
+class BigQuerySetTableDescriptionOperator(BaseOperator):
+"""
+This operator is called to set the description on a table
+
+:param project_id: The Google cloud project in which to look for the
+table. The connection supplied must provide access to
+the specified project.
+:type project_id: string
+:param dataset_id: The name of the dataset in which to look for the
+table.
+:type dataset_id: string
+:param table_id: The name of the table to set the description for.
+:type table_id: string
+:param description: The description to set
+:type description: string
+:param bigquery_conn_id: The connection ID to use when
+connecting to BigQuery.
+:type google_cloud_storage_conn_id: string
+:param delegate_to: The account to impersonate, if any. For this to
+work, the service account making the request must have domain-wide
+delegation enabled.
+:type delegate_to: string
+"""
+template_fields = ('project_id', 'dataset_id', 'table_id', 'description')
+ui_color = '#f0eee4'
+
+@apply_defaults
+def __init__(self,
+ project_id=None,
+ dataset_id=None,
+ table_id=None,
+ description=None,
+ bigquery_conn_id='bigquery_default',
+ delegate_to=None,
+ *args,
+ **kwargs):
+super(BigQuerySetTableDescriptionOperator, self).__init__(*args, **kwargs)
+self.project_id = project_id
+self.dataset_id = dataset_id
+self.table_id = table_id
+self.description = description
+self.bigquery_conn_id = bigquery_conn_id
+self.delegate_to = delegate_to
+
+def execute(self, context):
+bq_hook = BigQueryHook(
+bigquery_conn_id=self.bigquery_conn_id,
+delegate_to=self.delegate_to)
+bq_hook.set_table_description(project_id=self.project_id,
+  dataset_id=self.dataset_id,
+  table_id=self.table_id,
+  description=self.description)
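
A short usage sketch of the operator added in the diff above (note this PR was closed by the stale bot, so the operator may not be available in released Airflow; project, dataset and table names are illustrative):

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import (
    BigQuerySetTableDescriptionOperator)

dag = DAG('bq_table_description_example', start_date=datetime(2018, 12, 1),
          schedule_interval=None)

set_description = BigQuerySetTableDescriptionOperator(
    task_id='set_table_description',
    project_id='my-gcp-project',
    dataset_id='analytics',
    table_id='events',
    description='Raw event data, loaded daily',
    dag=dag)
```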


 


[jira] [Commented] (AIRFLOW-2937) HttpHook doesn't respect the URI scheme when the connection is defined via Environment Variable

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725810#comment-16725810
 ] 

ASF GitHub Bot commented on AIRFLOW-2937:
-

stale[bot] closed pull request #3783: [AIRFLOW-2937] Support HTTPS in Http 
connection form environment variables
URL: https://github.com/apache/incubator-airflow/pull/3783
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/hooks/http_hook.py b/airflow/hooks/http_hook.py
index c449fe0c15..359e588a96 100644
--- a/airflow/hooks/http_hook.py
+++ b/airflow/hooks/http_hook.py
@@ -62,7 +62,7 @@ def get_conn(self, headers=None):
 self.base_url = conn.host
 else:
 # schema defaults to HTTP
-schema = conn.schema if conn.schema else "http"
+schema = conn.conn_type if conn.conn_type else conn.schema if conn.schema else "http"
 self.base_url = schema + "://" + conn.host
 
 if conn.port:


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> HttpHook doesn't respect the URI scheme when the connection is defined via 
> Environment Variable
> ---
>
> Key: AIRFLOW-2937
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2937
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Matt Chapman
>Assignee: Matt Chapman
>Priority: Major
>
> AIRFLOW-645 almost solved this, but not quite.
> I believe AIRFLOW-2841 is another misguided attempt at solving this problem, 
> and shows that this is an issue for other users.
> The core issue is that the HttpHook confusingly mixes up the ideas of 'URI 
> scheme' and 'Database schema.' 
> I'm submitting a patch that fixes the issue while maintaining backward 
> compatibility, but does not solve the core confusion, which I suggest should 
> be addressed in the next major release.
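
To make the failure mode concrete, here is a hypothetical connection defined via an environment variable (the connection id and URL are illustrative):

{code}
import os

# Airflow parses AIRFLOW_CONN_<CONN_ID> values as URIs. The "https" scheme ends
# up in conn_type, while the schema field stays empty, so the unpatched HttpHook
# falls back to "http" when it builds base_url.
os.environ['AIRFLOW_CONN_MY_API'] = 'https://api.example.com'

from airflow.hooks.http_hook import HttpHook

hook = HttpHook(method='GET', http_conn_id='my_api')
hook.get_conn()
print(hook.base_url)  # "http://api.example.com" before the fix, despite the https URI
{code}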



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3458) Refactor: Move Connection out of models.py

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725802#comment-16725802
 ] 

ASF GitHub Bot commented on AIRFLOW-3458:
-

Fokko closed pull request #4335: [AIRFLOW-3458] Move models.Connection into 
separate file
URL: https://github.com/apache/incubator-airflow/pull/4335
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index cd414d2821..143e2b34aa 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -33,6 +33,8 @@
 import argparse
 from builtins import input
 from collections import namedtuple
+
+from airflow.models.connection import Connection
 from airflow.utils.timezone import parse as parsedate
 import json
 from tabulate import tabulate
@@ -55,8 +57,7 @@
 from airflow.exceptions import AirflowException, AirflowWebServerTimeout
 from airflow.executors import GetDefaultExecutor
 from airflow.models import (DagModel, DagBag, TaskInstance,
-DagPickle, DagRun, Variable, DagStat,
-Connection, DAG)
+DagPickle, DagRun, Variable, DagStat, DAG)
 
 from airflow.ti_deps.dep_context import (DepContext, SCHEDULER_DEPS)
 from airflow.utils import cli as cli_utils
diff --git a/airflow/contrib/executors/mesos_executor.py b/airflow/contrib/executors/mesos_executor.py
index 0609d71cf2..7aae91e6d4 100644
--- a/airflow/contrib/executors/mesos_executor.py
+++ b/airflow/contrib/executors/mesos_executor.py
@@ -80,7 +80,7 @@ def registered(self, driver, frameworkId, masterInfo):
 if configuration.conf.getboolean('mesos', 'CHECKPOINT') and \
 configuration.conf.get('mesos', 'FAILOVER_TIMEOUT'):
 # Import here to work around a circular import error
-from airflow.models import Connection
+from airflow.models.connection import Connection
 
 # Update the Framework ID in the database.
 session = Session()
@@ -253,7 +253,7 @@ def start(self):
 
 if configuration.conf.get('mesos', 'FAILOVER_TIMEOUT'):
 # Import here to work around a circular import error
-from airflow.models import Connection
+from airflow.models.connection import Connection
 
 # Query the database to get the ID of the Mesos Framework, if 
available.
 conn_id = FRAMEWORK_CONNID_PREFIX + framework.name
diff --git a/airflow/contrib/hooks/gcp_sql_hook.py b/airflow/contrib/hooks/gcp_sql_hook.py
index 1581637e0d..9872746b7b 100644
--- a/airflow/contrib/hooks/gcp_sql_hook.py
+++ b/airflow/contrib/hooks/gcp_sql_hook.py
@@ -34,7 +34,7 @@
 import requests
 from googleapiclient.discovery import build
 
-from airflow import AirflowException, LoggingMixin, models
+from airflow import AirflowException, LoggingMixin
 from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
 
 # Number of retries - used by googleapiclient method calls to perform retries
@@ -42,7 +42,7 @@
 from airflow.hooks.base_hook import BaseHook
 from airflow.hooks.mysql_hook import MySqlHook
 from airflow.hooks.postgres_hook import PostgresHook
-from airflow.models import Connection
+from airflow.models.connection import Connection
 from airflow.utils.db import provide_session
 
 NUM_RETRIES = 5
@@ -457,8 +457,8 @@ def _download_sql_proxy_if_needed(self):
 
 @provide_session
 def _get_credential_parameters(self, session):
-connection = session.query(models.Connection). \
-filter(models.Connection.conn_id == self.gcp_conn_id).first()
+connection = session.query(Connection). \
+filter(Connection.conn_id == self.gcp_conn_id).first()
 session.expunge_all()
 if GCP_CREDENTIALS_KEY_PATH in connection.extra_dejson:
 credential_params = [
@@ -851,8 +851,8 @@ def delete_connection(self, session=None):
 decorator).
 """
 self.log.info("Deleting connection {}".format(self.db_conn_id))
-connection = session.query(models.Connection).filter(
-models.Connection.conn_id == self.db_conn_id)[0]
+connection = session.query(Connection).filter(
+Connection.conn_id == self.db_conn_id)[0]
 session.delete(connection)
 session.commit()
 
diff --git a/airflow/hooks/base_hook.py b/airflow/hooks/base_hook.py
index ef44f6469d..c1283e3fb4 100644
--- a/airflow/hooks/base_hook.py
+++ b/airflow/hooks/base_hook.py
@@ -25,7 +25,7 @@
 import os
 import random
 
-from airflow.models import Connection
+from airflow.models.connection import Connection
 from airflow.exceptions import AirflowException
 

[jira] [Commented] (AIRFLOW-3547) Jinja templating is not enabled for some SparkSubmitOperator parameters.

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725598#comment-16725598
 ] 

ASF GitHub Bot commented on AIRFLOW-3547:
-

thesuperzapper opened a new pull request #4347: [AIRFLOW-3547] Fixed Jinja 
templating in SparkSubmitOperator
URL: https://github.com/apache/incubator-airflow/pull/4347
 
 
   This is a minor change to allow Jinja templating in parameters where it 
makes sense for SparkSubmitOperator.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Jinja templating is not enabled for some SparkSubmitOperator parameters.
> 
>
> Key: AIRFLOW-3547
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3547
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.1
>Reporter: Mathew
>Assignee: Mathew
>Priority: Minor
>
> SparkSubmitOperator currently only supports Jinja templating in its 'name',
> 'application_args' and 'packages' parameters. This is problematic, as a user
> might want to do something like:
> {code:python}
> application="{{ dag.folder }}/spark_code.py"
> {code}
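
With 'application' added to the templated fields, a sketch of the intended usage would look like this (connection id and file name are illustrative):

{code}
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

dag = DAG('spark_submit_example', start_date=datetime(2018, 12, 1),
          schedule_interval='@daily')

submit = SparkSubmitOperator(
    task_id='run_spark_job',
    # Rendered by Jinja at runtime once 'application' is a templated field.
    application='{{ dag.folder }}/spark_code.py',
    conn_id='spark_default',
    dag=dag)
{code}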



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3546) Typo in jobs.py logs

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725383#comment-16725383
 ] 

ASF GitHub Bot commented on AIRFLOW-3546:
-

feng-tao closed pull request #4346: [AIRFLOW-3546] Fix typos in jobs.py logs
URL: https://github.com/apache/incubator-airflow/pull/4346
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/jobs.py b/airflow/jobs.py
index e60f135972..8472ecd383 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1213,7 +1213,7 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
 task_instance_str = "\n\t".join(
 ["{}".format(x) for x in executable_tis])
 self.log.info(
-"Setting the follow tasks to queued state:\n\t%s", 
task_instance_str)
+"Setting the following tasks to queued state:\n\t%s", 
task_instance_str)
 # so these dont expire on commit
 for ti in executable_tis:
 copy_dag_id = ti.dag_id
@@ -1408,7 +1408,7 @@ def _change_state_for_tasks_failed_to_execute(self, session):
 ["{}".format(x) for x in tis_to_set_to_scheduled])
 
 session.commit()
-self.log.info("Set the follow tasks to scheduled state:\n\t{}"
+self.log.info("Set the following tasks to scheduled state:\n\t{}"
   .format(task_instance_str))
 
 def _process_dags(self, dagbag, dags, tis_out):


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Typo in jobs.py logs
> 
>
> Key: AIRFLOW-3546
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3546
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Stan Kudrow
>Assignee: Stan Kudrow
>Priority: Trivial
>
> PR: https://github.com/apache/incubator-airflow/pull/4346



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3398) Google Cloud Spanner instance database query operator

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725360#comment-16725360
 ] 

ASF GitHub Bot commented on AIRFLOW-3398:
-

kaxil closed pull request #4314: [AIRFLOW-3398] Google Cloud Spanner instance 
database query operator
URL: https://github.com/apache/incubator-airflow/pull/4314
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/example_dags/example_gcp_spanner.py b/airflow/contrib/example_dags/example_gcp_spanner.py
index dd8b8c52b9..cec3dcb855 100644
--- a/airflow/contrib/example_dags/example_gcp_spanner.py
+++ b/airflow/contrib/example_dags/example_gcp_spanner.py
@@ -18,18 +18,18 @@
 # under the License.
 
 """
-Example Airflow DAG that creates, updates and deletes a Cloud Spanner instance.
+Example Airflow DAG that creates, updates, queries and deletes a Cloud Spanner instance.
 
 This DAG relies on the following environment variables
-* PROJECT_ID - Google Cloud Platform project for the Cloud Spanner instance.
-* INSTANCE_ID - Cloud Spanner instance ID.
-* CONFIG_NAME - The name of the instance's configuration. Values are of the form
+* SPANNER_PROJECT_ID - Google Cloud Platform project for the Cloud Spanner instance.
+* SPANNER_INSTANCE_ID - Cloud Spanner instance ID.
+* SPANNER_CONFIG_NAME - The name of the instance's configuration. Values are of the form
 projects//instanceConfigs/.
 See also:
 
https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instanceConfigs#InstanceConfig
 
https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instanceConfigs/list#google.spanner.admin.instance.v1.InstanceAdmin.ListInstanceConfigs
-* NODE_COUNT - Number of nodes allocated to the instance.
-* DISPLAY_NAME - The descriptive name for this instance as it appears in UIs.
+* SPANNER_NODE_COUNT - Number of nodes allocated to the instance.
+* SPANNER_DISPLAY_NAME - The descriptive name for this instance as it appears in UIs.
 Must be unique per project and between 4 and 30 characters in length.
 """
 
@@ -38,15 +38,17 @@
 import airflow
 from airflow import models
 from airflow.contrib.operators.gcp_spanner_operator import \
-CloudSpannerInstanceDeployOperator, CloudSpannerInstanceDeleteOperator
+CloudSpannerInstanceDeployOperator, CloudSpannerInstanceDatabaseQueryOperator, \
+CloudSpannerInstanceDeleteOperator
 
 # [START howto_operator_spanner_arguments]
-PROJECT_ID = os.environ.get('PROJECT_ID', 'example-project')
-INSTANCE_ID = os.environ.get('INSTANCE_ID', 'testinstance')
-CONFIG_NAME = os.environ.get('CONFIG_NAME',
+PROJECT_ID = os.environ.get('SPANNER_PROJECT_ID', 'example-project')
+INSTANCE_ID = os.environ.get('SPANNER_INSTANCE_ID', 'testinstance')
+DB_ID = os.environ.get('SPANNER_DB_ID', 'db1')
+CONFIG_NAME = os.environ.get('SPANNER_CONFIG_NAME',
  'projects/example-project/instanceConfigs/eur3')
-NODE_COUNT = os.environ.get('NODE_COUNT', '1')
-DISPLAY_NAME = os.environ.get('DISPLAY_NAME', 'Test Instance')
+NODE_COUNT = os.environ.get('SPANNER_NODE_COUNT', '1')
+DISPLAY_NAME = os.environ.get('SPANNER_DISPLAY_NAME', 'Test Instance')
 # [END howto_operator_spanner_arguments]
 
 default_args = {
@@ -80,6 +82,24 @@
 task_id='spanner_instance_update_task'
 )
 
+# [START howto_operator_spanner_query]
+spanner_instance_query = CloudSpannerInstanceDatabaseQueryOperator(
+project_id=PROJECT_ID,
+instance_id=INSTANCE_ID,
+database_id='db1',
+query="DELETE FROM my_table2 WHERE true",
+task_id='spanner_instance_query'
+)
+# [END howto_operator_spanner_query]
+
+spanner_instance_query2 = CloudSpannerInstanceDatabaseQueryOperator(
+project_id=PROJECT_ID,
+instance_id=INSTANCE_ID,
+database_id='db1',
+query="example_gcp_spanner.sql",
+task_id='spanner_instance_query2'
+)
+
 # [START howto_operator_spanner_delete]
 spanner_instance_delete_task = CloudSpannerInstanceDeleteOperator(
 project_id=PROJECT_ID,
@@ -89,4 +109,5 @@
 # [END howto_operator_spanner_delete]
 
 spanner_instance_create_task >> spanner_instance_update_task \
+>> spanner_instance_query >> spanner_instance_query2 \
 >> spanner_instance_delete_task
diff --git a/airflow/contrib/example_dags/example_gcp_spanner.sql b/airflow/contrib/example_dags/example_gcp_spanner.sql
new file mode 100644
index 00..5d5f238022
--- /dev/null
+++ b/airflow/contrib/example_dags/example_gcp_spanner.sql
@@ -0,0 +1,3 @@
+INSERT my_table2 (id, name) VALUES (7, 'Seven');
+INSERT my_table2 (id, name)
+VALUES (8, 'Eight');
diff --git 

[jira] [Commented] (AIRFLOW-3546) Typo in jobs.py logs

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725352#comment-16725352
 ] 

ASF GitHub Bot commented on AIRFLOW-3546:
-

stankud opened a new pull request #4346: [AIRFLOW-3546] Fix typos in jobs.py 
logs
URL: https://github.com/apache/incubator-airflow/pull/4346
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3546) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3546
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x ] Here are some details about my PR, including screenshots of any UI 
changes:
   Scheduler logs the following:
   ```
   ...INFO - Setting the follow tasks to queued state:
   ```
   this PR changes `follow` to `following`
   
   ### Tests
   
   - [ x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   This is a trivial change which doesn't need tests
   
   ### Commits
   
   - [x ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Typo in jobs.py logs
> 
>
> Key: AIRFLOW-3546
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3546
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Stan Kudrow
>Assignee: Stan Kudrow
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3249) unify do_xcom_push for all operators

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725331#comment-16725331
 ] 

ASF GitHub Bot commented on AIRFLOW-3249:
-

marengaz opened a new pull request #4345: [AIRFLOW-3249] unify (and fix) 
pushing result to xcom
URL: https://github.com/apache/incubator-airflow/pull/4345
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://jira.apache.org/jira/browse/AIRFLOW-3249) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://jira.apache.org/jira/browse/AIRFLOW-3249
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   see https://jira.apache.org/jira/browse/AIRFLOW-3249
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> unify do_xcom_push for all operators
> 
>
> Key: AIRFLOW-3249
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3249
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Ben Marengo
>Assignee: Ben Marengo
>Priority: Major
>
> Following the implementation of AIRFLOW-3207 (global option to stop a task 
> pushing its result to XCom), I did a quick search to find out which 
> operators have a custom implementation of this {{do_xcom_push}} flag:
> ||operator||instance var||__init__ arg||will the change be backward compatible?||
> |DatabricksRunNowOperator|do_xcom_push|do_xcom_push|(/)|
> |DatabricksSubmitRunOperator|do_xcom_push|do_xcom_push|(/)|
> |DatastoreExportOperator|xcom_push|xcom_push|(x)|
> |DatastoreImportOperator|xcom_push|xcom_push|(x)|
> |KubernetesPodOperator|xcom_push|xcom_push|(x)|
> |SSHOperator|xcom_push|xcom_push|(x)|
> |WinRMOperator|xcom_push|xcom_push|(x)|
> |BashOperator|xcom_push_flag|xcom_push|(x)|
> |DockerOperator|xcom_push_flag|xcom_push|(x)|
> |SimpleHttpOperator|xcom_push_flag|xcom_push|(x)|
> This custom implementation should be removed.
> I presume also that the operators with instance var = xcom_push conflict with 
> the BaseOperator.xcom_push() method and probably aren't working properly anyway.
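
A minimal sketch of the unified flag the ticket asks for, assuming a single `do_xcom_push` attribute on the base operator; the helper and backend names below are illustrative, not the actual Airflow implementation:

{code:python}
# Sketch only: one canonical do_xcom_push flag, checked in one place, so the
# per-operator variants (xcom_push, xcom_push_flag) become unnecessary.


class BaseOperatorSketch(object):
    def __init__(self, do_xcom_push=True):
        self.do_xcom_push = do_xcom_push

    def execute(self, context):
        raise NotImplementedError


def run_and_maybe_push(task, context, xcom_backend):
    """Hypothetical helper showing where the single flag would be consulted."""
    result = task.execute(context)
    if task.do_xcom_push and result is not None:
        # Push the operator's return value under the conventional key.
        xcom_backend.set(key="return_value", value=result)
    return result
{code}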



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2554) Inlets and outlets should be availabe in templates by their fully_qualified name or name

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725287#comment-16725287
 ] 

ASF GitHub Bot commented on AIRFLOW-2554:
-

stale[bot] closed pull request #3453: [AIRFLOW-2554] Enable convenience access 
to in/outlets in templates
URL: https://github.com/apache/incubator-airflow/pull/3453
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/models.py b/airflow/models.py
index eda480832b..0f9e1eb131 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -1808,6 +1808,10 @@ def get_template_context(self, session=None):
 session.expunge_all()
 session.commit()
 
+# create convenience names for inlets and outlets
+inlets = dict((i.qualified_name, i) for i in task.inlets)
+outlets = dict((o.qualified_name, o) for o in task.outlets)
+
 if task.params:
 params.update(task.params)
 
@@ -1844,7 +1848,9 @@ def __getattr__(self, item):
 def __repr__(self):
 return str(self.var)
 
-return {
+# make sure dag level overwrite inlets/outlets
+context = dict(inlets, **outlets)
+context.update({
 'dag': task.dag,
 'ds': ds,
 'next_ds': next_ds,
@@ -1877,9 +1883,11 @@ def __repr__(self):
 'value': VariableAccessor(),
 'json': VariableJsonAccessor()
 },
-'inlets': task.inlets,
-'outlets': task.outlets,
-}
+'inlets': inlets,
+'outlets': outlets,
+})
+
+return context
 
 def overwrite_params_with_dag_run_conf(self, params, dag_run):
 if dag_run and dag_run.conf:
diff --git a/docs/lineage.rst b/docs/lineage.rst
index 719ef0115e..3a9c87e4ff 100644
--- a/docs/lineage.rst
+++ b/docs/lineage.rst
@@ -16,30 +16,30 @@ works.
 from airflow.lineage.datasets import File
 from airflow.models import DAG
 from datetime import timedelta
-
+
 FILE_CATEGORIES = ["CAT1", "CAT2", "CAT3"]
-
+
 args = {
 'owner': 'airflow',
 'start_date': airflow.utils.dates.days_ago(2)
 }
-
+
 dag = DAG(
 dag_id='example_lineage', default_args=args,
 schedule_interval='0 0 * * *',
 dagrun_timeout=timedelta(minutes=60))
-
+
 f_final = File("/tmp/final")
-run_this_last = DummyOperator(task_id='run_this_last', dag=dag, 
+run_this_last = DummyOperator(task_id='run_this_last', dag=dag,
 inlets={"auto": True},
 outlets={"datasets": [f_final,]})
-
+
 f_in = File("/tmp/whole_directory/")
 outlets = []
 for file in FILE_CATEGORIES:
 f_out = File("/tmp/{}/ execution_date ".format(file))
 outlets.append(f_out)
-run_this = BashOperator(
+run_this = BashOperator(
 task_id='run_me_first', bash_command='echo 1', dag=dag,
 inlets={"datasets": [f_in,]},
 outlets={"datasets": outlets}
@@ -49,25 +49,39 @@ works.
 
 Tasks take the parameters `inlets` and `outlets`. Inlets can be manually 
defined by a list of dataset `{"datasets":
 [dataset1, dataset2]}` or can be configured to look for outlets from upstream 
tasks `{"task_ids": ["task_id1", "task_id2"]}`
-or can be configured to pick up outlets from direct upstream tasks `{"auto": 
True}` or a combination of them. Outlets 
-are defined as list of dataset `{"datasets": [dataset1, dataset2]}`. Any 
fields for the dataset are templated with 
-the context when the task is being executed. 
+or can be configured to pick up outlets from direct upstream tasks `{"auto": 
True}` or a combination of them. Outlets
+are defined as list of dataset `{"datasets": [dataset1, dataset2]}`. Any 
fields for the dataset are templated with
+the context when the task is being executed.
 
 .. note:: Operators can add inlets and outlets automatically if the operator 
supports it.
 
-In the example DAG task `run_me_first` is a BashOperator that takes 3 inlets: 
`CAT1`, `CAT2`, `CAT3`, that are 
+In the example DAG task `run_me_first` is a BashOperator that takes 3 inlets: 
`CAT1`, `CAT2`, `CAT3`, that are
 generated from a list. Note that `execution_date` is a templated field and 
will be rendered when the task is running.
 
 .. note:: Behind the scenes Airflow prepares the lineage metadata as part of 
the `pre_execute` method of a task. When the task
-  has finished execution `post_execute` is called and lineage metadata 
is pushed into XCOM. Thus if you are creating 
+  has finished execution `post_execute` is called and lineage metadata 
is pushed into XCOM. Thus if you are creating
   your own 
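
A small standalone sketch of the convenience mapping the diff above builds, using a simplified stand-in dataset class (names and qualified-name format are illustrative):

{code:python}
# Illustrates the dict construction from the patch: datasets keyed by
# qualified_name are merged into the template context, so a Jinja template
# can refer to {{ inlets['file:///tmp/CAT1'] }} instead of indexing a list.


class DatasetStub(object):
    def __init__(self, qualified_name):
        self.qualified_name = qualified_name


task_inlets = [DatasetStub("file:///tmp/CAT1"), DatasetStub("file:///tmp/CAT2")]
task_outlets = [DatasetStub("file:///tmp/final")]

inlets = dict((i.qualified_name, i) for i in task_inlets)
outlets = dict((o.qualified_name, o) for o in task_outlets)

# Dataset-level names first, then the explicit 'inlets'/'outlets' entries on
# top, mirroring the ordering used in the diff.
context = dict(inlets, **outlets)
context.update({"inlets": inlets, "outlets": outlets})

print(sorted(context))
{code}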

[jira] [Commented] (AIRFLOW-2179) Make parametrable the IP on which the worker log server binds to

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725286#comment-16725286
 ] 

ASF GitHub Bot commented on AIRFLOW-2179:
-

stale[bot] closed pull request #3101: AIRFLOW-2179: Make parametrable the IP on 
which the worker log server binds to
URL: https://github.com/apache/incubator-airflow/pull/3101
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 98b4321706..e3112ae1bc 100755
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -859,10 +859,12 @@ def serve_logs(filename):  # noqa
 mimetype="application/json",
 as_attachment=False)
 
+WORKER_LOG_SERVER_BIND_IP = \
+conf.get('celery', 'WORKER_LOG_SERVER_BIND_IP')
 WORKER_LOG_SERVER_PORT = \
 int(conf.get('celery', 'WORKER_LOG_SERVER_PORT'))
 flask_app.run(
-host='0.0.0.0', port=WORKER_LOG_SERVER_PORT)
+host=WORKER_LOG_SERVER_BIND_IP, port=WORKER_LOG_SERVER_PORT)
 
 
 def worker(args):
diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index 5356af79b6..04da5b 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -298,8 +298,9 @@ worker_concurrency = 16
 # When you start an airflow worker, airflow starts a tiny web server
 # subprocess to serve the workers local log files to the airflow main
 # web server, who then builds pages and sends them to users. This defines
-# the port on which the logs are served. It needs to be unused, and open
+# the ip and the port on which the logs are served. It needs to be unused, and 
open
 # visible from the main web server to connect into the workers.
+worker_log_server_bind_ip = 0.0.0.0
 worker_log_server_port = 8793
 
 # The Celery broker URL. Celery supports RabbitMQ, Redis and experimentally
diff --git a/airflow/config_templates/default_test.cfg 
b/airflow/config_templates/default_test.cfg
index eaf3d03694..897a1fbdea 100644
--- a/airflow/config_templates/default_test.cfg
+++ b/airflow/config_templates/default_test.cfg
@@ -77,6 +77,7 @@ smtp_mail_from = airf...@example.com
 [celery]
 celery_app_name = airflow.executors.celery_executor
 worker_concurrency = 16
+worker_log_server_bind_ip = 0.0.0.0
 worker_log_server_port = 8793
 broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
 result_backend = db+mysql://airflow:airflow@localhost:3306/airflow
diff --git a/scripts/ci/airflow_travis.cfg b/scripts/ci/airflow_travis.cfg
index 03d7e594a1..c5c4da1535 100644
--- a/scripts/ci/airflow_travis.cfg
+++ b/scripts/ci/airflow_travis.cfg
@@ -47,6 +47,7 @@ smtp_mail_from = airf...@example.com
 [celery]
 celery_app_name = airflow.executors.celery_executor
 worker_concurrency = 16
+worker_log_server_bind_ip = 0.0.0.0
 worker_log_server_port = 8793
 broker_url = amqp://guest:guest@localhost:5672/
 result_backend = db+mysql://root@localhost/airflow


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make parametrable the IP on which the worker log server binds to
> 
>
> Key: AIRFLOW-2179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2179
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: celery, webserver
>Reporter: Albin Gilles
>Priority: Minor
>
> Hello,
> I'd be glad if the tiny web server subprocess that serves the workers' local log 
> files could be set to bind to localhost only, as can be done for Gunicorn or 
> Flower. See 
> [cli.py#L865|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L865]
> If you don't see any issue with that possibility, I'll be happy to propose a 
> PR on github, see 
> [3101|https://github.com/apache/incubator-airflow/pull/3101].
> Regards,
>  Albin.
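
A minimal sketch of the behaviour the ticket asks for, assuming the `[celery] worker_log_server_bind_ip` option from the (unmerged) diff above exists; outside that patch the setting is illustrative only:

{code:python}
# Sketch: bind the worker's log-serving Flask app to a configurable address
# (e.g. 127.0.0.1) instead of the hardcoded 0.0.0.0.
from flask import Flask

from airflow import configuration as conf

flask_app = Flask(__name__)

# worker_log_server_bind_ip is the option proposed in the PR; it does not
# exist in a stock airflow.cfg.
bind_ip = conf.get('celery', 'WORKER_LOG_SERVER_BIND_IP')
port = int(conf.get('celery', 'WORKER_LOG_SERVER_PORT'))

if __name__ == '__main__':
    # Binding to localhost keeps worker logs off public interfaces; the main
    # webserver then needs another route (e.g. a tunnel) to fetch them.
    flask_app.run(host=bind_ip, port=port)
{code}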



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3540) Presence of ~/airflow/airflow.cfg shouldn't override config file settings

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725248#comment-16725248
 ] 

ASF GitHub Bot commented on AIRFLOW-3540:
-

feng-tao closed pull request #4340: [AIRFLOW-3540] Respect environment config 
when looking up config file.
URL: https://github.com/apache/incubator-airflow/pull/4340
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/UPDATING.md b/UPDATING.md
index 986d3a23c1..575fc0a3c5 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -24,6 +24,14 @@ assists users migrating to a new version.
 
 ## Airflow Master
 
+### Modification to config file discovery
+
+If the `AIRFLOW_CONFIG` environment variable was not set and the
+`~/airflow/airflow.cfg` file existed, airflow previously used
+`~/airflow/airflow.cfg` instead of `$AIRFLOW_HOME/airflow.cfg`. Now airflow
+will discover its config file using the `$AIRFLOW_CONFIG` and `$AIRFLOW_HOME`
+environment variables rather than checking for the presence of a file.
+
 ### Modification to `ts_nodash` macro
 `ts_nodash` previously contained TimeZone information along with the execution 
date. For example: `20150101T00+`. This is not user-friendly for file 
or folder names which was a popular use case for `ts_nodash`. Hence this 
behavior has been changed and using `ts_nodash` will no longer contain TimeZone 
information, restoring the pre-1.10 behavior of this macro. And a new macro 
`ts_nodash_with_tz` has been added which can be used to get a string with 
execution date and timezone info without dashes. 
 
diff --git a/airflow/configuration.py b/airflow/configuration.py
index 3662df8d06..332c069a3a 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -441,23 +441,23 @@ def mkdir_p(path):
 'Error creating {}: {}'.format(path, exc.strerror))
 
 
-# Setting AIRFLOW_HOME and AIRFLOW_CONFIG from environment variables, using
-# "~/airflow" and "~/airflow/airflow.cfg" respectively as defaults.
+def get_airflow_home():
+return expand_env_var(os.environ.get('AIRFLOW_HOME', '~/airflow'))
+
+
+def get_airflow_config(airflow_home):
+if 'AIRFLOW_CONFIG' not in os.environ:
+return os.path.join(airflow_home, 'airflow.cfg')
+return expand_env_var(os.environ['AIRFLOW_CONFIG'])
 
-if 'AIRFLOW_HOME' not in os.environ:
-AIRFLOW_HOME = expand_env_var('~/airflow')
-else:
-AIRFLOW_HOME = expand_env_var(os.environ['AIRFLOW_HOME'])
 
+# Setting AIRFLOW_HOME and AIRFLOW_CONFIG from environment variables, using
+# "~/airflow" and "$AIRFLOW_HOME/airflow.cfg" respectively as defaults.
+
+AIRFLOW_HOME = get_airflow_home()
+AIRFLOW_CONFIG = get_airflow_config(AIRFLOW_HOME)
 mkdir_p(AIRFLOW_HOME)
 
-if 'AIRFLOW_CONFIG' not in os.environ:
-if os.path.isfile(expand_env_var('~/airflow.cfg')):
-AIRFLOW_CONFIG = expand_env_var('~/airflow.cfg')
-else:
-AIRFLOW_CONFIG = AIRFLOW_HOME + '/airflow.cfg'
-else:
-AIRFLOW_CONFIG = expand_env_var(os.environ['AIRFLOW_CONFIG'])
 
 # Set up dags folder for unit tests
 # this directory won't exist if users install via pip
diff --git a/tests/test_configuration.py b/tests/test_configuration.py
index acebd5732c..9f903f58b3 100644
--- a/tests/test_configuration.py
+++ b/tests/test_configuration.py
@@ -21,6 +21,7 @@
 from __future__ import unicode_literals
 
 import os
+import contextlib
 from collections import OrderedDict
 
 import six
@@ -35,6 +36,23 @@
 import unittest
 
 
+@contextlib.contextmanager
+def env_vars(**vars):
+original = {}
+for key, value in vars.items():
+original[key] = os.environ.get(key)
+if value is not None:
+os.environ[key] = value
+else:
+os.environ.pop(key, None)
+yield
+for key, value in original.items():
+if value is not None:
+os.environ[key] = value
+else:
+os.environ.pop(key, None)
+
+
 class ConfTest(unittest.TestCase):
 
 @classmethod
@@ -49,6 +67,30 @@ def tearDownClass(cls):
 del os.environ['AIRFLOW__TESTSECTION__TESTKEY']
 del os.environ['AIRFLOW__TESTSECTION__TESTPERCENT']
 
+def test_airflow_home_default(self):
+with env_vars(AIRFLOW_HOME=None):
+self.assertEqual(
+configuration.get_airflow_home(),
+configuration.expand_env_var('~/airflow'))
+
+def test_airflow_home_override(self):
+with env_vars(AIRFLOW_HOME='/path/to/airflow'):
+self.assertEqual(
+configuration.get_airflow_home(),
+'/path/to/airflow')
+
+def test_airflow_config_default(self):
+with env_vars(AIRFLOW_CONFIG=None):
+self.assertEqual(
+

[jira] [Commented] (AIRFLOW-3246) Make hmsclient import optional

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724841#comment-16724841
 ] 

ASF GitHub Bot commented on AIRFLOW-3246:
-

ashb closed pull request #4080: [AIRFLOW-3246] Make hmsclient import optional
URL: https://github.com/apache/incubator-airflow/pull/4080
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/hooks/hive_hooks.py b/airflow/hooks/hive_hooks.py
index 178606aa4e..42a42318c6 100644
--- a/airflow/hooks/hive_hooks.py
+++ b/airflow/hooks/hive_hooks.py
@@ -27,7 +27,6 @@
 from collections import OrderedDict
 from tempfile import NamedTemporaryFile
 
-import hmsclient
 import six
 import unicodecsv as csv
 from past.builtins import basestring
@@ -496,6 +495,7 @@ def get_metastore_client(self):
 """
 Returns a Hive thrift client.
 """
+import hmsclient
 from thrift.transport import TSocket, TTransport
 from thrift.protocol import TBinaryProtocol
 ms = self.metastore_conn


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make hmsclient import optional
> --
>
> Key: AIRFLOW-3246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3246
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hive_hooks
>Affects Versions: 1.10.0
>Reporter: Gavrilov Seva
>Priority: Minor
>
> Currently hmsclient is imported globally in hive_hooks.py, which is 
> inconsistent with the general style in this file: hive dependencies are 
> imported at runtime. For example, thrift components are imported 
> inside the {{get_metastore_client}} method, but hmsclient also imports thrift 
> components, so it forces you to have them installed.
> I moved the import in this PR: 
> https://github.com/apache/incubator-airflow/pull/4080
> To give you a bit more information on why I even bother to make such a change, 
> we are having problems with the new hive dependencies of airflow 1.10, 
> particularly new version of pyhive. I described the problem 
> [here|https://github.com/dropbox/PyHive/issues/240], seems like a combination 
> of docker environment with newest versions of these libraries. We opted to 
> rollback HiveServer2 hook to use the old dependencies, among them 
> {{thrift==0.9.3}}, and hmsclient requires newer version of thrift. If you by 
> chance have any clue on how we can diagnose our problem, please let me know.
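
The deferred-import pattern in question, sketched standalone (class name simplified; connection details illustrative):

{code:python}
# Deferred import: hmsclient/thrift are only imported inside the method that
# needs them, so importing the hooks module works without them installed.


class MetastoreHookSketch(object):
    def __init__(self, host='localhost', port=9083):
        self.host = host
        self.port = port

    def get_metastore_client(self):
        # Imported here, not at module level, mirroring the PR's change.
        import hmsclient
        from thrift.protocol import TBinaryProtocol
        from thrift.transport import TSocket, TTransport

        socket = TSocket.TSocket(self.host, self.port)
        transport = TTransport.TBufferedTransport(socket)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        return hmsclient.HMSClient(iprot=protocol)
{code}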



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3506) Fail to query log from elasticsearch

2018-12-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724537#comment-16724537
 ] 

ASF GitHub Bot commented on AIRFLOW-3506:
-

feng-tao closed pull request #4342: [AIRFLOW-3506] use match_phrase to query 
log_id in elasticsearch
URL: https://github.com/apache/incubator-airflow/pull/4342
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/utils/log/es_task_handler.py 
b/airflow/utils/log/es_task_handler.py
index 16372c0600..2dbee94171 100644
--- a/airflow/utils/log/es_task_handler.py
+++ b/airflow/utils/log/es_task_handler.py
@@ -135,7 +135,7 @@ def es_read(self, log_id, offset):
 
 # Offset is the unique key for sorting logs given log_id.
 s = Search(using=self.client) \
-.query('match', log_id=log_id) \
+.query('match_phrase', log_id=log_id) \
 .sort('offset')
 
 s = s.filter('range', offset={'gt': offset})
diff --git a/tests/utils/log/elasticmock/fake_elasticsearch.py 
b/tests/utils/log/elasticmock/fake_elasticsearch.py
index 0e29e91bb7..f068ede0e5 100644
--- a/tests/utils/log/elasticmock/fake_elasticsearch.py
+++ b/tests/utils/log/elasticmock/fake_elasticsearch.py
@@ -172,15 +172,8 @@ def count(self, index=None, doc_type=None, body=None, 
params=None):
   'track_scores', 'version')
 def search(self, index=None, doc_type=None, body=None, params=None):
 searchable_indexes = self._normalize_index_to_list(index)
-searchable_doc_types = self._normalize_doc_type_to_list(doc_type)
 
-matches = []
-for searchable_index in searchable_indexes:
-for document in self.__documents_dict[searchable_index]:
-if searchable_doc_types\
-   and document.get('_type') not in searchable_doc_types:
-continue
-matches.append(document)
+matches = self._find_match(index, doc_type, body, params)
 
 result = {
 'hits': {
@@ -258,6 +251,31 @@ def suggest(self, body, index=None, params=None):
 ]
 return result_dict
 
+def _find_match(self, index, doc_type, body, params=None):
+searchable_indexes = self._normalize_index_to_list(index)
+searchable_doc_types = self._normalize_doc_type_to_list(doc_type)
+
+must = body['query']['bool']['must'][0]  # only support one must
+
+matches = []
+for searchable_index in searchable_indexes:
+for document in self.__documents_dict[searchable_index]:
+if searchable_doc_types\
+   and document.get('_type') not in searchable_doc_types:
+continue
+
+if 'match_phrase' in must:
+for query_id in must['match_phrase']:
+query_val = must['match_phrase'][query_id]
+if query_id in document['_source']:
+if query_val in document['_source'][query_id]:
+# use in as a proxy for match_phrase
+matches.append(document)
+else:
+matches.append(document)
+
+return matches
+
 def _normalize_index_to_list(self, index):
 # Ensure to have a list of index
 if index is None:
diff --git a/tests/utils/log/test_es_task_handler.py 
b/tests/utils/log/test_es_task_handler.py
index 94184fc826..c5164b1e19 100644
--- a/tests/utils/log/test_es_task_handler.py
+++ b/tests/utils/log/test_es_task_handler.py
@@ -39,8 +39,7 @@ class TestElasticsearchTaskHandler(unittest.TestCase):
 DAG_ID = 'dag_for_testing_file_task_handler'
 TASK_ID = 'task_for_testing_file_log_handler'
 EXECUTION_DATE = datetime(2016, 1, 1)
-LOG_ID = 'dag_for_testing_file_task_handler-task_for_testing' \
- '_file_log_handler-2016-01-01T00:00:00+00:00-1'
+LOG_ID = 
'{dag_id}-{task_id}-2016-01-01T00:00:00+00:00-1'.format(dag_id=DAG_ID, 
task_id=TASK_ID)
 
 @elasticmock
 def setUp(self):
@@ -94,6 +93,31 @@ def test_read(self):
 self.assertEqual(1, metadatas[0]['offset'])
 self.assertTrue(timezone.parse(metadatas[0]['last_log_timestamp']) > 
ts)
 
+def test_read_with_match_phrase_query(self):
+similar_log_id = 
'{task_id}-{dag_id}-2016-01-01T00:00:00+00:00-1'.format(
+dag_id=TestElasticsearchTaskHandler.DAG_ID,
+task_id=TestElasticsearchTaskHandler.TASK_ID)
+another_test_message = 'another message'
+
+another_body = {'message': another_test_message, 'log_id': 
similar_log_id, 'offset': 1}
+self.es.index(index=self.index_name, 
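
For context, a short sketch of the query difference using elasticsearch-dsl as in the handler above (host and log_id values are illustrative):

{code:python}
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch(hosts=['localhost:9200'])
log_id = 'my_dag-my_task-2016-01-01T00:00:00+00:00-1'

# 'match' analyses the query into terms, so a log_id that merely shares
# tokens (for example with dag and task ids swapped) can also be returned.
loose = Search(using=client).query('match', log_id=log_id).sort('offset')

# 'match_phrase' requires the terms in order, keeping look-alike task
# instances from bleeding into each other's logs.
exact = Search(using=client).query('match_phrase', log_id=log_id).sort('offset')
{code}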

[jira] [Commented] (AIRFLOW-2082) Password Web Authentication is not working

2018-12-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724451#comment-16724451
 ] 

ASF GitHub Bot commented on AIRFLOW-2082:
-

rgangopadhya opened a new pull request #4343: [AIRFLOW-2082] Resolve a bug in 
adding password_auth to API as auth method
URL: https://github.com/apache/incubator-airflow/pull/4343
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-2082) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
 - The current docs indicate that password authentication can be added to 
the experimental REST API by adding `auth_backend = 
airflow.contrib.auth.backends.password_auth` to the api config. This fails with 
`AttributeError: module 'airflow.contrib.auth.backends.password_auth' has no 
attribute 'client_auth'`. This PR follows the suggestion in the JIRA ticket to 
simply add `client_auth = None`
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   If there are existing tests around config, it would be nice to add this 
there. I took a quick look and didn't see any. 
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Password Web Authentication is not working
> --
>
> Key: AIRFLOW-2082
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2082
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.8.0
>Reporter: Andrey
>Assignee: Brian Charous
>Priority: Major
>
> I followed the instructions from 
> [https://github.com/apache/incubator-airflow/blob/master/docs/security.rst#password]
> and as a result the scheduler is unable to start anymore
> {code}
> scheduler_1 | [2018-02-08 17:46:49,546] \{configuration.py:206} WARNING - 
> section/key [celery/celery_ssl_active] not found in config
> scheduler_1 | [2018-02-08 17:46:49,550] \{default_celery.py:41} WARNING - 
> Celery Executor will run without SSL
> scheduler_1 | [2018-02-08 17:46:49,553] \{__init__.py:45} INFO - Using 
> executor CeleryExecutor
> scheduler_1 | 
> /usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: 
> The psycopg2 wheel package will be renamed from release 2.8; in order to keep 
> installing from binary please use "pip install psycopg2-binary" instead. For 
> details see: 
> .
> scheduler_1 | """)
> scheduler_1 | Traceback (most recent call last):
> scheduler_1 | File "/usr/local/bin/airflow", line 17, in 
> scheduler_1 | from airflow.bin.cli import CLIFactory
> scheduler_1 | File 
> "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 67, in 
> 
> scheduler_1 | auth=api.api_auth.client_auth)
> scheduler_1 | AttributeError: module 
> 'airflow.contrib.auth.backends.password_auth' has no attribute 'client_auth'
> {code}
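
A sketch of the fix the ticket suggests: the CLI builds its API client with `api.api_auth.client_auth`, so the password backend only needs to expose that attribute (module path shown for orientation):

{code:python}
# airflow/contrib/auth/backends/password_auth.py (sketch of the suggested fix)
#
# The experimental API client expects every auth backend module to provide a
# client_auth attribute; password auth is server-side only, so None suffices.
client_auth = None
{code}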



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3537) Allow AWS ECS Operator to use templates in task_definition parameter

2018-12-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724281#comment-16724281
 ] 

ASF GitHub Bot commented on AIRFLOW-3537:
-

tomoyat opened a new pull request #4341: [AIRFLOW-3537] Add task_definition as 
templated parameter in AWS ECS Operator
URL: https://github.com/apache/incubator-airflow/pull/4341
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3537
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   When using the AWS ECS Operator, I'd like to pass the task_definition as a 
template, because I decide the task_definition dynamically with another operator 
and get it from XCom.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Updated existing test case to check for new field.
   
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow AWS ECS Operator to use templates in task_definition parameter
> 
>
> Key: AIRFLOW-3537
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3537
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: aws
>Reporter: tomoya tabata
>Assignee: tomoya tabata
>Priority: Minor
>
> The AWS ECS operator does not currently apply templates to the 
> task_definition parameter.
> I'd like to allow the AWS ECS Operator to use templates in the task_definition 
> parameter.
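
A sketch of the kind of change requested, shown as a subclass for illustration (the actual PR edits ECSOperator's own template_fields):

{code:python}
from airflow.contrib.operators.ecs_operator import ECSOperator


class TemplatedECSOperator(ECSOperator):
    # Adding the field here lets values such as
    # "{{ ti.xcom_pull(task_ids='register_task_def') }}" be rendered at
    # run time instead of being passed through verbatim.
    template_fields = ECSOperator.template_fields + ('task_definition',)
{code}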



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3540) Presence of ~/airflow/airflow.cfg shouldn't override config file settings

2018-12-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724133#comment-16724133
 ] 

ASF GitHub Bot commented on AIRFLOW-3540:
-

jmcarp opened a new pull request #4340: [AIRFLOW-3540] Respect environment 
config when looking up config file.
URL: https://github.com/apache/incubator-airflow/pull/4340
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3540
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   If `~/airflow/airflow.cfg` exists, airflow uses that file instead of 
`$AIRFLOW_HOME/airflow.cfg`. This behavior is confusing--airflow shouldn't look 
for its config file in one location if it happens to exist there, else in 
another location. If `$AIRFLOW_HOME` is set and `$AIRFLOW_CONFIG` is not, we 
should only check `$AIRFLOW_HOME/airflow.cfg`.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Presence of ~/airflow/airflow.cfg shouldn't override config file settings
> -
>
> Key: AIRFLOW-3540
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3540
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Josh Carp
>Assignee: Josh Carp
>Priority: Trivial
>
> If `~/airflow/airflow.cfg` exists, airflow uses that file instead of 
> `$AIRFLOW_HOME/airflow.cfg`. This behavior is confusing. If `$AIRFLOW_HOME` 
> is set and `$AIRFLOW_CONFIG` is not, we should only check 
> `$AIRFLOW_HOME/airflow.cfg`.
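
A standalone sketch of the lookup order after the change (simplified: the patch's `expand_env_var` helper is replaced here by `os.path.expanduser`):

{code:python}
import os


def get_airflow_home():
    return os.path.expanduser(os.environ.get('AIRFLOW_HOME', '~/airflow'))


def get_airflow_config(airflow_home):
    # AIRFLOW_CONFIG wins if set; otherwise the path derives from
    # AIRFLOW_HOME, with no special case for a pre-existing ~/airflow.cfg.
    if 'AIRFLOW_CONFIG' in os.environ:
        return os.path.expanduser(os.environ['AIRFLOW_CONFIG'])
    return os.path.join(airflow_home, 'airflow.cfg')


if __name__ == '__main__':
    home = get_airflow_home()
    print('config file:', get_airflow_config(home))
{code}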



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3528) handle socket exception with SFTPOperator

2018-12-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724130#comment-16724130
 ] 

ASF GitHub Bot commented on AIRFLOW-3528:
-

eladkal closed pull request #4325: [AIRFLOW-3528] handle socket exception with 
SFTPOperator
URL: https://github.com/apache/incubator-airflow/pull/4325
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/sftp_operator.py 
b/airflow/contrib/operators/sftp_operator.py
index 117bc55a8c..d364772c83 100644
--- a/airflow/contrib/operators/sftp_operator.py
+++ b/airflow/contrib/operators/sftp_operator.py
@@ -122,7 +122,11 @@ def execute(self, context):
 self.ssh_hook.remote_host = self.remote_host
 
 with self.ssh_hook.get_conn() as ssh_client:
-sftp_client = ssh_client.open_sftp()
+try:
+sftp_client = ssh_client.open_sftp()
+except OSError as e:
+raise AirflowException("Error while connecting client, 
error: {0}"
+   .format(str(e)))
 if self.operation.lower() == SFTPOperation.GET:
 local_folder = os.path.dirname(self.local_filepath)
 if self.create_intermediate_dirs:


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> handle socket exception with SFTPOperator
> -
>
> Key: AIRFLOW-3528
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3528
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Elad
>Assignee: Elad
>Priority: Major
>
> Currently SFTPOperator executes:
>  
> {code:java}
> sftp_client = ssh_client.open_sftp()
> {code}
> without handling socket errors. 
> If an error occurs with the connection, the operator shows the following message:
> {code:java}
> Subtask: airflow.exceptions.AirflowException: Error while transferring None, 
> error: 'NoneType' object has no attribute 'open_sftp'{code}
> This error is misleading because it suggests a problem with the file transfer, 
> while the problem may in fact be with the connection.
>  
> The issue was reported by many users on Slack.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3303) Deprecate old UI in favor of new FAB RBAC

2018-12-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723893#comment-16723893
 ] 

ASF GitHub Bot commented on AIRFLOW-3303:
-

verdan opened a new pull request #4339: [AIRFLOW-3303] Deprecate old UI in 
favor of FAB
URL: https://github.com/apache/incubator-airflow/pull/4339
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3303) issues and references 
them in the PR title. 
   
   ### Description
   
   - [x] We are using two different versions of the UI in Apache Airflow. The idea 
is to deprecate and remove the older version of the UI and use the new Flask App 
Builder (RBAC) version as the default UI from now on (most probably in release 
2.0.x).
   This PR removes the old UI and renames the references of `www_rbac` to 
`www`. 
   
   ### Tests
   
   - [x] Skipped some of the test case classes, as these were purely using the 
older version of the application and its configuration. 
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
   **All help with manual testing would be highly appreciated.** 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Deprecate old UI in favor of new FAB RBAC
> -
>
> Key: AIRFLOW-3303
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3303
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Major
>
> It's hard to maintain multiple UIs in parallel. 
> The idea is to remove the old UI in favor of the new FAB RBAC version. 
> Make sure to verify all the REST APIs are in place and working. 
> All test cases should pass. Skip the tests related to the old UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2929) Add get and set for pool class in models.py

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723733#comment-16723733
 ] 

ASF GitHub Bot commented on AIRFLOW-2929:
-

stale[bot] closed pull request #3858: [AIRFLOW-2929] Add get and set for pool 
class in models
URL: https://github.com/apache/incubator-airflow/pull/3858
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/api/client/local_client.py 
b/airflow/api/client/local_client.py
index 4b46921e64..194d6d225e 100644
--- a/airflow/api/client/local_client.py
+++ b/airflow/api/client/local_client.py
@@ -18,7 +18,7 @@
 # under the License.
 
 from airflow.api.client import api_client
-from airflow.api.common.experimental import pool
+from airflow.models import Pool
 from airflow.api.common.experimental import trigger_dag
 from airflow.api.common.experimental import delete_dag
 
@@ -38,16 +38,16 @@ def delete_dag(self, dag_id):
 return "Removed {} record(s)".format(count)
 
 def get_pool(self, name):
-p = pool.get_pool(name=name)
+p = Pool.get_pool(name=name)
 return p.pool, p.slots, p.description
 
 def get_pools(self):
-return [(p.pool, p.slots, p.description) for p in pool.get_pools()]
+return [(p.pool, p.slots, p.description) for p in Pool.get_pools()]
 
 def create_pool(self, name, slots, description):
-p = pool.create_pool(name=name, slots=slots, description=description)
+p = Pool.create_pool(name=name, slots=slots, description=description)
 return p.pool, p.slots, p.description
 
 def delete_pool(self, name):
-p = pool.delete_pool(name=name)
+p = Pool.delete_pool(name=name)
 return p.pool, p.slots, p.description
diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 4ff1ae3679..c638c3c233 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -57,7 +57,7 @@
 from airflow.executors import GetDefaultExecutor
 from airflow.models import (DagModel, DagBag, TaskInstance,
 DagPickle, DagRun, Variable, DagStat,
-Connection, DAG)
+Connection, DAG, Pool)
 
 from airflow.ti_deps.dep_context import (DepContext, SCHEDULER_DEPS)
 from airflow.utils import cli as cli_utils
@@ -263,29 +263,32 @@ def pool(args):
 log = LoggingMixin().log
 
 def _tabulate(pools):
-return "\n%s" % tabulate(pools, ['Pool', 'Slots', 'Description'],
+return "\n%s" % tabulate([(pool.pool,
+   pool.slots,
+   pool.description) for pool in pools],
+ ['Pool', 'Slots', 'Description'],
  tablefmt="fancy_grid")
 
 try:
 imp = getattr(args, 'import')
 if args.get is not None:
-pools = [api_client.get_pool(name=args.get)]
+pools = [Pool.get_pool(name=args.get)]
 elif args.set:
-pools = [api_client.create_pool(name=args.set[0],
-slots=args.set[1],
-description=args.set[2])]
+pools = [Pool.create_pool(name=args.set[0],
+  slots=args.set[1],
+  description=args.set[2])]
 elif args.delete:
-pools = [api_client.delete_pool(name=args.delete)]
+pools = [Pool.delete_pool(name=args.delete)]
 elif imp:
 if os.path.exists(imp):
 pools = pool_import_helper(imp)
 else:
 print("Missing pools file.")
-pools = api_client.get_pools()
+pools = Pool.get_pools()
 elif args.export:
 pools = pool_export_helper(args.export)
 else:
-pools = api_client.get_pools()
+pools = Pool.get_pools()
 except (AirflowException, IOError) as err:
 log.error(err)
 else:
@@ -305,9 +308,9 @@ def pool_import_helper(filepath):
 n = 0
 for k, v in d.items():
 if isinstance(v, dict) and len(v) == 2:
-pools.append(api_client.create_pool(name=k,
-slots=v["slots"],
-
description=v["description"]))
+pools.append(Pool.create_pool(name=k,
+  slots=v["slots"],
+  
description=v["description"]))
 n += 1
 else:
 pass
@@ 

[jira] [Commented] (AIRFLOW-2428) Add AutoScalingRole key to emr_hook

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723724#comment-16723724
 ] 

ASF GitHub Bot commented on AIRFLOW-2428:
-

stale[bot] closed pull request #3713: [AIRFLOW-2428] Add AutoScalingRole key to 
emr_hook
URL: https://github.com/apache/incubator-airflow/pull/3713
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/emr_hook.py 
b/airflow/contrib/hooks/emr_hook.py
index 6cd92c6d85..fc77b4dfb5 100644
--- a/airflow/contrib/hooks/emr_hook.py
+++ b/airflow/contrib/hooks/emr_hook.py
@@ -63,6 +63,8 @@ def create_job_flow(self, job_flow_overrides):
 VisibleToAllUsers=config.get('VisibleToAllUsers'),
 JobFlowRole=config.get('JobFlowRole'),
 ServiceRole=config.get('ServiceRole'),
+AutoScalingRole=config.get('AutoScalingRole'),
+ScaleDownBehavior=config.get('ScaleDownBehavior'),
 Tags=config.get('Tags')
 )
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add AutoScalingRole key to emr_hook
> ---
>
> Key: AIRFLOW-2428
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2428
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Kyle Hamlin
>Priority: Minor
> Fix For: 1.10.0
>
>
> Need to be able to pass the `AutoScalingRole` param to the `run_job_flow` 
> method for EMR autoscaling to work.
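
An illustrative `job_flow_overrides` dict that would exercise the new keys (role names and values are examples, not requirements):

{code:python}
# Passed to EmrCreateJobFlowOperator / EmrHook.create_job_flow; with the
# change above, AutoScalingRole and ScaleDownBehavior reach boto3's
# run_job_flow call instead of being dropped.
JOB_FLOW_OVERRIDES = {
    'Name': 'autoscaling-cluster',
    'ReleaseLabel': 'emr-5.12.0',
    'JobFlowRole': 'EMR_EC2_DefaultRole',
    'ServiceRole': 'EMR_DefaultRole',
    'AutoScalingRole': 'EMR_AutoScaling_DefaultRole',
    'ScaleDownBehavior': 'TERMINATE_AT_TASK_COMPLETION',
    'Instances': {'KeepJobFlowAliveWhenNoSteps': False},
}
{code}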



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3029) New Operator - SqlOperator

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723737#comment-16723737
 ] 

ASF GitHub Bot commented on AIRFLOW-3029:
-

stale[bot] closed pull request #3891: [AIRFLOW-3029] New Operator - SqlOperator
URL: https://github.com/apache/incubator-airflow/pull/3891
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/sql_operator.py 
b/airflow/contrib/operators/sql_operator.py
new file mode 100644
index 00..19a0c857b7
--- /dev/null
+++ b/airflow/contrib/operators/sql_operator.py
@@ -0,0 +1,65 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.hooks.base_hook import BaseHook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class SqlOperator(BaseOperator):
+"""
+Executes sql code in a database.
+
+This abstract operator can be instantiated directly,
+and does not need to be derived into subclasses for each DbApiHook 
subclass.
+It will automatically use the correct DbApiHook subclass implementation,
+made possible by reflecting upon the Connection's assigned `conn_type`.
+
+:param conn_id: reference to a predefined sql database connection
+:type conn_id: str
+:param sql: the sql code to be executed. (templated)
+:type sql: Can receive a str representing a sql statement,
+a list of str (sql statements), or reference to a template file.
+Template reference are recognized by str ending in '.sql'
+"""
+
+template_fields = ('sql',)
+template_ext = ('.sql',)
+ui_color = '#ededed'
+
+@apply_defaults
+def __init__(
+self,
+sql,
+conn_id,
+autocommit=False,
+parameters=None,
+*args, **kwargs):
+super(SqlOperator, self).__init__(*args, **kwargs)
+self.parameters = parameters
+self.sql = sql
+self.conn_id = conn_id
+self.autocommit = autocommit
+
+def execute(self, context):
+self.log.info('Executing: %s', self.sql)
+hook = BaseHook.get_hook(conn_id=self.conn_id)
+hook.run(sql=self.sql,
+ autocommit=self.autocommit,
+ parameters=self.parameters)
diff --git a/docs/concepts.rst b/docs/concepts.rst
index 50c18c9b98..100b070096 100644
--- a/docs/concepts.rst
+++ b/docs/concepts.rst
@@ -116,7 +116,7 @@ Airflow provides operators for many common tasks, including:
 - ``PythonOperator`` - calls an arbitrary Python function
 - ``EmailOperator`` - sends an email
 - ``SimpleHttpOperator`` - sends an HTTP request
-- ``MySqlOperator``, ``SqliteOperator``, ``PostgresOperator``, 
``MsSqlOperator``, ``OracleOperator``, ``JdbcOperator``, etc. - executes a SQL 
command
+- ``SqlOperator``, ``MySqlOperator``, ``SqliteOperator``, 
``PostgresOperator``, ``MsSqlOperator``, ``OracleOperator``, ``JdbcOperator``, 
etc. - executes a SQL command
 - ``Sensor`` - waits for a certain time, file, database row, S3 key, etc...
 
 In addition to these basic building blocks, there are many more specific
diff --git a/tests/contrib/operators/test_sql_operator.py 
b/tests/contrib/operators/test_sql_operator.py
new file mode 100644
index 00..dc849270e1
--- /dev/null
+++ b/tests/contrib/operators/test_sql_operator.py
@@ -0,0 +1,105 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,

[jira] [Commented] (AIRFLOW-2920) Kubernetes pod operator: namespace is a hard requirement

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723726#comment-16723726
 ] 

ASF GitHub Bot commented on AIRFLOW-2920:
-

stale[bot] closed pull request #3774: [AIRFLOW-2920] Added downward API 
metadata to Kubernetes pods
URL: https://github.com/apache/incubator-airflow/pull/3774
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/airflow/contrib/kubernetes/kubernetes_request_factory/kubernetes_request_factory.py
 
b/airflow/contrib/kubernetes/kubernetes_request_factory/kubernetes_request_factory.py
index 27e0ebd29c..e17cb6a773 100644
--- 
a/airflow/contrib/kubernetes/kubernetes_request_factory/kubernetes_request_factory.py
+++ 
b/airflow/contrib/kubernetes/kubernetes_request_factory/kubernetes_request_factory.py
@@ -57,6 +57,21 @@ def add_secret_to_env(env, secret):
 }
 })
 
+@staticmethod
+def add_downward_api_metadata_to_env(env):
+env.append({
+'name': 'POD_NAME',
+'valueFrom': {
+'fieldRef': {'fieldPath': 'metadata.name'}
+}
+})
+env.append({
+'name': 'POD_NAMESPACE',
+'valueFrom': {
+'fieldRef': {'fieldPath': 'metadata.namespace'}
+}
+})
+
 @staticmethod
 def extract_labels(pod, req):
 req['metadata']['labels'] = req['metadata'].get('labels', {})
@@ -138,6 +153,7 @@ def extract_env_and_secrets(pod, req):
 env.append({'name': k, 'value': pod.envs[k]})
 for secret in env_secrets:
 KubernetesRequestFactory.add_secret_to_env(env, secret)
+KubernetesRequestFactory.add_downward_api_metadata_to_env(env)
 req['spec']['containers'][0]['env'] = env
 
 @staticmethod


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Kubernetes pod operator: namespace is a hard requirement
> 
>
> Key: AIRFLOW-2920
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2920
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jon Davies
>Priority: Major
>
> Hello,
> I'm using the Kubernetes pod operator for my DAGs. I install Airflow into its 
> own namespace within my Kubernetes cluster (for example: "testing-airflow") 
> and I would like pods spun up by that Airflow instance to live in that 
> namespace.
> However, I have to hardcode the namespace into my DAG definition code, and so 
> I have to rebuild the Docker image for Airflow to be able to spin up a 
> "production-airflow" namespace, as the namespace is a hard requirement in the 
> Python code. It'd be nice if the DAG could just default to its own namespace 
> if none is defined.
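
One possible way to avoid hardcoding the namespace in DAG code is to read the namespace the Airflow pod itself runs in from the standard service-account mount; this is a workaround sketch, not the operator's built-in behaviour, and the fallback env var name is illustrative:

{code:python}
import os

NAMESPACE_FILE = '/var/run/secrets/kubernetes.io/serviceaccount/namespace'


def current_namespace(default='default'):
    """Best-effort detection of the namespace Airflow itself runs in."""
    if os.path.exists(NAMESPACE_FILE):
        with open(NAMESPACE_FILE) as handle:
            return handle.read().strip()
    # Fallback for runs outside a cluster; env var name is hypothetical.
    return os.environ.get('AIRFLOW_POD_NAMESPACE', default)


# Usage in a DAG file, e.g.:
# KubernetesPodOperator(task_id='work', namespace=current_namespace(), ...)
{code}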



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3106) Validate Postgres connection when saving

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723736#comment-16723736
 ] 

ASF GitHub Bot commented on AIRFLOW-3106:
-

stale[bot] closed pull request #3941: [AIRFLOW-3106] Validate Postgres 
connection after saving it
URL: https://github.com/apache/incubator-airflow/pull/3941
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Validate Postgres connection when saving
> 
>
> Key: AIRFLOW-3106
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3106
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bas Harenslak
>Priority: Minor
>
> I've encountered failures in DAG runs on various occasions due to invalid 
> connection credentials or a domain that was unreachable from the Airflow instance.
> It'd be nice to validate a connection when saving it, to know directly whether a 
> given connection can be made or not.
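
A sketch of the kind of check that could run on save, using a throwaway psycopg2 connection; the helper name and where it would be wired into the connection form are assumptions:

{code:python}
import psycopg2


def validate_postgres_conn(host, login, password, schema, port=5432, timeout=5):
    """Try to open (and immediately close) a connection; report any error."""
    try:
        conn = psycopg2.connect(
            host=host, user=login, password=password,
            dbname=schema, port=port, connect_timeout=timeout,
        )
        conn.close()
        return True, None
    except psycopg2.OperationalError as exc:
        return False, str(exc)
{code}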



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3206) More neutral language regarding Copyleft in installation instructions

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723738#comment-16723738
 ] 

ASF GitHub Bot commented on AIRFLOW-3206:
-

stale[bot] closed pull request #4055: [AIRFLOW-3206] neutral and clear GPL 
dependency notice
URL: https://github.com/apache/incubator-airflow/pull/4055
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/setup.py b/setup.py
index b1376bb5bb..a85b3e9581 100644
--- a/setup.py
+++ b/setup.py
@@ -46,11 +46,13 @@ def verify_gpl_dependency():
 os.environ["SLUGIFY_USES_TEXT_UNIDECODE"] = "yes"
 
 if not os.getenv("AIRFLOW_GPL_UNIDECODE") and not 
os.getenv("SLUGIFY_USES_TEXT_UNIDECODE") == "yes":
-raise RuntimeError("By default one of Airflow's dependencies installs 
a GPL "
-   "dependency (unidecode). To avoid this dependency 
set "
-   "SLUGIFY_USES_TEXT_UNIDECODE=yes in your 
environment when you "
-   "install or upgrade Airflow. To force installing 
the GPL "
-   "version set AIRFLOW_GPL_UNIDECODE")
+raise RuntimeError(
+"By default, one of Airflow's dependencies (unidecode) is GPL 
licensed .\n"
+"In order to proceed with installation, "
+"you will need to set one of the following environment 
variables:\n"
+"To disallow the dependency, export 
SLUGIFY_USES_TEXT_UNIDECODE=yes.\n"
+"To allow the dependency, export AIRFLOW_GPL_UNIDECODE=yes.\n"
+"Once either environment variable is set, you may proceed with 
installation.")
 
 
 class Tox(TestCommand):


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> More neutral language regarding Copyleft in installation instructions
> -
>
> Key: AIRFLOW-3206
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3206
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Brylie Christopher Oxley
>Assignee: Brylie Christopher Oxley
>Priority: Trivial
>  Labels: newbie
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> When installing Airflow, the user must set an environment variable to 
> explicitly _allow_ or _disallow_ the installation of a GPL dependency. The 
> text of the error message is somewhat difficult to read, and seems biased 
> against the GPL dependency.
> h2. Task
>  * add proper line breaks to GPL dependency notice, for improved readability
>  * use neutral language _allow_ and _disallow_ (as opposed to 'force')
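
The environment variables named in the reworded message are the same ones the existing verify_gpl_dependency() check reads, so setting either of them satisfies the check. A minimal sketch of driving an install from Python with the non-GPL backend selected (the pip invocation itself is an assumption, not part of the PR):

    import os
    import subprocess
    import sys

    # Select the non-GPL slugify backend before installing Airflow.
    env = dict(os.environ, SLUGIFY_USES_TEXT_UNIDECODE="yes")
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "apache-airflow"],
        env=env,
    )

Exporting AIRFLOW_GPL_UNIDECODE=yes instead allows the GPL-licensed unidecode dependency, as the new message states.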



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3011) CLI command to output actual airflow.cfg

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723735#comment-16723735
 ] 

ASF GitHub Bot commented on AIRFLOW-3011:
-

stale[bot] closed pull request #3867: [AIRFLOW-3011][CLI] Add cmd function 
printing config
URL: https://github.com/apache/incubator-airflow/pull/3867
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 4ff1ae3679..69ebe27794 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -1443,6 +1443,19 @@ def sync_perm(args): # noqa
 Arg.__new__.__defaults__ = (None, None, None, None, None, None, None)
 
 
+@cli_utils.action_logging
+def config(args):
+    """
+    Prints config file contents to STDOUT.
+    :param args:
+    :return:
+    """
+    config_file_path = os.path.join(settings.AIRFLOW_HOME, "airflow.cfg")
+    log.info("Config file location: {}".format(config_file_path))
+    with open(config_file_path, 'r') as config_file:
+        print(config_file.read())
+
+
 class CLIFactory(object):
 args = {
 # Shared
@@ -2040,6 +2053,11 @@ class CLIFactory(object):
 'func': sync_perm,
 'help': "Update existing role's permissions.",
 'args': tuple(),
+},
+{
+'func': config,
+'help': "Displaying config file",
+'args': tuple(),
 }
 )
 subparsers_dict = {sp['func'].__name__: sp for sp in subparsers}


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CLI command to output actual airflow.cfg
> 
>
> Key: AIRFLOW-3011
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3011
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: cli
>Affects Versions: 1.10.0
>Reporter: Victor
>Assignee: Valerii Zhuk
>Priority: Minor
>
> The only way to see the actual Airflow configuration (including information 
> overridden by environment variables) is through the web UI.
> For security reasons, this is often disabled.
> A CLI command to do the same thing would:
>  * give admins a way to see it
>  * give operators a way to manipulate the configuration (storing it, etc.)
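
For reference, the closed PR's config() subcommand simply prints the airflow.cfg found under AIRFLOW_HOME. A standalone sketch of the same idea (the fallback path is an assumption):

    import os

    airflow_home = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))
    config_file_path = os.path.join(airflow_home, "airflow.cfg")
    with open(config_file_path) as config_file:
        print(config_file.read())

Note that this prints the file as written to disk; values overridden through environment variables are not reflected, which is part of what the issue asks for, so a complete fix would need to render the resolved configuration instead.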



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3077) Mongo Hook Raise Error and Stop Migration Due to Bad Encoding from PyMongo

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723734#comment-16723734
 ] 

ASF GitHub Bot commented on AIRFLOW-3077:
-

stale[bot] closed pull request #3912: [AIRFLOW-3077] Default Not to Raise Error 
When PyMongo Contruct JSON Data
URL: https://github.com/apache/incubator-airflow/pull/3912
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/mongo_hook.py 
b/airflow/contrib/hooks/mongo_hook.py
index 80ceddec14..bd495eb294 100644
--- a/airflow/contrib/hooks/mongo_hook.py
+++ b/airflow/contrib/hooks/mongo_hook.py
@@ -70,6 +70,9 @@ def get_conn(self):
 if options.get('ssl', False):
 options.update({'ssl_cert_reqs': CERT_NONE})
 
+if not options.get('unicode_decode_error_handler', False):
+options.update({'unicode_decode_error_handler': 'ignore'})
+
 self.client = MongoClient(uri, **options)
 
 return self.client


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Mongo Hook Raise Error and Stop Migration Due to Bad Encoding from PyMongo
> --
>
> Key: AIRFLOW-3077
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3077
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: database, hooks
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Bernie Chiu
>Assignee: Bernie Chiu
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> A single encoding problem should not stop the dataflow, so `ignore` is the 
> best default for this option: PyMongo will still do its best to reconstruct 
> the JSON data.
> [https://stackoverflow.com/questions/36314776/pymongo-error-bson-errors-invalidbson-utf8-codec-cant-decode-byte-0xa1-in-p]
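
For illustration, the same option can be passed straight to PyMongo; 'ignore' makes the driver drop undecodable bytes instead of raising InvalidBSON. The URI, database, and collection names below are placeholders:

    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://localhost:27017/",
        unicode_decode_error_handler="ignore",
    )
    doc = client.my_db.my_collection.find_one()

The hook change above only fills this in when the option is absent, so a caller can still override it explicitly.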



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3014) Password column on connection table should be longer

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723729#comment-16723729
 ] 

ASF GitHub Bot commented on AIRFLOW-3014:
-

stale[bot] closed pull request #3851: AIRFLOW-3014 Increase possible length of 
passwords in connection table
URL: https://github.com/apache/incubator-airflow/pull/3851
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/migrations/versions/5bb129a06791_allow_longer_passwords.py 
b/airflow/migrations/versions/5bb129a06791_allow_longer_passwords.py
new file mode 100644
index 00..0a9ad01aaa
--- /dev/null
+++ b/airflow/migrations/versions/5bb129a06791_allow_longer_passwords.py
@@ -0,0 +1,44 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Increase length of password column in connection table
+
+Revision ID: 5bb129a06791
+Revises: dd25f486b8ea
+Create Date: 2018-09-05 09:07:19.143887
+
+"""
+from alembic import op
+import sqlalchemy as sa
+
+# revision identifiers, used by Alembic.
+revision = '5bb129a06791'
+down_revision = 'dd25f486b8ea'
+branch_labels = None
+depends_on = None
+
+
+def upgrade():
+op.alter_column(table_name='connection',
+column_name='password',
+type_=sa.String(length=5000))
+
+
+def downgrade():
+# This migration cannot be undone
+pass


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Password column on connection table should be longer
> 
>
> Key: AIRFLOW-3014
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3014
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Brian Campbell
>Priority: Minor
>
> The password column in the connection table has a maximum length of 500 
> characters. In some cases this is insufficient. For example, AWS provides 
> passwords for ECR that, when encrypted, are longer than 3400 characters. In 
> order to use the docker operator with ECR, you will need a longer password 
> column on the connection table.
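
After running the migration from the closed PR, the widened column can be verified with a quick SQLAlchemy inspection (the connection string below is a placeholder for the Airflow metadata database):

    from sqlalchemy import create_engine, inspect

    engine = create_engine("postgresql://user:pass@localhost/airflow")
    for column in inspect(engine).get_columns("connection"):
        if column["name"] == "password":
            print(column["type"])  # expected to report VARCHAR(5000) after the upgrade

The migration deliberately has no downgrade path, presumably because shrinking the column again could truncate stored credentials.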



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2824) Disable loading of default connections via airflow config

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723728#comment-16723728
 ] 

ASF GitHub Bot commented on AIRFLOW-2824:
-

stale[bot] closed pull request #3796: [AIRFLOW-2824] - Add config to disable 
default conn creation
URL: https://github.com/apache/incubator-airflow/pull/3796
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index 18c486cb1e..bb1e1c8665 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -122,6 +122,10 @@ non_pooled_task_slot_count = 128
 # The maximum number of active DAG runs per DAG
 max_active_runs_per_dag = 16
 
+# Whether to load the default connections during the CLI command
+# `airflow initdb`.
+load_default_conns = True
+
 # Whether to load the examples that ship with Airflow. It's good to
 # get started, but you probably want to set this to False in a production
 # environment
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index b57b8cf92b..9c9fd6bcad 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -27,6 +27,7 @@
 import os
 import contextlib
 
+from airflow import configuration
 from airflow import settings
 from airflow.utils.log.logging_mixin import LoggingMixin
 
@@ -85,11 +86,8 @@ def merge_conn(conn, session=None):
 session.commit()
 
 
-def initdb(rbac=False):
-session = settings.Session()
-
+def load_default_conns():
 from airflow import models
-upgradedb()
 
 merge_conn(
 models.Connection(
@@ -286,6 +284,16 @@ def initdb(rbac=False):
 conn_id='cassandra_default', conn_type='cassandra',
 host='cassandra', port=9042))
 
+
+def initdb(rbac=False):
+session = settings.Session()
+
+from airflow import models
+upgradedb()
+
+if configuration.conf.getboolean('core', 'LOAD_DEFAULT_CONNS'):
+load_default_conns()
+
 # Known event types
 KET = models.KnownEventType
 if not session.query(KET).filter(KET.know_event_type == 'Holiday').first():


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Disable loading of default connections via airflow config
> -
>
> Key: AIRFLOW-2824
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2824
> Project: Apache Airflow
>  Issue Type: Wish
>Reporter: Felix Uellendall
>Assignee: Andy Cooper
>Priority: Major
>
> I would love to have a variable I can set in the airflow.cfg, like the DAG 
> examples have, to not load the default connections.
> This could either reuse {{load_examples}}, which is already 
> [there|https://github.com/apache/incubator-airflow/blob/dfa7b26ddaca80ee8fd9915ee9f6eac50fac77f6/airflow/config_templates/default_airflow.cfg#L128]
>  for loading the DAG examples, or add a new option like {{load_default_connections}} 
> that checks whether the user wants them.
> The implementation of the default connections starts 
> [here|https://github.com/apache/incubator-airflow/blob/9e1d8ee837ea2c23e828d070b6a72a6331d98602/airflow/utils/db.py#L94]
> Let me know what you guys think of it, pls. :)
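
A sketch of the gate this asks for, mirroring the approach in the closed PR (the option name comes from that PR; treating a missing option as True is an assumption made here to preserve the current behaviour):

    from airflow import configuration

    def should_load_default_connections():
        try:
            return configuration.conf.getboolean("core", "load_default_conns")
        except Exception:
            # Option not present in airflow.cfg: keep today's behaviour.
            return True

    if should_load_default_connections():
        print("initdb would create the default connections")
    else:
        print("initdb would skip them")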



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2062) Support fine-grained Connection encryption

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723730#comment-16723730
 ] 

ASF GitHub Bot commented on AIRFLOW-2062:
-

stale[bot] closed pull request #3805: [AIRFLOW-2062] Add per-connection KMS 
encryption.
URL: https://github.com/apache/incubator-airflow/pull/3805
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 1c5494ead1..15b061c94c 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -1133,7 +1133,8 @@ def version(args):  # noqa
 
 
 alternative_conn_specs = ['conn_type', 'conn_host',
-  'conn_login', 'conn_password', 'conn_schema', 'conn_port']
+  'conn_login', 'conn_password', 'conn_schema', 'conn_port',
+  'kms_conn_id', 'kms_extra']
 
 
 @cli_utils.action_logging
@@ -1235,7 +1236,10 @@ def connections(args):
 return
 
 if args.conn_uri:
-new_conn = Connection(conn_id=args.conn_id, uri=args.conn_uri)
+new_conn = Connection(conn_id=args.conn_id,
+  uri=args.conn_uri,
+  kms_conn_id=args.kms_conn_id,
+  kms_extra=args.kms_extra)
 else:
 new_conn = Connection(conn_id=args.conn_id,
   conn_type=args.conn_type,
@@ -1243,7 +1247,10 @@ def connections(args):
   login=args.conn_login,
   password=args.conn_password,
   schema=args.conn_schema,
-  port=args.conn_port)
+  port=args.conn_port,
+  kms_conn_id=args.kms_conn_id,
+  kms_extra=args.kms_extra
+  )
 if args.conn_extra is not None:
 new_conn.set_extra(args.conn_extra)
 
@@ -1883,6 +1890,15 @@ class CLIFactory(object):
 ('--conn_extra',),
 help='Connection `Extra` field, optional when adding a connection',
 type=str),
+'kms_conn_id': Arg(
+('--kms_conn_id',),
+help='An existing connection to use when encrypting this connection with a '
+'KMS, optional when adding a connection',
+type=str),
+'kms_extra': Arg(
+('--kms_extra',),
+help='Connection `KMS Extra` field, optional when adding a connection',
+type=str),
 # users
 'username': Arg(
 ('--username',),
diff --git a/airflow/contrib/hooks/gcp_kms_hook.py 
b/airflow/contrib/hooks/gcp_kms_hook.py
index 6f2b3aedff..63e35fbe89 100644
--- a/airflow/contrib/hooks/gcp_kms_hook.py
+++ b/airflow/contrib/hooks/gcp_kms_hook.py
@@ -20,6 +20,7 @@
 
 import base64
 
+from airflow.hooks.kmsapi_hook import KmsApiHook
 from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
 
 from apiclient.discovery import build
@@ -35,7 +36,7 @@ def _b64decode(s):
 return base64.b64decode(s.encode('utf-8'))
 
 
-class GoogleCloudKMSHook(GoogleCloudBaseHook):
+class GoogleCloudKMSHook(GoogleCloudBaseHook, KmsApiHook):
 """
 Interact with Google Cloud KMS. This hook uses the Google Cloud Platform
 connection.
@@ -106,3 +107,17 @@ def decrypt(self, key_name, ciphertext, 
authenticated_data=None):
 
 plaintext = _b64decode(response['plaintext'])
 return plaintext
+
+def encrypt_conn_key(self, connection):
+kms_extras = connection.kms_extra_dejson
+key_name = kms_extras['kms_extra__google_cloud_platform__key_name']
+conn_key = connection._plain_conn_key
+
+connection.conn_key = self.encrypt(key_name, conn_key)
+
+def decrypt_conn_key(self, connection):
+kms_extras = connection.kms_extra_dejson
+key_name = kms_extras['kms_extra__google_cloud_platform__key_name']
+conn_key = connection.conn_key
+
+connection._plain_conn_key = self.decrypt(key_name, conn_key)
diff --git a/airflow/hooks/base_hook.py b/airflow/hooks/base_hook.py
index 103fa6260b..fe663f61c1 100644
--- a/airflow/hooks/base_hook.py
+++ b/airflow/hooks/base_hook.py
@@ -22,16 +22,9 @@
 from __future__ import print_function
 from __future__ import unicode_literals
 
-import os
-import random
-
 from airflow.models import Connection
-from airflow.exceptions import AirflowException
-from airflow.utils.db import provide_session
 from airflow.utils.log.logging_mixin import LoggingMixin
 
-CONN_ENV_PREFIX = 'AIRFLOW_CONN_'
-
 
 class 

[jira] [Commented] (AIRFLOW-2936) Docker images should use official Python images as base

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723727#comment-16723727
 ] 

ASF GitHub Bot commented on AIRFLOW-2936:
-

stale[bot] closed pull request #3782: [AIRFLOW-2936] Use official Python images 
as base image for Docker
URL: https://github.com/apache/incubator-airflow/pull/3782
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/scripts/ci/kubernetes/docker/Dockerfile 
b/scripts/ci/kubernetes/docker/Dockerfile.py2
similarity index 63%
rename from scripts/ci/kubernetes/docker/Dockerfile
rename to scripts/ci/kubernetes/docker/Dockerfile.py2
index 93b20dbcd2..b90577ce5e 100644
--- a/scripts/ci/kubernetes/docker/Dockerfile
+++ b/scripts/ci/kubernetes/docker/Dockerfile.py2
@@ -15,41 +15,29 @@
 #  specific language governing permissions and limitations  *
 #  under the License.   *
 
-FROM ubuntu:16.04
+FROM python:2.7.15-slim-stretch
 
+ENV AIRFLOW_HOME /home/airflow/
 ENV SLUGIFY_USES_TEXT_UNIDECODE=yes
 
-# install deps
-RUN apt-get update -y && apt-get install -y \
-wget \
-python-dev \
-python-pip \
-libczmq-dev \
-libcurlpp-dev \
-curl \
-libssl-dev \
-git \
-inetutils-telnet \
-bind9utils \
-zip \
-unzip \
-&& apt-get clean
-
-RUN pip install --upgrade pip
-
-# Since we install vanilla Airflow, we also want to have support for Postgres and Kubernetes
-RUN pip install -U setuptools && \
-pip install kubernetes && \
-pip install cryptography && \
-pip install psycopg2-binary==2.7.4  # I had issues with older versions of psycopg2, just a warning
-
-# install airflow
 COPY airflow.tar.gz /tmp/airflow.tar.gz
-RUN pip install /tmp/airflow.tar.gz
-
-COPY airflow-test-env-init.sh /tmp/airflow-test-env-init.sh
-
+COPY airflow-init.sh /home/airflow/airflow-init.sh
 COPY bootstrap.sh /bootstrap.sh
-RUN chmod +x /bootstrap.sh
 
+RUN useradd -ms /bin/bash -d ${AIRFLOW_HOME} airflow && \
+chown -Rv airflow: ${AIRFLOW_HOME} && \
+apt-get update && \
+apt-get install --no-install-recommends -y build-essential libxml2-dev libxslt1-dev && \
+pip install --upgrade pip && \
+pip install -U setuptools && \
+pip install kubernetes && \
+pip install cryptography && \
+pip install psycopg2-binary && \
+pip install /tmp/airflow.tar.gz && \
+chmod +x /bootstrap.sh /home/airflow/airflow-init.sh && \
+apt-get remove --purge -y build-essential libxml2-dev libxslt1-dev && \
+apt-get autoremove --purge -y && \
+rm -rf /var/lib/apt/lists/*
+
+USER airflow
 ENTRYPOINT ["/bootstrap.sh"]
diff --git a/scripts/ci/kubernetes/docker/Dockerfile.py3 
b/scripts/ci/kubernetes/docker/Dockerfile.py3
new file mode 100644
index 00..7fdbcde8b6
--- /dev/null
+++ b/scripts/ci/kubernetes/docker/Dockerfile.py3
@@ -0,0 +1,43 @@
+#  Licensed to the Apache Software Foundation (ASF) under one   *
+#  or more contributor license agreements.  See the NOTICE file *
+#  distributed with this work for additional information*
+#  regarding copyright ownership.  The ASF licenses this file   *
+#  to you under the Apache License, Version 2.0 (the*
+#  "License"); you may not use this file except in compliance   *
+#  with the License.  You may obtain a copy of the License at   *
+#   *
+#http://www.apache.org/licenses/LICENSE-2.0 *
+#   *
+#  Unless required by applicable law or agreed to in writing,   *
+#  software distributed under the License is distributed on an  *
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY   *
+#  KIND, either express or implied.  See the License for the*
+#  specific language governing permissions and limitations  *
+#  under the License.   *
+
+FROM python:3.7.0-slim-stretch
+
+ENV AIRFLOW_HOME /home/airflow/
+ENV SLUGIFY_USES_TEXT_UNIDECODE=yes
+
+COPY airflow.tar.gz /tmp/airflow.tar.gz
+COPY airflow-init.sh /home/airflow/airflow-init.sh
+COPY bootstrap.sh /bootstrap.sh
+
+RUN useradd -ms /bin/bash -d ${AIRFLOW_HOME} airflow && \
+chown -Rv airflow: ${AIRFLOW_HOME} && \
+apt-get update && \
+apt-get install --no-install-recommends -y build-essential libxml2-dev libxslt1-dev && \
+pip install --upgrade pip && \
+pip install -U setuptools && \
+pip install kubernetes && \
+pip install cryptography && \
+pip install psycopg2-binary && \
+pip install /tmp/airflow.tar.gz && \
+chmod +x /bootstrap.sh 

[jira] [Commented] (AIRFLOW-2973) Use Python 3.6.x everywhere possible

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723731#comment-16723731
 ] 

ASF GitHub Bot commented on AIRFLOW-2973:
-

stale[bot] closed pull request #3816: [HOLD][AIRFLOW-2973] Use Python 3.6.x 
everywhere possible
URL: https://github.com/apache/incubator-airflow/pull/3816
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.travis.yml b/.travis.yml
index 5bd750453a..1e04f8cd8e 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -32,8 +32,12 @@ env:
 - TOX_ENV=py35-backend_mysql PYTHON_VERSION=3
 - TOX_ENV=py35-backend_sqlite PYTHON_VERSION=3
 - TOX_ENV=py35-backend_postgres PYTHON_VERSION=3
+- TOX_ENV=py36-backend_mysql PYTHON_VERSION=3
+- TOX_ENV=py36-backend_sqlite PYTHON_VERSION=3
+- TOX_ENV=py36-backend_postgres PYTHON_VERSION=3
 - TOX_ENV=py27-backend_postgres KUBERNETES_VERSION=v1.9.0
 - TOX_ENV=py35-backend_postgres KUBERNETES_VERSION=v1.10.0 PYTHON_VERSION=3
+- TOX_ENV=py36-backend_postgres KUBERNETES_VERSION=v1.10.0 PYTHON_VERSION=3
 cache:
   directories:
 - $HOME/.wheelhouse/
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index beaf609b5a..f84c7501ec 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -87,7 +87,7 @@ There are three ways to setup an Apache Airflow development 
environment.
 
 1. Using tools and libraries installed directly on your system.
 
-  Install Python (2.7.x or 3.4.x), MySQL, and libxml by using system-level 
package
+  Install Python (2.7.x or 3.6.x), MySQL, and libxml by using system-level 
package
   managers like yum, apt-get for Linux, or Homebrew for Mac OS at first. Refer 
to the [base CI 
Dockerfile](https://github.com/apache/incubator-airflow-ci/blob/master/Dockerfile.base)
 for
   a comprehensive list of required packages.
 
@@ -146,7 +146,7 @@ There are three ways to setup an Apache Airflow development 
environment.
   # From the container
   pip install -e .[devel]
   # Run all the tests with python and mysql through tox
-  tox -e py35-backend_mysql
+  tox -e py36-backend_mysql
   ```
 
 ### Running unit tests
@@ -195,7 +195,7 @@ meets these guidelines:
 1. Preface your commit's subject & PR's title with **[AIRFLOW-XXX]** where 
*XXX* is the JIRA number. We compose release notes (i.e. for Airflow releases) 
from all commit titles in a release. By placing the JIRA number in the commit 
title and hence in the release notes, Airflow users can look into JIRA and 
Github PRs for more details about a particular change.
 1. Add an [Apache License](http://www.apache.org/legal/src-headers.html) 
header to all new files
 1. If the pull request adds functionality, the docs should be updated as part 
of the same PR. Doc string are often sufficient.  Make sure to follow the 
Sphinx compatible standards.
-1. The pull request should work for Python 2.7 and 3.4. If you need help 
writing code that works in both Python 2 and 3, see the documentation at the 
[Python-Future project](http://python-future.org) (the future package is an 
Airflow requirement and should be used where possible).
+1. The pull request should work for Python 2.7 and 3.6. If you need help 
writing code that works in both Python 2 and 3, see the documentation at the 
[Python-Future project](http://python-future.org) (the future package is an 
Airflow requirement and should be used where possible).
 1. As Airflow grows as a project, we try to enforce a more consistent style 
and try to follow the Python community guidelines. We track this using 
[landscape.io](https://landscape.io/github/apache/incubator-airflow/), which 
you can setup on your fork as well to check before you submit your PR. We 
currently enforce most [PEP8](https://www.python.org/dev/peps/pep-0008/) and a 
few other linting rules. It is usually a good idea to lint locally as well 
using [flake8](https://flake8.readthedocs.org/en/latest/) using `flake8 airflow 
tests`. `git diff upstream/master -u -- "*.py" | flake8 --diff` will return any 
changed files in your branch that require linting.
 1. Please read this excellent 
[article](http://chris.beams.io/posts/git-commit/) on commit messages and 
adhere to them. It makes the lives of those who come after you a lot easier.
 
diff --git 
a/airflow/contrib/kubernetes/kubernetes_request_factory/pod_request_factory.py 
b/airflow/contrib/kubernetes/kubernetes_request_factory/pod_request_factory.py
index 95d6c829de..00f92161d3 100644
--- 
a/airflow/contrib/kubernetes/kubernetes_request_factory/pod_request_factory.py
+++ 
b/airflow/contrib/kubernetes/kubernetes_request_factory/pod_request_factory.py
@@ -85,7 +85,7 @@ class ExtractXcomPodRequestFactory(KubernetesRequestFactory):
 - name: xcom
 

[jira] [Commented] (AIRFLOW-81) Scheduler blackout time period

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723723#comment-16723723
 ] 

ASF GitHub Bot commented on AIRFLOW-81:
---

stale[bot] closed pull request #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor
URL: https://github.com/apache/incubator-airflow/pull/3702
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/sensors/schedule_blackout_sensor.py 
b/airflow/contrib/sensors/schedule_blackout_sensor.py
new file mode 100644
index 00..ac66cbb3b2
--- /dev/null
+++ b/airflow/contrib/sensors/schedule_blackout_sensor.py
@@ -0,0 +1,99 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+from datetime import datetime
+
+
+class ScheduleBlackoutSensor(BaseSensorOperator):
+"""
+Checks to see if a task is running for a specified date and time criteria
+Returns false if sensor is running within "blackout" criteria, true otherwise
+
+:param month_of_year: Integer representing month of year
+Not checked if left to default to None
+:type month_of_year: int
+:param day_of_month: Integer representing day of month
+Not checked if left to default to None
+:type day_of_month: int
+:param hour_of_day: Integer representing hour of day
+Not checked if left to default to None
+:type hour_of_day: int
+:param min_of_hour: Integer representing minute of hour
+Not checked if left to default to None
+:type min_of_hour: int
+:param day_of_week: Integer representing day of week
+Not checked if left to default to None
+:type day_of_week: int
+:param dt: Datetime object to check criteria against
+Defaults to datetime.now() if set to None
+:type dt: datetime
+"""
+
+@apply_defaults
+def __init__(self,
+ month_of_year=None, day_of_month=None,
+ hour_of_day=None, min_of_hour=None,
+ day_of_week=None,
+ dt=None, *args, **kwargs):
+
+super(ScheduleBlackoutSensor, self).__init__(*args, **kwargs)
+
+self.dt = dt
+self.month_of_year = month_of_year
+self.day_of_month = day_of_month
+self.hour_of_day = hour_of_day
+self.min_of_hour = min_of_hour
+self.day_of_week = day_of_week
+
+def _check_criteria(self, crit, datepart):
+if crit is None:
+return None
+
+elif isinstance(crit, list):
+for i in crit:
+if i == datepart:
+return True
+return False
+elif isinstance(crit, int):
+return True if datepart == crit else False
+else:
+raise TypeError(
+"Expected an integer or a list, received a {0}".format(type(crit)))
+
+def poke(self, context):
+self.dt = datetime.now() if self.dt is None else self.dt
+
+criteria = [
+# month of year
+self._check_criteria(self.month_of_year, self.dt.month),
+# day of month
+self._check_criteria(self.day_of_month, self.dt.day),
+# hour of day
+self._check_criteria(self.hour_of_day, self.dt.hour),
+# minute of hour
+self._check_criteria(self.min_of_hour, self.dt.minute),
+# day of week
+self._check_criteria(self.day_of_week, self.dt.weekday())
+]
+
+# Removes criteria that are set to None and then checks that all
+# specified criteria are True. If all criteria are True - returns False
+# in order to trigger a sensor failure if blackout criteria are met
+return not all([crit for crit in criteria if crit is not None])
diff --git a/docs/code.rst b/docs/code.rst
index 80ec76193f..db16b028b5 
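
For context, a usage sketch of the sensor as written in this PR (the DAG and criteria below are illustrative): poke() returns False while the current time matches every criterion given, so downstream tasks wait until the blackout window has passed.

    from datetime import datetime
    from airflow import DAG
    from airflow.contrib.sensors.schedule_blackout_sensor import ScheduleBlackoutSensor

    dag = DAG("blackout_example", start_date=datetime(2018, 1, 1), schedule_interval="@daily")

    # Block downstream tasks on Saturdays (5) and Sundays (6) between 02:00 and 03:59.
    blackout = ScheduleBlackoutSensor(
        task_id="check_blackout",
        day_of_week=[5, 6],
        hour_of_day=[2, 3],
        dag=dag,
    )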

[jira] [Commented] (AIRFLOW-2759) Simplify proxy server based access to external platforms like Google cloud

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723725#comment-16723725
 ] 

ASF GitHub Bot commented on AIRFLOW-2759:
-

stale[bot] closed pull request #3722: [AIRFLOW-2759] Add changes to extract 
proxy details at the base hook …
URL: https://github.com/apache/incubator-airflow/pull/3722
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index 4f1f0df383..328c40108e 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -173,6 +173,9 @@ killed_task_cleanup_time = 60
 # `airflow trigger_dag -c`, the key-value pairs will override the existing 
ones in params.
 dag_run_conf_overrides_params = False
 
+# Connect via Proxy
+use_proxy = False
+
 [cli]
 # In what way should the cli access the API. The LocalClient will use the
 # database directly, while the json_client will use the api running on the
@@ -635,3 +638,9 @@ in_cluster = True
 #
 # Additionally you may override worker airflow settings with the 
AIRFLOW
 # formatting as supported by airflow normally.
+
+[proxy]
+# Proxy section to pass proxy related details
+proxy_type =
+proxy_host =
+proxy_port =
diff --git a/airflow/config_templates/default_test.cfg 
b/airflow/config_templates/default_test.cfg
index 01696c6906..60e574f822 100644
--- a/airflow/config_templates/default_test.cfg
+++ b/airflow/config_templates/default_test.cfg
@@ -51,6 +51,7 @@ enable_xcom_pickling = False
 killed_task_cleanup_time = 5
 secure_mode = False
 hostname_callable = socket:getfqdn
+use_proxy = False
 
 [cli]
 api_client = airflow.api.client.local_client
@@ -123,3 +124,8 @@ hide_sensitive_variable_fields = True
 elasticsearch_host =
 elasticsearch_log_id_template = 
{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}
 elasticsearch_end_of_log_mark = end_of_log
+
+[proxy]
+proxy_type =
+proxy_host =
+proxy_port =
diff --git a/airflow/contrib/hooks/gcp_api_base_hook.py 
b/airflow/contrib/hooks/gcp_api_base_hook.py
index 053494743f..342af2b255 100644
--- a/airflow/contrib/hooks/gcp_api_base_hook.py
+++ b/airflow/contrib/hooks/gcp_api_base_hook.py
@@ -27,7 +27,7 @@
 from airflow.exceptions import AirflowException
 from airflow.hooks.base_hook import BaseHook
 from airflow.utils.log.logging_mixin import LoggingMixin
-
+from airflow.configuration import conf
 
 _DEFAULT_SCOPES = ('https://www.googleapis.com/auth/cloud-platform',)
 
@@ -129,7 +129,8 @@ def _authorize(self):
 service hook connection.
 """
 credentials = self._get_credentials()
-http = httplib2.Http()
+proxy_obj = self._get_proxy_obj()
+http = httplib2.Http(proxy_info=proxy_obj)
 authed_http = google_auth_httplib2.AuthorizedHttp(
 credentials, http=http)
 return authed_http
@@ -147,6 +148,48 @@ def _get_field(self, f, default=None):
 else:
 return default
 
+def _get_proxy_obj(self):
+"""
+Returns proxy object with proxy details such as host, port and type
+"""
+proxy_obj = None
+if self._get_useproxy() is True:
+proxy = self.get_proxyconfig()
+proxy_host = proxy.get('proxy_host')
+proxy_type = self._get_proxy_type(proxy)
+try:
+proxy_port = conf.getint('proxy', 'proxy_port')
+except ValueError:
+proxy_port = None
+proxy_obj = httplib2.ProxyInfo(proxy_type, proxy_host, proxy_port)
+return proxy_obj
+
+def _get_proxy_type(self, proxy):
+"""
+:param proxy: Proxy details fetched from configuration file
+:return: Proxy type
+"""
+proxy_type_dictionary = {
+"SOCKS4": httplib2.socks.PROXY_TYPE_SOCKS4,
+"SOCKS5": httplib2.socks.PROXY_TYPE_SOCKS5,
+"HTTP": httplib2.socks.PROXY_TYPE_HTTP,
+"HTTP_NO_TUNNEL": httplib2.socks.PROXY_TYPE_HTTP_NO_TUNNEL
+}
+
+proxy_type_from_config = proxy.get('proxy_type')
+proxy_type = proxy_type_dictionary.get(proxy_type_from_config)
+
+if proxy_type is None:
+self.log.info("Proxy type does not exist returning proxy type as None")
+return proxy_type
+
+def _get_useproxy(self):
+"""
+Fetch use_proxy field from config file
+"""
+use_proxy = conf.getboolean('core', 'use_proxy')
+return use_proxy
+
 @property
 def project_id(self):
 return self._get_field('project')
diff --git a/airflow/hooks/base_hook.py b/airflow/hooks/base_hook.py

[jira] [Commented] (AIRFLOW-2651) Add file system hooks with a common interface

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723675#comment-16723675
 ] 

ASF GitHub Bot commented on AIRFLOW-2651:
-

stale[bot] closed pull request #3526: [AIRFLOW-2651] Add file system hooks with 
a common interface
URL: https://github.com/apache/incubator-airflow/pull/3526
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/hooks/fs_hooks/__init__.py 
b/airflow/hooks/fs_hooks/__init__.py
new file mode 100644
index 00..e69de29bb2
diff --git a/airflow/hooks/fs_hooks/base.py b/airflow/hooks/fs_hooks/base.py
new file mode 100644
index 00..8e4c4c8243
--- /dev/null
+++ b/airflow/hooks/fs_hooks/base.py
@@ -0,0 +1,264 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+from builtins import super
+import errno
+import fnmatch
+import importlib
+import posixpath
+import re
+import shutil
+
+from airflow.hooks.base_hook import BaseHook
+
+_FS_BASE_MODULE = '.'.join(__name__.split('.')[:-1])
+
+
+class FsHook(BaseHook):
+"""Base FsHook defining the FsHook interface and providing some basic
+   functionality built on this interface.
+"""
+
+_conn_classes = {
+'ftp': _FS_BASE_MODULE + '.ftp.FtpHook',
+'hdfs': _FS_BASE_MODULE + '.hdfs3.Hdfs3Hook',
+'local': _FS_BASE_MODULE + '.local.LocalFsHook',
+'s3': _FS_BASE_MODULE + '.s3.S3FsHook',
+'sftp': _FS_BASE_MODULE + '.sftp.SftpHook'
+}
+
+sep = posixpath.sep
+
+def __init__(self, conn_id=None):
+super().__init__(source=None)
+self._conn_id = conn_id
+
+@classmethod
+def for_connection(cls, conn_id=None):
+"""Return appropriate hook for the given connection."""
+
+if conn_id is None or conn_id == 'local':
+conn_type = 'local'
+else:
+conn_type = cls.get_connection(conn_id).conn_type
+
+try:
+class_ = cls._conn_classes[conn_type]
+except KeyError:
+raise ValueError('Conn type {!r} is not supported'
+ .format(conn_type))
+
+if isinstance(class_, str):
+# conn_class is a string identifier, import
+# class from the indicated module.
+split = class_.split('.')
+module_name = '.'.join(split[:-1])
+class_name = split[-1]
+
+module = importlib.import_module(module_name)
+class_ = getattr(module, class_name)
+
+return class_(conn_id=conn_id)
+
+@classmethod
+def register_hook(cls, conn_type, class_):
+"""Register FsHook subclass for the given connection type.
+
+Registered FsHook subclasses are used by `for_connection` when
+instantiating the appropriate hook for a given connection, based
+on its connection type.
+
+:param str conn_type: Connection type.
+:param class_: FsHook to register. Can either be the class itself
+or a string specifying the full module path for the class.
+"""
+cls._conn_classes[conn_type] = class_
+
+def __enter__(self):
+return self
+
+def __exit__(self, exc_type, exc_val, exc_tb):
+self.disconnect()
+
+def disconnect(self):
+"""Closes fs connection (if applicable)."""
+pass
+
+# Interface methods (should be implemented by sub-classes).
+
+def open(self, file_path, mode='rb'):
+"""Returns file_obj for given file path.
+
+:param str file_path: Path to the file to open.
+:param str mode: Mode to open the file in.
+
+:returns: An opened file object.
+"""
+raise NotImplementedError()
+
+def exists(self, file_path):
+"""Checks whether the given file path exists.
+
+:param str file_path: File path.
+
+:returns: True if the file exists, else False.
+:rtype: bool
+"""
+raise 

[jira] [Commented] (AIRFLOW-2325) Task logging with AWS Cloud watch

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723670#comment-16723670
 ] 

ASF GitHub Bot commented on AIRFLOW-2325:
-

stale[bot] closed pull request #3229: [AIRFLOW-2325] Add cloudwatch task 
handler (IN PROGRESS)
URL: https://github.com/apache/incubator-airflow/pull/3229
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/utils/log/cloudwatch_task_handler.py 
b/airflow/utils/log/cloudwatch_task_handler.py
new file mode 100644
index 00..97e67b8c89
--- /dev/null
+++ b/airflow/utils/log/cloudwatch_task_handler.py
@@ -0,0 +1,124 @@
+# -*- coding: utf-8 -*-
+import os
+import logging
+
+import boto3
+import watchtower
+from jinja2 import Template
+from airflow import configuration
+from airflow.utils.log.logging_mixin import LoggingMixin
+
+
+class CloudwatchTaskHandler(logging.Handler, LoggingMixin):
+def __init__(self, log_group, filename_template, region_name=None, **kwargs):
+super(CloudwatchTaskHandler, self).__init__()
+self.handler = None
+self.log_group = log_group
+self.region_name = region_name
+self.filename_template = filename_template
+self.filename_jinja_template = None
+self.kwargs = kwargs
+self.closed = False
+
+if "{{" in self.filename_template: #jinja mode
+self.filename_jinja_template = Template(self.filename_template)
+
+def _render_filename(self, ti, try_number):
+if self.filename_jinja_template:
+jinja_context = ti.get_template_context()
+jinja_context['try_number'] = try_number
+return (
+self.filename_jinja_template.render(**jinja_context)
+.replace(':', '_')
+)
+
+return self.filename_template.format(
+dag_id=ti.dag_id,
+task_id=ti.task_id,
+execution_date=ti.execution_date.isoformat(),
+try_number=try_number,
+).replace(':', '_')
+
+def set_context(self, ti):
+kwargs = self.kwargs.copy()
+stream_name = kwargs.pop('stream_name', None)
+if stream_name is None:
+stream_name = self._render_filename(ti, ti.try_number)
+if 'boto3_session' not in kwargs and self.region_name is not None:
+kwargs['boto3_session'] = boto3.session.Session(
+region_name=self.region_name,
+)
+self.handler = watchtower.CloudWatchLogHandler(
+log_group=self.log_group,
+stream_name=stream_name,
+**kwargs
+)
+
+def emit(self, record):
+if self.handler is not None:
+self.handler.emit(record)
+
+def flush(self):
+if self.handler is not None:
+self.handler.flush()
+
+def close(self):
+"""
+Close and upload local log file to remote storage S3.
+"""
+# When application exit, system shuts down all handlers by
+# calling close method. Here we check if logger is already
+# closed to prevent uploading the log to remote storage multiple
+# times when `logging.shutdown` is called.
+if self.closed:
+return
+
+if self.handler is not None:
+self.handler.close()
+# Mark closed so we don't double write if close is called twice
+self.closed = True
+
+def read(self, task_instance, try_number=None):
+if try_number is None:
+next_try = task_instance.next_try_number
+try_numbers = list(range(1, next_try))
+elif try_number < 1:
+logs = [
+'Error fetching the logs. Try number {try_number} is invalid.',
+]
+return logs
+else:
+try_numbers = [try_number]
+
+logs = [''] * len(try_numbers)
+for i, try_number in enumerate(try_numbers):
+logs[i] += self._read(task_instance, try_number)
+
+return logs
+
+def _read(self, task_instance, try_number):
+stream_name = self._render_filename(task_instance, try_number)
+if self.handler is not None:
+client = self.handler.cwl_client
+else:
+client = boto3.client('logs', region_name=self.region_name)
+events = []
+try:
+response = client.get_log_events(
+logGroupName=self.log_group,
+logStreamName=stream_name,
+)
+events.extend(response['events'])
+next_token = response['nextForwardToken']
+while True:
+response = client.get_log_events(
+   

[jira] [Commented] (AIRFLOW-2568) Implement a Azure Container Instances operator

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723674#comment-16723674
 ] 

ASF GitHub Bot commented on AIRFLOW-2568:
-

stale[bot] closed pull request #3467: [AIRFLOW-2568] Azure Container Instances 
operator
URL: https://github.com/apache/incubator-airflow/pull/3467
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/azure_container_hook.py 
b/airflow/contrib/hooks/azure_container_hook.py
new file mode 100644
index 00..f74e8fc86b
--- /dev/null
+++ b/airflow/contrib/hooks/azure_container_hook.py
@@ -0,0 +1,129 @@
+
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+
+from airflow.hooks.base_hook import BaseHook
+from airflow.exceptions import AirflowException
+
+from azure.common.client_factory import get_client_from_auth_file
+from azure.common.credentials import ServicePrincipalCredentials
+
+from azure.mgmt.containerinstance import ContainerInstanceManagementClient
+from azure.mgmt.containerinstance.models import (ImageRegistryCredential,
+ Volume,
+ AzureFileVolume)
+
+
+class AzureContainerInstanceHook(BaseHook):
+
+def __init__(self, conn_id='azure_default'):
+self.conn_id = conn_id
+self.connection = self.get_conn()
+
+def get_conn(self):
+conn = self.get_connection(self.conn_id)
+key_path = conn.extra_dejson.get('key_path', False)
+if key_path:
+if key_path.endswith('.json'):
+self.log.info('Getting connection using a JSON key file.')
+return 
get_client_from_auth_file(ContainerInstanceManagementClient,
+ key_path)
+else:
+raise AirflowException('Unrecognised extension for key file.')
+
+if os.environ.get('AZURE_AUTH_LOCATION'):
+key_path = os.environ.get('AZURE_AUTH_LOCATION')
+if key_path.endswith('.json'):
+self.log.info('Getting connection using a JSON key file.')
+return 
get_client_from_auth_file(ContainerInstanceManagementClient,
+ key_path)
+else:
+raise AirflowException('Unrecognised extension for key file.')
+
+credentials = ServicePrincipalCredentials(
+client_id=conn.login,
+secret=conn.password,
+tenant=conn.extra_dejson['tenantId']
+)
+
+subscription_id = conn.extra_dejson['subscriptionId']
+return ContainerInstanceManagementClient(credentials, 
str(subscription_id))
+
+def create_or_update(self, resource_group, name, container_group):
+self.connection.container_groups.create_or_update(resource_group,
+  name,
+  container_group)
+
+def get_state_exitcode(self, resource_group, name):
+response = self.connection.container_groups.get(resource_group,
+name,
+
raw=True).response.json()
+containers = response['properties']['containers']
+instance_view = containers[0]['properties'].get('instanceView', {})
+current_state = instance_view.get('currentState', {})
+
+return current_state.get('state'), current_state.get('exitCode', 0)
+
+def get_messages(self, resource_group, name):
+response = self.connection.container_groups.get(resource_group,
+name,
+
raw=True).response.json()
+containers = response['properties']['containers']
+instance_view = containers[0]['properties'].get('instanceView', {})
+
+return [event['message'] for event in instance_view.get('events', [])]
+
+def get_logs(self, resource_group, name, tail=1000):
+logs = 

[jira] [Commented] (AIRFLOW-2354) Run single task instance blocks everything except CeleryExecutor

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723672#comment-16723672
 ] 

ASF GitHub Bot commented on AIRFLOW-2354:
-

stale[bot] closed pull request #3249: [AIRFLOW-2354] Change task instance run 
validation to not exclude das…
URL: https://github.com/apache/incubator-airflow/pull/3249
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/www/views.py b/airflow/www/views.py
index 5dda0362cc..6a5cec1e0c 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -944,14 +944,16 @@ def run(self):
 
 try:
 from airflow.executors import GetDefaultExecutor
-from airflow.executors.celery_executor import CeleryExecutor
+from airflow.executors.local_executor import LocalExecutor
+from airflow.executors.sequential_executor import 
SequentialExecutor
 executor = GetDefaultExecutor()
-if not isinstance(executor, CeleryExecutor):
-flash("Only works with the CeleryExecutor, sorry", "error")
+if isinstance(executor, LocalExecutor) or \
+isinstance(executor, SequentialExecutor):
+flash("Doesn't work with the LocalExecutor or 
SequentialExecutor, sorry",
+  "error")
 return redirect(origin)
 except ImportError:
-# in case CeleryExecutor cannot be imported it is not active either
-flash("Only works with the CeleryExecutor, sorry", "error")
+flash("Error when attempting to validate the executor", "error")
 return redirect(origin)
 
 ti = models.TaskInstance(task=task, execution_date=execution_date)
diff --git a/airflow/www_rbac/views.py b/airflow/www_rbac/views.py
index f064c14c47..8d1ecfaf42 100644
--- a/airflow/www_rbac/views.py
+++ b/airflow/www_rbac/views.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -591,14 +591,16 @@ def run(self):
 
 try:
 from airflow.executors import GetDefaultExecutor
-from airflow.executors.celery_executor import CeleryExecutor
+from airflow.executors.local_executor import LocalExecutor
+from airflow.executors.sequential_executor import 
SequentialExecutor
 executor = GetDefaultExecutor()
-if not isinstance(executor, CeleryExecutor):
-flash("Only works with the CeleryExecutor, sorry", "error")
+if isinstance(executor, LocalExecutor) or \
+isinstance(executor, SequentialExecutor):
+flash("Doesn't work with the LocalExecutor or 
SequentialExecutor, sorry",
+  "error")
 return redirect(origin)
 except ImportError:
-# in case CeleryExecutor cannot be imported it is not active either
-flash("Only works with the CeleryExecutor, sorry", "error")
+flash("Error when attempting to validate the executor", "error")
 return redirect(origin)
 
 ti = models.TaskInstance(task=task, execution_date=execution_date)
diff --git a/tests/www/test_views.py b/tests/www/test_views.py
index 3b2892d10c..18dcc70142 100644
--- a/tests/www/test_views.py
+++ b/tests/www/test_views.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -29,6 +29,7 @@
 
 from urllib.parse import quote_plus
 from werkzeug.test import Client
+from mock import Mock
 
 from airflow import models, 

[jira] [Commented] (AIRFLOW-2549) GCP DataProc Workflow Template operators report success when jobs fail

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723673#comment-16723673
 ] 

ASF GitHub Bot commented on AIRFLOW-2549:
-

stale[bot] closed pull request #3447: [AIRFLOW-2549] Fix DataProcOperation 
error-check
URL: https://github.com/apache/incubator-airflow/pull/3447
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/gcp_dataproc_hook.py 
b/airflow/contrib/hooks/gcp_dataproc_hook.py
index ce65b2b915..afa82ce5f9 100644
--- a/airflow/contrib/hooks/gcp_dataproc_hook.py
+++ b/airflow/contrib/hooks/gcp_dataproc_hook.py
@@ -175,21 +175,45 @@ def get(self):
 return self.operation
 
 def _check_done(self):
-if 'done' in self.operation:
+def _check_error():
+""" Check the operation for errors.  Precondition is that the
+operation must be marked as done already.
+"""
 if 'error' in self.operation:
-self.log.warning(
-'Dataproc Operation %s failed with error: %s',
-self.operation_name, self.operation['error']['message'])
-self._raise_error()
-else:
-self.log.info(
-'Dataproc Operation %s done', self.operation['name'])
-return True
-return False
+return (True, self.operation['error']['message'])
+
+# Dataproc workflow templates do not set the 'error' field when
+# jobs fail; we have to examine the individual jobs for failures.
+metadata = self.operation.get('metadata', {})
+if not metadata.get('@type', '').endswith('WorkflowMetadata'):
+return (False, None)
+
+nodes = metadata.get('graph', {}).get('nodes', [])
+
+error_nodes = [node for node in nodes if node.get('error')]
+
+return (False, None) if not error_nodes else \
+(True, str({
+node['jobId']: node['error'] for node in error_nodes}))
+
+if not self.operation.get('done'):
+# either the done field is not present, or it is false
+return False
+
+(operation_failed, error_message) = _check_error()
+if operation_failed:
+self.log.warning(
+'Dataproc Operation %s failed with error: %s',
+self.operation_name, error_message)
+self._raise_error(error_message)
+else:
+self.log.info(
+'Dataproc Operation %s done', self.operation_name)
+return True
 
-def _raise_error(self):
+def _raise_error(self, error_message):
 raise Exception('Google Dataproc Operation %s failed: %s' %
-(self.operation_name, 
self.operation['error']['message']))
+(self.operation_name, error_message))
 
 
 class DataProcHook(GoogleCloudBaseHook):
diff --git a/tests/contrib/hooks/test_gcp_dataproc_hook.py 
b/tests/contrib/hooks/test_gcp_dataproc_hook.py
index f2629ff148..26f059d35c 100644
--- a/tests/contrib/hooks/test_gcp_dataproc_hook.py
+++ b/tests/contrib/hooks/test_gcp_dataproc_hook.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -19,7 +19,7 @@
 #
 
 import unittest
-from airflow.contrib.hooks.gcp_dataproc_hook import DataProcHook
+from airflow.contrib.hooks.gcp_dataproc_hook import DataProcHook, 
_DataProcOperation
 
 try:
 from unittest import mock
@@ -48,6 +48,98 @@ def setUp(self):
 
 @mock.patch(DATAPROC_STRING.format('_DataProcJob'))
 def test_submit(self, job_mock):
-  with mock.patch(DATAPROC_STRING.format('DataProcHook.get_conn', 
return_value=None)):
-self.dataproc_hook.submit(PROJECT_ID, JOB)
-job_mock.assert_called_once_with(mock.ANY, PROJECT_ID, JOB, REGION)
+with mock.patch(DATAPROC_STRING.format('DataProcHook.get_conn',
+   return_value=None)):
+self.dataproc_hook.submit(PROJECT_ID, JOB)
+job_mock.assert_called_once_with(mock.ANY, PROJECT_ID, JOB, REGION)
+
+def test_successful_operation_detector(self):
+operation_api_response = \
+{
+"done": True,
+"metadata": {
+

[jira] [Commented] (AIRFLOW-2224) Add support for CSV file exports in mysql_to_gcs contrib operator

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723669#comment-16723669
 ] 

ASF GitHub Bot commented on AIRFLOW-2224:
-

stale[bot] closed pull request #3139: [AIRFLOW-2224] Add support for CSV files in mysql_to_gcs operator
URL: https://github.com/apache/incubator-airflow/pull/3139
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/mysql_to_gcs.py 
b/airflow/contrib/operators/mysql_to_gcs.py
index 9ba84c7556..c0c48c5c68 100644
--- a/airflow/contrib/operators/mysql_to_gcs.py
+++ b/airflow/contrib/operators/mysql_to_gcs.py
@@ -25,13 +25,14 @@
 from MySQLdb.constants import FIELD_TYPE
 from tempfile import NamedTemporaryFile
 from six import string_types
+import unicodecsv as csv
 
 PY3 = sys.version_info[0] == 3
 
 
 class MySqlToGoogleCloudStorageOperator(BaseOperator):
 """
-Copy data from MySQL to Google cloud storage in JSON format.
+Copy data from MySQL to Google cloud storage in JSON or CSV format.
 """
 template_fields = ('sql', 'bucket', 'filename', 'schema_filename', 'schema')
 template_ext = ('.sql',)
@@ -48,6 +49,7 @@ def __init__(self,
  google_cloud_storage_conn_id='google_cloud_storage_default',
  schema=None,
  delegate_to=None,
+ export_format={'file_format': 'json'},
  *args,
  **kwargs):
 """
@@ -82,6 +84,50 @@ def __init__(self,
 :param delegate_to: The account to impersonate, if any. For this to
 work, the service account making the request must have domain-wide
 delegation enabled.
+:param export_format: Details for files to be exported into GCS.
+Allows to specify 'json' or 'csv', and also additional details for
+CSV file exports (quotes, separators, etc.)
+This is a dict with the following key-value pairs:
+  * file_format: 'json' or 'csv'. If using CSV, more details can
+  be added
+  * csv_dialect: preconfigured set of CSV export parameters
+ (i.e.: 'excel', 'excel-tab', 'unix_dialect').
+ If present, will ignore all other 'csv_' options.
+ See https://docs.python.org/3/library/csv.html
+  * csv_delimiter: A one-character string used to separate fields.
+   It defaults to ','.
+  * csv_doublequote: If doublequote is False and no escapechar is set,
+ Error is raised if a quotechar is found in a field.
+ It defaults to True.
+  * csv_escapechar: A one-character string used to escape the delimiter
+if quoting is set to QUOTE_NONE and the quotechar
+if doublequote is False.
+It defaults to None, which disables escaping.
+  * csv_lineterminator: The string used to terminate lines.
+It defaults to '\r\n'.
+  * csv_quotechar: A one-character string used to quote fields
+containing special characters, such as the delimiter
+or quotechar, or which contain new-line characters.
+It defaults to '"'.
+  * csv_quoting: Controls when quotes should be generated.
+ It can take on any of the QUOTE_* constants
+ Defaults to csv.QUOTE_MINIMAL.
+ Valid values are:
+ 'csv.QUOTE_ALL': Quote all fields
+ 'csv.QUOTE_MINIMAL': only quote those fields which contain
+special characters such as delimiter,
+quotechar or any of the characters
+in lineterminator.
+ 'csv.QUOTE_NONNUMERIC': Quote all non-numeric fields.
+ 'csv.QUOTE_NONE': never quote fields. When the current
+delimiter occurs in output data it is
+preceded by the current escapechar
+character. If escapechar is not set,
+the writer will raise Error if any
+  
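A usage sketch of the proposed (never merged) export_format argument, assuming the operator's existing sql/bucket/filename parameters and a hypothetical dag object:

export_orders = MySqlToGoogleCloudStorageOperator(
    task_id='export_orders_csv',
    sql='SELECT * FROM orders',               # hypothetical query
    bucket='my-export-bucket',                # hypothetical bucket
    filename='exports/orders_{}.csv',
    schema_filename='schemas/orders.json',
    export_format={
        'file_format': 'csv',
        'csv_delimiter': ';',
        'csv_quotechar': '"',
    },
    dag=dag)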

[jira] [Commented] (AIRFLOW-2280) Extra argument for comparison with another table in IntervalCheckOperator

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723671#comment-16723671
 ] 

ASF GitHub Bot commented on AIRFLOW-2280:
-

stale[bot] closed pull request #3186: [AIRFLOW-2280] Add feature in CheckIntervalOperator
URL: https://github.com/apache/incubator-airflow/pull/3186
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/check_operator.py 
b/airflow/operators/check_operator.py
index 9994671a70..682a15749c 100644
--- a/airflow/operators/check_operator.py
+++ b/airflow/operators/check_operator.py
@@ -181,6 +181,9 @@ class IntervalCheckOperator(BaseOperator):
 
 :param table: the table name
 :type table: str
+:param check_with_table: the table name to check against, default None
+indicates comparing within the same table
+:type check_with_table: str
 :param days_back: number of days between ds and the ds we want to check
 against. Defaults to 7 days
 :type days_back: int
@@ -197,7 +200,7 @@ class IntervalCheckOperator(BaseOperator):
 
 @apply_defaults
 def __init__(
-self, table, metrics_thresholds,
+self, table, metrics_thresholds, check_with_table=None,
 date_filter_column='ds', days_back=-7,
 conn_id=None,
 *args, **kwargs):
@@ -208,11 +211,15 @@ def __init__(
 self.date_filter_column = date_filter_column
 self.days_back = -abs(days_back)
 self.conn_id = conn_id
+if not check_with_table:
+check_with_table = table
 sqlexp = ', '.join(self.metrics_sorted)
-sqlt = ("SELECT {sqlexp} FROM {table}"
-" WHERE {date_filter_column}=").format(**locals())
-self.sql1 = sqlt + "'{{ ds }}'"
-self.sql2 = sqlt + "'{{ macros.ds_add(ds, "+str(self.days_back)+") }}'"
+sqlt1 = ("SELECT {sqlexp} FROM {table}"
+ " WHERE {date_filter_column}=").format(**locals())
+self.sql1 = sqlt1 + "'{{ ds }}'"
+sqlt2 = ("SELECT {sqlexp} FROM {check_with_table}"
+ " WHERE {date_filter_column}=").format(**locals())
+self.sql2 = sqlt2 + "'{{ macros.ds_add(ds, " + str(self.days_back) + ") }}'"
 
 def execute(self, context=None):
 hook = self.get_db_hook()


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Extra argument for comparison with another table in IntervalCheckOperator
> -
>
> Key: AIRFLOW-2280
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2280
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Reporter: Yuyin Yang
>Assignee: Yuyin Yang
>Priority: Minor
>
> The current IntervalCheckOperator can only check that the values of metrics, 
> given as SQL expressions, are within a certain tolerance of the values from 
> days_back earlier in the same table. For example, with metrics set to COUNT(*), 
> a threshold ratio of 1.5, and days_back=-7, I can compare the current count of 
> a table with the count of the same table 7 days back.
> However, in practice we would like to first load tables into a temporary 
> dataset, which has an expiration date, and only load them into the production 
> dataset after validation. In this case it makes more sense to compare the 
> current temporary table with the production table from days_back, because the 
> temporary table from days_back may no longer exist.
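A short sketch of how the proposed check_with_table argument would be used; the PR was closed unmerged, and the table and connection names below are hypothetical:

validate_load = IntervalCheckOperator(
    task_id='validate_tmp_against_prod',
    table='tmp_dataset.orders',               # freshly loaded temporary table
    check_with_table='prod_dataset.orders',   # compare against the production table
    metrics_thresholds={'COUNT(*)': 1.5},
    date_filter_column='ds',
    days_back=-7,
    conn_id='my_db_conn',
    dag=dag)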



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1098) parent_dag is not correctly set when subdags have subdags inside

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723662#comment-16723662
 ] 

ASF GitHub Bot commented on AIRFLOW-1098:
-

stale[bot] closed pull request #2233: [AIRFLOW-1098] Fix issue in setting parent_dag when loading dags
URL: https://github.com/apache/incubator-airflow/pull/2233
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/models.py b/airflow/models.py
index e6374d45e0..c7af17bacc 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -367,7 +367,7 @@ def bag_dag(self, dag, parent_dag, root_dag):
 for task in dag.tasks:
 settings.policy(task)
 
-for subdag in dag.subdags:
+for subdag in dag.direct_subdags:
 subdag.full_filepath = dag.full_filepath
 subdag.parent_dag = dag
 subdag.is_subdag = True
@@ -2994,6 +2994,20 @@ def latest_execution_date(self):
 session.close()
 return execution_date
 
+@property
+def direct_subdags(self):
+"""
+Returns only directly connected subdag objects rather than all associated to this DAG
+"""
+from airflow.operators.subdag_operator import SubDagOperator
+l = []
+for task in self.tasks:
+if (isinstance(task, SubDagOperator) or
+#TODO remove in Airflow 2.0
+type(task).__name__ == 'SubDagOperator'):
+l.append(task.subdag)
+return l
+
 @property
 def subdags(self):
 """


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parent_dag is not correctly set when subdags have subdags inside
> 
>
> Key: AIRFLOW-1098
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1098
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: wangwenxiang
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> DAG has a subdags property that recursively retrieves all associated sub-DAGs. 
> It is incorrectly used when bagging one DAG and all of its subdags, so 
> parent_dag is set to the wrong DAG when subdags themselves contain subdags.
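A minimal sketch (hypothetical dag_ids and default_args) of the nesting that exposes the problem; with the recursive subdags property, bag_dag() attaches the grandchild to the top-level DAG instead of its direct parent:

from airflow.models import DAG
from airflow.operators.subdag_operator import SubDagOperator

parent = DAG('parent', default_args=args)                      # args is a hypothetical default_args dict
child = DAG('parent.child', default_args=args)
grandchild = DAG('parent.child.grandchild', default_args=args)

SubDagOperator(task_id='grandchild', subdag=grandchild, dag=child)
SubDagOperator(task_id='child', subdag=child, dag=parent)

# parent.subdags        -> [child, grandchild]   (recursive)
# parent.direct_subdags -> [child]               (proposed in this PR)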



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1463) Scheduler does not reschedule tasks in QUEUED state

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723666#comment-16723666
 ] 

ASF GitHub Bot commented on AIRFLOW-1463:
-

stale[bot] closed pull request #2483: [AIRFLOW-1463] Clear state of queued task
URL: https://github.com/apache/incubator-airflow/pull/2483
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index a8543d38a9..945dc1e97c 100755
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -304,6 +304,31 @@ def set_is_paused(is_paused, args, dag=None):
 print(msg)
 
 
+def mark_task_as(dag_id, task_id, execution_date, state):
+"""Finds task instance with specified fields, and changes state.
+
+If the task instance doesn't exist, it does nothing.
+
+"""
+session = settings.Session()
+from sqlalchemy import and_
+tis = session.query(TaskInstance).filter(
+and_(
+TaskInstance.dag_id == dag_id,
+TaskInstance.task_id == task_id,
+TaskInstance.execution_date == execution_date
+)
+).all()
+if tis:
+assert len(tis) == 1, (
+"There must be at most one task instance with given properties"
+)
+ti = tis[0]
+ti.state = state
+session.merge(ti)
+session.commit()
+
+
 def run(args, dag=None):
 # Disable connection pooling to reduce the # of connections on the DB
 # while it's waiting for the task to finish.
@@ -327,7 +352,19 @@ def run(args, dag=None):
 settings.configure_orm()
 
 if not args.pickle and not dag:
-dag = get_dag(args)
+try:
+dag = get_dag(args)
+except Exception as e:
+# DAG import can fail
+# it's an app dev code, we cannot require it to be reliable,
+# so we catch this error here and set task instance state to NONE
+# to reschedule it
+# DAG import errors are observed and expected to be transient
+print('Failed to load DAG, reason: %r' % e)
+print('Setting the task state back to NONE')
+from airflow.utils.state import State
+mark_task_as(args.dag_id, args.task_id, args.execution_date, State.NONE)
+raise e
 elif not dag:
 session = settings.Session()
 logging.info('Loading pickle id {args.pickle}'.format(args=args))
diff --git a/tests/core.py b/tests/core.py
index 923e0c3e86..3105c63c79 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -1088,6 +1088,7 @@ def _cleanup(session=None):
 
 session.query(models.Pool).delete()
 session.query(models.Variable).delete()
+session.query(models.TaskInstance).delete()
 session.commit()
 session.close()
 
@@ -1326,6 +1327,24 @@ def test_cli_run(self):
 'run', 'example_bash_operator', 'runme_0', '-l',
 DEFAULT_DATE.isoformat()]))
 
+def test_cli_run_import_failure(self):
+task = DummyOperator(dag_id='no_such_dag', task_id='no_such_task')
+self.session.add(
+models.TaskInstance(
+task,
+execution_date=DEFAULT_DATE,
+state=State.QUEUED
+)
+)
+self.session.commit()
+with self.assertRaises(Exception):
+cli.run(self.parser.parse_args([
+'run', task.dag_id, task.task_id, '-l',
+DEFAULT_DATE.isoformat()]))
+ti = self.session.query(models.TaskInstance).filter_by(
+dag_id=task.dag_id, task_id=task.task_id).first()
+self.assertTrue(ti.state == State.NONE)
+
 def test_task_state(self):
 cli.task_state(self.parser.parse_args([
 'task_state', 'example_bash_operator', 'runme_0',


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Scheduler does not reschedule tasks in QUEUED state
> ---
>
> Key: AIRFLOW-1463
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1463
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
> Environment: Ubuntu 14.04
> Airflow 1.8.0
> SQS backed task queue, AWS RDS backed meta storage
> DAG folder is synced by script on code push: archive is downloaded from s3, 
> unpacked, moved, install script is run. airflow executable is replaced with 
> 

[jira] [Commented] (AIRFLOW-351) Failed to clear downstream tasks

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723664#comment-16723664
 ] 

ASF GitHub Bot commented on AIRFLOW-351:


stale[bot] closed pull request #2228: [AIRFLOW-351] Ensure python_callable is pickleable
URL: https://github.com/apache/incubator-airflow/pull/2228
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/python_operator.py 
b/airflow/operators/python_operator.py
index cf240f2802..a8febf0d74 100644
--- a/airflow/operators/python_operator.py
+++ b/airflow/operators/python_operator.py
@@ -13,8 +13,10 @@
 # limitations under the License.
 
 from builtins import str
+import copy
 from datetime import datetime
 import logging
+import pickle
 
 from airflow.exceptions import AirflowException
 from airflow.models import BaseOperator, TaskInstance
@@ -66,6 +68,10 @@ def __init__(
 super(PythonOperator, self).__init__(*args, **kwargs)
 if not callable(python_callable):
 raise AirflowException('`python_callable` param must be callable')
+try:
+pickle.dumps(copy.deepcopy(python_callable))
+except TypeError:
+raise AirflowException('`python_callable` param must be pickleable')
 self.python_callable = python_callable
 self.op_args = op_args or []
 self.op_kwargs = op_kwargs or {}
diff --git a/tests/operators/python_operator.py 
b/tests/operators/python_operator.py
index 3aa8b6cf8f..56053052f6 100644
--- a/tests/operators/python_operator.py
+++ b/tests/operators/python_operator.py
@@ -15,6 +15,7 @@
 from __future__ import print_function, unicode_literals
 
 import datetime
+import threading
 import unittest
 
 from airflow import configuration, DAG
@@ -33,6 +34,28 @@
 FROZEN_NOW = datetime.datetime(2016, 1, 2, 12, 1, 1)
 
 
+class NotPickleable(object):
+
+def __init__(self):
+# [AIRFLOW-351] This object prevents pickle, which
+# eventually causes TypeError by running `airflow clear`
+self.lock = threading.Lock()
+
+def callable(self):
+pass
+
+
+class StatefulTask(object):
+def do_run(self):
+self.run = True
+
+def clear_run(self):
+self.run = False
+
+def is_run(self):
+return self.run
+
+
 class PythonOperatorTest(unittest.TestCase):
 
 def setUp(self):
@@ -45,27 +68,19 @@ def setUp(self):
 'start_date': DEFAULT_DATE},
 schedule_interval=INTERVAL)
 self.addCleanup(self.dag.clear)
-self.clear_run()
-self.addCleanup(self.clear_run)
-
-def do_run(self):
-self.run = True
-
-def clear_run(self):
-self.run = False
-
-def is_run(self):
-return self.run
+self.stateful_task = StatefulTask()
+self.stateful_task.clear_run()
+self.addCleanup(self.stateful_task.clear_run)
 
 def test_python_operator_run(self):
 """Tests that the python callable is invoked on task run."""
 task = PythonOperator(
-python_callable=self.do_run,
+python_callable=self.stateful_task.do_run,
 task_id='python_operator',
 dag=self.dag)
-self.assertFalse(self.is_run())
+self.assertFalse(self.stateful_task.is_run())
 task.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE)
-self.assertTrue(self.is_run())
+self.assertTrue(self.stateful_task.is_run())
 
 def test_python_operator_python_callable_is_callable(self):
 """Tests that PythonOperator will only instantiate if
@@ -83,6 +98,15 @@ def test_python_operator_python_callable_is_callable(self):
 task_id='python_operator',
 dag=self.dag)
 
+def test_python_operator_python_callable_is_pickleable(self):
+"""Tests that PythonOperator will only instantiate if
+the python_callable argument is able to pickle."""
+with self.assertRaises(AirflowException):
+PythonOperator(
+python_callable=NotPickleable().callable,
+task_id='python_operator',
+dag=self.dag)
+
 
 class BranchOperatorTest(unittest.TestCase):
 def setUp(self):


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to clear downstream tasks
> 
>
> Key: AIRFLOW-351
> URL: 

[jira] [Commented] (AIRFLOW-2193) R Language Operator

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723667#comment-16723667
 ] 

ASF GitHub Bot commented on AIRFLOW-2193:
-

stale[bot] closed pull request #3115: [AIRFLOW-2193] Add ROperator for using R
URL: https://github.com/apache/incubator-airflow/pull/3115
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/r_operator.py 
b/airflow/contrib/operators/r_operator.py
new file mode 100644
index 00..9974061892
--- /dev/null
+++ b/airflow/contrib/operators/r_operator.py
@@ -0,0 +1,85 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from builtins import bytes
+import os
+from tempfile import NamedTemporaryFile
+
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+from airflow.utils.file import TemporaryDirectory
+
+import rpy2.robjects as robjects
+from rpy2.rinterface import RRuntimeError
+
+
+class ROperator(BaseOperator):
+"""
+Execute an R script or command
+
+:param r_command: The command or a reference to an R script (must have
+'.r' extension) to be executed (templated)
+:type r_command: string
+:param xcom_push: If xcom_push is True (default: False), the last line
+written to stdout will also be pushed to an XCom (key 'return_value')
+when the R command completes.
+:type xcom_push: bool
+:param output_encoding: encoding output from R (default: 'utf-8')
+:type output_encoding: string
+
+"""
+
+template_fields = ('r_command',)
+template_ext = ('.r', '.R')
+ui_color = '#C8D5E6'
+
+@apply_defaults
+def __init__(
+self,
+r_command,
+xcom_push=False,
+output_encoding='utf-8',
+*args, **kwargs):
+
+super(ROperator, self).__init__(*args, **kwargs)
+self.r_command = r_command
+self.xcom_push = xcom_push
+self.output_encoding = output_encoding
+
+def execute(self, context):
+"""
+Execute the R command or script in a temporary directory
+"""
+
+with TemporaryDirectory(prefix='airflowtmp') as tmp_dir:
+with NamedTemporaryFile(dir=tmp_dir, prefix=self.task_id) as f:
+
+f.write(bytes(self.r_command, 'utf_8'))
+f.flush()
+fname = f.name
+script_location = os.path.abspath(fname)
+
+self.log.info("Temporary script location: %s", script_location)
+self.log.info("Running command(s):\n%s", self.r_command)
+
+try:
+res = robjects.r.source(fname, echo=False)
+except RRuntimeError as e:
+self.log.error("Received R error: %s", e)
+res = None
+
+if self.xcom_push and res:
+# This will be a pickled rpy2.robjects.vectors.ListVector
+self.log.info('Pushing last line of output to Xcom: \n %s', res)
+return res
diff --git a/tests/contrib/operators/test_r_operator.py 
b/tests/contrib/operators/test_r_operator.py
new file mode 100644
index 00..13205d9c1e
--- /dev/null
+++ b/tests/contrib/operators/test_r_operator.py
@@ -0,0 +1,171 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import print_function, unicode_literals
+
+import os
+import unittest
+
+from airflow import configuration, DAG
+from airflow.contrib.operators.r_operator import ROperator
+from airflow.models import TaskInstance
+from airflow.utils import timezone
+
+
+DEFAULT_DATE = timezone.datetime(2016, 1, 1)
+
+
+class 
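A brief usage sketch of the proposed contrib ROperator; the PR was closed unmerged, and the dag object and task id below are hypothetical:

from airflow.contrib.operators.r_operator import ROperator

summarise_cars = ROperator(
    task_id='summarise_cars',
    r_command='print(summary(cars))',   # inline R; a templated path ending in .r/.R also works
    xcom_push=True,                     # return the resulting ListVector so it lands in XCom
    dag=dag)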

[jira] [Commented] (AIRFLOW-1310) Kubernetes execute operator

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723665#comment-16723665
 ] 

ASF GitHub Bot commented on AIRFLOW-1310:
-

stale[bot] closed pull request #2456: [AIRFLOW-1310] Basic operator to run docker container on Kubernetes
URL: https://github.com/apache/incubator-airflow/pull/2456
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/kubernetes_hook.py 
b/airflow/contrib/hooks/kubernetes_hook.py
new file mode 100644
index 00..fbabf81966
--- /dev/null
+++ b/airflow/contrib/hooks/kubernetes_hook.py
@@ -0,0 +1,334 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import requests
+import json
+
+from airflow.exceptions import AirflowException
+from airflow.hooks.base_hook import BaseHook
+
+from kubernetes import client, config
+
+class KubernetesHook(BaseHook):
+"""
+Kubernetes interaction hook
+
+:param k8s_conn_id: reference to a pre-defined K8s Connection
+:type k8s_conn_id: string
+"""
+
+def __init__(self, k8s_conn_id="k8s_default"):
+self.conn_id = k8s_conn_id
+self.core_client = None
+
+def get_conn(self):
+"""
+Initializes the api client. Only config file or env
+configuration supported at the moment.
+"""
+if not self.core_client:
+config.load_kube_config()
+self.core_client = client.CoreV1Api()
+
+return self.core_client
+
+def get_env_definitions(self, env):
+def get_env(name, definition):
+if isinstance(definition, str):
+return client.V1EnvVar(name=name, value=definition)
+elif isinstance(definition, dict):
+source = definition['source']
+if source == 'configMap':
+return client.V1EnvVar(name=name,
+value_from=client.V1EnvVarSource(
+config_map_key_ref=client.V1ConfigMapKeySelector(
+key=definition['key'], name=definition['name'])))
+elif source == 'secret':
+return client.V1EnvVar(name=name,
+value_from=client.V1EnvVarSource(
+secret_key_ref=client.V1SecretKeySelector(
+key=definition['key'], name=definition['name'])))
+else:
+raise AirflowException('Creating env vars from %s not implemented',
+source)
+else:
+raise AirflowException('Environment variable definition \
+has to be either string or a dictionary. %s given instead',
+type(definition))
+
+return [get_env(name, definition) for name, definition in env.items()]
+
+def get_env_from_definitions(self, env_from):
+def get_env_from(definition):
+configmap = definition.get('configMap')
+secret = definition.get('secret')
+prefix = definition.get('prefix')
+
+cfg_ref = client.V1ConfigMapEnvSource(name=configmap) if configmap else None
+secret_ref = client.V1SecretEnvSource(name=secret) if secret else None
+
+return client.V1EnvFromSource(
+config_map_ref=cfg_ref,
+secret_ref=secret_ref,
+prefix=prefix
+)
+return [get_env_from(definition) for definition in env_from]
+
+def get_volume_definitions(self, volumes):
+def get_volume(name, definition):
+if definition['type'] == 'emptyDir':
+volume = client.V1Volume(
+name=name,
+empty_dir=client.V1EmptyDirVolumeSource()
+)
+volume_mount = client.V1VolumeMount(
+mount_path=definition['mountPath'],
+name=name
+)
+elif definition['type'] == 'hostPath':
+volume = client.V1Volume(
+name=name,
+
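For illustration, a sketch of the environment-variable definitions that get_env_definitions() above accepts; the names and keys are hypothetical:

env = {
    'PLAIN_VALUE': 'some-value',
    'FROM_CONFIGMAP': {'source': 'configMap', 'name': 'app-config', 'key': 'log_level'},
    'FROM_SECRET': {'source': 'secret', 'name': 'app-secrets', 'key': 'db_password'},
}

hook = KubernetesHook(k8s_conn_id='k8s_default')
env_vars = hook.get_env_definitions(env)   # list of kubernetes.client.V1EnvVar objects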

[jira] [Commented] (AIRFLOW-1096) Add conn_ids to template_fields in PostgreSQL and MySQL operators

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723663#comment-16723663
 ] 

ASF GitHub Bot commented on AIRFLOW-1096:
-

stale[bot] closed pull request #2235: [AIRFLOW-1096] Add conn_ids to template_fields
URL: https://github.com/apache/incubator-airflow/pull/2235
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/mysql_operator.py 
b/airflow/operators/mysql_operator.py
index 156ada8e90..9c93910385 100644
--- a/airflow/operators/mysql_operator.py
+++ b/airflow/operators/mysql_operator.py
@@ -33,7 +33,7 @@ class MySqlOperator(BaseOperator):
 :type database: string
 """
 
-template_fields = ('sql',)
+template_fields = ('sql', 'mysql_conn_id',)
 template_ext = ('.sql',)
 ui_color = '#ededed'
 
diff --git a/airflow/operators/postgres_operator.py 
b/airflow/operators/postgres_operator.py
index 0de5aa53cd..0b9d5556a4 100644
--- a/airflow/operators/postgres_operator.py
+++ b/airflow/operators/postgres_operator.py
@@ -33,7 +33,7 @@ class PostgresOperator(BaseOperator):
 :type database: string
 """
 
-template_fields = ('sql',)
+template_fields = ('sql', 'postgres_conn_id',)
 template_ext = ('.sql',)
 ui_color = '#ededed'
 
diff --git a/tests/operators/operators.py b/tests/operators/operators.py
index 62bc4bf80e..9930f6ef6a 100644
--- a/tests/operators/operators.py
+++ b/tests/operators/operators.py
@@ -89,6 +89,16 @@ def mysql_hook_test_bulk_load(self):
 results = tuple(result[0] for result in c.fetchall())
 self.assertEqual(sorted(results), sorted(records))
 
+def test_mysql_conn_id_template(self):
+conn = 'airflow_db'
+
+import airflow.operators.mysql_operator
+t = operators.mysql_operator.MySqlOperator(
+task_id='test_mysql_conn_id_template',
+mysql_conn_id='{{ conn }}',
+sql='SELECT count(1) FROM INFORMATION_SCHEMA.TABLES',
+dag=self.dag)
+
 def test_mysql_to_mysql(self):
 sql = "SELECT * FROM INFORMATION_SCHEMA.TABLES LIMIT 100;"
 import airflow.operators.generic_transfer
@@ -174,6 +184,16 @@ def postgres_operator_test_multi(self):
 t = operators.postgres_operator.PostgresOperator(
 task_id='postgres_operator_test_multi', sql=sql, dag=self.dag)
 t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)
+
+def test_postgres_conn_id_template(self):
+conn = 'postgres_default'
+
+import airflow.operators.postgres_operator
+t = operators.postgres_operator.PostgresOperator(
+task_id='test_postgres_conn_id_template', 
+postgres_conn_id='{{ conn }}',
+sql='SELECT count(1) FROM INFORMATION_SCHEMA.TABLES', 
+dag=self.dag)
 
 def test_postgres_to_postgres(self):
 sql = "SELECT * FROM INFORMATION_SCHEMA.TABLES LIMIT 100;"


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add conn_ids to template_fields in PostgreSQL and MySQL operators
> -
>
> Key: AIRFLOW-1096
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1096
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Nicholas Duffy
>Assignee: Nicholas Duffy
>Priority: Minor
>
> As an Airflow developer, I would like to have the `postgres_conn_id` field on 
> the PostgresOperator and `mysql_conn_id` field on the MysqlOperator 
> templated, so that I can pass in dynamic connection ID names.
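A short sketch of what the templated connection id would allow; the variable and connection names are hypothetical, and mysql_conn_id would be rendered like any other templated field:

run_report = MySqlOperator(
    task_id='run_report',
    mysql_conn_id='{{ var.value.report_mysql_conn }}',   # resolved at runtime per environment
    sql='SELECT COUNT(1) FROM INFORMATION_SCHEMA.TABLES',
    dag=dag)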



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-675) Add an error log to the UI of Airflow.

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723654#comment-16723654
 ] 

ASF GitHub Bot commented on AIRFLOW-675:


stale[bot] closed pull request #1922: [AIRFLOW-675] Add error log to UI
URL: https://github.com/apache/incubator-airflow/pull/1922
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/www/app.py b/airflow/www/app.py
index c2c180ac9c..5df479e48a 100644
--- a/airflow/www/app.py
+++ b/airflow/www/app.py
@@ -88,6 +88,8 @@ def create_app(config=None, testing=False):
 models.Pool, Session, name="Pools", category="Admin"))
 av(vs.ConfigurationView(
 name='Configuration', category="Admin"))
+av(vs.ErrorLogView(
+name='Error Log', category="Admin"))
 av(vs.UserModelView(
 models.User, Session, name="Users", category="Admin"))
 av(vs.ConnectionModelView(
diff --git a/airflow/www/templates/airflow/log.html 
b/airflow/www/templates/airflow/log.html
new file mode 100644
index 00..50950f3ca7
--- /dev/null
+++ b/airflow/www/templates/airflow/log.html
@@ -0,0 +1,39 @@
+{#
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+
+#}
+{% extends "airflow/master.html" %}
+
+{% block title %}
+{{ title }}
+{% endblock %}
+
+{% block body %}
+{{ super() }}
+{{ title }}
+
+{% if subtitle %}
+{{ subtitle }}
+{% endif %}
+
+{% if post_subtitle %}
+{{ post_subtitle }}
+{% endif %}
+
+{% if log %}
+{{ log }}
+{% endif %}
+{% endblock %}
diff --git a/airflow/www/views.py b/airflow/www/views.py
index d22e8e4b3f..8ffd70f912 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -2620,6 +2620,36 @@ def conf(self):
 table=table)
 
 
+class ErrorLogView(wwwutils.SuperUserMixin, BaseView):
+@expose('/')
+def error_log(self):
+raw = request.args.get('raw') == "true"
+title = "Airflow Error Log"
+log_base = os.path.expanduser(conf.get('core', 'BASE_LOG_FOLDER'))
+error_log_path = os.path.normpath(log_base + "/error.log")
+subtitle = "Error log path: " + error_log_path
+post_subtitle = None
+error_log = None
+
+if not (os.path.isfile(error_log_path)):
+post_subtitle = "The error log file does not exist."
+elif os.stat(error_log_path).st_size == 0:
+post_subtitle = "The error log file is empty."
+else:
+with open(error_log_path, 'r') as f:
+error_log = f.read()
+
+if raw:
+return Response(
+response=error_log,
+status=200,
+mimetype="application/text")
+else:
+return self.render(
+'airflow/log.html', log=error_log, title=title, subtitle=subtitle,
+post_subtitle=post_subtitle)
+
+
 class DagModelView(wwwutils.SuperUserMixin, ModelView):
 column_list = ('dag_id', 'owners')
 column_editable_list = ('is_paused',)
diff --git a/tests/core.py b/tests/core.py
index 24315f186b..390639237a 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -1397,6 +1397,10 @@ def test_dag_views(self):
 '/admin/configurationview/')
 assert "Airflow Configuration" in response.data.decode('utf-8')
 assert "Running Configuration" in response.data.decode('utf-8')
+response = self.app.get(
+'/admin/errorlogview/')
+assert "Airflow Error Log" in response.data.decode('utf-8')
+assert "Error log path" in response.data.decode('utf-8')
 response = self.app.get(
 '/admin/airflow/rendered?'
'task_id=runme_1&dag_id=example_bash_operator&'


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific 

[jira] [Commented] (AIRFLOW-1044) Run from UI with Ignore Task Debs not working

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723660#comment-16723660
 ] 

ASF GitHub Bot commented on AIRFLOW-1044:
-

stale[bot] closed pull request #2192: [AIRFLOW-1044] base_task_runner fix for Task Run Ignoring dependencies
URL: https://github.com/apache/incubator-airflow/pull/2192
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/task_runner/base_task_runner.py 
b/airflow/task_runner/base_task_runner.py
index 51c382561b..05b41ad223 100644
--- a/airflow/task_runner/base_task_runner.py
+++ b/airflow/task_runner/base_task_runner.py
@@ -78,6 +78,7 @@ def __init__(self, local_task_job):
 raw=True,
 ignore_all_deps=local_task_job.ignore_all_deps,
 ignore_depends_on_past=local_task_job.ignore_depends_on_past,
+ignore_task_deps=local_task_job.ignore_task_deps,
 ignore_ti_state=local_task_job.ignore_ti_state,
 pickle_id=local_task_job.pickle_id,
 mark_success=local_task_job.mark_success,


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Run from UI with Ignore Task Debs not working
> -
>
> Key: AIRFLOW-1044
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1044
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Muthuraj Ramasamy
>Assignee: Muthuraj Ramasamy
>Priority: Minor
>
> When "Run" is performed on a task with Ignore Task Debs from UI, its not 
> passed to the command formation and not getting started. The reason is 
> ignore_task_deps property is not set in the init defintion of BaseTaskRunner 
> [base_task_runner.py].
> Current:
>self._command = popen_prepend + 
> self._task_instance.command_as_list(
> raw=True,
> ignore_all_deps=local_task_job.ignore_all_deps,
> ignore_depends_on_past=local_task_job.ignore_depends_on_past,
> ignore_ti_state=local_task_job.ignore_ti_state,
> pickle_id=local_task_job.pickle_id,
> mark_success=local_task_job.mark_success,
> job_id=local_task_job.id,
> pool=local_task_job.pool,
> cfg_path=cfg_path,
> )
> Fixed:
> self._command = popen_prepend + 
> self._task_instance.command_as_list(
> raw=True,
> ignore_all_deps=local_task_job.ignore_all_deps,
> ignore_depends_on_past=local_task_job.ignore_depends_on_past,
> ignore_task_deps=local_task_job.ignore_task_deps,
> ignore_ti_state=local_task_job.ignore_ti_state,
> pickle_id=local_task_job.pickle_id,
> mark_success=local_task_job.mark_success,
> job_id=local_task_job.id,
> pool=local_task_job.pool,
> cfg_path=cfg_path,
> )



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-920) Can't mark non-existent tasks as successful from graph view

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723658#comment-16723658
 ] 

ASF GitHub Bot commented on AIRFLOW-920:


stale[bot] closed pull request #2113: [AIRFLOW-920] Allow marking tasks in zoomed in subdags
URL: https://github.com/apache/incubator-airflow/pull/2113
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/api/common/experimental/mark_tasks.py 
b/airflow/api/common/experimental/mark_tasks.py
index 0ddbf987bf..938b856393 100644
--- a/airflow/api/common/experimental/mark_tasks.py
+++ b/airflow/api/common/experimental/mark_tasks.py
@@ -49,6 +49,72 @@ def _create_dagruns(dag, execution_dates, state, run_id_template):
 return drs
 
 
+def _verify_tree(dag, dates, state):
+"""
+Go through the tree of dags and create any missing dag runs for sub dags
+:param dag: top of the tree to start looking
+:param dates: dates for which to create the runs
+:return: list of confirmed dates for which dag runs existed
+"""
+root = dag
+while root.is_subdag:
+root = root.parent_dag
+
+confirmed_dates = []
+
+drs = DagRun.find(dag_id=root.dag_id, execution_date=dates)
+for dr in drs:
+dr.dag = root
+dr.verify_integrity()
+confirmed_dates.append(dr.execution_date)
+
+dags = [root]
+while len(dags) > 0:
+current_dag = dags.pop(0)
+for task_id in current_dag.task_ids:
+task = current_dag.get_task(task_id)
+if isinstance(task, SubDagOperator):
+_create_dagruns(task.subdag,
+execution_dates=confirmed_dates,
+state=state,
+run_id_template=BackfillJob.ID_FORMAT_PREFIX)
+dags.append(task.subdag)
+
+return confirmed_dates
+
+
+def _walk_subdags(dag, task_ids, execution_dates, state, commit, session):
+dags = [dag]
+sub_dag_ids = []
+while len(dags) > 0:
+current_dag = dags.pop()
+for task_id in task_ids:
+if not current_dag.has_task(task_id):
+continue
+
+current_task = current_dag.get_task(task_id)
+if isinstance(current_task, SubDagOperator):
+# this works as a kind of integrity check
+# it creates missing dag runs for subdagoperators,
+# maybe this should be moved to dagrun.verify_integrity
+drs = _create_dagruns(current_task.subdag,
+  execution_dates=execution_dates,
+  state=state,
+  run_id_template=BackfillJob.ID_FORMAT_PREFIX)
+
+for dr in drs:
+dr.dag = current_task.subdag
+dr.verify_integrity()
+if commit:
+dr.state = state
+session.merge(dr)
+
+dags.append(current_task.subdag)
+sub_dag_ids.append(current_task.subdag.dag_id)
+
+return sub_dag_ids
+
+
 def set_state(task, execution_date, upstream=False, downstream=False,
   future=False, past=False, state=State.SUCCESS, commit=False):
 """
@@ -78,7 +144,6 @@ def set_state(task, execution_date, upstream=False, downstream=False,
 dag = task.dag
 
 latest_execution_date = dag.latest_execution_date
-assert latest_execution_date is not None
 
 # determine date range of dag runs and tasks to consider
 end_date = latest_execution_date if future else execution_date
@@ -109,44 +174,15 @@ def set_state(task, execution_date, upstream=False, downstream=False,
 # verify the integrity of the dag runs in case a task was added or removed
 # set the confirmed execution dates as they might be different
 # from what was provided
-confirmed_dates = []
-drs = DagRun.find(dag_id=dag.dag_id, execution_date=dates)
-for dr in drs:
-dr.dag = dag
-dr.verify_integrity()
-confirmed_dates.append(dr.execution_date)
+confirmed_dates = _verify_tree(dag, dates, state=State.RUNNING)
 
 # go through subdagoperators and create dag runs. We will only work
-# within the scope of the subdag. We wont propagate to the parent dag,
-# but we will propagate from parent to subdag.
+# within the scope of the subdag.
 session = Session()
-dags = [dag]
-sub_dag_ids = []
-while len(dags) > 0:
-current_dag = dags.pop()
-for task_id in task_ids:
-if not current_dag.has_task(task_id):
-continue
 
-current_task = 

[jira] [Commented] (AIRFLOW-1076) Support getting variable by string in templates

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723661#comment-16723661
 ] 

ASF GitHub Bot commented on AIRFLOW-1076:
-

stale[bot] closed pull request #2223: [AIRFLOW-1076] Add get method for template variable accessor
URL: https://github.com/apache/incubator-airflow/pull/2223
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/models.py b/airflow/models.py
index edb3b67a40..75e2238c67 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -1699,20 +1699,31 @@ def get_template_context(self, session=None):
 
 class VariableAccessor:
 """
-Wrapper around Variable. This way you can get variables in templates by using
-{var.variable_name}.
+Wrapper around Variable. This way you can get variables in
+templates by using {{ var.value.variable_name }} or
+{{ var.value.get('variable_name', 'backup') }}.
 """
 def __init__(self):
 self.var = None
 
-def __getattr__(self, item):
+def __getattr__(self, item, default_var=None):
 self.var = Variable.get(item)
 return self.var
 
 def __repr__(self):
 return str(self.var)
 
+@staticmethod
+def get(item, default_var=None):
+self.var = Variable.get(item, default_var=default_var)
+return self.var
+
 class VariableJsonAccessor:
+"""
+Wrapper around Variable. This way you can get variables in
+templates by using {{ var.json.variable_name }} or
+{{ var.json.get('variable_name', 'backup') }}.
+"""
 def __init__(self):
 self.var = None
 
@@ -1723,6 +1734,12 @@ def __getattr__(self, item):
 def __repr__(self):
 return str(self.var)
 
+@staticmethod
+def get(item, default_var=None):
+self.var = Variable.get(item, default_var=default_var,
+deserialize_json=True)
+return self.var
+
 return {
 'dag': task.dag,
 'ds': ds,
diff --git a/docs/code.rst b/docs/code.rst
index d74d00ec56..ef5fa1ab2a 100644
--- a/docs/code.rst
+++ b/docs/code.rst
@@ -193,6 +193,11 @@ UI. You can access them as either plain-text or JSON. If you use JSON, you are
 also able to walk nested structures, such as dictionaries like:
 ``{{ var.json.my_dict_var.key1 }}``
 
+It is also possible to fetch a variable by string if
+needed with ``{{ var.value.get('my_var', 'fallback') }}`` or
+``{{ var.json.get('my_dict_var', {'key1': 'val1'}).key1 }}``. Defaults can be
+supplied in case the variable does not exist.
+
 Macros
 ''
 Macros are a way to expose objects to your templates and live under the
diff --git a/tests/core.py b/tests/core.py
index f25d0e7ff2..6322a9fd05 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -520,7 +520,54 @@ def verify_templated_field(context):
 some_templated_field='{{ var.value.a_variable }}',
 on_success_callback=verify_templated_field,
 dag=self.dag)
-t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)
+t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE,
+  ignore_ti_state=True)
+self.assertTrue(val['success'])
+
+def test_template_with_variable_get(self):
+"""
+Test the availability of variables in templates using get() method
+"""
+val = {
+'success': False,
+'test_value': 'a test value'
+}
+Variable.set('a_variable', val['test_value'])
+
+def verify_templated_field(context):
+self.assertEqual(context['ti'].task.some_templated_field,
+ val['test_value'])
+val['success'] = True
+
+t = OperatorSubclass(
+task_id='test_complex_template',
+some_templated_field='{{ var.value.get("a_variable") }}',
+on_success_callback=verify_templated_field,
+dag=self.dag)
+t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE,
+  ignore_ti_state=True)
+self.assertTrue(val['success'])
+
+def test_template_with_variable_get_with_default(self):
+"""
+Test the availability of variables in templates using get() method with
+a default value
+"""
+val = {
+'success': False,
+}
+
+def verify_templated_field(context):
+

[jira] [Commented] (AIRFLOW-811) Bash_operator dont read multiline output

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723657#comment-16723657
 ] 

ASF GitHub Bot commented on AIRFLOW-811:


stale[bot] closed pull request #2026: [AIRFLOW-811] [BugFix] bash_operator does not return full output
URL: https://github.com/apache/incubator-airflow/pull/2026
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/bash_operator.py 
b/airflow/operators/bash_operator.py
index ff2ed51b96..ebba0ee07e 100644
--- a/airflow/operators/bash_operator.py
+++ b/airflow/operators/bash_operator.py
@@ -13,11 +13,15 @@
 # limitations under the License.
 
 
-from builtins import bytes
+import logging
+import mmap
 import os
+import re
 import signal
+import io
 from subprocess import Popen, STDOUT, PIPE
 from tempfile import gettempdir, NamedTemporaryFile
+from builtins import bytes
 
 from airflow.exceptions import AirflowException
 from airflow.models import BaseOperator
@@ -52,60 +56,90 @@ def __init__(
 bash_command,
 xcom_push=False,
 env=None,
+log_output=True,
 output_encoding='utf-8',
+output_regex_filter=None,
 *args, **kwargs):
 
 super(BashOperator, self).__init__(*args, **kwargs)
 self.bash_command = bash_command
 self.env = env
-self.xcom_push_flag = xcom_push
+self.xcom_push = xcom_push
+self.log_output = log_output
 self.output_encoding = output_encoding
+self.output_regex_filter = output_regex_filter
+self.sp = None
 
 def execute(self, context):
 """
 Execute the bash command in a temporary directory
 which will be cleaned afterwards
 """
-bash_command = self.bash_command
-self.log.info("Tmp dir root location: \n %s", gettempdir())
 with TemporaryDirectory(prefix='airflowtmp') as tmp_dir:
-with NamedTemporaryFile(dir=tmp_dir, prefix=self.task_id) as f:
-
-f.write(bytes(bash_command, 'utf_8'))
-f.flush()
-fname = f.name
+self.log.info("Tmp dir root location: {0}".format(tmp_dir))
+with NamedTemporaryFile(dir=tmp_dir, prefix=self.task_id) as cmd_file, \
+NamedTemporaryFile(dir=tmp_dir, prefix=self.task_id) as stdout_file, \
+NamedTemporaryFile(dir=tmp_dir, prefix=self.task_id) as stderr_file:
+
+cmd_file.write(bytes(self.bash_command, 'utf_8'))
+cmd_file.flush()
+fname = cmd_file.name
 script_location = tmp_dir + "/" + fname
-self.log.info(
-"Temporary script location: %s",
-script_location
-)
-self.log.info("Running command: %s", bash_command)
-sp = Popen(
-['bash', fname],
-stdout=PIPE, stderr=STDOUT,
-cwd=tmp_dir, env=self.env,
-preexec_fn=os.setsid)
-
-self.sp = sp
-
-self.log.info("Output:")
-line = ''
-for line in iter(sp.stdout.readline, b''):
-line = line.decode(self.output_encoding).strip()
-self.log.info(line)
-sp.wait()
-self.log.info(
-"Command exited with return code %s",
-sp.returncode
-)
-
-if sp.returncode:
-raise AirflowException("Bash command failed")
-
-if self.xcom_push_flag:
-return line
+logging.info("Temporary script location 
:{0}".format(script_location))
+logging.info("Running command: " + self.bash_command)
+self.sp = Popen(
+['bash', fname],
+stdout=stdout_file,
+stderr=stderr_file,
+cwd=tmp_dir,
+env=self.env,
+preexec_fn=os.setsid)
+
+self.sp.wait()
+
+exit_msg = "Command exited with return code 
{0}".format(self.sp.returncode)
+if self.sp.returncode:
+stderr_output = None
+with io.open(stderr_file.name, 'r+', encoding=self.output_encoding) as stderr_file_handle:
+if os.path.getsize(stderr_file.name) > 0:
+stderr_output = mmap.mmap(stderr_file_handle.fileno(), 0, access=mmap.ACCESS_READ)
+raise AirflowException("Bash 

[jira] [Commented] (AIRFLOW-765) Auto detect dag dependency files, variables, and resource file changes and reload dag

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723656#comment-16723656
 ] 

ASF GitHub Bot commented on AIRFLOW-765:


stale[bot] closed pull request #2015: [AIRFLOW-765] Auto detect dag dependency files, variables, and resour…
URL: https://github.com/apache/incubator-airflow/pull/2015
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.rat-excludes b/.rat-excludes
index 1238abb0a0..bdcb289891 100644
--- a/.rat-excludes
+++ b/.rat-excludes
@@ -24,4 +24,4 @@ CHANGELOG.txt
 # it is compatible according to 
http://www.apache.org/legal/resolved.html#category-a
 kerberos_auth.py
 airflow_api_auth_backend_kerberos_auth_py.html
-
+autoreload.json
diff --git a/airflow/example_dags/config/autoreload.json 
b/airflow/example_dags/config/autoreload.json
new file mode 100644
index 00..587719bd9c
--- /dev/null
+++ b/airflow/example_dags/config/autoreload.json
@@ -0,0 +1,4 @@
+{
+  "dag_owner": "amin",
+  "schedule_interval": "*/1 * * * *"
+}
\ No newline at end of file
diff --git a/airflow/example_dags/example_autoreload.py 
b/airflow/example_dags/example_autoreload.py
new file mode 100644
index 00..74f38de978
--- /dev/null
+++ b/airflow/example_dags/example_autoreload.py
@@ -0,0 +1,61 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import print_function
+
+import json
+import os
+from datetime import datetime, timedelta
+
+from airflow.models import DAG, Variable
+from airflow.operators.dummy_operator import DummyOperator
+
+config_path = os.path.join(
+os.path.dirname(os.path.abspath(__file__)),
+'config'
+)
+
+# these files will be watched for changes
+__resource_file_selectors__ = [
+(config_path, ('.json'))
+]
+
+with open(os.path.join(config_path, 'autoreload.json')) as config_data:
+config_object = json.load(config_data)
+
+# instances of MaterializedVariable will be watched for changes and
+# trigger autoreload of the DAG
+number_of_days_back_var = Variable.materialize(
+'number_of_days_back',
+default_value=7
+)
+
+number_of_days_back = datetime.combine(
+datetime.today() -
+timedelta(int(number_of_days_back_var.val)), datetime.min.time()
+)
+
+args = {
+'owner': config_object['dag_owner'],
+'start_date': number_of_days_back,
+}
+
+dag = DAG(
+dag_id='example_autoreload',
+default_args=args,
+schedule_interval=config_object['schedule_interval']
+)
+
+run_this = DummyOperator(
+task_id='dummy_operator',
+dag=dag)
diff --git 
a/airflow/migrations/versions/5986598b22a9_add_last_updated_to_variable.py 
b/airflow/migrations/versions/5986598b22a9_add_last_updated_to_variable.py
new file mode 100644
index 00..4bdb59bab0
--- /dev/null
+++ b/airflow/migrations/versions/5986598b22a9_add_last_updated_to_variable.py
@@ -0,0 +1,40 @@
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Add Last Updated to Variable
+
+Revision ID: 5986598b22a9
+Revises: 5e7d17757c7a
+Create Date: 2017-01-17 23:22:10.142592
+
+"""
+
+# revision identifiers, used by Alembic.
+revision = '5986598b22a9'
+down_revision = '5e7d17757c7a'
+branch_labels = None
+depends_on = None
+
+import sqlalchemy as sa
+from alembic import op
+
+
+def upgrade():
+op.add_column('variable', sa.Column('last_updated',
+sa.DateTime(),
+default=sa.func.now(),
+onupdate=sa.func.now()))
+
+
+def downgrade():
+op.drop_column('variable', 'last_updated')
diff --git a/airflow/models.py b/airflow/models.py
index aab4833ae8..9fc134f11a 100755
--- a/airflow/models.py
+++ 

[jira] [Commented] (AIRFLOW-697) Ability to exclude a task in a Dag run.

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723655#comment-16723655
 ] 

ASF GitHub Bot commented on AIRFLOW-697:


stale[bot] closed pull request #1942: [AIRFLOW-697] Add exclusion of tasks.
URL: https://github.com/apache/incubator-airflow/pull/1942
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/jobs.py b/airflow/jobs.py
index a2d94e30cd..7321025cae 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -43,7 +43,7 @@
 from airflow import executors, models, settings
 from airflow import configuration as conf
 from airflow.exceptions import AirflowException
-from airflow.models import DagRun
+from airflow.models import DagRun, TaskExclusion
 from airflow.settings import Stats
 from airflow.ti_deps.dep_context import DepContext, QUEUE_DEPS, RUN_DEPS
 from airflow.utils.state import State
@@ -844,8 +844,15 @@ def _process_task_instances(self, dag, queue):
 if ti.are_dependencies_met(
 dep_context=DepContext(flag_upstream_failed=True),
 session=session):
-self.logger.debug('Queuing task: {}'.format(ti))
-queue.append(ti.key)
+if TaskExclusion.should_exclude_task(
+dag_id=ti.dag_id,
+task_id=ti.task_id,
+execution_date=ti.execution_date):
+self.logger.debug('Excluding task: {}'.format(ti))
+ti.set_state(State.EXCLUDED, session)
+else:
+self.logger.debug('Queuing task: {}'.format(ti))
+queue.append(ti.key)
 
 session.close()
 
@@ -1733,7 +1740,7 @@ def get_task_instances_for_dag_run(dag_run):
   .format(ti, ti.state))
 # The task was already marked successful or skipped by a
 # different Job. Don't rerun it.
-if ti.state == State.SUCCESS:
+if ti.state_for_dependents() == State.SUCCESS:
 succeeded.add(key)
 self.logger.debug("Task instance {} succeeded. "
   "Don't rerun.".format(ti))
@@ -1831,7 +1838,7 @@ def get_task_instances_for_dag_run(dag_run):
 elif state == State.SUCCESS:
 
 # task reports success
-if ti.state == State.SUCCESS:
+if ti.state_for_dependents() == State.SUCCESS:
 self.logger.info(
 'Task instance {} succeeded'.format(ti))
 succeeded.add(key)
diff --git 
a/airflow/migrations/versions/bbb79aef5cac_create_task_exclusion_table.py 
b/airflow/migrations/versions/bbb79aef5cac_create_task_exclusion_table.py
new file mode 100644
index 00..85e957ea72
--- /dev/null
+++ b/airflow/migrations/versions/bbb79aef5cac_create_task_exclusion_table.py
@@ -0,0 +1,62 @@
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""create task_exclusion table
+
+Revision ID: bbb79aef5cac
+Revises: f2ca10b85618
+Create Date: 2016-11-18 13:38:34.653202
+
+"""
+
+# revision identifiers, used by Alembic.
+revision = 'bbb79aef5cac'
+down_revision = 'f2ca10b85618'
+branch_labels = None
+depends_on = None
+
+from alembic import op, context
+import sqlalchemy as sa
+from sqlalchemy.dialects import mysql
+
+
+def upgrade():
+if context.config.get_main_option('sqlalchemy.url').startswith('mysql'):
+op.create_table(
+'task_exclusion',
+sa.Column('id', sa.Integer(), nullable=False),
+sa.Column('dag_id', sa.String(length=250), nullable=False),
+sa.Column('task_id', sa.String(length=250), nullable=False),
+sa.Column('exclusion_type', sa.String(length=32), nullable=False),
+sa.Column('exclusion_start_date', mysql.DATETIME(fsp=6),
+  nullable=False),
+sa.Column('exclusion_end_date', mysql.DATETIME(fsp=6),
+  nullable=False),
+   

[jira] [Commented] (AIRFLOW-659) Automatic Refresh on DAG Graph View

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723653#comment-16723653
 ] 

ASF GitHub Bot commented on AIRFLOW-659:


stale[bot] closed pull request #1910: [AIRFLOW-659] Automatic Refresh on DAG 
Graph View
URL: https://github.com/apache/incubator-airflow/pull/1910
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/configuration.py b/airflow/configuration.py
index 265f7289ea..7d17451100 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -261,6 +261,11 @@ def run_command(command):
 # privacy.
 demo_mode = False
 
+# Rate at which to automatically refresh the task states in the graph view in
+# milliseconds. If not set or set to 0, the graph will require refreshing
+# manually. Otherwise the manual refresh button will not be displayed.
+graph_refresh_rate = 0
+
 # The amount of time (in secs) webserver will wait for initial handshake
 # while fetching logs from other worker machine
 log_fetch_timeout_sec = 5
diff --git a/airflow/www/templates/airflow/graph.html 
b/airflow/www/templates/airflow/graph.html
index 24fc508027..72efaa6747 100644
--- a/airflow/www/templates/airflow/graph.html
+++ b/airflow/www/templates/airflow/graph.html
@@ -1,13 +1,13 @@
-{# 
+{#
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
-  
+
 http://www.apache.org/licenses/LICENSE-2.0
-  
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -76,9 +76,11 @@
 Oops.
 
 
+{% if refresh_rate|int == 0 %}
 
 
 
+{% endif %}
 
 
 
@@ -345,25 +347,29 @@
   $('#datatable_section').hide(1000);
 }
 
-d3.select("#refresh_button").on("click",
-function() {
-$("#loading").css("display", "block");
-$("div#svg_container").css("opacity", "0.2");
-$.get(
-"/admin/airflow/object/task_instances",
-{dag_id : "{{ dag.dag_id }}", execution_date : "{{ 
execution_date }}"})
-.done(
-function(task_instances) {
-update_nodes_states(JSON.parse(task_instances));
-$("#loading").hide();
-$("div#svg_container").css("opacity", "1");
-$('#error').hide();
-}
-).fail(function(jqxhr, textStatus, err) {
-error(textStatus + ': ' + err);
-});
-}
-);
+function refreshGraph() {
+$("#loading").css("display", "block");
+$("div#svg_container").css("opacity", "0.2");
+$.get(
+"/admin/airflow/object/task_instances",
+{dag_id : "{{ dag.dag_id }}", execution_date : "{{ execution_date 
}}"})
+.done(
+function(task_instances) {
+update_nodes_states(JSON.parse(task_instances));
+$("#loading").hide();
+$("div#svg_container").css("opacity", "1");
+$('#error').hide();
+}
+).fail(function(jqxhr, textStatus, err) {
+error(textStatus + ': ' + err);
+});
+}
+
+{% if refresh_rate|int > 0 %}
+window.setInterval(refreshGraph, {{ refresh_rate }})
+{% else %}
+d3.select("#refresh_button").on("click", refreshGraph);
+{% endif %}
 
 
 
diff --git a/airflow/www/views.py b/airflow/www/views.py
index d22e8e4b3f..cd4dd88403 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -1416,6 +1416,10 @@ class GraphForm(Form):
 session.close()
 doc_md = markdown.markdown(dag.doc_md) if hasattr(dag, 'doc_md') else 
''
 
+refresh_rate = int(conf.get('webserver', 'graph_refresh_rate'))
+if not refresh_rate:
+refresh_rate = 0
+
 return self.render(
 'airflow/graph.html',
 dag=dag,
@@ -1435,7 +1439,8 @@ class GraphForm(Form):
 task_instances=json.dumps(task_instances, indent=2),
 tasks=json.dumps(tasks, indent=2),
 nodes=json.dumps(nodes, indent=2),
-edges=json.dumps(edges, indent=2),)
+edges=json.dumps(edges, indent=2),
+refresh_rate=refresh_rate)
 
 

[jira] [Commented] (AIRFLOW-946) Virtualenv not explicitly used by webserver/worker subprocess

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723659#comment-16723659
 ] 

ASF GitHub Bot commented on AIRFLOW-946:


stale[bot] closed pull request #2131: [AIRFLOW-946] call commands with 
virtualenv if available
URL: https://github.com/apache/incubator-airflow/pull/2131
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index eb96e77537..0119a3e861 100755
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -56,6 +56,7 @@
 from airflow.ti_deps.dep_context import (DepContext, SCHEDULER_DEPS)
 from airflow.utils import cli as cli_utils
 from airflow.utils import db as db_utils
+from airflow.utils.file import use_virtualenv
 from airflow.utils.net import get_hostname
 from airflow.utils.log.logging_mixin import (LoggingMixin, redirect_stderr,
  redirect_stdout)
@@ -776,7 +777,7 @@ def webserver(args):
 '''.format(**locals())))
 
 run_args = [
-'gunicorn',
+use_virtualenv('gunicorn'),
 '-w', str(num_workers),
 '-k', str(args.workerclass),
 '-t', str(worker_timeout),
@@ -802,6 +803,7 @@ def webserver(args):
 run_args += ["airflow." + webserver_module + ".app:cached_app()"]
 
 gunicorn_master_proc = None
+env = os.environ.copy()
 
 def kill_proc(dummy_signum, dummy_frame):
 gunicorn_master_proc.terminate()
@@ -830,7 +832,7 @@ def monitor_gunicorn(gunicorn_master_proc):
 },
 )
 with ctx:
-subprocess.Popen(run_args, close_fds=True)
+subprocess.Popen(run_args, env=env, close_fds=True)
 
 # Reading pid file directly, since Popen#pid doesn't
 # seem to return the right value with DaemonContext.
@@ -849,7 +851,7 @@ def monitor_gunicorn(gunicorn_master_proc):
 stdout.close()
 stderr.close()
 else:
-gunicorn_master_proc = subprocess.Popen(run_args, close_fds=True)
+gunicorn_master_proc = subprocess.Popen(run_args, env=env, 
close_fds=True)
 
 signal.signal(signal.SIGINT, kill_proc)
 signal.signal(signal.SIGTERM, kill_proc)
@@ -943,7 +945,8 @@ def worker(args):
 stderr=stderr,
 )
 with ctx:
-sp = subprocess.Popen(['airflow', 'serve_logs'], env=env, 
close_fds=True)
+sp = subprocess.Popen(use_virtualenv(['airflow', 'serve_logs']), 
env=env,
+  close_fds=True)
 worker.run(**options)
 sp.kill()
 
@@ -953,7 +956,8 @@ def worker(args):
 signal.signal(signal.SIGINT, sigint_handler)
 signal.signal(signal.SIGTERM, sigint_handler)
 
-sp = subprocess.Popen(['airflow', 'serve_logs'], env=env, 
close_fds=True)
+sp = subprocess.Popen(use_virtualenv(['airflow', 'serve_logs']), 
env=env,
+  close_fds=True)
 
 worker.run(**options)
 sp.kill()
@@ -1143,7 +1147,8 @@ def flower(args):
 flower_conf = '--conf=' + args.flower_conf
 
 if args.daemon:
-pid, stdout, stderr, log_file = setup_locations("flower", args.pid, 
args.stdout, args.stderr, args.log_file)
+pid, stdout, stderr, log_file = setup_locations(
+"flower", args.pid, args.stdout, args.stderr, args.log_file)
 stdout = open(stdout, 'w+')
 stderr = open(stderr, 'w+')
 
@@ -1154,8 +1159,8 @@ def flower(args):
 )
 
 with ctx:
-os.execvp("flower", ['flower', '-b',
- broka, address, port, api, flower_conf, 
url_prefix])
+os.execvp(use_virtualenv('flower'),
+  ['flower', '-b', broka, address, port, api, flower_conf, 
url_prefix])
 
 stdout.close()
 stderr.close()
@@ -1163,8 +1168,8 @@ def flower(args):
 signal.signal(signal.SIGINT, sigint_handler)
 signal.signal(signal.SIGTERM, sigint_handler)
 
-os.execvp("flower", ['flower', '-b',
- broka, address, port, api, flower_conf, 
url_prefix])
+os.execvp(use_virtualenv('flower'),
+  ['flower', '-b', broka, address, port, api, flower_conf, 
url_prefix])
 
 
 @cli_utils.action_logging
diff --git a/airflow/models.py b/airflow/models.py
index afcacd126b..ac4814c333 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -78,6 +78,7 @@
 from airflow.utils.db import provide_session
 from airflow.utils.decorators import apply_defaults
 from airflow.utils.email import send_email
+from 

[jira] [Commented] (AIRFLOW-1488) Add a sensor operator to wait on DagRuns

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723488#comment-16723488
 ] 

ASF GitHub Bot commented on AIRFLOW-1488:
-

stale[bot] closed pull request #2500: [AIRFLOW-1488] Add the DagRunSensor 
operator.
URL: https://github.com/apache/incubator-airflow/pull/2500
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/dagrun_sensor.py 
b/airflow/contrib/operators/dagrun_sensor.py
new file mode 100644
index 00..f4465626af
--- /dev/null
+++ b/airflow/contrib/operators/dagrun_sensor.py
@@ -0,0 +1,86 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import logging
+from airflow import settings
+from airflow.utils.state import State
+from airflow.utils.decorators import apply_defaults
+from airflow.models import DagRun
+from airflow.operators.sensors import BaseSensorOperator
+
+
+class DagRunSensor(BaseSensorOperator):
+"""
+Waits for a DAG run to complete.
+
+:param external_dag_id: The dag_id that you want to wait for
+:type external_dag_id: string
+:param allowed_states: list of allowed states, default is ``['success']``
+:type allowed_states: list
+:param execution_delta: time difference with the previous execution to look
+at, the default is the same execution_date as the current task.  For
+yesterday, use [positive!] datetime.timedelta(days=1). Either
+execution_delta or execution_date_fn can be passed to DagRunSensor, but not
+both.
+:type execution_delta: datetime.timedelta
+:param execution_date_fn: function that receives the current execution date
+and returns the desired execution dates to query. Either execution_delta or
+execution_date_fn can be passed to DagRunSensor, but not both.
+:type execution_date_fn: callable
+"""
+@apply_defaults
+def __init__(
+self,
+external_dag_id,
+allowed_states=None,
+execution_delta=None,
+execution_date_fn=None,
+*args, **kwargs):
+super(DagRunSensor, self).__init__(*args, **kwargs)
+
+if execution_delta is not None and execution_date_fn is not None:
+raise ValueError(
+'Only one of `execution_date` or `execution_date_fn` may'
+'be provided to DagRunSensor; not both.')
+
+self.allowed_states = allowed_states or [State.SUCCESS]
+self.execution_delta = execution_delta
+self.execution_date_fn = execution_date_fn
+self.external_dag_id = external_dag_id
+
+def poke(self, context):
+if self.execution_delta:
+dttm = context['execution_date'] - self.execution_delta
+elif self.execution_date_fn:
+dttm = self.execution_date_fn(context['execution_date'])
+else:
+dttm = context['execution_date']
+
+dttm_filter = dttm if isinstance(dttm, list) else [dttm]
+serialized_dttm_filter = ','.join([datetime.isoformat() for datetime in
+   dttm_filter])
+
+logging.info(
+ 'Poking for '
+ '{self.external_dag_id}.'
+ '{serialized_dttm_filter} ... '.format(**locals()))
+
+session = settings.Session()
+count = session.query(DagRun).filter(
+DagRun.dag_id == self.external_dag_id,
+DagRun.state.in_(self.allowed_states),
+DagRun.execution_date.in_(dttm_filter),
+).count()
+session.commit()
+session.close()
+return count == len(dttm_filter)
diff --git a/tests/contrib/operators/test_dagrun_sensor.py 
b/tests/contrib/operators/test_dagrun_sensor.py
new file mode 100644
index 00..74e4d46ccb
--- /dev/null
+++ b/tests/contrib/operators/test_dagrun_sensor.py
@@ -0,0 +1,119 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# 
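For readers skimming the archived diff above, here is a minimal usage sketch of the
proposed sensor. It is hypothetical: the import path and the DAG/task names assume
the closed PR had been merged, which it was not.

{code}
from datetime import datetime, timedelta

from airflow import DAG
# Hypothetical import path, taken from the file added in the PR above.
from airflow.contrib.operators.dagrun_sensor import DagRunSensor

dag = DAG(
    dag_id='wait_for_upstream_example',
    start_date=datetime(2017, 1, 1),
    schedule_interval='@daily',
)

# Wait until yesterday's run of `upstream_dag` reaches the 'success' state.
# Per the docstring, execution_delta is positive when looking back in time
# and is mutually exclusive with execution_date_fn.
wait_for_upstream = DagRunSensor(
    task_id='wait_for_upstream',
    external_dag_id='upstream_dag',
    execution_delta=timedelta(days=1),
    poke_interval=60,
    dag=dag,
)
{code}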

[jira] [Commented] (AIRFLOW-351) Failed to clear downstream tasks

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723484#comment-16723484
 ] 

ASF GitHub Bot commented on AIRFLOW-351:


stale[bot] closed pull request #2543: [AIRFLOW-351] fix bug with stop operator
URL: https://github.com/apache/incubator-airflow/pull/2543
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/models.py b/airflow/models.py
index d83bc9a73d..a15ac32752 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -3408,7 +3408,7 @@ def sub_dag(self, task_regex, include_downstream=False,
 upstream and downstream neighbours based on the flag passed.
 """
 
-dag = copy.deepcopy(self)
+dag = copy.copy(self)
 
 regex_match = [
 t for t in dag.tasks if re.findall(task_regex, t.task_id)]
diff --git a/airflow/task_runner/bash_task_runner.py 
b/airflow/task_runner/bash_task_runner.py
index b73e25818d..583834bdc0 100644
--- a/airflow/task_runner/bash_task_runner.py
+++ b/airflow/task_runner/bash_task_runner.py
@@ -37,3 +37,4 @@ def terminate(self):
 
 def on_finish(self):
 super(BashTaskRunner, self).on_finish()
+self.process.kill()


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Failed to clear downstream tasks
> 
>
> Key: AIRFLOW-351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models, subdag, webserver
>Affects Versions: 1.7.1.3
>Reporter: Adinata
>Priority: Major
> Attachments: dag_error.py, error.log, error_on_clear_dag.txt, 
> ubuntu-14-packages.log, ubuntu-16-oops.log, ubuntu-16-packages.log
>
>
> {code}
>   / (  ()   )  \___
>  /( (  (  )   _))  )   )\
>(( (   )()  )   (   )  )
>  ((/  ( _(   )   (   _) ) (  () )  )
> ( (  ( (_)   (((   )  .((_ ) .  )_
>( (  )(  (  ))   ) . ) (   )
>   (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
>   ( (  (   ) (  )   (  )) ) _)(   )  )  )
>  ( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
>   (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
>  ( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
>   ((  (   )(( _)   _) _(_ (  (_ )
>(_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
>((__)\\||lll|l||///  \_))
> (   /(/ (  )  ) )\   )
>   (( ( ( | | ) ) )\   )
>(   /(| / ( )) ) ) )) )
>  ( ( _(|)_) )
>   (  ||\(|(|)|/|| )
> (|(||(||))
>   ( //|/l|||)|\\ \ )
> (/ / //  /|//\\  \ \  \ _)
> ---
> Node: 9889a7c79e9b
> ---
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 
> 367, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 755, in 
> decorated_view
> return func(*args, 

[jira] [Commented] (AIRFLOW-1844) task would not be executed when celery broker recovery

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723496#comment-16723496
 ] 

ASF GitHub Bot commented on AIRFLOW-1844:
-

stale[bot] closed pull request #2811: [AIRFLOW-1844] task would not be executed 
when celery broker recovery
URL: https://github.com/apache/incubator-airflow/pull/2811
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/executors/base_executor.py 
b/airflow/executors/base_executor.py
index d3d0675ee4..da2a7827ab 100644
--- a/airflow/executors/base_executor.py
+++ b/airflow/executors/base_executor.py
@@ -120,8 +120,8 @@ def heartbeat(self):
 self.queued_tasks.pop(key)
 ti.refresh_from_db()
 if ti.state != State.RUNNING:
-self.running[key] = command
 self.execute_async(key, command=command, queue=queue)
+self.running[key] = command
 else:
 self.log.debug(
 'Task is already running, not sending to executor: %s',


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> task would not be executed when celery broker recovery
> --
>
> Key: AIRFLOW-1844
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1844
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, executor, scheduler
>Affects Versions: 1.8.0, 1.9.0
>Reporter: hujiahua
>Priority: Major
>
> When the scheduler fails to send a task while the Celery broker is not 
> working, the task will never be sent again and will never be executed once 
> the broker recovers.
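The closed PR above swaps two lines in BaseExecutor.heartbeat so that a task is
recorded as running only after execute_async has been called. A minimal standalone
sketch of the idea follows (hypothetical function and names, not the actual
executor code):

{code}
def send_to_executor(executor, key, command, queue):
    """Sketch: mark the task as running only once the async send has happened.

    If the broker is down and execute_async() raises, the key never lands in
    executor.running, so a later scheduler pass can queue and send the task
    again instead of treating it as already in flight.
    """
    executor.execute_async(key, command=command, queue=queue)
    executor.running[key] = command
{code}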



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1729) Ignore whole directories in .airflowignore

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723492#comment-16723492
 ] 

ASF GitHub Bot commented on AIRFLOW-1729:
-

stale[bot] closed pull request #2754: [AIRFLOW-1729] Ignore whole directories 
from .airflowignore
URL: https://github.com/apache/incubator-airflow/pull/2754
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/utils/dag_processing.py b/airflow/utils/dag_processing.py
index 68cee7601e..67502b0d6e 100644
--- a/airflow/utils/dag_processing.py
+++ b/airflow/utils/dag_processing.py
@@ -174,11 +174,11 @@ def list_py_file_paths(directory, safe_mode=True):
 elif os.path.isdir(directory):
 patterns = []
 for root, dirs, files in os.walk(directory, followlinks=True):
-ignore_file = [f for f in files if f == '.airflowignore']
-if ignore_file:
-f = open(os.path.join(root, ignore_file[0]), 'r')
-patterns += [p for p in f.read().split('\n') if p]
-f.close()
+if '.airflowignore' in files:
+with open(os.path.join(root, '.airflowignore'), 'r') as f:
+patterns += [p for p in f if p]
+dirs[:] = [d for d in dirs if not any(
+[re.findall(p, os.path.join(root, d)) for p in patterns])]
 for f in files:
 try:
 file_path = os.path.join(root, f)


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Ignore whole directories in .airflowignore
> --
>
> Key: AIRFLOW-1729
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1729
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Cedric Hourcade
>Assignee: Ash Berlin-Taylor
>Priority: Minor
> Fix For: 1.10.0
>
>
> The .airflowignore file makes it possible to exclude files from DAG scanning. 
> But even if we blacklist a full directory, {{os.walk}} will still descend into 
> it no matter how deep it is and skip its files one by one, which can be an 
> issue when you keep around big .git or virtualenv directories.
> I suggest adding something like:
> {code}
> dirs[:] = [d for d in dirs if not any([re.findall(p, os.path.join(root, d)) 
> for p in patterns])]
> {code}
> to prune the directories here: 
> https://github.com/apache/incubator-airflow/blob/cfc2f73c445074e1e09d6ef6a056cd2b33a945da/airflow/utils/dag_processing.py#L208-L209
>  and in {{list_py_file_paths}}
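A small self-contained sketch of the pruning idea described above (hypothetical
function; the real change also reads the patterns from .airflowignore and keeps the
rest of list_py_file_paths intact):

{code}
import os
import re


def walk_with_ignores(directory, patterns):
    """Yield file paths under `directory`, never descending into ignored dirs."""
    for root, dirs, files in os.walk(directory, followlinks=True):
        # Slice assignment mutates the list os.walk iterates over, so a
        # blacklisted directory (e.g. a big .git or virtualenv) is pruned
        # entirely instead of being visited file by file.
        dirs[:] = [d for d in dirs
                   if not any(re.findall(p, os.path.join(root, d))
                              for p in patterns)]
        for f in files:
            yield os.path.join(root, f)
{code}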



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1822) Add gaiohttp and gthread gunicorn workerclass option in cli

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723495#comment-16723495
 ] 

ASF GitHub Bot commented on AIRFLOW-1822:
-

stale[bot] closed pull request #2794: [AIRFLOW-1822] Add gaiohttp and gthread 
gunicorn worker class to webserver cli
URL: https://github.com/apache/incubator-airflow/pull/2794
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 6d012935a4..1002c69db5 100755
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -728,7 +728,10 @@ def webserver(args):
 '-p', str(pid),
 '-c', 'python:airflow.www.gunicorn_config'
 ]
-
+
+if args.workerclass == 'gthread':
+run_args += ['--threads', str(args.threads)]
+
 if args.access_logfile:
 run_args += ['--access-logfile', str(args.access_logfile)]
 
@@ -1342,10 +1345,15 @@ class CLIFactory(object):
 default=conf.get('webserver', 'WORKERS'),
 type=int,
 help="Number of workers to run the webserver on"),
+'threads': Arg(
+("--threads",),
+default=conf.get('webserver', 'THREADS'),
+type=int,
+help="Number of workers threads for handling webserver requests"),
 'workerclass': Arg(
 ("-k", "--workerclass"),
 default=conf.get('webserver', 'WORKER_CLASS'),
-choices=['sync', 'eventlet', 'gevent', 'tornado'],
+choices=['sync', 'eventlet', 'gevent', 'tornado', 'gaiohttp', 
'gthread'],
 help="The worker class to use for Gunicorn"),
 'worker_timeout': Arg(
 ("-t", "--worker_timeout"),
@@ -1571,7 +1579,7 @@ class CLIFactory(object):
 }, {
 'func': webserver,
 'help': "Start a Airflow webserver instance",
-'args': ('port', 'workers', 'workerclass', 'worker_timeout', 
'hostname',
+'args': ('port', 'workers', 'workerclass', 'worker_timeout', 
'threads','hostname',
  'pid', 'daemon', 'stdout', 'stderr', 'access_logfile',
  'error_logfile', 'log_file', 'ssl_cert', 'ssl_key', 
'debug'),
 }, {
diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index fd78253f18..ea79af0257 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -192,6 +192,10 @@ secret_key = temporary_key
 # Number of workers to run the Gunicorn web server
 workers = 4
 
+# Number of worker threads for handling requests
+# This setting only affects the Gthread worker type
+threads = 1
+
 # The worker class gunicorn should use. Choices include
 # sync (default), eventlet, gevent
 worker_class = sync
diff --git a/docs/installation.rst b/docs/installation.rst
index b4fb126aab..8d35056a04 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -83,6 +83,8 @@ Here's the list of the subpackages and what they enable:
 
+---+--+-+
 |  slack| ``pip install apache-airflow[slack]``| 
``SlackAPIPostOperator``|
 
+---+--+-+
+|  aiohttp  | ``pip install apache-airflow[aiohttp]``  | Gaiohttp 
worker class for gunicorn  |
++---+--+-+
 |  vertica  | ``pip install apache-airflow[vertica]``  | Vertica hook  
  |
 |   |  | support as an 
Airflow backend   |
 
+---+--+-+
diff --git a/setup.py b/setup.py
index e9d68b3bae..aa3b6b7a19 100644
--- a/setup.py
+++ b/setup.py
@@ -99,6 +99,9 @@ def write_version(filename=os.path.join(*['airflow',
 with open(filename, 'w') as a:
 a.write(text)
 
+aiohttp = [
+'aiohttp>=0.21.5'
+]
 async = [
 'greenlet>=0.4.9',
 'eventlet>= 0.9.7',
@@ -244,6 +247,7 @@ def do_setup():
 extras_require={
 'all': devel_all,
 'all_dbs': all_dbs,
+'aiohttp': aiohttp,
 'async': async,
 'azure': azure,
 'celery': celery,


 


This is an automated message from the Apache Git Service.

[jira] [Commented] (AIRFLOW-1775) Remote file handler for logging

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723490#comment-16723490
 ] 

ASF GitHub Bot commented on AIRFLOW-1775:
-

stale[bot] closed pull request #2757: [AIRFLOW-1775] Remote File Task Handler
URL: https://github.com/apache/incubator-airflow/pull/2757
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/utils/log/file_task_handler.py 
b/airflow/utils/log/file_task_handler.py
index e7d257f06a..5c09efaac9 100644
--- a/airflow/utils/log/file_task_handler.py
+++ b/airflow/utils/log/file_task_handler.py
@@ -1,199 +1,204 @@
-# -*- coding: utf-8 -*-
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-import os
-import requests
-
-from jinja2 import Template
-
-from airflow import configuration as conf
-from airflow.configuration import AirflowConfigException
-from airflow.utils.file import mkdirs
-
-
-class FileTaskHandler(logging.Handler):
-"""
-FileTaskHandler is a python log handler that handles and reads
-task instance logs. It creates and delegates log handling
-to `logging.FileHandler` after receiving task instance context.
-It reads logs from task instance's host machine.
-"""
-
-def __init__(self, base_log_folder, filename_template):
-"""
-:param base_log_folder: Base log folder to place logs.
-:param filename_template: template filename string
-"""
-super(FileTaskHandler, self).__init__()
-self.handler = None
-self.local_base = base_log_folder
-self.filename_template = filename_template
-self.filename_jinja_template = None
-
-if "{{" in self.filename_template: #jinja mode
-self.filename_jinja_template = Template(self.filename_template)
-
-def set_context(self, ti):
-"""
-Provide task_instance context to airflow task handler.
-:param ti: task instance object
-"""
-local_loc = self._init_file(ti)
-self.handler = logging.FileHandler(local_loc)
-self.handler.setFormatter(self.formatter)
-self.handler.setLevel(self.level)
-
-def emit(self, record):
-if self.handler is not None:
-self.handler.emit(record)
-
-def flush(self):
-if self.handler is not None:
-self.handler.flush()
-
-def close(self):
-if self.handler is not None:
-self.handler.close()
-
-def _render_filename(self, ti, try_number):
-if self.filename_jinja_template:
-jinja_context = ti.get_template_context()
-jinja_context['try_number'] = try_number
-return self.filename_jinja_template.render(**jinja_context)
-
-return self.filename_template.format(dag_id=ti.dag_id,
- task_id=ti.task_id,
- 
execution_date=ti.execution_date.isoformat(),
- try_number=try_number)
-
-def _read(self, ti, try_number):
-"""
-Template method that contains custom logic of reading
-logs given the try_number.
-:param ti: task instance record
-:param try_number: current try_number to read log from
-:return: log message as a string
-"""
-# Task instance here might be different from task instance when
-# initializing the handler. Thus explicitly getting log location
-# is needed to get correct log path.
-log_relative_path = self._render_filename(ti, try_number)
-location = os.path.join(self.local_base, log_relative_path)
-
-log = ""
-
-if os.path.exists(location):
-try:
-with open(location) as f:
-log += "*** Reading local file: {}\n".format(location)
-log += "".join(f.readlines())
-except Exception as e:
-log = "*** Failed to load local log file: 
{}\n".format(location)
-log += "*** {}\n".format(str(e))
-else:
-url = os.path.join(
-"http://{ti.hostname}:{worker_log_server_port}/log;, 

[jira] [Commented] (AIRFLOW-1974) Make databricks operator more generic

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723494#comment-16723494
 ] 

ASF GitHub Bot commented on AIRFLOW-1974:
-

stale[bot] closed pull request #2932: [AIRFLOW-1974] Improve Databricks 
Hook/Operator
URL: https://github.com/apache/incubator-airflow/pull/2932
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/databricks_hook.py 
b/airflow/contrib/hooks/databricks_hook.py
index 54f00e0090..77a97a53d2 100644
--- a/airflow/contrib/hooks/databricks_hook.py
+++ b/airflow/contrib/hooks/databricks_hook.py
@@ -19,43 +19,69 @@
 #
 import requests
 
+from collections import namedtuple
+from requests.exceptions import ConnectionError, Timeout
+from requests.auth import AuthBase
+
 from airflow import __version__
 from airflow.exceptions import AirflowException
 from airflow.hooks.base_hook import BaseHook
-from requests import exceptions as requests_exceptions
-from requests.auth import AuthBase
-
-from airflow.utils.log.logging_mixin import LoggingMixin
 
 try:
-from urllib import parse as urlparse
+from urllib.parse import urlparse
 except ImportError:
-import urlparse
-
+from urlparse import urlparse
 
-SUBMIT_RUN_ENDPOINT = ('POST', 'api/2.0/jobs/runs/submit')
-GET_RUN_ENDPOINT = ('GET', 'api/2.0/jobs/runs/get')
-CANCEL_RUN_ENDPOINT = ('POST', 'api/2.0/jobs/runs/cancel')
 USER_AGENT_HEADER = {'user-agent': 'airflow-{v}'.format(v=__version__)}
+DEFAULT_API_VERSION = '2.0'
+
+Endpoint = namedtuple('Endpoint', ['http_method', 'path', 'method'])
 
 
-class DatabricksHook(BaseHook, LoggingMixin):
+class DatabricksHook(BaseHook):
 """
 Interact with Databricks.
 """
-def __init__(
-self,
-databricks_conn_id='databricks_default',
-timeout_seconds=180,
-retry_limit=3):
+
+API = {
+# API V2.0
+#   JOBS API
+# '2.0/jobs/create': Endpoint('POST', '2.0/jobs/create', ''),
+# '2.0/jobs/list': Endpoint('GET', '2.0/jobs/list', ''),
+# '2.0/jobs/delete': Endpoint('POST', '2.0/jobs/delete', ''),
+# '2.0/jobs/get': Endpoint('GET', '2.0/jobs/get', ''),
+# '2.0/jobs/reset': Endpoint('POST', '2.0/jobs/reset', ''),
+'2.0/jobs/run-now': Endpoint('POST', '2.0/jobs/run-now', ''),
+'2.0/jobs/runs/submit': Endpoint('POST', '2.0/jobs/runs/submit',
+ 'jobs_runs_submit'),
+# '2.0/jobs/runs/list': Endpoint('GET', '2.0/jobs/runs/list', ''),
+'2.0/jobs/runs/get': Endpoint('GET', '2.0/jobs/runs/get',
+  'jobs_runs_get'),
+# '2.0/jobs/runs/export': Endpoint('GET', '2.0/jobs/runs/export', ''),
+'2.0/jobs/runs/cancel': Endpoint('POST', '2.0/jobs/runs/cancel',
+ 'jobs_runs_cancel')
+# '2.0/jobs/runs/get-output': Endpoint('GET',
+#  '2.0/jobs/runs/get-output', '')
+}
+# TODO: https://docs.databricks.com/api/latest/index.html
+# TODO: https://docs.databricks.com/api/latest/dbfs.html
+# TODO: https://docs.databricks.com/api/latest/groups.html
+# TODO: https://docs.databricks.com/api/latest/instance-profiles.html
+# TODO: https://docs.databricks.com/api/latest/libraries.html
+# TODO: https://docs.databricks.com/api/latest/tokens.html
+# TODO: https://docs.databricks.com/api/latest/workspace.html
+
+def __init__(self, databricks_conn_id='databricks_default',
+ timeout_seconds=180, retry_limit=3):
 """
-:param databricks_conn_id: The name of the databricks connection to 
use.
+:param databricks_conn_id: The name of the databricks connection to
+use.
 :type databricks_conn_id: string
-:param timeout_seconds: The amount of time in seconds the requests 
library
-will wait before timing-out.
+:param timeout_seconds: The amount of time in seconds the requests
+library will wait before timing-out.
 :type timeout_seconds: int
-:param retry_limit: The number of times to retry the connection in 
case of
-service outages.
+:param retry_limit: The number of times to retry the connection in
+case of service outages.
 :type retry_limit: int
 """
 self.databricks_conn_id = databricks_conn_id
@@ -67,34 +93,32 @@ def __init__(
 
 @staticmethod
 def _parse_host(host):
-"""
+"""Verify connection host setting provided by the user.
+
 The purpose of this function is to be robust to improper connections

[jira] [Commented] (AIRFLOW-878) FileNotFoundError: 'gunicorn' after initial setup

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723493#comment-16723493
 ] 

ASF GitHub Bot commented on AIRFLOW-878:


stale[bot] closed pull request #3134: [AIRFLOW-878] Use absolute gunicorn 
executable location
URL: https://github.com/apache/incubator-airflow/pull/3134
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 449d8ca8da..6046569391 100755
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -734,8 +734,17 @@ def webserver(args):
 
=\
 '''.format(**locals())))
 
+def get_gunicorn_location():
+location = os.path.join(
+os.path.dirname(sys.executable), "gunicorn")
+if os.path.isfile(location) and os.access(location, os.X_OK):
+return location
+raise AirflowException("gunicorn could not be found")
+
+gunicorn_exec = get_gunicorn_location()
+
 run_args = [
-'gunicorn',
+gunicorn_exec,
 '-w', str(num_workers),
 '-k', str(args.workerclass),
 '-t', str(worker_timeout),


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileNotFoundError: 'gunicorn' after initial setup
> -
>
> Key: AIRFLOW-878
> URL: https://issues.apache.org/jira/browse/AIRFLOW-878
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Manuel Barkhau
>Priority: Major
>
> I get the following error after installing airflow and doing {{airflow init}}
> {code}
> $ venv/bin/airflow webserver 
> [2017-02-15 12:24:52,813] {__init__.py:56} INFO - Using executor 
> SequentialExecutor
> [2017-02-15 12:24:52,886] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.5/lib2to3/Grammar.txt
> [2017-02-15 12:24:52,904] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.5/lib2to3/PatternGrammar.txt
>      _
>  |__( )_  __/__  /  __
>   /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
> ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
>  _/_/  |_/_/  /_//_//_/  \//|__/
>  
> /home/mbarkhau/testproject/venv/lib/python3.5/site-packages/flask/exthook.py:71:
>  ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use 
> flask_cache in
> stead.
>   .format(x=modname), ExtDeprecationWarning
> [2017-02-15 12:24:53,145] [11995] {models.py:167} INFO - Filling up the 
> DagBag from /home/mbarkhau/airflow/dags
> Traceback (most recent call last):
>   File "venv/bin/airflow", line 6, in 
> exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/home/mbarkhau/testproject/venv/src/airflow/airflow/bin/airflow", 
> line 28, in 
> args.func(args)
>   File "/home/mbarkhau/testproject/venv/src/airflow/airflow/bin/cli.py", line 
> 791, in webserver
> gunicorn_master_proc = subprocess.Popen(run_args)
>   File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
> restore_signals, start_new_session)
>   File "/usr/lib/python3.5/subprocess.py", line 1551, in _execute_child
> raise child_exception_type(errno_num, err_msg)
> FileNotFoundError: [Errno 2] No such file or directory: 'gunicorn'
> Running the Gunicorn Server with:
> Workers: 4 sync
> Host: 0.0.0.0:8080
> Timeout: 120
> Logfiles: - -
> =
> {code}
> My setup
> {code}
> $ venv/bin/python --version
> Python 3.5.2
> $ venv/bin/pip freeze | grep airflow
> -e 
> git+ssh://g...@github.com/apache/incubator-airflow.git@debc69e2787542cd56ab28b6c48db01c65ad05c4#egg=airflow
> $ uname -a
> Linux mbarkhau-office 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1595) SqliteHook is broken

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723489#comment-16723489
 ] 

ASF GitHub Bot commented on AIRFLOW-1595:
-

stale[bot] closed pull request #2598: [AIRFLOW-1595] Change to construct 
sqlite_hook from connection schema
URL: https://github.com/apache/incubator-airflow/pull/2598
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/hooks/sqlite_hook.py b/airflow/hooks/sqlite_hook.py
index c241c2ddef..5a9c7e3516 100644
--- a/airflow/hooks/sqlite_hook.py
+++ b/airflow/hooks/sqlite_hook.py
@@ -32,5 +32,5 @@ def get_conn(self):
 Returns a sqlite connection object
 """
 conn = self.get_connection(self.sqlite_conn_id)
-conn = sqlite3.connect(conn.host)
+conn = sqlite3.connect(conn.schema)
 return conn
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index 35c187ca54..44aa05429d 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -160,7 +160,7 @@ def initdb():
 merge_conn(
 models.Connection(
 conn_id='sqlite_default', conn_type='sqlite',
-host='/tmp/sqlite_default.db'))
+schema='/tmp/sqlite_default.db'))
 merge_conn(
 models.Connection(
 conn_id='http_default', conn_type='http',
diff --git a/tests/hooks/test_sqlite_hook.py b/tests/hooks/test_sqlite_hook.py
new file mode 100644
index 00..de830f09ae
--- /dev/null
+++ b/tests/hooks/test_sqlite_hook.py
@@ -0,0 +1,53 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import mock
+import unittest
+import os
+
+from airflow import settings, models
+from airflow.settings import Session
+from airflow.hooks.sqlite_hook import SqliteHook
+
+
+@unittest.skipIf(not settings.SQL_ALCHEMY_CONN.startswith('sqlite'), 
'SqliteHook won\'t work without backend SQLite. No need to test anything here')
+class TestSqliteHook(unittest.TestCase):
+
+CONN_ID = 'sqlite_hook_test'
+
+@classmethod
+def setUpClass(cls):
+super(TestSqliteHook, cls).setUpClass()
+session = Session()
+
session.query(models.Connection).filter_by(conn_id=cls.CONN_ID).delete()
+session.commit()
+connection = models.Connection(conn_id=cls.CONN_ID, 
uri=settings.SQL_ALCHEMY_CONN)
+session.add(connection)
+session.commit()
+session.close()
+
+def test_sql_hook(self):
+hook = SqliteHook(sqlite_conn_id=self.CONN_ID)
+conn_id, = hook.get_first('SELECT conn_id FROM connection WHERE 
conn_id = :conn_id',
+  {'conn_id': self.CONN_ID})
+self.assertEqual(conn_id, self.CONN_ID)
+
+@classmethod
+def tearDownClass(cls):
+session = Session()
+
session.query(models.Connection).filter_by(conn_id=cls.CONN_ID).delete()
+session.commit()
+session.close()
+super(TestSqliteHook, cls).tearDownClass()


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> SqliteHook is broken
> 
>
> Key: AIRFLOW-1595
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1595
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.8.1
>Reporter: Shintaro Murakami
>Priority: Major
>
> SqliteHook is built using the host attribute of the connection, but it should 
> use the schema attribute instead. The path to the DB parsed from the URI is 
> stored as the schema.
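In practical terms, a sketch of what the fix implies (assuming the behaviour of the
closed PR above): a sqlite connection carries its database path in the schema field
rather than in host, e.g.:

{code}
from airflow import models

# Example mirroring the utils/db.py change in the PR: with the fix, SqliteHook
# reads the database path from `schema`, matching how paths parsed from a
# connection URI are stored.
conn = models.Connection(
    conn_id='sqlite_default',
    conn_type='sqlite',
    schema='/tmp/sqlite_default.db',
)
{code}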



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-161) Redirection to external url

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723491#comment-16723491
 ] 

ASF GitHub Bot commented on AIRFLOW-161:


stale[bot] closed pull request #2657: [AIRFLOW-161] New redirect route and 
extra links
URL: https://github.com/apache/incubator-airflow/pull/2657
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index 1dfb079b49..4a80053502 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -251,6 +251,9 @@ hide_paused_dags_by_default = False
 # Consistent page size across all listing views in the UI
 page_size = 100
 
+# List of domains which are allowed to get redirected to by operators
+whitelisted_domains = []
+
 [email]
 email_backend = airflow.utils.email.send_email_smtp
 
diff --git a/airflow/config_templates/default_test.cfg 
b/airflow/config_templates/default_test.cfg
index b065313c17..c3b6d2e717 100644
--- a/airflow/config_templates/default_test.cfg
+++ b/airflow/config_templates/default_test.cfg
@@ -59,6 +59,7 @@ dag_default_view = tree
 log_fetch_timeout_sec = 5
 hide_paused_dags_by_default = False
 page_size = 100
+whitelisted_domains = []
 
 [email]
 email_backend = airflow.utils.email.send_email_smtp
diff --git a/airflow/contrib/hooks/qubole_hook.py 
b/airflow/contrib/hooks/qubole_hook.py
index f3bcc202ed..712d782eb9 100755
--- a/airflow/contrib/hooks/qubole_hook.py
+++ b/airflow/contrib/hooks/qubole_hook.py
@@ -17,12 +17,15 @@
 import time
 import datetime
 import six
+import re
 
+from airflow import settings
 from airflow.exceptions import AirflowException
 from airflow.hooks.base_hook import BaseHook
 from airflow import configuration
 from airflow.utils.log.logging_mixin import LoggingMixin
 from airflow.utils.state import State
+from airflow.models import TaskInstance
 
 from qds_sdk.qubole import Qubole
 from qds_sdk.commands import Command, HiveCommand, PrestoCommand, 
HadoopCommand, \
@@ -175,6 +178,29 @@ def get_jobs_id(self, ti):
 cmd_id = ti.xcom_pull(key="qbol_cmd_id", task_ids=self.task_id)
 Command.get_jobs_id(self.cls, cmd_id)
 
+def get_redirect_url(self, task, dttm):
+session = settings.Session()
+url = ''
+
+try:
+conn = BaseHook.get_connection(task.kwargs['qubole_conn_id'])
+if conn and conn.host:
+host = re.sub(r'api$', 'v2/analyze?command_id=', conn.host)
+else:
+host = 'https://api.qubole.com/v2/analyze?command_id='
+
+ti = TaskInstance(task=task, execution_date=dttm)
+qds_command_id = ti.xcom_pull(task_ids=task.task_id, 
key='qbol_cmd_id')
+
+url = host + str(qds_command_id) if qds_command_id else ''
+except Exception as e:
+print('Could not find the url to redirect. Error: %s' % str(e))
+finally:
+session.commit()
+session.close()
+
+return url
+
 def create_cmd_args(self, context):
 args = []
 cmd_type = self.kwargs['command_type']
diff --git a/airflow/contrib/operators/qubole_operator.py 
b/airflow/contrib/operators/qubole_operator.py
index a5e9f5ed63..5d99911fcf 100755
--- a/airflow/contrib/operators/qubole_operator.py
+++ b/airflow/contrib/operators/qubole_operator.py
@@ -122,6 +122,7 @@ class QuboleOperator(BaseOperator):
 template_ext = ('.txt',)
 ui_color = '#3064A1'
 ui_fgcolor = '#fff'
+extra_links = ['Go to QDS']
 
 @apply_defaults
 def __init__(self, qubole_conn_id="qubole_default", *args, **kwargs):
@@ -155,6 +156,9 @@ def get_hook(self):
 # Reinitiating the hook, as some template fields might have changed
 return QuboleHook(*self.args, **self.kwargs)
 
+def get_redirect_url(self, dttm, redirect_to):
+return self.get_hook().get_redirect_url(self, dttm)
+
 def __getattribute__(self, name):
 if name in QuboleOperator.template_fields:
 if name in self.kwargs:
diff --git a/airflow/models.py b/airflow/models.py
index 5837363bd9..a419d12918 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -2094,6 +2094,8 @@ class derived from this one results in the creation of a 
task object,
 template_fields = []
 # Defines which files extensions to look for in the templated fields
 template_ext = []
+# Defines the extra buttons to display in the task instance model view
+extra_links = []
 # Defines the color in the UI
 ui_color = '#fff'
 ui_fgcolor = '#000'
@@ -2747,6 +2749,9 @@ def xcom_pull(
 dag_id=dag_id,
 

[jira] [Commented] (AIRFLOW-1793) DockerOperator doesn't work with docker_conn_id

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723497#comment-16723497
 ] 

ASF GitHub Bot commented on AIRFLOW-1793:
-

stale[bot] closed pull request #2776: [AIRFLOW-1793] fix DockerOperator using 
docker_conn_id
URL: https://github.com/apache/incubator-airflow/pull/2776
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/docker_operator.py 
b/airflow/operators/docker_operator.py
index 38edc8b4d7..adaabd2e07 100644
--- a/airflow/operators/docker_operator.py
+++ b/airflow/operators/docker_operator.py
@@ -147,7 +147,7 @@ def __init__(
 def get_hook(self):
 return DockerHook(
 docker_conn_id=self.docker_conn_id,
-base_url=self.base_url,
+base_url=self.docker_url,
 version=self.api_version,
 tls=self.__get_tls_config()
 )
diff --git a/tests/operators/docker_operator.py 
b/tests/operators/docker_operator.py
index a12b6f829f..915992338d 100644
--- a/tests/operators/docker_operator.py
+++ b/tests/operators/docker_operator.py
@@ -188,7 +188,8 @@ def test_execute_no_docker_conn_id_no_hook(self, 
operator_client_mock):
 )
 
 @mock.patch('airflow.operators.docker_operator.Client')
-def test_execute_with_docker_conn_id_use_hook(self, operator_client_mock):
+@mock.patch('airflow.operators.docker_operator.DockerHook')
+def test_execute_with_docker_conn_id_use_hook(self, docker_hook_mock, 
operator_client_mock):
 # Mock out a Docker client, so operations don't raise errors
 client_mock = mock.Mock(name='DockerOperator.Client mock', spec=Client)
 client_mock.images.return_value = []
@@ -198,6 +199,9 @@ def test_execute_with_docker_conn_id_use_hook(self, 
operator_client_mock):
 client_mock.wait.return_value = 0
 operator_client_mock.return_value = client_mock
 
+# Mock out the DockerHook
+docker_hook_mock.return_value.get_conn.return_value = client_mock
+
 # Create the DockerOperator
 operator = DockerOperator(
 image='publicregistry/someimage',
@@ -206,22 +210,13 @@ def test_execute_with_docker_conn_id_use_hook(self, 
operator_client_mock):
 docker_conn_id='some_conn_id'
 )
 
-# Mock out the DockerHook
-hook_mock = mock.Mock(name='DockerHook mock', spec=DockerHook)
-hook_mock.get_conn.return_value = client_mock
-operator.get_hook = mock.Mock(
-name='DockerOperator.get_hook mock',
-spec=DockerOperator.get_hook,
-return_value=hook_mock
-)
-
 operator.execute(None)
 self.assertEqual(
 operator_client_mock.call_count, 0,
 'Client was called on the operator instead of the hook'
 )
 self.assertEqual(
-operator.get_hook.call_count, 1,
+docker_hook_mock.call_count, 1,
 'Hook was not called although docker_conn_id configured'
 )
 self.assertEqual(
@@ -229,5 +224,6 @@ def test_execute_with_docker_conn_id_use_hook(self, 
operator_client_mock):
 'Image was not pulled using operator client'
 )
 
+
 if __name__ == "__main__":
 unittest.main()


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DockerOperator doesn't work with docker_conn_id
> ---
>
> Key: AIRFLOW-1793
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1793
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Cedrik Neumann
>Assignee: Cedrik Neumann
>Priority: Major
> Fix For: 2.0.0
>
>
> The implementation of DockerOperator uses `self.base_url` when loading the 
> DockerHook instead of `self.docker_url`:
> https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/operators/docker_operator.py#L150
> {noformat}
> [2017-11-08 16:10:13,082] {base_task_runner.py:98} INFO - Subtask:   File 
> "/src/apache-airflow/airflow/operators/docker_operator.py", line 161, in 
> execute
> [2017-11-08 16:10:13,083] {base_task_runner.py:98} INFO - Subtask: 
> self.cli = self.get_hook().get_conn()
> [2017-11-08 16:10:13,083] {base_task_runner.py:98} INFO - Subtask:   File 
> "/src/apache-airflow/airflow/operators/docker_operator.py", line 150, in 
> get_hook
> [2017-11-08 

[jira] [Commented] (AIRFLOW-1592) Add keep-alive argument supported by gunicorn backend to the airflow configuration

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723487#comment-16723487
 ] 

ASF GitHub Bot commented on AIRFLOW-1592:
-

stale[bot] closed pull request #2595: [AIRFLOW-1592] Add --keep-alive option 
for gunicorn to airflow config
URL: https://github.com/apache/incubator-airflow/pull/2595
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 09bd0c1806..0f38cd3530 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -857,6 +857,7 @@ def webserver(args):
 num_workers = args.workers or conf.get('webserver', 'workers')
 worker_timeout = (args.worker_timeout or
   conf.get('webserver', 'web_server_worker_timeout'))
+keep_alive = (args.keep_alive or conf.get('webserver', 
'web_server_keep_alive'))
 ssl_cert = args.ssl_cert or conf.get('webserver', 'web_server_ssl_cert')
 ssl_key = args.ssl_key or conf.get('webserver', 'web_server_ssl_key')
 if not ssl_cert and ssl_key:
@@ -903,6 +904,7 @@ def webserver(args):
 '-w', str(num_workers),
 '-k', str(args.workerclass),
 '-t', str(worker_timeout),
+'--keep-alive', str(keep_alive),
 '-b', args.hostname + ':' + str(args.port),
 '-n', 'airflow-webserver',
 '-p', str(pid),
diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index 0028d7832f..9e7ebe9271 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -239,6 +239,9 @@ web_server_master_timeout = 120
 # Number of seconds the gunicorn webserver waits before timing out on a worker
 web_server_worker_timeout = 120
 
+# Number of seconds for the gunicorn keep-alive
+web_server_keep_alive = 75
+
 # Number of workers to refresh at a time. When set to 0, worker refresh is
 # disabled. When nonzero, airflow periodically refreshes webserver workers by
 # bringing up new ones and killing old ones.
diff --git a/tests/cli/test_cli.py b/tests/cli/test_cli.py
index aeafdd85fe..2eb509d5b9 100644
--- a/tests/cli/test_cli.py
+++ b/tests/cli/test_cli.py
@@ -75,6 +75,7 @@ def create_mock_args(
 pickle=None,
 raw=None,
 interactive=None,
+keep_alive=None,
 ):
 if executor_config is None:
 executor_config = {}
@@ -101,6 +102,7 @@ def create_mock_args(
 args.ignore_dependencies = ignore_dependencies
 args.force = force
 args.interactive = interactive
+args.keep_alive = keep_alive
 return args
 
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add keep-alive argument supported by gunicorn backend to the airflow 
> configuration
> --
>
> Key: AIRFLOW-1592
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1592
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Demian Ginther
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The --keep-alive option is necessary for gunicorn to function properly behind 
> AWS ELBs, as gunicorn's default keep-alive is shorter than the ELB's default 
> idle timeout, so gunicorn closes idle connections before the ELB does.
> In addition, it makes no sense to provide a wrapper for a program but not 
> allow all configuration options to be set.
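> 
> For reference, a sketch of what this looks like once wired through, following the
> option names in the diff above; the value 75 is only an example chosen to exceed the
> ELB's default 60-second idle timeout:
> {code}
> # airflow.cfg -- keep gunicorn connections alive longer than the load balancer's idle timeout
> [webserver]
> web_server_keep_alive = 75
> {code}
> The webserver CLI would then pass this to gunicorn as `--keep-alive 75`, i.e. gunicorn's
> existing `keepalive` setting, so gunicorn does not drop idle connections first.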



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1325) Airflow streaming log backed by ElasticSearch

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723483#comment-16723483
 ] 

ASF GitHub Bot commented on AIRFLOW-1325:
-

stale[bot] closed pull request #2515: [AIRFLOW-1325][WIP] Airflow streaming log 
backed by ElasticSearch
URL: https://github.com/apache/incubator-airflow/pull/2515
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/default_airflow_logging.py 
b/airflow/config_templates/default_airflow_logging.py
index d6ae0366d1..3663b35aea 100644
--- a/airflow/config_templates/default_airflow_logging.py
+++ b/airflow/config_templates/default_airflow_logging.py
@@ -73,6 +73,13 @@
 'gcs_log_folder': GCS_LOG_FOLDER,
 'filename_template': FILENAME_TEMPLATE,
 },
+'es.task': {
+'class': 
'airflow.utils.log.elasticsearch_task_handler.ElasticsearchTaskHandler',
+'formatter': 'airflow.task',
+'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
+'filename_template': FILENAME_TEMPLATE,
+'host': 'localhost:9200',
+},
 },
 'loggers': {
 'airflow.task': {
diff --git a/airflow/utils/log/elasticsearch_task_handler.py 
b/airflow/utils/log/elasticsearch_task_handler.py
new file mode 100644
index 00..df18ba4dd2
--- /dev/null
+++ b/airflow/utils/log/elasticsearch_task_handler.py
@@ -0,0 +1,111 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from airflow import AirflowException
+from airflow.utils.log.file_task_handler import FileTaskHandler
+from elasticsearch import Elasticsearch, ElasticsearchException, helpers
+from elasticsearch_dsl import Search
+
+
+class ElasticsearchTaskHandler(FileTaskHandler):
+"""
+ElasticsearchTaskHandler is a Python log handler that
+reads logs from Elasticsearch. Note that logs are not directly
+indexed into Elasticsearch. Instead, the handler flushes logs
+into local files; additional software such as Filebeat and
+Logstash is required to index the logs into Elasticsearch.
+
+To efficiently query and sort Elasticsearch results, we assume each
+log message has a field `log_id` composed of the task instance's primary keys:
+`log_id = {dag_id}-{task_id}-{execution_date}-{try_number}`
+Log messages with a given log_id are sorted by `offset`,
+a unique integer that indicates the log message's order.
+Timestamps here are unreliable because multiple log messages
+might have the same timestamp.
+"""
+def __init__(self, base_log_folder, filename_template,
+ host='localhost:9200'):
+"""
+:param base_log_folder: base folder to store logs locally
+:param filename_template: log filename template
+:param host: Elasticsearch host name
+"""
+super(ElasticsearchTaskHandler, self).__init__(
+base_log_folder, filename_template)
+self.client = Elasticsearch([host])
+
+def streaming_read(self, dag_id, task_id, execution_date,
+  try_number, offset=None, page=0, max_line_per_page=1000):
+"""
+Endpoint for streaming log.
+:param dag_id: id of the dag
+:param task_id: id of the task
+:param execution_date: execution date in isoformat
+:param try_number: try_number of the task instance
+:param offset: filter log with offset strictly greater than offset
+:param page: logs at given page
+:param max_line_per_page: maximum number of results returned per ES 
query
+:return a list of log documents
+"""
+log_id = '-'.join([dag_id, task_id, execution_date, try_number])
+
+s = Search(using=self.client) \
+.query('match', log_id=log_id) \
+.sort('offset')
+
+# Offset is the unique key for sorting logs given log_id.
+if offset:
+s = s.filter('range', offset={'gt': offset})
+
+try:
+response = s[max_line_per_page * page:max_line_per_page * (page + 1)].execute()
+logs = [hit for hit in response]
+
+except ElasticsearchException as e:
+# Do not 
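
A short usage sketch of the offset-based tailing the handler's docstring describes; the
field names `log_id` and `offset` come from the diff above, while the concrete ids, the
`message` field and the page size are made-up values:

{code:python}
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch(['localhost:9200'])

# log_id identifies one task try: {dag_id}-{task_id}-{execution_date}-{try_number}
log_id = '-'.join(['example_dag', 'example_task', '2017-01-01T00:00:00', '1'])
last_offset = 0  # offset of the last line already rendered

search = (Search(using=client)
          .query('match', log_id=log_id)
          .filter('range', offset={'gt': last_offset})  # only lines not yet seen
          .sort('offset'))                              # deterministic order, unlike timestamps

for hit in search[:1000].execute():                     # first page, at most 1000 lines
    print(hit.offset, hit.message)
{code}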

[jira] [Commented] (AIRFLOW-1491) Celery executor restarts result in duplicate tasks

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723486#comment-16723486
 ] 

ASF GitHub Bot commented on AIRFLOW-1491:
-

stale[bot] closed pull request #2503: AIRFLOW-1491 Check celery queue before 
scheduling commands
URL: https://github.com/apache/incubator-airflow/pull/2503
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/executors/base_executor.py 
b/airflow/executors/base_executor.py
index 7a4065eb07..8a3a71ad46 100644
--- a/airflow/executors/base_executor.py
+++ b/airflow/executors/base_executor.py
@@ -132,7 +132,7 @@ def heartbeat(self):
 self.sync()
 
 def change_state(self, key, state):
-self.running.pop(key)
+self.running.pop(key, None)
 self.event_buffer[key] = state
 
 def fail(self, key):
diff --git a/airflow/executors/celery_executor.py 
b/airflow/executors/celery_executor.py
index 17c343bd4a..5654685815 100644
--- a/airflow/executors/celery_executor.py
+++ b/airflow/executors/celery_executor.py
@@ -13,6 +13,7 @@
 # limitations under the License.
 
 from builtins import object
+import itertools
 import logging
 import subprocess
 import ssl
@@ -20,6 +21,7 @@
 import traceback
 
 from celery import Celery
+from celery.result import AsyncResult
 from celery import states as celery_states
 
 from airflow.exceptions import AirflowConfigException, AirflowException
@@ -83,6 +85,26 @@ def execute_command(command):
 raise AirflowException('Celery command failed')
 
 
+def recover_command(command):
+"""Recover the celery AsyncResult object for a command, if possible.
+
+Note:
+- We use iterables and generators to minimize backend calls.
+- Functionality is dependent on features presented by the celery
+  backend in use.
+"""
+insp = app.control.inspect()
+task_providers = [insp.active, insp.scheduled, insp.reserved]
+task_lists = (
+itertools.chain.from_iterable(tp().values())
+for tp in task_providers)
+tasks = itertools.chain.from_iterable(task_lists)
+for task in tasks:
+if task['args'] == [command]:
+return AsyncResult(id=task['id'], app=app)
+return None
+
+
 class CeleryExecutor(BaseExecutor):
 """
 CeleryExecutor is recommended for production use of Airflow. It allows
@@ -98,17 +120,29 @@ def start(self):
 self.last_state = {}
 
 def execute_async(self, key, command, queue=DEFAULT_QUEUE):
-self.logger.info( "[celery] queuing {key} through celery, "
-   "queue={queue}".format(**locals()))
-self.tasks[key] = execute_command.apply_async(
-args=[command], queue=queue)
+if key in self.tasks:
+self.logger.warning('[celery] existing command scheduled '
+'for {key}'.format(key=key))
+return
+
+async_command = recover_command(command)
+if async_command is not None:
+self.logger.warning('[celery] recovering already scheduled '
+'command {command}'.format(command=command))
+else:
+self.logger.info('[celery] queuing {key} through celery, '
+ 'queue={queue}'.format(key=key, queue=queue))
+async_command = execute_command.apply_async(
+args=[command], queue=queue)
+
+self.tasks[key] = async_command
 self.last_state[key] = celery_states.PENDING
 
 def sync(self):
 
 self.logger.debug(
 "Inquiring about {} celery task(s)".format(len(self.tasks)))
-for key, async in list(self.tasks.items()):
+for key, async in self.tasks.items():
 try:
 state = async.state
 if self.last_state[key] != state:
@@ -129,7 +163,7 @@ def sync(self):
 self.last_state[key] = async.state
 except Exception as e:
 logging.error("Error syncing the celery executor, ignoring "
-  "it:\n{}\n".format(e, traceback.format_exc()))
+  "it:\n{}\n{}".format(e, traceback.format_exc()))
 
 def end(self, synchronous=False):
 if synchronous:
diff --git a/tests/executors/celery_executor.py 
b/tests/executors/celery_executor.py
new file mode 100644
index 00..61c62eabd8
--- /dev/null
+++ b/tests/executors/celery_executor.py
@@ -0,0 +1,143 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# 
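
The queue check above boils down to asking the workers what they already know about. A
minimal sketch using celery's inspection API (the broker URL is a placeholder, scheduled
tasks are omitted because their payload is shaped differently, and the comparison assumes
the backend reports task args as a list, as in the diff):

{code:python}
from celery import Celery

app = Celery(broker='redis://localhost:6379/0')  # placeholder broker URL

def already_queued(command):
    """Return True if some worker already holds a task for this command."""
    insp = app.control.inspect()
    for bucket in (insp.active() or {}, insp.reserved() or {}):
        for worker_tasks in bucket.values():
            if any(task.get('args') == [command] for task in worker_tasks):
                return True
    return False
{code}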

[jira] [Commented] (AIRFLOW-1558) S3FileTransformOperator fails in Python 3 due to file mode

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723485#comment-16723485
 ] 

ASF GitHub Bot commented on AIRFLOW-1558:
-

stale[bot] closed pull request #2559: [AIRFLOW-1558] Py3 fix for 
S3FileTransformOperator
URL: https://github.com/apache/incubator-airflow/pull/2559
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/s3_file_transform_operator.py 
b/airflow/operators/s3_file_transform_operator.py
index 1cdd0e5e48..3f2a2a990a 100644
--- a/airflow/operators/s3_file_transform_operator.py
+++ b/airflow/operators/s3_file_transform_operator.py
@@ -14,6 +14,7 @@
 
 import logging
 from tempfile import NamedTemporaryFile
+import six
 import subprocess
 
 from airflow.exceptions import AirflowException
@@ -81,7 +82,7 @@ def execute(self, context):
 raise AirflowException("The source key {0} does not exist"
 "".format(self.source_s3_key))
 source_s3_key_object = source_s3.get_key(self.source_s3_key)
-with NamedTemporaryFile("w") as f_source, NamedTemporaryFile("w") as 
f_dest:
+with NamedTemporaryFile("wb") as f_source, NamedTemporaryFile("wb") as 
f_dest:
 logging.info("Dumping S3 file {0} contents to local file {1}"
  "".format(self.source_s3_key, f_source.name))
 source_s3_key_object.get_contents_to_file(f_source)
@@ -91,9 +92,13 @@ def execute(self, context):
 [self.transform_script, f_source.name, f_dest.name],
 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
 (transform_script_stdoutdata, transform_script_stderrdata) = 
transform_script_process.communicate()
+if six.PY3:
+transform_script_stdoutdata = 
transform_script_stdoutdata.decode()
 logging.info("Transform script stdout "
  "" + transform_script_stdoutdata)
 if transform_script_process.returncode > 0:
+if six.PY3:
+transform_script_stderrdata = 
transform_script_stderrdata.decode()
 raise AirflowException("Transform script failed "
 "" + transform_script_stderrdata)
 else:
diff --git a/tests/operators/test_s3_file_transform_operator.py 
b/tests/operators/test_s3_file_transform_operator.py
new file mode 100644
index 00..0bd6bdf230
--- /dev/null
+++ b/tests/operators/test_s3_file_transform_operator.py
@@ -0,0 +1,82 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import unittest
+
+import boto
+from airflow.exceptions import AirflowException
+from airflow.models import Connection
+from airflow.operators.s3_file_transform_operator import 
S3FileTransformOperator
+from airflow.utils import db
+try:
+from moto import mock_s3_deprecated
+except ImportError:
+mock_s3_deprecated = None
+
+
+DEFAULT_CONN_ID = "s3_default"
+
+
+class S3FileTransformTest(unittest.TestCase):
+"""
+Tests for the S3 file transform operator.
+"""
+
+@db.provide_session
+def setUp(self, session=None):
+self.mock_s3 = mock_s3_deprecated()
+self.mock_s3.start()
+self.s3_connection = session.query(Connection).filter(
+Connection.conn_id == DEFAULT_CONN_ID
+).first()
+if self.s3_connection is None:
+self.s3_connection = Connection(conn_id=DEFAULT_CONN_ID, 
conn_type="s3")
+session.add(self.s3_connection)
+session.commit()
+
+def tearDown(self):
+self.mock_s3.stop()
+
+@unittest.skipIf(mock_s3_deprecated is None, 'mock package not present')
+def test_execute(self):
+source_key = "/source/key"
+source_bucket_name = "source-bucket"
+dest_key = "/dest/key"
+dest_bucket_name = "dest-bucket"
+key_data = u"foobar"
+# set up mock data
+s3_client = boto.connect_s3()
+source_bucket = s3_client.create_bucket(source_bucket_name)
+dest_bucket = s3_client.create_bucket(dest_bucket_name)
+source_obj = boto.s3.key.Key(source_bucket)
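
The Py3 part of this change comes down to subprocess pipes returning bytes. A standalone
sketch of the decode step (the script path is a placeholder):

{code:python}
import subprocess
import sys

# Placeholder transform script; on Python 3 its stdout/stderr arrive as bytes.
process = subprocess.Popen(
    ['/path/to/transform_script', 'in.txt', 'out.txt'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout_data, stderr_data = process.communicate()

if sys.version_info[0] >= 3:          # same effect as the six.PY3 checks in the diff
    stdout_data = stdout_data.decode()
    stderr_data = stderr_data.decode()

print("Transform script stdout: " + stdout_data)
if process.returncode > 0:
    raise RuntimeError("Transform script failed: " + stderr_data)
{code}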
+

[jira] [Commented] (AIRFLOW-3458) Refactor: Move Connection out of models.py

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723245#comment-16723245
 ] 

ASF GitHub Bot commented on AIRFLOW-3458:
-

BasPH opened a new pull request #4335: [AIRFLOW-3458] Move models.Connection 
into separate file
URL: https://github.com/apache/incubator-airflow/pull/4335
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   AIRFLOW-3458
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   WIP
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   No new functionality.
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor: Move Connection out of models.py
> --
>
> Key: AIRFLOW-3458
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3458
> Project: Apache Airflow
>  Issue Type: Task
>  Components: models
>Affects Versions: 1.10.1
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3522) Support Slack Attachments for SlackWebhookHook

2018-12-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723048#comment-16723048
 ] 

ASF GitHub Bot commented on AIRFLOW-3522:
-

mholtzscher opened a new pull request #4332: [AIRFLOW-3522] Add support for 
sending Slack attachments
URL: https://github.com/apache/incubator-airflow/pull/4332
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/AIRFLOW-3522) issues and 
references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3522
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Adding support for sending Slack attachments with messages via the 
SlackWebhookOperator. This will allow customized messages to be sent with 
interactive content. 
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
 - Modified test_slack_webhook_hook.py
 - Modified test_slack_webhook_operator.py
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Slack Attachments for SlackWebhookHook
> --
>
> Key: AIRFLOW-3522
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3522
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Michael Holtzscher
>Assignee: Michael Holtzscher
>Priority: Minor
>
> The SlackWebhookHook and SlackWebhookOperator do not support sending 
> attachments. Adding support for attachments would allow for a much more 
> full-featured Slack messaging experience.
>  
> [Slack Documentation|https://api.slack.com/docs/message-attachments]
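> 
> For illustration, a sketch of how an attachment payload might be passed once the
> operator accepts an `attachments` argument; the argument name and connection id are
> assumptions based on this PR's intent, not the released API:
> {code:python}
> from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator
> 
> # Hypothetical usage; the attachment dict follows Slack's message-attachment schema.
> notify = SlackWebhookOperator(
>     task_id='notify_slack',
>     http_conn_id='slack_webhook',          # assumed connection holding the webhook URL
>     message='Daily load finished',
>     attachments=[{
>         'fallback': 'Daily load finished',
>         'color': '#36a64f',
>         'title': 'ETL status',
>         'fields': [{'title': 'Rows loaded', 'value': '12345', 'short': True}],
>     }],
> )
> {code}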



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2770) kubernetes: add support for dag folder in the docker image

2018-12-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722734#comment-16722734
 ] 

ASF GitHub Bot commented on AIRFLOW-2770:
-

feng-tao closed pull request #4319: [AIRFLOW-2770] Read `dags_in_image` config 
value as a boolean
URL: https://github.com/apache/incubator-airflow/pull/4319
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index a9473178c1..8dc7db5e09 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -605,6 +605,10 @@ namespace = default
 # The name of the Kubernetes ConfigMap Containing the Airflow Configuration 
(this file)
 airflow_configmap =
 
+# If the docker image already contains DAGs, set this to `True` and the 
worker will search for DAGs in dags_folder,
+# otherwise use git sync or a dags volume claim to mount DAGs
+dags_in_image = False
+
 # For either git sync or volume mounted DAGs, the worker will look in this 
subpath for DAGs
 dags_volume_subpath =
 
diff --git a/airflow/contrib/executors/kubernetes_executor.py 
b/airflow/contrib/executors/kubernetes_executor.py
index f9d9ddb0fc..9342ce0a3e 100644
--- a/airflow/contrib/executors/kubernetes_executor.py
+++ b/airflow/contrib/executors/kubernetes_executor.py
@@ -137,6 +137,10 @@ def __init__(self):
 self.kubernetes_section, 'worker_service_account_name')
 self.image_pull_secrets = conf.get(self.kubernetes_section, 
'image_pull_secrets')
 
+# NOTE: user can build the dags into the docker image directly,
+# this will set to True if so
+self.dags_in_image = conf.getboolean(self.kubernetes_section, 
'dags_in_image')
+
 # NOTE: `git_repo` and `git_branch` must be specified together as a 
pair
 # The http URL of the git repository to clone from
 self.git_repo = conf.get(self.kubernetes_section, 'git_repo')
@@ -204,10 +208,12 @@ def __init__(self):
 self._validate()
 
 def _validate(self):
-if not self.dags_volume_claim and (not self.git_repo or not 
self.git_branch):
+if not self.dags_volume_claim and not self.dags_in_image \
+and (not self.git_repo or not self.git_branch):
 raise AirflowConfigException(
 'In kubernetes mode the following must be set in the 
`kubernetes` '
-'config section: `dags_volume_claim` or `git_repo and 
git_branch`')
+'config section: `dags_volume_claim` or `git_repo and 
git_branch` '
+'or `dags_in_image`')
 
 
 class KubernetesJobWatcher(multiprocessing.Process, LoggingMixin, object):
diff --git a/airflow/contrib/kubernetes/worker_configuration.py 
b/airflow/contrib/kubernetes/worker_configuration.py
index f857cbc237..58cf9cbd20 100644
--- a/airflow/contrib/kubernetes/worker_configuration.py
+++ b/airflow/contrib/kubernetes/worker_configuration.py
@@ -38,7 +38,7 @@ def __init__(self, kube_config):
 def _get_init_containers(self, volume_mounts):
 """When using git to retrieve the DAGs, use the GitSync Init 
Container"""
 # If we're using volume claims to mount the dags, no init container is 
needed
-if self.kube_config.dags_volume_claim:
+if self.kube_config.dags_volume_claim or 
self.kube_config.dags_in_image:
 return []
 
 # Otherwise, define a git-sync init container
@@ -128,32 +128,19 @@ def _construct_volume(name, claim):
 return volume
 
 volumes = [
-_construct_volume(
-dags_volume_name,
-self.kube_config.dags_volume_claim
-),
 _construct_volume(
 logs_volume_name,
 self.kube_config.logs_volume_claim
 )
 ]
 
-dag_volume_mount_path = ""
-
-if self.kube_config.dags_volume_claim:
-dag_volume_mount_path = self.worker_airflow_dags
-else:
-dag_volume_mount_path = os.path.join(
-self.worker_airflow_dags,
-self.kube_config.git_subpath
+if not self.kube_config.dags_in_image:
+volumes.append(
+_construct_volume(
+dags_volume_name,
+self.kube_config.dags_volume_claim
+)
 )
-dags_volume_mount = {
-'name': dags_volume_name,
-'mountPath': dag_volume_mount_path,
-'readOnly': True,
-}
-if self.kube_config.dags_volume_subpath:
-dags_volume_mount['subPath'] = 
