Podling Report Reminder - August 2018

2018-08-03 Thread jmclean
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 15 August 2018, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, August 01).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report
---

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report?
*   How has the project developed since the last report?
*   How does the podling rate their own maturity?

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/August2018

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC
---

Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-03 Thread Grant Nicholas
Congrats Feng!

On Fri, Aug 3, 2018 at 12:35 PM Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Well deserved, welcome aboard!
>
> On Fri, Aug 3, 2018 at 9:07 AM Mark Grover 
> wrote:
>
> > Congrats Tao!
> >
> > On Fri, Aug 3, 2018, 08:52 Jin Chang  wrote:
> >
> > > Congrats, Tao!!
> > >
> > > On Fri, Aug 3, 2018 at 8:20 AM Taylor Edmiston 
> > > wrote:
> > >
> > > > Congratulations, Feng!
> > > >
> > > > *Taylor Edmiston*
> > > > Blog  | CV
> > > >  | LinkedIn
> > > >  | AngelList
> > > >  | Stack Overflow
> > > > 
> > > >
> > > >
> > > > On Fri, Aug 3, 2018 at 7:31 AM, Driesprong, Fokko
>  > >
> > > > wrote:
> > > >
> > > > > Welcome Feng! Awesome to have you on board!
> > > > >
> > > > > 2018-08-03 10:41 GMT+02:00 Naik Kaxil :
> > > > >
> > > > > > Hi Airflow'ers,
> > > > > >
> > > > > >
> > > > > >
> > > > > > Please join the Apache Airflow PMC in welcoming its newest member
> > and
> > > > > >
> > > > > > co-committer, Feng Tao (a.k.a. feng-tao<
> > https://github.com/feng-tao
> > > >).
> > > > > >
> > > > > >
> > > > > >
> > > > > > Welcome Feng, great to have you on board!
> > > > > >
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Kaxil
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Kaxil Naik
> > > > > >
> > > > > > Data Reply
> > > > > > 2nd Floor, Nova South
> > > > > > 160 Victoria Street, Westminster
> > > > > >  > > > > Westminster+%0D%0ALondon+SW1E+5LB+-+UK=gmail=g>
> > > > > > London SW1E 5LB - UK
> > > > > > phone: +44 (0)20 7730 6000
> > > > > > k.n...@reply.com
> > > > > > www.reply.com
> > > > > >
> > > > > > [image: Data Reply]
> > > > > >
> > > > >
> > > >
> > >
> >
>


Deploy Airflow on Kubernetes using Airflow Operator

2018-08-03 Thread Barni Seetharaman
Hi

We at Google have just open-sourced a Kubernetes custom controller (also
called an operator) to make deploying and managing Airflow on Kubernetes
simple. The operator pattern is a powerful abstraction in Kubernetes.
Please watch this repo (in the process of adding docs) for further updates.

https://github.com/GoogleCloudPlatform/airflow-operator
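For readers unfamiliar with the operator pattern: a custom controller watches instances of a custom resource and reconciles cluster state to match the declared spec. A minimal sketch of what declaring such a resource might look like from Python follows; note that the API group, kind, and spec fields here are illustrative assumptions, not the actual CRD shipped by the airflow-operator project:

```python
# Sketch only: build a manifest for a hypothetical Airflow custom resource.
# The apiVersion/kind and the spec fields below are invented for
# illustration; consult the airflow-operator repo for the real schema.
def airflow_cluster_manifest(name, executor="Celery", workers=2):
    return {
        "apiVersion": "airflow.example.com/v1alpha1",  # hypothetical group
        "kind": "AirflowCluster",                      # hypothetical kind
        "metadata": {"name": name},
        "spec": {
            "executor": executor,
            "workers": {"replicas": workers},
        },
    }

manifest = airflow_cluster_manifest("demo", workers=3)
# A controller watching this resource type would create/update the
# underlying Deployments, Services, etc. to match the declared spec.
print(manifest["spec"]["workers"]["replicas"])  # → 3
```

The value of the pattern is that users interact only with the declarative resource; the controller encodes the operational knowledge of how to run Airflow.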

Do reach out if you have any questions.

I have also created a channel in the Kubernetes Slack (#airflow-operator)
for any discussions specific to Airflow on Kubernetes (including Daniel's
Kubernetes Executor, the Kubernetes Operator, and this custom controller,
also called the Kubernetes Airflow Operator).

Regards,
Barni


[GitHub] xnuinside commented on issue #3690: [AIRFLOW-2845] Remove asserts from the contrib package

2018-08-03 Thread GitBox
xnuinside commented on issue #3690: [AIRFLOW-2845] Remove asserts from the 
contrib package
URL: 
https://github.com/apache/incubator-airflow/pull/3690#issuecomment-410381632
 
 
   It's a little PR, just for code cleanup.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #2747: AIRFLOW-1772: Fix bug with handling cron expressions as an schedule i…

2018-08-03 Thread GitBox
codecov-io edited a comment on issue #2747: AIRFLOW-1772: Fix bug with handling 
cron expressions as an schedule i…
URL: 
https://github.com/apache/incubator-airflow/pull/2747#issuecomment-341237492
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2747?src=pr=h1)
 Report
   > Merging 
[#2747](https://codecov.io/gh/apache/incubator-airflow/pull/2747?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/4ce25029524e10770c9047ec04d3ab9c6e257cf4?src=pr=desc)
 will **decrease** coverage by `3.37%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/2747/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/2747?src=pr=tree)
   
   ```diff
    @@            Coverage Diff            @@
    ##           master    #2747     +/-   ##
    =========================================
    - Coverage   76.47%    73.1%    -3.38%
    =========================================
      Files         203      156      -47
      Lines       15012    11889    -3123
    =========================================
    - Hits        11480     8691    -2789
    + Misses       3532     3198     -334
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/2747?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/operators/redshift\_to\_s3\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcmVkc2hpZnRfdG9fczNfb3BlcmF0b3IucHk=)
 | `0% <0%> (-95.46%)` | :arrow_down: |
   | 
[airflow/operators/s3\_file\_transform\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfZmlsZV90cmFuc2Zvcm1fb3BlcmF0b3IucHk=)
 | `0% <0%> (-93.62%)` | :arrow_down: |
   | 
[airflow/hooks/mssql\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9tc3NxbF9ob29rLnB5)
 | `6.66% <0%> (-66.67%)` | :arrow_down: |
   | 
[airflow/hooks/S3\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9TM19ob29rLnB5)
 | `30.95% <0%> (-63.65%)` | :arrow_down: |
   | 
[airflow/hooks/hdfs\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9oZGZzX2hvb2sucHk=)
 | `32.5% <0%> (-60%)` | :arrow_down: |
   | 
[airflow/operators/hive\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvaGl2ZV9vcGVyYXRvci5weQ==)
 | `41.02% <0%> (-44.98%)` | :arrow_down: |
   | 
[airflow/operators/sensors.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc2Vuc29ycy5weQ==)
 | `67.7% <0%> (-32.3%)` | :arrow_down: |
   | 
[airflow/utils/log/s3\_task\_handler.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9sb2cvczNfdGFza19oYW5kbGVyLnB5)
 | `72.05% <0%> (-26.52%)` | :arrow_down: |
   | 
[airflow/hooks/hive\_hooks.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9oaXZlX2hvb2tzLnB5)
 | `39.52% <0%> (-22.69%)` | :arrow_down: |
   | 
[airflow/utils/log/file\_task\_handler.py](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZmlsZV90YXNrX2hhbmRsZXIucHk=)
 | `75.9% <0%> (-13.51%)` | :arrow_down: |
   | ... and [195 
more](https://codecov.io/gh/apache/incubator-airflow/pull/2747/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2747?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2747?src=pr=footer).
 Last update 
[4ce2502...6952f0d](https://codecov.io/gh/apache/incubator-airflow/pull/2747?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: Kerberos and Airflow

2018-08-03 Thread Dan Davydov
I designed a system similar to what you are describing, which is in use at
Airbnb (only DAGs on a whitelist were allowed to be merged to the git repo
if they used certain types of impersonation). It worked for simple use
cases, but the problem was that access control becomes very difficult,
e.g. solving the problem of which DAGs map to which manifest files, and
which manifest files can access which secrets.

There is also a security risk where someone changes e.g. a Python file
dependency of your task. Even if you figure out a way to block those kinds
of changes based on your hashing, what happens when there is a legitimate
change in a dependency and you want to recalculate the hash? Then I think
you go back to a solution like your proposed "airflow submit" command to
accomplish this.
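The structural-hash idea being debated in this thread could be sketched roughly as follows. This is a toy model under stated assumptions: plain dicts stand in for Airflow's DAG and task objects, and the field names are invented for illustration:

```python
import hashlib
import json

def structure_hash(dag):
    # Canonicalize the parts of the DAG we want to pin down: task ids,
    # operator types, and upstream dependencies. A real implementation
    # would serialize actual DAG objects; dicts stand in for them here.
    canonical = json.dumps(
        {
            "dag_id": dag["dag_id"],
            "tasks": sorted(
                [t["task_id"], t["operator"], sorted(t["upstream"])]
                for t in dag["tasks"]
            ),
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

dag_v1 = {
    "dag_id": "example",
    "tasks": [
        {"task_id": "extract", "operator": "BashOperator", "upstream": []},
        {"task_id": "load", "operator": "BashOperator",
         "upstream": ["extract"]},
    ],
}
# Same structure parsed twice -> same hash; a structural change -> new hash.
dag_v2 = {**dag_v1, "tasks": dag_v1["tasks"] + [
    {"task_id": "rogue", "operator": "PythonOperator", "upstream": ["load"]},
]}
assert structure_hash(dag_v1) == structure_hash(dag_v1)
assert structure_hash(dag_v1) != structure_hash(dag_v2)
```

A key derived from such a hash could then encrypt task metadata, so a worker only decrypts it if it reproduces the same structure at parse time; the open questions above (legitimate dependency changes, which fields to include) apply to any concrete scheme.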

Additional concerns:
- I'm not sure I'm a fan of having the first time a scheduler parses a DAG
be what creates the hashes either; it feels to me like encryption/hashing
should be done before DAGs are even parsed by the scheduler (at commit
time or submit time of the DAGs)
- The type of the encrypted key seems kind of hacky to me, i.e. some kind
of custom hash based on DAG structure instead of a simple token passed in
by users, which has a clear separation of concerns WRT security
- Added complexity both to Airflow code and to users, as they need to
define or customize hashing functions for DAGs to improve security

If we can get a reasonably secure solution then it might be a reasonable
trade-off, considering the alternative is a major overhaul of/restrictions
on DAGs.

Maybe I'm missing some details that would alleviate my concerns here, and a
bit of a more in-depth document might help?



*Also: using the Kubernetes executor combined with some of the things we
discussed greatly enhances the security of Airflow as the environment
isn't really shared anymore.*
Assuming a multi-tenant scheduler, I feel the same set of hard problems
exists with Kubernetes, as the executor mainly just simplifies the
post-executor parts of task scheduling/execution, which I think you already
outlined a good solution for early on in this thread (passing keys from the
executor to workers).

Happy to set up some time to talk real-time about this by the way, once we
iron out the details I want to implement whatever the best solution we come
up with is.

On Thu, Aug 2, 2018 at 4:13 PM Bolke de Bruin  wrote:

> You mentioned you would like to make sure that the DAG (and its tasks)
> runs in a confined set of settings, i.e. a given set of connections fixed
> at submission time, not at run time. So here we can make use of the fact
> that both the scheduler and the worker parse the DAG.
>
> Firstly, when the scheduler evaluates a DAG it can add an integrity check
> (hash) for each task. The executor can encrypt the
> metadata with this hash ensuring that the structure of the DAG remained
> the same. It means that the task is only
> able to decrypt the metadata when it is able to calculate the same hash.
>
> Similarly, if the scheduler parses a DAG for the first time it can
> register the hashes for the tasks. It can then verify these hashes
> at runtime to ensure the structure of the tasks has stayed the same. In
> the manifest (which could even be in the DAG or
> part of the DAG definition) we could specify which fields would be used
> for hash calculation. We could even specify
> static hashes. This would give flexibility as to what freedom the users
> have in the auto-generated DAGs.
>
> Something like that?
>
> B.
>
> > On 2 Aug 2018, at 20:12, Dan Davydov 
> wrote:
> >
> > I'm very intrigued, and am curious how this would work in a bit more
> > detail, especially for dynamically created DAGs (how would static
> manifests
> > map to DAGs that are generated from rows in a MySQL table for example)?
> You
> > could of course have something like regexes in your manifest file like
> > some_dag_framework_dag_*, but then how would you make sure that other
> users
> > did not create DAGs that matched this regex?
> >
> > On Thu, Aug 2, 2018 at 1:51 PM Bolke de Bruin  > wrote:
> >
> >> Hi Dan,
> >>
> >> I discussed this a little bit with one of the security architects here.
> We
> >> think that
> >> you can have a fair trade off between security and usability by having
> >> a kind of manifest with the dag you are submitting. This manifest can
> then
> >> specify what the generated tasks/dags are allowed to do and what
> metadata
> >> to provide to them. We could also let the scheduler generate hashes per
> >> generated
> >> DAG / task and verify those with an established version (1st run?). This
> >> limits the
> >> attack vector.
> >>
> >> A DagSerializer would be great, but I think it solves a different issue
> >> and the above
> >> is somewhat simpler to implement?
> >>
> >> Bolke
> >>
> >>> On 29 Jul 2018, at 23:47, Dan Davydov 
> >> wrote:
> >>>
> >>> *Let’s say we trust the owner field of the DAGs I think we could do the
> >>> following.*
> >>> 

[GitHub] xnuinside opened a new pull request #3690: [AIRFLOW-2845] Remove asserts from the contrib package

2018-08-03 Thread GitBox
xnuinside opened a new pull request #3690: [AIRFLOW-2845] Remove asserts from 
the contrib package
URL: https://github.com/apache/incubator-airflow/pull/3690
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following 
[AIRFLOW-2845](https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2845)
 issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My 
Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   `assert` statements are used in the Airflow contrib package code, and
given what asserts are really for, that is not correct.
   
   If we look at the documentation, we can see that `assert` is a debug
tool: https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement
and it can also be disabled globally (e.g. when Python runs with `-O`).
   
   So, I just want to change the debug asserts to ValueError and TypeError.
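For context, a minimal illustration of the difference with a hypothetical validation helper (not code from the contrib package): asserts are stripped entirely when Python runs with `-O`, while explicit raises always execute and carry a precise error type.

```python
def set_retries(config, retries):
    # An assert here would vanish under `python -O`, silently accepting
    # bad input:
    #     assert retries >= 0, "retries must be non-negative"
    # Explicit checks survive optimization and raise meaningful errors.
    if not isinstance(retries, int):
        raise TypeError("retries must be an int, got %s"
                        % type(retries).__name__)
    if retries < 0:
        raise ValueError("retries must be non-negative, got %d" % retries)
    config["retries"] = retries
    return config

print(set_retries({}, 3))  # → {'retries': 3}
```

TypeError vs ValueError also lets callers distinguish "wrong kind of argument" from "right kind, bad value", which a bare AssertionError cannot.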
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   It's covered by existing tests. No new features or important changes. 
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil closed pull request #3689: [AIRFLOW-XXX] Update project.rst page

2018-08-03 Thread GitBox
kaxil closed pull request #3689: [AIRFLOW-XXX] Update project.rst page
URL: https://github.com/apache/incubator-airflow/pull/3689
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/docs/project.rst b/docs/project.rst
index d1f2cc010c..cd3b60fca1 100644
--- a/docs/project.rst
+++ b/docs/project.rst
@@ -30,6 +30,7 @@ Committers
 - @fokko (Fokko Driesprong)
 - @ash (Ash Berlin-Taylor)
 - @kaxilnaik (Kaxil Naik)
+- @feng-tao (Tao Feng)
 
 For the full list of contributors, take a look at `Airflow's Github
 Contributor page:


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: Use 'watch' feature of Github instead of this list?

2018-08-03 Thread Maxime Beauchemin
We have an open issue with Apache Infra about this that you can track here:
https://issues.apache.org/jira/browse/INFRA-16854

On Fri, Aug 3, 2018 at 11:29 AM Trent Robbins  wrote:

> Hi All,
>
> Is it possible that people who want to see a notification for every issue
> can subscribe to notifications using the 'watch' feature of GitHub rather
> than mirroring each notification on this list through GitBox?
>
> Thanks!
>
>
>
> Best,
>
> Trent Robbins
> Strategic Consultant for Open Source Software
> Tau Informatics LLC
> desk: 415-404-9452
> tr...@tauinformatics.com
> https://www.linkedin.com/in/trentrobbins
>


Use 'watch' feature of Github instead of this list?

2018-08-03 Thread Trent Robbins
Hi All,

Is it possible that people who want to see a notification for every issue
can subscribe to notifications using the 'watch' feature of GitHub rather
than mirroring each notification on this list through GitBox?

Thanks!



Best,

Trent Robbins
Strategic Consultant for Open Source Software
Tau Informatics LLC
desk: 415-404-9452
tr...@tauinformatics.com
https://www.linkedin.com/in/trentrobbins


[GitHub] codecov-io edited a comment on issue #3689: [AIRFLOW-XXX] Update project.rst page

2018-08-03 Thread GitBox
codecov-io edited a comment on issue #3689: [AIRFLOW-XXX] Update project.rst 
page
URL: 
https://github.com/apache/incubator-airflow/pull/3689#issuecomment-410334094
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=h1)
 Report
   > Merging 
[#3689](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/97bc70bd20d377d8eb372e4a7989ed48f6133ab1?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3689/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=tree)
   
   ```diff
    @@           Coverage Diff           @@
    ##           master    #3689   +/-   ##
    =======================================
      Coverage   77.57%   77.57%
    =======================================
      Files         205      205
      Lines       15771    15771
    =======================================
      Hits        12234    12234
      Misses       3537     3537
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=footer).
 Last update 
[97bc70b...cd4f819](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io commented on issue #3689: [AIRFLOW-XXX] Update project.rst page

2018-08-03 Thread GitBox
codecov-io commented on issue #3689: [AIRFLOW-XXX] Update project.rst page
URL: 
https://github.com/apache/incubator-airflow/pull/3689#issuecomment-410334094
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=h1)
 Report
   > Merging 
[#3689](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/97bc70bd20d377d8eb372e4a7989ed48f6133ab1?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3689/graphs/tree.svg?width=650=pr=WdLKlKHOAU=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=tree)
   
   ```diff
    @@           Coverage Diff           @@
    ##           master    #3689   +/-   ##
    =======================================
      Coverage   77.57%   77.57%
    =======================================
      Files         205      205
      Lines       15771    15771
    =======================================
      Hits        12234    12234
      Misses       3537     3537
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=footer).
 Last update 
[97bc70b...cd4f819](https://codecov.io/gh/apache/incubator-airflow/pull/3689?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook

2018-08-03 Thread GitBox
codecov-io edited a comment on issue #3677: [AIRFLOW-2826] Add 
GoogleCloudKMSHook
URL: 
https://github.com/apache/incubator-airflow/pull/3677#issuecomment-409752876
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3677?src=pr=h1)
 Report
   > Merging 
[#3677](https://codecov.io/gh/apache/incubator-airflow/pull/3677?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/97bc70bd20d377d8eb372e4a7989ed48f6133ab1?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3677/graphs/tree.svg?height=150=650=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3677?src=pr=tree)
   
   ```diff
    @@           Coverage Diff           @@
    ##           master    #3677   +/-   ##
    =======================================
      Coverage   77.57%   77.57%
    =======================================
      Files         205      205
      Lines       15771    15771
    =======================================
      Hits        12234    12234
      Misses       3537     3537
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3677?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3677?src=pr=footer).
 Last update 
[97bc70b...a4b7753](https://codecov.io/gh/apache/incubator-airflow/pull/3677?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] gglanzani commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training

2018-08-03 Thread GitBox
gglanzani commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r207618294
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+"""
+   Initiate a SageMaker training
+
+   This operator returns The ARN of the model created in Amazon SageMaker
+
+   :param training_job_config:
+   The configuration necessary to start a training job (templated)
+   :type training_job_config: dict
+   :param region_name: The AWS region_name
+   :type region_name: string
+   :param sagemaker_conn_id: The SageMaker connection ID to use.
+   :type aws_conn_id: string
 
 Review comment:
   @troychen728 The use case you mention justifies adding the sensor, but not 
removing the functionality from the operator.
   
   Running sequentially is not really suited to production workflows, so it
should not be a concern.
   
   Having a sensor per operator is problematic though (for normal usage),
hence keeping the functionality in the operator sounds the most sensible
to me.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-03 Thread Maxime Beauchemin
Well deserved, welcome aboard!

On Fri, Aug 3, 2018 at 9:07 AM Mark Grover 
wrote:

> Congrats Tao!
>
> On Fri, Aug 3, 2018, 08:52 Jin Chang  wrote:
>
> > Congrats, Tao!!
> >
> > On Fri, Aug 3, 2018 at 8:20 AM Taylor Edmiston 
> > wrote:
> >
> > > Congratulations, Feng!
> > >
> > > *Taylor Edmiston*
> > > Blog  | CV
> > >  | LinkedIn
> > >  | AngelList
> > >  | Stack Overflow
> > > 
> > >
> > >
> > > On Fri, Aug 3, 2018 at 7:31 AM, Driesprong, Fokko  >
> > > wrote:
> > >
> > > > Welcome Feng! Awesome to have you on board!
> > > >
> > > > 2018-08-03 10:41 GMT+02:00 Naik Kaxil :
> > > >
> > > > > Hi Airflow'ers,
> > > > >
> > > > >
> > > > >
> > > > > Please join the Apache Airflow PMC in welcoming its newest member
> and
> > > > >
> > > > > co-committer, Feng Tao (a.k.a. feng-tao<
> https://github.com/feng-tao
> > >).
> > > > >
> > > > >
> > > > >
> > > > > Welcome Feng, great to have you on board!
> > > > >
> > > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Kaxil
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Kaxil Naik
> > > > >
> > > > > Data Reply
> > > > > 2nd Floor, Nova South
> > > > > 160 Victoria Street, Westminster
> > > > >  > > > Westminster+%0D%0ALondon+SW1E+5LB+-+UK=gmail=g>
> > > > > London SW1E 5LB - UK
> > > > > phone: +44 (0)20 7730 6000
> > > > > k.n...@reply.com
> > > > > www.reply.com
> > > > >
> > > > > [image: Data Reply]
> > > > >
> > > >
> > >
> >
>


[GitHub] feng-tao commented on issue #3689: [AIRFLOW-XXX] Update project.rst page

2018-08-03 Thread GitBox
feng-tao commented on issue #3689: [AIRFLOW-XXX] Update project.rst page
URL: 
https://github.com/apache/incubator-airflow/pull/3689#issuecomment-410324053
 
 
   PTAL @kaxil 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao opened a new pull request #3689: [AIRFLOW-XXX] Update project.rst page

2018-08-03 Thread GitBox
feng-tao opened a new pull request #3689: [AIRFLOW-XXX] Update project.rst page
URL: https://github.com/apache/incubator-airflow/pull/3689
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Update the project page.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[GitHub] troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training

2018-08-03 Thread GitBox
troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r207614985
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+    """
+    Initiate a SageMaker training job.
+
+    This operator returns the ARN of the model created in Amazon SageMaker.
+
+    :param training_job_config:
+        The configuration necessary to start a training job (templated)
+    :type training_job_config: dict
+    :param region_name: The AWS region_name
+    :type region_name: string
+    :param sagemaker_conn_id: The SageMaker connection ID to use.
+    :type sagemaker_conn_id: string
 
 Review comment:
   @Fokko 
   Thank you very much for your reply. I agree that multiple jobs can still 
run in parallel, but to my understanding there is also an option to run them 
sequentially. More importantly, what if a training job is already running 
(whether initiated from Airflow or by other means) before a DAG is scheduled 
to run? Since a training job can take days to finish, I think this is a 
potential use case for Airflow users. A DAG could then start with a sensor 
and have other nodes depend on that sensor. This would not be possible if the 
operator and sensor were coupled together.




[GitHub] jakahn commented on a change in pull request #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook

2018-08-03 Thread GitBox
jakahn commented on a change in pull request #3677: [AIRFLOW-2826] Add 
GoogleCloudKMSHook
URL: https://github.com/apache/incubator-airflow/pull/3677#discussion_r207613456
 
 

 ##
 File path: airflow/contrib/hooks/gcp_kms_hook.py
 ##
 @@ -0,0 +1,108 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import base64
+
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+from apiclient.discovery import build
+
+
+def _b64encode(s):
+    """ Base 64 encodes a bytes object to a string """
+    return base64.b64encode(s).decode('ascii')
+
+
+def _b64decode(s):
+    """ Base 64 decodes a string to bytes. """
+    return base64.b64decode(s.encode('utf-8'))
+
+
+class GoogleCloudKMSHook(GoogleCloudBaseHook):
+    """
+    Interact with Google Cloud KMS. This hook uses the Google Cloud Platform
+    connection.
+    """
+
+    def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None):
+        super(GoogleCloudKMSHook, self).__init__(gcp_conn_id,
+                                                 delegate_to=delegate_to)
+
+    def get_conn(self):
+        """
+        Returns a KMS service object.
+
+        :rtype: apiclient.discovery.Resource
+        """
+        http_authorized = self._authorize()
+        return build(
+            'cloudkms', 'v1', http=http_authorized, cache_discovery=False)
+
+    def encrypt(self, key_name, plaintext, authenticated_data=None):
+        """
+        Encrypts a plaintext message using Google Cloud KMS.
+
+        :param key_name: The Resource Name for the key (or key version)
+                         to be used for encryption. Of the form
+                         ``projects/*/locations/*/keyRings/*/cryptoKeys/**``
+        :type key_name: str
+        :param plaintext: The message to be encrypted.
+        :type plaintext: bytes
+        :param authenticated_data: Optional additional authenticated data that
+                                   must also be provided to decrypt the
+                                   message.
+        :type authenticated_data: str
 
 Review comment:
   Done




[GitHub] XD-DENG commented on a change in pull request #3688: [AIRFLOW-2843] ExternalTaskSensor-check if external task exists

2018-08-03 Thread GitBox
XD-DENG commented on a change in pull request #3688: [AIRFLOW-2843] 
ExternalTaskSensor-check if external task exists
URL: https://github.com/apache/incubator-airflow/pull/3688#discussion_r207610690
 
 

 ##
 File path: airflow/sensors/external_task_sensor.py
 ##
 @@ -70,9 +76,24 @@ def __init__(self,
         self.execution_date_fn = execution_date_fn
         self.external_dag_id = external_dag_id
         self.external_task_id = external_task_id
+        self.check_existence = check_existence
 
     @provide_session
     def poke(self, context, session=None):
+        TI = TaskInstance
+
+        if self.check_existence:
+            existence = session.query(TI).filter(
+                TI.dag_id == self.external_dag_id,
+                TI.task_id == self.external_task_id,
+            ).count()
+            session.commit()
+            if existence == 0:
+                raise AirflowException('The external task "' +
 
 Review comment:
   But I do agree this feature may be too specific and only applicable to 
limited use cases (like some I'm dealing with).
   
   Please share your thoughts, and feel free to close this PR should you find 
pro < con. Thanks.




[GitHub] XD-DENG commented on a change in pull request #3688: [AIRFLOW-2843] ExternalTaskSensor-check if external task exists

2018-08-03 Thread GitBox
XD-DENG commented on a change in pull request #3688: [AIRFLOW-2843] 
ExternalTaskSensor-check if external task exists
URL: https://github.com/apache/incubator-airflow/pull/3688#discussion_r207606729
 
 

 ##
 File path: airflow/sensors/external_task_sensor.py
 ##
 @@ -70,9 +76,24 @@ def __init__(self,
         self.execution_date_fn = execution_date_fn
         self.external_dag_id = external_dag_id
         self.external_task_id = external_task_id
+        self.check_existence = check_existence
 
     @provide_session
     def poke(self, context, session=None):
+        TI = TaskInstance
+
+        if self.check_existence:
+            existence = session.query(TI).filter(
+                TI.dag_id == self.external_dag_id,
+                TI.task_id == self.external_task_id,
+            ).count()
+            session.commit()
+            if existence == 0:
+                raise AirflowException('The external task "' +
 
 Review comment:
   There may be a few cases:
   - The external DAG ID specified is wrong (due to reasons like typo);
   - The external task specified doesn't exist in the corresponding DAG 
(similar reason).
   - ...
   
   Starting on time or not doesn't matter much here, since only the DAG ID 
and task ID are used for the query.
   
   I consider this feature a "guard" against entering a wrong DAG ID or task 
ID. Without it, this type of error can be hard to find (the sensor keeps 
waiting and eventually leaves the impression that it fails only because the 
external task has not executed or finished yet).
   
   Given that the default value of the new argument is `False`, it will not 
change current behavior.
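   The guard described above boils down to count-then-raise. A minimal 
self-contained sketch, with a plain Python list standing in for the SQLAlchemy 
`TaskInstance` query (the function and variable names here are illustrative, 
not from the PR):

```python
def check_external_task(task_instances, external_dag_id, external_task_id,
                        check_existence=False):
    """Stand-in for the proposed poke() guard: task_instances plays the
    role of the TaskInstance table queried via SQLAlchemy."""
    if not check_existence:
        return
    existence = sum(1 for dag_id, task_id in task_instances
                    if dag_id == external_dag_id
                    and task_id == external_task_id)
    if existence == 0:
        raise RuntimeError('The external task "%s" in DAG "%s" does not exist.'
                           % (external_task_id, external_dag_id))


rows = [("etl_dag", "extract"), ("etl_dag", "load")]
check_external_task(rows, "etl_dag", "load", check_existence=True)  # exists: no error
check_external_task(rows, "etl_dag", "lod")  # default False: guard is skipped
```

   With `check_existence=True` and a mistyped task ID, the guard raises 
immediately instead of poking forever.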




[GitHub] feng-tao commented on a change in pull request #3688: [AIRFLOW-2843] ExternalTaskSensor-check if external task exists

2018-08-03 Thread GitBox
feng-tao commented on a change in pull request #3688: [AIRFLOW-2843] 
ExternalTaskSensor-check if external task exists
URL: https://github.com/apache/incubator-airflow/pull/3688#discussion_r207604878
 
 

 ##
 File path: airflow/sensors/external_task_sensor.py
 ##
 @@ -70,9 +76,24 @@ def __init__(self,
         self.execution_date_fn = execution_date_fn
         self.external_dag_id = external_dag_id
         self.external_task_id = external_task_id
+        self.check_existence = check_existence
 
     @provide_session
     def poke(self, context, session=None):
+        TI = TaskInstance
+
+        if self.check_existence:
+            existence = session.query(TI).filter(
+                TI.dag_id == self.external_dag_id,
+                TI.task_id == self.external_task_id,
+            ).count()
+            session.commit()
+            if existence == 0:
+                raise AirflowException('The external task "' +
 
 Review comment:
   Why stop waiting if the external task doesn't exist? Shouldn't the right 
behavior be to continue waiting until the task exists and finishes (or times 
out)? Sometimes the external task may not start right on time (e.g., for 
scheduler reasons), and we still want the downstream task to wait for the 
external task to finish first, to make sure the results align.




[GitHub] codecov-io commented on issue #3688: [AIRFLOW-2843] ExternalTaskSensor-check if external task exists

2018-08-03 Thread GitBox
codecov-io commented on issue #3688: [AIRFLOW-2843] ExternalTaskSensor-check if 
external task exists
URL: 
https://github.com/apache/incubator-airflow/pull/3688#issuecomment-410310567
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3688?src=pr=h1)
 Report
   > Merging 
[#3688](https://codecov.io/gh/apache/incubator-airflow/pull/3688?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/97bc70bd20d377d8eb372e4a7989ed48f6133ab1?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3688/graphs/tree.svg?width=650=pr=WdLKlKHOAU=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3688?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3688      +/-   ##
   ==========================================
   + Coverage   77.57%   77.57%    +<.01%
   ==========================================
     Files         205      205
     Lines       15771    15778        +7
   ==========================================
   + Hits        12234    12240        +6
   - Misses       3537     3538        +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3688?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/sensors/external\_task\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3688/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL2V4dGVybmFsX3Rhc2tfc2Vuc29yLnB5)
 | `97.43% <100%> (+0.56%)` | :arrow_up: |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3688/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.54% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3688?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3688?src=pr=footer).
 Last update 
[97bc70b...2605258](https://codecov.io/gh/apache/incubator-airflow/pull/3688?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] jakahn commented on a change in pull request #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook

2018-08-03 Thread GitBox
jakahn commented on a change in pull request #3677: [AIRFLOW-2826] Add 
GoogleCloudKMSHook
URL: https://github.com/apache/incubator-airflow/pull/3677#discussion_r207600866
 
 

 ##
 File path: airflow/contrib/hooks/gcp_kms_hook.py
 ##
 @@ -0,0 +1,108 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import base64
+
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+from apiclient.discovery import build
+
+
+def _b64encode(s):
+    """ Base 64 encodes a bytes object to a string """
+    return base64.b64encode(s).decode('ascii')
+
+
+def _b64decode(s):
+    """ Base 64 decodes a string to bytes. """
+    return base64.b64decode(s.encode('utf-8'))
+
+
+class GoogleCloudKMSHook(GoogleCloudBaseHook):
+    """
+    Interact with Google Cloud KMS. This hook uses the Google Cloud Platform
+    connection.
+    """
+
+    def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None):
+        super(GoogleCloudKMSHook, self).__init__(gcp_conn_id,
+                                                 delegate_to=delegate_to)
+
+    def get_conn(self):
+        """
+        Returns a KMS service object.
+
+        :rtype: apiclient.discovery.Resource
+        """
+        http_authorized = self._authorize()
+        return build(
+            'cloudkms', 'v1', http=http_authorized, cache_discovery=False)
+
+    def encrypt(self, key_name, plaintext, authenticated_data=None):
+        """
+        Encrypts a plaintext message using Google Cloud KMS.
+
+        :param key_name: The Resource Name for the key (or key version)
+                         to be used for encryption. Of the form
+                         ``projects/*/locations/*/keyRings/*/cryptoKeys/**``
+        :type key_name: str
+        :param plaintext: The message to be encrypted.
+        :type plaintext: bytes
+        :param authenticated_data: Optional additional authenticated data that
+                                   must also be provided to decrypt the
+                                   message.
+        :type authenticated_data: str
+        :return: The base 64 encoded ciphertext of the original message.
+        :rtype: str
+        """
+        keys = self.get_conn().projects().locations().keyRings().cryptoKeys()
+        body = {'plaintext': _b64encode(plaintext)}
+        if authenticated_data:
+            body['additionalAuthenticatedData'] = _b64encode(authenticated_data)
+
+        request = keys.encrypt(name=key_name, body=body)
+        response = request.execute()
+
+        ciphertext = response['ciphertext']
+        return ciphertext
+
+    def decrypt(self, key_name, ciphertext, authenticated_data=None):
+        """
+        Decrypts a ciphertext message using Google Cloud KMS.
+
+        :param key_name: The Resource Name for the key to be used for
+                         decryption. Of the form
+                         ``projects/*/locations/*/keyRings/*/cryptoKeys/**``
+        :type key_name: str
+        :param ciphertext: The message to be decrypted.
+        :type ciphertext: str
+        :param authenticated_data: Any additional authenticated data that was
+                                   provided when encrypting the message.
+        :type authenticated_data: str
+        :return: The original message.
+        :rtype: bytes
+        """
+        keys = self.get_conn().projects().locations().keyRings().cryptoKeys()
+        body = {'ciphertext': ciphertext}
+        if authenticated_data:
+            body['additionalAuthenticatedData'] = _b64encode(authenticated_data)
+
+        request = keys.decrypt(name=key_name, body=body)
+        response = request.execute()
+
+        plaintext = _b64decode(response['plaintext'])
 
 Review comment:
   I have; all calls to the API result in base64-encoded strings.




Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-03 Thread Mark Grover
Congrats Tao!

On Fri, Aug 3, 2018, 08:52 Jin Chang  wrote:

> Congrats, Tao!!
>
> On Fri, Aug 3, 2018 at 8:20 AM Taylor Edmiston 
> wrote:
>
> > Congratulations, Feng!
> >
> > *Taylor Edmiston*
> > Blog  | CV
> >  | LinkedIn
> >  | AngelList
> >  | Stack Overflow
> > 
> >
> >
> > On Fri, Aug 3, 2018 at 7:31 AM, Driesprong, Fokko 
> > wrote:
> >
> > > Welcome Feng! Awesome to have you on board!
> > >
> > > 2018-08-03 10:41 GMT+02:00 Naik Kaxil :
> > >
> > > > Hi Airflow'ers,
> > > >
> > > >
> > > >
> > > > Please join the Apache Airflow PMC in welcoming its newest member and
> > > >
> > > > co-committer, Feng Tao (a.k.a. feng-tao >).
> > > >
> > > >
> > > >
> > > > Welcome Feng, great to have you on board!
> > > >
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > Kaxil
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Kaxil Naik
> > > >
> > > > Data Reply
> > > > 2nd Floor, Nova South
> > > > 160 Victoria Street, Westminster
> > > > London SW1E 5LB - UK
> > > > phone: +44 (0)20 7730 6000
> > > > k.n...@reply.com
> > > > www.reply.com
> > > >
> > > > [image: Data Reply]
> > > >
> > >
> >
>


Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-03 Thread Jin Chang
Congrats, Tao!!

On Fri, Aug 3, 2018 at 8:20 AM Taylor Edmiston  wrote:

> Congratulations, Feng!
>
> *Taylor Edmiston*
> Blog  | CV
>  | LinkedIn
>  | AngelList
>  | Stack Overflow
> 
>
>
> On Fri, Aug 3, 2018 at 7:31 AM, Driesprong, Fokko 
> wrote:
>
> > Welcome Feng! Awesome to have you on board!
> >
> > 2018-08-03 10:41 GMT+02:00 Naik Kaxil :
> >
> > > Hi Airflow'ers,
> > >
> > >
> > >
> > > Please join the Apache Airflow PMC in welcoming its newest member and
> > >
> > > co-committer, Feng Tao (a.k.a. feng-tao).
> > >
> > >
> > >
> > > Welcome Feng, great to have you on board!
> > >
> > >
> > >
> > > Cheers,
> > >
> > > Kaxil
> > >
> > >
> > >
> > >
> > >
> > >
> > > Kaxil Naik
> > >
> > > Data Reply
> > > 2nd Floor, Nova South
> > > 160 Victoria Street, Westminster
> > > London SW1E 5LB - UK
> > > phone: +44 (0)20 7730 6000
> > > k.n...@reply.com
> > > www.reply.com
> > >
> > > [image: Data Reply]
> > >
> >
>


Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-03 Thread Dan Davydov
Welcome Feng, awesome work :)!

On Fri, Aug 3, 2018 at 11:20 AM Taylor Edmiston  wrote:

> Congratulations, Feng!
>
> *Taylor Edmiston*
> Blog  | CV
>  | LinkedIn
>  | AngelList
>  | Stack Overflow
> 
>
>
> On Fri, Aug 3, 2018 at 7:31 AM, Driesprong, Fokko 
> wrote:
>
> > Welcome Feng! Awesome to have you on board!
> >
> > 2018-08-03 10:41 GMT+02:00 Naik Kaxil :
> >
> > > Hi Airflow'ers,
> > >
> > >
> > >
> > > Please join the Apache Airflow PMC in welcoming its newest member and
> > >
> > > co-committer, Feng Tao (a.k.a. feng-tao).
> > >
> > >
> > >
> > > Welcome Feng, great to have you on board!
> > >
> > >
> > >
> > > Cheers,
> > >
> > > Kaxil
> > >
> > >
> > >
> > >
> > >
> > >
> > > Kaxil Naik
> > >
> > > Data Reply
> > > 2nd Floor, Nova South
> > > 160 Victoria Street, Westminster
> > > London SW1E 5LB - UK
> > > phone: +44 (0)20 7730 6000
> > > k.n...@reply.com
> > > www.reply.com
> > >
> > > [image: Data Reply]
> > >
> >
>


Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-03 Thread Taylor Edmiston
Congratulations, Feng!

*Taylor Edmiston*
Blog  | CV
 | LinkedIn
 | AngelList
 | Stack Overflow



On Fri, Aug 3, 2018 at 7:31 AM, Driesprong, Fokko 
wrote:

> Welcome Feng! Awesome to have you on board!
>
> 2018-08-03 10:41 GMT+02:00 Naik Kaxil :
>
> > Hi Airflow'ers,
> >
> >
> >
> > Please join the Apache Airflow PMC in welcoming its newest member and
> >
> > co-committer, Feng Tao (a.k.a. feng-tao).
> >
> >
> >
> > Welcome Feng, great to have you on board!
> >
> >
> >
> > Cheers,
> >
> > Kaxil
> >
> >
> >
> >
> >
> >
> > Kaxil Naik
> >
> > Data Reply
> > 2nd Floor, Nova South
> > 160 Victoria Street, Westminster
> > London SW1E 5LB - UK
> > phone: +44 (0)20 7730 6000
> > k.n...@reply.com
> > www.reply.com
> >
> > [image: Data Reply]
> >
>


[GitHub] XD-DENG opened a new pull request #3688: [AIRFLOW-2843] ExternalTaskSensor-check if external task exists

2018-08-03 Thread GitBox
XD-DENG opened a new pull request #3688: [AIRFLOW-2843] 
ExternalTaskSensor-check if external task exists
URL: https://github.com/apache/incubator-airflow/pull/3688
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2843
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
    Background
   `ExternalTaskSensor` will keep waiting (within the restrictions of retries, 
poke_interval, etc.), even if the external task specified doesn't exist at 
all. In some cases this waiting may still make sense, as a new DAG may 
backfill.
   
   But it may be good to provide an option to cease waiting immediately if the 
external task specified doesn't exist.
   
    Proposal
   Provide an argument `check_existence`. Set to `True` to check if the 
external task exists, and immediately cease waiting if the external task does 
not exist.
   
   **The default value is set to `False` (no check or ceasing will happen), so 
it will not affect any existing DAGs or user expectation.**
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[GitHub] XD-DENG commented on issue #3674: [AIRFLOW-2836] Minor improvement of contrib.sensors.FileSensor

2018-08-03 Thread GitBox
XD-DENG commented on issue #3674: [AIRFLOW-2836] Minor improvement of 
contrib.sensors.FileSensor
URL: 
https://github.com/apache/incubator-airflow/pull/3674#issuecomment-410261523
 
 
   Thanks @Fokko 




[GitHub] Fokko closed pull request #3674: [AIRFLOW-2836] Minor improvement of contrib.sensors.FileSensor

2018-08-03 Thread GitBox
Fokko closed pull request #3674: [AIRFLOW-2836] Minor improvement of 
contrib.sensors.FileSensor
URL: https://github.com/apache/incubator-airflow/pull/3674
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/sensors/file_sensor.py b/airflow/contrib/sensors/file_sensor.py
index 3f7bb24e08..3e49abdfb5 100644
--- a/airflow/contrib/sensors/file_sensor.py
+++ b/airflow/contrib/sensors/file_sensor.py
@@ -46,7 +46,7 @@ class FileSensor(BaseSensorOperator):
     @apply_defaults
     def __init__(self,
                  filepath,
-                 fs_conn_id='fs_default2',
+                 fs_conn_id='fs_default',
                  *args,
                  **kwargs):
         super(FileSensor, self).__init__(*args, **kwargs)
@@ -56,7 +56,7 @@ def __init__(self,
     def poke(self, context):
         hook = FSHook(self.fs_conn_id)
         basepath = hook.get_path()
-        full_path = "/".join([basepath, self.filepath])
+        full_path = os.path.join(basepath, self.filepath)
         self.log.info('Poking for file {full_path}'.format(**locals()))
         try:
             if stat.S_ISDIR(os.stat(full_path).st_mode):
diff --git a/tests/contrib/sensors/test_file_sensor.py b/tests/contrib/sensors/test_file_sensor.py
index d78400e317..0bb0007c60 100644
--- a/tests/contrib/sensors/test_file_sensor.py
+++ b/tests/contrib/sensors/test_file_sensor.py
@@ -125,6 +125,18 @@ def test_file_in_dir(self):
         finally:
             shutil.rmtree(dir)
 
+    def test_default_fs_conn_id(self):
+        with tempfile.NamedTemporaryFile() as tmp:
+            task = FileSensor(
+                task_id="test",
+                filepath=tmp.name[1:],
+                dag=self.dag,
+                timeout=0,
+            )
+            task._hook = self.hook
+            task.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE,
+                     ignore_ti_state=True)
+
 
 if __name__ == '__main__':
     unittest.main()
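The switch from `"/".join` to `os.path.join` in this diff is more than 
cosmetic; a quick sketch of the behavioral difference (example paths are 
made up):

```python
import os

# os.path.join avoids doubled separators, unlike naive string joining:
assert os.path.join("/data", "file.txt") == "/data/file.txt"
assert os.path.join("/data/", "file.txt") == "/data/file.txt"
assert "/".join(["/data/", "file.txt"]) == "/data//file.txt"

# And an absolute second component replaces the base entirely:
assert os.path.join("/data", "/abs/file.txt") == "/abs/file.txt"
```

This last property is presumably why the new test passes `tmp.name[1:]`: with 
a leading slash the filepath would be absolute and `basepath` would be 
discarded.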


 




[GitHub] Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training

2018-08-03 Thread GitBox
Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon 
SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r207548867
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+    """
+    Initiate a SageMaker training job.
+
+    This operator returns the ARN of the model created in Amazon SageMaker.
+
+    :param training_job_config:
+        The configuration necessary to start a training job (templated)
+    :type training_job_config: dict
+    :param region_name: The AWS region_name
+    :type region_name: string
+    :param sagemaker_conn_id: The SageMaker connection ID to use.
+    :type sagemaker_conn_id: string
 
 Review comment:
   @srrajeev-aws In this case you would just kick off multiple operators in 
parallel. This is inherent to the concept of a DAG: if the training jobs don't 
have any dependencies on each other, they will simply run in parallel. The 
only flexibility gained by decoupling the kick-off of the job from the 
monitoring of the job is for the case where you don't care about the outcome 
of the job. This is analogous to Druid, where an indexing job can take up to 
a couple of hours.
   Having a separate operator and sensor would make the DAGs unnecessarily 
complicated, since in practice you will always use them as a pair.




[GitHub] Fokko commented on issue #3475: [AIRFLOW-2315] Improve S3Hook

2018-08-03 Thread GitBox
Fokko commented on issue #3475: [AIRFLOW-2315] Improve S3Hook
URL: 
https://github.com/apache/incubator-airflow/pull/3475#issuecomment-410255815
 
 
   The tests are still red:
   ```
==
25) ERROR: test_execute (tests.contrib.operators.test_gcs_to_s3_operator.GoogleCloudStorageToS3OperatorTest)
--
Traceback (most recent call last):
  .tox/py27-backend_mysql/lib/python2.7/site-packages/moto/core/models.py line 70 in wrapper
    result = func(*args, **kwargs)
  .tox/py27-backend_mysql/lib/python2.7/site-packages/mock/mock.py line 1305 in patched
    return func(*args, **keywargs)
  tests/contrib/operators/test_gcs_to_s3_operator.py line 70 in test_execute
    uploaded_files = operator.execute(None)
  airflow/contrib/operators/gcs_to_s3.py line 107 in execute
    replace=self.replace)
  airflow/hooks/S3_hook.py line 379 in load_bytes
    upload_args)
  airflow/hooks/S3_hook.py line 425 in _prepare_load
    connection_object = self.get_connection(self.aws_conn_id)
  airflow/hooks/base_hook.py line 80 in get_connection
    conn = random.choice(cls.get_connections(conn_id))
  airflow/hooks/base_hook.py line 71 in get_connections
    conn = cls._get_connection_from_env(conn_id)
  airflow/hooks/base_hook.py line 63 in _get_connection_from_env
    environment_uri = os.environ.get(CONN_ENV_PREFIX + conn_id.upper())
AttributeError: 'NoneType' object has no attribute 'upper'
   ```
   
   Could you rebase onto master? Thanks!
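
As an aside, the AttributeError at the bottom of that trace comes from `conn_id` being `None` when the env lookup builds `CONN_ENV_PREFIX + conn_id.upper()`. A minimal sketch of a friendlier guard; the helper name and the `environ` parameter are hypothetical, and only the `AIRFLOW_CONN_` prefix matches the real hook:

```python
CONN_ENV_PREFIX = 'AIRFLOW_CONN_'  # same prefix as airflow/hooks/base_hook.py

def get_connection_uri(conn_id, environ):
    """Look up a connection URI in the environment, failing early with a clear
    error instead of the opaque AttributeError raised by None.upper()."""
    if conn_id is None:
        raise ValueError("conn_id must not be None; "
                         "was aws_conn_id passed through to the hook?")
    return environ.get(CONN_ENV_PREFIX + conn_id.upper())

# Normal lookup: the conn_id is upper-cased and prefixed.
print(get_connection_uri('my_s3', {'AIRFLOW_CONN_MY_S3': 's3://bucket'}))
```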




[GitHub] Fokko commented on issue #3686: [AIRFLOW-2796] Expand code coverage for utils/helpers.py

2018-08-03 Thread GitBox
Fokko commented on issue #3686: [AIRFLOW-2796] Expand code coverage for 
utils/helpers.py
URL: 
https://github.com/apache/incubator-airflow/pull/3686#issuecomment-410254009
 
 
   Thanks @Noremac201 Appreciate it!




[GitHub] Fokko closed pull request #3686: [AIRFLOW-2796] Expand code coverage for utils/helpers.py

2018-08-03 Thread GitBox
Fokko closed pull request #3686: [AIRFLOW-2796] Expand code coverage for 
utils/helpers.py
URL: https://github.com/apache/incubator-airflow/pull/3686
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/tests/utils/test_helpers.py b/tests/utils/test_helpers.py
index 1005671e9e..b2e79560f4 100644
--- a/tests/utils/test_helpers.py
+++ b/tests/utils/test_helpers.py
@@ -117,5 +117,62 @@ def test_reduce_in_chunks(self):
  14)
 
 
+class HelpersTest(unittest.TestCase):
+    def test_as_tuple_iter(self):
+        test_list = ['test_str']
+        as_tup = helpers.as_tuple(test_list)
+        self.assertTupleEqual(tuple(test_list), as_tup)
+
+    def test_as_tuple_no_iter(self):
+        test_str = 'test_str'
+        as_tup = helpers.as_tuple(test_str)
+        self.assertTupleEqual((test_str,), as_tup)
+
+    def test_is_in(self):
+        from airflow.utils import helpers
+        # `is_in` expects an object and a list as input
+
+        test_dict = {'test': 1}
+        test_list = ['test', 1, dict()]
+        small_i = 3
+        big_i = 2 ** 31
+        test_str = 'test_str'
+        test_tup = ('test', 'tuple')
+
+        test_container = [test_dict, test_list, small_i, big_i, test_str, test_tup]
+
+        # Test that integers are referenced as the same object
+        self.assertTrue(helpers.is_in(small_i, test_container))
+        self.assertTrue(helpers.is_in(3, test_container))
+
+        # Python caches small integers, so `small_i is 3` is True,
+        # but `big_i is 2 ** 31` is False.
+        self.assertTrue(helpers.is_in(big_i, test_container))
+        self.assertFalse(helpers.is_in(2 ** 31, test_container))
+
+        self.assertTrue(helpers.is_in(test_dict, test_container))
+        self.assertFalse(helpers.is_in({'test': 1}, test_container))
+
+        self.assertTrue(helpers.is_in(test_list, test_container))
+        self.assertFalse(helpers.is_in(['test', 1, dict()], test_container))
+
+        self.assertTrue(helpers.is_in(test_str, test_container))
+        self.assertTrue(helpers.is_in('test_str', test_container))
+        bad_str = 'test_'
+        bad_str += 'str'
+        self.assertFalse(helpers.is_in(bad_str, test_container))
+
+        self.assertTrue(helpers.is_in(test_tup, test_container))
+        self.assertFalse(helpers.is_in(('test', 'tuple'), test_container))
+        bad_tup = ('test', 'tuple', 'hello')
+        self.assertFalse(helpers.is_in(bad_tup[:2], test_container))
+
+    def test_is_container(self):
+        self.assertTrue(helpers.is_container(['test_list']))
+        self.assertFalse(helpers.is_container('test_str_not_iterable'))
+        # Pass an object that is neither an iterable nor a string.
+        self.assertFalse(helpers.is_container(10))
+
+
 if __name__ == '__main__':
     unittest.main()
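
The identity-versus-equality behaviour these tests exercise can be reduced to a few lines. Assumption: this mirrors the semantics of `airflow.utils.helpers.is_in`; the large integer is built at runtime so that constant folding cannot hand both sides the same object:

```python
def is_in(obj, container):
    # Identity-based membership: True only when the very same object is present.
    return any(obj is item for item in container)

small = 3
big = int('2147483648')   # 2 ** 31, constructed at runtime: a fresh int object
container = [small, big]

assert is_in(3, container)                      # CPython interns small integers
assert not is_in(int('2147483648'), container)  # equal value, different object
assert int('2147483648') in container           # `in` compares by equality
```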


 




[GitHub] bolkedebruin closed pull request #2416: try to enable password for sshpass

2018-08-03 Thread GitBox
bolkedebruin closed pull request #2416: try to enable password for sshpass
URL: https://github.com/apache/incubator-airflow/pull/2416
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/ssh_hook.py 
b/airflow/contrib/hooks/ssh_hook.py
index e63a65d174..fc0cf5f324 100755
--- a/airflow/contrib/hooks/ssh_hook.py
+++ b/airflow/contrib/hooks/ssh_hook.py
@@ -72,8 +72,8 @@ def _host_ref(self):
 
     def _prepare_command(self, cmd):
         connection_cmd = ["ssh", self._host_ref(), "-o", "ControlMaster=no"]
-        if self.sshpass:
-            connection_cmd = ["sshpass", "-e"] + connection_cmd
+        if self.conn.password:
+            connection_cmd = ["sshpass", "-p", self.conn.password] + connection_cmd
         else:
             connection_cmd += ["-o", "BatchMode=yes"]  # no password prompts
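
For context on the diff above, here is a simplified, hypothetical re-implementation of the `_prepare_command` logic showing both variants: `sshpass -e` reads the password from the `SSHPASS` environment variable, while `sshpass -p` puts it on the command line, where it is visible to other local users via `ps`:

```python
def prepare_ssh_cmd(host, password=None, use_env=False):
    """Build the ssh command list (hypothetical standalone sketch).

    use_env=True passes the password via the SSHPASS env var (sshpass -e),
    keeping it out of the process list; use_env=False passes it on the
    command line (sshpass -p), as the patch above does.
    """
    cmd = ["ssh", host, "-o", "ControlMaster=no"]
    if password:
        cmd = (["sshpass", "-e"] if use_env else ["sshpass", "-p", password]) + cmd
    else:
        cmd += ["-o", "BatchMode=yes"]  # no password prompts
    return cmd
```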
 


 




Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-03 Thread Driesprong, Fokko
Welcome Feng! Awesome to have you on board!

2018-08-03 10:41 GMT+02:00 Naik Kaxil :

> Hi Airflow'ers,
>
>
>
> Please join the Apache Airflow PMC in welcoming its newest member and
>
> co-committer, Feng Tao (a.k.a. feng-tao).
>
>
>
> Welcome Feng, great to have you on board!
>
>
>
> Cheers,
>
> Kaxil
>
>
>
>
>
>
> Kaxil Naik
>
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> 
> London SW1E 5LB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
>
> [image: Data Reply]
>


Re: [VOTE] Airflow 1.10.0rc3

2018-08-03 Thread Driesprong, Fokko
+1 Binding

Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz

Cheers, Fokko

2018-08-03 9:47 GMT+02:00 Bolke de Bruin :

> Hey all,
>
> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release,
> which will last for 72 hours. Consider this my (binding) +1.
>
> Airflow 1.10.0 RC 3 is available at:
>
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>
> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
> comes with INSTALL instructions.
> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
> "sdist"
> release.
>
> Public keys are available at:
>
> https://dist.apache.org/repos/dist/release/incubator/airflow/
>
> The number of JIRAs fixed is over 700. Please have a look at the
> changelog.
> Since RC2 the following has been fixed:
>
> * [AIRFLOW-2817] Force explicit choice on GPL dependency
> * [AIRFLOW-2716] Replace async and await py3.7 keywords
> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>
> Please note that the version number excludes the `rcX` string as well
> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
> to rename the artifact without modifying the artifact checksums when we
> actually release.
>
> WARNING: Due to licensing requirements you will need to set
>  SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
> installing or upgrading. We will try to remove this requirement for the
> next release.
>
> Cheers,
> Bolke


[GitHub] verdan edited a comment on issue #3687: [AIRFLOW-2805] Display multiple timezones in the tooltip on TaskInstances

2018-08-03 Thread GitBox
verdan edited a comment on issue #3687: [AIRFLOW-2805] Display multiple 
timezones in the tooltip on TaskInstances
URL: 
https://github.com/apache/incubator-airflow/pull/3687#issuecomment-410183891
 
 
   Can't remove jqClock from the LICENSE file, as it is still being used in the
`www` version of the webserver.




[GitHub] codecov-io edited a comment on issue #3687: [AIRFLOW-2805] Display multiple timezones in the tooltip on TaskInstances

2018-08-03 Thread GitBox
codecov-io edited a comment on issue #3687: [AIRFLOW-2805] Display multiple 
timezones in the tooltip on TaskInstances
URL: 
https://github.com/apache/incubator-airflow/pull/3687#issuecomment-410067011
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3687?src=pr=h1)
 Report
   > Merging 
[#3687](https://codecov.io/gh/apache/incubator-airflow/pull/3687?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/c25e63970dd6f3fe7beb52004dd5ba84fee675b9?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3687/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3687?src=pr=tree)
   
   ```diff
@@           Coverage Diff            @@
##           master    #3687   +/-   ##
=======================================
  Coverage   77.53%   77.53%
=======================================
  Files         205      205
  Lines       15771    15771
=======================================
  Hits        12228    12228
  Misses       3543     3543
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3687?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3687?src=pr=footer).
 Last update 
[c25e639...b3dfa10](https://codecov.io/gh/apache/incubator-airflow/pull/3687?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-03 Thread Naik Kaxil
Hi Airflow'ers,

Please join the Apache Airflow PMC in welcoming its newest member and
co-committer, Feng Tao (a.k.a. feng-tao).

Welcome Feng, great to have you on board!

Cheers,
Kaxil




Kaxil Naik

Data Reply
2nd Floor, Nova South
160 Victoria Street, Westminster
London SW1E 5LB - UK
phone: +44 (0)20 7730 6000
k.n...@reply.com
www.reply.com

[Data Reply]


[GitHub] verdan commented on issue #3687: [AIRFLOW-2805] Display multiple timezones in the tooltip on TaskInstances

2018-08-03 Thread GitBox
verdan commented on issue #3687: [AIRFLOW-2805] Display multiple timezones in 
the tooltip on TaskInstances
URL: 
https://github.com/apache/incubator-airflow/pull/3687#issuecomment-410183891
 
 
   Can't remove jqClock, as it is still being used in the `www` version of the
webserver.




[GitHub] kaxil closed pull request #3669: Revert [AIRFLOW-2814] - Change `min_file_process_interval` to 0

2018-08-03 Thread GitBox
kaxil closed pull request #3669: Revert [AIRFLOW-2814] - Change 
`min_file_process_interval` to 0
URL: https://github.com/apache/incubator-airflow/pull/3669
 
 
   




[GitHub] bolkedebruin commented on issue #3687: [AIRFLOW-28005] Display multiple timezones in the tooltip on TaskInstances

2018-08-03 Thread GitBox
bolkedebruin commented on issue #3687: [AIRFLOW-28005] Display multiple 
timezones in the tooltip on TaskInstances
URL: 
https://github.com/apache/incubator-airflow/pull/3687#issuecomment-410174400
 
 
   Great work @verdan! Can you please make sure to update the LICENSE file
accordingly (i.e. add moment.js under MIT and remove jqClock). In addition,
please fix your JIRA reference.




[VOTE] Airflow 1.10.0rc3

2018-08-03 Thread Bolke de Bruin
Hey all,

I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release,
which will last for 72 hours. Consider this my (binding) +1.

Airflow 1.10.0 RC 3 is available at:

https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/ 


apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
comes with INSTALL instructions.
apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python "sdist"
release.

Public keys are available at:

https://dist.apache.org/repos/dist/release/incubator/airflow/ 


The number of JIRAs fixed is over 700. Please have a look at the changelog.
Since RC2 the following has been fixed:

* [AIRFLOW-2817] Force explicit choice on GPL dependency
* [AIRFLOW-2716] Replace async and await py3.7 keywords
* [AIRFLOW-2810] Fix typo in Xcom model timestamp

Please note that the version number excludes the `rcX` string as well
as the "+incubating" string, so it's now simply 1.10.0. This will allow us
to rename the artifact without modifying the artifact checksums when we
actually release.

WARNING: Due to licensing requirements you will need to set 
 SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
installing or upgrading. We will try to remove this requirement for the 
next release.

Cheers,
Bolke

[GitHub] wwlian commented on a change in pull request #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook

2018-08-03 Thread GitBox
wwlian commented on a change in pull request #3677: [AIRFLOW-2826] Add 
GoogleCloudKMSHook
URL: https://github.com/apache/incubator-airflow/pull/3677#discussion_r207459323
 
 

 ##
 File path: airflow/contrib/hooks/gcp_kms_hook.py
 ##
 @@ -0,0 +1,108 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import base64
+
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+from apiclient.discovery import build
+
+
+def _b64encode(s):
+    """ Base64-encodes a bytes object to a string. """
+    return base64.b64encode(s).decode('ascii')
+
+
+def _b64decode(s):
+    """ Base64-decodes a string to bytes. """
+    return base64.b64decode(s.encode('utf-8'))
+
+
+class GoogleCloudKMSHook(GoogleCloudBaseHook):
+    """
+    Interact with Google Cloud KMS. This hook uses the Google Cloud Platform
+    connection.
+    """
+
+    def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None):
+        super(GoogleCloudKMSHook, self).__init__(gcp_conn_id,
+                                                 delegate_to=delegate_to)
+
+    def get_conn(self):
+        """
+        Returns a KMS service object.
+
+        :rtype: apiclient.discovery.Resource
+        """
+        http_authorized = self._authorize()
+        return build(
+            'cloudkms', 'v1', http=http_authorized, cache_discovery=False)
+
+    def encrypt(self, key_name, plaintext, authenticated_data=None):
+        """
+        Encrypts a plaintext message using Google Cloud KMS.
+
+        :param key_name: The Resource Name for the key (or key version)
+                         to be used for encryption. Of the form
+                         ``projects/*/locations/*/keyRings/*/cryptoKeys/**``
+        :type key_name: str
+        :param plaintext: The message to be encrypted.
+        :type plaintext: bytes
+        :param authenticated_data: Optional additional authenticated data that
+                                   must also be provided to decrypt the message.
+        :type authenticated_data: str
+        :return: The base64-encoded ciphertext of the original message.
+        :rtype: str
+        """
+        keys = self.get_conn().projects().locations().keyRings().cryptoKeys()
+        body = {'plaintext': _b64encode(plaintext)}
+        if authenticated_data:
+            body['additionalAuthenticatedData'] = _b64encode(authenticated_data)
+
+        request = keys.encrypt(name=key_name, body=body)
+        response = request.execute()
+
+        ciphertext = response['ciphertext']
+        return ciphertext
+
+    def decrypt(self, key_name, ciphertext, authenticated_data=None):
+        """
+        Decrypts a ciphertext message using Google Cloud KMS.
+
+        :param key_name: The Resource Name for the key to be used for decryption.
+                         Of the form ``projects/*/locations/*/keyRings/*/cryptoKeys/**``
+        :type key_name: str
+        :param ciphertext: The message to be decrypted.
+        :type ciphertext: str
+        :param authenticated_data: Any additional authenticated data that was
+                                   provided when encrypting the message.
+        :type authenticated_data: str
+        :return: The original message.
+        :rtype: bytes
+        """
+        keys = self.get_conn().projects().locations().keyRings().cryptoKeys()
+        body = {'ciphertext': ciphertext}
+        if authenticated_data:
+            body['additionalAuthenticatedData'] = _b64encode(authenticated_data)
+
+        request = keys.decrypt(name=key_name, body=body)
+        response = request.execute()
+
+        plaintext = _b64decode(response['plaintext'])
 
 Review comment:
   Please run a quick manual integration test in Python 3 to verify that
`response['plaintext']` is indeed a str rather than a bytes object, which
won't have an `encode` method.




[GitHub] wwlian commented on a change in pull request #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook

2018-08-03 Thread GitBox
wwlian commented on a change in pull request #3677: [AIRFLOW-2826] Add 
GoogleCloudKMSHook
URL: https://github.com/apache/incubator-airflow/pull/3677#discussion_r207377292
 
 

 ##
 File path: airflow/contrib/hooks/gcp_kms_hook.py
 ##
 @@ -0,0 +1,108 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import base64
+
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+from apiclient.discovery import build
+
+
+def _b64encode(s):
+    """ Base64-encodes a bytes object to a string. """
+    return base64.b64encode(s).decode('ascii')
+
+
+def _b64decode(s):
+    """ Base64-decodes a string to bytes. """
+    return base64.b64decode(s.encode('utf-8'))
+
+
+class GoogleCloudKMSHook(GoogleCloudBaseHook):
+    """
+    Interact with Google Cloud KMS. This hook uses the Google Cloud Platform
+    connection.
+    """
+
+    def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None):
+        super(GoogleCloudKMSHook, self).__init__(gcp_conn_id,
+                                                 delegate_to=delegate_to)
+
+    def get_conn(self):
+        """
+        Returns a KMS service object.
+
+        :rtype: apiclient.discovery.Resource
+        """
+        http_authorized = self._authorize()
+        return build(
+            'cloudkms', 'v1', http=http_authorized, cache_discovery=False)
+
+    def encrypt(self, key_name, plaintext, authenticated_data=None):
+        """
+        Encrypts a plaintext message using Google Cloud KMS.
+
+        :param key_name: The Resource Name for the key (or key version)
+                         to be used for encryption. Of the form
+                         ``projects/*/locations/*/keyRings/*/cryptoKeys/**``
+        :type key_name: str
+        :param plaintext: The message to be encrypted.
+        :type plaintext: bytes
+        :param authenticated_data: Optional additional authenticated data that
+                                   must also be provided to decrypt the message.
+        :type authenticated_data: str
 
 Review comment:
   The authenticated data should be bytes too, right? Same for decrypt().




[GitHub] feng-tao commented on issue #3669: Revert [AIRFLOW-2814] - Change `min_file_process_interval` to 0

2018-08-03 Thread GitBox
feng-tao commented on issue #3669: Revert [AIRFLOW-2814] - Change 
`min_file_process_interval` to 0
URL: 
https://github.com/apache/incubator-airflow/pull/3669#issuecomment-410162070
 
 
   @kaxil , lgtm

