Plan to change type of dag_id from String to Number?

2018-08-05 Thread vardanguptacse
Hi Everyone,

Do we have any plan to change the type of dag_id from String to Number? This 
would make queries on the metadata more performant. A proposal could be to 
generate an auto-incremented value in the dag table and use this id in the 
rest of the tables.


Regards,
Vardan Gupta
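
A minimal sketch of the surrogate-key idea being proposed, using sqlite3 from the standard library. The table and column names here are simplified illustrations, not Airflow's actual metadata schema: the dag table gets an auto-incremented integer id, and other tables join on that number instead of repeating the string dag_id.

```python
import sqlite3

# Illustrative schema only -- not Airflow's real metadata layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate numeric key
        dag_id TEXT UNIQUE NOT NULL            -- existing string identifier
    );
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        dag_ref INTEGER NOT NULL REFERENCES dag(id),  -- joins on a number
        task_id TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO dag (dag_id) VALUES ('example_http_operator')")
(numeric_id,) = conn.execute(
    "SELECT id FROM dag WHERE dag_id = 'example_http_operator'").fetchone()
conn.execute(
    "INSERT INTO task_instance (dag_ref, task_id) VALUES (?, 'get_op')",
    (numeric_id,))

# Queries join on the compact integer key but can still surface the string.
row = conn.execute("""
    SELECT d.dag_id, t.task_id
    FROM task_instance t JOIN dag d ON d.id = t.dag_ref
""").fetchone()
print(row)  # -> ('example_http_operator', 'get_op')
```

Integer keys are smaller to index and compare than repeated strings, which is the performance angle of the question; the trade-off is an extra join (or lookup) wherever the human-readable dag_id is needed.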


Re: Deploy Airflow on Kubernetes using Airflow Operator

2018-08-05 Thread Bolke de Bruin
Really awesome stuff. We are in the process of moving over to k8s for Airflow 
(on prem though) and this is really helpful.

B.

Verstuurd vanaf mijn iPad

> Op 3 aug. 2018 om 23:35 heeft Barni Seetharaman  
> het volgende geschreven:
> 
> Hi
> 
> We at Google just open-sourced a Kubernetes custom controller (also called
> an operator) to make deploying and managing Airflow on Kubernetes simple.
> The operator pattern is a powerful abstraction in Kubernetes.
> Please watch this repo (in the process of adding docs) for further updates.
> 
> https://github.com/GoogleCloudPlatform/airflow-operator
> 
> Do reach out if you have any questions.
> 
> Also created a channel in kubernetes slack  (#airflow-operator)
>  for any
> discussions specific to Airflow on Kubernetes (including Daniel's
> Kubernetes Executor, the Kubernetes operator, and this custom controller, also
> called the Kubernetes Airflow Operator).
> 
> regs
> Barni


Re: Readthedocs is Broken

2018-08-05 Thread Taylor Edmiston
Kaxil -

From what I can tell, RTD doesn't support custom env vars from conf.py or
.readthedocs.yml in the build process.

I have a solution here based on another env var that they do set.  It works
on my machine.  Does this work for you?

https://github.com/apache/incubator-airflow/pull/3703
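
For reference, this is the shape of such a workaround, assuming RTD exports a READTHEDOCS variable in its own build environment (the variable name, and that this matches the PR's contents, are assumptions on my part, not confirmed by this thread):

```python
import os

def set_build_env() -> None:
    """Work around RTD's lack of user-defined env vars by piggybacking on a
    variable RTD itself is assumed to set (READTHEDOCS=True)."""
    if os.environ.get("READTHEDOCS") == "True":
        # setup.py checks this variable before importing slugify
        os.environ["SLUGIFY_USES_TEXT_UNIDECODE"] = "yes"

# Called at the top of docs/conf.py, before any airflow imports:
set_build_env()
```

Locally nothing changes (the guard is false), while on an RTD builder the required variable would be present before setup.py runs.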

I do not currently have access to apache/incubator-airflow on
readthedocs.org but I would be happy to help test there if it's possible to
grant access.  I'm a docs admin for another Python project and have used it
before.

Best,
Taylor

*Taylor Edmiston*
Blog  | CV
 | LinkedIn
 | AngelList
 | Stack Overflow



On Sun, Aug 5, 2018 at 7:21 PM, Naik Kaxil  wrote:

> Hi Guys,
>
>
>
> The readthedocs builds are failing due to https://github.com/apache/
> incubator-airflow/pull/3660:
>
>
>
> https://readthedocs.org/projects/airflow/builds/7585546/
>
>
>
>
>
> Does someone know how we can pass an environment variable (
> SLUGIFY_USES_TEXT_UNIDECODE=yes) to readthedocs environment? Is it
> possible by adding something to https://github.com/apache/
> incubator-airflow/blob/master/.readthedocs.yml file or any setting that
> needs to be changed?
>
>
>
> I tried adding it to docs/conf.py but it doesn’t work.
>
>
>
> Regards,
>
> Kaxil
>
>
>
>
> Kaxil Naik
>
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> 
> London SW1E 5LB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
>
> [image: Data Reply]
>


Re: Deploy Airflow on Kubernetes using Airflow Operator

2018-08-05 Thread Sid Anand
Neeto!

-s

On Fri, Aug 3, 2018 at 2:36 PM Barni Seetharaman 
wrote:

> Hi
>
> We at Google just open-sourced a Kubernetes custom controller (also called
> operator) to make deploying and managing Airflow on kubernetes simple.
> The operator pattern is a power abstraction in kubernetes.
> Please watch this repo (in the process of adding docs) for further updates.
>
> https://github.com/GoogleCloudPlatform/airflow-operator
>
> Do reach out if you have any questions.
>
> Also created a channel in kubernetes slack  (#airflow-operator)
>  for any
> discussions specific to Airflow on Kubernetes (including Daniel's
> Kubernetes Executor, Kuberenetes operator and this custom controller also
> called kuberntes airflow operator).
>
> regs
> Barni
>


Re: Kerberos and Airflow

2018-08-05 Thread Dan Davydov
I look forward to reading the draft and working on it with you! Not 100%
sure I can make it to SF for the hackathon (I'm in New York now), but I can
participate remotely.



On Sat, Aug 4, 2018 at 9:30 AM Bolke de Bruin  wrote:

> Hi Dan,
>
> Don’t misunderstand me. I think what I proposed is complementary to the
> dag submit function. The only thing you mentioned that I don’t think is
> needed is to fully serialize up front, thereby excluding callbacks etc.
> (although there are other serialization libraries, like marshmallow, that
> might be able to do it).
>
> You are right to mention that the hashes should be calculated at submit
> time and that an authorized user should be able to recalculate a hash. Another
> option could be something like https://pypi.org/project/signedimp/ which
> we could use to verify dependencies.
>
> I’ll start writing something up. We can then shoot holes in it (I think
> you have a point on the crypto) and maybe do some hacking on it. This could
> be part of the hackathon in sept in SF, I’m sure some other people would
> have an interest in it as well.
>
> B.
>
> Verstuurd vanaf mijn iPad
>
> > Op 3 aug. 2018 om 23:14 heeft Dan Davydov 
> het volgende geschreven:
> >
> > I designed a system similar to what you are describing which is in use at
> > Airbnb (only DAGs on a whitelist would be allowed to merged to the git
> repo
> > if they used certain types of impersonation), it worked for simple use
> > cases, but the problem was doing access control becomes very difficult,
> > e.g. solving the problem of which DAGs map to which manifest files, and
> > which manifest files can access which secrets.
> >
> > There is also a security risk where someone changes e.g. a Python file
> > dependency of your task. Or, say you figure out a way to block those
> > kinds of changes based on your hashing: what if there is a legitimate
> > change in a dependency and you want to recalculate the hash? Then I think
> > you go back to a solution like your proposed "airflow submit" command to
> > accomplish this.
> >
> > Additional concerns:
> > - I'm not sure if I'm a fan of the first time a scheduler parses a DAG
> > being what creates the hashes either; it feels to me like
> > encryption/hashing should be done before DAGs are even parsed by the
> > scheduler (at commit time or submit time of the DAGs)
> > - The type of the encrypted key seems kind of hacky to me, i.e. some kind
> > of custom hash based on DAG structure instead of a simple token passed in
> > by users, which has a clear separation of concerns WRT security
> > - Added complexity both to Airflow code, and to users as they need to
> > define or customize hashing functions for DAGs to improve security
> > If we can get a reasonably secure solution then it might be a reasonable
> > trade-off considering the alternative is a major overhaul/restrictions to
> > DAGs.
> >
> > Maybe I'm missing some details that would alleviate my concerns here,
> and a
> > bit of a more in-depth document might help?
> >
> >
> >
> > *Also: using the Kubernetes executor combined with some of the things we
> > discussed greatly enhances the security of Airflow as the
> > environment isn’t really shared anymore.*
> > Assuming a multi-tenant scheduler, I feel the same set of hard problems
> > exist with Kubernetes, as the executor mainly just simplifies the
> > post-executor parts of task scheduling/execution which I think you
> already
> > outlined a good solution for early on in this thread (passing keys from
> the
> > executor to workers).
> >
> > Happy to set up some time to talk real-time about this by the way, once
> we
> > iron out the details I want to implement whatever the best solution we
> come
> > up with is.
> >
> >> On Thu, Aug 2, 2018 at 4:13 PM Bolke de Bruin 
> wrote:
> >>
> >> You mentioned you would like to make sure that the DAG (and its tasks)
> >> runs with a confined set of settings, i.e.
> >> a given set of connections fixed at submission time, not at run time. So here
> >> we
> >> can make use of the fact that both the scheduler
> >> and the worker parse the DAG.
> >>
> >> Firstly, when the scheduler evaluates a DAG it can add an integrity check
> >> (hash) for each task. The executor can encrypt the
> >> metadata with this hash, ensuring that the structure of the DAG remained
> >> the same. It means that the task is only
> >> able to decrypt the metadata when it is able to calculate the same hash.
> >>
> >> Similarly, if the scheduler parses a DAG for the first time it can
> >> register the hashes for the tasks. It can then verify these hashes
> >> at runtime to ensure the structure of the tasks has stayed the same. In
> >> the manifest (which could even be in the DAG or
> >> part of the DAG definition) we could specify which fields would be used
> >> for hash calculation. We could even specify
> >> static hashes. This would give flexibility as to what freedom the users
> >> have in the auto-generated DAGs.
> >>
> >> Something like that?
> >>
> >> B.
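
A minimal sketch of the scheme being discussed, with hypothetical field names and manifest shape (neither is specified in the thread): hash a manifest-selected subset of task fields at submit/first-parse time, recompute at run time, and treat a mismatch as a structural change that blocks decryption of the metadata.

```python
import hashlib
import json

def task_fingerprint(task: dict, fields: list) -> str:
    """Stable hash over the fields the (hypothetical) manifest selects."""
    subset = {f: task.get(f) for f in sorted(fields)}
    payload = json.dumps(subset, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Which fields count toward the hash would come from the manifest.
manifest_fields = ["task_id", "operator", "queue"]
task = {"task_id": "get_op", "operator": "SimpleHttpOperator", "queue": "default"}

registered = task_fingerprint(task, manifest_fields)  # at first parse / submit
at_runtime = task_fingerprint(task, manifest_fields)  # recomputed by the worker
assert registered == at_runtime  # unchanged structure -> same key material

# Any change to a hashed field yields a different fingerprint, so the
# metadata encrypted under the original hash would no longer decrypt.
tampered = dict(task, queue="privileged")
assert task_fingerprint(tampered, manifest_fields) != registered
```

In the actual proposal the fingerprint would serve as key material for encrypting the task's metadata rather than being compared directly; the comparison above just illustrates the integrity property.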

Re: Airflow-pr merge fails!

2018-08-05 Thread Sid Anand
Awesome. This works now. Bumps on the road to progress are always welcome.

Honestly, the only remaining issues are 1) the notifications and 2) how to
auto-close JIRAs.
-s

On Thu, Aug 2, 2018 at 12:53 AM Ash Berlin-Taylor 
wrote:

> I had to change my git remote to use the SSH version - it turns out I had
> already done that when I was testing the pr tool changes against my fork.
>
> https://github.com/apache/incubator-airflow/pull/3680 opened that will
> correct the remote. Or run this command
>
> git remote set-url github g...@github.com:apache/incubator-airflow
>
> Sorry for all the bumps!
>
> Ash
>
> > On 1 Aug 2018, at 21:47, Sid Anand  wrote:
> >
> > Ash,
> > Per https://github.com/apache/incubator-airflow/pull/3413
> >
> > I tried running the dev/airflow-pr merge and it failed at the Github
> > authentication point.
> >
> > The local merge is complete (PR_TOOL_MERGE_PR_3663_MASTER).
> >
> > Push to Gitbox (github)? [y/N]: y
> >
> >>> Running command: git push github PR_TOOL_MERGE_PR_3663_MASTER:master
> >
> > Username for 'https://github.com': r39132
> >
> > Password for 'https://r39...@github.com':
> >
> > remote: Invalid username or password.
> >
> > fatal: Authentication failed for '
> > https://github.com/apache/incubator-airflow.git/'
> >
> > Since we have enabled 2fa, would this approach still work?
> >
> > I removed my previous git remotes and reran dev/airflow-pr
> > setup_git_remotes
> >
> > Any ideas?
> > -s
>
>


Readthedocs is Broken

2018-08-05 Thread Naik Kaxil
Hi Guys,

The readthedocs builds are failing due to 
https://github.com/apache/incubator-airflow/pull/3660:

https://readthedocs.org/projects/airflow/builds/7585546/


Does someone know how we can pass an environment variable 
(SLUGIFY_USES_TEXT_UNIDECODE=yes) to readthedocs environment? Is it possible by 
adding something to 
https://github.com/apache/incubator-airflow/blob/master/.readthedocs.yml file 
or any setting that needs to be changed?

I tried adding it to docs/conf.py but it doesn’t work.

Regards,
Kaxil



Kaxil Naik

Data Reply
2nd Floor, Nova South
160 Victoria Street, Westminster
London SW1E 5LB - UK
phone: +44 (0)20 7730 6000
k.n...@reply.com
www.reply.com

[Data Reply]


Re: Podling Report Reminder - August 2018

2018-08-05 Thread Sid Anand
Bolke?

-s

On Sun, Aug 5, 2018 at 3:27 PM Naik Kaxil  wrote:

> Just reviewed it and it looks good to me. One thing: do we just need to
> resolve the license issue for graduation?
>
> Regards,
> Kaxil
>
>
>
> On 05/08/2018, 22:15, "Sid Anand"  wrote:
>
> Done.
>
> Folks,
> Please review and correct if anything is off:
> https://wiki.apache.org/incubator/August2018#preview
>
> @Mentors, please sign off!!
>
> If you can't access it, here's the Airflow section:
>
> Airflow
>
> Airflow is a workflow automation and scheduling system that can be used to
> author and manage data pipelines. Airflow has been incubating since
> 2016-03-31.
>
> Three most important issues to address in the move towards graduation:
>   1. Once we make a release with the licensing fix we will move forward
>      with graduation.
>   2.
>   3.
>
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware of? None
>
> How has the community developed since the last report?
>   1. Since our last podling report 5 months ago (i.e. between March 28 &
>      August 5, inclusive), we grew our contributors from 427 to 512
>   2. Since our last podling report 5 months ago (i.e. between March 28 &
>      August 5, inclusive), we resolved 554 pull requests (currently at
>      2834 closed PRs)
>   3. Since being accepted into the incubator, the number of companies
>      officially using Apache Airflow has risen from 30 to 183, 34 new
>      from the last podling report 5 months ago.
>
> How has the project developed since the last report?
> See above: 554 PRs resolved, 85 new contributors, & 34 new companies
> officially using it. We have also added 2 new committers: Kaxil Naik on
> May 7 & Tao Feng on Aug 3.
>
> How would you assess the podling's maturity? Please feel free to add
> your own commentary.
>   [ ] Initial setup
>   [ ] Working towards first release
>   [ ] Community building
>   [x] Nearing graduation
>   [ ] Other:
>
> Date of last release: 2017-12-15
>
> When were the last committers or PPMC members elected?
> 2 new committers: Kaxil Naik on May 7 & Tao Feng on Aug 3
>
> Signed-off-by:
>   [ ](airflow) Chris Nauroth Comments:
>   [ ](airflow) Hitesh Shah Comments:
>   [ ](airflow) Jakob Homan Comments:
>
>
> On Sun, Aug 5, 2018 at 1:45 PM Sid Anand  wrote:
>
> > I'll do this now! Will send an update shortly to the Dev DL!
> >
> > -s
> >
> > On Fri, Aug 3, 2018 at 9:34 PM  wrote:
> >
> >> Dear podling,
> >>
> >> This email was sent by an automated system on behalf of the Apache
> >> Incubator PMC. It is an initial reminder to give you plenty of time
> to
> >> prepare your quarterly board report.
> >>
> >> The board meeting is scheduled for Wed, 15 August 2018, 10:30 am
> PDT.
> >> The report for your podling will form a part of the Incubator PMC
> >> report. The Incubator PMC requires your report to be submitted 2
> weeks
> >> before the board meeting, to allow sufficient time for review and
> >> submission (Wed, August 01).
> >>
> >> Please submit your report with sufficient time to allow the
> Incubator
> >> PMC, and subsequently board members to review and digest. Again, the
> >> very latest you should submit your report is 2 weeks prior to the
> board
> >> meeting.
> >>
> >> Candidate names should not be made public before people are actually
> >> elected, so please do not include the names of potential committers
> or
> >> PPMC members in your report.
> >>
> >> Thanks,
> >>
> >> The Apache Incubator PMC
> >>
> >> Submitting your Report
> >>
> >> --
> >>
> >> Your report should contain the following:
> >>
> >> *   Your project name
> >> *   A brief description of your project, which assumes no knowledge
> of
> >> the project or necessarily of its field
> >> *   A list of the three most important issues to address in the move
> >> towards graduation.
> >> *   Any issues that the Incubator PMC or ASF Board might wish/need
> to be
> >> aware of
> >> *   How has the community developed since the last report
> >> *   How has the project developed since the last report.
> >> *   How does the podling rate their own maturity.
> >>
> >> This should be appended to the Incubator Wiki page at:
> >>
> >> https://wiki.apache.org/incubator/August2018
> >>
> >> Note: This is manually populated. You may need to wait a little
> before
> >> this page is created from a template.
> >>
> >> Mentors
> >> ---
> >>
> >> Mentors should review reports for their project(s) and sign them
> off on
> >> the Incubator wiki page. Signing off reports shows that you are
> >> following the project - projects that are not signed may raise
> alarms
> >> for the Incubator PMC.
> >>
> >> Incubator PMC

Re: Airflow committers (a list)

2018-08-05 Thread Sid Anand
Nice. Thx Kaxil.
-s

On Sun, Aug 5, 2018 at 3:57 PM Naik Kaxil  wrote:

> The last time I updated it to add my name, the change was reflected the next day.
>
> I had added the instructions here when I joined PPMC:
> https://cwiki.apache.org/confluence/display/AIRFLOW/Committers%27+Guide#Committers'Guide-AddingNewCommitters/PMCMemberstotheAirflowIncubationStatusPage
>
> Jenkins job runs daily at around 01:39 AM for the entire incubator site:
> https://builds.apache.org/view/H-L/view/Incubator/job/Incubator%20Site/
>
> Regards,
> Kaxil
>
> On 05/08/2018, 21:44, "Sid Anand"  wrote:
>
> What are the instructions to add committers to this page?
> -s
>
> To update http://incubator.apache.org/projects/airflow.html, you need to
> check out the incubator SVN repo and modify
> content/projects/airflow.xml
>
> I SVN-committed the change below for tfeng but am not sure when the html
> page gets rebuilt and redeployed to the site. Some CI/CD process, I
> believe.
>
> sianand@LM-SJN-21002367:~/Projects/incubator/content/projects $ svn
> diff
>
> Index: airflow.xml
>
> ===
>
> --- airflow.xml (revision 1837468)
>
> +++ airflow.xml (working copy)
>
> @@ -180,6 +180,11 @@
>
>Kaxil Naik
>
>  
>
>  
>
> +  .
>
> +  tfeng
>
> +  Tao Feng
>
> +
>
> +
>
>Extra
>
>.
>
>.
>
> Every time we add a committer, we need to update this page. Here's some
> info on the SVN repo so you can check it out for future reference:
>
> sianand@LM-SJN-21002367:~/Projects/incubator/content/projects $ svn
> info
>
> Path: .
>
> Working Copy Root Path: /Users/sianand/Projects/incubator
>
> URL:
>
> https://svn.apache.org/repos/asf/incubator/public/trunk/content/projects
>
> Relative URL: ^/incubator/public/trunk/content/projects
>
> Repository Root: https://svn.apache.org/repos/asf
>
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
>
> Revision: 1837468
>
> Node Kind: directory
>
> Schedule: normal
>
> Last Changed Author: mck
>
> Last Changed Rev: 1837396
>
> Last Changed Date: 2018-08-03 19:07:54 -0700 (Fri, 03 Aug 2018)
>
> On Sun, Aug 5, 2018 at 1:39 PM Ash Berlin-Taylor <
> ash_airflowl...@firemirror.com> wrote:
>
> >   - https://github.com/orgs/apache/teams/airflow-committers/members
> <
> > https://github.com/orgs/apache/teams/airflow-committers/members>
> >
> > This one at least is populated automatically via a 30-minutely cron
> job.
> >
> >
> > > On 5 Aug 2018, at 21:35, Sid Anand  wrote:
> > >
> > > Committers/Mentors,
> > > We have several places where committers are listed:
> > >
> > >   - https://whimsy.apache.org/roster/ppmc/airflow
> > >   - http://incubator.apache.org/projects/airflow.html
> > >   - Mentors, is the committer list here meant to be current?
> > >  - Kaxil, I added tfeng to this page (done via SVN).. though
> I'm not
> > >  sure when the actual publishing happens to this site. I
> believe
> > it's done
> > >  via some CI/CD process now.
> > >   -
> https://github.com/orgs/apache/teams/airflow-committers/members
> > >   - This is needed for the Gitbox integration we are using now.
> Does it
> > >  get populated automatically from whimsy?
> > >
> > > When we promote a contributor to committer/PPMC, do we need to
> update
> > all 3
> > > of these places?
> > >
> > > FYI, there is a PR to also add this list to the README so the
> community
> > can
> > > see who the committers are :
> > > https://github.com/apache/incubator-airflow/pull/3699
> > >
> > > Currently, the committer section of the README points to a wiki
> page that
> > > displays all of these links.
> > >
> > > And if you wanted more options, GitHub's CODEOWNERS file provides
> some
> > > interesting functionality, though I don't think we need it:
> > > https://blog.github.com/2017-07-06-introducing-code-owners/
> > >
> > > -s
> >
> >
>
>
>
>
>
>
> Kaxil Naik
>
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> London SW1E 5LB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
>


Re: Airflow committers (a list)

2018-08-05 Thread Naik Kaxil
The last time I updated it to add my name, the change was reflected the next day.

I had added the instructions here when I joined PPMC: 
https://cwiki.apache.org/confluence/display/AIRFLOW/Committers%27+Guide#Committers'Guide-AddingNewCommitters/PMCMemberstotheAirflowIncubationStatusPage

Jenkins job runs daily at around 01:39 AM for the entire incubator site: 
https://builds.apache.org/view/H-L/view/Incubator/job/Incubator%20Site/

Regards,
Kaxil

On 05/08/2018, 21:44, "Sid Anand"  wrote:

What are the instructions to add committers to this page?
-s

To update http://incubator.apache.org/projects/airflow.html, you need to
check out the incubator SVN repo and modify content/projects/airflow.xml

I SVN-committed the change below for tfeng but am not sure when the html
page gets rebuilt and redeployed to the site. Some CI/CD process, I believe.

sianand@LM-SJN-21002367:~/Projects/incubator/content/projects $ svn diff

Index: airflow.xml

===

--- airflow.xml (revision 1837468)

+++ airflow.xml (working copy)

@@ -180,6 +180,11 @@

   Kaxil Naik

 

 

+  .

+  tfeng

+  Tao Feng

+

+

   Extra

   .

   .

Every time we add a committer, we need to update this page. Here's some
info on the SVN repo so you can check it out for future reference:

sianand@LM-SJN-21002367:~/Projects/incubator/content/projects $ svn info

Path: .

Working Copy Root Path: /Users/sianand/Projects/incubator

URL:
https://svn.apache.org/repos/asf/incubator/public/trunk/content/projects

Relative URL: ^/incubator/public/trunk/content/projects

Repository Root: https://svn.apache.org/repos/asf

Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68

Revision: 1837468

Node Kind: directory

Schedule: normal

Last Changed Author: mck

Last Changed Rev: 1837396

Last Changed Date: 2018-08-03 19:07:54 -0700 (Fri, 03 Aug 2018)

On Sun, Aug 5, 2018 at 1:39 PM Ash Berlin-Taylor <
ash_airflowl...@firemirror.com> wrote:

>   - https://github.com/orgs/apache/teams/airflow-committers/members <
> https://github.com/orgs/apache/teams/airflow-committers/members>
>
> This one at least is populated automatically via a 30-minutely cron job.
>
>
> > On 5 Aug 2018, at 21:35, Sid Anand  wrote:
> >
> > Committers/Mentors,
> > We have several places where committers are listed:
> >
> >   - https://whimsy.apache.org/roster/ppmc/airflow
> >   - http://incubator.apache.org/projects/airflow.html
> >   - Mentors, is the committer list here meant to be current?
> >  - Kaxil, I added tfeng to this page (done via SVN).. though I'm not
> >  sure when the actual publishing happens to this site. I believe
> it's done
> >  via some CI/CD process now.
> >   - https://github.com/orgs/apache/teams/airflow-committers/members
> >   - This is needed for the Gitbox integration we are using now. Does it
> >  get populated automatically from whimsy?
> >
> > When we promote a contributor to committer/PPMC, do we need to update
> all 3
> > of these places?
> >
> > FYI, there is a PR to also add this list to the README so the community
> can
> > see who the committers are :
> > https://github.com/apache/incubator-airflow/pull/3699
> >
> > Currently, the committer section of the README points to a wiki page 
that
> > displays all of these links.
> >
> > And if you wanted more options, GitHub's CODEOWNERS file provides some
> > interesting functionality, though I don't think we need it:
> > https://blog.github.com/2017-07-06-introducing-code-owners/
> >
> > -s
>
>






Kaxil Naik 

Data Reply
2nd Floor, Nova South
160 Victoria Street, Westminster
London SW1E 5LB - UK 
phone: +44 (0)20 7730 6000
k.n...@reply.com
www.reply.com


Re: Podling Report Reminder - August 2018

2018-08-05 Thread Naik Kaxil
Just reviewed it and it looks good to me. One thing: do we just need to 
resolve the license issue for graduation?

Regards,
Kaxil



On 05/08/2018, 22:15, "Sid Anand"  wrote:

Done.

Folks,
Please review and correct if anything is off:
https://wiki.apache.org/incubator/August2018#preview

@Mentors, please sign off!!

If you can't access it, here's the Airflow section:

Airflow

Airflow is a workflow automation and scheduling system that can be used to
author and manage data pipelines. Airflow has been incubating since
2016-03-31.

Three most important issues to address in the move towards graduation:
  1. Once we make a release with the licensing fix we will move forward
     with graduation.
  2.
  3.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of? None

How has the community developed since the last report?
  1. Since our last podling report 5 months ago (i.e. between March 28 &
     August 5, inclusive), we grew our contributors from 427 to 512
  2. Since our last podling report 5 months ago (i.e. between March 28 &
     August 5, inclusive), we resolved 554 pull requests (currently at
     2834 closed PRs)
  3. Since being accepted into the incubator, the number of companies
     officially using Apache Airflow has risen from 30 to 183, 34 new
     from the last podling report 5 months ago.

How has the project developed since the last report?
See above: 554 PRs resolved, 85 new contributors, & 34 new companies
officially using it. We have also added 2 new committers: Kaxil Naik on
May 7 & Tao Feng on Aug 3.

How would you assess the podling's maturity? Please feel free to add your
own commentary.
  [ ] Initial setup
  [ ] Working towards first release
  [ ] Community building
  [x] Nearing graduation
  [ ] Other:

Date of last release: 2017-12-15

When were the last committers or PPMC members elected?
2 new committers: Kaxil Naik on May 7 & Tao Feng on Aug 3

Signed-off-by:
  [ ](airflow) Chris Nauroth Comments:
  [ ](airflow) Hitesh Shah Comments:
  [ ](airflow) Jakob Homan Comments:


On Sun, Aug 5, 2018 at 1:45 PM Sid Anand  wrote:

> I'll do this now! Will send an update shortly to the Dev DL!
>
> -s
>
> On Fri, Aug 3, 2018 at 9:34 PM  wrote:
>
>> Dear podling,
>>
>> This email was sent by an automated system on behalf of the Apache
>> Incubator PMC. It is an initial reminder to give you plenty of time to
>> prepare your quarterly board report.
>>
>> The board meeting is scheduled for Wed, 15 August 2018, 10:30 am PDT.
>> The report for your podling will form a part of the Incubator PMC
>> report. The Incubator PMC requires your report to be submitted 2 weeks
>> before the board meeting, to allow sufficient time for review and
>> submission (Wed, August 01).
>>
>> Please submit your report with sufficient time to allow the Incubator
>> PMC, and subsequently board members to review and digest. Again, the
>> very latest you should submit your report is 2 weeks prior to the board
>> meeting.
>>
>> Candidate names should not be made public before people are actually
>> elected, so please do not include the names of potential committers or
>> PPMC members in your report.
>>
>> Thanks,
>>
>> The Apache Incubator PMC
>>
>> Submitting your Report
>>
>> --
>>
>> Your report should contain the following:
>>
>> *   Your project name
>> *   A brief description of your project, which assumes no knowledge of
>> the project or necessarily of its field
>> *   A list of the three most important issues to address in the move
>> towards graduation.
>> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
>> aware of
>> *   How has the community developed since the last report
>> *   How has the project developed since the last report.
>> *   How does the podling rate their own maturity.
>>
>> This should be appended to the Incubator Wiki page at:
>>
>> https://wiki.apache.org/incubator/August2018
>>
>> Note: This is manually populated. You may need to wait a little before
>> this page is created from a template.
>>
>> Mentors
>> ---
>>
>> Mentors should review reports for their project(s) and sign them off on
>> the Incubator wiki page. Signing off reports shows that you are
>> following the project - projects that are not signed may raise alarms
>> for the Incubator PMC.
>>
>> Incubator PMC
>>
>






Kaxil Naik 

Data Reply
2nd Floor, Nova South
160 Victoria Street, Westminster
London SW1E 5LB - UK 
phone: +44 (0)20 7730 6000
k.n...@reply.com
www.reply.com


Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
+1 :-)

Sent from my iPhone

> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor  
> wrote:
> 
> Yup, just worked out the same thing.
> 
> I think as "punishment" for me finding bugs so late in two RCs (this, and 
> 1.9) I should run the release for the next release.
> 
> -ash
> 
>> On 5 Aug 2018, at 22:05, Bolke de Bruin  wrote:
>> 
>> Yeah, I figured it out. Originally I was using a different implementation of 
>> UTCDateTime, but that was unmaintained. I switched, but this version changed 
>> or has a different contract. While it transforms to UTC on storing, it does 
>> not do so when it receives timezone-aware fields from the db. Hence the issue.
>> 
>> I will prepare a PR that removes the dependency and implements our own 
>> extension of DateTime. Probably tomorrow.
>> 
>> Good catch! Just in time :-(.
>> 
>> B.
>> 
>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> Entirely possible, though I wasn't even dealing with the scheduler - the 
>>> issue I was addressing was entirely in the webserver for a pre-existing 
>>> Task Instance.
>>> 
>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that 
>>> isn't working right/as expected. This line: 
>>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>>>  doesn't look right for us - as you mentioned, the TZ is set to something 
>>> (rather than having no TZ value).
>>> 
>>> Some background on how Postgres handles TZs: it always returns DTs in the 
>>> TZ of the connection. I'm not sure if this is unique to Postgres or if 
>>> other DBs behave the same.
>>> 
>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>>>        timestamptz
>>> ------------------------
>>>  2018-08-03 01:00:00+01
>>> 
>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>        timestamptz
>>> ------------------------
>>>  2018-08-03 01:00:00+01
>>> 
>>> The server will always return TZs in the connection timezone.
>>> 
>>> postgres=# set timezone=utc;
>>> SET
>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>        timestamptz
>>> ------------------------
>>>  2018-08-03 00:00:00+00
>>> (1 row)
>>> 
>>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>>>        timestamptz
>>> ------------------------
>>>  2018-08-03 00:00:00+00
>>> (1 row)
>>> 
>>> 
>>> 
>>> 
>>> -ash
>>> 
 On 5 Aug 2018, at 21:28, Bolke de Bruin  wrote:
 
 This is the issue:
 
 [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
 00:00:00+00:00 tzinfo: 
 [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
 example_http_operator @ 2018-08-03 02:00:00+02:00: 
 scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
 
 [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, 
 name=None)
 [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
 example_http_operator @ 2018-08-04 02:00:00+02:00: 
 scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
 
 Notice in lines 1+2 that the next run date is correctly in UTC but from 
 the DB it gets a +2. In the next bit (3+4) we get a 
 psycopg2.tz.FixedOffsetTimezone which should be set to UTC according to 
 the specs of https://github.com/spoqa/sqlalchemy-utc, but it isn’t. 
 
 So changing the DB setting to UTC fixes the symptom but not the cause.
 
 B.
 
> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor 
>  wrote:
> 
> Sorry for being terse before.
> 
> So the issue is that the ts loaded from the DB is not in UTC, it's in 
> GB/+01 (the default of the DB server)
> 
> For me, on a currently running 1.9 (no TZ) db:
> 
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-23 00:00:00
> 
> This date time appears in the log url, and the path it looks at on S3 is 
> 
> .../example_http_operator/2018-07-23T00:00:00/1.log
> 
> If my postgres server has a default timezone of GB (which the one running 
> on my laptop does), and I then apply the migration then it is converted 
> to that local time.
> 
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-23 01:00:00+01
> 
> airflow=# set timezone=UTC;
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-23 00:00:00+00
> 
> 
> This is all okay so far. The migration has kept the column at the same 
> moment in time.
> 
> The issue comes when the UI tries to display logs for this old task: 
> because the timezone of the connection is not UTC, PG returns a date with 
> a +01 TZ. Thus after the migration this old task tries to look for a log 
> file of
>

Re: Podling Report Reminder - August 2018

2018-08-05 Thread Sid Anand
Done.

Folks,
Please review and correct if anything is off:
https://wiki.apache.org/incubator/August2018#preview

@Mentors, please sign off!!

If you can't access it, here's the Airflow section:

Airflow

Airflow is a workflow automation and scheduling system that can be used to
author and manage data pipelines.

Airflow has been incubating since 2016-03-31.

Three most important issues to address in the move towards graduation:
  1. Once we make a release with the licensing fix we will move forward
     with graduation.
  2.
  3.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of? None

How has the community developed since the last report?
1. Since our last podling report 5 months ago (i.e. between March 28 &
   August 5, inclusive), we grew our contributors from 427 to 512
2. Since our last podling report 5 months ago (i.e. between March 28 &
   August 5, inclusive), we resolved 554 pull requests (currently at 2834
   closed PRs)
3. Since being accepted into the incubator, the number of companies
   officially using Apache Airflow has risen from 30 to 183, 34 new from
   the last podling report 5 months ago.

How has the project developed since the last report?
See above: 554 PRs resolved, 85 new contributors, & 34 new companies
officially using it. We have also added 2 new committers: Kaxil Naik on
May 7 & Tao Feng on Aug 3.

How would you assess the podling's maturity?
Please feel free to add your own commentary.
  [ ] Initial setup
  [ ] Working towards first release
  [ ] Community building
  [x] Nearing graduation
  [ ] Other:

Date of last release: 2017-12-15

When were the last committers or PPMC members elected?
2 new committers: Kaxil Naik on May 7 & Tao Feng on Aug 3

Signed-off-by:
  [ ](airflow) Chris Nauroth Comments:
  [ ](airflow) Hitesh Shah Comments:
  [ ](airflow) Jakob Homan Comments:


On Sun, Aug 5, 2018 at 1:45 PM Sid Anand  wrote:

> I'll do this now! Will send an update shortly to the Dev DL!
>
> -s
>
> On Fri, Aug 3, 2018 at 9:34 PM  wrote:
>
>> Dear podling,
>>
>> This email was sent by an automated system on behalf of the Apache
>> Incubator PMC. It is an initial reminder to give you plenty of time to
>> prepare your quarterly board report.
>>
>> The board meeting is scheduled for Wed, 15 August 2018, 10:30 am PDT.
>> The report for your podling will form a part of the Incubator PMC
>> report. The Incubator PMC requires your report to be submitted 2 weeks
>> before the board meeting, to allow sufficient time for review and
>> submission (Wed, August 01).
>>
>> Please submit your report with sufficient time to allow the Incubator
>> PMC, and subsequently board members to review and digest. Again, the
>> very latest you should submit your report is 2 weeks prior to the board
>> meeting.
>>
>> Candidate names should not be made public before people are actually
>> elected, so please do not include the names of potential committers or
>> PPMC members in your report.
>>
>> Thanks,
>>
>> The Apache Incubator PMC
>>
>> Submitting your Report
>>
>> --
>>
>> Your report should contain the following:
>>
>> *   Your project name
>> *   A brief description of your project, which assumes no knowledge of
>> the project or necessarily of its field
>> *   A list of the three most important issues to address in the move
>> towards graduation.
>> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
>> aware of
>> *   How has the community developed since the last report
>> *   How has the project developed since the last report.
>> *   How does the podling rate their own maturity.
>>
>> This should be appended to the Incubator Wiki page at:
>>
>> https://wiki.apache.org/incubator/August2018
>>
>> Note: This is manually populated. You may need to wait a little before
>> this page is created from a template.
>>
>> Mentors
>> ---
>>
>> Mentors should review reports for their project(s) and sign them off on
>> the Incubator wiki page. Signing off reports shows that you are
>> following the project - projects that are not signed may raise alarms
>> for the Incubator PMC.
>>
>> Incubator PMC
>>
>


Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
Yup, just worked out the same thing.

I think as "punishment" for me finding bugs so late in two RCs (this, and 1.9) 
I should run the release for the next release.

-ash

> On 5 Aug 2018, at 22:05, Bolke de Bruin  wrote:
> 
> Yeah I figured it out. Originally I was using a different implementation of 
> UTCDateTime, but that was unmaintained. I switched, but this version changed 
> or has a different contract. While it transforms to UTC on storing, it does 
> not do so when it receives timezone-aware fields from the db. Hence the issue.
> 
> I will prepare a PR that removes the dependency and implements our own 
> extension of DateTime. Probably tomorrow.
> 
> Good catch! Just in time :-(.
> 
> B.
> 
>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor  
>> wrote:
>> 
>> Entirely possible, though I wasn't even dealing with the scheduler - the 
>> issue I was addressing was entirely in the webserver for a pre-existing Task 
>> Instance.
>> 
>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that 
>> isn't working right/ as expected. This line: 
>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>>  doesn't look right for us - as you mentioned the TZ is set to something 
>> (rather than having no TZ value).
>> 
>> Some background on how Pq handles TZs. It always returns DTs in the TZ of 
>> the connection. I'm not sure if this is unique to postgres or if other DBs 
>> behave the same.
>> 
>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>> timestamptz
>> 
>> 2018-08-03 01:00:00+01
>> 
>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>> timestamptz
>> 
>> 2018-08-03 01:00:00+01
>> 
>> The server will always return TZs in the connection timezone.
>> 
>> postgres=# set timezone=utc;
>> SET
>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>> timestamptz
>> 
>> 2018-08-03 00:00:00+00
>> (1 row)
>> 
>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>> timestamptz
>> 
>> 2018-08-03 00:00:00+00
>> (1 row)
>> 
>> 
>> 
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 21:28, Bolke de Bruin  wrote:
>>> 
>>> This is the issue:
>>> 
>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
>>> 00:00:00+00:00 tzinfo: 
>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
>>> example_http_operator @ 2018-08-03 02:00:00+02:00: 
>>> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>> 
>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
>>> 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, 
>>> name=None)
>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
>>> example_http_operator @ 2018-08-04 02:00:00+02:00: 
>>> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>> 
>>> Notice at lines 1 and 2 that the next run date is correctly in UTC but from the 
>>> DB it gets a +2. At the next bit (lines 3 and 4) we get a 
>>> psycopg2.tz.FixedOffsetTimezone which should be set to UTC according to the 
>>> specs of https://github.com/spoqa/sqlalchemy-utc, but it isn’t. 
>>> 
>>> So changing your setting of the DB to UTC fixes the symptom but not the 
>>> cause.
>>> 
>>> B.
>>> 
 On 5 Aug 2018, at 22:03, Ash Berlin-Taylor 
  wrote:
 
 Sorry for being terse before.
 
 So the issue is that the ts loaded from the DB is not in UTC, it's in 
 GB/+01 (the default of the DB server)
 
 For me, on a currently running 1.9 (no TZ) db:
 
 airflow=# select * from task_instance;
 get_op| example_http_operator | 2018-07-23 00:00:00
 
 This date time appears in the log url, and the path it looks at on S3 is 
 
 .../example_http_operator/2018-07-23T00:00:00/1.log
 
 If my postgres server has a default timezone of GB (which the one running 
 on my laptop does), and I then apply the migration then it is converted to 
 that local time.
 
 airflow=# select * from task_instance;
 get_op| example_http_operator | 2018-07-23 01:00:00+01
 
 airflow=# set timezone=UTC;
 airflow=# select * from task_instance;
 get_op| example_http_operator | 2018-07-23 00:00:00+00
 
 
 This is all okay so far. The migration has kept the column at the same 
 moment in time.
 
 The issue comes when the UI tries to display logs for this old task: 
 because the timezone of the connection is not UTC, PG returns a date with 
 a +01 TZ. Thus after the migration this old task tries to look for a log 
 file of
 
 .../example_http_operator/2018-07-23T01:00:00/1.log
 
 which doesn't exist - it's changed the time it has rendered from midnight 
 (in v1.9) to 1am (in v1.10).
 
 (This is with my change to log_

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
Yeah I figured it out. Originally I was using a different implementation of 
UTCDateTime, but that was unmaintained. I switched, but this version changed or 
has a different contract. While it transforms to UTC on storing, it does not do 
so when it receives timezone-aware fields from the db. Hence the issue.

I will prepare a PR that removes the dependency and implements our own 
extension of DateTime. Probably tomorrow.
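For illustration, a minimal stdlib-only sketch of the two conversions such a custom type would need. The function names (`bind_to_utc`, `result_to_utc`) are hypothetical stand-ins for SQLAlchemy's `process_bind_param`/`process_result_value` hooks, not the actual Airflow or sqlalchemy-utc API; per the issue above, sqlalchemy-utc performs only the first conversion:

```python
import datetime

UTC = datetime.timezone.utc

def bind_to_utc(value):
    # What sqlalchemy-utc already does on *storing*: reject naive
    # datetimes and coerce aware ones to UTC.
    if value.tzinfo is None:
        raise ValueError("naive datetime is disallowed")
    return value.astimezone(UTC)

def result_to_utc(value):
    # The missing half: psycopg2 hands back a tz-aware datetime in the
    # *connection's* timezone (e.g. FixedOffsetTimezone(offset=120));
    # a custom UTCDateTime type would normalise it back to UTC here.
    if value is not None and value.tzinfo is not None:
        value = value.astimezone(UTC)
    return value

# The +02:00 value from the scheduler log above:
plus2 = datetime.timezone(datetime.timedelta(minutes=120))
dt = datetime.datetime(2018, 8, 4, 2, 0, tzinfo=plus2)
print(result_to_utc(dt).isoformat())  # 2018-08-04T00:00:00+00:00
```

In a real fix both conversions would live on a SQLAlchemy TypeDecorator so every model column gets them transparently, regardless of the server or connection timezone.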

Good catch! Just in time :-(.

B.

> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor  
> wrote:
> 
> Entirely possible, though I wasn't even dealing with the scheduler - the 
> issue I was addressing was entirely in the webserver for a pre-existing Task 
> Instance.
> 
> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that 
> isn't working right/ as expected. This line: 
> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>  doesn't look right for us - as you mentioned the TZ is set to something 
> (rather than having no TZ value).
> 
> Some background on how Pq handles TZs. It always returns DTs in the TZ of the 
> connection. I'm not sure if this is unique to postgres or if other DBs behave 
> the same.
> 
> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>  timestamptz
> 
> 2018-08-03 01:00:00+01
> 
> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>  timestamptz
> 
> 2018-08-03 01:00:00+01
> 
> The server will always return TZs in the connection timezone.
> 
> postgres=# set timezone=utc;
> SET
> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>  timestamptz
> 
> 2018-08-03 00:00:00+00
> (1 row)
> 
> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>  timestamptz
> 
> 2018-08-03 00:00:00+00
> (1 row)
> 
> 
> 
> 
> -ash
> 
>> On 5 Aug 2018, at 21:28, Bolke de Bruin  wrote:
>> 
>> This is the issue:
>> 
>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
>> 00:00:00+00:00 tzinfo: 
>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
>> example_http_operator @ 2018-08-03 02:00:00+02:00: 
>> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>> 
>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
>> 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
>> example_http_operator @ 2018-08-04 02:00:00+02:00: 
>> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>> 
>> Notice at lines 1 and 2 that the next run date is correctly in UTC but from the 
>> DB it gets a +2. At the next bit (lines 3 and 4) we get a 
>> psycopg2.tz.FixedOffsetTimezone which should be set to UTC according to the 
>> specs of https://github.com/spoqa/sqlalchemy-utc, but it isn’t. 
>> 
>> So changing your setting of the DB to UTC fixes the symptom but not the 
>> cause.
>> 
>> B.
>> 
>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> Sorry for being terse before.
>>> 
>>> So the issue is that the ts loaded from the DB is not in UTC, it's in 
>>> GB/+01 (the default of the DB server)
>>> 
>>> For me, on a currently running 1.9 (no TZ) db:
>>> 
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-23 00:00:00
>>> 
>>> This date time appears in the log url, and the path it looks at on S3 is 
>>> 
>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>> 
>>> If my postgres server has a default timezone of GB (which the one running 
>>> on my laptop does), and I then apply the migration then it is converted to 
>>> that local time.
>>> 
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-23 01:00:00+01
>>> 
>>> airflow=# set timezone=UTC;
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-23 00:00:00+00
>>> 
>>> 
>>> This is all okay so far. The migration has kept the column at the same 
>>> moment in time.
>>> 
>>> The issue comes when the UI tries to display logs for this old task: because 
>>> the timezone of the connection is not UTC, PG returns a date with a +01 TZ. 
>>> Thus after the migration this old task tries to look for a log file of
>>> 
>>> .../example_http_operator/2018-07-23T01:00:00/1.log
>>> 
>>> which doesn't exist - it's changed the time it has rendered from midnight 
>>> (in v1.9) to 1am (in v1.10).
>>> 
>>> (This is with my change to log_filename_template from UPDATING.md in my 
>>> other branch)
>>> 
>>> Setting the timezone to UTC per connection means the behaviour of Airflow 
>>> doesn't change depending on how the server is configured.
>>> 
>>> -ash
>>> 
 On 5 Aug 2018, at 20:58, Bolke de Bruin  wrote:
 
 Digging in a bit further. 
 
  ti.dag_id / ti.task_id / 

Re: Podling Report Reminder - August 2018

2018-08-05 Thread Sid Anand
I'll do this now! Will send an update shortly to the Dev DL!

-s

On Fri, Aug 3, 2018 at 9:34 PM  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 15 August 2018, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, August 01).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Candidate names should not be made public before people are actually
> elected, so please do not include the names of potential committers or
> PPMC members in your report.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/August2018
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-05 Thread Sid Anand
Welcome aboard Tao!
-s

On Sat, Aug 4, 2018 at 5:40 PM Tao Feng  wrote:

> Thanks everyone!
>
> On Fri, Aug 3, 2018 at 3:02 PM, Grant Nicholas <
> grantnicholas2...@u.northwestern.edu> wrote:
>
> > Congrats Feng!
> >
> > On Fri, Aug 3, 2018 at 12:35 PM Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> > > Well deserved, welcome aboard!
> > >
> > > On Fri, Aug 3, 2018 at 9:07 AM Mark Grover <
> grover.markgro...@gmail.com>
> > > wrote:
> > >
> > > > Congrats Tao!
> > > >
> > > > On Fri, Aug 3, 2018, 08:52 Jin Chang  wrote:
> > > >
> > > > > Congrats, Tao!!
> > > > >
> > > > > On Fri, Aug 3, 2018 at 8:20 AM Taylor Edmiston <
> tedmis...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Congratulations, Feng!
> > > > > >
> > > > > > *Taylor Edmiston*
> > > > > > Blog  | CV
> > > > > >  | LinkedIn
> > > > > >  | AngelList
> > > > > >  | Stack Overflow
> > > > > > 
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 3, 2018 at 7:31 AM, Driesprong, Fokko
> > >  > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Welcome Feng! Awesome to have you on board!
> > > > > > >
> > > > > > > 2018-08-03 10:41 GMT+02:00 Naik Kaxil :
> > > > > > >
> > > > > > > > Hi Airflow'ers,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Please join the Apache Airflow PMC in welcoming its newest
> > member
> > > > and
> > > > > > > >
> > > > > > > > co-committer, Feng Tao (a.k.a. feng-tao<
> > > > https://github.com/feng-tao
> > > > > >).
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Welcome Feng, great to have you on board!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Kaxil
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Kaxil Naik
> > > > > > > >
> > > > > > > > Data Reply
> > > > > > > > 2nd Floor, Nova South
> > > > > > > > 160 Victoria Street, Westminster
> > > > > > > > London SW1E 5LB - UK
> > > > > > > > phone: +44 (0)20 7730 6000
> > > > > > > > k.n...@reply.com
> > > > > > > > www.reply.com
> > > > > > > >
> > > > > > > > [image: Data Reply]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Airflow committers (a list)

2018-08-05 Thread Sid Anand
What are the instructions to add committers to this page?
-s

To update http://incubator.apache.org/projects/airflow.html, you need to
check out the incubator SVN repo and modify content/projects/airflow.xml

I SVN-committed the change below for tfeng but am not sure when the html
page gets rebuilt and redeployed to the site. Some CI/CD process I believe.

sianand@LM-SJN-21002367:~/Projects/incubator/content/projects $ svn diff

Index: airflow.xml

===

--- airflow.xml (revision 1837468)

+++ airflow.xml (working copy)

@@ -180,6 +180,11 @@

   Kaxil Naik

 

 

+  .

+  tfeng

+  Tao Feng

+

+

   Extra

   .

   .

Every time we add a committer, we need to update this page. Here's some
info on the SVN repo so you can check it out for future reference:

sianand@LM-SJN-21002367:~/Projects/incubator/content/projects $ svn info

Path: .

Working Copy Root Path: /Users/sianand/Projects/incubator

URL:
https://svn.apache.org/repos/asf/incubator/public/trunk/content/projects

Relative URL: ^/incubator/public/trunk/content/projects

Repository Root: https://svn.apache.org/repos/asf

Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68

Revision: 1837468

Node Kind: directory

Schedule: normal

Last Changed Author: mck

Last Changed Rev: 1837396

Last Changed Date: 2018-08-03 19:07:54 -0700 (Fri, 03 Aug 2018)

On Sun, Aug 5, 2018 at 1:39 PM Ash Berlin-Taylor <
ash_airflowl...@firemirror.com> wrote:

>   - https://github.com/orgs/apache/teams/airflow-committers/members <
> https://github.com/orgs/apache/teams/airflow-committers/members>
>
> This one at least is populated automatically via a 30-minutely cron job.
>
>
> > On 5 Aug 2018, at 21:35, Sid Anand  wrote:
> >
> > Committers/Mentors,
> > We have several places where committers are listed:
> >
> >   - https://whimsy.apache.org/roster/ppmc/airflow
> >   - http://incubator.apache.org/projects/airflow.html
> >   - Mentors, is the committer list here meant to be current?
> >  - Kaxil, I added tfeng to this page (done via SVN).. though I'm not
> >  sure when the actual publishing happens to this site. I believe
> it's done
> >  via some CI/CD process now.
> >   - https://github.com/orgs/apache/teams/airflow-committers/members
> >   - This is needed for the Gitbox integration we are using now. Does it
> >  get populated automatically from whimsy?
> >
> > When we promote a contributor to committer/PPMC, do we need to update
> all 3
> > of these places?
> >
> > FYI, there is a PR to also add this list to the README so the community
> can
> > see who the committers are :
> > https://github.com/apache/incubator-airflow/pull/3699
> >
> > Currently, the committer section of the README points to a wiki page that
> > displays all of these links.
> >
> > And if you wanted more options, GitHub's CODEOWNERS file provides some
> > interesting functionality, though I don't think we need it:
> > https://blog.github.com/2017-07-06-introducing-code-owners/
> >
> > -s
>
>


Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
Entirely possible, though I wasn't even dealing with the scheduler - the issue 
I was addressing was entirely in the webserver for a pre-existing Task Instance.

Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that isn't 
working right/ as expected. This line: 
https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
 doesn't look right for us - as you mentioned the TZ is set to something 
(rather than having no TZ value).

Some background on how Pq handles TZs. It always returns DTs in the TZ of the 
connection. I'm not sure if this is unique to postgres or if other DBs behave 
the same.

postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
  timestamptz

 2018-08-03 01:00:00+01

postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
  timestamptz

 2018-08-03 01:00:00+01

The server will always return TZs in the connection timezone.

postgres=# set timezone=utc;
SET
postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
  timestamptz

 2018-08-03 00:00:00+00
(1 row)

postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
  timestamptz

 2018-08-03 00:00:00+00
(1 row)
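To make the knock-on effect of the connection timezone concrete, here is a stdlib-only sketch of how the same stored instant renders to different log paths depending on the timezone the driver hands back. The LOG_TEMPLATE string and the `render` helper are simplified stand-ins, not Airflow's actual log_filename_template handling:

```python
import datetime

# Simplified stand-in for Airflow's log filename template, where the
# timestamp field is the execution date rendered in isoformat.
LOG_TEMPLATE = "{dag_id}/{task_id}/{ts}/1.log"

UTC = datetime.timezone.utc
GB = datetime.timezone(datetime.timedelta(hours=1))  # stand-in for GB/+01

# One stored instant: midnight UTC on 2018-07-23.
stored = datetime.datetime(2018, 7, 23, 0, 0, tzinfo=UTC)

def render(instant, conn_tz):
    # The driver returns datetimes in the connection's timezone; the log
    # path is then rendered from that local representation.
    local = instant.astimezone(conn_tz).replace(tzinfo=None)
    return LOG_TEMPLATE.format(dag_id="example_http_operator",
                               task_id="get_op", ts=local.isoformat())

print(render(stored, UTC))  # example_http_operator/get_op/2018-07-23T00:00:00/1.log
print(render(stored, GB))   # example_http_operator/get_op/2018-07-23T01:00:00/1.log
```

The second path is the one that does not exist on S3, which is why pinning the connection timezone to UTC (or normalising on result, as discussed in this thread) matters.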




-ash

> On 5 Aug 2018, at 21:28, Bolke de Bruin  wrote:
> 
> This is the issue:
> 
> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
> 00:00:00+00:00 tzinfo: 
> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
> example_http_operator @ 2018-08-03 02:00:00+02:00: 
> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
> 
> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
> 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
> example_http_operator @ 2018-08-04 02:00:00+02:00: 
> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
> 
> Notice at lines 1 and 2 that the next run date is correctly in UTC but from the 
> DB it gets a +2. At the next bit (lines 3 and 4) we get a 
> psycopg2.tz.FixedOffsetTimezone which should be set to UTC according to the 
> specs of https://github.com/spoqa/sqlalchemy-utc, but it isn’t. 
> 
> So changing your setting of the DB to UTC fixes the symptom but not the cause.
> 
> B.
> 
>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor  
>> wrote:
>> 
>> Sorry for being terse before.
>> 
>> So the issue is that the ts loaded from the DB is not in UTC, it's in GB/+01 
>> (the default of the DB server)
>> 
>> For me, on a currently running 1.9 (no TZ) db:
>> 
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-23 00:00:00
>> 
>> This date time appears in the log url, and the path it looks at on S3 is 
>> 
>> .../example_http_operator/2018-07-23T00:00:00/1.log
>> 
>> If my postgres server has a default timezone of GB (which the one running on 
>> my laptop does), and I then apply the migration then it is converted to that 
>> local time.
>> 
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-23 01:00:00+01
>> 
>> airflow=# set timezone=UTC;
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-23 00:00:00+00
>> 
>> 
>> This is all okay so far. The migration has kept the column at the same 
>> moment in time.
>> 
>> The issue comes when the UI tries to display logs for this old task: because 
>> the timezone of the connection is not UTC, PG returns a date with a +01 TZ. 
>> Thus after the migration this old task tries to look for a log file of
>> 
>> .../example_http_operator/2018-07-23T01:00:00/1.log
>> 
>> which doesn't exist - it's changed the time it has rendered from midnight 
>> (in v1.9) to 1am (in v1.10).
>> 
>> (This is with my change to log_filename_template from UPDATING.md in my 
>> other branch)
>> 
>> Setting the timezone to UTC per connection means the behaviour of Airflow 
>> doesn't change depending on how the server is configured.
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 20:58, Bolke de Bruin  wrote:
>>> 
>>> Digging in a bit further. 
>>> 
>>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>>> 
>>> is the format
>>> 
>>> ts = execution_date.isoformat and should be in UTC afaik.
>>> 
>>> something is weird tbh.
>>> 
>>> B.
>>> 
>>> 
 On 5 Aug 2018, at 21:32, Bolke de Bruin  wrote:
 
 Ash,
 
 Reading your proposed changes on your “set-timezone-to-utc” branch and 
 below analysis, I am not sure what you are perceiving as an issue.
 
 For conversion we assume everything is stored in UTC and in a naive 
 format. Conversion then adds the timezone information. This results in the 
 following
 
 postgres timezone = “Europe/Amsterdam”
 
 
 airflow=# select * from task_insta

Re: Airflow committers (a list)

2018-08-05 Thread Ash Berlin-Taylor
  - https://github.com/orgs/apache/teams/airflow-committers/members 


This one at least is populated automatically via a 30-minutely cron job.


> On 5 Aug 2018, at 21:35, Sid Anand  wrote:
> 
> Committers/Mentors,
> We have several places where committers are listed:
> 
>   - https://whimsy.apache.org/roster/ppmc/airflow
>   - http://incubator.apache.org/projects/airflow.html
>   - Mentors, is the committer list here meant to be current?
>  - Kaxil, I added tfeng to this page (done via SVN).. though I'm not
>  sure when the actual publishing happens to this site. I believe it's done
>  via some CI/CD process now.
>   - https://github.com/orgs/apache/teams/airflow-committers/members
>   - This is needed for the Gitbox integration we are using now. Does it
>  get populated automatically from whimsy?
> 
> When we promote a contributor to committer/PPMC, do we need to update all 3
> of these places?
> 
> FYI, there is a PR to also add this list to the README so the community can
> see who the committers are :
> https://github.com/apache/incubator-airflow/pull/3699
> 
> Currently, the committer section of the README points to a wiki page that
> displays all of these links.
> 
> And if you wanted more options, GitHub's CODEOWNERS file provides some
> interesting functionality, though I don't think we need it:
> https://blog.github.com/2017-07-06-introducing-code-owners/
> 
> -s



Airflow committers (a list)

2018-08-05 Thread Sid Anand
Committers/Mentors,
We have several places where committers are listed:

   - https://whimsy.apache.org/roster/ppmc/airflow
   - http://incubator.apache.org/projects/airflow.html
   - Mentors, is the committer list here meant to be current?
  - Kaxil, I added tfeng to this page (done via SVN).. though I'm not
  sure when the actual publishing happens to this site. I believe it's done
  via some CI/CD process now.
   - https://github.com/orgs/apache/teams/airflow-committers/members
   - This is needed for the Gitbox integration we are using now. Does it
  get populated automatically from whimsy?

When we promote a contributor to committer/PPMC, do we need to update all 3
of these places?

FYI, there is a PR to also add this list to the README so the community can
see who the committers are :
https://github.com/apache/incubator-airflow/pull/3699

Currently, the committer section of the README points to a wiki page that
displays all of these links.

And if you wanted more options, GitHub's CODEOWNERS file provides some
interesting functionality, though I don't think we need it:
https://blog.github.com/2017-07-06-introducing-code-owners/

-s


Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
This is the issue:

[2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
00:00:00+00:00 tzinfo: 
[2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
example_http_operator @ 2018-08-03 02:00:00+02:00: 
scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>

[2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
[2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
example_http_operator @ 2018-08-04 02:00:00+02:00: 
scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>

Notice at lines 1 and 2 that the next run date is correctly in UTC but from the DB 
it gets a +2. At the next bit (lines 3 and 4) we get a psycopg2.tz.FixedOffsetTimezone 
which should be set to UTC according to the specs of 
https://github.com/spoqa/sqlalchemy-utc, but it isn’t. 

So changing your setting of the DB to UTC fixes the symptom but not the cause.

B.

> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor  
> wrote:
> 
> Sorry for being terse before.
> 
> So the issue is that the ts loaded from the DB is not in UTC, it's in GB/+01 
> (the default of the DB server)
> 
> For me, on a currently running 1.9 (no TZ) db:
> 
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-23 00:00:00
> 
> This date time appears in the log url, and the path it looks at on S3 is 
> 
> .../example_http_operator/2018-07-23T00:00:00/1.log
> 
> If my postgres server has a default timezone of GB (which the one running on 
> my laptop does), and I then apply the migration then it is converted to that 
> local time.
> 
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-23 01:00:00+01
> 
> airflow=# set timezone=UTC;
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-23 00:00:00+00
> 
> 
> This is all okay so far. The migration has kept the column at the same moment 
> in time.
> 
> The issue come when the UI tries to display logs for this old task: because 
> the timezone of the connection is not UTC, PG returns a date with a +01 TZ. 
> Thus after the migration this old task tries to look for a log file of
> 
> .../example_http_operator/2018-07-23T01:00:00/1.log
> 
> which doesn't exist - it's changed the time it has rendered from midnight (in 
> v1.9) to 1am (in v1.10).
> 
> (This is with my change to log_filename_template from UPDATING.md in my other 
> branch)
> 
> Setting the timezone to UTC per connection means the behaviour of Airflow 
> doesn't change depending on how the server is configured.
> 
> -ash
> 
>> On 5 Aug 2018, at 20:58, Bolke de Bruin  wrote:
>> 
>> Digging in a bit further. 
>> 
>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>> 
>> is the format
>> 
>> ts = execution_date.isoformat and should be in UTC afaik.
>> 
>> something is weird tbh.
>> 
>> B.
>> 
>> 
>>> On 5 Aug 2018, at 21:32, Bolke de Bruin  wrote:
>>> 
>>> Ash,
>>> 
>>> Reading your proposed changes on your “set-timezone-to-utc” branch and 
>>> below analysis, I am not sure what you are perceiving as an issue.
>>> 
>>> For conversion we assume everything is stored in UTC and in a naive format. 
>>> Conversion then adds the timezone information. This results in the following
>>> 
>>> postgres timezone = “Europe/Amsterdam”
>>> 
>>> 
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-27 02:00:00+02
>>> 
>>> airflow=# set timezone=UTC;
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-27 00:00:00+00
>>> 
>>> If we don’t set the timezone on the connection, Postgres assumes the 
>>> server timezone (in my case “Europe/Amsterdam”), so every datetime Airflow 
>>> receives will be in “Europe/Amsterdam” time. However, as we defined the 
>>> model to use UTCDateTime, it will always convert the returned DateTime to 
>>> UTC.
>>> 
>>> If we have configured Airflow to use something other than UTC as the 
>>> default timezone, or a DAG has an associated timezone, we only convert to 
>>> that timezone when calculating the next run time (not for cron, btw). 
>>> Nowhere else, and thus we are UTC everywhere.
>>> 
>>> What do you think is inconsistent?
>>> 
>>> Bolke
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
 On 5 Aug 2018, at 18:13, Ash Berlin-Taylor 
  wrote:
 
 Relating to 2): I'm not sure that the upgrade from timezoneless to 
 timezone aware columns in the task instance is right, or at least it's not 
 what I expected.
 
 Before weren't all TZs from schedule dates etc in UTC? For the same task 
 instance (these outputs from psql directly):
 
 before: execution_date=2017-09-04 00:00:00
 after: execution_date=2017-09-04 01:00:00+01
 
 **Okay the migration is fine**. It appears that the migration has done the 
 right thing, but my local DB I'm testing with has a Timezone of GB set, so 
 Postgres converts it to that TZ on returning an object.
 
 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
 consistent behaviour? Is

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
Sorry for being terse before.

So the issue is that the ts loaded from the DB is not in UTC, it's in GB/+01 
(the default of the DB server)

For me, on a currently running 1.9 (no TZ) db:

airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-23 00:00:00

This date time appears in the log url, and the path it looks at on S3 is 

.../example_http_operator/2018-07-23T00:00:00/1.log

If my Postgres server has a default timezone of GB (which the one running on my 
laptop does) and I then apply the migration, it is converted to that local 
time.

airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-23 01:00:00+01

airflow=# set timezone=UTC;
airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-23 00:00:00+00


This is all okay so far. The migration has kept the column at the same moment 
in time.

The issue comes when the UI tries to display logs for this old task: because the 
timezone of the connection is not UTC, PG returns a date with a +01 TZ. Thus 
after the migration this old task tries to look for a log file of

.../example_http_operator/2018-07-23T01:00:00/1.log

which doesn't exist - it's changed the time it has rendered from midnight (in 
v1.9) to 1am (in v1.10).
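That midnight-to-1am shift can be reproduced with plain Python datetimes. A quick sketch (stdlib only, independent of Airflow's actual template code) of why normalising the value back to UTC before rendering the path component fixes it:

```python
from datetime import datetime, timedelta, timezone

# What a connection with a GB/+01 default timezone hands back for this task:
local = datetime(2018, 7, 23, 1, 0, tzinfo=timezone(timedelta(hours=1)))

# Rendered as-is, the path component no longer matches the 1.9-era log file:
as_is = local.isoformat()  # '2018-07-23T01:00:00+01:00'

# Normalised back to UTC first, it does:
in_utc = local.astimezone(timezone.utc).isoformat()  # '2018-07-23T00:00:00+00:00'

print(as_is, in_utc)
```

Both values denote the same instant; only the rendered string (and hence the log path) differs.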

(This is with my change to log_filename_template from UPDATING.md in my other 
branch)

Setting the timezone to UTC per connection means the behaviour of Airflow 
doesn't change depending on how the server is configured.

-ash

> On 5 Aug 2018, at 20:58, Bolke de Bruin  wrote:
> 
> Digging in a bit further. 
> 
> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
> 
> is the format
> 
> ts = execution_date.isoformat and should be in UTC afaik.
> 
> something is weird tbh.
> 
> B.
> 
> 
>> On 5 Aug 2018, at 21:32, Bolke de Bruin  wrote:
>> 
>> Ash,
>> 
>> Reading your proposed changes on your “set-timezone-to-utc” branch and below 
>> analysis, I am not sure what you are perceiving as an issue.
>> 
>> For conversion we assume everything is stored in UTC and in a naive format. 
>> Conversion then adds the timezone information. This results in the following
>> 
>> postgres timezone = “Europe/Amsterdam”
>> 
>> 
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-27 02:00:00+02
>> 
>> airflow=# set timezone=UTC;
>> airflow=# select * from task_instance;
>> get_op| example_http_operator | 2018-07-27 00:00:00+00
>> 
>> If we don’t set the timezone on the connection, Postgres assumes the server 
>> timezone (in my case “Europe/Amsterdam”), so every datetime Airflow receives 
>> will be in “Europe/Amsterdam” time. However, as we defined the model to use 
>> UTCDateTime, it will always convert the returned DateTime to UTC.
>> 
>> If we have configured Airflow to use something other than UTC as the 
>> default timezone, or a DAG has an associated timezone, we only convert to 
>> that timezone when calculating the next run time (not for cron, btw). 
>> Nowhere else, and thus we are UTC everywhere.
>> 
>> What do you think is inconsistent?
>> 
>> Bolke
>> 
>> 
>> 
>> 
>> 
>> 
>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
>>> aware columns in the task instance is right, or at least it's not what I 
>>> expected.
>>> 
>>> Before weren't all TZs from schedule dates etc in UTC? For the same task 
>>> instance (these outputs from psql directly):
>>> 
>>> before: execution_date=2017-09-04 00:00:00
>>> after: execution_date=2017-09-04 01:00:00+01
>>> 
>>> **Okay the migration is fine**. It appears that the migration has done the 
>>> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
>>> Postgres converts it to that TZ on returning an object.
>>> 
>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
>>> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy 
>>> that well.
>>> 
>>> 
>>> -ash
>>> 
 On 5 Aug 2018, at 16:01, Ash Berlin-Taylor 
  wrote:
 
 1.) Missing UPDATING note about change of task_log_reader to now always 
 being "task" (was "s3.task" before.). Logging config is much simpler now 
 though. This may be particular to my logging config, but given how much of 
 a pain it was to set up S3 logging in 1.9 I have shared my config with 
 some people in the Gitter chat so It's not just me.
 
 2) The path that log-files are written to in S3 has changed (again - this 
 happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
 files again to continue viewing them. The change is that the path now (in 
 1.10) has a timezone in it, and the date is in local time, before it was 
 UTC:
 
 before: 2018-07-23T00:00:00/1.log
 after: 2018-07-23T01:00:00+01:00/1.log
 
 We can possibly get away with an updating note about

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
Digging in a bit further. 

{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log

is the format

ts = execution_date.isoformat and should be in UTC afaik.

something is weird tbh.

B.


> On 5 Aug 2018, at 21:32, Bolke de Bruin  wrote:
> 
> Ash,
> 
> Reading your proposed changes on your “set-timezone-to-utc” branch and below 
> analysis, I am not sure what you are perceiving as an issue.
> 
> For conversion we assume everything is stored in UTC and in a naive format. 
> Conversion then adds the timezone information. This results in the following
> 
> postgres timezone = “Europe/Amsterdam”
> 
> 
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-27 02:00:00+02
> 
> airflow=# set timezone=UTC;
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-27 00:00:00+00
> 
> If we don’t set the timezone on the connection, Postgres assumes the server 
> timezone (in my case “Europe/Amsterdam”), so every datetime Airflow receives 
> will be in “Europe/Amsterdam” time. However, as we defined the model to use 
> UTCDateTime, it will always convert the returned DateTime to UTC.
> 
> If we have configured Airflow to use something other than UTC as the default 
> timezone, or a DAG has an associated timezone, we only convert to that 
> timezone when calculating the next run time (not for cron, btw). Nowhere else, 
> and thus we are UTC everywhere.
> 
> What do you think is inconsistent?
> 
> Bolke
> 
> 
> 
> 
> 
> 
>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor  
>> wrote:
>> 
>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
>> aware columns in the task instance is right, or at least it's not what I 
>> expected.
>> 
>> Before weren't all TZs from schedule dates etc in UTC? For the same task 
>> instance (these outputs from psql directly):
>> 
>> before: execution_date=2017-09-04 00:00:00
>> after: execution_date=2017-09-04 01:00:00+01
>> 
>> **Okay the migration is fine**. It appears that the migration has done the 
>> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
>> Postgres converts it to that TZ on returning an object.
>> 
>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
>> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy 
>> that well.
>> 
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>>> though. This may be particular to my logging config, but given how much of 
>>> a pain it was to set up S3 logging in 1.9 I have shared my config with some 
>>> people in the Gitter chat so It's not just me.
>>> 
>>> 2) The path that log-files are written to in S3 has changed (again - this 
>>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
>>> files again to continue viewing them. The change is that the path now (in 
>>> 1.10) has a timezone in it, and the date is in local time, before it was 
>>> UTC:
>>> 
>>> before: 2018-07-23T00:00:00/1.log
>>> after: 2018-07-23T01:00:00+01:00/1.log
>>> 
>>> We can possibly get away with an updating note about this to set a custom 
>>> log_filename_template. Testing this now.
>>> 
 On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
 
 -1(binding) from me.
 
 Installed with:
 
 AIRFLOW_GPL_UNIDECODE=yes pip install 
 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
 
 Install went fine.
 
 Our DAGs that use SparkSubmitOperator are now failing as there is now a 
 hard dependency on the Kubernetes client libs, but the `emr` group doesn't 
 mention this.
 
 Introduced in https://github.com/apache/incubator-airflow/pull/3112 
 
 
 I see two options for this - either conditionally enable k8s:// support if 
 the import works, or (less preferred) add kube-client to the emr deps.
 
 Sorry - this is the first time I've been able to test it.
 
 I will install this dep manually and continue testing.
 
 -ash
 
 (Normally no time at home due to new baby, but I got a standing desk, and 
 a carrier meaning she can sleep on me and I can use my laptop. Win!)
 
 
 
> On 4 Aug 2018, at 22:32, Bolke de Bruin wrote:
> 
> Bump. 
> 
> Committers please cast your vote. 
> 
> B.
> 
> Sent from my iPhone
> 
>> On 3 Aug 2018, at 13:23, Driesprong, Fokko >

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
Ash,

Reading your proposed changes on your “set-timezone-to-utc” branch and below 
analysis, I am not sure what you are perceiving as an issue.

For conversion we assume everything is stored in UTC and in a naive format. 
Conversion then adds the timezone information. This results in the following

postgres timezone = “Europe/Amsterdam”


airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-27 02:00:00+02

airflow=# set timezone=UTC;
airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-27 00:00:00+00

If we don’t set the timezone on the connection, Postgres assumes the server 
timezone (in my case “Europe/Amsterdam”), so every datetime Airflow receives 
will be in “Europe/Amsterdam” time. However, as we defined the model to use 
UTCDateTime, it will always convert the returned DateTime to UTC.
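Roughly, the normalisation UTCDateTime applies can be sketched with stdlib datetimes (the real implementation is a SQLAlchemy type decorator; this shows only the conversion logic, as an illustration):

```python
from datetime import datetime, timedelta, timezone

def normalize_to_utc(value):
    """Aware datetimes are converted to UTC; naive ones are assumed UTC."""
    if value is None:
        return None
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)

# The +02 value an "Europe/Amsterdam" connection returns and the naive value
# stored in the column denote the same instant after normalisation:
amsterdam = datetime(2018, 7, 27, 2, 0, tzinfo=timezone(timedelta(hours=2)))
naive = datetime(2018, 7, 27, 0, 0)
print(normalize_to_utc(amsterdam) == normalize_to_utc(naive))  # True
```

So whatever timezone the driver hands back, the model-level value is always the same UTC instant.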

If we have configured Airflow to use something other than UTC as the default 
timezone, or a DAG has an associated timezone, we only convert to that timezone 
when calculating the next run time (not for cron, btw). Nowhere else, and thus 
we are UTC everywhere.

What do you think is inconsistent?

Bolke






> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor  
> wrote:
> 
> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
> aware columns in the task instance is right, or at least it's not what I 
> expected.
> 
> Before weren't all TZs from schedule dates etc in UTC? For the same task 
> instance (these outputs from psql directly):
> 
> before: execution_date=2017-09-04 00:00:00
> after: execution_date=2017-09-04 01:00:00+01
> 
> **Okay the migration is fine**. It appears that the migration has done the 
> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
> Postgres converts it to that TZ on returning an object.
> 
> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy that 
> well.
> 
> 
> -ash
> 
>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor  
>> wrote:
>> 
>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>> though. This may be particular to my logging config, but given how much of a 
>> pain it was to set up S3 logging in 1.9 I have shared my config with some 
>> people in the Gitter chat so It's not just me.
>> 
>> 2) The path that log-files are written to in S3 has changed (again - this 
>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
>> files again to continue viewing them. The change is that the path now (in 
>> 1.10) has a timezone in it, and the date is in local time, before it was UTC:
>> 
>> before: 2018-07-23T00:00:00/1.log
>> after: 2018-07-23T01:00:00+01:00/1.log
>> 
>> We can possibly get away with an updating note about this to set a custom 
>> log_filename_template. Testing this now.
>> 
>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
>>> 
>>> -1(binding) from me.
>>> 
>>> Installed with:
>>> 
>>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
>>> 
>>> Install went fine.
>>> 
>>> Our DAGs that use SparkSubmitOperator are now failing as there is now a 
>>> hard dependency on the Kubernetes client libs, but the `emr` group doesn't 
>>> mention this.
>>> 
>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112 
>>> 
>>> 
>>> I see two options for this - either conditionally enable k8s:// support if 
>>> the import works, or (less preferred) add kube-client to the emr deps.
>>> 
>>> Sorry - this is the first time I've been able to test it.
>>> 
>>> I will install this dep manually and continue testing.
>>> 
>>> -ash
>>> 
>>> (Normally no time at home due to new baby, but I got a standing desk, and a 
>>> carrier meaning she can sleep on me and I can use my laptop. Win!)
>>> 
>>> 
>>> 
 On 4 Aug 2018, at 22:32, Bolke de Bruin wrote:
 
 Bump. 
 
 Committers please cast your vote. 
 
 B.
 
 Sent from my iPhone
 
> On 3 Aug 2018, at 13:23, Driesprong, Fokko wrote:
> 
> +1 Binding
> 
> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>  
> 
> 
> Cheers, Fokko
> 
> 2018-08

Re: Apache Git Services

2018-08-05 Thread Sid Anand
I believe Chris R. is also an ASF member and can create the new
mailing list.

Mentors, Chris R,
Can you help on this?

Ash can then update the current infra ticket (or create a new one) to
redirect some of the notifications.

-s

On Thu, Aug 2, 2018 at 2:11 AM Ash Berlin-Taylor <
ash_airflowl...@firemirror.com> wrote:

> I think we now only get them once, rather than once from gitbox, and once
> again from gitbox sending them to Jira :/
>
> On https://issues.apache.org/jira/browse/INFRA-16854 it was said we
> require github notifications to go to a list (even though we didn't have them
> before. Guess policies change, eh?), and using notifications@ is a common
> pattern other projects use.
>
> We need one of our Apache mentors to create us
> notificati...@airflow.incubator.apache.org via
> http://selfserve.apache.org/ then we can get the github comments etc
> redirected there.
>
> Chris, Hitesh, Jakob: would one of you be so kind as to create this list
> for us? Thanks.
>
> -ash
>
> > On 1 Aug 2018, at 22:46, Sid Anand  wrote:
> >
> > So, apparently, we should no longer see comments via Gitbox?
> >
> > On Wed, Aug 1, 2018 at 3:40 AM Michał Niemiec <
> michal.niem...@hotmail.com>
> > wrote:
> >
> >> I find the experience similar - it's already classified as spam and makes
> >> it a bit difficult to use.
> >>
> >> Any chance this could become a separate list, maybe?
> >>
> >> Regards
> >> Michał Niemiec
> >>
> >>
> >> On 01/08/2018, 12:07, "Victor Noagbodji" <
> >> vnoagbo...@amplify-analytics.com> wrote:
> >>
> >> Hey people, what is Apache Git Services? And why are we all receiving
> >> notifications (even those by bots) sent by that service? Where can I turn
> >> those off? They are (for me at least) ruining the mailing list experience...
> >>
> >>
>
>


Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
It is not being smart. It does as it is required to: all our supported 
databases (apart from SQLite) do this, and so do Oracle and SQL Server. 
Nevertheless, we enforce UTC by using UTCDateTime as the replacement for 
SQLAlchemy's DateTime field; this makes sure that whatever the database sends 
us, we transform it to UTC.

I'll have a look at what needs to be done. 

Conditional import sounds fine to me. 

B.


Sent from my iPhone

> On 5 Aug 2018, at 19:14, Ash Berlin-Taylor  wrote:
> 
> 
>> On 5 Aug 2018, at 18:01, Bolke de Bruin  wrote:
>> 
>> Hi Ash,
>> 
>> Thanks a lot for the proper review, obviously I would have liked that these 
>> issues (I think just one) are popping up at rc3 but I understand why it 
>> happened. 
> 
> Yeah, sorry I couldn't make time to test the betas :(
> 
>> 
>> Can you work out a patch for the k8s issue? I’m sure Fokko and others can 
>> chime in to make sure it will be the right change. 
> 
> Working on it - I'll go for a conditional import, and only raise an exception 
> if the "k8s://" scheme is specified, I think.
> 
>> 
>> On the timezone change: the database will do the right thing and correctly 
>> transform a datetime into the timezone the client is using. Even then, we 
>> enforce UTC internally and only transform it for user interaction or when it 
>> is relevant (to make sure we handle daylight savings, for example). It is 
>> therefore not required to force a timezone setting with SQLAlchemy beyond 
>> the point where we convert to timezone-aware (see migration scripts).
> 
> I think the database is being "smart" here in converting, but I'm not sure 
> it's the Right thing. It wouldn't surprise me if we have other places in the 
> codebase that expect datetime columns to come back in UTC, but they might 
> come back in DB-server local timezone.
> 
> Trying 
> https://github.com/apache/incubator-airflow/compare/master...ashb:set-timezone-to-utc-on-connect?expand=1
>  
> 
>  - it "fixes" my logging issue, tests are running 
> https://travis-ci.org/ashb/incubator-airflow/builds/412360920
> 
>> 
>> On the logging file format I agree this could be handled better. However I 
>> do think we should honor local system time for this as this is the standard 
>> for any other logging. Also logging output will be timestamped in local 
>> system time. Maybe we could cut off the timezone identifier as it can be 
>> assumed to be in local system time (+01:00). 
> 
> The issue with just cutting off the timezone is that old log files are now 
> unviewable - they ran at 00:00:00 UTC, but the hour of the record coming back 
> is 01.
> 
>> 
>> If we take on the k8s fix we can also fix the logging format. What do you 
>> think?
> 
> Also as a quick fix I've changed the UPDATING.md as suggested: 
> https://github.com/apache/incubator-airflow/compare/master...ashb:updating-for-logging-changes?expand=1.
>  The log format is a bit clunky, but the note about log_task_reader is needed 
> either way. (Do we need a Jira ticket for this sort of change, or is 
> AIRFLOW-XXX okay for this?)
> 
>> 
>> Cheers
>> Bolke
>> 
>> Sent from my iPad
>> 
>>> On 5 Aug 2018 at 18:13, Ash Berlin-Taylor wrote the following:
>>> 
>>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
>>> aware columns in the task instance is right, or at least it's not what I 
>>> expected.
>>> 
>>> Before weren't all TZs from schedule dates etc in UTC? For the same task 
>>> instance (these outputs from psql directly):
>>> 
>>> before: execution_date=2017-09-04 00:00:00
>>> after: execution_date=2017-09-04 01:00:00+01
>>> 
>>> **Okay the migration is fine**. It appears that the migration has done the 
>>> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
>>> Postgres converts it to that TZ on returning an object.
>>> 
>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
>>> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy 
>>> that well.
>>> 
>>> 
>>> -ash
>>> 
 On 5 Aug 2018, at 16:01, Ash Berlin-Taylor 
  wrote:
 
 1.) Missing UPDATING note about change of task_log_reader to now always 
 being "task" (was "s3.task" before.). Logging config is much simpler now 
 though. This may be particular to my logging config, but given how much of 
 a pain it was to set up S3 logging in 1.9 I have shared my config with 
 some people in the Gitter chat so It's not just me.
 
 2) The path that log-files are written to in S3 has changed (again - this 
 happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
 files again to continue viewing them. The change is that the path now (in 
 1.10) has a timezone in it, and the date is in local time, before it was 
 UTC:
 
 before: 2018-07-23T00:00:00/1.log
 after: 2018-07-23T01:00:00+01:00/1.log
 
 We c

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor

> On 5 Aug 2018, at 18:01, Bolke de Bruin  wrote:
> 
> Hi Ash,
> 
> Thanks a lot for the proper review, obviously I would have liked that these 
> issues (I think just one) are popping up at rc3 but I understand why it 
> happened. 

Yeah, sorry I couldn't make time to test the betas :(

> 
> Can you work out a patch for the k8s issue? I’m sure Fokko and others can 
> chime in to make sure it will be the right change. 

Working on it - I'll go for a conditional import, and only raise an exception 
if the "k8s://" scheme is specified, I think.

> 
> On the timezone change: the database will do the right thing and correctly 
> transform a datetime into the timezone the client is using. Even then, we 
> enforce UTC internally and only transform it for user interaction or when it 
> is relevant (to make sure we handle daylight savings, for example). It is 
> therefore not required to force a timezone setting with SQLAlchemy beyond 
> the point where we convert to timezone-aware (see migration scripts).

I think the database is being "smart" here in converting, but I'm not sure it's 
the Right thing. It wouldn't surprise me if we have other places in the 
codebase that expect datetime columns to come back in UTC, but they might come 
back in DB-server local timezone.

Trying 
https://github.com/apache/incubator-airflow/compare/master...ashb:set-timezone-to-utc-on-connect?expand=1
 

 - it "fixes" my logging issue, tests are running 
https://travis-ci.org/ashb/incubator-airflow/builds/412360920

> 
> On the logging file format I agree this could be handled better. However I do 
> think we should honor local system time for this as this is the standard for 
> any other logging. Also logging output will be timestamped in local system 
> time. Maybe we could cut off the timezone identifier as it can be assumed to 
> be in local system time (+01:00). 

The issue with just cutting off the timezone is that old log files are now 
unviewable - they ran at 00:00:00 UTC, but the hour of the record coming back 
is 01.

> 
> If we take on the k8s fix we can also fix the logging format. What do you 
> think?

Also as a quick fix I've changed the UPDATING.md as suggested: 
https://github.com/apache/incubator-airflow/compare/master...ashb:updating-for-logging-changes?expand=1.
 The log format is a bit clunky, but the note about log_task_reader is needed 
either way. (Do we need a Jira ticket for this sort of change, or is 
AIRFLOW-XXX okay for this?)
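For reference, the kind of log_filename_template override being tested could look roughly like this in airflow.cfg. The strftime expression reproducing a 1.9-style timezone-less path is an assumption on my part, not a verified recipe (note the doubled %% that ConfigParser interpolation requires):

```ini
[core]
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ti.execution_date.strftime("%%Y-%%m-%%dT%%H:%%M:%%S") }}/{{ try_number }}.log
```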

> 
> Cheers
> Bolke
> 
> Sent from my iPad
> 
>> On 5 Aug 2018 at 18:13, Ash Berlin-Taylor wrote the following:
>> 
>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
>> aware columns in the task instance is right, or at least it's not what I 
>> expected.
>> 
>> Before weren't all TZs from schedule dates etc in UTC? For the same task 
>> instance (these outputs from psql directly):
>> 
>> before: execution_date=2017-09-04 00:00:00
>> after: execution_date=2017-09-04 01:00:00+01
>> 
>> **Okay the migration is fine**. It appears that the migration has done the 
>> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
>> Postgres converts it to that TZ on returning an object.
>> 
>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
>> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy 
>> that well.
>> 
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>>> though. This may be particular to my logging config, but given how much of 
>>> a pain it was to set up S3 logging in 1.9 I have shared my config with some 
>>> people in the Gitter chat so It's not just me.
>>> 
>>> 2) The path that log-files are written to in S3 has changed (again - this 
>>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
>>> files again to continue viewing them. The change is that the path now (in 
>>> 1.10) has a timezone in it, and the date is in local time, before it was 
>>> UTC:
>>> 
>>> before: 2018-07-23T00:00:00/1.log
>>> after: 2018-07-23T01:00:00+01:00/1.log
>>> 
>>> We can possibly get away with an updating note about this to set a custom 
>>> log_filename_template. Testing this now.
>>> 
 On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
 
 -1(binding) from me.
 
 Installed with:
 
 AIRFLOW_GPL_UNIDECODE=yes pip install 
 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
 
 Install w

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
Hi Ash,

Thanks a lot for the proper review, obviously I would have liked that these 
issues (I think just one) are popping up at rc3 but I understand why it 
happened. 

Can you work out a patch for the k8s issue? I’m sure Fokko and others can chime 
in to make sure it will be the right change. 

On the timezone change: the database will do the right thing and correctly 
transform a datetime into the timezone the client is using. Even then, we 
enforce UTC internally and only transform it for user interaction or when it is 
relevant (to make sure we handle daylight savings, for example). It is therefore 
not required to force a timezone setting with SQLAlchemy beyond the point where 
we convert to timezone-aware (see migration scripts).

On the logging file format I agree this could be handled better. However I do 
think we should honor local system time for this as this is the standard for 
any other logging. Also logging output will be timestamped in local system 
time. Maybe we could cut off the timezone identifier as it can be assumed to be 
in local system time (+01:00). 

If we take on the k8s fix we can also fix the logging format. What do you think?

Cheers
Bolke

Sent from my iPad

> On 5 Aug 2018 at 18:13, Ash Berlin-Taylor wrote the following:
> 
> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
> aware columns in the task instance is right, or at least it's not what I 
> expected.
> 
> Before weren't all TZs from schedule dates etc in UTC? For the same task 
> instance (these outputs from psql directly):
> 
> before: execution_date=2017-09-04 00:00:00
> after: execution_date=2017-09-04 01:00:00+01
> 
> **Okay the migration is fine**. It appears that the migration has done the 
> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
> Postgres converts it to that TZ on returning an object.
> 
> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy that 
> well.
> 
> 
> -ash
> 
>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor  
>> wrote:
>> 
>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>> though. This may be particular to my logging config, but given how much of a 
>> pain it was to set up S3 logging in 1.9 I have shared my config with some 
>> people in the Gitter chat so It's not just me.
>> 
>> 2) The path that log-files are written to in S3 has changed (again - this 
>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
>> files again to continue viewing them. The change is that the path now (in 
>> 1.10) has a timezone in it, and the date is in local time, before it was UTC:
>> 
>> before: 2018-07-23T00:00:00/1.log
>> after: 2018-07-23T01:00:00+01:00/1.log
>> 
>> We can possibly get away with an updating note about this to set a custom 
>> log_filename_template. Testing this now.
>> 
>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
>>> 
>>> -1(binding) from me.
>>> 
>>> Installed with:
>>> 
>>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
>>> 
>>> Install went fine.
>>> 
>>> Our DAGs that use SparkSubmitOperator are now failing as there is now a 
>>> hard dependency on the Kubernetes client libs, but the `emr` group doesn't 
>>> mention this.
>>> 
>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112 
>>> 
>>> 
>>> I see two options for this - either conditionally enable k8s:// support if 
>>> the import works, or (less preferred) add kube-client to the emr deps.
>>> 
>>> Sorry - this is the first time I've been able to test it.
>>> 
>>> I will install this dep manually and continue testing.
>>> 
>>> -ash
>>> 
>>> (Normally no time at home due to new baby, but I got a standing desk, and a 
>>> carrier meaning she can sleep on me and I can use my laptop. Win!)
>>> 
>>> 
>>> 
 On 4 Aug 2018, at 22:32, Bolke de Bruin wrote:
 
 Bump. 
 
 Committers please cast your vote. 
 
 B.
 
 Sent from my iPhone
 
> On 3 Aug 2018, at 13:23, Driesprong, Fokko wrote:
> 
> +1 Binding
> 
> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>  
> 
>>>

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
Relating to 2): I'm not sure that the upgrade from timezoneless to timezone-aware 
columns in the task instance is right, or at least it's not what I 
expected.

Before weren't all TZs from schedule dates etc in UTC? For the same task 
instance (these outputs from psql directly):

before: execution_date=2017-09-04 00:00:00
after: execution_date=2017-09-04 01:00:00+01

**Okay the migration is fine**. It appears that the migration has done the 
right thing, but my local DB I'm testing with has a Timezone of GB set, so 
Postgres converts it to that TZ on returning an object.

3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
consistent behaviour? Is this possible somehow? I don't know SQLAlchemy that 
well.
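A minimal sketch of what that could look like (assuming a Postgres backend; this is not existing Airflow code, and the hook name is made up):

```python
# Hedged sketch, not Airflow's actual code: a SQLAlchemy "connect" hook
# that pins every new Postgres DBAPI connection to UTC, so timezone-aware
# columns come back in UTC regardless of the server's TimeZone setting.
# Registration would happen at engine setup, e.g. (shown as a comment to
# keep this snippet dependency-free):
#
#   from sqlalchemy import event
#   from sqlalchemy.engine import Engine
#   event.listens_for(Engine, "connect")(pin_session_to_utc)

def pin_session_to_utc(dbapi_connection, connection_record=None):
    """Issue Postgres-specific SQL on a freshly created connection."""
    cursor = dbapi_connection.cursor()
    try:
        cursor.execute("SET TIME ZONE 'UTC'")
    finally:
        cursor.close()
```

With something like this in place, psql-side display would still follow the client's TimeZone, but everything the ORM reads back would be UTC.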


-ash

> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor wrote:
> 
> 1.) Missing UPDATING note about change of task_log_reader to now always being 
> "task" (was "s3.task" before.). Logging config is much simpler now though. 
> This may be particular to my logging config, but given how much of a pain it 
> was to set up S3 logging in 1.9 I have shared my config with some people in 
> the Gitter chat so It's not just me.
> 
> 2) The path that log-files are written to in S3 has changed (again - this 
> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
> files again to continue viewing them. The change is that the path now (in 
> 1.10) has a timezone in it, and the date is in local time, before it was UTC:
> 
> before: 2018-07-23T00:00:00/1.log
> after: 2018-07-23T01:00:00+01:00/1.log
> 
> We can possibly get away with an updating note about this to set a custom 
> log_filename_template. Testing this now.
> 
>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
>> 
>> -1(binding) from me.
>> 
>> Installed with:
>> 
>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr,
>>  s3, crypto]>=1.10'
>> 
>> Install went fine.
>> 
>> Our DAGs that use SparkSubmitOperator are now failing as there is now a hard 
>> dependency on the Kubernetes client libs, but the `emr` group doesn't 
>> mention this.
>> 
>> Introduced in https://github.com/apache/incubator-airflow/pull/3112 
>> 
>> 
>> I see two options for this - either conditionally enable k8s:// support if 
>> the import works, or (less preferred) add kube-client to the emr deps (which 
>> I like less)
>> 
>> Sorry - this is the first time I've been able to test it.
>> 
>> I will install this dep manually and continue testing.
>> 
>> -ash
>> 
>> (Normally no time at home due to new baby, but I got a standing desk, and a 
>> carrier meaning she can sleep on me and I can use my laptop. Win!)
>> 
>> 
>> 
>>> On 4 Aug 2018, at 22:32, Bolke de Bruin wrote:
>>> 
>>> Bump. 
>>> 
>>> Committers please cast your vote. 
>>> 
>>> B.
>>> 
>>> Sent from my iPhone
>>> 
 On 3 Aug 2018, at 13:23, Driesprong, Fokko wrote:
 
 +1 Binding
 
 Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
 https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
  
 
 
 Cheers, Fokko
 
 2018-08-03 9:47 GMT+02:00 Bolke de Bruin :
 
> Hey all,
> 
> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the 
> release,
> which will last for 72 hours. Consider this my (binding) +1.
> 
> Airflow 1.10.0 RC 3 is available at:
> 
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
> 
> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
> comes with INSTALL instructions.
> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
> "sdist"
> release.
> 
> Public keys are available at:
> 
> https://dist.apache.org/repos/dist/release/incubator/airflow/
> 
> The amount of JIRAs fixed is over 700. Please have a look at the
> changelog.
> Since RC2 the following has been fixed:
> 
> * [AIRFLOW-2817] Force explicit choice on GPL dependency
> * [AIRFLOW-2716] Replace async and await py3.7 keywords
> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
> 
> Please note that the version number excludes the `rcX` string as well
> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
> to rename the artifact without modifying the artifact checksums when we
> actually release.

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
-1(binding) from me.

Installed with:

AIRFLOW_GPL_UNIDECODE=yes pip install 
'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr,
 s3, crypto]>=1.10'

Install went fine.

Our DAGs that use SparkSubmitOperator are now failing as there is now a hard 
dependency on the Kubernetes client libs, but the `emr` group doesn't mention 
this.

Introduced in https://github.com/apache/incubator-airflow/pull/3112 


I see two options for this - either conditionally enable k8s:// support if the 
import works, or (less preferred) add kube-client to the emr deps (which I like 
less)
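The conditional-import option might look roughly like this (hypothetical sketch; `HAS_KUBERNETES` and `validate_master` are illustrative names, not the actual SparkSubmitHook code):

```python
# Hypothetical sketch of the conditional-import option: only accept
# k8s:// masters when the kubernetes client library is importable,
# instead of importing it unconditionally.
try:
    import kubernetes  # noqa: F401  (only probing for availability)
    HAS_KUBERNETES = True
except ImportError:
    HAS_KUBERNETES = False


def validate_master(master_url):
    """Fail fast if a k8s:// master is requested without the client libs."""
    if master_url.startswith("k8s://") and not HAS_KUBERNETES:
        raise RuntimeError(
            "k8s:// master requires the kubernetes client libraries; "
            "install them or use a different master")
    return master_url
```

That keeps spark://, yarn, etc. working with no new dependency, and turns the current import-time failure into a clear per-task error.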

Sorry - this is the first time I've been able to test it.

I will install this dep manually and continue testing.

-ash

(Normally no time at home due to new baby, but I got a standing desk, and a 
carrier meaning she can sleep on me and I can use my laptop. Win!)



> On 4 Aug 2018, at 22:32, Bolke de Bruin  wrote:
> 
> Bump. 
> 
> Committers please cast your vote. 
> 
> B.
> 
> Sent from my iPhone
> 
>> On 3 Aug 2018, at 13:23, Driesprong, Fokko  wrote:
>> 
>> +1 Binding
>> 
>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>> 
>> Cheers, Fokko
>> 
>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin :
>> 
>>> Hey all,
>>> 
>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release,
>>> which will last for 72 hours. Consider this my (binding) +1.
>>> 
>>> Airflow 1.10.0 RC 3 is available at:
>>> 
>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>> 
>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
>>> comes with INSTALL instructions.
>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
>>> "sdist"
>>> release.
>>> 
>>> Public keys are available at:
>>> 
>>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>> 
>>> The amount of JIRAs fixed is over 700. Please have a look at the
>>> changelog.
>>> Since RC2 the following has been fixed:
>>> 
>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>> 
>>> Please note that the version number excludes the `rcX` string as well
>>> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
>>> to rename the artifact without modifying the artifact checksums when we
>>> actually release.
>>> 
>>> WARNING: Due to licensing requirements you will need to set
>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>> installing or upgrading. We will try to remove this requirement for the
>>> next release.
>>> 
>>> Cheers,
>>> Bolke



Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Ash Berlin-Taylor
1) Missing UPDATING note about change of task_log_reader to now always being 
"task" (was "s3.task" before). Logging config is much simpler now though. This 
may be particular to my logging config, but given how much of a pain it was to 
set up S3 logging in 1.9, I have shared my config with some people in the Gitter 
chat, so it's not just me.

2) The path that log-files are written to in S3 has changed (again - this 
happened from 1.8 to 1.9). I'd like to avoid having to move all of my log files 
again to continue viewing them. The change is that the path now (in 1.10) has a 
timezone in it, and the date is in local time, before it was UTC:

before: 2018-07-23T00:00:00/1.log
after: 2018-07-23T01:00:00+01:00/1.log

We can possibly get away with an updating note about this to set a custom 
log_filename_template. Testing this now.
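For the UPDATING note, the workaround could be a sketch along these lines (untested; assumes the 1.10 template context exposes a timezone-aware pendulum `execution_date`, and remember airflow.cfg is parsed by ConfigParser, so literal percent signs need doubling):

```ini
# airflow.cfg -- untested sketch: render the timestamp path segment in
# UTC with no offset, so 1.10 keeps writing to the 1.9-style S3 keys.
[core]
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ execution_date.in_timezone('UTC').strftime('%%Y-%%m-%%dT%%H:%%M:%%S') }}/{{ try_number }}.log
```

If that works it would avoid moving existing log files, at the cost of diverging from the new default template.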


> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
> 
> -1(binding) from me.
> 
> Installed with:
> 
> AIRFLOW_GPL_UNIDECODE=yes pip install 
> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr,
>  s3, crypto]>=1.10'
> 
> Install went fine.
> 
> Our DAGs that use SparkSubmitOperator are now failing as there is now a hard 
> dependency on the Kubernetes client libs, but the `emr` group doesn't mention 
> this.
> 
> Introduced in https://github.com/apache/incubator-airflow/pull/3112 
> 
> 
> I see two options for this - either conditionally enable k8s:// support if 
> the import works, or (less preferred) add kube-client to the emr deps (which 
> I like less)
> 
> Sorry - this is the first time I've been able to test it.
> 
> I will install this dep manually and continue testing.
> 
> -ash
> 
> (Normally no time at home due to new baby, but I got a standing desk, and a 
> carrier meaning she can sleep on me and I can use my laptop. Win!)
> 
> 
> 
>> On 4 Aug 2018, at 22:32, Bolke de Bruin wrote:
>> 
>> Bump. 
>> 
>> Committers please cast your vote. 
>> 
>> B.
>> 
>> Sent from my iPhone
>> 
>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko wrote:
>>> 
>>> +1 Binding
>>> 
>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>  
>>> 
>>> 
>>> Cheers, Fokko
>>> 
>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin :
>>> 
 Hey all,
 
 I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release,
 which will last for 72 hours. Consider this my (binding) +1.
 
 Airflow 1.10.0 RC 3 is available at:
 
 https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
 
 apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
 comes with INSTALL instructions.
 apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
 "sdist"
 release.
 
 Public keys are available at:
 
 https://dist.apache.org/repos/dist/release/incubator/airflow/
 
 The amount of JIRAs fixed is over 700. Please have a look at the
 changelog.
 Since RC2 the following has been fixed:
 
 * [AIRFLOW-2817] Force explicit choice on GPL dependency
 * [AIRFLOW-2716] Replace async and await py3.7 keywords
 * [AIRFLOW-2810] Fix typo in Xcom model timestamp
 
 Please note that the version number excludes the `rcX` string as well
 as the "+incubating" string, so it's now simply 1.10.0. This will allow us
 to rename the artifact without modifying the artifact checksums when we
 actually release.
 
 WARNING: Due to licensing requirements you will need to set
 SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
 installing or upgrading. We will try to remove this requirement for the
 next release.
 
 Cheers,
 Bolke
> 


