Re: Deprecate legacy UI in favor of FAB RBAC

2018-12-18 Thread Bolke de Bruin
The branch for 1.10 is different from master (v1-10-test and v1-10-stable), so
there is no issue from that perspective. If there are fixes for 1.10 after
the merge that concern the old UI, they can just be based against the right
branch.

B.


On Tue, 18 Dec 2018 at 13:49, airflowuser wrote:
 I don't think this can be merged until we stop adding fixes to 1.10.2, as
> there are some fixes for the old UI, which this PR removes.
>
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Tuesday, December 18, 2018 12:55 PM, Verdan Mahmood <
> verdan.mahm...@gmail.com> wrote:
>
> > Hi all,
> >
> > At the moment, we have two versions of the UI for Apache Airflow, which is
> > really hard to maintain.
> > In an effort to remove the legacy UI and make the FAB (RBAC) UI the
> > default and only UI version of Apache Airflow, I've opened a PR, with all
> > tests passing on Travis.
> >
> > PR: https://github.com/apache/incubator-airflow/pull/4339
> > Travis Build:
> > https://travis-ci.org/apache/incubator-airflow/builds/469438619
> >
> > Since it's a big change and really hard to keep it updated with the
> latest
> > master all the time, can someone please help with the reviews?
> >
> > Best,
> > Verdan Mahmood
>
>
>


Re: [RESULT] Graduate Apache Airflow as a TLP

2018-12-09 Thread Bolke de Bruin
Woohoo :-)

Thanks Jakob!


On Sun, 9 Dec 2018 at 20:01, Jakob Homan wrote:
> The VOTE passed in Incubator.  I'll add the resolution to the Board
> agenda; the Board meets this month on the 19th.  They'll approve and Airflow
> will be a TLP.
>
> -Jakob
> On Wed, Dec 5, 2018 at 3:31 PM Jakob Homan  wrote:
> >
> > Been traveling today.  Just started the VOTE over on general@ in the
> Incubator.
> >
> > -jg
> > On Wed, Dec 5, 2018 at 1:26 PM Kaxil Naik  wrote:
> > >
> > > Hi Jakob,
> > >
> > > Did you raise this with the IPMC? Can we track it somewhere?
> > >
> > > Excited for graduation :-)
> > >
> > > Regards,
> > > Kaxil
> > >
> > > On Tue, Dec 4, 2018, 20:04 Ash Berlin-Taylor  > >>
> > >> Missed my vote off that list :)
> > >>
> > >> > On 4 Dec 2018, at 18:17, Jakob Homan  wrote:
> > >> >
> > >> > I neglected to add my binding +1, so I'll do so now.
> > >> >
> > >> > With three days having elapsed, the VOTE is concluded successfully.
> > >> >
> > >> > Overall: 20 x +1 votes, 0 x -1 votes
> > >> >
> > >> > Binding +1 x 10: Kaxil, Tao, Bolke, Fokko, Maxime, Arthur, Hitesh,
> > >> > Chris, Sid, Jakob.
> > >> > Non-binding +1 x 10: Daniel, Shah, Stefan, Kevin, Marc, Sunil,
> > >> > Adityan, Deng, Neelesh, Sai
> > >> >
> > >> > I'll use this result to start the corresponding VOTE on the IPMC.
> I'm
> > >> > at an offsite today, so I have limited email time.  Likely will open
> > >> > the VOTE this evening.
> > >> >
> > >> > Thanks everyone.
> > >> > -Jakob
> > >> >
> > >> >
> > >> > On Tue, Dec 4, 2018 at 6:03 AM Bolke de Bruin 
> wrote:
> > >> >>
> > >> >> Shall we close the vote? @jakob?
> > >> >>
> > >> >>> On 2 Dec 2018, at 13:08, Sid Anand  wrote:
> > >> >>>
> > >> >>> +1 binding
> > >> >>>
> > >> >>> Woot! Thanks to all for this happy day!
> > >> >>> -s
> > >> >>>
> > >> >>> On Sun, Dec 2, 2018 at 1:25 AM Sai Phanindhra <
> phani8...@gmail.com> wrote:
> > >> >>>
> > >> >>>> +1 (non binding)
> > >> >>>>
> > >> >>>> Excited to see this happening.
> > >> >>>>
> > >> >>>> On Sat 1 Dec, 2018, 20:35  > >> >>>>
> > >> >>>>> +1 (binding)!
> > >> >>>>>
> > >> >>>>> On 30 November 2018 21:33:14 GMT, Jakob Homan <
> jgho...@gmail.com> wrote:
> > >> >>>>>> Hey all!
> > >> >>>>>>
> > >> >>>>>> Following a very successful DISCUSS[1] regarding graduating
> Airflow to
> > >> >>>>>> Top Level Project (TLP) status, I'm starting the official VOTE.
> > >> >>>>>>
> > >> >>>>>> Since entering the Incubator in 2016, the community has:
> > >> >>>>>> * successfully produced 7 releases
> > >> >>>>>> * added 9 new committers/PPMC members
> > >> >>>>>> * built a diverse group of committers from multiple different
> employers
> > >> >>>>>> * had more than 3,300 JIRA tickets opened
> > >> >>>>>> * completed the project maturity model with positive
> responses[2]
> > >> >>>>>>
> > >> >>>>>> Accordingly, I believe we're ready to graduate and am calling
> a VOTE
> > >> >>>>>> on the following graduation resolution.  This VOTE will remain
> open
> > >> >>>>>> for at least 72 hours.  If successful, the resolution will be
> > >> >>>>>> forwarded to the IPMC for its consideration.  If that VOTE is
> > >> >>>>>> successful, the resolution will be voted upon by the Board at
> its next
> > >> >>>>>> monthly meeting.
> > >> >>>>>>
> > >> >>>>>> Everyone is encouraged to vote, even if their vote is not
> binding.
> > >> >>>>>>

Re: [VOTE] Graduate the Apache Airflow as a TLP

2018-12-04 Thread Bolke de Bruin
Shall we close the vote? @jakob?

> On 2 Dec 2018, at 13:08, Sid Anand  wrote:
> 
> +1 binding
> 
> Woot! Thanks to all for this happy day!
> -s
> 
> On Sun, Dec 2, 2018 at 1:25 AM Sai Phanindhra  wrote:
> 
>> +1 (non binding)
>> 
>> Excited to see this happening.
>> 
>> On Sat 1 Dec, 2018, 20:35 > 
>>> +1 (binding)!
>>> 
>>> On 30 November 2018 21:33:14 GMT, Jakob Homan  wrote:
>>>> Hey all!
>>>> 
>>>> Following a very successful DISCUSS[1] regarding graduating Airflow to
>>>> Top Level Project (TLP) status, I'm starting the official VOTE.
>>>> 
>>>> Since entering the Incubator in 2016, the community has:
>>>>  * successfully produced 7 releases
>>>>  * added 9 new committers/PPMC members
>>>>  * built a diverse group of committers from multiple different employers
>>>>  * had more than 3,300 JIRA tickets opened
>>>>  * completed the project maturity model with positive responses[2]
>>>> 
>>>> Accordingly, I believe we're ready to graduate and am calling a VOTE
>>>> on the following graduation resolution.  This VOTE will remain open
>>>> for at least 72 hours.  If successful, the resolution will be
>>>> forwarded to the IPMC for its consideration.  If that VOTE is
>>>> successful, the resolution will be voted upon by the Board at its next
>>>> monthly meeting.
>>>> 
>>>> Everyone is encouraged to vote, even if their vote is not binding.
>>>> We've built a nice community here, let's make sure everyone has their
>>>> voice heard.
>>>> 
>>>> Thanks,
>>>> Jakob
>>>> 
>>>> [1]
>>>> 
>>> 
>> https://lists.apache.org/thread.html/%3c0a763b0b-7d0d-4353-979a-ac6769eb0...@gmail.com%3E
>>>> [2]
>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
>>>> 
>>>> 
>>>> 
>>>> Establish the Apache Airflow Project
>>>> 
>>>> WHEREAS, the Board of Directors deems it to be in the best
>>>> interests of the Foundation and consistent with the
>>>> Foundation's purpose to establish a Project Management
>>>> Committee charged with the creation and maintenance of
>>>> open-source software, for distribution at no charge to
>>>> the public, related to workflow automation and scheduling
>>>> that can be used to author and manage data pipelines.
>>>> 
>>>> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>>> Committee (PMC), to be known as the "Apache Airflow Project",
>>>> be and hereby is established pursuant to Bylaws of the
>>>> Foundation; and be it further
>>>> 
>>>> RESOLVED, that the Apache Airflow Project be and hereby is
>>>> responsible for the creation and maintenance of software
>>>> related to workflow automation and scheduling that can be
>>>> used to author and manage data pipelines; and be it further
>>>> 
>>>> RESOLVED, that the office of "Vice President, Apache Airflow" be
>>>> and hereby is created, the person holding such office to
>>>> serve at the direction of the Board of Directors as the chair
>>>> of the Apache Airflow Project, and to have primary responsibility
>>>> for management of the projects within the scope of
>>>> responsibility of the Apache Airflow Project; and be it further
>>>> 
>>>> RESOLVED, that the persons listed immediately below be and
>>>> hereby are appointed to serve as the initial members of the
>>>> Apache Airflow Project:
>>>> 
>>>> * Alex Guziel 
>>>> * Alex Van Boxel 
>>>> * Arthur Wiedmer 
>>>> * Ash Berlin-Taylor 
>>>> * Bolke de Bruin 
>>>> * Chris Riccomini 
>>>> * Dan Davydov 
>>>> * Fokko Driesprong 
>>>> * Hitesh Shah 
>>>> * Jakob Homan 
>>>> * Jeremiah Lowin 
>>>> * Joy Gao 
>>>> * Kaxil Naik 
>>>> * Maxime Beauchemin 
>>>> * Siddharth Anand 
>>>> * Sumit Maheshwari 
>>>> * Tao Feng 
>>>> 
>>>> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Bolke de Bruin
>>>> be appointed to the office of Vice President, Apache Airflow, to
>>>> serve in accordance with and subject to the direction of the
>>>> Board of Directors and the Bylaws of the Foundation until
>>>> death, resignation, retirement, removal or disqualification,
>>>> or until a successor is appointed; and be it further
>>>> 
>>>> RESOLVED, that the initial Apache Airflow PMC be and hereby is
>>>> tasked with the creation of a set of bylaws intended to
>>>> encourage open development and increased participation in the
>>>> Apache Airflow Project; and be it further
>>>> 
>>>> RESOLVED, that the Apache Airflow Project be and hereby
>>>> is tasked with the migration and rationalization of the Apache
>>>> Incubator Airflow podling; and be it further
>>>> 
>>>> RESOLVED, that all responsibilities pertaining to the Apache
>>>> Incubator Airflow podling encumbered upon the Apache Incubator
>>>> Project are hereafter discharged.
>>> 
>> 



Re: [VOTE] Graduate the Apache Airflow as a TLP

2018-11-30 Thread Bolke de Bruin
+1, binding

Yahoo! :-)

Sent from my iPad

> On 30 Nov 2018, at 22:48, Tao Feng wrote:
> 
> +1 (binding)
> 
> Thanks Jakob and everyone!
> 
>> On Fri, Nov 30, 2018 at 1:33 PM Jakob Homan  wrote:
>> 
>> Hey all!
>> 
>> Following a very successful DISCUSS[1] regarding graduating Airflow to
>> Top Level Project (TLP) status, I'm starting the official VOTE.
>> 
>> Since entering the Incubator in 2016, the community has:
>>   * successfully produced 7 releases
>>   * added 9 new committers/PPMC members
>>   * built a diverse group of committers from multiple different employers
>>   * had more than 3,300 JIRA tickets opened
>>   * completed the project maturity model with positive responses[2]
>> 
>> Accordingly, I believe we're ready to graduate and am calling a VOTE
>> on the following graduation resolution.  This VOTE will remain open
>> for at least 72 hours.  If successful, the resolution will be
>> forwarded to the IPMC for its consideration.  If that VOTE is
>> successful, the resolution will be voted upon by the Board at its next
>> monthly meeting.
>> 
>> Everyone is encouraged to vote, even if their vote is not binding.
>> We've built a nice community here, let's make sure everyone has their
>> voice heard.
>> 
>> Thanks,
>> Jakob
>> 
>> [1]
>> https://lists.apache.org/thread.html/%3c0a763b0b-7d0d-4353-979a-ac6769eb0...@gmail.com%3E
>> [2]
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
>> 
>> 
>> 
>> Establish the Apache Airflow Project
>> 
>> WHEREAS, the Board of Directors deems it to be in the best
>> interests of the Foundation and consistent with the
>> Foundation's purpose to establish a Project Management
>> Committee charged with the creation and maintenance of
>> open-source software, for distribution at no charge to
>> the public, related to workflow automation and scheduling
>> that can be used to author and manage data pipelines.
>> 
>> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>> Committee (PMC), to be known as the "Apache Airflow Project",
>> be and hereby is established pursuant to Bylaws of the
>> Foundation; and be it further
>> 
>> RESOLVED, that the Apache Airflow Project be and hereby is
>> responsible for the creation and maintenance of software
>> related to workflow automation and scheduling that can be
>> used to author and manage data pipelines; and be it further
>> 
>> RESOLVED, that the office of "Vice President, Apache Airflow" be
>> and hereby is created, the person holding such office to
>> serve at the direction of the Board of Directors as the chair
>> of the Apache Airflow Project, and to have primary responsibility
>> for management of the projects within the scope of
>> responsibility of the Apache Airflow Project; and be it further
>> 
>> RESOLVED, that the persons listed immediately below be and
>> hereby are appointed to serve as the initial members of the
>> Apache Airflow Project:
>> 
>> * Alex Guziel 
>> * Alex Van Boxel 
>> * Arthur Wiedmer 
>> * Ash Berlin-Taylor 
>> * Bolke de Bruin 
>> * Chris Riccomini 
>> * Dan Davydov 
>> * Fokko Driesprong 
>> * Hitesh Shah 
>> * Jakob Homan 
>> * Jeremiah Lowin 
>> * Joy Gao 
>> * Kaxil Naik 
>> * Maxime Beauchemin 
>> * Siddharth Anand 
>> * Sumit Maheshwari 
>> * Tao Feng 
>> 
>> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Bolke de Bruin
>> be appointed to the office of Vice President, Apache Airflow, to
>> serve in accordance with and subject to the direction of the
>> Board of Directors and the Bylaws of the Foundation until
>> death, resignation, retirement, removal or disqualification,
>> or until a successor is appointed; and be it further
>> 
>> RESOLVED, that the initial Apache Airflow PMC be and hereby is
>> tasked with the creation of a set of bylaws intended to
>> encourage open development and increased participation in the
>> Apache Airflow Project; and be it further
>> 
>> RESOLVED, that the Apache Airflow Project be and hereby
>> is tasked with the migration and rationalization of the Apache
>> Incubator Airflow podling; and be it further
>> 
>> RESOLVED, that all responsibilities pertaining to the Apache
>> Incubator Airflow podling encumbered upon the Apache Incubator
>> Project are hereafter discharged.
>> 


Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-28 Thread Bolke de Bruin
Ping!

Sent from my iPad

> On 27 Nov 2018, at 21:39, Bolke de Bruin wrote:
> 
> Hi Folks,
> 
> Thanks all for your responses, and particularly Stefan for his suggestion to 
> use the generic Apache way to handle security issues. This seems to be an 
> accepted approach for many projects, so I have added this to the maturity 
> evaluation[1] and marked it as resolved. While the handling of the GPL library 
> could be nicer, we are already in compliance with CD30, so @Fokko and @Ash, if 
> you want to help out towards graduation, please spend your time elsewhere, 
> like fixing CO50. This means adding a page to Confluence that describes how to 
> become a committer on the project. As we are following Apache, many examples 
> from other projects are around[2].
> 
> Then there is the paperwork[3] referred to by Jakob. This mainly concerns 
> filling in some items, and maybe creating some documentation here and there, 
> but I don't think much more. @Kaxil, @Tao: are you willing to pick this up? 
> @Sid, can you share how to edit that page? 
> 
> Once we have resolved these items, in my opinion we can start the voting here 
> and at the IPMC thereafter, targeting the January board meeting for 
> graduation. How’s that for a New Year’s resolution?
> 
> Cheers!
> Bolke
> 
> P.S. Wouldn't it be nice to have an updated web page for graduation? Maybe one 
> of the contributors/community members would like to take a stab at this[4].
> 
> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
> [2] https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer
> [3] http://incubator.apache.org/projects/airflow.html
> [4] https://airflow.apache.org/
> 
> 
> 
>> On 27 Nov 2018, at 16:32, Driesprong, Fokko  wrote:
>> 
>> +1 from my side. Would be awesome to graduate Airflow
>> 
>> If time allows, I'll also dive into CD30.
>> 
>> Cheers, Fokko
>> 
>> On Tue, 27 Nov 2018 at 16:21, Ash Berlin-Taylor wrote:
>> 
>>> Awesome, Bolke, thanks for starting this.
>>> 
>>> It looks like we are closer than I thought!
>>> 
>>> We can use those security lists (though having our own would be nice) -
>>> either way we will need to make this prominent in the docs.
>>> 
>>> Couple of points
>>> 
>>> CS10: that github link is only visible to members of the team
>>> 
>>> CD30: probably good as it is, we may want to do
>>> https://issues.apache.org/jira/browse/AIRFLOW-3400 <
>>> https://issues.apache.org/jira/browse/AIRFLOW-3400> to remove the last
>>> niggle of the GPL env var at install time (but not a hard requirement, just
>>> nice)
>>> 
>>> -ash
>>> 
>>>> On 26 Nov 2018, at 21:10, Stefan Seelmann 
>>> wrote:
>>>> 
>>>> I agree that Apache Airflow should graduate.
>>>> 
>>>> I've only been involved since the beginning of this year, but the project
>>>> did two releases during that time; once a TLP, releasing becomes easier :)
>>>> 
>>>> Regarding QU30, you may consider using the ASF-wide security mailing
>>>> list [3] and process [4].
>>>> 
>>>> Kind Regards,
>>>> Stefan
>>>> 
>>>> [3] https://www.apache.org/security/
>>>> [4] https://www.apache.org/security/committers.html
>>>> 
>>>> 
>>>>> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
>>>>> Ping!
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On 24 Nov 2018, at 12:57, Bolke de Bruin  wrote:
>>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> With the Apache Airflow community healthy and growing, I think now
>>> would be a good time to
>>>>>> discuss where we stand regarding graduation from the Incubator, and
>>> what requirements remain.
>>>>>> 
>>>>>> Apache Airflow entered incubation around 2 years ago; since then, the
>>> Airflow community has learned
>>>>>> a lot about how to do things the Apache way. Now we are a very helpful
>>> and engaged community,
>>>>>> ready to help with all questions from the Airflow community. We have
>>> delivered multiple releases that have
>>>>>> been increasing in quality ever since, and now we can do self-driven
>>> releases at a good cadence.
>>>>>> 
>>>>>> The community is growing, and new committers and PPMC members keep
>>> joining. We have addressed almost all
>>>>>> the maturity issues stipulated by the Apache Project Maturity Model [1].
>>> Some final requirements remain, but
>>>>>> those just need a final nudge. Committers and contributors are invited
>>> to verify the list and pick up the last
>>>>>> bits (QU30, CO50). Finally (yahoo!), all the License and IP issues we
>>> can see have been resolved.
>>>>>> 
>>>>>> Based on those, I believe it's time for us to graduate to TLP. [2] Any
>>> thoughts?
>>>>>> Advice from the Airflow Mentors is welcome.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> [1]
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
>>>>>> [2]
>>> https://incubator.apache.org/guides/graduation.html#graduating_to_a_top_level_project
>>> Regards,
>>>> 
>>> 
>>> 
> 


Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-27 Thread Bolke de Bruin
Hi Folks,

Thanks all for your responses, and particularly Stefan for his suggestion to use 
the generic Apache way to handle security issues. This seems to be an accepted 
approach for many projects, so I have added this to the maturity evaluation[1] and 
marked it as resolved. While the handling of the GPL library could be nicer, we are 
already in compliance with CD30, so @Fokko and @Ash, if you want to help out 
towards graduation, please spend your time elsewhere, like fixing CO50. This 
means adding a page to Confluence that describes how to become a committer on 
the project. As we are following Apache, many examples from other projects are 
around[2].

Then there is the paperwork[3] referred to by Jakob. This mainly concerns 
filling in some items, and maybe creating some documentation here and there, but I 
don't think much more. @Kaxil, @Tao: are you willing to pick this up? @Sid, can you 
share how to edit that page? 

Once we have resolved these items, in my opinion we can start the voting here and 
at the IPMC thereafter, targeting the January board meeting for graduation. 
How’s that for a New Year’s resolution?

Cheers!
Bolke

P.S. Wouldn't it be nice to have an updated web page for graduation? Maybe one of 
the contributors/community members would like to take a stab at this[4].

[1] https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
[2] https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer
[3] http://incubator.apache.org/projects/airflow.html
[4] https://airflow.apache.org/



> On 27 Nov 2018, at 16:32, Driesprong, Fokko  wrote:
> 
> +1 from my side. Would be awesome to graduate Airflow
> 
> If time allows, I'll also dive into CD30.
> 
> Cheers, Fokko
> 
> On Tue, 27 Nov 2018 at 16:21, Ash Berlin-Taylor wrote:
> 
>> Awesome, Bolke, thanks for starting this.
>> 
>> It looks like we are closer than I thought!
>> 
>> We can use those security lists (though having our own would be nice) -
>> either way we will need to make this prominent in the docs.
>> 
>> Couple of points
>> 
>> CS10: that github link is only visible to members of the team
>> 
>> CD30: probably good as it is, we may want to do
>> https://issues.apache.org/jira/browse/AIRFLOW-3400 <
>> https://issues.apache.org/jira/browse/AIRFLOW-3400> to remove the last
>> niggle of the GPL env var at install time (but not a hard requirement, just
>> nice)
>> 
>> -ash
>> 
>>> On 26 Nov 2018, at 21:10, Stefan Seelmann 
>> wrote:
>>> 
>>> I agree that Apache Airflow should graduate.
>>> 
>>> I've only been involved since the beginning of this year, but the project
>>> did two releases during that time; once a TLP, releasing becomes easier :)
>>> 
>>> Regarding QU30, you may consider using the ASF-wide security mailing
>>> list [3] and process [4].
>>> 
>>> Kind Regards,
>>> Stefan
>>> 
>>> [3] https://www.apache.org/security/
>>> [4] https://www.apache.org/security/committers.html
>>> 
>>> 
>>> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
>>>> Ping!
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On 24 Nov 2018, at 12:57, Bolke de Bruin  wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> With the Apache Airflow community healthy and growing, I think now
>> would be a good time to
>>>>> discuss where we stand regarding graduation from the Incubator, and
>> what requirements remain.
>>>>> 
>>>>> Apache Airflow entered incubation around 2 years ago; since then, the
>> Airflow community has learned
>>>>> a lot about how to do things the Apache way. Now we are a very helpful
>> and engaged community,
>>>>> ready to help with all questions from the Airflow community. We have
>> delivered multiple releases that have
>>>>> been increasing in quality ever since, and now we can do self-driven
>> releases at a good cadence.
>>>>> 
>>>>> The community is growing, and new committers and PPMC members keep
>> joining. We have addressed almost all
>>>>> the maturity issues stipulated by the Apache Project Maturity Model [1].
>> Some final requirements remain, but
>>>>> those just need a final nudge. Committers and contributors are invited
>> to verify the list and pick up the last
>>>>> bits (QU30, CO50). Finally (yahoo!), all the License and IP issues we
>> can see have been resolved.
>>>>> 
>>>>> Based on those, I believe it's time for us to graduate to TLP. [2] Any
>> thoughts?
>>>>> Advice from the Airflow Mentors is welcome.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> [1]
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
>>>>> [2]
>> https://incubator.apache.org/guides/graduation.html#graduating_to_a_top_level_project
>> Regards,
>>> 
>> 
>> 



Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-26 Thread Bolke de Bruin
Hi Jakob

Thanks for the vote of confidence! That's appreciated. 

I linked the maturity model and already did a grading (I think :-) ) in my 
original mail, so that's one thing less to worry about. It's probably a good idea 
if some of the other committers and contributors also take a look. There are 2 
items I'm unsure about. 

Cheers
Bolke



Sent from my iPhone

> On 26 Nov 2018, at 22:30, Jakob Homan  wrote:
> 
> With my Mentor hat on, I'm entirely confident that Airflow is ready to
> graduate.  The community broadly gets the Apache Way and operates
> within it.  The community is healthy and engaged.  The last couple
> releases went well, with no hitches whatsoever for the last one.
> 
> The graduation process is mainly paperwork[1] and running VOTEs [2]
> here and on the IPMC.  Last time I suggested this had to be done by
> the Champion, which wasn't correct - anyone from the PPMC can do so.
> I may have some free cycles over the next few weeks, so I'll take a
> look at the check list to see what we can get out of the way, but any
> of the PPMCers can also take items.  The IPMC also likes it if
> Podlings go through and grade themselves on the Maturity Model[3], if
> someone wants to do that.
> 
> -Jakob
> 
> [1] http://incubator.apache.org/projects/airflow.html
> [2] 
> http://mail-archives.apache.org/mod_mbox/airflow-dev/201809.mbox/%3CCAMdzn8vNbKQr2FF8WcJydFj16q0bgn0B42jfu5h28RS-ZQ=w...@mail.gmail.com%3E
> [3] https://community.apache.org/apache-way/apache-project-maturity-model.html
>> On Mon, Nov 26, 2018 at 1:10 PM Stefan Seelmann  
>> wrote:
>> 
>> I agree that Apache Airflow should graduate.
>> 
>> I've only been involved since the beginning of this year, but the project
>> did two releases during that time; once a TLP, releasing becomes easier :)
>> 
>> Regarding QU30, you may consider using the ASF-wide security mailing
>> list [3] and process [4].
>> 
>> Kind Regards,
>> Stefan
>> 
>> [3] https://www.apache.org/security/
>> [4] https://www.apache.org/security/committers.html
>> 
>> 
>>> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
>>> Ping!
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 24 Nov 2018, at 12:57, Bolke de Bruin  wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> With the Apache Airflow community healthy and growing, I think now would 
>>>> be a good time to
>>>> discuss where we stand regarding graduation from the Incubator, and 
>>>> what requirements remain.
>>>> 
>>>> Apache Airflow entered incubation around 2 years ago; since then, the 
>>>> Airflow community has learned
>>>> a lot about how to do things the Apache way. Now we are a very helpful and 
>>>> engaged community,
>>>> ready to help with all questions from the Airflow community. We have 
>>>> delivered multiple releases that have
>>>> been increasing in quality ever since, and now we can do self-driven 
>>>> releases at a good cadence.
>>>> 
>>>> The community is growing, and new committers and PPMC members keep joining. 
>>>> We have addressed almost all
>>>> the maturity issues stipulated by the Apache Project Maturity Model [1]. 
>>>> Some final requirements remain, but
>>>> those just need a final nudge. Committers and contributors are invited to 
>>>> verify the list and pick up the last
>>>> bits (QU30, CO50). Finally (yahoo!), all the License and IP issues we can 
>>>> see have been resolved.
>>>> 
>>>> Based on those, I believe it's time for us to graduate to TLP. [2] Any 
>>>> thoughts?
>>>> Advice from the Airflow Mentors is welcome.
>>>> 
>>>> Thanks,
>>>> 
>>>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
>>>> [2] 
>>>> https://incubator.apache.org/guides/graduation.html#graduating_to_a_top_level_project
>>>>  Regards,
>> 


Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-26 Thread Bolke de Bruin
Ping!

Sent from my iPhone

> On 24 Nov 2018, at 12:57, Bolke de Bruin  wrote:
> 
> Hi All,
> 
> With the Apache Airflow community healthy and growing, I think now would be a 
> good time to 
> discuss where we stand regarding graduation from the Incubator, and what 
> requirements remain. 
> 
> Apache Airflow entered incubation around 2 years ago; since then, the Airflow 
> community has learned 
> a lot about how to do things the Apache way. Now we are a very helpful and 
> engaged community, 
> ready to help with all questions from the Airflow community. We have delivered 
> multiple releases that have 
> been increasing in quality ever since, and now we can do self-driven releases 
> at a good cadence. 
> 
> The community is growing, and new committers and PPMC members keep joining. We 
> have addressed almost all 
> the maturity issues stipulated by the Apache Project Maturity Model [1]. Some 
> final requirements remain, but
> those just need a final nudge. Committers and contributors are invited to 
> verify the list and pick up the last 
> bits (QU30, CO50). Finally (yahoo!), all the License and IP issues we can see 
> have been resolved. 
> 
> Based on those, I believe it's time for us to graduate to TLP. [2] Any 
> thoughts? 
> Advice from the Airflow Mentors is welcome. 
> 
> Thanks, 
> 
> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
> [2] 
> https://incubator.apache.org/guides/graduation.html#graduating_to_a_top_level_project
>  Regards,


[DISCUSS] Apache Airflow graduation from the incubator

2018-11-24 Thread Bolke de Bruin
Hi All,

With the Apache Airflow community healthy and growing, I think now would be a 
good time to 
discuss where we stand regarding graduation from the Incubator, and what 
requirements remain. 

Apache Airflow entered incubation around 2 years ago; since then, the Airflow 
community has learned 
a lot about how to do things the Apache way. Now we are a very helpful and 
engaged community, 
ready to help with all questions from the Airflow community. We have delivered 
multiple releases that have 
been increasing in quality ever since, and now we can do self-driven releases 
at a good cadence. 

The community is growing, and new committers and PPMC members keep joining. We 
have addressed almost all 
the maturity issues stipulated by the Apache Project Maturity Model [1]. Some 
final requirements remain, but
those just need a final nudge. Committers and contributors are invited to 
verify the list and pick up the last 
bits (QU30, CO50). Finally (yahoo!), all the License and IP issues we can see 
have been resolved. 

Based on those, I believe it's time for us to graduate to TLP. [2] Any 
thoughts? 
Advice from the Airflow Mentors is welcome. 

Thanks, 

[1] https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
[2] 
https://incubator.apache.org/guides/graduation.html#graduating_to_a_top_level_project
 Regards,

Re: Remove airflow from pypi

2018-11-23 Thread Bolke de Bruin
Agree! This is even a security issue. 

Sent from my iPhone

> On 23 Nov 2018, at 15:29, Driesprong, Fokko  wrote:
> 
> Hi all,
> 
> I think we should remove airflow  (not
> apache-airflow) from PyPI. I still get questions from people who
> accidentally install Airflow 1.8.0. I see this is maintained
> by mistercrunch, artwr, aeon. Anyone any objections?
> 
> Cheers, Fokko


Re: Airflow 1.10.1 is released

2018-11-23 Thread Bolke de Bruin
That was pretty smooth sailing! Well done!

Sent from my iPhone

> On 22 Nov 2018, at 04:13, Kevin Yang  wrote:
> 
>  Thank you Ash for cutting the release!
> 
>> On Wed, Nov 21, 2018 at 6:09 PM Sid Anand  wrote:
>> 
>> Excellent work Ash! Thanks for doing the needful!!
>> 
>> -s
>> 
>>> On Wed, Nov 21, 2018 at 5:40 PM Tao Feng  wrote:
>>> 
>>> Thanks Ash for running the release!
>>> 
>>> On Wed, Nov 21, 2018 at 2:20 PM Ash Berlin-Taylor 
>> wrote:
>>> 
 Dear Airflow community,
 
 I'm happy to announce that Airflow 1.10.1 was just released.
 
 The source release as well as the binary "sdist" release are available
 here:
 
 
 
>>> 
>> https://dist.apache.org/repos/dist/release/incubator/airflow/1.10.1-incubating/
 
 We also made this version available on PyPI for convenience (`pip
>> install
 apache-airflow`):
 
 https://pypi.python.org/pypi/apache-airflow
 
 Find the CHANGELOG here for more details:
 
 https://github.com/apache/incubator-airflow/blob/master/CHANGELOG.txt
>>> 
>> 


Re: [VOTE CANCELED] Airflow 1.10.1rc1

2018-11-14 Thread Bolke de Bruin
You need to wait 72h and until you have 3+ votes. Satisfy both conditions :-)

On putting things on PyPI, I'm personally in favor. It's not an Apache channel, 
thus not official. On the other hand, Apache is Java-focused and only 
understands SNAPSHOTs, but that's essentially the same as this. 

B. 

Sent from my iPhone
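
A note on the PEP 440 mechanics behind Ash's question below: they can be
checked with the `packaging` library. This is a minimal sketch, assuming the
`packaging` distribution is installed (pip vendors it internally and applies
the same rules); the version numbers are the ones from this thread:

    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    # PEP 440: pre-releases are excluded from plain version specifiers unless
    # explicitly requested, so `pip install apache-airflow` skips 1.10.1rc1.
    spec = SpecifierSet(">=1.10.0")
    print(spec.contains(Version("1.10.1rc1")))                    # False
    print(spec.contains(Version("1.10.1rc1"), prereleases=True))  # True
    print(Version("1.10.1rc1") < Version("1.10.1"))               # True: rc sorts before the final

In other words, an RC on PyPI is only reachable for users who pin
`==1.10.1rc1` or pass `--pre`, which is the sense in which Ash calls it
"safe" below.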

> On 14 Nov 2018, at 18:37, Ash Berlin-Taylor  wrote:
> 
> We've had two regressions against this release reported in Slack so I'm 
> cancelling this vote, to re-open a new one once these two PRs are merged:
> 
> https://github.com/apache/incubator-airflow/pull/4186
> https://github.com/apache/incubator-airflow/pull/4187
> 
> Committeers: if you could look at the PRs so we can get a new vote started?
> 
> Some other questions: 
> 
> - Do our votes need to last 72 hours each, or can we have it be "72 hours or 
> until 3 (or 5) +1 binding votes?"
> - What do people think about making the RCs available on pip? It is how most 
> people install Airflow and publishing it there makes it easier for people to 
> test. (Pip won't install beta or rc versions when doing `pip install 
> apache-airflow`, you have to add `==1.10.1b1`, so it's "safe" in that regard.)
> 
> -ash
> 
>> On 13 Nov 2018, at 15:59, Ash Berlin-Taylor  wrote:
>> 
>> CORRECTION: Correct URLs are
>> 
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.1rc1 
>> 
>> 
>> Copy-and-paste fail
>> 
>> 
>>> On 13 Nov 2018, at 15:29, Ash Berlin-Taylor  wrote:
>>> 
>>> Hey all,
>>> 
>>> I have cut Airflow 1.10.1 RC1. This email is calling a vote on the release, 
>>> which will last for 72 hours. Consider this my (binding) +1.
>>> 
>>> Airflow 1.10.1 RC1 is available at:
>>> 
>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc1/
>>> 
>>> apache-airflow-1.10.1rc1+incubating-source.tar.gz is a source release that 
>>> comes with INSTALL instructions.
>>> apache-airflow-1.10.1rc1+incubating-bin.tar.gz is the binary Python "sdist" 
>>> release.
>>> 
>>> Public keys are available at:
>>> 
>>> https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS
>>> 
>>> Only votes from PMC members are binding, but members of the community are 
>>> encouraged to test the release and vote with "(non-binding)".
>>> 
>>> Changes since 1.10.1b1:
>>> 
>>> [AIRFLOW-XXX] Correct date and version in Changelog
>>> [AIRFLOW-2779] Add license headers to doc files (#4178)
>>> [AIRFLOW-XXX] Changelog and version for 1.10.1
>>> [AIRFLOW-2779] Add license headers to doc files (#4178)
>>> [AIRFLOW-2779] Add project version to license (#4177)
>>> [AIRFLOW-XXX] Sync changelog between release and master branch
>>> [AIRFLOW-XXX] Add missing docs for SNS classes (#4155)
>>> [AIRFLOW-3072] Assign permission get_logs_with_metadata to viewer role 
>>> (#4175)
>>> [AIRFLOW-2723] Update lxml dependency to >= 4.0.0
>>> [AIRFLOW-3325] Fix UI Page DAGs-column 'Recent Tasks' display issue (#4173)
>>> [AIRFLOW-XXX] Update Updating instructions for changes in 1.10.1
>>> [AIRFLOW-XXX] Fix a few typos in CHANGELOG (#4169)
>>> 
>>> 
>>> Full changelog is below:
>>> 
>>> New features:
>>> 
>>> [AIRFLOW-2524] Airflow integration with AWS Sagemaker
>>> [AIRFLOW-2657] Add ability to delete DAG from web ui
>>> [AIRFLOW-2780] Adds IMAP Hook to interact with a mail server
>>> [AIRFLOW-2794] Add delete support for Azure blob
>>> [AIRFLOW-2912] Add operators for Google Cloud Functions
>>> [AIRFLOW-2974] Add Start/Restart/Terminate methods Databricks Hook
>>> [AIRFLOW-2989] No Parameter to change bootDiskType for 
>>> DataprocClusterCreateOperator
>>> [AIRFLOW-3078] Basic operators for Google Compute Engine
>>> [AIRFLOW-3147] Update Flask-AppBuilder version
>>> [AIRFLOW-3231] Basic operators for Google Cloud SQL (deploy / patch / 
>>> delete)
>>> [AIRFLOW-3276] Google Cloud SQL database create / patch / delete operators
>>> 
>>> Improvements:
>>> 
>>> [AIRFLOW-393] Add progress callbacks for FTP downloads
>>> [AIRFLOW-520] Show Airflow version on web page
>>> [AIRFLOW-843] Exceptions now available in context during on_failure_callback
>>> [AIRFLOW-2476] Update tabulate dependency to v0.8.2
>>> [AIRFLOW-2592] Bump Bleach dependency
>>> [AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
>>> [AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes 
>>> executor/operator
>>> [AIRFLOW-2709] Improve error handling in Databricks hook
>>> [AIRFLOW-2723] Update lxml dependency to >= 4.0.0
>>> [AIRFLOW-2763] No precheck mechanism in place during worker initialisation 
>>> for the connection to metadata database
>>> [AIRFLOW-2789] Add ability to create single node cluster to 
>>> DataprocClusterCreateOperator
>>> [AIRFLOW-2797] Add ability to create Google Dataproc cluster with custom 
>>> image
>>> [AIRFLOW-2854] kubernetes_pod_operator add more configuration items
>>> [AIRFLOW-2855] Need to Check Validity of Cron Expression When Process DAG 
>>> File/Zip File
>>> [AIRFLOW-2904] Clean an unnecessary line in airflow/executors/celery_executor.py

Re: Airflow 1.10.1b1 release available - PLEASE TEST

2018-11-13 Thread Bolke de Bruin
Please make sure not to call it a release, at least. 

Sent from my iPhone

> On 12 Nov 2018, at 22:51, Ash Berlin-Taylor  wrote:
> 
> Hi Hitesh,
> 
> My understanding was that the only official place an Apache release can 
> happen from is https://www.apache.org/dist/incubator/airflow/ 
>  - so anything else is by 
> definition not an official Apache release.
> 
> So i guess it depends on what we mean by "Apache" release - could it be 
> confused that it is in some way official, yes, possibly. Although the 
> end-user would have to go out of their way to find it and install it so the 
> risk was low, and I felt that the benefit to the community of being able to 
> test and install this easily was worth the small risk of confusion.
> 
> Possibly something I should have asked (voted on?) first, or before we do 
> this in the future.
> 
> -ash
> 
>> On 12 Nov 2018, at 20:32, Hitesh Shah  wrote:
>> 
>> Hello Ash
>> 
>> For someone who is not familiar with the beta notation, or folks who do not
>> check whether it is signed or not, could this be confused with an Apache
>> release? Also, is there a plan to clean up/delete this version within a
>> finite time window once the official release voting is kicked off?
>> 
>> thanks
>> Hitesh
>> 
>> 
>>> On Fri, Nov 9, 2018 at 5:52 AM Naik Kaxil  wrote:
>>> 
>>> Good work Ash. Appreciate the clean and categorised Change Log.
>>> 
>>> Regards,
>>> Kaxil
>>> 
>>> On 09/11/2018, 13:45, "Ash Berlin-Taylor"  wrote:
>>> 
>>>   Hi Everyone,
>>> 
>>>   I've just released a beta version of 1.10.1! Please could you test
>>> this and report back any problems you notice, and also report back if you
>>> tried it and it works fine. As this is the first time I've released Airflow,
>>> it is possible that there are packaging mistakes too. I'm not calling
>>> for a vote just yet, but I will give this a few days until I start making
>>> release candidates and calling for a formal vote, probably on Monday or
>>> Tuesday.
>>> 
>>>   In order to distinguish it from an actual (apache) release it is:
>>> 
>>>   1. Marked as beta (python package managers do not install beta
>>> versions by default - PEP 440)
>>>   2. It is not signed
>>>   3. It is not at an official apache distribution location
>>> 
>>>   It can be installed with SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>>> 'apache-airflow==1.10.1b1'
>>> 
>>>   (Don't worry, without asking for `--pre` or specifying the version
>>> `pip install apache-airflow` will still get 1.10.0)
>>> 
>>>   Thanks,
>>>   Ash
>>> 
>>> 
>>> 
>>> 
>>> 
>>>   Included below is the changelog of this release:
>>> 
>>>   New features:
>>> 
>>>   [AIRFLOW-2524] Airflow integration with AWS Sagemaker
>>>   [AIRFLOW-2657] Add ability to delete DAG from web ui
>>>   [AIRFLOW-2780] Adds IMAP Hook to interact with a mail server
>>>   [AIRFLOW-2794] Add delete support for Azure blob
>>>   [AIRFLOW-2912] Add operators for Google Cloud Functions
>>>   [AIRFLOW-2974] Add Start/Restart/Terminate methods Databricks Hook
>>>   [AIRFLOW-2989] No Parameter to change bootDiskType for
>>> DataprocClusterCreateOperator
>>>   [AIRFLOW-3078] Basic operators for Google Compute Engine
>>>   [AIRFLOW-3147] Update Flask-AppBuilder version
>>>   [AIRFLOW-3231] Basic operators for Google Cloud SQL (deploy / patch /
>>> delete)
>>>   [AIRFLOW-3276] Google Cloud SQL database create / patch / delete
>>> operators
>>> 
>>>   Improvements:
>>> 
>>>   [AIRFLOW-393] Add progress callbacks for FTP downloads
>>>   [AIRFLOW-520] Show Airflow version on web page
>>>   [AIRFLOW-843] Exceptions now available in context during
>>> on_failure_callback
>>>   [AIRFLOW-2476] Update tabulate dependency to v0.8.2
>>>   [AIRFLOW-2592] Bump Bleach dependency
>>>   [AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
>>>   [AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes
>>> executor/operator
>>>   [AIRFLOW-2709] Improve error handling in Databricks hook
>>>   [AIRFLOW-2763] No precheck mechanism in place during worker
>>> initialisation for the connection to metadata database
>>>   [AIRFLOW-2789] Add ability to create single node cluster to
>>> DataprocClusterCreateOperator
>>>   [AIRFLOW-2797] Add ability to create Google Dataproc cluster with
>>> custom image
>>>   [AIRFLOW-2854] kubernetes_pod_operator add more configuration items
>>>   [AIRFLOW-2855] Need to Check Validity of Cron Expression When Process
>>> DAG File/Zip File
>>>   [AIRFLOW-2904] Clean an unnecessary line in
>>> airflow/executors/celery_executor.py
>>>   [AIRFLOW-2921] A trivial incorrectness in CeleryExecutor()
>>>   [AIRFLOW-2922] Potential deal-lock bug in CeleryExecutor()
>>>   [AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
>>>   [AIRFLOW-2949] Syntax Highlight for Single Quote
>>>   [AIRFLOW-2951] dag_run end_date Null after a dag is finished
>>>   [AIRFLOW-2956] Kubernetes tolerations for pod operator
>>>   [AIRFLOW-2997] Support for 

Re: 1.10.1 Release?

2018-11-05 Thread Bolke de Bruin
The fix is in master and should work across all DST changes. It will be 
included in 1.10.1. 

B. 

Sent from my iPhone
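
To make the ambiguity in the quoted discussion concrete: at a fall-back
transition the same local wall-clock time occurs twice. A minimal sketch,
with fixed UTC offsets standing in for the real Europe/Paris rules:

    from datetime import datetime, timedelta, timezone

    CEST = timezone(timedelta(hours=2))  # summer time, UTC+2
    CET = timezone(timedelta(hours=1))   # winter time, UTC+1

    # At the 2018-10-28 fall-back, 03:00 CEST jumps back to 02:00 CET, so the
    # wall-clock reading 02:30 names two distinct instants an hour apart.
    first = datetime(2018, 10, 28, 2, 30, tzinfo=CEST)
    second = datetime(2018, 10, 28, 2, 30, tzinfo=CET)
    print(first.astimezone(timezone.utc))   # 2018-10-28 00:30:00+00:00
    print(second.astimezone(timezone.utc))  # 2018-10-28 01:30:00+00:00

    # A naive local execution_date of 02:30 cannot tell these two apart, which
    # is why it stops working as an idempotency key across the transition.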

> On 5 Nov 2018, at 19:54, Dave Fisher  wrote:
> 
> 
> 
>> On 2018/10/28 00:09:05, Bolke de Bruin  wrote: 
>> I wonder how to treat this:
>> 
>> This is what I think happens (need to verify more, but I am pretty sure) the 
>> specified DAG should run every 5 minutes. At DST change (3AM -> 2AM)
> 
> FYI - In the US the DST change is 2AM -> 1AM. Yes, TZ is hard stuff.
> 
>> we basically hit a schedule that we have already seen. 2AM -> 3AM has already 
>> happened. Obviously the intention is to run every 5 minutes. But what do we 
>> do with the execution_date? Is this still idempotent? Should we indeed 
>> reschedule? 
>> 
>> B.
>> 
>>> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
>>> 
>>> I've done a bit more digging - the issue is in our tz-aware handling inside 
>>> following_schedule (and previous schedule), causing it to loop.
>>> 
>>> This section of the croniter docs seems relevant 
>>> https://github.com/kiorky/croniter#about-dst
>>> 
>>>   Be sure to init your croniter instance with a TZ aware datetime for this 
>>> to work !:
>>>>>> local_date = tz.localize(datetime(2017, 3, 26))
>>>>>> val = croniter('0 0 * * *', local_date).get_next(datetime)
>>> 
>>> I think the problem is that we are _not_ passing a TZ aware dag in and we 
>>> should be.
>>> 
>>>> On 30 Oct 2018, at 17:35, Bolke de Bruin  wrote:
>>>> 
>>>> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
>>>> 
>>>> B.
>>>> 
>>>> Verstuurd vanaf mijn iPad
>>>> 
>>>>> On 30 Oct 2018, at 18:25, Ash Berlin-Taylor wrote:
>>>>> 
>>>>> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
>>>>> 
>>>>> last_run = dag.get_last_dagrun(session=session)
>>>>> if last_run and next_run_date:
>>>>>     while next_run_date <= last_run.execution_date:
>>>>>         next_run_date = dag.following_schedule(next_run_date)
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
>>>>>> 
>>>>>> Hi, kaczors on gitter has produced a minimal reproduction case: 
>>>>>> https://github.com/kaczors/airflow_1_10_tz_bug
>>>>>> 
>>>>>> Rough repro steps: In a VM, with time syncing disabled, and configured 
>>>>>> with system timezone of Europe/Zurich (or any other CEST one) run 
>>>>>> 
>>>>>> - `date 10280250.00`
>>>>>> - initdb, start scheduler, webserver, enable dag etc.
>>>>>> - `date 10280259.00`
>>>>>> - wait 5-10 mins for scheduler to catch up
>>>>>> - After the on-the-hour task run the scheduler will spin up another 
>>>>>> process to parse the dag... and it never returns.
>>>>>> 
>>>>>> I've only just managed to reproduce it, so haven't dug into why yet. A 
>>>>>> quick hacky debug print shows something is stuck in an infinite loop.
>>>>>> 
>>>>>> -ash
>>>>>> 
>>>>>>> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
>>>>>>> 
>>>>>>> Can this be confirmed? Then I can have a look at it. Preferably with 
>>>>>>> dag definition code.
>>>>>>> 
>>>>>>> On the licensing requirements:
>>>>>>> 
>>>>>>> 1. Indeed licensing header for markdown documents. It was suggested to 
>>>>>>> use html comments. I’m not sure how that renders with others like PDF 
>>>>>>> though.
>>>>>>> 2. The licensing notifications need to be tied to a specific version as 
>>>>>>> licenses might change with versions.
>>>>>>> 
>>>>>>> Cheers
>>>>>>> Bolke
>>>>>>> 
>>>>>>> Verstuurd vanaf mijn iPad
>>>>>>> 
>>>>>>>> On 29 Oct 2018, at 12:39, Ash Berlin-Taylor wrote:
>>>>>>>> 

Re: 1.10.1 Release?

2018-10-30 Thread Bolke de Bruin
They are :-)

I might have to adjust some tests, let's see. Will do that tomorrow if required.

B.

> On 30 Oct 2018, at 21:53, Ash Berlin-Taylor  wrote:
> 
> Fair :) Timezones are _hard_
> 
> Giving it a look now.
> 
> -ash
> 
>> On 30 Oct 2018, at 20:50, Bolke de Bruin  wrote:
>> 
>> The reason for not passing a TZ-aware object is that many libraries make 
>> mistakes (pytz, arrow, etc.) when doing transitions, hence the use of 
>> pendulum, which seems the most complete. I don’t know what croniter is 
>> relying on and I don’t want to find out ;-).
>> 
>> B.
>> 
>>> On 30 Oct 2018, at 21:13, Ash Berlin-Taylor >> <mailto:a...@firemirror.com>> wrote:
>>> 
>>> I think if we give croniter a tz-aware DT in the local tz it will deal with 
>>> DST (i.e. will give 2:55 CEST followed by 2:00 CET) and then we convert it 
>>> to UTC for return - but right now we are giving it a TZ-unaware local time.
>>> 
>>> I think.
>>> 
>>> Ash
>>> 
>>> On 30 October 2018 19:40:27 GMT, Bolke de Bruin  wrote:
>>> I think we should use the UTC date for cron instead of the naive local date 
>>> time. I will check if croniter implements this so we can rely on that.
>>> 
>>> B.
>>> 
>>> On 28 Oct 2018, at 02:09, Bolke de Bruin  wrote:
>>> 
>>> I wonder how to treat this:
>>> 
>>> This is what I think happens (need to verify more, but I am pretty sure) 
>>> the specified DAG should run every 5 minutes. At DST change (3AM -> 2AM) we 
>>> basically hit a schedule that we have already seen. 2AM -> 3AM has already 
>>> happened. Obviously the intention is to run every 5 minutes. But what do we 
>>> do with the execution_date? Is this still idempotent? Should we indeed 
>>> reschedule? 
>>> 
>>> B.
>>> 
>>> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
>>> 
>>> I've done a bit more digging - the issue is in our tz-aware handling inside 
>>> following_schedule (and previous schedule), causing it to loop.
>>> 
>>> This section of the croniter docs seems relevant 
>>> https://github.com/kiorky/croniter#about-dst
>>> 
>>> Be sure to init your croniter instance with a TZ aware datetime for this to 
>>> work !:
>>> local_date = tz.localize(datetime(2017, 3, 26))
>>> val = croniter('0 0 * * *', local_date).get_next(datetime)
>>> 
>>> I think the problem is that we are _not_ passing a TZ aware dag in and we 
>>> should be.
>>> 
>>> On 30 Oct 2018, at 17:35, Bolke de Bruin >> <mailto:bdbr...@gmail.com>> wrote:
>>> 
>>> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
>>> 
>>> B.
>>> 
>>> Sent from my iPad
>>> 
>>> On 30 Oct 2018, at 18:25, Ash Berlin-Taylor <mailto:a...@apache.org> wrote:
>>> 
>>> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
>>> 
>>>   last_run = dag.get_last_dagrun(session=session)
>>>   if last_run and next_run_date:
>>>       while next_run_date <= last_run.execution_date:
>>>           next_run_date = dag.following_schedule(next_run_date)
>>> 
>>> 
>>> 
>>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor >> <mailto:a...@apache.org>> wrote:
>>> 
>>> Hi, kaczors on gitter has produced a minimal reproduction case: 
>>> https://github.com/kaczors/airflow_1_10_tz_bug
>>> 
>>> Rough repro steps: In a VM, with time syncing disabled, and configured with 
>>> system timezone of Europe/Zurich (or any other CEST one) run 
>>> 
>>> - `date 10280250.00`
>>> - initdb, start scheduler, webserver, enable dag etc.
>>> - `date 10280259.00`
>>> - wait 5-10 mins for scheduler to catch up
>>> - After the on-the-hour task run the scheduler will spin up another process 
>>> to parse the dag... and it never returns.
>>> 
>>> I've only just managed to reproduce it, so haven't dug into why yet. A 
>>> quick hacky debug print shows something is stuck in an infinite loop.

Re: 1.10.1 Release?

2018-10-30 Thread Bolke de Bruin
The reason for not passing a TZ-aware object is that many libraries make 
mistakes (pytz, arrow, etc.) when doing transitions, hence the use of pendulum, 
which seems the most complete. I don’t know what croniter is relying on and I 
don’t want to find out ;-).

B.

> On 30 Oct 2018, at 21:13, Ash Berlin-Taylor  wrote:
> 
> I think if we give croniter a tz-aware DT in the local tz it will deal with 
> DST (i.e. will give 2:55 CEST followed by 2:00 CET) and then we convert it to 
> UTC for return - but right now we are giving it a TZ-unaware local time.
> 
> I think.
> 
> Ash
> 
> On 30 October 2018 19:40:27 GMT, Bolke de Bruin  wrote:
> I think we should use the UTC date for cron instead of the naive local date 
> time. I will check if croniter implements this so we can rely on that.
> 
> B.
> 
> On 28 Oct 2018, at 02:09, Bolke de Bruin  wrote:
> 
> I wonder how to treat this:
> 
> This is what I think happens (need to verify more, but I am pretty sure) the 
> specified DAG should run every 5 minutes. At DST change (3AM -> 2AM) we 
> basically hit a schedule that we have already seen. 2AM -> 3AM has already 
> happened. Obviously the intention is to run every 5 minutes. But what do we 
> do with the execution_date? Is this still idempotent? Should we indeed 
> reschedule? 
> 
> B.
> 
> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
> 
> I've done a bit more digging - the issue is in our tz-aware handling inside 
> following_schedule (and previous schedule), causing it to loop.
> 
> This section of the croniter docs seems relevant 
> https://github.com/kiorky/croniter#about-dst
> 
>   Be sure to init your croniter instance with a TZ aware datetime for this to 
> work !:
> local_date = tz.localize(datetime(2017, 3, 26))
> val = croniter('0 0 * * *', local_date).get_next(datetime)
> 
> I think the problem is that we are _not_ passing a TZ aware dag in and we 
> should be.
> 
> On 30 Oct 2018, at 17:35, Bolke de Bruin  wrote:
> 
> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
> 
> B.
> 
> Sent from my iPad
> 
> On 30 Oct 2018, at 18:25, Ash Berlin-Taylor wrote:
> 
> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
> 
> last_run = dag.get_last_dagrun(session=session)
> if last_run and next_run_date:
>     while next_run_date <= last_run.execution_date:
>         next_run_date = dag.following_schedule(next_run_date)
> 
> 
> 
> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
> 
> Hi, kaczors on gitter has produced a minimal reproduction case: 
> https://github.com/kaczors/airflow_1_10_tz_bug
> 
> Rough repro steps: In a VM, with time syncing disabled, and configured with 
> system timezone of Europe/Zurich (or any other CEST one) run 
> 
> - `date 10280250.00`
> - initdb, start scheduler, webserver, enable dag etc.
> - `date 10280259.00`
> - wait 5-10 mins for scheduler to catch up
> - After the on-the-hour task run the scheduler will spin up another process 
> to parse the dag... and it never returns.
> 
> I've only just managed to reproduce it, so haven't dug into why yet. A quick 
> hacky debug print shows something is stuck in an infinite loop.
> 
> -ash
> 
> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
> 
> Can this be confirmed? Then I can have a look at it. Preferably with dag 
> definition code.
> 
> On the licensing requirements:
> 
> 1. Indeed licensing header for markdown documents. It was suggested to use 
> html comments. I’m not sure how that renders with others like PDF though.
> 2. The licensing notifications need to be tied to a specific version as 
> licenses might change with versions.
> 
> Cheers
> Bolke
> 
> Sent from my iPad
> 
> On 29 Oct 2018, at 12:39, Ash Berlin-Taylor wrote:
> 
> I was going to make a start on the release, but two people have reported that 
> there might be an issue around non-UTC dags and the scheduler changing over 
> from Summer time.
> 
> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange issue 
> : we have hourly DAGs with a start_date in a local timezone (not UTC) and 
> since (Sunday) the last winter time change they don’t run anymore. Any idea ?
> 09:41  it impacted all our DAG that had a run at 3am 
> (Europe/Paris), the exact time of winter time change :(
> 
> I am going to take a look at this today and see if I can get to the bottom of 
> it.
> 
> Bolke: are there any outstanding tasks/

Re: 1.10.1 Release?

2018-10-30 Thread Bolke de Bruin
Patch available at:

https://github.com/apache/incubator-airflow/pull/4117

please test.

B.
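
A minimal sketch of the idea behind the patch; the cron expression and
timestamps here are illustrative rather than taken from the PR, and it
assumes croniter preserves the tzinfo of the datetime it is seeded with:

    from datetime import datetime, timezone
    from croniter import croniter

    # Scheduling in UTC sidesteps the fall-back: UTC has no repeated hour, so
    # the "next" schedule is always strictly later, and the catch-up loop in
    # airflow.jobs can no longer spin on a repeated local time.
    utc_now = datetime(2018, 10, 28, 0, 55, tzinfo=timezone.utc)  # 02:55 CEST in Paris
    cron = croniter("*/5 * * * *", utc_now)
    next_run = cron.get_next(datetime)
    assert next_run > utc_now  # strictly monotonic in UTC
    print(next_run.isoformat())  # 2018-10-28T01:00:00+00:00

Conversion to the user's local timezone then happens only at the presentation
layer, as Bolke notes in the quoted message below.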

> On 30 Oct 2018, at 21:14, Bolke de Bruin  wrote:
> 
> We should just pass it the UTC date (we should never use local time except at 
> the user interface). I’m testing a patch right now.
> 
> B.
> 
>> On 30 Oct 2018, at 21:13, Ash Berlin-Taylor > <mailto:a...@firemirror.com>> wrote:
>> 
>> I think if we give croniter a tz-aware DT in the local tz it will deal with 
>> DST (i.e. will give 2:55 CEST followed by 2:00 CET) and then we convert it 
>> to UTC for return - but right now we are giving it a TZ-unaware local time.
>> 
>> I think.
>> 
>> Ash
>> 
>> On 30 October 2018 19:40:27 GMT, Bolke de Bruin > <mailto:bdbr...@gmail.com>> wrote:
>> I think we should use the UTC date for cron instead of the naive local date 
>> time. I will check if croniter implements this so we can rely on that.
>> 
>> B.
>> 
>> On 28 Oct 2018, at 02:09, Bolke de Bruin > <mailto:bdbr...@gmail.com>> wrote:
>> 
>> I wonder how to treat this:
>> 
>> This is what I think happens (need to verify more, but I am pretty sure) the 
>> specified DAG should run every 5 minutes. At DST change (3AM -> 2AM) we 
>> basically hit a schedule that we have already seen. 2AM -> 3AM has already 
>> happened. Obviously the intention is to run every 5 minutes. But what do we 
>> do with the execution_date? Is this still idempotent? Should we indeed 
>> reschedule? 
>> 
>> B.
>> 
>> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor > <mailto:a...@apache.org>> wrote:
>> 
>> I've done a bit more digging - the issue is in our tz-aware handling inside 
>> following_schedule (and previous schedule), causing it to loop.
>> 
>> This section of the croniter docs seems relevant 
>> https://github.com/kiorky/croniter#about-dst
>> 
>>   Be sure to init your croniter instance with a TZ aware datetime for this 
>> to work !:
>> local_date = tz.localize(datetime(2017, 3, 26))
>> val = croniter('0 0 * * *', local_date).get_next(datetime)
>> 
>> I think the problem is that we are _not_ passing a TZ aware dag in and we 
>> should be.
>> 
>> On 30 Oct 2018, at 17:35, Bolke de Bruin > <mailto:bdbr...@gmail.com>> wrote:
>> 
>> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
>> 
>> B.
>> 
>> Sent from my iPad
>> 
>> On 30 Oct 2018, at 18:25, Ash Berlin-Taylor <mailto:a...@apache.org> wrote:
>> 
>> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
>> 
>> last_run = dag.get_last_dagrun(session=session)
>> if last_run and next_run_date:
>>     while next_run_date <= last_run.execution_date:
>>         next_run_date = dag.following_schedule(next_run_date)
>> 
>> 
>> 
>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor > <mailto:a...@apache.org>> wrote:
>> 
>> Hi, kaczors on gitter has produced a minmal reproduction case: 
>> https://github.com/kaczors/airflow_1_10_tz_bug 
>> <https://github.com/kaczors/airflow_1_10_tz_bug>
>> 
>> Rough repro steps: In a VM, with time syncing disabled, and configured with 
>> system timezone of Europe/Zurich (or any other CEST one) run 
>> 
>> - `date 10280250.00`
>> - initdb, start scheduler, webserver, enable dag etc.
>> - `date 10280259.00`
>> - wait 5-10 mins for scheduler to catch up
>> - After the on-the-hour task run the scheduler will spin up another process 
>> to parse the dag... and it never returns.
>> 
>> I've only just managed to reproduce it, so haven't dug in to why yet. A 
>> quick hacky debug print shows something is stuck in an infinite loop.
>> 
>> -ash
>> 
>> On 29 Oct 2018, at 17:59, Bolke de Bruin > <mailto:bdbr...@gmail.com>> wrote:
>> 
>> Can this be confirmed? Then I can have a look at it. Preferably with dag 
>> definition code.
>> 
>> On the licensing requirements:
>> 
>> 1. Indeed licensing header for markdown documents. It was suggested to use 
>> html comments. I’m not sure how that renders with others like PDF though.
>> 2. The licensing notifications need to be tied to a specific version as 
>> licenses might change with versions.
>> 
>> Cheers
>> Bolke
>> 
>> Verstuurd vanaf mijn iPad
>> 
>> Op 29 okt. 2018 om 12:39 

Re: 1.10.1 Release?

2018-10-30 Thread Bolke de Bruin
We should just pass it the UTC date (we should never use local time except at 
the user interface). I’m testing a patch right now.

B.

> On 30 Oct 2018, at 21:13, Ash Berlin-Taylor  wrote:
> 
> I think if we give croniter a tz-aware DT in the local tz it will deal with 
> DST (i.e. will give 2:55 CEST followed by 2:00 CET) and then we convert it to 
> UTC for return - but right now we are giving it a TZ-unaware local time.
> 
> I think.
> 
> Ash
> 
> On 30 October 2018 19:40:27 GMT, Bolke de Bruin  wrote:
> I think we should use the UTC date for cron instead of the naive local date 
> time. I will check of croniter implements this so we can rely on that.
> 
> B.
> 
> On 28 Oct 2018, at 02:09, Bolke de Bruin  wrote:
> 
> I wonder how to treat this:
> 
> This is what I think happens (need to verify more, but I am pretty sure) the 
> specified DAG should run every 5 minutes. At DST change (3AM -> 2AM) we 
> basically hit a schedule that we have already seen. 2AM -> 3AM has already 
> happened. Obviously the intention is to run every 5 minutes. But what do we 
> do with the execution_date? Is this still idempotent? Should we indeed 
> reschedule? 
> 
> B.
> 
> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
> 
> I've done a bit more digging - the issue is of our tz-aware handling inside 
> following_schedule (and previous schedule) - causing it to loop.
> 
> This section of the croniter docs seems relevant 
> https://github.com/kiorky/croniter#about-dst 
> <https://github.com/kiorky/croniter#about-dst>
> 
>   Be sure to init your croniter instance with a TZ aware datetime for this to 
> work !:
> local_date = tz.localize(datetime(2017, 3, 26))
> val = croniter('0 0 * * *', local_date).get_next(datetime)
> 
> I think the problem is that we are _not_ passing a TZ aware dag in and we 
> should be.
> 
> On 30 Oct 2018, at 17:35, Bolke de Bruin  wrote:
> 
> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
> 
> B.
> 
> Verstuurd vanaf mijn iPad
> 
> Op 30 okt. 2018 om 18:25 heeft Ash Berlin-Taylor  het 
> volgende geschreven:
> 
> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
> 
> last_run = dag.get_last_dagrun(session=session)
> if last_run and next_run_date:
> while next_run_date <= last_run.execution_date:
> next_run_date = dag.following_schedule(next_run_date)
> 
> 
> 
> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
> 
> Hi, kaczors on gitter has produced a minmal reproduction case: 
> https://github.com/kaczors/airflow_1_10_tz_bug 
> <https://github.com/kaczors/airflow_1_10_tz_bug>
> 
> Rough repro steps: In a VM, with time syncing disabled, and configured with 
> system timezone of Europe/Zurich (or any other CEST one) run 
> 
> - `date 10280250.00`
> - initdb, start scheduler, webserver, enable dag etc.
> - `date 10280259.00`
> - wait 5-10 mins for scheduler to catch up
> - After the on-the-hour task run the scheduler will spin up another process 
> to parse the dag... and it never returns.
> 
> I've only just managed to reproduce it, so haven't dug in to why yet. A quick 
> hacky debug print shows something is stuck in an infinite loop.
> 
> -ash
> 
> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
> 
> Can this be confirmed? Then I can have a look at it. Preferably with dag 
> definition code.
> 
> On the licensing requirements:
> 
> 1. Indeed licensing header for markdown documents. It was suggested to use 
> html comments. I’m not sure how that renders with others like PDF though.
> 2. The licensing notifications need to be tied to a specific version as 
> licenses might change with versions.
> 
> Cheers
> Bolke
> 
> Verstuurd vanaf mijn iPad
> 
> Op 29 okt. 2018 om 12:39 heeft Ash Berlin-Taylor  het 
> volgende geschreven:
> 
> I was going to make a start on the release, but two people have reported that 
> there might be an issue around non-UTC dags and the scheduler changing over 
> from Summer time.
> 
> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange issue 
> : we have hourly DAGs with a start_date in a local timezone (not UTC) and 
> since (Sunday) the last winter time change they don’t run anymore. Any idea ?
> 09:41  it impacted all our DAG that had a run at 3am 
> (Europe/Paris), the exact time of winter time change :(
> 
> I am going to take a look at this today and see if I can get to the bottom of 
> it.
> 
> Bolke: are there any outstanding tasks/issues that you know of that might 
> slow down the vote for a 1.10.1? (i.e. did we sort of out all the licensing 
> issues that 

Re: 1.10.1 Release?

2018-10-30 Thread Bolke de Bruin
I think we should use the UTC date for cron instead of the naive local date 
time. I will check of croniter implements this so we can rely on that.

B.

> On 28 Oct 2018, at 02:09, Bolke de Bruin  wrote:
> 
> I wonder how to treat this:
> 
> This is what I think happens (need to verify more, but I am pretty sure) the 
> specified DAG should run every 5 minutes. At DST change (3AM -> 2AM) we 
> basically hit a schedule that we have already seen. 2AM -> 3AM has already 
> happened. Obviously the intention is to run every 5 minutes. But what do we 
> do with the execution_date? Is this still idempotent? Should we indeed 
> reschedule? 
> 
> B.
> 
>> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
>> 
>> I've done a bit more digging - the issue is of our tz-aware handling inside 
>> following_schedule (and previous schedule) - causing it to loop.
>> 
>> This section of the croniter docs seems relevant 
>> https://github.com/kiorky/croniter#about-dst
>> 
>>   Be sure to init your croniter instance with a TZ aware datetime for this 
>> to work !:
>>>>> local_date = tz.localize(datetime(2017, 3, 26))
>>>>> val = croniter('0 0 * * *', local_date).get_next(datetime)
>> 
>> I think the problem is that we are _not_ passing a TZ aware dag in and we 
>> should be.
>> 
>>> On 30 Oct 2018, at 17:35, Bolke de Bruin  wrote:
>>> 
>>> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
>>> 
>>> B.
>>> 
>>> Verstuurd vanaf mijn iPad
>>> 
>>>> Op 30 okt. 2018 om 18:25 heeft Ash Berlin-Taylor  het 
>>>> volgende geschreven:
>>>> 
>>>> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
>>>> 
>>>> last_run = dag.get_last_dagrun(session=session)
>>>> if last_run and next_run_date:
>>>> while next_run_date <= last_run.execution_date:
>>>> next_run_date = dag.following_schedule(next_run_date)
>>>> 
>>>> 
>>>> 
>>>>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
>>>>> 
>>>>> Hi, kaczors on gitter has produced a minmal reproduction case: 
>>>>> https://github.com/kaczors/airflow_1_10_tz_bug
>>>>> 
>>>>> Rough repro steps: In a VM, with time syncing disabled, and configured 
>>>>> with system timezone of Europe/Zurich (or any other CEST one) run 
>>>>> 
>>>>> - `date 10280250.00`
>>>>> - initdb, start scheduler, webserver, enable dag etc.
>>>>> - `date 10280259.00`
>>>>> - wait 5-10 mins for scheduler to catch up
>>>>> - After the on-the-hour task run the scheduler will spin up another 
>>>>> process to parse the dag... and it never returns.
>>>>> 
>>>>> I've only just managed to reproduce it, so haven't dug in to why yet. A 
>>>>> quick hacky debug print shows something is stuck in an infinite loop.
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
>>>>>> 
>>>>>> Can this be confirmed? Then I can have a look at it. Preferably with dag 
>>>>>> definition code.
>>>>>> 
>>>>>> On the licensing requirements:
>>>>>> 
>>>>>> 1. Indeed licensing header for markdown documents. It was suggested to 
>>>>>> use html comments. I’m not sure how that renders with others like PDF 
>>>>>> though.
>>>>>> 2. The licensing notifications need to be tied to a specific version as 
>>>>>> licenses might change with versions.
>>>>>> 
>>>>>> Cheers
>>>>>> Bolke
>>>>>> 
>>>>>> Verstuurd vanaf mijn iPad
>>>>>> 
>>>>>>> Op 29 okt. 2018 om 12:39 heeft Ash Berlin-Taylor  het 
>>>>>>> volgende geschreven:
>>>>>>> 
>>>>>>> I was going to make a start on the release, but two people have 
>>>>>>> reported that there might be an issue around non-UTC dags and the 
>>>>>>> scheduler changing over from Summer time.
>>>>>>> 
>>>>>>>> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange 
>>>>>>>> issue : we have hourly DAGs with a start_date in

Re: 1.10.1 Release?

2018-10-30 Thread Bolke de Bruin
I think DAGs will start running by themselves again as soon as the interval has 
passed that it has already seen. So depending your schedule this can take a 
while.

B.

> On 30 Oct 2018, at 19:53, a...@apache.org wrote:
> 
> 1.10.1 isn't out yet, so 1.10.0 :)
> 
> I think this only affects days with a schedule interval that is one hour or 
> more frequently, and where the dag timezone is not UTC.
> 
> For anyone stuck on this in prod I think the fix is to manually create (via 
> backfill or trigger) a dag run after the tz change over time. This should 
> unblock the scheduler.
> 
> Ash
> 
> On 30 October 2018 18:43:30 GMT, David Klosowski  wrote:
>> Hi Airflow Devs:
>> 
>> Is this timezone issue in Airflow version 1.10.0 or only in 1.10.1?
>> 
>> Thanks.
>> 
>> Regards,
>> David
>> 
>> On Tue, Oct 30, 2018 at 11:11 AM Bolke de Bruin 
>> wrote:
>> 
>>> we specifically remove timezone info to determine the next schedule.
>> Ie.
>>> cron sets exact date times so tz info should not make sense. I’m
>> going to
>>> have a look now.
>>> 
>>> 
>>>> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
>>>> 
>>>> I've done a bit more digging - the issue is of our tz-aware
>> handling
>>> inside following_schedule (and previous schedule) - causing it to
>> loop.
>>>> 
>>>> This section of the croniter docs seems relevant
>>> https://github.com/kiorky/croniter#about-dst
>>>> 
>>>>   Be sure to init your croniter instance with a TZ aware datetime
>> for
>>> this to work !:
>>>>>>> local_date = tz.localize(datetime(2017, 3, 26))
>>>>>>> val = croniter('0 0 * * *', local_date).get_next(datetime)
>>>> 
>>>> I think the problem is that we are _not_ passing a TZ aware dag in
>> and
>>> we should be.
>>>> 
>>>>> On 30 Oct 2018, at 17:35, Bolke de Bruin 
>> wrote:
>>>>> 
>>>>> Oh that’s a great environment to start digging. Thanks. I’ll have
>> a
>>> look.
>>>>> 
>>>>> B.
>>>>> 
>>>>> Verstuurd vanaf mijn iPad
>>>>> 
>>>>>> Op 30 okt. 2018 om 18:25 heeft Ash Berlin-Taylor 
>> het
>>> volgende geschreven:
>>>>>> 
>>>>>> This line in airflow.jobs (line 874 in my checkout) is causing
>> the
>>> loop:
>>>>>> 
>>>>>> last_run = dag.get_last_dagrun(session=session)
>>>>>> if last_run and next_run_date:
>>>>>> while next_run_date <= last_run.execution_date:
>>>>>> next_run_date =
>> dag.following_schedule(next_run_date)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor 
>> wrote:
>>>>>>> 
>>>>>>> Hi, kaczors on gitter has produced a minmal reproduction case:
>>> https://github.com/kaczors/airflow_1_10_tz_bug
>>>>>>> 
>>>>>>> Rough repro steps: In a VM, with time syncing disabled, and
>>> configured with system timezone of Europe/Zurich (or any other CEST
>> one)
>>> run
>>>>>>> 
>>>>>>> - `date 10280250.00`
>>>>>>> - initdb, start scheduler, webserver, enable dag etc.
>>>>>>> - `date 10280259.00`
>>>>>>> - wait 5-10 mins for scheduler to catch up
>>>>>>> - After the on-the-hour task run the scheduler will spin up
>> another
>>> process to parse the dag... and it never returns.
>>>>>>> 
>>>>>>> I've only just managed to reproduce it, so haven't dug in to why
>> yet.
>>> A quick hacky debug print shows something is stuck in an infinite
>> loop.
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 29 Oct 2018, at 17:59, Bolke de Bruin 
>> wrote:
>>>>>>>> 
>>>>>>>> Can this be confirmed? Then I can have a look at it. Preferably
>> with
>>> dag definition code.
>>>>>>>> 
>>>>>>>> On the licensing requirements:
>>>>>>>> 
>>>>>>>> 1. Indeed licensing header for markdown documents. It was
>> suggested
>>> to use

Re: 1.10.1 Release?

2018-10-30 Thread Bolke de Bruin
I wonder how to treat this:

This is what I think happens (need to verify more, but I am pretty sure) the 
specified DAG should run every 5 minutes. At DST change (3AM -> 2AM) we 
basically hit a schedule that we have already seen. 2AM -> 3AM has already 
happened. Obviously the intention is to run every 5 minutes. But what do we do 
with the execution_date? Is this still idempotent? Should we indeed reschedule? 

B.

> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
> 
> I've done a bit more digging - the issue is of our tz-aware handling inside 
> following_schedule (and previous schedule) - causing it to loop.
> 
> This section of the croniter docs seems relevant 
> https://github.com/kiorky/croniter#about-dst
> 
>Be sure to init your croniter instance with a TZ aware datetime for this 
> to work !:
>>>> local_date = tz.localize(datetime(2017, 3, 26))
>>>> val = croniter('0 0 * * *', local_date).get_next(datetime)
> 
> I think the problem is that we are _not_ passing a TZ aware dag in and we 
> should be.
> 
>> On 30 Oct 2018, at 17:35, Bolke de Bruin  wrote:
>> 
>> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
>> 
>> B.
>> 
>> Verstuurd vanaf mijn iPad
>> 
>>> Op 30 okt. 2018 om 18:25 heeft Ash Berlin-Taylor  het 
>>> volgende geschreven:
>>> 
>>> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
>>> 
>>>  last_run = dag.get_last_dagrun(session=session)
>>>  if last_run and next_run_date:
>>>  while next_run_date <= last_run.execution_date:
>>>  next_run_date = dag.following_schedule(next_run_date)
>>> 
>>> 
>>> 
>>>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
>>>> 
>>>> Hi, kaczors on gitter has produced a minmal reproduction case: 
>>>> https://github.com/kaczors/airflow_1_10_tz_bug
>>>> 
>>>> Rough repro steps: In a VM, with time syncing disabled, and configured 
>>>> with system timezone of Europe/Zurich (or any other CEST one) run 
>>>> 
>>>> - `date 10280250.00`
>>>> - initdb, start scheduler, webserver, enable dag etc.
>>>> - `date 10280259.00`
>>>> - wait 5-10 mins for scheduler to catch up
>>>> - After the on-the-hour task run the scheduler will spin up another 
>>>> process to parse the dag... and it never returns.
>>>> 
>>>> I've only just managed to reproduce it, so haven't dug in to why yet. A 
>>>> quick hacky debug print shows something is stuck in an infinite loop.
>>>> 
>>>> -ash
>>>> 
>>>>> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
>>>>> 
>>>>> Can this be confirmed? Then I can have a look at it. Preferably with dag 
>>>>> definition code.
>>>>> 
>>>>> On the licensing requirements:
>>>>> 
>>>>> 1. Indeed licensing header for markdown documents. It was suggested to 
>>>>> use html comments. I’m not sure how that renders with others like PDF 
>>>>> though.
>>>>> 2. The licensing notifications need to be tied to a specific version as 
>>>>> licenses might change with versions.
>>>>> 
>>>>> Cheers
>>>>> Bolke
>>>>> 
>>>>> Verstuurd vanaf mijn iPad
>>>>> 
>>>>>> Op 29 okt. 2018 om 12:39 heeft Ash Berlin-Taylor  het 
>>>>>> volgende geschreven:
>>>>>> 
>>>>>> I was going to make a start on the release, but two people have reported 
>>>>>> that there might be an issue around non-UTC dags and the scheduler 
>>>>>> changing over from Summer time.
>>>>>> 
>>>>>>> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange 
>>>>>>> issue : we have hourly DAGs with a start_date in a local timezone (not 
>>>>>>> UTC) and since (Sunday) the last winter time change they don’t run 
>>>>>>> anymore. Any idea ?
>>>>>>> 09:41  it impacted all our DAG that had a run at 3am 
>>>>>>> (Europe/Paris), the exact time of winter time change :(
>>>>>> 
>>>>>> I am going to take a look at this today and see if I can get to the 
>>>>>> bottom of it.
>>>>>> 
>>>>>> Bolke: are there any outstanding tasks/issues that y

Re: 1.10.1 Release?

2018-10-30 Thread Bolke de Bruin
we specifically remove timezone info to determine the next schedule. Ie. cron 
sets exact date times so tz info should not make sense. I’m going to have a 
look now.


> On 30 Oct 2018, at 19:01, Ash Berlin-Taylor  wrote:
> 
> I've done a bit more digging - the issue is of our tz-aware handling inside 
> following_schedule (and previous schedule) - causing it to loop.
> 
> This section of the croniter docs seems relevant 
> https://github.com/kiorky/croniter#about-dst
> 
>Be sure to init your croniter instance with a TZ aware datetime for this 
> to work !:
>>>> local_date = tz.localize(datetime(2017, 3, 26))
>>>> val = croniter('0 0 * * *', local_date).get_next(datetime)
> 
> I think the problem is that we are _not_ passing a TZ aware dag in and we 
> should be.
> 
>> On 30 Oct 2018, at 17:35, Bolke de Bruin  wrote:
>> 
>> Oh that’s a great environment to start digging. Thanks. I’ll have a look.
>> 
>> B.
>> 
>> Verstuurd vanaf mijn iPad
>> 
>>> Op 30 okt. 2018 om 18:25 heeft Ash Berlin-Taylor  het 
>>> volgende geschreven:
>>> 
>>> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
>>> 
>>>  last_run = dag.get_last_dagrun(session=session)
>>>  if last_run and next_run_date:
>>>  while next_run_date <= last_run.execution_date:
>>>  next_run_date = dag.following_schedule(next_run_date)
>>> 
>>> 
>>> 
>>>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
>>>> 
>>>> Hi, kaczors on gitter has produced a minmal reproduction case: 
>>>> https://github.com/kaczors/airflow_1_10_tz_bug
>>>> 
>>>> Rough repro steps: In a VM, with time syncing disabled, and configured 
>>>> with system timezone of Europe/Zurich (or any other CEST one) run 
>>>> 
>>>> - `date 10280250.00`
>>>> - initdb, start scheduler, webserver, enable dag etc.
>>>> - `date 10280259.00`
>>>> - wait 5-10 mins for scheduler to catch up
>>>> - After the on-the-hour task run the scheduler will spin up another 
>>>> process to parse the dag... and it never returns.
>>>> 
>>>> I've only just managed to reproduce it, so haven't dug in to why yet. A 
>>>> quick hacky debug print shows something is stuck in an infinite loop.
>>>> 
>>>> -ash
>>>> 
>>>>> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
>>>>> 
>>>>> Can this be confirmed? Then I can have a look at it. Preferably with dag 
>>>>> definition code.
>>>>> 
>>>>> On the licensing requirements:
>>>>> 
>>>>> 1. Indeed licensing header for markdown documents. It was suggested to 
>>>>> use html comments. I’m not sure how that renders with others like PDF 
>>>>> though.
>>>>> 2. The licensing notifications need to be tied to a specific version as 
>>>>> licenses might change with versions.
>>>>> 
>>>>> Cheers
>>>>> Bolke
>>>>> 
>>>>> Verstuurd vanaf mijn iPad
>>>>> 
>>>>>> Op 29 okt. 2018 om 12:39 heeft Ash Berlin-Taylor  het 
>>>>>> volgende geschreven:
>>>>>> 
>>>>>> I was going to make a start on the release, but two people have reported 
>>>>>> that there might be an issue around non-UTC dags and the scheduler 
>>>>>> changing over from Summer time.
>>>>>> 
>>>>>>> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange 
>>>>>>> issue : we have hourly DAGs with a start_date in a local timezone (not 
>>>>>>> UTC) and since (Sunday) the last winter time change they don’t run 
>>>>>>> anymore. Any idea ?
>>>>>>> 09:41  it impacted all our DAG that had a run at 3am 
>>>>>>> (Europe/Paris), the exact time of winter time change :(
>>>>>> 
>>>>>> I am going to take a look at this today and see if I can get to the 
>>>>>> bottom of it.
>>>>>> 
>>>>>> Bolke: are there any outstanding tasks/issues that you know of that 
>>>>>> might slow down the vote for a 1.10.1? (i.e. did we sort of out all the 
>>>>>> licensing issues that were asked of us? I thought I read something about 
>>>>>> license declaration

Re: 1.10.1 Release?

2018-10-30 Thread Bolke de Bruin
Oh that’s a great environment to start digging. Thanks. I’ll have a look.

B.

Verstuurd vanaf mijn iPad

> Op 30 okt. 2018 om 18:25 heeft Ash Berlin-Taylor  het 
> volgende geschreven:
> 
> This line in airflow.jobs (line 874 in my checkout) is causing the loop:
> 
>last_run = dag.get_last_dagrun(session=session)
>if last_run and next_run_date:
>while next_run_date <= last_run.execution_date:
>next_run_date = dag.following_schedule(next_run_date)
> 
> 
> 
>> On 30 Oct 2018, at 17:20, Ash Berlin-Taylor  wrote:
>> 
>> Hi, kaczors on gitter has produced a minmal reproduction case: 
>> https://github.com/kaczors/airflow_1_10_tz_bug
>> 
>> Rough repro steps: In a VM, with time syncing disabled, and configured with 
>> system timezone of Europe/Zurich (or any other CEST one) run 
>> 
>> - `date 10280250.00`
>> - initdb, start scheduler, webserver, enable dag etc.
>> - `date 10280259.00`
>> - wait 5-10 mins for scheduler to catch up
>> - After the on-the-hour task run the scheduler will spin up another process 
>> to parse the dag... and it never returns.
>> 
>> I've only just managed to reproduce it, so haven't dug in to why yet. A 
>> quick hacky debug print shows something is stuck in an infinite loop.
>> 
>> -ash
>> 
>>> On 29 Oct 2018, at 17:59, Bolke de Bruin  wrote:
>>> 
>>> Can this be confirmed? Then I can have a look at it. Preferably with dag 
>>> definition code.
>>> 
>>> On the licensing requirements:
>>> 
>>> 1. Indeed licensing header for markdown documents. It was suggested to use 
>>> html comments. I’m not sure how that renders with others like PDF though.
>>> 2. The licensing notifications need to be tied to a specific version as 
>>> licenses might change with versions.
>>> 
>>> Cheers
>>> Bolke
>>> 
>>> Verstuurd vanaf mijn iPad
>>> 
>>>> Op 29 okt. 2018 om 12:39 heeft Ash Berlin-Taylor  het 
>>>> volgende geschreven:
>>>> 
>>>> I was going to make a start on the release, but two people have reported 
>>>> that there might be an issue around non-UTC dags and the scheduler 
>>>> changing over from Summer time.
>>>> 
>>>>> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange 
>>>>> issue : we have hourly DAGs with a start_date in a local timezone (not 
>>>>> UTC) and since (Sunday) the last winter time change they don’t run 
>>>>> anymore. Any idea ?
>>>>> 09:41  it impacted all our DAG that had a run at 3am 
>>>>> (Europe/Paris), the exact time of winter time change :(
>>>> 
>>>> I am going to take a look at this today and see if I can get to the bottom 
>>>> of it.
>>>> 
>>>> Bolke: are there any outstanding tasks/issues that you know of that might 
>>>> slow down the vote for a 1.10.1? (i.e. did we sort of out all the 
>>>> licensing issues that were asked of us? I thought I read something about 
>>>> license declarations in markdown files?)
>>>> 
>>>> -ash
>>>> 
>>>>> On 28 Oct 2018, at 14:46, Bolke de Bruin  wrote:
>>>>> 
>>>>> I agree with that, but I would favor time based releases instead. We are 
>>>>> again at the point that a release takes so much time that the gap is 
>>>>> getting really big again. @ash why not start releasing now and move the 
>>>>> remainder to 1.10.2? I dont think there are real blockers (although we 
>>>>> might find them).
>>>>> 
>>>>> 
>>>>>> On 28 Oct 2018, at 15:35, airflowuser 
>>>>>>  wrote:
>>>>>> 
>>>>>> I was really hoping that 
>>>>>> https://github.com/apache/incubator-airflow/pull/4069 will be merged 
>>>>>> into 1.10.1
>>>>>> Deleting dags was a highly requested feature for 1.10 - this can fix the 
>>>>>> problem with it.
>>>>>> 
>>>>>> 
>>>>>> ‐‐‐ Original Message ‐‐‐
>>>>>>>> On Friday, October 26, 2018 6:12 PM, Bolke de Bruin 
>>>>>>>>  wrote:
>>>>>>> 
>>>>>>> Hey Ash,
>>>>>>> 
>>>>>>> I was wondering if you are picking up the 1.10.1 release? Master is 
>>>>>>> speeding ahead and you were tracking fixes for 1.10.1 right?
>>>>>>> 
>>>>>>> B.
>>>> 
>> 
> 


Re: explicit_defaults_for_timestamp for mysql

2018-10-29 Thread Bolke de Bruin
I’m ok with it if you say you are running it (prod?). We don’t use MySQL so I 
cannot vouch for it.

Cheers
Bolke

> On 29 Oct 2018, at 23:14, Feng Lu  wrote:
> 
> I haven't tested the part where database tables are created with one flag but 
> accessed under a different flag, the changes have been working for us so far. 
> 
> On Tue, Oct 23, 2018 at 10:09 PM Bolke de Bruin  <mailto:bdbr...@gmail.com>> wrote:
> We only need it at table creation time or alter table time during which an 
> alembic script would fail if MySQL restarts I assume?
> 
> I'm not sure if the PR in this way is required (but if it works and works 
> well it's okay to me too just like consistency across DBS and no surprises 
> with MySQL )
> 
> Sent from my iPhone
> 
> On 24 Oct 2018, at 05:18, Feng Lu  <mailto:fen...@google.com>> wrote:
> 
>> Sorry for the late reply. 
>> GCP (CloudSQL) does support setting this parameter at session level but the 
>> VM used to host the mysqld might be restarted at any time, so it can't be 
>> done reliably. 
>> 
>> Haotian (cc-ed) in my team has looked into the needed schema changes to make 
>> Airflow 1.10 timestamp support to work with mysql without setting the 
>> exlicit_defaults_for_timestamp flag in mysql, details below: 
>> 
>> @@ -40,10 +40,6 @@
>>  conn = op.get_bind()
>>  if conn.dialect.name <http://conn.dialect.name/> == 'mysql':
>>  conn.execute("SET time_zone = '+00:00'")
>> -cur = conn.execute("SELECT @@explicit_defaults_for_timestamp")
>> -res = cur.fetchall()
>> -if res[0][0] == 0:
>> -raise Exception("Global variable 
>> explicit_defaults_for_timestamp needs to be on (1) for mysql")
>> 
>>  op.alter_column(table_name='chart', column_name='last_modified', 
>> type_=mysql.TIMESTAMP(fsp=6))
>> 
>> @@ -69,20 +65,28 @@
>>  op.alter_column(table_name='log', column_name='dttm', 
>> type_=mysql.TIMESTAMP(fsp=6))
>>  op.alter_column(table_name='log', column_name='execution_date', 
>> type_=mysql.TIMESTAMP(fsp=6))
>> 
>> -op.alter_column(table_name='sla_miss', 
>> column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), nullable=False)
>> +op.alter_column(table_name='sla_miss', 
>> column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), \
>> +nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
>>  op.alter_column(table_name='sla_miss', column_name='timestamp', 
>> type_=mysql.TIMESTAMP(fsp=6))
>> 
>> -op.alter_column(table_name='task_fail', 
>> column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6))
>> +op.alter_column(table_name='task_fail', 
>> column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), \
>> +nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
>>  op.alter_column(table_name='task_fail', column_name='start_date', 
>> type_=mysql.TIMESTAMP(fsp=6))
>>  op.alter_column(table_name='task_fail', column_name='end_date', 
>> type_=mysql.TIMESTAMP(fsp=6))
>> 
>> -op.alter_column(table_name='task_instance', 
>> column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), nullable=False)
>> +op.alter_column(table_name='task_instance', 
>> column_name='execution_date', type_=mysql.TIMESTAMP(fsp=6), \
>> +nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
>>  op.alter_column(table_name='task_instance', 
>> column_name='start_date', type_=mysql.TIMESTAMP(fsp=6))
>>  op.alter_column(table_name='task_instance', column_name='end_date', 
>> type_=mysql.TIMESTAMP(fsp=6))
>>  op.alter_column(table_name='task_instance', 
>> column_name='queued_dttm', type_=mysql.TIMESTAMP(fsp=6))
>> 
>> -op.alter_column(table_name='xcom', column_name='timestamp', 
>> type_=mysql.TIMESTAMP(fsp=6))
>> -op.alter_column(table_name='xcom', column_name='execution_date', 
>> type_=mysql.TIMESTAMP(fsp=6))
>> +op.alter_column(table_name='xcom', column_name='timestamp', 
>> type_=mysql.TIMESTAMP(fsp=6), \
>> +nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
>> +op.alter_column(table_name='xcom', column_name='execution_date', 
>> type_=mysql.TIMESTAMP(fsp=6), \
>> +nullable=False, server_default=sa.text('CURRENT_TIMESTAMP(6)'))
>> +conn.execute("alter table task_instance alter column execution_date 
>> drop default")
>> +conn.execute("alter table sla_

Re: 1.10.1 Release?

2018-10-29 Thread Bolke de Bruin
Can this be confirmed? Then I can have a look at it. Preferably with dag 
definition code.

On the licensing requirements:

1. Indeed licensing header for markdown documents. It was suggested to use html 
comments. I’m not sure how that renders with others like PDF though.
2. The licensing notifications need to be tied to a specific version as 
licenses might change with versions.

Cheers
Bolke

Verstuurd vanaf mijn iPad

> Op 29 okt. 2018 om 12:39 heeft Ash Berlin-Taylor  het 
> volgende geschreven:
> 
> I was going to make a start on the release, but two people have reported that 
> there might be an issue around non-UTC dags and the scheduler changing over 
> from Summer time.
> 
>> 08:45 Emmanuel> Hi there, we are currently experiencing a very strange issue 
>> : we have hourly DAGs with a start_date in a local timezone (not UTC) and 
>> since (Sunday) the last winter time change they don’t run anymore. Any idea ?
>> 09:41  it impacted all our DAG that had a run at 3am 
>> (Europe/Paris), the exact time of winter time change :(
> 
> I am going to take a look at this today and see if I can get to the bottom of 
> it.
> 
> Bolke: are there any outstanding tasks/issues that you know of that might 
> slow down the vote for a 1.10.1? (i.e. did we sort of out all the licensing 
> issues that were asked of us? I thought I read something about license 
> declarations in markdown files?)
> 
> -ash
> 
>> On 28 Oct 2018, at 14:46, Bolke de Bruin  wrote:
>> 
>> I agree with that, but I would favor time based releases instead. We are 
>> again at the point that a release takes so much time that the gap is getting 
>> really big again. @ash why not start releasing now and move the remainder to 
>> 1.10.2? I dont think there are real blockers (although we might find them).
>> 
>> 
>>> On 28 Oct 2018, at 15:35, airflowuser  
>>> wrote:
>>> 
>>> I was really hoping that 
>>> https://github.com/apache/incubator-airflow/pull/4069 will be merged into 
>>> 1.10.1
>>> Deleting dags was a highly requested feature for 1.10 - this can fix the 
>>> problem with it.
>>> 
>>> 
>>> ‐‐‐ Original Message ‐‐‐
>>>>> On Friday, October 26, 2018 6:12 PM, Bolke de Bruin  
>>>>> wrote:
>>>> 
>>>> Hey Ash,
>>>> 
>>>> I was wondering if you are picking up the 1.10.1 release? Master is 
>>>> speeding ahead and you were tracking fixes for 1.10.1 right?
>>>> 
>>>> B.
> 


Re: 1.10.1 Release?

2018-10-28 Thread Bolke de Bruin
I agree with that, but I would favor time based releases instead. We are again 
at the point that a release takes so much time that the gap is getting really 
big again. @ash why not start releasing now and move the remainder to 1.10.2? I 
dont think there are real blockers (although we might find them).


> On 28 Oct 2018, at 15:35, airflowuser  
> wrote:
> 
> I was really hoping that 
> https://github.com/apache/incubator-airflow/pull/4069 will be merged into 
> 1.10.1
> Deleting dags was a highly requested feature for 1.10 - this can fix the 
> problem with it.
> 
> 
> ‐‐‐ Original Message ‐‐‐
> On Friday, October 26, 2018 6:12 PM, Bolke de Bruin  wrote:
> 
>> Hey Ash,
>> 
>> I was wondering if you are picking up the 1.10.1 release? Master is speeding 
>> ahead and you were tracking fixes for 1.10.1 right?
>> 
>> B.
> 
> 



1.10.1 Release?

2018-10-26 Thread Bolke de Bruin
Hey Ash,

I was wondering if you are picking up the 1.10.1 release? Master is speeding 
ahead and you were tracking fixes for 1.10.1 right?

B.

Re: explicit_defaults_for_timestamp for mysql

2018-10-19 Thread Bolke de Bruin
O and reading 

https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_explicit_defaults_for_timestamp
 


indicates that it can be set on the session level as well. So we could just 
change the alembic scripts do try it. However
MariaDB does not support it in a session so we always need to check the 
variable. We will also need to set it at *every* 
alembic script that deals with datetimes in the future. Nevertheless this might 
be the easiest solution.

Does GCP’s MySQL also allow this setting in the session scope? 

B.

> On 19 Oct 2018, at 18:48, Deng Xiaodong  wrote:
> 
> I'm ok to test this.
> 
> @ash, may you kindly give some examples of what exact behaviour the testers
> should pay attention to? Since people like me may not know the full
> background of having introduced this restriction & check, or what issue it
> was trying to address.
> 
> @Feng Lu, may you please advise if you are still interested to prepare this
> PR?
> 
> Thanks!
> 
> 
> XD
> 
> On Sat, Oct 20, 2018 at 12:38 AM Ash Berlin-Taylor  wrote:
> 
>> This sounds sensible and would mean we could also run on GCP's MySQL
>> offering too.
>> 
>> This would need someone to try out and check that timezones behave
>> sensibly with this change made.
>> 
>> Any volunteers?
>> 
>> -ash
>> 
>>> On 19 Oct 2018, at 17:32, Deng Xiaodong  wrote:
>>> 
>>> Wondering if there is any further thoughts about this proposal kindly
>> raised by Feng Lu earlier?
>>> 
>>> If we can skip this check & allow explicit_defaults_for_timestamp to be
>> 0, it would be helpful, especially for enterprise users in whose
>> environments it’s really hard to ask for a database global variable change
>> (like myself…).
>>> 
>>> 
>>> XD
>>> 
>>> On 2018/08/28 15:23:10, Feng Lu  wrote:
 Bolke, a gentle ping..>
 Thank you.>
 
 On Thu, Aug 23, 2018, 23:01 Feng Lu  wrote:>
 
> Hi all,>
>> 
> After reading the MySQL documentation on the>
> exlicit_defaults_for_timestamp, it appears that we can skip the check
>> on explicit_defaults_for_timestamp>
> = 1>
> <
>> https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py#L43>
>> by>
> setting the column to accept NULL explicitly. For example:>
>> 
> op.alter_column(table_name='chart', column_name='last_modified',>
> type_=mysql.TIMESTAMP(fsp=6)) -->>
> op.alter_column(table_name='chart', column_name='last_modified',>
> type_=mysql.TIMESTAMP(fsp=6), nullable=True)>
>> 
> Here's why:>
> From MySQL doc (when explicit_defaults_for_timestamp is set to True):>
> "TIMESTAMP columns not explicitly declared with the NOT NULL attribute
>> are>
> automatically declared with the NULL attribute and permit NULL
>> values.>
> Assigning such a column a value of NULL sets it to NULL, not the
>> current>
> timestamp.">
>> 
> Thanks and happy to shoot a PR if it makes sense.>
>> 
> Feng>
>> 
>> 
>> 
>> 
>> 



Re: execution_date - can we stop the confusion?

2018-09-26 Thread Bolke de Bruin
I dont think this makes sense and I dont that think anyone had a real issue 
with this. Execution date has been clearly documented  and is part of the core 
principles of airflow. Renaming will create more confusion.  

Please note that I do think that as an anonymous user you cannot speak for any 
"new airflow user". That is a contradiction to me. 

Thanks
Bolke

Sent from my iPhone

> On 26 Sep 2018, at 07:59, airflowuser  
> wrote:
> 
> One of the most annoying, hard to understand and against all common sense is 
> the execution_date behavior. I assume that any new Airflow user has been 
> struggling with it.
> The amount of questions with answers referring to : 
> https://airflow.apache.org/scheduler.html?scheduling-triggers  is uncountable.
> 
> Most people mistakenly think that execution_date is the datetime which the 
> DAG started to run.
> 
> I suggest the following changes:
> 1. Renaming the execution_date to something else like: run_stamped   This 
> name won't cause people to get confused.
> 2. Adding a new variable which indicated the actual datetime when the DAG run 
> was generated. call it execution_start_date. People seem to want the 
> information when the DAG actually started to be executed/run.
> 
> This is only naming changes. No need to actual change the behavior - This 
> will only make things simpler as when user encounter  run_stamped  he won't 
> be confused by the name like execution_date


Re: Airflow: Apache Graduation

2018-09-21 Thread Bolke de Bruin
I love champions 

Op vr 21 sep. 2018 14:01 schreef Jakob Homan :

> The big list that *has* to be filled out and correct is here:
> http://incubator.apache.org/projects/airflow.html
>
> This is usually done by the Champion, Chris Riccomini.  Chris  - are
> you still interested to do this?
>
> After that and the maturity questionnaire are in good shape, it's:
> 1) Graduation DISCUSS by the community.  This thread can cover that,
> although an explicit DISCUSS would help too.  Example:
>
> https://mail-archives.apache.org/mod_mbox/samza-dev/201412.mbox/%3CCADiKvVtBOPaji22Nd3927R6Tj%2BmWpxnHH%2Bk%3DyXCQEMGf_CnAqw%40mail.gmail.com%3E
> 2) Graduation VOTE by the community on a resolution for the new top
> level project (TLP). Example:
>
> https://mail-archives.apache.org/mod_mbox/samza-dev/201412.mbox/%3CCADiKvVt_LK9%2BBXC8eR_gb3ALOU2KFJqDDyZOLao7rm0XjuZmcg%40mail.gmail.com%3E
> - most of the text is boiler plate, but be sure not leave anything in
> there that shouldn't be (Jakob said with experience...)
> 3) Exact same text is VOTEd on by the IPMC
> 4) Champion adds the resolution to the Board's next monthly agenda,
> where the Board approves it.  Project is officially TLP once that is
> done.
> 5) Champion cleans up incubator resources, creates new TLP resources.
> Example: https://jira.apache.org/jira/browse/INFRA-10639
>
> -Jakob
>
>
> On 21 September 2018 at 13:49, Bolke de Bruin  wrote:
> > There is a maturity check that we can do. Don't have the link ready now.
> > But can look it up later.
> >
> > To me two things need to be more mature
> >
> > 1. Handling of security issues
> > 2. Licensing.
> >  2a having ppl aware that with new dependencies updates to dependencies
> the
> > respective licenses need to be updated. Too often I need to fix some
> issues
> > at release time.
> >  2b versions need to be added to the licenses (version of the software)
> >
> > B.
> >
> > Op vr 21 sep. 2018 13:00 schreef Ash Berlin-Taylor :
> >
> >> Your guess is as good as mine on what is involved in graduation. I think
> >> that we sorted the Licensing issues in 1.10.0 (even if the way we
> sorted it
> >> was a little annoying - having to specify the environment variable at
> `pip
> >> install` time is a littlle bit un-pythonic, but I wasn't thinking of
> fixing
> >> that in 1.10.1)
> >>
> >> Some of the steps and requirements are listed here
> >> graduating_to_a_top_level_project <
> >> https://incubator.apache.org/guides/graduation.html>, but in summary
> from
> >> a quick read of it I think the process is:
> >>
> >> - Collect some information, put it on
> >> http://incubator.apache.org/projects/airflow.html <
> >> http://incubator.apache.org/projects/airflow.html> for the IPMC
> >> - We as a community hold a vote on if we think we're ready to graduate
> >> - IPMC vote on it too
> >> - propose motion for (monthly) Apache board meeting
> >>
> >> There might be a few more steps involves, such as drafting a Charter (we
> >> would probably start with a "stock" Apache one)
> >>
> >> -ash
> >>
> >> > On 20 Sep 2018, at 18:22, Maxime Beauchemin <
> maximebeauche...@gmail.com>
> >> wrote:
> >> >
> >> > Yeah let's make it happen! I'm happy to set some time aside to help
> with
> >> > the final push.
> >> >
> >> > Max
> >> >
> >> > On Thu, Sep 20, 2018 at 9:53 AM Sid Anand  wrote:
> >> >
> >> >> Folks! (specifically Bolke, Fokko, Ash)
> >> >> What's needed to graduate from Apache?
> >> >>
> >> >> Can we make 1.10.1 be about meeting our licensing needs to allow us
> to
> >> >> graduate?
> >> >>
> >> >> -s
> >> >>
> >>
> >>
>


Re: Airflow: Apache Graduation

2018-09-21 Thread Bolke de Bruin
There is a maturity check that we can do. Don't have the link ready now.
But can look it up later.

To me two things need to be more mature

1. Handling of security issues
2. Licensing.
 2a having ppl aware that with new dependencies updates to dependencies the
respective licenses need to be updated. Too often I need to fix some issues
at release time.
 2b versions need to be added to the licenses (version of the software)

B.

Op vr 21 sep. 2018 13:00 schreef Ash Berlin-Taylor :

> Your guess is as good as mine on what is involved in graduation. I think
> that we sorted the Licensing issues in 1.10.0 (even if the way we sorted it
> was a little annoying - having to specify the environment variable at `pip
> install` time is a littlle bit un-pythonic, but I wasn't thinking of fixing
> that in 1.10.1)
>
> Some of the steps and requirements are listed here
> graduating_to_a_top_level_project <
> https://incubator.apache.org/guides/graduation.html>, but in summary from
> a quick read of it I think the process is:
>
> - Collect some information, put it on
> http://incubator.apache.org/projects/airflow.html <
> http://incubator.apache.org/projects/airflow.html> for the IPMC
> - We as a community hold a vote on if we think we're ready to graduate
> - IPMC vote on it too
> - propose motion for (monthly) Apache board meeting
>
> There might be a few more steps involves, such as drafting a Charter (we
> would probably start with a "stock" Apache one)
>
> -ash
>
> > On 20 Sep 2018, at 18:22, Maxime Beauchemin 
> wrote:
> >
> > Yeah let's make it happen! I'm happy to set some time aside to help with
> > the final push.
> >
> > Max
> >
> > On Thu, Sep 20, 2018 at 9:53 AM Sid Anand  wrote:
> >
> >> Folks! (specifically Bolke, Fokko, Ash)
> >> What's needed to graduate from Apache?
> >>
> >> Can we make 1.10.1 be about meeting our licensing needs to allow us to
> >> graduate?
> >>
> >> -s
> >>
>
>


Re: Sep Airflow Bay Area Meetup @ Google

2018-09-18 Thread Bolke de Bruin
it’s a bit short notice but we could try. Maybe smart if you send it out? 
Fokko, my colleague (Frank) and I are pretty flexible although the schedule 
gets a bit fuller. But and afternoon with a dinner would work on every day 
except Monday.

What would work best for you?

Bolke.

> On 17 Sep 2018, at 20:18, Maxime Beauchemin  
> wrote:
> 
> Lyft could probably host if we want to schedule something last minute while
> you and your crew are in town @Bolke. Maybe a one day get together + some
> hacking. Do you want to start another thread to assess interest?
> 
> Max
> 
> On Fri, Sep 14, 2018 at 11:42 PM Feng Lu  wrote:
> 
>> Not going to happen for this time, we don't receive enough interest from
>> the community.
>> 
>> 
>> On Wed, Sep 12, 2018 at 7:57 AM Bolke de Bruin  wrote:
>> 
>>> btw how are we doing on the “one day” hackathon?
>>> 
>>>> On 12 Sep 2018, at 16:49, Bolke de Bruin  wrote:
>>>> 
>>>> Hi feng,
>>>> 
>>>> I can do “Elegant pipelining with Airflow” recycle of pydata 2018
>>> amsterdam (that I did together with Fokko).
>>>> 
>>>> Cheers
>>>> Bolke
>>>> 
>>>>> On 4 Sep 2018, at 22:13, Feng Lu  wrote:
>>>>> 
>>>>> We are 3 weeks away from the meetup and still have a few lightening
>>> talks
>>>>> open, please take the chance and share your cool ideas/work ;)
>>>>> Meanwhile, speakers could you please send me and Trishka (
>>> tris...@google.com)
>>>>> your slides?
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> Feng
>>>>> 
>>>>> On Sun, Aug 12, 2018 at 9:46 PM Maxime Beauchemin <
>>>>> maximebeauche...@gmail.com> wrote:
>>>>> 
>>>>>> Hey Feng,
>>>>>> 
>>>>>> Sign me up for a session on "Challenges ahead - taking airflow to the
>>> next
>>>>>> level". I'm planning on recycling the content from the talk @Google
>>> next
>>>>>> Friday.
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>> On Fri, Aug 10, 2018 at 3:22 PM Feng Lu 
>>> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> We still have 1-2 regular sessions and 4-5 lightening sessions
>>> available,
>>>>>>> please send in your talks ;)
>>>>>>> Here's a quick summary on the talks I have received:
>>>>>>> 
>>>>>>> Regular sessions:
>>>>>>> Ben Gregory (Astronomer): Running Cloud Native Airflow.
>>>>>>> Feng Lu (Google): Managing Airflow As a Service: Best Practices,
>>>>>> Experience
>>>>>>> and Roadmap
>>>>>>> Fokko Driesprong (GoDataDriven): Apache Airflow in the Google Cloud:
>>>>>>> Backfilling streaming data using Dataflow
>>>>>>> 
>>>>>>> Lightening Session:
>>>>>>> Barni Seetharaman (Google): Deploy Airflow on Kubernetes using
>> Airflow
>>>>>>> Operator
>>>>>>> 
>>>>>>> Session type TBD:
>>>>>>> Manish Ranjan (Tile): Functional yet cost-effective Data Engineering
>>> With
>>>>>>> Airflow
>>>>>>> 
>>>>>>> Thanks and looking forward to the meetup(120+ sign-ups to date)!
>>>>>>> 
>>>>>>> 
>>>>>>> Feng
>>>>>>> 
>>>>>>> On Thu, Jul 19, 2018 at 2:26 PM Feng Lu  wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> Hope you are enjoying your summer!
>>>>>>>> 
>>>>>>>> This is Feng Lu from Google and we'll host the next Airflow meetup
>> in
>>>>>>> our Sunnyvale
>>>>>>>> campus. We plan to add a *lightening session* this time for people
>> to
>>>>>>>> share their airflow ideas, work in progress, pain points, etc.
>>>>>>>> Here's the meetup date and schedule:
>>>>>>>> 
>>>>>>>> -- Sep 24 (Monday)  --
>>>>>>>> 6:00PM meetup starts
>>>>>>>> 6:00 - 8:00PM light dinner /mix-n-mingle
>>>>>>>> 8:00PM - 9:40PM: 5 sessions (20 minutes each)
>>>>>>>> 9:40PM to 10:10PM: 6 lightening sessions (5 minutes each)
>>>>>>>> 10:10PM to 11:00PM: drinks and social hour
>>>>>>>> 
>>>>>>>> I've seen a lot of interesting discussions in the dev mailing-list
>> on
>>>>>>>> security, scalability, event interactions, future directions,
>> hosting
>>>>>>>> platform and others. Please feel free to send your talk proposal to
>>> us
>>>>>> by
>>>>>>>> replying this email.
>>>>>>>> 
>>>>>>>> The Cloud Composer team is also going to share their experience
>>> running
>>>>>>>> Apache Airflow as a managed solution and service roadmap.
>>>>>>>> 
>>>>>>>> Thank you and looking forward to hearing from y'all soon!
>>>>>>>> 
>>>>>>>> p.s., if folks are interested, we can also add a one-day Airflow
>>>>>>> hackathon
>>>>>>>> prior to the meet-up on the same day, please let us know.
>>>>>>>> 
>>>>>>>> Feng
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>>> 
>> 



Re: Database referral integrity

2018-09-18 Thread Bolke de Bruin
Adding these kind of checks which work for integrity well make database access 
pretty slow. In addition it isnt there because in the past there was no strong 
connection between for example tasks and dagruns, it was more or less just 
coincidental. There also so some bisecting tools that probably have difficulty 
functioning in a new regime. In other words it is not an easy change and it 
will have operational challenges.

> On 18 Sep 2018, at 11:03, Ash Berlin-Taylor  wrote:
> 
> Ooh good spot.
> 
> Yes I would be in favour of adding these, but as you say we need to thing 
> about how we might migrate old data.
> 
> Doing this at 2.0.0 and providing a cleanup script (or doing it as part of 
> the migration?) is probably the way to go.
> 
> -ash-
> 
>> On 17 Sep 2018, at 19:56, Stefan Seelmann  wrote:
>> 
>> Hi,
>> 
>> looking into the DB schema there is almost no referral integrity
>> enforced at the database level. Many foreign key constraints between
>> dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make sense
>> IMO.
>> 
>> Is there a particular reason why that's not implemented?
>> 
>> Introducing it now will be hard, probably any real-world setup has some
>> violations. But I'm still in favor of this additional safety net.
>> 
>> Kind Regards,
>> Stefan
> 



Live Airflow Meetup @ Amsterdam

2018-09-12 Thread Bolke de Bruin
https://www.youtube.com/watch?v=MM8tfTrcnfk 


Enjoy!
Bolke

Re: Sep Airflow Bay Area Meetup @ Google

2018-09-12 Thread Bolke de Bruin
btw how are we doing on the “one day” hackathon? 

> On 12 Sep 2018, at 16:49, Bolke de Bruin  wrote:
> 
> Hi feng,
> 
> I can do “Elegant pipelining with Airflow” recycle of pydata 2018 amsterdam 
> (that I did together with Fokko).
> 
> Cheers
> Bolke
> 
>> On 4 Sep 2018, at 22:13, Feng Lu  wrote:
>> 
>> We are 3 weeks away from the meetup and still have a few lightening talks
>> open, please take the chance and share your cool ideas/work ;)
>> Meanwhile, speakers could you please send me and Trishka (tris...@google.com)
>> your slides?
>> 
>> Thank you.
>> 
>> Feng
>> 
>> On Sun, Aug 12, 2018 at 9:46 PM Maxime Beauchemin <
>> maximebeauche...@gmail.com> wrote:
>> 
>>> Hey Feng,
>>> 
>>> Sign me up for a session on "Challenges ahead - taking airflow to the next
>>> level". I'm planning on recycling the content from the talk @Google next
>>> Friday.
>>> 
>>> Max
>>> 
>>> On Fri, Aug 10, 2018 at 3:22 PM Feng Lu  wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> We still have 1-2 regular sessions and 4-5 lightening sessions available,
>>>> please send in your talks ;)
>>>> Here's a quick summary on the talks I have received:
>>>> 
>>>> Regular sessions:
>>>> Ben Gregory (Astronomer): Running Cloud Native Airflow.
>>>> Feng Lu (Google): Managing Airflow As a Service: Best Practices,
>>> Experience
>>>> and Roadmap
>>>> Fokko Driesprong (GoDataDriven): Apache Airflow in the Google Cloud:
>>>> Backfilling streaming data using Dataflow
>>>> 
>>>> Lightening Session:
>>>> Barni Seetharaman (Google): Deploy Airflow on Kubernetes using Airflow
>>>> Operator
>>>> 
>>>> Session type TBD:
>>>> Manish Ranjan (Tile): Functional yet cost-effective Data Engineering With
>>>> Airflow
>>>> 
>>>> Thanks and looking forward to the meetup(120+ sign-ups to date)!
>>>> 
>>>> 
>>>> Feng
>>>> 
>>>> On Thu, Jul 19, 2018 at 2:26 PM Feng Lu  wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> Hope you are enjoying your summer!
>>>>> 
>>>>> This is Feng Lu from Google and we'll host the next Airflow meetup in
>>>> our Sunnyvale
>>>>> campus. We plan to add a *lightening session* this time for people to
>>>>> share their airflow ideas, work in progress, pain points, etc.
>>>>> Here's the meetup date and schedule:
>>>>> 
>>>>> -- Sep 24 (Monday)  --
>>>>> 6:00PM meetup starts
>>>>> 6:00 - 8:00PM light dinner /mix-n-mingle
>>>>> 8:00PM - 9:40PM: 5 sessions (20 minutes each)
>>>>> 9:40PM to 10:10PM: 6 lightening sessions (5 minutes each)
>>>>> 10:10PM to 11:00PM: drinks and social hour
>>>>> 
>>>>> I've seen a lot of interesting discussions in the dev mailing-list on
>>>>> security, scalability, event interactions, future directions, hosting
>>>>> platform and others. Please feel free to send your talk proposal to us
>>> by
>>>>> replying this email.
>>>>> 
>>>>> The Cloud Composer team is also going to share their experience running
>>>>> Apache Airflow as a managed solution and service roadmap.
>>>>> 
>>>>> Thank you and looking forward to hearing from y'all soon!
>>>>> 
>>>>> p.s., if folks are interested, we can also add a one-day Airflow
>>>> hackathon
>>>>> prior to the meet-up on the same day, please let us know.
>>>>> 
>>>>> Feng
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
> 



Re: Sep Airflow Bay Area Meetup @ Google

2018-09-12 Thread Bolke de Bruin
Hi feng,

I can do “Elegant pipelining with Airflow” recycle of pydata 2018 amsterdam 
(that I did together with Fokko).

Cheers
Bolke

> On 4 Sep 2018, at 22:13, Feng Lu  wrote:
> 
> We are 3 weeks away from the meetup and still have a few lightening talks
> open, please take the chance and share your cool ideas/work ;)
> Meanwhile, speakers could you please send me and Trishka (tris...@google.com)
> your slides?
> 
> Thank you.
> 
> Feng
> 
> On Sun, Aug 12, 2018 at 9:46 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
> 
>> Hey Feng,
>> 
>> Sign me up for a session on "Challenges ahead - taking airflow to the next
>> level". I'm planning on recycling the content from the talk @Google next
>> Friday.
>> 
>> Max
>> 
>> On Fri, Aug 10, 2018 at 3:22 PM Feng Lu  wrote:
>> 
>>> Hi all,
>>> 
>>> We still have 1-2 regular sessions and 4-5 lightening sessions available,
>>> please send in your talks ;)
>>> Here's a quick summary on the talks I have received:
>>> 
>>> Regular sessions:
>>> Ben Gregory (Astronomer): Running Cloud Native Airflow.
>>> Feng Lu (Google): Managing Airflow As a Service: Best Practices,
>> Experience
>>> and Roadmap
>>> Fokko Driesprong (GoDataDriven): Apache Airflow in the Google Cloud:
>>> Backfilling streaming data using Dataflow
>>> 
>>> Lightening Session:
>>> Barni Seetharaman (Google): Deploy Airflow on Kubernetes using Airflow
>>> Operator
>>> 
>>> Session type TBD:
>>> Manish Ranjan (Tile): Functional yet cost-effective Data Engineering With
>>> Airflow
>>> 
>>> Thanks and looking forward to the meetup(120+ sign-ups to date)!
>>> 
>>> 
>>> Feng
>>> 
>>> On Thu, Jul 19, 2018 at 2:26 PM Feng Lu  wrote:
>>> 
 Hi all,
 
 Hope you are enjoying your summer!
 
 This is Feng Lu from Google and we'll host the next Airflow meetup in
>>> our Sunnyvale
 campus. We plan to add a *lightening session* this time for people to
 share their airflow ideas, work in progress, pain points, etc.
 Here's the meetup date and schedule:
 
 -- Sep 24 (Monday)  --
 6:00PM meetup starts
 6:00 - 8:00PM light dinner /mix-n-mingle
 8:00PM - 9:40PM: 5 sessions (20 minutes each)
 9:40PM to 10:10PM: 6 lightening sessions (5 minutes each)
 10:10PM to 11:00PM: drinks and social hour
 
 I've seen a lot of interesting discussions in the dev mailing-list on
 security, scalability, event interactions, future directions, hosting
 platform and others. Please feel free to send your talk proposal to us
>> by
 replying this email.
 
 The Cloud Composer team is also going to share their experience running
 Apache Airflow as a managed solution and service roadmap.
 
 Thank you and looking forward to hearing from y'all soon!
 
 p.s., if folks are interested, we can also add a one-day Airflow
>>> hackathon
 prior to the meet-up on the same day, please let us know.
 
 Feng
 
 
 
 
 
 
 
 
 
>>> 
>> 



Re: TriggerDagRunOperator sub tasks are scheduled to run after few hours

2018-09-10 Thread Bolke de Bruin
Dates are converted to UTC before submitting to the DB and are converted to UTC 
on retrieval. Is this on 1.10.0 release? Postgres will return datetimes with 
timezone information so conversion should go properly (MySQL won’t but we set 
it to UTC explicit)

B.

> On 10 Sep 2018, at 20:15, John Culver  wrote:
> 
> Not sure if this will help, but we had same issue 3 hours apart.
> Problem was the DBMS (Postgres) was different timezone than
> webserver/scheduler/workers.
> When a new DAGRun was inserted, it was allowing DB to use current time for
> execution date.
> 
> On Mon, Sep 10, 2018 at 2:10 PM, Bolke de Bruin  wrote:
> 
>> I cannot connect this to timezone issues yet. The trigger dagrun operator
>> seems to work fine (checked the examples).
>> 
>> B.
>> 
>>> On 10 Sep 2018, at 08:19, Deng Xiaodong  wrote:
>>> 
>>> Hi Goutam,
>>> 
>>> Seems it’s due to the timezone setting?
>>> 
>>> If that’s the case, you may try either adjusting your timezone setting or
>>> “manually” adjusting using execution_date argument in
>> TriggerDagRunOperator.
>>> 
>>> 
>>> XD
>>> 
>>> On Mon, Sep 10, 2018 at 14:00 Goutam Kumar Sahoo <
>>> goutamkumar.sa...@infosys.com> wrote:
>>> 
>>>> HI All,
>>>> 
>>>> 
>>>> 
>>>> Could anyone please address this issue ? We are really stuck here and
>> not
>>>> able to proceed further
>>>> 
>>>> 
>>>> 
>>>> Thanks & Regards
>>>> 
>>>> Goutam Sahoo
>>>> 
>>>> 
>>>> 
>>>> *From:* Goutam Kumar Sahoo
>>>> *Sent:* Friday, September 07, 2018 4:51 PM
>>>> *To:* 'dev@airflow.incubator.apache.org' > org>;
>>>> 'd...@airflow.apache.org' 
>>>> *Subject:* TriggerDagRunOperator sub tasks are scheduled to run after
>> few
>>>> hours
>>>> 
>>>> 
>>>> 
>>>> HI Experts
>>>> 
>>>> 
>>>> 
>>>> In our project, we are trying to replicate the existing  job scheduling
>>>> implemented in Microsoft Orchestrater to new scheduler called “Apache
>>>> Airflow” .
>>>> 
>>>> 
>>>> 
>>>> During this replication process we have an requirement to create Master
>>>> DAG which in turn will call Other child DAGs based on different
>> condition.
>>>> We referred to the existing example (example_trigger_controller_dag,
>>>> example_trigger_target_dag) available in Airflow and tried running them
>>>> manually.
>>>> 
>>>> 
>>>> 
>>>> When we triggered the controller DAG manually , it is got triggered
>>>> immediately with execution time created in local timezone (PDT) whereas
>> the
>>>> child DAG is scheduled to run few hours later. Even though child DAGRUN
>> is
>>>> showing as running, it is doing nothing and waiting for the time to
>>>> satisfy.  Our requirement is to trigger the Target/Child DAG run as
>> soon it
>>>> is triggered by controller DAG
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> If you see here , the trigger time and execution time is having a 7
>> hours
>>>> GAP.
>>>> 
>>>> 
>>>> 
>>>> We have tried to solve this many different ways but couldn’t succeed. We
>>>> have asked for help in many forums but didn’t get any satisfactory
>> answer
>>>> form any of them
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Please help us ASAP as this issue is one of the show stopper for us .
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 



Re: TriggerDagRunOperator sub tasks are scheduled to run after few hours

2018-09-10 Thread Bolke de Bruin
I cannot connect this to timezone issues yet. The trigger dagrun operator seems 
to work fine (checked the examples).

B.

> On 10 Sep 2018, at 08:19, Deng Xiaodong  wrote:
> 
> Hi Goutam,
> 
> Seems it’s due to the timezone setting?
> 
> If that’s the case, you may try either adjusting your timezone setting or
> “manually” adjusting using execution_date argument in TriggerDagRunOperator.
> 
> 
> XD
> 
> On Mon, Sep 10, 2018 at 14:00 Goutam Kumar Sahoo <
> goutamkumar.sa...@infosys.com> wrote:
> 
>> HI All,
>> 
>> 
>> 
>> Could anyone please address this issue ? We are really stuck here and not
>> able to proceed further
>> 
>> 
>> 
>> Thanks & Regards
>> 
>> Goutam Sahoo
>> 
>> 
>> 
>> *From:* Goutam Kumar Sahoo
>> *Sent:* Friday, September 07, 2018 4:51 PM
>> *To:* 'dev@airflow.incubator.apache.org' ;
>> 'd...@airflow.apache.org' 
>> *Subject:* TriggerDagRunOperator sub tasks are scheduled to run after few
>> hours
>> 
>> 
>> 
>> HI Experts
>> 
>> 
>> 
>> In our project, we are trying to replicate the existing  job scheduling
>> implemented in Microsoft Orchestrater to new scheduler called “Apache
>> Airflow” .
>> 
>> 
>> 
>> During this replication process we have an requirement to create Master
>> DAG which in turn will call Other child DAGs based on different condition.
>> We referred to the existing example (example_trigger_controller_dag,
>> example_trigger_target_dag) available in Airflow and tried running them
>> manually.
>> 
>> 
>> 
>> When we triggered the controller DAG manually , it is got triggered
>> immediately with execution time created in local timezone (PDT) whereas the
>> child DAG is scheduled to run few hours later. Even though child DAGRUN is
>> showing as running, it is doing nothing and waiting for the time to
>> satisfy.  Our requirement is to trigger the Target/Child DAG run as soon it
>> is triggered by controller DAG
>> 
>> 
>> 
>> 
>> 
>> If you see here , the trigger time and execution time is having a 7 hours
>> GAP.
>> 
>> 
>> 
>> We have tried to solve this many different ways but couldn’t succeed. We
>> have asked for help in many forums but didn’t get any satisfactory answer
>> form any of them
>> 
>> 
>> 
>> 
>> 
>> Please help us ASAP as this issue is one of the show stopper for us .
>> 
>> 
>> 
>> 
>> 
>> 
>> 



Re: Call for fixes for Airflow 1.10.1

2018-09-09 Thread Bolke de Bruin
You can already add them to v1-10-test.

Normally we are a bit cautious to this if you are not the release manager
to ensure that he/she knows what the state is.

B

Op zo 9 sep. 2018 18:02 schreef Driesprong, Fokko :

> Can we add this one as well?
>
> https://github.com/apache/incubator-airflow/pull/3862
> https://issues.apache.org/jira/browse/AIRFLOW-1917
>
> I'm happy to cherry pick them onto the 1.10.1 by myself as well. Any idea
> when we will start this branch?
>
> Cheers, Fokko
>
> Op do 6 sep. 2018 om 08:08 schreef Deng Xiaodong :
>
> > Hi Ash,
> >
> >
> > May you consider including JIRA ticket 2848 (PR 3693, Ensure dag_id in
> > metadata "job" for LocalTaskJob) in 1.10.1 as well?
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2848
> >
> > https://github.com/apache/incubator-airflow/pull/3693
> >
> >
> > This is a bug in terms of metadata, which also affects the UI
> > “Browse->Jobs”.
> >
> >
> > Thanks.
> >
> >
> > Regards,
> >
> > XD
> >
> > On Wed, Sep 5, 2018 at 23:55 Bolke de Bruin  wrote:
> >
> > > You should push these to v1-10-test not to stable. Only once we start
> > > cutting RCs you should push to -stable. See the docs. This ensures a
> > stable
> > > “stable”branch.
> > >
> > > Cheers
> > > Bolke.
> > >
> > > > On 3 Sep 2018, at 14:20, Ash Berlin-Taylor  wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I'm starting the process of gathering fixes for a 1.10.1. So far the
> > > list of issues I have that we should pull in are
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20fixVersion%20%3D%201.10.1%20ORDER%20BY%20key%20ASC
> > > (reproduces below)
> > > >
> > > > I will start pushing these as cherry-picked commits to the
> v1-10-stable
> > > branch today.
> > > >
> > > > If you have something that is not in the list below let me know. I'd
> > > like to keep this to bug fixes against 1.10.0 only if possible.
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2145 Deadlock after
> > > clearing a running task
> > > > https://github.com/apache/incubator-airflow/pull/3657
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2476 update tabulate
> dep
> > > to 0.8.2
> > > > https://github.com/apache/incubator-airflow/pull/3835
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2778 Bad Import in
> > > collect_dag in DagBag
> > > > https://github.com/apache/incubator-airflow/pull/3624
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2900 Show code for
> > > packaged DAGs
> > > > https://github.com/apache/incubator-airflow/pull/3749
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2949 Add syntax
> > highlight
> > > for single quote strings
> > > > https://github.com/apache/incubator-airflow/pull/3795
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2984 Cannot convert
> > > naive_datetime when task has a naive start_date/end_date
> > > > https://github.com/apache/incubator-airflow/pull/3822
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2979 Deprecated Celery
> > > Option not in Options list
> > > > https://github.com/apache/incubator-airflow/pull/3832
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2989 No Parameter to
> > > change bootDiskType for DataprocClusterCreateOperator
> > > > https://github.com/apache/incubator-airflow/pull/3825
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2990 Docstrings for
> > > Hooks/Operators are in incorrect format
> > > > https://github.com/apache/incubator-airflow/pull/3820
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2994 flatten_results
> in
> > > BigQueryOperator/BigQueryHook should default to None
> > > > https://github.com/apache/incubator-airflow/pull/3829
> > > >
> > > >
> > > > In addition to those PRs which are already marked with Fix Version of
> > > 1.10.1 I think we should also pull in these:
> > > >
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2713 Rename async
> > > variable for Python 3.7.0 compatibility
> > > > https://github.com/apache/incubator-airflow/pull/3561
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2895 Prevent scheduler
> > > from spamming heartbeats/logs
> > > > https://github.com/apache/incubator-airflow/pull/3747
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2921 A trivial
> > > incorrectness in CeleryExecutor()
> > > > https://github.com/apache/incubator-airflow/pull/3773
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2866 Missing CSRF
> Token
> > > Error on Web RBAC UI Create/Update Operations
> > > > https://github.com/apache/incubator-airflow/pull/3804
> > > >
> > > >
> > > > https://issues.apache.org/jira/browse/AIRFLOW-2951
> > > > https://github.com/apache/incubator-airflow/pull/3798 Update dag_run
> > > table end_date when state change
> > > > (though as written it has a few other deps to cherry pick in, so will
> > > see about this one)
> > > >
> > >
> > >
> >
>


Re: [VOTE][RESULT] Replace with Gitter with Slack?

2018-09-09 Thread Bolke de Bruin
I know I'm nitpicking but afaik you need 3 +1s to make the change.

However why dont you open a slack Channel and see who joins and let people
know on gitter as well that it exists?

B.

Op zo 9 sep. 2018 08:15 schreef Sid Anand :

> Taking Binding votes into account :
>
> +1: 1 vote
>
>- Sid Anand
>
> 0: 2 votes
>
>- Bolke de Bruin
>- Kaxil Naik
>
> -0.5: 1 vote
>
>- Arthur Wiedmer
>
>
> Vote result is a net positive of +0.5.
>
> I counted all of the PMC/committers' votes as binding.
>
> -s
>
> On Sat, Sep 8, 2018 at 11:07 PM Arthur Wiedmer 
> wrote:
>
> > Sid,
> >
> > Erm, the next line is (emphasis mine) :
> >
> > PMC members have formally binding votes, but in general community members
> > are encouraged to vote, even if their votes are *only advisory*.
> >
> > Again, that's not to say that the community at large cannot decide to
> host
> > a Slack channel. I think you are more than welcome to if you want to.
> >
> > Best,
> > Arthur
> >
> > On Sat, Sep 8, 2018 at 10:59 PM Sid Anand  wrote:
> >
> > > Why doesn't every vote matter for this topic?
> > > https://www.apache.org/foundation/voting.html#binding-votes
> > >
> > > Am I misinterpreting the "Who is permitted to vote is, to some extent,
> a
> > > community-specific thing."?
> > >
> > > For the established Apache processes around promoting a contributor to
> > > committer/PMC, deciding on the number of +1s to allow a merge to
> master,
> > or
> > > voting on releases, I understand the need to follow the
> > binding/non-binding
> > > protocol. In all of those cases, the outcomes affect maintainers.
> > >
> > > My understanding is that this topic affects the entire community, where
> > > some members of the community are helping others. This seems to chug
> > along
> > > without maintainers being present. Hence, why do maintainers' votes
> > matter
> > > more here?
> > >
> > > Question for the mentors.. Jakob?
> > >
> > > -s
> > >
> > > On Sat, Sep 8, 2018 at 10:27 PM Bolke de Bruin 
> > wrote:
> > >
> > > > Committers can only vote binding. Most of the time it is tagged by,
> > > > indeed, adding "binding" to the vote.
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On 9 Sep 2018, at 07:20, Scott Halgrim  > > .INVALID>
> > > > wrote:
> > > > >
> > > > > What makes a vote binding? Just putting “(binding)” after your
> vote?
> > > > >> On Sep 8, 2018, 10:19 PM -0700, Bolke de Bruin  >,
> > > > wrote:
> > > > >> Sorry sid, only binding votes directly count to the vote. Please
> > > update
> > > > the
> > > > >> result accordingly.
> > > > >>
> > > > >> Btw did you ask on gitter itself?
> > > > >>
> > > > >>
> > > > >>
> > > > >> Op zo 9 sep. 2018 03:47 schreef Sid Anand :
> > > > >>
> > > > >>> +1: 12 votes
> > > > >>>
> > > > >>> - Sid Anand
> > > > >>> - Steve Carpenter
> > > > >>> - Beau Barker
> > > > >>> - Marc Bollinger
> > > > >>> - Pedro Machado
> > > > >>> - Scott Halgrim
> > > > >>> - Eamon Keane
> > > > >>> - Adam Boscarino
> > > > >>> - Daniel Cohen
> > > > >>> - Chandu Kavar
> > > > >>> - William Horton
> > > > >>> - Ben Gregory
> > > > >>>
> > > > >>> 0: 2 votes
> > > > >>>
> > > > >>> - Bolke de Bruin
> > > > >>> - Kaxil Naik
> > > > >>>
> > > > >>>
> > > > >>> -0.5: 1 vote
> > > > >>>
> > > > >>> - Arthur Wiedmer
> > > > >>>
> > > > >>> -1: 4 votes
> > > > >>>
> > > > >>> - Shah Altaf
> > > > >>> - James Meickle
> > > > >>> - Ravi Kotecha
> > > > >>> - airflowuser (??)
> > > > >>>
> > > > >>> The vote concludes with 12 votes for and 4.5 votes against. The
> > > > community
> > > > >>> has opted to move to Slack from Gitter.
> > > > >>>
> > > > >>> -s
> > > > >>>
> > > >
> > >
> >
>


Re: [VOTE][RESULT] Replace with Gitter with Slack?

2018-09-08 Thread Bolke de Bruin
Sorry sid, only binding votes directly count to the vote. Please update the
result accordingly.

Btw did you ask on gitter itself?



Op zo 9 sep. 2018 03:47 schreef Sid Anand :

> +1: 12 votes
>
>- Sid Anand
>- Steve Carpenter
>- Beau Barker
>- Marc Bollinger
>- Pedro Machado
>- Scott Halgrim
>- Eamon Keane
>- Adam Boscarino
>- Daniel Cohen
>- Chandu Kavar
>- William Horton
>- Ben Gregory
>
> 0: 2 votes
>
>- Bolke de Bruin
>- Kaxil Naik
>
>
> -0.5: 1 vote
>
>- Arthur Wiedmer
>
> -1: 4 votes
>
>- Shah Altaf
>- James Meickle
>- Ravi Kotecha
>- airflowuser (??)
>
> The vote concludes with 12 votes for and 4.5 votes against. The community
> has opted to move to Slack from Gitter.
>
> -s
>


Re: TriggerDagRunOperator sub tasks are scheduled to run after few hours

2018-09-07 Thread Bolke de Bruin
Can you share the definition? And timezone settings?

Op vr 7 sep. 2018 20:53 schreef Maxime Beauchemin <
maximebeauche...@gmail.com>:

> Is the issue timezone related? Personally I've only used Airflow in
> UTC-aligned environments so I can't help much on this topic. Bolke as
> contributed timezone awareness to the codebase in the past, I'm not sure
> what the common caveats may be.
>
> Max
>
> On Fri, Sep 7, 2018 at 4:29 AM Goutam Kumar Sahoo <
> goutamkumar.sa...@infosys.com> wrote:
>
> > HI Experts
> >
> >
> >
> > In our project, we are trying to replicate the existing  job scheduling
> > implemented in Microsoft Orchestrater to new scheduler called “Apache
> > Airflow” .
> >
> >
> >
> > During this replication process we have an requirement to create Master
> > DAG which in turn will call Other child DAGs based on different
> condition.
> > We referred to the existing example (example_trigger_controller_dag,
> > example_trigger_target_dag) available in Airflow and tried running them
> > manually.
> >
> >
> >
> > When we triggered the controller DAG manually , it is got triggered
> > immediately with execution time created in local timezone (PDT) whereas
> the
> > child DAG is scheduled to run few hours later. Even though child DAGRUN
> is
> > showing as running, it is doing nothing and waiting for the time to
> > satisfy.  Our requirement is to trigger the Target/Child DAG run as soon
> it
> > is triggered by controller DAG
> >
> >
> >
> > If you see here , the trigger time and execution time is having a 7 hours
> > GAP.
> >
> >
> >
> > We have tried to solve this many different ways but couldn’t succeed. We
> > have asked for help in many forums but didn’t get any satisfactory answer
> > form any of them
> >
> >
> >
> >
> >
> > Please help us ASAP as this issue is one of the show stopper for us .
> >
> >
> >
> >
> >
> >
> >
>


Re: Security issue being ignored?

2018-09-06 Thread Bolke de Bruin
Both are not security vulnerabilities: either it is in an upstream project or 
it is due to the way Airflow can be used. PR is welcome for the second JIRA.

B.

Verstuurd vanaf mijn iPad

> Op 6 sep. 2018 om 11:07 heeft airflowuser 
>  het volgende geschreven:
> 
> Another example:
> https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2283
> 
> Sent with [ProtonMail](https://protonmail.com) Secure Email.
> 
> ‐‐‐ Original Message ‐‐‐
>> On September 3, 2018 10:20 AM, airflowuser  
>> wrote:
>> 
>> Hi,
>> I noticed you opened a disccusion about the neccesity of Gitter...
>> I think the main problem is that unlike other open source projects with 
>> Airflow no one is monitoring the Jira. So people tend to report many stuff 
>> on the Gitter to get assistance. Sometimes answers are given but no one 
>> answer on the open tickets.
>> 
>> Other projects hosted on GitHub or others always have someone reviewing new 
>> tickets and tag them. On airflow any user tag any thing he wishes.. there 
>> are no priorities. There are open tickets for version 1.7 which will 
>> probebly stay there forever.
>> 
>> Airflow doesn't have this function in the team... no one monitor the Jira 
>> and so there are cases like this:
>> [https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-1260](https://deref-gmx.com/mail/client/dzTsJ-2uKlU/dereferrer/?redirectUrl=https%3A%2F%2Fissues.apache.org%2Fjira%2Fprojects%2FAIRFLOW%2Fissues%2FAIRFLOW-1260)
>> A report of security issue where no one see that. This could be nothing or 
>> it could be sirious but I think the Jira should be more than just a place to 
>> paste you commit notices.
>> In other projects the comunnity handle security issues asap... no one wants 
>> his project to be hacked.
>> 
>> May I suggest that the Jira is not very user-firendly... I think the GitHub 
>> issues section (which is disabled in this project) is better for discussion 
>> and bug reports. This can be used for questions as well and can also replace 
>> the Gitter.
>> I noticed that many people submit PR and only then there is a disccution 
>> about the implemntation - the disscution should be done before... not 
>> eveyone are on mailing lists.. especialy new developers - you are limiting 
>> access to the project with this approch. See how many open PR are from 
>> 2017,2016...
>> It's easier for first time commiters to choose a ticket which it's taged as 
>> "easy fix" and there was a disscution on it..
>> 
>> Thanks,


Re: [VOTE] Replace with Gitter with Slack?

2018-09-06 Thread Bolke de Bruin
0 (binding) abstaining from the vote. 

I am not convinced we gain anything except disruption for an existing 
community. We cannot even force people to move away only make it “less” 
official. My question would be do we get more attendance of contributors and 
committers if we move to slack? Otherwise I don’t see the benefit.

B.

Verstuurd vanaf mijn iPad

> Op 6 sep. 2018 om 09:04 heeft airflowuser 
>  het volgende geschreven:
> 
> -1
> 
> 
> Sent with ProtonMail Secure Email.
> 
> ‐‐‐ Original Message ‐‐‐
>> On September 6, 2018 5:30 AM, Sid Anand  wrote:
>> 
>> Hi Folks!
>> In the Apache tradition, I'd like to ask the community to vote on replacing
>> Gitter with Slack.
>> 
>> For more information about what was recently discussed, refer to
>> https://lists.apache.org/thread.html/8eeb6c46ec431b9158f87022ceaa5eed8dbaaf082c887dae55f86f96@
>> 
>> If you would like to replace Gitter with Slack, vote +1. If you want to
>> keep things they way they are, vote -1. You can also vote 0 if you don't
>> care either way because you wouldn't use either much, preferring to use the
>> mailing list instead, which is highly encouraged as it is Apache's official
>> record.
>> 
>> The vote will be open for 72 hours and will expire at 8p PT this Saturday.
>> -s
>> 
>> P.S. If the community votes for Slack, we could create our own workspace
>> (e.g. airflow.slack.com).
>> P.P.S. In general, anyone in the community can launch a vote like this from
>> time to time. There is no binding/non-binding distinction since we are not
>> running an official Apache vote.
> 
> 


Re: Call for fixes for Airflow 1.10.1

2018-09-05 Thread Bolke de Bruin
You should push these to v1-10-test not to stable. Only once we start cutting 
RCs you should push to -stable. See the docs. This ensures a stable 
“stable”branch.

Cheers
Bolke.

> On 3 Sep 2018, at 14:20, Ash Berlin-Taylor  wrote:
> 
> Hi everyone,
> 
> I'm starting the process of gathering fixes for a 1.10.1. So far the list of 
> issues I have that we should pull in are 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20fixVersion%20%3D%201.10.1%20ORDER%20BY%20key%20ASC
>  (reproduces below)
> 
> I will start pushing these as cherry-picked commits to the v1-10-stable 
> branch today.
> 
> If you have something that is not in the list below let me know. I'd like to 
> keep this to bug fixes against 1.10.0 only if possible.
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2145 Deadlock after clearing a 
> running task
> https://github.com/apache/incubator-airflow/pull/3657
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2476 update tabulate dep to 
> 0.8.2
> https://github.com/apache/incubator-airflow/pull/3835
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2778 Bad Import in collect_dag 
> in DagBag
> https://github.com/apache/incubator-airflow/pull/3624
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2900 Show code for packaged DAGs
> https://github.com/apache/incubator-airflow/pull/3749
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2949 Add syntax highlight for 
> single quote strings
> https://github.com/apache/incubator-airflow/pull/3795
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2984 Cannot convert 
> naive_datetime when task has a naive start_date/end_date
> https://github.com/apache/incubator-airflow/pull/3822
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2979 Deprecated Celery Option 
> not in Options list
> https://github.com/apache/incubator-airflow/pull/3832 
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2989 No Parameter to change 
> bootDiskType for DataprocClusterCreateOperator
> https://github.com/apache/incubator-airflow/pull/3825
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2990 Docstrings for 
> Hooks/Operators are in incorrect format
> https://github.com/apache/incubator-airflow/pull/3820
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2994 flatten_results in 
> BigQueryOperator/BigQueryHook should default to None
> https://github.com/apache/incubator-airflow/pull/3829
> 
> 
> In addition to those PRs which are already marked with Fix Version of 1.10.1 
> I think we should also pull in these:
> 
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2713 Rename async variable for 
> Python 3.7.0 compatibility
> https://github.com/apache/incubator-airflow/pull/3561
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2895 Prevent scheduler from 
> spamming heartbeats/logs
> https://github.com/apache/incubator-airflow/pull/3747
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2921 A trivial incorrectness in 
> CeleryExecutor()
> https://github.com/apache/incubator-airflow/pull/3773
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2866 Missing CSRF Token Error 
> on Web RBAC UI Create/Update Operations
> https://github.com/apache/incubator-airflow/pull/3804
> 
> 
> https://issues.apache.org/jira/browse/AIRFLOW-2951 
> https://github.com/apache/incubator-airflow/pull/3798 Update dag_run table 
> end_date when state change
> (though as written it has a few other deps to cherry pick in, so will see 
> about this one)
> 



Re: Add git tag for 1.10

2018-09-03 Thread Bolke de Bruin
I’m not sure what you think is off? The change is only in master not in 1.10. 
If you are reading the docs for updating to master then you are using the wrong 
docs if you are upgrading to 1.10.

B.

Verstuurd vanaf mijn iPad

> Op 3 sep. 2018 om 14:53 heeft Robin Edwards  het volgende 
> geschreven:
> 
> I am not sure if anyone's aware of this, the 1.10.0 tag and the PyPi upload
> dont contain the 'BashTaskRunner' -> 'StandardTaskRunner' change.
> 
> The docs in master do
> https://github.com/apache/incubator-airflow/blob/master/UPDATING.md (They
> don't in the 1.10.0 tag)
> 
> If this is intentional and the change is going to be in 1.10.1 perhaps it
> should be put under a new heading in UPDATING.md?
> 
> It just tripped me up as I thought it was part of 1.10
> 
> 
>> On Fri, Aug 31, 2018 at 8:00 AM, Kaxil Naik  wrote:
>> 
>> We already do have it:
>> https://github.com/apache/incubator-airflow/releases/tag/1.10.0
>> 
>>> On Fri, 31 Aug 2018, 06:23 Beau Barker,  wrote:
>>> 
>>> Can we please tag the final v1.10 commit?
>>> 
>> 


Re: Running unit tests against SLUGIFY_USES_TEXT_UNIDECODE and AIRFLOW_GPL_UNIDECODE (also is this broken?)

2018-08-29 Thread Bolke de Bruin
The dependency is so small,   besides python-slugify executes tests with both 
that I wouldn’t bother. Probably it makes more sense to get rid of python-nvd3 
and to be able to upgrade d3.

Verstuurd vanaf mijn iPad

> Op 28 aug. 2018 om 22:18 heeft Taylor Edmiston  het 
> volgende geschreven:
> 
> Since the release of 1.10, we now have the option to install Airflow with
> either:
> 
> 1. python-nvd3  --> python-slugify
>  --> text-unidecode
>  (via env var
> SLUGIFY_USES_TEXT_UNIDECODE=yes), or
> 2. python-nvd3 --> python-slugify --> unidecode
>  (via AIRFLOW_GPL_UNIDECODE=yes)
> 
> Currently on Travis CI we only test the former configuration.  Does anyone
> have a recommendation on how to go about testing the latter?  Running an
> entire second copy of the unit tests for one dependency feels a bit
> redundant... maybe there's a small relevant subset of tests that could only
> run for the alternative dependency config?
> 
> On a related note, I think this part of the install may be broken.
> 
> I've tried running a pip install under each config like so (using pyenv +
> virtualenvwrapper):
> 
> Shell 1:
> 
> pyenv shell 3.6.5
> mktmpenv
> export SLUGIFY_USES_TEXT_UNIDECODE=yes
> pip install apache-airflow
> pip freeze > ~/a.txt
> 
> Shell 2:
> 
> pyenv shell 3.6.5
> mktmpenv
> export AIRFLOW_GPL_UNIDECODE=yes
> pip install apache-airflow
> pip freeze > ~/b.txt
> 
> Shell 3:
> 
> diff ~/a.txt ~/b.txt
> (empty)
> 
> I would expect the former to have text-unidecode and the latter to have
> Unidecode.  *Can someone else attempt to reproduce this behavior?*
> 
> Additionally, I'm also experiencing this same behavior when trying to
> install the underlying python-slugify package similarly as well.  I've
> opened an issue for that here -
> https://github.com/un33k/python-slugify/issues/59.
> 
> Thank you,
> Taylor
> 
> *Taylor Edmiston*
> Blog  | CV
>  | LinkedIn
>  | AngelList
>  | Stack Overflow
> 


Re: PR Review Dashboard?

2018-08-28 Thread Bolke de Bruin
I’m not in favor of moving to GitHub issues. While JIRA is not perfect it 
actually moves discussion to the mailing list. With GitHub issues the stuff 
just gets lost imho.

I do like our changelogs a lot better now.

B.

Verstuurd vanaf mijn iPad

> Op 27 aug. 2018 om 05:53 heeft Holden Karau  het 
> volgende geschreven:
> 
> Awesome, so I'll give this a shot with JIRA for now and if we end up moving 
> to GH we can move it over. I've got a fork of the dashboard I've been using 
> in Apache Beam as well as the Spark one so it shouldn't take me too long to 
> generalize it again.
> 
>> On Sun, Aug 26, 2018 at 8:50 PM Maxime Beauchemin 
>>  wrote:
>> I love the idea. I took a quick look at /databricks/spark-pr-dashboard
>>  and that looks like a
>> nice easy start. It would be interesting to try to make a generic project
>> out of it that would work on top of any project/repo. Basically just
>> refactor all of the Spark-specific code and configurations into a
>> `config.py` (assuming there's not too much frontend Spark-specific
>> code...).
>> 
>> Of course this tool assumes Jira+Github which works for Airflow, but
>> probably isn't as common of a setup to really generalize beyond Apache. It
>> seems like by embracing Github issues and dropping Jira we could be
>> building something much more relevant.
>> 
>> Max
>> 
>> On Fri, Aug 24, 2018 at 7:27 PM Holden Karau  wrote:
>> 
>> > Update: we can do this with the dashboard code. Since this would modify
>> > the JIRAs I’d love a sign-off on turning that feature on from someone on
>> > the PMC (or at least a week with no PMC folks saying no).
>> >
>> > On Thu, Aug 23, 2018 at 8:00 AM Holden Karau  wrote:
>> >
>> >> I mean a few ASF projects update JIRA tickets based on PRs automatically.
>> >> Switching from JIRA to GH issues (or back) is super painful, so I’d
>> >> probably do more incremental improvements personally, I just don’t have 
>> >> the
>> >> time to do something like that.
>> >>
>> >> I’ll take a look at some the K8s tools (over in Beam they’re looking at
>> >> one of the review tagging tools out of K8s) if the Spark one is too
>> >> difficult to adapt to our use case.
>> >>
>> >> On Thu, Aug 23, 2018 at 2:34 AM Eamon Keane 
>> >> wrote:
>> >>
>> >>> Kubernetes is a good place to look as they've invested a lot in github
>> >>> bots
>> >>> and label based workflows. E.g. the cherry-picking script and doc is
>> >>> here:
>> >>>
>> >>>
>> >>> https://github.com/kubernetes/kubernetes/blob/master/hack/cherry_pick_pull.sh
>> >>>
>> >>> https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md
>> >>>
>> >>> And more general overview here:
>> >>>
>> >>> https://github.com/kubernetes/community/tree/master/contributors/devel
>> >>>
>> >>> On Thu, Aug 23, 2018 at 5:11 AM Maxime Beauchemin <
>> >>> maximebeauche...@gmail.com> wrote:
>> >>>
>> >>> > I've heard many times in the past about a GH/Jira syncing tool but
>> >>> never
>> >>> > seen it in action. Personally my vote is to move issues to GH and drop
>> >>> > Jira. Though in the process this will break the release helper script
>> >>> here:
>> >>> >
>> >>> https://github.com/apache/incubator-airflow/blob/master/dev/airflow-jira
>> >>> >
>> >>> > We'll be working on a Github label-driven release baking magic script
>> >>> for
>> >>> > Superset, maybe we could use the same tooling on both Airflow and
>> >>> Superset.
>> >>> > The idea is that the script would use labels like `target:apache-1.11`
>> >>> on
>> >>> > PRs to bake releases. The tool would take as input a base SHA and
>> >>> release
>> >>> > minor release number, and would craft a release branch, fetch and
>> >>> > cherry-pick all the right commits in the right order based on labels,
>> >>> > generate release tags (on minor versions) and output state into
>> >>> > release-info files (listing the base, all cherries, all tags, ...). The
>> >>> > tricky part is resolving merge conflicts while auto-picking cherries,
>> >>> but
>> >>> > the script would guide the operator through it .
>> >>> >
>> >>> > Curious to hear about how other projects do it. I think it generally
>> >>> > involves a lot of manual work. Let me know if you know of open source
>> >>> > tooling to deal with release management.
>> >>> >
>> >>> > Max
>> >>> >
>> >>> > On Wed, Aug 22, 2018 at 6:25 PM Holden Karau 
>> >>> > wrote:
>> >>> >
>> >>> > > Thanks for the reminder, forgot to ask at coffee but I'll ask.
>> >>> > >
>> >>> > > On Wed, Aug 22, 2018, 1:52 AM Driesprong, Fokko > >>> >
>> >>> > > wrote:
>> >>> > >
>> >>> > > > Hi Holden,
>> >>> > > >
>> >>> > > > Just curious if you got a hold of someone at the coffee machine :-)
>> >>> > > >
>> >>> > > > Cheers, Fokko
>> >>> > > >
>> >>> > > > Op di 7 aug. 2018 om 09:17 schreef Holden Karau <
>> >>> hol...@pigscanfly.ca
>> >>> > >:
>> >>> > > >
>> >>> > > > > The JIRA/Github integration tooling I’m a little more fuzzy on
>> >>> but
>> >>> > I’m
>> >>> > 

Re: Python 3.6 Support for Airflow 1.10.0

2018-08-28 Thread Bolke de Bruin
Let’s not drop 2.7 too quickly but maybe mark it deprecated. I’m pretty sure 
Airbnb still runs on 2.7.

Also RedHat does not deliver python 3 in its enterprise edition yet by default 
so it will put enterprise users in a bit of an awkward spot.

B.

Verstuurd vanaf mijn iPad

> Op 28 aug. 2018 om 19:00 heeft Sid Anand  het volgende 
> geschreven:
> 
> I'm +1 on going to 3.7 -- I'm running 3.6 myself.
> 
> Regarding dropping Python2 support, with almost 200 companies using
> Airflow, I'd want to be very careful that we don't put any of them at a
> disadvantage. For example, my former employer (a small startup) is running
> on Python2 -- after I left, they don't have anyone actively maintaining it
> at the company. Easing upgrades for such cases will keep them using Airflow.
> 
> It would be good to hold a survey that we promote beyond daily readers of
> this mailing list and raise this as an AIP, since it's a major change.
> Let's not rush it.
> 
> -s
> 
>> On Tue, Aug 28, 2018 at 9:24 AM Naik Kaxil  wrote:
>> 
>> We should definitely support 3.7. I left comments on the PR @tedmiston
>> regarding the same. Python 2.7 will be dropped in 2020, so I guess we
>> should start planning about it. Not really 100% sure though that we should
>> drop it in Airflow 2.0
>> 
>> On 28/08/2018, 17:08, "Taylor Edmiston"  wrote:
>> 
>>I am onboard with dropping Python 2.x support.  Django officially
>> dropped
>>Python 2.x support with their 2.0 release since December 2017.
>> 
>>*Taylor Edmiston*
>>Blog  | CV
>> | LinkedIn
>> | AngelList
>> | Stack Overflow
>>
>> 
>> 
>> 
>>On Tue, Aug 28, 2018 at 12:03 PM Ash Berlin-Taylor 
>> wrote:
>> 
>>> Supporting 3.7 is absolutely something we should do - it just got
>> released
>>> while we were already mid-way through the release process of 1.10 and
>>> didn't want the scope creep.
>>> 
>>> I'm happy to release a 1.10.1 that supports Py 3.7. The only issue
>> I've
>>> seen so far is around the use of `async` as a keyword. both in
>>> 
>>> A perhaps bigger question: What are people's thoughts on dropping
>> support
>>> for Python2? This wouldn't happen before 2.0 at the earliest if we
>> did it.
>>> Probably something to raise an AIP for.
>>> 
>>> -ash
>>> 
 
>> 
>> Kaxil Naik
>> 
>> Data Reply
>> 2nd Floor, Nova South
>> 160 Victoria Street, Westminster
>> London SW1E 5LB - UK
>> phone: +44 (0)20 7730 6000
>> k.n...@reply.com
>> www.reply.com
>> On 28 Aug 2018, at 16:56, Taylor Edmiston  wrote:
 
 We are also running on 3.6 for some time.
 
 I put a quick branch together adding / upgrading to 3.6 in all of
>> the
 places.  CI is still running so I may expect some test failures but
 hopefully nothing major.  I would be happy to merge this into
>> Kaxil's
 current #3815 or as a follow-on PR.  I'll paste this back onto his
>> PR as
 well.
 
 https://github.com/apache/incubator-airflow/pull/3816
 
 I think it's important for the project to officially support
>> Python 3.6
 latest especially since 3.7 is out now.  While we're on the topic,
>> does
 anyone else have thoughts on supporting 3.7 (perhaps unofficially
>> to
 start)?  I wouldn't mind starting to get that ball rolling.
 
 *Taylor Edmiston*
 Blog  | CV
  | LinkedIn
  | AngelList
  | Stack Overflow
 
 
 
 
 On Tue, Aug 28, 2018 at 9:29 AM Adam Boscarino
  wrote:
 
> fwiw, we run Airflow on Python 3.6.
> 
> On Tue, Aug 28, 2018 at 8:30 AM Naik Kaxil 
>> wrote:
> 
>> To provide more context to the issue:
>> 
>> 
>> 
>> PyPI shows that Airflow is supported on Py2.7, 3.4 and 3.5 :
>> https://pypi.org/project/apache-airflow/
>> 
>> 
>> 
>> This is picked from setup.py:
>> 
>> 
>> 
> 
>>> 
>> https://github.com/apache/incubator-airflow/blob/26e0d449737e8671000f671d820a9537f23f345a/setup.py#L367
>> 
>> 
>> 
>> 
>> 
>> So, should we update setup.py to include 3.6 as well?
>> 
>> 
>> 
>> @bolke – Thughts?
>> 
>> 
>> 
>> 
>> Kaxil Naik
>> 
>> Data Reply
>> 2nd Floor, Nova South
>> 160 Victoria Street, Westminster
>> London SW1E 5LB - UK
>> phone: +44 (0)20 7730 6000
>> k.n...@reply.com
>> www.reply.com
>> 
>> [image: Data Reply]
>> 
>> *From: *Naik Kaxil 
>> *Reply-To: *"dev@airflow.incubator.apache.org" <
>> dev@airflow.incubator.apache.org>
>> *Date: *Tuesday, 28 August 2018 at 13:27
>> *To: 

Re: Cloudera Hue in License

2018-08-24 Thread Bolke de Bruin
Yes we do. The kerberos ticket renewer i amended from hue. 

Sent from my iPhone

> On 24 Aug 2018, at 17:46, Taylor Edmiston  wrote:
> 
> For anyone else who's curious - it looks like Hue is a web app for Hadoop.
> 
> https://github.com/cloudera/hue
> 
> https://en.wikipedia.org/wiki/Hue_(Hadoop)
> 
> *Taylor Edmiston*
> Blog  | CV
>  | LinkedIn
>  | AngelList
>  | Stack Overflow
> 
> 
> 
> 
>> On Fri, Aug 24, 2018 at 11:41 AM Ash Berlin-Taylor  wrote:
>> 
>> Hi everyone,
>> 
>> So we include references to Cloudera's Hue in the LICENSE file, and
>> mention it again in the NOTICE file saying:
>> 
>>> This product contains a modified portion of 'Hue' developed by Cloudera,
>> Inc.
>> 
>> Does anyone know what this refers to? Is it still the case? Grepping for
>> hue doesn't turn up anything likely looking.
>> 
>> -ash


Re: [RESULT][VOTE] Release Airflow 1.10.0

2018-08-22 Thread Bolke de Bruin
@max

Mine is "bolke"

Cheers

B.

Sent from my iPhone

> On 22 Aug 2018, at 16:13, Driesprong, Fokko  wrote:
> 
> Certainly: https://github.com/apache/incubator-airflow/releases/tag/1.10.0
> 
> Cheers, Fokko
> 
> Op wo 22 aug. 2018 om 15:18 schreef Ash Berlin-Taylor :
> 
>> Could you push the git tag too please Fokko/Bolke?
>> 
>> -ash
>> 
>>> On 22 Aug 2018, at 08:16, Driesprong, Fokko 
>> wrote:
>>> 
>>> Thanks Max,
>>> 
>>> My PyPI ID is Fokko
>>> 
>>> Cheers, Fokko
>>> 
>>> 2018-08-21 22:49 GMT+02:00 Maxime Beauchemin >> :
>>> 
>>>> I can, what's your PyPI ID?
>>>> 
>>>> Max
>>>> 
>>>> On Mon, Aug 20, 2018 at 2:11 PM Driesprong, Fokko >> 
>>>> wrote:
>>>> 
>>>>> Thanks Bolke!
>>>>> 
>>>>> I've just pushed the artifacts to Apache Dist:
>>>>> 
>>>>> https://dist.apache.org/repos/dist/release/incubator/
>>>> airflow/1.10.0-incubating/
>>>>> 
>>>>> I don't have any access to pypi, this means that I'm not able to upload
>>>> the
>>>>> artifacts over there. Anyone in the position to grand me access or
>> upload
>>>>> it to pypi?
>>>>> 
>>>>> Thanks! Cheers, Fokko
>>>>> 
>>>>> 2018-08-20 20:06 GMT+02:00 Bolke de Bruin :
>>>>> 
>>>>>> Hi Guys and Gals,
>>>>>> 
>>>>>> The vote has passed! Apache Airflow 1.10.0 is official.
>>>>>> 
>>>>>> As I am AFK for a while can one of the committers please rename
>>>> according
>>>>>> to the release docs and push it to the relevant locations (pypi and
>>>>> Apache
>>>>>> dist)?
>>>>>> 
>>>>>> Oh and maybe start a quick 1.10.1?
>>>>>> 
>>>>>> Cheers
>>>>>> Bolke
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>> Begin forwarded message:
>>>>>> 
>>>>>>> From: Bolke de Bruin 
>>>>>>> Date: 20 August 2018 at 20:00:28 CEST
>>>>>>> To: gene...@incubator.apache.org, dev@airflow.incubator.apache.org
>>>>>>> Subject: [RESULT][VOTE] Release Airflow 1.10.0
>>>>>>> 
>>>>>>> The vote to release Airflow 1.10.0-incubating, having been open for 8
>>>>>>> days is now closed.
>>>>>>> 
>>>>>>> There were three binding +1s and no -1 votes.
>>>>>>> 
>>>>>>> +1 (binding):
>>>>>>> Justin Mclean
>>>>>>> Jakob Homan
>>>>>>> Hitesh Shah
>>>>>>> 
>>>>>>> The release is approved.
>>>>>>> 
>>>>>>> Thanks to all those who voted.
>>>>>>> 
>>>>>>> Bolke
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> Begin forwarded message:
>>>>>>> 
>>>>>>>> From: Bolke de Bruin 
>>>>>>>> Date: 20 August 2018 at 19:56:23 CEST
>>>>>>>> To: gene...@incubator.apache.org
>>>>>>>> Subject: Re: [VOTE] Release Airflow 1.10.0 (new vote based on rc4)
>>>>>>>> 
>>>>>>>> Appreciated Hitesh. Do you know how to add headers to .MD files?
>>>> There
>>>>>> seems to be no technical standard way[1]. Is there a way to solve this
>>>>>> elegantly?
>>>>>>>> 
>>>>>>>> Cheers
>>>>>>>> Bolke
>>>>>>>> 
>>>>>>>> [1] https://alvinalexander.com/technology/markdown-comments-
>>>>>> syntax-not-in-generated-output
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Sent from my iPhone
>>>>>>>> 
>>>>>>>>> On 20 Aug 2018, at 19:48, Hitesh Shah  wrote:
>>>>>>>>> 
>>>>>>>>> +1 (binding)
>>>>>>>>> 
>>>>>>>>> Ran through the basic checks.
&g

Fwd: [RESULT][VOTE] Release Airflow 1.10.0

2018-08-20 Thread Bolke de Bruin
Hi Guys and Gals,

The vote has passed! Apache Airflow 1.10.0 is official. 

As I am AFK for a while can one of the committers please rename according to 
the release docs and push it to the relevant locations (pypi and Apache dist)?

Oh and maybe start a quick 1.10.1?

Cheers
Bolke

Sent from my iPhone

Begin forwarded message:

> From: Bolke de Bruin 
> Date: 20 August 2018 at 20:00:28 CEST
> To: gene...@incubator.apache.org, dev@airflow.incubator.apache.org
> Subject: [RESULT][VOTE] Release Airflow 1.10.0
> 
> The vote to release Airflow 1.10.0-incubating, having been open for 8
> days is now closed.
> 
> There were three binding +1s and no -1 votes.
> 
> +1 (binding):
> Justin Mclean
> Jakob Homan
> Hitesh Shah
> 
> The release is approved.
> 
> Thanks to all those who voted.
> 
> Bolke
> 
> Sent from my iPhone
> 
> Begin forwarded message:
> 
>> From: Bolke de Bruin 
>> Date: 20 August 2018 at 19:56:23 CEST
>> To: gene...@incubator.apache.org
>> Subject: Re: [VOTE] Release Airflow 1.10.0 (new vote based on rc4)
>> 
>> Appreciated Hitesh. Do you know how to add headers to .MD files? There seems 
>> to be no technical standard way[1]. Is there a way to solve this elegantly?
>> 
>> Cheers
>> Bolke
>> 
>> [1] 
>> https://alvinalexander.com/technology/markdown-comments-syntax-not-in-generated-output
>> 
>> 
>> 
>> Sent from my iPhone
>> 
>>> On 20 Aug 2018, at 19:48, Hitesh Shah  wrote:
>>> 
>>> +1 (binding)
>>> 
>>> Ran through the basic checks.
>>> 
>>> Minor nit which can be fixed in the next release: there are a bunch of
>>> documentation files which could have a license header added (e.g. .md,
>>> .rst, )
>>> 
>>> thanks
>>> Hitesh
>>> 
>>>> On Mon, Aug 20, 2018 at 4:08 AM Bolke de Bruin  wrote:
>>>> 
>>>> Sorry Willem that should be of course. Apologies.
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On 20 Aug 2018, at 13:07, Bolke de Bruin  wrote:
>>>>> 
>>>>> Hi William
>>>>> 
>>>>> You seem to be missing a "4" at the end of the URL? Ah it seems that my
>>>> original email had a quirk. Would you mind using the below?
>>>>> 
>>>>> https://github.com/apache/incubator-airflow/releases/tag/1.10.0rc4
>>>>> 
>>>>> Thanks!
>>>>> Bolke
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On 20 Aug 2018, at 13:03, Willem Jiang  wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> The Git tag cannot be accessed.  I can only get the 404  error there.
>>>>>> 
>>>>>> https://github.com/apache/incubator-airflow/releases/tag/1.10.0rc
>>>>>> 
>>>>>> 
>>>>>> Willem Jiang
>>>>>> 
>>>>>> Twitter: willemjiang
>>>>>> Weibo: 姜宁willem
>>>>>> 
>>>>>>> On Sun, Aug 12, 2018 at 8:25 PM, Bolke de Bruin 
>>>> wrote:
>>>>>>> 
>>>>>>> Hello Incubator PMC’ers,
>>>>>>> 
>>>>>>> The Apache Airflow community has voted and approved the proposal to
>>>> release
>>>>>>> Apache Airflow 1.10.0 (incubating) based on 1.10.0 Release Candidate
>>>> 4. We
>>>>>>> now kindly request the Incubator PMC members to review and vote on this
>>>>>>> incubator release.
>>>>>>> 
>>>>>>> Airflow is a platform to programmatically author, schedule, and monitor
>>>>>>> workflows. Use Airflow to author workflows as directed acyclic graphs
>>>>>>> (DAGs) of tasks. The airflow scheduler executes your tasks on an array
>>>> of
>>>>>>> workers while following the specified dependencies. Rich command line
>>>>>>> utilities make performing complex surgeries on DAGs a snap. The rich
>>>> user
>>>>>>> interface makes it easy to visualize pipelines running in production,
>>>>>>> monitor progress, and troubleshoot issues when needed. When workflows
>>>> are
>>>>>>> defined as code, they become more maintainable, versionable, testable,
>>>> and
>>>>>>> collaborative.
>>>>>>> 
>>>

[RESULT][VOTE] Release Airflow 1.10.0

2018-08-20 Thread Bolke de Bruin
The vote to release Airflow 1.10.0-incubating, having been open for 8
days is now closed.

There were three binding +1s and no -1 votes.

+1 (binding):
Justin Mclean
Jakob Homan
Hitesh Shah

The release is approved.

Thanks to all those who voted.

Bolke

Sent from my iPhone

Begin forwarded message:

> From: Bolke de Bruin 
> Date: 20 August 2018 at 19:56:23 CEST
> To: gene...@incubator.apache.org
> Subject: Re: [VOTE] Release Airflow 1.10.0 (new vote based on rc4)
> 
> Appreciated Hitesh. Do you know how to add headers to .MD files? There seems 
> to be no technical standard way[1]. Is there a way to solve this elegantly?
> 
> Cheers
> Bolke
> 
> [1] 
> https://alvinalexander.com/technology/markdown-comments-syntax-not-in-generated-output
> 
> 
> 
> Sent from my iPhone
> 
>> On 20 Aug 2018, at 19:48, Hitesh Shah  wrote:
>> 
>> +1 (binding)
>> 
>> Ran through the basic checks.
>> 
>> Minor nit which can be fixed in the next release: there are a bunch of
>> documentation files which could have a license header added (e.g. .md,
>> .rst, )
>> 
>> thanks
>> Hitesh
>> 
>>> On Mon, Aug 20, 2018 at 4:08 AM Bolke de Bruin  wrote:
>>> 
>>> Sorry Willem that should be of course. Apologies.
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 20 Aug 2018, at 13:07, Bolke de Bruin  wrote:
>>>> 
>>>> Hi William
>>>> 
>>>> You seem to be missing a "4" at the end of the URL? Ah it seems that my
>>> original email had a quirk. Would you mind using the below?
>>>> 
>>>> https://github.com/apache/incubator-airflow/releases/tag/1.10.0rc4
>>>> 
>>>> Thanks!
>>>> Bolke
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On 20 Aug 2018, at 13:03, Willem Jiang  wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> The Git tag cannot be accessed.  I can only get the 404  error there.
>>>>> 
>>>>> https://github.com/apache/incubator-airflow/releases/tag/1.10.0rc
>>>>> 
>>>>> 
>>>>> Willem Jiang
>>>>> 
>>>>> Twitter: willemjiang
>>>>> Weibo: 姜宁willem
>>>>> 
>>>>>> On Sun, Aug 12, 2018 at 8:25 PM, Bolke de Bruin 
>>> wrote:
>>>>>> 
>>>>>> Hello Incubator PMC’ers,
>>>>>> 
>>>>>> The Apache Airflow community has voted and approved the proposal to
>>> release
>>>>>> Apache Airflow 1.10.0 (incubating) based on 1.10.0 Release Candidate
>>> 4. We
>>>>>> now kindly request the Incubator PMC members to review and vote on this
>>>>>> incubator release.
>>>>>> 
>>>>>> Airflow is a platform to programmatically author, schedule, and monitor
>>>>>> workflows. Use Airflow to author workflows as directed acyclic graphs
>>>>>> (DAGs) of tasks. The airflow scheduler executes your tasks on an array
>>> of
>>>>>> workers while following the specified dependencies. Rich command line
>>>>>> utilities make performing complex surgeries on DAGs a snap. The rich
>>> user
>>>>>> interface makes it easy to visualize pipelines running in production,
>>>>>> monitor progress, and troubleshoot issues when needed. When workflows
>>> are
>>>>>> defined as code, they become more maintainable, versionable, testable,
>>> and
>>>>>> collaborative.
>>>>>> 
>>>>>> After a successful IPMC vote Artifacts will be available at:
>>>>>> 
>>>>>> https://www.apache.org/dyn/closer.cgi/incubator/airflow <
>>>>>> https://www.apache.org/dyn/closer.cgi/incubator/airflow>
>>>>>> 
>>>>>> Public keys are available at:
>>>>>> 
>>>>>> https://www.apache.org/dist/incubator/airflow/ <
>>>>>> https://www.apache.org/dist/incubator/airflow/>
>>>>>> 
>>>>>> apache-airflow-1.10.0rc4+incubating-source.tar.gz
>>>>>> 
>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.
>>>>>> 10.0rc4/apache-airflow-1.10.0rc4+incubating-source.tar.gz <
>>>>>> 
>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/apache-
>>>>>> airflow-1.10.0rc4+

Fwd: [VOTE] Release Airflow 1.10.0 (new vote based on rc4)

2018-08-19 Thread Bolke de Bruin
Hi Folks,

If you have contacts with IPMC members that can vote binding, can you
please reach out to them and ask for help? The vote is stalled at the
moment and without 3 +1's we can't release Airflow 1.10.0

Cheers
Bolke

-- Forwarded message --
From: Bolke de Bruin 
Date: 2018-08-19 16:41 GMT+02:00
Subject: Re: [VOTE] Release Airflow 1.10.0 (new vote based on rc4)
To: gene...@incubator.apache.org


Hi,

We are 7 days into the vote and still lacking 2. Please help in reviewing
it's much appreciated!

Bolke

Sent from my iPhone

> On 15 Aug 2018, at 20:22, Bolke de Bruin  wrote:
>
> Friendly ping. We are missing 2 votes.
>
> Cheers
> Bolke
>
> Sent from my iPhone
>
>> On 14 Aug 2018, at 11:00, Justin Mclean  wrote:
>>
>> Hi,
>>
>> +1 (binding) thanks for fixing up the GPL dependancy issue.
>>
>> I checked:
>> - incubating in name
>> - signatures and hashes good
>> - DISCLAIMER exists
>> - No unexpected binary files
>> - all source files have headers
>>
>> Thanks,
>> Justin
>>
>> -
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>>



-- 

--
Bolke de Bruin
bdbr...@gmail.com


Re: apache-airflow v1.10.0 on PyPi?

2018-08-15 Thread Bolke de Bruin
O shoot apologies. I forgot to push the tags. I can still fix that. Give me an 
hour or 2

Sent from my iPhone

> On 15 Aug 2018, at 18:34, James Meickle  wrote:
> 
> Can we make it a policy going forward to push GH tags for all RCs as part of 
> the release announcement? I deploy via the incubator-airflow repo, but 
> currently it only has tags for up to RC2, which means I have to look up and 
> then specify an ugly-looking commit to deploy an RC :(
> 
>> On Wed, Aug 15, 2018 at 10:54 AM Taylor Edmiston  wrote:
>> Krish - You can also use the RCs before they're released on PyPI if you'd
>> like to help test.  Instead of:
>> 
>> pip install apache-airflow
>> 
>> You can install the 1.10 stable latest with:
>> 
>> pip install git+git://github.com/apache/incubator-airflow.git@v1-10-stable
>> 
>> Or the 1.10 RC tags with eg:
>> 
>> pip install git+git://github.com/apache/incubator-airflow.git@1.10.0rc2
>> 
>> Best,
>> Taylor
>> 
>> *Taylor Edmiston*
>> Blog <https://blog.tedmiston.com/> | CV
>> <https://stackoverflow.com/cv/taylor> | LinkedIn
>> <https://www.linkedin.com/in/tedmiston/> | AngelList
>> <https://angel.co/taylor> | Stack Overflow
>> <https://stackoverflow.com/users/149428/taylor-edmiston>
>> 
>> 
>> On Thu, Aug 9, 2018 at 5:43 PM, Krish Sigler  wrote:
>> 
>> > Got it, will use the mailing list in the future.  Thanks for the info
>> >
>> > On Thu, Aug 9, 2018 at 2:42 PM, Bolke de Bruin  wrote:
>> >
>> > > Hi Kris,
>> > >
>> > > Please use the mailing list for these kind of questions.
>> > >
>> > > Airflow 1.10.0 hasn’t been released yet. We are going through the
>> > motions,
>> > > but it will take a couple of days before it’s official (if all goes
>> > well).
>> > >
>> > > B.
>> > >
>> > > Verstuurd vanaf mijn iPad
>> > >
>> > > Op 9 aug. 2018 om 23:33 heeft Krish Sigler  het
>> > > volgende geschreven:
>> > >
>> > > Hi,
>> > >
>> > > First, I apologize if this is weird.  I saw on the Airflow github page
>> > > that you most recently updated the v1.10.0 changelog, and I found your
>> > > email using the instructions here (https://www.sourcecon.com/
>> > > how-to-find-almost-any-github-users-email-address/).  If that's too
>> > weird
>> > > feel free to tell me and/or ignore this.
>> > >
>> > > I'm emailing because I'm working with the apache-airflow project,
>> > > specifically for setting up pipelines involving GCP packages.  My
>> > > environment uses Python3, and I've been running into the issue outlined
>> > in
>> > > this PR: https://github.com/apache/incubator-airflow/pull/3273.  I
>> > > noticed that the fix is part of the v1.10.0 changelog.
>> > > However, the latest version available on PyPi is 1.9.0.  On the Airflow
>> > > wiki page I read that the project is intended to be updated every ~6
>> > > months, and v1.9.0 was released in January.
>> > >
>> > > So my question, if you're at liberty to tell me, is can I expect v1.10.0
>> > > to be available on PyPi in the near future?  If so then great!  That
>> > would
>> > > solve my package dependency problem.  If not, then I'll look into some
>> > > workaround for my issue.
>> > >
>> > > Thanks,
>> > > Krish
>> > >
>> > >
>> >


[RESULT][VOTE] Airflow 1.10.0rc4

2018-08-12 Thread Bolke de Bruin
Hello,

Apache Airflow (incubating) 1.10.0 (based on RC4) has been accepted.

4 “+1” binding votes received:

- Ash Berlin-Taylor (binding)
- Naik Kaxil (binding)
- Bolke de Bruin (binding)
- Fokko Driesprong (binding)

1 “-1” non-binding:

- Daniel Imberman (non-binding)

My next step is to open a thread with the IPMC.

Cheers,
Bolke

> Begin forwarded message:
> 
> From: Bolke de Bruin 
> Subject: Re: [VOTE] Airflow 1.10.0rc4
> Date: 10 August 2018 at 21:34:40 CEST
> To: dev@airflow.incubator.apache.org
> 
> Thanks @ash
> 
> @daniel: I understand your concern but as k8s is pretty new the impact will 
> be relatively low of having the bug. Let’s have a 1.10.1 quickly after this 
> one (he Fokko ;-) ) and make sure the fixes are targeted for that one. Can 
> you please create JIRA issues for the things  you mentioned?
> 
> B.
> 
> Verstuurd vanaf mijn iPad
> 
>> Op 10 aug. 2018 om 11:27 heeft Ash Berlin-Taylor  het 
>> volgende geschreven:
>> 
>> If we can't score fractions then, yes +1 :)
>> 
>> (And this time sent from the correct email address. I'm really bad at 
>> driving a Mail client it turns out.)
>> 
>> -ash
>> 
>>> On 9 Aug 2018, at 19:22, Bolke de Bruin  wrote:
>>> 
>>> 0.5?? Can we score fractions :-) ? Sorry I missed this Ash. I think Fokko 
>>> really wants a 1.10.1 quickly so better include it then? Can you make your 
>>> vote +1?
>>> 
>>> Thx
>>> Bolke
>>> 
>>>> On 9 Aug 2018, at 14:06, Ash Berlin-Taylor  wrote:
>>>> 
>>>> +0.5 (binding) from me.
>>>> 
>>>> Tested upgrading form 1.9.0 metadb on Py3.5. Timezones behaving themselves 
>>>> on Postgres. Have not tested the Rbac-based UI.
>>>> 
>>>> https://github.com/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153
>>>>  
>>>> <https://github.com/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153>
>>>>  (expanding on UPDATING.md for Logging changes) isn't in the release, but 
>>>> would only affect people who look at the UPDATING.md in the source 
>>>> tarball, which isn't going to be very many - most people will check in the 
>>>> repo and just install via PyPi I'd guess?
>>>> 
>>>> -ash
>>>> 
>>>>> On 8 Aug 2018, at 19:21, Bolke de Bruin  wrote:
>>>>> 
>>>>> Hey all,
>>>>> 
>>>>> I have cut Airflow 1.10.0 RC4. This email is calling a vote on the 
>>>>> release,
>>>>> which will last for 72 hours. Consider this my (binding) +1.
>>>>> 
>>>>> Airflow 1.10.0 RC 4 is available at:
>>>>> 
>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/ 
>>>>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/>
>>>>> 
>>>>> apache-airflow-1.10.0rc4+incubating-source.tar.gz is a source release that
>>>>> comes with INSTALL instructions.
>>>>> apache-airflow-1.10.0rc4+incubating-bin.tar.gz is the binary Python 
>>>>> "sdist"
>>>>> release.
>>>>> 
>>>>> Public keys are available at:
>>>>> 
>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/ 
>>>>> <https://dist.apache.org/repos/dist/release/incubator/airflow/>
>>>>> 
>>>>> The amount of JIRAs fixed is over 700. Please have a look at the 
>>>>> changelog. 
>>>>> Since RC3 the following has been fixed:
>>>>> 
>>>>> [AIRFLOW-2870] Use abstract TaskInstance for migration
>>>>> [AIRFLOW-2859] Implement own UtcDateTime
>>>>> [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook
>>>>> [AIRFLOW-2869] Remove smart quote from default config
>>>>> [AIRFLOW-2857] Fix Read the Docs env
>>>>> 
>>>>> Please note that the version number excludes the `rcX` string as well
>>>>> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
>>>>> to rename the artifact without modifying the artifact checksums when we
>>>>> actually release.
>>>>> 
>>>>> WARNING: Due to licensing requirements you will need to set 
>>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>>>> installing or upgrading. We will try to remove this requirement for the 
>>>>> next release.
>>>>> 
>>>>> Cheers,
>>>>> Bolke
>>>> 
>>> 
>> 



Re: [VOTE] Airflow 1.10.0rc4

2018-08-10 Thread Bolke de Bruin
Thanks @ash

@daniel: I understand your concern but as k8s is pretty new the impact will be 
relatively low of having the bug. Let’s have a 1.10.1 quickly after this one 
(he Fokko ;-) ) and make sure the fixes are targeted for that one. Can you 
please create JIRA issues for the things  you mentioned?

B.

Verstuurd vanaf mijn iPad

> Op 10 aug. 2018 om 11:27 heeft Ash Berlin-Taylor  het 
> volgende geschreven:
> 
> If we can't score fractions then, yes +1 :)
> 
> (And this time sent from the correct email address. I'm really bad at driving 
> a Mail client it turns out.)
> 
> -ash
> 
>> On 9 Aug 2018, at 19:22, Bolke de Bruin  wrote:
>> 
>> 0.5?? Can we score fractions :-) ? Sorry I missed this Ash. I think Fokko 
>> really wants a 1.10.1 quickly so better include it then? Can you make your 
>> vote +1?
>> 
>> Thx
>> Bolke
>> 
>>> On 9 Aug 2018, at 14:06, Ash Berlin-Taylor  wrote:
>>> 
>>> +0.5 (binding) from me.
>>> 
>>> Tested upgrading from 1.9.0 metadb on Py3.5. Timezones behaving themselves 
>>> on Postgres. Have not tested the Rbac-based UI.
>>> 
>>> https://github.com/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153
>>>  
>>> <https://github.com/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153>
>>>  (expanding on UPDATING.md for Logging changes) isn't in the release, but 
>>> would only affect people who look at the UPDATING.md in the source tarball, 
>>> which isn't going to be very many - most people will check in the repo and 
>>> just install via PyPi I'd guess?
>>> 
>>> -ash
>>> 
>>>> On 8 Aug 2018, at 19:21, Bolke de Bruin  wrote:
>>>> 
>>>> Hey all,
>>>> 
>>>> I have cut Airflow 1.10.0 RC4. This email is calling a vote on the release,
>>>> which will last for 72 hours. Consider this my (binding) +1.
>>>> 
>>>> Airflow 1.10.0 RC 4 is available at:
>>>> 
>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/ 
>>>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/>
>>>> 
>>>> apache-airflow-1.10.0rc4+incubating-source.tar.gz is a source release that
>>>> comes with INSTALL instructions.
>>>> apache-airflow-1.10.0rc4+incubating-bin.tar.gz is the binary Python "sdist"
>>>> release.
>>>> 
>>>> Public keys are available at:
>>>> 
>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/ 
>>>> <https://dist.apache.org/repos/dist/release/incubator/airflow/>
>>>> 
>>>> The number of JIRAs fixed is over 700. Please have a look at the 
>>>> changelog. 
>>>> Since RC3 the following has been fixed:
>>>> 
>>>> [AIRFLOW-2870] Use abstract TaskInstance for migration
>>>> [AIRFLOW-2859] Implement own UtcDateTime
>>>> [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook
>>>> [AIRFLOW-2869] Remove smart quote from default config
>>>> [AIRFLOW-2857] Fix Read the Docs env
>>>> 
>>>> Please note that the version number excludes the `rcX` string as well
>>>> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
>>>> to rename the artifact without modifying the artifact checksums when we
>>>> actually release.
>>>> 
>>>> WARNING: Due to licensing requirements you will need to set 
>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>>> installing or upgrading. We will try to remove this requirement for the 
>>>> next release.
>>>> 
>>>> Cheers,
>>>> Bolke
>>> 
>> 
> 


Re: apache-airflow v1.10.0 on PyPi?

2018-08-09 Thread Bolke de Bruin
Hi Kris,

Please use the mailing list for these kind of questions.

Airflow 1.10.0 hasn’t been released yet. We are going through the motions, but 
it will take a couple of days before it’s official (if all goes well).

B.

Sent from my iPad

> On 9 Aug 2018 at 23:33, Krish Sigler wrote:
> 
> Hi,
> 
> First, I apologize if this is weird.  I saw on the Airflow github page that 
> you most recently updated the v1.10.0 changelog, and I found your email using 
> the instructions here 
> (https://www.sourcecon.com/how-to-find-almost-any-github-users-email-address/).
>   If that's too weird feel free to tell me and/or ignore this.
> 
> I'm emailing because I'm working with the apache-airflow project, 
> specifically for setting up pipelines involving GCP packages.  My environment 
> uses Python3, and I've been running into the issue outlined in this PR: 
> https://github.com/apache/incubator-airflow/pull/3273.  I noticed that the 
> fix is part of the v1.10.0 changelog.
> However, the latest version available on PyPi is 1.9.0.  On the Airflow wiki 
> page I read that the project is intended to be updated every ~6 months, and 
> v1.9.0 was released in January.
> 
> So my question, if you're at liberty to tell me, is can I expect v1.10.0 to 
> be available on PyPi in the near future?  If so then great!  That would solve 
> my package dependency problem.  If not, then I'll look into some workaround 
> for my issue.
> 
> Thanks,
> Krish


Re: [VOTE] Airflow 1.10.0rc4

2018-08-09 Thread Bolke de Bruin
0.5?? Can we score fractions :-) ? Sorry I missed this Ash. I think Fokko 
really wants a 1.10.1 quickly so better include it then? Can you make your vote 
+1?

Thx
Bolke

> On 9 Aug 2018, at 14:06, Ash Berlin-Taylor  wrote:
> 
> +0.5 (binding) from me.
> 
> Tested upgrading from 1.9.0 metadb on Py3.5. Timezones behaving themselves on 
> Postgres. Have not tested the Rbac-based UI.
> 
> https://github.com/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153
>  
> <https://github.com/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153>
>  (expanding on UPDATING.md for Logging changes) isn't in the release, but 
> would only affect people who look at the UPDATING.md in the source tarball, 
> which isn't going to be very many - most people will check in the repo and 
> just install via PyPi I'd guess?
> 
> -ash
> 
>> On 8 Aug 2018, at 19:21, Bolke de Bruin  wrote:
>> 
>> Hey all,
>> 
>> I have cut Airflow 1.10.0 RC4. This email is calling a vote on the release,
>> which will last for 72 hours. Consider this my (binding) +1.
>> 
>> Airflow 1.10.0 RC 4 is available at:
>> 
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/ 
>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/>
>> 
>> apache-airflow-1.10.0rc4+incubating-source.tar.gz is a source release that
>> comes with INSTALL instructions.
>> apache-airflow-1.10.0rc4+incubating-bin.tar.gz is the binary Python "sdist"
>> release.
>> 
>> Public keys are available at:
>> 
>> https://dist.apache.org/repos/dist/release/incubator/airflow/ 
>> <https://dist.apache.org/repos/dist/release/incubator/airflow/>
>> 
>> The number of JIRAs fixed is over 700. Please have a look at the changelog. 
>> Since RC3 the following has been fixed:
>> 
>> [AIRFLOW-2870] Use abstract TaskInstance for migration
>> [AIRFLOW-2859] Implement own UtcDateTime
>> [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook
>> [AIRFLOW-2869] Remove smart quote from default config
>> [AIRFLOW-2857] Fix Read the Docs env
>> 
>> Please note that the version number excludes the `rcX` string as well
>> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
>> to rename the artifact without modifying the artifact checksums when we
>> actually release.
>> 
>> WARNING: Due to licensing requirements you will need to set 
>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>> installing or upgrading. We will try to remove this requirement for the 
>> next release.
>> 
>> Cheers,
>> Bolke
> 



Re: Identifying delay between schedule & run instances

2018-08-09 Thread Bolke de Bruin
Hi Vardan,

What do you intend to gain from this metric? There are many factors that 
influence the difference between execution date and start date. You named one 
of them, but there are also functional ones (limits reached, etc.). We are not 
a real-time system, so we have never really aimed at minimizing that 
difference.

B.

Sent from my iPad

> On 9 Aug 2018 at 08:04, vardangupta...@gmail.com wrote:
> 
> 
> 
>> On 2018/08/06 07:07:05, vardangupta...@gmail.com  
>> wrote: 
>> Hi Everyone,
>> 
>> We just wanted to calculate a metric which can talk about what's the 
>> delay(if any) between DAG getting active in scheduler & server and then 
>> tasks of DAG actually getting kicked off (let's suppose start_date was of 1 
>> hour earlier and schedule was every 10 minutes).
>> 
>> Currently task_instance table has execution_date, start_date, end_date & 
>> queued_dttm, we can easily get this metric from the difference of start_date 
>>  & execution_date but in case of back fill, execution_date will be of 
>> previous schedule occurrence and difference of start_date & execution_date 
>> will be skewed, though it will be okay for any future runs to get the delay 
>> in scheduling but for back fills, this number won't be trustworthy, any 
>> suggestions how to smartly identify this metric, may be by knowing somehow 
>> back fill details? Even in DAG table, there is no create_date & update_date 
>> notion which can tell me when this DAG was originally brought to existence?
>> 
>> 
>> Regards,
>> Vardan Gupta
>> 
> Can someone look at the issue?
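
A rough sketch of one way to compute the delay Vardan describes, assuming a
Postgres metadata DB and using the "scheduled__" run_id prefix to filter out
backfills; table and column names follow the stock schema, but treat the query
as illustrative rather than an official metric:

```python
# Approximate scheduler delay per task, excluding backfill runs.
# Scheduler-created runs carry a run_id starting with "scheduled__".
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://airflow:airflow@localhost/airflow")  # assumed DSN

QUERY = text("""
    SELECT ti.dag_id,
           ti.task_id,
           AVG(EXTRACT(EPOCH FROM (ti.start_date - ti.execution_date))) AS avg_delay_s
    FROM task_instance ti
    JOIN dag_run dr
      ON dr.dag_id = ti.dag_id
     AND dr.execution_date = ti.execution_date
    WHERE dr.run_id LIKE 'scheduled__%'
      AND ti.start_date IS NOT NULL
    GROUP BY ti.dag_id, ti.task_id
""")

with engine.connect() as conn:
    for dag_id, task_id, avg_delay_s in conn.execute(QUERY):
        print(dag_id, task_id, avg_delay_s)
```

Keep in mind that for scheduled runs, start_date trails execution_date by
roughly one schedule interval by design, so subtract the interval if you want
pure scheduler lag rather than interval plus lag.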


Re: Plan to change type of dag_id from String to Number?

2018-08-09 Thread Bolke de Bruin
No, we don’t have such a plan. DAG ids are used as a readable identifier. If 
you think it makes such a big difference in speed, please show us numbers from 
Airflow running with and without.

Thx
B.

Sent from my iPad

> On 6 Aug 2018 at 08:31, vardangupta...@gmail.com wrote:
> 
> Hi Everyone,
> 
> Do we have any plan to change type of dag_id from String to Number, this will 
> make queries on metadata more performant, proposal could be generating an 
> auto-incremental value in dag table and this id getting used in rest of the 
> other tables?
> 
> 
> Regards,
> Vardan Gupta
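
Bolke's ask for numbers can be approximated even before touching a real
deployment; a self-contained toy benchmark comparing indexed lookups on a
string key versus an integer surrogate key (sqlite3 here purely for
portability; real measurements should come from MySQL/Postgres under Airflow's
actual query patterns):

```python
# Toy benchmark: string dag_id vs. integer surrogate key lookups.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ti_str (dag_id TEXT, task_id TEXT)")
conn.execute("CREATE TABLE ti_int (dag_pk INTEGER, task_id TEXT)")
conn.execute("CREATE INDEX ix_str ON ti_str (dag_id)")
conn.execute("CREATE INDEX ix_int ON ti_int (dag_pk)")

N = 200000
conn.executemany("INSERT INTO ti_str VALUES (?, ?)",
                 (("dag_%06d" % i, "task") for i in range(N)))
conn.executemany("INSERT INTO ti_int VALUES (?, ?)",
                 ((i, "task") for i in range(N)))

def bench(sql, arg, loops=20000):
    start = time.perf_counter()
    for _ in range(loops):
        conn.execute(sql, (arg,)).fetchall()
    return time.perf_counter() - start

print("string key :", bench("SELECT * FROM ti_str WHERE dag_id = ?", "dag_012345"))
print("integer key:", bench("SELECT * FROM ti_int WHERE dag_pk = ?", 12345))
```

On short keys like these the gap is usually small, which is why concrete
numbers from a real metadata DB matter before changing the schema.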


Re: [VOTE] Airflow 1.10.0rc4

2018-08-08 Thread Bolke de Bruin
3.7 is not an officially supported version. I don't think it warrants a -1. 
Maybe it does warrant a quick 1.10.1. I don't like the scope creep tbh. 

Please reconsider. 

B. 

Sent from my iPhone
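
For anyone puzzled by the egg_info failure quoted below: `async` became a
reserved keyword in Python 3.7, so using it as a variable name is a
parse-time SyntaxError. A minimal reproduction (the `async_extras` rename is
illustrative; the actual fix is in the commits Fokko links):

```python
# Under Python 3.7+ this fails before any code runs, which is exactly the
# "SyntaxError: invalid syntax" seen during "python setup.py egg_info":
src = 'async = ["gevent", "eventlet"]'
try:
    compile(src, "<setup.py>", "exec")
except SyntaxError as exc:
    print("reproduced:", exc)

# The fix is just a rename to any non-reserved identifier:
async_extras = ["gevent", "eventlet"]
```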

> On 8 Aug 2018, at 23:30, Driesprong, Fokko  wrote:
> 
> -1 (binding)
> 
> Sorry Bolke for not checking this earlier. In rc3 we've replaced some of
> the reserved keywords. But still I'm unable to run a simple dag in the 1.10
> rc4 release under Python 3.7:
> 
> MacBook-Pro-van-Fokko:sdh-api-pobt fokkodriesprong$ docker run -e
> SLUGIFY_USES_TEXT_UNIDECODE=yes -t -i python:3.7 /bin/bash -c "pip install
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/apache-airflow-1.10.0rc4+incubating-bin.tar.gz
> && airflow initdb && airflow run example_bash_operator runme_0 2017-07-01"
> Collecting
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/apache-airflow-1.10.0rc4+incubating-bin.tar.gz
>  Downloading
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/apache-airflow-1.10.0rc4+incubating-bin.tar.gz
> (4.4MB)
>100% || 4.4MB 2.5MB/s
>Complete output from command python setup.py egg_info:
>Traceback (most recent call last):
>  File "", line 1, in 
>  File "/tmp/pip-req-build-91ci7xlu/setup.py", line 124
>async = [
>  ^
>SyntaxError: invalid syntax
> 
>
> Command "python setup.py egg_info" failed with error code 1 in
> /tmp/pip-req-build-91ci7xlu/
> You are using pip version 10.0.1, however version 18.0 is available.
> You should consider upgrading via the 'pip install --upgrade pip' command.
> 
> I think we should cherry-pick these three commits on to 1.10 branch:
> 
> *- Remove the async from setup.py*
> https://github.com/apache/incubator-airflow/commit/e38a4e5d3064980abd10b8afa6918ab9f10dd8a2
> 
> *- Upgrade lxml to >4.0 to let it compile with Python 3.7*
> https://github.com/apache/incubator-airflow/commit/5290688ee0576ad167d9622c96cdeb08e9965a20
> lxml is needed for Python 3.7:
> https://github.com/apache/incubator-airflow/pull/3583
> 
> *- Bump tenacy from 4.8.0 to 4.12.0 *
> https://github.com/apache/incubator-airflow/pull/3723/commits/271ea663df72c16aa105017ed5cc87a639846777
> The 4.8 version of Tenacy contains reserved keywords
> https://github.com/apache/incubator-airflow/pull/3723
> 
> After this I'm able to run an example dag using Python3.7: docker run -e
> SLUGIFY_USES_TEXT_UNIDECODE=yes -t -i python:3.7 /bin/bash -c "pip install
> git+https://github.com/Fokko/incubator-airflow.git@v1-10-stable && airflow
> initdb && airflow run example_bash_operator runme_0 2017-07-01"
> 
> [2018-08-08 21:25:57,944] {__init__.py:51} INFO - Using executor
> SequentialExecutor
> [2018-08-08 21:25:58,069] {models.py:258} INFO - Filling up the DagBag from
> /root/airflow/dags
> [2018-08-08 21:25:58,112] {example_kubernetes_operator.py:54} WARNING -
> Could not import KubernetesPodOperator: No module named 'kubernetes'
> [2018-08-08 21:25:58,112] {example_kubernetes_operator.py:55} WARNING -
> Install kubernetes dependencies with: pip install airflow['kubernetes']
> [2018-08-08 21:25:58,155] {cli.py:492} INFO - Running <TaskInstance: example_bash_operator.runme_0 2017-07-01T00:00:00+00:00 [None]> on host
> 31ec1d1554b7
> [2018-08-08 21:25:58,739] {__init__.py:51} INFO - Using executor
> SequentialExecutor
> [2018-08-08 21:25:58,915] {models.py:258} INFO - Filling up the DagBag from
> /root/airflow/dags/example_dags/example_bash_operator.py
> [2018-08-08 21:25:58,987] {example_kubernetes_operator.py:54} WARNING -
> Could not import KubernetesPodOperator: No module named 'kubernetes'
> [2018-08-08 21:25:58,987] {example_kubernetes_operator.py:55} WARNING -
> Install kubernetes dependencies with: pip install airflow['kubernetes']
> [2018-08-08 21:25:59,060] {cli.py:492} INFO - Running <TaskInstance: example_bash_operator.runme_0 2017-07-01T00:00:00+00:00 [None]> on host
> 31ec1d1554b7
> 
> https://github.com/Fokko/incubator-airflow/commits/v1-10-stable
> 
> Still no hard guarantees that 3.7 will be fully supported, but at least it
> runs :-)
> 
> Cheers, Fokko
> 
> 2018-08-08 20:21 GMT+02:00 Bolke de Bruin :
> 
>> Hey all,
>> 
>> I have cut Airflow 1.10.0 RC4. This email is calling a vote on the release,
>> which will last for 72 hours. Consider this my (binding) +1.
>> 
>> Airflow 1.10.0 RC 4 is available at:
>> 
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/ <
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/>
>> 
>> apache-air

[VOTE] Airflow 1.10.0rc4

2018-08-08 Thread Bolke de Bruin
Hey all,

I have cut Airflow 1.10.0 RC4. This email is calling a vote on the release,
which will last for 72 hours. Consider this my (binding) +1.

Airflow 1.10.0 RC 4 is available at:

https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/ 


apache-airflow-1.10.0rc4+incubating-source.tar.gz is a source release that
comes with INSTALL instructions.
apache-airflow-1.10.0rc4+incubating-bin.tar.gz is the binary Python "sdist"
release.

Public keys are available at:

https://dist.apache.org/repos/dist/release/incubator/airflow/ 


The number of JIRAs fixed is over 700. Please have a look at the changelog. 
Since RC3 the following has been fixed:

[AIRFLOW-2870] Use abstract TaskInstance for migration
[AIRFLOW-2859] Implement own UtcDateTime
[AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook
[AIRFLOW-2869] Remove smart quote from default config
[AIRFLOW-2857] Fix Read the Docs env

Please note that the version number excludes the `rcX` string as well
as the "+incubating" string, so it's now simply 1.10.0. This will allow us
to rename the artifact without modifying the artifact checksums when we
actually release.

WARNING: Due to licensing requirements you will need to set 
 SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
installing or upgrading. We will try to remove this requirement for the 
next release.

Cheers,
Bolke
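
Since the note above points out that renaming the artifact must not change its
checksum, a quick way to verify a downloaded candidate is to hash it yourself
and compare against the published digest. The expected value below is a
placeholder; read it from the checksum (.sha512) file published next to the
artifact:

```python
# Verify a release artifact against its published SHA-512 digest.
import hashlib

def sha512sum(path):
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "..."  # paste from the artifact's .sha512 file
actual = sha512sum("apache-airflow-1.10.0rc4+incubating-source.tar.gz")
print("OK" if actual == expected else "MISMATCH")
```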

Re: [VOTE] Airflow 1.10.0rc3

2018-08-07 Thread Bolke de Bruin
George 

Thanks for the report. Can you dig in a bit further? You mention that alembic 
picks up the current version (1.8 in your case) of TaskInstance. That seems 
impossible if you installed Airflow by pip or setup.py first. 

Cheers
Bolke

Sent from my iPhone
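
For context on the class of bug being discussed: a migration that imports the
live `TaskInstance` model operates on whatever columns the installed code
defines (1.8 in George's case), not the schema the revision was written for.
The usual remedy, which the "abstract TaskInstance" title of AIRFLOW-2870
points at, is to declare a minimal table object inside the migration itself; a
hedged sketch, not the actual patch:

```python
# Inside an alembic revision: pin the migration to the columns it touches
# instead of importing airflow.models.TaskInstance.
import sqlalchemy as sa
from alembic import op

task_instance = sa.table(
    "task_instance",
    sa.column("task_id", sa.String),
    sa.column("dag_id", sa.String),
    sa.column("execution_date", sa.DateTime),
    sa.column("state", sa.String),
)

def upgrade():
    # Example statement against the abstract table; the real migration's
    # operations differ.
    op.execute(
        task_instance.update()
        .where(task_instance.c.state == "shutdown")
        .values(state="failed")
    )
```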

> On 8 Aug 2018, at 01:07, George Leslie-Waksman  wrote:
> 
> We just tried to upgrade a 1.8.1 install to 1.10rc3 and ran into a critical
> error on alembic migration execution. I have captured the issue in JIRA:
> https://issues.apache.org/jira/browse/AIRFLOW-2870
> 
> I would consider this a critical blocker for release because it hard blocks
> upgrading.
> 
> George
> 
>> On Tue, Aug 7, 2018 at 7:58 AM Bolke de Bruin  wrote:
>> 
>> Done. When I roll rc4 it will be part of it.
>> 
>> 
>>> On 7 Aug 2018, at 16:26, Naik Kaxil  wrote:
>>> 
>>> @bolke Can we also include the following commit to 1.10 release as we
>> would need this commit to generate docs at ReadTheDocs?
>>> 
>>> -
>> https://github.com/apache/incubator-airflow/commit/8af0aa96bfe3caa51d67ab393db069d37b0c4169
>>> 
>>> Regards,
>>> Kaxil
>>> 
>>> On 06/08/2018, 14:59, "James Meickle" 
>> wrote:
>>> 
>>>   Not a vote, but a comment: it might be worth noting that the new
>>>   environment variable is also required if you have any Airflow plugin
>> test
>>>   suites that install Airflow as part of their dependencies. In my
>> case, I
>>>   had to set the new env var outsidfe of tox and add this:
>>> 
>>>   ```
>>>   [testenv]
>>>   passenv = SLUGIFY_USES_TEXT_UNIDECODE
>>>   ```
>>> 
>>>   (`setenv` did not work as that provides env vars at runtime but not
>>>   installtime, as far as I can tell.)
>>> 
>>> 
>>>   On Sun, Aug 5, 2018 at 5:20 PM Bolke de Bruin 
>> wrote:
>>> 
>>>> +1 :-)
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor <
>>>> ash_airflowl...@firemirror.com> wrote:
>>>>> 
>>>>> Yup, just worked out the same thing.
>>>>> 
>>>>> I think as "punishment" for me finding bugs so late in two RCs (this,
>>>> and 1.9) I should run the release for the next release.
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 5 Aug 2018, at 22:05, Bolke de Bruin  wrote:
>>>>>> 
>>>>>> Yeah I figured it out. Originally i was using a different
>>>> implementation of UTCDateTime, but that was unmaintained. I switched,
>> but
>>>> this version changed or has a different contract. While it transforms on
>>>> storing to UTC it does not so when it receives timezone aware fields
>> from
>>>> the db. Hence the issue.
>>>>>> 
>>>>>> I will prepare a PR that removes the dependency and implements our own
>>>> extension of DateTime. Probably tomorrow.
>>>>>> 
>>>>>> Good catch! Just in time :-(.
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <
>>>> ash_airflowl...@firemirror.com> wrote:
>>>>>>> 
>>>>>>> Entirely possible, though I wasn't even dealing with the scheduler -
>>>> the issue I was addressing was entirely in the webserver for a
>> pre-existing
>>>> Task Instance.
>>>>>>> 
>>>>>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears
>>>> that isn't working right/ as expected. This line:
>>>> 
>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>>>> doens't look right for us - as you mentioned the TZ is set to something
>>>> (rather than having no TZ value).
>>>>>>> 
>>>>>>> Some background on how Pq handles TZs. It always returns DTs in the
>> TZ
>>>> of the connection. I'm not sure if this is unique to postgres or if
>> other
>>>> DBs behave the same.
>>>>>>> 
>>>>>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time
>>>> zone;
>>>>>>>  timestamptz
>>>>>>> 
>>>>>>> 2018-08-03 01:00:00+01
>>>>>>> 
>>>>>>> 

Re: [VOTE] Airflow 1.10.0rc3

2018-08-07 Thread Bolke de Bruin
Rc4 will, but note 3.7 is not officially supported. 

Sent from my iPhone

> On 8 Aug 2018, at 07:09, Yongjun Park  wrote:
> 
> I tried to install this version of airflow with python 3.7.
> There was still `async` keyword in setup.py.
> 
> I think it should include AIRFLOW-2713
> <https://issues.apache.org/jira/browse/AIRFLOW-2713> for supporting python
> 3.7
> 
> Regards,
> Yongjun
> 
> On Wed, 8 Aug 2018 at 08:08, George Leslie-Waksman wrote:
> 
>> We just tried to upgrade a 1.8.1 install to 1.10rc3 and ran into a critical
>> error on alembic migration execution. I have captured the issue in JIRA:
>> https://issues.apache.org/jira/browse/AIRFLOW-2870
>> 
>> I would consider this a critical blocker for release because it hard blocks
>> upgrading.
>> 
>> George
>> 
>>> On Tue, Aug 7, 2018 at 7:58 AM Bolke de Bruin  wrote:
>>> 
>>> Done. When I roll rc4 it will be part of it.
>>> 
>>> 
>>>> On 7 Aug 2018, at 16:26, Naik Kaxil  wrote:
>>>> 
>>>> @bolke Can we also include the following commit to 1.10 release as we
>>> would need this commit to generate docs at ReadTheDocs?
>>>> 
>>>> -
>>> 
>> https://github.com/apache/incubator-airflow/commit/8af0aa96bfe3caa51d67ab393db069d37b0c4169
>>>> 
>>>> Regards,
>>>> Kaxil
>>>> 
>>>> On 06/08/2018, 14:59, "James Meickle" > .INVALID>
>>> wrote:
>>>> 
>>>>   Not a vote, but a comment: it might be worth noting that the new
>>>>   environment variable is also required if you have any Airflow plugin
>>> test
>>>>   suites that install Airflow as part of their dependencies. In my
>>> case, I
>>>>   had to set the new env var outsidfe of tox and add this:
>>>> 
>>>>   ```
>>>>   [testenv]
>>>>   passenv = SLUGIFY_USES_TEXT_UNIDECODE
>>>>   ```
>>>> 
>>>>   (`setenv` did not work as that provides env vars at runtime but not
>>>>   installtime, as far as I can tell.)
>>>> 
>>>> 
>>>>   On Sun, Aug 5, 2018 at 5:20 PM Bolke de Bruin 
>>> wrote:
>>>> 
>>>>> +1 :-)
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor <
>>>>> ash_airflowl...@firemirror.com> wrote:
>>>>>> 
>>>>>> Yup, just worked out the same thing.
>>>>>> 
>>>>>> I think as "punishment" for me finding bugs so late in two RCs (this,
>>>>> and 1.9) I should run the release for the next release.
>>>>>> 
>>>>>> -ash
>>>>>> 
>>>>>>> On 5 Aug 2018, at 22:05, Bolke de Bruin  wrote:
>>>>>>> 
>>>>>>> Yeah I figured it out. Originally i was using a different
>>>>> implementation of UTCDateTime, but that was unmaintained. I switched,
>>> but
>>>>> this version changed or has a different contract. While it transforms
>> on
>>>>> storing to UTC it does not so when it receives timezone aware fields
>>> from
>>>>> the db. Hence the issue.
>>>>>>> 
>>>>>>> I will prepare a PR that removes the dependency and implements our
>> own
>>>>> extension of DateTime. Probably tomorrow.
>>>>>>> 
>>>>>>> Good catch! Just in time :-(.
>>>>>>> 
>>>>>>> B.
>>>>>>> 
>>>>>>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <
>>>>> ash_airflowl...@firemirror.com> wrote:
>>>>>>>> 
>>>>>>>> Entirely possible, though I wasn't even dealing with the scheduler
>> -
>>>>> the issue I was addressing was entirely in the webserver for a
>>> pre-existing
>>>>> Task Instance.
>>>>>>>> 
>>>>>>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It
>> appears
>>>>> that isn't working right/ as expected. This line:
>>>>> 
>>> 
>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>>>>> doens't look right for us - as you mentioned the TZ is set to
>> something
>>>>> (rather than having no TZ value).
>>>>>>>> 
&

Re: Airflow's JS code (and dependencies) manageable via npm and webpack

2018-08-07 Thread Bolke de Bruin
Oh, that’s great stuff. Hopefully it works nicely with Keycloak. Did you try it 
by any chance? Why not upstream this?

> On 7 Aug 2018, at 11:55, Ravi Kotecha  wrote:
> 
> Slightly OT but we've done some work on auth with FAB that might be useful
> for anyone using OpenIDConnect and www_rbac here:
> https://github.com/ministryofjustice/fab-oidc
> 
> It could serve as a template if people are doing something different too
> 
> On Mon, Jul 23, 2018 at 6:50 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
> 
>> Are there any gaps on the new vs old UI at this point?
>> 
>> For many that upgrade will require some work to set up authentication on
>> the new UI. This is fairly well documented in FAB.
>> 
>> Max
>> 
>> On Mon, Jul 23, 2018 at 3:08 AM Bolke de Bruin  wrote:
>> 
>>> I think it should be removed now. 1.10.X should be the last release series 
>>> that supports the old www. Do we need to vote on this?
>>> 
>>> Great work Verdan!
>>> 
>>> Sent from my iPad
>>> 
>>>> On 23 Jul 2018 at 10:23, Driesprong, Fokko wrote:
>>>> 
>>>> Nice work Verdan.
>>>> 
>>>> The frontend really needed some love, thank you for picking this up.
>>> Maybe
>>>> we should also think deprecating the old www. Keeping both of the UI's
>> is
>>>> something that takes a lot of time. Maybe after the release of 1.10 we
>>> can
>>>> think of moving to Airflow 2.0, and removing the old UI.
>>>> 
>>>> 
>>>> Cheers, Fokko
>>>> 
>>>> 2018-07-23 10:02 GMT+02:00 Naik Kaxil :
>>>> 
>>>>> Awesome. Thanks @Verdan
>>>>> 
>>>>> On 23/07/2018, 07:58, "Verdan Mahmood" 
>>> wrote:
>>>>> 
>>>>>   Heads-up!! This frontend change has been merged in master branch
>>>>> recently.
>>>>>   This will impact the users working on Airflow RBAC UI only. That
>>> means:
>>>>> 
>>>>>   *If you are a contributor/developer of Apache Airflow:*
>>>>>   You'll need to install and build the frontend packages if you want
>> to
>>>>> run
>>>>>   the web UI.
>>>>>   Please make sure to read the new section, "Setting up the node /
>> npm
>>>>>   javascript environment"
>>>>>   <https://github.com/apache/incubator-airflow/blob/master/
>>>>> CONTRIBUTING.md#setting-up-the-node--npm-javascript-
>>>>> environment-only-for-www_rbac>
>>>>> 
>>>>>   in CONTRIBUTING.md
>>>>> 
>>>>>   *If you are using Apache Airflow in your production environment:*
>>>>>   Nothing will impact you, as every new build of Apache Airflow will
>>>>> come up
>>>>>   with pre-built dependencies.
>>>>> 
>>>>>   Please let me know if you have any questions. Thank you
>>>>> 
>>>>>   Best,
>>>>>   *Verdan Mahmood*
>>>>> 
>>>>> 
>>>>>   On Sun, Jul 15, 2018 at 6:52 PM Maxime Beauchemin <
>>>>>   maximebeauche...@gmail.com> wrote:
>>>>> 
>>>>>> Glad to see this is happening!
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>> On Mon, Jul 9, 2018 at 6:37 AM Ash Berlin-Taylor <
>>>>>> ash_airflowl...@firemirror.com> wrote:
>>>>>> 
>>>>>>> Great! Thanks for doing this. I've left some review comments on
>>>>> your PR.
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 9 Jul 2018, at 11:45, Verdan Mahmood <
>>>>> verdan.mahm...@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hey Guys,
>>>>>>>> 
>>>>>>>> In an effort to simplify the JS dependencies of Airflow, I've
>>>>>>>> introduced npm and webpack for the package management. For now, it
>>>>>>>> only implements this in the www_rbac version of the web server.
>>>>>>>> 

Re: [VOTE] Airflow 1.10.0rc3

2018-08-07 Thread Bolke de Bruin
Done. When I roll rc4 it will be part of it.


> On 7 Aug 2018, at 16:26, Naik Kaxil  wrote:
> 
> @bolke Can we also include the following commit to 1.10 release as we would 
> need this commit to generate docs at ReadTheDocs?
> 
> - 
> https://github.com/apache/incubator-airflow/commit/8af0aa96bfe3caa51d67ab393db069d37b0c4169
> 
> Regards,
> Kaxil
> 
> On 06/08/2018, 14:59, "James Meickle"  
> wrote:
> 
>Not a vote, but a comment: it might be worth noting that the new
>environment variable is also required if you have any Airflow plugin test
>suites that install Airflow as part of their dependencies. In my case, I
>had to set the new env var outsidfe of tox and add this:
> 
>```
>[testenv]
>passenv = SLUGIFY_USES_TEXT_UNIDECODE
>```
> 
>(`setenv` did not work as that provides env vars at runtime but not
>    installtime, as far as I can tell.)
> 
> 
>On Sun, Aug 5, 2018 at 5:20 PM Bolke de Bruin  wrote:
> 
>> +1 :-)
>> 
>> Sent from my iPhone
>> 
>>> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor <
>> ash_airflowl...@firemirror.com> wrote:
>>> 
>>> Yup, just worked out the same thing.
>>> 
>>> I think as "punishment" for me finding bugs so late in two RCs (this,
>> and 1.9) I should run the release for the next release.
>>> 
>>> -ash
>>> 
>>>> On 5 Aug 2018, at 22:05, Bolke de Bruin  wrote:
>>>> 
>>>> Yeah I figured it out. Originally i was using a different
>> implementation of UTCDateTime, but that was unmaintained. I switched, but
>> this version changed or has a different contract. While it transforms on
>> storing to UTC it does not so when it receives timezone aware fields from
>> the db. Hence the issue.
>>>> 
>>>> I will prepare a PR that removes the dependency and implements our own
>> extension of DateTime. Probably tomorrow.
>>>> 
>>>> Good catch! Just in time :-(.
>>>> 
>>>> B.
>>>> 
>>>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <
>> ash_airflowl...@firemirror.com> wrote:
>>>>> 
>>>>> Entirely possible, though I wasn't even dealing with the scheduler -
>> the issue I was addressing was entirely in the webserver for a pre-existing
>> Task Instance.
>>>>> 
>>>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears
>> that isn't working right/ as expected. This line:
>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>> doens't look right for us - as you mentioned the TZ is set to something
>> (rather than having no TZ value).
>>>>> 
>>>>> Some background on how Pq handles TZs. It always returns DTs in the TZ
>> of the connection. I'm not sure if this is unique to postgres or if other
>> DBs behave the same.
>>>>> 
>>>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time
>> zone;
>>>>>   timestamptz
>>>>> 
>>>>> 2018-08-03 01:00:00+01
>>>>> 
>>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>>>   timestamptz
>>>>> 
>>>>> 2018-08-03 01:00:00+01
>>>>> 
>>>>> The server will always return TZs in the connection timezone.
>>>>> 
>>>>> postgres=# set timezone=utc;
>>>>> SET
>>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>>>   timestamptz
>>>>> 
>>>>> 2018-08-03 00:00:00+00
>>>>> (1 row)
>>>>> 
>>>>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>>>>>   timestamptz
>>>>> 
>>>>> 2018-08-03 00:00:00+00
>>>>> (1 row)
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 5 Aug 2018, at 21:28, Bolke de Bruin  wrote:
>>>>>> 
>>>>>> This is the issue:
>>>>>> 
>>>>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE:
>> 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>>>>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-03 02:00:00+02:00:
>> schedul

Re: Airflow's JS code (and dependencies) manageable via npm and webpack

2018-08-07 Thread Bolke de Bruin
After the release of 1.10 we will start removing the old “www”.

B.

> On 7 Aug 2018, at 11:03, Verdan Mahmood  wrote:
> 
> Hi Dave,
> 
> AceJS is being used only in "www" version of webserver.
> "www_rbac" does not use AceJS, and we implemented npm and webpack only for
> www_rbac.
> 
> Best,
> *Verdan Mahmood*
> 
> On Mon, Aug 6, 2018 at 11:49 PM dave.allan...@gmail.com <
> dave.allan...@gmail.com> wrote:
> 
>> Hi Verdan
>> 
>> Which version of Ace js was used? I couldn't track that
>> 
>> Cheers
>> David
>> 
>> On 2018/07/09 10:45:00, Verdan Mahmood  wrote:
>>> Hey Guys,
>>> 
>>> In an effort to simplify the JS dependencies of Airflow, I've introduced
>>> npm and webpack for the package management. For now, it only implements
>>> this in the www_rbac version of the web server.
>>> 
>>> Pull Request: https://github.com/apache/incubator-airflow/pull/3572
>>> 
>>> The problem with the existing frontend (JS) code of Airflow is that most
>>> of the custom JS is written within the html files, using the Flask's
>>> (Jinja) variables in that JS. The next step of this effort would be to
>>> extract that custom JS code into separate JS files, use the dependencies
>>> in those files using require or import, and introduce the JS automated
>>> test suite eventually. (At the moment, I'm simply using the
>>> CopyWebPackPlugin to copy the required dependencies for use.)
>>> 
>>> There are also some dependencies which are directly modified in the
>>> codebase or are outdated. I couldn't find the correct npm versions of
>>> those libraries (dagre-d3.js and gantt-chart-d3v2.js). Apparently the
>>> dagre-d3.js that we are using is from a gist or is a very old version not
>>> supported with webpack 4, while the gantt-chart-d3v2 has been modified
>>> according to Airflow's requirements, I believe. Used the existing
>>> libraries for now.
>>> 
>>> I am currently working in a separate branch to upgrade the DagreD3
>>> library, and updating the custom JS related to DagreD3 accordingly.
>>> 
>>> This PR also introduces the pypi_push.sh
>>> <https://github.com/apache/incubator-airflow/pull/3572/files#diff-8fae684cdcc8cc8df2232c8df16f64cb>
>>> script that will generate all the JS statics before creating and
>>> uploading the package.
>>> 
>>> Please let me know if you guys have any questions or suggestions and I'd
>>> be happy to answer that.
>>> 
>>> Best,
>>> *Verdan Mahmood*
>>> (+31) 655 576 560
>>> 
>> 



Re: Deploy Airflow on Kubernetes using Airflow Operator

2018-08-05 Thread Bolke de Bruin
Really awesome stuff. We are in the process of moving over to k8s for Airflow 
(on prem though) and this is really helpful.

B.

Sent from my iPad

> On 3 Aug 2018 at 23:35, Barni Seetharaman wrote:
> 
> Hi
> 
> We at Google just open-sourced a Kubernetes custom controller (also called
> operator) to make deploying and managing Airflow on kubernetes simple.
> The operator pattern is a power abstraction in kubernetes.
> Please watch this repo (in the process of adding docs) for further updates.
> 
> https://github.com/GoogleCloudPlatform/airflow-operator
> 
> Do reach out if you have any questions.
> 
> Also created a channel in kubernetes slack  (#airflow-operator)
>  for any
> discussions specific to Airflow on Kubernetes (including Daniel's
> Kubernetes Executor, Kuberenetes operator and this custom controller also
> called kuberntes airflow operator).
> 
> regs
> Barni


Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
+1 :-)

Sent from my iPhone

> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor  
> wrote:
> 
> Yup, just worked out the same thing.
> 
> I think as "punishment" for me finding bugs so late in two RCs (this, and 
> 1.9) I should run the release for the next release.
> 
> -ash
> 
>> On 5 Aug 2018, at 22:05, Bolke de Bruin  wrote:
>> 
>> Yeah I figured it out. Originally I was using a different implementation of 
>> UTCDateTime, but that was unmaintained. I switched, but this version changed 
>> or has a different contract. While it transforms on storing to UTC it does 
>> not do so when it receives timezone aware fields from the db. Hence the issue.
>> 
>> I will prepare a PR that removes the dependency and implements our own 
>> extension of DateTime. Probably tomorrow.
>> 
>> Good catch! Just in time :-(.
>> 
>> B.
>> 
>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> Entirely possible, though I wasn't even dealing with the scheduler - the 
>>> issue I was addressing was entirely in the webserver for a pre-existing 
>>> Task Instance.
>>> 
>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that 
>>> isn't working right/ as expected. This line: 
>>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>>>  doens't look right for us - as you mentioned the TZ is set to something 
>>> (rather than having no TZ value).
>>> 
>>> Some background on how Pq handles TZs. It always returns DTs in the TZ of 
>>> the connection. I'm not sure if this is unique to postgres or if other DBs 
>>> behave the same.
>>> 
>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>>>timestamptz
>>> 
>>> 2018-08-03 01:00:00+01
>>> 
>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>timestamptz
>>> 
>>> 2018-08-03 01:00:00+01
>>> 
>>> The server will always return TZs in the connection timezone.
>>> 
>>> postgres=# set timezone=utc;
>>> SET
>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>timestamptz
>>> 
>>> 2018-08-03 00:00:00+00
>>> (1 row)
>>> 
>>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>>>timestamptz
>>> 
>>> 2018-08-03 00:00:00+00
>>> (1 row)
>>> 
>>> 
>>> 
>>> 
>>> -ash
>>> 
>>>> On 5 Aug 2018, at 21:28, Bolke de Bruin  wrote:
>>>> 
>>>> This is the issue:
>>>> 
>>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
>>>> 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
>>>> example_http_operator @ 2018-08-03 02:00:00+02:00: 
>>>> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>>> 
>>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
>>>> 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, 
>>>> name=None)
>>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
>>>> example_http_operator @ 2018-08-04 02:00:00+02:00: 
>>>> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>>> 
>>>> Notice at line 1+2: that the next run date is correctly in UTC but from 
>>>> the DB it gets a +2. At the next bit (3+4) we get a 
>>>> psycopg2.tz.FixedOffsetTimezone which should be set to UTC according to 
>>>> the specs of https://github.com/spoqa/sqlalchemy-utc 
>>>> <https://github.com/spoqa/sqlalchemy-utc> , but it isn’t. 
>>>> 
>>>> So changing your setting of the DB to UTC fixes the symptom but not the 
>>>> cause.
>>>> 
>>>> B.
>>>> 
>>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor 
>>>>>  wrote:
>>>>> 
>>>>> Sorry for being terse before.
>>>>> 
>>>>> So the issue is that the ts loaded from the DB is not in UTC, it's in 
>>>>> GB/+01 (the default of the DB server)
>>>>> 
>>>>> For me, on a currently running 1.9 (no TZ) db:
>>>>> 
>>>>> airflow=# select * from task_instance;
>

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
Yeah I figured it out. Originally I was using a different implementation of 
UTCDateTime, but that was unmaintained. I switched, but this version changed or 
has a different contract. While it transforms on storing to UTC it does not do 
so when it receives timezone aware fields from the db. Hence the issue.

I will prepare a PR that removes the dependency and implements our own 
extension of DateTime. Probably tomorrow.

Good catch! Just in time :-(.

B.
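
For readers following along, the "own extension of DateTime" can be built as a
SQLAlchemy TypeDecorator that normalizes in both directions: to UTC on the way
into the database, and attaching or applying UTC on the way out, which is the
part of the contract sqlalchemy-utc turned out not to honor. A sketch of the
idea, not the exact code that landed for AIRFLOW-2859:

```python
# UTC-normalizing DateTime column type. Aware values are converted to UTC on
# bind; result values get UTC attached (if naive) or applied (if the driver
# returned a session-timezone offset such as FixedOffsetTimezone(+120)).
import datetime

from sqlalchemy.types import DateTime, TypeDecorator


class UtcDateTime(TypeDecorator):
    impl = DateTime(timezone=True)

    def process_bind_param(self, value, dialect):
        if value is not None and value.tzinfo is not None:
            value = value.astimezone(datetime.timezone.utc)
        return value

    def process_result_value(self, value, dialect):
        if value is None:
            return None
        if value.tzinfo is None:
            return value.replace(tzinfo=datetime.timezone.utc)
        return value.astimezone(datetime.timezone.utc)
```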

> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor  
> wrote:
> 
> Entirely possible, though I wasn't even dealing with the scheduler - the 
> issue I was addressing was entirely in the webserver for a pre-existing Task 
> Instance.
> 
> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that 
> isn't working right/ as expected. This line: 
> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>  doens't look right for us - as you mentioned the TZ is set to something 
> (rather than having no TZ value).
> 
> Some background on how Pq handles TZs. It always returns DTs in the TZ of the 
> connection. I'm not sure if this is unique to postgres or if other DBs behave 
> the same.
> 
> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>  timestamptz
> 
> 2018-08-03 01:00:00+01
> 
> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>  timestamptz
> 
> 2018-08-03 01:00:00+01
> 
> The server will always return TZs in the connection timezone.
> 
> postgres=# set timezone=utc;
> SET
> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>  timestamptz
> 
> 2018-08-03 00:00:00+00
> (1 row)
> 
> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>  timestamptz
> 
> 2018-08-03 00:00:00+00
> (1 row)
> 
> 
> 
> 
> -ash
> 
>> On 5 Aug 2018, at 21:28, Bolke de Bruin  wrote:
>> 
>> This is the issue:
>> 
>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
>> 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
>> example_http_operator @ 2018-08-03 02:00:00+02:00: 
>> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>> 
>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
>> 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
>> example_http_operator @ 2018-08-04 02:00:00+02:00: 
>> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>> 
>> Notice at line 1+2: that the next run date is correctly in UTC but from the 
>> DB it gets a +2. At the next bit (3+4) we get a 
>> psycopg2.tz.FixedOffsetTimezone which should be set to UTC according to the 
>> specs of https://github.com/spoqa/sqlalchemy-utc 
>> <https://github.com/spoqa/sqlalchemy-utc> , but it isn’t. 
>> 
>> So changing your setting of the DB to UTC fixes the symptom but not the 
>> cause.
>> 
>> B.
>> 
>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> Sorry for being terse before.
>>> 
>>> So the issue is that the ts loaded from the DB is not in UTC, it's in 
>>> GB/+01 (the default of the DB server)
>>> 
>>> For me, on a currently running 1.9 (no TZ) db:
>>> 
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-23 00:00:00
>>> 
>>> This date time appears in the log url, and the path it looks at on S3 is 
>>> 
>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>> 
>>> If my postgres server has a default timezone of GB (which the one running 
>>> on my laptop does), and I then apply the migration then it is converted to 
>>> that local time.
>>> 
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-23 01:00:00+01
>>> 
>>> airflow=# set timezone=UTC;
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-23 00:00:00+00
>>> 
>>> 
>>> This is all okay so far. The migration has kept the column at the same 
>>> moment in time.
>>> 
>>> The issue come when the UI tries to display logs for this old task: because 
>>> the timezone of the connection is not UTC, PG returns a date with a +01 TZ. 
>>> Thus after the migration

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
This is the issue:

[2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
00:00:00+00:00 tzinfo: <Timezone [UTC]>
[2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
example_http_operator @ 2018-08-03 02:00:00+02:00: 
scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>

[2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
[2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
example_http_operator @ 2018-08-04 02:00:00+02:00: 
scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>

Notice at line 1+2: that the next run date is correctly in UTC but from the DB 
it gets a +2. At the next bit (3+4) we get a psycopg2.tz.FixedOffsetTimezone 
which should be set to UTC according to the specs of 
https://github.com/spoqa/sqlalchemy-utc 
<https://github.com/spoqa/sqlalchemy-utc> , but it isn’t. 

So changing your setting of the DB to UTC fixes the symptom but not the cause.

B.
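
The cause-level fix hinted at here, pinning every connection to UTC so the
server default cannot leak into results, is a one-liner with a SQLAlchemy
connect event; a sketch for Postgres (MySQL would use SET time_zone =
'+00:00'), with the DSN as a placeholder:

```python
# Force every pooled connection to UTC regardless of the DB server's default,
# so timestamptz values always come back with a +00 offset.
from sqlalchemy import create_engine, event

engine = create_engine("postgresql://airflow:airflow@localhost/airflow")

@event.listens_for(engine, "connect")
def _set_utc(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute("SET TIME ZONE 'UTC'")
    cursor.close()
```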

> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor  
> wrote:
> 
> Sorry for being terse before.
> 
> So the issue is that the ts loaded from the DB is not in UTC, it's in GB/+01 
> (the default of the DB server)
> 
> For me, on a currently running 1.9 (no TZ) db:
> 
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-23 00:00:00
> 
> This date time appears in the log url, and the path it looks at on S3 is 
> 
> .../example_http_operator/2018-07-23T00:00:00/1.log
> 
> If my postgres server has a default timezone of GB (which the one running on 
> my laptop does), and I then apply the migration then it is converted to that 
> local time.
> 
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-23 01:00:00+01
> 
> airflow=# set timezone=UTC;
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-23 00:00:00+00
> 
> 
> This is all okay so far. The migration has kept the column at the same moment 
> in time.
> 
> The issue come when the UI tries to display logs for this old task: because 
> the timezone of the connection is not UTC, PG returns a date with a +01 TZ. 
> Thus after the migration this old task tries to look for a log file of
> 
> .../example_http_operator/2018-07-23T01:00:00/1.log
> 
> which doesn't exist - it's changed the time it has rendered from midnight (in 
> v1.9) to 1am (in v1.10).
> 
> (This is with my change to log_filename_template from UPDATING.md in my other 
> branch)
> 
> Setting the timezone to UTC per connection means the behaviour of Airflow 
> doesn't change depending on how the server is configured.
> 
> -ash
> 
>> On 5 Aug 2018, at 20:58, Bolke de Bruin  wrote:
>> 
>> Digging in a bit further. 
>> 
>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>> 
>> is the format
>> 
>> ts = execution_date.isoformat and should be in UTC afaik.
>> 
>> something is weird tbh.
>> 
>> B.
>> 
>> 
>>> On 5 Aug 2018, at 21:32, Bolke de Bruin  wrote:
>>> 
>>> Ash,
>>> 
>>> Reading your proposed changes on your “set-timezone-to-utc” branch and 
>>> below analysis, I am not sure what you are perceiving as an issue.
>>> 
>>> For conversion we assume everything is stored in UTC and in a naive format. 
>>> Conversion then adds the timezone information. This results in the following
>>> 
>>> postgres timezone = “Europe/Amsterdam”
>>> 
>>> 
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-27 02:00:00+02
>>> 
>>> airflow=# set timezone=UTC;
>>> airflow=# select * from task_instance;
>>> get_op| example_http_operator | 2018-07-27 00:00:00+00
>>> 
>>> If we don’t set the timezone in the connection postgres assumes server 
>>> timezone (in my case “Europe/Amsterdam”). So every datetime Airflow 
>>> receives will be in “Europe/Amsterdam” format. However as we defined the 
>>> model to use UTCDateTime it will always convert the returned DateTime to 
>>> UTC.
>>> 
>>> If we have configured Airflow to support something other than UTC as the 
>>> default timezone or a DAG has an associated timezone we only convert to that 
>>> timezone when calculating the next runtime (not for cron btw). Nowhere else 
>>> and thus we are UTC everywhere.
>>> 
>>> What do you think is inconsistent?
>>> 
>>> Bolke
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor 
>>>>  wrote:
>>>> 
>>>> Relating to 2): I'm not sure that the upgrade from timezoneless to 
>>>> timezone

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
Digging in a bit further. 

{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log

is the format

ts = execution_date.isoformat and should be in UTC afaik.

something is weird tbh.

B.
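
To make the path question concrete: the S3 key comes from rendering the
log_filename_template setting, and ts is the execution date's isoformat, so
whatever offset the datetime carries ends up verbatim in the path. A
simplified rendering sketch using Jinja directly (FakeTI is a stand-in, not
Airflow's TaskInstance):

```python
# How the execution date's timezone leaks into the log path.
from datetime import datetime, timedelta, timezone

import jinja2

TEMPLATE = "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log"

class FakeTI:
    dag_id = "example_http_operator"
    task_id = "get_op"

def path_for(execution_date, try_number=1):
    return jinja2.Template(TEMPLATE).render(
        ti=FakeTI, ts=execution_date.isoformat(), try_number=try_number)

print(path_for(datetime(2018, 7, 23, tzinfo=timezone.utc)))
# example_http_operator/get_op/2018-07-23T00:00:00+00:00/1.log
print(path_for(datetime(2018, 7, 23, 1, tzinfo=timezone(timedelta(hours=1)))))
# example_http_operator/get_op/2018-07-23T01:00:00+01:00/1.log  (Ash's case)
```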


> On 5 Aug 2018, at 21:32, Bolke de Bruin  wrote:
> 
> Ash,
> 
> Reading your proposed changes on your “set-timezone-to-utc” branch and below 
> analysis, I am not sure what you are perceiving as an issue.
> 
> For conversion we assume everything is stored in UTC and in a naive format. 
> Conversion then adds the timezone information. This results in the following
> 
> postgres timezone = “Europe/Amsterdam”
> 
> 
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-27 02:00:00+02
> 
> airflow=# set timezone=UTC;
> airflow=# select * from task_instance;
> get_op| example_http_operator | 2018-07-27 00:00:00+00
> 
> If we don’t set the timezone in the connection postgres assumes server 
> timezone (in my case “Europe/Amsterdam”). So every datetime Airflow receives 
> will be in “Europe/Amsterdam” format. However as we defined the model to use 
> UTCDateTime it will always convert the returned DateTime to UTC.
> 
> If we have configured Airflow to support something other than UTC as the default 
> timezone or a DAG has an associated timezone we only convert to that timezone 
> when calculating the next runtime (not for cron btw). Nowhere else and thus 
> we are UTC everywhere.
> 
> What do you think is inconsistent?
> 
> Bolke
> 
> 
> 
> 
> 
> 
>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor  
>> wrote:
>> 
>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
>> aware colums in the task instance is right, or at least it's not what I 
>> expected.
>> 
>> Before weren't all TZs from schedule dates etc in UTC? For the same task 
>> instance (these outputs from psql directly):
>> 
>> before: execution_date=2017-09-04 00:00:00
>> after: execution_date=2017-09-04 01:00:00+01
>> 
>> **Okay the migration is fine**. It appears that the migration has done the 
>> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
>> Postgres converts it to that TZ on returning an object.
>> 
>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
>> consistent behaviour? Is this possible some how? I don't know SQLAlchemy 
>> that well.
>> 
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor  
>>> wrote:
>>> 
>>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>>> though. This may be particular to my logging config, but given how much of 
>>> a pain it was to set up S3 logging in 1.9 I have shared my config with some 
>>> people in the Gitter chat so It's not just me.
>>> 
>>> 2) The path that log-files are written to in S3 has changed (again - this 
>>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
>>> files again to continue viewing them. The change is that the path now (in 
>>> 1.10) has a timezone in it, and the date is in local time, before it was 
>>> UTC:
>>> 
>>> before: 2018-07-23T00:00:00/1.log
>>> after: 2018-07-23T01:00:00+01:00/1.log
>>> 
>>> We can possibly get away with an updating note about this to set a custom 
>>> log_filename_template. Testing this now.
>>> 
>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
>>>> 
>>>> -1(binding) from me.
>>>> 
>>>> Installed with:
>>>> 
>>>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>>>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr
>>>>  
>>>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr>,
>>>>  s3, crypto]>=1.10'
>>>> 
>>>> Install went fine.
>>>> 
>>>> Our DAGs that use SparkSubmitOperator are now failing as there is now a 
>>>> hard dependency on the Kubernetes client libs, but the `emr` group doesn't 
>>>> mention this.
>>>> 
>>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112 
>>>> <https://github.com/apache/incubator-airflow/pull/3112>
>>>> 
>>>> I see two option

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
Ash,

Reading your proposed changes on your “set-timezone-to-utc” branch and the 
analysis below, I am not sure what you are perceiving as an issue.

For conversion we assume everything is stored in UTC and in a naive format. 
Conversion then adds the timezone information. This results in the following

postgres timezone = “Europe/Amsterdam”


airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-27 02:00:00+02

airflow=# set timezone=UTC;
airflow=# select * from task_instance;
get_op| example_http_operator | 2018-07-27 00:00:00+00

If we don’t set the timezone in the connection postgres assumes server timezone 
(in my case “Europe/Amsterdam”). So every datetime Airflow receives will be in 
“Europe/Amsterdam” format. However as we defined the model to use UTCDateTime 
it will always convert the returned DateTime to UTC.

If we have configured Airflow to support something other than UTC as the default 
timezone, or a DAG has an associated timezone, we only convert to that timezone 
when calculating the next runtime (not for cron btw). Nowhere else, and thus we 
are UTC everywhere.

What do you think is inconsistent?

Bolke
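
A small illustration of the boundary described above: keep everything UTC
internally and hop into the DAG's timezone only to step the schedule, so
daylight savings is honored. This assumes the pendulum 2.x API; the
scheduler's real following-schedule logic has more cases:

```python
# A daily 10:00 Amsterdam schedule stays at local 10:00 across the DST switch,
# while storage and comparison stay in UTC.
import pendulum

def next_run_utc(last_run_utc):
    local = last_run_utc.in_timezone("Europe/Amsterdam")  # DAG timezone
    nxt = local.add(days=1)                               # step schedule locally
    return nxt.in_timezone("UTC")                         # back to UTC for storage

run = pendulum.datetime(2018, 10, 27, 8, 0, tz="UTC")     # 10:00 local (CEST)
print(next_run_utc(run))  # 2018-10-28T09:00:00+00:00, still 10:00 local (CET)
```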






> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor  
> wrote:
> 
> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
> aware colums in the task instance is right, or at least it's not what I 
> expected.
> 
> Before weren't all TZs from schedule dates etc in UTC? For the same task 
> instance (these outputs from psql directly):
> 
> before: execution_date=2017-09-04 00:00:00
> after: execution_date=2017-09-04 01:00:00+01
> 
> **Okay the migration is fine**. It appears that the migration has done the 
> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
> Postgres converts it to that TZ on returning an object.
> 
> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
> consistent behaviour? Is this possible some how? I don't know SQLAlchemy that 
> well.
> 
> 
> -ash
> 
>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor  
>> wrote:
>> 
>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>> though. This may be particular to my logging config, but given how much of a 
>> pain it was to set up S3 logging in 1.9 I have shared my config with some 
>> people in the Gitter chat so It's not just me.
>> 
>> 2) The path that log-files are written to in S3 has changed (again - this 
>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
>> files again to continue viewing them. The change is that the path now (in 
>> 1.10) has a timezone in it, and the date is in local time, before it was UTC:
>> 
>> before: 2018-07-23T00:00:00/1.log
>> after: 2018-07-23T01:00:00+01:00/1.log
>> 
>> We can possibly get away with an updating note about this to set a custom 
>> log_filename_template. Testing this now.
>> 
>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
>>> 
>>> -1(binding) from me.
>>> 
>>> Installed with:
>>> 
>>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr
>>>  
>>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr>,
>>>  s3, crypto]>=1.10'
>>> 
>>> Install went fine.
>>> 
>>> Our DAGs that use SparkSubmitOperator are now failing as there is now a 
>>> hard dependency on the Kubernetes client libs, but the `emr` group doesn't 
>>> mention this.
>>> 
>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112 
>>> <https://github.com/apache/incubator-airflow/pull/3112>
>>> 
>>> I see two options for this - either conditionally enable k8s:// support if 
>>> the import works, or (less preferred) add kube-client to the emr deps 
>>> (which I like less)
>>> 
>>> Sorry - this is the first time I've been able to test it.
>>> 
>>> I will install this dep manually and continue testing.
>>> 
>>> -ash
>>> 
>>> (Normally no time at home due to new baby, but I got a standing desk, and a 
>>> carrier meaning she can sleep on me and I can use my laptop. Win!)
>>> 
>>> 
>>> 
>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin >>> <mailto:bdbr...@gmail.com>> wrote:
>>>> 
>

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
It is not being smart. It does as it is required to: all our supported 
databases (apart from SQLite) do this, and so do Oracle and SQL Server. 
Nevertheless we enforce UTC by using UTCDateTime as the replacement for 
SQLAlchemy's DateTime field; this makes sure that whatever the database sends 
us we transform to UTC. 

I'll have a look at what needs to be done. 

Conditional import sounds fine to me. 

B.


Sent from my iPhone
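
The conditional import both sides agree on can look like the sketch below; the
guard names are illustrative, not the actual patch:

```python
# Only hard-require the kubernetes client when a k8s:// master is requested,
# instead of failing at import time for everyone.
try:
    import kubernetes  # noqa: F401
    _HAS_KUBERNETES = True
except ImportError:
    _HAS_KUBERNETES = False


def validate_master(master):
    if master.startswith("k8s://") and not _HAS_KUBERNETES:
        raise ImportError(
            "Master is k8s://... but the kubernetes client is not installed; "
            "try: pip install airflow['kubernetes']")
    return master
```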

> On 5 Aug 2018, at 19:14, Ash Berlin-Taylor  wrote:
> 
> 
>> On 5 Aug 2018, at 18:01, Bolke de Bruin  wrote:
>> 
>> Hi Ash,
>> 
>> Thanks a lot for the proper review, obviously I would have liked that these 
>> issues (I think just one) are popping up at rc3 but I understand why it 
>> happened. 
> 
> Yeah, sorry I didn't couldn't make time to test the betas :(
> 
>> 
>> Can you work out a patch for the k8s issue? I’m sure Fokko and others can 
>> chime in to make sure it will be the right change. 
> 
> Working on it - I'll go for a conditional import, and only except if the 
> "k8s://" scheme is specified I think.
> 
>> 
>> On the timezone change. The database will do the right thing and correctly 
>> transform a datetime into the timezone the client is using. Even then we 
>> enforce UTC internally and only transform it for user interaction or when it 
>> is relevant (to make sure we do daylight savings for example). It is 
>> therefore not required to force a timezone setting with sql alchemy beyond 
>> when we convert to timezone aware (see migration scripts).
> 
> I think the database is being "smart" here in converting, but I'm not sure 
> it's the Right thing. It wouldn't surprise me if we have other places in the 
> codebase that expect datetime columns to come back in UTC, but they might 
> come back in DB-server local timezone.
> 
> Trying 
> https://github.com/apache/incubator-airflow/compare/master...ashb:set-timezone-to-utc-on-connect?expand=1
>  
> <https://github.com/apache/incubator-airflow/compare/master...ashb:set-timezone-to-utc-on-connect?expand=1>
>  - it "fixes" my logging issue, tests are running 
> https://travis-ci.org/ashb/incubator-airflow/builds/412360920
> 
>> 
>> On the logging file format I agree this could be handled better. However I 
>> do think we should honor local system time for this as this is the standard 
>> for any other logging. Also logging output will be timestamped in local 
>> system time. Maybe we could cut off the timezone identifier as it can be 
>> assumed to be in local system time (+01:00). 
> 
> The issue with just cutting off the timezone is that old log files are now 
> unviewable - they ran at 00:00:00 UTC, but the hour of the record coming back 
> is 01.
> 
>> 
>> If we take on the k8s fix we can also fix the logging format. What do you 
>> think?
> 
> Also as a quick fix I've changed the UPDATING.md as suggested: 
> https://github.com/apache/incubator-airflow/compare/master...ashb:updating-for-logging-changes?expand=1.
>  The log format is a bit clunky, but the note about task_log_reader is needed 
> either way. (Do we need a Jira ticket for this sort of change, or is 
> AIRFLOW-XXX okay for this?)
> 
>> 
>> Cheers
>> Bolke
>> 
>> Sent from my iPad
>> 
>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor wrote:
>>> 
>>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
>>> aware columns in the task instance is right, or at least it's not what I 
>>> expected.
>>> 
>>> Before weren't all TZs from schedule dates etc in UTC? For the same task 
>>> instance (these outputs from psql directly):
>>> 
>>> before: execution_date=2017-09-04 00:00:00
>>> after: execution_date=2017-09-04 01:00:00+01
>>> 
>>> **Okay the migration is fine**. It appears that the migration has done the 
>>> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
>>> Postgres converts it to that TZ on returning an object.
>>> 
>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
>>> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy 
>>> that well.
>>> 
>>> 
>>> -ash
>>> 
>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor 
>>>>  wrote:
>>>> 
>>>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>>>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>>>> though

Re: [VOTE] Airflow 1.10.0rc3

2018-08-05 Thread Bolke de Bruin
Hi Ash,

Thanks a lot for the proper review, obviously I would have liked that these 
issues (I think just one) are popping up at rc3 but I understand why it 
happened. 

Can you work out a patch for the k8s issue? I’m sure Fokko and others can chime 
in to make sure it will be the right change. 

On the timezone change. The database will do the right thing and correctly 
transform a datetime into the timezone the client is using. Even then we 
enforce UTC internally and only transform it for user interaction or when it is 
relevant (to make sure we do daylight savings for example). It is therefore not 
required to force a timezone setting with SQLAlchemy beyond when we convert to 
timezone aware (see migration scripts).

On the logging file format I agree this could be handled better. However I do 
think we should honor local system time for this as this is the standard for 
any other logging. Also logging output will be timestamped in local system 
time. Maybe we could cut off the timezone identifier as it can be assumed to be 
in local system time (+01:00). 
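
Rendering the path that way would boil down to something like this (a quick 
sketch; the example date is made up):

from datetime import datetime, timezone

# an aware execution_date at midnight UTC, purely for illustration
execution_date = datetime(2018, 7, 23, 0, 0, tzinfo=timezone.utc)
# render in local system time with the timezone identifier cut off, as proposed
print(execution_date.astimezone().strftime("%Y-%m-%dT%H:%M:%S") + "/1.log")
# on a +01:00 system this prints 2018-07-23T01:00:00/1.log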

If we take on the k8s fix we can also fix the logging format. What do you think?

Cheers
Bolke

Sent from my iPad

> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor wrote:
> 
> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
> aware columns in the task instance is right, or at least it's not what I 
> expected.
> 
> Before weren't all TZs from schedule dates etc in UTC? For the same task 
> instance (these outputs from psql directly):
> 
> before: execution_date=2017-09-04 00:00:00
> after: execution_date=2017-09-04 01:00:00+01
> 
> **Okay the migration is fine**. It appears that the migration has done the 
> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
> Postgres converts it to that TZ on returning an object.
> 
> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy that 
> well.
> 
> 
> -ash
> 
>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor  
>> wrote:
>> 
>> 1.) Missing UPDATING note about change of task_log_reader to now always 
>> being "task" (was "s3.task" before.). Logging config is much simpler now 
>> though. This may be particular to my logging config, but given how much of a 
>> pain it was to set up S3 logging in 1.9 I have shared my config with some 
>> people in the Gitter chat so it's not just me.
>> 
>> 2) The path that log-files are written to in S3 has changed (again - this 
>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
>> files again to continue viewing them. The change is that the path now (in 
>> 1.10) has a timezone in it, and the date is in local time, before it was UTC:
>> 
>> before: 2018-07-23T00:00:00/1.log
>> after: 2018-07-23T01:00:00+01:00/1.log
>> 
>> We can possibly get away with an updating note about this to set a custom 
>> log_filename_template. Testing this now.
>> 
>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor  wrote:
>>> 
>>> -1(binding) from me.
>>> 
>>> Installed with:
>>> 
>>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr
>>>  
>>> <https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr>,
>>>  s3, crypto]>=1.10'
>>> 
>>> Install went fine.
>>> 
>>> Our DAGs that use SparkSubmitOperator are now failing as there is now a 
>>> hard dependency on the Kubernetes client libs, but the `emr` group doesn't 
>>> mention this.
>>> 
>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112 
>>> <https://github.com/apache/incubator-airflow/pull/3112>
>>> 
>>> I see two options for this - either conditionally enable k8s:// support if 
>>> the import works, or add kube-client to the emr deps 
>>> (which I like less)
>>> 
>>> Sorry - this is the first time I've been able to test it.
>>> 
>>> I will install this dep manually and continue testing.
>>> 
>>> -ash
>>> 
>>> (Normally no time at home due to new baby, but I got a standing desk, and a 
>>> carrier meaning she can sleep on me and I can use my laptop. Win!)
>>> 
>>> 
>>> 
>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin >>> <mailto:bdbr...@

Re: [VOTE] Airflow 1.10.0rc3

2018-08-04 Thread Bolke de Bruin
Bump. 

Committers please cast your vote. 

B.

Sent from my iPhone

> On 3 Aug 2018, at 13:23, Driesprong, Fokko  wrote:
> 
> +1 Binding
> 
> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
> 
> Cheers, Fokko
> 
> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin :
> 
>> Hey all,
>> 
>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release,
>> which will last for 72 hours. Consider this my (binding) +1.
>> 
>> Airflow 1.10.0 RC 3 is available at:
>> 
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/ <
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/>
>> 
>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
>> comes with INSTALL instructions.
>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
>> "sdist"
>> release.
>> 
>> Public keys are available at:
>> 
>> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
>> https://dist.apache.org/repos/dist/release/incubator/airflow/>
>> 
>> The amount of JIRAs fixed is over 700. Please have a look at the
>> changelog.
>> Since RC2 the following has been fixed:
>> 
>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>> 
>> Please note that the version number excludes the `rcX` string as well
>> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
>> to rename the artifact without modifying the artifact checksums when we
>> actually release.
>> 
>> WARNING: Due to licensing requirements you will need to set
>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>> installing or upgrading. We will try to remove this requirement for the
>> next release.
>> 
>> Cheers,
>> Bolke


Re: The need for LocalTaskJob

2018-08-04 Thread Bolke de Bruin
It is actually all Executors doing this (at least Local, Celery). And yes 
(although your description is a bit cryptic) I think you are right. 

What you call “shelling out” does not really cover what happens on a process 
level though. We execute “bash -c” with “shell=True”, which probably makes the 
issue worse. Basically what happens is “worker” -> “bash -c” -> 
“python (airflow)”. That’s three processes, and then that chain twice. 

We can execute “python” directly just fine. Because it will run in a separate 
interpreter, no issues will come from sys.modules as that is not inherited. DAGs 
will then still be parsed in a separate process. Forking (@ash) probably does not 
work as that does share sys.modules. 

Same goes for jobs running through sudo and with cgroups. No shell is required 
at all. 
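
Concretely, launching would be just this (a sketch; the argv values are made up):

import subprocess

# illustrative argv; dag id, task id and execution date are invented here
cmd = ["airflow", "run", "--raw", "example_dag", "example_task",
       "2018-08-04T00:00:00"]
# a list argv avoids shell=True: no bash in between, one fresh interpreter
proc = subprocess.Popen(cmd, close_fds=True)
proc.wait()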

The worker of the executor we can relatively easily extend to take over what 
LocalTaskJob does. If necessary we can keep it a bit dumber and either report 
back by API or MQ instead of DB.

The way we handle SIGTERM is pretty messy anyway and not really standard (we 
need to kill all descendant processes, most of which are our own, e.g. airflow 
core). It can also be handled within the executor/worker. A cleanup will 
probably increase reliability. 

I’m writing AIP-2 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-2+Simplify+process+launching
 to work this out.

B.

Sent from my iPad

> On 4 Aug 2018, at 19:40, Ash Berlin-Taylor wrote:
> 
> Comments inline.
> 
>> On 4 Aug 2018, at 18:28, Maxime Beauchemin  
>> wrote:
>> 
>> Let me confirm I'm understanding this right, we're talking specifically
>> about the CeleryExecutor not starting and `airflow run` (not --raw)
>> command, and fire up a LocalTaskJob instead? Then we'd still have the
>> worker fire up the `airflow run --raw` command?
>> 
>> Seems reasonable. One thing to keep in mind is the fact that shelling out
>> guarantees no `sys.module` caching, which is a real issue for slowly
>> changing DAG definitions. That's the reason why we'd have to reboot the
>> scheduler periodically before it used sub-processes to evaluate DAGs. Any
>> code that needs to evaluate a DAG should probably be done in a subprocess.
> 
>> 
>> Shelling out also allows for doing things like unix impersonation and
>> applying CGROUPS. This currently happens between `airflow run` and `airflow
>> run --raw`. The parent process also does heartbeat and listen for external
>> kill signal (kill pills).
>> 
>> I think what we want is smarter executors and only one level of bash
>> command: the `airflow run --raw`, and ideally the system that fires this up
>> is not Airflow itself, and cannot be DAG-aware (or it will need to get
>> restarted to flush the cache).
> 
> Rather than shelling out to `airflow run` could we instead fork and run the 
> CLI code directly? This involves parsing the config twice, loading all of the 
> airflow and SQLAlchemy deps twice etc. This I think would account for a 
> not-insignificant speed difference for the unit tests. In the case of 
> impersonation we'd probably have no option but to exec `airflow`, but most(?) 
> people don't use that?
> 
> Avoiding the extra parsing penalty and process when we don't need it might 
> be worth it for test speed up alone. And we've already got impersonation 
> covered in the tests so we'll know that it still works.
> 
>> 
>> To me that really brings up the whole question of what should be handled by
>> the Executor, and what belongs in core Airflow. The Executor needs to do
>> more, and Airflow core less.
> 
> I agree with the sentiment that Core should do less and Executors more -- 
> many parts of the core are reimplementing what Celery itself could do.
> 
> 
>> 
>> When you think about how this should all work on Kubernetes, it looks
>> something like this:
>> * the scheduler, through KubeExecutor, calls the k8s API, tells it to fire
>> up an Airflow task
>> * container boots up and starts an `airflow run --raw` command
>> * k8s handles heartbeats, monitors tasks, knows how to kill a running task
>> * the scheduler process (call it supervisor), talks with k8s through
>> KubeExecutor
>> and handles zombie cleanup and sending kill pills
>> 
>> Now because Celery doesn't offer as many guarantees it gets a bit more
>> tricky. Is there even a way to send a kill pill through Celery? Are there
>> other ways than using a parent process to accomplish this?
> 
> It does 
> http://docs.celeryproject.org/en/latest/userguide/workers.html#revoke-revoking-tasks
>  (at least it does now)
> 
>> 
>> At a higher level, it seems like we need to move more logic from core
>> Airflow into the executors. For instance, the heartbeat construct should
>> probably be 100% handled by the executor, and not an assumption in the core
>> code base.
>> 
>> I think I drifted a bit, hopefully that's still helpful.
>> 
>> Max


Re: [DISCUSS] AIP - Time for Airflow Improvement Proposals?

2018-08-04 Thread Bolke de Bruin
Here you go:

https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals

B.

Sent from my iPad

> On 31 Jul 2018, at 06:49, Tao Feng wrote:
> 
> Hey Jakob,
> 
> Are we going to start AIPs for Airflow? Currently it is a little bit hard to
> find out the proposals for different enhancement. It would be great if we
> could consolidate and put them in a central place(e.g like airflow wiki).
> 
> Thanks,
> -Tao
> 
> On Tue, Jul 17, 2018 at 10:14 PM, Chao-Han Tsai 
> wrote:
> 
>> +1
>> 
>>> On Tue, Jul 17, 2018 at 3:49 PM, Jin Chang  wrote:
>>> 
>>> +1
>>> 
 On Tue, Jul 17, 2018 at 3:26 PM, Tao Feng  wrote:
 
 +1, should we put them in the Airflow wiki space (like Kafka's KIP) or a Google
>>> doc
 or email?
 
 On Sun, Jul 15, 2018 at 11:24 AM, Arthur Wiedmer <
>>> arthur.wied...@gmail.com
> 
 wrote:
 
> +1
> 
> On Sun, Jul 15, 2018, 20:12 Maxime Beauchemin <
 maximebeauche...@gmail.com>
> wrote:
> 
>> +1
>> 
>> On Tue, Jul 10, 2018 at 1:09 PM Sid Anand 
>> wrote:
>> 
>>> +1
>>> 
>>> On Tue, Jul 10, 2018 at 1:02 PM George Leslie-Waksman
>>>  wrote:
>>> 
 +1
 
 On Tue, Jul 10, 2018 at 11:50 AM Jakob Homan <
>> jgho...@gmail.com>
>> wrote:
 
> Lots of Apache projects use ?IPs - Whatever Improvement
>>> Proposal
 -
> to
> document and gather consensus on large changes to the code
>>> base.
>> Some
> examples:
>   * Kafka Improvement Proposals (KIP) -
> 
> 
 
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Improvement+Proposals
>  * Flink Improvement Proposal (FLIP) -
> 
> 
 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/
> Flink+Improvement+Proposals
>  * Spark Improvement Proposal (SPIP) -
> https://spark.apache.org/improvement-proposals.html
> 
> We've got a few changes that have been discussed, either on
>> the
> list/JIRA (good) or in private (bad -
> https://incubator.apache.org/guides/committer.html#mailing_
>>> lists
 )
>> that
> are of a magnitude that they may benefit from some version of
 this
> process.  Examples:
>   * The in-progress plan to refactor out connectors and
>> hooks
> (AIRFLOW-2732)
>   * K8S deployment operator proposal
>   * Initial Design for Supporting fine-grained Connection
> encryption
> 
> 
> The benefits of this approach is that the design is hosted
> somewhere
> less ephemeral and more editable than email.  It also
>> provides
>>> a
> framework for documenting and confirming consensus through
>> the
> whole
> community.
> 
>   What do y'all think?
> 
> -Jakob
> 
 
>>> 
>> 
> 
 
>>> 
>> 
>> 
>> 
>> --
>> 
>> Chao-Han Tsai
>> 


Re: Kerberos and Airflow

2018-08-04 Thread Bolke de Bruin
Hi Dan,

Don’t misunderstand me. I think what I proposed is complementary to the dag 
submit function. The only thing you mentioned that I don’t think is needed is 
fully serializing up front and thereby excluding callbacks etc. (although there 
are other serialization libraries like marshmallow that might be able to do it).

You are right to mention that the hashes should be calculated at submit time 
and an authorized user should be able to recalculate a hash. Another option 
could be something like https://pypi.org/project/signedimp/ which we could use 
to verify dependencies.
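
The hash itself can be simple; a minimal sketch (the field list here is 
arbitrary, picked only for illustration):

import hashlib

def task_fingerprint(task, fields=("task_id", "owner", "pool", "queue")):
    # hash the declared structure of a task; the worker recomputes this and
    # refuses to run when it no longer matches what was registered
    h = hashlib.sha256()
    for name in fields:
        h.update(name.encode("utf-8"))
        h.update(repr(getattr(task, name, None)).encode("utf-8"))
    return h.hexdigest()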

I’ll start writing something up. We can then shoot holes in it (I think you 
have a point on the crypto) and maybe do some hacking on it. This could be part 
of the hackathon in September in SF; I’m sure some other people would have an 
interest in it as well.

B.

Sent from my iPad

> On 3 Aug 2018, at 23:14, Dan Davydov wrote:
> 
> I designed a system similar to what you are describing which is in use at
> Airbnb (only DAGs on a whitelist would be allowed to be merged to the git repo
> if they used certain types of impersonation), it worked for simple use
> cases, but the problem was doing access control becomes very difficult,
> e.g. solving the problem of which DAGs map to which manifest files, and
> which manifest files can access which secrets.
> 
> There is also a security risk where someone changes e.g. a python file
> dependency of your task, or let's say you figure out a way to block those
> kinds of changes based on your hashing, what if there is a legitimate
> change in a dependency and you want to recalculate the hash? Then I think
> you go back to a solution like your proposed "airflow submit" command to
> accomplish this.
> 
> Additional concerns:
> - I'm not sure if I'm a fan of the first time a scheduler parses a DAG
> to be what creates the hashes either, it feels to me like
> encryption/hashing should be done before DAGs are even parsed by the
> scheduler (at commit time or submit time of the DAGs)
> - The type of the encrypted key seems kind of hacky to me, i.e. some kind of
> custom hash based on DAG structure instead of a simple token passed in by
> users which has a clear separation of concerns WRT security
> - Added complexity both to Airflow code, and to users as they need to
> define or customize hashing functions for DAGs to improve security
> If we can get a reasonably secure solution then it might be a reasonable
> trade-off considering the alternative is a major overhaul/restrictions to
> DAGs.
> 
> Maybe I'm missing some details that would alleviate my concerns here, and a
> bit of a more in-depth document might help?
> 
> 
> 
> *Also: using the Kubernetes executor combined with some of the things
> we discussed greatly enhances the security of Airflow as the
> environment isn’t really shared anymore.*
> Assuming a multi-tenant scheduler, I feel the same set of hard problems
> exist with Kubernetes, as the executor mainly just simplifies the
> post-executor parts of task scheduling/execution which I think you already
> outlined a good solution for early on in this thread (passing keys from the
> executor to workers).
> 
> Happy to set up some time to talk real-time about this by the way, once we
> iron out the details I want to implement whatever the best solution we come
> up with is.
> 
>> On Thu, Aug 2, 2018 at 4:13 PM Bolke de Bruin  wrote:
>> 
>> You mentioned you would like to make sure that the DAG (and its tasks)
>> runs in a confined set of settings. Ie.
>> A given set of connections at submission time not at run time. So here we
>> can make use of the fact that both the scheduler
>> and the worker parse the DAG.
>> 
>> Firstly, when scheduler evaluates a DAG it can add an integrity check
>> (hash) for each task. The executor can encrypt the
>> metadata with this hash ensuring that the structure of the DAG remained
>> the same. It means that the task is only
>> able to decrypt the metadata when it is able to calculate the same hash.
>> 
>> Similarly, if the scheduler parses a DAG for the first time it can
>> register the hashes for the tasks. It can then verify these hashes
>> at runtime to ensure the structure of the tasks have stayed the same. In
>> the manifest (which could even in the DAG or
>> part of the DAG definition) we could specify which fields would be used
>> for hash calculation. We could even specify
>> static hashes. This would give flexibility as to what freedom the users
>> have in the auto-generated DAGS.
>> 
>> Something like that?
>> 
>> B.
>> 
>>> On 2 Aug 2018, at 20:12, Dan Davydov 
>> wrote:
>>>

The need for LocalTaskJob

2018-08-04 Thread Bolke de Bruin
Hi Max, Dan et al,

Currently, when a scheduled task runs, this happens in three steps: 

1. Worker 
2. LocalTaskJob
3. Raw task instance

It uses (by default) 5 (!) different processes (the worker, plus twice a bash and an Airflow python process):

1. Worker 
2. Bash + Airflow
3. Bash + Airflow 

I think we can merge worker and LocalTaskJob, as the latter seems to exist only 
to track a particular task. This can be done within the worker without side 
effects. Next to that, I think we can limit the number of (airflow) processes to 
2 if we remove the bash dependency. I don’t see any reason to depend on bash.
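
In other words the worker would do the tracking itself, roughly like this (a 
sketch; heartbeat is an assumed callback, not an existing API):

import subprocess
import time

def supervise(raw_cmd, heartbeat, heartrate=5.0):
    # the worker supervises the raw task directly, no LocalTaskJob in between;
    # heartbeat() is an assumed callback reporting liveness via API/MQ
    proc = subprocess.Popen(raw_cmd)
    while proc.poll() is None:
        heartbeat()
        time.sleep(heartrate)
    return proc.returncode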

Can you guys shed some light on what the thoughts were around those choices? Am 
I missing anything on why they should exist?

Cheers
Bolke

Sent from my iPad

[VOTE] Airflow 1.10.0rc3

2018-08-03 Thread Bolke de Bruin
Hey all,

I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release,
which will last for 72 hours. Consider this my (binding) +1.

Airflow 1.10.0 RC 3 is available at:

https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/ 


apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
comes with INSTALL instructions.
apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python "sdist"
release.

Public keys are available at:

https://dist.apache.org/repos/dist/release/incubator/airflow/ 


The amount of JIRAs fixed is over 700. Please have a look at the changelog. 
Since RC2 the following has been fixed:

* [AIRFLOW-2817] Force explicit choice on GPL dependency
* [AIRFLOW-2716] Replace async and await py3.7 keywords
* [AIRFLOW-2810] Fix typo in Xcom model timestamp

Please note that the version number excludes the `rcX` string as well
as the "+incubating" string, so it's now simply 1.10.0. This will allow us
to rename the artifact without modifying the artifact checksums when we
actually release.

WARNING: Due to licensing requirements you will need to set 
 SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
installing or upgrading. We will try to remove this requirement for the 
next release.

Cheers,
Bolke

Re: Kerberos and Airflow

2018-08-02 Thread Bolke de Bruin
You mentioned you would like to make sure that the DAG (and its tasks) runs in 
a confined set of settings, i.e.
a given set of connections fixed at submission time, not at run time. So here we can 
make use of the fact that both the scheduler
and the worker parse the DAG. 

Firstly, when scheduler evaluates a DAG it can add an integrity check (hash) 
for each task. The executor can encrypt the
metadata with this hash ensuring that the structure of the DAG remained the 
same. It means that the task is only
able to decrypt the metadata when it is able to calculate the same hash.
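
For the encryption part, a sketch using the cryptography lib we already ship 
with the crypto extra (the key derivation here is illustrative only):

import base64
import hashlib
from cryptography.fernet import Fernet

def encrypt_metadata(metadata_bytes, task_hash):
    # derive a Fernet key from the structural hash: a worker can only decrypt
    # the metadata if it computes the exact same hash for the task
    key = base64.urlsafe_b64encode(hashlib.sha256(task_hash.encode()).digest())
    return Fernet(key).encrypt(metadata_bytes)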

Similarly, if the scheduler parses a DAG for the first time it can register the 
hashes for the tasks. It can then verify these hashes
at runtime to ensure the structure of the tasks has stayed the same. In the 
manifest (which could even be in the DAG or
part of the DAG definition) we could specify which fields would be used for 
hash calculation. We could even specify
static hashes. This would give flexibility as to what freedom the users have in 
the auto-generated DAGS.

Something like that?

B.

> On 2 Aug 2018, at 20:12, Dan Davydov  wrote:
> 
> I'm very intrigued, and am curious how this would work in a bit more
> detail, especially for dynamically created DAGs (how would static manifests
> map to DAGs that are generated from rows in a MySQL table for example)? You
> could of course have something like regexes in your manifest file like
> some_dag_framework_dag_*, but then how would you make sure that other users
> did not create DAGs that matched this regex?
> 
> On Thu, Aug 2, 2018 at 1:51 PM Bolke de Bruin  <mailto:bdbr...@gmail.com>> wrote:
> 
>> Hi Dan,
>> 
>> I discussed this a little bit with one of the security architects here. We
>> think that
>> you can have a fair trade off between security and usability by having
>> a kind of manifest with the dag you are submitting. This manifest can then
>> specify what the generated tasks/dags are allowed to do and what metadata
>> to provide to them. We could also let the scheduler generate hashes per
>> generated
>> DAG / task and verify those with an established version (1st run?). This
>> limits the
>> attack vector.
>> 
>> A DagSerializer would be great, but I think it solves a different issue
>> and the above
>> is somewhat simpler to implement?
>> 
>> Bolke
>> 
>>> On 29 Jul 2018, at 23:47, Dan Davydov 
>> wrote:
>>> 
>>> *Let’s say we trust the owner field of the DAGs; I think we could do the
>>> following.*
>>> *Obviously, the trusting the user part is key here. It is one of the
>>> reasons I was suggesting using “airflow submit” to update / add dags in
>>> Airflow*
>>> 
>>> 
>>> *This is the hard part about my question.*
>>> I think in a true multi-tenant environment we wouldn't be able to trust
>> the
>>> user, otherwise we wouldn't necessarily even need a mapping of Airflow
>> DAG
>>> users to secrets, because if we trust users to set the correct Airflow
>> user
>>> for DAGs, we are basically trusting them with all of the creds the
>> Airflow
>>> scheduler can access for all users anyways.
>>> 
>>> I actually had the same thought as your "airflow submit" a while ago,
>> which
>>> I discussed with Alex, basically creating an API for adding DAGs instead
>> of
>>> having the Scheduler parse them. FWIW I think it's superior to the git
>> time
>>> machine approach because it's a more generic form of "serialization" and
>> is
>>> more correct as well because the same DAG file parsed on a given git SHA
>>> can produce different DAGs. Let me know what you think, and maybe I can
>>> start a more formal design doc if you are onboard:
>>> 
>>> A user or service with an auth token sends an "airflow submit" request
>> to a
>>> new kind of Dag Serialization service, along with the serialized DAG
>>> objects generated by parsing on the client. It's important that these
>>> serialized objects are declarative and not e.g. pickles so that the
>>> scheduler/workers can consume them and reproducibility of the DAGs is
>>> guaranteed. The service will then store each generated DAG along with
>> its
>>> access based on the provided token e.g. using Ranger, and the
>>> scheduler/workers will use the stored DAGs for scheduling/execution.
>>> Operators would be deployed along with the Airflow code separately from
>> the
>>> serialized DAGs.
>>> 
>>> A serialized DAG would look something like this (basically Luigi-style :)):

Re: Kerberos and Airflow

2018-08-02 Thread Bolke de Bruin
Also: using the Kubernetes executor combined with some of the things we
discussed greatly enhances the security of Airflow as the environment 
isn’t really shared anymore.

B.

> On 2 Aug 2018, at 19:51, Bolke de Bruin  wrote:
> 
> Hi Dan,
> 
> I discussed this a little bit with one of the security architects here. We 
> think that 
> you can have a fair trade off between security and usability by having
> a kind of manifest with the dag you are submitting. This manifest can then 
> specify what the generated tasks/dags are allowed to do and what metadata 
> to provide to them. We could also let the scheduler generate hashes per 
> generated
> DAG / task and verify those with an established version (1st run?). This 
> limits the 
> attack vector.
> 
> A DagSerializer would be great, but I think it solves a different issue and 
> the above 
> is somewhat simpler to implement?
> 
> Bolke
> 
>> On 29 Jul 2018, at 23:47, Dan Davydov > <mailto:ddavy...@twitter.com.INVALID>> wrote:
>> 
>> *Let’s say we trust the owner field of the DAGs; I think we could do the
>> following.*
>> *Obviously, the trusting the user part is key here. It is one of the
>> reasons I was suggesting using “airflow submit” to update / add dags in
>> Airflow*
>> 
>> 
>> *This is the hard part about my question.*
>> I think in a true multi-tenant environment we wouldn't be able to trust the
>> user, otherwise we wouldn't necessarily even need a mapping of Airflow DAG
>> users to secrets, because if we trust users to set the correct Airflow user
>> for DAGs, we are basically trusting them with all of the creds the Airflow
>> scheduler can access for all users anyways.
>> 
>> I actually had the same thought as your "airflow submit" a while ago, which
>> I discussed with Alex, basically creating an API for adding DAGs instead of
>> having the Scheduler parse them. FWIW I think it's superior to the git time
>> machine approach because it's a more generic form of "serialization" and is
>> more correct as well because the same DAG file parsed on a given git SHA
>> can produce different DAGs. Let me know what you think, and maybe I can
>> start a more formal design doc if you are onboard:
>> 
>> A user or service with an auth token sends an "airflow submit" request to a
>> new kind of Dag Serialization service, along with the serialized DAG
>> objects generated by parsing on the client. It's important that these
>> serialized objects are declarative and not e.g. pickles so that the
>> scheduler/workers can consume them and reproducibility of the DAGs is
>> guaranteed. The service will then store each generated DAG along with its
>> access based on the provided token e.g. using Ranger, and the
>> scheduler/workers will use the stored DAGs for scheduling/execution.
>> Operators would be deployed along with the Airflow code separately from the
>> serialized DAGs.
>> 
>> A serialized DAG would look something like this (basically Luigi-style :)):
>> MyTask - BashOperator: {
>>  cmd: "sleep 1"
>>  user: "Foo"
>>  access: "token1", "token2"
>> }
>> 
>> MyDAG: {
>>  MyTask1 >> SomeOtherTask1
>>  MyTask2 >> SomeOtherTask1
>> }
>> 
>> Dynamic DAGs in this case would just consist of a service calling "Airflow
>> Submit" that does its own form of authentication to get access to some
>> kind of tokens (or basically just forwarding the secrets the users of the
>> dynamic DAG submit).
>> 
>> For the default Airflow implementation you can maybe just have the Dag
>> Serialization server bundled with the Scheduler, with auth turned off, and
>> to periodically update the Dag Serialization store which would emulate the
>> current behavior closely.
>> 
>> Pros:
>> 1. Consistency across running task instances in a dagrun/scheduler,
>> reproducibility and auditability of DAGs
>> 2. Users can control when to deploy their DAGs
>> 3. Scheduler runs much faster since it doesn't have to run python files and
>> e.g. make network calls
>> 4. Scaling scheduler becomes easier because can have different service
>> responsible for parsing DAGs which can be trivially scaled horizontally
>> (clients are doing the parsing)
>> 5. Potentially makes creating ad-hoc DAGs/backfilling/iterating on DAGs
>> easier? e.g. can use the Scheduler itself to schedule backfills with a
>> slightly modified serialized version of a DAG.
>> 
>> Cons:
>> 1. Have to deprecate a lot of popular features, e.g. allowin

Re: Kerberos and Airflow

2018-08-02 Thread Bolke de Bruin
Hi Dan,

I discussed this a little bit with one of the security architects here. We 
think that 
you can have a fair trade off between security and usability by having
a kind of manifest with the dag you are submitting. This manifest can then 
specify what the generated tasks/dags are allowed to do and what metadata 
to provide to them. We could also let the scheduler generate hashes per 
generated
DAG / task and verify those with an established version (1st run?). This limits 
the 
attack vector.
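
To make that concrete, the manifest could be as small as this (the format is 
entirely made up at this point):

# what a per-DAG manifest might declare; format and field names are invented
manifest = {
    "dag_id": "my_dag",
    "owner": "bolke",
    "allowed_conn_ids": ["spark_default", "hive_cli_default"],
    "hash_fields": ["task_id", "bash_command"],  # fields feeding the task hash
}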

A DagSerializer would be great, but I think it solves a different issue and the 
above 
is somewhat simpler to implement?

Bolke

> On 29 Jul 2018, at 23:47, Dan Davydov  wrote:
> 
> *Let’s say we trust the owner field of the DAGs; I think we could do the
> following.*
> *Obviously, the trusting the user part is key here. It is one of the
> reasons I was suggesting using “airflow submit” to update / add dags in
> Airflow*
> 
> 
> *This is the hard part about my question.*
> I think in a true multi-tenant environment we wouldn't be able to trust the
> user, otherwise we wouldn't necessarily even need a mapping of Airflow DAG
> users to secrets, because if we trust users to set the correct Airflow user
> for DAGs, we are basically trusting them with all of the creds the Airflow
> scheduler can access for all users anyways.
> 
> I actually had the same thought as your "airflow submit" a while ago, which
> I discussed with Alex, basically creating an API for adding DAGs instead of
> having the Scheduler parse them. FWIW I think it's superior to the git time
> machine approach because it's a more generic form of "serialization" and is
> more correct as well because the same DAG file parsed on a given git SHA
> can produce different DAGs. Let me know what you think, and maybe I can
> start a more formal design doc if you are onboard:
> 
> A user or service with an auth token sends an "airflow submit" request to a
> new kind of Dag Serialization service, along with the serialized DAG
> objects generated by parsing on the client. It's important that these
> serialized objects are declarative and not e.g. pickles so that the
> scheduler/workers can consume them and reproducibility of the DAGs is
> guaranteed. The service will then store each generated DAG along with its
> access based on the provided token e.g. using Ranger, and the
> scheduler/workers will use the stored DAGs for scheduling/execution.
> Operators would be deployed along with the Airflow code separately from the
> serialized DAGs.
> 
> A serialized DAG would look something like this (basically Luigi-style :)):
> MyTask - BashOperator: {
>  cmd: "sleep 1"
>  user: "Foo"
>  access: "token1", "token2"
> }
> 
> MyDAG: {
>  MyTask1 >> SomeOtherTask1
>  MyTask2 >> SomeOtherTask1
> }
> 
> Dynamic DAGs in this case would just consist of a service calling "Airflow
> Submit" that does its own form of authentication to get access to some
> kind of tokens (or basically just forwarding the secrets the users of the
> dynamic DAG submit).
> 
> For the default Airflow implementation you can maybe just have the Dag
> Serialization server bundled with the Scheduler, with auth turned off, and
> to periodically update the Dag Serialization store which would emulate the
> current behavior closely.
> 
> Pros:
> 1. Consistency across running task instances in a dagrun/scheduler,
> reproducibility and auditability of DAGs
> 2. Users can control when to deploy their DAGs
> 3. Scheduler runs much faster since it doesn't have to run python files and
> e.g. make network calls
> 4. Scaling scheduler becomes easier because can have different service
> responsible for parsing DAGs which can be trivially scaled horizontally
> (clients are doing the parsing)
> 5. Potentially makes creating ad-hoc DAGs/backfilling/iterating on DAGs
> easier? e.g. can use the Scheduler itself to schedule backfills with a
> slightly modified serialized version of a DAG.
> 
> Cons:
> 1. Have to deprecate a lot of popular features, e.g. allowing custom
> callbacks in operators (e.g. on_failure), and jinja_templates
> 2. Version compatibility problems, e.g. user/service client might be
> serializing arguments for hooks/operators that have been deprecated in
> newer versions of the hooks, or the serialized DAG schema changes and old
> DAGs aren't automatically updated. Might want to have some kind of
> versioning system for serialized DAGs to at least ensure that stored DAGs
> are valid when the Scheduler/Worker/etc are upgraded, maybe something
> similar to thrift/protobuf versioning.
> 3. Additional complexity - additional service, logic on workers/scheduler
> to fetch/

Re: Kerberos and Airflow

2018-07-29 Thread Bolke de Bruin
Ah gotcha. That’s another issue actually (but related).

Let’s say we trust the owner field of the DAGs; I think we could then do the 
following. We have a table (and interface) to tell Airflow what users have 
access to what connections. The scheduler can then check if the task in the dag 
can access the conn_id it is asking for. Auto-generated dags still have an 
owner (or should) and therefore should be fine. Some integrity checking 
could/should be added as we want to be sure that the task we schedule is the 
task we launch. So a signature calculated at the scheduler (or part of the 
DAG), sent as part of the metadata and checked by the executor, is probably 
smart.
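
A sketch of what that scheduler-side check could look like (the allowed set 
would come from the new table; nothing here is existing code):

def check_dag_connections(dag, allowed_conn_ids):
    # allowed_conn_ids: the set looked up from the (new) user -> connection
    # table for dag.owner; raise before anything gets scheduled
    for task in dag.tasks:
        conn_id = getattr(task, "conn_id", None)
        if conn_id and conn_id not in allowed_conn_ids:
            raise PermissionError(
                "task %s may not use connection %s" % (task.task_id, conn_id))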

You can also make this more fancy by integrating with something like Apache 
Ranger that allows for policy checking.

Obviously, the trusting the user part is key here. It is one of the reasons I 
was suggesting using “airflow submit” to update / add dags in Airflow. We could 
enforce authentication on the DAG. It was kind of ruled out in favor of git 
time machines although these never happened afaik ;-).

BTW: I have updated my implementation with protobuf. Metadata is now available 
at executor and task. 


> On 29 Jul 2018, at 15:47, Dan Davydov  wrote:
> 
> The concern is how to secure secrets on the scheduler such that only
> certain DAGs can access them, and in the case of files that create DAGs
> dynamically, only some set of DAGs should be able to access these secrets.
> 
> e.g. if there is a secret/keytab that can be read by DAG A generated by
> file X, and file X generates DAG B as well, there needs to be a scheme to
> stop the parsing of DAG B on the scheduler from being able to read the
> secret in DAG A.
> 
> Does that make sense?
> 
> On Sun, Jul 29, 2018 at 6:14 AM Bolke de Bruin  <mailto:bdbr...@gmail.com>> wrote:
> 
>> I’m not sure what you mean. The example I created allows for dynamic DAGs,
>> as the scheduler obviously knows about the tasks when they are ready to be
>> scheduled.
>> This isn’t any different from a static DAG or a dynamic one.
>> 
>> For Kerberos it isn't that special. Basically a keytab is the revocable
>> user's credentials
>> in a special format. The keytab itself can be protected by a password. So
>> I can imagine
>> that a connection is defined that sets a keytab location and password to
>> access the keytab.
>> The scheduler understands this (or maybe the Connection model) and
>> serializes and sends
>> it to the worker as part of the metadata. The worker then reconstructs the
>> keytab and issues
>> a kinit or supplies it to the other service requiring it (eg. Spark)
>> 
>> * Obviously the worker and scheduler need to communicate over SSL.
>> * There is a challenge at the worker level. Credentials are secured
>> against other users, but are readable by the owning user. So imagine 2 DAGs
>> from two different users with different connections without sudo
>> configured. If they end up at the same worker and DAG 2 is malicious, it
>> could read files and memory created by DAG 1. This is the reason why using
>> environment variables is NOT safe (DAG 2 could read /proc/<pid>/environ).
>> To mitigate this we probably need to PIPE the data to the task’s STDIN. It
>> won’t solve the issue but will make it harder as now it will only be in
>> memory.
>> * The reconstructed keytab (or the initialized version) can be stored in,
>> most likely, the process-keyring (
>> http://man7.org/linux/man-pages/man7/process-keyring.7.html <
>> http://man7.org/linux/man-pages/man7/process-keyring.7.html 
>> <http://man7.org/linux/man-pages/man7/process-keyring.7.html>>). As
>> mentioned earlier this poses a challenge for Java applications that cannot
>> read from this location (keytab and ccache). Writing it out to the
>> filesystem then becomes a possibility. This is essentially the same way
>> Spark solves it (
>> https://spark.apache.org/docs/latest/security.html#yarn-mode 
>> <https://spark.apache.org/docs/latest/security.html#yarn-mode> <
>> https://spark.apache.org/docs/latest/security.html#yarn-mode 
>> <https://spark.apache.org/docs/latest/security.html#yarn-mode>>).
>> 
>> Why not work on this together? We need it as well. Airflow as it is now is
>> what we consider the biggest security threat, and it is really hard to secure.
>> The above would definitely be a serious improvement. Another step would be
>> to stop Tasks from accessing the Airflow DB all together.
>> 
>> Cheers
>> Bolke
>> 
>>> On 29 Jul 2018, at 05:36, Dan Davydov >> <mailto:ddavy...@twitter.com.INVALID>>
>> wrote:
>>> 
>>> This makes sense,

Re: Kerberos and Airflow

2018-07-29 Thread Bolke de Bruin
I’m not sure what you mean. The example I created allows for dynamic DAGs,
as the scheduler obviously knows about the tasks when they are ready to be 
scheduled.
This isn’t any different from a static DAG or a dynamic one.

For Kerberos it isn't that special. Basically a keytab is the revocable user's 
credentials
in a special format. The keytab itself can be protected by a password. So I can 
imagine
that a connection is defined that sets a keytab location and password to access 
the keytab.
The scheduler understands this (or maybe the Connection model) and serializes 
and sends
it to the worker as part of the metadata. The worker then reconstructs the 
keytab and issues 
a kinit or supplies it to the other service requiring it (eg. Spark)

* Obviously the worker and scheduler need to communicate over SSL. 
* There is a challenge at the worker level. Credentials are secured against 
other users, but are readable by the owning user. So imagine 2 DAGs from two 
different users with different connections without sudo configured. If they end 
up at the same worker and DAG 2 is malicious, it could read files and memory 
created by DAG 1. This is the reason why using environment variables is NOT 
safe (DAG 2 could read /proc/<pid>/environ). To mitigate this we probably need 
to PIPE the data to the task’s STDIN (see the sketch after this list). It won’t 
solve the issue but will make it harder as now it will only be in memory.
* The reconstructed keytab (or the initialized version) can be stored in, most 
likely, the process-keyring 
(http://man7.org/linux/man-pages/man7/process-keyring.7.html 
<http://man7.org/linux/man-pages/man7/process-keyring.7.html>). As mentioned 
earlier this poses a challenge for Java applications that cannot read from this 
location (keytab and ccache). Writing it out to the filesystem then becomes a 
possibility. This is essentially the same way Spark solves it 
(https://spark.apache.org/docs/latest/security.html#yarn-mode 
<https://spark.apache.org/docs/latest/security.html#yarn-mode>). 
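
On the STDIN piping mentioned in the list above, a minimal sketch:

import json
import subprocess

def launch_with_secrets(cmd, conn_metadata):
    # hand the credentials over stdin so they never appear in the environment
    # (and therefore not in /proc/<pid>/environ)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    proc.stdin.write(json.dumps(conn_metadata).encode("utf-8"))
    proc.stdin.close()
    return proc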

Why not work on this together? We need it as well. Airflow as it is now is what 
we consider the biggest security threat, and it is really hard to secure. The 
above would definitely be a serious improvement. Another step would be to stop 
Tasks from accessing the Airflow DB altogether.

Cheers
Bolke

> On 29 Jul 2018, at 05:36, Dan Davydov  wrote:
> 
> This makes sense, and thanks for putting this together. I might pick this
> up myself depending on if we can get the rest of the multi-tenancy story
> nailed down, but I still think the tricky part is figuring out how to allow
> dynamic DAGs (e.g. DAGs created from rows in a Mysql table) to work with
> Kerberos, curious what your thoughts are there. How would secrets be passed
> securely in a multi-tenant Scheduler starting from parsing the DAGs up to
> the executor sending them off?
> 
> On Sat, Jul 28, 2018 at 5:07 PM Bolke de Bruin  <mailto:bdbr...@gmail.com>> wrote:
> 
>> Here:
>> 
>> https://github.com/bolkedebruin/airflow/tree/secure_connections 
>> <https://github.com/bolkedebruin/airflow/tree/secure_connections> <
>> https://github.com/bolkedebruin/airflow/tree/secure_connections 
>> <https://github.com/bolkedebruin/airflow/tree/secure_connections>>
>> 
>> Is a working rudimentary implementation that allows securing the
>> connections (only LocalExecutor at the moment)
>> 
>> * It enforces the use of “conn_id” instead of the mix that we have now
>> * A task using “conn_id” has its connections ‘auto-registered’ (which is a
>> no-op)
>> * The scheduler reads the connection information and serializes it to
>> json (which should be a different format, protobuf preferably)
>> * The scheduler then sends this info to the executor
>> * The executor puts this in the environment of the task (environment most
>> likely not secure enough for us)
>> * The BaseHook reads out this environment variable and does not need to
>> touch the database
>> 
>> The example_http_operator works, I haven't tested any others. To make it
>> work I just adjusted the hook and operator to use “conn_id” instead
>> of the non-standard http_conn_id.
>> 
>> Makes sense?
>> 
>> B.
>> 
>> * The BaseHook is adjusted to not connect to the database
>>> On 28 Jul 2018, at 17:50, Bolke de Bruin  wrote:
>>> 
>>> Well, I don’t think a hook (or task) should obtain it by itself. It
>> should be supplied.
>>> At the moment you start executing the task you cannot trust it anymore
>> (i.e. it is unmanaged
>>> / non-airflow code).
>>> 
>>> So we could change the basehook to understand supplied credentials and
>> populate
>>> a hash with “conn_ids”. Hooks normally call BaseHook.get_connectio

Re: Kerberos and Airflow

2018-07-28 Thread Bolke de Bruin
Here:

https://github.com/bolkedebruin/airflow/tree/secure_connections 
<https://github.com/bolkedebruin/airflow/tree/secure_connections>

Is a working rudimentary implementation that allows securing the connections 
(only LocalExecutor at the moment)

* It enforces the use of “conn_id” instead of the mix that we have now
* A task using “conn_id” has its connections ‘auto-registered’ (which is a no-op)
* The scheduler reads the connection information and serializes it to json 
(which should be a different format, protobuf preferably)
* The scheduler then sends this info to the executor
* The executor puts this in the environment of the task (environment most 
likely not secure enough for us)
* The BaseHook reads out this environment variable and does not need to touch 
the database
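
On the last two steps, the hook side boils down to something like this (a 
sketch; the environment variable name is illustrative, not final):

import json
import os

def get_supplied_connection(conn_id):
    # read the serialized connections the executor put in the environment;
    # the variable name here is purely illustrative
    raw = os.environ.get("AIRFLOW__CONNECTIONS")
    if raw:
        return json.loads(raw).get(conn_id)
    return None  # caller falls back to the metadata DB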

The example_http_operator works, I haven't tested any others. To make it work I 
just adjusted the hook and operator to use “conn_id” instead 
of the non-standard http_conn_id.

Makes sense? 

B.

* The BaseHook is adjusted to not connect to the database
> On 28 Jul 2018, at 17:50, Bolke de Bruin  wrote:
> 
> Well, I don’t think a hook (or task) should obtain it by itself. It should 
> be supplied.
> At the moment you start executing the task you cannot trust it anymore (i.e. 
> it is unmanaged 
> / non-airflow code).
> 
> So we could change the basehook to understand supplied credentials and 
> populate
> a hash with “conn_ids”. Hooks normally call BaseHook.get_connection anyway, so
> it shouldn't be too hard and should in principle not require changes to the 
> hooks
> themselves if they are well behaved.
> 
> B.
> 
>> On 28 Jul 2018, at 17:41, Dan Davydov > <mailto:ddavy...@twitter.com.INVALID>> wrote:
>> 
>> *So basically in the scheduler we parse the dag. Either from the manifest
>> (new) or from smart parsing (probably harder, maybe some auto register?) we
>> know what connections and keytabs are available dag wide or per task.*
>> This is the hard part that I was curious about, for dynamically created
>> DAGs, e.g. those generated by reading tasks in a MySQL database or a json
>> file, there isn't a great way to do this.
>> 
>> I 100% agree with deprecating the connections table (at least for the
>> secure option). The main work there is rewriting all hooks to take
>> credentials from arbitrary data sources by allowing a customized
>> CredentialsReader class. Although hooks are technically private, I think a
>> lot of companies depend on them so the PMC should probably discuss if this
>> is an Airflow 2.0 change or not.
>> 
>> On Fri, Jul 27, 2018 at 5:24 PM Bolke de Bruin > <mailto:bdbr...@gmail.com>> wrote:
>> 
>>> Sure. In general I consider keytabs as a part of connection information.
>>> Connections should be secured by sending the connection information a task
>>> needs as part of information the executor gets. A task should then not need
>>> access to the connection table in Airflow. Keytabs could then be send as
>>> part of the connection information (base64 encoded) and setup by the
>>> executor (this key) to be read only to the task it is launching.
>>> 
>>> So basically in the scheduler we parse the dag. Either from the manifest
>>> (new) or from smart parsing (probably harder, maybe some auto register?) we
>>> know what connections and keytabs are available dag wide or per task.
>>> 
>>> The credentials and connection information then are serialized into a
>>> protobuf message and sent to the executor as part of the “queue” action.
>>> The worker then deserializes the information and makes it securely
>>> available to the task (which is quite hard btw).
>>> 
>>> On that last bit making the info securely available might be storing it in
>>> the Linux KEYRING (supported by python keyring). Keytabs will be tough to
>>> do properly due to Java not properly supporting KEYRING and only files and
>>> these are hard to make secure (due to the possibility a process will list
>>> all files in /tmp and get credentials through that). Maybe storing the
>>> keytab with a password and having the password in the KEYRING might work.
>>> Something to find out.
>>> 
>>> B.
>>> 
>>> Sent from my iPad
>>> 
>>>> On 27 Jul 2018, at 22:04, Dan Davydov wrote:
>>>> 
>>>> I'm curious if you had any ideas in terms of how to enable
>>> multi-tenancy
>>>> with respect to Kerberos in Airflow.
>>>> 
>>>>>

Re: Kerberos and Airflow

2018-07-28 Thread Bolke de Bruin
Well, I don’t think a hook (or task) should obtain it by itself. It should 
be supplied.
At the moment you start executing the task you cannot trust it anymore (i.e. it 
is unmanaged 
/ non-airflow code).

So we could change the basehook to understand supplied credentials and populate
a hash with “conn_ids”. Hooks normally call BaseHook.get_connection anyway, so
it shouldn't be too hard and should in principle not require changes to the hooks
themselves if they are well behaved.
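
A sketch of what that could look like (illustrative only, not the actual 
BaseHook):

class BaseHook(object):
    # illustrative stand-in, not airflow.hooks.base_hook.BaseHook;
    # conn_id -> connection dict, populated by the executor before the task runs
    _supplied_connections = {}

    @classmethod
    def get_connection(cls, conn_id):
        # well behaved hooks all come through here, so no per-hook changes needed
        try:
            return cls._supplied_connections[conn_id]
        except KeyError:
            raise KeyError("connection %s was not supplied to this task" % conn_id)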

B.

> On 28 Jul 2018, at 17:41, Dan Davydov  wrote:
> 
> *So basically in the scheduler we parse the dag. Either from the manifest
> (new) or from smart parsing (probably harder, maybe some auto register?) we
> know what connections and keytabs are available dag wide or per task.*
> This is the hard part that I was curious about, for dynamically created
> DAGs, e.g. those generated by reading tasks in a MySQL database or a json
> file, there isn't a great way to do this.
> 
> I 100% agree with deprecating the connections table (at least for the
> secure option). The main work there is rewriting all hooks to take
> credentials from arbitrary data sources by allowing a customized
> CredentialsReader class. Although hooks are technically private, I think a
> lot of companies depend on them so the PMC should probably discuss if this
> is an Airflow 2.0 change or not.
> 
> On Fri, Jul 27, 2018 at 5:24 PM Bolke de Bruin  wrote:
> 
>> Sure. In general I consider keytabs as a part of connection information.
>> Connections should be secured by sending the connection information a task
>> needs as part of information the executor gets. A task should then not need
>> access to the connection table in Airflow. Keytabs could then be sent as
>> part of the connection information (base64 encoded) and set up by the
>> executor (this key) to be read only to the task it is launching.
>> 
>> So basically in the scheduler we parse the dag. Either from the manifest
>> (new) or from smart parsing (probably harder, maybe some auto register?) we
>> know what connections and keytabs are available dag wide or per task.
>> 
>> The credentials and connection information then are serialized into a
>> protobuf message and sent to the executor as part of the “queue” action.
>> The worker then deserializes the information and makes it securely
>> available to the task (which is quite hard btw).
>> 
>> On that last bit making the info securely available might be storing it in
>> the Linux KEYRING (supported by python keyring). Keytabs will be tough to
>> do properly due to Java not properly supporting KEYRING and only files and
>> these are hard to make secure (due to the possibility a process will list
>> all files in /tmp and get credentials through that). Maybe storing the
>> keytab with a password and having the password in the KEYRING might work.
>> Something to find out.
>> 
>> B.
>> 
>> Sent from my iPad
>> 
>>> On 27 Jul 2018, at 22:04, Dan Davydov wrote:
>>> 
>>> I'm curious if you had any ideas in terms of how to enable
>> multi-tenancy
>>> with respect to Kerberos in Airflow.
>>> 
>>>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin 
>> wrote:
>>>> 
>>>> Cool. The doc will need some refinement as it isn't entirely accurate.
>> In
>>>> addition we need to separate between Airflow as a client of kerberized
>>>> services (this is what is talked about in the astronomer doc) vs
>>>> kerberizing airflow itself, which the API supports.
>>>> 
>>>> In general to access kerberized services (airflow as a client) one needs
>>>> to start the ticket renewer with a valid keytab. For the hooks it isn't
>>>> always required to change the hook to support it. Hadoop cli tools often
>>>> just pick it up as their client config is set to do so. Then another
>> class
>>>> is there for HTTP-like services which are accessed by urllib under the
>>>> hood, these typically use SPNEGO. These often need to be adjusted as it
>>>> requires some urllib config. Finally, there are protocols which use SASL
>>>> with kerberos. Like HDFS (not webhdfs, that uses SPNEGO). These require
>> per
>>>> protocol implementations.
>>>> 
>>>> From the top of my head we support kerberos client side now with:
>>>> 
>>>> * Spark
>>>> * HDFS (snakebite python 2.7, cli and with the upcoming libhdfs
>>>> implementation)
>>>> * Hive (not metastore afaik)
>>>> 
>>>> Two things to 

[REVOKE][VOTE] Release Airflow 1.10.0

2018-07-28 Thread Bolke de Bruin
FYI,

Revoking the vote to address (mostly) license issues.

Cheers
Bolke

> Begin forwarded message:
> 
> From: Bolke de Bruin 
> Subject: Re: [VOTE] Release Airflow 1.10.0
> Date: 28 July 2018 at 12:52:28 CEST
> To: gene...@incubator.apache.org
> 
> Hi Justin,
> 
> Thanks for the feedback. The holiday period seems to make gathering +1s a bit 
> slow. We are going to address the issues you and Sebb mentioned and put it up 
> for a revote at the PMC, which should be relatively quick as they don't 
> consider functional changes. For the potential GPL issue I am going to see if 
> we can set NON_GPL by default from the setup while I will also re-open the 
> legal issue to see if there is a more operations friendly way.
> 
> Cheers
> Bolke
> 
>> On 24 Jul 2018, at 23:23, Justin Mclean  wrote:
>> 
>> HI,
>> 
>>> On the GPL dependency you mentioned. We are not distributing GPL sources, 
>>> not in source or in binary form. This has never been the case.
>> 
>> Which is fine. There are two issues with GPL (Category X software):
>> - you can’t distribute them [1]
>> - Can you rely on them [2]
>> 
>> It’s [2] that seems to be the issue here. Optional dependencies on Category 
>> X are allowed but I’m really not sure in this case that it is truly optional.
>> 
>>> As to our solution (for now). Python packages are often installed site-wide 
>>> and can be part of the dependencies of other packages. While we maybe could 
>>> enforce the installation of the non-GPL API it would/could 1) interfere 
>>> with other packages on the same system that do not set this environment 
>>> variable explicitly. 2) If any the other packages upgrades without setting 
>>> this variable it would pull in the GPL API. So we decided that it would be 
>>> better to educate the user and make it part of the install instructions.
>>> 
>>> We can reconsider, but we cannot solve #1 and #2. Which, in my opinion, 
>>> would make it more opaque to the users. 
>> 
>> IMO at the very least user should be informed that this is the case  and 
>> loudly and possibly with a prompt as part of the build and install process 
>> so that they understand that what they are using may not be under the terms 
>> of the ALv2 as claimed on the cover.
>> 
>>> Given the current situation is at least improvement over the old situation 
>>> can you reconsider your -1 for this release and preferably agree with our 
>>> approach (or maybe have an improvement over it)?
>> 
>> I would suggest you reopen the legal JIRA and describe the current situation 
>> (like above) and see if an answer can be found.
>> 
>> Other IPMC member (and you mentors) can vote on this release and if it gets 
>> 3 +1’s and more plus ones than -1s then it’s a release. Remember a -1 vote 
>> on a release is not a veto.
>> 
>> Thanks,
>> Justin
>> -
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>> 
> 



Re: Kerberos and Airflow

2018-07-27 Thread Bolke de Bruin
Sure. In general I consider keytabs as a part of connection information. 
Connections should be secured by sending the connection information a task 
needs as part of information the executor gets. A task should then not need 
access to the connection table in Airflow. Keytabs could then be send as part 
of the connection information (base64 encoded) and setup by the executor (this 
key) to be read only to the task it is launching.

So basically in the scheduler we parse the dag. Either from the manifest (new) 
or from smart parsing (probably harder, maybe some auto register?) we know what 
connections and keytabs are available dag wide or per task. 

The credentials and connection information then are serialized into a protobuf 
message and send to the executor as part of the “queue” action. The worker then 
deserializes the information and makes it securely available to the task (which 
is quite hard btw).

On that last bit making the info securely available might be storing it in the 
Linux KEYRING (supported by python keyring). Keytabs will be tough to do 
properly due to Java not properly supporting KEYRING and only files and these 
are hard to make secure (due to the possibility a process will list all files 
in /tmp and get credentials through that). Maybe storing the keytab with a 
password and having the password in the KEYRING might work. Something to find 
out.

B.

Verstuurd vanaf mijn iPad

> Op 27 jul. 2018 om 22:04 heeft Dan Davydov  het 
> volgende geschreven:
> 
> I'm curious if you had any ideas in terms of ideas to enable multi-tenancy
> with respect to Kerberos in Airflow.
> 
>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin  wrote:
>> 
>> Cool. The doc will need some refinement as it isn't entirely accurate. In
>> addition we need to separate between Airflow as a client of kerberized
>> services (this is what is talked about in the astronomer doc) vs
>> kerberizing airflow itself, which the API supports.
>> 
>> In general to access kerberized services (airflow as a client) one needs
>> to start the ticket renewer with a valid keytab. For the hooks it isn't
>> always required to change the hook to support it. Hadoop cli tools often
>> just pick it up as their client config is set to do so. Then another class
>> is there for HTTP-like services which are accessed by urllib under the
>> hood, these typically use SPNEGO. These often need to be adjusted as it
>> requires some urllib config. Finally, there are protocols which use SASL
>> with kerberos. Like HDFS (not webhdfs, that uses SPNEGO). These require per
>> protocol implementations.
>> 
>> From the top of my head we support kerberos client side now with:
>> 
>> * Spark
>> * HDFS (snakebite python 2.7, cli and with the upcoming libhdfs
>> implementation)
>> * Hive (not metastore afaik)
>> 
>> Two things to remember:
>> 
>> * If a job (ie. Spark job) will finish later than the maximum ticket
>> lifetime you probably need to provide a keytab to said application.
>> Otherwise you will get failures after the expiry.
>> * A keytab (used by the renewer) are credentials (user and pass) so jobs
>> are executed under the keytab in use at that moment
>> * Securing keytab in multi tenancy airflow is a challenge. This also goes
>> for securing connections. This we need to fix at some point. Solution for
>> now seems to be no multi tenancy.
>> 
>> Kerberos seems harder than it is btw. Still, we are sometimes moving away
>> from it to OAUTH2 based authentication. This gets use closer to cloud
>> standards (but we are on prem)
>> 
>> B.
>> 
>> Sent from my iPhone
>> 
>>> On 27 Jul 2018, at 17:41, Hitesh Shah  wrote:
>>> 
>>> Hi Taylor
>>> 
>>> +1 on upstreaming this. It would be great if you can submit a pull
>> request
>>> to enhance the apache airflow docs.
>>> 
>>> thanks
>>> Hitesh
>>> 
>>> 
>>>> On Thu, Jul 26, 2018 at 2:32 PM Taylor Edmiston 
>> wrote:
>>>> 
>>>> While we're on the topic, I'd love any feedback from Bolke or others
>> who've
>>>> used Kerberos with Airflow on this quick guide I put together yesterday.
>>>> It's similar to what's in the Airflow docs but instead all on one page
>>>> and slightly
>>>> expanded.
>>>> 
>>>> 
>>>> 
>> https://github.com/astronomerio/airflow-guides/blob/master/guides/kerberos.md
>>>> (or web version <https://www.astronomer.io/guides/kerberos/>)
>>>> 
>>>> One thing I'd like to add is a minimal example of how to Kerberize a
>> hoo

Re: Sep Airflow Bay Area Meetup @ Google

2018-07-24 Thread Bolke de Bruin
Great stuff Feng Lu!

Sent from my iPhone

> On 24 Jul 2018, at 21:49, Feng Lu  wrote:
> 
> The meetup event is now available for people to sign up:
> https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/253105418/.
> yay!
> 
> On Mon, Jul 23, 2018 at 3:40 PM Chris Riccomini 
> wrote:
> 
>> @Feng Lu   apparently you're listed as an EVENT
>> ORGANIZER on the group. I believe that should allow you to create meetups.
>> If not, can you let me know?
>> 
>>> On Sun, Jul 22, 2018 at 12:36 PM Ben Gregory  wrote:
>>> 
>>> Will do Feng!
>>> 
>>> Also - is there an approximate date we'll know if the hackathon is going
>>> to
>>> happen? Want to make sure we can get a good attendance internally.
>>> 
>>> Looking forward to it!
>>> 
 On Sat, Jul 21, 2018 at 10:44 AM Feng Lu  wrote:
 
 Sounds great, thank you Ben.
 When you get a chance, could you please send me your talk
 title/abstract/session type(regular or lightening)?
 
> On Fri, Jul 20, 2018 at 2:10 PM Ben Gregory  wrote:
> 
> Hey Feng!
> 
> Awesome to hear that you're hosting the next meetup! We'd love to give
>>> a
> talk (and potentially a lightning session if available) -- we have a
>>> number
> of topics we could speak on but off the top of our heads we're thinking
> "Running Cloud Native Airflow", tying in some of our work on the
>>> Kubernetes
> Executor. How does that sound?
> 
> Also, if there ends up being an Airflow hackathon, you can absolutely
> count us in. Let us know how we can help coordinate if the need
>>> presents
> itself!
> 
> -Ben
> 
> On Thu, Jul 19, 2018 at 3:26 PM Feng Lu 
> wrote:
> 
>> Hi all,
>> 
>> Hope you are enjoying your summer!
>> 
>> This is Feng Lu from Google and we'll host the next Airflow meetup in
>> our Sunnyvale
>> campus . We plan to
>>> add
>> a *lightening
>> session* this time for people to share their airflow ideas, work in
>> progress, pain points, etc.
>> Here's the meetup date and schedule:
>> 
>> -- Sep 24 (Monday)  --
>> 6:00PM meetup starts
>> 6:00 - 8:00PM light dinner /mix-n-mingle
>> 8:00PM - 9:40PM: 5 sessions (20 minutes each)
>> 9:40PM to 10:10PM: 6 lightening sessions (5 minutes each)
>> 10:10PM to 11:00PM: drinks and social hour
>> 
>> I've seen a lot of interesting discussions in the dev mailing-list on
>> security, scalability, event interactions, future directions, hosting
>> platform and others. Please feel free to send your talk proposal to
>>> us by
>> replying this email.
>> 
>> The Cloud Composer team is also going to share their experience
>>> running
>> Apache Airflow as a managed solution and service roadmap.
>> 
>> Thank you and looking forward to hearing from y'all soon!
>> 
>> p.s., if folks are interested, we can also add a one-day Airflow
>> hackathon
>> prior to the meet-up on the same day, please let us know.
>> 
>> Feng
>> 
> 
> 
> --
> 
> [image: Astronomer Logo] 
> 
> *Ben Gregory*
> Data Engineer
> 
> Mobile: +1-615-483-3653 • Online: astronomer.io
> 
> 
> Download our new ebook.  From
> Volume to Value - A Guide to Data Engineering.
> 
 
>>> 
>>> --
>>> 
>>> [image: Astronomer Logo] 
>>> 
>>> *Ben Gregory*
>>> Data Engineer
>>> 
>>> Mobile: +1-615-483-3653 • Online: astronomer.io <
>>> https://www.astronomer.io/>
>>> 
>>> Download our new ebook.  From
>>> Volume
>>> to Value - A Guide to Data Engineering.
>>> 
>> 


Re: Airflow talk at Munich Data Engineering Meetup

2018-07-24 Thread Bolke de Bruin
Hi Stefan,

Great stuff! Can you make sure to at least once use “Apache Airflow 
(incubating)” in the text, preferably at the beginning of the paragraph? That 
would be greatly appreciated.

Thanks
Bolke

Verstuurd vanaf mijn iPad

> Op 24 jul. 2018 om 07:54 heeft Stefan Seelmann  het 
> volgende geschreven:
> 
> Hi all,
> 
> I'll give a talk about Airflow at the next Data Engineering Meetup in
> Munich (Germany) on next Thursday the 26th. Maybe some folks from the
> Munich area are interested. Details at [1].
> 
> Kind Regards,
> Stefan
> 
> [1] https://www.meetup.com/data-engineering-munich/events/252170998/


Re: Airflow's JS code (and dependencies) manageable via npm and webpack

2018-07-23 Thread Bolke de Bruin
I think it should be removed now. 1.10.X should be the last release seri s that 
supports the old www. Do we need to vote on this?

Great work Verdan!

Verstuurd vanaf mijn iPad

> Op 23 jul. 2018 om 10:23 heeft Driesprong, Fokko  het 
> volgende geschreven:
> 
> ​Nice work Verdan.
> 
> The frontend really needed some love, thank you for picking this up. Maybe
> we should also think deprecating the old www. Keeping both of the UI's is
> something that takes a lot of time. Maybe after the release of 1.10 we can
> think of moving to Airflow 2.0, and removing the old UI.
> 
> 
> Cheers, Fokko​
> 
> 2018-07-23 10:02 GMT+02:00 Naik Kaxil :
> 
>> Awesome. Thanks @Verdan
>> 
>> On 23/07/2018, 07:58, "Verdan Mahmood"  wrote:
>> 
>>Heads-up!! This frontend change has been merged in master branch
>> recently.
>>This will impact the users working on Airflow RBAC UI only. That means:
>> 
>>*If you are a contributor/developer of Apache Airflow:*
>>You'll need to install and build the frontend packages if you want to
>> run
>>the web UI.
>>Please make sure to read the new section, "Setting up the node / npm
>>javascript environment"
>>> CONTRIBUTING.md#setting-up-the-node--npm-javascript-
>> environment-only-for-www_rbac>
>> 
>>in CONTRIBUTING.md
>> 
>>*If you are using Apache Airflow in your production environment:*
>>Nothing will impact you, as every new build of Apache Airflow will
>> come up
>>with pre-built dependencies.
>> 
>>Please let me know if you have any questions. Thank you
>> 
>>Best,
>>*Verdan Mahmood*
>> 
>> 
>>On Sun, Jul 15, 2018 at 6:52 PM Maxime Beauchemin <
>>maximebeauche...@gmail.com> wrote:
>> 
>>> Glad to see this is happening!
>>> 
>>> Max
>>> 
>>> On Mon, Jul 9, 2018 at 6:37 AM Ash Berlin-Taylor <
>>> ash_airflowl...@firemirror.com> wrote:
>>> 
 Great! Thanks for doing this. I've left some review comments on
>> your PR.
 
 -ash
 
> On 9 Jul 2018, at 11:45, Verdan Mahmood <
>> verdan.mahm...@gmail.com>
 wrote:
> 
> ​Hey Guys, ​
> 
> In an effort to simplify the JS dependencies of Airflow
> ​​
> ,
> ​I've
> introduce
> ​d​
> npm and webpack for the package management. For now, it only
>> implements
> this in the www_rbac version of the web server.
> ​
> 
> Pull Request: https://github.com/apache/
>> incubator-airflow/pull/3572
> 
> The problem with the
> ​existing ​
> frontend (
> ​JS
> ) code of Airflow is that most of the custom JS is written
> ​with​
> in the html files, using the Flask's (Jinja) variables in that
>> JS. The
 next
> step of this effort would be to extract that custom
> ​JS
> code in separate JS files
> ​,​
> use the dependencies in those files using require or import
> ​ and introduce the JS automated test suite eventually. ​
> (At the moment, I'm simply using the CopyWebPackPlugin to copy
>> the
 required
> dependencies for use)
> ​.
> 
> There are also some dependencies which are directly modified in
>> the
 codebase
> ​ or are outdated​
> . I couldn't found the
> ​ correct​
> npm versions of those libraries. (dagre-d3.js and
>> gantt-chart-d3v2.js).
> Apparently dagre-d3.js that we are using is one of the gist or
>> is very
 old
> version
> ​ not supported with webpack 4​
> , while the gantt-chart-d3v2 has been modified according to
>> Airflow's
> requirements
> ​ I believe​
> .
> ​ Used the existing libraries for now. ​
> 
> ​I am currently working in a separate branch to upgrade the
>> DagreD3
> library, and updating the custom JS related to DagreD3
>> accordingly. ​
> 
> This PR also introduces the pypi_push.sh
> <
 
>>> https://github.com/apache/incubator-airflow/pull/3572/files#diff-
>> 8fae684cdcc8cc8df2232c8df16f64cb
> 
> script that will generate all the JS statics before creating and
 uploading
> the package.
> ​
> ​Please let me know if you guys have any questions or
>> suggestions and
>>> I'd
> be happy to answer that. ​
> 
> Best,
> *Verdan Mahmood*
> (+31) 655 576 560
 
 
>>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Kaxil Naik
>> 
>> Data Reply
>> 2nd Floor, Nova South
>> 160 Victoria Street, Westminster
>> London SW1E 5LB - UK
>> phone: +44 (0)20 7730 6000
>> k.n...@reply.com
>> www.reply.com
>> 


Re: [VOTE] Release Airflow 1.10.0

2018-07-21 Thread Bolke de Bruin
Hi Justin,

Thank you for the thorough review! I have created AIRFLOW-2779 to track most of 
the issues you have raised. 

On the GPL dependency you mentioned. We are not distributing GPL sources, not 
in source or in binary form. This has never been the case. In the third degree 
there potentially was a GPL issue during runtime. The author of the package in 
question (unidecode) when asked mentioned several times that he considered the 
usage equal to an API (ie. like the Linux kernel exposing a set of generic 
calls) and the API could be implemented by an alternative. This was discussed 
in LEGAL-362, which you took part in.

We managed to convince the upstream package maintainers (python-slugify and 
python-nvd3) to allow a patch that allowed switching to a different API 
implementation by setting a environment variable while installing their 
packages and to release new versions. However it is not the default for them. 
This means at least that the situation we are now in is an improvement over the 
previous releases (1.8.0 -> 1.8.1 -> 1.8.2 -> 1.9.0) as there was no way switch 
and avoid the package before.

As to our solution (for now). Python packages are often installed site-wide and 
can be part of the dependencies of other packages. While we maybe could enforce 
the installation of the non-GPL API it would/could 1) interfere with other 
packages on the same system that do not set this environment variable 
explicitly. 2) If any the other packages upgrades without setting this variable 
it would pull in the GPL API. So we decided that it would be better to educate 
the user and make it part of the install instructions.

We can reconsider, but we cannot solve #1 and #2. Which, in my opinion, would 
make it more opaque to the users. 

Given the current situation is at least improvement over the old situation can 
you reconsider your -1 for this release and preferably agree with our approach 
(or maybe have an improvement over it)?   

Cheers
Bolke



> On 21 Jul 2018, at 03:03, Justin Mclean  wrote:
> 
> Hi,
> 
> -1 (binding) because of GPL dependancy
> 
> I checked the source release:
> - incubating in name
> - signatures and hash good but please remove md5 hashes and don’t publish then
> - DISCLAIMER exists
> - Year in NOTICE is not correct "2016 and onwards” isn’t valid as copyright 
> has an expiry date
> - NOTICE and LICENSE have a couple of minor issues (see below)
> - Several files look to have incorrect headers with copyright lines 
> [8][9][10] Are these actually 3rd party files?
> - No unexpected binary files
> - Failed to install, probably my set up. Would be nice to note python version 
> required and supported OS’s in INSTALL.
> 
> LICENSE is:
> - missing jQuery clock [3] and typeahead [4], as they are ALv2 it’s not 
> required to list them but it’s a good idea to do so.
> - missing the license for this [5]
> - this file [7] oddly has © 2016 GitHub, Inc.at the bottom of it
> 
> This files [1][2] seem to be 3rd party ALv2 licensed files that refers to a 
> NOTICE file, that information in that NOTICE file (at the very least the 
> copyright into) should be in your NOTICE file. This should also be noted in 
> LICENSE.
> 
> I also find it very odd that the GPL dependancy unidecode is opt out, rather 
> than opt in (ie the user has to do something to not get it) and that makes it 
> non optional IMO [6].  Can you explain why it was done this way and I’ll 
> consider changing my vote.
> 
> Thanks,
> Justin
> 
> 1. /airflow/security/utils.py
> 2. ./airflow/security/kerberos.py
> 3. ./airflow/www_rbac/static/jqClock.min.js
> 4. ./airflow/www/static/bootstrap3-typeahead.min.js
> 5. ./apache-airflow-1.10.0rc2+incubating/scripts/ci/flake8_diff.sh
> 6. https://www.apache.org/legal/resolved.html#optional
> 7. ./docs/license.rst
> 8. airflow/contrib/auth/backends/google_auth.py
> 9. /airflow/contrib/auth/backends/github_enterprise_auth.py
> 10. /airflow/contrib/hooks/ssh_hook.py
> 11. /airflow/minihivecluster.py
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
> 



Re: [VOTE] Release Airflow 1.10.0

2018-07-20 Thread Bolke de Bruin
It is just a convenience package and not part of the vote, but that argument 
seems mood. I will therefore remove the binary package.

B.

Verstuurd vanaf mijn iPad

> Op 20 jul. 2018 om 16:58 heeft Bertrand Delacretaz  
> het volgende geschreven:
> 
>> Le ven. 20 juil. 2018 à 16:15, sebb  a écrit :
>> 
>> A VOTE thread is about a particular set of artifacts; you cannot
>> change any of them without restarting the vote.
>> Otherwise the vote is invalid...
>> 
> 
> I agree with that.
> Bertrand


[RESULT][VOTE] Airflow 1.10.0rc2

2018-07-18 Thread Bolke de Bruin
Hello,

(Just making sure this works with the IPMC)

Apache Airflow (incubating) 1.10.0 (based on RC2) has been accepted.

4 “+1” binding votes received:

- Joy Gao (binding)
- Naik Kaxil (binding)
- Bolke de Bruin (binding)
- Fokko Driesprong (binding)

My next step is to open a thread with the IPMC.

Cheers,
Bolke

> On 18 Jul 2018, at 01:04, Joy Gao  wrote:
> 
> Tested this with new installment of Airflow in python 2.7/3.6, mainly
> focusing on:
> - simple DAGs running with local executor
> - log generation / rendering
> - fernet key generation on installment
> - rbac/non-rbac UI view rendering
> 
> Voting: +1 (binding)
> 
> 
> 
> 
> On Mon, Jul 16, 2018 at 1:20 AM, Naik Kaxil  wrote:
> 
>> I have run tests on python 2.7 and 3.5 and it works fine.
>> 
>> My vote: +1 (binding)
>> 
>> Thanks @bolke.
>> 
>> Regards,
>> K
>> 
>> On 16/07/2018, 09:03, "fo...@driesprongen.nl on behalf of Driesprong,
>> Fokko"  wrote:
>> 
>>My vote is ​+1 (binding)​
>> 
>>Cheers, Fokko
>> 
>>2018-07-16 9:52 GMT+02:00 Driesprong, Fokko :
>> 
>>> Awesome Bolke!
>>> 
>>> I don't have a production Airflow at hand right now, but I've ran
>> some
>>> simple tests against Python 2.7, 3.5, 3.6 and it all looked fine.
>>> 
>>> +1 From my side
>>> 
>>> Cheers, Fokko
>>> 
>>> 2018-07-15 20:05 GMT+02:00 Bolke de Bruin :
>>> 
>>>> Hey all,
>>>> 
>>>> I have cut Airflow 1.10.0 RC2. This email is calling a vote on the
>>>> release,
>>>> which will last for 72 hours. Consider this my (binding) +1.
>>>> 
>>>> Airflow 1.10.0 RC 2 is available at:
>>>> 
>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc2/
>> <
>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc2/
>>> 
>>>> 
>>>> apache-airflow-1.10.0rc2+incubating-source.tar.gz is a source
>> release
>>>> that
>>>> comes with INSTALL instructions.
>>>> apache-airflow-1.10.0rc2+incubating-bin.tar.gz is the binary Python
>>>> "sdist"
>>>> release.
>>>> 
>>>> Public keys are available at:
>>>> 
>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/>
>>>> 
>>>> The amount of JIRAs fixed is over 700. Please have a look at the
>>>> changelog.
>>>> Since RC2 the following has been fixed:
>>>> 
>>>> * [AIRFLOW-1729][AIRFLOW-2797][AIRFLOW-2729] Ignore whole
>> directories in
>>>> .airflowignore
>>>> * [AIRFLOW-2739] Always read default configuration files as utf-8
>>>> * [AIRFLOW-2752] Log using logging instead of stdout
>>>> * [AIRFLOW-1729][AIRFLOW-XXX] Remove extra debug log at info level
>>>> 
>>>> Please note that the version number excludes the `rcX` string as
>> well
>>>> as the "+incubating" string, so it's now simply 1.10.0. This will
>> allow us
>>>> to rename the artifact without modifying the artifact checksums
>> when we
>>>> actually release.
>>>> 
>>>> 
>>>> Cheers,
>>>> Bolke
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Kaxil Naik
>> 
>> Data Reply
>> 2nd Floor, Nova South
>> 160 Victoria Street, Westminster
>> London SW1E 5LB - UK
>> phone: +44 (0)20 7730 6000
>> k.n...@reply.com
>> www.reply.com
>> 



Re: [RESULT][VOTE] Airflow 1.10.0rc2

2018-07-18 Thread Bolke de Bruin
well my name is Bolke, but you get what I mean :P


> On 18 Jul 2018, at 20:14, Bolke de Bruin  wrote:
> 
> Hello,
> 
> Apache Airflow (incubating) 1.10.0 (based on RC2) has been accepted.
> 
> 4 “+1” binding votes received:
> 
> - Joy Gao (binding)
> - Naik Kaxil (binding)
> - Bolke de Bruin (binding)
> - Fokko Driesprong (binding)
> 
> My next step is to open a thread with the IPMC.
> 
> Cheers,
> Chris



[RESULT][VOTE] Airflow 1.10.0rc2

2018-07-18 Thread Bolke de Bruin
Hello,

Apache Airflow (incubating) 1.10.0 (based on RC2) has been accepted.

4 “+1” binding votes received:

- Joy Gao (binding)
- Naik Kaxil (binding)
- Bolke de Bruin (binding)
- Fokko Driesprong (binding)

My next step is to open a thread with the IPMC.

Cheers,
Chris

Re: Airflow support for kubernetes Exceutor

2018-07-18 Thread Bolke de Bruin
Yes it Will. 

Sent from my iPhone

> On 18 Jul 2018, at 10:19, ramandu...@gmail.com  wrote:
> 
> will 1.10 release have support for kubernetes Executor.
> 
> Thanks,
> Raman


  1   2   3   4   5   6   >