Re: programmatically creating and airflow quirks

2018-11-28 Thread soma dhavala
Great inputs, James. I was premature in saying we need micro-services. Any
solution should depend on the problem(s) being solved and the promise(s) being
made.

thanks,
-soma

> On Nov 28, 2018, at 11:24 PM, James Meickle  
> wrote:
> 
> I would be very interested in helping draft a rearchitecting AIP. Of
> course, that's a vague statement. I am interested in several specific areas
> of Airflow functionality that would be hard to modify without some
> refactoring taking place first:
> 
> 1) Improving Airflow's data model so it's easier to have functional data
> pipelines (such as addressing information propagation and artifacts via a
> non-xcom mechanism)
> 
> 2) Having point-in-timeness for DAGs: a concept of which revision of a DAG
> was in use at which date, represented in-Airflow.
> 
> 3) Better idioms and loading capabilities for DAG factories (either
> config-driven, or non-Python creation of DAGs, like with boundary-layer).
> 
> 4) Flexible execution dates: in finance we operate day over day, and have
> valid use cases for "t-1", "t+0", and "t+1" dates. The current execution
> date status is incredibly confusing for literally every developer we've
> brought onto Airflow (they understand it eventually but do make mistakes at
> first).
> 
> 5) Scheduler-integrated sensors
> 
> 6) Making Airflow more operator-friendly with better alerting, health
> checks, notifications, deploy-time configuration, etc.
> 
> 7) Improving testability of various components (both within the Airflow
> repo, as well as making it easier to test DAGs and plugins)
> 
> 8) Deprecating "newbie trap" or excess complexity features (like skips), by
> fixing their internal implementation or by providing alternatives that
> address their use cases in more sound ways.
> 
> To my mind, I would need Airflow to be more modular to accomplish several
> of those. Even if these aims don't happen in Airflow contrib (as some are
> quite contentious and have been discussed on this list before), it would
> currently be nearly impossible to maintain an in-house branch that
> attempted to implement them.
> 
> That being said, saying that it requires microservices is IMO incorrect.
> Airflow already scales quite well, so while it needs more modularization,
> we probably would see no benefit from immediately breaking those modules
> into independent services.
> 
> On Wed, Nov 28, 2018 at 11:38 AM Ash Berlin-Taylor  wrote:
> 
>> I have similar feelings around the "core" of Airflow and would _love_ to
>> somehow find time to spend a month really getting to grips with the
>> scheduler and the dagbag and see what comes to light with fresh eyes and
>> the benefits of hindsight.
>> 
>> Finding that time is going to be A Challenge though.
>> 
>> (Oh, except no to microservices. Airflow is hard enough to operate right
>> now without splitting things into even more daemons)
>> 
>> -ash
>>> On 26 Nov 2018, at 03:06, soma dhavala  wrote:
>>> 
>>> 
>>> 
>>>> On Nov 26, 2018, at 7:50 AM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>>>>
>>>> The historical reason is that people would check in scripts in the repo
>>>> that had actual compute or other forms of undesired effect in module scope
>>>> (scripts with no "if __name__ == '__main__':") and Airflow would just run
>>>> this script while seeking for DAGs. So we added this mitigation patch that
>>>> would confirm that there's something Airflow-related in the .py file. Not
>>>> elegant, and confusing at times, but it also probably prevented some issues
>>>> over the years.
>>>>
>>>> The solution here is to have a more explicit way of adding DAGs to the
>>>> DagBag (instead of the folder-crawling approach). The DagFetcher proposal
>>>> offers solutions around that, having a central "manifest" file that
>>>> provides explicit pointers to all DAGs in the environment.
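A minimal, stdlib-only sketch of the two discovery strategies contrasted above: crawling a folder with a cheap byte-level pre-filter (read each .py file, and treat it as a DAG candidate only if it mentions both "airflow" and "DAG") versus reading an explicit manifest. The marker strings, the function names and the one-path-per-line manifest format are assumptions for illustration; they are neither Airflow's internal code nor the DagFetcher proposal's actual format.

```python
import tempfile
from pathlib import Path
from typing import List

# Assumed markers, modeled on the heuristic described in the thread;
# Airflow's real check may differ by version.
REQUIRED_MARKERS = (b"airflow", b"DAG")


def might_contain_dag(path: Path) -> bool:
    """Return True if the file's raw bytes contain every marker.

    The file is only read, never imported, so module-level side effects in
    unrelated scripts are not executed by this check.
    """
    content = path.read_bytes()
    return all(marker in content for marker in REQUIRED_MARKERS)


def crawl_dag_folder(folder: Path) -> List[Path]:
    """Folder-crawling discovery: every .py file that passes the pre-filter."""
    return sorted(p for p in folder.rglob("*.py") if might_contain_dag(p))


def load_manifest(manifest: Path) -> List[Path]:
    """Explicit discovery from a hypothetical manifest: one DAG file path per
    line, relative to the manifest's directory; blank lines and '#' comments
    are ignored."""
    paths = []
    for raw in manifest.read_text().splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            paths.append(manifest.parent / line)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "etl.py").write_text("from airflow import DAG\n")
        (root / "script.py").write_text("print('expensive work at import time')\n")
        # Passes the pre-filter although it defines no DAG: a false positive.
        (root / "notes.py").write_text("# airflow DAG naming conventions\n")
        (root / "manifest.txt").write_text("# DAGs to load\netl.py\n")

        print([p.name for p in crawl_dag_folder(root)])  # ['etl.py', 'notes.py']
        print([p.name for p in load_manifest(root / "manifest.txt")])  # ['etl.py']
```

The notes.py file shows why a substring heuristic can be "confusing at times": it cannot tell a DAG definition from a comment, whereas a manifest loads only what it names.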
>>> 
>>> Some rebasing needs to happen. When I looked at the 1.8 code base almost a
>>> year ago, it felt more complex than necessary. What Airflow is trying
>>> to promise from an architectural standpoint was not clear to me. It
>>> is trying to do too many things, scattered in too many places, is the
>>> feeling I got. As a result, I stopped peeping, and just trust that it works
>>> (which it does, btw). I tend to think that Airflow outgrew its original
>>> intents. A sort of micro-services architecture has to be brought in. I may
>>> sound critical, but no offense. I truly appreciate the contributions.
>>>
>>>>
>>>> Max
>>>>
>>>> On Sat, Nov 24, 2018 at 5:04 PM Beau Barker wrote:
>>>>
>>>>> In my opinion this searching for dags is not ideal.
>>>>>
>>>>> We should be explicitly specifying the dags to load somewhere.
>>>>>
>>>>>
>>>>>> On 25 Nov 2018, at 10:41 am, Kevin Yang wrote:
>>>>>>
>>>>>> I believe that is mostly because we want to skip parsing/loading .py files
>>>>>> that don't contain DAG defs to save time, as scheduler is going to
>>>>>> parse/load the .py files over and 

Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-28 Thread Jakob Homan
I'll finish up the template at
http://incubator.apache.org/projects/airflow.html tomorrow or Friday
(I *think* you have to be an IPMC member to update it since it lives
in the Incubator SVN).  Looks like there's no actual work to do, just
marking stuff that has been done but not yet recorded, and verifying
some licenses.

-Jakob



On Wed, Nov 28, 2018 at 2:48 PM Tao Feng  wrote:
>
> Sorry, just saw Kaxil's latest email. Kaxil, is there anything else I could
> help with?
>
> Thanks,
> -Tao
>
> On Wed, Nov 28, 2018 at 2:40 PM Tao Feng  wrote:
>
> > I would like to help on the documentation. Let me take a look at it. I
> > will work with Kaxil on that.
> >
> > On Tue, Nov 27, 2018 at 12:39 PM Bolke de Bruin  wrote:
> >
> >> Hi Folks,
> >>
> >> Thanks all for your responses and particularly Stefan for his suggestion
> >> to use the generic Apache way to handle security issues. This seems to be
> >> an accepted way for more projects, so I have added this to the maturity
> >> evaluation[1] and marked is as resolved. While handling the GPL library can
> >> be nicer we are already in compliance with CD30, so @Fokko and @Ash if you
> >> want to help out towards graduation please spend your time elsewhere like
> >> fixing CO50. This means adding a page to confluence that describes how to
> >> become a committer on the project. As we are following Apache many examples
> >> of other projects are around[2]
> >>
> >> Then there is the paperwork[3] as referred to by Jakob. This mainly
> >> concerns filling in some items, maybe here and there creation some
> >> documentation but I don't think much. @Kaxil, @Tao: are you willing to pick
> >> this up? @Sid can you share how to edit that page?
> >>
> >> If we have resolved these items in my opinion we can start the voting
> >> here and at the IPMC thereafter, targeting the board meeting of January for
> >> graduation. How’s that for a New Year’s resolution?
> >>
> >> Cheers!
> >> Bolke
> >>
> >> P.S. Would it be nice to have updated graduation web page? Maybe one of
> >> the contributors/community members likes to take a stab at this[4]
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation <
> >> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation>
> >> [2] https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer
> >> 
> >> [3] http://incubator.apache.org/projects/airflow.html <
> >> http://incubator.apache.org/projects/airflow.html>
> >> [4] https://airflow.apache.org/ 
> >>
> >>
> >>
> >> > On 27 Nov 2018, at 16:32, Driesprong, Fokko 
> >> wrote:
> >> >
> >> > +1 from my side. Would be awesome to graduate Airflow
> >> >
> >> > If time allows, I'll also dive into CD30.
> >> >
> >> > Cheers, Fokko
> >> >
> > >> > On Tue, 27 Nov 2018 at 16:21, Ash Berlin-Taylor wrote:
> >> >
> >> >> Oarsome Bolke, thanks for starting this.
> >> >>
> >> >> It looks like we are closer than I thought!
> >> >>
> >> >> We can use those security lists (though having our own would be nice) -
> >> >> either way we will need to make this prominent in the docs.
> >> >>
> >> >> Couple of points
> >> >>
> >> >> CS10: that github link is only visible to members of the team
> >> >>
> >> >> CD30: probably good as it is, we may want to do
> >> >> https://issues.apache.org/jira/browse/AIRFLOW-3400 <
> >> >> https://issues.apache.org/jira/browse/AIRFLOW-3400> to remove the last
> > >> >> niggle of the GPL env var at install time (but not a hard requirement,
> > >> >> just nice)
> >> >>
> >> >> -ash
> >> >>
> >> >>> On 26 Nov 2018, at 21:10, Stefan Seelmann 
> >> >> wrote:
> >> >>>
> >> >>> I agree that Apache Airflow should graduate.
> >> >>>
> > >> >>> I'm only involved since beginning of this year, but the project did two
> > >> >>> releases during that time, once TLP releasing becomes easier :)
> >> >>>
> >> >>> Regarding QU30 you may consider to use the ASF wide security mailing
> >> >>> list [3] and process [4].
> >> >>>
> >> >>> Kind Regards,
> >> >>> Stefan
> >> >>>
> >> >>> [3] https://www.apache.org/security/
> >> >>> [4] https://www.apache.org/security/committers.html
> >> >>>
> >> >>>
> > >> >>> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
> > >> >>>> Ping!
> > >> >>>>
> > >> >>>> Sent from my iPhone
> > >> >>>>
> > >> >>>>> On 24 Nov 2018, at 12:57, Bolke de Bruin wrote:
> > >> >>>>>
> > >> >>>>> Hi All,
> > >> >>>>>
> > >> >>>>> With the Apache Airflow community healthy and growing, I think now
> > >> >>>>> would be a good time to discuss where we stand regarding to graduation
> > >> >>>>> from the Incubator, and what requirements remains.
> > >> >>>>>
> > >> >>>>> Apache Airflow entered incubation around 2 years ago, since then, the
> > >> >>>>> Airflow community learned a lot about how to do things in Apache ways.
> > >> >>>>> Now we are a very helpful and engaged community,
> > >> >>>>> ready to help on all questions from the Airflow 

Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-28 Thread Tao Feng
Sorry, just saw Kaxil's latest email. Kaxil, is there anything else I could
help with?

Thanks,
-Tao

On Wed, Nov 28, 2018 at 2:40 PM Tao Feng  wrote:

> I would like to help on the documentation. Let me take a look at it. I
> will work with Kaxil on that.
>
> On Tue, Nov 27, 2018 at 12:39 PM Bolke de Bruin  wrote:
>
>> Hi Folks,
>>
>> Thanks all for your responses and particularly Stefan for his suggestion
>> to use the generic Apache way to handle security issues. This seems to be
>> an accepted way for more projects, so I have added this to the maturity
>> evaluation[1] and marked is as resolved. While handling the GPL library can
>> be nicer we are already in compliance with CD30, so @Fokko and @Ash if you
>> want to help out towards graduation please spend your time elsewhere like
>> fixing CO50. This means adding a page to confluence that describes how to
>> become a committer on the project. As we are following Apache many examples
>> of other projects are around[2]
>>
>> Then there is the paperwork[3] as referred to by Jakob. This mainly
>> concerns filling in some items, maybe here and there creation some
>> documentation but I don't think much. @Kaxil, @Tao: are you willing to pick
>> this up? @Sid can you share how to edit that page?
>>
>> If we have resolved these items in my opinion we can start the voting
>> here and at the IPMC thereafter, targeting the board meeting of January for
>> graduation. How’s that for a New Year’s resolution?
>>
>> Cheers!
>> Bolke
>>
>> P.S. Would it be nice to have updated graduation web page? Maybe one of
>> the contributors/community members likes to take a stab at this[4]
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation <
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation>
>> [2] https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer
>> 
>> [3] http://incubator.apache.org/projects/airflow.html <
>> http://incubator.apache.org/projects/airflow.html>
>> [4] https://airflow.apache.org/ 
>>
>>
>>
>> > On 27 Nov 2018, at 16:32, Driesprong, Fokko 
>> wrote:
>> >
>> > +1 from my side. Would be awesome to graduate Airflow
>> >
>> > If time allows, I'll also dive into CD30.
>> >
>> > Cheers, Fokko
>> >
>> > On Tue, 27 Nov 2018 at 16:21, Ash Berlin-Taylor wrote:
>> >
>> >> Oarsome Bolke, thanks for starting this.
>> >>
>> >> It looks like we are closer than I thought!
>> >>
>> >> We can use those security lists (though having our own would be nice) -
>> >> either way we will need to make this prominent in the docs.
>> >>
>> >> Couple of points
>> >>
>> >> CS10: that github link is only visible to members of the team
>> >>
>> >> CD30: probably good as it is, we may want to do
>> >> https://issues.apache.org/jira/browse/AIRFLOW-3400 <
>> >> https://issues.apache.org/jira/browse/AIRFLOW-3400> to remove the last
>> >> niggle of the GPL env var at install time (but not a hard requirement,
>> >> just nice)
>> >>
>> >> -ash
>> >>
>> >>> On 26 Nov 2018, at 21:10, Stefan Seelmann 
>> >> wrote:
>> >>>
>> >>> I agree that Apache Airflow should graduate.
>> >>>
>> >>> I'm only involved since beginning of this year, but the project did two
>> >>> releases during that time, once TLP releasing becomes easier :)
>> >>>
>> >>> Regarding QU30 you may consider to use the ASF wide security mailing
>> >>> list [3] and process [4].
>> >>>
>> >>> Kind Regards,
>> >>> Stefan
>> >>>
>> >>> [3] https://www.apache.org/security/
>> >>> [4] https://www.apache.org/security/committers.html
>> >>>
>> >>>
>> >>> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
>> >>>> Ping!
>> >>>>
>> >>>> Sent from my iPhone
>> >>>>
>> >>>>> On 24 Nov 2018, at 12:57, Bolke de Bruin wrote:
>> >>>>>
>> >>>>> Hi All,
>> >>>>>
>> >>>>> With the Apache Airflow community healthy and growing, I think now
>> >>>>> would be a good time to discuss where we stand regarding to graduation
>> >>>>> from the Incubator, and what requirements remains.
>> >>>>>
>> >>>>> Apache Airflow entered incubation around 2 years ago, since then, the
>> >>>>> Airflow community learned a lot about how to do things in Apache ways.
>> >>>>> Now we are a very helpful and engaged community, ready to help on all
>> >>>>> questions from the Airflow community. We delivered multiple releases
>> >>>>> that have been increasing in quality ever since, now we can do
>> >>>>> self-driving releases in good cadence.
>> >>>>>
>> >>>>> The community is growing, new committers and PPMC members keep
>> >>>>> joining. We addressed almost all the maturity issues stipulated by
>> >>>>> Apache Project Maturity Model [1]. So final requirements remain, but
>> >>>>> those just need a final nudge. Committers and contributors are invited
>> >>>>> to verify the list and pick up the last bits (QU30, CO50). Finally
>> >>>>> (yahoo!) all the License and IP issues we can see got 

Call for fixes for Airflow 1.10.2

2018-11-28 Thread Kaxil Naik
Hi everyone,

I'm starting the process of gathering fixes for 1.10.2. So far, the issues I
think we should pull in are the ones returned by this JIRA query
(project = AIRFLOW AND status = Resolved AND fixVersion = 1.10.2):
https://issues.apache.org/jira/browse/AIRFLOW-3384?jql=project%20%3D%20AIRFLOW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%201.10.2

I will start pushing these as cherry-picked commits to the v1-10-test
branch today.

Kaxil Naik
Big Data Consultant @ Data Reply UK
Certified Google Cloud Data Engineer | Certified Apache Spark & Neo4j
Developer
Phone: +44 (0) 74820 88992
*LinkedIn*: https://www.linkedin.com/in/kaxil


Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-28 Thread Tao Feng
I would like to help on the documentation. Let me take a look at it. I will
work with Kaxil on that.

On Tue, Nov 27, 2018 at 12:39 PM Bolke de Bruin  wrote:

> Hi Folks,
>
> Thanks all for your responses and particularly Stefan for his suggestion
> to use the generic Apache way to handle security issues. This seems to be
> an accepted way for more projects, so I have added this to the maturity
> evaluation[1] and marked is as resolved. While handling the GPL library can
> be nicer we are already in compliance with CD30, so @Fokko and @Ash if you
> want to help out towards graduation please spend your time elsewhere like
> fixing CO50. This means adding a page to confluence that describes how to
> become a committer on the project. As we are following Apache many examples
> of other projects are around[2]
>
> Then there is the paperwork[3] as referred to by Jakob. This mainly
> concerns filling in some items, maybe here and there creation some
> documentation but I don't think much. @Kaxil, @Tao: are you willing to pick
> this up? @Sid can you share how to edit that page?
>
> If we have resolved these items in my opinion we can start the voting here
> and at the IPMC thereafter, targeting the board meeting of January for
> graduation. How’s that for a New Year’s resolution?
>
> Cheers!
> Bolke
>
> P.S. Would it be nice to have updated graduation web page? Maybe one of
> the contributors/community members likes to take a stab at this[4]
>
> [1]
> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation <
> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation>
> [2] https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer
> 
> [3] http://incubator.apache.org/projects/airflow.html <
> http://incubator.apache.org/projects/airflow.html>
> [4] https://airflow.apache.org/ 
>
>
>
> > On 27 Nov 2018, at 16:32, Driesprong, Fokko 
> wrote:
> >
> > +1 from my side. Would be awesome to graduate Airflow
> >
> > If time allows, I'll also dive into CD30.
> >
> > Cheers, Fokko
> >
> > On Tue, 27 Nov 2018 at 16:21, Ash Berlin-Taylor wrote:
> >
> >> Oarsome Bolke, thanks for starting this.
> >>
> >> It looks like we are closer than I thought!
> >>
> >> We can use those security lists (though having our own would be nice) -
> >> either way we will need to make this prominent in the docs.
> >>
> >> Couple of points
> >>
> >> CS10: that github link is only visible to members of the team
> >>
> >> CD30: probably good as it is, we may want to do
> >> https://issues.apache.org/jira/browse/AIRFLOW-3400 <
> >> https://issues.apache.org/jira/browse/AIRFLOW-3400> to remove the last
> >> niggle of the GPL env var at install time (but not a hard requirement,
> >> just nice)
> >>
> >> -ash
> >>
> >>> On 26 Nov 2018, at 21:10, Stefan Seelmann 
> >> wrote:
> >>>
> >>> I agree that Apache Airflow should graduate.
> >>>
> >>> I'm only involved since beginning of this year, but the project did two
> >>> releases during that time, once TLP releasing becomes easier :)
> >>>
> >>> Regarding QU30 you may consider to use the ASF wide security mailing
> >>> list [3] and process [4].
> >>>
> >>> Kind Regards,
> >>> Stefan
> >>>
> >>> [3] https://www.apache.org/security/
> >>> [4] https://www.apache.org/security/committers.html
> >>>
> >>>
> >>> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
> >>>> Ping!
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>> On 24 Nov 2018, at 12:57, Bolke de Bruin wrote:
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> With the Apache Airflow community healthy and growing, I think now
> >>>>> would be a good time to discuss where we stand regarding to graduation
> >>>>> from the Incubator, and what requirements remains.
> >>>>>
> >>>>> Apache Airflow entered incubation around 2 years ago, since then, the
> >>>>> Airflow community learned a lot about how to do things in Apache ways.
> >>>>> Now we are a very helpful and engaged community, ready to help on all
> >>>>> questions from the Airflow community. We delivered multiple releases
> >>>>> that have been increasing in quality ever since, now we can do
> >>>>> self-driving releases in good cadence.
> >>>>>
> >>>>> The community is growing, new committers and PPMC members keep
> >>>>> joining. We addressed almost all the maturity issues stipulated by
> >>>>> Apache Project Maturity Model [1]. So final requirements remain, but
> >>>>> those just need a final nudge. Committers and contributors are invited
> >>>>> to verify the list and pick up the last bits (QU30, CO50). Finally
> >>>>> (yahoo!) all the License and IP issues we can see got resolved.
> >>>>>
> >>>>> Base on those, I believes it's time for us to graduate to TLP. [2] Any
> >>>>> thoughts?
> >>>>> And welcome advice from Airflow Mentors?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation

Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-28 Thread Kaxil Naik
I have updated http://incubator.apache.org/projects/airflow.html with news
(all our announcements since 2016), added links, and added links to the
incubation status reports.

@Jakob Homan  - Let me know if we need to update
anything else.

Regards,
Kaxil

On Wed, Nov 28, 2018 at 8:00 PM Kaxil Naik  wrote:

> Definitely willing to pick up the paperwork at
> http://incubator.apache.org/projects/airflow.html .
>
> @Jakob Homan, is there any guide on what needs to be
> filled out?
>
>
>
> On Wed, Nov 28, 2018 at 6:04 PM Bolke de Bruin  wrote:
>
>> Ping!
>>
>> Sent from my iPad
>>
>> > On 27 Nov 2018, at 21:39, Bolke de Bruin wrote:
>> >
>> > Hi Folks,
>> >
>> > Thanks all for your responses and particularly Stefan for his
>> suggestion to use the generic Apache way to handle security issues. This
>> seems to be an accepted way for more projects, so I have added this to the
>> maturity evaluation[1] and marked is as resolved. While handling the GPL
>> library can be nicer we are already in compliance with CD30, so @Fokko and
>> @Ash if you want to help out towards graduation please spend your time
>> elsewhere like fixing CO50. This means adding a page to confluence that
>> describes how to become a committer on the project. As we are following
>> Apache many examples of other projects are around[2]
>> >
>> > Then there is the paperwork[3] as referred to by Jakob. This mainly
>> concerns filling in some items, maybe here and there creation some
>> documentation but I don't think much. @Kaxil, @Tao: are you willing to pick
>> this up? @Sid can you share how to edit that page?
>> >
>> > If we have resolved these items in my opinion we can start the voting
>> here and at the IPMC thereafter, targeting the board meeting of January for
>> graduation. How’s that for a New Year’s resolution?
>> >
>> > Cheers!
>> > Bolke
>> >
>> > P.S. Would it be nice to have updated graduation web page? Maybe one of
>> the contributors/community members likes to take a stab at this[4]
>> >
>> > [1]
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
>> > [2]
>> https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer
>> > [3] http://incubator.apache.org/projects/airflow.html
>> > [4] https://airflow.apache.org/
>> >
>> >
>> >
>> >> On 27 Nov 2018, at 16:32, Driesprong, Fokko 
>> wrote:
>> >>
>> >> +1 from my side. Would be awesome to graduate Airflow
>> >>
>> >> If time allows, I'll also dive into CD30.
>> >>
>> >> Cheers, Fokko
>> >>
>> >> On Tue, 27 Nov 2018 at 16:21, Ash Berlin-Taylor wrote:
>> >>
>> >>> Oarsome Bolke, thanks for starting this.
>> >>>
>> >>> It looks like we are closer than I thought!
>> >>>
>> >>> We can use those security lists (though having our own would be nice) -
>> >>> either way we will need to make this prominent in the docs.
>> >>>
>> >>> Couple of points
>> >>>
>> >>> CS10: that github link is only visible to members of the team
>> >>>
>> >>> CD30: probably good as it is, we may want to do
>> >>> https://issues.apache.org/jira/browse/AIRFLOW-3400 <
>> >>> https://issues.apache.org/jira/browse/AIRFLOW-3400> to remove the last
>> >>> niggle of the GPL env var at install time (but not a hard requirement,
>> >>> just nice)
>> >>>
>> >>> -ash
>> >>>
>> >>>> On 26 Nov 2018, at 21:10, Stefan Seelmann wrote:
>> >>>>
>> >>>> I agree that Apache Airflow should graduate.
>> >>>>
>> >>>> I'm only involved since beginning of this year, but the project did two
>> >>>> releases during that time, once TLP releasing becomes easier :)
>> >>>>
>> >>>> Regarding QU30 you may consider to use the ASF wide security mailing
>> >>>> list [3] and process [4].
>> >>>>
>> >>>> Kind Regards,
>> >>>> Stefan
>> >>>>
>> >>>> [3] https://www.apache.org/security/
>> >>>> [4] https://www.apache.org/security/committers.html
>> >>>>
>> >>>>
>> >>>>> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
>> >>>>> Ping!
>> >>>>>
>> >>>>> Sent from my iPhone
>> >>>>>
>> >>>>>> On 24 Nov 2018, at 12:57, Bolke de Bruin wrote:
>> >>>>>>
>> >>>>>> Hi All,
>> >>>>>>
>> >>>>>> With the Apache Airflow community healthy and growing, I think now
>> >>>>>> would be a good time to discuss where we stand regarding to graduation
>> >>>>>> from the Incubator, and what requirements remains.
>> >>>>>>
>> >>>>>> Apache Airflow entered incubation around 2 years ago, since then, the
>> >>>>>> Airflow community learned a lot about how to do things in Apache ways.
>> >>>>>> Now we are a very helpful and engaged community, ready to help on all
>> >>>>>> questions from the Airflow community. We delivered multiple releases
>> >>>>>> that have been increasing in quality ever since, now we can do
>> >>>>>> self-driving releases in good cadence.
>> >>>>>>
>> >>>>>> The community is growing, new committers and PPMC members keep
>> >>>>>> joining. We addressed almost all the maturity issues stipulated by
>> >>>>>> Apache Project Maturity Model [1]. So final 

Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-28 Thread Kaxil Naik
Definitely willing to pick up the paperwork at
http://incubator.apache.org/projects/airflow.html .

@Jakob Homan, is there any guide on what needs to be
filled out?



On Wed, Nov 28, 2018 at 6:04 PM Bolke de Bruin  wrote:

> Ping!
>
> Sent from my iPad
>
> > On 27 Nov 2018, at 21:39, Bolke de Bruin wrote:
> >
> > Hi Folks,
> >
> > Thanks all for your responses and particularly Stefan for his suggestion
> to use the generic Apache way to handle security issues. This seems to be
> an accepted way for more projects, so I have added this to the maturity
> evaluation[1] and marked is as resolved. While handling the GPL library can
> be nicer we are already in compliance with CD30, so @Fokko and @Ash if you
> want to help out towards graduation please spend your time elsewhere like
> fixing CO50. This means adding a page to confluence that describes how to
> become a committer on the project. As we are following Apache many examples
> of other projects are around[2]
> >
> > Then there is the paperwork[3] as referred to by Jakob. This mainly
> concerns filling in some items, maybe here and there creation some
> documentation but I don't think much. @Kaxil, @Tao: are you willing to pick
> this up? @Sid can you share how to edit that page?
> >
> > If we have resolved these items in my opinion we can start the voting
> here and at the IPMC thereafter, targeting the board meeting of January for
> graduation. How’s that for a New Year’s resolution?
> >
> > Cheers!
> > Bolke
> >
> > P.S. Would it be nice to have updated graduation web page? Maybe one of
> the contributors/community members likes to take a stab at this[4]
> >
> > [1]
> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
> > [2]
> https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer
> > [3] http://incubator.apache.org/projects/airflow.html
> > [4] https://airflow.apache.org/
> >
> >
> >
> >> On 27 Nov 2018, at 16:32, Driesprong, Fokko 
> wrote:
> >>
> >> +1 from my side. Would be awesome to graduate Airflow
> >>
> >> If time allows, I'll also dive into CD30.
> >>
> >> Cheers, Fokko
> >>
> >> On Tue, 27 Nov 2018 at 16:21, Ash Berlin-Taylor wrote:
> >>
> >>> Oarsome Bolke, thanks for starting this.
> >>>
> >>> It looks like we are closer than I thought!
> >>>
> >>> We can use those security lists (though having our own would be nice) -
> >>> either way we will need to make this prominent in the docs.
> >>>
> >>> Couple of points
> >>>
> >>> CS10: that github link is only visible to members of the team
> >>>
> >>> CD30: probably good as it is, we may want to do
> >>> https://issues.apache.org/jira/browse/AIRFLOW-3400 <
> >>> https://issues.apache.org/jira/browse/AIRFLOW-3400> to remove the last
> >>> niggle of the GPL env var at install time (but not a hard requirement,
> >>> just nice)
> >>>
> >>> -ash
> >>>
> >>>> On 26 Nov 2018, at 21:10, Stefan Seelmann wrote:
> >>>>
> >>>> I agree that Apache Airflow should graduate.
> >>>>
> >>>> I'm only involved since beginning of this year, but the project did two
> >>>> releases during that time, once TLP releasing becomes easier :)
> >>>>
> >>>> Regarding QU30 you may consider to use the ASF wide security mailing
> >>>> list [3] and process [4].
> >>>>
> >>>> Kind Regards,
> >>>> Stefan
> >>>>
> >>>> [3] https://www.apache.org/security/
> >>>> [4] https://www.apache.org/security/committers.html
> >>>>
> >>>>
> >>>>> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
> >>>>> Ping!
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On 24 Nov 2018, at 12:57, Bolke de Bruin wrote:
> >>>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> With the Apache Airflow community healthy and growing, I think now
> >>>>>> would be a good time to discuss where we stand regarding to graduation
> >>>>>> from the Incubator, and what requirements remains.
> >>>>>>
> >>>>>> Apache Airflow entered incubation around 2 years ago, since then, the
> >>>>>> Airflow community learned a lot about how to do things in Apache ways.
> >>>>>> Now we are a very helpful and engaged community, ready to help on all
> >>>>>> questions from the Airflow community. We delivered multiple releases
> >>>>>> that have been increasing in quality ever since, now we can do
> >>>>>> self-driving releases in good cadence.
> >>>>>>
> >>>>>> The community is growing, new committers and PPMC members keep
> >>>>>> joining. We addressed almost all the maturity issues stipulated by
> >>>>>> Apache Project Maturity Model [1]. So final requirements remain, but
> >>>>>> those just need a final nudge. Committers and contributors are invited
> >>>>>> to verify the list and pick up the last bits (QU30, CO50). Finally
> >>>>>> (yahoo!) all the License and IP issues we can see got resolved.
> >>>>>>
> >>>>>> Base on those, I believes it's time for us to graduate to TLP. [2] Any
> >>>>>> thoughts?
> >>>>>> And welcome advice from Airflow Mentors?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>>

Re: programmatically creating and airflow quirks

2018-11-28 Thread James Meickle
I would be very interested in helping draft a rearchitecting AIP. Of
course, that's a vague statement. I am interested in several specific areas
of Airflow functionality that would be hard to modify without some
refactoring taking place first:

1) Improving Airflow's data model so it's easier to have functional data
pipelines (such as addressing information propagation and artifacts via a
non-xcom mechanism)

2) Having point-in-timeness for DAGs: a concept of which revision of a DAG
was in use at which date, represented in-Airflow.

3) Better idioms and loading capabilities for DAG factories (either
config-driven, or non-Python creation of DAGs, like with boundary-layer).

4) Flexible execution dates: in finance we operate day over day, and have
valid use cases for "t-1", "t+0", and "t+1" dates. The current execution
date status is incredibly confusing for literally every developer we've
brought onto Airflow (they understand it eventually but do make mistakes at
first).

5) Scheduler-integrated sensors

6) Making Airflow more operator-friendly with better alerting, health
checks, notifications, deploy-time configuration, etc.

7) Improving testability of various components (both within the Airflow
repo, as well as making it easier to test DAGs and plugins)

8) Deprecating "newbie trap" or excess complexity features (like skips), by
fixing their internal implementation or by providing alternatives that
address their use cases in more sound ways.
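Point 4 is easier to discuss with concrete dates. The stdlib-only sketch below assumes Airflow 1.x daily-schedule semantics (the run labelled with a given execution_date covers that day and is only triggered after the day ends) and treats "t" as the business date being processed. The helper names and the weekday-only calendar (no holidays) are hypothetical choices for illustration, not an Airflow feature; Airflow's built-in `yesterday_ds` and `tomorrow_ds` template variables shift by calendar days rather than business days.

```python
from datetime import date, timedelta
from typing import Dict


def business_day_offset(d: date, n: int) -> date:
    """Move n weekdays (Mon-Fri) away from d; negative n moves backwards.

    Holidays are ignored; a real trading calendar would be needed in practice.
    """
    step = 1 if n >= 0 else -1
    remaining = abs(n)
    current = d
    while remaining:
        current += timedelta(days=step)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def daily_run_dates(execution_date: date) -> Dict[str, date]:
    """Label the dates around one run of a daily schedule.

    Assumes execution_date is the start of the one-day interval and that the
    run is triggered once that interval has ended.
    """
    return {
        "execution_date": execution_date,
        "triggered_after": execution_date + timedelta(days=1),
        "t-1": business_day_offset(execution_date, -1),
        "t+0": execution_date,
        "t+1": business_day_offset(execution_date, 1),
    }


if __name__ == "__main__":
    # Friday 2018-11-30: the run is triggered on Saturday, and t+1 is Monday.
    for label, value in daily_run_dates(date(2018, 11, 30)).items():
        print(f"{label:>15}: {value.isoformat()}")
```

On a Friday execution date the label, the trigger time and the business-day offsets all differ, which illustrates why naming t-1/t+0/t+1 explicitly can help.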

To my mind, I would need Airflow to be more modular to accomplish several
of those. Even if these aims don't happen in Airflow contrib (as some are
quite contentious and have been discussed on this list before), it would
currently be nearly impossible to maintain an in-house branch that
attempted to implement them.

That said, the claim that this requires microservices is IMO incorrect.
Airflow already scales quite well, so while it needs more modularization,
we probably would see no benefit from immediately breaking those modules
into independent services.

On Wed, Nov 28, 2018 at 11:38 AM Ash Berlin-Taylor  wrote:

> I have similar feelings around the "core" of Airflow and would _love_ to
> somehow find time to spend a month really getting to grips with the
> scheduler and the dagbag and see what comes to light with fresh eyes and
> the benefits of hindsight.
>
> Finding that time is going to be A Challenge though.
>
> (Oh, except no to microservices. Airflow is hard enough to operate right
> now without splitting things into even more daemons)
>
> -ash
> > On 26 Nov 2018, at 03:06, soma dhavala  wrote:
> >
> >
> >
> >> On Nov 26, 2018, at 7:50 AM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
> >>
> >> The historical reason is that people would check in scripts in the repo
> >> that had actual compute or other forms of undesired effects in module
> >> scope (scripts with no "if __name__ == '__main__':") and Airflow would
> >> just run these scripts while searching for DAGs. So we added this
> >> mitigation patch that would confirm that there's something
> >> Airflow-related in the .py file. Not elegant, and confusing at times,
> >> but it also probably prevented some issues over the years.
> >>
> >> The solution here is to have a more explicit way of adding DAGs to the
> >> DagBag (instead of the folder-crawling approach). The DagFetcher
> >> proposal offers solutions around that, having a central "manifest" file
> >> that provides explicit pointers to all DAGs in the environment.
> >
> > Some rebasing needs to happen. When I looked at the 1.8 code base almost a
> > year ago, it felt more complex than necessary. What Airflow is trying to
> > promise from an architectural standpoint was not clear to me. The feeling
> > I got is that it is trying to do too many things, scattered across too
> > many places. As a result, I stopped looking and now just trust that it
> > works, which it does, btw. I tend to think that Airflow has outgrown its
> > original intent. Some sort of microservices architecture has to be
> > brought in. I may sound critical, but no offense is intended. I truly
> > appreciate the contributions.
> >
> >>
> >> Max
> >>
> >> On Sat, Nov 24, 2018 at 5:04 PM Beau Barker 
> >> wrote:
> >>
> >>> In my opinion, this searching for DAGs is not ideal.
> >>>
> >>> We should be explicitly specifying the DAGs to load somewhere.
> >>>
> >>>
>  On 25 Nov 2018, at 10:41 am, Kevin Yang  wrote:
> 
>  I believe that is mostly because we want to skip parsing/loading .py files
>  that don't contain DAG definitions to save time, as the scheduler is going
>  to parse/load the .py files over and over again and some files can take
>  quite long to load.
> 
>  Cheers,
>  Kevin Y
> 
>  On Fri, Nov 23, 2018 at 12:44 AM soma dhavala wrote:
> 
> > Happy to report that the “fix” worked. Thanks, Alex.
> >
> > BTW, I'm wondering why it was there in the first place. How does it help:
> > saves time, early 

Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-28 Thread Bolke de Bruin
Ping!

Sent from my iPad

> On 27 Nov 2018, at 21:39, Bolke de Bruin wrote:
> 
> Hi Folks,
> 
> Thanks all for your responses, and particularly Stefan for his suggestion to
> use the generic Apache way of handling security issues. This seems to be the
> accepted approach in several projects, so I have added this to the maturity
> evaluation[1] and marked it as resolved. While the handling of the GPL
> library could be nicer, we are already in compliance with CD30, so @Fokko
> and @Ash, if you want to help out towards graduation, please spend your time
> elsewhere, like fixing CO50. This means adding a page to Confluence that
> describes how to become a committer on the project. As we are following the
> Apache way, there are many examples from other projects around[2].
> 
> Then there is the paperwork[3], as referred to by Jakob. This mainly concerns
> filling in some items and perhaps creating some documentation here and
> there, but I don't think it is much. @Kaxil, @Tao: are you willing to pick
> this up? @Sid, can you share how to edit that page?
> 
> If we have resolved these items, in my opinion we can start the voting here
> and at the IPMC thereafter, targeting the January board meeting for
> graduation. How’s that for a New Year’s resolution?
> 
> Cheers!
> Bolke
> 
> P.S. Wouldn't it be nice to have an updated web page for graduation? Maybe one
> of the contributors/community members would like to take a stab at this[4].
> 
> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
> [2] https://cwiki.apache.org/confluence/display/HAWQ/Becoming+a+committer
> [3] http://incubator.apache.org/projects/airflow.html
> [4] https://airflow.apache.org/
> 
> 
> 
>> On 27 Nov 2018, at 16:32, Driesprong, Fokko  wrote:
>> 
>> +1 from my side. Would be awesome to graduate Airflow
>> 
>> If time allows, I'll also dive into CD30.
>> 
>> Cheers, Fokko
>> 
>> On Tue, 27 Nov 2018 at 16:21, Ash Berlin-Taylor wrote:
>> 
>>> Oarsome Bolke, thanks for starting this.
>>> 
>>> It looks like we are closer than I thought!
>>> 
>>> We can use those security lists (though having our own would be nice) -
>>> either way we will need to make this prominent in the docs.
>>> 
>>> Couple of points
>>> 
>>> CS10: that github link is only visible to members of the team
>>> 
>>> CD30: probably good as it is, we may want to do
>>> https://issues.apache.org/jira/browse/AIRFLOW-3400 to remove the last
>>> niggle of the GPL env var at install time (but not a hard requirement,
>>> just nice)
>>> 
>>> -ash
>>> 
 On 26 Nov 2018, at 21:10, Stefan Seelmann wrote:
 
 I agree that Apache Airflow should graduate.
 
 I've only been involved since the beginning of this year, but the project
 has done two releases during that time; once it is a TLP, releasing becomes
 easier :)
 
 Regarding QU30, you may consider using the ASF-wide security mailing
 list [3] and process [4].
 
 Kind Regards,
 Stefan
 
 [3] https://www.apache.org/security/
 [4] https://www.apache.org/security/committers.html
 
 
> On 11/26/18 8:46 PM, Bolke de Bruin wrote:
> Ping!
> 
> Sent from my iPhone
> 
>> On 24 Nov 2018, at 12:57, Bolke de Bruin  wrote:
>> 
>> Hi All,
>> 
>> With the Apache Airflow community healthy and growing, I think now would be
>> a good time to discuss where we stand regarding graduation from the
>> Incubator, and what requirements remain.
>>
>> Apache Airflow entered incubation around 2 years ago. Since then, the
>> Airflow community has learned a lot about how to do things the Apache way.
>> Now we are a very helpful and engaged community, ready to help with all
>> questions from the Airflow community. We have delivered multiple releases
>> that have been increasing in quality ever since, and now we can do
>> self-driving releases at a good cadence.
>>
>> The community is growing; new committers and PPMC members keep joining.
>> We addressed almost all the maturity issues stipulated by the Apache
>> Project Maturity Model [1]. Some final requirements remain, but those just
>> need a final nudge. Committers and contributors are invited to verify the
>> list and pick up the last bits (QU30, CO50). Finally (yahoo!), all the
>> license and IP issues we could find have been resolved.
>> 
>> Based on this, I believe it's time for us to graduate to TLP. [2] Any
>> thoughts? We also welcome advice from the Airflow Mentors.
>> 
>> Thanks,
>> 
>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
>> [2] https://incubator.apache.org/guides/graduation.html#graduating_to_a_top_level_project
>>> Regards,
 
>>> 
>>> 
> 


Re: programmatically creating and airflow quirks

2018-11-28 Thread Ash Berlin-Taylor
I have similar feelings around the "core" of Airflow and would _love_ to 
somehow find time to spend a month really getting to grips with the scheduler 
and the dagbag and see what comes to light with fresh eyes and the benefits of 
hindsight.

Finding that time is going to be A Challenge though.

(Oh, except no to microservices. Airflow is hard enough to operate right now 
without splitting things into even more daemons)

-ash
> On 26 Nov 2018, at 03:06, soma dhavala  wrote:
> 
> 
> 
>> On Nov 26, 2018, at 7:50 AM, Maxime Beauchemin  
>> wrote:
>> 
>> The historical reason is that people would check in scripts in the repo
>> that had actual compute or other forms of undesired effects in module scope
>> (scripts with no "if __name__ == '__main__':") and Airflow would just run
>> these scripts while searching for DAGs. So we added this mitigation patch
>> that would confirm that there's something Airflow-related in the .py file.
>> Not elegant, and confusing at times, but it also probably prevented some
>> issues over the years.
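A minimal sketch of the problem Max describes: importing a file runs all of its module-level code, so work that is not protected by an `if __name__ == "__main__":` guard executes every time the file is parsed. The helper below only approximates an import by executing source text in a fresh module object; `SCRIPT`, `import_source_as`, and the module names are illustrative, not Airflow internals.

```python
import types

# A hypothetical helper script that someone checked into the DAG folder.
SCRIPT = '''
ran_heavy_job = False

def main():
    global ran_heavy_job
    ran_heavy_job = True  # stands in for expensive compute or other side effects

if __name__ == "__main__":
    main()
'''


def import_source_as(source: str, module_name: str) -> types.ModuleType:
    """Execute `source` in a fresh module named `module_name`.

    This roughly mimics importing a file: every module-level statement runs,
    and `__name__` is the module name rather than "__main__".
    """
    module = types.ModuleType(module_name)
    exec(compile(source, "<" + module_name + ">", "exec"), module.__dict__)
    return module


if __name__ == "__main__":
    parsed = import_source_as(SCRIPT, "dag_folder_helper")  # like a DAG-folder scan
    executed = import_source_as(SCRIPT, "__main__")         # like `python helper.py`
    print(parsed.ran_heavy_job, executed.ran_heavy_job)      # False True
```

With the guard in place, scanning the file is cheap and side-effect free; the "airflow"/"DAG" string check discussed in this thread is a coarser mitigation aimed at files that lack such a guard.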
>> 
>> The solution here is to have a more explicit way of adding DAGs to the
>> DagBag (instead of the folder-crawling approach). The DagFetcher proposal
>> offers solutions around that, having a central "manifest" file that
>> provides explicit pointers to all DAGs in the environment.
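One way to picture the "manifest" idea is a file that lists DAG files explicitly, so nothing else in the folder is ever imported. This is a hypothetical sketch, not the DagFetcher proposal's actual design: the manifest format (one path per line, `#` comments), the names `read_manifest` and `load_from_manifest`, and the duck-typed `dag_id` check are all assumptions. A real loader would keep Airflow `DAG` instances rather than any object with a string `dag_id`.

```python
import importlib.util
from pathlib import Path
from typing import Dict, List


def read_manifest(manifest_path: Path) -> List[Path]:
    """Return the DAG file paths listed in the manifest.

    Assumed format: one path per line, relative to the manifest's directory;
    blank lines and '#' comments are ignored.
    """
    entries: List[Path] = []
    for raw_line in manifest_path.read_text().splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if line:
            entries.append((manifest_path.parent / line).resolve())
    return entries


def load_from_manifest(manifest_path: Path) -> Dict[str, object]:
    """Import only the listed files and collect module-level objects with a str dag_id."""
    found: Dict[str, object] = {}
    for index, module_path in enumerate(read_manifest(manifest_path)):
        spec = importlib.util.spec_from_file_location(
            f"manifest_dag_{index}", str(module_path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the listed file's module-level code
        for obj in vars(module).values():
            dag_id = getattr(obj, "dag_id", None)
            if isinstance(dag_id, str):
                found[dag_id] = obj
    return found
```

In this sketch only listed files are imported, so a helper script sitting next to the DAG files is never executed and no "airflow"/"DAG" substring check is needed.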
> 
> Some rebasing needs to happen. When I looked at the 1.8 code base almost a
> year ago, it felt more complex than necessary. What Airflow is trying to
> promise from an architectural standpoint was not clear to me. The feeling I
> got is that it is trying to do too many things, scattered across too many
> places. As a result, I stopped looking and now just trust that it works,
> which it does, btw. I tend to think that Airflow has outgrown its original
> intent. Some sort of microservices architecture has to be brought in. I may
> sound critical, but no offense is intended. I truly appreciate the
> contributions.
> 
>> 
>> Max
>> 
>> On Sat, Nov 24, 2018 at 5:04 PM Beau Barker 
>> wrote:
>> 
>>> In my opinion, this searching for DAGs is not ideal.
>>> 
>>> We should be explicitly specifying the DAGs to load somewhere.
>>> 
>>> 
 On 25 Nov 2018, at 10:41 am, Kevin Yang  wrote:
 
 I believe that is mostly because we want to skip parsing/loading .py files
 that don't contain DAG definitions to save time, as the scheduler is going
 to parse/load the .py files over and over again and some files can take
 quite long to load.
 
 Cheers,
 Kevin Y
 
 On Fri, Nov 23, 2018 at 12:44 AM soma dhavala 
 wrote:
 
> Happy to report that the “fix” worked. Thanks, Alex.
> 
> BTW, I'm wondering why it was there in the first place. How does it help:
> does it save time, allow early termination, or something else?
> 
> 
>> On Nov 23, 2018, at 8:18 AM, Alex Guziel wrote:
>> 
>> Yup.
>> 
>> On Thu, Nov 22, 2018 at 3:16 PM soma dhavala wrote:
>> 
>> 
>>> On Nov 23, 2018, at 3:28 AM, Alex Guziel wrote:
>>> 
>>> It’s because of this
>>> 
>>> “When searching for DAGs, Airflow will only consider files where the
>>> string “airflow” and “DAG” both appear in the contents of the .py file.”
>>> 
>> 
>> I had not noticed it. In airflow/models.py, in process_file (both in 1.9
>> and 1.10):
>> ..
>> if not all([s in content for s in (b'DAG', b'airflow')]):
>> ..
>> looks for those strings, and if they are not found, it returns without
>> loading the DAGs.
>> 
>> 
>> So having “airflow” and “DAG” dummy strings placed somewhere will make
>> it work?
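A minimal sketch of the pre-filter quoted above, assuming it is just the byte-substring test from process_file (the name `might_contain_dag` is illustrative here). It shows why soma's loader was skipped: the loader file never contains the strings "airflow" or "DAG", because the DAG construction lives in `my_module`. Since the test is a plain, case-sensitive substring match, a comment mentioning both words is enough to let the file through.

```python
# Soma's loader, as quoted in this thread.
LOADER_SOURCE = b'''
from my_module import create_dag

for file in yaml_files:
    dag = create_dag(file)
    globals()[dag.dag_id] = dag
'''


def might_contain_dag(content: bytes) -> bool:
    """Mirror of the quoted check: both byte strings must occur in the file.

    The match is case-sensitive, so 'dag' or 'create_dag' does not count as 'DAG'.
    """
    return all(s in content for s in (b"DAG", b"airflow"))


if __name__ == "__main__":
    print(might_contain_dag(LOADER_SOURCE))                              # False: skipped
    print(might_contain_dag(b"# airflow DAG loader\n" + LOADER_SOURCE))  # True: parsed
```

This also explains why defining the DAGs in the same file works: that file would contain something like `from airflow import DAG`, which satisfies both substrings.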
>> 
>> 
>>> On Thu, Nov 22, 2018 at 2:27 AM soma dhavala wrote:
>>> 
>>> 
 On Nov 22, 2018, at 3:37 PM, Alex Guziel wrote:
 
 I think this is what is going on. The DAGs are picked up from module-level
 variables, i.e. if you do
 dag = DAG(...)
 dag = DAG(...)
>>> 
>>> from my_module import create_dag
>>> 
>>> for file in yaml_files:
>>>   dag = create_dag(file)
>>>   globals()[dag.dag_id] = dag
>>> 
>>> Note that create_dag is in a different module. If it is in the same
>>> scope (file), it works fine.
>>> 
 
>>> 
 Only the second dag will be picked up.
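Alex's point is that DAGs are collected from a module's top-level names, so rebinding one variable keeps only the last DAG, while giving each DAG its own key in `globals()` keeps them all. (In soma's case the loader already used `globals()`, and the skipped file turned out to be caused by the string check instead.) The sketch below uses a stand-in `FakeDag` class and plain dicts in place of parsed YAML so it runs without Airflow or PyYAML. `FakeDag`, `CONFIGS`, and `create_dag` are hypothetical, and a plain dict plays the role of the module's `globals()`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeDag:
    """Stand-in for airflow.DAG; only dag_id matters for this illustration."""
    dag_id: str


# Stand-ins for the parsed YAML workflow specs.
CONFIGS: List[Dict[str, str]] = [{"dag_id": "train_model"}, {"dag_id": "score_model"}]


def create_dag(config: Dict[str, str]) -> FakeDag:
    """Hypothetical factory: build one DAG object from one workflow spec."""
    return FakeDag(dag_id=config["dag_id"])


def collect_dags(namespace: Dict[str, object]) -> List[str]:
    """Roughly what a loader does: keep top-level objects of the DAG type."""
    return sorted(obj.dag_id for obj in namespace.values() if isinstance(obj, FakeDag))


def rebinding_namespace() -> Dict[str, object]:
    """Pitfall: one variable name is reused, so only the last DAG survives."""
    namespace: Dict[str, object] = {}
    for config in CONFIGS:
        namespace["dag"] = create_dag(config)
    return namespace


def unique_names_namespace() -> Dict[str, object]:
    """Pattern from the loader above: each DAG gets its own top-level name.

    In a real DAG file, `namespace` would be the module's globals().
    """
    namespace: Dict[str, object] = {}
    for config in CONFIGS:
        new_dag = create_dag(config)
        namespace[new_dag.dag_id] = new_dag
    return namespace


if __name__ == "__main__":
    print(collect_dags(rebinding_namespace()))     # ['score_model']
    print(collect_dags(unique_names_namespace()))  # ['score_model', 'train_model']
```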
 
 On Thu, Nov 22, 2018 at 2:04 AM Soma S Dhavala <soma.dhav...@gmail.com> wrote:
 Hey Airflow devs:
 In our organization, we have built a Machine Learning Workbench with Airflow
 as an orchestrator of the ML workflows, and have wrapped Airflow Python
 operators to customize the behaviour. These workflows are specified in
 YAML.
 
 We drop a DAG loader (written in Python) in the default location