Re: Graduation resolution passed - Airflow is a TLP

2018-12-20 Thread Driesprong, Fokko
Awesome! Congrats!

Cheers, Fokko

Op do 20 dec. 2018 om 22:40 schreef Sid Anand 

> YaaY!
>
> -s
>
> On Thu, Dec 20, 2018 at 1:13 PM Jakob Homan  wrote:
>
> > Hey all-
> >The Board minutes haven't been published yet (probably due to
> > Holiday-related slowness), but I can see through the admin tool that
> > our Graduation resolution was approved yesterday at the meeting.
> > Airflow is the 199th current active Top Level Project in Apache.
> >
> > Congrats all.
> >
> > -Jakob
> >
>


Refactor models.py

2018-12-06 Thread Driesprong, Fokko
Hi All,

I think it is time to refactor the infamous models.py. This file is far too
big, and it only keeps growing. My suggestion is to create a new package,
called models, which will contain all the ORM classes (the ones
with __tablename__ in the class), and to move, for example, the BaseOperator
to the operators package. I've created a lot of tickets to move the classes
one by one out of models.py. The reason to do this one by one is to relieve
the pain of fixing the circular dependencies.

Refactor: Move DagBag out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3456

Refactor: Move User out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3457

Refactor: Move Connection out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3458

Refactor: Move DagPickle out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3459

Refactor: Move TaskInstance out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3460

Refactor: Move TaskFail out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3461

Refactor: Move TaskReschedule out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3462

Refactor: Move Log out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3463

Refactor: Move SkipMixin out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3464

Refactor: Move BaseOperator out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3465

Refactor: Move DAG out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3466

Refactor: Move Chart out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3467

Refactor: Move KnownEventType out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3468

Refactor: Move KnownEvent out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3469

Refactor: Move Variable out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3470

Refactor: Move XCom out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3471

Refactor: Move DagStat out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3472

Refactor: Move DagRun out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3473

Refactor: Move SlaMiss out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3474

Refactor: Move ImportError out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3475

Refactor: Move KubeResourceVersion out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3476

Refactor: Move KubeWorkerIdentifier out of models.py
https://issues.apache.org/jira/browse/AIRFLOW-3477

Some of these classes are really simple, so this would also be a nice
opportunity for newcomers to start contributing to Airflow :-)
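
To make the proposed layout concrete, here is a rough sketch (simplified and
hypothetical, not the actual Airflow code) of how a moved class could keep its
old import path working:

    # airflow/models/connection.py -- the ORM class moves into its own module
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()  # Airflow has its own declarative Base; simplified here


    class Connection(Base):
        """Trimmed-down stand-in for the real Connection model."""

        __tablename__ = "connection"

        id = Column(Integer, primary_key=True)
        conn_id = Column(String(250))

    # airflow/models/__init__.py then re-exports the class, so existing code
    # such as `from airflow.models import Connection` keeps working:
    #
    #     from airflow.models.connection import Connection  # noqa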

Cheers, Fokko


Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-12-02 Thread Driesprong, Fokko
Ash,

I can pick this up as early as the beginning of next week. Right now this
is what we have:
https://cwiki.apache.org/confluence/display/AIRFLOW/Committers But I might
extend it a little to make it a bit clearer.

Cheers, Fokko

Op zo 2 dec. 2018 om 12:56 schreef Sid Anand :

> Great!
>
> -s
>
> On Sun, Dec 2, 2018 at 5:53 AM Ash Berlin-Taylor  wrote:
>
> > I've created two tickets to add QU30 and CO50 to our docs.
> >
> > (I think even if we use sec@a.o we should still add something to our
> docs
> > saying how to do it)
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-3430 -- Fokko: did you get
> > anywhere on this?
> > https://issues.apache.org/jira/browse/AIRFLOW-3431 -- I'll make a start
> > on this
> >
> > -ash
> >
> > > On 30 Nov 2018, at 22:06, Bolke de Bruin  wrote:
> > >
> > > Thanks Jakob!
> > >
> > > Verstuurd vanaf mijn iPad
> > >
> > >> Op 30 nov. 2018 om 22:49 heeft Jakob Homan  het
> > volgende geschreven:
> > >>
> > >> I've finished the paperwork.  I don't seem to have karma to trigger
> > >> the build on Jenkins, so we'll just wait for the daily rebuild.  With
> > >> that, I've opened the VOTE thread as well.  Thanks everybody.
> > >>> On Wed, Nov 28, 2018 at 5:08 PM Jakob Homan 
> wrote:
> > >>>
> > >>> I'll finish up the template at
> > >>> http://incubator.apache.org/projects/airflow.html tomorrow or Friday
> > >>> (I *think* you have to be an IPMC member to update it since it lives
> > >>> in the Incubator SVN).  Looks like there's no actual work to do, just
> > >>> marking stuff that has been done but not yet recorded, and verifying
> > >>> some licenses.
> > >>>
> > >>> -Jakob
> > >>>
> > >>>
> > >>>
> > >>>> On Wed, Nov 28, 2018 at 2:48 PM Tao Feng 
> wrote:
> > >>>>
> > >>>> Sorry, just saw Kaxil's latest email. Kaxil, is there anything else
> I
> > could
> > >>>> help with?
> > >>>>
> > >>>> Thanks,
> > >>>> -Tao
> > >>>>
> > >>>>> On Wed, Nov 28, 2018 at 2:40 PM Tao Feng 
> > wrote:
> > >>>>>
> > >>>>> I would like to help on the documentation. Let me take a look at
> it.
> > I
> > >>>>> will work with Kaxil on that.
> > >>>>>
> > >>>>>> On Tue, Nov 27, 2018 at 12:39 PM Bolke de Bruin <
> bdbr...@gmail.com>
> > wrote:
> > >>>>>>
> > >>>>>> Hi Folks,
> > >>>>>>
> > >>>>>> Thanks all for your responses and particularly Stefan for his
> > suggestion
> > >>>>>> to use the generic Apache way to handle security issues. This
> seems
> > to be
> > >>>>>> an accepted way for more projects, so I have added this to the
> > maturity
> >>>>>> evaluation[1] and marked it as resolved. While handling the GPL
> > library can
> > >>>>>> be nicer we are already in compliance with CD30, so @Fokko and
> @Ash
> > if you
> > >>>>>> want to help out towards graduation please spend your time
> > elsewhere like
> > >>>>>> fixing CO50. This means adding a page to confluence that describes
> > how to
> > >>>>>> become a committer on the project. As we are following Apache many
> > examples
> > >>>>>> of other projects are around[2]
> > >>>>>>
> > >>>>>> Then there is the paperwork[3] as referred to by Jakob. This
> mainly
> >>>>>> concerns filling in some items, and maybe creating some
> >>>>>> documentation here and there, but not much I think. @Kaxil, @Tao: are you
> willing
> > to pick
> > >>>>>> this up? @Sid can you share how to edit that page?
> > >>>>>>
> > >>>>>> If we have resolved these items in my opinion we can start the
> > voting
> > >>>>>> here and at the IPMC thereafter, targeting the board meeting of
> > January for
> > >>>>>> graduation. How’s that for a New Year’s resolution?
> > >>>>>>
> > >>>>>> Cheers!
> > >>>>>> Bo

Re: [VOTE] Graduate the Apache Airflow as a TLP

2018-11-30 Thread Driesprong, Fokko
+1 binding

Op vr 30 nov. 2018 om 23:05 schreef Bolke de Bruin 

> +1, binding
>
> Yahoo! :-)
>
> Verstuurd vanaf mijn iPad
>
> > Op 30 nov. 2018 om 22:48 heeft Tao Feng  het
> volgende geschreven:
> >
> > +1 (binding)
> >
> > Thanks Jakob and everyone!
> >
> >> On Fri, Nov 30, 2018 at 1:33 PM Jakob Homan  wrote:
> >>
> >> Hey all!
> >>
> >> Following a very successful DISCUSS[1] regarding graduating Airflow to
> >> Top Level Project (TLP) status, I'm starting the official VOTE.
> >>
> >> Since entering the Incubator in 2016, the community has:
> >>   * successfully produced 7 releases
> >>   * added 9 new committers/PPMC members
> >>   * built a diverse group of committers from multiple different
> employers
> >>   * had more than 3,300 JIRA tickets opened
> >>   * completed the project maturity model with positive responses[2]
> >>
> >> Accordingly, I believe we're ready to graduate and am calling a VOTE
> >> on the following graduation resolution.  This VOTE will remain open
> >> for at least 72 hours.  If successful, the resolution will be
> >> forwarded to the IPMC for its consideration.  If that VOTE is
> >> successful, the resolution will be voted upon by the Board at its next
> >> monthly meeting.
> >>
> >> Everyone is encouraged to vote, even if their vote is not binding.
> >> We've built a nice community here, let's make sure everyone has their
> >> voice heard.
> >>
> >> Thanks,
> >> Jakob
> >>
> >> [1]
> >>
> https://lists.apache.org/thread.html/%3c0a763b0b-7d0d-4353-979a-ac6769eb0...@gmail.com%3E
> >> [2]
> >> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
> >>
> >> 
> >>
> >> Establish the Apache Airflow Project
> >>
> >> WHEREAS, the Board of Directors deems it to be in the best
> >> interests of the Foundation and consistent with the
> >> Foundation's purpose to establish a Project Management
> >> Committee charged with the creation and maintenance of
> >> open-source software, for distribution at no charge to
> >> the public, related to workflow automation and scheduling
> >> that can be used to author and manage data pipelines.
> >>
> >> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> >> Committee (PMC), to be known as the "Apache Airflow Project",
> >> be and hereby is established pursuant to Bylaws of the
> >> Foundation; and be it further
> >>
> >> RESOLVED, that the Apache Airflow Project be and hereby is
> >> responsible for the creation and maintenance of software
> >> related to workflow automation and scheduling that can be
> >> used to author and manage data pipelines; and be it further
> >>
> >> RESOLVED, that the office of "Vice President, Apache Airflow" be
> >> and hereby is created, the person holding such office to
> >> serve at the direction of the Board of Directors as the chair
> >> of the Apache Airflow Project, and to have primary responsibility
> >> for management of the projects within the scope of
> >> responsibility of the Apache Airflow Project; and be it further
> >>
> >> RESOLVED, that the persons listed immediately below be and
> >> hereby are appointed to serve as the initial members of the
> >> Apache Airflow Project:
> >>
> >> * Alex Guziel 
> >> * Alex Van Boxel 
> >> * Arthur Wiedmer 
> >> * Ash Berlin-Taylor 
> >> * Bolke de Bruin 
> >> * Chris Riccomini 
> >> * Dan Davydov 
> >> * Fokko Driesprong 
> >> * Hitesh Shah 
> >> * Jakob Homan 
> >> * Jeremiah Lowin 
> >> * Joy Gao 
> >> * Kaxil Naik 
> >> * Maxime Beauchemin 
> >> * Siddharth Anand 
> >> * Sumit Maheshwari 
> >> * Tao Feng 
> >>
> >> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Bolke de Bruin
> >> be appointed to the office of Vice President, Apache Airflow, to
> >> serve in accordance with and subject to the direction of the
> >> Board of Directors and the Bylaws of the Foundation until
> >> death, resignation, retirement, removal or disqualification,
> >> or until a successor is appointed; and be it further
> >>
> >> RESOLVED, that the initial Apache Airflow PMC be and hereby is
> >> tasked with the creation of a set of bylaws intended to
> >> encourage open development and increased participation in the
> >> Apache Airflow Project; and be it further
> >>
> >> RESOLVED, that the Apache Airflow Project be and hereby
> >> is tasked with the migration and rationalization of the Apache
> >> Incubator Airflow podling; and be it further
> >>
> >> RESOLVED, that all responsibilities pertaining to the Apache
> >> Incubator Airflow podling encumbered upon the Apache Incubator
> >> Project are hereafter discharged.
> >>
>


Re: [DISCUSS] Apache Airflow graduation from the incubator

2018-11-27 Thread Driesprong, Fokko
+1 from my side. Would be awesome to graduate Airflow

If time allows, I'll also dive into CD30.

Cheers, Fokko

Op di 27 nov. 2018 om 16:21 schreef Ash Berlin-Taylor :

> Awesome, Bolke, thanks for starting this.
>
> It looks like we are closer than I thought!
>
> We can use those security lists (though having our own would be nice) -
> either way we will need to make this prominent in the docs.
>
> Couple of points
>
> CS10: that github link is only visible to members of the team
>
> CD30: probably good as it is, we may want to do
> https://issues.apache.org/jira/browse/AIRFLOW-3400 <
> https://issues.apache.org/jira/browse/AIRFLOW-3400> to remove the last
> niggle of the GPL env var at install time (but not a hard requirement, just
> nice)
>
> -ash
>
> > On 26 Nov 2018, at 21:10, Stefan Seelmann 
> wrote:
> >
> > I agree that Apache Airflow should graduate.
> >
> > I'm only involved since the beginning of this year, but the project did two
> > releases during that time. Once we are a TLP, releasing becomes easier :)
> >
> > Regarding QU30 you may consider to use the ASF wide security mailing
> > list [3] and process [4].
> >
> > Kind Regards,
> > Stefan
> >
> > [3] https://www.apache.org/security/
> > [4] https://www.apache.org/security/committers.html
> >
> >
> > On 11/26/18 8:46 PM, Bolke de Bruin wrote:
> >> Ping!
> >>
> >> Sent from my iPhone
> >>
> >>> On 24 Nov 2018, at 12:57, Bolke de Bruin  wrote:
> >>>
> >>> Hi All,
> >>>
> >>> With the Apache Airflow community healthy and growing, I think now
> would be a good time to
> >>> discuss where we stand regarding graduation from the Incubator, and
> what requirements remain.
> >>>
> >>> Apache Airflow entered incubation around 2 years ago. Since then, the
> Airflow community has learned
> >>> a lot about how to do things the Apache way. Now we are a very helpful
> and engaged community,
> >>> ready to help on all questions from the Airflow community. We
> delivered multiple releases that have
> >>> been increasing in quality ever since; now we can do self-driving
> releases at a good cadence.
> >>>
> >>> The community is growing, new committers and PPMC members keep
> joining. We addressed almost all
> >>> the maturity issues stipulated by the Apache Project Maturity Model [1].
> Some final requirements remain, but
> >>> those just need a final nudge. Committers and contributors are invited
> to verify the list and pick up the last
> >>> bits (QU30, CO50). Finally (yahoo!) all the License and IP issues we
> can see got resolved.
> >>>
> >>> Based on those, I believe it's time for us to graduate to TLP. [2] Any
> thoughts?
> >>> And advice from the Airflow mentors is welcome.
> >>>
> >>> Thanks,
> >>>
> >>> [1]
> https://cwiki.apache.org/confluence/display/AIRFLOW/Maturity+Evaluation
> >>> [2]
> https://incubator.apache.org/guides/graduation.html#graduating_to_a_top_level_project
> Regards,
> >
>
>


Re: Remove airflow from pypi

2018-11-27 Thread Driesprong, Fokko
Hi all,

I've pushed a replacement artifact for the airflow package that throws an
error when someone tries to install it:

MacBook-Pro-van-Fokko:airflow fokkodriesprong$ pip install airflow==0.6

Collecting airflow==0.6

  Downloading
https://files.pythonhosted.org/packages/98/e7/d8cad667296e49a74d64e0a55713fcd491301a2e2e0e82b94b065fda3087/airflow-0.6.tar.gz

Complete output from command python setup.py egg_info:


/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py:267:
UserWarning: Unknown distribution option: 'long_description_content_type'

  warnings.warn(msg)

running egg_info

creating pip-egg-info/airflow.egg-info

writing pip-egg-info/airflow.egg-info/PKG-INFO

writing top-level names to pip-egg-info/airflow.egg-info/top_level.txt

writing dependency_links to
pip-egg-info/airflow.egg-info/dependency_links.txt

writing manifest file 'pip-egg-info/airflow.egg-info/SOURCES.txt'

reading manifest file 'pip-egg-info/airflow.egg-info/SOURCES.txt'

writing manifest file 'pip-egg-info/airflow.egg-info/SOURCES.txt'

Traceback (most recent call last):

  File "", line 1, in 

  File
"/private/var/folders/km/xypq2kxs4ys3dt6bwtd4fbj0gn/T/pip-install-APOym3/airflow/setup.py",
line 32, in 

raise RuntimeError('Please install package apache-airflow instead
of airflow')

RuntimeError: Please install package apache-airflow instead of airflow





Command "python setup.py egg_info" failed with error code 1 in
/private/var/folders/km/xypq2kxs4ys3dt6bwtd4fbj0gn/T/pip-install-APOym3/airflow/

MacBook-Pro-van-Fokko:airflow fokkodriesprong$ pip3 install airflow==0.6

Collecting airflow==0.6

  Using cached
https://files.pythonhosted.org/packages/98/e7/d8cad667296e49a74d64e0a55713fcd491301a2e2e0e82b94b065fda3087/airflow-0.6.tar.gz

Complete output from command python setup.py egg_info:

running egg_info

creating pip-egg-info/airflow.egg-info

writing pip-egg-info/airflow.egg-info/PKG-INFO

writing dependency_links to
pip-egg-info/airflow.egg-info/dependency_links.txt

writing top-level names to pip-egg-info/airflow.egg-info/top_level.txt

writing manifest file 'pip-egg-info/airflow.egg-info/SOURCES.txt'

reading manifest file 'pip-egg-info/airflow.egg-info/SOURCES.txt'

writing manifest file 'pip-egg-info/airflow.egg-info/SOURCES.txt'


/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py:261:
UserWarning: Unknown distribution option: 'long_description_content_type'

  warnings.warn(msg)

Traceback (most recent call last):

  File "", line 1, in 

  File
"/private/var/folders/km/xypq2kxs4ys3dt6bwtd4fbj0gn/T/pip-install-7y_awdxy/airflow/setup.py",
line 32, in 

raise RuntimeError('Please install package apache-airflow instead
of airflow')

RuntimeError: Please install package apache-airflow instead of airflow





Command "python setup.py egg_info" failed with error code 1 in
/private/var/folders/km/xypq2kxs4ys3dt6bwtd4fbj0gn/T/pip-install-7y_awdxy/airflow/

You are using pip version 10.0.1, however version 18.1 is available.

You should consider upgrading via the 'pip install --upgrade pip' command.
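
For reference, a minimal placeholder setup.py that produces this behaviour
could look roughly like the sketch below (illustrative only; the artifact
actually uploaded may differ):

    # Hypothetical sketch of the placeholder setup.py for the old "airflow" name.
    from setuptools import setup

    # Abort before anything gets installed and point users at the new package name.
    raise RuntimeError('Please install package apache-airflow instead of airflow')

    setup(
        name='airflow',
        version='0.6',
        description='Placeholder package; install apache-airflow instead',
    )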

Cheers, Fokko


Op di 27 nov. 2018 om 02:39 schreef Kevin Yang :

> +1 for having a placeholder for security reason. Not sure about adding
> dependency on apache-airflow tho--if the warning is overlooked and stuff
> starting to break because of backward incompatibility, it can be quite
> confusing.
>
> Cheers,
> Kevin Y
>
> On Sat, Nov 24, 2018 at 5:19 PM soma dhavala 
> wrote:
>
> >
> >
> > > On Nov 25, 2018, at 5:02 AM, George Leslie-Waksman 
> > wrote:
> > >
> > > It's probably a good idea to put something at "airflow", even if it
> > > just fails to install and tells people to install apache-airflow
> > > instead.
> > >
> > > If not, there's a risk someone squats the name airflow and puts up
> > > something malicious.
> > >
> >
> > + 1
> >
> > > --George
> > > On Fri, Nov 23, 2018 at 11:44 AM Driesprong, Fokko
> 
> > wrote:
> > >>
> > >> Thanks Dan for picking this up quickly.
> > >>
> > >> Op vr 23 nov. 2018 om 18:31 schreef Kaxil Naik :
> > >>
> > >>> Thanks Dan
> > >>>
> > >>> On Fri, Nov 23, 2018 at 3:44 PM Dan Davydov
> > 
> > >>> wrote:
> > >>>
> > >>>> This could potentially break builds for some users but I feel the
> pros
> > >>>> mentioned outweigh this, I went ahead and deleted it

Re: Remove airflow from pypi

2018-11-23 Thread Driesprong, Fokko
Thanks Dan for picking this up quickly.

Op vr 23 nov. 2018 om 18:31 schreef Kaxil Naik :

> Thanks Dan
>
> On Fri, Nov 23, 2018 at 3:44 PM Dan Davydov 
> wrote:
>
> > This could potentially break builds for some users but I feel the pros
> > mentioned outweigh this, I went ahead and deleted it.
> >
> > On Fri, Nov 23, 2018 at 10:18 AM Bolke de Bruin 
> wrote:
> >
> > > Agree! This is even a security issue.
> > >
> > > Sent from my iPhone
> > >
> > > > On 23 Nov 2018, at 15:29, Driesprong, Fokko 
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I think we should remove airflow <https://pypi.org/project/airflow/>
> > > (not
> > > > apache-airflow) from Pypi. I still get questions from people who
> > > > accidentally install Airflow 1.8.0. I see this is maintained
> > > > by mistercrunch, artwr, aeon. Anyone any objections?
> > > >
> > > > Cheers, Fokko
> > >
> >
>
>
> --
> *Kaxil Naik*
> *Big Data Consultant *@ *Data Reply UK*
> *Certified *Google Cloud Data Engineer | *Certified* Apache Spark & Neo4j
> Developer
> *Phone: *+44 (0) 74820 88992
> *LinkedIn*: https://www.linkedin.com/in/kaxil
>


Remove airflow from pypi

2018-11-23 Thread Driesprong, Fokko
Hi all,

I think we should remove airflow <https://pypi.org/project/airflow/> (not
apache-airflow) from Pypi. I still get questions from people who
accidentally install Airflow 1.8.0. I see this is maintained
by mistercrunch, artwr, aeon. Anyone any objections?

Cheers, Fokko


Re: Task got executed even when it's only immediate parent was in upstream failed state

2018-11-18 Thread Driesprong, Fokko
Hi Abhishek,

That sounds very specific. There are a lot of improvements since 1.10. For
now, the only advice I can give, please save the logs when this happens
again.

Cheers, Fokko

Op zo 18 nov. 2018 om 18:36 schreef Abhishek Sinha :

> Hi Fokko,
>
> This would be difficult to test. I am not sure of the condition when it
> occurred. Our DAG runs nightly and this has happened once in the past three
> months.
>
> If this is something that has been identified and fixed in 1.10, I can
> make an attempt to upgrade it. But I did not see anything related to this
> in the changelog.
>
>
>
>
>
>
> Regards,
>
> Abhishek
>
> On 18 November 2018 at 8:54:43 AM, Driesprong, Fokko (fo...@driesprong.frl)
> wrote:
>
> Hi Abhishek,
>
> Do you think it would be difficult to verify if this behavior still exists
> in 1.10? Maybe write a dummy dag, and run it using Docker.
>
> Cheers, Fokko
>
> Op vr 16 nov. 2018 om 21:45 schreef Abhishek Sinha :
>
>
> > Airflow Version 1.8.2, Celery and Postgres
> >
> > Task (and its downstream nodes) got executed even when it's only
> immediate
> > parent was in upstream failed state. The trigger rule for all nodes was
> set
> > to "all_success".
> >
> > Looks like a bug in version 1.8.2. Can someone help me on this?
> >
> >
> >
> >
> >
> >
> >
> >
> > Regards,
> >
> > Abhishek
> >
>
>


Re: Fusing operators together

2018-11-18 Thread Driesprong, Fokko
Hi Shubham,

I think the EmrStepOperator and EmrStepSensor are a clear exception. Most
operators wait until the operation has finished successfully. For example,
the DruidOperator will block until the indexing job has successfully
finished:
https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/druid_hook.py#L84-L109.
I think this should also be the case for the EmrStepOperator, but this
slipped through during review. Hope this helps.
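
As an illustration of fusing the submit and wait steps into one operator, a
rough sketch (hypothetical hook and method names, not an existing Airflow
class) could look like this:

    import time

    from airflow.models import BaseOperator


    class MyJobHook(object):
        """Stub standing in for a real service hook (e.g. an EMR or Druid hook)."""

        def submit_job(self):
            return "job-123"

        def job_is_done(self, job_id):
            return True


    class SubmitAndWaitOperator(BaseOperator):
        """Submits a job through the hook and blocks until it has finished."""

        def __init__(self, poke_interval=30, *args, **kwargs):
            super(SubmitAndWaitOperator, self).__init__(*args, **kwargs)
            self.poke_interval = poke_interval

        def execute(self, context):
            hook = MyJobHook()
            job_id = hook.submit_job()
            while not hook.job_is_done(job_id):
                time.sleep(self.poke_interval)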

Cheers, Fokko



Op wo 14 nov. 2018 om 21:56 schreef Shubham Gupta <
y2k.shubhamgu...@gmail.com>:

> *[Please let me know if this is NOT the correct place for such a query]*
>
> Hello maintainers and committers,
> I've stumbled upon this design decision for my Airflow project. Any
> pointers would be helpful.
>
> Overview
>
>- I'm in the process of deploying Airflow and I've felt the need to
>merge groups of operators that form a single logical task (to clear the
>clutter in huge DAGs)
>- The most common use-case would be coupling an operator and the
>corresponding sensor. For instance, one might want to chain together the
>EmrStepOperator and EmrStepSensor
>
>
> 
>
> Possible approaches
>
>- This could be achieved by offloading actual logic to Hooks and then
>using as many hooks as needed within an operator
>- A hacky alternative (if at all) would be SubDagOperator
>
>
> 
>
> Questions
>
>- Are hooks the right tool for this problem?
>- Any other way to compose operators together?
>- Is it a good idea to combine operators at all?
>
>
> Here's  my complete (more
> elaborate) question on StackOverflow
>
> Thanks
>
> *Shubham Gupta*
> Software Engineer
>  zomato
>


Re: Task got executed even when it's only immediate parent was in upstream failed state

2018-11-18 Thread Driesprong, Fokko
Hi Abhishek,

Do you think it would be difficult to verify whether this behavior still
exists in 1.10? Maybe write a dummy DAG and run it using Docker.
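
Something along these lines should do; a minimal sketch (assuming the Airflow
1.10 import paths) where the grandchild should end up upstream_failed and
never actually run:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import PythonOperator


    def _fail():
        raise ValueError("forcing an upstream failure")


    with DAG(dag_id="trigger_rule_check",
             start_date=datetime(2018, 11, 1),
             schedule_interval="@daily") as dag:
        parent = PythonOperator(task_id="parent", python_callable=_fail)
        child = DummyOperator(task_id="child")           # default trigger_rule: all_success
        grandchild = DummyOperator(task_id="grandchild")

        parent >> child >> grandchild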

Cheers, Fokko

Op vr 16 nov. 2018 om 21:45 schreef Abhishek Sinha :

> Airflow Version 1.8.2, Celery and Postgres
>
> Task (and its downstream nodes) got executed even when it's only immediate
> parent was in upstream failed state. The trigger rule for all nodes was set
> to "all_success".
>
> Looks like a bug in version 1.8.2. Can someone help me on this?
>
>
>
>
>
>
>
>
> Regards,
>
> Abhishek
>


Re: [VOTE] Airflow 1.10.1 RC2

2018-11-18 Thread Driesprong, Fokko
A +1 from my side as well.

Thanks for picking this up, Ash. I just checked the new release using Docker,
and everything seems to work.

Cheers, Fokko

Op za 17 nov. 2018 om 16:43 schreef Deng Xiaodong :

> Even though my vote is non-binding, I would like to change my vote to +1 as
> well.
> Reason being the both points I suggested earlier were not regressions from
> 1.10.0, and they should not be blocking the release.
>
> Cheers.
>
> XD
>
> On Sat, Nov 17, 2018 at 8:11 PM Naik Kaxil  wrote:
>
> > +1 (binding) . I am convinced, we should follow up with 1.10.2 with fixes
> > soon with small number of commits avoiding a huge gap again between minor
> > releases.
> >
> > Regards,
> > Kaxil
> >
> > On 17/11/2018, 11:53, "Ash Berlin-Taylor"  wrote:
> >
> > The RBAC UI is still marked as experimental and this isn't a
> > regression from 1.10.0, so could you be convinced to change this to a +1?
> >
> > There are other more critical changes I would like to get out, and I
> > will follow up straight away with a 1.10.2 that addresses this and XD's
> > points.
> >
> > (I feel Bolke's pain :) I'm now moderately annoyed at the Apache
> > release process and how long it takes, it means each release ends up
> > getting big)
> >
> > -ash
> >
> > >
> >
> > Kaxil Naik
> >
> > Data Reply
> > Nova South
> > 160 Victoria Street, Westminster
> > London SW1E 5LB - UK
> > phone: +44 (0)20 7730 6000
> > k.n...@reply.com
> > www.reply.com
> > On 17 Nov 2018, at 01:01, Naik Kaxil  wrote:
> > >
> > > -1 (binding) . Tested it on Python 2.7.14, got expected result but
> > had 1 security concern that I want to get in the release.
> > >
> > > Even when 'expose_config'=False, the RBAC UI still shows the configs
> > which can contain sensitive information like airflow metadb passwords,
> etc.
> > >
> > > If we can get that in +1 from me. The PR with this fixed has been
> > merged in the master, commit:
> >
> https://github.com/apache/incubator-airflow/commit/85abd44e241e17338a800e37a3c2e85ef346898d
> > <
> >
> https://github.com/apache/incubator-airflow/commit/85abd44e241e17338a800e37a3c2e85ef346898d
> > >
> > >
> > > PR: https://github.com/apache/incubator-airflow/pull/4194 <
> > https://github.com/apache/incubator-airflow/pull/4194>
> > >
> > > Regards,
> > > Kaxil
> > >
> > > On 16/11/2018, 13:41, "Deng Xiaodong"   > xd.den...@gmail.com>> wrote:
> > >
> > >Hi Ash,
> > >
> > >I would like to give -1 (non-binding), due to two reasons we
> > discussed
> > >earlier on Slack:
> > >
> > >- there is an issue with the new “delete DAG” button in UI. It’s
> > a great
> > >feature, so let’s try to release it “bug-less”. The fix is in PR
> > >https://github.com/apache/incubator-airflow/pull/4069 (But
> > understand your
> > >concern is that this PR comes with no test yet).
> > >
> > >- it may be good to pin all dependencies to a specific version
> to
> > avoid the
> > >incident caused by dependency breaking change (like what happens
> > to Redis
> > >yesterday)
> > >
> > >
> > >Last but not least: nice job! Thanks for your works!
> > >
> > >
> > >XD
> > >
> > >
> > >On Fri, Nov 16, 2018 at 21:13 Ash Berlin-Taylor  >
> > wrote:
> > >
> > >> Friendly reminder for people (and especially committers) to test
> > this out
> > >> and vote on it please!
> > >>
> > >> -ash
> > >>
> > >>>
> > >
> > > Kaxil Naik
> > >
> > > Data Reply
> > > Nova South
> > > 160 Victoria Street, Westminster
> > > London SW1E 5LB - UK
> > > phone: +44 (0)20 7730 6000
> > > k.n...@reply.com 
> > > www.reply.com 
> > > On 14 Nov 2018, at 22:31, Ash Berlin-Taylor   > a...@apache.org>> wrote:
> > >>>
> > >>> Hey all,
> > >>>
> > >>> I have cut Airflow 1.10.1 RC2. This email is calling a vote on
> the
> > >> release, which will last for 72 hours. Consider this my (binding)
> > +1.
> > >>>
> > >>> Airflow 1.10.1 RC2 is available at:
> > >>>
> > >>>
> > https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.1rc2/
> > >>>
> > >>> apache-airflow-1.10.1rc2+incubating-source.tar.gz is a source
> > release
> > >> that comes with INSTALL instructions.
> > >>> apache-airflow-1.10.1rc2+incubating-bin.tar.gz is the binary
> Python
> > >> "sdist" release.
> > >>>
> > >>> Public keys are available at:
> > >>>
> > >>>
> https://dist.apache.org/repos/dist/release/incubator/airflow/KEYS
> > >>>
> > >>> This release candidate has been published to PyPi as a convenience
> for
> > >> testing, but the vote is against the published artefacts at the
> > above URL,
> > >> and not this. To install from 

Re: Can I change the subject line for alert emails

2018-11-13 Thread Driesprong, Fokko
Thanks. Alek, can you rebase the PR, and resolve conflicts?
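
In the meantime, a possible workaround is to disable the built-in alert email
and send a custom one from an on_failure_callback. A hedged sketch (it assumes
Airflow's airflow.utils.email.send_email helper and an environment variable of
your own choosing):

    import os

    from airflow.utils.email import send_email

    ENVIRONMENT = os.environ.get("DEPLOY_ENV", "dev")  # hypothetical variable, your own convention


    def notify_failure(context):
        ti = context["task_instance"]
        subject = "[{env}] Airflow alert: {dag}.{task} failed".format(
            env=ENVIRONMENT, dag=ti.dag_id, task=ti.task_id)
        send_email(to="alerts@example.com", subject=subject,
                   html_content="Execution date: {}".format(context["execution_date"]))


    # In the DAG's default_args:
    #     {"email_on_failure": False, "on_failure_callback": notify_failure}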

Op ma 12 nov. 2018 om 19:42 schreef Alek Storm :

> I have a PR I need to rebase that would enable this (we had the exact same
> requirement): https://github.com/apache/incubator-airflow/pull/2338
>
> Alek
>
> On Mon, Nov 12, 2018 at 12:40 PM muralisa...@gmail.com <
> muralisa...@gmail.com> wrote:
>
> > Hi,
> >
> > We have  airflow running in different environments. I want to customize
> > the subject line in airflow alert to add environment name so that we
> could
> > perform different Acton's upon receiving the alerts from different
> > environments.
> >
> > I 'd really appreciate if any one can help.
> >
> >
> > Regards,
> > Murali
> >
>


Re: REST API roadmap/plan?

2018-11-05 Thread Driesprong, Fokko
Thanks Matthew,

We're working on the experimental API. This API should be used by the GUI and
can also be used by external systems to interface with Airflow (triggering
DAGs, for example). Verdan was working on this, but for now this process is a
bit stuck again. It turns out that data engineers aren't really good at doing
front-end work.
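
As an example of the external-system use case, a DAG run can already be
triggered through the experimental endpoint. A hedged sketch (host, DAG id and
authentication are placeholders for your own deployment):

    import requests

    # Host, DAG id and auth are placeholders for your own deployment.
    response = requests.post(
        "http://localhost:8080/api/experimental/dags/my_dag/dag_runs",
        json={"conf": {"triggered_by": "external-system"}},
    )
    response.raise_for_status()
    print(response.json())
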
There isn't a full roadmap at the moment. There are some tickets, but
nothing with a full description:
https://issues.apache.org/jira/browse/AIRFLOW-890?jql=project%20%3D%20AIRFLOW%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22REST%22

Please feel free to pick this up and get rid of the logic in the GUI (web
interface), or extend the experimental API with sensible endpoints :-)

If there are any questions, let me know.

Cheers, Fokko



Op wo 31 okt. 2018 om 21:00 schreef matthew :

> I've been poking around Jira and confluence but haven't seen any roadmap or
> plans for the REST API.  Did I just miss it or has it stalled out?
>
> I'm interested in working on it if it needs some help.
>
> Thanks
> -Matthew
>


Re: Apache Airflow / Cloud Composer workshops Amsterdam

2018-10-15 Thread Driesprong, Fokko
Hi Ben,

Sorry for the late reply; I had to ask a colleague. The session will be
recorded and published on YouTube, and I will share the link. Unfortunately
there is no live stream, so you'll have to be a bit patient ;)

Cheers, Fokko

Op vr 12 okt. 2018 om 21:22 schreef Ben Gregory :

> Hey Fokko!
>
> Sounds like a great event! Will any of the talks/workshops be
> streamed/livecast/recorded for those of us who can't make it to Amsterdam?
>
> - Ben
>
> On Fri, Oct 12, 2018 at 12:40 PM Driesprong, Fokko 
> wrote:
>
> > Hi all,
> >
> > From October 15-19, 2018, GoDataFest takes place in Amsterdam, The
> > Netherlands. This week is dedicated to data technology and features free
> > talks, training sessions and workshops.
> >
> > Leading tech companies, like AWS (Monday, October 15), Dataiku (Tuesday,
> > October 16), Databricks (Wednesday, October 17), and Google Cloud
> > (Thursday, October 18) each host an entire day to share their latest
> > innovations. The final day, Friday, October 19, is dedicated to
> > open-source, including Apache Airflow. During the open-source day,
> October
> > 19, we organize a free Airflow workshop, taking place from 15:00 – 17:00.
> >
> > Feel free to mix-and-match activities to create your ultimate and
> personal
> > data festival. Make sure to register directly, as seats are limited.
> > http://www.godatafest.com/
> >
> > Cheers, Fokko
> >
>
>
> --
>
> [image: Astronomer Logo] <https://www.astronomer.io/>
>
> *Ben Gregory*
> Data Engineer
>
> Mobile: +1-615-483-3653 • Online: astronomer.io <
> https://www.astronomer.io/>
>
> Download our new ebook. <http://marketing.astronomer.io/guide/> From
> Volume
> to Value - A Guide to Data Engineering.
>


Re: Something May Be Wrong with the Travis CI Tests

2018-10-14 Thread Driesprong, Fokko
Hi XD,

This is a very valid point. I think most of the operators are still fine,
since the PythonOperator is also used quite a lot in other tests, but we
should re-enable these tests as well, as you mention. Nevertheless, the tests
themselves might be outdated because they haven't been kept up to date. Thanks
for addressing this and picking it up.

This is also a nice opportunity for people who want to get involved in
contributing to Airflow.

Cheers, Fokko

Op za 13 okt. 2018 om 14:48 schreef Deng Xiaodong :

> Hi Fokko,
>
> I have tried your idea. You are correct: after prepend the filename with
> "test_", the CI test failed as we want (
>
> https://travis-ci.org/XD-DENG/incubator-airflow/builds/440983339?utm_source=github_status_medium=notification
> ).
> It DOES relate to the test discovery.
>
> We need to tackle this issue to make sure these tests really work (by
> prepending the test file names with "test_").
>
> But my concern is that some of these tests were never really run, and their
> corresponding operators/hooks/sensors may be very "unhealthy" (only in
> folder "tests/operators", there are 9 test scripts which were not named
> correctly, i.e., never really run). We can fix the tests itself
> quite easily, but fixing the potential "accumulated" issues in these
> corresponding operators/hooks/sensors may make this a big ticket to work
> on.
>
> Please let me know what you think.
> (I will start from DockerOperator first though).
>
>
> XD
>
> On Sat, Oct 13, 2018 at 8:02 PM Driesprong, Fokko 
> wrote:
>
> > Hi XD,
> >
> > Very good point. I was looking into this recently, but since time is a
> > limited matter, I did not really dig into it. It has to do with the test
> > discovery. The python_operator does not match the given pattern test*.py:
> >
> >
> https://docs.python.org/3/library/unittest.html#cmdoption-unittest-discover-p
> >
> > Could you try to prepend the filename with test_. For example,
> > test_python_operator.py?
> >
> > Cheers, Fokko
> >
> > Op za 13 okt. 2018 om 13:51 schreef Deng Xiaodong :
> >
> > > Hi folks, especially our committers,
> > >
> > > Something may be wrong with our Travis CI tests, unless I
> > > misunderstood/missed something.
> > >
> > > I'm checking *DockerOperator*, and some implementations inside are not
> > > making sense to me. But no CI tests ever failed due to it. When I check
> > the
> > > log of the historical Travis CI, surprisingly, I found the test of
> > > DockerOperator never really run (you search any one of the recent
> Travis
> > > log).
> > >
> > > To prove this, I forked the latest master branch and tried to add "self
> > > .assertTrue(1 == 0)" into the code of
> tests/operators/docker_operator.py
> > > <
> > >
> >
> https://github.com/XD-DENG/incubator-airflow/commit/2d6f47202349aa75b8d3e8e1631a285d2d75f1e7#diff-17e0452f4ce967751edfa767d46ae0ce
> > > >
> > >  and tests/operators/python_operator.py
> > > <
> > >
> >
> https://github.com/XD-DENG/incubator-airflow/commit/d7e4205f2f25dc2ea29356e4f43543f9b0bca963#diff-b5351e876d48957e2b64da5c16b0bd60
> > > >,
> > > which would for sure fail the tests. However, and as I suspected, the
> > > Travis CI passed (
> > > https://github.com/XD-DENG/incubator-airflow/commits/patch-6). This
> > means
> > > these two tests were never invoked during the Travis CI, and I believe
> > > these two are not the only tests affected.
> > >
> > > May anyone take a look into this? If I did misunderstand/miss
> something,
> > > kindly let me know.
> > >
> > > Many thanks!
> > >
> > > XD
> > >
> >
>


Re: Something May Be Wrong with the Travis CI Tests

2018-10-13 Thread Driesprong, Fokko
Hi XD,

Very good point. I was looking into this recently, but since time is a
limited matter, I did not really dig into it. It has to do with the test
discovery. The python_operator does not match the given pattern test*.py:
https://docs.python.org/3/library/unittest.html#cmdoption-unittest-discover-p

Could you try to prepend the filename with test_. For example,
test_python_operator.py?
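
As a quick illustration of the discovery behaviour (a small sketch, assuming
it is run from the repository root inside the test environment):

    import unittest

    loader = unittest.TestLoader()
    # Only files matching the default "test*.py" pattern are collected, so a
    # file named python_operator.py is silently skipped until it is renamed
    # to test_python_operator.py.
    suite = loader.discover("tests/operators", pattern="test*.py")
    print(suite.countTestCases())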

Cheers, Fokko

Op za 13 okt. 2018 om 13:51 schreef Deng Xiaodong :

> Hi folks, especially our committers,
>
> Something may be wrong with our Travis CI tests, unless I
> misunderstood/missed something.
>
> I'm checking *DockerOperator*, and some implementations inside are not
> making sense to me. But no CI tests ever failed due to it. When I check the
> log of the historical Travis CI, surprisingly, I found the test of
> DockerOperator never really run (you search any one of the recent Travis
> log).
>
> To prove this, I forked the latest master branch and tried to add "self
> .assertTrue(1 == 0)" into the code of tests/operators/docker_operator.py
> <
> https://github.com/XD-DENG/incubator-airflow/commit/2d6f47202349aa75b8d3e8e1631a285d2d75f1e7#diff-17e0452f4ce967751edfa767d46ae0ce
> >
>  and tests/operators/python_operator.py
> <
> https://github.com/XD-DENG/incubator-airflow/commit/d7e4205f2f25dc2ea29356e4f43543f9b0bca963#diff-b5351e876d48957e2b64da5c16b0bd60
> >,
> which would for sure fail the tests. However, and as I suspected, the
> Travis CI passed (
> https://github.com/XD-DENG/incubator-airflow/commits/patch-6). This means
> these two tests were never invoked during the Travis CI, and I believe
> these two are not the only tests affected.
>
> May anyone take a look into this? If I did misunderstand/miss something,
> kindly let me know.
>
> Many thanks!
>
> XD
>


Apache Airflow / Cloud Composer workshops Amsterdam

2018-10-12 Thread Driesprong, Fokko
Hi all,

From October 15-19, 2018, GoDataFest takes place in Amsterdam, The
Netherlands. This week is dedicated to data technology and features free
talks, training sessions and workshops.

Leading tech companies, like AWS (Monday, October 15), Dataiku (Tuesday,
October 16), Databricks (Wednesday, October 17), and Google Cloud
(Thursday, October 18) each host an entire day to share their latest
innovations. The final day, Friday, October 19, is dedicated to
open-source, including Apache Airflow. During the open-source day, October
19, we organize a free Airflow workshop, taking place from 15:00 – 17:00.

Feel free to mix-and-match activities to create your ultimate and personal
data festival. Make sure to register directly, as seats are limited.
http://www.godatafest.com/

Cheers, Fokko


Re: "setup.py test" is being naughty

2018-10-12 Thread Driesprong, Fokko
We're working hard to get rid of the tight Travis integration and to move to
a Docker-based setup. I think it should be very easy to get a Docker image up
and running that is packed with the required dependencies. Unfortunately we're
not there yet. Also, the tox layer feels a bit redundant to me, since we're
using Docker now.

Cheers, Fokko

Op wo 3 okt. 2018 om 15:08 schreef Jarek Potiuk :

> Local testing works well for a number of unit tests when run from the IDE.
> We of course run full suite of tests via docker environment but our own
> test classess/modules are run using local python environment. It's the
> easiest way to configure local python virtualenv with IntelliJ/Pycharm for
> one. You can - in recent version of PyCharm/IntelliJ - have docker python
> environment setup, but there are certain downsides of using it
> (speed/mounting local volumes with sources etc.).
>
> So I think we should not really discourage running at least some tests
> locally. Maybe (if there are not many of those) we could identify the tests
> which require the full-blown docker environment and mark them with
> skipUnless and only have them executed when we are inside dockerized
> environment for unit tests ?
>
> J.
>
>
> On Wed, Oct 3, 2018 at 1:48 PM Holden Karau  wrote:
>
> > I think (in the short term) discontinuing local testing and telling folks
> > to use the docker based approach makes more sense (many of the tests
> have a
> > complex set of dependencies that don't make sense to try and test
> locally).
> > What do other folks think?
> >
> > On Wed, Oct 3, 2018 at 4:45 AM EKC (Erik Cederstrand)
> >  wrote:
> >
> > > The test suite is also trying to create /usr/local/bin/airflow, which
> > > means I can't run the test suite on a machine that actually uses
> > > /usr/local/bin/airflow. And the default config file doesn't find the
> > MySQL
> > > server I set up locally. I'm trying the Docker-based test environment
> > now.
> > >
> > >
> > > It seems the local test setup either needs polishing or should be
> > > discontinued.
> > >
> > >
> > > Erik
> > >
> > > 
> > > From: EKC (Erik Cederstrand)
> > > Sent: Wednesday, October 3, 2018 12:01:00 PM
> > > To: dev@airflow.incubator.apache.org
> > > Subject: "setup.py test" is being naughty
> > >
> > >
> > > Hi all,
> > >
> > >
> > > I wanted to contribute a simple patch, and as a good open source
> citizen
> > I
> > > wanted to also contribute a test. So I git clone from GitHub, create a
> > > virtualenv and run "setup.py test". First experience is that my
> > > /etc/krb5.conf is overwritten, which means my account is locked out of
> > all
> > > systems here at work. I recovered from that, only to find out that
> > > ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub were also overwritten. Now I'm not
> > very
> > > amused.
> > >
> > >
> > > Did I miss something in CONTRIBUTING.md?
> > >
> > >
> > > Erik
> > >
> >
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> > https://amzn.to/2MaRAG9  
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >
>
>
> --
>
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129
>


Re: Pinning dependencies for Apache Airflow

2018-10-04 Thread Driesprong, Fokko
Hi Jarek,

Thanks for bringing this up. I missed the discussion on Slack since I'm on
holiday, but I saw the thread and it was way too interesting, and therefore
this email :)

This is actually something that we need to address asap. Like you mention, we
have seen earlier that specific transitive dependencies are not compatible,
and then we end up with a broken CI or, even worse, a broken release. Earlier
we had the fixed versions (==) in setup.py and the requirements for the CI in
a separate requirements.txt. This was also far from optimal, since we had two
versions of the requirements.

I like the idea that you are proposing. Maybe we can do an experiment with it.
Because of the nature of Airflow (orchestrating different systems), we have a
huge list of dependencies. To avoid installing everything, we've created
groups: for example, specific libraries for when you're using Google Cloud,
Elastic, Druid, etc. So I'm curious how it will work with the
`extras_require` of Airflow.
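
As a rough sketch of what I mean (illustrative package name, groups and
version pins, not Airflow's actual dependency list):

    from setuptools import setup

    # Illustrative pins -- not real constraints for apache-airflow.
    gcp = ["google-cloud-storage==1.13.0"]
    druid = ["pydruid==0.4.4"]

    setup(
        name="example-package",
        version="0.1",
        install_requires=["requests==2.20.0"],
        extras_require={
            "gcp": gcp,
            "druid": druid,
            "all": gcp + druid,
        },
    )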

Regarding pipenv: I don't use pipenv/virtualenv anymore; for me Docker is much
easier to work with. I'm also working on a PR to get rid of tox for the
testing and move to a more Docker-idiomatic test pipeline. Curious what your
thoughts are on that.

Cheers, Fokko

Op do 4 okt. 2018 om 15:39 schreef Arthur Wiedmer :

> Thanks Jakob!
>
> I think that this is a huge risk of Slack.
> I am not against Slack as a support channel, but it is a slippery slope to
> have more and more decisions/conversations happening there, contrary to
> what we hope to achieve with the ASF.
>
> When we are starting to discuss issues of development, extensions and
> improvements, it is important for the discussion to happen in the mailing
> list.
>
> Jarek, I wouldn't worry too much, we are still in the process of learning
> as a community. Welcome and thank you for your contribution!
>
> Best,
> Arthur.
>
> On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk 
> wrote:
>
> > Thanks for pointing it out Jakob.
> >
> > I am still very fresh in the ASF community and learning the ropes and
> > etiquette and code of conduct. Apologies for my ignorance.
> > I re-read the conduct and FAQ now again - with more understanding and
> will
> > pay more attention to wording in the future. As you mentioned it's more
> the
> > wording than intentions, but since it was in TL;DR; it has stronger
> > consequences.
> >
> > BTW. Thanks for actually following the code of conduct and pointing it
> out
> > in respectful manner. I really appreciate it.
> >
> > J.
> >
> > Principal Software Engineer
> > Phone: +48660796129
> >
> > On Thu, 4 Oct 2018, 20:41 Jakob Homan,  wrote:
> >
> > > > TL;DR; A change is coming in the way how dependencies/requirements
> are
> > > > specified for Apache Airflow - they will be fixed rather than
> flexible
> > > (==
> > > > rather than >=).
> > >
> > > > This is follow up after Slack discussion we had with Ash and Kaxil -
> > > > summarising what we propose we'll do.
> > >
> > > Hey all.  It's great that we're moving this discussion back from Slack
> > > to the mailing list.  But I've gotta point out that the wording needs
> > > a small but critical fix up:
> > >
> > > "A change *is* coming... they *will* be fixed"
> > >
> > > needs to be
> > >
> > > "We'd like to propose a change... We would like to make them fixed."
> > >
> > > The first says that this decision has been made and the result of the
> > > decision, which was made on Slack, is being reported back to the
> > > mailing list.  The second is more accurate to the rest of the
> > > discussion ('what we propose...').  And again, since it's axiomatic in
> > > ASF that if it didn't happen on a list, it didn't happen[1], we gotta
> > > make sure there's no confusion about where the community is on the
> > > decision-making process.
> > >
> > > Thanks,
> > > Jakob
> > >
> > > [1]
> > >
> >
> https://community.apache.org/newbiefaq.html#NewbieFAQ-IsthereaCodeofConductforApacheprojects
> > > ?
> >
> > On Thu, Oct 4, 2018 at 9:56 AM Alex Guziel
> > >  wrote:
> > > >
> > > > You should run `pip check` to ensure no conflicts. Pip does not do
> this
> > > on
> > > > its own.
> > > >
> > > > On Thu, Oct 4, 2018 at 9:20 AM Jarek Potiuk <
> jarek.pot...@polidea.com>
> > > > wrote:
> > > >
> > > > > Great that this discussion already happened :). Lots of useful
> things
> > > in
> > > > > it. And yes - it means pinning in requirement.txt - this is how
> > > pip-tools
> > > > > work.
> > > > >
> > > > > J.
> > > > >
> > > > > Principal Software Engineer
> > > > > Phone: +48660796129
> > > > >
> > > > > On Thu, 4 Oct 2018, 18:14 Arthur Wiedmer, <
> arthur.wied...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Jarek,
> > > > > >
> > > > > > I will +1 the discussion Dan is referring to and George's advice.
> > > > > >
> > > > > > I just want to double check we are talking about pinning in
> > > > > > requirements.txt only.
> > > > > >
> > > > > > This offers the ability to
> > > > > > pip install -r 

Re: It's very hard to become a committer on the project

2018-09-23 Thread Driesprong, Fokko
Many thanks for the effort, Stefan. I just went through them and updated the
statuses in Jira accordingly.

Thanks!

Cheers, Fokko

Op za 22 sep. 2018 om 23:25 schreef Stefan Seelmann :

> On 9/20/18 10:02 PM, Driesprong, Fokko wrote:
> > us still have a full time job on the side :) Tomorrow I'll spend time to
> > clean up the old Jira's.
>
> My cold prevents me for doing creative things, so I went through Jira
> and picked the low hanging fruits. The following issues can be closed IMHO:
>
> Spam:
> * AIRFLOW-1944
> * AIRFLOW-2277
>
> Missing description:
> * AIRFLOW-2583
>
> Fixed (commit in master):
> * AIRFLOW-1448
> * AIRFLOW-692
> * AIRFLOW-553
> * AIRFLOW-529
> * AIRFLOW-7
> * AIRFLOW-2810
> * AIRFLOW-2863
> * AIRFLOW-2419
> * AIRFLOW-951
> * AIRFLOW-1487
> * AIRFLOW-1532
> * AIRFLOW-1531
> * AIRFLOW-2653
> * AIRFLOW-3074
> * AIRFLOW-276
>
> Fixed (duplicate):
> * AIRFLOW-768
> * AIRFLOW-499
> * AIRFLOW-699
> * AIRFLOW-1701
> * AIRFLOW-1386
> * AIRFLOW-1353
> * AIRFLOW-346
>
> Not reproducible:
> * AIRFLOW-1285
>
> Fixed or no longer valid because of new FAB/RBAC UI:
> * AIRFLOW-294
> * AIRFLOW-292
>
> Issues referencing specifc issues with Airflow 1.6/1.7:
> * AIRFLOW-668
> * AIRFLOW-1137
> * AIRFLOW-344
> * AIRFLOW-317
>


Re: Fundamental change - Separate DAG name and id.

2018-09-20 Thread Driesprong, Fokko
I like the dag_id serving as both the name and the unique identifier. If you
change the DAG in such a way that it deserves a new name, you probably want to
create a new DAG anyway. If you want to give some additional context, you can
use the description field:
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L3131-L3132
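
For example, a minimal sketch where the id stays stable and the description
carries the human-readable part:

    from datetime import datetime

    from airflow import DAG

    dag = DAG(
        dag_id="cost_report_daily",                    # stable identifier
        description="Daily cost and expenses report",  # free-form, safe to change
        start_date=datetime(2018, 1, 1),
        schedule_interval="@daily",
    )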

The name of the DAG file does not have any influence.

My 2¢

Cheers, Fokko

Op do 20 sep. 2018 om 19:40 schreef James Meickle
:

> I'm personally against having some kind of auto-increment numeric ID for
> DAGs. While this makes a lot of sense for systems where creation is a
> database activity (like a POST request), in Airflow, DAG creation is
> actually a code ship activity. There are all kinds of complex scenarios
> around that:
>
> - I revert a commit and a DAG disappears or is renamed
> - I run the same file, twice, with multiple parameters to create two DAGs
> - I create the DAG in both staging and prod, but they wind up with
> different IDs
>
> It's just too hard to automatically track these scenarios.
>
> If we really wanted to put something like this in place, it would first
> make more sense to decouple DAG creation from code shipping, and instead
> prefer creation of a DAG outside of code (but with a definition that
> references which git repo/committish/file/arguments/etc. to use). Then if
> you do something like rename a file, the DAG breaks, but at least still
> exists in the db with that ID and history still makes sense once you update
> the DAG definition with the new code location.
>
> On Thu, Sep 20, 2018 at 4:52 AM airflowuser
>  wrote:
>
> > Hi,
> > though this could have been explained on Jira I think this should be
> > discussed first.
> >
> > The problem:
> > Airflow mixes DAG name with id. It uses same filed for both purposes.
> >
> > I assume that most of you use the dag_id to describe what the DAG
> actually
> > does.
> > For example:
> >
> > dag = DAG(
> > dag_id='cost_report_daily',
> > ...
> > )
> >
> > This dag_id is reflected to the dag id column in the UI.
> > Now, lets say that you want to add another task to this specific dag -
> You
> > are to be extremely careful when you change the dag_id to represent the
> new
> > functionality for example : dag_id='cost_expenses_reports_daily' . This
> > will break the history of the DAG.
> >
> > Or even with simpler use case.. the user just want to change the name he
> > sees on the UI.
> >
> > I suggest to have a discussion if the dag_id should be split into id (an
> > actual id) and name to reflect what it does. When the "connection" is
> done
> > by id's  - names can change as much as you want without breaking
> anything.
> > essentially it becomes a field uses for display purpose  only.
> >
> > * I didn't mention also the issue of DAG file name which can also cause
> > trouble if someone wants to change it.
> >
> > Sent with [ProtonMail](https://protonmail.com) Secure Email.
>


Re: It's very hard to become a committer on the project

2018-09-20 Thread Driesprong, Fokko
Some history: recently we've moved to Gitbox and moved away from the Apache
repo itself to simplify the setup. Before that, the code on GitHub was merely
a mirror of the Apache git repo. Before Gitbox there was a Python script that
would link the issues to the GitHub PRs. Now, because we've moved to GitHub
itself, this piece of automation is gone and we need to get it working again.
For projects like Spark there is a lot of automation going on between Jira and
GitHub. Unfortunately I don't have a lot of experience here, but I suspect a
whole bunch of hooks. From my perspective, automation is key here.

Furthermore, as Ash is saying, it is hard to pick up an arbitrary task from
Jira, since the usage of Airflow spans a wide range of applications: from AWS
to GCP, from MySQL to Druid, from Bash to Slack. My advice would be the same
as Ash's: start using Airflow, and if you run into anything, raise a ticket
(first check if it is already there) and open a PR. Like you're saying, you're
pretty new to Airflow, so it would be best to get some experience first.

The community (committers, contributors and users) is happy to help and
assist, as in the recent conversations on Slack. But nobody is getting paid to
work on Airflow; this isn't a problem, but it means that most of us still have
a full-time job on the side :) Tomorrow I'll spend some time cleaning up the
old Jiras.

Cheers, Fokko


Op do 20 sep. 2018 om 11:40 schreef airflowuser
:

> >> Are you volunteering to sponsor someone's time to be able to do this?
>
> I am. But I have no knowledge of all the components of the project.
> I doubt I can be much of value for this tagging task.
>
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Thursday, September 20, 2018 12:37 PM, Ash Berlin-Taylor <
> a...@apache.org> wrote:
>
> > > Remember my basic question: I want to contribute - how on earth I can
> find a ticket that is suitable for first time committer? Can you show me?
> >
> > There aren't that many feature requests in Jira, so looking there for
> easy tickets, is as you have probably found a fruitless exercise. I'd
> recommend using Airflow and when you come across something you want fixed,
> or a feature you want added, that you open a PR for it.
> >
> > > Again, If decided to stay with Jira.. I highly recommend that someone
> from the project will maintain it. Don't allow to regular users to tag and
> set priorities for the tickets.. someone from the project should do it.
> >
> > Are you volunteering to sponsor someone's time to be able to do this?
> >
> > > Sent with ProtonMail Secure Email.
> > > ‐‐‐ Original Message ‐‐‐
> > > On Tuesday, September 18, 2018 11:57 AM, Sid Anand san...@apache.org
> wrote:
> > >
> > > > Hi Folks!
> > > > For some history, Airlfow started on GH issues. We also had a very
> popular Google group. When we moved to Apache, we were told that Jira was
> the way we needed to go for issue tracking because it resided on Apache
> infrastructure. When we moved over, we had to drop 100+ GH issues on the
> floor -- there was no way to transfer them to Jira and maintain the
> original submitter/owner info since there was no mapping of users between
> the 2 systems.
> > > > Here's a pie chart of our existing issues by status:
> > > >
> https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12320023=statuses=12320023=com.atlassian.jira.jira-core-reports-plugin%3Apie-report_token=A5KQ-2QAV-T4JA-FDED|a85ff737799378265f90bab4f1456b5e2811a507|lin=Next
> 
> > > > I'm attaching a screen shot as well.
> > > > I think we all agree that there is better integration between GH PRs
> and GH Issues than between GH PRs and Jira issues.
> > > > There are some practical matters to consider:
> > > >
> > > > -   For the 1100-1200 unclosed/unresolved issues, how will we
> transfer them to GH or will we drop those on the floor? How would we map
> submitters between the 2 systems, and how would we transfer the
> content/comments,etc...
> > > > -   For the existing closed PRs (>3k), whose PRs reference JIRA,
> we'd need to keep JIRA around in read-only mode so we could reference the
> bug/feature details, but somehow disallow new JIRA creations, lest some
> people continue to use it to create new issues
> > > > -   I'm assuming the GH issue naming would not conflict with that of
> JIRA naming in commit message subjects and PRs. In other words,
> incubator-airlow-1 vs AIRFLOW-1 or airflow-1 vs AIRFLOW-1 or possibly
> conflict at AIRFLOW-1? Once we graduate, I'm pretty sure the incubator name
> will be dropped, so there may be a naming conflict.
> > > >
> > > > In the end, these are 2 different 

Re: Database referral integrity

2018-09-18 Thread Driesprong, Fokko
I'm in favor of having referential integrity. Enforcing it will add some load,
but it will also make sure that the database stays clean. Also, Airflow uses
transactions, which will make sure that the integrity checks are not validated
on every statement but at commit time. I'm happy to help with this as well.
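
To make it concrete, something along these lines could be added as an Alembic
migration (just a sketch; the constraint name and the composite key on
dag_id/execution_date are assumptions and would have to match the actual
schema and migration chain):

from alembic import op


def upgrade():
    # Ensure every task_instance row points at an existing dag_run
    # (assumes dag_run has a unique key on dag_id + execution_date).
    op.create_foreign_key(
        "task_instance_dag_run_fkey",
        "task_instance",
        "dag_run",
        ["dag_id", "execution_date"],
        ["dag_id", "execution_date"],
    )


def downgrade():
    op.drop_constraint(
        "task_instance_dag_run_fkey", "task_instance", type_="foreignkey"
    )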

Cheers, Fokko

Op di 18 sep. 2018 om 11:07 schreef Bolke de Bruin :

> Adding these kind of checks which work for integrity well make database
> access pretty slow. In addition it isnt there because in the past there was
> no strong connection between for example tasks and dagruns, it was more or
> less just coincidental. There also so some bisecting tools that probably
> have difficulty functioning in a new regime. In other words it is not an
> easy change and it will have operational challenges.
>
> > On 18 Sep 2018, at 11:03, Ash Berlin-Taylor  wrote:
> >
> > Ooh good spot.
> >
> > Yes I would be in favour of adding these, but as you say we need to
> thing about how we might migrate old data.
> >
> > Doing this at 2.0.0 and providing a cleanup script (or doing it as part
> of the migration?) is probably the way to go.
> >
> > -ash-
> >
> >> On 17 Sep 2018, at 19:56, Stefan Seelmann 
> wrote:
> >>
> >> Hi,
> >>
> >> looking into the DB schema there is almost no referral integrity
> >> enforced at the database level. Many foreign key constraints between
> >> dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make sense
> >> IMO.
> >>
> >> Is there a particular reason why that's not implemented?
> >>
> >> Introducing it now will be hard, probably any real-world setup has some
> >> violations. But I'm still in favor of this additional safety net.
> >>
> >> Kind Regards,
> >> Stefan
> >
>
>


Re: Guidelines on Contrib vs Non-contrib

2018-09-18 Thread Driesprong, Fokko
I fully agree with using plain Python modules :)

I don't think a lot of hooks/operators will graduate to core, since it would
break the imports. A few of them, for example the Databricks and Google hooks,
are mature enough. For me the main point is having test coverage and a stable
API.
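
To illustrate, a hook or operator shipped as a plain pip-installable package
needs nothing more than a regular import; no plugin registration is involved.
A small sketch (the package and class names below are made up):

from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class MyVendorOperator(BaseOperator):
    """Hypothetical operator distributed as a plain Python package."""

    @apply_defaults
    def __init__(self, endpoint, *args, **kwargs):
        super(MyVendorOperator, self).__init__(*args, **kwargs)
        self.endpoint = endpoint

    def execute(self, context):
        self.log.info("Calling %s", self.endpoint)

# In a DAG file it is then used like any other operator:
# from my_vendor_airflow.operators import MyVendorOperator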

Cheers, Fokko

Op di 18 sep. 2018 om 18:30 schreef Victor Noagbodji <
vnoagbo...@amplify-analytics.com>:

> yes, please!
>
> > On Sep 18, 2018, at 12:23 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
> >
> > +1 for deprecating operators/hooks as plugins, let's use Python's good
> old
> > python packages and maybe python "entry points" if we want to inject them
> > in "airflow.operators"/"airflow.hooks" (which is probably not necessary)
> >
> > On Tue, Sep 18, 2018 at 2:12 AM Ash Berlin-Taylor 
> wrote:
> >
> >> Operators and hooks don't need any special plugin system - simply having
> >> them as as separate Python modules which are imported using normal
> python
> >> semantics is enough.
> >>
> >> In fact now that I think about it: I want to deprecate the plugins
> >> registering hooks/operators etc and limit it to only bits which a simple
> >> python import can't manage - which I think is only anything that needs
> to
> >> be registered with another system, such as custom routes in the web UI.
> >>
> >> I'll draft an AIP for this soon.
> >>
> >> -ash
> >>
> >>
> >>> On 18 Sep 2018, at 00:50, George Leslie-Waksman 
> >> wrote:
> >>>
> >>> Given we have a plugin system, could we alternatively move away from
> >>> keeping non-core supported code outside of the core project/repo?
> >>>
> >>> It would hugely decrease the surface area of the main repository and
> >>> testing infrastructure to get most of the contrib code out to its own
> >> place.
> >>>
> >>> Further, it would decrease the committer burden of having to
> >> approve/merge
> >>> code that is not supposed to be their responsibility.
> >>>
> >>> On Mon, Sep 17, 2018 at 4:37 PM Tim Swast 
> >> wrote:
> >>>
> > Individual operators and hooks living in separate repositories on
> >> github
>  (or possibly other Apache projects), which are then distributed by pip
> >> and
>  installed as libraries seems like it would scale better.
> 
>  Pandas did this about a year ago, and it's seemed to have worked well.
> >> For
>  example, pandas.read_gbq is a very thin wrapper around
> >> pandas_gbq.read_gbq
>  (distributed as a separate package). It has made it easier for me to
> >> track
>  issues corresponding to my area of expertise.
> 
>  On Sun, Sep 16, 2018 at 1:25 PM Jakob Homan 
> wrote:
> 
> >> My understanding as a contributor is that if a hook/operator is in
>  core,
> > it
> >> means that a committer is willing to take personal responsibility to
> >> maintain it (or at least help maintain it), and everything else goes
> >> in
> >> contrib.
> >
> > That's not correct.  All of the code is owned by the entire
> > community[1]; no one person is responsible for any of it.  There's no
> > silos, fiefdoms, walled gardens, etc.  If the community cannot
> support
> > a piece of code it should be deprecated and subsequently removed.
> >
> > Contrib sections are almost always problematic for this reason.
> > Hadoop ended up abandoning its.  Because Airflow acts as a gathering
> > point for so many disparate technologies (databases, storage systems,
> > compute engines, etc.), trying to keep all of them corralled and up
> to
> > date will be very difficult.  Individual operators and hooks living
> in
> > separate repositories on github (or possibly other Apache projects),
> > which are then distributed by pip and installed as libraries seems
> > like it would scale better.
> >
> > -Jakob
> >
> > [1]
> >> https://blogs.apache.org/foundation/entry/success-at-apache-a-newbie
> >
> > On 15 September 2018 at 13:29, Jeff Payne 
> wrote:
> >> How many operators are added to contrib per month? Is it too many to
> > make the decision case by case? If so, then the above mentioned rule
>  sounds
> > fairly reasonable. However, if that's the rule, shouldn't a bunch of
> > existing modules be moved from contrib to core?
> >>
> >> Get Outlook for Android
> >>
> >> 
> >> From: Taylor Edmiston 
> >> Sent: Saturday, September 15, 2018 1:13:47 PM
> >> To: dev@airflow.incubator.apache.org
> >> Subject: Re: Guidelines on Contrib vs Non-contrib
> >>
> >> My understanding as a contributor is that if a hook/operator is in
>  core,
> > it
> >> means that a committer is willing to take personal responsibility to
> >> maintain it (or at least help maintain it), and everything else goes
> >> in
> >> contrib.
> >>
> >> *Taylor Edmiston*
> >> Blog  | LinkedIn
> >> 

Re: Call for fixes for Airflow 1.10.1

2018-09-12 Thread Driesprong, Fokko
Hi Ash,

I've cherry-picked two commits on top of 1.10-test branch:

https://github.com/apache/incubator-airflow/commit/4d2f83b19af3489d6c9563d51210a3dab2f38b26

https://github.com/apache/incubator-airflow/commit/0d2bb5cd215bd3e2d4f47033230ea03b14b36634


This was at the request of Yuan from the Amazon SageMaker team, because some
companies are waiting for this operator. Since the commits don't modify any
existing code, I took the liberty of cherry-picking them on top of the
1.10.1 branch.


Cheers, Fokko



Op zo 9 sep. 2018 om 20:37 schreef Ash Berlin-Taylor :

> I've (re)created the v1-10-test branch with some of the fixes
> cherry-picked in. I can't give much time this week (as what spare time I
> have is being used up working on my talk) but I'll work more on this
> towards the end of next week.
>
> I'll look at resolved Jira tickets targeted with a fix version of 1.10.1
> (i.e. if you want it in 1.10.1, merge the pr into master and also mark the
> Jira as fix in 1.10.1 and I'll work on cherry-picking the fixes. If they
> can be. If it is diffucult/has other things to cherry pick in I might
> change the fix version on you.)
>
> -ash
>
>
> > On 9 Sep 2018, at 19:22, Ash Berlin-Taylor  wrote:
> >
> > On 9 September 2018 18:19:40 BST, Bolke de Bruin 
> wrote:
> > You can already add them to v1-10-test.
> >
> > Normally we are a bit cautious to this if you are not the release
> manager to ensure that he/she knows what the state is.
> >
> > B
> >
> > Op zo 9 sep. 2018 18:02 schreef Driesprong, Fokko  >:
> > Can we add this one as well?
> >
> > https://github.com/apache/incubator-airflow/pull/3862 <
> https://github.com/apache/incubator-airflow/pull/3862>
> > https://issues.apache.org/jira/browse/AIRFLOW-1917 <
> https://issues.apache.org/jira/browse/AIRFLOW-1917>
> >
> > I'm happy to cherry pick them onto the 1.10.1 by myself as well. Any idea
> > when we will start this branch?
> >
> > Cheers, Fokko
> >
> > Op do 6 sep. 2018 om 08:08 schreef Deng Xiaodong  <mailto:xd.den...@gmail.com>>:
> >
> > > Hi Ash,
> > >
> > >
> > > May you consider including JIRA ticket 2848 (PR 3693, Ensure dag_id in
> > > metadata "job" for LocalTaskJob) in 1.10.1 as well?
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2848 <
> https://issues.apache.org/jira/browse/AIRFLOW-2848>
> > >
> > > https://github.com/apache/incubator-airflow/pull/3693 <
> https://github.com/apache/incubator-airflow/pull/3693>
> > >
> > >
> > > This is a bug in terms of metadata, which also affects the UI
> > > “Browse->Jobs”.
> > >
> > >
> > > Thanks.
> > >
> > >
> > > Regards,
> > >
> > > XD
> > >
> > > On Wed, Sep 5, 2018 at 23:55 Bolke de Bruin  <mailto:bdbr...@gmail.com>> wrote:
> > >
> > > > You should push these to v1-10-test not to stable. Only once we start
> > > > cutting RCs you should push to -stable. See the docs. This ensures a
> > > stable
> > > > “stable”branch.
> > > >
> > > > Cheers
> > > > Bolke.
> > > >
> > > > > On 3 Sep 2018, at 14:20, Ash Berlin-Taylor  <mailto:a...@apache.org>> wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > I'm starting the process of gathering fixes for a 1.10.1. So far
> the
> > > > list of issues I have that we should pull in are
> > > >
> > >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20fixVersion%20%3D%201.10.1%20ORDER%20BY%20key%20ASC
> <
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20fixVersion%20%3D%201.10.1%20ORDER%20BY%20key%20ASC
> >
> > > > (reproduces below)
> > > > >
> > > > > I will start pushing these as cherry-picked commits to the
> v1-10-stable
> > > > branch today.
> > > > >
> > > > > If you have something that is not in the list below let me know.
> I'd
> > > > like to keep this to bug fixes against 1.10.0 only if possible.
> > > > >
> > > > > https://issues.apache.org/jira/browse/AIRFLOW-2145 <
> https://issues.apache.org/jira/browse/AIRFLOW-2145> Deadlock after
> > > > clearing a running task
> > > > > https://github.com/apache/incubator-airflow/pull/3657 <
> https://github.com/apache/incubator-airflow/pull/3657>
> > > > >
> > >

Re: Call for fixes for Airflow 1.10.1

2018-09-09 Thread Driesprong, Fokko
Can we add this one as well?

https://github.com/apache/incubator-airflow/pull/3862
https://issues.apache.org/jira/browse/AIRFLOW-1917

I'm happy to cherry-pick them onto the 1.10.1 branch myself as well. Any idea
when we will start this branch?

Cheers, Fokko

Op do 6 sep. 2018 om 08:08 schreef Deng Xiaodong :

> Hi Ash,
>
>
> May you consider including JIRA ticket 2848 (PR 3693, Ensure dag_id in
> metadata "job" for LocalTaskJob) in 1.10.1 as well?
>
> https://issues.apache.org/jira/browse/AIRFLOW-2848
>
> https://github.com/apache/incubator-airflow/pull/3693
>
>
> This is a bug in terms of metadata, which also affects the UI
> “Browse->Jobs”.
>
>
> Thanks.
>
>
> Regards,
>
> XD
>
> On Wed, Sep 5, 2018 at 23:55 Bolke de Bruin  wrote:
>
> > You should push these to v1-10-test not to stable. Only once we start
> > cutting RCs you should push to -stable. See the docs. This ensures a
> stable
> > “stable”branch.
> >
> > Cheers
> > Bolke.
> >
> > > On 3 Sep 2018, at 14:20, Ash Berlin-Taylor  wrote:
> > >
> > > Hi everyone,
> > >
> > > I'm starting the process of gathering fixes for a 1.10.1. So far the
> > list of issues I have that we should pull in are
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20fixVersion%20%3D%201.10.1%20ORDER%20BY%20key%20ASC
> > (reproduces below)
> > >
> > > I will start pushing these as cherry-picked commits to the v1-10-stable
> > branch today.
> > >
> > > If you have something that is not in the list below let me know. I'd
> > like to keep this to bug fixes against 1.10.0 only if possible.
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2145 Deadlock after
> > clearing a running task
> > > https://github.com/apache/incubator-airflow/pull/3657
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2476 update tabulate dep
> > to 0.8.2
> > > https://github.com/apache/incubator-airflow/pull/3835
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2778 Bad Import in
> > collect_dag in DagBag
> > > https://github.com/apache/incubator-airflow/pull/3624
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2900 Show code for
> > packaged DAGs
> > > https://github.com/apache/incubator-airflow/pull/3749
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2949 Add syntax
> highlight
> > for single quote strings
> > > https://github.com/apache/incubator-airflow/pull/3795
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2984 Cannot convert
> > naive_datetime when task has a naive start_date/end_date
> > > https://github.com/apache/incubator-airflow/pull/3822
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2979 Deprecated Celery
> > Option not in Options list
> > > https://github.com/apache/incubator-airflow/pull/3832
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2989 No Parameter to
> > change bootDiskType for DataprocClusterCreateOperator
> > > https://github.com/apache/incubator-airflow/pull/3825
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2990 Docstrings for
> > Hooks/Operators are in incorrect format
> > > https://github.com/apache/incubator-airflow/pull/3820
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2994 flatten_results in
> > BigQueryOperator/BigQueryHook should default to None
> > > https://github.com/apache/incubator-airflow/pull/3829
> > >
> > >
> > > In addition to those PRs which are already marked with Fix Version of
> > 1.10.1 I think we should also pull in these:
> > >
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2713 Rename async
> > variable for Python 3.7.0 compatibility
> > > https://github.com/apache/incubator-airflow/pull/3561
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2895 Prevent scheduler
> > from spamming heartbeats/logs
> > > https://github.com/apache/incubator-airflow/pull/3747
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2921 A trivial
> > incorrectness in CeleryExecutor()
> > > https://github.com/apache/incubator-airflow/pull/3773
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2866 Missing CSRF Token
> > Error on Web RBAC UI Create/Update Operations
> > > https://github.com/apache/incubator-airflow/pull/3804
> > >
> > >
> > > https://issues.apache.org/jira/browse/AIRFLOW-2951
> > > https://github.com/apache/incubator-airflow/pull/3798 Update dag_run
> > table end_date when state change
> > > (though as written it has a few other deps to cherry pick in, so will
> > see about this one)
> > >
> >
> >
>


Re: re run build

2018-09-05 Thread Driesprong, Fokko
Hi,

This PR has to be rebased onto master first to get rid of the conflicts.

Cheers, Fokko

Op wo 5 sep. 2018 om 09:20 schreef airflowuser
:

> Can someone assist with rerun build for:
> https://github.com/apache/incubator-airflow/pull/2488
>
> tried to send it to committers mailing list but for some reason I can only
> get mails from list but not sending to list :\
>
> Sent with [ProtonMail](https://protonmail.com) Secure Email.


Re: Call for fixes for Airflow 1.10.1

2018-09-03 Thread Driesprong, Fokko
Ash, thanks for picking this up!

Maybe add this one:
https://github.com/apache/incubator-airflow/pull/3804
https://issues.apache.org/jira/browse/AIRFLOW-2866

Cheers, Fokko


Op ma 3 sep. 2018 om 14:47 schreef Deng Xiaodong :

> Hi Ash,
>
> JIRA ticket 2922  was
> fixed together with *2921* in PR 3773
>  (which was already
> listed by you). Understand that it may be included in 1.10.1 automatically
> if this PR is included, but may still be good to update ticket 2922's "*Fix
> Version*" to "*1.10.1*" in JIRA for better tracking?
>
> Thanks!
>
> Best regards,
> XD
>
> On Mon, Sep 3, 2018 at 8:20 PM Ash Berlin-Taylor  wrote:
>
> > Hi everyone,
> >
> > I'm starting the process of gathering fixes for a 1.10.1. So far the list
> > of issues I have that we should pull in are
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20fixVersion%20%3D%201.10.1%20ORDER%20BY%20key%20ASC
> > (reproduces below)
> >
> > I will start pushing these as cherry-picked commits to the v1-10-stable
> > branch today.
> >
> > If you have something that is not in the list below let me know. I'd like
> > to keep this to bug fixes against 1.10.0 only if possible.
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2145 Deadlock after
> > clearing a running task
> > https://github.com/apache/incubator-airflow/pull/3657
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2476 update tabulate dep
> to
> > 0.8.2
> > https://github.com/apache/incubator-airflow/pull/3835
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2778 Bad Import in
> > collect_dag in DagBag
> > https://github.com/apache/incubator-airflow/pull/3624
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2900 Show code for
> packaged
> > DAGs
> > https://github.com/apache/incubator-airflow/pull/3749
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2949 Add syntax highlight
> > for single quote strings
> > https://github.com/apache/incubator-airflow/pull/3795
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2984 Cannot convert
> > naive_datetime when task has a naive start_date/end_date
> > https://github.com/apache/incubator-airflow/pull/3822
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2979 Deprecated Celery
> > Option not in Options list
> > https://github.com/apache/incubator-airflow/pull/3832
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2989 No Parameter to
> change
> > bootDiskType for DataprocClusterCreateOperator
> > https://github.com/apache/incubator-airflow/pull/3825
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2990 Docstrings for
> > Hooks/Operators are in incorrect format
> > https://github.com/apache/incubator-airflow/pull/3820
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2994 flatten_results in
> > BigQueryOperator/BigQueryHook should default to None
> > https://github.com/apache/incubator-airflow/pull/3829
> >
> >
> > In addition to those PRs which are already marked with Fix Version of
> > 1.10.1 I think we should also pull in these:
> >
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2713 Rename async variable
> > for Python 3.7.0 compatibility
> > https://github.com/apache/incubator-airflow/pull/3561
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2895 Prevent scheduler
> from
> > spamming heartbeats/logs
> > https://github.com/apache/incubator-airflow/pull/3747
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2921 A trivial
> > incorrectness in CeleryExecutor()
> > https://github.com/apache/incubator-airflow/pull/3773
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2866 Missing CSRF Token
> > Error on Web RBAC UI Create/Update Operations
> > https://github.com/apache/incubator-airflow/pull/3804
> >
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-2951
> > https://github.com/apache/incubator-airflow/pull/3798 Update dag_run
> > table end_date when state change
> > (though as written it has a few other deps to cherry pick in, so will see
> > about this one)
> >
> >
>


Re: WebHdfsSensor doesn't support HDFS HA

2018-08-29 Thread Driesprong, Fokko
Hi Manu,

Thanks for raising this question. There is a PR open for moving to hdfs3.
There is code in the existing codebase which supports HA,
but this might not apply to the sensor.

Personally I'm not familiar with pyarrow.hdfs, so I'm not the one to judge
how mature it is. We need to replace Snakebite for sure since it is only
compatible with Python 2.7.
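
For reference, this is roughly how an HA connection looks with pyarrow (a
sketch only; "nameservice1" is a placeholder for the cluster's HDFS
nameservice, and the Hadoop client configuration must be available to
libhdfs):

import pyarrow

# Passing the nameservice instead of a single namenode host lets libhdfs
# resolve the currently active namenode from the cluster configuration.
fs = pyarrow.hdfs.connect(host="nameservice1", port=0)
print(fs.exists("/tmp"))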

Cheers, Fokko


Op wo 29 aug. 2018 om 04:29 schreef Manu Zhang :

> Hi all,
>
> We've been using WebHdfsSensor happily to sensor the state of upstream
> tasks outputting to HDFS except when there is a namenode switch. I've
> opened https://issues.apache.org/jira/browse/AIRFLOW-2901 to discuss the
> HDFS HA support.
>
> There are two solutions that I can see,
>
> 1. use pyarrow.hdfs which has HA support
> 2. allow user to configure a list of namenodes
>
> WDYT ?
>
> Thanks,
> Manu Zhang
>


Re: [RESULT][VOTE] Release Airflow 1.10.0

2018-08-27 Thread Driesprong, Fokko
Thanks for picking this up, Naik! I did not have the time today to upload the
artifacts.

Cheers, Fokko

Op ma 27 aug. 2018 om 18:05 schreef Naik Kaxil :

> I have upload it on PyPi as well and will update the documentation now.
>
> On 27/08/2018, 00:32, "Arthur Wiedmer"  wrote:
>
> Done for Bolke, Fokko and kaxil.
>
> Best,
> Arthur
>
> On Sun, Aug 26, 2018 at 3:08 AM Driesprong, Fokko  >
> wrote:
>
> > Gentle ping! Would be awesome to get 1.10 on Pypi :-)
> >
> > Op wo 22 aug. 2018 om 23:43 schreef Naik Kaxil :
> >
> > > Mine is "kaxil"
> > >
> > > On 22/08/2018, 16:18, "Bolke de Bruin"  wrote:
> > >
> > > @max
> > >
> > > Mine is "bolke"
> > >
> > > Cheers
> > >
> > > B.
> > >
> > > Sent from my iPhone
> > >
> > > >
> > >
> > > Kaxil Naik
> > >
> > > Data Reply
> > > 2nd Floor, Nova South
> > > 160 Victoria Street, Westminster
> > > London SW1E 5LB - UK
> > > phone: +44 (0)20 7730 6000
> > > k.n...@reply.com
> > > www.reply.com
> > >
>
> Kaxil Naik
>
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> London SW1E 5LB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
> On 22 Aug 2018, at 16:13, Driesprong, Fokko 
> > wrote:
> > > >
> > > > Certainly:
> > > https://github.com/apache/incubator-airflow/releases/tag/1.10.0
> > > >
> > > > Cheers, Fokko
> > > >
> > > > Op wo 22 aug. 2018 om 15:18 schreef Ash Berlin-Taylor <
> > > a...@apache.org>:
> > > >
> > > >> Could you push the git tag too please Fokko/Bolke?
> > > >>
> > > >> -ash
> > > >>
> > >     >>> On 22 Aug 2018, at 08:16, Driesprong, Fokko
>  > >
> > > >> wrote:
> > > >>>
> > > >>> Thanks Max,
> > > >>>
> > > >>> My PyPI ID is Fokko
> > > >>>
> > > >>> Cheers, Fokko
> > > >>>
> > > >>> 2018-08-21 22:49 GMT+02:00 Maxime Beauchemin <
> > > maximebeauche...@gmail.com
> > > >>> :
> > > >>>
> > > >>>> I can, what's your PyPI ID?
> > > >>>>
> > > >>>> Max
> > > >>>>
> > > >>>> On Mon, Aug 20, 2018 at 2:11 PM Driesprong, Fokko
> > >  > > >>>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Thanks Bolke!
> > > >>>>>
> > > >>>>> I've just pushed the artifacts to Apache Dist:
> > > >>>>>
> > > >>>>> https://dist.apache.org/repos/dist/release/incubator/
> > > >>>> airflow/1.10.0-incubating/
> > > >>>>>
> > > >>>>> I don't have any access to pypi, this means that I'm not
> able
> > to
> > > upload
> > > >>>> the
> > > >>>>> artifacts over there. Anyone in the position to grand me
> access
> > > or
> > > >> upload
> > > >>>>> it to pypi?
> > > >>>>>
> > > >>>>> Thanks! Cheers, Fokko
> > > >>>>>
> > > >>>>> 2018-08-20 20:06 GMT+02:00 Bolke de Bruin <
> bdbr...@gmail.com>:
> > > >>>>>
> > > >>>>>> Hi Guys and Gals,
> > > >>>>>>
> > > >>>>>> The vote has passed! Apache Airflow 1.10.0 is official.
> > > >>>>>>
> > > >>>>>> As I am AFK for a while can one of the committers please
> 

Re: [RESULT][VOTE] Release Airflow 1.10.0

2018-08-26 Thread Driesprong, Fokko
Gentle ping! Would be awesome to get 1.10 on Pypi :-)

Op wo 22 aug. 2018 om 23:43 schreef Naik Kaxil :

> Mine is "kaxil"
>
> On 22/08/2018, 16:18, "Bolke de Bruin"  wrote:
>
> @max
>
> Mine is "bolke"
>
> Cheers
>
> B.
>
> Sent from my iPhone
>
> >
>
> Kaxil Naik
>
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> London SW1E 5LB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
> On 22 Aug 2018, at 16:13, Driesprong, Fokko  wrote:
> >
> > Certainly:
> https://github.com/apache/incubator-airflow/releases/tag/1.10.0
> >
> > Cheers, Fokko
> >
> > Op wo 22 aug. 2018 om 15:18 schreef Ash Berlin-Taylor <
> a...@apache.org>:
> >
> >> Could you push the git tag too please Fokko/Bolke?
> >>
> >> -ash
> >>
> >>> On 22 Aug 2018, at 08:16, Driesprong, Fokko 
> >> wrote:
> >>>
> >>> Thanks Max,
> >>>
> >>> My PyPI ID is Fokko
> >>>
> >>> Cheers, Fokko
> >>>
> >>> 2018-08-21 22:49 GMT+02:00 Maxime Beauchemin <
> maximebeauche...@gmail.com
> >>> :
> >>>
> >>>> I can, what's your PyPI ID?
> >>>>
> >>>> Max
> >>>>
> >>>> On Mon, Aug 20, 2018 at 2:11 PM Driesprong, Fokko
>  >>>
> >>>> wrote:
> >>>>
> >>>>> Thanks Bolke!
> >>>>>
> >>>>> I've just pushed the artifacts to Apache Dist:
> >>>>>
> >>>>> https://dist.apache.org/repos/dist/release/incubator/
> >>>> airflow/1.10.0-incubating/
> >>>>>
> >>>>> I don't have any access to pypi, this means that I'm not able to
> upload
> >>>> the
> >>>>> artifacts over there. Anyone in the position to grand me access
> or
> >> upload
> >>>>> it to pypi?
> >>>>>
> >>>>> Thanks! Cheers, Fokko
> >>>>>
> >>>>> 2018-08-20 20:06 GMT+02:00 Bolke de Bruin :
> >>>>>
> >>>>>> Hi Guys and Gals,
> >>>>>>
> >>>>>> The vote has passed! Apache Airflow 1.10.0 is official.
> >>>>>>
> >>>>>> As I am AFK for a while can one of the committers please rename
> >>>> according
> >>>>>> to the release docs and push it to the relevant locations (pypi
> and
> >>>>> Apache
> >>>>>> dist)?
> >>>>>>
> >>>>>> Oh and maybe start a quick 1.10.1?
> >>>>>>
> >>>>>> Cheers
> >>>>>> Bolke
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>> Begin forwarded message:
> >>>>>>
> >>>>>>> From: Bolke de Bruin 
> >>>>>>> Date: 20 August 2018 at 20:00:28 CEST
> >>>>>>> To: gene...@incubator.apache.org,
> dev@airflow.incubator.apache.org
> >>>>>>> Subject: [RESULT][VOTE] Release Airflow 1.10.0
> >>>>>>>
> >>>>>>> The vote to release Airflow 1.10.0-incubating, having been
> open for 8
> >>>>>>> days is now closed.
> >>>>>>>
> >>>>>>> There were three binding +1s and no -1 votes.
> >>>>>>>
> >>>>>>> +1 (binding):
> >>>>>>> Justin Mclean
> >>>>>>> Jakob Homan
> >>>>>>> Hitesh Shah
> >>>>>>>
> >>>>>>> The release is approved.
> >>>>>>>
> >>>>>>> Thanks to all those who voted.
> >>>>>>>
> >>>>>>> Bolke
> >>>>>>>
> >>>>>>> Sent from my iPhone
> >>>>>>>
> >>>>>>> Begin forwarded message:
> >

Amsterdam Apache Airflow meetup

2018-08-25 Thread Driesprong, Fokko
Hi all,

Due to a cancellation there is a speaker slot available at the Amsterdam
meetup on the 12th of September:
https://www.meetup.com/Amsterdam-Airflow-meetup/events/253673642/

If you're interested in speaking, let me know. The topic can be quite broad,
from how you're using Airflow within your organisation to new features that
you're working on.

Cheers, Fokko


Re: [RESULT][VOTE] Release Airflow 1.10.0

2018-08-22 Thread Driesprong, Fokko
Certainly: https://github.com/apache/incubator-airflow/releases/tag/1.10.0

Cheers, Fokko

Op wo 22 aug. 2018 om 15:18 schreef Ash Berlin-Taylor :

> Could you push the git tag too please Fokko/Bolke?
>
> -ash
>
> > On 22 Aug 2018, at 08:16, Driesprong, Fokko 
> wrote:
> >
> > Thanks Max,
> >
> > My PyPI ID is Fokko
> >
> > Cheers, Fokko
> >
> > 2018-08-21 22:49 GMT+02:00 Maxime Beauchemin  >:
> >
> >> I can, what's your PyPI ID?
> >>
> >> Max
> >>
> >> On Mon, Aug 20, 2018 at 2:11 PM Driesprong, Fokko  >
> >> wrote:
> >>
> >>> Thanks Bolke!
> >>>
> >>> I've just pushed the artifacts to Apache Dist:
> >>>
> >>> https://dist.apache.org/repos/dist/release/incubator/
> >> airflow/1.10.0-incubating/
> >>>
> >>> I don't have any access to pypi, this means that I'm not able to upload
> >> the
> >>> artifacts over there. Anyone in the position to grand me access or
> upload
> >>> it to pypi?
> >>>
> >>> Thanks! Cheers, Fokko
> >>>
> >>> 2018-08-20 20:06 GMT+02:00 Bolke de Bruin :
> >>>
> >>>> Hi Guys and Gals,
> >>>>
> >>>> The vote has passed! Apache Airflow 1.10.0 is official.
> >>>>
> >>>> As I am AFK for a while can one of the committers please rename
> >> according
> >>>> to the release docs and push it to the relevant locations (pypi and
> >>> Apache
> >>>> dist)?
> >>>>
> >>>> Oh and maybe start a quick 1.10.1?
> >>>>
> >>>> Cheers
> >>>> Bolke
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>> Begin forwarded message:
> >>>>
> >>>>> From: Bolke de Bruin 
> >>>>> Date: 20 August 2018 at 20:00:28 CEST
> >>>>> To: gene...@incubator.apache.org, dev@airflow.incubator.apache.org
> >>>>> Subject: [RESULT][VOTE] Release Airflow 1.10.0
> >>>>>
> >>>>> The vote to release Airflow 1.10.0-incubating, having been open for 8
> >>>>> days is now closed.
> >>>>>
> >>>>> There were three binding +1s and no -1 votes.
> >>>>>
> >>>>> +1 (binding):
> >>>>> Justin Mclean
> >>>>> Jakob Homan
> >>>>> Hitesh Shah
> >>>>>
> >>>>> The release is approved.
> >>>>>
> >>>>> Thanks to all those who voted.
> >>>>>
> >>>>> Bolke
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>> Begin forwarded message:
> >>>>>
> >>>>>> From: Bolke de Bruin 
> >>>>>> Date: 20 August 2018 at 19:56:23 CEST
> >>>>>> To: gene...@incubator.apache.org
> >>>>>> Subject: Re: [VOTE] Release Airflow 1.10.0 (new vote based on rc4)
> >>>>>>
> >>>>>> Appreciated Hitesh. Do you know how to add headers to .MD files?
> >> There
> >>>> seems to be no technical standard way[1]. Is there a way to solve this
> >>>> elegantly?
> >>>>>>
> >>>>>> Cheers
> >>>>>> Bolke
> >>>>>>
> >>>>>> [1] https://alvinalexander.com/technology/markdown-comments-
> >>>> syntax-not-in-generated-output
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>>> On 20 Aug 2018, at 19:48, Hitesh Shah  wrote:
> >>>>>>>
> >>>>>>> +1 (binding)
> >>>>>>>
> >>>>>>> Ran through the basic checks.
> >>>>>>>
> >>>>>>> Minor nit which can be fixed in the next release: there are a bunch
> >>> of
> >>>>>>> documentation files which could have a license header added (e.g.
> >>> .md,
> >>>>>>> .rst, )
> >>>>>>>
> >>>>>>> thanks
> >>>>>>> Hitesh
> >>>>>>>
> >>>>>>>> On Mon, Aug 20, 2018 at 4:08 AM Bolke de Bruin  >

Re: PR Review Dashboard?

2018-08-22 Thread Driesprong, Fokko
Hi Holden,

Just curious if you got a hold of someone at the coffee machine :-)

Cheers, Fokko

Op di 7 aug. 2018 om 09:17 schreef Holden Karau :

> The JIRA/Github integration tooling I’m a little more fuzzy on but I’m
> doing coffee with some of the folks who probably know the details this week
> and I’ll report back.
>
> On Tue, Aug 7, 2018 at 12:15 AM Driesprong, Fokko 
> wrote:
>
> > Hi Holden,
> >
> > Thanks for reaching out. Recently we've moved to Apache Gitbox (
> > https://gitbox.apache.org/), so we use the Github UI directly instead of
> > having to merge using a CLI (
> > https://github.com/apache/incubator-airflow/blob/master/dev/airflow-pr).
> >
> > Not sure if we're already up to the game of the dashboard, which looks
> > awesome btw. But as you also mentioned in your live Airflow PR, we're
> > missing some automation of communication between Jira and Github. For
> > example, as you mentioned, when a PR is opened, the status automagically
> > changes to In Progress. Do you have any pointer of how this is set up at,
> > for example the Spark or Beam project? So we can replicate this in
> Airflow.
> >
> > Cheers, Fokko
> >
> > 2018-08-07 7:28 GMT+02:00 Holden Karau :
> >
> > > Hi Y'all,
> > >
> > > One of the comments from my livestream was asking if the code for the
> > Spark
> > > PR review dashboard <http://spark-prs.appspot.com/> is OSS (it is
> > > <https://github.com/databricks/spark-pr-dashboard>), and I have a fork
> > up
> > > for Beam, and I was wondering if folks in Airflow would find something
> > like
> > > this useful? If so I'd be happy to set that up (if not no stress).
> > >
> > > Cheers,
> > >
> > > Holden :)
> > >
> > > --
> > > Cell : 425-233-8271
> > >
> >
> --
> Twitter: https://twitter.com/holdenkarau
>


Re: Jira cleanup and triage

2018-08-22 Thread Driesprong, Fokko
Hi Gerardo,

Thanks for bringing this up. This is actually a good point.

Recently we've moved the Apache Airflow repo to Gitbox (
https://gitbox.apache.org/). Before, the GitHub repo was just a mirror of the
Apache one; now we do everything on GitHub itself. We still need to set up the
hooks that automatically close a Jira issue when the corresponding PR is
closed. This would work very well combined with the stale bot.

For the Jira tickets themselves, sometimes one of the committers goes through
the list of open tickets; I did that about two months ago. To be honest, we
don't have an explicit process in place.

In terms of contributing, checking stale PRs would of course be awesome.

Cheers, Fokko

PS: If you're looking for a new ticket, the CI still needs to cache the
Docker images instead of pulling them every time ;-)





2018-08-22 10:42 GMT+02:00 Gerardo Curiel :

> Hi folks,
>
> Is there a recommended way for contributors to help close/triage Jira
> issues?
>
> I've been looking at issues to work on next, and I've found a few
> categories of issues:
>
> - Issues in need of triage: these might need to be checked against the
> latest version and then closed if they can't be reproduced
> - Duplicated issues
> - Issues that are still open issues with merged PRs (one example:
> https://issues.apache.org/jira/browse/AIRFLOW-2856)
>
> How can we help to point out these out to committers? Cleaning up Jira
> should help newcomers to easily visualise the work being done and pick what
> to work on.
>
> Also, has something like https://github.com/probot/stale (or whatever the
> equivalent in Jira is) being considered for closing issues and PRs
> automatically?
>
> Cheers,
>
> --
> Gerardo Curiel // https://gerar.do
>


Re: [RESULT][VOTE] Release Airflow 1.10.0

2018-08-22 Thread Driesprong, Fokko
Thanks Max,

My PyPI ID is Fokko

Cheers, Fokko

2018-08-21 22:49 GMT+02:00 Maxime Beauchemin :

> I can, what's your PyPI ID?
>
> Max
>
> On Mon, Aug 20, 2018 at 2:11 PM Driesprong, Fokko 
> wrote:
>
> > Thanks Bolke!
> >
> > I've just pushed the artifacts to Apache Dist:
> >
> > https://dist.apache.org/repos/dist/release/incubator/
> airflow/1.10.0-incubating/
> >
> > I don't have any access to pypi, this means that I'm not able to upload
> the
> > artifacts over there. Anyone in the position to grand me access or upload
> > it to pypi?
> >
> > Thanks! Cheers, Fokko
> >
> > 2018-08-20 20:06 GMT+02:00 Bolke de Bruin :
> >
> > > Hi Guys and Gals,
> > >
> > > The vote has passed! Apache Airflow 1.10.0 is official.
> > >
> > > As I am AFK for a while can one of the committers please rename
> according
> > > to the release docs and push it to the relevant locations (pypi and
> > Apache
> > > dist)?
> > >
> > > Oh and maybe start a quick 1.10.1?
> > >
> > > Cheers
> > > Bolke
> > >
> > > Sent from my iPhone
> > >
> > > Begin forwarded message:
> > >
> > > > From: Bolke de Bruin 
> > > > Date: 20 August 2018 at 20:00:28 CEST
> > > > To: gene...@incubator.apache.org, dev@airflow.incubator.apache.org
> > > > Subject: [RESULT][VOTE] Release Airflow 1.10.0
> > > >
> > > > The vote to release Airflow 1.10.0-incubating, having been open for 8
> > > > days is now closed.
> > > >
> > > > There were three binding +1s and no -1 votes.
> > > >
> > > > +1 (binding):
> > > > Justin Mclean
> > > > Jakob Homan
> > > > Hitesh Shah
> > > >
> > > > The release is approved.
> > > >
> > > > Thanks to all those who voted.
> > > >
> > > > Bolke
> > > >
> > > > Sent from my iPhone
> > > >
> > > > Begin forwarded message:
> > > >
> > > >> From: Bolke de Bruin 
> > > >> Date: 20 August 2018 at 19:56:23 CEST
> > > >> To: gene...@incubator.apache.org
> > > >> Subject: Re: [VOTE] Release Airflow 1.10.0 (new vote based on rc4)
> > > >>
> > > >> Appreciated Hitesh. Do you know how to add headers to .MD files?
> There
> > > seems to be no technical standard way[1]. Is there a way to solve this
> > > elegantly?
> > > >>
> > > >> Cheers
> > > >> Bolke
> > > >>
> > > >> [1] https://alvinalexander.com/technology/markdown-comments-
> > > syntax-not-in-generated-output
> > > >>
> > > >>
> > > >>
> > > >> Sent from my iPhone
> > > >>
> > > >>> On 20 Aug 2018, at 19:48, Hitesh Shah  wrote:
> > > >>>
> > > >>> +1 (binding)
> > > >>>
> > > >>> Ran through the basic checks.
> > > >>>
> > > >>> Minor nit which can be fixed in the next release: there are a bunch
> > of
> > > >>> documentation files which could have a license header added (e.g.
> > .md,
> > > >>> .rst, )
> > > >>>
> > > >>> thanks
> > > >>> Hitesh
> > > >>>
> > > >>>> On Mon, Aug 20, 2018 at 4:08 AM Bolke de Bruin  >
> > > wrote:
> > > >>>>
> > > >>>> Sorry Willem that should be of course. Apologies.
> > > >>>>
> > > >>>> Sent from my iPhone
> > > >>>>
> > > >>>>> On 20 Aug 2018, at 13:07, Bolke de Bruin 
> > wrote:
> > > >>>>>
> > > >>>>> Hi William
> > > >>>>>
> > > >>>>> You seem to be missing a "4" at the end of the URL? Ah it seems
> > that
> > > my
> > > >>>> original email had a quirk. Would you mind using the below?
> > > >>>>>
> > > >>>>> https://github.com/apache/incubator-airflow/releases/
> tag/1.10.0rc4
> > > >>>>>
> > > >>>>> Thanks!
> > > >>>>> Bolke
> > > >>>>>
> > > >>>>> Sent from my iPhone
> > > >>>>>

Re: Regarding airflow 1.10

2018-08-21 Thread Driesprong, Fokko
Hi Hemanth,

Thanks for the question. The KubernetesExecutor launches a new pod with an
Airflow container for every task on the Kubernetes cluster. That container
executes the single task, and the executor then checks the exit code of the
container.

There is also the KubernetesPodOperator, which allows you to kick off a new
container on the Kubernetes cluster. Hope this helps.
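
For reference, a minimal sketch of how it can be used (the image, namespace
and the dag object are placeholders):

from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

say_hello = KubernetesPodOperator(
    task_id="say_hello",
    name="say-hello",
    namespace="default",
    image="python:3.6",
    cmds=["python", "-c", "print('hello from a pod')"],
    dag=dag,  # assumes a DAG object defined elsewhere in the file
)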

Cheers, Fokko

Op di 21 aug. 2018 om 15:54 schreef Musunuru, Hemanth <
hemanth.musun...@nike.com>

> Hi Team,
>
> Does Kubernetes executor in 1.10 create new  pod for every task ? (Which
> means won’t there be any active workers running except for new Tasks)
>
> We are trying to containerize our existing to airflow environment, your
> reply Will help us to understand it more.
>
> Thanks in Advance .
>
> Thanks
> Hemanth


Re: updating dags best practice

2018-08-21 Thread Driesprong, Fokko
Hi Aleksander,

What kind of executor are you using? This is really important when choosing
a strategy. Most of the Airflow deployments that I've worked with simply use
git to deploy the master branch; this is with a LocalExecutor.

Cheers, Fokko

Op di 21 aug. 2018 om 15:25 schreef Dev Aleksander 

> Hi all,
> in the place I'm currently at we're building and redeploying a new set of
> containers with the latest code every time we want to update a DAG. That
> doesn't feel like the fastest way.
>
> Anyone can share their approach?
>
> Thanks,
> Aleksander
>


Re: [RESULT][VOTE] Release Airflow 1.10.0

2018-08-20 Thread Driesprong, Fokko
Thanks Bolke!

I've just pushed the artifacts to Apache Dist:
https://dist.apache.org/repos/dist/release/incubator/airflow/1.10.0-incubating/

I don't have any access to PyPI, which means that I'm not able to upload the
artifacts there. Is anyone in the position to grant me access or upload
them to PyPI?

Thanks! Cheers, Fokko

2018-08-20 20:06 GMT+02:00 Bolke de Bruin :

> Hi Guys and Gals,
>
> The vote has passed! Apache Airflow 1.10.0 is official.
>
> As I am AFK for a while can one of the committers please rename according
> to the release docs and push it to the relevant locations (pypi and Apache
> dist)?
>
> Oh and maybe start a quick 1.10.1?
>
> Cheers
> Bolke
>
> Sent from my iPhone
>
> Begin forwarded message:
>
> > From: Bolke de Bruin 
> > Date: 20 August 2018 at 20:00:28 CEST
> > To: gene...@incubator.apache.org, dev@airflow.incubator.apache.org
> > Subject: [RESULT][VOTE] Release Airflow 1.10.0
> >
> > The vote to release Airflow 1.10.0-incubating, having been open for 8
> > days is now closed.
> >
> > There were three binding +1s and no -1 votes.
> >
> > +1 (binding):
> > Justin Mclean
> > Jakob Homan
> > Hitesh Shah
> >
> > The release is approved.
> >
> > Thanks to all those who voted.
> >
> > Bolke
> >
> > Sent from my iPhone
> >
> > Begin forwarded message:
> >
> >> From: Bolke de Bruin 
> >> Date: 20 August 2018 at 19:56:23 CEST
> >> To: gene...@incubator.apache.org
> >> Subject: Re: [VOTE] Release Airflow 1.10.0 (new vote based on rc4)
> >>
> >> Appreciated Hitesh. Do you know how to add headers to .MD files? There
> seems to be no technical standard way[1]. Is there a way to solve this
> elegantly?
> >>
> >> Cheers
> >> Bolke
> >>
> >> [1] https://alvinalexander.com/technology/markdown-comments-
> syntax-not-in-generated-output
> >>
> >>
> >>
> >> Sent from my iPhone
> >>
> >>> On 20 Aug 2018, at 19:48, Hitesh Shah  wrote:
> >>>
> >>> +1 (binding)
> >>>
> >>> Ran through the basic checks.
> >>>
> >>> Minor nit which can be fixed in the next release: there are a bunch of
> >>> documentation files which could have a license header added (e.g. .md,
> >>> .rst, )
> >>>
> >>> thanks
> >>> Hitesh
> >>>
>  On Mon, Aug 20, 2018 at 4:08 AM Bolke de Bruin 
> wrote:
> 
>  Sorry Willem that should be of course. Apologies.
> 
>  Sent from my iPhone
> 
> > On 20 Aug 2018, at 13:07, Bolke de Bruin  wrote:
> >
> > Hi William
> >
> > You seem to be missing a "4" at the end of the URL? Ah it seems that
> my
>  original email had a quirk. Would you mind using the below?
> >
> > https://github.com/apache/incubator-airflow/releases/tag/1.10.0rc4
> >
> > Thanks!
> > Bolke
> >
> > Sent from my iPhone
> >
> >> On 20 Aug 2018, at 13:03, Willem Jiang 
> wrote:
> >>
> >> Hi,
> >>
> >> The Git tag cannot be accessed.  I can only get the 404  error
> there.
> >>
> >> https://github.com/apache/incubator-airflow/releases/tag/1.10.0rc
> >>
> >>
> >> Willem Jiang
> >>
> >> Twitter: willemjiang
> >> Weibo: 姜宁willem
> >>
> >>> On Sun, Aug 12, 2018 at 8:25 PM, Bolke de Bruin  >
>  wrote:
> >>>
> >>> Hello Incubator PMC’ers,
> >>>
> >>> The Apache Airflow community has voted and approved the proposal to
>  release
> >>> Apache Airflow 1.10.0 (incubating) based on 1.10.0 Release
> Candidate
>  4. We
> >>> now kindly request the Incubator PMC members to review and vote on
> this
> >>> incubator release.
> >>>
> >>> Airflow is a platform to programmatically author, schedule, and
> monitor
> >>> workflows. Use Airflow to author workflows as directed acyclic
> graphs
> >>> (DAGs) of tasks. The airflow scheduler executes your tasks on an
> array
>  of
> >>> workers while following the specified dependencies. Rich command
> line
> >>> utilities make performing complex surgeries on DAGs a snap. The
> rich
>  user
> >>> interface makes it easy to visualize pipelines running in
> production,
> >>> monitor progress, and troubleshoot issues when needed. When
> workflows
>  are
> >>> defined as code, they become more maintainable, versionable,
> testable,
>  and
> >>> collaborative.
> >>>
> >>> After a successful IPMC vote Artifacts will be available at:
> >>>
> >>> https://www.apache.org/dyn/closer.cgi/incubator/airflow <
> >>> https://www.apache.org/dyn/closer.cgi/incubator/airflow>
> >>>
> >>> Public keys are available at:
> >>>
> >>> https://www.apache.org/dist/incubator/airflow/ <
> >>> https://www.apache.org/dist/incubator/airflow/>
> >>>
> >>> apache-airflow-1.10.0rc4+incubating-source.tar.gz
> >>>
> >>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.
> >>> 10.0rc4/apache-airflow-1.10.0rc4+incubating-source.tar.gz <
>  https://dist.apache.org/repos/dist/dev/incubator/airflow/1.
> 10.0rc4/apache-
> >>> 

Use Docker for running Airflow tests

2018-08-19 Thread Driesprong, Fokko
Hi all,

Gerardo is doing some awesome work on Dockerizing the CI pipeline. The PR
is still open here: https://github.com/apache/incubator-airflow/pull/3393

To make this work without storing any Docker images in private repositories,
I've created a new repository that holds the Docker image containing many of
the dependencies we test against. The repository is here:
https://github.com/apache/incubator-airflow-ci
For most of the tests we use mocks, but sometimes we also use the actual
service, for example for some Hadoop, Kerberos and Hive functionality. The
images are built in the Docker Hub service:
https://hub.docker.com/r/airflowci/incubator-airflow-ci/builds/
I've created this name for now; I'm open to another one, but 'airflow' was
already taken and hyphens are not allowed.

This change will greatly reduce our dependency on Travis and will make it
much easier to test the code locally: instead of setting up tox environments,
we can just use Docker. I'm really enthusiastic about the change, and really
thankful to Gerardo for putting all this effort into it.

Recently Holden Karau did a livestream and opened her first PR. I think the
most important learning from it was that testing is not as easy as it should
be. Therefore I think we should see if we can merge Gerardo's PR soon.

Cheers, Fokko


Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-12 Thread Driesprong, Fokko
Welcome Tao!

2018-08-11 1:28 GMT+02:00 Alagappan AL S :

> Congratulations Tao!
>
> On Fri, Aug 3, 2018 at 3:02 PM, Grant Nicholas <
> grantnicholas2...@u.northwestern.edu> wrote:
>
> > Congrats Feng!
> >
> > On Fri, Aug 3, 2018 at 12:35 PM Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> > > Well deserved, welcome aboard!
> > >
> > > On Fri, Aug 3, 2018 at 9:07 AM Mark Grover <
> grover.markgro...@gmail.com>
> > > wrote:
> > >
> > > > Congrats Tao!
> > > >
> > > > On Fri, Aug 3, 2018, 08:52 Jin Chang  wrote:
> > > >
> > > > > Congrats, Tao!!
> > > > >
> > > > > On Fri, Aug 3, 2018 at 8:20 AM Taylor Edmiston <
> tedmis...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Congratulations, Feng!
> > > > > >
> > > > > > *Taylor Edmiston*
> > > > > > Blog <https://blog.tedmiston.com/> | CV
> > > > > > <https://stackoverflow.com/cv/taylor> | LinkedIn
> > > > > > <https://www.linkedin.com/in/tedmiston/> | AngelList
> > > > > > <https://angel.co/taylor> | Stack Overflow
> > > > > > <https://stackoverflow.com/users/149428/taylor-edmiston>
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 3, 2018 at 7:31 AM, Driesprong, Fokko
> > >  > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Welcome Feng! Awesome to have you on board!
> > > > > > >
> > > > > > > 2018-08-03 10:41 GMT+02:00 Naik Kaxil :
> > > > > > >
> > > > > > > > Hi Airflow'ers,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Please join the Apache Airflow PMC in welcoming its newest
> > member
> > > > and
> > > > > > > >
> > > > > > > > co-committer, Feng Tao (a.k.a. feng-tao<
> > > > https://github.com/feng-tao
> > > > > >).
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Welcome Feng, great to have you on board!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Kaxil
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Kaxil Naik
> > > > > > > >
> > > > > > > > Data Reply
> > > > > > > > 2nd Floor, Nova South
> > > > > > > > 160 Victoria Street, Westminster
> > > > > > > > <https://maps.google.com/?q=160+Victoria+Street,+
> > > > > > > Westminster+%0D%0ALondon+SW1E+5LB+-+UK=gmail=g>
> > > > > > > > London SW1E 5LB - UK
> > > > > > > > phone: +44 (0)20 7730 6000
> > > > > > > > k.n...@reply.com
> > > > > > > > www.reply.com
> > > > > > > >
> > > > > > > > [image: Data Reply]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
>


Re: [VOTE] Airflow 1.10.0rc4

2018-08-09 Thread Driesprong, Fokko
Good point Bolke and Sid, it seems that there are still a few issues with
Tenacity as well, therefore I would like to change my vote:

+1 (binding)

Cheers, Fokko

2018-08-09 14:08 GMT+02:00 Ash Berlin-Taylor :

> +0.5 (binding) from me.
>
> Tested upgrading form 1.9.0 metadb on Py3.5. Timezones behaving themselves
> on Postgres. Have not tested the Rbac-based UI.
>
> https://github.com/apache/incubator-airflow/commit/
> d9fecba14c5eb56990508573a91b13ab27ca5153  incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153>
> (expanding on UPDATING.md for Logging changes) isn't in the release, but
> would only affect people who look at the UPDATING.md in the source tarball,
> which isn't going to be very many - most people will check in the repo and
> just install via PyPi I'd guess?
>
> -ash
>
> > On 8 Aug 2018, at 19:21, Bolke de Bruin  wrote:
> >
> > Hey all,
> >
> > I have cut Airflow 1.10.0 RC4. This email is calling a vote on the
> release,
> > which will last for 72 hours. Consider this my (binding) +1.
> >
> > Airflow 1.10.0 RC 4 is available at:
> >
> > https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/ <
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/>
> >
> > apache-airflow-1.10.0rc4+incubating-source.tar.gz is a source release
> that
> > comes with INSTALL instructions.
> > apache-airflow-1.10.0rc4+incubating-bin.tar.gz is the binary Python
> "sdist"
> > release.
> >
> > Public keys are available at:
> >
> > https://dist.apache.org/repos/dist/release/incubator/airflow/ <
> https://dist.apache.org/repos/dist/release/incubator/airflow/>
> >
> > The amount of JIRAs fixed is over 700. Please have a look at the
> changelog.
> > Since RC3 the following has been fixed:
> >
> > [AIRFLOW-2870] Use abstract TaskInstance for migration
> > [AIRFLOW-2859] Implement own UtcDateTime
> > [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook
> > [AIRFLOW-2869] Remove smart quote from default config
> > [AIRFLOW-2857] Fix Read the Docs env
> >
> > Please note that the version number excludes the `rcX` string as well
> > as the "+incubating" string, so it's now simply 1.10.0. This will allow
> us
> > to rename the artifact without modifying the artifact checksums when we
> > actually release.
> >
> > WARNING: Due to licensing requirements you will need to set
> > SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
> > installing or upgrading. We will try to remove this requirement for the
> > next release.
> >
> > Cheers,
> > Bolke
>
>


Re: [VOTE] Airflow 1.10.0rc4

2018-08-08 Thread Driesprong, Fokko
-1 (binding)

Sorry Bolke for not checking this earlier. In rc3 we replaced some of
the reserved keywords, but I'm still unable to run a simple DAG in the 1.10
rc4 release under Python 3.7:

MacBook-Pro-van-Fokko:sdh-api-pobt fokkodriesprong$ docker run -e
SLUGIFY_USES_TEXT_UNIDECODE=yes -t -i python:3.7 /bin/bash -c "pip install
https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/apache-airflow-1.10.0rc4+incubating-bin.tar.gz
&& airflow initdb && airflow run example_bash_operator runme_0 2017-07-01"
Collecting
https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/apache-airflow-1.10.0rc4+incubating-bin.tar.gz
  Downloading
https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/apache-airflow-1.10.0rc4+incubating-bin.tar.gz
(4.4MB)
100% || 4.4MB 2.5MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "", line 1, in 
  File "/tmp/pip-req-build-91ci7xlu/setup.py", line 124
async = [
  ^
SyntaxError: invalid syntax


Command "python setup.py egg_info" failed with error code 1 in
/tmp/pip-req-build-91ci7xlu/
You are using pip version 10.0.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

I think we should cherry-pick these three commits onto the 1.10 branch:

*- Remove the async from setup.py*
https://github.com/apache/incubator-airflow/commit/e38a4e5d3064980abd10b8afa6918ab9f10dd8a2

*- Upgrade lxml to >4.0 to let it compile with Python 3.7*
https://github.com/apache/incubator-airflow/commit/5290688ee0576ad167d9622c96cdeb08e9965a20
lxml is needed for Python 3.7:
https://github.com/apache/incubator-airflow/pull/3583

*- Bump tenacity from 4.8.0 to 4.12.0*
https://github.com/apache/incubator-airflow/pull/3723/commits/271ea663df72c16aa105017ed5cc87a639846777
The 4.8 version of Tenacity contains reserved keywords (see the illustration
below):
https://github.com/apache/incubator-airflow/pull/3723
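
To illustrate why this matters (purely illustrative; the variable name and
package list below are made up): a module-level name that collides with a new
Python 3.7 keyword no longer parses, and renaming it restores compatibility.

# async = ['eventlet>=0.9.7']          # SyntaxError under Python 3.7
async_packages = ['eventlet>=0.9.7']   # parses fine on both 2.7 and 3.7
print(async_packages)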

After this I'm able to run an example dag using Python3.7: docker run -e
SLUGIFY_USES_TEXT_UNIDECODE=yes -t -i python:3.7 /bin/bash -c "pip install
git+https://github.com/Fokko/incubator-airflow.git@v1-10-stable && airflow
initdb && airflow run example_bash_operator runme_0 2017-07-01"

[2018-08-08 21:25:57,944] {__init__.py:51} INFO - Using executor
SequentialExecutor
[2018-08-08 21:25:58,069] {models.py:258} INFO - Filling up the DagBag from
/root/airflow/dags
[2018-08-08 21:25:58,112] {example_kubernetes_operator.py:54} WARNING -
Could not import KubernetesPodOperator: No module named 'kubernetes'
[2018-08-08 21:25:58,112] {example_kubernetes_operator.py:55} WARNING -
Install kubernetes dependencies with: pip install airflow['kubernetes']
[2018-08-08 21:25:58,155] {cli.py:492} INFO - Running  on host
31ec1d1554b7
[2018-08-08 21:25:58,739] {__init__.py:51} INFO - Using executor
SequentialExecutor
[2018-08-08 21:25:58,915] {models.py:258} INFO - Filling up the DagBag from
/root/airflow/dags/example_dags/example_bash_operator.py
[2018-08-08 21:25:58,987] {example_kubernetes_operator.py:54} WARNING -
Could not import KubernetesPodOperator: No module named 'kubernetes'
[2018-08-08 21:25:58,987] {example_kubernetes_operator.py:55} WARNING -
Install kubernetes dependencies with: pip install airflow['kubernetes']
[2018-08-08 21:25:59,060] {cli.py:492} INFO - Running  on host
31ec1d1554b7

https://github.com/Fokko/incubator-airflow/commits/v1-10-stable

Still no hard guarantees that 3.7 will be fully supported, but at least it
runs :-)

Cheers, Fokko

2018-08-08 20:21 GMT+02:00 Bolke de Bruin :

> Hey all,
>
> I have cut Airflow 1.10.0 RC4. This email is calling a vote on the release,
> which will last for 72 hours. Consider this my (binding) +1.
>
> Airflow 1.10.0 RC 4 is available at:
>
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/ <
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc4/>
>
> apache-airflow-1.10.0rc4+incubating-source.tar.gz is a source release that
> comes with INSTALL instructions.
> apache-airflow-1.10.0rc4+incubating-bin.tar.gz is the binary Python
> "sdist"
> release.
>
> Public keys are available at:
>
> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
> https://dist.apache.org/repos/dist/release/incubator/airflow/>
>
> The amount of JIRAs fixed is over 700. Please have a look at the
> changelog.
> Since RC3 the following has been fixed:
>
> [AIRFLOW-2870] Use abstract TaskInstance for migration
> [AIRFLOW-2859] Implement own UtcDateTime
> [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook
> [AIRFLOW-2869] Remove smart quote from default config
> [AIRFLOW-2857] Fix Read the Docs env
>
> Please note that the version number excludes the `rcX` string as well
> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
> to rename the artifact without modifying the artifact 

Re: PR Review Dashboard?

2018-08-07 Thread Driesprong, Fokko
Hi Holden,

Thanks for reaching out. Recently we've moved to Apache Gitbox (
https://gitbox.apache.org/), so we use the Github UI directly instead of
having to merge using a CLI (
https://github.com/apache/incubator-airflow/blob/master/dev/airflow-pr).

I'm not sure we're ready for the dashboard yet, although it looks awesome by
the way. But as you also mentioned in your live Airflow PR, we're missing some
automation of the communication between Jira and GitHub: for example, when a
PR is opened, the Jira status should automagically change to In Progress. Do
you have any pointers on how this is set up at, for example, the Spark or Beam
projects, so we can replicate it in Airflow?

Cheers, Fokko

2018-08-07 7:28 GMT+02:00 Holden Karau :

> Hi Y'all,
>
> One of the comments from my livestream was asking if the code for the Spark
> PR review dashboard  is OSS (it is
> ), and I have a fork up
> for Beam, and I was wondering if folks in Airflow would find something like
> this useful? If so I'd be happy to set that up (if not no stress).
>
> Cheers,
>
> Holden :)
>
> --
> Cell : 425-233-8271
>


Re: Apache Airflow welcome new committer/PMC member : Feng Tao (a.k.a. feng-tao)

2018-08-03 Thread Driesprong, Fokko
Welcome Feng! Awesome to have you on board!

2018-08-03 10:41 GMT+02:00 Naik Kaxil :

> Hi Airflow'ers,
>
>
>
> Please join the Apache Airflow PMC in welcoming its newest member and
>
> co-committer, Feng Tao (a.k.a. feng-tao).
>
>
>
> Welcome Feng, great to have you on board!
>
>
>
> Cheers,
>
> Kaxil
>
>
>
>
>
>
> Kaxil Naik
>
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> 
> London SW1E 5LB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
>
> [image: Data Reply]
>


Re: [VOTE] Airflow 1.10.0rc3

2018-08-03 Thread Driesprong, Fokko
+1 Binding

Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz

Cheers, Fokko

2018-08-03 9:47 GMT+02:00 Bolke de Bruin :

> Hey all,
>
> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release,
> which will last for 72 hours. Consider this my (binding) +1.
>
> Airflow 1.10.0 RC 3 is available at:
>
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/ <
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/>
>
> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that
> comes with INSTALL instructions.
> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
> "sdist"
> release.
>
> Public keys are available at:
>
> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
> https://dist.apache.org/repos/dist/release/incubator/airflow/>
>
> The amount of JIRAs fixed is over 700. Please have a look at the
> changelog.
> Since RC2 the following has been fixed:
>
> * [AIRFLOW-2817] Force explicit choice on GPL dependency
> * [AIRFLOW-2716] Replace async and await py3.7 keywords
> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>
> Please note that the version number excludes the `rcX` string as well
> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
> to rename the artifact without modifying the artifact checksums when we
> actually release.
>
> WARNING: Due to licensing requirements you will need to set
>  SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
> installing or upgrading. We will try to remove this requirement for the
> next release.
>
> Cheers,
> Bolke


Re: We've migrated to Github to repo!

2018-08-01 Thread Driesprong, Fokko
Hi Max,

We're totally on the same page; I think I've phrased it a bit clumsily.

Two things that I've noticed:

1. Apache is not being mirrored, is this expected behaviour?

MacBook-Pro-van-Fokko:incubator-airflow fokkodriesprong$ git reset --hard
apache/master

HEAD is now at dfa7b26d [AIRFLOW-2809] Fix security issue regarding Flask
SECRET_KEY

MacBook-Pro-van-Fokko:incubator-airflow fokkodriesprong$ git reset --hard
github/master

HEAD is now at ed972042 [AIRFLOW-1104] Update jobs.py so Airflow does not
over schedule tasks (#3568)

2. We need to make sure that we close the Jira ourselves.

Cheers, Fokko




2018-07-31 21:50 GMT+02:00 Maxime Beauchemin :

> What I meant by changing history is mutating one or many SHAs in the
> branch, an operation that would require force-pushing, which merging
> doesn't do. Personally I prefer "Squash & Merge" as it makes for a
> merge-commit free `git log` and having a linear branch history in master
> that aligns with when things were introduced to the branch.
>
> It's possible to disable some of these options from the repo (only if
> you're an Admin, meaning we'd have to involve INFRA to change that). But
> it's good to have options for the cases I mentioned above.
>
> So committers, use "Squash and Merge"! It matches our previous process when
> using the defaults in the now defunct `scripts/airflow-pr`
>
> [I'm really hoping I'm not starting a merge vs rebase workflow debate
> here...]
>
> Max
>
> On Tue, Jul 31, 2018 at 12:37 PM Driesprong, Fokko 
> wrote:
>
> > Hi Max,
> >
> > You're right. I just started plowing though my mailbox and merged a
> commit
> > without squash and merge, but it changes history as you mention.
> > Nice thing of Github is if you change it, it remembers your preference
> > which is Squash and Merge :-)
> >
> > Love the Gitbox so far, great work!
> >
> > Cheers, Fokko
> >
> > 2018-07-31 21:34 GMT+02:00 Maxime Beauchemin  >:
> >
> > > "Squash & Merge" (the default) does the right thing (squashes the
> > multiple
> > > commit and replays the resulting commit on top of master), we should
> use
> > > that most of the times. We'd only want to merge if we wanted to
> preserve
> > > history from within the PR (multiple collaborators or multiple
> important
> > > commits that we want to keep detailed in master for instance).
> > >
> > > I'm not sure how to verify whether the `master` branch is protected on
> > this
> > > setup (without pushing to it as a test, which I'd rather not do). We
> > should
> > > make sure that it is though as changing history on master can cause all
> > > sorts of problems.
> > >
> > > Max
> > >
> > > On Tue, Jul 31, 2018 at 9:21 AM Sid Anand  wrote:
> > >
> > > > The other benefit of using Option 3 over Option 1 is that you
> maintain
> > > the
> > > > history of who committed and who authored in one line in the Git
> log--
> > > i.e.
> > > > "bob33 authored and ashb committed 3 hours ago" instead of just "ashb
> > > > committed" for a merge commit followed by the commit(s) from bob33.
> > > >
> > > > On Tue, Jul 31, 2018 at 9:11 AM Sid Anand  wrote:
> > > >
> > > > > Ash,
> > > > > This is pretty cool. I just merged one PR from GH directly.
> > > > >
> > > > > Interestingly, I still used the `dev/airflow-pr work_local` to test
> > out
> > > > > the PR, but merging directly in the GitHub UI afterwards definitely
> > > > avoided
> > > > > my needing to do another `dev/airflow-pr merge` CLI command.
> > > > >
> > > > > There are 3 options in the UI: The default is "Create a merge
> commit"
> > > > > (Option 1). I think the ones we want is the "Rebase & Merge"
> (Option
> > > 3),
> > > > > which requires that PR submitters squash their commits. Otherwise,
> we
> > > > could
> > > > > use "Squash & Merge" (Option 2), though I am not clear if Squash &
> > > Merge
> > > > is
> > > > > more like option 1 or option 3.
> > > > >
> > > > > -s
> > > > >
> > > > > On Mon, Jul 30, 2018 at 7:19 PM Andrew Phillips <
> > aphill...@qrmedia.com
> > > >
> > > > > wrote:
> > > > >
> > > > >> > We should ask Apache infra to send the GH notifs to another
> > mailing
> > > > >> > list.
> > > > >>
> > > > >> Over at jclouds, we created a "notifications@" list for this
> > purpose
> > > > >> (well, actually we renamed "issues@" to "notifications@"), and
> send
> > > > >> messages there:
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/INFRA-7180
> > > > >> https://mail-archives.apache.org/mod_mbox/jclouds-notifications/
> > > > >>
> > > > >> Regards
> > > > >>
> > > > >> ap
> > > > >>
> > > > >
> > > >
> > >
> >
>


Re: We've migrated to Github to repo!

2018-07-31 Thread Driesprong, Fokko
Hi Max,

You're right. I just started plowing through my mailbox and merged a commit
without squash and merge, but it changes history as you mention.
A nice thing about Github is that if you change it, it remembers your
preference, which is Squash and Merge :-)

Love the Gitbox so far, great work!

Cheers, Fokko

2018-07-31 21:34 GMT+02:00 Maxime Beauchemin :

> "Squash & Merge" (the default) does the right thing (squashes the multiple
> commit and replays the resulting commit on top of master), we should use
> that most of the times. We'd only want to merge if we wanted to preserve
> history from within the PR (multiple collaborators or multiple important
> commits that we want to keep detailed in master for instance).
>
> I'm not sure how to verify whether the `master` branch is protected on this
> setup (without pushing to it as a test, which I'd rather not do). We should
> make sure that it is though as changing history on master can cause all
> sorts of problems.
>
> Max
>
> On Tue, Jul 31, 2018 at 9:21 AM Sid Anand  wrote:
>
> > The other benefit of using Option 3 over Option 1 is that you maintain
> the
> > history of who committed and who authored in one line in the Git log--
> i.e.
> > "bob33 authored and ashb committed 3 hours ago" instead of just "ashb
> > committed" for a merge commit followed by the commit(s) from bob33.
> >
> > On Tue, Jul 31, 2018 at 9:11 AM Sid Anand  wrote:
> >
> > > Ash,
> > > This is pretty cool. I just merged one PR from GH directly.
> > >
> > > Interestingly, I still used the `dev/airflow-pr work_local` to test out
> > > the PR, but merging directly in the GitHub UI afterwards definitely
> > avoided
> > > my needing to do another `dev/airflow-pr merge` CLI command.
> > >
> > > There are 3 options in the UI: The default is "Create a merge commit"
> > > (Option 1). I think the ones we want is the "Rebase & Merge" (Option
> 3),
> > > which requires that PR submitters squash their commits. Otherwise, we
> > could
> > > use "Squash & Merge" (Option 2), though I am not clear if Squash &
> Merge
> > is
> > > more like option 1 or option 3.
> > >
> > > -s
> > >
> > > On Mon, Jul 30, 2018 at 7:19 PM Andrew Phillips  >
> > > wrote:
> > >
> > >> > We should ask Apache infra to send the GH notifs to another mailing
> > >> > list.
> > >>
> > >> Over at jclouds, we created a "notifications@" list for this purpose
> > >> (well, actually we renamed "issues@" to "notifications@"), and send
> > >> messages there:
> > >>
> > >> https://issues.apache.org/jira/browse/INFRA-7180
> > >> https://mail-archives.apache.org/mod_mbox/jclouds-notifications/
> > >>
> > >> Regards
> > >>
> > >> ap
> > >>
> > >
> >
>


Re: Kerberos and Airflow

2018-07-26 Thread Driesprong, Fokko
Hi Ry,

You should ask Bolke de Bruin. He's really experienced with Kerberos and he
also did the Kerberos implementation for Airflow. Besides that, he also
worked on implementing Kerberos in Ambari. Just wanted to let you know.

Cheers, Fokko

Op do 26 jul. 2018 om 23:03 schreef Ry Walker 

> Hi everyone -
>
> We have several bigCo's who are considering using Airflow asking into its
> support for Kerberos.
>
> We're going to work on a proof-of-concept next week, will likely record a
> screencast on it.
>
> For now, we're looking for any anecdotal information from organizations who
> are using Kerberos with Airflow, if anyone would be willing to share their
> experiences here, or reply to me personally, it would be greatly
> appreciated!
>
> -Ry
>
> --
>
> *Ry Walker* | CEO, Astronomer  | 513.417.2163 |
> @rywalker  | LinkedIn
> 
>


Re: Airflow's JS code (and dependencies) manageable via npm and webpack

2018-07-23 Thread Driesprong, Fokko
Nice work Verdan.

The frontend really needed some love, thank you for picking this up. Maybe
we should also think about deprecating the old www. Keeping both UIs
around takes a lot of time. Maybe after the release of 1.10 we can think
about moving to Airflow 2.0 and removing the old UI.


Cheers, Fokko

2018-07-23 10:02 GMT+02:00 Naik Kaxil :

> Awesome. Thanks @Verdan
>
> On 23/07/2018, 07:58, "Verdan Mahmood"  wrote:
>
> Heads-up!! This frontend change has been merged in master branch
> recently.
> This will impact the users working on Airflow RBAC UI only. That means:
>
> *If you are a contributor/developer of Apache Airflow:*
> You'll need to install and build the frontend packages if you want to
> run
> the web UI.
> Please make sure to read the new section, "Setting up the node / npm
> javascript environment"
>  CONTRIBUTING.md#setting-up-the-node--npm-javascript-
> environment-only-for-www_rbac>
>
> in CONTRIBUTING.md
>
> *If you are using Apache Airflow in your production environment:*
> Nothing will impact you, as every new build of Apache Airflow will
> come up
> with pre-built dependencies.
>
> Please let me know if you have any questions. Thank you
>
> Best,
> *Verdan Mahmood*
>
>
> On Sun, Jul 15, 2018 at 6:52 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > Glad to see this is happening!
> >
> > Max
> >
> > On Mon, Jul 9, 2018 at 6:37 AM Ash Berlin-Taylor <
> > ash_airflowl...@firemirror.com> wrote:
> >
> > > Great! Thanks for doing this. I've left some review comments on
> your PR.
> > >
> > > -ash
> > >
> > > > On 9 Jul 2018, at 11:45, Verdan Mahmood <
> verdan.mahm...@gmail.com>
> > > wrote:
> > > >
> > > > ​Hey Guys, ​
> > > >
> > > > In an effort to simplify the JS dependencies of Airflow
> > > > ​​
> > > > ,
> > > > ​I've
> > > > introduce
> > > > ​d​
> > > > npm and webpack for the package management. For now, it only
> implements
> > > > this in the www_rbac version of the web server.
> > > > ​
> > > >
> > > > Pull Request: https://github.com/apache/
> incubator-airflow/pull/3572
> > > >
> > > > The problem with the
> > > > ​existing ​
> > > > frontend (
> > > > ​JS
> > > > ) code of Airflow is that most of the custom JS is written
> > > > ​with​
> > > > in the html files, using the Flask's (Jinja) variables in that
> JS. The
> > > next
> > > > step of this effort would be to extract that custom
> > > > ​JS
> > > > code in separate JS files
> > > > ​,​
> > > > use the dependencies in those files using require or import
> > > > ​ and introduce the JS automated test suite eventually. ​
> > > > (At the moment, I'm simply using the CopyWebPackPlugin to copy
> the
> > > required
> > > > dependencies for use)
> > > > ​.
> > > >
> > > > There are also some dependencies which are directly modified in
> the
> > > codebase
> > > > ​ or are outdated​
> > > > . I couldn't found the
> > > > ​ correct​
> > > > npm versions of those libraries. (dagre-d3.js and
> gantt-chart-d3v2.js).
> > > > Apparently dagre-d3.js that we are using is one of the gist or
> is very
> > > old
> > > > version
> > > > ​ not supported with webpack 4​
> > > > , while the gantt-chart-d3v2 has been modified according to
> Airflow's
> > > > requirements
> > > > ​ I believe​
> > > > .
> > > > ​ Used the existing libraries for now. ​
> > > >
> > > > ​I am currently working in a separate branch to upgrade the
> DagreD3
> > > > library, and updating the custom JS related to DagreD3
> accordingly. ​
> > > >
> > > > This PR also introduces the pypi_push.sh
> > > > <
> > >
> > https://github.com/apache/incubator-airflow/pull/3572/files#diff-
> 8fae684cdcc8cc8df2232c8df16f64cb
> > > >
> > > > script that will generate all the JS statics before creating and
> > > uploading
> > > > the package.
> > > > ​
> > > > ​Please let me know if you guys have any questions or
> suggestions and
> > I'd
> > > > be happy to answer that. ​
> > > >
> > > > Best,
> > > > *Verdan Mahmood*
> > > > (+31) 655 576 560
> > >
> > >
> >
>
>
>
>
>
>
> Kaxil Naik
>
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> London SW1E 5LB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
>


Re: [VOTE] Airflow 1.10.0rc2

2018-07-16 Thread Driesprong, Fokko
My vote is +1 (binding)

Cheers, Fokko

2018-07-16 9:52 GMT+02:00 Driesprong, Fokko :

> Awesome Bolke!
>
> I don't have a production Airflow at hand right now, but I've ran some
> simple tests against Python 2.7, 3.5, 3.6 and it all looked fine.
>
> +1 From my side
>
> Cheers, Fokko
>
> 2018-07-15 20:05 GMT+02:00 Bolke de Bruin :
>
>> Hey all,
>>
>> I have cut Airflow 1.10.0 RC2. This email is calling a vote on the
>> release,
>> which will last for 72 hours. Consider this my (binding) +1.
>>
>> Airflow 1.10.0 RC 2 is available at:
>>
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc2/ <
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc2/>
>>
>> apache-airflow-1.10.0rc2+incubating-source.tar.gz is a source release
>> that
>> comes with INSTALL instructions.
>> apache-airflow-1.10.0rc2+incubating-bin.tar.gz is the binary Python
>> "sdist"
>> release.
>>
>> Public keys are available at:
>>
>> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
>> https://dist.apache.org/repos/dist/release/incubator/airflow/>
>>
>> The amount of JIRAs fixed is over 700. Please have a look at the
>> changelog.
>> Since RC2 the following has been fixed:
>>
>> * [AIRFLOW-1729][AIRFLOW-2797][AIRFLOW-2729] Ignore whole directories in
>> .airflowignore
>> * [AIRFLOW-2739] Always read default configuration files as utf-8
>> * [AIRFLOW-2752] Log using logging instead of stdout
>> * [AIRFLOW-1729][AIRFLOW-XXX] Remove extra debug log at info level
>>
>> Please note that the version number excludes the `rcX` string as well
>> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
>> to rename the artifact without modifying the artifact checksums when we
>> actually release.
>>
>>
>> Cheers,
>> Bolke
>
>
>


Re: [VOTE] Airflow 1.10.0rc2

2018-07-16 Thread Driesprong, Fokko
Awesome Bolke!

I don't have a production Airflow at hand right now, but I've run some
simple tests against Python 2.7, 3.5 and 3.6 and it all looked fine.

+1 From my side

Cheers, Fokko

2018-07-15 20:05 GMT+02:00 Bolke de Bruin :

> Hey all,
>
> I have cut Airflow 1.10.0 RC2. This email is calling a vote on the release,
> which will last for 72 hours. Consider this my (binding) +1.
>
> Airflow 1.10.0 RC 2 is available at:
>
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc2/ <
> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc2/>
>
> apache-airflow-1.10.0rc2+incubating-source.tar.gz is a source release that
> comes with INSTALL instructions.
> apache-airflow-1.10.0rc2+incubating-bin.tar.gz is the binary Python
> "sdist"
> release.
>
> Public keys are available at:
>
> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
> https://dist.apache.org/repos/dist/release/incubator/airflow/>
>
> The amount of JIRAs fixed is over 700. Please have a look at the
> changelog.
> Since RC2 the following has been fixed:
>
> * [AIRFLOW-1729][AIRFLOW-2797][AIRFLOW-2729] Ignore whole directories in
> .airflowignore
> * [AIRFLOW-2739] Always read default configuration files as utf-8
> * [AIRFLOW-2752] Log using logging instead of stdout
> * [AIRFLOW-1729][AIRFLOW-XXX] Remove extra debug log at info level
>
> Please note that the version number excludes the `rcX` string as well
> as the "+incubating" string, so it's now simply 1.10.0. This will allow us
> to rename the artifact without modifying the artifact checksums when we
> actually release.
>
>
> Cheers,
> Bolke


Re: [Proposal] Explicit re-schedule of sensors

2018-07-15 Thread Driesprong, Fokko
Thanks Stefan for picking this up. The sensors are in desperate need of a
redesign for the aforementioned reasons. Please note this ticket:
https://issues.apache.org/jira/browse/AIRFLOW-2001 It addresses the same
issue.

Regarding the open question: I would be reluctant to introduce new
states. Adding new states also involves changing/adding logic in the
scheduler, which is already far too complex right now. Maybe we can also
do something with the priority of the sensor, for example lower its
priority once it has been poked and came back with a negative result. In
such a strategy the other tasks would get priority over the sensors.

Cheers, Fokko

2018-07-12 14:53 GMT+02:00 Stefan Seelmann :

> Hi all,
>
> I'd like to discuss a proposal to enable explicit re-scheduling of
> sensors. I think there is demand for such a thing, in the last weeks
> multiple people asked for it or mentioned workarounds.
>
> I created a Jira [1] that describes the proposal and an initial PR [2].
>
> Feedback welcomed :-)
>
> Kind Regards,
> Stefan
>
> [1] https://issues.apache.org/jira/browse/AIRFLOW-2747
> [2] https://github.com/apache/incubator-airflow/pull/3596
>


Re: [VOTE] Airflow 1.10.0rc1

2018-07-15 Thread Driesprong, Fokko
Hi all,

I've done some more tests and it looks good. I was under the assumption that
the sequential executor runs within the webserver, but this was a wrong
assumption on my end. The behaviour is still the same as in 1.9. I've done
some tests on Python 2.7, 3.5 and 3.6 and it looks good. Python 3.7 does
not work yet, but it isn't supported anyway.

I would like to have https://github.com/apache/incubator-airflow/pull/3604 in
RC2. It isn't a critical bug, but it looks messy.

Cheers, Fokko

2018-07-14 0:06 GMT+02:00 Driesprong, Fokko :

> Thanks Bolke for all the effort.
>
> I think I've miscommunicated the issue. It doesn't schedule the runs, and
> when I explicitly kick of a run, it also isn't being picked up.
>
> Currently I'm doing a git bisect to check when this bug was introduced
> happend. To be continued.
>
> Cheers, Fokko
>
> 2018-07-13 23:28 GMT+02:00 Bolke de Bruin :
>
>> Hi Fokko,
>>
>> Please confirm this, because I tried sequential with a clean install. The
>> only thing I found was that example DAGs were not picked up.
>>
>> I’m rolling rc2 anyway though so it would be good to get it fixed.
>>
>> B.
>>
>> Verstuurd vanaf mijn iPad
>>
>> > Op 13 jul. 2018 om 22:23 heeft Driesprong, Fokko 
>> het volgende geschreven:
>> >
>> > Ok, I've did some testing.
>> >
>> > 1.10 works fine with the LocalExecutor. With the SequentialExecutor it
>> does
>> > not pick up any task, even with a different database as sqlite. Found
>> this
>> > one along the way: https://github.com/apache/incu
>> bator-airflow/pull/3604
>> >
>> > There are no recent changes to the SequentialExecutor, so I'm still
>> looking
>> > how this bug found its way into the source. For me this is a -1, right
>> now
>> > it is not possible to just give Airflow a try using a basic setup with a
>> > SequentialExecutor.
>> >
>> > Along the way this also makes me reconsider the tests. Like with the
>> > Kubernetes test we just run a task, and then assert if it ran properly.
>> > This might also be an idea for the sequential executor.
>> >
>> > Cheers, Fokko
>> >
>> > 2018-07-13 20:15 GMT+02:00 Jakob Homan :
>> >
>> >> @Bolke - I didn't raise the concern, so I can't speak to whether or
>> >> not Sebb will be ok with that. He tends to be pretty fastidious on
>> >> this stuff and 'but some other TLP does it' hasn't gone over well
>> >> before (trust me... I've tried).  Totally up to you if you'd rather
>> >> discuss it as part of the IPMC vote or just fix it to avoid
>> >> discussion.
>> >>
>> >> -jakob
>> >>
>> >> On 13 July 2018 at 09:48, Ash Berlin-Taylor
>> >>  wrote:
>> >>> Cloud that be related to my ignorefile change? `airflow list_dags`
>> still
>> >> shows the example dags - the output is the same for that command as on
>> >> v1-9-stable.
>> >>>
>> >>> Though I just noticed I'd left `self.log.info <http://self.log.info/
>> >()`
>> >> in there. That's going to be noisy. https://github.com/apache/
>> >> incubator-airflow/pull/3603 <https://github.com/apache/
>> >> incubator-airflow/pull/3603>
>> >>>
>> >>> -ash
>> >>>
>> >>>> On 13 Jul 2018, at 17:36, Bolke de Bruin  wrote:
>> >>>>
>> >>>> Example dags are not picked up. If you put a dag in the normal dag
>> >> folder it works fine.
>> >>>>
>> >>>> Please create a jira for this @fokko. A pr would be appreciated.
>> >>>>
>> >>>> B.
>> >>>>
>> >>>> Sent from my iPhone
>> >>>>
>> >>>>> On 13 Jul 2018, at 15:46, Driesprong, Fokko 
>> >> wrote:
>> >>>>>
>> >>>>> With the SequentialExecutor the webserver also acts as the scheduler
>> >>>>> (without parallelism)
>> >>>>>
>> >>>>> 2018-07-13 15:43 GMT+02:00 Carl Johan Gustavsson <
>> >> carl.jo...@tictail.com>:
>> >>>>>
>> >>>
>> >>
>>
>
>


Re: [VOTE] Airflow 1.10.0rc1

2018-07-13 Thread Driesprong, Fokko
Thanks Bolke for all the effort.

I think I've miscommunicated the issue. It doesn't schedule the runs, and
when I explicitly kick off a run, it also isn't being picked up.

Currently I'm doing a git bisect to check when this bug was introduced.
To be continued.

Cheers, Fokko

2018-07-13 23:28 GMT+02:00 Bolke de Bruin :

> Hi Fokko,
>
> Please confirm this, because I tried sequential with a clean install. The
> only thing I found was that example DAGs were not picked up.
>
> I’m rolling rc2 anyway though so it would be good to get it fixed.
>
> B.
>
> Verstuurd vanaf mijn iPad
>
> > Op 13 jul. 2018 om 22:23 heeft Driesprong, Fokko 
> het volgende geschreven:
> >
> > Ok, I've did some testing.
> >
> > 1.10 works fine with the LocalExecutor. With the SequentialExecutor it
> does
> > not pick up any task, even with a different database as sqlite. Found
> this
> > one along the way: https://github.com/apache/incubator-airflow/pull/3604
> >
> > There are no recent changes to the SequentialExecutor, so I'm still
> looking
> > how this bug found its way into the source. For me this is a -1, right
> now
> > it is not possible to just give Airflow a try using a basic setup with a
> > SequentialExecutor.
> >
> > Along the way this also makes me reconsider the tests. Like with the
> > Kubernetes test we just run a task, and then assert if it ran properly.
> > This might also be an idea for the sequential executor.
> >
> > Cheers, Fokko
> >
> > 2018-07-13 20:15 GMT+02:00 Jakob Homan :
> >
> >> @Bolke - I didn't raise the concern, so I can't speak to whether or
> >> not Sebb will be ok with that. He tends to be pretty fastidious on
> >> this stuff and 'but some other TLP does it' hasn't gone over well
> >> before (trust me... I've tried).  Totally up to you if you'd rather
> >> discuss it as part of the IPMC vote or just fix it to avoid
> >> discussion.
> >>
> >> -jakob
> >>
> >> On 13 July 2018 at 09:48, Ash Berlin-Taylor
> >>  wrote:
> >>> Cloud that be related to my ignorefile change? `airflow list_dags`
> still
> >> shows the example dags - the output is the same for that command as on
> >> v1-9-stable.
> >>>
> >>> Though I just noticed I'd left `self.log.info <http://self.log.info/
> >()`
> >> in there. That's going to be noisy. https://github.com/apache/
> >> incubator-airflow/pull/3603 <https://github.com/apache/
> >> incubator-airflow/pull/3603>
> >>>
> >>> -ash
> >>>
> >>>> On 13 Jul 2018, at 17:36, Bolke de Bruin  wrote:
> >>>>
> >>>> Example dags are not picked up. If you put a dag in the normal dag
> >> folder it works fine.
> >>>>
> >>>> Please create a jira for this @fokko. A pr would be appreciated.
> >>>>
> >>>> B.
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>> On 13 Jul 2018, at 15:46, Driesprong, Fokko 
> >> wrote:
> >>>>>
> >>>>> With the SequentialExecutor the webserver also acts as the scheduler
> >>>>> (without parallelism)
> >>>>>
> >>>>> 2018-07-13 15:43 GMT+02:00 Carl Johan Gustavsson <
> >> carl.jo...@tictail.com>:
> >>>>>
> >>>
> >>
>


Re: [VOTE] Airflow 1.10.0rc1

2018-07-13 Thread Driesprong, Fokko
Ok, I've done some testing.

1.10 works fine with the LocalExecutor. With the SequentialExecutor it does
not pick up any task, even with a database other than sqlite. Found this
one along the way: https://github.com/apache/incubator-airflow/pull/3604

There are no recent changes to the SequentialExecutor, so I'm still looking
into how this bug found its way into the source. For me this is a -1; right
now it is not possible to just give Airflow a try using a basic setup with a
SequentialExecutor.

Along the way this also makes me reconsider the tests. Like with the
Kubernetes test, we just run a task and then assert whether it ran properly.
This might also be an idea for the sequential executor.
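
Roughly what I have in mind, as a sketch only (the BackfillJob and
TaskInstance calls below are from memory of the current test suite, so
please treat the exact signatures as assumptions):

from datetime import datetime

from airflow import DAG
from airflow.executors.sequential_executor import SequentialExecutor
from airflow.jobs import BackfillJob
from airflow.models import TaskInstance
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.state import State

DEFAULT_DATE = datetime(2018, 7, 1)


def test_sequential_executor_runs_a_task():
    # Build a trivial one-task DAG, backfill a single day with the
    # SequentialExecutor and assert the task instance really ran.
    dag = DAG('sequential_smoke_test',
              start_date=DEFAULT_DATE,
              schedule_interval='@daily')
    task = DummyOperator(task_id='dummy', dag=dag)

    job = BackfillJob(dag=dag,
                      start_date=DEFAULT_DATE,
                      end_date=DEFAULT_DATE,
                      executor=SequentialExecutor())
    job.run()

    ti = TaskInstance(task=task, execution_date=DEFAULT_DATE)
    assert ti.current_state() == State.SUCCESS

Something along those lines might catch this kind of regression before an
RC goes out.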

Cheers, Fokko

2018-07-13 20:15 GMT+02:00 Jakob Homan :

> @Bolke - I didn't raise the concern, so I can't speak to whether or
> not Sebb will be ok with that. He tends to be pretty fastidious on
> this stuff and 'but some other TLP does it' hasn't gone over well
> before (trust me... I've tried).  Totally up to you if you'd rather
> discuss it as part of the IPMC vote or just fix it to avoid
> discussion.
>
> -jakob
>
> On 13 July 2018 at 09:48, Ash Berlin-Taylor
>  wrote:
> > Cloud that be related to my ignorefile change? `airflow list_dags` still
> shows the example dags - the output is the same for that command as on
> v1-9-stable.
> >
> > Though I just noticed I'd left `self.log.info <http://self.log.info/>()`
> in there. That's going to be noisy. https://github.com/apache/
> incubator-airflow/pull/3603 <https://github.com/apache/
> incubator-airflow/pull/3603>
> >
> > -ash
> >
> >> On 13 Jul 2018, at 17:36, Bolke de Bruin  wrote:
> >>
> >> Example dags are not picked up. If you put a dag in the normal dag
> folder it works fine.
> >>
> >> Please create a jira for this @fokko. A pr would be appreciated.
> >>
> >> B.
> >>
> >> Sent from my iPhone
> >>
> >>> On 13 Jul 2018, at 15:46, Driesprong, Fokko 
> wrote:
> >>>
> >>> With the SequentialExecutor the webserver also acts as the scheduler
> >>> (without parallelism)
> >>>
> >>> 2018-07-13 15:43 GMT+02:00 Carl Johan Gustavsson <
> carl.jo...@tictail.com>:
> >>>
> >
>


Re: [VOTE] Airflow 1.10.0rc1

2018-07-13 Thread Driesprong, Fokko
With the SequentialExecutor the webserver also acts as the scheduler
(without parallelism)

2018-07-13 15:43 GMT+02:00 Carl Johan Gustavsson :

> Need to run the scheduler also right?
>
> --
> Carl Johan Gustavsson
>
> On 13 July 2018 at 15:37:22, Driesprong, Fokko (fo...@driesprong.frl)
> wrote:
>
> I've installed Airflow, but using a clean install and the sequential
> executor, the tasks are not being picked up.
>
> This is easy to replicate:
>
> docker run -t -i --rm -p 8080:8080 python:3.5 bash
> pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.
> 10.0rc1/apache-airflow-1.10.0rc1+incubating-bin.tar.gz
> airflow initdb
> airflow webserver
>
> I've enabled the http example dag and kicked of a dag, but it isn't being
> picked up by the SequentialScheduler.
>
> - Fokko
>
>
>
>
> 2018-07-13 11:51 GMT+02:00 Bolke de Bruin :
>
>> Hi Sid,
>>
>> Do you have a JIRA and a PR to address it? I can then consider it for RC2
>>
>> B.
>>
>> > On 12 Jul 2018, at 04:50, Sid Anand  wrote:
>> >
>> > FYI!
>> > I just installed the release candidate. The first thing I noticed is a
>> missing tool tip for the Null State in the Recent Tasks column on the
>> landing page. Since the null globe is new to this UI, users will likely
>> hover over it to inquire what it means... and will be left wanting. Of
>> course, they could click on the globe, which will take them to
>> http://localhost:8080/admin/taskinstance/?flt1_dag_id_equals
>> =example_bash_operator_state_equals=null <
>> http://localhost:8080/admin/taskinstance/?flt1_dag_id_equal
>> s=example_bash_operator_state_equals=null>, which will always show
>> an empty list, leaving them a bit more confused.
>> >
>> > -s
>> >
>> > On Wed, Jul 11, 2018 at 3:13 PM Carl Johan Gustavsson <
>> carl.jo...@tictail.com.invalid> wrote:
>> > Hi Bolke,
>> >
>> > (Switching email to avoid moderation on my emails.)
>> >
>> > The normal Airflow test suite does not fail as it uses a LC_ALL set to
>> > utf-8.
>> >
>> > I think it is a proper test though, it is a minimal reproducible
>> version of
>> > the code that fails. And the only difference in behaviour is at 3.7
>> which
>> > we don’t support anyway so I’m fairly sure it is broken for all
>> supported
>> > Python 3 versions.
>> >
>> > I now tried running the tests in docker using 3.5 with the LC_ALL/LANG
>> > unset and I see the same failure.
>> >
>> > I don’t think this is a big thing though and we could release it without
>> > the fix I made. I think most people run it with a sane LC_ALL, but
>> > apparently we didn’t.
>> > Here’s the log for the test:
>> >
>> > > docker run -t -i -v `pwd`:/airflow/ python:3.5 bash
>> > root@b99b297df111:/# locale
>> > LANG=C.UTF-8
>> > LANGUAGE=
>> > LC_CTYPE="C.UTF-8"
>> > LC_NUMERIC="C.UTF-8"
>> > LC_TIME="C.UTF-8"
>> > LC_COLLATE="C.UTF-8"
>> > LC_MONETARY="C.UTF-8"
>> > LC_MESSAGES="C.UTF-8"
>> > LC_PAPER="C.UTF-8"
>> > LC_NAME="C.UTF-8"
>> > LC_ADDRESS="C.UTF-8"
>> > LC_TELEPHONE="C.UTF-8"
>> > LC_MEASUREMENT="C.UTF-8"
>> > LC_IDENTIFICATION="C.UTF-8"
>> > LC_ALL=
>> > > unset LANG
>> > root@b99b297df111:/# locale
>> > LANG=
>> > LANGUAGE=
>> > LC_CTYPE="POSIX"
>> > LC_NUMERIC="POSIX"
>> > LC_TIME="POSIX"
>> > LC_COLLATE="POSIX"
>> > LC_MONETARY="POSIX"
>> > LC_MESSAGES="POSIX"
>> > LC_PAPER="POSIX"
>> > LC_NAME="POSIX"
>> > LC_ADDRESS="POSIX"
>> > LC_TELEPHONE="POSIX"
>> > LC_MEASUREMENT="POSIX"
>> > LC_IDENTIFICATION="POSIX"
>> > LC_ALL=
>> > root@b99b297df111:/# pip install -e .[devel]
>> > root@b99b297df111:/airflow# ./run_unit_tests.sh
>> > + export AIRFLOW_HOME=/root/airflow
>> > + AIRFLOW_HOME=/root/airflow
>> > + export AIRFLOW__CORE__UNIT_TEST_MODE=True
>> > + AIRFLOW__CORE__UNIT_TEST_MODE=True
>> > + export AIRFLOW__TESTSECTION__TESTKEY=testvalue
>> > + AIRFLOW__TESTSECTION__TESTKEY=testvalue
>> > + export AIRFLOW_USE_NEW_IMPORTS=1
>> > + AIRFLOW_USE_NEW_IMPORTS=1
>> 

Re: [VOTE] Airflow 1.10.0rc1

2018-07-13 Thread Driesprong, Fokko
I've installed Airflow, but using a clean install and the sequential
executor, the tasks are not being picked up.

This is easy to replicate:

docker run -t -i --rm -p 8080:8080 python:3.5 bash
pip install
https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc1/apache-airflow-1.10.0rc1+incubating-bin.tar.gz
airflow initdb
airflow webserver

I've enabled the http example dag and kicked off a run, but it isn't being
picked up by the SequentialExecutor.

- Fokko




2018-07-13 11:51 GMT+02:00 Bolke de Bruin :

> Hi Sid,
>
> Do you have a JIRA and a PR to address it? I can then consider it for RC2
>
> B.
>
> > On 12 Jul 2018, at 04:50, Sid Anand  wrote:
> >
> > FYI!
> > I just installed the release candidate. The first thing I noticed is a
> missing tool tip for the Null State in the Recent Tasks column on the
> landing page. Since the null globe is new to this UI, users will likely
> hover over it to inquire what it means... and will be left wanting. Of
> course, they could click on the globe, which will take them to
> http://localhost:8080/admin/taskinstance/?flt1_dag_id_
> equals=example_bash_operator_state_equals=null <
> http://localhost:8080/admin/taskinstance/?flt1_dag_id_
> equals=example_bash_operator_state_equals=null>, which will always
> show an empty list, leaving them a bit more confused.
> >
> > -s
> >
> > On Wed, Jul 11, 2018 at 3:13 PM Carl Johan Gustavsson <
> carl.jo...@tictail.com.invalid> wrote:
> > Hi Bolke,
> >
> > (Switching email to avoid moderation on my emails.)
> >
> > The normal Airflow test suite does not fail as it uses a LC_ALL set to
> > utf-8.
> >
> > I think it is a proper test though, it is a minimal reproducible version
> of
> > the code that fails. And the only difference in behaviour is at 3.7 which
> > we don’t support anyway so I’m fairly sure it is broken for all supported
> > Python 3 versions.
> >
> > I now tried running the tests in docker using 3.5 with the LC_ALL/LANG
> > unset and I see the same failure.
> >
> > I don’t think this is a big thing though and we could release it without
> > the fix I made. I think most people run it with a sane LC_ALL, but
> > apparently we didn’t.
> > Here’s the log for the test:
> >
> > > docker run -t -i -v `pwd`:/airflow/ python:3.5 bash
> > root@b99b297df111:/# locale
> > LANG=C.UTF-8
> > LANGUAGE=
> > LC_CTYPE="C.UTF-8"
> > LC_NUMERIC="C.UTF-8"
> > LC_TIME="C.UTF-8"
> > LC_COLLATE="C.UTF-8"
> > LC_MONETARY="C.UTF-8"
> > LC_MESSAGES="C.UTF-8"
> > LC_PAPER="C.UTF-8"
> > LC_NAME="C.UTF-8"
> > LC_ADDRESS="C.UTF-8"
> > LC_TELEPHONE="C.UTF-8"
> > LC_MEASUREMENT="C.UTF-8"
> > LC_IDENTIFICATION="C.UTF-8"
> > LC_ALL=
> > > unset LANG
> > root@b99b297df111:/# locale
> > LANG=
> > LANGUAGE=
> > LC_CTYPE="POSIX"
> > LC_NUMERIC="POSIX"
> > LC_TIME="POSIX"
> > LC_COLLATE="POSIX"
> > LC_MONETARY="POSIX"
> > LC_MESSAGES="POSIX"
> > LC_PAPER="POSIX"
> > LC_NAME="POSIX"
> > LC_ADDRESS="POSIX"
> > LC_TELEPHONE="POSIX"
> > LC_MEASUREMENT="POSIX"
> > LC_IDENTIFICATION="POSIX"
> > LC_ALL=
> > root@b99b297df111:/# pip install -e .[devel]
> > root@b99b297df111:/airflow# ./run_unit_tests.sh
> > + export AIRFLOW_HOME=/root/airflow
> > + AIRFLOW_HOME=/root/airflow
> > + export AIRFLOW__CORE__UNIT_TEST_MODE=True
> > + AIRFLOW__CORE__UNIT_TEST_MODE=True
> > + export AIRFLOW__TESTSECTION__TESTKEY=testvalue
> > + AIRFLOW__TESTSECTION__TESTKEY=testvalue
> > + export AIRFLOW_USE_NEW_IMPORTS=1
> > + AIRFLOW_USE_NEW_IMPORTS=1
> > +++ dirname ./run_unit_tests.sh
> > ++ cd .
> > ++ pwd
> > + DIR=/airflow
> > + export PYTHONPATH=:/airflow/tests/test_utils
> > + PYTHONPATH=:/airflow/tests/test_utils
> > + nose_args=
> > + which airflow
> > + echo 'Initializing the DB'
> > Initializing the DB
> > + airflow resetdb
> > + yes
> > Traceback (most recent call last):
> >   File "/usr/local/bin/airflow", line 6, in 
> > exec(compile(open(__file__).read(), __file__, 'exec'))
> >   File "/airflow/airflow/bin/airflow", line 21, in 
> > from airflow import configuration
> >   File "/airflow/airflow/__init__.py", line 35, in 
> > from airflow import configuration as conf
> >   File "/airflow/airflow/configuration.py", line 106, in 
> > DEFAULT_CONFIG = f.read()
> >   File "/usr/local/lib/python3.5/encodings/ascii.py", line 26, in decode
> > return codecs.ascii_decode(input, self.errors)[0]
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position
> 21082:
> > ordinal not in range(128)
> > + '[' '' ']'
> > + '[' -z '' ']'
> > + nose_args='--with-coverage --cover-erase --cover-html
> > --cover-package=airflow --cover-html-dir=airflow/www/static/coverage
> >   --with-ignore-docstrings --rednose --with-timer -s -v
> > --logging-level=DEBUG '
> > + echo 'Starting the unit tests with the following nose arguments:
> > --with-coverage' --cover-erase --cover-html --cover-package=airflow
> > --cover-html-dir=airflow/www/static/coverage --with-ignore-docstrings
> > --rednose --with-timer -s -v 

Re: Apache Airflow 1.10.0b3

2018-07-13 Thread Driesprong, Fokko
Ah, I picked the wrong tar.gz. Thanks B.

Cheers, Fokko

2018-07-13 15:07 GMT+02:00 Bolke de Bruin :

> Please use the Rc Fokko. This has been fixed in the RC.
>
> B.
>
> Verstuurd vanaf mijn iPad
>
> > Op 13 jul. 2018 om 15:01 heeft Driesprong, Fokko 
> het volgende geschreven:
> >
> > Hi all,
> >
> > I just tried to install a local Airflow instance within Docker, using
> Python 3.5, but I got an error on initdb. Anyone else experiencing this? It
> has to do with the schema migrations.
> >
> > Cheers,
> > Fokko
> >
> > 2018-07-09 22:56 GMT+02:00 Taylor Edmiston :
> >> We discussed the splitting out of hooks & operators from core
> internally at
> >> Astronomer again today.  I should be able to allocate some time to
> >> splitting out hooks & operators for the purpose of speeding up the CI.
> A
> >> faster CI on core would be hugely beneficial for all contributors.
> >>
> >> I created a Jira issue for this at https://issues.apache.org/
> >> jira/browse/AIRFLOW-2732.
> >>
> >> If anyone has pointers for this work, I'm all ears.  I've done a lot
> with
> >> Python unit tests in the past but this part of the Airflow codebase is
> new
> >> to me.
> >>
> >> Taylor
> >>
> >> *Taylor Edmiston*
> >> Blog <https://blog.tedmiston.com/> | CV
> >> <https://stackoverflow.com/cv/taylor> | LinkedIn
> >> <https://www.linkedin.com/in/tedmiston/> | AngelList
> >> <https://angel.co/taylor> | Stack Overflow
> >> <https://stackoverflow.com/users/149428/taylor-edmiston>
> >>
> >>
> >> On Sun, Jul 1, 2018 at 6:02 AM, Bolke de Bruin 
> wrote:
> >>
> >> > Separating the tests is where the effort lies. So having that as a
> >> > consequence of splitting the packages would be nice. It has come up a
> >> > couple of times but it was not picked up unfortunately.
> >> >
> >> > B.
> >> >
> >> > > On 28 Jun 2018, at 08:32, Maxime Beauchemin <
> maximebeauche...@gmail.com>
> >> > wrote:
> >> > >
> >> > > It would be so nice to have a fast test suite. Having to wait for
> Travis
> >> > > for up to an hour makes many workflows (like working on a release)
> super
> >> > > painful.
> >> > >
> >> > > I spoke with folks at Astronomer recently about moving all
> operators and
> >> > > hooks to another Python package that airflow would import. This
> would
> >> > allow
> >> > > for independent test suites and to have a more regular release
> cadence on
> >> > > hooks and operators. What do you think?
> >> > >
> >> > > Max
> >> > >
> >> > > On Wed, Jun 27, 2018 at 11:18 PM Bolke de Bruin 
> >> > wrote:
> >> > >
> >> > >> Arghhh. The downside of doing this late at night and wanting to go
> to
> >> > >> bed... :-). Will make a new one
> >> > >>
> >> > >> Sent from my iPhone
> >> > >>
> >> > >>> On 28 Jun 2018, at 00:07, Chris Fei  wrote:
> >> > >>>
> >> > >>> Great, thank you! I just took this for a quick spin and it looks
> like
> >> > >>> there's DB migration task missing. The task you committed just
> >> > recently,
> >> > >>> 9635ae0956e7_index_faskfail.py, has a down_revision of
> 856955da8476
> >> > >>> which can't be found when running airflow initdb (seehttps://
> >> > >> github.com/apache/incubator-airflow/tree/v1-10-test/airflow/
> >> > migrations/versions
> >> > >> ).
> >> > >>> Chris
> >> > >>>
> >> > >>>
> >> > >>>> On Wed, Jun 27, 2018, at 5:09 PM, Bolke de Bruin wrote:
> >> > >>>> Hi All,
> >> > >>>>
> >> > >>>> I have created a sdist package that is available at:
> >> > >>>>
> >> > >>>>
> >> > >> http://people.apache.org/~bolke/apache-airflow-1.10.0b3+incu
> >> > bating.tar.gz
> >> > >>>> <
> >> > >> http://people.apache.org/~bolke/apache-airflow-1.10.0b3+incu
> >> > bating.tar.gz>>
> >> > >>
> >> > >>>> In order 

Re: Apache Airflow 1.10.0b3

2018-07-13 Thread Driesprong, Fokko
Hi all,

I just tried to install a local Airflow instance within Docker, using
Python 3.5, but I got an error on initdb. Anyone else experiencing this? It
has to do with the schema migrations.

Cheers,
Fokko

2018-07-09 22:56 GMT+02:00 Taylor Edmiston :

> We discussed the splitting out of hooks & operators from core internally at
> Astronomer again today.  I should be able to allocate some time to
> splitting out hooks & operators for the purpose of speeding up the CI.  A
> faster CI on core would be hugely beneficial for all contributors.
>
> I created a Jira issue for this at https://issues.apache.org/
> jira/browse/AIRFLOW-2732.
>
> If anyone has pointers for this work, I'm all ears.  I've done a lot with
> Python unit tests in the past but this part of the Airflow codebase is new
> to me.
>
> Taylor
>
> *Taylor Edmiston*
> Blog  | CV
>  | LinkedIn
>  | AngelList
>  | Stack Overflow
> 
>
>
> On Sun, Jul 1, 2018 at 6:02 AM, Bolke de Bruin  wrote:
>
> > Separating the tests is where the effort lies. So having that as a
> > consequence of splitting the packages would be nice. It has come up a
> > couple of times but it was not picked up unfortunately.
> >
> > B.
> >
> > > On 28 Jun 2018, at 08:32, Maxime Beauchemin <
> maximebeauche...@gmail.com>
> > wrote:
> > >
> > > It would be so nice to have a fast test suite. Having to wait for
> Travis
> > > for up to an hour makes many workflows (like working on a release)
> super
> > > painful.
> > >
> > > I spoke with folks at Astronomer recently about moving all operators
> and
> > > hooks to another Python package that airflow would import. This would
> > allow
> > > for independent test suites and to have a more regular release cadence
> on
> > > hooks and operators. What do you think?
> > >
> > > Max
> > >
> > > On Wed, Jun 27, 2018 at 11:18 PM Bolke de Bruin 
> > wrote:
> > >
> > >> Arghhh. The downside of doing this late at night and wanting to go to
> > >> bed... :-). Will make a new one
> > >>
> > >> Sent from my iPhone
> > >>
> > >>> On 28 Jun 2018, at 00:07, Chris Fei  wrote:
> > >>>
> > >>> Great, thank you! I just took this for a quick spin and it looks like
> > >>> there's DB migration task missing. The task you committed just
> > recently,
> > >>> 9635ae0956e7_index_faskfail.py, has a down_revision of 856955da8476
> > >>> which can't be found when running airflow initdb (seehttps://
> > >> github.com/apache/incubator-airflow/tree/v1-10-test/airflow/
> > migrations/versions
> > >> ).
> > >>> Chris
> > >>>
> > >>>
> >  On Wed, Jun 27, 2018, at 5:09 PM, Bolke de Bruin wrote:
> >  Hi All,
> > 
> >  I have created a sdist package that is available at:
> > 
> > 
> > >> http://people.apache.org/~bolke/apache-airflow-1.10.0b3+incu
> > bating.tar.gz
> >  <
> > >> http://people.apache.org/~bolke/apache-airflow-1.10.0b3+incu
> > bating.tar.gz>>
> > >>
> >  In order to distinguish it from an actual (apache) release it is:
> > 
> >  1. Marked as beta (python package managers do not install beta
> >   versions by default - PEP 440)> 2. It is not signed
> >  3. It is not at an official apache distribution location
> > 
> >  You can also put something like this in a requirements.txt file:
> > 
> >  git+
> > 
> > >> https://github.com/apache/incubator-airflow@v1-10-test#egg=
> > apache-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,
> > postgres,redis,slack,s3
> >  <
> > >> https://github.com/apache/incubator-airflow@v1-10-test#egg=
> > apache-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,
> > postgres,redis,slack,s3
> > >>>
> >  ]>  >  airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,postgres,redi
> > s,slack,s-
> >  3]
> >  <
> > >> https://github.com/rodrigc/incubator-airflow@master#egg=apac
> > he-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,postgres,
> > redis,slack,s3][1]
> > >>>
> > >>
> >  and then "pip install -r requirements.txt”.
> > 
> >  I hope that after this beta we can go to RC and start voting on
> 1.10.>
> >  Cheers
> >  Bolke
> > >>>
> > >>>
> > >>> Links:
> > >>>
> > >>> 1.
> > >> https://github.com/rodrigc/incubator-airflow@master#egg=apac
> > he-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,postgres,
> > redis,slack,s3]%20%3Chttps://github.com/rodrigc/incubator-
> > airflow@master#egg=apache-airflow[celery,crypto,emr,
> > hive,hdfs,ldap,mysql,postgres,redis,slack,s3]
> >  apache-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,
> postgres,redis,slack,s3]%20%3Chttps://github.com/rodrigc/
> incubator-airflow@master%23egg=apache-airflow[celery,
> crypto,emr,hive,hdfs,ldap,mysql,postgres,redis,slack,s3]>
> > >> 

Re: DagRunOperator - target dag tasks are not triggered - possible timezone conflict

2018-07-06 Thread Driesprong, Fokko
Hi Niranda,

What version of Airflow are you running? There are a lot of improvements
coming up in Airflow 1.10 regarding timezones.

https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg#L77-L79
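
In 1.10 you can also make the DAG itself timezone-aware with pendulum. A
minimal sketch, based on the 1.10 timezone support and untested on my side
(I'm assuming Asia/Colombo for your UTC+5:30 offset):

from datetime import datetime

import pendulum

from airflow import DAG

# Assumption: Asia/Colombo matches your UTC+5:30 offset
local_tz = pendulum.timezone("Asia/Colombo")

# Attaching the tzinfo makes the start_date (and therefore the DAG)
# explicitly timezone-aware instead of relying on the local clock.
dag = DAG(
    dag_id="example_trigger_target_dag",
    schedule_interval=None,
    start_date=datetime(2018, 7, 1, tzinfo=local_tz),
)

Together with the default_timezone setting above, that should keep the
execution dates of the controller and the target consistent.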

Cheers, Fokko



2018-07-05 14:57 GMT+02:00 Niranda Perera :

> Hi,
>
> I am checking the DagRunOperator [1] and I was using the
> example_trigger_controller_dag.py & example_trigger_target_dag.py.
>
> When I triggered the controller dag, the target dag tasks are not
> triggered.
> Check dags.png
>
> while investigating further (check dagruns.png), I believe this due to dag
> run execution date setting. I am in UTC+5.30 and while the controller dag
> execution time was set by UTC+0, the target dag execution was set by my
> local timezone UTC+5.30. So, the target would not get triggered for another
> 5.30 hours.
>
> so, Is there a way to bind the timezones in airflow?
>
> [1] https://airflow.apache.org/_modules/dagrun_operator.html
>
> Best regards
>
> Niranda Perera
> Research Assistant
> Dept of CSE, University of Moratuwa
> niranda...@cse.mrt.ac.lk
> +94 71 554 8430
> https://lk.linkedin.com/in/niranda
>


Re: Airflow 1.10.0

2018-06-15 Thread Driesprong, Fokko
Hi all,

As you might know, I'm a consultant at GoDataDriven and I just changed to a
different project. On this project they are not using Airflow (yet), so for
me it is hard to test the RCs etc. I'm willing to help in the process, but
currently I'm struggling to find time for the project on the side
(including Airflow).

Cheers, Fokko

2018-06-13 23:28 GMT+02:00 Bolke de Bruin :

> Same here and will be for a while still :-(. So unfortunately 1.10 is a
> bit stalled.
>
> B,
>
> > On 13 Jun 2018, at 17:16, Chris Riccomini  wrote:
> >
> > Hey Fokko,
> >
> > Sorry that I've been MIA for a while. Just wanted to check in on 1.10,
> and
> > its current status.
> >
> > Cheers,
> > Chris
> >
> > On Tue, May 1, 2018 at 11:18 PM Driesprong, Fokko 
> > wrote:
> >
> >> Hi Bryon,
> >>
> >> We'll be releasing the RC's soon. If there aren't much issues with the
> >> RC's, it will be released quickly. But we need the community to test
> these.
> >>
> >> Cheers, Fokko
> >>
> >> 2018-05-01 20:57 GMT+02:00 Wicklund, Bryon :
> >>
> >>> Hey I was wondering if you had a date in mind or an estimate for when
> >>> Airflow 1.10.0 will be released?
> >>>
> >>> Thanks!
> >>> -Bryon
> >>>
> >>> This e-mail, including attachments, may include confidential and/or
> >>> proprietary information, and may be used only by the person or entity
> >>> to which it is addressed. If the reader of this e-mail is not the
> >> intended
> >>> recipient or his or her authorized agent, the reader is hereby notified
> >>> that any dissemination, distribution or copying of this e-mail is
> >>> prohibited. If you have received this e-mail in error, please notify
> the
> >>> sender by replying to this message and delete this e-mail immediately.
> >>>
> >>
>
>


Re: Master is broken

2018-06-12 Thread Driesprong, Fokko
Hi Gerardo,

I totally agree that when master turns red, we should stop merging and fix
the build or revert the commit that broke the build.

I think one of the underlying problems is having flaky tests. I've tried to
fix a few of those, but they are quite persistent. Sometimes it is hard to
identify whether it is just a flaky test or whether you really broke
something.

Cheers, Fokko

Op di 12 jun. 2018 om 07:37 schreef Daniel Imberman <
daniel.imber...@gmail.com>

> +1 for merge blocking hooks. It would be great to have safety knowing that
> any commit I revert to will still pass tests (for rebase testing, etc)
>
> On Mon, Jun 11, 2018 at 10:23 PM Alex Tronchin-James 949-412-7220
> <(949)%20412-7220>  wrote:
>
> > Could we adopt some sort of merge-blocking hook that prohibits merge of
> PRs
> > failing unit tests? My team has such an approach at work and it reduces
> the
> > volume of breakage quite a bit. The only time we experience problems now
> is
> > where our unit test coverage is poor, but we improve the coverage every
> > time a breaking PR shows up. If our goal is to harden airflow for ongoing
> > functionality with reduced breakage, this would be one good way to get
> > there.
> >
> > On Mon, Jun 11, 2018 at 7:55 PM Gerardo Curiel  wrote:
> >
> > > Hi folks,
> > >
> > > The master branch has been broken for a couple of days already. But
> that
> > > hasn't stopped the project from merging pull requests. As time passes
> by,
> > > it gets hard to identify what change caused the breakage. And of
> course,
> > > fixing it might cause conflicts with the changes introduced by the
> merged
> > > PRs.
> > >
> > > It seems to me that there should be some sort of process or guidelines
> in
> > > place to avoid this sort of situations. "Don't merge if master is red"
> > > seems like a reasonable option.
> > >
> > > If this guideline sounds obvious enough that it shouldn't be spelled
> out
> > in
> > > the commiters' documentation, then that's fine, but it hasn't been
> > followed
> > > recently.
> > >
> > > Cheers,
> > >
> > > --
> > > Gerardo Curiel // https://gerar.do
> > >
> >
>


Re: KubernetesPodOperator: Invalid arguments were passed to BaseOperator

2018-05-30 Thread Driesprong, Fokko
Hi Taylor,

Thanks, I was thinking about something similar to what you're suggesting.
But I'm not confident that the sys.exit() won't kill the whole Airflow
process. For example, if you run airflow initdb, the example DAGs are also
parsed, and if you don't have kubernetes installed, the process will hit
the sys.exit().
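
What I had in mind is something more like the sketch below: no sys.exit(),
the module just logs a warning and skips the DAG definition. The operator
arguments are copied from the example dag and the contrib import path is
an assumption on my side, so please double-check before relying on it:

import logging
from datetime import datetime

from airflow import DAG

try:
    # ImportError rather than ModuleNotFoundError so it also works on 2.7
    from airflow.contrib.operators.kubernetes_pod_operator import (
        KubernetesPodOperator)
except ImportError:
    KubernetesPodOperator = None
    logging.warning(
        "Could not import KubernetesPodOperator. "
        "Install kubernetes dependencies with: pip install airflow['kubernetes']")

if KubernetesPodOperator is not None:
    dag = DAG(
        dag_id='example_kubernetes_operator',
        start_date=datetime(2018, 1, 1),
        schedule_interval=None)

    k = KubernetesPodOperator(
        namespace='default',
        image='ubuntu:16.04',
        cmds=['bash', '-cx'],
        arguments=['echo', '10'],
        labels={'foo': 'bar'},
        name='airflow-test-pod',
        in_cluster=False,
        task_id='task',
        get_logs=True,
        dag=dag)

That way airflow initdb on a vanilla installation only prints the warning
and never touches sys.exit().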

Cheers, Fokko

2018-05-30 20:08 GMT+02:00 Taylor Edmiston :

> I used requests instead of kube as an example, but what do you think about
> doing something like this?  I'm happy to put this into a PR if it would
> solve the pain point today.
>
> import logging
>
> try:
> import requests
> except ModuleNotFoundError:
> import sys
> logging.warning('kube not installed - skipping kube examples')
> sys.exit()
>
> resp = requests.get('http://example.com')
> print(resp)
> ...
>
> Taylor
>
> *Taylor Edmiston*
> Blog <https://blog.tedmiston.com/> | CV
> <https://stackoverflow.com/cv/taylor> | LinkedIn
> <https://www.linkedin.com/in/tedmiston/> | AngelList
> <https://angel.co/taylor> | Stack Overflow
> <https://stackoverflow.com/users/149428/taylor-edmiston>
>
>
> On Wed, May 30, 2018 at 4:40 AM, Driesprong, Fokko 
> wrote:
>
> > Hi Craig,
> >
> > This is something that needs to be fixed. I agree with you this is very
> > dirty. In your installation you're not installing the kubernetes stuff,
> so
> > the KubernetesPodOperator is ignored. We need to figure out how to have
> > example dags that are not compatible with the vanilla installation, or we
> > need to remove the kubernetes example for now, and move it to the
> > documentation.
> >
> > Cheers, Fokko
> >
> > 2018-05-30 2:11 GMT+02:00 Craig Rodrigues :
> >
> > > I tested master branch by putting the following in my requirements.txt:
> > >
> > > git+https://github.com/rodrigc/incubator-airflow@
> > > master#egg=apache-airflow[celery,crypto,emr,hive,hdfs,
> > > ldap,mysql,postgres,redis,slack,s3]
> > >
> > > and did a pip install -r requirements.txt
> > >
> > > When I started the airflow webserver, I saw deprecation warnings.  I
> > > put some additional debugging in models.py to through an exception so
> > that
> > > I could see the
> > > full stacktrace:
> > >
> > > [2018-05-29 14:00:34,419] {models.py:307} ERROR - Failed to import:
> > > /Users/c-craigr/airflow2/lib/python2.7/site-packages/
> > > airflow/example_dags/example_kubernetes_operator.py
> > > Traceback (most recent call last):
> > >   File "/Users/c-craigr/airflow2/lib/python2.7/site-packages/
> > airflow/models.py",
> > > line 304, in process_file
> > > m = imp.load_source(mod_name, filepath)
> > >   File "/Users/c-craigr/airflow2/lib/python2.7/site-packages/
> > > airflow/example_dags/example_kubernetes_operator.py", line 53, in
> > 
> > > dag=dag)
> > >   File "/Users/c-craigr/airflow2/lib/python2.7/site-packages/
> > airflow/utils/decorators.py",
> > > line 98, in wrapper
> > > result = func(*args, **kwargs)
> > >   File "/Users/c-craigr/airflow2/lib/python2.7/site-packages/
> > airflow/models.py",
> > > line 2308, in __init__
> > > raise Exception("Invalid use of args or kwargs")
> > > Exception: Invalid use of args or kwargs
> > >
> > >
> > > If looks like example_kubernetes_operator.py, this code is the source
> of
> > > the exception:
> > >
> > > k = KubernetesPodOperator(
> > > namespace='default',
> > > image="ubuntu:16.04",
> > > cmds=["bash", "-cx"],
> > > arguments=["echo", "10"],
> > > labels={"foo": "bar"},
> > > name="airflow-test-pod",
> > > in_cluster=False,
> > > task_id="task",
> > > get_logs=True,
> > > dag=dag)
> > >
> > >
> > > Without my extra debugging, the deprecation warning looks like this:
> > >
> > > [2018-05-29 14:06:27,567] {example_kubernetes_operator.py:30} WARNING
> -
> > > Could not import KubernetesPodOperator
> > > /Users/c-craigr/airflow2/lib/python2.7/site-packages/
> > airflow/models.py:2315:
> > > PendingDeprecationWarning: Invalid arguments were passed to
> BaseOperator.
> > > Support for passing such arguments will be dropped in Airflow 2.0.
> > Invalid
> > > arguments were:
> > > *args: ()
> > > **kwargs: {'name': 'airflow-test-pod', 'image': 'ubuntu:16.04',
> 'labels':
> > > {'foo': 'bar'}, 'namespace': 'default', 'cmds': ['bash', '-cx'],
> > > 'arguments': ['echo', '10'], 'in_cluster': False, 'get_logs': True}
> > >   category=PendingDeprecationWarning
> > >
> > >
> > >
> > > What is the correct fix for this?  It looks like a lot of operators
> pass
> > > in arguments which are not
> > > processed by BaseOperator, and thus trip over this deprecation warning.
> > >
> > > --
> > > Craig
> > >
> >
>


Re: KubernetesPodOperator: Invalid arguments were passed to BaseOperator

2018-05-30 Thread Driesprong, Fokko
Hi Craig,

This is something that needs to be fixed. I agree with you that this is
very dirty. In your installation you're not installing the kubernetes
extras, so the KubernetesPodOperator import is skipped. We need to figure
out how to handle example dags that are not compatible with the vanilla
installation, or we need to remove the kubernetes example for now and move
it to the documentation.

Cheers, Fokko

2018-05-30 2:11 GMT+02:00 Craig Rodrigues :

> I tested master branch by putting the following in my requirements.txt:
>
> git+https://github.com/rodrigc/incubator-airflow@
> master#egg=apache-airflow[celery,crypto,emr,hive,hdfs,
> ldap,mysql,postgres,redis,slack,s3]
>
> and did a pip install -r requirements.txt
>
> When I started the airflow webserver, I saw deprecation warnings.  I
> put some additional debugging in models.py to through an exception so that
> I could see the
> full stacktrace:
>
> [2018-05-29 14:00:34,419] {models.py:307} ERROR - Failed to import:
> /Users/c-craigr/airflow2/lib/python2.7/site-packages/
> airflow/example_dags/example_kubernetes_operator.py
> Traceback (most recent call last):
>   File 
> "/Users/c-craigr/airflow2/lib/python2.7/site-packages/airflow/models.py",
> line 304, in process_file
> m = imp.load_source(mod_name, filepath)
>   File "/Users/c-craigr/airflow2/lib/python2.7/site-packages/
> airflow/example_dags/example_kubernetes_operator.py", line 53, in 
> dag=dag)
>   File 
> "/Users/c-craigr/airflow2/lib/python2.7/site-packages/airflow/utils/decorators.py",
> line 98, in wrapper
> result = func(*args, **kwargs)
>   File 
> "/Users/c-craigr/airflow2/lib/python2.7/site-packages/airflow/models.py",
> line 2308, in __init__
> raise Exception("Invalid use of args or kwargs")
> Exception: Invalid use of args or kwargs
>
>
> If looks like example_kubernetes_operator.py, this code is the source of
> the exception:
>
> k = KubernetesPodOperator(
> namespace='default',
> image="ubuntu:16.04",
> cmds=["bash", "-cx"],
> arguments=["echo", "10"],
> labels={"foo": "bar"},
> name="airflow-test-pod",
> in_cluster=False,
> task_id="task",
> get_logs=True,
> dag=dag)
>
>
> Without my extra debugging, the deprecation warning looks like this:
>
> [2018-05-29 14:06:27,567] {example_kubernetes_operator.py:30} WARNING -
> Could not import KubernetesPodOperator
> /Users/c-craigr/airflow2/lib/python2.7/site-packages/airflow/models.py:2315:
> PendingDeprecationWarning: Invalid arguments were passed to BaseOperator.
> Support for passing such arguments will be dropped in Airflow 2.0. Invalid
> arguments were:
> *args: ()
> **kwargs: {'name': 'airflow-test-pod', 'image': 'ubuntu:16.04', 'labels':
> {'foo': 'bar'}, 'namespace': 'default', 'cmds': ['bash', '-cx'],
> 'arguments': ['echo', '10'], 'in_cluster': False, 'get_logs': True}
>   category=PendingDeprecationWarning
>
>
>
> What is the correct fix for this?  It looks like a lot of operators pass
> in arguments which are not
> processed by BaseOperator, and thus trip over this deprecation warning.
>
> --
> Craig
>


Re: HttpSensor raising exception with status=403

2018-05-28 Thread Driesprong, Fokko
Hi Pedro,

You could create a CustomHttpHook and place it on your PYTHONPATH, and then
create a CustomHttpSensor that uses it. Hope this helps.
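
As a rough sketch (the import path and the exact exception message differ per
Airflow version, so treat this as a starting point only), you may even get
away with only a custom sensor that treats the 403 as "not there yet":

from airflow.exceptions import AirflowException
from airflow.operators.sensors import HttpSensor  # path varies per version


class CustomHttpSensor(HttpSensor):
    """Treat HTTP 403 (file not published yet) like the stock sensor treats 404."""

    def poke(self, context):
        try:
            return super(CustomHttpSensor, self).poke(context)
        except AirflowException as e:
            # The hook raises AirflowException with the status code in the message.
            if '403' in str(e):
                return False
            raise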

Cheers, Fokko

2018-05-26 2:48 GMT+02:00 Pedro Machado :

> Hi,
>
> I am using HttpSensor to look for a file. The webserver is returning 403
> (instead of 404) while the file is not available. This is causing the
> sensor to raise an exception.
>
> I see that a recent commit added the ability to disable the call to
> response.raise_for_status() on the http hook by passing
> extra_options={'check_response': False} to the sensor.
>
> https://github.com/apache/incubator-airflow/commit/
> 6c19468e0b3b938249acc43e4b833a753d093efc?diff=unified
>
> I am unable to upgrade airflow. What would be the best way to incorporate
> the new code, perhaps into a custom sensor?
>
> Thanks,
>
> Pedro
>


Re: How to wait for external process

2018-05-28 Thread Driesprong, Fokko
Hi Stefan,

Afaik there isn't a more efficient way of doing this. DAGs that rely on a lot
of sensors experience the same issue. The only way I can think of right now is
updating the state directly in the database, but then you need to know what
you are doing. I can imagine that this would be feasible using an AWS Lambda
function; a rough illustration is below. Hope this helps.
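
Purely as an illustration of that last idea (I would not recommend it without
careful testing, since it bypasses the scheduler's own bookkeeping), such a
callback could look roughly like this, assuming it runs somewhere with access
to the Airflow metadata database; the function name and arguments are
placeholders:

from airflow import settings
from airflow.models import TaskInstance
from airflow.utils.state import State


def mark_sensor_done(dag_id, task_id, execution_date):
    # Flip the long-running sensor task to SUCCESS so the DAG run can continue.
    session = settings.Session()
    ti = (session.query(TaskInstance)
          .filter(TaskInstance.dag_id == dag_id,
                  TaskInstance.task_id == task_id,
                  TaskInstance.execution_date == execution_date)
          .one())
    ti.state = State.SUCCESS
    session.commit()
    session.close()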

Cheers, Fokko

2018-05-26 17:50 GMT+02:00 Stefan Seelmann :

> Hello,
>
> I have a DAG (externally triggered) where some processing is done at an
> external system (EC2 instance). The processing is started by an Airflow
> task (via HTTP request). The DAG should only continue once that
> processing is completed. In a first naive implementation I created a
> sensor that gets the progress (via HTTP request) and only if status is
> "finished" returns true and the DAG run continues. That works but...
>
> ... the external processing can take hours or days, and during that time
> a worker is occupied which does nothing but HTTP GET and sleep. There
> will be hundreds of DAG runs in parallel which means hundreds of workers
> are occupied.
>
> I looked into other operators that do computation on external systems
> (ECSOperator, AWSBatchOperator) but they also follow that pattern and
> just wait/sleep.
>
> So I want to ask if there is a more efficient way to build such a
> workflow with Airflow?
>
> Kind Regards,
> Stefan
>


Re: 1.10.0 Release

2018-05-26 Thread Driesprong, Fokko
Hi Kaxil,

Good point. Right now Bolke and I are a bit busy with the PyData talk that
we're giving tomorrow in Amsterdam.

I've done a `git checkout v1-10-test && git reset --hard apache/master` a
couple of times to make it up to date with master. We have to be careful to
set the version number back to 1.10, as Craig recently pointed out:
https://github.com/apache/incubator-airflow/blob/master/airflow/version.py

Cheers, Fokko

2018-05-25 11:03 GMT+02:00 Naik Kaxil :

> Hi Bolke, Fokko,
>
> Are we going to rebranch master for v1.10 and delete the current
> v1.10-test branch?
>
> What are the plans for it and in case Bolke if you are busy, I am happy to
> assist alongwith @Fokko or anyone who is free?
>
> Regards,
> Kaxil
>
> On 21/05/2018, 18:58, "Bolke de Bruin"  wrote:
>
> We will rebranch 1.10 from master. Sorry, I have been too busy with
> normal life to be able to follow up on the release of 1.10.
>
> B.
>
> > On 21 May 2018, at 19:54, Craig Rodrigues 
> wrote:
> >
> > Kaxil,
> >
> > Thanks for merging this into master.
> > What is the procedure to get this into v1-10-test branch?
> > I am heavily testing that branch right now, and want to deploy that
> branch to my prod system
> > this Friday.
> >
> > --
> > Craig
> >
> > On 2018/05/21 14:52:33, Craig Rodrigues 
> wrote:
> >> I have submitted:
> >>
> >> https://github.com/apache/incubator-airflow/pull/3388
> >>
> >> --
> >> Craig
> >
>
>
>
>
>
>
>
> Kaxil Naik
>
> Data Reply
> 2nd Floor, Nova South
> 160 Victoria Street, Westminster
> London SW1E 5LB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
>


Re: Moving to Github? Re: Merging PRs, closing Jira tickets (a.k.a New Committer) guide?

2018-05-22 Thread Driesprong, Fokko
Are we moving to Gitbox? I would really like that. I see a lot of other
projects moving to Gitbox at the moment (Flink, Parquet). How's your
schedule, Ash? ;)

Cheers, Fokko

2018-03-09 18:53 GMT+01:00 Bolke de Bruin <bdbr...@gmail.com>:

> I would love to be able to close PRs on GitHub, but personally I’m quite
> ok with JIRA. GitHub issues tend to get messy imho. I also like that it is
> clear from the subject of Pr what the associated issues was, since we moved
> history became a lot cleaner and changelogs are now easy to generate.
>
> My 2cents
>
> B.
>
> Verstuurd vanaf mijn iPad
>
> > Op 9 mrt. 2018 om 18:01 heeft Ash Berlin-Taylor <a...@firemirror.com>
> het volgende geschreven:
> >
> > That sounds like it is at worth me at least coming up a proposal for us
> to vote on then..
> >
> > One thing that might help with the "target version" is multiple Github
> Projects[1]: -- that, or labels, are the only way for a github issue to be
> in "two" groups at the same time.
> >
> > I'll see what I can do, but make zero promises due to imminent
> baby-driven "sleep" schedule ;)
> >
> > [1]: https://help.github.com/articles/about-project-boards/
> >
> >
> >> On 9 Mar 2018, at 16:10, Maxime Beauchemin <maximebeauche...@gmail.com>
> wrote:
> >>
> >> We use Gitbox and no Jira for Apache Superset and are happy with it.
> >>
> >> One downside is around the current release management tooling for
> Airflow
> >> has bindings with Jira and the "target version" field.
> >>
> >> Max
> >>
> >> On Thu, Mar 8, 2018 at 6:31 AM, Ash Berlin-Taylor <
> >> ash_airflowl...@firemirror.com> wrote:
> >>
> >>> I've done a bit of digging and there's an Apache "project" called
> >>> gitbox[1] that, if we choose to go that way lets us use Github more
> >>> "natively".
> >>>
> >>> The BookKeeper project migrated to using Github exclusively lsat Jun[2]
> >>> and from the looks of their Github repo are still using this approach,
> and
> >>> their Jira is read only. Their proposal on the migration was
> >>> https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> >>> BP-9+-+Github+issues+for+Issue+Tracking
> >>>
> >>> I think there are three ways we could go:
> >>>
> >>> 1. Nothing changes, we stay as we are and commit to the ASF git repo.
> >>> 2. Move to Gitbox and commit directly to githb, keep issues in Jira.
> >>> 3. Do as BookKeeper did and move to using Github Issues as well as
> Gitub
> >>> for the repo.
> >>>
> >>> Is there interest from anyone else in 2 or 3, if so I will attempt to
> draw
> >>> up a more detailed proposal.
> >>>
> >>> [1]: https://lists.apache.org/thread.html/Znkiyqnxqzryecv
> >>> [2]: http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/
> 201706.mbox/%
> >>> 3CCAO2yDybRq2VUM1JYo_6VT_H8Ca7Lu8af6H-2CZKQzYT6xYGU-g%40mail.gmail.com
> %3E
> >>>
> >>>
> >>>> On 6 Mar 2018, at 09:57, Ash Berlin-Taylor
> <ash_airflowlist@firemirror.
> >>> com> wrote:
> >>>>
> >>>> Ah that would explain why I don't have a button :)
> >>>>
> >>>> Is this Apache policy, or is it possible for committers to be granted
> >>> permission to do this? Having this permission would also let us click
> the
> >>> "rerun tests" button in Travis which would be nice.
> >>>>
> >>>> Is it worth opening an INFRA ticket asking for this, or is it not
> >>> possible?
> >>>>
> >>>> -ash
> >>>>
> >>>>> On 6 Mar 2018, at 08:25, Driesprong, Fokko <fo...@driesprong.frl>
> >>> wrote:
> >>>>>
> >>>>> Hi Ash,
> >>>>>
> >>>>> As a committer we don't have any rights on the Github itself. The
> Github
> >>>>> repo is just a sync of the apache repo. Unfortunately, therefore we
> >>> don't
> >>>>> have the right to close any PR.
> >>>>>
> >>>>> Cheers, Fokko
> >>>>>
> >>>>> 2018-03-06 0:49 GMT+01:00 Ash Berlin-Taylor <
> >>> ash_airflowl...@firemirror.com>
> >>>>> :
> >>>>>
> >>>>>> I've merged two PRs now, and the second one seemed to

Re: Airflow with Celery

2018-05-16 Thread Driesprong, Fokko
I had similar issues with Airflow running the Celery executor.

The celery_result_backend should be a persistent database like Postgres or
MySQL. What broker are you using? I would recommend Redis or RabbitMQ,
depending on which you prefer.
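
For reference, a minimal [celery] section could look roughly like this
(hostnames and credentials are placeholders; in 1.8/1.9 the keys are
broker_url and celery_result_backend):

[celery]
broker_url = redis://my-redis-host:6379/0
celery_result_backend = db+postgresql://airflow:airflow@my-postgres-host:5432/airflow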

Cheers, Fokko

2018-05-15 21:12 GMT+02:00 David Capwell :

> What I find is that when celery rejects we hit this.  For us we don't do
> work on the hosts so solve by over provisioning tasks in celery
>
> On Tue, May 15, 2018, 6:30 AM Andy Cooper 
> wrote:
>
>> I have had very similar issues when there was a problem with the
>> connection
>> string pointing to the message broker. Triple check those connection
>> strings and attempt to connect outside of airflow.
>>
>> On Tue, May 15, 2018 at 9:27 AM Goutham Pratapa > >
>> wrote:
>>
>> > Hi all,
>> >
>> > I have been using airflow with Celery executor in the background
>> >
>> > https://hastebin.com/sipecovomi.ini --> airflow.cfg
>> >
>> > https://hastebin.com/urutokuvoq.py   --> The dag I have been using
>> >
>> >
>> >
>> > This shows that the dag is always in running state.
>> >
>> >
>> >
>> >
>> > Airflow flower shows nothing in the tasks or in the broker.
>> >
>> >
>> > Did I miss anything can anyone help me in this regard.
>> >
>> >
>> > --
>> > Cheers !!!
>> > Goutham Pratapa
>> >
>>
>


Re: Python3 and sensors module

2018-05-16 Thread Driesprong, Fokko
Hi Cindy,

The other sensors should work under Python3. We try to support Python3 as
much as possible, but sometimes libraries are used that are not compatible.
Could you describe what you are running into?

Cheers, Fokko

2018-05-16 5:36 GMT+02:00 Cindy Rottinghuis :

> Hi,
>
> Are there any plans to update the HDFS_hook.py script to remove the
> reference to the snakebite python library? I’d like to run airflow on
> python3, and this is causing some issues.   The hdfs_hook script is
> referenced in the sensors module.
>
> Any suggestions?
>
> Thanks,
> Cindy


Re: Airflow Docker Container

2018-05-14 Thread Driesprong, Fokko
Hi Daniel,

My dear colleague from GoDataDriven, Bas Harenslak, has started building an
official Docker container for Docker Hub. I've put him in the CC. In the
end I strongly believe the image should end up in the official Docker
repository: https://github.com/docker-library/official-images

Right now, the excellent images provided by Puckel are widely used for
running Airflow in Docker. For the Kubernetes build we need to pull in some
additional dependencies, so it might be a good idea to do this separately
from Puckel's image to keep his images lightweight. Any thoughts?

Kind regards,
Fokko Driesprong


2018-05-14 22:09 GMT+02:00 Anirudh Ramanathan <
ramanath...@google.com.invalid>:

> @Erik Erlandson  has had conversations about publishing
> docker images with the ASF Legal team.
> Adding him to the thread.
>
> On Mon, May 14, 2018 at 1:07 PM Daniel Imberman  >
> wrote:
>
> > Hi everyone,
> >
> > I've started looking into creating an official airflow docker container
> > s.t. users of the KubernetesExecutor could auto-pull from helm
> > charts/deployment yamls/etc. I was wondering what everyone thinks the
> best
> > way to do this would be? Is there an official apache docker repo? Is
> there
> > a preferred linux distro?
> >
> > cc: @anirudh since this was something you had to deal with for
> > spark-on-k8s.
> >
>
>
> --
> Anirudh Ramanathan
>


Re: Reply: Airflow REST API proof of concept.

2018-05-11 Thread Driesprong, Fokko
Hi Luke,

This is the REST API for the new UI:
https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/api/experimental/endpoints.py

RBAC = Role-Based Access Control, the fine-grained security model based on the
fabmanager. Recently we've added some endpoints to it. In the end all the GUI
Ajax calls should also go through this API, instead of calling the Flask
endpoints directly.
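
As a quick illustration (host, auth and DAG id below are placeholders; check
the endpoints.py linked above for the exact routes available in your
version), triggering a DAG run through the experimental API looks roughly
like this:

import json

import requests

resp = requests.post(
    'http://localhost:8080/api/experimental/dags/example_bash_operator/dag_runs',
    data=json.dumps({'conf': {'triggered_by': 'rest-client'}}),
    headers={'Content-Type': 'application/json'},
)
resp.raise_for_status()
print(resp.json())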

Cheers, Fokko






2018-05-11 14:34 GMT+02:00 Luke Diment :

> Our build pipeline uses Jenkinsfile with Docker kubernetes and helm...we
> orchestrate deployment against our rest api and use junit to assert our
> results...fully programmatically against airflow...!
>
> Sent from my iPhone
>
> > On 12/05/2018, at 12:31 AM, Luke Diment 
> wrote:
> >
> > No it executes the backend airflow command line over HTTP giving
> developers room to freely interact with airflow programmatically...hence
> you can easily integration test your business logic...
> >
> > Sent from my iPhone
> >
> >> On 12/05/2018, at 12:27 AM, Song Liu  wrote:
> >>
> >> So that this Java REST API server is talking to the meta db directly ?
> >> 
> >> From: Luke Diment 
> >> Sent: 11 May 2018 12:22
> >> To: dev@airflow.incubator.apache.org
> >> Subject: Fwd: Airflow REST API proof of concept.
> >>
> >> FYI.
> >>
> >> Sent from my iPhone
> >>
> >> Begin forwarded message:
> >>
> >> From: Luke Diment >
> >> Date: 11 May 2018 at 1:02:43 PM NZST
> >> To: "dev-ow...@airflow.incubator.apache.org airflow.incubator.apache.org>"  >
> >> Subject: Fw: Airflow REST API proof of concept.
> >>
> >>
> >> FYI.
> >>
> >> 
> >> From: Luke Diment
> >> Sent: Thursday, May 10, 2018 4:33 PM
> >> To: dev-subscr...@airflow.incubator.apache.org v-subscr...@airflow.incubator.apache.org>
> >> Subject: Airflow REST API proof of concept.
> >>
> >>
> >> Hi Airflow contributors,
> >>
> >>
> >> I am a Java developer/full stack and lots of other stuff at Westpac
> Bank New Zealand.
> >>
> >>
> >> We currently use Airflow for task scheduling for a rather large
> integration project for financial risk assessment.
> >>
> >>
> >> During our development phase we started to understand that a REST API
> in front of Airflow would be a great idea.
> >>
> >>
> >> We realise that you guys have detailed there will a REST API at some
> stage.
> >>
> >>
> >> We have already built a proof of concept REST API implementation in
> Java (of course...;-))...
> >>
> >>
> >> We were wondering if your contributor group would find this helpful or
> if there would be any reason to continue such an API in Java.
> >>
> >>
> >> We look forward to your response.  We can share the code if needed...
> >>
> >>
> >> Thanks,
> >>
> >>
> >> Luke Diment.
> >>
> >>
> >>
> >>
> >>
> >> The contents of this email and any attachments are confidential and may
> be legally privileged. If you are not the intended recipient please advise
> the sender immediately and delete the email and attachments. Any use,
> dissemination, reproduction or distribution of this email and any
> attachments by anyone other than the intended recipient is prohibited.
> >
> >
> >
> > The contents of this email and any attachments are confidential and may
> be legally privileged. If you are not the intended recipient please advise
> the sender immediately and delete the email and attachments. Any use,
> dissemination, reproduction or distribution of this email and any
> attachments by anyone other than the intended recipient is prohibited.
>
>
>
> The contents of this email and any attachments are confidential and may be
> legally privileged. If you are not the intended recipient please advise the
> sender immediately and delete the email and attachments. Any use,
> dissemination, reproduction or distribution of this email and any
> attachments by anyone other than the intended recipient is prohibited.
>


Re: Airflow 1.10.0

2018-05-02 Thread Driesprong, Fokko
Hi Bryon,

We'll be releasing the RCs soon. If there aren't many issues with the RCs,
the release will follow quickly. But we need the community to test them.

Cheers, Fokko

2018-05-01 20:57 GMT+02:00 Wicklund, Bryon :

> Hey I was wondering if you had a date in mind or an estimate for when
> Airflow 1.10.0 will be released?
>
> Thanks!
> -Bryon
>
> This e-mail, including attachments, may include confidential and/or
> proprietary information, and may be used only by the person or entity
> to which it is addressed. If the reader of this e-mail is not the intended
> recipient or his or her authorized agent, the reader is hereby notified
> that any dissemination, distribution or copying of this e-mail is
> prohibited. If you have received this e-mail in error, please notify the
> sender by replying to this message and delete this e-mail immediately.
>


Re: Problem with SparkSubmit

2018-04-28 Thread Driesprong, Fokko
Hi Anton,

Which version of Airflow are you running?

Cheers, Fokko

2018-04-27 10:24 GMT+02:00 Anton Mushin :

> Hi all,
> I have problem with spark operator. I get exception
>
> user@host:/# airflow test myDAG myTask 2018-04-26
> [2018-04-26 15:32:11,279] {driver.py:120} INFO - Generating grammar tables
> from /usr/lib/python3.5/lib2to3/Grammar.txt
> [2018-04-26 15:32:11,323] {driver.py:120} INFO - Generating grammar tables
> from /usr/lib/python3.5/lib2to3/PatternGrammar.txt
> [2018-04-26 15:32:11,456] {__init__.py:45} INFO - Using executor
> SequentialExecutor
> [2018-04-26 15:32:11,535] {models.py:189} INFO - Filling up the DagBag
> from /usr/local/airflow/dags
> [2018-04-26 15:32:11,811] {base_hook.py:80} INFO - Using connection to:
> sparkhost
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 27, in 
> args.func(args)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line
> 528, in test
> ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py", line
> 50, in wrapper
> result = func(*args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/models.py", line
> 1584, in run
> session=session)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py", line
> 50, in wrapper
> result = func(*args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/models.py", line
> 1493, in _run_raw_task
> result = task_copy.execute(context=context)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/contrib/
> operators/spark_submit_operator.py", line 145, in execute
> self._hook.submit(self._application)
>   File 
> "/usr/local/lib/python3.5/dist-packages/airflow/contrib/hooks/spark_submit_hook.py",
> line 231, in submit
> **kwargs)
>   File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
> restore_signals, start_new_session)
>   File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child
> restore_signals, start_new_session, preexec_fn)
> TypeError: Can't convert 'list' object to str implicitly
>
> My DAG look like:
>
> from airflow import DAG
> from datetime import datetime, timedelta, date
> from airflow.contrib.operators.spark_submit_operator import
> SparkSubmitOperator
>
> default_args = {
> 'owner': 'spark',
> 'depends_on_past': False,
> 'start_date': datetime.now(),
> 'retries': 1,
> 'retry_delay': timedelta(minutes=1)
> }
>
> dag = DAG('myDAG', default_args=default_args,)
>
> connection_id = "SPARK"
> os.environ[('AIRFLOW_CONN_%s' % connection_id)] = 'spark://sparkhost:7077'
>
> _config = {
> 'jars': 'spark_job.jar',
> 'executor_memory': '2g',
> 'name': 'myJob',
> 'conn_id': connection_id,
> 'java_class':'org.Job'
> }
>
> operator = SparkSubmitOperator(
> task_id='myTask',
> dag=dag,
> **_config
> )
>
> What is wrong? Could somebody help me wit it?
>
>


Re: Use KubernetesExecutor to launch tasks into a Dask cluster in Kubernetes

2018-04-28 Thread Driesprong, Fokko
Also one of the main benefits of the Kubernetes Executor is having a Docker
image that contains all the dependencies that you need for your job.
Personally I would switch to Kubernetes when it leaves the experimental
stage.

Cheers, Fokko

2018-04-28 16:27 GMT+02:00 Kyle Hamlin :

> I don't have a Dask cluster yet, but I'm interested in taking advantage of
> it for ML tasks. My use case would be bursting a lot of ML jobs into a
> Dask cluster all at once.
> From what I understand, Dask clusters utilize caching to help speed up jobs
> so I don't know if it makes sense to launch a Dask cluster for every single
> ML job. Conceivably, I could just have a single Dask worker running 24/7
> and when its time to burst k8 could autoscale the Dask workers as more ML
> jobs are launched into the Dask cluster?
>
> On Fri, Apr 27, 2018 at 10:35 PM Daniel Imberman <
> daniel.imber...@gmail.com>
> wrote:
>
> > Hi Kyle,
> >
> > So you have a static Dask cluster running your k8s cluster? Is there any
> > reason you wouldn't just launch the Dask cluster for the job you're
> running
> > and then tear it down? I feel like with k8s the elasticity is one of the
> > main benefits.
> >
> > On Fri, Apr 27, 2018 at 12:32 PM Kyle Hamlin 
> wrote:
> >
> > > Hi all,
> > >
> > > If I have a Kubernetes cluster running in DCOC and a Dask cluster
> running
> > > in that same Kubernetes cluster is it possible/does it makes sense to
> use
> > > the KubernetesExecutor to launch tasks into the Dask cluster (these are
> > ML
> > > jobs with sklearn)? I feel like there is a bit of inception going on
> here
> > > in my mind and I just want to make sure a setup like this makes sense?
> > > Thanks in advance for anyone's input!
> > >
> >
>
>
> --
> Kyle Hamlin
>


Re: k8s example DAGs

2018-04-23 Thread Driesprong, Fokko
Hi Ruslan,

This is a good point. I also get 'No module named kubernetes' exceptions when
running initdb. This should be fixed. Could you create a Jira ticket for
this?

Cheers, Fokko

2018-04-23 4:28 GMT+02:00 Ruslan Dautkhanov :

>  Is it possible to make kubernetes examples installed optionally?
>
> We don't use Kubernetes and a bare Airflow install fills logs with
> following :
>
> 2018-04-22 19:49:04,718 ERROR - Failed to import:
> > /opt/airflow/airflow-20180420/src/apache-airflow/airflow/
> > example_dags/example_kubernetes_operator.py
> > Traceback (most recent call last):
> >   File "/opt/airflow/airflow-20180420/src/apache-airflow/
> airflow/models.py",
> > line 300, in process_file
> > m = imp.load_source(mod_name, filepath)
> >   File "/opt/airflow/airflow-20180420/src/apache-airflow/
> > airflow/example_dags/example_kubernetes_operator.py", line 19, in
> 
> > from airflow.contrib.operators.kubernetes_pod_operator import
> > KubernetesPodOperator
> >   File "/opt/airflow/airflow-20180420/src/apache-airflow/
> > airflow/contrib/operators/kubernetes_pod_operator.py", line 21, in
> > 
> > from airflow.contrib.kubernetes import kube_client, pod_generator,
> > pod_launcher
> >   File "/opt/airflow/airflow-20180420/src/apache-airflow/
> > airflow/contrib/kubernetes/pod_launcher.py", line 25, in 
> > from kubernetes import watch
> > ImportError: No module named kubernetes
>
>
> Would be great to make examples driven by what modules installed if they
> have external dependencies,
>
>
> Thanks!
>
> Ruslan Dautkhanov
>


Run duration

2018-04-19 Thread Driesprong, Fokko
Hi all,

I have a question regarding the run_duration setting. Is anyone still using
this? You can use it to kill the scheduler after a predefined number of
seconds. I know this was implemented in the early days to force a restart of
the Airflow scheduler once in a while.

Recently I was at a customer that had issues with Airflow; they were using
this setting and the scheduler would exit in a corrupt state. Therefore I
would like to remove it from Airflow, so I would like to know if people are
still relying on it.

https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L520
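
For context, this is the setting I mean (there is also a matching
`airflow scheduler --run-duration` flag); the snippet just illustrates the
current default:

[scheduler]
# -1 = run continuously; a positive value makes the scheduler exit after that
# many seconds so an external supervisor (systemd, runit, ...) can restart it.
run_duration = -1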

Cheers, Fokko


Re: Trouble with remote s3 logging

2018-04-17 Thread Driesprong, Fokko
Hi Kyle,

Thanks for reaching out. This looks like a bug. Could you open a Jira issue
describing it?

This seems like something that should be rather easy to fix, since the
handler tries to append a tuple instead of a string. I would expect more
people to run into this bug.

Cheers, Fokko

2018-04-16 21:09 GMT+02:00 Kyle Hamlin :

> This morning I tried to upgrade to the newer version of the logging config
> file but I keep getting the following a TypeError for my database session.
> I know my credentials are correct so I'm confused why this is happening
> now.
>
> Has anyone experiences this? Note that I'm installing Airflow from master.
>
> *Config*
> AIRFLOW__CORE__LOGGING_LEVEL=WARN
> AIRFLOW__CORE__REMOTE_LOGGING=True
> AIRFLOW__CORE__REMOTE_LOG_CONN_ID=s3_logger
> AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://airflow/logs
> AIRFLOW__CORE__LOGGING_CONFIG_CLASS=config.log_config.
> DEFAULT_LOGGING_CONFIG
> AIRFLOW__SCHEDULER__CHILD_PROCESS_LOG_DIRECTORY=s3://airflow/logs
>
> *Session NoneType error*
>  Traceback (most recent call last):
>File
> "/usr/local/lib/python3.6/site-packages/airflow/utils/
> log/s3_task_handler.py",
> line 171, in s3_write
>  encrypt=configuration.conf.getboolean('core', 'ENCRYPT_S3_LOGS'),
>File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py",
> line 274, in load_string
>  encrypt=encrypt)
>File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py",
> line 313, in load_bytes
>  client = self.get_conn()
>File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py",
> line 34, in get_conn
>  return self.get_client_type('s3')
>File
> "/usr/local/lib/python3.6/site-packages/airflow/contrib/
> hooks/aws_hook.py",
> line 151, in get_client_type
>  session, endpoint_url = self._get_credentials(region_name)
>File
> "/usr/local/lib/python3.6/site-packages/airflow/contrib/
> hooks/aws_hook.py",
> line 97, in _get_credentials
>  connection_object = self.get_connection(self.aws_conn_id)
>File
> "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line
> 82, in get_connection
>  conn = random.choice(cls.get_connections(conn_id))
>File
> "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line
> 77, in get_connections
>  conns = cls._get_connections_from_db(conn_id)
>File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line
> 72, in wrapper
>  with create_session() as session:
>File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
>  return next(self.gen)
>File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line
> 41, in create_session
>  session = settings.Session()
>  TypeError: 'NoneType' object is not callable
>
> *TypeError must be str not tuple*
>  [2018-04-16 18:37:28,200] ERROR in app: Exception on
> /admin/airflow/get_logs_with_metadata [GET]
>  Traceback (most recent call last):
>File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982,
> in wsgi_app
>  response = self.full_dispatch_request()
>File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614,
> in full_dispatch_request
>  rv = self.handle_user_exception(e)
>File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517,
> in handle_user_exception
>  reraise(exc_type, exc_value, tb)
>File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line
> 33,
> in reraise
>  raise value
>File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612,
> in full_dispatch_request
>  rv = self.dispatch_request()
>File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598,
> in dispatch_request
>  return self.view_functions[rule.endpoint](**req.view_args)
>File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line
> 69, in inner
>  return self._run_view(f, *args, **kwargs)
>File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line
> 368, in _run_view
>  return fn(self, *args, **kwargs)
>File "/usr/local/lib/python3.6/site-packages/flask_login.py", line 755,
> in decorated_view
>  return func(*args, **kwargs)
>File "/usr/local/lib/python3.6/site-packages/airflow/www/utils.py",
> line
> 269, in wrapper
>  return f(*args, **kwargs)
>File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line
> 74, in wrapper
>  return func(*args, **kwargs)
>File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py",
> line
> 770, in get_logs_with_metadata
>  logs, metadatas = handler.read(ti, try_number, metadata=metadata)
>File
> "/usr/local/lib/python3.6/site-packages/airflow/utils/
> log/file_task_handler.py",
> line 165, in read
>  logs[i] += log
>  TypeError: must be str, not tuple
>


Re: Give it up for Fokko!

2018-04-14 Thread Driesprong, Fokko
Thank you all for the kind words. I really love the energetic community
around Airflow and all the cool stuff that we're working on. I find it
truly amazing how we build such an awesome product with great people from
all around the world!

Cheers, Fokko

2018-04-14 1:56 GMT+02:00 Sid Anand :

> +100
>
> On Fri, Apr 13, 2018 at 4:08 PM, Alex Tronchin-James 949-412-7220 <
> alex.n.ja...@gmail.com> wrote:
>
> > Bravo!!! Bien fait!
> >
> > On Fri, Apr 13, 2018 at 3:54 PM Joy Gao  wrote:
> >
> > >  
> > >
> > > On Fri, Apr 13, 2018 at 11:47 AM, Naik Kaxil  wrote:
> > >
> > > > Couldn't agree more. Thanks Fokko
> > > >
> > > > On 13/04/2018, 17:56, "Maxime Beauchemin" <
> maximebeauche...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > Hey all,
> > > >
> > > > I wanted to point out the amazing work that Fokko is doing,
> > > > reviewing/merging PRs and doing fantastic committer & maintainer
> > > work.
> > > > It
> > > > takes a variety of contributions to make projects like Airflow
> > > thrive,
> > > > but
> > > > without this kind of involvement it wouldn't be possible to keep
> > > > shipping
> > > > better versions of the product steadily.
> > > >
> > > > Cheers to that!
> > > >
> > > > Max
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Kaxil Naik
> > > >
> > > > Data Reply
> > > > 38 Grosvenor Gardens
> > > > London SW1W 0EB - UK
> > > > phone: +44 (0)20 7730 6000 <+44%2020%207730%206000>
> > > > k.n...@reply.com
> > > > www.reply.com
> > > >
> > >
> >
>


Re: Cancel a Running dag

2018-04-13 Thread Driesprong, Fokko
Like Bolke said, it has been fixed in master. One of the prerequisites is
support by the operator. For example, the Spark operator has implemented
how to kill the Spark job on YARN, local and Kubernetes. If you are running
something else, you might want to check whether this is implemented.

Implemented on_kill: https://github.com/apache/incubator-airflow/pull/3204
An example of the on_kill:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py#L485-L534
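
For a custom operator the pattern is roughly the sketch below; the
_submit_job/_wait_for_job/_cancel_job helpers are placeholders for whatever
client your external system uses:

from airflow.models import BaseOperator


class MyExternalJobOperator(BaseOperator):

    def execute(self, context):
        # Placeholder helpers: submit the job and block until it finishes.
        self._job_id = self._submit_job()
        self._wait_for_job(self._job_id)

    def on_kill(self):
        # Called when the running task instance is terminated externally,
        # e.g. when the task is cleared in the UI on current master.
        if getattr(self, '_job_id', None):
            self._cancel_job(self._job_id)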

Cheers, Fokko

2018-04-12 21:19 GMT+02:00 Bolke de Bruin :

> This is now fixed in master. Clearing tasks will now properly terminate a
> running task. If you pause the dag run no new tasks will be scheduled.
>
> B.
>
>
>
> Verstuurd vanaf mijn iPad
>
> > Op 12 apr. 2018 om 20:23 heeft Laura Lorenz 
> het volgende geschreven:
> >
> > That won't stop them if they are already running in a celery worker or
> > already in your messaging queue backend (e.g. rabbitmq; redis), but it
> will
> > prevent the message to do them from being emitted again by the airflow
> > scheduler to your messaging queue backend. To be thorough you have to do
> > both - stop the scheduler from scheduling the tasks anymore (by failing
> > them individually and/or the DagRun in the metadata database) and, if you
> > want to make sure the tasks that already got picked up stop and don't try
> > again, you have to kill their worker processes and make sure your
> messaging
> > queue is clean of messages of that task type. If you don't care that any
> > already started or queued up tasks finish, you can simply doctor the
> > metadata database.
> >
> > Laura
> >
> > On Thu, Apr 12, 2018 at 12:40 PM, ramandu...@gmail.com <
> ramandu...@gmail.com
> >> wrote:
> >
> >> Thanks Laura,
> >> We are using the CeleryExecutor. Just wondering if marking the
> >> TaskInstances as failed in metadata store would also work.
> >> -Raman
> >>
> >>> On 2018/04/12 16:27:00, Laura Lorenz  wrote:
> >>> I use the CeleryExecutor and have used a mix of `celery control` and
> >>> messaging queue purges to kill the running tasks and prevent them from
> >>> being picked up by workers again (respectively), and doctor the DagRun
> to
> >>> failed to stop the scheduler from repopulating the message. I think if
> >> you
> >>> are using the Local or Sequential Executor you'd have to kill the
> >> scheduler
> >>> process.
> >>>
> >>> Laura
> >>>
> >>> On Thu, Apr 12, 2018 at 12:05 PM, Taylor Edmiston  >
> >>> wrote:
> >>>
>  I don't think killing a currently running task is possible today.
> 
>  Of course you can pause it from the CLI or web UI so that future runs
> >> don't
>  get triggered, but it sounds like that's not what you're looking for.
> 
>  Best,
>  Taylor
> 
>  *Taylor Edmiston*
>  Blog  | Stack Overflow CV
>   | LinkedIn
>   | AngelList
>  
> 
> 
>  On Thu, Apr 12, 2018 at 11:26 AM, ramandu...@gmail.com <
>  ramandu...@gmail.com
> > wrote:
> 
> > Hi All,
> > We have a use case to cancel the already running DAG. So is there any
> > recommended way to do so.
> >
> > Thanks,
> > Raman
> >
> 
> >>>
> >>
>


Re: Readthedocs

2018-04-10 Thread Driesprong, Fokko
Thanks Kaxil!

Cheers, Fokko

2018-04-10 14:19 GMT+02:00 Naik Kaxil <k.n...@reply.com>:

> Added you.
>
> https://readthedocs.org/projects/airflow/
>
> Regards,
> Kaxil
>
>
> On 10/04/2018, 13:04, "fo...@driesprongen.nl on behalf of Driesprong,
> Fokko" <fo...@driesprongen.nl on behalf of fo...@driesprong.frl> wrote:
>
> Thanks Kaxil,
>
> Thanks for the quick response! My username is Fokko
>
> Cheers, Fokko
>
> 2018-04-10 12:56 GMT+02:00 Naik Kaxil <k.n...@reply.com>:
>
> > Additionally, I have also updated the docs to reflect the latest
> change.
> >
> > https://airflow.readthedocs.io/en/latest/index.html
> >
> > https://airflow.readthedocs.io/en/latest/start.html
> >
> > Regards,
> > Kaxil
> >
> > On 10/04/2018, 11:54, "Naik Kaxil" <k.n...@reply.com> wrote:
> >
> > Can you create an account on readthedocs and give me your user
> id?
> > I'll add you to it.
> >
> > Also,
> >
> > You can run the "sphinx-build -b html . ~/Desktop/airflowdoc"
> command
> > on your local machine to generate the docs and
> "incubator-airflow/docs/_build/html'
> > directory to see the generated docs.
> >
> > Regards,
> > Kaxil
> >
> >
> > On 10/04/2018, 08:05, "fo...@driesprongen.nl on behalf of
> Driesprong,
> > Fokko" <fo...@driesprongen.nl on behalf of fo...@driesprong.frl>
> wrote:
> >
> > Hi all,
> >
> > Would it be possible to give me access to the Readthedocs
> > environment. I'm
> > working on improving the docs a bit,
> > <https://github.com/apache/incubator-airflow/pull/3201> but
> I
> > can't see the
> > output of the docs generation.
> >
> > Cheers, Fokko
> >
> >
> >
> >
> >
> >
> > Kaxil Naik
> >
> > Data Reply
> > 38 Grosvenor Gardens
> > London SW1W 0EB - UK
> > phone: +44 (0)20 7730 6000
> > k.n...@reply.com
> > www.reply.com
> >
> >
> >
> >
> >
> >
> > Kaxil Naik
> >
> > Data Reply
> > 38 Grosvenor Gardens
> > London SW1W 0EB - UK
> > phone: +44 (0)20 7730 6000
> > k.n...@reply.com
> > www.reply.com
> >
>
>
>
>
>
>
> Kaxil Naik
>
> Data Reply
> 38 Grosvenor Gardens
> London SW1W 0EB - UK
> phone: +44 (0)20 7730 6000
> k.n...@reply.com
> www.reply.com
>


Readthedocs

2018-04-10 Thread Driesprong, Fokko
Hi all,

Would it be possible to give me access to the Readthedocs environment? I'm
working on improving the docs a bit, but I can't see the
output of the docs generation.

Cheers, Fokko


Re: Issue : Merges are not syncing back to master

2018-04-09 Thread Driesprong, Fokko
Thanks for picking this up Sid!

Cheers, Fokko

2018-04-06 7:22 GMT+02:00 Sid Anand :

> Heads up. Apache Airflow github master
>  is behind Apache Airflow
> master
>  airflow.git;a=summary>
> currently. I've opened a ticket to fix this -- currently, merges are not
> syncing back to master .
>
> https://issues.apache.org/jira/browse/INFRA-16303
>
> -s
>


Re: Running dag run doesn't schedule task

2018-03-19 Thread Driesprong, Fokko
Hi David,

First I would update to Apache Airflow 1.9.0; there have been a lot of
fixes between 1.8.2 and 1.9.0. Then we can see whether the bug is still there.

Cheers, Fokko

2018-03-18 19:41 GMT+01:00 David Capwell :

> Thanks for the reply
>
> Our script doesn't set it so should be off; the process does not normally
> restart (monitoring has a counter for number of restarts since deploy,
> currently as 0)
>
> At the point in time the UI showed the upstream tasks as green (success);
> we manually ran tasks so no longer in the same state, so can't check UI
> right now
>
> On Sun, Mar 18, 2018, 11:34 AM Bolke de Bruin  wrote:
>
> > Are you running with num_runs? If so disable it. We have seen this
> > behavior with num_runs. Also you can find out by clicking on the task if
> > there is a dependency issue.
> >
> > B.
> >
> > Verstuurd vanaf mijn iPad
> >
> > > Op 18 mrt. 2018 om 19:08 heeft David Capwell  het
> > volgende geschreven:
> > >
> > > We just started seeing this a few days ago after turning on SLA for our
> > > tasks (not saying SLA did this, may have been happening before and not
> > > noticing), but we have a dag that runs once a hour and we see that 4-5
> > dag
> > > runs are marked running but tasks are not getting scheduled.  When we
> get
> > > the SLA alert the action we are doing right now is going to the UI and
> > > clicking run on tasks manually; this is only needed for the oldest dag
> > run
> > > and the rest recover after that. In the past 3 days this has happened
> > twice
> > > to us.
> > >
> > > We are running 1.8.2, are there any known jira about this? Don't know
> > > scheduler well, what could I do to see why these tasks are getting
> > skipped
> > > without manual intervention?
> > >
> > > Thanks for your time.
> >
>


Re: Very looong py files

2018-03-11 Thread Driesprong, Fokko
Maybe wait until this one has been merged:
https://github.com/apache/incubator-airflow/pull/3116/files

Cheers, Fokko

Op zo 11 mrt. 2018 om 21:34 schreef Driesprong, Fokko <fo...@driesprong.frl>

> Hi Bruno,
>
> I agree that there are files that are too big as you mentioned. Sometimes
> structures are being refactored like I did with the sensors.
>
> The problem is that if you refactor one of these files, most PR requests
> will have merge conflicts. But this should not be seen as an impediment to
> increase the quality of Airflow’s code.
> Especially the models.py should be fairly easy to refactor and split into
> different files. This file grew due historical reasons. If you feel like
> picking up this task, I encourage you to create a Jira for it and start
> cracking.
>
> Cheers, Fokko
>
> Op zo 11 mrt. 2018 om 20:11 schreef Bruno Bonagura <bbonagu...@gmail.com>
>
>> I mean 'big files into submodules'.
>>
>> On Sun, Mar 11, 2018 at 4:09 PM, Bruno Bonagura <bbonagu...@gmail.com>
>> wrote:
>>
>> > Hello,
>> >
>> > I'm pretty newbie to the project, so I apologize in advance if I'm being
>> > too silly.
>> >
>> > Is there any plan or goal to refactor big files into packages, like it
>> > happened to sensors.py? Has it been tried before with the files I list
>> > bellow and failed? I searched Jira for 'refactor' and didn't find much.
>> >
>> > Airflow codebase has some giant ugly files, what makes it difficult to
>> > find things and organize. Also it makes difficult for new contributors
>> to
>> > understand the code and could even keep some from contributing, having
>> > impact in the community growth.
>> >
>> > Here are the bigger files:
>> >
>> > 5008 airflow/models.py
>> > 3121 tests/jobs.py
>> > 2844 airflow/www/views.py
>> > 2595 airflow/jobs.py
>> > 2494 tests/core.py
>> > 1847 tests/models.py
>> > 1674 airflow/bin/cli.py
>> > 1487 airflow/contrib/hooks/bigquery_hook.py
>> > 1045 airflow/contrib/operators/dataproc_operator.py
>> >
>> > I've been playing and experimenting with models.py a little. It's
>> > difficult, but it might be possible.
>> >
>> > Well, that's it.
>> >
>> > Best regards.
>> > Bruno
>> >
>>
>


Re: Very looong py files

2018-03-11 Thread Driesprong, Fokko
Hi Bruno,

I agree that there are files that are too big, as you mentioned. Sometimes
structures do get refactored, like I did with the sensors.

The problem is that if you refactor one of these files, most open pull
requests will have merge conflicts. But this should not be seen as an
impediment to increasing the quality of Airflow's code. Especially models.py
should be fairly easy to refactor and split into different files; this file
grew for historical reasons. If you feel like picking up this task, I
encourage you to create a Jira for it and start cracking.
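
To keep existing DAG files working, the split could keep a thin
models/__init__.py that re-exports the old names; the module names below are
just an illustration of the idea:

# airflow/models/__init__.py (sketch)
from airflow.models.dag import DAG  # noqa: F401
from airflow.models.taskinstance import TaskInstance  # noqa: F401
from airflow.models.connection import Connection  # noqa: F401
# ... so `from airflow.models import DAG` keeps working in user code.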

Cheers, Fokko

Op zo 11 mrt. 2018 om 20:11 schreef Bruno Bonagura 

> I mean 'big files into submodules'.
>
> On Sun, Mar 11, 2018 at 4:09 PM, Bruno Bonagura 
> wrote:
>
> > Hello,
> >
> > I'm pretty newbie to the project, so I apologize in advance if I'm being
> > too silly.
> >
> > Is there any plan or goal to refactor big files into packages, like it
> > happened to sensors.py? Has it been tried before with the files I list
> > bellow and failed? I searched Jira for 'refactor' and didn't find much.
> >
> > Airflow codebase has some giant ugly files, what makes it difficult to
> > find things and organize. Also it makes difficult for new contributors to
> > understand the code and could even keep some from contributing, having
> > impact in the community growth.
> >
> > Here are the bigger files:
> >
> > 5008 airflow/models.py
> > 3121 tests/jobs.py
> > 2844 airflow/www/views.py
> > 2595 airflow/jobs.py
> > 2494 tests/core.py
> > 1847 tests/models.py
> > 1674 airflow/bin/cli.py
> > 1487 airflow/contrib/hooks/bigquery_hook.py
> > 1045 airflow/contrib/operators/dataproc_operator.py
> >
> > I've been playing and experimenting with models.py a little. It's
> > difficult, but it might be possible.
> >
> > Well, that's it.
> >
> > Best regards.
> > Bruno
> >
>


Re: Ash Berlin-Taylor joins Apache Airflow as committer and PPMC member

2018-02-24 Thread Driesprong, Fokko
Welcome Ash!

2018-02-23 18:45 GMT+01:00 Andy Hadjigeorgiou :

> Congrats Ash!
>
> > On Feb 23, 2018, at 12:19 PM, Taylor Edmiston 
> wrote:
> >
> > Congrats, Ash!  Very exciting.
> >
> > *Taylor Edmiston*
> > TEdmiston.com  | Blog
> > 
> > Stack Overflow CV  | LinkedIn
> >  | AngelList
> > 
> >
> >
> >> On Fri, Feb 23, 2018 at 12:12 PM, Joy Gao  wrote:
> >>
> >> Congrats and welcome!  :D
> >>
> >>> On Fri, Feb 23, 2018 at 8:27 AM, Sid Anand  wrote:
> >>>
> >>> Folks!
> >>> Please join the Airflow PPMC in welcoming Ash Berlin-Taylor to its
> ranks
> >> as
> >>> both committer and PPMC member! Congrats Ash!
> >>>
> >>> Announcement :
> >>> https://cwiki.apache.org/confluence/display/AIRFLOW/
> >>> Announcements#Announcements-Feb15,2018
> >>> Tweet : https://twitter.com/ApacheAirflow/status/967072083418128385
> >>>
> >>> The Apache Airflow PPMC
> >>>
> >>
>


Re: Scheduling error that seems to be related to @once since Airflow v1.9 upgrade

2018-02-12 Thread Driesprong, Fokko
Hi Aaron,

It looks like you hit a bug:
https://issues.apache.org/jira/browse/AIRFLOW-1977

Currently I don't have a solution for this at hand; I have to dig into the
code. Since your DAG is only triggered externally, you might want to try
setting the schedule_interval to None instead of @once:
https://airflow.apache.org/scheduler.html#dag-runs
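
Concretely, that would be something like this (a sketch based on your DAG):

from datetime import datetime

from airflow.models import DAG

dag = DAG(
    dag_id='scheduler_test_dag',
    start_date=datetime(2017, 9, 9, 4, 0, 0, 0),
    max_active_runs=1,
    schedule_interval=None,  # externally triggered only; avoids the '@once' path
)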

Cheers, Fokko


2018-02-12 18:30 GMT+01:00 Aaron Polhamus :

> Question on StackOverflow. Has anyone else dealt with this
> https://stackoverflow.com/questions/48752087/odd-
> typeerror-from-the-airflow-scheduler-has-usage-of-once-for-scheduler-int
>
> I have a super simple test DAG that looks like this:
>
> from datetime import datetime
> from airflow.models import DAG
> from airflow.operators.python_operator import PythonOperator
>
>
> DAG = DAG(
>   dag_id='scheduler_test_dag',
>   start_date=datetime(2017, 9, 9, 4, 0, 0, 0),  # EC2 time. Equal to 11pm Mexico time
>   max_active_runs=1,
>   schedule_interval='@once'  # externally triggered
>   )
>
> def ticker_function():
>     with open('/tmp/ticker', 'a') as outfile:
>         outfile.write('{}\n'.format(datetime.now()))
>
> time_ticker = PythonOperator(
>     task_id='time_ticker',
>     python_callable=ticker_function,
>     dag=DAG)
>
> Since upgrading to apache-airflow v1.9 this DAG is hung and won't run.
> Digging into the scheduler logs I found the error trace:
>
> [2018-02-12 17:03:06,259] {jobs.py:1754} INFO - DAG(s) dict_keys(['scheduler_test_dag']) retrieved from /home/ubuntu/airflow/dags/scheduler_test_dag.py
> [2018-02-12 17:03:06,315] {jobs.py:1386} INFO - Processing scheduler_test_dag
> [2018-02-12 17:03:06,320] {jobs.py:379} ERROR - Got an exception! Propagating...
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 371, in helper
>     pickle_dags)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py", line 50, in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 1792, in process_file
>     self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 1388, in _process_dags
>     dag_run = self.create_dag_run(dag)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py", line 50, in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 807, in create_dag_run
>     if next_start <= now:
> TypeError: unorderable types: NoneType() <= datetime.datetime()
>
> Where is this error coming from? The only thing that I can think of is
> that the usage of scheduler_interval='@once' has changed, which is the
> one thing that this DAG has in common with one other broken DAG on my
> server since the v1.9 upgrade. Otherwise it's the most basic DAG
> ever--doesn't seem like there should be a problem. Previously I was using
> the basic pip install before switching to the apache-airflow repo.
>
> Here's a screenshot of the Web UI. Everything seems to be working alright,
> except the top and bottom DAGS which have scheduling interval set to @once
>  and are indefinitely hung:
>
> [screenshot omitted]
> Any thoughts?
>
> --
>
>
> *Aaron Polhamus*
> *Chief Technology Officer *
>
> Cel (México): +52 (55) 1951-5612 <+52%2055%201951%205612>
> Cell (USA): +1 (206) 380-3948 <+1%20206-380-3948>
> Tel: +52 (55) 1168 9757 <+52%2055%201168%209757> - Ext. 181
>
> ***Por favor referirse a nuestra página web
>  para más información
> acerca de nuestras políticas de privacidad.*
>
>


Re: Airflow Documentation - Readthedocs

2018-02-11 Thread Driesprong, Fokko
Thanks Kaxil,

I think it is good to merge this now after:
- Changing the URLs to https
- Explicitly mentioning that it links to the master/latest docs

It would be great to host the docs of Airflow 1.9 using
https://pythonhosted.org/. I've tried to submit the apache-airflow package,
but I'm not allowed to do this. Does anyone have an idea how to resolve this?

Kind regards,
Fokko Driesprong


2018-02-11 22:29 GMT+01:00 Naik Kaxil <k.n...@reply.com>:

> I have updated the Pythonhosted links with the docs at
> http://airflow.incubator.apache.org/ . This has been documented at
> https://cwiki.apache.org/confluence/display/AIRFLOW/
> Building+and+deploying+the+docs by Chris Riccomini.
>
> I have also added the Readthedocs badge for the latest version. Have
> created a PR (https://github.com/apache/incubator-airflow/pull/3033).
>
> Regards,
> Kaxil
>
>
>
> On 12/02/2018, 01:54, "Naik Kaxil" <k.n...@reply.com> wrote:
>
> Hi all,
>
> I think instead of the old "airflow" docs on PyPi mentioned by @ash, I
> think the latest docs are also deleted from PyPI..
>
> The links for the docs on the GitHub home (Readme) are all broken.
>
> Should I change the all the links to readthedocs? Or can someone who
> has access to PyPI create the docs?
>
> What do you suggest?
>
> Regards,
> Kaxil
>
> On 11/02/2018, 06:03, "Driesprong, Fokko" <fo...@driesprong.frl>
> wrote:
>
> Vo
>
>
> Op zo 11 feb. 2018 om 01:31 schreef Maxime Beauchemin <
> maximebeauche...@gmail.com>
>
> > I just hit the "Destroy" button.
> >
> > Max
> >
> > On Sat, Feb 10, 2018 at 12:53 PM, Naik Kaxil <k.n...@reply.com>
> wrote:
> >
> > > From what I see the package owners at PyPi are mistercrunch,
> aoen, artwr
> > .
> > > Will it be possible for one of you to take care of this?
> > >
> > >
> > > On 10/02/2018, 20:26, "Ash Berlin-Taylor" <
> > ash_airflowl...@firemirror.com>
> > > wrote:
> > >
> > > Relatedly: can we remove the docs for the old "airflow"
> dist on pypi?
> > >
> > > If someone has permissions to manage the on pypi, can you
> go to
> > > https://pypi.python.org/pypi?%3Aaction=pkg_edit=airflow <
> > > https://pypi.python.org/pypi?:action=pkg_edit=airflow>
> and hit the
> > > "Destroy Documentation" button:
> > >
> > > > If you would like to DESTROY any existing documentation
> hosted at
> > > http://pythonhosted.org/airflow Use this button, There is no
> undo.
> > > This should hopefully mean fewer people find the older
> version of the
> > > docs form Google etc.
> > >
> > >
> > >
> > > -ash
> > >
> > >
> > >
> > > > On 9 Feb 2018, at 22:35, Naik Kaxil <k.n...@reply.com>
> wrote:
> > > >
> > > > Thanks Andy,
> > > >
> > > > It fails because of missing '.readthedocs.yml' (
> > > https://github.com/apache/incubator-airflow/blob/master/
> .readthedocs.yml
> > )
> > > file that forces it to use 'pip' to install packages and
> dependencies
> > > instead of using 'setup.py' file. I uploaded that file in my
> fork for
> > > V1.8-Stable branch and it passes there. I am not sure whether
> we should
> > do
> > > that or ignore V1.8 version.
> > > >
> > > > On 10/02/2018, 00:46, "Andy Hadjigeorgiou" <
> andyxha...@gmail.com>
> > > wrote:
> > > >
> > > >Absolutely - I did it for 1.9 stable and 1.8 stable.
> 1.8.2
> > > building failed (
> > > >https://readthedocs.org/projects/airflow/builds/
> 6724340/)
> > > >
> > > >
> > > >On Fri, Feb 9, 2018 at 12:56 PM, Naik Kaxil <
> k.n...@reply.com>
> > > wrote:
> > > >
> > > >> Thanks Andy for doing that. Can you please do the same
> for version
>  

Re: Airflow Documentation - Readthedocs

2018-02-10 Thread Driesprong, Fokko
Vo


Op zo 11 feb. 2018 om 01:31 schreef Maxime Beauchemin <
maximebeauche...@gmail.com>

> I just hit the "Destroy" button.
>
> Max
>
> On Sat, Feb 10, 2018 at 12:53 PM, Naik Kaxil <k.n...@reply.com> wrote:
>
> > From what I see the package owners at PyPi are mistercrunch, aoen, artwr
> .
> > Will it be possible for one of you to take care of this?
> >
> >
> > On 10/02/2018, 20:26, "Ash Berlin-Taylor" <
> ash_airflowl...@firemirror.com>
> > wrote:
> >
> > Relatedly: can we remove the docs for the old "airflow" dist on pypi?
> >
> > If someone has permissions to manage the on pypi, can you go to
> > https://pypi.python.org/pypi?%3Aaction=pkg_edit=airflow <
> > https://pypi.python.org/pypi?:action=pkg_edit=airflow> and hit the
> > "Destroy Documentation" button:
> >
> > > If you would like to DESTROY any existing documentation hosted at
> > http://pythonhosted.org/airflow Use this button, There is no undo.
> > This should hopefully mean fewer people find the older version of the
> > docs form Google etc.
> >
> >
> >
> > -ash
> >
> >
> >
> > > On 9 Feb 2018, at 22:35, Naik Kaxil <k.n...@reply.com> wrote:
> > >
> > > Thanks Andy,
> > >
> > > It fails because of missing '.readthedocs.yml' (
> > https://github.com/apache/incubator-airflow/blob/master/.readthedocs.yml
> )
> > file that forces it to use 'pip' to install packages and dependencies
> > instead of using 'setup.py' file. I uploaded that file in my fork for
> > V1.8-Stable branch and it passes there. I am not sure whether we should
> do
> > that or ignore V1.8 version.
> > >
> > > On 10/02/2018, 00:46, "Andy Hadjigeorgiou" <andyxha...@gmail.com>
> > wrote:
> > >
> > >Absolutely - I did it for 1.9 stable and 1.8 stable. 1.8.2
> > building failed (
> > >https://readthedocs.org/projects/airflow/builds/6724340/)
> > >
> > >
> > >On Fri, Feb 9, 2018 at 12:56 PM, Naik Kaxil <k.n...@reply.com>
> > wrote:
> > >
> > >> Thanks Andy for doing that. Can you please do the same for version
> > 1.9 and
> > >> 1.8.2 if possible? If Arthur adds me I will do it but if you can
> > spare some
> > >> time to do that, it would be great.
> > >>
> > >>
> > >>
> > >> On 09/02/2018, 22:58, "Andy Hadjigeorgiou" <andyxha...@gmail.com>
> > wrote:
> > >>
> > >>Absolutely, thanks!
> > >>
> > >>Updated the docs, build passing, Will think about a proper
> > cadence for
> > >> it.
> > >>
> > >>- Andy
> > >>
> > >>On Fri, Feb 9, 2018 at 11:16 AM, Arthur Wiedmer <
> > >> arthur.wied...@gmail.com>
> > >>wrote:
> > >>
> > >>> Done.
> > >>>
> > >>> Thanks for the help Andy!
> > >>>
> > >>> On Fri, Feb 9, 2018 at 6:56 AM, Andy Hadjigeorgiou <
> > >> andyxha...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> My username is andyxhadji, email is andyxha...@gmail.com.
> > >>>>
> > >>>> Thanks for the help Arthur!
> > >>>>
> > >>>> - Andy
> > >>>>
> > >>>> On Fri, Feb 9, 2018 at 9:46 AM, Arthur Wiedmer <
> > >> arthur.wied...@gmail.com
> > >>>>
> > >>>> wrote:
> > >>>>
> > >>>>> Someone needs to create an account on the site, I can then add
> > >> them as
> > >>> a
> > >>>>> maintainer for the project.
> > >>>>>
> > >>>>> Best,
> > >>>>> Arthur
> > >>>>>
> > >>>>> On Sun, Feb 4, 2018 at 8:14 AM, Bolke de Bruin <
> > >> bdbr...@gmail.com>
> > >>>> wrote:
> > >>>>>
> > >>>>>> Perfectly fine to me. It would be highly appreciated if this
> > >> could be
> > >>>>>> picked 

Re: Airflow Documentation - Readthedocs

2018-02-04 Thread Driesprong, Fokko
This is a good question, I see that artwr and Maxime are owners of the
project, maybe they can fix the build? Or add some more Airflow committers
as readthedocs project owners.

I get quite some questions from colleagues because they are reading old
docs :)

Cheers, Fokko

2018-02-04 14:32 GMT+01:00 Naik Kaxil :

> Hi guys,
>
>
>
> Are we still using http://airflow.readthedocs.io/ for latest
> documentation?
>
>
>
> I see that the last build was 2 months ago which failed..
>
>
>
> http://readthedocs.org/projects/airflow/builds/
>
>
>
> The documentation at http://pythonhosted.org/airflow/ will only be for
> the latest airflow version at PyPI (1.9 for now).
>
>
>
> It would be good to have documentation of the GitHub version at
> readthedocs.
>
>
>
> Regards,
>
> Kaxil
>
>
> Kaxil Naik
>
> Data Reply
> 38 Grosvenor Gardens
> 
> London SW1W 0EB - UK
> phone: +44 (0)20 7730 6000 <+44%2020%207730%206000>
> k.n...@reply.com
> www.reply.com
>
> [image: Data Reply]
>


Fixed Travis permission problem

2018-01-31 Thread Driesprong, Fokko
Hi all,

I've hotfixed the CI permission problem:

Exception:
Traceback (most recent call last):
  File "/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/pip/commands/wheel.py", line 199, in run
    if not wb.build():
  File "/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/pip/req/req_set.py", line 620, in _prepare_file
    session=self.session, hashes=hashes)
  File "/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/pip/download.py", line 809, in unpack_url
    unpack_file_url(link, location, download_dir, hashes=hashes)
  File "/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/pip/download.py", line 715, in unpack_file_url
    unpack_file(from_path, location, content_type, link)
  File "/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/pip/utils/__init__.py", line 599, in unpack_file
    flatten=not filename.endswith('.whl')
  File "/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/lib/python3.5/site-packages/pip/utils/__init__.py", line 482, in unzip_file
    zipfp = open(filename, 'rb')
PermissionError: [Errno 13] Permission denied: '/home/travis/.wheelhouse/Flask_Admin-1.5.0-py3-none-any.whl'

I've fixed this now by explicitly removing the file:
https://github.com/apache/incubator-airflow/pull/2993/files
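
In essence the workaround just deletes the cached wheel that pip can no longer
read before the build starts; roughly something like the line below. This is
only an illustration based on the path from the traceback above; the PR shows
the actual change and where it hooks into the CI scripts.

rm -f /home/travis/.wheelhouse/Flask_Admin-1.5.0-py3-none-any.whl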

I know this isn't the nicest way to fix it, but I don't have access to Travis
to wipe the cache myself. We can revert this once the file is evicted from
the cache. If there are any questions, please let me know.

Cheers, Fokko


Re: Airflow 1.8 CeleryExecutor: multiple servers in broker_url

2018-01-17 Thread Driesprong, Fokko
Hi Andrii,

Looking at the Airflow code, I'd say this should work:
https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/executors/celery_executor.py#L41

Which version of celery are you using?

Please note that when you update to 1.9.1, the Celery configuration changes
to make it more transparent. The Airflow config then mirrors the Celery
config:
https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#airflow-191
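
Regarding the parsing errors below: airflow.cfg uses plain ConfigParser
syntax, so the value can be neither a Python list nor a quoted string (the
stray leading quote is most likely why kombu reports an unknown transport).
As far as I know Celery accepts a semicolon-separated bare string for
failover brokers, so something along these lines should be closer. This is an
untested sketch that reuses the hosts from your mail, all on one line:

broker_url = pyamqp://usr:pwd@host1:5672//airflow1;pyamqp://usr:pwd@host2:5672//airflow1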

Cheers, Fokko




2018-01-17 11:51 GMT+01:00 Andrii Kinash :

> Dear Airflow Community,
>
> I was able to successfully setup my Celery executor pointing to one
> RabbitMQ server (broker).  However when I’m trying to setup it to point to
> multiple RabbitMQ servers for HA – it doesn’t seem to work, even thought
> I’m doing it exactly as it’s written in the Celery documentation:
> http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-broker_url
>
>
> When I’m using:
>
> > broker_url = [
> > 'pyamqp://usr:pwd@host1:5672//airflow1',
> > 'pyamqp://usr:pwd@host2:5672//airflow1'
> > ]
>
>
> I’m getting:
>
> > Jan 17 10:19:58 apph05 airflow[14606]: ConfigParser.ParsingError: File
> contains parsing errors: /…./airflow.cfg
> > Jan 17 10:19:58 apph05 airflow[14606]: [line 162]: ']\n’
>
>
> and when
>
> > broker_url = 'pyamqp://usr:pwd@host1:5672//airflow1;pyamqp://usr:pwd@
> host2:5672//airflow1
>
> getting:
>
> > Jan 17 10:22:44 apph05 airflow[15336]: File “/…./airflow/usr/local/lib/
> python2.7/dist-packages/kombu/transport/__init__.py", line 62, in
> resolve_transport
> > Jan 17 10:22:44 apph05 airflow[15336]: raise KeyError('No such
> transport: {0}'.format(transport))
>
>
> Also I tried different approaches, with or without quotes etc., but the
> end result was the same. I’m not really sure whether it’s a bug or am I
> doing something wrong?
>
> Thank you,
> Andrii


Re: Airflow 1.10

2018-01-14 Thread Driesprong, Fokko
I think 1.10 is a good idea. I'm working on this refactoring of the sensor
structure: https://github.com/apache/incubator-airflow/pull/2875

It would be awesome to get this in. At my current project we use sensors in a
few places, but there is still some work to be done. For example, not
allocating an executor slot to the sensors, but having a more sophisticated
way of poking.
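
To make the problem concrete: today a sensor is basically a task that sits in
a poll-and-sleep loop for its whole runtime, so it holds an executor slot even
while it is idle between pokes. A stripped-down, pure-Python sketch of that
behaviour (not the actual BaseSensorOperator code; names are illustrative):

import os
from datetime import datetime
from time import sleep


class SensorTimeout(Exception):
    """Stand-in for Airflow's AirflowSensorTimeout."""


def run_sensor(poke, poke_interval=60, timeout=60 * 60 * 24 * 7):
    """Call poke() until it returns True, sleeping between attempts.

    The loop runs inside the task process, so the sensor keeps its
    executor slot for the entire wait, even though it is idle most
    of the time.
    """
    started_at = datetime.utcnow()
    while not poke():
        if (datetime.utcnow() - started_at).total_seconds() > timeout:
            raise SensorTimeout('Gave up after %s seconds' % timeout)
        sleep(poke_interval)


# Example: this call blocks (and would hold a slot) until the marker
# file shows up or the timeout is reached.
run_sensor(lambda: os.path.exists('/tmp/_SUCCESS'), poke_interval=30)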

Cheers, Fokko



2018-01-12 21:19 GMT+01:00 Chris Riccomini :

> Just the operator (AIRFLOW-1517)
>
> On Fri, Jan 12, 2018 at 11:21 AM, Anirudh Ramanathan <
> ramanath...@google.com.invalid> wrote:
>
> > Sounds awesome. Is k8s support here referring to both the executor and
> the
> > operator?
> >
> > Thanks,
> >
> >
> > On Jan 12, 2018 11:18 AM, "Sid Anand"  wrote:
> >
> > > +1
> > >
> > >
> > > On Fri, Jan 12, 2018 at 10:56 AM, Chris Riccomini <
> criccom...@apache.org
> > >
> > > wrote:
> > >
> > > > Hey all,
> > > >
> > > > After some past discussion on Airflow 1.10 vs 2.0, I think we've
> > > converged
> > > > on a 1.10 as the next step. 1.10 will include:
> > > >
> > > > * Timezone changes
> > > > * Kubernetes support
> > > > * New UI
> > > >
> > > > The first two have been merged in, as I saw Bolke just merged K8s (I
> > saw
> > > a
> > > > few follow-on patches coming, though), and I think the new UI is
> > > probably a
> > > > couple of weeks out from a PR on master.
> > > >
> > > > What do people think of starting the release process on master in
> Feb?
> > > > Given that it took a month last time, I expect 1.10 to be released in
> > > > March. Thoughts?
> > > >
> > > > Cheers,
> > > > Chris
> > > >
> > >
> >
>


Re: Fix on_kill command for operators

2018-01-08 Thread Driesprong, Fokko
Yes, for Spark this should work, depending on the operator and the
implementation:
https://github.com/apache/incubator-airflow/blob/3e6babe8ed8f8f281b67aa3f4e03bf3cfc1bcbaa/airflow/contrib/hooks/spark_submit_hook.py#L412-L428
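
For reference, the linked hook already knows how to clean up the cluster
side; conceptually it boils down to the sketch below. The argument names are
illustrative, not the hook's real attributes, so see the link for the actual
code.

import subprocess


def kill_spark_job(submit_process, yarn_application_id=None):
    """Sketch of the cleanup the linked hook performs in on_kill().

    submit_process is the Popen handle of the local spark-submit call;
    yarn_application_id is the id parsed from its output.
    """
    # Stop the local spark-submit process if it is still alive.
    if submit_process is not None and submit_process.poll() is None:
        submit_process.kill()

    # In yarn cluster mode the driver keeps running on the cluster, so
    # the YARN application has to be killed explicitly as well.
    if yarn_application_id:
        subprocess.call(['yarn', 'application', '-kill', yarn_application_id])

The open question in the PR is really about making sure this kind of cleanup
gets invoked when a task is cleared, not about what the cleanup itself does.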

However this is a big change in behaviour. I'm curious about the opinion of
others.

Cheers,
Fokko


2018-01-08 14:12 GMT+01:00 Milan van der Meer <
milan.vanderm...@realimpactanalytics.com>:

> Any help? :)
>
> On Thu, Dec 14, 2017 at 8:12 PM, Milan van der Meer <
> milan.vanderm...@realimpactanalytics.com> wrote:
>
> > I recently opened the following PR:
> > https://github.com/apache/incubator-airflow/pull/2877
> >
> > The problem is that on_kill is not called for operators when you clear a
> > task from the UI.
> > That's problematic when working with e.g. Spark clusters, as the jobs on
> > the cluster need to be killed.
> >
> > The issue is in the core code of Airflow and I'm not familiar enough with
> > the inner workings there. So I could use some directions on this one from
> > people who are familiar.
> >
> > For more info, check out the PR.
> >
> > Kind regards,
> > Milan
> >
>
>
>
> --
>
> Milan van der Meer
>
> RealImpact Analytics | Big Data Consultant
> www.realimpactanalytics.com
>
> BE +32 498 45 96 22 | Skype milan.vandermeer.ria
>


Re: Airflow 1.9.0 is released

2018-01-03 Thread Driesprong, Fokko
Awesome work!

Cheers, Fokko

On Wed, 3 Jan 2018 at 18:50, Chris Riccomini wrote:

> Hey all,
>
> I have updated the docs as well:
>
> https://airflow.incubator.apache.org/
>
> Cheers,
> Chris
>
> On Wed, Jan 3, 2018 at 9:30 AM, Sid Anand  wrote:
>
> > Grazzi!
> > -s
> >
> > On Wed, Jan 3, 2018 at 9:03 AM, Chris Riccomini 
> > wrote:
> >
> > > Fixed!
> > >
> > > On Tue, Jan 2, 2018 at 9:23 PM, Niranda Perera <
> niranda...@cse.mrt.ac.lk
> > >
> > > wrote:
> > >
> > > > Hi sid,
> > > >
> > > > in here,
> > > >   - Announcements : https://cwiki.apache.org/
> > confluence/display/AIRFLOW/
> > > > Announcements#Announcements-Jan2,2018
> > > >
> > > > Source & Binary "Sdist" release link is broken! There's a space in
> the
> > > > middle ;-)
> > > >
> > > > Best regards
> > > >
> > > > Niranda Perera
> > > > Research Assistant
> > > > Dept of CSE, University of Moratuwa
> > > > niranda...@cse.mrt.ac.lk
> > > > +94 71 554 8430
> > > > https://lk.linkedin.com/in/niranda
> > > >
> > > > On Wed, Jan 3, 2018 at 5:14 AM, Sid Anand  wrote:
> > > >
> > > > > Woohoo!!! Thanks Chris & Bolke! Supreme accomplishment!
> > > > >
> > > > > I've updated:
> > > > >
> > > > >- Announcements : https://cwiki.apache.org/
> > > > confluence/display/AIRFLOW/
> > > > >Announcements#Announcements-Jan2,2018
> > > > > > > > > Announcements#Announcements-Jan2,2018>
> > > > >- Twitter : https://twitter.com/ApacheAirflow/status/
> > > > 948337902001405952
> > > > >- Updated our podling report with the new release :
> > > > >https://wiki.apache.org/incubator/January2018
> > > > >
> > > > > Again, great work getting this out!
> > > > > -s
> > > > >
> > > > > On Tue, Jan 2, 2018 at 2:57 PM, Marc Bollinger  >
> > > > wrote:
> > > > >
> > > > > > Phew! Great job, all involved! 
> > > > > >
> > > > > > On Tue, Jan 2, 2018 at 2:52 PM, Andy Loughran 
> > wrote:
> > > > > >
> > > > > > > Congratulations guys - lots and lots of voting and you got it
> > over
> > > > the
> > > > > > > line.
> > > > > > >
> > > > > > > happy new year!
> > > > > > >
> > > > > > > Andy
> > > > > > >
> > > > > > > On 2 January 2018 at 22:45, Arthur Wiedmer <
> > > arthur.wied...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Woohoo! 
> > > > > > > >
> > > > > > > > Thanks Chris!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jan 2, 2018 at 2:40 PM, Chris Riccomini <
> > > > > criccom...@apache.org
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Dear Airflow community,
> > > > > > > > >
> > > > > > > > > Airflow 1.9.0 was just released.
> > > > > > > > >
> > > > > > > > > The source release as well as the binary "sdist" release
> are
> > > > > > available
> > > > > > > > > here:
> > > > > > > > >
> > > > > > > > > https://dist.apache.org/repos/dist/release/incubator/
> > > > > > > > > airflow/1.9.0-incubating/
> > > > > > > > >
> > > > > > > > > We also made this version available on PyPi for convenience
> > > (`pip
> > > > > > > install
> > > > > > > > > apache-airflow`):
> > > > > > > > >
> > > > > > > > > https://pypi.python.org/pypi/apache-airflow
> > > > > > > > >
> > > > > > > > > Find the CHANGELOG here for more details:
> > > > > > > > >
> > > > > > > > >
> https://github.com/apache/incubator-airflow/blob/master/CHAN
> > > > > > GELOG.txt
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Chris
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Airflow 1.9.0rc8

2017-12-15 Thread Driesprong, Fokko
+1 binding

On Fri, 15 Dec 2017 at 23:39, Bolke de Bruin wrote:

> +1, binding
>
> Checked sigs, version, source is there (did not check build), bin is there.
>
> Bolke
>
> Sent from my iPad
>
> > Op 15 dec. 2017 om 23:31 heeft Joy Gao  het volgende
> geschreven:
> >
> > +1, binding
> >
> > Thank you Chris!
> >
> > On Fri, Dec 15, 2017 at 2:30 PM, Chris Riccomini 
> > wrote:
> >
> >> Hey all,
> >>
> >> (Last time, I hope)^2
> >>
> >> I have cut Airflow 1.9.0 RC8. This email is calling a vote on the
> release,
> >> which will last for 72 hours. Consider this my (binding) +1.
> >>
> >> Airflow 1.9.0 RC8 is available at:
> >>
> >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.9.0rc8/
> >>
> >> apache-airflow-1.9.0rc8+incubating-source.tar.gz is a source release
> that
> >> comes with INSTALL instructions.
> >> apache-airflow-1.9.0rc8+incubating-bin.tar.gz is the binary Python
> "sdist"
> >> release.
> >>
> >> Public keys are available at:
> >>
> >> https://dist.apache.org/repos/dist/release/incubator/airflow/
> >>
> >> The release contains no new JIRAs. Just a version fix.
> >>
> >> I also had to change the version number to exclude the `rc6` string as
> well
> >> as the "+incubating" string, so it's now simply 1.9.0. This will allow
> us
> >> to rename the artifact without modifying the artifact checksums when we
> >> actually release.
> >>
> >> See JIRAs that were in 1.9.0RC7 and before (see previous VOTE email for
> >> full list).
> >>
> >> Cheers,
> >> Chris
> >>
>


Re: [VOTE] Airflow 1.9.0rc6

2017-12-12 Thread Driesprong, Fokko
+1 from my side

Cheers, Fokko

On Tue, 12 Dec 2017 at 17:28, Ash Berlin-Taylor <ash_airflowl...@firemirror.com> wrote:

> +0.5 from me.
>
> Our big test will come on Thursday morning, but it's looking good so far:
> the small daily dags we've got are running okay, logs are showing up, and
> making their way to S3.
>
> -ash
>
> > On 11 Dec 2017, at 18:50, Chris Riccomini  wrote:
> >
> > Hey all,
> >
> > I have cut Airflow 1.9.0 RC6. This email is calling a vote on the
> release,
> > which will last for 72 hours. Consider this my (binding) +1.
> >
> > Airflow 1.9.0 RC6 is available at:
> >
> > https://dist.apache.org/repos/dist/dev/incubator/airflow/1.9.0rc6/
> >
> > apache-airflow-1.9.0rc6+incubating-source.tar.gz is a source release that
> > comes with INSTALL instructions.
> > apache-airflow-1.9.0rc6+incubating-bin.tar.gz is the binary Python
> "sdist"
> > release.
> >
> > Public keys are available at:
> >
> > https://dist.apache.org/repos/dist/release/incubator/airflow/
> >
> > The release contains the following JIRAs:
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-1897
> > https://issues.apache.org/jira/browse/AIRFLOW-1873
> > https://issues.apache.org/jira/browse/AIRFLOW-1896
> >
> > Along with all JIRAs that were in 1.9.0RC5 (see previous VOTE email for
> > full list).
> >
> > Cheers,
> > Chris
>
>

