Re: pip install airflow -- Error

2017-06-08 Thread Joseph Napolitano
Anthony, check out this issue here:
https://github.com/mitmproxy/mitmproxy/issues/68

You might need to install a system package: brew install libxml2

Make sure you also run: brew update && brew upgrade
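
The key line in the log below is `fatal error: 'libxml/xpath.h' file not found`. clang looked under `-I/usr/include/libxml2`, did not find the header there, and lxml's build gave up. One way to compare the two Macs is to check which include directories actually contain that header. Below is a minimal Python sketch. It assumes two candidate locations: the system path from the log, and a Homebrew location. The Homebrew path `/usr/local/opt/libxml2/include/libxml2` is an assumed default; `brew --prefix libxml2` prints the real prefix on a given machine.

```python
import os
from typing import Iterable, Optional

# lxml's build probe compiles a file containing: #include "libxml/xpath.h"
HEADER = os.path.join("libxml", "xpath.h")


def find_libxml2_include(candidates: Iterable[str]) -> Optional[str]:
    """Return the first include dir that contains libxml/xpath.h, else None.

    Each candidate is a directory that would be passed to the compiler with
    -I, so the header has to live at <candidate>/libxml/xpath.h.
    """
    for include_dir in candidates:
        if os.path.isfile(os.path.join(include_dir, HEADER)):
            return include_dir
    return None


if __name__ == "__main__":
    candidates = [
        "/usr/include/libxml2",                    # path used in the build log
        "/usr/local/opt/libxml2/include/libxml2",  # assumed Homebrew location
    ]
    found = find_libxml2_include(candidates)
    if found:
        print("libxml2 headers found in", found)
    else:
        print("libxml2 headers not found; try: xcode-select --install")
```

If the header only shows up under the Homebrew path, two things may get the build going. One is running `xcode-select --install`, as the lxml message suggests. The other is pointing the compiler at that directory, for example via CFLAGS. Neither has been verified against this particular setup.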

Cheers,
Joe

On Thu, Jun 8, 2017 at 9:16 AM, Anthony McClay wrote:

> Airflow Development Team,
>
> I am having an issue with the pip install of Airflow.  I have two Macs:
> on one, Airflow installs correctly, and on the other, the install fails.
> I cannot figure out the difference.
>
> - Please advise
>
> Tony McClay
> anthony.mcc...@me.com
>
>
>
> creating var/folders/mf/k3gwf7795b594fkvb64vbz_rgn/T
> cc -I/usr/include/libxml2 -I/usr/include/libxml2 -c /var/folders/mf/k3gwf7795b594fkvb64vbz_rgn/T/xmlXPathInitynysjdpe.c -o var/folders/mf/k3gwf7795b594fkvb64vbz_rgn/T/xmlXPathInitynysjdpe.o
>
> /var/folders/mf/k3gwf7795b594fkvb64vbz_rgn/T/xmlXPathInitynysjdpe.c:1:10: fatal error: 'libxml/xpath.h' file not found
> #include "libxml/xpath.h"
>          ^
> 1 error generated.
>
> *
> Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
> Perhaps try: xcode-select --install
>
> *
> error: command '/usr/bin/clang' failed with exit status 1
>
> 
> Command "/Users/anthonymcclay/.virtualenvs/airflow2/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/mf/k3gwf7795b594fkvb64vbz_rgn/T/pip-build-yfv3sebl/lxml/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/mf/k3gwf7795b594fkvb64vbz_rgn/T/pip-wj2zx_si-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/anthonymcclay/.virtualenvs/airflow2/include/site/python3.5/lxml" failed with error code 1 in /private/var/folders/mf/k3gwf7795b594fkvb64vbz_rgn/T/pip-build-yfv3sebl/lxml/
>
>
> Configuration
> 
> ===
>
> (airflow2) Anthonys-MacBook-Pro:PythonProjects anthonymcclay$ which python
> /Users/anthonymcclay/.virtualenvs/airflow2/bin/python
> (airflow2) Anthonys-MacBook-Pro:PythonProjects anthonymcclay$ python --version
> Python 3.5.2
> (airflow2) Anthonys-MacBook-Pro:PythonProjects anthonymcclay$ pip --version
> pip 9.0.1 from /Users/anthonymcclay/.virtualenvs/airflow2/lib/python3.5/site-packages/pip-9.0.1-py3.5.egg (python 3.5)
> (airflow2) Anthonys-MacBook-Pro:PythonProjects anthonymcclay$
> Darwin Anthonys-MacBook-Pro.local 15.6.0 Darwin Kernel Version 15.6.0: Tue Apr 11 16:00:51 PDT 2017; root:xnu-3248.60.11.5.3~1/RELEASE_X86_64 x86_64


-- 
*Joe Napolitano *| Sr. Data Engineer
www.blueapron.com | 5 Crosby Street, New York, NY 10013


NYC Airflow Meetup #2

2017-03-21 Thread Joseph Napolitano
Hi all!

I'm hoping to keep the enthusiasm high with the NYC Meetup group.  I sent
out a couple of messages from the Meetup page here:
https://www.meetup.com/NYC-Apache-Airflow-incubating-Meetup/

I wanted to follow up on the mailing to inform new subscribers of the
group, and to kick off a discussion for the next meeting date.  There's an
open poll available to determine a date.  Please check it out here:
https://www.meetup.com/NYC-Apache-Airflow-incubating-Meetup/polls/1244158/

Reach out if you're interested in presenting.  I think good presentations
are ~15 minutes long and focus on specific problems you faced while
implementing Airflow on your tech stack, clever workflow patterns, or
workarounds.  If we host the Meetup again, we'll have the large projector
and PA system.  This time we'll make an effort to broadcast (or at least
record) the Meetup.  With Google Hangouts, there's a good chance we can
broadcast remote presentations too.

Maybe a presentation on the new 1.8 release? :)

Cheers!



Re: NYC Airflow Meetup

2017-02-04 Thread Joseph Napolitano
Thanks Sid!  Looks like my colleague Jason Jho added you first!

Joe Nap

On Fri, Feb 3, 2017 at 11:59 PM, siddharth anand <san...@apache.org> wrote:

> Great!
> Thanks for creating it - I've just joined so you can add me as an
> organizer.
>
> I've linked to it on:
>
>- https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements
>- https://twitter.com/ApacheAirflow/status/827743162789605382
>- https://cwiki.apache.org/confluence/display/AIRFLOW/Meetups
>
>
> -s


NYC Airflow Meetup

2017-02-03 Thread Joseph Napolitano
Hi all,

I want to thank everyone for attending NYC's first Airflow meetup at Blue
Apron.  It was a huge success and we're glad to have met everyone.

As suggested, we decided to create an official NYC Meetup page, sponsored
by Blue Apron.  We'll add Sid and Max as Organizers.  Let us know if you
want to help organize.

https://www.meetup.com/NYC-Apache-Airflow-incubating-Meetup/

I planned on taking video of the presentations, but it completely slipped
my mind!  I'll upload my slides to Slideshare and provide a small writeup
to complement them.

We're committed to Airflow at Blue Apron and we love the project.  Now that
our infrastructure is taking shape, we'll have time to contribute back to
the project.  We have top-down support at Blue Apron to dedicate company
time for it.

Feel free to connect anytime!
https://www.linkedin.com/in/joenap

Thanks again,
*Joe Napolitano *| Sr. Data Engineer
www.blueapron.com | 5 Crosby Street, New York, NY 10013


Re: Airflow Meetup in NYC @ Blue Apron

2017-01-30 Thread Joseph Napolitano
Max,

I'll look into it first thing tomorrow.  Would be thrilled.

Cheers!

On Mon, Jan 30, 2017 at 7:33 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> I'd love to watch, is there any way you guys can livecast or share a video
> after the event?
>
> Looking forward to it!
>
> Max


Re: Airflow Meetup in NYC @ Blue Apron

2017-01-30 Thread Joseph Napolitano
Hi All!

We are excited to host an Airflow Meetup in NYC.  We will have a guest
speaker from Spotify!

The Meetup is in 2 days, on Feb 1st @ 6:30pm at Blue Apron's headquarters.

In Summary:
Date: Feb 1st
Time: 6:30 - 9pm EST
Location: 40 W 23rd St. New York, NY 10010
https://www.google.com/maps/place/40+W+23rd+St,+New+York,+NY+10010/@40.7420885,-73.9938457,17z/data=!3m1!4b1!4m5!3m4!1s0x89c259a46471d2a1:0xc2517d92b1b68bba!8m2!3d40.7420845!4d-73.9916517?hl=en

Schedule:
6:30 - 7:15 Meet and greet
7:15 - ? Presentations from Blue Apron and Spotify

It's not too late to sign up for a presentation.  We will stick around as
late as 9pm.

We don't have an official Meetup page, so please sign up using the signup
sheet :)
https://docs.google.com/spreadsheets/d/1WmfgZeExSVdLf-u1uh3IleeHy8QTwaJ4BkkSkVM-X1E/edit?usp=sharing

Feel free to share the signup sheet with other parties.

As mentioned, we're on the 5th floor.  You need to check in with security
in the building lobby, and again when you reach the fifth floor to get a
name tag.

Thanks, and looking forward to meeting everyone!

Cheers,
Joe Nap





Re: Airflow + Celery + SQS

2017-01-30 Thread Joseph Napolitano
Jason,

We've considered SQS as the messaging backend for Celery, mostly due to its
serverless nature.  The disadvantage is that we'd depend on AWS for local
development.

Given this, we're leaning toward Redis for messaging because we can use AWS
ElastiCache to help manage it.  This gives us the reliability of AWS while
not depending on it during development.  Redis can also be used as the
result store for Celery, but since we're already using Postgres/RDS for the
metadata DB, I assume we can have a separate schema for Celery's result
store in the same database instance.

What we don't want is to use different containers (via Docker) in dev and
production, i.e. relying on Redis or RabbitMQ locally while using SQS in
production.

We're still investigating this heavily, but would love to keep a tight loop
on the subject.  If you're in NYC, we're hosting an Airflow meetup in 2 days
to discuss some of our upcoming implementation ideas.  I'm going to
re-announce it later on the mailing list.
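
A minimal sketch of the "same broker everywhere" idea. Redis is the broker in both environments: a local container in dev and an ElastiCache endpoint in production. Results go to the existing Postgres instance. The sketch assumes Celery's URL-style settings, meaning `redis://` broker URLs and SQLAlchemy-style `db+postgresql://` result-backend URLs. The environment-variable names and hosts are hypothetical placeholders.

```python
import os
from typing import Dict, Mapping


def celery_urls(env: Mapping[str, str]) -> Dict[str, str]:
    """Build broker and result-backend URLs for dev or production.

    Both environments use Redis as the broker, so dev and prod run the same
    kind of container; only the host changes. Results are stored in the
    Postgres instance that already holds the Airflow metadata DB.
    """
    # Hypothetical variable names; the defaults suit a local Docker setup.
    redis_host = env.get("REDIS_HOST", "localhost")
    redis_port = env.get("REDIS_PORT", "6379")
    pg_user = env.get("PG_USER", "airflow")
    pg_password = env.get("PG_PASSWORD", "airflow")
    pg_host = env.get("PG_HOST", "localhost")
    pg_db = env.get("PG_DB", "airflow")

    return {
        "broker_url": "redis://{}:{}/0".format(redis_host, redis_port),
        "result_backend": "db+postgresql://{}:{}@{}/{}".format(
            pg_user, pg_password, pg_host, pg_db
        ),
    }


if __name__ == "__main__":
    # Print only the broker URL to avoid echoing the database password.
    print("broker_url =", celery_urls(os.environ)["broker_url"])
```

The separate-schema idea isn't covered here. This sketch points the result backend at the same database as the metadata. In Airflow 1.7/1.8, these two URLs would go in the `[celery]` section of `airflow.cfg` as `broker_url` and `celery_result_backend`. Check the config template for your version, since key names have changed over time.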

Cheers

On Mon, Jan 30, 2017 at 9:59 AM, Jeremiah Lowin  wrote:

> Jason,
>
> I don't believe Airflow cares about Celery's backend as long as the task
> API remains the same. You should be OK (though I haven't tested to
> confirm).
>
> J
>
> On Sat, Jan 28, 2017 at 5:09 PM Jason Chen 
> wrote:
>
> > Hi Airflow team,
> >
> > Celery 4 supports AWS SQS:
> > http://docs.celeryproject.org/en/latest/getting-started/brokers/sqs.html
> >
> > We are using Airflow 1.7.1.3.
> > Is there any problem if we change the config to use SQS for the
> > CeleryExecutor?
> >
> > Thanks.
> >
> > Jason
> >
>





Airflow Meetup in NYC @ Blue Apron

2017-01-20 Thread Joseph Napolitano
Hi all!

I want to officially announce a Meetup for Airflow in NYC!  I'm looking
forward to meeting other community members to share knowledge and network.

We may create an official Meetup page, but in the meantime please sign up
here:
https://docs.google.com/spreadsheets/d/1WmfgZeExSVdLf-u1uh3IleeHy8QTwaJ4BkkSkVM-X1E/edit?usp=sharing

I have a confirmed date of February 1st @ 6:30 at Blue Apron's headquarters.

In Summary:
Date: Feb 1st
Time: 6:30 - 9pm EST
Location: 40 W 23rd St. New York, NY 10010
https://www.google.com/maps/place/40+W+23rd+St,+New+York,+NY+10010/@40.7420885,-73.9938457,17z/data=!3m1!4b1!4m5!3m4!1s0x89c259a46471d2a1:0xc2517d92b1b68bba!8m2!3d40.7420845!4d-73.9916517?hl=en

We're on the 5th floor.  You need to check in with security in the building
lobby, and again when you reach the fifth floor to get a name tag.

Food & drink will be provided!

Let me know if you would like to present.  We'd love to hear about your
architecture and war stories.  We will have a large projector and PA system
set up.

Sorry about the short notice, but it took a while to get approved over the
holidays and new year.  If we can't generate enough interest we can
certainly push it back a month.

Thanks, and Bon Appétit!



Re: NYC Meetup?

2017-01-20 Thread Joseph Napolitano
Hi All, I wanted to bump this thread again.  I sent out another email about
a meetup in NYC, so look for that one.

It took a long time to get approved over the holidays, so I hope we can
still generate interest in a short time.

Cheers!

On Thu, Dec 29, 2016 at 3:01 PM, Joseph Napolitano <
joseph.napolit...@blueapron.com> wrote:

> Great, I'll wrap up details early next week. Enjoy the new year.
>
> Cheers.
>
> On Thu, Dec 29, 2016 at 1:10 PM, Luke Ptz <lukeptzc...@gmail.com> wrote:
>
>> Count me in

Re: NYC Meetup?

2016-12-27 Thread Joseph Napolitano
I regret to say that we've offered pizzas at the other meetups we hosted :(

But I will strongly advocate for a healthier option this time! :D

Joe Nap

On Mon, Dec 26, 2016 at 10:16 AM, Rob Goretsky <robert.goret...@gmail.com>
wrote:

> Awesome, thanks for volunteering a place and time! Feb 1 should be good
> for me too.  Given that this is being hosted at blue apron, I'm assuming
> we'll be preparing a multi-course meal from scratch instead of the usual
> meetup pizza?  /s
>
> -rob
>
> > On Dec 24, 2016, at 12:49 PM, Jeremiah Lowin <jlo...@apache.org> wrote:
> >
> > Feb 1 should work for me -- thanks for getting the ball rolling, Joe!


Re: NYC Meetup?

2016-12-24 Thread Joseph Napolitano
Hi all,

I got the green light to host an Airflow meetup here at Blue Apron.  We're
located on 23rd St. between 5th and 6th.  A precise date and time are TBD,
but I would like to suggest Wednesday, February 1st @ 6:30pm.

I'll get the ball rolling if there's enough interest for February 1st.  I
was asked to provide a headcount as well, so I would like to create an
official Meetup event.  I am curious what others think about creating the
event on the existing Bay Area Meetup Group, despite being labeled Bay Area?

Also, please reach out if you'd like to give a presentation.  We'll have a
large screen setup w/ a projector and PA system.

All the best,
Joe Nap


On Thu, Dec 22, 2016 at 2:34 PM, Chris Riccomini <criccom...@apache.org>
wrote:

> @Rob, gotta get that up on the "who uses airflow" README.md section! :)
>
> On Thu, Dec 22, 2016 at 9:52 AM, Rob Goretsky <robert.goret...@gmail.com> wrote:
>
> > We at MLB Advanced Media (MLBAM / MLB.com) are just about to get our first
> > few Airflow processes into production, so we'd love to join an NYC-based
> > meetup!
> >
> > -rob
> >
> > On Wed, Dec 21, 2016 at 9:49 AM, Jeremiah Lowin <jlo...@apache.org> wrote:
> >
> > > It would be wonderful to have an east coast meetup! I would love to join
> > > if I can be in NY that day.
> > >
> > > Best,
> > > Jeremiah
> > >
> > > On Tue, Dec 20, 2016 at 4:24 PM Patrick D'Souza <patrick.dso...@gmail.com> wrote:
> > >
> > > > Having hosted a bunch of meetups in the past, we at SecurityScorecard
> > > > are very interested in hosting an Airflow meetup as well. We can easily
> > > > host around 50 people or so in January.
> > > >
> > > > On Fri, Dec 16, 2016 at 4:26 PM, Chris Riccomini <criccom...@apache.org> wrote:
> > > >
> > > > > lol
> > > > >
> > > > > On Fri, Dec 16, 2016 at 11:04 AM, Joseph Napolitano <joseph.napolit...@blueapron.com.invalid> wrote:
> > > > >
> > > > > > Auto-correct got me. Metopes = Meetups
> > > > > >
> > > > > > Metope - a square space between triglyphs in a Doric frieze.
> > > > > >
> > > > > > On Fri, Dec 16, 2016 at 2:03 PM, Joseph Napolitano <joseph.napolit...@blueapron.com> wrote:
> > > > > >
> > > > > > > We hosted several metopes here at Blue Apron.  I will bring it up
> > > > > > > to our administrative team and give an update.  Mid-january is
> > > > > > > probably a good target.
> > > > > > >
> > > > > > > - Joe
> > > > > > >
> > > > > > > On Thu, Dec 15, 2016 at 5:18 PM, Luke Ptz <lukeptzc...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Cool to see the interest is there! I unfortunately can't offer a
> > > > > > > > space for a meetup, can anyone else? If not could always be
> > > > > > > > informal/meet in a public setting
> > > > > > > >
> > > > > > > > On Wed, Dec 14, 2016 at 7:08 PM, Andrew Phillips <andr...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > > We at Blue Apron would be very interested.
> > > > > > > > >
> > > > > > > > > Same here.
> > > > > > > > >
> > > > > > > > > ap
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > *Joe Napolitano *| Sr. Data Engineer
> > > > > > > www.blueapron.com | 5 Crosby Street, New York, NY 10013
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > *Joe Napolitano *| Sr. Data Engineer
> > > > > > www.blueapron.com | 5 Crosby Street, New York, NY 10013
> > > > > >
> > > > >
> > > >
> > >
> >
>





Re: Airflow state change diagram

2016-11-02 Thread Joseph Napolitano
Gerard,

This is great.  Thanks for sharing this.

Joe

On Wed, Nov 2, 2016 at 7:43 AM, twinkle sachdeva  wrote:

> Thanks Gerard for sharing it.
>
> Regards,
> Twinkle
>
> On Mon, Oct 31, 2016 at 2:35 AM, Gerard Toonstra 
> wrote:
>
> > I was looking at trying to fix AIRFLOW-137 (max_active_runs not
> > respected), but quickly noticed that the code that does all the
> > scheduling is rather complex with state updates going on across
> > multiple source files in multiple threads, etc.
> >
> > It's then best to find a suitable way to visualize all this complexity,
> > so I built this state change diagram:
> >
> > https://docs.google.com/spreadsheets/d/1vVvOwfDSacTC_YzwUkOMyykP6LiipCeoW_V70PuFrN4/edit?usp=sharing
> >
> > The state changes represent a potential execution path where the state
> > for a task instance will be updated to that value. Backfill is not
> > considered in this diagram. States for dagruns/jobs/dags are also not
> > considered.
> >
> > Could be useful for someone else.
> >
> > Rgds,
> >
> > Gerard
> >
>
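
Gerard's spreadsheet is the reference for the real transitions. The sketch below is only a rough illustration of how a diagram like this can be encoded and checked in code. It uses a small, assumed subset of task-instance states, with labels that mostly follow Airflow's state strings. Here `none` stands in for a task instance with no state yet. The transition table is illustrative and is not copied from the diagram.

```python
from typing import Dict, List, Set

# Illustrative subset of task-instance transitions, NOT the full diagram.
# "none" stands in for a task instance that has no state yet.
TRANSITIONS: Dict[str, Set[str]] = {
    "none": {"scheduled", "upstream_failed", "skipped"},
    "scheduled": {"queued"},
    "queued": {"running"},
    "running": {"success", "failed", "up_for_retry"},
    "up_for_retry": {"queued"},
}


def is_valid_path(states: List[str]) -> bool:
    """True if every consecutive pair of states is an allowed transition."""
    return all(nxt in TRANSITIONS.get(cur, set())
               for cur, nxt in zip(states, states[1:]))


print(is_valid_path(["none", "scheduled", "queued", "running", "success"]))  # True
print(is_valid_path(["none", "running"]))                                     # False
```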





Re: scheduler questions

2016-10-13 Thread Joseph Napolitano
Hi Boris,

To answer the first question, the backfill command has a flag to mark jobs
as successful without running them.  Take care to align the start and end
times precisely as needed.  As an example, for a job that runs daily at 7am:

airflow backfill -s 2016-10-07T07 -e 2016-10-10T07 my-dag-name -m

The "-m" (--mark_success) parameter tells Airflow to mark the runs successful without running them.
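
To see why the window needs to line up with the schedule, here's a small Python sketch. It lists the 7am execution dates that such a backfill would cover. It assumes both ends are inclusive and a fixed one-day interval, which simplifies how Airflow actually derives dates from a DAG's schedule.

```python
from datetime import datetime, timedelta
from typing import List


def daily_runs(start: datetime, end: datetime,
               interval: timedelta = timedelta(days=1)) -> List[datetime]:
    """Execution dates from start to end (both inclusive), one per interval."""
    runs = []
    current = start
    while current <= end:
        runs.append(current)
        current += interval
    return runs


# Same window as: airflow backfill -s 2016-10-07T07 -e 2016-10-10T07 my-dag-name -m
for run in daily_runs(datetime(2016, 10, 7, 7), datetime(2016, 10, 10, 7)):
    print(run.isoformat())
# prints 2016-10-07T07:00:00 through 2016-10-10T07:00:00 (four runs)
```

In this simplified model, starting at 2016-10-07T00 instead would produce midnight dates that don't match a 7am schedule. That is the kind of misalignment to watch for.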

On Thu, Oct 13, 2016 at 10:46 AM, Boris Tyukin 
wrote:

> Hello all and thanks for such an amazing project! I have been evaluating
> Airflow and spent a few days reading about it and playing with it, and I
> have a few questions that I am struggling to understand.
>
> Let's say I have a simple DAG that runs once a day and it is doing a full
> reload of tables from the source database so the process is not
> incremental.
>
> Let's consider this scenario:
>
> Day 1 - OK
>
> Day 2 - Airflow scheduler or server with Airflow is down for some reason
> (or DAG is paused)
>
> Day 3 - still down (or DAG is paused)
>
> Day 4 - server is up and now needs to run missing jobs.
>
>
> How can I make Airflow run only the Day 4 job and not backfill Days 2 and 3?
>
>
> I tried setting depends_on_past = True, but it does not seem to do the trick.
>
>
> I also found this in a roadmap doc, but it seems it has not made it into a
> release yet:
>
>
>  Only Run Latest - Champion : Sid
>
> • For cases where we need to only run the latest in a series of task
> instance runs and mark the others as skipped. For example, we may have a
> job to execute a DB snapshot every day. If the DAG is paused for 5 days and
> then unpaused, we don’t want to run all 5, just the latest. With this
> feature, we will provide “cron” functionality for task scheduling that is
> not related to ETL
>
>
> My second question, what if I have another DAG that does incremental loads
> from a source table:
>
>
> Day 1 - OK, loaded new/changed data for previous day
>
> Day 2 - source system is down (or DAG is paused), Airflow DagRun failed
>
> Day 3 - source system is down (or DAG is paused), Airflow DagRun failed
>
> Day 4 - source system is up, Airflow Dagrun succeeded
>
>
> My problem (unless I am missing something) is that on Day 4 Airflow would
> use the execution time from Day 3, so the interval for the incremental load
> would start at the last run (which failed). I'd hope it would use the last
> _successful_ run, so on Day 4 it would go back to Day 1. Is it possible to
> achieve this?
>
> I am aware of the manual backfill command via the CLI, but I am not sure I
> want to use it due to all the issues and inconsistencies I've read about.
>
> Thanks!
>
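Boris's second question asks whether the load can start from the last successful run rather than the last run. Nothing in this thread confirms that Airflow does this for you. One approach is for the load task itself to look up the most recent successful run and load everything changed since then. Below is a plain-Python sketch of that selection step. It assumes the run history is available as a simple list; in practice it would come from wherever you record runs, such as Airflow's metadata database. Run and incremental_window_start are hypothetical names.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    """One past run of the DAG (hypothetical record; fields are assumptions)."""
    execution_date: date
    state: str  # e.g. "success" or "failed"


def incremental_window_start(history: List[Run], current: date) -> Optional[date]:
    """Return the execution date of the latest successful run before `current`.

    Failed runs are ignored, so after an outage the window reaches back to the
    last load that actually completed. Returns None if nothing has succeeded.
    """
    successful = [run.execution_date for run in history
                  if run.state == "success" and run.execution_date < current]
    return max(successful) if successful else None


# Boris's timeline: Day 1 succeeded, Days 2 and 3 failed, Day 4 is running now.
history = [
    Run(date(2016, 10, 1), "success"),
    Run(date(2016, 10, 2), "failed"),
    Run(date(2016, 10, 3), "failed"),
]
print(incremental_window_start(history, date(2016, 10, 4)))  # 2016-10-01
```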



-- 
*Joe Napolitano *| Sr. Data Engineer
www.blueapron.com | 5 Crosby Street, New York, NY 10013


Re: BranchPythonOperator skips 'join' branch in all cases

2016-09-19 Thread Joseph Napolitano
Great, glad you figured it out!
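As the reply quoted below says, the fix is a trigger_rule on the join task. Airflow's default rule, 'all_success', requires every upstream task to succeed, so whichever path the branch skips also keeps 'join' from running. The 'one_success' rule lets it fire as soon as any upstream task succeeds; in the DAG that would mean passing trigger_rule='one_success' to the join operator. The snippet below is a simplified plain-Python model of just those two rules, not Airflow's scheduler code, and join_is_triggered is a hypothetical name.

```python
from typing import Iterable

SUCCESS, SKIPPED, FAILED = "success", "skipped", "failed"


def join_is_triggered(trigger_rule: str, upstream_states: Iterable[str]) -> bool:
    """Simplified model: would a join task run, given its upstream states?

    Only "all_success" (Airflow's default) and "one_success" are modeled;
    real Airflow also handles running, failed and upstream_failed states.
    """
    states = list(upstream_states)
    if trigger_rule == "all_success":
        return all(s == SUCCESS for s in states)
    if trigger_rule == "one_success":
        return any(s == SUCCESS for s in states)
    raise ValueError(f"rule not modeled here: {trigger_rule}")


# Upstreams of 'join' are [one, two, branch_skip_upload].
upload_path = [SUCCESS, SUCCESS, SKIPPED]  # branch chose 'branch_upload'
skip_path = [SKIPPED, SKIPPED, SUCCESS]    # branch chose 'branch_skip_upload'

for rule in ("all_success", "one_success"):
    print(rule, join_is_triggered(rule, upload_path), join_is_triggered(rule, skip_path))
# all_success False False
# one_success True True
```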

Cheers

On Mon, Sep 19, 2016 at 3:29 PM, Nikita Tovstoles <nik...@stripe.com.invalid> wrote:

> You're right, Joe. The key is to set a trigger_rule on the joining task. See
> the updated gist, and thanks!
> https://gist.github.com/dukehoops/dae9c45c2035d50e41fee7c7d75a50dd
>
> --
> -nikita
>



Re: BranchPythonOperator skips 'join' branch in all cases

2016-09-19 Thread Joseph Napolitano
My guess is that this line:
[one, two, branch_skip_upload] >> join >> finish

requires all 3 tasks to complete. This may or may not be true; I haven't
needed to rejoin branched tasks yet. If it is, though, all 3 of these can
never complete in the same run, because the branch skips one path. There may
be a way to say "at least 1" has to complete, in which case you'll have to
join one & two into the same path.

Hope that helps until somebody can confirm or deny.

Cheers



Re: BranchPythonOperator skips 'join' branch in all cases

2016-09-19 Thread Joseph Napolitano
Can you confirm that "return 'branch_upload'" is "tabbed" over?  It's on
the left margin in your Gist.
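Joe's question is about Python scoping. BranchPythonOperator follows whichever task_id its callable returns, so each return has to sit inside the callable's body. Below is a hypothetical callable with the indentation in question. Deciding on a 'should_upload' entry in the run's params is an assumption for illustration, since the gist's actual condition isn't shown here.

```python
def choose_branch(**context):
    """Hypothetical branch callable: return the task_id of the path to follow."""
    params = context.get("params") or {}
    if params.get("should_upload"):
        # Indented inside the `if`, inside the function. At the left margin this
        # line would no longer belong to the function body, and Python would
        # refuse to parse the DAG file.
        return "branch_upload"
    return "branch_skip_upload"


print(choose_branch(params={"should_upload": True}))   # branch_upload
print(choose_branch(params={"should_upload": False}))  # branch_skip_upload
```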

On Mon, Sep 19, 2016 at 3:02 PM, Nikita Tovstoles  wrote:

> Hi, folks:
>
> Airflow novice here trying to build a simple workflow where an upstream
> task decides whether to follow an 'upload' branch or proceed directly to
> subsequent join >> finish tasks. Regardless of whether the branch callable
> returns 'branch_upload' or 'branch_skip_upload', tasks 'join' and
> 'finish' are skipped, and I want these two tasks to always execute.
>
> The source is here:
> https://gist.github.com/dukehoops/dae9c45c2035d50e41fee7c7d75a50dd
>
> What am I doing incorrectly?
>
> Thank you.
>
> --
> -nikita
>



Re: Suggested way of passing "input parameters" to a DAG run?

2016-08-03 Thread Joseph Napolitano
What I can say is that we use it a lot, but very lightly.  We basically use
it to communicate the S3 key for a flat file between operators.

Definitely don't use it to send actual data :)

Cheers

On Wed, Aug 3, 2016 at 6:21 PM, Andrew Phillips  wrote:

>> Let me know if that helps, or if I completely misunderstood :)
>>
>
> That helps, indeed - thanks, Joe! We were in fact going down exactly this
> path as an alternative; we were just a bit hesitant to use XComs based on
> the following comment in the docs [1]:
>
> "If it absolutely can’t be avoided, Airflow does have a feature for
> operator cross-communication called XCom that is described elsewhere in
> this document."
>
> The statement talks about sharing information *between* tasks, but we
> weren't sure if this should be read as "stay away from XComs unless there's
> no other option". Curious to hear the community's thoughts on that.
>
> Thanks for the quick response!
>
> ap
>
> [1] https://pythonhosted.org/airflow/concepts.html#operators
>



Re: Suggested way of passing "input parameters" to a DAG run?

2016-08-03 Thread Joseph Napolitano
There are a lot of ways to define the input source.  Let's suppose you have
these inputs in a relational database, or a flat file on S3.

The first task in your DAG would query for those inputs, or grab the file.
The trick is getting the inputs to later tasks. The XCom feature is a way
to share data between your tasks, so it's a matter of pulling the XCom from
the task that originally queried the inputs.

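Suppose you had an "input operator" with an execute method. Whatever
execute returns is pushed to XCom and is retrievable in later tasks:

from airflow.models import BaseOperator

class InputOperator(BaseOperator):
    def execute(self, context):
        # The return value is stored as this task instance's XCom.
        return {"input_key": "input_value"}

Then in your DAG (dag is your DAG object; SomeExistingOperator stands for
whatever operator consumes the inputs):

input_operator_task = InputOperator(task_id='input_operator_task', dag=dag)

downstream_task = SomeExistingOperator(
    task_id='downstream_task',
    keyword_arg_using_your_inputs="{{ ti.xcom_pull(task_ids='input_operator_task') }}",
    dag=dag
)

# The downstream task must run after the input task for the pull to find a value.
input_operator_task >> downstream_task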

The XCom pull is evaluated through the Jinja template when the task runs, so
the argument has to be one of that operator's template_fields.
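This also sidesteps the Variable problem in the question quoted below. A Variable is one global value, while an XCom is stored per task instance (keyed by DAG, task and execution date), so a second DAG run can't overwrite what the first run's input task captured. The following is a toy plain-Python model of that difference, replaying Andrew's timeline. It is not Airflow's implementation, and ToyXComStore and replay_scenario are hypothetical names.

```python
from typing import Any, Dict, Optional, Tuple


class ToyXComStore:
    """Toy stand-in for XCom: each value is scoped to one (run_id, task_id) pair.

    run_id stands in for the execution date that real XComs are keyed on.
    """

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, str], Any] = {}

    def push(self, run_id: str, task_id: str, value: Any) -> None:
        self._values[(run_id, task_id)] = value

    def pull(self, run_id: str, task_id: str) -> Optional[Any]:
        return self._values.get((run_id, task_id))


def replay_scenario() -> Dict[str, Any]:
    """Replay the Variable-vs-XCom timeline from the question."""
    variables: Dict[str, Any] = {}  # one global value, like a Variable
    xcom = ToyXComStore()

    # dag_run_1 starts: its input task captures the current parameter.
    variables["my_param"] = "foo"
    xcom.push("dag_run_1", "input_operator_task", variables["my_param"])

    # The parameter changes for dag_run_2 while dag_run_1 is still active.
    variables["my_param"] = "bar"
    xcom.push("dag_run_2", "input_operator_task", variables["my_param"])

    # dag_run_1's second task reads the parameter both ways.
    return {
        "via_variable": variables["my_param"],
        "via_xcom": xcom.pull("dag_run_1", "input_operator_task"),
    }


print(replay_scenario())  # {'via_variable': 'bar', 'via_xcom': 'foo'}
```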

Let me know if that helps, or if I completely misunderstood :)

Joe Nap

On Wed, Aug 3, 2016 at 5:29 PM, Andrew Phillips  wrote:

> Hi all
>
> What is/are the suggested way(s) of passing "input parameters" to a DAG
> run (adding quotes since, as far as we can tell, that concept doesn't exist
> natively in Airflow, probably by design)?
>
> This would be information that is used by one or multiple operators in a
> DAG run and that should stay the same for all task instances in that DAG run,
> but may be different for another DAG run executing concurrently. An example
> would be a Git pull request number.
>
> What we tried first was to use a Variable for this, but it doesn't look
> like that will work because the value can change during the execution of
> the DAG run. At least, that seems to be the case in the way we're using it:
>
> input_params = Variable.get()
> dag = DAG(..., params=input_params)
>
> We had hoped that this would "fix" the values of the parameters when the
> DAG run was created, but that does not seem to be the case: if the variable
> is updated (in preparation for a new DAG run) while a DAG run is active,
> tasks that haven't executed yet see the new value. I.e. we end up seeing
> this:
>
> set Variable my_param to "foo"
> dag_run_1 starts, gets the variable and passes my_param to the Dag object
> dag_run_1.op_1 evaluates {{ params.my_param }} and gets "foo"
> set Variable my_param to "bar"
> dag_run_2 starts and passes var to the Dag object
> dag_run_1.op_2 evaluates {{ params.my_param }} and sees "bar" # want this to still be foo!
>
> Not sure at this point whether this is a bug or, if not, whether there's a
> different way to retrieve the value of a variable that allows us to "fix"
> it for the duration of the DAG run.
>
> Or, taking a step back, is there some other approach that we could use to
> store and retrieve input data to DAGs?
>
> Regards
>
> ap
>
>
>

