Fwd: Cannot access https://cms.apache.org/incubator/publish

2018-05-15 Thread siddharth anand
Kaxil,
Can you try these steps and update the airflow wiki (committer guide) based on 
your findings?

-s

Sent from Sid's iPhone 

Begin forwarded message:

> From: Martin Gainty 
> Date: May 15, 2018 at 4:21:13 AM PDT
> To: "san...@apache.org" 
> Subject: Re: Cannot access https://cms.apache.org/incubator/publish
> 
> Hi Anand
> 
> apparently the new URL is incubator.apache.org as I could not find any 
> references to cms ..here is jbake readme
> # Apache Incubator Website
>
> ## Prerequisites
>
> The website is built using JBake and a Groovy template.  The builds for the
> website do require internet access.
>
> - Install JBake from http://jbake.org/download.html
> - Create an environment variable `JBAKE_HOME` pointing to your JBake
>   installation
> - Ensure that you have a JVM locally, e.g.
>   [OpenJDK](http://openjdk.java.net/install/)
>
> ## Building & Running the site
>
> There is a custom `bake.sh` file that is used to build the website.  You can
> call it with any of the [arguments you would pass to
> jbake](http://jbake.org/docs/2.5.1/#bake_command).
> The easiest way to use it is to run `./bake.sh -b -s` this will start up
> JBake in a watching mode as you make changes it will refresh after a short
> period of time.
> While working with it locally, you'll notice that the site URLs redirect to
> `incubator.apache.org`, to change this edit `jbake.properties` and uncomment
> the line referencing `localhost`
>
> ## Jenkins Setup
>
> Commits to the `jbake-site` branch are automatically checked out and built
> using `build_site.sh`.  Once this goes live those commits will go against
> `master`.  The jenkins job can be found at
> [https://builds.apache.org/view/H-L/view/Incubator/job/Incubator%20Site/](https://builds.apache.org/view/H-L/view/Incubator/job/Incubator%20Site/)
> The result of the commits are pushed to the `asf-site` branch which are then
> published using `gitwcsub`
>
> ## Asciidoctor
>
> Most of the pages in the site are written using Asciidoctor.  While it is a
> form of asciidoc it does have some [syntax differences that are worth
> reviewing](http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/)
>
> ## Groovy Templates
>
> The site templates are written in groovy scripts.  Even though the files end
> with `.gsp` they are not GSP files and do not have access to tag libraries.
> You can run custom code in them, similar to what is done in
> [homepage.gsp](templates/homepage.gsp)
> 
> If you have a hard requirement to access cms.apache.org, write a request
> for a username/password for cms.apache.org to the site admin:
> d...@apache.org
> 
> Good Luck!
> Martin 
> __ 
>  
> 
> 
> From: John D. Ament 
> Sent: Monday, May 14, 2018 10:01 PM
> To: gene...@incubator.apache.org
> Cc: san...@apache.org
> Subject: Re: Cannot access https://cms.apache.org/incubator/publish
>  
> The Incubator website is no longer managed via CMS.  Please review
> https://incubator.apache.org/guides/website.html
> Updating the top-level Incubator website
> incubator.apache.org
> The Incubator website is generated by the incubator git repository. The 
> primary document format is asciidoc, templates are based on gsp, and we use 
> jbake to build it. You can edit files directly on github and raise a pull 
> request or just checkout the repository at 
> https://git-wip-us.apache.org/repos ...
> 
> 
> 
> John
> 
> On Mon, May 14, 2018 at 9:17 PM Martin Gainty  wrote:
> 
> > Hi Sid
> >
> >
> > as long as you have JavaScript enabled in the browser
> > and you have the same issue with curl then AFAIK it's a permissions error
> >
> >
> > can you contact admin or webmaster and have them email you valid
> > username/password
> >
> > ?
> >
> > Martin
> > __
> >
> >
> >
> > 
> > From: Sid Anand 
> > Sent: Monday, May 14, 2018 9:12 PM
> > To: Martin Gainty
> > Cc: gene...@incubator.apache.org
> > Subject: Re: Cannot access https://cms.apache.org/incubator/publish
> >
> > So, perhaps I (sanand) don't have the necessary permissions?
> > -s
> >
> > On Mon, May 14, 2018 at 6:11 PM, Sid Anand  wrote:
> >
> > > I get the same error (I've hidden my password).
> > >
> > > sianand@LM-SJN-21002367:~ $ curl  --user sanand:
> > > https://cms.apache.org/incubator/publish
> > >
> > > 
> > >
> > > 
> > >
> > > 404 Not Found
> > >
> > > 
> > >
> > > Page Not Found
> > >
> > > The requested URL was not found on this server. If you are trying to
> > > edit a CMS-driven page, your local working copy may have been pruned.
> > >
> > > > Please go to cms.apache.org, find your project, and click on 'force new
> > > > working copy' to create a new working copy to edit.
> > >
> > > 
> > >
> > >
> > > On Mon, May 14, 2018 at 4:22 PM, Martin Gainty 
> > > wrote:
> > >
> > >> hi sid

Re: add a page to Airflow wiki site

2017-08-30 Thread siddharth anand
I keep forgetting the images are stripped... I'm guessing you are *fenglu-g
*from the list shown in the image link.

https://www.dropbox.com/s/1bvh7pd1revb3x5/Screenshot%202017-08-30%2018.26.53.png?dl=0

On Wed, Aug 30, 2017 at 6:28 PM, siddharth anand  wrote:

> I've granted you permissions. I'm guessing you are *fenglu-g*
>
>
> [image: Inline image 1]
>
> On Wed, Aug 30, 2017 at 5:42 PM, Feng Lu 
> wrote:
>
>> Hi,
>>
>> We would like to share a design proposal on the wiki page
>> https://cwiki.apache.org/confluence/display/AIRFLOW, unfortunately it
>> doesn't look like I have the permission to do so, could someone (the
>> committers?) kindly grant me edit access?
>> Thank you.
>>
>> Feng
>>
>
>


Re: add a page to Airflow wiki site

2017-08-30 Thread siddharth anand
I've granted you permissions. I'm guessing you are *fenglu-g*


[image: Inline image 1]

On Wed, Aug 30, 2017 at 5:42 PM, Feng Lu  wrote:

> Hi,
>
> We would like to share a design proposal on the wiki page
> https://cwiki.apache.org/confluence/display/AIRFLOW, unfortunately it
> doesn't look like I have the permission to do so, could someone (the
> committers?) kindly grant me edit access?
> Thank you.
>
> Feng
>


Fwd: XCOM value within a DAG

2017-07-21 Thread siddharth anand
-- Forwarded message --
From: Vadzim Nemchenko 
Date: Thu, Jul 20, 2017 at 11:15 AM
Subject: XCOM value within a DAG
To: dev-ow...@airflow.incubator.apache.org


Hello guys,

Your Google Group is locked for creating threads, so could you help me out
please with the issue below?

I have a custom operator which pushes XCOM value as below:

...
task_instance = context['task_instance']
task_instance.xcom_push("list_of_files",file_list)...

It works fine. I have a dag definition file (my_dag.py) where I create a
task by using my own operator, it pushes the XCOM value then I want to do
for in loop by using this xcom value. How to pull it?

Best Regards, Vadzim


Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-07-19 Thread siddharth anand
FYI, can anyone pictorially describe the release process (and post it on
the apache airflow wiki)? I think that would eliminate a lot of confusion
in the future and avoid a rehash of this email thread on the next release.

-s

On Wed, Jul 19, 2017 at 10:48 AM, Hitesh Shah  wrote:

> To add, the main source tarball should have instructions to generate the
> sdist and bdist versions. Additionally, as part of the release process if
> the plan is to publish to pypi (after the IPMC vote succeeds), then the
> appropriate bits also need to be verified/voted upon. These are not exactly
> counted as the official release bits but they do need to be verified as
> part of the voting process to ensure that the bits do indeed map to the
> source release, license/notice files are correct, etc.
>
> thanks
> -- Hitesh
>
>
> On Tue, Jul 18, 2017 at 12:01 AM, Bolke de Bruin 
> wrote:
>
> > Thanks Hitesh. We discussed it with John Ament on the IPMC. Python has
> the
> > notion of 3 types of distributions, “source”, “sdist”, “bdist”, contrary
> to
> > Java that knows only two (source, bdist). We used to vote on “sdist”,
> which
> > was deemed incorrect.
> >
> > So, Max, indeed we need to vote on a tar.gz that contains build
> > instructions in INSTALL to get to “sdist”. The build instructions should
> > also contain instruction how to run the license checks by Apache Rat.
> Most
> > of the work probably goes in the build instructions and verifying they
> > work, but it should not be much.
> >
> > Any other clarification required?
> >
> > Bolke
> >
> >
>


Re: Podling Report Reminder - July 2017

2017-07-05 Thread siddharth anand
I've updated the Airflow report on
https://wiki.apache.org/incubator/July2017

Do let me know if you have any questions.
-s


On Wed, Jul 5, 2017 at 7:00 AM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 19 July 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, July 05).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/July2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Re: Question about updating podling report

2017-07-04 Thread siddharth anand
Oh, I'm mixing 2 different things..
http://svn.apache.org/repos/asf/incubator/public/trunk/content/projects/ is
where our project page is hosted, but podling reports are submitted over the
wiki webform.

Right.. so, anyone have instructions on how to update the project page in
case I need to update that at some time?
-s

On Tue, Jul 4, 2017 at 2:01 PM, siddharth anand  wrote:

> Folks!
> As I joined a new role (at a new company), it means that I also started
> using a new laptop. I'd like to update the podling report (due tomorrow),
> but don't remember the specifics for connecting via svn to the incubator
> repo. Anyone have instructions?
>
> I believe I need to generate a new SSH key and store it on
> https://id.apache.org/
>
> I'm trying... svn co svn+ssh://san...@svn.apache.org/repos/asf/incubator
> That's not working.
>
> -s
>


Question about updating podling report

2017-07-04 Thread siddharth anand
Folks!
As I joined a new role (at a new company), it means that I also started
using a new laptop. I'd like to update the podling report (due tomorrow),
but don't remember the specifics for connecting via svn to the incubator
repo. Anyone have instructions?

I believe I need to generate a new SSH key and store it on
https://id.apache.org/

I'm trying... svn co svn+ssh://san...@svn.apache.org/repos/asf/incubator
That's not working.

-s


Re: Podling Report Reminder - July 2017

2017-07-01 Thread siddharth anand
Adding this to my to-do list.

On Tue, Jun 27, 2017 at 4:54 PM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 19 July 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, July 05).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/July2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Re: DAGs dont get refreshed ?

2017-06-20 Thread siddharth anand
There is a manual DAG refresh option in the UI for DAGs the UI is already
aware of -- this will reload a DAG. But that's not a complete solution to
the more general DAG refresh problem.

-s

On Tue, Jun 20, 2017 at 6:36 PM, siddharth anand  wrote:

> To clarify, at Agari, we use monitd (like systemd) to restart both
> webserver and scheduler (running local executor) after deploying new dags
> to the dag folder. The Web UI does not discover new DAGs and it does not
> automatically reload changes to existing files either.
>
> -s
>
> On Tue, Jun 20, 2017 at 6:34 PM, siddharth anand 
> wrote:
>
>> We actually do restart both Web and Schedulers. I know the scheduler does
>> reparse the files in the dag folder, but the current state of the web ui
>> does require a restart.
>>
>> -s
>>
>> On Tue, Jun 20, 2017 at 6:07 PM, Ashika Umanga Umagiliya <
>> umanga@gmail.com> wrote:
>>
>>> Greetings,
>>>
>>> We are using airflow (1.7) to manage our ETL pipeline and we are having
>>> issues related to refreshing of DAGs.
>>>
>>> When we update the DAG python script inside "dag folder" ,they don't get
>>> updated in the UI.(DAG tree as well as the Code in the UI). We have to
>>> kill
>>> and restart the "airflow webserver" process for them to be updated.Isn't
>>> there a hot-update feature in airflow?
>>> Is there any workaround to fix this issue ?
>>>
>>
>>
>


Re: DAGs dont get refreshed ?

2017-06-20 Thread siddharth anand
To clarify, at Agari, we use monitd (like systemd) to restart both
webserver and scheduler (running local executor) after deploying new dags
to the dag folder. The Web UI does not discover new DAGs and it does not
automatically reload changes to existing files either.

-s

On Tue, Jun 20, 2017 at 6:34 PM, siddharth anand  wrote:

> We actually do restart both Web and Schedulers. I know the scheduler does
> reparse the files in the dag folder, but the current state of the web ui
> does require a restart.
>
> -s
>
> On Tue, Jun 20, 2017 at 6:07 PM, Ashika Umanga Umagiliya <
> umanga@gmail.com> wrote:
>
>> Greetings,
>>
>> We are using airflow (1.7) to manage our ETL pipeline and we are having
>> issues related to refreshing of DAGs.
>>
>> When we update the DAG python script inside "dag folder" ,they don't get
>> updated in the UI.(DAG tree as well as the Code in the UI). We have to
>> kill
>> and restart the "airflow webserver" process for them to be updated.Isn't
>> there a hot-update feature in airflow?
>> Is there any workaround to fix this issue ?
>>
>
>


Re: DAGs dont get refreshed ?

2017-06-20 Thread siddharth anand
We actually do restart both Web and Schedulers. I know the scheduler does
reparse the files in the dag folder, but the current state of the web ui
does require a restart.

-s

On Tue, Jun 20, 2017 at 6:07 PM, Ashika Umanga Umagiliya <
umanga@gmail.com> wrote:

> Greetings,
>
> We are using airflow (1.7) to manage our ETL pipeline and we are having
> issues related to refreshing of DAGs.
>
> When we update the DAG python script inside "dag folder" ,they don't get
> updated in the UI.(DAG tree as well as the Code in the UI). We have to kill
> and restart the "airflow webserver" process for them to be updated.Isn't
> there a hot-update feature in airflow?
> Is there any workaround to fix this issue ?
>


Re: Passing Variables

2017-06-20 Thread siddharth anand
Ah.. I completely missed the question.. in my haste to do too many things.

Assuming you have a DAG named process_my_data with 3 tasks :
read__from_source_table --> transform --> write_to_new_table. This dag
should have a @none schedule.

You could write a script to read your list of source tables and call
airflow trigger_dag -c  -e . This will launch a dag execution run for
each of the inputs that you pass. I believe that the execution dates should
differ by at least 1 second (timestamp granularity in the db), so avoid a
tight loop by sleeping 1 second between executions.

You will see N dag runs, one for each of the N source tables that you pass
in.
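
A minimal sketch of such a script, under stated assumptions: the DAG is the
process_my_data example from above, the conf key "source_table" and the table
names are hypothetical, and the flags are the -c (JSON conf) and -e (execution
date) options of the Airflow 1.x trigger_dag command. Instead of sleeping, it
gives each run an explicit execution date one second later than the previous
one. It only builds and prints the commands; nothing is launched.

```python
import json
from datetime import datetime, timedelta

DAG_ID = "process_my_data"  # DAG name used in the message above


def build_trigger_commands(source_tables, start):
    """Return one `airflow trigger_dag` argument list per source table."""
    commands = []
    for offset, table in enumerate(source_tables):
        # Space execution dates one second apart so every run is distinct.
        exec_date = (start + timedelta(seconds=offset)).isoformat()
        # "source_table" is a hypothetical conf key chosen for this sketch.
        conf = json.dumps({"source_table": table})
        commands.append(
            ["airflow", "trigger_dag", "-c", conf, "-e", exec_date, DAG_ID]
        )
    return commands


if __name__ == "__main__":
    tables = ["orders", "customers"]  # hypothetical table names
    for cmd in build_trigger_commands(tables, datetime(2017, 6, 20)):
        print(" ".join(cmd))
        # To actually launch the runs, pass cmd to subprocess.check_call.
```

Inside a task, the value passed with -c should then be readable from the DAG
run's conf (for example through the dag_run object in the task context).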

-s

On Tue, Jun 20, 2017 at 12:22 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> One DAG cannot have multiple shapes at one time, by design. You cannot
> parameterize things that will affect the shape of your DAG (though note
> that you can fully parameterize what happens within individual task
> instances). Think about it, a DAG is one (and only one) graph. It's NOT a
> shapeshifting thing.
>
> As a workaround, and this may or may not be the right thing to do, you can
> write a DAG factory function, that will return a DAG object given
> parameters, but any given DAG instance (with a unique dag_id) has a single
> shape. If you do want to go that route, may want to use
> `schedule_interval='@once'`
>
> If you think the shape of your DAG needs to change from one DAG run to the
> next, you may want to re-think what is static and what is dynamic. Are your
> database tables schema changing from one DAG run to the next? No right?
> That'd be crazy! Most likely you want to think about the shape of your DAG
> in a similar way as you think about the schema of your tables: static or
> slowly changing.
>
> Max
>
> On Mon, Jun 19, 2017 at 4:11 AM, Rob Harrison  wrote:
>
> > Hi,
> >
> > I would like to pass a variable to my airflow dag and would like to know
> if
> > there is a recommended method for doing this.
> >
> > I am hoping to create a dag with python operators and tasks that read
> data
> > from a parquet table, perform a calculation then write the results into a
> > new table. I'd like to pass the source table name in along with the task
> > when calling the dag from the command line.
> >
> > From what I have read, the following can be used to read a variable from
> > the command line:
> >
> > airflow variables -s myvar="value"
> >
> > Does anyone have an example of this they can share?
> >
> > Thank you,
> > Rob
> >
>


Re: Passing Variables

2017-06-20 Thread siddharth anand
We use Airflow variables heavily.

from airflow.models import Variable

# Load an environment variable as a string

ENV = Variable.get('ENV').strip()

# Load an environment variable as JSON and access a JSON field named PLATFORM

PLATFORM = 'EP'

SSH_KEY = Variable.get('ep_platform_ssh_keys',
deserialize_json=True)[PLATFORM]


You can put this code in your dag file or in any python code your dag file
imports.
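
For readers without an Airflow install at hand, the lookup pattern above can
be imitated with a plain dictionary. This is only a stand-in sketch, not
Airflow's implementation: the get_variable helper, the stored values and the
key paths are all hypothetical. It shows the one idea that matters here, that
a variable is stored as a string and deserialize_json=True parses that string
as JSON before you index into it.

```python
import json

# Stand-in for the variable store: name -> stored string value.
# All values below are made up for illustration.
STORED_VARIABLES = {
    "ENV": " prod \n",
    "ep_platform_ssh_keys": '{"EP": "/path/to/ep_key", "CP": "/path/to/cp_key"}',
}


def get_variable(key, deserialize_json=False):
    """Return the raw stored string, or the parsed JSON value on request."""
    raw = STORED_VARIABLES[key]
    return json.loads(raw) if deserialize_json else raw


# Same shape as the snippet above, but against the stand-in store.
ENV = get_variable("ENV").strip()
PLATFORM = "EP"
SSH_KEY = get_variable("ep_platform_ssh_keys", deserialize_json=True)[PLATFORM]
print(ENV, SSH_KEY)
```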

-s

On Mon, Jun 19, 2017 at 4:11 AM, Rob Harrison  wrote:

> Hi,
>
> I would like to pass a variable to my airflow dag and would like to know if
> there is a recommended method for doing this.
>
> I am hoping to create a dag with python operators and tasks that read data
> from a parquet table, perform a calculation then write the results into a
> new table. I'd like to pass the source table name in along with the task
> when calling the dag from the command line.
>
> From what I have read, the following can be used to read a variable from
> the command line:
>
> airflow variables -s myvar="value"
>
> Does anyone have an example of this they can share?
>
> Thank you,
> Rob
>


Re: New Apache Airflow meetup : in Tokyo

2017-06-09 Thread siddharth anand
Thx Kengo san.
https://twitter.com/ApacheAirflow/status/873288362391609345

You may also update
https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements directly
in the future.

You now have full admin perms on the Wiki.
-s

On Thu, Jun 8, 2017 at 9:20 PM, Kengo Seki  wrote:

> Thanks a lot, Sid!
> Other slides have been published now. May I ask you to tweet and add
> them to the wiki too?
>
> https://www.slideshare.net/techblogyahoo/oozieairflow-apacheairflow-oozie
> https://speakerdeck.com/hatappi/airflowkarakuroko2nicheng-rihuan-etawake
>
> Airflow is not so popular in Japan yet, and a part of its reason is
> few information in Japanese, IMHO.
> The above slides are written in Japanese and really valuable for the
> (potential) Airflow users in Japan.
>
> Regards,
>
> Kengo Seki 
>
>
> 2017-05-16 3:23 GMT+09:00 siddharth anand :
> > Thx Kengo!
> >
> > I've added @takus slides to Airflow Links
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links> &
> > Announcements
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-April12,2017>
> > &
> > tweeted them!
> >
> > -s
> >
> > On Thu, May 11, 2017 at 5:26 PM, Kengo Seki  wrote:
> >
> >> Thank you for the announcement, Sid!
> >> Yes, we held the first meetup in Tokyo last night. About 20
> >> participants enjoyed the following talks:
> >>
> >> * Takumi Sakamoto (@takus, Kaizen Platform) did a nice introduction
> >> about Airflow.
> >>   He explained Airflow's nice features such as pool, SLA and backfill,
> >>   and showed interesting tips such as combination with Jupyter
> >> Notebook and Datadog.
> >>   His slide is published in:
> >> https://speakerdeck.com/takus/building-data-pipelines-with-apache-airflow
> >>
> >> * Tomoki Uekusa (@tmk_ueks, Yahoo! Japan) showed their usecase in
> >> production.
> >>   He replaced some Oozie nodes with Airflow and be happy now :)
> >>   He also introduced useful tips, such as utilizing tree view and gantt
> >> chart,
> >>   and explained the way to implement plugins and showed some real
> examples.
> >>
> >> * Yusaku Hatanaka (@hatappi, Speee) compared Airflow and Kuroko2
> >> (https://github.com/cookpad/kuroko2).
> >>   Unfortunately they moved to the latter, because they didn't needed
> >> all of Airflow's rich features
> >>   and preferred Ruby over Python. But he explained Airflow's pros and
> >> cons in a comprehensible way,
> >>   and introduced some pitfalls and useful workarounds such as a
> >> problem caused by timezone setting.
> >>
> >> * I (@sekikn39, NTT Data) explained how to contribute Airflow,
> >>   for example the way to search and create JIRA issues, run unit tests
> >> and submit PRs.
> >>
> >> I really appreciate all speakers and audiences, and Yahoo! Japan folks
> >> for hosting this event.
> >> Though there's time difference between us, I hope to see some of you
> >> core developers next time!
> >>
> >> Regards,
> >>
> >> Kengo Seki 
> >>
> >>
> >> 2017-04-13 7:09 GMT+09:00 siddharth anand :
> >> > Live in Tokyo & want to contribute to @ApacheAirflow
> >> > <https://twitter.com/ApacheAirflow>? Check out our new Tokyo meetup :
> >> > http://bit.ly/2o7jXWF  <https://t.co/4yaEfFwqu0>. First meetup on May
> >> 11 :
> >> > https://www.meetup.com/Tokyo-Apache-Airflow-incubating-Meetup/events/238731591/
> >> >
> >> > Thanks to Kengo Seki (@sekikn) for taking the lead on this!
> >> > -s
> >>
>


Re: Concurrent schedulers

2017-05-23 Thread siddharth anand
I did run into "double SLA miss alarms" firing, but that was on 1.7x. I
haven't tested if that is still an issue in 1.8x.

-s

On Tue, May 23, 2017 at 8:46 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Awesome. I wasn't aware of DagRun locking, this is even better!
>
> Max
>
> On Mon, May 22, 2017 at 11:39 PM, Bolke de Bruin 
> wrote:
>
> > Hi Max,
> >
> > We seem to be in quite good order already. We are testing with multi
> > master mysql and will also test multi master Postgres. As we are doing
> > dagrun level locking already it does not seem to be required to do
> > DAG-level locking. Also tasks are being locked so if multiple schedulers
> > are running everything seems to be quite fine. If one of the schedulers
> > restarts it starts checking for orphaned tasks by checking the executor
> > queue which is unique for every scheduler. This will result in some tasks
> > being dequeued and then requeued. So airflow is robust enough to stay
> alive
> > then (with my patch for deadlocks applied), but some things are a bit
> > sub-optimal.
> >
> > As mentioned we are still stress testing this setup and we might find
> more.
> >
> > Bolke
> >
> > > On 22 May 2017, at 18:19, Maxime Beauchemin <
> maximebeauche...@gmail.com>
> > wrote:
> > >
> > > Things that might be needed for a correct multi-schedulers setup:
> > > * DAG-level lock while being evaluated
> > > * DAG-level lock expiration to recover from potential situation where
> the
> > > lock wasn't released
> > > * Accumulation of the list of task instances to run into the database
> (as
> > > opposed to cross process communication to master process)
> > > * Define a clear master cycle that would read the list of accumulated
> > task
> > > instances from the DB, dedup, prioritize and schedule. That master
> cycle
> > > should have a lock (and lock expiration) as well.
> > >
> > > Max
> > >
> > > On Mon, May 22, 2017 at 12:27 AM, Bolke de Bruin 
> > wrote:
> > >
> > >> Hi Stephen,
> > >>
> > >> We are currently stress testing Airflow for use in a multi-master
> setup.
> > >> One of my team members is doing a write up that should show up online
> > >> shortly. TL;DR; in its current state Airflow will need some patches in
> > >> order to run concurrently. One issue is that Airflow can have a
> database
> > >> deadlock which will stop the scheduler from running. I have a patch
> for
> > >> that out here (https://github.com/apache/incubator-airflow/pull/2267
> <
> > >> https://github.com/apache/incubator-airflow/pull/2267>) that works
> fine
> > >> on Postgres/MySql (tests don’t pass on sqlite yet due to limitations
> of
> > >> sqlite).
> > >>
> > >> Your global scheduler lock (eg. by an active passive configuration)
> > might
> > >> make most sense for now.
> > >>
> > >> Bolke
> > >>
> > >>> On 22 May 2017, at 07:52, Stephen Rigney  wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> We're running airflow in production, but for reliability (n.b. not
> > >>> performance) we'd like to confirm if it is safe to spawn multiple
> > >> instances
> > >>> of the scheduler overlapping in time (otherwise we may need to put
> more
> > >>> effort into assuring two copies aren't ever spawned at once in our
> > >>> environment).
> > >>>
> > >>>
> > >>> It seems this officially wasn't a supported configuration back in
> 2015
> > (
> > >>> https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ
> > >> ),
> > >>> but has sufficient intra-airflow locking been added that it is now
> safe
> > >> to
> > >>> start up two temporally overlapping instances of the scheduler for
> the
> > >> same
> > >>> airflow system?
> > >>>
> > >>>
> > >>> Or should we hack in a "global scheduler lock" - we're not looking
> for
> > >>> increased performance by scheduler parallelism, just that if we ever
> > fire
> > >>> up two instances of the scheduler nothing terrible happens?
> > >>>
> > >>>
> > >>> Stephen
> > >>
> > >>
> >
> >
>
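
The "lock with expiration" item in Max's list quoted above can be illustrated
with a small in-memory sketch. This is not how Airflow implements locking; the
ExpiringLock class and the scheduler names are hypothetical, and a real
multi-scheduler setup would keep this state in a database row and update it
atomically. The point is only the recovery rule: a lock that was never
released can be taken over once its expiration time has passed.

```python
import time


class ExpiringLock:
    """In-memory stand-in for a lock record with an expiration timestamp."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, owner):
        now = self.clock()
        # Take the lock if it is free, already ours, or the holder let it expire.
        if self.owner in (None, owner) or now >= self.expires_at:
            self.owner = owner
            self.expires_at = now + self.ttl
            return True
        return False

    def release(self, owner):
        # Only the current holder may release the lock.
        if self.owner == owner:
            self.owner = None
```

With a 10 second expiration, a second scheduler is refused while the first
holds the lock, and succeeds once 10 seconds pass without a release.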


Re: Removing members from dev list?

2017-05-22 Thread siddharth anand
Great. Thx Andrew.
-s

On Mon, May 22, 2017 at 5:23 AM, Andrew Phillips  wrote:

> Hi Siddarth
>
> How do we (PMC) remove a email recipient from the dev list?
>>
>
> Anyone who is a moderator of the list should be able to request removal of
> a subscriber by sending an email to [1]:
>
> {listname}-unsubscribe-badboy=menace@tlp.apache.org
>
> I.e. in this case
>
> dev-unsubscribe-kerzhner=yahoo-inc@airflow.incubator.apache.org
>
> Regards
>
> ap
>
> [1] https://reference.apache.org/pmc/ml#problem_posts
>
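
The address pattern Andrew describes can be built mechanically: the
subscriber's "@" is replaced by "=" and the result is placed between
"{listname}-unsubscribe-" and the list's own domain. A minimal sketch follows;
the helper name and the example subscriber are hypothetical, and only a list
moderator can use the resulting address.

```python
def moderator_unsubscribe_address(list_name, list_domain, subscriber):
    """Build the address a moderator mails to remove `subscriber` from a list."""
    user, _, host = subscriber.partition("@")
    if not user or not host:
        raise ValueError("subscriber must look like user@host")
    # The subscriber's '@' becomes '=' so the result is still one address.
    return "{}-unsubscribe-{}={}@{}".format(list_name, user, host, list_domain)


# Hypothetical subscriber on the dev list discussed in this thread.
print(
    moderator_unsubscribe_address(
        "dev", "airflow.incubator.apache.org", "jane.doe@example.com"
    )
)
```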


Removing members from dev list?

2017-05-21 Thread siddharth anand
How do we (PMC) remove an email recipient from the dev list? I keep getting
requests to moderate the following because "kerzh...@yahoo-inc.com" is no
longer at Yahoo.

"kerzh...@yahoo-inc.com is no longer with Yahoo! Inc."

-s


Re: New Apache Airflow meetup : in Tokyo

2017-05-15 Thread siddharth anand
Thx Kengo!

I've added @takus slides to Airflow Links
<https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links> &
Announcements
<https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-April12,2017>
&
tweeted them!

-s

On Thu, May 11, 2017 at 5:26 PM, Kengo Seki  wrote:

> Thank you for the announcement, Sid!
> Yes, we held the first meetup in Tokyo last night. About 20
> participants enjoyed the following talks:
>
> * Takumi Sakamoto (@takus, Kaizen Platform) did a nice introduction
> about Airflow.
>   He explained Airflow's nice features such as pool, SLA and backfill,
>   and showed interesting tips such as combination with Jupyter
> Notebook and Datadog.
>   His slide is published in:
> https://speakerdeck.com/takus/building-data-pipelines-with-apache-airflow
>
> * Tomoki Uekusa (@tmk_ueks, Yahoo! Japan) showed their usecase in
> production.
>   He replaced some Oozie nodes with Airflow and be happy now :)
>   He also introduced useful tips, such as utilizing tree view and gantt
> chart,
>   and explained the way to implement plugins and showed some real examples.
>
> * Yusaku Hatanaka (@hatappi, Speee) compared Airflow and Kuroko2
> (https://github.com/cookpad/kuroko2).
>   Unfortunately they moved to the latter, because they didn't needed
> all of Airflow's rich features
>   and preferred Ruby over Python. But he explained Airflow's pros and
> cons in a comprehensible way,
>   and introduced some pitfalls and useful workarounds such as a
> problem caused by timezone setting.
>
> * I (@sekikn39, NTT Data) explained how to contribute Airflow,
>   for example the way to search and create JIRA issues, run unit tests
> and submit PRs.
>
> I really appreciate all speakers and audiences, and Yahoo! Japan folks
> for hosting this event.
> Though there's time difference between us, I hope to see some of you
> core developers next time!
>
> Regards,
>
> Kengo Seki 
>
>
> 2017-04-13 7:09 GMT+09:00 siddharth anand :
> > Live in Tokyo & want to contribute to @ApacheAirflow
> > <https://twitter.com/ApacheAirflow>? Check out our new Tokyo meetup :
> > http://bit.ly/2o7jXWF  <https://t.co/4yaEfFwqu0>. First meetup on May
> 11 :
> > https://www.meetup.com/Tokyo-Apache-Airflow-incubating-Meetup/events/238731591/
> >
> > Thanks to Kengo Seki (@sekikn) for taking the lead on this!
> > -s
>


Re: Article: Why Robinhood uses Airflow

2017-05-15 Thread siddharth anand
Tweeted it using the Airflow account!

-s

On Thu, May 11, 2017 at 5:33 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8
>
> Grateful to have you on board Robinhood!
>


Re: IMPORTANT: I need your pypi usernames

2017-05-11 Thread siddharth anand
Chris,
Sorry for the delay, my user name is r39132 for both of those!

-s

On Tue, May 9, 2017 at 1:25 PM, Chris Riccomini 
wrote:

> I have added the following:
>
> https://pypi.python.org/pypi/apache-airflow
> artwr, aoen, mistercrunch
>
> https://testpypi.python.org/pypi/apache-airflow
> mistercrunch
>
> Others, please provide your usernames. Everyone is being granted ownership
> access.
>
> On Tue, May 9, 2017 at 12:57 PM, Chris Riccomini 
> wrote:
>
> > Hey all,
> >
> > As part of 1.8.1, we are migrating from the `airflow` to `apache-airflow`
> > package name in PyPi. I have the new space, but I need everyone's
> usernames
> > for PyPi (both regular and test), so that all committers can publish new
> > releases. Please reply with your usernames for both:
> >
> > https://pypi.python.org
> > https://testpypi.python.org
> >
> > Cheers,
> > Chris
> >
>


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-19 Thread siddharth anand
https://issues.apache.org/jira/browse/AIRFLOW-1121 is merged to fix the
webserver pid issue.. thx Kengo!

-s

On Tue, Apr 18, 2017 at 6:15 PM, Hitesh Shah  wrote:

> -1.
>
> Not sure if these have been called out earlier.
>
> For all the bundled files with different licenses (MIT, BSD, etc), the full
> texts of these licenses should be in the source tarball preferably at the
> end of the LICENSE file.
> webgl-2d needs to be called out as MIT license.
> Version in pkg-info has an rc0 notation. It should just be
> 1.8.1-incubating.
> A bunch of files under apache_airflow.egg-info/ and scripts/systemd/ need a
> license header
> Likewise for airflow/www/templates/airflow/variables/README.md
>
> Nice to have:
> Fix the top-level dir in the tarball to be
> "apache-airflow-1.8.1-incubating" instead of
> "apache-airflow-1.8.1rc0+apache.incubating"
>
> For all the other binary files (images, gifs), is there source provenance
> for all of them and that all of them are covered by the licenses in the
> LICENSE file?
>
> Last point - are all the entries in the NOTICE file required or do they
> just need to be in the LICENSE file? Any additions to the NOTICE have
> downstream repercussions as they need to be propagated down by any other
> project using airflow.
>
> thanks
> -- Hitesh
>
>
>
> On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini 
> wrote:
>
> > Dear All,
> >
> > I have been able to make the Airflow 1.8.1 RC0 available at:
> > https://dist.apache.org/repos/dist/dev/incubator/airflow, public keys
> are
> > available at https://dist.apache.org/repos/
> dist/release/incubator/airflow.
> >
> > Issues fixed:
> >
> > [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
> > [AIRFLOW-1054] Fix broken import on test_dag
> > [AIRFLOW-1050] Retries ignored - regression
> > [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
> > [AIRFLOW-1030] HttpHook error when creating HttpSensor
> > [AIRFLOW-1017] get_task_instance should return None instead of th
> > [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
> > [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
> > [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
> > [AIRFLOW-989] Clear Task Regression
> > [AIRFLOW-974] airflow.util.file mkdir has a race condition
> > [AIRFLOW-906] Update Code icon from lightning bolt to file
> > [AIRFLOW-858] Configurable database name for DB operators
> > [AIRFLOW-853] ssh_execute_operator.py stdout decode default to A
> > [AIRFLOW-832] Fix debug server
> > [AIRFLOW-817] Trigger dag fails when using CLI + API
> > [AIRFLOW-816] Make sure to pull nvd3 from local resources
> > [AIRFLOW-815] Add previous/next execution dates to available def
> > [AIRFLOW-813] Fix unterminated unit tests in tests.job (tests/jo
> > [AIRFLOW-812] Scheduler job terminates when there is no dag file
> > [AIRFLOW-806] UI should properly ignore DAG doc when it is None
> > [AIRFLOW-794] Consistent access to DAGS_FOLDER and SQL_ALCHEMY_C
> > [AIRFLOW-785] ImportError if cgroupspy is not installed
> > [AIRFLOW-784] Cannot install with funcsigs > 1.0.0
> > [AIRFLOW-780] The UI no longer shows broken DAGs
> > [AIRFLOW-777] dag_is_running is initlialized to True instead of
> > [AIRFLOW-719] Skipped operations make DAG finish prematurely
> > [AIRFLOW-694] Empty env vars do not overwrite non-empty config v
> > [AIRFLOW-139] Executing VACUUM with PostgresOperator
> > [AIRFLOW-111] DAG concurrency is not honored
> > [AIRFLOW-88] Improve clarity Travis CI reports
> >
> > I would like to raise a VOTE for releasing 1.8.1 based on release
> candidate
> > 0, i.e. just renaming release candidate 0 to 1.8.1 release.
> >
> > Please respond to this email by:
> >
> > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you
> are
> > not.
> >
> > Vote will run for 72 hours (ends this Thursday).
> >
> > Thanks!
> > Chris
> >
> > My VOTE: +1 (binding)
> >
>


Re: Best practices on Long running process over LB

2017-04-18 Thread siddharth anand
Another approach :
1. Airflow calls webservice in a fire-and-forget fashion
2. Webservice updates a message bus/stream (e.g. SQS) with result
3. An Airflow sensor pulls updates off SQS and processes them

This saves Airflow from polling your webservice, which would in turn poll
your DB. Additionally, it avoids coupling your Airflow instance to the
availability of your webservice and DB. With polling, you'd also need to
implement an efficient HTTP endpoint that returns status for a potentially
long list of status_ids, and then you'd need to manage that list of ids.

SQS is great.  It's cheap to poll (and SQS supports long-polling as well)
and doesn't couple Airflow to the uptime of your webservice and DB. SQS
also supports batch reads and is transactional.
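
To make the three steps above concrete, here is a minimal sketch of the sensor side. It is not Airflow or boto3 code: `InMemoryQueue` is a hypothetical stand-in for an SQS queue so the example runs without AWS, and the message fields `job_id` and `status` are assumed names, not anything the webservice is known to emit.

```python
import json
from collections import deque


class InMemoryQueue:
    """Hypothetical stand-in for an SQS queue (assumption, not the boto3 API)."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        # The webservice would publish its result here when the job finishes.
        self._messages.append(json.dumps(body))

    def receive_batch(self, max_messages=10):
        # Return up to max_messages decoded messages, mimicking a batch read.
        batch = []
        while self._messages and len(batch) < max_messages:
            batch.append(json.loads(self._messages.popleft()))
        return batch


def poke(queue, pending_ids):
    """Sensor-style check: drain one batch, return True once nothing is pending.

    pending_ids is a set and is mutated: finished job ids are removed.
    """
    for message in queue.receive_batch():
        if message["status"] == "done":
            pending_ids.discard(message["job_id"])
    return not pending_ids


queue = InMemoryQueue()
pending = {"job-1", "job-2"}
queue.send({"job_id": "job-1", "status": "done"})
first = poke(queue, pending)    # job-2 has not reported yet
queue.send({"job_id": "job-2", "status": "done"})
second = poke(queue, pending)   # both jobs have now reported
```

In a real deployment the `poke` logic would live inside a sensor and the queue calls would go to SQS; the point of the sketch is only that the sensor never talks to the webservice or its DB.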

-s

On Tue, Apr 18, 2017 at 3:44 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> The proper way to do this is for your service to return a token (unique
> identifier for the long running process) asynchronously (immediately), and
> to then call another endpoint to check on the status while passing this
> token.
>
> Since this is Airflow and you have the luxury of having a lot of predefined
> sensors, you may just have to call a trigger endpoint async, and in the
> next task have a sensor look for the actual byproduct of that service's
> process (say if the process generates an S3 file, you'd have an S3Sensor
> right after the trigger task). The good thing with this approach is that
> this is more "stateless" than the approach where you are using a token (it
> allows for tasks to die without worrying about the token).
>
> Max
>
> On Tue, Apr 18, 2017 at 2:47 PM, Amit Jain  wrote:
>
> > Hi All,
> >
> > We have a use case where we are building Airflow DAG consisting of few
> > tasks and each task (HttpOperator) is calling the service running behind
> > AWS Elastic Load Balancer (ELB).
> >
> > Since these tasks are long-running processes, I'm getting 504 GATEWAY
> > TIMEOUT HTTP status codes, which results in an incorrect task status on
> > the Airflow side.
> >
> > IMO to solve this problem, we can choose among the following approaches:
> >
> >    - Make a call to the service; the service sends back a response
> >      immediately and processes the actual request in another
> >      thread/process. One monitoring thread would heartbeat the task
> >      status to the DB. On the Airflow side, immediately after each
> >      HttpOperator, we would have a sensor that checks for the status
> >      change at a given poke interval.
> >    - Since we have around 1500 tasks running per hour, use a service
> >      discovery system like Apache ZooKeeper to get a node in round-robin
> >      fashion and make a direct connection to the node running the service.
> >    - AWS ELB limits the HTTP idle timeout to 1 hr and my tasks
> >      take ~3 hr to finish, so no change at the AWS ELB is possible.
> >
> >
> > Both approaches have cons. The first one makes us change our current flow
> > at each service, i.e. handle requests in async mode and heartbeat the
> > executing process/thread status at some interval, hence the DB writes.
> >
> > I'm interested to know how you guys are handling this problem, and in any
> > suggestions or improvements to the approaches mentioned that I can use.
> >
> >
> > Thanks,
> > Amit
> >
>
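
For reference, the token approach Max describes in the quoted thread could be sketched roughly as below. `FakeService` is a hypothetical stand-in for the long-running webservice (its `trigger` and `status` methods are assumed names), and the loop omits the sleep a real sensor would do between pokes.

```python
import itertools


class FakeService:
    """Hypothetical long-running service: trigger() returns a token at once."""

    def __init__(self, polls_until_done):
        self._counter = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def trigger(self):
        # Start the work asynchronously and hand back a unique token.
        token = "token-{}".format(next(self._counter))
        self._remaining[token] = self._polls_until_done
        return token

    def status(self, token):
        # Each status call simulates some progress on the job.
        self._remaining[token] -= 1
        return "done" if self._remaining[token] <= 0 else "running"


def wait_for(service, token, max_pokes):
    """Check the status endpoint up to max_pokes times; True if it finished."""
    for _ in range(max_pokes):
        if service.status(token) == "done":
            return True
    return False


service = FakeService(polls_until_done=3)
token = service.trigger()
gave_up = wait_for(service, token, max_pokes=2)    # not finished after 2 pokes
finished = wait_for(service, token, max_pokes=5)   # finishes on the next poke
```

The trade-off Max points out still applies: the token has to be carried between tasks, whereas a sensor that looks for the service's byproduct needs no such state.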


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-18 Thread siddharth anand
https://issues.apache.org/jira/browse/AIRFLOW-1121

Jira filed.

On Tue, Apr 18, 2017 at 1:27 PM, siddharth anand  wrote:

> Sure. As soon as I get out of my meetings.
>
> -s
>
> On Tue, Apr 18, 2017 at 1:01 PM Chris Riccomini 
> wrote:
>
>> @Sid, can you open JIRA(s), and assign them as blockers to 1.8.1?
>>
>> On Tue, Apr 18, 2017 at 12:39 PM, siddharth anand 
>> wrote:
>>
>> > I've run into a regression with the webserver. It looks like the --pid
>> > argument is no longer honored in 1.8.1. The pid file is not being written
>> > out! As a result, monitd, which watches the processes mentioned in the pid
>> > file, keeps trying to spawn webservers.
>> >
>> > HISTTIMEFORMAT="%d/%m/%y %T "
>> > PYTHONPATH=/usr/local/agari/ep-pipeline/production/
>> > current/analysis/cluster/:/usr/local/agari/ep-pipeline/
>> > production/current/analysis/lookups/
>> > TMP=/data/tmp AIRFLOW_HOME=/data/airflow
>> > PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin airflow webserver -p
>> > 8080  --pid /data/airflow/pids/airflow-webserver.pid
>> >
>> > The "upgrade" process for 1.8.1 is not simply "pip install 1.8.1.
>> > tarball". It requires a "pip uninstall" of the previous 1.8.0 version
>> > followed by a new installation of 1.8.1. This could have pulled in some
>> > new dependencies that broke how this works.
>> >
>> > On Tue, Apr 18, 2017 at 12:32 PM, siddharth anand 
>> > wrote:
>> >
>> > > Hmn.. it always worked for me for any of the releases we installed. I
>> > > install `pip install `
>> > >
>> > > -s
>> > >
>> > > On Tue, Apr 18, 2017 at 10:44 AM, Chris Riccomini <
>> criccom...@apache.org
>> > >
>> > > wrote:
>> > >
>> > >> @Sid, how do you enable the versioning? I've never been able to get
>> this
>> > >> to
>> > >> work in my environment. It always shows "Not available", even with
>> > 1.8.0.
>> > >>
>> > >> On Mon, Apr 17, 2017 at 11:18 PM, Bolke de Bruin 
>> > >> wrote:
>> > >>
>> > >> > Hey Alex,
>> > >> >
>> > >> > I agree with you that they are nice to have, but as you mentioned
>> they
>> > >> are
>> > >> > not blockers. As we are moving towards time based releases I
>> suggest
>> > >> > marking them for 1.8.2 and cherry-picking them in your production.
>> > >> >
>> > >> > - Bolke.
>> > >> >
>> > >> > > On 18 Apr 2017, at 00:02, Alex Guziel wrote:
>> > >> > >
>> > >> > > Sorry about that. FWIW, these were recent and I don't think they
>> > were
>> > >> > > blockers but are nice to fix. Particularly, the tree one was
>> > forgotten
>> > >> > > about. I remember seeing it at the Airflow hackathon but I guess
>> I
>> > >> forgot
>> > >> > > to correct it.
>> > >> > >
>> > >> > > On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini <
>> > >> criccom...@apache.org
>> > >> > >
>> > >> > > wrote:
>> > >> > >
>> > >> > >> :(:(:( Why was this not included in 1.8.1 JIRA? I've been
>> emailing
>> > >> the
>> > >> > list
>> > >> > >> all last week
>> > >> > >>
>> > >> > >> On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
>> > >> > >> alex.guz...@airbnb.com.invalid> wrote:
>> > >> > >>
>> > >> > >>> I would say to include [1074] (
>> > >> > >>> https://github.com/apache/incubator-airflow/pull/2221) so we
>> > don't
>> > >> > have
>> > >> > >> a
>> > >> > >>> regression in the release after. I would also say
>> > >> > >>> https://github.com/apache/incubator-airflow/pull/2241 is semi
>> > >> > important
>> > >> > >>> but
>> > >> > >>> less so.
>> > >> > >>>
>> > >> > >>> On Mo

Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-18 Thread siddharth anand
Sure. As soon as I get out of my meetings.

-s

On Tue, Apr 18, 2017 at 1:01 PM Chris Riccomini 
wrote:

> @Sid, can you open JIRA(s), and assign them as blockers to 1.8.1?
>
> On Tue, Apr 18, 2017 at 12:39 PM, siddharth anand 
> wrote:
>
> > I've run into a regression with the webserver. It looks like the --pid
> > argument is no longer honored in 1.8.1. The pid file is not being written
> > out! As a result, monitd, which watches the processes mentioned in the pid
> > file, keeps trying to spawn webservers.
> >
> > HISTTIMEFORMAT="%d/%m/%y %T "
> > PYTHONPATH=/usr/local/agari/ep-pipeline/production/
> > current/analysis/cluster/:/usr/local/agari/ep-pipeline/
> > production/current/analysis/lookups/
> > TMP=/data/tmp AIRFLOW_HOME=/data/airflow
> > PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin airflow webserver -p
> > 8080  --pid /data/airflow/pids/airflow-webserver.pid
> >
> > The "upgrade" process for 1.8.1 is not simply "pip install 1.8.1.
> > tarball". It requires a "pip uninstall" of the previous 1.8.0 version
> > followed by a new installation of 1.8.1. This could have pulled in some
> > new dependencies that broke how this works.
> >
> > On Tue, Apr 18, 2017 at 12:32 PM, siddharth anand 
> > wrote:
> >
> > > Hmn.. it always worked for me for any of the releases we installed. I
> > > install `pip install `
> > >
> > > -s
> > >
> > > On Tue, Apr 18, 2017 at 10:44 AM, Chris Riccomini <
> criccom...@apache.org
> > >
> > > wrote:
> > >
> > >> @Sid, how do you enable the versioning? I've never been able to get
> this
> > >> to
> > >> work in my environment. It always shows "Not available", even with
> > 1.8.0.
> > >>
> > >> On Mon, Apr 17, 2017 at 11:18 PM, Bolke de Bruin 
> > >> wrote:
> > >>
> > >> > Hey Alex,
> > >> >
> > >> > I agree with you that they are nice to have, but as you mentioned
> they
> > >> are
> > >> > not blockers. As we are moving towards time based releases I suggest
> > >> > marking them for 1.8.2 and cherry-picking them in your production.
> > >> >
> > >> > - Bolke.
> > >> >
> > >> > > On 18 Apr 2017, at 00:02, Alex Guziel wrote:
> > >> > >
> > >> > > Sorry about that. FWIW, these were recent and I don't think they
> > were
> > >> > > blockers but are nice to fix. Particularly, the tree one was
> > forgotten
> > >> > > about. I remember seeing it at the Airflow hackathon but I guess I
> > >> forgot
> > >> > > to correct it.
> > >> > >
> > >> > > On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini <
> > >> criccom...@apache.org
> > >> > >
> > >> > > wrote:
> > >> > >
> > >> > >> :(:(:( Why was this not included in 1.8.1 JIRA? I've been
> emailing
> > >> the
> > >> > list
> > >> > >> all last week
> > >> > >>
> > >> > >> On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
> > >> > >> alex.guz...@airbnb.com.invalid> wrote:
> > >> > >>
> > >> > >>> I would say to include [1074] (
> > >> > >>> https://github.com/apache/incubator-airflow/pull/2221) so we
> > don't
> > >> > have
> > >> > >> a
> > >> > >>> regression in the release after. I would also say
> > >> > >>> https://github.com/apache/incubator-airflow/pull/2241 is semi
> > >> > important
> > >> > >>> but
> > >> > >>> less so.
> > >> > >>>
> > >> > >>> On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini <
> > >> > criccom...@apache.org
> > >> > >>>
> > >> > >>> wrote:
> > >> > >>>
> > >> > >>>> Dear All,
> > >> > >>>>
> > >> > >>>> I have been able to make the Airflow 1.8.1 RC0 available at:
> > >> > >>>> https://dist.apache.org/repos/dist/dev/incubator/airflow,
> public
> > >> keys

Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-18 Thread siddharth anand
I've run into a regression with the webserver. It looks like the --pid
argument is no longer honored in 1.8.1. The pid file is not being written
out! As a result, monitd, which watches the processes mentioned in the pid
file, keeps trying to spawn webservers.

HISTTIMEFORMAT="%d/%m/%y %T "
PYTHONPATH=/usr/local/agari/ep-pipeline/production/current/analysis/cluster/:/usr/local/agari/ep-pipeline/production/current/analysis/lookups/
TMP=/data/tmp AIRFLOW_HOME=/data/airflow
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin airflow webserver -p
8080  --pid /data/airflow/pids/airflow-webserver.pid

The "upgrade" process for 1.8.1 is not simply "pip install 1.8.1.
tarball". It requires a "pip uninstall" of the previous 1.8.0 version
followed by a new installation of 1.8.1. This could have pulled in some
new dependencies that broke how this works.
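
A quick way to confirm whether the pid file is actually being written is a small check like the one below. This is only a sketch under assumptions: the helper names `read_pid` and `is_running` are made up for illustration, the process check is POSIX-only, and the demo writes a temporary file rather than touching the real webserver pid path.

```python
import os
import tempfile


def read_pid(pid_path):
    """Return the integer pid stored in pid_path, or None if missing/invalid."""
    try:
        with open(pid_path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """True if a process with this pid exists (POSIX only)."""
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# Demo: write this process's own pid to a temporary file and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as handle:
    handle.write(str(os.getpid()))
    demo_path = handle.name
demo_pid = read_pid(demo_path)
demo_alive = is_running(demo_pid)
os.remove(demo_path)
missing_pid = read_pid(demo_path)  # file is gone, so this is None
```

Pointing `read_pid` at the path passed to `--pid` after starting the webserver would show `None` if the file is never written, which is the symptom described above.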

On Tue, Apr 18, 2017 at 12:32 PM, siddharth anand  wrote:

> Hmn.. it always worked for me for any of the releases we installed. I
> install `pip install `
>
> -s
>
> On Tue, Apr 18, 2017 at 10:44 AM, Chris Riccomini 
> wrote:
>
>> @Sid, how do you enable the versioning? I've never been able to get this
>> to
>> work in my environment. It always shows "Not available", even with 1.8.0.
>>
>> On Mon, Apr 17, 2017 at 11:18 PM, Bolke de Bruin 
>> wrote:
>>
>> > Hey Alex,
>> >
>> > I agree with you that they are nice to have, but as you mentioned they
>> are
>> > not blockers. As we are moving towards time based releases I suggest
>> > marking them for 1.8.2 and cherry-picking them in your production.
>> >
>> > - Bolke.
>> >
>> > > On 18 Apr 2017, at 00:02, Alex Guziel wrote:
>> > >
>> > > Sorry about that. FWIW, these were recent and I don't think they were
>> > > blockers but are nice to fix. Particularly, the tree one was forgotten
>> > > about. I remember seeing it at the Airflow hackathon but I guess I
>> forgot
>> > > to correct it.
>> > >
>> > > On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini <
>> criccom...@apache.org
>> > >
>> > > wrote:
>> > >
>> > >> :(:(:( Why was this not included in 1.8.1 JIRA? I've been emailing
>> the
>> > list
>> > >> all last week
>> > >>
>> > >> On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
>> > >> alex.guz...@airbnb.com.invalid> wrote:
>> > >>
>> > >>> I would say to include [1074] (
>> > >>> https://github.com/apache/incubator-airflow/pull/2221) so we don't
>> > have
>> > >> a
>> > >>> regression in the release after. I would also say
>> > >>> https://github.com/apache/incubator-airflow/pull/2241 is semi
>> > important
>> > >>> but
>> > >>> less so.
>> > >>>
>> > >>> On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini <
>> > criccom...@apache.org
>> > >>>
>> > >>> wrote:
>> > >>>
>> > >>>> Dear All,
>> > >>>>
>> > >>>> I have been able to make the Airflow 1.8.1 RC0 available at:
>> > >>>> https://dist.apache.org/repos/dist/dev/incubator/airflow, public
>> keys
>> > >>> are
>> > >>>> available at https://dist.apache.org/repos/
>> > >>> dist/release/incubator/airflow.
>> > >>>>
>> > >>>> Issues fixed:
>> > >>>>
>> > >>>> [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
>> > >>>> [AIRFLOW-1054] Fix broken import on test_dag
>> > >>>> [AIRFLOW-1050] Retries ignored - regression
>> > >>>> [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
>> > >>>> [AIRFLOW-1030] HttpHook error when creating HttpSensor
>> > >>>> [AIRFLOW-1017] get_task_instance should return None instead of th
>> > >>>> [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
>> > >>>> [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
>> > >>>> [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
>> > >>>> [AIRFLOW-989] Clear Task Regression
>> > >>>> [AIRFLOW-974] airflow.util.file mkdir has a race condition

Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-18 Thread siddharth anand
Hmn.. it always worked for me for any of the releases we installed. I
install `pip install `

-s

On Tue, Apr 18, 2017 at 10:44 AM, Chris Riccomini 
wrote:

> @Sid, how do you enable the versioning? I've never been able to get this to
> work in my environment. It always shows "Not available", even with 1.8.0.
>
> On Mon, Apr 17, 2017 at 11:18 PM, Bolke de Bruin 
> wrote:
>
> > Hey Alex,
> >
> > I agree with you that they are nice to have, but as you mentioned they
> are
> > not blockers. As we are moving towards time based releases I suggest
> > marking them for 1.8.2 and cherry-picking them in your production.
> >
> > - Bolke.
> >
> > > On 18 Apr 2017, at 00:02, Alex Guziel 
> > wrote:
> > >
> > > Sorry about that. FWIW, these were recent and I don't think they were
> > > blockers but are nice to fix. Particularly, the tree one was forgotten
> > > about. I remember seeing it at the Airflow hackathon but I guess I
> forgot
> > > to correct it.
> > >
> > > On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini <
> criccom...@apache.org
> > >
> > > wrote:
> > >
> > >> :(:(:( Why was this not included in 1.8.1 JIRA? I've been emailing the
> > list
> > >> all last week
> > >>
> > >> On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
> > >> alex.guz...@airbnb.com.invalid> wrote:
> > >>
> > >>> I would say to include [1074] (
> > >>> https://github.com/apache/incubator-airflow/pull/2221) so we don't
> > have
> > >> a
> > >>> regression in the release after. I would also say
> > >>> https://github.com/apache/incubator-airflow/pull/2241 is semi
> > important
> > >>> but
> > >>> less so.
> > >>>
> > >>> On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini <
> > criccom...@apache.org
> > >>>
> > >>> wrote:
> > >>>
> > >>>> Dear All,
> > >>>>
> > >>>> I have been able to make the Airflow 1.8.1 RC0 available at:
> > >>>> https://dist.apache.org/repos/dist/dev/incubator/airflow, public
> keys
> > >>> are
> > >>>> available at https://dist.apache.org/repos/
> > >>> dist/release/incubator/airflow.
> > >>>>
> > >>>> Issues fixed:
> > >>>>
> > >>>> [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
> > >>>> [AIRFLOW-1054] Fix broken import on test_dag
> > >>>> [AIRFLOW-1050] Retries ignored - regression
> > >>>> [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
> > >>>> [AIRFLOW-1030] HttpHook error when creating HttpSensor
> > >>>> [AIRFLOW-1017] get_task_instance should return None instead of th
> > >>>> [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
> > >>>> [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
> > >>>> [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
> > >>>> [AIRFLOW-989] Clear Task Regression
> > >>>> [AIRFLOW-974] airflow.util.file mkdir has a race condition
> > >>>> [AIRFLOW-906] Update Code icon from lightning bolt to file
> > >>>> [AIRFLOW-858] Configurable database name for DB operators
> > >>>> [AIRFLOW-853] ssh_execute_operator.py stdout decode default to A
> > >>>> [AIRFLOW-832] Fix debug server
> > >>>> [AIRFLOW-817] Trigger dag fails when using CLI + API
> > >>>> [AIRFLOW-816] Make sure to pull nvd3 from local resources
> > >>>> [AIRFLOW-815] Add previous/next execution dates to available def
> > >>>> [AIRFLOW-813] Fix unterminated unit tests in tests.job (tests/jo
> > >>>> [AIRFLOW-812] Scheduler job terminates when there is no dag file
> > >>>> [AIRFLOW-806] UI should properly ignore DAG doc when it is None
> > >>>> [AIRFLOW-794] Consistent access to DAGS_FOLDER and SQL_ALCHEMY_C
> > >>>> [AIRFLOW-785] ImportError if cgroupspy is not installed
> > >>>> [AIRFLOW-784] Cannot install with funcsigs > 1.0.0
> > >>>> [AIRFLOW-780] The UI no longer shows broken DAGs
> > >>>> [AIRFLOW-777] dag_is_running is initlialized to True instead of
> > >>>> [AIRFLOW-719] Skipped operations make DAG finish prematurely
> > >>>> [AIRFLOW-694] Empty env vars do not overwrite non-empty config v
> > >>>> [AIRFLOW-139] Executing VACUUM with PostgresOperator
> > >>>> [AIRFLOW-111] DAG concurrency is not honored
> > >>>> [AIRFLOW-88] Improve clarity Travis CI reports
> > >>>>
> > >>>> I would like to raise a VOTE for releasing 1.8.1 based on release
> > >>> candidate
> > >>>> 0, i.e. just renaming release candidate 0 to 1.8.1 release.
> > >>>>
> > >>>> Please respond to this email by:
> > >>>>
> > >>>> +1,0,-1 with *binding* if you are a PMC member or *non-binding* if
> you
> > >>> are
> > >>>> not.
> > >>>>
> > >>>> Vote will run for 72 hours (ends this Thursday).
> > >>>>
> > >>>> Thanks!
> > >>>> Chris
> > >>>>
> > >>>> My VOTE: +1 (binding)
> > >>>>
> > >>>
> > >>
> >
> >
>


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-17 Thread siddharth anand
Just installed the rc in our staging env and letting it bake.

FYI, I noticed that the version is not available at the following URI :
/admin/versionview/
Here's a screenshot :
https://www.dropbox.com/s/shdzwadb8klqwt2/Screenshot%202017-04-17%2020.57.06.png?dl=0

I installed via pip!
-s

On Mon, Apr 17, 2017 at 3:02 PM, Alex Guziel  wrote:

> Sorry about that. FWIW, these were recent and I don't think they were
> blockers but are nice to fix. Particularly, the tree one was forgotten
> about. I remember seeing it at the Airflow hackathon but I guess I forgot
> to correct it.
>
> On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini 
> wrote:
>
> > :(:(:( Why was this not included in 1.8.1 JIRA? I've been emailing the
> list
> > all last week
> >
> > On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
> > alex.guz...@airbnb.com.invalid> wrote:
> >
> > > I would say to include [1074] (
> > > https://github.com/apache/incubator-airflow/pull/2221) so we don't
> have
> > a
> > > regression in the release after. I would also say
> > > https://github.com/apache/incubator-airflow/pull/2241 is semi
> important
> > > but
> > > less so.
> > >
> > > On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini <
> criccom...@apache.org
> > >
> > > wrote:
> > >
> > > > Dear All,
> > > >
> > > > I have been able to make the Airflow 1.8.1 RC0 available at:
> > > > https://dist.apache.org/repos/dist/dev/incubator/airflow, public
> keys
> > > are
> > > > available at https://dist.apache.org/repos/
> > > dist/release/incubator/airflow.
> > > >
> > > > Issues fixed:
> > > >
> > > > [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
> > > > [AIRFLOW-1054] Fix broken import on test_dag
> > > > [AIRFLOW-1050] Retries ignored - regression
> > > > [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
> > > > [AIRFLOW-1030] HttpHook error when creating HttpSensor
> > > > [AIRFLOW-1017] get_task_instance should return None instead of th
> > > > [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
> > > > [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
> > > > [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
> > > > [AIRFLOW-989] Clear Task Regression
> > > > [AIRFLOW-974] airflow.util.file mkdir has a race condition
> > > > [AIRFLOW-906] Update Code icon from lightning bolt to file
> > > > [AIRFLOW-858] Configurable database name for DB operators
> > > > [AIRFLOW-853] ssh_execute_operator.py stdout decode default to A
> > > > [AIRFLOW-832] Fix debug server
> > > > [AIRFLOW-817] Trigger dag fails when using CLI + API
> > > > [AIRFLOW-816] Make sure to pull nvd3 from local resources
> > > > [AIRFLOW-815] Add previous/next execution dates to available def
> > > > [AIRFLOW-813] Fix unterminated unit tests in tests.job (tests/jo
> > > > [AIRFLOW-812] Scheduler job terminates when there is no dag file
> > > > [AIRFLOW-806] UI should properly ignore DAG doc when it is None
> > > > [AIRFLOW-794] Consistent access to DAGS_FOLDER and SQL_ALCHEMY_C
> > > > [AIRFLOW-785] ImportError if cgroupspy is not installed
> > > > [AIRFLOW-784] Cannot install with funcsigs > 1.0.0
> > > > [AIRFLOW-780] The UI no longer shows broken DAGs
> > > > [AIRFLOW-777] dag_is_running is initlialized to True instead of
> > > > [AIRFLOW-719] Skipped operations make DAG finish prematurely
> > > > [AIRFLOW-694] Empty env vars do not overwrite non-empty config v
> > > > [AIRFLOW-139] Executing VACUUM with PostgresOperator
> > > > [AIRFLOW-111] DAG concurrency is not honored
> > > > [AIRFLOW-88] Improve clarity Travis CI reports
> > > >
> > > > I would like to raise a VOTE for releasing 1.8.1 based on release
> > > candidate
> > > > 0, i.e. just renaming release candidate 0 to 1.8.1 release.
> > > >
> > > > Please respond to this email by:
> > > >
> > > > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if
> you
> > > are
> > > > not.
> > > >
> > > > Vote will run for 72 hours (ends this Thursday).
> > > >
> > > > Thanks!
> > > > Chris
> > > >
> > > > My VOTE: +1 (binding)
> > > >
> > >
> >
>


Re: Welcome @saguziel as a committer and PMC member!

2017-04-17 Thread siddharth anand
Welcome Alex!

On Fri, Apr 14, 2017 at 8:23 AM, Chris Riccomini 
wrote:

> Congrats, Alex! Welcome. :)
>
> On Thu, Apr 13, 2017 at 7:06 PM, Dan Davydov wrote:
>
> > Alex (@saguziel - AirBnB) has been making contributions and reviews for
> > quite a long time now and I'm very happy to say he has just become an
> > official committer and PMC member.
> >
> > He has ~13 commits, most of which are to the core of Airflow, and has
> been
> > active reviewing open source PRs, contributing in the recent release
> (e.g.
> > fixing blocking issues), and has a strong understanding of the core
> > Airflow logic (he has submitted a couple of patches to remove race
> > conditions, and security patches).
> >
> > Congratulations and welcome Alex!
> > -Dan
> >
>


New Apache Airflow meetup : in Tokyo

2017-04-12 Thread siddharth anand
Live in Tokyo & want to contribute to @ApacheAirflow? Check out our new
Tokyo meetup : http://bit.ly/2o7jXWF . First meetup on May 11 :
https://www.meetup.com/Tokyo-Apache-Airflow-incubating-Meetup/events/238731591/

Thanks to Kengo Seki (@sekikn) for taking the lead on this!
-s


Re: Cleanup

2017-04-05 Thread siddharth anand
Edgardo,
This is a great question and something that requires functionality to
address. As Airflow starts getting used for bigger workloads, we need a way
to clean up defunct resources.

   - How do we delete a dag and its related resources?
      - Until the recent release, the way that I stopped having a defunct
        (retired) dag show up in the UI was to move the DAG file out of the
        dag_folder or just delete it from Git. Our dag folders are just
        symlinks to tagged Git repos.
      - This no longer works -- the UI will display the dag list based on
        entries in the dag table in the airflow metadata db -- but will no
        longer have code to back that dag table entry. I currently manually
        delete a row from the dag table, but that is surely not the right
        thing to do.
   - How do we retire entries from the *task_instance, job, log, xcom,
     sla_miss, dag_stats,* and *dag_run* tables for dags that are deleted?
     (I can surely clean these up manually as well, but we need a UI
     control).
      - The *task_instance, job, log,* and *dag_run* tables grow faster
        than the others.
   - How does one track if variables, connections, or pools are no
     longer referenced because all of the DAGs that use them are gone?
      - It would be nice here to have reference counts & links to DAGs
        that reference a Pool, Connection, or Variable. The reference
        counts can be broken down into paused & unpaused.

It's time we added some functionality to the API/CLI/UI to address these
functionality gaps.
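
As a rough illustration of what a "delete dag" control would have to do, here is a sketch that removes every row for one dag across the tables listed above, in a single transaction. It runs against an in-memory SQLite database with a deliberately simplified schema: the table names come from this message, but the single `dag_id` column per table and the helper names are assumptions for the example, not the real Airflow metadata schema.

```python
import sqlite3

# Table names are from the message above; one dag_id column per table is a
# simplifying assumption for this sketch, not the real Airflow schema.
DAG_TABLES = [
    "task_instance", "job", "log", "xcom",
    "sla_miss", "dag_stats", "dag_run", "dag",
]


def make_demo_db():
    """Build an in-memory database with one retired and one live dag."""
    conn = sqlite3.connect(":memory:")
    for table in DAG_TABLES:
        conn.execute("CREATE TABLE {} (dag_id TEXT)".format(table))
        conn.executemany(
            "INSERT INTO {} VALUES (?)".format(table),
            [("retired_dag",), ("live_dag",)],
        )
    return conn


def delete_dag(conn, dag_id):
    """Delete all rows for dag_id; return the number removed per table."""
    removed = {}
    with conn:  # commits on success, rolls back if any statement fails
        for table in DAG_TABLES:
            cursor = conn.execute(
                "DELETE FROM {} WHERE dag_id = ?".format(table), (dag_id,)
            )
            removed[table] = cursor.rowcount
    return removed


conn = make_demo_db()
removed = delete_dag(conn, "retired_dag")
remaining = conn.execute(
    "SELECT COUNT(*) FROM dag WHERE dag_id = 'live_dag'"
).fetchone()[0]
```

Doing the deletes in one transaction is the part worth keeping from the sketch: it avoids the half-cleaned state that manual row-by-row deletion can leave behind.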

-s

On Tue, Apr 4, 2017 at 10:25 AM, Edgardo Vega 
wrote:

> Max,
>
> Thanks for the reply, it is much appreciated.  I am currently running ~10k
> tasks a day in our test environment.
>
> It is good to know where the archive point is and that I shouldn't have any
> issues for a long time.
>
> I was just thinking ahead as we get airflow into a production environment.
> Maybe in this case way too far ahead.
>
>
> Cheers,
>
> Edgardo
>
> On Tue, Apr 4, 2017 at 11:58 AM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > We run ~50k tasks a day at Airbnb. How many tasks/day are you planning on
> > running?
> >
> > You can archive the `task_instance` and `job` tables down the line,
> > but that shouldn't be a concern until you hit tens of millions of entries.
> > Then you can set up a daily Airflow job that archives some of these
> > entries. I believe we do it based on `start_date` and move rows to some
> > other table in the same db.
> >
> > Max
> >
> > On Mon, Apr 3, 2017 at 5:30 PM, Edgardo Vega 
> > wrote:
> >
> > > I have been playing with airflow for a few days and it's not obvious
> > > what will happen down the road when we have lots of dags over a long
> > > period of time. I set a fake dag to run once a minute for a few days
> > > and everything seems okay except the graph view dropdown, which works
> > > but takes a few seconds to show up.
> > >
> > > Is there a way to roll older data out of the system in order to clean
> > > things up visually and keep the database at a smallish size?
> > >
> > > --
> > > Cheers,
> > >
> > > Edgardo
> > >
> >
>
>
>
> --
> Cheers,
>
> Edgardo
>
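
The archiving approach Max describes in this thread (move old rows to another table in the same db, keyed on start_date) could look roughly like the sketch below. This is an illustration under assumptions, not Airbnb's actual job: the archive table name task_instance_archive, the 30-day cutoff, and the three-column schema are all made up for the example, and SQLite stands in for the real metadata db.

```python
import sqlite3
from datetime import datetime, timedelta


def archive_task_instances(conn, cutoff):
    """Move task_instance rows that started before cutoff into the archive table."""
    cutoff_text = cutoff.isoformat(" ")
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO task_instance_archive "
            "SELECT * FROM task_instance WHERE start_date < ?", (cutoff_text,))
        cur = conn.execute(
            "DELETE FROM task_instance WHERE start_date < ?", (cutoff_text,))
    return cur.rowcount


# Simplified stand-in schema (assumption), with start_date stored as ISO text.
conn = sqlite3.connect(":memory:")
for table in ("task_instance", "task_instance_archive"):
    conn.execute(
        "CREATE TABLE {} (task_id TEXT, dag_id TEXT, start_date TEXT)".format(table))

now = datetime(2017, 4, 4)
for age_days in (1, 10, 40, 100, 400):
    started = (now - timedelta(days=age_days)).isoformat(" ")
    conn.execute("INSERT INTO task_instance VALUES (?, ?, ?)",
                 ("task_{}".format(age_days), "example_dag", started))
conn.commit()

moved = archive_task_instances(conn, now - timedelta(days=30))
print(moved)
```

In a real deployment the function body would be the callable of a daily task, and the cutoff would be derived from the run's execution date rather than a fixed timestamp.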


Re: PTAL: Airflow 2017 April Podling Report

2017-04-05 Thread siddharth anand
Just reviewed the report. Looks great! It is ready for sign-off from our
mentors.
-s

On Wed, Apr 5, 2017 at 3:49 PM, Arthur Wiedmer 
wrote:

> I added the following to the report, as it seemed like it was a required
> question :
>
> How would you assess the podling's maturity?
> Please feel free to add your own commentary.
>
>   [ ] Initial setup
>   [ ] Working towards first release
>   [ ] Community building
>   [X] Nearing graduation
>   [ ] Other:
>
> The Airflow community continues to grow, and we have successfully created
> our first Apache release. We want to continue on this momentum and
> create another release to solidify our process and tools around it,
> but we feel we are nearing graduation. We are open to feedback and
> guidance to make sure we can do so.
>
> Does this look OK ? I only have until tonight to change it.
> To view the report : https://wiki.apache.org/incubator/April2017
>
> Best,
> Arthur
>
>
>
> On Wed, Apr 5, 2017 at 11:57 AM, Gurer Kiratli <
> gurer.kira...@airbnb.com.invalid> wrote:
>
> > Hi folks,
> >
> > Here is the draft of the podling report. Please take a look and comment.
> > If it looks good, one of the committers has to post this on the wiki
> > today!
> >
> > Cheers,
> >
> > Gurer
> >
> >
> >
> > >>
> >
> > Airflow
> >
> > Airflow is a workflow automation and scheduling system that can be used
> > to author and manage data pipelines.
> >
> > Airflow has been incubating since 2016-03-31.
> >
> > Three most important issues to address in the move towards graduation:
> >
> >   1. We will have the 1.8.1 release soon, then we are looking to
> >      graduate.
> >   2.
> >   3.
> >
> > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> > aware of?
> >
> >   None
> >
> > How has the community developed since the last report?
> >
> >   1. We had our first official release, 1.8.0, on March 19th 2017.
> >   2. We elected 1 new PPMC Member/Committer: Alex Guziel (a.k.a saguziel)
> >   3. Since our last podling report 3 months ago (i.e. between Jan 1 and
> >      Mar 31, inclusive), we grew our contributors from 224 to 256
> >   4. Since our last podling report 3 months ago (i.e. between Jan 1 and
> >      Mar 31, inclusive), we resolved 216 pull requests (currently at 1479
> >      closed PRs)
> >   5. Two meet-ups, one in New York, NY hosted by Blue Apron and one in
> >      San Jose, CA hosted by PayPal, were held by the community.
> >   6. Since being accepted into the incubator, the number of companies
> >      officially using Apache Airflow has risen from 30 to 83.
> >
> > How has the project developed since the last report?
> >
> >   See above
> >
> > Date of last release:
> >
> >   March 19th, 2017
> >
> > When were the last committers or PPMC members elected?
> >
> >   As mentioned on
> >   https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-Mar14,2017
> >   Alex Guziel joined the Apache Airflow PPMC/Committer group.
> >
> > Signed-off-by:
> >
> >   [](airflow) Chris Nauroth
> >   [](airflow) Hitesh Shah
> >   [ ](airflow) Jakob Homan
> >
>


Re: 1.8.1 release

2017-03-30 Thread siddharth anand
Chris,
I've submitted PRs for :

   - PR [AIRFLOW-1013] :
   https://github.com/apache/incubator-airflow/pull/2203
   - PR [AIRFLOW-1054]:
   https://github.com/apache/incubator-airflow/pull/2201

And filed a blocker for a new issue. Essentially, @once DAGs cannot be
created if catchup=False :
https://issues.apache.org/jira/browse/AIRFLOW-1055

I have a PR that works for this, but will need to add unit tests for it as
well as for AIRFLOW-1013.

-s

On Wed, Mar 29, 2017 at 3:24 PM, siddharth anand  wrote:

> Didn't realize https://issues.apache.org/jira/browse/AIRFLOW-1013 was a
> blocker. I will have a PR shortly.
> -s
>
> On Wed, Mar 29, 2017 at 2:07 PM, Chris Riccomini 
> wrote:
>
>> The following four JIRAs were not merged into the v1-8-test branch, but
>> are listed as part of the 1.8.1 release:
>>
>> AIRFLOW-1017 b2b9587cca9195229ab107394ad94b7702c70e37
>> AIRFLOW-906 bc47200711be4d2c0b36b772651dae4f5e01a204
>> AIRFLOW-858 94dc7fb0a6bb3c563d9df6566cd52a59bd0c4629
>> AIRFLOW-832 b0ae70d3a8e935dc9266b6853683ae5375a7390b
>>
>> I'm going to merge them in now.
>>
>> On Wed, Mar 29, 2017 at 1:53 PM, Chris Riccomini 
>> wrote:
>>
>> > Hey Bolke,
>> >
>> > Great. Assuming your PR is committed, that leaves five blockers:
>> >
>> > https://issues.apache.org/jira/browse/AIRFLOW-1000
>> > https://issues.apache.org/jira/browse/AIRFLOW-1001
>> > https://issues.apache.org/jira/browse/AIRFLOW-1013
>> > https://issues.apache.org/jira/browse/AIRFLOW-1018
>> > https://issues.apache.org/jira/browse/AIRFLOW-1019
>> >
>> > I've also got a list of all open 1.8.1 JIRAs [1].
>> >
>> > Cheers,
>> > Chris
>> >
>> > [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.8.1
>> >
>> > On Mon, Mar 27, 2017 at 8:59 PM, Bolke de Bruin 
>> wrote:
>> >
>> >> Hi Chris,
>> >>
>> >> I have a PR out for
>> >>
>> >> * Revert of 719, which makes 982 obsolete and removes 983 from the
>> >> blockers list and just a new feature.
>> >>
>> >> See: https://github.com/apache/incubator-airflow/pull/2195 <
>> >> https://github.com/apache/incubator-airflow/pull/2195>
>> >>
>> >> Cc: @alexvanboxel
>> >>
>> >> Bolke
>> >>
>> >> > On 24 Mar 2017, at 10:21, Chris Riccomini 
>> >> wrote:
>> >> >
>> >> > Hey all,
>> >> >
>> >> > I've let this thread sit for a while. Here is a list of the issues
>> >> > that were raised:
>> >> >
>> >> > BLOCKERS:
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-982
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-983
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1019
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1017
>> >> >
>> >> > NICE TO HAVE:
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1015
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1013
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1004
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1003
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1001
>> >> >
>> >> > It looks like AIRFLOW-1017 is done, though the JIRA is not closed.
>> >> >
>> >> > The rest remain open. I will wait on the release until the remaining
>> >> > blockers are finished. Dan/Daniel, can you comment on status?
>> >> >
>> >> > Ruslan, if you want to work on your nice to haves, and submit
>> patches,
>> >> > that's great, otherwise I don't believe they'll get fixed as part of
>> >> 1.8.1.
>> >> >
>> >> > Cheers,
>> >> > Chris
>> >> >
>> >> > On Wed, Mar 22, 2017 at 9:19 AM, Ruslan Dautkhanov <
>> >> dautkha...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Thank you Sid!
>> >> >>
>> >> >>
>> >> >> Best regards,
>> >> >> Ruslan
>> >> >>
>> >> >> On Wed, Mar 22, 2017 at 12:01 AM, siddharth anand <
>> san...@apache.org>
>> 

Re: 1.8.1 release

2017-03-29 Thread siddharth anand
Didn't realize https://issues.apache.org/jira/browse/AIRFLOW-1013 was a
blocker. I will have a PR shortly.
-s

On Wed, Mar 29, 2017 at 2:07 PM, Chris Riccomini 
wrote:

> The following four JIRAs were not merged into the v1-8-test branch, but
> are listed as part of the 1.8.1 release:
>
> AIRFLOW-1017 b2b9587cca9195229ab107394ad94b7702c70e37
> AIRFLOW-906 bc47200711be4d2c0b36b772651dae4f5e01a204
> AIRFLOW-858 94dc7fb0a6bb3c563d9df6566cd52a59bd0c4629
> AIRFLOW-832 b0ae70d3a8e935dc9266b6853683ae5375a7390b
>
> I'm going to merge them in now.
>
> On Wed, Mar 29, 2017 at 1:53 PM, Chris Riccomini 
> wrote:
>
> > Hey Bolke,
> >
> > Great. Assuming your PR is committed, that leaves five blockers:
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-1000
> > https://issues.apache.org/jira/browse/AIRFLOW-1001
> > https://issues.apache.org/jira/browse/AIRFLOW-1013
> > https://issues.apache.org/jira/browse/AIRFLOW-1018
> > https://issues.apache.org/jira/browse/AIRFLOW-1019
> >
> > I've also got a list of all open 1.8.1 JIRAs [1].
> >
> > Cheers,
> > Chris
> >
> > [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20AIRFLOW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.8.1
> >
> > On Mon, Mar 27, 2017 at 8:59 PM, Bolke de Bruin 
> wrote:
> >
> >> Hi Chris,
> >>
> >> I have a PR out for
> >>
> >> * Revert of 719, which makes 982 obsolete and removes 983 from the
> >> blockers list and just a new feature.
> >>
> >> See: https://github.com/apache/incubator-airflow/pull/2195 <
> >> https://github.com/apache/incubator-airflow/pull/2195>
> >>
> >> Cc: @alexvanboxel
> >>
> >> Bolke
> >>
> >> > On 24 Mar 2017, at 10:21, Chris Riccomini 
> >> wrote:
> >> >
> >> > Hey all,
> >> >
> >> > I've let this thread sit for a while. Here is a list of the issues
> >> > that were raised:
> >> >
> >> > BLOCKERS:
> >> > https://issues.apache.org/jira/browse/AIRFLOW-982
> >> > https://issues.apache.org/jira/browse/AIRFLOW-983
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1019
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1017
> >> >
> >> > NICE TO HAVE:
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1015
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1013
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1004
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1003
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1001
> >> >
> >> > It looks like AIRFLOW-1017 is done, though the JIRA is not closed.
> >> >
> >> > The rest remain open. I will wait on the release until the remaining
> >> > blockers are finished. Dan/Daniel, can you comment on status?
> >> >
> >> > Ruslan, if you want to work on your nice to haves, and submit patches,
> >> > that's great, otherwise I don't believe they'll get fixed as part of
> >> 1.8.1.
> >> >
> >> > Cheers,
> >> > Chris
> >> >
> >> > On Wed, Mar 22, 2017 at 9:19 AM, Ruslan Dautkhanov <
> >> dautkha...@gmail.com>
> >> > wrote:
> >> >
> >> >> Thank you Sid!
> >> >>
> >> >>
> >> >> Best regards,
> >> >> Ruslan
> >> >>
> >> >> On Wed, Mar 22, 2017 at 12:01 AM, siddharth anand  >
> >> >> wrote:
> >> >>
> >> >>> Ruslan,
> >> >>> Thanks for sharing this list. I can pick a few up. I agree we should
> >> aim
> >> >> to
> >> >>> get some of them into 1.8.1.
> >> >>>
> >> >>> -s
> >> >>>
> >> >>> On Tue, Mar 21, 2017 at 2:29 PM, Ruslan Dautkhanov <
> >> dautkha...@gmail.com
> >> >>>
> >> >>> wrote:
> >> >>>
> >> >>>> Some of the issues I ran into while testing 1.8rc5 :
> >> >>>>
> >> >>>> https://issues.apache.org/jira/browse/AIRFLOW-1015
> >> >>>>> https://issues.apache.org/jira/browse/AIRFLOW-1013
> >> >>>>> https://issues.apache.org/jira/browse/AIRFLOW-1004
> >> >>>>>

Re: 1.8.1 release

2017-03-21 Thread siddharth anand
Ruslan,
Thanks for sharing this list. I can pick a few up. I agree we should aim to
get some of them into 1.8.1.

-s

On Tue, Mar 21, 2017 at 2:29 PM, Ruslan Dautkhanov 
wrote:

> Some of the issues I ran into while testing 1.8rc5 :
>
> https://issues.apache.org/jira/browse/AIRFLOW-1015
> > https://issues.apache.org/jira/browse/AIRFLOW-1013
> > https://issues.apache.org/jira/browse/AIRFLOW-1004
> > https://issues.apache.org/jira/browse/AIRFLOW-1003
> > https://issues.apache.org/jira/browse/AIRFLOW-1001
> > https://issues.apache.org/jira/browse/AIRFLOW-1015
>
>
> It would be great to have at least some of them fixed in 1.8.1.
>
> Thank you.
>
>
>
>
> --
> Ruslan Dautkhanov
>
> On Tue, Mar 21, 2017 at 3:02 PM, Dan Davydov  invalid
> > wrote:
>
> > Here is my list for targeted 1.8.1 fixes:
> > https://issues.apache.org/jira/browse/AIRFLOW-982
> > https://issues.apache.org/jira/browse/AIRFLOW-983
> > https://issues.apache.org/jira/browse/AIRFLOW-1019 (and in general the
> > slow
> > startup time from this new logic of orphaned/reset task)
> > https://issues.apache.org/jira/browse/AIRFLOW-1017 (which I will
> hopefully
> > have a fix out for soon just finishing up tests)
> >
> > We are also hitting a new issue with subdags with rc5 that we weren't
> > hitting with rc4 where subdags will occasionally just hang (had to roll
> > back from rc5 to rc4), I'll try to spin up a JIRA for it soon which
> should
> > be on the list too.
> >
> >
> > On Tue, Mar 21, 2017 at 1:54 PM, Chris Riccomini 
> > wrote:
> >
> > > Agreed. I'm looking for a list of checksums/JIRAs that we want in the
> > > bugfix release.
> > >
> > > On Tue, Mar 21, 2017 at 12:54 PM, Bolke de Bruin 
> > > wrote:
> > >
> > > >
> > > >
> > > > > On 21 Mar 2017, at 12:51, Bolke de Bruin 
> wrote:
> > > > >
> > > > > My suggestion, as we are using semantic versioning is:
> > > > >
> > > > > 1) no new features in the 1.8 branch
> > > > > 2) only bug fixes in the 1.8 branch
> > > > > 3) new features to land in 1.9
> > > > >
> > > > > This allows companies to
> > > >
> > > > Have a "known" version and can move to the new branch when they want
> to
> > > > get new features. Obviously we only support N-1, so when 1.10 comes
> out
> > > we
> > > > stop supporting 1.8.X.
> > > >
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > >> On 21 Mar 2017, at 11:22, Chris Riccomini 
> > > > wrote:
> > > > >>
> > > > >> Hey all,
> > > > >>
> > > > >> I suggest that we start a 1.8.1 Airflow release now. The goal
> would
> > > be:
> > > > >>
> > > > >> 1) get a second release under our belt
> > > > >> 2) patch known issues with the 1.8.0 release
> > > > >>
> > > > >> I'm happy to run it, but I saw Maxime mentioning that Airbnb might
> > > want
> > > > to.
> > > > >> @Max et al, can you comment?
> > > > >>
> > > > > >> Also, can folks supply JIRAs for stuff that they think needs to
> > > > > >> be in the 1.8.1 bugfix release?
> > > > >>
> > > > >> Cheers,
> > > > >> Chris
> > > >
> > >
> >
>


Re: [RESULT][VOTE]Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-20 Thread siddharth anand
I've updated

   - http://incubator.apache.org/projects/airflow.html (under the News
   section)
   - CWiki Announcements :
   https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-March19,2017
   - Twitter account :
   https://twitter.com/ApacheAirflow/status/843721202430492674

FYI, I expect the PyPi link to the release will be :
https://pypi.python.org/pypi/airflow/1.8.0+apache.incubating

Max, let us know once the PyPi package is available! Bolke, thx again for
seeing this through!

-s

On Sun, Mar 19, 2017 at 10:46 AM, Bolke de Bruin  wrote:

> I have made airflow 1.8 available from:
> https://dist.apache.org/repos/dist/release/incubator/airflow/1.8.0-incubating/
> I have asked Sid to do the official announcement. PyPi can be updated and
> docs uploaded.
>
> Cheers
> Bolke
>
> > On 19 Mar 2017, at 09:21, Bolke de Bruin  wrote:
> >
> > I’m doing the announcement on the IPMC in a few (need to grab breakfast
> first ;-) ). It can be done any time after that.
> >
> > I need to bump the version number so I will need to re-sign and create a
> new tar ball. I hope they won’t mind that, as it is a bit of a chicken and
> egg problem.
> >
> > Bolke.
> >
> >> On 19 Mar 2017, at 09:01, Maxime Beauchemin 
> wrote:
> >>
> >> @Bolke I can take care of regenerating the docs + pypi upload, just let
> me
> >> know when
> >>
> >> Max
> >>
> >> On Fri, Mar 17, 2017 at 5:20 PM, Dan Davydov  invalid
> >>> wrote:
> >>
> >>> That's reasonable (treating it as a bug instead of a change in
> >>> behavior). Full speed ahead!
> >>>
> >>> On Thu, Mar 16, 2017 at 9:01 AM, Bolke de Bruin 
> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> Apache Airflow (incubating) 1.8.0 (RC5) has been accepted.
> >>>>
> >>>> 9 “+1” votes received:
> >>>>
> >>>> - Maxime Beauchemin (binding)
> >>>> - Chris Riccomini (binding)
> >>>> - Arthur Wiedmer (binding)
> >>>> - Jeremiah Lowin (binding)
> >>>> - Siddharth Anand (binding)
> >>>> - Alex van Boxel (binding)
> >>>> - Bolke de Bruin (binding)
> >>>>
> >>>> - Daniel Huang (non-binding)
> >>>>
> >>>> Vote thread (start):
> >>>> http://mail-archives.apache.org/mod_mbox/incubator-
> >>>> airflow-dev/201703.mbox/%3cB1833A3A-05FB-4112-B395-
> >>>> 135caf930...@gmail.com%3e
> >>>>
> >>>> Next steps:
> >>>> 1) will start the voting process at the IPMC mailinglist. I don’t
> expect
> >>>> changes.
> >>>> 2) Only after the positive voting on the IPMC and finalisation I will
> >>>> rebrand the RC to Release.
> >>>> 3) I will upload it to the incubator release page, then the tar ball
> >>> needs
> >>>> to propagate to the mirrors.
> >>>> 4) Update the website (can someone volunteer please?)
> >>>> 5) Finally I will ask Maxime to upload it to pypi. It seems we can
> keep
> >>>> the apache branding as lib cloud is doing this as well (
> >>>> https://libcloud.apache.org/downloads.html#pypi-package).
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Bolke
> >>>
> >
>
>


Re: Reminder : LatestOnlyOperator

2017-03-18 Thread siddharth anand
Thx Boris. Credit goes to George (gwax) for the implementation of the
LatestOnlyOperator.

Boris,
Can you describe what you mean in a Jira?
-s

On Fri, Mar 17, 2017 at 6:02 PM, Boris Tyukin  wrote:

> this is nice indeed along with the new catchup option
> https://airflow.incubator.apache.org/scheduler.html#backfill-and-catchup
>
> Thanks Sid and Ben for adding these new options!
>
> for a complete picture, it would be nice to force only one dag run at a
> time.
>
> On Fri, Mar 17, 2017 at 7:33 PM, siddharth anand 
> wrote:
>
> > With the Apache Airflow 1.8 release imminent, you may want to try out the
> >
> > *LatestOnlyOperator.*
> >
> > If you want your DAG to only run on the most recent scheduled slot,
> > regardless of backlog, this operator will skip running downstream tasks
> > for all DAG Runs prior to the current time slot.
> >
> > For example, I might have a DAG that takes a DB snapshot once a day. It
> > might be that I paused that DAG for 2 weeks or that I had set the start
> > date to a fixed date 2 weeks in the past. When I enable my DAG, I don't
> > want it to run 14 days' worth of snapshots for the current state of the
> > DB -- that's unnecessary work.
> >
> > The LatestOnlyOperator avoids that work.
> >
> > https://github.com/apache/incubator-airflow/commit/edf033be65b575f44aa221d5d0ec9ecb6b32c67a
> >
> > With it, you can simply use
> >
> >     latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)
> >
> > instead of
> >
> >     def skip_to_current_job(ds, **kwargs):
> >         now = datetime.now()
> >         left_window = kwargs['dag'].following_schedule(kwargs['execution_date'])
> >         right_window = kwargs['dag'].following_schedule(left_window)
> >         logging.info('Left Window {}, Now {}, Right Window {}'.format(
> >             left_window, now, right_window))
> >         if not now <= right_window:
> >             logging.info('Not latest execution, skipping downstream.')
> >             return False
> >         return True
> >
> >     short_circuit = ShortCircuitOperator(
> >         task_id='short_circuit_if_not_current_job',
> >         provide_context=True,
> >         python_callable=skip_to_current_job,
> >         dag=dag,
> >     )
> >
> > -s
> >
>


Reminder : LatestOnlyOperator

2017-03-17 Thread siddharth anand
With the Apache Airflow 1.8 release imminent, you may want to try out the

*LatestOnlyOperator.*

If you want your DAG to only run on the most recent scheduled slot,
regardless of backlog, this operator will skip running downstream tasks for
all DAG Runs prior to the current time slot.

For example, I might have a DAG that takes a DB snapshot once a day. It
might be that I paused that DAG for 2 weeks or that I had set the start
date to a fixed date 2 weeks in the past. When I enable my DAG, I don't
want it to run 14 days' worth of snapshots for the current state of the DB
-- that's unnecessary work.

The LatestOnlyOperator avoids that work.

https://github.com/apache/incubator-airflow/commit/edf033be65b575f44aa221d5d0ec9ecb6b32c67a

With it, you can simply use

    # Import path as of Airflow 1.8
    from airflow.operators.latest_only_operator import LatestOnlyOperator

    latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)

instead of

    import logging
    from datetime import datetime

    from airflow.operators.python_operator import ShortCircuitOperator

    def skip_to_current_job(ds, **kwargs):
        # Compare "now" with the end of the schedule window after this run.
        now = datetime.now()
        left_window = kwargs['dag'].following_schedule(kwargs['execution_date'])
        right_window = kwargs['dag'].following_schedule(left_window)
        logging.info('Left Window {}, Now {}, Right Window {}'.format(
            left_window, now, right_window))
        if not now <= right_window:
            logging.info('Not latest execution, skipping downstream.')
            return False
        return True

    short_circuit = ShortCircuitOperator(
        task_id='short_circuit_if_not_current_job',
        provide_context=True,
        python_callable=skip_to_current_job,
        dag=dag,
    )

-s
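
To make the window check concrete, here is a small stand-alone sketch of the same comparison with no Airflow dependency. It is an illustration of the idea, not the operator's source: a fixed timedelta stands in for dag.following_schedule, and the function name is_latest_run and the sample dates are made up.

```python
from datetime import datetime, timedelta


def is_latest_run(execution_date, interval, now):
    """Return True if this run's window has not yet been passed by `now`.

    Mirrors the check in the message above: the run is kept only when
    now <= right_window, where the windows are the next two schedule slots.
    """
    left_window = execution_date + interval   # stand-in for following_schedule
    right_window = left_window + interval
    return now <= right_window


# A daily DAG that was paused for two weeks: 14 backlog runs, Mar 3 to Mar 16.
now = datetime(2017, 3, 17, 12, 0)
interval = timedelta(days=1)
backlog = [datetime(2017, 3, 3) + i * interval for i in range(14)]

latest = [d for d in backlog if is_latest_run(d, interval, now)]
print(latest)
```

Of the 14 backlog runs, only the one for March 16 passes the check, so downstream tasks would be skipped for the other 13.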


Re: Airflow best practices

2017-03-17 Thread siddharth anand
FYI, we have some best practices in Confluence and possibly in other places
as well. I'd recommend adding to that rather than relying on email. Email
can be used to mail the link :-)
-s

On Fri, Mar 17, 2017 at 9:33 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Forwarding an email that should have been on this mailing list:
>
> -- Forwarded message --
> From: Maxime Beauchemin 
> Date: Fri, Mar 17, 2017 at 8:53 AM
> Subject: Re: Airflow best practices
> To: Shreyas Joshi 
>
>
> Hi Shreyas,
>
> Simple Airflow scripts are simply "configuration as code" and probably
> don't need to be abstracted out. The DSL is pretty expressive and there's
> usually a way to write your script so that it mostly contains code specific
> to your pipeline (without much boilerplate).
>
> For more advanced pipelines and complicated patterns, say dynamically
> building pipelines, it makes a lot of sense to create abstractions
> (modules, functions, classes, ...). Airflow isn't opinionated as to how you
> use its primitives, the only [perhaps odd] requirement is that your DAG
> objects should be in global module scope, somewhere in your DAGS_FOLDER, so
> that they can be discovered by Airflow's "DAG crawler".
>
> At Airbnb we have a lot of abstractions that generate Airflow objects. Some
> examples of that include our AB testing framework, common data quality
> enforcement patterns (stage the data, run DQ checks, exchange the partition
> to production), and pretty much every other Airflow script. People create
> the logic they need to create their pipeline, a lot of it is "as dynamic as
> it needs to be". It's pretty common for people to write their own operators
> as well, packaged with their modules.
>
> We should put some more complex examples out somewhere to show people the
> kinds of things that can be done, though usually programmers using Airflow
> realize quickly the kinds of things they can do, I'm sure you did already!
>
> Max
>
> On Fri, Mar 17, 2017 at 6:46 AM, Shreyas Joshi xx...@github.com
> > wrote:
>
> > Hello Maxime,
> >
> > I am a data engineer at Github and we have been using Airflow for the
> > last few months. I noticed that in many of the example DAGs the code is
> > simply at the module level with no functions etc. Is this a recommended
> > pattern with Airflow DAGs? If so, I’d be very curious to know what the
> > rationale behind this recommendation is.
> >
> > Thanks,
> > Shreyas
>


Re: Airflow Committers: Landscape checks doing more harm than good?

2017-03-16 Thread siddharth anand
+1 for replacing it with travis linting.

On Thu, Mar 16, 2017 at 7:59 PM, Jeremiah Lowin  wrote:

> FWIW I recently started using yapf (https://github.com/google/yapf) with a
> slightly custom config to format all of my projects. Rather than alert to
> discrete linting errors and concrete style rules (like PEP8) -- things I'm
> sure we all do anyway -- it reformats all code in compliance with your
> chosen style rules. It even reformats code that is already PEP8 compliant
> to make it more "pythonic" (and still PEP8 compliant). Basically: if you
> like (or create) a yapf style, it takes care of all the hard reformatting
> work and produces pleasing, consistent results. /plug
>
> On Thu, Mar 16, 2017 at 8:42 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > Let's wire up a custom linter command that can be called locally and
> > respects an agreed-upon set of parameters (pylint + config file, based
> > off of our current .landscape.yml).
> >
> > flake8 is far from being as good as pylint and can't be customized much
> > AFAICT, but variations on the command below can help you lint [only]
> > your PR:
> >
> > `git diff HEAD^ | flake8 --diff`
> >
> > It's a good thing to integrate in your workflow until we get an
> > equivalent pylint command/config
> >
> > On Thu, Mar 16, 2017 at 5:03 PM, Alex Guziel  > .invalid
> > > wrote:
> >
> > > +1 also
> > >
> > > We have code review already and the amount of false positives makes
> > > this useless.
> > >
> > > On Thu, Mar 16, 2017 at 5:02 PM, Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >
> > > > +1 as well
> > > >
> > > > I'm disappointed because the service is inches away from getting
> > > > everything right. As Bolke said, behind the cover it's little more
> > > > than pylint, git hooks, and a somewhat-fancy ui.
> > > >
> > > > Operationally it's been getting in the way.
> > > >
> > > > There's a way to pipe the output of git diff into pylint and check
> > > > whether the touched lines need linting, in which case we should break
> > > > the build. This could run in its own slot in the Travis build matrix.
> > > >
> > > > Max
> > > >
> > > > On Thu, Mar 16, 2017 at 4:51 PM, Bolke de Bruin 
> > > wrote:
> > > >
> > > > > We can do it in Travis’ afaik. We should replace it.
> > > > >
> > > > > So +1.
> > > > >
> > > > > B.
> > > > >
> > > > > > On 16 Mar 2017, at 16:48, Jeremiah Lowin 
> > wrote:
> > > > > >
> > > > > > This may be an unpopular opinion, but most Airflow PRs have a
> > > > > > little red "x" next to them not because they have failing unit
> > > > > > tests, but because the Landscape check has decided they introduce
> > > > > > bad code.
> > > > > >
> > > > > > Unfortunately Landscape is often wrong -- here it is telling me my
> > > > > > latest PR introduced no less than 30 errors... in files I didn't
> > > > > > touch! https://github.com/apache/incubator-airflow/pull/2157
> > > > > > (however, it gives me credit for fixing 23 errors in those same
> > > > > > files, so I've got that going for me... which is nice.)
> > > > > >
> > > > > > The upshot is that Github's "health" indicator can be swayed by
> > > > > > minor or erroneous issues, and therefore it serves little purpose
> > > > > > other than making it look like every PR is bad. This creates
> > > > > > committer fatigue, since every PR needs to be parsed to see if it
> > > > > > actually is OK or not.
> > > > > >
> > > > > > Don't get me wrong, I'm all for proper style and on occasion
> > > > > > Landscape has pointed out problems that I've gone and fixed. But
> > > > > > on the whole, I believe that having it as part of our red / green
> > > > > > PR evaluation -- equal to and often superseding unit tests -- is
> > > > > > harmful. I'd much rather be able to scan the PR list and know
> > > > > > unequivocally that "green" indicates ready to merge.
> > > > > >
> > > > > > J
> > > > >
> > > > >
> > > >
> > >
> >
>
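
The idea raised in this thread, lint only the lines a PR touches, can be sketched without committing to any particular linter. The block below is an assumption-laden illustration, not the project's tooling: it parses unified diff text (as produced by git diff) to find added or changed line numbers, then filters a list of linter findings down to those lines. The function name, the sample diff, and the sample findings are hypothetical.

```python
import re
from collections import defaultdict

# Hunk header, e.g. "@@ -10,3 +10,4 @@": old count, new start, new count.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def touched_lines(diff_text):
    """Map each file in a unified diff to its added or changed line numbers."""
    touched = defaultdict(set)
    current_file = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_lineno = 0
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                touched[current_file].add(new_lineno)
                new_lineno += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif not line.startswith("\\"):  # context line
                new_lineno += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            path = line[4:].strip()
            current_file = path[2:] if path.startswith("b/") else path
            continue
        match = HUNK_RE.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_lineno = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return touched


# Hypothetical diff and linter findings as (path, line, message) tuples.
SAMPLE_DIFF = "\n".join([
    "diff --git a/airflow/models.py b/airflow/models.py",
    "--- a/airflow/models.py",
    "+++ b/airflow/models.py",
    "@@ -10,3 +10,4 @@",
    " import os",
    "-import sys",
    "+import sys, re",
    "+import json",
    " import time",
])
LINT_MESSAGES = [
    ("airflow/models.py", 11, "E401 multiple imports on one line"),
    ("airflow/models.py", 40, "E501 line too long"),
]

touched = touched_lines(SAMPLE_DIFF)
relevant = [m for m in LINT_MESSAGES if m[1] in touched.get(m[0], set())]
print(relevant)
```

A CI step built on this would fail the build only when the filtered list is non-empty, which keeps pre-existing issues in untouched lines from turning a PR red.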


Re: Adding Variables and Connections via script

2017-03-16 Thread siddharth anand
Yes.. I recall that PR and I also recall adding support to manage variables
via the CLI a long time back.. both of those should be in the 1.8 release.

-s

On Wed, Mar 8, 2017 at 10:18 AM, Nicholas Hodgkinson <
nik.hodgkin...@collectivehealth.com> wrote:

> Thanks everyone! This is super helpful!
>
> -Nik
> nik.hodgkin...@collectivehealth.com
>
>
> On Tue, Mar 7, 2017 at 2:23 PM, Boris Tyukin 
> wrote:
>
> > To add to Ali's reply, there was a PR for connections and cli
> > https://github.com/apache/incubator-airflow/pull/1802
> >
> > hopefully it will make it into 1.8
> >
> >
> > On Tue, Mar 7, 2017 at 4:49 PM, Nicholas Hodgkinson <
> > nik.hodgkin...@collectivehealth.com> wrote:
> >
> > > I would like to be able to create a script to assist local development
> > > which would populate several Connections and Variables that are used
> > > across our organization; is there a way that I can add those from the
> > > command line or a Python script without having to manually enter them
> > > via the UI?
> > >
> > > Thanks,
> > > -Nik
> > > nik.hodgkin...@collectivehealth.com
> > >
> > >
> >
>
> --
>
>
> Read our founder's story.
> 
>
> *This message may contain confidential, proprietary, or protected
> information.  If you are not the intended recipient, you may not review,
> copy, or distribute this message. If you received this message in error,
> please notify the sender by reply email and delete this message.*
>
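
For the Python-script route asked about in this thread, one possible shape is sketched below. It relies on Airflow's own models (Variable.set, Connection, and settings.Session) and is an untested illustration rather than a recommended tool: the helper name seed_airflow_metadata and every key, id, and host in the sample data are hypothetical. Airflow is imported inside the function so the module itself loads without it.

```python
def seed_airflow_metadata(variables, connections):
    """Create Variables and Connections for local development.

    `variables` is a dict of key -> value; `connections` is a list of dicts
    holding keyword arguments for airflow.models.Connection.
    """
    # Lazy imports: only needed when the function is actually called.
    from airflow import settings
    from airflow.models import Connection, Variable

    for key, value in variables.items():
        Variable.set(key, value)

    session = settings.Session()
    try:
        for conn_kwargs in connections:
            # Skip ids that already exist so reruns do not add duplicates.
            existing = (session.query(Connection)
                        .filter(Connection.conn_id == conn_kwargs["conn_id"])
                        .first())
            if existing is None:
                session.add(Connection(**conn_kwargs))
        session.commit()
    finally:
        session.close()


# Hypothetical values for illustration only.
LOCAL_VARIABLES = {"environment": "local", "report_bucket": "dev-reports"}
LOCAL_CONNECTIONS = [
    {"conn_id": "warehouse_local", "conn_type": "postgres",
     "host": "localhost", "schema": "warehouse",
     "login": "airflow", "port": 5432},
]

# In an environment with Airflow installed and initialised:
# seed_airflow_metadata(LOCAL_VARIABLES, LOCAL_CONNECTIONS)
```

Keeping the shared values in plain dicts means the same data could also drive CLI calls once the variables and connections commands mentioned in this thread are available.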


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread siddharth anand
Confirmed that Bolke's PR above fixes the issue.

Also, I agree this is not a blocker for the current airflow release, so my
+1 (binding) stands.
-s

On Wed, Mar 15, 2017 at 3:11 PM, Bolke de Bruin  wrote:

> PR is available: https://github.com/apache/incubator-airflow/pull/2154
>
> But marked for 1.8.1.
>
> - Bolke
>
> > On 15 Mar 2017, at 14:37, Bolke de Bruin  wrote:
> >
> > On second thought I do consider it a bug and can have a fix out pretty
> > quickly, but I don’t consider it a blocker.
> >
> > - B.
> >
> >> On 15 Mar 2017, at 14:21, Bolke de Bruin  wrote:
> >>
> >> Just to be clear: Also in 1.7.1 the DagRun was marked successful, but
> >> its tasks continued to be scheduled. So one could also consider 1.7.1
> >> behaviour a bug. I am not sure here, but I think it kind of makes sense
> >> to consider the behaviour of 1.7.1 a bug. It has been present
> >> throughout all the 1.8 rc/beta/alpha series.
> >>
> >> So yes it is a change in behaviour; whether it is a regression or an
> >> integrity improvement is up for discussion. Either way I don’t consider
> >> it a blocker.
> >>
> >> Bolke.
> >>
> >>> On 15 Mar 2017, at 14:06, siddharth anand  wrote:
> >>>
> >>> Here's the JIRA :
> >>> https://issues.apache.org/jira/browse/AIRFLOW-989
> >>>
> >>> I confirmed it is a regression from 1.7.1.3, which I installed via pip
> >>> and tested against the same DAG in the JIRA.
> >>>
> >>> The issue occurs if a leaf / last / terminal downstream task is not
> >>> cleared. You won't see this issue if you clear the entire DAG Run or
> >>> clear a task and all of its downstream tasks. If you truly want to only
> >>> clear and rerun a task, but not its downstream tasks, you can use the
> >>> CLI to execute a specific task (e.g. via airflow run).
> >>>
> >>> This is a change in behavior -- if we do go ahead with the release,
> >>> then this JIRA should be in a list of JIRAs of known issues related to
> >>> the new version.
> >>> -s
> >>>
> >>> On Wed, Mar 15, 2017 at 9:17 AM, Chris Riccomini <
> criccom...@apache.org>
> >>> wrote:
> >>>
> >>>> @Sid, does this happen if you clear downstream as well?
> >>>>
> >>>> On Wed, Mar 15, 2017 at 9:04 AM, Chris Riccomini <
> criccom...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Has anyone been able to reproduce Sid's issue?
> >>>>>
> >>>>> On Tue, Mar 14, 2017 at 11:17 PM, Bolke de Bruin 
> >>>>> wrote:
> >>>>>
> >>>>>> That is not an airflow error, but a Kerberos error. Try executing
> the
> >>>>>> kinit command on the command line by yourself.
> >>>>>>
> >>>>>> Bolke
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>>> On 14 Mar 2017, at 23:11, Ruslan Dautkhanov 
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> `airflow kerberos` is broken in 1.8-rc5
> >>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-987
> >>>>>>> Hopefully fix can be part of the 1.8 release.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Ruslan Dautkhanov
> >>>>>>>
> >>>>>>>> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand <
> san...@apache.org>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>> FYI,
> >>>>>>>> I've just hit a major bug in the release candidate related to
> "clear
> >>>>>> task"
> >>>>>>>> behavior.
> >>>>>>>>
> >>>>>>>> I've been running airflow in both stage and prod since yesterday
> on
> >>>>>> rc5 and
> >>>>>>>> have reproduced this in both environments. I will file a JIRA for
> >>>> this
> >>>>>>>> tonight, but wanted to send a note over email as well.
> >>>>>>>>
> >>>>>>>> In my example, I have a 2 task DAG. For a given DAG run that has

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread siddharth anand
Here's the JIRA :
https://issues.apache.org/jira/browse/AIRFLOW-989

I confirmed it is a regression from 1.7.1.3, which I installed via pip and
tested against the same DAG in the JIRA.

The issue occurs if a leaf / last / terminal downstream task is not
cleared. You won't see this issue if you clear the entire DAG Run or clear
a task and all of its downstream tasks. If you truly want to only clear and
rerun a task, but not its downstream tasks, you can use the CLI to execute
a specific task (e.g. via airflow run).
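
As a minimal sketch of that CLI workaround, the snippet below only builds
the command line for running one task instance. The helper name and the
example DAG id, task id and execution date are hypothetical; the positional
order (dag_id, task_id, execution_date) is what the 1.x `airflow run`
command expects, as far as I know.

```python
import shlex


def airflow_run_command(dag_id, task_id, execution_date):
    """Build the argv for `airflow run <dag_id> <task_id> <execution_date>`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return ["airflow", "run", dag_id, task_id, execution_date]


if __name__ == "__main__":
    # Example values are placeholders, not taken from the JIRA.
    cmd = airflow_run_command("example_dag", "task1", "2017-03-14T00:00:00")
    # Print a copy-pasteable shell line instead of executing anything.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to subprocess.check_call on a machine that
has Airflow installed and configured.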

This is a change in behavior -- if we do go ahead with the release, then
this JIRA should be in a list of JIRAs of known issues related to the new
version.
-s

On Wed, Mar 15, 2017 at 9:17 AM, Chris Riccomini 
wrote:

> @Sid, does this happen if you clear downstream as well?
>
> On Wed, Mar 15, 2017 at 9:04 AM, Chris Riccomini 
> wrote:
>
> > Has anyone been able to reproduce Sid's issue?
> >
> > On Tue, Mar 14, 2017 at 11:17 PM, Bolke de Bruin 
> > wrote:
> >
> >> That is not an airflow error, but a Kerberos error. Try executing the
> >> kinit command on the command line by yourself.
> >>
> >> Bolke
> >>
> >> Sent from my iPhone
> >>
> >> > On 14 Mar 2017, at 23:11, Ruslan Dautkhanov 
> >> wrote:
> >> >
> >> > `airflow kerberos` is broken in 1.8-rc5
> >> > https://issues.apache.org/jira/browse/AIRFLOW-987
> >> > Hopefully fix can be part of the 1.8 release.
> >> >
> >> >
> >> >
> >> > --
> >> > Ruslan Dautkhanov
> >> >
> >> >> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand 
> >> wrote:
> >> >>
> >> >> FYI,
> >> >> I've just hit a major bug in the release candidate related to "clear
> >> task"
> >> >> behavior.
> >> >>
> >> >> I've been running airflow in both stage and prod since yesterday on
> >> rc5 and
> >> >> have reproduced this in both environments. I will file a JIRA for
> this
> >> >> tonight, but wanted to send a note over email as well.
> >> >>
> >> >> In my example, I have a 2 task DAG. For a given DAG run that has
> >> completed
> >> >> successfully, if I
> >> >> 1) clear task2 (leaf task in this case), the previously-successful
> DAG
> >> Run
> >> >> goes back to Running, requeues, and executes the task successfully.
> >> The DAG
> >> >> Run the returns from Running to Success.
> >> >> 2) clear task1 (root task in this case), the previously-successful
> DAG
> >> Run
> >> >> goes back to Running, DOES NOT requeue or execute the task at all.
> The
> >> DAG
> >> >> Run the returns from Running to Success though it never ran the task.
> >> >>
> >> >> 1) is expected and previous behavior. 2) is a regression.
> >> >>
> >> >> The only workaround is to use the CLI to run the task cleared. Here
> are
> >> >> some images :
> >> >> *After Clearing the Tasks*
> >> >> https://www.dropbox.com/s/wmuxt0krwx6wurr/Screenshot%
> >> >> 202017-03-14%2014.09.34.png?dl=0
> >> >>
> >> >> *After DAG Runs return to Success*
> >> >> https://www.dropbox.com/s/qop933rzgdzchpd/Screenshot%
> >> >> 202017-03-14%2014.09.49.png?dl=0
> >> >>
> >> >> This is a major regression because it will force everyone to use the
> >> CLI
> >> >> for things that they would normally use the UI for.
> >> >>
> >> >> -s
> >> >>
> >> >>
> >> >> -s
> >> >>
> >> >>
> >> >>> On Tue, Mar 14, 2017 at 1:32 PM, Daniel Huang 
> >> wrote:
> >> >>>
> >> >>> +1 (non-binding)!
> >> >>>
> >> >>> On Tue, Mar 14, 2017 at 11:35 AM, siddharth anand <
> san...@apache.org>
> >> >>> wrote:
> >> >>>
> >> >>>> +1 (binding)
> >> >>>>
> >> >>>>
> >> >>>> On Tue, Mar 14, 2017 at 8:42 AM, Maxime Beauchemin <
> >> >>>> maximebeauche...@gmail.com> wrote:
> >> >>>>
> >> >>>>> +1 (binding)
> >> >>>>>
> >> >>>>> On Tue, Mar 14, 2017 at 3:59 AM, Alex Van Bo

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-14 Thread siddharth anand
FYI,
I've just hit a major bug in the release candidate related to "clear task"
behavior.

I've been running airflow in both stage and prod since yesterday on rc5 and
have reproduced this in both environments. I will file a JIRA for this
tonight, but wanted to send a note over email as well.

In my example, I have a 2 task DAG. For a given DAG run that has completed
successfully, if I
1) clear task2 (leaf task in this case), the previously-successful DAG Run
goes back to Running, requeues, and executes the task successfully. The DAG
Run then returns from Running to Success.
2) clear task1 (root task in this case), the previously-successful DAG Run
goes back to Running, DOES NOT requeue or execute the task at all. The DAG
Run then returns from Running to Success, though it never ran the task.

1) is expected and previous behavior. 2) is a regression.
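
One way to picture the two cases is the toy model below. It is not
Airflow's scheduler code. It assumes, purely as a guess from the symptoms,
that a DAG Run's state is derived only from its leaf tasks, so clearing a
root task while the leaf is still marked successful goes unnoticed. All
names here are hypothetical.

```python
# Toy model only: NOT Airflow's scheduler code.
SUCCESS = "success"
CLEARED = None  # a cleared task instance has no state


def leaf_tasks(downstream):
    """Return the tasks that have no downstream tasks."""
    return [task for task, children in downstream.items() if not children]


def run_state_from_leaves(downstream, states):
    """Assumed rule: the run is 'success' once every leaf task succeeded."""
    if all(states[task] == SUCCESS for task in leaf_tasks(downstream)):
        return "success"
    return "running"


# The 2 task DAG from the report: task1 -> task2.
downstream = {"task1": ["task2"], "task2": []}

# Case 1: clear task2 (leaf). The run stays in 'running' until it reruns.
case1 = run_state_from_leaves(downstream, {"task1": SUCCESS, "task2": CLEARED})

# Case 2: clear task1 (root). The leaf is still successful, so under the
# assumed rule the run is reported as 'success' without rerunning task1.
case2 = run_state_from_leaves(downstream, {"task1": CLEARED, "task2": SUCCESS})

print(case1, case2)
```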

The only workaround is to use the CLI to run the cleared task. Here are
some images:
*After Clearing the Tasks*
https://www.dropbox.com/s/wmuxt0krwx6wurr/Screenshot%202017-03-14%2014.09.34.png?dl=0

*After DAG Runs return to Success*
https://www.dropbox.com/s/qop933rzgdzchpd/Screenshot%202017-03-14%2014.09.49.png?dl=0

This is a major regression because it will force everyone to use the CLI
for things that they would normally use the UI for.

-s


On Tue, Mar 14, 2017 at 1:32 PM, Daniel Huang  wrote:

> +1 (non-binding)!
>
> On Tue, Mar 14, 2017 at 11:35 AM, siddharth anand 
> wrote:
>
> > +1 (binding)
> >
> >
> > On Tue, Mar 14, 2017 at 8:42 AM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> > > +1 (binding)
> > >
> > > On Tue, Mar 14, 2017 at 3:59 AM, Alex Van Boxel 
> > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Note: we had to revert all our ONE_SUCCESS with ALL_SUCCESS trigger
> > rules
> > > > where the parent nodes where joining with a SKIP. But I can of should
> > > have
> > > > known this was coming. Apart of that I had a successful run last
> night.
> > > >
> > > >
> > > > On Tue, Mar 14, 2017 at 1:37 AM siddharth anand 
> > > wrote:
> > > >
> > > > I'm going to deploy this to staging now. Fab work Bolke!
> > > > -s
> > > >
> > > > On Mon, Mar 13, 2017 at 2:16 PM, Dan Davydov  .
> > > > invalid
> > > > > wrote:
> > > >
> > > > > I'll test this on staging as soon as I get a chance (the testing is
> > > > > non-blocking on the rc5). Bolke very much in particular :).
> > > > >
> > > > > On Mon, Mar 13, 2017 at 10:46 AM, Jeremiah Lowin <
> jlo...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1 (binding) extremely impressed by the work and diligence all
> > > > > contributors
> > > > > > have put in to getting these blockers fixed, Bolke in particular.
> > > > > >
> > > > > > On Mon, Mar 13, 2017 at 1:07 AM Arthur Wiedmer <
> art...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > Thanks again for steering us through Bolke.
> > > > > > >
> > > > > > > Best,
> > > > > > > Arthur
> > > > > > >
> > > > > > > On Sun, Mar 12, 2017 at 9:59 PM, Bolke de Bruin <
> > bdbr...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Dear All,
> > > > > > > >
> > > > > > > > Finally, I have been able to make the FIFTH RELEASE CANDIDATE
> > of
> > > > > > Airflow
> > > > > > > > 1.8.0 available at: https://dist.apache.org/repos/
> > > > > > > > dist/dev/incubator/airflow/ <https://dist.apache.org/
> > > > > > > > repos/dist/dev/incubator/airflow/> , public keys are
> available
> > > at
> > > > > > > > https://dist.apache.org/repos/dist/release/incubator/
> airflow/
> > <
> > > > > > > > https://dist.apache.org/repos/dist/release/incubator/
> airflow/>
> > .
> > > > It
> > > > > is
> > > > > > > > tagged with a local version “apache.incubating” so it allows
> > > > > upgrading
> > > > > > > from
> > > > > > > > earlier r

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-14 Thread siddharth anand
+1 (binding)


On Tue, Mar 14, 2017 at 8:42 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> +1 (binding)
>
> On Tue, Mar 14, 2017 at 3:59 AM, Alex Van Boxel  wrote:
>
> > +1 (binding)
> >
> > Note: we had to revert all our ONE_SUCCESS with ALL_SUCCESS trigger rules
> > where the parent nodes were joining with a SKIP. But I kind of should have
> > known this was coming. Apart from that I had a successful run last night.
> >
> >
> > On Tue, Mar 14, 2017 at 1:37 AM siddharth anand 
> wrote:
> >
> > I'm going to deploy this to staging now. Fab work Bolke!
> > -s
> >
> > > On Mon, Mar 13, 2017 at 2:16 PM, Dan Davydov wrote:
> >
> > > I'll test this on staging as soon as I get a chance (the testing is
> > > non-blocking on the rc5). Bolke very much in particular :).
> > >
> > > On Mon, Mar 13, 2017 at 10:46 AM, Jeremiah Lowin 
> > > wrote:
> > >
> > > > +1 (binding) extremely impressed by the work and diligence all
> > > contributors
> > > > have put in to getting these blockers fixed, Bolke in particular.
> > > >
> > > > On Mon, Mar 13, 2017 at 1:07 AM Arthur Wiedmer 
> > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Thanks again for steering us through Bolke.
> > > > >
> > > > > Best,
> > > > > Arthur
> > > > >
> > > > > On Sun, Mar 12, 2017 at 9:59 PM, Bolke de Bruin  >
> > > > wrote:
> > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > > Finally, I have been able to make the FIFTH RELEASE CANDIDATE of Airflow
> > > > > > > 1.8.0 available at:
> > > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public keys
> > > > > > > are available at
> > > > > > > https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is
> > > > > > > tagged with a local version “apache.incubating” so it allows upgrading
> > > > > > > from earlier releases.
> > > > > >
> > > > > > Issues fixed since rc4:
> > > > > >
> > > > > > [AIRFLOW-900] Double trigger should not kill original task
> instance
> > > > > > [AIRFLOW-900] Fixes bugs in LocalTaskJob for double run
> protection
> > > > > > [AIRFLOW-932] Do not mark tasks removed when backfilling
> > > > > > [AIRFLOW-961] run onkill when SIGTERMed
> > > > > > [AIRFLOW-910] Use parallel task execution for backfills
> > > > > > [AIRFLOW-967] Wrap strings in native for py2 ldap compatibility
> > > > > > [AIRFLOW-941] Use defined parameters for psycopg2
> > > > > > [AIRFLOW-719] Prevent DAGs from ending prematurely
> > > > > > [AIRFLOW-938] Use test for True in task_stats queries
> > > > > > [AIRFLOW-937] Improve performance of task_stats
> > > > > > [AIRFLOW-933] use ast.literal_eval rather eval because
> > > ast.literal_eval
> > > > > > does not execute input.
> > > > > > [AIRFLOW-919] Running tasks with no start date shouldn't break a
> > DAGs
> > > > UI
> > > > > > [AIRFLOW-897] Prevent dagruns from failing with unfinished tasks
> > > > > > [AIRFLOW-861] make pickle_info endpoint be login_required
> > > > > > [AIRFLOW-853] use utf8 encoding for stdout line decode
> > > > > > [AIRFLOW-856] Make sure execution date is set for local client
> > > > > > [AIRFLOW-830][AIRFLOW-829][AIRFLOW-88] Reduce Travis log
> verbosity
> > > > > > [AIRFLOW-794] Access DAGS_FOLDER and SQL_ALCHEMY_CONN exclusively
> > > from
> > > > > > settings
> > > > > > [AIRFLOW-694] Fix config behaviour for empty envvar
> > > > > > [AIRFLOW-365] Set dag.fileloc explicitly and use for Code view
> > > > > > [AIRFLOW-931] Do not set QUEUED in TaskInstances
> > > > > > [AIRFLOW-899] Tasks in SCHEDULED state should be white in the UI
> > > > instead
> > > > > > of black
> > > > > > [AIRFLOW-895] Address Apache release incompliancies
> > > > > > [AIRFLOW-893][AIRFLOW-510] Fix crashing webservers when a dagrun
> > has
> > > no
> > > > > > start date
> > > > > > [AIRFLOW-793] Enable compressed loading in S3ToHiveTransfer
> > > > > > [AIRFLOW-863] Example DAGs should have recent start dates
> > > > > > [AIRFLOW-869] Refactor mark success functionality
> > > > > > [AIRFLOW-856] Make sure execution date is set for local client
> > > > > > [AIRFLOW-814] Fix Presto*CheckOperator.__init__
> > > > > > [AIRFLOW-844] Fix cgroups directory creation
> > > > > >
> > > > > > No known issues anymore.
> > > > > >
> > > > > > I would also like to raise a VOTE for releasing 1.8.0 based on
> > > release
> > > > > > candidate 5, i.e. just renaming release candidate 5 to 1.8.0
> > release.
> > > > > >
> > > > > > Please respond to this email by:
> > > > > >
> > > > > > +1,0,-1 with *binding* if you are a PMC member or *non-binding*
> if
> > > you
> > > > > are
> > > > > > not.
> > > > > >
> > > > > > Thanks!
> > > > > > Bolke
> > > > > >
> > > > > > My VOTE: +1 (binding)
> > > > >
> > > >
> > >
> >
> > --
> >   _/
> > _/ Alex Van Boxel
> >
>


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-13 Thread siddharth anand
I'm going to deploy this to staging now. Fab work Bolke!
-s

On Mon, Mar 13, 2017 at 2:16 PM, Dan Davydov  wrote:

> I'll test this on staging as soon as I get a chance (the testing is
> non-blocking on the rc5). Bolke very much in particular :).
>
> On Mon, Mar 13, 2017 at 10:46 AM, Jeremiah Lowin 
> wrote:
>
> > +1 (binding) extremely impressed by the work and diligence all
> contributors
> > have put in to getting these blockers fixed, Bolke in particular.
> >
> > On Mon, Mar 13, 2017 at 1:07 AM Arthur Wiedmer 
> wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks again for steering us through Bolke.
> > >
> > > Best,
> > > Arthur
> > >
> > > On Sun, Mar 12, 2017 at 9:59 PM, Bolke de Bruin 
> > wrote:
> > >
> > > > Dear All,
> > > >
> > > > Finally, I have been able to make the FIFTH RELEASE CANDIDATE of Airflow
> > > > 1.8.0 available at:
> > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public keys
> > > > are available at
> > > > https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is
> > > > tagged with a local version “apache.incubating” so it allows upgrading
> > > > from earlier releases.
> > > >
> > > > Issues fixed since rc4:
> > > >
> > > > [AIRFLOW-900] Double trigger should not kill original task instance
> > > > [AIRFLOW-900] Fixes bugs in LocalTaskJob for double run protection
> > > > [AIRFLOW-932] Do not mark tasks removed when backfilling
> > > > [AIRFLOW-961] run onkill when SIGTERMed
> > > > [AIRFLOW-910] Use parallel task execution for backfills
> > > > [AIRFLOW-967] Wrap strings in native for py2 ldap compatibility
> > > > [AIRFLOW-941] Use defined parameters for psycopg2
> > > > [AIRFLOW-719] Prevent DAGs from ending prematurely
> > > > [AIRFLOW-938] Use test for True in task_stats queries
> > > > [AIRFLOW-937] Improve performance of task_stats
> > > > [AIRFLOW-933] use ast.literal_eval rather eval because
> ast.literal_eval
> > > > does not execute input.
> > > > [AIRFLOW-919] Running tasks with no start date shouldn't break a DAGs
> > UI
> > > > [AIRFLOW-897] Prevent dagruns from failing with unfinished tasks
> > > > [AIRFLOW-861] make pickle_info endpoint be login_required
> > > > [AIRFLOW-853] use utf8 encoding for stdout line decode
> > > > [AIRFLOW-856] Make sure execution date is set for local client
> > > > [AIRFLOW-830][AIRFLOW-829][AIRFLOW-88] Reduce Travis log verbosity
> > > > [AIRFLOW-794] Access DAGS_FOLDER and SQL_ALCHEMY_CONN exclusively
> from
> > > > settings
> > > > [AIRFLOW-694] Fix config behaviour for empty envvar
> > > > [AIRFLOW-365] Set dag.fileloc explicitly and use for Code view
> > > > [AIRFLOW-931] Do not set QUEUED in TaskInstances
> > > > [AIRFLOW-899] Tasks in SCHEDULED state should be white in the UI
> > instead
> > > > of black
> > > > [AIRFLOW-895] Address Apache release incompliancies
> > > > [AIRFLOW-893][AIRFLOW-510] Fix crashing webservers when a dagrun has
> no
> > > > start date
> > > > [AIRFLOW-793] Enable compressed loading in S3ToHiveTransfer
> > > > [AIRFLOW-863] Example DAGs should have recent start dates
> > > > [AIRFLOW-869] Refactor mark success functionality
> > > > [AIRFLOW-856] Make sure execution date is set for local client
> > > > [AIRFLOW-814] Fix Presto*CheckOperator.__init__
> > > > [AIRFLOW-844] Fix cgroups directory creation
> > > >
> > > > No known issues anymore.
> > > >
> > > > I would also like to raise a VOTE for releasing 1.8.0 based on
> release
> > > > candidate 5, i.e. just renaming release candidate 5 to 1.8.0 release.
> > > >
> > > > Please respond to this email by:
> > > >
> > > > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if
> you
> > > are
> > > > not.
> > > >
> > > > Thanks!
> > > > Bolke
> > > >
> > > > My VOTE: +1 (binding)
> > >
> >
>


Re: scheduler running on multiple nodes

2017-02-23 Thread siddharth anand
I did run 2 or more schedulers with Local Executors up until the middle of last
year. There have been enough code changes and feature additions since then that
I don't think this is a recommended practice at this point. Also, there is
not much synchronization in the scheduler to ensure this will work.

-s

On Thu, Feb 9, 2017 at 6:47 AM, matus valo  wrote:

> Hi all,
>
>
>
> I am considering deployment of airflow as pipeline framework. I have found
> out multiple articles explaining deployment of airflow in distributed
> environment (e.g. [1]). Unfortunately, I was not able to find out any use
> case where scheduler is deployed distributed on multiple nodes. Is it
> possible to have scheduler distributed on multiple nodes to prevent single
> point of failure? I haven’t found any mention about it in documentation. I
> have found out in [2] that it is not possible but on the other hand in [3]
> is reference that this can be solved in new version of airflow.
>
>
>
> Thanks,
>
>
> Matus
>
>
>
> [1] http://site.clairvoyantsoft.com/setting-apache-airflow-cluster/
>
> [2] https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME
>
> [3] https://issues.apache.org/jira/browse/AIRFLOW-678
>


Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-23 Thread siddharth anand
IMHO, a DAG run without a start date is nonsensical, but this is not enforced.
That said, our UI allows for the manual creation of DAG Runs without a
start date, as shown in the images below:


   - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
   - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0
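
The webserver failure discussed in this thread comes from calling
strftime on a DAG Run start date that is missing. A minimal sketch of a
defensive formatter is below; the function name and the empty-string
fallback are my own assumptions, not the fix that was actually merged.

```python
from datetime import datetime


def format_start_date(start_date, fmt="%Y-%m-%d %H:%M"):
    """Format a DAG Run start date, tolerating a missing value.

    Hypothetical helper: returns an empty string instead of raising
    AttributeError when start_date is None.
    """
    if start_date is None:
        return ""
    return start_date.strftime(fmt)


if __name__ == "__main__":
    print(repr(format_start_date(None)))
    print(format_start_date(datetime(2017, 2, 22, 16, 0)))
```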


On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Our database may have edge cases that could be associated with running any
> previous version that may or may not have been part of an official release.
>
> Let's see if anyone else reports the issue. If no one does, one option is
> to release 1.8.0 as is with a comment in the release notes, and have a
> future official minor apache release 1.8.1 that would fix these minor
> issues that are not deal breaker.
>
> @bolke, I'm curious, how long does it take you to go through one release
> cycle? Oh, and do you have a documented step by step process for releasing?
> I'd like to add the Pypi part to this doc and add committers that are
> interested to have rights on the project on Pypi.
>
> Max
>
> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin  wrote:
>
> > So it is a database integrity issue? Afaik a start_date should always be
> > set for a DagRun (create_dagrun does so). I didn't check the code though.
> >
> > Sent from my iPhone
> >
> > > On 22 Feb 2017, at 22:19, Dan Davydov 
> > wrote:
> > >
> > > Should clarify this occurs when a dagrun does not have a start date,
> not
> > a
> > > dag (which makes it even less likely to happen). I don't think this is
> a
> > > blocker for releasing.
> > >
> > >> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov 
> > wrote:
> > >>
> > >> I rolled this out in our prod and the webservers failed to load due to
> > >> this commit:
> > >>
> > >> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > >> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > >>
> > >> This fixed it:
> > >> -  > >> class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start
> > Date:
> > >> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}">
> > >> +  > >> class="glyphicon glyphicon-info-sign" aria-hidden="true">
> > >>
> > >> This is caused by assuming that all DAGs have start dates set, so a
> > broken
> > >> DAG will take down the whole UI. Not sure if we want to make this a
> > blocker
> > >> for the release or not, I'm guessing for most deployments this would
> > occur
> > >> pretty rarely. I'll submit a PR to fix it soon.
> > >>
> > >>
> > >>
> > >> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <
> criccom...@apache.org
> > >
> > >> wrote:
> > >>
> > >>> Ack that the vote has already passed, but belated +1 (binding)
> > >>>
> > >>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin 
> > >>> wrote:
> > >>>
> > >>>> IPMC Voting can be found here:
> > >>>>
> > >>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> > >>> 201702.mbox/%
> > >>>> 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e <
> > >>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> > >>> 201702.mbox/%
> > >>>> 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3E>
> > >>>>
> > >>>> Kind regards,
> > >>>> Bolke
> > >>>>
> > >>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin 
> wrote:
> > >>>>>
> > >>>>> Hello,
> > >>>>>
> > >>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> > >>>>>
> > >>>>> 9 “+1” votes received:
> > >>>>>
> > >>>>> - Maxime Beauchemin (binding)
> > >>>>> - Arthur Wiedmer (binding)
> > >>>>> - Dan Davydov (binding)
> > >

Re: Meetup featuring an Airflow talk tomorrow

2017-02-23 Thread siddharth anand
Nice!
https://twitter.com/ApacheAirflow/status/834945481440546816

Please do share the slides and video so we can post both via Twitter and the wiki.
-s

On Wed, Feb 22, 2017 at 3:00 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Hi,
>
> Just wanted to let you know that Arthur (one of our Apache Airflow
> committers) will be giving a talk on "Building Data Workflows with
> Airflow" tomorrow 2/22 at Galvanize
>
> https://www.meetup.com/SF-Data-Engineering/events/
> 237797553/?rv=md1&_af=event&_af_eid=237797553&https=on
>
> Enjoy!
>
> Max
>


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-20 Thread siddharth anand
+1 (binding). Thx Bolke!
-s

On Mon, Feb 20, 2017 at 2:51 PM, Alex Van Boxel  wrote:

> +1 (binding)
>
> On Mon, Feb 20, 2017 at 5:32 AM y...@yahoo-inc.com.INVALID
>  wrote:
>
> >
> > +1 (non-binding)
> >
> > Thanks for all the work!
> >
> > Yi
> >
> > On Sunday, February 19, 2017, 12:52:31 PM PST, Arthur Wiedmer <
> > arthur.wied...@gmail.com> wrote:
> >
> > +1 (binding)
> >
> > Thanks again for all the work!
> >
> > Best,
> > Arthur
> >
> > On Fri, Feb 17, 2017 at 4:46 PM, Jeremiah Lowin 
> wrote:
> >
> > > +1 (binding) many thanks for all your work on this Bolke!
> > >
> > > On Fri, Feb 17, 2017 at 7:10 PM Jayesh Senjaliya 
> > > wrote:
> > >
> > > +1 ( non-binding )works fine for me.
> > >
> > >
> > >
> > > On Fri, Feb 17, 2017 at 3:37 PM, Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Fri, Feb 17, 2017 at 11:33 AM, Dan Davydov <
> > > > dan.davy...@airbnb.com.invalid> wrote:
> > > >
> > > > > +1 (binding). Mark success works great now, thanks to Bolke for
> > fixing.
> > > > >
> > > > > On Fri, Feb 17, 2017 at 12:22 AM, Bolke de Bruin <
> bdbr...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > > I have made the FOURTH RELEASE CANDIDATE of Airflow 1.8.0 available at:
> > > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public keys
> > > > > > > are available at
> > > > > > > https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is
> > > > > > > tagged with a local version “apache.incubating” so it allows upgrading
> > > > > > > from earlier releases.
> > > > > >
> > > > > > > One issue has been fixed since release candidate 3:
> > > > > >
> > > > > > * mark success was not working properly
> > > > > >
> > > > > > No known issues anymore.
> > > > > >
> > > > > > I would also like to raise a VOTE for releasing 1.8.0 based on
> > > release
> > > > > > candidate 4, i.e. just renaming release candidate 4 to 1.8.0
> > release.
> > > > > >
> > > > > > Please respond to this email by:
> > > > > >
> > > > > > +1,0,-1 with *binding* if you are a PMC member or *non-binding*
> if
> > > you
> > > > > are
> > > > > > not.
> > > > > >
> > > > > > Thanks!
> > > > > > Bolke
> > > > > >
> > > > > > My VOTE: +1 (binding)
> > > > >
> > > >
> > >
>
> --
>   _/
> _/ Alex Van Boxel
>


Re: Soliciting feedback: Using the Airflow CLI as a thin client

2017-02-15 Thread siddharth anand
Hi Wilson,
I'm a huge fan of the CLI, and you are correct that the currently released
version of the CLI requires both a connection to the DB and access to the
dag folder.

In the new 1.8.0 release that is currently being driven by Bolke, the CLI
uses the API. I'm not 100% sure that all CLI commands have API endpoints,
but I suspect it's nearly complete if not already complete. That reminds
me: as we vet the 1.8.0 release candidates, we should test out both the CLI
and the API.

In a nutshell, the goal is for the CLI to be a thin wrapper that talks to
the API (running on the webserver), which would have access to both the DB
and the DAG folder. This would allow anyone to run the CLI from any machine
that has access to the API endpoints.
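
To make the thin-wrapper idea concrete, here is a rough sketch of what a
client-side trigger call could look like. It only builds the HTTP request
and does not send it. The endpoint path, payload shape, host name and
function name are assumptions for illustration; check the API shipped with
your release before relying on any of them.

```python
import json
from urllib.request import Request


def build_trigger_request(base_url, dag_id, conf=None):
    """Build (but do not send) a POST request asking the API to trigger a DAG.

    The URL layout below is an assumed, illustrative endpoint.
    """
    url = "{}/api/experimental/dags/{}/dag_runs".format(
        base_url.rstrip("/"), dag_id
    )
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and DAG id.
    req = build_trigger_request("http://airflow.example.com:8080/", "my_dag")
    print(req.get_method(), req.full_url)
```

The point of the design is that the local machine needs neither the DAG
folder nor database credentials, only network access to the webserver.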
-s

On Tue, Feb 14, 2017 at 1:40 PM, Wilson Lian 
wrote:

> Hi all,
>
> I'm interested in using the Airflow CLI as a thin client so that I can run
> DAG-management commands like pause, unpause, trigger_dag, run, etc. from a
> local machine against a remote airflow cluster (e.g., running in Google
> Container Engine).
>
> I have tried pointing [core]sql_alchemy_conn at the remote database, but
> without a shared view of the DAGs folder, the different components don't
> seem to be able to sync up. For example, list_dags looks at the local DAGs
> folder, but not at the database; and using trigger_dag with a local DAG
> file seems to put the DAG in the database, but its task instances never
> execute, presumably because none of the nodes in the cluster have a copy of
> the DAG file.
>
> I think in order for the CLI to be used as a thin client, the database,
> rather than the DAGs folder needs to be used as the source of truth for
> DAGs (and possibly other objects). Can anyone provide an estimate of how
> heavyweight such a change would be?
>
> I'm also curious what people think about delegating the pointer to the
> current config file to a higher-level config file that contains references
> to different configurations and a pointer to the "current" config.
>


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc3

2017-02-13 Thread siddharth anand
Folks!
I need to change my vote.. -1 (Binding).


Mark Success/Clear is broken in the UI. It's a regression.

-s

On Mon, Feb 13, 2017 at 10:53 AM, Alex Van Boxel  wrote:

> +1 (binding)
>
> On Mon, Feb 13, 2017 at 7:45 PM siddharth anand  wrote:
>
> > +1 (binding)
> >
> > On Mon, Feb 13, 2017 at 8:57 AM, Chris Riccomini 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > On Sun, Feb 12, 2017 at 8:54 AM, Jeremiah Lowin 
> > wrote:
> > >
> > > > Interesting -- I also run on Kubernetes with a git-sync sidecar, but
> > the
> > > > containers wait for the synced repo to appear before starting since
> it
> > > > contains some dependencies -- I assume that's why I didn't experience
> > the
> > > > same issue.
> > > >
> > > > On Sun, Feb 12, 2017 at 6:29 AM Bolke de Bruin 
> > > wrote:
> > > >
> > > > > Although the race condition doesn't explain why “num_runs = None”
> > > > resolved
> > > > > the issue for you earlier, but it does give a clue now: the PR that
> > > > > introduced “num_runs = -1” was there to be able to work with empty
> > dag
> > > > > dirs, maybe it wasn’t fully covered yet.
> > > > >
> > > > > Bolke
> > > > >
> > > > > > On 12 Feb 2017, at 12:26, Bolke de Bruin 
> > wrote:
> > > > > >
> > > > > > Ok great! Thanks! That sounds like a race condition: module not
> > > > > available yet at time of reading. I would expect that it resolves
> > > itself
> > > > > after a while.
> > > > > >
> > > > > > After talking to some people at the Warsaw BigData conf I have
> some
> > > > > ideas around syncing dags, Spoiler: no dependency on git.
> > > > > >
> > > > > > - Bolke
> > > > > >
> > > > > >> On 12 Feb 2017, at 11:17, Alex Van Boxel 
> > wrote:
> > > > > >>
> > > > > >> Running ok, in staging... @bolke I'm running patch-less. I've
> > > switched
> > > > > my
> > > > > >> Kubernetes from:
> > > > > >>
> > > > > >> - each container (webserver/scheduler/worker) had a git-sync'er
> > > > (getting
> > > > > >> the dags from git)
> > > > > >>> this meant that the scheduler had 0 dags at startup, and should
> > > have
> > > > > >> picked them up later
> > > > > >>
> > > > > >> to
> > > > > >>
> > > > > >> - single NFS share that shares airflow_home over each container
> > > > > >>> the git sync'er is now a separate container running before the
> > > other
> > > > > >> containers
> > > > > >>
> > > > > >> This resolved my mystery DAG crashes.
> > > > > >>
> > > > > >> I'll be updating production to a patchless RC3 today, you get my
> > > vote
> > > > > after
> > > > > >> that.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Sun, Feb 12, 2017 at 4:59 AM Boris Tyukin <
> > bo...@boristyukin.com
> > > >
> > > > > wrote:
> > > > > >>
> > > > > >>> awesome! thanks Jeremiah
> > > > > >>>
> > > > > >>> On Sat, Feb 11, 2017 at 12:53 PM, Jeremiah Lowin <
> > > jlo...@apache.org>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> Boris, I submitted a PR to address your second point --
> > > > > >>>> https://github.com/apache/incubator-airflow/pull/2068.
> Thanks!
> > > > > >>>>
> > > > > >>>> On Sat, Feb 11, 2017 at 10:42 AM Boris Tyukin <
> > > > bo...@boristyukin.com>
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>> I am running LocalExecutor and not doing crazy things but use
> > DAG
> > > > > >>>>> generation heavily - everything runs fine as before. As I
> > > mentioned
> > > >

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc3

2017-02-13 Thread siddharth anand
+1 (binding)

On Mon, Feb 13, 2017 at 8:57 AM, Chris Riccomini 
wrote:

> +1 (binding)
>
> On Sun, Feb 12, 2017 at 8:54 AM, Jeremiah Lowin  wrote:
>
> > Interesting -- I also run on Kubernetes with a git-sync sidecar, but the
> > containers wait for the synced repo to appear before starting since it
> > contains some dependencies -- I assume that's why I didn't experience the
> > same issue.
> >
> > On Sun, Feb 12, 2017 at 6:29 AM Bolke de Bruin 
> wrote:
> >
> > > Although the race condition doesn't explain why “num_runs = None” resolved
> > > the issue for you earlier, it does give a clue now: the PR that
> > > introduced “num_runs = -1” was there to be able to work with empty dag
> > > dirs; maybe it wasn’t fully covered yet.
> > >
> > > Bolke
> > >
> > > > On 12 Feb 2017, at 12:26, Bolke de Bruin  wrote:
> > > >
> > > > Ok great! Thanks! That sounds like a race condition: module not
> > > available yet at time of reading. I would expect that it resolves
> itself
> > > after a while.
> > > >
> > > > After talking to some people at the Warsaw BigData conf I have some
> > > ideas around syncing dags, Spoiler: no dependency on git.
> > > >
> > > > - Bolke
> > > >
> > > >> On 12 Feb 2017, at 11:17, Alex Van Boxel  wrote:
> > > >>
> > > >> Running ok, in staging... @bolke I'm running patch-less. I've
> switched
> > > my
> > > >> Kubernetes from:
> > > >>
> > > >> - each container (webserver/scheduler/worker) had a git-sync'er
> > (getting
> > > >> the dags from git)
> > > >>> this meant that the scheduler had 0 dags at startup, and should
> have
> > > >> picked them up later
> > > >>
> > > >> to
> > > >>
> > > >> - single NFS share that shares airflow_home over each container
> > > >>> the git sync'er is now a separate container running before the
> other
> > > >> containers
> > > >>
> > > >> This resolved my mystery DAG crashes.
> > > >>
> > > >> I'll be updating production to a patchless RC3 today, you get my
> vote
> > > after
> > > >> that.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Sun, Feb 12, 2017 at 4:59 AM Boris Tyukin <bo...@boristyukin.com>
> > > wrote:
> > > >>
> > > >>> awesome! thanks Jeremiah
> > > >>>
> > > >>> On Sat, Feb 11, 2017 at 12:53 PM, Jeremiah Lowin <
> jlo...@apache.org>
> > > >>> wrote:
> > > >>>
> > >  Boris, I submitted a PR to address your second point --
> > >  https://github.com/apache/incubator-airflow/pull/2068. Thanks!
> > > 
> > >  On Sat, Feb 11, 2017 at 10:42 AM Boris Tyukin <
> > bo...@boristyukin.com>
> > >  wrote:
> > > 
> > > > I am running LocalExecutor and not doing crazy things but use DAG
> > > > generation heavily - everything runs fine as before. As I
> mentioned
> > > in
> > > > other threads only had a few issues:
> > > >
> > > > 1) had to upgrade MySQL which was a PAIN. Cloudera CDH is running
> > old
> > > > version of MySQL which was compatible with 1.7.1 but not
> compatible
> > > now
> > > > with 1.8 because of fractional seconds support PR.
> > > >
> > > > 2) when you install airflow, there are two new example DAGs
> > > > (last_task_only) which are going back very far in the past and
> > > >>> scheduled
> > >  to
> > > > run every hour - a bunch of dags triggered on the first start of
> > >  scheduler
> > > > and hosed my CPU
> > > >
> > > > Everything else was fine and I LOVE lots of small UI changes,
> which
> > >  reduced
> > > > a lot my use of cli.
> > > >
> > > > Thanks again for the amazing work and an awesome project!
> > > >
> > > >
> > > > On Sat, Feb 11, 2017 at 9:17 AM, Jeremiah Lowin <
> jlo...@apache.org
> > >
> > >  wrote:
> > > >
> > > >> I was able to deploy successfully. +1 (binding)
> > > >>
> > > >> On Fri, Feb 10, 2017 at 7:37 PM Maxime Beauchemin <
> > > >> maximebeauche...@gmail.com> wrote:
> > > >>
> > > >>> +1 (binding)
> > > >>>
> > > >>> On Fri, Feb 10, 2017 at 3:44 PM, Arthur Wiedmer <
> > > >> arthur.wied...@gmail.com>
> > > >>> wrote:
> > > >>>
> > >  +1 (binding)
> > > 
> > >  On Feb 10, 2017 3:13 PM, "Dan Davydov" <
> dan.davy...@airbnb.com.
> > > >> invalid>
> > >  wrote:
> > > 
> > > > Our staging looks good, all the DAGs there pass.
> > > > +1 (binding)
> > > >
> > > > On Fri, Feb 10, 2017 at 10:21 AM, Chris Riccomini <
> > > >>> criccom...@apache.org
> > > >
> > > > wrote:
> > > >
> > > >> Running in all environments. Will vote after the weekend to
> > >  make
> > > >> sure
> > > >> things are working properly, but so far so good.
> > > >>
> > > >> On Fri, Feb 10, 2017 at 6:05 AM, Bolke de Bruin <
> > > > bdbr...@gmail.com
> > > >>>
> > > > wrote:
> > > >>
> > > >>> Dear All,
> > > >>>
> > > >>> Let’s try again!
> > > >>>
> > > >>> I have made the THIRD R

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc3

2017-02-11 Thread siddharth anand
Deployed to stage and will watch over the weekend before voting.
-s

On Sat, Feb 11, 2017 at 9:53 AM, Jeremiah Lowin  wrote:

> Boris, I submitted a PR to address your second point --
> https://github.com/apache/incubator-airflow/pull/2068. Thanks!
>
> On Sat, Feb 11, 2017 at 10:42 AM Boris Tyukin 
> wrote:
>
> > I am running LocalExecutor and not doing crazy things but use DAG
> > generation heavily - everything runs fine as before. As I mentioned in
> > other threads only had a few issues:
> >
> > 1) had to upgrade MySQL which was a PAIN. Cloudera CDH is running old
> > version of MySQL which was compatible with 1.7.1 but not compatible now
> > with 1.8 because of fractional seconds support PR.
> >
> > 2) when you install airflow, there are two new example DAGs
> > (last_task_only) which are going back very far in the past and scheduled
> to
> > run every hour - a bunch of dags triggered on the first start of
> scheduler
> > and hosed my CPU
> >
> > Everything else was fine and I LOVE lots of small UI changes, which
> reduced
> > a lot my use of cli.
> >
> > Thanks again for the amazing work and an awesome project!
> >
> >
> > On Sat, Feb 11, 2017 at 9:17 AM, Jeremiah Lowin 
> wrote:
> >
> > > I was able to deploy successfully. +1 (binding)
> > >
> > > On Fri, Feb 10, 2017 at 7:37 PM Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Fri, Feb 10, 2017 at 3:44 PM, Arthur Wiedmer <
> > > arthur.wied...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > > On Feb 10, 2017 3:13 PM, "Dan Davydov" <dan.davy...@airbnb.com.invalid>
> > > > > wrote:
> > > > >
> > > > > > Our staging looks good, all the DAGs there pass.
> > > > > > +1 (binding)
> > > > > >
> > > > > > On Fri, Feb 10, 2017 at 10:21 AM, Chris Riccomini <
> > > > criccom...@apache.org
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Running in all environments. Will vote after the weekend to
> make
> > > sure
> > > > > > > things are working properly, but so far so good.
> > > > > > >
> > > > > > > On Fri, Feb 10, 2017 at 6:05 AM, Bolke de Bruin <
> > bdbr...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Dear All,
> > > > > > > >
> > > > > > > > Let’s try again!
> > > > > > > >
> > > > > > > > I have made the THIRD RELEASE CANDIDATE of Airflow 1.8.0
> > > available
> > > > > at:
> > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/ <
> > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/> ,
> > > > public
> > > > > > keys
> > > > > > > > are available at https://dist.apache.org/repos/
> > > > > dist/release/incubator/
> > > > > > > > airflow/ <
> > https://dist.apache.org/repos/dist/release/incubator/
> > > > > > airflow/>
> > > > > > > > . It is tagged with a local version “apache.incubating” so it
> > > > allows
> > > > > > > > upgrading from earlier releases.
> > > > > > > >
> > > > > > > > Two issues have been fixed since release candidate 2:
> > > > > > > >
> > > > > > > > * trigger_dag could create dags with fractional seconds, not
> > > > > supported
> > > > > > by
> > > > > > > > logging and UI at the moment
> > > > > > > > * local api client trigger_dag had hardcoded execution of
> None
> > > > > > > >
> > > > > > > > Known issue:
> > > > > > > > * Airflow on kubernetes and num_runs -1 (default) can expose
> > > import
> > > > > > > issues.
> > > > > > > >
> > > > > > > > I have extensively discussed this with Alex (reporter) and we
> > > > > consider
> > > > > > > > this a known issue with a workaround available as we are
> unable
> > > to
> > > > > > > > replicate this in a different environment. UPDATING.md has
> been
> > > > > updated
> > > > > > > > with the work around.
> > > > > > > >
> > > > > > > > As these issues are confined to a very specific area and full
> > > unit
> > > > > > tests
> > > > > > > > were added I would also like to raise a VOTE for releasing
> > 1.8.0
> > > > > based
> > > > > > on
> > > > > > > > release candidate 3, i.e. just renaming release candidate 3
> to
> > > > 1.8.0
> > > > > > > > release.
> > > > > > > >
> > > > > > > > Please respond to this email by:
> > > > > > > >
> > > > > > > > +1,0,-1 with *binding* if you are a PMC member or
> *non-binding*
> > > if
> > > > > you
> > > > > > > are
> > > > > > > > not.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > Bolke
> > > > > > > >
> > > > > > > > My VOTE: +1 (binding)
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
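Jeremiah's note in this thread (containers waiting for the synced repo to appear before starting) can be sketched roughly as below. This is only an illustration under stated assumptions, not anything from the Airflow code base: the function name `wait_for_dags`, the timeout values, and the idea of checking for `.py` files are all hypothetical, and it assumes Python 3 for `time.monotonic`.

```python
import os
import time


def wait_for_dags(dag_dir, timeout=300.0, poll_interval=2.0):
    """Block until `dag_dir` contains at least one .py file.

    Hypothetical helper: returns True once a file appears, or False if
    `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        # The directory itself may not exist yet if the sync has not run.
        if os.path.isdir(dag_dir) and any(
            name.endswith(".py") for name in os.listdir(dag_dir)
        ):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A container entrypoint could call something like this on its DAG folder and only start the scheduler when it returns True, so the scheduler never begins with zero DAGs.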


Re: question regarding scheduling mechanism

2017-02-07 Thread siddharth anand
Apache mail servers don't accept in-line images (a.k.a. image attachments).
Please send a link to the image.

-s

On Tue, Feb 7, 2017 at 5:53 AM, הילה ויזן  wrote:

> Hi,
> I've carefully read the documentation related to the scheduler and the
> scheduling mechanism.
> I found in the docs the following info:
>
> Note that if you run a DAG on a schedule_interval of one day, the run
> stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other
> words, the job instance is started once the period it covers has ended.
>
> BUT I observed a strange "behavior" of task triggering:
> * Task with hourly schedule interval - 15 * * * * - of 12:15 is fired at
> 13:15.
> * Task with daily schedule interval - 0 2 * * * - of 07/02/17 02:00
> wasn't fired yet.
>
>
> [image: Inline image 2]
>
>
> What am I missing?
>
> Thanks!
> Hila
>
>
>
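The behavior Hila observed matches the documentation quoted in the question: a run is started once the period it covers has ended. A minimal sketch of that arithmetic follows. It is not Airflow's scheduler code; the helper name `earliest_trigger` is hypothetical, and 07/02/17 is assumed to mean 7 February 2017.

```python
from datetime import datetime, timedelta


def earliest_trigger(execution_date, interval):
    """Return the earliest time a run stamped `execution_date` can start.

    The run covers [execution_date, execution_date + interval), so it is
    only triggered once that period is over.
    """
    return execution_date + interval


# Hourly schedule "15 * * * *": the run stamped 12:15 starts at about 13:15.
hourly = earliest_trigger(datetime(2017, 2, 7, 12, 15), timedelta(hours=1))
print(hourly)

# Daily schedule "0 2 * * *": the run stamped 2017-02-07 02:00 (assumed date)
# only becomes due at 2017-02-08 02:00.
daily = earliest_trigger(datetime(2017, 2, 7, 2, 0), timedelta(days=1))
print(daily)
```

Under these assumptions both observations are expected: the 12:15 hourly run fires at 13:15, and the daily run stamped 7 February is not due until 02:00 on 8 February.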


Re: Airflow 1.8.0 Release Candidate 1

2017-02-06 Thread siddharth anand
I did get 1.8.0 installed and running at Agari.

I did run into 2 problems.
1. Most of our DAGs broke due to the way Operators are now imported.
https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#deprecated-features

According to the documentation, these deprecations would only cause an
issue in 2.0. However, I needed to fix them now.

So, I needed to change "from airflow.operators import PythonOperator" to
"from airflow.operators.python_operator import PythonOperator". Am I
missing something?

2. I ran into a migration problem that seems to have cleared itself up. I
did notice that some DAGs do not have data computed in their "DAG Runs"
column on the overview page. I am looking into that issue presently.
https://www.dropbox.com/s/cn058mtu3vcv8sq/Screenshot%202017-02-06%2018.45.07.png?dl=0

-s
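The import change described in item 1 can be applied mechanically across DAG files. The sketch below is a hypothetical helper, not part of Airflow: the names `OPERATOR_MODULES` and `rewrite_operator_imports` are made up for illustration, only the `PythonOperator` mapping comes from the message above, and it handles single-name import lines only.

```python
import re

# Hypothetical mapping from operator class to its explicit module.
# Only PythonOperator is taken from the message; extend as needed.
OPERATOR_MODULES = {
    "PythonOperator": "airflow.operators.python_operator",
}

# Matches a whole line of the form "from airflow.operators import Name".
_OLD_IMPORT = re.compile(
    r"^from airflow\.operators import (\w+)[ \t]*$", re.MULTILINE
)


def rewrite_operator_imports(source):
    """Rewrite single-name `from airflow.operators import X` lines."""

    def _replace(match):
        name = match.group(1)
        module = OPERATOR_MODULES.get(name)
        if module is None:
            # Unknown operator: leave the line untouched.
            return match.group(0)
        return "from {} import {}".format(module, name)

    return _OLD_IMPORT.sub(_replace, source)


old = "from airflow.operators import PythonOperator\n"
print(rewrite_operator_imports(old), end="")
```

Operators that are not in the mapping are left unchanged, so the rewrite can be run repeatedly while the mapping is filled in.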

On Mon, Feb 6, 2017 at 4:30 PM, Dan Davydov 
wrote:

> Bolke, attached is the patch for the cgroups fix. Let me know which
> branches you would like me to merge it to. If anyone has complaints about
> the patch let me know (but it does not touch the core of airflow, only the
> new cgroups task runner).
>
> On Mon, Feb 6, 2017 at 4:24 PM, siddharth anand  wrote:
>
>> Actually, I see the error is further down..
>>
>>   File
>> "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py",
>> line
>> 469, in do_execute
>>
>> cursor.execute(statement, parameters)
>>
>> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
>> column "dag_id" violates not-null constraint
>>
>> DETAIL:  Failing row contains (null, running, 1, f).
>>
>>  [SQL: 'INSERT INTO dag_stats (state, count, dirty) VALUES (%(state)s,
>> %(count)s, %(dirty)s)'] [parameters: {'count': 1L, 'state': u'running',
>> 'dirty': False}]
>>
>> It looks like an autoincrement is missing for this table.
>>
>>
>> I'm running `SQLAlchemy==1.1.4` - I see our setup.py specifies any version
>> greater than 0.9.8
>>
>> -s
>>
>>
>>
>> On Mon, Feb 6, 2017 at 4:11 PM, siddharth anand 
>> wrote:
>>
>> > I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
>> > https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
>> > and then running airflow
>> > upgradedb didn't quite work. First, I thought it completed successfully,
>> > then saw errors some tables were indeed missing. I ran it again and
>> > encountered the following exception :
>> >
>> > DB: postgresql://app_coust...@db-cousteau.ep.stage.agari.com:5432/airflow
>> >
>> > [2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
>> >
>> > INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
>> >
>> > INFO  [alembic.runtime.migration] Will assume transactional DDL.
>> >
>> > INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 ->
>> > 211e584da130, add TI state index
>> >
>> > INFO  [alembic.runtime.migration] Running upgrade 211e584da130 ->
>> > 64de9cddf6c9, add task fails journal table
>> >
>> > INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 ->
>> > f2ca10b85618, add dag_stats table
>> >
>> > INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 ->
>> > 4addfa1236f1, Add fractional seconds to mysql tables
>> >
>> > INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 ->
>> > 8504051e801b, xcom dag task indices
>> >
>> > INFO  [alembic.runtime.migration] Running upgrade 8504051e801b ->
>> > 5e7d17757c7a, add pid field to TaskInstance
>> >
>> > INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a ->
>> > 127d2bf2dfa7, Add dag_id/state index on dag_run table
>> >
>> > /usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
>> > SAWarning: Column 'dag_stats.dag_id' is marked as a member of the
>> primary
>> > key for table 'dag_stats', but has no Python-side or server-side default
>> > generator indicated, nor does it indicate 'autoincrement=True' or
>> > 'nullable=True', and no explicit value is passed.  Primary key columns
>> > typically may not store NULL. Note that as of SQLAlchemy 1.1,
>> > 'autoincrement=True' must be indicated explicitly for composite (e.g.
>> > multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior is
>> > expected for one of the columns in the primary key. CREATE TABLE
>> statements
>> > are impacted by this change as well on most backends.
>> >
>>
>
>


Re: Airflow Meetup @ Paypal (San Jose)

2017-02-06 Thread siddharth anand
I'd +1 the presentation. The panel and roadmap idea is enticing but
challenging since we tend to have disparate company-guided roadmaps that
tend to guide each of our individual efforts to some degree. To some extent
all of the contributors have a mini-roadmap in mind of operators or
features they would like to have implemented.

Gurer, where are the results of the roadmap survey that you collected?
https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg01777.html

If you feel you didn't get enough results, it might be a good idea for us
to kick that off again.
-s

On Fri, Feb 3, 2017 at 10:34 AM, Jayesh Senjaliya 
wrote:

> yeah, I think we should have both. since we only have 2 presentations,
> we will have plenty of time for round table and Q&A.
>
> - Jayesh
>
> On Fri, Feb 3, 2017 at 9:49 AM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > I like the idea of a panel talking about direction for the project. The
> > panel could be taking questions from a moderator and from the audience.
> I'm
> > open to sitting on the panel.
> >
> > I could also give a talk on our A/B testing framework's complex DAG if I
> > can't get the engineers working on it to do it :)
> >
> > Maybe we can do both, and just make the Q/A section we usually have a
> panel
> > Q/A instead of a Max Q/A.
> >
> > Max
> >
> > On Fri, Feb 3, 2017 at 5:07 AM, Bolke de Bruin 
> wrote:
> >
> > > I might. But I would maybe be more interested in a kind of round table
> /
> > > panel session to discuss directions? Does that make sense? Or would you
> > > like me to talk about a specific subject?
> > >
> > > - Bolke.
> > >
> > > > On 3 Feb 2017, at 03:42, siddharth anand  wrote:
> > > >
> > > > Cool! I've tweeted it out using the ApacheAirflow account and also
> > added
> > > it
> > > > to https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements
> > > >
> > > > FYI,
> > > > I was mistaken about the drive between Strata and PayPal. I had used
> > the
> > > > wrong venue. Strata is at the SJ Convention Center this year.  It's
> > still
> > > > very close.. 5 miles (12 minutes - reverse commute).
> > > >
> > > > https://goo.gl/maps/nwxmkYsNFKQ2
> > > >
> > > > BTW, I heard Bolke may be attending ;-) Bolke, would you like to
> speak
> > at
> > > > the Meetup?
> > > >
> > > > Jakob (other committers), will you be down here for Strata?
> > > >
> > > > -s
> > > >
> > > > On Thu, Feb 2, 2017 at 5:02 PM, Jayesh Senjaliya <
> jhsonl...@gmail.com>
> > > > wrote:
> > > >
> > > >> Sure,
> > > >> I have created an event on Meetup:
> > > >> https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/237412864/
> > > >>
> > > >> Thanks for helping on this Siddharth.
> > > >> Jayesh
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Feb 1, 2017 at 7:50 PM, siddharth anand 
> > > wrote:
> > > >>
> > > >>> IMHO, I'd publish the meet-up. You still have 6 weeks to find a 3rd
> > > >>> speaker. If Bolke and Alex are traveling all the way for Strata,
> > > perhaps
> > > >>> one of them can speak :-)
> > > >>>
> > > >>> -s
> > > >>>
> > > >>> On Wed, Feb 1, 2017 at 1:48 PM, Russell Jurney <
> > > russell.jur...@gmail.com
> > > >>>
> > > >>> wrote:
> > > >>>
> > > >>>> Maybe start a new thread with a title "Call for Speakers for
> Meetup
> > on
> > > >>> Mar
> > > >>>> 14" ?
> > > >>>>
> > > >>>> On Wed, Feb 1, 2017 at 11:59 AM Jayesh Senjaliya <
> > jhsonl...@gmail.com
> > > >
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Yes, we are still waiting for more speakers.
> > > >>>>>
> > > >>>>> can anybody from Airbnb present ?
> > > >>>>>
> > > >>>>> anybody else ?
> > > >>>>>
> > > >>>>>
> > > >>>>> - Jayesh
> > > >>>

Re: Airflow Meetup 1Q17 Talk Videos

2017-02-06 Thread siddharth anand
Community members,
I'd encourage you to stream your meetups as well since we do have many
remote members in the community that may want to attend real-time.

In some cases, where there are local committers/contributors, we can offer
office hours to promote great in-person attendance.
-s

On Mon, Feb 6, 2017 at 2:27 PM, siddharth anand  wrote:

> Thx George.
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links
>
> I've added it to the page above and created a section for all Meetup
> Videos.
>
> Community members,
> For future reference, even if you don't plan to stream, I'd recommend
> recording your meetups so they can live on here.
> -s
>
> On Mon, Feb 6, 2017 at 12:53 PM, George Leslie-Waksman <
> geo...@cloverhealth.com.invalid> wrote:
>
>> Video of the meetup talks and subsequent Q&As is now on YouTube:
>> https://www.youtube.com/watch?v=P0GYZXR0YP4
>>
>
>


Re: Airflow 1.8.0 Release Candidate 1

2017-02-06 Thread siddharth anand
          Table "public.dag_stats"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 dag_id | character varying(250) | not null
 state  | character varying(50)  | not null
 count  | integer                | not null
 dirty  | boolean                | not null
Indexes:
    "dag_stats_pkey" PRIMARY KEY, btree (dag_id, state)


The PKEY is a combination of 2 provided columns, so I'm wondering why
Alembic is complaining here.
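A small reproduction may help here. It suggests that the schema itself is fine and that the failure comes from an INSERT that supplies no dag_id, since neither column of a composite key is generated automatically. This is only a sketch: it uses SQLite from the Python standard library in place of PostgreSQL, the table definition is retyped from the listing above, and the dag_id value `example_dag` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dag_stats (
        dag_id VARCHAR(250) NOT NULL,
        state  VARCHAR(50)  NOT NULL,
        count  INTEGER      NOT NULL,
        dirty  BOOLEAN      NOT NULL,
        PRIMARY KEY (dag_id, state)
    )
    """
)

# Same shape as the failing statement: no value is supplied for dag_id.
try:
    conn.execute(
        "INSERT INTO dag_stats (state, count, dirty) VALUES (?, ?, ?)",
        ("running", 1, False),
    )
    failed = False
except sqlite3.IntegrityError as exc:
    failed = True
    print("insert rejected:", exc)

# Supplying both key columns succeeds.
conn.execute(
    "INSERT INTO dag_stats (dag_id, state, count, dirty) VALUES (?, ?, ?, ?)",
    ("example_dag", "running", 1, False),
)
rows = conn.execute("SELECT dag_id, state, count FROM dag_stats").fetchall()
print(rows)
```

If the same holds on PostgreSQL, the thing to look for is the code path that builds a dag_stats row without setting dag_id, rather than a missing autoincrement.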

On Mon, Feb 6, 2017 at 4:24 PM, siddharth anand  wrote:

> Actually, I see the error is further down..
>
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py",
> line 469, in do_execute
>
> cursor.execute(statement, parameters)
>
> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
> column "dag_id" violates not-null constraint
>
> DETAIL:  Failing row contains (null, running, 1, f).
>
>  [SQL: 'INSERT INTO dag_stats (state, count, dirty) VALUES (%(state)s,
> %(count)s, %(dirty)s)'] [parameters: {'count': 1L, 'state': u'running',
> 'dirty': False}]
>
> It looks like an autoincrement is missing for this table.
>
>
> I'm running `SQLAlchemy==1.1.4` - I see our setup.py specifies any
> version greater than 0.9.8
>
> -s
>
>
>
> On Mon, Feb 6, 2017 at 4:11 PM, siddharth anand  wrote:
>
>> I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
>> and then running airflow
>> upgradedb didn't quite work. First, I thought it completed successfully,
>> then saw errors some tables were indeed missing. I ran it again and
>> encountered the following exception :
>>
>> DB: postgresql://app_coust...@db-cousteau.ep.stage.agari.com:5432/airflow
>>
>> [2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
>>
>> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
>>
>> INFO  [alembic.runtime.migration] Will assume transactional DDL.
>>
>> INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 ->
>> 211e584da130, add TI state index
>>
>> INFO  [alembic.runtime.migration] Running upgrade 211e584da130 ->
>> 64de9cddf6c9, add task fails journal table
>>
>> INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 ->
>> f2ca10b85618, add dag_stats table
>>
>> INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 ->
>> 4addfa1236f1, Add fractional seconds to mysql tables
>>
>> INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 ->
>> 8504051e801b, xcom dag task indices
>>
>> INFO  [alembic.runtime.migration] Running upgrade 8504051e801b ->
>> 5e7d17757c7a, add pid field to TaskInstance
>>
>> INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a ->
>> 127d2bf2dfa7, Add dag_id/state index on dag_run table
>>
>> /usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
>> SAWarning: Column 'dag_stats.dag_id' is marked as a member of the primary
>> key for table 'dag_stats', but has no Python-side or server-side default
>> generator indicated, nor does it indicate 'autoincrement=True' or
>> 'nullable=True', and no explicit value is passed.  Primary key columns
>> typically may not store NULL. Note that as of SQLAlchemy 1.1,
>> 'autoincrement=True' must be indicated explicitly for composite (e.g.
>> multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior is
>> expected for one of the columns in the primary key. CREATE TABLE statements
>> are impacted by this change as well on most backends.
>>
>
>


Re: Airflow 1.8.0 Release Candidate 1

2017-02-06 Thread siddharth anand
Actually, I see the error is further down..

  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 469, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in column "dag_id" violates not-null constraint
DETAIL:  Failing row contains (null, running, 1, f).
 [SQL: 'INSERT INTO dag_stats (state, count, dirty) VALUES (%(state)s, %(count)s, %(dirty)s)'] [parameters: {'count': 1L, 'state': u'running', 'dirty': False}]

It looks like an autoincrement is missing for this table.


I'm running `SQLAlchemy==1.1.4` - I see our setup.py specifies any version
greater than 0.9.8

-s



On Mon, Feb 6, 2017 at 4:11 PM, siddharth anand  wrote:

> I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
> and then running airflow
> upgradedb didn't quite work. First, I thought it completed successfully,
> then saw errors some tables were indeed missing. I ran it again and
> encountered the following exception :
>
> DB: postgresql://app_coust...@db-cousteau.ep.stage.agari.com:5432/airflow
>
> [2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
>
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
>
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
>
> INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 ->
> 211e584da130, add TI state index
>
> INFO  [alembic.runtime.migration] Running upgrade 211e584da130 ->
> 64de9cddf6c9, add task fails journal table
>
> INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 ->
> f2ca10b85618, add dag_stats table
>
> INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 ->
> 4addfa1236f1, Add fractional seconds to mysql tables
>
> INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 ->
> 8504051e801b, xcom dag task indices
>
> INFO  [alembic.runtime.migration] Running upgrade 8504051e801b ->
> 5e7d17757c7a, add pid field to TaskInstance
>
> INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a ->
> 127d2bf2dfa7, Add dag_id/state index on dag_run table
>
> /usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
> SAWarning: Column 'dag_stats.dag_id' is marked as a member of the primary
> key for table 'dag_stats', but has no Python-side or server-side default
> generator indicated, nor does it indicate 'autoincrement=True' or
> 'nullable=True', and no explicit value is passed.  Primary key columns
> typically may not store NULL. Note that as of SQLAlchemy 1.1,
> 'autoincrement=True' must be indicated explicitly for composite (e.g.
> multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior is
> expected for one of the columns in the primary key. CREATE TABLE statements
> are impacted by this change as well on most backends.
>


Re: Airflow 1.8.0 Release Candidate 1

2017-02-06 Thread siddharth anand
I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
and then ran airflow upgradedb, which didn't quite work. At first I thought
it had completed successfully, but then I saw errors indicating that some
tables were indeed missing. I ran it again and encountered the following
exception:

DB: postgresql://app_coust...@db-cousteau.ep.stage.agari.com:5432/airflow
[2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table

/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
SAWarning: Column 'dag_stats.dag_id' is marked as a member of the primary
key for table 'dag_stats', but has no Python-side or server-side default
generator indicated, nor does it indicate 'autoincrement=True' or
'nullable=True', and no explicit value is passed.  Primary key columns
typically may not store NULL. Note that as of SQLAlchemy 1.1,
'autoincrement=True' must be indicated explicitly for composite (e.g.
multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior is
expected for one of the columns in the primary key. CREATE TABLE statements
are impacted by this change as well on most backends.


Re: Airflow Meetup 1Q17 Talk Videos

2017-02-06 Thread siddharth anand
Thx George.


https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links

I've added it to the page above and created a section for all Meetup
Videos.

Community members,
For future reference, even if you don't plan to stream, I'd recommend
recording your meetups so they can live on here.
-s

On Mon, Feb 6, 2017 at 12:53 PM, George Leslie-Waksman <
geo...@cloverhealth.com.invalid> wrote:

> Video of the meetup talks and subsequent Q&As is now on YouTube:
> https://www.youtube.com/watch?v=P0GYZXR0YP4
>


Re: NYC Airflow Meetup

2017-02-03 Thread siddharth anand
Great!
Thanks for creating it - I've just joined so you can add me as an
organizer.

I've linked to it on:

   - https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements
   - https://twitter.com/ApacheAirflow/status/827743162789605382
   - https://cwiki.apache.org/confluence/display/AIRFLOW/Meetups


-s

On Fri, Feb 3, 2017 at 11:01 AM, Joseph Napolitano <
joseph.napolit...@blueapron.com.invalid> wrote:

> Hi all,
>
> I want to thank everyone for attending NYC's first Airflow meetup at Blue
> Apron.  It was a huge success and we're glad to have met everyone.
>
> As suggested, we decided to create an official NYC Meetup page, sponsored
> by Blue Apron.  We'll add Sid and Max as Organizers.  Let us know if you
> want to help organize.
>
> https://www.meetup.com/NYC-Apache-Airflow-incubating-Meetup/
>
> I planned on taking video of the presentations, but it completely slipped
> my mind!  I'll upload my slides to Slideshare and provide a small writeup
> to complement them.
>
> We're committed to Airflow at Blue Apron and we love the project.  Now that
> our infrastructure is taking shape, we'll have time to contribute back to
> the project.  We have top-down support at Blue Apron to dedicate company
> time for it.
>
> Feel free to connect anytime!
> https://www.linkedin.com/in/joenap
>
> Thanks again,
> *Joe Napolitano *| Sr. Data Engineer
> www.blueapron.com | 5 Crosby Street, New York, NY 10013
>


Re: Airflow Meetup in NYC @ Blue Apron

2017-02-02 Thread siddharth anand
Hope this went well. Feel free to share videos and slides. Also, it would
be great if we could create a NY Apache Airflow meetup page. Would you be
interested in setting one up? It would be easier to promote a meetup page
on social media than an email on this list.

-s

On Fri, Jan 20, 2017 at 10:37 AM, Joseph Napolitano <
joseph.napolit...@blueapron.com.invalid> wrote:

> Hi all!
>
> I want to officially announce a Meetup for Airflow in NYC!  I'm looking
> forward to meeting other community members to share knowledge and network.
>
> We may create an official Meetup page, but in the meantime please sign up
> here:
> https://docs.google.com/spreadsheets/d/1WmfgZeExSVdLf-u1uh3IleeHy8QTwaJ4BkkSkVM-X1E/edit?usp=sharing
>
> I have a confirmed date of February 1st @ 6:30 at Blue Apron's
> headquarters.
>
> In Summary:
> Date: Feb 1st
> Time: 6:30 - 9pm EST
> Location: 40 W 23rd St. New York, NY 10010
> https://www.google.com/maps/place/40+W+23rd+St,+New+York,+NY+10010/@40.7420885,-73.9938457,17z/data=!3m1!4b1!4m5!3m4!1s0x89c259a46471d2a1:0xc2517d92b1b68bba!8m2!3d40.7420845!4d-73.9916517?hl=en
>
> We're on the 5th floor.  You need to check in with security in the building
> lobby, and again when you reach the fifth floor to get a name tag.
>
> Food & drink will be provided!
>
> Let me know if you would like to present.  We'd love to hear about your
> architecture and war stories.  We will have a large projector and PA system
> setup.
>
> Sorry about the short notice, but it took a while to get approved over the
> holidays and new year.  If we can't generate enough interest we can
> certainly push it back a month.
>
> Thanks, and Bon Appétit!
>
> --
> *Joe Napolitano *| Sr. Data Engineer
> www.blueapron.com | 5 Crosby Street, New York, NY 10013
>


Re: Airflow Meetup @ Paypal (San Jose)

2017-02-02 Thread siddharth anand
Cool! I've tweeted it out using the ApacheAirflow account and also added it
to https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements

FYI,
I was mistaken about the drive between Strata and PayPal. I had used the
wrong venue. Strata is at the SJ Convention Center this year.  It's still
very close.. 5 miles (12 minutes - reverse commute).

https://goo.gl/maps/nwxmkYsNFKQ2

BTW, I heard Bolke may be attending ;-) Bolke, would you like to speak at
the Meetup?

Jakob (and other committers), will you be down here for Strata?

-s

On Thu, Feb 2, 2017 at 5:02 PM, Jayesh Senjaliya 
wrote:

> Sure,
> I have created the event on Meetup:
> https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/237412864/
>
> Thanks for helping on this Siddharth.
> Jayesh
>
>
>
> On Wed, Feb 1, 2017 at 7:50 PM, siddharth anand  wrote:
>
> > IMHO, I'd publish the meet-up. You still have 6 weeks to find a 3rd
> > speaker. If Bolke and Alex are traveling all the way for Strata, perhaps
> > one of them can speak :-)
> >
> > -s
> >
> > On Wed, Feb 1, 2017 at 1:48 PM, Russell Jurney wrote:
> >
> > > Maybe start a new thread with a title "Call for Speakers for Meetup on
> > Mar
> > > 14" ?
> > >
> > > On Wed, Feb 1, 2017 at 11:59 AM Jayesh Senjaliya 
> > > wrote:
> > >
> > > > Yes, we are still waiting for more speakers.
> > > >
> > > > can anybody from Airbnb present ?
> > > >
> > > > anybody else ?
> > > >
> > > >
> > > > - Jayesh
> > > >
> > > > On Tue, Jan 31, 2017 at 8:16 PM, siddharth anand 
> > > > wrote:
> > > >
> > > > > Jayesh,
> > > > > Looks good. No need to vote. Just publish a new event with details
> on
> > > the
> > > > > meet-up page:
> > > > > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
> > > > >
> > > > > Please add a short abstract as well for the talks and find a 3rd
> > > speaker.
> > > > > Please be sure to record the meet-up so that we can publish it.
> Once
> > > the
> > > > > meet-up event is up, please respond to this email! We can help
> > promote
> > > > it.
> > > > > I suggest picking a start time after the Strata talks end but not
> > super
> > > > > late either.
> > > > >
> > > > > -s
> > > > >
> > > > > On Tue, Jan 31, 2017 at 9:19 AM, Jayesh Senjaliya <
> > jhsonl...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > HI All,
> > > > > >
> > > > > > √ I have approval from Paypal to host Airflow meetup.  How about
> > > March
> > > > > 14th
> > > > > > ? Please vote.
> > > > > >
> > > > > > √ we will have food and drinks.
> > > > > > Please let me know if anybody has any special request, I will try
> > to
> > > > > > accommodate :)
> > > > > >
> > > > > > For presentations:
> > > > > >  1) Disk recommission using airflow with overall automation of
> > > "Hadoop
> > > > > Node
> > > > > > and Disk Remediation". - Jayesh Senjaliya ( Paypal )
> > > > > >  2) Predictive Analytics with Airflow and PySpark - ( Russell
> > Jurney
> > > )
> > > > > >
> > > > > >
> > > > > > Please send request to present to this email thread if you are
> > > > interested
> > > > > > in presenting.
> > > > > >
> > > > > > Thanks
> > > > > > Jayesh
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 26, 2017 at 4:08 PM, Russell Jurney <
> > > > > russell.jur...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Cool!
> > > > > > >
> > > > > > > On Wed, Jan 25, 2017 at 11:23 PM Jayesh Senjaliya <
> > > > jhsonl...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Russell,
> > > > > > > >
> > > > > > > > yes, I 

Re: Airflow Meetup @ Paypal (San Jose)

2017-02-01 Thread siddharth anand
IMHO, I'd publish the meet-up. You still have 6 weeks to find a 3rd
speaker. If Bolke and Alex are traveling all the way for Strata, perhaps
one of them can speak :-)

-s

On Wed, Feb 1, 2017 at 1:48 PM, Russell Jurney 
wrote:

> Maybe start a new thread with a title "Call for Speakers for Meetup on Mar
> 14" ?
>
> On Wed, Feb 1, 2017 at 11:59 AM Jayesh Senjaliya 
> wrote:
>
> > Yes, we are still waiting for more speakers.
> >
> > can anybody from Airbnb present ?
> >
> > anybody else ?
> >
> >
> > - Jayesh
> >
> > On Tue, Jan 31, 2017 at 8:16 PM, siddharth anand 
> > wrote:
> >
> > > Jayesh,
> > > Looks good. No need to vote. Just publish a new event with details on
> the
> > > meet-up page:
> > > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
> > >
> > > Please add a short abstract as well for the talks and find a 3rd
> speaker.
> > > Please be sure to record the meet-up so that we can publish it. Once
> the
> > > meet-up event is up, please respond to this email! We can help promote
> > it.
> > > I suggest picking a start time after the Strata talks end but not super
> > > late either.
> > >
> > > -s
> > >
> > > On Tue, Jan 31, 2017 at 9:19 AM, Jayesh Senjaliya wrote:
> > >
> > > > HI All,
> > > >
> > > > √ I have approval from Paypal to host Airflow meetup.  How about
> March
> > > 14th
> > > > ? Please vote.
> > > >
> > > > √ we will have food and drinks.
> > > > Please let me know if anybody has any special request, I will try to
> > > > accommodate :)
> > > >
> > > > For presentations:
> > > >  1) Disk recommission using airflow with overall automation of
> "Hadoop
> > > Node
> > > > and Disk Remediation". - Jayesh Senjaliya ( Paypal )
> > > >  2) Predictive Analytics with Airflow and PySpark - ( Russell Jurney
> )
> > > >
> > > >
> > > > Please send request to present to this email thread if you are
> > interested
> > > > in presenting.
> > > >
> > > > Thanks
> > > > Jayesh
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Jan 26, 2017 at 4:08 PM, Russell Jurney <
> > > russell.jur...@gmail.com>
> > > > wrote:
> > > >
> > > > > Cool!
> > > > >
> > > > > On Wed, Jan 25, 2017 at 11:23 PM Jayesh Senjaliya <
> > jhsonl...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Russell,
> > > > > >
> > > > > > > Yes, I will be presenting from the Paypal side.
> > > > > > > Once I have official approval from Paypal, I will send out an email.
> > > > > > > I am basically going by the steps that Siddharth outlined earlier in
> > > > > > > the thread.
> > > > > >
> > > > > > Thanks
> > > > > > Jayesh
> > > > > >
> > > > > > On Wed, Jan 25, 2017 at 7:50 PM, Russell Jurney <
> > > > > russell.jur...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Is someone from Paypal likely to speak? Should we start a new
> > > thread
> > > > > > with a
> > > > > > > call for another speaker? There was mention of three being
> > needed.
> > > > > > >
> > > > > > > On Wed, Jan 25, 2017 at 5:33 PM Jayesh Senjaliya <
> > > > jhsonl...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Yes I am waiting for response from facilities about it, most
> > > likely
> > > > > by
> > > > > > > > early next week.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Jayesh
> > > > > > > >
> > > > > > > > On Wed, Jan 25, 2017 at 4:52 PM, Russell Jurney <
> > > > > > > russell.jur...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Boris, would you be able to attend an evening meetup on the
> > > &

Re: Airflow Meetup in NYC @ Blue Apron

2017-02-01 Thread siddharth anand
Also, if you record a video, we'd be happy to place it on the wiki and
promote it via our twitter feed, etc...
-s

On Mon, Jan 30, 2017 at 5:34 PM, Boris Tyukin  wrote:

> i hope you guys can share presentation slides at least for all of us who
> are not in NYC
>
> On Mon, Jan 30, 2017 at 7:33 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > I'd love to watch, is there any way you guys can livecast or share a
> video
> > after the event?
> >
> > Looking forward to it!
> >
> > Max
> >
> > On Mon, Jan 30, 2017 at 1:56 PM, Joseph Napolitano <
> > joseph.napolit...@blueapron.com.invalid> wrote:
> >
> > > Hi All!
> > >
> > > We are excited to host an Airflow Meetup in NYC.  We will have a guest
> > > speaker from Spotify!
> > >
> > > The Meetup is in 2 days, on Feb 1st @ 6:30pm at Blue Apron's
> > headquarters.
> > >
> > > In Summary:
> > > Date: Feb 1st
> > > Time 6:30 - 9pm EST
> > > Location: 40 W 23rd St. New York, NY 10010
> > > https://www.google.com/maps/place/40+W+23rd+St,+New+York,+NY+10010/@40.7420885,-73.9938457,17z/data=!3m1!4b1!4m5!3m4!1s0x89c259a46471d2a1:0xc2517d92b1b68bba!8m2!3d40.7420845!4d-73.9916517?hl=en
> > >
> > > Schedule:
> > > 6:30 - 7:15 Meet and greet
> > > 7:15 - ? Presentations from Blue Apron and Spotify
> > >
> > > It's not too late to signup for a presentation.  We will stick around
> as
> > > late as 9pm.
> > >
> > > We don't have an official Meetup page, so please sign up here :)
> > > The signup sheet is available here:
> > > https://docs.google.com/spreadsheets/d/1WmfgZeExSVdLf-u1uh3IleeHy8QTwaJ4BkkSkVM-X1E/edit?usp=sharing
> > >
> > > Feel free to share the signup sheet with other parties.
> > >
> > > As mentioned, we're on the 5th floor.  You need to check in with
> security
> > > in the building lobby, and again when you reach the fifth floor to get
> a
> > > name tag.
> > >
> > > Thanks, and looking forward to meeting everyone!
> > >
> > > Cheers,
> > > Joe Nap
> > >
> > >
> > >
> > > On Fri, Jan 20, 2017 at 1:37 PM, Joseph Napolitano <
> > > joseph.napolit...@blueapron.com> wrote:
> > >
> > > > Hi all!
> > > >
> > > > I want to officially announce a Meetup for Airflow in NYC!  I'm
> looking
> > > > forward to meeting other community members to share knowledge and
> > > network.
> > > >
> > > > We may create an official Meetup page, but in the meantime please
> > > > sign up here:
> > > > https://docs.google.com/spreadsheets/d/1WmfgZeExSVdLf-u1uh3IleeHy8QTwaJ4BkkSkVM-X1E/edit?usp=sharing
> > > >
> > > > I have a confirmed date of February 1st @ 6:30 at Blue Apron's
> > > > headquarters.
> > > >
> > > > In Summary:
> > > > Date: Feb 1st
> > > > Time 6:30 - 9pm EST
> > > > Location: 40 W 23rd St. New York, NY 10010
> > > > https://www.google.com/maps/place/40+W+23rd+St,+New+York,+NY+10010/@40.7420885,-73.9938457,17z/data=!3m1!4b1!4m5!3m4!1s0x89c259a46471d2a1:0xc2517d92b1b68bba!8m2!3d40.7420845!4d-73.9916517?hl=en
> > > >
> > > > We're on the 5th floor.  You need to check in with security in the
> > > > building lobby, and again when you reach the fifth floor to get a
> name
> > > tag.
> > > >
> > > > Food & drink will be provided!
> > > >
> > > > Let me know if you would like to present.  We'd love to hear about
> your
> > > > architecture and war stories.  We will have a large projector and PA
> > > system
> > > > setup.
> > > >
> > > > Sorry about the short notice, but it took a while to get approved
> over
> > > the
> > > > holidays and new year.  If we can't generate enough interest we can
> > > > certainly push it back a month.
> > > >
> > > > Thanks, and Bon Appétit!
> > > >
> > > > --
> > > > *Joe Napolitano *| Sr. Data Engineer
> > > > www.blueapron.com | 5 Crosby Street, New York, NY 10013
> > > >
> > >
> > >
> > >
> > > --
> > > *Joe Napolitano *| Sr. Data Engineer
> > > www.blueapron.com | 5 Crosby Street, New York, NY 10013
> > >
> >
>


Re: Airflow Meetup @ Paypal (San Jose)

2017-01-31 Thread siddharth anand
Jayesh,
Looks good. No need to vote. Just publish a new event with details on the
meet-up page:
https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/

Please add a short abstract as well for the talks and find a 3rd speaker.
Please be sure to record the meet-up so that we can publish it. Once the
meet-up event is up, please respond to this email! We can help promote it.
I suggest picking a start time after the Strata talks end but not super
late either.

-s

On Tue, Jan 31, 2017 at 9:19 AM, Jayesh Senjaliya 
wrote:

> HI All,
>
> √ I have approval from Paypal to host Airflow meetup.  How about March 14th
> ? Please vote.
>
> √ we will have food and drinks.
> Please let me know if anybody has any special request, I will try to
> accommodate :)
>
> For presentations:
>  1) Disk recommission using airflow with overall automation of "Hadoop Node
> and Disk Remediation". - Jayesh Senjaliya ( Paypal )
>  2) Predictive Analytics with Airflow and PySpark - ( Russell Jurney )
>
>
> Please send request to present to this email thread if you are interested
> in presenting.
>
> Thanks
> Jayesh
>
>
>
>
> On Thu, Jan 26, 2017 at 4:08 PM, Russell Jurney 
> wrote:
>
> > Cool!
> >
> > On Wed, Jan 25, 2017 at 11:23 PM Jayesh Senjaliya 
> > wrote:
> >
> > > Hi Russell,
> > >
> > > Yes, I will be presenting from the Paypal side.
> > > Once I have official approval from Paypal, I will send out an email.
> > > I am basically going by the steps that Siddharth outlined earlier in
> > > the thread.
> > >
> > > Thanks
> > > Jayesh
> > >
> > > On Wed, Jan 25, 2017 at 7:50 PM, Russell Jurney <
> > russell.jur...@gmail.com>
> > > wrote:
> > >
> > > > Is someone from Paypal likely to speak? Should we start a new thread
> > > with a
> > > > call for another speaker? There was mention of three being needed.
> > > >
> > > > On Wed, Jan 25, 2017 at 5:33 PM Jayesh Senjaliya <
> jhsonl...@gmail.com>
> > > > wrote:
> > > >
> > > > > Yes I am waiting for response from facilities about it, most likely
> > by
> > > > > early next week.
> > > > >
> > > > > Thanks
> > > > > Jayesh
> > > > >
> > > > > On Wed, Jan 25, 2017 at 4:52 PM, Russell Jurney <
> > > > russell.jur...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Boris, would you be able to attend an evening meetup on the
> nights
> > of
> > > > > 3/15
> > > > > > or 3/16? I think attendance would be better on one of those days,
> > as
> > > > many
> > > > > > people don't attend the tutorial days.
> > > > > >
> > > > > > Paypal sounds awesome as a venue. Would they handle food and
> drink
> > as
> > > > > well?
> > > > > >
> > > > > > On Wed, Jan 25, 2017 at 11:28 AM, Boris Tyukin <
> > > bo...@boristyukin.com>
> > > > > > wrote:
> > > > > >
> > > > > > > it would be great!
> > > > > > >
> > > > > > > On Wed, Jan 25, 2017 at 1:26 PM, siddharth anand <
> > > san...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Paypal is quite close (11 minute drive on local streets per
> > > google
> > > > > > Maps :
> > > > > > > > https://goo.gl/maps/otUpve9StxJ2) to the Strata venue, so it
> > > would
> > > > > > make
> > > > > > > > sense to hold the meet-up at Paypal during Strata week.
> > > > > > > >
> > > > > > > > -s
> > > > > > > >
> > > > > > > > On Wed, Jan 25, 2017 at 5:48 AM, Boris Tyukin <
> > > > bo...@boristyukin.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > any way to schedule it during Strata week? would love to
> > attend
> > > > one
> > > > > > of
> > > > > > > > > airflow meetups but I am in Florida. 03/13 or 03/14 would
> > work
> > > > the
> > > > > > best
> > > > > > > > > because first two days of Strata are training days and 

Re: Airflow Meetup @ Paypal (San Jose)

2017-01-25 Thread siddharth anand
Paypal is quite close (11 minute drive on local streets per Google Maps:
https://goo.gl/maps/otUpve9StxJ2) to the Strata venue, so it would make
sense to hold the meet-up at Paypal during Strata week.

-s

On Wed, Jan 25, 2017 at 5:48 AM, Boris Tyukin  wrote:

> any way to schedule it during Strata week? would love to attend one of
> airflow meetups but I am in Florida. 03/13 or 03/14 would work the best
> because first two days of Strata are training days and not very busy
>
> On Tue, Jan 24, 2017 at 10:33 PM, Russell Jurney wrote:
>
> > Unfortunately, Strata has no room for us :( Paypal sounds like a great
> > option.
> >
> > Jayesh, sounds like you're driving? :)
> >
> > On Tue, Jan 24, 2017 at 12:04 PM, siddharth anand 
> > wrote:
> >
> > > Russell,
> > > Let us know what you learn about Strata.
> > >
> > > Even if Strata offers up rooms to communities for free (based on
> > > information such as community size, etc...), I'm doubtful they would
> > cover
> > > food and drinks. That cost would need to be carried by a sponsor --
> i.e.
> > > you'd need to find a sponsor for it. We considered something similar
> for
> > > QCon -- however, our venue costs were fairly high so the catering cost
> > for
> > > most budding communities and their sponsors were a turn-off. Given that
> > > Strata is a large conference hosted at a largish (i.e. expensive)
> hotel,
> > > I'd expect some of the same cost issues, unless Strata co-sponsored it.
> > >
> > > I'm all for something at Strata, but just wanted to share my $0.02.
> Since
> > > this topic came up on Jayesh's thread, I'd like to time-bound it. If
> you
> > > don't hear back by say Friday with specifics from Strata, I'd say that
> > > Jayesh's wins by first-mover privilege.
> > >
> > > Jayesh,
> > > If we don't hear from Strata by Friday, I'd say we continue with your
> > idea.
> > > I've already promoted your user to Event Organizer on
> > > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
> > >
> > > You'd need to follow the steps below:
> > >
> > >- Get approval from Paypal to host it
> > >- Ping this list for 2 more speakers - I'd imagine someone from
> PayPal
> > >will also speak about PayPal's use of Airflow.
> > >- Create the meet-up event (ideally once you have all 3 speakers)
> > >- Update this list with a link to this event (and ping me if I don't
> > see
> > >it) -- I'll then promote it on our twitter channel, etc...
> > >
> > > -s
> > >
> > > On Mon, Jan 23, 2017 at 4:42 PM, Jayesh Senjaliya wrote:
> > >
> > > > I am actually up for both, Paypal can host after Strata.
> > > >
> > > > waiting for community to comment as well.
> > > >
> > > > Thanks
> > > > Jayesh
> > > >
> > > >
> > > > On Mon, Jan 23, 2017 at 3:45 PM, Russell Jurney <
> > > russell.jur...@gmail.com>
> > > > wrote:
> > > >
> > > > > I reached out and am awaiting to hear if they have space. They did
> > say
> > > > that
> > > > > attendees of meetups in the evening do NOT need to have a Strata
> > pass.
> > > > >
> > > > > I'm new here, so I don't want to hijack your meetup. If you guys
> want
> > > > > Paypal, lets have Paypal host. I'm sure it will be great either
> way.
> > > > >
> > > > > On Fri, Jan 20, 2017 at 1:10 PM, Russell Jurney <
> > > > russell.jur...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I think if we hold it in the evening, there is no requirement to
> > buy
> > > a
> > > > > > ticket to come to the meetup. Let me verify.
> > > > > >
> > > > > > On Fri, Jan 20, 2017 at 12:45 PM, Jayesh Senjaliya <
> > > > jhsonl...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Russell,
> > > > > >>
> > > > > >> Sure, Strata will have its own flavor of visitors, but the
> tickets
> > > are
> > > > > >> kind of expensive too for everybody to join.
> > > > > >>
> > > > > >> 

Re: Airflow Meetup @ Paypal (San Jose)

2017-01-24 Thread siddharth anand
Russell,
Let us know what you learn about Strata.

Even if Strata offers rooms to communities for free (based on
information such as community size, etc.), I doubt they would cover
food and drinks. That cost would need to be carried by a sponsor, i.e.
you'd need to find one. We considered something similar for
QCon. However, our venue costs were fairly high, so the catering cost was
a turn-off for most budding communities and their sponsors. Given that
Strata is a large conference hosted at a largish (i.e. expensive) hotel,
I'd expect some of the same cost issues, unless Strata co-sponsored it.

I'm all for something at Strata, but just wanted to share my $0.02. Since
this topic came up on Jayesh's thread, I'd like to time-bound it. If you
don't hear back by, say, Friday with specifics from Strata, I'd say that
Jayesh's proposal wins by first-mover privilege.

Jayesh,
If we don't hear from Strata by Friday, I'd say we continue with your idea.
I've already promoted your user to Event Organizer on
https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/

You'd need to follow the steps below:

   - Get approval from Paypal to host it
   - Ping this list for 2 more speakers - I'd imagine someone from PayPal
   will also speak about PayPal's use of Airflow.
   - Create the meet-up event (ideally once you have all 3 speakers)
   - Update this list with a link to this event (and ping me if I don't see
   it) -- I'll then promote it on our twitter channel, etc...

-s

On Mon, Jan 23, 2017 at 4:42 PM, Jayesh Senjaliya 
wrote:

> I am actually up for both, Paypal can host after Strata.
>
> waiting for community to comment as well.
>
> Thanks
> Jayesh
>
>
> On Mon, Jan 23, 2017 at 3:45 PM, Russell Jurney 
> wrote:
>
> > I reached out and am awaiting to hear if they have space. They did say
> that
> > attendees of meetups in the evening do NOT need to have a Strata pass.
> >
> > I'm new here, so I don't want to hijack your meetup. If you guys want
> > Paypal, lets have Paypal host. I'm sure it will be great either way.
> >
> > On Fri, Jan 20, 2017 at 1:10 PM, Russell Jurney <
> russell.jur...@gmail.com>
> > wrote:
> >
> > > I think if we hold it in the evening, there is no requirement to buy a
> > > ticket to come to the meetup. Let me verify.
> > >
> > > On Fri, Jan 20, 2017 at 12:45 PM, Jayesh Senjaliya <
> jhsonl...@gmail.com>
> > > wrote:
> > >
> > >> Hi Russell,
> > >>
> > >> Sure, Strata will have its own flavor of visitors, but the tickets are
> > >> kind of expensive too for everybody to join.
> > >>
> > >> I agree on turnouts though, so we can try for Strata first and
> fallback
> > to
> > >> regular
> > >> meetup in March end or even April if we dont get space in Strata.
> > >>
> > >> or we can just do both since there will be different group of people
> at
> > >> both places.
> > >>
> > >> - Jayesh
> > >>
> > >>
> > >> On Fri, Jan 20, 2017 at 12:35 PM, Russell Jurney <
> > >> russell.jur...@gmail.com>
> > >> wrote:
> > >>
> > >> > As I mentioned in the other thread, I am available to speak on
> > >> Predictive
> > >> > Analytics with Airflow and PySpark.
> > >> >
> > >> > Mid march has been suggested. What about the evening of Tuesday,
> 3/14
> > -
> > >> the
> > >> > first day of sessions at Strata? We could promote the meetup with
> the
> > >> > conference, get it listed as an evening event. Alternative day could
> > be
> > >> > Wednesday 3/15, 2nd day of Strata sessions.
> > >> >
> > >> > This brings up the question... should we maybe have the meetup at
> > >> Strata?
> > >> > Just a thought, we might get better turnout if we get a room from
> > >> Strata.
> > >> > I'm sure they would agree. I'm new here; just an idea.
> > >> >
> > >> > Russ
> > >> >
> > >> > On Fri, Jan 20, 2017 at 11:36 AM, Jacky 
> wrote:
> > >> >
> > >> > > Hello Airflow community !
> > >> > >
> > >> > > I am Jayesh from Paypal, and at last meetup we briefly talked
> about
> > >> > > hosting next one and I offered to host at Paypal office in San
> Jose.
> > >> > >
> > >> > > If we can come up with some dates, I can talk to facilities to reserve
> > >> > > space accordingly, so that it doesn't become short notice for the
> > >> > > community.
> > >> > >
> > >> > > Any thoughts/comments?
> > >> > >
> > >> > > Thanks
> > >> > > Jayesh
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> relato.io
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
> > >
> >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
> >
>


QCon London

2017-01-23 Thread siddharth anand
Hi Folks!
I will be attending QCon London Mar 5-8. Happy to meet locals and talk
Airflow and data infrastructure if there is interest. FYI, I'm also a
co-chair for QCon London and would be very interested in getting a deeper
understanding of the local (London and environs) tech scene and potential
speakers for next year's QCon London.

If interested in meeting up, you can email me directly. I'll be at the
conference, hosted at http://qeiicentre.london/getting-here/, most days.
-s


Re: Medium series: Airflow for Google Cloud

2017-01-20 Thread siddharth anand
Looks like you don't have an account. Once you create one, let me know
and I will grant you admin perms on the wiki.
-s

On Fri, Jan 20, 2017 at 6:08 PM, siddharth anand  wrote:

> I've added it to
> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links
>
> Feel free to add future posts to this page. You should have access.
> -s
>
> On Fri, Jan 20, 2017 at 3:23 PM, Alex Van Boxel  wrote:
>
>> Hey all,
>>
>> Now that 1.8 is nearing release, I finally started writing about Airflow.
>> As it's me writing, I'll be focussing on the Google Cloud integration.
>>
>> Today's post is about BigQuery
>> https://medium.com/google-cloud/airflow-for-google-cloud-part-1-d7da9a048aa4#.qe6f0gldf
>>
>> Next one will be about DataProc.
>> --
>>   _/
>> _/ Alex Van Boxel
>>
>
>


Re: Medium series: Airflow for Google Cloud

2017-01-20 Thread siddharth anand
I've added it to
https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links

Feel free to add future posts to this page. You should have access.
-s

On Fri, Jan 20, 2017 at 3:23 PM, Alex Van Boxel  wrote:

> Hey all,
>
> Now that 1.8 is nearing release, I finally started writing about Airflow.
> As it's me writing, I'll be focussing on the Google Cloud integration.
>
> Today's post is about BigQuery
> https://medium.com/google-cloud/airflow-for-google-cloud-part-1-d7da9a048aa4#.qe6f0gldf
>
> Next one will be about DataProc.
> --
>   _/
> _/ Alex Van Boxel
>


Re: Airflow Meetup in NYC @ Blue Apron

2017-01-20 Thread siddharth anand
Great to hear. Yes, please do set up an official meet-up page. You are
welcome to add a few of the committers or other contributors as co-admins of
the meet-up page (e.g.
https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/ ).

Once the meet-up page and the event are up, we can tweet the event and
publish the announcement.
-s

On Fri, Jan 20, 2017 at 10:37 AM, Joseph Napolitano <
joseph.napolit...@blueapron.com.invalid> wrote:

> Hi all!
>
> I want to officially announce a Meetup for Airflow in NYC!  I'm looking
> forward to meeting other community members to share knowledge and network.
>
> We may create an official Meetup page, but in the meantime please sign up
> here:
> https://docs.google.com/spreadsheets/d/1WmfgZeExSVdLf-u1uh3IleeHy8QTwaJ4BkkSkVM-X1E/edit?usp=sharing
>
> I have a confirmed date of February 1st @ 6:30 at Blue Apron's
> headquarters.
>
> In Summary:
> Date: Feb 1st
> Time 6:30 - 9pm EST
> Location: 40 W 23rd St. New York, NY 10010
> https://www.google.com/maps/place/40+W+23rd+St,+New+York,+NY+10010/@40.7420885,-73.9938457,17z/data=!3m1!4b1!4m5!3m4!1s0x89c259a46471d2a1:0xc2517d92b1b68bba!8m2!3d40.7420845!4d-73.9916517?hl=en
>
> We're on the 5th floor.  You need to check in with security in the building
> lobby, and again when you reach the fifth floor to get a name tag.
>
> Food & drink will be provided!
>
> Let me know if you would like to present.  We'd love to hear about your
> architecture and war stories.  We will have a large projector and PA system
> setup.
>
> Sorry about the short notice, but it took a while to get approved over the
> holidays and new year.  If we can't generate enough interest we can
> certainly push it back a month.
>
> Thanks, and Bon Appétit!
>
> --
> *Joe Napolitano *| Sr. Data Engineer
> www.blueapron.com | 5 Crosby Street, New York, NY 10013
>


Re: New book covers Airflow with PySpark: Agile Data Science 2.0 (O'Reilly, 2017) AND Airflow Meetup?

2017-01-19 Thread siddharth anand
Mid-March might be a good time given that we had 2 meet-ups recently.

We have a wiki page about Airflow meet-ups:
https://cwiki.apache.org/confluence/display/AIRFLOW/Meetups. Feel free to
ask this list if someone would like to host. I'd imagine interest would
primarily come from other members of the community, but we're open to all
ideas. Since the last meet-up was in SF, it would be great if the next one
were in the South Bay.

-s

On Thu, Jan 19, 2017 at 6:46 PM, Russell Jurney 
wrote:

> Siddharth, nice to hear from you. Great to hear!
>
> I'm just starting a consultancy called Data Syndrome around the book, and I
> work from home, which doesn't put me in a great position to personally host
> the meetup. If you need someone to organize it and to seek a venue, I can
> do that. How does that sound? I'm sure I could find someone to host it.
>
> When would be a good date, do you think? Late February?
>
> On Thu, Jan 19, 2017 at 5:19 PM, siddharth anand 
> wrote:
>
> > Sounds like a great idea. We are looking for someone to host the next
> one..
> > once one is announced, you can sign up as a speaker.. You are also
> welcome
> > to host a meet-up if you like.
> > -s
> >
> > On Thu, Jan 19, 2017 at 4:39 PM, Russell Jurney <
> russell.jur...@gmail.com>
> > wrote:
> >
> > > Hello! My name is Russell Jurney. I am a relatively new Airflow user
> and
> > > just joined the group. I am an Azkaban refugee, and an enemy of Oozie
> and
> > > the tyranny of XML.
> > >
> > > I wanted to tell you about my new book, out in pre-release, called Agile
> > > Data Science 2.0 <http://bit.ly/agile_data_science> (O'Reilly 2017). In the
> > > book, we use Airflow in chapter 2, Setup, in a way similar to the Airflow
> > > tutorial. Then, in chapter 8, Deploying Predictive Systems, we use Airflow
> > > to deploy a predictive system built with PySpark and Spark MLlib.
> > >
> > > Some highlights in the code at http://github.com/rjurney/Agile_Data_Code_2:
> > >
> > >    - ch02/airflow_test.py
> > >    <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/airflow_test.py>
> > >    is a complete Airflow/PySpark tutorial along with ch02/pyspark_task_one.py
> > >    <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/pyspark_task_one.py>
> > >    and ch02/pyspark_task_two.py
> > >    <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch02/pyspark_task_two.py>
> > >    - The airflow setup for chapter 8 is at ch08/airflow/setup.py
> > >    <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08/airflow/setup.py>.
> > >    - The scripts that it operates on are in ch08/
> > >    <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08> and show
> > >    things like how to use '{{ ds }}' and other parameters to hook your scripts
> > >    into 'airflow backfill' and other features.
> > >    - ch08/make_predictions.py
> > >    <https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08/make_predictions.py>
> > >    shows how to setup a PySpark environment in a script in a way that can work
> > >    with Airflow.
> > >
> > > If there is any interest, I would love to present on something like
> > > "Building Predictive Systems with Spark and Airflow" at an upcoming
> > Airflow
> > > meetup.
> > >
> > > Thanks!
> > > --
> > > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
> > >
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
>


Re: New book covers Airflow with PySpark: Agile Data Science 2.0 (O'Reilly, 2017) AND Airflow Meetup?

2017-01-19 Thread siddharth anand
Sounds like a great idea. We are looking for someone to host the next one.
Once one is announced, you can sign up as a speaker. You are also welcome
to host a meet-up if you like.
-s

On Thu, Jan 19, 2017 at 4:39 PM, Russell Jurney 
wrote:

> Hello! My name is Russell Jurney. I am a relatively new Airflow user and
> just joined the group. I am an Azkaban refugee, and an enemy of Oozie and
> the tyranny of XML.
>
> I wanted to tell you about my new book, out in pre-release, called Agile
> Data Science 2.0 (O'Reilly 2017). In the book, we use Airflow in chapter 2,
> Setup, in a way similar to the Airflow tutorial. Then, in chapter 8,
> Deploying Predictive Systems, we use Airflow to deploy a predictive system
> built with PySpark and Spark MLlib.
>
> Some highlights in the code at http://github.com/rjurney/Agile_Data_Code_2:
>
> - ch02/airflow_test.py is a complete Airflow/PySpark tutorial, along with
>   ch02/pyspark_task_one.py and ch02/pyspark_task_two.py.
> - The airflow setup for chapter 8 is at ch08/airflow/setup.py.
> - The scripts that it operates on are in ch08/ and show things like how
>   to use '{{ ds }}' and other parameters to hook your scripts into
>   'airflow backfill' and other features.
> - ch08/make_predictions.py shows how to set up a PySpark environment in a
>   script in a way that can work with Airflow.
>
> If there is any interest, I would love to present on something like
> "Building Predictive Systems with Spark and Airflow" at an upcoming Airflow
> meetup.
>
> Thanks!
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
>


Re: Subsequent Airflow Meetup: 2017/01/11

2017-01-04 Thread siddharth anand
Great. I've updated the meet-up with the new speakers and talks.

-s

On Wed, Jan 4, 2017 at 3:57 PM, Kevin Mandich 
wrote:

> Hi George:
>
> Title: Offline Model Building with Airflow
> Description: Airflow is an important part of our machine learning pipeline
> at Agari. We'll describe a few DAGs we use as part of our effort to solve
> the needle-in-a-haystack problem of detecting targeted email attacks,
> a.k.a. spearphishing.
>
> Thanks!
>
> Kevin
>
> On Wed, Jan 4, 2017 at 3:10 PM, Dan Davydov <dan.davy...@airbnb.com.invalid>
> wrote:
>
> > Title: Operations & Support for Airflow
> > Brief Description: Several ideas for how to help catch and debug
> > operational issues with Airflow, as well as how to effectively deal with
> > common user issues.
> >
> > On Wed, Jan 4, 2017 at 10:32 AM, George Leslie-Waksman <
> > geo...@cloverhealth.com.invalid> wrote:
> >
> > > Kevin, Dan, do you have titles and (maybe) a brief paragraph for the
> > > meetup description, or should I just make something from the
> > > descriptions earlier in this thread?
> > >
> > > --George
> > >
> > > On Tue, Jan 3, 2017 at 4:00 PM Kevin Mandich 
> > > wrote:
> > >
> > > > Hi George,
> > > >
> > > > > Confirmed - would like to give a talk. Thanks,
> > > >
> > > > Kevin Mandich
> > > >
> > > > > On Tue, Jan 3, 2017 at 5:40 AM, Dan Davydov
> > > > > <dan.davy...@airbnb.com.invalid> wrote:
> > > >
> > > > > Confirmed.
> > > > >
> > > > > On Sun, Jan 1, 2017 at 9:16 PM, George Leslie-Waksman <
> > > > > geo...@cloverhealth.com.invalid> wrote:
> > > > >
> > > > > > > Sorry for the delayed response, end of year and holidays stole my
> > > > > > > attention for a bit.
> > > > > > >
> > > > > > > With the new year, I was just looking to pick things back up and
> > > > > > > solicit presenters for the meetup. Given we're looking for two
> > > > > > > more, and Dan(Airbnb) and Kevin(Agari) have already expressed
> > > > > > > interest, I'd be happy to give them the spots.
> > > > > > >
> > > > > > > I hope the delay in my response isn't too much of an inconvenience
> > > > > > > for anyone. Dan, Kevin: confirm and I'll add you to the line up.
> > > > > >
> > > > > > --George
> > > > > >
> > > > > > > On Sun, Nov 20, 2016 at 8:44 PM siddharth anand <san...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I suspect Clover Health is extremely busy with all of the benefit
> > > > > > > > enrollments going on right now..
> > > > > > > >
> > > > > > > > George,
> > > > > > > > When you come up for air, it looks like both Dan(Airbnb) and
> > > > > > > > Kevin(Agari) have talk ideas.
> > > > > > > >
> > > > > > > > -s
> > > > > > > >
> > > > > > > > On Wed, Nov 16, 2016 at 11:50 PM, Dan Davydov <
> > > > > > > > dan.davy...@airbnb.com.invalid> wrote:
> > > > > > > >
> > > > > > > > > Based on chatting with a couple of people today at the Airflow
> > > > > > > > > meet-up I think there has been some demand for an airflow
> > > > > > > > > operations talk, specifically around monitoring/alerting. If
> > > > > > > > > there is still room I can give a talk about this, let me know
> > > > > > > > > George.
> > > > > > > > >
> > > > > > > > > On Thu, Nov 10, 2016 at 10:17 AM, siddharth anand
> > > > > > > > > <san...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Kevin,
> > > > > > > > > > Here's a link to the 1Q17 meet-up.
> > > > > > > > > >
> > > > > > > > > > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/

Q1 Meetup

2017-01-04 Thread siddharth anand
Dan, Kevin,
Please send George your updated talk titles and abstracts. George, pls
update the meetup once you have them.

https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/235259523/

-s


Re: Airflow Github Mirror is not synchronizing

2017-01-03 Thread siddharth anand
The repo mirror is syncing now.
-s
On Sat, Dec 31, 2016 at 11:22 AM siddharth anand  wrote:

> FYI!
> I've reopened my earlier JIRA issue. It looks like multiple Apache
> Projects are reporting the same.
> https://issues.apache.org/jira/browse/INFRA-12949
>
> Newly merged changes won't be available to contributors/users until the
> mirroring issue is fixed.
>
>
>


Jan 2017 Airflow Podling Report Posted

2016-12-31 Thread siddharth anand
Hi Folks!
Here's the quarterly Podling Report for Apache Airflow. Feel free to
suggest edits. If you are a committer/maintainer, you can directly edit it.

https://wiki.apache.org/incubator/January2017

-s


Airflow Github Mirror is not synchronizing

2016-12-31 Thread siddharth anand
FYI!
I've reopened my earlier JIRA issue. It looks like multiple Apache Projects
are reporting the same.
https://issues.apache.org/jira/browse/INFRA-12949

Newly merged changes won't be available to contributors/users until the
mirroring issue is fixed.


[AIRFLOW-676] Do not allow Pools with 0 slots

2016-12-31 Thread siddharth anand
https://github.com/apache/incubator-airflow/pull/1967

Hi Folks!
Would appreciate your feedback on the following pull request.

If you'd like this functionality, please provide a +1 on the PR itself.
-s


Re: Podling Report Reminder - January 2017

2016-12-30 Thread siddharth anand
I'll put this together.

-s

On Thu, Dec 29, 2016 at 6:31 PM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 18 January 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, January 04).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/January2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Re: Cold Case PR Cleanup -- Current Status

2016-12-11 Thread siddharth anand
Hi Folks!
We actually got down to 34 open PRs just before Nov 30 and are back up to
54 and growing. I'm extending the Cold Case clean-up to Jan 15. If you are
a committer and are closing PRs, please document it on the wiki. I'd like a
sense of our current ability to keep up with the PR load.

https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution

-s

On Sun, Nov 13, 2016 at 11:51 PM, siddharth anand  wrote:

> Current status : we have < 50 open PRs presently!
> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>
> -s
>
> On Sat, Nov 5, 2016 at 8:06 PM, siddharth anand  wrote:
>
>> Committers,
>> If you have time this week, please make a push to get your cold case
>> PR cleanup done. The deadline is Nov 15, just in time for our WePay meetup.
>> I will be making an announcement there.
>>
>> -s
>>
>>
>> -- Forwarded message --
>> From: *siddharth anand* 
>> Date: Friday, November 4, 2016
>> Subject: Cold Case PR Cleanup -- Current Status
>> To: dev@airflow.incubator.apache.org
>>
>>
>> We have a little over 50 open PRs. We need to get to 10 by the end of the
>> year. Our current rate of new PRs (i.e. during this holiday season) is a
>> handful a week, so 10 open PRs roughly equates to PRs opened within a 2
>> week period.
>>
>> 2 week turn-around times for PR review should be the commitment the
>> maintainer group sticks to. BTW, I've seen a few contributors pitch in by
>> reviewing PRs. That is extremely helpful and speeds up PR review/merge
>> times. Please continue.
>>
>> Here's the current cold-case status.
>>
>> [image: Inline image 1]
>>
>> -s
>>
>> On Wed, Nov 2, 2016 at 12:50 PM, siddharth anand 
>> wrote:
>>
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>>>
>>> We are now, for the first time since I can remember, under *60* open
>>> PRs! Woo hoo! Keep up the pressure committers! If you haven't yet
>>> closed your tracked PRs, please do so soon.
>>>
>>> Exactly 1 month ago (Oct 2), when this endeavor started, we were at
>>> *110* open PRs.
>>>
>>> [image: Inline image 1]
>>> -s
>>>
>>> On Wed, Nov 2, 2016 at 12:30 AM, siddharth anand 
>>> wrote:
>>>
>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> -s
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Sent from Gmail Mobile
>>
>
>


Re: Airflow 2.0

2016-12-06 Thread siddharth anand
Max,
Do you have time to summarize this thread? Perhaps, publish it on the Wiki!
-s

On Thu, Dec 1, 2016 at 12:27 PM, Van Klaveren, Brian N. <
b...@slac.stanford.edu> wrote:

> With the announcement of AWS Batch (https://aws.amazon.com/batch/), and
> my own selfish needs, I think it'd be really great to generally support
> Batch systems like AWS Batch, Slurm, and Torque as executors, potentially
> with an extension of the BashOperator, but I think it might actually be
> flexible enough to not need a dedicated BatchOperator.
>
> Brian
>
>
> On Nov 24, 2016, at 7:40 AM, Maycock, Luke
> <luke.mayc...@affiliate.oliverwyman.com> wrote:
>
> Add FK to dag_run to the task_instance table on Postgres so that
> task_instances can be uniquely attributed to dag runs.
>
>
> + 1
>
>
> Also, I believe xcoms would need to be addressed in the same way at the
> same time - I have added a comment to that effect on
> https://issues.apache.org/jira/browse/AIRFLOW-642
>
>
> I believe this would be implemented for all supported back-ends, not just
> PostgreSQL.
>
>
> Cheers,
> Luke Maycock
> OLIVER WYMAN
> luke.mayc...@affiliate.oliverwyman.com
> www.oliverwyman.com
>
>
>
> 
> From: Arunprasad Venkatraman <arp...@uber.com>
> Sent: 21 November 2016 18:16
> To: dev@airflow.incubator.apache.org
> Subject: Re: Airflow 2.0
>
> Add FK to dag_run to the task_instance table on Postgres so that
> task_instances can be uniquely attributed to dag runs.
> Ensure scheduler can be run continuously without needing restarts.
> Ensure scheduler can handle tens of thousands of active workflows
>
> +1
>
> We are planning to run around 40,000 tasks a day using airflow and some of
> them are critical to give quick feedback to developers. Currently, using the
> execution date to uniquely identify tasks does not work for us since we
> mainly trigger dags (instead of running them on a schedule), and we have
> collided at 1 sec granularity on several occasions. Having a task uuid, or
> associating dag_run with the task_instance table as suggested by Sergei,
> would help mitigate this issue for us and would make it easy for us to
> update task results too. We would be happy to start working on this if it
> makes sense.
>
> Also, we are wondering whether any work has been done in the community to
> support multiple schedulers (or alternatives to MySQL/Postgres), because 1
> scheduler does not scale well for us and we sometimes see slowdowns of up to
> a couple of minutes when there are several pending tasks.
>
> Thanks
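
The collision described above can be reproduced on paper with a small sketch. It assumes, purely for illustration, that a run is keyed by its dag id plus a trigger time truncated to whole seconds; the dag id, the timestamps, and the helper name are made up, and this is not Airflow's actual schema.

```python
import uuid
from datetime import datetime


def second_granularity_key(dag_id, triggered_at):
    """Key a run by dag id plus its trigger time truncated to whole seconds."""
    return (dag_id, triggered_at.replace(microsecond=0))


# Two manual triggers of the same DAG, 300 ms apart (made-up timestamps).
first = datetime(2016, 11, 21, 18, 16, 5, 100000)
second = datetime(2016, 11, 21, 18, 16, 5, 400000)

# Keyed by truncated time, both triggers collapse into one entry.
keys_by_time = {second_granularity_key("build_feedback", t) for t in (first, second)}

# Keyed by a per-run identifier, each trigger stays distinct.
keys_by_uuid = {("build_feedback", uuid.uuid4().hex) for _ in (first, second)}

print(len(keys_by_time))  # 1: both triggers map to the same key
print(len(keys_by_uuid))  # 2: each trigger gets its own identifier
```

A foreign key from task_instance to dag_run would have the same effect as the uuid here: the run, not the timestamp, becomes the identity.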
>
>
>
> On Mon, Nov 21, 2016 at 9:57 AM, Chris Riccomini
> wrote:
>
> Ensure scheduler can be run continuously without needing restarts
>
> +1
>
> On Mon, Nov 21, 2016 at 5:25 AM, David Batista <d...@hellofresh.com> wrote:
> A small request, which might be handy.
>
> Having the possibility to select multiple tasks and mark them as
> Success/Clear/etc.
>
> Allow the UI to select individual tasks (i.e., inside the Tree View) and
> then have a button to mark them as Success/Clear/etc.
>
> On 21 November 2016 at 14:22, Sergei Iakhnin <lle...@gmail.com> wrote:
>
> I've been running Airflow on 1500 cores in the context of scientific
> workflows for the past year and a half. Features that would be important to
> me for 2.0:
>
> - Add FK to dag_run to the task_instance table on Postgres so that
> task_instances can be uniquely attributed to dag runs.
> - Ensure scheduler can be run continuously without needing restarts. Right
> now it gets into some ill-determined bad state forcing me to restart it
> every 20 minutes.
> - Ensure scheduler can handle tens of thousands of active workflows. Right
> now this results in extremely long scheduling times and inconsistent
> scheduling even at 2 thousand active workflows.
> - Add more flexible task scheduling prioritization. The default
> prioritization is the opposite of the behaviour I want. I would prefer that
> downstream tasks always have higher priority than upstream tasks to cause
> entire workflows to tend to complete sooner, rather than scheduling tasks
> from other workflows. Having a few scheduling prioritization strategies
> would be beneficial here.
> - Provide better support for manually-triggered DAGs on the UI i.e. by
> showing them as queued.
> - Provide some resource management capabilities via something like slots
> that can be defined on workers and occupied by tasks. Using celery's
> concurrency parameter at the airflow server level is too coarse-grained as
> it forces all workers to be the same, and does not allow proper resource
> management when different workflow tasks have different resource
> requirements thus hurting utilization (a worker could run 8 parallel tasks
> with small memory footprint, but only 1 task with

Re: Merging the experimental API Framework

2016-11-28 Thread siddharth anand
Bolke,
Thanks for kicking this off.

Is there already a design document for this? If not, can you create one? It
makes sense to have a design document for this to connect multiple PRs. You
can also add the information above to the same wiki -- The mailing list is
not always super friendly for historical referencing.

-s

On Mon, Nov 28, 2016 at 11:25 AM, Dan Davydov <
dan.davy...@airbnb.com.invalid> wrote:

> Just wanted to say this is very exciting, thank you Bolke :).
>
> On Mon, Nov 28, 2016 at 10:50 AM, Bolke de Bruin 
> wrote:
>
> > All,
> >
> > After a few weeks of work I have finalized the implementation of a REST
> > API Framework. Out of the box it supports Kerberos authentication, which
> > is now fully end-to-end tested on Travis with a working KDC. You can also
> > switch the CLI to use the API endpoints when available. Currently, only
> > the “trigger_dag” functionality is available this way, but I hope others
> > will pick this up and create new endpoints that the CLI can then use.
> >
> > For Contributors:
> >
> > In case you are implementing new functionality in the CLI please make
> > sure to implement the actual functionality in api/common/…/
> > and expose it through api_client (abstract), json_client (JSON),
> > local_client (direct). Endpoints are defined in www/api/experimental.
> >
> > Direct exposure in cli.py I would consider deprecated and I would prefer
> > to deny it from now on. Hopefully, this gives us a gradual path to
> > improved integration and improved security while maintaining backwards
> > compatibility. Also note that the APIs are still marked experimental and
> > are subject to change.
> > Next steps:
> > - Swagger definitions (http://swagger.io)
> > - Research possible integration between different authentication backends
> > - Use “airflow api” instead of “airflow webserver” to separate concerns
> > - Remove all direct DB access from cli.py
> > - Improve documentation
> > - Design API graduation roadmap (when is something not experimental
> > anymore)
> >
> > Feedback obviously appreciated.
> >
> > Bolke
> >
> >
> >
>
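
The layering Bolke describes (shared logic in one place, exposed through an abstract client with a direct and a JSON implementation) can be sketched roughly as follows. Every name, path, and return value here is a hypothetical stand-in chosen for illustration; none of it is taken from the actual api_client, json_client, or local_client modules, and the "server" is an in-memory function rather than a real endpoint.

```python
import json
from abc import ABC, abstractmethod


def trigger_dag(dag_id, run_id=None):
    """Stand-in for shared logic that would live under api/common/."""
    return {"dag_id": dag_id, "run_id": run_id or "manual__" + dag_id, "state": "queued"}


class Client(ABC):
    """Abstract interface the CLI programs against."""

    @abstractmethod
    def trigger_dag(self, dag_id, run_id=None):
        ...


class LocalClient(Client):
    """Calls the shared function directly, in-process."""

    def trigger_dag(self, dag_id, run_id=None):
        return trigger_dag(dag_id, run_id)


class JsonClient(Client):
    """Serializes the call to JSON and hands it to a transport callable."""

    def __init__(self, transport):
        # transport: callable taking (path, body_str) and returning a JSON string
        self._transport = transport

    def trigger_dag(self, dag_id, run_id=None):
        body = json.dumps({"run_id": run_id})
        path = "/hypothetical/dags/{}/runs".format(dag_id)  # made-up route
        return json.loads(self._transport(path, body))


def fake_server(path, body):
    """In-memory stand-in for a webserver endpoint; no network involved."""
    dag_id = path.split("/")[3]
    payload = json.loads(body)
    return json.dumps(trigger_dag(dag_id, payload["run_id"]))


local_result = LocalClient().trigger_dag("example_dag")
remote_result = JsonClient(fake_server).trigger_dag("example_dag")
print(local_result == remote_result)  # True: both paths share the same logic
```

The point of the split is that the CLI only depends on `Client`, so switching between direct access and the API becomes a configuration choice rather than a code change.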


Re: November 16 SF Bay Airflow meetup

2016-11-28 Thread siddharth anand
Thx Chris. I've added it to the announcements as well:
https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-Nov28,2016

On Mon, Nov 28, 2016 at 11:13 AM, Chris Riccomini 
wrote:

> Hey all,
>
> A recording of the meetup is now available here:
>
> https://wepayinc.app.box.com/s/1183ra3z8gxf8fridysu4wbjckg1s05v
>
> Cheers,
> Chris
>
> On Tue, Nov 1, 2016 at 10:14 AM, Chris Riccomini 
> wrote:
> > Hey all,
> >
> > Just a gentle reminder that the next Airflow meetup is happening on
> > November 16, 2016 at WePay in Redwood City. Details are here:
> >
> > http://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/234778571/
> >
> > Looking forward to seeing you all there!
> >
> > Cheers,
> > Chris
>


Cold case PR cleanup deadline

2016-11-28 Thread siddharth anand
The cold case PR cleanup deadline is this week, Nov 30. Friendly reminder to all maintainers.


Re: Simple Feature Request

2016-11-23 Thread siddharth anand
I keep forgetting about inline images and attachments being stripped out by
apache mail servers.. ugh..

https://www.dropbox.com/s/4wrju14j3gh09mk/Screenshot%202016-11-23%2012.24.45.png?dl=0
-s

On Wed, Nov 23, 2016 at 9:37 PM, Sumit Maheshwari 
wrote:

> +1.. nice to have it..
>
>
>
> On Thu, Nov 24, 2016 at 3:51 AM, siddharth anand 
> wrote:
>
> > If we support a test query dialog, we could just execute the query via
> > each hook's pre-existing execute method.
> >
> > -s
> >
> > On Wed, Nov 23, 2016 at 2:04 PM, Bolke de Bruin 
> wrote:
> >
> > > It is nice, but probably not so simple. Every hook would need to
> > > incorporate a “test” function.
> > >
> > >
> > > > On 23 Nov 2016, at 21:28, siddharth anand wrote:
> > > >
> > > > Folks!
> > > > Here's a nice (simple) feature request if someone would like to
> > > > implement it and file a PR.
> > > >
> > > > On the Admin-->Connections page, you can add a connection. But how do
> > > > you know if it works? How about adding a "Test Connection" button on
> > > > the edit page to test if the parameters you have specified are valid?
> > > > You might also be able to issue a test query from the edit screen.
> > > >
> > > > Currently, we don't validate the connections in terms of testing them
> > > > because we have some default ones (as examples). In the future, we
> > > > might want to consider removing those example connections or hiding
> > > > them behind a config param similar to load_examples, which is used to
> > > > load the example dags into the Web app.
> > > >
> > > >
> > > > -s
> > >
> > >
> >
>


Re: Simple Feature Request

2016-11-23 Thread siddharth anand
If we support a test query dialog, we could just execute the query via each
hook's pre-existing execute method.

-s

On Wed, Nov 23, 2016 at 2:04 PM, Bolke de Bruin  wrote:

> It is nice, but probably not so simple. Every hook would need to
> incorporate a “test” function.
>
>
> > On 23 Nov 2016, at 21:28, siddharth anand wrote:
> >
> > Folks!
> > Here's a nice (simple) feature request if someone would like to
> > implement it and file a PR.
> >
> > On the Admin-->Connections page, you can add a connection. But how do
> > you know if it works? How about adding a "Test Connection" button on the
> > edit page to test if the parameters you have specified are valid? You
> > might also be able to issue a test query from the edit screen.
> >
> > Currently, we don't validate the connections in terms of testing them
> > because we have some default ones (as examples). In the future, we might
> > want to consider removing those example connections or hiding them behind
> > a config param similar to load_examples, which is used to load the example
> > dags into the Web app.
> >
> >
> > -s
>
>


Simple Feature Request

2016-11-23 Thread siddharth anand
Folks!
Here's a nice (simple) feature request if someone would like to implement
it and file a PR.

On the Admin-->Connections page, you can add a connection. But how do you
know if it works? How about adding a "Test Connection" button on the edit
page to test if the parameters you have specified are valid? You might also
be able to issue a test query from the edit screen.

Currently, we don't validate the connections in terms of testing them
because we have some default ones (as examples). In the future, we might
want to consider removing those example connections or hiding them behind a
config param similar to load_examples, which is used to load the example
dags into the Web app.

[image: Inline image 1]
-s
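
A minimal sketch of what a "Test Connection" check could do behind such a button, assuming the connection can be opened through a DB-API 2.0 style callable. The function name, the default test query, and the use of sqlite3 as the example backend are all illustrative assumptions, not existing Airflow code.

```python
import sqlite3


def check_connection(connect, test_query="SELECT 1"):
    """Try to open a connection and run a trivial query.

    `connect` is any zero-argument callable returning a DB-API 2.0
    connection. Returns (ok, message) instead of raising, so a UI
    button could display the message directly.
    """
    conn = None
    try:
        conn = connect()
        cursor = conn.cursor()
        cursor.execute(test_query)
        cursor.fetchone()
        return True, "Connection OK"
    except Exception as exc:  # report any driver error back to the caller
        return False, "Connection failed: {}".format(exc)
    finally:
        if conn is not None:
            conn.close()


def broken_connect():
    """Simulates bad connection parameters."""
    raise ConnectionError("host unreachable")


ok, message = check_connection(lambda: sqlite3.connect(":memory:"))
bad_ok, bad_message = check_connection(broken_connect)
print(ok, message)
print(bad_ok, bad_message)
```

Returning a status and message rather than raising keeps the check usable for the bundled example connections too, which would simply report a failure instead of breaking the page.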


Re: Dynamic creation of DAG

2016-11-22 Thread siddharth anand
Hi Max,
Which part in the above PR is related to dynamic dags?

When thinking about adding documentation about functionality, I propose the
community bias towards adding working examples and test coverage. We offer
a quick start (which by the way needs some updates - for example, why does
it not start airflow-scheduler after starting the webserver?), but then
folks get stuck in how to write DAGs and use the full range of Airflow
capabilities. This is where examples and better test coverage help keep
newbies productive.

Perhaps the examples and tests can be upgraded to show a fuller set of
dynamic dag capabilities?

-s

On Mon, Nov 21, 2016 at 7:55 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> I just added a bit of information about dynamic DAG creation here:
> https://github.com/apache/incubator-airflow/pull/1889/files#diff-c6f0a0722c6a2f86277535d7bcec7f8cR162
>
> Let me know if it helps.
>
> Max
>
> On Mon, Nov 21, 2016 at 2:58 AM, Deepak Kumar Malladi <
> kapeed2...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I want to dynamically create a DAG at run time. I tried the snippet given
> > in the documentation, but it didn't work for me.
> >
> > Any pointers on how to trigger DAGs which aren't actually present in the DAG
> > folder but are created through code execution (dynamically created)?
> >
> >
> > Thanks & Regards,
> > Deepak
> >
>
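
For readers landing on this thread, here is a sketch of the usual pattern, under the assumption that Airflow discovers DAG objects that end up as module-level variables of a file in the DAG folder. The `DAG` class below is a stand-in so the example runs without Airflow installed, and `build_dags` and the table names are hypothetical.

```python
class DAG:
    """Stand-in for airflow.DAG, holding only what this sketch needs."""

    def __init__(self, dag_id, schedule_interval=None):
        self.dag_id = dag_id
        self.schedule_interval = schedule_interval


def build_dags(table_names, namespace):
    """Create one DAG per table and register each under its dag_id.

    In a real DAG file, `namespace` would be globals(), so that every
    generated DAG becomes a module-level variable.
    """
    for name in table_names:
        dag_id = "load_{}".format(name)
        namespace[dag_id] = DAG(dag_id, schedule_interval="@daily")
    return namespace


# In a file under the DAG folder this call would use globals() instead.
generated = build_dags(["customers", "orders"], {})
print(sorted(generated))  # ['load_customers', 'load_orders']
```

Under that assumption, a DAG that exists only in some other process's memory has nothing for the scheduler to find; the generating code itself has to live in, or be imported by, a file in the DAG folder.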


Re: Airflow 2.0

2016-11-21 Thread siddharth anand
1) The restart should not be needed, but if folks are reporting it, I'm
curious what the problem might be. If you are running on master, then you
may not be aware of the min_file_process_interval setting.

[scheduler]

min_file_process_interval = 0

max_threads = 4
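
A minimal sketch of reading these two values back with Python's standard configparser, using an inline string that stands in for airflow.cfg. What each key controls is not spelled out in this thread; the comments below (seconds between re-parses of a DAG file, number of scheduler threads) are assumptions to check against the configuration reference for your version.

```python
import configparser

# Stand-in for the [scheduler] section of airflow.cfg; the two values are
# the ones quoted above.
CFG_TEXT = """
[scheduler]
min_file_process_interval = 0
max_threads = 4
"""


def read_scheduler_settings(text):
    """Return the two scheduler settings as integers."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        # Assumed meaning: minimum seconds between re-parses of a DAG file.
        "min_file_process_interval": parser.getint("scheduler", "min_file_process_interval"),
        # Assumed meaning: number of threads the scheduler may use.
        "max_threads": parser.getint("scheduler", "max_threads"),
    }


settings = read_scheduler_settings(CFG_TEXT)
print(settings)
```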

2) Yes.. security is not there. It's often something added to a maturing
project a little late in its growth - after feature completeness,
performance, etc... For example, Azkaban grew at LinkedIn to be widely
adopted for a few years before Azkaban2 came around and introduced security
features. If it's important to you, then vote. It may not be there on your
timeframe, but it will surely be something we land in 2017. Also, if you run
in the cloud, there are some options that can make your installation more
secure.

Great feedback. I know Max kicked this thread off in order to figure out
how to get his team to consider the community's needs when picking what to
fix. This information is in fact helpful to us all.

-s

On Mon, Nov 21, 2016 at 6:13 PM, Boris Tyukin  wrote:

> I am still deciding between Airflow and oozie for our brand new Hadoop
> project but here are a few things that I did not like during my limited
> testing:
>
> 1) pain with scheduler/webserver restarts - things magically begin working
> after restart or disappear (like DAG tasks that are no longer part of DAG)
> 2) no security - a big deal for enterprise-like companies like the one I
> work for (a large healthcare organization).
> 3) backfill concept is a bit weird to me. I think Gerard put it pretty well
> - backfills should be run for the entire missing window, not day by day.
> Logging for backfills should be consistent with normal DAG Runs.
> 4) confusion around execution time and start time - I wish the UI would
> clearly distinguish them. Execution time only covers the interval back to
> the previous DAG run - I wish it would go back to the LAST successful DAG
> run. That way I could rely on it for watermarks in incremental processes.
> 5) UTC confusion - not all companies have the luxury of running all their
> systems on UTC.
>
>
> On Mon, Nov 21, 2016 at 5:26 PM, siddharth anand 
> wrote:
>
> > Also, a survey will be a little less noisy and easier to summarize than
> > +1s in this email thread.
> > -s (Sid)
> >
> > On Mon, Nov 21, 2016 at 2:25 PM, siddharth anand 
> > wrote:
> >
> > > Sergei,
> > > These are some great ideas -- I would classify at least half of them as
> > > pain points.
> > >
> > > Folks!
> > > I suggest people (on the dev list) keep feeding this thread at least for
> > > the next 2 days. I can then float a survey based on these ideas and give
> > > the community a chance to vote so we can prioritize the wish list.
> > >
> > > -s
> > >
> > > On Mon, Nov 21, 2016 at 5:22 AM, Sergei Iakhnin 
> > wrote:
> > >
> > >> I've been running Airflow on 1500 cores in the context of scientific
> > >> workflows for the past year and a half. Features that would be
> > >> important to me for 2.0:
> > >>
> > >> - Add FK to dag_run to the task_instance table on Postgres so that
> > >> task_instances can be uniquely attributed to dag runs.
> > >> - Ensure scheduler can be run continuously without needing restarts.
> > >> Right now it gets into some ill-determined bad state forcing me to
> > >> restart it every 20 minutes.
> > >> - Ensure scheduler can handle tens of thousands of active workflows.
> > >> Right now this results in extremely long scheduling times and
> > >> inconsistent scheduling even at 2 thousand active workflows.
> > >> - Add more flexible task scheduling prioritization. The default
> > >> prioritization is the opposite of the behaviour I want. I would prefer
> > >> that downstream tasks always have higher priority than upstream tasks
> > >> to cause entire workflows to tend to complete sooner, rather than
> > >> scheduling tasks from other workflows. Having a few scheduling
> > >> prioritization strategies would be beneficial here.
> > >> - Provide better support for manually-triggered DAGs on the UI i.e. by
> > >> showing them as queued.
> > >> - Provide some resource management capabilities via something like
> > >> slots that can be defined on workers and occupied by tasks. Using
> > >> celery's concurrency parameter at the airflow server level is too
> > >> coarse-grained as it forces all workers to be the

Re: Airflow 2.0

2016-11-21 Thread siddharth anand
Also, a survey will be a little less noisy and easier to summarize than +1s
in this email thread.
-s (Sid)

On Mon, Nov 21, 2016 at 2:25 PM, siddharth anand  wrote:

> Sergei,
> These are some great ideas -- I would classify at least half of them as
> pain points.
>
> Folks!
> I suggest people (on the dev list) keep feeding this thread at least for
> the next 2 days. I can then float a survey based on these ideas and give
> the community a chance to vote so we can prioritize the wish list.
>
> -s
>
> On Mon, Nov 21, 2016 at 5:22 AM, Sergei Iakhnin  wrote:
>
>> I've been running Airflow on 1500 cores in the context of scientific
>> workflows for the past year and a half. Features that would be important
>> to me for 2.0:
>>
>> - Add FK to dag_run to the task_instance table on Postgres so that
>> task_instances can be uniquely attributed to dag runs.
>> - Ensure scheduler can be run continuously without needing restarts. Right
>> now it gets into some ill-determined bad state forcing me to restart it
>> every 20 minutes.
>> - Ensure scheduler can handle tens of thousands of active workflows. Right
>> now this results in extremely long scheduling times and inconsistent
>> scheduling even at 2 thousand active workflows.
>> - Add more flexible task scheduling prioritization. The default
>> prioritization is the opposite of the behaviour I want. I would prefer
>> that downstream tasks always have higher priority than upstream tasks to
>> cause entire workflows to tend to complete sooner, rather than scheduling
>> tasks from other workflows. Having a few scheduling prioritization
>> strategies would be beneficial here.
>> - Provide better support for manually-triggered DAGs on the UI i.e. by
>> showing them as queued.
>> - Provide some resource management capabilities via something like slots
>> that can be defined on workers and occupied by tasks. Using celery's
>> concurrency parameter at the airflow server level is too coarse-grained as
>> it forces all workers to be the same, and does not allow proper resource
>> management when different workflow tasks have different resource
>> requirements thus hurting utilization (a worker could run 8 parallel tasks
>> with small memory footprint, but only 1 task with large memory footprint
>> for instance).
>>
>> With best regards,
>>
>> Sergei.
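
The "downstream first" prioritization requested above can be sketched on a toy graph. This assumes a DAG given as a dict mapping each task id to its direct downstream task ids, and uses a weight of one plus the number of upstream tasks so that tasks nearer the end of a workflow rank higher. The function names and the toy graph are hypothetical; this is not Airflow's scheduler code.

```python
def upstream_counts(downstream):
    """Map each task to the number of distinct tasks upstream of it."""
    # Invert the edges: task -> set of direct upstream tasks.
    parents = {task: set() for task in downstream}
    for task, children in downstream.items():
        for child in children:
            parents.setdefault(child, set()).add(task)

    cache = {}

    def ancestors(task):
        # Memoized walk towards the roots of the graph.
        if task not in cache:
            found = set()
            for parent in parents[task]:
                found.add(parent)
                found |= ancestors(parent)
            cache[task] = found
        return cache[task]

    return {task: len(ancestors(task)) for task in parents}


def downstream_first_weights(downstream):
    """Weight = 1 + upstream count, so tasks near the end of a DAG win."""
    return {task: 1 + n for task, n in upstream_counts(downstream).items()}


toy_dag = {
    "extract": ["transform"],
    "transform": ["load", "report"],
    "load": [],
    "report": [],
}
weights = downstream_first_weights(toy_dag)

# Priority ordering among tasks competing for a slot (ties broken by name).
# Dependencies still decide which tasks are eligible to run at all.
priority_order = sorted(weights, key=lambda t: (-weights[t], t))
print(weights)
print(priority_order)
```

With several workflows competing for the same slots, a weighting like this would favour finishing a workflow that is already in progress over starting the first tasks of another one.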
>>
>>
>> On Mon, Nov 21, 2016 at 2:00 PM Ryabchuk, Pavlo <
>> ext-pavlo.ryabc...@here.com>
>> wrote:
>>
>> > -1. We rely heavily on data profiling as a pipeline health monitoring
>> > tool.
>> >
>> > -Original Message-
>> > From: Chris Riccomini [mailto:criccom...@apache.org]
>> > Sent: Saturday, November 19, 2016 1:57 AM
>> > To: dev@airflow.incubator.apache.org
>> > Subject: Re: Airflow 2.0
>> >
>> > > RIP out the charting application and the data profiler
>> >
>> > Yes please! +1
>> >
>> > On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin <
>> > maximebeauche...@gmail.com> wrote:
>> > > Another point that may be controversial for Airflow 2.0: RIP out the
>> > > charting application and the data profiler. Even though it's nice to
>> > > have it there, it's just out of scope and has major security
>> > > issues/implications.
>> > >
>> > > I'm not sure how popular it actually is. We may need to run a survey
>> > > at some point around these kinds of questions.
>> > >
>> > > Max
>> > >
>> > > On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <
>> > > maximebeauche...@gmail.com> wrote:
>> > >
>> > >> Using FAB's Model, we get pretty much all of that (REST API, auth/perms,
>> > >> CRUD) for free:
>> > >> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
>> > >>
>> > >> I'm pretty intimate with FAB since I use it (and contributed to it)
>> > >> for Superset/Caravel.
>> > >>
>> > >> All that's needed is to derive FAB's model class instead of
>> > >

Re: Airflow 2.0

2016-11-21 Thread siddharth anand
Sergei,
These are some great ideas -- I would classify at least half of them as
pain points.

Folks!
I suggest people (on the dev list) keep feeding this thread at least for
the next 2 days. I can then float a survey based on these ideas and give
the community a chance to vote so we can prioritize the wish list.

-s

On Mon, Nov 21, 2016 at 5:22 AM, Sergei Iakhnin  wrote:

> I've been running Airflow on 1500 cores in the context of scientific
> workflows for the past year and a half. Features that would be important to
> me for 2.0:
>
> - Add FK to dag_run to the task_instance table on Postgres so that
> task_instances can be uniquely attributed to dag runs.
> - Ensure scheduler can be run continuously without needing restarts. Right
> now it gets into some ill-determined bad state forcing me to restart it
> every 20 minutes.
> - Ensure scheduler can handle tens of thousands of active workflows. Right
> now this results in extremely long scheduling times and inconsistent
> scheduling even at 2 thousand active workflows.
> - Add more flexible task scheduling prioritization. The default
> prioritization is the opposite of the behaviour I want. I would prefer that
> downstream tasks always have higher priority than upstream tasks to cause
> entire workflows to tend to complete sooner, rather than scheduling tasks
> from other workflows. Having a few scheduling prioritization strategies
> would be beneficial here.
> - Provide better support for manually-triggered DAGs on the UI i.e. by
> showing them as queued.
> - Provide some resource management capabilities via something like slots
> that can be defined on workers and occupied by tasks. Using celery's
> concurrency parameter at the airflow server level is too coarse-grained as
> it forces all workers to be the same, and does not allow proper resource
> management when different workflow tasks have different resource
> requirements thus hurting utilization (a worker could run 8 parallel tasks
> with small memory footprint, but only 1 task with large memory footprint
> for instance).
>
> With best regards,
>
> Sergei.
>
>
> On Mon, Nov 21, 2016 at 2:00 PM Ryabchuk, Pavlo <
> ext-pavlo.ryabc...@here.com>
> wrote:
>
> > -1. We rely heavily on data profiling as a pipeline health monitoring
> > tool.
> >
> > -Original Message-
> > From: Chris Riccomini [mailto:criccom...@apache.org]
> > Sent: Saturday, November 19, 2016 1:57 AM
> > To: dev@airflow.incubator.apache.org
> > Subject: Re: Airflow 2.0
> >
> > > RIP out the charting application and the data profiler
> >
> > Yes please! +1
> >
> > On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> > > Another point that may be controversial for Airflow 2.0: RIP out the
> > > charting application and the data profiler. Even though it's nice to
> > > have it there, it's just out of scope and has major security
> > > issues/implications.
> > >
> > > I'm not sure how popular it actually is. We may need to run a survey
> > > at some point around this kind of questions.
> > >
> > > Max
> > >
> > > On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >
> > >> Using FAB's Model, we get pretty much all of that (REST API,
> > >> auth/perms, CRUD) for free:
> > >> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
> > >>
> > >> I'm pretty intimate with FAB since I use it (and contributed to it)
> > >> for Superset/Caravel.
> > >>
> > >> All that's needed is to derive FAB's model class instead of
> > >> SqlAlchemy's model class (which FAB's model wraps and adds
> > >> functionality to and is 100% compatible AFAICT).
> > >>
> > >> Max
> > >>
> > >> On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini
> > >> 
> > >> wrote:
> > >>
> > >>> > It may be doable to run this as a different package
> > >>> > `airflow-webserver`, an alternate UI at first, and to eventually
> > >>> > rip out the old UI off of the main package.
> > >>>
> > >>> This is the same strategy that I was thinking of for AIRFLOW-85. You
> > >>> can build the new UI in parallel, and then delete the old one later.
> > >>> I really think that a REST interface should be a pre-req to any
> > >>> large/new UI changes, though. Getting unified so that everything is
> > >>> driven through REST will be a big win.
> > >>>
> > >>> On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin
> > >>>  wrote:
> > >>> > A multi-tenant UI with composable roles on top of granular
> > >>> > permissions.
> > >>> >
> > >>> > Migrating from Flask-Admin to Flask App Builder would be an
> > >>> > easy-ish win (since they're both Flask). FAB Provides a good
> > >>> > authentication and permissio

Idea for a new UI feature

2016-11-20 Thread siddharth anand
An obvious feature missing from Airflow's (awesome) UI is a way to tell
whether your DAGs are missing their SLAs.

For example, say that you have an hourly DAG and have specified the dag run
timeout or sla=timedelta(hours=2). After 2 hours, you will receive an email
about the SLA miss.

It would be useful to see how far behind your DAG actually is, e.g. 6
hours. I welcome a PR that addresses this.
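
A minimal sketch of the "how far behind" calculation, assuming lag is measured
from the end of the last successfully completed schedule period. The function
names below are hypothetical and are not part of Airflow's API; a real
implementation would read the dates from the dag_run table.

```python
from datetime import datetime, timedelta


def dag_lag(last_success_execution_date, schedule_interval, now):
    """Return how far behind the DAG is (hypothetical helper, not Airflow API).

    Assumption: a run with a given execution date covers one schedule
    interval starting at that date, so data is complete up to
    execution_date + schedule_interval.
    """
    covered_until = last_success_execution_date + schedule_interval
    # Never report a negative lag if the clock is behind the covered period.
    return max(now - covered_until, timedelta(0))


def is_missing_sla(lag, schedule_interval, sla):
    """Assumption: the next run is late once its period has closed
    (one more interval) and the SLA has elapsed on top of that."""
    return lag > schedule_interval + sla


# Example: hourly DAG with sla=timedelta(hours=2).
hourly = timedelta(hours=1)
lag = dag_lag(
    last_success_execution_date=datetime(2016, 11, 20, 5, 0),
    schedule_interval=hourly,
    now=datetime(2016, 11, 20, 12, 0),
)
late = is_missing_sla(lag, hourly, sla=timedelta(hours=2))
```

The lag value is what the UI could display next to each DAG, while the boolean
only mirrors the existing email notification.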

Something along the lines of what @btallman has in his recent PR to add
some cool additional info to the dashboard (dag overview) page:
https://github.com/apache/incubator-airflow/pull/1833

-s


Re: Subsequent Airflow Meetup: 2017/01/11

2016-11-20 Thread siddharth anand
I suspect Clover Health is extremely busy with all of the benefit
enrollments going on right now.

George,
When you come up for air, it looks like both Dan (Airbnb) and Kevin (Agari)
have talk ideas.

-s

On Wed, Nov 16, 2016 at 11:50 PM, Dan Davydov <
dan.davy...@airbnb.com.invalid> wrote:

> Based on chatting with a couple of people today at the Airflow meet-up I
> think there has been some demand for an airflow operations talk,
> specifically around monitoring/alerting. If there is still room I can give
> a talk about this, let me know George.
>
> On Thu, Nov 10, 2016 at 10:17 AM, siddharth anand 
> wrote:
>
> > Kevin,
> > Here's a link to the 1Q17 meet-up.
> > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/235259523/
> >
> > Both upcoming meet-ups (next week at WePay and 1Q17 at Clover Health) can
> > be found on http://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
> >
> > -s
> >
> >
> > On Wed, Nov 9, 2016 at 4:24 PM, Kevin Mandich 
> > wrote:
> >
> > > Hi George,
> > >
> > > If there is still room, I'd like to give a talk about how we use Airflow
> > > at my company, Agari. We are a data company that is working to eliminate
> > > inbound, targeted e-mail attacks to our customers (spear-phishing). I am
> > > currently working as a data scientist who is also responsible for
> > > shipping my work to production.
> > >
> > > We currently use Airflow to build models from our telemetry data which
> > > are then used for scoring in our near-real-time pipeline. I'd like to talk
> > > about some of the DAGs we've set up to do this.
> > >
> > > Please let me know if this sounds reasonable. Thank you,
> > >
> > > Kevin Mandich
> > > Agari Data, Inc.
> > >
> > >
> > > On Mon, Oct 31, 2016 at 11:27 PM, George Leslie-Waksman <
> > > geo...@cloverhealth.com.invalid> wrote:
> > >
> > > > I know it's a bit far in advance, but to make sure there's space (and
> > > > food and drink), I've scheduled and booked the subsequent meetup for
> > > > January 11th at Clover Health in SF.
> > > >
> > > > If anyone wants to volunteer to talk, let me know, otherwise I'll
> > > > probably start bugging folks sometime after Thanksgiving and before the
> > > > December holidays.
> > > >
> > > > --George Leslie-Waksman
> > > >
> > >
> >
>


Re: Github Mirroring currently broken

2016-11-20 Thread siddharth anand
This appears to be working again!
-s

On Sun, Nov 20, 2016 at 12:10 PM, siddharth anand  wrote:

> Committers/Maintainers,
> The Apache Airflow Github mirror is not synchronizing. I've filed a
> ticket. It looks like, as of now, 2 other Apache projects (nifi &
> brooklyn-server) have reported the same issue.
>
> https://issues.apache.org/jira/browse/INFRA-12949
>
> This means that although we are successfully merging to Apache master at
> https://git-wip-us.apache.org/repos/asf/incubator-airflow.git, the
> changes are not being mirrored to g...@github.com:apache/incubator-airflow.git.
> This affects things like rebasing of PRs, and I've opened the ticket as a
> Blocker.
>
> -s
>


Github Mirroring currently broken

2016-11-20 Thread siddharth anand
Committers/Maintainers,
The Apache Airflow Github mirror is not synchronizing. I've filed a ticket.
It looks like, as of now, 2 other Apache projects (nifi & brooklyn-server)
have reported the same issue.

https://issues.apache.org/jira/browse/INFRA-12949

This means that although we are successfully merging to Apache master at
https://git-wip-us.apache.org/repos/asf/incubator-airflow.git, the changes
are not being mirrored to g...@github.com:apache/incubator-airflow.git. This
affects things like rebasing of PRs, and I've opened the ticket as a
Blocker.
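
One way to confirm whether the mirror has caught up is to compare the head of
master on both remotes. The sketch below is only an illustration: it shells
out to `git ls-remote`, the helper names are hypothetical, and the HTTPS form
of the GitHub URL is an assumption (the address above is the SSH form).

```python
import subprocess

APACHE_REPO = "https://git-wip-us.apache.org/repos/asf/incubator-airflow.git"
# Assumption: HTTPS equivalent of the GitHub mirror address.
GITHUB_MIRROR = "https://github.com/apache/incubator-airflow.git"


def parse_ls_remote(output):
    """Parse `git ls-remote` output ("<sha>\t<ref>" per line) into {ref: sha}."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        refs[ref.strip()] = sha
    return refs


def remote_head(url, ref="refs/heads/master"):
    """Ask the remote for the commit a ref points at (needs git and network)."""
    out = subprocess.check_output(["git", "ls-remote", url, ref])
    return parse_ls_remote(out.decode("utf-8")).get(ref)


def mirror_in_sync(source_sha, mirror_sha):
    """The mirror is in sync when both remotes report the same commit."""
    return source_sha is not None and source_sha == mirror_sha
```

Calling `mirror_in_sync(remote_head(APACHE_REPO), remote_head(GITHUB_MIRROR))`
would return False while the mirror is lagging.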

-s


Re: Airflow 2.0

2016-11-19 Thread siddharth anand
I feel a lot of changes happen to areas of the code shared by both
scheduler and webserver, such as models. Any time we have changes to these
shared areas, we will need to release the scheduler as well.

Also, it's not clear to me (out of ignorance perhaps) how the above would
speed up releasing.

FYI, on the topic of fixing bugs, a nice fix just popped up (and got merged)
from Vijay Bhat:
https://github.com/apache/incubator-airflow/pull/1892

-s

On Fri, Nov 18, 2016 at 5:58 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Totally agree on all your points Sid.
>
> My feeling is that at the moment the most critical thing for the project is
> to get a release out and get to a steady pace of high quality releases.
>
> Somehow breaking down the package seems to me like it would really help with
> the release process. Maybe an idea is to make 1.9 a release made of smaller
> packages where airflow = `(airflow-core + airflow-operators +
> airflow-webserver)` or something like that. I'm thinking it would allow us to
> release often on airflow-operators & airflow-webserver.
>
> Max
>
> On Fri, Nov 18, 2016 at 5:34 PM, siddharth anand 
> wrote:
>
> > David
> > https://issues.apache.org/jira/browse/AIRFLOW-558 (i.e.
> > https://github.com/apache/incubator-airflow/pull/1830) is on my plate. It
> > has already gone through many rounds of reviews, testing, and fixes with
> > the submitter and does not need to wait till 2.0. We should be able to
> > merge it soon. BTW, you are encouraged to vote on these PRs so maintainers
> > can prioritize their time.
> >
> > Max,
> >
> > Thanks for kicking off this thread.
> >
> > Regarding 2.0, we've associated feature deprecation and non-backward
> > compatible changes with 2.0. Some of this work might be pretty
> > earth-shaking to Airflow users. IMHO, changes that increase user pain at
> > upgrade time need to be carefully balanced against value.
> >
> > Watching both Gitter and the email list, there are a variety of stumbling
> > points (for new users) that many of us who have been using the product
> > for 1-2 years have forgotten. A fair number of people still mention that
> > getting Airflow up and running is no simple task - i.e. Alex mentioned
> > this in his talk at the last meet-up. The recent BlueYonder talk referenced
> > https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls
> >
> > Though we may be numerically near 2.0 in terms of release numbers, I'd
> > prefer to prioritize a few things higher than releasing 2.0. We need to
> > build and exercise a few necessary muscles: timely PR processing &
> > timely Apache releases (i.e. quarterly). Beyond that, I'd like to
> > prioritize the "common pitfall" problems to ease on-boarding. Some of
> > these don't need to wait for a major release. The ones that do can be
> > developed on a separate 2.0 branch and baked, reviewed, and voted on by
> > the community before we consider dropping it into master.
> >
> > That way, we can keep master healthy to support the increasing rate of
> > community-submitted PRs that we are seeing and reduce the cycle time of
> > cutting stable releases, all while working on big-bang changes for 2.0
> > independently.
> >
> > Just my $0.02
> > -s
> >
> > On Fri, Nov 18, 2016 at 3:57 PM, Chris Riccomini 
> > wrote:
> >
> > > > RIP out the charting application and the data profiler
> > >
> > > Yes please! +1
> > >
> > > On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin
> > >  wrote:
> > > > Another point that may be controversial for Airflow 2.0: RIP out the
> > > > charting application and the data profiler. Even though it's nice to
> > > > have it there, it's just out of scope and has major security
> > > > issues/implications.
> > > >
> > > > I'm not sure how popular it actually is. We may need to run a survey
> > > > at some point around this kind of questions.
> > > >
> > > > Max
> > > >
> > > > On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <
> > > > maximebeauche...@gmail.com> wrote:
> > > >
> > > >> Using FAB's Model, we get pretty much all of that (REST API,
> > > >> auth/perms, CRUD) for free:
> > > >> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
> > > >>
> > > >> I
