Fwd: Cannot access https://cms.apache.org/incubator/publish

2018-05-15 Thread siddharth anand
Kaxil,
Can you try these steps and update the airflow wiki (committer guide) based on 
your findings?

-s

Sent from Sid's iPhone 

Begin forwarded message:

> From: Martin Gainty 
> Date: May 15, 2018 at 4:21:13 AM PDT
> To: "san...@apache.org" 
> Subject: Re: Cannot access https://cms.apache.org/incubator/publish
> 
> Hi Anand
> 
> Apparently the new URL is incubator.apache.org, as I could not find any 
> references to cms. Here is the JBake README:
> # Apache Incubator Website
>
> ## Prerequisites
>
> The website is built using JBake and a Groovy template.  The builds for the 
> website do require internet access.
>
> - Install JBake from http://jbake.org/download.html
> - Create an environment variable `JBAKE_HOME` pointing to your JBake 
> installation
> - Ensure that you have a JVM locally, e.g. 
> [OpenJDK](http://openjdk.java.net/install/)
>
> ## Building & Running the site
>
> There is a custom `bake.sh` file that is used to build the website.  You can 
> call it with any of the [arguments you would pass to 
> jbake](http://jbake.org/docs/2.5.1/#bake_command).
> The easiest way to use it is to run `./bake.sh -b -s` this will start up 
> JBake in a watching mode as you make changes it will refresh after a short 
> period of time.
> While working with it locally, you'll notice that the site URLs redirect to 
> `incubator.apache.org`, to change this edit `jbake.properties` and uncomment 
> the line referencing `localhost`
>
> ## Jenkins Setup
>
> Commits to the `jbake-site` branch are automatically checked out and built 
> using `build_site.sh`.  Once this goes live those commits will go against 
> `master`.  The jenkins job can be found at 
> [https://builds.apache.org/view/H-L/view/Incubator/job/Incubator%20Site/](https://builds.apache.org/view/H-L/view/Incubator/job/Incubator%20Site/)
> The result of the commits are pushed to the `asf-site` branch which are then 
> published using `gitwcsub`
>
> ## Asciidoctor
>
> Most of the pages in the site are written using Asciidoctor.  While it is a 
> form of asciidoc it does have some [syntax differences that are worth 
> reviewing](http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/)
>
> ## Groovy Templates
>
> The site templates are written in groovy scripts.  Even though the files end 
> with `.gsp` they are not GSP files and do not have access to tag libraries.  
> You can run custom code in them, similar to what is done in 
> [homepage.gsp](templates/homepage.gsp)
> 
> If you have a hard requirement to access cms.apache.org, 
> write a request for a username/password for cms.apache.org to the site admin: 
> d...@apache.org
> 
> Good Luck!
> Martin 
> __ 
>  
> 
> 
> From: John D. Ament 
> Sent: Monday, May 14, 2018 10:01 PM
> To: gene...@incubator.apache.org
> Cc: san...@apache.org
> Subject: Re: Cannot access https://cms.apache.org/incubator/publish
>  
> The Incubator website is no longer managed via CMS.  Please review
> https://incubator.apache.org/guides/website.html
> Updating the top-level Incubator website
> incubator.apache.org
> The Incubator website is generated by the incubator git repository. The 
> primary document format is asciidoc, templates are based on gsp, and we use 
> jbake to build it. You can edit files directly on github and raise a pull 
> request or just checkout the repository at 
> https://git-wip-us.apache.org/repos ...
> 
> 
> 
> John
> 
> On Mon, May 14, 2018 at 9:17 PM Martin Gainty  wrote:
> 
> > Hi Sid
> >
> >
> > as long as you have JavaScript enabled in the browser
> > and you have the same issue with curl, then AFAIK it's a permissions error
> >
> >
> > can you contact the admin or webmaster and have them email you a valid
> > username/password
> >
> > ?
> >
> > Martin
> > __
> >
> >
> >
> > 
> > From: Sid Anand 
> > Sent: Monday, May 14, 2018 9:12 PM
> > To: Martin Gainty
> > Cc: gene...@incubator.apache.org
> > Subject: Re: Cannot access https://cms.apache.org/incubator/publish
> >
> > So, perhaps I (sanand) don't have the necessary permissions?
> > -s
> >
> > On Mon, May 14, 2018 at 6:11 PM, Sid Anand  wrote:
> >
> > > I get the same error (I've hidden my password).
> > >
> > > sianand@LM-SJN-21002367:~ $ curl  --user sanand:
> > > https://cms.apache.org/incubator/publish
> > >
> > > 
> > >
> > > 
> > >
> > > 404 Not Found
> > >
> > > 
> > >
> > > Page Not Found
> > >
> > > The requested URL was not found on this server. If you are trying to
> > > edit a CMS-driven page, your local working copy may have been pruned.
> > >
> > > Please go to cms.apache.org, find
> > > your project, and click on 'force new working copy' to create a new
> > >
> > > working 

Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-07-19 Thread siddharth anand
FYI, can anyone pictorially describe the release process (and post it on
the apache airflow wiki)? I think that would eliminate a lot of confusion
in the future and avoid a rehash of this email thread on the next release.

-s

On Wed, Jul 19, 2017 at 10:48 AM, Hitesh Shah  wrote:

> To add, the main source tarball should have instructions to generate the
> sdist and bdist versions. Additionally, as part of the release process if
> the plan is to publish to pypi (after the IPMC vote succeeds), then the
> appropriate bits also need to be verified/voted upon. These are not exactly
> counted as the official release bits but they do need to be verified as
> part of the voting process to ensure that the bits do indeed map to the
> source release, license/notice files are correct, etc.
>
> thanks
> -- Hitesh
>
>
> On Tue, Jul 18, 2017 at 12:01 AM, Bolke de Bruin 
> wrote:
>
> > Thanks Hitesh. We discussed it with John Ament on the IPMC. Python has the
> > notion of 3 types of distributions, “source”, “sdist”, “bdist”, contrary to
> > Java that knows only two (source, bdist). We used to vote on “sdist”, which
> > was deemed incorrect.
> >
> > So, Max, indeed we need to vote on a tar.gz that contains build
> > instructions in INSTALL to get to “sdist”. The build instructions should
> > also contain instructions on how to run the license checks by Apache Rat.
> > Most of the work probably goes in the build instructions and verifying they
> > work, but it should not be much.
> >
> > Any other clarification required?
> >
> > Bolke
> >
> >
>


Re: Podling Report Reminder - July 2017

2017-07-05 Thread siddharth anand
I've updated the Airflow report on
https://wiki.apache.org/incubator/July2017

Do let me know if you have any questions.
-s


On Wed, Jul 5, 2017 at 7:00 AM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 19 July 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, July 05).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/July2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Question about updating podling report

2017-07-04 Thread siddharth anand
Folks!
As I joined a new role (at a new company), it means that I also started
using a new laptop. I'd like to update the podling report (due tomorrow),
but don't remember the specifics for connecting via svn to the incubator
repo. Anyone have instructions?

I believe I need to generate a new SSH key and store it on
https://id.apache.org/

I'm trying... svn co svn+ssh://san...@svn.apache.org/repos/asf/incubator
That's not working.

-s


Re: Podling Report Reminder - July 2017

2017-07-01 Thread siddharth anand
Adding this to my to-do list.

On Tue, Jun 27, 2017 at 4:54 PM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 19 July 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, July 05).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/July2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Re: DAGs dont get refreshed ?

2017-06-20 Thread siddharth anand
There is a manual DAG refresh option in the UI for DAGs the UI is already
aware of -- this will reload a DAG. But that's not a complete solution to
the more general DAG refresh problem.

-s

On Tue, Jun 20, 2017 at 6:36 PM, siddharth anand <san...@apache.org> wrote:

> To clarify, at Agari, we use monitd (like systemd) to restart both
> webserver and scheduler (running local executor) after deploying new dags
> to the dag folder. The Web UI does not discover new DAGs and it does not
> automatically reload changes to existing files either.
>
> -s
>
> On Tue, Jun 20, 2017 at 6:34 PM, siddharth anand <san...@apache.org>
> wrote:
>
>> We actually do restart both Web and Schedulers. I know the scheduler does
>> reparse the files in the dag folder, but the current state of the web ui
>> does require a restart.
>>
>> -s
>>
>> On Tue, Jun 20, 2017 at 6:07 PM, Ashika Umanga Umagiliya <
>> umanga@gmail.com> wrote:
>>
>>> Greetings,
>>>
>>> We are using airflow (1.7) to manage our ETL pipeline and we are having
>>> issues related to refreshing of DAGs.
>>>
>>> When we update the DAG python script inside "dag folder", they don't get
>>> updated in the UI (DAG tree as well as the Code in the UI). We have to kill
>>> and restart the "airflow webserver" process for them to be updated. Isn't
>>> there a hot-update feature in airflow?
>>> Is there any workaround to fix this issue?
>>>
>>
>>
>


Re: DAGs dont get refreshed ?

2017-06-20 Thread siddharth anand
To clarify, at Agari, we use monitd (like systemd) to restart both
webserver and scheduler (running local executor) after deploying new dags
to the dag folder. The Web UI does not discover new DAGs and it does not
automatically reload changes to existing files either.

-s

On Tue, Jun 20, 2017 at 6:34 PM, siddharth anand <san...@apache.org> wrote:

> We actually do restart both Web and Schedulers. I know the scheduler does
> reparse the files in the dag folder, but the current state of the web ui
> does require a restart.
>
> -s
>
> On Tue, Jun 20, 2017 at 6:07 PM, Ashika Umanga Umagiliya <
> umanga@gmail.com> wrote:
>
>> Greetings,
>>
>> We are using airflow (1.7) to manage our ETL pipeline and we are having
>> issues related to refreshing of DAGs.
>>
>> When we update the DAG python script inside "dag folder", they don't get
>> updated in the UI (DAG tree as well as the Code in the UI). We have to kill
>> and restart the "airflow webserver" process for them to be updated. Isn't
>> there a hot-update feature in airflow?
>> Is there any workaround to fix this issue?
>>
>
>


Re: DAGs dont get refreshed ?

2017-06-20 Thread siddharth anand
We actually do restart both Web and Schedulers. I know the scheduler does
reparse the files in the dag folder, but the current state of the web ui
does require a restart.

-s

On Tue, Jun 20, 2017 at 6:07 PM, Ashika Umanga Umagiliya <
umanga@gmail.com> wrote:

> Greetings,
>
> We are using airflow (1.7) to manage our ETL pipeline and we are having
> issues related to refreshing of DAGs.
>
> When we update the DAG python script inside "dag folder", they don't get
> updated in the UI (DAG tree as well as the Code in the UI). We have to kill
> and restart the "airflow webserver" process for them to be updated. Isn't
> there a hot-update feature in airflow?
> Is there any workaround to fix this issue?
>


Re: Passing Variables

2017-06-20 Thread siddharth anand
Ah.. I completely missed the question.. in my haste to do too many things.

Assuming you have a DAG named process_my_data with 3 tasks :
read__from_source_table --> transform --> write_to_new_table. This dag
should have no schedule (schedule_interval=None).

You could write a script to read your list of source tables and call
airflow trigger_dag -c <conf> -e <exec_date> process_my_data for each one.
This will launch a dag execution run for each input that you pass. I believe
that the execution dates should differ by at least 1 second (timestamp
granularity in the db), so avoid a tight loop: put a 1 second sleep between
executions.

You will see N dag runs, one for each of the N source tables that you pass
in.
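
A minimal sketch of such a script, in Python. It assumes the table name
travels in the -c JSON payload under a made-up key (source_table), and it
spaces explicit execution dates one second apart rather than sleeping. The
table names are hypothetical.

```python
import json
from datetime import datetime, timedelta


def build_trigger_commands(dag_id, source_tables, first_exec_date):
    """Return one `airflow trigger_dag` argv list per source table.

    Execution dates are spaced one second apart so that no two runs
    share the same timestamp.
    """
    commands = []
    for offset, table in enumerate(source_tables):
        exec_date = first_exec_date + timedelta(seconds=offset)
        conf = json.dumps({"source_table": table})  # hypothetical conf key
        commands.append([
            "airflow", "trigger_dag",
            "-c", conf,
            "-e", exec_date.strftime("%Y-%m-%dT%H:%M:%S"),
            dag_id,
        ])
    return commands


commands = build_trigger_commands(
    "process_my_data",
    ["events", "users"],  # hypothetical source tables
    datetime(2017, 6, 20, 12, 0, 0),
)
for argv in commands:
    print(" ".join(argv))
    # To actually launch the run: subprocess.check_call(argv)
```

Inside the DAG, a task would then pick the table name out of the run's conf
(for example via the dag_run object in the task context); check how your
Airflow version exposes it.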

-s

On Tue, Jun 20, 2017 at 12:22 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> One DAG cannot have multiple shapes at one time, by design. You cannot
> parameterize things that will affect the shape of your DAG (though note
> that you can fully parameterize what happens within individual task
> instances). Think about it, a DAG is one (and only one) graph. It's NOT a
> shapeshifting thing.
>
> As a workaround, and this may or may not be the right thing to do, you can
> write a DAG factory function, that will return a DAG object given
> parameters, but any given DAG instance (with a unique dag_id) has a single
> shape. If you do want to go that route, may want to use
> `schedule_interval='@once'`
>
> If you think the shape of your DAG needs to change from one DAG run to the
> next, you may want to re-think what is static and what is dynamic. Are your
> database tables schema changing from one DAG run to the next? No right?
> That'd be crazy! Most likely you want to think about the shape of your DAG
> in a similar way as you think about the schema of your tables: static or
> slowly changing.
>
> Max
>
> On Mon, Jun 19, 2017 at 4:11 AM, Rob Harrison  wrote:
>
> > Hi,
> >
> > I would like to pass a variable to my airflow dag and would like to know if
> > there is a recommended method for doing this.
> >
> > I am hoping to create a dag with python operators and tasks that read data
> > from a parquet table, perform a calculation then write the results into a
> > new table. I'd like to pass the source table name in along with the task
> > when calling the dag from the command line.
> >
> > From what I have read, the following can be used to read a variable from
> > the command line:
> >
> > airflow variables -s myvar="value"
> >
> > Does anyone have an example of this they can share?
> >
> > Thank you,
> > Rob
> >
>


Re: Passing Variables

2017-06-20 Thread siddharth anand
We use Airflow variables heavily.

from airflow.models import Variable

# Load an Airflow variable as a string
ENV = Variable.get('ENV').strip()

# Load an Airflow variable as JSON and access the JSON field named by PLATFORM
PLATFORM = 'EP'
SSH_KEY = Variable.get('ep_platform_ssh_keys', deserialize_json=True)[PLATFORM]


You can put this code in your dag file or in any python code your dag file
imports.
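
To make the two lookup modes concrete, here is a small stand-in that runs
without Airflow. It is only an illustration: the dict plays the role of the
variable store, get_variable is a made-up helper (not Airflow's code), and
the stored values are invented.

```python
import json

# Stand-in for Airflow's variable store: every value is kept as a string.
# Keys mirror the snippet above; the values are made up for illustration.
STORED_VARIABLES = {
    "ENV": " production \n",
    "ep_platform_ssh_keys": '{"EP": "/keys/ep.pem", "CP": "/keys/cp.pem"}',
}


def get_variable(key, deserialize_json=False):
    """Mimic the two lookup modes used above (hypothetical helper)."""
    raw = STORED_VARIABLES[key]
    return json.loads(raw) if deserialize_json else raw


# Plain string lookup, with surrounding whitespace stripped
ENV = get_variable("ENV").strip()

# JSON lookup: the stored string is parsed, then indexed by platform
SSH_KEY = get_variable("ep_platform_ssh_keys", deserialize_json=True)["EP"]

print(ENV, SSH_KEY)
```

The point is that a JSON variable lets one key hold a whole mapping, so the
dag file only needs to know which field to pull out.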

-s

On Mon, Jun 19, 2017 at 4:11 AM, Rob Harrison  wrote:

> Hi,
>
> I would like to pass a variable to my airflow dag and would like to know if
> there is a recommended method for doing this.
>
> I am hoping to create a dag with python operators and tasks that read data
> from a parquet table, perform a calculation then write the results into a
> new table. I'd like to pass the source table name in along with the task
> when calling the dag from the command line.
>
> From what I have read, the following can be used to read a variable from
> the command line:
>
> airflow variables -s myvar="value"
>
> Does anyone have an example of this they can share?
>
> Thank you,
> Rob
>


Re: New Apache Airflow meetup : in Tokyo

2017-06-09 Thread siddharth anand
Thx Kengo san.
https://twitter.com/ApacheAirflow/status/873288362391609345

You may also update
https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements directly
in the future.

You now have full admin perms on the Wiki.
-s

On Thu, Jun 8, 2017 at 9:20 PM, Kengo Seki <sek...@apache.org> wrote:

> Thanks a lot, Sid!
> Other slides have been published now. May I ask you to tweet and add
> them to the wiki too?
>
> https://www.slideshare.net/techblogyahoo/oozieairflow-apacheairflow-oozie
> https://speakerdeck.com/hatappi/airflowkarakuroko2nicheng-rihuan-etawake
>
> Airflow is not so popular in Japan yet, and part of the reason is
> the lack of information in Japanese, IMHO.
> The above slides are written in Japanese and really valuable for the
> (potential) Airflow users in Japan.
>
> Regards,
>
> Kengo Seki <sek...@apache.org>
>
>
> 2017-05-16 3:23 GMT+09:00 siddharth anand <san...@apache.org>:
> > Thx Kengo!
> >
> > I've added @takus slides to Airflow Links
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links> &
> > Announcements
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-April12,2017>
> > & tweeted them!
> >
> > -s
> >
> > On Thu, May 11, 2017 at 5:26 PM, Kengo Seki <sek...@apache.org> wrote:
> >
> >> Thank you for the announcement, Sid!
> >> Yes, we held the first meetup in Tokyo last night. About 20
> >> participants enjoyed the following talks:
> >>
> >> * Takumi Sakamoto (@takus, Kaizen Platform) did a nice introduction
> >>   about Airflow.
> >>   He explained Airflow's nice features such as pool, SLA and backfill,
> >>   and showed interesting tips such as combination with Jupyter
> >>   Notebook and Datadog.
> >>   His slides are published at:
> >>   https://speakerdeck.com/takus/building-data-pipelines-with-apache-airflow
> >>
> >> * Tomoki Uekusa (@tmk_ueks, Yahoo! Japan) showed their use case in
> >>   production.
> >>   He replaced some Oozie nodes with Airflow and is happy now :)
> >>   He also introduced useful tips, such as utilizing tree view and gantt
> >>   chart,
> >>   and explained the way to implement plugins and showed some real examples.
> >>
> >> * Yusaku Hatanaka (@hatappi, Speee) compared Airflow and Kuroko2
> >>   (https://github.com/cookpad/kuroko2).
> >>   Unfortunately they moved to the latter, because they didn't need
> >>   all of Airflow's rich features
> >>   and preferred Ruby over Python. But he explained Airflow's pros and
> >>   cons in a comprehensible way,
> >>   and introduced some pitfalls and useful workarounds such as a
> >>   problem caused by timezone setting.
> >>
> >> * I (@sekikn39, NTT Data) explained how to contribute to Airflow,
> >>   for example the way to search and create JIRA issues, run unit tests
> >>   and submit PRs.
> >>
> >> I really appreciate all speakers and audiences, and Yahoo! Japan folks
> >> for hosting this event.
> >> Though there's a time difference between us, I hope to see some of you
> >> core developers next time!
> >>
> >> Regards,
> >>
> >> Kengo Seki <sek...@apache.org>
> >>
> >>
> >> 2017-04-13 7:09 GMT+09:00 siddharth anand <san...@apache.org>:
> >> > Live in Tokyo & want to contribute to @ApacheAirflow
> >> > <https://twitter.com/ApacheAirflow>? Check out our new Tokyo meetup :
> >> > http://bit.ly/2o7jXWF  <https://t.co/4yaEfFwqu0>. First meetup on May 11 :
> >> > https://www.meetup.com/Tokyo-Apache-Airflow-incubating-Meetup/events/238731591/
> >> >
> >> > Thanks to Kengo Seki (@sekikn) for taking the lead on this!
> >> > -s
> >>
>


Re: Concurrent schedulers

2017-05-23 Thread siddharth anand
I did run into "double SLA miss alarms" firing, but that was on 1.7x. I
haven't tested if that is still an issue in 1.8x.

-s

On Tue, May 23, 2017 at 8:46 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Awesome. I wasn't aware of DagRun locking, this is even better!
>
> Max
>
> On Mon, May 22, 2017 at 11:39 PM, Bolke de Bruin 
> wrote:
>
> > Hi Max,
> >
> > We seem to be in quite good order already. We are testing with multi
> > master mysql and will also test multi master Postgres. As we are doing
> > dagrun level locking already it does not seem to be required to do
> > DAG-level locking. Also tasks are being locked so if multiple schedulers
> > are running everything seems to be quite fine. If one of the schedulers
> > restarts it starts checking for orphaned tasks by checking the executor
> > queue which is unique for every scheduler. This will result in some tasks
> > being dequeued and then requeued. So airflow is robust enough to stay alive
> > then (with my patch for deadlocks applied), but some things are a bit
> > sub-optimal.
> >
> > As mentioned we are still stress testing this setup and we might find more.
> >
> > Bolke
> >
> > > On 22 May 2017, at 18:19, Maxime Beauchemin <maximebeauche...@gmail.com>
> > > wrote:
> > >
> > > Things that might be needed for a correct multi-schedulers setup:
> > > * DAG-level lock while being evaluated
> > > * DAG-level lock expiration to recover from potential situation where the
> > >   lock wasn't released
> > > * Accumulation of the list of task instances to run into the database (as
> > >   opposed to cross process communication to master process)
> > > * Define a clear master cycle that would read the list of accumulated task
> > >   instances from the DB, dedup, prioritize and schedule. That master cycle
> > >   should have a lock (and lock expiration) as well.
> > >
> > > Max
> > >
> > > On Mon, May 22, 2017 at 12:27 AM, Bolke de Bruin wrote:
> > >
> > >> Hi Stephen,
> > >>
> > >> We are currently stress testing Airflow for use in a multi-master setup.
> > >> One of my team members is doing a write up that should show up online
> > >> shortly. TL;DR; in its current state Airflow will need some patches in
> > >> order to run concurrently. One issue is that Airflow can have a database
> > >> deadlock which will stop the scheduler from running. I have a patch for
> > >> that out here (https://github.com/apache/incubator-airflow/pull/2267)
> > >> that works fine on Postgres/MySql (tests don’t pass on sqlite yet due to
> > >> limitations of sqlite).
> > >>
> > >> Your global scheduler lock (eg. by an active passive configuration) might
> > >> make most sense for now.
> > >>
> > >> Bolke
> > >>
> > >>> On 22 May 2017, at 07:52, Stephen Rigney  wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> We're running airflow in production, but for reliability (n.b. not
> > >>> performance) we'd like to confirm if it is safe to spawn multiple
> > >>> instances of the scheduler overlapping in time (otherwise we may need to
> > >>> put more effort into assuring two copies aren't ever spawned at once in
> > >>> our environment).
> > >>>
> > >>>
> > >>> It seems this officially wasn't a supported configuration back in 2015 (
> > >>> https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ ),
> > >>> but has sufficient intra-airflow locking been added that it is now safe
> > >>> to start up two temporally overlapping instances of the scheduler for the
> > >>> same airflow system?
> > >>>
> > >>>
> > >>> Or should we hack in a "global scheduler lock" - we're not looking for
> > >>> increased performance by scheduler parallelism, just that if we ever fire
> > >>> up two instances of the scheduler nothing terrible happens?
> > >>>
> > >>>
> > >>> Stephen
> > >>
> > >>
> >
> >
>


Re: Removing members from dev list?

2017-05-22 Thread siddharth anand
Great. Thx Andrew.
-s

On Mon, May 22, 2017 at 5:23 AM, Andrew Phillips  wrote:

> Hi Siddarth
>
> How do we (PMC) remove a email recipient from the dev list?
>>
>
> Anyone who is a moderator of the list should be able to request removal of
> a subscriber by sending an email to [1]:
>
> {listname}-unsubscribe-badboy=menace@tlp.apache.org
>
> I.e. in this case
>
> dev-unsubscribe-kerzhner=yahoo-inc@airflow.incubator.apache.org
>
> Regards
>
> ap
>
> [1] https://reference.apache.org/pmc/ml#problem_posts
>
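
For anyone scripting this, the address pattern quoted above can be built
mechanically: the subscriber's "@" becomes "=". A small sketch follows; the
function name and the example subscriber are made up for illustration.

```python
def moderator_unsubscribe_address(list_name, list_host, subscriber):
    """Build the address a moderator mails to unsubscribe `subscriber`.

    Follows the {listname}-unsubscribe-user=domain@list-host pattern.
    """
    user, _, domain = subscriber.partition("@")
    if not user or not domain:
        raise ValueError("subscriber must look like user@domain")
    return "{}-unsubscribe-{}={}@{}".format(list_name, user, domain, list_host)


address = moderator_unsubscribe_address(
    "dev",
    "airflow.incubator.apache.org",
    "someone@example.com",  # hypothetical subscriber
)
print(address)
```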


Removing members from dev list?

2017-05-22 Thread siddharth anand
How do we (PMC) remove an email recipient from the dev list? I keep getting
requests to moderate the following because "kerzh...@yahoo-inc.com" is no
longer at Yahoo.

"kerzh...@yahoo-inc.com is no longer with Yahoo! Inc."

-s


Re: Article: Why Robinhood uses Airflow

2017-05-15 Thread siddharth anand
Tweeted it using the Airflow account!

-s

On Thu, May 11, 2017 at 5:33 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8
>
> Grateful to have you on board Robinhood!
>


Re: IMPORTANT: I need your pypi usernames

2017-05-11 Thread siddharth anand
Chris,
Sorry for the delay, my user name is r39132 for both of those!

-s

On Tue, May 9, 2017 at 1:25 PM, Chris Riccomini 
wrote:

> I have added the following:
>
> https://pypi.python.org/pypi/apache-airflow
> artwr, aoen, mistercrunch
>
> https://testpypi.python.org/pypi/apache-airflow
> mistercrunch
>
> Others, please provide your usernames. Everyone is being granted ownership
> access.
>
> On Tue, May 9, 2017 at 12:57 PM, Chris Riccomini 
> wrote:
>
> > Hey all,
> >
> > As part of 1.8.1, we are migrating from the `airflow` to `apache-airflow`
> > package name in PyPi. I have the new space, but I need everyone's usernames
> > for PyPi (both regular and test), so that all committers can publish new
> > releases. Please reply with your usernames for both:
> >
> > https://pypi.python.org
> > https://testpypi.python.org
> >
> > Cheers,
> > Chris
> >
>


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-19 Thread siddharth anand
https://issues.apache.org/jira/browse/AIRFLOW-1121 is merged to fix the
webserver pid issue.. thx Kengo!

-s

On Tue, Apr 18, 2017 at 6:15 PM, Hitesh Shah  wrote:

> -1.
>
> Not sure if these have been called out earlier.
>
> For all the bundled files with different licenses (MIT, BSD, etc), the full
> texts of these licenses should be in the source tarball preferably at the
> end of the LICENSE file.
> webgl-2d needs to be called out as MIT license.
> Version in pkg-info has an rc0 notation. It should just be
> 1.8.1-incubating.
> A bunch of files under apache_airflow.egg-info/ and scripts/systemd/ need a
> license header
> Likewise for airflow/www/templates/airflow/variables/README.md
>
> Nice to have:
> Fix the top-level dir in the tarball to be
> "apache-airflow-1.8.1-incubating" instead of
> "apache-airflow-1.8.1rc0+apache.incubating"
>
> For all the other binary files (images, gifs), is there source provenance
> for all of them and that all of them are covered by the licenses in the
> LICENSE file?
>
> Last point - are all the entries in the NOTICE file required or do they
> just need to be in the LICENSE file? Any additions to the NOTICE have
> downstream repercussions as they need to be propagated down by any other
> project using airflow.
>
> thanks
> -- Hitesh
>
>
>
> On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini 
> wrote:
>
> > Dear All,
> >
> > I have been able to make the Airflow 1.8.1 RC0 available at:
> > https://dist.apache.org/repos/dist/dev/incubator/airflow, public keys are
> > available at https://dist.apache.org/repos/dist/release/incubator/airflow.
> >
> > Issues fixed:
> >
> > [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
> > [AIRFLOW-1054] Fix broken import on test_dag
> > [AIRFLOW-1050] Retries ignored - regression
> > [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
> > [AIRFLOW-1030] HttpHook error when creating HttpSensor
> > [AIRFLOW-1017] get_task_instance should return None instead of th
> > [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
> > [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
> > [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
> > [AIRFLOW-989] Clear Task Regression
> > [AIRFLOW-974] airflow.util.file mkdir has a race condition
> > [AIRFLOW-906] Update Code icon from lightning bolt to file
> > [AIRFLOW-858] Configurable database name for DB operators
> > [AIRFLOW-853] ssh_execute_operator.py stdout decode default to A
> > [AIRFLOW-832] Fix debug server
> > [AIRFLOW-817] Trigger dag fails when using CLI + API
> > [AIRFLOW-816] Make sure to pull nvd3 from local resources
> > [AIRFLOW-815] Add previous/next execution dates to available def
> > [AIRFLOW-813] Fix unterminated unit tests in tests.job (tests/jo
> > [AIRFLOW-812] Scheduler job terminates when there is no dag file
> > [AIRFLOW-806] UI should properly ignore DAG doc when it is None
> > [AIRFLOW-794] Consistent access to DAGS_FOLDER and SQL_ALCHEMY_C
> > [AIRFLOW-785] ImportError if cgroupspy is not installed
> > [AIRFLOW-784] Cannot install with funcsigs > 1.0.0
> > [AIRFLOW-780] The UI no longer shows broken DAGs
> > [AIRFLOW-777] dag_is_running is initlialized to True instead of
> > [AIRFLOW-719] Skipped operations make DAG finish prematurely
> > [AIRFLOW-694] Empty env vars do not overwrite non-empty config v
> > [AIRFLOW-139] Executing VACUUM with PostgresOperator
> > [AIRFLOW-111] DAG concurrency is not honored
> > [AIRFLOW-88] Improve clarity Travis CI reports
> >
> > I would like to raise a VOTE for releasing 1.8.1 based on release candidate
> > 0, i.e. just renaming release candidate 0 to 1.8.1 release.
> >
> > Please respond to this email by:
> >
> > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you are
> > not.
> >
> > Vote will run for 72 hours (ends this Thursday).
> >
> > Thanks!
> > Chris
> >
> > My VOTE: +1 (binding)
> >
>


Re: Best practices on Long running process over LB

2017-04-18 Thread siddharth anand
Another approach:
1. Airflow calls the webservice in a fire-and-forget fashion
2. The webservice updates a message bus/stream (e.g. SQS) with the result
3. An Airflow sensor pulls updates off SQS and processes them

This saves Airflow from polling your webservice, which would in turn poll
your DB. Additionally, it avoids coupling your Airflow instance to the
availability of your webservice and DB. With polling, you'd also need to
implement an efficient HTTP endpoint to return status on a potentially long
list of status_ids, and then you'd need to manage that list of ids.

SQS is great.  It's cheap to poll (and SQS supports long-polling as well)
and doesn't couple Airflow to the uptime of your webservice and DB. SQS
also supports batch reads and is transactional.
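
A minimal sketch of steps 2 and 3 above, assuming Python's in-process
`queue.Queue` as a stand-in for SQS so that it runs anywhere. The names
`result_bus`, `publish_result`, and `poke` are hypothetical and not part of
Airflow; a real sensor would read from SQS inside Airflow's own sensor class.

```python
import queue

# Stand-in for the SQS queue (assumption: a real setup would use SQS itself).
result_bus = queue.Queue()


def publish_result(job_id, status):
    """Step 2: the webservice publishes its result when the job finishes."""
    result_bus.put({"job_id": job_id, "status": status})


def poke(pending, max_messages=10):
    """Step 3: one sensor poke.

    Reads up to max_messages from the bus, removes finished jobs from the
    pending set, and returns True once nothing is pending any more.
    """
    for _ in range(max_messages):
        try:
            message = result_bus.get_nowait()
        except queue.Empty:
            break  # nothing more to read on this poke
        if message["status"] != "success":
            raise RuntimeError("job %s failed" % message["job_id"])
        pending.discard(message["job_id"])
    return not pending
```

The sensor only ever talks to the bus, so the webservice and its DB can be
down between pokes without affecting the Airflow side.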

-s

On Tue, Apr 18, 2017 at 3:44 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> The proper way to do this is for your service to return a token (unique
> identifier for the long running process) asynchronously (immediately), and
> to then call another endpoint to check on the status while passing this
> token.
>
> Since this is Airflow and you have the luxury of having a lot of predefined
> sensors, you may just have to call a trigger endpoint async, and in the
> next task have a sensor look for the actual byproduct of that service's
> process (say if the process generates an S3 file, you'd have an S3Sensor
> right after the trigger task). The good thing with this approach is that
> this is more "stateless" than the approach where you are using a token (it
> allows for tasks to die without worrying about the token).
>
> Max
>
> On Tue, Apr 18, 2017 at 2:47 PM, Amit Jain  wrote:
>
> > Hi All,
> >
> > We have a use case where we are building an Airflow DAG consisting of a
> > few tasks, and each task (HttpOperator) calls a service running behind
> > an AWS Elastic Load Balancer (ELB).
> >
> > Since these tasks are long-running processes, I'm getting a 504 GATEWAY
> > TIMEOUT HTTP status code, which results in an incorrect task status on
> > the Airflow side.
> >
> > IMO to solve this problem, we can choose among the following approaches:
> >
> >    - Make a call to the service; the service sends back a response
> >    immediately and processes the actual request in another thread/process.
> >    A monitoring thread would heartbeat the task status to the DB. On the
> >    Airflow side, immediately after each HttpOperator task, we would have
> >    a sensor that checks for the status change at a given poke interval.
> >    - Since we have around 1500 tasks running per hour, use a service
> >    discovery system like Apache ZooKeeper to get a node in round-robin
> >    fashion and make a direct connection to the node running the service.
> >    - AWS ELB limits the HTTP idle timeout to 1 hr and my tasks take ~3 hr
> >    to finish, so no change on the AWS ELB side is possible.
> >
> >
> > Both approaches have cons. The first one makes us change our current flow
> > on each service side, i.e. handle requests in async mode and heartbeat
> > the executing process/thread status at some interval, hence the DB writes.
> >
> > I'm interested to know how you guys are handling this problem, and any
> > suggestions or improvements to the approaches mentioned that I can use.
> >
> >
> > Thanks,
> > Amit
> >
>
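
The token-based flow Max describes in the quoted reply above can be sketched
as follows. This is only an illustration under stated assumptions:
`FakeService` is a hypothetical in-memory stand-in for the long-running
webservice, and `wait_for` plays the role of the sensor task; neither is an
Airflow API.

```python
import itertools


class FakeService:
    """Hypothetical in-memory stand-in for the long-running webservice."""

    def __init__(self, polls_until_done=2):
        self._ids = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def trigger(self):
        """Start a job and return its token immediately (no long HTTP wait)."""
        token = "job-%d" % next(self._ids)
        self._remaining[token] = self._polls_until_done
        return token

    def status(self, token):
        """Status endpoint: reports 'running' a few times, then 'done'."""
        if self._remaining[token] > 0:
            self._remaining[token] -= 1
            return "running"
        return "done"


def wait_for(service, token, max_pokes=10):
    """Poll the status endpoint; return the number of pokes that were needed."""
    for attempt in range(1, max_pokes + 1):
        if service.status(token) == "done":
            return attempt
        # A real sensor would sleep for its poke interval here.
    raise TimeoutError("token %s not done after %d pokes" % (token, max_pokes))
```

Each request in this flow is short, so none of them comes near the load
balancer's idle timeout.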


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-18 Thread siddharth anand
https://issues.apache.org/jira/browse/AIRFLOW-1121

Jira filed.

On Tue, Apr 18, 2017 at 1:27 PM, siddharth anand <san...@apache.org> wrote:

> Sure. As soon as I get out of my meetings.
>
> -s
>
> On Tue, Apr 18, 2017 at 1:01 PM Chris Riccomini <criccom...@apache.org>
> wrote:
>
>> @Sid, can you open JIRA(s), and assign them as blockers to 1.8.1?
>>
>> On Tue, Apr 18, 2017 at 12:39 PM, siddharth anand <san...@apache.org>
>> wrote:
>>
>> > I've run into a regression with the webserver. It looks like the --pid
>> > argument is no longer honored in 1.8.1. The pid file is not being
>> > written out! As a result, monitd, which watches the processes mentioned
>> > in the pid file, keeps trying to spawn webservers.
>> >
>> > HISTTIMEFORMAT="%d/%m/%y %T "
>> > PYTHONPATH=/usr/local/agari/ep-pipeline/production/
>> > current/analysis/cluster/:/usr/local/agari/ep-pipeline/
>> > production/current/analysis/lookups/
>> > TMP=/data/tmp AIRFLOW_HOME=/data/airflow
>> > PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin airflow webserver -p
>> > 8080  --pid /data/airflow/pids/airflow-webserver.pid
>> >
>> > The "upgrade" process for 1.8.1 is not simply "pip install 1.8.1
>> > tarball". It requires a "pip uninstall" of the previous 1.8.0 version
>> > followed by a new installation of 1.8.1. This could have pulled in some
>> > new dependencies that broke how this works.
>> >
>> > On Tue, Apr 18, 2017 at 12:32 PM, siddharth anand <san...@apache.org>
>> > wrote:
>> >
>> > > Hmn.. it always worked for me for any of the releases we installed. I
>> > > install `pip install `
>> > >
>> > > -s
>> > >
>> > > On Tue, Apr 18, 2017 at 10:44 AM, Chris Riccomini <
>> criccom...@apache.org
>> > >
>> > > wrote:
>> > >
>> > >> @Sid, how do you enable the versioning? I've never been able to get
>> this
>> > >> to
>> > >> work in my environment. It always shows "Not available", even with
>> > 1.8.0.
>> > >>
>> > >> On Mon, Apr 17, 2017 at 11:18 PM, Bolke de Bruin <bdbr...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Hey Alex,
>> > >> >
>> > >> > I agree with you that they are nice to have, but as you mentioned
>> they
>> > >> are
>> > >> > not blockers. As we are moving towards time based releases I
>> suggest
>> > >> > marking them for 1.8.2 and cherry-picking them in your production.
>> > >> >
>> > >> > - Bolke.
>> > >> >
>> > >> > > On 18 Apr 2017, at 00:02, Alex Guziel
>> <alex.guz...@airbnb.com.INVALI
>> > >> D>
>> > >> > wrote:
>> > >> > >
>> > >> > > Sorry about that. FWIW, these were recent and I don't think they
>> > were
>> > >> > > blockers but are nice to fix. Particularly, the tree one was
>> > forgotten
>> > >> > > about. I remember seeing it at the Airflow hackathon but I guess
>> I
>> > >> forgot
>> > >> > > to correct it.
>> > >> > >
>> > >> > > On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini <
>> > >> criccom...@apache.org
>> > >> > >
>> > >> > > wrote:
>> > >> > >
>> > >> > >> :(:(:( Why was this not included in 1.8.1 JIRA? I've been
>> emailing
>> > >> the
>> > >> > list
>> > >> > >> all last week
>> > >> > >>
>> > >> > >> On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
>> > >> > >> alex.guz...@airbnb.com.invalid> wrote:
>> > >> > >>
>> > >> > >>> I would say to include [1074] (
>> > >> > >>> https://github.com/apache/incubator-airflow/pull/2221) so we
>> > don't
>> > >> > have
>> > >> > >> a
>> > >> > >>> regression in the release after. I would also say
>> > >> > >>> https://github.com/apache/incubator-airflow/pull/2241 is semi
>> > >> > important
>> > >> > >

Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-18 Thread siddharth anand
Sure. As soon as I get out of my meetings.

-s

On Tue, Apr 18, 2017 at 1:01 PM Chris Riccomini <criccom...@apache.org>
wrote:

> @Sid, can you open JIRA(s), and assign them as blockers to 1.8.1?
>
> On Tue, Apr 18, 2017 at 12:39 PM, siddharth anand <san...@apache.org>
> wrote:
>
> > I've run into a regression with the webserver. It looks like the --pid
> > argument is no longer honored in 1.8.1. The pid file is not being written
> > out! As a result, monitd, which watches the processes mentioned in the
> > pid file, keeps trying to spawn webservers.
> >
> > HISTTIMEFORMAT="%d/%m/%y %T "
> > PYTHONPATH=/usr/local/agari/ep-pipeline/production/
> > current/analysis/cluster/:/usr/local/agari/ep-pipeline/
> > production/current/analysis/lookups/
> > TMP=/data/tmp AIRFLOW_HOME=/data/airflow
> > PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin airflow webserver -p
> > 8080  --pid /data/airflow/pids/airflow-webserver.pid
> >
> > The "upgrade" process for 1.8.1 is not simply "pip install 1.8.1
> > tarball". It requires a "pip uninstall" of the previous 1.8.0 version
> > followed by a new installation of 1.8.1. This could have pulled in some
> > new dependencies that broke how this works.
> >
> > On Tue, Apr 18, 2017 at 12:32 PM, siddharth anand <san...@apache.org>
> > wrote:
> >
> > > Hmn.. it always worked for me for any of the releases we installed. I
> > > install `pip install `
> > >
> > > -s
> > >
> > > On Tue, Apr 18, 2017 at 10:44 AM, Chris Riccomini <
> criccom...@apache.org
> > >
> > > wrote:
> > >
> > >> @Sid, how do you enable the versioning? I've never been able to get
> this
> > >> to
> > >> work in my environment. It always shows "Not available", even with
> > 1.8.0.
> > >>
> > >> On Mon, Apr 17, 2017 at 11:18 PM, Bolke de Bruin <bdbr...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hey Alex,
> > >> >
> > >> > I agree with you that they are nice to have, but as you mentioned
> they
> > >> are
> > >> > not blockers. As we are moving towards time based releases I suggest
> > >> > marking them for 1.8.2 and cherry-picking them in your production.
> > >> >
> > >> > - Bolke.
> > >> >
> > >> > > On 18 Apr 2017, at 00:02, Alex Guziel
> <alex.guz...@airbnb.com.INVALI
> > >> D>
> > >> > wrote:
> > >> > >
> > >> > > Sorry about that. FWIW, these were recent and I don't think they
> > were
> > >> > > blockers but are nice to fix. Particularly, the tree one was
> > forgotten
> > >> > > about. I remember seeing it at the Airflow hackathon but I guess I
> > >> forgot
> > >> > > to correct it.
> > >> > >
> > >> > > On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini <
> > >> criccom...@apache.org
> > >> > >
> > >> > > wrote:
> > >> > >
> > >> > >> :(:(:( Why was this not included in 1.8.1 JIRA? I've been
> emailing
> > >> the
> > >> > list
> > >> > >> all last week
> > >> > >>
> > >> > >> On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
> > >> > >> alex.guz...@airbnb.com.invalid> wrote:
> > >> > >>
> > >> > >>> I would say to include [1074] (
> > >> > >>> https://github.com/apache/incubator-airflow/pull/2221) so we
> > don't
> > >> > have
> > >> > >> a
> > >> > >>> regression in the release after. I would also say
> > >> > >>> https://github.com/apache/incubator-airflow/pull/2241 is semi
> > >> > important
> > >> > >>> but
> > >> > >>> less so.
> > >> > >>>
> > >> > >>> On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini <
> > >> > criccom...@apache.org
> > >> > >>>
> > >> > >>> wrote:
> > >> > >>>
> > >> > >>>> Dear All,
> > >> > >>>>
> > >> > >>>> I have been able to make the Airflow 1.8.1 RC0 available at:
> > >> > >>>> h

Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-18 Thread siddharth anand
Hmn.. it always worked for me for any of the releases we installed. I
install `pip install `

-s

On Tue, Apr 18, 2017 at 10:44 AM, Chris Riccomini 
wrote:

> @Sid, how do you enable the versioning? I've never been able to get this to
> work in my environment. It always shows "Not available", even with 1.8.0.
>
> On Mon, Apr 17, 2017 at 11:18 PM, Bolke de Bruin 
> wrote:
>
> > Hey Alex,
> >
> > I agree with you that they are nice to have, but as you mentioned they
> are
> > not blockers. As we are moving towards time based releases I suggest
> > marking them for 1.8.2 and cherry-picking them in your production.
> >
> > - Bolke.
> >
> > > On 18 Apr 2017, at 00:02, Alex Guziel 
> > wrote:
> > >
> > > Sorry about that. FWIW, these were recent and I don't think they were
> > > blockers but are nice to fix. Particularly, the tree one was forgotten
> > > about. I remember seeing it at the Airflow hackathon but I guess I
> forgot
> > > to correct it.
> > >
> > > On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini <
> criccom...@apache.org
> > >
> > > wrote:
> > >
> > >> :(:(:( Why was this not included in 1.8.1 JIRA? I've been emailing the
> > list
> > >> all last week
> > >>
> > >> On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
> > >> alex.guz...@airbnb.com.invalid> wrote:
> > >>
> > >>> I would say to include [1074] (
> > >>> https://github.com/apache/incubator-airflow/pull/2221) so we don't
> > have
> > >> a
> > >>> regression in the release after. I would also say
> > >>> https://github.com/apache/incubator-airflow/pull/2241 is semi
> > important
> > >>> but
> > >>> less so.
> > >>>
> > >>> On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini <
> > criccom...@apache.org
> > >>>
> > >>> wrote:
> > >>>
> > > >>>> Dear All,
> > > >>>>
> > > >>>> I have been able to make the Airflow 1.8.1 RC0 available at:
> > > >>>> https://dist.apache.org/repos/dist/dev/incubator/airflow, public keys
> > > >>>> are available at
> > > >>>> https://dist.apache.org/repos/dist/release/incubator/airflow.
> > > >>>>
> > > >>>> Issues fixed:
> > > >>>>
> > > >>>> [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
> > > >>>> [AIRFLOW-1054] Fix broken import on test_dag
> > > >>>> [AIRFLOW-1050] Retries ignored - regression
> > > >>>> [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
> > > >>>> [AIRFLOW-1030] HttpHook error when creating HttpSensor
> > > >>>> [AIRFLOW-1017] get_task_instance should return None instead of th
> > > >>>> [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
> > > >>>> [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
> > > >>>> [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
> > > >>>> [AIRFLOW-989] Clear Task Regression
> > > >>>> [AIRFLOW-974] airflow.util.file mkdir has a race condition
> > > >>>> [AIRFLOW-906] Update Code icon from lightning bolt to file
> > > >>>> [AIRFLOW-858] Configurable database name for DB operators
> > > >>>> [AIRFLOW-853] ssh_execute_operator.py stdout decode default to A
> > > >>>> [AIRFLOW-832] Fix debug server
> > > >>>> [AIRFLOW-817] Trigger dag fails when using CLI + API
> > > >>>> [AIRFLOW-816] Make sure to pull nvd3 from local resources
> > > >>>> [AIRFLOW-815] Add previous/next execution dates to available def
> > > >>>> [AIRFLOW-813] Fix unterminated unit tests in tests.job (tests/jo
> > > >>>> [AIRFLOW-812] Scheduler job terminates when there is no dag file
> > > >>>> [AIRFLOW-806] UI should properly ignore DAG doc when it is None
> > > >>>> [AIRFLOW-794] Consistent access to DAGS_FOLDER and SQL_ALCHEMY_C
> > > >>>> [AIRFLOW-785] ImportError if cgroupspy is not installed
> > > >>>> [AIRFLOW-784] Cannot install with funcsigs > 1.0.0
> > > >>>> [AIRFLOW-780] The UI no longer shows broken DAGs
> > > >>>> [AIRFLOW-777] dag_is_running is initlialized to True instead of
> > > >>>> [AIRFLOW-719] Skipped operations make DAG finish prematurely
> > > >>>> [AIRFLOW-694] Empty env vars do not overwrite non-empty config v
> > > >>>> [AIRFLOW-139] Executing VACUUM with PostgresOperator
> > > >>>> [AIRFLOW-111] DAG concurrency is not honored
> > > >>>> [AIRFLOW-88] Improve clarity Travis CI reports
> > > >>>>
> > > >>>> I would like to raise a VOTE for releasing 1.8.1 based on release
> > > >>>> candidate 0, i.e. just renaming release candidate 0 to 1.8.1 release.
> > > >>>>
> > > >>>> Please respond to this email by:
> > > >>>>
> > > >>>> +1,0,-1 with *binding* if you are a PMC member or *non-binding* if
> > > >>>> you are not.
> > > >>>>
> > > >>>> Vote will run for 72 hours (ends this Thursday).
> > > >>>>
> > > >>>> Thanks!
> > > >>>> Chris
> > > >>>>
> > > >>>> My VOTE: +1 (binding)
> > > >>>>
> > >>>
> > >>
> >
> >
>


Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-17 Thread siddharth anand
Just installed the rc in our staging env and letting it bake.

FYI, I noticed that the version is not available at the following URI :
/admin/versionview/
Here's a screenshot :
https://www.dropbox.com/s/shdzwadb8klqwt2/Screenshot%202017-04-17%2020.57.06.png?dl=0

I installed via pip!
-s

On Mon, Apr 17, 2017 at 3:02 PM, Alex Guziel  wrote:

> Sorry about that. FWIW, these were recent and I don't think they were
> blockers but are nice to fix. Particularly, the tree one was forgotten
> about. I remember seeing it at the Airflow hackathon but I guess I forgot
> to correct it.
>
> On Mon, Apr 17, 2017 at 12:17 PM, Chris Riccomini 
> wrote:
>
> > :(:(:( Why was this not included in 1.8.1 JIRA? I've been emailing the
> list
> > all last week
> >
> > On Mon, Apr 17, 2017 at 11:28 AM, Alex Guziel <
> > alex.guz...@airbnb.com.invalid> wrote:
> >
> > > I would say to include [1074] (
> > > https://github.com/apache/incubator-airflow/pull/2221) so we don't
> have
> > a
> > > regression in the release after. I would also say
> > > https://github.com/apache/incubator-airflow/pull/2241 is semi
> important
> > > but
> > > less so.
> > >
> > > On Mon, Apr 17, 2017 at 11:24 AM, Chris Riccomini <
> criccom...@apache.org
> > >
> > > wrote:
> > >
> > > > Dear All,
> > > >
> > > > I have been able to make the Airflow 1.8.1 RC0 available at:
> > > > https://dist.apache.org/repos/dist/dev/incubator/airflow, public
> keys
> > > are
> > > > available at https://dist.apache.org/repos/
> > > dist/release/incubator/airflow.
> > > >
> > > > Issues fixed:
> > > >
> > > > [AIRFLOW-1062] DagRun#find returns wrong result if external_trigg
> > > > [AIRFLOW-1054] Fix broken import on test_dag
> > > > [AIRFLOW-1050] Retries ignored - regression
> > > > [AIRFLOW-1033] TypeError: can't compare datetime.datetime to None
> > > > [AIRFLOW-1030] HttpHook error when creating HttpSensor
> > > > [AIRFLOW-1017] get_task_instance should return None instead of th
> > > > [AIRFLOW-1011] Fix bug in BackfillJob._execute() for SubDAGs
> > > > [AIRFLOW-1001] Landing Time shows "unsupported operand type(s) fo
> > > > [AIRFLOW-1000] Rebrand to Apache Airflow instead of Airflow
> > > > [AIRFLOW-989] Clear Task Regression
> > > > [AIRFLOW-974] airflow.util.file mkdir has a race condition
> > > > [AIRFLOW-906] Update Code icon from lightning bolt to file
> > > > [AIRFLOW-858] Configurable database name for DB operators
> > > > [AIRFLOW-853] ssh_execute_operator.py stdout decode default to A
> > > > [AIRFLOW-832] Fix debug server
> > > > [AIRFLOW-817] Trigger dag fails when using CLI + API
> > > > [AIRFLOW-816] Make sure to pull nvd3 from local resources
> > > > [AIRFLOW-815] Add previous/next execution dates to available def
> > > > [AIRFLOW-813] Fix unterminated unit tests in tests.job (tests/jo
> > > > [AIRFLOW-812] Scheduler job terminates when there is no dag file
> > > > [AIRFLOW-806] UI should properly ignore DAG doc when it is None
> > > > [AIRFLOW-794] Consistent access to DAGS_FOLDER and SQL_ALCHEMY_C
> > > > [AIRFLOW-785] ImportError if cgroupspy is not installed
> > > > [AIRFLOW-784] Cannot install with funcsigs > 1.0.0
> > > > [AIRFLOW-780] The UI no longer shows broken DAGs
> > > > [AIRFLOW-777] dag_is_running is initlialized to True instead of
> > > > [AIRFLOW-719] Skipped operations make DAG finish prematurely
> > > > [AIRFLOW-694] Empty env vars do not overwrite non-empty config v
> > > > [AIRFLOW-139] Executing VACUUM with PostgresOperator
> > > > [AIRFLOW-111] DAG concurrency is not honored
> > > > [AIRFLOW-88] Improve clarity Travis CI reports
> > > >
> > > > I would like to raise a VOTE for releasing 1.8.1 based on release
> > > candidate
> > > > 0, i.e. just renaming release candidate 0 to 1.8.1 release.
> > > >
> > > > Please respond to this email by:
> > > >
> > > > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if
> you
> > > are
> > > > not.
> > > >
> > > > Vote will run for 72 hours (ends this Thursday).
> > > >
> > > > Thanks!
> > > > Chris
> > > >
> > > > My VOTE: +1 (binding)
> > > >
> > >
> >
>


Re: Welcome @saguziel as a committer and PMC member!

2017-04-17 Thread siddharth anand
Welcome Alex!

On Fri, Apr 14, 2017 at 8:23 AM, Chris Riccomini 
wrote:

> Congrats, Alex! Welcome. :)
>
> On Thu, Apr 13, 2017 at 7:06 PM, Dan Davydov wrote:
>
> > Alex (@saguziel - AirBnB) has been making contributions and reviews for
> > quite a long time now and I'm very happy to say he has just become an
> > official committer and PMC member.
> >
> > He has ~13 commits, most of which are to the core of Airflow, and has
> > been active reviewing open source PRs, contributing in the recent release
> > (e.g. fixing blocking issues), and has a strong understanding of the core
> > Airflow logic (he has submitted a couple of patches to remove race
> > conditions, and security patches).
> >
> > Congratulations and welcome Alex!
> > -Dan
> >
>


New Apache Airflow meetup : in Tokyo

2017-04-12 Thread siddharth anand
Live in Tokyo & want to contribute to @ApacheAirflow? Check out our new
Tokyo meetup: http://bit.ly/2o7jXWF . First meetup on May 11:
https://www.meetup.com/Tokyo-Apache-Airflow-incubating-Meetup/events/238731591/

Thanks to Kengo Seki (@sekikn) for taking the lead on this!
-s


Re: Cleanup

2017-04-05 Thread siddharth anand
Edgardo,
This is a great question and something that requires functionality to
address. As Airflow starts getting used for bigger workloads, we need a way
to clean up defunct resources.

   - How do we delete a dag and its related resources?
      - Until the recent release, the way that I stopped having a defunct
      (retired) dag show up in the UI was to move the DAG file out of the
      dag_folder or just delete it from Git. Our dag folders are just
      symlinks to tagged Git repos.
      - This no longer works -- the UI will display the dag list based on
      entries in the dag table in the airflow metadata db -- but will no
      longer have code to back that dag table entry. I currently manually
      delete a row from the dag table, but that is surely not the right
      thing to do.
   - How do we retire entries from the *task_instance, job, log, xcom,
   sla_miss, dag_stats,* and *dag_run* tables for dags that are deleted?
   (I can surely clean these up manually as well, but we need a UI
   control).
      - The *task_instance, job, log,* and *dag_run* tables grow faster
      than the others
   - How does one track if variables, connections, or pools are no
   longer referenced because all of the DAGs that use them are gone?
      - It would be nice here to have reference counts & links to DAGs
      that reference a Pool, Connection, or Variable. The reference counts
      can be broken down into paused & unpaused.

It's time we added some functionality to the API/CLI/UI to address these
functionality gaps.
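
To make the gap concrete, here is a rough sketch of what a "delete a DAG and
its related rows" operation might do. It runs against a toy SQLite database,
not a real Airflow metadata db: the helper names `purge_dag` and
`make_demo_db` are hypothetical, and the assumption that every table listed
above can be filtered by a `dag_id` column is a simplification made for the
example.

```python
import sqlite3

# Tables named in the message above, plus the dag table itself.
# Assumption for this sketch: each one has a dag_id column.
DAG_SCOPED_TABLES = [
    "task_instance", "job", "log", "xcom",
    "sla_miss", "dag_stats", "dag_run", "dag",
]


def purge_dag(conn, dag_id):
    """Delete every row that belongs to dag_id; return {table: rows_deleted}."""
    deleted = {}
    with conn:  # single transaction: commit on success, roll back on error
        for table in DAG_SCOPED_TABLES:
            cursor = conn.execute(
                "DELETE FROM %s WHERE dag_id = ?" % table, (dag_id,)
            )
            deleted[table] = cursor.rowcount
    return deleted


def make_demo_db():
    """Build a toy database with one-column versions of the tables."""
    conn = sqlite3.connect(":memory:")
    for table in DAG_SCOPED_TABLES:
        conn.execute("CREATE TABLE %s (dag_id TEXT)" % table)
    conn.executemany(
        "INSERT INTO task_instance VALUES (?)",
        [("old_dag",), ("old_dag",), ("live_dag",)],
    )
    conn.execute("INSERT INTO dag VALUES ('old_dag')")
    conn.commit()
    return conn
```

Running all the deletes in one transaction avoids leaving a DAG half-removed
if one of them fails.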

-s

On Tue, Apr 4, 2017 at 10:25 AM, Edgardo Vega 
wrote:

> Max,
>
> Thanks for the reply, it is much appreciated.  I am currently running ~10k
> tasks a day in our test environment.
>
> It is good to know where the archive point is and that I shouldn't have any
> issues for a long time.
>
> I was just thinking ahead as we get Airflow into a production environment.
> Maybe in this case way too far ahead.
>
>
> Cheers,
>
> Edgardo
>
> On Tue, Apr 4, 2017 at 11:58 AM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > We run ~50k tasks a day at Airbnb. How many tasks/day are you planning on
> > running?
> >
> > You can archive the `task_instance` and `job` tables down the line, but
> > that shouldn't be a concern until you hit tens of millions of entries.
> > Then you can set up a daily Airflow job that archives some of these
> > entries. I believe we do it based on `start_date` and move rows to some
> > other table in the same db.
> >
> > Max
> >
> > On Mon, Apr 3, 2017 at 5:30 PM, Edgardo Vega 
> > wrote:
> >
> > > I have been playing with airflow for a few days and it's not obvious
> > > what will happen down the road when we have lots of dags over a long
> > > period of time. I set a fake dag to run once a minute for a few days and
> > > everything seems okay except the graph view dropdown, which works but
> > > takes a few seconds to show up.
> > >
> > > Is there a way to roll older data out of the system in order to clean
> > > things up visually and keep the database at a smallish size?
> > >
> > > --
> > > Cheers,
> > >
> > > Edgardo
> > >
> >
>
>
>
> --
> Cheers,
>
> Edgardo
>
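
The archiving job Max mentions in the quoted reply above (move old rows to
another table in the same db, keyed on `start_date`) could look roughly like
this. It is a sketch on a toy SQLite schema: the archive table name
`task_instance_archive`, the two-column layout, and the helpers
`make_demo_db` and `archive_before` are all assumptions made for
illustration, not Airbnb's actual job.

```python
import sqlite3


def make_demo_db():
    """Build a toy metadata db; the two-column schema is a simplification."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_instance (task_id TEXT, start_date TEXT)")
    conn.execute(
        "CREATE TABLE task_instance_archive (task_id TEXT, start_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?)",
        [("t1", "2017-01-05"), ("t2", "2017-02-20"), ("t3", "2017-04-01")],
    )
    conn.commit()
    return conn


def archive_before(conn, cutoff):
    """Move rows with start_date earlier than cutoff into the archive table.

    Dates are ISO-8601 strings, so string comparison matches date order.
    Returns the number of rows removed from task_instance.
    """
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO task_instance_archive "
            "SELECT task_id, start_date FROM task_instance "
            "WHERE start_date < ?",
            (cutoff,),
        )
        cursor = conn.execute(
            "DELETE FROM task_instance WHERE start_date < ?", (cutoff,)
        )
    return cursor.rowcount
```

A daily job would call `archive_before` with a cutoff some fixed number of
days in the past.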


Re: 1.8.1 release

2017-03-30 Thread siddharth anand
Chris,
I've submitted PRs for :

   - PR [AIRFLOW-1013] :
   https://github.com/apache/incubator-airflow/pull/2203
   - PR [AIRFLOW-1054]:
   https://github.com/apache/incubator-airflow/pull/2201

And filed a blocker for a new issue. Essentially, @once DAGs cannot be
created if catchup=False :
https://issues.apache.org/jira/browse/AIRFLOW-1055

I have a PR that works for this, but will need to add unit tests for it as
well as for AIRFLOW-1013.

-s

On Wed, Mar 29, 2017 at 3:24 PM, siddharth anand <san...@apache.org> wrote:

> Didn't realize https://issues.apache.org/jira/browse/AIRFLOW-1013 was a
> blocker. I will have a PR shortly.
> -s
>
> On Wed, Mar 29, 2017 at 2:07 PM, Chris Riccomini <criccom...@apache.org>
> wrote:
>
>> The following four JIRAs were not merged into the v1-8-test branch, but
>> are listed as part of the 1.8.1 release:
>>
>> AIRFLOW-1017 b2b9587cca9195229ab107394ad94b7702c70e37
>> AIRFLOW-906 bc47200711be4d2c0b36b772651dae4f5e01a204
>> AIRFLOW-858 94dc7fb0a6bb3c563d9df6566cd52a59bd0c4629
>> AIRFLOW-832 b0ae70d3a8e935dc9266b6853683ae5375a7390b
>>
>> I'm going to merge them in now.
>>
>> On Wed, Mar 29, 2017 at 1:53 PM, Chris Riccomini <criccom...@apache.org>
>> wrote:
>>
>> > Hey Bolke,
>> >
>> > Great. Assuming your PR is committed, that leaves five blockers:
>> >
>> > https://issues.apache.org/jira/browse/AIRFLOW-1000
>> > https://issues.apache.org/jira/browse/AIRFLOW-1001
>> > https://issues.apache.org/jira/browse/AIRFLOW-1013
>> > https://issues.apache.org/jira/browse/AIRFLOW-1018
>> > https://issues.apache.org/jira/browse/AIRFLOW-1019
>> >
>> > I've also got a list of all open 1.8.1 JIRAs [1].
>> >
>> > Cheers,
>> > Chris
>> >
>> > [1] https://issues.apache.org/jira/issues/?jql=project%20%
>> > 3D%20AIRFLOW%20AND%20status%20in%20(Open%2C%20%22In%
>> > 20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.8.1
>> >
>> > On Mon, Mar 27, 2017 at 8:59 PM, Bolke de Bruin <bdbr...@gmail.com>
>> wrote:
>> >
>> >> Hi Chris,
>> >>
>> >> I have a PR out for
>> >>
>> >> * Revert of 719, which makes 982 obsolete and removes 983 from the
>> >> blockers list, making it just a new feature.
>> >>
>> >> See: https://github.com/apache/incubator-airflow/pull/2195 <
>> >> https://github.com/apache/incubator-airflow/pull/2195>
>> >>
>> >> Cc: @alexvanboxel
>> >>
>> >> Bolke
>> >>
>> >> > On 24 Mar 2017, at 10:21, Chris Riccomini <criccom...@apache.org>
>> >> wrote:
>> >> >
>> >> > Hey all,
>> >> >
>> >> > I've let this thread sit for a while. Here is a list of the issues
>> >> > that were raised:
>> >> >
>> >> > BLOCKERS:
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-982
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-983
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1019
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1017
>> >> >
>> >> > NICE TO HAVE:
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1015
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1013
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1004
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1003
>> >> > https://issues.apache.org/jira/browse/AIRFLOW-1001
>> >> >
>> >> > It looks like AIRFLOW-1017 is done, though the JIRA is not closed.
>> >> >
>> >> > The rest remain open. I will wait on the release until the remaining
>> >> > blockers are finished. Dan/Daniel, can you comment on status?
>> >> >
>> >> > Ruslan, if you want to work on your nice to haves, and submit
>> patches,
>> >> > that's great, otherwise I don't believe they'll get fixed as part of
>> >> 1.8.1.
>> >> >
>> >> > Cheers,
>> >> > Chris
>> >> >
>> >> > On Wed, Mar 22, 2017 at 9:19 AM, Ruslan Dautkhanov <
>> >> dautkha...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Thank you Sid!
>> >> >>
>> >> >>
>> >> >> Best regards,
>> >> >> Ruslan
>> >> >>
>> >> >

Re: 1.8.1 release

2017-03-29 Thread siddharth anand
Didn't realize https://issues.apache.org/jira/browse/AIRFLOW-1013 was a
blocker. I will have a PR shortly.
-s

On Wed, Mar 29, 2017 at 2:07 PM, Chris Riccomini <criccom...@apache.org>
wrote:

> The following four JIRAs were not merged into the v1-8-test branch, but
> are listed as part of the 1.8.1 release:
>
> AIRFLOW-1017 b2b9587cca9195229ab107394ad94b7702c70e37
> AIRFLOW-906 bc47200711be4d2c0b36b772651dae4f5e01a204
> AIRFLOW-858 94dc7fb0a6bb3c563d9df6566cd52a59bd0c4629
> AIRFLOW-832 b0ae70d3a8e935dc9266b6853683ae5375a7390b
>
> I'm going to merge them in now.
>
> On Wed, Mar 29, 2017 at 1:53 PM, Chris Riccomini <criccom...@apache.org>
> wrote:
>
> > Hey Bolke,
> >
> > Great. Assuming your PR is committed, that leaves five blockers:
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-1000
> > https://issues.apache.org/jira/browse/AIRFLOW-1001
> > https://issues.apache.org/jira/browse/AIRFLOW-1013
> > https://issues.apache.org/jira/browse/AIRFLOW-1018
> > https://issues.apache.org/jira/browse/AIRFLOW-1019
> >
> > I've also got a list of all open 1.8.1 JIRAs [1].
> >
> > Cheers,
> > Chris
> >
> > [1] https://issues.apache.org/jira/issues/?jql=project%20%
> > 3D%20AIRFLOW%20AND%20status%20in%20(Open%2C%20%22In%
> > 20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.8.1
> >
> > On Mon, Mar 27, 2017 at 8:59 PM, Bolke de Bruin <bdbr...@gmail.com>
> wrote:
> >
> >> Hi Chris,
> >>
> >> I have a PR out for
> >>
> >> * Revert of 719, which makes 982 obsolete and removes 983 from the
> >> blockers list, making it just a new feature.
> >>
> >> See: https://github.com/apache/incubator-airflow/pull/2195 <
> >> https://github.com/apache/incubator-airflow/pull/2195>
> >>
> >> Cc: @alexvanboxel
> >>
> >> Bolke
> >>
> >> > On 24 Mar 2017, at 10:21, Chris Riccomini <criccom...@apache.org>
> >> wrote:
> >> >
> >> > Hey all,
> >> >
> >> > I've let this thread sit for a while. Here is a list of the issues
> >> > that were raised:
> >> >
> >> > BLOCKERS:
> >> > https://issues.apache.org/jira/browse/AIRFLOW-982
> >> > https://issues.apache.org/jira/browse/AIRFLOW-983
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1019
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1017
> >> >
> >> > NICE TO HAVE:
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1015
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1013
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1004
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1003
> >> > https://issues.apache.org/jira/browse/AIRFLOW-1001
> >> >
> >> > It looks like AIRFLOW-1017 is done, though the JIRA is not closed.
> >> >
> >> > The rest remain open. I will wait on the release until the remaining
> >> > blockers are finished. Dan/Daniel, can you comment on status?
> >> >
> >> > Ruslan, if you want to work on your nice to haves, and submit patches,
> >> > that's great, otherwise I don't believe they'll get fixed as part of
> >> 1.8.1.
> >> >
> >> > Cheers,
> >> > Chris
> >> >
> >> > On Wed, Mar 22, 2017 at 9:19 AM, Ruslan Dautkhanov <
> >> dautkha...@gmail.com>
> >> > wrote:
> >> >
> >> >> Thank you Sid!
> >> >>
> >> >>
> >> >> Best regards,
> >> >> Ruslan
> >> >>
> >> >> On Wed, Mar 22, 2017 at 12:01 AM, siddharth anand <san...@apache.org
> >
> >> >> wrote:
> >> >>
> >> >>> Ruslan,
> >> >>> Thanks for sharing this list. I can pick a few up. I agree we should
> >> aim
> >> >> to
> >> >>> get some of them into 1.8.1.
> >> >>>
> >> >>> -s
> >> >>>
> >> >>> On Tue, Mar 21, 2017 at 2:29 PM, Ruslan Dautkhanov <
> >> dautkha...@gmail.com
> >> >>>
> >> >>> wrote:
> >> >>>
> >> >>>> Some of the issues I ran into while testing 1.8rc5 :
> >> >>>>
> >> >>>> https://issues.apache.org/jira/browse/AIRFLOW-1015
> >> >>>>> https://issues.apache.org/jira/browse/AIRFLOW-1013
> >> &

Re: 1.8.1 release

2017-03-22 Thread siddharth anand
Ruslan,
Thanks for sharing this list. I can pick a few up. I agree we should aim to
get some of them into 1.8.1.

-s

On Tue, Mar 21, 2017 at 2:29 PM, Ruslan Dautkhanov 
wrote:

> Some of the issues I ran into while testing 1.8rc5 :
>
> https://issues.apache.org/jira/browse/AIRFLOW-1015
> > https://issues.apache.org/jira/browse/AIRFLOW-1013
> > https://issues.apache.org/jira/browse/AIRFLOW-1004
> > https://issues.apache.org/jira/browse/AIRFLOW-1003
> > https://issues.apache.org/jira/browse/AIRFLOW-1001
> > https://issues.apache.org/jira/browse/AIRFLOW-1015
>
>
> It would be great to have at least some of them fixed in 1.8.1.
>
> Thank you.
>
>
>
>
> --
> Ruslan Dautkhanov
>
> On Tue, Mar 21, 2017 at 3:02 PM, Dan Davydov  invalid
> > wrote:
>
> > Here is my list for targeted 1.8.1 fixes:
> > https://issues.apache.org/jira/browse/AIRFLOW-982
> > https://issues.apache.org/jira/browse/AIRFLOW-983
> > https://issues.apache.org/jira/browse/AIRFLOW-1019 (and in general the
> > slow
> > startup time from this new logic of orphaned/reset task)
> > https://issues.apache.org/jira/browse/AIRFLOW-1017 (which I will
> hopefully
> > have a fix out for soon just finishing up tests)
> >
> > We are also hitting a new issue with subdags with rc5 that we weren't
> > hitting with rc4 where subdags will occasionally just hang (had to roll
> > back from rc5 to rc4), I'll try to spin up a JIRA for it soon which
> should
> > be on the list too.
> >
> >
> > On Tue, Mar 21, 2017 at 1:54 PM, Chris Riccomini 
> > wrote:
> >
> > > Agreed. I'm looking for a list of checksums/JIRAs that we want in the
> > > bugfix release.
> > >
> > > On Tue, Mar 21, 2017 at 12:54 PM, Bolke de Bruin 
> > > wrote:
> > >
> > > >
> > > >
> > > > > On 21 Mar 2017, at 12:51, Bolke de Bruin 
> wrote:
> > > > >
> > > > > My suggestion, as we are using semantic versioning is:
> > > > >
> > > > > 1) no new features in the 1.8 branch
> > > > > 2) only bug fixes in the 1.8 branch
> > > > > 3) new features to land in 1.9
> > > > >
> > > > > This allows companies to
> > > >
> > > > Have a "known" version and can move to the new branch when they want
> to
> > > > get new features. Obviously we only support N-1, so when 1.10 comes
> out
> > > we
> > > > stop supporting 1.8.X.
> > > >
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > >> On 21 Mar 2017, at 11:22, Chris Riccomini 
> > > > wrote:
> > > > >>
> > > > >> Hey all,
> > > > >>
> > > > >> I suggest that we start a 1.8.1 Airflow release now. The goal
> would
> > > be:
> > > > >>
> > > > >> 1) get a second release under our belt
> > > > >> 2) patch known issues with the 1.8.0 release
> > > > >>
> > > > >> I'm happy to run it, but I saw Maxime mentioning that Airbnb might
> > > want
> > > > to.
> > > > >> @Max et al, can you comment?
> > > > >>
> > > > > >> Also, can folks supply JIRAs for stuff that they think needs to be in
> > > > > >> the 1.8.1 bugfix release?
> > > > >>
> > > > >> Cheers,
> > > > >> Chris
> > > >
> > >
> >
>
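The branching policy Bolke proposes in the quoted thread (bug fixes only on the 1.8 line, new features in 1.9, support for N-1 only) can be sketched as a small check. This is only an illustration of the rule as stated in the email; the helper names `parse_minor`, `is_supported` and `target_branch` are hypothetical and not part of Airflow.

```python
def parse_minor(version):
    """Return (major, minor) as ints from a version string such as '1.8.1'."""
    major, minor = version.split('.')[:2]
    return int(major), int(minor)


def is_supported(version, latest):
    """True if `version` is on the latest minor line or the one before it (N-1)."""
    major, minor = parse_minor(version)
    latest_major, latest_minor = parse_minor(latest)
    return major == latest_major and latest_minor - minor in (0, 1)


def target_branch(change_type, release_line='1.8'):
    """Bug fixes stay on the release line; features go to the next minor."""
    major, minor = parse_minor(release_line)
    if change_type == 'bugfix':
        return release_line
    if change_type == 'feature':
        return '{}.{}'.format(major, minor + 1)
    raise ValueError('unknown change type: {!r}'.format(change_type))
```

Under this reading, 1.8.x is still supported while 1.9 is the latest line, and drops out of support once 1.10 is released.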


Re: [RESULT][VOTE]Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-20 Thread siddharth anand
I've updated

   - http://incubator.apache.org/projects/airflow.html (under the News
   section)
   - CWiki Announcements :
   https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-March19,2017
   - Twitter account :
   https://twitter.com/ApacheAirflow/status/843721202430492674

FYI, I expect the PyPi link to the release will be :
https://pypi.python.org/pypi/airflow/1.8.0+apache.incubating

Max, let us know once the PyPi package is available! Bolke, thx again for
seeing this through!

-s

On Sun, Mar 19, 2017 at 10:46 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:

> I have made airflow 1.8 available from:
> https://dist.apache.org/repos/dist/release/incubator/airflow/1.8.0-incubating/ ,
> I have asked Sid to do the official announcement. PyPi can be updated and
> docs uploaded.
>
> Cheers
> Bolke
>
> > On 19 Mar 2017, at 09:21, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >
> > I’m doing the announcement on the IPMC in a few (need to grab breakfast
> first ;-) ). It can be done any time after that.
> >
> > I need to bump the version number so I will need to re-sign and create a
> new tar ball. I hope they won’t mind that, as it is a bit of a chicken and
> egg problem.
> >
> > Bolke.
> >
> >> On 19 Mar 2017, at 09:01, Maxime Beauchemin <maximebeauche...@gmail.com>
> wrote:
> >>
> >> @Bolke I can take care of regenerating the docs + pypi upload, just let
> me
> >> know when
> >>
> >> Max
> >>
> >> On Fri, Mar 17, 2017 at 5:20 PM, Dan Davydov <dan.davy...@airbnb.com.
> invalid
> >>> wrote:
> >>
> >>> That's reasonable (treating it a bug instead of a change in behavior).
> Full
> >>> speed ahead!
> >>>
> >>> On Thu, Mar 16, 2017 at 9:01 AM, Bolke de Bruin <bdbr...@gmail.com>
> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> Apache Airflow (incubating) 1.8.0 (RC5) has been accepted.
> >>>>
> >>>> 9 “+1” votes received:
> >>>>
> >>>> - Maxime Beauchemin (binding)
> >>>> - Chris Riccomini (binding)
> >>>> - Arthur Wiedmer (binding)
> >>>> - Jeremiah Lowin (binding)
> >>>> - Siddharth Anand (binding)
> >>>> - Alex van Boxel (binding)
> >>>> - Bolke de Bruin (binding)
> >>>>
> >>>> - Daniel Huang (non-binding)
> >>>>
> >>>> Vote thread (start):
> >>>> http://mail-archives.apache.org/mod_mbox/incubator-
> >>>> airflow-dev/201703.mbox/%3cB1833A3A-05FB-4112-B395-
> >>>> 135caf930...@gmail.com%3e
> >>>>
> >>>> Next steps:
> >>>> 1) will start the voting process at the IPMC mailinglist. I don’t
> expect
> >>>> changes.
> >>>> 2) Only after the positive voting on the IPMC and finalisation I will
> >>>> rebrand the RC to Release.
> >>>> 3) I will upload it to the incubator release page, then the tar ball
> >>> needs
> >>>> to propagate to the mirrors.
> >>>> 4) Update the website (can someone volunteer please?)
> >>>> 5) Finally I will ask Maxime to upload it to pypi. It seems we can
> keep
> >>>> the apache branding as lib cloud is doing this as well (
> >>>> https://libcloud.apache.org/downloads.html#pypi-package).
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Bolke
> >>>
> >
>
>


Reminder : LatestOnlyOperator

2017-03-17 Thread siddharth anand
With the Apache Airflow 1.8 release imminent, you may want to try out the

*LatestOnlyOperator.*

If you want your DAG to only run on the most recent scheduled slot,
regardless of backlog, this operator will skip running downstream tasks for
all DAG Runs prior to the current time slot.

For example, I might have a DAG that takes a DB snapshot once a day. It
might be that I paused that DAG for 2 weeks or that I had set the start
date to a fixed date 2 weeks in the past. When I enable my DAG, I don't
want it to run 14 days' worth of snapshots for the current state of the DB
-- that's unnecessary work.

The LatestOnlyOperator avoids that work.

https://github.com/apache/incubator-airflow/commit/edf033be65b575f44aa221d5d0ec9ecb6b32c67a

With it, you can simply use
latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)

instead of
import logging
from datetime import datetime

def skip_to_current_job(ds, **kwargs):
    now = datetime.now()
    # Window that follows this run's execution_date on the DAG's schedule
    left_window = kwargs['dag'].following_schedule(kwargs['execution_date'])
    right_window = kwargs['dag'].following_schedule(left_window)
    logging.info('Left Window {}, Now {}, Right Window {}'.format(
        left_window, now, right_window))
    if not now <= right_window:
        logging.info('Not latest execution, skipping downstream.')
        return False
    return True

short_circuit = ShortCircuitOperator(
  task_id = 'short_circuit_if_not_current_job',
  provide_context = True,
  python_callable = skip_to_current_job,
  dag = dag
)
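
The window check in the callable above can be tried out without Airflow. The sketch below is not the operator's actual implementation: it assumes a fixed `timedelta` interval in place of `dag.following_schedule`, and the helper name `is_latest_slot` and the dates are made up for illustration.

```python
from datetime import datetime, timedelta


def is_latest_slot(execution_date, interval, now):
    """Return True when `now` is still inside the window that follows the slot.

    Mirrors the check above: left_window is the schedule slot after
    execution_date, right_window is the slot after that, and any run with
    now > right_window is treated as backlog.
    """
    left_window = execution_date + interval  # stand-in for following_schedule
    right_window = left_window + interval
    return now <= right_window


# Hypothetical backlog: a daily DAG re-enabled after 14 days.
interval = timedelta(days=1)
now = datetime(2017, 3, 17, 12, 0)
backlog = [datetime(2017, 3, 3) + i * interval for i in range(14)]
latest = [d for d in backlog if is_latest_slot(d, interval, now)]
print(latest)
```

Of the 14 backlog slots, only the 2017-03-16 one passes the check, so the other 13 snapshots would be skipped.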

-s


Re: Airflow Committers: Landscape checks doing more harm than good?

2017-03-16 Thread siddharth anand
+1 for replacing it with travis linting.

On Thu, Mar 16, 2017 at 7:59 PM, Jeremiah Lowin  wrote:

> FWIW I recently started using yapf (https://github.com/google/yapf) with a
> slightly custom config to format all of my projects. Rather than alert to
> discrete linting errors and concrete style rules (like PEP8) -- things I'm
> sure we all do anyway -- it reformats all code in compliance with your
> chosen style rules. It even reformats code that is already PEP8 compliant
> to make it more "pythonic" (and still PEP8 compliant). Basically: if you
> like (or create) a yapf style, it takes care of all the hard reformatting
> work and produces pleasing, consistent results. /plug
>
> On Thu, Mar 16, 2017 at 8:42 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > Let's wire up a custom linter command that can be called locally and
> respect
> > an agreed upon set of parameters (pylint + config file, based off of our
> > current .landscape.yml ).
> >
> > flake8 is far from being as good as pylint and can't be customized much
> > AFAICT, but variations on the command below can help you lint [only]
> your
> > PR:
> >
> > `git diff HEAD^ | flake8 --diff`
> >
> > It's a good thing to integrate in your workflow until we get an
> equivalent
> > pylint command/config
> >
> > On Thu, Mar 16, 2017 at 5:03 PM, Alex Guziel  > .invalid
> > > wrote:
> >
> > > +1 also
> > >
> > > We have code review already and the amount of false positives makes
> this
> > > useless.
> > >
> > > On Thu, Mar 16, 2017 at 5:02 PM, Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >
> > > > +1 as well
> > > >
> > > > I'm disappointed because the service is inches away from getting
> > > everything
> > > > right. As Bolke said, behind the cover it's little more than pylint,
> > git
> > > > hooks, and a somewhat-fancy ui.
> > > >
> > > > Operationally it's been getting in the way.
> > > >
> > > > There's a way to pipe the output of git diff into pylint and check
> > > whether
> > > > the touched lines need linting, in which case we should break the
> > build.
> > > > > This could run in its own slot in the Travis build matrix.
> > > >
> > > > Max
> > > >
> > > > On Thu, Mar 16, 2017 at 4:51 PM, Bolke de Bruin 
> > > wrote:
> > > >
> > > > > We can do it in Travis’ afaik. We should replace it.
> > > > >
> > > > > So +1.
> > > > >
> > > > > B.
> > > > >
> > > > > > On 16 Mar 2017, at 16:48, Jeremiah Lowin 
> > wrote:
> > > > > >
> > > > > > This may be an unpopular opinion, but most Airflow PRs have a
> > little
> > > > red
> > > > > > "x" next to them not because they have failing unit tests, but
> > > because
> > > > > the
> > > > > > Landscape check has decided they introduce bad code.
> > > > > >
> > > > > > Unfortunately Landscape is often wrong -- here it is telling me
> my
> > > > latest
> > > > > > PR introduced no less than 30 errors... in files I didn't touch!
> > > > > > https://github.com/apache/incubator-airflow/pull/2157 (however,
> it
> > > > > gives me
> > > > > > credit for fixing 23 errors in those same files, so I've got that
> > > going
> > > > > for
> > > > > > me... which is nice.)
> > > > > >
> > > > > > The upshot is that Github's "health" indicator can be swayed by
> > minor
> > > > or
> > > > > > erroneous issues, and therefore it serves little purpose other
> than
> > > > > making
> > > > > > it look like every PR is bad. This creates committer fatigue,
> since
> > > > every
> > > > > > PR needs to be parsed to see if it actually is OK or not.
> > > > > >
> > > > > > Don't get me wrong, I'm all for proper style and on occasion
> > > Landscape
> > > > > has
> > > > > > pointed out problems that I've gone and fixed. But on the whole,
> I
> > > > > believe
> > > > > > that having it as part of our red / green PR evaluation -- equal
> to
> > > and
> > > > > > often superseding unit tests -- is harmful. I'd much rather be
> able
> > > to
> > > > > scan
> > > > > > the PR list and know unequivocally that "green" indicates ready
> to
> > > > merge.
> > > > > >
> > > > > > J
> > > > >
> > > > >
> > > >
> > >
> >
>
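
Max's idea in the quoted thread (lint only the lines a PR touches, then fail the build on those) can be sketched roughly as follows. This is an assumption-laden illustration, not an agreed Airflow tool: `touched_lines` and `filter_to_touched` are hypothetical helpers, the input is expected to be plain `git diff` output, and lint messages are modelled as `(path, line, text)` tuples rather than real pylint output.

```python
import re
from collections import defaultdict

# Matches a unified-diff hunk header such as "@@ -1,4 +1,5 @@"
HUNK_RE = re.compile(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@')


def touched_lines(diff_text):
    """Map each file in a unified diff to the new-file line numbers it adds."""
    touched = defaultdict(set)
    current_file = None
    new_lineno = None  # None means "not inside a hunk"
    for line in diff_text.splitlines():
        if line.startswith('diff --git'):
            current_file, new_lineno = None, None
        elif new_lineno is None and line.startswith('+++ '):
            path = line[4:].strip()
            current_file = path[2:] if path.startswith('b/') else path
        else:
            match = HUNK_RE.match(line)
            if match:
                new_lineno = int(match.group(1))
            elif new_lineno is not None and current_file is not None:
                if line.startswith('+'):
                    touched[current_file].add(new_lineno)
                    new_lineno += 1
                elif line.startswith('-') or line.startswith('\\'):
                    pass  # removed lines and "\ No newline" markers
                else:
                    new_lineno += 1  # context line
    return dict(touched)


def filter_to_touched(messages, touched):
    """Keep only lint messages that fall on lines added by the diff."""
    return [m for m in messages if m[1] in touched.get(m[0], set())]
```

A Travis slot could then break the build only when `filter_to_touched` returns a non-empty list, which avoids blaming a PR for errors in lines it did not change.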


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread siddharth anand
Confirmed that Bolke's PR above fixes the issue.

Also, I agree this is not a blocker for the current airflow release, so my
+1 (binding) stands.
-s

On Wed, Mar 15, 2017 at 3:11 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:

> PR is available: https://github.com/apache/incubator-airflow/pull/2154
>
> But marked for 1.8.1.
>
> - Bolke
>
> > On 15 Mar 2017, at 14:37, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >
> > On second thought I do consider it a bug and can have a fix out pretty
> quickly, but I don’t consider it a blocker.
> >
> > - B.
> >
> >> On 15 Mar 2017, at 14:21, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>
> >> Just to be clear: Also in 1.7.1 the DagRun was marked successful, but
> its tasks continued to be scheduled. So one could also consider 1.7.1
> behaviour a bug. I am not sure here, but I think it kind of makes sense to
> consider the behaviour of 1.7.1 a bug. It has been present throughout all
> the 1.8 rc/beta/alpha series.
> >>
> >> So yes it is a change in behaviour whether it is a regression or an
> integrity improvement is up for discussion. Either way I don’t consider it
> a blocker.
> >>
> >> Bolke.
> >>
> >>> On 15 Mar 2017, at 14:06, siddharth anand <san...@apache.org> wrote:
> >>>
> >>> Here's the JIRA :
> >>> https://issues.apache.org/jira/browse/AIRFLOW-989
> >>>
> >>> I confirmed it is a regression from 1.7.1.3, which I installed via pip
> and
> >>> tested against the same DAG in the JIRA.
> >>>
> >>> The issue occurs if a leaf / last / terminal downstream task is not
> >>> cleared. You won't see this issue if you clear the entire DAG Run or
> clear
> >>> a task and all of its downstream tasks. If you truly want to only
> clear and
> >>> rerun a task, but not its downstream tasks, you can use the CLI to
> execute
> >>> a specific task (e.g. via airflow run).
> >>>
> >>> This is a change in behavior -- if we do go ahead with the release,
> then
> >>> this JIRA should be in a list of JIRAs of known issues related to the
> new
> >>> version.
> >>> -s
> >>>
> >>> On Wed, Mar 15, 2017 at 9:17 AM, Chris Riccomini <
> criccom...@apache.org>
> >>> wrote:
> >>>
> >>>> @Sid, does this happen if you clear downstream as well?
> >>>>
> >>>> On Wed, Mar 15, 2017 at 9:04 AM, Chris Riccomini <
> criccom...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Has anyone been able to reproduce Sid's issue?
> >>>>>
> >>>>> On Tue, Mar 14, 2017 at 11:17 PM, Bolke de Bruin <bdbr...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> That is not an airflow error, but a Kerberos error. Try executing
> the
> >>>>>> kinit command on the command line by yourself.
> >>>>>>
> >>>>>> Bolke
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>>> On 14 Mar 2017, at 23:11, Ruslan Dautkhanov <dautkha...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> `airflow kerberos` is broken in 1.8-rc5
> >>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-987
> >>>>>>> Hopefully fix can be part of the 1.8 release.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Ruslan Dautkhanov
> >>>>>>>
> >>>>>>>> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand <
> san...@apache.org>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>> FYI,
> >>>>>>>> I've just hit a major bug in the release candidate related to
> "clear
> >>>>>> task"
> >>>>>>>> behavior.
> >>>>>>>>
> >>>>>>>> I've been running airflow in both stage and prod since yesterday
> on
> >>>>>> rc5 and
> >>>>>>>> have reproduced this in both environments. I will file a JIRA for
> >>>> this
> >>>>>>>> tonight, but wanted to send a note over email as well.
> >>>>>>>>

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread siddharth anand
Here's the JIRA :
https://issues.apache.org/jira/browse/AIRFLOW-989

I confirmed it is a regression from 1.7.1.3, which I installed via pip and
tested against the same DAG in the JIRA.

The issue occurs if a leaf / last / terminal downstream task is not
cleared. You won't see this issue if you clear the entire DAG Run or clear
a task and all of its downstream tasks. If you truly want to only clear and
rerun a task, but not its downstream tasks, you can use the CLI to execute
a specific task (e.g. via airflow run).

This is a change in behavior -- if we do go ahead with the release, then
this JIRA should be in a list of JIRAs of known issues related to the new
version.
-s

On Wed, Mar 15, 2017 at 9:17 AM, Chris Riccomini <criccom...@apache.org>
wrote:

> @Sid, does this happen if you clear downstream as well?
>
> On Wed, Mar 15, 2017 at 9:04 AM, Chris Riccomini <criccom...@apache.org>
> wrote:
>
> > Has anyone been able to reproduce Sid's issue?
> >
> > On Tue, Mar 14, 2017 at 11:17 PM, Bolke de Bruin <bdbr...@gmail.com>
> > wrote:
> >
> >> That is not an airflow error, but a Kerberos error. Try executing the
> >> kinit command on the command line by yourself.
> >>
> >> Bolke
> >>
> >> Sent from my iPhone
> >>
> >> > On 14 Mar 2017, at 23:11, Ruslan Dautkhanov <dautkha...@gmail.com>
> >> wrote:
> >> >
> >> > `airflow kerberos` is broken in 1.8-rc5
> >> > https://issues.apache.org/jira/browse/AIRFLOW-987
> >> > Hopefully fix can be part of the 1.8 release.
> >> >
> >> >
> >> >
> >> > --
> >> > Ruslan Dautkhanov
> >> >
> >> >> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand <san...@apache.org>
> >> wrote:
> >> >>
> >> >> FYI,
> >> >> I've just hit a major bug in the release candidate related to "clear
> >> task"
> >> >> behavior.
> >> >>
> >> >> I've been running airflow in both stage and prod since yesterday on
> >> rc5 and
> >> >> have reproduced this in both environments. I will file a JIRA for
> this
> >> >> tonight, but wanted to send a note over email as well.
> >> >>
> >> >> In my example, I have a 2 task DAG. For a given DAG run that has
> >> completed
> >> >> successfully, if I
> >> >> 1) clear task2 (leaf task in this case), the previously-successful
> DAG
> >> Run
> >> >> goes back to Running, requeues, and executes the task successfully.
> >> The DAG
> >> >> Run then returns from Running to Success.
> >> >> 2) clear task1 (root task in this case), the previously-successful
> DAG
> >> Run
> >> >> goes back to Running, DOES NOT requeue or execute the task at all.
> The
> >> DAG
> >> >> Run then returns from Running to Success, though it never ran the task.
> >> >>
> >> >> 1) is expected and previous behavior. 2) is a regression.
> >> >>
> >> >> The only workaround is to use the CLI to run the task cleared. Here
> are
> >> >> some images :
> >> >> *After Clearing the Tasks*
> >> >> https://www.dropbox.com/s/wmuxt0krwx6wurr/Screenshot%
> >> >> 202017-03-14%2014.09.34.png?dl=0
> >> >>
> >> >> *After DAG Runs return to Success*
> >> >> https://www.dropbox.com/s/qop933rzgdzchpd/Screenshot%
> >> >> 202017-03-14%2014.09.49.png?dl=0
> >> >>
> >> >> This is a major regression because it will force everyone to use the
> >> CLI
> >> >> for things that they would normally use the UI for.
> >> >>
> >> >> -s
> >> >>
> >> >>
> >> >> -s
> >> >>
> >> >>
> >> >>> On Tue, Mar 14, 2017 at 1:32 PM, Daniel Huang <dxhu...@gmail.com>
> >> wrote:
> >> >>>
> >> >>> +1 (non-binding)!
> >> >>>
> >> >>> On Tue, Mar 14, 2017 at 11:35 AM, siddharth anand <
> san...@apache.org>
> >> >>> wrote:
> >> >>>
> >> >>>> +1 (binding)
> >> >>>>
> >> >>>>
> >> >>>> On Tue, Mar 14, 2017 at 8:42 AM, Maxime Beauchemin <
> >> >>>> maximebeauche...@gmail.com> wrote:
> >> >>>>
> >> >>>

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-14 Thread siddharth anand
FYI,
I've just hit a major bug in the release candidate related to "clear task"
behavior.

I've been running airflow in both stage and prod since yesterday on rc5 and
have reproduced this in both environments. I will file a JIRA for this
tonight, but wanted to send a note over email as well.

In my example, I have a 2 task DAG. For a given DAG run that has completed
successfully, if I
1) clear task2 (leaf task in this case), the previously-successful DAG Run
goes back to Running, requeues, and executes the task successfully. The DAG
Run then returns from Running to Success.
2) clear task1 (root task in this case), the previously-successful DAG Run
goes back to Running, DOES NOT requeue or execute the task at all. The DAG
Run then returns from Running to Success, though it never ran the task.

1) is expected and previous behavior. 2) is a regression.

The only workaround is to use the CLI to run the task cleared. Here are
some images :
*After Clearing the Tasks*
https://www.dropbox.com/s/wmuxt0krwx6wurr/Screenshot%202017-03-14%2014.09.34.png?dl=0

*After DAG Runs return to Success*
https://www.dropbox.com/s/qop933rzgdzchpd/Screenshot%202017-03-14%2014.09.49.png?dl=0

This is a major regression because it will force everyone to use the CLI
for things that they would normally use the UI for.

-s


On Tue, Mar 14, 2017 at 1:32 PM, Daniel Huang <dxhu...@gmail.com> wrote:

> +1 (non-binding)!
>
> On Tue, Mar 14, 2017 at 11:35 AM, siddharth anand <san...@apache.org>
> wrote:
>
> > +1 (binding)
> >
> >
> > On Tue, Mar 14, 2017 at 8:42 AM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> > > +1 (binding)
> > >
> > > On Tue, Mar 14, 2017 at 3:59 AM, Alex Van Boxel <a...@vanboxel.be>
> > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Note: we had to revert all our ONE_SUCCESS with ALL_SUCCESS trigger
> > rules
> > > > where the parent nodes were joining with a SKIP. But I kind of should
> > > have
> > > > known this was coming. Apart of that I had a successful run last
> night.
> > > >
> > > >
> > > > On Tue, Mar 14, 2017 at 1:37 AM siddharth anand <san...@apache.org>
> > > wrote:
> > > >
> > > > I'm going to deploy this to staging now. Fab work Bolke!
> > > > -s
> > > >
> > > > On Mon, Mar 13, 2017 at 2:16 PM, Dan Davydov <dan.davy...@airbnb.com
> .
> > > > invalid
> > > > > wrote:
> > > >
> > > > > I'll test this on staging as soon as I get a chance (the testing is
> > > > > non-blocking on the rc5). Bolke very much in particular :).
> > > > >
> > > > > On Mon, Mar 13, 2017 at 10:46 AM, Jeremiah Lowin <
> jlo...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1 (binding) extremely impressed by the work and diligence all
> > > > > contributors
> > > > > > have put in to getting these blockers fixed, Bolke in particular.
> > > > > >
> > > > > > On Mon, Mar 13, 2017 at 1:07 AM Arthur Wiedmer <
> art...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > Thanks again for steering us through Bolke.
> > > > > > >
> > > > > > > Best,
> > > > > > > Arthur
> > > > > > >
> > > > > > > On Sun, Mar 12, 2017 at 9:59 PM, Bolke de Bruin <
> > bdbr...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Dear All,
> > > > > > > >
> > > > > > > > Finally, I have been able to make the FIFTH RELEASE CANDIDATE
> > of
> > > > > > Airflow
> > > > > > > > 1.8.0 available at: https://dist.apache.org/repos/
> > > > > > > > dist/dev/incubator/airflow/ <https://dist.apache.org/
> > > > > > > > repos/dist/dev/incubator/airflow/> , public keys are
> available
> > > at
> > > > > > > > https://dist.apache.org/repos/dist/release/incubator/
> airflow/
> > <
> > > > > > > > https://dist.apache.org/repos/dist/release/incubator/
> airflow/>
> > .
> > > > It
> > > > > is
> > > > > > > > tagged with a local version “apache.incubating” so it allows

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-13 Thread siddharth anand
I'm going to deploy this to staging now. Fab work Bolke!
-s

On Mon, Mar 13, 2017 at 2:16 PM, Dan Davydov  wrote:

> I'll test this on staging as soon as I get a chance (the testing is
> non-blocking on the rc5). Bolke very much in particular :).
>
> On Mon, Mar 13, 2017 at 10:46 AM, Jeremiah Lowin 
> wrote:
>
> > +1 (binding) extremely impressed by the work and diligence all
> contributors
> > have put in to getting these blockers fixed, Bolke in particular.
> >
> > On Mon, Mar 13, 2017 at 1:07 AM Arthur Wiedmer 
> wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks again for steering us through Bolke.
> > >
> > > Best,
> > > Arthur
> > >
> > > On Sun, Mar 12, 2017 at 9:59 PM, Bolke de Bruin 
> > wrote:
> > >
> > > > Dear All,
> > > >
> > > > > Finally, I have been able to make the FIFTH RELEASE CANDIDATE of
> > > > > Airflow 1.8.0 available at:
> > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public
> > > > > keys are available at
> > > > > https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is
> > > > > tagged with a local version “apache.incubating” so it allows
> > > > > upgrading from earlier releases.
> > > >
> > > > Issues fixed since rc4:
> > > >
> > > > [AIRFLOW-900] Double trigger should not kill original task instance
> > > > [AIRFLOW-900] Fixes bugs in LocalTaskJob for double run protection
> > > > [AIRFLOW-932] Do not mark tasks removed when backfilling
> > > > [AIRFLOW-961] run onkill when SIGTERMed
> > > > [AIRFLOW-910] Use parallel task execution for backfills
> > > > [AIRFLOW-967] Wrap strings in native for py2 ldap compatibility
> > > > [AIRFLOW-941] Use defined parameters for psycopg2
> > > > [AIRFLOW-719] Prevent DAGs from ending prematurely
> > > > [AIRFLOW-938] Use test for True in task_stats queries
> > > > [AIRFLOW-937] Improve performance of task_stats
> > > > [AIRFLOW-933] use ast.literal_eval rather eval because
> ast.literal_eval
> > > > does not execute input.
> > > > [AIRFLOW-919] Running tasks with no start date shouldn't break a DAGs
> > UI
> > > > [AIRFLOW-897] Prevent dagruns from failing with unfinished tasks
> > > > [AIRFLOW-861] make pickle_info endpoint be login_required
> > > > [AIRFLOW-853] use utf8 encoding for stdout line decode
> > > > [AIRFLOW-856] Make sure execution date is set for local client
> > > > [AIRFLOW-830][AIRFLOW-829][AIRFLOW-88] Reduce Travis log verbosity
> > > > [AIRFLOW-794] Access DAGS_FOLDER and SQL_ALCHEMY_CONN exclusively
> from
> > > > settings
> > > > [AIRFLOW-694] Fix config behaviour for empty envvar
> > > > [AIRFLOW-365] Set dag.fileloc explicitly and use for Code view
> > > > [AIRFLOW-931] Do not set QUEUED in TaskInstances
> > > > [AIRFLOW-899] Tasks in SCHEDULED state should be white in the UI
> > instead
> > > > of black
> > > > [AIRFLOW-895] Address Apache release incompliancies
> > > > [AIRFLOW-893][AIRFLOW-510] Fix crashing webservers when a dagrun has
> no
> > > > start date
> > > > [AIRFLOW-793] Enable compressed loading in S3ToHiveTransfer
> > > > [AIRFLOW-863] Example DAGs should have recent start dates
> > > > [AIRFLOW-869] Refactor mark success functionality
> > > > [AIRFLOW-856] Make sure execution date is set for local client
> > > > [AIRFLOW-814] Fix Presto*CheckOperator.__init__
> > > > [AIRFLOW-844] Fix cgroups directory creation
> > > >
> > > > No known issues anymore.
> > > >
> > > > I would also like to raise a VOTE for releasing 1.8.0 based on
> release
> > > > candidate 5, i.e. just renaming release candidate 5 to 1.8.0 release.
> > > >
> > > > Please respond to this email by:
> > > >
> > > > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if
> you
> > > are
> > > > not.
> > > >
> > > > Thanks!
> > > > Bolke
> > > >
> > > > My VOTE: +1 (binding)
> > >
> >
>


Re: scheduler running on multiple nodes

2017-02-23 Thread siddharth anand
I did run 2 or more schedulers with Local Executors up until mid last
year. There have been enough changes to the code and feature additions that
I don't think this is a recommended practice at this point. Also, there is
not a lot of synchronization in the scheduler to ensure this will work.

-s

On Thu, Feb 9, 2017 at 6:47 AM, matus valo  wrote:

> Hi all,
>
>
>
> I am considering deployment of airflow as pipeline framework. I have found
> out multiple articles explaining deployment of airflow in distributed
> environment (e.g. [1]). Unfortunately, I was not able to find out any use
> case where scheduler is deployed distributed on multiple nodes. Is it
> possible to have scheduler distributed on multiple nodes to prevent single
> point of failure? I haven’t found any mention about it in documentation. I
> have found out in [2] that it is not possible but on the other hand in [3]
> is reference that this can be solved in new version of airflow.
>
>
>
> Thanks,
>
>
> Matus
>
>
>
> [1] http://site.clairvoyantsoft.com/setting-apache-airflow-cluster/
>
> [2] https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME
>
> [3] https://issues.apache.org/jira/browse/AIRFLOW-678
>


Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-23 Thread siddharth anand
IMHO, a DAG run without a start date is nonsensical, but this is not enforced.
That said, our UI allows for the manual creation of DAG Runs without a
start date, as shown in the images below:


   - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
   - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0


On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Our database may have edge cases that could be associated with running any
> previous version that may or may not have been part of an official release.
>
> Let's see if anyone else reports the issue. If no one does, one option is
> to release 1.8.0 as is with a comment in the release notes, and have a
> future official minor apache release 1.8.1 that would fix these minor
> issues that are not deal breaker.
>
> @bolke, I'm curious, how long does it take you to go through one release
> cycle? Oh, and do you have a documented step by step process for releasing?
> I'd like to add the Pypi part to this doc and add committers that are
> interested to have rights on the project on Pypi.
>
> Max
>
> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> > So it is a database integrity issue? Afaik a start_date should always be
> > set for a DagRun (create_dagrun does so). I didn't check the code though.
> >
> > Sent from my iPhone
> >
> > > On 22 Feb 2017, at 22:19, Dan Davydov <dan.davy...@airbnb.com.INVALID>
> > wrote:
> > >
> > > Should clarify this occurs when a dagrun does not have a start date,
> not
> > a
> > > dag (which makes it even less likely to happen). I don't think this is
> a
> > > blocker for releasing.
> > >
> > >> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <dan.davy...@airbnb.com>
> > wrote:
> > >>
> > >> I rolled this out in our prod and the webservers failed to load due to
> > >> this commit:
> > >>
> > >> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > >> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > >>
> > >> This fixed it:
> > >> -  > >> class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start
> > Date:
> > >> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}">
> > >> +  > >> class="glyphicon glyphicon-info-sign" aria-hidden="true">
> > >>
> > >> This is caused by assuming that all DAGs have start dates set, so a
> > broken
> > >> DAG will take down the whole UI. Not sure if we want to make this a
> > blocker
> > >> for the release or not, I'm guessing for most deployments this would
> > occur
> > >> pretty rarely. I'll submit a PR to fix it soon.
> > >>
> > >>
> > >>
> > >> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <
> criccom...@apache.org
> > >
> > >> wrote:
> > >>
> > >>> Ack that the vote has already passed, but belated +1 (binding)
> > >>>
> > >>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <bdbr...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> IPMC Voting can be found here:
> > >>>>
> > >>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> > >>> 201702.mbox/%
> > >>>> 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e <
> > >>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> > >>> 201702.mbox/%
> > >>>> 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3E>
> > >>>>
> > >>>> Kind regards,
> > >>>> Bolke
> > >>>>
> > >>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <bdbr...@gmail.com>
> wrote:
> > >>>>>
> > >>>>> Hello,
> > >>>>>
> > >>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> > >>>>>
> > >>>>> 9 “+1” votes received:
> > >>>>>
> > >>>>> - Maxime Beauchemin (binding)
> > >>

Re: Meetup featuring an Airflow talk tomorrow

2017-02-23 Thread siddharth anand
Nice!
https://twitter.com/ApacheAirflow/status/834945481440546816

Do please share slides and video so we can post both via Twitter & wiki.
-s

On Wed, Feb 22, 2017 at 3:00 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Hi,
>
> Just wanted to let you know that Arthur (one of our Apache Airflow
> committers) will be giving a talk on "Building Data Workflows with
> Airflow" tomorrow 2/22 at Galvanize
>
> https://www.meetup.com/SF-Data-Engineering/events/
> 237797553/?rv=md1&_af=event&_af_eid=237797553=on
>
> Enjoy!
>
> Max
>


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-20 Thread siddharth anand
+1 (binding). Thx Bolke!
-s

On Mon, Feb 20, 2017 at 2:51 PM, Alex Van Boxel  wrote:

> +1 (binding)
>
> On Mon, Feb 20, 2017 at 5:32 AM y...@yahoo-inc.com.INVALID
>  wrote:
>
> >
> > +1 (non-binding)
> >
> > Thanks for all the work!
> >
> > Yi
> >
> > On Sunday, February 19, 2017, 12:52:31 PM PST, Arthur Wiedmer <
> > arthur.wied...@gmail.com> wrote:
> >
> > +1 (binding)
> >
> > Thanks again for all the work!
> >
> > Best,
> > Arthur
> >
> > On Fri, Feb 17, 2017 at 4:46 PM, Jeremiah Lowin 
> wrote:
> >
> > > +1 (binding) many thanks for all your work on this Bolke!
> > >
> > > On Fri, Feb 17, 2017 at 7:10 PM Jayesh Senjaliya 
> > > wrote:
> > >
> > > +1 (non-binding), works fine for me.
> > >
> > >
> > >
> > > On Fri, Feb 17, 2017 at 3:37 PM, Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Fri, Feb 17, 2017 at 11:33 AM, Dan Davydov <
> > > > dan.davy...@airbnb.com.invalid> wrote:
> > > >
> > > > > +1 (binding). Mark success works great now, thanks to Bolke for
> > fixing.
> > > > >
> > > > > On Fri, Feb 17, 2017 at 12:22 AM, Bolke de Bruin <
> bdbr...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > I have made the FOURTH RELEASE CANDIDATE of Airflow 1.8.0
> available
> > > at:
> > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/ <
> > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/> ,
> > public
> > > > keys
> > > > > > > are available at
> > > > > > > https://dist.apache.org/repos/dist/release/incubator/airflow/ . It
> > > > > > > is tagged with a local
> > > version
> > > > > > “apache.incubating” so it allows upgrading from earlier releases.
> > > > > >
> > > > > > > One issue has been fixed since release candidate 3:
> > > > > >
> > > > > > * mark success was not working properly
> > > > > >
> > > > > > No known issues anymore.
> > > > > >
> > > > > > I would also like to raise a VOTE for releasing 1.8.0 based on
> > > release
> > > > > > candidate 4, i.e. just renaming release candidate 4 to 1.8.0
> > release.
> > > > > >
> > > > > > Please respond to this email by:
> > > > > >
> > > > > > +1,0,-1 with *binding* if you are a PMC member or *non-binding*
> if
> > > you
> > > > > are
> > > > > > not.
> > > > > >
> > > > > > Thanks!
> > > > > > Bolke
> > > > > >
> > > > > > My VOTE: +1 (binding)
> > > > >
> > > >
> > >
>
> --
>   _/
> _/ Alex Van Boxel
>


Re: Soliciting feedback: Using the Airflow CLI as a thin client

2017-02-15 Thread siddharth anand
Hi Wilson,
I'm a huge fan of the CLI and you are correct that the currently released
version of the CLI requires both a connection to the DB and access to the
dag folder.

In the new 1.8.0 release that is currently being driven by Bolke, the CLI
uses the API. I'm not 100% sure that all CLI commands have API end-points,
but I suspect it's nearly complete if not already complete. That reminds
me.. as we vet the 1.8.0 release candidates, we should test out both CLI
and API.

In a nutshell, the goal is for the CLI to be a thin wrapper that talks to
the API (running on the webserver), which would have access to both the DB
and DAG folder. This would allow anyone to run CLI from any machine that
has access to the API endpoints.
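
As a rough illustration of the thin-client idea, here is a minimal Python
sketch of a command that only builds an HTTP request for the webserver and
needs neither a DB connection nor the DAG folder. The base URL, the endpoint
path (modeled on the experimental API) and the helper name
build_trigger_request are assumptions for illustration, not a confirmed
interface.

```python
import json
from urllib.request import Request


def build_trigger_request(base_url, dag_id, conf=None):
    """Build (but do not send) a POST request asking the webserver to trigger a DAG.

    The endpoint path is an assumption for illustration; check it against
    the API shipped with your Airflow version.
    """
    url = "{}/api/experimental/dags/{}/dag_runs".format(base_url.rstrip("/"), dag_id)
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical webserver address; the client machine only needs network access to it.
req = build_trigger_request("http://airflow.example.com:8080/", "my_dag")
print(req.get_method(), req.full_url)
```

Sending the request would then be a matter of passing it to
urllib.request.urlopen from any machine that can reach the webserver.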
-s

On Tue, Feb 14, 2017 at 1:40 PM, Wilson Lian 
wrote:

> Hi all,
>
> I'm interested in using the Airflow CLI as a thin client so that I can run
> DAG-management commands like pause, unpause, trigger_dag, run, etc. from a
> local machine against a remote airflow cluster (e.g., running in Google
> Container Engine).
>
> I have tried pointing [core]sql_alchemy_conn at the remote database, but
> without a shared view of the DAGs folder, the different components don't
> seem to be able to sync up. For example, list_dags looks at the local DAGs
> folder, but not at the database; and using trigger_dag with a local DAG
> file seems to put the DAG in the database, but its task instances never
> execute, presumably because none of the nodes in the cluster have a copy of
> the DAG file.
>
> I think in order for the CLI to be used as a thin client, the database,
> rather than the DAGs folder needs to be used as the source of truth for
> DAGs (and possibly other objects). Can anyone provide an estimate of how
> heavyweight such a change would be?
>
> I'm also curious what people think about delegating the pointer to the
> current config file to a higher-level config file that contains references
> to different configurations and a pointer to the "current" config.
>


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc3

2017-02-11 Thread siddharth anand
Deployed to stage and will watch over the weekend before voting.
-s

On Sat, Feb 11, 2017 at 9:53 AM, Jeremiah Lowin  wrote:

> Boris, I submitted a PR to address your second point --
> https://github.com/apache/incubator-airflow/pull/2068. Thanks!
>
> On Sat, Feb 11, 2017 at 10:42 AM Boris Tyukin 
> wrote:
>
> > I am running LocalExecutor and not doing crazy things, but I use DAG
> > generation heavily - everything runs fine as before. As I mentioned in
> > other threads, I only had a few issues:
> >
> > 1) had to upgrade MySQL, which was a PAIN. Cloudera CDH is running an old
> > version of MySQL which was compatible with 1.7.1 but is not compatible
> > with 1.8 because of the fractional seconds support PR.
> >
> > 2) when you install airflow, there are two new example DAGs
> > (last_task_only) which go back very far in the past and are scheduled to
> > run every hour - a bunch of DAGs triggered on the first start of the
> > scheduler and hosed my CPU
> >
> > Everything else was fine and I LOVE lots of small UI changes, which
> > greatly reduced my use of the CLI.
> >
> > Thanks again for the amazing work and an awesome project!
> >
> >
> > On Sat, Feb 11, 2017 at 9:17 AM, Jeremiah Lowin 
> wrote:
> >
> > > I was able to deploy successfully. +1 (binding)
> > >
> > > On Fri, Feb 10, 2017 at 7:37 PM Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Fri, Feb 10, 2017 at 3:44 PM, Arthur Wiedmer <
> > > arthur.wied...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > > On Feb 10, 2017 3:13 PM, "Dan Davydov" <dan.davy...@airbnb.com.invalid>
> > > > > wrote:
> > > > >
> > > > > > Our staging looks good, all the DAGs there pass.
> > > > > > +1 (binding)
> > > > > >
> > > > > > On Fri, Feb 10, 2017 at 10:21 AM, Chris Riccomini <
> > > > criccom...@apache.org
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Running in all environments. Will vote after the weekend to
> make
> > > sure
> > > > > > > things are working properly, but so far so good.
> > > > > > >
> > > > > > > On Fri, Feb 10, 2017 at 6:05 AM, Bolke de Bruin <
> > bdbr...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Dear All,
> > > > > > > >
> > > > > > > > Let’s try again!
> > > > > > > >
> > > > > > > > I have made the THIRD RELEASE CANDIDATE of Airflow 1.8.0
> > > available
> > > > > at:
> > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/ <
> > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/> ,
> > > > public
> > > > > > keys
> > > > > > > > are available at https://dist.apache.org/repos/
> > > > > dist/release/incubator/
> > > > > > > > airflow/ <
> > https://dist.apache.org/repos/dist/release/incubator/
> > > > > > airflow/>
> > > > > > > > . It is tagged with a local version “apache.incubating” so it
> > > > allows
> > > > > > > > upgrading from earlier releases.
> > > > > > > >
> > > > > > > > Two issues have been fixed since release candidate 2:
> > > > > > > >
> > > > > > > > * trigger_dag could create dags with fractional seconds, not
> > > > > supported
> > > > > > by
> > > > > > > > logging and UI at the moment
> > > > > > > > * local api client trigger_dag had hardcoded execution of
> None
> > > > > > > >
> > > > > > > > Known issue:
> > > > > > > > * Airflow on kubernetes and num_runs -1 (default) can expose
> > > import
> > > > > > > issues.
> > > > > > > >
> > > > > > > > I have extensively discussed this with Alex (reporter) and we
> > > > > consider
> > > > > > > > this a known issue with a workaround available as we are
> unable
> > > to
> > > > > > > > replicate this in a different environment. UPDATING.md has
> been
> > > > > updated
> > > > > > > > with the work around.
> > > > > > > >
> > > > > > > > As these issues are confined to a very specific area and full
> > > unit
> > > > > > tests
> > > > > > > > were added I would also like to raise a VOTE for releasing
> > 1.8.0
> > > > > based
> > > > > > on
> > > > > > > > release candidate 3, i.e. just renaming release candidate 3
> to
> > > > 1.8.0
> > > > > > > > release.
> > > > > > > >
> > > > > > > > Please respond to this email by:
> > > > > > > >
> > > > > > > > +1,0,-1 with *binding* if you are a PMC member or
> *non-binding*
> > > if
> > > > > you
> > > > > > > are
> > > > > > > > not.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > Bolke
> > > > > > > >
> > > > > > > > My VOTE: +1 (binding)
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Airflow Meetup @ Paypal (San Jose)

2017-02-06 Thread siddharth anand
I'd +1 the presentation. The panel and roadmap idea is enticing but
challenging since we tend to have disparate company-guided roadmaps that
tend to guide each of our individual efforts to some degree. To some extent
all of the contributors have a mini-roadmap in mind of operators or
features they would like to have implemented.

Gurer, where are the results of the roadmap survey that you collected?
https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg01777.html

If you feel you didn't get enough results, it might be a good idea for us
to kick that off again.
-s

On Fri, Feb 3, 2017 at 10:34 AM, Jayesh Senjaliya <jhsonl...@gmail.com>
wrote:

> Yeah, I think we should have both. Since we only have 2 presentations,
> we will have plenty of time for round table and Q&A
>
> - Jayesh
>
> On Fri, Feb 3, 2017 at 9:49 AM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > I like the idea of a panel talking about direction for the project. The
> > panel could be taking questions from a moderator and from the audience.
> > I'm open to sitting on the panel.
> >
> > I could also give a talk on our A/B testing framework's complex DAG if I
> > can't get the engineers working on it to do it :)
> >
> > Maybe we can do both, and just make the Q/A section we usually have a
> panel
> > Q/A instead of a Max Q/A.
> >
> > Max
> >
> > On Fri, Feb 3, 2017 at 5:07 AM, Bolke de Bruin <bdbr...@gmail.com>
> wrote:
> >
> > > I might. But I would maybe be more interested in a kind of round table
> /
> > > panel session to discuss directions? Does that make sense? Or would you
> > > like me to talk about a specific subject?
> > >
> > > - Bolke.
> > >
> > > > On 3 Feb 2017, at 03:42, siddharth anand <san...@apache.org> wrote:
> > > >
> > > > Cool! I've tweeted it out using the ApacheAirflow account and also
> > added
> > > it
> > > > to https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements
> > > >
> > > > FYI,
> > > > I was mistaken about the drive between Strata and PayPal. I had used
> > the
> > > > wrong venue. Strata is at the SJ Convention Center this year.  It's
> > still
> > > > very close.. 5 miles (12 minutes - reverse commute).
> > > >
> > > > https://goo.gl/maps/nwxmkYsNFKQ2
> > > >
> > > > BTW, I heard Bolke may be attending ;-) Bolke, would you like to
> speak
> > at
> > > > the Meetup?
> > > >
> > > > > Jakob (other committers), will you be down here for Strata?
> > > >
> > > > -s
> > > >
> > > > On Thu, Feb 2, 2017 at 5:02 PM, Jayesh Senjaliya <
> jhsonl...@gmail.com>
> > > > wrote:
> > > >
> > > >> Sure,
> > > > >> I have created an event on Meetup:
> > > >> https://www.meetup.com/Bay-Area-Apache-Airflow-
> > > Incubating-Meetup/events/
> > > >> 237412864/
> > > >>
> > > >> Thanks for helping on this Siddharth.
> > > >> Jayesh
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Feb 1, 2017 at 7:50 PM, siddharth anand <san...@apache.org>
> > > wrote:
> > > >>
> > > >>> IMHO, I'd publish the meet-up. You still have 6 weeks to find a 3rd
> > > >>> speaker. If Bolke and Alex are traveling all the way for Strata,
> > > perhaps
> > > >>> one of them can speak :-)
> > > >>>
> > > >>> -s
> > > >>>
> > > >>> On Wed, Feb 1, 2017 at 1:48 PM, Russell Jurney <
> > > russell.jur...@gmail.com
> > > >>>
> > > >>> wrote:
> > > >>>
> > > >>>> Maybe start a new thread with a title "Call for Speakers for
> Meetup
> > on
> > > >>> Mar
> > > >>>> 14" ?
> > > >>>>
> > > >>>> On Wed, Feb 1, 2017 at 11:59 AM Jayesh Senjaliya <
> > jhsonl...@gmail.com
> > > >
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Yes, we are still waiting for more speakers.
> > > >>>>>
> > > >>>>> can anybody from Airbnb present ?
> > > >>>>>
> > > >>>>> anybody else ?
> > > >>>>>
> > > >>>>>
> > > >>>>

Re: Airflow Meetup 1Q17 Talk Videos

2017-02-06 Thread siddharth anand
Community members,
I'd encourage you to stream your meetups as well since we do have many
remote members in the community that may want to attend real-time.

In some cases, where there are local committers/contributors, we can offer
office hours to promote great in-person attendance.
-s

On Mon, Feb 6, 2017 at 2:27 PM, siddharth anand <san...@apache.org> wrote:

> Thx George.
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links
>
> I've added it to the page above and created a section for all Meetup
> Videos.
>
> Community members,
> For future reference, even if you don't plan to stream, I'd recommend
> recording your meetups so they can live on here.
> -s
>
> On Mon, Feb 6, 2017 at 12:53 PM, George Leslie-Waksman <
> geo...@cloverhealth.com.invalid> wrote:
>
>> Video of the meetup talks and subsequent Q&A is now on YouTube:
>> https://www.youtube.com/watch?v=P0GYZXR0YP4
>>
>
>


Re: Airflow 1.8.0 Release Candidate 1

2017-02-06 Thread siddharth anand
          Table "public.dag_stats"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 dag_id | character varying(250) | not null
 state  | character varying(50)  | not null
 count  | integer                | not null
 dirty  | boolean                | not null
Indexes:
    "dag_stats_pkey" PRIMARY KEY, btree (dag_id, state)


The PKEY is a combination of 2 provided columns, so I'm wondering why
Alembic is complaining here.
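
A minimal sketch of the failure mode, assuming Python's built-in sqlite3
in place of PostgreSQL and SQLAlchemy purely so the example is
self-contained (the helper insert_stat and the sample DAG name are
hypothetical). It suggests the constraint is violated because the logged
INSERT supplies no dag_id, which a composite key without a default cannot
fill in, rather than because the key definition itself is wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns and composite primary key as the table description above.
conn.execute(
    """
    CREATE TABLE dag_stats (
        dag_id VARCHAR(250) NOT NULL,
        state  VARCHAR(50)  NOT NULL,
        count  INTEGER      NOT NULL,
        dirty  BOOLEAN      NOT NULL,
        PRIMARY KEY (dag_id, state)
    )
    """
)


def insert_stat(row):
    """Insert a row given as a dict; return None on success or the error text."""
    columns = ", ".join(row)
    placeholders = ", ".join(":" + name for name in row)
    sql = "INSERT INTO dag_stats ({}) VALUES ({})".format(columns, placeholders)
    try:
        conn.execute(sql, row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Same shape as the failing statement in the log: dag_id is not supplied.
missing = insert_stat({"state": "running", "count": 1, "dirty": False})
# Supplying both key columns succeeds.
ok = insert_stat({"dag_id": "example_dag", "state": "running", "count": 1, "dirty": False})
print(missing, ok)
```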

On Mon, Feb 6, 2017 at 4:24 PM, siddharth anand <san...@apache.org> wrote:

> Actually, I see the error is further down..
>
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py",
> line 469, in do_execute
>
> cursor.execute(statement, parameters)
>
> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
> column "dag_id" violates not-null constraint
>
> DETAIL:  Failing row contains (null, running, 1, f).
>
>  [SQL: 'INSERT INTO dag_stats (state, count, dirty) VALUES (%(state)s,
> %(count)s, %(dirty)s)'] [parameters: {'count': 1L, 'state': u'running',
> 'dirty': False}]
>
> It looks like an autoincrement is missing for this table.
>
>
> I'm running `SQLAlchemy==1.1.4` - I see our setup.py specifies any
> version greater than 0.9.8
>
> -s
>
>
>
> On Mon, Feb 6, 2017 at 4:11 PM, siddharth anand <san...@apache.org> wrote:
>
>> I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
>> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
>> and then running airflow upgradedb, but it didn't quite work. At first, I
>> thought it completed successfully, then saw errors: some tables were indeed
>> missing. I ran it again and encountered the following exception:
>>
>> DB: postgresql://app_coust...@db-cousteau.ep.stage.agari.com:5432/airflow
>>
>> [2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
>>
>> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
>>
>> INFO  [alembic.runtime.migration] Will assume transactional DDL.
>>
>> INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 ->
>> 211e584da130, add TI state index
>>
>> INFO  [alembic.runtime.migration] Running upgrade 211e584da130 ->
>> 64de9cddf6c9, add task fails journal table
>>
>> INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 ->
>> f2ca10b85618, add dag_stats table
>>
>> INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 ->
>> 4addfa1236f1, Add fractional seconds to mysql tables
>>
>> INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 ->
>> 8504051e801b, xcom dag task indices
>>
>> INFO  [alembic.runtime.migration] Running upgrade 8504051e801b ->
>> 5e7d17757c7a, add pid field to TaskInstance
>>
>> INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a ->
>> 127d2bf2dfa7, Add dag_id/state index on dag_run table
>>
>> /usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
>> SAWarning: Column 'dag_stats.dag_id' is marked as a member of the primary
>> key for table 'dag_stats', but has no Python-side or server-side default
>> generator indicated, nor does it indicate 'autoincrement=True' or
>> 'nullable=True', and no explicit value is passed.  Primary key columns
>> typically may not store NULL. Note that as of SQLAlchemy 1.1,
>> 'autoincrement=True' must be indicated explicitly for composite (e.g.
>> multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior is
>> expected for one of the columns in the primary key. CREATE TABLE statements
>> are impacted by this change as well on most backends.
>>
>
>


Re: Airflow 1.8.0 Release Candidate 1

2017-02-06 Thread siddharth anand
Actually, I see the error is further down..

  File
"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line
469, in do_execute

cursor.execute(statement, parameters)

sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
column "dag_id" violates not-null constraint

DETAIL:  Failing row contains (null, running, 1, f).

 [SQL: 'INSERT INTO dag_stats (state, count, dirty) VALUES (%(state)s,
%(count)s, %(dirty)s)'] [parameters: {'count': 1L, 'state': u'running',
'dirty': False}]

It looks like an autoincrement is missing for this table.


I'm running `SQLAlchemy==1.1.4` - I see our setup.py specifies any version
greater than 0.9.8

-s



On Mon, Feb 6, 2017 at 4:11 PM, siddharth anand <san...@apache.org> wrote:

> I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
> and then running airflow upgradedb, but it didn't quite work. At first, I
> thought it completed successfully, then saw errors: some tables were indeed
> missing. I ran it again and encountered the following exception:
>
> DB: postgresql://app_coust...@db-cousteau.ep.stage.agari.com:5432/airflow
>
> [2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
>
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
>
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
>
> INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 ->
> 211e584da130, add TI state index
>
> INFO  [alembic.runtime.migration] Running upgrade 211e584da130 ->
> 64de9cddf6c9, add task fails journal table
>
> INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 ->
> f2ca10b85618, add dag_stats table
>
> INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 ->
> 4addfa1236f1, Add fractional seconds to mysql tables
>
> INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 ->
> 8504051e801b, xcom dag task indices
>
> INFO  [alembic.runtime.migration] Running upgrade 8504051e801b ->
> 5e7d17757c7a, add pid field to TaskInstance
>
> INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a ->
> 127d2bf2dfa7, Add dag_id/state index on dag_run table
>
> /usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
> SAWarning: Column 'dag_stats.dag_id' is marked as a member of the primary
> key for table 'dag_stats', but has no Python-side or server-side default
> generator indicated, nor does it indicate 'autoincrement=True' or
> 'nullable=True', and no explicit value is passed.  Primary key columns
> typically may not store NULL. Note that as of SQLAlchemy 1.1,
> 'autoincrement=True' must be indicated explicitly for composite (e.g.
> multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior is
> expected for one of the columns in the primary key. CREATE TABLE statements
> are impacted by this change as well on most backends.
>


Re: Airflow 1.8.0 Release Candidate 1

2017-02-06 Thread siddharth anand
I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
and then running airflow upgradedb, but it didn't quite work. At first, I
thought it completed successfully, then saw errors: some tables were indeed
missing. I ran it again and encountered the following exception:

DB: postgresql://app_coust...@db-cousteau.ep.stage.agari.com:5432/airflow

[2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.

INFO  [alembic.runtime.migration] Will assume transactional DDL.

INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 ->
211e584da130, add TI state index

INFO  [alembic.runtime.migration] Running upgrade 211e584da130 ->
64de9cddf6c9, add task fails journal table

INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 ->
f2ca10b85618, add dag_stats table

INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 ->
4addfa1236f1, Add fractional seconds to mysql tables

INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 ->
8504051e801b, xcom dag task indices

INFO  [alembic.runtime.migration] Running upgrade 8504051e801b ->
5e7d17757c7a, add pid field to TaskInstance

INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a ->
127d2bf2dfa7, Add dag_id/state index on dag_run table

/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
SAWarning: Column 'dag_stats.dag_id' is marked as a member of the primary
key for table 'dag_stats', but has no Python-side or server-side default
generator indicated, nor does it indicate 'autoincrement=True' or
'nullable=True', and no explicit value is passed.  Primary key columns
typically may not store NULL. Note that as of SQLAlchemy 1.1,
'autoincrement=True' must be indicated explicitly for composite (e.g.
multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior is
expected for one of the columns in the primary key. CREATE TABLE statements
are impacted by this change as well on most backends.


Re: Airflow Meetup 1Q17 Talk Videos

2017-02-06 Thread siddharth anand
Thx George.


https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links

I've added it to the page above and created a section for all Meetup
Videos.

Community members,
For future reference, even if you don't plan to stream, I'd recommend
recording your meetups so they can live on here.
-s

On Mon, Feb 6, 2017 at 12:53 PM, George Leslie-Waksman <
geo...@cloverhealth.com.invalid> wrote:

> Video of the meetup talks and subsequent Q&A is now on YouTube:
> https://www.youtube.com/watch?v=P0GYZXR0YP4
>


Re: NYC Airflow Meetup

2017-02-03 Thread siddharth anand
Great!
Thanks for creating it - I've just joined so you can add me as an
organizer.

I've linked to it on :

   - https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements
   - https://twitter.com/ApacheAirflow/status/827743162789605382
   - https://cwiki.apache.org/confluence/display/AIRFLOW/Meetups


-s

On Fri, Feb 3, 2017 at 11:01 AM, Joseph Napolitano <
joseph.napolit...@blueapron.com.invalid> wrote:

> Hi all,
>
> I want to thank everyone for attending NYC's first Airflow meetup at Blue
> Apron.  It was a huge success and we're glad to have met everyone.
>
> As suggested, we decided to create an official NYC Meetup page, sponsored
> by Blue Apron.  We'll add Sid and Max as Organizers.  Let us know if you
> want to help organize.
>
> https://www.meetup.com/NYC-Apache-Airflow-incubating-Meetup/
>
> I planned on taking video of the presentations, but it completely slipped
> my mind!  I'll upload my slides to Slideshare and provide a small writeup
> to complement them.
>
> We're committed to Airflow at Blue Apron and we love the project.  Now that
> our infrastructure is taking shape, we'll have time to contribute back to
> the project.  We have top-down support at Blue Apron to dedicate company
> time for it.
>
> Feel free to connect anytime!
> https://www.linkedin.com/in/joenap
>
> Thanks again,
> *Joe Napolitano *| Sr. Data Engineer
> www.blueapron.com | 5 Crosby Street, New York, NY 10013
>


Re: Airflow Meetup in NYC @ Blue Apron

2017-02-02 Thread siddharth anand
Hope this went well. Feel free to share videos and slides. Also, it would
be great if we could create a NY Apache Airflow meetup page. Would you be
interested in setting one up? It would be easier to promote a meetup page
on social media than an email on this list.

-s

On Fri, Jan 20, 2017 at 10:37 AM, Joseph Napolitano <
joseph.napolit...@blueapron.com.invalid> wrote:

> Hi all!
>
> I want to officially announce a Meetup for Airflow in NYC!  I'm looking
> forward to meeting other community members to share knowledge and network.
>
> We may create an official Meetup page, but in the meantime please sign up
> here:
> https://docs.google.com/spreadsheets/d/1WmfgZeExSVdLf-
> u1uh3IleeHy8QTwaJ4BkkSkVM-X1E/edit?usp=sharing
>
> I have a confirmed date of February 1st @ 6:30 at Blue Apron's
> headquarters.
>
> In Summary:
> Date: Feb 1st
> Time: 6:30 - 9pm EST
> Location: 40 W 23rd St. New York, NY 10010
> https://www.google.com/maps/place/40+W+23rd+St,+New+York,+
> NY+10010/@40.7420885,-73.9938457,17z/data=!3m1!4b1!4m5!
> 3m4!1s0x89c259a46471d2a1:0xc2517d92b1b68bba!8m2!3d40.
> 7420845!4d-73.9916517?hl=en
>
> We're on the 5th floor.  You need to check in with security in the building
> lobby, and again when you reach the fifth floor to get a name tag.
>
> Food & drink will be provided!
>
> Let me know if you would like to present.  We'd love to hear about your
> architecture and war stories.  We will have a large projector and PA system
> set up.
>
> Sorry about the short notice, but it took a while to get approved over the
> holidays and new year.  If we can't generate enough interest we can
> certainly push it back a month.
>
> Thanks, and Bon Appétit!
>
> --
> *Joe Napolitano *| Sr. Data Engineer
> www.blueapron.com | 5 Crosby Street, New York, NY 10013
>


Re: Airflow Meetup @ Paypal (San Jose)

2017-02-02 Thread siddharth anand
Cool! I've tweeted it out using the ApacheAirflow account and also added it
to https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements

FYI,
I was mistaken about the drive between Strata and PayPal. I had used the
wrong venue. Strata is at the SJ Convention Center this year.  It's still
very close.. 5 miles (12 minutes - reverse commute).

https://goo.gl/maps/nwxmkYsNFKQ2

BTW, I heard Bolke may be attending ;-) Bolke, would you like to speak at
the Meetup?

Jakob (other committers), will you be down here for Strata?

-s

On Thu, Feb 2, 2017 at 5:02 PM, Jayesh Senjaliya <jhsonl...@gmail.com>
wrote:

> Sure,
> I have created an event on Meetup:
> https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/
> 237412864/
>
> Thanks for helping on this Siddharth.
> Jayesh
>
>
>
> On Wed, Feb 1, 2017 at 7:50 PM, siddharth anand <san...@apache.org> wrote:
>
> > IMHO, I'd publish the meet-up. You still have 6 weeks to find a 3rd
> > speaker. If Bolke and Alex are traveling all the way for Strata, perhaps
> > one of them can speak :-)
> >
> > -s
> >
> > On Wed, Feb 1, 2017 at 1:48 PM, Russell Jurney <russell.jur...@gmail.com
> >
> > wrote:
> >
> > > Maybe start a new thread with a title "Call for Speakers for Meetup on
> > Mar
> > > 14" ?
> > >
> > > On Wed, Feb 1, 2017 at 11:59 AM Jayesh Senjaliya <jhsonl...@gmail.com>
> > > wrote:
> > >
> > > > Yes, we are still waiting for more speakers.
> > > >
> > > > can anybody from Airbnb present ?
> > > >
> > > > anybody else ?
> > > >
> > > >
> > > > - Jayesh
> > > >
> > > > On Tue, Jan 31, 2017 at 8:16 PM, siddharth anand <san...@apache.org>
> > > > wrote:
> > > >
> > > > > Jayesh,
> > > > > Looks good. No need to vote. Just publish a new event with details
> on
> > > the
> > > > > meet-up page:
> > > > > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
> > > > >
> > > > > Please add a short abstract as well for the talks and find a 3rd
> > > speaker.
> > > > > Please be sure to record the meet-up so that we can publish it.
> Once
> > > the
> > > > > meet-up event is up, please respond to this email! We can help
> > promote
> > > > it.
> > > > > I suggest picking a start time after the Strata talks end but not
> > super
> > > > > late either.
> > > > >
> > > > > -s
> > > > >
> > > > > On Tue, Jan 31, 2017 at 9:19 AM, Jayesh Senjaliya <
> > jhsonl...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > HI All,
> > > > > >
> > > > > > √ I have approval from Paypal to host Airflow meetup.  How about
> > > March
> > > > > 14th
> > > > > > ? Please vote.
> > > > > >
> > > > > > √ we will have food and drinks.
> > > > > > Please let me know if anybody has any special request, I will try
> > to
> > > > > > accommodate :)
> > > > > >
> > > > > > For presentations:
> > > > > >  1) Disk recommission using airflow with overall automation of
> > > "Hadoop
> > > > > Node
> > > > > > and Disk Remediation". - Jayesh Senjaliya ( Paypal )
> > > > > >  2) Predictive Analytics with Airflow and PySpark - ( Russell
> > Jurney
> > > )
> > > > > >
> > > > > >
> > > > > > Please send request to present to this email thread if you are
> > > > interested
> > > > > > in presenting.
> > > > > >
> > > > > > Thanks
> > > > > > Jayesh
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 26, 2017 at 4:08 PM, Russell Jurney <
> > > > > russell.jur...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Cool!
> > > > > > >
> > > > > > > On Wed, Jan 25, 2017 at 11:23 PM Jayesh Senjaliya <
> > > > jhsonl...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > 

Re: Airflow Meetup @ Paypal (San Jose)

2017-02-01 Thread siddharth anand
IMHO, I'd publish the meet-up. You still have 6 weeks to find a 3rd
speaker. If Bolke and Alex are traveling all the way for Strata, perhaps
one of them can speak :-)

-s

On Wed, Feb 1, 2017 at 1:48 PM, Russell Jurney <russell.jur...@gmail.com>
wrote:

> Maybe start a new thread with a title "Call for Speakers for Meetup on Mar
> 14" ?
>
> On Wed, Feb 1, 2017 at 11:59 AM Jayesh Senjaliya <jhsonl...@gmail.com>
> wrote:
>
> > Yes, we are still waiting for more speakers.
> >
> > can anybody from Airbnb present ?
> >
> > anybody else ?
> >
> >
> > - Jayesh
> >
> > On Tue, Jan 31, 2017 at 8:16 PM, siddharth anand <san...@apache.org>
> > wrote:
> >
> > > Jayesh,
> > > Looks good. No need to vote. Just publish a new event with details on
> the
> > > meet-up page:
> > > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
> > >
> > > Please add a short abstract as well for the talks and find a 3rd
> speaker.
> > > Please be sure to record the meet-up so that we can publish it. Once
> the
> > > meet-up event is up, please respond to this email! We can help promote
> > it.
> > > I suggest picking a start time after the Strata talks end but not super
> > > late either.
> > >
> > > -s
> > >
> > > On Tue, Jan 31, 2017 at 9:19 AM, Jayesh Senjaliya <jhsonl...@gmail.com
> >
> > > wrote:
> > >
> > > > HI All,
> > > >
> > > > √ I have approval from Paypal to host Airflow meetup.  How about
> March
> > > 14th
> > > > ? Please vote.
> > > >
> > > > √ we will have food and drinks.
> > > > Please let me know if anybody has any special request, I will try to
> > > > accommodate :)
> > > >
> > > > For presentations:
> > > >  1) Disk recommission using airflow with overall automation of
> "Hadoop
> > > Node
> > > > and Disk Remediation". - Jayesh Senjaliya ( Paypal )
> > > >  2) Predictive Analytics with Airflow and PySpark - ( Russell Jurney
> )
> > > >
> > > >
> > > > Please send request to present to this email thread if you are
> > interested
> > > > in presenting.
> > > >
> > > > Thanks
> > > > Jayesh
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Jan 26, 2017 at 4:08 PM, Russell Jurney <
> > > russell.jur...@gmail.com>
> > > > wrote:
> > > >
> > > > > Cool!
> > > > >
> > > > > On Wed, Jan 25, 2017 at 11:23 PM Jayesh Senjaliya <
> > jhsonl...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Russell,
> > > > > >
> > > > > > yes, I will be presenting from Paypal side.
> > > > > > > Once I have official approval from Paypal, I will send out an email.
> > > > > > > I am basically going by the steps that Siddharth outlined earlier in
> > > > > > > the thread.
> > > > > >
> > > > > > Thanks
> > > > > > Jayesh
> > > > > >
> > > > > > On Wed, Jan 25, 2017 at 7:50 PM, Russell Jurney <
> > > > > russell.jur...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Is someone from Paypal likely to speak? Should we start a new
> > > thread
> > > > > > with a
> > > > > > > call for another speaker? There was mention of three being
> > needed.
> > > > > > >
> > > > > > > On Wed, Jan 25, 2017 at 5:33 PM Jayesh Senjaliya <
> > > > jhsonl...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Yes I am waiting for response from facilities about it, most
> > > likely
> > > > > by
> > > > > > > > early next week.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Jayesh
> > > > > > > >
> > > > > > > > On Wed, Jan 25, 2017 at 4:52 PM, Russell Jurney <
> > > > > > > russell.jur...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >

Re: Airflow Meetup in NYC @ Blue Apron

2017-02-01 Thread siddharth anand
Also, if you record a video, we'd be happy to place it on the wiki and
promote it via our twitter feed, etc...
-s

On Mon, Jan 30, 2017 at 5:34 PM, Boris Tyukin  wrote:

> i hope you guys can share presentation slides at least for all of us who
> are not in NYC
>
> On Mon, Jan 30, 2017 at 7:33 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > I'd love to watch, is there any way you guys can livecast or share a
> video
> > after the event?
> >
> > Looking forward to it!
> >
> > Max
> >
> > On Mon, Jan 30, 2017 at 1:56 PM, Joseph Napolitano <
> > joseph.napolit...@blueapron.com.invalid> wrote:
> >
> > > Hi All!
> > >
> > > We are excited to host an Airflow Meetup in NYC.  We will have a guest
> > > speaker from Spotify!
> > >
> > > The Meetup is in 2 days, on Feb 1st @ 6:30pm at Blue Apron's
> > headquarters.
> > >
> > > In Summary:
> > > Date: Feb 1st
> > > Time 6:30 - 9pm EST
> > > Location: 40 W 23rd St. New York, NY 10010
> > > https://www.google.com/maps/place/40+W+23rd+St,+New+York,+NY
> > > +10010/@40.7420885,-73.9938457,17z/data=!3m1!4b1!4m5!3m4!
> > > 1s0x89c259a46471d2a1:0xc2517d92b1b68bba!8m2!3d40.
> > > 7420845!4d-73.9916517?hl=en
> > >
> > > Schedule:
> > > 6:30 - 7:15 Meet and greet
> > > 7:15 - ? Presentations from Blue Apron and Spotify
> > >
> > > It's not too late to signup for a presentation.  We will stick around
> as
> > > late as 9pm.
> > >
> > > We don't have an official Meetup page, so please sign up here :)
> > > The signup sheet is available here:
> > > https://docs.google.com/spreadsheets/d/1WmfgZeExSVdLf-u1uh3I
> > > leeHy8QTwaJ4BkkSkVM-X1E/edit?usp=sharing
> > >
> > > Feel free to share the signup sheet with other parties.
> > >
> > > As mentioned, we're on the 5th floor.  You need to check in with
> security
> > > in the building lobby, and again when you reach the fifth floor to get
> a
> > > name tag.
> > >
> > > Thanks, and looking forward to meeting everyone!
> > >
> > > Cheers,
> > > Joe Nap
> > >
> > >
> > >
> > > On Fri, Jan 20, 2017 at 1:37 PM, Joseph Napolitano <
> > > joseph.napolit...@blueapron.com> wrote:
> > >
> > > > Hi all!
> > > >
> > > > I want to officially announce a Meetup for Airflow in NYC!  I'm
> looking
> > > > forward to meeting other community members to share knowledge and
> > > network.
> > > >
> > > > We may create an official Meetup page, but in the meantime please
> > signup
> > > > here:
> > > > https://docs.google.com/spreadsheets/d/1WmfgZeExSVdLf-u1uh3I
> > > > leeHy8QTwaJ4BkkSkVM-X1E/edit?usp=sharing
> > > >
> > > > I have a confirmed date of February 1st @ 6:30 at Blue Apron's
> > > > headquarters.
> > > >
> > > > In Summary:
> > > > Date: Feb 1st
> > > > Time 6:30 - 9pm EST
> > > > Location: 40 W 23rd St. New York, NY 10010
> > > > https://www.google.com/maps/place/40+W+23rd+St,+New+York,+NY
> > > > +10010/@40.7420885,-73.9938457,17z/data=!3m1!4b1!4m5!3m4!
> > > > 1s0x89c259a46471d2a1:0xc2517d92b1b68bba!8m2!3d40.7420845!4d-
> > > > 73.9916517?hl=en
> > > >
> > > > We're on the 5th floor.  You need to check in with security in the
> > > > building lobby, and again when you reach the fifth floor to get a
> name
> > > tag.
> > > >
> > > > Food & drink will be provided!
> > > >
> > > > Let me know if you would like to present.  We'd love to hear about
> your
> > > > architecture and war stories.  We will have a large projector and PA
> > > system
> > > > setup.
> > > >
> > > > Sorry about the short notice, but it took a while to get approved
> over
> > > the
> > > > holidays and new year.  If we can't generate enough interest we can
> > > > certainly push it back a month.
> > > >
> > > > > Thanks, and Bon Appétit!
> > > >
> > > > --
> > > > *Joe Napolitano *| Sr. Data Engineer
> > > > www.blueapron.com | 5 Crosby Street, New York, NY 10013
> > > >
> > >
> > >
> > >
> > > --
> > > *Joe Napolitano *| Sr. Data Engineer
> > > www.blueapron.com | 5 Crosby Street, New York, NY 10013
> > >
> >
>


Re: Airflow Meetup @ Paypal (San Jose)

2017-01-31 Thread siddharth anand
Jayesh,
Looks good. No need to vote. Just publish a new event with details on the
meet-up page:
https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/

Please add a short abstract as well for the talks and find a 3rd speaker.
Please be sure to record the meet-up so that we can publish it. Once the
meet-up event is up, please respond to this email! We can help promote it.
I suggest picking a start time after the Strata talks end but not super
late either.

-s

On Tue, Jan 31, 2017 at 9:19 AM, Jayesh Senjaliya <jhsonl...@gmail.com>
wrote:

> HI All,
>
> √ I have approval from Paypal to host Airflow meetup.  How about March 14th
> ? Please vote.
>
> √ we will have food and drinks.
> Please let me know if anybody has any special request, I will try to
> accommodate :)
>
> For presentations:
>  1) Disk recommission using airflow with overall automation of "Hadoop Node
> and Disk Remediation". - Jayesh Senjaliya ( Paypal )
>  2) Predictive Analytics with Airflow and PySpark - ( Russell Jurney )
>
>
> Please send request to present to this email thread if you are interested
> in presenting.
>
> Thanks
> Jayesh
>
>
>
>
> On Thu, Jan 26, 2017 at 4:08 PM, Russell Jurney <russell.jur...@gmail.com>
> wrote:
>
> > Cool!
> >
> > On Wed, Jan 25, 2017 at 11:23 PM Jayesh Senjaliya <jhsonl...@gmail.com>
> > wrote:
> >
> > > Hi Russell,
> > >
> > > yes, I will be presenting from Paypal side.
> > > > Once I have official approval from Paypal, I will send out an email.
> > > > I am basically going by the steps that Siddharth outlined earlier in the
> > > > thread.
> > >
> > > Thanks
> > > Jayesh
> > >
> > > On Wed, Jan 25, 2017 at 7:50 PM, Russell Jurney <
> > russell.jur...@gmail.com>
> > > wrote:
> > >
> > > > Is someone from Paypal likely to speak? Should we start a new thread
> > > with a
> > > > call for another speaker? There was mention of three being needed.
> > > >
> > > > On Wed, Jan 25, 2017 at 5:33 PM Jayesh Senjaliya <
> jhsonl...@gmail.com>
> > > > wrote:
> > > >
> > > > > Yes I am waiting for response from facilities about it, most likely
> > by
> > > > > early next week.
> > > > >
> > > > > Thanks
> > > > > Jayesh
> > > > >
> > > > > On Wed, Jan 25, 2017 at 4:52 PM, Russell Jurney <
> > > > russell.jur...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Boris, would you be able to attend an evening meetup on the
> nights
> > of
> > > > > 3/15
> > > > > > or 3/16? I think attendance would be better on one of those days,
> > as
> > > > many
> > > > > > people don't attend the tutorial days.
> > > > > >
> > > > > > Paypal sounds awesome as a venue. Would they handle food and
> drink
> > as
> > > > > well?
> > > > > >
> > > > > > On Wed, Jan 25, 2017 at 11:28 AM, Boris Tyukin <
> > > bo...@boristyukin.com>
> > > > > > wrote:
> > > > > >
> > > > > > > it would be great!
> > > > > > >
> > > > > > > On Wed, Jan 25, 2017 at 1:26 PM, siddharth anand <
> > > san...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Paypal is quite close (11 minute drive on local streets per
> > > google
> > > > > > Maps :
> > > > > > > > https://goo.gl/maps/otUpve9StxJ2) to the Strata venue, so it
> > > would
> > > > > > make
> > > > > > > > sense to hold the meet-up at Paypal during Strata week.
> > > > > > > >
> > > > > > > > -s
> > > > > > > >
> > > > > > > > On Wed, Jan 25, 2017 at 5:48 AM, Boris Tyukin <
> > > > bo...@boristyukin.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > any way to schedule it during Strata week? would love to
> > attend
> > > > one
> > > > > > of
> > > > > > > > > airflow meetups but I am in Florida. 03/13 or 03/14 would
> > work
> > > > the
> > > > > > bes

Re: Airflow Meetup @ Paypal (San Jose)

2017-01-25 Thread siddharth anand
Paypal is quite close (11 minute drive on local streets per google Maps :
https://goo.gl/maps/otUpve9StxJ2) to the Strata venue, so it would make
sense to hold the meet-up at Paypal during Strata week.

-s

On Wed, Jan 25, 2017 at 5:48 AM, Boris Tyukin <bo...@boristyukin.com> wrote:

> any way to schedule it during Strata week? would love to attend one of
> airflow meetups but I am in Florida. 03/13 or 03/14 would work the best
> because first two days of Strata are training days and not very busy
>
> On Tue, Jan 24, 2017 at 10:33 PM, Russell Jurney <russell.jur...@gmail.com
> >
> wrote:
>
> > Unfortunately, Strata has no room for us :( Paypal sounds like a great
> > option.
> >
> > Jayesh, sounds like you're driving? :)
> >
> > On Tue, Jan 24, 2017 at 12:04 PM, siddharth anand <san...@apache.org>
> > wrote:
> >
> > > Russell,
> > > Let us know what you learn about Strata.
> > >
> > > Even if Strata offers up rooms to communities for free (based on
> > > information such as community size, etc...), I'm doubtful they would
> > cover
> > > food and drinks. That cost would need to be carried by a sponsor --
> i.e.
> > > you'd need to find a sponsor for it. We considered something similar
> for
> > > QCon -- however, our venue costs were fairly high so the catering cost
> > for
> > > most budding communities and their sponsors were a turn-off. Given that
> > > Strata is a large conference hosted at a largish (i.e. expensive)
> hotel,
> > > I'd expect some of the same cost issues, unless Strata co-sponsored it.
> > >
> > > I'm all for something at Strata, but just wanted to share my $0.02.
> Since
> > > this topic came up on Jayesh's thread, I'd like to time-bound it. If
> you
> > > don't hear back by say Friday with specifics from Strata, I'd say that
> > > Jayesh's wins by first-mover privilege.
> > >
> > > Jayesh,
> > > If we don't hear from Strata by Friday, I'd say we continue with your
> > idea.
> > > I've already promoted your user to Event Organizer on
> > > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
> > >
> > > You'd need to follow the steps below:
> > >
> > >- Get approval from Paypal to host it
> > >- Ping this list for 2 more speakers - I'd imagine someone from
> PayPal
> > >will also speak about PayPal's use of Airflow.
> > >- Create the meet-up event (ideally once you have all 3 speakers)
> > >- Update this list with a link to this event (and ping me if I don't
> > see
> > >it) -- I'll then promote it on our twitter channel, etc...
> > >
> > > -s
> > >
> > > On Mon, Jan 23, 2017 at 4:42 PM, Jayesh Senjaliya <jhsonl...@gmail.com
> >
> > > wrote:
> > >
> > > > I am actually up for both, Paypal can host after Strata.
> > > >
> > > > waiting for community to comment as well.
> > > >
> > > > Thanks
> > > > Jayesh
> > > >
> > > >
> > > > On Mon, Jan 23, 2017 at 3:45 PM, Russell Jurney <
> > > russell.jur...@gmail.com>
> > > > wrote:
> > > >
> > > > > > I reached out and am waiting to hear if they have space. They did say
> > > > > > that attendees of meetups in the evening do NOT need to have a Strata
> > > > > > pass.
> > > > > >
> > > > > > I'm new here, so I don't want to hijack your meetup. If you guys want
> > > > > > Paypal, let's have Paypal host. I'm sure it will be great either way.
> > > > >
> > > > > On Fri, Jan 20, 2017 at 1:10 PM, Russell Jurney <
> > > > russell.jur...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I think if we hold it in the evening, there is no requirement to
> > buy
> > > a
> > > > > > ticket to come to the meetup. Let me verify.
> > > > > >
> > > > > > On Fri, Jan 20, 2017 at 12:45 PM, Jayesh Senjaliya <
> > > > jhsonl...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Russell,
> > > > > >>
> > > > > >> Sure, Strata will have its own flavor of visitors, but the
> tickets
> > > are
> > > > > >> kind of expensive too for everybody to join.
> > > > > >>
> > > >

Re: Airflow Meetup @ Paypal (San Jose)

2017-01-24 Thread siddharth anand
Russell,
Let us know what you learn about Strata.

Even if Strata offers up rooms to communities for free (based on
information such as community size, etc...), I'm doubtful they would cover
food and drinks. That cost would need to be carried by a sponsor -- i.e.
you'd need to find a sponsor for it. We considered something similar for
QCon -- however, our venue costs were fairly high so the catering cost for
most budding communities and their sponsors were a turn-off. Given that
Strata is a large conference hosted at a largish (i.e. expensive) hotel,
I'd expect some of the same cost issues, unless Strata co-sponsored it.

I'm all for something at Strata, but just wanted to share my $0.02. Since
this topic came up on Jayesh's thread, I'd like to time-bound it. If you
don't hear back by say Friday with specifics from Strata, I'd say that
Jayesh's wins by first-mover privilege.

Jayesh,
If we don't hear from Strata by Friday, I'd say we continue with your idea.
I've already promoted your user to Event Organizer on
https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/

You'd need to follow the steps below:

   - Get approval from Paypal to host it
   - Ping this list for 2 more speakers - I'd imagine someone from PayPal
   will also speak about PayPal's use of Airflow.
   - Create the meet-up event (ideally once you have all 3 speakers)
   - Update this list with a link to this event (and ping me if I don't see
   it) -- I'll then promote it on our twitter channel, etc...

-s

On Mon, Jan 23, 2017 at 4:42 PM, Jayesh Senjaliya 
wrote:

> I am actually up for both, Paypal can host after Strata.
>
> waiting for community to comment as well.
>
> Thanks
> Jayesh
>
>
> On Mon, Jan 23, 2017 at 3:45 PM, Russell Jurney 
> wrote:
>
> > I reached out and am waiting to hear if they have space. They did say that
> > attendees of meetups in the evening do NOT need to have a Strata pass.
> >
> > I'm new here, so I don't want to hijack your meetup. If you guys want
> > Paypal, let's have Paypal host. I'm sure it will be great either way.
> >
> > On Fri, Jan 20, 2017 at 1:10 PM, Russell Jurney <
> russell.jur...@gmail.com>
> > wrote:
> >
> > > I think if we hold it in the evening, there is no requirement to buy a
> > > ticket to come to the meetup. Let me verify.
> > >
> > > On Fri, Jan 20, 2017 at 12:45 PM, Jayesh Senjaliya <
> jhsonl...@gmail.com>
> > > wrote:
> > >
> > >> Hi Russell,
> > >>
> > >> Sure, Strata will have its own flavor of visitors, but the tickets are
> > >> kind of expensive too for everybody to join.
> > >>
> > >> I agree on turnouts though, so we can try for Strata first and
> fallback
> > to
> > >> regular
> > >> meetup in March end or even April if we dont get space in Strata.
> > >>
> > >> or we can just do both since there will be different group of people
> at
> > >> both places.
> > >>
> > >> - Jayesh
> > >>
> > >>
> > >> On Fri, Jan 20, 2017 at 12:35 PM, Russell Jurney <
> > >> russell.jur...@gmail.com>
> > >> wrote:
> > >>
> > >> > As I mentioned in the other thread, I am available to speak on
> > >> Predictive
> > >> > Analytics with Airflow and PySpark.
> > >> >
> > >> > Mid march has been suggested. What about the evening of Tuesday,
> 3/14
> > -
> > >> the
> > >> > first day of sessions at Strata? We could promote the meetup with
> the
> > >> > conference, get it listed as an evening event. Alternative day could
> > be
> > >> > Wednesday 3/15, 2nd day of Strata sessions.
> > >> >
> > >> > This brings up the question... should we maybe have the meetup at
> > >> Strata?
> > >> > Just a thought, we might get better turnout if we get a room from
> > >> Strata.
> > >> > I'm sure they would agree. I'm new here; just an idea.
> > >> >
> > >> > Russ
> > >> >
> > >> > On Fri, Jan 20, 2017 at 11:36 AM, Jacky 
> wrote:
> > >> >
> > >> > > Hello Airflow community !
> > >> > >
> > >> > > I am Jayesh from Paypal, and at last meetup we briefly talked
> about
> > >> > > hosting next one and I offered to host at Paypal office in San
> Jose.
> > >> > >
> > > >> > > If we can come up with some dates, I can talk to facilities to reserve
> > > >> > > space accordingly, so that it doesn't become short notice for the
> > > >> > > community.
> > >> > > Any thoughts/comments?
> > >> > >
> > >> > > Thanks
> > >> > > Jayesh
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> relato.io
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
> > >
> >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
> >
>


QCon London

2017-01-23 Thread siddharth anand
Hi Folks!
I will be attending QCon London Mar 5-8. Happy to meet locals and talk
Airflow and data infrastructure if there is interest. FYI, I'm also a
co-chair for QCon London and would be very interested in getting a deeper
understanding of the local (London and environs) tech scene and potential
speakers for next year's QCon London.

If interested in meeting up, you can directly email me.  I'll be at the
conference most days hosted at http://qeiicentre.london/getting-here/
-s


Re: Medium series: Airflow for Google Cloud

2017-01-20 Thread siddharth anand
Looks like you don't have an account.. once you create one.. let me know
and I will grant you admin perms on the wiki.
-s

On Fri, Jan 20, 2017 at 6:08 PM, siddharth anand <san...@apache.org> wrote:

> I've added it to https://cwiki.apache.org/confluence/display/AIRFLOW/
> Airflow+Links
>
> Feel free to add future posts to this page. You should have access.
> -s
>
> On Fri, Jan 20, 2017 at 3:23 PM, Alex Van Boxel <a...@vanboxel.be> wrote:
>
>> Hey all,
>>
>> now that 1.8 is nearing release. I finally started writing about Airflow.
>> As it's me writing, I'll be focussing on the Google Cloud integration.
>>
>> Today's post is about BigQuery
>> https://medium.com/google-cloud/airflow-for-google-cloud-
>> part-1-d7da9a048aa4#.qe6f0gldf
>>
>> Next one will be about DataProc.
>> --
>>   _/
>> _/ Alex Van Boxel
>>
>
>


Re: Medium series: Airflow for Google Cloud

2017-01-20 Thread siddharth anand
I've added it to
https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links

Feel free to add future posts to this page. You should have access.
-s

On Fri, Jan 20, 2017 at 3:23 PM, Alex Van Boxel  wrote:

> Hey all,
>
> now that 1.8 is nearing release. I finally started writing about Airflow.
> As it's me writing, I'll be focussing on the Google Cloud integration.
>
> Today's post is about BigQuery
> https://medium.com/google-cloud/airflow-for-google-
> cloud-part-1-d7da9a048aa4#.qe6f0gldf
>
> Next one will be about DataProc.
> --
>   _/
> _/ Alex Van Boxel
>


Re: New book covers Airflow with PySpark: Agile Data Science 2.0 (O'Reilly, 2017) AND Airflow Meetup?

2017-01-19 Thread siddharth anand
Mid-March might be a good time given that we had 2 meet-ups recently.

We have a wiki about Airflow meet-ups :
https://cwiki.apache.org/confluence/display/AIRFLOW/Meetups. Feel free to
ask this list if someone would like to host. I'd imagine interest would
primarily come from other members of the community, but we're open to all
ideas. Since the last meet-up was in SF, it would be great if the next one
were in the South Bay.

-s

On Thu, Jan 19, 2017 at 6:46 PM, Russell Jurney <russell.jur...@gmail.com>
wrote:

> Siddharth, nice to hear from you. Great to hear!
>
> I'm just starting a consultancy called Data Syndrome around the book, and I
> work from home, which doesn't put me in a great position to personally host
> the meetup. If you need someone to organize it and to seek a venue, I can
> do that. How does that sound? I'm sure I could find someone to host it.
>
> When would be a good date, do you think? Late February?
>
> On Thu, Jan 19, 2017 at 5:19 PM, siddharth anand <san...@apache.org>
> wrote:
>
> > Sounds like a great idea. We are looking for someone to host the next
> one..
> > once one is announced, you can sign up as a speaker.. You are also
> welcome
> > to host a meet-up if you like.
> > -s
> >
> > On Thu, Jan 19, 2017 at 4:39 PM, Russell Jurney <
> russell.jur...@gmail.com>
> > wrote:
> >
> > > Hello! My name is Russell Jurney. I am a relatively new Airflow user
> and
> > > just joined the group. I am an Azkaban refugee, and an enemy of Oozie
> and
> > > the tyranny of XML.
> > >
> > > I wanted to tell you about my new book, out in pre-release, called
> Agile
> > > Data Science 2.0 <http://bit.ly/agile_data_science> (O'Reilly 2017).
> In
> > > the
> > > book, we use Airflow in chapter 2, Setup, in a way similar to the
> Airflow
> > > tutorial. Then, in chapter 8, Deploying Predictive Systems, we use
> > Airflow
> > > to deploy a predictive system built with PySpark and Spark MLlib.
> > >
> > > Some highlights in the code at http://github.com/rjurney/
> > Agile_Data_Code_2
> > > :
> > >
> > >- ch02/airflow_test.py
> > ><https://github.com/rjurney/Agile_Data_Code_2/blob/master/
> > > ch02/airflow_test.py>
> > > is
> > >a complete Airflow/PySpark tutorial along with
> > ch02/pyspark_task_one.py
> > ><https://github.com/rjurney/Agile_Data_Code_2/blob/master/
> > > ch02/pyspark_task_one.py>
> > > and
> > >ch02/pyspark_task_two.py
> > ><https://github.com/rjurney/Agile_Data_Code_2/blob/master/
> > > ch02/pyspark_task_two.py>
> > >- The airflow setup for chapter 8 is at ch08/airflow/setup.py
> > ><https://github.com/rjurney/Agile_Data_Code_2/blob/master/
> > > ch08/airflow/setup.py>
> > >.
> > >- The scripts that it operates on are in ch08/
> > ><https://github.com/rjurney/Agile_Data_Code_2/blob/master/ch08> and
> > > show
> > >things like how to use '{{ ds }}' and other parameters to hook your
> > > scripts
> > >into 'airflow backfill' and other features.
> > >- ch08/make_predictions.py
> > ><https://github.com/rjurney/Agile_Data_Code_2/blob/master/
> > > ch08/make_predictions.py>
> > > shows
> > >how to setup a PySpark environment in a script in a way that can
> work
> > > with
> > >Airflow.
> > >
> > > If there is any interest, I would love to present on something like
> > > "Building Predictive Systems with Spark and Airflow" at an upcoming
> > Airflow
> > > meetup.
> > >
> > > Thanks!
> > > --
> > > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
> > >
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
>


Re: Airflow Github Mirror is not synchronizing

2017-01-03 Thread siddharth anand
The repo mirror is syncing now.
-s
On Sat, Dec 31, 2016 at 11:22 AM siddharth anand <san...@apache.org> wrote:

> FYI!
> I've reopened my earlier JIRA issue. It looks like multiple Apache
> Projects are reporting the same.
> https://issues.apache.org/jira/browse/INFRA-12949
>
> Newly merged changes won't be available to contributors/users until the
> mirroring issue is fixed.
>
>
>


Airflow Github Mirror is not synchronizing

2016-12-31 Thread siddharth anand
FYI!
I've reopened my earlier JIRA issue. It looks like multiple Apache Projects
are reporting the same.
https://issues.apache.org/jira/browse/INFRA-12949

Newly merged changes won't be available to contributors/users until the
mirroring issue is fixed.


[AIRFLOW-676] Do not allow Pools with 0 slots

2016-12-31 Thread siddharth anand
https://github.com/apache/incubator-airflow/pull/1967

Hi Folks!
Would appreciate your feedback on the following pull request.

If you'd like this functionality, please provide a +1 on the PR itself.
-s
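
For readers skimming the archive, the rule named in the subject line can be sketched as a small validation step. This is only an illustration and not the code in the pull request: the function name `validate_pool`, the exception type, and the decision to also reject negative or non-numeric slot counts are assumptions made for the example.

```python
class PoolValidationError(ValueError):
    """Raised when a pool definition is rejected (hypothetical type)."""


def validate_pool(name, slots):
    """Return a normalized (name, slots) pair or raise PoolValidationError.

    Hypothetical helper: a pool with zero slots can never run a task,
    so it is rejected up front instead of silently starving its tasks.
    """
    if not name or not name.strip():
        raise PoolValidationError("pool name must not be empty")
    try:
        slots = int(slots)
    except (TypeError, ValueError):
        raise PoolValidationError("slots must be an integer, got %r" % (slots,))
    if slots < 1:
        raise PoolValidationError(
            "pool %r must have at least 1 slot, got %d" % (name, slots))
    return name.strip(), slots
```

Rejecting the value where the pool is created or edited keeps the failure close to the user's action, rather than surfacing later as tasks that stay queued forever.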


Re: Podling Report Reminder - January 2017

2016-12-30 Thread siddharth anand
I'll put this together.

-s

On Thu, Dec 29, 2016 at 6:31 PM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 18 January 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, January 04).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/January2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Re: Airflow 2.0

2016-12-06 Thread siddharth anand
Max,
Do you have time to summarize this thread? Perhaps, publish it on the Wiki!
-s

On Thu, Dec 1, 2016 at 12:27 PM, Van Klaveren, Brian N. <
b...@slac.stanford.edu> wrote:

> With the announcement of AWS Batch (https://aws.amazon.com/batch/), and
> my own selfish needs, I think it'd be really great to generally support
> Batch systems like AWS Batch, Slurm, and Torque as executors, potentially
> with an extension of the BashOperator, but I think it might actually be
> flexible enough to not need a dedicated BatchOperator.
>
> Brian
>
>
> On Nov 24, 2016, at 7:40 AM, Maycock, Luke wrote:
>
> Add FK to dag_run to the task_instance table on Postgres so that
> task_instances can be uniquely attributed to dag runs.
>
>
> + 1
>
>
> Also, I believe xcoms would need to be addressed in the same way at the
> same time - I have added a comment to that effect on
> https://issues.apache.org/jira/browse/AIRFLOW-642
>
>
> I believe this would be implemented for all supported back-ends, not just
> PostgreSQL.
>
>
> Cheers,
> Luke Maycock
> OLIVER WYMAN
> luke.mayc...@affiliate.oliverwyman.com
> www.oliverwyman.com
>
>
>
> 
> From: Arunprasad Venkatraman
> Sent: 21 November 2016 18:16
> To: dev@airflow.incubator.apache.org
> Subject: Re: Airflow 2.0
>
> Add FK to dag_run to the task_instance table on Postgres so that
> task_instances can be uniquely attributed to dag runs.
> Ensure scheduler can be run continuously without needing restarts.
> Ensure scheduler can handle tens of thousands of active workflows
>
> +1
>
> We are planning to run around 40,000 tasks a day using airflow and some of
> them are critical to give quick feedback to developers. Currently having
> execution date to uniquely identify tasks does not work for us since we
> mainly trigger dags (instead of running them on schedule). And we collide
> with 1 sec granularity on several occasions.  Having a task uuid or
> associating dag_run with the task_instance table, as suggested by Sergei, will help
> mitigate this issue for us and would make it easy for us to update task
> results too. We would be happy to start working on this if it makes sense.
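
The collision described above, and the proposed fix of attributing each task instance to a dag run, can be sketched with an in-memory SQLite database. This is an illustration only: the table layout, column names, and the `trigger` helper are simplified assumptions for the example and are not Airflow's actual schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE dag_run (
    id             INTEGER PRIMARY KEY,
    dag_id         TEXT NOT NULL,
    execution_date TEXT NOT NULL          -- no longer needs to be unique
);
CREATE TABLE task_instance (
    task_id    TEXT NOT NULL,
    dag_run_id INTEGER NOT NULL REFERENCES dag_run(id),
    state      TEXT,
    PRIMARY KEY (task_id, dag_run_id)     -- identity comes from the run
);
""")


def trigger(dag_id, execution_date, task_ids):
    """Create one dag_run row and its task_instance rows; return the run id."""
    cur = conn.execute(
        "INSERT INTO dag_run (dag_id, execution_date) VALUES (?, ?)",
        (dag_id, execution_date))
    run_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO task_instance (task_id, dag_run_id, state) "
        "VALUES (?, ?, 'queued')",
        [(task_id, run_id) for task_id in task_ids])
    return run_id


# Two manual triggers landing in the same second no longer collide.
first = trigger("build", "2016-11-21T18:16:00", ["compile", "test"])
second = trigger("build", "2016-11-21T18:16:00", ["compile", "test"])
row_count = conn.execute("SELECT COUNT(*) FROM task_instance").fetchone()[0]
```

Because the task instance key is (task_id, dag_run_id) rather than anything derived from the timestamp, both triggered runs keep their own rows, and the foreign key prevents task instances that point at a run that does not exist.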
>
> Also, we are wondering whether any work has been done in the community to support
> multiple schedulers (or alternatives to MySQL/Postgres), because one scheduler
> does not scale well for us and we sometimes see slowdowns of up to a couple of
> minutes when there are several pending tasks.
>
> Thanks
>
>
>
> On Mon, Nov 21, 2016 at 9:57 AM, Chris Riccomini wrote:
>
> Ensure scheduler can be run continuously without needing restarts
>
> +1
>
> On Mon, Nov 21, 2016 at 5:25 AM, David Batista wrote:
> A small request, which might be handy.
>
> Having the possibility to select multiple tasks and mark them as
> Success/Clear/etc.
>
> Allow the UI to select individual tasks (i.e., inside the Tree View) and
> then have a button to mark them as Success/Clear/etc.
>
> On 21 November 2016 at 14:22, Sergei Iakhnin wrote:
>
> I've been running Airflow on 1500 cores in the context of scientific
> workflows for the past year and a half. Features that would be important to
> me for 2.0:
>
> - Add FK to dag_run to the task_instance table on Postgres so that
>   task_instances can be uniquely attributed to dag runs.
> - Ensure scheduler can be run continuously without needing restarts. Right
>   now it gets into some ill-determined bad state forcing me to restart it
>   every 20 minutes.
> - Ensure scheduler can handle tens of thousands of active workflows. Right
>   now this results in extremely long scheduling times and inconsistent
>   scheduling even at 2 thousand active workflows.
> - Add more flexible task scheduling prioritization. The default
>   prioritization is the opposite of the behaviour I want. I would prefer that
>   downstream tasks always have higher priority than upstream tasks to cause
>   entire workflows to tend to complete sooner, rather than scheduling tasks
>   from other workflows. Having a few scheduling prioritization strategies
>   would be beneficial here.
> - Provide better support for manually-triggered DAGs on the UI i.e. by
>   showing them as queued.
> - Provide some resource management capabilities via something like slots
>   that can be defined on workers and occupied by tasks. Using celery's
>   concurrency parameter at the airflow server level is too coarse-grained as
>   it forces all workers to be the same, and does not allow proper resource
>   management when different workflow tasks have different resource
>   requirements
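
The prioritization request in the list above can be illustrated with a small,
self-contained sketch. It is not Airflow code: the graph, the task names, and
both strategy functions are hypothetical, and the weights are simple counts
chosen for illustration. It contrasts a strategy where upstream tasks outrank
downstream ones (the default behaviour as described in the message) with the
requested one where downstream tasks outrank upstream ones.

```python
# Hypothetical toy DAG: task name -> list of direct downstream task names.
DOWNSTREAM = {
    "extract": ["transform"],
    "transform": ["load", "report"],
    "load": [],
    "report": [],
}


def reachable(task, graph):
    """Return the set of all tasks reachable from `task` by following `graph`."""
    seen = set()
    stack = list(graph[task])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def upstream_first(graph):
    """Weight = 1 + number of downstream tasks, so root tasks rank highest."""
    return {task: 1 + len(reachable(task, graph)) for task in graph}


def downstream_first(graph):
    """Weight = 1 + number of upstream tasks, so leaf tasks rank highest."""
    # Invert the edges so that "reachable" walks upstream instead.
    reverse = {task: [] for task in graph}
    for parent, children in graph.items():
        for child in children:
            reverse[child].append(parent)
    return {task: 1 + len(reachable(task, reverse)) for task in graph}
```

With the second strategy, a worker that always picks the highest weight would
finish an already-started workflow before starting the root task of another.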

Re: Simple Feature Request

2016-11-23 Thread siddharth anand
I keep forgetting about inline images and attachments being stripped out by
apache mail servers.. ugh..

https://www.dropbox.com/s/4wrju14j3gh09mk/Screenshot%202016-11-23%2012.24.45.png?dl=0
-s

On Wed, Nov 23, 2016 at 9:37 PM, Sumit Maheshwari <sumeet.ma...@gmail.com>
wrote:

> +1.. nice to have it..
>
>
>
> On Thu, Nov 24, 2016 at 3:51 AM, siddharth anand <san...@apache.org>
> wrote:
>
> > If we support a test query dialog, we could just execute the query via
> each
> > hook's pre-existing execute method.
> >
> > -s
> >
> > On Wed, Nov 23, 2016 at 2:04 PM, Bolke de Bruin <bdbr...@gmail.com>
> wrote:
> >
> > > It is nice, but probably not so simple. Every hook would need to
> > > incorporate a “test” function.
> > >
> > >
> > > > On 23 Nov 2016, at 21:28, siddharth anand <san...@apache.org> wrote:
> > > >
> > > > Folks!
> > > > Here's a nice (simple) feature request if someone would like to
> > > implement it and file a PR.
> > > >
> > > > On the Admin-->Connections page, you can add a connection. But how do
> > > you know if it works? How about adding a "Test Connection" button on
> the
> > > edit page to test if the parameters you have specified are valid. You
> > might
> > > be able to also issue a test query from the edit screen.
> > > >
> > > > Currently, we don't validate the connections in terms of testing them
> > > because we have some default ones (as examples). In the future, we
> might
> > > want to consider removing those example connections or hiding them
> > behind a
> > > config param similar to load_examples, which is used to load the
> example
> > > dags into the Web app.
> > > >
> > > >
> > > > -s
> > >
> > >
> >
>


Re: Simple Feature Request

2016-11-23 Thread siddharth anand
If we support a test query dialog, we could just execute the query via each
hook's pre-existing execute method.

-s

On Wed, Nov 23, 2016 at 2:04 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:

> It is nice, but probably not so simple. Every hook would need to
> incorporate a “test” function.
>
>
> > On 23 Nov 2016, at 21:28, siddharth anand <san...@apache.org> wrote:
> >
> > Folks!
> > Here's a nice (simple) feature request if someone would like to
> implement it and file a PR.
> >
> > On the Admin-->Connections page, you can add a connection. But how do
> you know if it works? How about adding a "Test Connection" button on the
> edit page to test if the parameters you have specified are valid. You might
> be able to also issue a test query from the edit screen.
> >
> > Currently, we don't validate the connections in terms of testing them
> because we have some default ones (as examples). In the future, we might
> want to consider removing those example connections or hiding them behind a
> config param similar to load_examples, which is used to load the example
> dags into the Web app.
> >
> >
> > -s
>
>


Simple Feature Request

2016-11-23 Thread siddharth anand
Folks!
Here's a nice (simple) feature request if someone would like to implement
it and file a PR.

On the Admin-->Connections page, you can add a connection. But how do you
know if it works? How about adding a "Test Connection" button on the edit
page to test if the parameters you have specified are valid. You might be
able to also issue a test query from the edit screen.

Currently, we don't validate the connections in terms of testing them
because we have some default ones (as examples). In the future, we might
want to consider removing those example connections or hiding them behind a
config param similar to load_examples, which is used to load the example
dags into the Web app.

[image: Inline image 1]
-s
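
A minimal sketch of what a "Test Connection" check could do behind the button,
assuming each hook exposes some method that runs a query. Everything here is
hypothetical: `SqliteHook`, its `execute` method, and `check_connection` are
stand-ins built on the standard library so the example runs on its own, and
they are not Airflow's hook API.

```python
import sqlite3


class SqliteHook:
    """Hypothetical stand-in for a database hook (not Airflow's hook API)."""

    def __init__(self, path):
        self.path = path

    def execute(self, sql):
        """Open a connection, run one statement, and return all rows."""
        conn = sqlite3.connect(self.path)
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()


def check_connection(hook, test_query="SELECT 1"):
    """Return (ok, message) instead of raising, so a UI could display the result."""
    try:
        rows = hook.execute(test_query)
    except Exception as exc:  # any driver error is reported back to the user
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, "query returned {} row(s)".format(len(rows))
```

The same helper covers the optional "test query" idea: the edit page would
pass the user's query as `test_query` instead of the default.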


Re: Dynamic creation of DAG

2016-11-22 Thread siddharth anand
Hi Max,
Which part in the above PR is related to dynamic dags?

When thinking about adding documentation about functionality, I propose the
community bias towards adding working examples and test coverage. We offer
a quick start (which by the way needs some updates - for example, why does
it not start airflow-scheduler after starting the webserver?), but then
folks get stuck in how to write DAGs and use the full range of Airflow
capabilities. This is where examples and better test coverage help keep
newbies productive.

Perhaps the examples and tests can be upgraded to show a fuller set of
dynamic dag capabilities?

-s

On Mon, Nov 21, 2016 at 7:55 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> I just added a bit of information about dynamic DAG creation here:
> https://github.com/apache/incubator-airflow/pull/1889/files#diff-c6f0a0722c6a2f86277535d7bcec7f8cR162
>
> Let me know if it helps.
>
> Max
>
> On Mon, Nov 21, 2016 at 2:58 AM, Deepak Kumar Malladi <
> kapeed2...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I want to dynamically create a DAG at run time. I tried the snippet given
> > in the documentation, but it didn't work for me.
> >
> > Any pointers on how to trigger DAGs which aren't actually present in the DAG
> > folder but are created through code execution (dynamically created)?
> >
> >
> > Thanks & Regards,
> > Deepak
> >
>
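
One common reading of "dynamic DAG creation" is generating several DAG
objects in a loop, inside a file that lives in the DAG folder, and binding
each one to a module-level name so that a loader scanning the module's globals
can discover it. A minimal sketch of that pattern follows. `FakeDAG`,
`build_dag`, and `SOURCES` are hypothetical stand-ins so the example runs
without Airflow; a real DAG file would presumably construct `airflow.DAG`
objects instead.

```python
class FakeDAG:
    """Hypothetical stand-in for a DAG class so the sketch runs without Airflow."""

    def __init__(self, dag_id, schedule_interval=None):
        self.dag_id = dag_id
        self.schedule_interval = schedule_interval


def build_dag(source):
    """Create one DAG object per data source (illustrative naming scheme)."""
    return FakeDAG(dag_id="ingest_{}".format(source), schedule_interval="@daily")


# Hypothetical list of sources; it could equally come from a config file.
SOURCES = ["orders", "customers", "payments"]

# Build every DAG and key it by its dag_id.
DAGS = {}
for source in SOURCES:
    dag = build_dag(source)
    DAGS[dag.dag_id] = dag

# Expose each DAG under its own module-level name, since a loader that
# inspects module globals would otherwise see only the last `dag` variable.
globals().update(DAGS)
```

Note that this still requires the generating file itself to sit in the DAG
folder; the sketch does not cover DAGs created entirely outside it.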


Re: Airflow 2.0

2016-11-21 Thread siddharth anand
1) The restart should not be needed, but if folks are reporting it, I'm
curious what the problem might be. If you are running on master, then you
may not be aware of the min_file_process_interval setting.

[scheduler]
min_file_process_interval = 0
max_threads = 4

2) Yes.. security is not there. It's often something added to a maturing
project a little late in its growth - after feature completeness,
performance, etc... For example, Azkaban grew at LinkedIn to be widely
adopted for a few years before Azkaban2 came around and introduced security
features. If it's important to you, then vote. It may not be there on your
timeframe, but it will surely be something we land in 2017. Also if you run
in the cloud, there are some options that can make your installation more
secure.

Great feedback. I know Max kicked this thread off in order to figure out
how to get his team to consider the community's needs when picking what to
fix. This information is in fact helpful to us all.

-s

On Mon, Nov 21, 2016 at 6:13 PM, Boris Tyukin <bo...@boristyukin.com> wrote:

> I am still deciding between Airflow and oozie for our brand new Hadoop
> project but here are a few things that I did not like during my limited
> testing:
>
> 1) pain with scheduler/webserver restarts - things magically begin working
> after restart or disappear (like DAG tasks that are no longer part of DAG)
> 2) no security - a big deal for enterprise-like companies like the one I
> work for (a large healthcare organization).
> 3) backfill concept is a bit weird to me. I think Gerard put it pretty well
> - backfills should be run for the entire missing window, not day by day.
> Logging for backfills should be consistent with normal DAG Runs.
> 4) confusion around execution time and start time - I wish the UI would
> clearly distinguish them. Execution time only covers the interval to the
> previous DAG run - I wish it would go back to the LAST successful DAG run.
> That way I can rely on it as a watermark for incremental processes.
> 5) UTC confusion - not all companies have the luxury of running all their
> systems on UTC.
>
>
> On Mon, Nov 21, 2016 at 5:26 PM, siddharth anand <san...@apache.org>
> wrote:
>
> > Also, a survey will be a little less noisy and easier to summarize than +1s
> > in this email thread.
> > -s (Sid)
> >
> > On Mon, Nov 21, 2016 at 2:25 PM, siddharth anand <san...@apache.org>
> > wrote:
> >
> > > Sergei,
> > > These are some great ideas -- I would classify at least half of them as
> > > pain points.
> > >
> > > Folks!
> > > I suggest people (on the dev list) keep feeding this thread at least for
> > > the next 2 days. I can then float a survey based on these ideas and give
> > > the community a chance to vote so we can prioritize the wish list.
> > >
> > > -s
> > >
> > > On Mon, Nov 21, 2016 at 5:22 AM, Sergei Iakhnin <lle...@gmail.com>
> > wrote:
> > >
> > >> I've been running Airflow on 1500 cores in the context of scientific
> > >> workflows for the past year and a half. Features that would be important
> > >> to me for 2.0:
> > >>
> > >> - Add FK to dag_run to the task_instance table on Postgres so that
> > >>   task_instances can be uniquely attributed to dag runs.
> > >> - Ensure scheduler can be run continuously without needing restarts.
> > >>   Right now it gets into some ill-determined bad state forcing me to
> > >>   restart it every 20 minutes.
> > >> - Ensure scheduler can handle tens of thousands of active workflows.
> > >>   Right now this results in extremely long scheduling times and
> > >>   inconsistent scheduling even at 2 thousand active workflows.
> > >> - Add more flexible task scheduling prioritization. The default
> > >>   prioritization is the opposite of the behaviour I want. I would prefer
> > >>   that downstream tasks always have higher priority than upstream tasks
> > >>   to cause entire workflows to tend to complete sooner, rather than
> > >>   scheduling tasks from other workflows. Having a few scheduling
> > >>   prioritization strategies would be beneficial here.
> > >> - Provide better support for manually-triggered DAGs on the UI i.e. by
> > >>   showing them as queued.
> > >> - Provide some resource management capabilities via something like slots
> > >>   that can be defined on workers and occupied by tasks. Using celery's
> > >>   concurrency parameter at the airflow server level is too

Idea for a new UI feature

2016-11-20 Thread siddharth anand
A sort of obvious and missing feature of Airflow (awesome UI) is knowing
whether your DAGs are missing SLAs.

For example, say that you have an Hourly DAG and say that you have
specified the dag run timeout or sla=timedelta(hours=2), then after 2
hours, you will receive an email of the SLA miss.

It would be useful to see how far behind your DAG actually is, e.g. 6
hours. I welcome a PR that addresses this.

Something along the lines of what @btallman has in his recent PR to add
some cool additional info to the dashboard (dag overview) page :
https://github.com/apache/incubator-airflow/pull/1833

-s
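
As a rough illustration of the "how far behind" number, here is a minimal
sketch under one possible definition: the lag is the time elapsed since the
next run after the latest completed one became due. The function name and its
parameters are hypothetical, and it assumes a run for a given execution date
can only start once its schedule interval has ended.

```python
from datetime import datetime, timedelta


def dag_lag(latest_completed_execution_date, schedule_interval, now):
    """Return how long the next unprocessed run has been overdue (never negative)."""
    # The run after the latest completed one covers the following interval,
    # and (by assumption) only becomes due once that interval has ended.
    next_execution_date = latest_completed_execution_date + schedule_interval
    next_due = next_execution_date + schedule_interval
    return max(now - next_due, timedelta(0))
```

For an hourly DAG whose latest completed run has execution date 04:00, a check
at 12:00 the same day gives a lag of 6 hours under this definition, while a
DAG that is keeping up reports zero.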


Re: Subsequent Airflow Meetup: 2017/01/11

2016-11-20 Thread siddharth anand
I suspect Clover Health is extremely busy with all of the benefit
enrollments going on right now..

George,
When you come up for air, it looks like both Dan(Airbnb) and Kevin(Agari)
have talk ideas.

-s

On Wed, Nov 16, 2016 at 11:50 PM, Dan Davydov <
dan.davy...@airbnb.com.invalid> wrote:

> Based on chatting with a couple of people today at the Airflow meet-up I
> think there has been some demand for an airflow operations talk,
> specifically around monitoring/alerting. If there is still room I can give
> a talk about this, let me know George.
>
> On Thu, Nov 10, 2016 at 10:17 AM, siddharth anand <san...@apache.org>
> wrote:
>
> > Kevin,
> > Here's a link to the 1Q17 meet-up.
> > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/235259523/
> >
> > Both upcoming meet-ups (next week at WePay and 1Q17 at Clover Health) can
> > be found on http://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
> >
> > -s
> >
> >
> > On Wed, Nov 9, 2016 at 4:24 PM, Kevin Mandich <kevinmand...@gmail.com>
> > wrote:
> >
> > > Hi George,
> > >
> > > If there is still room, I'd like to give a talk about how we use
> Airflow
> > at
> > > my company, Agari. We are a data company that is working to eliminate
> > > inbound, targeted e-mail attacks to our customers (spear-phishing). I
> am
> > > currently working as a data scientist who is also responsible for
> > shipping
> > > my work to production.
> > >
> > > We currently use Airflow to build models from our telemetry data which
> > are
> > > then used for scoring in our near-real-time pipeline. I'd like to talk
> > > about some of the DAGs we've set up to do this.
> > >
> > > Please let me know if this sounds reasonable. Thank you,
> > >
> > > Kevin Mandich
> > > Agari Data, Inc.
> > >
> > >
> > > On Mon, Oct 31, 2016 at 11:27 PM, George Leslie-Waksman <
> > > geo...@cloverhealth.com.invalid> wrote:
> > >
> > > > I know it's a bit far in advance, but to make sure there's space (and
> > > food
> > > > and drink), I've scheduled and booked the subsequent meetup for
> January
> > > > 11th at Clover Health in SF.
> > > >
> > > > If anyone wants to volunteer to talk, let me know, otherwise I'll
> > > probably
> > > > start bugging folks sometime after Thanksgiving and before the
> December
> > > > holidays.
> > > >
> > > > --George Leslie-Waksman
> > > >
> > >
> >
>


Re: Github Mirroring currently broken

2016-11-20 Thread siddharth anand
This appears to be working again!
-s

On Sun, Nov 20, 2016 at 12:10 PM, siddharth anand <san...@apache.org> wrote:

> Committers/Maintainers,
> The Apache Airflow Github mirror is not synchronizing. I've filed a
> ticket. It looks like, as of now, 2 other Apache projects (nifi &
> brooklyn-server) have reported the same issue.
>
> https://issues.apache.org/jira/browse/INFRA-12949
>
> This means that although we are successfully merging to Apache master at
> https://git-wip-us.apache.org/repos/asf/incubator-airflow.git, the
> changes are not being mirrored to g...@github.com:apache/incubator-airflow.git.
> This affects things like rebasing of PRs, so I've opened the ticket as a
> Blocker.
>
> -s
>


Github Mirroring currently broken

2016-11-20 Thread siddharth anand
Committers/Maintainers,
The Apache Airflow Github mirror is not synchronizing. I've filed a ticket.
It looks like, as of now, 2 other Apache projects (nifi & brooklyn-server)
have reported the same issue.

https://issues.apache.org/jira/browse/INFRA-12949

This means that although we are successfully merging to Apache master at
https://git-wip-us.apache.org/repos/asf/incubator-airflow.git, the changes
are not being mirrored to g...@github.com:apache/incubator-airflow.git. This
affects things like rebasing of PRs, so I've opened the ticket as a
Blocker.

-s


Re: Airflow 2.0

2016-11-18 Thread siddharth anand
David
https://issues.apache.org/jira/browse/AIRFLOW-558 (i.e.
https://github.com/apache/incubator-airflow/pull/1830 ) is on my plate. It has
already gone through many rounds of reviews, testing, and fixes with the
submitter and does not need to wait till 2.0. We should be able to merge it
soon. BTW, you are encouraged to vote on these PRs so maintainers can
prioritize their time.

Max,

Thanks for kicking off this thread.

Regarding 2.0, we've associated feature deprecation and non-backward
compatible changes with 2.0. Some of this work might be pretty
earth-shaking to Airflow users. IMHO, changes that increase user pain at
upgrade time need to be carefully balanced against value.

Watching both Gitter and the email list, there are a variety of stumbling
points (for new users) that many of us who have been using the product for
1-2 years have forgotten. A fair number of people still mention that
getting Airflow up and running is no simple task - i.e. Alex mentioned this
in his talk at the last meet-up. The recent BlueYonder talk referenced
https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls

Though we may be numerically near 2.0 in terms of release numbers, I'd
prefer to prioritize a few things higher than releasing 2.0. We need to
build and exercise a few necessary muscles : timely PR processing & timely
Apache releases (i.e. quarterly). Beyond that, I'd like to prioritize the
"common pitfall" problems to ease on-boarding. Some of these don't need to
wait for a major release. The ones that do can be developed on a separate
2.0 branch and baked, reviewed, and voted on by the community before we
consider dropping it into master.

That way, we can keep master healthy to support the increasing rate of
community-submitted PRs that we are seeing and reduce the cycle time of
cutting stable releases, all while working on big-bang changes for 2.0
independently.

Just my $0.02
-s

On Fri, Nov 18, 2016 at 3:57 PM, Chris Riccomini 
wrote:

> > RIP out the charting application and the data profiler
>
> Yes please! +1
>
> On Fri, Nov 18, 2016 at 2:41 PM, Maxime Beauchemin
>  wrote:
> > Another point that may be controversial for Airflow 2.0: RIP out the
> > charting application and the data profiler. Even though it's nice to have
> > it there, it's just out of scope and has major security
> issues/implications.
> >
> > I'm not sure how popular it actually is. We may need to run a survey at
> > some point around this kind of questions.
> >
> > Max
> >
> > On Fri, Nov 18, 2016 at 2:39 PM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> >> Using FAB's Model, we get pretty much all of that (REST API, auth/perms,
> >> CRUD) for free:
> >> http://flask-appbuilder.readthedocs.io/en/latest/quickhowto.html?highlight=rest#exposed-methods
> >>
> >> I'm pretty intimate with FAB since I use it (and contributed to it) for
> >> Superset/Caravel.
> >>
> >> All that's needed is to derive from FAB's model class instead of
> >> SqlAlchemy's model class (which FAB's model wraps and adds functionality
> >> to and is 100% compatible AFAICT).
> >>
> >> Max
> >>
> >> On Fri, Nov 18, 2016 at 2:07 PM, Chris Riccomini wrote:
> >>
> >>> > It may be doable to run this as a different package
> >>> `airflow-webserver`, an
> >>> > alternate UI at first, and to eventually rip out the old UI off of
> the
> >>> main
> >>> > package.
> >>>
> >>> This is the same strategy that I was thinking of for AIRFLOW-85. You
> >>> can build the new UI in parallel, and then delete the old one later. I
> >>> really think that a REST interface should be a pre-req to any
> >>> large/new UI changes, though. Getting unified so that everything is
> >>> driven through REST will be a big win.
> >>>
> >>> On Fri, Nov 18, 2016 at 1:51 PM, Maxime Beauchemin
> >>>  wrote:
> >>> > A multi-tenant UI with composable roles on top of granular
> permissions.
> >>> >
> >>> > Migrating from Flask-Admin to Flask App Builder would be an easy-ish win
> >>> > (since they're both Flask). FAB provides a good authentication and
> >>> > permission model that ships out-of-the-box with a REST API. It suffices
> >>> > to define FAB models (derivative of SQLAlchemy's model) and you get a
> >>> > set of perms for the model (can_show, can_list, can_add, can_change,
> >>> > can_delete, ...) and a set of CRUD REST endpoints. It would also allow
> >>> > us to rip the authentication backend code out of Airflow and rely on
> >>> > FAB for that. Also, every single view gets permissions auto-created for
> >>> > it, and there are easy ways to define row-level type filters based on
> >>> > user permissions.
> >>> >
> >>> > It may be doable to run this as a different package
> >>> `airflow-webserver`, an
> >>> > alternate UI at first, and to eventually rip out the old UI off of
> the
> >>> main
> >>> > package.
> >>> >
> >>> > 

Re: Airflow installation error on ubuntu

2016-11-18 Thread siddharth anand
Kapil,
Please resubmit the email with links to the images. All embedded or
attached objects are stripped from email before Apache mail servers forward
them to recipients.

-s

On Fri, Nov 18, 2016 at 4:20 PM, Kapil Khandelwal 
wrote:

> Hi,
>
> I am trying to run command pip install airflow[all] on Ubuntu box but
> getting below error
>
> [image: Inline image 1]
>
> Ubuntu Version
>
> [image: Inline image 2]
>
> Please let me know how to resolve the error.
>
> Thanks
>
>


Priorities for the Airflow Community

2016-11-17 Thread siddharth anand
As we near the last month of the year and the 9th month in Apache
Incubation, I'd like to share some thoughts and gauge community feedback.

At the beginning of this year, we saw a good deal of interest in Airbnb's
Airflow project. As more companies started using it for business critical
work, it became evident that the only way for the project to scale and
improve would be to transfer ownership via Apache Incubation to the user
community.

We started incubation by swiftly adopting Apache processes and tooling,
selecting the initial committer group, and getting familiar with the inner
workings of Airflow.

The first few months saw various committers and contributors dive into the
guts of Airflow's scheduler (Bolke, Jeremiah, and Paul to name just a few).
The result was a lot of bug fixes, code refactoring, & solid performance
gains, followed by a period of stability in the core Airflow code (the
scheduler). We also saw a large influx of PRs from a growing contributor
base that buried the maintainers until recently. From cool new UI and CLI
features to more 3rd party integrations (e.g. GCP, AWS, Docker) to
improvements in logging and documentation, the community has helped make
Airflow more useful to all. Along the way, we have been promoting
contributors to the committer group and PPMC in order to better serve you.
We plan to continue rewarding members of our community with such promotions.

Roughly 8 months after entering incubation, we have seen Airflow grow :

   - from 100 contributors to 212
   - from 30 companies to ~70
   - by thousands of commits
   - with increased coverage at conferences and literature

As a community, our job is not yet done:

   - We want your continued contribution
      - With so many companies and varied use-cases, your contributions are
      what makes Airflow a rich ecosystem
      - We welcome pull requests that improve user experience, expand 3rd
      party integrations, address some quirks and annoyances in Airflow (e.g.
      changing the dag_id in order to recognize start_date changes, running on
      UTC everywhere), or simply improve documentation
   - To support your active involvement as contributors
      - We, the maintainers, are committed to supporting a timely (<2 week)
      turnaround of your Pull Requests
   - To support your active involvement as users
      - We, the maintainers, need to be better at releasing all of these
      contributions at a predictable cadence
         - This works in unison with reviewing PRs in a timely manner
      - We will be focusing on a predictable and frequent release cadence
      the first half of next year.

As more companies join the community, enterprise-level requirements start
to become more important. Features around security, stability, performance,
backward compatibility, and ease of installation / upgrade become more
important. It's a sign of maturity and of our entering the mainstream. We
will continue to accept features but will also consider risk when reviewing
PRs (e.g. potential for destabilization).

If there are other things we need to focus on or to prioritize, please do
share your thoughts.

-s


Re: Slides & Video Recording from yesterday's Airflow Talk

2016-11-17 Thread siddharth anand
Rob,
Wiki Access granted.
-s

On Thu, Nov 17, 2016 at 10:49 AM, Rob Froetscher <rfroetsc...@lumoslabs.com>
wrote:

> Here are our slides:
> https://docs.google.com/presentation/d/1NG1P86HRlX43qTVucCTOsFqIbCvYd
> Ohq_np90VlbVRc/edit?usp=sharing
>
> I don't think I have permissions to edit the wiki
>
> On Thu, Nov 17, 2016 at 10:41 AM, Chris Riccomini <criccom...@apache.org>
> wrote:
>
> > We'll be posting the video recording shortly. IT is working on it. :)
> >
> > Will post link on the meetup and mailing list.
> >
> > On Thu, Nov 17, 2016 at 10:12 AM, Siddharth Anand
> > <san...@agari.com.invalid> wrote:
> > > Chris and WePayEng,
> > > Thanks for hosting another great Airflow meet-up.
> > >
> > > Can all of the speakers post their slides online and add links to those
> > > talks in response to this email (and also on our Wiki)?
> > >
> > > -s
> >
>


Slides & Video Recording from yesterday's Airflow Talk

2016-11-17 Thread Siddharth Anand
Chris and WePayEng,
Thanks for hosting another great Airflow meet-up.

Can all of the speakers post their slides online and add links to those
talks in response to this email (and also on our Wiki)?

-s


Cold Case PR Cleanup Status - Deadline Extended to Nov 30

2016-11-17 Thread siddharth anand
Committers, please resolve any open Cold Case PRs that you still have on
your list below :
https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution

In the past 6 weeks, we've cleaned up over 70 (from over 110 down to 36)! We
have 16 more to go to get to our goal of ~20.

Dan, Max, Arthur, Bolke, Li Xuanji, Sumit, Chris, & Jeremiah,
If you can each pick 2, we'll reach our goal.

I'll be leading the Release of 1.8 (with help from Chris), but getting to
my goal of resolving these long-outstanding open PRs is a pre-requisite.
-s


Re: Tasks getting Queued when Pool is full sometimes never get run

2016-11-15 Thread siddharth anand
Not seeing this, but I am not running master in my staging or production
envs currently.
-s

On Mon, Nov 14, 2016 at 12:38 PM, Ben Tallman  wrote:

> We are seeing an issue when running Master where tasks sometimes never run.
> It seems that once they get marked as Dependencies Not met because the Pool
> is full, that isn't being re-evaluated. Is anyone else seeing this?
>
> https://issues.apache.org/jira/browse/AIRFLOW-627
>
> Thanks,
> Ben
>
> --
> Ben Tallman - 503.680.5709
>


Re: BranchPython : failed stated not propagated to downstream taks

2016-11-15 Thread siddharth anand
correct

On Tue, Nov 15, 2016 at 1:11 AM  wrote:

Thanks siddharth for your answer. I'm going to look at the extra "failure
processing branch" as you suggested.

 >It is intended that a DAGRun be deemed successful in all cases except for
 > failure. So, skipped nodes F and G, would result in a Successful DagRun.
A DAGRun is deemed successful or failed based solely on the status of the
last tasks, right? Airflow does not consider non-terminal failed tasks?




Using Airflow? Add your Company to the ReadMe!

2016-11-15 Thread siddharth anand
https://github.com/apache/incubator-airflow/blob/master/README.md

If you haven't yet, make sure to add your company to the list of companies
using Airflow! Submit a PR to add your company to the list.
-s


Re: Anyone noticed PR's for 41 & 137?

2016-11-15 Thread siddharth anand
Hi Gerard,
I have them both under my name on the following wiki though anyone from the
community is welcome to help with the review and test it out as well.
https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution

About 6 weeks ago, we had over 110 open PRs, many of which were very old
and stagnant. I've been working to bring that number down in my spare time
with the help of other committers, also listed on that wiki.

The ETA for that effort is the end of today, Nov 15. Hence, there has been
some delay in getting to it.
-s

On Mon, Nov 14, 2016 at 11:10 PM, Gerard Toonstra 
wrote:

> Hi all,
>
> Somewhat over a week ago, I submitted 2 PR's to fix long outstanding
> problems 41 & 137, but didn't see anyone putting a +1 on it or commenting
> on the resolution.
>
> https://github.com/apache/incubator-airflow/pull/1872
>
> https://github.com/apache/incubator-airflow/pull/1870
>
> I think these fix important problems in the airflow core code which many
> new users run into and then complain about.
>
> Rgds,
>
> Gerard
>


Re: Gantt chart broken on master?

2016-11-14 Thread siddharth anand
It's working as it always has, though this case (when a dag is behind
several runs and needs to catch up) appears to result in an empty Gantt
chart, which is not ideal.

On Mon, Nov 14, 2016 at 12:43 AM Sumit Maheshwari <sumeet.ma...@gmail.com>
wrote:

> So we can say that it's not broken, right?
>
>
> PS: link to screenshot
>
> https://www.dropbox.com/s/wql05icqgnewl0d/Screenshot%202016-11-14%2006.55.43.png?dl=0
>
>
>
>
> On Mon, Nov 14, 2016 at 12:30 PM, siddharth anand <san...@apache.org>
> wrote:
>
> > Found the issue.. it seems the gantt charts load from the most recent
> > "running" DagRun, which in the example I provided (assuming you can see the
> > attached screenshots in my original email), has no currently running
> > tasks, hence the gantt charts look completely bare. If the Gantt chart
> > instead defaulted to the most recent running DagRun with at least one task
> > complete, it would be more useful IMHO.
> >
> > -s
> >
> > On Sun, Nov 13, 2016 at 9:53 PM, siddharth anand <san...@apache.org>
> > wrote:
> >
> >> I've installed airflow on a fresh virtualenv and I still have the issue
> >> above on the gantt chart. Anyone else notice the same? @Sumit, as the last
> >> person to merge a gantt chart change, can you repro on a fresh virtualenv?
> >>
> >> I notice a single "/" character on the page.
> >>
> >> -s
> >>
> >> On Sat, Nov 12, 2016 at 9:25 PM, siddharth anand <san...@apache.org>
> >> wrote:
> >>
> >>> Gantt chart is broken for me on master.
> >>>
> >>> I think it's due to this merge.
> >>> https://github.com/apache/incubator-airflow/commit/868bc83137adca0ebfd5780f0dff5a7bfdfaadf9
> >>>
> >>> Why is an end_date needed?
> >>>
> >>> [image: Inline image 1]
> >>>
> >>> This is the tree view:
> >>> [image: Inline image 2]
> >>>
> >>> Sumit, as the merger/committer, can you confirm?
> >>>
> >>> -s
> >>>
> >>
> >>
> >
>


Re: Skip task

2016-11-14 Thread siddharth anand
For cases like this, we (Agari) use the following approach :

   1. Create a Variable in the UI of type boolean such as *enable_feature_x*
   2. Use a ShortCircuitOperator (or BranchPythonOperator) to Skip
   downstream processing based on the value of *enable_feature_x*
   3. Assuming that you don't want to skip ALL downstream tasks, you can
   use a trigger_rule of all_done to resume processing some portion of your
   downstream DAG after skipping an upstream portion

In other words, there is already a means to achieve what you are asking for
today. You can change the value of *enable_feature_x* via the UI. If you'd
like to enhance the UI to better capture this pattern, pls submit a PR.
-s
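
The three steps above can be sketched as a toy simulation. This is not Airflow
code: `simulate`, the task dictionaries, and the `gate_variable` key are
hypothetical, and the skip logic is a deliberate simplification of how a
ShortCircuitOperator or BranchPythonOperator combined with a trigger_rule of
all_done behaves. It only shows the shape of the pattern: a boolean variable
closes a gate, tasks behind the gate are skipped, and a task marked all_done
resumes processing afterwards.

```python
SUCCESS, SKIPPED = "success", "skipped"

# Hypothetical DAG, listed in topological order.
TASKS = [
    {"id": "start"},
    {"id": "gate", "upstream": ["start"], "gate_variable": "enable_feature_x"},
    {"id": "feature_x_step", "upstream": ["gate"]},
    {"id": "cleanup", "upstream": ["feature_x_step"], "trigger_rule": "all_done"},
]


def simulate(tasks, variables):
    """Return a dict of task id -> final state for one simulated run."""
    states = {}
    closed_gates = set()  # gate tasks whose boolean variable was off
    for task in tasks:
        name = task["id"]
        upstream = task.get("upstream", [])
        rule = task.get("trigger_rule", "all_success")
        # A task is blocked if it sits behind a closed gate or a skipped task.
        blocked = any(u in closed_gates or states[u] == SKIPPED for u in upstream)
        if blocked and rule != "all_done":
            states[name] = SKIPPED
            continue
        # all_done tasks run once their upstream tasks have finished,
        # whatever the outcome was.
        states[name] = SUCCESS
        flag = task.get("gate_variable")
        if flag is not None and not variables.get(flag, False):
            closed_gates.add(name)
    return states
```

Calling `simulate(TASKS, {"enable_feature_x": False})` skips only
`feature_x_step`, while `cleanup` still runs because of its all_done rule.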

On Thu, Nov 10, 2016 at 1:20 PM, Maycock, Luke <
luke.mayc...@affiliate.oliverwyman.com> wrote:

> Hi Gerard,
>
>
> I see the new status as having a number of uses:
>
>  1.  A user can manually set a task to skip in a DAG run via the UI.
>  2.  We can then make use of this new status to add the following
>      functionality to Airflow:
>      *   Run a DAG run up to a certain point and have the rest of the tasks
>          have the new status.
>      *   Run a DAG run from a certain task to the end, setting all
>          pre-requisite tasks to have this new status.
>
> I am happy to be challenged on the above use cases if there are better
> ways to achieve the same things.
>
> Cheers,
> Luke Maycock
> OLIVER WYMAN
> luke.mayc...@affiliate.oliverwyman.com
> www.oliverwyman.com
>
>
>
> 
> From: Gerard Toonstra 
> Sent: 09 November 2016 18:08
> To: dev@airflow.incubator.apache.org
> Subject: Re: Skip task
>
> Hey Luke,
>
> Who or what makes the decision to skip processing that task?
>
> Rgds,
>
> Gerard
>
> On Wed, Nov 9, 2016 at 2:39 PM, Maycock, Luke <
> luke.mayc...@affiliate.oliverwyman.com> wrote:
>
> > Hi Gerard,
> >
> >
> > Thank you for your quick response.
> >
> >
> > I am not trying to implement this for a specific operator but rather
> > trying to add it as a feature for any task in any DAG.
> >
> >
> > Given that the skipped states propagate where all directly upstream tasks
> > are skipped, I don't think this is the state we want to use. For the
> > functionality I'm looking for, I think I'll need to introduce a new status,
> > maybe 'disabled'.
> >
> > Again, thanks for your response.
> >
> >
> > Cheers,
> > Luke Maycock
> > OLIVER WYMAN
> > luke.mayc...@affiliate.oliverwyman.com
> > www.oliverwyman.com
> >
> >
> >
> > 
> > From: Gerard Toonstra 
> > Sent: 08 November 2016 18:19
> > To: dev@airflow.incubator.apache.org
> > Subject: Re: Skip task
> >
> > Also in 1.7.1.3, there's the ShortCircuitOperator, which can give you an
> > example.
> >
> > https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/python_operator.py
> >
> > You'd have to modify this to your needs, but the way it works is that if
> > the condition evaluates to False, none of the
> > downstream tasks are actually executed, they'd be skipped. The reason for
> > putting them into SKIPPED state is that
> > the DAG final result would still be SUCCESS and not FAILED.
> >
> > You could copy the operator from there and don't do the full "for loop",
> > only pick the tasks immediately downstream
> > from this operator and skip that. Or... if you need to skip additional
> > tasks downstream, add a parameter "num_tasks"
> > that decide on a halting condition for the for loop.
> >
> > I believe that should work. I didn't try that here, but you can test that
> > and see what it does for you.
> >
> >
> > If you want this as a UI capability... for example have a human operator
> > decide on skipping this yes or not, then
> > maybe the best way forward would be some kind of highly custom plugin
> with
> > its own view. In the end, you'd basically
> > do the same action in the backend, whether the python cond evaluates to
> > False or the button is clicked.
> >
> > In the plugin case though, you'd have to keep the UI and the structure of
> > the DAG in sync and aligned, otherwise
> > it'd become a mess. Airflow wasn't really developed for workflow/human
> > interaction, but for workflows where only
> > automated processes are involved. That doesn't mean that you can't do
> > anything like that, but it may be costly resource
> > wise to get this done. For example, on the basis of the BranchOperator,
> you
> > could call an external API to verify if a decision
> > was taken on a case, then follow branch A or B if the decision is there
> or
> > put the state back into UP_FOR_RETRY.
> > At the moment though, there's no programmatic way to reschedule that task
> > to some minutes or hours into the future before
> > it's looked at again, unless you really dive into airflow, scheduling
> > 

Re: Issue with latest versions of Celery & Kombu

2016-11-14 Thread siddharth anand
I don't run celery.

However, I would suggest that any solutions that you come up with be
followed up by a PR with version updates to setup.py.

On Mon, Nov 14, 2016 at 11:56 AM, Miller, Robin <
robin.mil...@affiliate.oliverwyman.com> wrote:

> Hi Nadeem,
>
>
> We are using Celery with RabbitMQ.  The upgrade to Celery 4.0 last week
> did cause RabbitMQ trouble. The web interface (for RabbitMQ) started giving
> us errors pointing to the use of non-utf8 characters in message queues (or
> queue names, it wasn't clear), which RabbitMQ does not support.
>
>
> We've also solved this by moving back to version 3.1.15 of Celery. We
> didn't investigate further, since this solved our problem, but I suspect
> that there are a fair few people who've been tripped up by this, or will be.
>
>
> Regards,
>
> Robin Miller
> OLIVER WYMAN
> robin.mil...@affiliate.oliverwyman.com
> www.oliverwyman.com
>
> 
> From: Nadeem Ahmed Nazeer 
> Sent: 12 November 2016 08:21:02
> To: dev@airflow.incubator.apache.org
> Cc: Nadeem Ahmed
> Subject: Issue with latest versions of Celery & Kombu
>
> Hi Airflowers,
>
> We install airflow from our chef scripts and are currently using Airflow
> 1.7.1.3. We re-base airflow once in a while to reduce the amount of
> backfills, where we burn down everything and bring it up. It worked fine
> every time until today.
>
> While doing the re-base now, we faced an issue with celery which was not
> able to talk to the backend database using sqlalchemy. There were no
> changes made to our setup scripts in chef.
>
> 2016-11-12 07:34:14,386 ERROR:airflow.jobs.SchedulerJob[MainThread] u'No
> such transport: sqla'
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 755,
> in _execute
>executor.heartbeat()
>  File "/usr/local/lib/python2.7/dist-packages/airflow/
> executors/base_executor.py",
> line 99, in heartbeat
>self.execute_async(key, command=command, queue=queue)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/
> executors/celery_executor.py",
> line 66, in execute_async
>args=[command], queue=queue)
>  File "/usr/local/lib/python2.7/dist-packages/celery/app/task.py", line
> 536, in apply_async
>**options
>  File "/usr/local/lib/python2.7/dist-packages/celery/app/base.py", line
> 714, in send_task
>with self.producer_or_acquire(producer) as P:
>  File "/usr/local/lib/python2.7/dist-packages/celery/utils/objects.py",
> line 85, in __enter__
>*self.fb_args, **self.fb_kwargs
>  File "/usr/local/lib/python2.7/dist-packages/kombu/resource.py", line 83,
> in acquire
>R = self.prepare(R)
>  File "/usr/local/lib/python2.7/dist-packages/kombu/pools.py", line 62, in
> prepare
>p = p()
>  File "/usr/local/lib/python2.7/dist-packages/kombu/utils/functional.py",
> line 203, in __call__
>return self.evaluate()
>  File "/usr/local/lib/python2.7/dist-packages/kombu/utils/functional.py",
> line 206, in evaluate
>return self._fun(*self._args, **self._kwargs)
>  File "/usr/local/lib/python2.7/dist-packages/kombu/pools.py", line 42, in
> create_producer
>conn = self._acquire_connection()
>  File "/usr/local/lib/python2.7/dist-packages/kombu/pools.py", line 39, in
> _acquire_connection
>return self.connections.acquire(block=True)
>  File "/usr/local/lib/python2.7/dist-packages/kombu/resource.py", line 83,
> in acquire
>R = self.prepare(R)
>  File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line
> 936, in prepare
>resource = resource()
>  File "/usr/local/lib/python2.7/dist-packages/kombu/utils/functional.py",
> line 203, in __call__
>return self.evaluate()
>  File "/usr/local/lib/python2.7/dist-packages/kombu/utils/functional.py",
> line 206, in evaluate
>return self._fun(*self._args, **self._kwargs)
>  File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line
> 908, in new
>return self.connection.clone()
>  File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line
> 587, in clone
>return self.__class__(**dict(self._info(resolve=False), **kwargs))
>  File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line
> 597, in _info
>D = self.transport.default_connection_params
>  File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line
> 832, in transport
>self._transport = self.create_transport()
>  File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line
> 576, in create_transport
>return self.get_transport_cls()(client=self)
>  File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line
> 582, in get_transport_cls
>transport_cls = get_transport_cls(transport_cls)
>  File "/usr/local/lib/python2.7/dist-packages/kombu/transport/
> __init__.py",
> line 81, in get_transport_cls
>_transport_cache[transport] = resolve_transport(transport)
>  File 

Re: Issue with airflow upgradedb...

2016-11-14 Thread siddharth anand
Ben,
I ran into issues while maintaining my company's airflow fork and
cherry-picking my changes into the fork, especially when my changes
included db changes.

I had to play with the alembic_version in the db and do some other magic
that escapes me now. My best guidance for the future is to cherry pick ALL
DB-related changes from both master and your own btallman github fork into
your apigee fork. That way, the db migration lineage in your apigee fork
matches what is in master.
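
The lineage problem can be sketched in plain Python. This is only an illustration of why a missing migration script produces "Can't locate revision", not Alembic's own code; the function name `unresolved_revisions` and the ids `rev_base` and `rev_fork_only` are hypothetical, and only `f2ca10b85618` comes from this thread.

```python
def unresolved_revisions(scripts, stamped):
    """Return revision ids that are referenced but have no migration script.

    scripts maps revision id -> down_revision (None for the first migration).
    stamped is the revision recorded in the database's alembic_version table.
    """
    referenced = {stamped} | {down for down in scripts.values() if down}
    return sorted(referenced - set(scripts))


# Hypothetical fork: it carries two migrations, but the database was stamped
# by a build that also contained f2ca10b85618 (the id quoted in this thread).
fork_scripts = {"rev_base": None, "rev_fork_only": "rev_base"}
before = unresolved_revisions(fork_scripts, "f2ca10b85618")
print(before)

# After cherry-picking the missing migration into the fork (its parent here
# is an assumption made for the example), nothing is left unresolved.
fork_scripts["f2ca10b85618"] = "rev_fork_only"
after = unresolved_revisions(fork_scripts, "f2ca10b85618")
print(after)
```

Cherry-picking every DB-related change keeps the set of scripts in the fork a superset of whatever revision the database may have been stamped with.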

-s

On Fri, Nov 11, 2016 at 4:49 AM, Sumit Maheshwari 
wrote:

> Ben,
>
> Can u see whats current version using "alembic current".. afaik
> version f2ca10b85618
> is the latest migration in master and I had no issue migrating to it..
>
> Also did your CPs contain any custom migrations?
>
>
>
> On Fri, Nov 11, 2016 at 5:04 AM, Ben Tallman  wrote:
>
> > We are running master with a few cherry picked features... Did we squash
> > commits that Alembic is expecting? Did I?
> >
> > Basically, there are revisions that are no longer in master??
> Specifically
> > at least:
> >
> > Can't locate revision identified by 'f2ca10b85618'
> >
> > ===
> >
> > *airflow upgradedb*
> > [2016-11-10 15:31:04,156] {__init__.py:36} INFO - Using executor
> > CeleryExecutor
> > DB: postgresql://airflow_qa:***@
> > nucleus.c7b2twrxxjtc.us-west-2.rds.amazonaws.com/nucleus
> > [2016-11-10 15:31:05,707] {utils.py:288} INFO - Creating tables
> > INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> > INFO  [alembic.runtime.migration] Will assume transactional DDL.
> > Traceback (most recent call last):
> >   File "/usr/local/bin/airflow", line 15, in 
> > args.func(args)
> >   File "/Library/Python/2.7/site-packages/airflow/bin/cli.py", line 459,
> > in
> > upgradedb
> > utils.upgradedb()
> >   File "/Library/Python/2.7/site-packages/airflow/utils.py", line 295,
> in
> > upgradedb
> > command.upgrade(config, 'heads')
> >   File "/Library/Python/2.7/site-packages/alembic/command.py", line 174,
> > in
> > upgrade
> > script.run_env()
> >   File "/Library/Python/2.7/site-packages/alembic/script/base.py", line
> > 397, in run_env
> > util.load_python_file(self.dir, 'env.py')
> >   File "/Library/Python/2.7/site-packages/alembic/util/pyfiles.py", line
> > 81, in load_python_file
> > module = load_module_py(module_id, path)
> >   File "/Library/Python/2.7/site-packages/alembic/util/compat.py", line
> > 79,
> > in load_module_py
> > mod = imp.load_source(module_id, path, fp)
> >   File "/Library/Python/2.7/site-packages/airflow/migrations/env.py",
> line
> > 74, in 
> > run_migrations_online()
> >   File "/Library/Python/2.7/site-packages/airflow/migrations/env.py",
> line
> > 69, in run_migrations_online
> > context.run_migrations()
> >   File "", line 8, in run_migrations
> >   File "/Library/Python/2.7/site-packages/alembic/runtime/
> environment.py",
> > line 797, in run_migrations
> > self.get_context().run_migrations(**kw)
> >   File "/Library/Python/2.7/site-packages/alembic/runtime/migration.py",
> > line 303, in run_migrations
> > for step in self._migrations_fn(heads, self):
> >   File "/Library/Python/2.7/site-packages/alembic/command.py", line 163,
> > in
> > upgrade
> > return script._upgrade_revs(revision, rev)
> >   File "/Library/Python/2.7/site-packages/alembic/script/base.py", line
> > 314, in _upgrade_revs
> > for script in reversed(list(revs))
> >   File
> > "/System/Library/Frameworks/Python.framework/Versions/2.7/
> > lib/python2.7/contextlib.py",
> > line 35, in __exit__
> > self.gen.throw(type, value, traceback)
> >   File "/Library/Python/2.7/site-packages/alembic/script/base.py", line
> > 160, in _catch_revision_errors
> > compat.raise_from_cause(util.CommandError(resolution))
> >   File "/Library/Python/2.7/site-packages/alembic/util/compat.py", line
> > 132, in raise_from_cause
> > reraise(type(exception), exception, tb=exc_tb)
> >   File "/Library/Python/2.7/site-packages/alembic/script/base.py", line
> > 129, in _catch_revision_errors
> > yield
> >   File "/Library/Python/2.7/site-packages/alembic/script/base.py", line
> > 310, in _upgrade_revs
> > revs = list(revs)
> >   File "/Library/Python/2.7/site-packages/alembic/script/revision.py",
> > line
> > 610, in _iterate_revisions
> > requested_lowers = self.get_revisions(lower)
> >   File "/Library/Python/2.7/site-packages/alembic/script/revision.py",
> > line
> > 299, in get_revisions
> > return sum([self.get_revisions(id_elem) for id_elem in id_], ())
> >   File "/Library/Python/2.7/site-packages/alembic/script/revision.py",
> > line
> > 304, in get_revisions
> > for rev_id in resolved_id)
> >   File "/Library/Python/2.7/site-packages/alembic/script/revision.py",
> > line
> > 304, in 
> > for rev_id in resolved_id)
> >   File "/Library/Python/2.7/site-packages/alembic/script/revision.py",
> > line
> > 359, in 

Re: BranchPython : failed stated not propagated to downstream taks

2016-11-14 Thread siddharth anand
A couple things are at play.

The best practice is to ensure that your branch python operator does not
fail. Catch the most general exception and return a value that would pick a
"failure processing branch". If neither D nor E is your failure processing
branch, then add one more branch - BranchPythonOperator supports
generalized N-way branching after all.


Your "failure processing branch" can report errors and be a leaf
node/terminal node of your DAG, which means that F and G do not need to be
executed in case of failure.

If you would like F and G to always be executed (since I don't know your
use-case, I cannot comment whether it makes sense for you), then make all 3
of your branch nodes, D, E, and "Failure processing branch", upstream
parents of F.

Also, in F, you should specify a trigger_rule of "one_success", so that
processing of F only happens when you get one success.

It is intended that a DAGRun be deemed successful in all cases except for
failure. So, skipped nodes F and G, would result in a Successful DagRun.
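
A minimal sketch of the "never let the branch task fail" pattern, written in plain Python so it runs without Airflow installed. The names `pick_branch`, `decide`, and the task id `failure_processing` are assumptions made for illustration; in a real DAG a function like this would be handed to the BranchPythonOperator for task C as its python callable.

```python
def pick_branch(decide):
    """Return the task_id of the branch to follow.

    `decide` is a hypothetical zero-argument callable holding the real
    branching logic; it is expected to return "D" or "E".
    """
    try:
        branch = decide()
    except Exception:
        # Catch the most general exception so the branch task itself never
        # fails; route to a dedicated failure-handling branch instead.
        return "failure_processing"
    if branch not in ("D", "E"):
        # An unexpected answer is treated like a failure as well.
        return "failure_processing"
    return branch


def broken_logic():
    raise ValueError("could not determine branch")


print(pick_branch(lambda: "D"))
print(pick_branch(broken_logic))
```

With D, E, and the failure branch all upstream of F, and F using the "one_success" trigger rule, F still runs whichever branch was picked.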


-s

On Mon, Nov 14, 2016 at 1:23 PM,  wrote:

> Hello.
>
> My DAG is as follows:
>
>              D
>            /   \
> A -- B -- C     F -- G
>            \   /
>              E
>
> I have a case where the C BranchPythonOperator fails so that it is not
> able to tell Airflow which child task must be executed. The result is : D &
> E are in UP_FOR_RETRY state and F & G are in SKIPPED state.
>
> The issue is, because of the last SKIPPED state, Airflow considers the
> whole DAG to have succeeded whereas it has not.
>
> The FAILED state is not propagated to downstream tasks. Is it intended
> behaviour ?
>
> Regards.
>
>
> At moment, I want my privacy to be protected.
> https://mytemp.email/
>


Re: Cold Case PR Cleanup -- Current Status

2016-11-13 Thread siddharth anand
Current status : we have < 50 open PRs presently!
https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution

-s

On Sat, Nov 5, 2016 at 8:06 PM, siddharth anand <san...@apache.org> wrote:

> Committers,
> If you have time this week, please make a push to get your cold case
> PR cleanup done. The deadline is Nov 15, just in time for our WePay meetup.
> I will be making an announcement there.
>
> -s
>
>
> -- Forwarded message --
> From: *siddharth anand* <san...@apache.org>
> Date: Friday, November 4, 2016
> Subject: Cold Case PR Cleanup -- Current Status
> To: dev@airflow.incubator.apache.org
>
>
> We have a little over 50 open PRs. We need to get to 10 by the end of the
> year. Our current rate of new PRs (i.e. during this holiday season) is a
> handful a week, so 10 open PRs roughly equates to PRs opened within a 2
> week period.
>
> 2 week turn-around times for PR review should be the commitment the
> maintainer group sticks to. BTW, I've seen a few contributors pitch in by
> reviewing PRs. That is extremely helpful and speeds up PR review/merge
> times. Please continue.
>
> Here's the current cold-case status.
>
> [image: Inline image 1]
>
> -s
>
> On Wed, Nov 2, 2016 at 12:50 PM, siddharth anand <san...@apache.org>
> wrote:
>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>>
>> We are now, for the first time since I can remember, under *60* open
>> PRs! Woo hoo! Keep up the pressure, committers! If you haven't yet
>> closed your tracked PRs, please do so soon.
>>
>> Exactly 1 month ago (Oct 2), when this endeavor started, we were at *110*
>> open PRs.
>>
>> [image: Inline image 1]
>> -s
>>
>> On Wed, Nov 2, 2016 at 12:30 AM, siddharth anand <san...@apache.org>
>> wrote:
>>
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>>>
>>> [image: Inline image 1]
>>>
>>> -s
>>>
>>
>>
>
>
>
> --
> Sent from Gmail Mobile
>


Re: [VOTE]: Using fractional seconds

2016-11-13 Thread siddharth anand
SGTM

On Sun, Nov 13, 2016 at 12:02 PM, Bolke de Bruin  wrote:

> Hi All,
>
> I count 3 positive votes, 0 negative ones. Therefore, I will finalize
> https://github.com/apache/incubator-airflow/pull/1794 which implements
> Option 1.
>
> Thanks!
> Bolke
>
> > On 9 Nov 2016, at 22:48, Arthur Wiedmer wrote:
> >
> > Hi all,
> >
> > I was the main proponent of option 2, mostly because I could not see a
> > specific situation where sub second precision was needed for this.
> >
> > However, I feel that we have heard from the community that there are use
> > cases out there. I agree with Bolke's analysis of the increased
> operational
> > cost of maintaining option 2.
> >
> > I vote for option 1.
> >
> > Best regards,
> > Arthur
> >
> > On Tue, Nov 8, 2016 at 10:40 AM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> >> I vote for option 1.
> >>
> >> We may want to alter the previous database migration script to have a
> >> MySQL-specific `try` block to get it right on fresh installs.
> >>
> >> We also may want a new database migration that is MySQL-specific and ALTERs
> >> the columns properly. It seems to me though that this might require high
> >> level locks and take some time to execute on large tables (I'm thinking
> >> `task_instance`). No one likes to see a database migration script hang
> for
> >> minutes... An alternate approach might be for someone in the community
> to
> >> share a script that does this and that people can review and decide
> whether
> >> they want to run it, and perhaps when to run it, maybe after archiving
> some
> >> of the large tables in their environment.
> >>
> >> Max
> >>
> >> On Tue, Nov 8, 2016 at 6:39 AM, Vishal Doshi  wrote:
> >>
> >>> We have an (atypical) use case where one DAG launches multiple runs of
> >>> another DAG (but with different parameters). Without the precision, we
> >> have
> >>> to add a second between each launch to avoid the database issues.
> Moving
> >>> towards allowing fractional seconds would be great for us.
> >>>
> >>> Thanks,
> >>> Vishal
> >>>
> >>> On 11/8/16, 04:29, "Bolke de Bruin"  wrote:
> >>>
> >>>Dear All,
> >>>
> >>>I’m trying to move over the testing infrastructure to the new
> >>> infrastructure based on ubuntu 14.04 (we are on 12.04 now). 12.04 uses
> >>> MySQL 5.5 and 14.04 allows the use of MySQL 5.6, which we say we are
> >>> compatible with. MySQL does not store fractional seconds. Until version
> >>> 5.6.4 (https://dev.mysql.com/doc/refman/5.6/en/fractional-seconds.html
> )
> >>> it cuts off fractional seconds at comparison time, eg. comparing
> >>> “2016-01-01 00:00:00.01” against what is stored in MySQL
> “2016-01-01
> >>> 00:00:00” would return a tuple in 5.6.4 but will fail beyond 5.6.4. The
> >>> issue presents itself if you use the “@once” schedule interval.
> >>>
> >>>Other databases (Postgres, SQLite, etc) store fractional seconds by
> >>> default so do not exhibit this error. Since MySQL 5.6.4 it can also
> store
> >>> fractional seconds, but for backwards compatibility it needs to be
> >>> specified in the schema. Also note that MySQL behavior (not storing
> >>> fractional seconds) goes against SQL standards as is noted by
> themselves
> >> (
> >>> http://dev.mysql.com/doc/refman/5.7/en/fractional-seconds.html).
> >>>
> >>>There are two solutions to this issue:
> >>>
> >>>1. Update the schema for MySQL to include fractional seconds.
> >>>PRO:
> >>>- no coding changes
> >>>- makes mysql behave conform standards
> >>>- easier to maintain
> >>>- future proof
> >>>
> >>>CON:
> >>>- needs to maintain schema
> >>>- requires an update to the schema of running mysql instances
> >>>
> >>>2. Change the code to remove fractional settings (particularly
> .now()
> >>> invocations)
> >>>PRO:
> >>>- No impact on running MySQL instances
> >>>
> >>>CON:
> >>>- Impact on other databases that now lose precision, and might for a
> >>> brief time show different behavior
> >>>- Code to maintain, cannot use .now() directly
> >>>- Be very careful when using date time and accessing the DB
> >>>
> >>>
> >>>There was some back and forth discussion on Gitter about this, but we
> >>> don’t seem to reach a conclusion. Hence I would like to call for a
> vote -
> >>> at this election day :). Of course with arguments if needed. If there
> is
> >> a
> >>> better way I’m of course open to that.
> >>>
> >>>
> >>>I vote for OPTION 1.
> >>>
> >>>Bolke
> >>>
> >>>
> >>>
> >>>
> >>
>
>
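
The mismatch discussed in this thread can be sketched in plain Python. This is an illustration only: dropping the fraction with `replace(microsecond=0)` stands in for a DATETIME column that has no fractional-seconds part, which is an assumption about how the stored value looks, and the helper name `now_without_fraction` is hypothetical.

```python
from datetime import datetime

# Timestamp as produced by the application, with a fractional part.
execution_date = datetime(2016, 1, 1, 0, 0, 0, 10000)

# What a column without fractional seconds would hold for that value
# (modelled here by simply dropping the fraction).
stored = execution_date.replace(microsecond=0)

# A strict equality lookup with the original value no longer finds the row.
matches = execution_date == stored
print(matches)


def now_without_fraction():
    """Option 2 from the thread: strip the fraction before it reaches the DB."""
    return datetime.now().replace(microsecond=0)


print(now_without_fraction().microsecond)
```

Option 1 avoids the helper entirely by letting the column keep the fraction, which is why it needs a schema change but no code change.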


Gantt chart broken on master?

2016-11-12 Thread siddharth anand
Gantt chart is broken for me on master.

I think it's due to this merge.
https://github.com/apache/incubator-airflow/commit/868bc83137adca0ebfd5780f0dff5a7bfdfaadf9

Why is an end_date needed?

[image: Inline image 1]

This is the tree view:
[image: Inline image 2]

Sumit, as the merger/committer, can you confirm?

-s


Fwd: Cold Case PR Cleanup -- Current Status

2016-11-05 Thread siddharth anand
Committers,
If you have time this week, please make a push to get your cold case
PR cleanup done. The deadline is Nov 15, just in time for our WePay meetup.
I will be making an announcement there.

-s

-- Forwarded message --
From: *siddharth anand* <san...@apache.org>
Date: Friday, November 4, 2016
Subject: Cold Case PR Cleanup -- Current Status
To: dev@airflow.incubator.apache.org


We have a little over 50 open PRs. We need to get to 10 by the end of the
year. Our current rate of new PRs (i.e. during this holiday season) is a
handful a week, so 10 open PRs roughly equates to PRs opened within a 2
week period.

2 week turn-around times for PR review should be the commitment the
maintainer group sticks to. BTW, I've seen a few contributors pitch in by
reviewing PRs. That is extremely helpful and speeds up PR review/merge
times. Please continue.

Here's the current cold-case status.

[image: Inline image 1]

-s

On Wed, Nov 2, 2016 at 12:50 PM, siddharth anand <san...@apache.org> wrote:

> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>
> We are now, for the first time since I can remember, under *60* open PRs!
> Woo hoo! Keep up the pressure, committers! If you haven't yet closed
> your tracked PRs, please do so soon.
>
> Exactly 1 month ago (Oct 2), when this endeavor started, we were at *110*
> open PRs.
>
> [image: Inline image 1]
> -s
>
> On Wed, Nov 2, 2016 at 12:30 AM, siddharth anand <san...@apache.org> wrote:
>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>>
>> [image: Inline image 1]
>>
>> -s
>>
>
>



-- 
Sent from Gmail Mobile


Re: Cold Case PR Cleanup -- Current Status

2016-11-04 Thread siddharth anand
We have a little over 50 open PRs. We need to get to 10 by the end of the
year. Our current rate of new PRs (i.e. during this holiday season) is a
handful a week, so 10 open PRs roughly equates to PRs opened within a 2
week period.

2 week turn-around times for PR review should be the commitment the
maintainer group sticks to. BTW, I've seen a few contributors pitch in by
reviewing PRs. That is extremely helpful and speeds up PR review/merge
times. Please continue.

Here's the current cold-case status.

[image: Inline image 1]

-s

On Wed, Nov 2, 2016 at 12:50 PM, siddharth anand <san...@apache.org> wrote:

> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>
> We are now, for the first time since I can remember, under *60* open PRs!
> Woo hoo! Keep up the pressure, committers! If you haven't yet closed
> your tracked PRs, please do so soon.
>
> Exactly 1 month ago (Oct 2), when this endeavor started, we were at *110*
> open PRs.
>
> [image: Inline image 1]
> -s
>
> On Wed, Nov 2, 2016 at 12:30 AM, siddharth anand <san...@apache.org>
> wrote:
>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>>
>> [image: Inline image 1]
>>
>> -s
>>
>
>


Re: Cold Case PR Cleanup -- Current Status

2016-11-02 Thread siddharth anand
https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution

We are now, for the first time since I can remember, under *60* open PRs!
Woo hoo! Keep up the pressure, committers! If you haven't yet closed
your tracked PRs, please do so soon.

Exactly 1 month ago (Oct 2), when this endeavor started, we were at *110*
open PRs.

[image: Inline image 1]
-s

On Wed, Nov 2, 2016 at 12:30 AM, siddharth anand <san...@apache.org> wrote:

> https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution
>
> [image: Inline image 1]
>
> -s
>


Re: Astronomer.io Airflow blog post

2016-11-02 Thread siddharth anand
Merged.

On Wed, Nov 2, 2016 at 11:43 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Here:
> https://github.com/apache/incubator-airflow/pull/1863
>
>
> On Wed, Nov 2, 2016 at 10:51 AM, siddharth anand <san...@apache.org>
> wrote:
>
> > Removing laurel's email address as it appears to be wrong.
> >
> > -s
> >
> > On Wed, Nov 2, 2016 at 10:49 AM, siddharth anand <san...@apache.org>
> > wrote:
> >
> > > +1 for that idea. We should place all links on the wiki and just have
> the
> > > project page point to the wiki!
> > > (https://airflow.incubator.apache.org/project.html).
> > >
> > > Gerard, would you like to file a quick PR for that change and I can
> > > approve/merge it?
> > > -s
> > >
> > > On Wed, Nov 2, 2016 at 10:37 AM, Gerard Toonstra <gtoons...@gmail.com>
> > > wrote:
> > >
> > >> They are both on the project page of the airflow documentation in
> > >> resources
> > >> & links and on the wiki, the wiki is a bit
> > >> richer in that regard. Maybe link to the wiki from the doc pages
> > instead,
> > >> so it's all in one place?
> > >>
> > >> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links
> > >>
> > >> https://airflow.incubator.apache.org/project.html
> > >>
> > >> G>
> > >>
> > >>
> > >>
> > >> On Wed, Nov 2, 2016 at 5:13 PM, Maxime Beauchemin <
> > >> maximebeauche...@gmail.com> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > Laurel Brunk reached out to share this great blog post he wrote
> about
> > >> using
> > >> > Airflow at Astronomer:
> > >> > http://www.astronomer.io/blog/airflow-at-astronomer
> > >> >
> > >> > Where should we link out to this type of material? README.md? The
> > >> > Confluence wiki?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Max
> > >> >
> > >>
> > >
> > >
> >
>


Re: Astronomer.io Airflow blog post

2016-11-02 Thread siddharth anand
Removing laurel's email address as it appears to be wrong.

-s

On Wed, Nov 2, 2016 at 10:49 AM, siddharth anand <san...@apache.org> wrote:

> +1 for that idea. We should place all links on the wiki and just have the
> project page point to the wiki!
> (https://airflow.incubator.apache.org/project.html).
>
> Gerard, would you like to file a quick PR for that change and I can
> approve/merge it?
> -s
>
> On Wed, Nov 2, 2016 at 10:37 AM, Gerard Toonstra <gtoons...@gmail.com>
> wrote:
>
>> They are both on the project page of the airflow documentation in
>> resources
>> & links and on the wiki, the wiki is a bit
>> richer in that regard. Maybe link to the wiki from the doc pages instead,
>> so it's all in one place?
>>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links
>>
>> https://airflow.incubator.apache.org/project.html
>>
>> G>
>>
>>
>>
>> On Wed, Nov 2, 2016 at 5:13 PM, Maxime Beauchemin <
>> maximebeauche...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > Laurel Brunk reached out to share this great blog post he wrote about
>> using
>> > Airflow at Astronomer:
>> > http://www.astronomer.io/blog/airflow-at-astronomer
>> >
>> > Where should we link out to this type of material? README.md? The
>> > Confluence wiki?
>> >
>> > Thanks,
>> >
>> > Max
>> >
>>
>
>


Re: Astronomer.io Airflow blog post

2016-11-02 Thread siddharth anand
+1 for that idea. We should place all links on the wiki and just have the
project page point to the wiki!
(https://airflow.incubator.apache.org/project.html).

Gerard, would you like to file a quick PR for that change and I can
approve/merge it?
-s

On Wed, Nov 2, 2016 at 10:37 AM, Gerard Toonstra 
wrote:

> They are both on the project page of the airflow documentation in resources
> & links and on the wiki, the wiki is a bit
> richer in that regard. Maybe link to the wiki from the doc pages instead,
> so it's all in one place?
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links
>
> https://airflow.incubator.apache.org/project.html
>
> G>
>
>
>
> On Wed, Nov 2, 2016 at 5:13 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > Hi,
> >
> > Laurel Brunk reached out to share this great blog post he wrote about
> using
> > Airflow at Astronomer:
> > http://www.astronomer.io/blog/airflow-at-astronomer
> >
> > Where should we link out to this type of material? README.md? The
> > Confluence wiki?
> >
> > Thanks,
> >
> > Max
> >
>


Cold Case PR Cleanup -- Current Status

2016-11-02 Thread siddharth anand
https://cwiki.apache.org/confluence/display/AIRFLOW/Cold-Case+PR+Resolution

[image: Inline image 1]

-s


Re: The best place to initialize a db table whenever airflow starts ?

2016-11-01 Thread siddharth anand
A DAG Run is a run of a particular DAG at a particular time. For example,
if your DAG were called FOO and its schedule interval were @hourly, then
you would have DAG runs: FOO @ noon, FOO @ 1p, FOO @ 2p, etc.

If you need to seed a table prior to each run, just add a task at the start
of your DAG to load the target table. You can do this simply by using a
PythonOperator, specifying your own custom Python callable -- your callable
would insert the necessary data.

I would warn that it is possible for Airflow to schedule DAG runs for FOO
in parallel, unless you specify depends_on_past=True. You should probably
structure your table to be keyed by the DAG run so that you can run
multiple DAG runs concurrently (i.e. depends_on_past=False).
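
A minimal sketch of a seeding callable keyed by the DAG run. It uses an in-memory SQLite database as a stand-in for the MySQL table so it can be run as-is; the table name `static_info`, its columns, the function name `seed_static_info`, and the sample values are all assumptions made for illustration. In a DAG, a function like this would be the Python callable of the first task, with the run's execution date passed in.

```python
import sqlite3


def seed_static_info(conn, execution_date, rows):
    """Insert seed rows for one DAG run, keyed by its execution date."""
    # Remove rows left by an earlier attempt of the same run, so a retry
    # does not duplicate data or collide with the primary key.
    conn.execute(
        "DELETE FROM static_info WHERE execution_date = ?", (execution_date,)
    )
    conn.executemany(
        "INSERT INTO static_info (execution_date, name, value) VALUES (?, ?, ?)",
        [(execution_date, name, value) for name, value in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE static_info ("
    "execution_date TEXT, name TEXT, value TEXT, "
    "PRIMARY KEY (execution_date, name))"
)

# Two different runs seed their own rows without touching each other.
seed_static_info(conn, "2016-11-01T12:00:00", [("region", "us-west-2")])
seed_static_info(conn, "2016-11-01T13:00:00", [("region", "eu-west-1")])
# A retry of the first run replaces its rows instead of adding more.
seed_static_info(conn, "2016-11-01T12:00:00", [("region", "us-west-2")])

row_count = conn.execute("SELECT COUNT(*) FROM static_info").fetchone()[0]
print(row_count)
```

Because every row carries the run's key, concurrent runs only ever read and write their own slice of the table.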

-s

On Tue, Nov 1, 2016 at 7:09 AM, Michael Gong  wrote:

> Hi,
>
> I have a MySQL table that will store some static information. The
> information could be different for different airflow runs, so I hope to use
> python code to initialize it whenever airflow starts.
>
>
> Where is the best place to put such code ?
>
>
> Is the class DagBag's __init__() a good candidate ?
>
>
> Please advise.
>
>
> Thanks.
>
>
> #
>
> class DagBag(LoggingMixin):
>     """
>     A dagbag is a collection of dags, parsed out of a folder tree and has high
>     level configuration settings, like what database to use as a backend and
>     what executor to use to fire off tasks. This makes it easier to run
>     distinct environments for say production and development, tests, or for
>     different teams or security profiles. What would have been system level
>     settings are now dagbag level so that one system can run multiple,
>     independent settings sets.
>
>     :param dag_folder: the folder to scan to find DAGs
>     :type dag_folder: str
>     :param executor: the executor to use when executing task instances
>         in this DagBag
>     :param include_examples: whether to include the examples that ship
>         with airflow or not
>     :type include_examples: bool
>     :param sync_to_db: whether to sync the properties of the DAGs to
>         the metadata DB while finding them, typically should be done
>         by the scheduler job only
>     :type sync_to_db: bool
>     """
>     def __init__(
>             self,
>             dag_folder=None,
>             executor=DEFAULT_EXECUTOR,
>             include_examples=configuration.getboolean('core', 'LOAD_EXAMPLES'),
>             sync_to_db=False):
>
>         dag_folder = dag_folder or DAGS_FOLDER
>         self.logger.info("Filling up the DagBag from {}".format(dag_folder))
>         self.dag_folder = dag_folder
>         self.dags = {}
>         self.sync_to_db = sync_to_db
>         self.file_last_changed = {}
>         self.executor = executor
>         self.import_errors = {}
>         if include_examples:
>             example_dag_folder = os.path.join(
>
> ...
>
> #
>
>


Re: Possible airflow-pool bug

2016-11-01 Thread siddharth anand
Yes.. we have seen the over-subscription of pools. We do need a fix for it
--- I don't believe there is one. We need someone to own and fix it.. happy
to review a PR.

We use pools at Agari for all of our needs. We are okay with mild
oversubscription, so we do see numbers slightly higher, but our pipelines
work fine with that level. In your example, the oversubscription is much
higher.
-s

On Tue, Nov 1, 2016 at 6:38 AM, David Kegley  wrote:

> I've been seeing some weird behavior when using Airflow's execution pool
> feature.  Pools have been massively over-filled leading to failed tasks. I
> created a bug for this issue, but in the mean time, has anyone else
> experienced this behavior?
>
> https://issues.apache.org/jira/browse/AIRFLOW-584
>
> Best,
> David
>

