scheduler questions

2016-10-13 Thread Boris Tyukin
Hello all and thanks for such an amazing project! I have been evaluating Airflow and spent a few days reading about it and playing with it and I have a few questions that I struggle to understand. Let's say I have a simple DAG that runs once a day and it is doing a full reload of tables from the

Re: scheduler questions

2016-10-13 Thread Joseph Napolitano
Hi Boris, To answer the first question, the backfill command has a flag to mark jobs as successful without running them. Take care to align the start and end times precisely as needed. As an example, for a job that runs daily at 7am: airflow backfill -s 2016-10-07T07 -e 2016-10-10T07
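A minimal sketch of the kind of command Joseph describes, assembled in Python. It assumes the Airflow 1.x CLI, where `-m` (`--mark_success`) is the backfill flag that marks jobs as successful without running them. The DAG id `daily_reload` and the helper name `build_mark_success_backfill` are hypothetical and do not come from the thread.

```python
from datetime import datetime, timedelta


def build_mark_success_backfill(dag_id, first_run, last_run):
    """Return the argv for a backfill that only marks runs as successful.

    Assumption: Airflow 1.x CLI syntax, `airflow backfill -m -s START -e END dag_id`.
    """
    fmt = "%Y-%m-%dT%H:%M:%S"
    return [
        "airflow", "backfill",
        "-m",                             # --mark_success: do not execute the tasks
        "-s", first_run.strftime(fmt),    # first execution date to cover
        "-e", last_run.strftime(fmt),     # last execution date to cover
        dag_id,
    ]


# A job that runs daily at 7am: align both bounds to the 07:00 schedule.
first = datetime(2016, 10, 7, 7)
last = first + timedelta(days=3)
cmd = build_mark_success_backfill("daily_reload", first, last)
print(" ".join(cmd))
```

Building the bounds from one `datetime` plus a whole number of days keeps the start and end aligned to the same time of day, which is the precision the reply asks for.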

Re: scheduler questions

2016-10-13 Thread siddharth anand
If you use depends_on_past=True, it won't proceed to the next DAG Run if the previous DAG Run failed. If Day 2 fails, Day 3 won't run. -s On Thu, Oct 13, 2016 at 10:34 AM, siddharth anand wrote: > Yes! It does work with Depends_on_past=True. > -s > > On Thu, Oct 13, 2016 at
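The "if Day 2 fails, Day 3 won't run" behaviour can be illustrated with a toy model. This is not Airflow code: the function name `simulate_runs` and the three state labels are made up for illustration, and real Airflow evaluates `depends_on_past` per task instance rather than per whole run.

```python
def simulate_runs(would_succeed, depends_on_past=True):
    """Toy model of consecutive daily runs.

    would_succeed: list of bools, one per day, saying whether that day's
    run would succeed if it were allowed to execute.
    Returns one state per day: "success", "failed", or "blocked".
    """
    states = []
    for ok in would_succeed:
        previous_failed = bool(states) and states[-1] != "success"
        if depends_on_past and previous_failed:
            # The previous day did not succeed, so this day never starts.
            states.append("blocked")
        else:
            states.append("success" if ok else "failed")
    return states


# Day 1 succeeds, Day 2 fails, Day 3 would succeed if it ran.
with_flag = simulate_runs([True, False, True], depends_on_past=True)
without_flag = simulate_runs([True, False, True], depends_on_past=False)
print(with_flag)
print(without_flag)
```

In this model, a blocked day also blocks the day after it, so nothing proceeds until the failed run is fixed or cleared.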

Re: scheduler questions

2016-10-13 Thread siddharth anand
*Question 2* You can use depends_on_past=True. Then, future dag runs won't be scheduled until past ones succeed, which I specify as shown below: default_args = { 'owner': 'sanand', 'depends_on_past': True, 'pool': 'ep_data_pipeline', 'start_date': START_DATE, 'email':

Re: +1 on PRs!

2016-10-13 Thread siddharth anand
These are all good ideas and any would work. Our committer list is small enough that no committer would merge without either +1'ing it himself/herself or checking that another did. Sure, there could be corner cases where a contributor +1'd a PR before being promoted to a committer and then

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
So for my second scenario, I think I would still need to run the missing days' jobs one by one (by clearing the failed ones), and I understand this is the recommended approach, as I figured from Maxime's video. But sometimes it is more efficient to combine all missing day runs in one so I would be using a

Re: scheduler questions

2016-10-13 Thread siddharth anand
Boris, *Question 1* Only_Run_Latest is in master - https://github.com/apache/incubator-airflow/commit/edf033be65b575f44aa221d5d0ec9ecb6b32c67a. That will solve your problem. Releases come out once a quarter, sometimes once every 2 quarters, so I would recommend that you run off master or off your

Re: scheduler questions

2016-10-13 Thread siddharth anand
Yes! It does work with Depends_on_past=True. -s On Thu, Oct 13, 2016 at 10:28 AM, Boris Tyukin wrote: > thanks so much, Sid! just a follow up question on "Only_Run_Latest" - will > it work with depend_on_past = True? or it will assume that DAG is used > False? > > On Thu,

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
Thanks so much, Sid! Just a follow-up question on "Only_Run_Latest" - will it work with depends_on_past = True? Or will it assume that the DAG uses False? On Thu, Oct 13, 2016 at 1:11 PM, siddharth anand wrote: > Boris, > > *Question 1* > Only_Run_Latest is in master - >

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
This is not what I see actually. I posted below my test DAG and a screenshot. It does create DAGRuns on subsequent runs - I modeled that scenario by commenting out one bash command and uncommenting another one with exit 1. It does not create Task Instances on subsequent failed DAGs but it does

Re: Next Airflow meet-up

2016-10-13 Thread George Leslie-Waksman
Can someone add "gwax" to the editor list for Confluence so I can add Clover Health as the host of the subsequent meetup? I assume early December is too soon; how do folks feel about mid-January? --George Leslie-Waksman On Wed, Oct 12, 2016 at 12:22 AM Alex Van Boxel wrote:

Re: scheduler questions

2016-10-13 Thread siddharth anand
I can't see an image. We run most of our dags with depends_on_past=True. If you want to chain your dag runs, such as not starting the first task of your dag run until the last task of your previous dag run completes, you can use an external task sensor. The external task sensor would be

A question/poll on the TaskInstance data model...

2016-10-13 Thread Ben Tallman
I (and Apigee) would like to have the DAG Graph paint old DagRuns based on the tasks (and ids) that ran, and not based off of the current DAG from the DagBag. In order to do that, I need to be able to map a DagRun, and one way is from the TaskInstance table. However that doesn't actually contain

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
You rock, Sid! Thanks for taking the time to explain it to me. On Thu, Oct 13, 2016 at 6:10 PM, siddharth anand wrote: > I can't see an image. > > We run most of our dags with depends_on_past=True. > > If you want to chain your dag runs, such as not starting the first task

Airflow Logging

2016-10-13 Thread Maycock, Luke
Hi All, We (owlabs - fork: https://github.com/owlabs/incubator-airflow) have a high level design for how to improve the logging throughout the Airflow code to be more consistent, maintainable and extensible. We'd really appreciate any feedback on the design. Design for Consolidating Logging

Re: +1 on PRs!

2016-10-13 Thread Arthur Wiedmer
Another way to do this would be to use the reactions on Github. Use the thumbs-up reaction to upvote an issue or PR you care about. And reserve the +1s for Committer Code review. On top of that, it makes it easier to tally the votes. Best, Arthur On Tue, Oct 11, 2016 at 2:31 PM, siddharth anand