Re: scheduler questions

2016-10-17 Thread Maycock, Luke
From: siddharth anand Sent: 13 October 2016 18:11 To: dev@airflow.incubator.apache.org Subject: Re: scheduler questions Boris, *Question 1* Only_Run_

Re: scheduler questions

2016-10-15 Thread Ben Tallman
Boris - The pull request includes an airflow.cfg config entry to set backfill=False by default. [scheduler] backfill_by_default=(*true*|false) Thanks, Ben *--* *ben tallman* | *apigee

Re: scheduler questions

2016-10-14 Thread Boris Tyukin
Done. It makes sense to me, Ben, as the backfill concept is very confusing to me. I think it should even be off by default. On 2016-10-13 12:05 (-0400), Ben Tallman wrote: > Boris - > > We have a pull request in, which causes the scheduler to not backfill on a > per-DAG basis. This is designed for e

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
You rock, Sid! Thanks for taking the time to explain it to me. On Thu, Oct 13, 2016 at 6:10 PM, siddharth anand wrote: > I can't see an image. > > We run most of our dags with depends_on_past=True. > > If you want to chain your dag runs, such as not starting the first task of > your dag run sta

Re: scheduler questions

2016-10-13 Thread siddharth anand
I can't see an image. We run most of our dags with depends_on_past=True. If you want to chain your dag runs, such as not starting the first task of your dag run until the last task of your previous dag run completes, you can use an external task sensor. The external task sensor would be th

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
This is not what I actually see. I posted my test DAG and a screenshot below. It does create DAGRuns on subsequent runs - I modeled that scenario by commenting out one bash command and uncommenting another one with Exit 1. It does not create Task Instances on subsequent failed DAGs, but it does create

Re: scheduler questions

2016-10-13 Thread siddharth anand
If you use depends_on_past=True, it won't proceed to the next DAG Run if the previous DAG Run failed. If Day 2 fails, Day 3 won't run. -s On Thu, Oct 13, 2016 at 10:34 AM, siddharth anand wrote: > Yes! It does work with depends_on_past=True. > -s > > On Thu, Oct 13, 2016 at 10:28 AM, Boris Tyuk
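The "If Day 2 fails, Day 3 won't run" behaviour can be illustrated with a small standalone simulation. This is only a toy model, not Airflow code: the function name `simulate_runs` and the state labels are made up for illustration, and it treats each day as a single task.

```python
def simulate_runs(outcomes, depends_on_past):
    """Toy model of daily runs of a single task.

    outcomes: list of bools, True if that day's task would succeed when run.
    Returns one state per day: "success", "failed", or "blocked"
    (blocked = not started because the previous day did not succeed).
    """
    states = []
    for would_succeed in outcomes:
        previous_ok = not states or states[-1] == "success"
        if depends_on_past and not previous_ok:
            # The previous day's instance did not succeed, so this one waits.
            states.append("blocked")
        else:
            states.append("success" if would_succeed else "failed")
    return states


# Day 1 succeeds, Day 2 fails, Day 3 would succeed if it were allowed to run.
print(simulate_runs([True, False, True], depends_on_past=True))
print(simulate_runs([True, False, True], depends_on_past=False))
```

With the flag on, Day 3 stays blocked until Day 2 is cleared and succeeds; with it off, Day 3 runs regardless.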

Re: scheduler questions

2016-10-13 Thread siddharth anand
Yes! It does work with depends_on_past=True. -s On Thu, Oct 13, 2016 at 10:28 AM, Boris Tyukin wrote: > thanks so much, Sid! just a follow up question on "Only_Run_Latest" - will > it work with depends_on_past = True? or it will assume that DAG is used > False? > > On Thu, Oct 13, 2016 at 1:11 PM

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
So for my second scenario, I think I would still need to run the missing days' jobs one by one (by clearing the failed ones), and I understand this is the recommended approach, as I gathered from Maxime's video. But sometimes it is more efficient to combine all missing day runs in one so I would be using a wi

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
Thanks so much, Sid! Just a follow-up question on "Only_Run_Latest" - will it work with depends_on_past = True? or it will assume that DAG is used False? On Thu, Oct 13, 2016 at 1:11 PM, siddharth anand wrote: > Boris, > > *Question 1* > Only_Run_Latest is in master - > https://github.com/apache/

Re: scheduler questions

2016-10-13 Thread siddharth anand
*Question 2* You can use depends_on_past=True. Then, future dag runs won't be scheduled until past ones succeed, which I specify as shown below: default_args = { 'owner': 'sanand', 'depends_on_past': True, 'pool': 'ep_data_pipeline', 'start_date': START_DATE, 'email': [import_ep_

Re: scheduler questions

2016-10-13 Thread siddharth anand
Boris, *Question 1* Only_Run_Latest is in master - https://github.com/apache/incubator-airflow/commit/edf033be65b575f44aa221d5d0ec9ecb6b32c67a. That will solve your problem. Releases come out once a quarter, sometimes once every 2 quarters, so I would recommend that you run off master or off your o

Re: scheduler questions

2016-10-13 Thread Ben Tallman
Boris - We have a pull request in, which causes the scheduler to not backfill on a per-DAG basis. This is designed for exactly this situation. Basically, the scheduler will skip intervals and jump to the last one in the list if this flag is set. If this is important to you, please vote for it. htt
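The "skip intervals and jump to the last one" idea can be sketched in plain Python. This is a rough illustration only and not the code from the pull request: `intervals_to_schedule` and its `backfill` parameter are hypothetical names, and the sketch assumes an interval becomes schedulable once its end time has passed.

```python
from datetime import datetime, timedelta


def intervals_to_schedule(start_date, now, interval, backfill):
    """Return the execution dates a scheduler would pick up.

    Assumption: an interval starting at `d` is schedulable once
    `d + interval` has passed. With backfill=False only the most
    recent schedulable interval is kept.
    """
    dates = []
    current = start_date
    while current + interval <= now:
        dates.append(current)
        current += interval
    if not backfill:
        # Skip everything that was missed and jump to the last interval.
        return dates[-1:]
    return dates


start = datetime(2016, 10, 7)
now = datetime(2016, 10, 13, 12, 0)
print(intervals_to_schedule(start, now, timedelta(days=1), backfill=True))
print(intervals_to_schedule(start, now, timedelta(days=1), backfill=False))
```

With backfill on, all six missed daily intervals are returned; with it off, only the one for 2016-10-12.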

Re: scheduler questions

2016-10-13 Thread Joseph Napolitano
Hi Boris, To answer the first question, the backfill command has a flag to mark jobs as successful without running them. Take care to align the start and end times precisely as needed. As an example, for a job that runs daily at 7am: airflow backfill -s 2016-10-07T07 -e 2016-10-10T07 my-dag-nam
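To check that the start and end times line up as intended before running a backfill, the covered execution dates can be listed with a few lines of standalone Python. This is a sketch only: `daily_execution_dates` is a made-up helper, and it assumes a daily schedule and that both the start and the end timestamp are included in the range.

```python
from datetime import datetime, timedelta


def daily_execution_dates(start, end):
    """List daily timestamps from start to end, both included.

    Timestamps use the same YYYY-MM-DDTHH form as in the example above.
    """
    fmt = "%Y-%m-%dT%H"
    current = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    dates = []
    while current <= last:
        dates.append(current.strftime(fmt))
        current += timedelta(days=1)
    return dates


# The 7am daily job from the example above.
for stamp in daily_execution_dates("2016-10-07T07", "2016-10-10T07"):
    print(stamp)
```

If the hour in either bound does not match the job's schedule (for example T00 instead of T07), the list makes it easy to spot which run would be missed or added.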