Re: scheduler questions

2016-10-17 Thread Maycock, Luke
www.oliverwyman.com From: siddharth anand <san...@apache.org> Sent: 13 October 2016 18:11 To: dev@airflow.incubator.apache.org Subject: Re: scheduler questions B

Re: scheduler questions

2016-10-15 Thread Ben Tallman
Boris - The pull request includes an airflow.cfg config entry to set backfill=False by default. [scheduler] backfill_by_default=(*true*|false) Thanks, Ben
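
To illustrate how a boolean entry like the one Ben describes could be read, here is a minimal sketch using Python's standard configparser. The option name backfill_by_default comes from the pull request mentioned above and may not exist in any released Airflow; the fallback of True is an assumption, and this is not how Airflow itself loads its config.

```python
import configparser

# Hypothetical airflow.cfg fragment. The option name is taken from the
# pull request discussed in this thread and is an assumption here.
CFG_TEXT = """
[scheduler]
backfill_by_default = false
"""


def backfill_enabled(cfg_text, fallback=True):
    """Return the [scheduler] backfill_by_default setting as a bool.

    The fallback value (used when the option is absent) is an assumption.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getboolean("scheduler", "backfill_by_default", fallback=fallback)


if __name__ == "__main__":
    print(backfill_enabled(CFG_TEXT))
```

configparser accepts true/false, yes/no, on/off and 1/0 for boolean options, so any of those spellings would parse.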

Re: scheduler questions

2016-10-14 Thread Boris Tyukin
Done. It makes sense to me, Ben, as the backfill concept is very confusing to me. I think it should even be off by default. On 2016-10-13 12:05 (-0400), Ben Tallman wrote: > Boris - > > We have a pull request in, which causes the scheduler to not backfill on a > per dag basis. This

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
you rock, Sid! thanks for taking the time to explain it to me On Thu, Oct 13, 2016 at 6:10 PM, siddharth anand wrote: > I can't see an image. > > We run most of our dags with depends_on_past=True. > > If you want to chain your dag runs, such as not starting the first task

Re: scheduler questions

2016-10-13 Thread siddharth anand
I can't see an image. We run most of our dags with depends_on_past=True. If you want to chain your dag runs, such as not starting the first task of your dag run until the last task of your previous dag run completes, you can use an external task sensor. The external task sensor would be

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
This is not what I see actually. I posted my test DAG and a screenshot below. It does create DAGRuns on subsequent runs - I modeled that scenario by commenting out one bash command and uncommenting another one with Exit 1. It does not create Task Instances on subsequent failed DAGs but it does

Re: scheduler questions

2016-10-13 Thread siddharth anand
If you use depends_on_past=True, it won't proceed to the next DAG Run if the previous DAG Run failed. If Day 2 fails, Day 3 won't run. -s On Thu, Oct 13, 2016 at 10:34 AM, siddharth anand wrote: > Yes! It does work with Depends_on_past=True. > -s > > On Thu, Oct 13, 2016 at
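
The "If Day 2 fails, Day 3 won't run" behaviour can be sketched with a toy model. This is not Airflow's scheduler code: the function name, the state labels and the single-task-per-day simplification are all assumptions made for illustration.

```python
def simulate_runs(outcomes):
    """Toy model of depends_on_past=True for one task per day.

    outcomes: list of (day, would_succeed) pairs in schedule order.
    Returns a dict mapping day -> "success", "failed" or "blocked".
    A day is "blocked" when the previous day did not succeed.
    """
    states = {}
    previous_ok = True  # the first run has no predecessor to wait on
    for day, would_succeed in outcomes:
        if not previous_ok:
            # The earlier run never succeeded, so this one does not start.
            states[day] = "blocked"
            continue
        states[day] = "success" if would_succeed else "failed"
        previous_ok = would_succeed
    return states


if __name__ == "__main__":
    print(simulate_runs([("day1", True), ("day2", False), ("day3", True)]))
```

In this model a blocked day stays blocked until the failed day is rerun successfully, which mirrors clearing the failed run and letting later days proceed.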

Re: scheduler questions

2016-10-13 Thread siddharth anand
Yes! It does work with depends_on_past=True. -s On Thu, Oct 13, 2016 at 10:28 AM, Boris Tyukin wrote: > thanks so much, Sid! just a follow up question on "Only_Run_Latest" - will > it work with depends_on_past = True? or will it assume that the DAG uses > False? > > On Thu,

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
so for my second scenario, I think I would still need to run the missing days' jobs one by one (by clearing the failed ones), and I understand this is the recommended approach, as I gathered from Maxime's video. But sometimes it is more efficient to combine all missing day runs in one so I would be using a

Re: scheduler questions

2016-10-13 Thread Boris Tyukin
thanks so much, Sid! just a follow up question on "Only_Run_Latest" - will it work with depends_on_past = True? or will it assume that the DAG uses False? On Thu, Oct 13, 2016 at 1:11 PM, siddharth anand wrote: > Boris, > > *Question 1* > Only_Run_Latest is in master - >

Re: scheduler questions

2016-10-13 Thread siddharth anand
*Question 2* You can use depends_on_past=True. Then, future dag runs won't be scheduled until past ones succeed, which I specify as shown below: default_args = { 'owner': 'sanand', 'depends_on_past': True, 'pool': 'ep_data_pipeline', 'start_date': START_DATE, 'email':
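
For readability, the visible part of that dictionary can be laid out as below. This is a sketch only: the value of START_DATE is hypothetical (the message does not show it), and the truncated 'email' entry is left out rather than guessed.

```python
from datetime import datetime

# Hypothetical value; the original message references START_DATE
# without showing how it is defined.
START_DATE = datetime(2016, 10, 1)

default_args = {
    "owner": "sanand",
    # Each task waits for its own previous scheduled instance to succeed.
    "depends_on_past": True,
    "pool": "ep_data_pipeline",
    "start_date": START_DATE,
    # The 'email' entry is cut off in the archived preview and omitted here.
}

# In an Airflow DAG file this dict would typically be passed along the
# lines of DAG("example_dag", default_args=default_args, ...), where
# "example_dag" is a made-up id.

if __name__ == "__main__":
    print(default_args["depends_on_past"])
```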

Re: scheduler questions

2016-10-13 Thread Joseph Napolitano
Hi Boris, To answer the first question, the backfill command has a flag to mark jobs as successful without running them. Take care to align the start and end times precisely as needed. As an example, for a job that runs daily at 7am: airflow backfill -s 2016-10-07T07 -e 2016-10-10T07
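
Aligning the start and end times is easier to get right when they are generated rather than typed. The sketch below builds the argument list for such a command. It assumes the mark-as-successful flag is spelled --mark_success (check `airflow backfill --help` for your version), and the DAG id "example_dag" and the helper name are hypothetical.

```python
from datetime import datetime


def backfill_command(dag_id, start, end, mark_success=True):
    """Build an `airflow backfill` argument list for the given window.

    The --mark_success spelling is an assumption about the CLI; the
    dag_id passed in is whatever DAG you want to backfill.
    """
    fmt = "%Y-%m-%dT%H"  # matches the 2016-10-07T07 style used above
    cmd = ["airflow", "backfill",
           "-s", start.strftime(fmt),
           "-e", end.strftime(fmt)]
    if mark_success:
        cmd.append("--mark_success")
    cmd.append(dag_id)
    return cmd


if __name__ == "__main__":
    # A job that runs daily at 7am, marked successful for Oct 7 through Oct 10.
    cmd = backfill_command("example_dag",
                           datetime(2016, 10, 7, 7),
                           datetime(2016, 10, 10, 7))
    print(" ".join(cmd))
```

The list form can be handed to subprocess.run directly, which avoids shell quoting issues.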