Re: Identifying delay between schedule & run instances

2018-08-09 Thread vardanguptacse



On 2018/08/09 06:27:30, Bolke de Bruin  wrote: 
> Hi vardang,
> 
> What do you intend to gain from this metric? Many factors influence the 
> difference between execution date and start date. You named one of them, 
> but there are also functional ones (limits reached etc). We are not a real 
> time system, so we never really purposefully aimed to lower that 
> difference.
> 
> B.
> 
Yes, you're right. Airflow is not meant for scheduling real-time scenarios, 
but as a service provider in our organization we wanted to arrive at a number 
before talking to our internal teams, so that we could state, for example, 
that at the 95th percentile the scheduling delay will be no more than x 
minutes.
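
One way to arrive at such a number is a nearest-rank percentile over the 
observed delays. The sketch below is only an illustration: the function name 
`percentile_nearest_rank` and the sample delays are hypothetical, and the 
delays are assumed to have been computed already (in minutes) from 
task_instance rows.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the nearest-rank percentile of values, with pct in (0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Nearest-rank method: the smallest value such that at least pct percent
    # of the observations are less than or equal to it (1-based rank).
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical scheduling delays in minutes, for illustration only.
sample_delays = [0.5, 0.7, 1.2, 0.4, 3.0, 0.9, 0.6, 0.8, 1.1, 12.0]
p95 = percentile_nearest_rank(sample_delays, 95)
print(f"95% of runs started within {p95} minutes")
```

With only a handful of samples a single outlier dominates the 95th 
percentile, so a figure like this is only meaningful over a large number of 
runs.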


Re: Identifying delay between schedule & run instances

2018-08-09 Thread Bolke de Bruin
Hi vardang,

What do you intend to gain from this metric? Many factors influence the 
difference between execution date and start date. You named one of them, but 
there are also functional ones (limits reached etc). We are not a real time 
system, so we never really purposefully aimed to lower that difference.

B.

Sent from my iPad

> On 9 Aug 2018, at 08:04, vardangupta...@gmail.com wrote:
> 
> Can someone look at the issue?


Re: Identifying delay between schedule & run instances

2018-08-09 Thread vardanguptacse



On 2018/08/06 07:07:05, vardangupta...@gmail.com wrote:
> Hi Everyone,
> 
> We want to calculate a metric that captures the delay (if any) between a 
> DAG becoming active in the scheduler & server and the tasks of that DAG 
> actually being kicked off (suppose start_date was 1 hour earlier and the 
> schedule is every 10 minutes).
> 
> Currently the task_instance table has execution_date, start_date, end_date 
> and queued_dttm. We can easily get this metric from the difference between 
> start_date and execution_date, but in the case of a backfill, 
> execution_date will be that of a previous schedule occurrence, so the 
> difference between start_date and execution_date will be skewed. The 
> difference is fine for measuring the scheduling delay of future runs, but 
> for backfills the number won't be trustworthy. Any suggestions on how to 
> identify this metric smartly, perhaps by somehow knowing the backfill 
> details? Even in the DAG table there is no notion of create_date or 
> update_date that could tell me when the DAG was originally brought into 
> existence.
> 
> 
> Regards,
> Vardan Gupta
> 
Can someone look at the issue?


Identifying delay between schedule & run instances

2018-08-06 Thread vardanguptacse
Hi Everyone,

We want to calculate a metric that captures the delay (if any) between a DAG 
becoming active in the scheduler & server and the tasks of that DAG actually 
being kicked off (suppose start_date was 1 hour earlier and the schedule is 
every 10 minutes).

Currently the task_instance table has execution_date, start_date, end_date 
and queued_dttm. We can easily get this metric from the difference between 
start_date and execution_date, but in the case of a backfill, execution_date 
will be that of a previous schedule occurrence, so the difference between 
start_date and execution_date will be skewed. The difference is fine for 
measuring the scheduling delay of future runs, but for backfills the number 
won't be trustworthy. Any suggestions on how to identify this metric smartly, 
perhaps by somehow knowing the backfill details? Even in the DAG table there 
is no notion of create_date or update_date that could tell me when the DAG 
was originally brought into existence.


Regards,
Vardan Gupta
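
A minimal sketch of the calculation described above, under assumptions that 
are not confirmed in this thread: a scheduled run for a given execution_date 
is expected to start once its schedule interval has elapsed (execution_date 
plus the interval), and backfill runs can be recognized by a run_id that 
starts with `backfill_`. The row layout, the `scheduling_delays` helper and 
the sample data are hypothetical; real rows would have to come from the 
metadata database, for example by joining task_instance with the DAG run 
information.

```python
from datetime import datetime, timedelta

# Assumption: backfill runs carry this run_id prefix. Verify against your
# own metadata database before relying on it.
BACKFILL_PREFIX = "backfill_"


def scheduling_delays(rows, interval):
    """Return start_date - (execution_date + interval) for non-backfill rows."""
    delays = []
    for row in rows:
        if row["run_id"].startswith(BACKFILL_PREFIX):
            continue  # backfilled runs would skew the metric
        if row["start_date"] is None:
            continue  # task has not started yet
        expected_start = row["execution_date"] + interval
        delays.append(row["start_date"] - expected_start)
    return delays


# Hypothetical rows for a DAG scheduled every 10 minutes.
rows = [
    {"run_id": "scheduled__2018-08-06T07:00:00",
     "execution_date": datetime(2018, 8, 6, 7, 0),
     "start_date": datetime(2018, 8, 6, 7, 10, 45)},
    {"run_id": "backfill_2018-08-06T06:00:00",
     "execution_date": datetime(2018, 8, 6, 6, 0),
     "start_date": datetime(2018, 8, 6, 7, 12, 0)},
    {"run_id": "scheduled__2018-08-06T07:10:00",
     "execution_date": datetime(2018, 8, 6, 7, 10),
     "start_date": None},
]

delays = scheduling_delays(rows, timedelta(minutes=10))
print([d.total_seconds() for d in delays])
```

Only the first row contributes here: the backfill row is skipped by its 
run_id and the last row has no start_date yet. Using queued_dttm instead of 
start_date would measure the time until the task was queued rather than the 
time until it began running.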