Re: Single Airflow Instance Vs Multiple Airflow Instance

2018-06-07 Thread Ananth Durai
At Slack, we follow a similar pattern of deploying multiple Airflow instances. Since the Airflow UI and the scheduler are coupled, it introduces friction, as users need to know the underlying deployment strategy. (like which Airflow URL I should visit to see my DAGs, multiple teams collaborating on the

Re: Disable Processing of DAG file

2018-05-28 Thread Ananth Durai
It is an interesting question. On a slightly related note, correct me if I'm wrong, but AFAIK we require restarting the Airflow scheduler in order for it to pick up any new DAG file changes. In that case, should the scheduler do the DAG file processing every time before scheduling the tasks? Regards,

Re: Alert Emails Templatizing

2018-05-28 Thread Ananth Durai
It is a bit tricky. *Step 1:* you can write an SLA miss callback, send the email from the callback, and empty the `slas` object so that Airflow won't send its own SLA miss email. https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L688 *Step 2:* You can reuse Airflow's `send_email` method
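The two steps above can be sketched roughly as follows (a hedged sketch, not code from the thread: the callback name and recipient are my own assumptions, and a stub stands in for `airflow.utils.email.send_email` so the sketch runs on its own):

```python
# Stand-in for airflow.utils.email.send_email, so this sketch is
# self-contained; a real DAG would import the airflow helper instead.
def send_email(to, subject, html_content):
    print("would send to {}: {}".format(to, subject))


def custom_sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    # Step 1: send our own templatized alert instead of Airflow's default.
    send_email(to=["data-oncall@example.com"],  # assumed recipient
               subject="[SLA miss] {}".format(task_list),
               html_content="SLA missed for tasks: {}".format(task_list))
    # Step 2: empty `slas` in place, so the scheduler's built-in
    # SLA-miss email has nothing left to report on.
    del slas[:]
```

The callback would then be wired up via the DAG's `sla_miss_callback` argument.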

Re: How to wait for external process

2018-05-28 Thread Ananth Durai
Since you're already on AWS, the simplest thing I can think of is to write a signal file once the job finishes, and have the downstream job wait for the signal file. In other words, it is the same pattern as Hadoop jobs writing a `_SUCCESS` file, with the downstream jobs depending on the signal file.
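A minimal sketch of that signal-file wait, using a local path for illustration (on AWS the downstream job would poll for the S3 key via the AWS API instead; the function name and timings here are assumptions, not from the thread):

```python
import os
import time


def wait_for_signal(path, poll_secs=30, timeout_secs=3600):
    """Block until the upstream job's signal file (e.g. _SUCCESS) exists,
    polling every `poll_secs` seconds, up to `timeout_secs` in total."""
    waited = 0
    while not os.path.exists(path):
        if waited >= timeout_secs:
            raise TimeoutError("no signal file at {}".format(path))
        time.sleep(poll_secs)
        waited += poll_secs
    return True
```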

Re: Improving Airflow SLAs

2018-05-03 Thread Ananth Durai
Since we are talking about the SLA implementation: the current SLA miss implementation is part of the scheduler code. So in cases where the scheduler maxes out its processes or is not running for some reason, we will miss all the SLA alerts. It is worth it to decouple the SLA alert from the scheduler path and run
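A hedged sketch of what a decoupled checker's core might look like: a small job run outside the scheduler (e.g. from cron) that reads task state from the metadata DB and flags anything past its deadline. Only the pure checking logic is shown; the DB query and the alerting side are omitted, and all names are my own assumptions:

```python
def find_sla_misses(task_runs, now):
    """task_runs: iterable of (task_id, deadline, finished_at_or_None)
    rows, e.g. pulled from the Airflow metadata DB by a cron job.
    Returns the task_ids that are past their deadline and not finished."""
    return [task_id
            for task_id, deadline, finished in task_runs
            if finished is None and now > deadline]
```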

Re: Awesome list of resources around Apache Airflow

2018-03-28 Thread Ananth Durai
This is awesome. Thanks for sharing the link. Regards, Ananth.P, On 28 March 2018 at 21:30, Tao Feng wrote: > Thanks Max for sharing the link. This is great :) > > -Tao > > On Wed, Mar 28, 2018 at 9:21 PM, Maxime Beauchemin < > maximebeauche...@gmail.com> wrote: > > >

Re: Rerunning task without cleaning DB?

2018-02-07 Thread Ananth Durai
We can't do that, unfortunately. Airflow schedules the task based on the current state in the DB. If you would like to preserve the history, one option would be to add instrumentation in airflow_local_settings.py Regards, Ananth.P, On 5 February 2018 at 13:09, David Capwell

Re: Q1 Airflow Bay Area Meetup

2018-01-08 Thread Ananth Durai
I can give a talk about all the hacks we did to scale the Airflow LocalExecutor and improve the data pipeline on-call experience at Slack, if folks are interested. Regards, Ananth.P, On 8 January 2018 at 15:51, George Leslie-Waksman < geo...@cloverhealth.com.invalid> wrote: > +1 and would love

Airflow parallelism

2017-07-17 Thread Ananth Durai
Hi there, I'm having a hard time understanding Airflow parallelism. I'm running Airflow with the following configuration: Executor: LocalExecutor, parallelism: 156, Airflow version: 1.8.1. I assume the maximum number of "Task Instances" would be 156 at any given time, but I constantly see the active
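For context, `parallelism` is only one of several caps in Airflow 1.8.x, and the per-DAG and non-pooled limits often bind before the global one. A sketch of the related `airflow.cfg` keys (values shown are the 1.8 defaults, except `parallelism`, which matches the question above):

```ini
[core]
# Global cap on task instances running across the whole installation
parallelism = 156
# Cap on concurrently running task instances per DAG (default 16)
dag_concurrency = 16
# Cap on concurrently active DAG runs per DAG
max_active_runs_per_dag = 16
# Tasks not assigned to a pool draw from this slot count in 1.8.x
non_pooled_task_slot_count = 128
```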