At Slack, we follow a similar pattern of deploying multiple Airflow
instances. Since the Airflow UI and the scheduler are coupled, this
introduces friction: users need to know the underlying deployment strategy
(e.g. which Airflow URL to visit to see their DAGs, how multiple teams
collaborating on the
That is an interesting question. On a slightly related note, correct me if
I'm wrong, but AFAIK we need to restart the Airflow scheduler for it to
pick up any new DAG file changes. In that case, should the scheduler do
the DAG file processing every time before scheduling the tasks?
Regards,
It is a bit tricky;
*Step 1:*
You can write an SLA miss callback, send the email from the callback, and
empty the `slas` object so that Airflow won't send its own SLA miss email.
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L688
*Step 2:*
You can reuse Airflow's `send_email` method.
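A minimal sketch of the two steps above, in plain Python so it runs standalone. In a real DAG file you would replace `notify` with `airflow.utils.email.send_email(to=..., subject=..., html_content=...)` and register the callback via `DAG(..., sla_miss_callback=sla_miss_callback)`; the exact signature the scheduler uses may vary across Airflow versions, so treat this as an assumption to verify against your jobs.py:

```python
def notify(dag_id, task_list):
    # Stand-in for Airflow's send_email helper (Step 2); here we just print.
    print("SLA miss on %s: %s" % (dag_id, task_list))

def sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    # Step 2: send the custom alert ourselves.
    notify(getattr(dag, "dag_id", dag), task_list)
    # Step 1: empty `slas` IN PLACE (not `slas = []`). The scheduler
    # inspects this same list right after the callback returns, and an
    # empty list leaves it nothing to send its own SLA miss email about.
    del slas[:]
```

The in-place `del slas[:]` is the key detail: rebinding the name inside the callback would not affect the list the scheduler still holds.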
Since you are already on AWS, the simplest thing I can think of is to write
a signal file once the job finishes and have the downstream job wait for
the signal file. In other words, the same pattern Hadoop jobs use: write a
`_SUCCESS` file and have the downstream jobs depend on the signal file.
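A rough sketch of that wait-for-signal-file pattern, using a local path for illustration. On S3 the same idea is a key-existence check (e.g. Airflow's `S3KeySensor`, or a `boto3` head-object call); the function name and timeout values here are just assumptions:

```python
import os
import time

def wait_for_signal(path, poke_interval=60, timeout=3600):
    """Block until the upstream job's signal file (e.g. _SUCCESS) exists.

    Polls every `poke_interval` seconds, giving up after `timeout` seconds.
    """
    waited = 0
    while not os.path.exists(path):
        if waited >= timeout:
            raise TimeoutError("no signal file at %s after %ss" % (path, timeout))
        time.sleep(poke_interval)
        waited += poke_interval
    return path
```

In Airflow terms this is exactly what a sensor task does: the downstream DAG's first task pokes for the signal, and the real work only starts once it appears.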
Since we are talking about the SLA implementation: the current SLA miss
implementation is part of the scheduler code. So in cases where the
scheduler maxes out its processes or is not running for some reason, we
will miss all the SLA alerts. It is worth decoupling the SLA alert from
the scheduler path and run
This is awesome. Thanks for sharing the link.
Regards,
Ananth.P,
On 28 March 2018 at 21:30, Tao Feng wrote:
> Thanks Max for sharing the link. This is great :)
>
> -Tao
>
> On Wed, Mar 28, 2018 at 9:21 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> >
We can't do that, unfortunately. Airflow schedules the task based on the
current state in the DB. If you would like to preserve the history, one
option would be to add instrumentation in airflow_local_settings.py
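One possible shape for that instrumentation, sketched in plain Python: Airflow applies a `policy(task)` hook from airflow_local_settings.py to every task at parse time, so you can attach callbacks that copy run outcomes to your own audit store before the DB state is overwritten. `record_run` and its print-based sink are assumptions for illustration only:

```python
import json

def record_run(context, status):
    # Stand-in for writing to an external audit store (file, Kafka, etc.)
    # so task history survives clears and re-runs in the Airflow DB.
    record = {"task_id": context.get("task_id"), "status": status}
    print(json.dumps(record))
    return record

def policy(task):
    # Applied by Airflow to every task at DAG parse time when defined
    # in airflow_local_settings.py.
    task.on_success_callback = lambda ctx: record_run(ctx, "success")
    task.on_failure_callback = lambda ctx: record_run(ctx, "failed")
```

The callback context Airflow passes is richer than the dict used here (it includes the task instance, execution date, etc.); the sketch only shows where the hook point sits.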
Regards,
Ananth.P,
On 5 February 2018 at 13:09, David Capwell
I can give a talk about all the hacks we did to scale Airflow's
LocalExecutor and improve the data pipeline on-call experience at Slack,
if folks are interested.
Regards,
Ananth.P,
On 8 January 2018 at 15:51, George Leslie-Waksman <
geo...@cloverhealth.com.invalid> wrote:
> +1 and would love
Hi there,
I'm having a hard time understanding Airflow parallelism. I'm running
Airflow with the following configuration:
Executor: LocalExecutor
parallelism: 156
Airflow version: 1.8.1
I assume the maximum number of "Task Instances" would be 156 at any given
time, but I constantly see the active
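For context on this question: `parallelism` is only one of several concurrency caps in airflow.cfg, so the observed number of active task instances can differ from it. A sketch of the related 1.8-era settings (names from that config's `[core]` section; the values are only examples):

```ini
[core]
# Global cap on task instances running across the whole installation.
parallelism = 156
# Cap on concurrently running task instances per DAG.
dag_concurrency = 16
# Tasks not assigned to a pool share this many slots.
non_pooled_task_slot_count = 128
# Cap on concurrently active DAG runs per DAG.
max_active_runs_per_dag = 16
```

The effective concurrency is the tightest of these limits (plus any per-task pool), which is a common reason the active count never reaches `parallelism`.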