Re: [DISCUSS] Mechanism of SLA

2023-09-20, Sung Yun
Thank you for the input, Collin - noted. > On Sep 20, 2023, at 4:53 PM, Collin McNulty wrote: > I want to concur with Daniel and Damian, that the key behavior for SLA should be based on when a DAG or task _should_ have run, not when it actually started. I think

Re: [DISCUSS] Mechanism of SLA

2023-09-20, Collin McNulty
I want to concur with Daniel and Damian that the key behavior for SLA should be based on when a DAG or task _should_ have run, not when it actually started. I think that's important from a semantic standpoint, but I also think it's just an important behavior to have in Airflow, because that's how
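The difference between the two anchors can be made concrete with a small Python sketch. It is not Airflow's implementation. The function `sla_missed` and its `anchor` parameter are hypothetical names used only for illustration. The sketch shows how a run that starts late can hide an SLA miss when the deadline is measured from the actual start instead of from when the run should have started.

```python
from datetime import datetime, timedelta


def sla_missed(scheduled_start: datetime, actual_start: datetime,
               finished_at: datetime, sla: timedelta,
               anchor: str = "scheduled") -> bool:
    """Return True if a run finished after its SLA deadline.

    anchor="scheduled": deadline = when the run *should* have started + sla
    anchor="actual":    deadline = when the run actually started + sla
    """
    if anchor == "scheduled":
        base = scheduled_start
    elif anchor == "actual":
        base = actual_start
    else:
        raise ValueError(f"unknown anchor: {anchor!r}")
    return finished_at > base + sla


# A run scheduled for 06:00 that only starts at 07:30 (e.g. scheduler backlog)
# and finishes at 07:50, with a one-hour SLA:
s = datetime(2023, 9, 20, 6, 0)
a = datetime(2023, 9, 20, 7, 30)
f = datetime(2023, 9, 20, 7, 50)
print(sla_missed(s, a, f, timedelta(hours=1)))                   # True  (deadline 07:00)
print(sla_missed(s, a, f, timedelta(hours=1), anchor="actual"))  # False (deadline 08:30)
```

With the start-based anchor, the scheduler delay is invisible. That delay is exactly the kind of lateness the "should have run" reading is meant to catch.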

Re: [DISCUSS] Mechanism of SLA

2023-09-20, Sung Yun
> would therefore agree that to many it would be unintuitive if the behavior is 2 but it is called SLA. Damian -----Original Message----- From: Daniel Standish Sent: Wednesday, September 20, 2023 11:29 AM To: dev@airflow.apache.org Subject: Re: [DISCUSS] Mech

RE: [DISCUSS] Mechanism of SLA

2023-09-20, Damian Shaw
agree that to many it would be unintuitive if the behavior is 2 but it is called SLA. Damian -----Original Message----- From: Daniel Standish Sent: Wednesday, September 20, 2023 11:29 AM To: dev@airflow.apache.org Subject: Re: [DISCUSS] Mechanism of SLA I don't think of it as really a question

Re: [DISCUSS] Mechanism of SLA

2023-09-20, Daniel Standish
I don't think of it as really a question about accurate record keeping, but more a question of what an SLA is, i.e. when do you want the warning, or what do you want the warning based on. I think the idea has been that it really means "if task not done by X time each day then warn". And the
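Daniel's reading of an SLA ("if task not done by X time each day then warn") can be sketched as a wall-clock deadline check. This is a minimal illustration under stated assumptions: the function name `daily_deadline_missed` is hypothetical, and UTC deadlines are assumed. The point it shows is that the check fires even if the task never started, because it depends only on the clock and on whether the task has finished.

```python
from datetime import date, datetime, time, timezone
from typing import Optional


def daily_deadline_missed(day: date, deadline: time,
                          finished_at: Optional[datetime],
                          now: datetime) -> bool:
    """'If task not done by X time each day then warn.'

    `deadline` is a wall-clock time (assumed UTC) on `day`. `finished_at` is
    None while the task is running, queued, or not even scheduled yet:
    the check does not depend on the task ever starting.
    """
    cutoff = datetime.combine(day, deadline, tzinfo=timezone.utc)
    if finished_at is not None:
        return finished_at > cutoff
    # Not finished yet: it is a miss as soon as the clock passes the cutoff.
    return now > cutoff
```

For example, suppose the deadline is 09:00 UTC and the check runs at 09:30 UTC while the task has still not finished (or never started). The sketch reports a miss.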

Re: [DISCUSS] Mechanism of SLA

2023-09-19, Ayaan Rayaan
mohammadirfanmemon687...@gmail.com On Sep 19, 2023 10:19 AM, "Daniel Standish" wrote: > I was able to chat with a couple folks about this. Small sample, but the sentiment was, "this is just a timeout". In other words, if we're going to call this SLA, we really ought to evaluate against the

Re: [DISCUSS] Mechanism of SLA

2023-09-19, Sung Yun
Hi Daniel, Thank you for following up with the assessment. That’s an incredibly valuable data point. I know we may have some opportunity to talk about this topic more at the summit this week, but just for the sake of offering a reference of the other perspective, I would like to share this

Re: [DISCUSS] Mechanism of SLA

2023-09-18, Daniel Standish
I was able to chat with a couple folks about this. Small sample, but the sentiment was, "this is just a timeout". In other words, if we're going to call this SLA, we really ought to evaluate against the "this thing should have run by" time and not the actual start time. And, ideally, we should

Re: [DISCUSS] Mechanism of SLA

2023-09-13, Sung Yun
No problem! I very much appreciate your questions and critical thought process as well. It's been pretty difficult for me to fully understand how the SLA feature worked, given how overloaded and complicated the logic is in its current state. So it really helps to have another invested party

Re: [DISCUSS] Mechanism of SLA

2023-09-13, Daniel Standish
OK, so one difference here is that you're adding a new DAG SLA concept, which is useful. One subtle difference from what I think is the existing "concept" of SLA is that you are evaluating it against when it started, as opposed to when it should have started, and evaluating it only in the course of

Re: [DISCUSS] Mechanism of SLA

2023-09-13, Daniel Standish
First of all, thanks for being so charitable in engaging in this dialogue; I appreciate it. Yeah, I think the notion that maybe Airflow is making really impractical promises with SLA could well be true. One question for you as I continue to let this percolate: can you help me

Re: [DISCUSS] Mechanism of SLA

2023-09-13, Sung Yun
Hi Daniel, these are all really great points, and I'm going to attempt to answer all of them in no particular order: On Expectations / SLAs / Naming: I think you hit the nail on the head here, and I agree with you that the naming choice of SLA is very misleading. To my understanding,

Re: [DISCUSS] Mechanism of SLA

2023-09-12, Daniel Standish
> [1] Yes, that's correct. I believe that containing the SLA evaluation within the lifetime of a task as a duration-based SLA will still have a purpose. It's technically implemented like an execution_timeout, but the goal of the SLA check is to execute a callback without killing the task:
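The idea of a duration-based SLA that "execute[s] a callback without killing the task" can be sketched with a background timer. This is a hedged sketch, not Airflow's actual mechanism. The function `run_with_soft_sla` is a hypothetical name, and `threading.Timer` is a swapped-in technique chosen for brevity. Unlike a hard execution timeout, the callback fires once if the work is still running at the deadline, and the work is allowed to finish.

```python
import threading
from typing import Any, Callable


def run_with_soft_sla(fn: Callable[[], Any], sla_seconds: float,
                      on_miss: Callable[[], None]) -> Any:
    """Run fn() in the current thread. If it is still running after
    sla_seconds, a timer thread calls on_miss() once; fn() is NOT interrupted
    and runs to completion (unlike a hard execution timeout)."""
    timer = threading.Timer(sla_seconds, on_miss)
    timer.daemon = True        # never keep the process alive just for the timer
    timer.start()
    try:
        return fn()
    finally:
        timer.cancel()         # fn finished first: the callback never fires
```

One limitation is that the timer thread needs the GIL. If `fn` is stuck in C code that never releases it, the callback can be delayed. That is the kind of case the "separate process" suggestion elsewhere in this thread is meant to address.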

Re: [DISCUSS] Mechanism of SLA

2023-09-12, Sung Yun
Hi Daniel, Thank you for the review! I'm happy to keep having the discussion to make sure we can introduce the right way of implementing these solutions into Airflow. My general impression from the community in the discussions so far led me to believe that deprecating the problematic feature

Re: [DISCUSS] Mechanism of SLA

2023-09-12, Daniel Standish
Some questions for you, Sung. I tried to understand why we needed to remove behavior 3 discussed in the AIP: *[remove]* Task-level SLA measured from DAG-run scheduled start time. I'm just a little concerned that removing this would be a mistake because, in my mind, part of the essence of
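Behavior 3 from the AIP ("Task-level SLA measured from DAG-run scheduled start time") can be sketched as follows. The function and parameter names are hypothetical, and this is an illustration, not the AIP's specified implementation. Each task's deadline is the run's scheduled start plus that task's SLA, regardless of when the task itself started. Unfinished tasks are judged against the current time.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def tasks_missing_sla(run_scheduled_start: datetime,
                      sla_by_task: Dict[str, timedelta],
                      finished_at: Dict[str, datetime],
                      now: datetime) -> List[str]:
    """Task-level SLA measured from the DAG run's *scheduled* start.

    Task t's deadline is run_scheduled_start + sla_by_task[t], regardless of
    when t itself started. Unfinished tasks are judged against `now`.
    """
    missed = []
    for task_id, sla in sla_by_task.items():
        deadline = run_scheduled_start + sla
        done = finished_at.get(task_id)
        if (done if done is not None else now) > deadline:
            missed.append(task_id)
    return missed
```

Here is why this anchor catches delays that a per-task duration check would not. Suppose `extract` overruns and finishes at 01:50 against a 1-hour SLA. A downstream `transform` that starts at 01:50 and takes only 15 minutes then finishes at 02:05, past its 2-hour deadline. The sketch flags it, even though its own duration was short.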

Re: [DISCUSS] Mechanism of SLA

2023-07-15, Jarek Potiuk
I really like the proposal as it is now. I think it is generally ready to be put up to vote (and implement). I think it has a chance to finally get our SLA feature straightened out. J. On Sat, Jul 8, 2023 at 12:00 AM Sung Yun wrote: > Thank you for the clarification Jarek :) > I've updated

Re: [DISCUSS] Mechanism of SLA

2023-07-07, Sung Yun
Thank you for the clarification Jarek :) I’ve updated the AIP on the Confluence page with your suggestion - please let me know what you folks think! In summary, I think it will serve as a great way to maintain some capacity to measure a soft-timeout within a task. Obvious pros of this approach

Re: [DISCUSS] Mechanism of SLA

2023-07-04, Jarek Potiuk
> Which forking strategy are we exactly proposing? The important part is that you have a separate process running a separate Python interpreter, so that if the task runs "C" code without a loop, the "timer" thread will be able to stop it regardless (for timeout), and one that can run
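The separate-process idea can be sketched with the standard `multiprocessing` module. This is a minimal sketch under stated assumptions: the function `supervise` and its parameters are hypothetical, it uses the POSIX-only `fork` start method, and it is not how Airflow's task runner is actually structured. The task runs in a child process with its own interpreter. The parent only waits on it, so the parent can always fire the SLA callback and, past a hard timeout, terminate the child even if the child is stuck in C code.

```python
import multiprocessing as mp
from typing import Callable, Optional


def supervise(target: Callable[[], None], soft_sla: float,
              hard_timeout: float,
              on_sla_miss: Callable[[], None]) -> Optional[int]:
    """Run `target` in a forked child process (its own interpreter).

    The parent only waits on the child, so its timing is unaffected even if
    the child is stuck in C code that never yields back to Python.
    Returns the child's exit code (negative signal number if killed).
    """
    ctx = mp.get_context("fork")           # POSIX only
    proc = ctx.Process(target=target)
    proc.start()
    proc.join(soft_sla)                    # wait up to the soft SLA
    if proc.is_alive():
        on_sla_miss()                      # SLA missed: notify, keep running
        proc.join(max(hard_timeout - soft_sla, 0.0))
        if proc.is_alive():
            proc.terminate()               # hard timeout: send SIGTERM
            proc.join()
    return proc.exitcode
```

The soft SLA only notifies, and only the hard timeout kills. This keeps the two concerns discussed in the thread (an SLA callback and a timeout) separate while sharing one supervising process.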

Re: [DISCUSS] Mechanism of SLA

2023-06-21, Sung Yun
Hi Jarek, I've been mulling over the implementation of (3) task: time_limit_sla, and I have some follow up questions about the implementation. Which forking strategy are we exactly proposing? Currently, we invoke task.execute_callable within the taskinstance, which we can effectively think of as

Re: [DISCUSS] Mechanism of SLA

2023-06-20, Pierre Jeambrun
This task_sla is making me think more and more of a 'task' on its own. It would need to run in parallel, be non-blocking, not overlap with each other, etc. How hard would it be to spawn them as a normal workload on the worker when a task with an SLA configured runs? Maybe on a dedicated queue /

Re: [DISCUSS] Mechanism of SLA

2023-06-20, Sung Yun
Thank you all for your continued engagement and input! It looks like Iaroslav's layout of 3 different labels of SLAs is helping us group the implementation into different categories, so I will organize my own responses in those logical groupings as well. 1. dag_sla 2. task_sla 3. task:

Re: [DISCUSS] Mechanism of SLA

2023-06-18, utkarsh sharma
> This can be IMHO implemented on the task level. We currently have timeout implemented this way - whenever we start the task, we can have a signal handler registered with "real" time registered that will cancel the task. But I can imagine a similar approach with signal and propagate the
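The signal-based approach quoted here can be sketched with Python's standard `signal` module. It arms a "real" (wall-clock) interval timer when the work starts, and SIGALRM invokes a handler. This is a hedged sketch: `soft_sla_alarm` is a hypothetical name, and it is POSIX-only and main-thread-only. A hard timeout would raise an exception from the handler to cancel the task. This variant only calls a callback, so the work keeps running.

```python
import signal
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def soft_sla_alarm(seconds: float,
                   on_miss: Callable[[], None]) -> Iterator[None]:
    """Arm a wall-clock ("real") interval timer around a block of work.

    When it expires, the kernel delivers SIGALRM and the handler calls
    on_miss() instead of raising, so the work continues. POSIX only, and
    handlers can only be installed from the main thread.
    """
    def _handler(signum, frame):
        on_miss()

    previous = signal.signal(signal.SIGALRM, _handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)      # disarm first
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)  # then restore the old handler
```

The timer is disarmed before the old handler is restored. This ordering avoids a late SIGALRM reaching the default handler, which would terminate the process.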

Re: [DISCUSS] Mechanism of SLA

2023-06-18, Iaroslav Poskriakov
I want to say that Airflow is a very popular project, and the ways of calculating SLAs differ because business cases differ. If possible, we should support most of them out of the box. On Sun, Jun 18, 2023 at 13:30, Iaroslav Poskriakov <yaroslavposkrya...@gmail.com> wrote: > So, I

Re: [DISCUSS] Mechanism of SLA

2023-06-18, Iaroslav Poskriakov
So, I totally agree about DAG-level SLAs. It's very important to have them, and according to Sung Yun's proposal they should not be implemented at the scheduler job level. Regarding the second way of determining SLA: --> --> . It's very helpful when we want to achieve a non-technical SLA

Re: [DISCUSS] Mechanism of SLA

2023-06-17, Jarek Potiuk
I am also for DAG level SLA only (but maybe there are some twists). And I hope (since Sung Yun has not given up on that) that maybe this is the right time for others here to chime in, and maybe it will let the vote go on? I think it would be great to get the SLA feature sorted out so that we have

Re: [DISCUSS] Mechanism of SLA

2023-06-15, Sung Yun
Hello! Thank you very much for the feedback on the proposal. I’ve been hoping to get some more traction on this proposal, so it’s great to hear from another user of the feature. I understand that there’s a lot of support for keeping a native task level SLA feature, and I definitely agree with

[DISCUSS] Mechanism of SLA

2023-06-13, Ярослав Поскряков (Iaroslav Poskriakov)
Hi, I read the previous conversation regarding SLA, and I think removing the ability to set an SLA at the task level would be a big mistake. So, the proposed implementation of task-level SLA will not work correctly. That's why I guess we have to think about the