Re: Why not mark inactive DAGs in the main scheduler loop?

2018-08-22 Thread Ruiqin Yang
I previously sent a proposal about scaling Airflow and created Jira tickets around that time; this particular one is AIRFLOW-2760. We've finished testing it at Airbnb and plan to bake it for some time while I work on open-sourcing the change

Re: Why not mark inactive DAGs in the main scheduler loop?

2018-08-22 Thread Taylor Edmiston
Kevin - Is there a Jira issue one can follow for this?

On Wed, Aug 22, 2018 at 5:29 PM Ruiqin Yang wrote:
> I'm working on spliting the DAG parsing manager to a subprocess and with
> that we don't need to worry about scheduler doing non-supervisor stuff nor
> prolong scheduler loop duration. I c

Re: Why not mark inactive DAGs in the main scheduler loop?

2018-08-22 Thread Ruiqin Yang
I'm working on splitting the DAG parsing manager into a subprocess. With that, we don't need to worry about the scheduler doing non-supervisor work or about prolonging the scheduler loop duration. I can make a follow-up PR to address this once I have the original PR published, if you don't have plans to work on

Re: Why not mark inactive DAGs in the main scheduler loop?

2018-08-22 Thread Dan Davydov
Agreed on delegating to a subprocess, but I think that can come as part of a larger redesign (maybe along with uploading DAG import errors, etc.). The query should be quite fast, so it should not have a significant impact on scheduler loop times.

On Wed, Aug 22, 2018 at 3:52 PM Maxime Beauchemin < maxi
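To illustrate why the query should be cheap: deactivation can be a single set-based UPDATE rather than a per-DAG loop. The minimal sketch below assumes a simplified, hypothetical dag table with dag_id, is_active and last_scheduler_run columns, stored in an in-memory SQLite database; it is not Airflow's actual schema or its deactivate_stale_dags implementation.

```python
import sqlite3
from datetime import datetime, timedelta


def deactivate_stale_dags(conn, expiration_date):
    """Mark active DAGs not seen by the scheduler since expiration_date as inactive.

    Hypothetical schema: dag(dag_id TEXT, is_active INTEGER, last_scheduler_run TEXT),
    with timestamps stored as ISO-8601 strings so they compare correctly as text.
    """
    cur = conn.execute(
        "UPDATE dag SET is_active = 0 "
        "WHERE is_active = 1 AND last_scheduler_run < ?",
        (expiration_date.isoformat(),),
    )
    conn.commit()
    return cur.rowcount  # number of DAGs deactivated by this pass


# Usage: one recently parsed DAG and one whose file was deleted two hours ago.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_active INTEGER, last_scheduler_run TEXT)"
)
now = datetime(2018, 8, 22, 12, 0, 0)
conn.executemany(
    "INSERT INTO dag VALUES (?, ?, ?)",
    [
        ("fresh_dag", 1, (now - timedelta(minutes=1)).isoformat()),
        ("deleted_dag", 1, (now - timedelta(hours=2)).isoformat()),
    ],
)
deactivated = deactivate_stale_dags(conn, now - timedelta(minutes=10))
print(deactivated)  # 1
```

Because the WHERE clause only matches rows that are still active, running the pass again deactivates nothing new, so it is safe to repeat on every loop; an index on last_scheduler_run (also an assumption here) would help keep it fast as the table grows.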

Re: Why not mark inactive DAGs in the main scheduler loop?

2018-08-22 Thread Maxime Beauchemin
I'd rather the scheduler delegate that to one of its minions (subprocesses) if possible. We should keep everything we can off the main thread. BTW, I've been speaking about renaming the scheduler to "supervisor" for a while now. While renaming may be a bit tricky (updating all references in the code)
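Handing the cleanup to a minion can be sketched with Python's multiprocessing module. In this minimal sketch the cleanup query is replaced by a hypothetical stub that only counts passes, and the "fork" start method is assumed, so it is POSIX-only. It is not how Airflow's scheduler actually manages its child processes.

```python
import multiprocessing as mp
import time

# Assumption: POSIX "fork" start method, so the child inherits the parent's
# state without re-importing this module.
CTX = mp.get_context("fork")


def deactivate_stale_dags_stub(counter):
    """Hypothetical stand-in for the real stale-DAG cleanup query."""
    with counter.get_lock():
        counter.value += 1


def cleanup_minion(stop_event, interval, counter):
    """Run cleanup passes in a child process until the parent asks it to stop."""
    while True:
        deactivate_stale_dags_stub(counter)
        # Event.wait returns True as soon as the parent sets the event.
        if stop_event.wait(interval):
            break


def run_scheduler(iterations=5, loop_sleep=0.02, cleanup_interval=0.01):
    stop_event = CTX.Event()
    counter = CTX.Value("i", 0)
    minion = CTX.Process(
        target=cleanup_minion,
        args=(stop_event, cleanup_interval, counter),
        daemon=True,
    )
    minion.start()
    for _ in range(iterations):
        # Main loop: scheduling work only; the cleanup never runs here.
        time.sleep(loop_sleep)
    stop_event.set()
    minion.join(timeout=5)
    return minion, counter.value


minion, passes = run_scheduler()
print(f"main loop finished; cleanup minion ran {passes} pass(es)")
```

The main loop's duration no longer depends on how long a cleanup pass takes; the trade-off is an extra process to supervise and restart if it dies.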

Re: Why not mark inactive DAGs in the main scheduler loop?

2018-08-22 Thread Taylor Edmiston
I'm not super familiar with this part of the scheduler. What exactly are the implications of doing this mid-loop vs. at scheduler termination? Is there a use case where DAGs hit this besides having been deleted? The deactivate_stale_dags call doesn't appear to be super expensive or anything like t

Why not mark inactive DAGs in the main scheduler loop?

2018-08-22 Thread Dan Davydov
I see some PRs creating endpoints to delete DAGs and other things related to manually deleting DAGs from the DB, but is there a good reason why we can't just move the DAG deactivation logic into the main scheduler loop? The scheduler already has some code like this, but it only runs when the Sched