[ 
https://issues.apache.org/jira/browse/AIRFLOW-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071350#comment-17071350
 ] 

Daniel Imberman commented on AIRFLOW-72:
----------------------------------------

This issue has been closed.

If you still feel this ticket is relevant, please submit a GitHub issue here:
https://github.com/apache/airflow/issues/new/choose

> Implement proper capacity scheduler
> -----------------------------------
>
>                 Key: AIRFLOW-72
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-72
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Bolke de Bruin
>            Priority: Major
>              Labels: pool, queue, scheduler
>             Fix For: 2.0.0
>
>
> The scheduler is supposed to maintain queues and pools according to a 
> "capacity" model. However, this model is currently not properly 
> implemented, so issues exist such as pools being oversubscribed, race 
> conditions during queuing/dequeuing, and probably others.
> This Jira Epic is to track all issues related to pooling/queuing and the 
> (tbd) roadmap to a proper capacity scheduler.
> Why queuing / scheduling is broken:
> Locking is not properly implemented, and cannot be, because the check for 
> slot availability is spread throughout the scheduler, taskinstance and 
> executor. This makes obtaining a slot non-atomic and results in 
> oversubscription. In addition, it leads to race conditions, such as two 
> tasks being picked from the queue at the same time because the scheduler 
> determines that a queued task still needs to be sent to the executor 
> although an earlier run already did so.
> To fix this, Pool handling needs to be centralized (code-wise) and 
> work with a mutex (with_for_update()) on the database records. The 
> scheduler/taskinstance can then do something like:
> slot = Pool.obtain_slot(pool_id)
> Pool.release_slot(slot)
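
The obtain/release pattern quoted above could look roughly like the following. This is only a minimal sketch, not Airflow's actual Pool model: the class, the table layout, and the PoolFull exception are hypothetical, and it uses the standard-library sqlite3 module with BEGIN IMMEDIATE as a stand-in for the row lock that with_for_update() would take through SQLAlchemy on a database that supports SELECT ... FOR UPDATE. The point it illustrates is that the availability check and the slot allocation happen in one place, inside one transaction.

```python
import sqlite3
import uuid


class PoolFull(Exception):
    """Hypothetical error raised when a pool has no free slots."""


class Pool:
    """Hypothetical centralized slot manager (not Airflow's Pool model)."""

    def __init__(self, path=":memory:"):
        # isolation_level=None disables implicit transactions, so the
        # explicit BEGIN/COMMIT/ROLLBACK statements below are in control.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pool "
            "(id TEXT PRIMARY KEY, slots INTEGER NOT NULL)"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS slot "
            "(id TEXT PRIMARY KEY, pool_id TEXT NOT NULL)"
        )

    def create(self, pool_id, slots):
        self.conn.execute(
            "INSERT INTO pool (id, slots) VALUES (?, ?)", (pool_id, slots)
        )

    def obtain_slot(self, pool_id):
        # BEGIN IMMEDIATE takes the write lock before anything is read, so
        # the capacity check and the insert cannot interleave with another
        # writer. This plays the role of the mutex in the description.
        self.conn.execute("BEGIN IMMEDIATE")
        try:
            row = self.conn.execute(
                "SELECT slots FROM pool WHERE id = ?", (pool_id,)
            ).fetchone()
            if row is None:
                raise KeyError(pool_id)
            (used,) = self.conn.execute(
                "SELECT COUNT(*) FROM slot WHERE pool_id = ?", (pool_id,)
            ).fetchone()
            if used >= row[0]:
                raise PoolFull(pool_id)
            slot = uuid.uuid4().hex
            self.conn.execute(
                "INSERT INTO slot (id, pool_id) VALUES (?, ?)", (slot, pool_id)
            )
            self.conn.execute("COMMIT")
            return slot
        except Exception:
            self.conn.execute("ROLLBACK")
            raise

    def release_slot(self, slot):
        # A single DELETE is atomic on its own in autocommit mode.
        self.conn.execute("DELETE FROM slot WHERE id = ?", (slot,))

    def used_slots(self, pool_id):
        (used,) = self.conn.execute(
            "SELECT COUNT(*) FROM slot WHERE pool_id = ?", (pool_id,)
        ).fetchone()
        return used
```

With a pool of two slots, a third obtain_slot() call raises PoolFull until one of the held slots is released, so the pool cannot be oversubscribed regardless of which component asks for the slot.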



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
