[ https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Akira Ajisaka updated YARN-8990:
--------------------------------
Fix Version/s: (was: 3.2.0)
Target Version/s: 3.2.3
Labels: pull-request-available release-blocker (was: pull-request-available)
> Fix fair scheduler race condition in app submit and queue cleanup
> -----------------------------------------------------------------
>
> Key: YARN-8990
> URL: https://issues.apache.org/jira/browse/YARN-8990
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 3.2.0
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Blocker
> Labels: pull-request-available, release-blocker
> Fix For: 3.3.0
>
> Attachments: YARN-8990.001.patch, YARN-8990.002.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> With the introduction of dynamic queue deletion in YARN-8191, a race
> condition was introduced that can cause a queue to be removed while an
> application submit is in progress.
> The issue occurs in {{FairScheduler.addApplication()}} when an application is
> submitted to a dynamic queue that is empty or does not exist yet.
> If the {{AllocationFileLoaderService}} kicks off an update while the
> application submit is being processed, the update runs the queue cleanup
> first. The application submit first creates the queue and gets a reference
> back to the queue. Other checks are then performed, and as the last action
> before generating an AppAttempt, the queue is updated to show the submitted
> application ID.
> The time between the queue creation and the queue update that records the
> submit is long enough for the queue to be removed. The application is then
> lost and will never get any resources assigned.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)