[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2021-08-02 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391388#comment-17391388
 ] 

Akira Ajisaka commented on YARN-8990:
-

Backport PR opened: https://github.com/apache/hadoop/pull/3254

> Fix fair scheduler race condition in app submit and queue cleanup
> -----------------------------------------------------------------
>
>                 Key: YARN-8990
>                 URL: https://issues.apache.org/jira/browse/YARN-8990
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 3.2.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 3.2.0, 3.3.0
>
>         Attachments: YARN-8990.001.patch, YARN-8990.002.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> With the introduction of dynamic queue deletion in YARN-8191, a race 
> condition was introduced that can cause a queue to be removed while an 
> application submit is in progress.
> The issue occurs in {{FairScheduler.addApplication()}} when an application is 
> submitted to a dynamic queue that is empty or does not exist yet. 
> If the {{AllocationFileLoaderService}} kicks off an update while the 
> application submit is being processed, the queue cleanup will be run first. 
> The application submit first creates the queue and gets a reference back to 
> the queue. 
> Other checks are performed, and as the last action before getting ready to 
> generate an AppAttempt, the queue is updated to show the submitted 
> application ID.
> The time between the queue creation and the queue update to show the submit 
> is long enough for the queue to be removed. The application is then lost 
> and will never get any resources assigned.
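
The interleaving in the description above can be replayed on a single thread with a small Java sketch. All class and method names here (SketchLeafQueue, SketchQueueManager, getOrCreateForApp, and so on) are hypothetical stand-ins, not the Hadoop classes, and the "atomic" variant is only one possible way to close the window; it is not presented as what the attached patches do.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a leaf queue; not the real FSLeafQueue.
final class SketchLeafQueue {
    private final Set<String> assignedApps = new HashSet<>();

    synchronized void assignApp(String appId) {
        assignedApps.add(appId);
    }

    synchronized boolean isEmpty() {
        return assignedApps.isEmpty();
    }
}

// Hypothetical stand-in for the queue manager; not the real QueueManager.
final class SketchQueueManager {
    private final Map<String, SketchLeafQueue> queues = new HashMap<>();

    // Racy variant: the queue is created, but nothing marks it as in use yet.
    synchronized SketchLeafQueue getOrCreate(String name) {
        return queues.computeIfAbsent(name, n -> new SketchLeafQueue());
    }

    // Atomic variant: create the queue and record the app under the same lock,
    // so cleanup cannot observe the queue as empty in between.
    synchronized SketchLeafQueue getOrCreateForApp(String name, String appId) {
        SketchLeafQueue queue = queues.computeIfAbsent(name, n -> new SketchLeafQueue());
        queue.assignApp(appId);
        return queue;
    }

    // Models the cleanup of empty dynamic queues triggered by a reload.
    synchronized void removeEmptyQueues() {
        queues.values().removeIf(SketchLeafQueue::isEmpty);
    }

    synchronized boolean contains(String name) {
        return queues.containsKey(name);
    }
}

final class QueueRaceSketch {
    // Replays the problematic ordering: create, cleanup, then record the app.
    static boolean queueSurvivesRacySubmit() {
        SketchQueueManager manager = new SketchQueueManager();
        SketchLeafQueue queue = manager.getOrCreate("root.dynamic"); // submit: create queue
        manager.removeEmptyQueues();                                 // reload: cleanup runs
        queue.assignApp("app_1");                                    // submit: recorded too late
        return manager.contains("root.dynamic");
    }

    // Same ordering, but the app is recorded in the same step as the creation.
    static boolean queueSurvivesAtomicSubmit() {
        SketchQueueManager manager = new SketchQueueManager();
        manager.getOrCreateForApp("root.dynamic", "app_1");
        manager.removeEmptyQueues();
        return manager.contains("root.dynamic");
    }

    public static void main(String[] args) {
        System.out.println("racy submit keeps queue: " + queueSurvivesRacySubmit());
        System.out.println("atomic submit keeps queue: " + queueSurvivesAtomicSubmit());
    }
}
```

In the racy replay the application ends up attached to a queue object that the manager no longer tracks, which mirrors the "lost" application in the description; in the atomic replay the queue is never seen as empty by the cleanup.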



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2020-01-28 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025566#comment-17025566
 ] 

Wilfred Spiegelenburg commented on YARN-8990:
-

I do think that is a good idea. Both patches should still apply to 
branch-3.2; if not, I can provide a branch-specific patch, but we need 
a committer to check it in for us.




[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2020-01-27 Thread Steven Rand (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024799#comment-17024799
 ] 

Steven Rand commented on YARN-8990:
---

How would people feel about cherry-picking this and YARN-8992 to {{branch-3.2}}? 
It seems like we should do that before {{branch-3.2.2}} gets cut for an 
eventual 3.2.2 release.




[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2019-11-08 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970395#comment-16970395
 ] 

Wilfred Spiegelenburg commented on YARN-8990:
-

Thank you, [~Steven Rand], for making us aware of the omission. 
And yes, you are correct: this was checked into 3.2.0 only and not into 3.2.x. It 
is in 3.3.

For YARN-8992 the fix version is set incorrectly; that one is only in 3.3 
(adding a comment there too).

[~sunilg], how do we handle these two?




[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2019-11-04 Thread Steven Rand (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967099#comment-16967099
 ] 

Steven Rand commented on YARN-8990:
---

Hi all,

Unfortunately, this patch never made its way into the 3.2.1 release, which is 
affected by this race condition. I think what happened is that it was committed 
to trunk and backported to branch-3.2.0, but not to branch-3.2 (or 
branch-3.2.1).

And unless I'm misinterpreting the git history, the 3.2.1 release is also 
missing YARN-8992, despite the fix version of that ticket. 

We should at minimum make sure that the fixes for these race conditions are in 
3.2.2. Since this was a blocker and the impact is pretty serious, there may be 
more things we want to do, e.g., messaging or expediting the 3.2.2 release, but 
I'll leave that up to you to decide.




[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2018-11-09 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681750#comment-16681750
 ] 

Sunil Govindan commented on YARN-8990:
--

Backported to 3.2.0.

Respinning the RC with this. Thanks, [~wilfreds].




[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2018-11-09 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681743#comment-16681743
 ] 

Daniel Templeton commented on YARN-8990:


Thanks for finding and fixing this one, [~wilfreds]!  It could have been a 
source of much unhappiness in 3.2.

> Fix fair scheduler race condition in app submit and queue cleanup
> -----------------------------------------------------------------
>
>                 Key: YARN-8990
>                 URL: https://issues.apache.org/jira/browse/YARN-8990
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 3.2.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Blocker
>             Fix For: 3.3.0
>
>         Attachments: YARN-8990.001.patch, YARN-8990.002.patch



[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2018-11-08 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680725#comment-16680725
 ] 

Wilfred Spiegelenburg commented on YARN-8990:
-

Thank you, [~haibochen], for the quick review and check-in.

[~sunilg], should this be added to 3.2, or is 3.2.1 good enough?




[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2018-11-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680656#comment-16680656
 ] 

Hudson commented on YARN-8990:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15393 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15393/])
YARN-8990. Fix fair scheduler race condition in app submit and queue 
(haibochen: rev 524a7523c427b55273133078898ae3535897bada)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java

