[
https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934624#comment-16934624
]
Hadoop QA commented on YARN-9552:
---------------------------------
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 43s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 42s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}148m 12s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:080e9d0f9b3 |
| JIRA Issue | YARN-9552 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980910/YARN-9552-branch-3.1.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f57f76c44878 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / 6ef3204 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24815/testReport/ |
| Max. process+thread count | 799 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24815/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> FairScheduler: NODE_UPDATE can cause NoSuchElementException
> -----------------------------------------------------------
>
> Key: YARN-9552
> URL: https://issues.apache.org/jira/browse/YARN-9552
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9552-001.patch, YARN-9552-002.patch,
> YARN-9552-003.patch, YARN-9552-004.patch, YARN-9552-branch-3.1.001.patch,
> YARN-9552-branch-3.1.002.patch, YARN-9552-branch-3.2.001.patch,
> YARN-9552-branch-3.2.002.patch
>
>
> We observed a race condition inside YARN with the following stack trace:
> {noformat}
> 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
> java.util.NoSuchElementException
> at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
> at java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
> at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This is basically the same problem as the one described in YARN-7382, but the
> root cause is different.
> When we create an application attempt, we create an {{FSAppAttempt}} object.
> This contains an {{AppSchedulingInfo}}, which in turn contains a set of
> {{SchedulerRequestKey}} objects. Initially, this set is empty; it is populated
> only a bit later, on a separate thread, during a state transition:
> {noformat}
> 2019-05-07 15:58:02,659 INFO [RM StateStore dispatcher] recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for app: application_1557237478804_0001
> 2019-05-07 15:58:02,684 INFO [RM Event dispatcher] rmapp.RMAppImpl (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
> 2019-05-07 15:58:02,690 INFO [SchedulerEventDispatcher:Event Processor] fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted application application_1557237478804_0001 from user: bacskop, in queue: root.bacskop, currently num of applications: 1
> 2019-05-07 15:58:02,698 INFO [RM Event dispatcher] rmapp.RMAppImpl (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
> 2019-05-07 15:58:02,731 INFO [RM Event dispatcher] resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app attempt : appattempt_1557237478804_0001_000001
> 2019-05-07 15:58:02,732 INFO [RM Event dispatcher] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_000001 State change from NEW to SUBMITTED on event = START
> 2019-05-07 15:58:02,746 INFO [SchedulerEventDispatcher:Event Processor] scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:<init>(207)) - *** In the constructor of SchedulerApplicationAttempt
> 2019-05-07 15:58:02,747 INFO [SchedulerEventDispatcher:Event Processor] scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:<init>(230)) - *** Contents of appSchedulingInfo: []
> 2019-05-07 15:58:02,752 INFO [SchedulerEventDispatcher:Event Processor] fair.FairScheduler (FairScheduler.java:addApplicationAttempt(546)) - Added Application Attempt appattempt_1557237478804_0001_000001 to scheduler from user: bacskop
> 2019-05-07 15:58:02,756 INFO [RM Event dispatcher] scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updatePendingResources(257)) - *** Adding scheduler key: SchedulerRequestKey{priority=0, allocationRequestId=-1, containerToUpdate=null} for attempt: appattempt_1557237478804_0001_000001
> 2019-05-07 15:58:02,759 INFO [RM Event dispatcher] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
> 2019-05-07 15:58:02,892 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(310)) - Submitted application application_1557237478804_0001
> {noformat}
> (The lines marked with *** are extra log statements.)
> So at 15:58:02,747 the set is still empty, and it is populated with a single
> element only at 15:58:02,756, on the "RM Event dispatcher" thread. This leaves
> a short window (about 9 ms in this log) during which a {{NODE_UPDATE}} handled
> on the scheduler thread can call {{first()}} on the empty set and throw a
> {{NoSuchElementException}}.
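
The failure mode in the stack trace can be reproduced in isolation. The following is a minimal sketch, not the patch attached to this issue: the class `EmptySetRaceSketch` and its methods `unsafeFirst` and `safeFirst` are hypothetical names invented for illustration, and plain `Integer` keys stand in for {{SchedulerRequestKey}}. It only shows that `ConcurrentSkipListSet.first()` throws on an empty set, and one possible way a reader could tolerate the not-yet-populated state.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical illustration only; not code from Hadoop or from the YARN-9552 patch.
class EmptySetRaceSketch {

    // Mirrors the failing pattern: first() throws NoSuchElementException
    // if no other thread has added a key yet.
    static Integer unsafeFirst(ConcurrentSkipListSet<Integer> keys) {
        return keys.first();
    }

    // One possible guard (an assumption, not the actual fix): treat an empty
    // set as "nothing pending yet" and return null instead of throwing.
    // Catching the exception avoids a separate isEmpty() check, which could
    // itself race with a concurrent removal.
    static Integer safeFirst(ConcurrentSkipListSet<Integer> keys) {
        try {
            return keys.first();
        } catch (NoSuchElementException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> keys = new ConcurrentSkipListSet<>();

        // The set is still empty, as it is between 15:58:02,747 and 15:58:02,756.
        System.out.println("safeFirst on empty set: " + safeFirst(keys));
        try {
            unsafeFirst(keys);
        } catch (NoSuchElementException e) {
            System.out.println("unsafeFirst on empty set threw NoSuchElementException");
        }

        // Once a key has been added, both variants return it.
        keys.add(0);
        System.out.println("safeFirst after add: " + safeFirst(keys));
    }
}
```

Any caller of a guard like `safeFirst` has to handle the null result, for example by skipping the attempt for this scheduling pass; whether that matches the committed fix is not stated in this message.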