[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state
[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934797#comment-16934797 ] Wangda Tan commented on YARN-4946: -- I would still prefer to revert the patch, but given my limited bandwidth I hope to get someone to help review the details of the revert patch and the related fields before making a decision. cc: [~snemeth] , [~sunil.gov...@gmail.com] , [~Prabhu Joseph]
> RM should not consider an application as COMPLETED when log aggregation is not in a terminal state
> --
>
> Key: YARN-4946
> URL: https://issues.apache.org/jira/browse/YARN-4946
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: log-aggregation
> Affects Versions: 2.8.0
> Reporter: Robert Kanter
> Assignee: Szilard Nemeth
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-4946.001.patch, YARN-4946.002.patch, YARN-4946.003.patch, YARN-4946.004.patch
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each Yarn App into a HAR file. When run, it seeds the list by looking at the aggregated logs directory, and then filters out ineligible apps. One of the criteria involves checking with the RM that an Application's log aggregation status is not still running and has not failed. When the RM "forgets" about an older completed Application (e.g. RM failover, enough time has passed, etc.), the tool won't find the Application in the RM and will just assume that its log aggregation succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following: the RM should not consider an app to be fully completed (and thus removed from its history) until the aggregation status has reached a terminal state (e.g. SUCCEEDED, FAILED, TIME_OUT).
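For readers skimming the digest, a minimal sketch of the gating logic the description proposes: the RM keeps a completed application in its history until the app's log aggregation status is terminal. The helper class below and the inclusion of DISABLED are illustrative assumptions, not the YARN-4946 patch; only the LogAggregationStatus enum is a real YARN type.
{code:java}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.LogAggregationStatus;

// Hypothetical helper, not the actual patch: decide whether log
// aggregation has reached a state that will never change again.
public final class LogAggregationTerminalCheck {
  private static final EnumSet<LogAggregationStatus> TERMINAL =
      EnumSet.of(LogAggregationStatus.SUCCEEDED,
                 LogAggregationStatus.FAILED,
                 LogAggregationStatus.TIME_OUT,
                 // Assumption: DISABLED is also terminal, since
                 // aggregation will never run for such apps.
                 LogAggregationStatus.DISABLED);

  private LogAggregationTerminalCheck() {
  }

  // The RM would consult this before treating an app as fully
  // completed and dropping it from its history.
  public static boolean isTerminal(LogAggregationStatus status) {
    return TERMINAL.contains(status);
  }
}
{code}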
[jira] [Commented] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934761#comment-16934761 ] Jim Brennan commented on YARN-9846: --- [~belugabehr] thanks for the patch, but can you provide some background on what motivated this change? It's not clear to me that the new approach is actually better in this case. In the handle() and cleanupPrivLocalizers() methods, you are now acquiring two locks instead of one. And in processHeartbeat() we are no longer holding the privLocalizers lock while calling localizer.processHeartbeat() - I'm not sure if that will break anything, but the localization code is pretty fragile, so I'd be careful. I personally find the refactoring of LocalizerTracker.handle() to be less readable than the original, but that may just be a style issue.
> Use Finer-Grain Synchronization in ResourceLocalizationService
> --
>
> Key: YARN-9846
> URL: https://issues.apache.org/jira/browse/YARN-9846
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 3.2.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Attachments: YARN-9846.1.patch, YARN-9846.2.patch, YARN-9846.3.patch
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788
> # Remove these synchronization blocks
> # Ensure {{recentlyCleanedLocalizers}} is thread safe
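To make the locking concern above concrete, here is a generic sketch (hypothetical names, not the actual ResourceLocalizationService code) of how splitting one coarse lock into per-structure locks turns an atomic check-then-act into two steps that another thread can interleave between:
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for privLocalizers / recentlyCleanedLocalizers.
class LocalizerTrackerSketch {
  private final Map<String, Object> localizers = new HashMap<>();
  private final Set<String> recentlyCleaned = new HashSet<>();

  // Coarse-grained: the cleaned-check and the lookup happen under one
  // lock, so no cleanup can slip in between them.
  Object lookupCoarse(String id) {
    synchronized (localizers) {
      if (recentlyCleaned.contains(id)) {
        return null;
      }
      return localizers.get(id);
    }
  }

  // Finer-grained: each structure guards itself. Between the two
  // synchronized blocks another thread may clean up the localizer,
  // so the caller can observe a stale, already-cleaned entry.
  Object lookupFine(String id) {
    synchronized (recentlyCleaned) {
      if (recentlyCleaned.contains(id)) {
        return null;
      }
    }
    synchronized (localizers) {
      return localizers.get(id); // may now be cleaned/stale
    }
  }
}
{code}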
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934755#comment-16934755 ] Íñigo Goiri commented on YARN-9697: --- Is there any way we can make this smaller? It's taking me a while to figure out where the actual changes are.
> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Abhishek Modi
> Assignee: Abhishek Modi
> Priority: Major
> Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
> In the current implementation, opportunistic containers are allocated based on the number of queued opportunistic containers reported in the node heartbeat. This information becomes stale as soon as more opportunistic containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which the AM asks for the containers. When multiple applications request Opportunistic containers, containers might get allocated on the same set of nodes, because containers already allocated on a node are not considered while serving requests from different applications. This can lead to uneven allocation of Opportunistic containers across the cluster, leading to increased queuing time.
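As an illustration of the staleness problem described above, an allocator can keep a local estimate of each node's opportunistic queue length and bump it on every allocation, so later requests in the same scheduling pass no longer all pick the same nodes. All class and method names below are hypothetical, not the YARN-9697 patch:
{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch: track an estimated opportunistic queue length
// per node, updated as containers are allocated between heartbeats.
class NodeQueueEstimate {
  final String nodeId;
  int estimatedQueueLength; // seeded from the last heartbeat

  NodeQueueEstimate(String nodeId, int reportedQueueLength) {
    this.nodeId = nodeId;
    this.estimatedQueueLength = reportedQueueLength;
  }
}

class OpportunisticAllocatorSketch {
  private final PriorityQueue<NodeQueueEstimate> nodes =
      new PriorityQueue<>(Comparator.comparingInt(n -> n.estimatedQueueLength));

  void addNode(NodeQueueEstimate node) {
    nodes.add(node);
  }

  // Pick the node with the shortest estimated queue, then bump its
  // estimate so the next request in the same pass sees the allocation.
  String allocateOne() {
    NodeQueueEstimate best = nodes.poll();
    if (best == null) {
      return null;
    }
    best.estimatedQueueLength++;
    nodes.add(best); // re-insert with the updated estimate
    return best.nodeId;
  }
}
{code}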
[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934624#comment-16934624 ] Hadoop QA commented on YARN-9552: - (/) +1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 16m 38s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-3.1 Compile Tests ||
| +1 | mvninstall | 24m 43s | branch-3.1 passed |
| +1 | compile | 0m 42s | branch-3.1 passed |
| +1 | checkstyle | 0m 30s | branch-3.1 passed |
| +1 | mvnsite | 0m 46s | branch-3.1 passed |
| +1 | shadedclient | 13m 48s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 25s | branch-3.1 passed |
| +1 | javadoc | 0m 31s | branch-3.1 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 48s | the patch passed |
| +1 | compile | 0m 44s | the patch passed |
| +1 | javac | 0m 44s | the patch passed |
| +1 | checkstyle | 0m 29s | the patch passed |
| +1 | mvnsite | 0m 49s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 11s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 33s | the patch passed |
| +1 | javadoc | 0m 29s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 69m 42s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 148m 12s | |
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:080e9d0f9b3 |
| JIRA Issue | YARN-9552 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980910/YARN-9552-branch-3.1.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f57f76c44878 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / 6ef3204 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24815/testReport/ |
| Max. process+thread count | 799 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24815/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934601#comment-16934601 ] Peter Bacsko commented on YARN-9840: [~maniraj...@gmail.com] if you already have a patch, feel free to take this over and then I'll review it.
> Capacity scheduler: add support for Secondary Group rule mapping
>
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
>
> Currently, Capacity Scheduler only supports primary group rule mapping like this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add this to CS to reduce the feature gap.
> Class of interest:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
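For context on what a %secondary_group substitution involves: Hadoop's org.apache.hadoop.security.Groups service returns a user's groups in order, with the first entry conventionally being the primary group, so a secondary-group rule scans the remaining entries. A hedged sketch, where the queueExists predicate is a hypothetical stand-in for the scheduler's queue lookup and the pick-first-matching policy mirrors the Fair Scheduler behaviour mentioned above:
{code:java}
import java.io.IOException;
import java.util.List;
import java.util.function.Predicate;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;

// Illustrative only: resolve a user's secondary group for a
// u:%user:%secondary_group style mapping.
class SecondaryGroupResolver {
  private final Groups groups;

  SecondaryGroupResolver(Configuration conf) {
    this.groups = Groups.getUserToGroupsMappingService(conf);
  }

  // The first group in the list is treated as the primary group;
  // return the first later group for which a queue exists.
  String resolve(String user, Predicate<String> queueExists)
      throws IOException {
    List<String> userGroups = groups.getGroups(user);
    for (int i = 1; i < userGroups.size(); i++) {
      if (queueExists.test(userGroups.get(i))) {
        return userGroups.get(i);
      }
    }
    return null; // no matching secondary group
  }
}
{code}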
[jira] [Commented] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934543#comment-16934543 ] Hadoop QA commented on YARN-9846: - (x) -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 15s | trunk passed |
| +1 | compile | 0m 59s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| -1 | shadedclient | 1m 36s | branch has errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 55s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 33s | the patch passed |
| +1 | compile | 0m 54s | the patch passed |
| +1 | javac | 0m 54s | the patch passed |
| +1 | checkstyle | 0m 22s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 101 unchanged - 4 fixed = 101 total (was 105) |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 0m 25s | patch has errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 59s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 20m 53s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 50m 25s | |
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:39e82acc485 |
| JIRA Issue | YARN-9846 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980907/YARN-9846.3.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f83c6a483060 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1654497 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24816/testReport/ |
| Max. process+thread count | 165 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output |
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934499#comment-16934499 ] Manikandan R commented on YARN-9840: [~pbacsko] I have a patch to address this. Can I post the same?
[jira] [Updated] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9552: --- Attachment: YARN-9552-branch-3.1.002.patch
> FairScheduler: NODE_UPDATE can cause NoSuchElementException
> ---
>
> Key: YARN-9552
> URL: https://issues.apache.org/jira/browse/YARN-9552
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9552-001.patch, YARN-9552-002.patch, YARN-9552-003.patch, YARN-9552-004.patch, YARN-9552-branch-3.1.001.patch, YARN-9552-branch-3.1.002.patch, YARN-9552-branch-3.2.001.patch, YARN-9552-branch-3.2.002.patch
>
> We observed a race condition inside YARN with the following stack trace:
> {noformat}
> 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
> java.util.NoSuchElementException
> at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
> at java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
> at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This is basically the same as the one described in YARN-7382, but the root cause is different.
> When we create an application attempt, we create an {{FSAppAttempt}} object. This contains an {{AppSchedulingInfo}} which contains a set of {{SchedulerRequestKey}}. Initially, this set is empty and only initialized a bit later on a separate thread during a state transition:
> {noformat}
> 2019-05-07 15:58:02,659 INFO [RM StateStore dispatcher] recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for app: application_1557237478804_0001
> 2019-05-07 15:58:02,684 INFO [RM Event dispatcher] rmapp.RMAppImpl (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
> 2019-05-07 15:58:02,690 INFO [SchedulerEventDispatcher:Event Processor] fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted application application_1557237478804_0001 from user: bacskop, in queue: root.bacskop, currently num of applications: 1
> 2019-05-07 15:58:02,698 INFO [RM Event dispatcher] rmapp.RMAppImpl (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
> 2019-05-07 15:58:02,731 INFO [RM Event dispatcher] resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app attempt : appattempt_1557237478804_0001_01
> 2019-05-07 15:58:02,732 INFO [RM Event dispatcher] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_01 State change from NEW to SUBMITTED on event = START
> 2019-05-07 15:58:02,746 INFO [SchedulerEventDispatcher:Event Processor] scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:(207)) - *** In the constructor of SchedulerApplicationAttempt
> 2019-05-07 15:58:02,747 INFO [SchedulerEventDispatcher:Event Processor] scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:(230)) - *** Contents of appSchedulingInfo: []
> 2019-05-07 15:58:02,752 INFO [SchedulerEventDispatcher:Event Processor] fair.FairScheduler
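The stack trace above boils down to ConcurrentSkipListSet.first() (via ConcurrentSkipListMap.firstKey()) throwing NoSuchElementException on an empty set instead of returning null. Below is a standalone illustration of the failure mode and the usual defensive pattern; it is not the actual YARN-9552 fix, which may instead address the initialization ordering:
{code:java}
import java.util.concurrent.ConcurrentSkipListSet;

class FirstKeyDemo {
  public static void main(String[] args) {
    ConcurrentSkipListSet<Integer> keys = new ConcurrentSkipListSet<>();

    // first() on an empty set throws NoSuchElementException -- this is
    // what AppSchedulingInfo.getNextPendingAsk() hits when NODE_UPDATE
    // races ahead of the attempt's initialization.
    try {
      keys.first();
    } catch (java.util.NoSuchElementException expected) {
      System.out.println("empty set: " + expected);
    }

    // Defensive alternative: check emptiness (or use pollFirst(), which
    // returns null) instead of calling first() unconditionally. Note the
    // check-then-act is still racy under concurrent removal, so callers
    // must tolerate a missing element either way.
    Integer head = keys.isEmpty() ? null : keys.first();
    System.out.println("head = " + head);
  }
}
{code}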
[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934415#comment-16934415 ] Hadoop QA commented on YARN-9552: - (x) -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 17m 2s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-3.1 Compile Tests ||
| +1 | mvninstall | 17m 28s | branch-3.1 passed |
| +1 | compile | 0m 45s | branch-3.1 passed |
| +1 | checkstyle | 0m 31s | branch-3.1 passed |
| +1 | mvnsite | 0m 44s | branch-3.1 passed |
| +1 | shadedclient | 12m 2s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | branch-3.1 passed |
| +1 | javadoc | 0m 31s | branch-3.1 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| -0 | checkstyle | 0m 26s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 21 unchanged - 0 fixed = 22 total (was 21) |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 57s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 16s | the patch passed |
| +1 | javadoc | 0m 29s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 66m 33s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 133m 20s | |
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt |
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:080e9d0f9b3 |
| JIRA Issue | YARN-9552 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980879/YARN-9552-branch-3.1.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a52ebeeb2b8f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / 6ef3204 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24814/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| unit |
[jira] [Updated] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService
[ https://issues.apache.org/jira/browse/YARN-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated YARN-9846: - Attachment: YARN-9846.3.patch
[jira] [Commented] (YARN-7817) Add Resource reference to RM's NodeInfo object so REST API can get non memory/vcore resource usages.
[ https://issues.apache.org/jira/browse/YARN-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934389#comment-16934389 ] Eric Payne commented on YARN-7817: -- I backported YARN-7817 and YARN-7860 to branch-2.
> Add Resource reference to RM's NodeInfo object so REST API can get non memory/vcore resource usages.
>
>
> Key: YARN-7817
> URL: https://issues.apache.org/jira/browse/YARN-7817
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Sumana Sathish
> Assignee: Sunil Govindan
> Priority: Major
> Fix For: 3.1.0, 2.10.0
>
> Attachments: Screen Shot 2018-01-25 at 11.59.31 PM.png, YARN-7817.001.patch, YARN-7817.002.patch, YARN-7817.003.patch, YARN-7817.004.patch, YARN-7817.005.patch
[jira] [Updated] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9552: --- Attachment: YARN-9552-branch-3.2.002.patch
[jira] [Updated] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9552: --- Attachment: (was: YARN-9552-branch-3.2.002.patch)
[jira] [Updated] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9552: --- Attachment: YARN-9552-branch-3.2.002.patch
[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934317#comment-16934317 ] Hadoop QA commented on YARN-9552: - (x) -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 31s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-3.2 Compile Tests ||
| +1 | mvninstall | 21m 32s | branch-3.2 passed |
| +1 | compile | 0m 40s | branch-3.2 passed |
| +1 | checkstyle | 0m 34s | branch-3.2 passed |
| +1 | mvnsite | 0m 44s | branch-3.2 passed |
| +1 | shadedclient | 13m 52s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | branch-3.2 passed |
| +1 | javadoc | 0m 34s | branch-3.2 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| -0 | checkstyle | 0m 27s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 20 unchanged - 0 fixed = 21 total (was 20) |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 0s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 13s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 70m 58s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 128m 10s | |
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt |
| | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
| | hadoop.yarn.server.resourcemanager.metrics.TestCombinedSystemMetricsPublisher |
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:63396beab41 |
| JIRA Issue | YARN-9552 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980861/YARN-9552-branch-3.2.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 08e1d14191d7 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.2 / b207244 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle |
[jira] [Created] (YARN-9849) Leaf queues not inheriting parent queue status after adding status as “RUNNING” and thereafter, commenting the same in capacity-scheduler.xml
Sushanta Sen created YARN-9849:
--
Summary: Leaf queues not inheriting parent queue status after adding status as “RUNNING” and thereafter, commenting the same in capacity-scheduler.xml
Key: YARN-9849
URL: https://issues.apache.org/jira/browse/YARN-9849
Project: Hadoop YARN
Issue Type: Bug
Components: capacity scheduler
Reporter: Sushanta Sen
【Precondition】:
1. Install the cluster
2. Config queues with more numbers, say 2 parent [default, q1] & leaf queues [q2, q3]
3. Cluster should be up and running
【Test step】:
1. By default, leaf queues inherit the parent status
2. Change leaf queues status to "RUNNING" explicitly
3. Run refresh command, leaf queues status shown as "RUNNING" in CLI/UI
4. Thereafter, change the leaf queues status to "STOPPED"
5. Run refresh command, leaf queues status shown as "STOPPED" in CLI/UI
6. Now comment out the leaf queues status and run refresh queues
7. Observe
【Expect Output】:
The leaf queues status should be displayed as "RUNNING", inheriting from the parent queue.
【Actual Output】:
Still displays the leaf queues status as "STOPPED" rather than inheriting the same from the parent, which is RUNNING.
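For reference, queue state in Capacity Scheduler is driven by the per-queue state property in capacity-scheduler.xml. A minimal fragment matching the reproduction steps might look like the following; the queue paths are assumed for illustration, since the report does not spell out the exact hierarchy:
{code:xml}
<configuration>
  <!-- Parent queue; RUNNING is the default state. -->
  <property>
    <name>yarn.scheduler.capacity.root.q1.state</name>
    <value>RUNNING</value>
  </property>

  <!-- Step 4: stop the leaf queue explicitly. Per the report, removing
       or commenting out this property again (step 6) and refreshing the
       queues should let q2 inherit the parent's RUNNING state, but the
       leaf reportedly stays STOPPED. -->
  <property>
    <name>yarn.scheduler.capacity.root.q1.q2.state</name>
    <value>STOPPED</value>
  </property>
</configuration>
{code}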
[jira] [Assigned] (YARN-9849) Leaf queues not inheriting parent queue status after adding status as “RUNNING” and thereafter, commenting the same in capacity-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T reassigned YARN-9849: --- Assignee: Bilwa S T
[jira] [Updated] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9552: --- Attachment: YARN-9552-branch-3.1.001.patch
[jira] [Updated] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9552: --- Attachment: YARN-9552-branch-3.2.001.patch > FairScheduler: NODE_UPDATE can cause NoSuchElementException > --- > > Key: YARN-9552 > URL: https://issues.apache.org/jira/browse/YARN-9552 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9552-001.patch, YARN-9552-002.patch, > YARN-9552-003.patch, YARN-9552-004.patch, YARN-9552-branch-3.2.001.patch
[jira] [Reopened] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko reopened YARN-9552: > FairScheduler: NODE_UPDATE can cause NoSuchElementException > --- > > Key: YARN-9552 > URL: https://issues.apache.org/jira/browse/YARN-9552 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9552-001.patch, YARN-9552-002.patch, > YARN-9552-003.patch, YARN-9552-004.patch, YARN-9552-branch-3.2.001.patch
[jira] [Commented] (YARN-9847) ZKRMStateStore will cause zk connection loss when writing huge data into znode
[ https://issues.apache.org/jira/browse/YARN-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934127#comment-16934127 ] Hadoop QA commented on YARN-9847: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 31s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 28s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 3s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}150m 19s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | Invocation of toString on java.util.Arrays.copyOf(byte[], int) in org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.trimAttempStateData(ApplicationAttemptStateData, int) At ZKRMStateStore.java:int) in org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.trimAttempStateData(ApplicationAttemptStateData, int) At ZKRMStateStore.java:[line 905] | | | Found reliance on default encoding in org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.trimAttempStateData(ApplicationAttemptStateData, int):in org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.trimAttempStateData(ApplicationAttemptStateData, int): String.getBytes() At ZKRMStateStore.java:[line 901] | | | Possible null pointer dereference of null in
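The two findbugs items above flag a well-known pair of byte-handling mistakes: calling toString() on a byte[] (which prints the array reference, not its contents) and calling String.getBytes() without an explicit charset. A minimal sketch of the flagged pattern and the usual remedy follows; trimFlagged and trimFixed are hypothetical names and not the patch's actual trimAttempStateData logic.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TrimSketch {
  // Roughly the shape findbugs is complaining about (hypothetical code):
  static String trimFlagged(String diagnostics, int limit) {
    byte[] data = diagnostics.getBytes();          // relies on default encoding
    return Arrays.copyOf(data, limit).toString();  // yields "[B@...", not text
  }

  // What the warnings ask for: an explicit charset, and building a String
  // from the byte range instead of calling toString() on the array. Note
  // that byte-level truncation can still split a multi-byte UTF-8 character.
  static String trimFixed(String diagnostics, int limit) {
    byte[] data = diagnostics.getBytes(StandardCharsets.UTF_8);
    int len = Math.min(limit, data.length);
    return new String(data, 0, len, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(trimFlagged("a very long diagnostic message", 10)); // e.g. [B@6d06d69c
    System.out.println(trimFixed("a very long diagnostic message", 10));   // "a very lon"
  }
}
{code}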
[jira] [Commented] (YARN-9847) ZKRMStateStore will cause zk connection loss when writing huge data into znode
[ https://issues.apache.org/jira/browse/YARN-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934125#comment-16934125 ] Zhankun Tang commented on YARN-9847: [~suxingfate], thanks for reporting this. This is interesting. One question: will this truncation affect state recovery? > ZKRMStateStore will cause zk connection loss when writing huge data into znode > -- > > Key: YARN-9847 > URL: https://issues.apache.org/jira/browse/YARN-9847 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wang, Xinglong >Assignee: Wang, Xinglong >Priority: Minor > Attachments: YARN-9847.001.patch > > > Recently, we encountered an RM ZK connection issue because the RM was trying to write > huge data into a znode. This makes ZooKeeper report a Len error, which then causes > the ZK session connection to be lost. Eventually the RM would crash due to the ZK > connection issue. > *The fix* > In order to protect the ResourceManager from crashing because of this, > the fix limits the size of the data stored per attempt by trimming the > diagnostic info when writing ApplicationAttemptStateData into the znode. The size > limit is regulated by -Djute.maxbuffer set in yarn-env.sh; the same value is > also used by the ZooKeeper server. > *The story* > ResourceManager Log > {code:java} > 2019-07-29 02:14:59,638 WARN org.apache.zookeeper.ClientCnxn: Session > 0x36ab902369100a0 for server abc-zk-5.vip.ebay.com/10.210.82.29:2181, > unexpected error, closing socket connection and attempting reconnect > java.io.IOException: Broken pipe > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:65) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) > 2019-07-29 04:27:35,459 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:1050) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:699) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:317) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:299) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at >
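The core of the protection is enforcing ZooKeeper's frame limit on the RM side before the write ever happens. Below is a minimal sketch of that idea; jute.maxbuffer and its 0xfffff (roughly 1 MB) default are real ZooKeeper settings, but ZnodeSizeGuard and the blind truncation are placeholders, since the actual patch trims only the diagnostics field of ApplicationAttemptStateData rather than the raw serialized bytes.

{code:java}
import java.util.Arrays;

public class ZnodeSizeGuard {
  // ZooKeeper rejects packets larger than jute.maxbuffer (default 0xfffff,
  // i.e. 1048575 bytes); exceeding it is what produces the server-side
  // "Len error" and the client-side connection loss seen in the logs above.
  private static final int ZNODE_LIMIT =
      Integer.getInteger("jute.maxbuffer", 0xfffff);

  // Placeholder for where the limit would be enforced before setData()/create().
  static byte[] capForZnode(byte[] serializedState) {
    if (serializedState.length <= ZNODE_LIMIT) {
      return serializedState;
    }
    // Blind truncation shown only for illustration; trimming the diagnostics
    // inside ApplicationAttemptStateData keeps the record deserializable.
    return Arrays.copyOf(serializedState, ZNODE_LIMIT);
  }
}
{code}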
[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934105#comment-16934105 ] Steven Rand commented on YARN-9552: --- Thanks! > FairScheduler: NODE_UPDATE can cause NoSuchElementException > --- > > Key: YARN-9552 > URL: https://issues.apache.org/jira/browse/YARN-9552 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9552-001.patch, YARN-9552-002.patch, > YARN-9552-003.patch, YARN-9552-004.patch
[jira] [Commented] (YARN-9848) revert YARN-4946
[ https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934095#comment-16934095 ] Steven Rand commented on YARN-9848: --- Attached a patch which reverts YARN-4946 on trunk. The revert applied cleanly to the logic in {{RMAppManager}}, but had several conflicts in {{TestAppManager}}. Tagging [~ccondit], [~wangda], [~rkanter], [~snemeth] > revert YARN-4946 > > > Key: YARN-9848 > URL: https://issues.apache.org/jira/browse/YARN-9848 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, resourcemanager >Reporter: Steven Rand >Priority: Major > Attachments: YARN-9848-01.patch > > > In YARN-4946, we've been discussing a revert due to the potential for keeping > more applications in the state store than desired, and the potential to > greatly increase RM recovery times. > > I'm in favor of reverting the patch, but other ideas along the lines of > YARN-9571 would work as well.
[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state
[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934091#comment-16934091 ] Steven Rand commented on YARN-4946: --- I created YARN-9848 for reverting. > RM should not consider an application as COMPLETED when log aggregation is > not in a terminal state > -- > > Key: YARN-4946 > URL: https://issues.apache.org/jira/browse/YARN-4946 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-4946.001.patch, YARN-4946.002.patch, > YARN-4946.003.patch, YARN-4946.004.patch
[jira] [Updated] (YARN-9848) revert YARN-4946
[ https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-9848: -- Attachment: YARN-9848-01.patch > revert YARN-4946 > > > Key: YARN-9848 > URL: https://issues.apache.org/jira/browse/YARN-9848 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, resourcemanager >Reporter: Steven Rand >Priority: Major > Attachments: YARN-9848-01.patch