[jira] [Updated] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce
[ https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1893: -- Attachment: YARN-1893.1.patch Submit the same patch to kick Jenkins again. Make ApplicationMasterProtocol#allocate AtMostOnce -- Key: YARN-1893 URL: https://issues.apache.org/jira/browse/YARN-1893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1893.1.patch, YARN-1893.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce
[ https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954593#comment-13954593 ] Jian He commented on YARN-1893: --- One nit: you can use addPersistedPassword instead of creating a new getPasswords method in AMRMTokenSecretManager. Make ApplicationMasterProtocol#allocate AtMostOnce -- Key: YARN-1893 URL: https://issues.apache.org/jira/browse/YARN-1893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1893.1.patch, YARN-1893.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1895) Add testcases to test AMRMToken on HA
Xuan Gong created YARN-1895: --- Summary: Add testcases to test AMRMToken on HA Key: YARN-1895 URL: https://issues.apache.org/jira/browse/YARN-1895 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce
[ https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954598#comment-13954598 ] Xuan Gong commented on YARN-1893: - bq. Seems we never had a test for testing AMRMToken works in failover scenario. To unblock this, can you at least manually test AMRMToken works in failover? We can add the test separately. Right. Created a separate ticket to track this: https://issues.apache.org/jira/browse/YARN-1895. Will do the manual test to verify that AMRMToken works in failover. bq. One nit: you can use addPersistedPassword instead of creating a new getPasswords method in AMRMTokenSecretManager. Right. Created a new patch to fix it. Make ApplicationMasterProtocol#allocate AtMostOnce -- Key: YARN-1893 URL: https://issues.apache.org/jira/browse/YARN-1893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce
[ https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1893: Attachment: YARN-1893.2.patch Make ApplicationMasterProtocol#allocate AtMostOnce -- Key: YARN-1893 URL: https://issues.apache.org/jira/browse/YARN-1893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce
[ https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954599#comment-13954599 ] Hadoop QA commented on YARN-1893: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637692/YARN-1893.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3489//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3489//console This message is automatically generated. Make ApplicationMasterProtocol#allocate AtMostOnce -- Key: YARN-1893 URL: https://issues.apache.org/jira/browse/YARN-1893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce
[ https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954600#comment-13954600 ] Jian He commented on YARN-1893: --- +1, will commit once Jenkins says ok. Make ApplicationMasterProtocol#allocate AtMostOnce -- Key: YARN-1893 URL: https://issues.apache.org/jira/browse/YARN-1893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce
[ https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954615#comment-13954615 ] Hadoop QA commented on YARN-1893: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637696/YARN-1893.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3490//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3490//console This message is automatically generated. Make ApplicationMasterProtocol#allocate AtMostOnce -- Key: YARN-1893 URL: https://issues.apache.org/jira/browse/YARN-1893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1894) RM shutdown due to java.net.UnknownHostException
[ https://issues.apache.org/jira/browse/YARN-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954629#comment-13954629 ] Mohammad Kamrul Islam commented on YARN-1894: - Thanks [~jianhe] and [~vinodkv] for pointing this and taking care of this. RM shutdown due to java.net.UnknownHostException Key: YARN-1894 URL: https://issues.apache.org/jira/browse/YARN-1894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Fix For: 2.4.0 Background: I started Hadoop 2.3 on my Mac in my office network and submitted few jobs successfully. When i went to my home (new network), I submitted another job and it abruptly pulled down the RM service. Error in RM log: {noformat} 2014-03-29 12:28:56,754 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing RMDelegation token with sequence number: 3 2014-03-29 12:28:57,256 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.IllegalArgumentException: java.net.UnknownHostException: mislam-mn.MY.OOFICE.DOMAIN at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247) at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1342) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1208) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1167) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:868) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:642) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:556) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:696) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:740) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:88) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:543) at java.lang.Thread.run(Thread.java:695) Caused by: java.net.UnknownHostException: mislam-mn.linkedin.biz ... 15 more 2014-03-29 12:28:57,259 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. 2014-03-29 12:28:57,297 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:8088 2014-03-29 12:28:57,401 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032 2014-03-29 12:28:57,473 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033 . {noformat} Proposal: --- I believe the root cause : I moved my machine from one network to another with the same RM service. 
My point is: whatever the cause, the RM is a long-running core service and it should not exit this way. An appropriate error message should be sufficient. If there is a consensus (or no disagreement), I can work on a patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
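To illustrate the proposal in YARN-1894 above, here is a minimal, self-contained sketch of the "log and skip instead of exiting" idea. It deliberately avoids RM internals: SkipUnresolvableNode and canBuildTokenFor are hypothetical names for illustration only, and the eventual fix may look different.
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical stand-in for the RM's per-node token handling: try to resolve the
// node's hostname and, on failure, log and skip that allocation instead of letting
// the exception reach the event dispatcher (which currently exits the RM).
public class SkipUnresolvableNode {

  static boolean canBuildTokenFor(String host) {
    try {
      InetAddress.getByName(host); // roughly what SecurityUtil.buildTokenService requires
      return true;
    } catch (UnknownHostException e) {
      System.err.println("Cannot resolve " + host + ", skipping allocation: " + e);
      return false; // skip this node update; the RM keeps running
    }
  }

  public static void main(String[] args) {
    System.out.println(canBuildTokenFor("localhost"));
    System.out.println(canBuildTokenFor("host.invalid.example"));
  }
}
{code}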
[jira] [Commented] (YARN-1017) Document RM Restart feature
[ https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954651#comment-13954651 ] Hudson commented on YARN-1017: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #524 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/524/]) YARN-1017. Added documentation for ResourceManager Restart. (jianhe) (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1582913) * /hadoop/common/trunk/hadoop-project/src/site/site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRestart.apt.vm Document RM Restart feature --- Key: YARN-1017 URL: https://issues.apache.org/jira/browse/YARN-1017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Priority: Blocker Fix For: 2.4.0 Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, rm-restart-doc-3.patch This should give users a general idea about how RM Restart works and how to use RM Restart -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups
[ https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954655#comment-13954655 ] Hudson commented on YARN-1883: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #524 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/524/]) YARN-1883. TestRMAdminService fails due to inconsistent entries in UserGroups (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582862) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java TestRMAdminService fails due to inconsistent entries in UserGroups -- Key: YARN-1883 URL: https://issues.apache.org/jira/browse/YARN-1883 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: java7 Fix For: 3.0.0, 2.5.0 Attachments: YARN-1883.patch, YARN-1883.patch testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails with the following error: {noformat} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421) at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104) {noformat} Line numbers will be inconsistent because I was running the tests in a particular order while debugging, but the line on which the failure occurs is: {code} Assert.assertTrue(groupBefore.contains("test_group_A") && groupBefore.contains("test_group_B") && groupBefore.contains("test_group_C") && groupBefore.size() == 3); {code} testRMInitialsWithFileSystemBasedConfigurationProvider() and testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() call the function {{MockUnixGroupsMapping.updateGroups();}} which changes the list of userGroups. testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() tries to verify the groups before changing them and fails if testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made the changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
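The ordering problem described in the YARN-1883 report above has a common remedy: reset the shared group state before each test so the assertions hold regardless of execution order. The self-contained JUnit 4 sketch below only illustrates that pattern; GroupState is a hypothetical stand-in for MockUnixGroupsMapping, and this is not the committed patch.
{code}
import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.junit.Before;
import org.junit.Test;

// Illustration of the ordering problem and one remedy: reset shared mutable state in
// @Before so each test sees the same starting groups regardless of execution order.
public class OrderIndependentGroupsTest {

  static class GroupState {
    static List<String> groups = new ArrayList<String>();
    static void reset() {
      groups = new ArrayList<String>(
          Arrays.asList("test_group_A", "test_group_B", "test_group_C"));
    }
    static void updateGroups() {
      groups = new ArrayList<String>(Arrays.asList("test_group_D", "test_group_E"));
    }
  }

  @Before
  public void resetGroups() {
    GroupState.reset(); // without this, the second test depends on whether the first ran
  }

  @Test
  public void testThatMutatesGroups() {
    GroupState.updateGroups();
    assertTrue(GroupState.groups.contains("test_group_D"));
  }

  @Test
  public void testThatAssumesOriginalGroups() {
    List<String> groupBefore = GroupState.groups;
    assertTrue(groupBefore.contains("test_group_A")
        && groupBefore.contains("test_group_B")
        && groupBefore.contains("test_group_C")
        && groupBefore.size() == 3);
  }
}
{code}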
[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring
[ https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954656#comment-13954656 ] Hudson commented on YARN-1891: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #524 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/524/]) YARN-1891. Added documentation for NodeManager health-monitoring. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1582891) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManager.apt.vm Document NodeManager health-monitoring -- Key: YARN-1891 URL: https://issues.apache.org/jira/browse/YARN-1891 Project: Hadoop YARN Issue Type: Task Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Minor Fix For: 2.4.0 Attachments: apache-yarn-1891.0.patch Start documenting node manager starting with the health monitoring. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1017) Document RM Restart feature
[ https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954684#comment-13954684 ] Hudson commented on YARN-1017: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1716 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1716/]) YARN-1017. Added documentation for ResourceManager Restart. (jianhe) (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1582913) * /hadoop/common/trunk/hadoop-project/src/site/site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRestart.apt.vm Document RM Restart feature --- Key: YARN-1017 URL: https://issues.apache.org/jira/browse/YARN-1017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Priority: Blocker Fix For: 2.4.0 Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, rm-restart-doc-3.patch This should give users a general idea about how RM Restart works and how to use RM Restart -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups
[ https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954688#comment-13954688 ] Hudson commented on YARN-1883: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1716 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1716/]) YARN-1883. TestRMAdminService fails due to inconsistent entries in UserGroups (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582862) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java TestRMAdminService fails due to inconsistent entries in UserGroups -- Key: YARN-1883 URL: https://issues.apache.org/jira/browse/YARN-1883 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: java7 Fix For: 3.0.0, 2.5.0 Attachments: YARN-1883.patch, YARN-1883.patch testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails with the following error: {noformat} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421) at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104) {noformat} Line numbers will be inconsistent because I was running the tests in a particular order while debugging, but the line on which the failure occurs is: {code} Assert.assertTrue(groupBefore.contains("test_group_A") && groupBefore.contains("test_group_B") && groupBefore.contains("test_group_C") && groupBefore.size() == 3); {code} testRMInitialsWithFileSystemBasedConfigurationProvider() and testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() call the function {{MockUnixGroupsMapping.updateGroups();}} which changes the list of userGroups. testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() tries to verify the groups before changing them and fails if testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made the changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring
[ https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954689#comment-13954689 ] Hudson commented on YARN-1891: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1716 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1716/]) YARN-1891. Added documentation for NodeManager health-monitoring. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1582891) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManager.apt.vm Document NodeManager health-monitoring -- Key: YARN-1891 URL: https://issues.apache.org/jira/browse/YARN-1891 Project: Hadoop YARN Issue Type: Task Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Minor Fix For: 2.4.0 Attachments: apache-yarn-1891.0.patch Start documenting node manager starting with the health monitoring. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce
[ https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954841#comment-13954841 ] Hudson commented on YARN-1893: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5433 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5433/]) YARN-1893. Mark AtMostOnce annotation to ApplicationMasterProtocol#allocate. Contributed by Xuan Gong. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1583203) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationMasterProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Make ApplicationMasterProtocol#allocate AtMostOnce -- Key: YARN-1893 URL: https://issues.apache.org/jira/browse/YARN-1893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-1879: Assignee: Tsuyoshi OZAWA (was: Xuan Gong) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol --- Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-1879.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954897#comment-13954897 ] Tsuyoshi OZAWA commented on YARN-1879: -- I apologize for the delay; I was busy last weekend. I'd like to tackle this problem, so I'm taking it back. Please let me know if that's a problem. Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol --- Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Xuan Gong Priority: Critical Attachments: YARN-1879.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954920#comment-13954920 ] Tsuyoshi OZAWA commented on YARN-1879: -- {code} If the application has already registered/unregistered, duplicate register/unregister attempts can cause exceptions. {code} This is correct, and it is a subtle problem. If the RPC client (the AM) thinks the exception is fatal, the AM stops. Currently AMRMClientImpl#registerApplicationMaster() can throw exceptions and the ApplicationMaster doesn't handle them, which causes AM failure. To preserve these semantics, we should mark ApplicationMasterProtocol#registerApplicationMaster() as AtMostOnce. As for AMRMClientImpl#unregisterApplicationMaster(), the AM just logs the RPC failure and continues to stop; therefore, we can mark ApplicationMasterProtocol#unregisterApplicationMaster() as Idempotent. Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol --- Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-1879.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
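For readers following the thread, a rough sketch of the annotation split proposed in the comment above. The interface is deliberately simplified: the real ApplicationMasterProtocol methods take request objects and throw exceptions, and the protocol-level unregister call is named finishApplicationMaster; only the placement of the org.apache.hadoop.io.retry annotations is the point here.
{code}
import org.apache.hadoop.io.retry.AtMostOnce;
import org.apache.hadoop.io.retry.Idempotent;

// Simplified sketch, not the real ApplicationMasterProtocol signatures; it only shows
// where the retry annotations discussed above would sit.
public interface SimplifiedAMProtocol {

  // Re-registering an already-registered AM can throw, and many AMs treat that as
  // fatal. AtMostOnce signals that a retried call must not be blindly re-executed;
  // it needs server-side support such as a response cache.
  @AtMostOnce
  void registerApplicationMaster();

  // Per the comment above, unregistering twice is harmless for the RM, so the call
  // could simply be retried. (The follow-up comments in this thread revisit this.)
  @Idempotent
  void unregisterApplicationMaster();
}
{code}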
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954924#comment-13954924 ] Jian He commented on YARN-1879: --- bq. As for AMRMClientImpl#unregisterApplicationMaster(), the AM just logs the RPC failure and continues to stop. I suppose this is what MR-AM does today? We cannot assume each AM does the same. Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol --- Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-1879.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954925#comment-13954925 ] Hong Zhiguo commented on YARN-1889: --- Hi Fengdong, I haven't submitted the new patch yet; I'll do it now. Sorry, I didn't have enough time last weekend. avoid creating new objects on each fair scheduler call to AppSchedulable comparator --- Key: YARN-1889 URL: https://issues.apache.org/jira/browse/YARN-1889 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Hong Zhiguo Priority: Minor Attachments: YARN-1889.patch In the fair scheduler, each scheduling attempt performs a full sort of the List of AppSchedulable, which invokes the Comparator.compare method many times. Both FairShareComparator and DRFComparator call AppSchedulable.getWeights and AppSchedulable.getPriority. A new ResourceWeights object is allocated on each call of getWeights, and the same holds for getPriority. This puts a lot of pressure on GC because these methods are called very frequently. The test case below shows the improvement in performance and GC behaviour. The results show that the GC pressure during NodeUpdate processing is reduced by half with this patch. The code to show the improvement (add it to TestFairScheduler.java):
{code}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public void printGCStats() {
  long totalGarbageCollections = 0;
  long garbageCollectionTime = 0;
  for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
    long count = gc.getCollectionCount();
    if (count >= 0) {
      totalGarbageCollections += count;
    }
    long time = gc.getCollectionTime();
    if (time >= 0) {
      garbageCollectionTime += time;
    }
  }
  System.out.println("Total Garbage Collections: " + totalGarbageCollections);
  System.out.println("Total Garbage Collection Time (ms): " + garbageCollectionTime);
}

@Test
public void testImpactOnGC() throws Exception {
  scheduler.reinitialize(conf, resourceManager.getRMContext());
  // Add nodes
  int numNode = 1;
  for (int i = 0; i < numNode; ++i) {
    String host = String.format("192.1.%d.%d", i/256, i%256);
    RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, host);
    NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
    scheduler.handle(nodeEvent);
    assertEquals(1024 * 64 * (i+1), scheduler.getClusterCapacity().getMemory());
  }
  assertEquals(numNode, scheduler.getNumClusterNodes());
  assertEquals(1024 * 64 * numNode, scheduler.getClusterCapacity().getMemory());

  // add apps, each app has 100 containers.
  int minReqSize = FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
  int numApp = 8000;
  int priority = 1;
  for (int i = 1; i < numApp + 1; ++i) {
    ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
    AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
        attemptId.getApplicationId(), "queue1", "user1");
    scheduler.handle(appAddedEvent);
    AppAttemptAddedSchedulerEvent attemptAddedEvent =
        new AppAttemptAddedSchedulerEvent(attemptId, false);
    scheduler.handle(attemptAddedEvent);
    createSchedulingRequestExistingApplication(minReqSize * 2, 1, priority, attemptId);
  }
  scheduler.update();
  assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", true)
      .getRunnableAppSchedulables().size());

  System.out.println("GC stats before NodeUpdate processing:");
  printGCStats();

  int hb_num = 5000;
  long start = System.nanoTime();
  for (int i = 0; i < hb_num; ++i) {
    String host = String.format("192.1.%d.%d", i/256, i%256);
    RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, host);
    NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
    scheduler.handle(nodeEvent);
  }
  long end = System.nanoTime();
  System.out.printf("processing time for a NodeUpdate in average: %d us\n",
      (end - start) / (hb_num * 1000));

  System.out.println("GC stats after NodeUpdate processing:");
  printGCStats();
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
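The fix direction implied by the description above is to stop allocating a fresh ResourceWeights (and Priority) object on every comparator call. A minimal sketch of that "reuse one mutable instance" idea follows; CachedWeights and the field names are assumptions for illustration, and this is not the actual YARN-1889 patch.
{code}
// Sketch of the allocation-avoidance idea described above, not the actual YARN-1889
// patch: keep one mutable weights object per app and update it in place instead of
// constructing a new ResourceWeights on every comparator call.
public class AppWeightsCache {

  static final class CachedWeights {
    double memoryWeight;
    double cpuWeight;
  }

  // Allocated once per app; safe to reuse only because the fair scheduler sorts
  // schedulables under its own lock (single-threaded use is assumed here).
  private final CachedWeights weights = new CachedWeights();

  // Called from the scheduler's comparator on every compare; no allocation per call.
  public CachedWeights getWeights(double appWeight, double memShare, double cpuShare) {
    weights.memoryWeight = appWeight * memShare;
    weights.cpuWeight = appWeight * cpuShare;
    return weights;
  }
}
{code}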
[jira] [Updated] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-1889: -- Attachment: YARN-1889.patch avoid creating new objects on each fair scheduler call to AppSchedulable comparator --- Key: YARN-1889 URL: https://issues.apache.org/jira/browse/YARN-1889 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Hong Zhiguo Priority: Minor Attachments: YARN-1889.patch, YARN-1889.patch In fair scheduler, in each scheduling attempt, a full sort is performed on List of AppSchedulable, which invokes Comparator.compare method many times. Both FairShareComparator and DRFComparator call AppSchedulable.getWeights, and AppSchedulable.getPriority. A new ResourceWeights object is allocated on each call of getWeights, and the same for getPriority. This introduces a lot of pressure to GC because these methods are called very very frequently. Below test case shows improvement on performance and GC behaviour. The results show that the GC pressure during processing NodeUpdate is recuded half by this patch. The code to show the improvement: (Add it to TestFairScheduler.java) import java.lang.management.GarbageCollectorMXBean; import java.lang.management.ManagementFactory; public void printGCStats() { long totalGarbageCollections = 0; long garbageCollectionTime = 0; for(GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) { long count = gc.getCollectionCount(); if(count = 0) { totalGarbageCollections += count; } long time = gc.getCollectionTime(); if(time = 0) { garbageCollectionTime += time; } } System.out.println(Total Garbage Collections: + totalGarbageCollections); System.out.println(Total Garbage Collection Time (ms): + garbageCollectionTime); } @Test public void testImpactOnGC() throws Exception { scheduler.reinitialize(conf, resourceManager.getRMContext()); // Add nodes int numNode = 1; for (int i = 0; i numNode; ++i) { String host = String.format(192.1.%d.%d, i/256, i%256); RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, host); NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); scheduler.handle(nodeEvent); assertEquals(1024 * 64 * (i+1), scheduler.getClusterCapacity().getMemory()); } assertEquals(numNode, scheduler.getNumClusterNodes()); assertEquals(1024 * 64 * numNode, scheduler.getClusterCapacity().getMemory()); // add apps, each app has 100 containers. 
int minReqSize = FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; int numApp = 8000; int priority = 1; for (int i = 1; i numApp + 1; ++i) { ApplicationAttemptId attemptId = createAppAttemptId(i, 1); AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( attemptId.getApplicationId(), queue1, user1); scheduler.handle(appAddedEvent); AppAttemptAddedSchedulerEvent attemptAddedEvent = new AppAttemptAddedSchedulerEvent(attemptId, false); scheduler.handle(attemptAddedEvent); createSchedulingRequestExistingApplication(minReqSize * 2, 1, priority, attemptId); } scheduler.update(); assertEquals(numApp, scheduler.getQueueManager().getLeafQueue(queue1, true) .getRunnableAppSchedulables().size()); System.out.println(GC stats before NodeUpdate processing:); printGCStats(); int hb_num = 5000; long start = System.nanoTime(); for (int i = 0; i hb_num; ++i) { String host = String.format(192.1.%d.%d, i/256, i%256); RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, host); NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); scheduler.handle(nodeEvent); } long end = System.nanoTime(); System.out.printf(processing time for a NodeUpdate in average: %d us\n, (end - start)/(hb_num * 1000)); System.out.println(GC stats after NodeUpdate processing:); printGCStats(); } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954931#comment-13954931 ] Hadoop QA commented on YARN-1889: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637758/YARN-1889.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3491//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3491//console This message is automatically generated. avoid creating new objects on each fair scheduler call to AppSchedulable comparator --- Key: YARN-1889 URL: https://issues.apache.org/jira/browse/YARN-1889 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Hong Zhiguo Priority: Minor Attachments: YARN-1889.patch, YARN-1889.patch In fair scheduler, in each scheduling attempt, a full sort is performed on List of AppSchedulable, which invokes Comparator.compare method many times. Both FairShareComparator and DRFComparator call AppSchedulable.getWeights, and AppSchedulable.getPriority. A new ResourceWeights object is allocated on each call of getWeights, and the same for getPriority. This introduces a lot of pressure to GC because these methods are called very very frequently. Below test case shows improvement on performance and GC behaviour. The results show that the GC pressure during processing NodeUpdate is recuded half by this patch. 
The code to show the improvement: (Add it to TestFairScheduler.java) import java.lang.management.GarbageCollectorMXBean; import java.lang.management.ManagementFactory; public void printGCStats() { long totalGarbageCollections = 0; long garbageCollectionTime = 0; for(GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) { long count = gc.getCollectionCount(); if(count = 0) { totalGarbageCollections += count; } long time = gc.getCollectionTime(); if(time = 0) { garbageCollectionTime += time; } } System.out.println(Total Garbage Collections: + totalGarbageCollections); System.out.println(Total Garbage Collection Time (ms): + garbageCollectionTime); } @Test public void testImpactOnGC() throws Exception { scheduler.reinitialize(conf, resourceManager.getRMContext()); // Add nodes int numNode = 1; for (int i = 0; i numNode; ++i) { String host = String.format(192.1.%d.%d, i/256, i%256); RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, host); NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); scheduler.handle(nodeEvent); assertEquals(1024 * 64 * (i+1), scheduler.getClusterCapacity().getMemory()); } assertEquals(numNode, scheduler.getNumClusterNodes()); assertEquals(1024 * 64 * numNode, scheduler.getClusterCapacity().getMemory()); // add apps, each app has 100 containers. int minReqSize = FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; int numApp = 8000; int priority = 1; for (int i = 1; i numApp + 1; ++i) { ApplicationAttemptId attemptId = createAppAttemptId(i, 1); AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( attemptId.getApplicationId(), queue1, user1); scheduler.handle(appAddedEvent); AppAttemptAddedSchedulerEvent
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954943#comment-13954943 ] Tsuyoshi OZAWA commented on YARN-1879: -- {quote} I suppose this is what MR-AM does today? We cannot assume each AM does the same. {quote} Jian, thank you for pointing that out. I was referring to DistributedShell's ApplicationMaster in that sentence. I checked that MRAppMaster stops because it doesn't retry when an exception occurs on the server side. Therefore, as you mentioned, we cannot assume each AM does the same. We should make them AtMostOnce with a RetryCache-like mechanism. I'll create a patch based on the discussion. Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol --- Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-1879.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
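As a generic illustration of the "RetryCache-like mechanism" mentioned above (this is not Hadoop's org.apache.hadoop.ipc.RetryCache API), the sketch below remembers the response of an AtMostOnce call per client and call id, so a retried RPC after failover gets the stored answer instead of a second execution that would fail with an "already registered" error.
{code}
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a retry cache: execute each (clientId, callId) at most once
// and replay the stored response on retries. A real implementation would also need
// entry expiry and coordination of concurrent retries; both are omitted here.
public class SimpleRetryCache<R> {

  private final Map<String, R> completed = new ConcurrentHashMap<String, R>();

  public R invoke(String clientId, long callId, Callable<R> call) throws Exception {
    String key = clientId + "#" + callId;
    R cached = completed.get(key);
    if (cached != null) {
      return cached;            // retried RPC: replay the remembered response
    }
    R response = call.call();   // first attempt: run the real operation once
    completed.put(key, response);
    return response;
  }
}
{code}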
[jira] [Updated] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-1889: -- Labels: reviewed (was: ) avoid creating new objects on each fair scheduler call to AppSchedulable comparator --- Key: YARN-1889 URL: https://issues.apache.org/jira/browse/YARN-1889 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Hong Zhiguo Priority: Minor Labels: reviewed Attachments: YARN-1889.patch, YARN-1889.patch In fair scheduler, in each scheduling attempt, a full sort is performed on List of AppSchedulable, which invokes Comparator.compare method many times. Both FairShareComparator and DRFComparator call AppSchedulable.getWeights, and AppSchedulable.getPriority. A new ResourceWeights object is allocated on each call of getWeights, and the same for getPriority. This introduces a lot of pressure to GC because these methods are called very very frequently. Below test case shows improvement on performance and GC behaviour. The results show that the GC pressure during processing NodeUpdate is recuded half by this patch. The code to show the improvement: (Add it to TestFairScheduler.java) import java.lang.management.GarbageCollectorMXBean; import java.lang.management.ManagementFactory; public void printGCStats() { long totalGarbageCollections = 0; long garbageCollectionTime = 0; for(GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) { long count = gc.getCollectionCount(); if(count = 0) { totalGarbageCollections += count; } long time = gc.getCollectionTime(); if(time = 0) { garbageCollectionTime += time; } } System.out.println(Total Garbage Collections: + totalGarbageCollections); System.out.println(Total Garbage Collection Time (ms): + garbageCollectionTime); } @Test public void testImpactOnGC() throws Exception { scheduler.reinitialize(conf, resourceManager.getRMContext()); // Add nodes int numNode = 1; for (int i = 0; i numNode; ++i) { String host = String.format(192.1.%d.%d, i/256, i%256); RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, host); NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); scheduler.handle(nodeEvent); assertEquals(1024 * 64 * (i+1), scheduler.getClusterCapacity().getMemory()); } assertEquals(numNode, scheduler.getNumClusterNodes()); assertEquals(1024 * 64 * numNode, scheduler.getClusterCapacity().getMemory()); // add apps, each app has 100 containers. 
int minReqSize = FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; int numApp = 8000; int priority = 1; for (int i = 1; i numApp + 1; ++i) { ApplicationAttemptId attemptId = createAppAttemptId(i, 1); AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( attemptId.getApplicationId(), queue1, user1); scheduler.handle(appAddedEvent); AppAttemptAddedSchedulerEvent attemptAddedEvent = new AppAttemptAddedSchedulerEvent(attemptId, false); scheduler.handle(attemptAddedEvent); createSchedulingRequestExistingApplication(minReqSize * 2, 1, priority, attemptId); } scheduler.update(); assertEquals(numApp, scheduler.getQueueManager().getLeafQueue(queue1, true) .getRunnableAppSchedulables().size()); System.out.println(GC stats before NodeUpdate processing:); printGCStats(); int hb_num = 5000; long start = System.nanoTime(); for (int i = 0; i hb_num; ++i) { String host = String.format(192.1.%d.%d, i/256, i%256); RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, host); NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); scheduler.handle(nodeEvent); } long end = System.nanoTime(); System.out.printf(processing time for a NodeUpdate in average: %d us\n, (end - start)/(hb_num * 1000)); System.out.println(GC stats after NodeUpdate processing:); printGCStats(); } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator
[ https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954945#comment-13954945 ] Fengdong Yu commented on YARN-1889: --- The new patch looks good to me. avoid creating new objects on each fair scheduler call to AppSchedulable comparator --- Key: YARN-1889 URL: https://issues.apache.org/jira/browse/YARN-1889 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Hong Zhiguo Priority: Minor Labels: reviewed Attachments: YARN-1889.patch, YARN-1889.patch In fair scheduler, in each scheduling attempt, a full sort is performed on List of AppSchedulable, which invokes Comparator.compare method many times. Both FairShareComparator and DRFComparator call AppSchedulable.getWeights, and AppSchedulable.getPriority. A new ResourceWeights object is allocated on each call of getWeights, and the same for getPriority. This introduces a lot of pressure to GC because these methods are called very very frequently. Below test case shows improvement on performance and GC behaviour. The results show that the GC pressure during processing NodeUpdate is recuded half by this patch. The code to show the improvement: (Add it to TestFairScheduler.java) import java.lang.management.GarbageCollectorMXBean; import java.lang.management.ManagementFactory; public void printGCStats() { long totalGarbageCollections = 0; long garbageCollectionTime = 0; for(GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) { long count = gc.getCollectionCount(); if(count = 0) { totalGarbageCollections += count; } long time = gc.getCollectionTime(); if(time = 0) { garbageCollectionTime += time; } } System.out.println(Total Garbage Collections: + totalGarbageCollections); System.out.println(Total Garbage Collection Time (ms): + garbageCollectionTime); } @Test public void testImpactOnGC() throws Exception { scheduler.reinitialize(conf, resourceManager.getRMContext()); // Add nodes int numNode = 1; for (int i = 0; i numNode; ++i) { String host = String.format(192.1.%d.%d, i/256, i%256); RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, host); NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node); scheduler.handle(nodeEvent); assertEquals(1024 * 64 * (i+1), scheduler.getClusterCapacity().getMemory()); } assertEquals(numNode, scheduler.getNumClusterNodes()); assertEquals(1024 * 64 * numNode, scheduler.getClusterCapacity().getMemory()); // add apps, each app has 100 containers. 
int minReqSize = FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB; int numApp = 8000; int priority = 1; for (int i = 1; i numApp + 1; ++i) { ApplicationAttemptId attemptId = createAppAttemptId(i, 1); AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent( attemptId.getApplicationId(), queue1, user1); scheduler.handle(appAddedEvent); AppAttemptAddedSchedulerEvent attemptAddedEvent = new AppAttemptAddedSchedulerEvent(attemptId, false); scheduler.handle(attemptAddedEvent); createSchedulingRequestExistingApplication(minReqSize * 2, 1, priority, attemptId); } scheduler.update(); assertEquals(numApp, scheduler.getQueueManager().getLeafQueue(queue1, true) .getRunnableAppSchedulables().size()); System.out.println(GC stats before NodeUpdate processing:); printGCStats(); int hb_num = 5000; long start = System.nanoTime(); for (int i = 0; i hb_num; ++i) { String host = String.format(192.1.%d.%d, i/256, i%256); RMNode node = MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, host); NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node); scheduler.handle(nodeEvent); } long end = System.nanoTime(); System.out.printf(processing time for a NodeUpdate in average: %d us\n, (end - start)/(hb_num * 1000)); System.out.println(GC stats after NodeUpdate processing:); printGCStats(); } -- This message was sent by Atlassian JIRA (v6.2#6252)