[jira] [Updated] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce

2014-03-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1893:
--

Attachment: YARN-1893.1.patch

Submit the same patch to kick Jenkins again.

 Make ApplicationMasterProtocol#allocate AtMostOnce
 --

 Key: YARN-1893
 URL: https://issues.apache.org/jira/browse/YARN-1893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1893.1.patch, YARN-1893.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce

2014-03-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954593#comment-13954593
 ] 

Jian He commented on YARN-1893:
---

One nit: you can use addPersistedPassword instead of creating a new 
getPasswords method in AMRMTokenSecretManager.
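
For illustration, a minimal sketch of the suggestion, assuming the 2.x-era 
AMRMTokenSecretManager API (the helper class and its caller-supplied 
arguments are hypothetical):
{code}
import java.io.IOException;

import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;
import org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager;

// Hypothetical test helper: rather than exposing the internal password map
// through a new getPasswords() accessor, reuse the existing
// addPersistedPassword(Token) entry point to seed a secret manager with a
// known AMRMToken password.
final class AMRMTokenTestHelper {
  static void seedPassword(AMRMTokenSecretManager secretManager,
      Token<AMRMTokenIdentifier> amrmToken) throws IOException {
    secretManager.addPersistedPassword(amrmToken);
  }
}
{code}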

 Make ApplicationMasterProtocol#allocate AtMostOnce
 --

 Key: YARN-1893
 URL: https://issues.apache.org/jira/browse/YARN-1893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1893.1.patch, YARN-1893.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1895) Add testcases to test AMRMToken on HA

2014-03-30 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-1895:
---

 Summary: Add testcases to test AMRMToken on HA
 Key: YARN-1895
 URL: https://issues.apache.org/jira/browse/YARN-1895
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce

2014-03-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954598#comment-13954598
 ] 

Xuan Gong commented on YARN-1893:
-

bq. Seems we never had a test for testing AMRMToken works in failover scenario. 
To unblock this, can you at least manually test AMRMToken works in failover? 
We can add the test separately.

Right. Created a separate ticket to track this: 
https://issues.apache.org/jira/browse/YARN-1895
Will do the manual test to verify that AMRMToken works in failover.

bq. One nit: you can use addPersistedPassword instead of creating a new 
getPasswords method in AMRMTokenSecretManager.

Right. Created a new patch to fix it.

 Make ApplicationMasterProtocol#allocate AtMostOnce
 --

 Key: YARN-1893
 URL: https://issues.apache.org/jira/browse/YARN-1893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce

2014-03-30 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1893:


Attachment: YARN-1893.2.patch

 Make ApplicationMasterProtocol#allocate AtMostOnce
 --

 Key: YARN-1893
 URL: https://issues.apache.org/jira/browse/YARN-1893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce

2014-03-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954599#comment-13954599
 ] 

Hadoop QA commented on YARN-1893:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637692/YARN-1893.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3489//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3489//console

This message is automatically generated.

 Make ApplicationMasterProtocol#allocate AtMostOnce
 --

 Key: YARN-1893
 URL: https://issues.apache.org/jira/browse/YARN-1893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce

2014-03-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954600#comment-13954600
 ] 

Jian He commented on YARN-1893:
---

+1, will commit, once Jenkins says ok.

 Make ApplicationMasterProtocol#allocate AtMostOnce
 --

 Key: YARN-1893
 URL: https://issues.apache.org/jira/browse/YARN-1893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce

2014-03-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954615#comment-13954615
 ] 

Hadoop QA commented on YARN-1893:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637696/YARN-1893.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3490//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3490//console

This message is automatically generated.

 Make ApplicationMasterProtocol#allocate AtMostOnce
 --

 Key: YARN-1893
 URL: https://issues.apache.org/jira/browse/YARN-1893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1894) RM shutdown due to java.net.UnknownHostException

2014-03-30 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954629#comment-13954629
 ] 

Mohammad Kamrul Islam commented on YARN-1894:
-

Thanks [~jianhe] and [~vinodkv] for pointing this out and taking care of it.


 RM shutdown due to java.net.UnknownHostException
 

 Key: YARN-1894
 URL: https://issues.apache.org/jira/browse/YARN-1894
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Fix For: 2.4.0


 Background:
 
 I started Hadoop 2.3 on my Mac on my office network and submitted a few jobs 
 successfully. When I went home (a new network), I submitted another job 
 and it abruptly pulled down the RM service.
 Error in RM log:
 {noformat}
 2014-03-29 12:28:56,754 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager:
  storing RMDelegation token with sequence number: 3
 2014-03-29 12:28:57,256 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.IllegalArgumentException: java.net.UnknownHostException: 
 mislam-mn.MY.OOFICE.DOMAIN
 at 
 org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
 at 
 org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1294)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1342)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1208)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1167)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:868)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:642)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:556)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:696)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:740)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:88)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:543)
 at java.lang.Thread.run(Thread.java:695)
 Caused by: java.net.UnknownHostException: mislam-mn.linkedin.biz
 ... 15 more
 2014-03-29 12:28:57,259 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2014-03-29 12:28:57,297 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@0.0.0.0:8088
 2014-03-29 12:28:57,401 INFO org.apache.hadoop.ipc.Server: Stopping server on 
 8032
 2014-03-29 12:28:57,473 INFO org.apache.hadoop.ipc.Server: Stopping server on 
 8033
 .
 {noformat}
 Proposal:
 ---
 I believe the root cause is that I moved my machine from one network to 
 another while using the same RM service.
 My point is: whatever the cause, the RM is a long-running core service and it 
 should not exit this way. An appropriate error message should be sufficient.
 If there is a consensus (or no disagreement), I can work on a patch.
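 A hedged sketch of that direction (not the actual fix; the Event and 
 EventHandler types below are illustrative stand-ins for the YARN dispatcher 
 classes in the stack trace above):
 {code}
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;

 // Illustrative sketch: contain a per-event failure inside the scheduler's
 // event-dispatch loop so a bad hostname fails one allocation with an error
 // log instead of taking down the whole ResourceManager.
 final class TolerantDispatcherSketch {
   private static final Log LOG =
       LogFactory.getLog(TolerantDispatcherSketch.class);

   interface Event { String type(); }
   interface EventHandler { void handle(Event event); }

   static void dispatch(EventHandler scheduler, Event event) {
     try {
       scheduler.handle(event);  // e.g. NODE_UPDATE triggering token creation
     } catch (IllegalArgumentException e) {
       // UnknownHostException surfaces wrapped in IllegalArgumentException
       // from SecurityUtil.buildTokenService; log and keep the RM alive.
       LOG.error("Error handling event type " + event.type()
           + " in the scheduler; dropping event", e);
     }
   }
 }
 {code}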
   



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1017) Document RM Restart feature

2014-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954651#comment-13954651
 ] 

Hudson commented on YARN-1017:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #524 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/524/])
YARN-1017. Added documentation for ResourceManager Restart. (jianhe) (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582913)
* /hadoop/common/trunk/hadoop-project/src/site/site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRestart.apt.vm


 Document RM Restart feature
 ---

 Key: YARN-1017
 URL: https://issues.apache.org/jira/browse/YARN-1017
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Priority: Blocker
 Fix For: 2.4.0

 Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, 
 rm-restart-doc-3.patch


 This should give users a general idea about how RM Restart works and how to 
 use RM Restart



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups

2014-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954655#comment-13954655
 ] 

Hudson commented on YARN-1883:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #524 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/524/])
YARN-1883. TestRMAdminService fails due to inconsistent entries in UserGroups 
(Mit Desai via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582862)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java


 TestRMAdminService fails due to inconsistent entries in UserGroups
 --

 Key: YARN-1883
 URL: https://issues.apache.org/jira/browse/YARN-1883
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: java7
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1883.patch, YARN-1883.patch


 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails 
 with the following error:
 {noformat}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104)
 {noformat}
 Line numbers will be inconsistent because I was testing running it in a 
 particular order, but the line on which the failure occurs is:
 {code}
 Assert.assertTrue(groupBefore.contains("test_group_A")
     && groupBefore.contains("test_group_B")
     && groupBefore.contains("test_group_C") && groupBefore.size() == 3);
 {code}
 testRMInitialsWithFileSystemBasedConfigurationProvider() and
 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider()
 both call {{MockUnixGroupsMapping.updateGroups()}}, which changes the list 
 of userGroups.
 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() 
 tries to verify the groups before changing them, and it fails if 
 testRMInitialsWithFileSystemBasedConfigurationProvider() has already run and 
 made the changes.
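 For illustration, a sketch of the shared-state problem and the reset hook 
 that would remove the order dependency (all names below are illustrative 
 stand-ins for the test's MockUnixGroupsMapping; the actual patch may differ):
 {code}
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;

 // Illustrative sketch: the group list is shared mutable state, so each test
 // must start from the same three groups regardless of what ran before.
 final class MockGroupsSketch {
   private static final List<String> INITIAL =
       Arrays.asList("test_group_A", "test_group_B", "test_group_C");
   private static List<String> groups = new ArrayList<String>(INITIAL);

   // What both tests call today: mutates the shared list.
   static void updateGroups() {
     List<String> updated = new ArrayList<String>();
     for (String g : groups) {
       updated.add(g + "_updated");  // hypothetical new group names
     }
     groups = updated;
   }

   // The missing piece: restore the initial groups before each test
   // (e.g. from an @Before method) so assertions on groupBefore hold.
   static void resetGroups() {
     groups = new ArrayList<String>(INITIAL);
   }

   static List<String> getGroups() {
     return groups;
   }
 }
 {code}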



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring

2014-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954656#comment-13954656
 ] 

Hudson commented on YARN-1891:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #524 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/524/])
YARN-1891. Added documentation for NodeManager health-monitoring. Contributed 
by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582891)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManager.apt.vm


 Document NodeManager health-monitoring
 --

 Key: YARN-1891
 URL: https://issues.apache.org/jira/browse/YARN-1891
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Minor
 Fix For: 2.4.0

 Attachments: apache-yarn-1891.0.patch


 Start documenting node manager starting with the health monitoring.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1017) Document RM Restart feature

2014-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954684#comment-13954684
 ] 

Hudson commented on YARN-1017:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1716 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1716/])
YARN-1017. Added documentation for ResourceManager Restart. (jianhe) (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582913)
* /hadoop/common/trunk/hadoop-project/src/site/site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRestart.apt.vm


 Document RM Restart feature
 ---

 Key: YARN-1017
 URL: https://issues.apache.org/jira/browse/YARN-1017
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Priority: Blocker
 Fix For: 2.4.0

 Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, 
 rm-restart-doc-3.patch


 This should give users a general idea about how RM Restart works and how to 
 use RM Restart



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups

2014-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954688#comment-13954688
 ] 

Hudson commented on YARN-1883:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1716 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1716/])
YARN-1883. TestRMAdminService fails due to inconsistent entries in UserGroups 
(Mit Desai via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582862)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java


 TestRMAdminService fails due to inconsistent entries in UserGroups
 --

 Key: YARN-1883
 URL: https://issues.apache.org/jira/browse/YARN-1883
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: java7
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1883.patch, YARN-1883.patch


 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails 
 with the following error:
 {noformat}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:92)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104)
 {noformat}
 Line numbers will be inconsistent because I was testing running it in a 
 particular order, but the line on which the failure occurs is:
 {code}
 Assert.assertTrue(groupBefore.contains("test_group_A")
     && groupBefore.contains("test_group_B")
     && groupBefore.contains("test_group_C") && groupBefore.size() == 3);
 {code}
 testRMInitialsWithFileSystemBasedConfigurationProvider() and
 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider()
 both call {{MockUnixGroupsMapping.updateGroups()}}, which changes the list 
 of userGroups.
 testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() 
 tries to verify the groups before changing them, and it fails if 
 testRMInitialsWithFileSystemBasedConfigurationProvider() has already run and 
 made the changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring

2014-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954689#comment-13954689
 ] 

Hudson commented on YARN-1891:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1716 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1716/])
YARN-1891. Added documentation for NodeManager health-monitoring. Contributed 
by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582891)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManager.apt.vm


 Document NodeManager health-monitoring
 --

 Key: YARN-1891
 URL: https://issues.apache.org/jira/browse/YARN-1891
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Minor
 Fix For: 2.4.0

 Attachments: apache-yarn-1891.0.patch


 Start documenting node manager starting with the health monitoring.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1893) Make ApplicationMasterProtocol#allocate AtMostOnce

2014-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954841#comment-13954841
 ] 

Hudson commented on YARN-1893:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5433 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5433/])
YARN-1893. Mark AtMostOnce annotation to ApplicationMasterProtocol#allocate. 
Contributed by Xuan Gong. (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1583203)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationMasterProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Make ApplicationMasterProtocol#allocate AtMostOnce
 --

 Key: YARN-1893
 URL: https://issues.apache.org/jira/browse/YARN-1893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1893.1.patch, YARN-1893.1.patch, YARN-1893.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-30 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA reassigned YARN-1879:


Assignee: Tsuyoshi OZAWA  (was: Xuan Gong)

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954897#comment-13954897
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

I apologize for the delay; I was busy last weekend. I'd like to tackle this 
problem, so I'm taking it back. Please let me know if that's a problem.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-1879.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954920#comment-13954920
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

{code}
If the application has already registered/unregistered, duplicate 
register/unregister attempts can cause exceptions. 
{code}

This is correct, and it is a subtle problem. If the RPC client (the AM) thinks 
the failure is fatal, the AM stops. Currently 
AMRMClientImpl#registerApplicationMaster() can throw exceptions and the 
ApplicationMaster doesn't handle them, so this causes AM failure. To preserve 
these semantics, we should mark 
ApplicationMasterProtocol#registerApplicationMaster() as AtMostOnce.

As for AMRMClientImpl#unregisterApplicationMaster(), the AM just logs the RPC 
failure and continues to stop. Therefore, we can mark the protocol-level 
unregister call, ApplicationMasterProtocol#finishApplicationMaster(), as 
Idempotent.
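
A minimal sketch of the proposed markings, shown on a stand-in interface (the 
real ApplicationMasterProtocol lives in org.apache.hadoop.yarn.api; the 
annotations are org.apache.hadoop.io.retry.AtMostOnce and Idempotent):
{code}
import java.io.IOException;

import org.apache.hadoop.io.retry.AtMostOnce;
import org.apache.hadoop.io.retry.Idempotent;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterResponse;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Stand-in for ApplicationMasterProtocol showing the proposed markings.
interface ApplicationMasterProtocolSketch {

  // Replaying a register is unsafe: a duplicate attempt throws at the RM.
  @AtMostOnce
  RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) throws YarnException, IOException;

  // Repeating the unregister has no harmful side effect, so retry is safe.
  @Idempotent
  FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request) throws YarnException, IOException;
}
{code}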

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954924#comment-13954924
 ] 

Jian He commented on YARN-1879:
---

bq. As for AMRMClientImpl#unregisterApplicationMaster(), the AM just logs the 
RPC failure and continues to stop.
I suppose this is what MR-AM does today? We cannot assume each AM does the same.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-30 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954925#comment-13954925
 ] 

Hong Zhiguo commented on YARN-1889:
---

Hi Fengdong,
I haven't submitted the new patch yet; I'll do it now. Sorry, I didn't have 
enough time last weekend.

 avoid creating new objects on each fair scheduler call to AppSchedulable 
 comparator
 ---

 Key: YARN-1889
 URL: https://issues.apache.org/jira/browse/YARN-1889
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Hong Zhiguo
Priority: Minor
 Attachments: YARN-1889.patch


 In the fair scheduler, each scheduling attempt performs a full sort on the
 List of AppSchedulable, which invokes the Comparator.compare
 method many times. Both FairShareComparator and DRFComparator call
 AppSchedulable.getWeights and AppSchedulable.getPriority.
 A new ResourceWeights object is allocated on each call of getWeights,
 and the same holds for getPriority. This puts a lot of pressure on the
 GC because these methods are called very frequently.
 The test case below shows the improvement in performance and GC behaviour;
 the results show that this patch roughly halves the GC pressure during
 NodeUpdate processing.
 The code to show the improvement (add it to TestFairScheduler.java):
 import java.lang.management.GarbageCollectorMXBean;
 import java.lang.management.ManagementFactory;
   public void printGCStats() {
     long totalGarbageCollections = 0;
     long garbageCollectionTime = 0;
     for (GarbageCollectorMXBean gc :
         ManagementFactory.getGarbageCollectorMXBeans()) {
       long count = gc.getCollectionCount();
       if (count >= 0) {
         totalGarbageCollections += count;
       }
       long time = gc.getCollectionTime();
       if (time >= 0) {
         garbageCollectionTime += time;
       }
     }
     System.out.println("Total Garbage Collections: "
         + totalGarbageCollections);
     System.out.println("Total Garbage Collection Time (ms): "
         + garbageCollectionTime);
   }
   @Test
   public void testImpactOnGC() throws Exception {
     scheduler.reinitialize(conf, resourceManager.getRMContext());
     // Add nodes
     int numNode = 1;
     for (int i = 0; i < numNode; ++i) {
       String host = String.format("192.1.%d.%d", i / 256, i % 256);
       RMNode node =
           MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i,
               host);
       NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
       scheduler.handle(nodeEvent);
       assertEquals(1024 * 64 * (i + 1),
           scheduler.getClusterCapacity().getMemory());
     }
     assertEquals(numNode, scheduler.getNumClusterNodes());
     assertEquals(1024 * 64 * numNode,
         scheduler.getClusterCapacity().getMemory());
     // add apps, each app has 100 containers.
     int minReqSize =
         FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
     int numApp = 8000;
     int priority = 1;
     for (int i = 1; i < numApp + 1; ++i) {
       ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
       AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
           attemptId.getApplicationId(), "queue1", "user1");
       scheduler.handle(appAddedEvent);
       AppAttemptAddedSchedulerEvent attemptAddedEvent =
           new AppAttemptAddedSchedulerEvent(attemptId, false);
       scheduler.handle(attemptAddedEvent);
       createSchedulingRequestExistingApplication(minReqSize * 2, 1,
           priority, attemptId);
     }
     scheduler.update();
     assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1",
         true).getRunnableAppSchedulables().size());
     System.out.println("GC stats before NodeUpdate processing:");
     printGCStats();
     int hb_num = 5000;
     long start = System.nanoTime();
     for (int i = 0; i < hb_num; ++i) {
       String host = String.format("192.1.%d.%d", i / 256, i % 256);
       RMNode node =
           MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000,
               host);
       NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
       scheduler.handle(nodeEvent);
     }
     long end = System.nanoTime();
     System.out.printf("processing time for a NodeUpdate in average: %d us\n",
         (end - start) / (hb_num * 1000));
     System.out.println("GC stats after NodeUpdate processing:");
     printGCStats();
   }
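 For illustration, a minimal sketch of the optimization itself (the class and
 field names are illustrative, not the actual patch): keep one mutable weights
 object per AppSchedulable and overwrite it in place, instead of allocating a
 new one on every comparator call.
 {code}
 // Illustrative stand-ins for AppSchedulable and ResourceWeights.
 class AppSchedulableSketch {
   /** Mutable weights holder, reused across millions of comparator calls. */
   static final class Weights {
     float weight;
   }

   private final Weights cachedWeights = new Weights();

   Weights getWeights(float appWeight) {
     cachedWeights.weight = appWeight;  // overwrite in place, no allocation
     return cachedWeights;
   }
 }
 {code}
 The same idea applies to getPriority: reuse a single instance per app rather
 than constructing a new object on every compare.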



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-30 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-1889:
--

Attachment: YARN-1889.patch

 avoid creating new objects on each fair scheduler call to AppSchedulable 
 comparator
 ---

 Key: YARN-1889
 URL: https://issues.apache.org/jira/browse/YARN-1889
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Hong Zhiguo
Priority: Minor
 Attachments: YARN-1889.patch, YARN-1889.patch


 In the fair scheduler, each scheduling attempt performs a full sort on the
 List of AppSchedulable, which invokes the Comparator.compare
 method many times. Both FairShareComparator and DRFComparator call
 AppSchedulable.getWeights and AppSchedulable.getPriority.
 A new ResourceWeights object is allocated on each call of getWeights,
 and the same holds for getPriority. This puts a lot of pressure on the
 GC because these methods are called very frequently.
 The test case below shows the improvement in performance and GC behaviour;
 the results show that this patch roughly halves the GC pressure during
 NodeUpdate processing.
 The code to show the improvement (add it to TestFairScheduler.java):
 import java.lang.management.GarbageCollectorMXBean;
 import java.lang.management.ManagementFactory;
   public void printGCStats() {
     long totalGarbageCollections = 0;
     long garbageCollectionTime = 0;
     for (GarbageCollectorMXBean gc :
         ManagementFactory.getGarbageCollectorMXBeans()) {
       long count = gc.getCollectionCount();
       if (count >= 0) {
         totalGarbageCollections += count;
       }
       long time = gc.getCollectionTime();
       if (time >= 0) {
         garbageCollectionTime += time;
       }
     }
     System.out.println("Total Garbage Collections: "
         + totalGarbageCollections);
     System.out.println("Total Garbage Collection Time (ms): "
         + garbageCollectionTime);
   }
   @Test
   public void testImpactOnGC() throws Exception {
     scheduler.reinitialize(conf, resourceManager.getRMContext());
     // Add nodes
     int numNode = 1;
     for (int i = 0; i < numNode; ++i) {
       String host = String.format("192.1.%d.%d", i / 256, i % 256);
       RMNode node =
           MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i,
               host);
       NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
       scheduler.handle(nodeEvent);
       assertEquals(1024 * 64 * (i + 1),
           scheduler.getClusterCapacity().getMemory());
     }
     assertEquals(numNode, scheduler.getNumClusterNodes());
     assertEquals(1024 * 64 * numNode,
         scheduler.getClusterCapacity().getMemory());
     // add apps, each app has 100 containers.
     int minReqSize =
         FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
     int numApp = 8000;
     int priority = 1;
     for (int i = 1; i < numApp + 1; ++i) {
       ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
       AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
           attemptId.getApplicationId(), "queue1", "user1");
       scheduler.handle(appAddedEvent);
       AppAttemptAddedSchedulerEvent attemptAddedEvent =
           new AppAttemptAddedSchedulerEvent(attemptId, false);
       scheduler.handle(attemptAddedEvent);
       createSchedulingRequestExistingApplication(minReqSize * 2, 1,
           priority, attemptId);
     }
     scheduler.update();
     assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1",
         true).getRunnableAppSchedulables().size());
     System.out.println("GC stats before NodeUpdate processing:");
     printGCStats();
     int hb_num = 5000;
     long start = System.nanoTime();
     for (int i = 0; i < hb_num; ++i) {
       String host = String.format("192.1.%d.%d", i / 256, i % 256);
       RMNode node =
           MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000,
               host);
       NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
       scheduler.handle(nodeEvent);
     }
     long end = System.nanoTime();
     System.out.printf("processing time for a NodeUpdate in average: %d us\n",
         (end - start) / (hb_num * 1000));
     System.out.println("GC stats after NodeUpdate processing:");
     printGCStats();
   }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954931#comment-13954931
 ] 

Hadoop QA commented on YARN-1889:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637758/YARN-1889.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3491//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3491//console

This message is automatically generated.

 avoid creating new objects on each fair scheduler call to AppSchedulable 
 comparator
 ---

 Key: YARN-1889
 URL: https://issues.apache.org/jira/browse/YARN-1889
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Hong Zhiguo
Priority: Minor
 Attachments: YARN-1889.patch, YARN-1889.patch


 In the fair scheduler, each scheduling attempt performs a full sort on the
 List of AppSchedulable, which invokes the Comparator.compare
 method many times. Both FairShareComparator and DRFComparator call
 AppSchedulable.getWeights and AppSchedulable.getPriority.
 A new ResourceWeights object is allocated on each call of getWeights,
 and the same holds for getPriority. This puts a lot of pressure on the
 GC because these methods are called very frequently.
 The test case below shows the improvement in performance and GC behaviour;
 the results show that this patch roughly halves the GC pressure during
 NodeUpdate processing.
 The code to show the improvement (add it to TestFairScheduler.java):
 import java.lang.management.GarbageCollectorMXBean;
 import java.lang.management.ManagementFactory;
   public void printGCStats() {
     long totalGarbageCollections = 0;
     long garbageCollectionTime = 0;
     for (GarbageCollectorMXBean gc :
         ManagementFactory.getGarbageCollectorMXBeans()) {
       long count = gc.getCollectionCount();
       if (count >= 0) {
         totalGarbageCollections += count;
       }
       long time = gc.getCollectionTime();
       if (time >= 0) {
         garbageCollectionTime += time;
       }
     }
     System.out.println("Total Garbage Collections: "
         + totalGarbageCollections);
     System.out.println("Total Garbage Collection Time (ms): "
         + garbageCollectionTime);
   }
   @Test
   public void testImpactOnGC() throws Exception {
     scheduler.reinitialize(conf, resourceManager.getRMContext());
     // Add nodes
     int numNode = 1;
     for (int i = 0; i < numNode; ++i) {
       String host = String.format("192.1.%d.%d", i / 256, i % 256);
       RMNode node =
           MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i,
               host);
       NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
       scheduler.handle(nodeEvent);
       assertEquals(1024 * 64 * (i + 1),
           scheduler.getClusterCapacity().getMemory());
     }
     assertEquals(numNode, scheduler.getNumClusterNodes());
     assertEquals(1024 * 64 * numNode,
         scheduler.getClusterCapacity().getMemory());
     // add apps, each app has 100 containers.
     int minReqSize =
         FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
     int numApp = 8000;
     int priority = 1;
     for (int i = 1; i < numApp + 1; ++i) {
       ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
       AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
           attemptId.getApplicationId(), "queue1", "user1");
       scheduler.handle(appAddedEvent);
       AppAttemptAddedSchedulerEvent 

[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954943#comment-13954943
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

{quote}
I suppose this is what MR-AM does today? We cannot assume each AM does the same.
{quote}

Jian, thank you for the point. I was referring to DistributedShell's 
application master in that sentence. I checked that MRAppMaster stops because 
it doesn't retry when an exception occurs on the server side. Therefore, as 
you mentioned, we cannot assume every AM behaves the same. We should make 
these calls AtMostOnce with a RetryCache-like mechanism. I'll create a patch 
based on this discussion.
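
As a toy sketch of such a mechanism (all names here are hypothetical; HDFS's 
org.apache.hadoop.ipc.RetryCache is the real model), the idea is to remember 
the response of a completed non-idempotent call keyed by the RPC retry 
identity, so a retried register returns the original response instead of 
failing as a duplicate:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache of completed non-idempotent calls, keyed by the RPC
// retry identity (e.g. clientId + callId).
final class ToyRetryCache<K, V> {
  private final Map<K, V> completed = new ConcurrentHashMap<K, V>();

  /** Returns the cached response for a retried call, or null on first attempt. */
  V lookup(K callIdentity) {
    return completed.get(callIdentity);
  }

  /** Records the response once the call has actually executed. */
  void record(K callIdentity, V response) {
    completed.put(callIdentity, response);
  }
}
{code}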

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-30 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated YARN-1889:
--

Labels: reviewed  (was: )

 avoid creating new objects on each fair scheduler call to AppSchedulable 
 comparator
 ---

 Key: YARN-1889
 URL: https://issues.apache.org/jira/browse/YARN-1889
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Hong Zhiguo
Priority: Minor
  Labels: reviewed
 Attachments: YARN-1889.patch, YARN-1889.patch


 In the fair scheduler, each scheduling attempt performs a full sort on the
 List of AppSchedulable, which invokes the Comparator.compare
 method many times. Both FairShareComparator and DRFComparator call
 AppSchedulable.getWeights and AppSchedulable.getPriority.
 A new ResourceWeights object is allocated on each call of getWeights,
 and the same holds for getPriority. This puts a lot of pressure on the
 GC because these methods are called very frequently.
 The test case below shows the improvement in performance and GC behaviour;
 the results show that this patch roughly halves the GC pressure during
 NodeUpdate processing.
 The code to show the improvement (add it to TestFairScheduler.java):
 import java.lang.management.GarbageCollectorMXBean;
 import java.lang.management.ManagementFactory;
   public void printGCStats() {
     long totalGarbageCollections = 0;
     long garbageCollectionTime = 0;
     for (GarbageCollectorMXBean gc :
         ManagementFactory.getGarbageCollectorMXBeans()) {
       long count = gc.getCollectionCount();
       if (count >= 0) {
         totalGarbageCollections += count;
       }
       long time = gc.getCollectionTime();
       if (time >= 0) {
         garbageCollectionTime += time;
       }
     }
     System.out.println("Total Garbage Collections: "
         + totalGarbageCollections);
     System.out.println("Total Garbage Collection Time (ms): "
         + garbageCollectionTime);
   }
   @Test
   public void testImpactOnGC() throws Exception {
     scheduler.reinitialize(conf, resourceManager.getRMContext());
     // Add nodes
     int numNode = 1;
     for (int i = 0; i < numNode; ++i) {
       String host = String.format("192.1.%d.%d", i / 256, i % 256);
       RMNode node =
           MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i,
               host);
       NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
       scheduler.handle(nodeEvent);
       assertEquals(1024 * 64 * (i + 1),
           scheduler.getClusterCapacity().getMemory());
     }
     assertEquals(numNode, scheduler.getNumClusterNodes());
     assertEquals(1024 * 64 * numNode,
         scheduler.getClusterCapacity().getMemory());
     // add apps, each app has 100 containers.
     int minReqSize =
         FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
     int numApp = 8000;
     int priority = 1;
     for (int i = 1; i < numApp + 1; ++i) {
       ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
       AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
           attemptId.getApplicationId(), "queue1", "user1");
       scheduler.handle(appAddedEvent);
       AppAttemptAddedSchedulerEvent attemptAddedEvent =
           new AppAttemptAddedSchedulerEvent(attemptId, false);
       scheduler.handle(attemptAddedEvent);
       createSchedulingRequestExistingApplication(minReqSize * 2, 1,
           priority, attemptId);
     }
     scheduler.update();
     assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1",
         true).getRunnableAppSchedulables().size());
     System.out.println("GC stats before NodeUpdate processing:");
     printGCStats();
     int hb_num = 5000;
     long start = System.nanoTime();
     for (int i = 0; i < hb_num; ++i) {
       String host = String.format("192.1.%d.%d", i / 256, i % 256);
       RMNode node =
           MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000,
               host);
       NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
       scheduler.handle(nodeEvent);
     }
     long end = System.nanoTime();
     System.out.printf("processing time for a NodeUpdate in average: %d us\n",
         (end - start) / (hb_num * 1000));
     System.out.println("GC stats after NodeUpdate processing:");
     printGCStats();
   }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-30 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954945#comment-13954945
 ] 

Fengdong Yu commented on YARN-1889:
---

The new patch looks good to me.

 avoid creating new objects on each fair scheduler call to AppSchedulable 
 comparator
 ---

 Key: YARN-1889
 URL: https://issues.apache.org/jira/browse/YARN-1889
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Hong Zhiguo
Priority: Minor
  Labels: reviewed
 Attachments: YARN-1889.patch, YARN-1889.patch


 In the fair scheduler, each scheduling attempt performs a full sort on the
 List of AppSchedulable, which invokes the Comparator.compare
 method many times. Both FairShareComparator and DRFComparator call
 AppSchedulable.getWeights and AppSchedulable.getPriority.
 A new ResourceWeights object is allocated on each call of getWeights,
 and the same holds for getPriority. This puts a lot of pressure on the
 GC because these methods are called very frequently.
 The test case below shows the improvement in performance and GC behaviour;
 the results show that this patch roughly halves the GC pressure during
 NodeUpdate processing.
 The code to show the improvement (add it to TestFairScheduler.java):
 import java.lang.management.GarbageCollectorMXBean;
 import java.lang.management.ManagementFactory;
   public void printGCStats() {
     long totalGarbageCollections = 0;
     long garbageCollectionTime = 0;
     for (GarbageCollectorMXBean gc :
         ManagementFactory.getGarbageCollectorMXBeans()) {
       long count = gc.getCollectionCount();
       if (count >= 0) {
         totalGarbageCollections += count;
       }
       long time = gc.getCollectionTime();
       if (time >= 0) {
         garbageCollectionTime += time;
       }
     }
     System.out.println("Total Garbage Collections: "
         + totalGarbageCollections);
     System.out.println("Total Garbage Collection Time (ms): "
         + garbageCollectionTime);
   }
   @Test
   public void testImpactOnGC() throws Exception {
     scheduler.reinitialize(conf, resourceManager.getRMContext());
     // Add nodes
     int numNode = 1;
     for (int i = 0; i < numNode; ++i) {
       String host = String.format("192.1.%d.%d", i / 256, i % 256);
       RMNode node =
           MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i,
               host);
       NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
       scheduler.handle(nodeEvent);
       assertEquals(1024 * 64 * (i + 1),
           scheduler.getClusterCapacity().getMemory());
     }
     assertEquals(numNode, scheduler.getNumClusterNodes());
     assertEquals(1024 * 64 * numNode,
         scheduler.getClusterCapacity().getMemory());
     // add apps, each app has 100 containers.
     int minReqSize =
         FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
     int numApp = 8000;
     int priority = 1;
     for (int i = 1; i < numApp + 1; ++i) {
       ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
       AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
           attemptId.getApplicationId(), "queue1", "user1");
       scheduler.handle(appAddedEvent);
       AppAttemptAddedSchedulerEvent attemptAddedEvent =
           new AppAttemptAddedSchedulerEvent(attemptId, false);
       scheduler.handle(attemptAddedEvent);
       createSchedulingRequestExistingApplication(minReqSize * 2, 1,
           priority, attemptId);
     }
     scheduler.update();
     assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1",
         true).getRunnableAppSchedulables().size());
     System.out.println("GC stats before NodeUpdate processing:");
     printGCStats();
     int hb_num = 5000;
     long start = System.nanoTime();
     for (int i = 0; i < hb_num; ++i) {
       String host = String.format("192.1.%d.%d", i / 256, i % 256);
       RMNode node =
           MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000,
               host);
       NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
       scheduler.handle(nodeEvent);
     }
     long end = System.nanoTime();
     System.out.printf("processing time for a NodeUpdate in average: %d us\n",
         (end - start) / (hb_num * 1000));
     System.out.println("GC stats after NodeUpdate processing:");
     printGCStats();
   }



--
This message was sent by Atlassian JIRA
(v6.2#6252)