[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066145#comment-14066145 ] Wangda Tan commented on YARN-415: - Hi [~eepayne], I've spent some time reviewing and thinking about this JIRA. I have a few suggestions:
1. Revert the changes to SchedulerAppReport; we have already changed ApplicationResourceUsageReport, and memory utilization should be part of the resource usage report.
2. Remove getMemory(VCore)Seconds from RMAppAttempt, and modify RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running resource utilization.
3. Move
{code}
._("Resources:", String.format("%d MB-seconds, %d vcore-seconds", app.getMemorySeconds(), app.getVcoreSeconds()))
{code}
from Application Overview to Application Metrics, and rename it to "Resource Seconds". It should be considered part of the application metrics instead of the overview.
4. Change finishedMemory/VCoreSeconds to AtomicLong in RMAppAttemptMetrics so it can be efficiently accessed by multiple threads.
5. I think it's better to add a new method in SchedulerApplicationAttempt, like getMemoryUtilization, which will only return memory/cpu seconds. We do this to avoid locking the scheduling thread when showing application metrics on the web UI. getMemoryUtilization will be used by RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running resource utilization, and by SchedulerApplicationAttempt#getResourceUsageReport as well. The MemoryUtilization class may contain two fields: runningContainerMemory(VCore)Seconds.
6. Since computing running-container resource utilization is not O(1) (we need to scan all containers under an application), I think it's better to cache a previously computed result and recompute it only after several seconds (maybe 1-3 seconds should be enough) have elapsed. You can also modify SchedulerApplicationAttempt#liveContainers to be a ConcurrentHashMap.
With #6, getting memory utilization to show metrics on the web UI will not lock the scheduling thread at all. Please let me know if you have any comments here. Thanks, Wangda
Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.patch
For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it.
(reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n)
It'd be nice to have this at the app level instead of the job level because:
1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server).
2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
This new metric should be available both through the RM UI and the RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
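To make the caching idea in points 5 and 6 above concrete, here is a minimal, self-contained sketch in plain Java. The class and field names (AppAttemptUsageSketch, liveContainers as a map of arrays, the 3-second interval) are illustrative assumptions, not the actual YARN-415 patch; it only shows how finished usage in an AtomicLong plus a periodically recomputed running-container cache lets a web UI read avoid locking the scheduling thread.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class AppAttemptUsageSketch {
  // Finished usage tracked lock-free, as in suggestion #4 above.
  private final AtomicLong finishedMemorySeconds = new AtomicLong();

  // Live containers keyed by container id: [0] = reserved MB, [1] = start time millis.
  private final Map<String, long[]> liveContainers = new ConcurrentHashMap<String, long[]>();

  // Cached running-container usage, recomputed at most every few seconds (suggestion #6).
  private volatile long cachedRunningMemorySeconds;
  private volatile long lastComputeMillis;
  private static final long RECOMPUTE_INTERVAL_MS = 3000;

  void containerStarted(String id, long reservedMb, long startMillis) {
    liveContainers.put(id, new long[] { reservedMb, startMillis });
  }

  void containerFinished(String id) {
    long[] c = liveContainers.remove(id);
    if (c != null) {
      // reserved MB * lifetime in seconds, folded into the completed total
      long lifeSeconds = (System.currentTimeMillis() - c[1]) / 1000;
      finishedMemorySeconds.addAndGet(c[0] * lifeSeconds);
    }
  }

  long getMemorySeconds() {
    long now = System.currentTimeMillis();
    if (now - lastComputeMillis > RECOMPUTE_INTERVAL_MS) {
      // The O(n) scan over live containers runs only once per interval,
      // so frequent UI reads stay cheap and never block scheduling.
      long running = 0;
      for (long[] c : liveContainers.values()) {
        running += c[0] * ((now - c[1]) / 1000);
      }
      cachedRunningMemorySeconds = running;
      lastComputeMillis = now;
    }
    // completed + running, as in suggestion #2 above
    return finishedMemorySeconds.get() + cachedRunningMemorySeconds;
  }
}
{code}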
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066154#comment-14066154 ] Wangda Tan commented on YARN-2305: -- Thanks for your elaboration, I understand now. I think this is an inconsistency between ParentQueue and LeafQueue; using clusterResource instead of allocated+available can definitely solve this problem. When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Sunil G Attachments: Capture.jpg ENV Details: 3 queues: a(50%), b(25%), c(25%), all with max utilization set to 100; a 2-node cluster with total memory of 16GB. Test Steps: Execute the following 3 jobs with different memory configurations for the map, reducer and AM tasks: ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue: when 2GB of memory is in the reserved state, total memory is shown as 15GB and used as 15GB (while total memory is 16GB). -- This message was sent by Atlassian JIRA (v6.2#6252)
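A small self-contained illustration of the arithmetic behind the fix discussed above; the numbers are made up for clarity (they do not reproduce the reporter's exact 15GB reading), and the variable names are not scheduler code.
{code}
public class ReservedMemoryDisplay {
  public static void main(String[] args) {
    long clusterMB   = 16 * 1024;                       // real cluster size: 16 GB
    long allocatedMB = 13 * 1024;                       // handed out to running containers
    long reservedMB  =  2 * 1024;                       // held by a container in reserved state
    long availableMB = clusterMB - allocatedMB - reservedMB;

    // Deriving "total" as allocated + available silently drops the reserved amount...
    long displayedTotalMB = allocatedMB + availableMB;  // 14 GB shown on a 16 GB cluster

    // ...whereas reporting clusterResource directly stays correct regardless of reservations.
    System.out.println("displayed=" + displayedTotalMB + " MB, actual=" + clusterMB + " MB");
  }
}
{code}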
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066156#comment-14066156 ] Wangda Tan commented on YARN-2308: -- I think it should be doable; a missing queue for an application should not make the RM fail to start. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Priority: Critical I encountered an NPE during RM restart:
{code}
2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:744)
{code}
And the RM will then fail to restart. This is caused by a queue configuration change: I removed some queues and added new queues. So when the RM restarts, it tries to recover historical applications, and when any of those applications' queues have been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
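A hedged sketch of the general shape of the guard being discussed; this is not the actual CapacityScheduler code, and the surrounding variables (queues, rmContext, applicationId) are assumed from context rather than taken from the patch.
{code}
// If the application's queue was removed from the configuration before restart,
// reject just this recovered application instead of letting a NullPointerException
// propagate and abort ResourceManager startup.
CSQueue queue = queues.get(application.getQueueName());
if (queue == null) {
  rmContext.getDispatcher().getEventHandler().handle(
      new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED));
  return;
}
{code}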
[jira] [Assigned] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-2301: --- Assignee: Naganarasimha G R Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Naganarasimha G R Labels: usability While running the yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https) before LOG-URL is missing. 2) the start-time is printed as milliseconds (e.g. 1405540544844); better to print it in a time format. 3) finish-time is 0 if the container is not yet finished; maybe print N/A instead. 4) maybe have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As the attempt Id is not shown on the console, this makes it easier for the user to just copy the appId and run it; it may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
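A small runnable illustration of items 2) and 3) above: printing epoch milliseconds as a readable timestamp and showing N/A for an unfinished container. The class and method names are made up for the example, not taken from the YARN CLI code.
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class ContainerReportFormat {
  // Render a start/finish time: 0 (or negative) means "not finished yet".
  static String formatTime(long millis) {
    return millis <= 0 ? "N/A"
        : new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy").format(new Date(millis));
  }

  public static void main(String[] args) {
    System.out.println(formatTime(1405540544844L)); // readable date instead of raw millis
    System.out.println(formatTime(0L));             // N/A for an unfinished container
  }
}
{code}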
[jira] [Updated] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
[ https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated YARN-2319: - Attachment: YARN-2319.0.patch Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java --- Key: YARN-2319 URL: https://issues.apache.org/jira/browse/YARN-2319 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: YARN-2319.0.patch MiniKdc only invokes the start method, never stop, in TestRMWebServicesDelegationTokens.java: {code} testMiniKDC.start(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
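A minimal sketch of the kind of cleanup such a fix would add (using JUnit 4's org.junit.AfterClass). The field name testMiniKDC is taken from the snippet above; the method name is made up, and the actual patch may differ.
{code}
@AfterClass
public static void stopMiniKdc() {
  if (testMiniKDC != null) {
    testMiniKDC.stop();  // release the KDC's port and background threads after the test class
  }
}
{code}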
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033_ALL.1.patch Upload a patch including the two dependent ones for jenkins to verify. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033.1.patch I've made a first patch that includes the whole feature for the timeline-store-based generic history service, plus test cases. In this jira, I don't deprecate the old application history store classes. I'll file another jira for that. Once this jira is done, we should mark those classes deprecated. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
[ https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066186#comment-14066186 ] Zhijie Shen commented on YARN-2319: --- I encountered some test failures today around this test case. Will take a look Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java --- Key: YARN-2319 URL: https://issues.apache.org/jira/browse/YARN-2319 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: YARN-2319.0.patch MiniKdc only invoke start method not stop in TestRMWebServicesDelegationTokens.java {code} testMiniKDC.start(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2320) Deprecate existing application history store after we store the history data to timeline store
Zhijie Shen created YARN-2320: - Summary: Deprecate existing application history store after we store the history data to timeline store Key: YARN-2320 URL: https://issues.apache.org/jira/browse/YARN-2320 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen After YARN-2033, we should deprecate the application history store set. There's no need to maintain two sets of store interfaces. In addition, we should conclude the outstanding jiras under YARN-321 about the application history store. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066195#comment-14066195 ] Zhijie Shen commented on YARN-2301: ---
bq. As attempt Id is not shown on console, this is easier for user to just copy the appId and run it, may also be useful for container-preserving AM restart.
You can run yarn appattempt to get the attempt. Anyway, it's arguable whether it is user friendly or not. Given that we are adding a function, I vote for yarn container -list appId. One more comment: “yarn container” can source the container information either from the RM or from the timeline server. When making the changes, please make sure both sides are changed consistently. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Naganarasimha G R Labels: usability While running the yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https) before LOG-URL is missing. 2) the start-time is printed as milliseconds (e.g. 1405540544844); better to print it in a time format. 3) finish-time is 0 if the container is not yet finished; maybe print N/A instead. 4) maybe have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As the attempt Id is not shown on the console, this makes it easier for the user to just copy the appId and run it; it may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
[ https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066210#comment-14066210 ] Hadoop QA commented on YARN-2319: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656480/YARN-2319.0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4357//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4357//console This message is automatically generated. Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java --- Key: YARN-2319 URL: https://issues.apache.org/jira/browse/YARN-2319 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: YARN-2319.0.patch MiniKdc only invoke start method not stop in TestRMWebServicesDelegationTokens.java {code} testMiniKDC.start(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
[ https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066220#comment-14066220 ] Zhijie Shen commented on YARN-2319: --- I ran through the test cases on trunk again. The failure I encountered before is not related to this. However, it's still good to have the close at the end. The set of test failures seem to be related to other things as well. Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java --- Key: YARN-2319 URL: https://issues.apache.org/jira/browse/YARN-2319 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: YARN-2319.0.patch MiniKdc only invoke start method not stop in TestRMWebServicesDelegationTokens.java {code} testMiniKDC.start(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2304) TestRMWebServices* fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066224#comment-14066224 ] Zhijie Shen commented on YARN-2304: --- It happened several times. Another instance: https://issues.apache.org/jira/browse/YARN-2319?focusedCommentId=14066210page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14066210 TestRMWebServices* fails intermittently --- Key: YARN-2304 URL: https://issues.apache.org/jira/browse/YARN-2304 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: test-failure-log-RMWeb.txt The test fails intermittently because of bind exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066246#comment-14066246 ] Hadoop QA commented on YARN-2033: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656482/YARN-2033_ALL.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 20 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.util.TestFSDownload org.apache.hadoop.yarn.server.resourcemanager.metrics.TestYarnMetricsPublisher {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4358//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4358//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4358//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4358//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4358//console This message is automatically generated. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066251#comment-14066251 ] Hudson commented on YARN-1341: -- FAILURE: Integrated in Hadoop-Yarn-trunk #616 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/616/]) YARN-1341. Recover NMTokens upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611512) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseNMTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.6.0 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch, YARN-1341v7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2321) NodeManager WebUI get wrong configuration of isPmemCheckEnabled()
Leitao Guo created YARN-2321: Summary: NodeManager WebUI get wrong configuration of isPmemCheckEnabled() Key: YARN-2321 URL: https://issues.apache.org/jira/browse/YARN-2321 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Leitao Guo The NodeManager WebUI gets the wrong configuration for whether pmem enforcement is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2270) TestFSDownload#testDownloadPublicWithStatCache fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066272#comment-14066272 ] Varun Vasudev commented on YARN-2270: - [~ajisakaa] your current patch is ok, but maybe we should skip the test if the ancestor permissions aren't right? If the real issue is the ancestor permissions, then the get() will fail for all the files. Maybe something like -
{noformat}
boolean ancestorPermissionsOK = FSDownload.ancestorsHaveExecutePermissions(fs, basedir, null);
assumeTrue(ancestorPermissionsOK);
{noformat}
The benefit of this approach is that the test gets reported as skipped, and people who are interested in ensuring it runs correctly can fix their build environment to ensure the test runs. Your current approach hides the fact that the test didn't really do what it was expected to do (apart from the log message). TestFSDownload#testDownloadPublicWithStatCache fails in trunk - Key: YARN-2270 URL: https://issues.apache.org/jira/browse/YARN-2270 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.1 Reporter: Ted Yu Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-2270.patch From https://builds.apache.org/job/Hadoop-yarn-trunk/608/console : {code} Running org.apache.hadoop.yarn.util.TestFSDownload Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.955 sec FAILURE! - in org.apache.hadoop.yarn.util.TestFSDownload testDownloadPublicWithStatCache(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 0.137 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadPublicWithStatCache(TestFSDownload.java:363) {code} Similar error can be seen here: https://builds.apache.org/job/PreCommit-YARN-Build/4243//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPublicWithStatCache/ Looks like future.get() returned null. -- This message was sent by Atlassian JIRA (v6.2#6252)
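An illustration (with assumed test scaffolding, not the final patch) of how the suggested assumeTrue guard reads in context: when the assumption fails, JUnit reports the test as skipped rather than failed, which is the behaviour argued for above.
{code}
import static org.junit.Assume.assumeTrue;

@Test
public void testDownloadPublicWithStatCache() throws Exception {
  // ancestorsHaveExecutePermissions is the check named in the comment above;
  // fs and basedir are assumed to be the test's existing FileSystem and base path.
  assumeTrue(FSDownload.ancestorsHaveExecutePermissions(fs, basedir, null));
  // ... the rest of the test runs only when the ancestor permissions allow it ...
}
{code}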
[jira] [Updated] (YARN-2321) NodeManager WebUI get wrong configuration of isPmemCheckEnabled()
[ https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2321: - Attachment: YARN-2321.patch NodeManager WebUI get wrong configuration of isPmemCheckEnabled() - Key: YARN-2321 URL: https://issues.apache.org/jira/browse/YARN-2321 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: YARN-2321.patch The NodeManager WebUI gets the wrong configuration for whether pmem enforcement is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
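For reference, a hedged sketch of the lookup the WebUI is expected to perform. The issue doesn't say whether the bug is reading a different key or a stale Configuration object, so this only shows the intended call; conf is an assumed NodeManager Configuration.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// yarn.nodemanager.pmem-check-enabled, defaulting to the standard YARN default
boolean pmemCheckEnabled = conf.getBoolean(
    YarnConfiguration.NM_PMEM_CHECK_ENABLED,
    YarnConfiguration.DEFAULT_NM_PMEM_CHECK_ENABLED);
{code}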
[jira] [Updated] (YARN-2270) TestFSDownload#testDownloadPublicWithStatCache fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2270: Attachment: YARN-2270.2.patch TestFSDownload#testDownloadPublicWithStatCache fails in trunk - Key: YARN-2270 URL: https://issues.apache.org/jira/browse/YARN-2270 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.1 Reporter: Ted Yu Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-2270.2.patch, YARN-2270.patch From https://builds.apache.org/job/Hadoop-yarn-trunk/608/console : {code} Running org.apache.hadoop.yarn.util.TestFSDownload Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.955 sec FAILURE! - in org.apache.hadoop.yarn.util.TestFSDownload testDownloadPublicWithStatCache(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 0.137 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadPublicWithStatCache(TestFSDownload.java:363) {code} Similar error can be seen here: https://builds.apache.org/job/PreCommit-YARN-Build/4243//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPublicWithStatCache/ Looks like future.get() returned null. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2321) NodeManager WebUI get wrong configuration of isPmemCheckEnabled()
[ https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066291#comment-14066291 ] Hadoop QA commented on YARN-2321: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656497/YARN-2321.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4359//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4359//console This message is automatically generated. NodeManager WebUI get wrong configuration of isPmemCheckEnabled() - Key: YARN-2321 URL: https://issues.apache.org/jira/browse/YARN-2321 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: YARN-2321.patch WebUI of NodeManager get the wrong configuration of Pmem enforcement enable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2270) TestFSDownload#testDownloadPublicWithStatCache fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066290#comment-14066290 ] Akira AJISAKA commented on YARN-2270: - Thanks [~vvasudev] for the review! Update the patch to skip test if the basedir doesn't have the ancestor permissions. TestFSDownload#testDownloadPublicWithStatCache fails in trunk - Key: YARN-2270 URL: https://issues.apache.org/jira/browse/YARN-2270 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.1 Reporter: Ted Yu Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-2270.2.patch, YARN-2270.patch From https://builds.apache.org/job/Hadoop-yarn-trunk/608/console : {code} Running org.apache.hadoop.yarn.util.TestFSDownload Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.955 sec FAILURE! - in org.apache.hadoop.yarn.util.TestFSDownload testDownloadPublicWithStatCache(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 0.137 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadPublicWithStatCache(TestFSDownload.java:363) {code} Similar error can be seen here: https://builds.apache.org/job/PreCommit-YARN-Build/4243//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPublicWithStatCache/ Looks like future.get() returned null. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2270) TestFSDownload#testDownloadPublicWithStatCache fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066295#comment-14066295 ] Hadoop QA commented on YARN-2270: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656501/YARN-2270.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4360//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4360//console This message is automatically generated. TestFSDownload#testDownloadPublicWithStatCache fails in trunk - Key: YARN-2270 URL: https://issues.apache.org/jira/browse/YARN-2270 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.1 Reporter: Ted Yu Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-2270.2.patch, YARN-2270.patch From https://builds.apache.org/job/Hadoop-yarn-trunk/608/console : {code} Running org.apache.hadoop.yarn.util.TestFSDownload Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.955 sec FAILURE! - in org.apache.hadoop.yarn.util.TestFSDownload testDownloadPublicWithStatCache(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 0.137 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadPublicWithStatCache(TestFSDownload.java:363) {code} Similar error can be seen here: https://builds.apache.org/job/PreCommit-YARN-Build/4243//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPublicWithStatCache/ Looks like future.get() returned null. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2270) TestFSDownload#testDownloadPublicWithStatCache fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066297#comment-14066297 ] Varun Vasudev commented on YARN-2270: - +1, looks good to me. TestFSDownload#testDownloadPublicWithStatCache fails in trunk - Key: YARN-2270 URL: https://issues.apache.org/jira/browse/YARN-2270 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.1 Reporter: Ted Yu Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-2270.2.patch, YARN-2270.patch From https://builds.apache.org/job/Hadoop-yarn-trunk/608/console : {code} Running org.apache.hadoop.yarn.util.TestFSDownload Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.955 sec FAILURE! - in org.apache.hadoop.yarn.util.TestFSDownload testDownloadPublicWithStatCache(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 0.137 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadPublicWithStatCache(TestFSDownload.java:363) {code} Similar error can be seen here: https://builds.apache.org/job/PreCommit-YARN-Build/4243//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPublicWithStatCache/ Looks like future.get() returned null. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066313#comment-14066313 ] Naganarasimha G R commented on YARN-2301: - Thanks [~zjshen] for the comments. I feel it would be easier to hit a single command, and I would like to add yarn container -list appId. I will consider the changes for container information obtained from the Timeline/History server as well. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Naganarasimha G R Labels: usability While running the yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https) before LOG-URL is missing. 2) the start-time is printed as milliseconds (e.g. 1405540544844); better to print it in a time format. 3) finish-time is 0 if the container is not yet finished; maybe print N/A instead. 4) maybe have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As the attempt Id is not shown on the console, this makes it easier for the user to just copy the appId and run it; it may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066342#comment-14066342 ] Hudson commented on YARN-1341: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1835 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1835/]) YARN-1341. Recover NMTokens upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611512) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseNMTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.6.0 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch, YARN-1341v7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066365#comment-14066365 ] Hudson commented on YARN-1341: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1808 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1808/]) YARN-1341. Recover NMTokens upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611512) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseNMTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.6.0 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch, YARN-1341v7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066376#comment-14066376 ] Jason Lowe commented on YARN-2314: -- While there is cache mismanagement going on as described above, a bigger issue is how this cache interacts with the ClientCache in the RPC layer and how Connection instances behave. Despite this cache's intent to try to limit the number of connected NMs, calling stopProxy does *not* mean the connection and corresponding IPC client thread is removed. Closing a proxy will only shut down threads if there are *no* other instances of that protocol proxy currently open. See ClientCache.stopClient for details. Given that the whole point of the ContainerManagementProtocolProxy cache is to preserve at least one reference to the Client, the IPC Client stop method will never be called in practice, and IPC client threads will never be explicitly torn down as a result of calling stopProxy.
As for Connection instances within the IPC Client, outside of erroneous operation they will only shut down if either they reach their idle timeout or are explicitly told to stop via Client.stop, and the latter will never be called in practice per above. That means the number of IPC client threads lingering around is solely dictated by how fast we're connecting to new nodes and how long the IPC idle timeout is. By default this timeout is 10 seconds, and an AM running a wide-spread large job on a large, idle cluster can easily allocate containers for and connect to all of the nodes in less than 10 seconds. That means we can still have thousands of IPC client threads despite ContainerManagementProtocolProxy's efforts to limit the number of connections.
In simplest terms this is a regression of MAPREDUCE-. That patch explicitly tuned the IPC timeout of ContainerManagement proxies to zero so they would be torn down as soon as we finished the first call. I've verified that setting the IPC timeout to zero prevents the explosion of IPC client threads. That's sort of a ham-fisted fix since it brings the whole point of the NM proxy cache into question. We would be keeping the proxy objects around, but the connection to the NM would need to be re-established each time we reused it. Not sure the cache would be worth much at that point.
If we want to explicitly manage the number of outstanding NM connections without forcing the connections to shut down on each IPC call, then I think we need help from the IPC layer itself. As I mentioned above, I don't think there's an exposed mechanism to close an individual connection of an IPC Client.
So to sum up, we can fix the cache management bugs described in the first comment, but that alone will not prevent thousands of IPC client threads from co-existing. We either need to set the IPC timeout to 0 (which brings the utility of the NM proxy cache into question) or change the IPC layer to allow us to close individual Client connections. ContainerManagementProtocolProxy can create thousands of threads for a large cluster Key: YARN-2314 URL: https://issues.apache.org/jira/browse/YARN-2314 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical ContainerManagementProtocolProxy has a cache of NM proxies, and the size of this cache is configurable. However the cache can grow far beyond the configured size when running on a large cluster and blow AM address/container limits. More details in the first comment.
-- This message was sent by Atlassian JIRA (v6.2#6252)
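A sketch of the "set the IPC timeout to zero" mitigation described in the comment above, expressed as a client-side configuration tweak. The variable names are assumptions, and this only illustrates the trade-off discussed (prompt teardown of idle IPC threads at the cost of the NM-proxy cache's connection reuse), not a committed fix.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeysPublic;

// Copy the AM's configuration so other RPC users keep the default idle timeout.
Configuration cmProxyConf = new Configuration(conf);

// ipc.client.connection.maxidletime = 0: connections to NMs (and their reader
// threads) are dropped as soon as a call finishes, instead of lingering ~10s.
cmProxyConf.setInt(
    CommonConfigurationKeysPublic.IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY, 0);
{code}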
[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
[ https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066462#comment-14066462 ] Tsuyoshi OZAWA commented on YARN-2319: -- IIUC, the test failure is caused by JerseyTest. JerseyTest's constructor -> getContainer() -> getBaseURI() always returns the result of {{UriBuilder.fromUri("http://localhost/").port(getPort(9998)).build()}}. If other test jobs are running at the same time, some of them fail to bind the port and the tests fail as a result.
{code}
public JerseyTest(AppDescriptor ad) throws TestContainerException {
    this.tc = getContainer(ad, getTestContainerFactory());
    this.client = getClient(tc, ad);
}

/**
 * Returns the base URI of the application.
 * @return The base URI of the application
 */
protected URI getBaseURI() {
    return UriBuilder.fromUri("http://localhost/")
            .port(getPort(9998)).build();
}
{code}
Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java --- Key: YARN-2319 URL: https://issues.apache.org/jira/browse/YARN-2319 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: YARN-2319.0.patch MiniKdc only invokes the start method, never stop, in TestRMWebServicesDelegationTokens.java: {code} testMiniKDC.start(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2304) TestRMWebServices* fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066469#comment-14066469 ] Tsuyoshi OZAWA commented on YARN-2304: -- IIUC, the test failure is caused by JerseyTest. JerseyTest's constructor -> getContainer() -> getBaseURI() always returns the result of UriBuilder.fromUri("http://localhost/").port(getPort(9998)).build(). If other test jobs are running at the same time, some of them fail to bind the port and the tests fail as a result.
{code}
public JerseyTest(AppDescriptor ad) throws TestContainerException {
    this.tc = getContainer(ad, getTestContainerFactory());
    this.client = getClient(tc, ad);
}

/**
 * Returns the base URI of the application.
 * @return The base URI of the application
 */
protected URI getBaseURI() {
    return UriBuilder.fromUri("http://localhost/")
            .port(getPort(9998)).build();
}
{code}
TestRMWebServices* fails intermittently --- Key: YARN-2304 URL: https://issues.apache.org/jira/browse/YARN-2304 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: test-failure-log-RMWeb.txt The test fails intermittently because of bind exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
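A self-contained illustration of why the fixed port is fragile and of one common workaround (it is only an assumption here that the tests could be pointed at a free port instead of the hard-coded 9998): asking the OS for an ephemeral port never collides with another concurrently running test.
{code}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortProbe {
  // Binding to port 0 lets the OS pick an unused ephemeral port.
  static int findFreePort() throws IOException {
    try (ServerSocket s = new ServerSocket(0)) {
      return s.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("Would run JerseyTest against port " + findFreePort());
  }
}
{code}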
[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
[ https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066468#comment-14066468 ] Tsuyoshi OZAWA commented on YARN-2319: -- Oops, sorry, I intended to comment on YARN-2304. Feel free to delete it. Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java --- Key: YARN-2319 URL: https://issues.apache.org/jira/browse/YARN-2319 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: YARN-2319.0.patch MiniKdc only invoke start method not stop in TestRMWebServicesDelegationTokens.java {code} testMiniKDC.start(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2008: -- Attachment: YARN-2008.1.patch Patch implementing the described behavior... CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He Assignee: Chen He Attachments: YARN-2008.1.patch Suppose there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's resources, so there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but they have been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
{noformat}
                        rootQueue
                       /         \
         L1ParentQueue1           L1ParentQueue2
  (allowed to use up to 80%       (allowed to use 20% in minimum
   of its parent)                  of its parent)
       /           \
 L2LeafQueue1    L2LeafQueue2
 (50% of its     (50% of its parent
  parent)         in minimum)
{noformat}
When we calculate the headroom of a user in L2LeafQueue2, the current method will think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we are not sure. It is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now. Actually, L2LeafQueue2 can only use 30% (60%*50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
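A worked version of the numbers in the description above (illustration only, not scheduler code): the leaf's effective max capacity has to be limited by what its ancestors can still actually obtain, not just by the configured percentages.
{code}
public class QueueMaxCapExample {
  public static void main(String[] args) {
    double siblingParentUsed = 0.40; // L1ParentQueue2 already uses 40% of rootQueue
    double parentMax         = 0.80; // L1ParentQueue1 may use up to 80% of rootQueue
    double leafShare         = 0.50; // L2LeafQueue2 gets 50% of L1ParentQueue1

    // Naive calculation ignores the sibling's usage: 80% * 50% = 40% of the cluster.
    double naiveMaxCap = parentMax * leafShare;

    // Capping the parent by what is actually left (100% - 40% = 60%) yields the
    // figure from the description: 60% * 50% = 30% of the cluster.
    double effectiveParent = Math.min(parentMax, 1.0 - siblingParentUsed);
    double correctMaxCap = effectiveParent * leafShare;

    System.out.printf("naive=%.0f%%, correct=%.0f%%%n", naiveMaxCap * 100, correctMaxCap * 100);
  }
}
{code}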
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066510#comment-14066510 ] Craig Welch commented on YARN-2008: --- [~airbots] Chen, I put together a patch; with it, I believe the scenario you describe plays out as it should. Can you have a look? Also, do you mind if I assign this one over to me to see it through? CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure - Key: YARN-2008 URL: https://issues.apache.org/jira/browse/YARN-2008 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Chen He Assignee: Chen He Attachments: YARN-2008.1.patch Suppose there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's resources, so there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but they have been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
{noformat}
                        rootQueue
                       /         \
         L1ParentQueue1           L1ParentQueue2
  (allowed to use up to 80%       (allowed to use 20% in minimum
   of its parent)                  of its parent)
       /           \
 L2LeafQueue1    L2LeafQueue2
 (50% of its     (50% of its parent
  parent)         in minimum)
{noformat}
When we calculate the headroom of a user in L2LeafQueue2, the current method will think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we are not sure. It is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now. Actually, L2LeafQueue2 can only use 30% (60%*50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066511#comment-14066511 ] Craig Welch commented on YARN-2008: --- [~wangda], can you have a look at this please? This is the headroom patch for the ancestor-sibling utilization issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066559#comment-14066559 ] Jian He commented on YARN-2208: --- patch looks good AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2322) Provide Cli to refresh Admin Acls for Timeline server
Karam Singh created YARN-2322: - Summary: Provide Cli to refresh Admin Acls for Timeline server Key: YARN-2322 URL: https://issues.apache.org/jira/browse/YARN-2322 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Karam Singh Provide a CLI to refresh Admin Acls for the Timeline server. Currently rmadmin -refreshAdminAcls provides a facility to refresh Admin Acls for the ResourceManager, but if we want to modify adminAcls for the Timeline server, we currently need to restart it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066589#comment-14066589 ] Hadoop QA commented on YARN-2008: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656531/YARN-2008.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4361//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4361//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2244: Attachment: YARN-2244.005.patch Responded to feedback FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch, YARN-2244.005.patch We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler specific fixes added to handle containers for unknown application attempts. Without these fair scheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
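For readers unfamiliar with the behaviour referenced from MAPREDUCE-3596, the sketch below illustrates the kind of handling that is missing: when a node heartbeats a running container whose application attempt the scheduler does not know, the scheduler should queue that container for cleanup instead of merely logging it. This is a standalone toy model with hypothetical names, not the FairScheduler's actual code or the attached patch.
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the missing behaviour: kill containers whose application attempt is unknown. */
class UnknownAttemptDemo {
    /** Application attempts the scheduler currently knows about. */
    private final Set<String> liveAttempts = new HashSet<>();
    /** Containers the node will be told to clean up in its next heartbeat response. */
    private final List<String> containersToClean = new ArrayList<>();

    void registerAttempt(String attemptId) {
        liveAttempts.add(attemptId);
    }

    /** Called for each container a node reports as running in its heartbeat. */
    void containerReportedRunning(String attemptId, String containerId) {
        if (!liveAttempts.contains(attemptId)) {
            // Previously the FairScheduler only logged this case and let the container keep
            // running; the fix is to schedule the container for cleanup so the NM kills it.
            System.out.println("Unknown attempt " + attemptId + ": cleaning up " + containerId);
            containersToClean.add(containerId);
            return;
        }
        // ... normal bookkeeping for containers of known attempts ...
    }

    List<String> pullContainersToClean() {
        List<String> toClean = new ArrayList<>(containersToClean);
        containersToClean.clear();
        return toClean;
    }

    public static void main(String[] args) {
        UnknownAttemptDemo scheduler = new UnknownAttemptDemo();
        scheduler.registerAttempt("appattempt_1");
        scheduler.containerReportedRunning("appattempt_1", "container_1_000001");  // known, kept
        scheduler.containerReportedRunning("appattempt_9", "container_9_000001");  // unknown, cleaned up
        System.out.println("containers to clean: " + scheduler.pullContainersToClean());
    }
}
{code}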
[jira] [Commented] (YARN-2297) Preemption can prevent progress in small queues
[ https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066615#comment-14066615 ] Sunil G commented on YARN-2297: --- Hi [~gp.leftnoteasy]
bq. 1 Use (guaranteed - used)
I feel this can create a little more starvation for queues configured with less capacity.
bq. 2 combined function like sigmoid(ratio(used, guaranteed)) * (guaranteed - used)
Yes, this makes more sense; it can combine the ratio as well as the difference in a uniform way. I feel more sampling can be done to come up with a better approach. I can check and update you.
Preemption can prevent progress in small queues --- Key: YARN-2297 URL: https://issues.apache.org/jira/browse/YARN-2297 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Priority: Critical Preemption can cause a hang in a single-node cluster: only the AMs run, and no task container can run.
h3. queue configuration
Queues A and B have 1% and 99% capacity respectively. No max capacity.
h3. scenario
Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps and 1 user. Submit app 1 to queue A. Its AM needs 2 GB, and there is 1 task that needs 2 GB, so it occupies the entire cluster. Submit app 2 to queue B. Its AM needs 2 GB, and there are 3 tasks that need 2 GB each. Instead of app 1 being preempted entirely, the app 1 AM will stay and the app 2 AM will launch, but no task of either app can proceed.
h3. commands
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter -Dmapreduce.map.memory.mb=2000 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M -Dmapreduce.randomtextwriter.bytespermap=2147483648 -Dmapreduce.job.queuename=A -Dmapreduce.map.maxattempts=100 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 -Dmapreduce.map.java.opts=-Xmx1800M -Dmapreduce.randomtextwriter.mapsperhost=1 -Dmapreduce.randomtextwriter.totalbytes=2147483648 dir1
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep -Dmapreduce.map.memory.mb=2000 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M -Dmapreduce.job.queuename=B -Dmapreduce.map.maxattempts=100 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 -Dmapreduce.map.java.opts=-Xmx1800M -m 1 -r 0 -mt 4000 -rt 0 -- This message was sent by Atlassian JIRA (v6.2#6252)
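To make the proposal being discussed concrete, here is a standalone sketch of the combined function sigmoid(ratio(used, guaranteed)) * (guaranteed - used). The steepness, the centring of the sigmoid at ratio = 1.0, and the sample numbers are all assumptions made for illustration; the comment above explicitly leaves the exact shape open.
{code}
/**
 * Standalone sketch of the "combined function" idea discussed above:
 *   score(queue) = sigmoid(used / guaranteed) * (guaranteed - used)
 * Everything here (steepness, centre point, sample numbers) is made up for illustration.
 */
class CombinedScoreDemo {
    // Logistic curve centred at ratio = 1.0 (queue exactly at its guarantee).
    static double sigmoid(double ratio, double steepness) {
        return 1.0 / (1.0 + Math.exp(steepness * (ratio - 1.0)));
    }

    // Higher score = more deserving of the next allocated (or preempted) resource.
    static double combinedScore(double guaranteed, double used, double steepness) {
        double ratio = used / guaranteed;
        return sigmoid(ratio, steepness) * (guaranteed - used);
    }

    public static void main(String[] args) {
        double steepness = 5.0;
        // guaranteed and used expressed as a percentage of the cluster
        double[][] queues = { {1, 0}, {99, 50} };   // a tiny queue and a big queue
        for (double[] q : queues) {
            double diffOnly = q[0] - q[1];
            double combined = combinedScore(q[0], q[1], steepness);
            System.out.printf("guaranteed=%2.0f used=%2.0f  diff-only=%5.1f  combined=%5.2f%n",
                q[0], q[1], diffOnly, combined);
        }
    }
}
{code}
With this particular centring, the sigmoid term mainly suppresses queues already at or above their guarantee, while the (guaranteed - used) term still dominates otherwise; presumably that trade-off is what the proposed sampling of candidate shapes would explore.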
[jira] [Updated] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2008: -- Attachment: YARN-2008.2.patch Added a missing unit test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066654#comment-14066654 ] Craig Welch commented on YARN-2008: --- The tests seem to pass on my box; I think these are still issues with the build server (tried org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched and org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066712#comment-14066712 ] Hadoop QA commented on YARN-2244: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656543/YARN-2244.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4362//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4362//console This message is automatically generated. FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch, YARN-2244.005.patch We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler specific fixes added to handle containers for unknown application attempts. Without these fair scheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066719#comment-14066719 ] Karthik Kambatla commented on YARN-2244: [~adhoot] - can you check if the test failures are related? FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch, YARN-2244.005.patch We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler specific fixes added to handle containers for unknown application attempts. Without these fair scheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066741#comment-14066741 ] Hadoop QA commented on YARN-2008: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656545/YARN-2008.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4363//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4363//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066927#comment-14066927 ] Xuan Gong commented on YARN-2208: - Committed to trunk and branch-2. Thanks Jian for review. AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-810: - Attachment: YARN-810.patch Uploaded a patch for review. (1) Add a configuration field cpu_enforce_ceiling_enabled to the ApplicationSubmissionContext. Each application can set this field to true (default is false) if it wants cpu ceiling enforcement. (2) The RM will notify the NM of the list of containers with cpu_enforce_ceiling_enabled through the heartbeat. The heartbeat response message contains a list of containerIds that are launched on the current node with ceiling enforcement enabled. (3) The CgroupsLCEResource will set the cpu.cfs_period_us and cpu.cfs_quota_us for containers with ceiling enabled. (4) Update the distributed shell example to include the cpu_enforce_ceiling_enabled configuration, so we can test this feature using distributedshell. Support CGroup ceiling enforcement on CPU - Key: YARN-810 URL: https://issues.apache.org/jira/browse/YARN-810 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 2.0.5-alpha Reporter: Chris Riccomini Assignee: Sandy Ryza Attachments: YARN-810.patch Problem statement: YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in the yarn-site.xml. In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. This happens, even though the only guarantee that YARN/CGroups is making is that the container will get at least 1/4th of the core. If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th). There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available. Here's an RFC that describes the problem in more detail: http://lwn.net/Articles/336127/ Solution: As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups: {noformat} cpu.cfs_quota_us cpu.cfs_period_us {noformat} The usage of these two files is documented in more detail here: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html Testing: I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it behaves as described above (it is a soft cap, and allows containers to use more than they asked for). I then tested CFS CPU quotas manually with YARN.
First, you can see that CFS is in use in the CGroup, based on the file names:
{noformat}
[criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
total 0
-r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
-r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
-rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
-rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
[criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
100000
[criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
-1
{noformat}
Oddly, it appears that the cfs_period_us is set to .1s, not 1s. We can place processes under hard limits. I have process 4370 running YARN container container_1371141151815_0003_01_03 on a host. By default, it's running at ~300% cpu usage.
{noformat}
CPU 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ...
{noformat}
When I set the CFS quota:
{noformat}
echo 1000 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
CPU 4370 criccomi 20 0 1157m 563m 14m S 1.0 0.8 90:08.39 ...
{noformat}
It drops to 1% usage, and you can see the box has room to spare:
{noformat}
Cpu(s): 2.4%us, 1.0%sy, 0.0%ni, 92.2%id, 4.2%wa, 0.0%hi, 0.1%si, 0.0%st
{noformat}
Turning the quota back to -1:
{noformat}
echo -1
{noformat}
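As a rough illustration of how such a ceiling could be derived and applied per container, here is a hedged sketch; the cgroup path, the 100000 us period and the vcore-to-quota rule are assumptions for the sketch, not necessarily what the attached patch does.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Illustration of a CFS-based hard CPU ceiling for one container cgroup.
 * Paths, the period value and the vcore-to-quota rule are assumptions for this sketch.
 */
class CfsCeilingDemo {
    private static final int PERIOD_US = 100000;   // 0.1s, the period observed in the listing above

    /** quota = period * (physical cores this container's vcore ask corresponds to). */
    static long quotaForContainer(int containerVcores, int nodeVcores, int nodePhysicalCores) {
        double coresAllowed = (double) containerVcores * nodePhysicalCores / nodeVcores;
        return Math.round(PERIOD_US * coresAllowed);
    }

    /** Writes the period and quota files of the container's cgroup (needs suitable permissions). */
    static void applyCeiling(Path containerCgroup, long quotaUs) throws IOException {
        Files.write(containerCgroup.resolve("cpu.cfs_period_us"),
            String.valueOf(PERIOD_US).getBytes(StandardCharsets.UTF_8));
        Files.write(containerCgroup.resolve("cpu.cfs_quota_us"),
            String.valueOf(quotaUs).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // A node exposing 8 vcores backed by 2 physical cores (1:4 ratio); the container asked for 1 vcore.
        long quota = quotaForContainer(1, 8, 2);
        System.out.println("cfs_quota_us = " + quota + " per " + PERIOD_US + "us period");  // 25000 = 1/4 core
        // applyCeiling(java.nio.file.Paths.get("/cgroup/cpu/hadoop-yarn/container_XXXX"), quota);
    }
}
{code}
For a container that asked for 1 vcore on a node with a 1:4 pcore:vcore ratio, this yields a quota of 25000 us per 100000 us period, i.e. a hard cap of a quarter of a physical core, which is exactly the guarantee described in the problem statement.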
[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken
[ https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066943#comment-14066943 ] Hudson commented on YARN-2208: -- FAILURE: Integrated in Hadoop-trunk-Commit #5918 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5918/]) YARN-2208. AMRMTokenManager need to have a way to roll over AMRMToken. Contributed by Xuan Gong (xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611820) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AMRMTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java AMRMTokenManager need to have a way to roll over AMRMToken -- Key: YARN-2208 URL: https://issues.apache.org/jira/browse/YARN-2208 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, YARN-2208.7.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.8.patch, YARN-2208.9.patch, YARN-2208.9.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066938#comment-14066938 ] Anubhav Dhoot commented on YARN-2244: - Seems unrelated. Most failures were port binding issues (com.sun.jersey.test.framework.spi.container.TestContainerException: java.net.BindException: Address already in use). Will trigger a retest. FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch, YARN-2244.005.patch, YARN-2244.005.patch We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler specific fixes added to handle containers for unknown application attempts. Without these fair scheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2244: Attachment: YARN-2244.005.patch Retrigger test FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch, YARN-2244.005.patch, YARN-2244.005.patch We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler specific fixes added to handle containers for unknown application attempts. Without these fair scheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067006#comment-14067006 ] Robert Kanter commented on YARN-2131: - Given that Karthik created YARN-2268 and we can't use the multi operation, I think the addendum patch I uploaded already should be good, right? It simply renames the command from -format to -format-state-store. Add a way to format the RMStateStore Key: YARN-2131 URL: https://issues.apache.org/jira/browse/YARN-2131 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Fix For: 2.6.0 Attachments: YARN-2131.patch, YARN-2131.patch, YARN-2131_addendum.patch There are cases when we don't want to recover past applications, but recover applications going forward. To do this, one has to clear the store. Today, there is no easy way to do this and users should understand how each store works. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067050#comment-14067050 ] Hadoop QA commented on YARN-2244: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656583/YARN-2244.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4365//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4365//console This message is automatically generated. FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch, YARN-2244.005.patch, YARN-2244.005.patch We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler specific fixes added to handle containers for unknown application attempts. Without these fair scheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1342) Recover container tokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1342: - Attachment: YARN-1342v4.patch Attaching a patch updated to trunk. Recover container tokens upon nodemanager restart - Key: YARN-1342 URL: https://issues.apache.org/jira/browse/YARN-1342 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1342.patch, YARN-1342v2.patch, YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067063#comment-14067063 ] Hadoop QA commented on YARN-810: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656584/YARN-810.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell org.apache.hadoop.yarn.util.TestFSDownload org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4364//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4364//console This message is automatically generated. Support CGroup ceiling enforcement on CPU - Key: YARN-810 URL: https://issues.apache.org/jira/browse/YARN-810 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 2.0.5-alpha Reporter: Chris Riccomini Assignee: Sandy Ryza Attachments: YARN-810.patch Problem statement: YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in the yarn-site.xml. In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. 
This happens, even though the only guarantee that YARN/CGroups is making is that the container will get at least 1/4th of the core. If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th). There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available. Here's an RFC that describes the problem in more detail: http://lwn.net/Articles/336127/ Solution: As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups: {noformat} cpu.cfs_quota_us cpu.cfs_period_us {noformat} The usage of these two files is documented in more detail here:
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067078#comment-14067078 ] Craig Welch commented on YARN-2008: --- And, the two which failed this time also pass on my box... -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2315) Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2315: Attachment: (was: YARN-2315.patch) Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In the getQueueInfo function of FSQueue.java, we call setCapacity twice with different parameters, so the first call is overridden by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.2#6252)
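For reference, the change the description asks for amounts to the following (a sketch of just the relevant lines of getQueueInfo in FSQueue.java, surrounding method omitted):
{code}
// Before: the second call silently overrides the first one, so QueueInfo.capacity
// ends up holding the current usage instead of the fair share.
queueInfo.setCapacity((float) getFairShare().getMemory() /
    scheduler.getClusterResource().getMemory());
queueInfo.setCapacity((float) getResourceUsage().getMemory() /
    scheduler.getClusterResource().getMemory());

// After: report the fair share as capacity and the usage as the current capacity.
queueInfo.setCapacity((float) getFairShare().getMemory() /
    scheduler.getClusterResource().getMemory());
queueInfo.setCurrentCapacity((float) getResourceUsage().getMemory() /
    scheduler.getClusterResource().getMemory());
{code}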
[jira] [Updated] (YARN-2315) Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2315: Attachment: YARN-2315.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067202#comment-14067202 ] Hadoop QA commented on YARN-1342: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656604/YARN-1342v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4366//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4366//console This message is automatically generated. Recover container tokens upon nodemanager restart - Key: YARN-1342 URL: https://issues.apache.org/jira/browse/YARN-1342 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1342.patch, YARN-1342v2.patch, YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned
[ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067210#comment-14067210 ] Hadoop QA commented on YARN-2045: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656602/YARN-2045-v7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4367//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4367//console This message is automatically generated. Data persisted in NM should be versioned Key: YARN-2045 URL: https://issues.apache.org/jira/browse/YARN-2045 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2045-v2.patch, YARN-2045-v3.patch, YARN-2045-v4.patch, YARN-2045-v5.patch, YARN-2045-v6.patch, YARN-2045-v7.patch, YARN-2045.patch As a split task from YARN-667, we want to add version info to NM related data, include: - NodeManager local LevelDB state - NodeManager directory structure -- This message was sent by Atlassian JIRA (v6.2#6252)
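Since the YARN-2045 description above is terse, here is a hedged sketch of the kind of check that versioning the persisted NM state enables on recovery. The class name, the version numbers and the compatibility rule are illustrative assumptions, not the actual patch.
{code}
/**
 * Sketch of a major/minor version check for persisted NM state, of the kind this JIRA adds.
 * The compatibility rule (same major = loadable) is an assumption for illustration.
 */
class NMStateVersionDemo {
    static final int CURRENT_MAJOR = 1;
    static final int CURRENT_MINOR = 0;

    /** Returns true if state written by version storedMajor.storedMinor can be loaded. */
    static boolean isCompatible(int storedMajor, int storedMinor) {
        // Minor bumps are treated as additive/compatible; a major bump means the layout changed.
        return storedMajor == CURRENT_MAJOR;
    }

    static void checkVersionOnRecovery(Integer storedMajor, Integer storedMinor) {
        if (storedMajor == null) {
            System.out.println("No version found: writing " + CURRENT_MAJOR + "." + CURRENT_MINOR);
            return;
        }
        if (isCompatible(storedMajor, storedMinor)) {
            System.out.println("Loading state version " + storedMajor + "." + storedMinor);
        } else {
            throw new IllegalStateException("Incompatible NM state version "
                + storedMajor + "." + storedMinor + ", expected major " + CURRENT_MAJOR);
        }
    }

    public static void main(String[] args) {
        checkVersionOnRecovery(null, null);   // fresh store
        checkVersionOnRecovery(1, 2);         // newer minor: still loadable
        // checkVersionOnRecovery(2, 0);      // would throw: major version changed
    }
}
{code}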
[jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067241#comment-14067241 ] Karthik Kambatla commented on YARN-2244: Latest patch looks good to me. +1. FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch, YARN-2244.005.patch, YARN-2244.005.patch We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler specific fixes added to handle containers for unknown application attempts. Without these fair scheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067242#comment-14067242 ] Karthik Kambatla commented on YARN-2244: Committing this. FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch, YARN-2244.005.patch, YARN-2244.005.patch We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler specific fixes added to handle containers for unknown application attempts. Without these fair scheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-810: - Attachment: YARN-810.patch Support CGroup ceiling enforcement on CPU - Key: YARN-810 URL: https://issues.apache.org/jira/browse/YARN-810 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 2.0.5-alpha Reporter: Chris Riccomini Assignee: Sandy Ryza Attachments: YARN-810.patch, YARN-810.patch Problem statement: YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in the yarn-site.xml. In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. This happens, even though the only guarantee that YARN/CGroups is making is that the container will get at least 1/4th of the core. If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th). There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available. Here's an RFC that describes the problem in more detail: http://lwn.net/Articles/336127/ Solution: As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups: {noformat} cpu.cfs_quota_us cpu.cfs_period_us {noformat} The usage of these two files is documented in more detail here: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html Testing: I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it behaves as described above (it is a soft cap, and allows containers to use more than they asked for). I then tested CFS CPU quotas manually with YARN. First, you can see that CFS is in use in the CGroup, based on the file names:
{noformat}
[criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
total 0
-r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
-r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
-rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
-rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
[criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
100000
[criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
-1
{noformat}
Oddly, it appears that the cfs_period_us is set to .1s, not 1s. We can place processes under hard limits. I have process 4370 running YARN container container_1371141151815_0003_01_03 on a host. By default, it's running at ~300% cpu usage.
{noformat}
CPU 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ...
{noformat}
When I set the CFS quota:
{noformat}
echo 1000 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
CPU 4370 criccomi 20 0 1157m 563m 14m S 1.0 0.8 90:08.39 ...
{noformat}
It drops to 1% usage, and you can see the box has room to spare:
{noformat}
Cpu(s): 2.4%us, 1.0%sy, 0.0%ni, 92.2%id, 4.2%wa, 0.0%hi, 0.1%si, 0.0%st
{noformat}
Turning the quota back to -1:
{noformat}
echo -1 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
{noformat}
Burns the cores again:
{noformat}
Cpu(s): 11.1%us, 1.7%sy, 0.0%ni, 83.9%id, 3.1%wa, 0.0%hi, 0.2%si, 0.0%st
CPU 4370 criccomi 20 0 1157m 563m 14m S 253.9 0.8 89:32.31 ...
{noformat}
On my dev box, I was testing CGroups by running a python process eight times, to burn through all the cores, since it was doing as described above (giving extra CPU to the process, even with a cpu.shares limit). Toggling the cfs_quota_us seems to enforce a hard limit. Implementation: What do you guys think about introducing a variable to YarnConfiguration: bq.
[jira] [Commented] (YARN-2315) Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067279#comment-14067279 ] Hadoop QA commented on YARN-2315: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656609/YARN-2315.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4368//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4368//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2273) NPE in ContinuousScheduling Thread crippled RM after DN flap
[ https://issues.apache.org/jira/browse/YARN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067290#comment-14067290 ] Karthik Kambatla commented on YARN-2273: [~wei.yan] - you mentioned writing a unit test to reproduce the issue. Can we include that in the patch? NPE in ContinuousScheduling Thread crippled RM after DN flap Key: YARN-2273 URL: https://issues.apache.org/jira/browse/YARN-2273 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, resourcemanager Affects Versions: 2.3.0, 2.4.1 Environment: cdh5.0.2 wheezy Reporter: Andy Skelton Attachments: YARN-2273.patch, YARN-2273.patch One DN experienced memory errors and entered a cycle of rebooting and rejoining the cluster. After the second time the node went away, the RM produced this: {code} 2014-07-09 21:47:36,571 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1404858438119_4352_01 released container container_1404858438119_4352_01_04 on node: host: node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL 2014-07-09 21:47:36,571 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: memory:335872, vCores:328 2014-07-09 21:47:36,571 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[ContinuousScheduling,5,main] threw an Exception. java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040) at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329) at java.util.TimSort.sort(TimSort.java:203) at java.util.TimSort.sort(TimSort.java:173) at java.util.Arrays.sort(Arrays.java:659) at java.util.Collections.sort(Collections.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306) at java.lang.Thread.run(Thread.java:744) {code} A few cycles later YARN was crippled. The RM was running and jobs could be submitted but containers were not assigned and no progress was made. Restarting the RM resolved it. -- This message was sent by Atlassian JIRA (v6.2#6252)
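The trace points at the node comparator dereferencing a node that was removed while the sort was in flight. One possible shape of a fix is sketched below; this is illustrative only, and the attached patches may instead copy the node list or guard continuousScheduling itself. Field and method names only approximate FairScheduler's.
{code}
// Sketch: tolerate nodes that disappear (DN flap) between collecting the
// node ids and sorting them. Names approximate FairScheduler; the real fix
// may differ.
private class NodeAvailableResourceComparator implements Comparator<NodeId> {
  @Override
  public int compare(NodeId n1, NodeId n2) {
    FSSchedulerNode node1 = nodes.get(n1);
    FSSchedulerNode node2 = nodes.get(n2);
    // A removed node has no entry in 'nodes'; treat it as having nothing
    // available so it sorts last instead of throwing an NPE.
    if (node1 == null && node2 == null) {
      return 0;
    } else if (node1 == null) {
      return 1;
    } else if (node2 == null) {
      return -1;
    }
    // The real comparator uses the scheduler's resource calculator; a plain
    // memory comparison is used here to keep the sketch self-contained.
    return Integer.compare(node2.getAvailableResource().getMemory(),
        node1.getAvailableResource().getMemory());
  }
}
{code}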
[jira] [Updated] (YARN-2273) NPE in ContinuousScheduling Thread crippled RM after DN flap
[ https://issues.apache.org/jira/browse/YARN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2273: -- Attachment: YARN-2273-replayException.patch [~kasha], uploaded the testcase used before. NPE in ContinuousScheduling Thread crippled RM after DN flap Key: YARN-2273 URL: https://issues.apache.org/jira/browse/YARN-2273 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, resourcemanager Affects Versions: 2.3.0, 2.4.1 Environment: cdh5.0.2 wheezy Reporter: Andy Skelton Attachments: YARN-2273-replayException.patch, YARN-2273.patch, YARN-2273.patch One DN experienced memory errors and entered a cycle of rebooting and rejoining the cluster. After the second time the node went away, the RM produced this: {code} 2014-07-09 21:47:36,571 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1404858438119_4352_01 released container container_1404858438119_4352_01_04 on node: host: node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL 2014-07-09 21:47:36,571 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: memory:335872, vCores:328 2014-07-09 21:47:36,571 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[ContinuousScheduling,5,main] threw an Exception. java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040) at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329) at java.util.TimSort.sort(TimSort.java:203) at java.util.TimSort.sort(TimSort.java:173) at java.util.Arrays.sort(Arrays.java:659) at java.util.Collections.sort(Collections.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306) at java.lang.Thread.run(Thread.java:744) {code} A few cycles later YARN was crippled. The RM was running and jobs could be submitted but containers were not assigned and no progress was made. Restarting the RM resolved it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
[ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067328#comment-14067328 ] Hudson commented on YARN-2244: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5920 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5920/]) YARN-2244. FairScheduler missing handling of containers for unknown application attempts. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611840) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java FairScheduler missing handling of containers for unknown application attempts -- Key: YARN-2244 URL: https://issues.apache.org/jira/browse/YARN-2244 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Fix For: 2.6.0 Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch, YARN-2244.003.patch, YARN-2244.004.patch, YARN-2244.005.patch, YARN-2244.005.patch We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, there were some scheduler-specific fixes added to handle containers for unknown application attempts. Without these, the FairScheduler simply logs that an unknown container was found and continues to let it run. -- This message was sent by Atlassian JIRA (v6.2#6252)
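For context on what "handling" means here: instead of only logging the unknown container, the scheduler should tell the NodeManager to clean it up. A rough sketch of that behaviour is below; identifiers are illustrative and this is not the committed diff.
{code}
// Sketch of the behaviour added for containers whose application attempt is
// unknown to the scheduler; names are illustrative, not the actual patch.
for (ContainerStatus status : newlyLaunchedContainers) {
  ContainerId containerId = status.getContainerId();
  if (getCurrentAttemptForContainer(containerId) == null) {
    // Old behaviour: log "unknown container" and let it keep running.
    // New behaviour: ask the hosting NM to kill the orphaned container.
    LOG.info("Killing orphaned container " + containerId
        + " of unknown application attempt "
        + containerId.getApplicationAttemptId());
    rmContext.getDispatcher().getEventHandler().handle(
        new RMNodeCleanContainerEvent(node.getNodeID(), containerId));
  }
}
{code}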
[jira] [Commented] (YARN-2273) NPE in ContinuousScheduling Thread crippled RM after DN flap
[ https://issues.apache.org/jira/browse/YARN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067347#comment-14067347 ] Hadoop QA commented on YARN-2273: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656686/YARN-2273-replayException.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4370//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4370//console This message is automatically generated. NPE in ContinuousScheduling Thread crippled RM after DN flap Key: YARN-2273 URL: https://issues.apache.org/jira/browse/YARN-2273 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, resourcemanager Affects Versions: 2.3.0, 2.4.1 Environment: cdh5.0.2 wheezy Reporter: Andy Skelton Attachments: YARN-2273-replayException.patch, YARN-2273.patch, YARN-2273.patch One DN experienced memory errors and entered a cycle of rebooting and rejoining the cluster. After the second time the node went away, the RM produced this: {code} 2014-07-09 21:47:36,571 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1404858438119_4352_01 released container container_1404858438119_4352_01_04 on node: host: node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL 2014-07-09 21:47:36,571 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: memory:335872, vCores:328 2014-07-09 21:47:36,571 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[ContinuousScheduling,5,main] threw an Exception. 
java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040) at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329) at java.util.TimSort.sort(TimSort.java:203) at java.util.TimSort.sort(TimSort.java:173) at java.util.Arrays.sort(Arrays.java:659) at java.util.Collections.sort(Collections.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306) at java.lang.Thread.run(Thread.java:744) {code} A few cycles later YARN was crippled. The RM was running and jobs could be submitted but containers were not assigned and no progress was made. Restarting the RM resolved it. --
[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2211: Attachment: YARN-2211.5.patch RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens -- Key: YARN-2211 URL: https://issues.apache.org/jira/browse/YARN-2211 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.patch After YARN-2208, AMRMToken can be rolled over periodically. We need to save related Master Keys and use them to recover the AMRMToken when RM restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067350#comment-14067350 ] Hadoop QA commented on YARN-810: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656675/YARN-810.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell org.apache.hadoop.yarn.util.TestFSDownload {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4369//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4369//console This message is automatically generated. Support CGroup ceiling enforcement on CPU - Key: YARN-810 URL: https://issues.apache.org/jira/browse/YARN-810 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 2.0.5-alpha Reporter: Chris Riccomini Assignee: Sandy Ryza Attachments: YARN-810.patch, YARN-810.patch Problem statement: YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in the yarn-site.xml. In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. This happens, even though the only guarantee that YARN/CGroups is making is that the container will get at least 1/4th of the core. If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th). There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available. 
Here's an RFC that describes the problem in more detail: http://lwn.net/Articles/336127/ Solution: As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups: {noformat} cpu.cfs_quota_us cpu.cfs_period_us {noformat} The usage of these two files is documented in more detail here: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html Testing: I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it behaves as described above (it is a soft cap, and allows containers to use more than they asked for). I then tested CFS CPU quotas manually with YARN. First, you can see that CFS is in use in the CGroup, based on the file names: {noformat} [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/ total 0 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat -rw-r--r-- 1 app app 0 Jun 13 16:46
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067367#comment-14067367 ] Hadoop QA commented on YARN-2211: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656695/YARN-2211.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4371//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4371//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4371//console This message is automatically generated. RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens -- Key: YARN-2211 URL: https://issues.apache.org/jira/browse/YARN-2211 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.patch After YARN-2208, AMRMToken can be rolled over periodically. We need to save related Master Keys and use them to recover the AMRMToken when RM restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
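At a high level, the patch has to persist the AMRMToken secret manager's keys so a restarted RM can keep validating tokens issued before the failover. A rough sketch of the state that needs to survive is below; the real change extends RMStateStore and its protobuf records, and the field names here are illustrative.
{code}
// Illustrative only: shows what must be written to the RMStateStore on each
// key roll-over and reloaded on recovery; not the actual record classes.
class AMRMTokenSecretManagerStateSketch {
  // Key currently used to sign and verify AMRMTokens.
  private byte[] currentMasterKey;
  // Key activated by the periodic roll-over (YARN-2208) but not yet current,
  // kept so tokens signed with either key stay valid across the switch.
  private byte[] nextMasterKey;
}
{code}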
[jira] [Commented] (YARN-2309) NPE during RM-Restart test scenario
[ https://issues.apache.org/jira/browse/YARN-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067379#comment-14067379 ] Devaraj K commented on YARN-2309: - Dup of YARN-1919. NPE during RM-Restart test scenario --- Key: YARN-2309 URL: https://issues.apache.org/jira/browse/YARN-2309 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Nishan Shetty Priority: Minor During RM restart test scenarios, we hit the exception below. A point to note here is that ZooKeeper was also not stable during this testing; we saw many ZooKeeper exceptions before getting this NPE {code} 2014-07-10 10:49:46,817 WARN org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:125) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1039) {code} ZooKeeper exception {code} 2014-07-10 10:49:46,816 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1046) at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1017) at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:632) at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:766) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
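Even though this is being closed as a duplicate, the traces show the pattern clearly: serviceInit failed part-way (the ZooKeeper connection was never established), and serviceStop then dereferenced a field that had never been set. A defensive sketch of a null-safe stop is below; the field name follows EmbeddedElectorService, but the exact fix belongs on YARN-1919 and may differ.
{code}
// Sketch: guard serviceStop against fields left null when serviceInit fails
// early; 'elector' stands in for the reference dereferenced at
// EmbeddedElectorService.java:108.
@Override
protected synchronized void serviceStop() throws Exception {
  if (elector != null) {
    elector.quitElection(false);
    elector.terminateConnection();
  }
  super.serviceStop();
}
{code}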