[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-06-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015227#comment-14015227
 ] 

Tsuyoshi OZAWA commented on YARN-2075:
--

I could reproduce this problem on both trunk and branch-2, and the patch works 
well on both of them locally. [~mitdesai], can you tell us what command you 
ran? I ran {{mvn clean test -Dtest=TestRMAdminCLI}} with the patch and it 
works well.

 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec <<< FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.082 sec  <<< ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}
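 A hedged note on the failure above (my reading of the stack trace, not taken from the attached patch): {{AbstractList.remove}} throws {{UnsupportedOperationException}} when the collection handed to {{HAAdmin#isOtherTargetNodeActive}} is a fixed-size list such as one from {{Arrays.asList(...)}}. A minimal sketch of the difference:
 {code}
 Collection<String> fixedSize = Arrays.asList("rm1", "rm2");
 // fixedSize.remove("rm1");  // would throw UnsupportedOperationException

 // Copying into a modifiable list makes remove() safe:
 Collection<String> modifiable = new ArrayList<String>(Arrays.asList("rm1", "rm2"));
 modifiable.remove("rm1");
 {code}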



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file

2014-06-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015228#comment-14015228
 ] 

Tsuyoshi OZAWA commented on YARN-1874:
--

I found that the test failure is not related to the patch - it's filed as 
YARN-2075. Resubmitted the patch without any changes.

 Cleanup: Move RMActiveServices out of ResourceManager into its own file
 ---

 Key: YARN-1874
 URL: https://issues.apache.org/jira/browse/YARN-1874
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1874.1.patch, YARN-1874.2.patch, YARN-1874.3.patch, 
 YARN-1874.4.patch


 As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We 
 should move RMActiveServices out to make it more manageable. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-804) mark AbstractService init/start/stop methods as final

2014-06-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-804.
-

Resolution: Won't Fix

I don't think we can fix this while mocking is used to test some aspects of the 
implementation classes... WONTFIX unless there's a workaround.

 mark AbstractService init/start/stop methods as final
 -

 Key: YARN-804
 URL: https://issues.apache.org/jira/browse/YARN-804
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.1.0-beta
Reporter: Steve Loughran
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-804-001.patch


 Now that YARN-117 and MAPREDUCE-5298 are checked in, we can mark the public 
 AbstractService init/start/stop methods as final.
 Why? It puts the lifecycle check and error handling around the subclass code, 
 ensuring no lifecycle method gets called in the wrong state or gets called 
 more than once. When a {{serviceInit()}}, {{serviceStart()}} or {{serviceStop()}} 
 method throws an exception, it's caught and auto-triggers stop. 
 Marking the methods as final forces service implementations to move to the 
 stricter lifecycle. It has one side effect: some of the mocking tests play up 
 - I'll need some assistance here.
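 For context, a minimal sketch of the final-wrapper pattern described above (simplified and partly hypothetical; not the actual AbstractService code):
 {code}
 // Public lifecycle method is final; subclasses override the protected hook instead.
 public final void start() {
   try {
     serviceStart();                        // subclass hook
   } catch (Exception e) {
     noteFailure(e);                        // record the failure cause
     stop();                                // auto-trigger stop on failure
     throw ServiceStateException.convert(e);
   }
 }

 protected void serviceStart() throws Exception {
   // subclasses put their startup logic here
 }
 {code}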



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

Thanks, Sandy.
Uploaded a new patch that moves the AM resource usage check to AppSchedulable.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-06-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015275#comment-14015275
 ] 

Hudson commented on YARN-2103:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5642 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5642/])
YARN-2103. Inconsistency between viaProto flag and initial value of 
SerializedExceptionProto.Builder (Contributed by Binglin Chang) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599115)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/SerializedExceptionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestSerializedExceptionPBImpl.java


 Inconsistency between viaProto flag and initial value of 
 SerializedExceptionProto.Builder
 -

 Key: YARN-2103
 URL: https://issues.apache.org/jira/browse/YARN-2103
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.5.0

 Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch, 
 YARN-2103.v3.patch


 Bug 1:
 {code}
   SerializedExceptionProto proto = SerializedExceptionProto
   .getDefaultInstance();
   SerializedExceptionProto.Builder builder = null;
   boolean viaProto = false;
 {code}
 Since viaProto is false, we should initialize {{builder}} rather than {{proto}}.
 Bug 2:
 The class does not provide hashCode() and equals() like other PBImpl records; 
 since this class is used in other records, it may affect their behavior. 
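 A minimal sketch of the initialization Bug 1 asks for (assuming the usual PBImpl builder pattern; not necessarily the exact attached patch):
 {code}
   SerializedExceptionProto proto = SerializedExceptionProto.getDefaultInstance();
   // Since viaProto is false, builder should be the initialized field:
   SerializedExceptionProto.Builder builder = SerializedExceptionProto.newBuilder();
   boolean viaProto = false;
 {code}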



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-741) Mark yarn.service package as public unstable

2014-06-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015298#comment-14015298
 ] 

Steve Loughran commented on YARN-741:
-

fixed in YARN-825

 Mark yarn.service package as public unstable
 

 Key: YARN-741
 URL: https://issues.apache.org/jira/browse/YARN-741
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran

 The package info file {{/org/apache/hadoop/yarn/service/package-info.java}} 
 marks the package as private - yet it's something all YARN apps need to use (by 
 way of {{YarnClientImpl}}), and it's something all YARN AMs and containers 
 should be building from. 
 Once we are happy with the API and the documentation, mark it as public, 
 leaving it unstable until we have been using it enough to be confident that 
 it is.
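 For reference, the sort of change being proposed would look roughly like this in {{package-info.java}} (a sketch only; per the comment above, the actual change went in via YARN-825):
 {code}
 @InterfaceAudience.Public
 @InterfaceStability.Unstable
 package org.apache.hadoop.yarn.service;

 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceStability;
 {code}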



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-741) Mark yarn.service package as public unstable

2014-06-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-741.
-

Resolution: Duplicate

 Mark yarn.service package as public unstable
 

 Key: YARN-741
 URL: https://issues.apache.org/jira/browse/YARN-741
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran

 The package info file {{/org/apache/hadoop/yarn/service/package-info.java}} 
 marks the package as private - yet it's something all YARN apps need to use (by 
 way of {{YarnClientImpl}}), and it's something all YARN AMs and containers 
 should be building from. 
 Once we are happy with the API and the documentation, mark it as public, 
 leaving it unstable until we have been using it enough to be confident that 
 it is.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015300#comment-14015300
 ] 

Hadoop QA commented on YARN-1913:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647873/YARN-1913.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3884//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3884//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3884//console

This message is automatically generated.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-06-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015334#comment-14015334
 ] 

Hudson commented on YARN-2103:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #571 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/571/])
YARN-2103. Inconsistency between viaProto flag and initial value of 
SerializedExceptionProto.Builder (Contributed by Binglin Chang) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599115)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/SerializedExceptionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestSerializedExceptionPBImpl.java


 Inconsistency between viaProto flag and initial value of 
 SerializedExceptionProto.Builder
 -

 Key: YARN-2103
 URL: https://issues.apache.org/jira/browse/YARN-2103
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.5.0

 Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch, 
 YARN-2103.v3.patch


 Bug 1:
 {code}
   SerializedExceptionProto proto = SerializedExceptionProto
   .getDefaultInstance();
   SerializedExceptionProto.Builder builder = null;
   boolean viaProto = false;
 {code}
 Since viaProto is false, we should initialize {{builder}} rather than {{proto}}.
 Bug 2:
 The class does not provide hashCode() and equals() like other PBImpl records; 
 since this class is used in other records, it may affect their behavior. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2117) Close of Reader in TimelineAuthenticationFilterInitializer#initFilter() should be enclosed in finally block

2014-06-02 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2117:
--

Attachment: YARN-2117.patch

 Close of Reader in TimelineAuthenticationFilterInitializer#initFilter() 
 should be enclosed in finally block
 ---

 Key: YARN-2117
 URL: https://issues.apache.org/jira/browse/YARN-2117
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Ted Yu
Assignee: Chen He
Priority: Minor
  Labels: newbie
 Attachments: YARN-2117.patch


 Here is related code:
 {code}
 Reader reader = new FileReader(signatureSecretFile);
 int c = reader.read();
 while (c > -1) {
   secret.append((char) c);
   c = reader.read();
 }
 reader.close();
 {code}
 If IOException is thrown out of reader.read(), reader would be left unclosed.
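 A minimal sketch of the requested fix (whether the attached patch uses try/finally exactly like this is an assumption):
 {code}
 Reader reader = new FileReader(signatureSecretFile);
 try {
   int c = reader.read();
   while (c > -1) {
     secret.append((char) c);
     c = reader.read();
   }
 } finally {
   reader.close();  // now runs even if read() throws an IOException
 }
 {code}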



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2117) Close of Reader in TimelineAuthenticationFilterInitializer#initFilter() should be enclosed in finally block

2014-06-02 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2117:
--

Attachment: YARN-2117.patch

 Close of Reader in TimelineAuthenticationFilterInitializer#initFilter() 
 should be enclosed in finally block
 ---

 Key: YARN-2117
 URL: https://issues.apache.org/jira/browse/YARN-2117
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Ted Yu
Assignee: Chen He
Priority: Minor
  Labels: newbie
 Attachments: YARN-2117.patch


 Here is related code:
 {code}
 Reader reader = new FileReader(signatureSecretFile);
 int c = reader.read();
 while (c > -1) {
   secret.append((char) c);
   c = reader.read();
 }
 reader.close();
 {code}
 If IOException is thrown out of reader.read(), reader would be left unclosed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2117) Close of Reader in TimelineAuthenticationFilterInitializer#initFilter() should be enclosed in finally block

2014-06-02 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2117:
--

Attachment: (was: YARN-2117.patch)

 Close of Reader in TimelineAuthenticationFilterInitializer#initFilter() 
 should be enclosed in finally block
 ---

 Key: YARN-2117
 URL: https://issues.apache.org/jira/browse/YARN-2117
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Ted Yu
Assignee: Chen He
Priority: Minor
  Labels: newbie
 Attachments: YARN-2117.patch


 Here is related code:
 {code}
 Reader reader = new FileReader(signatureSecretFile);
 int c = reader.read();
 while (c > -1) {
   secret.append((char) c);
   c = reader.read();
 }
 reader.close();
 {code}
 If IOException is thrown out of reader.read(), reader would be left unclosed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-06-02 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015401#comment-14015401
 ] 

Mit Desai commented on YARN-2075:
-

[~ozawa] and [~kj-ki], that was my bad.
My local repo might not have been updated when I tested. I tested the patch and 
it works fine for me too.

Patch looks good to me.
+1 (non-binding)

 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec <<< FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.082 sec  <<< ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file

2014-06-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1874:
-

Attachment: YARN-1874.4.patch

 Cleanup: Move RMActiveServices out of ResourceManager into its own file
 ---

 Key: YARN-1874
 URL: https://issues.apache.org/jira/browse/YARN-1874
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1874.1.patch, YARN-1874.2.patch, YARN-1874.3.patch, 
 YARN-1874.4.patch


 As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We 
 should move RMActiveServices out to make it more manageable. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file

2014-06-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1874:
-

Attachment: (was: YARN-1874.4.patch)

 Cleanup: Move RMActiveServices out of ResourceManager into its own file
 ---

 Key: YARN-1874
 URL: https://issues.apache.org/jira/browse/YARN-1874
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1874.1.patch, YARN-1874.2.patch, YARN-1874.3.patch, 
 YARN-1874.4.patch


 As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We 
 should move RMActiveServices out to make it more manageable. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-06-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015407#comment-14015407
 ] 

Hudson commented on YARN-2103:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1762 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1762/])
YARN-2103. Inconsistency between viaProto flag and initial value of 
SerializedExceptionProto.Builder (Contributed by Binglin Chang) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599115)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/SerializedExceptionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestSerializedExceptionPBImpl.java


 Inconsistency between viaProto flag and initial value of 
 SerializedExceptionProto.Builder
 -

 Key: YARN-2103
 URL: https://issues.apache.org/jira/browse/YARN-2103
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.5.0

 Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch, 
 YARN-2103.v3.patch


 Bug 1:
 {code}
   SerializedExceptionProto proto = SerializedExceptionProto
   .getDefaultInstance();
   SerializedExceptionProto.Builder builder = null;
   boolean viaProto = false;
 {code}
 Since viaProto is false, we should initialize {{builder}} rather than {{proto}}.
 Bug 2:
 The class does not provide hashCode() and equals() like other PBImpl records; 
 since this class is used in other records, it may affect their behavior. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-06-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015414#comment-14015414
 ] 

Tsuyoshi OZAWA commented on YARN-2103:
--

Thanks for the good work, [~decster], and thank you for the review and commit, 
[~djp]!

 Inconsistency between viaProto flag and initial value of 
 SerializedExceptionProto.Builder
 -

 Key: YARN-2103
 URL: https://issues.apache.org/jira/browse/YARN-2103
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.5.0

 Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch, 
 YARN-2103.v3.patch


 Bug 1:
 {code}
   SerializedExceptionProto proto = SerializedExceptionProto
   .getDefaultInstance();
   SerializedExceptionProto.Builder builder = null;
   boolean viaProto = false;
 {code}
 Since viaProto is false, we should initialize {{builder}} rather than {{proto}}.
 Bug 2:
 The class does not provide hashCode() and equals() like other PBImpl records; 
 since this class is used in other records, it may affect their behavior. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-06-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015421#comment-14015421
 ] 

Tsuyoshi OZAWA commented on YARN-2075:
--

[~mitdesai], thanks for reporting. +1 (non-binding) from me too. [~zjshen], could you 
take a look, please?

 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec <<< FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.082 sec  <<< ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2117) Close of Reader in TimelineAuthenticationFilterInitializer#initFilter() should be enclosed in finally block

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015429#comment-14015429
 ] 

Hadoop QA commented on YARN-2117:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647900/YARN-2117.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3885//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3885//console

This message is automatically generated.

 Close of Reader in TimelineAuthenticationFilterInitializer#initFilter() 
 should be enclosed in finally block
 ---

 Key: YARN-2117
 URL: https://issues.apache.org/jira/browse/YARN-2117
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Ted Yu
Assignee: Chen He
Priority: Minor
  Labels: newbie
 Attachments: YARN-2117.patch


 Here is related code:
 {code}
 Reader reader = new FileReader(signatureSecretFile);
 int c = reader.read();
 while (c > -1) {
   secret.append((char) c);
   c = reader.read();
 }
 reader.close();
 {code}
 If IOException is thrown out of reader.read(), reader would be left unclosed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1550) NPE in FairSchedulerAppsBlock#render

2014-06-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-1550:
--

Assignee: Anubhav Dhoot

Looks good to me. +1. Committing this shortly. 

 NPE in FairSchedulerAppsBlock#render
 

 Key: YARN-1550
 URL: https://issues.apache.org/jira/browse/YARN-1550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: caolong
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1550.001.patch, YARN-1550.002.patch, YARN-1550.patch


 Three steps:
 1、debug at RMAppManager#submitApplication after code
 if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
 null) {
   String message = "Application with id " + applicationId
   + " is already present! Cannot add a duplicate!";
   LOG.warn(message);
   throw RPCUtil.getRemoteException(message);
 }
 2、submit one application:hadoop jar 
 ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar
  sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 
 -r 1
 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR!
 the log:
 {noformat}
 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/scheduler
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1550) NPE in FairSchedulerAppsBlock#render

2014-06-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015439#comment-14015439
 ] 

Karthik Kambatla commented on YARN-1550:


Actually, I ran into the following NPE when running the new test locally. 
[~adhoot] - can you please take a look? It might be due to other changes that went 
in in the interim.

{noformat}
java.lang.NullPointerException: null
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.ClusterMetricsInfo.init(ClusterMetricsInfo.java:65)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.MetricsOverviewTable.render(MetricsOverviewTable.java:58)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
{noformat}



 NPE in FairSchedulerAppsBlock#render
 

 Key: YARN-1550
 URL: https://issues.apache.org/jira/browse/YARN-1550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: caolong
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1550.001.patch, YARN-1550.002.patch, YARN-1550.patch


 Three steps:
 1、debug at RMAppManager#submitApplication after code
 if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
 null) {
   String message = "Application with id " + applicationId
   + " is already present! Cannot add a duplicate!";
   LOG.warn(message);
   throw RPCUtil.getRemoteException(message);
 }
 2、submit one application:hadoop jar 
 ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar
  sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 
 -r 1
 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR!
 the log:
 {noformat}
 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/scheduler
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015442#comment-14015442
 ] 

Hadoop QA commented on YARN-1913:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647905/YARN-1913.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3887//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3887//console

This message is automatically generated.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015453#comment-14015453
 ] 

Hadoop QA commented on YARN-1874:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647902/YARN-1874.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 20 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.TestRMAdminCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3886//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3886//console

This message is automatically generated.

 Cleanup: Move RMActiveServices out of ResourceManager into its own file
 ---

 Key: YARN-1874
 URL: https://issues.apache.org/jira/browse/YARN-1874
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1874.1.patch, YARN-1874.2.patch, YARN-1874.3.patch, 
 YARN-1874.4.patch


 As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We 
 should move RMActiveServices out to make it more manageable. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-06-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015456#comment-14015456
 ] 

Hudson commented on YARN-2103:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1789 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1789/])
YARN-2103. Inconsistency between viaProto flag and initial value of 
SerializedExceptionProto.Builder (Contributed by Binglin Chang) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599115)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/SerializedExceptionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestSerializedExceptionPBImpl.java


 Inconsistency between viaProto flag and initial value of 
 SerializedExceptionProto.Builder
 -

 Key: YARN-2103
 URL: https://issues.apache.org/jira/browse/YARN-2103
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.5.0

 Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch, 
 YARN-2103.v3.patch


 Bug 1:
 {code}
   SerializedExceptionProto proto = SerializedExceptionProto
   .getDefaultInstance();
   SerializedExceptionProto.Builder builder = null;
   boolean viaProto = false;
 {code}
 Since viaProto is false, we should initialize {{builder}} rather than {{proto}}.
 Bug 2:
 The class does not provide hashCode() and equals() like other PBImpl records; 
 since this class is used in other records, it may affect their behavior. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015518#comment-14015518
 ] 

Sandy Ryza commented on YARN-1913:
--

This is looking good.  A few small things.

AppSchedulingInfo is only used to track pending resources.  We should hold 
amResource in SchedulerApplicationAttempt.

{code}
+  if (! queue.canRunAppAM(app.getAMResource())) {
{code}
Take out space after exclamation point.

{code}
   @Override
+  public boolean checkIfAMResourceUsageOverLimit(Resource usage, Resource 
maxAMResource) {
+return Resources.greaterThan(RESOURCE_CALCULATOR, null, usage, 
maxAMResource);
+  }
{code}
Simpler to just use usage.getMemory() > maxAMResource.getMemory().

{code}
+  if 
(request.getPriority().equals(RMAppAttemptImpl.AM_CONTAINER_PRIORITY)) {
{code}
I'm a little nervous about using the priority here because apps could 
unwittingly submit all requests at that priority.  Can we use 
SchedulerApplicationAttempt.getLiveContainers().isEmpty()?
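
For illustration, a minimal sketch of the kind of AM-share check under discussion (only {{canRunAppAM}} and {{getAMResource}} come from the patch snippets above; the other field and method names are assumptions):
{code}
// Inside the queue: allow a new AM only if the combined AM resource usage stays
// under maxAMShare of the cluster capacity (field names hypothetical).
boolean canRunAppAM(Resource amResource) {
  Resource maxAMResource =
      Resources.multiply(scheduler.getClusterCapacity(), maxAMShare);
  Resource ifRunAM = Resources.add(amResourceUsage, amResource);
  return Resources.fitsIn(ifRunAM, maxAMResource);
}
{code}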

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015536#comment-14015536
 ] 

Wei Yan commented on YARN-1913:
---

Thanks, Sandy.
One problem may exist if we use 
SchedulerApplicationAttempt.getLiveContainers().isEmpty(): if the application 
uses an unmanaged AM, it will not generate an AM resource request. Thus, the first 
request would be an actual task, not an AM.
Correct me if I'm wrong here.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file

2014-06-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015588#comment-14015588
 ] 

Tsuyoshi OZAWA commented on YARN-1874:
--

It's ready for review. This patch includes the following changes:

1. Moved RMActiveServices out of ResourceManager into its own file.
2. Added {{getRMAppManager}}, {{getQueueACLsManager}}, 
{{getApplicationACLsManager}} to RMContext.
3. Changed tests to override the {{ResourceManager#createAndInitActiveServices}} 
method (see the sketch below).
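
A rough sketch of change 3 above (illustrative test code; the stub class name is hypothetical and the field access is an assumption):
{code}
ResourceManager rm = new ResourceManager() {
  @Override
  protected void createAndInitActiveServices() {
    // tests install their own active services instead of the real RMActiveServices
    activeServices = new StubbedRMActiveServices(this);
    activeServices.init(getConfig());
  }
};
{code}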

 Cleanup: Move RMActiveServices out of ResourceManager into its own file
 ---

 Key: YARN-1874
 URL: https://issues.apache.org/jira/browse/YARN-1874
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1874.1.patch, YARN-1874.2.patch, YARN-1874.3.patch, 
 YARN-1874.4.patch


 As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We 
 should move RMActiveServices out to make it more manageable. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-06-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015611#comment-14015611
 ] 

Bikas Saha commented on YARN-2091:
--

Can this miss a case when the exitCode has not been set (e.g. when the container 
crashes on its own)? Should we check if the exitCode has already been set (e.g. 
via a kill event) and, if it's not set, then set it from exitEvent? How can we 
check whether the exitCode has not been set? Maybe have some uninitialized/invalid 
default value.
{code}@@ -829,7 +829,6 @@ public void transition(ContainerImpl container, 
ContainerEvent event) {
 @Override
 public void transition(ContainerImpl container, ContainerEvent event) {
   ContainerExitEvent exitEvent = (ContainerExitEvent) event;
-  container.exitCode = exitEvent.getExitCode();{code}

The new exit status codes need better comments/docs, e.g. what is the difference 
between the 2 new AppMaster-related exit statuses? Is kill_by_resourcemanager a 
generic value that can be replaced later on by a more specific reason like 
preempted?
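
A minimal sketch of the uninitialized-default idea (using {{ContainerExitStatus.INVALID}} as the sentinel is my assumption, not necessarily what the patch does):
{code}
// Default the field to an invalid sentinel so we can tell whether anything set it.
private int exitCode = ContainerExitStatus.INVALID;

// In the exited transition, only take the exit code from the event if nothing
// else (e.g. a kill event) has set it already:
if (container.exitCode == ContainerExitStatus.INVALID) {
  container.exitCode = exitEvent.getExitCode();
}
{code}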

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
 YARN-2091.4.patch, YARN-2091.5.patch, YARN-2091.6.patch


 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2119) Fix the DEFAULT_PROXY_ADDRESS used for getBindAddress to fix 1590

2014-06-02 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2119:
---

 Summary: Fix the DEFAULT_PROXY_ADDRESS used for getBindAddress to 
fix 1590
 Key: YARN-2119
 URL: https://issues.apache.org/jira/browse/YARN-2119
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


The fix for [YARN-1590|https://issues.apache.org/jira/browse/YARN-1590] 
introduced a method to get the web proxy bind address with the incorrect default 
port. Because all the users of the method (currently only 1) ignore the port, it's 
not breaking anything yet. Fixing it in case someone else uses this in the 
future. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2120) Coloring queues running over minShare on RM Scheduler page

2014-06-02 Thread Siqi Li (JIRA)
Siqi Li created YARN-2120:
-

 Summary: Coloring queues running over minShare on RM Scheduler page
 Key: YARN-2120
 URL: https://issues.apache.org/jira/browse/YARN-2120
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Siqi Li






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-06-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015653#comment-14015653
 ] 

Karthik Kambatla commented on YARN-2010:


Sorry, the commit messages are for the wrong JIRA. Will fix them up.

 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.init(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2120) Coloring queues running over minShare on RM Scheduler page

2014-06-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2120:
--

Description: 
Today RM Scheduler page shows FairShare, Used, Used (over fair share) and 
MaxCapacity.
Since fair share is displayed with a dotted line, I think we can stop displaying 
orange when a queue is over its fair share.
It would be better to show a queue running over minShare in orange, so 
that we know the queue is running over its min share. 
Also, we can display a queue running at maxShare in red.

 Coloring queues running over minShare on RM Scheduler page
 --

 Key: YARN-2120
 URL: https://issues.apache.org/jira/browse/YARN-2120
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Siqi Li

 Today RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 Since fair share is displayed with a dotted line, I think we can stop 
 displaying orange when a queue is over its fair share.
 It would be better to show a queue running over minShare in orange, 
 so that we know the queue is running over its min share. 
 Also, we can display a queue running at maxShare in red.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2120) Coloring queues running over minShare on RM Scheduler page

2014-06-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2120:
--

Attachment: AD45B623-9F14-420B-B1FB-1186E2B5EC4A.png

 Coloring queues running over minShare on RM Scheduler page
 --

 Key: YARN-2120
 URL: https://issues.apache.org/jira/browse/YARN-2120
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Siqi Li
 Attachments: AD45B623-9F14-420B-B1FB-1186E2B5EC4A.png


 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 Since fairShare is displayed with a dotted line, I think we can stop 
 displaying orange when a queue is over its fair share.
 It would be better to show a queue running over minShare in orange, 
 so that we know the queue is running at more than its min share. 
 Also, we can display a queue running at maxShare in red.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2119) Fix the DEFAULT_PROXY_ADDRESS used for getBindAddress to fix 1590

2014-06-02 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2119:


Attachment: YARN-2119.patch

Fix with unit tests. Ran 
org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServer tests

 Fix the DEFAULT_PROXY_ADDRESS used for getBindAddress to fix 1590
 -

 Key: YARN-2119
 URL: https://issues.apache.org/jira/browse/YARN-2119
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2119.patch


 The fix for [YARN-1590|https://issues.apache.org/jira/browse/YARN-1590] 
 introduced a method to get the web proxy bind address with the incorrect default 
 port. Because all the users of the method (only 1 user) ignore the port, it's 
 not breaking anything yet. Fixing it in case someone else uses this in the 
 future. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2108) Show minShare on RM Fair Scheduler page

2014-06-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li resolved YARN-2108.
---

Resolution: Duplicate

 Show minShare on RM Fair Scheduler page
 ---

 Key: YARN-2108
 URL: https://issues.apache.org/jira/browse/YARN-2108
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2108.v1.patch, YARN-2108.v2.patch


 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 It would be better to show MinShare with a possibly different color code, so 
 that we know the queue is running at more than its min share. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (YARN-2108) Show minShare on RM Fair Scheduler page

2014-06-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li reopened YARN-2108:
---


 Show minShare on RM Fair Scheduler page
 ---

 Key: YARN-2108
 URL: https://issues.apache.org/jira/browse/YARN-2108
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2108.v1.patch, YARN-2108.v2.patch


 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 It would be better to show MinShare with a possibly different color code, so 
 that we know the queue is running at more than its min share. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2120) Coloring queues running over minShare on RM Scheduler page

2014-06-02 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015732#comment-14015732
 ] 

Siqi Li commented on YARN-2120:
---

Attached a screenshot of the proposed coloring scheme.

 Coloring queues running over minShare on RM Scheduler page
 --

 Key: YARN-2120
 URL: https://issues.apache.org/jira/browse/YARN-2120
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Siqi Li
 Attachments: AD45B623-9F14-420B-B1FB-1186E2B5EC4A.png


 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 Since fairShare is displayed with a dotted line, I think we can stop 
 displaying orange when a queue is over its fair share.
 It would be better to show a queue running over minShare in orange, 
 so that we know the queue is running at more than its min share. 
 Also, we can display a queue running at maxShare in red.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2108) Show minShare on RM Fair Scheduler page

2014-06-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li resolved YARN-2108.
---

Resolution: Duplicate

 Show minShare on RM Fair Scheduler page
 ---

 Key: YARN-2108
 URL: https://issues.apache.org/jira/browse/YARN-2108
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2108.v1.patch, YARN-2108.v2.patch


 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 It would be better to show MinShare with a possibly different color code, so 
 that we know the queue is running at more than its min share. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1550) NPE in FairSchedulerAppsBlock#render

2014-06-02 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1550:


Attachment: YARN-1550.003.patch

Fixed failures after resolving conflicts with some interim changes that were checked in.

 NPE in FairSchedulerAppsBlock#render
 

 Key: YARN-1550
 URL: https://issues.apache.org/jira/browse/YARN-1550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: caolong
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1550.001.patch, YARN-1550.002.patch, 
 YARN-1550.003.patch, YARN-1550.patch


 three Steps :
 1、debug at RMAppManager#submitApplication after code
 if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
 null) {
   String message = "Application with id " + applicationId
   + " is already present! Cannot add a duplicate!";
   LOG.warn(message);
   throw RPCUtil.getRemoteException(message);
 }
 2、submit one application:hadoop jar 
 ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar
  sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 
 -r 1
 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR!
 the log:
 {noformat}
 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/scheduler
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 {noformat}
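
In other words, the window appears to be between the app being added to rmContext.getRMApps() and the scheduler registering it, so the render loop hits an app the scheduler does not know about yet. The kind of guard that addresses it would look roughly like this (identifiers are placeholders for illustration, not the committed fix):
{code}
// Inside the per-application render loop (placeholder names):
if (schedulerAppInfo == null) {
  // The scheduler has not registered this application yet; skip the row
  // instead of dereferencing null and returning a 500 to the user.
  continue;
}
{code}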



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2121) TimelineAuthenticator#hasDelegationToken may throw NPE

2014-06-02 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2121:
-

 Summary: TimelineAuthenticator#hasDelegationToken may throw NPE
 Key: YARN-2121
 URL: https://issues.apache.org/jira/browse/YARN-2121
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


{code}
  private boolean hasDelegationToken(URL url) {
    return url.getQuery().contains(
        TimelineAuthenticationConsts.DELEGATION_PARAM + "=");
  }
{code}

If the given URL doesn't have any params at all, it will throw an NPE.
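
A null-safe variant might look like the following (a minimal sketch only; the actual fix is whatever lands in the attached patch):
{code}
  private boolean hasDelegationToken(URL url) {
    // Guard against URLs without a query string before calling contains().
    return url.getQuery() != null
        && url.getQuery().contains(
            TimelineAuthenticationConsts.DELEGATION_PARAM + "=");
  }
{code}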



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

Update a patch. Use getLiveContainer().size() and unManagedAM to detect the AM 
container.
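
The rough shape of that check is something like the following (an illustrative sketch with placeholder helper names, not the actual FairScheduler code): an attempt that is not an unmanaged AM and has no live containers yet is about to receive its AM container, so that allocation is the one to test against the maxAMShare limit.
{code}
// Placeholder names, illustrative only:
boolean isLikelyAmContainer =
    !attempt.isUnmanagedAM() && attempt.getLiveContainers().isEmpty();
if (isLikelyAmContainer && wouldExceedMaxAMShare(queue, requestedResource)) {
  return; // skip this assignment so AMs alone cannot fill the cluster
}
{code}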

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2119) Fix the DEFAULT_PROXY_ADDRESS used for getBindAddress to fix 1590

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015763#comment-14015763
 ] 

Hadoop QA commented on YARN-2119:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647959/YARN-2119.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3888//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3888//console

This message is automatically generated.

 Fix the DEFAULT_PROXY_ADDRESS used for getBindAddress to fix 1590
 -

 Key: YARN-2119
 URL: https://issues.apache.org/jira/browse/YARN-2119
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2119.patch


 The fix for [YARN-1590|https://issues.apache.org/jira/browse/YARN-1590] 
 introduced a method to get the web proxy bind address with the incorrect default 
 port. Because all the users of the method (only 1 user) ignore the port, it's 
 not breaking anything yet. Fixing it in case someone else uses this in the 
 future. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-06-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015776#comment-14015776
 ] 

Vinod Kumar Vavilapalli commented on YARN-2010:
---

bq. It is true that the first time we encountered this was during an upgrade 
from non-secure to secure cluster.
My point is that this is a non-supported use-case. Let's make that explicit by 
throwing an appropriate exception with the right message * (1)

bq. However, as I mentioned earlier in the JIRA, it is possible to run into 
this in other situations.
Let's figure out what these situations are and make sure they are handled 
correctly * (2). Skipping apps in all cases is likely not the right solution.

bq. Even in the case of upgrading from non-secure to secure cluster, I totally 
understand we can't support recovering running/completed applications. However, 
one shouldn't have to explicitly nuke the ZK store (which by the way is 
involved due to the ACLs-magic and lacks an rmadmin command) to be able to 
start the RM.
On the other hand, coupled with (1) above, that is exactly what I'd expect. If 
we skip applications automatically in all cases, that may be a worse outcome - 
suddenly users will see that they are losing apps for a reason that is not 
obvious to them. The risk of crashing the RM is that it requires manual 
intervention and a longer downtime. But with (2) above, that risk will be 
mitigated a lot. Even if we decide to skip them, the outcome is the same - 
losing the apps - but it should rather be a conscious decision by the admins.

The crux of my argument is: let's not do a blanket 
{code}
try {
  .. 
} catch (Exception) {
 continue;
}
{code}
Instead do
{code}
try {
  .. 
} catch (ExceptionType1 e1) {
 // handle correctly
} catch (ExceptionType2 e2) {
 // handle correctly
} ...
} catch (Exception catchAll) {
 // Decide to skip the app or crash the RM.
}
{code}


 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 

[jira] [Commented] (YARN-1550) NPE in FairSchedulerAppsBlock#render

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015798#comment-14015798
 ] 

Hadoop QA commented on YARN-1550:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647960/YARN-1550.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3889//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3889//console

This message is automatically generated.

 NPE in FairSchedulerAppsBlock#render
 

 Key: YARN-1550
 URL: https://issues.apache.org/jira/browse/YARN-1550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: caolong
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1550.001.patch, YARN-1550.002.patch, 
 YARN-1550.003.patch, YARN-1550.patch


 three Steps :
 1、debug at RMAppManager#submitApplication after code
 if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
 null) {
   String message = "Application with id " + applicationId
   + " is already present! Cannot add a duplicate!";
   LOG.warn(message);
   throw RPCUtil.getRemoteException(message);
 }
 2、submit one application:hadoop jar 
 ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar
  sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 
 -r 1
 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR!
 the log:
 {noformat}
 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/scheduler
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

Thanks, Sandy. Fixed that problem.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015823#comment-14015823
 ] 

Hadoop QA commented on YARN-1913:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647968/YARN-1913.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestMaxRunningAppsEnforcer
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerApplicationAttempt
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSSchedulerApp

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3890//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3890//console

This message is automatically generated.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

New patch to fix the test errors.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1550) NPE in FairSchedulerAppsBlock#render

2014-06-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015844#comment-14015844
 ] 

Karthik Kambatla commented on YARN-1550:


Thanks Anubhav. +1. Committing this shortly.

 NPE in FairSchedulerAppsBlock#render
 

 Key: YARN-1550
 URL: https://issues.apache.org/jira/browse/YARN-1550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: caolong
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1550.001.patch, YARN-1550.002.patch, 
 YARN-1550.003.patch, YARN-1550.patch


 three Steps :
 1、debug at RMAppManager#submitApplication after code
 if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
 null) {
   String message = "Application with id " + applicationId
   + " is already present! Cannot add a duplicate!";
   LOG.warn(message);
   throw RPCUtil.getRemoteException(message);
 }
 2、submit one application:hadoop jar 
 ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar
  sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 
 -r 1
 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR!
 the log:
 {noformat}
 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/scheduler
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1590) _HOST doesn't expand properly for RM, NM, ProxyServer and JHS

2014-06-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015862#comment-14015862
 ] 

Karthik Kambatla commented on YARN-1590:


Just ran into this. If I am not mistaken, in the following snippet, we intended 
to use DEFAULT_PROXY_PORT instead of DEFAULT_RM_PORT. Correct? 
{code}
 PROXY_PREFIX + "address";
+  public static final int DEFAULT_PROXY_PORT = 9099;
+  public static final String DEFAULT_PROXY_ADDRESS =
+    "0.0.0.0:" + DEFAULT_RM_PORT;
{code}
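
Presumably the intended definition is something along these lines (a sketch of the correction only, not the attached patch):
{code}
  public static final String DEFAULT_PROXY_ADDRESS =
      "0.0.0.0:" + DEFAULT_PROXY_PORT;
{code}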

YARN-2119 has been filed to fix this.

 _HOST doesn't expand properly for RM, NM, ProxyServer and JHS
 -

 Key: YARN-1590
 URL: https://issues.apache.org/jira/browse/YARN-1590
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Fix For: 2.4.0

 Attachments: YARN-1590.1.patch, YARN-1590.2.patch, YARN-1590.3.patch, 
 YARN-1590.4.patch


 _HOST is not properly substituted when we use a VIP address. Currently it 
 always uses the host name of the machine and disregards the VIP address. This is 
 true mainly for the RM, NM, WebProxy, and JHS RPC services. It looks like it is 
 working fine for webservice authentication.
 On the other hand, the same thing is working fine for NN and SNN in RPC as 
 well as webservice.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2119) Fix the DEFAULT_PROXY_ADDRESS used for getBindAddress to fix 1590

2014-06-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015866#comment-14015866
 ] 

Karthik Kambatla commented on YARN-2119:


Looks good to me. +1. 

I ll commit this in a day if no one else has any comments. 

 Fix the DEFAULT_PROXY_ADDRESS used for getBindAddress to fix 1590
 -

 Key: YARN-2119
 URL: https://issues.apache.org/jira/browse/YARN-2119
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2119.patch


 The fix for [YARN-1590|https://issues.apache.org/jira/browse/YARN-1590] 
 introduced a method to get the web proxy bind address with the incorrect default 
 port. Because all the users of the method (only 1 user) ignore the port, it's 
 not breaking anything yet. Fixing it in case someone else uses this in the 
 future. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1877) Document yarn.resourcemanager.zk-auth and its scope

2014-06-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015868#comment-14015868
 ] 

Hudson commented on YARN-1877:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5643 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5643/])
YARN-1877. Updated CHANGES.txt to fix the JIRA number. It was previously 
committed as YARN-2010. (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1599348)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Document yarn.resourcemanager.zk-auth and its scope
 ---

 Key: YARN-1877
 URL: https://issues.apache.org/jira/browse/YARN-1877
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Fix For: 2.5.0

 Attachments: YARN-1877.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1550) NPE in FairSchedulerAppsBlock#render

2014-06-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015870#comment-14015870
 ] 

Hudson commented on YARN-1550:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5643 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5643/])
YARN-1550. NPE in FairSchedulerAppsBlock#render. (Anubhav Dhoot via kasha) 
(kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1599345)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java


 NPE in FairSchedulerAppsBlock#render
 

 Key: YARN-1550
 URL: https://issues.apache.org/jira/browse/YARN-1550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: caolong
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.5.0

 Attachments: YARN-1550.001.patch, YARN-1550.002.patch, 
 YARN-1550.003.patch, YARN-1550.patch


 three Steps :
 1、debug at RMAppManager#submitApplication after code
 if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
 null) {
   String message = "Application with id " + applicationId
   + " is already present! Cannot add a duplicate!";
   LOG.warn(message);
   throw RPCUtil.getRemoteException(message);
 }
 2、submit one application:hadoop jar 
 ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar
  sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 
 -r 1
 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR!
 the log:
 {noformat}
 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/scheduler
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-06-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015871#comment-14015871
 ] 

Hudson commented on YARN-2010:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5643 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5643/])
YARN-1877. Updated CHANGES.txt to fix the JIRA number. It was previously 
committed as YARN-2010. (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1599348)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 RM can't transition to active if it can't recover an app attempt
 

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
 yarn-2010-3.patch


 If the RM fails to recover an app attempt, it won't come up. We should make 
 it more resilient.
 Specifically, the underlying error is that the app was submitted before 
 Kerberos security got turned on. Makes sense for the app to fail in this 
 case. But YARN should still start.
 {noformat}
 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Exception handling the winning of election 
 org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
 Active 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
  
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
  
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
 Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
 transitioning to Active mode 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
  
 ... 4 more 
 Caused by: org.apache.hadoop.service.ServiceStateException: 
 org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
  
 ... 5 more 
 Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
 java.lang.IllegalArgumentException: Missing argument 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
  
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
 ... 8 more 
 Caused by: java.lang.IllegalArgumentException: Missing argument 
 at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93) 
 at 
 org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
  
 ... 13 more 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1540) Add an easy way to turn on HA

2014-06-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-1540.


  Resolution: Invalid
Target Version/s:   (was: )

Other sub-tasks under YARN-149 handle this, and it is now relatively easy to 
configure RM HA, so this JIRA is no longer valid. 

Please re-open or open another JIRA if you see other possible improvements.

 Add an easy way to turn on HA
 -

 Key: YARN-1540
 URL: https://issues.apache.org/jira/browse/YARN-1540
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 Users will have to modify the configuration significantly to turn on HA. It 
 would be nice to have a simpler way of doing this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015879#comment-14015879
 ] 

Hadoop QA commented on YARN-1913:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647969/YARN-1913.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestMaxRunningAppsEnforcer
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSSchedulerApp
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerApplicationAttempt

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3891//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3891//console

This message is automatically generated.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2122) In AllocationFileLoaderService, the reloadThread should be created in init() and started in start()

2014-06-02 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2122:
--

 Summary: In AllocationFileLoaderService, the reloadThread should 
be created in init() and started in start()
 Key: YARN-2122
 URL: https://issues.apache.org/jira/browse/YARN-2122
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter


AllocationFileLoaderService has this reloadThread that is currently created and 
started in start(). Instead, it should be created in init() and started in 
start().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015896#comment-14015896
 ] 

Hadoop QA commented on YARN-1913:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647976/YARN-1913.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3892//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3892//console

This message is automatically generated.

 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2120) Coloring queues running over minShare on RM Scheduler page

2014-06-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2120:
--

Attachment: YARN-2120.v1.patch

 Coloring queues running over minShare on RM Scheduler page
 --

 Key: YARN-2120
 URL: https://issues.apache.org/jira/browse/YARN-2120
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Siqi Li
 Attachments: AD45B623-9F14-420B-B1FB-1186E2B5EC4A.png, 
 YARN-2120.v1.patch


 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 Since fairShare is displayed with a dotted line, I think we can stop 
 displaying orange when a queue is over its fair share.
 It would be better to show a queue running over minShare in orange, 
 so that we know the queue is running at more than its min share. 
 Also, we can display a queue running at maxShare in red.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2121) TimelineAuthenticator#hasDelegationToken may throw NPE

2014-06-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2121:
--

Attachment: YARN-2121.1.patch

Upload a patch to fix the problem, and add the corresponding test cases.

 TimelineAuthenticator#hasDelegationToken may throw NPE
 --

 Key: YARN-2121
 URL: https://issues.apache.org/jira/browse/YARN-2121
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2121.1.patch


 {code}
   private boolean hasDelegationToken(URL url) {
     return url.getQuery().contains(
         TimelineAuthenticationConsts.DELEGATION_PARAM + "=");
   }
 {code}
 If the given URL doesn't have any params at all, it will throw an NPE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2122) In AllocationFileLoaderService, the reloadThread should be created in init() and started in start()

2014-06-02 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2122:


Attachment: YARN-2122.patch

 In AllocationFileLoaderService, the reloadThread should be created in init() 
 and started in start()
 ---

 Key: YARN-2122
 URL: https://issues.apache.org/jira/browse/YARN-2122
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Attachments: YARN-2122.patch


 AllocationFileLoaderService has this reloadThread that is currently created 
 and started in start(). Instead, it should be created in init() and started 
 in start().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2120) Coloring queues running over minShare on RM Scheduler page

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015980#comment-14015980
 ] 

Hadoop QA commented on YARN-2120:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12648006/YARN-2120.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3893//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3893//console

This message is automatically generated.

 Coloring queues running over minShare on RM Scheduler page
 --

 Key: YARN-2120
 URL: https://issues.apache.org/jira/browse/YARN-2120
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: AD45B623-9F14-420B-B1FB-1186E2B5EC4A.png, 
 YARN-2120.v1.patch


 Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 Since fairShare is displayed with a dotted line, I think we can stop 
 displaying orange when a queue is over its fair share.
 It would be better to show a queue running over minShare in orange, 
 so that we know the queue is running at more than its min share. 
 Also, we can display a queue running at maxShare in red.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2122) In AllocationFileLoaderService, the reloadThread should be created in init() and started in start()

2014-06-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015986#comment-14015986
 ] 

Tsuyoshi OZAWA commented on YARN-2122:
--

Thank you for taking this JIRA, [~rkanter]. I think your patch fixes the issue 
itself.

I have one comment - how about overriding 
serviceInit()/serviceStart()/serviceStop() instead of init()/start()/stop()? 
Should we do this on another JIRA? [~kkambatl], what do you think?
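
For reference, the lifecycle split being suggested would look roughly like this (an illustrative sketch assuming a Runnable field {{reloadTask}} for the polling loop; not the actual AllocationFileLoaderService code):
{code}
  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Create, but do not start, the reload thread during init.
    reloadThread = new Thread(reloadTask, "AllocationFileReloader");
    reloadThread.setDaemon(true);
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // Start the already-created thread when the service starts.
    reloadThread.start();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    reloadThread.interrupt();
    reloadThread.join();
    super.serviceStop();
  }
{code}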

 In AllocationFileLoaderService, the reloadThread should be created in init() 
 and started in start()
 ---

 Key: YARN-2122
 URL: https://issues.apache.org/jira/browse/YARN-2122
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Attachments: YARN-2122.patch


 AllocationFileLoaderService has this reloadThread that is currently created 
 and started in start(). Instead, it should be created in init() and started 
 in start().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2121) TimelineAuthenticator#hasDelegationToken may throw NPE

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016010#comment-14016010
 ] 

Hadoop QA commented on YARN-2121:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12648019/YARN-2121.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.TestRMAdminCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3894//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3894//console

This message is automatically generated.

 TimelineAuthenticator#hasDelegationToken may throw NPE
 --

 Key: YARN-2121
 URL: https://issues.apache.org/jira/browse/YARN-2121
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2121.1.patch


 {code}
   private boolean hasDelegationToken(URL url) {
     return url.getQuery().contains(
         TimelineAuthenticationConsts.DELEGATION_PARAM + "=");
   }
 {code}
 If the given URL doesn't have any params at all, it will throw an NPE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-06-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016019#comment-14016019
 ] 

Tsuyoshi OZAWA commented on YARN-2091:
--

[~bikassaha], thank you for the comments.

{quote}
 Should we check if the exitCode has already been set (eg. via a kill event) 
and if its not set then set it from exitEvent? How can we check if the exitCode 
has not been set? Maybe have some uninitialized/invalid default value.
{quote}

IIUC, we can distinguish a set value from the default value by checking whether 
exitCode is still ContainerExitStatus.INVALID, because the default value of {{exitCode}} 
is ContainerExitStatus.INVALID. Do you have any comments on this?
{code}
  if (container.exitCode == ContainerExitStatus.INVALID) {
    container.exitCode = exitEvent.getExitCode();
  }
{code}

About the new exit status, I'll update comments in the next patch.
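
Once the status is plumbed through, the AM-side usage would presumably look something like this (a sketch only; the constant name is taken from this JIRA's title and may differ in the final patch):
{code}
for (ContainerStatus status : allocateResponse.getCompletedContainersStatuses()) {
  if (status.getExitStatus() == ContainerExitStatus.KILL_EXCEEDED_MEMORY) {
    // e.g. resubmit the task with a larger container request
  }
}
{code}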


 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
 YARN-2091.4.patch, YARN-2091.5.patch, YARN-2091.6.patch


 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2120) Coloring queues running over minShare on RM Scheduler page

2014-06-02 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016027#comment-14016027
 ] 

Ashwin Shankar commented on YARN-2120:
--

[~l201514],
It would be helpful not to remove the color codes for 'above/below fair share', 
since we don't always set minShare for queues.
In your proposal, for cases where we don't set minShare, the usage would start 
out orange and would look the same below and above fair share.
I know that there is a dotted line to mark fair share, but it is too faint and I 
generally need to squint to find it, especially when there are 
a lot of queues in the cluster.

 

 Coloring queues running over minShare on RM Scheduler page
 --

 Key: YARN-2120
 URL: https://issues.apache.org/jira/browse/YARN-2120
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: AD45B623-9F14-420B-B1FB-1186E2B5EC4A.png, 
 YARN-2120.v1.patch


 Today RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 Since fairShare is displayed with a dotted line, I think we can stop 
 displaying orange when a queue is over its fairShare.
 It would be better to show a queue running over minShare with orange color, 
 so that we know the queue is running over its min share. 
 Also, we can display a queue running at maxShare with red color.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2122) In AllocationFileLoaderService, the reloadThread should be created in init() and started in start()

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016032#comment-14016032
 ] 

Hadoop QA commented on YARN-2122:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12648023/YARN-2122.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3895//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3895//console

This message is automatically generated.

 In AllocationFileLoaderService, the reloadThread should be created in init() 
 and started in start()
 ---

 Key: YARN-2122
 URL: https://issues.apache.org/jira/browse/YARN-2122
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Attachments: YARN-2122.patch


 AllocationFileLoaderService has this reloadThread that is currently created 
 and started in start(). Instead, it should be created in init() and started 
 in start().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2122) In AllocationFileLoaderService, the reloadThread should be created in init() and started in start()

2014-06-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016035#comment-14016035
 ] 

Karthik Kambatla commented on YARN-2122:


Good point, [~ozawa]. It would definitely be better to override serviceInit, 
serviceStart, and serviceStop. 
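For reference, a minimal sketch of that lifecycle pattern (illustrative only; the thread body, naming, and daemon setting are assumptions, not the attached patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

public class AllocationFileLoaderService extends AbstractService {

  private Thread reloadThread;

  public AllocationFileLoaderService() {
    super(AllocationFileLoaderService.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Create, but do not start, the reload thread: a service that is
    // inited and never started should not own a running thread.
    reloadThread = new Thread(new Runnable() {
      @Override
      public void run() {
        // periodically re-read the allocations file (details omitted)
      }
    }, "AllocationFileReloader");
    reloadThread.setDaemon(true);
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    reloadThread.start();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    if (reloadThread != null) {
      reloadThread.interrupt();
      reloadThread.join();
    }
    super.serviceStop();
  }
}
{code}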

 In AllocationFileLoaderService, the reloadThread should be created in init() 
 and started in start()
 ---

 Key: YARN-2122
 URL: https://issues.apache.org/jira/browse/YARN-2122
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Attachments: YARN-2122.patch


 AllocationFileLoaderService has this reloadThread that is currently created 
 and started in start(). Instead, it should be created in init() and started 
 in start().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-06-02 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016050#comment-14016050
 ] 

Ashwin Shankar commented on YARN-2026:
--

Hi [~sandyr], did you have any comments?
Basically, in the above scenario the fair share policy tends to look like FIFO,
since the users who submitted apps first hog the cluster, although all users 
have the same fair share.
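A back-of-the-envelope sketch of the preemption math from the issue description below (hypothetical numbers, not FairScheduler code), showing why an active child queue never crosses the preemption threshold today but would under the active-only proposal:
{code}
// Illustrative arithmetic only; ignores the additional "unmet demand" condition.
public class FairSharePreemptionSketch {
  public static void main(String[] args) {
    double parentFairShare = 0.8;   // root.HighPriorityQueue: weight 8 out of 10
    int childQueues = 10;           // childQ1..childQ10
    double childQ1Usage = 0.05;     // childQ1 stuck at 5% of the cluster

    // Today: the parent's share is split across all children, active or not.
    double shareAll = parentFairShare / childQueues;   // 8%
    double thresholdAll = 0.5 * shareAll;              // 4%

    // Proposal: split only among active children (here, just childQ1).
    double shareActive = parentFairShare / 1;          // 80%
    double thresholdActive = 0.5 * shareActive;        // 40%

    System.out.println("preempt today?    " + (childQ1Usage < thresholdAll));    // false
    System.out.println("preempt proposed? " + (childQ1Usage < thresholdActive)); // true
  }
}
{code}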

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt


 Problem1- While using hierarchical queues in fair scheduler,there are few 
 scenarios where we have seen a leaf queue with least fair share can take 
 majority of the cluster and starve a sibling parent queue which has greater 
 weight/fair share and preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config results in root.HighPriorityQueue having 80% fair share,
 and each of its ten child queues would have 8% fair share. Preemption would 
 happen only if the child queue's usage is below 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue's fair share, i.e. 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2120) Coloring queues running over minShare on RM Scheduler page

2014-06-02 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016066#comment-14016066
 ] 

Siqi Li commented on YARN-2120:
---

[~ashwinshankar77], thanks for your feedback. Let me see if I can find a way to 
retain the original format.

 Coloring queues running over minShare on RM Scheduler page
 --

 Key: YARN-2120
 URL: https://issues.apache.org/jira/browse/YARN-2120
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: AD45B623-9F14-420B-B1FB-1186E2B5EC4A.png, 
 YARN-2120.v1.patch


 Today RM Scheduler page shows FairShare, Used, Used (over fair share) and 
 MaxCapacity.
 Since fairShare is displayed with a dotted line, I think we can stop 
 displaying orange when a queue is over its fairShare.
 It would be better to show a queue running over minShare with orange color, 
 so that we know the queue is running over its min share. 
 Also, we can display a queue running at maxShare with red color.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2122) In AllocationFileLoaderService, the reloadThread should be created in init() and started in start()

2014-06-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016075#comment-14016075
 ] 

Hadoop QA commented on YARN-2122:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12648031/YARN-2122.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3896//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3896//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3896//console

This message is automatically generated.

 In AllocationFileLoaderService, the reloadThread should be created in init() 
 and started in start()
 ---

 Key: YARN-2122
 URL: https://issues.apache.org/jira/browse/YARN-2122
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Attachments: YARN-2122.patch, YARN-2122.patch


 AllocationFileLoaderService has this reloadThread that is currently created 
 and started in start(). Instead, it should be created in init() and started 
 in start().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-06-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016081#comment-14016081
 ] 

Bikas Saha commented on YARN-2091:
--

If we are sure that the default value is set in the code to 
ContainerExitStatus.INVALID, then that sounds good. Given that 
ContainerExitStatus.INVALID == 5000, we have to explicitly initialize the field 
with that value, since Java will default an int to 0.
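A minimal sketch of the explicit initialization plus the guarded update (illustrative only; the actual ContainerImpl change in the patch may differ):
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

public class ContainerExitCodeSketch {

  // Explicit initialization is required: a plain int field defaults to 0,
  // and 0 is a legitimate (successful) exit code.
  private int exitCode = ContainerExitStatus.INVALID;

  void onContainerExited(int eventExitCode) {
    // Only take the exit code from the event if nothing else
    // (e.g. a kill event) has recorded one already.
    if (exitCode == ContainerExitStatus.INVALID) {
      exitCode = eventExitCode;
    }
  }
}
{code}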

 Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
 ---

 Key: YARN-2091
 URL: https://issues.apache.org/jira/browse/YARN-2091
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
 YARN-2091.4.patch, YARN-2091.5.patch, YARN-2091.6.patch


 Currently, the AM cannot programmatically determine if the task was killed 
 due to using excessive memory. The NM kills it without passing this 
 information in the container status back to the RM. So the AM cannot take any 
 action here. The jira tracks adding this exit status and passing it from the 
 NM to the RM and then the AM. In general, there may be other such actions 
 taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-06-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016116#comment-14016116
 ] 

Hudson commented on YARN-1913:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5646 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5646/])
YARN-1913. With Fair Scheduler, cluster can logjam when all resources are 
consumed by AMs (Wei Yan via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599400)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm


 With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
 --

 Key: YARN-1913
 URL: https://issues.apache.org/jira/browse/YARN-1913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Wei Yan
 Fix For: 2.5.0

 Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
 YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch


 It's possible to deadlock a cluster by submitting many applications at once, 
 and have all cluster resources taken up by AMs.
 One solution is for the scheduler to limit resources taken up by AMs, as a 
 percentage of total cluster resources, via a maxApplicationMasterShare 
 config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-02 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016151#comment-14016151
 ] 

Carlo Curino commented on YARN-2022:


Hi Sunil, I read the doc with [~chris.douglas] and [~subru], and we agree with 
the general direction, though you will have to be very careful to test this 
thoroughly as you are enforcing rather tricky invariants.

A couple of specific concerns:

1) The yarn.resourcemanager.monitor.capacity.preemption.am_container_limit you 
propose is, I think, a bit of an overkill. I understand the intent to allow for 
more tunable preemption of AMs, but I worry this is such an esoteric parameter 
that people will not know how to use it. I personally would have to think very 
hard to figure out exactly what different configurations of it would give me in 
terms of increasing/decreasing the chances of an AM surviving preemption, and 
in terms of improving overall cluster efficiency. I propose to enforce only 
based on the existing invariants (am-percentage, max-apps, etc.), as the 
semantics are crisper: the preemption policy will re-establish the invariants 
of the queue, no more, no less.

2) Preserving the correct user mix of jobs in the queue is also a good 
addition, though again I am worried this is tricky code to write, so I strongly 
encourage you to write many, many unit tests, and to test the policy on a 
cluster extensively before it gets committed. 



 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-06-02 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016155#comment-14016155
 ] 

Sandy Ryza commented on YARN-2026:
--

Hi Ashwin, I have been busy with other stuff and probably will be for the next 
week or two.  I see your point.  I need to think about it a little more - the 
main aim of preemption is to enforce guarantees for purposes like 
maintaining SLAs.  While converging towards fairness more quickly in user 
queues could be a nice property, it satisfies a slightly different goal.

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt


 Problem1- While using hierarchical queues in fair scheduler,there are few 
 scenarios where we have seen a leaf queue with least fair share can take 
 majority of the cluster and starve a sibling parent queue which has greater 
 weight/fair share and preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config results in root.HighPriorityQueue having 80% fair share,
 and each of its ten child queues would have 8% fair share. Preemption would 
 happen only if the child queue's usage is below 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue's fair share, i.e. 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-06-02 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016191#comment-14016191
 ] 

Ashwin Shankar commented on YARN-2026:
--

[~sandyr], 
Sure Sandy, I'll patiently wait for your response. 
Also, if you prefer, please feel free to point me to another committer who 
knows the FS code base well.
We are very interested in getting this jira and YARN-1961 committed this month, 
since it's affecting our query cluster. 

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt


 Problem1- While using hierarchical queues in fair scheduler,there are few 
 scenarios where we have seen a leaf queue with least fair share can take 
 majority of the cluster and starve a sibling parent queue which has greater 
 weight/fair share and preemption doesn’t kick in to reclaim resources.
 The root cause seems to be that fair share of a parent queue is distributed 
 to all its children irrespective of whether its an active or an inactive(no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue(with high 
 fair share),the child queue’s fair share becomes really low. As a result when 
 only few of these child queues have apps running,they reach their *tiny* fair 
 share quickly and preemption doesn’t happen even if other leaf 
 queues(non-sibling) are hogging the cluster.
 This can be solved by dividing fair share of parent queue only to active 
 child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 Above config results in root.HighPriorityQueue having 80% fair share,
 and each of its ten child queues would have 8% fair share. Preemption would 
 happen only if the child queue's usage is below 4% (0.5*8=4). 
 Lets say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) and few apps are running in 
 root.lowPriorityQueue which is taking up 95% of the cluster.
 Up till this point,the behavior of FS is correct.
 Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% in the cluster and 
 preemption wouldn't kick in since its above 4%(half fair share).This is bad 
 considering childQ1 is under a highPriority parent queue which has *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers,we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above,since childQ1 is the only active queue
 under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem2 - Also note that similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 
 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck 
 at 5%,until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue's fair share, i.e. 
 40%,which would ensure childQ1 gets upto 40% resource if needed through 
 preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)