[jira] [Updated] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated YARN-1897: -- Attachment: YARN-1897-4.patch Updated patch per Vinod's suggestions. 1. Clean up SignalContainerCommand. 2. Support signalContainersRequest. Define SignalContainerRequest and SignalContainerResponse - Key: YARN-1897 URL: https://issues.apache.org/jira/browse/YARN-1897 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, YARN-1897.1.patch We need to define SignalContainerRequest and SignalContainerResponse first as they are needed by other sub-tasks. SignalContainerRequest should use OS-independent commands and provide a way for the application to specify a reason for diagnosis. SignalContainerResponse might be empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
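For readers following the API discussion, a minimal sketch of the shape such records could take, assuming a platform-independent SignalContainerCommand enum as described above; the names and members below are illustrative, not the committed API from the attached patches:
{code}
// Hypothetical sketch of the proposed records; the real definitions live in
// the YARN-1897 patches.
public enum SignalContainerCommand {
  OUTPUT_THREAD_DUMP,  // e.g. maps to SIGQUIT on Linux
  GRACEFUL_SHUTDOWN,   // e.g. maps to SIGTERM
  FORCEFUL_SHUTDOWN    // e.g. maps to SIGKILL
}

public abstract class SignalContainerRequest {
  public abstract ContainerId getContainerId();
  public abstract SignalContainerCommand getCommand();
  // Free-form reason supplied by the application, recorded for diagnosis.
  public abstract String getReason();
}

// Possibly empty; it exists so the RPC has a well-defined return type.
public abstract class SignalContainerResponse {
}
{code}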
[jira] [Commented] (YARN-1803) Signal container support in nodemanager
[ https://issues.apache.org/jira/browse/YARN-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002847#comment-14002847 ] Ming Ma commented on YARN-1803: --- Vinod, I have updated YARN-1897. Please let me know if you have other suggestions. I can also upload updated versions for the other subtasks that depend on YARN-1897. Signal container support in nodemanager --- Key: YARN-1803 URL: https://issues.apache.org/jira/browse/YARN-1803 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1803.patch It could include the following: 1. ContainerManager is able to process a new event type ContainerManagerEventType.SIGNAL_CONTAINERS coming from NodeStatusUpdater and deliver the request to ContainerExecutor. 2. Translate the platform-independent signal command to Linux-specific signals. Windows support will be tracked by another task. -- This message was sent by Atlassian JIRA (v6.2#6252)
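As a rough illustration of item 2, the platform-independent command could be translated to Linux signals along these lines (a hypothetical helper, not the attached patch; the command names repeat the assumption sketched above):
{code}
// Map an OS-independent command to a POSIX signal number. SIGQUIT (3) makes a
// JVM print a thread dump, SIGTERM (15) requests graceful shutdown, and
// SIGKILL (9) terminates the process outright.
public final class LinuxSignalTranslator {
  private LinuxSignalTranslator() {}

  public static int toLinuxSignal(SignalContainerCommand cmd) {
    switch (cmd) {
      case OUTPUT_THREAD_DUMP: return 3;   // SIGQUIT
      case GRACEFUL_SHUTDOWN:  return 15;  // SIGTERM
      case FORCEFUL_SHUTDOWN:  return 9;   // SIGKILL
      default:
        throw new IllegalArgumentException("Unknown command: " + cmd);
    }
  }
}
{code}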
[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002858#comment-14002858 ] Rohith commented on YARN-1366: -- bq. If there's no RM restart, a normal app only calling unregister without calling register earlier will be just deemed as FINISHED ? is this acceptable? No. The mutual contract is that unregistration should not be called before registration (MR handles this: MAPREDUCE-5769), but as defensive programming this still has to be handled in YARN. What about storing information in ZooKeeper for registered applications? This could be read during recovery to move the application directly to RUNNING. ApplicationMasterService should Resync with the AM upon allocate call after restart --- Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
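To make the resync contract concrete: on a resync command the AM resets its allocate sequence number to 0 and re-sends its entire outstanding ask. A minimal sketch of that loop follows; {{amRmProtocol}}, {{responseId}}, {{getProgress()}} and {{releaseList}} are assumed surrounding fields, and the real wiring would belong in AMRMClient rather than application code:
{code}
// Re-drive allocate after an RM restart: restart the sequence number at 0 and
// re-send the whole outstanding request, per the resync semantics above.
private AllocateResponse allocateWithResync(List<ResourceRequest> outstandingAsks)
    throws YarnException, IOException {
  while (true) {
    AllocateRequest request = AllocateRequest.newInstance(
        responseId, getProgress(), outstandingAsks, releaseList, null);
    AllocateResponse response = amRmProtocol.allocate(request);
    if (response.getAMCommand() == AMCommand.AM_RESYNC) {
      responseId = 0;  // the restarted RM has no memory of the previous sequence
      continue;        // re-send the entire outstanding ask
    }
    responseId++;
    // Completed containers may be reported more than once across a resync,
    // so consumers should de-duplicate by ContainerId.
    return response;
  }
}
{code}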
[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002873#comment-14002873 ] Rohith commented on YARN-1366: -- Adding to the above point: enforce in AMRMClient that unregistration cannot be called before registering. ApplicationMasterService should Resync with the AM upon allocate call after restart --- Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
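Such a client-side guard could be as small as the following sketch (illustrative only, with assumed field and delegate names; it is not the AMRMClient implementation):
{code}
// Enforce the register-before-unregister contract at the client.
private boolean registered = false;

public synchronized RegisterApplicationMasterResponse registerApplicationMaster(
    String appHostName, int appHostPort, String appTrackingUrl)
    throws YarnException, IOException {
  RegisterApplicationMasterResponse response =
      client.registerApplicationMaster(appHostName, appHostPort, appTrackingUrl);
  registered = true;
  return response;
}

public synchronized void unregisterApplicationMaster(FinalApplicationStatus appStatus,
    String appMessage, String appTrackingUrl) throws YarnException, IOException {
  if (!registered) {
    throw new IllegalStateException(
        "unregisterApplicationMaster called before registerApplicationMaster");
  }
  client.unregisterApplicationMaster(appStatus, appMessage, appTrackingUrl);
}
{code}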
[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002875#comment-14002875 ] Gera Shegalov commented on YARN-1897: - I am confused, [~mingma]. I thought we agreed to do it as YARN-1515. Define SignalContainerRequest and SignalContainerResponse - Key: YARN-1897 URL: https://issues.apache.org/jira/browse/YARN-1897 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, YARN-1897.1.patch We need to define SignalContainerRequest and SignalContainerResponse first as they are needed by other sub-tasks. SignalContainerRequest should use OS-independent commands and provide a way for the application to specify a reason for diagnosis. SignalContainerResponse might be empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2077) JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of too much CPUs
[ https://issues.apache.org/jira/browse/YARN-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2077: - Affects Version/s: 2.4.0 JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of too much CPUs Key: YARN-2077 URL: https://issues.apache.org/jira/browse/YARN-2077 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2077.1.patch JobImpl#makeUberDecision usually logs why the job cannot be launched in Uber mode (e.g. too much RAM). The CPU condition, however, is currently not logged. We should log it when there are too many CPUs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
Tsuyoshi OZAWA created YARN-2078: Summary: yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented Key: YARN-2078 URL: https://issues.apache.org/jira/browse/YARN-2078 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial We should document the conditions under which uber mode is enabled. If not, users need to read the code. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
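For context, the memory and CPU checks quoted above are only two of the conditions that get combined; paraphrasing JobImpl#makeUberDecision (simplified, so consult the source for the authoritative set), the decision looks roughly like:
{code}
// Rough paraphrase of JobImpl#makeUberDecision. The uber "slot" limits come
// from the MRAppMaster's own container size, i.e.
// yarn.app.mapreduce.am.resource.mb and yarn.app.mapreduce.am.resource.cpu-vcores,
// which is why those settings affect uber mode and deserve documentation.
boolean uberEnabled =
    conf.getBoolean(MRJobConfig.JOB_UBERTASK_ENABLE, false);
boolean smallNumMapTasks = (numMapTasks <= sysMaxMaps);          // mapreduce.job.ubertask.maxmaps
boolean smallNumReduceTasks = (numReduceTasks <= sysMaxReduces); // mapreduce.job.ubertask.maxreduces
boolean smallInput = (dataInputLength <= sysMaxBytes);           // mapreduce.job.ubertask.maxbytes

isUber = uberEnabled && smallNumMapTasks && smallNumReduceTasks
    && smallInput && smallMemory && smallCpu;
{code}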
[jira] [Updated] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
[ https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2078: - Attachment: YARN-2078.1.patch yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented -- Key: YARN-2078 URL: https://issues.apache.org/jira/browse/YARN-2078 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2078.1.patch We should document the conditions under which uber mode is enabled. If not, users need to read the code. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
[ https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2078: - Component/s: documentation yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented -- Key: YARN-2078 URL: https://issues.apache.org/jira/browse/YARN-2078 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2078.1.patch We should document the conditions under which uber mode is enabled. If not, users need to read the code. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
[ https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2078: - Affects Version/s: 2.4.0 yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented -- Key: YARN-2078 URL: https://issues.apache.org/jira/browse/YARN-2078 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2078.1.patch We should document the conditions under which uber mode is enabled. If not, users need to read the code. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2077) JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of too much CPUs
[ https://issues.apache.org/jira/browse/YARN-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2077: - Component/s: client JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of too much CPUs Key: YARN-2077 URL: https://issues.apache.org/jira/browse/YARN-2077 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.4.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2077.1.patch JobImpl#makeUberDecision usually logs why the job cannot be launched in Uber mode (e.g. too much RAM). The CPU condition, however, is currently not logged. We should log it when there are too many CPUs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated YARN-2030: Attachment: YARN-2030.v1.patch Attaching patch. This is a code refactor; TestFSRMStateStore and TestZKRMStateStore already cover the code here, so no additional test is added. Use StateMachine to simplify handleStoreEvent() in RMStateStore --- Key: YARN-2030 URL: https://issues.apache.org/jira/browse/YARN-2030 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Assignee: Binglin Chang Attachments: YARN-2030.v1.patch Now the logic to handle different store events in handleStoreEvent() is as follows: {code} if (event.getType().equals(RMStateStoreEventType.STORE_APP) || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) { ... if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { ... } else { ... } ... try { if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { ... } else { ... } } ... } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT) || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) { ... if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { ... } else { ... } ... if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { ... } else { ... } } ... } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) { ... } else { ... } } {code} This not only confuses people but also easily leads to mistakes. We may leverage a state machine to simplify this, even with no state transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
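The direction such a refactor could take, using the StateMachineFactory infrastructure already in yarn-common (a sketch only; the state and transition names are illustrative, see the attached patch for the real set):
{code}
// With a single pseudo-state, the factory replaces the nested if/else chains:
// each event type gets exactly one transition, and each transition class holds
// the logic that used to live in one branch of handleStoreEvent().
private static final StateMachineFactory<RMStateStore, RMStateStoreState,
    RMStateStoreEventType, RMStateStoreEvent> stateMachineFactory =
  new StateMachineFactory<RMStateStore, RMStateStoreState,
      RMStateStoreEventType, RMStateStoreEvent>(RMStateStoreState.DEFAULT)
    .addTransition(RMStateStoreState.DEFAULT, RMStateStoreState.DEFAULT,
        RMStateStoreEventType.STORE_APP, new StoreAppTransition())
    .addTransition(RMStateStoreState.DEFAULT, RMStateStoreState.DEFAULT,
        RMStateStoreEventType.UPDATE_APP, new UpdateAppTransition())
    .addTransition(RMStateStoreState.DEFAULT, RMStateStoreState.DEFAULT,
        RMStateStoreEventType.REMOVE_APP, new RemoveAppTransition())
    .installTopology();

private static class StoreAppTransition
    implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
  @Override
  public void transition(RMStateStore store, RMStateStoreEvent event) {
    // store the application state; notify the dispatcher on success or failure
  }
}
{code}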
[jira] [Assigned] (YARN-2051) Add more unit tests for PBImpl that didn't get covered
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned YARN-2051: --- Assignee: Binglin Chang Add more unit tests for PBImpl that didn't get covered -- Key: YARN-2051 URL: https://issues.apache.org/jira/browse/YARN-2051 Project: Hadoop YARN Issue Type: Test Reporter: Junping Du Assignee: Binglin Chang Priority: Critical From YARN-2016, we can see that bugs could exist in the PB implementations of the protocol. The bad news is that most of these PBImpls don't have any unit test to verify that the info is not lost or changed after serialization/deserialization. We should add more tests for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002937#comment-14002937 ] Hudson commented on YARN-2053: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5606 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5606/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2066) Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()
[ https://issues.apache.org/jira/browse/YARN-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002941#comment-14002941 ] Hudson commented on YARN-2066: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5606 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5606/]) YARN-2066. Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder (Contributed by Hong Zhiguo) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595413) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestGetApplicationsRequest.java Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder() --- Key: YARN-2066 URL: https://issues.apache.org/jira/browse/YARN-2066 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Hong Zhiguo Priority: Minor Fix For: 2.4.1 Attachments: YARN-2066.patch {code} if (this.finish != null) { builder.setFinishBegin(start.getMinimumLong()); builder.setFinishEnd(start.getMaximumLong()); } {code} this.finish should be referenced in the if block. -- This message was sent by Atlassian JIRA (v6.2#6252)
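For clarity, the fix implied by the description is to read both bounds from {{this.finish}} instead of {{start}} — roughly:
{code}
// Corrected: the finish-time range must come from this.finish, not start.
if (this.finish != null) {
  builder.setFinishBegin(finish.getMinimumLong());
  builder.setFinishEnd(finish.getMaximumLong());
}
{code}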
[jira] [Commented] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
[ https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002981#comment-14002981 ] Hadoop QA commented on YARN-2078: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645748/YARN-2078.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3768//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3768//console This message is automatically generated. yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented -- Key: YARN-2078 URL: https://issues.apache.org/jira/browse/YARN-2078 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2078.1.patch We should document the conditions under which uber mode is enabled. If not, users need to read the code. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2077) JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of too much CPUs
[ https://issues.apache.org/jira/browse/YARN-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002983#comment-14002983 ] Hadoop QA commented on YARN-2077: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645746/YARN-2077.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3767//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3767//console This message is automatically generated. JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of too much CPUs Key: YARN-2077 URL: https://issues.apache.org/jira/browse/YARN-2077 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.4.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2077.1.patch JobImpl#makeUberDecision usually logs why the job cannot be launched in Uber mode (e.g. too much RAM). The CPU condition, however, is currently not logged. We should log it when there are too many CPUs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003009#comment-14003009 ] Hadoop QA commented on YARN-1897: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645735/YARN-1897-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3771//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3771//console This message is automatically generated. Define SignalContainerRequest and SignalContainerResponse - Key: YARN-1897 URL: https://issues.apache.org/jira/browse/YARN-1897 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, YARN-1897.1.patch We need to define SignalContainerRequest and SignalContainerResponse first as they are needed by other sub-tasks. SignalContainerRequest should use OS-independent commands and provide a way for the application to specify a reason for diagnosis. SignalContainerResponse might be empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk
[ https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003027#comment-14003027 ] Hadoop QA commented on YARN-2075: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645730/YARN-2075.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3769//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3769//console This message is automatically generated. TestRMAdminCLI consistently fail on trunk - Key: YARN-2075 URL: https://issues.apache.org/jira/browse/YARN-2075 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Attachments: YARN-2075.patch {code} Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec <<< FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.082 sec <<< ERROR! java.lang.UnsupportedOperationException: null at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.remove(AbstractCollection.java:252) at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173) at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180) testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.088 sec <<< FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003035#comment-14003035 ] Hadoop QA commented on YARN-941: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645713/YARN-941.preview.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestRMAdminCLI {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3770//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3770//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3770//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3770//console This message is automatically generated. RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Xuan Gong Attachments: YARN-941.preview.2.patch, YARN-941.preview.3.patch, YARN-941.preview.patch When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003208#comment-14003208 ] Hadoop QA commented on YARN-2030: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645754/YARN-2030.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3772//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3772//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3772//console This message is automatically generated. Use StateMachine to simplify handleStoreEvent() in RMStateStore --- Key: YARN-2030 URL: https://issues.apache.org/jira/browse/YARN-2030 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Assignee: Binglin Chang Attachments: YARN-2030.v1.patch Now the logic to handle different store events in handleStoreEvent() is as follows: {code} if (event.getType().equals(RMStateStoreEventType.STORE_APP) || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) { ... if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { ... } else { ... } ... try { if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { ... } else { ... } } ... } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT) || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) { ... if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { ... } else { ... } ... if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { ... } else { ... } } ... } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) { ... } else { ... } } {code} This not only confuses people but also easily leads to mistakes. We may leverage a state machine to simplify this, even with no state transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2066) Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()
[ https://issues.apache.org/jira/browse/YARN-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003248#comment-14003248 ] Hudson commented on YARN-2066: -- FAILURE: Integrated in Hadoop-Yarn-trunk #562 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/562/]) YARN-2066. Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder (Contributed by Hong Zhiguo) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595413) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestGetApplicationsRequest.java Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder() --- Key: YARN-2066 URL: https://issues.apache.org/jira/browse/YARN-2066 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Hong Zhiguo Priority: Minor Fix For: 2.4.1 Attachments: YARN-2066.patch {code} if (this.finish != null) { builder.setFinishBegin(start.getMinimumLong()); builder.setFinishEnd(start.getMaximumLong()); } {code} this.finish should be referenced in the if block. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003257#comment-14003257 ] Hudson commented on YARN-2053: -- FAILURE: Integrated in Hadoop-Yarn-trunk #562 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/562/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003292#comment-14003292 ] Hudson commented on YARN-2053: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1754 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1754/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2066) Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()
[ https://issues.apache.org/jira/browse/YARN-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003283#comment-14003283 ] Hudson commented on YARN-2066: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1754 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1754/]) YARN-2066. Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder (Contributed by Hong Zhiguo) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595413) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestGetApplicationsRequest.java Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder() --- Key: YARN-2066 URL: https://issues.apache.org/jira/browse/YARN-2066 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Hong Zhiguo Priority: Minor Fix For: 2.4.1 Attachments: YARN-2066.patch {code} if (this.finish != null) { builder.setFinishBegin(start.getMinimumLong()); builder.setFinishEnd(start.getMaximumLong()); } {code} this.finish should be referenced in the if block. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart
Jason Lowe created YARN-2079: Summary: Recover NonAggregatingLogHandler state upon nodemanager restart Key: YARN-2079 URL: https://issues.apache.org/jira/browse/YARN-2079 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.0 Reporter: Jason Lowe The state of NonAggregatingLogHandler needs to be persisted so logs are properly deleted across a nodemanager restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2066) Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()
[ https://issues.apache.org/jira/browse/YARN-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003460#comment-14003460 ] Hudson commented on YARN-2066: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1780 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/]) YARN-2066. Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder (Contributed by Hong Zhiguo) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595413) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestGetApplicationsRequest.java Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder() --- Key: YARN-2066 URL: https://issues.apache.org/jira/browse/YARN-2066 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Hong Zhiguo Priority: Minor Fix For: 2.4.1 Attachments: YARN-2066.patch {code} if (this.finish != null) { builder.setFinishBegin(start.getMinimumLong()); builder.setFinishEnd(start.getMaximumLong()); } {code} this.finish should be referenced in the if block. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003469#comment-14003469 ] Hudson commented on YARN-2053: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1780 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/]) YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from previous attempts for work-preserving AM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595116) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003440#comment-14003440 ] bc Wong commented on YARN-941: -- Hi [~xgong], thanks for the patch! I'm interested in talking through the changes and their security implications, for everybody who's following along. I think the following are worth highlighting: # The token update mechanism is via the AM heartbeat. So if the previous AMRM token has been compromised, the attacker can get the new token. ** I don't think it's a big problem as the RM will only hand out the new token in _exactly_ one AllocateResponse (except for the case of RM restart). So if the attacker has the new token, the real AM won't, and it'll die and the token will get revoked. # How frequently a running AM gets an updated token is at the mercy of the configuration (the roll interval and activation delay). In addition, whenever the RM restarts, all AMs will get a new token on the next heartbeat. ** Should the RM check that the roll interval and activation delay are both shorter than the token expiration interval? # The client app is not responsible for renewing the token. The RM will renew it proactively and update the apps. ** The loss of control may be inconvenient to the app. The AM must also heartbeat frequently enough to catch the update in time. In practice, it's not an issue. But it still makes me slightly uncomfortable, since the client is usually the one renewing its credentials in all other security protocols I know of. Here, the RM doesn't have any explicit logic to update an AMRM token before it expires. The math just generally works out if the admin sets the token expiry, roll interval and activation delay to the right values.\\ \\ Again, I think this is better than making it the AM's responsibility to get a new token, which puts more burden on the AM. I just want to bring this up for discussion. RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Xuan Gong Attachments: YARN-941.preview.2.patch, YARN-941.preview.3.patch, YARN-941.preview.patch When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
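For those following along, the AM-side pickup under this design would be roughly the sketch below. It assumes the heartbeat response exposes the rolled token via something like {{getAMRMToken()}} — an assumption based on the preview patches, not a settled API:
{code}
// After each heartbeat, adopt a rolled AMRM token if the RM handed one out.
AllocateResponse response = amRmProtocol.allocate(allocateRequest);
org.apache.hadoop.yarn.api.records.Token rolled = response.getAMRMToken();
if (rolled != null) {
  Token<AMRMTokenIdentifier> amrmToken = new Token<AMRMTokenIdentifier>(
      rolled.getIdentifier().array(), rolled.getPassword().array(),
      new Text(rolled.getKind()), new Text(rolled.getService()));
  // Replace the token in the current UGI so subsequent RPCs authenticate
  // with the new secret before the old one expires.
  UserGroupInformation.getCurrentUser().addToken(amrmToken);
}
{code}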
[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003489#comment-14003489 ] Junping Du commented on YARN-1338: -- [~jlowe], thanks again for your patch here! A few comments so far: One question in general: besides the null store and the leveldb store, I saw a memory store implemented there but no usage so far. Does it help in some scenario, or is it only for test purposes? In NodeManager#serviceInit() {code} if (recoveryEnabled) { ... + nmStore = new NMLeveldbStateStoreService(); +} else { + nmStore = new NMNullStateStoreService(); } +nmStore.init(conf); +nmStore.start(); {code} Can we abstract the code starting from the if block into a method, something like initializeNMStore(conf)? That would make NodeManager#serviceInit() simpler. In yarn_server_nodemanager_recovery.proto, {code} +message LocalizedResourceProto { + optional LocalResourceProto resource = 1; + optional string localPath = 2; + optional int64 size = 3; +} {code} Does size here represent the size of the local resource? If so, it may duplicate the size within LocalResourceProto. In ResourceLocalizationService.java {code} + //Recover localized resources after an NM restart + public void recoverLocalizedResources(RecoveredLocalizationState state) + throws URISyntaxException { + ... + for (Map.Entry<ApplicationId, LocalResourceTrackerState> appEntry : + userResources.getAppTrackerStates().entrySet()) { +ApplicationId appId = appEntry.getKey(); +... +recoverTrackerResources(tracker, appEntry.getValue()); + } +} + } {code} Maybe we should check that appResourceState (appEntry.getValue())'s localizedResources and inProgressResources are not empty before recovering it, as we check for userResourceState? In NMMemoryStateStoreService#loadLocalizationState() {code} ... +if (tk.appId == null) { + rur.privateTrackerState = loadTrackerState(ts); +} else { + rur.appTrackerStates.put(tk.appId, loadTrackerState(ts)); +} ... {code} Maybe even in the case tk.appId != null, we should load the private resource state as well? Given that the patch is quite big, I haven't finished my review although I have walked through it a few times. More comments may come later. Recover localized resource cache state upon nodemanager restart --- Key: YARN-1338 URL: https://issues.apache.org/jira/browse/YARN-1338 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1338.patch, YARN-1338v2.patch, YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch Today when node manager restarts we clean up all the distributed cache files from disk. This is definitely not ideal from 2 aspects. * For work preserving restart we definitely want them as running containers are using them * For even non work preserving restart this will be useful in the sense that we don't have to download them again if needed by future tasks. -- This message was sent by Atlassian JIRA (v6.2#6252)
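The serviceInit suggestion above amounts to something like this sketch (the method name initializeNMStore comes from the comment itself; the body just relocates the quoted lines):
{code}
// Suggested extraction: keep NodeManager#serviceInit() short by moving the
// store selection and startup into one helper.
private void initializeNMStore(Configuration conf) throws IOException {
  if (recoveryEnabled) {
    nmStore = new NMLeveldbStateStoreService();
  } else {
    nmStore = new NMNullStateStoreService();
  }
  nmStore.init(conf);
  nmStore.start();
}
{code}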
[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext
[ https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003548#comment-14003548 ] Jason Lowe commented on YARN-2050: -- +1 lgtm. Committing this. Fix LogCLIHelpers to create the correct FileContext --- Key: YARN-2050 URL: https://issues.apache.org/jira/browse/YARN-2050 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-2050-2.patch, YARN-2050.patch LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus the FileContext created isn't necessarily the FileContext for remote log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2076) Minor error in TestLeafQueue files
[ https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2076: -- Attachment: YARN-2076.patch Minor error in TestLeafQueue files -- Key: YARN-2076 URL: https://issues.apache.org/jira/browse/YARN-2076 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Chen He Assignee: Chen He Priority: Minor Labels: test Attachments: YARN-2076.patch numNodes should be 2 instead of 3 in testReservationExchange() since only two nodes are defined. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003675#comment-14003675 ] Hadoop QA commented on YARN-1680: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645816/YARN-1680.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3773//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3773//console This message is automatically generated. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes but returns an availableResources value that includes their free memory). -- This message was sent by Atlassian JIRA (v6.2#6252)
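The remedy under discussion is to subtract the blacklisted nodes' free resources from the headroom the RM reports in the heartbeat. A rough sketch, with {{blacklistedNodesOf()}} as an assumed helper (this is not the attached patch):
{code}
// Report headroom that excludes capacity the AM can never use because it
// blacklisted the nodes holding it; clamp at zero to avoid negative headroom.
Resource headroom = Resources.clone(application.getHeadroom());
for (SchedulerNode node : blacklistedNodesOf(application)) {
  Resources.subtractFrom(headroom, node.getAvailableResource());
}
if (headroom.getMemory() < 0) {
  headroom.setMemory(0);
}
if (headroom.getVirtualCores() < 0) {
  headroom.setVirtualCores(0);
}
allocateResponse.setAvailableResources(headroom);
{code}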
[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003678#comment-14003678 ] Bikas Saha commented on YARN-1366: -- bq. If there's no RM restart, a normal app only calling unregister without calling register earlier will be just deemed as FINISHED ? is this acceptable? bq. What about storing information on zk for registered application. Catching incorrect unregistration before registration should have always been there. Is this a regression in the patch or an existing bug? Should we consider the possibility of allowing unregister without register? What are the downsides? As long as we can make sure that unregister is coming from the latest version of the app. ApplicationMasterService should Resync with the AM upon allocate call after restart --- Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.002.patch Added ApplicationMasterService changes to send SHUTDOWN for an attempt that's not known, and RESYNC for allocate if the AM has not registered after restart. Added more unit tests that verify these. Still pending: how to handle unregister after restart for an unregistered AM. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2076) Minor error in TestLeafQueue files
[ https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003703#comment-14003703 ] Hadoop QA commented on YARN-2076: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645818/YARN-2076.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3774//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3774//console This message is automatically generated. Minor error in TestLeafQueue files -- Key: YARN-2076 URL: https://issues.apache.org/jira/browse/YARN-2076 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Chen He Assignee: Chen He Priority: Minor Labels: test Attachments: YARN-2076.patch numNodes should be 2 instead of 3 in testReservationExchange() since only two nodes are defined. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003782#comment-14003782 ] Hadoop QA commented on YARN-1365: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645826/YARN-1365.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3775//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3775//console This message is automatically generated. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003796#comment-14003796 ] Marcelo Vanzin commented on YARN-941: - Apologies for jumping in the middle of the conversation. I don't have a lot of background in the YARN code here, but from this bug and some internal discussions I have a question for people who are more familiar with this code: What is the purpose of this renewal mechanism? So far it seems to me that it's an attack mitigation feature. An attacker who is able to get the token would only be able to use it while the original application (i) is running and (ii) keeps renewing the token. If that's true, it sounds to me like the problem is actually that it's possible to sniff the token in the first place. Wouldn't it be better, at that point, to have a protocol that doesn't allow that? Either using full-blown encryption for the RPC channels, or if that's deemed too expensive, some mechanism where tokens are negotiated instead of sent in plain text over the wire. RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Xuan Gong Attachments: YARN-941.preview.2.patch, YARN-941.preview.3.patch, YARN-941.preview.patch When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1935) Security for timeline server
[ https://issues.apache.org/jira/browse/YARN-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1935: -- Attachment: Timeline_Kerberos_DT_ACLs.2.patch Timeline Security Diagram.pdf Hi folks, I've just attached a diagram Timeline Security Diagram.pdf to demonstrate the rough workflow of the timeline security. In general, it consists of two parts: authentication and authorization. *1. Authentication* a) When authentication is enabled, a customized authentication filter will be loaded into the webapp of the timeline server, which prevents unauthorized users from accessing any timeline web resources. The filter allows users to: * negotiate the authentication via HTTP SPNEGO, and log in with a Kerberos principal and keytab; and * request a delegation token after Kerberos login and use it for follow-up secured communication. b) TimelineClient is adapted to pass the authentication before putting the timeline data. It can choose to append the Kerberos token or a delegation token to the HTTP request. The rationale behind supporting delegation tokens is to allow the AM and other containers, where Kerberos credentials are not available, to use TimelineClient to put the timeline data in a secured manner. c) TimelineClient also has an API to get a delegation token from the timeline server (actually from the customized authentication filter). When security and the timeline service are both enabled and YarnClient is used to submit an application, YarnClient will automatically call TimelineClient to get a delegation token and put it into the application submission context, such that the AM can use the passed-in delegation token to communicate with the timeline server securely. d) Any tool that supports SPNEGO/Kerberos, such as Firefox or curl, can access the three GET APIs of the timeline server to query the timeline data. *2. Authorization* Once the request from an authenticated user passes the customized authentication filter, it will be processed by the timeline web services. Here we use the ACLs manager to determine whether the user of the request has access to the requested data. The basic rules are as follows: * The access control granularity is the entity, which means a user can either access all the information of an entity and its events, or none of it. * Currently we only allow the owner of the entity to access it. In the future, we can simply extend the rule to allow admins and users/groups on the access control list. *Configuration* Finally, to enable timeline security, we need to set up Kerberos. In addition, there are a number of configurations to set: * Make use of the filter initializer to set up the customized authentication filter; the configuration is much like the hadoop-auth style; and * ACLs are controlled by the YARN ACLs configuration, as for other YARN daemons. I also uploaded my newest uber patch Timeline_Kerberos_DT_ACLs.2.patch to demonstrate how the design is implemented. Security for timeline server Key: YARN-1935 URL: https://issues.apache.org/jira/browse/YARN-1935 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Zhijie Shen Attachments: Timeline Security Diagram.pdf, Timeline_Kerberos_DT_ACLs.2.patch, Timeline_Kerberos_DT_ACLs.patch Jira to track work to secure the ATS -- This message was sent by Atlassian JIRA (v6.2#6252)
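To make step c) concrete, a small hedged sketch of fetching a timeline delegation token from the client side, assuming the TimelineClient API described above (createTimelineClient plus a getDelegationToken(renewer) call) and a Kerberos login that has already happened; the renewer principal below is a placeholder.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineTokenFetch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // Assumes Kerberos login has already happened (e.g. via kinit or keytab).
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      // Placeholder renewer principal; per the description, YarnClient would
      // do this at submission time and stash the token in the app
      // submission context for the AM to use.
      Token<?> token = client.getDelegationToken("rm/_HOST@EXAMPLE.COM");
      System.out.println("Timeline delegation token: " + token);
    } finally {
      client.stop();
    }
  }
}
{code}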
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1709: --- Description: This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. (was: This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The inventory subsystem is conceptually a plan of how the capacity scheduler will be configured over-time.) Summary: Admission Control: Reservation subsystem (was: Admission Control: inventory subsystem) Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
Subramaniam Krishnan created YARN-2080: -- Summary: Admission Control: Integrate Reservation subsystem with ResourceManager Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2080: --- Description: This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. (was: This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time.) Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-2074: - Assignee: Jian He (was: Vinod Kumar Vavilapalli) Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003892#comment-14003892 ] Jian He commented on YARN-2074: --- I'd like to work on this. Taking this over. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003960#comment-14003960 ] Vinod Kumar Vavilapalli commented on YARN-2074: --- [~sunilg], Agree that as much as possible we should avoid killing the AM during preemption, so we should look at YARN-2022 orthogonally. This one focuses only on the point that in the case where this cannot be avoided, it shouldn't be counted towards AM failures. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1569) For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, SchedulerEvent should get checked (instanceof) for appropriate type before casting
[ https://issues.apache.org/jira/browse/YARN-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-1569: Attachment: yarn-1569.patch For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, SchedulerEvent should get checked (instanceof) for appropriate type before casting - Key: YARN-1569 URL: https://issues.apache.org/jira/browse/YARN-1569 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Junping Du Assignee: zhihai xu Priority: Minor Labels: newbie Attachments: yarn-1569.patch Per http://wiki.apache.org/hadoop/CodeReviewChecklist, we should always check for the appropriate type before casting. handle(SchedulerEvent) in FifoScheduler and CapacityScheduler doesn't check so far (no bug there now) but should be improved to match FairScheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
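For illustration, a hedged sketch of the guarded-cast pattern the checklist calls for, along the lines of what FairScheduler already does. The event classes and accessors follow the real scheduler event types, but the handler body is abbreviated, and addNode/removeNode/LOG are assumed surrounding context rather than quoted code.
{code}
// Inside a scheduler's handle(SchedulerEvent event) method:
switch (event.getType()) {
case NODE_ADDED:
  if (!(event instanceof NodeAddedSchedulerEvent)) {
    throw new RuntimeException("Unexpected event type: " + event);
  }
  NodeAddedSchedulerEvent nodeAddedEvent = (NodeAddedSchedulerEvent) event;
  addNode(nodeAddedEvent.getAddedRMNode());
  break;
case NODE_REMOVED:
  if (!(event instanceof NodeRemovedSchedulerEvent)) {
    throw new RuntimeException("Unexpected event type: " + event);
  }
  removeNode(((NodeRemovedSchedulerEvent) event).getRemovedRMNode());
  break;
default:
  LOG.error("Unknown event arrived at scheduler: " + event.toString());
}
{code}
The instanceof guard costs one check per event but turns a silent ClassCastException into an explicit, diagnosable failure when an event is mis-routed.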
[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003994#comment-14003994 ] Vinod Kumar Vavilapalli commented on YARN-1938: --- Looks good to me too. Can you add the new configs into yarn-default.xml? Kerberos authentication for the timeline server --- Key: YARN-1938 URL: https://issues.apache.org/jira/browse/YARN-1938 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1938.1.patch, YARN-1938.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1938: -- Target Version/s: 2.5.0 Kerberos authentication for the timeline server --- Key: YARN-1938 URL: https://issues.apache.org/jira/browse/YARN-1938 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1938.1.patch, YARN-1938.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1338) Recover localized resource cache state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1338: - Attachment: YARN-1338v5.patch Thanks for the review, Junping! Attaching a patch to address your comments with specific responses below. bq. beside null store and a leveled store, I saw a memory store implemented there but no usage so far. Does it helps in some scenario or only for test purpose? It's only for use in unit tests, which is why it's located under src/test/. It stores state in the memory of the JVM itself, so it's not very useful for real-world recovery scenarios. The state is lost when the NM crashes/exits. bq. Can we abstract code since if block into a method, something like: initializeNMStore(conf)? which can make NodeManager#serviceInit() simpler. Done. bq. Does size here represent for size of local resource? If so, may be duplicated with the size within LocalResourceProto? As I understand it, they are slightly different. The size in the LocalResourceProto is the size of the resource that will be downloaded, while the size in LocalizedResource (and also persisted in LocalizedResourceProto) is the size of the resource on the local disk. These can be different if the resource is uncompressed/unarchived after downloading (e.g.: a .tar.gz resource). bq. May be we should check appResourceState(appEntry.getValue)’s localizedResources and inProgressResources is not empty before recover it as we check for userResourceState? Done. I also added a LocalResourceTrackerState#isEmpty method to make the code a bit cleaner. bq. May be even in case tk.appId !=null, we should load private resource state as well? No, if tk.appId is not null then this is state for an app-specific resource tracker and not for a private resource tracker. See the javadoc for NMStateStoreService#startResourceLocalization or NMStateStoreService#finishResourceLocalization for some hints, and I also added some comments to the NMMemoryStateStoreService to clarify how the user and appId are used to discern public vs. private vs. app-specific trackers. Recover localized resource cache state upon nodemanager restart --- Key: YARN-1338 URL: https://issues.apache.org/jira/browse/YARN-1338 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1338.patch, YARN-1338v2.patch, YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch, YARN-1338v5.patch Today when node manager restarts we clean up all the distributed cache files from disk. This is definitely not ideal from 2 aspects. * For work preserving restart we definitely want them as running containers are using them * For even non work preserving restart this will be useful in the sense that we don't have to download them again if needed by future tasks. -- This message was sent by Atlassian JIRA (v6.2#6252)
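The public vs. private vs. app-specific distinction in the last answer can be summarized in a small sketch. The (user, appId) pair mirrors what the comment describes being stored with each recovered tracker key; the class and method names here are placeholders, not the actual NM state-store types.
{code}
public class TrackerKind {
  // Which local resource tracker a recovered resource belongs to, derived
  // from the (user, appId) pair stored with it.
  static String trackerFor(String user, String appId) {
    if (appId != null) {
      return "app-specific tracker for " + user + "/" + appId;
    }
    if (user != null) {
      return "private tracker for user " + user;
    }
    return "public (shared) tracker";
  }

  public static void main(String[] args) {
    System.out.println(trackerFor(null, null));          // public cache
    System.out.println(trackerFor("alice", null));       // alice's private cache
    System.out.println(trackerFor("alice", "app_0001")); // app-specific cache
  }
}
{code}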
[jira] [Commented] (YARN-1569) For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, SchedulerEvent should get checked (instanceof) for appropriate type before casting
[ https://issues.apache.org/jira/browse/YARN-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004063#comment-14004063 ] Hadoop QA commented on YARN-1569: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645863/yarn-1569.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3777//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3777//console This message is automatically generated. For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, SchedulerEvent should get checked (instanceof) for appropriate type before casting - Key: YARN-1569 URL: https://issues.apache.org/jira/browse/YARN-1569 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Junping Du Assignee: zhihai xu Priority: Minor Labels: newbie Attachments: yarn-1569.patch Per http://wiki.apache.org/hadoop/CodeReviewChecklist, we should always check for the appropriate type before casting. handle(SchedulerEvent) in FifoScheduler and CapacityScheduler doesn't check so far (no bug there now) but should be improved to match FairScheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2049) Delegation token stuff for the timeline server
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2049: -- Attachment: YARN-2049.3.patch I created a new patch, which will no longer rely on HADOOP-10596, given it is still arguable how we should fix initSpnego of HttpServer2. In this patch, I worked around it by using the filter initializer approach introduced by hadoop-auth to load TimelineAuthenticationFilter, though it is not consistent with the existing YARN-style SPNEGO configuration. Hopefully folks are fine with the workaround to make the timeline security available ASAP. Delegation token stuff for the timeline server - Key: YARN-2049 URL: https://issues.apache.org/jira/browse/YARN-2049 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2074: -- Attachment: YARN-2074.2.patch Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1938: -- Attachment: YARN-1938.3.patch Thanks for the review, Vinod and Varun. I added the configs to yarn-default.xml as well in the newest patch. Kerberos authentication for the timeline server --- Key: YARN-1938 URL: https://issues.apache.org/jira/browse/YARN-1938 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004138#comment-14004138 ] Hadoop QA commented on YARN-1938: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645907/YARN-1938.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3780//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3780//console This message is automatically generated. Kerberos authentication for the timeline server --- Key: YARN-1938 URL: https://issues.apache.org/jira/browse/YARN-1938 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004146#comment-14004146 ] Hadoop QA commented on YARN-2074: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645906/YARN-2074.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3781//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3781//console This message is automatically generated. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2073: --- Attachment: yarn-2073-1.patch Added a unit test - the test fails without the fix. Also, moved a bunch of helper code from TestFairScheduler to FairSchedulerTestBase. FairScheduler starts preempting resources even with free resources on the cluster - Key: YARN-2073 URL: https://issues.apache.org/jira/browse/YARN-2073 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-2073-0.patch, yarn-2073-1.patch Preemption should kick in only when the currently available slots don't match the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004166#comment-14004166 ] Ming Ma commented on YARN-1897: --- Chatted with Gera offline. The definition of the SignalContainer* APIs is needed for other subtasks including YARN-1515, so we will resolve the SignalContainer* API issues in this jira; after that is done, the other subtasks can continue. Here are a couple of open issues. 1. Support for a list of containers. The latest patch in this jira just supports a flat list of SignalContainerRequests, regardless of whether they target the same container or not. Gera's patch in YARN-1515 groups all commands for the same container together via signalContainerRequest.getSignals(). Either approach works. I don't have a strong preference either way, given the most common use case is a single container, although signalContainers is more consistent with startContainers. 2. Support for SIGTERM + delay + SIGKILL as used in stopContainers. The latest YARN-1515 patch introduces a Pause method so that containers can pause in between signals. We need something like that to support the YARN-1515 scenario, or we can provide some new SignalContainerCommand like sleep. Really appreciate any comments on this. Define SignalContainerRequest and SignalContainerResponse - Key: YARN-1897 URL: https://issues.apache.org/jira/browse/YARN-1897 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, YARN-1897.1.patch We need to define SignalContainerRequest and SignalContainerResponse first as they are needed by other sub tasks. SignalContainerRequest should use OS-independent commands and provide a way for the application to specify a reason for diagnosis. SignalContainerResponse might be empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
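To make open issue 1 concrete, a hedged sketch of the two request shapes under discussion; the SignalContainerCommand values and the string container ids are placeholders standing in for the real API types, not the committed design.
{code}
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SignalShapes {
  enum SignalContainerCommand { OUTPUT_THREAD_DUMP, GRACEFUL_SHUTDOWN, FORCEFUL_SHUTDOWN }

  public static void main(String[] args) {
    // Flat form (this jira's latest patch): one entry per (container, signal)
    // pair; duplicates of the same container id are allowed.
    List<String[]> flat = Arrays.asList(
        new String[] {"container_01", SignalContainerCommand.OUTPUT_THREAD_DUMP.name()},
        new String[] {"container_01", SignalContainerCommand.GRACEFUL_SHUTDOWN.name()});

    // Grouped form (YARN-1515's patch): all signals for one container kept
    // together, retrievable via something like getSignals().
    Map<String, List<SignalContainerCommand>> grouped =
        new TreeMap<String, List<SignalContainerCommand>>();
    grouped.put("container_01", Arrays.asList(
        SignalContainerCommand.OUTPUT_THREAD_DUMP,
        SignalContainerCommand.GRACEFUL_SHUTDOWN));

    System.out.println("flat entries: " + flat.size());
    System.out.println("grouped: " + grouped);
  }
}
{code}
The grouped form makes the SIGTERM + delay + SIGKILL sequence of open issue 2 easier to express per container, while the flat form keeps the wire protocol closer to startContainers.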
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004175#comment-14004175 ] Wei Yan commented on YARN-2073: --- [~kasha], if we move preemption-related test code to a separate .java file, we may also need to move the previous preemption-related test functions (testChoiceOfPreemptedContainers and testPreemptionDecision) to the new file. And as a next step, will we divide TestFairScheduler into several test files according to the different scheduler operations? FairScheduler starts preempting resources even with free resources on the cluster - Key: YARN-2073 URL: https://issues.apache.org/jira/browse/YARN-2073 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-2073-0.patch, yarn-2073-1.patch Preemption should kick in only when the currently available slots don't match the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004181#comment-14004181 ] Karthik Kambatla commented on YARN-2073: bq. we may also need to move the previous preemption-related test functions (testChoiceOfPreemptedContainers and testPreemptionDecision) to the new file Moving them might require slightly more work, and I was planning on doing that in a separate JIRA along with splitting the tests into multiple files. FairScheduler starts preempting resources even with free resources on the cluster - Key: YARN-2073 URL: https://issues.apache.org/jira/browse/YARN-2073 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-2073-0.patch, yarn-2073-1.patch Preemption should kick in only when the currently available slots don't match the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext
[ https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004198#comment-14004198 ] Hudson commented on YARN-2050: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5607 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5607/]) YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596310) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java Fix LogCLIHelpers to create the correct FileContext --- Key: YARN-2050 URL: https://issues.apache.org/jira/browse/YARN-2050 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 3.0.0, 2.5.0 Attachments: YARN-2050-2.patch, YARN-2050.patch LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus the FileContext created isn't necessarily the FileContext for remote log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2073: --- Attachment: yarn-2073-2.patch Thanks Wei. Updated patch to address the nits. FairScheduler starts preempting resources even with free resources on the cluster - Key: YARN-2073 URL: https://issues.apache.org/jira/browse/YARN-2073 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch Preemption should kick in only when the currently available slots don't match the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004241#comment-14004241 ] Hadoop QA commented on YARN-2073: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645920/yarn-2073-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3783//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3783//console This message is automatically generated. FairScheduler starts preempting resources even with free resources on the cluster - Key: YARN-2073 URL: https://issues.apache.org/jira/browse/YARN-2073 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch Preemption should kick in only when the currently available slots don't match the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004279#comment-14004279 ] Rohith commented on YARN-1366: -- bq. Catching incorrect unregistration before registration should have always been there. Is this a regression in the patch or an existing bug? This is not a bug in the existing code. Unregister in ApplicationMasterService checks whether the app is registered; otherwise it throws InvalidApplicationMasterRequestException. bq. Should we consider the possibility of allowing unregister without register? Yes, because we need to differentiate between an AM whose last heartbeat was sent to the RM before the RM restarted and that is now unregistering, versus an application master sending unregister without ever registering. ApplicationMasterService should Resync with the AM upon allocate call after restart --- Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Rohith Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004290#comment-14004290 ] Sandy Ryza commented on YARN-2073: -- There are some situations where preemption with free resources on the cluster is the right thing to do. For example, if I'm requesting 2 GB containers, I have no resources, and 100 nodes on the cluster each have 1GB remaining, containers should get preempted on my behalf. There are also cases arising from requests with strict locality - the cluster might have resources available because I'm waiting on a subset of nodes. (In this case, we'd probably want to make sure preemption only happens on the nodes being waited for; otherwise we'd kill containers needlessly). If the goal is to make sure that we aren't preempting on behalf of an application that's actually receiving resources, it might also be worth considering time-based approaches. E.g. only preempt on behalf of an application that hasn't received resources in some amount of time. FairScheduler starts preempting resources even with free resources on the cluster - Key: YARN-2073 URL: https://issues.apache.org/jira/browse/YARN-2073 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch Preemption should kick in only when the currently available slots don't match the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
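A hedged sketch of the time-based alternative floated at the end of the previous comment: treat an app as starved, and thus worth preempting on behalf of, only if it has outstanding demand and has been granted nothing for a configurable window. All names here are placeholders, not FairScheduler internals.
{code}
public class StarvationCheck {
  static class AppStatus {
    long outstandingDemandMb;   // resources the app is still asking for
    long lastAllocationTimeMs;  // last time this app was granted a container
  }

  // Preempt on behalf of an app only if it is actually being starved:
  // it wants more resources and has received none for the whole window.
  static boolean isStarved(AppStatus app, long nowMs, long windowMs) {
    return app.outstandingDemandMb > 0
        && (nowMs - app.lastAllocationTimeMs) > windowMs;
  }
}
{code}
This sidesteps the free-capacity question entirely: an app slowly filling 1GB slivers or waiting on strict-locality nodes keeps resetting its clock, so no preemption fires while it is actually making progress.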
[jira] [Updated] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated YARN-2030: Attachment: YARN-2030.v2.patch Attaching v2 patch to fix findbugs warnings. Use StateMachine to simplify handleStoreEvent() in RMStateStore --- Key: YARN-2030 URL: https://issues.apache.org/jira/browse/YARN-2030 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Assignee: Binglin Chang Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch Now the logic to handle different store events in handleStoreEvent() is as follows:
{code}
if (event.getType().equals(RMStateStoreEventType.STORE_APP)
    || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
  ...
  if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
    ...
  } else {
    ...
  }
  ...
  try {
    if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
      ...
    } else {
      ...
    }
  }
  ...
} else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
    || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
  ...
  if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
    ...
  } else {
    ...
  }
  ...
  if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
    ...
  } else {
    ...
  }
  ...
} else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
  ...
} else {
  ...
}
{code}
This not only confuses people but also easily leads to mistakes. We may leverage a state machine to simplify this, even if there are no state transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
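One hedged way to picture the proposed simplification: replace the chained equals() checks with a single table-driven dispatch keyed on the event type. This sketch uses a plain EnumMap rather than YARN's StateMachineFactory, and the enum values mirror the event types above while the Handler interface and bodies are placeholders.
{code}
import java.util.EnumMap;
import java.util.Map;

public class StoreDispatch {
  enum EventType { STORE_APP, UPDATE_APP, STORE_APP_ATTEMPT, UPDATE_APP_ATTEMPT, REMOVE_APP }

  interface Handler { void handle(Object event) throws Exception; }

  private final Map<EventType, Handler> handlers =
      new EnumMap<EventType, Handler>(EventType.class);

  StoreDispatch() {
    handlers.put(EventType.STORE_APP, new Handler() {
      public void handle(Object event) { /* store the app state */ }
    });
    handlers.put(EventType.UPDATE_APP, new Handler() {
      public void handle(Object event) { /* update the app state */ }
    });
    // ... remaining event types registered the same way
  }

  // Each event type resolves to exactly one handler; no nested type checks.
  void handleStoreEvent(EventType type, Object event) throws Exception {
    Handler h = handlers.get(type);
    if (h == null) {
      throw new IllegalStateException("Unknown event type: " + type);
    }
    h.handle(event);
  }
}
{code}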
[jira] [Commented] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute
[ https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004307#comment-14004307 ] Sandy Ryza commented on YARN-2012: --
{code}
+      defaultQueueName = "root." + defaultQueueName;
{code}
This should go inside the initializeFromXml method.
{code}
+    if (configuredQueues.get(FSQueueType.LEAF).contains(defaultQueueName)
+        || configuredQueues.get(FSQueueType.PARENT).contains(
+            defaultQueueName)) {
+      return defaultQueueName;
+    }
+  }
   return "root." + YarnConfiguration.DEFAULT_QUEUE_NAME;
{code}
I think it's a little confusing for the rule to fall back to default. Can we let this part be handled by the create logic in assignAppToQueue? Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute - Key: YARN-2012 URL: https://issues.apache.org/jira/browse/YARN-2012 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt Currently the 'default' rule in the queue placement policy, if applied, puts the app in the root.default queue. It would be great if we could make the 'default' rule optionally point to a different queue as the default queue. This queue should be an existing queue; if not, we fall back to the root.default queue, hence keeping this rule terminal. This default queue can be a leaf queue, or it can also be a parent queue if the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004315#comment-14004315 ] Karthik Kambatla commented on YARN-2073: Sandy - you make very good points. In other words, we want to have an absoluteMinSharePreemptionTimeout. Now, the question becomes whether we should express this as a separate timeout config or as a scaling factor that determines this absolute timeout for both min-share and fair-share. Also, we can make it a per-queue config or a single factor for the cluster. Eventually, we need a better story for preemption. Currently, it is like a spray gun: we preempt some resources and hope that helps this application. Instead, we should preempt resources that match the application's ask. In that case, this new config will be moot. FairScheduler starts preempting resources even with free resources on the cluster - Key: YARN-2073 URL: https://issues.apache.org/jira/browse/YARN-2073 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch Preemption should kick in only when the currently available slots don't match the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004335#comment-14004335 ] Junping Du commented on YARN-2030: -- Hi [~decster], thanks for taking on this effort. I will review your patch. Use StateMachine to simplify handleStoreEvent() in RMStateStore --- Key: YARN-2030 URL: https://issues.apache.org/jira/browse/YARN-2030 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Assignee: Binglin Chang Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch Now the logic to handle different store events in handleStoreEvent() is as follows:
{code}
if (event.getType().equals(RMStateStoreEventType.STORE_APP)
    || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
  ...
  if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
    ...
  } else {
    ...
  }
  ...
  try {
    if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
      ...
    } else {
      ...
    }
  }
  ...
} else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
    || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
  ...
  if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
    ...
  } else {
    ...
  }
  ...
  if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
    ...
  } else {
    ...
  }
  ...
} else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
  ...
} else {
  ...
}
{code}
This not only confuses people but also easily leads to mistakes. We may leverage a state machine to simplify this, even if there are no state transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2081) TestDistributedShell fails after YARN-1962
Hong Zhiguo created YARN-2081: - Summary: TestDistributedShell fails after YARN-1962 Key: YARN-2081 URL: https://issues.apache.org/jira/browse/YARN-2081 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2081: -- Attachment: YARN-2081.patch TestDistributedShell fails after YARN-1962 -- Key: YARN-2081 URL: https://issues.apache.org/jira/browse/YARN-2081 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2081.patch
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2051) Add more unit tests for PBImpl that didn't get covered
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004371#comment-14004371 ] Binglin Chang commented on YARN-2051: - I thought about this; most of the PB serde validation involves the following procedure: 1. set a property on the record using value v0; 2. get the proto object from the record; 3. create a new record from the proto object; 4. get the property from the new record as value v1, and validate v0 == v1. This can be automated for all set/get pairs: we just need to use reflection to find all get/set pairs of the record class and test each pair. By doing this, we save lots of testing code. In the future, when we add new properties to a record, there is no need to add or change the testing code :) Note: these records look like Java beans, but many of them do not follow strict Java bean conventions. I tried to leverage commons-beanutils, but it seems it is not flexible enough; we will post a patch soon. Add more unit tests for PBImpl that didn't get covered -- Key: YARN-2051 URL: https://issues.apache.org/jira/browse/YARN-2051 Project: Hadoop YARN Issue Type: Test Reporter: Junping Du Assignee: Binglin Chang Priority: Critical From YARN-2016, we can see that bugs could exist in the PB implementations of the protocols. The bad news is that most of these PBImpls don't have any unit test to verify the info is not lost or changed after serialization/deserialization. We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
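A hedged sketch of the reflection-driven round trip outlined in the comment above. Rebuilder stands in for the PBImpl record-to-proto-to-record step, and the getter/setter matching is simplified to same-named get/set pairs whose single parameter type matches the sample value (so primitive-typed setters are skipped here).
{code}
import java.lang.reflect.Method;

public class PBRoundTripCheck {
  interface Rebuilder<T> {
    // Steps 2-3: record -> proto -> new record (PBImpl-specific).
    T rebuild(T record) throws Exception;
  }

  // Steps 1 and 4 for every matching setX/getX pair on the record class.
  static <T> void checkAllPairs(Class<T> clazz, T record,
      Rebuilder<T> rebuilder, Object sampleValue) throws Exception {
    for (Method setter : clazz.getMethods()) {
      if (!setter.getName().startsWith("set")
          || setter.getParameterTypes().length != 1
          || !setter.getParameterTypes()[0].isInstance(sampleValue)) {
        continue;
      }
      Method getter = clazz.getMethod("get" + setter.getName().substring(3));
      setter.invoke(record, sampleValue);        // 1. set value v0
      T reborn = rebuilder.rebuild(record);      // 2-3. proto round trip
      Object v1 = getter.invoke(reborn);         // 4. get value v1
      if (!sampleValue.equals(v1)) {
        throw new AssertionError(getter.getName() + " lost value: "
            + sampleValue + " != " + v1);
      }
    }
  }
}
{code}
A real test would loop over sample values of every property type (and, as noted, cope with the records that bend the bean conventions), but the skeleton above is the whole idea: one generic check instead of hand-written assertions per field.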
[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004373#comment-14004373 ] Hong Zhiguo commented on YARN-1872: --- Binglin, I got the same failure. The phenomenon and cause of your failure are different from the one reported by Ted Yu. I fixed it in YARN-2081. TestDistributedShell occasionally fails in trunk Key: YARN-1872 URL: https://issues.apache.org/jira/browse/YARN-1872 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Hong Zhiguo Attachments: TestDistributedShell.out, YARN-1872.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console : TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and TestDistributedShell#testDSShell timed out. -- This message was sent by Atlassian JIRA (v6.2#6252)