[jira] [Updated] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse

2014-05-20 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-1897:
--

Attachment: YARN-1897-4.patch

Updated patch per Vinod's suggestions.

1. Clean up SignalContainerCommand.
2. Support signalContainersRequest.

 Define SignalContainerRequest and SignalContainerResponse
 -

 Key: YARN-1897
 URL: https://issues.apache.org/jira/browse/YARN-1897
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
 YARN-1897.1.patch


 We need to define SignalContainerRequest and SignalContainerResponse first, as 
 they are needed by the other sub-tasks. SignalContainerRequest should use 
 OS-independent commands and provide a way for the application to specify a 
 reason for diagnostics. SignalContainerResponse might be empty.
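 For illustration only, a minimal sketch of what such records could look like, assuming 
 the abstract-record style used by other YARN protocol records; the class and field names 
 below are assumptions, not the actual patch:
 {code}
 import org.apache.hadoop.yarn.api.records.ContainerId;

 // Hypothetical sketch: an OS-independent command plus an optional diagnostic reason.
 public abstract class SignalContainerRequest {
   public abstract ContainerId getContainerId();
   public abstract void setContainerId(ContainerId containerId);

   // SignalContainerCommand is assumed to be the OS-independent command enum
   // from this patch (e.g. a thread-dump or shutdown command).
   public abstract SignalContainerCommand getCommand();
   public abstract void setCommand(SignalContainerCommand command);

   // Free-form reason recorded for diagnostics.
   public abstract String getReason();
   public abstract void setReason(String reason);
 }

 // The response carries no fields for now, mirroring other empty *Response records.
 public abstract class SignalContainerResponse {
 }
 {code}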



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1803) Signal container support in nodemanager

2014-05-20 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002847#comment-14002847
 ] 

Ming Ma commented on YARN-1803:
---

Vinod, I have updated YARN-1897. Please let me know if you have other 
suggestions.  I can also upload updated versions of the other subtasks that depend 
on YARN-1897.

 Signal container support in nodemanager
 ---

 Key: YARN-1803
 URL: https://issues.apache.org/jira/browse/YARN-1803
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-1803.patch


 It could include the following:
 1. ContainerManager is able to process a new event type 
 ContainerManagerEventType.SIGNAL_CONTAINERS coming from NodeStatusUpdater and 
 deliver the request to ContainerExecutor.
 2. Translate the platform-independent signal command to Linux-specific 
 signals (a sketch follows below). Windows support will be tracked by another task.
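 For item 2, a rough sketch of the kind of translation meant here (illustrative only; the 
 SignalContainerCommand values are assumptions from YARN-1897, not the actual patch):
 {code}
 // Maps an OS-independent command to a POSIX signal name that a
 // ContainerExecutor could pass to kill(1) on Linux.
 public final class LinuxSignalTranslator {
   private LinuxSignalTranslator() {}

   public static String toPosixSignal(SignalContainerCommand command) {
     switch (command) {
       case OUTPUT_THREAD_DUMP:   // assumed enum value
         return "SIGQUIT";        // JVMs dump thread stacks on QUIT
       case GRACEFUL_SHUTDOWN:    // assumed enum value
         return "SIGTERM";
       case FORCEFUL_SHUTDOWN:    // assumed enum value
         return "SIGKILL";
       default:
         throw new IllegalArgumentException("Unknown command: " + command);
     }
   }
 }
 {code}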



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-20 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002858#comment-14002858
 ] 

Rohith commented on YARN-1366:
--

bq. If there's no RM restart, a normal app only calling unregister without 
calling register earlier will be just deemed as FINISHED ? is this acceptable?
No. The mutual contract is that unregistration should not be called before 
registration (MR handles this in MAPREDUCE-5769), but defensive programming 
still requires YARN to handle it. What about storing information in ZK for each 
registered application? It could be read during recovery to move the application 
directly to RUNNING.



 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, 
 YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response, to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0, after which the AM should send its entire set of outstanding 
 requests to the RM. Note that if the AM is making its first allocate call to the RM, 
 things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.
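 A minimal sketch of the intended AM-side behavior, assuming the RM signals resync on the 
 allocate call; the surrounding fields and the rmClient handle are illustrative, not the 
 actual AMRMClient implementation:
 {code}
 void onResyncRequired() throws YarnException, IOException {
   // Reset the allocate RPC sequence number so the restarted RM accepts calls.
   lastResponseId = 0;
   // Re-send everything still outstanding: pending ResourceRequests and
   // containers scheduled for release (blacklist omitted here).
   AllocateRequest request = AllocateRequest.newInstance(
       lastResponseId, progress,
       new ArrayList<ResourceRequest>(pendingResourceRequests),
       new ArrayList<ContainerId>(pendingReleases),
       null);
   AllocateResponse response = rmClient.allocate(request);
   // Completed containers may be reported more than once after a resync,
   // so handling of response.getCompletedContainersStatuses() must be idempotent.
 }
 {code}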



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-20 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002873#comment-14002873
 ] 

Rohith commented on YARN-1366:
--

Adding to the above point: we should also enforce in AMRMClient that unregistration 
cannot be called before registration.

 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, 
 YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response, to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0, after which the AM should send its entire set of outstanding 
 requests to the RM. Note that if the AM is making its first allocate call to the RM, 
 things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse

2014-05-20 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002875#comment-14002875
 ] 

Gera Shegalov commented on YARN-1897:
-

I am confused, [~mingma]. I thought we agreed to do it as YARN-1515.

 Define SignalContainerRequest and SignalContainerResponse
 -

 Key: YARN-1897
 URL: https://issues.apache.org/jira/browse/YARN-1897
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
 YARN-1897.1.patch


 We need to define SignalContainerRequest and SignalContainerResponse first, as 
 they are needed by the other sub-tasks. SignalContainerRequest should use 
 OS-independent commands and provide a way for the application to specify a 
 reason for diagnostics. SignalContainerResponse might be empty.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2077) JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of too much CPUs

2014-05-20 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2077:
-

Affects Version/s: 2.4.0

 JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of 
 too much CPUs
 

 Key: YARN-2077
 URL: https://issues.apache.org/jira/browse/YARN-2077
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Trivial
 Attachments: YARN-2077.1.patch


 JobImpl#makeUberDecision usually logs why the job cannot be launched in Uber 
 mode (e.g. too much RAM). However, the CPU case is not logged 
 currently. We should also log when too many CPUs are requested.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented

2014-05-20 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created YARN-2078:


 Summary: yarn.app.am.resource.mb/cpu-vcores affects uber mode but 
is not documented
 Key: YARN-2078
 URL: https://issues.apache.org/jira/browse/YARN-2078
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Trivial


We should document the conditions under which uber mode is enabled. Otherwise, users need 
to read the code.

{code}
boolean smallMemory =
    ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0),
        conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0))
        <= sysMemSizeForUberSlot)
      || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT));
boolean smallCpu =
    Math.max(
        conf.getInt(
            MRJobConfig.MAP_CPU_VCORES,
            MRJobConfig.DEFAULT_MAP_CPU_VCORES),
        conf.getInt(
            MRJobConfig.REDUCE_CPU_VCORES,
            MRJobConfig.DEFAULT_REDUCE_CPU_VCORES))
        <= sysCPUSizeForUberSlot;
{code}
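As I read JobImpl, the two limits above come from the AM container's own resource settings, 
which is why yarn.app.mapreduce.am.resource.mb/cpu-vcores affect the decision; a hedged 
sketch (treat the exact constants and the final combination as assumptions):
{code}
long sysMemSizeForUberSlot =
    conf.getInt(MRJobConfig.MR_AM_VMEM_MB, MRJobConfig.DEFAULT_MR_AM_VMEM_MB);
int sysCPUSizeForUberSlot =
    conf.getInt(MRJobConfig.MR_AM_CPU_VCORES, MRJobConfig.DEFAULT_MR_AM_CPU_VCORES);
// smallMemory and smallCpu (above) then combine with the other uber checks:
boolean isUber = uberEnabled && smallNumMapTasks && smallNumReduceTasks
    && smallInput && smallMemory && smallCpu && notChainJob;
{code}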



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented

2014-05-20 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2078:
-

Attachment: YARN-2078.1.patch

 yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
 --

 Key: YARN-2078
 URL: https://issues.apache.org/jira/browse/YARN-2078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.4.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Trivial
 Attachments: YARN-2078.1.patch


 We should document the conditions under which uber mode is enabled. Otherwise, 
 users need to read the code.
 {code}
 boolean smallMemory =
     ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0),
         conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0))
         <= sysMemSizeForUberSlot)
       || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT));
 boolean smallCpu =
     Math.max(
         conf.getInt(
             MRJobConfig.MAP_CPU_VCORES,
             MRJobConfig.DEFAULT_MAP_CPU_VCORES),
         conf.getInt(
             MRJobConfig.REDUCE_CPU_VCORES,
             MRJobConfig.DEFAULT_REDUCE_CPU_VCORES))
         <= sysCPUSizeForUberSlot;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented

2014-05-20 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2078:
-

Component/s: documentation

 yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
 --

 Key: YARN-2078
 URL: https://issues.apache.org/jira/browse/YARN-2078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.4.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Trivial
 Attachments: YARN-2078.1.patch


 We should document the conditions under which uber mode is enabled. Otherwise, 
 users need to read the code.
 {code}
 boolean smallMemory =
     ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0),
         conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0))
         <= sysMemSizeForUberSlot)
       || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT));
 boolean smallCpu =
     Math.max(
         conf.getInt(
             MRJobConfig.MAP_CPU_VCORES,
             MRJobConfig.DEFAULT_MAP_CPU_VCORES),
         conf.getInt(
             MRJobConfig.REDUCE_CPU_VCORES,
             MRJobConfig.DEFAULT_REDUCE_CPU_VCORES))
         <= sysCPUSizeForUberSlot;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented

2014-05-20 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2078:
-

Affects Version/s: 2.4.0

 yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
 --

 Key: YARN-2078
 URL: https://issues.apache.org/jira/browse/YARN-2078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.4.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Trivial
 Attachments: YARN-2078.1.patch


 We should document the conditions under which uber mode is enabled. Otherwise, 
 users need to read the code.
 {code}
 boolean smallMemory =
     ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0),
         conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0))
         <= sysMemSizeForUberSlot)
       || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT));
 boolean smallCpu =
     Math.max(
         conf.getInt(
             MRJobConfig.MAP_CPU_VCORES,
             MRJobConfig.DEFAULT_MAP_CPU_VCORES),
         conf.getInt(
             MRJobConfig.REDUCE_CPU_VCORES,
             MRJobConfig.DEFAULT_REDUCE_CPU_VCORES))
         <= sysCPUSizeForUberSlot;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2077) JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of too much CPUs

2014-05-20 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2077:
-

Component/s: client

 JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of 
 too much CPUs
 

 Key: YARN-2077
 URL: https://issues.apache.org/jira/browse/YARN-2077
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Trivial
 Attachments: YARN-2077.1.patch


 JobImpl#makeUberDecision usually logs why the job cannot be launched in Uber 
 mode (e.g. too much RAM). However, the CPU case is not logged 
 currently. We should also log when too many CPUs are requested.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore

2014-05-20 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated YARN-2030:


Attachment: YARN-2030.v1.patch

Attaching a patch. 
This is a code refactoring; TestFSRMStateStore and TestZKRMStateStore already 
cover the code here, so no additional test is added.



 Use StateMachine to simplify handleStoreEvent() in RMStateStore
 ---

 Key: YARN-2030
 URL: https://issues.apache.org/jira/browse/YARN-2030
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Junping Du
Assignee: Binglin Chang
 Attachments: YARN-2030.v1.patch


 Now the logic to handle different store events in handleStoreEvent() is as 
 follows:
 {code}
 if (event.getType().equals(RMStateStoreEventType.STORE_APP)
 || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
 ...
   } else {
 ...
   }
   ...
   try {
 if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
   ...
 } else {
   ...
 }
   } 
   ...
 } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
 || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
 ...
   } else {
 ...
   }
 ...
 if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
   ...
 } else {
   ...
 }
   }
   ...
 } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
 ...
 } else {
   ...
 }
 }
 {code}
 This not only confuses people but also easily leads to mistakes. We could 
 leverage a state machine to simplify this, even with no state transitions (see the 
 sketch below).
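 A rough sketch of the suggested direction, reusing YARN's StateMachineFactory with a 
 single state so each event type gets its own small transition class; the state and 
 transition names here are assumptions, not the actual patch:
 {code}
 private static final StateMachineFactory<RMStateStore, RMStateStoreState,
     RMStateStoreEventType, RMStateStoreEvent> stateMachineFactory =
       new StateMachineFactory<RMStateStore, RMStateStoreState,
           RMStateStoreEventType, RMStateStoreEvent>(RMStateStoreState.ACTIVE)
         .addTransition(RMStateStoreState.ACTIVE, RMStateStoreState.ACTIVE,
             RMStateStoreEventType.STORE_APP, new StoreAppTransition())
         .addTransition(RMStateStoreState.ACTIVE, RMStateStoreState.ACTIVE,
             RMStateStoreEventType.UPDATE_APP, new UpdateAppTransition())
         .addTransition(RMStateStoreState.ACTIVE, RMStateStoreState.ACTIVE,
             RMStateStoreEventType.REMOVE_APP, new RemoveAppTransition())
         .installTopology();

 protected void handleStoreEvent(RMStateStoreEvent event) {
   // The factory dispatches to the per-event transition; no if/else chain needed.
   stateMachine.doTransition(event.getType(), event);
 }
 {code}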



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2051) Add more unit tests for PBImpl that didn't get covered

2014-05-20 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang reassigned YARN-2051:
---

Assignee: Binglin Chang

 Add more unit tests for PBImpl that didn't get covered
 --

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical

 From YARN-2016, we can see that bugs can exist in the PB implementations of the 
 protocol records. The bad news is that most of these PBImpl classes don't have any 
 unit tests to verify that the info is not lost or changed after 
 serialization/deserialization. We should add more tests for them (see the sketch below).
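 For illustration, the kind of round-trip test meant here (JUnit imports omitted; the 
 record and fields are just one example of the pattern):
 {code}
 @Test
 public void testGetApplicationsRequestPBImplRoundTrip() {
   GetApplicationsRequestPBImpl original = new GetApplicationsRequestPBImpl();
   original.setApplicationTypes(
       new HashSet<String>(Arrays.asList("MAPREDUCE", "YARN")));

   // Rebuild the record from its serialized proto and verify nothing was lost.
   GetApplicationsRequestPBImpl deserialized =
       new GetApplicationsRequestPBImpl(original.getProto());
   assertEquals(original.getApplicationTypes(),
       deserialized.getApplicationTypes());
 }
 {code}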



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002937#comment-14002937
 ] 

Hudson commented on YARN-2053:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5606 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5606/])
YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from 
previous attempts for work-preserving AM restart. Contributed by Wangda Tan 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595116)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2066) Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002941#comment-14002941
 ] 

Hudson commented on YARN-2066:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5606 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5606/])
YARN-2066. Wrong field is referenced in 
GetApplicationsRequestPBImpl#mergeLocalToBuilder (Contributed by Hong Zhiguo) 
(junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595413)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestGetApplicationsRequest.java


 Wrong field is referenced in 
 GetApplicationsRequestPBImpl#mergeLocalToBuilder()
 ---

 Key: YARN-2066
 URL: https://issues.apache.org/jira/browse/YARN-2066
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Hong Zhiguo
Priority: Minor
 Fix For: 2.4.1

 Attachments: YARN-2066.patch


 {code}
 if (this.finish != null) {
   builder.setFinishBegin(start.getMinimumLong());
   builder.setFinishEnd(start.getMaximumLong());
 }
 {code}
 this.finish should be referenced in the if block.
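 For clarity, a sketch of what the corrected block presumably looks like (the committed 
 patch may differ in detail):
 {code}
 if (this.finish != null) {
   builder.setFinishBegin(finish.getMinimumLong());
   builder.setFinishEnd(finish.getMaximumLong());
 }
 {code}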



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002981#comment-14002981
 ] 

Hadoop QA commented on YARN-2078:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645748/YARN-2078.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3768//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3768//console

This message is automatically generated.

 yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
 --

 Key: YARN-2078
 URL: https://issues.apache.org/jira/browse/YARN-2078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.4.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Trivial
 Attachments: YARN-2078.1.patch


 We should document the conditions under which uber mode is enabled. Otherwise, 
 users need to read the code.
 {code}
 boolean smallMemory =
     ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0),
         conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0))
         <= sysMemSizeForUberSlot)
       || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT));
 boolean smallCpu =
     Math.max(
         conf.getInt(
             MRJobConfig.MAP_CPU_VCORES,
             MRJobConfig.DEFAULT_MAP_CPU_VCORES),
         conf.getInt(
             MRJobConfig.REDUCE_CPU_VCORES,
             MRJobConfig.DEFAULT_REDUCE_CPU_VCORES))
         <= sysCPUSizeForUberSlot;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2077) JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of too much CPUs

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002983#comment-14002983
 ] 

Hadoop QA commented on YARN-2077:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645746/YARN-2077.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3767//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3767//console

This message is automatically generated.

 JobImpl#makeUberDecision doesn't log that Uber mode is disabled because of 
 too much CPUs
 

 Key: YARN-2077
 URL: https://issues.apache.org/jira/browse/YARN-2077
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.4.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Priority: Trivial
 Attachments: YARN-2077.1.patch


 JobImpl#makeUberDecision usually logs why the job cannot be launched in Uber 
 mode (e.g. too much RAM). However, the CPU case is not logged 
 currently. We should also log when too many CPUs are requested.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003009#comment-14003009
 ] 

Hadoop QA commented on YARN-1897:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645735/YARN-1897-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3771//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3771//console

This message is automatically generated.

 Define SignalContainerRequest and SignalContainerResponse
 -

 Key: YARN-1897
 URL: https://issues.apache.org/jira/browse/YARN-1897
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
 YARN-1897.1.patch


 We need to define SignalContainerRequest and SignalContainerResponse first, as 
 they are needed by the other sub-tasks. SignalContainerRequest should use 
 OS-independent commands and provide a way for the application to specify a 
 reason for diagnostics. SignalContainerResponse might be empty.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003027#comment-14003027
 ] 

Hadoop QA commented on YARN-2075:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645730/YARN-2075.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3769//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3769//console

This message is automatically generated.

 TestRMAdminCLI consistently fail on trunk
 -

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec 
 <<< FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
 elapsed: 0.082 sec  <<< ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 
 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}
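 The first failure looks like the usual fixed-size-list trap (an assumption from the stack 
 trace, not a statement about the fix): Arrays.asList returns a list whose iterator cannot 
 remove elements, so any remove() inside HAAdmin.isOtherTargetNodeActive throws.
 {code}
 import java.util.Arrays;
 import java.util.Collection;

 public class FixedSizeListDemo {
   public static void main(String[] args) {
     Collection<String> targets = Arrays.asList("rm1", "rm2");
     targets.remove("rm1");   // throws UnsupportedOperationException, as in the trace
     // A mutable copy avoids it:
     // Collection<String> targets = new ArrayList<String>(Arrays.asList("rm1", "rm2"));
   }
 }
 {code}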



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003035#comment-14003035
 ] 

Hadoop QA commented on YARN-941:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12645713/YARN-941.preview.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.TestRMAdminCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3770//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3770//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3770//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3770//console

This message is automatically generated.

 RM Should have a way to update the tokens it has for a running application
 --

 Key: YARN-941
 URL: https://issues.apache.org/jira/browse/YARN-941
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Robert Joseph Evans
Assignee: Xuan Gong
 Attachments: YARN-941.preview.2.patch, YARN-941.preview.3.patch, 
 YARN-941.preview.patch


 When an application is submitted to the RM it includes with it a set of 
 tokens that the RM will renew on behalf of the application, that will be 
 passed to the AM when the application is launched, and will be used when 
 launching the application to access HDFS to download files on behalf of the 
 application.
 For long lived applications/services these tokens can expire, and then the 
 tokens that the AM has will be invalid, and the tokens that the RM had will 
 also not work to launch a new AM.
 We need to provide an API that will allow the RM to replace the current 
 tokens for this application with a new set.  To avoid any real race issues, I 
 think this API should be something that the AM calls, so that the client can 
 connect to the AM with a new set of tokens it got using kerberos, then the AM 
 can inform the RM of the new set of tokens and quickly update its tokens 
 internally to use these new ones.
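 A hedged sketch of how an AM might consume a replaced token delivered on the heartbeat, 
 as discussed in this thread; the getAMRMToken() accessor and the conversion details are 
 assumptions, not the final API (exception handling elided):
 {code}
 AllocateResponse response = rmClient.allocate(progress);
 org.apache.hadoop.yarn.api.records.Token updated = response.getAMRMToken();
 if (updated != null) {
   // Replace the AMRM token in the current user's credentials so subsequent
   // RPCs to the RM authenticate with the new secret.
   Token<AMRMTokenIdentifier> token =
       ConverterUtils.convertFromYarn(updated, (InetSocketAddress) null);
   UserGroupInformation.getCurrentUser().addToken(token);
 }
 {code}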



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003208#comment-14003208
 ] 

Hadoop QA commented on YARN-2030:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645754/YARN-2030.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3772//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3772//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3772//console

This message is automatically generated.

 Use StateMachine to simplify handleStoreEvent() in RMStateStore
 ---

 Key: YARN-2030
 URL: https://issues.apache.org/jira/browse/YARN-2030
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Junping Du
Assignee: Binglin Chang
 Attachments: YARN-2030.v1.patch


 Now the logic to handle different store events in handleStoreEvent() is as 
 follows:
 {code}
 if (event.getType().equals(RMStateStoreEventType.STORE_APP)
 || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
 ...
   } else {
 ...
   }
   ...
   try {
 if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
   ...
 } else {
   ...
 }
   } 
   ...
 } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
 || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
 ...
   } else {
 ...
   }
 ...
 if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
   ...
 } else {
   ...
 }
   }
   ...
 } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
 ...
 } else {
   ...
 }
 }
 {code}
 This not only confuses people but also easily leads to mistakes. We could 
 leverage a state machine to simplify this, even with no state transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2066) Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003248#comment-14003248
 ] 

Hudson commented on YARN-2066:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #562 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/562/])
YARN-2066. Wrong field is referenced in 
GetApplicationsRequestPBImpl#mergeLocalToBuilder (Contributed by Hong Zhiguo) 
(junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595413)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestGetApplicationsRequest.java


 Wrong field is referenced in 
 GetApplicationsRequestPBImpl#mergeLocalToBuilder()
 ---

 Key: YARN-2066
 URL: https://issues.apache.org/jira/browse/YARN-2066
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Hong Zhiguo
Priority: Minor
 Fix For: 2.4.1

 Attachments: YARN-2066.patch


 {code}
 if (this.finish != null) {
   builder.setFinishBegin(start.getMinimumLong());
   builder.setFinishEnd(start.getMaximumLong());
 }
 {code}
 this.finish should be referenced in the if block.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003257#comment-14003257
 ] 

Hudson commented on YARN-2053:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #562 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/562/])
YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from 
previous attempts for work-preserving AM restart. Contributed by Wangda Tan 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595116)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003292#comment-14003292
 ] 

Hudson commented on YARN-2053:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1754 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1754/])
YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from 
previous attempts for work-preserving AM restart. Contributed by Wangda Tan 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595116)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2066) Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003283#comment-14003283
 ] 

Hudson commented on YARN-2066:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1754 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1754/])
YARN-2066. Wrong field is referenced in 
GetApplicationsRequestPBImpl#mergeLocalToBuilder (Contributed by Hong Zhiguo) 
(junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595413)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestGetApplicationsRequest.java


 Wrong field is referenced in 
 GetApplicationsRequestPBImpl#mergeLocalToBuilder()
 ---

 Key: YARN-2066
 URL: https://issues.apache.org/jira/browse/YARN-2066
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Hong Zhiguo
Priority: Minor
 Fix For: 2.4.1

 Attachments: YARN-2066.patch


 {code}
 if (this.finish != null) {
   builder.setFinishBegin(start.getMinimumLong());
   builder.setFinishEnd(start.getMaximumLong());
 }
 {code}
 this.finish should be referenced in the if block.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart

2014-05-20 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-2079:


 Summary: Recover NonAggregatingLogHandler state upon nodemanager 
restart
 Key: YARN-2079
 URL: https://issues.apache.org/jira/browse/YARN-2079
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe


The state of NonAggregatingLogHandler needs to be persisted so logs are 
properly deleted across a nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2066) Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003460#comment-14003460
 ] 

Hudson commented on YARN-2066:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1780 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/])
YARN-2066. Wrong field is referenced in 
GetApplicationsRequestPBImpl#mergeLocalToBuilder (Contributed by Hong Zhiguo) 
(junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595413)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestGetApplicationsRequest.java


 Wrong field is referenced in 
 GetApplicationsRequestPBImpl#mergeLocalToBuilder()
 ---

 Key: YARN-2066
 URL: https://issues.apache.org/jira/browse/YARN-2066
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Hong Zhiguo
Priority: Minor
 Fix For: 2.4.1

 Attachments: YARN-2066.patch


 {code}
 if (this.finish != null) {
   builder.setFinishBegin(start.getMinimumLong());
   builder.setFinishEnd(start.getMaximumLong());
 }
 {code}
 this.finish should be referenced in the if block.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003469#comment-14003469
 ] 

Hudson commented on YARN-2053:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1780 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/])
YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from 
previous attempts for work-preserving AM restart. Contributed by Wangda Tan 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595116)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application

2014-05-20 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003440#comment-14003440
 ] 

bc Wong commented on YARN-941:
--

Hi [~xgong], thanks for the patch! I'm interested in talking through the 
changes and their security implications, for everybody who's following along. I 
think the following are worth highlighting:

# The token update mechanism is via the AM heartbeat. So if the previous AMRM 
token has been compromised, the attacker can get the new token.
** I don't think it's a big problem as the RM will only hand out the new token 
in _exactly_ one AllocateResponse (except for the case of RM restart). So if 
the attacker has the new token, the real AM won't, and it'll die and the token 
will get revoked.
# How frequently a running AM gets an updated token is at the mercy of the 
configuration (the roll interval and activation delay). In addition, whenever 
the RM restarts, all AMs will get a new token on the next heartbeat.
** Should the RM check that the roll interval and activation delay are both 
shorter than the token expiration interval?
# The client app is not responsible for renewing the token. The RM will renew 
it proactively and update the apps.
** The loss of control may be inconvenient to the app. The AM must also 
heartbeat frequently enough to catch the update in time. In practice, it's not 
an issue. But it still makes me slightly uncomfortable, since the client is 
usually the one renewing its credentials in every other security protocol I know 
of. Here, the RM doesn't have any explicit logic to update an AMRM token before 
it expires. The math just generally works out if the admin sets the token 
expiry, roll interval and activation delay to the right values.\\
\\
Again, I think this is better than making it the AM's responsibility to get a 
new token, which is more burden on the AM. I just want to bring this up for 
discussion.

 RM Should have a way to update the tokens it has for a running application
 --

 Key: YARN-941
 URL: https://issues.apache.org/jira/browse/YARN-941
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Robert Joseph Evans
Assignee: Xuan Gong
 Attachments: YARN-941.preview.2.patch, YARN-941.preview.3.patch, 
 YARN-941.preview.patch


 When an application is submitted to the RM it includes with it a set of 
 tokens that the RM will renew on behalf of the application, that will be 
 passed to the AM when the application is launched, and will be used when 
 launching the application to access HDFS to download files on behalf of the 
 application.
 For long lived applications/services these tokens can expire, and then the 
 tokens that the AM has will be invalid, and the tokens that the RM had will 
 also not work to launch a new AM.
 We need to provide an API that will allow the RM to replace the current 
 tokens for this application with a new set.  To avoid any real race issues, I 
 think this API should be something that the AM calls, so that the client can 
 connect to the AM with a new set of tokens it got using kerberos, then the AM 
 can inform the RM of the new set of tokens and quickly update its tokens 
 internally to use these new ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

2014-05-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003489#comment-14003489
 ] 

Junping Du commented on YARN-1338:
--

[~jlowe], thanks again for your patch here! A few comments so far:
One question in general: besides the null store and the leveldb store, I saw a memory 
store implemented there but no usage so far. Does it help in some scenario, or is it 
only for test purposes?

In NodeManager#serviceInit()
{code}
if (recoveryEnabled) {
...
+  nmStore = new NMLeveldbStateStoreService();
+} else {
+  nmStore = new NMNullStateStoreService();
 }
+nmStore.init(conf);
+nmStore.start();
{code}
Can we abstract the code starting from the if block into a method, something like 
initializeNMStore(conf)? That would make NodeManager#serviceInit() simpler (see the 
sketch below). 
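Something like this (a sketch of the suggested refactor only):
{code}
private void initializeNMStore(Configuration conf) {
  if (recoveryEnabled) {
    // ... recovery-dir setup elided, as in the current diff ...
    nmStore = new NMLeveldbStateStoreService();
  } else {
    nmStore = new NMNullStateStoreService();
  }
  nmStore.init(conf);
  nmStore.start();
}
{code}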
 
In yarn_server_nodemanager_recovery.proto,
{code}
+message LocalizedResourceProto {
+  optional LocalResourceProto resource = 1;
+  optional string localPath = 2;
+  optional int64 size = 3;
+}
{code}
Does size here represent the size of the local resource? If so, it may be duplicated 
with the size within LocalResourceProto.

In ResourceLocalizationService.java
{code}
+  //Recover localized resources after an NM restart
+  public void recoverLocalizedResources(RecoveredLocalizationState state)
+  throws URISyntaxException {
+  ...
+  for (Map.Entry<ApplicationId, LocalResourceTrackerState> appEntry :
+   userResources.getAppTrackerStates().entrySet()) {
+ApplicationId appId = appEntry.getKey();
+...
+recoverTrackerResources(tracker, appEntry.getValue());
+  }
+}
+  }
{code}
Maybe we should check that appResourceState (appEntry.getValue())'s localizedResources 
and inProgressResources are not empty before recovering it, as we do for 
userResourceState?

In NMMemoryStateStoreService#loadLocalizationState()
{code}
  ...
+if (tk.appId == null) {
+  rur.privateTrackerState = loadTrackerState(ts);
+} else {
+  rur.appTrackerStates.put(tk.appId, loadTrackerState(ts));
+}
  ...
{code}
Maybe even in the case tk.appId != null, we should load the private resource state as 
well?

Given that the patch is quite big, I haven't finished my review, although I have walked 
through it a few times. More comments may come later.

 Recover localized resource cache state upon nodemanager restart
 ---

 Key: YARN-1338
 URL: https://issues.apache.org/jira/browse/YARN-1338
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1338.patch, YARN-1338v2.patch, 
 YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch


 Today when the node manager restarts we clean up all the distributed cache files 
 from disk. This is definitely not ideal for two reasons:
 * For a work-preserving restart we definitely want to keep them, as running containers 
 are using them.
 * Even for a non-work-preserving restart this is useful, in the sense that 
 we don't have to download them again if they are needed by future tasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext

2014-05-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003548#comment-14003548
 ] 

Jason Lowe commented on YARN-2050:
--

+1 lgtm.  Committing this.

 Fix LogCLIHelpers to create the correct FileContext
 ---

 Key: YARN-2050
 URL: https://issues.apache.org/jira/browse/YARN-2050
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-2050-2.patch, YARN-2050.patch


 LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus 
 the FileContext it creates isn't necessarily the FileContext for the remote log 
 directory (see the sketch below).
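 A hedged sketch of the direction implied by the summary (the actual patch may differ): 
 resolve the FileContext against the remote log root instead of the default file system.
 {code}
 Path remoteRootLogDir = new Path(conf.get(
     YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
     YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
 FileContext fc = FileContext.getFileContext(remoteRootLogDir.toUri(), conf);
 {code}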



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2076) Minor error in TestLeafQueue files

2014-05-20 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2076:
--

Attachment: YARN-2076.patch

 Minor error in TestLeafQueue files
 --

 Key: YARN-2076
 URL: https://issues.apache.org/jira/browse/YARN-2076
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Chen He
Assignee: Chen He
Priority: Minor
  Labels: test
 Attachments: YARN-2076.patch


 numNodes should be 2 instead of 3 in testReservationExchange() since only 
 two nodes are defined.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003675#comment-14003675
 ] 

Hadoop QA commented on YARN-1680:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645816/YARN-1680.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3773//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3773//console

This message is automatically generated.

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680.patch


 There are 4 NodeManagers with 8GB each.Total cluster capacity is 32GB.Cluster 
 slow start is set to 1.
 Job is running reducer task occupied 29GB of cluster.One NodeManager(NM-4) is 
 become unstable(3 Map got killed), MRAppMaster blacklisted unstable 
 NodeManager(NM-4). All reducer task are running in cluster now.
 MRAppMaster does not preempt the reducers because for Reducer preemption 
 calculation, headRoom is considering blacklisted nodes memory. This makes 
 jobs to hang forever(ResourceManager does not assing any new containers on 
 blacklisted nodes but returns availableResouce considers cluster free 
 memory). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-20 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003678#comment-14003678
 ] 

Bikas Saha commented on YARN-1366:
--

bq.If there's no RM restart, a normal app only calling unregister without 
calling register earlier will be just deemed as FINISHED ? is this acceptable?
bq.What about storing information on zk for registered application.
Catching incorrect unregistration before registration should have always been 
there. Is this a regression in the patch or an existing bug. Should we consider 
the possibility of allowing unregister without register? What are the 
downsides? As long as we can make sure that unregister is coming from the 
latest version of the app.

 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, 
 YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-20 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1365:


Attachment: YARN-1365.002.patch

Added ApplicationMasterService changes to send SHUTDOWN for an attempt that's 
not known, and RESYNC for allocate if the AM has not registered after restart. 
Added more unit tests that verify these changes.

Still pending: how to handle unregister after restart for an unregistered AM.
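
The decision in allocate is roughly the following self-contained sketch 
(knownAttempts/registeredAttempts and the enum here are simplified stand-ins 
for the RM's real state, not the literal patch code):
{code}
import java.util.Set;

// Illustrative only: which command the RM returns from allocate().
enum AMCommandSketch { AM_SHUTDOWN, AM_RESYNC, NONE }

class AllocateCheckSketch {
  private final Set<String> knownAttempts;      // attempts the RM knows about
  private final Set<String> registeredAttempts; // attempts that have (re)registered

  AllocateCheckSketch(Set<String> known, Set<String> registered) {
    this.knownAttempts = known;
    this.registeredAttempts = registered;
  }

  AMCommandSketch checkAllocate(String attemptId) {
    if (!knownAttempts.contains(attemptId)) {
      // Unknown attempt: the AM should shut down.
      return AMCommandSketch.AM_SHUTDOWN;
    }
    if (!registeredAttempts.contains(attemptId)) {
      // Known (e.g. recovered after RM restart) but not yet re-registered:
      // ask the AM to resync, i.e. re-register and resend outstanding asks.
      return AMCommandSketch.AM_RESYNC;
    }
    return AMCommandSketch.NONE; // proceed with normal allocation
  }
}
{code}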

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2076) Minor error in TestLeafQueue files

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003703#comment-14003703
 ] 

Hadoop QA commented on YARN-2076:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645818/YARN-2076.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3774//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3774//console

This message is automatically generated.

 Minor error in TestLeafQueue files
 --

 Key: YARN-2076
 URL: https://issues.apache.org/jira/browse/YARN-2076
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Chen He
Assignee: Chen He
Priority: Minor
  Labels: test
 Attachments: YARN-2076.patch


 numNodes should be 2 instead of 3 in testReservationExchange() since only 
 two nodes are defined.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003782#comment-14003782
 ] 

Hadoop QA commented on YARN-1365:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645826/YARN-1365.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3775//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3775//console

This message is automatically generated.

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
 YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application

2014-05-20 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003796#comment-14003796
 ] 

Marcelo Vanzin commented on YARN-941:
-

Apologies for jumping into the middle of the conversation. I don't have a lot 
of background in the YARN code here, but from this bug and some internal 
discussions I have a question for people who are more familiar with this code:

What is the purpose of this renewal mechanism?

So far it seems to me that it's an attack mitigation feature. An attacker who 
is able to get the token would only be able to use it while the original 
application (i) is running and (ii) keeps renewing the token.

If that's true, it sounds to me like the problem is actually that it's possible 
to sniff the token in the first place. Wouldn't it be better, at that point, to 
have a protocol that doesn't allow that? Either using full-blown encryption for 
the RPC channels, or if that's deemed too expensive, some mechanism where 
tokens are negotiated instead of sent in plain text over the wire.

 RM Should have a way to update the tokens it has for a running application
 --

 Key: YARN-941
 URL: https://issues.apache.org/jira/browse/YARN-941
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Robert Joseph Evans
Assignee: Xuan Gong
 Attachments: YARN-941.preview.2.patch, YARN-941.preview.3.patch, 
 YARN-941.preview.patch


 When an application is submitted to the RM it includes with it a set of 
 tokens that the RM will renew on behalf of the application, that will be 
 passed to the AM when the application is launched, and will be used when 
 launching the application to access HDFS to download files on behalf of the 
 application.
 For long lived applications/services these tokens can expire, and then the 
 tokens that the AM has will be invalid, and the tokens that the RM had will 
 also not work to launch a new AM.
 We need to provide an API that will allow the RM to replace the current 
 tokens for this application with a new set.  To avoid any real race issues, I 
 think this API should be something that the AM calls, so that the client can 
 connect to the AM with a new set of tokens it got using kerberos, then the AM 
 can inform the RM of the new set of tokens and quickly update its tokens 
 internally to use these new ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1935) Security for timeline server

2014-05-20 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1935:
--

Attachment: Timeline_Kerberos_DT_ACLs.2.patch
Timeline Security Diagram.pdf

Hi folks,

I've just attached a diagram, Timeline Security Diagram.pdf, to demonstrate the 
rough workflow of the timeline security. In general, it consists of two parts: 
authentication and authorization.

*1. Authentication*

a) When authentication is enabled, a customized authentication filter will be 
loaded into the webapp of the timeline server, which prevents unauthorized 
users from accessing any timeline web resources. The filter allows users to:

* negotiate the authentication via HTTP SPNEGO, and login with Kerberos 
principal and keytab; and

* request a delegation token after Kerberos login and use it for follow-up 
secured communication.

b) TimelineClient is adapted to pass the authentication before putting the 
timeline data. It can choose to append either the Kerberos token or a 
delegation token to the HTTP request. The rationale behind supporting 
delegation tokens is to allow the AM and other containers to use TimelineClient 
to put the timeline data in a secured manner, where the Kerberos credentials 
are not available.

c) TimelineClient also has an API to get a delegation token from the timeline 
server (actually from the customized authentication filter). When security is 
enabled, the timeline service is enabled, and YarnClient is used to submit an 
application, YarnClient will automatically call TimelineClient to get a 
delegation token and put it into the application submission context, such that 
the AM can use the passed-in delegation token to communicate with the timeline 
server securely.
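
For illustration, the intended client-side flow is roughly the sketch below. 
The getDelegationToken call follows the patches under discussion here 
(YARN-1938/YARN-2049) and is not yet a committed API; the helper class and its 
names are just for this example.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class TimelineTokenSketch {
  public static void addTimelineToken(Configuration conf, Credentials creds,
      String renewer) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      // A Kerberos-authenticated submitter asks the timeline server for a
      // delegation token; the token is shipped with the application so the
      // AM/containers can put timeline data without Kerberos credentials.
      Token<?> timelineToken = client.getDelegationToken(renewer);
      creds.addToken(timelineToken.getService(), timelineToken);
      // These credentials are then serialized into the AM's
      // ContainerLaunchContext as part of normal application submission.
    } finally {
      client.stop();
    }
  }
}
{code}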

d) Any tool which supports SPNEGO/Kerberos, such as Firefox or curl, can access 
the three GET APIs of the timeline server to query the timeline data.

*2. Authorization*

Once the request from an authenticated user passes the customized 
authentication filter, it will be processed by the timeline web services. Here 
we use the ACLs manager to determine whether the user of the request has access 
to the requested data. The basic rules are as follows:

* The access control granularity is the entity, which means a user can either 
access all the information of an entity and its events, or access nothing of 
it.

* Currently we only allow the owner of the entity to access it. In the future, 
we can simply extend the rule to allow Admin and users/groups on the access 
control list.

*Configuration*
Finally, to enable the timeline security, we need to set up Kerberos. In 
addition, there are a number of configurations to set:

* Make use of the filter initializer to set up the customized authentication 
filter; the configuration is much like the hadoop-auth style; and

* ACLs are controlled by the YARN ACLs configuration, like for other YARN daemons.

I also uploaded my newest uber patch, Timeline_Kerberos_DT_ACLs.2.patch, to 
demonstrate how the design is implemented.

 Security for timeline server
 

 Key: YARN-1935
 URL: https://issues.apache.org/jira/browse/YARN-1935
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Zhijie Shen
 Attachments: Timeline Security Diagram.pdf, 
 Timeline_Kerberos_DT_ACLs.2.patch, Timeline_Kerberos_DT_ACLs.patch


 Jira to track work to secure the ATS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem

2014-05-20 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-1709:
---

Description: This JIRA is about the key data structure used to track 
resources over time to enable YARN-1051. The Reservation subsystem is 
conceptually a plan of how the scheduler will allocate resources over-time.  
(was: This JIRA is about the key data structure used to track resources over 
time to enable YARN-1051. The inventory subsystem is conceptually a plan of 
how the capacity scheduler will be configured over-time.)
Summary: Admission Control: Reservation subsystem  (was: Admission 
Control: inventory subsystem)

 Admission Control: Reservation subsystem
 

 Key: YARN-1709
 URL: https://issues.apache.org/jira/browse/YARN-1709
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subramaniam Krishnan

 This JIRA is about the key data structure used to track resources over time 
 to enable YARN-1051. The Reservation subsystem is conceptually a plan of 
 how the scheduler will allocate resources over-time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-05-20 Thread Subramaniam Krishnan (JIRA)
Subramaniam Krishnan created YARN-2080:
--

 Summary: Admission Control: Integrate Reservation subsystem with 
ResourceManager
 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan


This JIRA is about the key data structure used to track resources over time to 
enable YARN-1051. The Reservation subsystem is conceptually a plan of how the 
scheduler will allocate resources over-time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-05-20 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-2080:
---

Description: This JIRA tracks the integration of Reservation subsystem data 
structures introduced in YARN-1709 with the YARN RM. This is essentially 
end2end wiring of YARN-1051.  (was: This JIRA is about the key data structure 
used to track resources over time to enable YARN-1051. The Reservation 
subsystem is conceptually a plan of how the scheduler will allocate resources 
over-time.)

 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subramaniam Krishnan
Assignee: Subramaniam Krishnan

 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-2074:
-

Assignee: Jian He  (was: Vinod Kumar Vavilapalli)

 Preemption of AM containers shouldn't count towards AM failures
 ---

 Key: YARN-2074
 URL: https://issues.apache.org/jira/browse/YARN-2074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He

 One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
 containers getting preempted shouldn't count towards AM failures and thus 
 shouldn't eventually fail applications.
 We should explicitly handle AM container preemption/kill as a separate issue 
 and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003892#comment-14003892
 ] 

Jian He commented on YARN-2074:
---

I'd like to work on this. Taking this over.

 Preemption of AM containers shouldn't count towards AM failures
 ---

 Key: YARN-2074
 URL: https://issues.apache.org/jira/browse/YARN-2074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
 containers getting preempted shouldn't count towards AM failures and thus 
 shouldn't eventually fail applications.
 We should explicitly handle AM container preemption/kill as a separate issue 
 and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003960#comment-14003960
 ] 

Vinod Kumar Vavilapalli commented on YARN-2074:
---

[~sunilg], Agree that as much as possible we should avoid killing the AM during 
preemption, and so we should look at YARN-2022 orthogonally. This one focuses 
only on the point that, in the case that this cannot be avoided, it shouldn't 
be counted towards AM failures.

 Preemption of AM containers shouldn't count towards AM failures
 ---

 Key: YARN-2074
 URL: https://issues.apache.org/jira/browse/YARN-2074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He

 One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
 containers getting preempted shouldn't count towards AM failures and thus 
 shouldn't eventually fail applications.
 We should explicitly handle AM container preemption/kill as a separate issue 
 and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1569) For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, SchedulerEvent should get checked (instanceof) for appropriate type before casting

2014-05-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-1569:


Attachment: yarn-1569.patch

 For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, 
 SchedulerEvent should get checked (instanceof) for appropriate type before 
 casting
 -

 Key: YARN-1569
 URL: https://issues.apache.org/jira/browse/YARN-1569
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Junping Du
Assignee: zhihai xu
Priority: Minor
  Labels: newbie
 Attachments: yarn-1569.patch


 As following: http://wiki.apache.org/hadoop/CodeReviewChecklist, we should 
 always check appropriate type before casting. 
 handle(SchedulerEvent) in FifoScheduler and CapacityScheduler didn't check so 
 far (no bug there now) but should be improved as FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server

2014-05-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003994#comment-14003994
 ] 

Vinod Kumar Vavilapalli commented on YARN-1938:
---

Looks good to me too. Can you add the new configs into yarn-default.xml?

 Kerberos authentication for the timeline server
 ---

 Key: YARN-1938
 URL: https://issues.apache.org/jira/browse/YARN-1938
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1938.1.patch, YARN-1938.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1938) Kerberos authentication for the timeline server

2014-05-20 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1938:
--

Target Version/s: 2.5.0

 Kerberos authentication for the timeline server
 ---

 Key: YARN-1938
 URL: https://issues.apache.org/jira/browse/YARN-1938
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1938.1.patch, YARN-1938.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1338) Recover localized resource cache state upon nodemanager restart

2014-05-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1338:
-

Attachment: YARN-1338v5.patch

Thanks for the review, Junping!  Attaching a patch to address your comments 
with specific responses below.

bq. beside null store and a leveled store, I saw a memory store implemented 
there but no usage so far. Does it helps in some scenario or only for test 
purpose?

It's only for use in unit tests which is why it's located under src/test/.  It 
stores state in the memory of the JVM itself, so it's not very useful for 
real-world recovery scenarios.  The state is lost when the NM crashes/exits.

bq. Can we abstract code since if block into a method, something like: 
initializeNMStore(conf)? which can make NodeManager#serviceInit() simpler.

Done.

bq. Does size here represent for size of local resource? If so, may be 
duplicated with the size within LocalResourceProto?

As I understand it they are slightly different.  The size in the 
LocalResourceProto is the size of the resource that will be downloaded, while 
the size in LocalizedResource (and also persisted in LocalizedResourceProto) is 
the size of the resource on the local disk.  These can be different if the 
resource is uncompressed/unarchived after downloading (e.g.: a .tar.gz 
resource).

bq. May be we should check appResourceState(appEntry.getValue)’s 
localizedResources and inProgressResources is not empty before recover it as we 
check for userResourceState?

Done.  I also added a LocalResourceTrackerState#isEmpty method to make the code 
a bit cleaner.

bq. May be even in case tk.appId !=null, we should load private resource state 
as well?

No, if tk.appId is not null then this is state for an app-specific resource 
tracker and not for a private resource tracker.  See the javadoc for 
NMStateStoreService#startResourceLocalization or 
NMStateStoreService#finishResourceLocalization for some hints, and I also added 
some comments to the NMMemoryStateStoreService to clarify how the user and 
appId are used to discern public vs. private vs. app-specific trackers.
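
In short, the discernment works roughly like the sketch below (illustrative 
only; the real logic lives in the localization and state-store code, and the 
class/method names here are made up for this example):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

class TrackerKindSketch {
  // Map the (user, appId) pair from a tracker key to the kind of tracker.
  static String trackerKind(String user, ApplicationId appId) {
    if (user == null) {
      return "PUBLIC";        // shared public cache, no user in the key
    } else if (appId == null) {
      return "PRIVATE";       // per-user private cache
    } else {
      return "APP_SPECIFIC";  // resources scoped to that application
    }
  }
}
{code}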

 Recover localized resource cache state upon nodemanager restart
 ---

 Key: YARN-1338
 URL: https://issues.apache.org/jira/browse/YARN-1338
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1338.patch, YARN-1338v2.patch, 
 YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch, YARN-1338v5.patch


 Today when node manager restarts we clean up all the distributed cache files 
 from disk. This is definitely not ideal from 2 aspects.
 * For work preserving restart we definitely want them as running containers 
 are using them
 * For even non work preserving restart this will be useful in the sense that 
 we don't have to download them again if needed by future tasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1569) For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, SchedulerEvent should get checked (instanceof) for appropriate type before casting

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004063#comment-14004063
 ] 

Hadoop QA commented on YARN-1569:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645863/yarn-1569.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3777//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3777//console

This message is automatically generated.

 For handle(SchedulerEvent) in FifoScheduler and CapacityScheduler, 
 SchedulerEvent should get checked (instanceof) for appropriate type before 
 casting
 -

 Key: YARN-1569
 URL: https://issues.apache.org/jira/browse/YARN-1569
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Junping Du
Assignee: zhihai xu
Priority: Minor
  Labels: newbie
 Attachments: yarn-1569.patch


 As following: http://wiki.apache.org/hadoop/CodeReviewChecklist, we should 
 always check appropriate type before casting. 
 handle(SchedulerEvent) in FifoScheduler and CapacityScheduler didn't check so 
 far (no bug there now) but should be improved as FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-20 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2049:
--

Attachment: YARN-2049.3.patch

I created a new patch, which will no longer rely on HADOOP-10596, given it is 
still arguable how we should fix initSpnego of HttpServer2. In this patch, I 
worked around it by using the filter initializer approach introduced by 
hadoop-auth to load TimelineAuthenticationFilter, though it is not consistent 
with the existing YARN-style SPNEGO configuration. Hopefully folks are fine 
with the workaround to make the timeline security available ASAP.

 Delegation token stuff for the timeline sever
 -

 Key: YARN-2049
 URL: https://issues.apache.org/jira/browse/YARN-2049
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.2.patch

 Preemption of AM containers shouldn't count towards AM failures
 ---

 Key: YARN-2074
 URL: https://issues.apache.org/jira/browse/YARN-2074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He
 Attachments: YARN-2074.1.patch, YARN-2074.2.patch


 One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
 containers getting preempted shouldn't count towards AM failures and thus 
 shouldn't eventually fail applications.
 We should explicitly handle AM container preemption/kill as a separate issue 
 and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1938) Kerberos authentication for the timeline server

2014-05-20 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1938:
--

Attachment: YARN-1938.3.patch

Thanks for the review, Vinod and Varun. I added the configs into 
yarn-default.xml as well in the newest patch.

 Kerberos authentication for the timeline server
 ---

 Key: YARN-1938
 URL: https://issues.apache.org/jira/browse/YARN-1938
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004138#comment-14004138
 ] 

Hadoop QA commented on YARN-1938:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645907/YARN-1938.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3780//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3780//console

This message is automatically generated.

 Kerberos authentication for the timeline server
 ---

 Key: YARN-1938
 URL: https://issues.apache.org/jira/browse/YARN-1938
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004146#comment-14004146
 ] 

Hadoop QA commented on YARN-2074:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645906/YARN-2074.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3781//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3781//console

This message is automatically generated.

 Preemption of AM containers shouldn't count towards AM failures
 ---

 Key: YARN-2074
 URL: https://issues.apache.org/jira/browse/YARN-2074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He
 Attachments: YARN-2074.1.patch, YARN-2074.2.patch


 One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
 containers getting preempted shouldn't count towards AM failures and thus 
 shouldn't eventually fail applications.
 We should explicitly handle AM container preemption/kill as a separate issue 
 and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2073:
---

Attachment: yarn-2073-1.patch

Added a unit test - the test fails without the fix. Also, moved a bunch of 
helper code from TestFairScheduler to FairSchedulerTestBase.

 FairScheduler starts preempting resources even with free resources on the 
 cluster
 -

 Key: YARN-2073
 URL: https://issues.apache.org/jira/browse/YARN-2073
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-2073-0.patch, yarn-2073-1.patch


 Preemption should kick in only when the currently available slots don't match 
 the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse

2014-05-20 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004166#comment-14004166
 ] 

Ming Ma commented on YARN-1897:
---

Chatted with Gera offline. The definition of the SignalContainer* APIs is 
needed for other subtasks including YARN-1515, so we will resolve the 
SignalContainer* API issues in this jira. After it is done, other subtasks can 
continue. Here are a couple of open issues.

1. Support for a list of containers. The latest patch in this jira just 
supports a flat list of signalContainerRequests, regardless of whether they are 
for the same container or not. Gera's patch in YARN-1515 groups all commands 
for the same container together via signalContainerRequest.getSignals(). Either 
approach works. I don't have a strong preference either way, given the most 
common use case is a single container, although signalContainers is more 
consistent with startContainers.

2. Support for SIGTERM + delay + SIGKILL as used in stopContainers. The latest 
YARN-1515 patch introduces a Pause method so that containers can pause in 
between signals. We need something like that to support the YARN-1515 scenario, 
or we can provide a new SignalContainerCommand like sleep.

Really appreciate any comments on this.
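
To make the two options in point 1 concrete, here is a rough sketch of the two 
request shapes (illustrative placeholders only, with String standing in for 
ContainerId and SignalContainerCommand; these are not actual API proposals):
{code}
import java.util.List;

// Option 1: a flat list with one entry per (container, command) pair.
class FlatSignalRequestSketch {
  static class Entry {
    String containerId;
    String command;
  }
  List<Entry> requests;
}

// Option 2 (YARN-1515 style): commands grouped per container, so a sequence
// like SIGTERM, pause, SIGKILL can be expressed for a single container.
class GroupedSignalRequestSketch {
  static class PerContainer {
    String containerId;
    List<String> signals;   // e.g. ["SIGTERM", "PAUSE", "SIGKILL"]
  }
  List<PerContainer> containers;
}
{code}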

 Define SignalContainerRequest and SignalContainerResponse
 -

 Key: YARN-1897
 URL: https://issues.apache.org/jira/browse/YARN-1897
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
 YARN-1897.1.patch


 We need to define SignalContainerRequest and SignalContainerResponse first as 
 they are needed by other sub tasks. SignalContainerRequest should use 
 OS-independent commands and provide a way to application to specify reason 
 for diagnosis. SignalContainerResponse might be empty.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-20 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004175#comment-14004175
 ] 

Wei Yan commented on YARN-2073:
---

[~kasha], if we move preemption-related test code to a separate .java file, we 
may also need to move the previous preemption-related test functions 
(testChoiceOfPreemptedContainers and testPreemptionDecision) to the new file. 

And so, as a next step, we'll divide TestFairScheduler into several test files 
according to different scheduler operations?

 FairScheduler starts preempting resources even with free resources on the 
 cluster
 -

 Key: YARN-2073
 URL: https://issues.apache.org/jira/browse/YARN-2073
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-2073-0.patch, yarn-2073-1.patch


 Preemption should kick in only when the currently available slots don't match 
 the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004181#comment-14004181
 ] 

Karthik Kambatla commented on YARN-2073:


bq. we may also need to move the previous preemption-related test functions 
(testChoiceOfPreemptedContainers and testPreemptionDecision) to the new file
Moving them might require slightly more work, and I was planning on doing that 
in a separate JIRA along with splitting the tests into multiple files. 

 FairScheduler starts preempting resources even with free resources on the 
 cluster
 -

 Key: YARN-2073
 URL: https://issues.apache.org/jira/browse/YARN-2073
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-2073-0.patch, yarn-2073-1.patch


 Preemption should kick in only when the currently available slots don't match 
 the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004198#comment-14004198
 ] 

Hudson commented on YARN-2050:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5607 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5607/])
YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by 
Ming Ma (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596310)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java


 Fix LogCLIHelpers to create the correct FileContext
 ---

 Key: YARN-2050
 URL: https://issues.apache.org/jira/browse/YARN-2050
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-2050-2.patch, YARN-2050.patch


 LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus 
 the FileContext created isn't necessarily the FileContext for remote log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2073:
---

Attachment: yarn-2073-2.patch

Thanks Wei. Updated patch to address the nits.

 FairScheduler starts preempting resources even with free resources on the 
 cluster
 -

 Key: YARN-2073
 URL: https://issues.apache.org/jira/browse/YARN-2073
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch


 Preemption should kick in only when the currently available slots don't match 
 the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004241#comment-14004241
 ] 

Hadoop QA commented on YARN-2073:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645920/yarn-2073-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3783//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3783//console

This message is automatically generated.

 FairScheduler starts preempting resources even with free resources on the 
 cluster
 -

 Key: YARN-2073
 URL: https://issues.apache.org/jira/browse/YARN-2073
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch


 Preemption should kick in only when the currently available slots don't match 
 the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-20 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004279#comment-14004279
 ] 

Rohith commented on YARN-1366:
--

bq. Catching incorrect unregistration before registration should have always 
been there. Is this a regression in the patch or an existing bug.
This is not a bug in the existing code. Unregister in ApplicationMasterService 
checks whether the app is registered; otherwise it throws 
InvalidApplicationMasterRequestException.

bq. Should we consider the possibility of allowing unregister without register?
Yes, because we need to differentiate "the last heartbeat was sent by the AM to 
the RM, the RM restarted, and the application is unregistering" VS "the 
application master is sending unregister without registering".

 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, 
 YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-20 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004290#comment-14004290
 ] 

Sandy Ryza commented on YARN-2073:
--

There are some situations where preemption with free resources on the cluster 
is the right thing to do.

For example, if I'm requesting 2 GB containers, I have no resources, and 100 
nodes on the cluster each have 1GB remaining, containers should get preempted 
on my behalf.

There are also cases arising from requests with strict locality - the cluster 
might have resources available because I'm waiting on a subset of nodes.  (In 
this case, we'd probably want to make sure preemption only happens on the nodes 
being waited for; otherwise we'd kill containers needlessly).

If the goal is to make sure that we aren't preempting on behalf of an 
application that's actually receiving resources, it might also be worth 
considering time-based approaches. E.g. only preempt on behalf of an 
application that hasn't received resources in some amount of time.

 FairScheduler starts preempting resources even with free resources on the 
 cluster
 -

 Key: YARN-2073
 URL: https://issues.apache.org/jira/browse/YARN-2073
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch


 Preemption should kick in only when the currently available slots don't match 
 the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore

2014-05-20 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated YARN-2030:


Attachment: YARN-2030.v2.patch

Attaching v2 patch to fix findbugs warnings.

 Use StateMachine to simplify handleStoreEvent() in RMStateStore
 ---

 Key: YARN-2030
 URL: https://issues.apache.org/jira/browse/YARN-2030
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Junping Du
Assignee: Binglin Chang
 Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch


 Now the logic to handle different store events in handleStoreEvent() is as 
 following:
 {code}
 if (event.getType().equals(RMStateStoreEventType.STORE_APP)
 || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
 ...
   } else {
 ...
   }
   ...
   try {
 if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
   ...
 } else {
   ...
 }
   } 
   ...
 } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
 || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
 ...
   } else {
 ...
   }
 ...
 if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
   ...
 } else {
   ...
 }
   }
   ...
 } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
 ...
 } else {
   ...
 }
 }
 {code}
 This is not only confuse people but also led to mistake easily. We may 
 leverage state machine to simply this even no state transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute

2014-05-20 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004307#comment-14004307
 ] 

Sandy Ryza commented on YARN-2012:
--

{code}
+  defaultQueueName = "root." + defaultQueueName;
{code}
This should go inside the initializeFromXml method.

{code}
+if (configuredQueues.get(FSQueueType.LEAF).contains(defaultQueueName)
+|| configuredQueues.get(FSQueueType.PARENT).contains(
+defaultQueueName)) {
+  return defaultQueueName;
+}
+  }
    return "root." + YarnConfiguration.DEFAULT_QUEUE_NAME;
{code}
I think it's a little confusing for the rule to fall back to default.  Can we 
let this part be handled by the create logic in assignAppToQueue?

 Fair Scheduler : Default rule in queue placement policy can take a queue as 
 an optional attribute
 -

 Key: YARN-2012
 URL: https://issues.apache.org/jira/browse/YARN-2012
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt


 Currently the 'default' rule in the queue placement policy, if applied, puts the
 app in the root.default queue. It would be great if we could make the 'default'
 rule optionally point to a different queue as the default queue. This queue
 should be an existing queue; if not, we fall back to root.default, hence keeping
 this rule terminal.
 This default queue can be a leaf queue, or it can also be a parent queue if the
 'default' rule is nested inside the nestedUserQueue rule (YARN-1864).
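 As a small sketch of the behavior described above (method name and plumbing are
 hypothetical; FSQueueType and YarnConfiguration.DEFAULT_QUEUE_NAME are the types
 already used in the review snippet earlier in this thread):
 {code}
 // Sketch only: resolve the 'default' rule's target queue, falling back to
 // root.default when the configured queue does not exist so the rule stays
 // terminal. Uses java.util.Map and java.util.Set.
 String resolveDefaultQueue(Map<FSQueueType, Set<String>> configuredQueues,
     String configuredDefault) {
   if (configuredDefault != null) {
     String name = configuredDefault.startsWith("root.")
         ? configuredDefault : "root." + configuredDefault;
     if (configuredQueues.get(FSQueueType.LEAF).contains(name)
         || configuredQueues.get(FSQueueType.PARENT).contains(name)) {
       return name;
     }
   }
   return "root." + YarnConfiguration.DEFAULT_QUEUE_NAME;
 }
 {code}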



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster

2014-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004315#comment-14004315
 ] 

Karthik Kambatla commented on YARN-2073:


Sandy - you make very good points. In other words, we want an
absoluteMinSharePreemptionTimeout. The question then becomes whether we should
express this as a separate timeout config or as a scaling factor that
determines the absolute timeout for both min-share and fair-share. Also, we
could make it a per-queue config or a single factor for the whole cluster.

Eventually, we need a better story for preemption. Currently it is like a spray
gun: we preempt some resources and hope that helps the application. Instead, we
should preempt resources that match the application's ask; in that case, this
new config would be moot.
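For illustration, the scaling-factor variant might look something like the
sketch below; the factor and method names are hypothetical, not existing
FairScheduler configuration:
{code}
// Sketch only, hypothetical names: derive an absolute timeout by scaling the
// existing min-share (or fair-share) preemption timeout, and preempt only
// after the scaled timeout has elapsed.
class PreemptionTimeouts {
  static long absoluteTimeoutMs(long configuredTimeoutMs, double scalingFactor) {
    // scalingFactor > 1 delays preemption, e.g. while the cluster still has free capacity
    return (long) (configuredTimeoutMs * scalingFactor);
  }

  static boolean timeoutExpired(long lastTimeAtShareMs, long nowMs,
      long configuredTimeoutMs, double scalingFactor) {
    return (nowMs - lastTimeAtShareMs) > absoluteTimeoutMs(configuredTimeoutMs, scalingFactor);
  }
}
{code}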

 FairScheduler starts preempting resources even with free resources on the 
 cluster
 -

 Key: YARN-2073
 URL: https://issues.apache.org/jira/browse/YARN-2073
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch


 Preemption should kick in only when the currently available slots don't match 
 the request. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore

2014-05-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004335#comment-14004335
 ] 

Junping Du commented on YARN-2030:
--

Hi [~decster], thanks for taking on this effort. I will review your patch.

 Use StateMachine to simplify handleStoreEvent() in RMStateStore
 ---

 Key: YARN-2030
 URL: https://issues.apache.org/jira/browse/YARN-2030
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Junping Du
Assignee: Binglin Chang
 Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch


 Currently, the logic for handling the different store events in
 handleStoreEvent() is as follows:
 {code}
 if (event.getType().equals(RMStateStoreEventType.STORE_APP)
 || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
 ...
   } else {
 ...
   }
   ...
   try {
 if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
   ...
 } else {
   ...
 }
   } 
   ...
 } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)
 || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) {
   ...
   if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
 ...
   } else {
 ...
   }
 ...
 if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) {
   ...
 } else {
   ...
 }
   }
   ...
 } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) {
 ...
 } else {
   ...
 }
 }
 {code}
 This not only confuses people but also makes mistakes easy. We may leverage a
 state machine to simplify this, even though there are no real state transitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2081) TestDistributedShell fails after YARN-1962

2014-05-20 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2081:
-

 Summary: TestDistributedShell fails after YARN-1962
 Key: YARN-2081
 URL: https://issues.apache.org/jira/browse/YARN-2081
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor


java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2081) TestDistributedShell fails after YARN-1962

2014-05-20 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2081:
--

Attachment: YARN-2081.patch

 TestDistributedShell fails after YARN-1962
 --

 Key: YARN-2081
 URL: https://issues.apache.org/jira/browse/YARN-2081
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2081.patch


 java.lang.AssertionError: expected:<1> but was:<0>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:555)
 at org.junit.Assert.assertEquals(Assert.java:542)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2051) Add more unit tests for PBImpl that didn't get covered

2014-05-20 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004371#comment-14004371
 ] 

Binglin Chang commented on YARN-2051:
-

I thought about this. Most of the PB serde validation involves the following
procedure:
1. Set a property on the record using a value (v0).
2. Get the proto object from the record.
3. Create a new record from the proto object.
4. Get the property from the new record (v1) and validate that v0 == v1.
This can be automated for all set/get pairs: we just need to use reflection to
find all get/set pairs of the record class and test each pair (see the sketch
below). By doing this we save a lot of testing code, and when we add new
properties to a record in the future, there is no need to add or change the
testing code :)

Note: these records look like Java beans, but many of them do not follow strict
JavaBeans conventions. I tried to leverage commons-beanutils, but it does not
seem flexible enough, so we will post a patch soon.
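A minimal sketch of that reflection-based round-trip check, assuming the record
class exposes a public getProto() and a constructor taking the proto (as the
existing PBImpls typically do); the class and helper names below are
illustrative only:
{code}
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Sketch only: round-trip one property of a PBImpl record through its proto
// and verify that the value survives serialization/deserialization.
public class RecordRoundTripSketch {

  public static void verifyRoundTrip(Class<?> recordClass, String property, Object v0)
      throws Exception {
    // 1. set the property on a fresh record using v0
    Object record = recordClass.newInstance();
    invokeSingleArg(record, "set" + property, v0);

    // 2. get the proto object from the record (assumes a public getProto())
    Object proto = recordClass.getMethod("getProto").invoke(record);

    // 3. create a new record from the proto object
    Object rebuilt = newFromProto(recordClass, proto);

    // 4. get the property from the new record (v1) and validate v0 equals v1
    Object v1 = recordClass.getMethod("get" + property).invoke(rebuilt);
    if (!v0.equals(v1)) {
      throw new AssertionError(property + " lost in round trip: " + v0 + " vs " + v1);
    }
  }

  private static void invokeSingleArg(Object target, String name, Object arg) throws Exception {
    for (Method m : target.getClass().getMethods()) {
      if (m.getName().equals(name) && m.getParameterTypes().length == 1) {
        m.invoke(target, arg);
        return;
      }
    }
    throw new IllegalArgumentException("No method " + name + " on " + target.getClass());
  }

  private static Object newFromProto(Class<?> recordClass, Object proto) throws Exception {
    for (Constructor<?> c : recordClass.getConstructors()) {
      if (c.getParameterTypes().length == 1
          && c.getParameterTypes()[0].isAssignableFrom(proto.getClass())) {
        return c.newInstance(proto);
      }
    }
    throw new IllegalArgumentException("No proto constructor on " + recordClass);
  }
}
{code}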



 Add more unit tests for PBImpl that didn't get covered
 --

 Key: YARN-2051
 URL: https://issues.apache.org/jira/browse/YARN-2051
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Junping Du
Assignee: Binglin Chang
Priority: Critical

 From YARN-2016, we can see some bug could exist in PB implementation of 
 protocol. The bad news is most of these PBImpl don't have any unit test to 
 verify the info is not lost or changed after serialization/deserialization. 
 We should add more tests for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk

2014-05-20 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004373#comment-14004373
 ] 

Hong Zhiguo commented on YARN-1872:
---

Binglin, I got the same failure. The phenomenon and cause of your failure are
different from the one reported by Ted Yu.
I fixed it in YARN-2081.

 TestDistributedShell occasionally fails in trunk
 

 Key: YARN-1872
 URL: https://issues.apache.org/jira/browse/YARN-1872
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Hong Zhiguo
 Attachments: TestDistributedShell.out, YARN-1872.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
 TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
 TestDistributedShell#testDSShell timed out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)