[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-03-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-18:
---

Attachment: YARN-18-v5.patch

 Make locality in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, 
 MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, 
 MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, 
 YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, 
 YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.patch


 There are several classes in YARN's container assignment and task scheduling 
 algorithms that relate to data locality. This proposes to make those data 
 structures and algorithms (for example, SchedulerNode and RMNodeImpl) 
 pluggable, so that containers can also be preferred on locality levels other 
 than node-local and rack-local (such as nodegroup-local). The inner class 
 ScheduledRequests is made a package-level class so that it is easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.
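 A minimal sketch of the subclassing idea follows. The class and method names 
 are hypothetical simplifications, not the actual MapReduce ScheduledRequests 
 API; the point is only that a nodegroup-aware subclass adds one more locality 
 level between node-local and rack-local.
{code:java}
// Illustrative sketch only -- not the real ScheduledRequests implementation.
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class ScheduledRequestsSketch {
  protected final Map<String, List<String>> mapsHostMapping = new HashMap<>();
  protected final Map<String, List<String>> mapsRackMapping = new HashMap<>();

  void addRequest(String taskId, String host, String rack) {
    mapsHostMapping.computeIfAbsent(host, h -> new LinkedList<>()).add(taskId);
    mapsRackMapping.computeIfAbsent(rack, r -> new LinkedList<>()).add(taskId);
  }

  /** Pick a task for a container, preferring node-local, then rack-local. */
  String assign(String host, String rack) {
    List<String> nodeLocal = mapsHostMapping.get(host);
    if (nodeLocal != null && !nodeLocal.isEmpty()) {
      return nodeLocal.remove(0);
    }
    List<String> rackLocal = mapsRackMapping.get(rack);
    return (rackLocal != null && !rackLocal.isEmpty()) ? rackLocal.remove(0) : null;
  }
}

/** Nodegroup-aware variant: node-local, then nodegroup-local, then rack-local. */
class ScheduledRequestsWithNodeGroupSketch extends ScheduledRequestsSketch {
  private final Map<String, List<String>> mapsNodeGroupMapping = new HashMap<>();

  void addRequest(String taskId, String host, String nodeGroup, String rack) {
    super.addRequest(taskId, host, rack);
    mapsNodeGroupMapping.computeIfAbsent(nodeGroup, g -> new LinkedList<>()).add(taskId);
  }

  String assign(String host, String nodeGroup, String rack) {
    List<String> nodeLocal = mapsHostMapping.get(host);
    if (nodeLocal != null && !nodeLocal.isEmpty()) {
      return nodeLocal.remove(0);
    }
    List<String> groupLocal = mapsNodeGroupMapping.get(nodeGroup);
    if (groupLocal != null && !groupLocal.isEmpty()) {
      return groupLocal.remove(0);
    }
    return super.assign(host, rack); // fall back to rack-local or off-switch
  }
}
{code}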

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613676#comment-13613676
 ] 

Hudson commented on YARN-109:
-

Integrated in Hadoop-Yarn-trunk #167 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/167/])
YARN-109. .tmp file is not deleted for localized archives (Mayank Bansal 
via bobby) (Revision 1460723)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460723
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java


 .tmp file is not deleted for localized archives
 ---

 Key: YARN-109
 URL: https://issues.apache.org/jira/browse/YARN-109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Mayank Bansal
 Fix For: 3.0.0, 0.23.7, 2.0.5-beta

 Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, 
 YARN-109-trunk-3.patch, YARN-109-trunk-4.patch, YARN-109-trunk-5.patch, 
 YARN-109-trunk.patch


 When archives are localized they are initially created as a .tmp file and 
 unpacked from that file.  However the .tmp file is not deleted afterwards.
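 A sketch of the general fix pattern (plain java.nio here, not the actual 
 FSDownload change): whatever happens during unpacking, the intermediate .tmp 
 file should be removed afterwards.
{code:java}
// Illustrative sketch only -- shows the pattern, not FSDownload's real code.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LocalizeArchiveSketch {
  static void localize(Path tmpArchive, Path destDir) throws IOException {
    try {
      unpack(tmpArchive, destDir);        // unpack the downloaded archive
    } finally {
      Files.deleteIfExists(tmpArchive);   // always remove the .tmp file afterwards
    }
  }

  private static void unpack(Path archive, Path destDir) throws IOException {
    // placeholder for the real unpacking logic (tar/zip/jar extraction)
    Files.createDirectories(destDir);
  }
}
{code}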

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-498) Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613679#comment-13613679
 ] 

Hudson commented on YARN-498:
-

Integrated in Hadoop-Yarn-trunk #167 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/167/])
YARN-498. Unmanaged AM launcher does not set various constants in env for 
an AM, also does not handle failed AMs properly (Hitesh Shah via bikas) 
(Revision 1460954)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460954
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java


 Unmanaged AM launcher does not set various constants in env for an AM, also 
 does not handle failed AMs properly
 ---

 Key: YARN-498
 URL: https://issues.apache.org/jira/browse/YARN-498
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: YARN-498.1.patch, YARN-498.2.patch, YARN-498.3.patch, 
 YARN-498.4.patch, YARN-498.wip.patch


 Currently, it only sets the app attempt id which is really not required as 
 AMs are only expected to extract it from the container id.
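 For illustration, a hedged sketch of how an AM can derive its attempt id from 
 the container id found in its environment. The CONTAINER_ID env var name is 
 an assumption here, and ConverterUtils is used only as one convenient parser.
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

class AttemptIdFromEnvSketch {
  static ApplicationAttemptId currentAttemptId() {
    String containerIdStr = System.getenv("CONTAINER_ID"); // assumed env var name
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    return containerId.getApplicationAttemptId();
  }
}
{code}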

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-497) Yarn unmanaged-am launcher jar does not define a main class in its manifest

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613682#comment-13613682
 ] 

Hudson commented on YARN-497:
-

Integrated in Hadoop-Yarn-trunk #167 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/167/])
YARN-497. Yarn unmanaged-am launcher jar does not define a main class in 
its manifest (Hitesh Shah via bikas) (Revision 1460846)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460846
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/pom.xml


 Yarn unmanaged-am launcher jar does not define a main class in its manifest
 ---

 Key: YARN-497
 URL: https://issues.apache.org/jira/browse/YARN-497
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
  Labels: usability
 Attachments: YARN-497.1.patch


 The jar should have a mainClass defined to make it easier to use with the 
 hadoop jar command.
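 As an illustration of what "define a main class in its manifest" means, a 
 small self-contained check (plain java.util.jar, unrelated to the actual pom 
 change) that reads the Main-Class attribute a jar needs for plain 
 "hadoop jar <jar>" usage:
{code:java}
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

class ManifestMainClassCheck {
  /** Returns the Main-Class entry of the jar's manifest, or null if absent. */
  static String mainClassOf(String jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      Manifest mf = jar.getManifest();
      if (mf == null) {
        return null; // jar has no manifest at all
      }
      return mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }
  }
}
{code}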

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-469) Make scheduling mode in FS pluggable

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613678#comment-13613678
 ] 

Hudson commented on YARN-469:
-

Integrated in Hadoop-Yarn-trunk #167 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/167/])
YARN-469. Make scheduling mode in FS pluggable. (kkambatl via tucu) 
(Revision 1460961)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460961
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingAlgorithms.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingMode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FairSchedulingMode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FifoSchedulingMode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestComputeFairShares.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingMode.java


 Make scheduling mode in FS pluggable
 

 Key: YARN-469
 URL: https://issues.apache.org/jira/browse/YARN-469
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: scheduler
 Fix For: 2.0.5-beta

 Attachments: yarn-469.patch, yarn-469.patch, yarn-469.patch, 
 yarn-469.patch, yarn-469.patch


 Currently, scheduling mode in FS is limited to Fair and FIFO. The code 
 typically has an if condition at multiple places to determine the correct 
 course of action.
 Making the scheduling mode pluggable helps in simplifying this process, 
 particularly as we add new modes (DRF in this case).
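 A minimal sketch of the pluggable-mode idea, using hypothetical simplified 
 types rather than the FairScheduler's actual SchedulingMode and Schedulable 
 interfaces: each mode contributes its own comparator, so the queues no longer 
 need per-mode if conditions.
{code:java}
import java.util.Comparator;
import java.util.List;

interface SchedulableSketch {
  int getPriority();
  long getStartTime();
  double getRunningShare(); // e.g. current usage divided by fair share
}

abstract class SchedulingModeSketch {
  abstract Comparator<SchedulableSketch> getComparator();

  /** Queues sort their schedulables with the mode's comparator before assigning. */
  void sortForAssignment(List<SchedulableSketch> schedulables) {
    schedulables.sort(getComparator());
  }
}

class FifoSchedulingModeSketch extends SchedulingModeSketch {
  @Override
  Comparator<SchedulableSketch> getComparator() {
    return Comparator.comparingInt(SchedulableSketch::getPriority)
        .thenComparingLong(SchedulableSketch::getStartTime);
  }
}

class FairSchedulingModeSketch extends SchedulingModeSketch {
  @Override
  Comparator<SchedulableSketch> getComparator() {
    // least-served first: the smaller running share gets containers sooner
    return Comparator.comparingDouble(SchedulableSketch::getRunningShare);
  }
}
{code}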

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-439) Flatten NodeHeartbeatResponse

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613685#comment-13613685
 ] 

Hudson commented on YARN-439:
-

Integrated in Hadoop-Yarn-trunk #167 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/167/])
YARN-439. Flatten NodeHeartbeatResponse. Contributed by Xuan Gong. 
(Revision 1460811)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460811
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/HeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/HeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestRecordFactory.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java
* 

[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613732#comment-13613732
 ] 

Hudson commented on YARN-109:
-

Integrated in Hadoop-Hdfs-0.23-Build #565 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/565/])
svn merge -c 1460723 FIXES: YARN-109. .tmp file is not deleted for 
localized archives (Mayank Bansal via bobby) (Revision 1460734)

 Result = UNSTABLE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460734
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java


 .tmp file is not deleted for localized archives
 ---

 Key: YARN-109
 URL: https://issues.apache.org/jira/browse/YARN-109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Mayank Bansal
 Fix For: 3.0.0, 0.23.7, 2.0.5-beta

 Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, 
 YARN-109-trunk-3.patch, YARN-109-trunk-4.patch, YARN-109-trunk-5.patch, 
 YARN-109-trunk.patch


 When archives are localized they are initially created as a .tmp file and 
 unpacked from that file.  However the .tmp file is not deleted afterwards.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-498) Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613752#comment-13613752
 ] 

Hudson commented on YARN-498:
-

Integrated in Hadoop-Hdfs-trunk #1356 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/])
YARN-498. Unmanaged AM launcher does not set various constants in env for 
an AM, also does not handle failed AMs properly (Hitesh Shah via bikas) 
(Revision 1460954)

 Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460954
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java


 Unmanaged AM launcher does not set various constants in env for an AM, also 
 does not handle failed AMs properly
 ---

 Key: YARN-498
 URL: https://issues.apache.org/jira/browse/YARN-498
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: YARN-498.1.patch, YARN-498.2.patch, YARN-498.3.patch, 
 YARN-498.4.patch, YARN-498.wip.patch


 Currently, it only sets the app attempt id which is really not required as 
 AMs are only expected to extract it from the container id.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613749#comment-13613749
 ] 

Hudson commented on YARN-109:
-

Integrated in Hadoop-Hdfs-trunk #1356 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/])
YARN-109. .tmp file is not deleted for localized archives (Mayank Bansal 
via bobby) (Revision 1460723)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460723
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java


 .tmp file is not deleted for localized archives
 ---

 Key: YARN-109
 URL: https://issues.apache.org/jira/browse/YARN-109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Mayank Bansal
 Fix For: 3.0.0, 0.23.7, 2.0.5-beta

 Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, 
 YARN-109-trunk-3.patch, YARN-109-trunk-4.patch, YARN-109-trunk-5.patch, 
 YARN-109-trunk.patch


 When archives are localized they are initially created as a .tmp file and 
 unpacked from that file.  However the .tmp file is not deleted afterwards.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-497) Yarn unmanaged-am launcher jar does not define a main class in its manifest

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613755#comment-13613755
 ] 

Hudson commented on YARN-497:
-

Integrated in Hadoop-Hdfs-trunk #1356 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/])
YARN-497. Yarn unmanaged-am launcher jar does not define a main class in 
its manifest (Hitesh Shah via bikas) (Revision 1460846)

 Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460846
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/pom.xml


 Yarn unmanaged-am launcher jar does not define a main class in its manifest
 ---

 Key: YARN-497
 URL: https://issues.apache.org/jira/browse/YARN-497
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
  Labels: usability
 Attachments: YARN-497.1.patch


 The jar should have a mainClass defined to make it easier to use with the 
 hadoop jar command.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613747#comment-13613747
 ] 

Hudson commented on YARN-71:


Integrated in Hadoop-Hdfs-trunk #1356 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/])
YARN-71. Fix the NodeManager to clean up local-dirs on restart. Contributed 
by Xuan Gong. (Revision 1460808)

 Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460808
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java


 Ensure/confirm that the NodeManager cleans up local-dirs on restart
 ---

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.0.5-beta

 Attachments: YARN-71.10.patch, YARN-71.11.patch, YARN-71.12.patch, 
 YARN-71.13.patch, YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, 
 YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, 
 YARN-71.8.patch, YARN-71.9.patch


 We have to make sure that NodeManagers clean up their local files on restart.
 It may already work like that, in which case we should have tests validating 
 this.
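 An illustrative sketch only, using plain java.nio rather than the 
 NodeManager's actual ResourceLocalizationService code: on startup, wipe the 
 contents of each directory listed in yarn.nodemanager.local-dirs so files 
 from a previous run do not linger.
{code:java}
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

class LocalDirCleanupSketch {
  /** Cleans each local dir given as a comma-separated list (the config value). */
  static void cleanLocalDirs(String commaSeparatedDirs) throws IOException {
    for (String dir : commaSeparatedDirs.split(",")) {
      Path root = Paths.get(dir.trim());
      if (!Files.isDirectory(root)) {
        continue;
      }
      try (Stream<Path> walk = Files.walk(root)) {
        walk.sorted(Comparator.reverseOrder())   // delete children before parents
            .filter(p -> !p.equals(root))        // keep the local dir itself
            .forEach(LocalDirCleanupSketch::deleteQuietly);
      }
    }
  }

  private static void deleteQuietly(Path p) {
    try {
      Files.delete(p);
    } catch (IOException ignored) {
      // best effort: anything that cannot be removed is skipped here
    }
  }
}
{code}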

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-439) Flatten NodeHeartbeatResponse

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613758#comment-13613758
 ] 

Hudson commented on YARN-439:
-

Integrated in Hadoop-Hdfs-trunk #1356 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/])
YARN-439. Flatten NodeHeartbeatResponse. Contributed by Xuan Gong. 
(Revision 1460811)

 Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460811
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/HeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/HeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestRecordFactory.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java
* 

[jira] [Commented] (YARN-469) Make scheduling mode in FS pluggable

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613751#comment-13613751
 ] 

Hudson commented on YARN-469:
-

Integrated in Hadoop-Hdfs-trunk #1356 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/])
YARN-469. Make scheduling mode in FS pluggable. (kkambatl via tucu) 
(Revision 1460961)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460961
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingAlgorithms.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingMode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FairSchedulingMode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FifoSchedulingMode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestComputeFairShares.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingMode.java


 Make scheduling mode in FS pluggable
 

 Key: YARN-469
 URL: https://issues.apache.org/jira/browse/YARN-469
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: scheduler
 Fix For: 2.0.5-beta

 Attachments: yarn-469.patch, yarn-469.patch, yarn-469.patch, 
 yarn-469.patch, yarn-469.patch


 Currently, scheduling mode in FS is limited to Fair and FIFO. The code 
 typically has an if condition at multiple places to determine the correct 
 course of action.
 Making the scheduling mode pluggable helps in simplifying this process, 
 particularly as we add new modes (DRF in this case).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613757#comment-13613757
 ] 

Hudson commented on YARN-378:
-

Integrated in Hadoop-Hdfs-trunk #1356 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/])
YARN-378. Fix RM to make the AM max attempts/retries to be configurable per 
application by clients. Contributed by Zhijie Shen. (Revision 1460895)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460895
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java


 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Fix For: 2.0.5-beta

 Attachments: YARN-378_10.patch, YARN-378_11.patch, YARN-378_1.patch, 
 YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, 
 YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, 
 YARN-378_9.patch, YARN_378-final-commit.patch, 
 YARN-378_MAPREDUCE-5062.2.patch, YARN-378_MAPREDUCE-5062.patch


 We should support that different clients or users can have different 
 ApplicationMaster retry times. That is to say, 
 yarn.resourcemanager.am.max-retries should be set by the client. 

[jira] [Commented] (YARN-498) Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614018#comment-13614018
 ] 

Hudson commented on YARN-498:
-

Integrated in Hadoop-Mapreduce-trunk #1384 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1384/])
YARN-498. Unmanaged AM launcher does not set various constants in env for 
an AM, also does not handle failed AMs properly (Hitesh Shah via bikas) 
(Revision 1460954)

 Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460954
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java


 Unmanaged AM launcher does not set various constants in env for an AM, also 
 does not handle failed AMs properly
 ---

 Key: YARN-498
 URL: https://issues.apache.org/jira/browse/YARN-498
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: YARN-498.1.patch, YARN-498.2.patch, YARN-498.3.patch, 
 YARN-498.4.patch, YARN-498.wip.patch


 Currently, it only sets the app attempt id which is really not required as 
 AMs are only expected to extract it from the container id.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-469) Make scheduling mode in FS pluggable

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614017#comment-13614017
 ] 

Hudson commented on YARN-469:
-

Integrated in Hadoop-Mapreduce-trunk #1384 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1384/])
YARN-469. Make scheduling mode in FS pluggable. (kkambatl via tucu) 
(Revision 1460961)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460961
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingAlgorithms.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingMode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FairSchedulingMode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FifoSchedulingMode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestComputeFairShares.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingMode.java


 Make scheduling mode in FS pluggable
 

 Key: YARN-469
 URL: https://issues.apache.org/jira/browse/YARN-469
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: scheduler
 Fix For: 2.0.5-beta

 Attachments: yarn-469.patch, yarn-469.patch, yarn-469.patch, 
 yarn-469.patch, yarn-469.patch


 Currently, scheduling mode in FS is limited to Fair and FIFO. The code 
 typically has an if condition at multiple places to determine the correct 
 course of action.
 Making the scheduling mode pluggable helps in simplifying this process, 
 particularly as we add new modes (DRF in this case).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-497) Yarn unmanaged-am launcher jar does not define a main class in its manifest

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614021#comment-13614021
 ] 

Hudson commented on YARN-497:
-

Integrated in Hadoop-Mapreduce-trunk #1384 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1384/])
YARN-497. Yarn unmanaged-am launcher jar does not define a main class in 
its manifest (Hitesh Shah via bikas) (Revision 1460846)

 Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460846
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/pom.xml


 Yarn unmanaged-am launcher jar does not define a main class in its manifest
 ---

 Key: YARN-497
 URL: https://issues.apache.org/jira/browse/YARN-497
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
  Labels: usability
 Attachments: YARN-497.1.patch


 The jar should have a mainClass defined to make it easier to use with the 
 hadoop jar command.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-439) Flatten NodeHeartbeatResponse

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614024#comment-13614024
 ] 

Hudson commented on YARN-439:
-

Integrated in Hadoop-Mapreduce-trunk #1384 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1384/])
YARN-439. Flatten NodeHeartbeatResponse. Contributed by Xuan Gong. 
(Revision 1460811)

 Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1460811
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/HeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/HeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestRecordFactory.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java
* 

[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-26 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614036#comment-13614036
 ] 

Robert Joseph Evans commented on YARN-378:
--

Hitesh and Vinod,

It is not a big deal. I realized that both were going in, and I am glad that 
this is ready and has gone in.  It is a great feature. It just would have been 
nice to either commit them at the same time, or give a heads up on the mailing 
list that you were going to break the build for a little while.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Fix For: 2.0.5-beta

 Attachments: YARN-378_10.patch, YARN-378_11.patch, YARN-378_1.patch, 
 YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, 
 YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, 
 YARN-378_9.patch, YARN_378-final-commit.patch, 
 YARN-378_MAPREDUCE-5062.2.patch, YARN-378_MAPREDUCE-5062.patch


 We should support that different clients or users can have different 
 ApplicationMaster retry times. That is to say, 
 yarn.resourcemanager.am.max-retries should be set by the client. 
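 A hedged client-side sketch of what per-application retry configuration 
 looks like; the setter name setMaxAppAttempts is an assumption taken from 
 this patch, so check the committed ApplicationSubmissionContext for the 
 final API.
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

class PerAppRetriesSketch {
  static ApplicationSubmissionContext contextWithRetries(int maxAttempts) {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    // Per-application value; the RM still caps it by the cluster-wide
    // yarn.resourcemanager.am.max-retries setting.
    ctx.setMaxAppAttempts(maxAttempts); // assumed setter name from this patch
    return ctx;
  }
}
{code}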

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614051#comment-13614051
 ] 

Hadoop QA commented on YARN-7:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12575494/YARN-7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/602//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/602//console

This message is automatically generated.

 Add support for DistributedShell to ask for CPUs along with memory
 --

 Key: YARN-7
 URL: https://issues.apache.org/jira/browse/YARN-7
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Arun C Murthy
  Labels: patch
 Attachments: YARN-7.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory

2013-03-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-7:
-

Assignee: Junping Du

 Add support for DistributedShell to ask for CPUs along with memory
 --

 Key: YARN-7
 URL: https://issues.apache.org/jira/browse/YARN-7
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Arun C Murthy
Assignee: Junping Du
  Labels: patch
 Attachments: YARN-7.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-03-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-193:
-

Attachment: YARN-193.6.patch

Based on @Hitesh's previous patch, I've made the following changes in the 
newest one:

1. Modify the boundary case for judging a valid resource value (> 0 => >= 0).

2. maxMem no longer needs to be a multiple of minMem.

3. To fix YARN-382, in RMAppManager the AM CLC still needs to be updated after 
request normalization is executed, so that the AM CLC knows the updated resource 
if possible, which will be equal to the resource of the allocated container. To 
ensure the equivalence, an assert is added in 
RMAppAttemptImpl$AMContainerAllocatedTransition. The changes from YARN-370 are 
also reverted.

Therefore, once this jira is fixed, YARN-382 can be fixed as well.

4. InvalidResourceException, which extends IOException, is created and used 
when the requested resource is numerically invalid. The related functions are 
modified to either throw or catch the exception. In particular, in the 
transitions of RMAppAttemptImpl, when the exception is caught the attempt 
transitions to the FAILED state.

When YARN-142 gets fixed, this customized exception will need to be updated.

5. Reorganize the code.

6. Add more test cases.

Comments, please. Thanks!
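
For readers following the discussion, a minimal sketch of the validation and 
normalization behaviour described in points 1, 2 and 4; the helper names and the 
plain IOException stand-in are illustrative, not taken from the patch:

{code}
// Illustrative sketch only, not the attached patch.
import java.io.IOException;

public final class ResourceRequestSketch {

  // Point 1: zero is now treated as a valid value (>= 0 rather than > 0);
  // anything negative or above the maximum allocation is rejected.
  public static void validate(int requestedMb, int maxMb) throws IOException {
    if (requestedMb < 0 || requestedMb > maxMb) {
      // stand-in for the InvalidResourceException mentioned in point 4
      throw new IOException("Invalid resource request: " + requestedMb
          + " MB, allowed range is [0, " + maxMb + "] MB");
    }
  }

  // Point 2: round up to a multiple of the minimum allocation and cap at the
  // maximum, without requiring the maximum to be a multiple of the minimum.
  public static int normalize(int requestedMb, int minMb, int maxMb) {
    int rounded = ((requestedMb + minMb - 1) / minMb) * minMb;
    return Math.min(rounded, maxMb);
  }
}
{code}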

 Scheduler.normalizeRequest does not account for allocation requests that 
 exceed maximumAllocation limits 
 -

 Key: YARN-193
 URL: https://issues.apache.org/jira/browse/YARN-193
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
 MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-03-26 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614355#comment-13614355
 ] 

Arun C Murthy commented on YARN-18:
---

Sorry, I'm just getting to this. This is a lot to digest.

Can we consider breaking this down to a couple of smaller patches? Tx.

 Make locality in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, 
 MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, 
 MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, 
 YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, 
 YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch


 There are several classes in YARN’s container assignment and task scheduling 
 algorithms that relate to data locality, which were updated to give preference 
 to running a container on another locality besides node-local and rack-local 
 (like nodegroup-local). This proposes to make these data structures/algorithms 
 pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-101) If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.

2013-03-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-101:
--

Assignee: Xuan Gong

 If the heartbeat message is lost, the nodestatus info of completed containers 
 will be lost too.
 

 Key: YARN-101
 URL: https://issues.apache.org/jira/browse/YARN-101
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: suse.
Reporter: xieguiming
Assignee: Xuan Gong
Priority: Minor
 Attachments: YARN-101.1.patch


 see the red color:
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
 protected void startStatusUpdater() {
   new Thread("Node Status Updater") {
     @Override
     @SuppressWarnings("unchecked")
     public void run() {
       int lastHeartBeatID = 0;
       while (!isStopped) {
         // Send heartbeat
         try {
           synchronized (heartbeatMonitor) {
             heartbeatMonitor.wait(heartBeatInterval);
           }
           {color:red}
           // Before we send the heartbeat, we get the NodeStatus,
           // whose method removes completed containers.
           NodeStatus nodeStatus = getNodeStatus();
           {color}
           nodeStatus.setResponseId(lastHeartBeatID);

           NodeHeartbeatRequest request = recordFactory
               .newRecordInstance(NodeHeartbeatRequest.class);
           request.setNodeStatus(nodeStatus);
           {color:red}
           // But if the nodeHeartbeat fails, we've already removed the completed
           // containers, so there is no way to know about them. We aren't
           // handling a nodeHeartbeat failure case here.
           HeartbeatResponse response =
               resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
           {color}
           if (response.getNodeAction() == NodeAction.SHUTDOWN) {
             LOG.info("Recieved SHUTDOWN signal from Resourcemanager as "
                 + "part of heartbeat, hence shutting down.");
             NodeStatusUpdaterImpl.this.stop();
             break;
           }
           if (response.getNodeAction() == NodeAction.REBOOT) {
             LOG.info("Node is out of sync with ResourceManager,"
                 + " hence rebooting.");
             NodeStatusUpdaterImpl.this.reboot();
             break;
           }
           lastHeartBeatID = response.getResponseId();
           List<ContainerId> containersToCleanup = response
               .getContainersToCleanupList();
           if (containersToCleanup.size() != 0) {
             dispatcher.getEventHandler().handle(
                 new CMgrCompletedContainersEvent(containersToCleanup));
           }
           List<ApplicationId> appsToCleanup =
               response.getApplicationsToCleanupList();
           // Only start tracking for keepAlive on FINISH_APP
           trackAppsForKeepAlive(appsToCleanup);
           if (appsToCleanup.size() != 0) {
             dispatcher.getEventHandler().handle(
                 new CMgrCompletedAppsEvent(appsToCleanup));
           }
         } catch (Throwable e) {
           // TODO Better error handling. Thread can die with the rest of the
           // NM still running.
           LOG.error("Caught exception in status-updater", e);
         }
       }
     }
   }.start();
 }

 private NodeStatus getNodeStatus() {
   NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
   nodeStatus.setNodeId(this.nodeId);
   int numActiveContainers = 0;
   List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>();
   for (Iterator<Entry<ContainerId, Container>> i =
       this.context.getContainers().entrySet().iterator(); i.hasNext();) {
     Entry<ContainerId, Container> e = i.next();
     ContainerId containerId = e.getKey();
     Container container = e.getValue();
     // Clone the container to send it to the RM
     org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus =
         container.cloneAndGetContainerStatus();
     containersStatuses.add(containerStatus);
     ++numActiveContainers;
     LOG.info("Sending out status for container: " + containerStatus);
     {color:red}
     // Here is the part that removes the completed containers.
     if (containerStatus.getState() == ContainerState.COMPLETE) {
       // Remove
       i.remove();
       {color}
       LOG.info("Removed completed container " + containerId);
     }
   }
   nodeStatus.setContainersStatuses(containersStatuses);
   LOG.debug(this.nodeId + " sending out status for "
       + numActiveContainers + " containers");
   NodeHealthStatus nodeHealthStatus = 

[jira] [Commented] (YARN-440) Flatten RegisterNodeManagerResponse

2013-03-26 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614381#comment-13614381
 ] 

Siddharth Seth commented on YARN-440:
-

+1. Committing.

 Flatten RegisterNodeManagerResponse
 ---

 Key: YARN-440
 URL: https://issues.apache.org/jira/browse/YARN-440
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: YARN-440.1.patch, YARN-440.2.patch, YARN-440.3.patch


 RegisterNodeManagerResponse has another wrapper RegistrationResponse under 
 it, which can be removed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-98) NM Application invalid state transition on reboot command from RM

2013-03-26 Thread omkar vinit joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-98?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

omkar vinit joshi reassigned YARN-98:
-

Assignee: omkar vinit joshi

 NM Application invalid state transition on reboot command from RM
 -

 Key: YARN-98
 URL: https://issues.apache.org/jira/browse/YARN-98
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Thomas Graves
Assignee: omkar vinit joshi

 If the RM goes down and comes back up, it tells the NM to reboot.  When the 
 NM reboots, if it has any applications it aggregates the logs for those 
 applications, then it transitions the app to 
 APPLICATION_LOG_HANDLING_FINISHED. I saw a case where there was an app that 
 was in the RUNNING state and tried to transition to 
 APPLICATION_LOG_HANDLING_FINISHED and it got an invalid transition.
  [DeletionService #1]2012-04-11 15:12:40,476 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Can't handle this event at current state
  [AsyncDispatcher event 
 handler]org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid 
 event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:382)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:517)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:509)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74)
 at java.lang.Thread.run(Thread.java:619)
 2012-04-11 15:12:40,476 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Application application_1333003059741_15999 transitioned from RUNNING to null

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed

2013-03-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-474:
-

Attachment: YARN-474.2.patch

Separate the fix for the specific problem of YARN-474 from that of YARN-209, to 
make the two issues independent, even though they share the same root 
cause.

 CapacityScheduler does not activate applications when configuration is 
 refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-474.1.patch, YARN-474.2.patch


 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.
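
 For reference, a sketch of the configuration change being exercised above (the 
 value is illustrative only); after editing capacity-scheduler.xml, the queues 
 are refreshed in place, e.g. with yarn rmadmin -refreshQueues, without 
 restarting the RM:

{code}
<!-- capacity-scheduler.xml (illustrative value): raising this limit should let
     more applications activate once the queues are refreshed. -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
{code}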

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-209) Capacity scheduler can leave application in pending state

2013-03-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-209:
-

Attachment: YARN-209.3.patch

Extract the fix specific to this issue from YARN-474, so that this issue is 
unblocked. @Bikas' end-to-end test case is retained but simplified, because it 
is a good example to demonstrate the problem described here.

 Capacity scheduler can leave application in pending state
 -

 Key: YARN-209
 URL: https://issues.apache.org/jira/browse/YARN-209
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Fix For: 3.0.0

 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, 
 YARN-209-test.patch


 Say application A is submitted but at that time it does not meet the bar for 
 activation because of resource limit settings for applications. After that if 
 more hardware is added to the system and the application becomes valid it 
 still remains in pending state, likely forever.
 This might be rare to hit in real life because enough NM's heartbeat to the 
 RM before applications can get submitted. But a change in settings or 
 heartbeat interval might make it easier to repro. In RM restart scenarios, 
 this will likely hit more if its implemented by re-playing events and 
 re-submitting applications to the scheduler before the RPC to NM's is 
 activated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

2013-03-26 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614402#comment-13614402
 ] 

Robert Joseph Evans commented on YARN-112:
--

I am not really sure that we fixed the underlying issue.  

{code}files.rename(dst_work, destDirPath, Rename.OVERWRITE);{code}

threw an exception because there was something else in that directory already, 
but files.mkdir(destDirPath, cachePerms, false) is supposed to throw a 
FileAlreadyExistsException if the directory already exists.  

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileContext.html#mkdir%28org.apache.hadoop.fs.Path,%20org.apache.hadoop.fs.permission.FsPermission,%20boolean%29

files.rename should never get into this situation if files.mkdir threw the 
exception when it was supposed to.

I tested this and 
{code}
FileContext lfc = FileContext.getLocalFSFileContext(new Configuration());
Path p = new Path("/tmp/bobby.12345");
FsPermission cachePerms = new FsPermission((short) 0755);
lfc.mkdir(p, cachePerms, false);
lfc.mkdir(p, cachePerms, false);
{code}

never throws an exception.  We first need to address the bug in FileContext, 
and then we can look at how we can make FSDownload deal with mkdir throwing an 
exception, or whatever the fix ends up being.

I filed HADOOP-9438 for this.

If the fix ends up being that we do not support throwing the exception in 
FileContext, then your current solution looks OK.

I also have a hard time believing that we are getting random collisions on a 
long value that should be fairly uniformly distributed.  We need to guard 
against it either way and I suppose it is possible, but if I remember correctly 
we were seeing a significant number of these errors and my gut tells me that 
there is either something very wrong with Random, or there is something else 
also going on here.

 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi
 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-209) Capacity scheduler can leave application in pending state

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614420#comment-13614420
 ] 

Hadoop QA commented on YARN-209:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12575556/YARN-209.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/605//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/605//console

This message is automatically generated.

 Capacity scheduler can leave application in pending state
 -

 Key: YARN-209
 URL: https://issues.apache.org/jira/browse/YARN-209
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Fix For: 3.0.0

 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, 
 YARN-209-test.patch


 Say application A is submitted but at that time it does not meet the bar for 
 activation because of resource limit settings for applications. After that if 
 more hardware is added to the system and the application becomes valid it 
 still remains in pending state, likely forever.
 This might be rare to hit in real life because enough NM's heartbeat to the 
 RM before applications can get submitted. But a change in settings or 
 heartbeat interval might make it easier to repro. In RM restart scenarios, 
 this will likely hit more if its implemented by re-playing events and 
 re-submitting applications to the scheduler before the RPC to NM's is 
 activated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614421#comment-13614421
 ] 

Hadoop QA commented on YARN-474:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/1257/YARN-474.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/604//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/604//console

This message is automatically generated.

 CapacityScheduler does not activate applications when configuration is 
 refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-474.1.patch, YARN-474.2.patch


 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-172) AM logs link in RM ui redirects back to RM if AM not started

2013-03-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-172:
-

Labels: usability  (was: )

 AM logs link in RM ui redirects back to RM if AM not started
 

 Key: YARN-172
 URL: https://issues.apache.org/jira/browse/YARN-172
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.3
Reporter: Thomas Graves
  Labels: usability

 I went to the RM UI app page for an application that failed to start with the 
 error:  org.apache.hadoop.security.AccessControlException: User user cannot 
 submit applications to queue root.foo 
 I tried to click on the AM logs link and it just redirected me back to the RM 
 page.  If the AM didn't start, we shouldn't show an attempt there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml

2013-03-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-20:


Component/s: documentation

 More information for yarn.resourcemanager.webapp.address in yarn-default.xml
 --

 Key: YARN-20
 URL: https://issues.apache.org/jira/browse/YARN-20
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation, resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: nemon lou
Priority: Trivial
 Attachments: YARN-20.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

   The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is 
 in host:port format, which is noted in the cluster setup guide 
 (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html).
   When I read through the code, I found that a host-only format is also 
 supported; in that format, the port will be random.
   So we may add more documentation to yarn-default.xml to make this easier to 
 understand. I will submit a patch if it's helpful.
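
 A minimal yarn-site.xml sketch of the two accepted forms (the host name and 
 port below are placeholders, not values from this report):

{code}
<!-- host:port form: the RM web UI binds to the given port. A bare host value
     such as "rm.example.com" is also accepted, in which case the port is
     chosen at random, which is the behaviour worth documenting. -->
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>rm.example.com:8088</value>
</property>
{code}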

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-432) Documentation for Log Aggregation and log retrieval.

2013-03-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-432:
-

Component/s: documentation

 Documentation for Log Aggregation and log retrieval.
 

 Key: YARN-432
 URL: https://issues.apache.org/jira/browse/YARN-432
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Mahadev konar
Assignee: Siddharth Seth

 Retrieving logs in 0.23 is very different from what 0.20.* does. This is a 
 very new feature which will require good documentation for users to get used 
 to it. Lets make sure we have some solid documentation for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable

2013-03-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614486#comment-13614486
 ] 

Sandy Ryza commented on YARN-24:


Thinking about this a little more, I didn't see a strong reason not to verify 
the root log dir each time.  This makes the log aggregation service resilient 
to the root directory being deleted or chmoded while a nodemanager is running.  
Uploaded a new patch that does this.
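
A sketch of that idea under assumed names (the attached patch may differ): the 
remote root log dir is re-verified before each aggregation so that a deleted or 
re-chmoded directory is repaired instead of failing at NM startup.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

final class RootLogDirCheck {
  // Illustrative only: recheck and, if needed, recreate or repair the remote
  // root log dir each time logs are about to be aggregated.
  static void verifyAndCreate(FileSystem remoteFs, Path rootLogDir)
      throws IOException {
    FsPermission expected = new FsPermission((short) 01777);
    if (!remoteFs.exists(rootLogDir)) {
      remoteFs.mkdirs(rootLogDir, expected);        // recreate if deleted
    } else if (!remoteFs.getFileStatus(rootLogDir).getPermission()
        .equals(expected)) {
      remoteFs.setPermission(rootLogDir, expected); // repair if chmoded
    }
  }
}
{code}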

 Nodemanager fails to start if log aggregation enabled and namenode unavailable
 --

 Key: YARN-24
 URL: https://issues.apache.org/jira/browse/YARN-24
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Sandy Ryza
 Attachments: YARN-24-1.patch, YARN-24-2.patch, YARN-24-3.patch, 
 YARN-24.patch


 If log aggregation is enabled and the namenode is currently unavailable, the 
 nodemanager fails to startup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false

2013-03-26 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614502#comment-13614502
 ] 

Daryn Sharp commented on YARN-503:
--

bq. Will the case of MR actions be an issue? that the launcher goes away?

No, the central focus of this patch is to keep tokens alive as long as _at 
least one job_ is using the tokens.  Upon job submission, the new app is 
immediately linked against the tokens.  So for an oozie action, it's ok for the 
launcher to exit after submitting an action.  The tokens will stay alive until 
the action, and any sub-jobs it may have launched, have completed.  After no 
app is running with the tokens, and the keepalive expires, the tokens are 
cancelled.

Note that by default I maintained 100% backwards compat, in that tokens for 
oozie jobs that set mapreduce.job.complete.cancel.delegation.tokens=false 
will never be cancelled.  The RM will stop renewing them and won't issue 
duplicate renews.  Until we deprecate/remove the setting, we may internally try 
to make the conf setting final to see what happens.

Will address findbugs after some webhdfs firefighting.
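
A minimal sketch of the reference-counting idea mentioned in the issue 
description, with hypothetical type parameters (not the attached patch):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Each shared token tracks the apps still using it; a token only becomes a
// candidate for cancellation once no app references it any more, and even then
// the caller waits out any keep-alive window before actually cancelling.
final class SharedTokenTracker<TOKEN, APPID> {
  private final Map<TOKEN, Set<APPID>> appsPerToken =
      new HashMap<TOKEN, Set<APPID>>();

  synchronized void register(TOKEN token, APPID app) {
    Set<APPID> apps = appsPerToken.get(token);
    if (apps == null) {
      apps = new HashSet<APPID>();
      appsPerToken.put(token, apps);
    }
    apps.add(app);
  }

  /** Returns true when no app uses the token any more and it may be cancelled. */
  synchronized boolean unregister(TOKEN token, APPID app) {
    Set<APPID> apps = appsPerToken.get(token);
    if (apps == null) {
      return true;
    }
    apps.remove(app);
    if (apps.isEmpty()) {
      appsPerToken.remove(token);
      return true;
    }
    return false;
  }
}
{code}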

 DelegationTokens will be renewed forever if multiple jobs share tokens and 
 the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
 --

 Key: YARN-503
 URL: https://issues.apache.org/jira/browse/YARN-503
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha
Reporter: Siddharth Seth
Assignee: Daryn Sharp
 Attachments: YARN-503.patch


 The first Job/App to register a token is the one which DelegationTokenRenewer 
 associates with a specific Token. An attempt to remove/cancel these shared 
 tokens by subsequent jobs doesn't work - since the JobId will not match.
 As a result, even if subsequent jobs have 
 MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be 
 cancelled when those jobs complete.
 Tokens will eventually be removed from the RM / JT when the service that 
 issued them considers them to have expired or via an explicit 
 cancelDelegationTokens call (not implemented yet in 23).
 A side effect of this is that the same delegation token will end up being 
 renewed multiple times (a separate TimerTask for each job which uses the 
 token).
 DelegationTokenRenewer could maintain a reference count/list of jobIds for 
 shared tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-509) ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable on secure clusters

2013-03-26 Thread Konstantin Boudnik (JIRA)
Konstantin Boudnik created YARN-509:
---

 Summary: ResourceTrackerPB misses KerberosInfo annotation which 
renders YARN unusable on secure clusters
 Key: YARN-509
 URL: https://issues.apache.org/jira/browse/YARN-509
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
 Environment: BigTop Kerberized cluster test environment
Reporter: Konstantin Boudnik
Priority: Blocker
 Fix For: 2.0.4-alpha, 3.0.0


During the BigTop 0.6.0 release test cycle, [~rvs] came across the following 
problem:
{noformat}
013-03-26 15:37:03,573 FATAL
org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting
NodeManager
org.apache.hadoop.yarn.YarnException: Failed to Start
org.apache.hadoop.yarn.server.nodemanager.NodeManager
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359)
Caused by: org.apache.avro.AvroRuntimeException:
java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:162)
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
... 3 more
Caused by: java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128)
at 
org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:158)
... 4 more
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not
authorized for protocol interface
org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client
Kerberos principal is yarn/ip-10-46-37-244.ec2.internal@BIGTOP
at org.apache.hadoop.ipc.Client.call(Client.java:1235)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy26.registerNodeManager(Unknown Source)
at 
org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
... 6 more

{noformat}

The most significant part is 
{{User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not 
authorized for protocol interface  
org.apache.hadoop.yarn.server.api.ResourceTrackerPB}} indicating that 
ResourceTrackerPB hasn't been annotated with either {{@KerberosInfo}} or 
{{@TokenInfo}}.
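
For context, a sketch of the kind of annotation the description says is missing; 
the interface name and the principal config key below are illustrative 
assumptions, not the eventual fix:

{code}
// Sketch only, not the committed fix: Hadoop RPC protocol interfaces are
// typically annotated so the server can authorize the expected Kerberos
// principal. The value of serverPrincipal names a configuration key.
@org.apache.hadoop.security.KerberosInfo(
    serverPrincipal = "yarn.resourcemanager.principal")
public interface ResourceTrackerLikeProtocol {
  // registerNodeManager(...), nodeHeartbeat(...) would be declared here
}
{code}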

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-03-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-193:
-

Attachment: YARN-193.7.patch

Clean up the warnings in TestRMAppAttemptTransitions and fix the broken test 
cases in it and TestClientRMService.

 Scheduler.normalizeRequest does not account for allocation requests that 
 exceed maximumAllocation limits 
 -

 Key: YARN-193
 URL: https://issues.apache.org/jira/browse/YARN-193
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
 MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, 
 YARN-193.7.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

2013-03-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614650#comment-13614650
 ] 

Vinod Kumar Vavilapalli commented on YARN-112:
--

Bobby, I too have seen this in large clusters/jobs - the law of large numbers :) We 
don't see the random number generator.

HADOOP-9438 will help, but instead of this solution, I think avoiding the race 
altogether by generating a deterministically unique destination path is a 
better solution. Something like localizer_id + random_num is a better 
destination path than a plain random number.
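
A tiny sketch of that naming scheme, with a hypothetical helper (not the 
committed change):

{code}
import org.apache.hadoop.fs.Path;

final class LocalizerPaths {
  // Combining a per-localizer identifier with the random suffix makes collisions
  // between two concurrent localizers on the same node impossible by construction.
  static Path uniqueDestination(Path baseDir, String localizerId, long randomNum) {
    return new Path(baseDir, localizerId + "_" + randomNum);
  }
}
{code}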

 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi
 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

2013-03-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614654#comment-13614654
 ] 

Vinod Kumar Vavilapalli commented on YARN-112:
--

bq. We don't see the random number generator.
I meant seed* .

 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi
 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-209) Capacity scheduler doesn't trigger app-activation after adding nodes

2013-03-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-209:
-

Summary: Capacity scheduler doesn't trigger app-activation after adding 
nodes  (was: Capacity scheduler can leave application in pending state)

 Capacity scheduler doesn't trigger app-activation after adding nodes
 

 Key: YARN-209
 URL: https://issues.apache.org/jira/browse/YARN-209
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Fix For: 3.0.0

 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, 
 YARN-209-test.patch


 Say application A is submitted but at that time it does not meet the bar for 
 activation because of resource limit settings for applications. After that if 
 more hardware is added to the system and the application becomes valid it 
 still remains in pending state, likely forever.
 This might be rare to hit in real life because enough NM's heartbeat to the 
 RM before applications can get submitted. But a change in settings or 
 heartbeat interval might make it easier to repro. In RM restart scenarios, 
 this will likely hit more if its implemented by re-playing events and 
 re-submitting applications to the scheduler before the RPC to NM's is 
 activated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-101) If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.

2013-03-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-101:
---

Attachment: YARN-101.2.patch

1. Recreate the patch based on the latest trunk version.
2. Add a new test case to test the patch.

 If the heartbeat message is lost, the nodestatus info of completed containers 
 will be lost too.
 

 Key: YARN-101
 URL: https://issues.apache.org/jira/browse/YARN-101
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: suse.
Reporter: xieguiming
Assignee: Xuan Gong
Priority: Minor
 Attachments: YARN-101.1.patch, YARN-101.2.patch


 see the red color:
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
 protected void startStatusUpdater() {
   new Thread("Node Status Updater") {
     @Override
     @SuppressWarnings("unchecked")
     public void run() {
       int lastHeartBeatID = 0;
       while (!isStopped) {
         // Send heartbeat
         try {
           synchronized (heartbeatMonitor) {
             heartbeatMonitor.wait(heartBeatInterval);
           }
           {color:red}
           // Before we send the heartbeat, we get the NodeStatus,
           // whose method removes completed containers.
           NodeStatus nodeStatus = getNodeStatus();
           {color}
           nodeStatus.setResponseId(lastHeartBeatID);

           NodeHeartbeatRequest request = recordFactory
               .newRecordInstance(NodeHeartbeatRequest.class);
           request.setNodeStatus(nodeStatus);
           {color:red}
           // But if the nodeHeartbeat fails, we've already removed the completed
           // containers, so there is no way to know about them. We aren't
           // handling a nodeHeartbeat failure case here.
           HeartbeatResponse response =
               resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
           {color}
           if (response.getNodeAction() == NodeAction.SHUTDOWN) {
             LOG.info("Recieved SHUTDOWN signal from Resourcemanager as "
                 + "part of heartbeat, hence shutting down.");
             NodeStatusUpdaterImpl.this.stop();
             break;
           }
           if (response.getNodeAction() == NodeAction.REBOOT) {
             LOG.info("Node is out of sync with ResourceManager,"
                 + " hence rebooting.");
             NodeStatusUpdaterImpl.this.reboot();
             break;
           }
           lastHeartBeatID = response.getResponseId();
           List<ContainerId> containersToCleanup = response
               .getContainersToCleanupList();
           if (containersToCleanup.size() != 0) {
             dispatcher.getEventHandler().handle(
                 new CMgrCompletedContainersEvent(containersToCleanup));
           }
           List<ApplicationId> appsToCleanup =
               response.getApplicationsToCleanupList();
           // Only start tracking for keepAlive on FINISH_APP
           trackAppsForKeepAlive(appsToCleanup);
           if (appsToCleanup.size() != 0) {
             dispatcher.getEventHandler().handle(
                 new CMgrCompletedAppsEvent(appsToCleanup));
           }
         } catch (Throwable e) {
           // TODO Better error handling. Thread can die with the rest of the
           // NM still running.
           LOG.error("Caught exception in status-updater", e);
         }
       }
     }
   }.start();
 }

 private NodeStatus getNodeStatus() {
   NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
   nodeStatus.setNodeId(this.nodeId);
   int numActiveContainers = 0;
   List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>();
   for (Iterator<Entry<ContainerId, Container>> i =
       this.context.getContainers().entrySet().iterator(); i.hasNext();) {
     Entry<ContainerId, Container> e = i.next();
     ContainerId containerId = e.getKey();
     Container container = e.getValue();
     // Clone the container to send it to the RM
     org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus =
         container.cloneAndGetContainerStatus();
     containersStatuses.add(containerStatus);
     ++numActiveContainers;
     LOG.info("Sending out status for container: " + containerStatus);
     {color:red}
     // Here is the part that removes the completed containers.
     if (containerStatus.getState() == ContainerState.COMPLETE) {
       // Remove
       i.remove();
       {color}
       LOG.info("Removed completed container " + containerId);
     }
   }
   nodeStatus.setContainersStatuses(containersStatuses);
 

[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-03-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614675#comment-13614675
 ] 

Junping Du commented on YARN-18:


Thanks Arun and Luke for the comments and review.
[~acmurthy], YARN-18 and YARN-19 are still part of HADOOP-8468. In that 
umbrella JIRA, I had a proposal that describes how these two changes (P6, P7) 
work in detail, which may help you review this patch. If necessary, I can 
update it to reflect the current changes (which address many of the comments 
above) and attach it again to this JIRA. Thx!

 Make locality in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, 
 MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, 
 MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, 
 YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, 
 YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch


 There are several classes in YARN’s container assignment and task scheduling 
 algorithms that relate to data locality, which were updated to give preference 
 to running a container on another locality besides node-local and rack-local 
 (like nodegroup-local). This proposes to make these data structures/algorithms 
 pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-509) ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable on secure clusters

2013-03-26 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614710#comment-13614710
 ] 

Roman Shaposhnik commented on YARN-509:
---

This is from Bigtop testing so I can make the cluster available for you (I'll 
need your public ssh key -- please send it to me offline pref. PGP encoded). 
Now, to answer your questions:

bq. What is security.resourcetracker.protocol.acl set to in your 
hadoop-policy.xml?

${HADOOP_YARN_USER}, which according to the process environment translates to yarn.

bq. What is yarn.nodemanager.principal in yarn-site.xml ?

yarn/_HOST@BIGTOP

bq. RMNMSecurityInfoClass.class and the text file 
org.apache.hadoop.security.SecurityInfo are on the classpath of ResourceManager?

Yes it is.

Please let me know if you need any more info or if you'd like to get access to 
the cluster.

 ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable 
 on secure clusters
 ---

 Key: YARN-509
 URL: https://issues.apache.org/jira/browse/YARN-509
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
 Environment: BigTop Kerberized cluster test environment
Reporter: Konstantin Boudnik
Priority: Blocker
 Fix For: 3.0.0, 2.0.4-alpha


 During the BigTop 0.6.0 release test cycle, [~rvs] came across the following 
 problem:
 {noformat}
 013-03-26 15:37:03,573 FATAL
 org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting
 NodeManager
 org.apache.hadoop.yarn.YarnException: Failed to Start
 org.apache.hadoop.yarn.server.nodemanager.NodeManager
 at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199)
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322)
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359)
 Caused by: org.apache.avro.AvroRuntimeException:
 java.lang.reflect.UndeclaredThrowableException
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:162)
 at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
 ... 3 more
 Caused by: java.lang.reflect.UndeclaredThrowableException
 at 
 org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128)
 at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61)
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:199)
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:158)
 ... 4 more
 Caused by: 
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
 User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not
 authorized for protocol interface
 org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client
 Kerberos principal is yarn/ip-10-46-37-244.ec2.internal@BIGTOP
 at org.apache.hadoop.ipc.Client.call(Client.java:1235)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at $Proxy26.registerNodeManager(Unknown Source)
 at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
 ... 6 more
 {noformat}
 The most significant part is 
 {{User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not 
 authorized for protocol interface  
 org.apache.hadoop.yarn.server.api.ResourceTrackerPB}} indicating that 
 ResourceTrackerPB hasn't been annotated with either {{@KerberosInfo}} or 
 {{@TokenInfo}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-510) Writing Yarn Applications documentation should be changed to signify use of fully qualified paths when localizing resources

2013-03-26 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-510:


 Summary: Writing Yarn Applications documentation should be changed 
to signify use of fully qualified paths when localizing resources
 Key: YARN-510
 URL: https://issues.apache.org/jira/browse/YARN-510
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.0-alpha
Reporter: Hitesh Shah
Assignee: Hitesh Shah


Path jarPath = new Path("/Working_HDFS_DIR/" + appId + "/" + AM_JAR);
fs.copyFromLocalFile(new Path("/local/src/AM.jar"), jarPath); // VALIDATED: jar is in HDFS under the correct path
FileStatus jarStatus = fs.getFileStatus(jarPath);
LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
amJarRsrc.setType(LocalResourceType.FILE);
amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
amJarRsrc.setTimestamp(jarStatus.getModificationTime());
amJarRsrc.setSize(jarStatus.getLen());
localResources.put("AppMaster.jar", amJarRsrc);
amContainer.setLocalResources(localResources);
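
As the summary suggests, the failure below appears to come from the path lacking 
a scheme and authority; a hedged sketch of the qualification step, reusing the 
variables above (FileSystem#makeQualified is standard API, the rest is 
illustrative):

{code}
// Sketch: qualify the path so the serialized URL carries a scheme and authority
// (e.g. hdfs://namenode:8020/...) instead of the bare path that fails to parse
// in the error logs below.
Path qualifiedJarPath = fs.makeQualified(jarPath);
amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(qualifiedJarPath));
{code}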

Error logs (nodeManager.log)

INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Application application_1364219323374_0016 transitioned from INITING to RUNNING
INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Got exception parsing AppMaster.jar and value resource {, port: -1, file: 
/Working_HDFS_DIR/application_1364219323374_0016/AM.jar, }, size: 13940, 
timestamp: 1364230436600, type: FILE, visibility: APPLICATION, 
2013-03-25 17:53:57,391 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Failed to parse resource-request
java.net.URISyntaxException: Expected scheme name at index 0: 
:///Working_HDFS_DIR/application_1364219323374_0016/AM.jar
at java.net.URI$Parser.fail(URI.java:2810)
at java.net.URI$Parser.failExpecting(URI.java:2816)
at java.net.URI$Parser.parse(URI.java:3008)
at java.net.URI.<init>(URI.java:735)
at 
org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:70)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:501)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:472)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:382)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMa

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-309) Make RM provide heartbeat interval to NM

2013-03-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-309:
---

Attachment: YARN-309.4.patch

1. Create new entries in YarnConfiguration to set the default value.
2. ResourceTrackerService now sets the heartbeat interval on every response, and 
the NM gets and uses this interval (a sketch of the NM side follows below).
3. Add a heartbeatInterval field to the .proto file.
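
A hedged sketch of the NM side, reusing the variables of the existing heartbeat 
loop; the accessor and the default constant are assumptions, not taken from the 
patch:

{code}
// If the RM supplied an interval in the last heartbeat response, use it for the
// next wait; otherwise fall back to the configured YarnConfiguration default.
long interval = response.getHeartbeatInterval() > 0
    ? response.getHeartbeatInterval()
    : DEFAULT_NM_RM_HEARTBEAT_INTERVAL_MS;  // hypothetical default constant
synchronized (heartbeatMonitor) {
  heartbeatMonitor.wait(interval);
}
{code}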

 Make RM provide heartbeat interval to NM
 

 Key: YARN-309
 URL: https://issues.apache.org/jira/browse/YARN-309
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, 
 YARN-309.4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-499) On container failure, include last n lines of logs in diagnostics

2013-03-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614738#comment-13614738
 ] 

Sandy Ryza commented on YARN-499:
-

Ravi,

The idea of putting the app master in a big try/catch seems good to me, but I 
was envisioning this JIRA to encompass something more general that would handle 
non-AM container logs, containers that OOM before getting into the main 
function, and containers that don't run java.  It's true that the approach I 
outlined doesn't deterministically report exceptions, but it at least gets us 
back to parity with MR1, and I believe that in most cases (and in all cases 
that I've seen), the end of the log contains the helpful information.
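For illustration, a minimal sketch of the "big try/catch" idea mentioned above, using a generic 
AM entry point; the class and structure are assumptions, not the MR AM's actual main.

class AppMasterMainSketch {
  public static void main(String[] args) {
    try {
      // ... normal ApplicationMaster initialization and run loop ...
    } catch (Throwable t) {
      // Last-resort reporting: without something like this, an early failure
      // can leave no useful diagnostics in the container logs at all.
      t.printStackTrace(System.err);
      System.exit(1);
    }
  }
}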

 On container failure, include last n lines of logs in diagnostics
 -

 Key: YARN-499
 URL: https://issues.apache.org/jira/browse/YARN-499
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 When a container fails, the only way to diagnose it is to look at the logs.  
 ContainerStatuses include a diagnostic string that is reported back to the 
 resource manager by the node manager.
 Currently in MR2 I believe whatever is sent to the task's standard out is 
 added to the diagnostics string, but for MR standard out is redirected to a 
 file called stdout.  In MR1, this string was populated with the last few 
 lines of the task's stdout file, and got printed to the console, allowing for 
 easy debugging.
 Handling this would help to soothe the infuriating problem of an AM dying for 
 a mysterious reason before setting a tracking URL (MAPREDUCE-3688).
 This could be done in one of two ways.
 * Use tee to send MR's standard out to both the stdout file and standard out. 
  This requires modifying ShellCmdExecutor to roll what it reads in, as we 
 wouldn't want to be storing the entire task log in NM memory.
 * Read the task's log files.  This would require standardizing or making the 
 container log files configurable.  Right now the log files are determined in 
 userland, and all that YARN is aware of is the log directory.
 Does this present any issues I'm not considering?  If so, might this only 
 be needed for AMs? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-499) On container failure, include last n lines of logs in diagnostics

2013-03-26 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-499:


Attachment: YARN-499.patch

 On container failure, include last n lines of logs in diagnostics
 -

 Key: YARN-499
 URL: https://issues.apache.org/jira/browse/YARN-499
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-499.patch


 When a container fails, the only way to diagnose it is to look at the logs.  
 ContainerStatuses include a diagnostic string that is reported back to the 
 resource manager by the node manager.
 Currently in MR2 I believe whatever is sent to the task's standard out is 
 added to the diagnostics string, but for MR standard out is redirected to a 
 file called stdout.  In MR1, this string was populated with the last few 
 lines of the task's stdout file, and got printed to the console, allowing for 
 easy debugging.
 Handling this would help to soothe the infuriating problem of an AM dying for 
 a mysterious reason before setting a tracking URL (MAPREDUCE-3688).
 This could be done in one of two ways.
 * Use tee to send MR's standard out to both the stdout file and standard out. 
  This requires modifying ShellCmdExecutor to roll what it reads in, as we 
 wouldn't want to be storing the entire task log in NM memory.
 * Read the task's log files.  This would require standardizing or making the 
 container log files configurable.  Right now the log files are determined in 
 userland, and all that YARN is aware of is the log directory.
 Does this present any issues I'm not considering?  If so, might this only 
 be needed for AMs? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-499) On container failure, include last n lines of logs in diagnostics

2013-03-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614752#comment-13614752
 ] 

Sandy Ryza commented on YARN-499:
-

Uploaded a patch that uses tee to send the standard out both to standard out 
and the stdout file.  The standard ShellCommandExecutor holds on to all of a 
container's standard output.  I replaced it with one that only holds on to the 
last 500 characters.  This also fixes an existing security issue that would 
allow a container to force an out-of-memory error in the nodemanager by feeding 
it a ton of output.

I've verified on a pseudo-distributed cluster that OOM errors on jvm 
initialization get printed to the console.
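For reference, a rough sketch of the rolling-buffer idea, assuming a standalone helper rather 
than the actual class in the patch: keep only the last N characters of the command's output 
instead of buffering all of it.

class TailBuffer {
  private final int limit;
  private final StringBuilder buf = new StringBuilder();

  TailBuffer(int limit) {
    this.limit = limit;
  }

  // Append new output, discarding the oldest characters once over the limit.
  void append(CharSequence chunk) {
    buf.append(chunk);
    int overflow = buf.length() - limit;
    if (overflow > 0) {
      buf.delete(0, overflow);
    }
  }

  // The retained tail, e.g. the last 500 characters, for the diagnostics string.
  String tail() {
    return buf.toString();
  }
}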

 On container failure, include last n lines of logs in diagnostics
 -

 Key: YARN-499
 URL: https://issues.apache.org/jira/browse/YARN-499
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-499.patch


 When a container fails, the only way to diagnose it is to look at the logs.  
 ContainerStatuses include a diagnostic string that is reported back to the 
 resource manager by the node manager.
 Currently in MR2 I believe whatever is sent to the task's standard out is 
 added to the diagnostics string, but for MR standard out is redirected to a 
 file called stdout.  In MR1, this string was populated with the last few 
 lines of the task's stdout file, and got printed to the console, allowing for 
 easy debugging.
 Handling this would help to soothe the infuriating problem of an AM dying for 
 a mysterious reason before setting a tracking URL (MAPREDUCE-3688).
 This could be done in one of two ways.
 * Use tee to send MR's standard out to both the stdout file and standard out. 
  This requires modifying ShellCmdExecutor to roll what it reads in, as we 
 wouldn't want to be storing the entire task log in NM memory.
 * Read the task's log files.  This would require standardizing or making the 
 container log files configurable.  Right now the log files are determined in 
 userland, and all that YARN is aware of is the log directory.
 Does this present any issues I'm not considering?  If so, might this only 
 be needed for AMs? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

2013-03-26 Thread omkar vinit joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614755#comment-13614755
 ] 

omkar vinit joshi commented on YARN-112:


Vinod's suggestion looks good to me, and it will in fact simplify the FSDownload 
logic. Adding a unique number generator (AtomicLong) to LocalResourcesTrackerImpl 
so that random (in our case now unique) number generation is centralized 
for public, private, as well as application cache files.
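A minimal sketch of that centralized counter, with assumed field and method names rather than 
the actual LocalResourcesTrackerImpl change.

import java.util.concurrent.atomic.AtomicLong;

class UniqueLocalDirSuffixSketch {
  // One counter per tracker, shared by public, private, and app-cache downloads.
  private final AtomicLong uniqueNumberGenerator = new AtomicLong();

  // Each localization gets a distinct suffix, so two containers localizing the
  // same resource can never race on the same temporary directory name.
  long nextUniqueNumber() {
    return uniqueNumberGenerator.incrementAndGet();
  }
}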

 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi
 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-112) Race in localization can cause containers to fail

2013-03-26 Thread omkar vinit joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

omkar vinit joshi updated YARN-112:
---

Attachment: yarn-112-20130326.patch

 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi
 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112-20130326.patch, yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed

2013-03-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-474:
-

Attachment: YARN-474.3.patch

@Vinod's comments are addressed in the newest patch. In addition, I've tested the 
patch on a one-node cluster and seen it work.

 CapacityScheduler does not activate applications when 
 maximum-am-resource-percent configuration is refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch


 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-440) Flatten RegisterNodeManagerResponse

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614792#comment-13614792
 ] 

Hudson commented on YARN-440:
-

Integrated in Hadoop-trunk-Commit #3531 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3531/])
YARN-440. Flatten RegisterNodeManagerResponse. Contributed by Xuan Gong. 
(Revision 1461256)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1461256
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/RegistrationResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/RegistrationResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestRMNMSecretKeys.java


 Flatten RegisterNodeManagerResponse
 ---

 Key: YARN-440
 URL: https://issues.apache.org/jira/browse/YARN-440
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Fix For: 2.0.5-beta

 Attachments: YARN-440.1.patch, YARN-440.2.patch, YARN-440.3.patch


 RegisterNodeManagerResponse has another wrapper RegistrationResponse under 
 it, which can be removed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-209) Capacity scheduler doesn't trigger app-activation after adding nodes

2013-03-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614819#comment-13614819
 ] 

Bikas Saha commented on YARN-209:
-

Patch looks good overall. I don't quite see what 
testActivatingPendingApplication() is buying us in its current form. If the 
leaf queue test fails before the fix and passes after it, then it should be 
enough IMO.

 Capacity scheduler doesn't trigger app-activation after adding nodes
 

 Key: YARN-209
 URL: https://issues.apache.org/jira/browse/YARN-209
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Fix For: 3.0.0

 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, 
 YARN-209-test.patch


 Say application A is submitted but at that time it does not meet the bar for 
 activation because of resource limit settings for applications. After that if 
 more hardware is added to the system and the application becomes valid it 
 still remains in pending state, likely forever.
 This might be rare to hit in real life because enough NMs heartbeat to the 
 RM before applications can get submitted. But a change in settings or 
 heartbeat interval might make it easier to repro. In RM restart scenarios, 
 this will likely hit more if it's implemented by re-playing events and 
 re-submitting applications to the scheduler before the RPC to NMs is 
 activated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614820#comment-13614820
 ] 

Hadoop QA commented on YARN-309:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12575621/YARN-309.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/608//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/608//console

This message is automatically generated.

 Make RM provide heartbeat interval to NM
 

 Key: YARN-309
 URL: https://issues.apache.org/jira/browse/YARN-309
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, 
 YARN-309.4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-03-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614833#comment-13614833
 ] 

Bikas Saha commented on YARN-193:
-

I am not sure if the normalization errors should reach all the way to the 
RMAppAttemptImpl and cause failures. The AM container request should be validated 
and normalized in ApplicationMasterService.submitApplication() as the first 
thing, even before sending it to RMAppManager. Task container requests should 
be validated in ApplicationMasterService.allocate() as the first thing before 
calling scheduler.allocate(). This is like a sanity check. It also ensures 
that we are not calling into the scheduler and changing its internal state (e.g. 
it could return a completed container or a newly allocated container, which would be 
lost if we throw an exception).
RMAppAttemptImpl could assert that the allocated container has the same size as the 
requested container.

Normalization should simply cap the resource to the max allowed. Normalize can 
be called from anywhere, so it's not necessary to always validate before 
normalizing. In fact, we could choose to normalize requests > max to max instead 
of throwing an exception (see the sketch below).

Validate should not throw an exception IMO. It's like a helper function that 
tells whether the value is valid or not. Different users can choose to do different 
things based on the result of validate().
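As a hedged illustration of the "cap to max" behavior described above (a hypothetical helper, 
not the actual Scheduler.normalizeRequest code):

final class ResourceNormalizerSketch {
  private ResourceNormalizerSketch() {
  }

  // Round the requested memory up to a multiple of the minimum allocation,
  // then cap it at the maximum allocation instead of throwing.
  static int normalizeMemory(int requestedMb, int minimumMb, int maximumMb) {
    if (requestedMb <= 0) {
      return minimumMb;
    }
    int rounded = ((requestedMb + minimumMb - 1) / minimumMb) * minimumMb;
    return Math.min(rounded, maximumMb);
  }

  // Validation stays a pure predicate; callers decide what to do when it is false.
  static boolean isValid(int requestedMb, int maximumMb) {
    return requestedMb > 0 && requestedMb <= maximumMb;
  }
}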

 Scheduler.normalizeRequest does not account for allocation requests that 
 exceed maximumAllocation limits 
 -

 Key: YARN-193
 URL: https://issues.apache.org/jira/browse/YARN-193
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
 MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, 
 YARN-193.7.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM

2013-03-26 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614847#comment-13614847
 ] 

Xuan Gong commented on YARN-309:


The test case fails because of a localhost binding problem.
Caused by: org.apache.hadoop.yarn.YarnException: java.net.BindException: Problem 
binding to [localhost:12345] java.net.BindException: Address already in use; 
For more details see:  http://wiki.apache.org/hadoop/BindException
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
at 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:63)
at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:52)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.start(ContainerManagerImpl.java:230)
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
... 11 more
Caused by: java.net.BindException: Problem binding to [localhost:12345] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:716)
at org.apache.hadoop.ipc.Server.bind(Server.java:415)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:518)
at org.apache.hadoop.ipc.Server.<init>(Server.java:1962)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:986)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:427)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:402)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:829)
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
... 15 more

 Make RM provide heartbeat interval to NM
 

 Key: YARN-309
 URL: https://issues.apache.org/jira/browse/YARN-309
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, 
 YARN-309.4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM

2013-03-26 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614848#comment-13614848
 ] 

Xuan Gong commented on YARN-309:


And it is not introduced by this patch.

 Make RM provide heartbeat interval to NM
 

 Key: YARN-309
 URL: https://issues.apache.org/jira/browse/YARN-309
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, 
 YARN-309.4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-499) On container failure, include last n lines of logs in diagnostics

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614863#comment-13614863
 ] 

Hadoop QA commented on YARN-499:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12575623/YARN-499.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  org.apache.hadoop.mapreduce.v2.app.TestRecovery
  
org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/606//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/606//console

This message is automatically generated.

 On container failure, include last n lines of logs in diagnostics
 -

 Key: YARN-499
 URL: https://issues.apache.org/jira/browse/YARN-499
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-499.patch


 When a container fails, the only way to diagnose it is to look at the logs.  
 ContainerStatuses include a diagnostic string that is reported back to the 
 resource manager by the node manager.
 Currently in MR2 I believe whatever is sent to the task's standard out is 
 added to the diagnostics string, but for MR standard out is redirected to a 
 file called stdout.  In MR1, this string was populated with the last few 
 lines of the task's stdout file, and got printed to the console, allowing for 
 easy debugging.
 Handling this would help to soothe the infuriating problem of an AM dying for 
 a mysterious reason before setting a tracking URL (MAPREDUCE-3688).
 This could be done in one of two ways.
 * Use tee to send MR's standard out to both the stdout file and standard out. 
  This requires modifying ShellCmdExecutor to roll what it reads in, as we 
 wouldn't want to be storing the entire task log in NM memory.
 * Read the task's log files.  This would require standardizing or making the 
 container log files configurable.  Right now the log files are determined in 
 userland, and all that YARN is aware of is the log directory.
 Does this present any issues I'm not considering?  If so, might this only 
 be needed for AMs? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614871#comment-13614871
 ] 

Hadoop QA commented on YARN-24:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12575571/YARN-24-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/612//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/612//console

This message is automatically generated.

 Nodemanager fails to start if log aggregation enabled and namenode unavailable
 --

 Key: YARN-24
 URL: https://issues.apache.org/jira/browse/YARN-24
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Sandy Ryza
 Attachments: YARN-24-1.patch, YARN-24-2.patch, YARN-24-3.patch, 
 YARN-24.patch


 If log aggregation is enabled and the namenode is currently unavailable, the 
 nodemanager fails to startup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-101) If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614872#comment-13614872
 ] 

Hadoop QA commented on YARN-101:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12575611/YARN-101.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/611//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/611//console

This message is automatically generated.

 If the heartbeat message is lost, the nodestatus info of completed containers 
 will be lost too.
 

 Key: YARN-101
 URL: https://issues.apache.org/jira/browse/YARN-101
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: suse.
Reporter: xieguiming
Assignee: Xuan Gong
Priority: Minor
 Attachments: YARN-101.1.patch, YARN-101.2.patch


 see the red color:
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
   protected void startStatusUpdater() {
     new Thread("Node Status Updater") {
       @Override
       @SuppressWarnings("unchecked")
       public void run() {
         int lastHeartBeatID = 0;
         while (!isStopped) {
           // Send heartbeat
           try {
             synchronized (heartbeatMonitor) {
               heartbeatMonitor.wait(heartBeatInterval);
             }
 {color:red}
             // Before we send the heartbeat, we get the NodeStatus,
             // whose method removes completed containers.
             NodeStatus nodeStatus = getNodeStatus();
 {color}
             nodeStatus.setResponseId(lastHeartBeatID);

             NodeHeartbeatRequest request = recordFactory
                 .newRecordInstance(NodeHeartbeatRequest.class);
             request.setNodeStatus(nodeStatus);
 {color:red}
             // But if the nodeHeartbeat fails, we've already removed the
             // containers away to know about it. We aren't handling a
             // nodeHeartbeat failure case here.
             HeartbeatResponse response =
                 resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
 {color}
             if (response.getNodeAction() == NodeAction.SHUTDOWN) {
               LOG.info("Recieved SHUTDOWN signal from Resourcemanager as "
                   + "part of heartbeat, hence shutting down.");
               NodeStatusUpdaterImpl.this.stop();
               break;
             }
             if (response.getNodeAction() == NodeAction.REBOOT) {
               LOG.info("Node is out of sync with ResourceManager,"
                   + " hence rebooting.");
               NodeStatusUpdaterImpl.this.reboot();
               break;
             }
             lastHeartBeatID = response.getResponseId();
             List<ContainerId> containersToCleanup = response
                 .getContainersToCleanupList();
             if (containersToCleanup.size() != 0) {
               dispatcher.getEventHandler().handle(
                   new CMgrCompletedContainersEvent(containersToCleanup));
             }
             List<ApplicationId> appsToCleanup =
                 response.getApplicationsToCleanupList();
             // Only start tracking for keepAlive on FINISH_APP
             trackAppsForKeepAlive(appsToCleanup);
             if (appsToCleanup.size() != 0) {
               dispatcher.getEventHandler().handle(
                   new CMgrCompletedAppsEvent(appsToCleanup));
             }
           } catch (Throwable e) {
             // TODO Better error handling. Thread can die with the rest of the
             // NM still running.
             LOG.error("Caught exception in status-updater",

[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable

2013-03-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614873#comment-13614873
 ] 

Sandy Ryza commented on YARN-24:


Verified the updated patch on a pseudo-distributed cluster as well.

 Nodemanager fails to start if log aggregation enabled and namenode unavailable
 --

 Key: YARN-24
 URL: https://issues.apache.org/jira/browse/YARN-24
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Sandy Ryza
 Attachments: YARN-24-1.patch, YARN-24-2.patch, YARN-24-3.patch, 
 YARN-24.patch


 If log aggregation is enabled and the namenode is currently unavailable, the 
 nodemanager fails to startup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614882#comment-13614882
 ] 

Hadoop QA commented on YARN-474:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12575633/YARN-474.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/613//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/613//console

This message is automatically generated.

 CapacityScheduler does not activate applications when 
 maximum-am-resource-percent configuration is refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch


 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed

2013-03-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614886#comment-13614886
 ] 

Vinod Kumar Vavilapalli commented on YARN-474:
--

The latest patch looks good, I am checking it in.

 CapacityScheduler does not activate applications when 
 maximum-am-resource-percent configuration is refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch


 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed

2013-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614890#comment-13614890
 ] 

Hudson commented on YARN-474:
-

Integrated in Hadoop-trunk-Commit #3532 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3532/])
YARN-474. Fix CapacityScheduler to trigger application-activation when 
am-resource-percent configuration is refreshed. Contributed by Zhijie Shen. 
(Revision 1461402)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1461402
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


 CapacityScheduler does not activate applications when 
 maximum-am-resource-percent configuration is refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Fix For: 2.0.5-beta

 Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch


 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-157) The option shell_command and shell_script have conflict

2013-03-26 Thread rainy Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614922#comment-13614922
 ] 

rainy Yu commented on YARN-157:
---

I can't add attachments. My patch is:
Index: src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
===================================================================
--- src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java  (revision 90765)
+++ src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java  (working copy)
@@ -140,8 +140,8 @@
   // Main class to invoke application master
   private String appMasterMainClass = "";
 
-  // Shell command to be executed 
-  private String shellCommand = ""; 
+  // Shell command to be executed. the Linux shell command '/bin/sh' is default
+  private String shellCommand = "/bin/sh"; 
   // Location of shell script
   private String shellScriptPath = "";
   // Args to be passed to the shell command
@@ -276,10 +276,11 @@
     appMasterMainClass = cliParser.getOptionValue("class",
         "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster");
 
-    if (!cliParser.hasOption("shell_command")) {
-      throw new IllegalArgumentException("No shell command specified to be executed by application master");
+    if (cliParser.hasOption("shell_command")) {
+      //throw new IllegalArgumentException("No shell command specified to be executed by application master");
+      shellCommand = cliParser.getOptionValue("shell_command");
     }
-    shellCommand = cliParser.getOptionValue("shell_command");
+    //shellCommand = cliParser.getOptionValue("shell_command");
 
     if (cliParser.hasOption("shell_script")) {
       shellScriptPath = cliParser.getOptionValue("shell_script");


 The option shell_command and shell_script have conflict
 ---

 Key: YARN-157
 URL: https://issues.apache.org/jira/browse/YARN-157
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.0.1-alpha
Reporter: Li Ming
  Labels: patch

 The DistributedShell has an option shell_script to let user specify a shell 
 script which will be executed in containers. But the issue is that the 
 shell_command option is a must, so if both options are set, then every 
 container executor will end with exitCode=1. This is because DistributedShell 
 executes the shell_command and shell_script together. For example, if 
 shell_command is 'date' then the final command to be executed in container is 
 date `ExecShellScript.sh`, so the date command will treat the result of 
 ExecShellScript.sh as its parameter, then there will be an error. 
 To solve this, the DistributedShell should not use the value of shell_command 
 option when the shell_script option is set, and the shell_command option also 
 should not be mandatory. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614929#comment-13614929
 ] 

Hadoop QA commented on YARN-112:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12575629/yarn-112-20130326.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/614//console

This message is automatically generated.

 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi
 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112-20130326.patch, yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-209) Capacity scheduler doesn't trigger app-activation after adding nodes

2013-03-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-209:
-

Attachment: YARN-209.4.patch

The log statement is removed and testActivatingPendingApplication is moved to 
TestRM and enhanced by checking the status before NM is added.

@Bikas, I agree the TestLeafQueue test alone is enough to verify the bug, but I think the 
test case that you provided before is valuable. Therefore, I included and 
updated it as a demonstration of activating pending applications by adding more 
nodemanagers.



 Capacity scheduler doesn't trigger app-activation after adding nodes
 

 Key: YARN-209
 URL: https://issues.apache.org/jira/browse/YARN-209
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Fix For: 3.0.0

 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, 
 YARN-209.4.patch, YARN-209-test.patch


 Say application A is submitted but at that time it does not meet the bar for 
 activation because of resource limit settings for applications. After that if 
 more hardware is added to the system and the application becomes valid it 
 still remains in pending state, likely forever.
 This might be rare to hit in real life because enough NMs heartbeat to the 
 RM before applications can get submitted. But a change in settings or 
 heartbeat interval might make it easier to repro. In RM restart scenarios, 
 this will likely hit more if it's implemented by re-playing events and 
 re-submitting applications to the scheduler before the RPC to NMs is 
 activated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-209) Capacity scheduler doesn't trigger app-activation after adding nodes

2013-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614953#comment-13614953
 ] 

Hadoop QA commented on YARN-209:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12575660/YARN-209.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/615//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/615//console

This message is automatically generated.

 Capacity scheduler doesn't trigger app-activation after adding nodes
 

 Key: YARN-209
 URL: https://issues.apache.org/jira/browse/YARN-209
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Fix For: 3.0.0

 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, 
 YARN-209.4.patch, YARN-209-test.patch


 Say application A is submitted but at that time it does not meet the bar for 
 activation because of resource limit settings for applications. After that if 
 more hardware is added to the system and the application becomes valid it 
 still remains in pending state, likely forever.
 This might be rare to hit in real life because enough NMs heartbeat to the 
 RM before applications can get submitted. But a change in settings or 
 heartbeat interval might make it easier to repro. In RM restart scenarios, 
 this will likely hit more if it's implemented by re-playing events and 
 re-submitting applications to the scheduler before the RPC to NMs is 
 activated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed

2013-03-26 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614961#comment-13614961
 ] 

Konstantin Boudnik commented on YARN-474:
-

It seems that this commit has broken the [build of branch-2 
|https://builds.apache.org/view/Hadoop/job/Hadoop-branch2/4/console]

 CapacityScheduler does not activate applications when 
 maximum-am-resource-percent configuration is refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Fix For: 2.0.5-beta

 Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch


 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed

2013-03-26 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614962#comment-13614962
 ] 

Konstantin Boudnik commented on YARN-474:
-

Here's the error message:

{noformat}
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-branch2/branch-2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java:[1610,10]
 cannot find symbol
[ERROR] symbol  : method setDouble(java.lang.String,float)
[ERROR] location: class 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration
{noformat}
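For what it's worth, a hedged sketch of a branch-2-compatible way to set that value in a test, 
using the Configuration.setFloat overload that does exist there; whether this matches the 
eventual fix is an assumption.

import org.apache.hadoop.conf.Configuration;

class AmResourcePercentConfSketch {
  static Configuration withMaxAmPercent(float percent) {
    Configuration conf = new Configuration();
    // Property name taken from the issue description above.
    conf.setFloat("yarn.scheduler.capacity.maximum-am-resource-percent", percent);
    return conf;
  }
}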

 CapacityScheduler does not activate applications when 
 maximum-am-resource-percent configuration is refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Fix For: 2.0.5-beta

 Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch


 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify capacity scheduler config to increase value of 
 yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh 
 queues. 
 The 2 applications not yet in running state do not get launched even though 
 limits are increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira