[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-18: --- Attachment: YARN-18-v5.patch Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.patch There are several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality and that were updated to give preference to running a container at other locality levels besides node-local and rack-local (such as nodegroup-local). This proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
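To illustrate the idea behind the YARN-18 proposal, here is a minimal sketch of what a pluggable locality/topology hook could look like. The interface and class names below (LocalityTopologyPlugin, DefaultTopology, NodeGroupTopology) are invented for illustration and are not the API in the attached patches; the point is only that a nodegroup-aware deployment adds an extra locality level between node and rack.

import java.util.Arrays;
import java.util.List;

// Hypothetical plugin point, not the YARN-18 patch itself: maps a host to an
// ordered list of locality scopes the scheduler would try, most specific first.
interface LocalityTopologyPlugin {
  List<String> getLocalityScopes(String hostName);
}

// Default two-level topology: node-local, rack-local, then off-switch ("*").
class DefaultTopology implements LocalityTopologyPlugin {
  @Override
  public List<String> getLocalityScopes(String hostName) {
    return Arrays.asList(hostName, resolveRack(hostName), "*");
  }
  String resolveRack(String hostName) {
    return "/default-rack"; // placeholder rack resolution
  }
}

// Nodegroup-aware topology: inserts a nodegroup scope between node and rack.
class NodeGroupTopology extends DefaultTopology {
  @Override
  public List<String> getLocalityScopes(String hostName) {
    return Arrays.asList(hostName, resolveNodeGroup(hostName), resolveRack(hostName), "*");
  }
  String resolveNodeGroup(String hostName) {
    return "/default-rack/nodegroup-0"; // placeholder nodegroup resolution
  }
}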
[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives
[ https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613676#comment-13613676 ] Hudson commented on YARN-109: - Integrated in Hadoop-Yarn-trunk #167 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/167/]) YARN-109. .tmp file is not deleted for localized archives (Mayank Bansal via bobby) (Revision 1460723) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460723 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java .tmp file is not deleted for localized archives --- Key: YARN-109 URL: https://issues.apache.org/jira/browse/YARN-109 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Mayank Bansal Fix For: 3.0.0, 0.23.7, 2.0.5-beta Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, YARN-109-trunk-3.patch, YARN-109-trunk-4.patch, YARN-109-trunk-5.patch, YARN-109-trunk.patch When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
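The fix touches FSDownload.java and presumably amounts to deleting the temporary archive once unpacking has finished. As a rough sketch of that pattern (a standalone helper written for illustration, not the actual FSDownload change; the class and method here are assumptions), the .tmp file is removed in a finally block:

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Illustrative helper: copy a remote archive to a local .tmp file, unpack it,
// and always delete the .tmp file afterwards (the bug was skipping this delete).
public class ArchiveLocalizer {
  public void localize(Configuration conf, Path remote, File localDir) throws IOException {
    FileSystem remoteFs = remote.getFileSystem(conf);
    File tmp = new File(localDir, remote.getName() + ".tmp");
    try {
      remoteFs.copyToLocalFile(remote, new Path(tmp.getAbsolutePath()));
      FileUtil.unTar(tmp, localDir); // unpack from the temporary file
    } finally {
      if (!tmp.delete()) {
        tmp.deleteOnExit();
      }
    }
  }
}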
[jira] [Commented] (YARN-498) Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly
[ https://issues.apache.org/jira/browse/YARN-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613679#comment-13613679 ] Hudson commented on YARN-498: - Integrated in Hadoop-Yarn-trunk #167 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/167/]) YARN-498. Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly (Hitesh Shah via bikas) (Revision 1460954) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460954 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly --- Key: YARN-498 URL: https://issues.apache.org/jira/browse/YARN-498 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: YARN-498.1.patch, YARN-498.2.patch, YARN-498.3.patch, YARN-498.4.patch, YARN-498.wip.patch Currently, it only sets the app attempt id which is really not required as AMs are only expected to extract it from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
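For background on the description above: an AM is expected to derive its application attempt id from the container id it is launched with, rather than from a separately provided environment constant. A small sketch of that derivation follows; the environment variable name "CONTAINER_ID" and the ConverterUtils helper are assumptions based on the YARN utilities of this era, so check ApplicationConstants for the constant your version actually exports.

import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Sketch: derive the attempt id from the container id passed in the AM's environment.
public class AttemptIdFromEnv {
  public static ApplicationAttemptId currentAttemptId() {
    String containerIdStr = System.getenv("CONTAINER_ID"); // assumed variable name
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    return containerId.getApplicationAttemptId();
  }
}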
[jira] [Commented] (YARN-497) Yarn unmanaged-am launcher jar does not define a main class in its manifest
[ https://issues.apache.org/jira/browse/YARN-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613682#comment-13613682 ] Hudson commented on YARN-497: - Integrated in Hadoop-Yarn-trunk #167 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/167/]) YARN-497. Yarn unmanaged-am launcher jar does not define a main class in its manifest (Hitesh Shah via bikas) (Revision 1460846) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460846 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/pom.xml Yarn unmanaged-am launcher jar does not define a main class in its manifest --- Key: YARN-497 URL: https://issues.apache.org/jira/browse/YARN-497 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Labels: usability Attachments: YARN-497.1.patch The jar should have a mainClass defined to make it easier to use with the hadoop jar command. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-469) Make scheduling mode in FS pluggable
[ https://issues.apache.org/jira/browse/YARN-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613678#comment-13613678 ] Hudson commented on YARN-469: - Integrated in Hadoop-Yarn-trunk #167 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/167/]) YARN-469. Make scheduling mode in FS pluggable. (kkambatl via tucu) (Revision 1460961) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460961 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingAlgorithms.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingMode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FairSchedulingMode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FifoSchedulingMode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestComputeFairShares.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingMode.java Make scheduling mode in FS pluggable Key: YARN-469 URL: https://issues.apache.org/jira/browse/YARN-469 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: scheduler Fix For: 2.0.5-beta Attachments: yarn-469.patch, yarn-469.patch, yarn-469.patch, 
yarn-469.patch, yarn-469.patch Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action. Making the scheduling mode pluggable helps in simplifying this process, particularly as we add new modes (DRF in this case). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
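As a rough sketch of what "pluggable scheduling mode" means here: the mode becomes an object the Fair Scheduler consults for ordering schedulables and computing shares, instead of branching on an if condition in multiple places. The FairSchedulingMode/FifoSchedulingMode class names match the files in the commit, but the method signatures below are illustrative assumptions, not the committed API.

import java.util.Collection;
import java.util.Comparator;

// Minimal stand-in for the scheduler's Schedulable abstraction.
interface Schedulable {
  long getStartTime();
}

// Sketch of a pluggable mode: supplies the comparator used when assigning
// containers and the policy for computing shares.
abstract class SchedulingMode {
  abstract Comparator<Schedulable> getComparator();
  abstract void computeShares(Collection<? extends Schedulable> schedulables, int totalResource);
}

// FIFO mode: oldest application first.
class FifoSchedulingMode extends SchedulingMode {
  Comparator<Schedulable> getComparator() {
    return new Comparator<Schedulable>() {
      public int compare(Schedulable a, Schedulable b) {
        return Long.compare(a.getStartTime(), b.getStartTime());
      }
    };
  }
  void computeShares(Collection<? extends Schedulable> schedulables, int totalResource) {
    // FIFO shares follow demand in submission order (omitted in this sketch).
  }
}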
[jira] [Commented] (YARN-439) Flatten NodeHeartbeatResponse
[ https://issues.apache.org/jira/browse/YARN-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613685#comment-13613685 ] Hudson commented on YARN-439: - Integrated in Hadoop-Yarn-trunk #167 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/167/]) YARN-439. Flatten NodeHeartbeatResponse. Contributed by Xuan Gong. (Revision 1460811) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460811 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/HeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/HeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestRecordFactory.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java *
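The practical effect of "flattening" NodeHeartbeatResponse is on the caller side: fields such as the node action are read directly off the heartbeat response instead of an embedded HeartbeatResponse record (the nested form is visible in the NodeStatusUpdaterImpl excerpt quoted under YARN-101 later in this digest). A sketch of the flattened call path, with the accessor names assumed rather than taken from the patch:

import org.apache.hadoop.yarn.server.api.ResourceTracker;
import org.apache.hadoop.yarn.server.api.protocolrecords.NodeHeartbeatRequest;
import org.apache.hadoop.yarn.server.api.protocolrecords.NodeHeartbeatResponse;
import org.apache.hadoop.yarn.server.api.records.NodeAction;

// Sketch: after flattening, the heartbeat caller reads the node action directly.
class HeartbeatCaller {
  boolean shouldShutdown(ResourceTracker resourceTracker, NodeHeartbeatRequest request)
      throws Exception {
    NodeHeartbeatResponse response = resourceTracker.nodeHeartbeat(request);
    return response.getNodeAction() == NodeAction.SHUTDOWN;
  }
}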
[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives
[ https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613732#comment-13613732 ] Hudson commented on YARN-109: - Integrated in Hadoop-Hdfs-0.23-Build #565 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/565/]) svn merge -c 1460723 FIXES: YARN-109. .tmp file is not deleted for localized archives (Mayank Bansal via bobby) (Revision 1460734) Result = UNSTABLE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460734 Files : * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java .tmp file is not deleted for localized archives --- Key: YARN-109 URL: https://issues.apache.org/jira/browse/YARN-109 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Mayank Bansal Fix For: 3.0.0, 0.23.7, 2.0.5-beta Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, YARN-109-trunk-3.patch, YARN-109-trunk-4.patch, YARN-109-trunk-5.patch, YARN-109-trunk.patch When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-498) Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly
[ https://issues.apache.org/jira/browse/YARN-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613752#comment-13613752 ] Hudson commented on YARN-498: - Integrated in Hadoop-Hdfs-trunk #1356 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/]) YARN-498. Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly (Hitesh Shah via bikas) (Revision 1460954) Result = FAILURE bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460954 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly --- Key: YARN-498 URL: https://issues.apache.org/jira/browse/YARN-498 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: YARN-498.1.patch, YARN-498.2.patch, YARN-498.3.patch, YARN-498.4.patch, YARN-498.wip.patch Currently, it only sets the app attempt id which is really not required as AMs are only expected to extract it from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives
[ https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613749#comment-13613749 ] Hudson commented on YARN-109: - Integrated in Hadoop-Hdfs-trunk #1356 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/]) YARN-109. .tmp file is not deleted for localized archives (Mayank Bansal via bobby) (Revision 1460723) Result = FAILURE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460723 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java .tmp file is not deleted for localized archives --- Key: YARN-109 URL: https://issues.apache.org/jira/browse/YARN-109 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Mayank Bansal Fix For: 3.0.0, 0.23.7, 2.0.5-beta Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, YARN-109-trunk-3.patch, YARN-109-trunk-4.patch, YARN-109-trunk-5.patch, YARN-109-trunk.patch When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-497) Yarn unmanaged-am launcher jar does not define a main class in its manifest
[ https://issues.apache.org/jira/browse/YARN-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613755#comment-13613755 ] Hudson commented on YARN-497: - Integrated in Hadoop-Hdfs-trunk #1356 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/]) YARN-497. Yarn unmanaged-am launcher jar does not define a main class in its manifest (Hitesh Shah via bikas) (Revision 1460846) Result = FAILURE bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460846 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/pom.xml Yarn unmanaged-am launcher jar does not define a main class in its manifest --- Key: YARN-497 URL: https://issues.apache.org/jira/browse/YARN-497 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Labels: usability Attachments: YARN-497.1.patch The jar should have a mainClass defined to make it easier to use with the hadoop jar command. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613747#comment-13613747 ] Hudson commented on YARN-71: Integrated in Hadoop-Hdfs-trunk #1356 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/]) YARN-71. Fix the NodeManager to clean up local-dirs on restart. Contributed by Xuan Gong. (Revision 1460808) Result = FAILURE sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460808 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java Ensure/confirm that the NodeManager cleans up local-dirs on restart --- Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Critical Fix For: 2.0.5-beta Attachments: YARN-71.10.patch, YARN-71.11.patch, YARN-71.12.patch, YARN-71.13.patch, YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, YARN-71.8.patch, YARN-71.9.patch We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
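The cleanup asked for here is conceptually simple: when the NodeManager starts, anything left under its configured local-dirs by a previous run should be removed. A sketch of that idea under stated assumptions (this is not the ResourceLocalizationService change itself, just an illustration using the public configuration key and FileSystem API):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative startup cleanup: recursively delete leftover content under each
// directory listed in yarn.nodemanager.local-dirs.
public class LocalDirCleanup {
  public static void cleanUpLocalDirs(Configuration conf) throws IOException {
    FileSystem localFs = FileSystem.getLocal(conf);
    String[] localDirs = conf.getStrings(YarnConfiguration.NM_LOCAL_DIRS);
    if (localDirs == null) {
      return;
    }
    for (String dir : localDirs) {
      Path root = new Path(dir);
      if (!localFs.exists(root)) {
        continue;
      }
      for (FileStatus child : localFs.listStatus(root)) {
        localFs.delete(child.getPath(), true);
      }
    }
  }
}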
[jira] [Commented] (YARN-439) Flatten NodeHeartbeatResponse
[ https://issues.apache.org/jira/browse/YARN-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613758#comment-13613758 ] Hudson commented on YARN-439: - Integrated in Hadoop-Hdfs-trunk #1356 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/]) YARN-439. Flatten NodeHeartbeatResponse. Contributed by Xuan Gong. (Revision 1460811) Result = FAILURE sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460811 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/HeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/HeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestRecordFactory.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java *
[jira] [Commented] (YARN-469) Make scheduling mode in FS pluggable
[ https://issues.apache.org/jira/browse/YARN-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613751#comment-13613751 ] Hudson commented on YARN-469: - Integrated in Hadoop-Hdfs-trunk #1356 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/]) YARN-469. Make scheduling mode in FS pluggable. (kkambatl via tucu) (Revision 1460961) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460961 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingAlgorithms.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingMode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FairSchedulingMode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FifoSchedulingMode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestComputeFairShares.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingMode.java Make scheduling mode in FS pluggable Key: YARN-469 URL: https://issues.apache.org/jira/browse/YARN-469 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: scheduler Fix For: 2.0.5-beta Attachments: yarn-469.patch, yarn-469.patch, 
yarn-469.patch, yarn-469.patch, yarn-469.patch Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action. Making the scheduling mode pluggable helps in simplifying this process, particularly as we add new modes (DRF in this case). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613757#comment-13613757 ] Hudson commented on YARN-378: - Integrated in Hadoop-Hdfs-trunk #1356 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1356/]) YARN-378. Fix RM to make the AM max attempts/retries to be configurable per application by clients. Contributed by Zhijie Shen. (Revision 1460895) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460895 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Fix For: 2.0.5-beta Attachments: YARN-378_10.patch, YARN-378_11.patch, YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, YARN-378_9.patch, YARN_378-final-commit.patch, YARN-378_MAPREDUCE-5062.2.patch, YARN-378_MAPREDUCE-5062.patch We should support that different client or user have different ApplicationMaster retry times. It also say that
[jira] [Commented] (YARN-498) Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly
[ https://issues.apache.org/jira/browse/YARN-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614018#comment-13614018 ] Hudson commented on YARN-498: - Integrated in Hadoop-Mapreduce-trunk #1384 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1384/]) YARN-498. Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly (Hitesh Shah via bikas) (Revision 1460954) Result = FAILURE bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460954 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java Unmanaged AM launcher does not set various constants in env for an AM, also does not handle failed AMs properly --- Key: YARN-498 URL: https://issues.apache.org/jira/browse/YARN-498 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: YARN-498.1.patch, YARN-498.2.patch, YARN-498.3.patch, YARN-498.4.patch, YARN-498.wip.patch Currently, it only sets the app attempt id which is really not required as AMs are only expected to extract it from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-469) Make scheduling mode in FS pluggable
[ https://issues.apache.org/jira/browse/YARN-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614017#comment-13614017 ] Hudson commented on YARN-469: - Integrated in Hadoop-Mapreduce-trunk #1384 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1384/]) YARN-469. Make scheduling mode in FS pluggable. (kkambatl via tucu) (Revision 1460961) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460961 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingAlgorithms.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingMode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FairSchedulingMode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/modes/FifoSchedulingMode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestComputeFairShares.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingMode.java Make scheduling mode in FS pluggable Key: YARN-469 URL: https://issues.apache.org/jira/browse/YARN-469 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: scheduler Fix For: 2.0.5-beta Attachments: yarn-469.patch, yarn-469.patch, 
yarn-469.patch, yarn-469.patch, yarn-469.patch Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action. Making the scheduling mode pluggable helps in simplifying this process, particularly as we add new modes (DRF in this case). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-497) Yarn unmanaged-am launcher jar does not define a main class in its manifest
[ https://issues.apache.org/jira/browse/YARN-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614021#comment-13614021 ] Hudson commented on YARN-497: - Integrated in Hadoop-Mapreduce-trunk #1384 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1384/]) YARN-497. Yarn unmanaged-am launcher jar does not define a main class in its manifest (Hitesh Shah via bikas) (Revision 1460846) Result = FAILURE bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460846 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/pom.xml Yarn unmanaged-am launcher jar does not define a main class in its manifest --- Key: YARN-497 URL: https://issues.apache.org/jira/browse/YARN-497 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Labels: usability Attachments: YARN-497.1.patch The jar should have a mainClass defined to make it easier to use with the hadoop jar command. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-439) Flatten NodeHeartbeatResponse
[ https://issues.apache.org/jira/browse/YARN-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614024#comment-13614024 ] Hudson commented on YARN-439: - Integrated in Hadoop-Mapreduce-trunk #1384 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1384/]) YARN-439. Flatten NodeHeartbeatResponse. Contributed by Xuan Gong. (Revision 1460811) Result = FAILURE sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1460811 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/HeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/HeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestRecordFactory.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java *
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614036#comment-13614036 ] Robert Joseph Evans commented on YARN-378: -- Hitesh and Vinod, It is not a big deal. I realized that both were going in, and I am glad that this is ready and has gone in. It is a great feature. It just would have been nice to either commit them at the same time, or give a heads up on the mailing list that you were going to break the build for a little while. ApplicationMaster retry times should be set by Client - Key: YARN-378 URL: https://issues.apache.org/jira/browse/YARN-378 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Environment: suse Reporter: xieguiming Assignee: Zhijie Shen Labels: usability Fix For: 2.0.5-beta Attachments: YARN-378_10.patch, YARN-378_11.patch, YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, YARN-378_9.patch, YARN_378-final-commit.patch, YARN-378_MAPREDUCE-5062.2.patch, YARN-378_MAPREDUCE-5062.patch We should support different clients or users having different ApplicationMaster retry times. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
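The change described above makes the AM attempt limit something the submitting client can choose per application, instead of relying only on the cluster-wide yarn.resourcemanager.am.max-retries value. A client-side sketch follows; the setter name on ApplicationSubmissionContext is assumed from the commit's intent, and the RM is still expected to cap the value at its own global maximum.

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

// Sketch: request a per-application AM attempt limit at submission time.
public class SubmitWithRetryLimit {
  public static ApplicationSubmissionContext newContext(int maxAttempts) {
    ApplicationSubmissionContext context =
        Records.newRecord(ApplicationSubmissionContext.class);
    context.setMaxAppAttempts(maxAttempts); // assumed setter; per-application limit
    return context;
  }
}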
[jira] [Commented] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory
[ https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614051#comment-13614051 ] Hadoop QA commented on YARN-7: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575494/YARN-7.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/602//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/602//console This message is automatically generated. Add support for DistributedShell to ask for CPUs along with memory -- Key: YARN-7 URL: https://issues.apache.org/jira/browse/YARN-7 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Arun C Murthy Labels: patch Attachments: YARN-7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
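For context on what the YARN-7 patch adds: DistributedShell gains an option to ask for virtual cores alongside memory when it builds its container requests. A sketch of building such a capability with the 2.x Resource record (treat setVirtualCores as an assumption if your version predates CPU scheduling support):

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

// Sketch: a container capability carrying both memory (MB) and virtual cores.
public class CpuAndMemoryCapability {
  public static Resource newCapability(int memoryMb, int vCores) {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(memoryMb);
    capability.setVirtualCores(vCores); // requested alongside memory
    return capability;
  }
}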
[jira] [Assigned] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory
[ https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-7: - Assignee: Junping Du Add support for DistributedShell to ask for CPUs along with memory -- Key: YARN-7 URL: https://issues.apache.org/jira/browse/YARN-7 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Arun C Murthy Assignee: Junping Du Labels: patch Attachments: YARN-7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.6.patch Based on @Hitesh's previous patch, I've made the following changes in the newest one: 1. Modify the boundary case of judging a valid resource value ( 0 = = 0). 2. maxMem doesn't need to be a multiple of minMem. 3. To fix YARN-382, in RMAppManager, the AM CLC still needs to be updated after request normalization is executed, so that the AM CLC knows the updated resource if possible, which will be equal to the resource of the allocated container. To ensure the equivalence, an assert is added in RMAppAttemptImpl$AMContainerAllocatedTransition. The changes in YARN-370 are also reverted. Therefore, if this jira is fixed, YARN-382 can be fixed as well. 4. InvalidResourceException, which extends IOException, is created and used when the requested resource is invalid in terms of its values. The related functions are modified to either throw or catch the exception. In particular, in the transitions of RMAppAttemptImpl, when the exception is caught the attempt will transition to the FAILED state. When YARN-142 gets fixed, the customized exception needs to be updated. 5. Reorganize the code. 6. Add more test cases. Comments, please. Thanks! Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
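To make the validation described in point 4 concrete, here is a small sketch of the check Scheduler.normalizeRequest is missing per this issue: a request should be rejected when a resource dimension is negative or exceeds the scheduler's maximumAllocation. The InvalidResourceException name comes from the comment above, but the signatures below are illustrative assumptions rather than the patch.

import java.io.IOException;
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative exception mirroring the one described in the comment.
class InvalidResourceException extends IOException {
  InvalidResourceException(String message) {
    super(message);
  }
}

// Illustrative validation against the maximum allocation.
class RequestValidator {
  static void validate(Resource requested, Resource maximumAllocation)
      throws InvalidResourceException {
    if (requested.getMemory() < 0 || requested.getMemory() > maximumAllocation.getMemory()) {
      throw new InvalidResourceException("Invalid memory request: " + requested.getMemory()
          + ", maximum allowed: " + maximumAllocation.getMemory());
    }
    if (requested.getVirtualCores() < 0
        || requested.getVirtualCores() > maximumAllocation.getVirtualCores()) {
      throw new InvalidResourceException("Invalid vcores request: " + requested.getVirtualCores()
          + ", maximum allowed: " + maximumAllocation.getVirtualCores());
    }
  }
}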
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614355#comment-13614355 ] Arun C Murthy commented on YARN-18: --- Sorry, I'm just getting to this. This is a lot to digest. Can we consider breaking this down to a couple of smaller patches? Tx. Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch There are several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality which were updated to give preference to running a container on other localities besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, like: SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-101) If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-101: -- Assignee: Xuan Gong If the heartbeat message is lost, the nodestatus info of completed containers will be lost too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
{code}
protected void startStatusUpdater() {
  new Thread("Node Status Updater") {
    @Override
    @SuppressWarnings("unchecked")
    public void run() {
      int lastHeartBeatID = 0;
      while (!isStopped) {
        // Send heartbeat
        try {
          synchronized (heartbeatMonitor) {
            heartbeatMonitor.wait(heartBeatInterval);
          }
{color:red}
          // Before we send the heartbeat, we get the NodeStatus,
          // whose method removes completed containers.
          NodeStatus nodeStatus = getNodeStatus();
{color}
          nodeStatus.setResponseId(lastHeartBeatID);
          NodeHeartbeatRequest request = recordFactory
              .newRecordInstance(NodeHeartbeatRequest.class);
          request.setNodeStatus(nodeStatus);
{color:red}
          // But if the nodeHeartbeat fails, we've already removed the containers
          // away to know about it. We aren't handling a nodeHeartbeat failure case here.
          HeartbeatResponse response =
              resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
{color}
          if (response.getNodeAction() == NodeAction.SHUTDOWN) {
            LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat,"
                + " hence shutting down.");
            NodeStatusUpdaterImpl.this.stop();
            break;
          }
          if (response.getNodeAction() == NodeAction.REBOOT) {
            LOG.info("Node is out of sync with ResourceManager,"
                + " hence rebooting.");
            NodeStatusUpdaterImpl.this.reboot();
            break;
          }
          lastHeartBeatID = response.getResponseId();
          List<ContainerId> containersToCleanup = response
              .getContainersToCleanupList();
          if (containersToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedContainersEvent(containersToCleanup));
          }
          List<ApplicationId> appsToCleanup = response.getApplicationsToCleanupList();
          // Only start tracking for keepAlive on FINISH_APP
          trackAppsForKeepAlive(appsToCleanup);
          if (appsToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedAppsEvent(appsToCleanup));
          }
        } catch (Throwable e) {
          // TODO Better error handling. Thread can die with the rest of the
          // NM still running.
          LOG.error("Caught exception in status-updater", e);
        }
      }
    }
  }.start();
}

private NodeStatus getNodeStatus() {
  NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
  nodeStatus.setNodeId(this.nodeId);
  int numActiveContainers = 0;
  List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>();
  for (Iterator<Entry<ContainerId, Container>> i =
      this.context.getContainers().entrySet().iterator(); i.hasNext();) {
    Entry<ContainerId, Container> e = i.next();
    ContainerId containerId = e.getKey();
    Container container = e.getValue();
    // Clone the container to send it to the RM
    org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus =
        container.cloneAndGetContainerStatus();
    containersStatuses.add(containerStatus);
    ++numActiveContainers;
    LOG.info("Sending out status for container: " + containerStatus);
{color:red}
    // Here is the part that removes the completed containers.
    if (containerStatus.getState() == ContainerState.COMPLETE) {
      // Remove
      i.remove();
{color}
      LOG.info("Removed completed container " + containerId);
    }
  }
  nodeStatus.setContainersStatuses(containersStatuses);
  LOG.debug(this.nodeId + " sending out status for " + numActiveContainers + " containers");
  NodeHealthStatus nodeHealthStatus =
{code}
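One way to read the report above: the completed-container statuses are removed from the NM context as a side effect of building the heartbeat, so a lost heartbeat loses them for good. Below is a minimal sketch of the usual remedy, keeping completed statuses in a pending map until the RM has acknowledged a heartbeat that carried them; the field and method names are illustrative and not taken from the attached patch.
{code}
// Hedged sketch: remember completed-container statuses until the RM acks a heartbeat.
private final Map<ContainerId, ContainerStatus> pendingCompletedContainers =
    new HashMap<ContainerId, ContainerStatus>();

private List<ContainerStatus> buildContainerStatuses() {
  List<ContainerStatus> statuses = new ArrayList<ContainerStatus>();
  for (Iterator<Entry<ContainerId, Container>> it =
      this.context.getContainers().entrySet().iterator(); it.hasNext();) {
    ContainerStatus status = it.next().getValue().cloneAndGetContainerStatus();
    if (status.getState() == ContainerState.COMPLETE) {
      // Keep the completed status around instead of dropping it with the container.
      pendingCompletedContainers.put(status.getContainerId(), status);
      it.remove();
    } else {
      statuses.add(status);
    }
  }
  // Completed statuses ride along on every heartbeat until they are acknowledged.
  statuses.addAll(pendingCompletedContainers.values());
  return statuses;
}

// Only once resourceTracker.nodeHeartbeat(request) has returned successfully:
//   pendingCompletedContainers.clear();
{code}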
[jira] [Commented] (YARN-440) Flatten RegisterNodeManagerResponse
[ https://issues.apache.org/jira/browse/YARN-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614381#comment-13614381 ] Siddharth Seth commented on YARN-440: - +1. Committing. Flatten RegisterNodeManagerResponse --- Key: YARN-440 URL: https://issues.apache.org/jira/browse/YARN-440 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Xuan Gong Attachments: YARN-440.1.patch, YARN-440.2.patch, YARN-440.3.patch RegisterNodeManagerResponse has another wrapper RegistrationResponse under it, which can be removed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-98) NM Application invalid state transition on reboot command from RM
[ https://issues.apache.org/jira/browse/YARN-98?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] omkar vinit joshi reassigned YARN-98: - Assignee: omkar vinit joshi NM Application invalid state transition on reboot command from RM - Key: YARN-98 URL: https://issues.apache.org/jira/browse/YARN-98 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Thomas Graves Assignee: omkar vinit joshi If the RM goes down and comes back up, it tells the NM to reboot. When the NM reboots, if it has any applications it aggregates the logs for those applications, then it transitions the app to APPLICATION_LOG_HANDLING_FINISHED. I saw a case where there was an app that was in the RUNNING state and tried to transition to APPLICATION_LOG_HANDLING_finished and it got the invalid transition. [DeletionService #1]2012-04-11 15:12:40,476 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state [AsyncDispatcher event handler]org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:382) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:517) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:509) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74) at java.lang.Thread.run(Thread.java:619) 2012-04-11 15:12:40,476 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1333003059741_15999 transitioned from RUNNING to null -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-474: - Attachment: YARN-474.2.patch Separate the fix for the specific problem of YARN-474 and that of YARN-209, to make the other issue independent, though the two issues share the same root cause. CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-474.1.patch, YARN-474.2.patch Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
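For readers following the two related JIRAs, the shape of the fix being separated out here is roughly: when the queue limits are recomputed on a refresh, re-run the pending-to-active check instead of waiting for the next submission. A minimal sketch, assuming a LeafQueue-style activateApplications() routine and a simple field for the refreshed limit (names illustrative):
{code}
// Hedged sketch: re-evaluate pending applications whenever queue limits are refreshed.
synchronized void reinitialize(CapacitySchedulerConfiguration newConf) {
  this.maxAMResourcePercent =
      newConf.getMaximumApplicationMasterResourcePercent();  // refreshed limit
  // Walk the pending list again; apps that fit under the new limit become active now,
  // not at the next application submission.
  activateApplications();
}
{code}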
[jira] [Updated] (YARN-209) Capacity scheduler can leave application in pending state
[ https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-209: - Attachment: YARN-209.3.patch Extract the fixing code specifically for this issue from YARN-474, to make this issue unblocked. @Bikas' end-to-end test case is retained but simplified, because it is good example to demonstrate the problem described here. Capacity scheduler can leave application in pending state - Key: YARN-209 URL: https://issues.apache.org/jira/browse/YARN-209 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Zhijie Shen Fix For: 3.0.0 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, YARN-209-test.patch Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that if more hardware is added to the system and the application becomes valid it still remains in pending state, likely forever. This might be rare to hit in real life because enough NM's heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to repro. In RM restart scenarios, this will likely hit more if its implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NM's is activated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614402#comment-13614402 ] Robert Joseph Evans commented on YARN-112: -- I am not really sure that we fixed the underlying issue. {code}files.rename(dst_work, destDirPath, Rename.OVERWRITE);{code} threw an exception because there was something else in that directory already, but files.mkdir(destDirPath, cachePerms, false) is supposed to throw a FileAlreadyExistsException if the directory already exists. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileContext.html#mkdir%28org.apache.hadoop.fs.Path,%20org.apache.hadoop.fs.permission.FsPermission,%20boolean%29 files.rename should never get into this situation if files.mkdir threw the exception when it was supposed to. I tested this and {code} FileContext lfc = FileContext.getLocalFSFileContext(new Configuration()); Path p = new Path("/tmp/bobby.12345"); FsPermission cachePerms = new FsPermission((short) 0755); lfc.mkdir(p, cachePerms, false); lfc.mkdir(p, cachePerms, false); {code} never throws an exception. We first need to address the bug in FileContext, and then we can look at how we can make FSDownload deal with mkdir throwing an exception, or whatever the fix ends up being. I filed HADOOP-9438 for this. If the fix ends up being that we do not support throwing the exception in FileContext, then your current solution looks OK. I also have a hard time believing that we are getting random collisions on a long value that should be fairly uniformly distributed. We need to guard against it either way and I suppose it is possible, but if I remember correctly we were seeing a significant number of these errors and my gut tells me that there is either something very wrong with Random, or there is something else also going on here. Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: omkar vinit joshi Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
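To see the gap Bobby describes in isolation, a small self-contained version of his snippet is below; the expectation comes from the FileContext#mkdir javadoc linked above, and whether the local implementation honours it is exactly what HADOOP-9438 tracks.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class MkdirTwiceCheck {
  public static void main(String[] args) throws Exception {
    FileContext lfc = FileContext.getLocalFSFileContext(new Configuration());
    Path p = new Path("/tmp/bobby.12345");
    FsPermission cachePerms = new FsPermission((short) 0755);
    lfc.mkdir(p, cachePerms, false);
    try {
      lfc.mkdir(p, cachePerms, false);
      // Observed behaviour per the comment above: no exception is raised.
      System.out.println("second mkdir succeeded silently");
    } catch (FileAlreadyExistsException e) {
      // Behaviour the javadoc documents for a pre-existing directory.
      System.out.println("second mkdir threw: " + e);
    }
  }
}
{code}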
[jira] [Commented] (YARN-209) Capacity scheduler can leave application in pending state
[ https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614420#comment-13614420 ] Hadoop QA commented on YARN-209: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575556/YARN-209.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/605//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/605//console This message is automatically generated. Capacity scheduler can leave application in pending state - Key: YARN-209 URL: https://issues.apache.org/jira/browse/YARN-209 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Zhijie Shen Fix For: 3.0.0 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, YARN-209-test.patch Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that if more hardware is added to the system and the application becomes valid it still remains in pending state, likely forever. This might be rare to hit in real life because enough NM's heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to repro. In RM restart scenarios, this will likely hit more if its implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NM's is activated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614421#comment-13614421 ] Hadoop QA commented on YARN-474: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1257/YARN-474.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/604//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/604//console This message is automatically generated. CapacityScheduler does not activate applications when configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-474.1.patch, YARN-474.2.patch Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-172) AM logs link in RM ui redirects back to RM if AM not started
[ https://issues.apache.org/jira/browse/YARN-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-172: - Labels: usability (was: ) AM logs link in RM ui redirects back to RM if AM not started Key: YARN-172 URL: https://issues.apache.org/jira/browse/YARN-172 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.3 Reporter: Thomas Graves Labels: usability I went to the RM UI app page for an application that failed to start with the error: org.apache.hadoop.security.AccessControlException: User user cannot submit applications to queue root.foo I tried to click on the AM logs link and it just redirected me back to the RM page. if the AM didn't start we shouldn't show an attempt there. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-20: Component/s: documentation More information for yarn.resourcemanager.webapp.address in yarn-default.xml -- Key: YARN-20 URL: https://issues.apache.org/jira/browse/YARN-20 Project: Hadoop YARN Issue Type: Improvement Components: documentation, resourcemanager Affects Versions: 2.0.0-alpha Reporter: nemon lou Priority: Trivial Attachments: YARN-20.patch Original Estimate: 1h Remaining Estimate: 1h The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is in host:port format, which is noted in the cluster setup guide (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html). When I read through the code, I found that a host-only format is also supported. In the host-only format, the port will be random. So we may add more documentation in yarn-default.xml to make this easier to understand. I will submit a patch if it's helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
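If the extra documentation lands, it would presumably just be a wording change on the existing property entry, along these lines (illustrative text only; the default shown is the usual 0.0.0.0:8088 and should be checked against the release in question):
{code}
<property>
  <description>The address of the RM web application.
  Accepts the host:port form; if only a host is given, the web application
  is bound to a random port on that host.</description>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>0.0.0.0:8088</value>
</property>
{code}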
[jira] [Updated] (YARN-432) Documentation for Log Aggregation and log retrieval.
[ https://issues.apache.org/jira/browse/YARN-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-432: - Component/s: documentation Documentation for Log Aggregation and log retrieval. Key: YARN-432 URL: https://issues.apache.org/jira/browse/YARN-432 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Mahadev konar Assignee: Siddharth Seth Retrieving logs in 0.23 is very different from what 0.20.* does. This is a very new feature which will require good documentation for users to get used to it. Lets make sure we have some solid documentation for this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable
[ https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614486#comment-13614486 ] Sandy Ryza commented on YARN-24: Thinking about this a little more, I didn't see a strong reason not to verify the root log dir each time. This makes the log aggregation service resilient to the root directory being deleted or chmoded while a nodemanager is running. Uploaded a new patch that does this. Nodemanager fails to start if log aggregation enabled and namenode unavailable -- Key: YARN-24 URL: https://issues.apache.org/jira/browse/YARN-24 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Sandy Ryza Attachments: YARN-24-1.patch, YARN-24-2.patch, YARN-24-3.patch, YARN-24.patch If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to startup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
[ https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614502#comment-13614502 ] Daryn Sharp commented on YARN-503: -- bq. Will the case of MR actions be an issue? that the launcher goes away? No, the central focus of this patch is to keep tokens alive as long as _at least one job_ is using the tokens. Upon job submission, the new app is immediately linked against the tokens. So for an oozie action, it's ok for the launcher to exit after submitting an action. The tokens will stay alive until the action, and any sub-jobs it may have launched, have completed. After no app is running with the tokens, and the keepalive expires, the tokens are cancelled. Note that by default I maintained 100% backwards compat in that tokens for oozie jobs setting the mapreduce.job.complete.cancel.delegation.tokens=false will never be cancelled. The RM will stop renewing them and won't issue duplicate renews. Until we deprecate/remove the setting, we may internally try make the conf setting a final to see what happens. Will address findbugs after some webhdfs firefighting. DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false -- Key: YARN-503 URL: https://issues.apache.org/jira/browse/YARN-503 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Daryn Sharp Attachments: YARN-503.patch The first Job/App to register a token is the one which DelegationTokenRenewer associates with a a specific Token. An attempt to remove/cancel these shared tokens by subsequent jobs doesn't work - since the JobId will not match. As a result, Even if subsequent jobs have MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be cancelled when those jobs complete. Tokens will eventually be removed from the RM / JT when the service that issued them considers them to have expired or via an explicit cancelDelegationTokens call (not implemented yet in 23). A side affect of this is that the same delegation token will end up being renewed multiple times (a separate TimerTask for each job which uses the token). DelegationTokenRenewer could maintain a reference count/list of jobIds for shared tokens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
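A minimal sketch of the reference-count bookkeeping described above, kept deliberately generic: the map, the keep-alive hook and the helper names (startRenewerFor, scheduleCancelAfterKeepAlive) are illustrative stand-ins, not the DelegationTokenRenewer internals.
{code}
// Hedged sketch: cancel a shared token only after its last app is gone.
private final Map<Token<?>, Set<ApplicationId>> appsUsingToken =
    new HashMap<Token<?>, Set<ApplicationId>>();

synchronized void applicationSubmitted(ApplicationId appId, Collection<Token<?>> tokens) {
  for (Token<?> t : tokens) {
    Set<ApplicationId> users = appsUsingToken.get(t);
    if (users == null) {
      users = new HashSet<ApplicationId>();
      appsUsingToken.put(t, users);
      startRenewerFor(t);        // renew once per token, however many jobs share it
    }
    users.add(appId);            // the new app is linked to the token immediately
  }
}

synchronized void applicationFinished(ApplicationId appId) {
  for (Iterator<Entry<Token<?>, Set<ApplicationId>>> it =
      appsUsingToken.entrySet().iterator(); it.hasNext();) {
    Entry<Token<?>, Set<ApplicationId>> e = it.next();
    e.getValue().remove(appId);
    if (e.getValue().isEmpty()) {
      // Last user gone: cancel after the keep-alive window, unless the job asked for
      // its tokens to be left alone (the backwards-compat case described above).
      scheduleCancelAfterKeepAlive(e.getKey());
      it.remove();
    }
  }
}
{code}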
[jira] [Created] (YARN-509) ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable on secure clusters
Konstantin Boudnik created YARN-509: --- Summary: ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable on secure clusters Key: YARN-509 URL: https://issues.apache.org/jira/browse/YARN-509 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Environment: BigTop Kerberized cluster test environment Reporter: Konstantin Boudnik Priority: Blocker Fix For: 2.0.4-alpha, 3.0.0 During BigTop 0.6.0 release test cycle, [~rvs] came around the following problem: {noformat} 013-03-26 15:37:03,573 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359) Caused by: org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:162) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) ... 3 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:158) ... 4 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client Kerberos principal is yarn/ip-10-46-37-244.ec2.internal@BIGTOP at org.apache.hadoop.ipc.Client.call(Client.java:1235) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) ... 6 more {noformat} The most significant part is {{User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB}} indicating that ResourceTrackerPB hasn't been annotated with {{@KerberosInfo}} nor {{@TokenInfo}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
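For reference, the annotation the description says is missing looks roughly like the sketch below on the protocol interface. The principal keys shown are assumptions for illustration, the generated superinterface is omitted, and (as a later comment on this JIRA notes) the server also needs a matching SecurityInfo provider on the classpath.
{code}
import org.apache.hadoop.security.KerberosInfo;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hedged sketch only: tells the RPC layer which principals to authorize for this protocol.
@KerberosInfo(
    serverPrincipal = YarnConfiguration.RM_PRINCIPAL,
    clientPrincipal = YarnConfiguration.NM_PRINCIPAL)
public interface ResourceTrackerPB {
  // generated protobuf service methods (registerNodeManager, nodeHeartbeat, ...)
}
{code}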
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.7.patch Clean up the warnings in TestRMAppAttemptTransitions and fix the broken test cases in it and TestClientRMService. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614650#comment-13614650 ] Vinod Kumar Vavilapalli commented on YARN-112: -- Bobby, I too have seen in large clusters/jobs - the law of large numbers :) We don't see the random number generator. HADOOP-9438 will help, but I think instead of this solution, avoiding the race altogether by generating the destination path deterministically unique is a better solution. Something like localizer_id + random_num is a better destination path than plain random number. Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: omkar vinit joshi Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614654#comment-13614654 ] Vinod Kumar Vavilapalli commented on YARN-112: -- bq. We don't see the random number generator. I meant seed* . Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: omkar vinit joshi Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-209) Capacity scheduler doesn't trigger app-activation after adding nodes
[ https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-209: - Summary: Capacity scheduler doesn't trigger app-activation after adding nodes (was: Capacity scheduler can leave application in pending state) Capacity scheduler doesn't trigger app-activation after adding nodes Key: YARN-209 URL: https://issues.apache.org/jira/browse/YARN-209 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Zhijie Shen Fix For: 3.0.0 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, YARN-209-test.patch Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that if more hardware is added to the system and the application becomes valid it still remains in pending state, likely forever. This might be rare to hit in real life because enough NM's heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to repro. In RM restart scenarios, this will likely hit more if its implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NM's is activated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-101) If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-101: --- Attachment: YARN-101.2.patch 1. Recreate the patch based on the latest trunk version. 2. Add a new test case to test the patch. If the heartbeat message is lost, the nodestatus info of completed containers will be lost too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
{code}
protected void startStatusUpdater() {
  new Thread("Node Status Updater") {
    @Override
    @SuppressWarnings("unchecked")
    public void run() {
      int lastHeartBeatID = 0;
      while (!isStopped) {
        // Send heartbeat
        try {
          synchronized (heartbeatMonitor) {
            heartbeatMonitor.wait(heartBeatInterval);
          }
{color:red}
          // Before we send the heartbeat, we get the NodeStatus,
          // whose method removes completed containers.
          NodeStatus nodeStatus = getNodeStatus();
{color}
          nodeStatus.setResponseId(lastHeartBeatID);
          NodeHeartbeatRequest request = recordFactory
              .newRecordInstance(NodeHeartbeatRequest.class);
          request.setNodeStatus(nodeStatus);
{color:red}
          // But if the nodeHeartbeat fails, we've already removed the containers
          // away to know about it. We aren't handling a nodeHeartbeat failure case here.
          HeartbeatResponse response =
              resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
{color}
          if (response.getNodeAction() == NodeAction.SHUTDOWN) {
            LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat,"
                + " hence shutting down.");
            NodeStatusUpdaterImpl.this.stop();
            break;
          }
          if (response.getNodeAction() == NodeAction.REBOOT) {
            LOG.info("Node is out of sync with ResourceManager,"
                + " hence rebooting.");
            NodeStatusUpdaterImpl.this.reboot();
            break;
          }
          lastHeartBeatID = response.getResponseId();
          List<ContainerId> containersToCleanup = response
              .getContainersToCleanupList();
          if (containersToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedContainersEvent(containersToCleanup));
          }
          List<ApplicationId> appsToCleanup = response.getApplicationsToCleanupList();
          // Only start tracking for keepAlive on FINISH_APP
          trackAppsForKeepAlive(appsToCleanup);
          if (appsToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedAppsEvent(appsToCleanup));
          }
        } catch (Throwable e) {
          // TODO Better error handling. Thread can die with the rest of the
          // NM still running.
          LOG.error("Caught exception in status-updater", e);
        }
      }
    }
  }.start();
}

private NodeStatus getNodeStatus() {
  NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
  nodeStatus.setNodeId(this.nodeId);
  int numActiveContainers = 0;
  List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>();
  for (Iterator<Entry<ContainerId, Container>> i =
      this.context.getContainers().entrySet().iterator(); i.hasNext();) {
    Entry<ContainerId, Container> e = i.next();
    ContainerId containerId = e.getKey();
    Container container = e.getValue();
    // Clone the container to send it to the RM
    org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus =
        container.cloneAndGetContainerStatus();
    containersStatuses.add(containerStatus);
    ++numActiveContainers;
    LOG.info("Sending out status for container: " + containerStatus);
{color:red}
    // Here is the part that removes the completed containers.
    if (containerStatus.getState() == ContainerState.COMPLETE) {
      // Remove
      i.remove();
{color}
      LOG.info("Removed completed container " + containerId);
    }
  }
  nodeStatus.setContainersStatuses(containersStatuses);
{code}
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614675#comment-13614675 ] Junping Du commented on YARN-18: Thanks Arun and Luke for the comments and review. [~acmurthy], YARN-18 and YARN-19 are still part of HADOOP-8468. In that umbrella JIRA, I had a proposal that describes how these two changes (P6, P7) work at a detailed level, which may help you to review this patch. If necessary, I can update it to reflect the current changes (addressing lots of the comments above) and attach it again in this JIRA. Thx! Make locality in YARN's container assignment and task scheduling pluggable for other deployment topologies - Key: YARN-18 URL: https://issues.apache.org/jira/browse/YARN-18 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch There are several classes in YARN’s container assignment and task scheduling algorithms that relate to data locality which were updated to give preference to running a container on other localities besides node-local and rack-local (like nodegroup-local). This proposes to make these data structures/algorithms pluggable, like: SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-509) ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable on secure clusters
[ https://issues.apache.org/jira/browse/YARN-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614710#comment-13614710 ] Roman Shaposhnik commented on YARN-509: --- This is from Bigtop testing so I can make the cluster available for you (I'll need your public ssh key -- please send it to me offline pref. PGP encoded). Now, to answer your questions: bq. What is security.resourcetracker.protocol.acl set to in your hadoop-policy.xml? ${HADOOP_YARN_USER} which acording to the process environment translates to yarn bq. What is yarn.nodemanager.principal in yarn-site.xml ? yarn/_HOST@BIGTOP bq. RMNMSecurityInfoClass.class and the text file org.apache.hadoop.security.SecurityInfo are on the classpath of ResourceManager? Yes it is. Please let me know if you need any more info or if you'd like to get access to the cluster. ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable on secure clusters --- Key: YARN-509 URL: https://issues.apache.org/jira/browse/YARN-509 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Environment: BigTop Kerberized cluster test environment Reporter: Konstantin Boudnik Priority: Blocker Fix For: 3.0.0, 2.0.4-alpha During BigTop 0.6.0 release test cycle, [~rvs] came around the following problem: {noformat} 013-03-26 15:37:03,573 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359) Caused by: org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:162) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) ... 3 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:158) ... 4 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client Kerberos principal is yarn/ip-10-46-37-244.ec2.internal@BIGTOP at org.apache.hadoop.ipc.Client.call(Client.java:1235) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) ... 
6 more {noformat} The most significant part is {{User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB}} indicating that ResourceTrackerPB hasn't been annotated with {{@KerberosInfo}} nor {{@TokenInfo}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-510) Writing Yarn Applications documentation should be changed to signify use of fully qualified paths when localizing resources
Hitesh Shah created YARN-510: Summary: Writing Yarn Applications documentation should be changed to signify use of fully qualified paths when localizing resources Key: YARN-510 URL: https://issues.apache.org/jira/browse/YARN-510 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.0.0-alpha Reporter: Hitesh Shah Assignee: Hitesh Shah
{code}
Path jarPath = new Path("/Working_HDFS_DIR/" + appId + "/" + AM_JAR);
fs.copyFromLocalFile(new Path("/local/src/AM.jar"), jarPath);
// VALIDATED jar is in HDFS under correct PATH
FileStatus jarStatus = fs.getFileStatus(jarPath);
LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
amJarRsrc.setType(LocalResourceType.FILE);
amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
amJarRsrc.setTimestamp(jarStatus.getModificationTime());
amJarRsrc.setSize(jarStatus.getLen());
localResources.put("AppMaster.jar", amJarRsrc);
amContainer.setLocalResources(localResources);
{code}
Error logs (nodeManager.log):
{noformat}
INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1364219323374_0016 transitioned from INITING to RUNNING
INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Got exception parsing AppMaster.jar and value resource {, port: -1, file: /Working_HDFS_DIR/application_1364219323374_0016/AM.jar, }, size: 13940, timestamp: 1364230436600, type: FILE, visibility: APPLICATION,
2013-03-25 17:53:57,391 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Failed to parse resource-request
java.net.URISyntaxException: Expected scheme name at index 0: :///Working_HDFS_DIR/application_1364219323374_0016/AM.jar
 at java.net.URI$Parser.fail(URI.java:2810)
 at java.net.URI$Parser.failExpecting(URI.java:2816)
 at java.net.URI$Parser.parse(URI.java:3008)
 at java.net.URI.<init>(URI.java:735)
 at org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:70)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:501)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:472)
 at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:382)
 at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
 at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMa
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
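The documentation change being asked for boils down to qualifying the path before handing it to ConverterUtils, so the YARN URL carries a scheme and authority. A sketch of the corrected client snippet, reusing the variable names from the report and using fs.makeQualified for illustration:
{code}
Path jarPath = new Path("/Working_HDFS_DIR/" + appId + "/" + AM_JAR);
fs.copyFromLocalFile(new Path("/local/src/AM.jar"), jarPath);
// Qualify against the FileSystem so the URL below is hdfs://<nn>/... rather than ///...
jarPath = fs.makeQualified(jarPath);
FileStatus jarStatus = fs.getFileStatus(jarPath);

LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
amJarRsrc.setType(LocalResourceType.FILE);
amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath)); // now carries a scheme
amJarRsrc.setTimestamp(jarStatus.getModificationTime());
amJarRsrc.setSize(jarStatus.getLen());
localResources.put("AppMaster.jar", amJarRsrc);
amContainer.setLocalResources(localResources);
{code}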
[jira] [Updated] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-309: --- Attachment: YARN-309.4.patch 1. Create new entries in YarnConfiguration to set the default value. 2. Every time, ResourceTrackerService will set the heartbeat interval, and the NM will get and use this interval. 3. Add a heartbeatInterval variable in the .proto file. Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
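Putting the three points together, the intended flow is roughly the sketch below; the configuration key, default value and response field name are illustrative stand-ins for whatever the patch actually adds.
{code}
// Hedged sketch of the RM -> NM heartbeat-interval handoff.
// RM side (ResourceTrackerService): stamp the interval on every heartbeat response.
long interval = conf.getLong(
    "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms", 1000L); // assumed key/default
response.setNextHeartBeatInterval(interval);                           // assumed proto field

// NM side (NodeStatusUpdaterImpl): wait for whatever the RM last told us.
long nextHeartBeatInterval = lastResponse.getNextHeartBeatInterval();
synchronized (heartbeatMonitor) {
  heartbeatMonitor.wait(nextHeartBeatInterval);
}
{code}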
[jira] [Commented] (YARN-499) On container failure, include last n lines of logs in diagnostics
[ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614738#comment-13614738 ] Sandy Ryza commented on YARN-499: - Ravi, The idea of putting the app master in a big try/catch seems good to me, but I was envisioning this JIRA to encompass something more general that would handle non-AM container logs, containers that OOM before getting into the main function, and containers that don't run java. It's true that the approach I outlined doesn't deterministically report exceptions, but it at least gets us back to parity with MR1, and I believe that in most cases (and in all cases that I've seen), the end of the log contains the helpful information. On container failure, include last n lines of logs in diagnostics - Key: YARN-499 URL: https://issues.apache.org/jira/browse/YARN-499 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza When a container fails, the only way to diagnose it is to look at the logs. ContainerStatuses include a diagnostic string that is reported back to the resource manager by the node manager. Currently in MR2 I believe whatever is sent to the task's standard out is added to the diagnostics string, but for MR standard out is redirected to a file called stdout. In MR1, this string was populated with the last few lines of the task's stdout file, and got printed to the console, allowing for easy debugging. Handling this would help to soothe the infuriating problem of an AM dying for a mysterious reason before setting a tracking URL (MAPREDUCE-3688). This could be done in one of two ways. * Use tee to send MR's standard out to both the stdout file and standard out. This requires modifying ShellCmdExecutor to roll what it reads in, as we wouldn't want to be storing the entire task log in NM memory. * Read the task's log files. This would require standardizing or making the container log files configurable. Right now the log files are determined in userland and all that is YARN is aware of the log directory. Does this present any issues I'm not considering? If so it this might only be needed for AMs? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-499) On container failure, include last n lines of logs in diagnostics
[ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-499: Attachment: YARN-499.patch On container failure, include last n lines of logs in diagnostics - Key: YARN-499 URL: https://issues.apache.org/jira/browse/YARN-499 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-499.patch When a container fails, the only way to diagnose it is to look at the logs. ContainerStatuses include a diagnostic string that is reported back to the resource manager by the node manager. Currently in MR2 I believe whatever is sent to the task's standard out is added to the diagnostics string, but for MR standard out is redirected to a file called stdout. In MR1, this string was populated with the last few lines of the task's stdout file, and got printed to the console, allowing for easy debugging. Handling this would help to soothe the infuriating problem of an AM dying for a mysterious reason before setting a tracking URL (MAPREDUCE-3688). This could be done in one of two ways. * Use tee to send MR's standard out to both the stdout file and standard out. This requires modifying ShellCmdExecutor to roll what it reads in, as we wouldn't want to be storing the entire task log in NM memory. * Read the task's log files. This would require standardizing or making the container log files configurable. Right now the log files are determined in userland and all that is YARN is aware of the log directory. Does this present any issues I'm not considering? If so it this might only be needed for AMs? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-499) On container failure, include last n lines of logs in diagnostics
[ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614752#comment-13614752 ] Sandy Ryza commented on YARN-499: - Uploaded a patch that uses tee to send the standard out both to standard out and the stdout file. The standard ShellCommandExecutor holds on to all of a container's standard output. I replaced it with one that only holds on to the last 500 characters. This also fixes an existing security issue that would allow a container to force an out of memory error in the nodemanager by feeding it a ton of output. I've verified on a pseudo-distributed cluster that OOM errors on jvm initialization get printed to the console. On container failure, include last n lines of logs in diagnostics - Key: YARN-499 URL: https://issues.apache.org/jira/browse/YARN-499 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-499.patch When a container fails, the only way to diagnose it is to look at the logs. ContainerStatuses include a diagnostic string that is reported back to the resource manager by the node manager. Currently in MR2 I believe whatever is sent to the task's standard out is added to the diagnostics string, but for MR standard out is redirected to a file called stdout. In MR1, this string was populated with the last few lines of the task's stdout file, and got printed to the console, allowing for easy debugging. Handling this would help to soothe the infuriating problem of an AM dying for a mysterious reason before setting a tracking URL (MAPREDUCE-3688). This could be done in one of two ways. * Use tee to send MR's standard out to both the stdout file and standard out. This requires modifying ShellCmdExecutor to roll what it reads in, as we wouldn't want to be storing the entire task log in NM memory. * Read the task's log files. This would require standardizing or making the container log files configurable. Right now the log files are determined in userland and all that is YARN is aware of the log directory. Does this present any issues I'm not considering? If so it this might only be needed for AMs? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
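The "hold on to the last 500 characters" behaviour amounts to a tail buffer on the executor's output; a minimal, self-contained sketch of that idea follows (the class and where it plugs into the shell executor are illustrative, not the patch itself).
{code}
// Hedged sketch: retain at most the last `limit` characters of container output.
class TailBuffer {
  private final StringBuilder buf = new StringBuilder();
  private final int limit;

  TailBuffer(int limit) { this.limit = limit; }

  void append(CharSequence chunk) {
    buf.append(chunk);
    int excess = buf.length() - limit;
    if (excess > 0) {
      buf.delete(0, excess);   // drop the oldest output, keep only the tail
    }
  }

  String tail() { return buf.toString(); }
}

// e.g. new TailBuffer(500) matches the figure mentioned above; whatever it holds when
// the container exits is what would be copied into the diagnostics string.
{code}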
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614755#comment-13614755 ] omkar vinit joshi commented on YARN-112: Vinod's suggestion looks good to me, and it will in fact simplify the FSDownload logic. I am adding a unique number generator (AtomicLong) to LocalResourcesTrackerImpl so that random (in our case, now unique) number generation is centralized for public, private, and application cache files. Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: omkar vinit joshi Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
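A rough sketch of the centralized unique-number idea described in that comment (the class below is a standalone illustration; where such a counter lives and what it is called in the actual patch are assumptions):
{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative: one shared counter, so every localized resource (public, private,
// or application cache) gets a distinct suffix instead of a random number that can collide.
public class UniqueNumberGenerator {
  private final AtomicLong counter = new AtomicLong(0);

  public long nextUniqueNumber() {
    return counter.incrementAndGet();
  }
}
{code}
FSDownload could then derive its temporary destination from such a suffix rather than from a Random, so two concurrent localizations would no longer pick the same directory name.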
[jira] [Updated] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] omkar vinit joshi updated YARN-112: --- Attachment: yarn-112-20130326.patch Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: omkar vinit joshi Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-474: - Attachment: YARN-474.3.patch @Vinod's comments are addressed in the newest patch. In addition, I've tested the patch on a one-node cluster and have seen it work. CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify the capacity scheduler config to increase the value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
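For reference, the reproduction in the description amounts to editing capacity-scheduler.xml and refreshing the queues; a minimal illustration (the value shown is arbitrary):
{noformat}
<!-- capacity-scheduler.xml: raise the share of queue capacity that AMs may use -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
{noformat}
followed by {{yarn rmadmin -refreshQueues}}. With the fix, applications that were pending under the old limit should be activated against the new one.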
[jira] [Commented] (YARN-440) Flatten RegisterNodeManagerResponse
[ https://issues.apache.org/jira/browse/YARN-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614792#comment-13614792 ] Hudson commented on YARN-440: - Integrated in Hadoop-trunk-Commit #3531 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3531/]) YARN-440. Flatten RegisterNodeManagerResponse. Contributed by Xuan Gong. (Revision 1461256) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1461256 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/RegistrationResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/RegistrationResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestRMNMSecretKeys.java Flatten RegisterNodeManagerResponse --- Key: YARN-440 URL: https://issues.apache.org/jira/browse/YARN-440 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Xuan Gong Fix For: 2.0.5-beta Attachments: YARN-440.1.patch, YARN-440.2.patch, YARN-440.3.patch RegisterNodeManagerResponse has another wrapper RegistrationResponse under it, which can be removed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
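To make the "flattening" concrete, the change amounts to promoting the wrapper's fields onto the response record itself; a before/after sketch with illustrative getters (the real records are protobuf-backed and carry more plumbing, and the exact accessors may differ):
{code:java}
import org.apache.hadoop.yarn.server.api.records.MasterKey;
import org.apache.hadoop.yarn.server.api.records.NodeAction;
import org.apache.hadoop.yarn.server.api.records.RegistrationResponse;

// Before: callers reach through the wrapper, e.g.
//   response.getRegistrationResponse().getMasterKey()
interface RegisterNodeManagerResponseBefore {
  RegistrationResponse getRegistrationResponse();
}

// After: the wrapper is removed and its fields live directly on the response, e.g.
//   response.getMasterKey()
interface RegisterNodeManagerResponseAfter {
  MasterKey getMasterKey();
  NodeAction getNodeAction();
}
{code}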
[jira] [Commented] (YARN-209) Capacity scheduler doesn't trigger app-activation after adding nodes
[ https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614819#comment-13614819 ] Bikas Saha commented on YARN-209: - Patch looks good overall. I don't quite see what testActivatingPendingApplication() is buying us in its current form. If the leaf queue test fails before the fix and passes after it, then it should be enough IMO. Capacity scheduler doesn't trigger app-activation after adding nodes Key: YARN-209 URL: https://issues.apache.org/jira/browse/YARN-209 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Zhijie Shen Fix For: 3.0.0 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, YARN-209-test.patch Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that, if more hardware is added to the system and the application becomes valid, it still remains in pending state, likely forever. This might be rare to hit in real life because enough NMs heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to repro. In RM restart scenarios, this will likely hit more if it's implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NMs is activated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614820#comment-13614820 ] Hadoop QA commented on YARN-309: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575621/YARN-309.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/608//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/608//console This message is automatically generated. Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614833#comment-13614833 ] Bikas Saha commented on YARN-193: - I am not sure if the normalization errors should reach all the way to the RMAppAttemptImpl and cause failures. The AM container request should be validated and normalized in ApplicationMasterService.submitApplication() as the first thing, even before sending it to RMAppManager. Task container requests should be validated in ApplicationMasterService.allocate() as the first thing, before calling scheduler.allocate(). This is like a sanity check. This also ensures that we are not calling into the scheduler and changing its internal state (e.g. it could return a completed container or a newly allocated container, which would be lost if we throw an exception). RMAppAttemptImpl could assert that the allocated container has the same size as the requested container. Normalization should simply cap the resource to the max allowed. Normalize can be called from anywhere, and so it's not necessary to always validate before normalizing. In fact, we could choose to normalize requests that exceed the max down to the max instead of throwing an exception. Validate should not throw an exception IMO. It's like a helper function that tells whether the value is valid or not. Different users can choose to do different things based on the result of validate(). Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
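A sketch of the validate-then-normalize split Bikas describes, reduced to plain memory sizes (this is not the scheduler's actual code; the class and parameter names are illustrative):
{code:java}
// Illustrative helpers: validate() only reports whether a request is acceptable,
// while normalize() rounds the request and caps it at the maximum instead of throwing.
public final class RequestLimits {
  private RequestLimits() {}

  public static boolean validate(int requestedMB, int maxAllocMB) {
    return requestedMB > 0 && requestedMB <= maxAllocMB;
  }

  public static int normalize(int requestedMB, int minAllocMB, int maxAllocMB) {
    // Round up to a multiple of the minimum allocation...
    int rounded = ((requestedMB + minAllocMB - 1) / minAllocMB) * minAllocMB;
    // ...then cap at the maximum rather than failing the request.
    return Math.min(rounded, maxAllocMB);
  }
}
{code}
Under this split, the allocate() path could reject a request when validate() returns false before the scheduler's state is touched, while normalize() remains safe to call from anywhere.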
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614847#comment-13614847 ] Xuan Gong commented on YARN-309: The test case fails because of a localhost binding problem:
{noformat}
Caused by: org.apache.hadoop.yarn.YarnException: java.net.BindException: Problem binding to [localhost:12345] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
    at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
    at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:63)
    at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:52)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.start(ContainerManagerImpl.java:230)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    ... 11 more
Caused by: java.net.BindException: Problem binding to [localhost:12345] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:716)
    at org.apache.hadoop.ipc.Server.bind(Server.java:415)
    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:518)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:1962)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:986)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:427)
    at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:402)
    at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:829)
    at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
    at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
    ... 15 more
{noformat}
Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
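Tangentially, this kind of failure comes from a test binding to a hard-coded port (localhost:12345 here). One common way to avoid it, shown purely as an illustration and not as part of this patch, is to ask the OS for a free ephemeral port first:
{code:java}
import java.io.IOException;
import java.net.ServerSocket;

// Illustrative: probe for a currently-free port instead of hard-coding one.
// There is still a small race between closing the probe socket and reusing the port.
public final class FreePortFinder {
  private FreePortFinder() {}

  public static int findFreePort() throws IOException {
    ServerSocket socket = new ServerSocket(0); // port 0 = let the OS pick
    try {
      return socket.getLocalPort();
    } finally {
      socket.close();
    }
  }
}
{code}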
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614848#comment-13614848 ] Xuan Gong commented on YARN-309: And it is not introduced by this patch. Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-499) On container failure, include last n lines of logs in diagnostics
[ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614863#comment-13614863 ] Hadoop QA commented on YARN-499: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575623/YARN-499.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.mapreduce.v2.app.TestRecovery org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/606//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/606//console This message is automatically generated. On container failure, include last n lines of logs in diagnostics - Key: YARN-499 URL: https://issues.apache.org/jira/browse/YARN-499 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-499.patch When a container fails, the only way to diagnose it is to look at the logs. ContainerStatuses include a diagnostic string that is reported back to the resource manager by the node manager. Currently in MR2 I believe whatever is sent to the task's standard out is added to the diagnostics string, but for MR standard out is redirected to a file called stdout. In MR1, this string was populated with the last few lines of the task's stdout file, and got printed to the console, allowing for easy debugging. Handling this would help to soothe the infuriating problem of an AM dying for a mysterious reason before setting a tracking URL (MAPREDUCE-3688). This could be done in one of two ways. * Use tee to send MR's standard out to both the stdout file and standard out. This requires modifying ShellCmdExecutor to roll what it reads in, as we wouldn't want to be storing the entire task log in NM memory. * Read the task's log files. This would require standardizing or making the container log files configurable. Right now the log files are determined in userland and all that is YARN is aware of the log directory. Does this present any issues I'm not considering? If so it this might only be needed for AMs? -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable
[ https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614871#comment-13614871 ] Hadoop QA commented on YARN-24: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575571/YARN-24-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/612//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/612//console This message is automatically generated. Nodemanager fails to start if log aggregation enabled and namenode unavailable -- Key: YARN-24 URL: https://issues.apache.org/jira/browse/YARN-24 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Sandy Ryza Attachments: YARN-24-1.patch, YARN-24-2.patch, YARN-24-3.patch, YARN-24.patch If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to startup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614872#comment-13614872 ] Hadoop QA commented on YARN-101: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575611/YARN-101.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/611//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/611//console This message is automatically generated. If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
protected void startStatusUpdater() {
  new Thread("Node Status Updater") {
    @Override
    @SuppressWarnings("unchecked")
    public void run() {
      int lastHeartBeatID = 0;
      while (!isStopped) {
        // Send heartbeat
        try {
          synchronized (heartbeatMonitor) {
            heartbeatMonitor.wait(heartBeatInterval);
          }
          {color:red}
          // Before we send the heartbeat, we get the NodeStatus,
          // whose method removes completed containers.
          NodeStatus nodeStatus = getNodeStatus();
          {color}
          nodeStatus.setResponseId(lastHeartBeatID);
          NodeHeartbeatRequest request = recordFactory
              .newRecordInstance(NodeHeartbeatRequest.class);
          request.setNodeStatus(nodeStatus);
          {color:red}
          // But if the nodeHeartbeat fails, we've already removed the containers away to know about it.
          // We aren't handling a nodeHeartbeat failure case here.
          HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
          {color}
          if (response.getNodeAction() == NodeAction.SHUTDOWN) {
            LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat,"
                + " hence shutting down.");
            NodeStatusUpdaterImpl.this.stop();
            break;
          }
          if (response.getNodeAction() == NodeAction.REBOOT) {
            LOG.info("Node is out of sync with ResourceManager,"
                + " hence rebooting.");
            NodeStatusUpdaterImpl.this.reboot();
            break;
          }
          lastHeartBeatID = response.getResponseId();
          List<ContainerId> containersToCleanup = response
              .getContainersToCleanupList();
          if (containersToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedContainersEvent(containersToCleanup));
          }
          List<ApplicationId> appsToCleanup = response.getApplicationsToCleanupList();
          // Only start tracking for keepAlive on FINISH_APP
          trackAppsForKeepAlive(appsToCleanup);
          if (appsToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedAppsEvent(appsToCleanup));
          }
        } catch (Throwable e) {
          // TODO Better error handling. Thread can die with the rest of the
          // NM still running.
          LOG.error("Caught exception in status-updater",
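The general shape of a fix for the problem the red comments call out is to hold on to completed-container statuses until a heartbeat actually succeeds, instead of forgetting them when the NodeStatus is built; a very rough illustration, not the actual NodeStatusUpdaterImpl change:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: completed-container reports stay in a pending list across
// heartbeat attempts and are only dropped once the RM has acknowledged a heartbeat.
public class CompletedContainerReporter<T> {
  private final List<T> pending = new ArrayList<T>();

  public synchronized List<T> snapshotForHeartbeat(List<T> newlyCompleted) {
    pending.addAll(newlyCompleted);
    return new ArrayList<T>(pending); // what this heartbeat will report
  }

  public synchronized void onHeartbeatSucceeded() {
    pending.clear(); // the RM has seen everything in the last snapshot
  }

  public synchronized void onHeartbeatFailed() {
    // Keep the pending statuses so they are re-sent on the next heartbeat.
  }
}
{code}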
[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable
[ https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614873#comment-13614873 ] Sandy Ryza commented on YARN-24: Verified the updated patch on a pseudo-distributed cluster as well. Nodemanager fails to start if log aggregation enabled and namenode unavailable -- Key: YARN-24 URL: https://issues.apache.org/jira/browse/YARN-24 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Sandy Ryza Attachments: YARN-24-1.patch, YARN-24-2.patch, YARN-24-3.patch, YARN-24.patch If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to startup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614882#comment-13614882 ] Hadoop QA commented on YARN-474: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575633/YARN-474.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/613//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/613//console This message is automatically generated. CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614886#comment-13614886 ] Vinod Kumar Vavilapalli commented on YARN-474: -- The latest patch looks good, I am checking it in. CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614890#comment-13614890 ] Hudson commented on YARN-474: - Integrated in Hadoop-trunk-Commit #3532 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3532/]) YARN-474. Fix CapacityScheduler to trigger application-activation when am-resource-percent configuration is refreshed. Contributed by Zhijie Shen. (Revision 1461402) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1461402 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Assignee: Zhijie Shen Fix For: 2.0.5-beta Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-157) The option shell_command and shell_script have conflict
[ https://issues.apache.org/jira/browse/YARN-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614922#comment-13614922 ] rainy Yu commented on YARN-157: --- I can't add attachments. My patch is:
{noformat}
Index: src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
===
--- src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java (revision 90765)
+++ src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java (working copy)
@@ -140,8 +140,8 @@
   // Main class to invoke application master
   private String appMasterMainClass = "";
-  // Shell command to be executed
-  private String shellCommand = "";
+  // Shell command to be executed. the Linux shell command '/bin/sh' is default
+  private String shellCommand = "/bin/sh";
   // Location of shell script
   private String shellScriptPath = "";
   // Args to be passed to the shell command
@@ -276,10 +276,11 @@
     appMasterMainClass = cliParser.getOptionValue("class",
         "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster");
-    if (!cliParser.hasOption("shell_command")) {
-      throw new IllegalArgumentException("No shell command specified to be executed by application master");
+    if (cliParser.hasOption("shell_command")) {
+      //throw new IllegalArgumentException("No shell command specified to be executed by application master");
+      shellCommand = cliParser.getOptionValue("shell_command");
     }
-    shellCommand = cliParser.getOptionValue("shell_command");
+    //shellCommand = cliParser.getOptionValue("shell_command");
     if (cliParser.hasOption("shell_script")) {
       shellScriptPath = cliParser.getOptionValue("shell_script");
{noformat}
The option shell_command and shell_script have conflict --- Key: YARN-157 URL: https://issues.apache.org/jira/browse/YARN-157 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.0.1-alpha Reporter: Li Ming Labels: patch The DistributedShell has an option shell_script to let the user specify a shell script that will be executed in containers. But the issue is that the shell_command option is a must, so if both options are set, then every container executor will end with exitCode=1. This is because DistributedShell executes the shell_command and shell_script together. For example, if shell_command is 'date' then the final command to be executed in a container is date `ExecShellScript.sh`, so the date command will treat the result of ExecShellScript.sh as its parameter, and there will be an error. To solve this, the DistributedShell should not use the value of the shell_command option when the shell_script option is set, and the shell_command option also should not be mandatory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
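With a change along those lines, an invocation that supplies only a script should work. An illustrative command line (the jar path is a placeholder, and the exact option spelling should be checked against the distributed shell client rather than taken from here):
{noformat}
hadoop jar hadoop-yarn-applications-distributedshell-<version>.jar \
  org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar hadoop-yarn-applications-distributedshell-<version>.jar \
  -shell_script ./ExecShellScript.sh \
  -num_containers 1
{noformat}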
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614929#comment-13614929 ] Hadoop QA commented on YARN-112: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575629/yarn-112-20130326.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/614//console This message is automatically generated. Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: omkar vinit joshi Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-209) Capacity scheduler doesn't trigger app-activation after adding nodes
[ https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-209: - Attachment: YARN-209.4.patch The log statement is removed, and testActivatingPendingApplication is moved to TestRM and enhanced by checking the application's status before the NM is added. @Bikas, I agree that the TestLeafQueue test alone is enough to verify the bug, but I think the test case that you provided before is valuable. Therefore, I included and updated it as a demonstration of activating pending applications by adding more nodemanagers. Capacity scheduler doesn't trigger app-activation after adding nodes Key: YARN-209 URL: https://issues.apache.org/jira/browse/YARN-209 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Zhijie Shen Fix For: 3.0.0 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, YARN-209.4.patch, YARN-209-test.patch Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that, if more hardware is added to the system and the application becomes valid, it still remains in pending state, likely forever. This might be rare to hit in real life because enough NMs heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to repro. In RM restart scenarios, this will likely hit more if it's implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NMs is activated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
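A condensed sketch of that test flow (submit while no NM is registered, check that the application stays pending, then add a node and expect activation); the MockRM/MockNM method names below are from memory and may not match the committed test exactly:
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.MockNM;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;

// Illustrative outline of the scenario, not the committed test.
public class ActivateOnNewNodeOutline {
  public void run() throws Exception {
    MockRM rm = new MockRM(new YarnConfiguration());
    rm.start();
    // No NodeManager registered yet: the app is accepted but cannot be activated.
    RMApp app = rm.submitApp(200);
    rm.waitForState(app.getApplicationId(), RMAppState.ACCEPTED);
    // Add capacity; once the new node heartbeats, the pending app should be activated.
    MockNM nm = rm.registerNode("host1:1234", 4096);
    nm.nodeHeartbeat(true);
    rm.stop();
  }
}
{code}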
[jira] [Commented] (YARN-209) Capacity scheduler doesn't trigger app-activation after adding nodes
[ https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614953#comment-13614953 ] Hadoop QA commented on YARN-209: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575660/YARN-209.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/615//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/615//console This message is automatically generated. Capacity scheduler doesn't trigger app-activation after adding nodes Key: YARN-209 URL: https://issues.apache.org/jira/browse/YARN-209 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Zhijie Shen Fix For: 3.0.0 Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, YARN-209.4.patch, YARN-209-test.patch Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that if more hardware is added to the system and the application becomes valid it still remains in pending state, likely forever. This might be rare to hit in real life because enough NM's heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to repro. In RM restart scenarios, this will likely hit more if its implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NM's is activated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614961#comment-13614961 ] Konstantin Boudnik commented on YARN-474: - It seems that this commit has broken the [build of branch-2 |https://builds.apache.org/view/Hadoop/job/Hadoop-branch2/4/console] CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Assignee: Zhijie Shen Fix For: 2.0.5-beta Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-474) CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed
[ https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614962#comment-13614962 ] Konstantin Boudnik commented on YARN-474: - Here's the error message: {noformat} [ERROR] /home/jenkins/jenkins-slave/workspace/Hadoop-branch2/branch-2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java:[1610,10] cannot find symbol [ERROR] symbol : method setDouble(java.lang.String,float) [ERROR] location: class org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration {noformat} CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed Key: YARN-474 URL: https://issues.apache.org/jira/browse/YARN-474 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Hitesh Shah Assignee: Zhijie Shen Fix For: 2.0.5-beta Attachments: YARN-474.1.patch, YARN-474.2.patch, YARN-474.3.patch Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
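For what it's worth, the compile error indicates that Configuration on branch-2 does not resolve setDouble(String, double) at that point, while setFloat(String, float) does exist there. A hedged sketch of the kind of one-line adjustment that would compile (the property key is written out literally because the exact constant name in CapacitySchedulerConfiguration is not assumed here):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative: branch-2's Configuration offers setFloat(String, float), so the test
// can set the AM resource percent without relying on setDouble.
public final class AmResourcePercentSetter {
  private AmResourcePercentSetter() {}

  public static void set(Configuration conf, float percent) {
    conf.setFloat("yarn.scheduler.capacity.maximum-am-resource-percent", percent);
  }
}
{code}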