[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352835#comment-14352835 ] Hudson commented on YARN-3296: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #127 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/127/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352944#comment-14352944 ] Junping Du commented on YARN-3225: -- Thanks [~devaraj.k] for delivering the patch, which is the first one in the graceful decommission effort! A couple of comments: In RefreshNodesRequestPBImpl.java,
{code}
+  @Override
+  public long getTimeout() {
+    return getProto().getTimeout();
+  }
+
+  @Override
+  public void setTimeout(long timeout) {
+    builder.setTimeout(timeout);
+  }
{code}
The setTimeout() has a problem: because we didn't set viaProto to false, a subsequent getTimeout() will return the old value from the old proto. I suggest adding a maybeInitBuilder() method just like the other PBImpls, and also adding a unit test to verify the PBImpl works as expected. In NodeState.java,
{code}
DECOMMISSION_IN_PROGRESS
{code}
[~jlowe] suggested in the umbrella JIRA that it is better to name this DECOMMISSIONING. I had the same feeling, so I reflected that name in the latest proposal. Do you think we should incorporate that comment here? In RMAdminCLI.java,
{code}
+ .put("-refreshNodes", new UsageInfo("[-g [timeout in ms]]", "Refresh the hosts information at the ResourceManager."))
{code}
I think we should add more info to the description message "Refresh the hosts information at the ResourceManager." to explain what the -g option does. Also, per my suggestion above, it is better to specify the timeout in seconds. Milliseconds are more precise, but leave more room for (manual) operation mistakes. Also, it is better to name the patch consistently with the JIRA number (YARN-3225). New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-914.patch A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
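To illustrate the fix being requested here, a minimal sketch of the usual PBImpl pattern follows; the member names (proto, builder, viaProto) follow the common PBImpl convention and are assumed rather than copied from the patch:
{code}
private void maybeInitBuilder() {
  if (viaProto || builder == null) {
    // rebuild the builder from the current proto before mutating it
    builder = RefreshNodesRequestProto.newBuilder(proto);
  }
  // from now on, getters must read from the builder, not the stale proto
  viaProto = false;
}

@Override
public void setTimeout(long timeout) {
  maybeInitBuilder(); // without this, getTimeout() keeps returning the old proto value
  builder.setTimeout(timeout);
}
{code}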
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352857#comment-14352857 ] Hudson commented on YARN-3296: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #861 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/861/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java * hadoop-yarn-project/CHANGES.txt yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3225: Attachment: YARN-3225.patch New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352962#comment-14352962 ] Junping Du commented on YARN-3304: -- Agreed that a negative value sounds very odd. However, if we really failed to get CPU usage info, wouldn't a value of 0 confuse the user into thinking the metric works fine and CPU usage is simply very low? ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Per discussions in YARN-3296, getCpuUsagePercent() will return -1 in the unavailable case while other resource metrics return 0 in the same case, which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352863#comment-14352863 ] Hudson commented on YARN-3296: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2059 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2059/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353071#comment-14353071 ] Hudson commented on YARN-3296: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #118 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/118/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2495: Attachment: YARN-2495.20150309-1.patch Hi [~wangda], attaching the updated patch; here is the status of the comments:
1,2,3,4) All have been rectified. A method name with "set" at both the beginning and the end did not sound appropriate, hence I modified it to areNodeLabelsSetInReq and setAreNodeLabelsSetInReq in both heartbeat and register.
5) ??I think we may not need to check centralized/distributed configuration here, centralized/distributed is a config in RM side.?? Earlier I had moved this configuration-type check into NM.serviceInit before calling {{getNodeLabelsProviderService}}, but now I have changed the method name to {{createNodeLabelsProviderService}} and moved the check back inside this method itself. As part of YARN-2729, the script-based node label provider will be returned from the createNodeLabelsProviderService method. ??In NM side, it should be how to get node labels, if user doesn't configure any script file for it, it should be null and no instance of NodeLabelProviderService will be added to NM.?? In the current patch, null will be set only if the configuration type is set as centralized in the NM. Based on earlier feedback from Vinod (in another JIRA), I think we need to fail fast and let the user know about the error at the earliest, so the script node label provider will throw an exception on erroneous conditions (script not configured, no rights to execute, etc.) and ensure the NM fails to start. ??So back to code, you can just leave getNodeLabelsProviderService(..), which will be implemented in YARN-2729. If you agree, we need change the name isDistributedNodeLabelsConf to?? Actually I didn't get the intent of these 2 lines and felt the comment was incomplete... Do you want to avoid the configuration-type check in the NM and move it to the script node label provider, or something else?
6) Has been rectified; it was added while analyzing a test case failure.
7) ??isDistributedNodeLabels seems not so necessary here, and if you agree with 5), it's better to remove the field?? IIUC point 5 was related to the NM initializing the provider and point 7 is related to NodeStatusUpdaterImpl; if so, I didn't get the relation. Can you please clarify these 2 points?
8) ??Add null check or comment (provider returned node labels will always be not-null, for areNodeLabelsUpdated in NodeStatusUpdaterImpl?? Before calling areNodeLabelsUpdated, I had already checked for null and set empty labels at line 626 (startStatusUpdater method).
9) ??Since we already have TestNodeStatusUpdater, it's better to merge TestNodeStatusUpdaterForLabels to it.?? There were already too many internal classes extending NodeStatusUpdaterImpl and ResourceTrackerService, and I personally found it very difficult to walk through the test class and reuse it; it had already crossed 1666 lines of code and was losing readability, so I added a new class. Please let me know if required and I will merge it into the existing class.
10) I have modified ResourceTrackerService based on your comments and pushed some common code in register and heartbeat into a common method.
None of the findbugs issues are related to my modifications, and the following test case failure is also unrelated: TestRMRestart.testRMRestartGetApplicationList.
I will also be uploading a patch for YARN-2729 to give a view of the complete flow and make it testable. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2729: Attachment: YARN-2729.20150309-1.patch Hi [~wangda], rebasing the patch: removed the dependency on YARN-2923, changed the configuration names to suit the current conf suggestions, and made it fail fast on invalid configurations. If the above modifications are fine, I will start looking into the changes required to make Hadoop Common's NodeHealthScriptRunner reusable (in a separate JIRA). Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup --- Key: YARN-2729 URL: https://issues.apache.org/jira/browse/YARN-2729 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch Support a script-based NodeLabelsProvider interface in the distributed node label configuration setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353053#comment-14353053 ] Nathan Roberts commented on YARN-3298: -- Thanks [~leftnoteasy] for the additional detail. Maybe I should just wait for the patch, but here's the case I'm worried about: queue.used is just under queue.capacity, so current-capacity = queue.capacity. There are two users in the queue, both with the same used resources, so user-limit will be slightly less than (queue-capacity/2), and user-limit can therefore be extremely close to user.usage. user.usage + required might now be slightly greater than user-limit. If that happens, it seems like we'll be unable to cross the capacity threshold. Once above capacity, I think it will work, but crossing that threshold might be hard. Seems like current-capacity should be calculated as:
{code}
current-capacity = max(queue.used, queue.capacity) + now-required;
{code}
User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now: it does not consider the required resource (the resource of the being-allocated resource request), and when a user's used resource equals the user-limit, allocation will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as for the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1): the user-limit mentioned here is determined by the following computation
{code}
current-capacity = queue.used + now-required  (when queue.used > queue.capacity)
                   queue.capacity             (when queue.used <= queue.capacity)
user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
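To make the threshold concern above concrete, here is a hypothetical worked example; all numbers are assumed (minimum-user-limit-percent = 50, user-limit-factor = 1, two active users):
{code}
queue.capacity = 100, queue.used = 99, user1.usage = user2.usage = 49.5, required = 1

current-capacity = queue.capacity = 100              (queue.used <= queue.capacity)
user-limit = min(max(100 / 2, 100 * 50 / 100), 100 * 1) = 50
user.usage + required = 50.5 > 50                    => allocation blocked just below capacity

with current-capacity = max(queue.used, queue.capacity) + now-required = 101:
user-limit = min(max(101 / 2, 101 * 50 / 100), 100 * 1) = 50.5
user.usage + required = 50.5 <= 50.5                 => allocation can proceed
{code}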
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353127#comment-14353127 ] Hudson commented on YARN-3296: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2077 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2077/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3296) yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private
[ https://issues.apache.org/jira/browse/YARN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353095#comment-14353095 ] Hudson commented on YARN-3296: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/127/]) YARN-3296. Mark ResourceCalculatorProcessTree class as Public for configurable resource monitoring. Contributed by Hitesh Shah (junping_du: rev 7ce3c7635392c32f0504191ddd8417fb20509caa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java yarn.nodemanager.container-monitor.process-tree.class is configurable but ResourceCalculatorProcessTree class is marked Private --- Key: YARN-3296 URL: https://issues.apache.org/jira/browse/YARN-3296 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.7.0 Attachments: YARN-3296.1.patch, YARN-3296.2.patch Given that someone can implement their custom plugin for resource monitoring and configure the NM to use it, this class should be marked public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0006-YARN-3136.patch getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353143#comment-14353143 ] Sunil G commented on YARN-3136: --- bq. createReleaseCache schedules a timer task that Sorry, I also missed that. Agreed on making 'applications' a concurrent map. As it's private and unstable, it's fine to make it concurrent, but any scheduler which uses it will have to change. Do we need to document that? Also attaching a patch for the same. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
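For context, a minimal sketch of the change being agreed on here: swapping the schedulers' shared map for a concurrent one so that getTransferredContainers() can read it without taking the scheduler lock. The field and type names mirror the common scheduler base class but are assumed:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// was: protected Map<ApplicationId, SchedulerApplication<T>> applications;
protected ConcurrentMap<ApplicationId, SchedulerApplication<T>> applications =
    new ConcurrentHashMap<ApplicationId, SchedulerApplication<T>>();

// Readers such as getTransferredContainers() can then do a lock-free lookup:
SchedulerApplication<T> app = applications.get(appAttemptId.getApplicationId());
{code}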
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353241#comment-14353241 ] Naganarasimha G R commented on YARN-2495: - Most of the findbugs warnings reported are from the Fair scheduler and have nothing to do with the changes in this patch, and the failed tests are due to timeouts and are also not related to the patch. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
Vinod Kumar Vavilapalli created YARN-3306: - Summary: [Umbrella] Proposing per-queue Policy driven scheduling in YARN Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Scheduling layout in Apache Hadoop YARN today is very coarse grained. This proposal aims at converting today’s rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and the implementation of a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent-queue-level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353277#comment-14353277 ] Zhijie Shen commented on YARN-3287: --- Sure, I'll take a look again. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353259#comment-14353259 ] Jonathan Eagles commented on YARN-3287: --- [~zjshen], can you have another look now that I have up-merged and added sufficient tests for this change? TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353217#comment-14353217 ] Hadoop QA commented on YARN-2495: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703435/YARN-2495.20150309-1.patch against trunk revision 5578e22.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate org.apache.hadoop.yarn.server.resourcemanager.TestRM
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6894//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6894//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6894//console
This message is automatically generated. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3306: -- Attachment: PerQueuePolicydrivenschedulinginYARN.pdf Here's a detailed proposal doc. It's light on details of the leaf-queue policy interface - we will fill those in in one of the sub-tasks. [~cwelch] is helping with most of the implementation, thanks Craig. [Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf Scheduling layout in Apache Hadoop YARN today is very coarse grained. This proposal aims at converting today’s rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and the implementation of a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent-queue-level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353272#comment-14353272 ] Anubhav Dhoot commented on YARN-3304: - The intention of setting -1 was exactly this issue (distinguishing unavailable from actually zero). Ideally we should prevent adding the metrics to the collection until they are available. One possibility is doing it at ContainerMetrics#recordCpuUsage. I suggest investigating whether this ideal case is achievable; if not, I am fine with making these 0 to be consistent. ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Per discussions in YARN-3296, getCpuUsagePercent() will return -1 in the unavailable case while other resource metrics return 0 in the same case, which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
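If the "don't record until available" route is taken, a minimal sketch of what the guard could look like follows; the method shape mirrors the ContainerMetrics#recordCpuUsage mentioned above, but the body is an assumption, not the committed fix:
{code}
public void recordCpuUsage(int totalPhysicalCpuPercent, int milliVcoresUsed) {
  // Only publish once the process tree reports a real value; the -1
  // "unavailable" sentinel is filtered out and never reaches the metrics
  // system, so consumers cannot mistake it for genuinely low usage.
  if (totalPhysicalCpuPercent >= 0) {
    this.cpuCoreUsagePercent.incr(totalPhysicalCpuPercent);
  }
  if (milliVcoresUsed >= 0) {
    this.milliVcoresUsed.incr(milliVcoresUsed);
  }
}
{code}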
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353232#comment-14353232 ] Hadoop QA commented on YARN-3136: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703438/0006-YARN-3136.patch against trunk revision 5578e22.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6895//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6895//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6895//console
This message is automatically generated. getTransferredContainers can be a bottleneck during AM registration --- Key: YARN-3136 URL: https://issues.apache.org/jira/browse/YARN-3136 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Sunil G Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353597#comment-14353597 ] Hudson commented on YARN-3287: -- FAILURE: Integrated in Hadoop-trunk-Commit #7291 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7291/]) YARN-3287. Made TimelineClient put methods do as the correct login context. Contributed by Daryn Sharp and Jonathan Eagles. (zjshen: rev d6e05c5ee26feefc17267b7c9db1e2a3dbdef117) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineAuthenticationFilter.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Fix For: 2.7.0 Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3315) Fix -list-blacklisted-trackers to print the blacklisted NMs
[ https://issues.apache.org/jira/browse/YARN-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3305 to YARN-3315: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) Key: YARN-3315 (was: MAPREDUCE-3305) Project: Hadoop YARN (was: Hadoop Map/Reduce) Fix -list-blacklisted-trackers to print the blacklisted NMs --- Key: YARN-3315 URL: https://issues.apache.org/jira/browse/YARN-3315 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil bin/mapred job -list-blacklisted-trackers currently prints "getBlacklistedTrackers - Not implemented yet". This is a long-pending issue. Could not find a tracking ticket, hence opening one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3311) add location to web UI so you know where you are - cluster, node, AM, job history
[ https://issues.apache.org/jira/browse/YARN-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3074 to YARN-3311: --- Component/s: (was: mrv2) Affects Version/s: (was: 3.0.0) (was: 0.23.0) 3.0.0 Key: YARN-3311 (was: MAPREDUCE-3074) Project: Hadoop YARN (was: Hadoop Map/Reduce) add location to web UI so you know where you are - cluster, node, AM, job history - Key: YARN-3311 URL: https://issues.apache.org/jira/browse/YARN-3311 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Thomas Graves Right now if you go to any of the web UIs for the resource manager, node manager, app master, or job history, they look very similar but sometimes it is hard to tell which page you are on. Adding a title or something that lets you know would be helpful. Or somehow make them more seamless so one doesn't have to know. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353704#comment-14353704 ] Jian He commented on YARN-3243: --- thanks Wangda ! - ParentQueue#canAssignToThisQueue,
{code}
if (totalUsedCapacityRatio >= maxAvailCapacity) {
  canAssign = false;
  break;
}
{code}
instead of comparing ratios, I think it might be simpler to compare resource values. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
        A (usage=54, max=55)
       /                    \
A1 (usage=1, max=55)   A2 (usage=53, max=53)
{code}
Queue A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means *the maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (say the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep within their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
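A rough sketch of what comparing resource values (rather than ratios) could look like in ParentQueue#canAssignToThisQueue, using the existing Resources helpers; the variable names are assumed:
{code}
// Convert the available-capacity ratio into an absolute Resource once,
// then compare Resources directly instead of floating-point ratios.
Resource maxAvailResource = Resources.multiply(clusterResource, maxAvailCapacity);
if (!Resources.lessThan(resourceCalculator, clusterResource,
    queueUsage.getUsed(), maxAvailResource)) {
  canAssign = false;
  break;
}
{code}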
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353353#comment-14353353 ] Wangda Tan commented on YARN-2495: -- For your comments:
1) For the name, do you think setAreNodeLabelsUpdated is a better name, since it avoids "set" occurring twice :) (I understand this needs lots of refactoring; if you have any suggestions, we can finalize the name before renaming.)
5) I made a mistake and sent an incomplete comment :-p. What I wanted to say is: it will be problematic to ask admins to keep NM/RM configuration synchronized, so I don't want (and it is also not necessary for) the NM to depend on the RM's configuration. So I suggest these changes:
- In NodeManager.java: when the user doesn't configure a provider, it should be null. In your patch, you can return null directly, and YARN-2729 will implement the logic of instantiating the provider from config.
- In NodeStatusUpdaterImpl: avoid using {{isDistributedNodeLabelsConf}}; if you agree with the previous point, we will not have a distributed node label conf on the NM side, and it will check whether the provider is null instead.
Regarding your fail-fast concern, it shouldn't be a problem if you agree with the comment I just made. (I know there has been some back-and-forth from my side on this; I feel sorry about that since this feature is evolving, so please feel free to let me know your ideas.)
7) My answer to 5) should address your question.
8) You can add an additional comment at line 626 for this.
9) Took a look at TestNodeStatusUpdater; your comment makes sense to me, it's a very complex class, so you can just leave this comment alone.
10) A few comments on your added code:
- updateNodeLabelsInNodeLabelsManager -> updateNodeLabelsFromNMReport
- {{LOG.info(... accepted from RM}}: use LOG.debug and check {{isDebugEnabled}}.
- Make errorMessage clear: indicate 1# these are node labels reported from the NM, and 2# they failed to be put to the RM, rather than being not properly configured.
In addition: another thing we should do is, when distributed node label configuration is set, any direct modification of node-to-labels mappings from RMAdminCLI should be rejected (like -replaceNodeToLabels). This can be done in a separate JIRA. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
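To sketch the NM-side wiring described in 5): instantiate a provider only when one is configured, otherwise return null so NodeStatusUpdater simply skips label reporting. The config key and provider class below are assumed placeholders; YARN-2729 supplies the real script-based provider:
{code}
protected NodeLabelsProvider createNodeLabelsProvider(Configuration conf)
    throws IOException {
  // Assumed config key; nothing configured => no provider, NM reports no labels.
  String script = conf.get("yarn.nodemanager.node-labels.provider.script.path");
  if (script == null) {
    return null;
  }
  // Fail fast on bad configuration (missing script, no execute permission, ...)
  // so the NM refuses to start rather than silently reporting wrong labels.
  return new ScriptBasedNodeLabelsProvider(conf); // hypothetical class from YARN-2729
}
{code}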
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353553#comment-14353553 ] Wangda Tan commented on YARN-3298: -- Hi [~nroberts], if I understand what you meant correctly, maybe we can just relax the check to user.used < user-limit (instead of user.used + now_required <= user-limit), which can solve the problem you mentioned. User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now: it does not consider the required resource (the resource of the being-allocated resource request), and when a user's used resource equals the user-limit, allocation will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as for the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1): the user-limit mentioned here is determined by the following computation
{code}
current-capacity = queue.used + now-required  (when queue.used > queue.capacity)
                   queue.capacity             (when queue.used <= queue.capacity)
user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
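In pseudocode, the relaxation amounts to the following (a sketch of the idea, not the eventual patch):
{code}
// proposed hard limit: block unless the new request still fits entirely
canAssign = user.used + now_required <= user_limit;

// relaxed check: allow as long as the user is strictly below the limit
// beforehand; usage may overshoot by at most one container, which lets a
// user sitting exactly at the threshold keep making progress
canAssign = user.used < user_limit;
{code}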
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353558#comment-14353558 ] Zhijie Shen commented on YARN-1884: --- [~xgong], thanks for the patch. Here're some comments: 1. No need to change application_history_server.proto, ApplicationHistoryManagerImpl.java, FileSystemApplicationHistoryStore.java, MemoryApplicationHistoryStore.java, ContainerFinishData.java, ContainerHistoryData.java, ContainerStartData.java, ContainerFinishDataPBImpl.java, ContainerStartDataPBImpl.java, ApplicationHistoryStoreTestUtils.java, TestFileSystemApplicationHistoryStore.java, TestMemoryApplicationHistoryStore.java, RMApplicationHistoryWriter.java, TestRMApplicationHistoryWriter.java. It's the deprecated code. 2. Why do we need conf here to compute http or https? Doesn't getNodeHttpAddress() come with the prefix? If so, we need to fix it in the other blocks, CLI and web service too, for consistency. For example, when generating the report, we should already append the http prefix.
{code}
114     container.getNodeHttpAddress() == null ? "#" : WebAppUtils
115         .getHttpSchemePrefix(conf) + container.getNodeHttpAddress(),
{code}
3. Is it possible that getContainer() returns null? If so, it will result in an NPE. Another way is to make getNodeHttpAddress a method of RMContainer. See how we do it for getContainerExitStatus and so on.
{code}
createdTime, container.getContainer().getNodeHttpAddress()));
{code}
ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch In the web UI, we're going to show the node, which used to link to the NM web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node field has to be set to the nodeID where the container is allocated. We need to add nodeHttpAddress to the ContainerReport to link users to the NM web page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3308: --- Attachment: YARN-3308-02.patch 02: * rebased for trunk * took in Arun's comments Improvements to CapacityScheduler documentation --- Key: YARN-3308 URL: https://issues.apache.org/jira/browse/YARN-3308 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Yoram Arnon Priority: Minor Labels: documentation Attachments: MAPREDUCE-3658, MAPREDUCE-3658, YARN-3308-02.patch Original Estimate: 3h Remaining Estimate: 3h There are some typos and some cases of incorrect English. Also, the descriptions of yarn.scheduler.capacity.<queue-path>.capacity, yarn.scheduler.capacity.<queue-path>.maximum-capacity, yarn.scheduler.capacity.<queue-path>.user-limit-factor, and yarn.scheduler.capacity.maximum-applications are not very clear to the uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3316) Make the ResourceManager, NodeManager and HistoryServer run from Eclipse.
[ https://issues.apache.org/jira/browse/YARN-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3316: --- Component/s: resourcemanager nodemanager Make the ResourceManager, NodeManager and HistoryServer run from Eclipse. - Key: YARN-3316 URL: https://issues.apache.org/jira/browse/YARN-3316 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 3.0.0 Reporter: praveen sripati Priority: Minor Make the ResourceManager, NodeManager and HistoryServer run from Eclipse, so that development is easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353574#comment-14353574 ] Zhijie Shen commented on YARN-3287: --- +1 for the last patch. Will commit it. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353560#comment-14353560 ] Hadoop QA commented on YARN-3287: -
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703485/YARN-3287.3.patch against trunk revision 3241fc2.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6896//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6896//console
This message is automatically generated. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3310) MR-279: Log info about the location of dist cache
[ https://issues.apache.org/jira/browse/YARN-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-2758 to YARN-3310: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) Issue Type: Improvement (was: Bug) Key: YARN-3310 (was: MAPREDUCE-2758) Project: Hadoop YARN (was: Hadoop Map/Reduce) MR-279: Log info about the location of dist cache - Key: YARN-3310 URL: https://issues.apache.org/jira/browse/YARN-3310 Project: Hadoop YARN Issue Type: Improvement Reporter: Ramya Sunil Assignee: Siddharth Seth Priority: Minor Currently, there is no log info available about the actual location of the file/archive in the dist cache being used by the task, except for the ln command in task.sh. We need to log this information to help in debugging, especially in those cases where there is more than one archive with the same name. In 0.20.x, in the task logs, one could find log info such as the following: INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: <distcache location>/archive <- <mapred.local.dir>/archive -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3313) Write additional tests for data locality in MRv2.
[ https://issues.apache.org/jira/browse/YARN-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3093 to YARN-3313: --- Component/s: (was: mrv2) (was: test) test Assignee: (was: Mahadev konar) Affects Version/s: (was: 0.23.0) 3.0.0 Key: YARN-3313 (was: MAPREDUCE-3093) Project: Hadoop YARN (was: Hadoop Map/Reduce) Write additional tests for data locality in MRv2. - Key: YARN-3313 URL: https://issues.apache.org/jira/browse/YARN-3313 Project: Hadoop YARN Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Mahadev konar We should add tests to make sure data locality is in place in MRv2 (with respect to the capacity scheduler and also the matching/ask of containers in the MR AM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3312) Web UI menu inconsistencies
[ https://issues.apache.org/jira/browse/YARN-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3075 to YARN-3312: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) 3.0.0 Key: YARN-3312 (was: MAPREDUCE-3075) Project: Hadoop YARN (was: Hadoop Map/Reduce) Web UI menu inconsistencies --- Key: YARN-3312 URL: https://issues.apache.org/jira/browse/YARN-3312 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Thomas Graves When you go to the various web UIs, the menus on the left are inconsistent and (at least to me) sometimes confusing. For instance, if you go to the application master UI, one of the menus is Cluster. If you click on one of the Cluster links it takes you back to the RM UI and you lose the app master UI altogether. Maybe it's just me, but that is confusing. I like having a link back to the cluster from the AM, but the way the UI is set up I would have expected it to just open that page in the middle div/frame and leave the AM menus there. Perhaps a different type of link or menu could indicate this is going to take you away from the AM page. Also, the nodes and job history UIs don't have the Cluster menus at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3309) Capacity scheduler can wait a very long time for node locality
Nathan Roberts created YARN-3309: Summary: Capacity scheduler can wait a very long time for node locality Key: YARN-3309 URL: https://issues.apache.org/jira/browse/YARN-3309 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Nathan Roberts The capacity scheduler will delay scheduling a container on a rack-local node in hopes that a node-local opportunity will come along (YARN-80). It does this by counting the number of missed scheduling opportunities the application has had. When the count reaches a certain threshold, the app will accept the rack-local node. The documented recommendation is to set this threshold to the #nodes in the cluster. However, there are some early-out optimizations that can lead to this delay being a very long time. Example in allocateContainersToNode():
{code}
// Try to schedule more if there are no reservations to fulfill
if (node.getReservedContainer() == null) {
  if (calculator.computeAvailableContainers(node.getAvailableResource(),
      minimumAllocation) > 0) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Trying to schedule on node: " + node.getNodeName() +
          ", available: " + node.getAvailableResource());
    }
    root.assignContainers(clusterResource, node, false);
  }
{code}
So, in a large cluster that is completely full (AvailableResource on each node is 0), SchedulingOpportunities will only increase at the container-completion rate, not the heartbeat rate, which I think was the original assumption of YARN-80. On a large cluster, this can lead to an hour+ of skipped scheduling opportunities, meaning the fifo'ness of a queue is ignored for a very long time. Maybe there should be a time-based limit on this delay as well as a count of missed-scheduling opportunities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
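To illustrate that last suggestion, a minimal sketch of a combined count-plus-time check; the method and parameter names here are hypothetical, not from the actual CapacityScheduler code:
{code}
// Hypothetical sketch: relax locality once EITHER enough scheduling
// opportunities have been missed OR too much wall-clock time has passed,
// so a full cluster (where opportunities only accrue at the
// container-completion rate) cannot stall an app indefinitely.
private boolean canRelaxLocality(long missedOpportunities,
    long nodeLocalityDelay, long requestCreationTimeMs, long maxDelayMs) {
  if (missedOpportunities >= nodeLocalityDelay) {
    return true; // existing count-based behavior
  }
  // time-based escape hatch proposed above
  return System.currentTimeMillis() - requestCreationTimeMs > maxDelayMs;
}
{code}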
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353639#comment-14353639 ] Vinod Kumar Vavilapalli commented on YARN-2495: --- Ah, right. Forgot about that. Given that, it seems that we have the following # Node reports with invalid labels during registration - we reject it right away # Node gets successfully registered, but then the labels script starts generating invalid labels midway through I think in case (2), we are better off ignoring the newly reported invalid labels, reporting this in the UI/NodeReport, and letting the node continue running. Thoughts? Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3314) Write an integration test for validating MR AM restart and recovery
[ https://issues.apache.org/jira/browse/YARN-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3245 to YARN-3314: --- Component/s: (was: mrv2) (was: test) test Affects Version/s: (was: 0.23.0) Key: YARN-3314 (was: MAPREDUCE-3245) Project: Hadoop YARN (was: Hadoop Map/Reduce) Write an integration test for validating MR AM restart and recovery --- Key: YARN-3314 URL: https://issues.apache.org/jira/browse/YARN-3314 Project: Hadoop YARN Issue Type: Test Components: test Reporter: Vinod Kumar Vavilapalli This is so that we can catch bugs like MAPREDUCE-3233. We need one test with recovery disabled (i.e., restart only) and one for restart+recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353677#comment-14353677 ] Nathan Roberts commented on YARN-1963: -- {quote} Without some sort of labels, it will be very hard for users to reason about the definition and relative importance of priorities across queues and clusters. We must support the notion of priority-labels to make this feature usable in practice. {quote} Maybe I'm missing something... Isn't it relatively easy to reason about 2 < 4 and therefore that 2 is lower priority than 4? Unix/Linux hasn't had labels for priorities and it seems to be working pretty well there. Even if I have labels, I have to make sure that all queues and clusters define them precisely the same way or I wind up just as confused, if not even more. Just my $0.02 Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3316) Make the ResourceManager, NodeManager and HistoryServer run from Eclipse.
[ https://issues.apache.org/jira/browse/YARN-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-2798 to YARN-3316: --- Component/s: (was: mrv2) Affects Version/s: (was: 0.23.0) 3.0.0 Key: YARN-3316 (was: MAPREDUCE-2798) Project: Hadoop YARN (was: Hadoop Map/Reduce) Make the ResourceManager, NodeManager and HistoryServer run from Eclipse. - Key: YARN-3316 URL: https://issues.apache.org/jira/browse/YARN-3316 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: praveen sripati Priority: Minor Make the ResourceManager, NodeManager and HistoryServer run from Eclipse, so that development would be easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353397#comment-14353397 ] Craig Welch commented on YARN-2495: --- -re bq. How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration? As I understand the requirements, it's necessary to handle the case where the derived set of labels changes during the lifetime of the nodemanager, e.g. external libraries might be installed or some other condition may change which affects the labels; no nodemanager re-registration is involved, and yet the changed labels need to be reflected. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353383#comment-14353383 ] Vinod Kumar Vavilapalli commented on YARN-2495: --- Quick comments - configuration.type -> configuration-type - Should RegisterNodeManagerRequestProto.nodeLabels be a set instead? - Do we really need NodeHeartbeatRequest.areNodeLabelsSetInReq()? Why not just look at the set as mentioned in the previous comment? - RegisterNodeManagerRequest is getting changed. It will be interesting to reason about rolling-upgrades in this scenario. - How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration? - Should we even accept a node's registration when it reports invalid labels? Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353418#comment-14353418 ] Zhijie Shen commented on YARN-3287: --- I double checked the oozie use case. It seems that for each individual job, the oozie server will create a separate client to start the MR job. The change should be safe then. Thanks for the patch, Jon! It's almost fine to me. Just one nit. 1. In private ClientResponse doPosting(Object obj, String path), the doAs op will throw UndeclaredThrowableException; shall we catch and unwrap it as before?
{code}
} catch (InterruptedException ie) {
  throw new IOException(ie);
}
{code}
TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
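For reference, a minimal sketch of the kind of doAs wrapping and unwrapping being discussed; doPostingObject is a hypothetical inner method, and the surrounding fields (ugi, obj, path) are assumed from context rather than taken from the actual patch:
{code}
// Run the posting under the client's login UserGroupInformation (ugi).
// UGI.doAs rethrows IOException/InterruptedException directly but wraps
// other checked exceptions in UndeclaredThrowableException, so unwrap
// that too before surfacing it to callers as an IOException.
try {
  return ugi.doAs(new PrivilegedExceptionAction<ClientResponse>() {
    @Override
    public ClientResponse run() throws Exception {
      return doPostingObject(obj, path); // hypothetical inner method
    }
  });
} catch (UndeclaredThrowableException ue) {
  throw new IOException(ue.getCause());
} catch (InterruptedException ie) {
  throw new IOException(ie);
}
{code}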
[jira] [Updated] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3287: -- Attachment: YARN-3287.3.patch [~zjshen], trying to unwrap as before. Let me know if this is what you are intending. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3307) Master-Worker Application on YARN
[ https://issues.apache.org/jira/browse/YARN-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3315 to YARN-3307: --- Affects Version/s: (was: 3.0.0) 3.0.0 Key: YARN-3307 (was: MAPREDUCE-3315) Project: Hadoop YARN (was: Hadoop Map/Reduce) Master-Worker Application on YARN - Key: YARN-3307 URL: https://issues.apache.org/jira/browse/YARN-3307 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Sharad Agarwal Assignee: Sharad Agarwal Attachments: MAPREDUCE-3315-1.patch, MAPREDUCE-3315-2.patch, MAPREDUCE-3315-3.patch, MAPREDUCE-3315.patch Currently master-worker scenarios are force-fit into Map-Reduce. Now with YARN, these can be first-class and would benefit real-time/near-real-time workloads and make more effective use of cluster resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3658 to YARN-3308: --- Component/s: (was: mrv2) documentation Assignee: (was: Yoram Arnon) Target Version/s: (was: 2.0.0-alpha, 3.0.0) Affects Version/s: (was: 0.23.0) Key: YARN-3308 (was: MAPREDUCE-3658) Project: Hadoop YARN (was: Hadoop Map/Reduce) Improvements to CapacityScheduler documentation --- Key: YARN-3308 URL: https://issues.apache.org/jira/browse/YARN-3308 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Yoram Arnon Priority: Minor Labels: documentation Attachments: MAPREDUCE-3658, MAPREDUCE-3658 Original Estimate: 3h Remaining Estimate: 3h There are some typos and some cases of incorrect English. Also, the descriptions of yarn.scheduler.capacity.queue-path.capacity, yarn.scheduler.capacity.queue-path.maximum-capacity, yarn.scheduler.capacity.queue-path.user-limit-factor, yarn.scheduler.capacity.maximum-applications are not very clear to the uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3308: --- Release Note: (was: documentation change only) Improvements to CapacityScheduler documentation --- Key: YARN-3308 URL: https://issues.apache.org/jira/browse/YARN-3308 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Yoram Arnon Priority: Minor Labels: documentation Attachments: MAPREDUCE-3658, MAPREDUCE-3658 Original Estimate: 3h Remaining Estimate: 3h There are some typos and some cases of incorrect English. Also, the descriptions of yarn.scheduler.capacity.queue-path.capacity, yarn.scheduler.capacity.queue-path.maximum-capacity, yarn.scheduler.capacity.queue-path.user-limit-factor, yarn.scheduler.capacity.maximum-applications are not very clear to the uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353390#comment-14353390 ] Vinod Kumar Vavilapalli commented on YARN-1963: --- {quote} As per the discussion that happened in YARN-2896 with Eric Payne and Wangda Tan, there is a proposal to use an Integer alone as the priority from the client as well as in the server. As per the design doc, a priority label was used as a wrapper for the user, and internally the server was using the corresponding integer for the same. We can continue the discussion on this here in the parent JIRA. Looping Vinod Kumar Vavilapalli. Current idea: yarn.priority-labels = low:2, medium:4, high:6 Proposed: yarn.application.priority = 2, 3, 4 {quote} Without some sort of labels, it will be very hard for users to reason about the definition and relative importance of priorities across queues and clusters. We must support the notion of priority-labels to make this feature usable in practice. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3308) Improvements to CapacityScheduler documentation
[ https://issues.apache.org/jira/browse/YARN-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3308: --- Affects Version/s: 3.0.0 Improvements to CapacityScheduler documentation --- Key: YARN-3308 URL: https://issues.apache.org/jira/browse/YARN-3308 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Yoram Arnon Priority: Minor Labels: documentation Attachments: MAPREDUCE-3658, MAPREDUCE-3658 Original Estimate: 3h Remaining Estimate: 3h There are some typos and some cases of incorrect English. Also, the descriptions of yarn.scheduler.capacity.queue-path.capacity, yarn.scheduler.capacity.queue-path.maximum-capacity, yarn.scheduler.capacity.queue-path.user-limit-factor, yarn.scheduler.capacity.maximum-applications are not very clear to the uninitiated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353825#comment-14353825 ] Allen Wittenauer commented on YARN-321: --- Looks like this should get closed out w/a fix ver of 2.4.0? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is the number of application types and V is the number of application versions) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. A specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353882#comment-14353882 ] Robert Kanter commented on YARN-2928: - I agree; we're using aggregator for too many things. For TimelineAggregator, IIRC, [~kasha] had suggested TimelineCollector at one point, and that sounded good. TimelineReceiver also sounds fine. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3215: - Description: In existing CapacityScheduler, when computing headroom of an application, it will only consider non-labeled nodes of this application. But it is possible the application is asking for labeled resources, so headroom-by-label (like 5G resource available under node-label=red) is required to get better resource allocation and avoid deadlocks such as MAPREDUCE-5928. This JIRA could involve both API changes (such as adding a label-to-available-resource map in AllocateResponse) and also internal changes in CapacityScheduler. Respect labels in CapacityScheduler when computing headroom --- Key: YARN-3215 URL: https://issues.apache.org/jira/browse/YARN-3215 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan In existing CapacityScheduler, when computing headroom of an application, it will only consider non-labeled nodes of this application. But it is possible the application is asking for labeled resources, so headroom-by-label (like 5G resource available under node-label=red) is required to get better resource allocation and avoid deadlocks such as MAPREDUCE-5928. This JIRA could involve both API changes (such as adding a label-to-available-resource map in AllocateResponse) and also internal changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353891#comment-14353891 ] Wangda Tan commented on YARN-3215: -- Yes, it works for non-labeled environments only. I added some details in the description; please feel free to let me know your ideas. Thanks, Respect labels in CapacityScheduler when computing headroom --- Key: YARN-3215 URL: https://issues.apache.org/jira/browse/YARN-3215 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan In existing CapacityScheduler, when computing headroom of an application, it will only consider non-labeled nodes of this application. But it is possible the application is asking for labeled resources, so headroom-by-label (like 5G resource available under node-label=red) is required to get better resource allocation and avoid deadlocks such as MAPREDUCE-5928. This JIRA could involve both API changes (such as adding a label-to-available-resource map in AllocateResponse) and also internal changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
Craig Welch created YARN-3318: - Summary: Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch reassigned YARN-3318: - Assignee: Craig Welch Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
Craig Welch created YARN-3319: - Summary: Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with the least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353787#comment-14353787 ] Jian He commented on YARN-3273: --- Looks good. To distinguish scenarios like one user belonging to two queues, we probably need to add a separate queue tag too? For the Active Users: field in the CS queue page, it may also be useful to change that to simply user names which link back to the user page filtered by user name. Just for implementation reference, the existing Node Labels page has some similar functionality. Thanks again for taking this on, Rohith! Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353803#comment-14353803 ] Zhijie Shen commented on YARN-3287: --- Merged it into branch-2.7 too. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Fix For: 2.7.0 Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353800#comment-14353800 ] Nathan Roberts commented on YARN-3215: -- Hi [~leftnoteasy]. Can you provide a summary of what this is about? Basic testing seems to show this works at least to some degree, e.g. jobs running on nodes without labels don't appear to include labeled nodes as part of headroom (as expected). Respect labels in CapacityScheduler when computing headroom --- Key: YARN-3215 URL: https://issues.apache.org/jira/browse/YARN-3215 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353864#comment-14353864 ] Xuan Gong commented on YARN-1884: - The new patch addressed all the comments. ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch, YARN-1884.2.patch In the web UI, we're going to show the node, which used to be a link to the NM web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node field has to be set to the nodeID where the container is allocated. We need to add nodeHttpAddress to the ContainerReport to link users to the NM web page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.13.patch Initial, incomplete patch with the overall framework implementation of the SchedulerComparatorPolicy and FifoComparator; the major remaining TODO is integrating with the capacity scheduler configuration. Also includes a CompoundComparator for chaining comparator-based policies where desired. Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353815#comment-14353815 ] Jonathan Eagles commented on YARN-3287: --- Thanks, [~zjshen] TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Fix For: 2.7.0 Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3317) MR-279: Modularize web framework and webapps
[ https://issues.apache.org/jira/browse/YARN-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-2435 to YARN-3317: --- Tags: (was: mrv2, hamlet, module) Component/s: (was: mrv2) Key: YARN-3317 (was: MAPREDUCE-2435) Project: Hadoop YARN (was: Hadoop Map/Reduce) MR-279: Modularize web framework and webapps Key: YARN-3317 URL: https://issues.apache.org/jira/browse/YARN-3317 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Luke Lu The patch moves the web framework out of yarn-common into a separate module: yarn-web. It also decouples webapps into separate modules/jars from their respective server modules/jars to allow webapp updates independent of servers. Servers use ServiceLoader to discover their webapp modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
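As a rough illustration of the ServiceLoader-based discovery described above; the WebAppModule interface and its method are hypothetical names, not taken from the actual patch:
{code}
import java.util.ServiceLoader;

// Hypothetical SPI: each webapp jar ships a
// META-INF/services/<fully.qualified.WebAppModule> file naming its
// implementation class, making webapps swappable without touching servers.
interface WebAppModule {
  void setup(); // register routes/pages with the hosting server
}

// In the server: discover and register every webapp module on the classpath.
for (WebAppModule module : ServiceLoader.load(WebAppModule.class)) {
  module.setup();
}
{code}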
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353853#comment-14353853 ] Sangjin Lee commented on YARN-2928: --- I suppose the ApplicationMaster events refer to the ones that are written by the distributed shell AM. Correct? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353965#comment-14353965 ] Craig Welch commented on YARN-3318: --- The proposed initial implementation of the framework to support FIFO SchedulerApplicationAttempt ordering for the CapacityScheduler: A SchedulerComparatorPolicy which implements OrderingPolicy above. This implementation will take care of the common logic required for cases where the policy can be effectively implemented as a comparator (which is expected to be the case for several potential policies, including FIFO). A SchedulerComparator which is used by the SchedulerComparatorPolicy above. This is an extension of the Java Comparator interface with additional logic required by the SchedulerComparatorPolicy, initially a method to accept SchedulerProcessEvents and indicate whether they require re-ordering of the associated SchedulerProcess. Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
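A minimal sketch of the comparator-based pieces described above; the interface shapes, method names, and the getStartTime accessor are guesses from this description, not code from the attached patch:
{code}
import java.util.Comparator;

// Comparator extended with the event hook described above: it can inspect
// a SchedulerProcessEvent and report whether the associated
// SchedulerProcess needs to be re-ordered in the policy's collection.
interface SchedulerComparator extends Comparator<SchedulerProcess> {
  boolean needsReordering(SchedulerProcess process, SchedulerProcessEvent event);
}

// FIFO: earlier attempts sort first, and order never changes after admission.
class FifoComparator implements SchedulerComparator {
  @Override
  public int compare(SchedulerProcess p1, SchedulerProcess p2) {
    return Long.compare(p1.getStartTime(), p2.getStartTime()); // assumed accessor
  }

  @Override
  public boolean needsReordering(SchedulerProcess p, SchedulerProcessEvent e) {
    return false; // FIFO order is static
  }
}
{code}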
[jira] [Updated] (YARN-1884) ContainerReport should have nodeHttpAddress
[ https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1884: Attachment: YARN-1884.2.patch ContainerReport should have nodeHttpAddress --- Key: YARN-1884 URL: https://issues.apache.org/jira/browse/YARN-1884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1884.1.patch, YARN-1884.2.patch In the web UI, we're going to show the node, which used to be a link to the NM web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node field has to be set to the nodeID where the container is allocated. We need to add nodeHttpAddress to the ContainerReport to link users to the NM web page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353852#comment-14353852 ] Wangda Tan commented on YARN-3298: -- [~nroberts], As you mentioned, it is mostly the same as what we have today, and I think it cannot solve the jitter problem. What I really want to say is: enforce the limit. To solve the problem that a small amount of resource cannot be used in a queue, which you mentioned in https://issues.apache.org/jira/browse/YARN-3298?focusedCommentId=14353053&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14353053, setting the user-limit a little bit higher (like from 50 to 51) should also solve the problem. Sounds like a plan? User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now; it does not consider required-resource (the resource of the resource request being allocated). Also, when a user's used resource equals the user-limit, it will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1), the user-limit mentioned here is determined by the following computation:
{code}
current-capacity = queue.used + now-required (when queue.used < queue.capacity)
                   queue.capacity            (when queue.used >= queue.capacity)
user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
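In other words, a sketch of the enforced check being proposed; variable names follow the scheduler snippet quoted elsewhere in this thread (required being the resource of the request under allocation), but the exact patch may differ:
{code}
// Proposed hard limit: include the resource being requested in the check,
// so an allocation that would push the user past the limit is refused.
Resource usagePlusRequired = Resources.add(
    user.getConsumedResourceByLabel(label), required);
if (Resources.greaterThan(resourceCalculator, clusterResource,
    usagePlusRequired, limit)) {
  return false; // user.usage + required > user-limit: do not allocate
}
{code}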
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353897#comment-14353897 ] Wangda Tan commented on YARN-2495: -- I think the two issues are identical, and we should have a consistent way to handle them. If we stop a node when it reports invalid labels during registration, we should also stop the node when the same issue happens on a heartbeat after registration. I think we can either allow them to keep running or stop them in both cases; I'm fine with either approach. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353895#comment-14353895 ] Vrushali C commented on YARN-2928: -- +1 to renaming TimelineAggregator. TimelineReceiver is good. Some other suggestions are TimelineAccumulator or TimelineCollector. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353990#comment-14353990 ] Jian He commented on YARN-3300: --- lgtm, +1 outstanding_resource_requests table should not be shown in AHS -- Key: YARN-3300 URL: https://issues.apache.org/jira/browse/YARN-3300 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3300.1.patch, YARN-3300.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3298) User-limit should be enforced in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353833#comment-14353833 ] Nathan Roberts commented on YARN-3298: -- [~leftnoteasy], won't that be extremely close to what it is today? If so, then does it really solve the jitter issue you originally cited? Just to make sure I'm in sync with your proposed direction, this is the code you're thinking about modifying, correct?
{code}
// Note: We aren't considering the current request since there is a fixed
// overhead of the AM, but it's a > check, not a >= check, so...
if (Resources
    .greaterThan(resourceCalculator, clusterResource,
        user.getConsumedResourceByLabel(label), limit)) {
{code}
User-limit should be enforced in CapacityScheduler -- Key: YARN-3298 URL: https://issues.apache.org/jira/browse/YARN-3298 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, yarn Reporter: Wangda Tan Assignee: Wangda Tan User-limit is not treated as a hard limit for now; it does not consider required-resource (the resource of the resource request being allocated). Also, when a user's used resource equals the user-limit, it will still continue. This will generate jitter issues when we have YARN-2069 (the preemption policy kills a container under a user, and the scheduler allocates a container under the same user soon after). The expected behavior should be the same as the queue's capacity: only when user.usage + required <= user-limit (1) will the queue continue to allocate containers. (1), the user-limit mentioned here is determined by the following computation:
{code}
current-capacity = queue.used + now-required (when queue.used < queue.capacity)
                   queue.capacity            (when queue.used >= queue.capacity)
user-limit = min(max(current-capacity / #active-users,
                     current-capacity * user-limit / 100),
                 queue-capacity * user-limit-factor)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353872#comment-14353872 ] Sangjin Lee commented on YARN-2928: --- A couple more comments on the plan: - I think the metrics API should be part of phase 2, since that is when we will handle aggregation - It's a small item, but we should make the per-node aggregator a standalone daemon as part of phase 2 Speaking of aggregator, the word aggregation/aggregator is now getting quite overloaded. Originally it meant rolling up metrics to parent entities. Now it's really used in two quite different contexts. For example, the TimelineAggregator classes have little to do with that original meaning. I'm not quite sure what aggregation means in that context, although, I know, I know, I said +1 to the name TimelineAggregator. :) Should we clear up this confusion? IMO, we should stick with the original meaning of aggregation when we talk about aggregation. For TimelineAggregator, perhaps we could rename it to TimelineReceiver or another name? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353953#comment-14353953 ] Craig Welch commented on YARN-3318: --- Proposed elements of the framework: A SchedulerProcess interface which generalizes processes to be managed by the OrderingPolicy (initially; potentially in the future by other Policies as well). Initial implementer will be the SchedulerApplicationAttempt. An OrderingPolicy interface which exposes a collection of scheduler processes which will be ordered by the policy for container assignment and preemption. The ordering policy will provide one Iterator which presents processes in the policy-specific order for container assignment and another Iterator which presents them in the proper order for preemption. It will also accept SchedulerProcessEvents which may indicate a need to re-order the associated SchedulerProcess (for example, after container completion, preemption, assignment, etc.). Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
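A rough sketch of the two interfaces described above; since this comment precedes any posted code, the exact shapes here are assumptions, including the SchedulerProcessEvent enum:
{code}
import java.util.Iterator;

// Generalization of a schedulable entity; SchedulerApplicationAttempt
// would be the initial implementer.
interface SchedulerProcess {
  // accessors the policies need (usage, demand, start time, ...) go here
}

// Hypothetical events that may require re-ordering a process.
enum SchedulerProcessEvent {
  CONTAINER_ALLOCATED, CONTAINER_COMPLETED, CONTAINER_PREEMPTED
}

// Orders SchedulerProcesses for assignment and preemption, and reacts to
// events that may invalidate the current ordering.
interface OrderingPolicy<S extends SchedulerProcess> {
  Iterator<S> getAssignmentIterator(); // policy order for container assignment
  Iterator<S> getPreemptionIterator(); // policy order for preemption
  void handle(S process, SchedulerProcessEvent event);
}
{code}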
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352766#comment-14352766 ] Hadoop QA commented on YARN-3225: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703395/YARN-914.patch against trunk revision 5578e22. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.api.TestPBImplRecords org.apache.hadoop.yarn.server.resourcemanager.webapp.TestNodesPage org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6893//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6893//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6893//console This message is automatically generated. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-914.patch New CLI (or existing CLI with parameters) should put each node on decommission list to decommissioning status and track timeout to terminate the nodes that haven't get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-160: -- Fix Version/s: (was: 2.7.0) nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM; we should be able to obtain those values from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be available as YARN resources); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
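For illustration, a minimal sketch of the Linux side of such an interface, reading total physical memory from /proc/meminfo; a real plugin would also read /proc/cpuinfo and apply the configured offsets:
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Parse the "MemTotal:   16330632 kB" line of /proc/meminfo into bytes.
static long getPhysicalMemoryBytes() throws IOException {
  try (BufferedReader r =
      new BufferedReader(new FileReader("/proc/meminfo"))) {
    String line;
    while ((line = r.readLine()) != null) {
      if (line.startsWith("MemTotal:")) {
        String[] parts = line.trim().split("\\s+");
        return Long.parseLong(parts[1]) * 1024L; // value is reported in kB
      }
    }
  }
  return -1; // could not determine
}
{code}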
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354227#comment-14354227 ] Jian He commented on YARN-3300: --- sounds good. committing outstanding_resource_requests table should not be shown in AHS -- Key: YARN-3300 URL: https://issues.apache.org/jira/browse/YARN-3300 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3300.1.patch, YARN-3300.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-1142: --- Fix Version/s: (was: 2.7.0) MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2890: --- Fix Version/s: (was: 2.7.0) MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
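A sketch of the guard being asked for, using the standard YarnConfiguration keys; where exactly it lives inside the mini cluster, and the wrapper service name, are assumptions rather than the attached patch:
{code}
// Only wire up the timeline service when the passed-in configuration
// actually enables it, instead of starting it unconditionally.
if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
  addService(new ApplicationHistoryServerWrapper()); // assumed wrapper service
}
{code}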
[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-314: -- Fix Version/s: (was: 2.7.0) Schedulers should allow resource requests of different sizes at the same priority and location -- Key: YARN-314 URL: https://issues.apache.org/jira/browse/YARN-314 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Attachments: yarn-314-prelim.patch Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like it's needed for apps currently, and can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2172) Suspend/Resume Hadoop Jobs
[ https://issues.apache.org/jira/browse/YARN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2172: --- Fix Version/s: (was: 2.2.0) Suspend/Resume Hadoop Jobs -- Key: YARN-2172 URL: https://issues.apache.org/jira/browse/YARN-2172 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager, webapp Affects Versions: 2.2.0 Environment: CentOS 6.5, Hadoop 2.2.0 Reporter: Richard Chen Labels: hadoop, jobs, resume, suspend Attachments: Hadoop Job Suspend Resume Design.docx, hadoop_job_suspend_resume.patch Original Estimate: 336h Remaining Estimate: 336h In a multi-application cluster environment, jobs running inside Hadoop YARN may be of lower priority than jobs running outside Hadoop YARN, such as HBase. To give way to other higher-priority jobs inside Hadoop, a user or some cluster-level resource scheduling service should be able to suspend and/or resume particular jobs within Hadoop YARN. When target jobs inside Hadoop are suspended, already allocated and running task containers will continue to run until their completion or active preemption by other means, but no new containers would be allocated to the target jobs. In contrast, when suspended jobs are put into resume mode, they will continue to run from their previous job progress and have new task containers allocated to complete the rest of the job. My team has completed its implementation, and our tests showed it works in a rather solid and convenient way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-965) NodeManager metrics containersRunning is not correct when the localizing container process fails or is killed
[ https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-965: -- Fix Version/s: (was: 2.7.0) NodeManager metrics containersRunning is not correct when the localizing container process fails or is killed -- Key: YARN-965 URL: https://issues.apache.org/jira/browse/YARN-965 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha Environment: suse linux Reporter: Li Yuan When a container is successfully launched, its state moves from LOCALIZED to RUNNING and containersRunning is incremented. When the state moves from EXITED_WITH_FAILURE or KILLING to DONE, containersRunning is decremented. However, EXITED_WITH_FAILURE or KILLING can be reached from LOCALIZING (LOCALIZED), not only from RUNNING, which causes containersRunning to be less than the actual number. Furthermore, the metrics no longer add up: containersLaunched != containersCompleted + containersFailed + containersKilled + containersRunning + containersIniting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
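For illustration only, a rough sketch of the fix direction implied by the description; wasLaunched() and the metrics method names are assumed placeholders, not taken from any attached patch:
{code}
// On the transition to DONE, decrement the metric that was actually
// incremented. A container that failed or was killed during localization
// never reached RUNNING, so containersRunning must not be decremented.
if (container.wasLaunched()) {   // reached LOCALIZED -> RUNNING
  metrics.endRunningContainer(); // containersRunning--
} else {
  metrics.endInitingContainer(); // containersIniting--
}
{code}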
[jira] [Updated] (YARN-2784) YARN project module names in POM need to be consistent across the Hadoop project
[ https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2784: --- Component/s: (was: test) build YARN project module names in POM need to be consistent across the Hadoop project - Key: YARN-2784 URL: https://issues.apache.org/jira/browse/YARN-2784 Project: Hadoop YARN Issue Type: Improvement Components: build Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2784.patch All YARN and MapReduce pom.xml files have project names of the form hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop project build, like 'Apache Hadoop YARN module-name' and 'Apache Hadoop MapReduce module-name'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354279#comment-14354279 ] Rohith commented on YARN-3273: -- Thanks Jian He for your suggestion :-) The overall summary seems to be heading in the right direction. I am assuming that all scheduler changes are only for CS. Are there any common scheduler changes to be done? # Headroom will be displayed on the application attempt page. This will be set to 0 once the attempt is finished. # For each leaf queue in CS, UsedAMResource, UsedUserAMResource, and 'User Limit for User' will be displayed. # In Active Users, a link will be provided for each user which redirects to an additional filtered user page containing userInfo in a table like the sample table above. This is also applicable only for CS. # The full active-users table won't be rendered. Instead, only a link will be provided for each user, i.e. step 3 above. Is my understanding correct? Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom.PNG A job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit - hitting AM-resource-percentage The first, queueCapacity, is already shown on the UI. We may surface things like: - what the user's current usage and user-limit are; - what the AM resource usage and limit are; - what the application's current HeadRoom is; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-745: -- Fix Version/s: (was: 2.7.0) Move UnmanagedAMLauncher to yarn client package --- Key: YARN-745 URL: https://issues.apache.org/jira/browse/YARN-745 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha It's currently sitting in the yarn applications project, which sounds wrong. The client project sounds better, since it contains the utilities/libraries that clients use to write and debug YARN applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2172) Suspend/Resume Hadoop Jobs
[ https://issues.apache.org/jira/browse/YARN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354306#comment-14354306 ] Hadoop QA commented on YARN-2172: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658578/hadoop_job_suspend_resume.patch against trunk revision 47f7f18. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6899//console This message is automatically generated. Suspend/Resume Hadoop Jobs -- Key: YARN-2172 URL: https://issues.apache.org/jira/browse/YARN-2172 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager, webapp Affects Versions: 2.2.0 Environment: CentOS 6.5, Hadoop 2.2.0 Reporter: Richard Chen Labels: hadoop, jobs, resume, suspend Attachments: Hadoop Job Suspend Resume Design.docx, hadoop_job_suspend_resume.patch Original Estimate: 336h Remaining Estimate: 336h In a multi-application cluster environment, jobs running inside Hadoop YARN may be of lower priority than jobs running outside Hadoop YARN, like HBase. To give way to other higher-priority jobs inside Hadoop, a user or some cluster-level resource scheduling service should be able to suspend and/or resume particular jobs within Hadoop YARN. When target jobs inside Hadoop are suspended, the already allocated and running task containers will continue to run until their completion or active preemption by other means, but no new containers would be allocated to the target jobs. In contrast, when suspended jobs are put into resume mode, they will continue to run from their previous progress and have new task containers allocated to complete the rest of the job. My team has completed its implementation, and our tests showed it works in a rather solid and convenient way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3323) Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3323: Summary: Task UI, sort by name doesn't work (was: MR Task UI, sort by name doesn't work) Moving to YARN project. Task UI, sort by name doesn't work -- Key: YARN-3323 URL: https://issues.apache.org/jira/browse/YARN-3323 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.1 Reporter: Thomas Graves Assignee: Brahma Reddy Battula If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the list of tasks, then try to sort by the task name/id, it does nothing. Note that if you go to the task attempts, those seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354362#comment-14354362 ] Naganarasimha G R commented on YARN-2495: - Hi [~wangda], 1) IMO the method name was not readable when it was {{setAreNodeLabelsSet}}, but I have changed it to {{setAreNodeLabelsSetInReq}}, which I feel is sufficient. setAreNodeLabelsUpdated is the same as earlier, for which Craig had commented (which I also feel is valid): {quote} I would go with areNodeLablesSet (all isNodeLabels = areNodeLabels wherever it appears, actually) - wrt Set vs Updated - this is primarily a workaround for the null/empty ambiguity and I think this name better reflects what is really going on (am I sending a value to act on or not), but I also think that this is a better contract, the receiver (rm) shouldn't really care about the logic the nm side is using to decide whether or not to set it's labels (freshness, updatedness, whatever), so all that should be communicated in the api is whether or not the value is set, not whether it's an update/whether it's checking freshness, etc. that's a nit, but I think it's a clearer name. {quote} Yes, true; let's finalize the name this time, after which I will start working on the patch, otherwise it will be wasted effort. 5) {quote} It will be problematic to ask admins make NM/RM configuration keep synchronized, so I don't want (and also not necessary) NM depends on RM's configuration. So I suggest to make a changes: In NodeManager.java: when user doesn't configure provider, it should be null. In your patch, you can return a null directly, and YARN-2729 will implement the logic of instancing provider from config. In NodeStatusUpdaterImpl: avoid using isDistributedNodeLabelsConf, since we will not have distributedNodeLabelConf in NM side if you agree on previously comment, instead, it will check null of provider. {quote} Well, the modification side is clear to me, but is it good to allow the configurations to differ between the NM and the RM? In fact, I wanted to discuss whether to send a shutdown during register if the NM is configured differently from the RM, but I waited for the base changes to go in before discussing new stuff. 8) ??You can add an additional comments in line 626 for this.?? OK, I will add a comment in LabelProvider.getLabels; the idea is that LabelProvider is expected to give the same labels continuously until there is a change, and if null or empty is returned then no label is assumed. 10) {{updateNodeLabelsInNodeLabelsManager - updateNodeLabelsFromNMReport}}: will take care of this in the next patch. {{LOG.info(... accepted from RM, use LOG.debug and check isDebugEnabled.}}: I feel it is better to log this as an error, since we send the labels only when they change, there has to be some way to identify the labels for a given NM, and currently we also send out a shutdown signal. ??Make errorMessage clear: indicate 1# this is node labels reported from NM, and 2# it's failed to be put to RM instead of not properly configured.?? I think I have captured the first point, but anyway I will reframe it as {{Node labels reported from the NM with id nodeID were rejected by the RM with exception message exceptionMsg.}} ??Another thing we should do is, when distributed node label configuration is set, any direct modify node to labels mapping from RMAdminCLI should be rejected (like -replaceNodeToLabels).?? Will work on this once 2495 and 2729 are done. Thanks [~vinodkv] [~cwelch] for reviewing it. ??configuration.type - configuration-type??
Will take care of this in the next patch. {quote} Should RegisterNodeManagerRequestProto.nodeLabels be a set instead? Do we really need NodeHeartbeatRequest.areNodeLabelsSetInReq()? Why not just look at the set as mentioned in the previous comment? {quote} Well, as Craig informed, RegisterNodeManagerRequestProto.nodeLabels is already a set, but since protoc provides an empty set by default, something is required to indicate whether labels were set as part of the request; hence areNodeLabelsSetInReq is needed. ??RegisterNodeManagerRequest is getting changed. It will be interesting to reason about rolling-upgrades in this scenario.?? Well, though I am not very familiar with rolling upgrades, I don't see any problem in the normal case, because the RM tries to read the labels from the NM's request only under distributed configuration, and {{areNodeLabelsSetInReq}} is false by default. But I do have questions about an existing setup that is to be converted to the distributed configuration setup: # Do we need to send a shutdown during register if the NM is configured differently from the RM? # Will the new configurations be added to the NM and RM and then the rolling upgrade done, or do we do the rolling upgrade first and then reconfigure and restart the RMs and NMs? ??How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration?? Well, I have
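To make point 5 above concrete, a rough sketch of the provider null-check being discussed; nodeLabelsProvider and the request setters follow the names used in the comment, and this is not the committed patch:
{code}
// NodeStatusUpdaterImpl side: key distributed-label behavior off a null
// provider instead of re-reading RM-side configuration.
if (nodeLabelsProvider != null) {
  Set<String> labels = nodeLabelsProvider.getNodeLabels();
  request.setNodeLabels(labels);
  request.setAreNodeLabelsSetInReq(true); // distinguish "unset" from "empty"
}
{code}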
[jira] [Updated] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3305: - Attachment: 0001-YARN-3305.patch AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This happens because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, whereas during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
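A rough sketch of the fix direction implied by the description; queueUsage.incAMUsed() and the surrounding variable names are illustrative assumptions, not the attached patch, though Resources.normalize() is the standard utility:
{code}
// Account AM-used with the normalized resource so activation-time
// accounting matches what CS#allocate will actually hand out.
Resource amAsk = application.getAMResource();
Resource normalized = Resources.normalize(resourceCalculator, amAsk,
    minimumAllocation, maximumAllocation, minimumAllocation);
queueUsage.incAMUsed(normalized); // previously incremented with the raw amAsk
{code}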
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354236#comment-14354236 ] Hudson commented on YARN-3300: -- FAILURE: Integrated in Hadoop-trunk-Commit #7293 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7293/]) YARN-3300. Outstanding_resource_requests table should not be shown in AHS. Contributed by Xuan Gong (jianhe: rev c3003eba6f9802f15699564a5eb7c6e34424cb14) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/CHANGES.txt outstanding_resource_requests table should not be shown in AHS -- Key: YARN-3300 URL: https://issues.apache.org/jira/browse/YARN-3300 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-3300.1.patch, YARN-3300.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2784) YARN project module names in POM need to be consistent across the Hadoop project
[ https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2784: --- Fix Version/s: (was: 2.7.0) YARN project module names in POM need to be consistent across the Hadoop project - Key: YARN-2784 URL: https://issues.apache.org/jira/browse/YARN-2784 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2784.patch All YARN and MapReduce pom.xml files have project names of the form hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop project build, like 'Apache Hadoop YARN module-name' and 'Apache Hadoop MapReduce module-name'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1147) Add end-to-end tests for HA
[ https://issues.apache.org/jira/browse/YARN-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-1147: --- Fix Version/s: (was: 2.7.0) Add end-to-end tests for HA --- Key: YARN-1147 URL: https://issues.apache.org/jira/browse/YARN-1147 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Xuan Gong While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA including some stress testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-153) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-153: -- Fix Version/s: (was: 2.7.0) PaaS on YARN: a YARN application to demonstrate that YARN can be used as a PaaS Key: YARN-153 URL: https://issues.apache.org/jira/browse/YARN-153 Project: Hadoop YARN Issue Type: New Feature Reporter: Jacob Jaigak Song Assignee: Jacob Jaigak Song Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Time Spent: 336h Remaining Estimate: 0h This application demonstrates that YARN can be used for non-MapReduce applications. As Hadoop has already been adopted and deployed widely, and its deployment will only increase in the future, we thought it has good potential to be used as a PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections
[ https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-113: -- Fix Version/s: (was: 2.7.0) WebAppProxyServlet must use SSLFactory for the HttpClient connections - Key: YARN-113 URL: https://issues.apache.org/jira/browse/YARN-113 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur The HttpClient must be configured to use the SSLFactory when the web UIs are over HTTPS, otherwise the proxy servlet fails to connect to the AM because of unknown (self-signed) certificates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
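A hedged sketch of the idea using Hadoop's real SSLFactory API; the proxy servlet of this era actually used Commons HttpClient, so the HttpsURLConnection here is a simplification for illustration:
{code}
// Build the client-side SSLFactory (backed by ssl-client.xml) and hand its
// socket factory and hostname verifier to the HTTPS connection, so
// self-signed AM certificates are trusted.
SSLFactory sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
sslFactory.init();
HttpsURLConnection conn =
    (HttpsURLConnection) new URL(trackingUri).openConnection();
conn.setSSLSocketFactory(sslFactory.createSSLSocketFactory());
conn.setHostnameVerifier(sslFactory.getHostnameVerifier());
{code}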
[jira] [Moved] (YARN-3323) MR Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA moved MAPREDUCE-6102 to YARN-3323: Component/s: (was: webapps) webapp Target Version/s: (was: 2.6.0) Affects Version/s: (was: 2.5.1) 2.5.1 Key: YARN-3323 (was: MAPREDUCE-6102) Project: Hadoop YARN (was: Hadoop Map/Reduce) MR Task UI, sort by name doesn't work - Key: YARN-3323 URL: https://issues.apache.org/jira/browse/YARN-3323 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.1 Reporter: Thomas Graves Assignee: Brahma Reddy Battula If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the list of tasks, then try to sort by the task name/id, it does nothing. Note that if you go to the task attempts, those seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354006#comment-14354006 ] Craig Welch commented on YARN-3319: --- Initially this will be implemented for SchedulerApplicationAttempts in the CapacityScheduler LeafQueue (similar to the FIFO implementation in [YARN-3318]). The expectation is that this will implement the SchedulerComparator interface and will be used as a comparator within the SchedulerComparatorPolicy implementation to achieve the intended behavior. Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with the least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
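An illustrative sketch of the fair comparator described above; SchedulerProcess, getCurrentUsage(), and getId() are design names from this JIRA family rather than shipped API, and memory-only comparison is a simplification:
{code}
// Order SchedulerProcesses by least current usage, with a stable id-based
// tie-break so the ordering is deterministic.
Comparator<SchedulerProcess> fairComparator = new Comparator<SchedulerProcess>() {
  @Override
  public int compare(SchedulerProcess p1, SchedulerProcess p2) {
    int byUsage = Integer.compare(p1.getCurrentUsage().getMemory(),
                                  p2.getCurrentUsage().getMemory());
    return byUsage != 0 ? byUsage : p1.getId().compareTo(p2.getId());
  }
};
{code}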
[jira] [Moved] (YARN-3321) Health-Report column of NodePage should display more information.
[ https://issues.apache.org/jira/browse/YARN-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved MAPREDUCE-3091 to YARN-3321: --- Component/s: (was: nodemanager) (was: resourcemanager) resourcemanager nodemanager Assignee: (was: Subroto Sanyal) Affects Version/s: (was: 0.23.0) Key: YARN-3321 (was: MAPREDUCE-3091) Project: Hadoop YARN (was: Hadoop Map/Reduce) Health-Report column of NodePage should display more information. --- Key: YARN-3321 URL: https://issues.apache.org/jira/browse/YARN-3321 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Reporter: Subroto Sanyal Labels: javascript The Health-Checker script on the nodes can run and generate some output, an error, and an exit code. This information is not available in the GUI. It is possible that the Health-Checker script generates some statistics about the node; these could be displayed to the GUI user as well. I suggest we display the information in a pop-up balloon (using CSS/JavaScript). Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354003#comment-14354003 ] Karthik Kambatla commented on YARN-2928: +1 to renaming. I prefer TimelineCollector and TimelineReceiver, in that order. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx We have the application timeline server implemented in YARN per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS
[ https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354005#comment-14354005 ] Jian He commented on YARN-3300: --- Actually, after looking at the UI: on the app page there's a big blank space above the resource requests table, and similarly for the attempt page. Could you fix that too? outstanding_resource_requests table should not be shown in AHS -- Key: YARN-3300 URL: https://issues.apache.org/jira/browse/YARN-3300 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3300.1.patch, YARN-3300.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)