[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-aggregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366950#comment-14366950 ] Junping Du commented on YARN-3039: -- Thanks [~zjshen] and [~sjlee0] for review! [Aggregator wireup] Implement ATS app-aggregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Fix For: YARN-2928 Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch, YARN-3039-v6.patch, YARN-3039-v7.patch, YARN-3039-v8.patch, YARN-3039.9.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366917#comment-14366917 ] Varun Saxena commented on YARN-3047: Thanks a lot [~zjshen] for the review. bq. 1. No need to change timeline/TimelineEvents.java. Ok. bq. 2. In YarnConfiguration, how about we still reuse the existing timeline service config. I propose config reuse because there doesn't exist the use case that we start the old timeline server and the new timeline reader server together. And the change in WebAppUtils should not be necessary either. The same config has been used by the aggregator as well; that's why I kept a new config. I guess it is possible that the reader runs on the same node as the aggregator. bq. 3. NameValuePair is for internal usage only. Let's keep it in the timeline service module? It's in the timeline service package itself, i.e. {{hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/NameValuePair.java}}. Did you mean something else? bq. Rename TimelineReaderStore to TimelineReader. Ok. bq. I think we don't need to have NullTimelineReader. Instead, we should have a POC implementation based on local FS like FileSystemTimelineWriterImpl. But we can defer this work in a separate jira if the implementation is not straightforward. Yes, NullTimelineReader was just there to compile the code, as TimelineReader would be an interface. I plan to have an FS-based implementation as part of YARN-3051 and will update a patch for it once this goes in. Probably the store-related code can be removed from this JIRA and handled completely as part of YARN-3051 to keep the review focused. Thoughts? bq. 5. TimelineReaderServer - TimelineWebServer? For startTimelineReaderWebApp, can we do something similar to TimelineAggregatorsCollection#startWebApp. The intention for TimelineReaderServer was not to have it merely act as a REST endpoint; hence not the name TimelineWebServer. TimelineReaderServer would use RPC as well, for instance to serve requests coming from the YARN CLI. Commands such as yarn application used to contact the AHS if an app was not found in the RM; this should now be handled by the Timeline Reader. For this, I plan to raise another JIRA once this one goes in. bq. 6. Add the command in yarn and yarn.cmd to start the server. As per discussion with Sangjin, this will be done as part of YARN-3048. I will probably update a document regarding TimelineReader as soon as possible. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.02.patch Per design in YARN-2928, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
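[Editor's note] For readers following the lifecycle discussion above, here is a minimal sketch of what a reader server built on the standard Hadoop service framework could look like. Only CompositeService and its lifecycle hooks are the real API; the class body and names are hypothetical, not the YARN-3047 patch. {code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;

// Hypothetical sketch, not the YARN-3047 patch: a reader server with the
// standard Hadoop service lifecycle (init -> start -> stop).
public class TimelineReaderServerSketch extends CompositeService {

  public TimelineReaderServerSketch() {
    super(TimelineReaderServerSketch.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Wire up the backing TimelineReader implementation here (e.g. the
    // FS-based reader planned in YARN-3051) and add it as a child service.
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    super.serviceStart();
    // Start the web app here, analogous to
    // TimelineAggregatorsCollection#startWebApp; an RPC server for the
    // YARN CLI use case mentioned above would also be started here.
  }

  @Override
  protected void serviceStop() throws Exception {
    // Stop the web app / RPC server before stopping child services.
    super.serviceStop();
  }
}
{code}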
[jira] [Commented] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366993#comment-14366993 ] Hudson commented on YARN-3181: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/136/]) Revert YARN-3181. FairScheduler: Fix up outdated findbugs issues. (kasha) (kasha: rev 32b43304563c2430c00bc3e142a962d2bc5f4d58) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java FairScheduler: Fix up outdated findbugs issues -- Key: YARN-3181 URL: https://issues.apache.org/jira/browse/YARN-3181 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Brahma Reddy Battula Attachments: yarn-3181-1.patch In FairScheduler, we have excluded some findbugs-reported errors. Some of them aren't applicable anymore, and there are a few that can be easily fixed without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366995#comment-14366995 ] Hudson commented on YARN-3243: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/136/]) YARN-3243. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. Contributed by Wangda Tan. (jianhe: rev 487374b7fe0c92fc7eb1406c568952722b5d5b15) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max.
If the leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code}
            A
    (usage=54, max=55)
       /          \
      A1           A2
(usage=1, max=55)  (usage=53, max=53)
{code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} class in each queue. *Here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource
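[Editor's note] To make the proposal concrete, the following is a minimal, self-contained sketch of the headroom rule described above, using a single memory dimension. The class and fields are illustrative, not the actual YARN-3243 patch (which works on Resource objects). {code}
// Illustrative sketch of the proposed rule, not the committed patch.
final class QueueSketch {
  long used;      // resources currently allocated under this queue
  long max;       // configured maximum capacity
  long headroom;  // limit handed down by the parent queue

  // Parent qA sets each child's headroom to min(qA.headroom, qA.max - qA.used).
  // Since qA.headroom was in turn set by qA's own parent, every ancestor's
  // limit is enforced transitively.
  void pushHeadroomToChildren(Iterable<QueueSketch> children) {
    long childLimit = Math.min(headroom, max - used);
    for (QueueSketch child : children) {
      child.headroom = childLimit;
    }
  }

  // A child may only allocate what fits within its headroom. In the A1/A2
  // example above, A2's headroom would be min(A.headroom, 55 - 54) = 1,
  // so A2 cannot push A past A.max even though A2.usage < A2.max.
  boolean canAllocate(long containerSize) {
    return containerSize <= Math.min(headroom, max - used);
  }
}
{code}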
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366996#comment-14366996 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/136/]) YARN-3273. Improve scheduler UI to facilitate scheduling analysis and debugging. Contributed by Rohith Sharmaks (jianhe: rev 658097d6da1b1aac8e01db459f0c3b456e99652f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/UserInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366998#comment-14366998 ] Hudson commented on YARN-3197: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/136/]) YARN-3197. Confusing log generated by CapacityScheduler. Contributed by (devaraj: rev 7179f94f9d000fc52bd9ce5aa9741aba97ec3ee8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Fix For: 2.8.0 Attachments: YARN-3197.001.patch, YARN-3197.002.patch, YARN-3197.003.patch, YARN-3197.004.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
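[Editor's note] The general shape of the cleanup under discussion might look like the following; this is only an illustration of the pattern (demote the repeated message and give it context), not the committed YARN-3197 patch, and the logging framework and names are assumptions. {code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only, not the committed patch: replace the repeated,
// context-free "Null container completed..." INFO line with a DEBUG-level
// message that carries the container id.
final class CompletedContainerLogSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(CompletedContainerLogSketch.class);

  // Returns false when there is no RMContainer to process; the caller
  // simply skips the event instead of flooding the log at INFO.
  static boolean shouldProcess(Object rmContainer, String containerId) {
    if (rmContainer == null) {
      LOG.debug("Skipping completed-container event for {}: no corresponding "
          + "RMContainer (it was likely completed or removed earlier).",
          containerId);
      return false;
    }
    return true;
  }
}
{code}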
[jira] [Commented] (YARN-3205) FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration.
[ https://issues.apache.org/jira/browse/YARN-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366997#comment-14366997 ] Hudson commented on YARN-3205: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/136/]) YARN-3205. FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration. Contributed by Zhihai Xu. (ozawa: rev 3bc72cc16d8c7b8addd8f565523001dfcc32b891) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration. --- Key: YARN-3205 URL: https://issues.apache.org/jira/browse/YARN-3205 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3205.000.patch, YARN-3205.001.patch FileSystemRMStateStore should disable the FileSystem Cache to avoid getting a FileSystem with an old configuration. The old configuration may not have all the customized DFS_CLIENT configurations for FileSystemRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
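[Editor's note] The mechanics behind the fix can be sketched as follows. The helper class is hypothetical, but both cache-bypass mechanisms shown (the per-scheme disable.cache flag and FileSystem.newInstance) are standard Hadoop FileSystem API. {code}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Hedged sketch of the idea, not the committed YARN-3205 patch.
final class StateStoreFsSketch {
  static FileSystem openStoreFs(URI storeUri, Configuration conf)
      throws IOException {
    // Disable the JVM-wide FileSystem cache for this scheme so that
    // FileSystem.get() builds a fresh instance from *this* Configuration,
    // picking up the store's customized DFS client settings.
    conf.setBoolean(
        "fs." + storeUri.getScheme() + ".impl.disable.cache", true);
    return FileSystem.get(storeUri, conf);
    // Equivalent intent: FileSystem.newInstance(storeUri, conf) always
    // bypasses the cache, regardless of the disable.cache flag.
  }
}
{code}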
[jira] [Updated] (YARN-3357) Move TestFifoScheduler to FIFO package
[ https://issues.apache.org/jira/browse/YARN-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3357: - Attachment: 0001-YARN-3357.patch Move TestFifoScheduler to FIFO package -- Key: YARN-3357 URL: https://issues.apache.org/jira/browse/YARN-3357 Project: Hadoop YARN Issue Type: Task Components: scheduler Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3357.patch There are 2 test classes found for the fifo scheduler, i.e. # org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler # org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler. Some test cases are common to both and verify the same functionality, e.g. testBlackListNodes. Tests from package org.apache.hadoop.yarn.server.resourcemanager can be merged into package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo, eliminating the duplicate tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367031#comment-14367031 ] Junping Du commented on YARN-3034: -- bq. As a further clarification, my problem is mainly on the test distributed shell. Right now we're using very ad hoc ways to set which version of timeline service we're using. Currently we're using test names to distinguish timeline V1 and V2, and since both versions work on the same port, we need to explicitly disable one version to use the other. Instead of doing this in the test script each time, I'd hope that there are some global settings/logic on the server side to decide which exact version of timeline service to launch. All the tests need to do is to check (and set) the version of active timeline service and launch the mini YARN cluster. It's a little bit off topic here, so let's move the rest of the discussion to YARN-3352. Thanks [~gtCarrera9] for clarifying more on this. Agree that we should have a cleaner way to launch the v1 and v2 services in unit tests. Maybe launch both on different ports? Anyway, let's continue the discussion on YARN-3352. Back to the latest patch, it mostly looks fine to me. Two minor comments: {code}
+ public static final String TIMELINE_SERVICE_VERSION = YARN_PREFIX +
+     "timeline-service.version";
{code} Can we replace this with TIMELINE_SERVICE_PREFIX + "version"? {code}
+     YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED) &&
+     conf.getBoolean(
+         YarnConfiguration.SYSTEM_METRICS_PUBLISHER_ENABLED,
+         YarnConfiguration.DEFAULT_SYSTEM_METRICS_PUBLISHER_ENABLED) &&
+     YarnConfiguration.TIMELINE_SERVICE_VERSION_ONE.equals(conf.get(
+         YarnConfiguration.TIMELINE_SERVICE_VERSION,
+         YarnConfiguration.DEFAULT_TIMELINE_SERVICE_VERSION));
{code} equals => equalsIgnoreCase, as the user may input v1 or v2 (in lower case), which should also be accepted. Also, we should add a warning message log if the user puts something illegal here, or it just gets ignored silently without any warning. BTW, [~sjlee0] has a refactor patch on YARN- which should get in quickly. This patch may need a rebase when that one is in. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
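[Editor's note] A sketch combining the two review suggestions (case-insensitive matching plus a warning on illegal values) might look like this. The helper and its wiring are hypothetical; only the YarnConfiguration constant names come from the patch excerpt above. {code}
// Hypothetical helper, not the YARN-3034 patch: accept "V1"/"v1"
// case-insensitively and warn when the configured value is neither of the
// known versions instead of silently ignoring it.
final class TimelineVersionSketch {
  static boolean isConfiguredVersion(String configured, String wanted,
                                     String... knownVersions) {
    if (wanted.equalsIgnoreCase(configured)) {
      return true;
    }
    boolean known = false;
    for (String v : knownVersions) {
      known |= v.equalsIgnoreCase(configured);
    }
    if (!known) {
      System.err.println("WARN: unrecognized timeline service version '"
          + configured + "'; ignoring it (assumed fallback behavior).");
    }
    return false;
  }

  public static void main(String[] args) {
    // e.g. configured = conf.get(TIMELINE_SERVICE_VERSION,
    //                            DEFAULT_TIMELINE_SERVICE_VERSION);
    System.out.println(isConfiguredVersion("v1", "V1", "V1", "V2")); // true
    System.out.println(isConfiguredVersion("v3", "V1", "V1", "V2")); // warns
  }
}
{code}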
[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3181: --- Attachment: YARN-3181-002.patch FairScheduler: Fix up outdated findbugs issues -- Key: YARN-3181 URL: https://issues.apache.org/jira/browse/YARN-3181 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Brahma Reddy Battula Attachments: YARN-3181-002.patch, yarn-3181-1.patch In FairScheduler, we have excluded some findbugs-reported errors. Some of them aren't applicable anymore, and there are a few that can be easily fixed without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367023#comment-14367023 ] Varun Saxena commented on YARN-3047: Did you mean NameValuePair can have package-level access instead of public? [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.02.patch Per design in YARN-2928, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3357) Move TestFifoScheduler to FIFO package
[ https://issues.apache.org/jira/browse/YARN-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367108#comment-14367108 ] Hadoop QA commented on YARN-3357: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705333/0001-YARN-3357.patch against trunk revision 3411732. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7011//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7011//console This message is automatically generated. Move TestFifoScheduler to FIFO package -- Key: YARN-3357 URL: https://issues.apache.org/jira/browse/YARN-3357 Project: Hadoop YARN Issue Type: Task Components: scheduler Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3357.patch There are 2 test classes found for the fifo scheduler, i.e. # org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler # org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler. Some test cases are common to both and verify the same functionality, e.g. testBlackListNodes. Tests from package org.apache.hadoop.yarn.server.resourcemanager can be merged into package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo, eliminating the duplicate tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367007#comment-14367007 ] Hudson commented on YARN-3243: -- FAILURE: Integrated in Hadoop-Yarn-trunk #870 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/870/]) YARN-3243. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. Contributed by Wangda Tan. (jianhe: rev 487374b7fe0c92fc7eb1406c568952722b5d5b15) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max.
If the leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code}
            A
    (usage=54, max=55)
       /          \
      A1           A2
(usage=1, max=55)  (usage=53, max=53)
{code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} class in each queue. *Here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource need to be
[jira] [Commented] (YARN-3205) FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration.
[ https://issues.apache.org/jira/browse/YARN-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367009#comment-14367009 ] Hudson commented on YARN-3205: -- FAILURE: Integrated in Hadoop-Yarn-trunk #870 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/870/]) YARN-3205. FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration. Contributed by Zhihai Xu. (ozawa: rev 3bc72cc16d8c7b8addd8f565523001dfcc32b891) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/CHANGES.txt FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration. --- Key: YARN-3205 URL: https://issues.apache.org/jira/browse/YARN-3205 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3205.000.patch, YARN-3205.001.patch FileSystemRMStateStore should disable the FileSystem Cache to avoid getting a FileSystem with an old configuration. The old configuration may not have all the customized DFS_CLIENT configurations for FileSystemRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3357) Move TestFifoScheduler to FIFO package
[ https://issues.apache.org/jira/browse/YARN-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367030#comment-14367030 ] Rohith commented on YARN-3357: -- Attaching the patch with the following changes # Moved all the tests from org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler # Removed the duplicated test that was verifying the same functionality in both classes, i.e. {{testBlackListNodes}} # The other 2 test classes were using TestFifoScheduler as the class name for logging; I corrected them to use their own class names. Kindly review the patch Move TestFifoScheduler to FIFO package -- Key: YARN-3357 URL: https://issues.apache.org/jira/browse/YARN-3357 Project: Hadoop YARN Issue Type: Task Components: scheduler Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3357.patch There are 2 test classes found for the fifo scheduler, i.e. # org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler # org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler. Some test cases are common to both and verify the same functionality, e.g. testBlackListNodes. Tests from package org.apache.hadoop.yarn.server.resourcemanager can be merged into package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo, eliminating the duplicate tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367077#comment-14367077 ] Hudson commented on YARN-3243: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2068 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2068/]) YARN-3243. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. Contributed by Wangda Tan. (jianhe: rev 487374b7fe0c92fc7eb1406c568952722b5d5b15) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max.
If the leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code}
            A
    (usage=54, max=55)
       /          \
      A1           A2
(usage=1, max=55)  (usage=53, max=53)
{code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} class in each queue. *Here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource need to be
[jira] [Commented] (YARN-3205) FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration.
[ https://issues.apache.org/jira/browse/YARN-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367079#comment-14367079 ] Hudson commented on YARN-3205: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2068 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2068/]) YARN-3205. FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration. Contributed by Zhihai Xu. (ozawa: rev 3bc72cc16d8c7b8addd8f565523001dfcc32b891) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration. --- Key: YARN-3205 URL: https://issues.apache.org/jira/browse/YARN-3205 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3205.000.patch, YARN-3205.001.patch FileSystemRMStateStore should disable the FileSystem Cache to avoid getting a FileSystem with an old configuration. The old configuration may not have all the customized DFS_CLIENT configurations for FileSystemRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367076#comment-14367076 ] Hudson commented on YARN-3305: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2068 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2068/]) YARN-3305. Normalize AM resource request on app submission. Contributed by Rohith Sharmaks (jianhe: rev 968425e9f7b850ff9c2ab8ca37a64c3fdbe77dbf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch, 0003-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container, the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
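[Editor's note] The normalization the description refers to amounts to rounding a request up to a multiple of the minimum allocation. Below is a self-contained, single-dimension sketch of that arithmetic; the real code works on Resource objects via the scheduler's ResourceCalculator, so this is an illustration, not the YARN-3305 patch. {code}
// Illustrative arithmetic only, not the YARN-3305 patch.
final class NormalizeSketch {
  // Round requested memory up to a multiple of minimumAllocation and clamp
  // to maximumAllocation, mirroring what CS#allocate does at container
  // allocation time. Applying the same step at app submission keeps the
  // AM-used accounting consistent with the eventual container size.
  static long normalizeMemory(long requested, long minAlloc, long maxAlloc) {
    long steps = Math.max(1, (requested + minAlloc - 1) / minAlloc);
    return Math.min(steps * minAlloc, maxAlloc);
  }

  public static void main(String[] args) {
    // A 100 MB AM request with a 1024 MB minimum is accounted as 1024 MB,
    // so the max-AM-resource-percent limit sees the real container size.
    System.out.println(normalizeMemory(100, 1024, 8192));  // 1024
    System.out.println(normalizeMemory(1500, 1024, 8192)); // 2048
  }
}
{code}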
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367078#comment-14367078 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2068 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2068/]) YARN-3273. Improve scheduler UI to facilitate scheduling analysis and debugging. Contributed by Rohith Sharmaks (jianhe: rev 658097d6da1b1aac8e01db459f0c3b456e99652f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/UserInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367010#comment-14367010 ] Hudson commented on YARN-3197: -- FAILURE: Integrated in Hadoop-Yarn-trunk #870 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/870/]) YARN-3197. Confusing log generated by CapacityScheduler. Contributed by (devaraj: rev 7179f94f9d000fc52bd9ce5aa9741aba97ec3ee8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Fix For: 2.8.0 Attachments: YARN-3197.001.patch, YARN-3197.002.patch, YARN-3197.003.patch, YARN-3197.004.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367008#comment-14367008 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Yarn-trunk #870 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/870/]) YARN-3273. Improve scheduler UI to facilitate scheduling analysis and debugging. Contributed by Rohith Sharmaks (jianhe: rev 658097d6da1b1aac8e01db459f0c3b456e99652f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/UserInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367006#comment-14367006 ] Hudson commented on YARN-3305: -- FAILURE: Integrated in Hadoop-Yarn-trunk #870 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/870/]) YARN-3305. Normalize AM resource request on app submission. Contributed by Rohith Sharmaks (jianhe: rev 968425e9f7b850ff9c2ab8ca37a64c3fdbe77dbf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch, 0003-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container, the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367005#comment-14367005 ] Hudson commented on YARN-3181: -- FAILURE: Integrated in Hadoop-Yarn-trunk #870 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/870/]) Revert YARN-3181. FairScheduler: Fix up outdated findbugs issues. (kasha) (kasha: rev 32b43304563c2430c00bc3e142a962d2bc5f4d58) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml FairScheduler: Fix up outdated findbugs issues -- Key: YARN-3181 URL: https://issues.apache.org/jira/browse/YARN-3181 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Brahma Reddy Battula Attachments: yarn-3181-1.patch In FairScheduler, we have excluded some findbugs-reported errors. Some of them aren't applicable anymore, and there are a few that can be easily fixed without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3364) Clarify Naming of yarn.client.nodemanager-connect.max-wait-ms and yarn.resourcemanager.connect.max-wait.ms
[ https://issues.apache.org/jira/browse/YARN-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367167#comment-14367167 ] Andrew Johnson commented on YARN-3364: -- No, I did not have YARN-3238 applied. Thanks for that! Given that and HADOOP-11398 I think this can be closed. Clarify Naming of yarn.client.nodemanager-connect.max-wait-ms and yarn.resourcemanager.connect.max-wait.ms --- Key: YARN-3364 URL: https://issues.apache.org/jira/browse/YARN-3364 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Andrew Johnson I encountered an issue recently where the ApplicationMaster for MapReduce jobs would spend hours attempting to connect to a node in my cluster that had died due to a hardware fault. After debugging this, I found that the yarn.client.nodemanager-connect.max-wait-ms property did not behave as I had expected. Based on the name I had thought this would set a maximum time limit for attempting to connect to a NodeManager. The code in org.apache.hadoop.yarn.client.NMProxy corroborated this thought - it used a RetryUpToMaximumTimeWithFixedSleep policy when a ConnectTimeoutException was thrown, as it was in my case with a dead node. However, the RetryUpToMaximumTimeWithFixedSleep policy doesn't actually set a time limit, but instead divides the maximum time by the sleep period to set a total number of retries, regardless of how long those retries take. As such I was seeing the ApplicationMaster spend much longer attempting to make a connection than I had anticipated. The yarn.resourcemanager.connect.max-wait.ms would have the same behavior. These properties would be better named like yarn.client.nodemanager-connect.max.retries and yarn.resourcemanager.connect.max.retries to better align with the actual behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
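To make the behavior described in this report concrete, here is a minimal sketch of the arithmetic (plain Java; the numbers are illustrative, not YARN defaults):
{code}
// Illustrative sketch of the retry arithmetic described above: the "max wait"
// only determines a retry count; actual wall-clock time also includes however
// long each connect attempt blocks before failing.
public final class RetryMath {
  private RetryMath() {}

  public static void main(String[] args) {
    long maxWaitMs = 15 * 60 * 1000; // e.g. yarn.client.nodemanager-connect.max-wait-ms
    long sleepMs = 10 * 1000;        // fixed sleep between attempts

    // RetryUpToMaximumTimeWithFixedSleep reduces to a fixed number of retries:
    long retries = maxWaitMs / sleepMs; // 90

    // If each attempt blocks 20s before a ConnectTimeoutException, the real
    // elapsed time is retries * (attempt + sleep), far beyond maxWaitMs.
    long perAttemptMs = 20 * 1000;
    long elapsedMs = retries * (perAttemptMs + sleepMs); // 2,700,000 ms = 45 min
    System.out.println("retries=" + retries + " elapsed=" + elapsedMs / 60000 + " min");
  }
}
{code}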
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367300#comment-14367300 ] Hudson commented on YARN-3305: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2086 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2086/]) YARN-3305. Normalize AM resource request on app submission. Contributed by Rohith Sharmaks (jianhe: rev 968425e9f7b850ff9c2ab8ca37a64c3fdbe77dbf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch, 0003-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit, because AM-Used is updated with the user's actual ResourceRequest while activating applications, whereas during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367240#comment-14367240 ] Hudson commented on YARN-3181: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #127 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/127/]) Revert YARN-3181. FairScheduler: Fix up outdated findbugs issues. (kasha) (kasha: rev 32b43304563c2430c00bc3e142a962d2bc5f4d58) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java FairScheduler: Fix up outdated findbugs issues -- Key: YARN-3181 URL: https://issues.apache.org/jira/browse/YARN-3181 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Brahma Reddy Battula Attachments: YARN-3181-002.patch, yarn-3181-1.patch In FairScheduler, we have excluded some findbugs-reported errors. Some of them aren't applicable anymore, and there are a few that can be easily fixed without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3364) Clarify Naming of yarn.client.nodemanager-connect.max-wait-ms and yarn.resourcemanager.connect.max-wait.ms
[ https://issues.apache.org/jira/browse/YARN-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-3364. -- Resolution: Duplicate Closing this as a duplicate of HADOOP-11398. Clarify Naming of yarn.client.nodemanager-connect.max-wait-ms and yarn.resourcemanager.connect.max-wait.ms --- Key: YARN-3364 URL: https://issues.apache.org/jira/browse/YARN-3364 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Andrew Johnson I encountered an issue recently where the ApplicationMaster for MapReduce jobs would spend hours attempting to connect to a node in my cluster that had died due to a hardware fault. After debugging this, I found that the yarn.client.nodemanager-connect.max-wait-ms property did not behave as I had expected. Based on the name I had thought this would set a maximum time limit for attempting to connect to a NodeManager. The code in org.apache.hadoop.yarn.client.NMProxy corroborated this thought - it used a RetryUpToMaximumTimeWithFixedSleep policy when a ConnectTimeoutException was thrown, as it was in my case with a dead node. However, the RetryUpToMaximumTimeWithFixedSleep policy doesn't actually set a time limit, but instead divides the maximum time by the sleep period to set a total number of retries, regardless of how long those retries take. As such I was seeing the ApplicationMaster spend much longer attempting to make a connection than I had anticipated. The yarn.resourcemanager.connect.max-wait.ms would have the same behavior. These properties would be better named like yarn.client.nodemanager-connect.max.retries and yarn.resourcemanager.connect.max.retries to better align with the actual behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367242#comment-14367242 ] Hudson commented on YARN-3243: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #127 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/127/]) YARN-3243. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. Contributed by Wangda Tan. (jianhe: rev 487374b7fe0c92fc7eb1406c568952722b5d5b15) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
           A (usage=54, max=55)
          /                    \
A1 (usage=53, max=53)    A2 (usage=1, max=55)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, a parent queue will only tell its children "you need to unreserve *some* resource so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} class in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to min(qA.headroom, qA.max - qA.used) (saying the parent's name is qA). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource
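To make the headroom proposal concrete, here is a minimal sketch (single-resource longs instead of YARN's Resource type; class and method names are made up for the example) of pushing min(parent.headroom, parent.max - parent.used) down the queue tree:
{code}
// Illustrative sketch: a parent caps each child's headroom so allocations in
// a leaf can never push any ancestor past its configured maximum.
import java.util.ArrayList;
import java.util.List;

public class HeadroomDemo {
  static class Queue {
    final String name;
    final long max;
    final long used;
    long headroom; // maximum resource this queue may still allocate
    final List<Queue> children = new ArrayList<>();

    Queue(String name, long max, long used) {
      this.name = name;
      this.max = max;
      this.used = used;
    }

    // Push headroom from this queue down to all descendants.
    void propagateHeadroom() {
      for (Queue child : children) {
        child.headroom = Math.min(this.headroom, this.max - this.used);
        child.propagateHeadroom();
      }
    }
  }

  public static void main(String[] args) {
    Queue a = new Queue("A", 55, 54);
    a.headroom = a.max - a.used; // root has no parent constraint
    Queue a1 = new Queue("A1", 53, 53);
    Queue a2 = new Queue("A2", 55, 1);
    a.children.add(a1);
    a.children.add(a2);

    a.propagateHeadroom();
    // A2's own limits would allow 54 more, but the propagated headroom is 1,
    // which is all that parent A can accept without exceeding A.max.
    System.out.println("A2 headroom = " + a2.headroom); // 1
  }
}
{code}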
[jira] [Commented] (YARN-3205) FileSystemRMStateStore should disable FileSystem Cache to avoid get a Filesystem with an old configuration.
[ https://issues.apache.org/jira/browse/YARN-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367244#comment-14367244 ] Hudson commented on YARN-3205: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #127 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/127/]) YARN-3205. FileSystemRMStateStore should disable FileSystem Cache to avoid get a Filesystem with an old configuration. Contributed by Zhihai Xu. (ozawa: rev 3bc72cc16d8c7b8addd8f565523001dfcc32b891) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/CHANGES.txt FileSystemRMStateStore should disable FileSystem Cache to avoid get a Filesystem with an old configuration. --- Key: YARN-3205 URL: https://issues.apache.org/jira/browse/YARN-3205 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3205.000.patch, YARN-3205.001.patch FileSystemRMStateStore should disable the FileSystem cache to avoid getting a FileSystem with an old configuration. The old configuration may not have all the customized DFS_CLIENT settings required by FileSystemRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
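For reference, here is a minimal sketch of the two standard ways to sidestep the FileSystem cache (illustrative, using a local file:// URI so it runs anywhere; not necessarily the exact change in the patch):
{code}
// Illustrative: two ways to avoid handing back a cached FileSystem that was
// created earlier with a different Configuration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path storeRoot = new Path("file:///tmp/rmstore"); // hypothetical store URI

    // Option 1: construct a brand-new instance, bypassing the cache entirely.
    FileSystem fresh = FileSystem.newInstance(storeRoot.toUri(), conf);

    // Option 2: disable caching for this scheme so FileSystem.get() also
    // builds a new instance from the current Configuration.
    conf.setBoolean("fs." + storeRoot.toUri().getScheme() + ".impl.disable.cache", true);
    FileSystem uncached = FileSystem.get(storeRoot.toUri(), conf);

    System.out.println(fresh == uncached); // false: neither came from the cache
    fresh.close();
    uncached.close();
  }
}
{code}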
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367243#comment-14367243 ] Hudson commented on YARN-3273: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #127 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/127/]) YARN-3273. Improve scheduler UI to facilitate scheduling analysis and debugging. Contributed Rohith Sharmaks (jianhe: rev 658097d6da1b1aac8e01db459f0c3b456e99652f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/UserInfo.java Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367241#comment-14367241 ] Hudson commented on YARN-3305: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #127 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/127/]) YARN-3305. Normalize AM resource request on app submission. Contributed by Rohith Sharmaks (jianhe: rev 968425e9f7b850ff9c2ab8ca37a64c3fdbe77dbf) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch, 0003-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit, because AM-Used is updated with the user's actual ResourceRequest while activating applications, whereas during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3364) Clarify Naming of yarn.client.nodemanager-connect.max-wait-ms and yarn.resourcemanager.connect.max-wait.ms
Andrew Johnson created YARN-3364: Summary: Clarify Naming of yarn.client.nodemanager-connect.max-wait-ms and yarn.resourcemanager.connect.max-wait.ms Key: YARN-3364 URL: https://issues.apache.org/jira/browse/YARN-3364 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Andrew Johnson I encountered an issue recently where the ApplicationMaster for MapReduce jobs would spend hours attempting to connect to a node in my cluster that had died due to a hardware fault. After debugging this, I found that the yarn.client.nodemanager-connect.max-wait-ms property did not behave as I had expected. Based on the name I had thought this would set a maximum time limit for attempting to connect to a NodeManager. The code in org.apache.hadoop.yarn.client.NMProxy corroborated this thought - it used a RetryUpToMaximumTimeWithFixedSleep policy when a ConnectTimeoutException was thrown, as it was in my case with a dead node. However, the RetryUpToMaximumTimeWithFixedSleep policy doesn't actually set a time limit, but instead divides the maximum time by the sleep period to set a total number of retries, regardless of how long those retries take. As such I was seeing the ApplicationMaster spend much longer attempting to make a connection than I had anticipated. The yarn.resourcemanager.connect.max-wait.ms would have the same behavior. These properties would be better named like yarn.client.nodemanager-connect.max.retries and yarn.resourcemanager.connect.max.retries to better align with the actual behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3364) Clarify Naming of yarn.client.nodemanager-connect.max-wait-ms and yarn.resourcemanager.connect.max-wait.ms
[ https://issues.apache.org/jira/browse/YARN-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367164#comment-14367164 ] Jason Lowe commented on YARN-3364: -- Does your Hadoop build have the fix for YARN-3238? If not, that would explain the long retries you were seeing. Also, the "it's not a maximum time but a hacked-up guess at a number of retries" issue is being tracked in HADOOP-11398. Clarify Naming of yarn.client.nodemanager-connect.max-wait-ms and yarn.resourcemanager.connect.max-wait.ms --- Key: YARN-3364 URL: https://issues.apache.org/jira/browse/YARN-3364 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Andrew Johnson I encountered an issue recently where the ApplicationMaster for MapReduce jobs would spend hours attempting to connect to a node in my cluster that had died due to a hardware fault. After debugging this, I found that the yarn.client.nodemanager-connect.max-wait-ms property did not behave as I had expected. Based on the name I had thought this would set a maximum time limit for attempting to connect to a NodeManager. The code in org.apache.hadoop.yarn.client.NMProxy corroborated this thought - it used a RetryUpToMaximumTimeWithFixedSleep policy when a ConnectTimeoutException was thrown, as it was in my case with a dead node. However, the RetryUpToMaximumTimeWithFixedSleep policy doesn't actually set a time limit, but instead divides the maximum time by the sleep period to set a total number of retries, regardless of how long those retries take. As such I was seeing the ApplicationMaster spend much longer attempting to make a connection than I had anticipated. The yarn.resourcemanager.connect.max-wait.ms would have the same behavior. These properties would be better named like yarn.client.nodemanager-connect.max.retries and yarn.resourcemanager.connect.max.retries to better align with the actual behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367302#comment-14367302 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2086 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2086/]) YARN-3273. Improve scheduler UI to facilitate scheduling analysis and debugging. Contributed Rohith Sharmaks (jianhe: rev 658097d6da1b1aac8e01db459f0c3b456e99652f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/UserInfo.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for
[jira] [Commented] (YARN-3205) FileSystemRMStateStore should disable FileSystem Cache to avoid get a Filesystem with an old configuration.
[ https://issues.apache.org/jira/browse/YARN-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367303#comment-14367303 ] Hudson commented on YARN-3205: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2086 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2086/]) YARN-3205. FileSystemRMStateStore should disable FileSystem Cache to avoid get a Filesystem with an old configuration. Contributed by Zhihai Xu. (ozawa: rev 3bc72cc16d8c7b8addd8f565523001dfcc32b891) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt FileSystemRMStateStore should disable FileSystem Cache to avoid get a Filesystem with an old configuration. --- Key: YARN-3205 URL: https://issues.apache.org/jira/browse/YARN-3205 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3205.000.patch, YARN-3205.001.patch FileSystemRMStateStore should disable the FileSystem cache to avoid getting a FileSystem with an old configuration. The old configuration may not have all the customized DFS_CLIENT settings required by FileSystemRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367301#comment-14367301 ] Hudson commented on YARN-3243: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2086 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2086/]) YARN-3243. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. Contributed by Wangda Tan. (jianhe: rev 487374b7fe0c92fc7eb1406c568952722b5d5b15) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
           A (usage=54, max=55)
          /                    \
A1 (usage=53, max=53)    A2 (usage=1, max=55)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, a parent queue will only tell its children "you need to unreserve *some* resource so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} class in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to min(qA.headroom, qA.max - qA.used) (saying the parent's name is qA). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource
[jira] [Commented] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367299#comment-14367299 ] Hudson commented on YARN-3181: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2086 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2086/]) Revert YARN-3181. FairScheduler: Fix up outdated findbugs issues. (kasha) (kasha: rev 32b43304563c2430c00bc3e142a962d2bc5f4d58) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/CHANGES.txt FairScheduler: Fix up outdated findbugs issues -- Key: YARN-3181 URL: https://issues.apache.org/jira/browse/YARN-3181 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Brahma Reddy Battula Attachments: YARN-3181-002.patch, yarn-3181-1.patch In FairScheduler, we have excluded some findbugs-reported errors. Some of them aren't applicable anymore, and there are a few that can be easily fixed without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367317#comment-14367317 ] Hudson commented on YARN-3243: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/136/]) YARN-3243. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. Contributed by Wangda Tan. (jianhe: rev 487374b7fe0c92fc7eb1406c568952722b5d5b15) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.8.0 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
           A (usage=54, max=55)
          /                    \
A1 (usage=53, max=53)    A2 (usage=1, max=55)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, a parent queue will only tell its children "you need to unreserve *some* resource so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} class in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to min(qA.headroom, qA.max - qA.used) (saying the parent's name is qA). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much
[jira] [Commented] (YARN-3205) FileSystemRMStateStore should disable FileSystem Cache to avoid get a Filesystem with an old configuration.
[ https://issues.apache.org/jira/browse/YARN-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367319#comment-14367319 ] Hudson commented on YARN-3205: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/136/]) YARN-3205. FileSystemRMStateStore should disable FileSystem Cache to avoid get a Filesystem with an old configuration. Contributed by Zhihai Xu. (ozawa: rev 3bc72cc16d8c7b8addd8f565523001dfcc32b891) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java FileSystemRMStateStore should disable FileSystem Cache to avoid get a Filesystem with an old configuration. --- Key: YARN-3205 URL: https://issues.apache.org/jira/browse/YARN-3205 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-3205.000.patch, YARN-3205.001.patch FileSystemRMStateStore should disable the FileSystem cache to avoid getting a FileSystem with an old configuration. The old configuration may not have all the customized DFS_CLIENT settings required by FileSystemRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367413#comment-14367413 ] Zhijie Shen commented on YARN-3047: --- bq. Did you mean NameValuePair can have package level access instead of public? It shouldn't be part of api module, but the timeline service module. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367316#comment-14367316 ] Hudson commented on YARN-3305: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/136/]) YARN-3305. Normalize AM resource request on app submission. Contributed by Rohith Sharmaks (jianhe: rev 968425e9f7b850ff9c2ab8ca37a64c3fdbe77dbf) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch, 0003-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit, because AM-Used is updated with the user's actual ResourceRequest while activating applications, whereas during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367318#comment-14367318 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/136/]) YARN-3273. Improve scheduler UI to facilitate scheduling analysis and debugging. Contributed Rohith Sharmaks (jianhe: rev 658097d6da1b1aac8e01db459f0c3b456e99652f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/UserInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be
[jira] [Commented] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367315#comment-14367315 ] Hudson commented on YARN-3181: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #136 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/136/]) Revert YARN-3181. FairScheduler: Fix up outdated findbugs issues. (kasha) (kasha: rev 32b43304563c2430c00bc3e142a962d2bc5f4d58) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml FairScheduler: Fix up outdated findbugs issues -- Key: YARN-3181 URL: https://issues.apache.org/jira/browse/YARN-3181 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Brahma Reddy Battula Attachments: YARN-3181-002.patch, yarn-3181-1.patch In FairScheduler, we have excluded some findbugs-reported errors. Some of them aren't applicable anymore, and there are a few that can be easily fixed without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3333) rename TimelineAggregator etc. to TimelineCollector
[ https://issues.apache.org/jira/browse/YARN-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367410#comment-14367410 ] Sangjin Lee commented on YARN-3333: --- Back in progress. rename TimelineAggregator etc. to TimelineCollector --- Key: YARN-3333 URL: https://issues.apache.org/jira/browse/YARN-3333 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3333.001.patch Per discussions on YARN-2928, let's rename TimelineAggregator, etc. to TimelineCollector, etc. There are also several minor issues on the current branch, which can be fixed as part of this: - fixing some imports - missing license in TestTimelineServerClientIntegration.java - whitespaces - missing direct dependency -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3241: Attachment: YARN-3241.001.patch Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler Key: YARN-3241 URL: https://issues.apache.org/jira/browse/YARN-3241 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3241.000.patch, YARN-3241.001.patch Leading spaces, trailing spaces, and empty sub queue names may cause a MetricsException (Metrics source XXX already exists!) when adding an application to FairScheduler. The reason is that QueueMetrics parses the queue name differently from QueueManager. QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and trailing spaces in sub queue names, and it also removes empty sub queue names. {code} static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); {code} But QueueManager won't remove leading spaces, trailing spaces, or empty sub queue names. This causes FSQueue and FSQueueMetrics to get out of sync: QueueManager will think two queue names are different, so it will try to create a new queue, but FSQueueMetrics will treat the two names as the same queue, which raises the Metrics source XXX already exists! MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
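To make the mismatch concrete, here is a small self-contained sketch (assuming Guava on the classpath; the class name and sample queue name are illustrative) showing what QueueMetrics' splitter does to a queue name that QueueManager would keep verbatim:
{code}
import com.google.common.base.Splitter;

public class QueueNameSplitDemo {
  // Same splitter as QueueMetrics: trims sub queue names and drops empty ones.
  static final Splitter Q_SPLITTER =
      Splitter.on('.').omitEmptyStrings().trimResults();

  public static void main(String[] args) {
    String raw = "root. queueA ..queueB";
    // Metrics side sees root / queueA / queueB -- spaces and empties are gone.
    for (String part : Q_SPLITTER.split(raw)) {
      System.out.println("[" + part + "]");
    }
    // QueueManager keeps "root. queueA ..queueB" as-is, so the two components
    // disagree on whether the queue (and its metrics source) already exists.
  }
}
{code}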
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367523#comment-14367523 ] Varun Saxena commented on YARN-3047: Yes, it is part of that. I have kept it inside {{hadoop-yarn-server-timelineservice}} [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch, YARN-3047.02.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367546#comment-14367546 ] Hitesh Shah commented on YARN-2375: --- [~jeagles] just pointed me to this jira. Firstly, this seems like an incompatible change for 2.6.0. Second, the semantics of the property yarn.timeline-service.enabled have changed. Earlier, this seemed like a global/admin flag at the YARN level that controlled whether ATS was enabled or disabled. Now, it seems like the assumption is that every application framework needs to check a YARN property config before deciding to use ATS or not? There is also an inconsistency in how YarnClient behaves as compared to TimelineClient. YarnClient obeys the yarn.timeline-service.enabled flag. But, TimelineClient does not. [~zjshen] [~jeagles] [~vinodkv] [~mitdesai] Comments? Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Fix For: 2.7.0, 2.6.1 Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
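For reference, a minimal sketch of the per-framework check under discussion, assuming the standard YarnConfiguration keys; whether this burden should sit with every framework is exactly the question raised above:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineEnabledCheck {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // After YARN-2375, TimelineClientImpl no longer checks the flag itself,
    // so the framework consults the global config before creating a client.
    if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
      TimelineClient client = TimelineClient.createTimelineClient();
      client.init(conf);
      client.start();
      // ... put timeline entities, then shut down:
      client.stop();
    }
  }
}
{code}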
[jira] [Commented] (YARN-3111) Fix ratio problem on FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367563#comment-14367563 ] Ashwin Shankar commented on YARN-3111: -- I don't think 1 is done in the patch. My bad, I wasn't clear. In 1, what I was suggesting was: represent steady/instant/max on the bar in terms of the resource that is dominant in usage (used resources), so that steady/instant/max/usage on the bar all finally represent ONE dimension in that bar/queue. So let's say usage of the queue is (20% mem, 60% vcore); then steady/instant/max/usage would all display only vcore. bq. 3 is good, but one question is that parent queue has no tooltip now, but it has its own bar. Parent queues (except root) have a tooltip, I just checked in trunk. Can you check again? bq. And think over 3 4, what about listing all resources's usage percent on the text on the right of each bar? Maybe color red for dominant resource? or just judge it by comparing percent number? It would be nice to have that with the color; however, I'm concerned that it might look ugly from a user-experience perspective. bq. And also what do you think of the issue I mentioned above? I think it still can happen after 1 2, cause for one queue: steady, fair, max, usage resource may have different dominant resource type. If I make a mistake here, please let me know. I believe this is clarified in the first paragraph of this comment. Let me know if you still have this concern. Fix ratio problem on FairScheduler page --- Key: YARN-3111 URL: https://issues.apache.org/jira/browse/YARN-3111 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Minor Attachments: YARN-3111.1.patch, YARN-3111.png Found 3 problems on the FairScheduler page: 1. Only memory is computed for the ratio, even when the queue schedulingPolicy is DRF. 2. When min resources is configured larger than real resources, the steady fair share ratio is so large that it runs off the page. 3. When cluster resources are 0 (no NodeManager started), the ratio is displayed as NaN% used. The attached image shows a snapshot of the above problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367477#comment-14367477 ] Sunil G commented on YARN-2693: --- Thank you [~wangda] for sharing comments. As we move the queue-specific config inside scheduler.Queue, are we also taking ACLs back to the scheduler (ACL w.r.t. priority)? It's better to control ACLs from outside via YarnAuthorizer, and only the config can be kept w.r.t. the scheduler. Please share your thoughts. Regarding methods in ApplicationPriorityManager, it looks overall fine, but I suggest we may also need * getClusterApplicationPriorities (if it's a range, that can be sent back) Priority Label Manager in RM to manage application priority based on configuration -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 0006-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete priority label to a specified queue * Manage integer mapping associated with each priority label * Support managing default priority label of a given queue * Expose interface to RM to validate priority label To have a simplified interface, Priority Manager will support only a configuration file, in contrast with admin CLI and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367628#comment-14367628 ] Hadoop QA commented on YARN-3241: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705388/YARN-3241.001.patch against trunk revision 9d72f93. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7012//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7012//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3351) AppMaster tracking URL is broken in HA
[ https://issues.apache.org/jira/browse/YARN-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3351: Attachment: YARN-3351.002.patch Addressed comments 1 and 3. For 2, I meant any IP address; made it obvious by changing it to 1.2.3.4. Also made the test not leave any leftover mapping, and restore any it might affect. AppMaster tracking URL is broken in HA -- Key: YARN-3351 URL: https://issues.apache.org/jira/browse/YARN-3351 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3351.001.patch, YARN-3351.002.patch After YARN-2713, the AppMaster link is broken in HA. To repro: a) set up RM HA and ensure the first RM is not active; b) run a long sleep job and view the tracking URL on the RM applications page. The log and full stack trace are shown below {noformat} 2015-02-05 20:47:43,478 WARN org.mortbay.log: /proxy/application_1423182188062_0002/: java.net.BindException: Cannot assign requested address {noformat} {noformat} java.net.BindException: Cannot assign requested address at java.net.PlainSocketImpl.socketBind(Native Method) at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376) at java.net.Socket.bind(Socket.java:631) at java.net.Socket.<init>(Socket.java:423) at java.net.Socket.<init>(Socket.java:280) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122) at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:188) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:345) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367484#comment-14367484 ] zhihai xu commented on YARN-3241: - Hi [~kasha], thanks for the review; your suggestion sounds reasonable to me. I uploaded a new patch, YARN-3241.001.patch, which addressed your comment. Also, I found we need to check the queue name in the FairScheduler config file to avoid a similar issue, so I added code to check the queue name in the config file and added a test case for it. Please review it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3241: Attachment: (was: YARN-3241.001.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367536#comment-14367536 ] Yongjun Zhang commented on YARN-3021: - Hi [~jianhe] and all, I resumed working on this and found an obstacle here. See org.apache.hadoop.security.token.Token:
{code}
private synchronized TokenRenewer getRenewer() throws IOException {
  if (renewer != null) {
    return renewer;
  }
  renewer = TRIVIAL_RENEWER;
  synchronized (renewers) {
    for (TokenRenewer canidate : renewers) {
      if (canidate.handleKind(this.kind)) {
        renewer = canidate;
        return renewer;
      }
    }
  }
  LOG.warn("No TokenRenewer defined for token kind " + this.kind);
  return renewer;
}

public boolean isManaged() throws IOException {
  return getRenewer().isManaged(this);
}

public long renew(Configuration conf) throws IOException, InterruptedException {
  return getRenewer().renew(this, conf);
}

public void cancel(Configuration conf) throws IOException, InterruptedException {
  getRenewer().cancel(this, conf);
}
{code}
We can see that {{getRenewer()}} does more work than simply returning the renewer, and a non-null renewer is currently guaranteed to be returned. The other methods (listed above, called on the server side) count on this behavior. If we set the renewer to null at the client side and expect the server to pick it up, we need to either 1. change the behaviour of {{getRenewer()}} to return whatever renewer is set by the client, or 2. change the token's {{kind}} to make {{getRenewer()}} return null, which would be really hacky. Making this kind of change seems to have wider impact than expected, and things would likely be broken by it. Any thoughts? Thanks a lot. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one-way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and merely skip the scheduling, rather than bubble an error back to the client and fail the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
Sidharta Seethana created YARN-3366: --- Summary: Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana In order to be able to isolate based on, and enforce, outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the NodeManager. For more information on the design, please see the design document attached to the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367635#comment-14367635 ] Jian He commented on YARN-3021: --- Hi [~yzhangal], I think what we should do is: in {{TokenCache#obtainTokensForNamenodesInternal}}, change the {{delegTokenRenewer}} to be null for name nodes listed in mapreduce.job.hdfs-servers.token-renewal.exclude. And on the server side, decode the {{identifier}} field in {{Token}} and check whether the {{renewer}} in {{AbstractDelegationTokenIdentifier}} is null or not. Make sense? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
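A minimal sketch of the server-side check Jian describes, assuming the standard Hadoop security classes (the wrapper class and helper name are illustrative): decode the token's identifier and inspect the embedded renewer instead of going through {{getRenewer()}}:
{code}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

public class RenewerCheckSketch {
  // Returns true if the token carries no renewer, i.e. the client asked
  // the RM not to schedule renewal for it.
  static boolean shouldSkipRenewal(Token<? extends TokenIdentifier> token)
      throws IOException {
    TokenIdentifier id = token.decodeIdentifier();
    if (id instanceof AbstractDelegationTokenIdentifier) {
      Text renewer = ((AbstractDelegationTokenIdentifier) id).getRenewer();
      return renewer == null || renewer.toString().isEmpty();
    }
    return false;
  }
}
{code}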
[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367873#comment-14367873 ] Junping Du commented on YARN-914: - Hi, can someone on the watch list help review the patch in sub-JIRA YARN-3212? Thanks! (Umbrella) Support graceful decommission of nodemanager --- Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf, GracefullyDecommissionofNodeManagerv3.pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications. Currently, if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output has not been fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367642#comment-14367642 ] Jian He commented on YARN-3021: --- Yongjun, thanks for taking this up! Just assigned the JIRA to you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3021: -- Assignee: Yongjun Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367692#comment-14367692 ] Yongjun Zhang commented on YARN-3021: - Hi Jian, looking closer at what you suggested, I think I was wrong about setting the TokenRenewer object in the token to null; instead, we want to set the renewer string to null. :-) Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
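On the client side, the renewer is just a string passed when tokens are fetched, so it can be left null for the excluded namenodes. A sketch under that assumption (the exclude-list lookup is elided and the cluster URI is illustrative):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

public class NullRenewerSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // For a namenode listed in mapreduce.job.hdfs-servers.token-renewal.exclude,
    // request the delegation token with a null renewer so the RM can tell it
    // apart and skip scheduling renewal for it.
    FileSystem fs = new Path("hdfs://clusterB/").getFileSystem(conf);
    Credentials creds = new Credentials();
    String renewer = null; // would normally be the RM principal
    fs.addDelegationTokens(renewer, creds);
  }
}
{code}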
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367697#comment-14367697 ] Zhijie Shen commented on YARN-2375: --- bq. Firstly, this seems like an incompatible change for 2.6.0. Do you mean semantically incompatible? bq. Second, the semantics of the property yarn.timeline-service.enabled have changed. IMHO, yarn.timeline-service.enabled is still the global config. The difference is that previously it was checked inside TimelineClient, but now it is checked by the user. Jon commented on the reason for doing this: https://issues.apache.org/jira/browse/YARN-2375?focusedCommentId=14212964page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14212964 It sounds reasonable. Does it break any dependent project? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show this timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3363: Labels: metrics supportability (was: ) add localization and container launch time to ContainerMetrics at NM to show this timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Add localization and container launch time to ContainerMetrics at the NM to show this timing information for each active container. Currently, ContainerMetrics has the container's actual memory usage (YARN-2984), actual CPU usage (YARN-3122), resource and pid (YARN-3022). It would be better to also have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show this timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3363: Component/s: nodemanager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368040#comment-14368040 ] Zhijie Shen commented on YARN-3040: --- Take it over. Thanks! - Zhijie [Data Model] Implement client-side API for handling flows - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368122#comment-14368122 ] Hadoop QA commented on YARN-3356: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705446/YARN-3356.4.patch against trunk revision c239b6d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7018//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7018//console This message is automatically generated. Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label. -- Key: YARN-3356 URL: https://issues.apache.org/jira/browse/YARN-3356 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, YARN-3356.4.patch Similar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp should use ResourceUsage to track resource-usage/pending by label for better resource tracking and preemption. Also, when an application's pending resource changes (container allocated, app completed, moved, etc.), we need to update the ResourceUsage of the queue hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368208#comment-14368208 ] Hadoop QA commented on YARN-3284: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705440/YARN-3284.4.patch against trunk revision c239b6d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 13 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.v2.TestUberAM The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7017//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7017//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7017//console This message is automatically generated. Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch, YARN-3284.2.patch, YARN-3284.3.patch, YARN-3284.3.rebase.patch, YARN-3284.4.patch Currently, we have some extra metrics about the application and the current attempt in the RM Web UI.
We should expose that information through the YARN command, too: 1. Preemption metrics 2. Application outstanding resource requests 3. Container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3351) AppMaster tracking URL is broken in HA
[ https://issues.apache.org/jira/browse/YARN-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368209#comment-14368209 ] Hudson commented on YARN-3351: -- FAILURE: Integrated in Hadoop-trunk-Commit #7365 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7365/]) YARN-3351. AppMaster tracking URL is broken in HA. (Anubhav Dhoot via kasha) (kasha: rev 20b49224eb90c796f042ac4251508f3979fd4787) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWebAppUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java AppMaster tracking URL is broken in HA -- Key: YARN-3351 URL: https://issues.apache.org/jira/browse/YARN-3351 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3351.001.patch, YARN-3351.002.patch, YARN-3351.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368269#comment-14368269 ] Jian He commented on YARN-3368: --- We may expose the information through the web service and have the client make REST calls to retrieve the data and render it on the UI. Improve YARN web UI --- Key: YARN-3368 URL: https://issues.apache.org/jira/browse/YARN-3368 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He The goal is to improve the YARN UI for better usability. We may take advantage of some existing front-end frameworks to build a fancier, easier-to-use UI. The old UI will continue to exist until we feel the new one is ready to flip to. This serves as an umbrella JIRA to track the tasks; we can do this in a branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3371) TTL for YARN Registry SRV records
Gopal V created YARN-3371: - Summary: TTL for YARN Registry SRV records Key: YARN-3371 URL: https://issues.apache.org/jira/browse/YARN-3371 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Gopal V YARN service records do not have any stale indicators. The SRV records need a TTL equivalent for ephemeral services, which tend to be reconfigured occasionally, to allow clients to hold onto them without authoritative lookups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368064#comment-14368064 ] Zhijie Shen edited comment on YARN-3040 at 3/18/15 10:42 PM: - I've just uploaded a patch. It's an e2e modification so that the context information can be passed from the client through to the backend storage. The context information includes *clusterId*, *userId*, *flowId*, *flowRunId* and *appId*. According to YARN-3240, the new TimelineClient is constructed per application, and in the context of one application we can reasonably assume this context information is unchanged. Therefore, it just needs to be specified when the client is constructed. The context information should be gathered or passed to the AM and NM to construct the timeline client properly. For example, for the AM, this information can be passed via env inside the CLC. Anyway, it's out of the scope of this JIRA; we will cover that integration once we make some particular framework AM use the new timeline client. Back to the context information: some of it can be null, and some of it doesn't need to be specified explicitly: * *clusterId*: The application should specify a unique cluster ID, or by default the cluster ID will be the cluster start timestamp of the RM. * *userId*: The user doesn't need to specify this information. Instead, it will be obtained from the current UGI of the client. * *flowId*: The user either passes in a flowId, or, if it is an orphan application, the flowId will be the appId with the prefix replaced by "flow". * *flowRunId*: If it is an orphan application, it's 0. The reason why it should be 0 instead of the current timestamp when creating the timeline client is that there may be multiple clients in the AM and NMs constructed at different times; they need to be synced on the same flowRunId. * *appId*: It's the only mandatory context information, as we defined before. The client is constructed to work with only one application. I changed the web service endpoint accordingly to make it RESTful, and changed the writer interface accordingly to pass in the context information when putting the entity. In addition, I've modified the FS-based writer implementation to reflect the change. The entity file will be put in the dir {{root/entities/clusterId/userId/flowId/flowRunId/appId/entityType/entityId.thist}}. It has been verified by TestDistributedShell and TestFileSystemTimelineWriterImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
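The defaulting rules for orphan applications can be summarized in a few lines; a hypothetical, self-contained sketch (the class and method names are illustrative, not the YARN-2928 branch API):
{code}
public class TimelineContextSketch {
  // Orphan application: derive the flowId from the appId by swapping the
  // "application" prefix for "flow".
  static String defaultFlowId(String flowId, String appId) {
    return (flowId != null) ? flowId : appId.replaceFirst("^application", "flow");
  }

  // Orphan application: use 0 rather than a timestamp, so clients created at
  // different times in the AM and NMs agree on the same flow run.
  static long defaultFlowRunId(Long flowRunId) {
    return (flowRunId != null) ? flowRunId : 0L;
  }

  public static void main(String[] args) {
    System.out.println(defaultFlowId(null, "application_1423182188062_0002"));
    // -> flow_1423182188062_0002
    System.out.println(defaultFlowRunId(null)); // -> 0
  }
}
{code}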
[jira] [Commented] (YARN-3351) AppMaster tracking URL is broken in HA
[ https://issues.apache.org/jira/browse/YARN-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368126#comment-14368126 ] Karthik Kambatla commented on YARN-3351: +1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3370) don't show the exception message before showing container logs in UI
[ https://issues.apache.org/jira/browse/YARN-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368193#comment-14368193 ] Sergey Shelukhin commented on YARN-3370: [~vinodkv] fyi don't show the exception message before showing container logs in UI Key: YARN-3370 URL: https://issues.apache.org/jira/browse/YARN-3370 Project: Hadoop YARN Issue Type: Bug Reporter: Sergey Shelukhin When you click on e.g. AM attempt logs, an "Exception: Unknown container ..." message is shown, then the page refreshes to the logs. The message should not be shown by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3370) don't show the exception message before showing container logs in UI
Sergey Shelukhin created YARN-3370: -- Summary: don't show the exception message before showing container logs in UI Key: YARN-3370 URL: https://issues.apache.org/jira/browse/YARN-3370 Project: Hadoop YARN Issue Type: Bug Reporter: Sergey Shelukhin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3369: Description: In AppSchedulingInfo.java the method checkForDeactivation() has these 2 consecutive lines:
{code}
ResourceRequest request =
    getResourceRequest(priority, ResourceRequest.ANY);
if (request.getNumContainers() > 0) {
{code}
the first line calls getResourceRequest and it can return null.
{code}
synchronized public ResourceRequest getResourceRequest(
    Priority priority, String resourceName) {
  Map<String, ResourceRequest> nodeRequests = requests.get(priority);
  return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
}
{code}
The second line dereferences the pointer directly without a check. If the pointer is null, the RM dies. {quote}2015-03-17 14:14:04,757 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739) at java.lang.Thread.run(Thread.java:722) {color:red} *2015-03-17 14:14:04,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..*{color} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368118#comment-14368118 ] Ming Ma commented on YARN-3212: --- bq. Do we want to consider DECOMMISSIONING nodes as not active? There are containers actively running on them, and in that sense they are participating in the cluster (and contributing to the overall cluster resource). I think they should still be considered active, but I could be persuaded otherwise. Do we need to support the scenario where the NM becomes dead while it is being decommissioned? Say the decommission timeout is 30 minutes, larger than the NM liveness timeout. The node drops out of the cluster for some time and rejoins later, all within the decommission timeout. Will YARN show the status as just a dead node, or as {dead, decommissioning}? It seems useful for admins to know about this. If we need that, we can consider two types of NodeState: one is the liveness state, the other is the admin state. Then you will have different combinations. RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch As proposed in YARN-914, a new "DECOMMISSIONING" state will be added, reachable from the "running" state via a new "decommissioning" event. This new state can transition to "decommissioned" on Resource_Update if there are no running apps on this NM or when the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after a timeout from the CLI). In addition, it can go back to "running" if the user decides to cancel a previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
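For illustration, a minimal sketch of the two-dimensional node state suggested in the comment above (the enum names and the split itself are hypothetical, not YARN's actual NodeState API):
{code}
// Hypothetical split of node state into two orthogonal dimensions, as
// suggested in the comment above; names are illustrative only.
public class NodeStateSketch {

  enum LivenessState { RUNNING, DEAD }

  enum AdminState { NORMAL, DECOMMISSIONING, DECOMMISSIONED }

  // The displayed status combines both dimensions, e.g. {DEAD,
  // DECOMMISSIONING} for a node that timed out mid-decommission.
  static String display(LivenessState liveness, AdminState admin) {
    return "{" + liveness + ", " + admin + "}";
  }

  public static void main(String[] args) {
    System.out.println(display(LivenessState.DEAD, AdminState.DECOMMISSIONING));
  }
}
{code}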
[jira] [Created] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
Giovanni Matteo Fumarola created YARN-3369: -- Summary: Missing NullPointer check in AppSchedulingInfo causes RM to die Key: YARN-3369 URL: https://issues.apache.org/jira/browse/YARN-3369 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Giovanni Matteo Fumarola In AppSchedulingInfo.java the method checkForDeactivation() has these 2 consecutive lines: {quote} {color:red} ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); if (request.getNumContainers() > 0) { {color} {quote} the first line calls getResourceRequest and it can return null. {quote} synchronized public ResourceRequest getResourceRequest( Priority priority, String resourceName) { Map<String, ResourceRequest> nodeRequests = requests.get(priority); {color:red}*return*{color} (nodeRequests == null) ? {color:red}*null*{color} : nodeRequests.get(resourceName); } {quote} The second line dereferences the pointer directly without a check. If the pointer is null, the RM dies. {quote}2015-03-17 14:14:04,757 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739) at java.lang.Thread.run(Thread.java:722) {color:red}*2015-03-17 14:14:04,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..*{color} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
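For reference, the obvious shape of a fix is to guard the dereference at the failing call site; this is only a sketch, not necessarily the patch that will land here:
{code}
// Sketch: guard against a null ResourceRequest before dereferencing.
// Whether to simply skip or also log/deactivate is left to the actual patch.
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request != null && request.getNumContainers() > 0) {
  // ... existing checkForDeactivation() logic ...
}
{code}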
[jira] [Commented] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API
[ https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368089#comment-14368089 ] Jian He commented on YARN-3345: --- - Public/unstable annotations for the newly added records, e.g. SetNodeLabelsAttributesRequest, NodeLabelAttributes#getAttributes, getNodeLabel - NodeLabelAttributes -> NodeLabel, so that AddToClusterNodeLabelsRequest can later on use the same data structure. - For node exclusiveness, I think we may use NodeLabel#(get/set)IsExclusive - "an un existed node-label=%s" -> "non-existing node-label" - Throw YarnException instead of IOException - In the code below, what if the user wants to set the attributes to be empty?
{code}
if (attr.getAttributes().isEmpty()) {
  // simply ignore
  continue;
}
{code}
- Add a newInstance method in SetNodeLabelsAttributesResponse and use that instead of:
{code}
SetNodeLabelsAttributesResponse response =
    recordFactory.newRecordInstance(SetNodeLabelsAttributesResponse.class);
{code}
- Revert the RMNodeLabelsManager change
Add non-exclusive node label RMAdmin CLI/API Key: YARN-3345 URL: https://issues.apache.org/jira/browse/YARN-3345 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, YARN-3345.4.patch As described in YARN-3214 (see design doc attached to that JIRA), we need to add the non-exclusive node label RMAdmin API and CLI implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
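For illustration, the suggested newInstance factory would follow the usual YARN record pattern; the exact signature here is an assumption until the patch is updated:
{code}
import org.apache.hadoop.yarn.util.Records;

public abstract class SetNodeLabelsAttributesResponse {

  // Suggested factory so callers do not go through RecordFactory directly.
  public static SetNodeLabelsAttributesResponse newInstance() {
    return Records.newRecord(SetNodeLabelsAttributesResponse.class);
  }
}
{code}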
[jira] [Updated] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3368: -- Issue Type: Improvement (was: Bug) Improve YARN web UI --- Key: YARN-3368 URL: https://issues.apache.org/jira/browse/YARN-3368 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He The goal is to improve the YARN UI for better usability. We may take advantage of some existing front-end frameworks to build a fancier, easier-to-use UI. The old UI will continue to exist until we feel it's ready to flip to the new UI. This serves as an umbrella jira to track the tasks. We can do this in a branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368347#comment-14368347 ] Brahma Reddy Battula commented on YARN-3369: [~giovanni.fumarola] thanks for reporting. I would like to work on this jira; if you have a patch, you can reassign it to yourself... thanks Missing NullPointer check in AppSchedulingInfo causes RM to die Key: YARN-3369 URL: https://issues.apache.org/jira/browse/YARN-3369 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Giovanni Matteo Fumarola Assignee: Brahma Reddy Battula In AppSchedulingInfo.java the method checkForDeactivation() has these 2 consecutive lines:
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request.getNumContainers() > 0) {
{code}
the first line calls getResourceRequest and it can return null.
{code}
synchronized public ResourceRequest getResourceRequest(
    Priority priority, String resourceName) {
  Map<String, ResourceRequest> nodeRequests = requests.get(priority);
  return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
}
{code}
The second line dereferences the pointer directly without a check. If the pointer is null, the RM dies. {quote}2015-03-17 14:14:04,757 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739) at java.lang.Thread.run(Thread.java:722) {color:red}*2015-03-17 14:14:04,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..*{color} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3040: -- Assignee: Zhijie Shen (was: Robert Kanter) [Data Model] Implement client-side API for handling flows - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3040: -- Attachment: YARN-3040.1.patch [Data Model] Implement client-side API for handling flows - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3040.1.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
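For illustration, a minimal sketch of the tag-based approach mentioned in the description, assuming hypothetical tag prefixes and a ':' separator (the real encoding is defined by this patch):
{code}
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

public class FlowTagSketch {
  // Hypothetical tag prefixes; the actual constants are up to this jira.
  static final String FLOW_NAME_TAG = "TIMELINE_FLOW_NAME_TAG";
  static final String FLOW_VERSION_TAG = "TIMELINE_FLOW_VERSION_TAG";
  static final String FLOW_RUN_ID_TAG = "TIMELINE_FLOW_RUN_ID_TAG";

  static void setFlowTags(ApplicationSubmissionContext ctx,
      String flowName, String flowVersion, long flowRunId) {
    Set<String> tags = new HashSet<String>();
    tags.add(FLOW_NAME_TAG + ":" + flowName);
    tags.add(FLOW_VERSION_TAG + ":" + flowVersion);
    tags.add(FLOW_RUN_ID_TAG + ":" + flowRunId);
    // Application tags travel with the submission context, so the RM and
    // the ATS writers can recover the flow attributes downstream.
    ctx.setApplicationTags(tags);
  }
}
{code}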
[jira] [Commented] (YARN-3351) AppMaster tracking URL is broken in HA
[ https://issues.apache.org/jira/browse/YARN-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368090#comment-14368090 ] Hadoop QA commented on YARN-3351: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705449/YARN-3351.003.patch against trunk revision c239b6d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7019//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7019//console This message is automatically generated. AppMaster tracking URL is broken in HA -- Key: YARN-3351 URL: https://issues.apache.org/jira/browse/YARN-3351 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3351.001.patch, YARN-3351.002.patch, YARN-3351.003.patch After YARN-2713, the AppMaster link is broken in HA. 
To repro: a) set up RM HA and ensure the first RM is not active, b) run a long sleep job and view the tracking URL on the RM applications page. The log and full stack trace are shown below. {noformat} 2015-02-05 20:47:43,478 WARN org.mortbay.log: /proxy/application_1423182188062_0002/: java.net.BindException: Cannot assign requested address {noformat} {noformat} java.net.BindException: Cannot assign requested address at java.net.PlainSocketImpl.socketBind(Native Method) at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376) at java.net.Socket.bind(Socket.java:631) at java.net.Socket.<init>(Socket.java:423) at java.net.Socket.<init>(Socket.java:280) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122) at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:188) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:345) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Bhat updated YARN-2828: - Attachment: YARN-2828.005.patch Enable auto refresh of web pages (using http parameter) --- Key: YARN-2828 URL: https://issues.apache.org/jira/browse/YARN-2828 Project: Hadoop YARN Issue Type: Improvement Reporter: Tim Robertson Assignee: Vijay Bhat Priority: Minor Attachments: YARN-2828.001.patch, YARN-2828.002.patch, YARN-2828.003.patch, YARN-2828.004.patch, YARN-2828.005.patch The MR1 Job Tracker had a useful HTTP parameter, e.g. refresh=3, that could be appended to URLs to enable a periodic page reload. This was very useful when developing mapreduce jobs, especially to watch counters changing. This is lost in the YARN interface. It could be implemented as a page element (e.g. a drop-down), but I'd recommend not cluttering the page further and simply bringing back the optional refresh HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
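For illustration, a minimal servlet-style sketch of the requested behavior, independent of YARN's actual webapp framework (the class name and page content are made up):
{code}
import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RefreshParamSketch extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // Honor an optional ?refresh=N parameter by emitting a standard HTTP
    // Refresh header; missing or malformed values are silently ignored.
    String refresh = req.getParameter("refresh");
    if (refresh != null) {
      try {
        int seconds = Integer.parseInt(refresh);
        if (seconds > 0) {
          resp.setIntHeader("Refresh", seconds);
        }
      } catch (NumberFormatException ignored) {
        // fall through and render the page without auto refresh
      }
    }
    resp.setContentType("text/html");
    resp.getWriter().println("<html><body>page content</body></html>");
  }
}
{code}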
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Description: Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with the least current usage, very similar to the FairScheduler's FairSharePolicy. The policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight-style adjustment. An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending):
- Current resource usage: less usage is lesser
- Submission time: earlier is lesser
Optionally, based on a configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the value Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to a comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser). was: Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with the least current usage, very similar to the FairScheduler's FairSharePolicy. The policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight-style adjustment. An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending):
- Current resource usage: less usage is lesser
- Submission time: earlier is lesser
Optionally, based on a configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the value Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to a comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
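For illustration, a minimal sketch of the comparison described above; the SchedulerProcess accessor names are assumptions, and only the ordering rules mirror the description:
{code}
import java.util.Comparator;

// Sketch of the fair comparison described above. The Proc accessors are
// stand-ins for whatever SchedulerProcess ends up exposing.
public class FairComparatorSketch
    implements Comparator<FairComparatorSketch.Proc> {

  interface Proc {
    long getCurrentUsageMB();
    long getDemandMB();
    long getSubmissionTime();
    String getName();
  }

  private final boolean sizeBasedWeight;

  public FairComparatorSketch(boolean sizeBasedWeight) {
    this.sizeBasedWeight = sizeBasedWeight;
  }

  private double adjustedUsage(Proc p) {
    double usage = p.getCurrentUsageMB();
    if (sizeBasedWeight) {
      // Boost larger applications by dividing usage by log2(1 + demand),
      // per the Math.log1p(demand) / Math.log(2) adjustment above.
      usage /= Math.log1p(p.getDemandMB()) / Math.log(2);
    }
    return usage;
  }

  @Override
  public int compare(Proc a, Proc b) {
    // Ascending order assigns containers; iterating in reverse preempts.
    int cmp = Double.compare(adjustedUsage(a), adjustedUsage(b));
    if (cmp == 0) {
      cmp = Long.compare(a.getSubmissionTime(), b.getSubmissionTime());
    }
    if (cmp == 0) {
      cmp = a.getName().compareTo(b.getName()); // lexical FIFO fallback
    }
    return cmp;
  }
}
{code}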
[jira] [Updated] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3368: -- Description: The goal is to improve the YARN UI for better usability. We may take advantage of some existing front-end frameworks to build a fancier, easier-to-use UI. The old UI will continue to exist until we feel it's ready to flip to the new UI. This serves as an umbrella jira to track the tasks. We can do this in a branch. was: The goal is to improve the YARN UI for better usability. We may take advantage of some existing front-end frameworks to build a fancier, easier-to-use UI. The old UI will continue to exist until we feel it's ready to flip to the new UI. Improve YARN web UI --- Key: YARN-3368 URL: https://issues.apache.org/jira/browse/YARN-3368 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He The goal is to improve the YARN UI for better usability. We may take advantage of some existing front-end frameworks to build a fancier, easier-to-use UI. The old UI will continue to exist until we feel it's ready to flip to the new UI. This serves as an umbrella jira to track the tasks. We can do this in a branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned YARN-3369: -- Assignee: Brahma Reddy Battula Missing NullPointer check in AppSchedulingInfo causes RM to die Key: YARN-3369 URL: https://issues.apache.org/jira/browse/YARN-3369 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Giovanni Matteo Fumarola Assignee: Brahma Reddy Battula In AppSchedulingInfo.java the method checkForDeactivation() has these 2 consecutive lines:
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request.getNumContainers() > 0) {
{code}
the first line calls getResourceRequest and it can return null.
{code}
synchronized public ResourceRequest getResourceRequest(
    Priority priority, String resourceName) {
  Map<String, ResourceRequest> nodeRequests = requests.get(priority);
  return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
}
{code}
The second line dereferences the pointer directly without a check. If the pointer is null, the RM dies. {quote}2015-03-17 14:14:04,757 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739) at java.lang.Thread.run(Thread.java:722) {color:red}*2015-03-17 14:14:04,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..*{color} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368339#comment-14368339 ] Hadoop QA commented on YARN-2828: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705458/YARN-2828.005.patch against trunk revision 20b4922. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7020//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7020//console This message is automatically generated. Enable auto refresh of web pages (using http parameter) --- Key: YARN-2828 URL: https://issues.apache.org/jira/browse/YARN-2828 Project: Hadoop YARN Issue Type: Improvement Reporter: Tim Robertson Assignee: Vijay Bhat Priority: Minor Attachments: YARN-2828.001.patch, YARN-2828.002.patch, YARN-2828.003.patch, YARN-2828.004.patch, YARN-2828.005.patch The MR1 Job Tracker had a useful HTTP parameter, e.g. refresh=3, that could be appended to URLs to enable a periodic page reload. This was very useful when developing mapreduce jobs, especially to watch counters changing. This is lost in the YARN interface. It could be implemented as a page element (e.g. a drop-down), but I'd recommend not cluttering the page further and simply bringing back the optional refresh HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3372) Collision-free unique bindings refresh APIs for service records
Gopal V created YARN-3372: - Summary: Collision-free unique bindings refresh APIs for service records Key: YARN-3372 URL: https://issues.apache.org/jira/browse/YARN-3372 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Gopal V The current bind() operation binds to a hard-coded entry name for the service record, which makes it impossible for a truly distributed application without a centralized service to register without pre-determined naming conventions. The uniqueness does not need to guarantee ordering or any other leakage of abstractions, merely that each bind() returns a unique path the record was bound to, and that the TTL refresh can periodically update that exact record as an active API. These are stateless auto-configuration mechanisms inspired by the IPv6 improvements over DNS for resolution. Instead of relying on ICMPv6, this uses the registry to keep a collective memory of the unique identities to which endpoints are delegated. This is only obliquely related to the Slider registration, as even those do not track the generational ids for restarted daemons from the same container-id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
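For illustration, an entirely hypothetical sketch of what a collision-free bind could look like, layered over a registry-like interface:
{code}
import java.util.UUID;

// Hypothetical API sketch: bind under a generated leaf name so that no two
// callers collide, and return the unique path for later TTL refreshes of
// that exact record.
public class UniqueBindSketch {

  interface Registry {  // stand-in for the registry operations API
    void bind(String path, String serviceRecordJson);
  }

  static String bindUnique(Registry registry, String parentPath,
      String serviceRecordJson) {
    // Uniqueness only; no ordering or other guarantees are implied.
    String uniquePath = parentPath + "/" + UUID.randomUUID();
    registry.bind(uniquePath, serviceRecordJson);
    return uniquePath;  // caller refreshes this exact record before TTL expiry
  }
}
{code}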
[jira] [Created] (YARN-3373) TTL identity aware read cache for the SRV records
Gopal V created YARN-3373: - Summary: TTL identity aware read cache for the SRV records Key: YARN-3373 URL: https://issues.apache.org/jira/browse/YARN-3373 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Gopal V The freshness/staleness checks of the SRV record should be an abstracted implementation detail of the service registry. As it stands, every client is asked to call listServiceRecords each time it requires a list of the records, which would be incredibly expensive if it involved network round-trips during normal tight-loop operations. The combination of unique binding records and the TTL provides the equivalent of the DNS (fixed CNAME -> unique A) roll-over mechanism used to cache-bust effectively on the client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
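For illustration, a minimal client-side TTL read cache in the spirit described above (all names are illustrative):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative TTL read cache: tight-loop lookups are served from memory
// and only go back to the registry once the entry's TTL has lapsed.
public class TtlRecordCache<V> {

  private static final class Entry<T> {
    final T value;
    final long expiresAtMillis;
    Entry(T value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  public interface Loader<T> {
    T load(String key);  // stand-in for a listServiceRecords round-trip
  }

  private final Map<String, Entry<V>> cache =
      new ConcurrentHashMap<String, Entry<V>>();
  private final long ttlMillis;
  private final Loader<V> loader;

  public TtlRecordCache(long ttlMillis, Loader<V> loader) {
    this.ttlMillis = ttlMillis;
    this.loader = loader;
  }

  public V get(String key) {
    long now = System.currentTimeMillis();
    Entry<V> e = cache.get(key);
    if (e == null || now >= e.expiresAtMillis) {
      // Stale or missing: refresh from the registry and re-arm the TTL.
      V fresh = loader.load(key);
      cache.put(key, new Entry<V>(fresh, now + ttlMillis));
      return fresh;
    }
    return e.value;
  }
}
{code}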
[jira] [Updated] (YARN-3331) NodeManager should use directory other than tmp for extracting and loading leveldbjni
[ https://issues.apache.org/jira/browse/YARN-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3331: Attachment: YARN-3331.002.patch Addressed the feedback. Thanks [~aw] for the very specific feedback. NodeManager should use directory other than tmp for extracting and loading leveldbjni - Key: YARN-3331 URL: https://issues.apache.org/jira/browse/YARN-3331 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3331.001.patch, YARN-3331.002.patch /tmp can be required to be noexec in many environments. This causes a problem when the NodeManager tries to load the leveldbjni library, which can get unpacked into and executed from /tmp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
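For illustration, one workaround direction is to point the JVM temp directory, which the leveldbjni native library is typically extracted into by default, at an NM-owned directory that is not mounted noexec. This is an assumption about the shape of the fix, not the attached patch itself, and the path is a placeholder:
{noformat}
# yarn-env.sh sketch (path is illustrative): keep native extraction off a
# noexec /tmp
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -Djava.io.tmpdir=/var/lib/hadoop-yarn/tmp"
{noformat}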
[jira] [Commented] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368482#comment-14368482 ] Rohith commented on YARN-3284: -- Thanks [~xgong] for working on this jira and [~leftnoteasy] for the review. Since the patch size is huge, I think this task can be logically divided into 3 sub-tasks, which would help the reviewer do a granular review and help the implementer rebase the code:
# API changes, including protos
# Web UI, including updating metrics
# Application CLI
Any thoughts? Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch, YARN-3284.2.patch, YARN-3284.3.patch, YARN-3284.3.rebase.patch, YARN-3284.4.patch Currently, we have some extra metrics about the application and the current attempt in the RM Web UI. We should expose that information through the YARN command too. 1. Preemption metrics 2. Application outstanding resource requests 3. Container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368463#comment-14368463 ] Naganarasimha G R commented on YARN-3362: - Thanks [~wangda]. Regarding the display approach, I had a few concerns:
* There will be some common queue metrics across the labels; won't they get repeated for each label if a queue is mapped to multiple labels?
* IIUC, most of the queue metrics might not be specific to a label, like capacity, absolute max capacity, max apps, max AMs per user, etc. Correct me if my understanding on this is wrong.
* Apart from the label-specific queue metrics (label capacity, label abs capacity, used), are there any new label-specific queue metrics you have in mind?
* Would it be better to list them like:
{noformat}
+ root [=] 30% used
  + a [===] 75% used
    + a1 [=] 30% used
      | Queue Metrics |
      | metrics1 | value1 |
      | metrics2 | value2 |
      | Active Users info (yarn-3273) |
      | user1 | info |
      | user2 | info |
      | Label Resource usage info |
      | label_x [=] 30% used |
      | label_y [] 20% used |
    + a2 [=] 30% used
    ...
{noformat}
* Also, if required, we can have a separate page (or put it in the labels page, or append it at the end of the CS page) like:
{noformat}
+ label_x [=] 30% used [Actual Resource - Used resource]
  + root [=] 30% used [Actual Resource - Used resource]
    + a [===] 75% used [Actual Resource - Used resource]
      + a1 [=] 30% used [Actual Resource - Used resource]
+ label_y
  + root [...]
  + ...
+ label_z
  + root [...]
{noformat}
YARN-3273 has added more info to the CS page, so we need to consider the size of the page and its usability. Please provide your thoughts on the same. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R We don't have node label usage in the RM CapacityScheduler web UI now; without this, users will find it hard to understand what happened to nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367669#comment-14367669 ] Yongjun Zhang commented on YARN-3021: - Hi [~jianhe], Thanks for your comment. I'm actually aligned with what you suggested. The problem I was trying to point out is that we will have to change the behavior of the code I pasted above to deal with a null renewer. E.g., the {{getRenewer()}} method will return a non-null value in the current implementation (if not set or found, TRIVIAL_RENEWER will be returned); after making the suggested change for this jira, the renewer can be null, so we should return null from {{getRenewer()}}. My question was that I'm not sure about the impact of this behavior change; I expect some applications count on the current behavior. More comments? Thanks. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one-way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble an error back to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367668#comment-14367668 ] Hadoop QA commented on YARN-3241: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705391/YARN-3241.001.patch against trunk revision 9d72f93. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7013//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7013//console This message is automatically generated. Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler Key: YARN-3241 URL: https://issues.apache.org/jira/browse/YARN-3241 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3241.000.patch, YARN-3241.001.patch Leading spaces, trailing spaces and empty sub queue names may cause a MetricsException ("Metrics source XXX already exists!") when adding an application to the FairScheduler. The reason is that QueueMetrics parses the queue name differently from the QueueManager. QueueMetrics uses Q_SPLITTER to parse the queue name; it will remove leading and trailing spaces in the sub queue names and will also remove empty sub queue names.
{code}
static final Splitter Q_SPLITTER =
    Splitter.on('.').omitEmptyStrings().trimResults();
{code}
But QueueManager won't remove leading spaces, trailing spaces or empty sub queue names. This causes FSQueue and FSQueueMetrics to go out of sync: QueueManager will think two queue names are different, so it will try to create a new queue, but FSQueueMetrics will treat the two queue names as the same queue, which raises the "Metrics source XXX already exists!" MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
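A small self-contained demo of the parsing mismatch described above, using the same Guava splitter (the queue name is made up):
{code}
import com.google.common.base.Splitter;

public class QueueNameParseDemo {
  static final Splitter Q_SPLITTER =
      Splitter.on('.').omitEmptyStrings().trimResults();

  public static void main(String[] args) {
    String queueName = "root. q1 ..q2";  // made-up name with spaces/empties

    // QueueMetrics-style parsing: trimmed, empties dropped -> [root][q1][q2]
    for (String part : Q_SPLITTER.split(queueName)) {
      System.out.print("[" + part + "]");
    }
    System.out.println();

    // Raw split, roughly how QueueManager sees it: the spaces and the empty
    // component survive -> [root][ q1 ][][q2]
    for (String part : queueName.split("\\.")) {
      System.out.print("[" + part + "]");
    }
    System.out.println();
  }
}
{code}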
[jira] [Updated] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3334: - Attachment: YARN-3334-demo.patch Updated a demo patch for putting some metrics info into the new TimelineService. It doesn't include any tests yet, but I will add them soon. [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2. - Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3334-demo.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3351) AppMaster tracking URL is broken in HA
[ https://issues.apache.org/jira/browse/YARN-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367719#comment-14367719 ] Hadoop QA commented on YARN-3351: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705406/YARN-3351.002.patch against trunk revision 402817c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.util.TestWebAppUtils Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7015//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7015//console This message is automatically generated. AppMaster tracking URL is broken in HA -- Key: YARN-3351 URL: https://issues.apache.org/jira/browse/YARN-3351 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3351.001.patch, YARN-3351.002.patch After YARN-2713, the AppMaster link is broken in HA. 
To repro: a) set up RM HA and ensure the first RM is not active, b) run a long sleep job and view the tracking URL on the RM applications page. The log and full stack trace are shown below. {noformat} 2015-02-05 20:47:43,478 WARN org.mortbay.log: /proxy/application_1423182188062_0002/: java.net.BindException: Cannot assign requested address {noformat} {noformat} java.net.BindException: Cannot assign requested address at java.net.PlainSocketImpl.socketBind(Native Method) at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376) at java.net.Socket.bind(Socket.java:631) at java.net.Socket.<init>(Socket.java:423) at java.net.Socket.<init>(Socket.java:280) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122) at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:188) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:345) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367465#comment-14367465 ] Sunil G commented on YARN-2003: --- Thank you [~leftnoteasy] for sharing the comments. Yes, YARN-2003 will focus on RM-related changes, excluding changes to the Scheduler. I will rearrange the code accordingly and update the patch. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from the Submission Context and store it. Later this can be used by the Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3241: Attachment: YARN-3241.001.patch Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler Key: YARN-3241 URL: https://issues.apache.org/jira/browse/YARN-3241 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3241.000.patch, YARN-3241.001.patch Leading spaces, trailing spaces and empty sub queue names may cause a MetricsException ("Metrics source XXX already exists!") when adding an application to the FairScheduler. The reason is that QueueMetrics parses the queue name differently from the QueueManager. QueueMetrics uses Q_SPLITTER to parse the queue name; it will remove leading and trailing spaces in the sub queue names and will also remove empty sub queue names.
{code}
static final Splitter Q_SPLITTER =
    Splitter.on('.').omitEmptyStrings().trimResults();
{code}
But QueueManager won't remove leading spaces, trailing spaces or empty sub queue names. This causes FSQueue and FSQueueMetrics to go out of sync: QueueManager will think two queue names are different, so it will try to create a new queue, but FSQueueMetrics will treat the two queue names as the same queue, which raises the "Metrics source XXX already exists!" MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367539#comment-14367539 ] Yongjun Zhang commented on YARN-3021: - Possibly introduce a dummy renewer class and make its methods no-ops, instead of setting the renewer to null? I wonder whether this would be a compatible change ... YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one-way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble an error back to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
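For illustration, a sketch of the dummy-renewer idea shaped after Hadoop's TokenRenewer SPI; what renew() should return and which kinds it should claim are assumptions, not a decided design:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenRenewer;

// Sketch of the "dummy renewer" idea from the comment above: a renewer
// whose operations are no-ops, so callers never see a null renewer.
public class NoOpTokenRenewer extends TokenRenewer {

  @Override
  public boolean handleKind(Text kind) {
    return true;  // sketch: claim any kind handed to it
  }

  @Override
  public boolean isManaged(Token<?> token) throws IOException {
    return false;  // never schedule automatic renewal
  }

  @Override
  public long renew(Token<?> token, Configuration conf) throws IOException {
    return Long.MAX_VALUE;  // no-op: pretend the token never expires
  }

  @Override
  public void cancel(Token<?> token, Configuration conf) throws IOException {
    // no-op
  }
}
{code}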
[jira] [Created] (YARN-3365) Add support for using the 'tc' tool via container-executor
Sidharta Seethana created YARN-3365: --- Summary: Add support for using the 'tc' tool via container-executor Key: YARN-3365 URL: https://issues.apache.org/jira/browse/YARN-3365 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the following functionality: 1) modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes, etc. 2) read the existing rules in place 3) read stats for the various classes. Using tc requires elevated privileges, hence this functionality is to be made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
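For illustration, the kinds of tc invocations this would wrap might look like the following; the device name, handles, and rates are placeholders, not values chosen by this jira:
{noformat}
# 1) modify traffic shaping rules: attach a qdisc, create a child class
tc qdisc add dev eth0 root handle 42: htb
tc class add dev eth0 parent 42: classid 42:1 htb rate 500mbit

# 2) read existing rules in place
tc qdisc show dev eth0
tc class show dev eth0

# 3) read stats for the various classes
tc -s class show dev eth0
{noformat}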