[jira] [Commented] (YARN-3525) Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores
[ https://issues.apache.org/jira/browse/YARN-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507381#comment-14507381 ] Wei Yan commented on YARN-3525: --- Hi, [~bibinchundatt], I think the reason is that these properties are looked up in yarn-site.xml. Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores Key: YARN-3525 URL: https://issues.apache.org/jira/browse/YARN-3525 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Rename the two properties below, since they are used only by the fair scheduler: {color:blue}yarn.scheduler.increment-allocation-mb{color} to {color:red}yarn.scheduler.fair.increment-allocation-mb{color} {color:blue}yarn.scheduler.increment-allocation-vcores{color} to {color:red}yarn.scheduler.fair.increment-allocation-vcores{color} All other fair-scheduler-only properties already use the {color:red}yarn.scheduler.fair{color} prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
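As a point of reference for the rename discussed above, here is a minimal, hypothetical sketch of reading the proposed new property names through the common Configuration API; the default values (1024 MB, 1 vcore) are illustrative assumptions, not taken from this JIRA.
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: look up the proposed fair-scheduler property names from
// yarn-site.xml via Configuration. Defaults below are illustrative only.
public class FairIncrementConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource("yarn-site.xml"); // these properties are looked up here
    int incrementMb = conf.getInt("yarn.scheduler.fair.increment-allocation-mb", 1024);
    int incrementVcores = conf.getInt("yarn.scheduler.fair.increment-allocation-vcores", 1);
    System.out.println("increment-allocation-mb=" + incrementMb
        + ", increment-allocation-vcores=" + incrementVcores);
  }
}
{code}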
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507332#comment-14507332 ] Lei Guo commented on YARN-1963: --- Agree with [~jlowe]: the integer is the base of priority, and a label should be just an alias used during application submission. If we keep both labels and integers in the system, it could become complicated when the administrator changes the label/range mapping. It's true that we do not expect the user to assign many different priorities, but we may enhance the scheduler to calculate priority dynamically based on certain criteria, for example the pending time or the current time frame. In that case, the priority could be any number. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
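To make the "integer is the base, label is an alias" idea above concrete, here is a small hypothetical sketch (class, label names, and values are illustrative and not part of any YARN API): labels are resolved to integers once at submission time, so a later change to the label mapping by an administrator does not affect already-submitted applications.
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of resolving priority labels to integers at submission time.
public class PriorityLabelResolver {
  private final Map<String, Integer> labelToPriority = new HashMap<String, Integer>();

  public PriorityLabelResolver() {
    // Illustrative mapping an administrator might configure.
    labelToPriority.put("LOW", 1);
    labelToPriority.put("NORMAL", 5);
    labelToPriority.put("HIGH", 10);
  }

  /** Resolve a submitted label to its integer priority; plain integers pass through. */
  public int resolve(String submitted) {
    Integer mapped = labelToPriority.get(submitted);
    return mapped != null ? mapped : Integer.parseInt(submitted);
  }

  public static void main(String[] args) {
    PriorityLabelResolver resolver = new PriorityLabelResolver();
    System.out.println(resolver.resolve("HIGH")); // 10
    System.out.println(resolver.resolve("42"));   // 42, so dynamically calculated priorities remain possible
  }
}
{code}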
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507385#comment-14507385 ] Rohith commented on YARN-3225: -- +1 (non-binding), LGTM. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, YARN-3225-4.patch, YARN-3225-5.patch, YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507029#comment-14507029 ] Hadoop QA commented on YARN-2740: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 10 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 22s | The applied patch generated 5 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 0s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 0s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 54m 14s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 100m 10s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727201/YARN-2740.20150422-2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7444/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7444//console | This message was automatically generated. 
ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507396#comment-14507396 ] Naganarasimha G R commented on YARN-2740: - The unit test failure is not related to this patch. I can fix the whitespace issue, but I am not clear about the checkstyle output; I will correct it once I get some confirmation from Allen. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507255#comment-14507255 ] sandflee commented on YARN-3387: It seems to be a bug in launchAM in MockRM.java. launchAM does: 1) wait for the app to become ACCEPTED (after this the appAttempt is created); 2) node heartbeat; 3) wait for the appAttempt to become ALLOCATED. If the node heartbeat is handled before the appAttempt becomes SCHEDULED, the appAttempt state will never reach ALLOCATED unless another NM heartbeat comes, just as in the failed cases https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testShouldNotCountFailureToMaxAttemptRetry/ https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testPreemptedAMRestartOnRMRestart/ container complete message couldn't pass to am if am restarted and rm changed - Key: YARN-3387 URL: https://issues.apache.org/jira/browse/YARN-3387 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: sandflee Priority: Critical Labels: patch Attachments: YARN-3387.001.patch, YARN-3387.002.patch Suppose AM work preserving and RM HA are enabled. The container-complete message is passed to appAttempt.justFinishedContainers in the RM. Normally, all attempts of one app share the same justFinishedContainers, but after the RM changes, every attempt has its own justFinishedContainers, so in the situation below the container-complete message cannot be passed to the AM: 1) the AM restarts; 2) the RM changes; 3) a container launched by the first AM completes. The container-complete message will be passed to appAttempt1, not appAttempt2, but the AM pulls finished containers from appAttempt2 (currentAppAttempt). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
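A minimal sketch of the ordering fix implied by the comment above, written against the MockRM/MockNM test helpers as I recall them; the exact method signatures (waitForState, nodeHeartbeat, sendAMLaunched) should be treated as assumptions and checked against the actual MockRM.java. The key point is waiting for SCHEDULED before the heartbeat so the allocation race cannot occur.
{code}
import org.apache.hadoop.yarn.server.resourcemanager.MockAM;
import org.apache.hadoop.yarn.server.resourcemanager.MockNM;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

// Sketch of a launchAM that only heartbeats once the attempt is SCHEDULED.
public class LaunchAMOrderingSketch {
  public static MockAM launchAM(RMApp app, MockRM rm, MockNM nm) throws Exception {
    RMAppAttempt attempt = app.getCurrentAppAttempt();
    // Wait until the attempt is SCHEDULED so the heartbeat below cannot be
    // processed while the attempt is still waiting to be scheduled.
    rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.SCHEDULED);
    nm.nodeHeartbeat(true); // the heartbeat triggers the AM container allocation
    rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.ALLOCATED);
    return rm.sendAMLaunched(attempt.getAppAttemptId());
  }
}
{code}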
[jira] [Commented] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.
[ https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507286#comment-14507286 ] Steve Loughran commented on YARN-2444: -- +add test to submit 100+K events and see what happens. Primary filters added after first submission not indexed, cause exceptions in logs. --- Key: YARN-2444 URL: https://issues.apache.org/jira/browse/YARN-2444 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.5.0 Reporter: Marcelo Vanzin Assignee: Steve Loughran Attachments: YARN-2444-001.patch, ats.java, org.apache.hadoop.yarn.server.timeline.TestTimelineClientPut-output.txt See attached code for an example. The code creates an entity with a primary filter, submits it to the ATS. After that, a new primary filter value is added and the entity is resubmitted. At that point two things can be seen: - Searching for the new primary filter value does not return the entity - The following exception shows up in the logs: {noformat} 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying access for user dr.who (auth:SIMPLE) on the events of the timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } is corrupted. at org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67) at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
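The reproduction described above can be sketched against the public TimelineClient/TimelineEntity API roughly as follows; this is a hedged sketch rather than the attached ats.java, and the entity id, type, and filter values are illustrative placeholders.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;

// Sketch: put an entity with one primary filter, then re-put it with an extra
// filter value; per this JIRA the new value is not indexed by the ATS.
public class PrimaryFilterRepro {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new Configuration());
    client.start();
    try {
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityId("testid-1234");
      entity.setEntityType("test");
      entity.setStartTime(System.currentTimeMillis());
      entity.addPrimaryFilter("user", "alice");
      client.putEntities(entity); // first submission: "alice" is indexed

      entity.addPrimaryFilter("user", "bob"); // value added after first submission
      client.putEntities(entity); // resubmission: "bob" is not searchable
    } finally {
      client.stop();
    }
  }
}
{code}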
[jira] [Commented] (YARN-3495) Confusing log generated by FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507084#comment-14507084 ] Hudson commented on YARN-3495: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2121/]) YARN-3495. Confusing log generated by FairScheduler. Contributed by Brahma Reddy Battula. (ozawa: rev 105afd54779852c518b978101f23526143e234a5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt Confusing log generated by FairScheduler Key: YARN-3495 URL: https://issues.apache.org/jira/browse/YARN-3495 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: YARN-3495.patch 2015-04-16 12:03:48,531 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should be settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507137#comment-14507137 ] Wangda Tan commented on YARN-3413: -- The test build failures seem to be caused by the build environment; I can get the build to pass locally, so I retriggered Jenkins. The checkstyle and whitespace checks are newly added with Allen's patch; I will try to fix them. Node label attributes (like exclusivity) should be settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch As mentioned in https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947, changing node label exclusivity and/or other attributes may not be a real use case, and we should also support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434.patch Upmerged patch to latest. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0, yet a user was able to consume 1.4X the queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the user limit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3495) Confusing log generated by FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507039#comment-14507039 ] Hudson commented on YARN-3495: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/172/]) YARN-3495. Confusing log generated by FairScheduler. Contributed by Brahma Reddy Battula. (ozawa: rev 105afd54779852c518b978101f23526143e234a5) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java Confusing log generated by FairScheduler Key: YARN-3495 URL: https://issues.apache.org/jira/browse/YARN-3495 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: YARN-3495.patch 2015-04-16 12:03:48,531 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507045#comment-14507045 ] Hudson commented on YARN-3410: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/172/]) YARN-3410. YARN admin should be able to remove individual application records from RMStateStore. (Rohith Sharmaks via wangda) (wangda: rev e71d0d87d9b388f211a8eb3d2cd9af347abf9bda) * hadoop-yarn-project/hadoop-yarn/bin/yarn * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestLeveldbRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java YARN-3410. Addendum fix for compilation error. Contributed by Rohith. 
(aajisaka: rev b08908ae5eaf60a7fc70bf60493a533e915553c5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md YARN admin should be able to remove individual application records from RMStateStore Key: YARN-3410 URL: https://issues.apache.org/jira/browse/YARN-3410 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, yarn Reporter: Wangda Tan Assignee: Rohith Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3410-v1.patch, 0001-YARN-3410.patch, 0001-YARN-3410.patch, 0002-YARN-3410.patch, 0003-YARN-3410.patch, 0004-YARN-3410-addendum-branch-2.patch, 0004-YARN-3410-addendum.patch, 0004-YARN-3410-branch-2.patch, 0004-YARN-3410.patch When the RM state store has entered an unexpected state (one example is YARN-2340: an attempt is not in a final state but the app has already completed), the RM can never come up unless the RMStateStore is formatted. I think we should support removing individual application records from the RMStateStore, to unblock the RM and let the admin choose between waiting for a fix and formatting the state store. In addition, the RM should be able to report all fatal errors (which will shut down the RM) during app recovery; this can save the admin some time in removing apps in a bad state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3503) Expose disk utilization percentage and bad local and log dir counts on NM via JMX
[ https://issues.apache.org/jira/browse/YARN-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507050#comment-14507050 ] Hudson commented on YARN-3503: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/172/]) YARN-3503. Expose disk utilization percentage and bad local and log dir counts in NM metrics. Contributed by Varun Vasudev (jianhe: rev 674c7ef64916fabbe59c8d6cdd50ca19cf7ddb7c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java Expose disk utilization percentage and bad local and log dir counts on NM via JMX - Key: YARN-3503 URL: https://issues.apache.org/jira/browse/YARN-3503 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-3503.0.patch It would be useful to expose the disk utilization as well as the number of bad local disks on the NMs via JMX so that alerts can be setup for nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3494) Expose AM resource limit and usage in QueueMetrics
[ https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507044#comment-14507044 ] Hudson commented on YARN-3494: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/172/]) YARN-3494. Expose AM resource limit and usage in CS QueueMetrics. Contributed by Rohith Sharmaks (jianhe: rev bdd90110e6904b59746812d9a093924a65e72280) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueMetrics.java Expose AM resource limit and usage in QueueMetrics --- Key: YARN-3494 URL: https://issues.apache.org/jira/browse/YARN-3494 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3494.patch, 0002-YARN-3494.patch, 0002-YARN-3494.patch Now we have the AM resource limit and user limit shown on the web UI, it would be useful to expose them in the QueueMetrics as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3503) Expose disk utilization percentage and bad local and log dir counts on NM via JMX
[ https://issues.apache.org/jira/browse/YARN-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507095#comment-14507095 ] Hudson commented on YARN-3503: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2121/]) YARN-3503. Expose disk utilization percentage and bad local and log dir counts in NM metrics. Contributed by Varun Vasudev (jianhe: rev 674c7ef64916fabbe59c8d6cdd50ca19cf7ddb7c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java Expose disk utilization percentage and bad local and log dir counts on NM via JMX - Key: YARN-3503 URL: https://issues.apache.org/jira/browse/YARN-3503 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-3503.0.patch It would be useful to expose the disk utilization as well as the number of bad local disks on the NMs via JMX so that alerts can be setup for nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507090#comment-14507090 ] Hudson commented on YARN-3410: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2121/]) YARN-3410. YARN admin should be able to remove individual application records from RMStateStore. (Rohith Sharmaks via wangda) (wangda: rev e71d0d87d9b388f211a8eb3d2cd9af347abf9bda) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestLeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/bin/yarn * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java YARN-3410. Addendum fix for compilation error. Contributed by Rohith. 
(aajisaka: rev b08908ae5eaf60a7fc70bf60493a533e915553c5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md YARN admin should be able to remove individual application records from RMStateStore Key: YARN-3410 URL: https://issues.apache.org/jira/browse/YARN-3410 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, yarn Reporter: Wangda Tan Assignee: Rohith Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3410-v1.patch, 0001-YARN-3410.patch, 0001-YARN-3410.patch, 0002-YARN-3410.patch, 0003-YARN-3410.patch, 0004-YARN-3410-addendum-branch-2.patch, 0004-YARN-3410-addendum.patch, 0004-YARN-3410-branch-2.patch, 0004-YARN-3410.patch When RM state store entered an unexpected state, one example is YARN-2340, when an attempt is not in final state but app already completed, RM can never get up unless format RMStateStore. I think we should support remove individual application records from RMStateStore to unblock RM admin make choice of either waiting for a fix or format state store. In addition, RM should be able to report all fatal errors (which will shutdown RM) when doing app recovery, this can save admin some time to remove apps in bad state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3494) Expose AM resource limit and usage in QueueMetrics
[ https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507089#comment-14507089 ] Hudson commented on YARN-3494: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2121/]) YARN-3494. Expose AM resource limit and usage in CS QueueMetrics. Contributed by Rohith Sharmaks (jianhe: rev bdd90110e6904b59746812d9a093924a65e72280) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java Expose AM resource limit and usage in QueueMetrics --- Key: YARN-3494 URL: https://issues.apache.org/jira/browse/YARN-3494 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3494.patch, 0002-YARN-3494.patch, 0002-YARN-3494.patch Now we have the AM resource limit and user limit shown on the web UI, it would be useful to expose them in the QueueMetrics as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507123#comment-14507123 ] Hadoop QA commented on YARN-3225: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 28s | The applied patch generated 18 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 48s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 17s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 55m 49s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 110m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727006/YARN-3225-5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7446/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7446//console | This message was automatically generated. 
New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, YARN-3225-4.patch, YARN-3225-5.patch, YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506991#comment-14506991 ] gu-chi commented on YARN-2308: -- Hi, Chang Li, as I went through the patches that you attached, previously there was {code} +if (application == null) { +  LOG.info("can't retrieve application attempt"); +  return; +} {code} but the patch that was finally merged does not have this modification. Was this changed on purpose? What was the concern? I am now facing a scenario where the app status is FINISHED and the AppAttempt status is null, so during recovery the application is null in CS and the NPE occurs. I think that if the application == null condition were there, the issue I am seeing would not occur. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Chang Li Priority: Critical Fix For: 2.6.0 Attachments: YARN-2308.0.patch, YARN-2308.1.patch, jira2308.patch, jira2308.patch, jira2308.patch I encountered an NPE when the RM restarted {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And the RM fails to restart. This is caused by the queue configuration having changed: I removed some queues and added new ones. So when the RM restarts, it tries to recover the historical applications, and when any queue of these applications has been removed, the NPE is raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507222#comment-14507222 ] Hadoop QA commented on YARN-3434: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 33s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 30s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 17s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 55m 9s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 96m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727222/YARN-3434.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7447/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7447/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7447/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7447//console | This message was automatically generated. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0 User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507181#comment-14507181 ] Junping Du commented on YARN-3225: -- The v5 patch LGTM. The new Jenkins run just reports some trivial format issues but fails to report the details needed to address them. +1. I will go ahead and commit it shortly. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, YARN-3225-4.patch, YARN-3225-5.patch, YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507245#comment-14507245 ] Chang Li commented on YARN-2308: Hi [~gu chi], this jira is intended to fix the NPE caused by a queue that went missing because of a queue configuration change during RM restart. I did some early work on this problem, and my initial approach was to do a null check at the exact place the NPE happened in addApplicationAttempt. Craig Welch carried on with a different approach: the final patch checks whether the queue has been removed and errors out. I think your problem is worth filing a separate jira; I'd also like to take on the issue you mentioned. Thanks. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Chang Li Priority: Critical Fix For: 2.6.0 Attachments: YARN-2308.0.patch, YARN-2308.1.patch, jira2308.patch, jira2308.patch, jira2308.patch I encountered an NPE when the RM restarted {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And the RM fails to restart. This is caused by the queue configuration having changed: I removed some queues and added new ones. So when the RM restarts, it tries to recover the historical applications, and when any queue of these applications has been removed, the NPE is raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
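For readers following this thread, here is an illustrative sketch only (not the actual YARN-2308 patch, whose details are not shown in this thread) of the "err out when the queue was removed" idea: fail fast with a clear message instead of letting a later dereference throw an NPE. The helper class and method names are hypothetical.
{code}
import java.util.Map;

import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueue;

// Hypothetical helper: look up a queue and fail fast if it no longer exists.
public class QueueLookupSketch {
  public static CSQueue getQueueOrFail(Map<String, CSQueue> queues, String queueName) {
    CSQueue queue = queues.get(queueName);
    if (queue == null) {
      throw new YarnRuntimeException("Queue " + queueName
          + " no longer exists; it may have been removed from the scheduler configuration");
    }
    return queue;
  }
}
{code}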
[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507266#comment-14507266 ] sandflee commented on YARN-2038: If the NM re-registers with the RM within a short time, we can add an interface to ApplicationMasterService to tell the AM that the container has come back. If the NM has not re-registered with the RM after the NM expiry time, the RM knows nothing about that NM. Could the AM tell the RM the node and container info through ApplicationMasterService.registerApplicationMaster while re-registering with the RM? With this info, the RM could treat the unregistered NM as a lost node after the NM expiry time and pass the container-complete message to the AM. In this solution, we need the AM to store the container info. Revisit how AMs learn of containers from previous attempts -- Key: YARN-2038 URL: https://issues.apache.org/jira/browse/YARN-2038 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Based on YARN-556, we need to update the way AMs learn about containers allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
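For context on the mechanism this JIRA revisits, the sketch below shows the existing path by which an AM learns of containers that survived a previous attempt: the list returned at registration time. This is a simplified sketch; the host, port, and tracking-URL values are placeholders, and it assumes it runs inside an AM that already holds a valid AMRM token.
{code}
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// Sketch: on (re-)registration the RM hands back the containers from previous attempts.
public class PreviousContainersSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    amrmClient.init(new Configuration());
    amrmClient.start();
    RegisterApplicationMasterResponse response =
        amrmClient.registerApplicationMaster("am-host", 0, "");
    List<Container> previous = response.getContainersFromPreviousAttempts();
    System.out.println("Containers carried over from previous attempts: " + previous.size());
    // A work-preserving AM would re-adopt these containers here and keep its own
    // record of them, which is what the comment above relies on.
  }
}
{code}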
[jira] [Commented] (YARN-3409) Add constraint node labels
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507169#comment-14507169 ] David Villegas commented on YARN-3409: -- Hi [~wangda], Are you planning to make the constraints static, i.e., set by an administrator? Or dynamic, so that they could reflect the current state of the cluster? I was wondering if this type of label could be used to implement anti-affinity as described in YARN-1042. It seems to me this feature could potentially be similar to [Condor ClassAds|http://research.cs.wisc.edu/htcondor/manual/v7.6/4_1Condor_s_ClassAd.html], where a container request could specify things like the average load of the machine, or whether it is already running containers for a particular application type. Add constraint node labels -- Key: YARN-3409 URL: https://issues.apache.org/jira/browse/YARN-3409 Project: Hadoop YARN Issue Type: Sub-task Components: api, capacityscheduler, client Reporter: Wangda Tan Assignee: Wangda Tan Specifying only one label for each node (in other words, partitioning a cluster) is a way to determine how the resources of a specific set of nodes can be shared by a group of entities (like teams, departments, etc.). Partitions of a cluster have the following characteristics: - The cluster is divided into several disjoint sub-clusters. - ACLs/priority can apply to a partition (only the market team has priority to use the partition). - Percentages of capacity can apply to a partition (the market team has 40% minimum capacity and the dev team has 60% minimum capacity of the partition). Constraints are orthogonal to partitions; they describe attributes of a node's hardware/software, just for affinity. Some examples of constraints: - glibc version - JDK version - type of CPU (x86_64/i686) - type of OS (windows, linux, etc.) With this, an application can ask for a resource that has (glibc.version = 2.20, JDK.version = 8u20, x86_64). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
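Since no constraint API exists in YARN at this point, the following is a purely hypothetical sketch of the matching idea described in the issue: a node advertises attribute values and a request lists required values that must all match. The attribute names come from the examples above; the class and everything else is illustrative.
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical constraint matching: every requested attribute must match the node's value.
public class ConstraintMatchSketch {
  public static boolean matches(Map<String, String> nodeAttributes,
                                Map<String, String> requested) {
    for (Map.Entry<String, String> constraint : requested.entrySet()) {
      if (!constraint.getValue().equals(nodeAttributes.get(constraint.getKey()))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> node = new HashMap<String, String>();
    node.put("glibc.version", "2.20");
    node.put("JDK.version", "8u20");
    node.put("cpu", "x86_64");

    Map<String, String> request = new HashMap<String, String>();
    request.put("JDK.version", "8u20");
    request.put("cpu", "x86_64");

    System.out.println(matches(node, request)); // true
  }
}
{code}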
[jira] [Updated] (YARN-3511) Add errors and warnings page to ATS
[ https://issues.apache.org/jira/browse/YARN-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3511: Attachment: YARN-3511.002.patch Added check to ensure only admins can access errors and warnings page. Add errors and warnings page to ATS --- Key: YARN-3511 URL: https://issues.apache.org/jira/browse/YARN-3511 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-3511.001.patch, YARN-3511.002.patch YARN-2901 adds the capability to view errors and warnings on the web UI. The ATS was missed out. Add support for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507416#comment-14507416 ] Rohith commented on YARN-3223: -- [~varun_saxena] Are you working on this JIRA? Would you mind if I take it over if you have not started working on it? Resource update during NM graceful decommission --- Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du Assignee: Varun Saxena During NM graceful decommission, we should handle the resource update properly, including: make RMNode keep track of the old resource for a possible rollback, keep the available resource at 0, and update the used resource as containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3522: -- Attachment: YARN-3522.1.patch DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3522: -- Attachment: (was: YARN-3522.1.patch) DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507503#comment-14507503 ] Zhijie Shen commented on YARN-3522: --- /cc [~jeagles] DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507409#comment-14507409 ] Wangda Tan commented on YARN-2308: -- [~gu chi], the issue you mentioned seems like already solved by YARN-2340. Could you please check? NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Chang Li Priority: Critical Fix For: 2.6.0 Attachments: YARN-2308.0.patch, YARN-2308.1.patch, jira2308.patch, jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. This is caused by queue configuration changed, I removed some queues and added new queues. So when RM restarts, it tries to recover history applications, and when any of queues of these applications removed, NPE will be raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507482#comment-14507482 ] Inigo Goiri commented on YARN-3458: --- Does anybody have input on the findbugs issues? What about the unit test? Any proposals? CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached is a proposal for how to do it: I reused the CpuTimeTracker, using 1 jiffy = 1 ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507504#comment-14507504 ] Junping Du commented on YARN-3411: -- Thanks [~vrushalic] for reply! bq. But I will be uploading a refined patch + some more changes like Metric writing soon. +1. The plan sounds good to me. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507511#comment-14507511 ] Hadoop QA commented on YARN-3413: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 31s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 19 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 30 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 25s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 27s | The applied patch generated 11 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 107m 21s | Tests passed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 28s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 0s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 7m 13s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 54m 14s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 223m 41s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727015/YARN-3413.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7448/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7448//console | This message was automatically generated. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507531#comment-14507531 ] Wangda Tan commented on YARN-3413: -- The failed test (TestAMRestart) is not related to this patch; it passes locally. The checkstyle/whitespace reports lack details and contain only minor formatting suggestions. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch As mentioned in: https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and we should also support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507436#comment-14507436 ] Wangda Tan commented on YARN-2740: -- Latest patch LGTM, +1. The checkstyle result lacks details and contains only minor formatting suggestions; will commit today. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3525) Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores
[ https://issues.apache.org/jira/browse/YARN-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507449#comment-14507449 ] Bibin A Chundatt commented on YARN-3525: Hi [~ywskycn], thank you for looking into the issue. All the fair scheduler properties except the queue-level configuration are specified only in yarn-site.xml (about 13 properties). Reference: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html Apart from the two properties mentioned, all of them follow the same fair-scheduler-specific prefix pattern, so I feel we should align these two properties as well. Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores Key: YARN-3525 URL: https://issues.apache.org/jira/browse/YARN-3525 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Rename the two properties below, since they are only used by the fair scheduler: {color:blue}yarn.scheduler.increment-allocation-mb{color} to {color:red}yarn.scheduler.fair.increment-allocation-mb{color} {color:blue}yarn.scheduler.increment-allocation-vcores{color} to {color:red}yarn.scheduler.fair.increment-allocation-vcores{color} All other fair-scheduler-only properties use the {color:red}yarn.scheduler.fair{color} prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
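For illustration only, assuming the renamed keys proposed above are adopted, the increments would be read from yarn-site.xml roughly as below (the default values here are illustrative, not necessarily the shipped defaults):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FairIncrementConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Proposed fair-scheduler-prefixed names from the JIRA description.
    int incrementMb = conf.getInt("yarn.scheduler.fair.increment-allocation-mb", 1024);
    int incrementVcores = conf.getInt("yarn.scheduler.fair.increment-allocation-vcores", 1);
    System.out.println(incrementMb + " MB / " + incrementVcores + " vcores");
  }
}
{code}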
[jira] [Commented] (YARN-3511) Add errors and warnings page to ATS
[ https://issues.apache.org/jira/browse/YARN-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507489#comment-14507489 ] Hadoop QA commented on YARN-3511: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 55s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 51s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 3m 13s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 54m 29s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 99m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727254/YARN-3511.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7449/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7449/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7449/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7449/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7449//console | This message was automatically generated. Add errors and warnings page to ATS --- Key: YARN-3511 URL: https://issues.apache.org/jira/browse/YARN-3511 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-3511.001.patch, YARN-3511.002.patch YARN-2901 adds the capability to view errors and warnings on the web UI. The ATS was missed out. Add support for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507501#comment-14507501 ] Sunil G commented on YARN-1963: --- Thank you [~grey] for sharing your thoughts. As per the design, only the integer will be used inside the schedulers, so all comparisons and operations can be done on integers. However, we can have a label mapping for each integer which can be used during application submission, for viewing in the UI, etc. Labels would be added only as mappings to integers. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
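A small sketch of the label-to-integer mapping described above, with made-up label names and values; the point is that the label is resolved once at submission time and the scheduler only ever compares the integer:
{code}
import java.util.HashMap;
import java.util.Map;

public class PriorityLabelSketch {
  public static void main(String[] args) {
    // Hypothetical admin-defined aliases; the integers are what the scheduler orders by.
    Map<String, Integer> labelToPriority = new HashMap<>();
    labelToPriority.put("LOW", 1);
    labelToPriority.put("NORMAL", 5);
    labelToPriority.put("HIGH", 10);

    // Resolved once when the application is submitted.
    int submittedPriority = labelToPriority.getOrDefault("HIGH", 5);
    System.out.println(submittedPriority); // 10
  }
}
{code}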
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507516#comment-14507516 ] Hadoop QA commented on YARN-3522: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727285/YARN-3522.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1f4767c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7450//console | This message was automatically generated. DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3522: -- Attachment: YARN-3522.2.patch Previous patch was not generated correctly. Create a new one. DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch, YARN-3522.2.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2605: Attachment: YARN-2605.2.patch [RM HA] Rest api endpoints doing redirect incorrectly - Key: YARN-2605 URL: https://issues.apache.org/jira/browse/YARN-2605 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: bc Wong Assignee: Xuan Gong Labels: newbie Attachments: YARN-2605.1.patch, YARN-2605.2.patch The standby RM's webui tries to do a redirect via meta-refresh. That is fine for pages designed to be viewed by web browsers. But the API endpoints shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd suggest HTTP 303, or return a well-defined error message (json or xml) stating that the standby status and a link to the active RM. The standby RM is returning this today: {noformat} $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Content-Type: text/plain; charset=UTF-8 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics Content-Length: 117 Server: Jetty(6.1.26) This is standby RM. Redirecting to the current active RM: http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
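A minimal sketch of the behaviour bc Wong suggests for the REST endpoints (this is not the attached patch; the class, method and variable names are illustrative): the standby RM would answer /ws/* requests with an HTTP 303 and a Location header instead of a meta-refresh page, so programmatic clients can follow the redirect or surface a clear error.
{code}
import javax.servlet.http.HttpServletResponse;

public class StandbyRedirectSketch {
  static void redirectToActiveRm(HttpServletResponse response, String activeRmUrl) {
    response.setStatus(HttpServletResponse.SC_SEE_OTHER); // 303 See Other
    response.setHeader("Location", activeRmUrl);
    // A short JSON/XML body could additionally state the standby status and the active RM URL.
  }
}
{code}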
[jira] [Updated] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3413: - Attachment: YARN-3413.6.patch Fixed trivial whitespace checks. (Ver.6) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507689#comment-14507689 ] Hadoop QA commented on YARN-3413: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727305/YARN-3413.6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 12f4df0 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7453//console | This message was automatically generated. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: (was: output_minicluster.rtf) Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster2.txt Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: (was: output_minicluster.txt) Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507701#comment-14507701 ] Craig Welch commented on YARN-3319: --- bq. Some minor comments about configuration part by index: 1) done 2) done 3) done - see below bq. Do you think is it better to make property in queue-name.ordering-policy.policy-name.property-key?... Now that there is not proper composition only one policy can be active at a time and it shouldn't be necessary to namespace config items this way. At the same time, I could see us getting back to proper composition at some point, where this would be helpful. I've implemented it as a prefix convention in the policy instead of constraining the contents of the map in the capacity scheduler configuration. This is because we still support passing a class name as the policy type, which would make the configurations for class name based items unwieldy. It would also allow us to have shared configuration items between policies if we do end up with proper composition again. The end result of the configuration was as you suggested 4) done 5) done bq. FairOrderingPolicy: all 3 done bq. Findbugs warning? Failed to stage change, so it didn't make it into patch, should be there now. Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
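A minimal sketch of the sizeBasedWeight adjustment described in the issue (not the actual FairOrderingPolicy code; names are illustrative): the usage value used for ordering is divided by Math.log1p(memory demand) / Math.log(2), so applications with larger demand compare as if they had used less, offsetting the natural preference for small applications.
{code}
public final class SizeBasedWeightSketch {
  static double adjustedUsage(double usedMemoryMb, double demandedMemoryMb) {
    double weight = Math.log1p(demandedMemoryMb) / Math.log(2);
    return usedMemoryMb / weight;
  }

  public static void main(String[] args) {
    // Same usage, different demand: the larger-demand app gets the smaller adjusted value,
    // so it is offered allocations earlier under the fair ordering.
    System.out.println(adjustedUsage(4096, 8192));   // ~315
    System.out.println(adjustedUsage(4096, 131072)); // ~241
  }
}
{code}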
[jira] [Assigned] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
[ https://issues.apache.org/jira/browse/YARN-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-3530: - Assignee: Zhijie Shen ATS throws exception on trying to filter results without otherinfo. --- Key: YARN-3530 URL: https://issues.apache.org/jira/browse/YARN-3530 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Sreenath Somarajapuram Assignee: Zhijie Shen Priority: Blocker Scenario: Am attempting to make data loading faster by fetching otherinfo on demand. As shown in the attached image, the patch adds a 'Load Counters' checkbox. It would be disabled by default, and on clicking, the counter data also would be loaded. Issue: Things are good when otherinfo is loaded. But ATS throws exception on trying to filter on status or applicationId without otherinfo in fields list. In other words, using fields=events,primaryfilters with secondaryFilter=status:RUNNING will return { exception: WebApplicationException, message: java.lang.NullPointerException, javaClassName: javax.ws.rs.WebApplicationException } from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
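For reference, a request of the shape below reproduces the reported behaviour, since otherinfo is absent from the fields list (the host, port, and entity type are placeholders, not values from the report):
{noformat}
GET http://<timeline-server-host>:8188/ws/v1/timeline/<ENTITY_TYPE>?fields=events,primaryfilters&secondaryFilter=status:RUNNING
{noformat}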
[jira] [Created] (YARN-3531) Make good local directories available to ContainerExecutors at initialization time
Sidharta Seethana created YARN-3531: --- Summary: Make good local directories available to ContainerExecutors at initialization time Key: YARN-3531 URL: https://issues.apache.org/jira/browse/YARN-3531 Project: Hadoop YARN Issue Type: Improvement Reporter: Sidharta Seethana Currently, in the NodeManager's serviceInit() function, the configured executor is initialized before the node health checker/directory handler services are initialized. There are use cases where executor initialization requires access to 'good' local directories (e.g., for creation of temporary files; see YARN-3366). We need to figure out a way to make this possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507685#comment-14507685 ] Jian He commented on YARN-3522: --- - I think YARN-3287 in some sense is incompatible, since it forces user to use doAs to create the timeLineClient which is not required before. Is this ok ? I suggest adding a code comment in TimeLineClient#createTimelineClient to say caller must use doAs to create the timeLineClient - start and end event occurred in the same run() method ? {code} if(timelineClient != null) { publishApplicationAttemptEvent(timelineClient, appAttemptID.toString(), DSEvent.DS_APP_ATTEMPT_START, domainId, appSubmitterUgi); } {code} DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch, YARN-3522.2.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
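Building on the review comment above, a minimal sketch of the direction being discussed (this is not the committed YARN-3522 patch): the DS AM would create and start the TimelineClient inside the submitter's UGI, so that the ugi captured at serviceInit (per YARN-3287) is the submitter and later putEntities calls carry that identity.
{code}
// Sketch only, assuming the fix direction discussed above.
// Requires java.security.PrivilegedExceptionAction and
// org.apache.hadoop.yarn.client.api.TimelineClient.
timelineClient = appSubmitterUgi.doAs(
    new PrivilegedExceptionAction<TimelineClient>() {
      @Override
      public TimelineClient run() throws Exception {
        // Create, init, and start the client as the submitting user.
        TimelineClient client = TimelineClient.createTimelineClient();
        client.init(conf);
        client.start();
        return client;
      }
    });
{code}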
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster.rtf Thanks [~gtCarrera9] for filing the jira! Current status: I presently am using the hbase minicluster from HBaseTestingUtility in the unit tests for YARN-3411. Right now, I have my setup working in eclipse. Attaching the eclipse log that shows that a mini hbase cluster/zookeeper/ regionservers are starting and creating tables and shutting down when I run the unit test from org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl. Some relevant code bits: {code} private static HBaseTestingUtility UTIL; @BeforeClass public static void setupBeforeClass() throws Exception { UTIL = new HBaseTestingUtility(); UTIL.startMiniCluster(); createSchema(); } @AfterClass public static void tearDownAfterClass() throws Exception { UTIL.shutdownMiniCluster(); } {code} Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster.rtf After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507576#comment-14507576 ] Wangda Tan commented on YARN-3413: -- Commented on HADOOP-11746: https://issues.apache.org/jira/browse/HADOOP-11746?focusedCommentId=14507573page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507573 as well. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
Li Lu created YARN-3529: --- Summary: Add miniHBase cluster and Phoenix support to ATS v2 unit tests Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
Sreenath Somarajapuram created YARN-3530: Summary: ATS throws exception on trying to filter results without otherinfo. Key: YARN-3530 URL: https://issues.apache.org/jira/browse/YARN-3530 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Sreenath Somarajapuram Priority: Blocker Scenario: Am attempting to make data loading faster by fetching otherinfo on demand. As shown in the attached image, the patch adds a 'Load Counters' checkbox. It would be disabled by default, and on clicking, the counter data also would be loaded. Issue: Things are good when otherinfo is loaded. But ATS throws exception on trying to filter on status or applicationId without otherinfo in fields list. In other words, using fields=events,primaryfilters with secondaryFilter=status:RUNNING will return { exception: WebApplicationException, message: java.lang.NullPointerException, javaClassName: javax.ws.rs.WebApplicationException } from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
[ https://issues.apache.org/jira/browse/YARN-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3530: -- Component/s: (was: yarn) timelineserver Priority: Critical (was: Blocker) Target Version/s: 2.8.0 ATS throws exception on trying to filter results without otherinfo. --- Key: YARN-3530 URL: https://issues.apache.org/jira/browse/YARN-3530 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Sreenath Somarajapuram Assignee: Zhijie Shen Priority: Critical Scenario: Am attempting to make data loading faster by fetching otherinfo on demand. As shown in the attached image, the patch adds a 'Load Counters' checkbox. It would be disabled by default, and on clicking, the counter data also would be loaded. Issue: Things are good when otherinfo is loaded. But ATS throws exception on trying to filter on status or applicationId without otherinfo in fields list. In other words, using fields=events,primaryfilters with secondaryFilter=status:RUNNING will return { exception: WebApplicationException, message: java.lang.NullPointerException, javaClassName: javax.ws.rs.WebApplicationException } from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507641#comment-14507641 ] Hadoop QA commented on YARN-3522: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 46s | The applied patch generated 2 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 6m 43s | Tests failed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | | | 52m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727290/YARN-3522.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1f4767c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7451/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7451/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7451/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7451/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7451//console | This message was automatically generated. DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch, YARN-3522.2.patch YARN-3287 breaks the timeline access control of distributed shell. 
In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507657#comment-14507657 ] Sidharta Seethana commented on YARN-3366: - Hi [~vinodkv], {quote} conf.get(hadoop.tmp.dir): We should write to the nmPrivate directories instead of /tmp. {quote} Digging in this further, it turns out that the change is far from trivial because of the way initialization works in the node manager today. I filed a separate JIRA to track this : https://issues.apache.org/jira/browse/YARN-3531 . I'll update the patch based on the rest of the feedback as discussed above. thanks Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507656#comment-14507656 ] Xuan Gong commented on YARN-2605: - Uploaded a new patch, and verified in a single node HA cluster. [RM HA] Rest api endpoints doing redirect incorrectly - Key: YARN-2605 URL: https://issues.apache.org/jira/browse/YARN-2605 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: bc Wong Assignee: Xuan Gong Labels: newbie Attachments: YARN-2605.1.patch, YARN-2605.2.patch The standby RM's webui tries to do a redirect via meta-refresh. That is fine for pages designed to be viewed by web browsers. But the API endpoints shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd suggest HTTP 303, or return a well-defined error message (json or xml) stating that the standby status and a link to the active RM. The standby RM is returning this today: {noformat} $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Content-Type: text/plain; charset=UTF-8 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics Content-Length: 117 Server: Jetty(6.1.26) This is standby RM. Redirecting to the current active RM: http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster.txt Attaching the eclipse log as a .txt Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.74.patch Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-2168: --- SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch This jira is meant to be used to review the main shared cache APIs. They are as follows: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli closed YARN-2168. - SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch This jira is meant to be used to review the main shared cache APIs. They are as follows: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: (was: output_minicluster2.txt) Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-2168. --- Resolution: Duplicate Fix Version/s: 2.7.0 Resolving this instead as a duplicate. SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch This jira is meant to be used to review the main shared cache APIs. They are as follows: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-2654: --- Revisit all shared cache config parameters to ensure quality names -- Key: YARN-2654 URL: https://issues.apache.org/jira/browse/YARN-2654 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Blocker Attachments: shared_cache_config_parameters.txt Revisit all the shared cache config parameters in YarnConfiguration and yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster2.txt Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-2654. --- Resolution: Won't Fix Closing as 'Won't Fix' Revisit all shared cache config parameters to ensure quality names -- Key: YARN-2654 URL: https://issues.apache.org/jira/browse/YARN-2654 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Blocker Attachments: shared_cache_config_parameters.txt Revisit all the shared cache config parameters in YarnConfiguration and yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434.patch Fixed the line length and the whitespace style issues. Other than that I moved things around and it's just complaining about the same things more. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0. The user was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to let the user limit be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508097#comment-14508097 ] Li Lu commented on YARN-3390: - Hi [~zjshen], thanks for the patch! Here are some of my comments. Most of them are quite minor: # Changes in RMContainerAllocator.java appear to be irrelevant. Seems like this is changed by an IDE by mistake (on a refactoring)? # In the following lines, the first null assignment to value is marked as redundant:
{code}
+    for (String tag : app.getApplicationTags()) {
+      String value = null;
+      if ((value = getFlowContext(TimelineUtils.FLOW_NAME_TAG_PREFIX, tag)) != null
+          && !value.isEmpty()) {
+        collector.getTimelineEntityContext().setFlowName(value);
+      } else if ((value = getFlowContext(TimelineUtils.FLOW_VERSION_TAG_PREFIX, tag)) != null
+          && !value.isEmpty()) {
+        collector.getTimelineEntityContext().setFlowVersion(value);
+      } else if ((value = getFlowContext(TimelineUtils.FLOW_RUN_ID_TAG_PREFIX, tag)) != null
+          && !value.isEmpty()) {
+        collector.getTimelineEntityContext().setFlowRunId(Long.valueOf(value));
+      }
{code}
Maybe we'd like to use a switch statement to deal with this? We may first split the tag into two parts, based on the first ":", and then switch on the first part of the returned array to set the second part of the array into flow name, version, and run id. Am I missing any fundamental obstacles for us to do this here? (String switch is available from Java 7) # Rename {{MyNMTimelineCollectorManager}} in TestTimelineServiceClientIntegration with something indicating it's for testing? # In the following lines:
{code}
-  protected TimelineCollectorContext getTimelineEntityContext() {
+  public TimelineCollectorContext getTimelineEntityContext() {
{code}
We're exposing TimelineCollectorContext but we're not annotating the class. Even though we may treat unannotated classes as Audience.Private, maybe we'd like to mark it as unstable? # In TimelineCollectorManager, I'm still having this question, although we may not want to address it in this JIRA: are there any special consistency requirements that prevent us from using ConcurrentHashMap? # In TimelineCollectorWebService, why are we removing the utility function {{getCollector}}? I think we can reuse it when adding new web services. Reuse TimelineCollectorManager for RM - Key: YARN-3390 URL: https://issues.apache.org/jira/browse/YARN-3390 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3390.1.patch RMTimelineCollector should have the context info of each app whose entity has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
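As a side note on the switch-statement suggestion in the review comment above, here is a minimal standalone sketch of splitting a tag on the first ":" and switching on the prefix. It is not the patch code; the prefix constant values and the FlowContext holder below are placeholders chosen for the example.
{code}
// Illustrative sketch of the string-switch approach (Java 7+), not the YARN-3390 patch.
public class FlowTagParserSketch {

  // Placeholder prefix values; the real constants live in TimelineUtils.
  static final String FLOW_NAME_TAG_PREFIX = "TIMELINE_FLOW_NAME_TAG";
  static final String FLOW_VERSION_TAG_PREFIX = "TIMELINE_FLOW_VERSION_TAG";
  static final String FLOW_RUN_ID_TAG_PREFIX = "TIMELINE_FLOW_RUN_ID_TAG";

  static class FlowContext {
    String flowName;
    String flowVersion;
    long flowRunId;
  }

  // Split each application tag on the first ':' and switch on the prefix part.
  static void applyTag(String tag, FlowContext context) {
    int idx = tag.indexOf(':');
    if (idx <= 0 || idx == tag.length() - 1) {
      return; // not a prefix:value tag, or the value is empty
    }
    String prefix = tag.substring(0, idx);
    String value = tag.substring(idx + 1);
    switch (prefix) {
      case FLOW_NAME_TAG_PREFIX:
        context.flowName = value;
        break;
      case FLOW_VERSION_TAG_PREFIX:
        context.flowVersion = value;
        break;
      case FLOW_RUN_ID_TAG_PREFIX:
        context.flowRunId = Long.parseLong(value); // assumes a numeric run id
        break;
      default:
        // unrelated tag, ignore
    }
  }
}
{code}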
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508151#comment-14508151 ] Sangjin Lee commented on YARN-3437: --- Could you kindly take a look at the latest patch? Thanks! convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508204#comment-14508204 ] Vinod Kumar Vavilapalli commented on YARN-3366: --- +1 for the latest patch. Checking this in. Can you file a ticket for the checkstyle rules' issues? Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508228#comment-14508228 ] Hudson commented on YARN-3366: -- FAILURE: Integrated in Hadoop-trunk-Commit #7642 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7642/]) YARN-3366. Enhanced NodeManager to support classifying/shaping outgoing network bandwidth traffic originating from YARN containers Contributed by Sidharta Seethana. (vinodkv: rev a100be685cc4521e9949589948219231aa5d2733) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestResourceHandlerModule.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestTrafficControlBandwidthHandlerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestTrafficController.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/OutboundBandwidthResourceHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Fix For: 2.8.0 Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508378#comment-14508378 ] Sangjin Lee commented on YARN-3437: --- Thanks for the review [~djp]! bq. For performance perspective, we should move LOG.info() out of synchronized block (may be move out of collector.start()?). I can move the LOG.info() call outside the synchronized block. That said, I don't think this would have a meaningful performance impact. Aside from the fact that logging calls are usually synchronized themselves, it is reasonable to expect that the contention for this lock (collectors) would be quite low. We're talking about contention when multiple AMs are competing to create collectors on the same node, and the chances that there is any contention on this lock would be very low. Also, when you said may be move out of collector.start(), did you mean moving the collector.start() call outside the synchronization block? If so, I'd be hesitant to do that. We just had a discussion on this in another JIRA (see https://issues.apache.org/jira/browse/YARN-3390?focusedCommentId=14508121page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14508121). bq. we don't need to LOG.ERROR (replace with INFO?) That is a good suggestion. I'll update this (and remove()) to lower the logging level for this. bq. For remove(), similar that we should move collector.stop() and LOG.info() out of synchronized block. This we can do safely. I'll update the patch. convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
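To make the locking discussion above concrete, here is a minimal sketch of a remove() method where only the map mutation stays inside the synchronized block, while stop() and logging happen outside it. This is not the actual TimelineCollectorManager code; the class, field, and println-as-logger below are assumptions for the example.
{code}
// Minimal sketch of the remove() restructuring discussed above (assumed names).
import java.util.HashMap;
import java.util.Map;

public class CollectorMapSketch {

  interface Collector {
    void stop();
  }

  private final Map<String, Collector> collectors = new HashMap<>();

  public boolean remove(String appId) {
    Collector removed;
    synchronized (collectors) {
      removed = collectors.remove(appId);   // keep only the map update under the lock
    }
    if (removed == null) {
      // System.out.println stands in for a logger at INFO level
      System.out.println("the collector for " + appId + " does not exist!");
      return false;
    }
    removed.stop();                          // stop and log outside the synchronized block
    System.out.println("the collector service for " + appId + " was removed");
    return true;
  }
}
{code}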
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508220#comment-14508220 ] Sidharta Seethana commented on YARN-3366: - Here is the ticket : https://issues.apache.org/jira/browse/HADOOP-11869 Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508312#comment-14508312 ] Brahma Reddy Battula commented on YARN-3532: The findbugs warnings are handled in HADOOP-11821. nodemanager version in RM nodes page didn't update when NMs rejoin -- Key: YARN-3532 URL: https://issues.apache.org/jira/browse/YARN-3532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3532.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508402#comment-14508402 ] Rohith commented on YARN-3532: -- Is it dup of YARN-1981? nodemanager version in RM nodes page didn't update when NMs rejoin -- Key: YARN-3532 URL: https://issues.apache.org/jira/browse/YARN-3532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3532.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508414#comment-14508414 ] Rohith commented on YARN-3533: -- +1(non-binding) LGTM .. Test: Fix launchAM in MockRM to wait for attempt to be scheduled Key: YARN-3533 URL: https://issues.apache.org/jira/browse/YARN-3533 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3533.001.patch MockRM#launchAM fails in many test runs because it does not wait for the app attempt to be scheduled before NM update is sent as noted in [recent builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Component/s: timelineserver Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can be hours. If we are willing to relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. This can also help with I/O by putting the entity and index databases on separate disks. Use rolling DBs for the entity and index DBs. 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always put an entity's events into its correct rolling db instance based on start time. This allows us to stitch the data back together while reading and to do artificial paging. Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster. Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that will trend towards sequential write performance over random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
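To illustrate the rolling-database idea in the description above, here is a small sketch of bucketing entities into rolling leveldb instances by start time, so that whole databases can age out and be deleted from the filesystem instead of removing records one at a time. It is not the YARN-3448 patch; the roll period, naming scheme, and retention check are assumptions made for the example.
{code}
// Illustrative sketch of start-time bucketing for rolling DB instances (assumed values).
import java.util.concurrent.TimeUnit;

public class RollingDbSketch {

  static final long ROLL_PERIOD_MS = TimeUnit.HOURS.toMillis(1); // assumed roll interval

  // All of an entity's events go to the bucket derived from its start time, which keeps
  // the entity in a single rolling instance and lets reads stitch buckets back together.
  static long bucketFor(long entityStartTimeMs) {
    return entityStartTimeMs - (entityStartTimeMs % ROLL_PERIOD_MS);
  }

  static String dbNameFor(long entityStartTimeMs) {
    return "entity-ldb." + bucketFor(entityStartTimeMs); // assumed naming scheme
  }

  // Retention: delete whole databases older than the TTL instead of individual records.
  static boolean shouldDelete(long bucketStartMs, long nowMs, long ttlMs) {
    return bucketStartMs + ROLL_PERIOD_MS + ttlMs < nowMs;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(dbNameFor(now));
    System.out.println(shouldDelete(bucketFor(now) - TimeUnit.DAYS.toMillis(8),
        now, TimeUnit.DAYS.toMillis(7)));
  }
}
{code}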
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508304#comment-14508304 ] Siddharth Wagle commented on YARN-3529: --- Enlisted deps here:
{code}
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-core</artifactId>
  <version>${phoenix.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- for unit tests only -->
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-core</artifactId>
  <type>test-jar</type>
  <version>${phoenix.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-it</artifactId>
  <version>${hbase.version}</version>
  <scope>test</scope>
  <classifier>tests</classifier>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-testing-util</artifactId>
  <version>${hbase.version}</version>
  <scope>test</scope>
  <optional>true</optional>
  <exclusions>
    <exclusion>
      <groupId>org.jruby</groupId>
      <artifactId>jruby-complete</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
{code}
Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3534) Report node resource utilization
Inigo Goiri created YARN-3534: - Summary: Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
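As an aside on the feature described above, the following is a hypothetical sketch of a node utilization sampler of the kind a NodeManager could run; it is not the actual NodeResourceMonitor, and the class names and heartbeat wiring are assumptions. It uses the JDK's OperatingSystemMXBean to sample CPU and physical memory utilization.
{code}
// Hypothetical sketch only -- not the NodeResourceMonitor from this JIRA.
import java.lang.management.ManagementFactory;

public class NodeUtilizationSamplerSketch {

  static class NodeUtilization {
    final double cpuUsage;        // fraction of total CPU in use, 0.0 - 1.0
    final long usedPhysicalMemMB; // physical memory in use, in MB
    NodeUtilization(double cpu, long memMB) {
      this.cpuUsage = cpu;
      this.usedPhysicalMemMB = memMB;
    }
  }

  static NodeUtilization sample() {
    com.sun.management.OperatingSystemMXBean os =
        (com.sun.management.OperatingSystemMXBean)
            ManagementFactory.getOperatingSystemMXBean();
    long usedBytes = os.getTotalPhysicalMemorySize() - os.getFreePhysicalMemorySize();
    return new NodeUtilization(os.getSystemCpuLoad(), usedBytes / (1024 * 1024));
  }

  public static void main(String[] args) {
    NodeUtilization u = sample();
    // In the feature described above, values like these would ride on the NM->RM heartbeat.
    System.out.printf("cpu=%.2f usedMemMB=%d%n", u.cpuUsage, u.usedPhysicalMemMB);
  }
}
{code}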
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508400#comment-14508400 ] Hadoop QA commented on YARN-3437: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727521/YARN-3437.004.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / a100be6 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7466//console | This message was automatically generated. convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508183#comment-14508183 ] Wangda Tan commented on YARN-3413: -- The failed test is not related to the patch. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch, YARN-3413.7.patch As mentioned in: https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3437: -- Attachment: YARN-3437.004.patch Patch v.4. - moved logging statements out of the synchronized blocks - dropped logging level from ERROR to INFO - reduced the synchronization scope in remove() convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508192#comment-14508192 ] Sidharta Seethana commented on YARN-3366: - The test failure is unrelated to this patch. The checkstyle script and the rules in place need to be revisited - for example, I see "line too long" warnings for import statements. Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Attachment: YARN-3448.12.patch Improved the patch further by running the code with a Java profiler. This patch is 25% faster and generates a roughly 20% smaller database than the previous version. - Removed the unnecessary PREFIX; since each type is in its own database, the prefix is not needed to distinguish them. - Removed unused invisible related entities to reduce further operations. - Changed the database serialization method to more quickly generate a smaller serialized size for the primary filter values and other info. The library introduced, fast-serialization, is verified Apache License 2.0. - Profiling showed much time spent converting Strings to byte arrays. Converted the strings once and reused them for all the database keys. - Reduced the read cache and write buffer size to take into consideration the 7 day default retention. - Removed insert time from the start time database. This feature is used to detect changes since the last query, but is not functional since it forces a scan of all data entries. Could be added back at a later time. Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.12.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can be hours. If we are willing to relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. This can also help with I/O by putting the entity and index databases on separate disks. Use rolling DBs for the entity and index DBs. 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always put an entity's events into its correct rolling db instance based on start time. This allows us to stitch the data back together while reading and to do artificial paging. Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster. Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that will trend towards sequential write performance over random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508250#comment-14508250 ] Hadoop QA commented on YARN-3532: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 31s | The applied patch generated 4 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 1s | The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 52s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 52m 7s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 95m 50s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-sls | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String):in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String): new java.io.FileReader(String) At RumenToSLSConverter.java:[line 122] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String):in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String): new java.io.FileWriter(String) At RumenToSLSConverter.java:[line 124] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSNodeFile(String):in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSNodeFile(String): new java.io.FileWriter(String) At RumenToSLSConverter.java:[line 145] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(Resource, int):in org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(Resource, int): new java.io.FileReader(String) At SLSRunner.java:[line 280] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics():in org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(): new java.io.FileWriter(String) At ResourceSchedulerWrapper.java:[line 490] | | | Found reliance on default encoding in new org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper$MetricsLogRunnable(ResourceSchedulerWrapper):in new org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper$MetricsLogRunnable(ResourceSchedulerWrapper): new java.io.FileWriter(String) At ResourceSchedulerWrapper.java:[line 695] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics():in 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new java.io.FileWriter(String) At SLSCapacityScheduler.java:[line 493] | | | Found reliance on default encoding in new org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler):in new org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler): new java.io.FileWriter(String) At SLSCapacityScheduler.java:[line 698] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromNodeFile(String):in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromNodeFile(String): new java.io.FileReader(String) At SLSUtils.java:[line 119] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromSLSTrace(String):in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromSLSTrace(String): new java.io.FileReader(String) At SLSUtils.java:[line 92] | | | Class org.apache.hadoop.yarn.sls.web.SLSWebApp defines non-transient non-serializable instance field handleOperTimecostHistogramMap In SLSWebApp.java:instance field handleOperTimecostHistogramMap In SLSWebApp.java | | | Class org.apache.hadoop.yarn.sls.web.SLSWebApp defines non-transient non-serializable instance field queueAllocatedMemoryCounterMap In
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508279#comment-14508279 ] Hadoop QA commented on YARN-3448: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 23s | The applied patch generated 6 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 8s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 3m 17s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | | | 45m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727487/YARN-3448.12.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a100be6 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7465/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7465/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7465/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7465/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7465//console | This message was automatically generated. Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.12.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities record at a time. An exclusive write lock is held during the entire deletion phase which in practice can be hours. If we are to relax some of the consistency constraints, other performance enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. 
This allows each database to maximize read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. This can also help with I/O by putting the entity and index databases on separate disks. Use rolling DBs for the entity and index DBs. 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always put an entity's events into its correct rolling db instance based on start time. This allows us to stitch the data back together while reading and to do artificial paging. Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster. Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that will trend towards sequential write performance over random write performance. -- This message was sent by Atlassian JIRA
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507917#comment-14507917 ] Craig Welch commented on YARN-3319: --- The failed tests pass on my box with the patch, unrelated. The checkstyle is referring to ResourceLimits, which the patch doesn't change... poking around in the build artifacts there are some exceptions in some of the checkstyle stuff, I'm not sure it's actually working correctly Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507914#comment-14507914 ] Vrushali C commented on YARN-3134: -- Hi [~gtCarrera9] Thanks for the patch, I have some questions: - I don't see the isRelatedTo and relatesTo entities being written in this patch. - For the metrics timeseries, I see that the metric values are being written as a ;-separated list of values in a string, is that right? But I could not figure out where the timestamps associated with each metric value are stored. Storing metric values as strings would, I think, make it harder to run numerical queries, like how many entities had GC MILLIS that were more than 25% of the CPU MILLIS. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134DataSchema.pdf Quote the introduction on the Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simplify our implementation of reading/writing data from/to HBase, and can easily build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507952#comment-14507952 ] Hadoop QA commented on YARN-3434: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 36s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 23s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 93m 31s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727317/YARN-3434.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a3b1d8c | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7455/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7455/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7455/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7455/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7455//console | This message was automatically generated. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0 User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3366: Attachment: YARN-3366.007.patch Attaching a new patch based on code-review feedback from [~vinodkv] Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508121#comment-14508121 ] Sangjin Lee commented on YARN-3390: --- bq. In TimelineCollectorManager, I'm still having this question, although we may not want to address it in this JIRA: are there any special consistency requirements that prevent us from using ConcurrentHashMap? I can answer this as I added that code. :) In putIfAbsent(), it needs to start the collector as well if get() returns null. If we used ConcurrentHashMap and removed synchronization, multiple threads could start their own collectors unnecessarily. It is probably not a show stopper but less than desirable. Also, in real life the contention on TimelineCollectorManager is low enough that synchronization should be perfectly adequate. If we want to do this without synchronization, then we would want to use something like guava's LoadingCache. Reuse TimelineCollectorManager for RM - Key: YARN-3390 URL: https://issues.apache.org/jira/browse/YARN-3390 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3390.1.patch RMTimelineCollector should have the context info of each app whose entity has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
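As an illustration of the "something like guava's LoadingCache" alternative mentioned in the comment above, here is a minimal sketch; the collector type, its constructor, and start() below are assumptions, not the actual TimelineCollectorManager code. Guava's per-key loading guarantee means only one thread builds and starts a collector for a given app id, without a single global lock.
{code}
// Illustrative sketch using Guava's LoadingCache (assumed collector type).
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class CollectorCacheSketch {

  static class Collector {
    private final String appId;
    Collector(String appId) { this.appId = appId; }
    void start() { System.out.println("started collector for " + appId); }
  }

  private final LoadingCache<String, Collector> collectors = CacheBuilder.newBuilder()
      .build(new CacheLoader<String, Collector>() {
        @Override
        public Collector load(String appId) {
          Collector collector = new Collector(appId);
          collector.start(); // runs at most once per app id on a cache miss
          return collector;
        }
      });

  public Collector getOrCreate(String appId) {
    return collectors.getUnchecked(appId);
  }

  public static void main(String[] args) {
    CollectorCacheSketch sketch = new CollectorCacheSketch();
    sketch.getOrCreate("application_1");
    sketch.getOrCreate("application_1"); // second call reuses the cached collector
  }
}
{code}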
[jira] [Assigned] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li reassigned YARN-3532: - Assignee: Siqi Li nodemanager version in RM nodes page didn't update when NMs rejoin -- Key: YARN-3532 URL: https://issues.apache.org/jira/browse/YARN-3532 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508155#comment-14508155 ] Hadoop QA commented on YARN-3366: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 12s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 30s | The applied patch generated 6 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 27s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 5m 48s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 49m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727369/YARN-3366.007.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0ebe84d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7463/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7463/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7463/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7463/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7463//console | This message was automatically generated. Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507858#comment-14507858 ] Hadoop QA commented on YARN-2605: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 7m 40s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 59s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 57s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 52m 33s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 103m 32s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727301/YARN-2605.2.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 12f4df0 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7452/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7452/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7452/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7452//console | This message was automatically generated. [RM HA] Rest api endpoints doing redirect incorrectly - Key: YARN-2605 URL: https://issues.apache.org/jira/browse/YARN-2605 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: bc Wong Assignee: Xuan Gong Labels: newbie Attachments: YARN-2605.1.patch, YARN-2605.2.patch The standby RM's webui tries to do a redirect via meta-refresh. That is fine for pages designed to be viewed by web browsers. But the API endpoints shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd suggest HTTP 303, or return a well-defined error message (json or xml) stating that the standby status and a link to the active RM. The standby RM is returning this today: {noformat} $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Content-Type: text/plain; charset=UTF-8 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics Content-Length: 117 Server: Jetty(6.1.26) This is standby RM. 
Redirecting to the current active RM: http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
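To make the problem concrete: the Refresh header in the standby response above is honoured by browsers but ignored by most HTTP libraries, so a programmatic client has to parse it by hand. The sketch below is not from the patch (the host name is the placeholder from the report); it shows that manual handling, which is exactly what an HTTP 303 with a Location header would make unnecessary.
{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

/**
 * Sketch of what a client currently has to do against the standby RM:
 * the 200 response carries "Refresh: 3; url=...", so the redirect target
 * must be extracted manually. A 303 + Location would let the HTTP client
 * follow the redirect on its own.
 */
public class RmMetricsClientSketch {
  public static String fetchMetrics(String rmUrl) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(rmUrl).openConnection();
    String refresh = conn.getHeaderField("Refresh");
    if (conn.getResponseCode() == 200 && refresh != null && refresh.contains("url=")) {
      // Standby RM: pull the active RM's URL out of the meta-refresh header.
      String activeUrl = refresh.substring(refresh.indexOf("url=") + 4).trim();
      conn.disconnect();
      conn = (HttpURLConnection) new URL(activeUrl).openConnection();
    }
    try (Scanner sc = new Scanner(conn.getInputStream(), "UTF-8")) {
      sc.useDelimiter("\\A");
      return sc.hasNext() ? sc.next() : "";
    }
  }

  public static void main(String[] args) throws IOException {
    // Host name taken from the report above; stands in for any RM endpoint.
    System.out.println(fetchMetrics(
        "http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics"));
  }
}
{code}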
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434.patch Attaching the exact same patch to kick Jenkins again Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch The user limit factor (ULF) was set to 1.0, yet the user was able to consume 1.4X the queue capacity. It looks like when this application launched, it reserved about 1000 containers of 8G each within about 5 seconds. I think this allowed the logic in assignToUser() to let the user limit be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
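A toy simulation of the race described above, not the CapacityScheduler's code: it assumes a user-limit check that looks only at what the user has already been allocated, so a burst of reservations is admitted without ever being charged against the limit. The cluster size, queue size and container size are made up to roughly reproduce the 1.4X figure from the report.
{code:java}
/**
 * Toy model (not the real assignToUser()): back-to-back reservations pass
 * the user-limit check because reserved-but-not-allocated resources are not
 * counted, and the limit is never rechecked when they convert to allocations.
 */
public class UserLimitRaceSketch {
  public static void main(String[] args) {
    final long clusterMb = 140_000;        // hypothetical cluster, 1.4x the queue
    final long queueCapacityMb = 100_000;  // hypothetical queue capacity
    final double userLimitFactor = 1.0;    // ULF from the report
    final long containerMb = 8_192;        // ~8G containers

    long used = 0;      // resources actually allocated to the user
    long reserved = 0;  // reserved for the user, not charged against the limit

    // Reservations arrive in a burst; each is checked against 'used' only.
    while (used + reserved + containerMb <= clusterMb) {
      boolean withinLimit = used + containerMb <= queueCapacityMb * userLimitFactor;
      if (!withinLimit) {
        break; // never triggers: 'used' is still 0 during the burst
      }
      reserved += containerMb;
    }
    // Reservations later convert to allocations without re-checking the limit.
    used += reserved;
    System.out.printf("user consumed %.2fx of queue capacity%n",
        (double) used / queueCapacityMb);
  }
}
{code}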
[jira] [Created] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
Siqi Li created YARN-3532: - Summary: nodemanager version in RM nodes page didn't update when NMs rejoin Key: YARN-3532 URL: https://issues.apache.org/jira/browse/YARN-3532 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
Anubhav Dhoot created YARN-3533: --- Summary: Test: Fix launchAM in MockRM to wait for attempt to be scheduled Key: YARN-3533 URL: https://issues.apache.org/jira/browse/YARN-3533 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot MockRM#launchAM fails in many test runs because it does not wait for the app attempt to be scheduled before the NM update is sent, as noted in [recent builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
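A sketch of the ordering the summary asks for, not the actual YARN-3533 patch: wait until the attempt has actually been scheduled before sending the NM heartbeat that drives the AM container allocation. The helper names (waitForState, nodeHeartbeat, sendAMLaunched) follow MockRM's style but are quoted from memory, not from the patch.
{code:java}
// Inside a test utility in hadoop-yarn-server-resourcemanager test code;
// MockRM, MockNM, MockAM, RMApp, RMAppAttempt and RMAppAttemptState are the
// existing test/RM classes. Sketch only.
public static MockAM launchAMSketch(RMApp app, MockRM rm, MockNM nm)
    throws Exception {
  RMAppAttempt attempt = app.getCurrentAppAttempt();
  // Missing step in the flaky version: without this wait the heartbeat can
  // arrive while the attempt has not yet reached SCHEDULED.
  rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.SCHEDULED);
  nm.nodeHeartbeat(true); // node update triggers the AM container allocation
  rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.ALLOCATED);
  return rm.sendAMLaunched(attempt.getAppAttemptId());
}
{code}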
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508167#comment-14508167 ] Anubhav Dhoot commented on YARN-3387: - Thanks [~sandflee] for reporting the issue. I have opened YARN-3533 to fix this. container complete message couldn't pass to am if am restarted and rm changed - Key: YARN-3387 URL: https://issues.apache.org/jira/browse/YARN-3387 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: sandflee Priority: Critical Labels: patch Attachments: YARN-3387.001.patch, YARN-3387.002.patch Suppose AM work preserving and RM HA are enabled. The container complete message is passed to appAttempt.justFinishedContainers in the RM. Normally, all attempts of one app share the same justFinishedContainers, but when the RM changes, every attempt has its own justFinishedContainers, so in the situation below the container complete message cannot be passed to the AM: 1) the AM restarts, 2) the RM changes, 3) a container launched by the first AM completes. The container complete message is passed to appAttempt1, not appAttempt2, but the AM pulls finished containers from appAttempt2 (currentAppAttempt). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
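To restate the failure mode in code form (a toy model, not the RM's classes): finished-container notifications are keyed by the attempt that launched the container, while allocate() reads only from the current attempt, so after an AM restart under a new RM the first attempt's completions are never returned to the AM. The class and identifiers below are illustrative.
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the bug, not RM code: per-attempt finished-container lists. */
public class JustFinishedContainersSketch {
  // After an RM failover each attempt ends up with its own list (the bug);
  // before, attempts of one app effectively shared a single list.
  static Map<String, List<String>> justFinishedByAttempt = new HashMap<>();

  static void containerFinished(String launchingAttempt, String containerId) {
    justFinishedByAttempt
        .computeIfAbsent(launchingAttempt, k -> new ArrayList<>())
        .add(containerId);
  }

  static List<String> amAllocate(String currentAttempt) {
    // The AM only ever asks the *current* attempt.
    return justFinishedByAttempt.getOrDefault(currentAttempt, new ArrayList<>());
  }

  public static void main(String[] args) {
    // 1) AM restarts, 2) RM has changed, 3) a container from attempt 1 completes.
    containerFinished("appattempt_1", "container_007");
    // attempt 2 is current, so the completion above is never delivered:
    System.out.println(amAllocate("appattempt_2"));   // prints []
  }
}
{code}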