[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066141#comment-15066141 ] Arun Suresh commented on YARN-4477: --- [~Tao Jie], I apologize, but it looks like I missed a couple of minor nits in my previous review: # You made a modification to the *testQueueMaxAMShareWithContainerReservation* test case. Is this required? # Reword the javadoc documenting what the *reserve* function returns, e.g.: {{return whether reservation was possible with the current threshold limits}} # We can probably move the comment on line 635 before the if. +1 after that. > FairScheduler: encounter infinite loop in attemptScheduling > --- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch > > > This problem is introduced by YARN-4270, which adds a limitation on reservations. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not > equal to Resources.none(). > As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop will never break. > I suppose that assignContainer(node) should return Resources.none() rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
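The loop-termination fix proposed in the report above can be sketched in a minimal, self-contained form. Everything here is hypothetical stand-in code, not the real FairScheduler API: `NONE` and `CONTAINER_RESERVED` are integer stand-ins for `Resources.none()` and `FairScheduler.CONTAINER_RESERVED`, and `scheduleLoop` mirrors only the shape of the `attemptScheduling` while-loop. The point it illustrates: once a rejected reservation is reported as "none", the multi-assign loop exits instead of spinning forever.

```java
// Hypothetical sketch of the proposed YARN-4477 fix, with simplified types.
public class ReservationLoopSketch {
    static final int NONE = 0;                // stands in for Resources.none()
    static final int CONTAINER_RESERVED = -1; // stands in for FairScheduler.CONTAINER_RESERVED

    // Hypothetical assignContainer: with the fix, a reservation rejected by the
    // threshold check is reported as NONE so the caller sees no progress.
    static int assignContainer(boolean reservationAccepted) {
        return reservationAccepted ? CONTAINER_RESERVED : NONE;
    }

    // Mirrors the while-loop shape from the report; the iteration cap is a
    // safety net so this sketch terminates even if the bug were reintroduced.
    static int scheduleLoop(boolean reservationAccepted, int maxIterations) {
        int assignedContainers = 0;
        for (int i = 0; i < maxIterations; i++) {
            int result = assignContainer(reservationAccepted);
            if (result == NONE) {
                break; // rejected reservation => no progress => stop the loop
            }
            assignedContainers++;
        }
        return assignedContainers;
    }

    public static void main(String[] args) {
        // With the fix, a rejected reservation assigns nothing and exits at once.
        if (scheduleLoop(false, 1000) != 0) throw new AssertionError();
        // An accepted reservation still registers as progress each iteration.
        if (scheduleLoop(true, 5) != 5) throw new AssertionError();
    }
}
```

In the real code the accepted-reservation case also exits because `node.getReservedContainer()` becomes non-null; the sketch elides that detail and isolates only the return-value contract the reporter proposes.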
[jira] [Updated] (YARN-4493) move queue can make app don't belong to any queue
[ https://issues.apache.org/jira/browse/YARN-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiangyu updated YARN-4493: -- Attachment: yarn-4493.patch.1 The patch is from our revision. > move queue can make app don't belong to any queue > - > > Key: YARN-4493 > URL: https://issues.apache.org/jira/browse/YARN-4493 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.0, 2.6.0, 2.7.1 >Reporter: jiangyu >Priority: Minor > Attachments: yarn-4493.patch.1 > > > When moving a running application to a different queue, the current implementation > doesn't check whether the app can run in the new queue before removing it from the current > queue. So if the destination queue is full, the app will throw an exception and > no longer belong to any queue. > After that, the app becomes orphaned and cannot be scheduled any resources. If you > kill the app, the removeApp method in FSLeafQueue will throw an > IllegalStateException with the message "Given app to remove app does not exist in queue ...". > So I think we should check whether the destination queue can run the app before > removing it from the current queue. > The patch is from our revision.
[jira] [Created] (YARN-4493) move queue can make app don't belong to any queue
jiangyu created YARN-4493: - Summary: move queue can make app don't belong to any queue Key: YARN-4493 URL: https://issues.apache.org/jira/browse/YARN-4493 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1, 2.6.0, 2.4.0 Reporter: jiangyu Priority: Minor When moving a running application to a different queue, the current implementation doesn't check whether the app can run in the new queue before removing it from the current queue. So if the destination queue is full, the app will throw an exception and no longer belong to any queue. After that, the app becomes orphaned and cannot be scheduled any resources. If you kill the app, the removeApp method in FSLeafQueue will throw an IllegalStateException with the message "Given app to remove app does not exist in queue ...". So I think we should check whether the destination queue can run the app before removing it from the current queue. The patch is from our revision.
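The "verify before removing" ordering the reporter proposes can be sketched as follows. The `Queue` class, `canRun`, and `moveApp` here are simplified hypothetical stand-ins, not the real FSLeafQueue/FSAppAttempt API; the sketch only demonstrates the ordering guarantee: the app is never detached from its source queue unless the destination has already accepted it.

```java
// Hedged sketch of check-before-move queue semantics (hypothetical types).
import java.util.ArrayList;
import java.util.List;

public class MoveQueueSketch {
    static class Queue {
        final String name;
        final int maxApps;
        final List<String> apps = new ArrayList<>();
        Queue(String name, int maxApps) { this.name = name; this.maxApps = maxApps; }
        // Hypothetical admission check standing in for the destination-queue checks.
        boolean canRun(String app) { return apps.size() < maxApps; }
    }

    // Move only after confirming the destination can accept the app, so the
    // app can never end up belonging to no queue at all.
    static boolean moveApp(String app, Queue from, Queue to) {
        if (!to.canRun(app)) {
            return false; // nothing was removed; the app stays in 'from'
        }
        from.apps.remove(app);
        to.apps.add(app);
        return true;
    }

    public static void main(String[] args) {
        Queue src = new Queue("root.a", 10);
        Queue full = new Queue("root.b", 0); // destination already at its limit
        src.apps.add("app_1");
        // The move is refused and the app remains owned by the source queue.
        if (moveApp("app_1", src, full)) throw new AssertionError();
        if (!src.apps.contains("app_1")) throw new AssertionError();
    }
}
```

The failing-move path is the interesting one: because the removal happens after the admission check, a full destination leaves the source queue's bookkeeping untouched, which is exactly the invariant the reported orphan-app bug violates.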
[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066115#comment-15066115 ] Hadoop QA commented on YARN-4477: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 7s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 10s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 151m 39s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12778474/YARN-4477.003.patch | | JIRA Issue | YARN-4477 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs c
[jira] [Assigned] (YARN-4482) Default values of several config parameters are missing
[ https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Shahid Khan reassigned YARN-4482: -- Assignee: Mohammad Shahid Khan > Default values of several config parameters are missing > > > Key: YARN-4482 > URL: https://issues.apache.org/jira/browse/YARN-4482 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.6.2, 2.6.3 >Reporter: Tianyin Xu >Assignee: Mohammad Shahid Khan >Priority: Minor > > In {{yarn-default.xml}}, the default values of the following parameters are > commented out, > {{yarn.client.failover-max-attempts}} > {{yarn.client.failover-sleep-base-ms}} > {{yarn.client.failover-sleep-max-ms}} > Are these default values changed (I suppose so)? If so, we should update the > new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" > values... > (yarn-default.xml) > https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
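The reporter's point above is that when a default is commented out of {{yarn-default.xml}}, the effective value comes from a hardcoded fallback in code that the XML reader cannot see. A small illustration, using plain `java.util.Properties` as a stand-in for Hadoop's `Configuration` (the fallback value `15` here is hypothetical, chosen only for the example):

```java
// Hedged illustration: a commented-out XML default makes the code-side
// fallback the real default, invisible to anyone reading yarn-default.xml.
import java.util.Properties;

public class MissingDefaultSketch {
    // Hypothetical code-side fallback; not the actual YARN default.
    static final int CODE_FALLBACK_MAX_ATTEMPTS = 15;

    static int failoverMaxAttempts(Properties siteConf) {
        String v = siteConf.getProperty("yarn.client.failover-max-attempts");
        // When neither site config nor XML default supplies a value,
        // the hardcoded constant silently wins.
        return v != null ? Integer.parseInt(v) : CODE_FALLBACK_MAX_ATTEMPTS;
    }

    public static void main(String[] args) {
        Properties conf = new Properties(); // nothing set; XML default commented out
        if (failoverMaxAttempts(conf) != CODE_FALLBACK_MAX_ATTEMPTS) {
            throw new AssertionError();
        }
    }
}
```

This is why restoring the values in {{yarn-default.xml}} matters: the XML is the only place users can discover what the default actually is.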
[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066046#comment-15066046 ] Arun Suresh commented on YARN-4477: --- [~Tao Jie], The test case looks good. I kicked off another build to see if the FairScheduler test cases pass (since the test case that failed was testContinuousScheduling). +1 pending that. > FairScheduler: encounter infinite loop in attemptScheduling > --- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch > > > This problem is introduced by YARN-4270, which adds a limitation on reservations. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not > equal to Resources.none(). > As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop will never break. > I suppose that assignContainer(node) should return Resources.none() rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation.
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066045#comment-15066045 ] Naganarasimha G R commented on YARN-2934: - Hi [~jira.shegalov], I hope the latest patch covers all your comments; could you please take a look? The checkstyle issues are not directly induced by the patch. > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, > YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, > YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, > YARN-2934.v2.001.patch, YARN-2934.v2.002.patch, YARN-2934.v2.003.patch > > > Most YARN applications redirect stderr to some file. That's why when > container launch fails with {{ExitCodeException}} the message is empty.
[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066044#comment-15066044 ] Tao Jie commented on YARN-4477: --- [~asuresh] [~kasha], Would you please review it? > FairScheduler: encounter infinite loop in attemptScheduling > --- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch > > > This problem is introduced by YARN-4270, which adds a limitation on reservations. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If the reservation exceeds the threshold, the current node will not set a reservation. > But in attemptScheduling in FairScheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not > equal to Resources.none().
> As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop will never break. > I suppose that assignContainer(node) should return Resources.none() rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation.
[jira] [Commented] (YARN-4480) Clean up some inappropriate imports
[ https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066017#comment-15066017 ] Kai Zheng commented on YARN-4480: - Thanks Uma for reviewing and committing this! > Clean up some inappropriate imports > --- > > Key: YARN-4480 > URL: https://issues.apache.org/jira/browse/YARN-4480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: 2.8.0 > > Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch > > > It was noticed there are some unnecessary dependency into Directory classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4454) NM to nodelabel mapping going wrong after RM restart
[ https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066015#comment-15066015 ] Bibin A Chundatt commented on YARN-4454: Sorry, I mentioned the wrong JIRA ID; it's YARN-4478 > NM to nodelabel mapping going wrong after RM restart > > > Key: YARN-4454 > URL: https://issues.apache.org/jira/browse/YARN-4454 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4454.patch, test.patch > > > *Nodelabel mapping with NodeManager goes wrong if a combination of > hostname and then NodeId is used to update the nodelabel mapping* > *Steps to reproduce* > 1. Create a cluster with 2 NMs > 2. Add labels X,Y to the cluster > 3. Replace the label of node 1 using ,x > 4. Replace the label of node 1 with ,y > 5. Again replace the label of node 1 with ,x > Check the cluster label mapping: HOSTNAME1 will be mapped to X > Now restart the RM 2 times; the NODE LABEL mapping of HOSTNAME1:PORT changes to Y > {noformat} > 2015-12-14 17:17:54,901 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: > [,] > 2015-12-14 17:17:54,905 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on > nodes: > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:64318, labels=[ResourcePool_1] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:0, labels=[ResourcePool_null] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-187:64318, labels=[ResourcePool_null] > {noformat}
[jira] [Commented] (YARN-4454) NM to nodelabel mapping going wrong after RM restart
[ https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066013#comment-15066013 ] Bibin A Chundatt commented on YARN-4454: Test failures are already tracked as part of umbrella JIRA YARN-4474 > NM to nodelabel mapping going wrong after RM restart > > > Key: YARN-4454 > URL: https://issues.apache.org/jira/browse/YARN-4454 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4454.patch, test.patch > > > *Nodelabel mapping with NodeManager is going wrong if combination of > hostname and then NodeId is used to update nodelabel mapping* > *Steps to reproduce* > 1.Create cluster with 2 NM > 2.Add label X,Y to cluster > 3.replace Label of node 1 using ,x > 4.replace label for node 1 by ,y > 5.Again replace label of node 1 by ,x > Check cluster label mapping HOSTNAME1 will be mapped with X > Now restart RM 2 times NODE LABEL mapping of HOSTNAME1:PORT changes to Y > {noformat} > 2015-12-14 17:17:54,901 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: > [,] > 2015-12-14 17:17:54,905 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on > nodes: > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:64318, labels=[ResourcePool_1] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:0, labels=[ResourcePool_null] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-187:64318, labels=[ResourcePool_null] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
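The failure mode in the YARN-4454 log above can be illustrated with a small sketch. The data shapes here are assumptions, not the real CommonNodeLabelsManager internals: label replacements keyed by bare hostname are modeled as a "host:0" entry, replacements keyed by NodeId as "host:port", so the same machine accumulates two entries that can disagree, and the entry consulted at recovery time determines which label "wins" after a restart.

```java
// Illustrative sketch (hypothetical helpers, simplified single-label map) of
// how mixing hostname-keyed and NodeId-keyed label updates leaves two
// conflicting entries for one machine.
import java.util.HashMap;
import java.util.Map;

public class NodeLabelKeySketch {
    final Map<String, String> labels = new HashMap<>();

    // Hypothetical helper: port 0 stands for a hostname-only update.
    void replaceLabel(String host, int port, String label) {
        labels.put(host + ":" + port, label);
    }

    // Hypothetical lookup that prefers the exact NodeId entry and falls back
    // to the hostname-only entry; mixed keys make the answer port-dependent.
    String effectiveLabel(String host, int port) {
        String exact = labels.get(host + ":" + port);
        return exact != null ? exact : labels.get(host + ":0");
    }

    public static void main(String[] args) {
        NodeLabelKeySketch m = new NodeLabelKeySketch();
        m.replaceLabel("host-10-19-92-188", 0, "Y");      // hostname-style update
        m.replaceLabel("host-10-19-92-188", 64318, "X");  // NodeId-style update
        // Both entries coexist, so the same machine reports different labels
        // depending on which key the lookup path uses.
        if (!"X".equals(m.effectiveLabel("host-10-19-92-188", 64318))) throw new AssertionError();
        if (!"Y".equals(m.effectiveLabel("host-10-19-92-188", 1234))) throw new AssertionError();
    }
}
```

This mirrors the log lines where host-10-19-92-188 appears both as `:64318` and as `:0` with different ResourcePool labels: whichever entry the restart-recovery path reads determines the effective mapping.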
[jira] [Commented] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065962#comment-15065962 ] Dian Fu commented on YARN-4100: --- Hi [~Naganarasimha], Thanks a lot for the quick update. LGTM. +1. > Add Documentation for Distributed and Delegated-Centralized Node Labels > feature > --- > > Key: YARN-4100 > URL: https://issues.apache.org/jira/browse/YARN-4100 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: NodeLabel.html, YARN-4100.v1.001.patch, > YARN-4100.v1.002.patch, YARN-4100.v1.003.patch > > > Add Documentation for Distributed Node Labels feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism
[ https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065910#comment-15065910 ] Sidharta Seethana commented on YARN-3542: - Hi Vinod, A few comments inline regarding your review comments. Thanks, -Sidharta {quote} When user sets the old CgroupsLCEResourcesHandler, you are internally resetting it to DefaultLCEResourcesHandler(inside LinuxContainerExecutor) and using that as a control to stop using the older handler. This effectively means the old code is not used anymore, and that the new code is stable. This doesn't map well with the earlier decision to not document the new handlers yet. {quote} That’s a good point. However, the handlers themselves aren’t the issue IMO, but rather the configuration, which might get out of hand as we add more resource handlers. This is especially true in light of the resource profiles being worked on in YARN-3926, which might require some changes to how the resource handlers are configured. However, there should be no issue hooking into the new handler using the old configuration mechanism. {quote} Lot more dedup is possible between the new hierarchy and the older hierarchy {quote} I don’t think we should dedup this code. There is no reason to hook into the new code/handlers for either of the two scenarios being discussed here: 1) deprecate the old handler, in which case there shouldn’t be any reason to make major changes to it; 2) not deprecate the old handler because the new handler might not be stable, in which case it doesn’t make sense to hook into the new handler/code yet. {quote} Further, specific constants like CPU_PERIOD_US perhaps belong better to the specific implementations like CGroupsCpuResourceHandlerImpl instead of the root CpuResourceHandler. {quote} I am assuming you meant ‘the root CGroupsHandler’, not ‘the root CpuResourceHandler’.
Arguments can be made both ways here: CGroupsHandler already has enums and constants that are used across multiple resource handler implementations. Some of these cannot be moved out (e.g. the enum, tasks, etc.). Moving some of these out to individual handlers is of limited use and makes it hard to get a quick overview of all the cgroups subsystems/parameters in use among the various resource handlers. It also creates problems if new handlers for the same subsystem are created which require using the same cgroup parameters. Cleaning this up fully would require more extensive refactoring (individual classes for the various cgroups subsystems, etc.), which is not necessary, IMO. {quote} We should also deprecate the old LCEResourcesHandler interface, DefaultLCEResourcesHandler etc. That said, we shouldn't deprecate them or CgroupsLCEResourcesHandler before we make the newer mechanism public and usable. So may be we should fork off all this deprecation to another JIRA and only get to it after we publicly document the new mechanism for stable usage. {quote} Yeah, that’s a good point again. The new handler itself isn’t the issue; the new configuration could be. Like I said before, though, I think there should be no problem using the new handler if the old configuration mechanism is in use. > Re-factor support for CPU as a resource using the new ResourceHandler > mechanism > --- > > Key: YARN-3542 > URL: https://issues.apache.org/jira/browse/YARN-3542 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-3542.001.patch, YARN-3542.002.patch, > YARN-3542.003.patch, YARN-3542.004.patch, YARN-3542.005.patch, > YARN-3542.006.patch > > > In YARN-3443, a new ResourceHandler mechanism was added which enabled easier > addition of new resource types in the nodemanager (this was used for network > as a resource - see YARN-2140 ). 
We should refactor the existing CPU > implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using > the new ResourceHandler mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4454) NM to nodelabel mapping going wrong after RM restart
[ https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065846#comment-15065846 ] Hadoop QA commented on YARN-4454: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 54s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 28s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 13s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 7s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 59s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 59s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 29s {color} | {color:red} hadoop-mapreduce-client-app in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 42s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 9s {color} | {color:red} hadoop-mapreduce-client-app in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s {color} | {color:red} Patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 228m 58s {col
[jira] [Commented] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065838#comment-15065838 ] Naganarasimha G R commented on YARN-4492: - The ASF license warning is due to HDFS-9173, which has already been reopened! > Add documentation for queue level preemption which is supported in Capacity > scheduler > - > > Key: YARN-4492 > URL: https://issues.apache.org/jira/browse/YARN-4492 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch > > > As part of YARN-2056, Support has been added to disable preemption for a > specific queue. This is a useful feature in a multiload cluster but currently > missing documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4480) Clean up some inappropriate imports
[ https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-4480: --- Assignee: Kai Zheng > Clean up some inappropriate imports > --- > > Key: YARN-4480 > URL: https://issues.apache.org/jira/browse/YARN-4480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: 2.8.0 > > Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch > > > It was noticed there are some unnecessary dependency into Directory classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065822#comment-15065822 ] Hadoop QA commented on YARN-4492: - | (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 18s {color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 1m 0s {color} | {color:black} {color} |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12778745/YARN-4492.v1.001.patch |
| JIRA Issue | YARN-4492 |
| Optional Tests | asflicense mvnsite |
| uname | Linux 1fd0202e740c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7995a6e |
| asflicense | https://builds.apache.org/job/PreCommit-YARN-Build/10052/artifact/patchprocess/patch-asflicense-problems.txt |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Max memory used | 29MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10052/console |
This message was automatically generated. > Add documentation for queue level preemption which is supported in Capacity > scheduler > - > > Key: YARN-4492 > URL: https://issues.apache.org/jira/browse/YARN-4492 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch > > > As part of YARN-2056, Support has been added to disable preemption for a > specific queue. This is a useful feature in a multiload cluster but currently > missing documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4492: Target Version/s: 2.8.0, 2.7.3 (was: 2.8.0, 2.7.2, 2.7.3) > Add documentation for queue level preemption which is supported in Capacity > scheduler > - > > Key: YARN-4492 > URL: https://issues.apache.org/jira/browse/YARN-4492 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch > > > As part of YARN-2056, Support has been added to disable preemption for a > specific queue. This is a useful feature in a multiload cluster but currently > missing documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4492: Attachment: YARN-4492.v1.001.patch CapacityScheduler.html uploading with initial patch > Add documentation for queue level preemption which is supported in Capacity > scheduler > - > > Key: YARN-4492 > URL: https://issues.apache.org/jira/browse/YARN-4492 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch > > > As part of YARN-2056, Support has been added to disable preemption for a > specific queue. This is a useful feature in a multiload cluster but currently > missing documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4492: Description: As part of YARN-2056, Support has been added to disable preemption for a specific queue. This is a useful feature in a multiload cluster but currently missing documentation. (was: As part of YARN-2096, Support has been added to disable preemption for a specific queue. This is a useful feature in a multiload cluster but currently missing documentation. ) > Add documentation for queue level preemption which is supported in Capacity > scheduler > - > > Key: YARN-4492 > URL: https://issues.apache.org/jira/browse/YARN-4492 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > > As part of YARN-2056, Support has been added to disable preemption for a > specific queue. This is a useful feature in a multiload cluster but currently > missing documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler
Naganarasimha G R created YARN-4492: --- Summary: Add documentation for queue level preemption which is supported in Capacity scheduler Key: YARN-4492 URL: https://issues.apache.org/jira/browse/YARN-4492 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor As part of YARN-2096, Support has been added to disable preemption for a specific queue. This is a useful feature in a multiload cluster but currently missing documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065787#comment-15065787 ] Hadoop QA commented on YARN-4100: - | (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 24s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 37s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 8s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 8s {color} | {color:green} hadoop-yarn-site in the patch passed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 18s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 9s {color} | {color:green} hadoop-yarn-site in the patch passed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s {color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 55s {color} | {color:black} {color} |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12778738/YARN-4100.v1.003.patch |
| JIRA Issue | YARN-4100 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml |
| uname | Linux 337f90a1d604 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0f82b5d |
| JDK v1.7.0_9
[jira] [Updated] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4100: Attachment: YARN-4100.v1.003.patch NodeLabel.html Thanks for the comments [~dian.fu], I have fixed your review comments in the latest patch. Hoping [~bibinchundatt], [~rohithsharma] & [~wangda] can review it and get it committed before the cut for 2.8. > Add Documentation for Distributed and Delegated-Centralized Node Labels > feature > --- > > Key: YARN-4100 > URL: https://issues.apache.org/jira/browse/YARN-4100 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: NodeLabel.html, YARN-4100.v1.001.patch, > YARN-4100.v1.002.patch, YARN-4100.v1.003.patch > > > Add Documentation for Distributed Node Labels feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4100: Attachment: (was: NodeLabel.html) > Add Documentation for Distributed and Delegated-Centralized Node Labels > feature > --- > > Key: YARN-4100 > URL: https://issues.apache.org/jira/browse/YARN-4100 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-4100.v1.001.patch, YARN-4100.v1.002.patch > > > Add Documentation for Distributed Node Labels feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4454) NM to nodelabel mapping going wrong after RM restart
[ https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4454: --- Attachment: 0001-YARN-4454.patch Hi [~leftnoteasy], thank you for looking into the issue. I was checking how to keep the mapping in insertion order, but that is not required since, when *host* is used, all the other labels get updated during replace; your solution makes total sense. In the current patch, before the internal label update I have used a TreeMap for the normalization, since sorting based on NodeId is required. Please review. > NM to nodelabel mapping going wrong after RM restart > > > Key: YARN-4454 > URL: https://issues.apache.org/jira/browse/YARN-4454 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4454.patch, test.patch > > > *Nodelabel mapping with NodeManager is going wrong if combination of > hostname and then NodeId is used to update nodelabel mapping* > *Steps to reproduce* > 1.Create cluster with 2 NM > 2.Add label X,Y to cluster > 3.replace Label of node 1 using ,x > 4.replace label for node 1 by ,y > 5.Again replace label of node 1 by ,x > Check cluster label mapping HOSTNAME1 will be mapped with X > Now restart RM 2 times NODE LABEL mapping of HOSTNAME1:PORT changes to Y > {noformat} > 2015-12-14 17:17:54,901 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: > [,] > 2015-12-14 17:17:54,905 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on > nodes: > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:64318, labels=[ResourcePool_1] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:0, labels=[ResourcePool_null] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > 
NM=host-10-19-92-187:64318, labels=[ResourcePool_null] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4490) RM restart the finished app shows wrong Diagnostics status
[ https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065722#comment-15065722 ] Naganarasimha G R commented on YARN-4490: - [~mohdshahidkhan], thanks for the update! Yes, I feel it is not correct to have this message; it makes it look as though the application, though finished, was just recovered! > RM restart the finished app shows wrong Diagnostics status > -- > > Key: YARN-4490 > URL: https://issues.apache.org/jira/browse/YARN-4490 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > RM restart the finished app shows wrong Diagnostics status. > Preconditions: > RM recovery enable true. > Steps: > 1. run an application, wait application is finished. > 2. Restart the RM > 3. Check the application status is RM web UI > Issue: > Check the Diagnostic message: Attempt recovered after RM restart. > Expected: > The Diagnostic message should be available only for the application waiting > for allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4491) yarn list command to support filtering by tags
[ https://issues.apache.org/jira/browse/YARN-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-4491: -- Assignee: Varun Saxena > yarn list command to support filtering by tags > -- > > Key: YARN-4491 > URL: https://issues.apache.org/jira/browse/YARN-4491 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Varun Saxena >Priority: Minor > > although you can filter the list of yarn applications using the --appTypes > option; you can't use application tags. For finding applications on large > processes, adding a --tag option would allow users to be more selective > example: > {code} > yarn list --appTypes SPARK --tag production > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4491) yarn list command to support filtering by tags
Steve Loughran created YARN-4491: Summary: yarn list command to support filtering by tags Key: YARN-4491 URL: https://issues.apache.org/jira/browse/YARN-4491 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.8.0 Reporter: Steve Loughran Priority: Minor although you can filter the list of yarn applications using the --appTypes option; you can't use application tags. For finding applications on large processes, adding a --tag option would allow users to be more selective example: {code} yarn list --appTypes SPARK --tag production {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4490) RM restart the finished app shows wrong Diagnostics status
[ https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065715#comment-15065715 ] Mohammad Shahid Khan commented on YARN-4490: Hi [~Naganarasimha G R], this is not related to YARN-3946. My intention is that if the application is already finished, then the "Attempt recovered after RM restart" message is not needed at recovery time. > RM restart the finished app shows wrong Diagnostics status > -- > > Key: YARN-4490 > URL: https://issues.apache.org/jira/browse/YARN-4490 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > RM restart the finished app shows wrong Diagnostics status. > Preconditions: > RM recovery enable true. > Steps: > 1. run an application, wait application is finished. > 2. Restart the RM > 3. Check the application status is RM web UI > Issue: > Check the Diagnostic message: Attempt recovered after RM restart. > Expected: > The Diagnostic message should be available only for the application waiting > for allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4490) RM restart the finished app shows wrong Diagnostics status
[ https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065713#comment-15065713 ] Naganarasimha G R commented on YARN-4490: - Hi [~mohdshahidkhan], I hope you are checking against the latest patch of YARN-3946; as per it, I have removed the status message on recovery! bq. Not necessary. Diagnostics are not only meant to tell if app is waiting for allocation Well, midway we narrowed the focus to only hold the status of the app in the ACCEPTED state (till it registers with the RM), as there might otherwise be too many updates. There were other debug JIRAs to capture information for the other states, and since App/AppAttempt had no state to expose this information until registration, this JIRA focused on providing diagnostics for that part. > RM restart the finished app shows wrong Diagnostics status > -- > > Key: YARN-4490 > URL: https://issues.apache.org/jira/browse/YARN-4490 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > RM restart the finished app shows wrong Diagnostics status. > Preconditions: > RM recovery enable true. > Steps: > 1. run an application, wait application is finished. > 2. Restart the RM > 3. Check the application status is RM web UI > Issue: > Check the Diagnostic message: Attempt recovered after RM restart. > Expected: > The Diagnostic message should be available only for the application waiting > for allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4490) RM restart the finished app shows wrong Diagnostics status
[ https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065706#comment-15065706 ] Mohammad Shahid Khan commented on YARN-4490: Yes, [~Varun Saxena]; if the application is recovered after it has finished, the diagnostic message does not add any value. And the message "Attempt recovered after RM restart" is also confusing for an app which has already finished. > RM restart the finished app shows wrong Diagnostics status > -- > > Key: YARN-4490 > URL: https://issues.apache.org/jira/browse/YARN-4490 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > RM restart the finished app shows wrong Diagnostics status. > Preconditions: > RM recovery enable true. > Steps: > 1. run an application, wait application is finished. > 2. Restart the RM > 3. Check the application status is RM web UI > Issue: > Check the Diagnostic message: Attempt recovered after RM restart. > Expected: > The Diagnostic message should be available only for the application waiting > for allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4490) RM restart the finished app shows wrong Diagnostics status
[ https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065701#comment-15065701 ] Varun Saxena commented on YARN-4490: You mean whats the need of attempt diagnostics if app has finished ? > RM restart the finished app shows wrong Diagnostics status > -- > > Key: YARN-4490 > URL: https://issues.apache.org/jira/browse/YARN-4490 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > RM restart the finished app shows wrong Diagnostics status. > Preconditions: > RM recovery enable true. > Steps: > 1. run an application, wait application is finished. > 2. Restart the RM > 3. Check the application status is RM web UI > Issue: > Check the Diagnostic message: Attempt recovered after RM restart. > Expected: > The Diagnostic message should be available only for the application waiting > for allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4490) RM restart the finished app shows wrong Diagnostics status
[ https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065696#comment-15065696 ] Varun Saxena commented on YARN-4490: bq. The Diagnostic message should be available only for the application waiting for allocation. Not necessary. Diagnostics are not only meant to tell if app is waiting for allocation. YARN-3946 merely added to current diagnostics. > RM restart the finished app shows wrong Diagnostics status > -- > > Key: YARN-4490 > URL: https://issues.apache.org/jira/browse/YARN-4490 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > RM restart the finished app shows wrong Diagnostics status. > Preconditions: > RM recovery enable true. > Steps: > 1. run an application, wait application is finished. > 2. Restart the RM > 3. Check the application status is RM web UI > Issue: > Check the Diagnostic message: Attempt recovered after RM restart. > Expected: > The Diagnostic message should be available only for the application waiting > for allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information
[ https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065688#comment-15065688 ] Sunil G commented on YARN-4290: --- YARN-4352 and YARN-4306 are handling these test failures. All other tests are passing locally. > "yarn nodes -list" should print all nodes reports information > - > > Key: YARN-4290 > URL: https://issues.apache.org/jira/browse/YARN-4290 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0002-YARN-4290.patch, 0003-YARN-4290.patch > > > Currently, "yarn nodes -list" command only shows > - "Node-Id", > - "Node-State", > - "Node-Http-Address", > - "Number-of-Running-Containers" > I think we need to show more information such as used resource, just like > "yarn nodes -status" command. > Maybe we can add a parameter to -list, such as "-show-details" to enable > printing all detailed information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
[ https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065685#comment-15065685 ] Sunil G commented on YARN-4352: --- Hi [~djp]/[~rohithsharma], I analyzed this issue along with YARN-4306. {{MiniYARNCluster}} is used here, and {{YarnClient}} cannot connect to the RM, throwing {{UnknownHostException}}; this causes the timeout in all of these test cases. {{QualifiedHostResolver}} is used in SecurityUtils in this case ({{useIpForTokenService}} is false). I could see that {{/etc/hosts}} has two loopback entries and the second one is the hostname of the machine. Hence the code below ({{InetAddress.getByName(null)}}) will not return the machine hostname; instead it returns "localhost".
{noformat}
// it's a simple host with no dots, ex. "host"
// try the search list, then fallback to exact host
InetAddress loopback = InetAddress.getByName(null);
if (host.equalsIgnoreCase(loopback.getHostName())) {
  addr = InetAddress.getByAddress(host, loopback.getAddress());
} else {
  addr = getByNameWithSearch(host);
  if (addr == null) {
    addr = getByExactName(host);
  }
}
{noformat}
I have provided a more detailed comment in [YARN-4306-Analysis|https://issues.apache.org/jira/browse/YARN-4306?focusedCommentId=15030122&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15030122], and possible solutions are: 1. change the loopback configuration in {{/etc/hosts}} on the Jenkins machine to put the hostname first in the list; 2. use DNS on the Jenkins machine; 3. change {{SecurityUtils#getByExactName}} to check the given hostname itself before trying hostname + ".". 
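To make the resolution order concrete, here is a minimal, self-contained Java sketch (purely illustrative; {{resolveShortHost}} is a hypothetical stand-in, not Hadoop's actual resolver) of the fallback behaviour quoted above:

```java
// Illustrative sketch of the resolution order in the snippet above.
// resolveShortHost is a hypothetical name, not Hadoop's actual API.
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LoopbackCheck {

    static InetAddress resolveShortHost(String host) throws UnknownHostException {
        // getByName(null) is specified to return a loopback address; the
        // hostname it reports comes from the local hosts configuration.
        InetAddress loopback = InetAddress.getByName(null);
        if (host.equalsIgnoreCase(loopback.getHostName())) {
            // Fast path: the short hostname matches the loopback entry.
            return InetAddress.getByAddress(host, loopback.getAddress());
        }
        // Otherwise fall through to a normal lookup (stand-in for the
        // search-list / exact-name lookups in the quoted code), which can
        // throw UnknownHostException when no DNS is available.
        return InetAddress.getByName(host);
    }

    public static void main(String[] args) throws Exception {
        InetAddress loopback = InetAddress.getByName(null);
        // Always a loopback address, regardless of /etc/hosts ordering.
        System.out.println(loopback.isLoopbackAddress()); // prints true
        // Whether this name matches the machine's hostname depends on which
        // loopback entry comes first in /etc/hosts.
        System.out.println(loopback.getHostName());
    }
}
```

When {{InetAddress.getByName(null)}} reports "localhost" rather than the machine's hostname (as happens with the duplicate loopback entries described above), the fast loopback branch is skipped and resolution falls through to the search/exact lookup, which fails without DNS.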
> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient > > > Key: YARN-4352 > URL: https://issues.apache.org/jira/browse/YARN-4352 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Sunil G > > From > https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt, > we can see the tests in TestYarnClient, TestAMRMClient and TestNMClient get > timeout which can be reproduced locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4490) RM restart the finished app shows wrong Diagnostics status
[ https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065676#comment-15065676 ] Mohammad Shahid Khan commented on YARN-4490: In my view, for finished applications the diagnostic message could be empty. > RM restart the finished app shows wrong Diagnostics status > -- > > Key: YARN-4490 > URL: https://issues.apache.org/jira/browse/YARN-4490 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > RM restart the finished app shows wrong Diagnostics status. > Preconditions: > RM recovery enable true. > Steps: > 1. run an application, wait application is finished. > 2. Restart the RM > 3. Check the application status is RM web UI > Issue: > Check the Diagnostic message: Attempt recovered after RM restart. > Expected: > The Diagnostic message should be available only for the application waiting > for allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4490) RM restart the finished app shows wrong Diagnostics status
[ https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065675#comment-15065675 ] Mohammad Shahid Khan commented on YARN-4490: Hi [~Naganarasimha G R], any thoughts? > RM restart the finished app shows wrong Diagnostics status > -- > > Key: YARN-4490 > URL: https://issues.apache.org/jira/browse/YARN-4490 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > RM restart the finished app shows wrong Diagnostics status. > Preconditions: > RM recovery enable true. > Steps: > 1. run an application, wait application is finished. > 2. Restart the RM > 3. Check the application status is RM web UI > Issue: > Check the Diagnostic message: Attempt recovered after RM restart. > Expected: > The Diagnostic message should be available only for the application waiting > for allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4490) RM restart the finished app shows wrong Diagnostics status
Mohammad Shahid Khan created YARN-4490: -- Summary: RM restart the finished app shows wrong Diagnostics status Key: YARN-4490 URL: https://issues.apache.org/jira/browse/YARN-4490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan RM restart the finished app shows wrong Diagnostics status. Preconditions: RM recovery enable true. Steps: 1. run an application, wait application is finished. 2. Restart the RM 3. Check the application status is RM web UI Issue: Check the Diagnostic message: Attempt recovered after RM restart. Expected: The Diagnostic message should be available only for the application waiting for allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
[ https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-4352: - Assignee: Sunil G > Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient > > > Key: YARN-4352 > URL: https://issues.apache.org/jira/browse/YARN-4352 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Sunil G > > From > https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt, > we can see the tests in TestYarnClient, TestAMRMClient and TestNMClient get > timeout which can be reproduced locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)