[jira] [Commented] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.
[ https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229715#comment-15229715 ] Hadoop QA commented on YARN-4849: - (!) A patch to the testing environment has been detected. Re-executing against the patched versions to perform further tests. The console is at https://builds.apache.org/job/PreCommit-YARN-Build/10981/console in case of problems. > [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add > licenses. > --- > > Key: YARN-4849 > URL: https://issues.apache.org/jira/browse/YARN-4849 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4849-YARN-3368.1.patch, > YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, > YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, > YARN-4849-YARN-3368.6.patch, YARN-4849-YARN-3368.7.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.
[ https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4849: - Attachment: YARN-4849-YARN-3368.7.patch Attached ver.7 patch, addressed all comments from [~sunilg]. Thanks for review! Please let me know your thoughts on latest patch. > [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add > licenses. > --- > > Key: YARN-4849 > URL: https://issues.apache.org/jira/browse/YARN-4849 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4849-YARN-3368.1.patch, > YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, > YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, > YARN-4849-YARN-3368.6.patch, YARN-4849-YARN-3368.7.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes
[ https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229688#comment-15229688 ] Rohith Sharma K S commented on YARN-4794: - Thanks [~jianhe] for working on the patch. Nice issue!!:-) Making it clear to folks before giving +1: since NMClientImpl is annotated as @Private, removing a protected method should not be a compatibility issue. +1 LGTM; if there are no objections I will commit it EOD > Distributed shell app gets stuck on stopping containers after App completes > --- > > Key: YARN-4794 > URL: https://issues.apache.org/jira/browse/YARN-4794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-4794.1.patch > > > Distributed shell app gets stuck on stopping containers after App completes > with the following exception > {code:title = app log} > 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application > completed. Stopping running containers > 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to > the server : java.nio.channels.ClosedByInterruptException > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3196) [Compatibility] Make TS next gen be compatible with the current TS
[ https://issues.apache.org/jira/browse/YARN-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3196: -- Labels: (was: yarn-2928-1st-milestone) > [Compatibility] Make TS next gen be compatible with the current TS > -- > > Key: YARN-3196 > URL: https://issues.apache.org/jira/browse/YARN-3196 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Junping Du > > File a jira to make sure that we don't forget to be compatible with the > current TS, such that we can smoothly move users to new TS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4821) have a separate NM timeline publishing interval
[ https://issues.apache.org/jira/browse/YARN-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229677#comment-15229677 ] Sangjin Lee commented on YARN-4821: --- I think it would be good if we can get this in, as long as it is not too complicated to implement. What do you think? > have a separate NM timeline publishing interval > --- > > Key: YARN-4821 > URL: https://issues.apache.org/jira/browse/YARN-4821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > Currently the interval with which NM publishes container CPU and memory > metrics is tied to {{yarn.nodemanager.resource-monitor.interval-ms}} whose > default is 3 seconds. This is too aggressive. > There should be a separate configuration that controls how often > {{NMTimelinePublisher}} publishes container metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
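The separate-interval idea above amounts to a simple fallback lookup. Below is only an illustration, using java.util.Properties in place of Hadoop's Configuration; the yarn.nodemanager.timeline-publisher.interval-ms key is a hypothetical name for the proposed property, not an existing YARN configuration key.

```java
import java.util.Properties;

public class PublishIntervalSketch {
    // Existing NM property (default 3000 ms) the publisher is tied to today.
    static final String MONITOR_KEY =
        "yarn.nodemanager.resource-monitor.interval-ms";
    // Hypothetical new property for a separate publishing interval.
    static final String PUBLISH_KEY =
        "yarn.nodemanager.timeline-publisher.interval-ms";

    // Use the dedicated publishing interval when configured; otherwise fall
    // back to the resource-monitor interval, preserving current behavior.
    static long publishIntervalMs(Properties conf) {
        long monitorMs = Long.parseLong(conf.getProperty(MONITOR_KEY, "3000"));
        return Long.parseLong(
            conf.getProperty(PUBLISH_KEY, Long.toString(monitorMs)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(publishIntervalMs(conf)); // falls back to 3000
        conf.setProperty(PUBLISH_KEY, "10000");
        System.out.println(publishIntervalMs(conf)); // dedicated interval wins
    }
}
```

With this shape the default deployment keeps today's 3-second cadence, and only clusters that set the new key publish less aggressively.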
[jira] [Commented] (YARN-3959) Store application related configurations in Timeline Service v2
[ https://issues.apache.org/jira/browse/YARN-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229676#comment-15229676 ] Sangjin Lee commented on YARN-3959: --- I think it would be nice if we can get this in. [~varun_saxena]? > Store application related configurations in Timeline Service v2 > --- > > Key: YARN-3959 > URL: https://issues.apache.org/jira/browse/YARN-3959 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > > We already have a configuration field in the HBase schema for the application entity. > We need to make sure the AM writes it out when it gets launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229673#comment-15229673 ] Sangjin Lee commented on YARN-4736: --- Agreed. Please feel free to remove the label. > Issues with HBaseTimelineWriterImpl > --- > > Key: YARN-4736 > URL: https://issues.apache.org/jira/browse/YARN-4736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Vrushali C >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: NM_Hang_hbase1.0.3.tar.gz, hbaseException.log, > threaddump.log > > > Faced some issues while running ATSv2 in a single-node Hadoop cluster, on the same node where HBase was launched with embedded ZooKeeper. > # Due to some NPE issues I could see the NM trying to shut down, but the NM daemon process did not complete due to locks. > # Got some exceptions related to HBase after the application finished execution successfully. > Will attach logs and the trace for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229595#comment-15229595 ] Naganarasimha G R commented on YARN-3971: - [~bibinchundatt], thanks for reopening this jira, and yes, I agree with your analysis; it's wrong to handle it this way. So what approach do you have in mind? I could think of having a flag in CommonNodeLabelsManager which is set before calling initNodeLabelStore and reset after the call finishes. Thoughts? Also, this time we need to correct the test case properly too. > Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel > recovery > -- > > Key: YARN-3971 > URL: https://issues.apache.org/jira/browse/YARN-3971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Fix For: 2.8.0 > > Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, > 0003-YARN-3971.patch, 0004-YARN-3971.patch, 0005-YARN-3971.patch > > > Steps to reproduce > # Create label x,y > # Delete label x,y > # Create label x,y add capacity scheduler xml for labels x and y too > # Restart RM > > Both RMs will become Standby, since the below exception is thrown on {{FileSystemNodeLabelsStore#recover}} > {code} > 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in > state STARTED; cause: java.io.IOException: Cannot remove label=x, because > queue=a1 is using this label. Please remove label on queue before remove the > label > java.io.IOException: Cannot remove label=x, because queue=a1 is using this > label.
Please remove label on queue before remove the label > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
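The flag approach proposed in the comment above could look roughly like the following. This is a minimal stand-alone sketch of the idea, not the actual CommonNodeLabelsManager/RMNodeLabelsManager code; all class and method names here are simplified stand-ins.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LabelsManagerSketch {
    private final Set<String> labels = new HashSet<>();
    private final Set<String> labelsUsedByQueues = new HashSet<>();
    // Flag set before replaying the store and reset after the replay finishes,
    // mirroring the "set before initNodeLabelStore, reset after" proposal.
    private volatile boolean inRecovery = false;

    void addLabel(String label) { labels.add(label); }
    void markUsedByQueue(String label) { labelsUsedByQueues.add(label); }

    void removeLabel(String label) {
        // During recovery we replay mirror-store operations that were legal
        // when first applied, so the queue-usage check must be skipped.
        if (!inRecovery && labelsUsedByQueues.contains(label)) {
            throw new IllegalStateException(
                "Cannot remove label=" + label + ", a queue is using this label");
        }
        labels.remove(label);
    }

    // Stands in for initNodeLabelStore() -> store.recover().
    void recover(List<Runnable> storedOps) {
        inRecovery = true;
        try {
            for (Runnable op : storedOps) {
                op.run();
            }
        } finally {
            inRecovery = false; // reset even if replay fails
        }
    }

    boolean hasLabel(String label) { return labels.contains(label); }
}
```

The point of the flag is that the replayed create/delete sequence from the store is trusted as-is, while the queue-usage invariant is still enforced for every operation issued after recovery completes.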
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229557#comment-15229557 ] Tao Jie commented on YARN-4855: --- [~sunilg] [~leftnoteasy] [~naganarasimha] Thanks for your comments. I agree "--fail-on-unknown-nodes" makes the option clearer. {quote} also one more aspect is does it require a protocol change can we do it in the client side ? {quote} If we do the verification on the client side, the client has to request the full active node list (if we verify on the server side, the node list is already available in memory). My earlier concern was that this would add load if verification were the default behavior. However, if we keep the default behavior as it used to be and add an option for node verification, I think the client side is the more concise place for it. One more consideration: when we use *yarn rmadmin -replaceLabelsOnNode* and add a node label to the wrong nodes, that information seems to leak on the RM. It is stored in RM memory/filesystem, but cannot be found on the web UI or via shell commands. As a result, it is difficult to remove or correct. Maybe it could be fixed in another jira. Please correct me if I am wrong, thanks! > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch > > > Today when we add nodelabels to nodes, it succeeds without any message even if the nodes are not existing NodeManagers in the cluster. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be > denied if the node does not exist. > When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it would > add nodelabels no matter whether the node exists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
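The client-side verification being discussed could be sketched as below. This is purely illustrative, with made-up method names rather than the real RMAdminCLI/ResourceManagerAdministrationProtocol code; the flag corresponds to the proposed "--fail-on-unknown-nodes" option.

```java
import java.util.Map;
import java.util.Set;

public class ReplaceLabelsSketch {
    // Validate node -> label mappings against the active node list, but only
    // when the (proposed) fail-on-unknown-nodes option is passed, so the
    // default behavior stays exactly as it is today.
    static void replaceLabelsOnNode(Map<String, String> nodeToLabel,
                                    Set<String> activeNodes,
                                    boolean failOnUnknownNodes) {
        if (failOnUnknownNodes) {
            for (String node : nodeToLabel.keySet()) {
                if (!activeNodes.contains(node)) {
                    throw new IllegalArgumentException(
                        "Node does not exist: " + node);
                }
            }
        }
        // ... issue the actual ReplaceLabelsOnNode request to the RM here
    }
}
```

Since the check runs only when the flag is set, the extra cost of fetching the active node list is paid only by callers who opt in.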
[jira] [Commented] (YARN-4886) Add HDFS caller context for EntityGroupFSTimelineStore
[ https://issues.apache.org/jira/browse/YARN-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229529#comment-15229529 ] Xuan Gong commented on YARN-4886: - +1 lgtm. pending Jenkins > Add HDFS caller context for EntityGroupFSTimelineStore > -- > > Key: YARN-4886 > URL: https://issues.apache.org/jira/browse/YARN-4886 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4886-trunk.001.patch > > > We need to add a HDFS caller context for the entity group FS storage for > better audit log debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4726) [Umbrella] Allocation reuse for application upgrades
[ https://issues.apache.org/jira/browse/YARN-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229494#comment-15229494 ] Arun Suresh commented on YARN-4726: --- [~kasha], we have posted a slightly scoped-down version of the doc on YARN-4876, to be used as an initial building block for application upgrades, without the full-blown API changes described in YARN-1040. We plan to pursue YARN-4876 as phase 1 of the general-purpose problem of decoupling allocation from container life cycle. > [Umbrella] Allocation reuse for application upgrades > > > Key: YARN-4726 > URL: https://issues.apache.org/jira/browse/YARN-4726 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli > > See the overview doc at YARN-4692; copying the sub-section here to track all related > efforts. > Once auto-restart of containers is taken care of (YARN-4725), we need to > address what I believe is the second most important reason for service > containers to restart: upgrades. Once a service is running on YARN, given the way > container allocation-lifecycle works, any time the container exits, YARN > will reclaim the resources. During an upgrade, with a multitude of other > applications running in the system, giving up and getting back the resources > allocated to the service is hard to manage. Things like NodeLabels in YARN > help this cause but are not straightforward to use for the > app-specific usecases. > We need a first-class way of letting an application reuse the same > resource allocation for multiple launches of the processes inside the > container. This is done by decoupling the allocation lifecycle and the process > lifecycle. > The JIRA YARN-1040 initiated this conversation. We need two things here: > - (1) (Task) the ApplicationMaster should be able to use the same > container allocation and issue multiple startContainer requests to the > NodeManager. > - (2) (Task) To support the upgrade of the ApplicationMaster itself, > clients should be able to inform YARN to restart the AM within the same > allocation but with new bits. > The JIRAs YARN-3417 and YARN-4470 talk about the second task above ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2883) Queuing of container requests in the NM
[ https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229489#comment-15229489 ] Hadoop QA commented on YARN-2883: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 4m 17s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 13 new + 425 unchanged - 4 fixed = 438 total (was 429) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 9s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 9s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} |
[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up
[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229486#comment-15229486 ] sandflee commented on YARN-4924: thanks [~nroberts]. Another thought: it seems it's not necessary for the NM to store the FINISH_APP event, since the RM will check the running apps when the NM registers; we just need to make sure that when the NM registers with the RM, it has recovered all containers as best it can, yes? [~jlowe] > NM recovery race can lead to container not cleaned up > - > > Key: YARN-4924 > URL: https://issues.apache.org/jira/browse/YARN-4924 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.7.2 >Reporter: Nathan Roberts > > It's probably a small window, but we observed a case where the NM crashed and > then a container was not properly cleaned up during recovery. > I will add details in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229478#comment-15229478 ] Sunil G commented on YARN-4855: --- bq.he need not worry whether the node is temporarily down and he can just apply the label mappings Thanks [~Naganarasimha] for the detailed clarification. This makes sense to me. bq.How about renaming it to "--fail-on-unknown-nodes" +1 for this, as we all agree that the current naming is confusing. Thanks [~leftnoteasy] and [~Naganarasimha]. > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch > > > Today when we add nodelabels to nodes, it succeeds without any message even if the nodes are not existing NodeManagers in the cluster. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be > denied if the node does not exist. > When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it would > add nodelabels no matter whether the node exists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt reopened YARN-3971: > Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel > recovery > -- > > Key: YARN-3971 > URL: https://issues.apache.org/jira/browse/YARN-3971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Fix For: 2.8.0 > > Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, > 0003-YARN-3971.patch, 0004-YARN-3971.patch, 0005-YARN-3971.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229476#comment-15229476 ] Bibin A Chundatt commented on YARN-3971: [~wangda]/[~Naganarasimha] The issue still exists; can we reopen the jira so that I can provide an updated patch? The check added last time tested for ServiceState=STARTED. But the service state will always be STARTED, not INIT, when {{FileSystemNodeLabelsStore#recover}} is called, since the call comes from AbstractService.start() and the state is set to STARTED before serviceStart() is done.
{noformat}
synchronized (stateChangeLock) {
  if (stateModel.enterState(STATE.STARTED) != STATE.STARTED) {
    try {
      startTime = System.currentTimeMillis();
      serviceStart();
    }
{noformat}
{{stateModel.enterState(STATE.STARTED)}} directly sets the service state to STARTED. > Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel > recovery > -- > > Key: YARN-3971 > URL: https://issues.apache.org/jira/browse/YARN-3971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Fix For: 2.8.0 > > Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, > 0003-YARN-3971.patch, 0004-YARN-3971.patch, 0005-YARN-3971.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
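The ordering described in the comment above can be seen with a tiny model of AbstractService.start(). This is not the real Hadoop class, just a stripped-down illustration: the state model enters STARTED before the serviceStart() hook runs, so recovery code called from serviceStart() always observes STARTED, never the INITED state.

```java
public class ServiceStateSketch {
    enum State { INITED, STARTED }

    private State state = State.INITED;
    State stateSeenDuringRecover;

    // Mirrors AbstractService.start(): the transition to STARTED happens
    // before the subclass hook serviceStart() is invoked.
    void start() {
        state = State.STARTED;  // stands in for stateModel.enterState(STARTED)
        serviceStart();
    }

    // Mirrors CommonNodeLabelsManager.serviceStart() -> store.recover():
    // any state check made from here already sees STARTED.
    void serviceStart() {
        stateSeenDuringRecover = state;
    }
}
```

This is why a "skip the queue check unless the service is STARTED" guard can never fire during recovery, and why an explicit recovery flag is needed instead.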
[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression
[ https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229472#comment-15229472 ] Sunil G commented on YARN-4925: --- Thanks [~bibinchundatt]. Yes, NODE_LOCAL and OFF-SWITCH should not be set differently, so a change in ANY can reset the other two types, which is fine. I was just confirming this point. No issue :) > ContainerRequest in AMRMClient, application should be able to specify > nodes/racks together with nodeLabelExpression > --- > > Key: YARN-4925 > URL: https://issues.apache.org/jira/browse/YARN-4925 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Currently with node labels, AMRMClient is not able to specify node labels > together with Node/Rack requests. For applications like Spark, NODE_LOCAL requests > cannot be made with a label expression. > As per the check in {{AMRMClientImpl#checkNodeLabelExpression}} > {noformat} > // Don't allow specify node label against ANY request > if ((containerRequest.getRacks() != null && > (!containerRequest.getRacks().isEmpty())) > || > (containerRequest.getNodes() != null && > (!containerRequest.getNodes().isEmpty()))) { > throw new InvalidContainerRequestException( > "Cannot specify node label with rack and node"); > } > {noformat} > In {{AppSchedulingInfo#updateResourceRequests}} we reset labels to those of the > OFF-SWITCH request. > The above check is not required for a ContainerRequest ask. /cc [~wangda] thank > you for confirming -- This message was sent by Atlassian JIRA (v6.3.4#6332)
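The guard quoted in the description can be sketched as a standalone check. This is a minimal sketch, not the actual AMRMClientImpl code; the class and method signatures are hypothetical:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the checkNodeLabelExpression guard discussed above:
// a request carrying a node label expression together with explicit node/rack
// hints is rejected, while an ANY-level request with a label passes.
public class NodeLabelCheckSketch {
    static void checkNodeLabelExpression(List<String> nodes, List<String> racks,
                                         String labelExpression) {
        if (labelExpression == null) {
            return; // no label expression, nothing to validate
        }
        boolean hasRacks = racks != null && !racks.isEmpty();
        boolean hasNodes = nodes != null && !nodes.isEmpty();
        if (hasRacks || hasNodes) {
            // mirrors the InvalidContainerRequestException in the quoted code
            throw new IllegalArgumentException(
                "Cannot specify node label with rack and node");
        }
    }

    public static void main(String[] args) {
        // ANY-level request with a label: allowed
        checkNodeLabelExpression(null, null, "labelA");
        // node-local request with a label: rejected under the current check
        boolean rejected = false;
        try {
            checkNodeLabelExpression(Collections.singletonList("node1"), null, "labelA");
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        System.out.println("rejected=" + rejected);
    }
}
```

This is the behavior the JIRA proposes to relax, so that NODE_LOCAL asks (e.g. from Spark) can carry a label expression.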
[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression
[ https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229469#comment-15229469 ] Bibin A Chundatt commented on YARN-4925: [~sunilg] and [~Naganarasimha] Thank you for looking into the issue. The check is for {{ContainerRequest}}, and IIUC from the discussion with [~wangda], the check was done to *properly account how much resource an app requests for each partition*, on the assumption that labels for NODE_LOCAL and OFF-SWITCH shouldn't be set differently, which cannot be violated anyway when setting a single {{ContainerRequest}}. {noformat} public static class ContainerRequest { final Resource capability; final List<String> nodes; final List<String> racks; final Priority priority; final boolean relaxLocality; final String nodeLabelsExpression; } {noformat} Consider a node label partition with 100 nodes where a container needs to be started on the node holding the data; this is currently not allowed. The issue we faced was for Spark, where containers needed to be NODE_LOCAL and this was not allowed. cc/ [~wangda] > ContainerRequest in AMRMClient, application should be able to specify > nodes/racks together with nodeLabelExpression > --- > > Key: YARN-4925 > URL: https://issues.apache.org/jira/browse/YARN-4925 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Currently with node labels, AMRMClient is not able to specify node labels > together with Node/Rack requests. For applications like Spark, NODE_LOCAL requests > cannot be made with a label expression. 
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}} > {noformat} > // Don't allow specify node label against ANY request > if ((containerRequest.getRacks() != null && > (!containerRequest.getRacks().isEmpty())) > || > (containerRequest.getNodes() != null && > (!containerRequest.getNodes().isEmpty()))) { > throw new InvalidContainerRequestException( > "Cannot specify node label with rack and node"); > } > {noformat} > In {{AppSchedulingInfo#updateResourceRequests}} we reset labels to those of the > OFF-SWITCH request. > The above check is not required for a ContainerRequest ask. /cc [~wangda] thank > you for confirming -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes
[ https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229466#comment-15229466 ] Hadoop QA commented on YARN-4794: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 21s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_77. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 39s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 146m 33s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.client.api.impl.TestAMRMProxy | | | hadoop.yarn.client.TestGetGroups | | JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.client.api.impl.TestAMRMProxy | | | hadoop.yarn.client.TestGetGroups | | JDK v1.7.0_95
[jira] [Commented] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recover is enabled.
[ https://issues.apache.org/jira/browse/YARN-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229410#comment-15229410 ] Vinod Kumar Vavilapalli commented on YARN-4552: --- [~djp], let me know if you can update this soon enough for 2.7.3, i.e. in a couple of days. Otherwise, we can simply move this to 2.7.4 or 2.8 in a few weeks. > NM ResourceLocalizationService should check and initialize local filecache > dir (and log dir) even if NM recover is enabled. > --- > > Key: YARN-4552 > URL: https://issues.apache.org/jira/browse/YARN-4552 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4552-v2.patch, YARN-4552.patch > > > In some cases, users clean up the localized file cache for debugging/troubleshooting > purposes during NM downtime. However, after bringing the NM back (with > recovery enabled), job submission can fail with an exception like > below: > {noformat} > Diagnostics: java.io.FileNotFoundException: File > /disk/12/yarn/local/filecache does not exist. > {noformat} > This is because we only create the filecache dir when recovery is not enabled > while ResourceLocalizationService gets initialized/started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
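The fix direction in the summary above can be sketched as follows: ensure the local filecache dir exists unconditionally at service start, whether or not NM recovery is enabled. The class name and path below are illustrative, not the actual ResourceLocalizationService code:

```java
import java.io.File;

// Minimal sketch: re-create the local filecache dir if an admin wiped it
// during NM downtime, regardless of the recovery setting. Names are
// hypothetical, not Hadoop's actual implementation.
public class FilecacheInitSketch {
    static boolean ensureLocalDir(File dir) {
        // idempotent: true if the dir already exists or was just created
        return dir.isDirectory() || dir.mkdirs();
    }

    public static void main(String[] args) {
        File filecache = new File(System.getProperty("java.io.tmpdir"),
                "yarn-local" + File.separator + "filecache");
        System.out.println("ready=" + ensureLocalDir(filecache));
    }
}
```

The key design point is idempotency: the check is cheap and safe to run on every start, so it need not be gated on the recovery flag at all.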
[jira] [Updated] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3461: -- Attachment: YARN-3461-YARN-2928.03.patch Posted patch v.3. {quote} I am not sure why the earlier code was trying to set these flags in the AMLauncher.setupTokens method; would it be better to set it in the caller createAMContainerLaunchContext, or in a separate method, or to change the method name to be more meaningful? Thoughts? {quote} That's a good suggestion, and I made that change in the latest patch. I also thought that the code was buried in an awkward place. I refactored the code out into a separate method ({{setFlowContext()}}). {quote} I think instead of calling setFlowTags twice, maybe we can have an additional parameter indicating the default value, along with pushing tag.split(":", 2) inside setFlowTags {quote} We kind of need it in the form it's in. I know it seems somewhat awkward that we call it twice for a given tag. However, note that the flow information may not be present in the tags (that's why we need the defaults in the first place!), in which case the inner {{setFlowTags()}} will not be invoked. Another approach would be to call {{setFlowTags()}} with the default *after* iterating over the tags, but that also adds more code to handle it, so it's not clear things become simpler. The {{setFlowTags()}} method is pretty cheap (a simple {{Map.put()}} call really), so I went for the simplest form of the implementation. Let me know if that works. 
> Consolidate flow name/version/run defaults > -- > > Key: YARN-3461 > URL: https://issues.apache.org/jira/browse/YARN-3461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Attachments: YARN-3461-YARN-2928.01.patch, > YARN-3461-YARN-2928.02.patch, YARN-3461-YARN-2928.03.patch > > > In YARN-3391, it's not resolved what should be the defaults for flow > name/version/run. Let's continue the discussion here and unblock YARN-3391 > from moving forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
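The defaults-then-override pattern described in the comment above can be sketched roughly like this. The class, tag names, and map-based context are illustrative assumptions, not the actual AMLauncher code:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: set the flow defaults first, then let any flow tags
// actually present on the application override them. If the tags carry no
// flow information, the defaults simply survive.
public class FlowTagSketch {
    static void setFlowTag(Map<String, String> ctx, String tag) {
        // split on the first ':' only, so values may themselves contain ':'
        String[] parts = tag.split(":", 2);
        if (parts.length == 2 && !parts[1].isEmpty()) {
            ctx.put(parts[0], parts[1]);
        }
    }

    public static void main(String[] args) {
        Map<String, String> flowContext = new HashMap<>();
        // defaults first: flow info may be absent from the tags entirely
        setFlowTag(flowContext, "TIMELINE_FLOW_NAME_TAG:defaultFlowName");
        // then apply whatever tags the app actually carries; this overrides
        List<String> appTags = Arrays.asList("TIMELINE_FLOW_NAME_TAG:myFlow");
        for (String tag : appTags) {
            setFlowTag(flowContext, tag);
        }
        System.out.println(flowContext.get("TIMELINE_FLOW_NAME_TAG"));
    }
}
```

This illustrates why calling the setter twice for a given tag is harmless: the second call is a cheap {{Map.put()}} that overwrites the default only when the tag is actually present.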
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229408#comment-15229408 ] Wangda Tan commented on YARN-4855: -- Thanks [~Tao Jie] and discussions from [~Naganarasimha], [~sunilg]. [Comment|https://issues.apache.org/jira/browse/YARN-4855?focusedCommentId=15227603=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15227603] looks doable and backward compatible. However, \-checkNode seems very confusing. How about renaming it to "--fail-on-unknown-nodes", which is longer but clearer? Thoughts? > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch > > > Today when we add node labels to nodes, it succeeds without any message even if the > nodes are not existing NodeManagers in the cluster. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be > denied if the node does not exist. > When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it would > add node labels regardless of whether the node exists. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4855: - Assignee: Tao Jie > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch > > > Today when we add node labels to nodes, it succeeds without any message even if the > nodes are not existing NodeManagers in the cluster. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be > denied if the node does not exist. > When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it would > add node labels regardless of whether the node exists. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229384#comment-15229384 ] Wangda Tan commented on YARN-2694: -- [~zhz], The major reason for only supporting node label specification in the off-switch request is: an existing node label is actually a node partition, and admins can limit how much resource can be used by each queue in each partition. So if a rack/node-level request has a different partition from the off-switch request, the scheduler cannot correctly calculate how much resource the app requested in each partition: all existing logic that calculates an app's pending resources relies on the off-switch request. This patch doesn't allow specifying any label at the rack/node level; all rack/node requests containing a label are directly rejected. That was not correct; we fixed it in YARN-4140, which forces all requests under the off-switch request to have the same label. You can take a look at the discussions on YARN-4140. If you want rack/node requests to specify a different label, that will be available after YARN-3409 (node constraints). Node constraints will be used to tag NMs; no further enforcement (such as ACLs, resource limits, etc.) will be added to node constraints. 
> Ensure only single node labels specified in resource request / host, and node > label expression only specified when resourceName=ANY > --- > > Key: YARN-2694 > URL: https://issues.apache.org/jira/browse/YARN-2694 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, > YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, > YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, > YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, > YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, > YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, > YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, > YARN-2694-20150205-3.patch, YARN-2694-branch-2.6.1.txt > > > Currently, node label expression supporting in capacity scheduler is partial > completed. Now node label expression specified in Resource Request will only > respected when it specified at ANY level. And a ResourceRequest/host with > multiple node labels will make user limit, etc. computation becomes more > tricky. > Now we need temporarily disable them, changes include, > - AMRMClient > - ApplicationMasterService > - RMAdminCLI > - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2883) Queuing of container requests in the NM
[ https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2883: - Attachment: YARN-2883-trunk.010.patch Thanks [~asuresh] for the feedback. Addressing your comments and attaching new patch. > Queuing of container requests in the NM > --- > > Key: YARN-2883 > URL: https://issues.apache.org/jira/browse/YARN-2883 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-2883-trunk.004.patch, YARN-2883-trunk.005.patch, > YARN-2883-trunk.006.patch, YARN-2883-trunk.007.patch, > YARN-2883-trunk.008.patch, YARN-2883-trunk.009.patch, > YARN-2883-trunk.010.patch, YARN-2883-yarn-2877.001.patch, > YARN-2883-yarn-2877.002.patch, YARN-2883-yarn-2877.003.patch, > YARN-2883-yarn-2877.004.patch > > > We propose to add a queue in each NM, where queueable container requests can > be held. > Based on the available resources in the node and the containers in the queue, > the NM will decide when to allow the execution of a queued container. > In order to ensure the instantaneous start of a guaranteed-start container, > the NM may decide to pre-empt/kill running queueable containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229313#comment-15229313 ] Zhe Zhang commented on YARN-2694: - Hi [~jianhe], [~leftnoteasy], I have a few questions about this change: bq. Currently, node label expression supporting in capacity scheduler is partial completed. Now node label expression specified in Resource Request will only respected when it specified at ANY level. Could you elaborate a bit on this? Is it because it's hard to satisfy both node label and locality requirements? If so, when do you think we will be ready to enable Node or Rack level resource requests? Thanks, > Ensure only single node labels specified in resource request / host, and node > label expression only specified when resourceName=ANY > --- > > Key: YARN-2694 > URL: https://issues.apache.org/jira/browse/YARN-2694 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, > YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, > YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, > YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, > YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, > YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, > YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, > YARN-2694-20150205-3.patch, YARN-2694-branch-2.6.1.txt > > > Currently, node label expression supporting in capacity scheduler is partial > completed. Now node label expression specified in Resource Request will only > respected when it specified at ANY level. And a ResourceRequest/host with > multiple node labels will make user limit, etc. computation becomes more > tricky. 
> Now we need temporarily disable them, changes include, > - AMRMClient > - ApplicationMasterService > - RMAdminCLI > - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4794) Distributed shell app gets stuck on stopping containers after App completes
[ https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4794: -- Attachment: YARN-4794.1.patch > Distributed shell app gets stuck on stopping containers after App completes > --- > > Key: YARN-4794 > URL: https://issues.apache.org/jira/browse/YARN-4794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-4794.1.patch > > > Distributed shell app gets stuck on stopping containers after App completes > with the following exception > {code:title = app log} > 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application > completed. Stopping running containers > 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to > the server : java.nio.channels.ClosedByInterruptException > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4886) Add HDFS caller context for EntityGroupFSTimelineStore
[ https://issues.apache.org/jira/browse/YARN-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229240#comment-15229240 ] Mingliang Liu commented on YARN-4886: - +1 (non-binding) > Add HDFS caller context for EntityGroupFSTimelineStore > -- > > Key: YARN-4886 > URL: https://issues.apache.org/jira/browse/YARN-4886 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4886-trunk.001.patch > > > We need to add a HDFS caller context for the entity group FS storage for > better audit log debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4886) Add HDFS caller context for EntityGroupFSTimelineStore
[ https://issues.apache.org/jira/browse/YARN-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4886: Attachment: YARN-4886-trunk.001.patch Uploaded a simple patch to add HDFS caller context information for ATS storage. > Add HDFS caller context for EntityGroupFSTimelineStore > -- > > Key: YARN-4886 > URL: https://issues.apache.org/jira/browse/YARN-4886 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4886-trunk.001.patch > > > We need to add a HDFS caller context for the entity group FS storage for > better audit log debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4930) Convenience command to perform a work-preserving stop of the nodemanager
Jason Lowe created YARN-4930: Summary: Convenience command to perform a work-preserving stop of the nodemanager Key: YARN-4930 URL: https://issues.apache.org/jira/browse/YARN-4930 Project: Hadoop YARN Issue Type: Improvement Reporter: Jason Lowe Priority: Minor When the nodemanager is not under supervision (yarn.nodemanager.recovery.supervised=false) there may be cases where an admin wants to perform a work-preserving maintenance of the node but the nodemanager stop CLI command will cause the NM to cleanup containers. There should be a CLI command that shuts down the NM while preserving local filesystem and container state to allow a subsequent nodemanager start to recover containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4930) Convenience command to perform a work-preserving stop of the nodemanager
[ https://issues.apache.org/jira/browse/YARN-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229060#comment-15229060 ] Jason Lowe commented on YARN-4930: -- Currently admins can accomplish the task via a kill -9 of the nodemanager process, but it would be better to encapsulate the specific mechanism into a CLI command. > Convenience command to perform a work-preserving stop of the nodemanager > > > Key: YARN-4930 > URL: https://issues.apache.org/jira/browse/YARN-4930 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jason Lowe >Priority: Minor > > When the nodemanager is not under supervision > (yarn.nodemanager.recovery.supervised=false) there may be cases where an > admin wants to perform a work-preserving maintenance of the node but the > nodemanager stop CLI command will cause the NM to cleanup containers. There > should be a CLI command that shuts down the NM while preserving local > filesystem and container state to allow a subsequent nodemanager start to > recover containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)
[ https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229052#comment-15229052 ] Hadoop QA commented on YARN-4630: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 27s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 5m 3s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 19s {color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 17s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 17s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-api in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 47s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 21s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 27s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | |
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229017#comment-15229017 ] Li Lu commented on YARN-3461: - I looked at the patch and felt OK with it. > Consolidate flow name/version/run defaults > -- > > Key: YARN-3461 > URL: https://issues.apache.org/jira/browse/YARN-3461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Attachments: YARN-3461-YARN-2928.01.patch, > YARN-3461-YARN-2928.02.patch > > > In YARN-3391, it's not resolved what should be the defaults for flow > name/version/run. Let's continue the discussion here and unblock YARN-3391 > from moving forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228955 ] Varun Saxena commented on YARN-3461: Ok... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228937 ] Naganarasimha G R commented on YARN-4855: - Thanks [~sunilg] for sharing your views. As I mentioned earlier, IIUC the intent of not validating the existence of the node is to make the admin's life easier: in certain cases he need not worry whether a node is temporarily down and can simply apply the label mappings, so the approaches you mentioned override this behavior. But I can also understand [~Tao Jie]'s concern that if the node (host IP or port) is wrongly configured, it is better to warn beforehand than to erroneously configure the cluster. So there are pros and cons to the existing approach. I am OK with [~Tao Jie]'s approach, but with a better option name (-verifyNode?). One more aspect: does it require a protocol change, or can we do it on the client side? > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch > > > Today when we add nodelabels to nodes, it succeeds without any message even if the nodes are not > existing NodeManagers in the cluster. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be > denied if the node does not exist. > When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it would > add nodelabels no matter whether the node exists. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long
[ https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228934 ] Yufei Gu commented on YARN-4807: [~ka...@cloudera.com], you are right. I misunderstood Hadoop QA's message. I found that the following unit test cases failed because we removed the minimum wait time for attempts, so I will file a JIRA, YARN-4929, for it. - TestAMRestart.testRMAppAttemptFailuresValidityInterval - TestApplicationMasterService.testResourceTypes - TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForFairSche - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForCapacitySche > MockAM#waitForState sleep duration is too long > -- > > Key: YARN-4807 > URL: https://issues.apache.org/jira/browse/YARN-4807 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Yufei Gu > Labels: newbie > Attachments: YARN-4807.001.patch, YARN-4807.002.patch, > YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, > YARN-4807.006.patch, YARN-4807.007.patch > > > MockAM#waitForState sleep duration (500 ms) is too long. Also, there is > significant duplication with MockRM#waitForState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
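The fix discussed here, replacing a long fixed sleep with a short polling interval, can be sketched generically. This is a minimal illustration under assumed names, not the actual MockAM/MockRM code:

```java
import java.util.function.Supplier;

// Hypothetical sketch of a wait-for-state helper: it polls frequently and
// returns as soon as the expected state is observed, instead of sleeping a
// fixed 500 ms per check.
class StateWaiter {
  // Polls currentState every intervalMs until it equals expected or
  // timeoutMs elapses; returns true if the state was reached.
  static boolean waitForState(Supplier<String> currentState, String expected,
      long intervalMs, long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(currentState.get())) {
        return true;  // done early; no full-interval sleep wasted
      }
      Thread.sleep(intervalMs);
    }
    return expected.equals(currentState.get());
  }
}
```

With a small interval (e.g. 10 ms), a test whose state is already correct returns almost immediately, which is the point of shortening the sleep duration.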
[jira] [Created] (YARN-4929) Fix unit test cases that failed because of removing the minimum wait time for attempts.
Yufei Gu created YARN-4929: -- Summary: Fix unit test cases that failed because of removing the minimum wait time for attempts. Key: YARN-4929 URL: https://issues.apache.org/jira/browse/YARN-4929 Project: Hadoop YARN Issue Type: Bug Reporter: Yufei Gu Assignee: Yufei Gu The following unit test cases failed because we removed the minimum wait time for attempts. - TestAMRestart.testRMAppAttemptFailuresValidityInterval - TestApplicationMasterService.testResourceTypes - TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForFairSche - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForCapacitySche -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4562) YARN WebApp ignores the configuration passed to it for keystore settings
[ https://issues.apache.org/jira/browse/YARN-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228924#comment-15228924 ] Sergey Shelukhin commented on YARN-4562: That is true only if ssl-server.xml is present :) Yes, that works. > YARN WebApp ignores the configuration passed to it for keystore settings > > > Key: YARN-4562 > URL: https://issues.apache.org/jira/browse/YARN-4562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: YARN-4562.patch > > > The conf can be passed to WebApps builder, however the following code in > WebApps.java that builds the HttpServer2 object: > {noformat} > if (httpScheme.equals(WebAppUtils.HTTPS_PREFIX)) { > WebAppUtils.loadSslConfiguration(builder); > } > {noformat} > ...results in loadSslConfiguration creating a new Configuration object; the > one that is passed in is ignored, as far as the keystore/etc. settings are > concerned. loadSslConfiguration has another overload with Configuration > parameter that should be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
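The bug pattern described in this issue can be modeled in a few lines. This is a self-contained analogue, not the real HttpServer2/WebApps API: a plain map stands in for a Configuration object, and the point is that the no-arg variant rebuilds defaults while the overload taking the caller's configuration honors its settings:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the YARN-4562 bug pattern (illustrative names only).
class SslConfigModel {
  // Stand-in for "new Configuration()" with default SSL settings loaded.
  static Map<String, String> newConfiguration() {
    Map<String, String> conf = new HashMap<>();
    conf.put("ssl.server.keystore.location", "/etc/default.jks");
    return conf;
  }

  // Buggy shape: always builds a fresh configuration, so any keystore
  // settings the caller supplied are silently ignored.
  static String loadSslConfiguration() {
    return newConfiguration().get("ssl.server.keystore.location");
  }

  // Fixed shape: uses the configuration the caller passed in.
  static String loadSslConfiguration(Map<String, String> conf) {
    return conf.get("ssl.server.keystore.location");
  }
}
```

The fix in the description is exactly this shape change: call the overload that accepts the caller's Configuration instead of the one that constructs a new object.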
[jira] [Commented] (YARN-4810) NM applicationpage cause internal error 500
[ https://issues.apache.org/jira/browse/YARN-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228925 ] Naganarasimha G R commented on YARN-4810: - Thanks [~bibinchundatt] for the patch, lgtm +1. Will wait for some time before committing it. > NM applicationpage cause internal error 500 > --- > > Key: YARN-4810 > URL: https://issues.apache.org/jira/browse/YARN-4810 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4810.patch, 0002-YARN-4810.patch, 1.png, 2.png > > > Use url /node/application/ > *Case 1* > {noformat} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.webapp.dao.AppInfo.<init>(AppInfo.java:45) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:82) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) > at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58) > ... 
44 more > {noformat} > *Case 2* > {noformat} > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > Caused by: java.util.NoSuchElementException > at > com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) > at > org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:131) > at > org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:126) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:79) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) > at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58) > ... 44 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228920 ] Naganarasimha G R commented on YARN-3461: - Thanks for the patch [~sjlee0], a few nits: # I am not sure why the earlier code set these flags in {{AMLauncher.setupTokens}}; would it be better to set them in the caller {{createAMContainerLaunchContext}}, *or* in a separate method, *or* to rename the method to something more meaningful? Thoughts? # Instead of calling {{setFlowTags}} twice, maybe we can add a parameter indicating the default value and push {{tag.split(":", 2)}} inside {{setFlowTags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228878#comment-15228878 ] Marcin Tustin commented on YARN-3216: - [~sunilg] Yeah, YARN-4751 will be VERY nice to have. [~wangda] [~Naganarasimha Garla] My pleasure! Just trying to build up the community's knowledge, and keep it shared. > Max-AM-Resource-Percentage should respect node labels > - > > Key: YARN-3216 > URL: https://issues.apache.org/jira/browse/YARN-3216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Sunil G >Priority: Critical > Fix For: 2.8.0 > > Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, > 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, > 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, > 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch > > > Currently, max-am-resource-percentage considers default_partition only. When > a queue can access multiple partitions, we should be able to compute > max-am-resource-percentage based on that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228874 ] Naganarasimha G R commented on YARN-3216: - Thanks [~marcin.tustin], it is a useful doc. And yes, [~sunilg], it would be ideal to have a fix specific to 2.7.x, as this is a major limitation when using node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228860 ] Sunil G commented on YARN-3216: --- Yes, this is helpful. Thanks Marcin. With the recent progress in YARN-4751, I think I can make the necessary changes here too. I will wait for the dependencies of YARN-4751 to be cleared, then make progress here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228853 ] Wangda Tan commented on YARN-3216: -- Thanks [~marcin.tustin], that is very helpful! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228845 ] Li Lu commented on YARN-3461: - Hi [~varun_saxena] could you please hold a bit so that I can have a look at the patch this afternoon? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228839 ] Marcin Tustin commented on YARN-3216: - I've written up how we worked around this issue here: https://medium.com/handy-tech/practical-capacity-scheduling-with-yarn-28548ae4fb88#.5ihha0oqy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression
[ https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228836 ] Sunil G commented on YARN-4925: --- Thanks [~Naganarasimha Garla] for the clarification. One final doubt: in this case, if we change the labelExpression for the ANY containerRequest, it will replace the expression set earlier by the node/rack-local requests. Correct? > ContainerRequest in AMRMClient, application should be able to specify > nodes/racks together with nodeLabelExpression > --- > > Key: YARN-4925 > URL: https://issues.apache.org/jira/browse/YARN-4925 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Currently with node labels, AMRMClient is not able to specify node labels > with Node/Rack requests. For applications like Spark, NODE_LOCAL requests cannot > be asked for with a label expression. > As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}: > {noformat} > // Don't allow specify node label against ANY request > if ((containerRequest.getRacks() != null && > (!containerRequest.getRacks().isEmpty())) > || > (containerRequest.getNodes() != null && > (!containerRequest.getNodes().isEmpty()))) { > throw new InvalidContainerRequestException( > "Cannot specify node label with rack and node"); > } > {noformat} > In {{AppSchedulingInfo#updateResourceRequests}} we reset labels to that of > OFF-SWITCH. > The above check is not required for the ContainerRequest ask. /cc [~wangda], thank > you for confirming -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression
[ https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228824 ] Naganarasimha G R commented on YARN-4925: - IIUC, [~bibinchundatt]'s solution was to keep the code the same, i.e. set the node label expression present in the containerRequest only for the *any* request; thus the user will be able to specify node/rack locality together with a label expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression
[ https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228803 ] Sunil G commented on YARN-4925: --- Hi [~bibinchundatt], I have one doubt here. After removing this check, a ResourceRequest can have a label expression for NodeLocal, RackLocal, or ANY. With YARN-4140, a label specified in ANY will reset all other ResourceRequests' label expressions (for the same priority). Is this intended, or will it solve the case mentioned in the Spark scenario? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
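The check quoted in the issue description boils down to a small predicate. A simplified, illustrative sketch (not the real AMRMClientImpl) of what the old validation rejects:

```java
import java.util.List;

// Illustrative model of AMRMClientImpl#checkNodeLabelExpression's rule, as
// quoted above: reject a request that combines a node label expression with
// explicit nodes or racks. Relaxing the rule means no longer throwing here.
class LabelCheckSketch {
  static boolean oldCheckRejects(String labelExpr, List<String> nodes,
      List<String> racks) {
    boolean hasLocality = (nodes != null && !nodes.isEmpty())
        || (racks != null && !racks.isEmpty());
    // Old behavior: throw InvalidContainerRequestException in this case.
    return labelExpr != null && hasLocality;
  }
}
```

Under the proposed change, a NODE_LOCAL request carrying a label expression (the Spark case) would simply pass through instead of being rejected.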
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228748 ] Varun Saxena commented on YARN-3461: [~sjlee0], thanks for updating the patch. It looks good to me overall. Will wait for a while before committing it, to give others a chance to look at it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228743#comment-15228743 ] Steve Loughran commented on YARN-4928: -- an absolute {{/tmp}} path is "probably" mapped into a path on the current drive, e.g. {{c:\tmp}} . (Trivia: every drive on a windows system has its own current directory). > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
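The failure mode can be shown with a simplified stand-in for the colon rule in DFSUtil.isValidName (an illustration of the rule, not the real implementation): a Windows drive letter puts a ':' inside a path component, which the check rejects, while a plain "/tmp/..." root passes.

```java
// Simplified model of the colon rule: a DFS path is invalid if any
// component contains ':', so "/C:/..." (Windows drive letter) fails.
class PathColonCheck {
  static boolean isValidName(String src) {
    if (!src.startsWith("/")) {
      return false;  // must be absolute
    }
    for (String component : src.split("/")) {
      if (component.contains(":")) {
        return false;  // drive-letter colon trips this branch on Windows
      }
    }
    return true;
  }
}
```

This is why switching the test root to "/tmp/..." (the HDFS-6189 approach) sidesteps the problem: no component ever contains a colon.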
[jira] [Updated] (YARN-4784) fair scheduler: defaultQueueSchedulingPolicy should not accept fifo as a value
[ https://issues.apache.org/jira/browse/YARN-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-4784: --- Attachment: YARN-4784.002.patch [~ka...@cloudera.com], I uploaded a new patch. Would you please take a look? > fair scheduler: defaultQueueSchedulingPolicy should not accept fifo as a value > -- > > Key: YARN-4784 > URL: https://issues.apache.org/jira/browse/YARN-4784 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-4784.001.patch, YARN-4784.002.patch > > > The configure item defaultQueueSchedulingPolicy should not accept fifo as a > value since it is an invalid value for non-leaf queues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228684 ] Sunil G commented on YARN-4855: --- bq. This would be an incompatible change Yes, I agree with [~Naganarasimha Garla] here. This can be incompatible behavior compared to older versions: we may suddenly get exceptions that were not thrown earlier. {{yarn rmadmin -replaceLabelsOnNode -checkNode "node1=label1"}} seems to me like a workaround rather than a clean fix, and it may be difficult to understand what {{-checkNode}} means. So I would like some more thoughts on the original point, with a few suggestions: - Could we support this validation in {{replaceLabelsOnNode}} and clearly document the new validation? OR - Could we skip nodes that do not exist, but log them clearly (maybe in the audit log too)? I think in these early stages that may be better, but it is very much debatable. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
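The second suggestion above (skip unknown nodes but log them) can be sketched as follows. All names are hypothetical and a plain set stands in for the RM's view of known nodes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "skip unknown nodes but record them": apply only
// mappings whose node is known, and return the skipped nodes so the caller
// can write a warning/audit-log entry for each.
class ReplaceLabelsSketch {
  static List<String> applySkippingUnknown(Map<String, String> nodeToLabel,
      Set<String> knownNodes, Map<String, String> applied) {
    List<String> skipped = new ArrayList<>();
    for (Map.Entry<String, String> e : nodeToLabel.entrySet()) {
      if (knownNodes.contains(e.getKey())) {
        applied.put(e.getKey(), e.getValue());
      } else {
        skipped.add(e.getKey());  // caller logs: "node X unknown, skipped"
      }
    }
    return skipped;
  }
}
```

Compared with hard rejection, this keeps the old "admin can pre-label a temporarily down node" workflow while still surfacing likely typos in the logs.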
[jira] [Updated] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)
[ https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-4630: Assignee: Kousuke Saruta > Remove useless boxing/unboxing code (Hadoop YARN) > - > > Key: YARN-4630 > URL: https://issues.apache.org/jira/browse/YARN-4630 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Attachments: YARN-4630.0.patch, YARN-4630.1.patch > > > There are lots of places where useless boxing/unboxing occur. > To avoid performance issue, let's remove them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
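An example of the kind of useless boxing/unboxing this issue targets (a generic illustration, not code from the actual patch):

```java
// Before/after shape of a typical boxing cleanup: Integer.valueOf allocates
// an Integer that is immediately unboxed back to int, while Integer.parseInt
// returns the primitive directly.
class BoxingExample {
  // Before: parse -> box to Integer -> unbox to int.
  static int parseBoxed(String s) {
    int value = Integer.valueOf(s); // boxes, then immediately unboxes
    return value;
  }

  // After: no intermediate Integer object.
  static int parsePrimitive(String s) {
    return Integer.parseInt(s);
  }
}
```

Behavior is identical; the cleanup just removes the needless object allocation on hot paths.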
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228509 ] Sangjin Lee commented on YARN-3461: --- The unit test failures should be unrelated. I have seen those failures on trunk as well. I would greatly appreciate your review. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4756) Unnecessary wait in Node Status Updater during reboot
[ https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228520#comment-15228520 ] Eric Badger commented on YARN-4756: --- [~kasha], does my explanation make sense? Do you have any other concerns with this patch? > Unnecessary wait in Node Status Updater during reboot > - > > Key: YARN-4756 > URL: https://issues.apache.org/jira/browse/YARN-4756 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-4756.001.patch, YARN-4756.002.patch, > YARN-4756.003.patch, YARN-4756.004.patch, YARN-4756.005.patch > > > The startStatusUpdater thread waits for the isStopped variable to be set to > true, but it is waiting for the next heartbeat. During a reboot, the next > heartbeat will not come and so the thread waits for a timeout. Instead, we > should notify the thread to continue so that it can check the isStopped > variable and exit without having to wait for a timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
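The notify-on-stop pattern described in this issue can be modeled with a plain Java monitor. This is a simplified sketch with hypothetical names, not the actual NodeStatusUpdater:

```java
// Model of the fix: instead of letting the loop sleep out a full heartbeat
// interval after shutdown begins, stop() flips the flag under the lock and
// notifies the monitor so the thread re-checks isStopped immediately.
class StatusUpdaterSketch {
  private final Object monitor = new Object();
  private boolean isStopped = false;

  Thread start(long heartbeatIntervalMs) {
    Thread t = new Thread(() -> {
      synchronized (monitor) {
        while (!isStopped) {
          try {
            // Wakes early if stop() notifies; otherwise waits one interval.
            monitor.wait(heartbeatIntervalMs);
          } catch (InterruptedException e) {
            return;
          }
        }
      }
    });
    t.start();
    return t;
  }

  void stop() {
    synchronized (monitor) {
      isStopped = true;
      monitor.notifyAll(); // updater exits without waiting for the timeout
    }
  }
}
```

Without the notifyAll, the thread could sit in wait() for the remainder of the heartbeat interval even though isStopped is already true, which is the unnecessary wait the patch removes.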
[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
[ https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228492 ] Sunil G commented on YARN-4624: --- I think this has gone stale; let's move forward. In YARN-4304, one of the small improvements was to hide {{maxAMPercentageLimit}} for parent queues. We achieved this by declaring the variable as {{Float}} and setting it to null if the queue is a ParentQueue. Now we have hit this corner scenario when labels are available in the cluster but no label mappings are defined in CS. We fixed it partially in YARN-4634; however, we can still have some more cases like this one. So ideally it is better to handle the null check for {{getMaxAMPercentageLimit}} in the UI, to keep the improvement we made as part of YARN-4304. But as the findbugs warning shows, we are essentially boxing and unboxing this Float variable, and there are parallel efforts in other tickets to avoid such cases, e.g. YARN-4630. So I think we can keep {{float}} and avoid this problem, though we will lose a small part of the improvement. Alternatively, we could group the new metrics (maxAMPercentageLimit etc.) in a separate DAO object and use that. If that is fine, we can go with the v1 patch here to unblock the scheduler issue and track the other improvement separately. [~leftnoteasy], could you please share your thoughts here? 
> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI > --- > > Key: YARN-4624 > URL: https://issues.apache.org/jira/browse/YARN-4624 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, > YARN-4624-003.patch, YARN-4624.patch > > > Scenario: > === > Configure nodelables and add to cluster > Start the cluster > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > 
org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
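The null-check option discussed above can be sketched as follows. The class and field names only mirror the real PartitionQueueCapacitiesInfo DAO and are assumptions, not the actual patch:

```java
// Hedged sketch: guard against the NPE caused by auto-unboxing a null Float.
public class QueueCapacitiesSketch {
    // null signals "not applicable" (e.g. for a parent queue), per YARN-4304.
    private Float maxAMLimitPercentage;

    public void setMaxAMLimitPercentage(Float v) {
        maxAMLimitPercentage = v;
    }

    // Returning a primitive after an explicit null check avoids both the NPE
    // and implicit unboxing at every call site.
    public float getMaxAMLimitPercentage() {
        return maxAMLimitPercentage == null ? 0f : maxAMLimitPercentage;
    }
}
```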
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228424#comment-15228424 ] Arpit Agarwal commented on YARN-4928: - From looking at the test cases I think these paths will only be used as HDFS paths. > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
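The drive-letter problem above is easy to demonstrate. The helper below is an illustrative assumption that only mimics the ":" rejection in DFSUtil.isValidName(); the real validation does more:

```java
// Illustrative sketch: why "/C:/..." fails HDFS path validation while a
// fixed "/tmp/..." test root does not. hasColonComponent only mirrors the
// ":" check; DFSUtil.isValidName() performs additional validation.
public class TestRootPaths {
    static boolean hasColonComponent(String path) {
        for (String component : path.split("/")) {
            if (component.contains(":")) {
                return true; // e.g. the "C:" from a Windows drive letter
            }
        }
        return false;
    }
}
```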
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228459#comment-15228459 ] Gergely Novák commented on YARN-4928: - No, you're right, this actually does use the local file system. HDFS-6189 uses a different approach as well, let me update the patch based on that. > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4928: - Assignee: Gergely Novák > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228450#comment-15228450 ] Junping Du commented on YARN-4928: -- Oh. Sorry. My previous comments could be misleading... Yes. It is HDFS path so patch should be OK. +1 too. > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228418#comment-15228418 ] Junping Du commented on YARN-4928: -- Does the absolute path "/tmp" work on Windows? From other JIRA discussions, I think we don't want test cases to touch absolute system paths. [~ste...@apache.org], any comments here? > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228409#comment-15228409 ] Arpit Agarwal commented on YARN-4928: - +1 from me. [~djp], do you have any comments? > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4457) Cleanup unchecked types for EventHandler
[ https://issues.apache.org/jira/browse/YARN-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-4457: --- Attachment: YARN-4457.004.patch Here's a rebased patch that includes all the changes and fixes the formatting issue. > Cleanup unchecked types for EventHandler > > > Key: YARN-4457 > URL: https://issues.apache.org/jira/browse/YARN-4457 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-4457.001.patch, YARN-4457.002.patch, > YARN-4457.003.patch, YARN-4457.004.patch > > > The EventHandler class is often used in an untyped context resulting in a > bunch of warnings about unchecked usage. The culprit is the > {{Dispatcher.getHandler()}} method. Fixing the typing on the method to > return {{EventHandler<Event>}} instead of the raw {{EventHandler}} clears up the > errors and does not introduce any incompatible changes. In the case that > some code does: > {code} > EventHandler h = dispatcher.getHandler(); > {code} > it will still work and will issue a compiler warning about raw types. There > are, however, no instances of this issue in the current source base. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
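The typing change described above can be sketched with simplified interfaces. These are hypothetical, heavily reduced stand-ins for the org.apache.hadoop.yarn.event types, not the real classes:

```java
// Simplified sketch of the generics cleanup: the dispatcher returns a
// parameterized EventHandler<Event> rather than the raw type, so callers
// compile without unchecked-usage warnings.
interface Event {}

interface EventHandler<T extends Event> {
    void handle(T event);
}

class CountingDispatcher {
    int handled = 0;

    // Typed return: existing callers that assign to a raw EventHandler still
    // compile (with only a raw-types warning), so the change is
    // source-compatible.
    EventHandler<Event> getHandler() {
        return event -> handled++;
    }
}
```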
[jira] [Updated] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)
[ https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated YARN-4630: - Attachment: YARN-4630.1.patch [~ajisakaa] Sorry, I had overlooked your feedback. I've just fixed what you pointed out. > Remove useless boxing/unboxing code (Hadoop YARN) > - > > Key: YARN-4630 > URL: https://issues.apache.org/jira/browse/YARN-4630 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0 >Reporter: Kousuke Saruta >Priority: Minor > Attachments: YARN-4630.0.patch, YARN-4630.1.patch > > > There are lots of places where useless boxing/unboxing occurs. > To avoid performance issues, let's remove them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
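A generic before/after illustration of the kind of cleanup YARN-4630 targets (an assumed example, not code from the actual patch):

```java
// Hedged example of removing useless boxing/unboxing.
public class BoxingCleanup {
    // Before: Integer.valueOf creates a boxed Integer that is immediately
    // auto-unboxed on return.
    static int parsePortBoxed(String s) {
        Integer port = Integer.valueOf(s); // needless box
        return port;                       // auto-unbox
    }

    // After: Integer.parseInt returns the primitive directly; no boxing.
    static int parsePort(String s) {
        return Integer.parseInt(s);
    }
}
```

Both methods produce the same result; the cleanup removes the intermediate object allocation and the implicit unboxing call.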
[jira] [Commented] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.
[ https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228320#comment-15228320 ] Sunil G commented on YARN-4849: --- Thank you [~leftnoteasy] for the detailed cleanup and mvn integration activity. Overall it looks fine to me. I could build this with -Pyarn-ui, but I have not tested it; I will cover that as part of YARN-4515. A few comments from my end. Please correct me if I am wrong. 1. In {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnUI2.md}} we may need some changes in the part below: {noformat} Try it -- * Packaging and deploying Hadoop in this branch * Modify `app/adapters/yarn-app.js`, change `host` to your YARN RM web address * If you running YARN RM in your localhost, you should install `npm install -g corsproxy` and run `corsproxy` to avoid CORS errors. More details: `https://www.npmjs.com/package/corsproxy`. And the `host` of `app/adapters/yarn-app.js` should start with `localhost:1337`. * Run `ember server` * Visit your app at [http://localhost:4200](http://localhost:4200). {noformat} - {{Modify `app/adapters/yarn-app.js`, change `host` to}}: with YARN-4514, we have a better way to handle this. We can refer to config.js instead. - Run `ember server`: I think it should be "ember serve". 2. In hadoop-yarn-ui/pom.xml, will the "clean" target work? With clean, we expect the generated source and war file to be removed, correct? 3. In src/main/resources/META-INF/LICENSE.txt, Apache Tez UI is mentioned in many places. Should it be Apache YARN UI? 4. I may be wrong here, or could not find this info: how will hadoop-yarn-ui/src/main/webapp/tests be triggered by Jenkins? Please help to share some information. > [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add > licenses. 
> --- > > Key: YARN-4849 > URL: https://issues.apache.org/jira/browse/YARN-4849 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4849-YARN-3368.1.patch, > YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, > YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, > YARN-4849-YARN-3368.6.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4795) ContainerMetrics drops records
[ https://issues.apache.org/jira/browse/YARN-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-4795: --- Attachment: YARN-4795.002.patch Nice catch. Here's a rebased patch that addresses the issues. > ContainerMetrics drops records > -- > > Key: YARN-4795 > URL: https://issues.apache.org/jira/browse/YARN-4795 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-4795.001.patch, YARN-4795.002.patch > > > The metrics2 system was implemented to deal with persistent sources. > {{ContainerMetrics}} is an ephemeral source, and so it causes problems. > Specifically, the {{ContainerMetrics}} only reports metrics once after the > container has been stopped. This behavior is a problem because the metrics2 > system can ask sources for reports that will be quietly dropped by the sinks > that care. (It's a metrics2 feature, not a bug.) If that final report is > silently dropped, it's lost, because the {{ContainerMetrics}} won't report > anything else ever anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
[ https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4514: -- Attachment: YARN-4514-YARN-3368.4.patch Thank you [~leftnoteasy] for the comments. Yes, I agree; it's a valid point. We will have it configured by default as well. > [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses > -- > > Key: YARN-4514 > URL: https://issues.apache.org/jira/browse/YARN-4514 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: YARN-4514-YARN-3368.1.patch, > YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, > YARN-4514-YARN-3368.4.patch > > > We have several configurations that are hard-coded, for example RM/ATS addresses; > we should make them configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up
[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228258#comment-15228258 ] Nathan Roberts commented on YARN-4924: -- Sorry [~sandflee]. I missed your comment about updating YARN-4051. That seems fine with me! > NM recovery race can lead to container not cleaned up > - > > Key: YARN-4924 > URL: https://issues.apache.org/jira/browse/YARN-4924 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.7.2 >Reporter: Nathan Roberts > > It's probably a small window but we observed a case where the NM crashed and > then a container was not properly cleaned up during recovery. > I will add details in first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4924) NM recovery race can lead to container not cleaned up
[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-4924: - Assignee: (was: Nathan Roberts) > NM recovery race can lead to container not cleaned up > - > > Key: YARN-4924 > URL: https://issues.apache.org/jira/browse/YARN-4924 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.7.2 >Reporter: Nathan Roberts > > It's probably a small window but we observed a case where the NM crashed and > then a container was not properly cleaned up during recovery. > I will add details in first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up
[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228243#comment-15228243 ] Nathan Roberts commented on YARN-4924: -- Thanks [~sandflee], [~jlowe] for the suggestion. I'll work up a fix soon. > NM recovery race can lead to container not cleaned up > - > > Key: YARN-4924 > URL: https://issues.apache.org/jira/browse/YARN-4924 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.7.2 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > > It's probably a small window but we observed a case where the NM crashed and > then a container was not properly cleaned up during recovery. > I will add details in first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4924) NM recovery race can lead to container not cleaned up
[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts reassigned YARN-4924: Assignee: Nathan Roberts > NM recovery race can lead to container not cleaned up > - > > Key: YARN-4924 > URL: https://issues.apache.org/jira/browse/YARN-4924 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.7.2 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > > It's probably a small window but we observed a case where the NM crashed and > then a container was not properly cleaned up during recovery. > I will add details in first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up
[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228198#comment-15228198 ] Jason Lowe commented on YARN-4924: -- I agree with [~sandflee] that postponing the finish app event dispatch until after we've waited for the containers to complete recovering would be an appropriate fix. > NM recovery race can lead to container not cleaned up > - > > Key: YARN-4924 > URL: https://issues.apache.org/jira/browse/YARN-4924 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.7.2 >Reporter: Nathan Roberts > > It's probably a small window but we observed a case where the NM crashed and > then a container was not properly cleaned up during recovery. > I will add details in first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228183#comment-15228183 ] Naganarasimha G R commented on YARN-4855: - Thanks for working on the patch, but I would like to know the views of other community members, as you are modifying the protocol in the patch. Any thoughts? cc/ [~wangda], [~sunilg] > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch > > > Today when we add nodelabels to nodes, it succeeds without any message even if > the nodes are not existing NodeManagers in the cluster. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be > denied if the node does not exist. > When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it would > add nodelabels no matter whether the node exists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228082#comment-15228082 ] Hadoop QA commented on YARN-4928: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 26s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage in trunk has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 9s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 8s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s {color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s {color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 59s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12797274/YARN-4928.001.patch | | JIRA Issue | YARN-4928 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 769b4dfe493c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Updated] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated YARN-4928: Attachment: YARN-4928.001.patch > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
Gergely Novák created YARN-4928: --- Summary: Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon Key: YARN-4928 URL: https://issues.apache.org/jira/browse/YARN-4928 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.8.0 Environment: OS: Windows Server 2012 JDK: 1.7.0_79 Reporter: Gergely Novák Priority: Minor yarn.server.timeline.TestEntityGroupFSTimelineStore.* and yarn.server.timeline.TestLogInfo.* fail on Windows, because they are attempting to use a test root paths like "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", which contains a ":" (after the Windows drive letter) and DFSUtil.isValidName() does not accept paths containing ":". This problem is identical to HDFS-6189, so I suggest to use the same approach: using "/tmp/..." as test root dir instead of System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Jie updated YARN-4855: -- Attachment: YARN-4855.001.patch > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Priority: Minor > Attachments: YARN-4855.001.patch > > > Today when we add nodelabels to nodes, it would succeed even if nodes are not > existing NodeManger in cluster without any message. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be > denied if node does not exist. > When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"* would > add nodelabels no matter whether node exists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4906) Capture container start/finish time in container metrics
[ https://issues.apache.org/jira/browse/YARN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227943#comment-15227943 ] Hudson commented on YARN-4906: -- FAILURE: Integrated in Hadoop-trunk-Commit #9566 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9566/]) YARN-4906. Capture container start/finish time in container metrics. (vvasudev: rev b41e65e5bc9459b4d950a2c53860a223f1a0d2ec) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestAuxServices.java > Capture container start/finish time in container metrics > > > Key: YARN-4906 > URL: https://issues.apache.org/jira/browse/YARN-4906 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.9.0 > > Attachments: YARN-4906.1.patch, YARN-4906.2.patch, YARN-4906.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4769) Add support for CSRF header in the dump capacity scheduler logs and kill app buttons in RM web UI
[ https://issues.apache.org/jira/browse/YARN-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227904#comment-15227904 ] Varun Vasudev commented on YARN-4769: - [~jianhe] - can you please review? Thanks! > Add support for CSRF header in the dump capacity scheduler logs and kill app > buttons in RM web UI > - > > Key: YARN-4769 > URL: https://issues.apache.org/jira/browse/YARN-4769 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4769.001.patch > > > YARN-4737 adds support for CSRF filters in YARN. If the CSRF filter is > enabled, the current functionality to dump the capacity scheduler logs and > kill an app from the RM web UI will not work due to the missing CSRF header. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
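As a rough sketch of what the missing header means for a caller (the header name below is the default custom header used by Hadoop's RestCsrfPreventionFilter, but treat the endpoint URL and header value as assumptions for illustration):

```java
// Hypothetical sketch: once the CSRF filter is enabled, any
// state-changing request to the RM web UI (e.g. kill app) must carry
// the configured CSRF custom header or it will be rejected.
import java.net.HttpURLConnection;
import java.net.URL;

class CsrfHeaderSketch {
    static HttpURLConnection open(String endpoint) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("PUT");
        // "X-XSRF-HEADER" is the filter's default custom header name;
        // the filter checks for the header's presence, not its value.
        conn.setRequestProperty("X-XSRF-HEADER", "");
        return conn;
    }
}
```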
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227867#comment-15227867 ] Karthik Kambatla commented on YARN-4901: TestNMReconnect fails with FairScheduler because of this. > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Daniel Templeton > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at the scope to see how hard or > easy that is to do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
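The static-field bleed described above can be illustrated with a toy model (all names below are illustrative, not Hadoop's; the real fix would reset the {{QueueMetrics.queueMetrics}} static registry when {{MockRM}} starts):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the bug: a static registry (playing the role of
// QueueMetrics.queueMetrics) outlives individual "RM" instances, so
// metrics recorded by one test bleed into the next.
class QueueMetricsSketch {
    static final Map<String, Integer> queueMetrics = new HashMap<>();

    static void increment(String queue) {
        queueMetrics.merge(queue, 1, Integer::sum);
    }

    // What MockRM#start would do under the proposal: wipe the static
    // state so each test instance truly starts from naught.
    static void clearQueueMetrics() {
        queueMetrics.clear();
    }
}
```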
[jira] [Commented] (YARN-4855) Should check if node exists when replace nodelabels
[ https://issues.apache.org/jira/browse/YARN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227863#comment-15227863 ] Tao Jie commented on YARN-4855: --- [~Naganarasimha], thanks for the reply. We are trying to split off a resource pool for certain applications from our cluster by node label. For example, when we need a resource pool of 10 nodes, we run a script with the *rmadmin -replaceLabelsOnNode* command. However, this command always runs successfully even when nodes are not available in the cluster (a node is down, or a hostname is misspelled). As a result, we have to do extra work to ensure the capacity of the resource pool, so I hope this option may help. > Should check if node exists when replace nodelabels > --- > > Key: YARN-4855 > URL: https://issues.apache.org/jira/browse/YARN-4855 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Tao Jie >Priority: Minor > > Today, when we add node labels to nodes, the operation succeeds without any > message even if the nodes are not existing NodeManagers in the cluster. > It could be like this: > When we use *yarn rmadmin -replaceLabelsOnNode "node1=label1"*, it would be > denied if the node does not exist. > When we use *yarn rmadmin -replaceLabelsOnNode -force "node1=label1"*, it would > add node labels no matter whether the node exists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
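A minimal sketch of the check being proposed (hypothetical names, not the actual RMAdmin code): reject a label replacement that mentions unknown hosts unless the force flag is given.

```java
import java.util.Map;
import java.util.Set;

// Sketch only: illustrates the proposed validation, not YARN's API.
class ReplaceLabelsCheck {
    // knownNodes models the NodeManagers the RM currently knows about.
    static void replaceLabelsOnNode(Set<String> knownNodes,
                                    Map<String, String> nodeToLabel,
                                    boolean force) {
        for (String node : nodeToLabel.keySet()) {
            // Without -force, an unknown host (down, or a misspelled
            // hostname) is rejected instead of silently accepted.
            if (!force && !knownNodes.contains(node)) {
                throw new IllegalArgumentException(
                        "Node " + node + " does not exist in the cluster");
            }
        }
        // ... apply the labels ...
    }
}
```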
[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227816#comment-15227816 ] Hadoop QA commented on YARN-3998: -
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 30s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 50s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 43s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 48s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 40s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 1s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 1s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 55s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 17 new + 773 unchanged - 8 fixed = 790 total (was 781) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 2s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 23s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 14s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_77. {color} |
[jira] [Commented] (YARN-4927) TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the default
[ https://issues.apache.org/jira/browse/YARN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227786#comment-15227786 ] Rohith Sharma K S commented on YARN-4927: - Thanks Karthik for finding this test failure. Even the HadoopQA result could not catch it, because the tests run with the default configurations. I think there are two options to handle it. First, the test class can be parameterized; but only one test case depends on the CS, and all the other tests are independent of any specific scheduler. Second, this particular test case can be moved out, or skipped if the FairScheduler is used. > TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the > default > > > Key: YARN-4927 > URL: https://issues.apache.org/jira/browse/YARN-4927 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > YARN-3893 adds this test, which relies on some CapacityScheduler-specific > stuff for refreshAll to fail, which doesn't apply when using FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
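The second option above (skip when the FairScheduler is configured) amounts to a guard like the toy sketch below; in a real JUnit test this decision would typically feed an Assume-style assumption so the case is reported as skipped rather than failed. Names here are illustrative, not the actual test code.

```java
// Toy guard for the "skip if FairScheduler" option: run the
// CapacityScheduler-specific case only when that scheduler is the
// one configured for the test.
class SchedulerGuard {
    static boolean shouldRun(String configuredScheduler) {
        return "CapacityScheduler".equals(configuredScheduler);
    }
}
```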