[jira] [Comment Edited] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
[ https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405371#comment-15405371 ]

Sunil G edited comment on YARN-4624 at 8/3/16 5:54 AM:
---
Yes [~Naganarasimha Garla]. Thanks for the update. Attaching a rebased patch given by [~brahmareddy]. A test case is not needed, as we are only changing the data type from a boxed Float to a primitive float.

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
> Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, YARN-4624-003.patch, YARN-4624.4.patch, YARN-4624.patch
>
> Scenario:
> ===
> Configure node labels and add them to the cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>     at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>     at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>     at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>     at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>     at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>     at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>     at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>     at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>     at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>     at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>     at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>     at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>     at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>     at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>     at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>     at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>     at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>     at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>     at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>     at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
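The NPE in the trace above is an auto-unboxing problem: a boxed Float field that is never populated for an unmapped partition stays null, and returning it from a getter declared to return a primitive float throws at the unboxing point. A minimal sketch, with illustrative class and field names rather than the actual YARN sources:

```java
// Minimal reproduction of the boxed-vs-primitive Float issue fixed in the
// patch. Class and field names are illustrative, not the YARN code.
public class BoxedFloatNpe {
    static class BoxedInfo {
        Float maxAMLimitPercentage;      // boxed: stays null if never set
        float getMaxAMLimitPercentage() {
            return maxAMLimitPercentage; // auto-unboxing null throws NPE here
        }
    }

    static class PrimitiveInfo {
        float maxAMLimitPercentage;      // primitive: defaults to 0.0f, no NPE
        float getMaxAMLimitPercentage() {
            return maxAMLimitPercentage;
        }
    }

    public static void main(String[] args) {
        try {
            new BoxedInfo().getMaxAMLimitPercentage();
            System.out.println("boxed getter: no exception");
        } catch (NullPointerException e) {
            System.out.println("boxed getter: NullPointerException");
        }
        // The primitive field is zero-initialized, so the getter returns 0.0
        System.out.println("primitive getter: "
            + new PrimitiveInfo().getMaxAMLimitPercentage());
    }
}
```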
[jira] [Updated] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
[ https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-4624:
--
Attachment: YARN-4624.4.patch

Yes [~Naganarasimha Garla]. Thanks for the update. Attaching a rebased patch given by [~brahmareddy]. A test case is not needed, as we are only changing the data type from a boxed Float to a primitive float.
[jira] [Commented] (YARN-5428) Allow for specifying the docker client configuration directory
[ https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405355#comment-15405355 ]

Allen Wittenauer commented on YARN-5428:
----
Why would an admin provide creds and not individual users? Why should there be a global store of credentials? What prevents a user from stealing these global creds?

> Allow for specifying the docker client configuration directory
> --
>
> Key: YARN-5428
> URL: https://issues.apache.org/jira/browse/YARN-5428
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Shane Kumpf
> Assignee: Shane Kumpf
> Attachments: YARN-5428.001.patch, YARN-5428.002.patch, YARN-5428.003.patch, YARN-5428.004.patch
>
> The docker client allows for specifying a configuration directory that contains the docker client's configuration. It is common to store "docker login" credentials in this config, to avoid the need to run docker login on each cluster member.
> By default, the docker client config is $HOME/.docker/config.json on Linux. However, this does not work with the current container executor user switching, and it may also be desirable to centralize this configuration beyond a single user's home directory.
> Note that the command line arg is for the configuration directory, NOT the configuration file.
> This change will be needed to allow YARN to automatically pull images at localization time or within the container executor.
[jira] [Commented] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute
[ https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405349#comment-15405349 ]

Hudson commented on YARN-5456:
--
SUCCESS: Integrated in Hadoop-trunk-Commit #10198 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10198/])
YARN-5456. container-executor support for FreeBSD, NetBSD, and others if (cnauroth: rev b913677365ad77ca7daa5741c04c14df1a0313cd)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.h
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/get_executable.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/config.h.cmake

> container-executor support for FreeBSD, NetBSD, and others if conf path is absolute
> ---
>
> Key: YARN-5456
> URL: https://issues.apache.org/jira/browse/YARN-5456
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager, security
> Affects Versions: 3.0.0-alpha2
> Reporter: Allen Wittenauer
> Assignee: Allen Wittenauer
> Labels: security
> Fix For: 3.0.0-alpha2
> Attachments: YARN-5456.00.patch, YARN-5456.01.patch
>
> YARN-5121 fixed quite a few portability issues, but it also changed how it determines its location in a very operating-system-specific way, for security reasons. We should add support for FreeBSD (to unbreak its ports entry), for NetBSD (the sysctl options are just in a different order), and, for operating systems that do not have a defined method, an escape hatch.
[jira] [Updated] (YARN-5430) Get container's ip and host from NM
[ https://issues.apache.org/jira/browse/YARN-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-5430:
--
Attachment: YARN-5430.3.patch

> Get container's ip and host from NM
> ---
>
> Key: YARN-5430
> URL: https://issues.apache.org/jira/browse/YARN-5430
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Jian He
> Attachments: YARN-5430.1.patch, YARN-5430.2.patch, YARN-5430.3.patch
>
> In YARN-4757, we introduced a DNS mechanism for containers. That is based on the assumption that we can get the container's ip and host information and store it in the registry-service. This jira aims to get the container's ip and host from the NM, primarily for Docker containers.
[jira] [Commented] (YARN-5342) Improve non-exclusive node partition resource allocation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405332#comment-15405332 ]

Naganarasimha G R commented on YARN-5342:
-----
Thanks for attaching the patch for 2.8. Seems like the jenkins run is fine; the test case failures are not related to the patch. [~wangda], can you commit the patch and resolve this jira?

> Improve non-exclusive node partition resource allocation in Capacity Scheduler
> --
>
> Key: YARN-5342
> URL: https://issues.apache.org/jira/browse/YARN-5342
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Sunil G
> Attachments: YARN-5342-branch-2.8.001.patch, YARN-5342.1.patch, YARN-5342.2.patch, YARN-5342.3.patch, YARN-5342.4.patch
>
> In the previous implementation, one non-exclusive container allocation is possible only when the missed-opportunity >= #cluster-nodes, and missed-opportunity is reset whenever a container is allocated on any node.
> This slows down the frequency of container allocation on a non-exclusive node partition: *when a non-exclusive partition=x has idle resource, we can only allocate one container for this app every X=nodemanagers.heartbeat-interval secs for the whole cluster.*
> In this JIRA, I propose a fix to reset missed-opportunity only if we have >0 pending resource for the non-exclusive partition OR we get an allocation from the default partition.
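The reset rule proposed in the description can be sketched as a small predicate. This is only an illustration of the condition, with hypothetical names, not the actual CapacityScheduler fields:

```java
// Hedged sketch of the proposed missed-opportunity reset rule from the
// JIRA description: reset the counter only when the app still has pending
// resource on the non-exclusive partition, or when it just received an
// allocation from the default (empty) partition. Names are hypothetical.
public class MissedOpportunityRule {
    static boolean shouldReset(long pendingOnNonExclusivePartition,
                               boolean allocatedFromDefaultPartition) {
        return pendingOnNonExclusivePartition > 0 || allocatedFromDefaultPartition;
    }

    public static void main(String[] args) {
        // Allocation came from the default partition: reset, as before.
        System.out.println(shouldReset(0, true));
        // No pending resource on partition x and no default-partition
        // allocation: keep counting missed opportunities, so non-exclusive
        // allocation is no longer throttled to one container per interval.
        System.out.println(shouldReset(0, false));
    }
}
```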
[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
[ https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405320#comment-15405320 ]

Naganarasimha G R commented on YARN-5448:
-----
Thanks for sharing your thoughts [~wangda],

bq. Sorry I may not quite sure about this. Could you explain?
What I meant was: these additional non-usable-resource columns in the cluster metrics table will be useful only when there is a configuration error, and once it is corrected these columns are not of much use; basically, their purpose will be almost nil if things are configured correctly. One alternative I can think of is to show these columns only when partitions are not mapped to queues, and not to show them when the value is zero. Thoughts?

bq. which can help answering questions like "why I cannot fully utilize the cluster".
One viewpoint I had on this was captured in the above [comment | https://issues.apache.org/jira/browse/YARN-5448?focusedCommentId=15399248=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15399248], but again, it is a viewpoint and debatable, so I don't have any hard restrictions on having it.

bq. It's better to add a non-usable nodes as a separate col, but to me it may not a fully replacement of total non-usable resources.
Maybe I did not get the rationale behind why {{"total non-usable resources"}} would be better than {{"non-usable nodes"}}; can you elaborate more on your view on this?

> Resource in Cluster Metrics is not sum of resources in all nodes of all partitions
> --
>
> Key: YARN-5448
> URL: https://issues.apache.org/jira/browse/YARN-5448
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, resourcemanager, webapp
> Affects Versions: 2.7.2
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Attachments: NodesPage.png, schedulerPage.png
>
> Currently, the resource info in Cluster Metrics is derived from Queue Metrics' *available resource + allocated resource*. Hence, if there are some nodes which belong to a partition, but that partition is not associated with any queue, then the capacity scheduler partition hierarchy shows those nodes' resources under their partition, but Cluster Metrics does not.
> Apart from this, the Metrics overview table is also shown on the Nodes page. So if we show resource info from Queue Metrics, the user will not be able to correlate it. (Have attached images for the same.)
> IIUC, the idea of not showing it in the *Metrics overview table* is to highlight that the configuration is not proper. This needs to be somehow conveyed through the partition-by-queue-hierarchy chart.
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405321#comment-15405321 ]

Jian He commented on YARN-5382:
---
bq. I see only one audit log message when I ran a sleep job and killed it on pseudo-distributed setup on my laptop
I checked the code more; that's because AppKilledTransition will not get the RMAppKillByClientEvent any more if there exists an attempt - AppKilledTransition is processing the event sent from the RMAppAttempt when the attempt exists. Anyway, this actually makes things better, because we won't have two audit logs.
- This code is exactly the same in two places; would you make a common method for it?
{code}
if (event instanceof RMAppKillByClientEvent) {
  RMAppKillByClientEvent killEvent = (RMAppKillByClientEvent) event;
  UserGroupInformation callerUGI = killEvent.getCallerUGI();
  String userName = null;
  if (callerUGI != null) {
    userName = callerUGI.getShortUserName();
  }
  InetAddress remoteIP = killEvent.getIp();
  RMAuditLogger.logSuccess(userName, AuditConstants.KILL_APP_REQUEST,
      "RMAppImpl", event.getApplicationId(), remoteIP);
}
{code}
- Isn't "greater than" the correct wording?
{code}
-Assert.assertTrue("application start time is not greater than 0",
+Assert.assertTrue("application start time is not greater then 0",
{code}
- Several parameters are not used in the method testSuccessLogFormatHelperWithIP; remove them?
- Nit highlighted by the IDE: in "returns the {@link CallerUGI}", the CallerUGI is actually not a link.

> RM does not audit log kill request for active applications
> --
>
> Key: YARN-5382
> URL: https://issues.apache.org/jira/browse/YARN-5382
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.2
> Reporter: Jason Lowe
> Assignee: Vrushali C
> Attachments: YARN-5382-branch-2.7.01.patch, YARN-5382-branch-2.7.02.patch, YARN-5382-branch-2.7.03.patch, YARN-5382-branch-2.7.04.patch, YARN-5382-branch-2.7.05.patch, YARN-5382-branch-2.7.09.patch, YARN-5382-branch-2.7.10.patch, YARN-5382.06.patch, YARN-5382.07.patch, YARN-5382.08.patch, YARN-5382.09.patch, YARN-5382.10.patch
>
> ClientRMService will audit a kill request, but only if it either fails to issue the kill or if the kill is sent to an already finished application. It does not create a log entry when the application is active, which is arguably the most important case to audit.
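The duplicated block quoted in the review above could be extracted roughly as follows. The YARN types (RMAppKillByClientEvent, UserGroupInformation, RMAuditLogger) are replaced with minimal illustrative stubs so the sketch is self-contained; only the shape of the common method reflects the review suggestion:

```java
import java.net.InetAddress;

// Sketch of the "common method" suggested in the review; stubs stand in
// for the YARN classes, and the audit "log" is returned as a string so the
// behavior is observable without a real logger.
public class KillAuditHelper {
    static class RMAppEvent {
        private final String appId;
        RMAppEvent(String appId) { this.appId = appId; }
        String getApplicationId() { return appId; }
    }

    static class RMAppKillByClientEvent extends RMAppEvent {
        private final String callerUser; // stands in for UserGroupInformation
        private final InetAddress ip;
        RMAppKillByClientEvent(String appId, String callerUser, InetAddress ip) {
            super(appId);
            this.callerUser = callerUser;
            this.ip = ip;
        }
        String getCallerUser() { return callerUser; }
        InetAddress getIp() { return ip; }
    }

    /** Extracted common method: audit a client kill request in one place. */
    static String auditKillByClient(RMAppEvent event) {
        if (!(event instanceof RMAppKillByClientEvent)) {
            return null; // nothing to audit for other event types
        }
        RMAppKillByClientEvent killEvent = (RMAppKillByClientEvent) event;
        String userName = killEvent.getCallerUser(); // may be null, as in the original
        InetAddress remoteIP = killEvent.getIp();
        return "USER=" + userName + "\tOPERATION=Kill Application Request"
            + "\tTARGET=RMAppImpl\tAPPID=" + event.getApplicationId()
            + (remoteIP == null ? "" : "\tIP=" + remoteIP.getHostAddress());
    }

    public static void main(String[] args) {
        System.out.println(auditKillByClient(new RMAppKillByClientEvent(
            "application_1470196473000_0001", "alice",
            InetAddress.getLoopbackAddress())));
    }
}
```

Both call sites would then invoke the helper, so a future change to the audit format only needs to happen once.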
[jira] [Commented] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405299#comment-15405299 ]

Sangeetha Abdu Jyothi commented on YARN-5327:
-----
Please note that the failed test is unrelated to this patch.

> API changes required to support recurring reservations in the YARN ReservationSystem
> 
>
> Key: YARN-5327
> URL: https://issues.apache.org/jira/browse/YARN-5327
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Subru Krishnan
> Assignee: Sangeetha Abdu Jyothi
> Attachments: YARN-5327.001.patch, YARN-5327.002.patch, YARN-5327.003.patch
>
> YARN-5326 proposes adding native support for recurring reservations in the YARN ReservationSystem. This JIRA is a sub-task to track the changes needed in ApplicationClientProtocol to accomplish it. Please refer to the design doc in the parent JIRA for details.
[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
[ https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405281#comment-15405281 ]

Naganarasimha G R commented on YARN-4624:
-----
[~sunilg], as discussed offline, the safer option is to go with patch 1, so can you rebase the patch so that we can make progress on this jira?
[jira] [Commented] (YARN-5410) Bootstrap Router module
[ https://issues.apache.org/jira/browse/YARN-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405277#comment-15405277 ]

Hadoop QA commented on YARN-5410:
-
(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 59s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 12s | YARN-2915 passed |
| +1 | compile | 7m 19s | YARN-2915 passed |
| +1 | checkstyle | 1m 29s | YARN-2915 passed |
| +1 | mvnsite | 5m 15s | YARN-2915 passed |
| +1 | mvneclipse | 1m 19s | YARN-2915 passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server hadoop-yarn-project |
| +1 | findbugs | 0m 0s | YARN-2915 passed |
| +1 | javadoc | 2m 46s | YARN-2915 passed |
| 0 | mvndep | 0m 46s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 41s | the patch passed |
| +1 | compile | 8m 13s | the patch passed |
| +1 | javac | 8m 13s | the patch passed |
| -1 | checkstyle | 1m 27s | root: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| +1 | mvnsite | 5m 28s | the patch passed |
| +1 | mvneclipse | 1m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 5s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server hadoop-yarn-project |
| -1 | findbugs | 0m 12s | hadoop-yarn-server-router in the patch failed. |
| +1 | javadoc | 2m 40s | the patch passed |
| +1 | unit | 0m 9s | hadoop-project in the patch passed. |
| -1 | unit | 61m 22s | hadoop-yarn-server in the patch failed. |
| +1 | unit | 0m 14s | hadoop-yarn-server-router in the patch passed. |
| -1 | unit | 63m 30s | hadoop-yarn-project in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 177m 50s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.TestDirectoryCollection |
| | hadoop.yarn.server.nodemanager.TestDirectoryCollection |

|| Subsystem ||
[jira] [Commented] (YARN-3664) Federation PolicyStore internal APIs
[ https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405172#comment-15405172 ]

Hadoop QA commented on YARN-3664:
-
(/) +1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 13m 49s | YARN-2915 passed |
| +1 | compile | 0m 22s | YARN-2915 passed |
| +1 | checkstyle | 0m 16s | YARN-2915 passed |
| +1 | mvnsite | 0m 30s | YARN-2915 passed |
| +1 | mvneclipse | 0m 18s | YARN-2915 passed |
| +1 | findbugs | 0m 54s | YARN-2915 passed |
| +1 | javadoc | 0m 19s | YARN-2915 passed |
| +1 | mvninstall | 0m 22s | the patch passed |
| +1 | compile | 0m 21s | the patch passed |
| +1 | cc | 0m 21s | the patch passed |
| +1 | javac | 0m 21s | the patch passed |
| +1 | checkstyle | 0m 11s | the patch passed |
| +1 | mvnsite | 0m 24s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 55s | the patch passed |
| +1 | javadoc | 0m 14s | the patch passed |
| +1 | unit | 0m 30s | hadoop-yarn-server-common in the patch passed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 20m 56s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821736/YARN-3664-YARN-2915-v3.patch |
| JIRA Issue | YARN-3664 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc |
| uname | Linux d0051a266a11 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-2915 / 22db8fd |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12621/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12621/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> Federation PolicyStore internal APIs
> 
>
> Key: YARN-3664
> URL: https://issues.apache.org/jira/browse/YARN-3664
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Subru Krishnan
> Assignee: Subru
[jira] [Commented] (YARN-5428) Allow for specifying the docker client configuration directory
[ https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405147#comment-15405147 ] Zhankun Tang commented on YARN-5428: Yes. Agree. It can store other settings besides credentials. Since the credential won't expire (it is just base64 encoded, not fetched from a server) as long as the username and password don't change, the administrator must store it in advance. > Allow for specifying the docker client configuration directory > -- > > Key: YARN-5428 > URL: https://issues.apache.org/jira/browse/YARN-5428 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-5428.001.patch, YARN-5428.002.patch, > YARN-5428.003.patch, YARN-5428.004.patch > > > The docker client allows for specifying a configuration directory that > contains the docker client's configuration. It is common to store "docker > login" credentials in this config, to avoid the need to run docker login on > each cluster member. > By default the docker client config is $HOME/.docker/config.json on Linux. > However, this does not work with the current container executor user > switching and it may also be desirable to centralize this configuration > beyond the single user's home directory. > Note that the command line arg is for the configuration directory NOT the > configuration file. > This change will be needed to allow YARN to automatically pull images at > localization time or within container executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3664) Federation PolicyStore internal APIs
[ https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-3664: - Attachment: YARN-3664-YARN-2915-v3.patch Fixing Yetus warnings (v3). > Federation PolicyStore internal APIs > > > Key: YARN-3664 > URL: https://issues.apache.org/jira/browse/YARN-3664 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3664-YARN-2915-v0.patch, > YARN-3664-YARN-2915-v1.patch, YARN-3664-YARN-2915-v2.patch, > YARN-3664-YARN-2915-v3.patch > > > The federation Policy Store contains information about the capacity > allocations made by users, their mapping to sub-clusters and the policies > that each of the components (Router, AMRMPRoxy, RMs) should enforce -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5468) Scheduling of long-running applications
[ https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405142#comment-15405142 ] Konstantinos Karanasos commented on YARN-5468: -- Thanks for the comment, [~cheersyang]. Yes, I have read YARN-4902. It is definitely related, but in this JIRA we are focusing in particular on the scheduling of long-running jobs/services. In that sense, YARN-4902 is more general. On the other hand, unlike YARN-4902, we will be providing the option of service planning, that is, we will be able to look at multiple services at once and plan their execution in a more holistic manner than the scheduler can do (given that the scheduler looks at one resource request at a time). This can be seen as related to the planning phase of YARN-1051. > Scheduling of long-running applications > --- > > Key: YARN-5468 > URL: https://issues.apache.org/jira/browse/YARN-5468 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler, fairscheduler >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-5468.prototype.patch > > > This JIRA is about the scheduling of applications with long-running tasks. > It will include adding support to the YARN for a richer set of scheduling > constraints (such as affinity, anti-affinity, cardinality and time > constraints), and extending the schedulers to take them into account during > placement of containers to nodes. > We plan to have both an online version that will accommodate such requests as > they arrive, as well as a Long-running Application Planner that will make > more global decisions by considering multiple applications at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5410) Bootstrap Router module
[ https://issues.apache.org/jira/browse/YARN-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-5410: --- Attachment: YARN-5410-YARN-2915-v1.patch > Bootstrap Router module > --- > > Key: YARN-5410 > URL: https://issues.apache.org/jira/browse/YARN-5410 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-5410-YARN-2915-v1.patch > > > As detailed in the proposal in the umbrella JIRA, we are introducing a new > component that routes client request to appropriate ResourceManager(s). This > JIRA tracks the creation of a new sub-module for the Router. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5451) Container localizers that hang are not cleaned up
[ https://issues.apache.org/jira/browse/YARN-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405060#comment-15405060 ] Brook Zhou commented on YARN-5451: -- Is this because the ContainerLocalizer is launched in a separate process from LCE with a timeOutInterval of 0? > Container localizers that hang are not cleaned up > - > > Key: YARN-5451 > URL: https://issues.apache.org/jira/browse/YARN-5451 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe > > I ran across an old, rogue process on one of our nodes. It apparently was a > container localizer that somehow entered an infinite loop during startup. > The NM never cleaned up this broken localizer, so it happily ran forever. > The NM needs to do a better job of tracking localizers, including killing > them if they appear to be hung/broken. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5468) Scheduling of long-running applications
[ https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405052#comment-15405052 ] Panagiotis Garefalakis edited comment on YARN-5468 at 8/3/16 12:03 AM: --- Attaching a patch to showcase the above proposal. In this first patch we are introducing allocation tags and three placement constraints: affinity, anti-affinity, and cardinality. We plan to consolidate these into a single constraint in the second version of the patch. For the time being we do not support time constraints. In the current version the requests are accommodated in an online, greedy fashion. We extend the distributed-shell application Client and AM to demonstrate inter-job placement constraints. Some unit tests are also included to show the supported constraints (affinity, anti-affinity, and cardinality) at node and rack level. was (Author: pgaref): Attaching a patch to showcase above proposal. In this first patch we are introducing allocation tags and three placement constraints: affinity, anti-affinity, cardinality. We are planning to consolidate those in a single constraint in the 2nd version of the patch. For the time being we do not support time constraints. We extend distribute-shell Client and AM to demonstrate affinity inter-job constraints. Some unit-tests are also included to show the supported constraints (affinity, anti-affinity, and cardinality) in Node and Rack level. > Scheduling of long-running applications > --- > > Key: YARN-5468 > URL: https://issues.apache.org/jira/browse/YARN-5468 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler, fairscheduler >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-5468.prototype.patch > > > This JIRA is about the scheduling of applications with long-running tasks. 
> It will include adding support to the YARN for a richer set of scheduling > constraints (such as affinity, anti-affinity, cardinality and time > constraints), and extending the schedulers to take them into account during > placement of containers to nodes. > We plan to have both an online version that will accommodate such requests as > they arrive, as well as a Long-running Application Planner that will make > more global decisions by considering multiple applications at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405056#comment-15405056 ] Hadoop QA commented on YARN-5382: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 36s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 21s {color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} branch-2.7 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} branch-2.7 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} branch-2.7 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 8s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in branch-2.7 has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} branch-2.7 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} branch-2.7 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 34 new + 684 unchanged - 7 fixed = 718 total (was 691) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 3946 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 1m 41s {color} | {color:red} The patch has 96 line(s) with tabs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 17s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_101. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_101 with JDK v1.7.0_101 generated 3 new + 2 unchanged - 0 fixed = 5 total (was 2) {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 48s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_101. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 132m 53s {color} | {color:black} {color}
[jira] [Commented] (YARN-5468) Scheduling of long-running applications
[ https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405045#comment-15405045 ] Weiwei Yang commented on YARN-5468: --- Hi [~kkaranasos] Have you read YARN-4902? It looks like what you are trying to address here has some overlap with that one. > Scheduling of long-running applications > --- > > Key: YARN-5468 > URL: https://issues.apache.org/jira/browse/YARN-5468 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler, fairscheduler >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > This JIRA is about the scheduling of applications with long-running tasks. > It will include adding support to the YARN for a richer set of scheduling > constraints (such as affinity, anti-affinity, cardinality and time > constraints), and extending the schedulers to take them into account during > placement of containers to nodes. > We plan to have both an online version that will accommodate such requests as > they arrive, as well as a Long-running Application Planner that will make > more global decisions by considering multiple applications at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-5468) Scheduling of long-running applications
[ https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-5468: - Comment: was deleted (was: Uploading a first prototype) > Scheduling of long-running applications > --- > > Key: YARN-5468 > URL: https://issues.apache.org/jira/browse/YARN-5468 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler, fairscheduler >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > This JIRA is about the scheduling of applications with long-running tasks. > It will include adding support to the YARN for a richer set of scheduling > constraints (such as affinity, anti-affinity, cardinality and time > constraints), and extending the schedulers to take them into account during > placement of containers to nodes. > We plan to have both an online version that will accommodate such requests as > they arrive, as well as a Long-running Application Planner that will make > more global decisions by considering multiple applications at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5468) Scheduling of long-running applications
[ https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-5468: - Attachment: LRS-Constraints-v2.patch Uploading a first prototype > Scheduling of long-running applications > --- > > Key: YARN-5468 > URL: https://issues.apache.org/jira/browse/YARN-5468 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler, fairscheduler >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: LRS-Constraints-v2.patch > > > This JIRA is about the scheduling of applications with long-running tasks. > It will include adding support to the YARN for a richer set of scheduling > constraints (such as affinity, anti-affinity, cardinality and time > constraints), and extending the schedulers to take them into account during > placement of containers to nodes. > We plan to have both an online version that will accommodate such requests as > they arrive, as well as a Long-running Application Planner that will make > more global decisions by considering multiple applications at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5121) fix some container-executor portability issues
[ https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-5121: --- Component/s: security > fix some container-executor portability issues > -- > > Key: YARN-5121 > URL: https://issues.apache.org/jira/browse/YARN-5121 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, security >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Labels: security > Fix For: 3.0.0-alpha2 > > Attachments: YARN-5121.00.patch, YARN-5121.01.patch, > YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, > YARN-5121.06.patch, YARN-5121.07.patch, YARN-5121.08.patch > > > container-executor has some issues that are preventing it from even compiling > on the OS X jenkins instance. Let's fix those. While we're there, let's > also try to take care of some of the other portability problems that have > crept in over the years, since it used to work great on Solaris but now > doesn't. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3664) Federation PolicyStore internal APIs
[ https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405005#comment-15405005 ] Hadoop QA commented on YARN-3664: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 16s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 
0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 16s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821710/YARN-3664-YARN-2915-v2.patch | | JIRA Issue | YARN-3664 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 2f6ebd4a847a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-2915 / 22db8fd | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12619/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/12619/artifact/patchprocess/whitespace-eol.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12619/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12619/console | | Powered by |
[jira] [Updated] (YARN-5121) fix some container-executor portability issues
[ https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-5121: --- Labels: security (was: ) > fix some container-executor portability issues > -- > > Key: YARN-5121 > URL: https://issues.apache.org/jira/browse/YARN-5121 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, security >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Labels: security > Fix For: 3.0.0-alpha2 > > Attachments: YARN-5121.00.patch, YARN-5121.01.patch, > YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, > YARN-5121.06.patch, YARN-5121.07.patch, YARN-5121.08.patch > > > container-executor has some issues that are preventing it from even compiling > on the OS X jenkins instance. Let's fix those. While we're there, let's > also try to take care of some of the other portability problems that have > crept in over the years, since it used to work great on Solaris but now > doesn't. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute
[ https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-5456: --- Component/s: security > container-executor support for FreeBSD, NetBSD, and others if conf path is > absolute > --- > > Key: YARN-5456 > URL: https://issues.apache.org/jira/browse/YARN-5456 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, security >Affects Versions: 3.0.0-alpha2 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Labels: security > Attachments: YARN-5456.00.patch, YARN-5456.01.patch > > > YARN-5121 fixed quite a few portability issues, but it also changed how it > determines its location to be very operating-system-specific for security > reasons. We should add support for FreeBSD to unbreak its ports entry, NetBSD > (the sysctl options are just in a different order), and for operating systems > that do not have a defined method, an escape hatch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute
[ https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-5456: --- Labels: security (was: ) > container-executor support for FreeBSD, NetBSD, and others if conf path is > absolute > --- > > Key: YARN-5456 > URL: https://issues.apache.org/jira/browse/YARN-5456 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, security >Affects Versions: 3.0.0-alpha2 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Labels: security > Attachments: YARN-5456.00.patch, YARN-5456.01.patch > > > YARN-5121 fixed quite a few portability issues, but it also changed how it > determines its location to be very operating-system-specific for security > reasons. We should add support for FreeBSD to unbreak its ports entry, NetBSD > (the sysctl options are just in a different order), and for operating systems > that do not have a defined method, an escape hatch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5468) Scheduling of long-running applications
[ https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405002#comment-15405002 ] Konstantinos Karanasos commented on YARN-5468: -- We will shortly upload a first prototype just to get some initial feedback. In this first patch we are introducing the placement constraints and extend the CapacityScheduler to take them into account during scheduling. We will soon upload a design document too. > Scheduling of long-running applications > --- > > Key: YARN-5468 > URL: https://issues.apache.org/jira/browse/YARN-5468 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler, fairscheduler >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > This JIRA is about the scheduling of applications with long-running tasks. > It will include adding support to the YARN for a richer set of scheduling > constraints (such as affinity, anti-affinity, cardinality and time > constraints), and extending the schedulers to take them into account during > placement of containers to nodes. > We plan to have both an online version that will accommodate such requests as > they arrive, as well as a Long-running Application Planner that will make > more global decisions by considering multiple applications at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute
[ https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404998#comment-15404998 ] Allen Wittenauer commented on YARN-5456: I have a Kerberized Ubuntu/x86 VM that I generally use for testing things. Popped trunk+this patch onto it. Looks like things are working the way they are supposed to. Ran a simple sleep streaming job and ended up with the following dirs in the nm-local-dir: {code} root@ku:/tmp/hadoop-yarn/nm-local-dir# find . -user aw -type d -ls 93674 drwxr-s--- 4 aw yarn 4096 Aug 2 16:08 ./usercache/aw 93684 drwxr-s--- 3 aw yarn 4096 Aug 2 16:08 ./usercache/aw/appcache 93704 drwxr-s--- 7 aw yarn 4096 Aug 2 16:08 ./usercache/aw/appcache/application_1470179247859_0001 {code} After the job finished, the directories disappeared as expected. > container-executor support for FreeBSD, NetBSD, and others if conf path is > absolute > --- > > Key: YARN-5456 > URL: https://issues.apache.org/jira/browse/YARN-5456 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha2 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Attachments: YARN-5456.00.patch, YARN-5456.01.patch > > > YARN-5121 fixed quite a few portability issues, but it also changed how it > determines its location to be very operating-system-specific for security > reasons. We should add support for FreeBSD to unbreak its ports entry, NetBSD > (the sysctl options are just in a different order), and for operating systems > that do not have a defined method, an escape hatch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5468) Scheduling of long-running applications
Konstantinos Karanasos created YARN-5468: Summary: Scheduling of long-running applications Key: YARN-5468 URL: https://issues.apache.org/jira/browse/YARN-5468 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, fairscheduler Reporter: Konstantinos Karanasos Assignee: Konstantinos Karanasos This JIRA is about the scheduling of applications with long-running tasks. It will include adding support to YARN for a richer set of scheduling constraints (such as affinity, anti-affinity, cardinality and time constraints), and extending the schedulers to take them into account during placement of containers to nodes. We plan to have both an online version that will accommodate such requests as they arrive, as well as a Long-running Application Planner that will make more global decisions by considering multiple applications at once.
[jira] [Commented] (YARN-5444) Fix failing unit tests in TestLinuxContainerExecutorWithMocks
[ https://issues.apache.org/jira/browse/YARN-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404981#comment-15404981 ] Yufei Gu commented on YARN-5444: Thanks a lot for the review and committing, [~vvasudev]. > Fix failing unit tests in TestLinuxContainerExecutorWithMocks > - > > Key: YARN-5444 > URL: https://issues.apache.org/jira/browse/YARN-5444 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Fix For: 2.9.0 > > Attachments: YARN-5444.001.patch > > > Test cases {{testLaunchCommandWithoutPriority}} and {{testStartLocalizer}} are > based on the assumption that Yarn configuration files won't be loaded, which > is not true in some situations.
[jira] [Updated] (YARN-3664) Federation PolicyStore internal APIs
[ https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-3664: - Attachment: YARN-3664-YARN-2915-v2.patch [~vvasudev], thanks for your feedback. Attaching patch (v2) that incorporates your comments. Similar to YARN-5307, the names are along the lines of what you suggested but not exactly the same, as I have tried to align with the final version of YARN-3662, which includes [~vinodkv]/[~leftnoteasy]'s feedback too. bq. Can we use something other than ByteBuffer for getParams - this'll become a problem if you ever expose this information via REST API or wish to update the object via a REST API (marshalling/unmarshalling ByteBuffer can be painful) I agree with your observation, but we couldn't think of a better alternative based on the current understanding of the policy space (refer: YARN-5324/YARN-5325). Also, since we have established this is an internal API, I feel we can revisit once the dust settles on the policies post testing. So I have left it as ByteBuffer for now. > Federation PolicyStore internal APIs > > > Key: YARN-3664 > URL: https://issues.apache.org/jira/browse/YARN-3664 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3664-YARN-2915-v0.patch, > YARN-3664-YARN-2915-v1.patch, YARN-3664-YARN-2915-v2.patch > > > The federation Policy Store contains information about the capacity > allocations made by users, their mapping to sub-clusters and the policies > that each of the components (Router, AMRMProxy, RMs) should enforce
[jira] [Commented] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404966#comment-15404966 ] Hadoop QA commented on YARN-5327: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | 
{color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 29s {color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 29m 30s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.logaggregation.TestAggregatedLogFormat | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821701/YARN-5327.003.patch | | JIRA Issue | YARN-5327 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux e19297eedbf8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d28c2d9 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12618/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt | | unit test logs |
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404963#comment-15404963 ] Junping Du commented on YARN-4676: -- bq. If NM crashes (for example, JVM exit due to out of heap), it is supposed to restart automatically, instead of waiting for a human to start it. Isn't that the general practice? I don't think this is a general case, as YARN deployments vary - in many cases (especially in on-premise environments), the NM is not supposed to be so fragile, and admins need to figure out what happened before the NM crashed. Also, even if we want the NM to restart immediately (without human assistance/troubleshooting), the auto-restart logic lives outside of YARN and belongs to cluster deployment/monitoring tools like Ambari. Here, we'd better not make too many assumptions. bq. But nothing prevents/disallows the NM daemon from restarting, whether automatically or by a human. When such an NM restarts, it will try to register itself with the RM, which will tell it to shut down if it still appears in the exclude list. Such a node will remain DECOMMISSIONED inside the RM until 10+ minutes later it turns LOST after the EXPIRE event. As I said above, this belongs to the admin's behavior or your monitoring tools' logic. Just as when an admin keeps starting an NM that belongs to a decommissioned node, YARN can do nothing but keep shutting the NM down. Such a node should always stay DECOMMISSIONED, and I don't see any benefit in moving it to EXPIRE status. bq. Such a DECOMMISSIONED node can be recommissioned (refreshNodes after it is removed from the exclude list), during which it transitions into the RUNNING state. I don't see how this hack brings any benefit compared with refreshNodes after moving it to the include list and restarting the NM daemon, which goes through the normal register process. The risk is that we need to maintain a separate code path dedicated to this minor case. bq. 
This behavior appears to me as robust instead of hacky. It appears that the behavior you expected relies on a separate mechanism that permanently shuts down the NM once it is DECOMMISSIONED. I have never heard that we need a separate mechanism to shut down the NM once it is decommissioned. It should be built-in behavior for Apache Hadoop YARN so far. Are you talking about a private/specific branch rather than current trunk/branch-2? bq. As long as such a DECOMMISSIONED node never tries to register or be recommissioned, yes, I expect the transitions you listed could be removed. The re-registration of a node after a refreshNodes operation goes through the normal register process, which is good enough for me. I don't think we need any change here unless we have strong reasons. So, yes, please remove these transitions, because they are not correct based on current YARN logic. bq. So I see these transitions are really needed. That said, I could remove them and maintain them privately inside the EMR branch for the sake of getting this JIRA going. I can understand the pain of maintaining a private branch - maybe from the standpoint of your private (EMR) branch, these pieces of code could be needed. However, as a community contributor, you have to switch roles and stand on the community code base in trunk/branch-2, and we committers can only help get in pieces of code that benefit the whole community. If these pieces of code are important for another story (like resource elasticity of YARN) that benefits the community, we can move them out to another dedicated work item, but we need to have an open discussion on design/implementation ahead of time - that's the right process for patch/feature contribution. bq. These transitions have been there almost since the beginning of this JIRA; any other comments/surprises? These issues already surprise me enough - these transitions in RMNode belong to very key logic in YARN, and we need to be careful as always. I need more time to review the rest of the code. 
Hopefully, I can finish my first round tomorrow and publish the remaining comments. > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, > GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, > YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, > YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, > YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, > YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, > YARN-4676.018.patch,
[jira] [Commented] (YARN-5406) In-memory based implementation of the FederationMembershipStateStore
[ https://issues.apache.org/jira/browse/YARN-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404941#comment-15404941 ] Subru Krishnan commented on YARN-5406: -- Thanks [~ellenfkh] for the patch and [~jianhe] for the review. I have a few minor comments: * I agree with [~jianhe] that the _impl_ package should be a sub-package of the _store_ package. * Rename {{FederationInMemoryMembershipStateStore}} --> {{MemoryFederationStateStore}} and the corresponding test. * We need to validate the inputs (like null checks). Since this is common across different store implementations, I have created YARN-5467 to track this. * All the tests are for positive cases; can we add a few negative ones? * I think we should add an _isSubClusterActive_ method to {{SubClusterState}} and use it. * Can you update the Javadoc for _FilterInactiveSubClusters_ as requested by [~jianhe]. > In-memory based implementation of the FederationMembershipStateStore > > > Key: YARN-5406 > URL: https://issues.apache.org/jira/browse/YARN-5406 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Ellen Hui > Attachments: YARN-5406-YARN-2915.v0.patch > > > YARN-3662 defines the FederationMembershipStateStore API. This JIRA tracks an > in-memory based implementation which is useful for both single-box testing > and for future unit tests that depend on the state store.
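For reference, the suggested _isSubClusterActive_ helper could look roughly like the sketch below. The state constant names are illustrative assumptions, not the actual YARN-2915 branch code:

```java
// Hedged sketch: a federation sub-cluster state enum carrying the
// isSubClusterActive helper suggested in the review above.
// The constant names are assumptions for illustration only.
enum SubClusterState {
  SC_NEW,
  SC_RUNNING,
  SC_UNHEALTHY,
  SC_DECOMMISSIONED,
  SC_LOST,
  SC_UNREGISTERED;

  // A sub-cluster counts as active when the Router/AMRMProxy can still
  // route applications to it; in this sketch only SC_RUNNING qualifies.
  public boolean isSubClusterActive() {
    return this == SC_RUNNING;
  }
}
```

Callers (e.g. a filter over membership records) can then test states uniformly instead of comparing against specific enum constants at each call site.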
[jira] [Created] (YARN-5467) InputValidator for the FederationStateStore internal APIs
Subru Krishnan created YARN-5467: Summary: InputValidator for the FederationStateStore internal APIs Key: YARN-5467 URL: https://issues.apache.org/jira/browse/YARN-5467 Project: Hadoop YARN Issue Type: Sub-task Reporter: Subru Krishnan Assignee: Giovanni Matteo Fumarola We need to check for mandatory fields, well-formedness (for address fields), etc., of input params to FederationStateStore. This is common across all Store implementations and can be used as a _fail-fast_ mechanism on the client side.
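A minimal sketch of the fail-fast validation described above might look like the following. The class and method names are assumptions for illustration, not the eventual YARN-5467 API:

```java
// Hedged sketch of client-side fail-fast validation: reject missing
// mandatory fields and malformed "host:port" addresses before any
// round-trip to a store implementation. Names are illustrative only.
final class FederationStateStoreInputValidator {

  private FederationStateStoreInputValidator() {
  }

  // Mandatory fields must be present and non-empty.
  static void checkMandatory(String field, String value) {
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException("Missing mandatory field: " + field);
    }
  }

  // Address fields must be well formed, e.g. "host:port" with a valid port.
  static void checkAddress(String address) {
    checkMandatory("address", address);
    String[] parts = address.split(":");
    if (parts.length != 2 || parts[0].isEmpty()) {
      throw new IllegalArgumentException("Malformed address: " + address);
    }
    try {
      int port = Integer.parseInt(parts[1]);
      if (port < 0 || port > 65535) {
        throw new IllegalArgumentException("Port out of range: " + port);
      }
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Non-numeric port in: " + address);
    }
  }
}
```

Because the checks are static and side-effect free, every store implementation (memory, SQL, ZooKeeper) can call them at the top of each API method without duplicating the logic.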
[jira] [Comment Edited] (YARN-5390) Federation Subcluster Resolver
[ https://issues.apache.org/jira/browse/YARN-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403189#comment-15403189 ] Ellen Hui edited comment on YARN-5390 at 8/2/16 10:38 PM: -- Hi [~leftnoteasy], thanks for the quick feedback! * This interface will be used in the three patches you looked at, although you are correct that they have not been updated yet. For instance, the LocalityMulticastAMRMProxyFederationPolicy prototype in YARN-5325 uses the FederationSubClusterResolver interface to split resource requests. There are some examples of the resolver being used in the splitResourceRequests method of that class, although some of the classnames are out of date. From the javadoc: {panel} host localized ResourceRequest are always forwarded to the RM that owns the node, based on the feedback of a FederationSubClusterResolver rack localized ResourceRequest are forwarded to the RM that owns the rack (if the FederationSubClusterResolver provides this info) or they are forwarded as if they were ANY (this is important for deployment that stripe racks across sub-clusters) as there is not a single resolution. ANY request corresponding to node/rack local requests are only forwarded to the set of RMs that owns the node-local requests. The number of containers listed in each ANY is proportional to the number of node-local container requests (associated to this ANY via the same allocateRequestId) {panel} * The FederationInterceptor from YARN-5325 will be responsible for managing the lifecycle of the SubClusterResolver. * I think it's better to leave the SubClusterResolver methods non-static, since we want to allow the implementation to be pluggable and I can't think of a particular reason it should be static. Please let me know if you disagree, I may be missing something. Thanks! was (Author: ellenfkh): Hi [~wangda], thanks for the quick feedback! 
This interface will be used in the three patches you looked at, although you are correct that they have not been updated yet. For instance, the LocalityMulticastAMRMProxyFederationPolicy prototype in YARN-5325 uses the FederationSubClusterResolver interface to split resource requests. From the javadoc: host localized ResourceRequest are always forwarded to the RM that owns the node, based on the feedback of a FederationSubClusterResolver rack localized ResourceRequest are forwarded to the RM that owns the rack (if the FederationSubClusterResolver provides this info) or they are forwarded as if they were ANY (this is important for deployment that stripe racks across sub-clusters) as there is not a single resolution. ANY request corresponding to node/rack local requests are only forwarded to the set of RMs that owns the node-local requests. The number of containers listed in each ANY is proportional to the number of node-local container requests (associated to this ANY via the same allocateRequestId) There are some examples of the resolver being used in the splitResourceRequests method of the same class, although some of the classnames are out of date. The FederationInterceptor from YARN-5325 will be responsible for managing the lifecycle of the SubClusterResolver. I think it's better to leave the SubClusterResolver methods non-static, since we want to allow the implementation to be pluggable and I can't think of a particular reason it should be static. Please let me know if you disagree, I may be missing something. Thanks! 
> Federation Subcluster Resolver > -- > > Key: YARN-5390 > URL: https://issues.apache.org/jira/browse/YARN-5390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Ellen Hui > Attachments: YARN-5390-YARN-2915.v0.patch, > YARN-5390-YARN-2915.v1.patch, YARN-5390-YARN-2915.v2.patch > > > This JIRA tracks effort to create a mechanism to resolve nodes/racks resource > names to sub-cluster identifiers. This is needed by the federation policies > in YARN-5323, YARN-5324, YARN-5325 to operate correctly.
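To illustrate the pluggability point from the comment above: keeping the resolver methods non-static lets a deployment swap in a different implementation (e.g. file-backed vs. service-backed) behind one interface. All names below are assumptions for illustration, not the actual YARN-5390 API:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of a pluggable node/rack-to-sub-cluster resolver.
// Instance (non-static) methods allow alternative implementations to be
// selected via configuration. Names are illustrative only.
interface SubClusterResolver {
  // Returns the sub-cluster that owns the node, or null if unknown.
  String getSubClusterForNode(String nodeName);

  // May return null when rack ownership cannot be resolved, in which
  // case rack-local requests fall back to being treated as ANY.
  String getSubClusterForRack(String rackName);
}

class InMemorySubClusterResolver implements SubClusterResolver {
  private final Map<String, String> nodeToSubCluster = new HashMap<>();
  private final Map<String, String> rackToSubCluster = new HashMap<>();

  void addNode(String node, String subCluster) {
    nodeToSubCluster.put(node, subCluster);
  }

  void addRack(String rack, String subCluster) {
    rackToSubCluster.put(rack, subCluster);
  }

  @Override
  public String getSubClusterForNode(String nodeName) {
    return nodeToSubCluster.get(nodeName);
  }

  @Override
  public String getSubClusterForRack(String rackName) {
    return rackToSubCluster.get(rackName);
  }
}
```

A policy that splits resource requests only depends on the interface, so tests can inject a small in-memory mapping while production wires in a cluster-topology-backed implementation.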
[jira] [Updated] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeetha Abdu Jyothi updated YARN-5327: Attachment: YARN-5327.003.patch > API changes required to support recurring reservations in the YARN > ReservationSystem > > > Key: YARN-5327 > URL: https://issues.apache.org/jira/browse/YARN-5327 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Subru Krishnan >Assignee: Sangeetha Abdu Jyothi > Attachments: YARN-5327.001.patch, YARN-5327.002.patch, > YARN-5327.003.patch > > > YARN-5326 proposes adding native support for recurring reservations in the > YARN ReservationSystem. This JIRA is a sub-task to track the changes needed > in ApplicationClientProtocol to accomplish it. Please refer to the design doc > in the parent JIRA for details.
[jira] [Commented] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute
[ https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404892#comment-15404892 ] Chris Nauroth commented on YARN-5456: - [~aw], patch 01 looks good. I verified this on OS X, Linux and FreeBSD. It's cool to see the test passing on FreeBSD this time around! My only other suggestion is to try deploying this change in a secured cluster for a bit of manual testing before we commit. > container-executor support for FreeBSD, NetBSD, and others if conf path is > absolute > --- > > Key: YARN-5456 > URL: https://issues.apache.org/jira/browse/YARN-5456 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha2 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Attachments: YARN-5456.00.patch, YARN-5456.01.patch > > > YARN-5121 fixed quite a few portability issues, but it also changed how it > determines its location to be very operating-system-specific for security reasons. > We should add support for FreeBSD to unbreak its ports entry, NetBSD (the > sysctl options are just in a different order), and for operating systems that > do not have a defined method, an escape hatch.
[jira] [Comment Edited] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404788#comment-15404788 ] Vrushali C edited comment on YARN-5382 at 8/2/16 9:08 PM: -- So with the last uploaded patch v9 on branch-2.7 (https://issues.apache.org/jira/secure/attachment/12821498/YARN-5382-branch-2.7.09.patch) . The 2.7 patch does not have logging in ClientRMService. I see only one audit log message when I ran a sleep job and killed it on pseudo-distributed setup on my laptop. {code} [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ grep -i Rmauditlogg logs/yarn-vchannapattan-resourcemanager-machine13-channapattan.log | grep -i Kill 2016-08-02 14:00:19,186 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan IP=127.0.0.1OPERATION=Kill Application Request TARGET=RMAppImpl RESULT=SUCCESS APPID=application_1470171585834_0001 2016-08-02 14:00:19,195 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan OPERATION=Application Finished - Killed TARGET=RMAppManager RESULT=SUCCES APPID=application_1470171585834_0001 [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ {code} On another window: {code} [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.4-SNAPSHOT.jar sleep -m 100 -r 1000 -mt 300 -rt 300 16/08/02 14:00:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 16/08/02 14:00:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 16/08/02 14:00:05 INFO mapreduce.JobSubmitter: number of splits:100 16/08/02 14:00:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470171585834_0001 16/08/02 14:00:05 INFO impl.YarnClientImpl: Submitted application application_1470171585834_0001 16/08/02 14:00:05 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1470171585834_0001/ 16/08/02 14:00:05 INFO mapreduce.Job: Running job: job_1470171585834_0001 ^C [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/yarn application -kill application_1470171585834_0001 16/08/02 14:00:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 16/08/02 14:00:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Killing application application_1470171585834_0001 16/08/02 14:00:19 INFO impl.YarnClientImpl: Killed application application_1470171585834_0001 [t-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ {code} I need to update the patch for trunk to include removal of the audit logging upon isAppFinalStateStored check. was (Author: vrushalic): So with the last uploaded patch v9 on branch-2.7 (https://issues.apache.org/jira/secure/attachment/12821498/YARN-5382-branch-2.7.09.patch) . The 2.7 patch does not have logging in ClientRMService. I see only one audit log message when I ran a sleep job and killed it on pseudo-distributed setup on my laptop. 
{code} [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ grep -i Rmauditlogg logs/yarn-vchannapattan-resourcemanager-machine13-channapattan.log | grep -i Kill 2016-08-02 14:00:19,186 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan IP=127.0.0.1OPERATION=Kill Application Request TARGET=RMAppImpl RESULT=SUCCESS APPID=application_1470171585834_0001 2016-08-02 14:00:19,195 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan OPERATION=Application Finished - Killed TARGET=RMAppManager RESULT=SUCCES APPID=application_1470171585834_0001 [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ {code} On another window: [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.4-SNAPSHOT.jar sleep -m 100 -r 1000 -mt 300 -rt 300 16/08/02 14:00:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 16/08/02 14:00:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 16/08/02 14:00:05 INFO mapreduce.JobSubmitter: number of splits:100 16/08/02 14:00:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470171585834_0001 16/08/02 14:00:05 INFO impl.YarnClientImpl: Submitted application application_1470171585834_0001 16/08/02 14:00:05 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1470171585834_0001/ 16/08/02 14:00:05 INFO mapreduce.Job: Running job: job_1470171585834_0001 ^C [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/yarn application -kill
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404788#comment-15404788 ] Vrushali C commented on YARN-5382: -- So with the last uploaded patch v9 on branch-2.7 (https://issues.apache.org/jira/secure/attachment/12821498/YARN-5382-branch-2.7.09.patch) . The 2.7 patch does not have logging in ClientRMService. I see only one audit log message when I ran a sleep job and killed it on pseudo-distributed setup on my laptop. {code} [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ grep -i Rmauditlogg logs/yarn-vchannapattan-resourcemanager-machine13-channapattan.log | grep -i Kill 2016-08-02 14:00:19,186 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan IP=127.0.0.1OPERATION=Kill Application Request TARGET=RMAppImpl RESULT=SUCCESS APPID=application_1470171585834_0001 2016-08-02 14:00:19,195 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan OPERATION=Application Finished - Killed TARGET=RMAppManager RESULT=SUCCES APPID=application_1470171585834_0001 [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ {code} On another window: [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.4-SNAPSHOT.jar sleep -m 100 -r 1000 -mt 300 -rt 300 16/08/02 14:00:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 16/08/02 14:00:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 16/08/02 14:00:05 INFO mapreduce.JobSubmitter: number of splits:100 16/08/02 14:00:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470171585834_0001 16/08/02 14:00:05 INFO impl.YarnClientImpl: Submitted application application_1470171585834_0001 16/08/02 14:00:05 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1470171585834_0001/ 16/08/02 14:00:05 INFO mapreduce.Job: Running job: job_1470171585834_0001 ^C [machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/yarn application -kill application_1470171585834_0001 16/08/02 14:00:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 16/08/02 14:00:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Killing application application_1470171585834_0001 16/08/02 14:00:19 INFO impl.YarnClientImpl: Killed application application_1470171585834_0001 [t-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ {code} I need to update the patch for trunk to include removal of the audit logging upon isAppFinalStateStored check. > RM does not audit log kill request for active applications > -- > > Key: YARN-5382 > URL: https://issues.apache.org/jira/browse/YARN-5382 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Assignee: Vrushali C > Attachments: YARN-5382-branch-2.7.01.patch, > YARN-5382-branch-2.7.02.patch, YARN-5382-branch-2.7.03.patch, > YARN-5382-branch-2.7.04.patch, YARN-5382-branch-2.7.05.patch, > YARN-5382-branch-2.7.09.patch, YARN-5382.06.patch, YARN-5382.07.patch, > YARN-5382.08.patch, YARN-5382.09.patch > > > ClientRMService will audit a kill request but only if it either fails to > issue the kill or if the kill is sent to an already finished application. 
It > does not create a log entry when the application is active which is arguably > the most important case to audit.
[jira] [Commented] (YARN-5307) Federation Application State Store internal APIs
[ https://issues.apache.org/jira/browse/YARN-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404774#comment-15404774 ] Hadoop QA commented on YARN-5307: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 7s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 
0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 2s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821681/YARN-5307-YARN-2915-v4.patch | | JIRA Issue | YARN-5307 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 857cc5bf8d72 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-2915 / 22db8fd | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12616/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12616/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Federation Application State Store internal APIs > > > Key: YARN-5307 > URL: https://issues.apache.org/jira/browse/YARN-5307 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >
[jira] [Updated] (YARN-5307) Federation Application State Store internal APIs
[ https://issues.apache.org/jira/browse/YARN-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-5307: - Attachment: YARN-5307-YARN-2915-v4.patch Updated patch (v4) with minor typo fixes to private methods > Federation Application State Store internal APIs > > > Key: YARN-5307 > URL: https://issues.apache.org/jira/browse/YARN-5307 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-5307-YARN-2915-v1.patch, > YARN-5307-YARN-2915-v2.patch, YARN-5307-YARN-2915-v3.patch, > YARN-5307-YARN-2915-v4.patch > > > The Federation Application State encapsulates the mapping between an > application and its _home_ sub-cluster, i.e. the sub-cluster to which it is > submitted by the Router. Please refer to the design doc in the parent JIRA for > further details.
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404694#comment-15404694 ] Jason Lowe commented on YARN-5382: -- If we keep the kill success logging in both a transition and in ClientRMService then we'll get two audit logs instead of one. I also don't think it's as simple as removing the one from KillAttemptTransition since then we won't get a log if the RM fails over just as it saved the killed state of an app but before it executed the AppKilledTransition. IMHO we need to log it once, before we enter the FINAL_SAVING state to record the killed transition. Then we might get two audit logs during a failover (one on each RM instance) but that's far preferable to none. > RM does not audit log kill request for active applications > -- > > Key: YARN-5382 > URL: https://issues.apache.org/jira/browse/YARN-5382 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Assignee: Vrushali C > Attachments: YARN-5382-branch-2.7.01.patch, > YARN-5382-branch-2.7.02.patch, YARN-5382-branch-2.7.03.patch, > YARN-5382-branch-2.7.04.patch, YARN-5382-branch-2.7.05.patch, > YARN-5382-branch-2.7.09.patch, YARN-5382.06.patch, YARN-5382.07.patch, > YARN-5382.08.patch, YARN-5382.09.patch > > > ClientRMService will audit a kill request but only if it either fails to > issue the kill or if the kill is sent to an already finished application. It > does not create a log entry when the application is active which is arguably > the most important case to audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
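Jason Lowe's point above can be sketched in a few lines. This is a hypothetical toy model, not the real RMAppImpl or RMAuditLogger code: the class name, fields, and methods are invented for illustration. It shows why auditing the kill once, before the app enters FINAL_SAVING, guarantees at least one log entry even if the RM fails over after the killed state is stored.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch (NOT the real RMAppImpl): audit the kill exactly once, before
// entering FINAL_SAVING, so a failover after the state is stored cannot
// swallow the audit entry. At worst a failover produces two entries (one per
// RM instance), never zero.
public class KillAuditSketch {
    enum AppState { RUNNING, FINAL_SAVING, KILLED }

    static final List<String> auditLog = new ArrayList<>();
    AppState state = AppState.RUNNING;

    // Models KillAttemptTransition: log first, then start saving final state.
    void handleKill(String appId, String user) {
        if (state == AppState.RUNNING) {
            auditLog.add("USER=" + user + " OPERATION=Kill Application APPID=" + appId);
            state = AppState.FINAL_SAVING;
        }
    }

    // Models AppKilledTransition: no second audit entry here.
    void handleStateStored() {
        if (state == AppState.FINAL_SAVING) {
            state = AppState.KILLED;
        }
    }
}
```

With this ordering, replaying `handleStateStored()` alone on a new RM instance after failover never depends on the audit call having survived, which is the failure mode Jason describes for logging only in AppKilledTransition.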
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404645#comment-15404645 ] Daniel Zhi commented on YARN-4676: -- If the NM crashes (for example, a JVM exit due to running out of heap), it is supposed to restart automatically instead of waiting for a human to start it. Isn't that the general practice? The NM code, upon receiving a shutdown from the RM, will exit itself. But nothing prevents the NM daemon from restarting, whether automatically or by a human. When such an NM restarts, it will try to register itself with the RM, and it will be told to shut down if it still appears in the exclude list. Such a node will remain DECOMMISSIONED inside the RM until, 10+ minutes later, it turns into LOST after the EXPIRE event. Such a DECOMMISSIONED node can be recommissioned (refreshNodes after it is removed from the exclude list), at which point it transitions into the RUNNING state. This behavior appears to me to be robust rather than a hack. It appears that the behavior you expect relies on a separate mechanism that permanently shuts down the NM once it is DECOMMISSIONED. As long as such a DECOMMISSIONED node never tries to register or be recommissioned, yes, I expect the transitions you listed could be removed. So I do see these transitions as really needed. That said, I could remove them and maintain them privately inside the EMR branch for the sake of getting this JIRA going. These transitions have been there almost since the beginning of this JIRA; any other comments/surprises? 
> Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, > GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, > YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, > YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, > YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, > YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, > YARN-4676.018.patch, YARN-4676.019.patch > > > YARN-4676 implements an automatic, asynchronous and flexible mechanism to > gracefully decommission YARN nodes. After the user issues the refreshNodes request, the ResourceManager > automatically evaluates the status of all affected nodes to kick off decommission or recommission > actions. The RM asynchronously tracks container and application status related to DECOMMISSIONING nodes to > decommission the nodes immediately once they are ready to be decommissioned. Decommissioning > timeouts at individual-node granularity are supported and can be dynamically updated. The > mechanism naturally supports multiple independent graceful decommissioning “sessions”, where each one involves > different sets of nodes with different timeout settings. Such support is ideal and necessary for graceful > decommission requests issued by external cluster management software instead of a human. > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING node status automatically and asynchronously after the > client/admin makes the graceful decommission request. It tracks > DECOMMISSIONING node status to decide when, after all running containers on > the node have completed, the node will be transitioned into the DECOMMISSIONED state. > NodesListManager detects and handles include and exclude list changes to kick > off decommission or recommission as necessary.
[jira] [Commented] (YARN-4888) Changes in RM container allocation for identifying resource-requests explicitly
[ https://issues.apache.org/jira/browse/YARN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404644#comment-15404644 ] Subru Krishnan commented on YARN-4888: -- The checkstyle issue is to do with more than 7 parameters and the test case failure is unrelated. > Changes in RM container allocation for identifying resource-requests > explicitly > --- > > Key: YARN-4888 > URL: https://issues.apache.org/jira/browse/YARN-4888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-4888-WIP.patch, YARN-4888-v0.patch, > YARN-4888-v2.patch, YARN-4888-v3.patch, YARN-4888-v4.patch, > YARN-4888-v5.patch, YARN-4888-v6.patch, YARN-4888.001.patch > > > YARN-4879 puts forward the notion of identifying allocate requests > explicitly. This JIRA is to track the changes in RM app scheduling data > structures to accomplish it. Please refer to the design doc in the parent > JIRA for details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404634#comment-15404634 ] Vrushali C commented on YARN-5382: -- Thanks [~jianhe] and [~jlowe]. Apologies, I somehow missed that the success logging in ClientRMService under the isAppFinalStateStored check snuck back in. I think that happened when I rebased to latest during one of the patches; I will remove it now. [~jianhe], would it then be okay to keep the logging in RMAppImpl#AppKilledTransition as well as ClientRMService? I will remove the one in RMAppImpl#KillAttemptTransition.
[jira] [Updated] (YARN-5466) DefaultContainerExecutor needs JavaDocs
[ https://issues.apache.org/jira/browse/YARN-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-5466: --- Attachment: YARN-5466.001.patch This patch adds JavaDocs and does some basic cleanup. I'd love some confirmation that my interpretations of the methods are all accurate. > DefaultContainerExecutor needs JavaDocs > --- > > Key: YARN-5466 > URL: https://issues.apache.org/jira/browse/YARN-5466 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Minor > Attachments: YARN-5466.001.patch > > > Following on YARN-5455, let's document the DefaultContainerExecutor as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404555#comment-15404555 ] Junping Du commented on YARN-4676: -- Thanks for sharing these details, Daniel. bq. In the typical EMR cluster scenario, daemon like NM will be configured to auto-start if killed/shutdown, however RM will reject such NM if it appear in the exclude list. In today's YARN (community version), if the RM rejects an NM's register request, the NM gets terminated directly. I think we should follow the existing behavior, or there could be incompatibility issues. bq. 1, DECOMMISSIONED NM, will try to register to RM but will be rejected. It continue such loop until either: 1) the host being terminated; 2) the host being recommissioned. It was likely the DECOMMISSIONED->LOST transition is defensive coding — without it invalid event throws. I can understand that we want the scale-in and scale-out capability here for the cluster's elasticity. However, I am not sure how much benefit we gain from this hack: it sounds like we are just saving the NM daemon start time, which is several seconds in most cases and trivial compared with container launching and running. Am I missing some other benefit? bq. It was likely the DECOMMISSIONED->LOST transition is defensive coding — without it invalid event throws. As I mentioned above, we should remove the watching of DECOMMISSIONED nodes; it is unnecessary for the RM to consume resources to take care of them. If an EXPIRE event gets thrown in your case, then we should check whether something is wrong there (like a race condition, etc.) and fix it. bq. CLEANUP_CONTAINER and CLEANUP_APP were for sure added to prevent otherwise invalid event exception at the DECOMMISSIONED state I can understand that we want to get rid of any annoying invalid-transition exceptions in our logs. However, similar to what I mentioned above, we need to find out where we send these events and check whether these cases are valid or are bugs due to race conditions, etc. 
Even if we are really sure that some of these events are hard to get rid of, we should empty the transition here, as any logic in the transition is unnecessary.
[jira] [Commented] (YARN-5226) remove AHS enable check from LogsCLI#fetchAMContainerLogs
[ https://issues.apache.org/jira/browse/YARN-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404483#comment-15404483 ] Hudson commented on YARN-5226: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10194 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10194/]) YARN-5226. Remove AHS enable check from LogsCLI#fetchAMContainerLogs. (junping_du: rev 3818393297c7b337e380e8111a55f2ac4745cb83) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java > remove AHS enable check from LogsCLI#fetchAMContainerLogs > - > > Key: YARN-5226 > URL: https://issues.apache.org/jira/browse/YARN-5226 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.9.0 > > Attachments: YARN-5226.1.patch, YARN-5226.2.patch, YARN-5226.3.patch, > YARN-5226.4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404463#comment-15404463 ] Daniel Zhi commented on YARN-4676: -- I can clarify the scenarios: 1. DECOMMISSIONED->RUNNING happens due to the RECOMMISSION event, which is triggered when the node is removed from the exclude file (a node can be dynamically excluded or included). In the typical EMR cluster scenario, a daemon like the NM will be configured to auto-start if killed/shut down; however, the RM will reject such an NM if it appears in the exclude list. 2. Related to 1, a DECOMMISSIONED NM, upon auto-restart, will try to register with the RM but will be rejected. It continues this loop until either: 1) the host is terminated; or 2) the host is recommissioned. It is likely that the DECOMMISSIONED->LOST transition is defensive coding --- without it, an invalid event throws. 3. CLEANUP_CONTAINER and CLEANUP_APP were for sure added to prevent an otherwise invalid event exception in the DECOMMISSIONED state. So the core reason for these transitions is that DECOMMISSIONED NMs are on "active standby" (and could be RECOMMISSIONed without delay at any moment) until the hosts are terminated in the EMR scenario. 
[jira] [Created] (YARN-5466) DefaultContainerExecutor needs JavaDocs
Daniel Templeton created YARN-5466: -- Summary: DefaultContainerExecutor needs JavaDocs Key: YARN-5466 URL: https://issues.apache.org/jira/browse/YARN-5466 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.8.0 Reporter: Daniel Templeton Assignee: Daniel Templeton Priority: Minor Following on YARN-5455, let's document the DefaultContainerExecutor as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5226) remove AHS enable check from LogsCLI#fetchAMContainerLogs
[ https://issues.apache.org/jira/browse/YARN-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404454#comment-15404454 ] Junping Du commented on YARN-5226: -- The test failure is not related. I have committed the v4 patch to trunk and branch-2. Thanks [~xgong] for the patch contribution! > remove AHS enable check from LogsCLI#fetchAMContainerLogs > - > > Key: YARN-5226 > URL: https://issues.apache.org/jira/browse/YARN-5226 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-5226.1.patch, YARN-5226.2.patch, YARN-5226.3.patch, > YARN-5226.4.patch > >
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404441#comment-15404441 ] Junping Du commented on YARN-4676: -- The transitions below for RMNode don't look correct: {noformat} + .addTransition(NodeState.DECOMMISSIONED, NodeState.RUNNING, + RMNodeEventType.RECOMMISSION, + new RecommissionNodeTransition(NodeState.RUNNING)) + .addTransition(NodeState.DECOMMISSIONED, NodeState.DECOMMISSIONED, + RMNodeEventType.CLEANUP_CONTAINER, new CleanUpContainerTransition()) + .addTransition(NodeState.DECOMMISSIONED, NodeState.LOST, + RMNodeEventType.EXPIRE, new DeactivateNodeTransition(NodeState.LOST)) + .addTransition(NodeState.DECOMMISSIONED, NodeState.DECOMMISSIONED, + RMNodeEventType.CLEANUP_APP, new CleanUpAppTransition()) {noformat} 1. An RMNode in DECOMMISSIONED status shouldn't transition to RUNNING. The only way to make a decommissioned node active again is through two steps: 1) put this node back into the RM include-node list and call the refreshNodes CLI; 2) restart the NM so it registers with the RM again. The difference with a DECOMMISSIONING node is that a decommissioning node is still running, so step 2 is not needed. For a DECOMMISSIONED node, we never know when step 2 will happen, so we shouldn't mark the node as RUNNING. 2. Transitioning a node from DECOMMISSIONED to LOST is not necessary. A node in DECOMMISSIONED is already down and won't heartbeat to the RM again, so we stop the heartbeat monitor for this node and there is no chance for an EXPIRE event to get sent. 3. For CleanUpContainerTransition(), CleanUpAppTransition() (and AddContainersToBeRemovedFromNMTransition() in the existing code base), I don't think this is necessary, as an NM that gets decommissioned should already have a cleaned-up state; this is different from NM shutdown, where we keep containers running and keep the NM state up-to-date. 
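The transition-table debate above is easier to see with a toy model. The sketch below is in the spirit of YARN's StateMachineFactory but is not the real API: the class, enums, and table here are invented for illustration. It models Junping's position, where DECOMMISSIONED is effectively terminal, so an EXPIRE or RECOMMISSION event arriving in that state is simply an invalid event rather than a handled transition.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Toy node state machine (NOT YARN's real StateMachineFactory). Rows missing
// from the table mean "invalid event at this state", mirroring the
// InvalidStateTransitionException the comments discuss.
public class NodeStateTable {
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED, LOST }
    enum NodeEvent { DECOMMISSION, RECOMMISSION, EXPIRE }

    static final Map<NodeState, Map<NodeEvent, NodeState>> TABLE =
        new EnumMap<>(NodeState.class);
    static {
        Map<NodeEvent, NodeState> running = new HashMap<>();
        running.put(NodeEvent.DECOMMISSION, NodeState.DECOMMISSIONING);
        running.put(NodeEvent.EXPIRE, NodeState.LOST);
        TABLE.put(NodeState.RUNNING, running);

        Map<NodeEvent, NodeState> decommissioning = new HashMap<>();
        decommissioning.put(NodeEvent.DECOMMISSION, NodeState.DECOMMISSIONED);
        decommissioning.put(NodeEvent.RECOMMISSION, NodeState.RUNNING);
        TABLE.put(NodeState.DECOMMISSIONING, decommissioning);
        // DECOMMISSIONED and LOST: no outgoing transitions (terminal states),
        // per Junping's argument. Daniel's EMR patch would instead add
        // DECOMMISSIONED rows for RECOMMISSION, EXPIRE, and cleanup events.
    }

    static NodeState transition(NodeState from, NodeEvent ev) {
        Map<NodeEvent, NodeState> row = TABLE.get(from);
        if (row == null || !row.containsKey(ev)) {
            throw new IllegalStateException("Invalid event " + ev + " at " + from);
        }
        return row.get(ev);
    }
}
```

In this framing, Daniel's transitions are exactly extra rows in the DECOMMISSIONED map, and the disagreement is whether those rows belong in the shared table or only in the EMR branch.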
[jira] [Commented] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior
[ https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404354#comment-15404354 ] Robert Kanter commented on YARN-5465: - I think the second option is better. Even though updating the timeout of a currently decommissioning node is harder, it's at least possible to have different sets of decommissioning nodes with different timeouts, which seems like a common scenario to me. The first option doesn't allow you to do this at all. > Server-Side NM Graceful Decommissioning subsequent call behavior > > > Key: YARN-5465 > URL: https://issues.apache.org/jira/browse/YARN-5465 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Robert Kanter > > The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has > the following behavior when subsequent calls are made: > # Start a long-running job that has containers running on nodeA > # Add nodeA to the exclude file > # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully > decommissioning nodeA > # Wait 30 seconds > # Add nodeB to the exclude file > # Run {{-refreshNodes -g 30 -server}} (30sec) > # After 30 seconds, both nodeA and nodeB shut down > In a nutshell, issuing a subsequent call to gracefully decommission nodes > updates the timeout for any currently decommissioning nodes. This makes it > impossible to gracefully decommission different sets of nodes with different > timeouts. Though it does let you easily update the timeout of currently > decommissioning nodes. 
> Another behavior we could do is this: > # {color:grey}Start a long-running job that has containers running on nodeA > # {color:grey}Add nodeA to the exclude file{color} > # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully > decommissioning nodeA{color} > # {color:grey}Wait 30 seconds{color} > # {color:grey}Add nodeB to the exclude file{color} > # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color} > # After 30 seconds, nodeB shuts down > # After 60 more seconds, nodeA shuts down > This keeps the nodes affected by each call to gracefully decommission nodes > independent. You can now have different sets of decommissioning nodes with > different timeouts. However, to update the timeout of a currently > decommissioning node, you'd have to first recommission it, and then > decommission it again. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
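The two behaviors contrasted above differ only in whether a later {{-refreshNodes -g}} call overwrites the deadline of nodes that are already DECOMMISSIONING. A minimal sketch of that difference (the class and method names below are illustrative, not Hadoop's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: tracks one decommission deadline per node.
class DecommissionDeadlines {
    private final Map<String, Long> deadlineMs = new HashMap<>();

    // Observed behavior: every refreshNodes call resets the deadline of
    // ALL tracked nodes to now + timeout, not just the newly excluded ones.
    void refreshGlobal(Iterable<String> excluded, long timeoutMs, long nowMs) {
        for (String node : excluded) {
            deadlineMs.put(node, nowMs + timeoutMs);
        }
        deadlineMs.replaceAll((node, d) -> nowMs + timeoutMs);
    }

    // Proposed behavior: only nodes not already decommissioning pick up
    // the new timeout; existing deadlines are left untouched.
    void refreshPerCall(Iterable<String> excluded, long timeoutMs, long nowMs) {
        for (String node : excluded) {
            deadlineMs.putIfAbsent(node, nowMs + timeoutMs);
        }
    }

    Long deadlineOf(String node) {
        return deadlineMs.get(node);
    }
}
```

With nodeA excluded at t=0s with a 120s timeout and nodeB added at t=30s with a 30s timeout, refreshGlobal leaves both nodes expiring at t=60s (both shut down together), while refreshPerCall keeps nodeA at t=120s and expires nodeB at t=60s, matching the two step-by-step scenarios above.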
[jira] [Created] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior
Robert Kanter created YARN-5465: --- Summary: Server-Side NM Graceful Decommissioning subsequent call behavior Key: YARN-5465 URL: https://issues.apache.org/jira/browse/YARN-5465 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Kanter The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has the following behavior when subsequent calls are made: # Start a long-running job that has containers running on nodeA # Add nodeA to the exclude file # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA # Wait 30 seconds # Add nodeB to the exclude file # Run {{-refreshNodes -g 30 -server}} (30sec) # After 30 seconds, both nodeA and nodeB shut down In a nutshell, issuing a subsequent call to gracefully decommission nodes updates the timeout for any currently decommissioning nodes. This makes it impossible to gracefully decommission different sets of nodes with different timeouts. Though it does let you easily update the timeout of currently decommissioning nodes. Another behavior we could do is this: # {color:grey}Start a long-running job that has containers running on nodeA # {color:grey}Add nodeA to the exclude file{color} # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully decommissioning nodeA{color} # {color:grey}Wait 30 seconds{color} # {color:grey}Add nodeB to the exclude file{color} # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color} # After 30 seconds, nodeB shuts down # After 60 more seconds, nodeA shuts down This keeps the nodes affected by each call to gracefully decommission nodes independent. You can now have different sets of decommissioning nodes with different timeouts. However, to update the timeout of a currently decommissioning node, you'd have to first recommission it, and then decommission it again. 
[jira] [Commented] (YARN-4717) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
[ https://issues.apache.org/jira/browse/YARN-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404351#comment-15404351 ] Eric Badger commented on YARN-4717: --- [~templedf], [~rkanter], can we cherry-pick this back to 2.7? I just saw this failure in our nightly build. > TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails > Intermittently due to IllegalArgumentException from cleanup > --- > > Key: YARN-4717 > URL: https://issues.apache.org/jira/browse/YARN-4717 > Project: Hadoop YARN > Issue Type: Test > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Minor > Fix For: 2.9.0 > > Attachments: YARN-4717.001.patch > > > The same issue that was resolved by [~zxu] in YARN-3602 is back. Looks like > the commons-io package throws an IAE instead of an IOE now if the directory > doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5226) remove AHS enable check from LogsCLI#fetchAMContainerLogs
[ https://issues.apache.org/jira/browse/YARN-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404349#comment-15404349 ] Hadoop QA commented on YARN-5226: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 24s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 20m 20s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.api.impl.TestYarnClient | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821422/YARN-5226.4.patch | | JIRA Issue | YARN-5226 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 8bc210e63249 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b3018e7 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12614/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12614/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12614/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12614/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > remove AHS enable check from LogsCLI#fetchAMContainerLogs >
[jira] [Resolved] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
[ https://issues.apache.org/jira/browse/YARN-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger resolved YARN-5463. --- Resolution: Duplicate Closing as a dup of YARN-4717. Not sure how I didn't see the old one before I opened this one. Oops > TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails > Intermittently due to IllegalArgumentException from cleanup > --- > > Key: YARN-5463 > URL: https://issues.apache.org/jira/browse/YARN-5463 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-5463.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
[ https://issues.apache.org/jira/browse/YARN-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404324#comment-15404324 ] Hadoop QA commented on YARN-5463: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} YARN-5463 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821644/YARN-5463.001.patch | | JIRA Issue | YARN-5463 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12615/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails > Intermittently due to IllegalArgumentException from cleanup > --- > > Key: YARN-5463 > URL: https://issues.apache.org/jira/browse/YARN-5463 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-5463.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-5464: Target Version/s: 2.9.0 > Server-Side NM Graceful Decommissioning with RM HA > -- > > Key: YARN-5464 > URL: https://issues.apache.org/jira/browse/YARN-5464 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Robert Kanter >Assignee: Robert Kanter > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA
Robert Kanter created YARN-5464: --- Summary: Server-Side NM Graceful Decommissioning with RM HA Key: YARN-5464 URL: https://issues.apache.org/jira/browse/YARN-5464 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Kanter Assignee: Robert Kanter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404312#comment-15404312 ] Robert Kanter commented on YARN-4676: - One final minor thing: - In {{RMAdminCLI#refreshNodes}}, the client-side and server-side tracking are mutually exclusive. So we shouldn't need the 5-second grace period because it should only be one or the other. +1 after that. [~djp], can you also take a look at the latest patch? > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, > GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, > YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, > YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, > YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, > YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, > YARN-4676.018.patch, YARN-4676.019.patch > > > YARN-4676 implements an automatic, asynchronous and flexible mechanism to > gracefully decommission > YARN nodes. After the user issues the refreshNodes request, ResourceManager > automatically evaluates > the status of all affected nodes to kick off decommission or recommission > actions. RM asynchronously > tracks container and application status related to DECOMMISSIONING nodes to > decommission the > nodes immediately after they are ready to be decommissioned. Decommissioning > timeout at individual-node > granularity is supported and can be dynamically updated. The > mechanism naturally supports multiple > independent graceful decommissioning “sessions” where each one involves > different sets of nodes with > different timeout settings. 
Such support is ideal and necessary for graceful > decommission requests issued > by external cluster management software instead of a human. > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING nodes status automatically and asynchronously after > the client/admin makes the graceful decommission request. It tracks > DECOMMISSIONING node status to decide when, after all running containers on > the node have completed, it will be transitioned into the DECOMMISSIONED state. > NodesListManager detects and handles include and exclude list changes to kick > off decommission or recommission as necessary.
[jira] [Updated] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
[ https://issues.apache.org/jira/browse/YARN-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5463: -- Attachment: YARN-5463.001.patch Attaching patch to catch and ignore IllegalArgumentExceptions along with the IOExceptions. > TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails > Intermittently due to IllegalArgumentException from cleanup > --- > > Key: YARN-5463 > URL: https://issues.apache.org/jira/browse/YARN-5463 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-5463.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
Eric Badger created YARN-5463: - Summary: TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup Key: YARN-5463 URL: https://issues.apache.org/jira/browse/YARN-5463 Project: Hadoop YARN Issue Type: Bug Reporter: Eric Badger Assignee: Eric Badger -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
[ https://issues.apache.org/jira/browse/YARN-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404293#comment-15404293 ] Eric Badger commented on YARN-5463: --- YARN-3602 fixed the IOException; we now need to also catch IllegalArgumentException > TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails > Intermittently due to IllegalArgumentException from cleanup > --- > > Key: YARN-5463 > URL: https://issues.apache.org/jira/browse/YARN-5463 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
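The fix described above amounts to widening the cleanup's catch clause so the newer commons-io behavior (IllegalArgumentException for a missing directory) is swallowed alongside the old IOException. A hedged sketch of the pattern, using a stand-in delete helper rather than the real test code:

```java
import java.io.File;
import java.io.IOException;

class BestEffortCleanup {
    // Stand-in for a FileUtils-style delete: newer commons-io throws
    // IllegalArgumentException (not IOException) when the directory is gone.
    static void deleteDirectory(File dir) throws IOException {
        if (!dir.exists()) {
            throw new IllegalArgumentException(dir + " does not exist");
        }
        if (!dir.delete()) {
            throw new IOException("unable to delete " + dir);
        }
    }

    // Test cleanup is best-effort: swallow both exception types instead of
    // letting either one fail the test intermittently.
    static boolean tryDelete(File dir) {
        try {
            deleteDirectory(dir);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }
}
```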
[jira] [Commented] (YARN-5462) TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails intermittently
[ https://issues.apache.org/jira/browse/YARN-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404238#comment-15404238 ] Hadoop QA commented on YARN-5462: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 2s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 24s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 44s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821637/YARN-5462.001.patch | | JIRA Issue | YARN-5462 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 69c7294b0aa4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7fc70c6 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12613/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12613/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails > intermittently > -- > > Key: YARN-5462 > URL: https://issues.apache.org/jira/browse/YARN-5462 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-5462.001.patch > > > {noformat} > java.io.IOException: Failed on local exception:
[jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA
[ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404243#comment-15404243 ] Hadoop QA commented on YARN-5333: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | 
{color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 254 unchanged - 0 fixed = 255 total (was 254) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 37m 22s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 50s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821629/YARN-5333.06.patch | | JIRA Issue | YARN-5333 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 0eb4108d33ba 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7fc70c6 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12612/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12612/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12612/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Some recovered apps are put into default queue when RM HA > - > > Key: YARN-5333 > URL:
[jira] [Commented] (YARN-5287) LinuxContainerExecutor fails to set proper permission
[ https://issues.apache.org/jira/browse/YARN-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404235#comment-15404235 ] Varun Vasudev commented on YARN-5287: - [~Ying Zhang] the patch no longer applies cleanly on trunk. It would be great if you could make a few minor changes - 1) {code} +/** + * * Function to prepare the container directories. + * * It creates the container work and log directories. + **/ {code} Please change the comment to follow the same format as other comments(no need for the extra "*" on the individual lines) 2) {code} +// This test is used to verify that app and container directories can be +// created with required permissions when umask has been set to a restrictive +// value of 077. {code} Change the formatting to follow {code}/** .. */{code} 3) {code}+ //Create container directories for "app_5"{code} Add space between // and Create > LinuxContainerExecutor fails to set proper permission > - > > Key: YARN-5287 > URL: https://issues.apache.org/jira/browse/YARN-5287 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.2 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-5287-tmp.patch, YARN-5287.003.patch, > YARN-5287.004.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > LinuxContainerExecutor fails to set the proper permissions on the local > directories(i.e., /hadoop/yarn/local/usercache/... by default) if the cluster > has been configured with a restrictive umask, e.g.: umask 077. Job failed due > to the following reason: > Path /hadoop/yarn/local/usercache/ambari-qa/appcache/application_ has > permission 700 but needs permission 750 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5287) LinuxContainerExecutor fails to set proper permission
[ https://issues.apache.org/jira/browse/YARN-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404170#comment-15404170 ] Naganarasimha G R commented on YARN-5287: - Thanks for the patch, [~Ying Zhang]. +1, the latest patch LGTM. If there are no more comments, I will wait for some time and then commit it! > LinuxContainerExecutor fails to set proper permission > - > > Key: YARN-5287 > URL: https://issues.apache.org/jira/browse/YARN-5287 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.2 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-5287-tmp.patch, YARN-5287.003.patch, > YARN-5287.004.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > LinuxContainerExecutor fails to set the proper permissions on the local > directories(i.e., /hadoop/yarn/local/usercache/... by default) if the cluster > has been configured with a restrictive umask, e.g.: umask 077. Job failed due > to the following reason: > Path /hadoop/yarn/local/usercache/ambari-qa/appcache/application_ has > permission 700 but needs permission 750 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5462) TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails intermittently
[ https://issues.apache.org/jira/browse/YARN-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5462: -- Attachment: YARN-5462.001.patch Attaching patch that adds an extra barrier to the serviceStop method for the NM. This way the RPC interfaces won't get torn down before the container gets started and so the connection won't be dropped. > TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails > intermittently > -- > > Key: YARN-5462 > URL: https://issues.apache.org/jira/browse/YARN-5462 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-5462.001.patch > > > {noformat} > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: > "slave-02.adcd.infra.corp.gq1.yahoo.com/69.147.96.229"; destination host is: > "127.0.0.1":12345; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1390) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at com.sun.proxy.$Proxy78.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:101) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:248) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1492) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at 
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:508) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1730) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1078) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:977) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
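The shutdown race described in the comment above (the NM's RPC interfaces being torn down while a startContainers call is still in flight, so the client sees "Connection reset by peer") can be reduced to a small latch sketch. The class below is purely illustrative, not the actual NodeManager code; the patch's real mechanism is the extra barrier added to serviceStop.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative reduction of the fix: serviceStop() waits on a latch until
// the in-flight startContainer() request completes, so the RPC interfaces
// are not torn down mid-request. Names are hypothetical, not NM code.
class MiniNodeManager {
    private final CountDownLatch inFlight = new CountDownLatch(1);
    private volatile boolean rpcServerUp = true;

    // Simulates an RPC handler starting a container.
    void startContainer() {
        try {
            // ... launch the container ...
        } finally {
            inFlight.countDown(); // signal that the request completed
        }
    }

    // Simulates NM shutdown: block (bounded) until the pending start
    // finishes before stopping the RPC interfaces.
    void serviceStop() {
        try {
            inFlight.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        rpcServerUp = false; // now safe to tear down RPC
    }

    boolean isRpcServerUp() {
        return rpcServerUp;
    }
}
```

With this ordering, a start request that has already reached the handler always completes before the server goes away, which is exactly what the intermittent test failure needs.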
[jira] [Created] (YARN-5462) TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails intermittently
Eric Badger created YARN-5462: - Summary: TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails intermittently Key: YARN-5462 URL: https://issues.apache.org/jira/browse/YARN-5462 Project: Hadoop YARN Issue Type: Bug Reporter: Eric Badger Assignee: Eric Badger {noformat} java.io.IOException: Failed on local exception: java.io.IOException: Connection reset by peer; Host Details : local host is: "slave-02.adcd.infra.corp.gq1.yahoo.com/69.147.96.229"; destination host is: "127.0.0.1":12345; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776) at org.apache.hadoop.ipc.Client.call(Client.java:1457) at org.apache.hadoop.ipc.Client.call(Client.java:1390) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy78.startContainers(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:101) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:248) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1492) Caused by: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:197) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at java.io.FilterInputStream.read(FilterInputStream.java:133) at 
java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) at java.io.FilterInputStream.read(FilterInputStream.java:83) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:508) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1730) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1078) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:977) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA
[ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404150#comment-15404150 ] Jun Gong commented on YARN-5333: Attaching a new patch. According to the suggestion, I abstracted refreshXXXWithout functions that do the refresh without checking RM status. About the test case, it needs to be bound to a specific scheduler (either Capacity or FairScheduler) to reproduce the error case, so there is no change for it. Is it OK? > Some recovered apps are put into default queue when RM HA > - > > Key: YARN-5333 > URL: https://issues.apache.org/jira/browse/YARN-5333 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-5333.01.patch, YARN-5333.02.patch, > YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch, YARN-5333.06.patch > > > Enable RM HA and use FairScheduler, > {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, > {{yarn.scheduler.fair.user-as-default-queue}} is set to false. > Reproduce steps: > 1. Start two RMs. > 2. After the RMs are running, change both RMs' file > {{etc/hadoop/fair-scheduler.xml}}, then add some queues. > 3. Submit some apps to the newly added queues. > 4. Stop the active RM; the standby RM will then transition to active and recover > apps. > However, the new active RM will put recovered apps into the default queue because > it might not have loaded the new {{fair-scheduler.xml}}. We need to call > {{initScheduler}} before starting active services, or move {{refreshAll()}} in > front of {{rm.transitionToActive()}}. *It seems this is also important for > other schedulers*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5333) Some recovered apps are put into default queue when RM HA
[ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-5333: --- Attachment: YARN-5333.06.patch > Some recovered apps are put into default queue when RM HA > - > > Key: YARN-5333 > URL: https://issues.apache.org/jira/browse/YARN-5333 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-5333.01.patch, YARN-5333.02.patch, > YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch, YARN-5333.06.patch > > > Enable RM HA and use FairScheduler, > {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, > {{yarn.scheduler.fair.user-as-default-queue}} is set to false. > Reproduce steps: > 1. Start two RMs. > 2. After the RMs are running, change both RMs' file > {{etc/hadoop/fair-scheduler.xml}}, then add some queues. > 3. Submit some apps to the newly added queues. > 4. Stop the active RM; the standby RM will then transition to active and recover > apps. > However, the new active RM will put recovered apps into the default queue because > it might not have loaded the new {{fair-scheduler.xml}}. We need to call > {{initScheduler}} before starting active services, or move {{refreshAll()}} in > front of {{rm.transitionToActive()}}. *It seems this is also important for > other schedulers*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5430) Get container's ip and host from NM
[ https://issues.apache.org/jira/browse/YARN-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404142#comment-15404142 ] Hadoop QA commented on YARN-5430: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 56s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | 
{color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 34s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 16 new + 160 unchanged - 1 fixed = 176 total (was 161) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 51s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 19s {color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 8s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 44m 36s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | | Format-string method String.format(String, Object[]) called with format string "Shell execution failed: ExitCode = %s Stderr: %s Stdout: %s Command:" wants 3 arguments but is given 4 in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(List, PrivilegedOperation, File, Map, boolean, boolean) At PrivilegedOperationExecutor.java:[line 160] | | Failed junit tests | hadoop.yarn.logaggregation.TestAggregatedLogFormat | | |
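The FindBugs warning in the report above is a format-string arity mismatch: the format string declares three %s conversions but four arguments are supplied, so the last argument is silently dropped (java.util.Formatter ignores extra arguments rather than failing). A minimal standalone reproduction, using a hypothetical helper rather than the actual PrivilegedOperationExecutor code:

```java
// Minimal reproduction of the FindBugs format-string warning: three %s
// conversions, four arguments. Extra arguments are silently ignored by
// String.format, so `cmd` never appears in the message.
public class FormatArity {
    static String bad(int exitCode, String stderr, String stdout, String cmd) {
        // BUG: 3 conversions, 4 arguments -- `cmd` is dropped.
        return String.format(
            "Shell execution failed: ExitCode = %s Stderr: %s Stdout: %s Command:",
            exitCode, stderr, stdout, cmd);
    }

    static String good(int exitCode, String stderr, String stdout, String cmd) {
        // Fix: add the fourth conversion so every argument is used.
        return String.format(
            "Shell execution failed: ExitCode = %s Stderr: %s Stdout: %s Command: %s",
            exitCode, stderr, stdout, cmd);
    }
}
```

Because the bad variant compiles and runs without error, static analysis like FindBugs is typically the only thing that catches the lost argument.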
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404090#comment-15404090 ] Jason Lowe commented on YARN-5382: -- bq. Does user expect audit logging both before killing and after killing successfully ? Ideally from the ClientRMService perspective it should be logged when the request comes in, just like web servers audit log the requests they serve. Unfortunately the polling-for-killed logic makes this messy to implement cleanly, so logging once when the app is killed would be the next best option, IMHO. Sorry I missed the fact that the success logging in ClientRMService was still there. At some point it was removed, but I missed that it didn't stay that way. I also missed that AttemptKilledTransition and AppKilledTransition can both be triggered for an app being killed. > RM does not audit log kill request for active applications > -- > > Key: YARN-5382 > URL: https://issues.apache.org/jira/browse/YARN-5382 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Assignee: Vrushali C > Attachments: YARN-5382-branch-2.7.01.patch, > YARN-5382-branch-2.7.02.patch, YARN-5382-branch-2.7.03.patch, > YARN-5382-branch-2.7.04.patch, YARN-5382-branch-2.7.05.patch, > YARN-5382-branch-2.7.09.patch, YARN-5382.06.patch, YARN-5382.07.patch, > YARN-5382.08.patch, YARN-5382.09.patch > > > ClientRMService will audit a kill request but only if it either fails to > issue the kill or if the kill is sent to an already finished application. It > does not create a log entry when the application is active, which is arguably > the most important case to audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5461) Port initial slider-core module code into yarn
Jian He created YARN-5461: - Summary: Port initial slider-core module code into yarn Key: YARN-5461 URL: https://issues.apache.org/jira/browse/YARN-5461 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5160) Add timeout when starting JobHistoryServer in MiniMRYarnCluster
[ https://issues.apache.org/jira/browse/YARN-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404059#comment-15404059 ] Hadoop QA commented on YARN-5160: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s {color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 115m 42s {color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 128m 35s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.mapred.TestMRCJCFileOutputCommitter | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821558/YARN-5160.01.patch | | JIRA Issue | YARN-5160 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux e507e0c41f2d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7fc70c6 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/12610/artifact/patchprocess/whitespace-eol.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12610/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12610/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12610/testReport/ | | modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12610/console | |
[jira] [Updated] (YARN-5430) Get container's ip and host from NM
[ https://issues.apache.org/jira/browse/YARN-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5430: -- Attachment: YARN-5430.2.patch > Get container's ip and host from NM > --- > > Key: YARN-5430 > URL: https://issues.apache.org/jira/browse/YARN-5430 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5430.1.patch, YARN-5430.2.patch > > > In YARN-4757, we introduced a DNS mechanism for containers. That's based on > the assumption that we can get the container's ip and host information and > store it in the registry-service. This jira aims to get the container's ip > and host from the NM, primarily docker container -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5458) Rename DockerStopCommandTest to TestDockerStopCommand
[ https://issues.apache.org/jira/browse/YARN-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403852#comment-15403852 ] Hudson commented on YARN-5458: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10192 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10192/]) YARN-5458. Rename DockerStopCommandTest to TestDockerStopCommand. (vvasudev: rev 7fc70c6422da3602ad9d4364493f25454a1de50c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerStopCommandTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerStopCommand.java > Rename DockerStopCommandTest to TestDockerStopCommand > - > > Key: YARN-5458 > URL: https://issues.apache.org/jira/browse/YARN-5458 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Trivial > Fix For: 2.9.0 > > Attachments: YARN-5458.001.patch > > > DockerStopCommandTest does not follow the naming convention for test classes, > rename it to TestDockerStopCommand -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5443) Add support for docker inspect command
[ https://issues.apache.org/jira/browse/YARN-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403853#comment-15403853 ] Hudson commented on YARN-5443: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10192 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10192/]) YARN-5443. Add support for docker inspect command. Contributed by Shane (vvasudev: rev 2e7c2a13a853b8195bc4f51f6c3c1f61656c2b33) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerInspectCommand.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerInspectCommand.java > Add support for docker inspect command > -- > > Key: YARN-5443 > URL: https://issues.apache.org/jira/browse/YARN-5443 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Fix For: 2.9.0 > > Attachments: YARN-5443.001.patch, YARN-5443.002.patch > > > Similar to the DockerStopCommand and DockerRunCommand, it would be desirable > to have a DockerInspectCommand. The initial use is for retrieving a > container's status, but many other uses are possible (IP information, volume > information, etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5458) Rename DockerStopCommandTest to TestDockerStopCommand
[ https://issues.apache.org/jira/browse/YARN-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-5458: Issue Type: Sub-task (was: Bug) Parent: YARN-3611 > Rename DockerStopCommandTest to TestDockerStopCommand > - > > Key: YARN-5458 > URL: https://issues.apache.org/jira/browse/YARN-5458 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Trivial > Fix For: 2.9.0 > > Attachments: YARN-5458.001.patch > > > DockerStopCommandTest does not follow the naming convention for test classes, > rename it to TestDockerStopCommand -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5443) Add support for docker inspect command
[ https://issues.apache.org/jira/browse/YARN-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403826#comment-15403826 ] Shane Kumpf commented on YARN-5443: --- Thanks, [~vvasudev]! > Add support for docker inspect command > -- > > Key: YARN-5443 > URL: https://issues.apache.org/jira/browse/YARN-5443 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Fix For: 2.9.0 > > Attachments: YARN-5443.001.patch, YARN-5443.002.patch > > > Similar to the DockerStopCommand and DockerRunCommand, it would be desirable > to have a DockerInspectCommand. The initial use is for retrieving a > container's status, but many other uses are possible (IP information, volume > information, etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5458) Rename DockerStopCommandTest to TestDockerStopCommand
[ https://issues.apache.org/jira/browse/YARN-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403823#comment-15403823 ] Shane Kumpf commented on YARN-5458: --- Thanks [~vvasudev]! > Rename DockerStopCommandTest to TestDockerStopCommand > - > > Key: YARN-5458 > URL: https://issues.apache.org/jira/browse/YARN-5458 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Trivial > Fix For: 2.9.0 > > Attachments: YARN-5458.001.patch > > > DockerStopCommandTest does not follow the naming convention for test classes, > rename it to TestDockerStopCommand -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5458) Rename DockerStopCommandTest to TestDockerStopCommand
[ https://issues.apache.org/jira/browse/YARN-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403793#comment-15403793 ] Varun Vasudev commented on YARN-5458: - +1, committing this. > Rename DockerStopCommandTest to TestDockerStopCommand > - > > Key: YARN-5458 > URL: https://issues.apache.org/jira/browse/YARN-5458 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Trivial > Attachments: YARN-5458.001.patch > > > DockerStopCommandTest does not follow the naming convention for test classes, > rename it to TestDockerStopCommand -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5443) Add support for docker inspect command
[ https://issues.apache.org/jira/browse/YARN-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-5443: Summary: Add support for docker inspect command (was: Add support for docker inspect) > Add support for docker inspect command > -- > > Key: YARN-5443 > URL: https://issues.apache.org/jira/browse/YARN-5443 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-5443.001.patch, YARN-5443.002.patch > > > Similar to the DockerStopCommand and DockerRunCommand, it would be desirable > to have a DockerInspectCommand. The initial use is for retrieving a > container's status, but many other uses are possible (IP information, volume > information, etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5428) Allow for specifying the docker client configuration directory
[ https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403785#comment-15403785 ] Shane Kumpf commented on YARN-5428: --- We don't pass down the $HOME environment variable or expand out ~, so the default setting of ~/.docker/config.json will not be honored when running docker containers on YARN. This patch will give you choice as to where you store the config.json file. An administrator still needs to deploy config.json to the location specified by this configuration. Deploying the config.json file pre-populated with credentials is an alternative to the interactive "docker login" command. Also note that other client configuration can be stored in config.json, such as formatting rules, http proxy settings and a few others. > Allow for specifying the docker client configuration directory > -- > > Key: YARN-5428 > URL: https://issues.apache.org/jira/browse/YARN-5428 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-5428.001.patch, YARN-5428.002.patch, > YARN-5428.003.patch, YARN-5428.004.patch > > > The docker client allows for specifying a configuration directory that > contains the docker client's configuration. It is common to store "docker > login" credentials in this config, to avoid the need to docker login on each > cluster member. > By default the docker client config is $HOME/.docker/config.json on Linux. > However, this does not work with the current container executor user > switching and it may also be desirable to centralize this configuration > beyond the single user's home directory. > Note that the command line arg is for the configuration directory NOT the > configuration file. > This change will be needed to allow YARN to automatically pull images at > localization time or within container executor. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
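As a sketch of how the configuration directory discussed above could be wired in: docker's global --config option takes the directory that holds config.json (the directory, not the file, as the issue description stresses). The builder class below is purely illustrative — it is not YARN's actual DockerCommand/DockerClient code, and the directory path is a made-up example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: prepend the client configuration *directory*
// via docker's global --config option when assembling a docker command
// line. Class and method names are hypothetical, not YARN code.
class DockerCommandBuilder {
    static List<String> build(String configDir, String... subCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("docker");
        if (configDir != null && !configDir.isEmpty()) {
            // --config expects the directory containing config.json,
            // e.g. credentials pre-populated instead of "docker login".
            cmd.add("--config");
            cmd.add(configDir);
        }
        Collections.addAll(cmd, subCommand);
        return cmd;
    }
}
```

For example, build("/etc/hadoop/docker-client", "pull", "centos:7") yields a command line equivalent to docker --config /etc/hadoop/docker-client pull centos:7, so the pull can authenticate from the centrally deployed config.json without relying on $HOME.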
[jira] [Commented] (YARN-5459) Add support for docker rm
[ https://issues.apache.org/jira/browse/YARN-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403756#comment-15403756 ] Shane Kumpf commented on YARN-5459: --- Thanks [~tangzhankun] - the intent is to move away from running the "docker rm" in container executor and allow users to control removal behavior through configuration. See YARN-5366. > Add support for docker rm > - > > Key: YARN-5459 > URL: https://issues.apache.org/jira/browse/YARN-5459 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Minor > Attachments: YARN-5459.001.patch > > > Add support for the docker rm command to be used for cleaning up exited and > failed containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5333) Some recovered apps are put into default queue when RM HA
[ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403741#comment-15403741 ] Jun Gong edited comment on YARN-5333 at 8/2/16 10:43 AM: - Thanks [~rohithsharma], [~jianhe] for the review and comments! bq. 1. Should private boolean isTransitingToActive = false; is volatile? Yes, it needs to be volatile. I'll update it. {quote} 2. Since none of the refreshXXX methods are synchronized, patch introduces a concurrency issue. If there is an explicit admin call for refreshing at the time of transitionToActive, then checkRMStatus will be executed for other admin calls. Until RM transition-to-active completely, explicit admin commands should not allowed to refresh. I think, we should incorporate similar to refreshAdminAcl method. {quote} How about adding {{synchronized}} to each refresh function? It avoids adding more logic. When an admin command comes, we could just call the corresponding refresh function. I think it does not matter to call a refresh function multiple times. bq. 3. I think flag checkRMHAState can be passed to method checkRMStatus. I was considering it. If we add checkRMHAState to checkRMStatus, we need to add this parameter (checkRMHAState) to all refresh functions too (similar to refreshAdminAcl), and there are a lot of places that call refresh functions. It might be better to just add a check before checkRMStatus? bq. I think if you can simulate test for generally instead of specific to fair scheduler, this test can be moved to class TestRMHA. There is already test TestRMHA#testTransitionedToActiveRefreshFail, probable the same test can be changed? Thanks. I'll update the test case. {quote} Instead of reusing the existing refreshAll method, I checked each refresh method, it should be cleaner to just create a new method which includes all necessary reconfig steps. This also avoids unnecessary audit logs, acl checks. {quote} Yes, it will be cleaner to add a new method that includes all the reconfig steps. 
My doubt is that there will be two places that do similar reconfig things(the one is in refresh functions, the other is in the new added method). Then we need to modify both places if there is some change for one of them. I will try to refactor those refresh functions. was (Author: hex108): Thanks [~rohithsharma], [~jianhe] for the review and comments! bq. 1. Should private boolean isTransitingToActive = false; is volatile? Yes, it needs be volatile. I'll update it. {quote} 2. Since none of the refreshXXX methods are synchronized, patch introduces a concurrency issue. If there is an explicit admin call for refreshing at the time of transitionToActive, then checkRMStatus will be executed for other admin calls. Until RM transition-to-active completely, explicit admin commands should not allowed to refresh. I think, we should incorporate similar to refreshAdminAcl method. {quote} How about adding {{synchronized}} to each refresh functions? It avoids adding more logic. When admin command comes, we could just call corresponding refresh functions. I think it does not matter to call refresh function many times. bq. 3. I think flag checkRMHAState can be passed to method checkRMStatus. I was thinking it. If adding checkRMHAState to checkRMStatus, we need add this parameter(checkRMHAState) to all refresh functions too(which is similar to refreshAdminAcl), there are a lot of places that call refresh functions. It might be better to just add a check before checkRMStatus? bq. I think if you can simulate test for generally instead of specific to fair scheduler, this test can be moved to class TestRMHA. There is already test TestRMHA#testTransitionedToActiveRefreshFail, probable the same test can be changed? Thanks. I'll update the test case. {quote} Instead of reusing the existing refreshAll method, I checked each refresh method, it should be cleaner to just create a new method which includes all necessary reconfig steps. This also avoids unnecessary audit logs, acl checks. 
{quote} Yes, it will be more clear to add a new method to include all reconfig steps. My doubt is that there will be two places that do similar reconfig things(the one is in refresh functions, the other is in the new added method). Then we need to modify both places if there is some change for one of them. > Some recovered apps are put into default queue when RM HA > - > > Key: YARN-5333 > URL: https://issues.apache.org/jira/browse/YARN-5333 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-5333.01.patch, YARN-5333.02.patch, > YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch > > > Enable RM HA and use FairScheduler, > {{yarn.scheduler.fair.allow-undeclared-pools}} is set to
[jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA
[ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403741#comment-15403741 ] Jun Gong commented on YARN-5333: Thanks [~rohithsharma], [~jianhe] for the review and comments! bq. 1. Should private boolean isTransitingToActive = false; is volatile? Yes, it needs to be volatile. I'll update it. {quote} 2. Since none of the refreshXXX methods are synchronized, patch introduces a concurrency issue. If there is an explicit admin call for refreshing at the time of transitionToActive, then checkRMStatus will be executed for other admin calls. Until RM transition-to-active completely, explicit admin commands should not allowed to refresh. I think, we should incorporate similar to refreshAdminAcl method. {quote} How about adding {{synchronized}} to each refresh function? It avoids adding more logic. When an admin command comes, we could just call the corresponding refresh function. I think it does not matter to call a refresh function multiple times. bq. 3. I think flag checkRMHAState can be passed to method checkRMStatus. I was considering it. If we add checkRMHAState to checkRMStatus, we need to add this parameter (checkRMHAState) to all refresh functions too (similar to refreshAdminAcl), and there are a lot of places that call refresh functions. It might be better to just add a check before checkRMStatus? bq. I think if you can simulate test for generally instead of specific to fair scheduler, this test can be moved to class TestRMHA. There is already test TestRMHA#testTransitionedToActiveRefreshFail, probable the same test can be changed? Thanks. I'll update the test case. {quote} Instead of reusing the existing refreshAll method, I checked each refresh method, it should be cleaner to just create a new method which includes all necessary reconfig steps. This also avoids unnecessary audit logs, acl checks. {quote} Yes, it will be cleaner to add a new method that includes all the reconfig steps. 
My concern is that there will then be two places doing similar reconfiguration (one in the refresh functions, the other in the newly added method), and we would need to modify both places whenever one of them changes. > Some recovered apps are put into default queue when RM HA > - > > Key: YARN-5333 > URL: https://issues.apache.org/jira/browse/YARN-5333 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-5333.01.patch, YARN-5333.02.patch, > YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch > > > Enable RM HA and use FairScheduler, > {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, > {{yarn.scheduler.fair.user-as-default-queue}} is set to false. > Reproduce steps: > 1. Start two RMs. > 2. After the RMs are running, change both RMs' file > {{etc/hadoop/fair-scheduler.xml}}, then add some queues. > 3. Submit some apps to the newly added queues. > 4. Stop the active RM; the standby RM will then transition to active and recover > apps. > However, the new active RM will put the recovered apps into the default queue because > it might not have loaded the new {{fair-scheduler.xml}}. We need to call > {{initScheduler}} before starting active services, or move {{refreshAll()}} in > front of {{rm.transitionToActive()}}. *It seems this is also important for > other schedulers*.
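A minimal sketch of the synchronization scheme discussed in this thread (class and method names are made up; this is not the real RM AdminService code): a volatile flag marks the transition-to-active window, each refresh method is synchronized so concurrent refreshes serialize, and explicit admin refreshes are refused while the transition is in progress.

```java
// Illustrative sketch of the volatile-flag + synchronized-refresh idea above.
public class RefreshGateSketch {
    private volatile boolean isTransitingToActive = false;
    private int refreshCount = 0;

    public void beginTransitionToActive() { isTransitingToActive = true; }
    public void finishTransitionToActive() { isTransitingToActive = false; }

    // Explicit admin refresh: rejected while RM is transitioning to active.
    public synchronized boolean refreshQueuesFromAdmin() {
        if (isTransitingToActive) {
            return false; // admin must retry once the transition completes
        }
        refreshCount++;   // stand-in for re-reading scheduler configuration
        return true;
    }

    // Refresh invoked by the transition itself; bypasses the gate.
    public synchronized void refreshQueuesInternal() { refreshCount++; }

    public int getRefreshCount() { return refreshCount; }

    public static void main(String[] args) {
        RefreshGateSketch gate = new RefreshGateSketch();
        gate.beginTransitionToActive();
        System.out.println("admin refresh during transition accepted: " + gate.refreshQueuesFromAdmin());
        gate.refreshQueuesInternal();
        gate.finishTransitionToActive();
        System.out.println("admin refresh after transition accepted: " + gate.refreshQueuesFromAdmin());
    }
}
```

The flag must be volatile so a refresh thread sees the transition thread's write; synchronizing the refresh methods makes calling the same refresh twice harmless, which is the point made above about redundant refresh calls not mattering.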
[jira] [Commented] (YARN-4091) Add REST API to retrieve scheduler activity
[ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403712#comment-15403712 ] Sunil G commented on YARN-4091: --- bq.1) Add more detailed diagnostic messages to apps/queues, bq.2) Merge pending application state into node allocation state. Yes, this makes sense. We can spin off these improvements. bq.What do you mean by target state? Could you please explain more? bq.I think the priority attribute in response could indicate "priority level 0". Do you think it is enough? So we could use "priority skipped"? Yes. I will try to explain. When an AM container is allocated, the state of the app in the REST output is shown as ACCEPTED. Since we have already allocated the AM container in this heartbeat, the state of the app will definitely become RUNNING/FAILED. So I was wondering whether it would be informative to show the target state along with the allocation/rejection, and how much it would help the user. This can be an enhancement; depending on the value of the use case, we can choose whether or not to do it. 
> Add REST API to retrieve scheduler activity > --- > > Key: YARN-4091 > URL: https://issues.apache.org/jira/browse/YARN-4091 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Chen Ge > Attachments: Improvement on debugdiagnostic information - YARN.pdf, > SchedulerActivityManager-TestReport v2.pdf, > SchedulerActivityManager-TestReport.pdf, YARN-4091-design-doc-v1.pdf, > YARN-4091.1.patch, YARN-4091.2.patch, YARN-4091.3.patch, YARN-4091.4.patch, > YARN-4091.5.patch, YARN-4091.5.patch, YARN-4091.6.patch, > YARN-4091.preliminary.1.patch, app_activities v2.json, app_activities.json, > node_activities v2.json, node_activities.json > > > As schedulers are improved with various new capabilities, more of the configurations > that tune the schedulers start to take actions such as limiting container assignment > to an application, or introducing a delay before allocating a container, etc. > No clear information is passed down from the scheduler to the outside world under > these various scenarios. This makes debugging much tougher. > This ticket is an effort to introduce more defined states at the various points in the > scheduler where it skips/rejects container assignment, activates an application, > etc. Such information will help users know what is happening in the scheduler. > Attaching a short proposal for initial discussion. We would like to improve > on this as we discuss.
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
[ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469 ] mai shurong edited comment on YARN-5449 at 8/2/16 9:40 AM: --- Sorry, I had added my description of this issue when I created it, but it was not submitted to JIRA due to some problem. I will add the description as soon as possible. was (Author: shurong.mai): Sorry, I had added my description of this issue when I created this jira, bu was not submitted to jira by some problems. I would add description as soon as possible. > nodemanager process is hung, and lost from resourcemanager > -- > > Key: YARN-5449 > URL: https://issues.apache.org/jira/browse/YARN-5449 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 > Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux > The java version is jdk1.7.0_45 > The hadoop version is hadoop-2.2.0 >Reporter: mai shurong > > The nodemanager process is hung(is not dead), and lost from resourcemanager. > The nodemanager's log is stopped from printing. > The used cpu of nodemanager process is very low(nearly 0%). > GC of nodemanager jvm process is stopped, and the result of jstat(jstat > -gccause pid 1000 100) is as follows: > S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause > The nodemanager jvm process also incurs this problem using either the CMS garbage > collector or the G1 garbage collector. 
> The parameters of the CMS garbage collector are as follows: > -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m > -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 > -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 > -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 > The parameters of the G1 garbage collector are as follows: > -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC > -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 > -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 > -XX:+PrintAdaptiveSizePolicy
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
[ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469 ] mai shurong edited comment on YARN-5449 at 8/2/16 9:38 AM: --- Sorry, I had added my description of this issue when I created this jira, bu was not submitted to jira by some problems. I would add description as soon as possible. was (Author: shurong.mai): Sorry, I had added my description of this issue when I created, bu was not submitted to jira by some problems. I would add description as soon as possible. > nodemanager process is hung, and lost from resourcemanager > -- > > Key: YARN-5449 > URL: https://issues.apache.org/jira/browse/YARN-5449 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 > Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux > The java version is jdk1.7.0_45 > The hadoop version is hadoop-2.2.0 >Reporter: mai shurong > > The nodemanager process is hung(is not dead), and lost from resourcemanager. > The nodemanager's log is stopped from printing. > The used cpu of nodemanager process is very low(nearly 0%). > GC of nodemanager jvm process is stopped, and the result of jstat(jstat > -gccause pid 1000 100) is as follows: > S0 S1 E O P YGC YGCTFGCFGCT GCT > LGCC GCC > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > The nodemanager jvm process is also accur this problem using CMS garbage > collector or g1 garbage collector. 
> The parameters of the CMS garbage collector are as follows: > -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m > -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 > -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 > -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 > The parameters of the G1 garbage collector are as follows: > -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC > -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 > -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 > -XX:+PrintAdaptiveSizePolicy
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
[ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469 ] mai shurong edited comment on YARN-5449 at 8/2/16 9:38 AM: --- Sorry, I had added my description of this issue when I created, bu was not submitted to jira by some problems. I would add description as soon as possible. was (Author: shurong.mai): Sorry, I had added my description, bu was not submitted to jira by some problems. I would add description as soon as possible. > nodemanager process is hung, and lost from resourcemanager > -- > > Key: YARN-5449 > URL: https://issues.apache.org/jira/browse/YARN-5449 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 > Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux > The java version is jdk1.7.0_45 > The hadoop version is hadoop-2.2.0 >Reporter: mai shurong > > The nodemanager process is hung(is not dead), and lost from resourcemanager. > The nodemanager's log is stopped from printing. > The used cpu of nodemanager process is very low(nearly 0%). > GC of nodemanager jvm process is stopped, and the result of jstat(jstat > -gccause pid 1000 100) is as follows: > S0 S1 E O P YGC YGCTFGCFGCT GCT > LGCC GCC > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > The nodemanager jvm process is also accur this problem using CMS garbage > collector or g1 garbage collector. 
> The parameters of the CMS garbage collector are as follows: > -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m > -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 > -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 > -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 > The parameters of the G1 garbage collector are as follows: > -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC > -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 > -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 > -XX:+PrintAdaptiveSizePolicy
[jira] [Updated] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
[ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mai shurong updated YARN-5449: -- Description: The nodemanager process is hung(is not dead), and lost from resourcemanager. The nodemanager's log is stopped from printing. The used cpu of nodemanager process is very low(nearly 0%). GC of nodemanager jvm process is stopped, and the result of jstat(jstat -gccause pid 1000 100) is as follows: S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GC G1 Evacuation Pause The nodemanager jvm process also incurs this problem using either the CMS garbage collector or the G1 garbage collector. The parameters of the CMS garbage collector are as follows: -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 The parameters of the G1 garbage collector are as follows: -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 -XX:+PrintAdaptiveSizePolicy was: The nodemanager process is hung, and lost from resourcemanager. The nodemanager's log is stopped from printing. The used cpu of nodemanager process is very low(nearly 0%). 
GC of nodemanager jvm process is stopped, and the result of jstat(jstat -gccause pid 1000 100) is as follows: S0 S1 E O P YGC YGCTFGCFGCT GCTLGCC GCC 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GCG1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GCG1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GCG1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GCG1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GCG1 Evacuation Pause 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No GCG1 Evacuation Pause The nodemanager jvm process is also accur this problem using CMS garbage collector or g1 garbage collector. The parameters of CMS garbage collector are as following: -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 -XX:+UseCMSCom pactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 The parameters of g1 garbage collector are as following: -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 -XX:+PrintAdaptiveSizePolicy > nodemanager process is hung, and lost from resourcemanager > -- > > Key: YARN-5449 > URL: https://issues.apache.org/jira/browse/YARN-5449 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 > Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux > The java version is jdk1.7.0_45 > The hadoop version is hadoop-2.2.0 >Reporter: mai shurong > > The nodemanager process is hung(is not dead), and lost from resourcemanager. > The nodemanager's log is stopped from printing. > The used cpu of nodemanager process is very low(nearly 0%). 
> GC of nodemanager jvm process is stopped, and the result of jstat(jstat > -gccause pid 1000 100) is as follows: > S0 S1 E O P YGC YGCTFGCFGCT GCT > LGCC GCC > 0.00 100.00 95.06 24.08 30.46 3274 623.437 75.899 629.335 No > GCG1 Evacuation Pause > 0.00 100.00 95.06 24.08 30.46 3274 623.437 7
[jira] [Updated] (YARN-5160) Add timeout when starting JobHistoryServer in MiniMRYarnCluster
[ https://issues.apache.org/jira/browse/YARN-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor updated YARN-5160: --- Attachment: YARN-5160.01.patch Uploading the first patch. I could not write a JUnit test: I would have needed to mock the {{JobHistoryServer}} object, but it is created inside {{serviceStart}}, so I could not mock it. I did some manual testing and it worked. > Add timeout when starting JobHistoryServer in MiniMRYarnCluster > --- > > Key: YARN-5160 > URL: https://issues.apache.org/jira/browse/YARN-5160 > Project: Hadoop YARN > Issue Type: Test > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Andras Bokor >Assignee: Andras Bokor >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-5160.01.patch > > > This JIRA is to follow up on a TODO in MiniMRYarnCluster: > {{//TODO Add a timeout. State.STOPPED check ?}} > I think the State.STOPPED check is not needed; I do not see the value of checking > the STOPPED state here.
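The timeout the patch adds could look roughly like the following (a generic sketch under assumed names, not the actual MiniMRYarnCluster code): poll a started-condition until it holds or a deadline passes, instead of waiting on the JobHistoryServer forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Generic sketch of the YARN-5160 idea: bounded wait for a service to start.
public class StartTimeoutSketch {
    public static boolean waitForStart(BooleanSupplier isStarted, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!isStarted.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false; // caller can then fail cluster startup cleanly
            }
            Thread.sleep(10); // poll interval
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // A service that never reports started must time out rather than hang.
        System.out.println("started=" + waitForStart(() -> false, 100));
        System.out.println("started=" + waitForStart(() -> true, 100));
    }
}
```

Returning a boolean (or throwing on timeout) lets the test harness surface a clear failure, which is exactly what an indefinite wait in a test cluster cannot do.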
[jira] [Commented] (YARN-3854) Add localization support for docker images
[ https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403564#comment-15403564 ] Zhankun Tang commented on YARN-3854: Yes. It seems that our direction is towards "docker pull" during localization. Will we just discard the "HDFS + docker load" approach and design it based on "docker pull"? > Add localization support for docker images > -- > > Key: YARN-3854 > URL: https://issues.apache.org/jira/browse/YARN-3854 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Zhankun Tang > Attachments: YARN-3854-branch-2.8.001.patch, > YARN-3854_Localization_support_for_Docker_image_v1.pdf, > YARN-3854_Localization_support_for_Docker_image_v2.pdf > > > We need the ability to localize docker images when those images aren't > already available locally. There are various approaches that could be used > here with different trade-offs/issues: image archives on HDFS + docker load, > docker pull during the localization phase, or (automatic) docker pull > during the run/launch phase. > We also need the ability to clean up old/stale, unused images.
[jira] [Comment Edited] (YARN-5428) Allow for specifying the docker client configuration directory
[ https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403503#comment-15403503 ] Zhankun Tang edited comment on YARN-5428 at 8/2/16 7:29 AM: Thanks for the patch, [~shaneku...@gmail.com]. One question: I remember "docker login" will store the credentials in ~/.docker/config.json by default. Will this patch eliminate the need for "docker login"? Or should the administrator store credentials in the config.json file manually? was (Author: tangzhankun): thanks for the patch, [~shaneku...@gmail.com]. Looks good to me. > Allow for specifying the docker client configuration directory > -- > > Key: YARN-5428 > URL: https://issues.apache.org/jira/browse/YARN-5428 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-5428.001.patch, YARN-5428.002.patch, > YARN-5428.003.patch, YARN-5428.004.patch > > > The docker client allows for specifying a configuration directory that > contains the docker client's configuration. It is common to store "docker > login" credentials in this config, to avoid the need to run docker login on each > cluster member. > By default the docker client config is $HOME/.docker/config.json on Linux. > However, this does not work with the current container executor user > switching, and it may also be desirable to centralize this configuration > beyond the single user's home directory. > Note that the command line arg is for the configuration directory, NOT the > configuration file. > This change will be needed to allow YARN to automatically pull images at > localization time or within the container executor.
[jira] [Commented] (YARN-5428) Allow for specifying the docker client configuration directory
[ https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403503#comment-15403503 ] Zhankun Tang commented on YARN-5428: Thanks for the patch, [~shaneku...@gmail.com]. Looks good to me. > Allow for specifying the docker client configuration directory > -- > > Key: YARN-5428 > URL: https://issues.apache.org/jira/browse/YARN-5428 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-5428.001.patch, YARN-5428.002.patch, > YARN-5428.003.patch, YARN-5428.004.patch > > > The docker client allows for specifying a configuration directory that > contains the docker client's configuration. It is common to store "docker > login" credentials in this config, to avoid the need to run docker login on each > cluster member. > By default the docker client config is $HOME/.docker/config.json on Linux. > However, this does not work with the current container executor user > switching, and it may also be desirable to centralize this configuration > beyond the single user's home directory. > Note that the command line arg is for the configuration directory, NOT the > configuration file. > This change will be needed to allow YARN to automatically pull images at > localization time or within the container executor.
[jira] [Commented] (YARN-5310) AM restart failed because of the expired HDFS delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403481#comment-15403481 ] Xianyin Xin commented on YARN-5310: --- Thanks [~aw]. Then do we have any good ideas for this problem? > AM restart failed because of the expired HDFS delegation tokens > --- > > Key: YARN-5310 > URL: https://issues.apache.org/jira/browse/YARN-5310 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Xianyin Xin >Assignee: Xianyin Xin > > For a long-running AM, restart can fail because the token in the > ApplicationSubmissionContext has expired. We should update it when we get a new > delegation token on behalf of the user.