[jira] [Updated] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3998: --- Attachment: YARN-3998.04.patch Sorry, there is something wrong with YARN-3998.03.patch. Attached a new patch YARN-3998.04.patch. > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3998.01.patch, YARN-3998.02.patch, > YARN-3998.03.patch, YARN-3998.04.patch > > > I'd like to add a field(retry-times) in ContainerLaunchContext. When AM > launches containers, it could specify the value. Then NM will re-launch the > container 'retry-times' times when it fails to run(e.g.exit code is not 0). > It will save a lot of time. It avoids container localization. RM does not > need to re-schedule the container. And local files in container's working > directory will be left for re-use.(If container have downloaded some big > files, it does not need to re-download them when running again.) > We find it is useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
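The retry-times semantics described above can be sketched as a small relaunch decision; this is an illustrative snippet with hypothetical names, not the attached patch:

```java
// Hypothetical sketch of the proposed retry-times behavior for YARN-3998.
// Class and method names are illustrative, not the actual NM code.
public class RelaunchPolicy {
    // Relaunch only when the previous run failed (non-zero exit) and the
    // number of relaunches so far is below the AM-specified retry-times.
    public static boolean shouldRelaunch(int exitCode, int relaunchesSoFar, int retryTimes) {
        return exitCode != 0 && relaunchesSoFar < retryTimes;
    }
}
```

Under this sketch, a clean exit is never relaunched, and a failing container is relaunched at most retry-times times before the failure is reported to the RM.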
[jira] [Updated] (YARN-4559) Make leader elector and zk store share the same curator client
[ https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4559: -- Attachment: YARN-4559.3.patch > Make leader elector and zk store share the same curator client > -- > > Key: YARN-4559 > URL: https://issues.apache.org/jira/browse/YARN-4559 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4559.1.patch, YARN-4559.2.patch, YARN-4559.3.patch > > > After YARN-4438, we can reuse the same curator client for leader elector and > zk store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
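The idea of sharing one ZooKeeper client between the leader elector and the ZK state store can be illustrated with a minimal provider; {{ZkClient}} below is a hypothetical stand-in for Curator's client type, not the actual YARN-4559 code:

```java
import java.util.function.Supplier;

// Hypothetical illustration of reusing a single ZK client: both the leader
// elector and the state store ask the same provider, which creates the
// client exactly once. ZkClient stands in for a real Curator client.
public class SharedZkClientProvider {
    static class ZkClient { }  // placeholder type for illustration only

    private ZkClient client;
    private final Supplier<ZkClient> factory;

    public SharedZkClientProvider(Supplier<ZkClient> factory) {
        this.factory = factory;
    }

    // Lazily create the client on first use; every caller gets the same one.
    public synchronized ZkClient get() {
        if (client == null) {
            client = factory.get();
        }
        return client;
    }
}
```

Both consumers calling get() on the same provider observe the identical client instance, which is the property YARN-4438 made possible and this issue exploits.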
[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101383#comment-15101383 ] Jun Gong commented on YARN-3998: [~vvasudev] Thanks for the detailed review and suggestions! I just attached a new patch to address the problems above. {quote} In your implementation, the relaunched container will go through the launchContainer call which will try to setup the container launch environment again(like creating the local and log dirs, creating tokens, etc). Won't this lead to FileAlreadyExistsException being thrown as part of the launchContainer call? In addition, this also means that on a node with more than one local dir, different attempts could get allocated to different local dirs. I wonder if it's better to move the retry logic into the launchContainer function instead of adding a new state transition? {quote} The reasons for adding a new state transition are as follows: 1. During the retry interval the container is not actually running, so it seems more reasonable to keep it in the LOCALIZED state. 2. For NM restart, it is not enough to just add retry logic to *ContainerLaunch#call()*. When the NM restarts, it calls *RecoveredContainerLaunch#call*, so we would also need retry logic there; otherwise the container might exit with a failure and never be retried. Adding a state transition makes the logic clearer and avoids duplicated code. To avoid FileAlreadyExistsException, I added some code (*cleanupContainerFilesForRelaunch*) to clean up files (the token file and launch script). We also need to clean up the previous PID file, since the NM tries to read the PID from it on restart. To reuse the same container working directory and log directory, we need to record these paths and store them in the NMStateStore for the NM-restart case. 
According to [~vvasudev]'s suggestion, we use a simple heuristic: if a good work directory with the container tokens file already exists, use that directory; otherwise use a new one. That way we don't need to worry about storing the directories in the state store. However, there is no file like the tokens file for the log directory, so we use the file 'stdout' for that purpose, assuming 'stdout' exists in most containers' log directories. > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3998.01.patch, YARN-3998.02.patch, > YARN-3998.03.patch > > > I'd like to add a field(retry-times) in ContainerLaunchContext. When AM > launches containers, it could specify the value. Then NM will re-launch the > container 'retry-times' times when it fails to run(e.g.exit code is not 0). > It will save a lot of time. It avoids container localization. RM does not > need to re-schedule the container. And local files in container's working > directory will be left for re-use.(If container have downloaded some big > files, it does not need to re-download them when running again.) > We find it is useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
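The directory-reuse heuristic described above can be sketched as follows (hypothetical names; the marker is the container tokens file for work directories and 'stdout' for log directories):

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch of the reuse heuristic, not the actual NM patch:
// reuse the first candidate directory that still contains the marker file
// left by a previous attempt; otherwise fall back to a fresh directory.
public class DirReuseHeuristic {
    public static String pick(List<String> candidates, Predicate<String> hasMarker, String freshDir) {
        for (String dir : candidates) {
            if (hasMarker.test(dir)) {
                return dir;  // a previous attempt ran here; keep its files
            }
        }
        return freshDir;  // no reusable directory found
    }
}
```

The marker check is what removes the need to persist the chosen directories in the NM state store: the filesystem itself records which directory the previous attempt used.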
[jira] [Updated] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3998: --- Attachment: YARN-3998.03.patch > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3998.01.patch, YARN-3998.02.patch, > YARN-3998.03.patch > > > I'd like to add a field(retry-times) in ContainerLaunchContext. When AM > launches containers, it could specify the value. Then NM will re-launch the > container 'retry-times' times when it fails to run(e.g.exit code is not 0). > It will save a lot of time. It avoids container localization. RM does not > need to re-schedule the container. And local files in container's working > directory will be left for re-use.(If container have downloaded some big > files, it does not need to re-download them when running again.) > We find it is useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4598) Invalid event: RESOURCE_FAILED at CONTAINER_CLEANEDUP_AFTER_KILL
tangshangwen created YARN-4598: -- Summary: Invalid event: RESOURCE_FAILED at CONTAINER_CLEANEDUP_AFTER_KILL Key: YARN-4598 URL: https://issues.apache.org/jira/browse/YARN-4598 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: tangshangwen Assignee: tangshangwen In our cluster, I found that the container has some problems in state transition,this is my log {noformat} 2016-01-12 17:42:50,088 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1452588902899_0001_01_87 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE 2016-01-12 17:42:50,088 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [RESOURCE_FAILED] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at CONTAINER_CLEANEDUP_AFTER_KILL at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1127) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1078) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1071) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:744) 
2016-01-12 17:42:50,089 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1452588902899_0001_01_94 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null 2016-01-12 17:42:50,089 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1452588902899_0001 CONTAINERID=container_1452588902899_0001_01_94 2016-01-12 17:42:50,089 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1452588902899_0001_01_94 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
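For context, the exception above comes from a table-driven state machine that has no transition registered for the pair (CONTAINER_CLEANEDUP_AFTER_KILL, RESOURCE_FAILED). A minimal sketch of that failure mode, hypothetical and much simpler than the real StateMachineFactory:

```java
import java.util.HashMap;
import java.util.Map;

// Toy table-driven state machine illustrating the bug report above: an
// event with no registered transition throws, mirroring
// InvalidStateTransitonException. Not the actual ContainerImpl code.
public class TinyStateMachine {
    private final Map<String, String> transitions = new HashMap<>();

    public void addTransition(String state, String event, String next) {
        transitions.put(state + "/" + event, next);
    }

    // Returns the next state, or throws when no transition is registered.
    public String doTransition(String state, String event) {
        String next = transitions.get(state + "/" + event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
        return next;
    }
}
```

A common fix for reports like this is to register the late-arriving event as a no-op (self-loop) in the terminal cleanup states, though which fix YARN-4598 adopts is not stated here.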
[jira] [Commented] (YARN-4502) Sometimes Two AM containers get launched
[ https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101256#comment-15101256 ] Wangda Tan commented on YARN-4502: -- Looks good to me, +1. Pending Jenkins. > Sometimes Two AM containers get launched > > > Key: YARN-4502 > URL: https://issues.apache.org/jira/browse/YARN-4502 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: YARN-4502-20160114.txt, YARN-4502-20160212.txt > > > Scenario : > * set yarn.resourcemanager.am.max-attempts = 2 > * start dshell application > {code} > yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar > hadoop-yarn-applications-distributedshell-*.jar > -attempt_failures_validity_interval 6 -shell_command "sleep 150" > -num_containers 16 > {code} > * Kill AM pid > * Print container list for 2nd attempt > {code} > yarn container -list appattempt_1450825622869_0001_02 > INFO impl.TimelineClientImpl: Timeline service address: > http://xxx:port/ws/v1/timeline/ > INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10: > Total number of containers :2 > Container-Id Start Time Finish Time > StateHost Node Http Address >LOG-URL > container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa > container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa > {code} > * look for new AM pid > Here, 2nd AM container was suppose to be started on > container_e12_1450825622869_0001_02_01. But AM was not launched on > container_e12_1450825622869_0001_02_01. It was in AQUIRED state. > On other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: RM should not start 2 containers for starting AM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4589) Diagnostics for localization timeouts is lacking
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101240#comment-15101240 ] Hadoop QA commented on YARN-4589: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 1s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 12 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 19s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 42s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 6m 42s {color} | {color:red} root in trunk failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 2m 23s {color} | {color:red} root in trunk failed with JDK v1.7.0_91. 
{color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 13s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 44s {color} | {color:red} root in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 0m 44s {color} | {color:red} root in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 44s {color} | {color:red} root in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 50s {color} | {color:red} root in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 0m 50s {color} | {color:red} root in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 50s {color} | {color:red} root in the patch failed with JDK v1.7.0_91. 
{color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 20s {color} | {color:red} Patch generated 4 new checkstyle issues in root (total was 728, now 730). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 4s {color} | {color:red} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app introduced 1 new FindBugs issues. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 44s {color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 17s {color} | {color:red} hadoop-mapreduce-client-app in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 9m 45s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 0, now 1). {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 9m 45s {color} | {color:red} hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 0, now 1).
[jira] [Updated] (YARN-4538) QueueMetrics pending cores and memory metrics wrong
[ https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4538: --- Attachment: 0005-YARN-4538.patch [~leftnoteasy] Thanks for the review. Sorry for missing the javadoc; uploaded a patch correcting it. Please review. > QueueMetrics pending cores and memory metrics wrong > > > Key: YARN-4538 > URL: https://issues.apache.org/jira/browse/YARN-4538 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4538.patch, 0002-YARN-4538.patch, > 0003-YARN-4538.patch, 0004-YARN-4538.patch, 0005-YARN-4538.patch > > > Submit 2 applications to the default queue > Check queue metrics for pending cores and memory > {noformat} > List allQueues = client.getChildQueueInfos("root"); > for (QueueInfo queueInfo : allQueues) { > QueueStatistics quastats = queueInfo.getQueueStatistics(); > System.out.println(quastats.getPendingVCores()); > System.out.println(quastats.getPendingMemoryMB()); > } > {noformat} > *Output :* > -20 > -20480 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts
[ https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101222#comment-15101222 ] Rohith Sharma K S commented on YARN-4584: - I had offline discussion with [~jianhe] regarding this issue. And fix summary is as follows. # Do not remove the attempts if attemptFailuresValidityInterval <=0. # Remove the excess attempts which are only beyond attemptFailuresValidityInterval. > RM startup failure when AM attempts greater than max-attempts > - > > Key: YARN-4584 > URL: https://issues.apache.org/jira/browse/YARN-4584 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4584.patch > > > Configure 3 queue in cluster with 8 GB > # queue 40% > # queue 50% > # default 10% > * Submit applications to all 3 queue with container size as 1024MB (sleep job > with 50 containers on all queues) > * AM that gets assigned to default queue and gets preempted immediately after > 20 preemption kill all application > Due resource limit in default queue AM got prempted about 20 times > On RM restart RM fails to restart > {noformat} > 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: > noteFailure java.lang.NullPointerException > 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED; cause: > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946) 
> at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: > Service: RMActiveServices entered state STOPPED > 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.CompositeService: > RMActiveServices: stopping services, size=16 > {noformat} -
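The two-point fix summary above can be sketched as a pure filtering step over recovered attempts (names hypothetical, not the actual RM recovery code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the YARN-4584 fix summary: when
// attemptFailuresValidityInterval <= 0, recover all attempts; otherwise
// keep only attempts whose finish time falls inside the interval.
public class AttemptPruner {
    public static List<Long> attemptsToRecover(List<Long> finishTimes,
            long validityIntervalMs, long nowMs) {
        if (validityIntervalMs <= 0) {
            return new ArrayList<>(finishTimes);  // point 1: remove nothing
        }
        List<Long> kept = new ArrayList<>();
        for (long t : finishTimes) {
            if (nowMs - t <= validityIntervalMs) {
                kept.add(t);  // point 2: still inside the validity window
            }
        }
        return kept;
    }
}
```

Pruning only attempts that are beyond the validity window keeps the recovered attempt count bounded without discarding attempts that still count against max-attempts.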
[jira] [Updated] (YARN-4502) Sometimes Two AM containers get launched
[ https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4502: -- Attachment: YARN-4502-20160114.txt Tx for the review, [~leftnoteasy] bq. Do you think is it better to rename PREEMPT_CONTAINER/preemptContainer in CapacityScheduler to something like add_preemption_candidate or mark-container-for-preemption? Preempt-container looks very similar to kill-preempted-container. Similarily, FiCaSchedulerApp#preemptContainer. Makes sense, that is what I was thinking before too but didn't make the change. Uploading a new patch with that comment addressed. Also fixed whitespace and checkstyle issue. Cannot address the too-long-file warning. > Sometimes Two AM containers get launched > > > Key: YARN-4502 > URL: https://issues.apache.org/jira/browse/YARN-4502 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: YARN-4502-20160114.txt, YARN-4502-20160212.txt > > > Scenario : > * set yarn.resourcemanager.am.max-attempts = 2 > * start dshell application > {code} > yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar > hadoop-yarn-applications-distributedshell-*.jar > -attempt_failures_validity_interval 6 -shell_command "sleep 150" > -num_containers 16 > {code} > * Kill AM pid > * Print container list for 2nd attempt > {code} > yarn container -list appattempt_1450825622869_0001_02 > INFO impl.TimelineClientImpl: Timeline service address: > http://xxx:port/ws/v1/timeline/ > INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10: > Total number of containers :2 > Container-Id Start Time Finish Time > StateHost Node Http Address >LOG-URL > container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa > container_e12_1450825622869_0001_02_01 Tue Dec 22 
23:07:34 + 2015 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa > {code} > * look for new AM pid > Here, the 2nd AM container was supposed to be started on > container_e12_1450825622869_0001_02_01. But the AM was not launched on > container_e12_1450825622869_0001_02_01; it was in the ACQUIRED state. > On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. > Expected behavior: RM should not start 2 containers for starting AM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101166#comment-15101166 ] Wangda Tan commented on YARN-4108: -- Thanks for looking at this, [~eepayne]. bq. In the lazy preemption case, PCPP will send an event to the scheduler to mark a container killable. Can PCPP check if it's already been marked before sending, so that maybe event traffic will be less in the RM? Agree, we can create a killable map, similar to the preempted map in PCPP. bq. Currently, if both queueA and queueB are over their guaranteed capacity, preemption will still occur if queueA is more over capacity than queueB. I think it is probably important to preserve this behavior (YARN-2592). Thanks for pointing me to this patch; quick comments after reading YARN-2592. I think we can still keep the same behavior in the new proposal: currently I assume only a queue with usage less than its guaranteed capacity can preempt containers from others, but we can relax this limit so that any queue without to-be-preempted containers can preempt from others. However, I think allowing two over-satisfied queues to shoot at each other may not be reasonable: if we have 3 queues configured as a=10, b=20, c=70, then when c uses nothing, we cannot simply interpret a's new capacity as 33 and b's as 66 (a:b = 10:20). Since the admin only configured the capacities of a/b as 10/20, we should strictly follow what the admin configured. bq. don't see anyplace where ResourceLimits#isAllowPreemption is called. But, if it is, Will the following code in LeafQueue change preemption behavior?... Yes, LeafQueue decides whether an app can kill containers, and the app uses it in {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainer}} to decide {{toKillContainers}}. bq. I'm just trying to understand how things will be affected when headroom for a parent queue is (limit - used) + killable. 
Doesn't that say that a parent queue has more headroom than it's actually using? Is it relying on this behavior so that the assignment code will determine that it has more headroom when there are killable containers, and then rely on the leafqueue to kill those containers? I'm not sure if I understand your question properly; let me try to explain this behavior: ParentQueue will add its own killable containers to headroom (getTotalKillableResource is a bad name; it should be {{getTotalKillableResourceForThisQueue}}). Since these containers all belong to the parent queue, it has the right to kill all of them to satisfy max-queue-capacity. A killable container will actually be killed in two cases: - An under-satisfied leaf queue tries to allocate on a node, but the node doesn't have enough resources, so it kills containers *on the node* to allocate the new container. - A queue is using more than its max-capacity and has killable containers; we try to kill containers for such queues to make sure max-capacity is not violated. You can check the following code in ParentQueue#allocateResource: {code} // check if we need to kill (killable) containers if maximum resource violated. if (getQueueCapacities().getAbsoluteMaximumCapacity(nodePartition) < getQueueCapacities().getAbsoluteUsedCapacity(nodePartition)) { killContainersToEnforceMaxQueueCapacity(nodePartition, clusterResource); } {code} bq. NPE if getChildQueues() returns null Nice catch, updated locally bq. CSAssignment#toKillContainers: I would call them containersToKill Agree, updated locally bq. It would be interesting to know what your thoughts are on making further modifications to PCPP to make more informed choices about which containers to kill. I don't have clear ideas for this; a rough idea in my mind is that we could add some field to the scheduler to indicate that some special request (e.g. large/hard-locality, etc.) is starving and head-of-line (HOL). 
By doing a background scan in PCPP, after PCPP marks containers to be preempted, we can leverage the marked starving-and-HOL requests to adjust the existing set of to-be-preempted containers. Again, this is rough thinking; I'm not sure if it is doable. > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-V3.pdf, > YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, > YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch > > > This is sibling JIRA for YARN-
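The killable-map dedup agreed in the first exchange above can be sketched as a simple set-based guard (illustrative names, not the CapacityScheduler code):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the dedup idea: PCPP keeps a set of containers already marked
// killable and only emits a mark-killable event the first time, reducing
// event traffic to the scheduler. Hypothetical, not the actual PCPP code.
public class KillableTracker {
    private final Set<String> markedKillable = new HashSet<>();

    // Returns true when the caller should send the mark-killable event
    // (i.e. this container was not already marked).
    public boolean markKillable(String containerId) {
        return markedKillable.add(containerId);
    }
}
```

Set.add returning false on a repeat is the whole mechanism: repeated PCPP rounds over the same container produce no additional scheduler events.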
[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client
[ https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101162#comment-15101162 ] Hadoop QA commented on YARN-4559: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} Patch generated 5 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 108, now 112). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 19s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 1 new FindBugs issues. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 45s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 40s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 140m 29s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.resourceManager; locked 60% of time Unsynchronized access at RMStateStore.java:60% of time Unsynchronized access at RMStateStore.java:[line 1183] | | JDK v1.8.0_66 F
[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not available
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101151#comment-15101151 ] Hadoop QA commented on YARN-4428: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 36s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 2, now 2). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} Patch generated 7 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 132, now 138). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 8s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 10s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 138m 39s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed
[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101149#comment-15101149 ] Chris Douglas commented on YARN-4597: - The {{ContainerLaunchContext}} (CLC) specifies the prerequisites for starting a container on a node. These include setting up user/application directories and downloading dependencies to the NM cache (localization). The NM assumes that an authenticated {{startContainer}} request has not overbooked resources on the node, so resources are only reserved/enforced during the container launch and execution. This JIRA proposes to add a phase between localization and container launch to manage a collection of runnable containers. Similar to the localizer stage, a container will launch only after all the resources from its CLC are assigned by a _local scheduler_. The local scheduler will select containers to run based on priority, declared requirements, and by monitoring utilization on the node (YARN-1011). A few future and in-progress features motivate this change. *Preemption* Instead of sending a kill when the RM selects a victim container, it could convert it from a {{GUARANTEED}} to an {{OPTIMISTIC}} container (YARN-4335). This has two benefits. First, the downgraded container can continue to run until a guaranteed container arrives _and_ finishes localizing its dependencies, so the downgraded container has an opportunity to complete or checkpoint. When the guaranteed container moves from {{LOCALIZED}} to {{SCHEDULING}}, the local scheduler may select the victim (formerly guaranteed) container to be killed. \[1\] Second, the NM may elect to kill the victim container to run _different_ optimistic containers, particularly short-running tasks. *Optimistic scheduling and overprovisioning* To support distributed scheduling (YARN-2877) and resource-aware scheduling (YARN-1011), the NM needs a component to select containers that are ready to run. 
The local scheduler can not only select tasks to run based on monitoring, it can also make offers to running containers using durations attached to leases \[2\]. Based on recent observations, it may start containers that oversubscribe the node, or delay starting containers if a lease is close to expiring (i.e., the container is likely to complete). *Long-running services*. Note that by separating the local scheduler, both that module _and_ the localizer could be opened up as services provided by the NM. The localizer could also be extended to prioritize downloads among {{OPTIMISTIC}} containers (possibly preemptable by {{GUARANTEED}}), and to group containers based on their dependencies (e.g., avoid downloading a large dep for fewer than N optimistic containers). By exposing these services, the NM can assist with the following: # Resource spikes. If a service container needs to spike temporarily, it may not need guaranteed resources (YARN-1197). Containers requiring low-latency elasticity could request optimistic resources instead of peak provisioning, resizing, or using workarounds like [Llama|http://cloudera.github.io/llama/]. If the local scheduler is addressable by local containers, then the lease could be logical (i.e., not start a process). Resources assigned to a {{RUNNING}} container could be published rather than triggering a launch. One could also imagine service workers marking some resources as unused, while retaining the authority to spike into them ("subleasing" them to opportunistic containers) by reclaiming them through the local scheduler. # Upgrades. If the container needs to pull new dependencies, it could use the NM Localizer rather than coordinating the download itself. # Maintenance tasks. Services often need to clean up, compact, scrub, and checkpoint local data. Right now, each service needs to independently monitor resource utilization to back off saturated resources (particularly disks). Coordination between services is difficult. 
In contrast, one could schedule tasks like block scrubbing as optimistic tasks in the NM to avoid interrupting services that are spiking. This is similar in spirit to distributed scheduling insofar as it does not involve the RM and targets a single host (i.e., the host the container is running on). \[1\] Though it was selected as a victim by the RM, the local scheduler may decide to kill a different {{OPTIMISTIC}} container when the guaranteed container requests resources. For example, if a container completes on the node after the RM selected the victim, then the NM may elect to kill a smaller optimistic process if it is sufficient to satisfy the guarantee. \[2\] Discussion on duration in YARN-1039 was part of a broader conversation on support for long-running services (YARN-896). > Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 >
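The staged lifecycle proposed in this comment can be pictured as an extra gate between localization and launch. The sketch below is illustrative only: apart from the LOCALIZED/SCHEDULING/RUNNING names quoted above, the states and transitions are simplified assumptions, not the actual NM state machine.

```java
import java.util.EnumMap;
import java.util.EnumSet;

// Illustrative sketch of an NM container lifecycle with a SCHEDULING
// stage gating launch. Only LOCALIZED/SCHEDULING/RUNNING come from the
// discussion above; the remaining states and transitions are simplified
// assumptions, not the real NM state machine.
public class ContainerLifecycleSketch {
    enum State { NEW, LOCALIZING, LOCALIZED, SCHEDULING, RUNNING, DONE }

    static final EnumMap<State, EnumSet<State>> TRANSITIONS =
        new EnumMap<>(State.class);
    static {
        TRANSITIONS.put(State.NEW, EnumSet.of(State.LOCALIZING));
        TRANSITIONS.put(State.LOCALIZING, EnumSet.of(State.LOCALIZED));
        // Launch is no longer immediate: the local scheduler must first
        // assign the resources declared in the CLC.
        TRANSITIONS.put(State.LOCALIZED, EnumSet.of(State.SCHEDULING));
        // A queued container may be killed (e.g., selected as a victim)
        // before it ever runs.
        TRANSITIONS.put(State.SCHEDULING, EnumSet.of(State.RUNNING, State.DONE));
        TRANSITIONS.put(State.RUNNING, EnumSet.of(State.DONE));
        TRANSITIONS.put(State.DONE, EnumSet.noneOf(State.class));
    }

    static boolean canTransition(State from, State to) {
        return TRANSITIONS.get(from).contains(to);
    }

    public static void main(String[] args) {
        // A localized container can only move to SCHEDULING, not straight
        // to RUNNING: the launch decision belongs to the local scheduler.
        System.out.println(canTransition(State.LOCALIZED, State.SCHEDULING)); // true
        System.out.println(canTransition(State.LOCALIZED, State.RUNNING));    // false
    }
}
```

The point of the extra state is exactly the second println: once SCHEDULING exists, "localized" no longer implies "about to run", which is what makes downgrade-instead-of-kill preemption expressible.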
[jira] [Created] (YARN-4597) Add SCHEDULE to NM container lifecycle
Chris Douglas created YARN-4597: --- Summary: Add SCHEDULE to NM container lifecycle Key: YARN-4597 URL: https://issues.apache.org/jira/browse/YARN-4597 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Chris Douglas Currently, the NM immediately launches containers after resource localization. Several features could be more cleanly implemented if the NM included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101143#comment-15101143 ] Wangda Tan commented on YARN-4108: -- Hi [~sunilg] bq. 1. In current PCPP, we first preempt all reserved containers from all applications in a queue if it's overallocated. I would prefer following the existing PCPP logic: drop the container reservation first and let the scheduler continue to decide who can allocate/reserve a container on it. bq. 2. If I understood correctly, "killable containers" will be triggered with a preempt event only if a proper allocation can happen for the target application (from an underserved queue). For the preempt event sent to the AM, my current thinking is that we will send it when we add the container to the preempt list (not the killable list). The AM could save state or return resources. I understand this could lead to unnecessary add-container-to-preempt-list events sent to the AM, but I think it's better than excessively killing containers. bq. 3. To cancel a "killable container", I think PCPP will take the call by waiting for some interval. So is some new configuration needed for this? Maybe not; we don't need an extra config to cancel a killable container. bq. 4. I would like to have some freedom in selecting containers (marking) for preemption. A simple sort based on submission time or priority seems a limited approach. Could we have some interface here so that we can plug in user-specific comparison cases? Agree, we can discuss this in a separate JIRA if we agree on the overall approach (separating container selection and preemption). Will file a new JIRA when we reach a consensus about the roadmap. 
[~mding] > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-V3.pdf, > YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, > YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch > > > This is a sibling JIRA of YARN-2154. We should make sure container preemption > is more effective. > *Requirements:* > 1) Can handle case of user-limit preemption > 2) Can handle case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross user preemption (YARN-2113), > cross application preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
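Point 4 in the comment above (pluggable container selection for preemption) amounts to injecting the ordering as a policy object rather than hard-coding a submission-time/priority sort. A minimal sketch of that idea follows; every class and field name here is hypothetical and does not exist in the CapacityScheduler code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: preemption candidates ordered by an injectable
// Comparator so user-specific policies can be plugged in. All names
// below are hypothetical, not scheduler code.
public class PreemptionSelectorSketch {
    static class Candidate {
        final String id;
        final int priority;   // lower priority = preempt first
        final long submitTime;
        Candidate(String id, int priority, long submitTime) {
            this.id = id;
            this.priority = priority;
            this.submitTime = submitTime;
        }
    }

    // One possible default policy: lowest priority first, ties broken by
    // preempting the most recently submitted (youngest) container first.
    static final Comparator<Candidate> DEFAULT = (a, b) -> {
        if (a.priority != b.priority) {
            return Integer.compare(a.priority, b.priority);
        }
        return Long.compare(b.submitTime, a.submitTime);
    };

    // Returns candidate ids in the order the policy would preempt them.
    static List<String> selectionOrder(List<Candidate> cs,
                                       Comparator<Candidate> policy) {
        List<Candidate> sorted = new ArrayList<>(cs);
        sorted.sort(policy);
        List<String> ids = new ArrayList<>();
        for (Candidate c : sorted) {
            ids.add(c.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Candidate> cs = Arrays.asList(
            new Candidate("c1", 5, 100),
            new Candidate("c2", 1, 50),
            new Candidate("c3", 1, 200));
        // Priority-1 candidates go first; c3 is younger than c2.
        System.out.println(selectionOrder(cs, DEFAULT)); // [c3, c2, c1]
    }
}
```

Swapping in a different Comparator (e.g., by memory footprint, or by locality) changes the selection without touching the preemption machinery, which is the separation the comment argues for.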
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101117#comment-15101117 ] Wangda Tan commented on YARN-3215: -- Thanks [~sunilg]/[~Naganarasimha]. bq. min (total unused resource limit for a given label, ), so that headroom doesn't exceed what's actually available! right? Sounds like a good plan to me :) > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-3215.v1.001.patch > > > In existing CapacityScheduler, when computing headroom of an application, it > will only consider "non-labeled" nodes of this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G resource available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928. > This JIRA could involve both API changes (such as adding a > label-to-available-resource map in AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong
[ https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101115#comment-15101115 ] Wangda Tan commented on YARN-4538: -- +1 to latest patch, [~bibinchundatt], could you take a look at [javadocs warning|https://builds.apache.org/job/PreCommit-YARN-Build/10279/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt]? > QueueMetrics pending cores and memory metrics wrong > > > Key: YARN-4538 > URL: https://issues.apache.org/jira/browse/YARN-4538 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4538.patch, 0002-YARN-4538.patch, > 0003-YARN-4538.patch, 0004-YARN-4538.patch > > > Submit 2 application to default queue > Check queue metrics for pending cores and memory > {noformat} > List allQueues = client.getChildQueueInfos("root"); > for (QueueInfo queueInfo : allQueues) { > QueueStatistics quastats = queueInfo.getQueueStatistics(); > System.out.println(quastats.getPendingVCores()); > System.out.println(quastats.getPendingMemoryMB()); > } > {noformat} > *Output :* > -20 > -20480 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
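The negative pending values shown in the YARN-4538 description (-20 vCores, -20480 MB) are the classic symptom of a counter decremented on more code paths than it is incremented. The snippet below is an illustration of that failure mode only, not the actual QueueMetrics implementation.

```java
// Illustrative only: a pending-resource counter goes negative when it is
// decremented on more code paths than it is incremented -- the symptom
// reported in YARN-4538. Not the actual QueueMetrics code.
public class PendingMetricSketch {
    private long pendingMB = 0;

    void incrPending(long mb) { pendingMB += mb; }
    void decrPending(long mb) { pendingMB -= mb; }
    long getPendingMB() { return pendingMB; }

    public static void main(String[] args) {
        PendingMetricSketch m = new PendingMetricSketch();
        m.incrPending(1024);  // resource request added
        m.decrPending(1024);  // container allocated
        m.decrPending(1024);  // bug: decremented again on a second path
        System.out.println(m.getPendingMB()); // -1024
    }
}
```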
[jira] [Commented] (YARN-4512) Provide a knob to turn on over-allocation
[ https://issues.apache.org/jira/browse/YARN-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101110#comment-15101110 ] Wangda Tan commented on YARN-4512: -- Thanks [~elgoiri], I don't have a strong opinion regarding using ResourceOption/node-heartbeat to update thresholds. > Provide a knob to turn on over-allocation > - > > Key: YARN-4512 > URL: https://issues.apache.org/jira/browse/YARN-4512 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: YARN-4512-YARN-1011.001.patch, > yarn-4512-yarn-1011.002.patch, yarn-4512-yarn-1011.003.patch > > > We need two configs for overallocation - one to specify the threshold up to > which it is okay to over-allocate, another to specify the threshold after > which OPPORTUNISTIC containers should be preempted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4596) SystemMetricPublisher should not swallow error messages from TimelineClient#putEntities
Li Lu created YARN-4596: --- Summary: SystemMetricPublisher should not swallow error messages from TimelineClient#putEntities Key: YARN-4596 URL: https://issues.apache.org/jira/browse/YARN-4596 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Li Lu Assignee: Li Lu We should report error messages from the returned TimelineResponse when posting timeline entities through system metric publisher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4565) When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only
[ https://issues.apache.org/jira/browse/YARN-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101108#comment-15101108 ] Wangda Tan commented on YARN-4565: -- Thanks [~Naganarasimha]! > When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, > Sometimes lead to situation where all queue resources consumed by AMs only > > > Key: YARN-4565 > URL: https://issues.apache.org/jira/browse/YARN-4565 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0 >Reporter: Karam Singh >Assignee: Wangda Tan > Attachments: YARN-4565.1.patch, YARN-4565.2.patch, YARN-4565.3.patch > > > When sizeBasedWeight is enabled for FairOrderingPolicy in CapacityScheduler, > it sometimes leads to a situation where all queue resources are consumed by AMs only. > From the users' perspective it appears that all applications in the queue are stuck, > since the whole queue capacity is consumed by AMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101106#comment-15101106 ] Wangda Tan commented on YARN-1011: -- bq. Welcome any thoughts/suggestions on handling promotion if we allow applications to ask for only guaranteed containers. I'll continue brainstorming. We want to have a simple mechanism, if possible; complex protocols seem to find a way to hoard bugs. Agree :) > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf, > yarn-1011-design-v2.pdf > > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4496) Improve HA ResourceManager Failover detection on the client
[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-4496: - Assignee: Jian He (was: Arun Suresh) > Improve HA ResourceManager Failover detection on the client > --- > > Key: YARN-4496 > URL: https://issues.apache.org/jira/browse/YARN-4496 > Project: Hadoop YARN > Issue Type: Improvement > Components: client, resourcemanager >Reporter: Arun Suresh >Assignee: Jian He > > HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to > improve Namenode failover detection in the client. It does this by > concurrently trying all namenodes and picks the namenode that returns the > fastest with a successful response as the active node. > It would be useful to have a similar ProxyProvider for the Yarn RM (it can > possibly be done by converging some the class hierarchies to use the same > ProxyProvider) > This would especially be useful for large YARN deployments with multiple > standby RMs where clients will be able to pick the active RM without having > to traverse a list of configured RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
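The hedging pattern this issue describes — probe every configured endpoint concurrently and keep the fastest successful answer — can be sketched with an `ExecutorCompletionService`. This is an illustration of the idea only, not the actual `RequestHedgingProxyProvider` implementation; the endpoint probes here are stand-in callables.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of request hedging: try all endpoints concurrently and return
// the first successful response. Illustrative only; not the actual
// RequestHedgingProxyProvider code.
public class HedgingSketch {
    static <T> T firstSuccessful(List<Callable<T>> probes)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(probes.size());
        try {
            CompletionService<T> done = new ExecutorCompletionService<>(pool);
            for (Callable<T> p : probes) {
                done.submit(p);
            }
            // Consume results in completion order; standby endpoints throw,
            // so skip failures and return the first success.
            for (int i = 0; i < probes.size(); i++) {
                try {
                    return done.take().get();
                } catch (ExecutionException e) {
                    // this endpoint failed; wait for the next completion
                }
            }
            throw new IllegalStateException("no endpoint responded successfully");
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> probes = List.of(
            () -> { throw new RuntimeException("standby RM"); },
            () -> "active-rm-2");
        System.out.println(firstSuccessful(probes)); // active-rm-2
    }
}
```

Because results are taken in completion order, a slow or unreachable standby never delays the answer from the active endpoint, which is the benefit the issue describes for multi-RM deployments.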
[jira] [Commented] (YARN-4496) Improve HA ResourceManager Failover detection on the client
[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101053#comment-15101053 ] Jian He commented on YARN-4496: --- taking over, thanks ! > Improve HA ResourceManager Failover detection on the client > --- > > Key: YARN-4496 > URL: https://issues.apache.org/jira/browse/YARN-4496 > Project: Hadoop YARN > Issue Type: Improvement > Components: client, resourcemanager >Reporter: Arun Suresh >Assignee: Jian He > > HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to > improve Namenode failover detection in the client. It does this by > concurrently trying all namenodes and picks the namenode that returns the > fastest with a successful response as the active node. > It would be useful to have a similar ProxyProvider for the Yarn RM (it can > possibly be done by converging some the class hierarchies to use the same > ProxyProvider) > This would especially be useful for large YARN deployments with multiple > standby RMs where clients will be able to pick the active RM without having > to traverse a list of configured RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts
[ https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099209#comment-15099209 ] Bibin A Chundatt commented on YARN-4584: [~jianhe] The AM is getting preempted and exceeds the max limit in my case. > RM startup failure when AM attempts greater than max-attempts > - > > Key: YARN-4584 > URL: https://issues.apache.org/jira/browse/YARN-4584 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4584.patch > > > Configure 3 queues in a cluster with 8 GB > # queue 40% > # queue 50% > # default 10% > * Submit applications to all 3 queues with container size as 1024MB (sleep job > with 50 containers on all queues) > * The AM assigned to the default queue gets preempted immediately; after > about 20 preemptions, all applications are killed > Due to the resource limit in the default queue, the AM got preempted about 20 times > On restart, the RM fails to start > {noformat} > 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: > noteFailure java.lang.NullPointerException > 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED; cause: > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: > Service: RMActiveServices entered state STOPPED > 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.CompositeService: > RMActiveServices: stopping services, size=16 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not available
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099207#comment-15099207 ] Hadoop QA commented on YARN-4428: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 39s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 2, now 2). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} Patch generated 7 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 132, now 138). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 47s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 8s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 140m 23s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yar
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099204#comment-15099204 ] Hadoop QA commented on YARN-4311: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 3s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 40s {color} | {color:red} root in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 17s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 2s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 6s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 3s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 8m 17s {color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 622 new issues (was 111, now 733). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 3s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 43s {color} | {color:red} root in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 43s {color} | {color:red} root in the patch failed with JDK v1.7.0_91. 
{color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 2s {color} | {color:red} Patch generated 2 new checkstyle issues in root (total was 397, now 398). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 4s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 57s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 20s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:gre
[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file
[ https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099191#comment-15099191 ] Sangjin Lee commented on YARN-4577: --- {quote} The use case here is simple: if we specify the aux-services classpath, either from local fs or from hdfs, we will load this service from the specified classpath (no matter we set the classpath in NM path or not). Otherwise, we load the service from the NM path. {quote} Hmm, is one of the goals to preserve aux service's dependencies against hadoop's dependencies (as I see in the linked ticket SPARK-12807)? If so, I don't think the current approach in the patch does that. Note that URLClassLoader (or any simple extension of ClassLoader) always *delegates classloading to the parent classloader first*, and loads the class *only if* the parent classloader doesn't load/have it. In other words, any classpath the URLClassLoader owns is effectively *appended*, not prepended. That's precisely why ApplicationClassLoader inverts that order to create isolation. Could you write a simple test program to verify this behavior? I'm pretty sure you'll find that your classpath will still be shadowed by the system classpath. Also, as for using the ApplicationClassLoader, it shouldn't be too difficult. You pass in {{URL[]}} to the URLClassLoader too, so that's common. You can simply pass in the classloader of the calling class as the parent classloader. Also, you can simply pass null for the system classes, in which case the sensible default will be used. 
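The parent-first behavior described above can be checked with a tiny standalone program (a sketch, not NM or aux-service code; {{DelegationDemo}} is an illustrative name):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Minimal sketch of parent-first delegation: a URLClassLoader with an empty
// classpath of its own still resolves java.util.ArrayList, because loadClass
// asks the parent (here the system classloader) before looking at its own URLs.
public class DelegationDemo {
    public static void main(String[] args) throws Exception {
        URLClassLoader child =
                new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
        Class<?> c = child.loadClass("java.util.ArrayList");
        // The defining loader is an ancestor (the bootstrap loader, reported
        // as null), not `child` -- the parent supplied the class.
        System.out.println(c.getClassLoader() == null);  // prints: true
        System.out.println(c.getClassLoader() != child); // prints: true
        child.close();
    }
}
```

The same lookup order is why a copy of a class on the URLClassLoader's own URLs is shadowed by the system classpath, and why ApplicationClassLoader's inverted (child-first) order is needed for isolation.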
If it helps, we could introduce a simpler constructor like the following (using the class literal, since {{getClass()}} cannot be called before the constructor chain completes): {code} public ApplicationClassLoader(URL[] classpath) { this(classpath, ApplicationClassLoader.class.getClassLoader(), null); } {code} > Enable aux services to have their own custom classpath/jar file > --- > > Key: YARN-4577 > URL: https://issues.apache.org/jira/browse/YARN-4577 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4577.1.patch, YARN-4577.2.patch > > > Right now, users have to add their jars to the NM classpath directly, thus > putting them on the system classloader. But if multiple versions of the plugin > are present on the classpath, there is no control over which version actually > gets loaded. Or if there are any conflicts between the dependencies > introduced by the auxiliary service and the NM itself, they can break the NM, > the auxiliary service, or both. > The solution could be: to instantiate aux services using a classloader that > is different from the system classloader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4594) Fix test-container-executor.c to pass
[ https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099190#comment-15099190 ] Hadoop QA commented on YARN-4594: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} 
| {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 4s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 15s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 31m 48s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782374/YARN-4594.001.patch | | JIRA Issue | YARN-4594 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux e66c5cac93b8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cdf8895 | | Default Java | 1.7.0_91 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 | | JDK v1.7.0_91 Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/10292/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Max memory used | 76MB | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10292/console | This message was automatically generated. > Fix test-container-executor.c to pass > - > > Key: YARN-4594 > URL: https://issues.apache.org/jira/browse/YARN-4594 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Re
[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client
[ https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099185#comment-15099185 ] Karthik Kambatla commented on YARN-4559: There "could" be other stores that provide the same guarantees as zk-store. > Make leader elector and zk store share the same curator client > -- > > Key: YARN-4559 > URL: https://issues.apache.org/jira/browse/YARN-4559 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4559.1.patch, YARN-4559.2.patch > > > After YARN-4438, we can reuse the same curator client for leader elector and > zk store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client
[ https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099163#comment-15099163 ] Jian He commented on YARN-4559: --- bq. ZK-leader election with a different store. I think currently only the zk-store supports fencing, which means users must use the zk-store for failover? > Make leader elector and zk store share the same curator client > -- > > Key: YARN-4559 > URL: https://issues.apache.org/jira/browse/YARN-4559 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4559.1.patch, YARN-4559.2.patch > > > After YARN-4438, we can reuse the same curator client for leader elector and > zk store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client
[ https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099154#comment-15099154 ] Karthik Kambatla commented on YARN-4559: bq. I think the best way is just to merge the LeaderElectorService logic into ZKRMStateStore itself. I am not sure that is a good idea. Users might want to use ZK-leader election with a different store. > Make leader elector and zk store share the same curator client > -- > > Key: YARN-4559 > URL: https://issues.apache.org/jira/browse/YARN-4559 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4559.1.patch, YARN-4559.2.patch > > > After YARN-4438, we can reuse the same curator client for leader elector and > zk store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not available
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4428: --- Attachment: YARN-4428.3.patch > Redirect RM page to AHS page when AHS turned on and RM page is not available > - > > Key: YARN-4428 > URL: https://issues.apache.org/jira/browse/YARN-4428 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch, > YARN-4428.2.2.patch, YARN-4428.2.patch, YARN-4428.3.patch, YARN-4428.3.patch > > > When AHS is turned on, if we can't view an application in the RM page, the RM page > should redirect us to the AHS page. For example, when you go to > cluster/app/application_1 and the RM no longer remembers the application, we > simply get "Failed to read the application application_1", but it would be > good for the RM UI to smartly try redirecting to the AHS UI at > /applicationhistory/app/application_1 to see if it's there. This redirect > usage already exists for logs in the nodemanager UI. > Also, when AHS is enabled, WebAppProxyServlet should redirect to the AHS page as a > fallback when the RM does not remember the app. YARN-3975 tried to do this only when > the original tracking url is not set. But in many cases, such as when an app > fails at launch, the original tracking url will be set to point to the RM page, so > the redirect to the AHS page won't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
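The fallback described in YARN-4428 amounts to a URL rewrite when the RM no longer knows the app. A sketch with illustrative names ({{AhsRedirect}} and {{trackingUrl}} are not the actual RM webapp classes):

```java
// Sketch of the RM -> AHS redirect fallback; path prefixes follow the URLs
// mentioned in the issue description, everything else is illustrative.
public class AhsRedirect {
    /** Where to send the browser for an app the RM may have forgotten. */
    static String trackingUrl(String appId, boolean rmKnowsApp) {
        return (rmKnowsApp ? "/cluster/app/" : "/applicationhistory/app/") + appId;
    }

    public static void main(String[] args) {
        // RM forgot the app: fall back to the application history server page.
        System.out.println(trackingUrl("application_1", false));
        // prints: /applicationhistory/app/application_1
    }
}
```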
[jira] [Updated] (YARN-4589) Diagnostics for localization timeouts is lacking
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4589: --- Attachment: YARN-4589.2.patch > Diagnostics for localization timeouts is lacking > > > Key: YARN-4589 > URL: https://issues.apache.org/jira/browse/YARN-4589 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4589.2.patch, YARN-4589.patch > > > When a container takes too long to localize it manifests as a timeout, and > there's no indication that localization was the issue. We need diagnostics > for timeouts to indicate the container was still localizing when the timeout > occurred. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()
[ https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099091#comment-15099091 ] Hadoop QA commented on YARN-4593: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 42s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 42s {color} | {color:red} root in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 49s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 37s {color} | {color:red} root in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 37s {color} | {color:red} root in the patch failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 49s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 27s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 34s {color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 45m 44s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782366/YARN-4593-001.patch | | JIRA Issue | YARN-4593 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2c4964c87234 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/person
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099088#comment-15099088 ] Eric Payne commented on YARN-4108: -- [~leftnoteasy], great job! This approach looks like it has the potential to vastly improve preemption. I just have a few comments and questions. - In the lazy preemption case, PCPP will send an event to the scheduler to mark a container killable. Can PCPP check if it's already been marked before sending, so that event traffic in the RM may be reduced? - Currently, if both queueA and queueB are over their guaranteed capacity, preemption will still occur if queueA is more over capacity than queueB. I think it is probably important to preserve this behavior (YARN-2592). -- I don't see anywhere that {{ResourceLimits#isAllowPreemption}} is called. But if it is, will the following code in {{LeafQueue}} change preemption behavior? {noformat} private void setPreemptionAllowed(ResourceLimits limits, String nodePartition) { // Set preemption-allowed: // For leaf queue, only under-utilized queue is allowed to preempt resources from other queues float usedCapacity = queueCapacities.getAbsoluteUsedCapacity(nodePartition); float guaranteedCapacity = queueCapacities.getAbsoluteCapacity(nodePartition); limits.setIsAllowPreemption(usedCapacity < guaranteedCapacity); } {noformat} -- Also, in {{ParentQueue#canAssign}}, does the following code have the same effect? {noformat} if (this.getQueueCapacities().getUsedCapacity(node.getPartition()) < 1.0f) { {noformat} - In {{AbstractCSQueue#canAssignToThisQueue}}: -- I'm just trying to understand how things will be affected when headroom for a parent queue is (limit - used) + killable. Doesn't that say that a parent queue has more headroom than it's actually using?
Is it relying on this behavior so that the {{assignment}} code will determine that it has more headroom when there are killable containers, and then relying on the leaf queue to kill those containers? -- NPE if {{getChildQueues()}} returns null: {noformat} if (null != getChildQueues() || !getChildQueues().isEmpty()) { {noformat} - {{CSAssignment#toKillContainers}}: I would call them {{containersToKill}} {quote} 4. I would like to have some freedom in selecting containers (marking) for preemption. A simple sorting based on submission time or priority seems a limited approach. Could we have some interface here so that we can plug in user-specific comparison cases: submission time, priority, demand based, etc., maybe {quote} - To [~sunilg]'s point: Currently PCPP doesn't take into consideration things like locality or container size. If a queue is over its capacity by 8GB, and there is 1 8GB container plus 8 1GB containers, PCPP may decide to kill the 1 8GB container or it may decide to kill the 8 1GB containers, depending on properties like 'time since submission' and 'ignore-partition-exclusivity'. So, with the current lazy preemption proposal, if the underserved queue needs an 8GB container and the 8 1GB containers are marked as killable, at least now those containers don't get killed. It's a step in the right direction, but the underserved queue still has to wait. The same kind of thing applies to locality and other properties. It would be interesting to know what your thoughts are on making further modifications to PCPP to make more informed choices about which containers to kill. There may not be a "right" choice in PCPP, though, since the requirements of the underserved queue may change by the time the scheduler gets around to allocating resources. 
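The de-duplication suggested above (have PCPP check whether a container is already marked killable before sending another event) could be as simple as a set of already-marked container ids. A sketch; {{KillableMarker}} and {{markKillable}} are illustrative names, not actual PCPP or scheduler code:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of de-duplicating mark-killable events: remember which
// container ids have already been marked so the same event is not re-sent.
public class KillableMarker {
    private final Set<String> markedKillable = new HashSet<>();

    /** Returns true only the first time a container id is marked;
     *  callers would send the scheduler event only on true. */
    public boolean markKillable(String containerId) {
        return markedKillable.add(containerId);
    }

    public static void main(String[] args) {
        KillableMarker m = new KillableMarker();
        System.out.println(m.markKillable("container_1")); // prints: true  (send event)
        System.out.println(m.markKillable("container_1")); // prints: false (skip)
    }
}
```

A real implementation would also need to unmark containers when preemption is cancelled or the container completes, so the set does not grow without bound.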
> CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-V3.pdf, > YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, > YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch > > > This is a sibling JIRA of YARN-2154. We should make sure container preemption > is more effective. > *Requirements:* > 1) Can handle the case of user-limit preemption > 2) Can handle the case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), > cross-application preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v
[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client
[ https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099068#comment-15099068 ] Jian He commented on YARN-4559: --- The new patch moved curator#close into RM#serviceStop. I cannot move the curator client creation into LeaderElectorService because LeaderElectorService will not be created if HA is not enabled. I think the best way is just to merge the LeaderElectorService logic into ZKRMStateStore itself. But that requires more refactoring on ZKRMStateStore because ZKRMStateStore is not an alwaysOn service. > Make leader elector and zk store share the same curator client > -- > > Key: YARN-4559 > URL: https://issues.apache.org/jira/browse/YARN-4559 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4559.1.patch, YARN-4559.2.patch > > > After YARN-4438, we can reuse the same curator client for leader elector and > zk store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4559) Make leader elector and zk store share the same curator client
[ https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4559: -- Attachment: YARN-4559.2.patch > Make leader elector and zk store share the same curator client > -- > > Key: YARN-4559 > URL: https://issues.apache.org/jira/browse/YARN-4559 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4559.1.patch, YARN-4559.2.patch > > > After YARN-4438, we can reuse the same curator client for leader elector and > zk store -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4595) Add support for configurable read-only mounts
Billie Rinaldi created YARN-4595: Summary: Add support for configurable read-only mounts Key: YARN-4595 URL: https://issues.apache.org/jira/browse/YARN-4595 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Billie Rinaldi Assignee: Billie Rinaldi Mounting files or directories from the host is one way of passing configuration and other information into a docker container. We could allow the user to set a list of mounts in the environment of ContainerLaunchContext (e.g. /dir1:/targetdir1,/dir2:/targetdir2). These would be mounted read-only to the specified target locations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
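The mount-list format proposed above (e.g. /dir1:/targetdir1,/dir2:/targetdir2) could be parsed as sketched below; {{ReadOnlyMounts}} and {{parse}} are illustrative names, not part of any actual YARN API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of parsing the comma-separated mount list described in
// YARN-4595: each entry is host-path:container-target, mounted read-only.
public class ReadOnlyMounts {
    /** Maps host path -> container target path, preserving order. */
    static Map<String, String> parse(String spec) {
        Map<String, String> mounts = new LinkedHashMap<>();
        for (String m : spec.split(",")) {
            String[] parts = m.split(":", 2); // split into host and target
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException("Bad mount entry: " + m);
            }
            mounts.put(parts[0], parts[1]);
        }
        return mounts;
    }

    public static void main(String[] args) {
        System.out.println(parse("/dir1:/targetdir1,/dir2:/targetdir2"));
        // prints: {/dir1=/targetdir1, /dir2=/targetdir2}
    }
}
```

A real implementation would additionally validate that the host paths are whitelisted, since arbitrary host mounts into a container are a security concern.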
[jira] [Commented] (YARN-4592) Remove unused GetContainerStatus proto
[ https://issues.apache.org/jira/browse/YARN-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099017#comment-15099017 ] Hadoop QA commented on YARN-4592: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | 
{color:green} javac {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 11m 55s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782358/YARN-4592.patch | | JIRA Issue | YARN-4592 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux bd68db2f13f6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 817cc1f | | Default Java | 1.7.0_91 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 | | JDK v1.7.0_91 Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/10289/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api | | Max memory used | 76MB | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10289/console | This message was automatically generated. > Remove unused GetContainerStatus proto > - > > Key: YARN-4592 > URL: https://issues.apache.org/jira/browse/YARN-4592 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang
[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts
[ https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098996#comment-15098996 ] Jian He commented on YARN-4584: --- I have one question: normally, if FailuresValidityInterval <= 0, no attempts will be removed from the store, because the number of attempts should always be <= max-attempts. But if attempts failed for reasons like preemption or disk failure (see RMAppImpl#shouldCountTowardsMaxAttemptRetry), the number of attempts could go beyond max-attempts even if FailuresValidityInterval is <= 0. Is this the scenario you are running into? > RM startup failure when AM attempts greater than max-attempts > - > > Key: YARN-4584 > URL: https://issues.apache.org/jira/browse/YARN-4584 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4584.patch > > > Configure 3 queues in a cluster with 8 GB: > # queue 40% > # queue 50% > # default 10% > * Submit applications to all 3 queues with container size 1024MB (sleep jobs > with 50 containers on all queues) > * The AM assigned to the default queue gets preempted immediately; after > about 20 preemptions, all applications are killed > Due to the resource limit in the default queue, the AM got preempted about 20 times > On restart, the RM fails to start > {noformat} > 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: > noteFailure java.lang.NullPointerException > 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED; cause: > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058) > at > 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: > Service: RMActiveServ
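The mismatch behind the YARN-4584 stack trace above can be sketched in isolation: attempts that fail for reasons such as preemption are still persisted to the state store but are not charged against max-attempts, so the stored attempt list can grow past the limit, and recovery code must tolerate that. The following is a minimal, hypothetical model; the class, enum, and method names are illustrative, not the actual RMAppImpl/RMAppAttemptImpl code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of how stored AM attempts can exceed max-attempts when
// some failure causes (e.g. preemption, disk failure) do not count toward
// the retry limit, mirroring the idea of
// RMAppImpl#shouldCountTowardsMaxAttemptRetry.
public class AttemptCountSketch {
    enum FailureCause { APP_ERROR, PREEMPTED, DISK_FAILED }

    static final int MAX_ATTEMPTS = 2;

    // Only genuine application errors are charged against the application.
    static boolean countsTowardMaxAttempts(FailureCause cause) {
        return cause == FailureCause.APP_ERROR;
    }

    public static void main(String[] args) {
        List<FailureCause> storedAttempts = new ArrayList<>();
        FailureCause[] history = {
            FailureCause.PREEMPTED, FailureCause.PREEMPTED,
            FailureCause.PREEMPTED, FailureCause.APP_ERROR
        };
        int counted = 0;
        for (FailureCause cause : history) {
            if (counted >= MAX_ATTEMPTS) break;  // no further attempt launched
            storedAttempts.add(cause);           // every attempt is persisted
            if (countsTowardMaxAttempts(cause)) counted++;
        }
        // 4 attempts are stored although MAX_ATTEMPTS is 2: exactly the state
        // the RM recovery path must handle instead of throwing NPE.
        System.out.println(storedAttempts.size() + " stored, " + counted + " counted");
        // prints: 4 stored, 1 counted
    }
}
```

This matches Jian He's observation: with a non-positive validity interval nothing is pruned, so uncounted failures accumulate beyond the configured maximum.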
[jira] [Updated] (YARN-4594) Fix test-container-executor.c to pass
[ https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated YARN-4594: --- Attachment: YARN-4594.001.patch > Fix test-container-executor.c to pass > - > > Key: YARN-4594 > URL: https://issues.apache.org/jira/browse/YARN-4594 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: YARN-4594.001.patch > > > test-container-executor.c doesn't work: > * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually > /usr/bin/ls on many systems. > * The recursive delete logic in container-executor.c fails -- nftw does the > wrong thing when confronted with directories with the wrong mode (permission > bits), leading to an attempt to run rmdir on a non-empty directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
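The first failure mode described in YARN-4594, that realpath("/bin/ls") need not be "/bin/ls" on distros where /bin is a symlink to /usr/bin, suggests a safer test pattern: canonicalize both sides before comparing rather than hard-coding the expected path. Below is a small Java illustration of that pattern; it is a sketch of the idea, not part of the C patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// A path and its resolved ("real") form can differ textually (e.g. /bin/ls
// vs /usr/bin/ls, or /tmp vs /private/tmp on macOS) while naming the same
// file. Tests should compare canonical forms, not literal strings.
public class RealPathSketch {
    // True when both paths name the same underlying file, regardless of
    // symlinks in either spelling.
    static boolean sameFile(Path a, Path b) throws IOException {
        return a.toRealPath().equals(b.toRealPath());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("realpath-sketch", ".txt");
        try {
            // The literal path may differ from its canonical form, but the
            // comparison above still succeeds.
            System.out.println(tmp + " -> " + tmp.toRealPath());
            System.out.println(sameFile(tmp, tmp.toRealPath()));
        } finally {
            Files.delete(tmp);
        }
    }
}
```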
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098974#comment-15098974 ] MENG DING commented on YARN-4108: - Hi, [~leftnoteasy], will there be a separate ticket to track the issue of selecting to-be-preempted containers based on pending new/increase resource requests? > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-V3.pdf, > YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, > YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch > > > This is a sibling JIRA of YARN-2154. We should make sure container preemption > is more effective. > *Requirements:* > 1) Can handle the case of user-limit preemption > 2) Can handle resource placement requirements, such as: hard locality > (I only want to use rack-1) / node constraints (YARN-3409) / blacklists (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), > cross-application preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4594) Fix test-container-executor.c to pass
Colin Patrick McCabe created YARN-4594: -- Summary: Fix test-container-executor.c to pass Key: YARN-4594 URL: https://issues.apache.org/jira/browse/YARN-4594 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe test-container-executor.c doesn't work: * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually /usr/bin/ls on many systems. * The recursive delete logic in container-executor.c fails -- nftw does the wrong thing when confronted with directories with the wrong mode (permission bits), leading to an attempt to run rmdir on a non-empty directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098972#comment-15098972 ] Hadoop QA commented on YARN-4371: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 9s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client (total was 15, now 16). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 16s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 26s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 143m 8s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.7.0_91 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI | | | org.apache.hadoop.yarn.client.api.impl.Test
[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098966#comment-15098966 ] Li Lu commented on YARN-4265: - The maven dependency check uses a separate maven repository {{-Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-trunk-0}}. I tested the whole workflow of building from scratch and it worked. The AHS-test jar has not yet been published to the Apache SNAPSHOT repository (the timeline plugin is the first module to depend on this module). Therefore the mvn dependency test failure appears to be unrelated here. > Provide new timeline plugin storage to support fine-grained entity caching > -- > > Key: YARN-4265 > URL: https://issues.apache.org/jira/browse/YARN-4265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, > YARN-4265-trunk.003.patch, YARN-4265-trunk.004.patch, > YARN-4265-trunk.005.patch, YARN-4265-trunk.006.patch, > YARN-4265-trunk.007.patch, YARN-4265-trunk.008.patch, > YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch > > > To support the newly proposed APIs in YARN-4234, we need to create a new > plugin timeline store. The store may have similar behavior to the > EntityFileTimelineStore proposed in YARN-3942, but caches data at cache-id > granularity, instead of application-id granularity. Let's have this storage > as a standalone one, instead of updating EntityFileTimelineStore, to keep the > existing store (EntityFileTimelineStore) stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098951#comment-15098951 ] Hadoop QA commented on YARN-4265: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 8 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 38s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 59s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped branch modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 26s 
{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 2s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} mvndep {color} | {color:red} 0m 27s {color} | {color:red} patch's hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server dependency:list failed {color} | | {color:red}-1{color} | {color:red} mvndep {color} | {color:red} 0m 44s {color} | {color:red} patch's hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage dependency:list failed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 44s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 23s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 34s {color} | {color:red} Patch generated 8 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 292, now 299). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patch modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 34s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 39s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server-jdk1.8.0_66 with JDK v1.8.0_66 generated 7 new issues (was 544, now 551). {color} |
[jira] [Updated] (YARN-4593) Deadlock in AbstractService.getConfig()
[ https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-4593: - Attachment: YARN-4593-001.patch Patch 001, removes the {{synchronized}} attribute. The field is still volatile (and only written in {{Service.init()}}). > Deadlock in AbstractService.getConfig() > --- > > Key: YARN-4593 > URL: https://issues.apache.org/jira/browse/YARN-4593 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.2 > Environment: AM restarting on kerberized cluster >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-4593-001.patch > > > SLIDER-1052 has found a deadlock which can arise in it during AM restart. > Looking at the thread trace, one of the blockages is actually > {{AbstractService.getConfig()}} —this is synchronized and so blocked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()
[ https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098826#comment-15098826 ] Steve Loughran commented on YARN-4593: -- SLIDER-1052 shows this. I'm not going to blame this JIRA for that, but it's certainly where the problem arises > Deadlock in AbstractService.getConfig() > --- > > Key: YARN-4593 > URL: https://issues.apache.org/jira/browse/YARN-4593 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.2 > Environment: AM restarting on kerberized cluster >Reporter: Steve Loughran >Assignee: Steve Loughran > > SLIDER-1052 has found a deadlock which can arise in it during AM restart. > Looking at the thread trace, one of the blockages is actually > {{AbstractService.getConfig()}} —this is synchronized and so blocked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()
[ https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098825#comment-15098825 ] Steve Loughran commented on YARN-4593: -- I don't see why we need this to be synchronized (it's that way in branch-205); the field is {{volatile}}, so access is thread-safe anyway. > Deadlock in AbstractService.getConfig() > --- > > Key: YARN-4593 > URL: https://issues.apache.org/jira/browse/YARN-4593 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.2 > Environment: AM restarting on kerberized cluster >Reporter: Steve Loughran >Assignee: Steve Loughran > > SLIDER-1052 has found a deadlock which can arise in it during AM restart. > Looking at the thread trace, one of the blockages is actually > {{AbstractService.getConfig()}} —this is synchronized and so blocked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4593) Deadlock in AbstractService.getConfig()
Steve Loughran created YARN-4593: Summary: Deadlock in AbstractService.getConfig() Key: YARN-4593 URL: https://issues.apache.org/jira/browse/YARN-4593 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.2 Environment: AM restarting on kerberized cluster Reporter: Steve Loughran Assignee: Steve Loughran SLIDER-1052 has found a deadlock which can arise in it during AM restart. Looking at the thread trace, one of the blockages is actually {{AbstractService.getConfig()}} —this is synchronized and so blocked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
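The fix discussed across the YARN-4593 messages above is small: a volatile field that is written once during init needs no lock on the read path, and dropping the lock removes the getter from any lock-ordering cycle. A minimal sketch of the pattern, simplified (a String stands in for Configuration); this is not the actual AbstractService code.

```java
// Pattern behind the YARN-4593 patch: the config reference is volatile and
// written only once (during init), so the getter needs no synchronization.
// A synchronized getter, by contrast, can participate in lock-ordering
// deadlocks with other synchronized methods on the same service.
public class ServiceConfigSketch {
    private volatile String config;  // stands in for Configuration

    // Written once, before the service is used concurrently.
    public void init(String conf) {
        this.config = conf;
    }

    // Unsynchronized read: the volatile write in init() happens-before any
    // read that observes it, so this is thread-safe without a lock.
    public String getConfig() {
        return config;
    }

    public static void main(String[] args) {
        ServiceConfigSketch s = new ServiceConfigSketch();
        s.init("yarn-site");
        System.out.println(s.getConfig());  // prints "yarn-site"
    }
}
```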
[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098785#comment-15098785 ] Hadoop QA commented on YARN-4389: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 58s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | 
{color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 6s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s {color} | {color:red} Patch generated 5 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 161, now 166). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 11s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 31s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 14s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 12s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed
[jira] [Created] (YARN-4592) Remove unused GetContainerStatus proto
Chang Li created YARN-4592: -- Summary: Remove unused GetContainerStatus proto Key: YARN-4592 URL: https://issues.apache.org/jira/browse/YARN-4592 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Priority: Minor Attachments: YARN-4592.patch GetContainerStatus protos have been left unused since YARN-926 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4592) Remove unused GetContainerStatus proto
[ https://issues.apache.org/jira/browse/YARN-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4592: --- Attachment: YARN-4592.patch > Remove unused GetContainerStatus proto > - > > Key: YARN-4592 > URL: https://issues.apache.org/jira/browse/YARN-4592 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li >Priority: Minor > Attachments: YARN-4592.patch > > > GetContainerStatus protos have been left unused since YARN-926 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4558) Yarn client retries on some non-retriable exceptions
[ https://issues.apache.org/jira/browse/YARN-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098722#comment-15098722 ] Sergey Shelukhin commented on YARN-4558: In this case, retry policy is built in YARN code. I don't know if there's more to it on Hadoop side than what YARN sets up, but from a cursory examination it doesn't look like that is the case. > Yarn client retries on some non-retriable exceptions > > > Key: YARN-4558 > URL: https://issues.apache.org/jira/browse/YARN-4558 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Sergey Shelukhin >Priority: Minor > > Seems the problem is in RMProxy where the policy is built. > {noformat} > Thread 23594: (state = BLOCKED) > - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame) > - org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(java.lang.Object, > java.lang.reflect.Method, java.lang.Object[]) @bci=603, line=155 (Interpreted > frame) > - > com.sun.proxy.$Proxy32.getClusterNodes(org.apache.hadoop.yarn.api.protocolrecords.GetClusterNodesRequest) > @bci=16 (Interpreted frame) > - > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(org.apache.hadoop.yarn.api.records.NodeState[]) > @bci=66, line=515 (Interpreted frame) > {noformat} > produces > {noformat} > 2016-01-07 02:50:45,111 [main] WARN ipc.Client - Exception encountered while > connecting to the server : javax.security.sasl.SaslException: GSS initiate > failed [Caused by GSSException: No valid credentials provided (Mechanism > level: Failed to find any Kerberos tgt)] > 2016-01-07 02:51:15,126 [main] WARN ipc.Client - Exception encountered while > connecting to the server : javax.security.sasl.SaslException: GSS initiate > failed [Caused by GSSException: No valid credentials provided (Mechanism > level: Failed to find any Kerberos tgt)] > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
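The problem Sergey describes is a retry policy that keeps retrying exceptions which can never succeed (e.g. the GSS/Kerberos failures in the log above). A minimal, hypothetical sketch of the fix's shape — fail fast on a configured set of non-retriable exception types, retry everything else — might look like this; FailFastRetry and its names are illustrative, not YARN's actual RetryPolicy API:

```java
import java.util.Set;

// Hypothetical sketch: retry up to maxRetries times, but rethrow immediately
// when the exception's type is in the non-retriable set (e.g. auth failures).
class FailFastRetry {
    interface Call<T> { T run() throws Exception; }

    static <T> T invoke(Call<T> call, int maxRetries,
                        Set<Class<? extends Exception>> nonRetriable)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.run();
            } catch (Exception e) {
                for (Class<? extends Exception> c : nonRetriable) {
                    if (c.isInstance(e)) {
                        throw e;  // non-retriable: fail fast, no sleep/retry
                    }
                }
                last = e;  // retriable: remember and try again
            }
        }
        throw last;  // retries exhausted
    }
}
```

The real fix would live where RMProxy builds its policy; this only illustrates the fail-fast decision point.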
[jira] [Commented] (YARN-3102) Decommissioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098682#comment-15098682 ] Kuhu Shukla commented on YARN-3102: --- [~templedf], [~jlowe], Requesting review comments. Thanks a lot! > Decommissioned Nodes not listed in Web UI > > > Key: YARN-3102 > URL: https://issues.apache.org/jira/browse/YARN-3102 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 > Environment: 2 Node Manager and 1 Resource Manager >Reporter: Bibin A Chundatt >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch, > YARN-3102-v3.patch, YARN-3102-v4.patch, YARN-3102-v5.patch > > > Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to the > yarn.exclude file on the RM1 machine > Add the NM1 host name to yarn.exclude > Start the nodes as listed below: NM1, NM2, Resource Manager > Now check the decommissioned nodes in /cluster/nodes > The number of decommissioned nodes is listed as 1, but the table is empty in > /cluster/nodes/decommissioned (details of the decommissioned node are not shown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-4311: -- Attachment: YARN-4311-v8.patch Thank you [~jlowe] for the review comments. I have updated the patch. The interval is 1/2 the value of the timeout config field (or should it be 1/3, like the NM expiry interval, although that could be excessive in my opinion). The default timeout is 1 minute. The interval is capped at 10 minutes. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, > YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, > YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
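The interval rule described in the comment above (half the configured timeout, capped at a fixed maximum) can be sketched in a few lines; the method name and parameters here are illustrative, not the patch's actual config keys:

```java
// Sketch of the polling-interval rule: half the configured timeout,
// but never longer than a fixed cap (10 minutes in the comment above).
class DecommissionPollInterval {
    static long pollIntervalMs(long timeoutMs, long capMs) {
        // half the timeout, capped
        return Math.min(timeoutMs / 2, capMs);
    }
}
```

With the defaults quoted above (1-minute timeout, 10-minute cap), the interval would be 30 seconds.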
[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098641#comment-15098641 ] Giovanni Matteo Fumarola commented on YARN-2885: Agree with [~kishorch]. If it remains this way, we will have to add more checks in FederationInterceptor. > Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2885-yarn-2877.001.patch, > YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, > YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, > YARN-2885-yarn-2877.v4.patch, YARN-2885-yarn-2877.v5.patch, > YARN-2885_api_changes.patch > > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. > The LocalRM makes distributed decisions for queueable container requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist
[ https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098617#comment-15098617 ] zhihai xu commented on YARN-3446: - [~kasha], thanks for the review and committing the patch! > FairScheduler headroom calculation should exclude nodes in the blacklist > > > Key: YARN-3446 > URL: https://issues.apache.org/jira/browse/YARN-3446 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.9.0 > > Attachments: YARN-3446.000.patch, YARN-3446.001.patch, > YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, > YARN-3446.005.patch > > > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > MRAppMaster does not preempt the reducers because for Reducer preemption > calculation, headRoom is considering blacklisted nodes. This makes jobs to > hang forever(ResourceManager does not assign any new containers on > blacklisted nodes but availableResource AM get from RM includes blacklisted > nodes available resource). > This issue is similar as YARN-1680 which is for Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4371: -- Attachment: 0006-YARN-4371.patch Fixing javadoc issue. > "yarn application -kill" should take multiple application ids > - > > Key: YARN-4371 > URL: https://issues.apache.org/jira/browse/YARN-4371 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Sunil G > Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, > 0003-YARN-4371.patch, 0004-YARN-4371.patch, 0005-YARN-4371.patch, > 0006-YARN-4371.patch > > > Currently we cannot pass multiple applications to the "yarn application -kill" > command. The command should take multiple application ids at the same time. > Each entry should be separated with whitespace, like: > {code} > yarn application -kill application_1234_0001 application_1234_0007 > application_1234_0012 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098583#comment-15098583 ] Hadoop QA commented on YARN-4389: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | 
{color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 48s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s {color} | {color:red} Patch generated 5 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 161, now 166). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 45s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 58s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed
[jira] [Commented] (YARN-4551) Address the duplication between StatusUpdateWhenHealthy and StatusUpdateWhenUnhealthy transitions
[ https://issues.apache.org/jira/browse/YARN-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098574#comment-15098574 ] Sunil G commented on YARN-4551: --- Thank you [~kasha] for the review and commit! > Address the duplication between StatusUpdateWhenHealthy and > StatusUpdateWhenUnhealthy transitions > - > > Key: YARN-4551 > URL: https://issues.apache.org/jira/browse/YARN-4551 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Sunil G >Priority: Minor > Labels: newbie > Fix For: 2.9.0 > > Attachments: 0001-YARN-4551.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file
[ https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098556#comment-15098556 ] Steve Loughran commented on YARN-4577: -- Test-wise: * {{testLoadAuxServiceLocally}} should be calling aux.close() in finally{} clauses. It's idempotent, so you could do a close() in the main path (and so test it), but still clean up afterwards. * It'd be nice for the asserts to include some text about why the asserts are failing, especially simple {{assertTrue()}} calls. The goal is that enough information is printed to enable someone who sees the Jenkins log to diagnose the problem. An "assert failed line 315" doesn't do much and leads to "add more test diagnostics" patches and more iterations. > Enable aux services to have their own custom classpath/jar file > --- > > Key: YARN-4577 > URL: https://issues.apache.org/jira/browse/YARN-4577 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4577.1.patch, YARN-4577.2.patch > > > Right now, users have to add their jars to the NM classpath directly, thus > putting them on the system classloader. But if multiple versions of the plugin > are present on the classpath, there is no control over which version actually > gets loaded. Or if there are any conflicts between the dependencies > introduced by the auxiliary service and the NM itself, they can break the NM, > the auxiliary service, or both. > The solution could be: to instantiate aux services using a classloader that > is different from the system classloader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
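The two review suggestions above can be illustrated with a self-contained sketch (AuxServices here is a hypothetical stand-in, not the real NM class): close in finally{}, close once on the main path to exercise idempotency, and give every assert a message that makes a Jenkins log self-explanatory:

```java
// Sketch of the suggested test pattern. AuxServices is a stand-in class;
// only the close()-in-finally and assert-with-message patterns matter.
class AuxServiceTestSketch {
    static class AuxServices implements AutoCloseable {
        boolean closed = false;
        String load(String name) { return "loaded:" + name; }
        @Override public void close() { closed = true; }  // idempotent
    }

    // Message-bearing assert, so a Jenkins log explains the failure by itself.
    static void assertTrue(String message, boolean condition) {
        if (!condition) throw new AssertionError(message);
    }

    static void testLoadAuxServiceLocally() {
        AuxServices aux = new AuxServices();
        try {
            String result = aux.load("shuffle");
            assertTrue("aux service 'shuffle' did not load, got: " + result,
                "loaded:shuffle".equals(result));
            aux.close();  // close on the main path...
            aux.close();  // ...and again, proving close() is idempotent
        } finally {
            aux.close();  // always clean up, even if an assert above fails
        }
    }
}
```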
[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not available
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4428: --- Attachment: YARN-4428.3.patch Thanks [~jlowe] for the review and for pointing me to the related issue! Updated the .3 patch, which computes the redirect inside RMWebAppFilter. > Redirect RM page to AHS page when AHS turned on and RM page is not available > - > > Key: YARN-4428 > URL: https://issues.apache.org/jira/browse/YARN-4428 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch, > YARN-4428.2.2.patch, YARN-4428.2.patch, YARN-4428.3.patch > > > When AHS is turned on, if we can't view an application in the RM page, the RM page > should redirect us to the AHS page. For example, when you go to > cluster/app/application_1, if the RM no longer remembers the application, we will > simply get "Failed to read the application application_1", but it would be > good for the RM UI to smartly try to redirect to the AHS UI > /applicationhistory/app/application_1 to see if it's there. This redirect > usage already exists for logs in the nodemanager UI. > Also, when AHS is enabled, WebAppProxyServlet should redirect to the AHS page as a > fallback when the RM does not remember the app. YARN-3975 tried to do this only when > the original tracking url is not set. But in many cases, such as when an app > failed at launch, the original tracking url will be set to point to the RM page, so > the redirect to the AHS page won't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098526#comment-15098526 ] Sunil G commented on YARN-4108: --- Thank you [~leftnoteasy] for the detailed updated design doc. A few comments and queries: 1. In the current PCPP, we first preempt all reserved containers from all applications in a queue if it is overallocated. After introducing "killable containers", we will have some containers which are reserved or running. Am I correct, or are reserved containers planned to be handled differently? Possible cases: - queueA's appA has made a reservation on node1. Now queueB has demand, and we can unreserve appA's container requests from node1 so that queueB's app can allocate some containers there. - Or queueB's app can now reserve a container there. This case, I think, is explained by the design doc. Basically we can still try to preempt reserved containers first. 2. If I understood correctly, "killable containers" will be triggered with a preempt event only if a proper allocation can happen for the target application (from an underserved queue). - So do we send a preempt_container event to the AM at this point? This introduces a delay of 15 seconds, so if by some chance the scheduling footprint changes (some other NMs freed space), we may see some overkill. (I guess this can happen now also.) - 15 seconds later, the RM force-kills these containers. Is there any change to this approach in the new design? Because as per the doc, it is mentioned that we need to preempt only if we can reserve, so I am slightly confused here. 3. To cancel a "killable container", I think PCPP will take the call by waiting for some interval. So is some new configuration needed for this? 4. I would like to have some freedom in selecting (marking) containers for preemption. A simple sort based on submission time or priority seems a limited approach. Could we have an interface here so that we can plug in user-specific comparison cases?
- submission time, priority, demand-based, etc. > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-V3.pdf, > YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, > YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch > > > This is a sibling JIRA for YARN-2154. We should make sure container preemption > is more effective. > *Requirements:* > 1) Can handle case of user-limit preemption > 2) Can handle case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), > cross-application preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
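Point 4 above — a pluggable ordering for preemption candidates — amounts to accepting a Comparator rather than hard-coding one sort. A hypothetical sketch of the idea (Candidate and the default policy are illustrative, not the PCPP code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of pluggable preemption-candidate ordering: callers supply any
// Comparator (priority, submission time, demand-based, ...) instead of the
// scheduler hard-coding one sort. Field names are illustrative.
class PreemptionOrdering {
    static class Candidate {
        final long submitTime; final int priority;
        Candidate(long t, int p) { submitTime = t; priority = p; }
    }

    // One possible default: lowest priority first; among ties, the most
    // recently submitted container first (least lost work when killed).
    static final Comparator<Candidate> DEFAULT =
        Comparator.<Candidate>comparingInt(c -> c.priority)
                  .thenComparingLong(c -> -c.submitTime);

    static List<Candidate> order(List<Candidate> cs, Comparator<Candidate> policy) {
        List<Candidate> copy = new ArrayList<>(cs);
        copy.sort(policy);  // any user-specific policy plugs in here
        return copy;
    }
}
```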
[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098525#comment-15098525 ] Kishore Chaliparambil commented on YARN-2885: - Hi [~asuresh], I am reviewing the latest patch. I noticed that we assume that the last interceptor in the chain will be the LocalScheduler. This might break the model when we support YARN federation (YARN-3666). The federation interceptor will have to be the last interceptor, since it abstracts the fact that there are multiple clusters from the application and clients. So I think, instead of talking to the RM directly from the LocalScheduler, we can forward the request to the next interceptor in the chain. And until federation is implemented, we can have another interceptor implementation (e.g. DefaultRequestInterceptor) that talks to the RM and use that as the last interceptor in the chain. Thanks, Kishore > Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2885-yarn-2877.001.patch, > YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, > YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, > YARN-2885-yarn-2877.v4.patch, YARN-2885-yarn-2877.v5.patch, > YARN-2885_api_changes.patch > > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. > The LocalRM makes distributed decisions for queueable container requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
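The chain layout Kishore proposes — LocalScheduler forwarding whatever it does not handle to the next interceptor, with a terminal DefaultRequestInterceptor that talks to the RM — is a plain chain of responsibility. A toy sketch, with string-typed requests standing in for the real protocol records:

```java
// Toy chain-of-responsibility sketch of the proposed interceptor layout.
// Strings stand in for allocate requests/responses; the real interceptors
// operate on YARN protocol records.
class InterceptorChainSketch {
    interface RequestInterceptor {
        String allocate(String request);
    }

    static class LocalScheduler implements RequestInterceptor {
        private final RequestInterceptor next;
        LocalScheduler(RequestInterceptor next) { this.next = next; }
        public String allocate(String request) {
            if (request.startsWith("queueable:")) {
                return "local:" + request;   // distributed decision, handled here
            }
            return next.allocate(request);   // forward guaranteed-start asks
        }
    }

    static class DefaultRequestInterceptor implements RequestInterceptor {
        public String allocate(String request) {
            return "rm:" + request;  // terminal link: would call the central RM
        }
    }

    static RequestInterceptor buildChain() {
        // A federation interceptor could later replace the terminal link
        // without LocalScheduler knowing.
        return new LocalScheduler(new DefaultRequestInterceptor());
    }
}
```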
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098511#comment-15098511 ] Karthik Kambatla commented on YARN-1011: [~nroberts], [~leftnoteasy] - reasonable concerns. I am looking into allowing the app to ask only for guaranteed containers. Scheduling will likely remain simple: in our loop, we just skip an application if it is not interested in opportunistic containers. Promotion, though, becomes tricky: we should hold off on promoting a container until all higher-"priority" applications that want only guaranteed containers get them. Welcome any thoughts/suggestions on handling promotion if we allow applications to ask for only guaranteed containers. I'll continue brainstorming. We want to have a simple mechanism, if possible; complex protocols seem to find a way to hoard bugs. > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf, > yarn-1011-design-v2.pdf > > > Currently the RM allocates containers and assumes the allocated resources are > utilized. > The RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocates more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4265: Attachment: YARN-4265-trunk.008.patch New patch to address the checkstyle hidden-field warning. I also tested locally for the maven dependency issue. I cleaned my local m2 repository, rebuilt hadoop, and ran maven dependency. I cannot reproduce the problem. Also, the problem appears to have been intermittent in past Jenkins runs. > Provide new timeline plugin storage to support fine-grained entity caching > -- > > Key: YARN-4265 > URL: https://issues.apache.org/jira/browse/YARN-4265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, > YARN-4265-trunk.003.patch, YARN-4265-trunk.004.patch, > YARN-4265-trunk.005.patch, YARN-4265-trunk.006.patch, > YARN-4265-trunk.007.patch, YARN-4265-trunk.008.patch, > YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch > > > To support the newly proposed APIs in YARN-4234, we need to create a new > plugin timeline store. The store may have similar behavior to the > EntityFileTimelineStore proposed in YARN-3942, but cache data at cache-id > granularity, instead of application-id granularity. Let's have this storage > as a standalone one, instead of updating EntityFileTimelineStore, to keep the > existing store (EntityFileTimelineStore) stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098502#comment-15098502 ] Naganarasimha G R commented on YARN-3215: - Thanks for detailing it out, [~sunilg]. A small correction to the solution you mentioned, as per our discussion: it should be min (total *unused* resource limit for a given label, ), so that headroom doesn't exceed what's actually available, right? cc [~wangda], thoughts? > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-3215.v1.001.patch > > > In the existing CapacityScheduler, when computing the headroom of an application, it > will only consider "non-labeled" nodes of this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G resource available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928. > This JIRA could involve both API changes (such as adding a > label-to-available-resource map in AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098477#comment-15098477 ] Hadoop QA commented on YARN-3102: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 50s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 12s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 148m 46s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782289/YARN-3102-v5.patch | | JIRA Issue | YARN-3102 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs c
[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098447#comment-15098447 ] Sunil G commented on YARN-4371: --- The javadoc warning is valid. I will upload a new patch. > "yarn application -kill" should take multiple application ids > - > > Key: YARN-4371 > URL: https://issues.apache.org/jira/browse/YARN-4371 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Sunil G > Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, > 0003-YARN-4371.patch, 0004-YARN-4371.patch, 0005-YARN-4371.patch > > > Currently we cannot pass multiple applications to the "yarn application -kill" > command. The command should take multiple application ids at the same time. > Entries should be separated with whitespace, like: > {code} > yarn application -kill application_1234_0001 application_1234_0007 > application_1234_0012 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file
[ https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098461#comment-15098461 ] Xuan Gong commented on YARN-4577: - Thanks, [~sjlee0], for the comments and suggestions. +1 for the suggestion to have a single generic solution that can address all the needs for isolated classloading. But I think we still need some improvement here. The use case is simple: if we specify the aux-services classpath, either on the local fs or on hdfs, we load the service from the specified classpath (regardless of whether the classpath is also set in the NM path). Otherwise, we load the service from the NM path. For ApplicationClassLoader, {code} public ApplicationClassLoader(String classpath, ClassLoader parent, List systemClasses) {code} it looks like we have to specify a classpath (we cannot set it to null). It also requires specifying systemClasses, which is not needed for this use case. And there are some unnecessary checks, such as isSystemClass(), when we call loadClass. Overall, I think ApplicationClassLoader is too complicated for this use case. > Enable aux services to have their own custom classpath/jar file > --- > > Key: YARN-4577 > URL: https://issues.apache.org/jira/browse/YARN-4577 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4577.1.patch, YARN-4577.2.patch > > > Right now, users have to add their jars to the NM classpath directly, thus > putting them on the system classloader. But if multiple versions of the plugin > are present on the classpath, there is no control over which version actually > gets loaded. And if there are any conflicts between the dependencies > introduced by the auxiliary service and the NM itself, they can break the NM, > the auxiliary service, or both. > The solution could be to instantiate aux services using a classloader that > is different from the system classloader.
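The loading policy described in the comment above — use an isolated classloader when a custom classpath is configured, otherwise fall back to the NM classpath — could be sketched roughly as follows. This is a hypothetical illustration (the class and method names are not from the patch), and note that a plain URLClassLoader still delegates to its parent first, whereas ApplicationClassLoader implements parent-last behavior outside a system-class whitelist:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class AuxServiceLoader {
  // If serviceJars is non-empty, load the service class through an isolated
  // URLClassLoader built from the configured jars; otherwise fall back to the
  // NM's own classpath (the system classloader).
  static Class<?> loadService(String className, URL[] serviceJars)
      throws ClassNotFoundException {
    if (serviceJars != null && serviceJars.length > 0) {
      URLClassLoader isolated =
          new URLClassLoader(serviceJars, AuxServiceLoader.class.getClassLoader());
      return Class.forName(className, true, isolated);
    }
    return Class.forName(className); // NM classpath
  }
}
```

A real implementation would also need parent-last delegation to get true isolation from the NM's dependency versions, which is exactly the part ApplicationClassLoader already provides.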
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4389: -- Attachment: 0009-YARN-4389.patch Updating the patch with validation for disableThreshold. Also added a few test cases to verify all conditions. [~djp]/[~rohithsharma], please help to review. > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be app specific > rather than a setting for whole YARN cluster > --- > > Key: YARN-4389 > URL: https://issues.apache.org/jira/browse/YARN-4389 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Reporter: Junping Du >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, > 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, > 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch, > 0009-YARN-4389.patch > > > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be application > specific rather than a cluster-level setting; otherwise we shouldn't maintain > amBlacklistingEnabled and blacklistDisableThreshold at the per-rmApp level. We > should allow each AM to override this config, i.e. via the submissionContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098422#comment-15098422 ] Hadoop QA commented on YARN-4371: - | (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 9s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client (total was 15, now 16). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 36s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 0, now 1). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 16s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 29s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 45s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI |
| | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
| | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
| | org.apache.hadoop.yarn.client.ap
[jira] [Created] (YARN-4591) YARN Web UIs should provide a robots.txt
Lars Francke created YARN-4591: -- Summary: YARN Web UIs should provide a robots.txt Key: YARN-4591 URL: https://issues.apache.org/jira/browse/YARN-4591 Project: Hadoop YARN Issue Type: Improvement Reporter: Lars Francke Priority: Trivial To prevent well-behaved crawlers from indexing public YARN UIs. Similar to HDFS-330 / HDFS-9651. I took a quick look at the Webapp stuff in YARN and it looks complicated so I can't provide a quick patch. If anyone can point me in the right direction I might take a look. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
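For reference, a robots.txt that asks all well-behaved crawlers to skip the entire UI (the likely content for such a patch, though the actual file would be up to the implementer) is just:

```
User-agent: *
Disallow: /
```

The YARN-side work is serving this file at the web root, e.g. via the webapp routing, similar to what HDFS-9651 does for the NameNode UI.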
[jira] [Commented] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist
[ https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098360#comment-15098360 ] Hudson commented on YARN-3446: -- FAILURE: Integrated in Hadoop-trunk-Commit #9112 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9112/]) YARN-3446. FairScheduler headroom calculation should exclude nodes in (kasha: rev 9d04f26d4c42170ee3dab2f6fb09a94bbf72fc65) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAppSchedulingInfo.java > FairScheduler headroom calculation should exclude nodes in the blacklist > > > Key: YARN-3446 > URL: https://issues.apache.org/jira/browse/YARN-3446 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3446.000.patch, YARN-3446.001.patch, > YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, > YARN-3446.005.patch > > > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. 
> MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, headRoom is considering blacklisted nodes. This makes jobs > hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResource the AM gets from the RM includes > blacklisted nodes' available resource). > This issue is similar to YARN-1680, which is for the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist
[ https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3446: --- Summary: FairScheduler headroom calculation should exclude nodes in the blacklist (was: FairScheduler headroom calculation should exclude nodes in the blacklist.) > FairScheduler headroom calculation should exclude nodes in the blacklist > > > Key: YARN-3446 > URL: https://issues.apache.org/jira/browse/YARN-3446 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3446.000.patch, YARN-3446.001.patch, > YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, > YARN-3446.005.patch > > > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, headRoom is considering blacklisted nodes. This makes jobs > hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResource the AM gets from the RM includes > blacklisted nodes' available resource). > This issue is similar to YARN-1680, which is for the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist.
[ https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3446: --- Summary: FairScheduler headroom calculation should exclude nodes in the blacklist. (was: FairScheduler HeadRoom calculation should exclude nodes in the blacklist.) > FairScheduler headroom calculation should exclude nodes in the blacklist. > - > > Key: YARN-3446 > URL: https://issues.apache.org/jira/browse/YARN-3446 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3446.000.patch, YARN-3446.001.patch, > YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, > YARN-3446.005.patch > > > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, headRoom is considering blacklisted nodes. This makes jobs > hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResource the AM gets from the RM includes > blacklisted nodes' available resource). > This issue is similar to YARN-1680, which is for the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
[ https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098347#comment-15098347 ] zhihai xu commented on YARN-3446: - The test failures for TestClientRMTokens and TestAMAuthorization are not related to the patch. Both tests pass in my local build. > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > - > > Key: YARN-3446 > URL: https://issues.apache.org/jira/browse/YARN-3446 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-3446.000.patch, YARN-3446.001.patch, > YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, > YARN-3446.005.patch > > > FairScheduler HeadRoom calculation should exclude nodes in the blacklist. > MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, headRoom is considering blacklisted nodes. This makes jobs > hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResource the AM gets from the RM includes > blacklisted nodes' available resource). > This issue is similar to YARN-1680, which is for the Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098341#comment-15098341 ] Eric Payne commented on YARN-4108: -- bq. Do you have any comments/suggestions about the latest proposal? Does it make sense to you? I'm still contemplating it. I will let you know. Thanks. > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-V3.pdf, > YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, > YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch > > > This is a sibling JIRA of YARN-2154. We should make sure container preemption > is more effective. > *Requirements:* > 1) Can handle the case of user-limit preemption > 2) Can handle the case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), > cross-application preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098292#comment-15098292 ] Sunil G commented on YARN-4389: --- Yes [~djp], that's perfectly fine. We could do that, and I will make the necessary changes. > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be app specific > rather than a setting for whole YARN cluster > --- > > Key: YARN-4389 > URL: https://issues.apache.org/jira/browse/YARN-4389 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Reporter: Junping Du >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, > 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, > 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch > > > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be application > specific rather than a cluster-level setting; otherwise we shouldn't maintain > amBlacklistingEnabled and blacklistDisableThreshold at the per-rmApp level. We > should allow each AM to override this config, i.e. via the submissionContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098285#comment-15098285 ] Junping Du commented on YARN-4389: -- Thanks [~sunilg] for updating the patch. The patch looks pretty good now except one nit: shall we check that the disable threshold set by the app is a valid value (not negative, and less than or equal to 1.0f)? If the app's value is not valid, maybe we should log a warning and use the global setting instead. > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be app specific > rather than a setting for whole YARN cluster > --- > > Key: YARN-4389 > URL: https://issues.apache.org/jira/browse/YARN-4389 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Reporter: Junping Du >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, > 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, > 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch > > > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be application > specific rather than a cluster-level setting; otherwise we shouldn't maintain > amBlacklistingEnabled and blacklistDisableThreshold at the per-rmApp level. We > should allow each AM to override this config, i.e. via the submissionContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
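The validation suggested in the comment above could look roughly like this. A hypothetical sketch only — the helper name and fallback behavior are illustrative, not taken from the patch:

```java
public class BlacklistThreshold {
  // Use the app-supplied disable-failure threshold only if it is valid
  // (not negative, and at most 1.0f); otherwise fall back to the
  // cluster-level default. A real implementation would log a warning
  // on the fallback path.
  static float resolveThreshold(float appThreshold, float clusterDefault) {
    if (appThreshold < 0.0f || appThreshold > 1.0f) {
      return clusterDefault; // invalid per-app value
    }
    return appThreshold;
  }
}
```

This keeps an invalid per-app value from silently disabling (or never disabling) AM blacklisting for that application.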
[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4389: -- Attachment: 0008-YARN-4389.patch Thanks [~rohithsharma], this suggestion makes sense to me. With this, we give the application the option to enable/disable this feature regardless of the cluster-level YARN configuration. Uploading a new patch; I have also documented all the cases as comments in the code. > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be app specific > rather than a setting for whole YARN cluster > --- > > Key: YARN-4389 > URL: https://issues.apache.org/jira/browse/YARN-4389 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Reporter: Junping Du >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, > 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, > 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch > > > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be application > specific rather than a cluster-level setting; otherwise we shouldn't maintain > amBlacklistingEnabled and blacklistDisableThreshold at the per-rmApp level. We > should allow each AM to override this config, i.e. via the submissionContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098246#comment-15098246 ] Sunil G commented on YARN-4389: --- [~djp] and [~rohithsharma], could you please help review the patch? > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be app specific > rather than a setting for whole YARN cluster > --- > > Key: YARN-4389 > URL: https://issues.apache.org/jira/browse/YARN-4389 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Reporter: Junping Du >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, > 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, > 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch > > > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be application > specific rather than a cluster-level setting; otherwise we shouldn't maintain > amBlacklistingEnabled and blacklistDisableThreshold at the per-rmApp level. We > should allow each AM to override this config, i.e. via the submissionContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098231#comment-15098231 ] Junping Du commented on YARN-4265: -- The checkstyle complaint: {noformat} ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/LogInfo.java:90: String filename = getFilename();:12: 'filename' hides a field" {noformat} should be taken care of. Also, it sounds like a maven dependency failed? > Provide new timeline plugin storage to support fine-grained entity caching > -- > > Key: YARN-4265 > URL: https://issues.apache.org/jira/browse/YARN-4265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, > YARN-4265-trunk.003.patch, YARN-4265-trunk.004.patch, > YARN-4265-trunk.005.patch, YARN-4265-trunk.006.patch, > YARN-4265-trunk.007.patch, YARN-4265.YARN-4234.001.patch, > YARN-4265.YARN-4234.002.patch > > > To support the newly proposed APIs in YARN-4234, we need to create a new > plugin timeline store. The store may have similar behavior as the > EntityFileTimelineStore proposed in YARN-3942, but cache data at cache-id > granularity, instead of application-id granularity. Let's have this storage > as a standalone one, instead of updating EntityFileTimelineStore, to keep the > existing store (EntityFileTimelineStore) stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
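The "'filename' hides a field" warning above is checkstyle's HiddenField rule: a local variable shadows a class field of the same name, and renaming the local is the usual fix. A hypothetical before/after sketch (not the actual LogInfo code):

```java
public class LogInfoSketch {
  private String filename = "app.log";

  String getFilename() {
    return filename;
  }

  // Before: "String filename = getFilename();" shadowed the field above,
  // which is what checkstyle's HiddenField rule flags at LogInfo.java:90.
  String describe() {
    String localFilename = getFilename(); // renamed local; no hiding
    return "file=" + localFilename;
  }
}
```

The alternative of qualifying every field access with `this.` also silences the rule, but renaming the local is less error-prone.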
[jira] [Updated] (YARN-3102) Decommisioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-3102: -- Attachment: YARN-3102-v5.patch Addressed the findbugs warning. > Decommisioned Nodes not listed in Web UI > > > Key: YARN-3102 > URL: https://issues.apache.org/jira/browse/YARN-3102 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 > Environment: 2 Node Manager and 1 Resource Manager >Reporter: Bibin A Chundatt >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch, > YARN-3102-v3.patch, YARN-3102-v4.patch, YARN-3102-v5.patch > > > Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to the > yarn.exclude file on the RM1 machine > Add the NM1 host name to yarn.exclude > Start the nodes as listed below: NM1, NM2, Resource Manager > Now check the decommissioned nodes in /cluster/nodes > The number of decommissioned nodes is listed as 1, but the table is empty in > /cluster/nodes/decommissioned (details of the decommissioned node are not shown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098219#comment-15098219 ] Sunil G commented on YARN-3215: --- Hi [~Naganarasimha Garla] As discussed offline, I will try to explain the problem more clearly. {{getQueueMaxResource}} uses {{queueCapacities.getAbsoluteMaximumCapacity}} to calculate the maximum possible resource limit per label. In this approach, we take the difference between this limit and the actual resource used for that label (in a queue) to get the headroom. Because we use absolute max capacity, the per-label limits summed across queues can exceed 100% (one possibility is that the same label is used in another queue). For example:
label X is configured in queueA:
- capacity is 30%
- max capacity is 60%
label X is configured in queueB:
- capacity is 70%
- max capacity is 80%
In this case, it is possible that we get more than 100% for label X's resource limit, so the headroom calculated per queue may be more than what is actually available. I think this is not a very major problem, but it is better solved by capping with the total resource limit per label versus its used amount. Thoughts? > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-3215.v1.001.patch > > > In the existing CapacityScheduler, when computing the headroom of an application, it > will only consider "non-labeled" nodes of this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G resource available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928.
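The cap proposed above could be sketched as follows. A hypothetical helper with resources simplified to integer MB — not CapacityScheduler's actual code, just the arithmetic: the queue's absolute-max share of the label is clamped to the label's total cluster resource before subtracting usage, so per-queue headroom can never report more than the label actually has free:

```java
public class LabelHeadroom {
  // Per-queue, per-label headroom. queueAbsMaxForLabel is the queue's
  // absolute-maximum-capacity share of the label (which, summed across
  // queues, may exceed the label total); labelTotalResource caps it.
  static int headroom(int queueAbsMaxForLabel, int labelTotalResource,
                      int usedForLabel) {
    int limit = Math.min(queueAbsMaxForLabel, labelTotalResource);
    return Math.max(0, limit - usedForLabel);
  }
}
```

With the example above (label X total 100, queueA max 60% and queueB max 80% of some larger pool), the uncapped limits could sum past the label's real capacity; the `Math.min` is the proposed fix.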
> This JIRA could involve both API changes (such as adding a > label-to-available-resource map in AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism
[ https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098214#comment-15098214 ] Varun Vasudev commented on YARN-3542: - {quote} This effectively means the old code is not used anymore, and that the new code is stable. And it seems like we are stating that [..], there should be no issue hooking into the new handler using the old configuration mechanism.? Given this and the fact that we are internally overriding to use the new handlers, is there a reason for keeping the old code at all? Also if we are using the new handler code internally anyways, we can proceed with the deprecation (or better deletion) of LCEResourcesHandler interface, DefaultLCEResourcesHandler etc? {quote} I think there are two issues here:
1) should we keep the CgroupsLCEResourcesHandler code, and
2) should we do away with the resource-handler code altogether.
The answer to (1) is that it is not required, but I think it's useful to keep around for some time, for two reasons:
1. As documentation/reference, in case some discrepancy is seen in the new implementation
2. On the off chance that someone has inherited from it. Unfortunately the resource-handler classes are public via config but are not annotated as such.
The answer to (2) is no - the resource handler interface is public and we shouldn't break it. Deprecating it is fine once we provide an alternative (which we currently don't have) or decide we will not support it any longer. Either way, that decision should be a separate JIRA, not this one.
> Re-factor support for CPU as a resource using the new ResourceHandler > mechanism > --- > > Key: YARN-3542 > URL: https://issues.apache.org/jira/browse/YARN-3542 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-3542.001.patch, YARN-3542.002.patch, > YARN-3542.003.patch, YARN-3542.004.patch, YARN-3542.005.patch, > YARN-3542.006.patch, YARN-3542.007.patch > > > In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier > addition of new resource types in the nodemanager (this was used for network > as a resource - See YARN-2140 ). We should refactor the existing CPU > implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using > the new ResourceHandler mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4371: -- Attachment: 0005-YARN-4371.patch Thank you [~jlowe] for the comments. I have updated the patch accordingly. > "yarn application -kill" should take multiple application ids > - > > Key: YARN-4371 > URL: https://issues.apache.org/jira/browse/YARN-4371 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Sunil G > Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, > 0003-YARN-4371.patch, 0004-YARN-4371.patch, 0005-YARN-4371.patch > > > Currently we cannot pass multiple applications to the "yarn application -kill" > command. The command should take multiple application ids at the same time. > Each entry should be separated by whitespace, like: > {code} > yarn application -kill application_1234_0001 application_1234_0007 > application_1234_0012 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
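A minimal sketch of how a CLI might collect multiple whitespace-separated application ids before issuing kills. This is illustrative only (the real change is in the attached patches), and the id format check is a simplified assumption, not YARN's actual ApplicationId parser.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class MultiKillSketch {
    // Gather unique, well-formed application ids from the CLI arguments,
    // rejecting anything that does not look like application_<cluster>_<seq>.
    // Duplicates are dropped so each application is killed at most once.
    public static Set<String> collectAppIds(String... args) {
        Set<String> ids = new LinkedHashSet<>(); // preserves argument order
        for (String arg : args) {
            if (!arg.matches("application_\\d+_\\d+")) {
                throw new IllegalArgumentException("Invalid ApplicationId: " + arg);
            }
            ids.add(arg);
        }
        return ids;
    }
}
```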
[jira] [Commented] (YARN-4590) SLS(Scheduler Load Simulator) web pages can't load css and js resource
[ https://issues.apache.org/jira/browse/YARN-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098168#comment-15098168 ] Bibin A Chundatt commented on YARN-4590: [~xupeng] Can you try from {{hadoop/tools/sls}} as below: bin/slsrun.sh --input-rumen=./sample-data/2jobs2min-rumen-jh.json --output-dir=./sample-data/ For trunk I think we can update the classpath setup as below:
{noformat}
function calculate_classpath {
  hadoop_add_to_classpath_toolspath
  if [[ -n "${HADOOP_PREFIX}" ]]; then
    SLS_HTML_DIR="${HADOOP_PREFIX}/share/hadoop/tools/sls/html"
  else
    this="${BASH_SOURCE-$0}"
    bin=$(cd -P -- "$(dirname -- "${this}")" >/dev/null && pwd -P)
    SLS_HTML_DIR="${bin}/../html"
  fi
  hadoop_debug "Injecting ${SLS_HTML_DIR} into classpath"
  hadoop_add_classpath "${SLS_HTML_DIR}"
  if [[ ! -d html ]]; then
    ln -s "${SLS_HTML_DIR}" html
  fi
}
{noformat}
Any thoughts on this? > SLS(Scheduler Load Simulator) web pages can't load css and js resource > --- > > Key: YARN-4590 > URL: https://issues.apache.org/jira/browse/YARN-4590 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: xupeng >Priority: Minor > > HadoopVersion : 2.6.0 / with patch YARN-4367-branch-2 > 1. run command "./slsrun.sh > --input-rumen=../sample-data/2jobs2min-rumen-jh.json > --output-dir=../sample-data/" > success > 2. open web page "http://10.6.128.88:10001/track"; > cannot load css and js resources -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098085#comment-15098085 ] Sunil G commented on YARN-2009: --- Thanks [~leftnoteasy] for sharing the updates. I will also take a look at YARN-4108. bq. I think how to choose to-be preempted containers should be handled inside PCPP by different policies As a first step, this will be really helpful and will make the code easier to understand. For in-queue preemption, I will work on an approach similar to the one taken in PCPP and will post a doc here. I think it can be another policy, independent of PCPP, but sharing as much common code as possible. > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > > While preempting containers based on the queue's ideal assignment, we may need > to consider preempting the low-priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts
[ https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098076#comment-15098076 ] Bibin A Chundatt commented on YARN-4584: [~rohithsharma] The test case failures are not related to the attached patch > RM startup failure when AM attempts greater than max-attempts > - > > Key: YARN-4584 > URL: https://issues.apache.org/jira/browse/YARN-4584 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4584.patch > > > Configure 3 queues in a cluster with 8 GB: > # queue 40% > # queue 50% > # default 10% > * Submit applications to all 3 queues with container size 1024MB (a sleep job > with 50 containers on all queues) > * The AM that gets assigned to the default queue is preempted immediately; after > about 20 preemptions all applications are killed > Due to the resource limit in the default queue, the AM got preempted about 20 times > On restart, the RM fails to come up > {noformat} > 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: > noteFailure java.lang.NullPointerException > 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED; cause: > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: > Service: RMActiveServices entered state STOPPED > 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.CompositeService: > RMActiveServices: stopping services, size=16 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
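The NPE in {{RMAppAttemptImpl.recover}} suggests the recovery path dereferences attempt state that was never persisted. A generic defensive-recovery sketch of the idea is below; the class and method names are illustrative stand-ins, not the actual fix in the attached patch.

```java
public class RecoverySketch {
    /** Stand-in for persisted attempt data; null models a record that was
     *  never stored (e.g. attempts beyond the max-attempts window). */
    static class AttemptStateData {
        final String diagnostics;
        AttemptStateData(String diagnostics) { this.diagnostics = diagnostics; }
    }

    // Recover defensively: a missing record yields an empty diagnostics
    // string instead of a startup-aborting NullPointerException.
    public static String recoverDiagnostics(AttemptStateData data) {
        if (data == null) {
            return "";
        }
        return data.diagnostics == null ? "" : data.diagnostics;
    }
}
```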
[jira] [Commented] (YARN-4462) FairScheduler: Disallow preemption from a queue
[ https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098072#comment-15098072 ] Hadoop QA commented on YARN-4462: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} Patch generated 3 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 76, now 76). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 14s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 36s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 148m 23s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782254/YARN-4462.004.
[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts
[ https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098067#comment-15098067 ] Hadoop QA commented on YARN-4584: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 241, now 241). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 25s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 139m 29s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782252/0001-YARN-45
[jira] [Updated] (YARN-3568) TestAMRMTokens should use some random port
[ https://issues.apache.org/jira/browse/YARN-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3568: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4478 > TestAMRMTokens should use some random port > -- > > Key: YARN-3568 > URL: https://issues.apache.org/jira/browse/YARN-3568 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Gera Shegalov > > Since the default port is used for yarn.resourcemanager.scheduler.address, if > we already run a pseudo-distributed cluster on the same development machine, > the test fails like this: > {code} > testMasterKeyRollOver[0](org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens) > Time elapsed: 1.511 sec <<< ERROR! > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [0.0.0.0:8030] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:413) > at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:590) > at org.apache.hadoop.ipc.Server.<init>(Server.java:2340) > at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:945) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132) > at > 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:140) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:586) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:996) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1037) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1033) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1033) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1073) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens.testMasterKeyRollOver(TestAMRMTokens.java:235) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
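The usual fix for such bind conflicts is to let the OS pick an ephemeral port by binding to port 0. A minimal, self-contained sketch of the technique (not the Hadoop test code itself):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortSketch {
    // Bind to port 0 so the kernel assigns any free ephemeral port; the
    // socket is then closed and the port number handed to the test, so it
    // cannot collide with a fixed address like 0.0.0.0:8030.
    public static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }
}
```

There is a small race between closing the probe socket and the test reusing the port, which is why binding the server under test directly to port 0 is even safer when the framework allows it.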
[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong
[ https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098042#comment-15098042 ] Hadoop QA commented on YARN-4538: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 34s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 2, now 3). 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 17s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 26s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 147m 22s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | |
[jira] [Commented] (YARN-4565) When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only
[ https://issues.apache.org/jira/browse/YARN-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098034#comment-15098034 ] Naganarasimha G R commented on YARN-4565: - +1, Patch LGTM > When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, > Sometimes lead to situation where all queue resources consumed by AMs only > > > Key: YARN-4565 > URL: https://issues.apache.org/jira/browse/YARN-4565 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0 >Reporter: Karam Singh >Assignee: Wangda Tan > Attachments: YARN-4565.1.patch, YARN-4565.2.patch, YARN-4565.3.patch > > > When sizeBasedWeight is enabled for FairOrderingPolicy in the CapacityScheduler, > it sometimes leads to a situation where all queue resources are consumed by AMs > only. From the user's perspective it appears that all applications in the queue > are stuck, since the whole queue capacity is consumed by AMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4317) Test failure: TestResourceTrackerService
[ https://issues.apache.org/jira/browse/YARN-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4317: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4478 > Test failure: TestResourceTrackerService > - > > Key: YARN-4317 > URL: https://issues.apache.org/jira/browse/YARN-4317 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Tsuyoshi Ozawa > > {quote} > Running > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.438 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService > testReconnectNode(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 0.114 sec <<< FAILURE! > java.lang.AssertionError: expected:<15360> but was:<10240> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testReconnectNode(TestResourceTrackerService.java:624) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4453) TestMiniYarnClusterNodeUtilization occasionally times out in trunk
[ https://issues.apache.org/jira/browse/YARN-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4453: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4478 > TestMiniYarnClusterNodeUtilization occasionally times out in trunk > -- > > Key: YARN-4453 > URL: https://issues.apache.org/jira/browse/YARN-4453 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Sunil G > > TestMiniYarnClusterNodeUtilization failures are observed in a few test runs in > YARN-4293. > Locally, the same test case is also timing out. > {noformat} > java.lang.Exception: test timed out after 6 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:158) > at com.sun.proxy.$Proxy85.nodeHeartbeat(Unknown Source) > at > org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:113) > {noformat} > YARN-3980, where these tests were added, reported a few timed-out cases. I think > this should be investigated, because simply increasing test timeouts when tests > fail does not look good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4462) FairScheduler: Disallow preemption from a queue
[ https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Jie updated YARN-4462: -- Attachment: YARN-4462.004.patch > FairScheduler: Disallow preemption from a queue > --- > > Key: YARN-4462 > URL: https://issues.apache.org/jira/browse/YARN-4462 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie > Attachments: YARN-4462.001.patch, YARN-4462.002.patch, > YARN-4462.003.patch, YARN-4462.004.patch > > > When scheduler preemption is enabled, applications can be preempted if they > hold more resources than they should. > When a MapReduce application has some resources preempted, it just runs slower. > However, when the preempted application is a long-running service, such as Tomcat > running in Slider, the service would fail. > So we should have a flag to let an application indicate to the scheduler that it > should not be preempted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
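A hedged sketch of the general idea behind such a flag: the preemption policy consults a per-queue setting before selecting victims. The flag name below mirrors the proposal but is illustrative, not the actual FairScheduler configuration key or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PreemptionFilterSketch {
    // Given a map of queue name -> "may containers be preempted from this
    // queue?", return only the queues eligible as preemption victims.
    // Long-running services (e.g. Tomcat under Slider) would set false.
    public static List<String> preemptableQueues(Map<String, Boolean> allowPreemptionFrom) {
        List<String> victims = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : allowPreemptionFrom.entrySet()) {
            if (Boolean.TRUE.equals(e.getValue())) {
                victims.add(e.getKey()); // only queues that opted in can lose containers
            }
        }
        return victims;
    }
}
```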