[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607901#comment-16607901 ]

niu commented on YARN-8513:
---------------------------

Thanks [~leftnoteasy] for your effort looking into this problem. In my attached debug log, the setup has two queues: root.dw and root.dev. The capacity settings are dw (capacity: 68, max: 100) and dev (capacity: 32, max: 60). In this case, root is almost fully occupied by dw and only has 256000 resources left for dev. Therefore, each container request (360448) from dev will not be reserved, per the logic in YARN-4280, because used + to-be-reserved exceeds the capacity of root (the parent of dev). That makes sense for the above scenario. However, I still feel there is a problem: when I raise the max capacity of dev from 60 to 100, the problem no longer occurs, even though root also exceeds the limit under that setting. How can that be explained? I will attach the log next Monday.

> CapacityScheduler infinite loop when queue is near fully utilized
> -----------------------------------------------------------------
>
>                 Key: YARN-8513
>                 URL: https://issues.apache.org/jira/browse/YARN-8513
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, yarn
>    Affects Versions: 3.1.0, 2.9.1
>        Environment: Ubuntu 14.04.5 and 16.04.4
>                     YARN is configured with one label and 5 queues.
>            Reporter: Chen Yufei
>            Priority: Major
>         Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log,
>                      jstack-5.log, top-during-lock.log, top-when-normal.log,
>                      yarn3-jstack1.log, yarn3-jstack2.log, yarn3-jstack3.log,
>                      yarn3-jstack4.log, yarn3-jstack5.log,
>                      yarn3-resourcemanager.log, yarn3-top
>
>
> ResourceManager sometimes does not respond to any request when a queue is
> near fully utilized. Sending SIGTERM won't stop the RM; only SIGKILL can.
> After an RM restart, it can recover running jobs and start accepting new
> ones.
>
> It seems CapacityScheduler is in an infinite loop printing out the
> following log messages (more than 25,000 lines per second):
>
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.99816763 absoluteUsedCapacity=0.99816763 used= cluster=}}
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1530619767030_1652_01 container=null queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 clusterResource= type=NODE_LOCAL requestedPartition=}}
>
> I have encountered this problem several times after upgrading to YARN 2.9.1,
> while the same configuration works fine under version 2.7.3.
>
> YARN-4477 is an infinite loop bug in FairScheduler; I am not sure if this is
> a similar problem.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
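[Editor's note] The reservation behaviour discussed in the comment above can be sketched as a simple arithmetic check. This is an illustrative model only: the class and method names below are hypothetical, not the actual CapacityScheduler API, and it assumes (per YARN-4280 as described above) that a reservation is skipped when used plus the would-be reservation exceeds the parent queue's limit.

```java
// Hedged sketch of the YARN-4280 reservation check described above.
// All names here are illustrative, not the real scheduler code.
public class ReservationCheckSketch {
    // Returns true if reserving `request` resources would push the queue
    // past its limit, so the reservation should be skipped.
    static boolean shouldSkipReservation(long used, long request, long limit) {
        return used + request > limit;
    }

    public static void main(String[] args) {
        // Numbers from the comment above: root has only 256000 resources
        // left, so a 360448 request from dev cannot be reserved.
        long limit = 1000000;        // illustrative parent-queue limit
        long used = limit - 256000;  // only 256000 left unallocated
        System.out.println(shouldSkipReservation(used, 360448, limit)); // true
        System.out.println(shouldSkipReservation(used, 256000, limit)); // false
    }
}
```

With this model, a request that exactly fits the remaining headroom is still reservable, which matches why raising dev's max capacity changes the outcome only indirectly (through the computed limit).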
[jira] [Resolved] (YARN-8755) Add clean up for FederationStore apps
[ https://issues.apache.org/jira/browse/YARN-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subru Krishnan resolved YARN-8755.
----------------------------------
    Resolution: Duplicate

[~bibinchundatt], this should be addressed by YARN-6648 & YARN-7599. Your review of the latter will be appreciated. Thanks.

> Add clean up for FederationStore apps
> -------------------------------------
>
>                 Key: YARN-8755
>                 URL: https://issues.apache.org/jira/browse/YARN-8755
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bibin A Chundatt
>            Priority: Major
>
> We should add clean-up logic for the application-to-home-cluster mapping in
> the federation state store.
[jira] [Created] (YARN-8755) Add clean up for FederationStore apps
Bibin A Chundatt created YARN-8755:
--------------------------------------

             Summary: Add clean up for FederationStore apps
                 Key: YARN-8755
                 URL: https://issues.apache.org/jira/browse/YARN-8755
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Bibin A Chundatt

We should add clean-up logic for the application-to-home-cluster mapping in the federation state store.
[jira] [Comment Edited] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607890#comment-16607890 ]

Bibin A Chundatt edited comment on YARN-8699 at 9/8/18 3:56 AM:
---------------------------------------------------------------

Thank you [~giovanni.fumarola] for the review and commit.
{quote}
I found it interesting that GetClusterMetricsRequest can be null
{quote}
Same here; I didn't want to change the behaviour in this jira.

was (Author: bibinchundatt):
Thank you [~giovanni.fumarola] for the review and commit.
{quote}
I found it interesting that GetClusterMetricsRequest can be null
{quote}
Same here; I didn't want to change the behaviour.

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --------------------------------------------------------------
>
>                 Key: YARN-8699
>                 URL: https://issues.apache.org/jira/browse/YARN-8699
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: YARN-8699.001.patch, YARN-8699.002.patch,
>                      YARN-8699.003.patch, YARN-8699.004.patch,
>                      YARN-8699.005.patch
>
>
> Implement the YarnClusterMetrics API in FederationClientInterceptor.
[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607890#comment-16607890 ]

Bibin A Chundatt commented on YARN-8699:
----------------------------------------

Thank you [~giovanni.fumarola] for the review and commit.
{quote}
I found it interesting that GetClusterMetricsRequest can be null
{quote}
Same here; I didn't want to change the behaviour.

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --------------------------------------------------------------
>
>                 Key: YARN-8699
>                 URL: https://issues.apache.org/jira/browse/YARN-8699
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: YARN-8699.001.patch, YARN-8699.002.patch,
>                      YARN-8699.003.patch, YARN-8699.004.patch,
>                      YARN-8699.005.patch
>
>
> Implement the YarnClusterMetrics API in FederationClientInterceptor.
[jira] [Commented] (YARN-8709) intra-queue preemption checker always fail since one under-served queue was deleted
[ https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607880#comment-16607880 ]

Hadoop QA commented on YARN-8709:
---------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 73m 1s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}121m 5s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8709 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938929/YARN-8709.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux fa19f44157f8 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf8a175 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21790/testReport/ |
| Max. process+thread count | 936 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21790/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> intra-queue preemption checker always fail since
[jira] [Commented] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period
[ https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607866#comment-16607866 ]

Chandni Singh commented on YARN-8706:
-------------------------------------

Thanks [~eyang], [~ebadger], and [~shaneku...@gmail.com]

> DelayedProcessKiller is executed for Docker containers even though docker
> stop sends a KILL signal after the specified grace period
> -------------------------------------------------------------------------
>
>                 Key: YARN-8706
>                 URL: https://issues.apache.org/jira/browse/YARN-8706
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: docker
>             Fix For: 3.2.0
>
>         Attachments: YARN-8706.001.patch, YARN-8706.002.patch,
>                      YARN-8706.003.patch, YARN-8706.004.patch
>
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop:
> [https://docs.docker.com/engine/reference/commandline/stop/]
> The docker stop documentation says:
> {quote}the main process inside the container will receive {{SIGTERM}}, and
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By
> default this is set to {{250 milliseconds}}, so irrespective of the
> container type, it will always get executed.
>
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}}
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in executing
>   DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be the
>   smallest value, which is 1 second, because we force a kill after 250 ms
>   anyway
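[Editor's note] The grace-period reasoning in the description above can be sketched as follows. This is a hedged illustration, not the actual NodeManager patch: the class and method names are hypothetical, and the only assumed facts are the ones stated above (docker stop's minimum grace period is 1 second; sleepDelayBeforeSigKill is configured in milliseconds).

```java
// Illustrative sketch of deriving docker stop's grace period from
// sleepDelayBeforeSigKill, as discussed above. Names are hypothetical.
public class DockerStopGraceSketch {
    // Convert the configured kill delay (ms) to a docker stop grace period
    // (s), clamping to docker's smallest supported grace period of 1 second.
    static int gracePeriodSeconds(long sleepDelayBeforeSigKillMs) {
        int seconds = (int) (sleepDelayBeforeSigKillMs / 1000);
        return Math.max(seconds, 1);
    }

    public static void main(String[] args) {
        System.out.println(gracePeriodSeconds(250));   // default 250 ms clamps to 1
        System.out.println(gracePeriodSeconds(15000)); // 15 s passes through as 15
    }
}
```

Once docker stop owns the SIGKILL with this grace period, running DelayedProcessKiller as well would only double-kill the container, which is the redundancy the issue describes.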
[jira] [Commented] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period
[ https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607841#comment-16607841 ]

Hudson commented on YARN-8706:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14907 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14907/])
YARN-8706. Updated docker container stop logic to avoid double kill. (eyang: rev bf8a1750e99cfbfa76021ce51b6514c74c06f498)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerCommandExecutor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerInspectCommand.java

> DelayedProcessKiller is executed for Docker containers even though docker
> stop sends a KILL signal after the specified grace period
> -------------------------------------------------------------------------
>
>                 Key: YARN-8706
>                 URL: https://issues.apache.org/jira/browse/YARN-8706
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: docker
>             Fix For: 3.2.0
>
>         Attachments: YARN-8706.001.patch, YARN-8706.002.patch,
>                      YARN-8706.003.patch, YARN-8706.004.patch
>
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop:
> [https://docs.docker.com/engine/reference/commandline/stop/]
> The docker stop documentation says:
> {quote}the main process inside the container will receive {{SIGTERM}}, and
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By
> default this is set to {{250 milliseconds}}, so irrespective of the
> container type, it will always get executed.
>
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}}
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in executing
>   DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be the
>   smallest value, which is 1 second, because we force a kill after 250 ms
>   anyway
[jira] [Updated] (YARN-8709) intra-queue preemption checker always fail since one under-served queue was deleted
[ https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8709:
---------------------------
    Attachment: YARN-8709.002.patch

> intra-queue preemption checker always fail since one under-served queue was
> deleted
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8709
>                 URL: https://issues.apache.org/jira/browse/YARN-8709
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, scheduler preemption
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-8709.001.patch, YARN-8709.002.patch
>
>
> After some queues are deleted, the preemption checker in SchedulingMonitor is
> always skipped because a YarnRuntimeException is raised on every run.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: Exception raised while executing preemption checker, skip this run..., exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't happen, cannot find TempQueuePerPartition for queueName=1535075839208
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field
> in ProportionalCapacityPreemptionPolicy. Items can be added to
> partitionToUnderServedQueues but are never removed, except by rebuilding the
> policy. For example, once under-served queue "a" is added to this structure,
> it stays there and is never removed; the intra-queue preemption checker tries
> to fetch queue info for every entry of partitionToUnderServedQueues in
> IntraQueueCandidatesSelector#selectCandidates and throws a
> YarnRuntimeException if one is not found. So after queue "a" is deleted from
> the queue structure, the preemption checker always fails.
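[Editor's note] The failure mode described above (a map that only grows) suggests pruning stale entries before the checker consumes them. The sketch below is an illustrative model of that direction only: the class, method, and set names are hypothetical and do not reflect the actual YARN-8709 patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hedged sketch: drop under-served queue names whose queue no longer
// exists, so a deleted queue cannot make the preemption checker throw.
public class UnderServedQueuesSketch {
    // Returns only the under-served queues that still exist in the
    // current queue structure (a pure function for clarity).
    static Set<String> pruneDeleted(Set<String> underServed, Set<String> liveQueues) {
        Set<String> pruned = new LinkedHashSet<>(underServed);
        pruned.retainAll(liveQueues);
        return pruned;
    }

    public static void main(String[] args) {
        // Queue "a" was under-served, then deleted from the queue structure.
        Set<String> underServed = new LinkedHashSet<>(Arrays.asList("a", "b"));
        Set<String> live = new HashSet<>(Collections.singleton("b"));
        System.out.println(pruneDeleted(underServed, live)); // [b]
    }
}
```

An equivalent alternative, also consistent with the description, would be for the selector to skip (rather than throw on) a missing TempQueuePerPartition.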
[jira] [Commented] (YARN-8709) intra-queue preemption checker always fail since one under-served queue was deleted
[ https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607829#comment-16607829 ]

Tao Yang commented on YARN-8709:
--------------------------------

Thanks [~eepayne] for the review! There are several similar problems in TestProportionalCapacityPreemptionPolicyIntraQueue; attached the v2 patch to correct them.

> intra-queue preemption checker always fail since one under-served queue was
> deleted
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8709
>                 URL: https://issues.apache.org/jira/browse/YARN-8709
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, scheduler preemption
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-8709.001.patch, YARN-8709.002.patch
>
>
> After some queues are deleted, the preemption checker in SchedulingMonitor is
> always skipped because a YarnRuntimeException is raised on every run.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: Exception raised while executing preemption checker, skip this run..., exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't happen, cannot find TempQueuePerPartition for queueName=1535075839208
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field
> in ProportionalCapacityPreemptionPolicy. Items can be added to
> partitionToUnderServedQueues but are never removed, except by rebuilding the
> policy. For example, once under-served queue "a" is added to this structure,
> it stays there and is never removed; the intra-queue preemption checker tries
> to fetch queue info for every entry of partitionToUnderServedQueues in
> IntraQueueCandidatesSelector#selectCandidates and throws a
> YarnRuntimeException if one is not found. So after queue "a" is deleted from
> the queue structure, the preemption checker always fails.
[jira] [Updated] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated YARN-8751:
----------------------------
    Target Version/s: 3.2.0, 3.1.2  (was: 3.2.0)
       Fix Version/s: 3.1.2

> Container-executor permission check errors cause the NM to be marked unhealthy
> ------------------------------------------------------------------------------
>
>                 Key: YARN-8751
>                 URL: https://issues.apache.org/jira/browse/YARN-8751
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Shane Kumpf
>            Assignee: Craig Condit
>            Priority: Critical
>              Labels: Docker
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8751.001.patch
>
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception
> occurs based on the exit code returned by container-executor, and 7 different
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
>     if (exitCode == ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
>         exitCode == ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>       throw new ConfigurationException(
>           "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process
> container model. However, with privileged Docker containers this may be too
> harsh, as privileged Docker containers don't guarantee the user's identity
> will be propagated into the container, so these mismatches can occur. Outside
> of privileged containers, an application may inadvertently change the
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" directory permissions to
> 774. Some time later, the process in the container died and the retry policy
> kicked in to RELAUNCH the container. When the RELAUNCH occurred,
> container-executor checked the permissions of the "appcache//" directory
> (the existing workdir is retained for RELAUNCH) and returned exit code 35.
> Exit code 35 is COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error.
> This killed all containers running on that node, when really only this
> container would have been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Container id: container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch container failed
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 4
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Path /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 has permission 774 but needs permission 750.
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
> 2018-08-31 21:07:22,386 ERROR
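[Editor's note] The log above shows the check that fails: the work directory must be exactly 0750, and a directory left at 0774 fails it with exit code 35. The sketch below models only that bit comparison in Java; the class and method names are hypothetical, and the real check lives in the native container-executor, not in this form.

```java
// Hedged model of the work-dir permission check implied by the log above.
// Illustrative only; the real logic is in native container-executor code.
public class WorkDirPermissionSketch {
    // The log reports "has permission 774 but needs permission 750",
    // i.e. an exact-mode comparison, which is what we model here.
    static boolean permissionOk(int actualMode, int requiredMode) {
        return actualMode == requiredMode;
    }

    public static void main(String[] args) {
        System.out.println(permissionOk(0774, 0750)); // false -> exit code 35
        System.out.println(permissionOk(0750, 0750)); // true
    }
}
```

Under this model, a single container loosening its own workdir to 0774 is enough to fail a later RELAUNCH check, which is the over-broad failure the issue argues should not mark the whole NM unhealthy.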
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607817#comment-16607817 ]

Eric Yang commented on YARN-8751:
---------------------------------

[~ccondit-target] cherry-picked to branch-3.1.

> Container-executor permission check errors cause the NM to be marked unhealthy
> ------------------------------------------------------------------------------
>
>                 Key: YARN-8751
>                 URL: https://issues.apache.org/jira/browse/YARN-8751
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Shane Kumpf
>            Assignee: Craig Condit
>            Priority: Critical
>              Labels: Docker
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8751.001.patch
>
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception
> occurs based on the exit code returned by container-executor, and 7 different
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
>     if (exitCode == ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
>         exitCode == ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
>         exitCode == ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>       throw new ConfigurationException(
>           "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process
> container model. However, with privileged Docker containers this may be too
> harsh, as privileged Docker containers don't guarantee the user's identity
> will be propagated into the container, so these mismatches can occur. Outside
> of privileged containers, an application may inadvertently change the
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" directory permissions to
> 774. Some time later, the process in the container died and the retry policy
> kicked in to RELAUNCH the container. When the RELAUNCH occurred,
> container-executor checked the permissions of the "appcache//" directory
> (the existing workdir is retained for RELAUNCH) and returned exit code 35.
> Exit code 35 is COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error.
> This killed all containers running on that node, when really only this
> container would have been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Container id: container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch container failed
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 4
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main
: requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script paths... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating local dirs... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Path > /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 > has permission 774 but needs permission 750. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null) > 2018-08-31 21:07:22,386
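The chained equality checks quoted above can be expressed more compactly with an {{EnumSet}}, which also makes it easy to shrink the set of node-fatal codes — the direction this issue's fix takes. The sketch below is illustrative only: the {{ExitCode}} values other than 35 ({{COULD_NOT_CREATE_WORK_DIRECTORIES}}, given in the report) are hypothetical placeholders, and {{marksNodeUnhealthy}} is an assumed helper, not the actual {{LinuxContainerExecutor}} code.

```java
import java.util.EnumSet;

public class ExitCodePolicy {

    // Hypothetical stand-in for the container-executor exit codes; only 35
    // (COULD_NOT_CREATE_WORK_DIRECTORIES) is taken from the report above,
    // the other numeric values are placeholders.
    enum ExitCode {
        INVALID_CONTAINER_EXEC_PERMISSIONS(22),
        INVALID_CONFIG_FILE(24),
        COULD_NOT_CREATE_SCRIPT_COPY(31),
        COULD_NOT_CREATE_CREDENTIALS_FILE(32),
        COULD_NOT_CREATE_WORK_DIRECTORIES(35),
        COULD_NOT_CREATE_APP_LOG_DIRECTORIES(36),
        COULD_NOT_CREATE_TMP_DIRECTORIES(37);

        private final int code;
        ExitCode(int code) { this.code = code; }
        int getExitCode() { return code; }

        static ExitCode fromCode(int code) {
            for (ExitCode e : values()) {
                if (e.code == code) { return e; }
            }
            return null;
        }
    }

    // Exit codes that indicate a misconfigured node rather than a bad
    // container. Directory-creation failures are excluded here: as the
    // report shows, on RELAUNCH they can be caused by a single container
    // having changed its own appcache permissions.
    private static final EnumSet<ExitCode> NODE_FATAL = EnumSet.of(
        ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS,
        ExitCode.INVALID_CONFIG_FILE,
        ExitCode.COULD_NOT_CREATE_SCRIPT_COPY,
        ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE);

    static boolean marksNodeUnhealthy(int exitCode) {
        ExitCode e = ExitCode.fromCode(exitCode);
        return e != null && NODE_FATAL.contains(e);
    }

    public static void main(String[] args) {
        // Exit code 35 from the relaunch in the report would fail only that
        // container under this policy, not mark the whole NM UNHEALTHY.
        System.out.println(marksNodeUnhealthy(35));  // false
        System.out.println(marksNodeUnhealthy(24));  // true
    }
}
```

Under a policy like this, the relaunch failure in the log above would surface as a container-level error while the NodeManager keeps serving its other containers.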
[jira] [Commented] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period
[ https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607809#comment-16607809 ] Eric Yang commented on YARN-8706: - +1 for patch 004. I will commit shortly. > DelayedProcessKiller is executed for Docker containers even though docker > stop sends a KILL signal after the specified grace period > --- > > Key: YARN-8706 > URL: https://issues.apache.org/jira/browse/YARN-8706 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Labels: docker > Attachments: YARN-8706.001.patch, YARN-8706.002.patch, > YARN-8706.003.patch, YARN-8706.004.patch > > > {{DockerStopCommand}} adds a grace period of 10 seconds. > 10 seconds is also the default grace time used by docker stop > [https://docs.docker.com/engine/reference/commandline/stop/] > Documentation of docker stop: > {quote}the main process inside the container will receive {{SIGTERM}}, and > after a grace period, {{SIGKILL}}. > {quote} > There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes > for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By > default this is set to {{250 milliseconds}} and so irrespective of the > container type, it will always get executed. > > For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} > after the grace period > - when sleepDelayBeforeSigKill > 10 seconds, there is no point in > executing DelayedProcessKiller > - when sleepDelayBeforeSigKill < 1 second, the grace period should be > the smallest value, which is 1 second, because we are forcing a kill > after 250 ms anyway > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
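The two boundary cases described in the issue reduce to a small clamping rule. The sketch below is a hypothetical helper (class and method names assumed; this is not the actual patch): it derives the {{docker stop}} grace period from {{sleepDelayBeforeSigKill}}, clamping up to Docker's 1-second minimum, and flags when the separate {{DelayedProcessKiller}} would be redundant because docker itself will send the {{SIGKILL}}.

```java
public class DockerStopGrace {

    // Docker's minimum grace period is 1 second; 10 seconds is the
    // `docker stop` default mentioned in the issue.
    private static final int MIN_GRACE_SEC = 1;
    private static final int DEFAULT_GRACE_SEC = 10;

    // Hypothetical helper: derive the grace period (seconds) to pass to
    // `docker stop` from YARN's sleepDelayBeforeSigKill (milliseconds),
    // never going below Docker's minimum.
    static int graceSeconds(long sleepDelayBeforeSigKillMs) {
        long sec = sleepDelayBeforeSigKillMs / 1000;
        return (int) Math.max(MIN_GRACE_SEC, sec);
    }

    // With `docker stop` sending SIGKILL itself after the grace period,
    // a separate DelayedProcessKiller adds nothing once the configured
    // delay reaches the default grace.
    static boolean delayedKillerRedundant(long sleepDelayBeforeSigKillMs) {
        return sleepDelayBeforeSigKillMs / 1000 >= DEFAULT_GRACE_SEC;
    }

    public static void main(String[] args) {
        System.out.println(graceSeconds(250));    // 1: the 250 ms default clamps up
        System.out.println(graceSeconds(15000));  // 15
    }
}
```

With the default 250 ms delay this yields the 1-second floor the issue asks for, and a 15-second delay would make the extra kill pass pointless, matching the first bullet.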
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607808#comment-16607808 ] Hudson commented on YARN-8751: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14905 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14905/]) YARN-8751. Reduce conditions that mark node manager as unhealthy. (eyang: rev 7d623343879ce9a8f8e64601024d018efc02794c) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607802#comment-16607802 ] Hadoop QA commented on YARN-8569: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 50s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 43 new + 149 unchanged - 1 fixed = 192 total (was 150) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 57s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 31s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 0s{color} | {color:red} hadoop-yarn-services-core in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 33s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607797#comment-16607797 ] Craig Condit commented on YARN-8751: [~eyang], [~shaneku...@gmail.com]: Do we want to commit this to branch-3.1 as well?
[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page
[ https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8754: - Description: Component instance page has "node" and "host". These two fields are representing "bare_host" and "hostname" respectively. >From UI2 page thats not clear. Thus, table content need to be changed to "bare >host" from "node" . This page also has "Host URL" which is hard coded to N/A. Thus, removing this field from table. was: Component instance page has "node" and "host". These two fields are representing "bare_host" and "hostname" accordingly. >From UI2 page thats not clear. Thus, table content need to be changed to "bare >host" from "node" . This page also has "Host URL" which is hard coded to N/A. Thus, removing this field from table. > [UI2] Improve terms on Component Instance page > --- > > Key: YARN-8754 > URL: https://issues.apache.org/jira/browse/YARN-8754 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-07 at 4.12.54 PM.png, Screen Shot > 2018-09-07 at 4.30.11 PM.png, YARN-8754.001.patch > > > Component instance page has "node" and "host". These two fields are > representing "bare_host" and "hostname" respectively. > From UI2 page thats not clear. Thus, table content need to be changed to > "bare host" from "node" . > This page also has "Host URL" which is hard coded to N/A. Thus, removing this > field from table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607783#comment-16607783 ] Hadoop QA commented on YARN-8045: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 32s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8045 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938908/YARN-8045.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8154fa364d3d 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 335a813 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21789/testReport/ | | Max. process+thread count | 414 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21789/console | | Powered by | Apache Yetus 0.8.0
[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page
[ https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8754: - Attachment: Screen Shot 2018-09-07 at 4.30.11 PM.png
[jira] [Commented] (YARN-8754) [UI2] Improve terms on Component Instance page
[ https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1660#comment-1660 ] Yesha Vora commented on YARN-8754: -- Find the screenshot of the component instance page after fixing terms. !Screen Shot 2018-09-07 at 4.30.11 PM.png!
[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page
[ https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8754: - Attachment: YARN-8754.001.patch
[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page
[ https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8754: - Attachment: Screen Shot 2018-09-07 at 4.12.54 PM.png
[jira] [Commented] (YARN-8748) Javadoc warnings within the nodemanager package
[ https://issues.apache.org/jira/browse/YARN-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607769#comment-16607769 ] Hadoop QA commented on YARN-8748: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 5 new + 9 unchanged - 0 fixed = 14 total (was 9) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 0 new + 0 unchanged - 10 fixed = 0 total (was 10) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 0s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 69m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8748 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938907/YARN-8748.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 26a04272dbae 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 335a813 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21788/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test
[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page
[ https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8754: - Affects Version/s: 3.1.1 > [UI2] Improve terms on Component Instance page > --- > > Key: YARN-8754 > URL: https://issues.apache.org/jira/browse/YARN-8754 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > > Component instance page has "node" and "host". These two fields are > representing "bare_host" and "hostname" accordingly. > From UI2 page thats not clear. Thus, table content need to be changed to > "bare host" from "node" . > This page also has "Host URL" which is hard coded to N/A. Thus, removing this > field from table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8754) [UI2] Improve terms on Component Instance page
[ https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora reassigned YARN-8754: Assignee: Yesha Vora > [UI2] Improve terms on Component Instance page > --- > > Key: YARN-8754 > URL: https://issues.apache.org/jira/browse/YARN-8754 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > > Component instance page has "node" and "host". These two fields are > representing "bare_host" and "hostname" accordingly. > From UI2 page thats not clear. Thus, table content need to be changed to > "bare host" from "node" . > This page also has "Host URL" which is hard coded to N/A. Thus, removing this > field from table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8754) [UI2] Improve terms on Component Instance page
Yesha Vora created YARN-8754: Summary: [UI2] Improve terms on Component Instance page Key: YARN-8754 URL: https://issues.apache.org/jira/browse/YARN-8754 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora The Component Instance page has "node" and "host". These two fields represent "bare_host" and "hostname", respectively. From the UI2 page that's not clear. Thus, the table content needs to be changed from "node" to "bare host". This page also has "Host URL", which is hard-coded to N/A. Thus, this field is being removed from the table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607758#comment-16607758 ] Botong Huang commented on YARN-8658: I did a quick pass and found a few small issues. Please fix the yetus complaints as well. AMRMClientRelayerMetrics: Metrics for AMRMProxy Internals. -> Metrics for FederationInterceptor (or AMRMClientRelayer?) Internals. Remove everything about "E2E"; perhaps per-sub-cluster data is good enough here? UnmanagedApplicationManager: retain the empty line at line 169. > Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, YARN-8658.06.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607751#comment-16607751 ] Hadoop QA commented on YARN-8658: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 29s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 11s{color} | {color:red} hadoop-yarn-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 11s{color} | {color:red} hadoop-yarn-server in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 59s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 38s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 4m 15s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 24s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 20s{color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 59s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 56m 25s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.uam.TestUnmanagedApplicationManager | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8658 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938903/YARN-8658.06.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 557a8a03ac58 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Assigned] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit reassigned YARN-8045: -- Assignee: Craig Condit > Reduce log output from container status calls > - > > Key: YARN-8045 > URL: https://issues.apache.org/jira/browse/YARN-8045 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shane Kumpf >Assignee: Craig Condit >Priority: Major > > Each time a container's status is returned a log entry is produced in the NM > from {{ContainerManagerImpl}}. The container status includes the diagnostics > field for the container. If the diagnostics field contains an exception, it > can appear as if the exception is logged repeatedly every second. The > diagnostics message can also span many lines, which puts pressure on the logs > and makes it harder to read. > For example: > {code} > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_e01_1521323860653_0001_01_05 > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: > RUNNING, Capability: , Diagnostics: [2018-03-17 > 22:01:00.675]Exception from container-launch. > Container id: container_e01_1521323860653_0001_01_05 > Exit code: -1 > Exception message: > Shell ouput: > [2018-03-17 22:01:00.750]Diagnostic message from attempt : > [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1. > , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
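As a purely hypothetical illustration of one way the log pressure described in YARN-8045 could be reduced (this sketch is not taken from any posted patch; the class and method names are invented for the example): remember a hash of the diagnostics last logged per container, and skip the verbose log line when nothing has changed since the previous status poll.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: decides whether a container's diagnostics are
// worth logging again, so an unchanged exception message polled every
// second produces one log entry instead of thousands.
class StatusLogThrottler {
    private final Map<String, Integer> lastLogged = new ConcurrentHashMap<>();

    /** Returns true only when the diagnostics differ from what was last logged. */
    public boolean shouldLog(String containerId, String diagnostics) {
        int hash = diagnostics == null ? 0 : diagnostics.hashCode();
        Integer prev = lastLogged.put(containerId, hash);
        return prev == null || prev != hash;
    }
}
```

A caller in the NM status path would consult shouldLog before emitting the full ContainerStatus line, falling back to a short one-line summary otherwise.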
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607714#comment-16607714 ] Eric Yang commented on YARN-8569: - [~leftnoteasy] Patch 6 addressed the following: - Use localizer to distribute initial copy of service.json in a tarball. - Mount expanded sysfs.tar to container. - Added logic to replace sysfs.tar local copy with the latest status after service reaches stable state. - Added test cases in Java and C level - On-off feature switch in container-executor.cfg to disable the feature. I didn't make sysfs REST API a generic mechanism for tarball replacer for distributed cache to prevent people from abusing this API for unintended purpose. If you still like that generic feature, please open a separate JIRA for that. > Create an interface to provide cluster information to application > - > > Key: YARN-8569 > URL: https://issues.apache.org/jira/browse/YARN-8569 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8569 YARN sysfs interface to provide cluster > information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, > YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, > YARN-8569.006.patch > > > Some program requires container hostnames to be known for application to run. 
> For example, distributed tensorflow requires launch_command that looks like: > {code} > # On ps0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=0 > # On ps1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=1 > # On worker0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=0 > # On worker1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=1 > {code} > This is a bit cumbersome to orchestrate via Distributed Shell, or YARN > services launch_command. In addition, the dynamic parameters do not work > with YARN flex command. This is the classic pain point for application > developer attempt to automate system environment settings as parameter to end > user application. > It would be great if YARN Docker integration can provide a simple option to > expose hostnames of the yarn service via a mounted file. The file content > gets updated when flex command is performed. This allows application > developer to consume system environment settings via a standard interface. > It is like /proc/devices for Linux, but for Hadoop. This may involve > updating a file in distributed cache, and allow mounting of the file via > container-executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
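The mounted-file idea above is easy to picture with a short sketch of the application side. Everything below is an assumption for illustration only — the mount path, the "hostname" JSON field, and the class name are not part of any interface this JIRA defines:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative consumer of a YARN-mounted cluster-info file. A real
// application would re-read the file after a flex command updates it.
class ClusterInfoReader {
    // Matches "hostname": "value" pairs in the assumed JSON layout,
    // avoiding a JSON library dependency for this sketch.
    private static final Pattern HOSTNAME =
        Pattern.compile("\"hostname\"\\s*:\\s*\"([^\"]+)\"");

    static List<String> readHostnames(Path infoFile) throws IOException {
        String json = new String(Files.readAllBytes(infoFile));
        List<String> hosts = new ArrayList<>();
        Matcher m = HOSTNAME.matcher(json);
        while (m.find()) {
            hosts.add(m.group(1));
        }
        return hosts;
    }
}
```

With something like this, the trainer.py-style launch_command could be assembled inside the container from the file contents instead of being baked into the service spec.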
[jira] [Assigned] (YARN-8748) Javadoc warnings within the nodemanager package
[ https://issues.apache.org/jira/browse/YARN-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit reassigned YARN-8748: -- Assignee: Craig Condit > Javadoc warnings within the nodemanager package > --- > > Key: YARN-8748 > URL: https://issues.apache.org/jira/browse/YARN-8748 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Shane Kumpf >Assignee: Craig Condit >Priority: Trivial > > There are a number of javadoc warnings in trunk in classes under the > nodemanager package. These should be addressed or suppressed. > {code:java} > [WARNING] Javadoc Warnings > [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java:93: > warning - Tag @see: reference not found: > ContainerLaunch.ShellScriptBuilder#listDebugInformation > [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118: > warning - YarnConfiguration#YARN_CONTAINER_SANDBOX (referenced by @value > tag) is an unknown reference. > [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118: > warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_FILE_PERMISSIONS > (referenced by @value tag) is an unknown reference. > [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118: > warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY (referenced by > @value tag) is an unknown reference. 
> [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118: > warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP > (referenced by @value tag) is an unknown reference. > [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118: > warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY_GROUP_PREFIX > (referenced by @value tag) is an unknown reference. > [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211: > warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP > (referenced by @value tag) is an unknown reference. > [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211: > warning - NMContainerPolicyUtils#SECURITY_FLAG (referenced by @value tag) is > an unknown reference. > [WARNING] > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java:248: > warning - @return tag has no arguments. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8569: Attachment: YARN-8569.006.patch > Create an interface to provide cluster information to application > - > > Key: YARN-8569 > URL: https://issues.apache.org/jira/browse/YARN-8569 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8569 YARN sysfs interface to provide cluster > information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, > YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, > YARN-8569.006.patch > > > Some program requires container hostnames to be known for application to run. > For example, distributed tensorflow requires launch_command that looks like: > {code} > # On ps0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=0 > # On ps1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=1 > # On worker0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=0 > # On worker1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=1 > {code} > This is a bit cumbersome to orchestrate via Distributed Shell, or YARN > services launch_command. In addition, the dynamic parameters do not work > with YARN flex command. This is the classic pain point for application > developer attempt to automate system environment settings as parameter to end > user application. 
> It would be great if YARN Docker integration can provide a simple option to > expose hostnames of the yarn service via a mounted file. The file content > gets updated when flex command is performed. This allows application > developer to consume system environment settings via a standard interface. > It is like /proc/devices for Linux, but for Hadoop. This may involve > updating a file in distributed cache, and allow mounting of the file via > container-executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8500) Use hbase shaded jars
[ https://issues.apache.org/jira/browse/YARN-8500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607702#comment-16607702 ] Vrushali C commented on YARN-8500: -- Came across HBASE-15666. This explains the failures with the Mini Cluster that I am seeing. > Use hbase shaded jars > - > > Key: YARN-8500 > URL: https://issues.apache.org/jira/browse/YARN-8500 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Vrushali C >Priority: Major > Attachments: YARN-8500.0001.patch > > > Move to using hbase shaded jars in atsv2 > Related jira YARN-7213 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Young Chen updated YARN-8658: - Attachment: YARN-8658.06.patch > Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, YARN-8658.06.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
[ https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607650#comment-16607650 ] Yesha Vora commented on YARN-8753: -- Find the screenshot of Nodemanager chart after adding LOST. !Screen Shot 2018-09-07 at 11.59.02 AM.png! > [UI2] Lost nodes representation missing from Nodemanagers Chart > --- > > Key: YARN-8753 > URL: https://issues.apache.org/jira/browse/YARN-8753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot > 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, > YARN-8753.001.patch > > > Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status > page. > This chart does not show nodemanagers if they are LOST. > Due to this issue, Node information page and Node status page shows different > node managers count. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8735) Remove @value javadoc annotation from YARN projects
[ https://issues.apache.org/jira/browse/YARN-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang resolved YARN-8735. - Resolution: Duplicate > Remove @value javadoc annotation from YARN projects > --- > > Key: YARN-8735 > URL: https://issues.apache.org/jira/browse/YARN-8735 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Yang >Priority: Major > > Maven javadoc plugin doesn't support @value annotation, even though IntelliJ > works. There are only ~12 instances that need to be removed. It is probably > better to remove them before this snowball into a problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8735) Remove @value javadoc annotation from YARN projects
[ https://issues.apache.org/jira/browse/YARN-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607652#comment-16607652 ] Eric Yang commented on YARN-8735: - Same issues, and YARN-8748 covers a bit more. Marking this one as a duplicate. > Remove @value javadoc annotation from YARN projects > --- > > Key: YARN-8735 > URL: https://issues.apache.org/jira/browse/YARN-8735 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Yang >Priority: Major > > Maven javadoc plugin doesn't support @value annotation, even though IntelliJ > works. There are only ~12 instances that need to be removed. It is probably > better to remove them before this snowball into a problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
[ https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8753: - Attachment: Screen Shot 2018-09-07 at 11.59.02 AM.png > [UI2] Lost nodes representation missing from Nodemanagers Chart > --- > > Key: YARN-8753 > URL: https://issues.apache.org/jira/browse/YARN-8753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot > 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, > YARN-8753.001.patch > > > Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status > page. > This chart does not show nodemanagers if they are LOST. > Due to this issue, Node information page and Node status page shows different > node managers count. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
[ https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8753: - Attachment: YARN-8753.001.patch > [UI2] Lost nodes representation missing from Nodemanagers Chart > --- > > Key: YARN-8753 > URL: https://issues.apache.org/jira/browse/YARN-8753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot > 2018-09-06 at 6.16.14 PM.png, YARN-8753.001.patch > > > Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status > page. > This chart does not show nodemanagers if they are LOST. > Due to this issue, Node information page and Node status page shows different > node managers count. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-4961) Wrapper for leveldb DB to aid in handling database exceptions
[ https://issues.apache.org/jira/browse/YARN-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-4961: Assignee: Pradeep Ambati Yes, exactly. Thanks for picking this up! Currently it's very fragile to wield the database directly because instead of throwing checked IOExceptions when I/O errors occur it throws a runtime DBException. Having a wrapper class that provides the same methods but throwing checked IOExceptions instead of unchecked runtime exceptions would make it safer to use as a state store backend in Hadoop where we don't necessarily want to tear down the entire server when an I/O error occurs. > Wrapper for leveldb DB to aid in handling database exceptions > - > > Key: YARN-4961 > URL: https://issues.apache.org/jira/browse/YARN-4961 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jason Lowe >Assignee: Pradeep Ambati >Priority: Major > > It would be nice to have a utility wrapper around leveldb's DB to translate > the raw runtime DBExceptions into IOExceptions. This would help make the > code using leveldb easier to read and less error-prone to allowing the > runtime DBExceptions to escape and potentially terminate the calling process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
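The translation Jason describes can be sketched as follows. This is a self-contained illustration only: DB, DBException, and CheckedDB here are stand-ins defined for the sketch, not the real leveldb (org.iq80.leveldb) or Hadoop classes, and the real wrapper would cover the full DB surface (put, delete, iterators, etc.), not just get.

```java
import java.io.IOException;

// Stand-in for leveldb's unchecked exception type (the real one is
// org.iq80.leveldb.DBException); defined here to keep the sketch
// self-contained.
class DBException extends RuntimeException {
    DBException(String msg) { super(msg); }
}

// Minimal stand-in for the leveldb DB interface (get only).
interface DB {
    byte[] get(byte[] key);
}

// The wrapper idea: expose the same operations, but translate unchecked
// DBExceptions into checked IOExceptions so callers are forced to handle
// I/O failures instead of letting a RuntimeException tear down the server.
class CheckedDB {
    private final DB db;

    CheckedDB(DB db) { this.db = db; }

    public byte[] get(byte[] key) throws IOException {
        try {
            return db.get(key);
        } catch (DBException e) {
            throw new IOException("leveldb get failed", e);
        }
    }
}

public class CheckedDBDemo {
    public static void main(String[] args) {
        // Simulate a database whose read hits an I/O error.
        DB failing = key -> { throw new DBException("disk error"); };
        CheckedDB wrapped = new CheckedDB(failing);
        try {
            wrapped.get(new byte[]{1});
        } catch (IOException e) {
            // A state store can catch this and degrade gracefully.
            System.out.println("caught IOException: " + e.getMessage());
        }
    }
}
```

A state store built on the wrapper then handles IOException like any other storage backend, which is exactly the safety property the JIRA asks for.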
[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
[ https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8753: - Description: Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status page. This chart does not show nodemanagers if they are LOST. Due to this issue, Node information page and Node status page shows different node managers count. was: Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status page. This chart does not show nodemanagers if they are LOST. > [UI2] Lost nodes representation missing from Nodemanagers Chart > --- > > Key: YARN-8753 > URL: https://issues.apache.org/jira/browse/YARN-8753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot > 2018-09-06 at 6.16.14 PM.png > > > Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status > page. > This chart does not show nodemanagers if they are LOST. > Due to this issue, Node information page and Node status page shows different > node managers count. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
[ https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8753: - Attachment: Screen Shot 2018-09-06 at 6.16.02 PM.png > [UI2] Lost nodes representation missing from Nodemanagers Chart > --- > > Key: YARN-8753 > URL: https://issues.apache.org/jira/browse/YARN-8753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot > 2018-09-06 at 6.16.14 PM.png > > > Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status > page. > This chart does not show nodemanagers if they are LOST. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
[ https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8753: - Attachment: Screen Shot 2018-09-06 at 6.16.14 PM.png > [UI2] Lost nodes representation missing from Nodemanagers Chart > --- > > Key: YARN-8753 > URL: https://issues.apache.org/jira/browse/YARN-8753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot > 2018-09-06 at 6.16.14 PM.png > > > Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status > page. > This chart does not show nodemanagers if they are LOST. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
[ https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora reassigned YARN-8753: Assignee: Yesha Vora > [UI2] Lost nodes representation missing from Nodemanagers Chart > --- > > Key: YARN-8753 > URL: https://issues.apache.org/jira/browse/YARN-8753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > > Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status > page. > This chart does not show nodemanagers if they are LOST. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
Yesha Vora created YARN-8753: Summary: [UI2] Lost nodes representation missing from Nodemanagers Chart Key: YARN-8753 URL: https://issues.apache.org/jira/browse/YARN-8753 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.1 Reporter: Yesha Vora Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status page. This chart does not show nodemanagers if they are LOST. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4961) Wrapper for leveldb DB to aid in handling database exceptions
[ https://issues.apache.org/jira/browse/YARN-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607520#comment-16607520 ] Pradeep Ambati commented on YARN-4961: -- [~jlowe], I want to work on this JIRA. From what I understood, there should be a utility wrapper around DB which throws IOExceptions (translated from DBExceptions) instead of DBExceptions, am I right? > Wrapper for leveldb DB to aid in handling database exceptions > - > > Key: YARN-4961 > URL: https://issues.apache.org/jira/browse/YARN-4961 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jason Lowe >Priority: Major > > It would be nice to have a utility wrapper around leveldb's DB to translate > the raw runtime DBExceptions into IOExceptions. This would help make the > code using leveldb easier to read and less error-prone to allowing the > runtime DBExceptions to escape and potentially terminate the calling process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607516#comment-16607516 ] Hudson commented on YARN-8699: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14899 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14899/]) YARN-8699. Add Yarnclient#yarnclusterMetrics API implementation in (gifuma: rev 3dc2988a3779590409cbe7062046e3fee68f8d22) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/MockResourceManagerFacade.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/RouterYarnClientUtils.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/test/java/org/apache/hadoop/yarn/server/router/clientrm/TestRouterYarnClientUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/test/java/org/apache/hadoop/yarn/server/router/clientrm/TestFederationClientInterceptor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/ClientMethod.java > Add Yarnclient#yarnclusterMetrics API implementation in router > -- > > Key: YARN-8699 > URL: https://issues.apache.org/jira/browse/YARN-8699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Fix For: 3.2.0 > > Attachments: 
YARN-8699.001.patch, YARN-8699.002.patch, > YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch > > > Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607495#comment-16607495 ] Wangda Tan commented on YARN-8513: -- I spent a good amount of time checking this issue. I found the scheduler tries to reserve containers on two nodes. What happens is: 1) For the root queue, total resource = 1351680, used resource = 1095680, available resource = 256000. 2) The app that gets resources is running under the dev queue, maximum resource = 8811008, used resource = 7168. 3) The app always gets a container reserved with size = 360448, which is beyond the parent queue's available resource, so the request is rejected by the resource committer. In my mind this is expected behavior, even though the propose/reject cycle is not strictly necessary. It is in line with YARN-4280, where we want an under-utilized queue to still get resources when its resource request is large. Let me use an example to explain: the scheduler has two queues, a and b; the capacity of each queue is 0.5, and their max capacities are a = 1.0 and b = 0.8. Assume cluster resource = 100. There's an app running in a which uses 75 resources, so a's absolute used capacity = 0.75, and there are still many pending resource requests from a, each of size 1. A user then submits an app to b asking for a single container of size 30. The scheduler cannot allocate that container because the cluster's total available = 25. If we gave those 25 resources to queue a, queue b would never get them, because smaller resource requests are always preferred. Instead, the logic in YARN-4280 is: if queue b cannot get resources because of the parent queue's resource limit, the scheduler holds the resource rather than giving it to other queues. So you can see that 25 resources are available, yet no one can get them. The problem only occurs in a very busy cluster with few nodes. Turning on preemption can alleviate the issue a lot. 
I prefer to close this as "no fix needed". Thoughts? > CapacityScheduler infinite loop when queue is near fully utilized > - > > Key: YARN-8513 > URL: https://issues.apache.org/jira/browse/YARN-8513 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 3.1.0, 2.9.1 > Environment: Ubuntu 14.04.5 and 16.04.4 > YARN is configured with one label and 5 queues. >Reporter: Chen Yufei >Priority: Major > Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, > jstack-5.log, top-during-lock.log, top-when-normal.log, yarn3-jstack1.log, > yarn3-jstack2.log, yarn3-jstack3.log, yarn3-jstack4.log, yarn3-jstack5.log, > yarn3-resourcemanager.log, yarn3-top > > > ResourceManager does not respond to any request when queue is near fully > utilized sometimes. Sending SIGTERM won't stop RM, only SIGKILL can. After RM > restart, it can recover running jobs and start accepting new ones. > > Seems like CapacityScheduler is in an infinite loop printing out the > following log messages (more than 25,000 lines in a second): > > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > assignedContainer queue=root usedCapacity=0.99816763 > absoluteUsedCapacity=0.99816763 used= > cluster=}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application attempt=appattempt_1530619767030_1652_01 > container=null > queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 > clusterResource= type=NODE_LOCAL > requestedPartition=}} > > I encounter this problem several times after upgrading to YARN 2.9.1, while > the same configuration works fine under version 2.7.3. 
> > YARN-4477 is an infinite loop bug in FairScheduler, not sure if this is a > similar problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
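The a/b example above boils down to a simple headroom check against the parent queue. A minimal sketch of that logic, using the toy numbers from the comment (fitsUnderParent is a hypothetical helper for illustration, not the actual CapacityScheduler code):

```java
// Toy model of the YARN-4280 behavior described above: a request is
// rejected when parent used + request exceeds the parent's total, and the
// scheduler holds the leftover resource instead of handing it to other
// queues' smaller requests.
public class HeadroomDemo {
    // True only if the parent queue can cover the request.
    static boolean fitsUnderParent(int parentTotal, int parentUsed, int request) {
        return parentUsed + request <= parentTotal;
    }

    public static void main(String[] args) {
        int cluster = 100;       // parent (root) total
        int usedByA = 75;        // queue a's usage (absolute used capacity 0.75)
        int requestFromB = 30;   // single large container asked by queue b

        // b's request is rejected: 75 + 30 > 100. The scheduler keeps the
        // remaining 25 unallocated rather than feeding it to a's size-1
        // requests, so b can eventually be satisfied.
        System.out.println(fitsUnderParent(cluster, usedByA, requestFromB)); // false

        // A size-1 request from a WOULD fit, which is exactly why, without
        // holding, a's small asks would starve b forever.
        System.out.println(fitsUnderParent(cluster, usedByA, 1)); // true
    }
}
```

With preemption enabled, containers from queue a can instead be reclaimed so that b's large request fits, which is why turning on preemption alleviates the busy-cluster case.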
[jira] [Updated] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8699: --- Fix Version/s: 3.2.0 > Add Yarnclient#yarnclusterMetrics API implementation in router > -- > > Key: YARN-8699 > URL: https://issues.apache.org/jira/browse/YARN-8699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8699.001.patch, YARN-8699.002.patch, > YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch > > > Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607480#comment-16607480 ] Giovanni Matteo Fumarola commented on YARN-8699: [^YARN-8699.005.patch] looks good. Committing to trunk. Thanks [~bibinchundatt] for the patch. However, I found it interesting that GetClusterMetricsRequest can be null and still return proper results; the RM accepts null requests for this call. > Add Yarnclient#yarnclusterMetrics API implementation in router > -- > > Key: YARN-8699 > URL: https://issues.apache.org/jira/browse/YARN-8699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8699.001.patch, YARN-8699.002.patch, > YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch > > > Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607466#comment-16607466 ] Hadoop QA commented on YARN-8658: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 59s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 38s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 2s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 86m 13s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8658 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938850/YARN-8658.05.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux def21ffa6bb9 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 94ed5cf | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21785/testReport/ | | Max. process+thread count | 328 (vs. ulimit of
[jira] [Comment Edited] (YARN-8200) Backport resource types/GPU features to branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607445#comment-16607445 ] Jonathan Hung edited comment on YARN-8200 at 9/7/18 6:06 PM: - Build https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779 timed out: {noformat}cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt 2>&1 Elapsed: 2m 40s cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt 2>&1 Elapsed: 15m 20s cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt 2>&1 
Elapsed: 4m 49s cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt 2>&1 Elapsed: 79m 41s cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt 2>&1 Elapsed: 3m 59s cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt 2>&1 Build timed out (after 500 minutes). Marking the build as aborted. Build was aborted Performing Post build task... Match found for :. 
: True Logical operation result is TRUE Running script : #!/bin/bash{noformat} It appears the unit tests hang here: (https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt) {noformat}[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hadoop-yarn-client --- [INFO] Compiling 34 source files to /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/test-classes [WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java:[311,6] [deprecation] MiniYARNCluster(String,int,int,int,int,boolean) in MiniYARNCluster has been deprecated [WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestNMClientAsync.java:[453,16] [deprecation] onIncreaseContainerResourceError(ContainerId,Throwable) in AbstractCallbackHandler has been deprecated [WARNING]
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607445#comment-16607445 ] Jonathan Hung commented on YARN-8200: - Build https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779 timed out: {noformat}cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt 2>&1 Elapsed: 2m 40s cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt 2>&1 Elapsed: 15m 20s cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt 2>&1 Elapsed: 4m 49s cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt 2>&1 Elapsed: 79m 41s cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt 2>&1 Elapsed: 3m 59s cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt 2>&1 Build timed out (after 500 minutes). Marking the build as aborted. Build was aborted Performing Post build task... Match found for :. 
: True Logical operation result is TRUE Running script : #!/bin/bash{noformat} > Backport resource types/GPU features to branch-2 > > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues
[jira] [Commented] (YARN-5592) Add support for dynamic resource updates with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607393#comment-16607393 ] Wangda Tan commented on YARN-5592: -- [~sunilg], I think removing resource types is going to be hard. Unless we can pause the scheduler and check all resources existing in the heap, it is almost impossible to remove resource types. Adding resource types is also hard, since we assume that all resources have the same length. It makes more sense to me to restart RMs for resource-related changes to take effect. What is the problem we want to solve here? > Add support for dynamic resource updates with multiple resource types > - > > Key: YARN-5592 > URL: https://issues.apache.org/jira/browse/YARN-5592 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Manikandan R >Priority: Major > Attachments: YARN-5592-design-2.docx > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
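The "all resources have the same length" assumption above can be sketched as follows. This is hypothetical Java, not the actual YARN resource classes: it models a resource as a plain fixed-length vector indexed by resource type, and shows why a vector created before a new type was registered breaks arithmetic against one created after.

```java
// Hypothetical sketch (not the real YARN Resource/ResourceInformation classes):
// resource vectors are fixed-length arrays, index 0 = memory, 1 = vcores;
// a dynamically added "gpu" type would become index 2.
public class Main {
    static long[] add(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            // Fails if b was created before the new type was registered,
            // i.e. b is shorter than a.
            out[i] = a[i] + b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] before = {4096, 2};    // allocated before "gpu" existed
        long[] after = {4096, 2, 1};  // allocated after
        boolean threw = false;
        try {
            add(after, before);
        } catch (ArrayIndexOutOfBoundsException expected) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected mixed-length failure");
        System.out.println("mixed-length resource vectors break arithmetic");
    }
}
```

This is the heap problem the comment describes: every already-allocated vector would have to be found and resized, which is why an RM restart is the pragmatic way to apply resource-type changes.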
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607373#comment-16607373 ] Eric Yang commented on YARN-8751: - +1 LGTM. > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Assignee: Craig Condit >Priority: Critical > Labels: Docker > Attachments: YARN-8751.001.patch > > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. 
> In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script 
paths... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating local dirs... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Path > /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 > has permission 774 but needs permission 750. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null) > 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch > (ContainerRelaunch.java:call(129)) - Failed
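The narrowing suggested above (container-level failure instead of node-fatal for permission mismatches on relaunch) could be sketched as follows. This is a hypothetical policy sketch, not the YARN-8751 patch: the enum and method names are illustrative, and which codes stay node-fatal is an assumption for the example.

```java
// Hypothetical sketch of a narrowed exit-code policy (not the actual patch).
// Exit code COULD_NOT_CREATE_WORK_DIRECTORIES on a RELAUNCH can be caused by
// the container itself changing its workdir permissions, so under this sketch
// it fails only that container rather than marking the whole NM unhealthy.
import java.util.EnumSet;

public class Main {
    enum ExitCode { INVALID_CONFIG_FILE, COULD_NOT_CREATE_WORK_DIRECTORIES, OK }

    // Codes treated as genuinely node-fatal under this sketch's policy (assumption).
    static final EnumSet<ExitCode> NODE_FATAL = EnumSet.of(ExitCode.INVALID_CONFIG_FILE);

    static boolean shouldMarkNodeUnhealthy(ExitCode code, boolean isRelaunch) {
        if (code == ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES && isRelaunch) {
            return false; // the retained workdir may have been mangled by the container
        }
        return NODE_FATAL.contains(code);
    }

    public static void main(String[] args) {
        if (shouldMarkNodeUnhealthy(ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES, true))
            throw new AssertionError("relaunch workdir failure should not be node-fatal");
        if (!shouldMarkNodeUnhealthy(ExitCode.INVALID_CONFIG_FILE, false))
            throw new AssertionError("config errors should stay node-fatal");
        System.out.println("exit-code policy sketch ok");
    }
}
```

With a policy like this, the exit-code-35 relaunch in the log above would kill only the offending container instead of every container on the node.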
[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Young Chen updated YARN-8658: - Attachment: YARN-8658.05.patch > Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor.
[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Young Chen updated YARN-8658: - Attachment: (was: YARN-8658.04.patch) > Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor.
[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Young Chen updated YARN-8658: - Attachment: YARN-8658.04.patch > Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.04.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor.
[jira] [Commented] (YARN-8709) intra-queue preemption checker always fail since one under-served queue was deleted
[ https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607172#comment-16607172 ] Eric Payne commented on YARN-8709: -- [~Tao Yang], thanks for the patch. The changes look good. One small nit: Unless I am mis-counting, the amount of pending resources for queue b should be 50 and not 60 in {{TestProportionalCapacityPreemptionPolicyIntraQueue#testIntraQueuePreemptionAfterQueueDropped}} > intra-queue preemption checker always fail since one under-served queue was > deleted > --- > > Key: YARN-8709 > URL: https://issues.apache.org/jira/browse/YARN-8709 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, scheduler preemption >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8709.001.patch > > > After some queues deleted, the preemption checker in SchedulingMonitor was > always skipped because of YarnRuntimeException for every run. > Error logs: > {noformat} > ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: > Exception raised while executing preemption checker, skip this run..., > exception= > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't > happen, cannot find TempQueuePerPartition for queueName=1535075839208 > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128) > at > 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) > at java.lang.Thread.run(Thread.java:834) > {noformat} > I think there is something wrong with partitionToUnderServedQueues field in > ProportionalCapacityPreemptionPolicy. Items of partitionToUnderServedQueues > can be add but never be removed, except rebuilding this policy. For example, > once under-served queue "a" is added into this structure, it will always be > there and never be removed, intra-queue preemption checker will try to get > all queues info for partitionToUnderServedQueues in > IntraQueueCandidatesSelector#selectCandidates and will throw > YarnRuntimeException if not found. So that after queue "a" is deleted from > queue structure, the preemption checker will always fail. 
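The leak described above (entries added to partitionToUnderServedQueues but never removed) and one possible fix can be sketched as follows. This is a hypothetical simplification, not the YARN-8709 patch: the field and method names are illustrative, and pruning against the live queue set before each checker run is just one way to address it.

```java
// Hypothetical sketch of the under-served-queue leak (not the actual patch):
// queue names are added to the set when under-served but never removed, so a
// deleted queue keeps failing TempQueuePerPartition lookups on every run.
// The fix sketched here prunes the set against the queues that still exist.
import java.util.HashSet;
import java.util.Set;

public class Main {
    static final Set<String> underServedQueues = new HashSet<>();

    static void pruneDeletedQueues(Set<String> existingQueues) {
        // Drop any remembered queue that is no longer in the queue hierarchy.
        underServedQueues.retainAll(existingQueues);
    }

    public static void main(String[] args) {
        underServedQueues.add("a");
        underServedQueues.add("b");

        Set<String> existing = new HashSet<>();
        existing.add("b"); // queue "a" has been deleted from the hierarchy

        pruneDeletedQueues(existing);
        if (underServedQueues.contains("a"))
            throw new AssertionError("deleted queue not pruned");
        System.out.println("under-served set after prune: " + underServedQueues);
    }
}
```

Without a prune step like this, every subsequent checker run re-encounters the stale name and aborts with the YarnRuntimeException shown in the stack trace.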
[jira] [Commented] (YARN-8750) Refactor TestQueueMetrics
[ https://issues.apache.org/jira/browse/YARN-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606978#comment-16606978 ] Hadoop QA commented on YARN-8750: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 22s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 18s{color} | {color:orange} root: The patch generated 8 new + 93 unchanged - 23 fixed = 101 total (was 116) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 11 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 1s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 72m 10s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}190m 2s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8750 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938780/YARN-8750.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 733a727497c1 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 396ce7b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | |
[jira] [Comment Edited] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606769#comment-16606769 ] Rahul Anand edited comment on YARN-7592 at 9/7/18 9:42 AM: --- As per my understanding, for a non-HA setup with the default configuration, this will always create a problem. I have listed my analysis below. NodeManager registration starts from {{NodeManager#main}} and eventually invokes {{NodeStatusUpdaterImpl#serviceStart}} {code:java} protected void serviceStart() throws Exception { ... this.resourceTracker = getRMClient(); .. } catch (Exception e) { String errorMessage = "Unexpected error starting NodeStatusUpdater"; LOG.error(errorMessage, e); throw new YarnRuntimeException(e); } } {code} Then, NodeStatusUpdaterImpl#getRMClient tries to create an RM proxy for the resource tracker protocol. Now, the federation-enabled check in RMProxy#newProxyInstance {code:java} if (HAUtil.isHAEnabled(conf) || HAUtil.isFederationEnabled(conf)) { RMFailoverProxyProvider provider = instance.createRMFailoverProxyProvider(conf, protocol);{code} is failing the registration of the NodeManager. By default, RMProxy#createRMFailoverProxyProvider will always select ConfiguredRMFailoverProxyProvider {code:java} RMFailoverProxyProvider provider = ReflectionUtils.newInstance( conf.getClass(YarnConfiguration.CLIENT_FAILOVER_PROXY_PROVIDER, defaultProviderClass, RMFailoverProxyProvider.class), conf); provider.init(conf, (RMProxy) this, protocol);{code} and eventually it will try to get the RM's ids from ConfiguredRMFailoverProxyProvider#init {code:java} Collection rmIds = HAUtil.getRMHAIds(conf); {code} which would have been set only in the case of an HA setup, according to ResourceManager#serviceInit.
{code} this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf)); if (this.rmContext.isHAEnabled()) { HAUtil.verifyAndSetConfiguration(this.conf); } {code} When I tried to run with the proxy provider set to FederationRMFailoverProxyProvider, the NodeManager started, but that would only work in the case of a single RM. {code:java} yarn.client.failover-proxy-provider org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider {code} Please correct me if I am wrong at any point. > yarn.federation.failover.enabled missing in yarn-default.xml > > > Key: YARN-7592 > URL: https://issues.apache.org/jira/browse/YARN-7592 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.0.0-beta1 >Reporter: Gera Shegalov >Priority: Major > Attachments: IssueReproduce.patch > > >
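The failure path in the analysis above can be sketched as follows. This is a hypothetical simplification of the selection logic described for RMProxy#newProxyInstance, not the real YARN API: the method and return values below are illustrative. With federation enabled and HA off, the failover branch is taken even though no RM ids are configured, which is what breaks NodeManager registration.

```java
// Hypothetical sketch of the proxy-provider selection described above
// (simplified; not the actual RMProxy/HAUtil signatures).
public class Main {
    static String chooseProxyPath(boolean haEnabled, boolean federationEnabled, String rmIds) {
        if (haEnabled || federationEnabled) {
            if (rmIds == null) {
                // Mirrors the reported failure: ConfiguredRMFailoverProxyProvider
                // needs yarn.resourcemanager.ha.rm-ids, which a non-HA setup never sets.
                throw new IllegalStateException("failover provider selected but no RM ids configured");
            }
            return "failover";
        }
        return "simple";
    }

    public static void main(String[] args) {
        if (!"simple".equals(chooseProxyPath(false, false, null)))
            throw new AssertionError("non-HA, non-federation should use the simple proxy");
        boolean threw = false;
        try {
            chooseProxyPath(false, true, null); // federation on by default, HA off
        } catch (IllegalStateException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected failure for federation without rm-ids");
        System.out.println("proxy selection sketch ok");
    }
}
```

Under this reading, either the federation default should not force the failover path, or the non-HA case needs a provider that works without rm-ids, which matches the single-RM FederationRMFailoverProxyProvider workaround the comment describes.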
[jira] [Commented] (YARN-7761) [UI2]Clicking 'master container log' or 'Link' next to 'log' under application's appAttempt goes to Old UI's Log link
[ https://issues.apache.org/jira/browse/YARN-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606839#comment-16606839 ] Akhil PB commented on YARN-7761: YARN-7760 only fixed AM Node redirection. > [UI2]Clicking 'master container log' or 'Link' next to 'log' under > application's appAttempt goes to Old UI's Log link > - > > Key: YARN-7761 > URL: https://issues.apache.org/jira/browse/YARN-7761 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Sumana Sathish >Assignee: Akhil PB >Priority: Major > > Clicking 'master container log' or 'Link' next to 'Log' under application's > appAttempt goes to Old UI's Log link
[jira] [Assigned] (YARN-7761) [UI2]Clicking 'master container log' or 'Link' next to 'log' under application's appAttempt goes to Old UI's Log link
[ https://issues.apache.org/jira/browse/YARN-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB reassigned YARN-7761: -- Assignee: Akhil PB (was: Vasudevan Skm) > [UI2]Clicking 'master container log' or 'Link' next to 'log' under > application's appAttempt goes to Old UI's Log link > - > > Key: YARN-7761 > URL: https://issues.apache.org/jira/browse/YARN-7761 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Sumana Sathish >Assignee: Akhil PB >Priority: Major > > Clicking 'master container log' or 'Link' next to 'Log' under application's > appAttempt goes to Old UI's Log link
[jira] [Updated] (YARN-8750) Refactor TestQueueMetrics
[ https://issues.apache.org/jira/browse/YARN-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8750: - Attachment: YARN-8750.001.patch > Refactor TestQueueMetrics > - > > Key: YARN-8750 > URL: https://issues.apache.org/jira/browse/YARN-8750 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-8750.001.patch > > > {{TestQueueMetrics#checkApps}} and {{TestQueueMetrics#checkResources}} have 8 > and 14 parameters, respectively. > It is very hard to read the testcases that are using these methods.
[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606769#comment-16606769 ] Rahul Anand commented on YARN-7592: --- As per my understanding, for a non-HA setup with the default configuration, this will always create a problem. I have listed my analysis below. NodeManager registration starts from {{NodeManager#main}} and eventually invokes {{NodeStatusUpdaterImpl#serviceStart}} {code:java} protected void serviceStart() throws Exception { ... this.resourceTracker = getRMClient(); .. } catch (Exception e) { String errorMessage = "Unexpected error starting NodeStatusUpdater"; LOG.error(errorMessage, e); throw new YarnRuntimeException(e); } } {code} Then, NodeStatusUpdaterImpl#getRMClient tries to create an RM proxy for the resource tracker protocol. Now, the federation-enabled check in RMProxy#newProxyInstance {code:java} if (HAUtil.isHAEnabled(conf) || HAUtil.isFederationEnabled(conf)) { RMFailoverProxyProvider provider = instance.createRMFailoverProxyProvider(conf, protocol);{code} is failing the registration of the NodeManager. By default, RMProxy#createRMFailoverProxyProvider will always select ConfiguredRMFailoverProxyProvider {code:java} RMFailoverProxyProvider provider = ReflectionUtils.newInstance( conf.getClass(YarnConfiguration.CLIENT_FAILOVER_PROXY_PROVIDER, defaultProviderClass, RMFailoverProxyProvider.class), conf); provider.init(conf, (RMProxy) this, protocol);{code} and eventually it will try to get the RM's ids from ConfiguredRMFailoverProxyProvider#init {code:java} Collection rmIds = HAUtil.getRMHAIds(conf); {code} which would have been set only in the case of an HA setup, according to ResourceManager#serviceInit.
{code} this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf)); if (this.rmContext.isHAEnabled()) { HAUtil.verifyAndSetConfiguration(this.conf); } {code} When I tried to run with the proxy provider set to FederationRMFailoverProxyProvider, the NodeManager started, but that would only work in the case of a single RM. {code:java} yarn.client.failover-proxy-provider org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider {code} Please correct me if I am wrong at any point. > yarn.federation.failover.enabled missing in yarn-default.xml > > > Key: YARN-7592 > URL: https://issues.apache.org/jira/browse/YARN-7592 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.0.0-beta1 >Reporter: Gera Shegalov >Priority: Major > Attachments: IssueReproduce.patch > > > yarn.federation.failover.enabled should be documented in yarn-default.xml. I > am also not sure why it should be true by default and force the HA retry > policy in {{RMProxy#createRMProxy}}