[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-09-07 Thread niu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607901#comment-16607901
 ] 

niu commented on YARN-8513:
---

Thanks [~leftnoteasy] for taking the time to look into this problem.

In my attached debug log, the setup has two queues, root.dw and root.dev, 
configured as dw (capacity: 68, max: 100) and dev (capacity: 32, max: 60). 
In this case root is almost fully occupied by dw and only 256000 resources are 
left for dev, so each container request from dev (360448) is not reserved: per 
the logic from YARN-4280, used + not-yet-allocated would exceed the capacity of 
root (dev's parent).

That behaviour makes sense for the scenario above. However, I still think 
something is wrong: when I raise the max capacity of dev from 60 to 100, the 
problem no longer occurs, even though root would also exceed its limit under 
that setting. How can that be explained? I will attach the log next Monday.
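
For reference, the queue setup described above can be expressed as a test-style 
configuration sketch (assuming the CapacitySchedulerConfiguration setters used 
by the scheduler unit tests; only the values stated above are set, everything 
else is left at defaults):

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;

// Sketch of the queue setup described in this comment: dw (capacity 68, max 100)
// and dev (capacity 32, max 60) under root.
public final class DwDevQueueConfig {
  private DwDevQueueConfig() {
  }

  public static CapacitySchedulerConfiguration build() {
    CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
    conf.setQueues(CapacitySchedulerConfiguration.ROOT, new String[] {"dw", "dev"});
    conf.setCapacity(CapacitySchedulerConfiguration.ROOT + ".dw", 68f);
    conf.setMaximumCapacity(CapacitySchedulerConfiguration.ROOT + ".dw", 100f);
    conf.setCapacity(CapacitySchedulerConfiguration.ROOT + ".dev", 32f);
    conf.setMaximumCapacity(CapacitySchedulerConfiguration.ROOT + ".dev", 60f);
    return conf;
  }
}
{code}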



> CapacityScheduler infinite loop when queue is near fully utilized
> -
>
> Key: YARN-8513
> URL: https://issues.apache.org/jira/browse/YARN-8513
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 3.1.0, 2.9.1
> Environment: Ubuntu 14.04.5 and 16.04.4
> YARN is configured with one label and 5 queues.
>Reporter: Chen Yufei
>Priority: Major
> Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, 
> jstack-5.log, top-during-lock.log, top-when-normal.log, yarn3-jstack1.log, 
> yarn3-jstack2.log, yarn3-jstack3.log, yarn3-jstack4.log, yarn3-jstack5.log, 
> yarn3-resourcemanager.log, yarn3-top
>
>
> ResourceManager sometimes does not respond to any request when a queue is 
> near fully utilized. Sending SIGTERM won't stop the RM; only SIGKILL can. 
> After the RM restarts, it can recover running jobs and start accepting new ones.
>  
> Seems like CapacityScheduler is in an infinite loop printing out the 
> following log messages (more than 25,000 lines in a second):
>  
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=0.99816763 
> absoluteUsedCapacity=0.99816763 used= 
> cluster=}}
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application attempt=appattempt_1530619767030_1652_01 
> container=null 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943
>  clusterResource= type=NODE_LOCAL 
> requestedPartition=}}
>  
> I have encountered this problem several times after upgrading to YARN 2.9.1, 
> while the same configuration works fine under version 2.7.3.
>  
> YARN-4477 is an infinite-loop bug in FairScheduler; I'm not sure whether this 
> is a similar problem.
>  






[jira] [Resolved] (YARN-8755) Add clean up for FederationStore apps

2018-09-07 Thread Subru Krishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-8755.
--
Resolution: Duplicate

[~bibinchundatt], this should be addressed by YARN-6648 & YARN-7599. Your 
review of the latter will be appreciated.

Thanks.

> Add clean up for FederationStore apps
> -
>
> Key: YARN-8755
> URL: https://issues.apache.org/jira/browse/YARN-8755
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Priority: Major
>
> We should add clean-up logic for the application-to-home-sub-cluster mapping 
> in the federation state store.
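
For illustration, a minimal sketch of the kind of cleanup being asked for, 
assuming the existing FederationStateStore delete API and its record types; the 
wrapper class and the point at which it would be invoked are assumptions, not 
part of this JIRA:

{code:java}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.server.federation.store.FederationStateStore;
import org.apache.hadoop.yarn.server.federation.store.records.DeleteApplicationHomeSubClusterRequest;

// Sketch: drop a finished application's home-sub-cluster mapping from the
// federation state store.
public final class FederationAppMappingCleanup {
  private FederationAppMappingCleanup() {
  }

  public static void removeMapping(FederationStateStore stateStore,
      ApplicationId appId) throws YarnException, IOException {
    stateStore.deleteApplicationHomeSubCluster(
        DeleteApplicationHomeSubClusterRequest.newInstance(appId));
  }
}
{code}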






[jira] [Created] (YARN-8755) Add clean up for FederationStore apps

2018-09-07 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-8755:
--

 Summary: Add clean up for FederationStore apps
 Key: YARN-8755
 URL: https://issues.apache.org/jira/browse/YARN-8755
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bibin A Chundatt


We should add clean-up logic for the application-to-home-sub-cluster mapping in 
the federation state store.






[jira] [Comment Edited] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-09-07 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607890#comment-16607890
 ] 

Bibin A Chundatt edited comment on YARN-8699 at 9/8/18 3:56 AM:


Thank you [~giovanni.fumarola] for the review and commit.

{quote}
I found it interesting that GetClusterMetricsRequest can be null.
{quote}
Same here; I didn't want to change that behaviour in this JIRA.


was (Author: bibinchundatt):
Thank you [~giovanni.fumarola] for the review and commit.

{quote}
I found it interesting that GetClusterMetricsRequest can be null.
{quote}
Same here; I didn't want to change that behaviour.

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --
>
> Key: YARN-8699
> URL: https://issues.apache.org/jira/browse/YARN-8699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8699.001.patch, YARN-8699.002.patch, 
> YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch
>
>
> Implement YarnclusterMetrics API in FederationClientInterceptor






[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-09-07 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607890#comment-16607890
 ] 

Bibin A Chundatt commented on YARN-8699:


Thank you [~giovanni.fumarola] for the review and commit.

{quote}
I found it interesting that GetClusterMetricsRequest can be null.
{quote}
Same here; I didn't want to change that behaviour.
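
For context on the quoted observation, a minimal defensive-default sketch (not 
the committed patch; the helper class is hypothetical) showing how a null 
request could be tolerated without changing the public behaviour:

{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.GetClusterMetricsRequest;

// Hypothetical helper: substitute an empty request when callers pass null.
// GetClusterMetricsRequest carries no fields, so an empty instance is equivalent.
public final class RequestDefaults {
  private RequestDefaults() {
  }

  public static GetClusterMetricsRequest orEmpty(GetClusterMetricsRequest request) {
    return request != null ? request : GetClusterMetricsRequest.newInstance();
  }
}
{code}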

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --
>
> Key: YARN-8699
> URL: https://issues.apache.org/jira/browse/YARN-8699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8699.001.patch, YARN-8699.002.patch, 
> YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch
>
>
> Implement YarnclusterMetrics API in FederationClientInterceptor






[jira] [Commented] (YARN-8709) intra-queue preemption checker always fail since one under-served queue was deleted

2018-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607880#comment-16607880
 ] 

Hadoop QA commented on YARN-8709:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 73m  
1s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}121m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8709 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938929/YARN-8709.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fa19f44157f8 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf8a175 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21790/testReport/ |
| Max. process+thread count | 936 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21790/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> intra-queue preemption checker always fail since 

[jira] [Commented] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-09-07 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607866#comment-16607866
 ] 

Chandni Singh commented on YARN-8706:
-

Thanks [~eyang], [~ebadger], and [~shaneku...@gmail.com] 

> DelayedProcessKiller is executed for Docker containers even though docker 
> stop sends a KILL signal after the specified grace period
> ---
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: docker
> Fix For: 3.2.0
>
> Attachments: YARN-8706.001.patch, YARN-8706.002.patch, 
> YARN-8706.003.patch, YARN-8706.004.patch
>
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop:
>  [https://docs.docker.com/engine/reference/commandline/stop/]
> From the docker stop documentation:
> {quote}the main process inside the container will receive {{SIGTERM}}, and 
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
> for all containers after a delay when {{sleepDelayBeforeSigKill > 0}}. By 
> default this is set to {{250 milliseconds}}, so it always gets executed 
> regardless of the container type.
>  
> For a Docker container, {{docker stop}} already takes care of sending a 
> {{SIGKILL}} after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in executing 
> DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be the 
> smallest value, which is 1 second, because we force a kill after 250 ms anyway
>  
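
To make the grace-period reasoning above concrete, here is a minimal sketch of 
the derivation (the class and method names are illustrative, not the actual 
YARN-8706 change):

{code:java}
// Sketch: derive a docker stop grace period (whole seconds) from the NM's
// sleepDelayBeforeSigKill, following the two cases described above.
public final class DockerStopGracePeriod {
  private DockerStopGracePeriod() {
  }

  static int gracePeriodSeconds(long sleepDelayBeforeSigKillMs) {
    // docker stop only accepts whole seconds; 1 second is the smallest useful
    // grace period, since the forced kill otherwise follows after 250 ms.
    return (int) Math.max(1, sleepDelayBeforeSigKillMs / 1000);
  }

  public static void main(String[] args) {
    System.out.println(gracePeriodSeconds(250));    // 1  (NM default delay)
    System.out.println(gracePeriodSeconds(15000));  // 15 (delay above docker's 10s default)
  }
}
{code}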






[jira] [Commented] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-09-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607841#comment-16607841
 ] 

Hudson commented on YARN-8706:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14907 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14907/])
YARN-8706. Updated docker container stop logic to avoid double kill. 
(eyang: rev bf8a1750e99cfbfa76021ce51b6514c74c06f498)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerCommandExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerInspectCommand.java


> DelayedProcessKiller is executed for Docker containers even though docker 
> stop sends a KILL signal after the specified grace period
> ---
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: docker
> Fix For: 3.2.0
>
> Attachments: YARN-8706.001.patch, YARN-8706.002.patch, 
> YARN-8706.003.patch, YARN-8706.004.patch
>
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop:
>  [https://docs.docker.com/engine/reference/commandline/stop/]
> From the docker stop documentation:
> {quote}the main process inside the container will receive {{SIGTERM}}, and 
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
> for all containers after a delay when {{sleepDelayBeforeSigKill > 0}}. By 
> default this is set to {{250 milliseconds}}, so it always gets executed 
> regardless of the container type.
>  
> For a Docker container, {{docker stop}} already takes care of sending a 
> {{SIGKILL}} after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in executing 
> DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be the 
> smallest value, which is 1 second, because we force a kill after 250 ms anyway
>  






[jira] [Updated] (YARN-8709) intra-queue preemption checker always fail since one under-served queue was deleted

2018-09-07 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8709:
---
Attachment: YARN-8709.002.patch

> intra-queue preemption checker always fail since one under-served queue was 
> deleted
> ---
>
> Key: YARN-8709
> URL: https://issues.apache.org/jira/browse/YARN-8709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, scheduler preemption
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8709.001.patch, YARN-8709.002.patch
>
>
> After some queues were deleted, the preemption checker in SchedulingMonitor 
> was always skipped because a YarnRuntimeException was raised on every run.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: 
> Exception raised while executing preemption checker, skip this run..., 
> exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't 
> happen, cannot find TempQueuePerPartition for queueName=1535075839208
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field 
> in ProportionalCapacityPreemptionPolicy. Items can be added to 
> partitionToUnderServedQueues but are never removed unless the policy is 
> rebuilt. For example, once under-served queue "a" is added to this structure 
> it stays there forever; the intra-queue preemption checker then looks up every 
> queue listed in partitionToUnderServedQueues in 
> IntraQueueCandidatesSelector#selectCandidates and throws a YarnRuntimeException 
> if one is not found. As a result, after queue "a" is deleted from the queue 
> structure, the preemption checker always fails.
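
To make the missing-removal point concrete, here is a small sketch of the kind 
of pruning that would avoid the failure (illustrative only, not the attached 
patch; the standalone class and simplified types are assumptions):

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: drop under-served queue names that no longer exist in the queue
// structure before the intra-queue selector looks them up. The real field
// lives in ProportionalCapacityPreemptionPolicy and is keyed by partition.
public final class UnderServedQueuePruning {
  private UnderServedQueuePruning() {
  }

  static void pruneDeletedQueues(Map<String, Set<String>> partitionToUnderServedQueues,
      Set<String> existingQueueNames) {
    for (Set<String> queues : partitionToUnderServedQueues.values()) {
      // After this, selectCandidates() no longer looks up a deleted queue and
      // hits "cannot find TempQueuePerPartition".
      queues.retainAll(existingQueueNames);
    }
  }

  public static void main(String[] args) {
    Map<String, Set<String>> underServed = new HashMap<>();
    underServed.put("", new HashSet<>(Arrays.asList("a", "b")));
    pruneDeletedQueues(underServed, new HashSet<>(Arrays.asList("b")));
    System.out.println(underServed); // {={b}} -- queue "a" was deleted, so it is pruned
  }
}
{code}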






[jira] [Commented] (YARN-8709) intra-queue preemption checker always fail since one under-served queue was deleted

2018-09-07 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607829#comment-16607829
 ] 

Tao Yang commented on YARN-8709:


Thanks [~eepayne] for the review! 
There are several similar problems in 
TestProportionalCapacityPreemptionPolicyIntraQueue; I attached the v2 patch to 
correct them.

> intra-queue preemption checker always fail since one under-served queue was 
> deleted
> ---
>
> Key: YARN-8709
> URL: https://issues.apache.org/jira/browse/YARN-8709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, scheduler preemption
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8709.001.patch, YARN-8709.002.patch
>
>
> After some queues were deleted, the preemption checker in SchedulingMonitor 
> was always skipped because a YarnRuntimeException was raised on every run.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: 
> Exception raised while executing preemption checker, skip this run..., 
> exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't 
> happen, cannot find TempQueuePerPartition for queueName=1535075839208
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field 
> in ProportionalCapacityPreemptionPolicy. Items can be added to 
> partitionToUnderServedQueues but are never removed unless the policy is 
> rebuilt. For example, once under-served queue "a" is added to this structure 
> it stays there forever; the intra-queue preemption checker then looks up every 
> queue listed in partitionToUnderServedQueues in 
> IntraQueueCandidatesSelector#selectCandidates and throws a YarnRuntimeException 
> if one is not found. As a result, after queue "a" is deleted from the queue 
> structure, the preemption checker always fails.






[jira] [Updated] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-07 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8751:

Target Version/s: 3.2.0, 3.1.2  (was: 3.2.0)
   Fix Version/s: 3.1.2

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8751.001.patch
>
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs per
> mission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
> 2018-08-31 21:07:22,386 ERROR 

[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607817#comment-16607817
 ] 

Eric Yang commented on YARN-8751:
-

[~ccondit-target] cherry-picked to branch-3.1.

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8751.001.patch
>
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs per
> mission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
> 2018-08-31 21:07:22,386 

[jira] [Commented] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-09-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607809#comment-16607809
 ] 

Eric Yang commented on YARN-8706:
-

+1 for patch 004.  I will commit shortly.

> DelayedProcessKiller is executed for Docker containers even though docker 
> stop sends a KILL signal after the specified grace period
> ---
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: docker
> Attachments: YARN-8706.001.patch, YARN-8706.002.patch, 
> YARN-8706.003.patch, YARN-8706.004.patch
>
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop:
>  [https://docs.docker.com/engine/reference/commandline/stop/]
> From the docker stop documentation:
> {quote}the main process inside the container will receive {{SIGTERM}}, and 
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
> for all containers after a delay when {{sleepDelayBeforeSigKill > 0}}. By 
> default this is set to {{250 milliseconds}}, so it always gets executed 
> regardless of the container type.
>  
> For a Docker container, {{docker stop}} already takes care of sending a 
> {{SIGKILL}} after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in executing 
> DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be the 
> smallest value, which is 1 second, because we force a kill after 250 ms anyway
>  






[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607808#comment-16607808
 ] 

Hudson commented on YARN-8751:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14905 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14905/])
YARN-8751. Reduce conditions that mark node manager as unhealthy.
(eyang: rev 7d623343879ce9a8f8e64601024d018efc02794c)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java


> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0
>
> Attachments: YARN-8751.001.patch
>
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607802#comment-16607802
 ] 

Hadoop QA commented on YARN-8569:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  9m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 43 new + 149 unchanged - 1 fixed = 192 total (was 150) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
57s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
31s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m  0s{color} 
| {color:red} hadoop-yarn-services-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
33s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 

[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-07 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607797#comment-16607797
 ] 

Craig Condit commented on YARN-8751:


[~eyang], [~shaneku...@gmail.com]: Do we want to commit this to branch-3.1 as 
well?

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0
>
> Attachments: YARN-8751.001.patch
>
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs per
> mission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to 

[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8754:
-
Description: 
The Component Instance page has "node" and "host" fields, which represent 
"bare_host" and "hostname" respectively.

From the UI2 page that is not clear, so the table header should be changed 
from "node" to "bare host".

This page also has a "Host URL" column that is hard-coded to N/A, so this 
field is being removed from the table.

  was:
The Component Instance page has "node" and "host" fields, which represent 
"bare_host" and "hostname" accordingly.

From the UI2 page that is not clear, so the table header should be changed 
from "node" to "bare host".

This page also has a "Host URL" column that is hard-coded to N/A, so this 
field is being removed from the table.


> [UI2] Improve terms on Component Instance page 
> ---
>
> Key: YARN-8754
> URL: https://issues.apache.org/jira/browse/YARN-8754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-07 at 4.12.54 PM.png, Screen Shot 
> 2018-09-07 at 4.30.11 PM.png, YARN-8754.001.patch
>
>
> The Component Instance page has "node" and "host" fields, which represent 
> "bare_host" and "hostname" respectively.
> From the UI2 page that is not clear, so the table header should be changed 
> from "node" to "bare host".
> This page also has a "Host URL" column that is hard-coded to N/A, so this 
> field is being removed from the table.






[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607783#comment-16607783
 ] 

Hadoop QA commented on YARN-8045:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8045 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938908/YARN-8045.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8154fa364d3d 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 335a813 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21789/testReport/ |
| Max. process+thread count | 414 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21789/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8754:
-
Attachment: Screen Shot 2018-09-07 at 4.30.11 PM.png

> [UI2] Improve terms on Component Instance page 
> ---
>
> Key: YARN-8754
> URL: https://issues.apache.org/jira/browse/YARN-8754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-07 at 4.12.54 PM.png, Screen Shot 
> 2018-09-07 at 4.30.11 PM.png, YARN-8754.001.patch
>
>
> The component instance page has "node" and "host" columns. These two fields 
> represent "bare_host" and "hostname" respectively. 
> From the UI2 page that is not clear, so the table label should be changed 
> from "node" to "bare host".
> This page also has a "Host URL" column which is hard-coded to N/A, so this 
> field is being removed from the table. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-07 Thread Yesha Vora (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1660#comment-1660
 ] 

Yesha Vora commented on YARN-8754:
--

Attached is a screenshot of the component instance page after fixing the terms.
!Screen Shot 2018-09-07 at 4.30.11 PM.png! 

> [UI2] Improve terms on Component Instance page 
> ---
>
> Key: YARN-8754
> URL: https://issues.apache.org/jira/browse/YARN-8754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-07 at 4.12.54 PM.png, Screen Shot 
> 2018-09-07 at 4.30.11 PM.png, YARN-8754.001.patch
>
>
> The component instance page has "node" and "host" columns. These two fields 
> represent "bare_host" and "hostname" respectively. 
> From the UI2 page that is not clear, so the table label should be changed 
> from "node" to "bare host".
> This page also has a "Host URL" column which is hard-coded to N/A, so this 
> field is being removed from the table. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8754:
-
Attachment: YARN-8754.001.patch

> [UI2] Improve terms on Component Instance page 
> ---
>
> Key: YARN-8754
> URL: https://issues.apache.org/jira/browse/YARN-8754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-07 at 4.12.54 PM.png, 
> YARN-8754.001.patch
>
>
> The component instance page has "node" and "host" columns. These two fields 
> represent "bare_host" and "hostname" respectively. 
> From the UI2 page that is not clear, so the table label should be changed 
> from "node" to "bare host".
> This page also has a "Host URL" column which is hard-coded to N/A, so this 
> field is being removed from the table. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8754:
-
Attachment: Screen Shot 2018-09-07 at 4.12.54 PM.png

> [UI2] Improve terms on Component Instance page 
> ---
>
> Key: YARN-8754
> URL: https://issues.apache.org/jira/browse/YARN-8754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-07 at 4.12.54 PM.png
>
>
> The component instance page has "node" and "host" columns. These two fields 
> represent "bare_host" and "hostname" respectively. 
> From the UI2 page that is not clear, so the table label should be changed 
> from "node" to "bare host".
> This page also has a "Host URL" column which is hard-coded to N/A, so this 
> field is being removed from the table. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8748) Javadoc warnings within the nodemanager package

2018-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607769#comment-16607769
 ] 

Hadoop QA commented on YARN-8748:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 19s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 5 new + 9 unchanged - 0 fixed = 14 total (was 9) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 0 new + 0 unchanged - 10 fixed = 0 total (was 10) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m  
0s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8748 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938907/YARN-8748.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 26a04272dbae 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 335a813 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21788/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test 

[jira] [Updated] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8754:
-
Affects Version/s: 3.1.1

> [UI2] Improve terms on Component Instance page 
> ---
>
> Key: YARN-8754
> URL: https://issues.apache.org/jira/browse/YARN-8754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
>
> The component instance page has "node" and "host" columns. These two fields 
> represent "bare_host" and "hostname" respectively. 
> From the UI2 page that is not clear, so the table label should be changed 
> from "node" to "bare host".
> This page also has a "Host URL" column which is hard-coded to N/A, so this 
> field is being removed from the table. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora reassigned YARN-8754:


Assignee: Yesha Vora

> [UI2] Improve terms on Component Instance page 
> ---
>
> Key: YARN-8754
> URL: https://issues.apache.org/jira/browse/YARN-8754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
>
> The component instance page has "node" and "host" columns. These two fields 
> represent "bare_host" and "hostname" respectively. 
> From the UI2 page that is not clear, so the table label should be changed 
> from "node" to "bare host".
> This page also has a "Host URL" column which is hard-coded to N/A, so this 
> field is being removed from the table. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-07 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8754:


 Summary: [UI2] Improve terms on Component Instance page 
 Key: YARN-8754
 URL: https://issues.apache.org/jira/browse/YARN-8754
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


The component instance page has "node" and "host" columns. These two fields 
represent "bare_host" and "hostname" respectively. 

From the UI2 page that is not clear, so the table label should be changed from 
"node" to "bare host".

This page also has a "Host URL" column which is hard-coded to N/A, so this 
field is being removed from the table. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-07 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607758#comment-16607758
 ] 

Botong Huang commented on YARN-8658:


I did a quick pass and found a few small issues. Please also fix the Yetus 
complaints. 

AMRMClientRelayerMetrics: 
"Metrics for AMRMProxy Internals." -> "Metrics for FederationInterceptor (or 
AMRMClientRelayer?) Internals."
Remove everything about "E2E"; perhaps per-sub-cluster data is good enough here?

UnmanagedApplicationManager: retain the empty line at line 169.

> Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, YARN-8658.06.patch
>
>
> AMRMClientRelayer (YARN-7900) is introduced for stateful 
> FederationInterceptor (YARN-7899), to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607751#comment-16607751
 ] 

Hadoop QA commented on YARN-8658:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 
11s{color} | {color:red} hadoop-yarn-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 11s{color} 
| {color:red} hadoop-yarn-server in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 59s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 3 new + 
0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
38s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 
15s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
24s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  2m 20s{color} 
| {color:red} hadoop-yarn-server-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 59s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.uam.TestUnmanagedApplicationManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8658 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938903/YARN-8658.06.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 557a8a03ac58 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Assigned] (YARN-8045) Reduce log output from container status calls

2018-09-07 Thread Craig Condit (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reassigned YARN-8045:
--

Assignee: Craig Condit

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Major
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes them harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-09-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607714#comment-16607714
 ] 

Eric Yang commented on YARN-8569:
-

[~leftnoteasy] Patch 6 addresses the following:

- Uses the localizer to distribute the initial copy of service.json in a tarball.
- Mounts the expanded sysfs.tar into the container.
- Adds logic to replace the local copy of sysfs.tar with the latest status after 
the service reaches the stable state.
- Adds test cases at both the Java and C levels.
- Adds an on-off feature switch in container-executor.cfg to disable the feature.

I did not make the sysfs REST API a generic tarball-replacement mechanism for the 
distributed cache, to prevent people from abusing this API for unintended 
purposes.  If you still want that generic feature, please open a separate JIRA 
for that.
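
For context, here is a minimal sketch of how an application inside the container 
might consume the mounted cluster spec. The mount path and JSON field names below 
are illustrative assumptions, not the exact layout produced by this patch:

{code:java}
import java.io.File;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical consumer of the mounted service spec. The path and field
// names are assumptions for illustration only.
public class SysfsReaderSketch {
  public static void main(String[] args) throws Exception {
    File spec = new File("/hadoop/yarn/sysfs/app.json");  // assumed mount point
    JsonNode root = new ObjectMapper().readTree(spec);
    for (JsonNode component : root.path("components")) {
      String name = component.path("name").asText();
      for (JsonNode container : component.path("containers")) {
        // e.g. collect hostnames here to build --ps_hosts / --worker_hosts
        System.out.println(name + " -> " + container.path("hostname").asText());
      }
    }
  }
}
{code}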

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch
>
>
> Some programs require container hostnames to be known for the application to run. 
>  For example, distributed tensorflow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple option 
> to expose the hostnames of the yarn service via a mounted file.  The file 
> content gets updated when a flex command is performed.  This allows application 
> developers to consume system environment settings via a standard interface.  
> It is like /proc/devices for Linux, but for Hadoop.  This may involve 
> updating a file in the distributed cache and allowing the file to be mounted 
> via container-executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8748) Javadoc warnings within the nodemanager package

2018-09-07 Thread Craig Condit (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reassigned YARN-8748:
--

Assignee: Craig Condit

> Javadoc warnings within the nodemanager package
> ---
>
> Key: YARN-8748
> URL: https://issues.apache.org/jira/browse/YARN-8748
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Trivial
>
> There are a number of javadoc warnings in trunk in classes under the 
> nodemanager package. These should be addressed or suppressed.
> {code:java}
> [WARNING] Javadoc Warnings
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java:93:
>  warning - Tag @see: reference not found: 
> ContainerLaunch.ShellScriptBuilder#listDebugInformation
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX (referenced by @value 
> tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_FILE_PERMISSIONS 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY (referenced by 
> @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY_GROUP_PREFIX 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211:
>  warning - NMContainerPolicyUtils#SECURITY_FLAG (referenced by @value tag) is 
> an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java:248:
>  warning - @return tag has no arguments.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8569) Create an interface to provide cluster information to application

2018-09-07 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8569:

Attachment: YARN-8569.006.patch

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch
>
>
> Some programs require container hostnames to be known for the application to run. 
>  For example, distributed tensorflow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple option 
> to expose the hostnames of the yarn service via a mounted file.  The file 
> content gets updated when a flex command is performed.  This allows application 
> developers to consume system environment settings via a standard interface.  
> It is like /proc/devices for Linux, but for Hadoop.  This may involve 
> updating a file in the distributed cache and allowing the file to be mounted 
> via container-executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8500) Use hbase shaded jars

2018-09-07 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607702#comment-16607702
 ] 

Vrushali C commented on YARN-8500:
--

I came across HBASE-15666.

This explains the mini-cluster failures that I am seeing. 

> Use hbase shaded jars
> -
>
> Key: YARN-8500
> URL: https://issues.apache.org/jira/browse/YARN-8500
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Vrushali C
>Priority: Major
> Attachments: YARN-8500.0001.patch
>
>
> Move to using hbase shaded jars in atsv2 
> Related jira YARN-7213



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-07 Thread Young Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young Chen updated YARN-8658:
-
Attachment: YARN-8658.06.patch

> Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, YARN-8658.06.patch
>
>
> AMRMClientRelayer (YARN-7900) is introduced for stateful 
> FederationInterceptor (YARN-7899), to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-07 Thread Yesha Vora (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607650#comment-16607650
 ] 

Yesha Vora commented on YARN-8753:
--

Attached is a screenshot of the Nodemanagers chart after adding LOST.
 !Screen Shot 2018-09-07 at 11.59.02 AM.png! 

> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, 
> YARN-8753.001.patch
>
>
> Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status 
> page. 
> This chart does not show nodemanagers if they are LOST. 
> Due to this issue, the Node information page and the Node status page show 
> different NodeManager counts. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8735) Remove @value javadoc annotation from YARN projects

2018-09-07 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang resolved YARN-8735.
-
Resolution: Duplicate

> Remove @value javadoc annotation from YARN projects
> ---
>
> Key: YARN-8735
> URL: https://issues.apache.org/jira/browse/YARN-8735
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Yang
>Priority: Major
>
> The Maven javadoc plugin doesn't support the @value annotation, even though 
> IntelliJ handles it.  There are only ~12 instances that need to be removed.  It is 
> probably better to remove them before this snowballs into a problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8735) Remove @value javadoc annotation from YARN projects

2018-09-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607652#comment-16607652
 ] 

Eric Yang commented on YARN-8735:
-

Same issue, and YARN-8748 covers a bit more.  Marking this one as a duplicate.

> Remove @value javadoc annotation from YARN projects
> ---
>
> Key: YARN-8735
> URL: https://issues.apache.org/jira/browse/YARN-8735
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Yang
>Priority: Major
>
> The Maven javadoc plugin doesn't support the @value annotation, even though 
> IntelliJ handles it.  There are only ~12 instances that need to be removed.  It is 
> probably better to remove them before this snowballs into a problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8753:
-
Attachment: Screen Shot 2018-09-07 at 11.59.02 AM.png

> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, 
> YARN-8753.001.patch
>
>
> Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status 
> page. 
> This chart does not show nodemanagers if they are LOST. 
> Due to this issue, the Node information page and the Node status page show 
> different NodeManager counts. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8753:
-
Attachment: YARN-8753.001.patch

> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png, YARN-8753.001.patch
>
>
> Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status 
> page. 
> This chart does not show nodemanagers if they are LOST. 
> Due to this issue, the Node information page and the Node status page show 
> different NodeManager counts. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-4961) Wrapper for leveldb DB to aid in handling database exceptions

2018-09-07 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned YARN-4961:


Assignee: Pradeep Ambati

Yes, exactly.  Thanks for picking this up!

Currently it is fragile to use the database directly because, instead of throwing 
checked IOExceptions when I/O errors occur, it throws a runtime DBException.  Having 
a wrapper class that provides the same methods but throws checked IOExceptions 
instead of unchecked runtime exceptions would make it safer to use as a state store 
backend in Hadoop, where we don't necessarily want to tear down the entire server 
when an I/O error occurs.
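
As a rough illustration of the idea (the class and method names here are 
placeholders, not the eventual patch), such a wrapper could look like this:

{code:java}
import java.io.IOException;

import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBException;

/**
 * Illustrative sketch only: expose the same operations as the underlying
 * leveldb DB, but rethrow the unchecked DBException as a checked
 * IOException so callers are forced to handle I/O failures.
 */
public class LeveldbIoWrapperSketch {
  private final DB db;

  public LeveldbIoWrapperSketch(DB db) {
    this.db = db;
  }

  public byte[] get(byte[] key) throws IOException {
    try {
      return db.get(key);
    } catch (DBException e) {
      throw new IOException("leveldb get failed", e);
    }
  }

  public void put(byte[] key, byte[] value) throws IOException {
    try {
      db.put(key, value);
    } catch (DBException e) {
      throw new IOException("leveldb put failed", e);
    }
  }

  public void delete(byte[] key) throws IOException {
    try {
      db.delete(key);
    } catch (DBException e) {
      throw new IOException("leveldb delete failed", e);
    }
  }
}
{code}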


> Wrapper for leveldb DB to aid in handling database exceptions
> -
>
> Key: YARN-4961
> URL: https://issues.apache.org/jira/browse/YARN-4961
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jason Lowe
>Assignee: Pradeep Ambati
>Priority: Major
>
> It would be nice to have a utility wrapper around leveldb's DB to translate 
> the raw runtime DBExceptions into IOExceptions.  This would help make the 
> code using leveldb easier to read and less error-prone to allowing the 
> runtime DBExceptions to escape and potentially terminate the calling process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8753:
-
Description: 
Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status page. 
This chart does not show nodemanagers if they are LOST. 

Due to this issue, the Node information page and the Node status page show 
different NodeManager counts. 

  was:
Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status page. 
This chart does not show nodemanagers if they are LOST. 


> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png
>
>
> Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status 
> page. 
> This chart does not show nodemanagers if they are LOST. 
> Due to this issue, the Node information page and the Node status page show 
> different NodeManager counts. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8753:
-
Attachment: Screen Shot 2018-09-06 at 6.16.02 PM.png

> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png
>
>
> Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status 
> page. 
> This chart does not show nodemanagers if they are LOST. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8753:
-
Attachment: Screen Shot 2018-09-06 at 6.16.14 PM.png

> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png
>
>
> Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status 
> page. 
> This chart does not show nodemanagers if they are LOST. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-07 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora reassigned YARN-8753:


Assignee: Yesha Vora

> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
>
> Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status 
> page. 
> This chart does not show nodemanagers if they are LOST. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-07 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8753:


 Summary: [UI2] Lost nodes representation missing from Nodemanagers 
Chart
 Key: YARN-8753
 URL: https://issues.apache.org/jira/browse/YARN-8753
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.1
Reporter: Yesha Vora


Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status page. 
This chart does not show nodemanagers if they are LOST. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4961) Wrapper for leveldb DB to aid in handling database exceptions

2018-09-07 Thread Pradeep Ambati (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607520#comment-16607520
 ] 

Pradeep Ambati commented on YARN-4961:
--

[~jlowe], I would like to work on this JIRA. From what I understand, there should be 
a utility wrapper around DB that throws IOExceptions (translated from DBExceptions) 
instead of DBExceptions. Am I right?

> Wrapper for leveldb DB to aid in handling database exceptions
> -
>
> Key: YARN-4961
> URL: https://issues.apache.org/jira/browse/YARN-4961
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jason Lowe
>Priority: Major
>
> It would be nice to have a utility wrapper around leveldb's DB to translate 
> the raw runtime DBExceptions into IOExceptions.  This would help make the 
> code using leveldb easier to read and less error-prone to allowing the 
> runtime DBExceptions to escape and potentially terminate the calling process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-09-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607516#comment-16607516
 ] 

Hudson commented on YARN-8699:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14899 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14899/])
YARN-8699. Add Yarnclient#yarnclusterMetrics API implementation in (gifuma: rev 
3dc2988a3779590409cbe7062046e3fee68f8d22)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/MockResourceManagerFacade.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/RouterYarnClientUtils.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/test/java/org/apache/hadoop/yarn/server/router/clientrm/TestRouterYarnClientUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/test/java/org/apache/hadoop/yarn/server/router/clientrm/TestFederationClientInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/ClientMethod.java


> Add Yarnclient#yarnclusterMetrics API implementation in router
> --
>
> Key: YARN-8699
> URL: https://issues.apache.org/jira/browse/YARN-8699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8699.001.patch, YARN-8699.002.patch, 
> YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch
>
>
> Implement YarnclusterMetrics API in FederationClientInterceptor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-09-07 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607495#comment-16607495
 ] 

Wangda Tan commented on YARN-8513:
--

I spent a good amount of time checking this issue.

I found that the scheduler tries to reserve containers on two nodes. What happens is:

1) For the root queue, total resource = 1351680, used resource = 1095680, available 
resource = 256000.
2) The app that gets the resource is running under the dev queue, maximum resource = 
8811008, used resource = 7168.
3) The app always gets a container reserved with size=360448, which is beyond the 
parent queue's available resource, so this request is rejected by the resource 
committer.

In my mind this is expected behavior, even though the proposal/reject cycle is not 
necessary. The behavior is in line with YARN-4280, where we want an under-utilized 
queue to still get resources when its resource request is large.

Let me use an example to explain this:

The scheduler has two queues, a and b; the capacity of each queue is 0.5, the max 
capacity of a = 1.0, and the max capacity of b = 0.8. Assume cluster resource = 100.

There is an app running in a that uses 75 resources, so a's absolute used capacity 
= 0.75. There are still many pending resource requests from a, each of size 1.

Then a user submits an app to b, asking for a single container of size 30. In that 
case, the scheduler cannot allocate the container because the cluster's total 
available = 25.

If we gave these resources to queue a, queue b could never get the available 
resource, because smaller resource requests would always be preferred.

Instead, the logic in YARN-4280 is: if queue b cannot get the resource because of 
the parent queue's resource limit, then instead of giving the resource to other 
queues, the scheduler holds it. So you can see that there are 25 resources 
available, but no one can get them.
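
For illustration only, here is a minimal sketch of the arithmetic behind this 
example (it is not the actual CapacityScheduler code; the class and variable names 
are made up for the sketch):

{code:java}
// Headroom check behind the "queue a / queue b" example above, using the
// same numbers. The point is that the proposal is rejected, yet the
// remaining headroom is intentionally not handed to queue a.
public class HeadroomCheckSketch {
  public static void main(String[] args) {
    long clusterResource = 100;   // total cluster resource
    long usedByQueueA = 75;       // queue a, absolute used capacity 0.75
    long parentHeadroom = clusterResource - usedByQueueA;  // 25 left under root
    long requestFromQueueB = 30;  // single large container asked by queue b

    if (requestFromQueueB > parentHeadroom) {
      // The commit phase rejects the proposal ("Failed to accept allocation
      // proposal"), but per YARN-4280 the 25 remaining resources are held
      // for queue b instead of being given to queue a's smaller requests.
      System.out.println("Proposal rejected; headroom held for queue b");
    }
  }
}
{code}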

The problem only occurs in a very busy cluster with few nodes. Turning on preemption 
can alleviate the issue a lot.

I would prefer to close this as "no fix needed".

Thoughts?

> CapacityScheduler infinite loop when queue is near fully utilized
> -
>
> Key: YARN-8513
> URL: https://issues.apache.org/jira/browse/YARN-8513
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 3.1.0, 2.9.1
> Environment: Ubuntu 14.04.5 and 16.04.4
> YARN is configured with one label and 5 queues.
>Reporter: Chen Yufei
>Priority: Major
> Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, 
> jstack-5.log, top-during-lock.log, top-when-normal.log, yarn3-jstack1.log, 
> yarn3-jstack2.log, yarn3-jstack3.log, yarn3-jstack4.log, yarn3-jstack5.log, 
> yarn3-resourcemanager.log, yarn3-top
>
>
> ResourceManager does not respond to any request when queue is near fully 
> utilized sometimes. Sending SIGTERM won't stop RM, only SIGKILL can. After RM 
> restart, it can recover running jobs and start accepting new ones.
>  
> Seems like CapacityScheduler is in an infinite loop printing out the 
> following log messages (more than 25,000 lines in a second):
>  
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=0.99816763 
> absoluteUsedCapacity=0.99816763 used= 
> cluster=}}
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application attempt=appattempt_1530619767030_1652_01 
> container=null 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943
>  clusterResource= type=NODE_LOCAL 
> requestedPartition=}}
>  
> I encounter this problem several times after upgrading to YARN 2.9.1, while 
> the same configuration works fine under version 2.7.3.
>  
> YARN-4477 is an infinite loop bug in FairScheduler, not sure if this is a 
> similar problem.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-09-07 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8699:
---
Fix Version/s: 3.2.0

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --
>
> Key: YARN-8699
> URL: https://issues.apache.org/jira/browse/YARN-8699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8699.001.patch, YARN-8699.002.patch, 
> YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch
>
>
> Implement YarnclusterMetrics API in FederationClientInterceptor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-09-07 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607480#comment-16607480
 ] 

Giovanni Matteo Fumarola commented on YARN-8699:


[^YARN-8699.005.patch] looks good.
Committing to trunk.

Thanks [~bibinchundatt] for the patch.

However, I found it interesting that GetClusterMetricsRequest can be null and 
still return proper results; the RM accepts null requests for this call.
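
For reference, a minimal sketch of the client-side call involved, using the 
standard YarnClient API (shown only for context, not part of the patch): 
YarnClient builds the GetClusterMetricsRequest itself, and since that request 
carries no fields, the RM can presumably tolerate a null one.
{code:java}
// Minimal sketch: fetch cluster metrics through the standard YarnClient API.
import org.apache.hadoop.yarn.api.records.YarnClusterMetrics;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterMetricsExample {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      YarnClusterMetrics metrics = client.getYarnClusterMetrics();
      System.out.println("NodeManagers: " + metrics.getNumNodeManagers());
    } finally {
      client.stop();
    }
  }
}
{code}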

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --
>
> Key: YARN-8699
> URL: https://issues.apache.org/jira/browse/YARN-8699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8699.001.patch, YARN-8699.002.patch, 
> YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch
>
>
> Implement YarnclusterMetrics API in FederationClientInterceptor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607466#comment-16607466
 ] 

Hadoop QA commented on YARN-8658:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
59s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
38s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m  
2s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8658 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938850/YARN-8658.05.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux def21ffa6bb9 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 94ed5cf |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21785/testReport/ |
| Max. process+thread count | 328 (vs. ulimit of 

[jira] [Comment Edited] (YARN-8200) Backport resource types/GPU features to branch-2

2018-09-07 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607445#comment-16607445
 ] 

Jonathan Hung edited comment on YARN-8200 at 9/7/18 6:06 PM:
-

Build 
https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779 
timed out:
{noformat}cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt
 2>&1
Elapsed:   2m 40s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 2>&1
Elapsed:  15m 20s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
 2>&1
Elapsed:   4m 49s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 2>&1
Elapsed:  79m 41s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
 2>&1
Elapsed:   3m 59s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 2>&1
Build timed out (after 500 minutes). Marking the build as aborted.
Build was aborted
Performing Post build task...
Match found for :. : True
Logical operation result is TRUE
Running script  : #!/bin/bash{noformat}

It appears the unit tests hang here: 
(https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt)
{noformat}[INFO] --- maven-compiler-plugin:3.1:testCompile 
(default-testCompile) @ hadoop-yarn-client ---
[INFO] Compiling 34 source files to 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/test-classes
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java:[311,6]
 [deprecation] MiniYARNCluster(String,int,int,int,int,boolean) in 
MiniYARNCluster has been deprecated
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestNMClientAsync.java:[453,16]
 [deprecation] onIncreaseContainerResourceError(ContainerId,Throwable) in 
AbstractCallbackHandler has been deprecated
[WARNING] 

[jira] [Comment Edited] (YARN-8200) Backport resource types/GPU features to branch-2

2018-09-07 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607445#comment-16607445
 ] 

Jonathan Hung edited comment on YARN-8200 at 9/7/18 6:03 PM:
-

Build 
https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779 
timed out:
{noformat}cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt
 2>&1
Elapsed:   2m 40s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 2>&1
Elapsed:  15m 20s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
 2>&1
Elapsed:   4m 49s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 2>&1
Elapsed:  79m 41s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
 2>&1
Elapsed:   3m 59s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 2>&1
Build timed out (after 500 minutes). Marking the build as aborted.
Build was aborted
Performing Post build task...
Match found for :. : True
Logical operation result is TRUE
Running script  : #!/bin/bash{noformat}

It appears the unit tests hang here: 
(https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt)
{noformat}[INFO] --- maven-compiler-plugin:3.1:testCompile 
(default-testCompile) @ hadoop-yarn-client ---
[INFO] Compiling 34 source files to 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/test-classes
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java:[311,6]
 [deprecation] MiniYARNCluster(String,int,int,int,int,boolean) in 
MiniYARNCluster has been deprecated
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestNMClientAsync.java:[453,16]
 [deprecation] onIncreaseContainerResourceError(ContainerId,Throwable) in 
AbstractCallbackHandler has been deprecated
[WARNING] 

[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-09-07 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607445#comment-16607445
 ] 

Jonathan Hung commented on YARN-8200:
-

Build 
https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779 
timed out:
{noformat}cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt
 2>&1
Elapsed:   2m 40s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 2>&1
Elapsed:  15m 20s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
 2>&1
Elapsed:   4m 49s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 2>&1
Elapsed:  79m 41s
cd 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
 2>&1
Elapsed:   3m 59s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse 
-Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop 
-Pyarn-ui clean test -fae > 
/testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 2>&1
Build timed out (after 500 minutes). Marking the build as aborted.
Build was aborted
Performing Post build task...
Match found for :. : True
Logical operation result is TRUE
Running script  : #!/bin/bash{noformat}

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-8200-branch-2.001.patch, 
> counter.scheduler.operation.allocate.csv.defaultResources, 
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues 

[jira] [Commented] (YARN-5592) Add support for dynamic resource updates with multiple resource types

2018-09-07 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607393#comment-16607393
 ] 

Wangda Tan commented on YARN-5592:
--

[~sunilg], 

I think removing resource types is going to be hard. Unless we can pause the 
scheduler and check every resource object live in the heap, it is almost 
impossible to remove a resource type safely.

Adding resource types is also hard, since we assume that all resources have the 
same length.
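
To illustrate the fixed-length assumption, here is a toy sketch with 
hypothetical names (not the real Resource/ResourceInformation code): once 
resources are stored as arrays indexed by a global resource-type order, any 
resource object created before a type is added has the wrong length and breaks 
comparisons and arithmetic afterwards.
{code:java}
// Toy illustration only; names are hypothetical, not actual YARN classes.
public class FixedLengthResourceSketch {
  static long[] add(long[] a, long[] b) {
    // Both operands are assumed to be built against the same resource-type list.
    if (a.length != b.length) {
      throw new IllegalStateException("resource type list changed at runtime");
    }
    long[] sum = new long[a.length];
    for (int i = 0; i < a.length; i++) {
      sum[i] = a[i] + b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    long[] before = {4096, 2};    // memory, vcores
    long[] after = {4096, 2, 1};  // memory, vcores, plus a GPU type added later
    add(before, after);           // throws: older objects keep the old length
  }
}
{code}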

It makes more sense to me to restart the RMs for resource-related changes to 
take effect.

What is the problem we want to solve here?

> Add support for dynamic resource updates with multiple resource types
> -
>
> Key: YARN-5592
> URL: https://issues.apache.org/jira/browse/YARN-5592
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-5592-design-2.docx
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607373#comment-16607373
 ] 

Eric Yang commented on YARN-8751:
-

+1 LGTM.

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8751.001.patch
>
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs per
> mission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
> 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch 
> (ContainerRelaunch.java:call(129)) - Failed 

[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-07 Thread Young Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young Chen updated YARN-8658:
-
Attachment: YARN-8658.05.patch

> Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch
>
>
> AMRMClientRelayer (YARN-7900) is introduced for stateful 
> FederationInterceptor (YARN-7899), to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-07 Thread Young Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young Chen updated YARN-8658:
-
Attachment: (was: YARN-8658.04.patch)

> Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch
>
>
> AMRMClientRelayer (YARN-7900) is introduced for stateful 
> FederationInterceptor (YARN-7899), to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-07 Thread Young Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young Chen updated YARN-8658:
-
Attachment: YARN-8658.04.patch

> Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.04.patch
>
>
> AMRMClientRelayer (YARN-7900) is introduced for stateful 
> FederationInterceptor (YARN-7899), to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8709) intra-queue preemption checker always fail since one under-served queue was deleted

2018-09-07 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607172#comment-16607172
 ] 

Eric Payne commented on YARN-8709:
--

[~Tao Yang], thanks for the patch. The changes look good. One small nit:
Unless I am miscounting, the amount of pending resources for queue b should be 
50, not 60, in 
{{TestProportionalCapacityPreemptionPolicyIntraQueue#testIntraQueuePreemptionAfterQueueDropped}}.

> intra-queue preemption checker always fail since one under-served queue was 
> deleted
> ---
>
> Key: YARN-8709
> URL: https://issues.apache.org/jira/browse/YARN-8709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, scheduler preemption
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8709.001.patch
>
>
> After some queues deleted, the preemption checker in SchedulingMonitor was 
> always skipped  because of YarnRuntimeException for every run.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: 
> Exception raised while executing preemption checker, skip this run..., 
> exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't 
> happen, cannot find TempQueuePerPartition for queueName=1535075839208
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with partitionToUnderServedQueues field in 
> ProportionalCapacityPreemptionPolicy. Items of partitionToUnderServedQueues 
> can be add but never be removed, except rebuilding this policy. For example, 
> once under-served queue "a" is added into this structure, it will always be 
> there and never be removed, intra-queue preemption checker will try to get 
> all queues info for partitionToUnderServedQueues in 
> IntraQueueCandidatesSelector#selectCandidates and will throw 
> YarnRuntimeException if not found. So that after queue "a" is deleted from 
> queue structure, the preemption checker will always fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8750) Refactor TestQueueMetrics

2018-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606978#comment-16606978
 ] 

Hadoop QA commented on YARN-8750:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 18s{color} | {color:orange} root: The patch generated 8 new + 93 unchanged - 
23 fixed = 101 total (was 116) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 11 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
1s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 72m 
10s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}190m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8750 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938780/YARN-8750.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 733a727497c1 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 396ce7b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| 

[jira] [Comment Edited] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2018-09-07 Thread Rahul Anand (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606769#comment-16606769
 ] 

Rahul Anand edited comment on YARN-7592 at 9/7/18 9:42 AM:
---

As per my understanding, for a non-HA setup with the default configuration, 
this will always create a problem. I have listed my analysis below.

NodeManager registration starts from {{NodeManager#main}} and eventually invokes 
{{NodeStatusUpdaterImpl#serviceStart}}:
{code:java}
protected void serviceStart() throws Exception {
...
this.resourceTracker = getRMClient();
..
  } catch (Exception e) {
  String errorMessage = "Unexpected error starting NodeStatusUpdater";
  LOG.error(errorMessage, e);
  throw new YarnRuntimeException(e);
 }
}
 {code}
Then NodeStatusUpdaterImpl#getRMClient tries to create an RM proxy for the 
resource tracker protocol. Now, the federation-enabled check in 
RMProxy#newProxyInstance
{code:java}
if (HAUtil.isHAEnabled(conf) || HAUtil.isFederationEnabled(conf)) {
   RMFailoverProxyProvider<T> provider =
   instance.createRMFailoverProxyProvider(conf, protocol);{code}
is what causes the NodeManager registration to fail. By default, 
RMProxy#createRMFailoverProxyProvider will always select 
ConfiguredRMFailoverProxyProvider
{code:java}
RMFailoverProxyProvider<T> provider = ReflectionUtils.newInstance(
  conf.getClass(YarnConfiguration.CLIENT_FAILOVER_PROXY_PROVIDER,
 defaultProviderClass, RMFailoverProxyProvider.class), conf);
provider.init(conf, (RMProxy<T>) this, protocol);{code}
and eventually it tries to get the RM IDs from 
ConfiguredRMFailoverProxyProvider#init
{code:java}
Collection<String> rmIds = HAUtil.getRMHAIds(conf);
{code}
which would have been set only for an HA setup, according to 
ResourceManager#serviceInit:
{code}
this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
if (this.rmContext.isHAEnabled()) {
HAUtil.verifyAndSetConfiguration(this.conf);
}
  {code}
 

When I tried running with the proxy provider set to 
FederationRMFailoverProxyProvider, the NodeManager started, but that only 
really works in the single-RM case.
{code:java}
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider</value>
</property>
{code}
Please correct if I am wrong at any point. 

 


was (Author: rahulanand90):
As per my understanding, for a Non-HA setup, with the default configuration, 
this will always create a problem. I have listed down my analysis.

NodeManager registration starts from {{NodeManager#main}} and evetually invokes 
{{NodeStatusUpdaterImpl#serviceStart}} 
{code:java}
protected void serviceStart() throws Exception \{
...
this.resourceTracker = getRMClient();
..
  } catch (Exception e) \{
  String errorMessage = "Unexpected error starting NodeStatusUpdater";
  LOG.error(errorMessage, e);
  throw new YarnRuntimeException(e);
 }
}
 {code}
Then, NodeStatusUpdaterImpl#getRMClient tries to create RM proxy for resource 
tracker protocol. Now, the Federation enabled check in RMProxy#newProxyInstance 
{code:java}
if (HAUtil.isHAEnabled(conf) || HAUtil.isFederationEnabled(conf)) {
   RMFailoverProxyProvider provider =
   instance.createRMFailoverProxyProvider(conf, protocol);{code}
is failing the registration of the nodemanager. By default, 
RMProxy#createRMFailoverProxyProvider will always select 
ConfiguredRMFailoverProxyProvider 
{code:java}
RMFailoverProxyProvider provider = ReflectionUtils.newInstance(
  conf.getClass(YarnConfiguration.CLIENT_FAILOVER_PROXY_PROVIDER,
 defaultProviderClass, RMFailoverProxyProvider.class), conf);
provider.init(conf, (RMProxy) this, protocol);{code}
and eventually, it will try to get RM's id from 
ConfiguredRMFailoverProxyProvider#init
{code:java}
Collection rmIds = HAUtil.getRMHAIds(conf);
 which would have been set only in case of HA setup according to 
ResourceManager#serviceInit.
this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
if (this.rmContext.isHAEnabled()) \{
HAUtil.verifyAndSetConfiguration(this.conf);
}
  {code}
 

When I tried to run with the proxy provider as 
FederationRMFailoverProxyProvider, it started the nodemanager but this would be 
idealistic to work with only in case of 1 RM. 
{code:java}

yarn.client.failover-proxy-provider

org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider
{code}
Please correct if I am wrong at any point. 

 

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> 

[jira] [Commented] (YARN-7761) [UI2]Clicking 'master container log' or 'Link' next to 'log' under application's appAttempt goes to Old UI's Log link

2018-09-07 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606839#comment-16606839
 ] 

Akhil PB commented on YARN-7761:


YARN-7760 only fixed AM Node redirection.

> [UI2]Clicking 'master container log' or 'Link' next to 'log' under 
> application's appAttempt goes to Old UI's Log link
> -
>
> Key: YARN-7761
> URL: https://issues.apache.org/jira/browse/YARN-7761
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sumana Sathish
>Assignee: Akhil PB
>Priority: Major
>
> Clicking 'master container log' or 'Link' next to 'Log' under application's 
> appAttempt goes to Old UI's Log link



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7761) [UI2]Clicking 'master container log' or 'Link' next to 'log' under application's appAttempt goes to Old UI's Log link

2018-09-07 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB reassigned YARN-7761:
--

Assignee: Akhil PB  (was: Vasudevan Skm)

> [UI2]Clicking 'master container log' or 'Link' next to 'log' under 
> application's appAttempt goes to Old UI's Log link
> -
>
> Key: YARN-7761
> URL: https://issues.apache.org/jira/browse/YARN-7761
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sumana Sathish
>Assignee: Akhil PB
>Priority: Major
>
> Clicking 'master container log' or 'Link' next to 'Log' under application's 
> appAttempt goes to Old UI's Log link



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8750) Refactor TestQueueMetrics

2018-09-07 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8750:
-
Attachment: YARN-8750.001.patch

> Refactor TestQueueMetrics
> -
>
> Key: YARN-8750
> URL: https://issues.apache.org/jira/browse/YARN-8750
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8750.001.patch
>
>
> {{TestQueueMetrics#checkApps}} and {{TestQueueMetrics#checkResources}} have 8 
> and 14 parameters, respectively.
> It is very hard to read the testcases that are using these methods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2018-09-07 Thread Rahul Anand (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606769#comment-16606769
 ] 

Rahul Anand commented on YARN-7592:
---

As per my understanding, for a non-HA setup with the default configuration, 
this will always create a problem. I have listed my analysis below.

NodeManager registration starts from {{NodeManager#main}} and eventually invokes 
{{NodeStatusUpdaterImpl#serviceStart}}:
{code:java}
protected void serviceStart() throws Exception {
...
this.resourceTracker = getRMClient();
..
  } catch (Exception e) {
  String errorMessage = "Unexpected error starting NodeStatusUpdater";
  LOG.error(errorMessage, e);
  throw new YarnRuntimeException(e);
 }
}
 {code}
Then NodeStatusUpdaterImpl#getRMClient tries to create an RM proxy for the 
resource tracker protocol. Now, the federation-enabled check in 
RMProxy#newProxyInstance
{code:java}
if (HAUtil.isHAEnabled(conf) || HAUtil.isFederationEnabled(conf)) {
   RMFailoverProxyProvider<T> provider =
   instance.createRMFailoverProxyProvider(conf, protocol);{code}
is what causes the NodeManager registration to fail. By default, 
RMProxy#createRMFailoverProxyProvider will always select 
ConfiguredRMFailoverProxyProvider
{code:java}
RMFailoverProxyProvider<T> provider = ReflectionUtils.newInstance(
  conf.getClass(YarnConfiguration.CLIENT_FAILOVER_PROXY_PROVIDER,
 defaultProviderClass, RMFailoverProxyProvider.class), conf);
provider.init(conf, (RMProxy<T>) this, protocol);{code}
and eventually it tries to get the RM IDs from 
ConfiguredRMFailoverProxyProvider#init
{code:java}
Collection<String> rmIds = HAUtil.getRMHAIds(conf);
{code}
which would have been set only for an HA setup, according to 
ResourceManager#serviceInit:
{code:java}
this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
if (this.rmContext.isHAEnabled()) {
HAUtil.verifyAndSetConfiguration(this.conf);
}
  {code}
 

When I tried running with the proxy provider set to 
FederationRMFailoverProxyProvider, the NodeManager started, but that only 
really works in the single-RM case.
{code:java}
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider</value>
</property>
{code}
Please correct if I am wrong at any point. 

 

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org