[jira] [Updated] (YARN-8043) Add the exception message for failed launches running under LCE

2018-03-17 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8043:
--
Attachment: YARN-8043.001.patch

> Add the exception message for failed launches running under LCE
> ---
>
> Key: YARN-8043
> URL: https://issues.apache.org/jira/browse/YARN-8043
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-8043.001.patch
>
>
> If a {{ContainerExecutionException}} is thrown during container launch under 
> LCE, information regarding the failure is added to the container's 
> diagnostics. The diagnostics output includes the container ID, exit code, 
> "error output", and "shell output", with the expectation that the relevant 
> exception details are in the "error output".
> {{ContainerExecutionException}} has several constructors, the most commonly 
> used being the one that accepts a String. This form does not put the 
> exception details into the "error output", which makes it difficult for the 
> user to troubleshoot the failure.






[jira] [Updated] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: YARN-8041.002.patch

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.0.0, 2.9.1
>Reporter: Yiran Wu
>Priority: Major
> Fix For: 3.0.0, 2.9.1
>
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch
>
>
> Implement routing of the 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
> REST invocations transparently to multiple RMs.






[jira] [Created] (YARN-8044) Determine the appropriate default ContainerRetryPolicy

2018-03-17 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8044:
-

 Summary: Determine the appropriate default ContainerRetryPolicy
 Key: YARN-8044
 URL: https://issues.apache.org/jira/browse/YARN-8044
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Shane Kumpf


{{AbstractLauncher}} sets the retry policy to {{RETRY_ON_ALL_ERRORS}}, which 
may be too inclusive. Some error codes, such as -1, should likely result in a 
hard fail.






[jira] [Updated] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Fix Version/s: (was: 2.9.1)
   (was: 3.0.0)

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.0.0, 2.9.1
>Reporter: Yiran Wu
>Priority: Major
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch
>
>
> Implement routing of the 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
> REST invocations transparently to multiple RMs.






[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-03-17 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403774#comment-16403774
 ] 

Shane Kumpf commented on YARN-8045:
---

I'm not sure these really need to be logged at the INFO level, but removing the 
newlines would be an improvement.
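
As a rough sketch of that idea (not an attached patch): drop the status log to 
DEBUG and collapse the multi-line diagnostics before logging. The {{LOG}} and 
{{containerStatus}} names are assumed here for illustration.
{code:java}
// Hypothetical sketch only: log at DEBUG and flatten newlines in the
// diagnostics so a single status call produces a single log line.
if (LOG.isDebugEnabled()) {
  LOG.debug("Returning ContainerStatus: "
      + containerStatus.toString().replaceAll("\\s*\\n\\s*", " "));
}
{code}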

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Priority: Major
>
> Each time a container's status is returned, a log entry is produced in the NM 
> by {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes them harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-17 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403775#comment-16403775
 ] 

Shane Kumpf commented on YARN-7973:
---

Opened YARN-8045, YARN-8043, and YARN-8044 for the items above. Let me know if 
you have any other concerns here.

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Updated] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-03-17 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-7962:
--
Attachment: YARN-7962.2.patch

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update {{serviceStop}} so that it grabs the {{serviceStateLock}}, sets 
> {{isServiceStarted}} to _false_, and only then shuts down the 
> {{renewerService}} thread pool, to avoid this race.
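
A minimal sketch of that ordering (not the attached patch), assuming the 
existing {{serviceStateLock}}, {{isServiceStarted}}, {{renewalTimer}}, 
{{appTokens}}, {{allTokens}}, and {{renewerService}} fields shown above:
{code:java}
@Override
protected void serviceStop() {
  // Flip the flag under the write lock first, so concurrent
  // processDelegationTokenRenewerEvent() calls fall back to pendingEventQueue
  // instead of submitting to a terminated executor.
  serviceStateLock.writeLock().lock();
  try {
    isServiceStarted = false;
  } finally {
    serviceStateLock.writeLock().unlock();
  }
  if (renewalTimer != null) {
    renewalTimer.cancel();
  }
  appTokens.clear();
  allTokens.clear();
  this.renewerService.shutdown();
}
{code}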






[jira] [Updated] (YARN-8035) Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing

2018-03-17 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8035:
--
Attachment: YARN-8035.002.patch

> Uncaught exception in ContainersMonitorImpl during relaunch due to the 
> process ID changing
> --
>
> Key: YARN-8035
> URL: https://issues.apache.org/jira/browse/YARN-8035
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-8035.001.patch, YARN-8035.002.patch
>
>
> In the case of a container relaunch event, the container ID is reused but a 
> new process is spawned. For resource monitoring, {{ContainersMonitorImpl}} 
> will obtain the new PID post relaunch and initialize the process tree 
> monitoring. As part of this initialization, a tag called {{ContainerPid}}, 
> whose value is the PID for the container, is populated for the metrics 
> associated with the container. If the prior container failed after its 
> process started, the original PID will already be populated for the 
> container, resulting in the {{MetricsException}} below.
> {code:java}
> 2018-03-16 11:59:02,563 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Uncaught exception in ContainersMonitorImpl while monitoring resource of 
> container_1521201379995_0001_01_02
> org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448){code}
> {{MetricsRegistry}} provides a {{tag}} method that allows for updating the 
> value of an existing tag. Updating the value ensures that the PID associated 
> with the container is that of the currently running process, which appears to 
> be an appropriate fix. However, it's unclear how this tag might be used by 
> other systems; I'm not finding any usage in Hadoop itself.
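
A hypothetical sketch of that approach (not necessarily the attached patch); 
{{registry}} and {{PROCESSID_INFO}} are assumed names from 
{{ContainerMetrics}}, and the override-style {{tag}} overload is the one 
described above:
{code:java}
// Hypothetical sketch: allow overriding the tag so a relaunched container
// replaces the stale ContainerPid value instead of hitting MetricsException.
public void recordProcessId(String processId) {
  registry.tag(PROCESSID_INFO, processId, true);
}
{code}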






[jira] [Commented] (YARN-8044) Determine the appropriate default ContainerRetryPolicy

2018-03-17 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403756#comment-16403756
 ] 

Shane Kumpf commented on YARN-8044:
---

{{ContainerRetryPolicy}} doesn't really provide a way to do this today. 
{{RETRY_ON_SPECIFIC_ERROR_CODES}} is likely too restrictive, as -1 may be the 
only code where a hard fail makes sense. Adding support for a 
{{FAIL_ON_SPECIFIC_ERROR_CODES}} policy may be a better fit.
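
A self-contained sketch of what such a policy could look like; 
{{FAIL_ON_SPECIFIC_ERROR_CODES}} does not exist in YARN today, and the enum and 
helper below are purely illustrative:
{code:java}
// Illustrative only; not an existing YARN API.
enum RetryPolicySketch {
  NEVER_RETRY, RETRY_ON_ALL_ERRORS,
  RETRY_ON_SPECIFIC_ERROR_CODES, FAIL_ON_SPECIFIC_ERROR_CODES
}

static boolean shouldRetry(RetryPolicySketch policy, int exitCode,
    java.util.Set<Integer> errorCodes) {
  switch (policy) {
    case RETRY_ON_ALL_ERRORS:
      return true;
    case RETRY_ON_SPECIFIC_ERROR_CODES:
      return errorCodes.contains(exitCode);
    case FAIL_ON_SPECIFIC_ERROR_CODES:
      // e.g. hard fail on -1, retry everything else
      return !errorCodes.contains(exitCode);
    default:
      return false; // NEVER_RETRY
  }
}
{code}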

> Determine the appropriate default ContainerRetryPolicy
> --
>
> Key: YARN-8044
> URL: https://issues.apache.org/jira/browse/YARN-8044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Priority: Major
>
> {{AbstractLauncher}} sets the retry policy to {{RETRY_ON_ALL_ERRORS}}, which 
> may be too inclusive. Some error codes, such as -1, should likely result in a 
> hard fail.






[jira] [Commented] (YARN-8043) Add the exception message for failed launches running under LCE

2018-03-17 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403797#comment-16403797
 ] 

genericqa commented on YARN-8043:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
42s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8043 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915035/YARN-8043.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 93c496dab2d4 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 49c747a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20004/testReport/ |
| Max. process+thread count | 440 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20004/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   

[jira] [Updated] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: (was: YARN-8041.002.patch)

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.0.0, 2.9.1
>Reporter: Yiran Wu
>Priority: Major
>
> Implement routing of the 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
> REST invocations transparently to multiple RMs.






[jira] [Updated] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: (was: YARN-8041.001.patch)

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.0.0, 2.9.1
>Reporter: Yiran Wu
>Priority: Major
> Attachments: YARN-8041.002.patch
>
>
> Implement routing of the 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
> REST invocations transparently to multiple RMs.






[jira] [Created] (YARN-8045) Reduce log output from container status calls

2018-03-17 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8045:
-

 Summary: Reduce log output from container status calls
 Key: YARN-8045
 URL: https://issues.apache.org/jira/browse/YARN-8045
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Shane Kumpf


Each time a container's status is returned, a log entry is produced in the NM 
by {{ContainerManagerImpl}}. The container status includes the diagnostics 
field for the container. If the diagnostics field contains an exception, it can 
appear as if the exception is logged repeatedly every second. The diagnostics 
message can also span many lines, which puts pressure on the logs and makes 
them harder to read.

For example:
{code}
2018-03-17 22:01:11,632 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Getting container-status for container_e01_1521323860653_0001_01_05
2018-03-17 22:01:11,632 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Returning ContainerStatus: [ContainerId: 
container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
RUNNING, Capability: , Diagnostics: [2018-03-17 
22:01:00.675]Exception from container-launch.
Container id: container_e01_1521323860653_0001_01_05
Exit code: -1
Exception message: 
Shell ouput: 


[2018-03-17 22:01:00.750]Diagnostic message from attempt :
[2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
, ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
{code}






[jira] [Updated] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: YARN-8041.001.patch

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.0.0, 2.9.1
>Reporter: Yiran Wu
>Priority: Major
> Attachments: YARN-8041.001.patch
>
>
> Implement routing of the 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
> REST invocations transparently to multiple RMs.






[jira] [Commented] (YARN-8040) [UI2] New YARN UI webapp does not respect current pathname for REST api

2018-03-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403854#comment-16403854
 ] 

Sunil G commented on YARN-8040:
---

Thanks [~leftnoteasy].

I committed this to trunk.

> [UI2] New YARN UI webapp does not respect current pathname for REST api
> ---
>
> Key: YARN-8040
> URL: https://issues.apache.org/jira/browse/YARN-8040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8040.001.patch
>
>
> When ui2 is accessed behind a proxy like Knox or Nginx, the trailing path 
> name should not be skipped. However, "ui2" should be trimmed if it is there.






[jira] [Updated] (YARN-8040) [UI2] New YARN UI webapp does not respect current pathname for REST api

2018-03-17 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-8040:
--
Summary: [UI2] New YARN UI webapp does not respect current pathname for 
REST api  (was: [UI2] yarn new ui web-app does not respect current pathname for 
REST api)

> [UI2] New YARN UI webapp does not respect current pathname for REST api
> ---
>
> Key: YARN-8040
> URL: https://issues.apache.org/jira/browse/YARN-8040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8040.001.patch
>
>
> When ui2 is accessed behind a proxy like Knox or Nginx, the trailing path 
> name should not be skipped. However, "ui2" should be trimmed if it is there.






[jira] [Commented] (YARN-8040) [UI2] New YARN UI webapp does not respect current pathname for REST api

2018-03-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403874#comment-16403874
 ] 

Hudson commented on YARN-8040:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13851 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13851/])
YARN-8040. [UI2] New YARN UI webapp does not respect current pathname (sunilg: 
rev 98356a3ddebe5ca3c3fcc67c7267c847cfe1a292)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/initializers/loader.js


> [UI2] New YARN UI webapp does not respect current pathname for REST api
> ---
>
> Key: YARN-8040
> URL: https://issues.apache.org/jira/browse/YARN-8040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8040.001.patch
>
>
> When ui2 is accessed behind a proxy like Knox or Nginx, the trailing path 
> name should not be skipped. However, "ui2" should be trimmed if it is there.






[jira] [Commented] (YARN-8027) Setting hostname of docker container breaks for --net=host in docker 1.13

2018-03-17 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403562#comment-16403562
 ] 

Shane Kumpf commented on YARN-8027:
---

Thanks for the patch, [~Jim_Brennan]! I tested this with and without Registry 
DNS along with various networks and got the intended results.
{quote}Curious, is there any reason for YARN to want to change the hostname of 
a container unless Registry DNS is enabled?
{quote}
I tend to agree that there isn't a good reason. However, this is the only case 
we know of where setting it has a negative impact, and the current patch limits 
the change to the existing behavior, so I'm +1 (non-binding) on the current 
patch. We can revisit for additional network types if the need arises.

> Setting hostname of docker container breaks for --net=host in docker 1.13
> -
>
> Key: YARN-8027
> URL: https://issues.apache.org/jira/browse/YARN-8027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-8027.001.patch
>
>
> In DockerLinuxContainerRuntime:launchContainer, we are adding the --hostname 
> argument to the docker run command to set the hostname in the container to 
> something like:  ctr-e84-1520889172376-0001-01-01.
> This does not work when combined with the --net=host command line option in 
> Docker 1.13.1. It causes multiple failures because clients cannot resolve the 
> hostname.
> We haven't seen this before because we were using Docker 1.12.6, which seems 
> to ignore --hostname when --net=host is used.
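
A hedged sketch of the behavior being discussed; the {{network}}, 
{{hostname}}, and {{runCommand.setHostname}} names are assumptions for 
illustration, not the exact patch:
{code:java}
// Hypothetical sketch: only pass --hostname when the container is not on the
// host network, so Docker 1.13 with --net=host behaves as 1.12.6 did.
// (network, hostname, and setHostname are assumed names.)
if (!"host".equals(network)) {
  runCommand.setHostname(hostname);
}
{code}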






[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-03-17 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403559#comment-16403559
 ] 

genericqa commented on YARN-7962:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 3 new + 101 unchanged - 0 fixed = 104 total (was 101) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}111m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7962 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915010/YARN-7962.2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 231112c96e87 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 49c747a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20001/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Commented] (YARN-8035) Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing

2018-03-17 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403476#comment-16403476
 ] 

genericqa commented on YARN-8035:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 19s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 32 unchanged - 0 fixed = 33 total (was 32) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
50s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8035 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915006/YARN-8035.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 61bc1f147843 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 49c747a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/1/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/1/testReport/ |
| Max. process+thread count | 397 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Created] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)
Yiran Wu created YARN-8041:
--

 Summary: Federation: Implement multiple interfaces(14 interfaces), 
routing REST invocations transparently to multiple RMs 
 Key: YARN-8041
 URL: https://issues.apache.org/jira/browse/YARN-8041
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: federation
Reporter: Yiran Wu


Implement routing of the 
getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
REST invocations transparently to multiple RMs.






[jira] [Issue Comment Deleted] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Comment: was deleted

(was: Mistakes closed)

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.0.0, 2.9.1
>Reporter: Yiran Wu
>Priority: Major
> Fix For: 3.0.0, 2.9.1
>
> Attachments: YARN-8041.001.patch
>
>
> Implement routing of the 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
> REST invocations transparently to multiple RMs.






[jira] [Reopened] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu reopened YARN-8041:


Closed by mistake.

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.0.0, 2.9.1
>Reporter: Yiran Wu
>Priority: Major
> Fix For: 3.0.0, 2.9.1
>
> Attachments: YARN-8041.001.patch
>
>
> Implement routing of the 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
> REST invocations transparently to multiple RMs.






[jira] [Comment Edited] (YARN-8043) Add the exception message for failed launches running under LCE

2018-03-17 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403744#comment-16403744
 ] 

Shane Kumpf edited comment on YARN-8043 at 3/17/18 9:52 PM:


I believe the intent of the {{output}} and {{errorOutput}} fields in 
{{ContainerExecutionException}} was to represent stdout/stderr for a shell 
command. The easy fix here is to add a {{getMessage}} call to the diagnostics 
output and clearly display what "errorOutput" is. I'll put up a patch that does 
that.


was (Author: shaneku...@gmail.com):
I believe the intent for the {{output}} and {{errorOutput}} fields in 
{{ContainerExecutionException}} were meant to represent stdout/stderr for a 
shell command.The easy fix here is to add a {{getMessage}} call to the 
diagnostics output and clearly display what "errorOutput" is. I'll put up a 
patch that does that.

> Add the exception message for failed launches running under LCE
> ---
>
> Key: YARN-8043
> URL: https://issues.apache.org/jira/browse/YARN-8043
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>
> If a {{ContainerExecutionException}} is thrown during container launch under 
> LCE, information regarding the failure is added to the container's 
> diagnostics. The diagnostics output includes the container ID, exit code, 
> "error output", and "shell output", with the expectation that the relevant 
> exception details are in the "error output".
> {{ContainerExecutionException}} has several constructors, the most commonly 
> used being the one that accepts a String. This form does not put the 
> exception details into the "error output", which makes it difficult for the 
> user to troubleshoot the failure.






[jira] [Assigned] (YARN-8043) Add the exception message for failed launches running under LCE

2018-03-17 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf reassigned YARN-8043:
-

Assignee: Shane Kumpf

> Add the exception message for failed launches running under LCE
> ---
>
> Key: YARN-8043
> URL: https://issues.apache.org/jira/browse/YARN-8043
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>
> If a {{ContainerExecutionException}} is thrown during container launch under 
> LCE, information regarding the failure is added to the container's 
> diagnostics. The diagnostics output includes the container ID, exit code, 
> "error output", and "shell output", with the expectation that the relevant 
> exception details are in the "error output".
> {{ContainerExecutionException}} has several constructors, the most commonly 
> used being the one that accepts a String. This form does not put the 
> exception details into the "error output", which makes it difficult for the 
> user to troubleshoot the failure.






[jira] [Commented] (YARN-8043) Add the exception message for failed launches running under LCE

2018-03-17 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403744#comment-16403744
 ] 

Shane Kumpf commented on YARN-8043:
---

I believe the {{output}} and {{errorOutput}} fields in 
{{ContainerExecutionException}} were meant to represent stdout/stderr for a 
shell command. The easy fix here is to add a {{getMessage}} call to the 
diagnostics output and clearly display what "errorOutput" is. I'll put up a 
patch that does that.
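
A rough sketch of what adding the message to the diagnostics could look like 
(not necessarily the attached patch); {{containerId}}, {{exitCode}}, and {{e}} 
are assumed names for illustration:
{code:java}
// Hypothetical sketch: surface the exception message alongside the existing
// "error output" and "shell output" fields in the container diagnostics.
StringBuilder diagnostics = new StringBuilder();
diagnostics.append("Container id: ").append(containerId).append('\n')
    .append("Exit code: ").append(exitCode).append('\n')
    .append("Exception message: ").append(e.getMessage()).append('\n');
{code}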

> Add the exception message for failed launches running under LCE
> ---
>
> Key: YARN-8043
> URL: https://issues.apache.org/jira/browse/YARN-8043
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Major
>
> If a {{ContainerExecutionException}} is thrown during container launch under 
> LCE, information regarding the failure is added to the container's 
> diagnostics. The diagnostics output includes the container ID, exit code, 
> "error output", and "shell output", with the expectation that the relevant 
> exception details are in the "error output".
> {{ContainerExecutionException}} has several constructors, the most commonly 
> used being the one that accepts a String. This form does not put the 
> exception details into the "error output", which makes it difficult for the 
> user to troubleshoot the failure.






[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-17 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403482#comment-16403482
 ] 

Haibo Chen commented on YARN-7581:
--

{quote}someone wanted to see their megabytemillis counter value, do we need to 
retrieve the info field in that case?
{quote}
According to the comment in every TimelineEntityReader subclass, "// By 
default fetch everything in INFO column family.", info fields are always 
retrieved. I assume that is for good reason, so I added the info family to 
cfsInFields.
{quote} Any reason we are explicitly defining convertBytesToString and 
convertStringToBytes instead of using the ones that HBase provides?
{quote}
Good point. I was not aware of Bytes.toString(byte[]). Will update the patch to 
reuse Bytes.toString(byte[]) and Bytes.toBytes(String).
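
For reference, a minimal usage sketch of the HBase utility methods mentioned 
above (the "megabytemillis" qualifier is just an example value):
{code:java}
import org.apache.hadoop.hbase.util.Bytes;

// Round-trip a qualifier with HBase's own converters instead of
// hand-rolled convertStringToBytes/convertBytesToString helpers.
byte[] raw = Bytes.toBytes("megabytemillis");
String qualifier = Bytes.toString(raw);
{code}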

> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7581-YARN-7055.04.patch, YARN-7581.00.patch, 
> YARN-7581.01.patch, YARN-7581.02.patch, YARN-7581.03.patch, YARN-7581.04.patch
>
>
> Post YARN-7346,
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters() and 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters() 
> start to fail when hbase.profile is set to 2.0.
> *Error Message*
>  [ERROR] Failures:
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters:1266 
> expected:<2> but was:<0>
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters:1523 
> expected:<1> but was:<0>






[jira] [Commented] (YARN-8035) Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing

2018-03-17 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403491#comment-16403491
 ] 

Shane Kumpf commented on YARN-8035:
---

New patch to address the unused import.

> Uncaught exception in ContainersMonitorImpl during relaunch due to the 
> process ID changing
> --
>
> Key: YARN-8035
> URL: https://issues.apache.org/jira/browse/YARN-8035
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-8035.001.patch, YARN-8035.002.patch
>
>
> In the case of a container relaunch event, the container ID is reused but a 
> new process is spawned. For resource monitoring, {{ContainersMonitorImpl}} 
> will obtain the new PID post relaunch and initialize the process tree 
> monitoring. As part of this initialization, a tag called {{ContainerPid}}, 
> whose value is the PID for the container, is populated for the metrics 
> associated with the container. If the prior container failed after its 
> process started, the original PID will already be populated for the 
> container, resulting in the {{MetricsException}} below.
> {code:java}
> 2018-03-16 11:59:02,563 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Uncaught exception in ContainersMonitorImpl while monitoring resource of 
> container_1521201379995_0001_01_02
> org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448){code}
> {{MetricsRegistry}} provides a {{tag}} method that allows for updating the 
> value of an existing tag. Updating the value ensures that the PID associated 
> with the container is that of the currently running process, which appears to 
> be an appropriate fix. However, it's unclear how this tag might be used by 
> other systems; I'm not finding any usage in Hadoop itself.






[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-17 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403532#comment-16403532
 ] 

genericqa commented on YARN-7581:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-client in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7581 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915011/YARN-7581.05.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 07df45d607ff 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 49c747a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20002/testReport/ |
| Max. process+thread count | 330 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 |
| Console 

[jira] [Commented] (YARN-8035) Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing

2018-03-17 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403534#comment-16403534
 ] 

genericqa commented on YARN-8035:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 28s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
15s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8035 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915009/YARN-8035.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1a5b495ec11b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 49c747a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/2/testReport/ |
| Max. process+thread count | 469 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/2/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing

[jira] [Created] (YARN-8042) Improve debugging on ATSv2 reader server

2018-03-17 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8042:


 Summary: Improve debugging on ATSv2 reader server
 Key: YARN-8042
 URL: https://issues.apache.org/jira/browse/YARN-8042
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Haibo Chen


It has been inconvenient to debug issues that happen on the read path. 
Typically, a query sent from a client is parsed into a TimelineReaderContext, 
TimelineEntityFilters and TimelineDataToRetrieve, which are independent of the 
underlying backend storage implementations. These general ATSv2 building 
blocks are then translated into HBase Scan and Get queries with specific 
row keys and filters.

To facilitate debugging, additional debug-level logging messages (ideally 
ones that can be enabled dynamically without restarting the 
TimelineReaderServer process) can be added at that boundary to narrow down the 
scope of an investigation.

A good example of this is logging the Scan or Get query before it is sent to 
HBase and the result after the query returns from HBase. YARN support folks who 
are not necessarily HBase experts can then present the debug messages to HBase 
experts and get help. (I had to remotely connect to the TimelineReaderServer, 
set up breakpoints, and capture the HBase queries every time I suspected a bug 
in HBase.)
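As a rough illustration of the kind of boundary logging proposed here (not
actual TimelineReaderServer code; the wrapper class and method names below are
made up), a DEBUG guard around the HBase call could look like the sketch
below. Because Hadoop daemons expose a log-level servlet, such a guard can
usually be toggled at runtime without restarting the process.

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Sketch only: log the outgoing HBase query at DEBUG before executing it. */
public class TimelineScanDebugSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(TimelineScanDebugSketch.class);

  public static ResultScanner scanWithDebug(Table table, Scan scan)
      throws IOException {
    if (LOG.isDebugEnabled()) {
      // Scan inherits a readable toString()/toJSON() from Operation, which
      // dumps the row-key range and filters so they can be shown to an
      // HBase expert without attaching a debugger.
      LOG.debug("Sending scan to HBase: {}", scan);
    }
    return table.getScanner(scan);
  }
}
{code}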

 

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8042) Improve debugging on ATSv2 reader server

2018-03-17 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8042:
-
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-7055

> Improve debugging on ATSv2 reader server
> 
>
> Key: YARN-8042
> URL: https://issues.apache.org/jira/browse/YARN-8042
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Priority: Major
>
> It has been inconvenient to debug issues that happen on the read path. 
> Typically, a query sent from a client is parsed into a TimelineReaderContext, 
> TimelineEntityFilters and TimelineDataToRetrieve, which are independent of 
> the underlying backend storage implementations. These general ATSv2 
> building blocks are then translated into HBase Scan and Get queries with 
> specific row keys and filters.
> To facilitate debugging, additional debug-level logging messages 
> (ideally ones that can be enabled dynamically without restarting the 
> TimelineReaderServer process) can be added at that boundary to narrow down 
> the scope of an investigation.
> A good example of this is logging the Scan or Get query before it is sent to 
> HBase and the result after the query returns from HBase. YARN support folks 
> who are not necessarily HBase experts can then present the debug messages to 
> HBase experts and get help. (I had to remotely connect to the 
> TimelineReaderServer, set up breakpoints, and capture the HBase queries every 
> time I suspected a bug in HBase.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8043) Add the exception message for failed launches running under LCE

2018-03-17 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8043:
-

 Summary: Add the exception message for failed launches running 
under LCE
 Key: YARN-8043
 URL: https://issues.apache.org/jira/browse/YARN-8043
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Shane Kumpf


If a {{ContainerExecutionException}} is thrown during container launch under 
LCE, information regarding the failure is added to the diagnostics for the 
container. The diagnostic output includes the container id, exit code, "error 
output" and "shell output", with the expectation that the relevant exception 
details are in the "error output".

{{ContainerExecutionException}} has several constructors, with the most 
commonly used being the one that accepts a String. This form does not put the 
exception details into the "error output", which makes it difficult for the 
user to troubleshoot the failure.
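For illustration only, a sketch of the kind of change this issue asks for:
surface the exception message in the diagnostics even when the "error output"
was never populated. The helper below is hypothetical and does not mirror the
actual launcher code.

{code:java}
/**
 * Sketch only: build a diagnostics string for a failed launch that includes
 * the exception message in addition to the captured error/shell output.
 */
public final class LaunchDiagnosticsSketch {
  private LaunchDiagnosticsSketch() {
  }

  public static String buildDiagnostics(String containerId, int exitCode,
      String exceptionMessage, String errorOutput, String shellOutput) {
    StringBuilder sb = new StringBuilder();
    sb.append("Container id: ").append(containerId).append('\n');
    sb.append("Exit code: ").append(exitCode).append('\n');
    if (exceptionMessage != null && !exceptionMessage.isEmpty()) {
      // The new piece: report the exception message even when the
      // String-only constructor left the error output empty.
      sb.append("Exception message: ").append(exceptionMessage).append('\n');
    }
    if (errorOutput != null && !errorOutput.isEmpty()) {
      sb.append("Shell error output: ").append(errorOutput).append('\n');
    }
    if (shellOutput != null && !shellOutput.isEmpty()) {
      sb.append("Shell output: ").append(shellOutput).append('\n');
    }
    return sb.toString();
  }
}
{code}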



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Attachment: YARN-8041.001.patch

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Reporter: Yiran Wu
>Priority: Major
> Attachments: YARN-8041.001.patch
>
>
> Implement routing 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
>  REST invocations transparently to multiple RMs 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-17 Thread Yiran Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403539#comment-16403539
 ] 

Yiran Wu commented on YARN-8041:


add patch.

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.0.0, 2.9.1
>Reporter: Yiran Wu
>Priority: Major
> Fix For: 3.0.0, 2.9.1
>
> Attachments: YARN-8041.001.patch
>
>
> Implement routing 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
>  REST invocations transparently to multiple RMs 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-17 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-7581:
-
Attachment: YARN-7581.05.patch

> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7581-YARN-7055.04.patch, YARN-7581.00.patch, 
> YARN-7581.01.patch, YARN-7581.02.patch, YARN-7581.03.patch, 
> YARN-7581.04.patch, YARN-7581.05.patch
>
>
> Post YARN-7346,
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters() and 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters() 
> start to fail when hbase.profile is set to 2.0.
> *Error Message*
>  [ERROR] Failures:
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters:1266 
> expected:<2> but was:<0>
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters:1523 
> expected:<1> but was:<0>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7988) Refactor FSNodeLabelStore code for attributes store support

2018-03-17 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403563#comment-16403563
 ] 

Bibin A Chundatt commented on YARN-7988:


[~Naganarasimha]/[~sunil.gov...@gmail.com]/[~cheersyang]

Any comments? If you feel it is good to be pushed, I will commit soon.

> Refactor FSNodeLabelStore code for attributes store support
> ---
>
> Key: YARN-7988
> URL: https://issues.apache.org/jira/browse/YARN-7988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-7988-YARN-3409.002.patch, 
> YARN-7988-YARN-3409.003.patch, YARN-7988-YARN-3409.004.patch, 
> YARN-7988.001.patch
>
>
> # Abstract out the FileSystemStore file operations
> # Define EditLog operations and the mirror operation
> # Support compatibility with the old nodelabel store



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8031) NodeManager will fail to start if cpu subsystem is already mounted

2018-03-17 Thread JayceAu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JayceAu updated YARN-8031:
--
Attachment: (was: image-2018-03-15-14-47-30-583.png)

> NodeManager will fail to start if cpu subsystem is already mounted
> --
>
> Key: YARN-8031
> URL: https://issues.apache.org/jira/browse/YARN-8031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: JayceAu
>Priority: Major
>
> If *yarn.nodemanager.linux-container-executor.cgroups.mount* is set to true 
> and the cpu subsystem is not yet mounted, the NodeManager will mount the cpu 
> subsystem and then create the control group, whose default name is 
> *hadoop-yarn*, if the mount step is successful. This procedure works well when 
> the cpu subsystem is not yet mounted. However, in some situations the cpu 
> subsystem is already mounted before the NodeManager starts, and the 
> NodeManager will then fail to start because it has no write permission to the 
> *hadoop-yarn* path. For example:
>  # OSes that use systemd, such as CentOS 7, have the cpu subsystem mounted by 
> default on machine startup
>  # some daemons that start earlier than the NodeManager may also rely on the 
> mounted state of the cpu subsystem. In our production environment, we limit 
> the cpu usage of the monitoring and control agent, which starts on reboot
> In order to solve this problem, container-executor must be able to create the 
> control group *hadoop-yarn* if mounting the controller is successful or the 
> controller is already mounted. Besides, if the cpu subsystem is used in 
> combination with other subsystems and is already mounted, container-executor 
> should use the existing mount point of the cpu subsystem instead of the one 
> provided by the NodeManager.
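Purely as an illustration of the mount-point discovery described in the last
paragraph (the real change would live in the C container-executor; the class
below is hypothetical), the cpu controller's existing mount point can be found
by scanning /proc/mounts:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Sketch only: locate where the cpu cgroup controller is already mounted. */
public final class CpuCgroupMountSketch {
  private CpuCgroupMountSketch() {
  }

  public static Optional<String> findCpuControllerMount() throws IOException {
    List<String> lines = Files.readAllLines(Paths.get("/proc/mounts"));
    for (String line : lines) {
      // /proc/mounts fields: device mountpoint fstype options dump pass
      String[] f = line.split("\\s+");
      if (f.length >= 4 && "cgroup".equals(f[2])
          && Arrays.asList(f[3].split(",")).contains("cpu")) {
        // e.g. /sys/fs/cgroup/cpu,cpuacct on systemd-based distributions
        return Optional.of(f[1]);
      }
    }
    return Optional.empty();
  }
}
{code}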



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403302#comment-16403302
 ] 

Sunil G commented on YARN-8028:
---

Thanks [~leftnoteasy], makes sense to me.

I am fine with the latest patch and will move the other optimizations to a 
separate jira. Committing shortly if there are no objections. Thank you.

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch, YARN-8028.002.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient, we should 
> support similar API in REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8031) NodeManager will fail to start if cpu subsystem is already mounted

2018-03-17 Thread JayceAu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JayceAu updated YARN-8031:
--
Attachment: YARN-8031.001.patch

> NodeManager will fail to start if cpu subsystem is already mounted
> --
>
> Key: YARN-8031
> URL: https://issues.apache.org/jira/browse/YARN-8031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: JayceAu
>Priority: Major
> Attachments: YARN-8031.001.patch
>
>
> If *yarn.nodemanager.linux-container-executor.cgroups.mount* is set to true 
> and the cpu subsystem is not yet mounted, the NodeManager will mount the cpu 
> subsystem and then create the control group, whose default name is 
> *hadoop-yarn*, if the mount step is successful. This procedure works well when 
> the cpu subsystem is not yet mounted. However, in some situations the cpu 
> subsystem is already mounted before the NodeManager starts, and the 
> NodeManager will then fail to start because it has no write permission to the 
> *hadoop-yarn* path. For example:
>  # OSes that use systemd, such as CentOS 7, have the cpu subsystem mounted by 
> default on machine startup
>  # some daemons that start earlier than the NodeManager may also rely on the 
> mounted state of the cpu subsystem. In our production environment, we limit 
> the cpu usage of the monitoring and control agent, which starts on reboot
> In order to solve this problem, container-executor must be able to create the 
> control group *hadoop-yarn* if mounting the controller is successful or the 
> controller is already mounted. Besides, if the cpu subsystem is used in 
> combination with other subsystems and is already mounted, container-executor 
> should use the existing mount point of the cpu subsystem instead of the one 
> provided by the NodeManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8031) NodeManager will fail to start if cpu subsystem is already mounted

2018-03-17 Thread JayceAu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403294#comment-16403294
 ] 

JayceAu commented on YARN-8031:
---

@Miklos Szegedi, after reading the source code and based on my test results, 
if *yarn.nodemanager.linux-container-executor.cgroups.mount* is set to false, 
the NM won't create the hadoop-yarn hierarchy directory even though the cpu 
controller is mounted, which conflicts with what is mentioned in the doc:

[https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html]
{code:java}
The cgroups hierarchy under which to place YARN proccesses(cannot contain 
commas). If yarn.nodemanager.linux-container-executor.cgroups.mount is false 
(that is, if cgroups have been pre-configured) and the YARN user has write 
access to the parent directory, then the directory will be created. If the 
directory already exists, the administrator has to give YARN write permissions 
to it recursively.
{code}
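As a rough sketch of the documented pre-configured behavior quoted above
(illustrative names only, not NodeManager code): create the hierarchy under an
existing controller mount when the parent is writable, otherwise require that
the administrator has already granted write access.

{code:java}
import java.io.File;

/** Sketch only: mirror the documented behavior for pre-configured cgroups. */
public final class CgroupHierarchySketch {
  private CgroupHierarchySketch() {
  }

  /**
   * Returns true if the hierarchy (e.g. "hadoop-yarn") exists and is writable,
   * creating it when the YARN user can write to the parent mount point.
   */
  public static boolean ensureHierarchy(String controllerMount,
      String hierarchy) {
    File parent = new File(controllerMount);
    File dir = new File(parent, hierarchy);
    if (dir.isDirectory()) {
      // Pre-existing directory: the administrator has to give YARN write
      // permissions to it recursively.
      return dir.canWrite();
    }
    return parent.canWrite() && dir.mkdirs();
  }
}
{code}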

> NodeManager will fail to start if cpu subsystem is already mounted
> --
>
> Key: YARN-8031
> URL: https://issues.apache.org/jira/browse/YARN-8031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: JayceAu
>Priority: Major
> Attachments: YARN-8031.001.patch
>
>
> If *yarn.nodemanager.linux-container-executor.cgroups.mount* is set to true 
> and the cpu subsystem is not yet mounted, the NodeManager will mount the cpu 
> subsystem and then create the control group, whose default name is 
> *hadoop-yarn*, if the mount step is successful. This procedure works well when 
> the cpu subsystem is not yet mounted. However, in some situations the cpu 
> subsystem is already mounted before the NodeManager starts, and the 
> NodeManager will then fail to start because it has no write permission to the 
> *hadoop-yarn* path. For example:
>  # OSes that use systemd, such as CentOS 7, have the cpu subsystem mounted by 
> default on machine startup
>  # some daemons that start earlier than the NodeManager may also rely on the 
> mounted state of the cpu subsystem. In our production environment, we limit 
> the cpu usage of the monitoring and control agent, which starts on reboot
> In order to solve this problem, container-executor must be able to create the 
> control group *hadoop-yarn* if mounting the controller is successful or the 
> controller is already mounted. Besides, if the cpu subsystem is used in 
> combination with other subsystems and is already mounted, container-executor 
> should use the existing mount point of the cpu subsystem instead of the one 
> provided by the NodeManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8035) Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing

2018-03-17 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf reassigned YARN-8035:
-

Assignee: Shane Kumpf

> Uncaught exception in ContainersMonitorImpl during relaunch due to the 
> process ID changing
> --
>
> Key: YARN-8035
> URL: https://issues.apache.org/jira/browse/YARN-8035
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-8035.001.patch
>
>
> In the case of a container relaunch event, the container ID is reused but a 
> new process is spawned. For resource monitoring, {{ContainersMonitorImpl}} 
> will obtain the new PID post relaunch and initialize the process tree 
> monitoring. As part of this initialization, a tag called {{ContainerPid}}, 
> whose value is the PID for the container, is populated for the metrics 
> associated with the container. If the prior container failed after its 
> process started, the original PID will already be populated for the 
> container, resulting in the {{MetricsException}} below.
> {code:java}
> 2018-03-16 11:59:02,563 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Uncaught exception in ContainersMonitorImpl while monitoring resource of 
> container_1521201379995_0001_01_02
> org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448){code}
> {{MetricsRegistry}} provides a {{tag}} method that allows updating the value 
> of an existing tag. Updating the value ensures that the PID associated with 
> the container is that of the currently running process, which appears to be 
> an appropriate fix. However, it's unclear how this tag might be used by 
> other systems; I'm not finding any usage in Hadoop itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8035) Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing

2018-03-17 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8035:
--
Attachment: YARN-8035.001.patch

> Uncaught exception in ContainersMonitorImpl during relaunch due to the 
> process ID changing
> --
>
> Key: YARN-8035
> URL: https://issues.apache.org/jira/browse/YARN-8035
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-8035.001.patch
>
>
> In the case of a container relaunch event, the container ID is reused but a 
> new process is spawned. For resource monitoring, {{ContainersMonitorImpl}} 
> will obtain the new PID post relaunch and initialize the process tree 
> monitoring. As part of this initialization, a tag called {{ContainerPid}}, 
> whose value is the PID for the container, is populated for the metrics 
> associated with the container. If the prior container failed after its 
> process started, the original PID will already be populated for the 
> container, resulting in the {{MetricsException}} below.
> {code:java}
> 2018-03-16 11:59:02,563 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Uncaught exception in ContainersMonitorImpl while monitoring resource of 
> container_1521201379995_0001_01_02
> org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448){code}
> {{MetricsRegistry}} provides a {{tag}} method that allows updating the value 
> of an existing tag. Updating the value ensures that the PID associated with 
> the container is that of the currently running process, which appears to be 
> an appropriate fix. However, it's unclear how this tag might be used by 
> other systems; I'm not finding any usage in Hadoop itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8037) CGroupsResourceCalculator logs excessive warnings on container relaunch

2018-03-17 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8037:
--
Summary: CGroupsResourceCalculator logs excessive warnings on container 
relaunch  (was: CGroupsResourceCalculator excessive warnings on container 
relaunch)

> CGroupsResourceCalculator logs excessive warnings on container relaunch
> ---
>
> Key: YARN-8037
> URL: https://issues.apache.org/jira/browse/YARN-8037
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Major
>
> When a container is relaunched, the old process no longer exists. When using 
> the {{CGroupsResourceCalculator}}, this results in the warning and exception 
> below being logged every second until the relaunch occurs, which is excessive 
> and fills up the logs.
> {code:java}
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse 12844
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_02/cpuacct.stat
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse cgroups 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.memsw.usage_in_bytes
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.usage_in_bytes
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more{code}
> At a minimum, we should consider moving the exception logging to debug level 
> to reduce the noise. Alternatively, it may make sense to stop the existing 
> {{MonitoringThread}} during relaunch.
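To make the debug-level suggestion concrete, a minimal standalone sketch with
illustrative names (not the actual {{CGroupsResourceCalculator}} code):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Sketch only: treat a vanished cgroup file as an expected, transient case. */
public final class CgroupFileReaderSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(CgroupFileReaderSketch.class);

  private CgroupFileReaderSketch() {
  }

  public static long readSingleValue(String cgroupFile) {
    try {
      return Long.parseLong(
          Files.readAllLines(Paths.get(cgroupFile)).get(0).trim());
    } catch (IOException | IndexOutOfBoundsException
        | NumberFormatException e) {
      // The file routinely disappears between a container exit and its
      // relaunch, so DEBUG keeps the NM log readable while still recording
      // the detail for troubleshooting.
      LOG.debug("Failed to parse cgroups file {}", cgroupFile, e);
      return -1L;
    }
  }
}
{code}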



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org