[
https://issues.apache.org/jira/browse/YARN-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957125#comment-16957125
]
Hadoop QA commented on YARN-9929:
---------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m
0s{color} | {color:red} The patch doesn't appear to include any new or modified
tests. Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
47s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 4m
21s{color} | {color:red} branch has errors when building and testing our client
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m
14s{color} | {color:red} hadoop-yarn-server-nodemanager in trunk failed.
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m
0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
15m 25s{color} | {color:green} patch has no errors when building and testing
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m
43s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed.
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
33s{color} | {color:green} The patch does not generate ASF License warnings.
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 16s{color} |
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9929 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/12983736/YARN-9929.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall
mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a3de11274f8a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 19f35cf |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs |
https://builds.apache.org/job/PreCommit-YARN-Build/25025/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
|
| Test Results |
https://builds.apache.org/job/PreCommit-YARN-Build/25025/testReport/ |
| Max. process+thread count | 400 (vs. ulimit of 5500) |
| modules | C:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
U:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
|
| Console output |
https://builds.apache.org/job/PreCommit-YARN-Build/25025/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> NodeManager OOM because of stuck DeletionService
> ------------------------------------------------
>
> Key: YARN-9929
> URL: https://issues.apache.org/jira/browse/YARN-9929
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.2
> Reporter: kyungwan nam
> Assignee: kyungwan nam
> Priority: Major
> Attachments: YARN-9929.001.patch, nm_heapdump.png
>
>
> NMs go through frequent Full GC due to a lack of heap memory.
> we can find a lot of FileDeletionTask, DockerContainerDeletionTask from the
> heap dump (screenshot is attached)
> and after analyzing the thread dump, we can figure out _DeletionService_ gets
> stuck in _executeStatusCommand_ which run 'docker inspect'
> {code:java}
> "DeletionService #0" - Thread t@41
> java.lang.Thread.State: RUNNABLE
> at java.io.FileInputStream.readBytes(Native Method)
> at java.io.FileInputStream.read(FileInputStream.java:255)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> - locked <649fc0cf> (a java.lang.UNIXProcess$ProcessPipeInputStream)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
> - locked <3e45c938> (a java.io.InputStreamReader)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:161)
> at java.io.BufferedReader.read1(BufferedReader.java:212)
> at java.io.BufferedReader.read(BufferedReader.java:286)
> - locked <3e45c938> (a java.io.InputStreamReader)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1240)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:995)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:91)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:180)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:118)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:937)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - locked <4cc6fa2a> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> also, we found 'docker inspect' processes are running for a long time as
> follows.
> {code:java}
> root 95637 0.0 0.0 2650984 35776 ? Sl Aug23 5:48
> /usr/bin/docker inspect --format={{.State.Status}}
> container_e30_1555419799458_0014_01_000030
> root 95638 0.0 0.0 2773860 33908 ? Sl Aug23 5:33
> /usr/bin/docker inspect --format={{.State.Status}}
> container_e50_1561100493387_25316_01_001455
> root 95641 0.0 0.0 2445924 34204 ? Sl Aug23 5:34
> /usr/bin/docker inspect --format={{.State.Status}}
> container_e49_1560851258686_2107_01_000024
> root 95643 0.0 0.0 2642532 34428 ? Sl Aug23 5:30
> /usr/bin/docker inspect --format={{.State.Status}}
> container_e50_1561100493387_8111_01_002657{code}
>
> I think It has occurred since docker daemon is restarted.
> 'docker inspect' which was run while restarting the docker daemon was not
> working. and not even it was not terminated.
> It can be considered as a docker issue.
> but It could happen whenever if 'docker inspect' does not work due to docker
> daemon restarting or docker bug.
> It would be good to set the timeout for 'docker inspect' to avoid this issue.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]