[
https://issues.apache.org/jira/browse/YARN-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Billie Rinaldi resolved YARN-7426.
----------------------------------
Resolution: Duplicate
> Interrupt does not work when LocalizerRunner is reading from InputStream
> ------------------------------------------------------------------------
>
> Key: YARN-7426
> URL: https://issues.apache.org/jira/browse/YARN-7426
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Priority: Critical
>
> When the NodeManager is overloaded and ContainerLocalizer processes hang,
> the containers time out and are cleaned up. The LocalizerRunner thread is
> interrupted during cleanup, but the interrupt has no effect while the
> thread is blocked reading from a FileInputStream. LocalizerRunner threads
> and ContainerLocalizer processes keep accumulating, which makes the node
> completely unresponsive. We can add a timeout for the Shell command to
> avoid this, similar to HADOOP-13817.
> The timeout value can be set by the AM, the same as the container timeout.
> ContainerLocalizer JVM stacktrace:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007fd8ec019000 nid=0xc295 runnable [0x00007fd8f3956000]
> java.lang.Thread.State: RUNNABLE
> at java.util.zip.ZipFile.open(Native Method)
> at java.util.zip.ZipFile.<init>(ZipFile.java:219)
> at java.util.zip.ZipFile.<init>(ZipFile.java:149)
> at java.util.jar.JarFile.<init>(JarFile.java:166)
> at java.util.jar.JarFile.<init>(JarFile.java:103)
> at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893)
> at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756)
> at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838)
> at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830)
> at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:803)
> at sun.misc.URLClassPath$3.run(URLClassPath.java:530)
> at sun.misc.URLClassPath$3.run(URLClassPath.java:520)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.misc.URLClassPath.getLoader(URLClassPath.java:519)
> at sun.misc.URLClassPath.getLoader(URLClassPath.java:492)
> - locked <0x000000076ac75058> (a sun.misc.URLClassPath)
> at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:457)
> - locked <0x000000076ac75058> (a sun.misc.URLClassPath)
> at sun.misc.URLClassPath.getResource(URLClassPath.java:211)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> - locked <0x000000076ac7f960> (a java.lang.Object)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
> {code}
> NodeManager LocalizerRunner thread which is not interrupted:
> {code}
> "LocalizerRunner for container_e746_1508665985104_601806_01_000005" #3932753 prio=5 os_prio=0 tid=0x00007fb258d5f800 nid=0x11091 runnable [0x00007fb153946000]
> java.lang.Thread.State: RUNNABLE
> at java.io.FileInputStream.readBytes(Native Method)
> at java.io.FileInputStream.read(FileInputStream.java:255)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> - locked <0x0000000718502b80> (a java.lang.UNIXProcess$ProcessPipeInputStream)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
> - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:161)
> at java.io.BufferedReader.read1(BufferedReader.java:212)
> at java.io.BufferedReader.read(BufferedReader.java:286)
> - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1155)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
> at org.apache.hadoop.util.Shell.run(Shell.java:848)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
> NM log shows the LocalizerRunner is supposed to
> {code}
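The timeout proposed in the description can be illustrated with plain JDK calls. The following is only a minimal sketch, not the Hadoop Shell API and not the fix tracked in the duplicate issue: the class name BoundedCommand, the method runWithTimeout, and the TIMED_OUT sentinel are hypothetical names chosen for illustration, and the usage assumes a POSIX `sleep` command on the PATH. It waits with Process.waitFor(timeout, unit), which responds to both the deadline and Thread.interrupt(), instead of sitting in a blocking stream read, which does not.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Hadoop.
public class BoundedCommand {
    // Sentinel returned when the child had to be killed after the deadline.
    public static final int TIMED_OUT = -1;

    /**
     * Runs a command and returns its exit code, or TIMED_OUT if it did not
     * finish within timeoutMillis. The child is killed on timeout and also
     * if the calling thread is interrupted while waiting.
     */
    public static int runWithTimeout(long timeoutMillis, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Let the child write to the parent's stdout/stderr so this thread
        // never blocks in read() on the child's pipe.
        pb.inheritIO();
        Process p = pb.start();
        try {
            // Unlike a blocking FileInputStream.read(), this wait returns at
            // the deadline and throws InterruptedException on interrupt.
            if (p.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return p.exitValue();
            }
            return TIMED_OUT;
        } finally {
            // Runs on timeout and on interrupt: do not leave the child behind.
            if (p.isAlive()) {
                p.destroyForcibly();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumes a POSIX `sleep`; the 5 second sleep exceeds the 200 ms budget.
        int rc = runWithTimeout(200, "sleep", "5");
        System.out.println(rc == TIMED_OUT ? "timed out, child killed" : "exit " + rc);
    }
}
```

Killing the child in the finally block covers both exit paths, so neither a missed deadline nor an interrupt during cleanup leaves a stray process accumulating on the node.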