[ 
https://issues.apache.org/jira/browse/YARN-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

徐鹏 updated YARN-9769:
---------------------
    Attachment: image-2019-08-21-16-35-41-882.png

> if "ContainerLocalizer Downloader" thread block ,it will never stop
> -------------------------------------------------------------------
>
>                 Key: YARN-9769
>                 URL: https://issues.apache.org/jira/browse/YARN-9769
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.5.0
>         Environment: hadoop:2.5.0-cdh5.2.0
>            Reporter: 徐鹏
>            Priority: Major
>         Attachments: image-2019-08-21-16-31-23-444.png, 
> image-2019-08-21-16-31-58-374.png, image-2019-08-21-16-32-07-920.png, 
> image-2019-08-21-16-32-39-938.png, image-2019-08-21-16-33-07-159.png, 
> image-2019-08-21-16-33-55-739.png, image-2019-08-21-16-35-41-882.png, 
> nm_jstack
>
>
> If a "ContainerLocalizer Downloader" thread blocks, it never stops, and the 
> NodeManager JVM eventually runs out of memory. The NodeManager should fail a 
> blocked "ContainerLocalizer Downloader" thread after a timeout.
>   
>  In my case:
>      *NM JVM main options*:
>  -XX:InitialHeapSize=2147483648 -XX:MaxGCPauseMillis=200 
> -XX:MaxHeapSize=2147483648 -XX:MaxNewSize=1287651328 
> -XX:MinHeapDeltaBytes=1048576 -XX:+UseG1GC
>      *GC*: runs frequently but reclaims little (old gen >= 99%)
>   
>     !image-2019-08-21-16-31-23-444.png!
>     *jstack & jmap*: 3602 "ContainerLocalizer Downloader" threads are blocked, 
> 561 MB in total
>   
> {code:java}
> "ContainerLocalizer Downloader" #59288379 prio=5 os_prio=0 tid=0x00007f9c62d9d800 nid=0xb7550 waiting on condition [0x00007f9b1c2c0000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x000000008057ddb0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
>         at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:254)
>         at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:432)
>         at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1016)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:449)
>         at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:783)
>         at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:717)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:394)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:305)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:590)
>         - locked <0x00000000fa4ce540> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:797)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:844)
>         - locked <0x00000000fa4ce540> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at java.io.DataInputStream.read(DataInputStream.java:100)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
>         at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:264)
>         at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:60)
>         at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:356)
>         at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:354)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1701)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:354)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
>  
> !image-2019-08-21-16-31-58-374.png!
>   !image-2019-08-21-16-32-07-920.png!
> *ContainerLocalizer.class*
>   
> !image-2019-08-21-16-33-07-159.png!  
> *Add loop termination*
> *!image-2019-08-21-16-33-55-739.png!*
>    [^nm_jstack]
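
The timeout requested in the description can be illustrated with a small, self-contained Java sketch. This is not the NodeManager code and not the patch shown in the attached screenshots; the class name DownloadTimeoutSketch, the method awaitWithTimeout, the Result enum, and the timeout values are all hypothetical names chosen for illustration. The sketch bounds how long a caller waits on a download Future and cancels the task when the deadline passes. One caveat, based on the stack trace above: the blocked thread is parked in awaitUninterruptibly, which does not respond to interrupts, so cancel(true) would probably not release such a thread. A timeout of this kind ends the waiting loop but does not by itself reclaim a worker stuck in an uninterruptible wait.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bound the wait on a download task instead of waiting forever.
class DownloadTimeoutSketch {

    /** Outcome of waiting for one download (hypothetical type). */
    enum Result { COMPLETED, TIMED_OUT, FAILED }

    /**
     * Waits at most timeoutMillis for the download to finish.
     * On timeout the task is cancelled with an interrupt, which only helps
     * if the worker is blocked in an interruptible call.
     */
    static Result awaitWithTimeout(Future<?> download, long timeoutMillis)
            throws InterruptedException {
        try {
            download.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Result.COMPLETED;
        } catch (TimeoutException e) {
            // Best effort: marks the Future cancelled and interrupts the worker.
            download.cancel(true);
            return Result.TIMED_OUT;
        } catch (ExecutionException e) {
            // The download itself threw an exception.
            return Result.FAILED;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // A download that finishes immediately.
            Future<String> fast = pool.submit(() -> "done");
            // A download that never finishes on its own (interruptible wait).
            CountDownLatch never = new CountDownLatch(1);
            Future<Void> stuck = pool.submit(() -> { never.await(); return null; });

            System.out.println(awaitWithTimeout(fast, 1000));
            System.out.println(awaitWithTimeout(stuck, 200));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch the stuck task waits on a CountDownLatch, which is interruptible, so the cancellation does free the pool thread. A thread blocked as in the jstack output above would need a fix at the point where it waits, or a separate cleanup strategy, in addition to a caller-side timeout.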



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
