[ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827893#comment-16827893 ]

Shurong Mai edited comment on YARN-5449 at 4/29/19 3:37 AM:
------------------------------------------------------------

[~rohithsharma], thank you for your attention and advice. Before I created 
this issue, we had been analyzing it for a long time: the jvm process thread 
stacks, the jvm process heap memory, different java versions, the os logs, 
different os versions, different os file systems, and so on. But we could not 
determine the reason for sure. From our analysis, we guessed that the most 
probable reason the nodemanager process hung was a disk hang while 
reading/writing the disk, but we have not proved that yet.
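
For reference, the diagnostics we collected were along these lines (a minimal 
sketch; NM_PID is an illustrative variable holding the nodemanager pid):

  # Locate the nodemanager JVM and capture the evidence mentioned above.
  NM_PID=$(pgrep -f org.apache.hadoop.yarn.server.nodemanager.NodeManager)
  jstack -F $NM_PID > nm-threads.txt    # thread stacks; -F forces a dump from an unresponsive JVM
  jmap -heap $NM_PID > nm-heap.txt      # heap usage summary
  jstat -gccause $NM_PID 1000 100 > nm-gc.txt   # GC counters, one sample per second, 100 samples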


was (Author: shurong.mai):
[~rohithsharma], thank you for your attention and advice. Before I created 
this issue, we had been analyzing it for a long time: the jvm process thread 
stacks, the jvm process heap memory, different java versions, the os logs, 
different os versions, different os file systems, and so on. But we could not 
determine the reason for sure. From our analysis, the most probable conclusion 
is that the nodemanager process is hung.

> nodemanager process is hung, and lost from resourcemanager
> ----------------------------------------------------------
>
>                 Key: YARN-5449
>                 URL: https://issues.apache.org/jira/browse/YARN-5449
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.2.0
>         Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>            Reporter: Shurong Mai
>            Priority: Major
>
> The nodemanager process is hung (but not dead), and it is lost from the resourcemanager.
> The nodemanager's log stops printing.
> The cpu usage of the nodemanager process is very low (nearly 0%).
> GC of the nodemanager jvm process has stopped; the output of jstat (jstat 
> -gccause pid 1000 100) is as follows:
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
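> One way to read this output: YGC, FGC, and GCT never advance across samples 
> even though eden is nearly full, which distinguishes a hung jvm from a merely 
> idle one. As an illustrative sketch (not something we deployed), a watchdog 
> could flag the frozen counters like this:
>   # Compare total GC time (last column of jstat -gcutil) across a 60s gap;
>   # on a loaded nodemanager a frozen GCT value is suspicious.
>   GCT1=$(jstat -gcutil $NM_PID | awk 'NR==2 {print $NF}')
>   sleep 60
>   GCT2=$(jstat -gcutil $NM_PID | awk 'NR==2 {print $NF}')
>   [ "$GCT1" = "$GCT2" ] && echo "nodemanager $NM_PID may be hung (GC counters frozen)"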
> This problem occurs whether the nodemanager jvm process uses the CMS garbage 
> collector or the G1 garbage collector.
> The parameters of the CMS garbage collector are as follows:
> -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m 
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
> -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 
> The parameters of the G1 garbage collector are as follows:
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC  
> -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
> -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4  
> -XX:+PrintAdaptiveSizePolicy
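
For completeness, flags like the ones above are normally passed to the 
nodemanager jvm through yarn-env.sh; a sketch, assuming the stock Hadoop 2.x 
bin/yarn picks up YARN_NODEMANAGER_OPTS:

  # In $HADOOP_CONF_DIR/yarn-env.sh; bin/yarn appends YARN_NODEMANAGER_OPTS
  # to the nodemanager's JVM arguments at startup.
  export YARN_NODEMANAGER_OPTS="-Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m \
    -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 \
    -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 \
    -XX:+PrintAdaptiveSizePolicy"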


