Manikandan R created YARN-5370:
----------------------------------

             Summary: Setting yarn.nodemanager.delete.debug-delay-sec to high 
number crashes NM because of OOM
                 Key: YARN-5370
                 URL: https://issues.apache.org/jira/browse/YARN-5370
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Manikandan R


I set yarn.nodemanager.delete.debug-delay-sec to 100+ days on my dev cluster 
for various reasons, about 3-4 weeks ago. Since then, the NM has crashed with 
OOM from time to time. As a temporary fix, I have gradually raised the NM heap 
from 512 MB to 6 GB over the past few weeks, increasing it each time a crash 
occurred. Sometimes the NM does not start cleanly and only begins functioning 
after multiple tries. Analyzing a heap dump of the NM JVM shows that 
DeletionService (DeletionService.java) is occupying almost 99% of the total 
allocated memory (-Xmx), something like this:

org.apache.hadoop.yarn.server.nodemanager.DeletionService$DelServiceSchedThreadPoolExecutor
 @ 0x6c1d09068| 80 | 3,544,094,696 | 99.13%

Basically, there is a huge number of the above-mentioned tasks scheduled for 
deletion. Usually, I see NM memory requirements of 2-4 GB for large clusters. 
In my case, the cluster is very small and OOM still occurs.
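
To illustrate the accumulation, here is a minimal sketch. It is not the actual 
DeletionService code. It assumes (based only on the class name in the heap 
dump) that deletions are queued on a ScheduledThreadPoolExecutor, and the class 
name, method name, and paths below are hypothetical. Every task scheduled with 
a 100-day delay stays in the executor's queue, together with whatever it 
references, until the delay expires:

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
class DelayedDeletionBacklog {

    // Schedules `count` placeholder "deletion" tasks with the given delay and
    // returns how many of them are still waiting in the executor's queue.
    static int backlogAfterScheduling(int count, long delaySec) {
        ScheduledThreadPoolExecutor sched = new ScheduledThreadPoolExecutor(4);
        try {
            for (int i = 0; i < count; i++) {
                // Each task captures its own path string, so the string stays
                // reachable for as long as the task sits in the queue.
                final String path = "/tmp/hypothetical/container_" + i;
                sched.schedule(
                    () -> System.out.println("would delete " + path),
                    delaySec, TimeUnit.SECONDS);
            }
            // Nothing has run yet because the delay is far in the future.
            return sched.getQueue().size();
        } finally {
            // Discard the pending tasks so the JVM can exit.
            sched.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long hundredDaysSec = TimeUnit.DAYS.toSeconds(100);
        System.out.println(backlogAfterScheduling(10_000, hundredDaysSec));
    }
}
```

With a delay this long, the queue only grows as containers finish, so the 
retained heap grows with it regardless of cluster size.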

Is this expected behaviour? Or is there a limit we can expose on 
yarn.nodemanager.delete.debug-delay-sec to avoid this kind of issue?
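
As a rough sketch of what such a limit might look like: the class, method, and 
the 7-day ceiling below are purely hypothetical, chosen for illustration. They 
are not an existing YARN setting or API. The idea is simply to clamp the 
configured delay before any deletion task is scheduled:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop.
class DebugDelayLimit {

    // Assumed ceiling for illustration only; YARN defines no such constant.
    static final long HYPOTHETICAL_MAX_DELAY_SEC = TimeUnit.DAYS.toSeconds(7);

    // Returns the delay to actually use: negative values are treated as 0 in
    // this sketch, and values above the ceiling are capped at the ceiling.
    static long effectiveDelaySec(long configuredSec) {
        return Math.max(0L, Math.min(configuredSec, HYPOTHETICAL_MAX_DELAY_SEC));
    }

    public static void main(String[] args) {
        long configured = TimeUnit.DAYS.toSeconds(100);
        System.out.println(effectiveDelaySec(configured));
    }
}
```

Whether a hard cap, a warning in the log, or a bound on the number of pending 
deletions is more appropriate is an open question.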



