Manikandan R created YARN-5370:
----------------------------------

             Summary: Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM because of OOM
                 Key: YARN-5370
                 URL: https://issues.apache.org/jira/browse/YARN-5370
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Manikandan R

I set yarn.nodemanager.delete.debug-delay-sec to more than 100 days on my dev cluster, for various reasons, about 3-4 weeks ago. Since then, the NM occasionally crashes with an OOM error. As a temporary fix, I have gradually raised the NM heap from 512 MB to 6 GB over the past few weeks, increasing it each time a crash occurred. Sometimes the NM does not start cleanly and only begins functioning after multiple attempts.

Analyzing a heap dump of the corresponding JVM shows that DeletionService accounts for almost 99% of the total allocated memory (-Xmx), something like this:

org.apache.hadoop.yarn.server.nodemanager.DeletionService$DelServiceSchedThreadPoolExecutor @ 0x6c1d09068 | 80 | 3,544,094,696 | 99.13%

Basically, a huge number of the tasks mentioned above are scheduled for deletion and held by this executor.

Usually I see NM memory requirements of 2-4 GB for large clusters. In my case the cluster is very small, yet the OOM still occurs.

Is this expected behaviour? Or is there a limit we could enforce on yarn.nodemanager.delete.debug-delay-sec to avoid this kind of issue?
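
To illustrate why the heap might grow this way, here is a minimal standalone sketch. It is not the NodeManager code. It assumes that deletions are handed to a java.util.concurrent.ScheduledThreadPoolExecutor with the debug delay as the schedule delay, which is what the class name in the heap dump suggests. The class name DelayedDeletionBacklog, the method queuedAfterScheduling, and the /tmp/hypothetical paths are made up for the example. Every task scheduled with a 100-day delay stays in the executor's queue, together with whatever it references, until the delay expires.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DelayedDeletionBacklog {

    // Hypothetical helper: schedules taskCount no-op "deletion" tasks with the
    // given delay and reports how many are still waiting in the executor queue.
    static int queuedAfterScheduling(int taskCount, long delaySec) {
        // Assumption: a plain scheduled pool stands in for the NM's executor.
        ScheduledThreadPoolExecutor sched = new ScheduledThreadPoolExecutor(4);
        try {
            for (int i = 0; i < taskCount; i++) {
                // Each task captures its own path string, so it holds memory
                // for as long as it sits in the queue.
                final String path = "/tmp/hypothetical/container_" + i;
                sched.schedule(() -> { path.length(); /* would delete path here */ },
                        delaySec, TimeUnit.SECONDS);
            }
            // Nothing has run yet: the delay is far in the future, so every
            // scheduled task is still queued.
            return sched.getQueue().size();
        } finally {
            // Discard the pending tasks so the JVM can exit.
            sched.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long hundredDays = TimeUnit.DAYS.toSeconds(100);
        System.out.println(queuedAfterScheduling(10_000, hundredDays)); // prints 10000
    }
}
```

With a delay this long, the queue only ever grows while the process is up, so the retained size is roughly proportional to the number of containers and localized resources cleaned up since the setting was applied, regardless of cluster size.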