Manikandan R created YARN-5370:
----------------------------------
Summary: Setting yarn.nodemanager.delete.debug-delay-sec to high
number crashes NM because of OOM
Key: YARN-5370
URL: https://issues.apache.org/jira/browse/YARN-5370
Project: Hadoop YARN
Issue Type: Bug
Reporter: Manikandan R
I set yarn.nodemanager.delete.debug-delay-sec to a value equivalent to more
than 100 days on my dev cluster, for various reasons, about 3-4 weeks ago.
Since then, the NM has crashed from time to time with an OOM error. As a
temporary fix, I have gradually raised the NM heap from 512 MB to 6 GB over
the past few weeks, increasing it each time a crash occurred. Sometimes the NM
does not start cleanly and only begins functioning after several attempts.
Analyzing a heap dump of the NM JVM showed that DeletionService accounts for
almost 99% of the total allocated memory (-Xmx), for example:
org.apache.hadoop.yarn.server.nodemanager.DeletionService$DelServiceSchedThreadPoolExecutor
@ 0x6c1d09068| 80 | 3,544,094,696 | 99.13%
In other words, a huge number of the tasks shown above are scheduled for
deletion and waiting in memory. NM memory requirements are usually around
2-4 GB for large clusters. My cluster is very small, yet the OOM still occurs.
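
To illustrate why a long delay keeps so many tasks on the heap, here is a
minimal sketch using a plain java.util.concurrent.ScheduledThreadPoolExecutor.
It is not the NM's DeletionService code; the class name, method name, and
paths are hypothetical. It only shows that every task scheduled with a
100-day delay stays in the executor's queue until the delay expires.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PendingDeletionDemo {

    /**
     * Schedules taskCount no-op "deletion" tasks with the given delay and
     * returns how many of them are still held in the executor's queue.
     */
    static int pendingAfterScheduling(int taskCount, long delayDays) {
        ScheduledThreadPoolExecutor sched = new ScheduledThreadPoolExecutor(1);
        try {
            for (int i = 0; i < taskCount; i++) {
                // Hypothetical path; each task object keeps its own state alive.
                final String path = "/tmp/hypothetical/container_" + i;
                Runnable task = () -> System.out.println("delete " + path);
                sched.schedule(task, delayDays, TimeUnit.DAYS);
            }
            // Nothing has run yet, so every task is still queued.
            return sched.getQueue().size();
        } finally {
            // Discard the pending tasks so the JVM can exit.
            sched.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(pendingAfterScheduling(10_000, 100));
    }
}
```

With a short delay the queue drains as tasks run. With a delay of months,
the queue only grows for as long as the process stays up.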
Is this expected behaviour? Or is there a limit we could enforce on
yarn.nodemanager.delete.debug-delay-sec to avoid this kind of issue?
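
As a rough idea of what such a limit could look like, the sketch below clamps
the configured delay to an upper bound before it is used for scheduling. The
class name, the method, and the 7-day ceiling are all assumptions made for
illustration; none of them exist in YARN today.

```java
import java.util.concurrent.TimeUnit;

public class DebugDelayLimit {

    // Hypothetical ceiling (7 days); not an existing YARN setting.
    static final long MAX_DEBUG_DELAY_SEC = TimeUnit.DAYS.toSeconds(7);

    /** Returns the delay to actually use, given the configured value in seconds. */
    static long effectiveDelaySec(long configuredSec) {
        if (configuredSec < 0) {
            return 0; // treat negative values as "no delay" in this sketch
        }
        return Math.min(configuredSec, MAX_DEBUG_DELAY_SEC);
    }

    public static void main(String[] args) {
        long hundredDays = TimeUnit.DAYS.toSeconds(100);
        System.out.println(effectiveDelaySec(hundredDays)); // capped at 7 days
        System.out.println(effectiveDelaySec(600));         // left unchanged
    }
}
```

Logging a warning whenever the configured value is capped would make the
behaviour visible to whoever set it.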