[
https://issues.apache.org/jira/browse/YARN-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
agoodboy updated YARN-9778:
---------------------------
Description:
The NodeManager does not clean the local filecache dir even when its size exceeds the
limit configured in yarn-site.xml. The relevant config in yarn-site.xml is as follows:
<property>
<name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.nodemanager.localizer.cache.target-size-mb</name>
<value>10240</value>
</property>
<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
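To make the units in these settings explicit, here is a minimal Python sketch that reads the two cache properties and converts them to minutes and GB. The `<configuration>` wrapper and the `read_properties` helper are assumptions added for illustration; only the property names and values come from the report.

```python
import xml.etree.ElementTree as ET

# Excerpt of the reported settings, wrapped in a <configuration> root
# (the wrapper is an assumption; only the two properties are from the report).
YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
    <value>600000</value>
  </property>
  <property>
    <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
    <value>10240</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


props = read_properties(YARN_SITE)
# Milliseconds to minutes, and MB to GB, for easier reading.
interval_min = int(props["yarn.nodemanager.localizer.cache.cleanup.interval-ms"]) / 60000
target_gb = int(props["yarn.nodemanager.localizer.cache.target-size-mb"]) / 1024
print(f"cleanup interval: {interval_min:g} min, target size: {target_gb:g} GB")
```

With the values above, the cleanup is configured to run every 10 minutes with a 10 GB target.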
I use Docker to run my program, and inside the Docker container I download files
from HDFS to a local dir. But after the Docker container is killed or exits, the files
are not cleaned up by the NodeManager, so the filecache dir keeps growing and the node
enters an unhealthy state. The docker start command mounts dirs like this:
-v=/data1/hadoop/yarn/local/filecache/2115/models.tar.gz/models:/home/hadoop/xdl/models:rw
-v=/data1/hadoop/yarn/local/filecache/2116:/data1/hadoop/yarn/local/filecache/2116
-v=/data1/hadoop/yarn/local/filecache/2117:/data1/hadoop/yarn/local/filecache/2117
For example, the filecache dir size is
$ sudo du -sh .
112G .
But the NodeManager still does not clean it, even though I set the cache size to 10 GB.
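The comparison described here can be reproduced with a small script. This is only a sketch: `dir_size_bytes` and `exceeds_target` are hypothetical helper names, and the script sums apparent file sizes, which can differ somewhat from the disk usage that `du` reports.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files and symlinks under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                # lstat does not follow symlinks, so linked targets are not counted.
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def exceeds_target(path, target_size_mb):
    """Return True if the directory is larger than target_size_mb megabytes."""
    return dir_size_bytes(path) > target_size_mb * 1024 * 1024
```

For the setup in this report, one would call `exceeds_target("/data1/hadoop/yarn/local/filecache", 10240)`, run as a user that can read the directory.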
was:
The NodeManager does not clean the local filecache dir even when its size exceeds the
limit configured in yarn-site.xml. The relevant config in yarn-site.xml is as follows:
<property>
<name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.nodemanager.localizer.cache.target-size-mb</name>
<value>10240</value>
</property>
<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
I use Docker to run my program, and inside the Docker container I download files
from HDFS to a local dir. But after the Docker container is killed or exits, the files
are not cleaned up by the NodeManager, so the filecache dir keeps growing and the node
enters an unhealthy state. The docker start command mounts dirs like this:
-v=/data1/hadoop/yarn/local/filecache/2115/models.tar.gz/models:/home/hadoop/xdl/models:rw
-v=/data1/hadoop/yarn/local/filecache/2116:/data1/hadoop/yarn/local/filecache/2116
-v=/data1/hadoop/yarn/local/filecache/2117:/data1/hadoop/yarn/local/filecache/2117
For example, the filecache dir size is
$ sudo du -sh .
filecache$ sudo du -sh .
112G .
But the NodeManager still does not clean it, even though I set the cache size to 10 GB.
> Nodemanager does not clean public cache(filecache)
> --------------------------------------------------
>
> Key: YARN-9778
> URL: https://issues.apache.org/jira/browse/YARN-9778
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.1.2
> Environment: HDP 3.1.0.78
> Reporter: agoodboy
> Priority: Major
>
> The NodeManager does not clean the local filecache dir even when its size exceeds the
> limit configured in yarn-site.xml. The relevant config in yarn-site.xml is as follows:
> <property>
> <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
> <value>600000</value>
> </property>
> <property>
> <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
> <value>10240</value>
> </property>
> <property>
> <name>yarn.nodemanager.container-executor.class</name>
> <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
> </property>
>
> I use Docker to run my program, and inside the Docker container I download files
> from HDFS to a local dir. But after the Docker container is killed or exits, the files
> are not cleaned up by the NodeManager, so the filecache dir keeps growing and the node
> enters an unhealthy state. The docker start command mounts dirs like this:
> -v=/data1/hadoop/yarn/local/filecache/2115/models.tar.gz/models:/home/hadoop/xdl/models:rw
> -v=/data1/hadoop/yarn/local/filecache/2116:/data1/hadoop/yarn/local/filecache/2116
> -v=/data1/hadoop/yarn/local/filecache/2117:/data1/hadoop/yarn/local/filecache/2117
>
> For example, the filecache dir size is
> $ sudo du -sh .
> 112G .
>
> But the NodeManager still does not clean it, even though I set the cache size to 10 GB.
>