Hi,
I'm running 2.2.0 clusters, and my application is fairly disk-I/O intensive
(it processes huge zip files). Over time I have seen some jobs fail with
"no space on disk" errors. Normally the leftover files get cleaned up, but
when for some reason they are not, I would expect no new tasks to be
scheduled on that node; in reality, though, new tasks keep landing on that
node and keep failing. My application writes its data to /tmp (which is
where the disk can fill up), so I configured the properties below:
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/scratch/usr/software/hadoop2/hadoop-dc/temp/nm-local-dir,/tmp/nm-local-dir</value>
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>1.0</value>
</property>
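As a side note, I assume the node's health status as seen by the
ResourceManager can be checked with the yarn CLI roughly like this (the node
id below is just a placeholder for my NodeManager's id):

  yarn node -list
  yarn node -status <nm-host>:<nm-port>

and that the Health-Report field in the -status output would reflect the
disk checker's verdict.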
I have /tmp/nm-local-dir as part of yarn.nodemanager.local-dirs, and the
documentation for yarn.nodemanager.disk-health-checker.min-healthy-disks
says:

  "The minimum fraction of number of disks to be healthy for the nodemanager
  to launch new containers. This correspond to both
  yarn-nodemanager.local-dirs and yarn.nodemanager.log-dirs. i.e. If there
  are less number of healthy local-dirs (or log-dirs) available, then new
  containers will not be launched on this node."
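If I'm reading that right, my expectation is roughly:

  configured local-dirs = 2
  min-healthy-disks     = 1.0  (i.e. all dirs must be healthy)
  /tmp/nm-local-dir out of space -> healthy dirs = 1/2 = 0.5 < 1.0
  -> no new containers should be launched on that node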
Did I miss anything?
--
--Anfernee