lujie created YARN-8381:
---------------------------
Summary: Job gets stuck while node is unhealthy, with no log
messages to indicate the cause
Key: YARN-8381
URL: https://issues.apache.org/jira/browse/YARN-8381
Project: Hadoop YARN
Issue Type: Improvement
Reporter: lujie
I started a fresh pseudo-distributed system on a single node, then ran a job,
but it got stuck. My first reaction was to check the log messages to locate the
problem, but I found no error message. Only after reading the logs for a long
time did it occur to me to check the node health. The YARN web UI showed that
the NodeManager was unhealthy due to "{{local-dirs are bad:
/tmp/hadoop-hduser/nm-local-dir}}". I reconfigured
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem. But I still strongly recommend adding error
log messages for an unhealthy NodeManager.
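
For reference, a minimal sketch of the workaround, assuming the property is
set in yarn-site.xml on the NodeManager host and the NodeManager is restarted
afterwards. The value 98.0 mirrors the 98% mentioned above; the shipped default
in yarn-default.xml is 90.0, as far as I know. This only raises the threshold
and does not add the missing log message.

{code:xml}
<!-- yarn-site.xml (assumed location: the Hadoop configuration directory) -->
<property>
  <!-- Disk utilization above this percentage marks a local dir as bad -->
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>98.0</value>
</property>
{code}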