[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lujie updated YARN-8381:
------------------------
    Summary: Job got stuck while node is unhealthy, but without log messages to indicate such case  (was: Job get stuck while node is unhealthy, but without log messages to indicate such case)

> Job got stuck while node is unhealthy, but without log messages to indicate such case
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-8381
>                 URL: https://issues.apache.org/jira/browse/YARN-8381
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: lujie
>            Priority: Major
>
> I started a fresh pseudo-distributed system on a single node and ran a job, but
> it got stuck. My first reaction was to check the log messages to locate the
> problem, but I found no error messages. Only after reading the logs for a long
> time did I think to check the node health. The YARN web UI showed that the
> NodeManager was unhealthy because "{{local-dirs are bad:
> /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem. I still strongly recommend adding error
> log messages for an unhealthy NodeManager.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
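
To make the request concrete, here is a minimal sketch of the kind of check and error message the report asks for. It is not the NodeManager's actual code: the class `DiskHealthSketch` and its methods are hypothetical names, and the 90.0 default is assumed from the documented default of the property quoted above. It only uses the standard `java.io.File` and `java.util.logging` APIs.

```java
import java.io.File;
import java.util.logging.Logger;

// Hypothetical illustration only; not taken from the YARN source tree.
public class DiskHealthSketch {
    private static final Logger LOG =
        Logger.getLogger(DiskHealthSketch.class.getName());

    // Assumed default for max-disk-utilization-per-disk-percentage.
    static final double DEFAULT_MAX_UTILIZATION_PERCENT = 90.0;

    /** Percentage of the disk in use, given total and usable bytes. */
    static double utilizationPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            return 100.0; // treat an unreadable or missing disk as full
        }
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    /** True when the utilization is above the configured limit. */
    static boolean exceedsLimit(double usedPercent, double maxPercent) {
        return usedPercent > maxPercent;
    }

    /** Checks one directory and logs an error if it is over the limit. */
    static boolean checkAndLog(File dir, double maxPercent) {
        double used = utilizationPercent(dir.getTotalSpace(), dir.getUsableSpace());
        boolean bad = exceedsLimit(used, maxPercent);
        if (bad) {
            // This is the message the reporter could not find in the logs.
            LOG.severe(String.format(
                "Directory %s is bad: disk utilization %.1f%% exceeds the limit of %.1f%%",
                dir, used, maxPercent));
        }
        return !bad;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0
            ? args[0] : System.getProperty("java.io.tmpdir"));
        boolean healthy = checkAndLog(dir, DEFAULT_MAX_UTILIZATION_PERCENT);
        System.out.println(dir + " healthy: " + healthy);
    }
}
```

Under these assumptions, a disk that is 95% full is flagged with the 90% limit but passes once the limit is raised to 98%, which matches the workaround described in the report.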