lujie created YARN-8381:
---------------------------

             Summary: Job get stuck while node is unhealthy, but without log messages to indicate such case
                 Key: YARN-8381
                 URL: https://issues.apache.org/jira/browse/YARN-8381
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: lujie

I started a fresh pseudo-distributed system on a single node and ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error message. Only after reading the logs for a long time did it occur to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy because "{{local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}".

I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. Still, I strongly recommend adding error log messages for an unhealthy NodeManager.
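
Until such log messages exist, the health check described above can be scripted instead of being found by hand in the web UI. The following is only a minimal sketch: it assumes the ResourceManager REST endpoint /ws/v1/cluster/nodes is reachable at http://localhost:8088 (an assumed address for a pseudo-distributed setup, adjust as needed), and the helper names unhealthy_nodes and fetch_nodes are hypothetical, not part of Hadoop.

```python
import json
from urllib.request import urlopen


def unhealthy_nodes(payload):
    """Return (host, health report) pairs for nodes reported as UNHEALTHY.

    `payload` is the decoded JSON body of the ResourceManager's
    /ws/v1/cluster/nodes response.
    """
    # The "nodes" value may be missing or null when no nodes are registered.
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [
        (n.get("nodeHostName", "?"), n.get("healthReport", ""))
        for n in nodes
        if n.get("state") == "UNHEALTHY"
    ]


def fetch_nodes(rm_url="http://localhost:8088"):
    """Fetch the node list from the ResourceManager.

    Assumption: the ResourceManager web service is reachable at rm_url.
    """
    with urlopen(rm_url + "/ws/v1/cluster/nodes", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for host, report in unhealthy_nodes(fetch_nodes()):
        print(f"{host}: {report}")
```

If the job is stuck and this prints a report such as "local-dirs are bad", the disk health checker is the likely cause, and the utilization threshold or the free space on the local-dirs volume is the place to look.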