lujie created YARN-8381:
---------------------------

             Summary: Job get stuck while node is unhealthy, but without log 
messages to indicate such case
                 Key: YARN-8381
                 URL: https://issues.apache.org/jira/browse/YARN-8381
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: lujie


I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message. Only after reading the logs for a long 
time did I think to check the node health. The YARN web UI showed that the 
NodeManager was unhealthy, due to "{{local-dirs are bad: 
/tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
 to 98% and that solved the problem. But I still strongly recommend adding 
error log messages for an unhealthy NodeManager.
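
To illustrate the kind of message being requested, here is a minimal sketch of 
a disk-utilization check that logs an error when a local dir crosses the 
configured limit. It is not the actual NodeManager code: the class 
DiskHealthSketch, its methods, and the 90.0 threshold used in main are 
hypothetical and chosen only for illustration.

```java
import java.io.File;
import java.util.logging.Logger;

// Hypothetical sketch; names and structure are assumptions, not YARN's code.
public class DiskHealthSketch {
    private static final Logger LOG =
        Logger.getLogger(DiskHealthSketch.class.getName());

    /** Returns the used share of a partition as a percentage (0 to 100). */
    public static float usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            return 100.0f; // treat an unreadable partition as full
        }
        return 100.0f * (totalBytes - usableBytes) / totalBytes;
    }

    /**
     * Returns false and logs an error when utilization exceeds the limit,
     * so the reason for an unhealthy node shows up in the log file.
     */
    public static boolean isDirHealthy(String dir, long totalBytes,
                                       long usableBytes, float maxPercent) {
        float used = usedPercent(totalBytes, usableBytes);
        if (used > maxPercent) {
            LOG.severe(String.format(
                "Local dir %s is bad: %.1f%% used exceeds the limit of %.1f%%",
                dir, used, maxPercent));
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Check the directory given on the command line, or the JVM temp dir.
        File dir = new File(args.length > 0
            ? args[0] : System.getProperty("java.io.tmpdir"));
        boolean healthy = isDirHealthy(dir.getPath(), dir.getTotalSpace(),
            dir.getUsableSpace(), 90.0f); // 90.0 is an assumed threshold
        System.out.println(dir + " healthy: " + healthy);
    }
}
```

With a message like this in the NodeManager log, raising the limit to 98% as 
described above would be an obvious next step rather than something found only 
through the web UI.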



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
