[jira] [Created] (YARN-8381) Job get stuck while node is unhealthy, but without log messages to indicate such case

lujie (JIRA) Wed, 30 May 2018 20:17:00 -0700

lujie created YARN-8381:
---------------------------

             Summary: Job get stuck while node is unhealthy, but without log 
messages to indicate such case
                 Key: YARN-8381
                 URL: https://issues.apache.org/jira/browse/YARN-8381
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: lujie



I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message. Only after reading the log messages for 
a long time did I think to check the node health. The YARN web UI showed that 
the NodeManager was unhealthy, due to 
"{{local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
 to 98%, which solved the problem. But I still strongly recommend adding error 
log messages for an unhealthy NodeManager.
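
For reference, a minimal sketch of the reconfiguration described above. It 
assumes the property is set in yarn-site.xml and that the NodeManager is 
restarted afterwards; the report does not say where or how the value was 
changed, so the file and the exact value format are assumptions. The stock 
default for this property is 90.0, so a local dir on a disk that is more than 
90% full is marked bad.

```xml
<!-- yarn-site.xml (assumed location): raise the per-disk utilization cutoff -->
<!-- so that a nearly full disk no longer marks the NodeManager unhealthy.   -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <!-- 98.0 mirrors the 98% mentioned in this report; the default is 90.0 -->
  <value>98.0</value>
</property>
```

Raising the cutoff only works around the symptom; freeing space on the disk 
that holds /tmp/hadoop-hduser/nm-local-dir should have the same effect. 
Running "yarn node -list -all" should also show the node in an unhealthy 
state without going through the web UI.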



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
