sandflee created YARN-6854:
------------------------------
Summary: many job failed if NM couldn't detect disk error
Key: YARN-6854
URL: https://issues.apache.org/jira/browse/YARN-6854
Project: Hadoop YARN
Issue Type: Bug
Reporter: sandflee
Priority: Critical
checkDiskHealthy is enabled, but it did not detect this error. As a result,
containers failed, and new containers assigned to this node then failed as well.
The disk error appears to be a filesystem error: all I/O operations (such as ls)
failed on $localdir/usercache/userFoo, while other directories were unaffected.
Any suggestions?
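
To illustrate the failure mode described above, here is a minimal sketch of a
check that lists each subdirectory beneath a local dir down to the per-user
level, so that an error confined to a path such as usercache/userFoo would
surface. It is not the NodeManager's implementation. The function name
find_unreadable_subdirs and the max_depth parameter are hypothetical, and the
idea that listing the directory is enough to trigger the error is an assumption
based on the "ls" observation in this report.

```python
import os


def find_unreadable_subdirs(local_dir, max_depth=2):
    """Return directories at or below local_dir that cannot be listed.

    Hypothetical helper, not part of YARN. Depth 0 is local_dir itself,
    so max_depth=2 reaches local_dir/usercache/<user>.
    """
    bad = []
    stack = [(local_dir, 0)]
    while stack:
        path, depth = stack.pop()
        try:
            # Listing the directory is the probe: an I/O error raises OSError.
            with os.scandir(path) as it:
                entries = list(it)
        except OSError:
            bad.append(path)
            continue
        if depth >= max_depth:
            continue  # listed this level, but do not descend further
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    stack.append((entry.path, depth + 1))
            except OSError:
                # Even stat-ing the entry failed, so treat it as unreadable.
                bad.append(entry.path)
    return bad
```

A caller could run this against each configured local dir and treat a
non-empty result as a reason to mark that dir as bad. The depth limit keeps the
probe cheap, at the cost of missing errors deeper in the tree.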