Lavkesh Lahngir created YARN-3591:
-------------------------------------
Summary: Resource Localisation on a bad disk causes subsequent
containers failure
Key: YARN-3591
URL: https://issues.apache.org/jira/browse/YARN-3591
Project: Hadoop YARN
Issue Type: Bug
Reporter: Lavkesh Lahngir
This happens when a resource is localised on a disk and that disk goes bad
afterwards. The NM keeps the paths of localised resources in memory. When a
resource is requested, isResourcePresent(rsrc) is called, which in turn calls
file.exists() on the localised path.
In some cases when the disk has gone bad, the inodes are still cached and
file.exists() returns true. But at the time of reading, the file will not open.
Note: file.exists() actually calls stat64 natively, which succeeds because
it is able to find the inode information from the OS.
A proposal is to call file.list() on the parent path of the resource, which
will call open() natively. If the disk is good, it should return an array of
paths with length at least 1.
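
A minimal sketch of the proposed check, for illustration only. The class and
method names (LocalResourceCheck, isResourceReadable) are hypothetical and are
not the actual NodeManager code. It relies only on java.io.File, whose list()
method returns null when the path is not a directory or an I/O error occurs.

```java
import java.io.File;

// Hypothetical helper illustrating the proposal; not part of YARN.
class LocalResourceCheck {

    // Returns true only if the localised path exists AND its parent
    // directory can actually be listed.
    static boolean isResourceReadable(File localizedPath) {
        // Existing behaviour described in the report: this can still
        // return true on a bad disk if inode information is cached.
        if (!localizedPath.exists()) {
            return false;
        }
        File parent = localizedPath.getParentFile();
        if (parent == null) {
            // No parent component in the path; nothing to list.
            return false;
        }
        // list() has to open the directory, so it is expected to fail
        // (return null) when the underlying disk cannot be read.
        String[] entries = parent.list();
        // On a good disk the parent holds at least the resource itself.
        return entries != null && entries.length >= 1;
    }
}
```

Whether listing the parent directory detects every kind of disk failure is an
assumption of this sketch; a stricter variant could try to open the resource
itself for reading.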
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)