Lavkesh Lahngir created YARN-3591:
-------------------------------------

             Summary: Resource Localisation on a bad disk causes subsequent containers failure
                 Key: YARN-3591
                 URL: https://issues.apache.org/jira/browse/YARN-3591
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Lavkesh Lahngir


This happens when a resource has been localised on a disk and that disk later
goes bad. The NM keeps the paths of localised resources in memory. When a
resource is requested, isResourcePresent(rsrc) is called, which in turn calls
file.exists() on the localised path.
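A minimal Java sketch of the exists()-only check described above. The class
ExistsOnlyCheck, its in-memory map and its method signatures are hypothetical
simplifications for illustration, not the NodeManager's actual
isResourcePresent implementation.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the behaviour described in the issue:
 * localised paths are kept in memory and revalidated only with File.exists().
 * Not the actual NodeManager code.
 */
final class ExistsOnlyCheck {
    // In-memory record of where each resource (keyed by a hypothetical id) was localised.
    private final Map<String, File> localisedPaths = new HashMap<>();

    void recordLocalised(String resourceKey, File localPath) {
        localisedPaths.put(resourceKey, localPath);
    }

    /** Presence is decided solely by File.exists() on the remembered path. */
    boolean isResourcePresent(String resourceKey) {
        File path = localisedPaths.get(resourceKey);
        // exists() only queries file metadata and does not open the file,
        // so it cannot tell whether the data is still readable.
        return path != null && path.exists();
    }
}
```

Because nothing in this check opens the file, stale cached inode information
on a failed disk is enough for isResourcePresent to report true.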

In some cases, when the disk has gone bad, its inodes are still cached, so
file.exists() returns true. However, when the file is actually read, it cannot
be opened.
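The mismatch can be made visible by comparing the metadata check with an actual
open attempt. The OpenProbe class below is a hypothetical diagnostic, not part
of YARN. It assumes the localised resource is a single regular file; the
FileInputStream constructor opens the file, so it fails in the situation
described here even while exists() still returns true.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/** Hypothetical diagnostic contrasting File.exists() with a real open. */
final class OpenProbe {
    enum Status { OK, MISSING, EXISTS_BUT_UNOPENABLE }

    /** Tries to open the file for reading; returns false on any I/O error. */
    static boolean canOpen(File file) {
        try (FileInputStream in = new FileInputStream(file)) {
            return true; // open succeeded; the stream is closed automatically
        } catch (IOException e) {
            return false; // e.g. an I/O error on a failed disk, or permission denied
        }
    }

    static Status diagnose(File file) {
        if (!file.exists()) {
            return Status.MISSING;
        }
        // The metadata check passed; now check whether the file can actually be opened.
        return canOpen(file) ? Status.OK : Status.EXISTS_BUT_UNOPENABLE;
    }
}
```

On a healthy disk diagnose returns OK or MISSING; EXISTS_BUT_UNOPENABLE is the
state this issue describes.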

Note: file.exists() natively calls stat64, which succeeds because the OS can
still find the cached inode information.

One proposal is to call file.list() on the parent directory of the resource,
which calls open() natively. If the disk is good, this should return an array
containing at least one entry.
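A minimal sketch of the proposed check, assuming it is layered on top of the
existing exists() test. The class and method names are hypothetical.
File.list() returns null when the path is not a directory or an I/O error
occurs, so a failed directory open shows up as null rather than as an
exception.

```java
import java.io.File;

/** Hypothetical presence check following the proposal in this issue. */
final class ParentListingCheck {
    static boolean isResourcePresent(File localPath) {
        // Keep the original metadata check: a missing path is never present.
        if (!localPath.exists()) {
            return false;
        }
        File parent = localPath.getAbsoluteFile().getParentFile();
        if (parent == null) {
            return false; // no parent directory to list (e.g. a filesystem root)
        }
        // Per the issue, listing the parent forces a native open() of the directory.
        // list() returns null if the directory cannot be read.
        String[] entries = parent.list();
        // A healthy parent contains at least the resource itself.
        return entries != null && entries.length >= 1;
    }
}
```

A stricter variant could also confirm that localPath's own name appears in
entries; the proposal as written only requires a non-empty listing.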



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
