[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587682#comment-14587682
 ] 

Varun Vasudev commented on YARN-3591:
-------------------------------------

Lavkesh's original patch did the test regardless of whether the directory was 
known to be good or bad. We want the LocalDirHandlerService to be a central 
place for the state of the local dirs. If there is a test that improves our 
detection of bad disks, we should add it to the DirectoryCollection class. 
However in this case, the local dirs were detected as bad. In spite of being 
known to bad, we still tried to serve jars from them. If the frequency of the 
checks is too small, admins can change it to suit their liking.



> Resource Localisation on a bad disk causes subsequent containers failure 
> -------------------------------------------------------------------------
>
>                 Key: YARN-3591
>                 URL: https://issues.apache.org/jira/browse/YARN-3591
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Lavkesh Lahngir
>            Assignee: Lavkesh Lahngir
>         Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
> YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch
>
>
> It happens when a resource is localised on the disk, after localising that 
> disk has gone bad. NM keeps paths for localised resources in memory.  At the 
> time of resource request isResourcePresent(rsrc) will be called which calls 
> file.exists() on the localised path.
> In some cases when disk has gone bad, inodes are stilled cached and 
> file.exists() returns true. But at the time of reading, file will not open.
> Note: file.exists() actually calls stat64 natively which returns true because 
> it was able to find inode information from the OS.
> A proposal is to call file.list() on the parent path of the resource, which 
> will call open() natively. If the disk is good it should return an array of 
> paths with length at-least 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to