[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532567#comment-14532567
 ] 

Lavkesh Lahngir commented on YARN-3591:
---------------------------------------

Example: stat still succeeds on the bad disk, but listing the directory fails.
>> stat /data/d3/yarn/local
File: `/data/d3/yarn/local'
Size: 4096      Blocks: 8          IO Block: 4096   directory
Device: 830h/2096d      Inode: 107307009   Links: 3
Access: (0755/drwxr-xr-x)  Uid: (  110/ yarn)   Gid: (  118/  hadoop)
Access: 2014-11-18 13:57:19.000000000 +0000
Modify: 2014-11-19 11:15:15.000000000 +0000
Change: 2014-11-19 11:15:15.000000000 +0000
Birth: -

>> ls /data/d3/yarn/local
ls: reading directory /data/d3/yarn/local: Input/output error

> Resource Localisation on a bad disk causes subsequent containers failure 
> -------------------------------------------------------------------------
>
>                 Key: YARN-3591
>                 URL: https://issues.apache.org/jira/browse/YARN-3591
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Lavkesh Lahngir
>         Attachments: 0001-YARN-3591.patch
>
>
> This happens when a resource has been localised on a disk and that disk 
> goes bad afterwards. The NM keeps the paths of localised resources in memory. 
> When a resource is requested, isResourcePresent(rsrc) is called, which calls 
> file.exists() on the localised path.
> In some cases when a disk has gone bad, the inodes are still cached and 
> file.exists() returns true, but the file cannot be opened when it is read.
> Note: file.exists() natively calls stat64, which succeeds because 
> it can still find the inode information from the OS.
> A proposal is to call file.list() on the parent path of the resource, which 
> calls open() natively. If the disk is good, it should return an array of 
> paths with length at least 1.
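
The proposed check could be sketched roughly as follows. This is only an illustration, not the attached patch or the actual NodeManager code: the class name LocalResourceCheck and the method isResourceUsable are hypothetical. It relies on java.io.File.list() returning null when the directory cannot be read (for example on an I/O error), which is the case the exists() call alone misses.

```java
import java.io.File;

// Hypothetical helper for illustration; not part of Hadoop.
class LocalResourceCheck {

    // Returns true only if the path exists AND its parent directory
    // can actually be opened and listed.
    static boolean isResourceUsable(File resource) {
        // exists() may still return true on a bad disk when the
        // inode information is cached by the OS.
        if (!resource.exists()) {
            return false;
        }
        File parent = resource.getAbsoluteFile().getParentFile();
        if (parent == null) {
            // The path has no parent (e.g. a filesystem root), so the
            // listing check cannot be applied.
            return false;
        }
        // list() has to open the directory; it returns null if the
        // directory cannot be read.
        String[] entries = parent.list();
        // A readable parent must contain at least the resource itself.
        return entries != null && entries.length >= 1;
    }
}
```

Listing the parent costs an extra directory read per lookup, so whether this is acceptable on the resource request path is a trade-off the patch discussion would need to settle.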



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
