Hello,

If a resource is localised on a disk and that disk goes bad after
localisation, subsequent containers cannot find the resource, and the NM
does not download it again.
The problem is that the stat system call still succeeds on the bad path,
which causes file.exists() to return true, even though ls on the path
returns an I/O error.

The relevant code in LocalResourcesTrackerImpl.java:
case REQUEST:
if (rsrc != null && (!isResourcePresent(rsrc))) {
    LOG.info("Resource " + rsrc.getLocalPath()
            + " is missing, localizing it again");
    removeResource(req);
    rsrc = null;
}
if (null == rsrc) {
    rsrc = new LocalizedResource(req, dispatcher);
    localrsrc.put(req, rsrc);
}
break;

isResourcePresent() calls file.exists(), which natively calls stat64. That
call succeeds, so exists() returns true. But the disk is actually bad, and
nothing on that path can be read or written.
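
To illustrate the kind of stricter check I have in mind, here is a minimal sketch that probes the path with a real read instead of relying on stat. The class name ResourceProbe and the method isReadable are hypothetical and are not part of the NM code. It is only an idea, not a tested fix, and a read served from the page cache might still succeed on a failing disk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of YARN: checks that a path can actually
// be read, rather than only checking that stat succeeds on it.
public class ResourceProbe {

    public static boolean isReadable(Path path) {
        try {
            if (Files.isDirectory(path)) {
                // Listing the directory forces a real read of its contents,
                // which is the operation that fails with an I/O error in ls.
                try (DirectoryStream<Path> entries = Files.newDirectoryStream(path)) {
                    entries.iterator().hasNext();
                }
            } else {
                // For a regular file, try to read a single byte.
                try (InputStream in = Files.newInputStream(path)) {
                    in.read();
                }
            }
            return true;
        } catch (IOException | DirectoryIteratorException e) {
            // Missing path or I/O failure: treat the resource as not present.
            return false;
        }
    }
}
```

With something like this, isResourcePresent() could treat a path that exists but cannot be read as missing, so the REQUEST branch above would remove the stale entry and localise the resource again.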

Example:
>>stat /data/d3/yarn/local

  File: `/data/d3/yarn/local'
Size: 4096      Blocks: 8          IO Block: 4096   directory
Device: 830h/2096d Inode: 107307009   Links: 3
Access: (0755/drwxr-xr-x)  Uid: (  110/ yarn)   Gid: (  118/  hadoop)
Access: 2014-11-18 13:57:19.000000000 +0000
Modify: 2014-11-19 11:15:15.000000000 +0000
Change: 2014-11-19 11:15:15.000000000 +0000
Birth: -

and ls reports:

ls: reading directory /data/d3/mapred: Input/output error


Any thoughts?

Thanks
