[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733353#comment-14733353
 ] 

Hudson commented on YARN-3591:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #357 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/357/])
YARN-3591. Resource localization on a bad disk causes subsequent containers 
failure. Contributed by Lavkesh Lahngir. (vvasudev: rev 
1dbd8e34a7d97c4d8586da79c980d8f2e0aad61d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java


> Resource localization on a bad disk causes subsequent containers failure 
> -------------------------------------------------------------------------
>
>                 Key: YARN-3591
>                 URL: https://issues.apache.org/jira/browse/YARN-3591
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Lavkesh Lahngir
>            Assignee: Lavkesh Lahngir
>             Fix For: 2.8.0
>
>         Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
> YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch, YARN-3591.5.patch, 
> YARN-3591.6.patch, YARN-3591.7.patch, YARN-3591.8.patch, YARN-3591.9.patch
>
>
> It happens when a resource is localised on the disk, after localising that 
> disk has gone bad. NM keeps paths for localised resources in memory.  At the 
> time of resource request isResourcePresent(rsrc) will be called which calls 
> file.exists() on the localised path.
> In some cases when disk has gone bad, inodes are stilled cached and 
> file.exists() returns true. But at the time of reading, file will not open.
> Note: file.exists() actually calls stat64 natively which returns true because 
> it was able to find inode information from the OS.
> A proposal is to call file.list() on the parent path of the resource, which 
> will call open() natively. If the disk is good it should return an array of 
> paths with length at-least 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to