[
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637241#comment-13637241
]
Hudson commented on YARN-547:
-----------------------------
Integrated in Hadoop-Hdfs-trunk #1378 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1378/])
YARN-547. Fixed race conditions in public and private resource localization
which used to cause duplicate downloads. Contributed by Omkar Vinit Joshi.
(Revision 1470076)
Result = FAILURE
vinodkv :
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470076
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalizedResource.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
> Race condition in Public / Private Localizer may result into resource getting
> downloaded again
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-547
> URL: https://issues.apache.org/jira/browse/YARN-547
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
> Fix For: 2.0.5-beta
>
> Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch,
> yarn-547-20130412.patch, yarn-547-20130415.patch, yarn-547-20130416.1.patch,
> yarn-547-20130416.patch, yarn-547-20130418.patch
>
>
> Public Localizer:
> At present, when multiple containers request the same localized resource:
> * If the resource is not present, it is first created and resource
> localization starts (the LocalizedResource is in the DOWNLOADING state).
> * If multiple ResourceRequestEvents arrive while it is in this state,
> ResourceLocalizationEvents are sent for all of them.
> Most of the time this does not result in a duplicate download, but there is
> a race condition. Inside ResourceLocalization (for public downloads), all
> requests are added to a local attempts map. When a new request comes in, this
> map is checked before a new download is started for it. While a download is
> in progress, its request is in the map, so another request for the same
> resource is rejected (i.e. the resource is already being downloaded).
> However, once the current download completes, the request is removed from
> the map. If a LocalizerRequestEvent arrives after this removal, it is not
> found in the map and the resource is downloaded again.
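The public-localizer sequence described above can be reproduced deterministically with a small sketch. It is only an illustration under stated assumptions: the class, the `pending` map, and the method names are hypothetical and are not taken from the YARN sources or the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the public localizer; not YARN code.
class PublicLocalizerRaceSketch {
    // Assumed equivalent of the "local attempts map": resources being downloaded.
    private final Map<String, Boolean> pending = new HashMap<>();
    int downloads = 0;

    // Starts a download only if no entry for the resource is in the map.
    void onRequest(String resource) {
        if (pending.putIfAbsent(resource, Boolean.TRUE) == null) {
            downloads++; // stands in for the actual download
        }
    }

    // The entry is dropped as soon as the download completes.
    void onDownloadComplete(String resource) {
        pending.remove(resource);
    }

    public static void main(String[] args) {
        PublicLocalizerRaceSketch s = new PublicLocalizerRaceSketch();
        s.onRequest("job.jar");          // first request starts the download
        s.onRequest("job.jar");          // rejected: download still in progress
        s.onDownloadComplete("job.jar"); // entry removed from the map
        s.onRequest("job.jar");          // late request: downloaded a second time
        System.out.println(s.downloads); // prints 2
    }
}
```

The map only records downloads that are in progress, so it cannot tell "never downloaded" apart from "already downloaded".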
> Private Localizer:
> A different but similar race condition is present here.
> * Inside the findNextResource method call, each LocalizerRunner tries to
> grab a lock on the LocalizedResource. If the lock is not acquired, it
> keeps trying until the resource state changes to LOCALIZED. The lock is
> released by the LocalizerRunner when the download completes.
> * If another ContainerLocalizer grabs the lock on the resource before the
> LocalizedResource state changes to LOCALIZED, the resource is
> downloaded again.
> In both places the root cause is that every thread tries to acquire the
> lock on the resource without taking the current state of the
> LocalizedResource into account.
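
One way to take the resource state into account when acquiring the lock is sketched below. This is a minimal illustration, not the committed patch: the class, the two-value `State` enum, and the method names are assumptions made for the example.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of a resource guarded by a lock plus a state; not YARN code.
class StateAwareLockSketch {
    enum State { DOWNLOADING, LOCALIZED }

    private final Semaphore lock = new Semaphore(1);
    private volatile State state = State.DOWNLOADING;

    // Returns true only if the caller holds the lock AND the resource
    // still needs to be downloaded.
    boolean tryClaimForDownload() {
        if (!lock.tryAcquire()) {
            return false;          // another localizer holds the lock
        }
        if (state != State.DOWNLOADING) {
            lock.release();        // already localized: give the lock back
            return false;
        }
        return true;
    }

    // Called by the lock holder once its download has finished.
    void markLocalized() {
        state = State.LOCALIZED;   // publish the new state before unlocking
        lock.release();
    }

    public static void main(String[] args) {
        StateAwareLockSketch r = new StateAwareLockSketch();
        System.out.println(r.tryClaimForDownload()); // true: first claimant downloads
        System.out.println(r.tryClaimForDownload()); // false: lock is held
        r.markLocalized();
        System.out.println(r.tryClaimForDownload()); // false: already LOCALIZED
    }
}
```

Because the state is updated before the lock is released, a thread that acquires the lock afterwards sees LOCALIZED and skips the download.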