[
https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628115#comment-13628115
]
Omkar Vinit Joshi commented on YARN-539:
----------------------------------------
The modified flow for successful as well as failed resource downloads is:
* Failed resource download :- The Public/Private localizer notifies the
tracker. The tracker removes the resource from its cache (so there is no memory
leak now) and then passes the event to LocalizedResource. The resource sends
ContainerResourceFailedEvent to all the waiting containers. The containers in
turn send ResourceReleaseEvent. Earlier we thought about removing this release
call, but it is required because multiple resources requested by a container
may fail one after another before the container's release event, triggered by
the first resource failure, has been handled on all the requested resources.
* Successful resource download :- The Public/Private localizer notifies the
tracker, which in turn notifies LocalizedResource. The resource informs all the
containers of the successful download.
* Added test TestLocalResourcesTrackerImpl.testLocalResourceCache for testing
the resource lifecycle and the memory leak.
** 2 containers request the resource. After the resource fails, the containers
are informed and the resource is removed from the cache. Now, before the last
container's ResourceReleaseEvent is handled, another container requests the
same resource. So the ResourceReleaseEvent will return silently without an
exception. In the end, after successful resource localization (on the second
attempt) and a ResourceReleaseEvent (from container-3), the resource remains in
the cache in LOCALIZED state with zero containers in the waiting queue.
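
The flow and the test scenario above can be sketched roughly as follows. This
is a simplified, hypothetical model and not the NodeManager code: the class
LocalResourceCacheSketch, its methods, and the string keys and container ids
are assumptions made for illustration; only the event ordering comes from the
description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the tracker's cache; names are illustrative only.
class LocalResourceCacheSketch {
    enum State { DOWNLOADING, LOCALIZED, FAILED }

    static final class Resource {
        State state = State.DOWNLOADING;
        // Containers that requested this resource, in request order.
        final Set<String> containers = new LinkedHashSet<>();
    }

    private final Map<String, Resource> cache = new HashMap<>();

    // A container requests a resource; a missing entry is (re)created.
    void request(String key, String containerId) {
        cache.computeIfAbsent(key, k -> new Resource()).containers.add(containerId);
    }

    // Failed download: drop the entry from the cache first (no leak), then
    // return the containers that must be told about the failure.
    List<String> failed(String key) {
        Resource r = cache.remove(key);
        if (r == null) {
            return new ArrayList<>();
        }
        r.state = State.FAILED;
        return new ArrayList<>(r.containers);
    }

    // Successful download: mark the entry and return the containers to inform.
    List<String> localized(String key) {
        Resource r = cache.get(key);
        if (r == null) {
            return new ArrayList<>();
        }
        r.state = State.LOCALIZED;
        return new ArrayList<>(r.containers);
    }

    // Release: returns silently if the entry is gone or belongs to a later
    // attempt that this container never requested.
    void release(String key, String containerId) {
        Resource r = cache.get(key);
        if (r != null) {
            r.containers.remove(containerId);
        }
    }

    State stateOf(String key) {
        Resource r = cache.get(key);
        return r == null ? null : r.state;
    }

    int refCount(String key) {
        Resource r = cache.get(key);
        return r == null ? 0 : r.containers.size();
    }

    public static void main(String[] args) {
        LocalResourceCacheSketch tracker = new LocalResourceCacheSketch();
        tracker.request("res", "container-1");
        tracker.request("res", "container-2");
        tracker.failed("res");                  // entry removed, both informed
        tracker.release("res", "container-1");  // no entry: silent no-op
        tracker.request("res", "container-3");  // second attempt starts
        tracker.release("res", "container-2");  // late release: silent no-op
        tracker.localized("res");
        tracker.release("res", "container-3");
        System.out.println(tracker.stateOf("res") + " " + tracker.refCount("res"));
    }
}
```

In this sketch the late release from container-2 does not touch the entry
created for container-3, which is the behaviour the new test is described as
checking.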
> LocalizedResources are leaked in memory in case resource localization fails
> ---------------------------------------------------------------------------
>
> Key: YARN-539
> URL: https://issues.apache.org/jira/browse/YARN-539
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
> Attachments: yarn-539-20130410.patch
>
>
> If resource localization fails, the resource remains in memory and is
> 1) either cleaned up the next time cache cleanup runs and there is a space
> crunch (if sufficient space is available in the cache, it will remain in
> memory), or
> 2) reused if a LocalizationRequest comes again for the same resource.
> I think that when resource localization fails, that event should be sent to
> LocalResourceTracker, which will then remove the resource from its cache.