[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623082#comment-13623082 ]
Omkar Vinit Joshi commented on YARN-539:
----------------------------------------
At present, the flow of events when resource localization fails is as follows:
* When resource localization fails (in the public localizer or in the private
LocalizerRunner), the localizer sends a ContainerResourceFailedEvent to the
waiting containers, each of which then sends a ResourceReleaseEvent to the
failed resource. In the end, when the LocalizedResource's ref count drops to 0,
its state is changed from DOWNLOADING to INIT.
As a result, the failed resource may stay in memory (it is never removed from
the LocalResourcesTrackerImpl cache, which is a memory leak), and the flow may
also introduce a race condition; see
[YARN-544|https://issues.apache.org/jira/browse/YARN-544].
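
The current flow described above can be modelled roughly as follows. This is only a minimal sketch: CurrentFlowSketch, Resource, Tracker and the sample path are hypothetical stand-ins invented for illustration, not the real NodeManager classes or their APIs, and events are reduced to plain method calls.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the current failure flow.
// None of these types are the real YARN classes.
public class CurrentFlowSketch {
    enum State { INIT, DOWNLOADING, LOCALIZED }

    static class Resource {
        State state = State.INIT;
        final Set<String> waitingContainers = new HashSet<>();

        // A container asks for the resource, which starts the download.
        void request(String containerId) {
            waitingContainers.add(containerId);
            state = State.DOWNLOADING;
        }

        // Stands in for ResourceReleaseEvent: when the last reference
        // goes away, the state falls back from DOWNLOADING to INIT.
        void release(String containerId) {
            waitingContainers.remove(containerId);
            if (waitingContainers.isEmpty() && state == State.DOWNLOADING) {
                state = State.INIT;
            }
        }
    }

    static class Tracker {
        final Map<String, Resource> cache = new HashMap<>();

        void request(String path, String containerId) {
            cache.computeIfAbsent(path, p -> new Resource()).request(containerId);
        }

        // Failure path: every waiting container is told about the failure
        // and answers with a release. Nothing evicts the cache entry.
        void localizationFailed(String path) {
            Resource r = cache.get(path);
            for (String containerId : new HashSet<>(r.waitingContainers)) {
                r.release(containerId);
            }
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        tracker.request("hdfs://nn/app/job.jar", "container_1");
        tracker.request("hdfs://nn/app/job.jar", "container_2");
        tracker.localizationFailed("hdfs://nn/app/job.jar");
        System.out.println(tracker.cache.size());                             // 1
        System.out.println(tracker.cache.get("hdfs://nn/app/job.jar").state); // INIT
    }
}
```

In this model the entry is still in the cache after the failure, in state INIT with no references, which corresponds to the leak described above.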
The proposed solution is:
* When resource localization fails, a resource localization failed event
(ResourceFailedEvent) is sent to the tracker (LocalResourcesTrackerImpl). The
tracker removes the localized resource from its cache and then passes the event
on to the LocalizedResource. The LocalizedResource then notifies all the
containers that were waiting for this resource. The containers no longer send
an additional ResourceReleaseEvent.
* To keep the flow the same for success and failure, the localization
successful event will also be sent to the LocalizedResource via
LocalResourcesTrackerImpl.
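
A minimal sketch of the proposed flow, under the same caveats: ProposedFlowSketch, Resource, Tracker, the FAILED state and the localizationFinished method are hypothetical names made up for illustration and do not reflect the actual patch or the real YARN APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the proposed flow.
// None of these types are the real YARN classes.
public class ProposedFlowSketch {
    enum State { DOWNLOADING, LOCALIZED, FAILED }

    static class Resource {
        State state = State.DOWNLOADING;
        final Set<String> waitingContainers = new LinkedHashSet<>();
        final List<String> notifications = new ArrayList<>();

        // The resource itself notifies every waiting container.
        // On failure it drops its references, so no container has to
        // send a separate release afterwards.
        void handle(boolean success) {
            state = success ? State.LOCALIZED : State.FAILED;
            for (String containerId : waitingContainers) {
                notifications.add(containerId + ":" + state);
            }
            if (!success) {
                waitingContainers.clear();
            }
        }
    }

    static class Tracker {
        final Map<String, Resource> cache = new HashMap<>();

        Resource request(String path, String containerId) {
            Resource r = cache.computeIfAbsent(path, p -> new Resource());
            r.waitingContainers.add(containerId);
            return r;
        }

        // Success and failure both enter through the tracker;
        // only a failure evicts the entry before forwarding the event.
        void localizationFinished(String path, boolean success) {
            Resource r = success ? cache.get(path) : cache.remove(path);
            if (r != null) {
                r.handle(success);
            }
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        Resource jar = tracker.request("job.jar", "container_1");
        tracker.request("job.jar", "container_2");
        tracker.localizationFinished("job.jar", false);
        System.out.println(tracker.cache.containsKey("job.jar")); // false
        System.out.println(jar.notifications); // [container_1:FAILED, container_2:FAILED]
    }
}
```

Routing both outcomes through the tracker gives it a single place to decide whether the entry stays in the cache, which is the point of the second bullet.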
> LocalizedResources are leaked in memory in case resource localization fails
> ---------------------------------------------------------------------------
>
> Key: YARN-539
> URL: https://issues.apache.org/jira/browse/YARN-539
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
>
> If resource localization fails, the resource remains in memory and is either
> 1) cleaned up the next time cache cleanup runs and there is a space
> crunch (if sufficient space is available in the cache, it will remain in
> memory), or
> 2) reused if a LocalizationRequest comes again for the same resource.
> I think that when resource localization fails, that event should be sent to
> the LocalResourcesTracker, which will then remove the resource from its cache.