[ 
https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623082#comment-13623082
 ] 

Omkar Vinit Joshi commented on YARN-539:
----------------------------------------

At present, the flow of events when resource localization fails is as follows:
* When resource localization fails, the localizer (the public localizer or the 
private LocalizerRunner) sends a ContainerResourceFailedEvent to the 
containers, which then send a ResourceReleaseEvent to the failed resource. In 
the end, when the LocalizedResource's ref count drops to 0, its state is 
changed from DOWNLOADING to INIT.

Because of this, the resource may remain in memory (in 
ResourceLocalizationTracker, i.e. a memory leak) or may introduce a race 
condition, see [YARN-544|https://issues.apache.org/jira/browse/YARN-544].
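
To illustrate the leak described above, here is a minimal sketch. It is not 
the NodeManager code: Resource, Tracker, and CurrentFlowSketch are 
hypothetical, heavily simplified stand-ins, and the only behaviour assumed is 
what the comment states (ref count dropping to 0 moves the resource from 
DOWNLOADING back to INIT, and nothing removes it from the tracker).

```java
import java.util.HashMap;
import java.util.Map;

public class CurrentFlowSketch {
    enum State { INIT, DOWNLOADING }

    /** Hypothetical stand-in for LocalizedResource: a state plus a ref count. */
    static class Resource {
        State state = State.INIT;
        int refCount = 0;

        /** A container requests the resource; the download starts. */
        void request() {
            refCount++;
            state = State.DOWNLOADING;
        }

        /** A container releases the resource after being told it failed. */
        void release() {
            refCount--;
            if (refCount == 0 && state == State.DOWNLOADING) {
                state = State.INIT; // reset, but still referenced by the tracker
            }
        }
    }

    /** Hypothetical stand-in for the tracker's in-memory cache. */
    static class Tracker {
        final Map<String, Resource> cache = new HashMap<>();

        Resource request(String key) {
            Resource r = cache.computeIfAbsent(key, k -> new Resource());
            r.request();
            return r;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        Resource r = tracker.request("job.jar");
        r.release(); // localization failed, the container released the resource
        // The entry is back in INIT but was never removed from the cache.
        System.out.println(r.state + " cached=" + tracker.cache.containsKey("job.jar"));
    }
}
```

In this sketch the failed entry stays in the map until something else evicts 
it, which is the leak this issue is about.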

The proposed solution is:
* When resource localization fails, a resource localization failed event 
(ResourceFailedEvent) is sent to LocalResourcesTrackerImpl. The tracker will 
remove this localized resource from its cache and will then pass the event to 
LocalizedResource. LocalizedResource will then notify all the containers which 
were waiting for this resource. The containers will no longer send an 
additional ResourceReleaseEvent.
* To keep the flow the same for success and failure, the localization 
successful event will also be sent to LocalizedResource via 
LocalResourcesTrackerImpl.
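
The proposed routing can be sketched roughly as follows. This is only an 
illustration of the idea, not the actual patch: EventType, Container, 
Resource, Tracker, and ProposedFlowSketch are hypothetical simplified names, 
and the real classes exchange events through the NodeManager dispatcher rather 
than direct method calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProposedFlowSketch {
    enum EventType { LOCALIZED, FAILED }

    /** Hypothetical stand-in for a container waiting on a resource. */
    static class Container {
        final List<String> received = new ArrayList<>();

        void onResourceEvent(String key, EventType type) {
            // In the proposed flow no release event is sent back on FAILED.
            received.add(type + ":" + key);
        }
    }

    /** Hypothetical stand-in for LocalizedResource. */
    static class Resource {
        final String key;
        final List<Container> waiters = new ArrayList<>();

        Resource(String key) { this.key = key; }

        /** Notify every waiting container of the outcome. */
        void handle(EventType type) {
            for (Container c : waiters) {
                c.onResourceEvent(key, type);
            }
            waiters.clear();
        }
    }

    /** Hypothetical stand-in for LocalResourcesTrackerImpl. */
    static class Tracker {
        final Map<String, Resource> cache = new HashMap<>();

        void request(String key, Container c) {
            cache.computeIfAbsent(key, Resource::new).waiters.add(c);
        }

        /** Success and failure both pass through the tracker first. */
        void handle(String key, EventType type) {
            Resource r = cache.get(key);
            if (r == null) {
                return;
            }
            if (type == EventType.FAILED) {
                cache.remove(key); // drop the failed entry so it cannot leak
            }
            r.handle(type); // then forward the event to the resource
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        Container c1 = new Container();
        Container c2 = new Container();
        tracker.request("job.jar", c1);
        tracker.request("job.jar", c2);
        tracker.handle("job.jar", EventType.FAILED);
        System.out.println(tracker.cache.isEmpty() + " " + c1.received + " " + c2.received);
    }
}
```

Routing both outcomes through the tracker gives it a single place to decide 
whether an entry stays in the cache, which is the point of the second bullet.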
                
> LocalizedResources are leaked in memory in case resource localization fails
> ---------------------------------------------------------------------------
>
>                 Key: YARN-539
>                 URL: https://issues.apache.org/jira/browse/YARN-539
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>
> If resource localization fails, the resource remains in memory and is either
> 1) cleaned up the next time cache cleanup runs and there is a space crunch 
> (if sufficient space is available in the cache, it will remain in memory), or
> 2) reused if a LocalizationRequest comes again for the same resource.
> I think that when resource localization fails, that event should be sent to 
> LocalResourceTracker, which will then remove the resource from its cache.

