[
https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628492#comment-13628492
]
Omkar Vinit Joshi commented on YARN-539:
----------------------------------------
bq. Resource doesn't have life, so it can't 'fail'. In that sense, shall we
rename ResourceFailedEvent to ResourceFailedLocalizationEvent? Similarly
ResourceEventType.FAILED to ResourceEventType.LOCALIZATION_FAILED?
Fixed.
bq. Dismantle localizationCompleted altogether? Makes code much more readable
IMO.
Yes, this is no longer required and can be simplified :) Updating handle
accordingly.
bq. The log message for release doesn't need to specifically talk about failed
resources. A release on a resource that is long gone for whatever reason will
run into this code-path.
Yes, you are right. For example, if the resource's local file is deleted (or
becomes inaccessible for some reason), we will also end up getting these messages.
bq. Not related to your patch, but the code for REQUEST can be simplified by doing
the null check first.
No, I think the current flow is correct:
* Check whether the resource is non-null and present on disk; if it is not
present on disk, remove it from the cache.
* If the resource is now null, there are two possibilities, and in both cases
we need to recreate the resource:
** either the resource's local copy is inaccessible,
** or the request for this resource is coming for the first time.
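To make the REQUEST flow above concrete, here is a minimal sketch. It is not the actual NodeManager code: RequestFlowSketch, Entry and handleRequest are hypothetical names, and a plain map stands in for the tracker's cache.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the tracker's cache; not the actual YARN classes.
class RequestFlowSketch {

    // Hypothetical cached entry: it only remembers where the local copy lives.
    static class Entry {
        final File localPath;

        Entry(File localPath) {
            this.localPath = localPath;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    // Handles a request for `key`. Returns true if the entry had to be (re)created.
    boolean handleRequest(String key, File localPath) {
        Entry rsrc = cache.get(key);

        // Step 1: an entry that is cached but no longer present on disk is removed.
        if (rsrc != null && !rsrc.localPath.exists()) {
            cache.remove(key);
            rsrc = null;
        }

        // Step 2: null now covers both cases (local copy inaccessible, or first
        // request), and both are handled the same way: recreate the entry.
        if (rsrc == null) {
            cache.put(key, new Entry(localPath));
            return true;
        }
        return false;
    }
}
```

Doing the null check first would still need the on-disk check for the non-null case, so in this sketch the two steps stay in the order described above.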
bq. Null checks needed for rsrc on LOCALIZED and FAILED events?
This can never occur. When these events arrive, the resource will be in the
DOWNLOADING state, and it will never be removed because its ref count is > 0
(see ResourceRetentionSet.addResources).
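A rough sketch of that argument, under stated assumptions: RetentionSketch and its methods are hypothetical and only imitate the idea that cleanup skips entries whose ref count is > 0. It is not the real ResourceRetentionSet or tracker code.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of the cache; names are illustrative only.
class RetentionSketch {

    enum State { DOWNLOADING, LOCALIZED }

    static class Entry {
        State state = State.DOWNLOADING;
        int refCount;
    }

    private final Map<String, Entry> cache = new HashMap<>();

    // A request creates the entry if needed and takes a reference on it.
    void request(String key) {
        cache.computeIfAbsent(key, k -> new Entry()).refCount++;
    }

    // Releasing a resource that is already gone is a no-op.
    void release(String key) {
        Entry e = cache.get(key);
        if (e != null && e.refCount > 0) {
            e.refCount--;
        }
    }

    // Cleanup only removes entries nobody references; returns how many were removed.
    int cleanup() {
        int removed = 0;
        Iterator<Entry> it = cache.values().iterator();
        while (it.hasNext()) {
            if (it.next().refCount == 0) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // The entry is still referenced by its requester, so the lookup cannot miss.
    void localized(String key) {
        cache.get(key).state = State.LOCALIZED;
    }

    // On failed localization the entry is dropped instead of lingering in memory.
    void localizationFailed(String key) {
        cache.remove(key);
    }

    boolean contains(String key) {
        return cache.containsKey(key);
    }
}
```

In this model an entry in DOWNLOADING state always has at least one reference, so a cleanup pass that runs between the request and the LOCALIZED or failure event leaves it in place.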
> LocalizedResources are leaked in memory in case resource localization fails
> ---------------------------------------------------------------------------
>
> Key: YARN-539
> URL: https://issues.apache.org/jira/browse/YARN-539
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
> Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.patch
>
>
> If resource localization fails, the resource remains in memory and is either
> 1) cleaned up the next time cache cleanup runs and there is a space crunch
> (if sufficient space is available in the cache, it remains in memory), or
> 2) reused if a LocalizationRequest comes again for the same resource.
> I think that when resource localization fails, that event should be sent to
> LocalResourceTracker, which will then remove the resource from its cache.