[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742706#comment-14742706 ]
Varun Saxena commented on YARN-2902: ------------------------------------ Implementation details are as under : (FSDownload.java) * While going through the code, found one issue in current code. FSDownload#call is a mix of interruptible and uninterruptible code. This means that when {{Future#cancel(true)}} is called, the code might not be interrupted and we may continue downloading, even though CancellationException is thrown on Future#get. But in ContainerLocalizer, on CancellationException we send FETCH_FAILURE back to NM and do not do any cleanup(deletion of downloaded resources) expecting FSDownload#call to do the cleanup as it indeed does on other exceptions. FSDownload does not provide any method to do the cleanup either. So to resolve this issue, have added a wait/notify construct to let download task complete(either through exception or normally) and then do the cleanup. Thought of checking for thread being interrupted at the end of FSDownload#call but FileContext#rename in the code resets the interrupted flag so going with this solution. This will increase the time taken for localizer to send last heartbeat(HB) to NM though. *Should I raise another JIRA for this ?* (ContainerLocalizer.java) * A set of paths is maintained for the resources which were successfully downloaded and reported to NM in HB. This is done because if container has been killed, NM would pay no heed to this FETCH_SUCCESS status. So if in HB rsp, NM indicates that resources have to be deleted, we need a reference of what we sent in the HB as well. This extra set is required because we currently do not wait for rsp from NM before deleting entry from pending resources map (delete it at the time of sending HB). Should we change it ? * We use this set of paths to do the deletion of resources reported to NM at the time of localizer DIE. pendingResources map is used for cleanup of resources which have not been reported to NM (ResourceLocalizationService.java) * On container kill, iterate over all the scheduled resources and for the resources which are in downloading state, schedule a deletion task which runs after a delay specified by config yarn.nodemanager.localizer.downloading-rsrcs.deletion.wait-sec * An additional map of localizer id(key) and deletion task future + deletion task id for state store(value) is maintained for cancellation of deletion task later(upon HB from localizer). * On localizer HB, we would try to cancel the deletion task and remove the task from state store. Moreover, a custom deletion task is created which calls the FileDeletionTask and also removes the entry from map above after deletion task completes. This however can lead to a minor race for removal of task from state store(i.e. race due to task completion and cancellation) if the code flow is not interruptible. But I guess that wont be a problem as removal of task by ID from leveldb would simply not do anything if key does not exist. * In case of NM restart(which should be quite rare), the map above would be lost. But that should be fine. Only side effect of this is that running deletion tasks may not be cancelled. But even if we attempt to delete a non existent directory, it should not be a problem. (DeletionService.java and Container Executor(s)) * Made relevant changes in DeletionService to ensure scheduling of deletion tasks with a delay. * Made changes to ensure deletion task's future and task id used for state store are returned back to caller for cancellation later. Also added code for cancellation of running task. * Added a new flag "ignoreMissingDir" in deletion task. Currently executors will throw an error if any of directory missing amongst the list of baseDirs passed for deletion do not exist. But for this use case we would ignore such missing dir and continue with deletion of other directories. Because NM may have inconsistent view of what all needs to be deleted. The flag hence has been added to support this new behavior and continue with old behavior(if flag is false). * Changes have been made in {{container-executor}} as well for support of this flag. An additional command line parameter has been added (for delete as user command). A value of 0 is analogous to the flag being false and 1 means true. Few additional points to note # xxxx_tmp directories should be deleted by localizer itself(in FSDownload). Have added it because as you said localizers can sometimes turn rogue. In this case, NM will delete these dirs. Should I handle tmp dirs then ? # I am assuming it would be fine to change command line parameters for a specific container-executor command in 2.8. If we do not want to change command line parameters for {{container-executor}}, there can be workarounds. > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > ------------------------------------------------------------------------------------ > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Affects Versions: 2.5.0 > Reporter: Jason Lowe > Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, > YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)