[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742706#comment-14742706
 ] 

Varun Saxena commented on YARN-2902:
------------------------------------

Implementation details are as under :

(FSDownload.java)
* While going through the code, found one issue in current code. 
FSDownload#call is a mix of interruptible and uninterruptible code. This means 
that when {{Future#cancel(true)}} is called, the code might not be interrupted 
and we may continue downloading, even though CancellationException is thrown on 
Future#get. But in ContainerLocalizer, on CancellationException we send 
FETCH_FAILURE back to NM and do not do any cleanup(deletion of downloaded 
resources) expecting FSDownload#call to do the cleanup as it indeed does on 
other exceptions. FSDownload does not provide any method to do the cleanup 
either. So to resolve this issue, have added a wait/notify construct to let 
download task complete(either through exception or normally) and then do the 
cleanup. Thought of checking for thread being interrupted at the end of 
FSDownload#call but FileContext#rename in the code resets the interrupted flag 
so going with this solution.
This will increase the time taken for localizer to send last heartbeat(HB) to 
NM though.
*Should I raise another JIRA for this ?*
 
(ContainerLocalizer.java)
* A set of paths is maintained for the resources which were successfully 
downloaded and reported to NM in HB. This is done because if container has been 
killed, NM would pay no heed to this FETCH_SUCCESS status. So if in HB rsp, NM 
indicates that resources have to be deleted, we need a reference of what we 
sent in the HB as well. This extra set is required because we currently do not 
wait for rsp from NM before deleting entry from pending resources map (delete 
it at the time of sending HB).  Should we change it ?
* We use this set of paths to do the deletion of resources reported to NM at 
the time of localizer DIE. pendingResources map is used for cleanup of 
resources which have not been reported to NM

(ResourceLocalizationService.java)
*  On container kill, iterate over all the scheduled resources and for the 
resources which are in downloading state, schedule a deletion task which runs 
after a delay specified by config 
yarn.nodemanager.localizer.downloading-rsrcs.deletion.wait-sec 
* An additional map of localizer id(key) and deletion task future + deletion 
task id for state store(value) is maintained for cancellation of deletion task 
later(upon HB from localizer).
* On localizer HB, we would try to cancel the deletion task and remove the task 
from state store. Moreover, a custom deletion task is created which calls the 
FileDeletionTask and also removes the entry from map above after deletion task 
completes. This however can lead to a minor race for removal of task from state 
store(i.e. race due to task completion and cancellation) if the code flow is 
not interruptible. But I guess that wont be a problem as removal of task by ID 
from leveldb would simply not do anything if key does not exist.
* In case of NM restart(which should be quite rare), the map above would be 
lost. But that should be fine. Only side effect of this is that running 
deletion tasks may not be cancelled. But even if we attempt to delete a non 
existent directory, it should not be a problem.

(DeletionService.java and Container Executor(s))
* Made relevant changes in DeletionService to ensure scheduling of deletion 
tasks with a delay.
* Made changes to ensure deletion task's future and task id used for state 
store are returned back to caller for cancellation later. Also added code for 
cancellation of running task.
* Added a new flag "ignoreMissingDir" in deletion task. Currently executors 
will throw an error if any of directory missing amongst the list of baseDirs 
passed for deletion do not exist. But for this use case we would ignore such 
missing dir and continue with deletion of other directories. Because NM may 
have inconsistent view of what all needs to be deleted.  The flag hence has 
been added to support this new behavior and continue with old behavior(if flag 
is false).
* Changes have been made in {{container-executor}} as well for support of this 
flag. An additional command line parameter has been added (for delete as user 
command). A value of 0 is analogous to the flag being false and 1 means true.

Few additional points to note 
# xxxx_tmp directories should be deleted by localizer itself(in FSDownload). 
Have added it because as you said localizers can sometimes turn rogue. In this 
case, NM will delete these dirs. Should I handle tmp dirs then ?
# I am assuming it would be fine to change command line parameters for a 
specific container-executor command in 2.8. If we do not want to change command 
line parameters for {{container-executor}}, there can be workarounds.


> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to