[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236943#comment-14236943
 ] 

Varun Saxena commented on YARN-2902:
------------------------------------

Looking further at the code, I think we can call 
*LocalResourcesTrackerImpl#remove* irrespective of whether the cache target 
size set by the *yarn.nodemanager.localizer.cache.target-size-mb* config has 
been reached.
This is because the code below, which is called from 
ResourceLocalizationService#handleCacheCleanup, checks the cumulative size of 
the resources against the cache target size, and only resources with a 
reference count of 0 are candidates for removal. For a resource whose state is 
*DOWNLOADING*, a call to LocalizedResource#getSize will always return -1, 
because it seems the size is only updated once the state changes to 
*LOCALIZED*.

{code:title=ResourceRetentionSet.java|borderStyle=solid}
public void addResources(LocalResourcesTracker newTracker) {
  for (LocalizedResource resource : newTracker) {
    currentSize += resource.getSize();
    if (resource.getRefCount() > 0) {
      // always retain resources in use
      continue;
    }
    retain.put(resource, newTracker);
  }
  for (Iterator<Map.Entry<LocalizedResource,LocalResourcesTracker>> i =
         retain.entrySet().iterator();
       currentSize - delSize > targetSize && i.hasNext();) {
    Map.Entry<LocalizedResource,LocalResourcesTracker> rsrc = i.next();
    LocalizedResource resource = rsrc.getKey();
    LocalResourcesTracker tracker = rsrc.getValue();
    if (tracker.remove(resource, delService)) {
      delSize += resource.getSize();
      i.remove();
    }
  }
}
{code}
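
To make the argument above concrete, here is a minimal, self-contained sketch 
of the idea. It is not the attached patch and does not use the real YARN 
classes: RetentionSketch, Resource, State and cleanup are hypothetical 
stand-ins, and the only behaviour assumed from the code above is that getSize 
returns -1 until a resource is LOCALIZED. The sketch removes unreferenced 
DOWNLOADING resources whether or not the target size has been exceeded, and 
applies the target-size check only to LOCALIZED ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model for illustration only; not the real YARN classes.
final class RetentionSketch {
  enum State { DOWNLOADING, LOCALIZED }

  // Simplified stand-in for LocalizedResource.
  static final class Resource {
    final String name;
    final State state;
    final long localizedSize;
    final int refCount;

    Resource(String name, State state, long localizedSize, int refCount) {
      this.name = name;
      this.state = state;
      this.localizedSize = localizedSize;
      this.refCount = refCount;
    }

    // Assumption taken from the comment: size is -1 until LOCALIZED.
    long getSize() {
      return state == State.LOCALIZED ? localizedSize : -1;
    }
  }

  // Removes entries from the cache and returns the names that were removed.
  static List<String> cleanup(List<Resource> cache, long targetSize) {
    long currentSize = 0;
    for (Resource r : cache) {
      currentSize += r.getSize();
    }
    long delSize = 0;
    List<String> removed = new ArrayList<>();
    for (Iterator<Resource> i = cache.iterator(); i.hasNext();) {
      Resource r = i.next();
      if (r.refCount > 0) {
        continue; // always retain resources in use
      }
      boolean orphanedDownload = r.state == State.DOWNLOADING;
      boolean overTarget = currentSize - delSize > targetSize;
      if (orphanedDownload || overTarget) {
        if (!orphanedDownload) {
          delSize += r.getSize(); // only LOCALIZED entries free real space
        }
        removed.add(r.name);
        i.remove();
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    List<Resource> cache = new ArrayList<>(Arrays.asList(
        new Resource("a", State.LOCALIZED, 60, 0),
        new Resource("b", State.DOWNLOADING, 0, 0),
        new Resource("c", State.LOCALIZED, 30, 1)));
    // Cumulative size is 60 - 1 + 30 = 89, below the target of 100,
    // yet the orphaned DOWNLOADING entry is still removed.
    System.out.println(cleanup(cache, 100)); // prints [b]
  }
}
```

In this sketch the localized resource "a" survives because the cache is under 
the target, while "b" is removed even though the target was never reached.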

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2902
>                 URL: https://issues.apache.org/jira/browse/YARN-2902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>             Fix For: 2.7.0
>
>         Attachments: YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed, 
> its resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources, they linger with a reference count of 
> zero but aren't cleaned up during normal cache cleanup scans, since the 
> cleanup never deletes resources in the DOWNLOADING state even if their 
> reference count is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
