[ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610050#comment-15610050
 ] 

Sangjin Lee commented on YARN-5767:
-----------------------------------

Thanks for the patch [~ctrezzo]! I agree with the above feedback from Jason and 
Mikios.

Just a couple of more minor comments.

1. Let's make {{LRUComparator}} a private static class. This doesn't need to be 
visible outside {{LocalCacheCleaner}}.
2. Can we remove {{LocalCacheCleanerStats.getUserDelSizes()}}? This publishes 
the internal map and breaks the encapsulation. It doesn't appear to be used by 
any code.
3. Please use a generated {{serialVersionUID}} value instead of a hard-coded 
value of 1.

> Fix the order that resources are cleaned up from the local Public/Private 
> caches
> --------------------------------------------------------------------------------
>
>                 Key: YARN-5767
>                 URL: https://issues.apache.org/jira/browse/YARN-5767
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, 
> YARN-5767-trunk-v3.patch
>
>
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can 
> see that public resources are added to the {{ResourceRetentionSet}} first 
> followed by private resources:
> {code:java}
> private void handleCacheCleanup(LocalizationEvent event) {
>   ResourceRetentionSet retain =
>     new ResourceRetentionSet(delService, cacheTargetSize);
>   retain.addResources(publicRsrc);
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Resource cleanup (public) " + retain);
>   }
>   for (LocalResourcesTracker t : privateRsrc.values()) {
>     retain.addResources(t);
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
>     }
>   }
>   //TODO Check if appRsrcs should also be added to the retention set.
> }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see 
> that this means public resources are deleted first until the target cache 
> size is met:
> {code:java}
> public void addResources(LocalResourcesTracker newTracker) {
>   for (LocalizedResource resource : newTracker) {
>     currentSize += resource.getSize();
>     if (resource.getRefCount() > 0) {
>       // always retain resources in use
>       continue;
>     }
>     retain.put(resource, newTracker);
>   }
>   for (Iterator<Map.Entry<LocalizedResource,LocalResourcesTracker>> i =
>          retain.entrySet().iterator();
>        currentSize - delSize > targetSize && i.hasNext();) {
>     Map.Entry<LocalizedResource,LocalResourcesTracker> rsrc = i.next();
>     LocalizedResource resource = rsrc.getKey();
>     LocalResourcesTracker tracker = rsrc.getValue();
>     if (tracker.remove(resource, delService)) {
>       delSize += resource.getSize();
>       i.remove();
>     }
>   }
> }
> {code}
> The result of this is that resources in the private cache are only deleted in 
> the cases where:
> # The cache size is larger than the target cache size and the public cache is 
> empty.
> # The cache size is larger than the target cache size and everything in the 
> public cache is being used by a running container.
> For clusters that primarily use the public cache (i.e. make use of the shared 
> cache), this means that the most commonly used resources can be deleted 
> before old resources in the private cache. Furthermore, the private cache can 
> continue to grow over time causing more and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since 
> resources are added to the retention set on a user by user basis, resources 
> will get cleaned up one user at a time in the order that privateRsrc.values() 
> returns the LocalResourcesTracker. So if user1 has 10MB in their cache and 
> user2 has 100MB in their cache and the target size of the cache is 50MB, 
> user1 could potentially have their entire cache removed before anything is 
> deleted from the user2 cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to