[ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609727#comment-15609727 ]
Jason Lowe commented on YARN-5767:
----------------------------------

Ah, I missed the findbugs connection with respect to comparator serialization. In that case I'm fine with leaving that in, since the alternative of adding a findbugs exception is about as much work. No need to do read/write object methods.

> Fix the order that resources are cleaned up from the local Public/Private caches
> --------------------------------------------------------------------------------
>
>                 Key: YARN-5767
>                 URL: https://issues.apache.org/jira/browse/YARN-5767
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, YARN-5767-trunk-v3.patch
>
>
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can see that public resources are added to the {{ResourceRetentionSet}} first, followed by private resources:
> {code:java}
>   private void handleCacheCleanup(LocalizationEvent event) {
>     ResourceRetentionSet retain =
>         new ResourceRetentionSet(delService, cacheTargetSize);
>     retain.addResources(publicRsrc);
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource cleanup (public) " + retain);
>     }
>     for (LocalResourcesTracker t : privateRsrc.values()) {
>       retain.addResources(t);
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
>       }
>     }
>     //TODO Check if appRsrcs should also be added to the retention set.
>   }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}}, we see that this means public resources are deleted first until the target cache size is met:
> {code:java}
>   public void addResources(LocalResourcesTracker newTracker) {
>     for (LocalizedResource resource : newTracker) {
>       currentSize += resource.getSize();
>       if (resource.getRefCount() > 0) {
>         // always retain resources in use
>         continue;
>       }
>       retain.put(resource, newTracker);
>     }
>     for (Iterator<Map.Entry<LocalizedResource,LocalResourcesTracker>> i =
>              retain.entrySet().iterator();
>          currentSize - delSize > targetSize && i.hasNext();) {
>       Map.Entry<LocalizedResource,LocalResourcesTracker> rsrc = i.next();
>       LocalizedResource resource = rsrc.getKey();
>       LocalResourcesTracker tracker = rsrc.getValue();
>       if (tracker.remove(resource, delService)) {
>         delSize += resource.getSize();
>         i.remove();
>       }
>     }
>   }
> {code}
> As a result, resources in the private cache are deleted only when:
> # The cache size is larger than the target cache size and the public cache is empty.
> # The cache size is larger than the target cache size and everything in the public cache is being used by a running container.
> For clusters that primarily use the public cache (i.e. make use of the shared cache), this means that the most commonly used resources can be deleted before old resources in the private cache. Furthermore, the private cache can continue to grow over time, causing more and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since resources are added to the retention set on a user-by-user basis, resources will get cleaned up one user at a time, in the order that {{privateRsrc.values()}} returns the {{LocalResourcesTracker}} objects.
> So if user1 has 10MB in their cache, user2 has 100MB in their cache, and the target size of the cache is 50MB, user1 could potentially have their entire cache removed before anything is deleted from the user2 cache.
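To make the ordering concrete, here is a minimal, self-contained simulation of the behavior described above. It is not YARN code: the class name {{RetentionOrderDemo}} and the name->size maps standing in for trackers are invented for illustration, and insertion order (a {{LinkedHashMap}}) is used as a simplification of the retention set's LRU ordering, since the bug concerns the order in which trackers are added, not the ordering within the set.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class (not YARN code). Sizes are in MB.
public class RetentionOrderDemo {
  public static final long TARGET = 50;  // target cache size
  public static long currentSize = 0;    // total size of all resources seen
  public static long delSize = 0;        // total size deleted so far
  // resource name -> size, standing in for the retain set
  public static final Map<String, Long> retain = new LinkedHashMap<>();
  public static final List<String> deleted = new ArrayList<>();

  // Mirrors the shape of ResourceRetentionSet#addResources: add one
  // tracker's resources, then evict until the cache fits the target.
  // Because eviction runs after EVERY tracker is added, earlier
  // trackers are drained before later ones are even considered.
  public static void addResources(Map<String, Long> tracker) {
    for (Map.Entry<String, Long> r : tracker.entrySet()) {
      currentSize += r.getValue();
      retain.put(r.getKey(), r.getValue());
    }
    for (Iterator<Map.Entry<String, Long>> i = retain.entrySet().iterator();
         currentSize - delSize > TARGET && i.hasNext();) {
      Map.Entry<String, Long> rsrc = i.next();
      delSize += rsrc.getValue();
      deleted.add(rsrc.getKey());
      i.remove();
    }
  }

  public static void main(String[] args) {
    Map<String, Long> publicCache = new LinkedHashMap<>();
    publicCache.put("pub-a", 30L);
    publicCache.put("pub-b", 30L);
    Map<String, Long> user1 = new LinkedHashMap<>();
    user1.put("user1-jar", 10L);
    Map<String, Long> user2 = new LinkedHashMap<>();
    user2.put("user2-a", 50L);
    user2.put("user2-b", 50L);

    addResources(publicCache);  // public cache added (and evicted) first
    addResources(user1);
    addResources(user2);

    // Everything public plus user1's entire cache is deleted before
    // user2 loses a single resource.
    System.out.println("deleted:  " + deleted);
    System.out.println("retained: " + retain.keySet());
  }
}
```

Running this deletes {{pub-a}}, {{pub-b}}, {{user1-jar}}, and {{user2-a}} (in that order) and retains only {{user2-b}}, even though user2's cache alone is double the 50MB target: the public and user1 caches were fully drained first simply because their trackers were added first.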