Sangjin Lee commented on YARN-2183:

Thanks for the review [~kasha]!

I understand we need a check to prevent the race. I wonder if we can just 
re-use the existing check in CleanerTask#run instead of an explicit check in 
CleanerService#runCleanerTask? From what I remember, that would make the code 
in CleanerTask#run cleaner as well. (no pun)

The main motivation for this somewhat elaborate double check was for a 
situation when an on-demand cleaner run comes in and a scheduled cleaner run 
starts. Without this check, we would have two cleaner runs back to back which 
is somewhat wasteful.

Having said that, I think it is debatable how important it is to avoid that 
situation and whether it is an optimization worth doing. One could argue that 
this is bit too fine optimization. Thoughts?

I poked around a little more, and here is what I think. SharedCacheManager 
creates an instance of AppChecker, rest of the SCM pieces (Store, 
CleanerService) should just use the same instance. This instance can be passed 
either in the constructor or through an SCMContext similar to RMContext. Or, we 
could add SCM#getAppChecker.
In its current form, CleanerTask#cleanResourceReferences fetches the references 
from the store, checks if the apps are running, and asks the store to remove 
the references. Moving the whole method to the store would simplify the code 

Yes, I agree that moving cleanResourceReferences() to the store would simplify 
code here. There is one caveat however. Currently 
CleanerTask.cleanResourceReferences() is generic: i.e. it does not depend on 
the type of the store. But if we move this to the store, then I think it would 
need to be abstract at the level of SCMStore and each store implementation 
would need to implement its own. The main reason is that the concurrency/safety 
semantics would be different from store impl to store impl. In the case of the 
in-memory store, it would use the synchronization on the interned key. But in 
case of other stores, that does not apply and they need to do their own 
implementation mostly because how they handle concurrency will be different. So 
it would mean largely copying and pasting of the same logic with a small 
difference of how they handle concurrency. That does seem to be a downside of 
this approach. What do you think?

> Cleaner service for cache manager
> ---------------------------------
>                 Key: YARN-2183
>                 URL: https://issues.apache.org/jira/browse/YARN-2183
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, 
> YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch
> Implement the cleaner service for the cache manager along with metrics for 
> the service. This service is responsible for cleaning up old resource 
> references in the manager and removing stale entries from the cache.

This message was sent by Atlassian JIRA

Reply via email to