Sangjin Lee commented on YARN-2183:

Thanks Karthik for the review. We have addressed most of your review comments 
in the updated patch. Items that need more discussion are below.

runCleanerTask: Instead of checking if there is a scheduled-cleaner-task 
running here, why not just rely on the check in CleanerTask#run(). Agree, we 
might be doing a little more work here unnecessarily, but not sure the savings 
are worth an extra check and an extra parameter in the CleanerTask constructor.

We do have the on-demand cleaner feature (YARN-2189; see below for more), and 
this check is needed to prevent a race (i.e. not allow an on-demand run when a 
scheduled run is in progress).

How does a user use runCleanerTask? Instantiate another SCM? The SCM isn't 
listening to any requests. I can see the SCM being run in the RM, and one could 
potentially add "yarn rmadmin -clean-shared-cache". In any case, given there is 
no way to reach a running SCM, I would remove runCleanerTask altogether for 
now, and add it back later when we need it? Thoughts?

As we discussed offline, we do have a YARN admin command implemented that lets 
you run the cleaner task on demand (YARN-2189). The admin command implements a 
proper ACL check (based on a YARN admin credential), and sends an RPC request 
to the running SCM. Since patches are organized this way, it may not have been 
very obvious by looking at this patch alone.

Should we worry about users starting SCMs with roots at different levels that 
can lead to multiple cleaners?

In theory it is possible. However, in reality checking that might be being bit 
too cautious. I think it might be fine without this check. Let me know what you 

Should cleanResourceReferences be moved to SCMStore?

That’s an interesting suggestion. In fact, with the InMemorySCMStore (which 
already has a reference to an AppChecker to clean up the initial apps) it may 
be OK to create SCMStore.cleanResourceReferences(). However, it’s not clear to 
me whether a dependency from an SCMStore to an AppChecker is always a fine 
requirement for other types of stores. In that sense, I would be hesitant to 
create this coupling by introducing SCMStore.cleanResourceReferences(). What do 
you think?

For the race condition (YARN-2663), would it help to handle the delete files on 
HDFS in the store#remove?

That’s a possibility. However, I think there can be an easier way to fix this 
race condition. This is partially due to the way the cleaner task is deleting 
unused files. Currently it deletes the entire directory as opposed to the 
specific file. The race can be fixed by avoiding deleting the directory. We’ll 
add the proper fix later on YARN-2663.

Make initSingleton private and call it in getInstance if the instance is null?

We looked into that, but the difficulty is initSingleton() needs the 
configuration which getInstance() does not provide.

How about using MutableRate or MutableStat for the rates?

We have removed the rate-specific metrics altogether as they can be derived 
from the original metrics.

Do we need CleanerMetricsCollector, wouldn't CleanerMetrics extending 
MetricsSource suffice?

For making the metrics available in JMX, indeed CleanerMetrics would suffice. 
CleanerMetricsCollector was introduced to create a web UI (HTML) to select a 
handful of metrics to show in HTML. That was to come in YARN-2203. Having said 
that, I think we can remove CleanerMetricsCollector for now, and not introduce 
the web UI. That UI is minimal anyway, and we can consider introducing a more 
refined web UI at a later point once this version is released.

> Cleaner service for cache manager
> ---------------------------------
>                 Key: YARN-2183
>                 URL: https://issues.apache.org/jira/browse/YARN-2183
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, 
> YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch
> Implement the cleaner service for the cache manager along with metrics for 
> the service. This service is responsible for cleaning up old resource 
> references in the manager and removing stale entries from the cache.

This message was sent by Atlassian JIRA

Reply via email to