[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167175#comment-14167175 ]
Karthik Kambatla edited comment on YARN-2183 at 10/10/14 5:42 PM: ------------------------------------------------------------------ Thanks for updating the patch, Chris. Review comments on the current patch: # YarnConfiguration - let us include units for time in the config names; period-mins, initial-delay-mins, resource-sleep-ms. # CleanerService ## Don't need a SHUTDOWN_HOOK_PRIORITY specific to CleanerService, it is not being used anywhere either. ## Nit: In the constructor, can we explicitly initialize AtomicBoolean(false). ## Nit: Rename scheduler to executor or scheduledExecutor? ## serviceStart: the exception thrown should have a message more amenable to the user. How about appending "It appears there is another CleanerService running in the cluster"? ## serviceStop: the log message that the background thread stopped should be moved to the else-block in the try, instead of after catch? {code} try { if (!scheduler.awaitTermination(10, TimeUnit.SECONDS)) { LOG.warn("Gave up waiting for the cleaner task to shutdown."); } } catch (InterruptedException e) { LOG.warn("The cleaner service was interrupted while shutting down the task.", e); } LOG.info("The background thread stopped."); {code} ## runCleanerTask: Instead of checking if there is a scheduled-cleaner-task running here, why not just rely on the check in CleanerTask#run(). Agree, we might be doing a little more work here unnecessarily, but not sure the savings are worth an extra check and an extra parameter in the CleanerTask constructor. ## How does a user use runCleanerTask? Instantiate another SCM? The SCM isn't listening to any requests. I can see the SCM being run in the RM, and one could potentially add "yarn rmadmin -clean-shared-cache". In any case, given there is no way to reach a running SCM, I would remove runCleanerTask altogether for now, and add it back later when we need it? Thoughts? ## Should we worry about users starting SCMs with roots at different levels that can lead to multiple cleaners? # CleanerTask: ## Nit: RENAMED_SUFFIX can be private ## Add @Override to run() ## The following code is duplicated in InMemoryStateStore a well. May be, we should just add a static method to SharedCacheUtil? {code} StringBuilder pattern = new StringBuilder(); for (int i = 0; i < nestedLevel; i++) { pattern.append("*/"); } pattern.append("*"); {code} ## process: the sleep between directories is in millis, do we want to really calculate nanos? ## Should cleanResourceReferences be moved to SCMStore? ## For the race condition (YARN-2663), would it help to handle the delete files on HDFS in the store#remove? # CleanerMetrics: ## Make initSingleton private and call it in getInstance if the instance is null? ## How about using MutableRate or MutableStat for the rates? # Do we need CleanerMetricsCollector, wouldn't CleanerMetrics extending MetricsSource suffice? was (Author: kkambatl): Thanks for updating the patch, Chris. Review comments on the current patch: # YarnConfiguration - let us include units for time in the config names; period-mins, initial-delay-mins, resource-sleep-ms. # CleanerService ## Don't need a SHUTDOWN_HOOK_PRIORITY specific to CleanerService, it is not being used anywhere either. ## Nit: In the constructor, can we explicitly initialize AtomicBoolean(false). ## Nit: Rename scheduler to executor or scheduledExecutor? ## serviceStart: the exception thrown should have a message more amenable to the user. How about appending "It appears there is another CleanerService running in the cluster"? ## serviceStop: the log message that the background thread stopped should be moved to the else-block in the try, instead of after catch? {code} try { if (!scheduler.awaitTermination(10, TimeUnit.SECONDS)) { LOG.warn("Gave up waiting for the cleaner task to shutdown."); } } catch (InterruptedException e) { LOG.warn("The cleaner service was interrupted while shutting down the task.", e); } LOG.info("The background thread stopped."); {code} ## runCleanerTask: Instead of checking if there is a scheduled-cleaner-task running here, why not just rely on the check in CleanerTask#run(). Agree, we might be doing a little more work here unnecessarily, but not sure the savings are worth an extra check and an extra parameter in the CleanerTask constructor. ## How does a user use runCleanerTask? Instantiate another SCM? The SCM isn't listening to any requests. I can see the SCM being run in the RM, and one could potentially add "yarn rmadmin -clean-shared-cache". In any case, given there is no way to reach a running SCM, I would remove runCleanerTask altogether for now, and add it back later when we need it? Thoughts? ## Should we worry about users starting SCMs with roots at different levels that can lead to multiple cleaners? #CleanerTask: ## Nit: RENAMED_SUFFIX can be private ## Add @Override to run() ## The following code is duplicated in InMemoryStateStore a well. May be, we should just add a static method to SharedCacheUtil? {code} StringBuilder pattern = new StringBuilder(); for (int i = 0; i < nestedLevel; i++) { pattern.append("*/"); } pattern.append("*"); {code} ## process: the sleep between directories is in millis, do we want to really calculate nanos? ## Should cleanResourceReferences be moved to SCMStore? ## For the race condition (YARN-2663), would it help to handle the delete files on HDFS in the store#remove? # CleanerMetrics: ## Make initSingleton private and call it in getInstance if the instance is null? ## How about using MutableRate or MutableStat for the rates? # Do we need CleanerMetricsCollector, wouldn't CleanerMetrics extending MetricsSource suffice? > Cleaner service for cache manager > --------------------------------- > > Key: YARN-2183 > URL: https://issues.apache.org/jira/browse/YARN-2183 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Chris Trezzo > Assignee: Chris Trezzo > Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, > YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch > > > Implement the cleaner service for the cache manager along with metrics for > the service. This service is responsible for cleaning up old resource > references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)