[
https://issues.apache.org/jira/browse/YARN-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389632#comment-16389632
]
Steve Loughran commented on YARN-6136:
--------------------------------------
It's just trying to do a cleanup at the end, no matter how things exit.
This could trivially be made optional.
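
For illustration only, a minimal sketch of what making the cleanup optional might look like: guard the purge in the container-finish hook with a boolean config flag. The key name ({{hadoop.registry.rm.purge-on-completion.enabled}}), its default, and the wiring below are assumptions made for the sketch, not an existing configuration property or an actual patch.

{code}
// Sketch only: a hypothetical flag guarding the container-finish purge in the
// RM's registry service. The key name and default are illustrative.
public static final String KEY_PURGE_ON_COMPLETION =
    "hadoop.registry.rm.purge-on-completion.enabled";

private boolean purgeOnCompletion = true;

@Override
protected void serviceInit(Configuration conf) throws Exception {
  super.serviceInit(conf);
  purgeOnCompletion = conf.getBoolean(KEY_PURGE_ON_COMPLETION, true);
}

public void onContainerFinished(ContainerId id) throws IOException {
  if (!purgeOnCompletion) {
    LOG.debug("Registry purge on container finish is disabled, skipping {}", id);
    return;
  }
  LOG.info("Container {} finished, purging container-level records", id);
  purgeRecordsAsync("/", id.toString(), PersistencePolicies.CONTAINER);
}
{code}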
> YARN registry service should avoid scanning whole ZK tree for every
> container/application finish
> ------------------------------------------------------------------------------------------------
>
> Key: YARN-6136
> URL: https://issues.apache.org/jira/browse/YARN-6136
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, resourcemanager
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Critical
>
> In existing registry service implementation, purge operation triggered by
> container finish event:
> {code}
>   public void onContainerFinished(ContainerId id) throws IOException {
>     LOG.info("Container {} finished, purging container-level records",
>         id);
>     purgeRecordsAsync("/",
>         id.toString(),
>         PersistencePolicies.CONTAINER);
>   }
> {code}
> Since this happens on every container finish, it essentially scans all (or
> almost all) ZK nodes from the root.
> We have a cluster with hundreds of ZK nodes for the service registry and
> 20K+ ZK nodes for other purposes. The existing implementation can generate a
> massive number of ZK operations and internal Java objects
> (RegistryPathStatus) as well. The RM becomes very unstable when there are
> batches of container-finish events, because of full GC pauses and ZK
> connection failures.
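
To make the cost above concrete: purgeRecordsAsync("/", ...) starts its scan at the registry root, so every container completion walks the whole tree (20K+ nodes in the cluster described). A purge rooted at a per-application base path would only touch that application's subtree. The sketch below is illustrative only; the per-application path bookkeeping is an assumption, not something the current RM hooks track.

{code}
// Sketch only: scope the purge to the finishing container's application
// subtree instead of "/". The appRegistryPaths bookkeeping is hypothetical;
// the current service always purges from the root.
private final Map<ApplicationId, String> appRegistryPaths =
    new ConcurrentHashMap<>();

public void onContainerFinished(ContainerId id) throws IOException {
  ApplicationId appId = id.getApplicationAttemptId().getApplicationId();
  // Fall back to the existing root scan only if no narrower path is known.
  String base = appRegistryPaths.getOrDefault(appId, "/");
  LOG.info("Container {} finished, purging container-level records under {}",
      id, base);
  purgeRecordsAsync(base, id.toString(), PersistencePolicies.CONTAINER);
}
{code}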