[jira] [Commented] (YARN-6136) YARN registry service should avoid scanning whole ZK tree for every container/application finish
[ https://issues.apache.org/jira/browse/YARN-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389632#comment-16389632 ]

Steve Loughran commented on YARN-6136:
---------------------------------------

It's just trying to do a cleanup at the end, no matter how things exit. This could trivially be made optional.

> YARN registry service should avoid scanning whole ZK tree for every
> container/application finish
> ----------------------------------------------------------------------
>
>                 Key: YARN-6136
>                 URL: https://issues.apache.org/jira/browse/YARN-6136
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Critical
>
> In the existing registry service implementation, the purge operation is triggered by the container finish event:
> {code}
> public void onContainerFinished(ContainerId id) throws IOException {
>     LOG.info("Container {} finished, purging container-level records",
>         id);
>     purgeRecordsAsync("/",
>         id.toString(),
>         PersistencePolicies.CONTAINER);
> }
> {code}
> Since this happens on every container finish, it essentially scans all (or almost all) ZK nodes from the root.
> We have a cluster which has hundreds of ZK nodes for the service registry and 20K+ ZK nodes for other purposes. The existing implementation can generate massive numbers of ZK operations and internal Java objects (RegistryPathStatus) as well. The RM becomes very unstable when there are batches of container finish events, because of full-GC pauses and ZK connection failures.
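A minimal sketch of what "could trivially be made optional" might look like, assuming a hypothetical configuration key; only onContainerFinished() and purgeRecordsAsync() are taken from the code quoted above, everything else is illustrative rather than the actual patch:

{code}
// Minimal sketch only: gate the container-finish purge behind a config flag.
// The key name below is hypothetical; onContainerFinished() and
// purgeRecordsAsync() are the existing methods quoted in the description.
public static final String KEY_PURGE_ON_CONTAINER_FINISH =
    "hadoop.registry.rm.purge-on-container-finish";   // hypothetical key

private boolean purgeOnContainerFinish = true;

@Override
protected void serviceInit(Configuration conf) throws Exception {
  purgeOnContainerFinish =
      conf.getBoolean(KEY_PURGE_ON_CONTAINER_FINISH, true);
  super.serviceInit(conf);
}

public void onContainerFinished(ContainerId id) throws IOException {
  if (!purgeOnContainerFinish) {
    // cleanup disabled: skip the full-tree scan entirely
    return;
  }
  LOG.info("Container {} finished, purging container-level records", id);
  purgeRecordsAsync("/",
      id.toString(),
      PersistencePolicies.CONTAINER);
}
{code}

Disabling the purge avoids the root scan altogether, at the cost of leaving container-level records to be cleaned up by some other mechanism.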
[jira] [Commented] (YARN-6136) YARN registry service should avoid scanning whole ZK tree for every container/application finish
[ https://issues.apache.org/jira/browse/YARN-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847800#comment-15847800 ]

Gour Saha commented on YARN-6136:
----------------------------------

[~wangda] FYI, Slider today uses the following path -
{code}
/registry/users/{user-id}/services/org-apache-slider/{app-name}/components/{container-id}
{code}
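Given that layout, a rough sketch of how the purge could be narrowed to the finishing application's own subtree instead of the registry root; buildAppPath() and the user lookup are hypothetical helpers introduced only for illustration, and the path shape simply mirrors the Slider layout above:

{code}
// Rough sketch, not the committed fix: purge only under the finishing
// application's own subtree instead of scanning from "/". The path shape
// mirrors the Slider layout above; buildAppPath() and getUserForApplication()
// are hypothetical helpers.
public void onContainerFinished(ContainerId id) throws IOException {
  ApplicationId appId = id.getApplicationAttemptId().getApplicationId();
  // e.g. /users/{user-id}/services/org-apache-slider/{app-name}
  String appPath = buildAppPath(getUserForApplication(appId), appId);
  LOG.info("Container {} finished, purging records under {}", id, appPath);
  purgeRecordsAsync(appPath,
      id.toString(),
      PersistencePolicies.CONTAINER);
}
{code}

With a scoped base path, the purge would touch only the handful of znodes owned by that application rather than the 20K+ unrelated nodes described in the report.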