[ https://issues.apache.org/jira/browse/YARN-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847601#comment-15847601 ]

Wangda Tan commented on YARN-6136:
----------------------------------

Per my understanding, the reason the existing service registry design does a full 
scan of the ZK tree is that it has no prior knowledge of how the znodes for 
services are organized. So it has to scan the whole ZK tree to find the first 
matching znode whose YARN_ID equals the container-id.

To solve the problem, I think we can enforce a rule for how the ZK tree is 
organized, for example:
{code}
/services/{user}/{app-type}/{app-name}/{container-id}
{code}
for services, and:
{code}
/yarn-daemons/{role-name, like RM/NM}/{host:port}
{code}
for internal daemons.
With this we can directly locate the znode from the finished container / app info, 
as sketched below.
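To illustrate, here is a minimal sketch of how the purge could become a direct path 
delete under the proposed layout instead of a full-tree scan. The helper names and 
the raw ZooKeeper calls below are illustrative assumptions, not existing registry APIs:
{code}
// Sketch only: assumes the proposed layout
//   /services/{user}/{app-type}/{app-name}/{container-id}
// The class and helper names are hypothetical.
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class DirectPurgeSketch {

  // Build the znode path directly from the finished container's info.
  static String containerPath(String user, String appType,
      String appName, String containerId) {
    return String.format("/services/%s/%s/%s/%s",
        user, appType, appName, containerId);
  }

  // Delete only that single znode instead of scanning from "/".
  static void purgeContainer(ZooKeeper zk, String user, String appType,
      String appName, String containerId)
      throws KeeperException, InterruptedException {
    String path = containerPath(user, appType, appName, containerId);
    if (zk.exists(path, false) != null) {
      zk.delete(path, -1); // -1 = ignore znode version
    }
  }
}
{code}
With such a layout, onContainerFinished would need the owning user / app info in 
addition to the container-id, but the cost per finish event drops from scanning the 
whole tree to a couple of ZK calls.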

Not sure if this has been discussed already, or whether there are other approaches 
to solve the issue. [~steve_l], could you please add your thoughts? 

+ [~vinodkv], [~sidharta-s], [~gsaha], [~jianhe].

> Registry should avoid scanning whole ZK tree for every container/application 
> finish
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-6136
>                 URL: https://issues.apache.org/jira/browse/YARN-6136
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Critical
>
> In the existing registry service implementation, a purge operation is triggered by 
> every container finish event:
> {code}
>   public void onContainerFinished(ContainerId id) throws IOException {
>     LOG.info("Container {} finished, purging container-level records",
>         id);
>     purgeRecordsAsync("/",
>         id.toString(),
>         PersistencePolicies.CONTAINER);
>   }
> {code} 
> Since this happens on every container finish, it essentially scans all (or almost 
> all) ZK nodes from the root. 
> We have a cluster which has hundreds of ZK nodes for the service registry, and 
> 20K+ ZK nodes for other purposes. The existing implementation can generate massive 
> numbers of ZK operations and internal Java objects (RegistryPathStatus) as well. 
> The RM becomes very unstable when there are batches of container finish events, 
> because of full GC pauses and ZK connection failures.


