[jira] [Commented] (YARN-6136) YARN registry service should avoid scanning whole ZK tree for every container/application finish

2018-03-07 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/YARN-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389632#comment-16389632 ]

Steve Loughran commented on YARN-6136:
--

It's just trying to do a cleanup at the end, no matter how things exit.

This could trivially be made optional.
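
For illustration only, a minimal sketch of what "optional" could look like: gate the purge behind a boolean switch read at service init. The property name below is hypothetical, not an existing Hadoop configuration key, and this is a fragment of the RM-side registry service, not a patch.

{code}
  // Hypothetical switch; the key name is illustrative only.
  private boolean purgeOnCompletion = true;

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    super.serviceInit(conf);
    purgeOnCompletion =
        conf.getBoolean("yarn.registry.purge-on-completion.enabled", true);
  }

  public void onContainerFinished(ContainerId id) throws IOException {
    if (!purgeOnCompletion) {
      // Cleanup disabled: skip the tree scan entirely.
      return;
    }
    LOG.info("Container {} finished, purging container-level records", id);
    purgeRecordsAsync("/", id.toString(), PersistencePolicies.CONTAINER);
  }
{code}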

> YARN registry service should avoid scanning whole ZK tree for every 
> container/application finish
> 
>
> Key: YARN-6136
> URL: https://issues.apache.org/jira/browse/YARN-6136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
>
> In the existing registry service implementation, the purge operation is 
> triggered by every container finish event:
> {code}
>   public void onContainerFinished(ContainerId id) throws IOException {
>     LOG.info("Container {} finished, purging container-level records",
>         id);
>     purgeRecordsAsync("/",
>         id.toString(),
>         PersistencePolicies.CONTAINER);
>   }
> {code} 
> Since this happens on every container finish, it essentially scans all (or 
> almost all) ZK nodes from the root. 
> We have a cluster with hundreds of ZK nodes for the service registry and 
> 20K+ ZK nodes for other purposes. The existing implementation can generate 
> massive numbers of ZK operations and internal Java objects 
> (RegistryPathStatus). The RM becomes very unstable when there are batches of 
> container finish events, due to full GC pauses and ZK connection failures.
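
To get a feel for the cost described above, the standalone sketch below walks a 
ZK tree the same way a purge rooted at "/" must, issuing one getChildren call 
per znode. It is purely illustrative; the connect string is a placeholder.

{code}
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

/** Counts the znodes (and hence read operations) touched by a walk from "/". */
public class ZkTreeWalk {
  public static void main(String[] args) throws Exception {
    // Placeholder connect string; point it at the cluster's ZK ensemble.
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
    try {
      System.out.println("znodes visited: " + walk(zk, "/"));
    } finally {
      zk.close();
    }
  }

  /** Recursively lists children; one getChildren call per znode visited. */
  private static int walk(ZooKeeper zk, String path) throws Exception {
    int visited = 1;
    for (String child : zk.getChildren(path, false)) {
      String childPath = path.endsWith("/") ? path + child : path + "/" + child;
      visited += walk(zk, childPath);
    }
    return visited;
  }
}
{code}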





[jira] [Commented] (YARN-6136) YARN registry service should avoid scanning whole ZK tree for every container/application finish

2017-01-31 Thread Gour Saha (JIRA)

[ https://issues.apache.org/jira/browse/YARN-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847800#comment-15847800 ]

Gour Saha commented on YARN-6136:
-

[~wangda] FYI, Slider today uses the following path:
{code}
/registry/users/{user-id}/services/org-apache-slider/{app-name}/components/{container-id}
{code}
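
A narrowly scoped purge could, for illustration, start from that subtree 
instead of "/". The sketch below is hypothetical: it assumes the caller can 
supply the user and application name, which the current onContainerFinished 
signature does not, and it reuses the existing purgeRecordsAsync call.

{code}
  // Hypothetical variant: scope the purge to the service's own subtree.
  public void onContainerFinished(ContainerId id, String user, String appName)
      throws IOException {
    // e.g. /users/{user}/services/org-apache-slider/{appName}/components
    String scope = "/users/" + user
        + "/services/org-apache-slider/" + appName
        + "/components";
    LOG.info("Container {} finished, purging records under {}", id, scope);
    purgeRecordsAsync(scope, id.toString(), PersistencePolicies.CONTAINER);
  }
{code}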



