[
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896704#comment-16896704
]
Tao Yang edited comment on YARN-9714 at 7/31/19 2:29 AM:
---------------------------------------------------------
Hi, [~bibinchundatt].
{quote}IIUC the ZooKeeper StateStore is not an active service, and the ZooKeeper
connection is shared with leader election too.
Do we really need to close the connection?
{quote}
RMStateStore is an active service: a new one is created for every
RMActiveServices instance. As for the zkManager in ZKRMStateStore, it reuses
the HA zkManager when RM uses the Curator-based elector for leader election;
otherwise a separate one is created for ZKRMStateStore. So in
ZKRMStateStore#serviceStop we should close it only when it is not the one used
for HA. Does that make sense?
{quote}
A few other issues in 3.1.1 that were fixed recently are YARN-9644 and YARN-9639.
{quote}
Thanks, I noticed those issues before but just missed YARN-9642 :(
> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -----------------------------------------------------------------------------
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Blocker
> Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently a full GC occurred on the RM in one of our clusters. After
> investigating the memory dump and jstack output, I found two places in the RM
> that may cause memory leaks after the RM transitions to standby:
> # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
> # The ZooKeeper connection in ZKRMStateStore is never closed.
> To fix these leaks, we should close the connection and cancel the timer when
> the services are stopping.