[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
[ https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915613#comment-16915613 ]

Rohith Sharma K S edited comment on YARN-9714 at 8/26/19 8:59 AM:
--

I see. Thanks. Maybe the fix could be to check whether Curator is disabled and, if so, close the connection. Does that make sense?

was (Author: rohithsharma):
I see. Thanks. Maybe the fix could be to check whether Curator is enabled and, if so, close the connection. Does that make sense?

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
>                 Key: YARN-9714
>                 URL: https://issues.apache.org/jira/browse/YARN-9714
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>              Labels: memory-leak
>         Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating the
> memory dump and jstack output, I found two places in the RM that may cause memory
> leaks after the RM has transitioned to standby:
> # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
> # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer when
> services are stopping.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
[ https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915598#comment-16915598 ]

Tao Yang edited comment on YARN-9714 at 8/26/19 8:45 AM:
-

Hi, [~rohithsharma].
I have commented (over [here|https://issues.apache.org/jira/browse/YARN-9714?focusedCommentId=16896704=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16896704]) for this: "As for zkManager in ZKStateStore, it will reuse zkManager for HA when RM uses the Curator-based elector for leader election, otherwise it will be created for ZKRMStateStore".
Please refer to {{ResourceManager#createEmbeddedElector}} and {{ZKRMStateStore#initInternal}} for details.

was (Author: tao yang):
Hi, [~rohithsharma].
I have commented (over here) for this: "As for zkManager in ZKStateStore, it will reuse zkManager for HA when RM uses the Curator-based elector for leader election, otherwise it will be created for ZKRMStateStore".
Please refer to {{ResourceManager#createEmbeddedElector}} and {{ZKRMStateStore#initInternal}} for details.
[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
[ https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915598#comment-16915598 ]

Tao Yang edited comment on YARN-9714 at 8/26/19 8:43 AM:
-

Hi, [~rohithsharma].
I have commented (over here) for this: "As for zkManager in ZKStateStore, it will reuse zkManager for HA when RM uses the Curator-based elector for leader election, otherwise it will be created for ZKRMStateStore".
Please refer to {{ResourceManager#createEmbeddedElector}} and {{ZKRMStateStore#initInternal}} for details.

was (Author: tao yang):
Hi, [~rohithsharma].
I have commented (over [here|https://issues.apache.org/jira/browse/YARN-9714?focusedCommentId=16896704=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16896704]) for this: "As for zkManager in ZKStateStore, it will reuse zkManager for HA when RM uses the Curator-based elector for leader election, otherwise it will be created for ZKRMStateStore".
[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
[ https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896704#comment-16896704 ]

Tao Yang edited comment on YARN-9714 at 7/31/19 2:29 AM:
-

Hi, [~bibinchundatt].
{quote}IIUC the ZooKeeper StateStore is not an active service and the ZooKeeper connection is common for leader election too. Do we really need to close the connection?
{quote}
RMStateStore is an active service which will be created for every RMActiveServices instance. As for zkManager in ZKStateStore, it will reuse zkManager for HA when RM uses the Curator-based elector for leader election, otherwise it will be created for ZKRMStateStore, so we should only close it in ZKRMStateStore#serviceStop when it is not the one used for HA. Does that make sense?
{quote}
Few other issues in 3.1.1 which got fixed recently are YARN-9644,9639
{quote}
Thanks, I noticed those issues before but just missed YARN-9642 :(

was (Author: tao yang):
Hi, [~bibinchundatt].
{quote}
IIUC the ZooKeeper StateStore is not an active service and the ZooKeeper connection is common for leader election too. Do we really need to close the connection?
{quote}
RMStateStore is an active service which will be created for every RMActiveServices instance. As for zkManager in ZKStateStore, it will reuse zkManager for HA when RM uses the Curator-based elector for leader election, otherwise it will be created for ZKRMStateStore, so we should only close it in ZKRMStateStore#serviceStop when it is not the one used for HA. Does that make sense?
> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
>                 Key: YARN-9714
>                 URL: https://issues.apache.org/jira/browse/YARN-9714
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Blocker
>              Labels: memory-leak
>         Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently RM full GC happened in one of our clusters, after investigating the
> dump memory and jstack, I found two places in RM may cause memory leaks after
> RM transitioned to standby:
> # Release cache cleanup timer in AbstractYarnScheduler never be canceled.
> # ZooKeeper connection in ZKRMStateStore never be closed.
> To solve those leaks, we should close the connection or cancel the timer when
> services are stopping.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
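The ownership rule described in the comment above (the state store reuses the elector's zkManager when the Curator-based elector is in use, otherwise it creates its own and should close it on stop) can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeZkManager`, `SketchStateStore`, and the `ownsZkManager` flag are hypothetical names invented for the example, not the Hadoop classes or the contents of the attached patches.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these types are real Hadoop classes. */
class ZkOwnershipSketch {

  /** Stand-in for a ZooKeeper client wrapper that can be closed. */
  static final class FakeZkManager implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    @Override
    public void close() {
      closed.set(true);
    }

    boolean isClosed() {
      return closed.get();
    }
  }

  /** Stand-in for a state store that either borrows or owns its client. */
  static final class SketchStateStore {
    private FakeZkManager zkManager;
    private boolean ownsZkManager; // assumption: tracked explicitly at init time

    /** sharedFromElector is non-null only when an elector already holds a client. */
    void init(FakeZkManager sharedFromElector) {
      if (sharedFromElector != null) {
        zkManager = sharedFromElector; // borrowed: the elector keeps using it
        ownsZkManager = false;
      } else {
        zkManager = new FakeZkManager(); // created for this store only
        ownsZkManager = true;
      }
    }

    /** Close the client only if this store created it. */
    void stop() {
      if (ownsZkManager && zkManager != null) {
        zkManager.close();
      }
      zkManager = null;
    }

    FakeZkManager client() {
      return zkManager;
    }
  }

  public static void main(String[] args) {
    FakeZkManager shared = new FakeZkManager();
    SketchStateStore borrowing = new SketchStateStore();
    borrowing.init(shared);
    borrowing.stop();
    System.out.println("shared client closed: " + shared.isClosed());

    SketchStateStore owning = new SketchStateStore();
    owning.init(null);
    FakeZkManager own = owning.client();
    owning.stop();
    System.out.println("owned client closed: " + own.isClosed());
  }
}
```

Recording ownership at init time keeps the stop path free of any knowledge about how leader election is configured, which is one way to avoid closing a connection that the elector still needs.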
[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
[ https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896704#comment-16896704 ]

Tao Yang edited comment on YARN-9714 at 7/31/19 2:21 AM:
-

Hi, [~bibinchundatt].
{quote}
IIUC the zookeer StateStore is not an active service and zookeeper connection is common for leader election too. Do we really need to close the connection ??
{quote}
RMStateStore is an active service which will be created for every RMActiveServices instance. As for zkManager in ZKStateStore, it will reuse zkManager for HA when RM uses the Curator-based elector for leader election, otherwise it will be created for ZKRMStateStore, so that we should only close it when it's not for HA in ZKRMStateStore#serviceStop. Make sense?

was (Author: tao yang):
Hi, [~bibinchundatt].
{quote}
IIUC the zookeer StateStore is not an active service and zookeeper connection is common for leader election too. Do we really need to close the connection ??
{qoute}
RMStateStore is an active service which will be created for every RMActiveServices instance. As for zkManager in ZKStateStore, it will reuse zkManager for HA when RM uses the Curator-based elector for leader election, otherwise it will be created for ZKRMStateStore, so that we should only close it when it's not for HA in ZKRMStateStore#serviceStop. Make sense?
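For the first leak listed in the issue description (a cleanup timer that is never canceled), the general remedy of canceling the timer when the service stops might look like the sketch below. It uses only `java.util.Timer` from the JDK; `SketchScheduler`, its `start`/`stop` methods, and the `isCancelled` helper are hypothetical names for illustration and are not taken from AbstractYarnScheduler or the attached patches.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical sketch; not the AbstractYarnScheduler implementation. */
class TimerCleanupSketch {

  /** Stand-in for a service that runs a periodic cleanup task. */
  static final class SketchScheduler {
    private Timer releaseCacheTimer;

    void start(long intervalMs) {
      // Daemon timer so the sketch never keeps the JVM alive on its own.
      releaseCacheTimer = new Timer("release-cache-sketch", true);
      releaseCacheTimer.schedule(new TimerTask() {
        @Override
        public void run() {
          // periodic cleanup work would go here
        }
      }, intervalMs, intervalMs);
    }

    /** Cancel the timer on stop so its thread and queued tasks can be reclaimed. */
    void stop() {
      if (releaseCacheTimer != null) {
        releaseCacheTimer.cancel();
        releaseCacheTimer = null;
      }
    }

    Timer timer() {
      return releaseCacheTimer;
    }
  }

  /**
   * Timer has no isCancelled(), so probe it: schedule() throws
   * IllegalStateException once the timer has been cancelled.
   * Side effect: on a live timer this queues one far-future no-op task.
   */
  static boolean isCancelled(Timer timer) {
    try {
      timer.schedule(new TimerTask() {
        @Override
        public void run() {
          // no-op probe task
        }
      }, 1_000_000L);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    SketchScheduler scheduler = new SketchScheduler();
    scheduler.start(60_000L);
    Timer timer = scheduler.timer();
    scheduler.stop();
    System.out.println("timer cancelled after stop: " + isCancelled(timer));
  }
}
```

Without the `cancel()` call, each start would leave behind a timer thread that still references its task, which is the kind of retention the issue description attributes to repeated transitions to standby.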