[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915613#comment-16915613
 ] 

Rohith Sharma K S edited comment on YARN-9714 at 8/26/19 8:59 AM:
--

I see, thanks. Maybe the fix could be to check whether Curator is disabled and, 
if so, close the connection. Does that make sense?


was (Author: rohithsharma):
I see, thanks. Maybe the fix could be to check whether Curator is enabled and, 
if so, close the connection. Does that make sense?

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently a full GC occurred on the RM in one of our clusters. After investigating 
> the heap dump and jstack output, I found two places in the RM that may leak memory 
> after the RM transitions to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To fix these leaks, we should close the connection and cancel the timer when the 
> services are stopping.
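
The timer half of the description can be illustrated with a minimal sketch. It is not taken from the attached patches: {{CleanupTimerService}}, its methods, and the timer name are hypothetical stand-ins for a scheduler service that owns a periodic cleanup timer. The assumption is that each active-services instance creates its own timer, so a timer that is never canceled keeps its thread alive, and through the task also the old service instance.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical stand-in for a scheduler service that owns a cleanup timer. */
class CleanupTimerService {
  // Non-null only between start() and stop().
  private Timer cleanupTimer;

  void start(long intervalMs) {
    // The anonymous task references this service instance, so as long as the
    // timer thread runs, this instance cannot be garbage collected.
    cleanupTimer = new Timer("release-cache-cleanup", true);
    cleanupTimer.schedule(new TimerTask() {
      @Override
      public void run() {
        // Periodic cleanup work would go here.
      }
    }, intervalMs, intervalMs);
  }

  void stop() {
    // Cancel the timer when the service stops; safe to call more than once.
    if (cleanupTimer != null) {
      cleanupTimer.cancel();
      cleanupTimer = null;
    }
  }

  boolean hasTimer() {
    return cleanupTimer != null;
  }
}
```

With this shape, every transition to standby that stops the service also releases the timer thread, instead of leaving one behind per transition.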



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915598#comment-16915598
 ] 

Tao Yang edited comment on YARN-9714 at 8/26/19 8:45 AM:
-

Hi, [~rohithsharma].
I commented on this earlier 
([here|https://issues.apache.org/jira/browse/YARN-9714?focusedCommentId=16896704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16896704]): 
"As for zkManager in ZKRMStateStore, it reuses the zkManager created for HA 
when the RM uses the Curator-based elector for leader election; otherwise a 
zkManager is created for ZKRMStateStore itself." Please refer to 
{{ResourceManager#createEmbeddedElector}} and {{ZKRMStateStore#initInternal}} 
for details.


was (Author: tao yang):
Hi, [~rohithsharma].

I commented on this earlier (over here): "As for zkManager in ZKRMStateStore, it 
reuses the zkManager created for HA when the RM uses the Curator-based elector 
for leader election; otherwise a zkManager is created for ZKRMStateStore 
itself." Please refer to {{ResourceManager#createEmbeddedElector}} and 
{{ZKRMStateStore#initInternal}} for details.







[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-26 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915598#comment-16915598
 ] 

Tao Yang edited comment on YARN-9714 at 8/26/19 8:43 AM:
-

Hi, [~rohithsharma].

I commented on this earlier (over here): "As for zkManager in ZKRMStateStore, it 
reuses the zkManager created for HA when the RM uses the Curator-based elector 
for leader election; otherwise a zkManager is created for ZKRMStateStore 
itself." Please refer to {{ResourceManager#createEmbeddedElector}} and 
{{ZKRMStateStore#initInternal}} for details.


was (Author: tao yang):
Hi, [~rohithsharma].

I commented on this earlier 
([here|https://issues.apache.org/jira/browse/YARN-9714?focusedCommentId=16896704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16896704]): 
"As for zkManager in ZKRMStateStore, it reuses the zkManager created for HA 
when the RM uses the Curator-based elector for leader election; otherwise a 
zkManager is created for ZKRMStateStore itself."







[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-07-30 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896704#comment-16896704
 ] 

Tao Yang edited comment on YARN-9714 at 7/31/19 2:29 AM:
-

Hi, [~bibinchundatt].
{quote}IIUC the ZooKeeper StateStore is not an active service, and the ZooKeeper 
connection is shared with leader election too.
 Do we really need to close the connection?
{quote}
RMStateStore is an active service; one is created for every 
RMActiveServices instance. As for zkManager in ZKRMStateStore, it reuses the 
zkManager created for HA when the RM uses the Curator-based elector for leader 
election; otherwise a zkManager is created for ZKRMStateStore itself. So we 
should close it in ZKRMStateStore#serviceStop only when it is not the one used 
for HA. Does that make sense?
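
The ownership rule described above could be sketched roughly as follows. This is an illustration only, not the code from the attached patches: {{ZkClient}}, {{StateStoreSketch}}, and the {{ownsClient}} flag are hypothetical names, and a null argument stands in for "the Curator-based elector is not in use".

```java
/** Hypothetical stand-in for a ZooKeeper client wrapper. */
class ZkClient implements AutoCloseable {
  private boolean closed;

  @Override
  public void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

/** Hypothetical state store that either borrows a shared client or owns one. */
class StateStoreSketch {
  private ZkClient zkClient;
  private boolean ownsClient;

  /**
   * @param sharedClient client already used for leader election,
   *                     or null when no such client exists
   */
  void init(ZkClient sharedClient) {
    if (sharedClient != null) {
      // Reuse the HA client; its owner is responsible for closing it.
      zkClient = sharedClient;
      ownsClient = false;
    } else {
      // No shared client, so create a private one that we must close.
      zkClient = new ZkClient();
      ownsClient = true;
    }
  }

  void stop() {
    // Close only the client created by this store, never the shared one.
    if (ownsClient && zkClient != null) {
      zkClient.close();
    }
    zkClient = null;
  }

  ZkClient client() {
    return zkClient;
  }
}
```

Recording ownership at init time keeps the stop path simple: closing the shared client there would break leader election, while skipping the close for a private client is the leak this issue reports.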
{quote}
A few other issues in 3.1.1 that were fixed recently are YARN-9644 and YARN-9639.
{quote}
Thanks, I noticed those issues before but just missed YARN-9642 :(
 


was (Author: tao yang):
Hi, [~bibinchundatt].
{quote}
IIUC the ZooKeeper StateStore is not an active service, and the ZooKeeper 
connection is shared with leader election too.
Do we really need to close the connection?
{quote}
RMStateStore is an active service; one is created for every 
RMActiveServices instance. As for zkManager in ZKRMStateStore, it reuses the 
zkManager created for HA when the RM uses the Curator-based elector for leader 
election; otherwise a zkManager is created for ZKRMStateStore itself. So we 
should close it in ZKRMStateStore#serviceStop only when it is not the one used 
for HA. Does that make sense?

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Blocker
> Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently a full GC occurred on the RM in one of our clusters. After investigating 
> the heap dump and jstack output, I found two places in the RM that may leak memory 
> after the RM transitions to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To fix these leaks, we should close the connection and cancel the timer when the 
> services are stopping.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-07-30 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896704#comment-16896704
 ] 

Tao Yang edited comment on YARN-9714 at 7/31/19 2:21 AM:
-

Hi, [~bibinchundatt].
{quote}
IIUC the ZooKeeper StateStore is not an active service, and the ZooKeeper 
connection is shared with leader election too.
Do we really need to close the connection?
{quote}
RMStateStore is an active service; one is created for every 
RMActiveServices instance. As for zkManager in ZKRMStateStore, it reuses the 
zkManager created for HA when the RM uses the Curator-based elector for leader 
election; otherwise a zkManager is created for ZKRMStateStore itself. So we 
should close it in ZKRMStateStore#serviceStop only when it is not the one used 
for HA. Does that make sense?


was (Author: tao yang):
Hi, [~bibinchundatt].
{quote}
IIUC the ZooKeeper StateStore is not an active service, and the ZooKeeper 
connection is shared with leader election too.
Do we really need to close the connection?
{qoute}
RMStateStore is an active service; one is created for every 
RMActiveServices instance. As for zkManager in ZKRMStateStore, it reuses the 
zkManager created for HA when the RM uses the Curator-based elector for leader 
election; otherwise a zkManager is created for ZKRMStateStore itself. So we 
should close it in ZKRMStateStore#serviceStop only when it is not the one used 
for HA. Does that make sense?



