[jira] [Comment Edited] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-12-21 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768373#comment-15768373
 ] 

Flavio Junqueira edited comment on ZOOKEEPER-2383 at 12/21/16 10:36 PM:


[~rakesh_r] pull request #101 merged fine onto master, but not branch 3.5. I 
haven't tried branch 3.4.


was (Author: fpj):
[~rakesh_r] pull request #101 merge fine onto master, but not branch 3.5. I 
haven't tried branch 3.4.

> Startup race in ZooKeeperServer
> ---
>
> Key: ZOOKEEPER-2383
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: jmx, server
>Affects Versions: 3.4.8
>Reporter: Steve Rowe
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, ZOOKEEPER-2383.patch, 
> ZOOKEEPER-2383.patch, ZOOKEEPER-2383.patch, 
> release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, 
> zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 
> (SOLR-8724) I ran into test failures where attempts to create a node in a 
> newly started standalone ZooKeeperServer were failing because of an assertion 
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then 
> registers itself in JMX, but if a connection comes in before the server's JMX 
> registration happens, registration of the connection will fail because it 
> trips the assertion that (effectively) its parent (the server) has already 
> registered itself.
> {code:java|title=ZooKeeperServer.java}
> public synchronized void startup() {
> if (sessionTracker == null) {
> createSessionTracker();
> }
> startSessionTracker();
> setupRequestProcessors();
> registerJMX();
> state = State.RUNNING;
> notifyAll();
> }
> {code}
> {code:java|title=MBeanRegistry.java}
> public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
> throws JMException
> {
> assert bean != null;
> String path = null;
> if (parent != null) {
> path = mapBean2Path.get(parent);
> assert path != null;
> }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this 
> issue with ZK 3.4.6. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-09-17 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499350#comment-15499350
 ] 

Rakesh R edited comment on ZOOKEEPER-2383 at 9/17/16 5:29 PM:
--

Thanks [~fpj] for the comments. 

bq. In NettyServerCnxn, it looks like we are only updating the code for 4lws. 
Don't we have to also update it here:

I think, I have handled this case and the following condition present in 
[ZOOKEEPER-2383-br-3-4.patch|https://issues.apache.org/jira/secure/attachment/12821043/ZOOKEEPER-2383-br-3-4.patch].
 Am I missing any other case apart from the below line of code.
{code}
@@ -735,7 +735,7 @@
 bb.flip();
 
 ZooKeeperServer zks = this.zkServer;
-if (zks == null) {
+if (zks == null || !zks.isRunning()) {
 throw new IOException("ZK down");
 }
 if (initialized) {
{code}

bq. but it never exists as the client keeps trying to connect. It sounds like 
some thread is hanging and not letting the test framework exit.
Without patch it fails at {{simplezks.waitForStartupInvocation(10)}}. On the 
other side {{SimpleZooKeeperServer#startup}} thread is waiting at 
startupDelayLatch.await() and this is causing an indefinite wait. How about 
adding the countdown logic in finally so that it will proceed:
{code}
try {
Assert.assertFalse(
"Should fail to create zk client session as server is not 
fully started",
simplezks.waitForSessionCreation(10));
} finally {
LOG.info(
"Decrements the count of the latch, so that server will 
proceed with startup");
startupDelayLatch.countDown();
}
{code}

bq. This sentence doesn't make much sense to me: Since zk server is not started 
createsession method to be invoked
I will modify this to "Should fail to create zk client session as server is not 
fully started"

bq. Please reduce the test case timeout to no longer than 30s.
Agreed.

FYI, apart from the above, I had fixed [Michael's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15409673&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15409673]
 and [Raul's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15408826&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15408826]
 in trunk patch, need to rebase branch-3-4 patch to incorporate these comments.


was (Author: rakeshr):
Thanks [~fpj] for the comments. 

bq. In NettyServerCnxn, it looks like we are only updating the code for 4lws. 
Don't we have to also update it here:

I think, I have handled this case and the following condition present in 
[ZOOKEEPER-2383-br-3-4.patch|https://issues.apache.org/jira/secure/attachment/12821043/ZOOKEEPER-2383-br-3-4.patch].
 Am I missing any other case apart from the below line of code.
{code}
@@ -735,7 +735,7 @@
 bb.flip();
 
 ZooKeeperServer zks = this.zkServer;
-if (zks == null) {
+if (zks == null || !zks.isRunning()) {
 throw new IOException("ZK down");
 }
 if (initialized) {
{code}

bq. but it never exists as the client keeps trying to connect. It sounds like 
some thread is hanging and not letting the test framework exit.
Without patch it fails at {{simplezks.waitForStartupInvocation(10)}}. On the 
other side {{SimpleZooKeeperServer#startup}} thread is waiting at 
startupDelayLatch.await() and this is causing an indefinite wait. How about 
adding a timed out like below:
{code}
startupDelayLatch.await(15, TimeUnit.SECONDS);
{code}

bq. This sentence doesn't make much sense to me: Since zk server is not started 
createsession method to be invoked
I will modify this to "Should fail to create zk client session as server is not 
fully started"

bq. Please reduce the test case timeout to no longer than 30s.
Agreed.

FYI, apart from the above, I had fixed [Michael's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15409673&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15409673]
 and [Raul's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15408826&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15408826]
 in trunk patch, need to rebase branch-3-4 patch to incorporate these comments.

> Startup race in ZooKeeperServer
> ---
>
> Key: ZOOKEEPER-2383
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
> Project: ZooKeeper
> 

[jira] [Comment Edited] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-09-17 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499350#comment-15499350
 ] 

Rakesh R edited comment on ZOOKEEPER-2383 at 9/17/16 5:21 PM:
--

Thanks [~fpj] for the comments. 

bq. In NettyServerCnxn, it looks like we are only updating the code for 4lws. 
Don't we have to also update it here:

I think, I have handled this case and the following condition present in 
[ZOOKEEPER-2383-br-3-4.patch|https://issues.apache.org/jira/secure/attachment/12821043/ZOOKEEPER-2383-br-3-4.patch].
 Am I missing any other case apart from the below line of code.
{code}
@@ -735,7 +735,7 @@
 bb.flip();
 
 ZooKeeperServer zks = this.zkServer;
-if (zks == null) {
+if (zks == null || !zks.isRunning()) {
 throw new IOException("ZK down");
 }
 if (initialized) {
{code}

bq. but it never exists as the client keeps trying to connect. It sounds like 
some thread is hanging and not letting the test framework exit.
Without patch it fails at {{simplezks.waitForStartupInvocation(10)}}. On the 
other side {{SimpleZooKeeperServer#startup}} thread is waiting at 
startupDelayLatch.await() and this is causing an indefinite wait. How about 
adding a timed out like below:
{code}
startupDelayLatch.await(15, TimeUnit.SECONDS);
{code}

bq. This sentence doesn't make much sense to me: Since zk server is not started 
createsession method to be invoked
I will modify this to "Should fail to create zk client session as server is not 
fully started"

bq. Please reduce the test case timeout to no longer than 30s.
Agreed.

FYI, apart from the above, I had fixed [Michael's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15409673&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15409673]
 and [Raul's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15408826&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15408826]
 in trunk patch, need to rebase branch-3-4 patch to incorporate these comments.


was (Author: rakeshr):
Thanks [~fpj] for the comments. 

bq. In NettyServerCnxn, it looks like we are only updating the code for 4lws. 
Don't we have to also update it here:

I think, I have handled this case and the following condition present in 
[ZOOKEEPER-2383-br-3-4.patch|https://issues.apache.org/jira/secure/attachment/12821043/ZOOKEEPER-2383-br-3-4.patch].
 Am I missing any other case apart from the below line of code.
{code}
@@ -735,7 +735,7 @@
 bb.flip();
 
 ZooKeeperServer zks = this.zkServer;
-if (zks == null) {
+if (zks == null || !zks.isRunning()) {
 throw new IOException("ZK down");
 }
 if (initialized) {
{code}

bq. but it never exists as the client keeps trying to connect. It sounds like 
some thread is hanging and not letting the test framework exit.
Without patch it fails at {{simplezks.waitForStartupInvocation(10)}}. On the 
other side {{SimpleZooKeeperServer#startup}} thread is waiting at 
startupDelayLatch.await() and this is causing an indefinite wait. How about 
adding a timed out like below:
{code}
startupDelayLatch.await(15, TimeUnit.SECONDS);
{code}

bq. This sentence doesn't make much sense to me: Since zk server is not started 
createsession method to be invoked
I will modify this to "Failed to establish ZooKeeper client connection"

bq. Please reduce the test case timeout to no longer than 30s.
Agreed.

FYI, apart from the above, I had fixed [Michael's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15409673&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15409673]
 and [Raul's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15408826&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15408826]
 in trunk patch, need to rebase branch-3-4 patch to incorporate these comments.

> Startup race in ZooKeeperServer
> ---
>
> Key: ZOOKEEPER-2383
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: jmx, server
>Affects Versions: 3.4.8
>Reporter: Steve Rowe
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, ZOOKEEPER-2383.patch, 
> ZOOKEEPER-2383.patch, relea

[jira] [Comment Edited] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-08-05 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409828#comment-15409828
 ] 

Rakesh R edited comment on ZOOKEEPER-2383 at 8/5/16 6:10 PM:
-

Thanks [~hanm] for the reviews. It looks like {{ZKUtil.java}} has ZooKeeper 
client related operations. Attached new patch, where I've tried keeping 
{{isZKServerRunning()}} in {{AbstractFourLetterCommand.java}}. Does this sound 
good to you?


was (Author: rakeshr):
Thanks [~hanm] for the reviews. Attached new patch, where I've tried keeping 
{{isZKServerRunning()}} in {{AbstractFourLetterCommand.java}}. Does this sound 
good to you?

> Startup race in ZooKeeperServer
> ---
>
> Key: ZOOKEEPER-2383
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: jmx, server
>Affects Versions: 3.4.8
>Reporter: Steve Rowe
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.9, 3.5.3, 3.6.0
>
> Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, ZOOKEEPER-2383.patch, 
> ZOOKEEPER-2383.patch, release-3.4.8-extra-logging.patch, 
> zk-3.4.8-MBeanRegistry.log, zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 
> (SOLR-8724) I ran into test failures where attempts to create a node in a 
> newly started standalone ZooKeeperServer were failing because of an assertion 
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then 
> registers itself in JMX, but if a connection comes in before the server's JMX 
> registration happens, registration of the connection will fail because it 
> trips the assertion that (effectively) its parent (the server) has already 
> registered itself.
> {code:java|title=ZooKeeperServer.java}
> public synchronized void startup() {
> if (sessionTracker == null) {
> createSessionTracker();
> }
> startSessionTracker();
> setupRequestProcessors();
> registerJMX();
> state = State.RUNNING;
> notifyAll();
> }
> {code}
> {code:java|title=MBeanRegistry.java}
> public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
> throws JMException
> {
> assert bean != null;
> String path = null;
> if (parent != null) {
> path = mapBean2Path.get(parent);
> assert path != null;
> }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this 
> issue with ZK 3.4.6. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-03-08 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185323#comment-15185323
 ] 

Flavio Junqueira edited comment on ZOOKEEPER-2383 at 3/8/16 5:39 PM:
-

[~steve_rowe] Thanks for reporting this issue. According to git blame, the 
latest changes around the startup method in ZooKeeperServer are due to 
ZOOKEEPER-1907, which actually turned out to be quite problematic, so this 
could be another issue due to that patch, I'm not sure.

{noformat}
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Hongchao Deng  2015-08-17 20:52:07 +  411) public synchronized 
void startup() {
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Mahadev Konar  2012-01-31 06:50:06 +  412) if 
(sessionTracker == null) {
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Mahadev Konar  2012-01-31 06:50:06 +  413) 
createSessionTracker();
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Mahadev Konar  2012-01-31 06:50:06 +  414) }
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Mahadev Konar  2012-01-31 06:50:06 +  415) 
startSessionTracker();
097b7979 zookeeper/java/src/com/yahoo/zookeeper/server/ZooKeeperServer.java 
(Benjamin Reed  2008-05-12 23:01:25 +  416) 
setupRequestProcessors();
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Patrick D. Hunt2009-01-15 22:57:14 +  417) 
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Patrick D. Hunt2009-01-15 22:57:14 +  418) registerJMX();
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Patrick D. Hunt2009-01-15 22:57:14 +  419) 
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Hongchao Deng  2015-08-17 20:52:07 +  420) state = 
State.RUNNING;
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Hongchao Deng  2015-08-17 20:52:07 +  421) notifyAll();
097b7979 zookeeper/java/src/com/yahoo/zookeeper/server/ZooKeeperServer.java 
(Benjamin Reed  2008-05-12 23:01:25 +  422) }

{noformat}

{noformat}
commit 91f579e40755de870ed9123c8fd55925517d9aa6
Author: Hongchao Deng 
Date:   Mon Aug 17 20:52:07 2015 +

ZOOKEEPER-1907 Improve Thread handling (Rakesh R via hdeng)

git-svn-id: 
https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1696337 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}

[~rakesh_r] could you have a look, please?

CC [~rgs] [~phunt]


was (Author: fpj):
[~steve_rowe] Thanks for reporting this issue. According to git blame, the 
latest changes around the startup method in ZooKeeperServer is due to 
ZOOKEEPER-1907, which actually turned out to be quite problematic, so this 
could be another issue due to that patch, I'm not sure.

{noformat}
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Hongchao Deng  2015-08-17 20:52:07 +  411) public synchronized 
void startup() {
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Mahadev Konar  2012-01-31 06:50:06 +  412) if 
(sessionTracker == null) {
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Mahadev Konar  2012-01-31 06:50:06 +  413) 
createSessionTracker();
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Mahadev Konar  2012-01-31 06:50:06 +  414) }
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Mahadev Konar  2012-01-31 06:50:06 +  415) 
startSessionTracker();
097b7979 zookeeper/java/src/com/yahoo/zookeeper/server/ZooKeeperServer.java 
(Benjamin Reed  2008-05-12 23:01:25 +  416) 
setupRequestProcessors();
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Patrick D. Hunt2009-01-15 22:57:14 +  417) 
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Patrick D. Hunt2009-01-15 22:57:14 +  418) registerJMX();
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Patrick D. Hunt2009-01-15 22:57:14 +  419) 
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Hongchao Deng  2015-08-17 20:52:07 +  420) state = 
State.RUNNING;
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 
(Hongchao Deng  2015-08-17 20:52:07 +  421) notifyAll();
097b7979 zookeeper/java/src/com/yahoo/zookeeper/server/ZooKe