[jira] Updated: (ZOOKEEPER-475) FLENewEpochTest failed on nightly builds.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Paiva Junqueira updated ZOOKEEPER-475:
--------------------------------------------

    Attachment: ZOOKEEPER-475.patch

Another rough patch. It does not make any changes to cnx manager, but it adds one case to fle.

> FLENewEpochTest failed on nightly builds.
> -----------------------------------------
>
> Key: ZOOKEEPER-475
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-475
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.2.0
> Reporter: Mahadev konar
> Assignee: Flavio Paiva Junqueira
> Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-475.patch, ZOOKEEPER-475.patch
>
>
> The flenewepochtest failed on one of the nightly builds -
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/377.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-477) zkCleanup.sh is flaky
[ https://issues.apache.org/jira/browse/ZOOKEEPER-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-477:
-----------------------------------

    Fix Version/s: 3.3.0
                   3.2.1

> zkCleanup.sh is flaky
> ---------------------
>
> Key: ZOOKEEPER-477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-477
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.2.0
> Reporter: Fernando
> Assignee: Fernando
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ppp
>
>
> the zkCleanup.sh script is buggy in two ways:
> 1) it doesn't actually pass through the snapshot count, so it doesn't work
> 2) it assumes that there is only dataDir, it doesn't support dataLogDir
> And it can use cleanup, so that it doesn't blindly call eval from the config
> file..

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-477) zkCleanup.sh is flaky
[ https://issues.apache.org/jira/browse/ZOOKEEPER-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fernando updated ZOOKEEPER-477:
------------------------------

    Attachment: ppp

patch to fix zkCleanup.sh

> zkCleanup.sh is flaky
> ---------------------
>
> Key: ZOOKEEPER-477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-477
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.2.0
> Reporter: Fernando
> Assignee: Fernando
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ppp
>
>
> the zkCleanup.sh script is buggy in two ways:
> 1) it doesn't actually pass through the snapshot count, so it doesn't work
> 2) it assumes that there is only dataDir, it doesn't support dataLogDir
> And it can use cleanup, so that it doesn't blindly call eval from the config
> file..

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-477) zkCleanup.sh is flaky
[ https://issues.apache.org/jira/browse/ZOOKEEPER-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-477:
-----------------------------------

    Assignee: Fernando

> zkCleanup.sh is flaky
> ---------------------
>
> Key: ZOOKEEPER-477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-477
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.2.0
> Reporter: Fernando
> Assignee: Fernando
>
> the zkCleanup.sh script is buggy in two ways:
> 1) it doesn't actually pass through the snapshot count, so it doesn't work
> 2) it assumes that there is only dataDir, it doesn't support dataLogDir
> And it can use cleanup, so that it doesn't blindly call eval from the config
> file..

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-477) zkCleanup.sh is flaky
[ https://issues.apache.org/jira/browse/ZOOKEEPER-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732718#action_12732718 ]

Mahadev konar commented on ZOOKEEPER-477:
-----------------------------------------

Fernando, can you please upload the patch as a file? Just go to the zookeeper-3.2.0/ directory and run svn diff > patchfile.txt, then upload the file via the attach file link on the left-hand side of this page. That way you will have to click a button agreeing to donate your code to Apache, so we do not have any legal issues. Please do take a look at http://wiki.apache.org/hadoop/ZooKeeper/PoweredBy on how to contribute.

thanks

> zkCleanup.sh is flaky
> ---------------------
>
> Key: ZOOKEEPER-477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-477
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.2.0
> Reporter: Fernando
>
> the zkCleanup.sh script is buggy in two ways:
> 1) it doesn't actually pass through the snapshot count, so it doesn't work
> 2) it assumes that there is only dataDir, it doesn't support dataLogDir
> And it can use cleanup, so that it doesn't blindly call eval from the config
> file..

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-478) support custom hostnames for client and quorum connections
support custom hostnames for client and quorum connections
----------------------------------------------------------

Key: ZOOKEEPER-478
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-478
Project: Zookeeper
Issue Type: Improvement
Components: quorum, server
Affects Versions: 3.2.0
Reporter: Chris Darroch
Priority: Minor

Our system administrators would love it if we could configure ZooKeeper to listen for client and quorum connections on a hostname which isn't bound to the localhost. Maybe there's some neat way to do this I'm not aware of already, of course, but it looks to me like we would need to change the two

ss.socket().bind(new InetSocketAddress(port));

calls, one in NIOServerCnxn and one in QuorumCnxManager, so that they instead use InetSocketAddress(host, port). Obviously that implies some optional definition of a hostname in the config file as well, and possibly on the command line.

Does that seem like the right approach?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
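The change proposed in ZOOKEEPER-478 above can be sketched roughly as follows. This is only an illustration, not ZooKeeper code: the class name BindAddressSketch and the helper listenAddress are hypothetical, and the idea that an unset host falls back to the wildcard address is an assumption about how an optional config setting might behave.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

class BindAddressSketch {
    // Hypothetical helper: with no host configured, keep today's behaviour
    // (wildcard address); otherwise bind only to the named host.
    static InetSocketAddress listenAddress(String host, int port) {
        if (host == null || host.isEmpty()) {
            return new InetSocketAddress(port);
        }
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel ss = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port, so the demo never collides.
            ss.socket().bind(listenAddress("127.0.0.1", 0));
            System.out.println("bound to " + ss.socket().getLocalSocketAddress());
        }
    }
}
```

Keeping the wildcard fallback means an existing configuration without a host entry would behave as before, which is presumably what "optional" in the report implies.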
[jira] Commented: (ZOOKEEPER-477) zkCleanup.sh is flaky
[ https://issues.apache.org/jira/browse/ZOOKEEPER-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732684#action_12732684 ]

Fernando commented on ZOOKEEPER-477:
------------------------------------

Here is the diff/patch to apply. Yes I give all gives to Apache.

--- /export/home/fern/servers/zookeeper-3.2.0/bin/zkCleanup.sh 2009-07-01 09:51:22.0 -0700
+++ puppet-mnt/etc/modules/zookeeper320/files/zkCleanup.sh 2009-07-17 12:01:08.0 -0700
@@ -36,8 +36,16 @@
 . $ZOOBINDIR/zkEnv.sh
-eval `grep -e "^dataDir=" $ZOOCFG`
+ZOODATADIR=$(grep '^dataDir=' $ZOOCFG | sed -e 's/.*=//')
+ZOODATALOGDIR=$(grep '^dataLogDir=' $ZOOCFG | sed -e 's/.*=//')
+if [ "x${ZOODATALOGDIR}" = "x" ]
+then
 java "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
      -cp $CLASSPATH $JVMFLAGS \
-     org.apache.zookeeper.server.PurgeTxnLog $dataDir
+     org.apache.zookeeper.server.PurgeTxnLog $ZOODATADIR $*
+else
+java "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
+     -cp $CLASSPATH $JVMFLAGS \
+     org.apache.zookeeper.server.PurgeTxnLog $ZOODATALOGDIR $ZOODATADIR $*
+fi

> zkCleanup.sh is flaky
> ---------------------
>
> Key: ZOOKEEPER-477
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-477
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.2.0
> Reporter: Fernando
>
> the zkCleanup.sh script is buggy in two ways:
> 1) it doesn't actually pass through the snapshot count, so it doesn't work
> 2) it assumes that there is only dataDir, it doesn't support dataLogDir
> And it can use cleanup, so that it doesn't blindly call eval from the config
> file..

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
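The argument-building logic in the patch above (read dataDir and dataLogDir without eval, put dataLogDir first when it is set, and pass any extra arguments through) can be sketched in Java as below. This is an illustration of the same decision, not part of the patch: CleanupArgsSketch and purgeArgs are hypothetical names, and parsing the config text with java.util.Properties is an assumption made for the sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class CleanupArgsSketch {
    // Builds the argument list the patched script would hand to PurgeTxnLog:
    // [dataLogDir] dataDir extraArgs...
    static List<String> purgeArgs(String cfgText, List<String> extraArgs) {
        Properties cfg = new Properties();
        try {
            // Parse key=value pairs; nothing from the file is executed.
            cfg.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new IllegalStateException("could not parse config text", e);
        }
        String dataDir = cfg.getProperty("dataDir", "").trim();
        String dataLogDir = cfg.getProperty("dataLogDir", "").trim();

        List<String> args = new ArrayList<String>();
        if (!dataLogDir.isEmpty()) {
            args.add(dataLogDir); // only present when dataLogDir is configured
        }
        args.add(dataDir);
        args.addAll(extraArgs);   // e.g. the snapshot count
        return args;
    }

    public static void main(String[] args) {
        String cfg = "dataDir=/var/zk\ndataLogDir=/var/zklog\n";
        System.out.println(purgeArgs(cfg, Arrays.asList("3")));
    }
}
```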
[jira] Created: (ZOOKEEPER-477) zkCleanup.sh is flaky
zkCleanup.sh is flaky
---------------------

Key: ZOOKEEPER-477
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-477
Project: Zookeeper
Issue Type: Bug
Components: scripts
Affects Versions: 3.2.0
Reporter: Fernando

the zkCleanup.sh script is buggy in two ways:
1) it doesn't actually pass through the snapshot count, so it doesn't work
2) it assumes that there is only dataDir, it doesn't support dataLogDir
And it can use cleanup, so that it doesn't blindly call eval from the config file..

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-475) FLENewEpochTest failed on nightly builds.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Paiva Junqueira updated ZOOKEEPER-475:
--------------------------------------------

    Attachment: ZOOKEEPER-475.patch

Patch so far.

> FLENewEpochTest failed on nightly builds.
> -----------------------------------------
>
> Key: ZOOKEEPER-475
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-475
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.2.0
> Reporter: Mahadev konar
> Assignee: Flavio Paiva Junqueira
> Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-475.patch
>
>
> The flenewepochtest failed on one of the nightly builds -
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/377.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-311) handle small path lengths in zoo_create()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732654#action_12732654 ]

Mahadev konar commented on ZOOKEEPER-311:
-----------------------------------------

The QA isn't running because the nightly builds are failing. This should be fixed soon, and then we can get the patch process back on track.

> handle small path lengths in zoo_create()
> -----------------------------------------
>
> Key: ZOOKEEPER-311
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311
> Project: Zookeeper
> Issue Type: Improvement
> Components: c client
> Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.1, 3.2.0
> Reporter: Chris Darroch
> Assignee: Chris Darroch
> Priority: Minor
> Fix For: 3.2.1
>
> Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch
>
>
> The synchronous completion for zoo_create() contains the following code:\\
> {noformat}
> if (sc->u.str.str_len > strlen(res.path)) {
>     len = strlen(res.path);
> } else {
>     len = sc->u.str.str_len-1;
> }
> if (len > 0) {
>     memcpy(sc->u.str.str, res.path, len);
>     sc->u.str.str[len] = '\0';
> }
> {noformat}
> In the case where the max_realpath_len argument to zoo_create() is 0, none of
> this code executes, which is OK. In the case where max_realpath_len is 1, a
> user might expect their buffer to be filled with a null terminator, but
> again, nothing will happen (even if strlen(res.path) is 0, which is unlikely
> since new nodes will have paths longer than "/").
> The name of the argument to zoo_create() is also a little misleading, as is
> its description ("the maximum length of real path you would want") in
> zookeeper.h, and the example usage in the Programmer's Guide:
> {noformat}
> int rc = zoo_create(zh,"/xyz","value", 5, &CREATE_ONLY, ZOO_EPHEMERAL,
>                     buffer, sizeof(buffer)-1);
> {noformat}
> In fact this value should be the actual length of the buffer, including space
> for the null terminator. If the user supplies a max_realpath_len of 10 and a
> buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the
> returned value to 9 bytes and put the null terminator in the second-last
> byte, leaving the final byte of the buffer unused.
> It would be better, I think, to rename the realpath and max_realpath_len
> arguments to something like path_buffer and path_buffer_len, akin to
> zoo_set(). The path_buffer_len would be treated as the full length of the
> buffer (as the code does now, in fact, but the docs suggest otherwise).
> The code in the synchronous completion could then be changed as per the
> attached patch.
> Since this would change, slightly, the behaviour or "contract" of the API, I
> would be inclined to suggest waiting until 4.0.0 to implement this change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
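The length arithmetic described in ZOOKEEPER-311 above can be checked with a small transliteration. This sketch mirrors only the two if statements quoted in the issue, in Java rather than C, with non-negative lengths assumed; PathLenSketch and copiedLen are hypothetical names, and -1 is used here purely as a marker for "nothing written to the buffer".

```java
class PathLenSketch {
    // strLen:  the max_realpath_len the caller passed (sc->u.str.str_len)
    // pathLen: strlen(res.path)
    // Returns how many path bytes get copied, or -1 if the buffer is untouched.
    static int copiedLen(int strLen, int pathLen) {
        int len;
        if (strLen > pathLen) {
            len = pathLen;      // whole path fits, terminator goes after it
        } else {
            len = strLen - 1;   // leave one byte for the terminator
        }
        // The quoted code only writes (including the terminator) when len > 0.
        return (len > 0) ? len : -1;
    }

    public static void main(String[] args) {
        System.out.println(copiedLen(0, 4));   // length 0: nothing happens
        System.out.println(copiedLen(1, 4));   // length 1: still nothing, no terminator
        System.out.println(copiedLen(10, 10)); // sizeof(buffer)-1 with an 11-byte buffer
        System.out.println(copiedLen(11, 10)); // full buffer length passed instead
    }
}
```

The third and fourth calls correspond to the 11-byte buffer example in the description: passing 10 truncates the path to 9 bytes, while passing the full buffer length of 11 copies all 10.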
[jira] Commented: (ZOOKEEPER-475) FLENewEpochTest failed on nightly builds.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732589#action_12732589 ]

Patrick Hunt commented on ZOOKEEPER-475:
----------------------------------------

The nightly build failed again last night, this time due to a failure in HierarchicalQuorumTest.

Flavio, can you take a look? If it's the same issue then we're good, otherwise please open another jira. We really need to fix these asap (to get CI and the patch process up and running again):

http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/380/testReport/org.apache.zookeeper.test/HierarchicalQuorumTest/testHierarchicalQuorum/

> FLENewEpochTest failed on nightly builds.
> -----------------------------------------
>
> Key: ZOOKEEPER-475
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-475
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.2.0
> Reporter: Mahadev konar
> Assignee: Flavio Paiva Junqueira
> Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
>
> The flenewepochtest failed on one of the nightly builds -
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/377.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-475) FLENewEpochTest failed on nightly builds.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-475:
----------------------------------

    Component/s: quorum
    Priority: Blocker (was: Major)
    Affects Version/s: 3.2.0

> FLENewEpochTest failed on nightly builds.
> -----------------------------------------
>
> Key: ZOOKEEPER-475
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-475
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.2.0
> Reporter: Mahadev konar
> Assignee: Flavio Paiva Junqueira
> Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
>
> The flenewepochtest failed on one of the nightly builds -
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/377.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-311) handle small path lengths in zoo_create()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732581#action_12732581 ]

Benjamin Reed commented on ZOOKEEPER-311:
-----------------------------------------

+1 great job chris! i like your test cases. thanx. now let me see if i can find out why qa isn't running...

> handle small path lengths in zoo_create()
> -----------------------------------------
>
> Key: ZOOKEEPER-311
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311
> Project: Zookeeper
> Issue Type: Improvement
> Components: c client
> Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.1, 3.2.0
> Reporter: Chris Darroch
> Assignee: Chris Darroch
> Priority: Minor
> Fix For: 3.2.1
>
> Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch
>
>
> The synchronous completion for zoo_create() contains the following code:\\
> {noformat}
> if (sc->u.str.str_len > strlen(res.path)) {
>     len = strlen(res.path);
> } else {
>     len = sc->u.str.str_len-1;
> }
> if (len > 0) {
>     memcpy(sc->u.str.str, res.path, len);
>     sc->u.str.str[len] = '\0';
> }
> {noformat}
> In the case where the max_realpath_len argument to zoo_create() is 0, none of
> this code executes, which is OK. In the case where max_realpath_len is 1, a
> user might expect their buffer to be filled with a null terminator, but
> again, nothing will happen (even if strlen(res.path) is 0, which is unlikely
> since new nodes will have paths longer than "/").
> The name of the argument to zoo_create() is also a little misleading, as is
> its description ("the maximum length of real path you would want") in
> zookeeper.h, and the example usage in the Programmer's Guide:
> {noformat}
> int rc = zoo_create(zh,"/xyz","value", 5, &CREATE_ONLY, ZOO_EPHEMERAL,
>                     buffer, sizeof(buffer)-1);
> {noformat}
> In fact this value should be the actual length of the buffer, including space
> for the null terminator. If the user supplies a max_realpath_len of 10 and a
> buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the
> returned value to 9 bytes and put the null terminator in the second-last
> byte, leaving the final byte of the buffer unused.
> It would be better, I think, to rename the realpath and max_realpath_len
> arguments to something like path_buffer and path_buffer_len, akin to
> zoo_set(). The path_buffer_len would be treated as the full length of the
> buffer (as the code does now, in fact, but the docs suggest otherwise).
> The code in the synchronous completion could then be changed as per the
> attached patch.
> Since this would change, slightly, the behaviour or "contract" of the API, I
> would be inclined to suggest waiting until 4.0.0 to implement this change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Build failed in Hudson: ZooKeeper-trunk #380
See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/380/ -- [...truncated 216532 lines...] [junit] expect:StandaloneServer_port [junit] found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2009-07-17 10:47:04,357 - INFO [main-SendThread(localhost:11225):clientcnxn$sendthr...@869] - Attempting connection to server localhost/127.0.0.1:11225 [junit] 2009-07-17 10:47:04,357 - INFO [main-SendThread(localhost:11225):clientcnxn$sendthr...@785] - Priming connection to java.nio.channels.SocketChannel[connected local=/127.0.0.1:44750 remote=localhost/127.0.0.1:11225] [junit] 2009-07-17 10:47:04,357 - INFO [main-SendThread(localhost:11225):clientcnxn$sendthr...@939] - Server connection successful [junit] 2009-07-17 10:47:04,358 - INFO [NIOServerCxn.Factory:11225:nioserverc...@587] - Connected to /127.0.0.1:44750 lastZxid 6 [junit] 2009-07-17 10:47:04,358 - INFO [NIOServerCxn.Factory:11225:nioserverc...@968] - Finished init of 0x1228851e90a valid:true [junit] 2009-07-17 10:47:04,358 - INFO [NIOServerCxn.Factory:11225:nioserverc...@616] - Renewing session 0x1228851e90a [junit] 2009-07-17 10:47:15,410 - INFO [main:zookee...@461] - Closing session: 0x1228851e90a [junit] 2009-07-17 10:47:15,410 - INFO [main:clientc...@1070] - Closing ClientCnxn for session: 0x1228851e90a [junit] 2009-07-17 10:47:15,411 - INFO [ProcessThread:-1:preprequestproces...@384] - Processed session termination request for id: 0x1228851e90a [junit] 2009-07-17 10:47:15,481 - INFO [SyncThread:0:nioserverc...@837] - closing session:0x1228851e90a NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/127.0.0.1:11225 remote=/127.0.0.1:44750] [junit] 2009-07-17 10:47:15,481 - INFO [main-SendThread(localhost:11225):clientcnxn$sendthr...@963] - Exception while closing send thread for session 0x1228851e90a : Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] [junit] 2009-07-17 10:47:15,582 - INFO [main:clientc...@1056] - Disconnecting ClientCnxn 
for session: 0x1228851e90a [junit] 2009-07-17 10:47:15,582 - INFO [main:zookee...@469] - Session: 0x1228851e90a closed [junit] 2009-07-17 10:47:15,582 - INFO [main-EventThread:clientcnxn$eventthr...@514] - EventThread shut down [junit] 2009-07-17 10:47:15,582 - INFO [main:clientb...@375] - tearDown starting [junit] 2009-07-17 10:47:15,583 - INFO [main:zookee...@461] - Closing session: 0x1228851e90a [junit] 2009-07-17 10:47:15,583 - INFO [main:clientc...@1070] - Closing ClientCnxn for session: 0x1228851e90a [junit] 2009-07-17 10:47:15,583 - INFO [main:clientc...@1056] - Disconnecting ClientCnxn for session: 0x1228851e90a [junit] 2009-07-17 10:47:15,583 - INFO [main:zookee...@469] - Session: 0x1228851e90a closed [junit] 2009-07-17 10:47:15,583 - INFO [main:clientb...@352] - STOPPING server [junit] 2009-07-17 10:47:15,583 - INFO [NIOServerCxn.Factory:11225:nioservercnxn$fact...@239] - NIOServerCnxn factory exited run method [junit] 2009-07-17 10:47:15,584 - INFO [main:finalrequestproces...@283] - shutdown of request processor complete [junit] 2009-07-17 10:47:15,584 - INFO [SyncThread:0:syncrequestproces...@134] - SyncRequestProcessor exited! [junit] 2009-07-17 10:47:15,584 - INFO [ProcessThread:-1:preprequestproces...@119] - PrepRequestProcessor exited loop! 
[junit] ensureOnly:[] [junit] 2009-07-17 10:47:15,587 - INFO [main:clientb...@391] - FINISHED testWatcherAutoResetDisabledWithGlobal [junit] 2009-07-17 10:47:15,588 - INFO [main:clientb...@330] - STARTING testWatcherAutoResetDisabledWithLocal [junit] 2009-07-17 10:47:15,593 - INFO [main:clientb...@345] - STARTING server [junit] 2009-07-17 10:47:15,593 - INFO [main:zookeeperser...@159] - Created server [junit] 2009-07-17 10:47:15,593 - INFO [main:nioservercnxn$fact...@125] - binding to port 11226 [junit] 2009-07-17 10:47:15,594 - INFO [main:filetxnsnap...@208] - Snapshotting: 0 [junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2009-07-17 10:47:15,596 - INFO [NIOServerCxn.Factory:11226:nioserverc...@702] - Processing stat command from /127.0.0.1:46703 [junit] 2009-07-17 10:47:15,596 - WARN [NIOServerCxn.Factory:11226:nioserverc...@498] - Exception causing close of session 0x0 due to java.io.IOException: Responded to info probe [junit] 2009-07-17 10:47:15,596 - INFO [NIOServerCxn.Factory:11226:nioserverc...@837] - closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/127.0.0.1:11226 remote=/127.0.0.1:46703] [junit] expect:InMemoryDataTree [junit] found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] expect:StandaloneServer_port [junit] found:StandaloneServ
[jira] Commented: (ZOOKEEPER-475) FLENewEpochTest failed on nightly builds.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732451#action_12732451 ]

Flavio Paiva Junqueira commented on ZOOKEEPER-475:
--------------------------------------------------

Great catch! (I know it was hudson, but it was good that you've seen it)

The short version of the story is that the synchronization is not correct in QuorumCnxManager. The longer version is like this. From the traces, I can see the following sequence of messages:

* Replica 1 sends a message to itself and to Replica 2 stating that its current vote is for replica 1;
* Replica 2 sends a message to itself and to Replica 1 stating that its current vote is for replica 2;
* Replica 1 updates its vote, and sends a message to itself stating that its current vote is for replica 2;
* Since replica 1 has two votes for 2 in an ensemble of 3 replicas, replica 1 decides to follow 2.

The problem is that replica 2 does not receive a message from 1 stating that it changed its vote to 2, which prevents 2 from becoming a leader. Now looking more carefully at why that happened, you can see that when 1 tries to send a message to 2, QuorumCnxManager in 1 is shutting down a connection to 2 at the same time that it is trying to open a new one. The incorrect synchronization prevents the creation of a new connection, and 1 and 2 end up not connected.

> FLENewEpochTest failed on nightly builds.
> -----------------------------------------
>
> Key: ZOOKEEPER-475
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-475
> Project: Zookeeper
> Issue Type: Bug
> Reporter: Mahadev konar
> Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
>
> The flenewepochtest failed on one of the nightly builds -
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/377.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
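The vote sequence in the comment above can be replayed with a toy tally. This is a deliberately simplified sketch, not the FastLeaderElection implementation: it only counts which replica each known vote names and applies a strict-majority rule, and the names VoteTallySketch and hasQuorum are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class VoteTallySketch {
    // votesSeen maps a voter id to the replica it currently votes for,
    // as known to one particular replica.
    static boolean hasQuorum(Map<Integer, Integer> votesSeen, int candidate, int ensembleSize) {
        int count = 0;
        for (int vote : votesSeen.values()) {
            if (vote == candidate) {
                count++;
            }
        }
        return count > ensembleSize / 2; // strict majority of the ensemble
    }

    public static void main(String[] args) {
        // Replica 1's view after it switched its own vote to 2.
        Map<Integer, Integer> seenBy1 = new HashMap<Integer, Integer>();
        seenBy1.put(1, 2);
        seenBy1.put(2, 2);

        // Replica 2's view: the update from 1 never arrived, so it still
        // believes replica 1 votes for itself.
        Map<Integer, Integer> seenBy2 = new HashMap<Integer, Integer>();
        seenBy2.put(1, 1);
        seenBy2.put(2, 2);

        System.out.println("replica 1 sees quorum for 2: " + hasQuorum(seenBy1, 2, 3));
        System.out.println("replica 2 sees quorum for 2: " + hasQuorum(seenBy2, 2, 3));
    }
}
```

In this toy model replica 1 reaches a majority for 2 and follows it, while replica 2 counts only its own vote and cannot conclude that it is the leader, which matches the stuck state described in the comment.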