I was going to refer you guys to
https://issues.apache.org/jira/browse/ZOOKEEPER-63
but I noticed in the comments James beat me to it! :-)

Ben, you had an idea for how to address 63; please add a comment to the issue (I think it was to set the state to closed before sending the disconnect request to the server, but please update).

James, go ahead and create a new Jira issue for this SessionExpiredException being thrown. As you can reproduce it, feel free to assign it to yourself and work with the rest of the team to resolve it.

Thanks!

Patrick

James Strachan wrote:
2008/7/23 James Strachan <[EMAIL PROTECTED]>:
2008/7/23 Benjamin Reed <[EMAIL PROTECTED]>:
SessionExpiredExceptions should be extremely rare. Basically they should only
happen if a machine goes down (of course that would mean no exception would
actually get generated since the client is dead :) or a network partition
occurs.

Having said that, we seem to have a bug that causes SessionExpiredExceptions
when nothing bad has happened. The bug must be in the heartbeat code (we do
them automatically, so the client shouldn't have to worry about it). If you
can reproduce it well, it would greatly help to track down the bug! Can you
send me the code to reproduce the problem?
It's the test case WriteLockTest in the patch for ZOOKEEPER-78 which is
currently dependent on the ZOOKEEPER-84 patch as well (though given
your recent comment I'm gonna refactor the code to not require a
ZooKeeper change :)

I'll ping the list when I've refactored the test case to not require
the ZOOKEEPER-84 change.

I've just updated the patch on ZOOKEEPER-78 to avoid the dependency on
ZOOKEEPER-84. It now uses a ZooKeeperFacade class which wraps up the
creation of the ZooKeeper - and recreation of it if a
SessionExpiredException is received.
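
For readers following along, here is a minimal sketch of the "wrap creation, recreate on expiry" idea described above. It is not the ZooKeeperFacade from the ZOOKEEPER-78 patch: the class name RecreatingFacade, the generic session type S, and the local SessionExpiredException are all hypothetical stand-ins so that the example compiles without the ZooKeeper jar. The stand-in exception is unchecked purely for brevity.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Stand-in for the client's session-expired error (hypothetical, unchecked for brevity). */
class SessionExpiredException extends RuntimeException {
    SessionExpiredException(String message) { super(message); }
}

/** Hypothetical facade: owns one session and rebuilds it when it expires. */
final class RecreatingFacade<S extends AutoCloseable> {
    private final Supplier<S> factory; // knows how to open a fresh session
    private S session;                 // created lazily, replaced on expiry
    private int sessionsCreated;

    RecreatingFacade(Supplier<S> factory) { this.factory = factory; }

    /** Runs the operation; on session expiry, opens a new session and retries once. */
    synchronized <T> T execute(Function<S, T> operation) {
        try {
            return operation.apply(current());
        } catch (SessionExpiredException expired) {
            discard();                          // the old session is unusable
            return operation.apply(current());  // second failure propagates to the caller
        }
    }

    synchronized int sessionsCreated() { return sessionsCreated; }

    synchronized void close() { discard(); }

    private S current() {
        if (session == null) {
            session = factory.get();
            sessionsCreated++;
        }
        return session;
    }

    private void discard() {
        S old = session;
        session = null;
        if (old != null) {
            try { old.close(); } catch (Exception ignored) { /* best effort */ }
        }
    }
}
```

Retrying only once is a deliberate choice in this sketch: if a brand-new session also reports expiry, something else is wrong and the caller should see it.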

The test case currently hangs there...

    [junit] "main" prio=5 tid=0x01001710 nid=0xb0801000 in Object.wait() [0xb07ff000..0xb0800148]
    [junit]     at java.lang.Object.wait(Native Method)
    [junit]     - waiting on <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
    [junit]     at java.lang.Object.wait(Object.java:474)
    [junit]     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:822)
    [junit]     - locked <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
    [junit]     at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:329)
    [junit]     - locked <0x0bd54108> (a org.apache.zookeeper.ZooKeeper)
    [junit]     at org.apache.zookeeper.protocols.ZooKeeperFacade.close(ZooKeeperFacade.java:99)
    [junit]     at org.apache.zookeeper.protocols.WriteLockTest.tearDown(WriteLockTest.java:146)
    [junit]     at junit.framework.TestCase.runBare(TestCase.java:140)
    [junit]     at junit.framework.TestResult$1.protect(TestResult.java:110)
    [junit]     at junit.framework.TestResult.runProtected(TestResult.java:128)
    [junit]     at junit.framework.TestResult.run(TestResult.java:113)
    [junit]     at junit.framework.TestCase.run(TestCase.java:124)
    [junit]     at junit.framework.TestSuite.runTest(TestSuite.java:232)
    [junit]     at junit.framework.TestSuite.run(TestSuite.java:227)
    [junit]     at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
    [junit]     at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:36)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)


Basically, the 3rd ZooKeeper client cannot close down; it just hangs in
the close() method.

(BTW it might be nice to avoid the close() method waiting forever - it
might as well wait, say, 10 seconds then just close anyway).
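
One way a caller could get that "wait a while, then give up" behaviour today, without changing ZooKeeper itself, is to run close() on a helper thread and bound the wait. The sketch below is only an illustration under that assumption; BoundedClose and closeWithTimeout are hypothetical names, and it works on any AutoCloseable rather than on the ZooKeeper class specifically.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: calls close() but refuses to wait forever for it. */
final class BoundedClose {
    private BoundedClose() {}

    /**
     * Runs resource.close() on a daemon thread and waits at most the given time.
     * Returns true if close() returned (normally or with an exception) in time,
     * false if the wait timed out or the calling thread was interrupted.
     */
    static boolean closeWithTimeout(AutoCloseable resource, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close() must not keep the JVM alive
            return t;
        });
        Future<?> pending = executor.submit(() -> {
            resource.close();
            return null;
        });
        try {
            pending.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the helper thread and move on
            return false;
        } catch (ExecutionException e) {
            return true;          // close() threw, but it did return
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pending.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In a test tearDown this could be used as BoundedClose.closeWithTimeout(client, 10, TimeUnit.SECONDS), where client is whatever closeable handle the test holds. Note that giving up does not release whatever the stuck close() was holding; it only lets the caller proceed.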

Though now I've refactored the code to avoid the patch on ZooKeeper to
deal with reconnecting when a SessionExpiredException occurs, I don't
seem to get any session expired exceptions :). I'm starting to wonder
if it's maybe related to old persistent data on disk causing the
exception?

I still get the strange lack of Watch Events on the 3rd client though
and the hang on closing (if
WriteLockTest.workAroundClosingLastZNodeFails is set to false - I've
hacked the test to pass by default).
