2008/7/23 James Strachan [EMAIL PROTECTED]:
2008/7/23 Benjamin Reed [EMAIL PROTECTED]:
SessionExpiredExceptions should be extremely rare. Basically they should only
happen if a machine goes down (of course that would mean no exception would
actually get generated since the client is dead :) or a network partition
occurs.
Having said that, we seem to have a bug that causes SessionExpiredExceptions
when nothing bad has happened. The bug must be in the heartbeat code (we send
heartbeats automatically, so the client shouldn't have to worry about it). If you
can reproduce it well, it would greatly help to track down the bug! Can you
send me the code to reproduce the problem?
It's the test case WriteLockTest in the patch for ZOOKEEPER-78 which is
currently dependent on the ZOOKEEPER-84 patch as well (though given
your recent comment I'm gonna refactor the code to not require a
ZooKeeper change :)
I'll ping the list when I've refactored the test case to not require
the ZOOKEEPER-84 change.
I've just updated the patch on ZOOKEEPER-78 to avoid the dependency on
ZOOKEEPER-84. It now uses a ZooKeeperFacade class which wraps up the
creation of the ZooKeeper - and recreation of it if a
SessionExpiredException is received.
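
For readers without the patch to hand, the wrap-and-recreate idea can be sketched roughly as below. This is not the ZooKeeperFacade code from ZOOKEEPER-78: ReconnectingFacade, SessionExpired and the factory argument are hypothetical names, and the client type is left generic so the sketch does not depend on any ZooKeeper API.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for the exception a client throws when its session has expired. */
class SessionExpired extends RuntimeException {}

/** Owns a client of type C and recreates it through the factory when the session expires. */
class ReconnectingFacade<C> {
    private final Supplier<C> factory; // assumption: knows how to build a fresh client
    private C client;

    ReconnectingFacade(Supplier<C> factory) {
        this.factory = factory;
    }

    /** Creates the client lazily so callers never construct it themselves. */
    synchronized C client() {
        if (client == null) {
            client = factory.get();
        }
        return client;
    }

    /** Runs the operation; on session expiry, drops the client, recreates it and retries once. */
    synchronized <T> T call(Function<C, T> operation) {
        try {
            return operation.apply(client());
        } catch (SessionExpired e) {
            client = null; // the expired client is discarded, not reused
            return operation.apply(client());
        }
    }
}
```

A second expiry during the retry propagates to the caller, which keeps the sketch from looping forever if sessions keep expiring.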
The test case currently hangs there...
[junit] main prio=5 tid=0x01001710 nid=0xb0801000 in Object.wait() [0xb07ff000..0xb0800148]
[junit] at java.lang.Object.wait(Native Method)
[junit] - waiting on 0x096105e0 (a org.apache.zookeeper.ClientCnxn$Packet)
[junit] at java.lang.Object.wait(Object.java:474)
[junit] at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:822)
[junit] - locked 0x096105e0 (a org.apache.zookeeper.ClientCnxn$Packet)
[junit] at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:329)
[junit] - locked 0x0bd54108 (a org.apache.zookeeper.ZooKeeper)
[junit] at org.apache.zookeeper.protocols.ZooKeeperFacade.close(ZooKeeperFacade.java:99)
[junit] at org.apache.zookeeper.protocols.WriteLockTest.tearDown(WriteLockTest.java:146)
[junit] at junit.framework.TestCase.runBare(TestCase.java:140)
[junit] at junit.framework.TestResult$1.protect(TestResult.java:110)
[junit] at junit.framework.TestResult.runProtected(TestResult.java:128)
[junit] at junit.framework.TestResult.run(TestResult.java:113)
[junit] at junit.framework.TestCase.run(TestCase.java:124)
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:232)
[junit] at junit.framework.TestSuite.run(TestSuite.java:227)
[junit] at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
[junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:36)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
Basically the 3rd ZooKeeper client cannot close down; it just hangs in
the close() method.
(BTW it might be nice to avoid the close() method waiting forever - it
might as well wait, say, 10 seconds then just close anyway).
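
As a rough illustration of that bounded wait, a caller could run the close on a helper thread and give up after a timeout. This is only a sketch under assumptions: BoundedClose and closeWithTimeout are hypothetical names, nothing here is ZooKeeper's own API, and it does not show that interrupting the helper thread would actually unblock the real close().

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedClose {
    /**
     * Runs closeAction on a daemon thread and waits at most the given timeout.
     * Returns true if the action finished in time, false if the caller gave up waiting.
     */
    static boolean closeWithTimeout(Runnable closeAction, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(closeAction);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the helper thread
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("close action failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In a tearDown() like the one in the trace, the close call would be passed in as the Runnable with a 10 second timeout, so a hung close costs the test run 10 seconds rather than blocking it indefinitely.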
Though now I've refactored the code to avoid the patch on ZooKeeper to
deal with reconnecting when a SessionExpiredException occurs, I don't
seem to get any session expired exceptions :). I'm starting to wonder
if it's maybe related to old persistent data on disk causing the
exception?
I still get the strange lack of Watch Events on the 3rd client though
and the hang on closing (if
WriteLockTest.workAroundClosingLastZNodeFails is set to false - I've
hacked the test to pass by default).
--
James
---
http://macstrac.blogspot.com/
Open Source Integration
http://open.iona.com