Re: when should a SessionExpiredException occur?

2008-07-23 Thread Benjamin Reed
SessionExpiredExceptions should be extremely rare. Basically they should only 
happen if a machine goes down (of course that would mean no exception would 
actually get generated since the client is dead :) or a network partition 
occurs.

Having said that, we seem to have a bug that causes SessionExpiredExceptions 
when nothing bad has happened. The bug must be in the heartbeat code (we send 
heartbeats automatically, so the client shouldn't have to worry about it). If you 
can reproduce it well, it would greatly help to track down the bug! Can you 
send me the code to reproduce the problem?

thanx
ben

On Wednesday 23 July 2008 06:28:12 James Strachan wrote:
 Am just wondering if I've hit this due to some other bug. I thought ZK
 did keep-alive pings to ensure each client is alive and its session
 does not expire? Or does the client have to explicitly keep calling
 some method on the ZooKeeper interface to ensure a steady flow of
 packets to the ZK server to keep it alive?

 The test case WriteLockTest in the patch for ZOOKEEPER-78 (the
 WriteLock) can always reproduce a SessionExpiredException when using 3
 clients (it's always the 3rd session that expires).

 Now when a SessionExpiredException occurs, any recipe/protocol has to
 be able to deal with it; so the ZOOKEEPER-84 issue is still valid
 IMHO. But I'm wondering if in my test case it shouldn't be happening;
 as I've got 3 clients and a server all in the same JVM and the JVM
 isn't locked or pegged nor do the TCP sockets fail AFAIK.

 So I just thought I'd ask; are the keep alive packets used by default?
 If they are then maybe they are not sent very frequently or something?




Re: when should a SessionExpiredException occur?

2008-07-23 Thread James Strachan
2008/7/23 James Strachan [EMAIL PROTECTED]:
 2008/7/23 Benjamin Reed [EMAIL PROTECTED]:
 SessionExpiredExceptions should be extremely rare. Basically they should only
 happen if a machine goes down (of course that would mean no exception would
 actually get generated since the client is dead :) or a network partition
 occurs.

 Having said that, we seem to have a bug that causes SessionExpiredExceptions
 when nothing bad has happened. The bug must be in the heartbeat code (we send
 heartbeats automatically, so the client shouldn't have to worry about it). If you
 can reproduce it well, it would greatly help to track down the bug! Can you
 send me the code to reproduce the problem?

 It's the test case WriteLockTest in the patch for ZOOKEEPER-78, which is
 currently dependent on the ZOOKEEPER-84 patch as well (though given
 your recent comment I'm gonna refactor the code to not require a
 ZooKeeper change :)

 I'll ping the list when I've refactored the test case to not require
 the ZOOKEEPER-84 change.

I've just updated the patch on ZOOKEEPER-78 to avoid the dependency on
ZOOKEEPER-84. It now uses a ZooKeeperFacade class which wraps up the
creation of the ZooKeeper - and recreation of it if a
SessionExpiredException is received.
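
To make the wrap-and-recreate idea concrete, here is a minimal sketch of the pattern. It is not the ZooKeeperFacade from the patch: `SessionFacade`, `SessionOperation`, `maxAttempts` and the local `SessionExpiredException` are hypothetical stand-ins so that the example compiles without the ZooKeeper jar. A real version would create a ZooKeeper handle in the factory and would also have to re-register watches and recreate ephemeral nodes, which this sketch does not attempt.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the session-expired error, defined locally so
// the sketch compiles without the ZooKeeper jar on the classpath.
class SessionExpiredException extends Exception {
    SessionExpiredException(String message) { super(message); }
}

// An operation run against a session of type S that may find it expired.
interface SessionOperation<S, R> {
    R apply(S session) throws SessionExpiredException;
}

// Holds the current session and replaces it whenever an operation
// reports that the session has expired.
class SessionFacade<S> {
    private final Supplier<S> factory;
    private final int maxAttempts;
    private S session;

    SessionFacade(Supplier<S> factory, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.factory = factory;
        this.maxAttempts = maxAttempts;
        this.session = factory.get();
    }

    // Runs op, recreating the session and retrying on expiry.
    // Rethrows the last expiry error once maxAttempts is used up.
    synchronized <R> R execute(SessionOperation<S, R> op) throws SessionExpiredException {
        SessionExpiredException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.apply(session);
            } catch (SessionExpiredException e) {
                last = e;
                session = factory.get(); // the old session cannot be revived
            }
        }
        throw last;
    }
}
```

Bounding the retries is a deliberate choice in this sketch: if every new session expires straight away, the caller sees the exception instead of looping forever.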

The test case currently hangs there...

[junit] main prio=5 tid=0x01001710 nid=0xb0801000 in Object.wait() [0xb07ff000..0xb0800148]
[junit] at java.lang.Object.wait(Native Method)
[junit] - waiting on 0x096105e0 (a org.apache.zookeeper.ClientCnxn$Packet)
[junit] at java.lang.Object.wait(Object.java:474)
[junit] at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:822)
[junit] - locked 0x096105e0 (a org.apache.zookeeper.ClientCnxn$Packet)
[junit] at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:329)
[junit] - locked 0x0bd54108 (a org.apache.zookeeper.ZooKeeper)
[junit] at org.apache.zookeeper.protocols.ZooKeeperFacade.close(ZooKeeperFacade.java:99)
[junit] at org.apache.zookeeper.protocols.WriteLockTest.tearDown(WriteLockTest.java:146)
[junit] at junit.framework.TestCase.runBare(TestCase.java:140)
[junit] at junit.framework.TestResult$1.protect(TestResult.java:110)
[junit] at junit.framework.TestResult.runProtected(TestResult.java:128)
[junit] at junit.framework.TestResult.run(TestResult.java:113)
[junit] at junit.framework.TestCase.run(TestCase.java:124)
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:232)
[junit] at junit.framework.TestSuite.run(TestSuite.java:227)
[junit] at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
[junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:36)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)


Basically, the 3rd ZooKeeper client cannot close down; it just hangs in
the close() method.

(BTW it might be nice to avoid the close() method waiting forever - it
might as well wait, say, 10 seconds then just close anyway).
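
A bounded wait like that can be sketched from the caller's side without touching ZooKeeper itself. This is only an illustration under assumptions: `BoundedClose` and `closeWithTimeout` are hypothetical names, and the close action is passed in as a plain `Runnable` so that the example stays self-contained. Whether interrupting a stuck close() leaves the client in a sane state is not something this sketch can promise.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedClose {
    // Runs closeAction on a daemon thread and waits at most timeoutMillis.
    // Returns true if the action finished in time, false if it was abandoned.
    static boolean closeWithTimeout(Runnable closeAction, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(closeAction);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker if it is still blocked
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("close failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In tearDown() the call might look like `BoundedClose.closeWithTimeout(() -> facade.close(), 10000)`, assuming a close() that throws no checked exception; otherwise the lambda would need to catch and wrap it.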

Though now I've refactored the code to avoid the patch on ZooKeeper to
deal with reconnecting when a SessionExpiredException occurs, I don't
seem to get any session expired exceptions :). I'm starting to wonder
if it's maybe related to old persistent data on disk causing the
exception?

I still get the strange lack of Watch Events on the 3rd client though
and the hang on closing (if
WriteLockTest.workAroundClosingLastZNodeFails is set to false - I've
hacked the test to pass by default).

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com