[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Open  (was: Patch Available)

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
 ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch


 here is the part of the log where my zookeeper instance crashed, taking 3 
 out of 5 down, and thus ruining the quorum for all clients:
 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error
 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader
 java.io.EOFException
     at java.io.DataInputStream.readInt(DataInputStream.java:375)
     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
     at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489]
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797]
 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998]
 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error
 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:34032]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d516011c NIOServerCnxn: 
 

[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

The test case exposed another bug: log truncation was not being done properly 
with the buffered inputstream. i modified the test to make it fail reliably and 
then fixed the bug.
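
One way a buffered input stream can interfere with truncation is read-ahead: the buffer pulls more bytes from the file than the caller has consumed, so the underlying file position is no longer the right truncation point. The sketch below is an illustration of that general pitfall, not the contents of the attached patch. The names `CountingInputStream` and `TruncateSketch` are hypothetical, and the idea of tracking the consumed byte count separately is an assumption about how such a bug can be avoided.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Path;

/** Hypothetical helper: counts the bytes the caller has actually consumed. */
class CountingInputStream extends FilterInputStream {
    private long position;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    /** Logical position; skip() and mark/reset are not tracked in this sketch. */
    long getPosition() {
        return position;
    }
}

final class TruncateSketch {
    /** Reads up to {@code keep} bytes, then truncates the file at the logical position. */
    static long truncateAfter(Path file, int keep) throws IOException {
        long logical;
        try (FileInputStream raw = new FileInputStream(file.toFile());
             CountingInputStream in = new CountingInputStream(new BufferedInputStream(raw))) {
            byte[] buf = new byte[keep];
            int off = 0;
            while (off < keep) {
                int n = in.read(buf, off, keep - off);
                if (n < 0) {
                    break; // file is shorter than requested
                }
                off += n;
            }
            // raw.getChannel().position() may already be past this point,
            // because the buffer has read ahead of the caller.
            logical = in.getPosition();
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(logical);
        }
        return logical;
    }
}
```

The counting wrapper sits outside the buffer, so it only sees bytes handed to the caller, which is the position a truncation should use.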

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
 ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
 ZOOKEEPER-483.patch



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Patch Available  (was: Open)


[jira] Updated: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-503:


Attachment: ZOOKEEPER-503.patch

this patch fixes a range of problems. it is a big simplification, with a net 
removal of 700 lines of code. the metadata for a ledger was collapsed into a 
single znode. here is a description of the changes:

Index calculation in QuorumEngine must be synchronized on the LedgerHandle to 
avoid changes to the ensemble while trying to submit an operation. Such changes 
happen upon crashes of bookies. 
  

I initially thought it was not necessary, but now I think this 
synchronization block is necessary. 
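
A minimal sketch of why the index calculation needs the lock, assuming the ensemble is a mutable list owned by the ledger handle. `LedgerHandleSketch`, `pickBookie` and `removeBookie` are hypothetical names for illustration, not the classes in the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a ledger handle that owns a mutable ensemble. */
class LedgerHandleSketch {
    private final List<String> ensemble = new ArrayList<>();

    LedgerHandleSketch(List<String> bookies) {
        ensemble.addAll(bookies);
    }

    /** Called when a bookie crashes; shrinks the ensemble under the handle's lock. */
    void removeBookie(String bookie) {
        synchronized (this) {
            ensemble.remove(bookie);
        }
    }

    /**
     * Picks the bookie for an entry. Reading the ensemble size and indexing
     * into it happen under the same lock, so the ensemble cannot change
     * between the two steps.
     */
    String pickBookie(long entryId) {
        synchronized (this) {
            if (ensemble.isEmpty()) {
                throw new IllegalStateException("no bookies left in ensemble");
            }
            int index = (int) Math.floorMod(entryId, (long) ensemble.size());
            return ensemble.get(index);
        }
    }
}
```

Without the lock in `pickBookie`, a concurrent `removeBookie` could shrink the list after the index was computed, leading to an out-of-range index or an operation sent to the wrong bookie.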

If a writer adds just a few entries to a ledger, it may end up with hints that 
say empty ledger when trying to recover a ledger. In this case, if we receive 
an empty ledger flag as a hint, we have to switch the hint to zero, which means 
that the client will start recovery from entry zero. If no entry has been 
written, it still works as the client won't be able to read anything.   
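
The hint handling described above can be sketched as follows. The sentinel value `EMPTY_LEDGER` and the class name `RecoveryHint` are assumptions made for illustration. The text only says that an "empty ledger" flag is mapped to entry zero, so passing any other hint through unchanged is also an assumption.

```java
final class RecoveryHint {
    /** Hypothetical sentinel meaning "the hint says the ledger is empty". */
    static final long EMPTY_LEDGER = -1L;

    private RecoveryHint() {
    }

    /**
     * Returns the entry id at which recovery should start. An "empty ledger"
     * hint is switched to zero so that recovery still probes entry zero;
     * any other hint is passed through unchanged (an assumption of this sketch).
     */
    static long recoveryStart(long hint) {
        return hint == EMPTY_LEDGER ? 0L : hint;
    }
}
```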
   

I have changed LedgerRecoveryTest to test for: many entries written, one entry 
written, no entry written.

I have been able to identify the problem that was causing BookieFailureTest to 
hang on Utkarsh's computer. Basically, when the queue of a BookieHandle is full 
and the corresponding bookie has crashed, we are not able to add a read 
operation to the incoming queue of the bookie handle because the 
BookieHandle is not processing new requests anymore and it is waiting to fail 
the handle. In this case, the BookieHandle throws an exception after timing out 
the call to add the read operation to the queue. We were propagating this 
exception to the application.   
  

The main problem is that we have to add the operation to the queue of 
ClientCBWorker so that we guarantee that it knows about the operation once we 
receive responses from bookies. If we throw an exception without removing the 
operation from the ClientCBWorker queue, the worker will wait forever, which I 
believe is the case Utkarsh was observing.  
   

If I reasoned about the code correctly, then my modifications fix this problem 
by retrying a few times and erroring out after a number of retries. Erroring 
out in this case means notifying the CBWorker so that we can release the 
operation. 
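
The retry-then-release behaviour can be sketched roughly as below, assuming the bookie handle's incoming queue is a bounded `BlockingQueue`. `BoundedEnqueue`, `offerWithRetries` and `FailureListener` are hypothetical names, and the listener stands in for whatever notifies the callback worker in the real code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class BoundedEnqueue {
    /** Hypothetical callback used to release an operation that could not be queued. */
    interface FailureListener<T> {
        void operationFailed(T op);
    }

    private BoundedEnqueue() {
    }

    /**
     * Tries to enqueue {@code op} up to {@code maxRetries} times, waiting
     * {@code timeoutMillis} on each attempt. If every attempt times out
     * (or the thread is interrupted), the listener is notified so that the
     * operation is released instead of being waited on forever.
     */
    static <T> boolean offerWithRetries(BlockingQueue<T> queue, T op, int maxRetries,
                                        long timeoutMillis, FailureListener<T> listener) {
        try {
            for (int attempt = 0; attempt < maxRetries; attempt++) {
                if (queue.offer(op, timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        listener.operationFailed(op);
        return false;
    }
}
```

The important property is that every path that gives up also notifies the listener, so nothing is left waiting on an operation that was never queued.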

Fixing log level in LedgerConfig. -F

I have mainly worked on the ledger recovery machinery. I made it asynchronous 
by transforming LedgerRecovery into a thread and moving some calls. We have to 
revisit this way of making it asynchronous as it might not be acceptable for 
this patch.

I still have to check why BookieFailureTest is failing for Utkarsh. It passes fine 
every time for me, so we have to find a way to reproduce it reliably on my 
machine so that I can debug it.


Took a pass over asynchronous ledger operations: create, open, close. Some 
parts are still blocking, work on those next.

 race condition in asynchronous create
 -

 Key: ZOOKEEPER-503
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Benjamin Reed
 Attachments: ZOOKEEPER-503.patch


 there is a race condition between the zookeeper completion thread and the 
 bookkeeper processing queue during create. if the zookeeper completion thread 
 falls behind due to scheduling, the action counter of the create operation 
 may go backwards.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-503:


Status: Patch Available  (was: Open)




[jira] Commented: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743541#action_12743541
 ] 

Benjamin Reed commented on ZOOKEEPER-503:
-

i should have also mentioned that this patch was done by flavio and utkarsh. i 
will be reviewing it.




[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743547#action_12743547
 ] 

Benjamin Reed commented on ZOOKEEPER-483:
-

just to be clear. this bug isn't completely fixed and the test case should 
still be failing. i just want to make sure it fails reliably on the hudson 
machine.
