[jira] Commented: (ZOOKEEPER-481) Add lastMessageSent to QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734556#action_12734556 ] Henry Robinson commented on ZOOKEEPER-481: -- Does this change have any impact on ZOOKEEPER-22? Add lastMessageSent to QuorumCnxManager --- Key: ZOOKEEPER-481 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-481 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Flavio Paiva Junqueira Attachments: ZOOKEEPER-481.patch Currently we rely on TCP for reliable delivery of FLE messages. However, as we concurrently drop and create new connections, it is possible that a message is sent but never received. With this patch, cnx manager keeps a list of last messages sent, and resends the last one sent. Receiving multiples copies is harmless. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-481) Add lastMessageSent to QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734593#action_12734593 ] Flavio Paiva Junqueira commented on ZOOKEEPER-481: -- No, ZOOKEEPER-22 is about client-to-server communication, and this one is affects server-to-server communication in particular during leade election. Add lastMessageSent to QuorumCnxManager --- Key: ZOOKEEPER-481 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-481 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Flavio Paiva Junqueira Attachments: ZOOKEEPER-481.patch Currently we rely on TCP for reliable delivery of FLE messages. However, as we concurrently drop and create new connections, it is possible that a message is sent but never received. With this patch, cnx manager keeps a list of last messages sent, and resends the last one sent. Receiving multiples copies is harmless. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-484: --- Component/s: server Fix Version/s: 3.3.0 Clients get SESSION MOVED exception when switching from follower to a leader. - Key: ZOOKEEPER-484 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.0 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.2.1, 3.3.0 When a client is connected to follower and get disconnected and connects to a leader it gets SESSION MOVED excpetion. This is beacuse of a bug in the new feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO NOT have this problem. The fix is to make sure the ownership of a connection gets changed when a session moves from follower to the leader. The workaround to it in 3.2.0 would be to swithc off connection from clients to the leader. take a look at *leaderServers* java property in http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes
need ops documentation that details supervision of ZK server processes -- Key: ZOOKEEPER-485 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485 Project: Zookeeper Issue Type: Bug Components: documentation, server Reporter: Patrick Hunt Fix For: 3.2.1, 3.3.0 We need ops documentation detailing what to do if the ZK server VM fails - by fail I mean the jvm process exits/dies/crashes/etc... In general a supervisor process should be used to start/stop/restart/etc... the ZK server vm. Something like daemontools http://cr.yp.to/daemontools.html could be used, or more simply a wrapper script should monitor the status of the pid and restart if the jvm fails. It's up to the operator, if this is not done automatically then it will have to be done manually, by operator restarting the ZK server jvm The inherent behavior of ZK wrt to failures - ie that it automatically recovers as long as quorum is maintained - fits into this nicely. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734807#action_12734807 ] Mahadev konar commented on ZOOKEEPER-483: - ryan, can you post (attach as files) server logs in different files for all you zookeeper servers? The one exception I can see if that you had problems with your disk but I am surprised to see it took down 2 of your other servers. also, please explain the setup -- I am assuming all the servers are on different machines, and the logs and snapshots go to different disk? ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn: java.nio.channels.SocketChannel[connected
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated ZOOKEEPER-483: -- Attachment: zklogs.tar.gz here is the complete log set from all 5 of our quorum members for the entire day. note that the event happens around 12:30 pm local (PDT) time in the log. I restarted the 3 crashed machines at about 13:30. They were not run under supervision, but clearly they should. The machines in question only have 1 disk, since they also run other hadoop things, and I was hoping that the quorum model would provide reliability even if the local disk isnt 100%. Although, having said that, the local disk seems to be ok, since nothing else on them has died. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Attachments: zklogs.tar.gz here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181
[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734814#action_12734814 ] Mahadev konar commented on ZOOKEEPER-483: - sorry I misread the exception , thanks pat for pointing it out - it looks like the followers could not send acks back to the leader and thus exited. If you have monitoring in place, you should check for any network problems that might have happend during that time. Usually we run our zookeeper servers as daemons that are restarted autamtically if they fail. So in this case the servers that failed would have automatically restarted. - also, http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html#sc_CrossMachineRequirements has some hints on setting up your environment (servers on different rack/switch etc.). you might want to read through it. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Attachments: zklogs.tar.gz here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn:
[jira] Updated: (ZOOKEEPER-436) Bookies should auto register to ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Paiva Junqueira updated ZOOKEEPER-436: - Component/s: contrib-bookkeeper Bookies should auto register to ZooKeeper - Key: ZOOKEEPER-436 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-436 Project: Zookeeper Issue Type: Improvement Components: contrib-bookkeeper Reporter: Benjamin Reed currently bookies have to be manually added to ZooKeeper to be used in a BookKeeper service. we should be able to just start up a bookkie, point it at ZooKeeper, and have it get auto integrated in. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-486) Improve bookie performance for large number of ledgers
Improve bookie performance for large number of ledgers -- Key: ZOOKEEPER-486 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-486 Project: Zookeeper Issue Type: Improvement Components: contrib-bookkeeper Reporter: Flavio Paiva Junqueira If we write simultaneously to a large number of ledgers on a bookie, then performance drops significantly due to more seeks on the ledger device. In such cases, we should be clustering ledgers into files to reduce the number of seeks, and performing sequential writes on each file. Clustering ledgers will impact read performance, so we would to have a knob to control such a feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-484: Attachment: sessionTest.patch this patch recreates the problem. Clients get SESSION MOVED exception when switching from follower to a leader. - Key: ZOOKEEPER-484 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.0 Reporter: Mahadev konar Assignee: Mahadev konar Priority: Blocker Fix For: 3.2.1, 3.3.0 Attachments: sessionTest.patch When a client is connected to follower and get disconnected and connects to a leader it gets SESSION MOVED excpetion. This is beacuse of a bug in the new feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO NOT have this problem. The fix is to make sure the ownership of a connection gets changed when a session moves from follower to the leader. The workaround to it in 3.2.0 would be to swithc off connection from clients to the leader. take a look at *leaderServers* java property in http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734902#action_12734902 ] Patrick Hunt commented on ZOOKEEPER-483: Mahadev, we now include a bunch of environment information in the server log when it starts up, the following can be found in a few of the logs that Ryan attached: 2009-07-23 13:27:39,363 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT Looks to me like it's not an official release, afaict something between 3.1 and 3.2 (built 5/15/2009), perhaps Ryan can shed more light on the specifics. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Fix For: 3.2.1 Attachments: zklogs.tar.gz here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23
[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734912#action_12734912 ] ryan rawson commented on ZOOKEEPER-483: --- In hbase we have a custom version, here it is: zookeeper-r785019-hbase-1329.jar based on that svn #, and plus a small patch from hbase-1329. This was opened as ZOOKEEPER-457. I think the problem might have happened when someone ran a heavy duty map-reduce with a major pile of disk IO intermediate output, which may have delayed zookeeper log writes. I'm looking to move the quorum to other machines. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Fix For: 3.2.1 Attachments: zklogs.tar.gz here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Attachment: ZOOKEEPER-483.patch i was able to reproduce the problem. and the patch was a missing catch for a socket exception. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Fix For: 3.2.1 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:34032] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d516011c NIOServerCnxn: java.nio.channels.SocketChannel[connected