[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-07-23 Thread ryan rawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ryan rawson updated ZOOKEEPER-483:
--

Attachment: zklogs.tar.gz

Here is the complete log set from all 5 of our quorum members for the entire 
day. Note that the event happens around 12:30 pm local (PDT) time in the log.  
I restarted the 3 crashed machines at about 13:30.
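
To narrow a full day of logs down to the minutes around the event, a small 
filter can help. The following is a minimal sketch, assuming each entry in the 
attached logs sits on one line in the layout quoted in this issue 
(date time,millis LEVEL logger: message). The names parse_line and 
warnings_between are made up for illustration and are not part of ZooKeeper.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts in this issue:
# "2009-07-23 12:29:06,769 WARN org.apache...NIOServerCnxn: message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) (\w+) ([\w.$]+): ?(.*)$"
)


def parse_line(line):
    """Return (timestamp, level, logger, message), or None for lines that
    do not start a log entry (for example stack-trace frames)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, millis, level, logger, message = match.groups()
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    when = when.replace(microsecond=int(millis) * 1000)
    return when, level, logger, message


def warnings_between(lines, start, end):
    """Collect WARN/ERROR/FATAL entries whose timestamp lies in [start, end]."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, level, _logger, _message = parsed
        if level in ("WARN", "ERROR", "FATAL") and start <= when <= end:
            hits.append(parsed)
    return hits
```

For the event described here, the window could be something like 
datetime(2009, 7, 23, 12, 25) to datetime(2009, 7, 23, 12, 35), applied to 
each server's log in turn.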

They were not run under supervision, but clearly they should be.  The machines 
in question have only 1 disk, since they also run other hadoop things, and I 
was hoping that the quorum model would provide reliability even if the local 
disk isn't 100%.  Having said that, the local disk seems to be ok, since 
nothing else on them has died.
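
The outage reported here follows from majority-quorum arithmetic: an ensemble 
stays available only while more than half of its servers are up, so a 5-server 
ensemble tolerates 2 failures but not 3. A minimal sketch of that calculation 
(the helper names are hypothetical, not ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, live_servers: int) -> bool:
    """True while the live servers still form a strict majority."""
    return live_servers >= quorum_size(ensemble_size)


# The situation in this report: 5 servers, 3 crashed, 2 left running.
print(quorum_size(5))    # 3
print(has_quorum(5, 2))  # False
```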

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
 Attachments: zklogs.tar.gz


 Here is the part of the log where my zookeeper instance crashed, taking 3 
 out of 5 servers down, and thus breaking the quorum for all clients:
 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5161350 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
 Exception when following the leader
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.168:39489]
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0578 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46797]
 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa013e NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:33998]
 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5160593 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e02bb NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.158:53758]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.154:58681]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691382 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59967]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb1354 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.163:49957]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13cd NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.150:34212]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691383 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46813]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59956]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e139b NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.156:55138]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e1398 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.167:41257]

[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-07-23 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734912#action_12734912
 ] 

ryan rawson commented on ZOOKEEPER-483:
---

In hbase we have a custom version; here it is:

zookeeper-r785019-hbase-1329.jar

It is based on that svn revision, plus a small patch from hbase-1329. The patch 
was opened as ZOOKEEPER-457.

I think the problem might have happened when someone ran a heavy-duty 
map-reduce job that wrote a large amount of intermediate output to disk, which 
may have delayed zookeeper log writes. I'm looking to move the quorum to other 
machines.

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
 Fix For: 3.2.1

 Attachments: zklogs.tar.gz



[jira] Updated: (ZOOKEEPER-457) Make ZookeeperMain public, support for HBase (and other) embedded clients

2009-07-01 Thread ryan rawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ryan rawson updated ZOOKEEPER-457:
--

Attachment: zk.patch

 Make ZookeeperMain public, support for HBase (and other) embedded clients
 -

 Key: ZOOKEEPER-457
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-457
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
 Fix For: 3.2.0

 Attachments: zk.patch


 Hi folks, we have made some changes to zookeeper to facilitate providing an 
 embedded zk client in our own hbase client.  This will allow our users to use 
 one shell to manipulate both hbase things and zookeeper things.  It requires 
 making a few things public and, in the process, rearranging how some static 
 things are initialized.  
 It's a fairly trivial refactoring; hopefully you guys approve!
 Thanks!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-457) Make ZookeeperMain public, support for HBase (and other) embedded clients

2009-07-01 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726189#action_12726189
 ] 

ryan rawson commented on ZOOKEEPER-457:
---

It doesn't matter, I just tried to pick something that seemed ok, but obviously 
it didn't work.

We can push it back.
