Hi all,

I have an HBase system that has worked fine for a long time, but it has
suddenly started developing errors.  First it was dying immediately on startup
because there wasn't a copy of hdfs-site.xml in the HBase conf directory (which
doesn't seem like it should be necessary, and I'm not sure how it would have
been moved if it was there in the first place).  I copied hdfs-site.xml from
/etc/hadoop/conf into /etc/hbase/conf.  Now HBase starts up, but it can never
connect to ZooKeeper and dies after a few minutes of trying.  The weird thing
is that, according to ZooKeeper, the connection is happening.  In the HBase
logs I get a ton of messages like:

2013-08-05 11:57:19,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x4403f9ef5b20026 Creating (or updating) unassigned node for 
0f3ca79375768472af70765ff231ee32 with OFFLINE state
2013-08-05 11:57:19,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=M_ZK_REGION_OFFLINE, server=hmaster:60000, 
region=0f3ca79375768472af70765ff231ee32

Eventually followed by:

2013-08-05 11:57:19,105 WARN org.apache.zookeeper.ClientCnxn: Session 
0x4403f9ef5b20026 for server hslave14/172.20.7.124:2181, unexpected error, 
closing socket connection and attempting reconnect
java.io.IOException: Packet len4935980 is out of range!
        at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:708)
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:867)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
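
To make the "out of range" message concrete, here is a minimal sketch of the
kind of length check that appears to produce it.  This is not the ZooKeeper
source: the class and method names are hypothetical, and it assumes the client
limit comes from the jute.maxbuffer system property with a default of
4096 * 1024 bytes, which I have not verified against the exact version in use
here.

```java
import java.io.IOException;

// Hypothetical stand-alone illustration; not the actual ZooKeeper code.
final class PacketLengthCheck {
    // Assumed client-side default (4096 * 1024 bytes) when jute.maxbuffer is unset.
    static final int DEFAULT_MAX = 4096 * 1024;

    // Read the limit from the JVM system property, falling back to the default.
    static int configuredMax() {
        return Integer.getInteger("jute.maxbuffer", DEFAULT_MAX);
    }

    // A length is acceptable only if it is non-negative and below the limit.
    static boolean inRange(int len, int max) {
        return len >= 0 && len < max;
    }

    // Mirrors the shape of the error seen in the log above.
    static void checkLength(int len, int max) throws IOException {
        if (!inRange(len, max)) {
            throw new IOException("Packet len" + len + " is out of range!");
        }
    }

    public static void main(String[] args) {
        int len = 4935980; // packet length reported in the log above
        int max = configuredMax();
        System.out.println("limit=" + max + " len=" + len
                + " ok=" + inRange(len, max));
    }
}
```

Under that assumption, 4935980 is larger than 4194304, so the response from
ZooKeeper would be rejected by the client.  If so, the size of whatever is
being read from ZooKeeper, or the jute.maxbuffer setting on the HBase side
(for example -Djute.maxbuffer=<bytes> in the JVM options), may be worth
checking.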

This is followed by a bunch more Java errors as the process dies.  In the
ZooKeeper logs I can see the HBase server connect:

13/08/05 11:40:27 INFO server.NIOServerCnxn: Accepted socket connection from 
/xxx.xxx.xxx.xxx:34879
13/08/05 11:40:27 INFO server.NIOServerCnxn: Client attempting to establish new 
session at /xxx.xxx.xxx.xxx:34879
13/08/05 11:40:27 INFO server.NIOServerCnxn: Established session 
0x1404ee40a8d000c with negotiated timeout 40000 for client 
/xxx.xxx.xxx.xxx:34879

It then disconnects, but only after HBase shuts down:

13/08/05 11:45:52 INFO server.NIOServerCnxn: Closed socket connection for 
client /xxx.xxx.xxx.xxx:34879 which had sessionid 0x1404ee40a8d000c

Does anyone have ideas about where I can look to track down this error?  Or
why I'm suddenly having this problem when I haven't changed anything?  Thanks
in advance for any help.

Trevor
