bq. there wasn't a copy of hdfs-site.xml

Can you tell us the versions of hadoop, hbase, and zookeeper you're using?
Did you let HBase manage your zookeeper quorum?

On Mon, Aug 5, 2013 at 9:15 AM, Trevor Antczak <[email protected]> wrote:

> Hi all,
>
> I have an hbase system that has worked fine for quite a long time, but
> now it is quite suddenly developing errors. First it was dying
> immediately on startup because there wasn't a copy of hdfs-site.xml in
> the hbase conf directory (which doesn't seem like it should be
> necessary, and I'm not sure how it got moved if it had been there in
> the first place). I copied the hdfs-site.xml from /etc/hadoops/conf
> into /etc/hbase/conf. Now hbase starts up, but it can never connect to
> Zookeeper and dies after a few minutes of trying. The weird thing is
> that according to Zookeeper the connection is happening. From the
> hbase logs I get a ton of messages like:
>
> 2013-08-05 11:57:19,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x4403f9ef5b20026 Creating (or updating) unassigned node for 0f3ca79375768472af70765ff231ee32 with OFFLINE state
> 2013-08-05 11:57:19,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=hmaster:60000, region=0f3ca79375768472af70765ff231ee32
>
> Eventually followed by:
>
> 2013-08-05 11:57:19,105 WARN org.apache.zookeeper.ClientCnxn: Session 0x4403f9ef5b20026 for server hslave14/172.20.7.124:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Packet len4935980 is out of range!
>         at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:708)
>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:867)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
>
> And then a bunch more Java errors as the process dies.
> From the Zookeeper logs I see the hbase server connect:
>
> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Accepted socket connection from /xxx.xxx.xxx.xxx:34879
> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Client attempting to establish new session at /xxx.xxx.xxx.xxx:34879
> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Established session 0x1404ee40a8d000c with negotiated timeout 40000 for client /xxx.xxx.xxx.xxx:34879
>
> Then disconnect, but only after it shuts down:
>
> 13/08/05 11:45:52 INFO server.NIOServerCnxn: Closed socket connection for client /xxx.xxx.xxx.xxx:34879 which had sessionid 0x1404ee40a8d000c
>
> Does anyone have any clever ideas of places I can look for this error?
> Or why I'm suddenly having this problem when I haven't changed
> anything? Thanks in advance for any help provided.
>
> Trevor
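
One quick thing that can be checked from the quoted log alone is how the reported packet length compares with the ZooKeeper client's read limit. The sketch below is only a rough check, not a diagnosis. It assumes the client falls back to 4096 * 1024 bytes when the jute.maxbuffer system property is unset, which is what I recall from ZooKeeper 3.3/3.4-era client code; the versions in use here are still unknown, so treat that constant as an assumption. The helper names (packet_length, exceeds_limit) are made up for illustration.

```python
import re

# Assumption: client-side default when the jute.maxbuffer system property
# is not set (4096 * 1024 bytes in ZooKeeper 3.3/3.4-era clients).
ASSUMED_CLIENT_LIMIT = 4096 * 1024


def packet_length(log_line):
    """Return N from a 'Packet len<N> is out of range!' line, or None."""
    match = re.search(r"Packet len(\d+) is out of range", log_line)
    return int(match.group(1)) if match else None


def exceeds_limit(length, limit=ASSUMED_CLIENT_LIMIT):
    """True if the client would reject a response of this length."""
    return length < 0 or length >= limit


if __name__ == "__main__":
    # Line copied from the hbase log quoted above.
    line = "java.io.IOException: Packet len4935980 is out of range!"
    length = packet_length(line)
    print("reported length:", length)
    print("assumed limit:  ", ASSUMED_CLIENT_LIMIT)
    print("over the limit: ", exceeds_limit(length))
    print("excess bytes:   ", length - ASSUMED_CLIENT_LIMIT)
```

If the reported length really is over the limit in your version, that would suggest a single ZooKeeper response (for example, a listing of a znode with very many children) has grown past what the client will read. Whether raising jute.maxbuffer on the HBase side is appropriate depends on the versions and setup asked about above.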
