hadoop-hbase-0.90.6+84.73-1
hadoop-zookeeper-3.3.5+19.5-1
hadoop-0.20.2+923.421-1

Yes, hbase is managing the Quorum.

-----Original Message-----
From: Ted Yu [mailto:[email protected]]
Sent: Monday, August 05, 2013 12:39 PM
To: [email protected]
Subject: Re: Hbase keeps dying (Zookeeper)

bq. there wasn't a copy of hdfs-site.xml

Can you tell us the versions of:
hadoop
hbase
zookeeper
you're using ?

Did you let HBase manage your zookeeper quorum ?

On Mon, Aug 5, 2013 at 9:15 AM, Trevor Antczak <[email protected]> wrote:

> Hi all,
>
> I have an hbase system that has worked fine for quite a long time, but
> now it is quite suddenly developing errors. First it was dying
> immediately on startup because there wasn't a copy of hdfs-site.xml in
> the hbase conf directory (which doesn't seem like it should be
> necessary, and I'm not sure how it got moved if it had been there in
> the first place). I copied the hdfs-site.xml from /etc/hadoops/conf
> into /etc/hbase/conf. Now hbase starts up, but it can never connect
> to Zookeeper and dies after a few minutes of trying. The weird thing
> is that according to Zookeeper the connection is happening. From the
> hbase logs I get a ton of messages like:
>
> 2013-08-05 11:57:19,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> master:60000-0x4403f9ef5b20026 Creating (or updating) unassigned node
> for 0f3ca79375768472af70765ff231ee32 with OFFLINE state
> 2013-08-05 11:57:19,020 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=M_ZK_REGION_OFFLINE, server=hmaster:60000,
> region=0f3ca79375768472af70765ff231ee32
>
> Eventually followed by:
>
> 2013-08-05 11:57:19,105 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x4403f9ef5b20026 for server hslave14/172.20.7.124:2181, unexpected
> error, closing socket connection and attempting reconnect
> java.io.IOException: Packet len4935980 is out of range!
>         at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:708)
>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:867)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
>
> And then a bunch more Java errors as the process dies. From the
> Zookeeper logs I see the hbase server connect:
>
> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Accepted socket
> connection from /xxx.xxx.xxx.xxx:34879
> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Client attempting to
> establish new session at /xxx.xxx.xxx.xxx:34879
> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Established session
> 0x1404ee40a8d000c with negotiated timeout 40000 for client
> /xxx.xxx.xxx.xxx:34879
>
> Then disconnect, but only after it shuts down:
>
> 13/08/05 11:45:52 INFO server.NIOServerCnxn: Closed socket connection
> for client /xxx.xxx.xxx.xxx:34879 which had sessionid
> 0x1404ee40a8d000c
>
> Does anyone have any clever ideas of places I can look for this error?
> Or why I'm suddenly having this problem when I haven't changed anything?
> Thanks in advance for any help provided.
>
> Trevor
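
For anyone reading the "Packet len4935980 is out of range!" line in the quoted hbase log: that exception is raised by the ZooKeeper client when a response is larger than its packet-size limit. The sketch below is only a log-scanning aid, not a fix. It pulls the reported packet length out of such lines and compares it with an assumed limit of 4096 * 1024 bytes, which is what I believe ZooKeeper 3.3 clients default to when the `jute.maxbuffer` system property is unset; please verify that against your own ZooKeeper version. The names `oversized_packets` and `ASSUMED_CLIENT_LIMIT` are made up for this example.

```python
import re

# Assumption: client-side packet limit of 4 MiB (believed to be the
# ZooKeeper 3.3 client default when jute.maxbuffer is not set).
ASSUMED_CLIENT_LIMIT = 4096 * 1024

# Matches the message format seen in the quoted log, e.g.
# "java.io.IOException: Packet len4935980 is out of range!"
PACKET_LEN_RE = re.compile(r"Packet len(\d+) is out of range")


def oversized_packets(log_lines, limit=ASSUMED_CLIENT_LIMIT):
    """Return (length, excess_over_limit) for each matching log line."""
    results = []
    for line in log_lines:
        match = PACKET_LEN_RE.search(line)
        if match:
            length = int(match.group(1))
            results.append((length, length - limit))
    return results


if __name__ == "__main__":
    sample = ["java.io.IOException: Packet len4935980 is out of range!"]
    for length, excess in oversized_packets(sample):
        print(f"packet of {length} bytes is {excess} bytes over the assumed limit")
```

If the reported length is consistently above the limit, one thing worth checking is whether some znode (or the list of children under one) has grown unusually large, since the log shows the master creating many unassigned-region nodes just before the error.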
