Marcos,

We have plans to upgrade everything in the near future, but we can't do it right now. For now, I'd settle for a way to recreate the /hbase/master node in Zookeeper so I can get things working.
Trevor

-----Original Message-----
From: Marcos Luis Ortiz Valmaseda [mailto:[email protected]]
Sent: Friday, August 09, 2013 11:51 AM
To: Trevor Antczak
Cc: [email protected]
Subject: Re: Hbase keeps dying (Zookeeper)

Regards, Trevor.

> hadoop-hbase-0.90.6+84.73-1
> hadoop-zookeeper-3.3.5+19.5-1
> hadoop-0.20.2+923.421-1
>

Why not upgrade your components?

HBase to the latest, 0.94.10:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12324627

Zookeeper to the latest, 3.4.5:
http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html

Hadoop to 1.2.1:
http://hadoop.apache.org/docs/r1.2.1/releasenotes.html

That's my first piece of advice. Also, from Zookeeper 3.3.5 to 3.4.5 there are a lot of bug fixes and improvements:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12321883

2013/8/9, Trevor Antczak <[email protected]>:
> So I've done some more research into this, and it appears that my Zookeeper doesn't have /hbase/master. From zkCli:
>
> [zk: localhost:2181(CONNECTED) 3] ls /hbase
> [splitlog, unassigned, root-region-server, rs, table, shutdown]
> [zk: localhost:2181(CONNECTED) 4] get /hbase/master
> Node does not exist: /hbase/master
> [zk: localhost:2181(CONNECTED) 5]
>
> I have no idea how this could have happened, but is there a way to regenerate the node in zookeeper? All of the other expected nodes are there. From the logs, it seems that everything was fine with hbase until 12:01 AM on August 1st, at which point it just stopped working. I can't find any reason for any of this, either. It's all very strange.
>
> Trevor
>
> -----Original Message-----
> From: Trevor Antczak [mailto:[email protected]]
> Sent: Monday, August 05, 2013 2:40 PM
> To: [email protected]
> Subject: RE: Hbase keeps dying (Zookeeper)
>
> hadoop-hbase-0.90.6+84.73-1
> hadoop-zookeeper-3.3.5+19.5-1
> hadoop-0.20.2+923.421-1
>
> Yes, hbase is managing the quorum.
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Monday, August 05, 2013 12:39 PM
> To: [email protected]
> Subject: Re: Hbase keeps dying (Zookeeper)
>
> bq. there wasn't a copy of hdfs-site.xml
>
> Can you tell us the versions of:
> hadoop
> hbase
> zookeeper
> you're using?
>
> Did you let HBase manage your zookeeper quorum?
>
> On Mon, Aug 5, 2013 at 9:15 AM, Trevor Antczak <[email protected]> wrote:
>
>> Hi all,
>>
>> I have an hbase system that has worked fine for quite a long time, but now it is quite suddenly developing errors. First it was dying immediately on startup because there wasn't a copy of hdfs-site.xml in the hbase conf directory (which doesn't seem like it should be necessary, and I'm not sure how it got moved if it had been there in the first place). I copied hdfs-site.xml from /etc/hadoop/conf into /etc/hbase/conf. Now hbase starts up, but it can never connect to Zookeeper and dies after a few minutes of trying. The weird thing is that, according to Zookeeper, the connection is happening. From the hbase logs I get a ton of messages like:
>>
>> 2013-08-05 11:57:19,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x4403f9ef5b20026 Creating (or updating) unassigned node for 0f3ca79375768472af70765ff231ee32 with OFFLINE state
>> 2013-08-05 11:57:19,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=hmaster:60000, region=0f3ca79375768472af70765ff231ee32
>>
>> Eventually followed by:
>>
>> 2013-08-05 11:57:19,105 WARN org.apache.zookeeper.ClientCnxn: Session 0x4403f9ef5b20026 for server hslave14/172.20.7.124:2181, unexpected error, closing socket connection and attempting reconnect
>> java.io.IOException: Packet len4935980 is out of range!
>>         at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:708)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:867)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
>>
>> And then a bunch more Java errors as the process dies. From the Zookeeper logs I see the hbase server connect:
>>
>> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Accepted socket connection from /xxx.xxx.xxx.xxx:34879
>> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Client attempting to establish new session at /xxx.xxx.xxx.xxx:34879
>> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Established session 0x1404ee40a8d000c with negotiated timeout 40000 for client /xxx.xxx.xxx.xxx:34879
>>
>> Then it disconnects, but only after hbase shuts down:
>>
>> 13/08/05 11:45:52 INFO server.NIOServerCnxn: Closed socket connection for client /xxx.xxx.xxx.xxx:34879 which had sessionid 0x1404ee40a8d000c
>>
>> Does anyone have any clever ideas about where I can look for this error, or why I'm suddenly having this problem when I haven't changed anything? Thanks in advance for any help provided.
>>
>> Trevor
>>
>
--
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz
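For reference, the "Packet len4935980 is out of range!" error in Trevor's HBase log comes from a bounds check in the ZooKeeper client. Each server response starts with a 4-byte length prefix, and the client refuses any length that is negative or not smaller than its buffer limit. The sketch below mirrors that check, assuming (as in the 3.3.x/3.4.x client source) that the limit is read from the `jute.maxbuffer` system property with a default of 4096 * 1024 = 4,194,304 bytes. The class and helper names (`PacketLenCheck`, `prefix`, `tryRead`) are hypothetical. Since 4,935,980 is larger than 4,194,304, a response of the reported size would be rejected under the default.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of the length-prefix check behind
 * "Packet len<N> is out of range!" in the ZooKeeper client.
 * Assumption: the limit defaults to 4096 * 1024 bytes and can be
 * overridden with -Djute.maxbuffer=<bytes>.
 */
public class PacketLenCheck {
    static final int DEFAULT_LIMIT = 4096 * 1024; // 4,194,304 bytes
    static final int LIMIT = Integer.getInteger("jute.maxbuffer", DEFAULT_LIMIT);

    /** Reads a 4-byte big-endian length and allocates a body buffer, or throws if out of range. */
    static ByteBuffer readLength(ByteBuffer lengthPrefix, int limit) throws IOException {
        int len = lengthPrefix.getInt();
        if (len < 0 || len >= limit) {
            throw new IOException("Packet len" + len + " is out of range!");
        }
        return ByteBuffer.allocate(len);
    }

    /** Builds a 4-byte length prefix like the one that precedes each server response. */
    static ByteBuffer prefix(int len) {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.putInt(len);
        b.flip(); // switch the buffer from writing to reading
        return b;
    }

    /** Runs the check for one length and limit, printing the outcome. */
    static void tryRead(int len, int limit) {
        try {
            ByteBuffer body = readLength(prefix(len), limit);
            System.out.println("accepted: " + body.capacity() + " bytes");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        int reported = 4935980;             // length from the HBase log above
        tryRead(reported, LIMIT);           // rejected when jute.maxbuffer is unset
        tryRead(reported, 8 * 1024 * 1024); // accepted with an 8 MiB limit
    }
}
```

With `jute.maxbuffer` unset, this prints `rejected: Packet len4935980 is out of range!` followed by `accepted: 4935980 bytes`. This only explains the message; it does not show which znode read produced the oversized response. The ZooKeeper Administrator's Guide notes that if `jute.maxbuffer` is changed, it must be set on all servers and clients, so raising it on the HBase side alone is unlikely to be a complete fix.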
