Marcos,

We plan to upgrade everything in the near future, but we can't do it
right now.  For now I'd settle for a way to recreate the
/hbase/master node in Zookeeper so I can get things working.

Trevor

-----Original Message-----
From: Marcos Luis Ortiz Valmaseda [mailto:[email protected]] 
Sent: Friday, August 09, 2013 11:51 AM
To: Trevor Antczak
Cc: [email protected]
Subject: Re: Hbase keeps dying (Zookeeper)

Regards, Trevor.
> hadoop-hbase-0.90.6+84.73-1
> hadoop-zookeeper-3.3.5+19.5-1
> hadoop-0.20.2+923.421-1
>

Why not upgrade your components?
HBase to the latest release, 0.94.10:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12324627

Zookeeper to the latest release, 3.4.5:
http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html

Hadoop 1.2.1:
http://hadoop.apache.org/docs/r1.2.1/releasenotes.html
That's my first advice.

Now, from 3.3.5 to 3.4.5, there are a lot of bug fixes and improvements:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12321883

2013/8/9, Trevor Antczak <[email protected]>:
> So I've done some more research into this and it appears that my 
> Zookeeper doesn't have /hbase/master.  From zkCli:
>
> [zk: localhost:2181(CONNECTED) 3] ls /hbase
> [splitlog, unassigned, root-region-server, rs, table, shutdown]
> [zk: localhost:2181(CONNECTED) 4] get /hbase/master
> Node does not exist: /hbase/master
> [zk: localhost:2181(CONNECTED) 5]
>
> I have no idea how this could have happened, but is there a way to 
> regenerate the node in zookeeper?  All of the other expected nodes are 
> there.  It seems from the logs that everything was fine with hbase until
> 12:01 AM on August 1st, at which point it just stopped working.  I can't
> find any reason that any of this has happened either.  It's all very
> strange.
>
> Trevor
>
> -----Original Message-----
> From: Trevor Antczak [mailto:[email protected]]
> Sent: Monday, August 05, 2013 2:40 PM
> To: [email protected]
> Subject: RE: Hbase keeps dying (Zookeeper)
>
> hadoop-hbase-0.90.6+84.73-1
> hadoop-zookeeper-3.3.5+19.5-1
> hadoop-0.20.2+923.421-1
>
> Yes, hbase is managing the Quorum.
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Monday, August 05, 2013 12:39 PM
> To: [email protected]
> Subject: Re: Hbase keeps dying (Zookeeper)
>
> bq. there wasn't a copy of hdfs-site.xml
>
> Can you tell us the versions of:
>  hadoop
>  hbase
>  zookeeper
> you're using ?
>
> Did you let HBase manage your zookeeper quorum ?
>
> On Mon, Aug 5, 2013 at 9:15 AM, Trevor Antczak
> <[email protected]> wrote:
>
>> Hi all,
>>
>> I have an hbase system that has worked fine for quite a long time, 
>> but now it is quite suddenly developing errors.  First it was dying 
>> immediately on startup because there wasn't a copy of hdfs-site.xml 
>> in the hbase conf directory (which doesn't seem like it should be 
>> necessary, and I'm not sure how it got moved if it had been there in 
>> the first place).  I copied the hdfs-site.xml from /etc/hadoops/conf 
>> into /etc/hbase/conf.  Now hbase starts up, but it can never connect 
>> to Zookeeper and dies after a few minutes of trying.  The weird 
>> thing is that, according to Zookeeper, the connection is happening.  
>> From the hbase logs I get a ton of messages like:
>>
>> 2013-08-05 11:57:19,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x4403f9ef5b20026 Creating (or updating) unassigned node for 0f3ca79375768472af70765ff231ee32 with OFFLINE state
>> 2013-08-05 11:57:19,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=hmaster:60000, region=0f3ca79375768472af70765ff231ee32
>>
>> Eventually followed by:
>>
>> 2013-08-05 11:57:19,105 WARN org.apache.zookeeper.ClientCnxn: Session 0x4403f9ef5b20026 for server hslave14/172.20.7.124:2181, unexpected error, closing socket connection and attempting reconnect
>> java.io.IOException: Packet len4935980 is out of range!
>>         at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:708)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:867)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
>>
>> And then a bunch more Java errors as the process dies.  From the 
>> Zookeeper logs I see the hbase server connect:
>>
>> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Accepted socket connection from /xxx.xxx.xxx.xxx:34879
>> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Client attempting to establish new session at /xxx.xxx.xxx.xxx:34879
>> 13/08/05 11:40:27 INFO server.NIOServerCnxn: Established session 0x1404ee40a8d000c with negotiated timeout 40000 for client /xxx.xxx.xxx.xxx:34879
>>
>> Then disconnect, but only after it shuts down:
>>
>> 13/08/05 11:45:52 INFO server.NIOServerCnxn: Closed socket connection for client /xxx.xxx.xxx.xxx:34879 which had sessionid 0x1404ee40a8d000c
>>
>> Does anyone have any clever ideas of places I can look for this error?
>> Or why I'm suddenly having this problem when I haven't changed anything?
>> Thanks in advance for any help provided.
>>
>> Trevor
>>
>
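
A note on the "Packet len4935980 is out of range!" error quoted above,
offered as a sketch rather than a confirmed diagnosis. The ZooKeeper Java
client rejects any response whose length prefix is negative or at/above its
buffer limit, which is controlled by the jute.maxbuffer system property. As
far as I recall, the 3.3.x client falls back to 4096 * 1024 bytes (4194304)
when the property is unset; treat that default as an assumption and check it
against your build. The reported 4935980 bytes would exceed it. Also note
that /hbase/master is an ephemeral znode created by the active master on
startup, so it disappears whenever the master's session ends; its absence is
more likely a symptom of the master dying than the cause. The helper names
below are hypothetical and only mirror the range check:

```python
import re

# Assumption: client-side default when jute.maxbuffer is unset, as recalled
# from ClientCnxn in ZooKeeper 3.3.x (4096 * 1024 bytes). Verify for your build.
DEFAULT_CLIENT_MAX_PACKET = 4096 * 1024

_PACKET_LEN = re.compile(r"Packet len(\d+) is out of range")


def packet_len_from_log(line):
    """Return the packet length reported in a ClientCnxn error line, or None."""
    match = _PACKET_LEN.search(line)
    return int(match.group(1)) if match else None


def exceeds_limit(packet_len, limit=DEFAULT_CLIENT_MAX_PACKET):
    """Mirror the client's range check: negative or >= limit is rejected."""
    return packet_len < 0 or packet_len >= limit


if __name__ == "__main__":
    line = "java.io.IOException: Packet len4935980 is out of range!"
    length = packet_len_from_log(line)
    print(length, exceeds_limit(length))
```

If the limit does turn out to be the problem, commonly suggested workarounds
are raising jute.maxbuffer (for example -Djute.maxbuffer=<bytes> on the HBase
and ZooKeeper JVMs) or reducing the number of znodes under the path being
listed. Neither has been verified against this cluster.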


--
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz
