Yes, this is the balance process (as its name says, it keeps the cluster balanced), and it's not related to detecting dead nodes. The nodes are monitored by ZooKeeper: the master notices a regionserver is dead when its ephemeral znode's session expires, and the session timeout is 180 seconds by default (setting: zookeeper.session.timeout).
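If you want the master to react faster (or slower), you can override that timeout in hbase-site.xml. A sketch, just showing the default; the value is in milliseconds:

```xml
<!-- hbase-site.xml: ZooKeeper session timeout for region servers.
     If a regionserver's session expires (e.g. because of a long GC pause),
     the master declares it dead. 180000 ms (180 s) is the default. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>180000</value>
</property>
```

Note that lowering it makes the cluster more sensitive to GC pauses, which seems to be the underlying problem in this thread.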
On Fri, Jun 1, 2012 at 4:40 PM, Cyril Scetbon <[email protected]> wrote:
> I have another regionserver (hb-d2) that crashed (I can easily reproduce
> the issue by continuing the injections), and as I see in the master log,
> it gets information about hb-d2 every 5 minutes. I suppose that is what
> helps it notice whether a node is dead or not. However, it adds hb-d2 to
> the dead node list at 13:32:20, i.e. less than 5 minutes after it last got
> the server information. Is that normal?
>
> 2012-06-01 13:02:36,309 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:07:36,319 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:12:36,328 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:17:36,337 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:22:36,346 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:27:36,353 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:32:20,048 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [hb-d2,60020,1338553126560]
> 2012-06-01 13:32:20,048 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=hb-d2,60020,1338553126560 to dead servers, submitted shutdown handler to be executed, root=false, meta=false
> 2012-06-01 13:32:20,048 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for hb-d2,60020,1338553126560
>
>
> On 6/1/12 3:25 PM, Cyril Scetbon wrote:
>>
>> I've added hbase.hregion.memstore.mslab.enabled = true to the
>> configuration of all regionservers and added the flags -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
>> -XX:CMSInitiatingOccupancyFraction=60 to the hbase environment.
>> However, my regionservers are still crashing when I load data into the
>> cluster.
>>
>> Here are the logs for the node hb-d3, which crashed at 12:56:
>>
>> - GC logs : http://pastebin.com/T0d0y8pZ
>> - regionserver logs : http://pastebin.com/n6v9x3XM
>>
>> thanks
>>
>> On 5/31/12 11:12 PM, Jean-Daniel Cryans wrote:
>>>
>>> Both. Also, you could post bigger log snippets (on something like
>>> pastebin.com) and we could see more evidence of the issue.
>>>
>>> J-D
>>>
>>> On Thu, May 31, 2012 at 2:09 PM, Cyril Scetbon <[email protected]>
>>> wrote:
>>>>
>>>> On 5/31/12 11:00 PM, Jean-Daniel Cryans wrote:
>>>>>
>>>>> What I'm seeing looks more like GC issues. Start reading this:
>>>>> http://hbase.apache.org/book.html#gc
>>>>>
>>>>> J-D
>>>>
>>>> Hi,
>>>>
>>>> I'm really not sure, because I've enabled the GC verbose option and I
>>>> don't see anything taking a long time. Maybe I can check again on one
>>>> node. On which node do you think I should check for GC issues?
>>>>
>>>>
>>>> --
>>>> Cyril SCETBON
>>>>
>>
>>
>
>
> --
> Cyril SCETBON
>
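P.S. For reference, GC flags like the ones quoted above are usually applied via HBASE_OPTS in conf/hbase-env.sh. A sketch only; the flag values are illustrative (tune CMSInitiatingOccupancyFraction and the log path for your setup), and the GC-logging flags are added here so pauses can be correlated with ZooKeeper session expirations:

```shell
# conf/hbase-env.sh -- GC settings for region servers (illustrative values).
# CMS collector with incremental mode, starting collection at 60% old-gen
# occupancy, plus verbose GC logging to spot long pauses.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=60 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hbase/gc-hbase.log"
```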
