Re: Region servers exiting, not recovering

Matthew LeMieux Tue, 21 Sep 2010 17:37:02 -0700

What are the JVM limitations that you were running into?

-Matthew


On Sep 21, 2010, at 5:31 PM, Ryan Rawson wrote:

> We tried that before, but some things are difficult to reset in the same JVM.
> 
> A clean restart just works better :-)
> 
> On Tue, Sep 21, 2010 at 5:29 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> Thanks a lot, Ryan.
>> 
>> That's what I thought, I knew this explanation that the regions are split;
>> although I guess one might reason there's no reason why we can't try to
>> start a new life by rejoining cluster again as a new region server (but the
>> same process). Or at least have such an option. Just wanted to double-check
>> before wrapping it into some sort of a kicker.
>> -Dmitriy
>> 
>> 
>> On Tue, Sep 21, 2010 at 5:24 PM, Ryan Rawson <[email protected]> wrote:
>> 
>>> You could wrap the regionserver in a script that auto-reboots them?
>>> 
>>> We can't really recover from this scenario, because the master notices
>>> we are dead, then splits our logs and reassigns the regions to other
>>> nodes.  This is the basis of how reliable hbase works in the face of
>>> machine failure.
>>> 
>>> -ryan
>>> 
>>> On Tue, Sep 21, 2010 at 5:20 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>> Hi,
>>>> 
>>>> so in our production, we see temporary networking failures (we are not
>>>> quite 100% sure what they are) but now and then region server's zookeeper
>>>> session would get expired and in addition some ipc channels would throw
>>>> 'channel closed'.
>>>> 
>>>> This causes region server to exit. Which is not a very big deal, our
>>>> monitoring system would send a text message so somebody would restart the
>>>> region server.
>>>> 
>>>> however, this does happen a little more often than we probably would have
>>>> liked to do it manually.
>>>> 
>>>> Why is server not recovering/reconnecting automatically? is there a
>>>> facility to enable server restarts and region server nodes to rejoin the
>>>> cluster automatically?
>>>> 
>>>> Thanks.
>>>> -Dmitriy
>>>> 
>>> 
>>
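
For reference, the wrapper ("kicker") idea discussed in the quoted thread could be sketched roughly as below. This is only a minimal illustration, not anything shipped with HBase: the function name supervise, its parameters, and the backoff values are all hypothetical, and the command shown in the trailing comment assumes the region server can be run in the foreground with "hbase regionserver start". The sketch restarts the process as a brand-new JVM each time it exits, which matches the "clean restart" approach rather than an in-process rejoin.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, base_delay=1.0, max_delay=60.0,
              run=subprocess.call, sleep=time.sleep):
    """Run cmd in the foreground and restart it whenever it exits abnormally.

    Hypothetical helper for illustration only. Returns the list of exit
    codes observed, one per launch.
    """
    exit_codes = []
    delay = base_delay
    for attempt in range(max_restarts + 1):
        code = run(cmd)  # blocks until the child process exits
        exit_codes.append(code)
        if code == 0:
            # A clean exit is treated as an intentional shutdown.
            break
        if attempt < max_restarts:
            # Back off before relaunching so a persistent network problem
            # does not turn into a tight restart loop.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return exit_codes


# Example invocation (assumed command line, adjust to your install):
#   supervise(["hbase", "regionserver", "start"])
```

Capping the number of restarts is a deliberate choice in this sketch: if the server keeps dying, it is probably better to stop and let monitoring page someone than to restart forever.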
