Re: Region servers exiting, not recovering

Ryan Rawson Tue, 21 Sep 2010 17:31:49 -0700

We tried that before, but some things are difficult to reset in the same JVM.


A clean restart just works better :-)
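
For anyone who wants to script the "kicker", here is a minimal sketch of the wrapper idea, not a tested production tool. It assumes the region server can be started as a foreground command, and that a zero exit code means someone stopped it on purpose. The function name `supervise` and its parameters are made up for illustration. Each restart is a brand-new process (so a fresh JVM), which comes back to the master as a new region server.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, backoff_seconds=10.0, sleep=time.sleep):
    """Run cmd in the foreground and start a fresh process whenever it fails.

    Returns the list of exit codes seen, one per run.
    """
    exit_codes = []
    while True:
        # Every iteration launches a completely new process; nothing is
        # reused from the run that just died.
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if code == 0:
            # Assumption: exit code 0 means a deliberate, clean shutdown.
            break
        if len(exit_codes) > max_restarts:
            # Stop after the initial run plus max_restarts restarts, so a
            # persistent failure still reaches a human via monitoring.
            break
        # Pause before restarting so a flapping network does not cause a
        # tight restart loop.
        sleep(backoff_seconds)
    return exit_codes
```

You would call `supervise([...])` with whatever command starts your region server in the foreground. The restart cap is deliberate: if the process keeps dying, it is probably better to be paged than to loop forever.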

On Tue, Sep 21, 2010 at 5:29 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Thanks a lot, Ryan.
>
> That's what I thought. I knew the explanation that the logs get split and
> the regions reassigned, although one might argue there's no reason we
> can't start a new life by rejoining the cluster as a new region server (in
> the same process), or at least have that as an option. Just wanted to
> double-check before wrapping it in some sort of kicker.
> -Dmitriy
>
>
> On Tue, Sep 21, 2010 at 5:24 PM, Ryan Rawson <[email protected]> wrote:
>
>> You could wrap the regionserver in a script that restarts it automatically?
>>
>> We can't really recover from this scenario, because the master notices
>> we are dead, then splits our logs and reassigns the regions to other
>> nodes.  This is the basis of how HBase stays reliable in the face of
>> machine failure.
>>
>> -ryan
>>
>> On Tue, Sep 21, 2010 at 5:20 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > So in production we see temporary network failures (we are not yet 100%
>> > sure what causes them). Now and then a region server's ZooKeeper session
>> > expires, and some IPC channels also throw 'channel closed'.
>> >
>> > This causes the region server to exit, which is not a very big deal: our
>> > monitoring system sends a text message so that somebody can restart the
>> > region server.
>> >
>> > However, this happens a little more often than we would like to handle
>> > manually.
>> >
>> > Why does the server not recover/reconnect automatically? Is there a
>> > facility to enable server restarts so that region server nodes rejoin
>> > the cluster automatically?
>> >
>> > Thanks.
>> > -Dmitriy
>> >
>>
>
