Thanks a lot, Ryan.

That's what I thought. I knew the explanation that the logs get split and
the regions reassigned, although I guess one could argue there's no reason
the process couldn't start a new life by rejoining the cluster as a new
region server (within the same process). Or at least there could be an
option for that. Just wanted to double-check before wrapping it in some
sort of kicker.
-Dmitriy
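
For illustration, here is a minimal sketch of the kind of "kicker" discussed
in this thread: a small supervisor that runs the region server in the
foreground and starts it again when it exits. The function name `supervise`
and its parameters are hypothetical, not part of HBase. The sketch assumes
that an aborted region server exits with a non-zero status and that an exit
status of 0 means a deliberate stop; if that does not hold in your setup,
adjust the check accordingly.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, base_delay=5.0, max_delay=300.0,
              run=subprocess.call, sleep=time.sleep):
    """Run cmd in the foreground and restart it when it fails.

    Returns the list of exit codes seen. Gives up after max_restarts
    restarts so that a persistent failure still reaches a human.
    `run` and `sleep` are injectable so the loop can be tested
    without starting a real process.
    """
    exit_codes = []
    delay = base_delay
    for attempt in range(max_restarts + 1):
        code = run(cmd)          # blocks until the process exits
        exit_codes.append(code)
        if code == 0:
            break                # assumed to be a deliberate stop
        if attempt == max_restarts:
            break                # restart budget used up, give up
        sleep(delay)             # back off before the next start
        delay = min(delay * 2, max_delay)
    return exit_codes
```

A call such as `supervise(["hbase", "regionserver", "start"])` would be the
intended use, assuming the `hbase` launcher is on the PATH and runs the
region server in the foreground. The doubling delay and the restart cap are
design choices of this sketch: they keep a flapping network from turning
into a tight restart loop, and each restart still joins the cluster as a
fresh region server, since the master has already reassigned the old
regions.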


On Tue, Sep 21, 2010 at 5:24 PM, Ryan Rawson <[email protected]> wrote:

> You could wrap the regionserver in a script that auto-restarts it?
>
> We can't really recover from this scenario, because the master notices
> we are dead, then splits our logs and reassigns the regions to other
> nodes.  This is the basis of how HBase stays reliable in the face of
> machine failure.
>
> -ryan
>
> On Tue, Sep 21, 2010 at 5:20 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
> > Hi,
> >
> > In our production cluster we see temporary networking failures (we are
> > not quite 100% sure what they are), and now and then a region server's
> > ZooKeeper session expires and, in addition, some IPC channels throw
> > 'channel closed'.
> >
> > This causes the region server to exit, which is not a very big deal:
> > our monitoring system sends a text message so somebody can restart the
> > region server.
> >
> > However, this happens a little more often than we would like to handle
> > manually.
> >
> > Why does the server not recover/reconnect automatically? Is there a
> > facility to enable server restarts so that region server nodes rejoin
> > the cluster automatically?
> >
> > Thanks.
> > -Dmitriy
> >
>
