We tried that before, but some things are difficult to reset in the same JVM.
A clean restart just works better :-)

On Tue, Sep 21, 2010 at 5:29 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Thanks a lot, Ryan.
>
> That's what I thought. I knew the explanation that the regions are split,
> although I guess one might argue there's no reason why we can't try to
> start a new life by rejoining the cluster as a new region server (but in
> the same process). Or at least have such an option. Just wanted to
> double-check before wrapping it into some sort of a kicker.
> -Dmitriy
>
>
> On Tue, Sep 21, 2010 at 5:24 PM, Ryan Rawson <[email protected]> wrote:
>
>> You could wrap the regionserver in a script that auto-reboots it?
>>
>> We can't really recover from this scenario, because the master notices
>> we are dead, then splits our logs and reassigns the regions to other
>> nodes. This is the basis of how hbase works reliably in the face of
>> machine failure.
>>
>> -ryan
>>
>> On Tue, Sep 21, 2010 at 5:20 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > In our production, we see temporary networking failures (we are not
>> > quite 100% sure what they are), but now and then a region server's
>> > zookeeper session would get expired and in addition some ipc channels
>> > would throw 'channel closed'.
>> >
>> > This causes the region server to exit. Which is not a very big deal:
>> > our monitoring system would send a text message so somebody would
>> > restart the region server.
>> >
>> > However, this does happen a little more often than we would like to
>> > handle manually.
>> >
>> > Why is the server not recovering/reconnecting automatically? Is there
>> > a facility to enable server restarts and let region server nodes
>> > rejoin the cluster automatically?
>> >
>> > Thanks.
>> > -Dmitriy
>> >
>> >
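
For reference, the "wrap the regionserver in a script that auto-reboots" idea from the thread above could look roughly like the sketch below. It is only an illustration, not something shipped with hbase: the function name `supervise`, the restart limit and the delay are all made up here, and the launch command in the trailing comment is an assumption about how the region server is started in the foreground on a given install.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, delay_seconds=10.0, sleep=time.sleep):
    """Run cmd in the foreground and start it again each time it exits.

    Hypothetical helper: returns the list of exit codes seen, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Each run is a brand-new process, so no JVM state is carried over.
        exit_codes.append(subprocess.call(cmd))
        if attempt < max_restarts:
            # Pause so a persistent failure does not become a tight restart loop.
            sleep(delay_seconds)
    return exit_codes


# Example call (assumed launch command; adjust to the local install):
# supervise(["hbase", "regionserver", "start"])
```

The restart count is capped on purpose: if the network problem is not temporary, the wrapper gives up after a few attempts and the existing monitoring alert can still bring in a human.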
