What are the JVM limitations that you were running into?

-Matthew
On Sep 21, 2010, at 5:31 PM, Ryan Rawson wrote:

> We tried that before, but some things are difficult to reset in the same JVM.
>
> A clean restart just works better :-)
>
> On Tue, Sep 21, 2010 at 5:29 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> Thanks a lot, Ryan.
>>
>> That's what i thought, I knew this explanation that the regions are split;
>> although I guess one might reason there's no reason why we can't try to
>> start a new life by rejoining cluster again as a new region server (but the
>> same process). Or at least have such an option. Just wanted to double-check
>> before wrapping it into some sort of a kicker.
>> -Dmitriy
>>
>>
>> On Tue, Sep 21, 2010 at 5:24 PM, Ryan Rawson <[email protected]> wrote:
>>
>>> You could wrap the regionserver in a script that auto-reboots them?
>>>
>>> We cant really recover from this scenario, because the master notices
>>> we are dead, then splits our logs and reassigns the regions to other
>>> nodes. This is the basis of how reliable hbase works in the face of
>>> machine failure.
>>>
>>> -ryan
>>>
>>> On Tue, Sep 21, 2010 at 5:20 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> so in our production, we see temporary networking failures (we are not quite
>>>> 100% sure what they are) but now and then region server's zookeeper session
>>>> would get expired and in addition some ipc channels would throw 'channel
>>>> closed'.
>>>>
>>>> This causes region server to exit. Which is not a very big deal, our
>>>> monitoring system would send a text message so somebody would restart the
>>>> region server.
>>>>
>>>> however, this does happen a little more often than we probably would have
>>>> liked to do it manually.
>>>>
>>>> Why is server not recovering/reconnecting automatically? is there a facility
>>>> to enable server restarts and region server nodes to rejoin the cluster
>>>> automatically?
>>>>
>>>> Thanks.
>>>> -Dmitriy
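
For anyone reading this thread later: the "wrap the regionserver in a script" idea could look roughly like the sketch below. It is only an illustration, not anything shipped with HBase. The function name `supervise` and its parameters are made up here, and it assumes the region server is launched as a foreground process whose exit is the signal to start a fresh one. Because the master has already split the logs and reassigned the regions by then, the new process simply joins as a new region server.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, delay_seconds=10.0,
              run=subprocess.call, sleep=time.sleep):
    """Run cmd in the foreground and start it again each time it exits.

    Hypothetical helper: restarts are capped at max_restarts so that a
    server which cannot come up at all does not loop forever unnoticed.
    Returns the list of exit codes that were observed.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Blocks until the child process exits, then records its exit code.
        exit_codes.append(run(cmd))
        if attempt < max_restarts:
            # Pause so a transient network problem has time to clear
            # before the fresh process tries to register again.
            sleep(delay_seconds)
    return exit_codes
```

A call might look like `supervise(["bin/hbase", "regionserver", "start"])`, assuming that command line runs the region server in the foreground on your install. The cap and the delay are arbitrary starting points, and monitoring should still alert once the wrapper gives up.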
