Dave,

Would you be willing to post your custom scripts? Your setup sounds useful for what we are doing.

Thanks.

On Jul 19, 2011, at 10:49 AM, "Buttler, David" <[email protected]> wrote:

> Hi Stack,
>
> As a further data point, I always use the hbase-daemon.sh scripts to
> start/stop HBase. I modified the start/stop-hbase.sh scripts so that they
> don't start/stop zookeeper, and I have a modified version that I call
> start/stop-zookeeper.sh. This allows me to use HBase to manage zookeeper so
> I can have a more sane configuration system, but not necessarily stop
> zookeeper when I stop HBase, since I use zookeeper for some other stuff too.
>
> Sometimes the region servers don't die when I want them to, so I have another
> script that calls the hbase-daemon.sh stop regionserver script in parallel on
> all of the machines. Only rarely do I have to kill -9 one. But, as far as I
> can tell, I have never lost data doing this.
>
> Dave
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Tuesday, July 19, 2011 12:11 AM
> To: [email protected]
> Subject: Re: how to restart a hbase cluster
>
> On Tue, Jul 19, 2011 at 12:02 AM, Weihua JIANG <[email protected]> wrote:
>> It seems stop-hbase.sh only stops master/backup masters and zookeepers.
>>
>
> Usually it sends a signal to the master that then sets a flag in
> zookeeper. When regionservers see this flag, they start to close down
> user-space regions. When all user-space regions have been closed,
> the server will close catalog regions. When a regionserver is
> carrying no regions, it shuts itself down.
>
> The master waits until all regionservers are down. It then will go down
> itself.
>
> If you have set hbase to manage zookeeper, the last thing done on the
> way out is to shut down the zk ensemble.
>
> This is how it is supposed to work.
>
>
>> So, according to my understanding, region servers should shut themselves
>> down since they can't find either master or zookeeper.
>>
>
> Hmm. Don't they keep retrying?
>
>
>> But I ran a recent experiment on our hbase cluster. After 2
>> days of master/zookeeper shutdown, the region servers are still alive.
>
> That doesn't seem correct. Did the cluster come up cleanly? Or did
> the master go down before regionservers came up?
>
>> I am not sure whether it is a problem in the hbase release or our own
>> problem, since our version is a heavily patched one.
>>
>> Then, can I stop the hbase cluster in the following way?
>> 1. stop master
>> 2. stop master backups
>> 3. stop zookeepers
>> 4. stop region servers
>>
>> The only difference is step #4. If I manually stop the RS, will it
>> affect data integrity? If not, then I can safely perform these steps
>> to shut down the cluster.
>>
>
> If a regionserver crashes rather than shutting down cleanly, it will
> leave its WAL logs around. The master will notice them and replay
> them. So try not to crash out your regionservers. ./bin/stop-hbase.sh
> should put the regionservers all down cleanly.
>
> If you do ./bin/hbase-daemon.sh stop regionserver, that'll send the
> process a signal. It'll run its shutdown signal handler. I think
> this will bring on a clean shutdown. See the code to be sure.
>
> If it is a clean shutdown, data should be preserved. Even if it's not a
> clean shutdown, as long as the log splitting is allowed to complete,
> there should be no data loss even if the server crashed.
>
> St.Ack
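
Dave's actual scripts are not posted in this thread. As a rough illustration only, a parallel "stop regionserver" helper along the lines he describes might look like the sketch below. Everything except the hbase-daemon.sh stop regionserver command is an assumption: the function name stop_regionservers, the DRY_RUN switch, the /opt/hbase fallback path, passwordless ssh to each host, and HBase being installed at the same path on every machine.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the script from this thread.
# Assumes: one hostname per line in the hosts file, passwordless ssh,
# and the same HBASE_HOME on every machine.

stop_regionservers() {
  local hosts_file="$1"
  local hbase_home="${HBASE_HOME:-/opt/hbase}"   # fallback path is an assumption
  local host
  while read -r host; do
    # Skip blank lines and comment lines in the hosts file.
    case "$host" in ''|\#*) continue ;; esac
    if [ -n "${DRY_RUN:-}" ]; then
      # Only print what would be run.
      echo "ssh $host $hbase_home/bin/hbase-daemon.sh stop regionserver"
    else
      # -n keeps ssh from swallowing the rest of the hosts file on stdin;
      # & runs each stop in the background so all hosts are handled in parallel.
      ssh -n "$host" "$hbase_home/bin/hbase-daemon.sh stop regionserver" &
    fi
  done < "$hosts_file"
  # Wait for every background ssh to return before exiting.
  wait
}
```

It could be called as stop_regionservers "$HBASE_HOME/conf/regionservers", with DRY_RUN=1 set first to check the host list. It only asks for the normal signal-driven shutdown discussed above; it does not attempt kill -9.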
