Hi Stack,

As a further data point, I always use the hbase-daemon.sh scripts to start/stop HBase. I modified the start/stop-hbase.sh scripts so that they don't start/stop zookeeper, and I have a modified version that I call start/stop-zookeeper.sh. This allows me to use HBase to manage zookeeper so I can have a more sane configuration system, but not necessarily stop zookeeper when I stop HBase, since I use zookeeper for some other stuff too.
Sometimes the region servers don't die when I want them to, so I have another script that calls hbase-daemon.sh stop regionserver in parallel on all of the machines. Only rarely do I have to kill -9 one. But, as far as I can tell, I have never lost data doing this.

Dave

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Tuesday, July 19, 2011 12:11 AM
To: [email protected]
Subject: Re: how to restart a hbase cluster

On Tue, Jul 19, 2011 at 12:02 AM, Weihua JIANG <[email protected]> wrote:
> It seems stop-hbase.sh only stops master/backup masters and zookeepers.
>

Usually it sends a signal to the master, which then sets a flag in zookeeper. When regionservers see this flag, they start to close down user-space regions. When all user-space regions have been closed, the server will close catalog regions. When a regionserver is carrying no regions, it shuts itself down. The master waits until all regionservers are down. It then will go down itself. If you have set hbase to manage zookeeper, the last thing done on the way out is to shut down the zk ensemble. This is how it is supposed to work.

> So, according to my understanding, region servers shall shut
> themselves down since they can't find either master or zookeeper.
>

Hmm. Don't they keep retrying?

> But, I made a recent experiment on our hbase cluster. After 2
> days of master/zookeeper shutdown, the region servers are still alive.

That doesn't seem correct. Did the cluster come up cleanly? Or did the master go down before regionservers came up?

> I am not sure whether it is the problem in hbase release or our own
> problem since our version is a heavily patched one.
>
> Then, can I shut down the hbase cluster in the following way?
> 1. stop master
> 2. stop master backups
> 3. stop zookeepers
> 4. stop region servers
>
> The only difference is step #4. If I manually stop the RS, will it
> affect data integrity? If not, then I can safely perform the steps
> to shut down the cluster.
>

If a regionserver crashes rather than shutting down cleanly, it will leave its WAL logs around. The master will notice them and replay them. So try not to crash out your regionservers.

./bin/stop-hbase.sh should put the regionservers all down cleanly.

If you do ./bin/hbase-daemon.sh stop regionserver, that'll send the process a signal. It'll run its shutdown signal handler. I think this will bring on a clean shutdown. See the code to be sure.

If it is a clean shutdown, data should be preserved. Even if it's not a clean shutdown, as long as the log splitting is allowed to complete, there should be no data loss even if the server has crashed.

St.Ack
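
For illustration only, the parallel regionserver stop mentioned at the top of this thread could be sketched roughly as below. This is not the actual script from the thread. It assumes passwordless ssh to every host, HBase installed under the same HBASE_HOME on each machine (the /opt/hbase default is a placeholder), and a hosts file with one hostname per line in the style of conf/regionservers. The names stop_regionservers and REMOTE_RUNNER are hypothetical, introduced just for this sketch.

```shell
#!/bin/sh
# Hypothetical sketch: run "hbase-daemon.sh stop regionserver" on many hosts
# in parallel. Assumptions (not from the thread): passwordless ssh, the same
# HBASE_HOME on every host, one hostname per line in the hosts file.

HBASE_HOME="${HBASE_HOME:-/opt/hbase}"   # assumed install path, adjust as needed
REMOTE_RUNNER="${REMOTE_RUNNER:-ssh}"    # command used to reach a host; override for dry runs

stop_regionservers() {
  hosts_file="$1"
  pids=""
  count=0
  failed=0

  # Launch one background stop per host so a slow server does not delay the others.
  while IFS= read -r host || [ -n "$host" ]; do
    case "$host" in ''|'#'*) continue ;; esac   # skip blank lines and comments
    "$REMOTE_RUNNER" "$host" "$HBASE_HOME/bin/hbase-daemon.sh stop regionserver" </dev/null &
    pids="$pids $!"
    count=$((count + 1))
  done < "$hosts_file"

  # Collect each job's exit status and count the hosts whose stop command failed.
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done

  echo "hosts=$count failed=$failed"
  [ "$failed" -eq 0 ]
}
```

With this file sourced, something like `stop_regionservers "$HBASE_HOME/conf/regionservers"` would issue the stops. A non-zero return only says that some stop command reported failure, so any kill -9 stays a manual last resort, as described above.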
