It looks like you've figured it out. Good stuff.
St.Ack
On Tue, Jul 19, 2011 at 1:25 AM, Weihua JIANG <[email protected]> wrote:
> Thanks a lot, Stack.
>
> Now, I have a much clearer understanding.
>
> I think I made a mistake in my previous experiment. Since I use
> CDH3 for testing, I shut down the master using the command
> service hadoop-hbase-master stop
> It turns out that this shuts down the master via hbase-daemon.sh, which
> just sends the KILL signal to the existing master process. Thus, this
> master shutdown has no chance to set the flag in zookeeper.
>
> Meanwhile, stop-hbase.sh doesn't use hbase-daemon.sh to shut down the
> master, so it has the chance to set the flag in zookeeper.
>
> Thanks
> Weihua
>
> 2011/7/19 Stack <[email protected]>:
>> On Tue, Jul 19, 2011 at 12:02 AM, Weihua JIANG <[email protected]>
>> wrote:
>>> It seems stop-hbase.sh only stops the master/backup masters and zookeepers.
>>>
>>
>> Usually it sends a signal to the master, which then sets a flag in
>> zookeeper. When regionservers see this flag, they start to close down
>> user-space regions. When all user-space regions have been closed,
>> the server will close catalog regions. When a regionserver is
>> carrying no regions, it shuts itself down.
>>
>> The master waits until all regionservers are down. It then goes down
>> itself.
>>
>> If you have set hbase to manage zookeeper, the last thing done on the
>> way out is shutting down the zk ensemble.
>>
>> This is how it is supposed to work.
>>
>>
>>> So, according to my understanding, the region servers should shut
>>> themselves down since they can't find either the master or zookeeper.
>>>
>>
>> Hmm. Don't they keep retrying?
>>
>>
>>> But I recently ran an experiment on our hbase cluster. Two days
>>> after shutting down the master/zookeeper, the region servers were still alive.
>>
>> That doesn't seem correct. Did the cluster come up cleanly? Or did
>> the master go down before the regionservers came up?
>>
>>> I am not sure whether it is a problem in the hbase release or our own
>>> problem, since our version is a heavily patched one.
>>>
>>> Then, can I shut down the hbase cluster in the following way?
>>> 1. stop master
>>> 2. stop master backups
>>> 3. stop zookeepers
>>> 4. stop region servers
>>>
>>> The only difference is step #4. If I manually stop the RSs, will it
>>> affect data integrity? If not, then I can safely perform these steps
>>> to shut down the cluster.
>>>
>>
>> If a regionserver crashes rather than shutting down cleanly, it will
>> leave its WAL logs around. The master will notice them and replay
>> them. So try not to crash out your regionservers. ./bin/stop-hbase.sh
>> should bring all the regionservers down cleanly.
>>
>> If you do ./bin/hbase-daemon.sh stop regionserver, that'll send the
>> process a signal. It'll run its shutdown signal handler. I think
>> this will bring on a clean shutdown. See the code to be sure.
>>
>> If it's a clean shutdown, data should be preserved. Even if it's not a
>> clean shutdown, as long as the log splitting is allowed to complete,
>> there should be no data loss even if the server crashed.
>>
>> St.Ack
>>
>
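The orderly shutdown Stack describes is really a strict ordering: master sets a flag in zookeeper, regionservers close user-space regions, then catalog regions, each regionserver with no regions exits, the master exits once all regionservers are down, and (only if HBase manages it) the zk ensemble goes last. The Python below is a toy model of that ordering under stated assumptions: `RegionServer` and `stop_cluster` are hypothetical names, zookeeper is stood in for by a plain dict, and everything runs sequentially in one process, so it says nothing about how HBase actually implements this.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical toy model; these are NOT HBase classes or APIs.
@dataclass
class RegionServer:
    name: str
    user_regions: list[str]
    catalog_regions: list[str]
    running: bool = True

    def close(self, regions: list[str]) -> list[str]:
        # Record one close event per region, then drop them from this server.
        events = [f"{self.name}: close {r}" for r in regions]
        regions.clear()
        return events


def stop_cluster(servers: list[RegionServer], zk: dict, hbase_manages_zk: bool) -> list[str]:
    log: list[str] = []
    # 1. The master sets a shutdown flag in "zookeeper" (a dict here).
    zk["shutdown"] = True
    log.append("master: set shutdown flag")
    # 2. Regionservers that see the flag close user-space regions first.
    for rs in servers:
        if zk.get("shutdown"):
            log += rs.close(rs.user_regions)
    # 3. Only after all user-space regions are closed do catalog regions close.
    for rs in servers:
        log += rs.close(rs.catalog_regions)
    # 4. A regionserver carrying no regions shuts itself down.
    for rs in servers:
        if not rs.user_regions and not rs.catalog_regions:
            rs.running = False
            log.append(f"{rs.name}: exit")
    # 5. The master waits until every regionserver is down, then exits.
    if any(rs.running for rs in servers):
        raise RuntimeError("regionservers still running; master keeps waiting")
    log.append("master: exit")
    # 6. If HBase manages zookeeper, the ensemble is shut down last.
    if hbase_manages_zk:
        log.append("zk ensemble: shut down")
    return log


# Usage: two regionservers, only rs1 carries a catalog region.
cluster = [RegionServer("rs1", ["user-a"], ["catalog"]), RegionServer("rs2", ["user-b"], [])]
for event in stop_cluster(cluster, {}, hbase_manages_zk=True):
    print(event)
```

The point of the model is the dependency chain: a regionserver that is killed outright skips steps 2 to 4, which is why it leaves WALs behind for the master to split and replay.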

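Weihua's diagnosis hinges on which signal the master process receives. In general, a process can install a handler for SIGTERM and run its own cleanup (such as setting a shutdown flag), but SIGKILL cannot be caught, so no cleanup code runs at all. The POSIX-only Python sketch below is not HBase code, and `deliver_sigterm_to_self` and `handler_allowed` are hypothetical helpers; it only illustrates that difference. It does not establish which signal hbase-daemon.sh or the CDH3 init script actually sends; the scripts themselves are the place to check.

```python
import os
import signal

# POSIX-only sketch, not HBase code. It contrasts a catchable signal (SIGTERM),
# whose handler can run cleanup, with SIGKILL, which no handler can intercept.

shutdown_requested = False


def on_sigterm(signum, frame):
    # Stand-in for a shutdown handler: just record that cleanup was requested.
    global shutdown_requested
    shutdown_requested = True


def deliver_sigterm_to_self() -> bool:
    """Install the handler, send SIGTERM to this process, restore, report."""
    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        os.kill(os.getpid(), signal.SIGTERM)
    finally:
        signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)
    return shutdown_requested


def handler_allowed(sig: int) -> bool:
    """Return True if a Python handler can be installed for `sig`."""
    try:
        previous = signal.signal(sig, on_sigterm)
    except (OSError, ValueError):
        return False  # e.g. SIGKILL: the OS refuses to let it be caught
    signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
    return True


print(deliver_sigterm_to_self())        # True: the handler ran
print(handler_allowed(signal.SIGTERM))  # True
print(handler_allowed(signal.SIGKILL))  # False
```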