Hi Stack,

As a further data point, I always use the hbase-daemon.sh scripts to start/stop 
HBase.  I modified the start/stop-hbase.sh scripts so that they don't 
start/stop zookeeper, and I have a modified version that I call 
start/stop-zookeeper.sh.  This allows me to use HBase to manage zookeeper so I 
can have a more sane configuration system, but not necessarily stop zookeeper 
when I stop HBase, since I use zookeeper for some other stuff too.

Sometimes the region servers don't die when I want them to, so I have another 
script that calls the hbase-daemon.sh stop regionserver script in parallel on 
all of the machines.  Only rarely do I have to kill -9 one.  But, as far as I 
can tell, I have never lost data doing this.
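
A minimal sketch of what such a parallel-stop script could look like. It is an illustration, not the actual script: it assumes passwordless ssh, the same HBASE_HOME on every machine (the /opt/hbase default is hypothetical), and a host list with one hostname per line, as in conf/regionservers. The function name stop_regionservers and the SSH_CMD override are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: run "hbase-daemon.sh stop regionserver" on every host in parallel.
# Assumptions: passwordless ssh, identical HBASE_HOME on all hosts.

HBASE_HOME="${HBASE_HOME:-/opt/hbase}"   # hypothetical install path
SSH_CMD="${SSH_CMD:-ssh}"                # overridable, e.g. for a dry run

stop_regionservers() {
  local hosts_file="$1" host
  while read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines.
    case "$host" in ''|\#*) continue ;; esac
    # stdin comes from /dev/null so the remote command cannot eat the host list.
    $SSH_CMD "$host" "$HBASE_HOME/bin/hbase-daemon.sh stop regionserver" < /dev/null &
  done < "$hosts_file"
  wait   # block until every remote stop command has returned
}
```

It would be called as: stop_regionservers "$HBASE_HOME/conf/regionservers"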

Dave

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Tuesday, July 19, 2011 12:11 AM
To: [email protected]
Subject: Re: how to restart a hbase cluster

On Tue, Jul 19, 2011 at 12:02 AM, Weihua JIANG <[email protected]> wrote:
> It seems stop-hbase.sh only stops master/backup masters and zookeepers.
>

Usually it sends a signal to the master, which then sets a flag in
zookeeper.  When regionservers see this flag, they start to close down
user-space regions.  When all user-space regions have been closed,
then the server will close catalog regions.  When a regionserver is
carrying no regions, it shuts itself down.

The master waits until all regionservers are down.  It then will go down itself.

If you have set hbase to manage zookeeper, the last thing done on the
way out is to shut down the zk ensemble.

This is how it is supposed to work.
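
To check on a given node whether the regionserver really went down, something like the following could be used. It is only a sketch: it assumes hbase-daemon.sh's default pid file naming ($HBASE_PID_DIR/hbase-$HBASE_IDENT_STRING-regionserver.pid, with /tmp and $USER as defaults), and regionserver_running is a hypothetical helper, not part of HBase.

```shell
#!/usr/bin/env bash
# Sketch: report whether the local regionserver process is still alive.
# Assumes the default pid file location used by hbase-daemon.sh.

regionserver_running() {
  local pid_dir="${HBASE_PID_DIR:-/tmp}"
  local ident="${HBASE_IDENT_STRING:-$USER}"
  local pid_file="$pid_dir/hbase-$ident-regionserver.pid"
  [ -f "$pid_file" ] || return 1            # no pid file: treat as not running
  # kill -0 sends no signal; it succeeds only if the pid exists
  # and the current user is allowed to signal it.
  kill -0 "$(cat "$pid_file")" 2>/dev/null
}
```

For example: regionserver_running && echo "regionserver still up"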


> So, according to my understanding, region servers should shut
> themselves down since they can't find either master or zookeeper.
>

Hmm.  Don't they keep retrying?


> But I recently ran an experiment on our hbase cluster. After 2
> days of master/zookeeper shutdown, the region servers are still alive.

That doesn't seem correct.  Did the cluster come up cleanly?  Or did
the master go down before regionservers came up?

> I am not sure whether it is a problem in the hbase release or our own
> problem, since our version is a heavily patched one.
>
> Then, can I shut down the hbase cluster in the following way?
> 1. stop master
> 2. stop master backups
> 3. stop zookeepers
> 4. stop region servers
>
> The only difference is step #4. If I manually shut down the RSs, will it
> affect data integrity? If not, then I can safely perform the steps
> to shut down the cluster.
>

If a regionserver crashes rather than shutting down cleanly, it will
leave its WAL logs around.  The master will notice them and replay
them.  So try not to crash your regionservers. ./bin/stop-hbase.sh
should bring all the regionservers down cleanly.

If you do ./bin/hbase-daemon.sh stop regionserver, that'll send the
process a signal.  It'll run its shutdown signal handler.  I think
this will bring on a clean shutdown.  See the code to be sure.
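
A generic sketch of that kind of signal-based stop, with kill -9 kept as a last resort after a timeout. This is not the hbase-daemon.sh code; stop_gracefully and its 60-second default are hypothetical and only illustrate the pattern.

```shell
#!/usr/bin/env bash
# Sketch: polite stop first (SIGTERM), kill -9 only as a last resort.
# stop_gracefully is a hypothetical helper, not part of HBase.

stop_gracefully() {
  local pid="$1" timeout="${2:-60}" waited=0
  kill "$pid" 2>/dev/null || return 0     # SIGTERM; nothing to do if already gone
  while kill -0 "$pid" 2>/dev/null; do    # kill -0 only tests for existence
    if [ "$waited" -ge "$timeout" ]; then
      echo "pid $pid still alive after ${timeout}s, sending SIGKILL" >&2
      kill -9 "$pid" 2>/dev/null
      return 1                            # tell the caller the stop was unclean
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0                                # process exited after SIGTERM
}
```

A non-zero return means the process had to be killed hard, so in the HBase case its logs would need to be split and replayed.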

If it is a clean shutdown, data should be preserved.  Even if it's not a
clean shutdown, as long as the log splitting is allowed to complete,
there should be no data loss even if the server crashed.

St.Ack
