It looks like you've figured it out.  Good stuff.
St.Ack

On Tue, Jul 19, 2011 at 1:25 AM, Weihua JIANG <[email protected]> wrote:
> Thanks a lot, Stack.
>
> Now, I have a much clearer understanding.
>
> I think I made a mistake in my previous experiment. Since I use
> CDH3 for testing, I shut down the master using the command
>         service hadoop-hbase-master stop
> It turns out this shuts down the master via hbase-daemon.sh, which just
> sends the KILL signal to the existing master process. Thus, the master
> has no chance to set the flag in zookeeper.
>
> Meanwhile, stop-hbase.sh doesn't use hbase-daemon.sh to shut down the
> master, so the master has a chance to set the flag in zookeeper.
>
> Thanks
> Weihua
>
> 2011/7/19 Stack <[email protected]>:
>> On Tue, Jul 19, 2011 at 12:02 AM, Weihua JIANG <[email protected]> wrote:
>>> It seems stop-hbase.sh only stops master/backup masters and zookeepers.
>>>
>>
>> Usually it sends a signal to the master, which then sets a flag in
>> zookeeper.  When regionservers see this flag, they start to close down
>> user-space regions.  When all user-space regions have been closed,
>> the servers close the catalog regions.  When a regionserver is
>> carrying no regions, it shuts itself down.
>>
>> The master waits until all regionservers are down.  It then goes down
>> itself.
>>
>> If you have set hbase to manage zookeeper, the last thing done on the
>> way out is to shut down the zk ensemble.
>>
>> This is how it is supposed to work.
>>
>>
>>> So, according to my understanding, the region servers should shut
>>> themselves down, since they can't find either the master or zookeeper.
>>>
>>
>> Hmm.  Don't they keep retrying?
>>
>>
>>> But I recently ran an experiment on our hbase cluster. Two days after
>>> the master/zookeeper shutdown, the region servers were still alive.
>>
>> That doesn't seem correct.  Did the cluster come up cleanly?  Or did
>> the master go down before regionservers came up?
>>
>>> I am not sure whether it is a problem in the hbase release or our own
>>> problem, since our version is a heavily patched one.
>>>
>>> Then, can I shut down the hbase cluster in the following way?
>>> 1. stop master
>>> 2. stop master backups
>>> 3. stop zookeepers
>>> 4. stop region servers
>>>
>>> The only difference is step #4. If I manually stop the RSs, will it
>>> affect data integrity? If not, then I can safely perform these steps
>>> to shut down the cluster.
>>>
>>
>> If a regionserver crashes rather than shutting down cleanly, it will
>> leave its WAL logs around.  The master will notice them and replay
>> them.  So try not to crash out your regionservers. ./bin/stop-hbase.sh
>> should put the regionservers all down cleanly.
>>
>> If you do ./bin/hbase-daemon.sh stop regionserver, that'll send the
>> process a signal.  It'll run its shutdown signal handler.  I think
>> this will bring on a clean shutdown.  See the code to be sure.
>>
>> If it is a clean shutdown, data should be preserved.  Even if it's not
>> a clean shutdown, as long as the log splitting is allowed to complete,
>> there should be no data loss even if the server crashed.
>>
>> St.Ack
>>
>

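For reference, the clean-shutdown ordering Stack describes in the quoted
text can be sketched as a toy Python model. This is only an illustration
under stated assumptions: the RegionServer class, the clean_shutdown
function, and the region names are hypothetical and are not HBase APIs.
The real coordination happens through zookeeper and the regionserver code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegionServer:
    """Hypothetical stand-in for a regionserver; not an HBase class."""
    name: str
    user_regions: List[str] = field(default_factory=list)
    catalog_regions: List[str] = field(default_factory=list)
    running: bool = True


def clean_shutdown(servers: List[RegionServer], manages_zookeeper: bool) -> List[str]:
    """Return the ordered list of events for a clean cluster shutdown."""
    # The master is signalled and sets a flag in zookeeper.
    events = ["master: set shutdown flag in zookeeper"]

    # Phase 1: every regionserver that sees the flag closes user-space regions.
    for rs in servers:
        for region in rs.user_regions:
            events.append(f"{rs.name}: close user region {region}")
        rs.user_regions.clear()

    # Phase 2: only after all user-space regions are closed do catalog regions close.
    for rs in servers:
        for region in rs.catalog_regions:
            events.append(f"{rs.name}: close catalog region {region}")
        rs.catalog_regions.clear()

    # A regionserver carrying no regions shuts itself down.
    for rs in servers:
        if not rs.user_regions and not rs.catalog_regions:
            rs.running = False
            events.append(f"{rs.name}: shut down")

    # The master waits until all regionservers are down, then goes down itself.
    if all(not rs.running for rs in servers):
        events.append("master: shut down")
        # If hbase manages zookeeper, the ensemble goes down last.
        if manages_zookeeper:
            events.append("zookeeper: shut down")
    return events


if __name__ == "__main__":
    cluster = [
        RegionServer("rs1", ["t1,a"], [".META."]),
        RegionServer("rs2", ["t1,b"]),
    ]
    for event in clean_shutdown(cluster, manages_zookeeper=True):
        print(event)
```

A master stopped by a plain kill never performs the first step in this
model, which matches the behaviour Weihua observed: the regionservers
never see a flag and so never start closing regions.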