Looking at the git blame for the script, I see that some of the parts you
reference are over 13 years old, so they may simply be due for an update.
Either way, you are not missing anything: your approach is both safe and
more graceful.
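
For what it's worth, the three steps you list could be scripted roughly as
below. This is only a dry-run sketch under assumptions, not what
rolling-restart.sh does today: the function names (run_on,
restart_master_on, rolling_restart_masters), the example hostnames and the
/opt/hbase default are made up for illustration, it assumes ssh access to
each master host, and it uses a plain "start" where the script uses
${START_CMD_DIST_MODE}. The only pieces taken from the thread are
hbase-daemon.sh and its stop/start master commands.

```shell
#!/usr/bin/env bash
# Sketch: restart HMasters one at a time so that one master is always up.
set -euo pipefail

HBASE_HOME="${HBASE_HOME:-/opt/hbase}"                 # assumed install path
HBASE_CONF_DIR="${HBASE_CONF_DIR:-$HBASE_HOME/conf}"
DRY_RUN="${DRY_RUN:-1}"                                # 1 = only print commands
SETTLE_SECONDS="${SETTLE_SECONDS:-30}"                 # assumed pause between hosts

# Run a command on a host over ssh, or just print it in dry-run mode.
run_on() {
  local host="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "[$host] $*"
  else
    ssh "$host" "$@"
  fi
}

# Stop and start the master process on a single host.
restart_master_on() {
  local host="$1"
  run_on "$host" "$HBASE_HOME/bin/hbase-daemon.sh" --config "$HBASE_CONF_DIR" stop master
  run_on "$host" "$HBASE_HOME/bin/hbase-daemon.sh" --config "$HBASE_CONF_DIR" start master
  # Crude pause so the restarted master can come back up as a backup before
  # the next host is touched. Skipped in dry-run mode.
  if [ "$DRY_RUN" != "1" ]; then
    sleep "$SETTLE_SECONDS"
  fi
}

# Usage: rolling_restart_masters ACTIVE_HOST [BACKUP_HOST...]
rolling_restart_masters() {
  local active="$1"; shift
  local backup
  # Steps 1: restart each backup master while the active one keeps serving.
  for backup in "$@"; do
    restart_master_on "$backup"
  done
  # Steps 2 and 3: stop the active master (a backup takes over through the
  # ZK election), then start it again so it rejoins as a backup.
  restart_master_on "$active"
}

# Dry-run example with placeholder hostnames.
rolling_restart_masters master1.example.com master2.example.com
```

With DRY_RUN=1 this only prints the four commands in order (backup stop,
backup start, active stop, active start). A fixed sleep is the simplest
possible wait; real automation would more likely check that the restarted
backup has actually registered before stopping the active master.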

On Thu, Mar 9, 2023 at 8:47 PM Bryan Beaudreault <bbeaudrea...@apache.org>
wrote:

> I can’t speak to why the script is the way it is. But I will say that my
> company has been running HBase at massive scale with high reliability
> standards for years. We’ve never used any of the built-in shell scripts. We
> have our own automation, and our HMaster rolling restart is more like what
> you describe. So I would say the shell script here is overly conservative
> and does not prioritize availability. There’s no need to worry about
> masters racing for the master znode, since HBase uses ZK for leader
> election, which is designed for exactly this case. I’d recommend doing what
> you describe instead if you value availability (who doesn’t? :)
>
> On Thu, Mar 9, 2023 at 2:46 AM 杨光 <jacklove2...@gmail.com> wrote:
>
>> Hi everyone! I just read rolling-restart.sh in $HBASE_HOME/bin and found
>> that the script stops all master services (including the backup ones)
>> at the same time, and then restarts them all:
>>
>> # The content of rolling-restart.sh
>> ...
>> # stop all masters before re-start to avoid races for master znode
>> "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" stop master
>> "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
>>   --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup
>>
>> # make sure the master znode has been deleted before continuing
>> zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
>> ...
>>
>> # all masters are down, now restart
>> "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" ${START_CMD_DIST_MODE} master
>> "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
>>   --hosts "${HBASE_BACKUP_MASTERS}" ${START_CMD_DIST_MODE} master-backup
>>
>> This way the HMaster service is unavailable for the duration of the
>> restart. Why is it designed this way? Could it be done more gracefully,
>> like this:
>>
>>    - Stop the backup master, then start it again
>>    - Stop the active master, so that the backup master becomes active
>>    - Start the original active master, which now becomes the backup
>>
>> I have tested it on my own cluster and it seems to work fine. Is this more
>> graceful? Or am I missing something?
>>
>
