I can’t speak to why the script is written the way it is. But I will say that
my company has been running HBase at massive scale with high reliability
standards for years, and we’ve never used any of the built-in shell scripts.
We have our own automation, and our HMaster rolling restart is more like what
you describe. So I would say the shell script here is overly conservative
and does not prioritize availability. There’s no need to worry about masters
racing for the master znode: HBase uses ZooKeeper for leader election, which
is designed for exactly this case. I’d recommend you do what you describe
instead if you value availability (who doesn’t :)?)
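
A minimal sketch of the sequence you describe, in case it helps. The host arguments, the ssh-based run_on helper, the DRY_RUN switch, and the default HBASE_HOME are all placeholders made up for illustration, not part of HBase; only hbase-daemon.sh with stop/start master comes from the stock scripts.

```shell
#!/usr/bin/env bash
# Sketch only: restart backup masters first, then the active master,
# one host at a time, so that some master is always running.

HBASE_HOME="${HBASE_HOME:-/opt/hbase}"  # hypothetical install path
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands instead of running them

# Hypothetical helper: run a command on a host over ssh, or just print it.
run_on() {
  local host="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "[$host] $*"
  else
    ssh "$host" "$*"
  fi
}

# Usage: rolling_restart_masters ACTIVE_HOST [BACKUP_HOST...]
rolling_restart_masters() {
  local active="$1"; shift
  local host
  # 1. Restart each backup master while the active one keeps serving.
  for host in "$@"; do
    run_on "$host" "$HBASE_HOME/bin/hbase-daemon.sh stop master"
    run_on "$host" "$HBASE_HOME/bin/hbase-daemon.sh start master"
  done
  # 2. Stop the active master; a backup is expected to take over via ZooKeeper.
  run_on "$active" "$HBASE_HOME/bin/hbase-daemon.sh stop master"
  # 3. Start the old active master again; it should rejoin as a backup.
  run_on "$active" "$HBASE_HOME/bin/hbase-daemon.sh start master"
}
```

With DRY_RUN=1 (the default here), `rolling_restart_masters master1 backup1 backup2` only prints the six commands it would run. A real script would also need to wait for each restarted master to register itself before moving on to the next step, which this sketch leaves out.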

On Thu, Mar 9, 2023 at 2:46 AM 杨光 <jacklove2...@gmail.com> wrote:

> Hi everyone! I just read rolling-restart.sh in $HBASE_HOME/bin and found
> that the script stops all master services (including the backup ones)
> at the same time, and then restarts them all:
>
> # The content of rolling-restart.sh
> ...
> # stop all masters before re-start to avoid races for master znode
> "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" stop master
> "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
>   --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup
>
> # make sure the master znode has been deleted before continuing
> zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
> ...
>
> # all masters are down, now restart
> "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" ${START_CMD_DIST_MODE} master
> "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
>   --hosts "${HBASE_BACKUP_MASTERS}" ${START_CMD_DIST_MODE} master-backup
>
> With this approach the HMaster service is unavailable between the stop and
> the restart. Why is it designed this way? Could it be done more gracefully,
> like this:
>
>    - Stop the backup master, and then restart it
>    - Stop the active master, so that the backup master becomes active
>    - Start the original active master again; it is now the backup
>
> I have tested it on my own cluster and it seems to work fine. Is this more
> graceful? Or am I missing something?
>
