Seems like this scenario could be slightly safer if a master tells the
region servers to exit only if it is the last master in the cluster.

Then Bill's scenario would involve an extra master starting up (but staying
quiescent if the real master stays alive), then that extra master exits with
no extra actions.

The current behavior is sub-ideal because restarting region servers incurs a
performance penalty due to bad data placement.  It takes a long time for
that penalty to heal.

But removing the behavior isn't a bad idea either.  In our systems, we are
doing process management in any case, so if an HBase master dies, we will
restart it shortly.  It would be nice if the region servers were still there
waiting for it when it comes back.
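
One client-side mitigation for Bill's scenario, as a sketch: since the configs are checked into SVN anyway, a marker file in the shared-cluster config checkout could let a wrapper refuse to run stop-hbase.sh against it. The CLUSTER_SHARED marker-file convention and the paths below are assumptions for illustration, not anything HBase ships with.

```shell
#!/bin/sh
# Guard sketch (assumed convention, not part of HBase): refuse to stop a
# cluster whose config checkout contains a CLUSTER_SHARED marker file.
conf_dir="${HBASE_CONF_DIR:-./conf}"
if [ -e "$conf_dir/CLUSTER_SHARED" ]; then
    echo "refusing to stop: $conf_dir looks like a shared-cluster config" >&2
    exit 1
fi
echo "ok: stopping local cluster"
# exec bin/stop-hbase.sh   # forward to the real stop script here
```

Checked into the same tree as the configs, this turns an accidental stop-hbase.sh against the dev cluster into a hard error instead of a cluster-wide shutdown.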

On Wed, Mar 2, 2011 at 5:27 PM, Ryan Rawson <[email protected]> wrote:

> Misfeature, basically: a master will tell the regionservers to
> 'shutdown and flush gracefully' via RPC.
>
> Since we don't ship with any cluster management tools - to make your
> life easier we have a 'master tells RS to shutdown' path.  I wouldn't
> be against removing it and relying on regular process management (e.g.
> ssh $host hbase-daemon.sh stop regionserver, which uses kill -QUIT) to
> do cluster shutdown.
>
> At SU we have abstracted HBase away from the devs; they use thrift and
> never do anything more than hbase shell.
>
> -ryan
>
> On Wed, Mar 2, 2011 at 5:23 PM, Bill Graham <[email protected]> wrote:
> > Hi,
> >
> > We had a troubling experience today that I wanted to share. Our dev
> > cluster got completely shut down by a developer by mistake, without
> > said developer even realizing it. Here's how...
> >
> > We have multiple sets of HBase configs checked into SVN that
> > developers can check out and point their HBASE_CONF_DIR to, to easily
> > switch from developing in local mode to testing against our
> > distributed dev cluster.
> >
> > In local mode someone might do something like this:
> >
> > bin/start-hbase.sh
> > bin/hbase shell
> >
> > ... do some work ...
> >
> > bin/stop-hbase.sh
> >
> > The problem arose when a developer accidentally tried to do this with
> > their HBASE_CONF_DIR pointing to our dev cluster configs. When this
> > happens, the first command will add another master to the cluster and
> > the last command will shut down the entire cluster. I assume this
> > happens via Zookeeper somehow, since we don't have ssh keys to
> > remotely start/stop as the user running the processes.
> >
> > So the question is, is this a bug or a feature? If it's a feature it
> > seems like an incredibly dangerous one. Once our live cluster is
> > running, those configs will also be needed on the client so really bad
> > things could happen by mistake.
> >
> > thanks,
> > Bill
> >
>
