Hi Nikita,

Adding to what Sean said, a few other things can happen:

1. The RS will abort since it can't append to the WAL, or, if it hasn't
done any append for a long time, it will abort when it attempts to roll
the current WAL.
2. If the client is in the middle of a scan it might return some data, or
it will fail since the RS can't reach the NN or fetch blocks from remote
DNs.
3. The client might see a ZK timeout and abort even before the RS crashes.

From my experience it is best to tune your cluster for short ZK and RPC
timeouts for intra-cluster operations, so the master can start recovery of
the RS quickly, and to tune clients with their own ZK and RPC timeouts
depending on business needs.
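For example, on the client side those knobs can be set on the
Configuration before opening a connection. Here is a minimal sketch using
the standard HBase client API; the property names are the stock HBase
settings, but the values are illustrative assumptions, not
recommendations:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class TunedClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();

            // ZK session timeout for this client's ZooKeeper connection.
            conf.setInt("zookeeper.session.timeout", 30000);

            // Upper bound on a single RPC to a RegionServer.
            conf.setInt("hbase.rpc.timeout", 10000);

            // Overall deadline for one client operation across retries.
            conf.setInt("hbase.client.operation.timeout", 30000);

            // How long a scanner may sit idle before the RS expires it
            // (relevant to the scan case in point 2 above).
            conf.setInt("hbase.client.scanner.timeout.period", 10000);

            try (Connection conn = ConnectionFactory.createConnection(conf)) {
                // use conn.getTable(...) as usual
            }
        }
    }

The server-side equivalents live in hbase-site.xml on the cluster; there,
a short zookeeper.session.timeout is what lets the master notice a dead
RS quickly.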
thanks,
esteban.

--
Cloudera, Inc.

On Wed, Oct 18, 2017 at 10:58 AM, Sean Busbey <[email protected]> wrote:
> When an HBase Master believes a RegionServer is dead, the first step in
> recovering impacted regions[1] is to take over the HDFS leases of its
> WAL files. This prevents the RegionServer from continuing to accept
> edits and will cause it to abort once it sees the lease is gone.
>
> [1]: http://hbase.apache.org/book.html#_wal_splitting
>
> On Wed, Oct 18, 2017 at 9:56 AM, Nikita Marshalkin <[email protected]>
> wrote:
> > Hi,
> >
> > I've got a question about HBase consistency guarantees,
> > especially linearizability,
> > and can't find the answer in the documentation.
> >
> > Consider the following scenario:
> >
> > 1. There is a network split between a RegionServer and (ZooKeeper and
> > Master)
> > 2. The Master reassigns the region to another RegionServer
> > 3. The RegionServer hasn't reacted yet to the ZK session expiration
> > and still believes that it is in the game
> >
> > Now there are two RegionServers that think they are "the right one".
> >
> > Clients with an outdated cache are communicating with the failed one,
> > while others are writing to the new one.
> > This violates the assumption of "a single RegionServer per region".
> >
> > What am I missing?
> >
> > --
> > Yours sincerely,
> > Nikita Marshalkin.
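To make Sean's fencing point concrete: "taking over the HDFS leases" of
the WAL files is ordinary HDFS lease recovery. Below is a minimal sketch
of that mechanism using the public HDFS client API; it illustrates the
idea rather than HBase's actual WAL-splitting code, and the WAL path is
hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class WalFence {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical WAL path, for illustration only.
            Path wal = new Path(
                "/hbase/WALs/rs1.example.com,16020,1508343000000/wal.1");

            FileSystem fs = FileSystem.get(wal.toUri(), conf);
            if (fs instanceof DistributedFileSystem) {
                DistributedFileSystem dfs = (DistributedFileSystem) fs;
                // Ask the NN to revoke the current writer's lease on the
                // file. Once recovery completes, appends from the old
                // RegionServer fail, which is the fencing Sean describes.
                boolean recovered = dfs.recoverLease(wal);
                System.out.println("lease recovered: " + recovered);
            }
        }
    }

Note that lease recovery is asynchronous: recoverLease() can return false
on the first call and has to be retried until it reports true. Once it
succeeds, the partitioned RS can no longer append to that WAL even if it
still believes it owns the region, which is the mechanism that prevents
two RegionServers from both accepting writes for the same region.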
