Hi Nikita,

Adding to what Sean said, a few other things can happen:

1. The RS will abort since it can't append to the WAL, or, if it hasn't
done any append for a long time, it will abort when it attempts to roll
the current WAL.
2. If the client is in the middle of a scan it might return some data, or
it will fail since the RS can't reach the NN or fetch blocks from remote
DNs.
3. The client might see a ZK timeout and abort even before the RS crashes.

From my experience it is best to tune your cluster for short ZK and RPC
timeouts for intra-cluster operations, so the master can start recovery of
the RS quickly, and to tune clients with their own ZK and RPC timeouts
depending on business needs.
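For example, on the client side those knobs can be set on the
Configuration before opening a connection. Here is a minimal sketch using
the standard HBase client API; the property names are the stock HBase
settings, but the values are illustrative assumptions, not
recommendations:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class TunedClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();

            // ZK session timeout for this client's ZooKeeper connection.
            conf.setInt("zookeeper.session.timeout", 30000);

            // Upper bound on a single RPC to a RegionServer.
            conf.setInt("hbase.rpc.timeout", 10000);

            // Overall deadline for one client operation across retries.
            conf.setInt("hbase.client.operation.timeout", 30000);

            // How long a scanner may sit idle before the RS expires it
            // (relevant to the scan case in point 2 above).
            conf.setInt("hbase.client.scanner.timeout.period", 10000);

            try (Connection conn = ConnectionFactory.createConnection(conf)) {
                // use conn.getTable(...) as usual
            }
        }
    }

The server-side equivalents live in hbase-site.xml on the cluster; there,
a short zookeeper.session.timeout is what lets the master notice a dead
RS quickly.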
thanks,
esteban.

--
Cloudera, Inc.

On Wed, Oct 18, 2017 at 10:58 AM, Sean Busbey <[email protected]> wrote:
> When an HBase Master believes a RegionServer is dead, the first step in
> recovering impacted regions[1] is to take over the HDFS leases of its
> WAL files. This prevents the RegionServer from continuing to accept
> edits and will cause it to abort once it sees the lease is gone.
>
> [1]: http://hbase.apache.org/book.html#_wal_splitting
>
> On Wed, Oct 18, 2017 at 9:56 AM, Nikita Marshalkin <[email protected]>
> wrote:
> > Hi,
> >
> > I've got a question about HBase consistency guarantees,
> > especially linearizability,
> > and can't find the answer in the documentation.
> >
> > Consider the following scenario:
> >
> > 1. There is a network split between a RegionServer and (ZooKeeper and
> > Master)
> > 2. The Master reassigns the region to another RegionServer
> > 3. The RegionServer hasn't reacted yet to the ZK session expiration
> > and still believes that it is in the game
> >
> > Now there are two RegionServers that think they are "the right one".
> >
> > Clients with an outdated cache are communicating with the failed one,
> > while others are writing to the new one.
> > This violates the assumption of "a single RegionServer per region".
> >
> > What am I missing?
> >
> > --
> > Yours sincerely,
> > Nikita Marshalkin.
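To make Sean's fencing point concrete: "taking over the HDFS leases" of
the WAL files is ordinary HDFS lease recovery. Below is a minimal sketch
of that mechanism using the public HDFS client API; it illustrates the
idea rather than HBase's actual WAL-splitting code, and the WAL path is
hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class WalFence {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical WAL path, for illustration only.
            Path wal = new Path(
                "/hbase/WALs/rs1.example.com,16020,1508343000000/wal.1");

            FileSystem fs = FileSystem.get(wal.toUri(), conf);
            if (fs instanceof DistributedFileSystem) {
                DistributedFileSystem dfs = (DistributedFileSystem) fs;
                // Ask the NN to revoke the current writer's lease on the
                // file. Once recovery completes, appends from the old
                // RegionServer fail, which is the fencing Sean describes.
                boolean recovered = dfs.recoverLease(wal);
                System.out.println("lease recovered: " + recovered);
            }
        }
    }

Note that lease recovery is asynchronous: recoverLease() can return false
on the first call and has to be retried until it reports true. Once it
succeeds, the partitioned RS can no longer append to that WAL even if it
still believes it owns the region, which is the mechanism that prevents
two RegionServers from both accepting writes for the same region.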
