I looked at how things worked in the 0.20 era. It did the parent offlining
first, as 0.90 does, and then added the daughters, A first and then B.
Below is the code. So it probably had the same issue, I'd guess.
St.Ack
// Mark old region as offline and split in META.
// NOTE: there is no need for retry logic here. HTable does it for us.
oldRegionInfo.setOffline(true);
oldRegionInfo.setSplit(true);
// Inform the HRegionServer that the parent HRegion is no-longer online.
this.server.removeFromOnlineRegions(oldRegionInfo);
Put put = new Put(oldRegionInfo.getRegionName());
put.add(CATALOG_FAMILY, REGIONINFO_QUALIFIER,
  Writables.getBytes(oldRegionInfo));
put.add(CATALOG_FAMILY, SPLITA_QUALIFIER,
  Writables.getBytes(newRegions[0].getRegionInfo()));
put.add(CATALOG_FAMILY, SPLITB_QUALIFIER,
  Writables.getBytes(newRegions[1].getRegionInfo()));
t.put(put);
// If we crash here, then the daughters will not be added and we'll have
// an offlined parent but no daughters to take up the slack. hbase-2244
// adds fixup to the metascanners.
// Add new regions to META
for (int i = 0; i < newRegions.length; i++) {
  put = new Put(newRegions[i].getRegionName());
  put.add(CATALOG_FAMILY, REGIONINFO_QUALIFIER, Writables.getBytes(
    newRegions[i].getRegionInfo()));
  t.put(put);
}
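
The crash window that comment describes can be sketched with a toy model (plain maps standing in for META and its puts; the class and key names here are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the 0.20-era sequence above: one put offlines the parent
// in META, later puts add each daughter. A crash in between leaves an
// offlined parent and no daughters -- the gap the hbase-2244 metascanner
// fixup covers. A plain map stands in for META; all names are hypothetical.
public class SplitCrashWindow {
  public static Map<String, String> runSplit(boolean crashAfterOffline) {
    Map<String, String> meta = new LinkedHashMap<>();
    // Step 1: mark the parent offline/split (the first Put above).
    meta.put("parent", "offline,split");
    if (crashAfterOffline) {
      return meta; // crash here: the daughters never reach META
    }
    // Step 2: add daughters A then B (the for loop above).
    meta.put("daughterA", "online");
    meta.put("daughterB", "online");
    return meta;
  }

  public static void main(String[] args) {
    // Crash window: an offlined parent, no daughters to take up the slack.
    System.out.println(runSplit(true)); // {parent=offline,split}
  }
}
```

Since the daughters are added only after the parent is offlined, this ordering has both the crash window and the A-before-B visibility window.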
On Fri, Aug 19, 2011 at 3:22 AM, Stack <[email protected]> wrote:
> On Fri, Aug 19, 2011 at 12:05 AM, Joseph Pallas
> <[email protected]> wrote:
>> The test program has multiple client threads, each of which is performing a
>> stream of operations (it's actually a custom workload running in the YCSB
>> framework). The program is keeping track of data that was inserted by write
>> operations, and subsequent read operations only retrieve data that was
>> previously written. The read operation involves first doing a
>> HTableInterface.exists call on a row/cf/qual that is expected to exist. It
>> is this exists call that we have seen fail. When the failure occurs, the
>> client reports an exception and stops. Then we examine the data using the
>> HBase shell, and the item we were looking for is there: the exists call
>> should have succeeded. Furthermore, the item has a timestamp showing it
>> really was inserted several minutes earlier; it was not inserted right
>> around the time of the failure (which might happen if there were a race
>> condition of some sort in our client).
>>
>
> OK. The exists call is rarely used, I'd say, which may be why you are
> seeing something we don't.
>
>
>
>> So, what is interesting is when we look at the log files for the region
>> server, and at the time this happens, the region involved is in the middle
>> of a split. Also, the key we failed on is greater than the split key. After
>> much reading of the code in SplitTransaction and HRegionServer, I came up
>> with a theory.
>>
>> When a region splits, daughter regions are created and the region is marked
>> as offline/splitting in META (by MetaEditor.offlineParentInMeta). The
>> daughter regions are brought online and added to META by
>> SplitTransaction.openDaughterRegion and HRegionServer.postOpenDeployTasks.
>> Later, the META entry for the original region is cleaned up. The two
>> daughter regions are managed in their own DaughterOpener thread. This is
>> where I am suspicious: if daughter A's thread updates META before daughter
>> B's thread does, then there's a window of time in which a client calling
>> HConnectionManager.locateRegionInMeta to look up a key in daughter B
>> will see only daughter A. The client, I believe, does not check end rows in
>> META, so it will think that daughter A is the region to handle the request.
>>
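
The start-key-only lookup that theory hinges on can be sketched with a toy META (a TreeMap standing in for the catalog table and for locateRegionInMeta's reverse lookup; the region names and keys are hypothetical):

```java
import java.util.TreeMap;

// Simplified stand-in for .META.: maps a region's START key to its name.
// Like the client lookup Joseph describes, locate() picks the region by
// start key only and never checks the located region's end key.
// Region names and keys here are hypothetical.
public class MetaLookup {
  private final TreeMap<String, String> metaByStartKey = new TreeMap<>();

  public void addRegion(String startKey, String name) {
    metaByStartKey.put(startKey, name);
  }

  // Find the region whose start key is the greatest one <= row;
  // note that no end-key check is performed.
  public String locate(String row) {
    return metaByStartKey.floorEntry(row).getValue();
  }

  public static void main(String[] args) {
    MetaLookup meta = new MetaLookup();
    // Parent ["", end) split at "m"; only daughter A ["", "m") is in META so far.
    meta.addRegion("", "daughterA");
    // A row belonging to daughter B ["m", end) is misrouted to daughter A,
    // because only start keys are compared.
    System.out.println(meta.locate("q")); // prints "daughterA"
  }
}
```

Once daughter B's row lands in META, the same lookup resolves "q" correctly; the bug is only the window in between.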
>
> ooooh.
>
>> Now, the question is: are there any circumstances under which sending that
>> request to the wrong region (daughter A instead of daughter B) might yield
>> incorrect results, instead of an exception? My gut says maybe, but my
>> experiments have not yet managed to find it.
>
>
> Well, we can't do a transaction that involves multiple rows. Currently
> (as I'm sure you know by now), the steps are:
>
> 1. close region (NSRE if anyone asks for the region after close)
> 2. offline region in edit (still NSRE'ing)
> 3. Open Daughters in parallel and then in parallel update .META.
>
> We should add the daughters, daughter B first, then daughter A, and then
> offline the parent? If we do it in this sequence, and you are looking
> for a row in daughter A, you'll still get the parent and then an NSRE
> because it's closed... so you'll go back to .META. and eventually find
> daughter A. If you are looking for a row in B and A is online first,
> you'll think A has it when it doesn't... which would be
> bad.
>
> If we offline the parent first and then add daughter B first... and we're
> looking for a row in daughter A, but it's not online yet, we'll get a
> WrongRegionException, which would be a blast from the past... something
> we used to get in the old days but, like polio, have managed to
> eradicate.
>
> How does this sound, Joe? We could rig you a SplitTransaction to do
> the above. We could hack one up first and, if it did away with your
> issue, we'd then spend a bit of time making sure it rolls back
> properly on failure.
>
> St.Ack
>