I’m not exactly at my wit’s end, but I could use some insight from someone 
who is intimately familiar with region splitting.  Any guidance (or clearing 
up of my misunderstandings) would be great.

I’ll start with the main point: I think there may be a bug in HBase 0.90.3 that 
involves reading from a region while it is being split.  Below, I’ll explain 
where I think the bug might be and why I am not certain.

Here is the situation.  We have seen our test program turn up a mysterious 
failure on a small number of occasions.  The first couple of times we thought 
it must be a bug in our client code, put it into the “fix this later” category, 
and carried on.  Then we stopped seeing it for a while and figured it must have 
been fixed by some other change.  But when it next appeared we looked at it 
more closely, and we could not explain what we were seeing.

The test program has multiple client threads, each of which is performing a 
stream of operations (it's actually a custom workload running in the YCSB 
framework).  The program is keeping track of data that was inserted by write 
operations, and subsequent read operations only retrieve data that was 
previously written.  The read operation involves first doing a 
HTableInterface.exists call on a row/cf/qual that is expected to exist.  It is 
this exists call that we have seen fail.  When the failure occurs, the client 
reports an exception and stops.  Then we examine the data using the HBase 
shell, and the item we were looking for is there: the exists call should have 
succeeded.  Furthermore, the item has a timestamp that shows it really was 
inserted several minutes previously—it was not inserted right around the time 
of the failure (which might happen if there were a race condition of some sort 
in our client).

What is interesting is that when we look at the region server’s log files, 
at the time the failure happens the region involved is in the middle of a 
split.  Also, the key we failed on is greater than the split key.  After much 
reading of the code in SplitTransaction and HRegionServer, I came up with a 
theory.

When a region splits, daughter regions are created and the region is marked as 
offline/splitting in META (by MetaEditor.offlineParentInMeta).  The daughter 
regions are brought online and added to META by 
SplitTransaction.openDaughterRegion and HRegionServer.postOpenDeployTasks.  
Later, the META entry for the original region is cleaned up.  The two daughter 
regions are each opened in their own DaughterOpener thread.  This is where I 
am suspicious: if daughter A’s thread updates META before daughter B’s thread 
does, then there is a window of time during which a client calling 
HConnectionManager.locateRegionInMeta to look up a key that belongs to 
daughter B will see only daughter A.  The client, I believe, does not check 
end rows in META, so it will conclude that daughter A is the region to handle 
the request.
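To make the window concrete, here is a small, self-contained sketch of that lookup behavior.  It is not the actual HConnectionManager code; it just models META as a map from region start key to location and performs the "closest row at or before" lookup without ever consulting the end key, which is the behavior I am describing above.  All names here are illustrative.

```java
import java.util.TreeMap;

// Hypothetical, simplified model of the client-side region lookup: META maps
// each region's start key to its location, and the client takes the entry
// whose start key is the greatest one <= the requested row (a floor lookup),
// without checking the located region's end key.
public class MetaLookupSketch {
    static TreeMap<String, String> meta = new TreeMap<>();

    static String locate(String row) {
        // Mirrors the "closest row before" scan against .META.
        return meta.floorEntry(row).getValue();
    }

    public static void main(String[] args) {
        // Parent ["a","z") split at "m"; only daughter A has reached META so far.
        meta.put("a", "daughterA");   // daughter A covers ["a","m")
        // daughter B ["m","z") not yet inserted by its DaughterOpener thread

        // A row that belongs to daughter B still resolves to daughter A,
        // because nothing checks daughter A's end key "m".
        System.out.println(locate("q"));  // prints daughterA
    }
}
```

Once daughter B’s entry (start key "m") lands in META, the same floor lookup resolves "q" correctly, which is why the window only exists between the two META updates.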

Now, the question is: are there any circumstances under which sending that 
request to the wrong region (daughter A instead of daughter B) might yield 
incorrect results instead of an exception?  My gut says maybe, but my 
experiments have not yet found such a case.  I know that any path that goes 
through HRegion.checkRow should throw an exception.  But around the point 
where RegionScanner constructs scanners over all the stores, I get lost and 
am uncertain whether some case can slip through the cracks without calling 
checkRow.  Maybe something involving MemStoreScanner?
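For reference, here is a sketch of the range check I have in mind.  This is not the actual HRegion code, just an illustrative model: a region with key range [startKey, endKey) rejects rows outside that range, which is why a request misrouted to daughter A for a key at or above the split key should surface as an exception rather than a silent "not found".

```java
// Hypothetical model of a per-region row check. A region owns the half-open
// key range [startKey, endKey); an empty endKey means "to the end of the
// table". Any row outside the range is rejected loudly.
public class CheckRowSketch {
    static class WrongRegionException extends RuntimeException {
        WrongRegionException(String msg) { super(msg); }
    }

    static void checkRow(String row, String startKey, String endKey) {
        boolean afterStart = row.compareTo(startKey) >= 0;
        boolean beforeEnd = endKey.isEmpty() || row.compareTo(endKey) < 0;
        if (!(afterStart && beforeEnd)) {
            throw new WrongRegionException("Requested row " + row
                + " out of range [" + startKey + "," + endKey + ")");
        }
    }

    public static void main(String[] args) {
        checkRow("c", "a", "m");          // in daughter A's range: no exception

        try {
            // Key above the split point, misrouted to daughter A ["a","m").
            checkRow("q", "a", "m");
        } catch (WrongRegionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If any read path reaches the stores without passing through a check like this, a misrouted exists call could plausibly return false instead of throwing, which would match the failure we saw.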

So, in summary: I think it is definitely undesirable to have daughter A appear 
in META before daughter B, but I haven't figured out how that might lead to the 
error we encountered.  I would be a lot more comfortable filing a JIRA if I 
could find that faulty case, or if someone could explain why I'm barking up the 
wrong tree.

Can anyone help?

Thanks.
joe
