I looked at how things worked in the 0.20 era. It did the parent offlining
first, as 0.90 does, and then added the daughters, A first and then B.
Below is the code. So it probably had the same issue, I'd guess.
St.Ack
// Mark old region as offline and split in META.
// NOTE: there is no need for retry logic here. HTable does it for us.
oldRegionInfo.setOffline(true);
oldRegionInfo.setSplit(true);
// Inform the HRegionServer that the parent HRegion is no-longer online.
this.server.removeFromOnlineRegions(oldRegionInfo);
Put put = new Put(oldRegionInfo.getRegionName());
put.add(CATALOG_FAMILY, REGIONINFO_QUALIFIER,
  Writables.getBytes(oldRegionInfo));
put.add(CATALOG_FAMILY, SPLITA_QUALIFIER,
  Writables.getBytes(newRegions[0].getRegionInfo()));
put.add(CATALOG_FAMILY, SPLITB_QUALIFIER,
  Writables.getBytes(newRegions[1].getRegionInfo()));
t.put(put);
// If we crash here, then the daughters will not be added and we'll have
// an offlined parent but no daughters to take up the slack. hbase-2244
// adds fixup to the metascanners.
// Add new regions to META
for (int i = 0; i < newRegions.length; i++) {
  put = new Put(newRegions[i].getRegionName());
  put.add(CATALOG_FAMILY, REGIONINFO_QUALIFIER, Writables.getBytes(
    newRegions[i].getRegionInfo()));
  t.put(put);
}
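
The crash window that comment describes can be sketched with a toy model (plain maps standing in for META and its puts; the class and key names here are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the 0.20-era sequence above: one put offlines the parent
// in META, later puts add each daughter. A crash in between leaves an
// offlined parent and no daughters -- the gap the hbase-2244 metascanner
// fixup covers. A plain map stands in for META; all names are hypothetical.
public class SplitCrashWindow {
  public static Map<String, String> runSplit(boolean crashAfterOffline) {
    Map<String, String> meta = new LinkedHashMap<>();
    // Step 1: mark the parent offline/split (the first Put above).
    meta.put("parent", "offline,split");
    if (crashAfterOffline) {
      return meta; // crash here: the daughters never reach META
    }
    // Step 2: add daughters A then B (the for loop above).
    meta.put("daughterA", "online");
    meta.put("daughterB", "online");
    return meta;
  }

  public static void main(String[] args) {
    // Crash window: an offlined parent, no daughters to take up the slack.
    System.out.println(runSplit(true)); // {parent=offline,split}
  }
}
```

Since the daughters are added only after the parent is offlined, this ordering has both the crash window and the A-before-B visibility window.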
On Fri, Aug 19, 2011 at 3:22 AM, Stack <[email protected]> wrote:
> On Fri, Aug 19, 2011 at 12:05 AM, Joseph Pallas
> <[email protected]> wrote:
>> The test program has multiple client threads, each of which is performing a
>> stream of operations (it's actually a custom workload running in the YCSB
>> framework). The program is keeping track of data that was inserted by write
>> operations, and subsequent read operations only retrieve data that was
>> previously written. The read operation involves first doing a
>> HTableInterface.exists call on a row/cf/qual that is expected to exist. It
>> is this exists call that we have seen fail. When the failure occurs, the
>> client reports an exception and stops. Then we examine the data using the
>> HBase shell, and the item we were looking for is there: the exists call
>> should have succeeded. Furthermore, the item has a timestamp showing it
>> really was inserted several minutes earlier; it was not inserted right
>> around the time of the failure (which might happen if there were a race
>> condition of some sort in our client).
>>
>
> OK. The exists call is rarely used, I'd say, which may be why you are
> seeing something we don't.
>
>
>
>> So, what is interesting is when we look at the log files for the region
>> server, and at the time this happens, the region involved is in the middle
>> of a split. Also, the key we failed on is greater than the split key. After
>> much reading of the code in SplitTransaction and HRegionServer, I came up
>> with a theory.
>>
>> When a region splits, daughter regions are created and the region is marked
>> as offline/splitting in META (by MetaEditor.offlineParentInMeta). The
>> daughter regions are brought online and added to META by
>> SplitTransaction.openDaughterRegion and HRegionServer.postOpenDeployTasks.
>> Later, the META entry for the original region is cleaned up. The two
>> daughter regions are managed in their own DaughterOpener thread. This is
>> where I am suspicious: if daughter A's thread updates META before daughter
>> B's thread does, then there's a window of time in which a client calling
>> HConnectionManager.locateRegionInMeta to look up a key in daughter B
>> will see only daughter A. The client, I believe, does not check end rows in
>> META, so it will think that daughter A is the region to handle the request.
>>
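
The start-key-only lookup that theory hinges on can be sketched with a toy META (a TreeMap standing in for the catalog table and for locateRegionInMeta's reverse lookup; the region names and keys are hypothetical):

```java
import java.util.TreeMap;

// Simplified stand-in for .META.: maps a region's START key to its name.
// Like the client lookup Joseph describes, locate() picks the region by
// start key only and never checks the located region's end key.
// Region names and keys here are hypothetical.
public class MetaLookup {
  private final TreeMap<String, String> metaByStartKey = new TreeMap<>();

  public void addRegion(String startKey, String name) {
    metaByStartKey.put(startKey, name);
  }

  // Find the region whose start key is the greatest one <= row;
  // note that no end-key check is performed.
  public String locate(String row) {
    return metaByStartKey.floorEntry(row).getValue();
  }

  public static void main(String[] args) {
    MetaLookup meta = new MetaLookup();
    // Parent ["", end) split at "m"; only daughter A ["", "m") is in META so far.
    meta.addRegion("", "daughterA");
    // A row belonging to daughter B ["m", end) is misrouted to daughter A,
    // because only start keys are compared.
    System.out.println(meta.locate("q")); // prints "daughterA"
  }
}
```

Once daughter B's row lands in META, the same lookup resolves "q" correctly; the bug is only the window in between.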
>
> ooooh.
>
>> Now, the question is: are there any circumstances under which sending that
>> request to the wrong region (daughter A instead of daughter B) might yield
>> incorrect results, instead of an exception? My gut says maybe, but my
>> experiments have not yet managed to find it.
>
>
> Well, we can't do a transaction that involves multiple rows. Currently
> (as I'm sure you know by now), the steps are:
>
> 1. close region (NSRE if anyone asks for the region after close)
> 2. offline region in edit (still NSRE'ing)
> 3. Open Daughters in parallel and then in parallel update .META.
>
> We should add the daughters, daughter B first, then daughter A, and then
> offline the parent? If we do it in this sequence, and you are looking
> for a row in daughter A, you'll still get the parent and then an NSRE
> because it's closed... so you'll go back to .META. and eventually find
> daughter A. If you are looking for a row in B and A is online first,
> you'll think A has it when it doesn't... which would be
> bad.
>
> If we offline the parent first and then add daughter B first... and we're
> looking for a row in daughter A, but it's not online yet, we'll get a
> WrongRegionException, which would be a blast from the past... something
> we used to get in the old days but, like polio, have managed to
> eradicate.
>
> How does this sound, Joe? We could rig you a SplitTransaction to do
> the above. We could hack one up first and, if it did away with your
> issue, we'd then spend a bit of time making sure it rolls back
> properly on failure.
>
> St.Ack
>