> 
> Crap, the second after I hit "send" the lightbulb goes on!  Why is that?
> 
> The quorum _was_ met (all vnodes just migrated to the one machine) but since 
> some of them were fail-overs they didn't have the value yet (or the wrong 
> value)?  In this case a read repair happened and subsequent gets worked.
> 

Your understanding is correct. One minor semantic quibble: when I say "quorum 
was met", I usually mean that the request received R successful replies.

You are correct in saying that the wiki is misleading: read repair happens 
whenever any successful reply reaches the FSM, even if quorum was not met and 
"not found" was returned to the client. We'll get that fixed.
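
To make that concrete, here is a minimal sketch in Python. It is not Riak's 
actual code: the function name and the reply representation are made up for 
illustration, and it ignores replicas holding a stale value, which a real 
read repair would also fix. It only shows that the repair decision depends on 
whether any replica returned a value, not on what the client was told.

```python
# Hypothetical marker for a "not found" reply from a replica.
NOT_FOUND = object()


def replicas_needing_repair(replies):
    """Return the sorted names of replicas that should receive the value.

    `replies` maps a replica name to either a value or NOT_FOUND.
    Repair is possible as soon as at least one replica returned a value,
    regardless of whether the client was answered with "not found".
    """
    have_value = any(v is not NOT_FOUND for v in replies.values())
    if not have_value:
        return []  # nothing to repair from
    return sorted(name for name, v in replies.items() if v is NOT_FOUND)
```

With two fail-over replicas answering "not found" and one primary answering 
with the value, both fail-overs are selected for repair, which is why the 
subsequent gets worked.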

> I'm still dark on the second question.
> 
> 
> 2) Why doesn't r=1 work?
> 
> In the IRC session, you claimed that r=1 would not have helped this problem.  
> Just like the OP, this confused me.  You then went on to say it was because 
> of some optimization and then mentioned a "basic quorum."
> 
> I took a few minutes to think about this and the only conclusion I came to is 
> that when r=1 you will treat the first response as the final response, and in 
> this case the notfound response will always come back first?  I'm not sure if 
> what I just said makes sense but I would have expected r=1 to work, just like 
> the OP.  I'll admit that I still haven't read all the wiki docs yet (but I've 
> read Read Repair 3 times now), so I'd be happy to hear RTFM.

A number of months ago, we ran into some issues with a cluster where "not 
found" responses were not returning in a reasonable amount of time, especially 
when R=1. That is, the requests took MUCH longer than a SUCCESSFUL read. We 
determined that this occurred because one of the partitions was too busy to 
reply, causing the request timeout to expire.  So we added a special case 
called "basic quorum" (n_val/2 + 1) that is invoked only when receiving a "not 
found" response from a replica.  The idea is that if a simple majority of the 
replica partitions report "not found", it's probably not there.  This way, you 
don't sit around waiting for the last lonely partition to reply when R=1 (and 
your successful reads are still fast because you only wait for one replica).  
It's a tradeoff between correctness and availability: returning a potentially 
incorrect response vs. appearing unavailable (timing out). We chose the former.
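
The rule can be sketched roughly as follows, in Python rather than Riak's own 
code. The function names, the reply representation (None for "not found") and 
the "undecided" result are assumptions made for illustration, and n_val/2 is 
assumed to be integer division.

```python
def basic_quorum(n_val):
    # Simple majority of the replicas (integer division assumed).
    return n_val // 2 + 1


def coordinate_read(replies, n_val, r):
    """Tally replies in arrival order; None stands for "not found".

    Returns ("ok", value) once R successful replies have arrived,
    ("notfound", None) once a majority of replicas said "not found",
    or ("undecided", None) if the replies seen so far settle nothing
    (a real coordinator would keep waiting until the request timeout).
    """
    successes = 0
    notfounds = 0
    for reply in replies:
        if reply is None:
            notfounds += 1
            if notfounds >= basic_quorum(n_val):
                return ("notfound", None)
        else:
            successes += 1
            if successes >= r:
                return ("ok", reply)
    return ("undecided", None)
```

With n_val=3 and r=1, two "not found" replies arriving first give 
coordinate_read([None, None, "v"], 3, 1) == ("notfound", None), which is the 
case discussed above. If the value arrives before the second "not found", the 
read succeeds: coordinate_read([None, "v"], 3, 1) == ("ok", "v").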

Hope that helps,

Sean Cribbs <[email protected]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
