Oops, forgot to include the list.

>> This sucks, so as an optimization you decided that if a majority of the
cluster thinks it's X, well then it must be X!  I'm not sure I explained
that well, but I'm sure I understand it now :)

All true, except when Riak knows X is out of date and Y is the correct
value: it triggers read repair with Y but still returns X. That sucks,
because a KV can go Found, Not Found, Found, when you need it to be
continuously found.

Note, I don't mind that an out-of-date version is returned by Riak; that I
can handle. But a 404 for a KV that actually exists is a problem for me.



 On 15 January 2011 11:49, Ryan Zezeski <[email protected]> wrote:

>
>
> On Fri, Jan 14, 2011 at 7:12 PM, Sean Cribbs <[email protected]> wrote:
>
>>
>>
>> Crap, the second after I hit "send" the lightbulb goes on!  Why is that?
>>
>> The quorum _was_ met (all vnodes just migrated to the one machine) but
>> since some of them were fail-overs they didn't have the value yet (or the
>> wrong value)?  In this case a read repair happened and subsequent gets
>> worked.
>>
>>
>> Your understanding is correct. However, when I say "quorum was met" I
>> usually mean that "it had R successful replies". Minor semantic quibble.
>>
>> You are correct in saying that the wiki is misleading -- read repair
>> happens when any successful reply reaches the FSM, even if "not found" was
>> returned to the client, that is, if quorum was not met. We'll get that
>> fixed.
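A toy sketch of that read-repair rule, assuming each replica reply is either None ("not found") or an integer version standing in for Riak's vector clocks. `replicas_to_repair` is a hypothetical helper, not a Riak API. The point it illustrates is that the repair target comes from the newest value any replica returned, regardless of what the client was told.

```python
from typing import Dict, List, Optional

# Hypothetical model: partition name -> version it returned, or None for "not found".
# Real Riak compares vector clocks; a plain integer version is a simplifying assumption.
Replies = Dict[str, Optional[int]]


def replicas_to_repair(replies: Replies) -> List[str]:
    """Return the partitions that should receive the newest value via read repair."""
    found = [v for v in replies.values() if v is not None]
    if not found:
        # No replica has the value, so there is nothing to repair with.
        return []
    newest = max(found)
    # Any replica that was missing the value or held an older one gets repaired,
    # whether or not the client was sent "not found".
    return sorted(p for p, v in replies.items() if v is None or v < newest)


# Two fallback vnodes answered "not found"; the primary still has version 7.
print(replicas_to_repair({"p1": None, "p2": None, "p3": 7}))  # ['p1', 'p2']
```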
>>
>> I'm still in the dark on the second question.
>>
>>
>>> 2) Why doesn't r=1 work?
>>>
>>> In the IRC session, you claimed that r=1 would not have helped this
>>> problem.  Just like the OP, this confused me.  You then went on to say it
>>> was because of some optimization and then mentioned a "basic quorum."
>>>
>>> I took a few minutes to think about this and the only conclusion I came
>>> to is that when r=1 you will treat the first response as the final response,
>>> and in this case the notfound response will always come back first?  I'm not
>>> sure if what I just said makes sense but I would have expected r=1 to work,
>>> just like the OP.  I'll admit that I still haven't read all the wiki docs
>>> yet (but I've read Read Repair 3 times now), so I'd be happy to hear RTFM.
>>>
>>
>> A number of months ago, we ran into some issues with a cluster where "not
>> found" responses were not returning in a reasonable amount of time,
>> especially when R=1. That is, the requests took MUCH longer than a
>> SUCCESSFUL read. We determined that this occurred because one of the
>> partitions was too busy to reply, causing the request timeout to expire.  So
>> we added a special case called "basic quorum" (n_val/2 + 1) that is invoked
>> only when receiving a "not found" response from a replica.  The idea is that
>> if a simple majority of the replica partitions report "not found", it's
>> probably not there.  This way, you don't sit around waiting for the last
>> lonely partition to reply when R=1 (and your successful reads are still fast
>> because you only wait for one replica).  It's a tradeoff of availability:
>> returning a potentially incorrect response vs. appearing unavailable (timing
>> out). We chose the former.
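For anyone else following along, here's a rough Python sketch of the rule described above. It's a toy model, not Riak's get FSM (which is Erlang): `get_outcome` is a hypothetical name, replies are consumed in arrival order, a value counts toward R, and None means "not found". The early give-up once R successes become unreachable is an assumption of the sketch, not something stated in this thread.

```python
from typing import Iterable, Optional


def get_outcome(replies: Iterable[Optional[str]], n_val: int, r: int) -> str:
    """Decide what a get returns, reading replica replies in arrival order.

    A reply is either a value (str) or None for "not found". This is a
    hypothetical model of the behaviour described above, not riak_kv code.
    """
    basic_quorum = n_val // 2 + 1  # simple majority: 2 when n_val=3
    found = 0
    not_found = 0
    for reply in replies:
        if reply is None:
            not_found += 1
            # Basic quorum: a majority of replicas say "not found", so stop
            # waiting for the stragglers and report not found right away.
            if not_found >= basic_quorum:
                return "notfound"
            # Assumption of this sketch: also give up once R successful
            # replies can no longer be reached.
            if n_val - not_found < r:
                return "notfound"
        else:
            found += 1
            if found >= r:
                return "ok"
    # The remaining replicas never answered before the request timed out.
    return "timeout"


# N=3, R=1: two fast "not found" replies beat the replica that has the value.
print(get_outcome([None, None, "v1"], n_val=3, r=1))  # notfound
# Same replicas, but the one holding the value answers first.
print(get_outcome(["v1", None, None], n_val=3, r=1))  # ok
```

Note how the first call reports "not found" even though one replica holds "v1"; that is the tradeoff being described.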
>>
>> Hope that helps,
>>
>
>
> Reading your explanation made me realize it's because I'm mucking up
> the semantics of "quorum."  It was previously my understanding that if R=1
> then you only need a quorum of 1 vnode, where a quorum is simply defined as
> a response.  Which would mean that the first reply (whether notfound or a
> value) would be considered the cluster value.  However, as you subtly hinted
> at above, quorum does not mean that, i.e. it's more than just a response.
> It's that R vnodes found _a_ value and agreed on its contents.  Going back
> to the case of R=1, N=3, and the value missing on 2 of its preferred
> vnodes, it means that the request will take as long as the longest vnode to
> respond, even if 2 vnodes reply immediately with no value.  This sucks, so
> as an optimization you decided that if a majority of the cluster thinks it's
> X, well then it must be X!  I'm not sure I explained that well, but I'm sure
> I understand it now :)
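A rough latency sketch of the tradeoff described above, with the same caveats: replies are (arrival_ms, had_value) pairs, `respond` and its `use_basic_quorum` switch are made up for illustration, and the timeout values are arbitrary numbers for the example, not Riak defaults.

```python
from typing import List, Tuple

# (arrival time in ms, True if that replica had the value); hypothetical model.
Reply = Tuple[int, bool]


def respond(replies: List[Reply], n_val: int, r: int, timeout_ms: int,
            use_basic_quorum: bool) -> Tuple[str, int]:
    """Return (outcome, ms until the client hears back) for a simplified get."""
    basic_quorum = n_val // 2 + 1
    found = 0
    not_found = 0
    for at_ms, has_value in sorted(replies):
        if at_ms > timeout_ms:
            break  # this reply, and every later one, misses the deadline
        if has_value:
            found += 1
            if found >= r:
                return "ok", at_ms
        else:
            not_found += 1
            if use_basic_quorum and not_found >= basic_quorum:
                return "notfound", at_ms
            if n_val - not_found < r:
                return "notfound", at_ms  # R successes are out of reach
    return "timeout", timeout_ms


# R=1, N=3: two fallback vnodes answer "not found" at once, the busy one is slow.
replies = [(5, False), (6, False), (4000, True)]
print(respond(replies, n_val=3, r=1, timeout_ms=60000, use_basic_quorum=False))
# ('ok', 4000): without basic quorum the client waits for the slowest vnode
print(respond(replies, n_val=3, r=1, timeout_ms=60000, use_basic_quorum=True))
# ('notfound', 6): basic quorum answers after the second "not found"
print(respond(replies, n_val=3, r=1, timeout_ms=1000, use_basic_quorum=False))
# ('timeout', 1000): the busy vnode misses the deadline entirely
```

The second call is fast but returns the 404-for-an-existing-key result complained about at the top of this thread; the first is correct but slow; the third is what the optimization was meant to avoid.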
>
>
> -Ryan
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>