oops, forgot to include the list

>> This sucks, so as an optimization you decided that if a majority of the
>> cluster thinks it's X, well then it must be X! I'm not sure I explained
>> that well, but I'm sure I understand it now :)
All true, except when Riak knows X is out of date, and Y is the correct value, and forces read repair with Y, but still returns X - which sucks, because a KV can be Found, Not Found, and then Found again, when you need it to be continuously found. Note, I don't mind that an out-of-date version is returned by Riak - that I can handle. But a 404 for a KV that actually exists is a problem for me.

On 15 January 2011 11:49, Ryan Zezeski <[email protected]> wrote:

> On Fri, Jan 14, 2011 at 7:12 PM, Sean Cribbs <[email protected]> wrote:
>
>> Crap, the second after I hit "send" the lightbulb goes on! Why is that?
>>
>> The quorum _was_ met (all vnodes just migrated to the one machine) but
>> since some of them were fail-overs they didn't have the value yet (or the
>> wrong value)? In this case a read repair happened and subsequent gets
>> worked.
>>
>> Your understanding is correct. However, when I say "quorum was met" I
>> usually mean that "it had R successful replies". Minor semantic quibble.
>>
>> You are correct in saying that the wiki is misleading -- read repair
>> happens when any successful reply reaches the FSM, even if "not found"
>> was returned to the client, that is, if quorum was not met. We'll get
>> that fixed.
>>
>> I'm still dark on the second question.
>>
>>> 2) Why doesn't r=1 work?
>>>
>>> In the IRC session, you claimed that r=1 would not have helped this
>>> problem. Just like the OP, this confused me. You then went on to say it
>>> was because of some optimization and then mentioned a "basic quorum."
>>>
>>> I took a few minutes to think about this and the only conclusion I came
>>> to is that when r=1 you will treat the first response as the final
>>> response, and in this case the notfound response will always come back
>>> first? I'm not sure if what I just said makes sense but I would have
>>> expected r=1 to work, just like the OP. I'll admit that I still haven't
>>> read all the wiki docs yet (but I've read Read Repair 3 times now), so
>>> I'd be happy to hear RTFM.
>>
>> A number of months ago, we ran into some issues with a cluster where
>> "not found" responses were not returning in a reasonable amount of time,
>> especially when R=1. That is, the requests took MUCH longer than a
>> SUCCESSFUL read. We determined that this occurred because one of the
>> partitions was too busy to reply, causing the request timeout to expire.
>> So we added a special case called "basic quorum" (n_val/2 + 1) that is
>> invoked only when receiving a "not found" response from a replica. The
>> idea is that if a simple majority of the replica partitions report "not
>> found", it's probably not there. This way, you don't sit around waiting
>> for the last lonely partition to reply when R=1 (and your successful
>> reads are still fast because you only wait for one replica). It's a
>> tradeoff of availability: returning a potentially incorrect response vs.
>> appearing unavailable (timing out). We chose the former.
>>
>> Hope that helps,
>
> Reading your explanation made me realize it's because I'm mucking up the
> semantics of "quorum." It was previously my understanding that if R=1
> then you only need a quorum of 1 vnode, where a quorum is simply defined
> as a response. Which would mean that the first reply (whether notfound or
> a value) would be considered the cluster value. However, as you subtly
> hinted at above, quorum does not mean that, i.e. it's more than just a
> response. It's that R vnodes found _a_ value and agreed on its contents.
> Going back to the case of R=1, N=3, and the value missing on 2 of its
> preferred vnodes: it means that the request will take as long as the
> slowest vnode to respond, even if 2 vnodes reply immediately with no
> value. This sucks, so as an optimization you decided that if a majority
> of the cluster thinks it's X, well then it must be X! I'm not sure I
> explained that well, but I'm sure I understand it now :)
>
> -Ryan
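
To make the behaviour above concrete, here is a minimal sketch of the read-side decision rule Sean describes: wait for R successful replies, but short-circuit to notfound once a basic quorum (n_val/2 + 1) of replicas have reported notfound. This is only an illustration in Python, not Riak's actual get FSM; the function name and reply shapes are assumptions made for the example.

# Illustrative sketch only -- not Riak's get FSM. It shows the rule from
# the thread: return a value once R vnodes have answered successfully, or
# return notfound early once a simple majority (n_val // 2 + 1) of vnodes
# have said notfound.

def coordinate_get(replies, n_val=3, r=1):
    """replies: vnode responses in arrival order, each ('ok', value) or
    ('notfound', None). Returns ('ok', value) or ('notfound', None)."""
    basic_quorum = n_val // 2 + 1          # simple majority of replicas
    ok_count = notfound_count = 0
    value = None

    for kind, v in replies:
        if kind == 'ok':
            ok_count += 1
            value = v
            if ok_count >= r:              # R successful replies: done
                return ('ok', value)
        else:
            notfound_count += 1
            if notfound_count >= basic_quorum:
                return ('notfound', None)  # majority say notfound: stop waiting

    return ('notfound', None)              # ran out of replies


if __name__ == '__main__':
    # R=1, N=3, value present on only one of the three preferred vnodes
    # (e.g. right after fail-over). If the two empty vnodes answer first,
    # basic quorum (2) is reached and the client sees notfound, even though
    # the third vnode has the value and read repair fixes later gets.
    print(coordinate_get([('notfound', None), ('notfound', None), ('ok', 'X')]))
    # -> ('notfound', None)

    # If the vnode holding the value answers first, R=1 is met immediately.
    print(coordinate_get([('ok', 'X'), ('notfound', None), ('notfound', None)]))
    # -> ('ok', 'X')

This is the trade-off Sean mentions: the early notfound keeps R=1 reads from hanging on one slow partition, at the cost of occasionally reporting notfound for a key that one replica still holds.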
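
For the "Found, Not Found, Found" symptom at the top of the thread, one client-side workaround is to retry a notfound a couple of times before trusting it, since the first get triggers read repair and, as noted above, subsequent gets then work. A sketch in Python - the client object, its get() method, and NotFoundError are placeholders for illustration, not any particular Riak client's API:

import time

class NotFoundError(Exception):
    """Placeholder for however 'not found' surfaces in your client."""

def get_with_retry(client, bucket, key, retries=3, delay=0.1):
    # Retry transient notfounds: during fail-over a key that really exists
    # can come back 404 once, then reappear after read repair runs.
    for attempt in range(retries):
        try:
            return client.get(bucket, key)
        except NotFoundError:
            if attempt == retries - 1:
                raise                 # still missing after retries: give up
            time.sleep(delay)         # give read repair a moment, then retry

This doesn't remove the window described in the thread; it only retries through it.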
