On Mon, Jun 4, 2012 at 2:34 PM, aaron morton <aa...@thelastpickle.com> wrote:

> IIRC index slices work a little differently with consistency, they need to
> have CL level nodes available for all token ranges. If you drop it to CL
> ONE the read is local only for a particular token range.
>

Yes, this is what we observed. When I reasoned my way through what I knew
about how secondary indexes work, I came to the same conclusion about all
token ranges having to be available. My surprise at the behavior was
because I *hadn't* reasoned my way through it until we had the issue.
Somehow I doubt I'm the only user of secondary indexes that was unaware of
this ramification of CL choice. It might be a good idea for the
documentation to reflect the tradeoffs more clearly.

Thanks for your help!

Jim

>
> The problem when doing index reads is the nodes that contain the results
> can no longer be selected by the partitioner.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/06/2012, at 5:15 AM, Jim Ancona wrote:
>
> Hi,
>
> We have an application with two code paths, one of which uses a secondary
> index query and the other, which doesn't. While testing node down scenarios
> in our cluster we got a result which surprised (and concerned) me, and I
> wanted to find out if the behavior we observed is expected.
>
> Background:
>
>    - 6 nodes in the cluster (in order: A, B, C, E, F and G)
>    - RF = 3
>    - All operations at QUORUM
>    - Operation 1: Read by row key followed by write
>    - Operation 2: Read by secondary index, followed by write
>
> While running a mixed workload of operations 1 and 2, we got the following
> results:
>
>   Scenario              Result
>   All nodes up          All operations succeed
>   One node down         All operations succeed
>   Nodes A and E down    All operations succeed
>   Nodes A and B down    Operation 1: ~33% fail
>                         Operation 2: All fail
>   Nodes A and C down    Operation 1: ~17% fail
>                         Operation 2: All fail
>
> We had expected (perhaps incorrectly) that the secondary index reads would
> fail in proportion to the portion of the ring that was unable to reach
> quorum, just as the row key reads did. For both operation types the
> underlying failure was an UnavailableException.
>
> The same pattern repeated for the other scenarios we tried. The row key
> operations failed at the expected ratios, given the portion of the ring
> that was unable to meet quorum because of nodes down, while all the
> secondary index reads failed as soon as 2 out of any 3 adjacent nodes were
> down.
>
> Is this an expected behavior? Is it documented anywhere? I didn't find it
> with a quick search.
>
> The operation doing secondary index query is an important one for our app,
> and we'd really prefer that it degrade gracefully in the face of cluster
> failures. My plan at this point is to do that query at ConsistencyLevel.ONE
> (and accept the increased risk of inconsistency). Will that work?
>
> Thanks in advance,
>
> Jim
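
The failure ratios reported in this thread can be reproduced with a small availability model. The sketch below is not Cassandra code. It assumes evenly balanced tokens, uniformly distributed row keys, and replica placement on the owning node plus the next RF-1 nodes around the ring. The function names (`replica_sets`, `unavailable_fraction`, `index_read_available`) are made up for illustration. Under those assumptions, a row key read fails only if its own token range cannot reach the consistency level, while an index read is modeled as needing every token range to reach it.

```python
RING = ["A", "B", "C", "E", "F", "G"]  # node order as given in the original post
RF = 3


def replica_sets(ring=RING, rf=RF):
    """One replica list per token range: the owner plus the next rf-1 nodes.

    Assumption: simple ring-order placement, with no rack or DC awareness.
    """
    n = len(ring)
    return [[ring[(i + k) % n] for k in range(rf)] for i in range(n)]


def required_replicas(cl, rf=RF):
    """Number of live replicas needed for the given consistency level."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1
    raise ValueError("unsupported consistency level: %s" % cl)


def unavailable_fraction(down, cl="QUORUM", ring=RING, rf=RF):
    """Fraction of token ranges that cannot satisfy `cl` with `down` nodes dead.

    With balanced tokens and uniform keys this approximates the share of
    row key reads that would fail as unavailable.
    """
    need = required_replicas(cl, rf)
    sets = replica_sets(ring, rf)
    failed = sum(1 for reps in sets if sum(r not in down for r in reps) < need)
    return failed / len(sets)


def index_read_available(down, cl="QUORUM", ring=RING, rf=RF):
    """Model of an index read: it succeeds only if every range can meet `cl`."""
    return unavailable_fraction(down, cl, ring, rf) == 0.0


if __name__ == "__main__":
    for down in [set(), {"A"}, {"A", "E"}, {"A", "B"}, {"A", "C"}]:
        label = ",".join(sorted(down)) or "none"
        print("down=%-5s row-key failures=%3.0f%%  index@QUORUM ok=%s  index@ONE ok=%s" % (
            label,
            100 * unavailable_fraction(down),
            index_read_available(down),
            index_read_available(down, cl="ONE"),
        ))
```

In this model, with A and B down two of the six ranges lose quorum (about 33%), and with A and C down one of six does (about 17%), which matches the row key results above. Any unavailable range makes the QUORUM index read fail outright. At ONE, no range is unavailable with only two nodes down, which is consistent with the plan to run the index query at ConsistencyLevel.ONE.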