Re: Secondary Indexes, Quorum and Cluster Availability

Jim Ancona Tue, 05 Jun 2012 13:30:47 -0700

On Mon, Jun 4, 2012 at 2:34 PM, aaron morton <aa...@thelastpickle.com> wrote:


> IIRC index slices work a little differently with consistency: they need
> enough nodes available to satisfy the CL for every token range. If you drop
> it to CL ONE the read is local only for a particular token range.
>

Yes, this is what we observed. When I reasoned my way through what I knew
about how secondary indexes work, I came to the same conclusion about all
token ranges having to be available.
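
For anyone who wants to check the numbers, here is a rough Python sketch of
that reasoning, using the ring from the original message quoted below
(6 nodes, RF = 3). It is only a model, not Cassandra's code. It assumes that
each token range is replicated on its owning node plus the next two nodes in
ring order, that row keys are spread evenly across ranges, and that an index
read needs every range to satisfy the CL. The function names are made up for
illustration.

```python
from fractions import Fraction

NODES = ["A", "B", "C", "E", "F", "G"]  # ring order from the original message
RF = 3                                  # replication factor


def replicas(i):
    """Replicas for the range owned by NODES[i] (assumed: owner + next RF-1 nodes)."""
    return [NODES[(i + k) % len(NODES)] for k in range(RF)]


def required(cl):
    """Number of live replicas a consistency level needs (only ONE and QUORUM modeled)."""
    return {"ONE": 1, "QUORUM": RF // 2 + 1}[cl]


def unavailable_ranges(down, cl):
    """Owners of the token ranges that cannot satisfy `cl` with the `down` nodes offline."""
    down = set(down)
    need = required(cl)
    bad = []
    for i, owner in enumerate(NODES):
        live = [n for n in replicas(i) if n not in down]
        if len(live) < need:
            bad.append(owner)
    return bad


def row_key_failure_fraction(down, cl):
    """Share of row key reads expected to fail, assuming keys are evenly distributed."""
    return Fraction(len(unavailable_ranges(down, cl)), len(NODES))


def index_query_available(down, cl):
    """An index read is modeled as needing every token range to satisfy `cl`."""
    return not unavailable_ranges(down, cl)
```

Under these assumptions, row_key_failure_fraction({"A", "B"}, "QUORUM") comes
out to 1/3 and row_key_failure_fraction({"A", "C"}, "QUORUM") to 1/6, in line
with the ~33% and ~17% we saw. index_query_available is False in both cases
at QUORUM and True at ONE.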

My surprise at the behavior was because I *hadn't* reasoned my way through
it until we had the issue. Somehow I doubt I'm the only user of secondary
indexes that was unaware of this ramification of CL choice. It might be a
good idea for the documentation to reflect the tradeoffs more clearly.

Thanks for your help!

Jim


>
> The problem when doing index reads is the nodes that contain the results
> can no longer be selected by the partitioner.
>

> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/06/2012, at 5:15 AM, Jim Ancona wrote:
>
> Hi,
>
> We have an application with two code paths, one of which uses a secondary
> index query and one which doesn't. While testing node-down scenarios
> in our cluster we got a result which surprised (and concerned) me, and I
> wanted to find out if the behavior we observed is expected.
>
> Background:
>
>    - 6 nodes in the cluster (in order: A, B, C, E, F and G)
>    - RF = 3
>    - All operations at QUORUM
>    - Operation 1: Read by row key followed by write
>    - Operation 2: Read by secondary index, followed by write
>
> While running a mixed workload of operations 1 and 2, we got the following
> results:
>
>  Scenario             Result
>  All nodes up         All operations succeed
>  One node down        All operations succeed
>  Nodes A and E down   All operations succeed
>  Nodes A and B down   Operation 1: ~33% fail
>                       Operation 2: All fail
>  Nodes A and C down   Operation 1: ~17% fail
>                       Operation 2: All fail
>
> We had expected (perhaps incorrectly) that the secondary index reads would
> fail in proportion to the portion of the ring that was unable to reach
> quorum, just as the row key reads did. For both operation types the
> underlying failure was an UnavailableException.
>
> The same pattern repeated for the other scenarios we tried. The row key
> operations failed at the expected ratios, given the portion of the ring
> that was unable to meet quorum because of nodes down, while all the
> secondary index reads failed as soon as 2 out of any 3 adjacent nodes were
> down.
>
> Is this an expected behavior? Is it documented anywhere? I didn't find it
> with a quick search.
>
> The operation doing the secondary index query is an important one for our app,
> and we'd really prefer that it degrade gracefully in the face of cluster
> failures. My plan at this point is to do that query at ConsistencyLevel.ONE
> (and accept the increased risk of inconsistency). Will that work?
>
> Thanks in advance,
>
> Jim
>
>
>
