Another thing 2i lacks is the ability to set the R-value. Just as when GETting a specific key, there is a tradeoff between consistency and performance in choosing the set of vnodes to query.

The 2i query implementation in Riak chooses performance over consistency by querying only the minimum set of vnodes (1/n of them), which is the equivalent of R=1. In other words, 2i queries have no entropy tolerance. If an object is missing from a given vnode (and not yet fixed by AAE), you will get inconsistent results in approximately 1/n of your 2i queries for it. Riak chooses the 1/n vnode set somewhat randomly, so running the same 2i query multiple times can return different sets of keys.
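To make the 1/n figure concrete, here is a small toy simulation in Python. It is only a sketch under simplifying assumptions, not Riak's actual coverage-plan logic: it assumes each query reads the index entry for a key from exactly one replica, picked uniformly at random out of n, and that a fixed number of replicas have lost the entry. The function name and parameters are made up for illustration.

```python
import random

def simulate_2i_miss_rate(n_val=3, missing_replicas=1, trials=100_000, seed=0):
    """Toy model (not Riak code): fraction of queries that miss a key.

    Assumes every query reads the key's index entry from one replica
    chosen uniformly at random, and that replicas numbered
    0..missing_replicas-1 have lost the entry.
    """
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        chosen = rng.randrange(n_val)   # the single replica this query reads
        if chosen < missing_replicas:   # that replica lacks the index entry
            misses += 1
    return misses / trials

rate = simulate_2i_miss_rate()
print(f"fraction of queries missing the key: {rate:.3f}")
```

With n=3 and one damaged replica the printed fraction should come out close to 1/3, which matches the estimate above.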

There is currently no easy way to change that for high-consistency use cases. As an experiment, I have tried modifying Riak's source code to make it query all vnodes, which seemingly works fine. However, Riak will then return (up to) n copies of each key, so these need to be deduped by the client. Obviously, this alternative approach favours consistency over performance, and will be about 3 times as expensive with n=3.
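The client-side dedupe is simple. A minimal sketch in Python, assuming the client somehow ends up with one list of keys per queried vnode (the input shape and the function name are my own, not something the Riak clients expose):

```python
def merge_replica_results(per_vnode_keys):
    """Union the key lists returned by several vnodes, dropping duplicates.

    per_vnode_keys: iterable of key lists, one list per queried vnode
    (a hypothetical shape chosen for this sketch).
    Keys are returned in first-seen order.
    """
    seen = set()
    merged = []
    for keys in per_vnode_keys:
        for key in keys:
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged

# Three replicas of the same partition; each one is missing a different key.
replica_results = [["a", "b"], ["a", "b", "c"], ["b", "c"]]
merged_keys = merge_replica_results(replica_results)
```

Taking the union means a key shows up as long as at least one replica still has its index entry, which is where the extra consistency comes from.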

Also remember that AAE will fix entropy eventually, so the windows for inconsistent 2i queries will be closed by Riak itself after some time. You can look at the AAE logs to deduce the level of entropy you are running with and guesstimate its impact on the consistency of your 2i queries.

- Rune, Trifork


On 07-10-2013 23:46, Jon Meredith wrote:
Hi Brady,

The 2i indices are written in the same store as the main objects whenever the main object is updated. If a primary node is down, the indices will be written to a fallback node. When the fallback sees the primary come back online and stops receiving requests for that partition, it will send the main object back to the primary, which will then re-index it.

The docs could benefit from a little clarification. Secondary indices do benefit from read repair: if the main object is spotted as being out of date or missing during a get, it is rewritten with the up-to-date information on all nodes. The anti-entropy mechanism that we are currently missing is spotting corruption within leveldb itself. For example, if part of the leveldb database storing a vnode is corrupted so that the .sst files containing the index entries are destroyed, there is no mechanism to spot and repair that. We intend to add that in the next major release.
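As a rough illustration of why index entries ride along with read repair, here is a toy Python sketch. It is not Riak's implementation: replicas are modelled as plain dicts, and an integer version stands in for Riak's vector clocks. Both are assumptions made for this example, as is the function name.

```python
def get_with_read_repair(replicas, key):
    """Toy read repair (not Riak code).

    replicas: list of dicts mapping key -> (version, value, index_terms).
    The integer version is a stand-in for a vector clock.
    Returns the newest copy and rewrites it on every replica that is
    stale or missing the key. The index terms are part of the copy,
    so they are repaired together with the object.
    """
    copies = [r[key] for r in replicas if key in r]
    if not copies:
        return None
    newest = max(copies, key=lambda copy: copy[0])
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest
    return newest

replicas = [
    {"user1": (2, "new value", ("email_bin:new@example.com",))},
    {"user1": (1, "old value", ("email_bin:old@example.com",))},
    {},  # this replica lost the object entirely
]
repaired = get_with_read_repair(replicas, "user1")
```

After the call all three replicas hold version 2 together with its index terms. Note that this only happens when the key is actually read with a get; a 2i query by itself does not trigger it.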

Jon


On Mon, Oct 7, 2013 at 2:39 PM, Brady Wetherington <[email protected] <mailto:[email protected]>> wrote:

    What happens to your 2i indexes if you do a write and one of the
    nodes you're trying to write to is down?

    http://docs.basho.com/riak/latest/dev/using/2i/ says:

      * When you want or need anti-entropy. Since 2i is just metadata
        on the KV object and the indexes reside on the same node, 2i
        piggybacks off of read-repair.

    But
    http://docs.basho.com/riak/latest/ops/running/recovery/repairing-indexes/
    says:

    Riak Secondary indexes (2i) currently have no form of anti-entropy
    (such as read-repair). Furthermore, for performance and load
    balancing reasons, 2i reads from 1 random node. This means that
    when a replica loss has occurred, inconsistent results may be
    returned.

    I am building a solution around 2i - so I just wanted to know if
    there was any way to clarify these points - how resilient are
    these indexes? Under what circumstances will they stop working (or
    return inconsistent results)?

    -B.

    _______________________________________________
    riak-users mailing list
    [email protected] <mailto:[email protected]>
    http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




--
Jon Meredith
VP, Engineering
Basho Technologies, Inc.
[email protected] <mailto:[email protected]>


--

Best regards / Venlig hilsen

*Rune Skou Larsen*
Trifork Public A/S
Dyssen 1, 8200 Århus N, Denmark
Phone: +45 3160 2497    Skype: runeskoularsen   twitter: @RuneSkouLarsen

