Re: Data Replica algortihm

Andrew Thompson Mon, 19 Aug 2013 16:16:53 -0700

On Mon, Aug 19, 2013 at 04:03:25PM -0300, João Machado wrote:
> Hi all,
> 
> Anyone knows how the algorithm responsible for choosing where the data
> replica is stored works?
> 
> Is possible to customise (or configure) it to ensure that the data will be
> in different physical nodes (or even physical racks) like Cassandra does?


Riak writes the replicas to the vnode to which the {bucket, key} hashes
and the following N-1 vnodes.

Riak uses target_n (which defaults to 4) to determine what the largest N
expected is, it then tries to lay out the vnodes that no <target_n>
sequence of vnodes are on the same physical nodes. Here's an example 4
node cluster with the default target_n of 4:

([email protected])5> riak_core_ring:pretty_print(element(2,
riak_core_ring_manager:get_my_ring()), [legend]).
==================================== Nodes ====================================
Node a: 16 ( 25.0%) [email protected]
Node b: 16 ( 25.0%) [email protected]
Node c: 16 ( 25.0%) [email protected]
Node d: 16 ( 25.0%) [email protected]
==================================== Ring ====================================
abcd|abcd|abcd|abcd|abcd|abcd|abcd|abcd|abcd|abcd|abcd|abcd|abcd|abcd|abcd|abcd|
ok

As you can see, there are no sequences where any 2 vnodes in a 4 vnode
sequence are on the same machine. Now look at what happens when # of
nodes < target_n:

==================================== Nodes ====================================
Node a: 22 ( 34.4%) [email protected]
Node b: 21 ( 32.8%) [email protected]
Node c: 21 ( 32.8%) [email protected]
==================================== Ring =====================================
abca|bcab|cabc|abca|bcab|cabc|abca|bcab|cabc|abca|bcab|cabc|abca|bcab|cabc|abca|

Oops, now we have nodes appearing twice in the list of every 4 vnodes.
This is bad because it means that if you loose one node, and N=4, you
have a chance to loose 2 of your replicas, instead of one.

Rack awareness is something that is being worked on, but we don't have
it just yet, but as long as you have more than target_n nodes, Riak will
try pretty hard to balance your replicas across physical machines. This
is also why we don't recommend riak clusters smaller than 5 nodes
(satisfy target_n and one more machine so you can tolerate 2/5 machines
being down).

Andrew

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Data Replica algortihm

Reply via email to