On Thu, Jun 09, 2011 at 08:01:24PM -0700, Ben Tilly wrote:
> It sounds like you understood perfectly.
> 
> Basically we are running a cluster of machines that are busy doing
> lots of stuff.  We wanted to use Riak to keep configuration
> information about those machines and the stuff they were doing.  So
> Riak would be running on machines whose primary job is something else.
>  A critical use case for us is to figure out what needs to be done on
> which other machines after one of the machines goes down.  Therefore
> having the potential to have our data unavailable during a failover
> because of the failover kills the benefit that we wanted from a high
> availability system.

I think you're conflating two issues: the period when a *new* node is
joining the cluster or an established node is leaving it, *versus* the
period when a node in the cluster is unavailable.

In the first case, when you are changing the cluster membership (not the
list of available nodes), you will currently get notfounds as the
partition claims are changed.

In the second case (which should be the more common one if you aren't
constantly adding/removing nodes from the cluster) you will NOT receive
notfounds (assuming your n_val is > 1 and your r value is less than the
n_val). When you do a GET on a key that is stored on the downed node,
Riak will create a fallback vnode for the missing vnode, do a GET on
that key (and it should get N-1 replies) and do a read-repair on the
newly spun up fallback vnode. You should not get notfounds in this case
unless R=N, or N=1, or you've had multiple nodes fail since you last
fetched this key (if the cluster is large enough this should be
unlikely).
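
To make those conditions concrete, here is a rough sketch in Python of
the rule as described above. It is only a toy model, not Riak's actual
code: the function name read_outcome and its parameters are made up for
illustration, and it assumes every downed primary is replaced by an
empty fallback vnode that answers notfound while the remaining primaries
answer with the value.

```python
def read_outcome(n_val, r, primaries_down):
    """Toy model of a GET while some primary vnodes for a key are down.

    Assumption (not Riak's real implementation): each downed primary is
    replaced by an empty fallback vnode that replies notfound, and only
    replies carrying the value count towards R.
    """
    if not 1 <= r <= n_val:
        raise ValueError("r must be between 1 and n_val")
    if not 0 <= primaries_down <= n_val:
        raise ValueError("primaries_down must be between 0 and n_val")

    # Replies that actually carry the stored value.
    value_replies = n_val - primaries_down

    return "ok" if value_replies >= r else "notfound"


if __name__ == "__main__":
    # N=3, R=2, one node down: N-1 = 2 value replies satisfy R.
    print(read_outcome(3, 2, 1))
    # R == N with one node down: only N-1 value replies, R is not met.
    print(read_outcome(3, 3, 1))
```

With N=3 and R=2 a single failure is tolerated; R=N, N=1, or a second
failure among the key's replicas is where the model starts returning
notfound.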

If you want to increase your robustness further, you can increase N and
reduce R. Git HEAD also has a new GET parameter called notfound_ok,
which instructs Riak to treat notfound responses as valid responses
(counting towards R) instead of as errors.
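
As a small illustration of passing these as GET parameters over HTTP,
the sketch below just builds the request URL. The helper name
riak_get_url, the host and port, the bucket and key names, and the
/riak/<bucket>/<key> path layout are all assumptions for the example;
check them against the interface of the Riak version you are running.

```python
from urllib.parse import quote, urlencode


def riak_get_url(host, bucket, key, r=None, notfound_ok=None):
    """Build a GET URL carrying per-request r and notfound_ok values.

    The /riak/<bucket>/<key> path layout is an assumption here, not
    something taken from this thread.
    """
    path = "/riak/{}/{}".format(quote(bucket, safe=""), quote(key, safe=""))

    params = {}
    if r is not None:
        params["r"] = r
    if notfound_ok is not None:
        params["notfound_ok"] = "true" if notfound_ok else "false"

    query = urlencode(params)
    return "http://{}{}{}".format(host, path, "?" + query if query else "")


if __name__ == "__main__":
    # Hypothetical node address, bucket, and key.
    print(riak_get_url("127.0.0.1:8098", "machines", "host42",
                       r=1, notfound_ok=True))
```

Setting the values per request like this leaves the bucket defaults
alone, so only the reads that need the looser behaviour get it.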

Let me know if that clarifies things at all.

Andrew

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
