Re: 'not found' after join

John D. Rowell Thu, 05 May 2011 10:10:32 -0700

Hi Ryan, Greg,

2011/5/5 Ryan Zezeski <[email protected]>


> 1. For example, riak_core has a `handoff_concurrency` setting that
> determines how many vnodes can concurrently handoff on a given node.  By
> default this is set to 4.  That's going to take a while with your 2048
> vnodes and all :)
>

Won't that make the handoff situation potentially worse? From the thread I
understood that the main problem was that the cluster was shuffling too much
data around and thus becoming unresponsive and/or returning unexpected
results (like "not founds"). I'm attributing the concerns more to an
excessive I/O situation than to how long the handoff takes. If the handoff
can be made transparent (with little or no side effects), I don't think most
people will really care how long it takes (e.g. the "fix the cluster
tomorrow" anecdote).

How about using a percentage of available I/O to throttle the vnode handoff
concurrency? Start with 1 and monitor the node's I/O (kind of like 'atop'
does, collecting CPU, disk and network metrics). If it is below the expected
usage, increase the vnode handoff concurrency, and vice versa.
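
To make the idea concrete, here is a rough sketch of that feedback loop in
Python. It is only an illustration, not anything riak_core does today: the
function names, the 0.6 target utilization and the cap of 16 are all made up
for the example, and the utilization samples would have to come from whatever
I/O monitoring the node has available.

```python
def next_concurrency(current, io_utilization, target=0.6, minimum=1, maximum=16):
    """Return the handoff concurrency to use for the next interval.

    All names and defaults here are hypothetical. `io_utilization` is the
    measured fraction of available I/O in use (0.0 to 1.0) and `target` is
    the expected usage we want to stay close to.
    """
    if not 0.0 <= io_utilization <= 1.0:
        raise ValueError("io_utilization must be between 0.0 and 1.0")

    if io_utilization < target:
        # The node has I/O headroom: allow one more concurrent handoff.
        proposed = current + 1
    elif io_utilization > target:
        # The node is busier than expected: back off by one handoff.
        proposed = current - 1
    else:
        proposed = current

    # Never drop below `minimum` or exceed `maximum`.
    return max(minimum, min(maximum, proposed))


def simulate(samples, start=1, **kwargs):
    """Apply the controller to a series of I/O samples, starting at `start`."""
    concurrency = start
    history = []
    for utilization in samples:
        concurrency = next_concurrency(concurrency, utilization, **kwargs)
        history.append(concurrency)
    return history


if __name__ == "__main__":
    # Two quiet intervals, three busy ones, then a quiet one again.
    print(simulate([0.2, 0.3, 0.9, 0.9, 0.9, 0.5]))
```

Stepping by one in each direction keeps the sketch simple. A real controller
would probably want to back off faster than it ramps up, and smooth the I/O
samples so that a single spike does not cause flapping.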

I for one would be perfectly happy if the handoff took several hours (even
days) if we could maintain the core riak_kv characteristics intact during
those events. We've all seen looooong RAID rebuild times, and it's usually
better to just sit tight and keep the rebuild speed low (slower I/O) while
keeping all of the dependent systems running smoothly.

cheers
-jd

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
