Hi Ryan, Greg,

2011/5/5 Ryan Zezeski <[email protected]>

> 1. For example, riak_core has a `handoff_concurrency` setting that
> determines how many vnodes can concurrently handoff on a given node. By
> default this is set to 4. That's going to take a while with your 2048
> vnodes and all :)
>

Won't that make the handoff situation potentially worse? From the thread I understood that the main problem was that the cluster was shuffling too much data around and thus becoming unresponsive and/or returning unexpected results (like "not founds"). I'm attributing the concerns more to excessive I/O than to how long the handoff takes. If the handoff can be made transparent (no or little side effects), I don't think most people will really care how long it runs (e.g. the "fix the cluster tomorrow" anecdote).

How about using a percentage of available I/O to throttle the vnode handoff concurrency? Start with 1 and monitor the node's I/O (kinda like 'atop' does, collecting CPU, disk and network metrics). If usage is below the expected level, increase the vnode handoff concurrency; if it is above, decrease it.

I for one would be perfectly happy if the handoff took several hours (even days) if we could keep the core riak_kv characteristics intact during those events. We've all seen looooong RAID rebuild times, and it's usually better to just sit tight and keep the rebuild speed low (slower I/O) while keeping all of the dependent systems running smoothly.

cheers
-jd
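
P.S. To make the throttling idea a bit more concrete, here is a rough Python sketch of the control loop I have in mind. It is not riak_core code: the function names, the 50% I/O target and the way the utilization samples are obtained are all assumptions for illustration, and the ceiling of 4 is just borrowed from the default mentioned above.

```python
def next_concurrency(current, io_utilization, target=0.5, floor=1, ceiling=4):
    """Return the handoff concurrency to use for the next interval.

    io_utilization and target are fractions of available I/O (0.0 to 1.0).
    All names and defaults here are hypothetical, not riak_core settings.
    """
    if io_utilization < target:
        # Headroom available: allow one more concurrent handoff.
        return min(current + 1, ceiling)
    if io_utilization > target:
        # Node is busier than we want: back off, but never below the floor.
        return max(current - 1, floor)
    return current


def simulate(samples, start=1, **kwargs):
    """Feed a series of I/O utilization samples through the controller."""
    levels = []
    level = start
    for sample in samples:
        level = next_concurrency(level, sample, **kwargs)
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Quiet node at first, then a burst of client load, then quiet again.
    print(simulate([0.2, 0.2, 0.9, 0.9, 0.9, 0.3]))
```

The step of one up or down per interval is deliberately conservative; a real version would probably want to smooth the I/O samples so a single spike doesn't flap the setting.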
