Re: Questions about ring size

Mark Phillips Fri, 08 Mar 2013 17:31:35 -0800

Hi Chris,

Thanks for the detailed write up. These are some great data points.

We're doing some work right now to make large rings (where "large" =
more than 512 partitions) more efficient in terms of start and
convergence time, and handoff.

First things first: since your test cluster has no data in it, adding
"forced_ownership_handoff" to the riak_core section of your
app.config and setting it to something higher than your ring size
should help hasten convergence. *This is only useful for the purposes
of testing and should not be done in production.* That would look like
this:

{forced_ownership_handoff, 512}

You could also increase the "max_concurrency" setting (which also has
to be added to the riak_core section in your app.config). This
defaults to "2". You could also look at lowering the
"vnode_management_timer" from its default of "10000" (milliseconds,
i.e. 10 seconds).
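
To make the placement concrete, here is a rough sketch of how the three
settings above could sit together in the riak_core section. The setting
names are the ones given in this email; the values are illustrative
assumptions for an empty test cluster with a 2048 ring, not
recommendations, and the rest of the file is omitted:

```erlang
%% app.config (excerpt): only the riak_core section is shown.
%% Keep whatever entries your riak_core section already has.
[
 {riak_core, [
     %% TESTING ONLY, not for production.
     %% Assumed value: anything higher than the ring size (2048 here).
     {forced_ownership_handoff, 4096},

     %% Defaults to 2; 4 is an assumed value for illustration.
     {max_concurrency, 4},

     %% In milliseconds; defaults to 10000. 1000 is an assumed value.
     {vnode_management_timer, 1000}
 ]}
].
```

Each node would need a restart to pick up changes made in app.config.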

Back to the current limitations of Riak...

A few members of the Basho eng team - primarily Joe Blomstedt - have
been hacking on the ring-related code for the last week or so and are
making some great progress. The improvements will be in the 1.4
release (though it is a few months out from being official). To quote
Joe from an internal email: "in my current work-in-progress branch, I
successfully joined 4-nodes together using a 16384 ring yesterday.
Still took about 20 min, but working on bringing that down even
further today. Also, impact to cluster performance is worlds
different."

So, we're well aware of the improvements that need to be made in this
area and are working quickly to make them. I think Joe has plans to
share his working code with the list in the near future (via a GitHub
PR/Issue I suspect), so look out for that.

In the interim, I would stick with a ring size of 512 or less for
production clusters if you're not already live, and lean on some
beefier hardware to mitigate the current inefficiencies with large
rings until the code is purified.

Let us know if you have any other questions. Thanks for your testing
and patience.

Mark

On Fri, Mar 8, 2013 at 6:37 AM, Chris Read <[email protected]> wrote:
> Greetings all...
>
> While I can find lots of documentation about what a ring is and how it's
> used in Riak, I've found very little that's actually useful about
> determining the right size for your system. The most useful formula I've
> found so far has been the simple:
>
> ring size = 2 ^ (ceiling(log(max nodes * min partitions per node, 2)))
>
> Where the minimum recommended number of partitions per node is 10 (as per
> http://docs.basho.com/riak/latest/cookbooks/faqs/operations-faq/#is-it-possible-to-change-the-number-of-partitions).
>
> Nothing tells me though what a sane upper bound is for the amount of data in a
> partition, or the overhead inside the cluster of managing larger ring sizes.
> My gut feel though is that more than a couple of hundred gigabytes per
> partition is getting a bit much.
>
> I've done some initial testing of ring sizes across a cluster of 9 physical
> machines and have seen some concerning results. All the numbers below are
> done on the same hardware running Ubuntu 12.04 with Riak 1.3.0 (official
> .deb release):
>
> Ring Size      |     512 |    1024 |    2048 |
> Create Cluster | 0:01:53 | 0:05:41 | 0:12:58 |
> Remove Node    | 0:04:01 | 0:10:31 | 0:31:13 |
> Add Node       | 0:01:05 | 0:05:22 | 1:04:49 |
>
> All this is done with NO DATA in the cluster at all - so why does it take
> over an hour to add a new node when ring=2048?
>
> Does it have anything to do with the concerns raised on this thread:
> https://groups.google.com/forum/?fromgroups=#!topic/nosql-databases/DZkgkgd9YnA
>
> Thanks,
>
> Chris
>
>
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

