Re: crdt performance

Sean Cribbs Fri, 21 Feb 2014 14:07:24 -0800

Hi Alex,

We don't yet know enough about the performance characteristics of our
datatypes to give general recommendations. To head some of this off at
the pass, we have added a bunch of metrics around them[1], including
actor count and merge time. Of course, the general recommendations about
Riak object size still apply. Nevertheless, here are a few things we know
could have an impact:


   - Sets and maps use the 'orddict' Erlang type/module for their in-memory
   representation. Lookups, inserts, and modifications have known O(N)
   behavior. We are considering porting Elixir's hashdict/hashset for this
   purpose.
   - We previously had custom binary formats, but reverted sets and maps to
   term_to_binary (t2b) with compression on. The BIF is simply faster than
   even our best pattern-matching code, and the compression is good enough to
   counter any bloat t2b might introduce (we use the lowest level of
   compression). Still, using a BIF in the critical path may have an impact
   on scheduler balance.[2]
   - In order to eliminate non-actor-identifier garbage, sets and maps use
   a somewhat complicated two-way merge function. There are likely areas for
   improvement there.
   - Right now, the "context" payload which lets you safely remove items
   from sets and maps is essentially an encoded version of the original
   datatype. Russell is working on reducing the size of that (down to just a
   causal token, perhaps) but we are unsure whether that will make it into 2.0.
   - Both sets and maps have a constant overhead for the entire structure
   and a constant overhead per-entry. Although it is proportionally small,
   this can add up quickly. For example, I have this little script which adds
   1000 integers to a set as 32-bit binaries, and assumes an equal
   distribution of updates among 3 replicas. If S is the set and SB is the
   serialized/binary version, erlang:external_size(S) == 40783 but
   erlang:byte_size(SB) == 4867 (4K and some change, really).
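
To make the last point concrete, here is a minimal sketch of that kind of
measurement. It is not the script mentioned above and it does not use
riak_dt: the module name set_size_sketch, the functions build/2 and sizes/1,
and the orddict-of-dots layout are all hypothetical stand-ins, and
{compressed, 1} is my assumption for "lowest level of compression". The
numbers it prints will therefore differ from the ones quoted.

```erlang
-module(set_size_sketch).
-export([build/2, sizes/1, main/0]).

%% Hypothetical stand-in for a set CRDT: an orddict mapping each element
%% to a list of {Actor, Counter} dots. This is NOT riak_dt's real layout.
%% Updates are spread round-robin over the given actors (replicas).
build(N, Actors) ->
    NumActors = length(Actors),
    lists:foldl(
      fun(I, Acc) ->
              Actor = lists:nth((I rem NumActors) + 1, Actors),
              Elem = <<I:32>>,                 %% integer as a 32-bit binary
              orddict:store(Elem, [{Actor, I}], Acc)
      end,
      orddict:new(),
      lists:seq(1, N)).

%% Returns {ExternalSize, CompressedSize} in bytes.
%% external_size/1 estimates the uncompressed external term format size;
%% term_to_binary/2 with {compressed, 1} is assumed to match "lowest level".
sizes(S) ->
    SB = term_to_binary(S, [{compressed, 1}]),
    {erlang:external_size(S), byte_size(SB)}.

main() ->
    S = build(1000, [<<"a">>, <<"b">>, <<"c">>]),
    {External, Compressed} = sizes(S),
    io:format("external_size=~p compressed=~p~n", [External, Compressed]).
```

Compile the module and call set_size_sketch:main() from the shell to see
both sizes side by side; varying N and the number of actors gives a rough
feel for how the per-entry overhead grows.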

Hope that helps, sorry we don't have more general recommendations.

[1]
https://github.com/basho/riak_kv/blob/develop/src/riak_kv_stat.erl#L464-L474
[2] https://github.com/basho/riak_dt/pull/77

Cheers,


On Fri, Feb 21, 2014 at 2:53 PM, Alexander Sicular <[email protected]> wrote:

> Hey Gang,
>
> I'm pretty excited about the new CRDT support coming in the next release.
> Although counters are already out, I haven't seen much in the way of
> performance guidelines. Where counters are a fold/merge/sum type of deal
> with some kind of reasonably bounded size based on update frequency, sets
> and maps are potentially unbounded. Is there any guidance Basho can share
> regarding the size at which performance will drop off for sets/maps? Every
> feature has some theoretical vs. practical limit, e.g. links; I'm wondering
> what those are for the new set/map features. Obviously, ymmv.
>
> Thanks,
>
> -Alexander Sicular
>
> @siculars
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 
Sean Cribbs <[email protected]>
Software Engineer
Basho Technologies, Inc.
http://basho.com/
