Hi Alex,

We don't yet know the performance characteristics of our datatypes well enough to give general recommendations. To head some of this off at the pass, we have added a number of metrics around them [1], including actor count and merge time. Of course, the general recommendations about Riak object size still apply. Nevertheless, here are a few things we know could have an impact:

- Sets and maps use the 'orddict' Erlang type/module for their in-memory representation. Lookups, inserts, and modifications have known O(N) behavior. We are considering porting Elixir's hashdict/hashset for this purpose.
- We previously had custom binary formats, but reverted sets and maps to t2b (term_to_binary) with compression on. The BIF is simply faster than even our best pattern-matching code, and the compression is good enough to counter any bloat t2b might introduce (we use the lowest level of compression). Still, using a BIF in the critical path may have an impact on scheduler balance. [2]
- In order to eliminate non-actor-identifier garbage, sets and maps use a somewhat complicated two-way merge function. There are likely areas for improvement there.
- Right now, the "context" payload which lets you safely remove items from sets and maps is essentially an encoded version of the original datatype. Russell is working on reducing the size of that (down to just a causal token, perhaps) but we are unsure whether that will make it into 2.0.
- Both sets and maps have a constant overhead for the entire structure and a constant overhead per entry. Although it is proportionally small, this can add up quickly. For example, I have this little script which adds 1000 integers to a set as 32-bit binaries, and assumes an equal distribution of updates among 3 replicas. If S is the set and SB is the serialized/binary version, erlang:external_size(S) == 40783 but erlang:byte_size(SB) == 4867 (4K and some change, really).

Hope that helps, sorry we don't have more general recommendations.

[1] https://github.com/basho/riak_kv/blob/develop/src/riak_kv_stat.erl#L464-L474
[2] https://github.com/basho/riak_dt/pull/77

Cheers,

On Fri, Feb 21, 2014 at 2:53 PM, Alexander Sicular <[email protected]> wrote:

> Hey Gang,
>
> I'm pretty excited about the new CRDT support coming in the next release.
> Although counters are already out, I haven't seen much in the way of
> performance guidelines. Where counters are a fold/merge/sum type of deal
> with some kind of reasonably bounded size based on update frequency, sets
> and maps are potentially unbounded. Is there any guidance Basho can share
> in regards to at which size performance will drop off for set/maps. Every
> feature has some theoretical vs practical limit, ie. links, wondering what
> those are for the new set/map features. Obviously, ymmv.
>
> Thanks,
>
> -Alexander Sicular
>
> @siculars
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

-- 
Sean Cribbs <[email protected]>
Software Engineer
Basho Technologies, Inc.
http://basho.com/
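
As a rough illustration of the size figures quoted in the reply above, here is a small Python sketch of the arithmetic. It only restates the numbers from the message (1000 entries, 32-bit payloads, 40783 and 4867 bytes). The constant and function names are made up for this example, and since the serialized form is compressed, its per-entry figure is only an average, not a fixed per-entry cost.

```python
# Figures taken from the message above; all names here are hypothetical.
ENTRIES = 1000                 # integers added to the set
PAYLOAD_BYTES_PER_ENTRY = 4    # each integer stored as a 32-bit binary
EXTERNAL_SIZE_BYTES = 40783    # reported erlang:external_size(S)
SERIALIZED_BYTES = 4867        # reported erlang:byte_size(SB), compressed


def per_entry(total_bytes, entries=ENTRIES):
    """Average number of bytes per set element for a given total size."""
    return total_bytes / entries


# Bytes the raw integers alone would occupy, with no structure around them.
raw_payload = ENTRIES * PAYLOAD_BYTES_PER_ENTRY

# Average cost per element in the uncompressed external format.
external_per_entry = per_entry(EXTERNAL_SIZE_BYTES)

# Average cost per element in the compressed, serialized form.
serialized_per_entry = per_entry(SERIALIZED_BYTES)

# How much larger the uncompressed encoding is than the stored binary.
ratio = EXTERNAL_SIZE_BYTES / SERIALIZED_BYTES

print(f"raw payload:           {raw_payload} bytes")
print(f"external, per entry:   {external_per_entry:.1f} bytes")
print(f"serialized, per entry: {serialized_per_entry:.1f} bytes")
print(f"external / serialized: {ratio:.2f}x")
```

Read this way, the uncompressed encoding averages roughly ten times the raw 4-byte payload per element, which is the kind of per-entry overhead that "can add up quickly", while the compressed binary stays close to the raw payload size for this particular data.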
