Re: crdt performance

Sean Cribbs Sun, 23 Feb 2014 10:22:03 -0800

Small correction, thanks to Steve Vinoski: term_to_binary has been
reduction-counted and interruptible since R16, so it shouldn't cause scheduler
issues anymore.


Sean Cribbs

> On Feb 21, 2014, at 5:05 PM, Sean Cribbs <[email protected]> wrote:
> 
> Hi Alex,
> 
> We're not yet certain enough of the performance characteristics of our
> datatypes to give general recommendations. To head some of this off at the
> pass, we have added a bunch of metrics around them[1], including actor count
> and merge time. Of course, the general recommendations about Riak object size
> still apply. Nevertheless, here are a few things we know could be impactful:
>
> Sets and maps use Erlang's 'orddict' module (a sorted list of key/value
> pairs) for their in-memory representation, so lookups, inserts, and
> modifications have known O(N) behavior. We are considering porting Elixir's
> hashdict/hashset for this purpose.
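To see where the O(N) comes from, here is a rough Python analogue of an orddict, assuming a Python list of (key, value) pairs kept sorted by key stands in for the Erlang list. The class name SortedPairDict is hypothetical and this is not riak_dt code. Every operation walks the list from the front, and inserts also shift the tail, which is what a hash-based structure like the hashdict mentioned above avoids.

```python
from typing import Any, List, Optional, Tuple


class SortedPairDict:
    """Hypothetical stand-in for Erlang's orddict: (key, value) pairs kept
    sorted by key. Every operation is a linear walk, hence O(N)."""

    def __init__(self) -> None:
        self.pairs: List[Tuple[Any, Any]] = []

    def _position(self, key: Any) -> int:
        # Linear scan, mirroring orddict's walk over a linked list.
        for i, (k, _) in enumerate(self.pairs):
            if k >= key:
                return i
        return len(self.pairs)

    def store(self, key: Any, value: Any) -> None:
        i = self._position(key)
        if i < len(self.pairs) and self.pairs[i][0] == key:
            self.pairs[i] = (key, value)        # overwrite existing entry
        else:
            self.pairs.insert(i, (key, value))  # shifts every later entry

    def find(self, key: Any) -> Optional[Any]:
        """Return the value for key, or None when it is missing."""
        i = self._position(key)
        if i < len(self.pairs) and self.pairs[i][0] == key:
            return self.pairs[i][1]
        return None

    def erase(self, key: Any) -> None:
        i = self._position(key)
        if i < len(self.pairs) and self.pairs[i][0] == key:
            del self.pairs[i]
```

The sorted order is what makes merging two such structures a simple linear zip, which is likely part of orddict's appeal despite the per-operation cost.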
>
> We previously had custom binary formats, but reverted sets and maps to
> term_to_binary (t2b) with compression on. The BIF is simply faster than even
> our best pattern-matching code, and the compression is good enough to counter
> any bloat t2b might introduce (we use the lowest level of compression). Still,
> using a BIF in the critical path may affect scheduler balance. [2]
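The trade-off above (a fast built-in encoder plus cheap compression instead of hand-written binary code) can be sketched in Python, assuming pickle stands in for term_to_binary and zlib at level 1 stands in for the lowest compression level. On the Erlang side the equivalent would presumably be something like term_to_binary(Term, [{compressed, 1}]), but check riak_dt for the exact options it passes. The helper names to_binary and from_binary are hypothetical.

```python
import pickle
import zlib


def to_binary(term, level: int = 1) -> bytes:
    """Hypothetical analogue of a compressed term_to_binary: serialize with
    the built-in encoder, then compress at the cheapest zlib level."""
    raw = pickle.dumps(term, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(raw, level)


def from_binary(blob: bytes):
    """Inverse of to_binary."""
    return pickle.loads(zlib.decompress(blob))


if __name__ == "__main__":
    # A set-like term with plenty of repeated structure (actor ids, tuples).
    term = {i.to_bytes(4, "big"): ("replica-%d" % (i % 3), i // 3 + 1)
            for i in range(1000)}
    uncompressed = len(pickle.dumps(term, protocol=pickle.HIGHEST_PROTOCOL))
    compressed = len(to_binary(term))
    print(f"uncompressed={uncompressed} bytes, compressed={compressed} bytes")
```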
>
> To eliminate all garbage other than actor identifiers, sets and maps use a
> somewhat complicated two-way merge function. There are likely areas for
> improvement there.
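The two-way merge can be illustrated with a much-simplified observed-remove set without tombstones (ORSWOT), which, as far as I know, is the technique riak_dt's set type is based on. This Python sketch assumes a hypothetical layout (a version-vector clock for the whole set, plus "dots", i.e. (actor, counter) pairs, per element) and hypothetical function names; it is not riak_dt's code. A dot held by only one side survives only if the other side's clock has not seen it yet; if the clock has seen it, the other side must have removed it, so it is dropped instead of lingering as garbage.

```python
from typing import Dict, FrozenSet, Tuple

Dot = Tuple[str, int]                  # (actor id, counter)
Clock = Dict[str, int]                 # actor -> highest counter seen
Entries = Dict[bytes, FrozenSet[Dot]]  # element -> dots supporting it
State = Tuple[Clock, Entries]

EMPTY: State = ({}, {})


def seen(clock: Clock, dot: Dot) -> bool:
    """True if the clock already covers this dot."""
    actor, counter = dot
    return counter <= clock.get(actor, 0)


def add(state: State, actor: str, elem: bytes) -> State:
    clock, entries = state
    counter = clock.get(actor, 0) + 1
    # The local clock covers every existing dot for elem, so one fresh dot
    # replaces them all.
    return {**clock, actor: counter}, {**entries, elem: frozenset({(actor, counter)})}


def remove(state: State, elem: bytes) -> State:
    clock, entries = state
    # No tombstone: the unchanged clock records that the dots were observed.
    return dict(clock), {e: d for e, d in entries.items() if e != elem}


def merge(a: State, b: State) -> State:
    clock_a, entries_a = a
    clock_b, entries_b = b
    merged: Entries = {}
    for elem in entries_a.keys() | entries_b.keys():
        dots_a = entries_a.get(elem, frozenset())
        dots_b = entries_b.get(elem, frozenset())
        keep = (
            (dots_a & dots_b)                                       # both agree
            | {d for d in dots_a - dots_b if not seen(clock_b, d)}  # new to b
            | {d for d in dots_b - dots_a if not seen(clock_a, d)}  # new to a
        )
        if keep:
            merged[elem] = frozenset(keep)
    clock = {actor: max(clock_a.get(actor, 0), clock_b.get(actor, 0))
             for actor in clock_a.keys() | clock_b.keys()}
    return clock, merged
```

With this rule a remove that followed an observed add beats a stale copy of that add, while an add that was concurrent with the remove survives, and nothing but the clock's actor entries is retained afterwards.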
>
> Right now, the "context" payload that lets you safely remove items from sets
> and maps is essentially an encoded version of the original datatype. Russell
> is working on reducing its size (perhaps down to just a causal token), but we
> are unsure whether that will make it into 2.0.
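To show why a remove needs a context at all, here is a Python sketch that assumes the context has already been shrunk to a causal token, i.e. just a version vector, in the spirit of what Russell is working toward (today's context is the encoded datatype itself). The function name remove_with_context is hypothetical. The remove drops only the dots the client observed when it read the set, so an add made concurrently on another replica is not silently lost. For brevity it ignores contexts that mention dots this replica has not seen yet; a real implementation has to handle that case, for example by deferring the remove.

```python
from typing import Dict, FrozenSet, Tuple

Dot = Tuple[str, int]                  # (actor id, counter)
Clock = Dict[str, int]                 # actor -> highest counter seen
Entries = Dict[bytes, FrozenSet[Dot]]  # element -> dots supporting it


def seen(clock: Clock, dot: Dot) -> bool:
    actor, counter = dot
    return counter <= clock.get(actor, 0)


def remove_with_context(clock: Clock, entries: Entries, elem: bytes,
                        context: Clock) -> Tuple[Clock, Entries]:
    """Drop only the dots for elem that the client's context covers; dots
    added concurrently (absent from the context) keep the element alive."""
    survivors = frozenset(d for d in entries.get(elem, frozenset())
                          if not seen(context, d))
    new_entries = {e: d for e, d in entries.items() if e != elem}
    if survivors:
        new_entries[elem] = survivors
    return dict(clock), new_entries
```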
>
> Both sets and maps have a constant overhead for the entire structure and a
> constant per-entry overhead. Although it is proportionally small, this can
> add up quickly. For example, I have a little script that adds 1000 integers
> to a set as 32-bit binaries, assuming an equal distribution of updates among
> 3 replicas. If S is the set and SB is its serialized (binary) form,
> erlang:external_size(S) == 40783 but erlang:byte_size(SB) == 4867 (4K and
> some change, really).
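Since erlang:external_size/1 reports the size of the uncompressed external term format, those figures work out to about 40783 / 1000 = 40.8 bytes per element uncompressed versus 4867 / 1000 = 4.9 bytes compressed, against 4 bytes of actual element data. Below is a rough Python re-creation of that experiment, assuming the tombstone-free observed-remove layout (one clock, one (actor, counter) dot per element) with pickle and zlib level 1 standing in for the external term format and its compression. The actor names and function names are hypothetical, and the absolute numbers it prints will differ from the Erlang figures.

```python
import pickle
import struct
import zlib
from typing import Dict, FrozenSet, Tuple

REPLICAS = ("replica-1", "replica-2", "replica-3")  # hypothetical actor ids
State = Tuple[Dict[str, int], Dict[bytes, FrozenSet[Tuple[str, int]]]]


def build_set(n: int = 1000) -> State:
    """Add n integers as 32-bit big-endian binaries, round-robin across the
    replicas, yielding the already-merged state."""
    clock = {r: 0 for r in REPLICAS}
    entries = {}
    for i in range(n):
        actor = REPLICAS[i % len(REPLICAS)]
        clock[actor] += 1
        entries[struct.pack(">I", i)] = frozenset({(actor, clock[actor])})
    return clock, entries


def serialized_sizes(state: State) -> Tuple[int, int]:
    """Return (uncompressed, compressed) byte sizes of the serialized state."""
    raw = pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL)
    return len(raw), len(zlib.compress(raw, 1))


if __name__ == "__main__":
    state = build_set()
    raw, packed = serialized_sizes(state)
    n = len(state[1])
    print(f"{n} elements: {raw} bytes raw ({raw / n:.1f}/elem), "
          f"{packed} bytes compressed ({packed / n:.1f}/elem)")
```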
>
> Hope that helps; sorry we don't have more general recommendations.
> 
> [1] https://github.com/basho/riak_kv/blob/develop/src/riak_kv_stat.erl#L464-L474
> [2] https://github.com/basho/riak_dt/pull/77
> 
> Cheers,
> 
> 
>> On Fri, Feb 21, 2014 at 2:53 PM, Alexander Sicular <[email protected]> wrote:
>> Hey Gang,
>> 
>> I'm pretty excited about the new CRDT support coming in the next release. 
>> Although counters are already out, I haven't seen much in the way of 
>> performance guidelines. Where counters are a fold/merge/sum type of deal 
>> with some kind of reasonably bounded size based on update frequency, sets 
>> and maps are potentially unbounded. Is there any guidance Basho can share
>> regarding the size at which performance drops off for sets/maps? Every
>> feature has some theoretical vs. practical limit (e.g., links); I'm
>> wondering what those are for the new set/map features. Obviously, YMMV.
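For comparison with the counters described above, a state-based grow-only counter (G-counter) can be sketched as a map from actor to count, where merge takes the per-actor maximum and the value is the sum. This Python sketch (function names hypothetical) also shows that such a counter's size is bounded by the number of distinct actors that have incremented it, not by how many updates it has received. A PN-counter pairs two of these maps, one for increments and one for decrements.

```python
from typing import Dict

GCounter = Dict[str, int]  # actor id -> that actor's total increments


def increment(counter: GCounter, actor: str, amount: int = 1) -> GCounter:
    if amount < 0:
        raise ValueError("a G-counter only grows")
    return {**counter, actor: counter.get(actor, 0) + amount}


def merge(a: GCounter, b: GCounter) -> GCounter:
    # Per-actor max is commutative, associative and idempotent, so replicas
    # converge no matter how often or in what order they exchange state.
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    return sum(counter.values())
```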
>> 
>> Thanks,
>> 
>> -Alexander Sicular
>> 
>> @siculars
>> 
>> 
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 
> 
> -- 
> Sean Cribbs <[email protected]>
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/
