Small correction, thanks to Steve Vinoski: term_to_binary is reduction-counted and interruptible since R16, and so shouldn't cause scheduler issues anymore.
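
For anyone who wants to look at the serialization cost themselves, here is a minimal sketch of the size comparison mentioned in the quoted message below. It makes several assumptions: a plain ordsets value stands in for the real riak_dt set (so there is no actor or causal metadata, and the numbers will not match the ones quoted), `{compressed, 1}` is taken to be "the lowest level of compression", and the module name is made up for illustration.

```erlang
%% Hypothetical module, for illustration only; not part of riak_dt.
-module(set_size_sketch).
-export([sizes/0]).

%% Returns {UncompressedExternalSize, CompressedByteSize} for a set of
%% 1000 integers stored as 32-bit binaries.
sizes() ->
    %% Assumption: a plain ordsets value instead of the real CRDT set.
    S = ordsets:from_list([<<I:32>> || I <- lists:seq(1, 1000)]),
    %% Assumption: level 1 is the "lowest level of compression".
    SB = term_to_binary(S, [{compressed, 1}]),
    %% Sanity check: the binary decodes back to the same term.
    S = binary_to_term(SB),
    {erlang:external_size(S), byte_size(SB)}.
```

Calling `set_size_sketch:sizes()` in a shell gives the two sizes side by side; the gap between them is the effect of compression on this stand-in structure only.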

Sean Cribbs

> On Feb 21, 2014, at 5:05 PM, Sean Cribbs <[email protected]> wrote:
>
> Hi Alex,
>
> We're not yet certain of all the performance characteristics of our datatypes
> to give general recommendations. In order to head some of this off at the
> pass we have added a bunch of metrics around them[1], including actor-count
> and merge time. Of course, the general recommendations about Riak object size
> still apply. Nevertheless, here are a few things we know could be impactful:
>
> * Sets and maps use the 'orddict' Erlang type/module for in-memory. Lookups,
>   inserts, and modifications have known O(N) behavior. We are considering
>   porting Elixir's hashdict/hashset for this purpose.
>
> * We previously had custom binary formats, but reverted sets and maps to t2b
>   with compression on. The BIF is simply faster than even our best
>   pattern-matching code, and the compression is good enough to counter any
>   bloat t2b might introduce (we use the lowest level of compression). Still,
>   using a BIF in the critical path may have an impact on scheduler balance. [2]
>
> * In order to eliminate non-actor-identifier garbage, sets and maps use a
>   somewhat complicated two-way merge function. There are likely areas for
>   improvement there.
>
> * Right now, the "context" payload which lets you safely remove items from sets
>   and maps is essentially an encoded version of the original datatype. Russell
>   is working on reducing the size of that (down to just a causal token,
>   perhaps) but we are unsure whether that will make it into 2.0.
>
> * Both sets and maps have a constant overhead for the entire structure and a
>   constant overhead per-entry. Although it is proportionally small, this can
>   add up quickly. For example, I have this little script which adds 1000
>   integers to a set as 32-bit binaries, and assumes an equal distribution of
>   updates among 3 replicas. If S is the set and SB is the serialized/binary
>   version, erlang:external_size(S) == 40783 but erlang:byte_size(SB) == 4867
>   (4K and some change, really).
>
> Hope that helps, sorry we don't have more general recommendations.
>
> [1] https://github.com/basho/riak_kv/blob/develop/src/riak_kv_stat.erl#L464-L474
> [2] https://github.com/basho/riak_dt/pull/77
>
> Cheers,
>
>> On Fri, Feb 21, 2014 at 2:53 PM, Alexander Sicular <[email protected]>
>> wrote:
>> Hey Gang,
>>
>> I'm pretty excited about the new CRDT support coming in the next release.
>> Although counters are already out, I haven't seen much in the way of
>> performance guidelines. Where counters are a fold/merge/sum type of deal
>> with some kind of reasonably bounded size based on update frequency, sets
>> and maps are potentially unbounded. Is there any guidance Basho can share in
>> regards to at which size performance will drop off for set/maps. Every
>> feature has some theoretical vs practical limit, ie. links, wondering what
>> those are for the new set/map features. Obviously, ymmv.
>>
>> Thanks,
>>
>> -Alexander Sicular
>>
>> @siculars
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> --
> Sean Cribbs <[email protected]>
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/
