Hi Justin,

I wanted to write this earlier, but I just had too much on my plate:

On 08.06.2011 16:11, Justin Sheehy wrote:
On Thu, Mar 24, 2011 at 1:51 PM, Nico Meyer<[email protected]>  wrote:

The bigger concern for me would be the way the bucket/key tuple is
serialized:

Eshell V5.8  (abort with ^G)
1>  iolist_size(term_to_binary({<<>>,<<>>})).
13

That's 13 bytes of overhead per key where only 2 bytes are needed with
reasonable bucket/key length limits of 256 bytes each. Or if that is not
enough, one could also use a variable-length encoding, so buckets/keys
can be arbitrarily large and the most common cases (less than 128 bytes)
still only use 2 bytes of overhead.
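
To make the variable-length idea concrete, here is a minimal C++ sketch of one possible scheme: a base-128 length prefix with 7 payload bits per byte and the high bit set on every byte except the last. The function names are hypothetical and this is not the encoding bitcask actually uses; it only illustrates why lengths below 128 cost one byte each.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only, not bitcask's real format.
// Writes n as a base-128 length prefix and returns the bytes written (1..10).
constexpr std::size_t encode_len(std::uint64_t n, unsigned char* out) {
    std::size_t i = 0;
    while (n >= 0x80) {
        // Low 7 bits plus a continuation flag in the high bit.
        out[i++] = static_cast<unsigned char>((n & 0x7F) | 0x80);
        n >>= 7;
    }
    out[i++] = static_cast<unsigned char>(n);  // final byte, high bit clear
    return i;
}

// Reads a prefix written by encode_len; stores the bytes consumed in *used.
// Assumes well-formed input (no bounds checking in this sketch).
constexpr std::uint64_t decode_len(const unsigned char* in, std::size_t* used) {
    std::uint64_t n = 0;
    std::size_t i = 0;
    int shift = 0;
    while (in[i] & 0x80) {
        n |= static_cast<std::uint64_t>(in[i++] & 0x7F) << shift;
        shift += 7;
    }
    n |= static_cast<std::uint64_t>(in[i++]) << shift;
    *used = i;
    return n;
}

// Prefix bytes needed for one bucket/key pair of the given lengths.
constexpr std::size_t prefix_overhead(std::uint64_t bucket_len,
                                      std::uint64_t key_len) {
    unsigned char buf[10] = {};
    const std::size_t bucket_bytes = encode_len(bucket_len, buf);
    const std::size_t key_bytes = encode_len(key_len, buf);
    return bucket_bytes + key_bytes;
}

// True if n survives an encode/decode round trip.
constexpr bool round_trips(std::uint64_t n) {
    unsigned char buf[10] = {};
    std::size_t used = 0;
    const std::size_t written = encode_len(n, buf);
    return decode_len(buf, &used) == n && used == written;
}
```

With this scheme, prefix_overhead(10, 20) is 2, and the overhead only grows to 3 once either length reaches 128.
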
I've made a branch of bitcask that effectively does this.  It uses 3
bytes per record instead of 13, saving 10 bytes (both in RAM and on
disk) per element stored.

The tricky thing, however, is backward compatibility.  There are many
Riak installations out there with data stored in bitcask using the old
key encoding, and we shouldn't force them all to do a very costly
full-sweep of their existing data in order to get these savings.  When
we sort out the best way to manage a smooth upgrade, I would happily
push out the smaller encoding.


I think the possible gains of this change are fairly limited. Shaving off about 10 bytes per key, compared to 43 bytes of overhead plus, let's say, at least 10 bytes for bucket and key combined, is already less than 20 percent savings (10 of at least 53 bytes). The saving seems even smaller if you consider the overhead imposed by the memory allocator. I wrote a small test program in C++ which allocates one million blocks of memory of a given size and prints the overhead per allocation. It turns out the overhead ranges from 8 to 23 bytes in a sawtooth-like pattern (on a 64-bit Linux machine):

size=56: overhead=8
size=57: overhead=23
size=58: overhead=22
size=59: overhead=21
size=60: overhead=20
size=61: overhead=19
size=62: overhead=18
size=63: overhead=17
size=64: overhead=16
size=65: overhead=15
size=66: overhead=14
size=67: overhead=13
size=68: overhead=12
size=69: overhead=11
size=70: overhead=10
size=71: overhead=9
size=72: overhead=8

Not much you can do about that, unless one wants to use unaligned memory, which one doesn't.
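
The sawtooth can be reproduced with a small model. This is only a sketch, not the test program mentioned above: it assumes each allocation carries an 8-byte header and that chunk sizes are rounded up to a multiple of 16 bytes with a 32-byte minimum. Those three constants are assumptions chosen because they match the figures listed; other allocators will behave differently.

```cpp
#include <cstddef>

// Assumed allocator parameters (not taken from the measurement itself):
constexpr std::size_t kHeader = 8;     // per-chunk bookkeeping bytes
constexpr std::size_t kAlign = 16;     // chunk sizes are multiples of this
constexpr std::size_t kMinChunk = 32;  // smallest chunk handed out

// Bytes actually consumed by a request of the given size under the model.
constexpr std::size_t chunk_size(std::size_t request) {
    const std::size_t padded =
        (request + kHeader + kAlign - 1) / kAlign * kAlign;
    return padded < kMinChunk ? kMinChunk : padded;
}

// Overhead per allocation: consumed bytes minus requested bytes.
constexpr std::size_t overhead(std::size_t request) {
    return chunk_size(request) - request;
}

// Real memory freed when a record shrinks from old_request to new_request.
constexpr std::size_t bytes_saved(std::size_t old_request,
                                  std::size_t new_request) {
    return chunk_size(old_request) - chunk_size(new_request);
}
```

Under these assumptions overhead(56) is 8, overhead(57) jumps to 23, and overhead(72) is back to 8, as in the list. It also shows why the savings are lumpy: bytes_saved(66, 56) is 16, while bytes_saved(72, 62) is 0, so trimming 10 bytes from a record frees either a whole 16-byte step or nothing, depending on where the size falls.
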


-Justin


Cheers,
Nico


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
