On Mon, May 23, 2011 at 10:53:29PM -0700, Anthony Molinaro wrote:
>
> On Mon, May 23, 2011 at 09:57:25PM -0600, David Smith wrote:
> > On Mon, May 23, 2011 at 9:39 PM, Anthony Molinaro
> > Thus, depending on
> > your merge triggers, more space can be used than is strictly necessary
> > to store the data.
>
> So the lack of any overhead in the calculation is expected? I mean
> according to http://wiki.basho.com/Cluster-Capacity-Planning.html
>
> Disk = Estimated Total Objects * Average Object Size * n_val
>
> Which just seems wrong, doesn't it? I don't quite understand the
> bitcask code well enough yet to see what the actual data it stores is,
> but the whitepaper suggested several things were involved in the on
> disk representation.

Okay, I finally found the code for this part. I kept looking in the NIF,
but that only holds the keydir, not the data files. The write path looks like

    %% Setup io_list for writing -- avoid merging binaries if we can help it
    Bytes0 = [<<Tstamp:?TSTAMPFIELD>>, <<KeySz:?KEYSIZEFIELD>>,
              <<ValueSz:?VALSIZEFIELD>>, Key, Value],
    Bytes  = [<<(erlang:crc32(Bytes0)):?CRCSIZEFIELD>> | Bytes0],

Looking at the header file for the field sizes, it seems there are 14 bytes
of overhead per entry (4 for CRC, 4 for timestamp, 2 for keysize, 4 for
valsize).
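
To double-check the 14-byte figure, here is a rough Python sketch of the same
layout. It is not the bitcask code: the function name encode_entry is made up
for illustration, the field widths are simply the byte counts listed above,
and big-endian packing is an assumption based on Erlang's default for binaries.

```python
import struct
import zlib

# Per-entry overhead from the counts above: crc + timestamp + keysize + valsize
HEADER_BYTES = 4 + 4 + 2 + 4


def encode_entry(tstamp: int, key: bytes, value: bytes) -> bytes:
    """Hypothetical helper mirroring the io_list built in the Erlang snippet."""
    # Body corresponds to Bytes0: timestamp, key size, value size, key, value.
    body = struct.pack(">IHI", tstamp, len(key), len(value)) + key + value
    # The CRC is computed over the body and prepended, as in Bytes.
    return struct.pack(">I", zlib.crc32(body)) + body


entry = encode_entry(1306216409, b"k" * 36, b"v" * 36)
print(len(entry) - 36 - 36)  # bytes of overhead for one entry
```

With a 36-byte key and a 36-byte value, each entry comes out at 86 bytes.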
So the disk calculation should be

    ( 14 + Key + Value ) * Num Entries * N_Val

Using my numbers from before, that gives

    ( 14 + 36 + 36 ) * 183915891 * 3 = 47450299878 bytes = 44.2 GB

which actually isn't much closer to 341 GB than the previous calculation :(
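
For anyone who wants to plug in their own numbers, the arithmetic above can be
sketched in a few lines of Python. The function name estimate_disk_bytes is
just something I made up, and it only accounts for the 14-byte header, so it
ignores merge behaviour, hint files and anything else that may use space.

```python
def estimate_disk_bytes(key_bytes, value_bytes, num_entries, n_val,
                        header_bytes=14):
    """Hypothetical estimate: (header + key + value) * entries * n_val."""
    return (header_bytes + key_bytes + value_bytes) * num_entries * n_val


total = estimate_disk_bytes(36, 36, 183915891, 3)
print(total)                    # 47450299878
print(round(total / 2**30, 1))  # 44.2 (dividing by 1024^3)
```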
So all my questions from the previous email still apply.
-Anthony
--
------------------------------------------------------------------------
Anthony Molinaro <[email protected]>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com