On Mon, May 23, 2011 at 09:57:25PM -0600, David Smith wrote:
> On Mon, May 23, 2011 at 9:39 PM, Anthony Molinaro
> <[email protected]> wrote:
> >
> >         Bitcask-Capacity-Planning  Cluster-Capacity-Planning  Reality
> >   RAM   34.9 GB                    34.9 GB                    70 GB
> >   Disk  102 GB                     18.49 GB                   341 GB
> >
> > So it looks to me like the numbers for RAM are about 1/2 of actual and
> > the number for Disk are completely off, they are different depending on
> > which page you look at on the wiki and vastly underestimate reality.
>
> So RAM would require a little digging to figure out;

Anything I can do there to help? I'd really like to get to the bottom of
the discrepancy with these numbers. I assume everything is stored as
binaries, and I'm not seeing some sort of 64-bit doubling (I know I
convert my keys and values to binaries before sending them to riak).

Here's the output of memory/0 on an attached shell:

  ([email protected])1> memory().
  [{total,7281790968},
   {processes,18543872},
   {processes_used,18132704},
   {system,7263247096},
   {atom,825105},
   {atom_used,815183},
   {binary,603512},
   {code,8306646},
   {ets,536440}]

So nearly all of it is used by system, which I assume is the keydirs in
the driver.

Also, does the number of partitions impact this value at all? I have
1024 total on 8 nodes. The ring currently looks like this:

  ring_ownership : <<"[{'[email protected]',128},\n
                       {'[email protected]',128},\n
                       {'[email protected]',129},\n
                       {'[email protected]',129},\n
                       {'[email protected]',128},\n
                       {'[email protected]',128},\n
                       {'[email protected]',126},\n
                       {'[email protected]',128}]">>

That also seems a bit odd. I would expect them all to be 128, but anyway.

> disk is easier to
> explain. The disk calculations do not take into account (as best I can
> tell) the fact that bitcask is an append-only store and requires
> periodic merging/compaction of the on-disk files.

Is there any way to force a merge/compaction so I can better understand
my usage? With cassandra I could run compactions with their nodetool,
but riak-admin doesn't seem to have any such controls, unless a backup
causes merging to occur.

> Thus, depending on
> your merge triggers, more space can be used than is strictly necessary
> to store the data.

So the lack of any overhead in the calculation is expected? I mean,
according to http://wiki.basho.com/Cluster-Capacity-Planning.html

  Disk = Estimated Total Objects * Average Object Size * n_val

Which just seems wrong, doesn't it?
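
To put numbers on the gap, here is a minimal Python sketch of the wiki's
disk formula and of the observed/estimated ratios from the table at the
top of this message. The function names are made up for illustration, and
the inputs in the last lines (one million 1 KiB objects, n_val of 3) are
hypothetical, not figures from this cluster. The sketch models only the
formula as quoted; it makes no attempt to model bitcask's on-disk format
or merge behaviour.

```python
def naive_disk_bytes(total_objects, avg_object_bytes, n_val):
    """Disk = Estimated Total Objects * Average Object Size * n_val,
    as quoted from the Cluster-Capacity-Planning page. No per-entry
    overhead and no space for unmerged data is included."""
    return total_objects * avg_object_bytes * n_val


def overhead_factor(observed, estimated):
    """How many times larger the observed usage is than the estimate."""
    return observed / estimated


# Figures taken from the table in this thread (GB).
ram_estimate_gb = 34.9
ram_observed_gb = 70.0
disk_estimates_gb = {
    "Bitcask-Capacity-Planning": 102.0,
    "Cluster-Capacity-Planning": 18.49,
}
disk_observed_gb = 341.0

print(f"RAM: observed/estimated = "
      f"{overhead_factor(ram_observed_gb, ram_estimate_gb):.1f}x")
for page, estimate in disk_estimates_gb.items():
    factor = overhead_factor(disk_observed_gb, estimate)
    print(f"Disk ({page}): observed/estimated = {factor:.1f}x")

# Hypothetical inputs, only to show how the formula is applied.
example = naive_disk_bytes(total_objects=1_000_000,
                           avg_object_bytes=1024,
                           n_val=3)
print(f"naive estimate for the hypothetical data set: {example} bytes")
```

With the table's numbers, observed disk usage is roughly 3x one page's
estimate and roughly 18x the other's, while RAM is about 2x.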

I don't understand the bitcask code well enough yet to see what data it
actually stores, but the whitepaper suggested several things were
involved in the on-disk representation.

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                                   <[email protected]>

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
