Greetings Ben,
Also, leveldb stores data in "levels". The very first storage level and the
runtime data recovery log are not compressed.
That said, I agree with Tom that you are most likely seeing Riak store 3 copies
of your data versus only one for mongodb. It is possible to dumb down Riak so
that it is closer to mongodb:
1. In app.config, look for the riak_core options and add the following line:
     {default_bucket_props, [{n_val,1}]},
   This will default the system to storing only one copy of your data.
2. If you are using Riak 1.3, again in app.config, look for the riak_kv options
   and change this
     {anti_entropy, {on, []}},
   to
     {anti_entropy, {off, []}},
   This will disable Riak's automatic detection and correction of data loss /
   corruption. That feature adds roughly 1 to 2% to the data on disk.
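
For reference, here is a rough sketch of where the two settings would sit in
app.config once both changes are made. It assumes the usual layout, with one
riak_core section and one riak_kv section; every other setting is elided and
should be left exactly as it is in your own file.

```erlang
%% Sketch only: the rest of each section is elided, keep your existing entries.
[
 {riak_core, [
     %% Store one copy of each object instead of the default three.
     {default_bucket_props, [{n_val, 1}]}
     %% ... your other riak_core settings ...
 ]},
 {riak_kv, [
     %% Riak 1.3 only: turn off active anti-entropy.
     {anti_entropy, {off, []}}
     %% ... your other riak_kv settings ...
 ]}
].
```

app.config is read at startup, so the node needs a restart before either change
takes effect.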
Matthew
On Apr 10, 2013, at 9:01 AM, Tom Santero <[email protected]> wrote:
> Hi Ben,
>
> First, allow me to welcome you to the list! Stick around, I think you'll like it
> here. :)
>
> How many nodes of Riak are you running vs how many nodes of Mongo?
>
> How much more disk space did Riak take?
>
> Riak is designed to run as a cluster of several nodes, utilizing replication
> to provide resiliency and high-availability during partial failure. By
> default Riak stores three replicas of every object you persist. If you are
> only running a single node of Riak for your testing purposes, I suspect this
> may explain the significant divergence you're seeing compared with the
> disk space used by a single mongo node, as each replica in Riak is being stored to
> the same disk.
>
> Also, Snappy is optimized for speed over disk utility, which will have a
> negligible impact on total disk usage when compared to other compression
> libraries such as zlib, etc. That said, for sufficiently large JSON files I
> know that BSON's prefixes can add significant overhead to object sizes such
> that BSON is actually heavier than the JSON it represents. What is the
> average size of the documents you're seeking to store?
>
> Could you tell us a bit more about what you're trying to achieve with both
> Riak and Mongo, respectively?
>
> Tom
>
> On Wed, Apr 10, 2013 at 12:39 AM, Ben McCann <[email protected]> wrote:
> Hi,
>
> I'm currently storing data in MongoDB and would like to evaluate Riak as an
> alternative. Riak is appealing to me because LevelDB uses Snappy, so I would
> expect it to take less disk space to store my data set than MongoDB which
> does not use compression. However, when I benchmarked it by inserting a few
> hundred thousand JSON records into each datastore, Riak in fact took far more
> disk space. I'm wondering if there's something I might be missing here as a
> newcomer to Riak. E.g. I checked the disk space used by running "du -ch
> /var/lib/riak/leveldb". Is this perhaps not a good way to check disk space
> usage because perhaps Riak/LevelDB preallocates files? (I know MongoDB does
> this and has a built-in db.collection.stats command to provide true disk
> usage information). Are there any other reasons why Riak might be taking more
> space or anything I could have screwed up?
>
> Thanks,
> Ben
>
> --
> about.me/benmccann
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com