Lots of sparse columns. Efficient like Cassandra? Some measures of my dataset

gbrits Wed, 17 Jul 2013 04:39:50 -0700

Somewhere (can't find it now) I've read that Riak, like Cassandra, could be
classified as a column store.

This is just a name of course, but what I understand from Cassandra is that
this allows for space-efficient encoding of column values. Basically, storage
is organized around columns instead of rows, allowing for different
persistence strategies on a per-column, or per-column-family, basis. Moreover,
it would allow for zero storage overhead for non-existent column values,
i.e. it allows for efficient storage of sparse datasets.
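
To make concrete what I mean by "zero overhead for non-existent columns",
here is a minimal sketch in Python. It is a toy model only, not Riak or
Cassandra code; the names (sparse_rows, dense_cells, sparse_cells, get) and
the sample data are made up for illustration.

```python
# Illustrative only: a toy model of sparse storage.
# Nothing here is Riak or Cassandra API; all names are hypothetical.

COLUMNS = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]

# Sparse layout: each key maps only to the columns it actually has.
sparse_rows = {
    "key-a": {"c1": b"x", "c6": b"y"},
    "key-b": {"c3": b"z"},
}


def dense_cells(rows, columns):
    """Cells a dense, fixed-schema layout would have to allocate."""
    return len(rows) * len(columns)


def sparse_cells(rows):
    """Cells actually stored when absent columns cost nothing."""
    return sum(len(cols) for cols in rows.values())


def get(rows, key, column):
    """Read a cell; a missing key or column simply yields None."""
    return rows.get(key, {}).get(column)


print(dense_cells(sparse_rows, COLUMNS))  # 16
print(sparse_cells(sparse_rows))          # 3
print(get(sparse_rows, "key-a", "c2"))    # None
```

In this toy model only 3 of the 16 possible cells take up space, which is
the property I am asking about.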

Does Riak have this property as well?

More specifically, I've got a data structure on paper with the following
properties, when mapped to Riak nomenclature:

- ~1,000,000 keys (will not grow)
- ~1,000 columns (may grow)
- Any particular key has a median of ~50 columns. In other words, the entire
set is ~95% sparse.
- Wherever a key has a value for a particular column, that value is always
exactly a String (base 255) of 4KB length.
- The 4KB values themselves are pretty 'sparse', so they would benefit a lot
from run-length encoding. Is this supported out of the box?
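
For a rough sense of scale, the numbers above can be worked out as below,
together with a naive run-length encoder to show what I have in mind for the
values. This is only a sketch under assumptions of mine: I treat the median
of ~50 columns as if it were the average, I take 4KB to mean 4096 bytes, and
the (count, byte) pair format with runs capped at 255 is my own illustration,
not a Riak feature. The sample value is made up.

```python
# Back-of-the-envelope sizing plus a naive byte-level run-length encoder.
# Assumptions (mine): median ~ average, 4KB = 4096 bytes, and a simple
# (count, byte) pair format with runs capped at 255.

KEYS = 1_000_000
COLUMNS = 1_000
COLUMNS_PER_KEY = 50      # median from the list above, used as an average
VALUE_BYTES = 4096


def raw_bytes_sparse():
    """Raw value bytes if only existing columns are stored."""
    return KEYS * COLUMNS_PER_KEY * VALUE_BYTES


def raw_bytes_dense():
    """Raw value bytes if every key had to carry every column."""
    return KEYS * COLUMNS * VALUE_BYTES


def rle_encode(data: bytes) -> bytes:
    """Encode data as repeated (run_length, byte) pairs, run_length <= 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        out += bytes([encoded[j + 1]]) * encoded[j]
    return bytes(out)


# A made-up, mostly-zero 4096-byte value.
value = bytes(4000) + b"\x01\x02" + bytes(94)

print(raw_bytes_sparse() / 1e9, "GB sparse")
print(raw_bytes_dense() / 1e12, "TB dense")
print(len(value), "->", len(rle_encode(value)), "bytes after RLE")
```

So before any per-value compression, the existing cells alone come to
roughly 200 GB, against roughly 4 TB if absent columns were not free.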

Given these properties, how would Riak hold up? Hard to say of course, but
I'm looking for some general advice.

Thanks.
