Just to add to Jeremiah's comments, I think you should consider whether you will be mostly retrieving:
1) all 1000 columns
2) some subset of columns
3) single columns

That will greatly influence how you design your keyspace. Remember, with Riak it's just key-value in the end. This is one of my favorite examples of building a column-like system on top of pure key-value, Boundary's "Kobayashi" system: https://vimeo.com/42902962

On Wed, Jul 17, 2013 at 7:25 AM, Jeremiah Peschka <[email protected]> wrote:

> --
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
> MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Apache Hadoop
>
> On Jul 17, 2013, at 4:38 AM, gbrits <[email protected]> wrote:
>
> > Somewhere (can't find it now) I've read that Riak, like Cassandra could be
> > classified as a column store.
>
> That is incorrect. Riak is a key value database where the value is an
> opaque blob.
>
> > This is just a name of course but what I understand from Cassandra is that
> > this allows for space-efficient encoding of column-values. Basically storage
> > is surrounded around columns instead of rows, allowing for different
> > persistence strategies on a per-column, or column-family, basis. Moreover,
> > it would allow for zero storage overhead for non-existent column values.
> > I.e: basically allowing for efficient storage of sparse data-sets.
> >
> > Does Riak have this property as well?
>
> No. Riak will happily store whatever you throw at it. That being said,
> most good serialization libraries will leave off nullable properties.
>
> > More specifically, I've got a datastructure on paper with the following
> > properties, when mapped to riak nomenclature:
> >
> > - ~ 1.000.000 keys (will not grow)
> > - ~ 1.000 columns. (may grow)
> > - 1 particular key has a median of ~50 columns. In other words the entire
> >   set is ~ 95% sparse.
> > - Wherever a key has a value for a particular column, that value is always
> >   exactly a String (base 255) of 4KB length.
> > - the 4KB values themselves are pretty 'sparse' so would benefit a lot from
> >   run-length encoding. Is this supported out of the box?
>
> See above.
>
> > Given these properties how would Riak hold up? Hard to say of course, but
> > I'm looking for some general advice.
>
> Riak objects should be no more than ~10MB for performance reasons. You
> should be safe.
>
> > Thanks.
> >
> > --
> > View this message in context: http://riak-users.197444.n3.nabble.com/Lots-of-sparse-columns-Efficient-like-Cassandra-Some-measures-of-my-dataset-tp4028367.html
> > Sent from the Riak Users mailing list archive at Nabble.com.
> >
> > _______________________________________________
> > riak-users mailing list
> > [email protected]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Sean Cribbs <[email protected]>
Software Engineer
Basho Technologies, Inc.
http://basho.com/
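
P.S. To make the three retrieval patterns above a bit more concrete, here is a rough Python sketch of two possible keyspace layouts. It is only an illustration under stated assumptions: plain dicts stand in for Riak buckets, and the bucket names, the "row/column" key format, and all helper names are made up for this example, not part of any Riak client API. Layout A (one object per key) suits "all columns" reads; layout B (one object per cell) suits single-column reads.

```python
import json

# Plain dicts stand in for Riak buckets (an assumption for illustration);
# a real client would issue get/put requests by key instead.
rows_bucket = {}   # layout A: one object per row key
cells_bucket = {}  # layout B: one object per (row key, column) pair

VALUE_SIZE = 4096  # the 4KB value size from the original question


def put_row(row_key, columns):
    """Layout A: store only the columns that exist, as one JSON object."""
    rows_bucket[row_key] = json.dumps(columns)


def get_columns(row_key, wanted=None):
    """Layout A: fetch the whole row, then filter on the client side."""
    row = json.loads(rows_bucket[row_key])
    if wanted is None:
        return row
    return {c: row[c] for c in wanted if c in row}


def put_cell(row_key, column, value):
    """Layout B: one small object per cell, keyed by 'row/column'."""
    cells_bucket[f"{row_key}/{column}"] = value


def get_cell(row_key, column):
    """Layout B: a single-column read touches only that one cell."""
    return cells_bucket.get(f"{row_key}/{column}")


def row_object_size(n_columns, value_size=VALUE_SIZE):
    """Rough payload size of a layout-A object, ignoring encoding overhead."""
    return n_columns * value_size
```

With the numbers from the question, row_object_size(50) is 204800 bytes (about 200KB) for the median key and row_object_size(1000) is 4096000 bytes (about 4MB) for a fully populated one, so layout A stays under the ~10MB guideline. The trade-off is that layout A reads the whole row even when you want one column, while layout B needs one request per column when you want many.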
