There may still be some benefit to segregating your data into multiple 
buckets, but that will largely depend on what your most common query is.  
Also consider your choice of key carefully; choosing a suitable one will let 
you bypass map/reduce entirely on certain queries.
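For example, a composite key is one way to do that. This is only a rough 
sketch (the "type/host/timestamp" scheme is illustrative, not anything 
Riak prescribes), but with keys built this way, fetching a known event 
becomes a plain GET rather than a map/reduce job:

```javascript
// Hypothetical key scheme: "<type>/<host>/<timestamp>".
// If the client knows all three parts, it can GET the object
// directly instead of running a map/reduce query over the bucket.
function eventKey(evt) {
  return [evt.type, evt.host, evt.timestamp].join("/");
}

var key = eventKey({
  type: "bws.stats",
  host: "10.1.55.101",
  timestamp: 1234567890
});
console.log(key); // "bws.stats/10.1.55.101/1234567890"
```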

That said, Alexander is right that pre-computation will help you immensely.  
You'll have to do so either in your application or in post-commit hooks.
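To illustrate the application-side variant: the idea is to fold a batch of 
events into a summary object and store that summary in its own bucket under 
a predictable key (say, per type and day), so the client reads one small 
object instead of scanning events. The function below is a sketch under 
those assumptions; the bucket/key naming and the PUT step are left to your 
client library:

```javascript
// Sketch of application-side pre-computation: reduce a day's worth of
// events into a summary you would then PUT into a "summaries" bucket
// under a key like "bws.stats/2010-04-20" (names are illustrative).
function summarize(events) {
  var summary = { count: 0, failures: 0 };
  events.forEach(function (evt) {
    summary.count += 1;
    if (evt.status === "FAIL") summary.failures += 1;
  });
  return summary;
}

var daySummary = summarize([
  { status: "SUCCESS" },
  { status: "FAIL" },
  { status: "SUCCESS" }
]);
console.log(daySummary); // { count: 3, failures: 1 }
```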

Sean Cribbs <[email protected]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Apr 20, 2010, at 12:04 AM, Alexander Sicular wrote:

> I would go with A. An advantage of your data is that it is immutable. Since 
> it never changes, you can do extensive m/r pre-computations and have them 
> run continuously on some schedule. I would also probably spend some time 
> thinking about the new pre/post-commit hook features to build additional 
> indexes in other buckets, in particular keys per time slice.
> 
> Do let us know how you proceed!
> 
> -Alexander
> 
> 
> @siculars on twitter
> http://siculars.posterous.com
> 
> Sent from my iPhone
> 
> On Apr 19, 2010, at 21:36, Brandon Smith <[email protected]> wrote:
> 
>> I am looking to use Riak as a data store for events in our system.
>> Namely, I have a handful of event types now and anticipate much more
>> later.
>> 
>> Consider the following "event":
>> 
>> {
>>   "type" : "bws.stats",
>>   "host" : "10.1.55.101",
>>   "description" : "human friendly description",
>>   "details" : "string or Object",
>>   "timestamp" : 1234567890,
>>   "status" : "SUCCESS|FAIL"
>> }
>> 
>> A JavaScript client will retrieve events and display them on a webpage
>> and filter the data based on:
>> 
>> 1) show just one event type
>> 2) show all events for a given host
>> 3) show all events for last Wednesday
>> 4) show all events that had a status of SUCCESS
>> 5) show the 10 events before a specific FAIL that matches the same host and 
>> type
>> 
>> There are other ways that we plan to query the data, but I think you
>> get the idea.
>> 
>> The event data payload is flexible if the above is not optimal for Riak.
>> 
>> Thoughts on how to best implement in Riak?
>> 
>> I've considered several approaches...
>> 
>> A) Store all events in one bucket and build out extensive,
>> parameterized map/reduce functionality in order to return specialized
>> data sets as described above
>> B) Store all events in separate buckets (based on type?)... but where
>> does that leave me when needing to pivot on other fields
>> C) Have as many buckets as I want to build data sets and duplicate
>> data across the buckets... i.e. a "type" bucket, "host" bucket, and
>> "status" bucket... except this starts to get a bit painful in order to
>> populate buckets that I didn't anticipate
>> D) Store all events in one bucket and have categorical buckets whose
>> entries are NOT duplicate, but rather link to the main "event" bucket
>> 
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 

