Unfortunately, there is no offset, limit, sorted set, or pagination built into Riak. You kinda need to maintain your own index for any of that and/or rely on m/r in a less-than-real-time sort of way. I would take your day keys and make them smaller, like hours or even minutes. The number of keys in your bucket is only a concern if you are not maintaining your own key index somewhere else; if you are not, you are forced to hit the bucket list-keys function, which is one of the most costly things you can do in Riak. Again, the specifics will determine what is best. Also, I would drop the "_" from the key for brevity ;)
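
To make that concrete, here's a Python sketch of the pattern: bucket
events into minute-granularity slice keys, and emulate LIMIT/OFFSET
client-side by slicing your own index (the bucket/key names are made up
for illustration):

```python
import time

def slice_key(ts, granularity=60):
    """Bucket a Unix timestamp into a minute-granularity slice key,
    e.g. 1234567890 -> '200902132331' (UTC)."""
    t = time.gmtime(ts - ts % granularity)
    return time.strftime("%Y%m%d%H%M", t)

def page(index_keys, offset, limit):
    """Emulate LIMIT/OFFSET by slicing the index client-side;
    Riak itself has no built-in pagination."""
    return index_keys[offset:offset + limit]

# an index you maintain yourself, kept in chronological order
index = ["call/%d" % n for n in range(1234567890, 1234567900)]
print(page(index, 2, 3))   # three keys starting at the third
```

You'd GET the slice key's index object, then fetch (or m/r over) only
the page of keys you need.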

Riak gets you insane persisted storage and access to raw data primitives in a super friendly HTTP wrapper (+Erlang, +protocol buffers), but doesn't do some of that more abstracted stuff. That's an exercise for the user.

Take a good look at pairing with redis for sorted sets.
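
Redis sorted sets (ZADD/ZRANGEBYSCORE) give you exactly the
score-ordered, range-queryable index Riak lacks. A stdlib stand-in to
show the idea, with Riak keys as members scored by timestamp (this is a
toy model of the sorted-set semantics, not the Redis client itself):

```python
import bisect

class SortedIndex:
    """Toy stand-in for a Redis sorted set: members (Riak keys)
    kept ordered by a numeric score (the event timestamp)."""
    def __init__(self):
        self._entries = []          # sorted list of (score, member)

    def zadd(self, score, member):
        bisect.insort(self._entries, (score, member))

    def zrangebyscore(self, lo, hi):
        left = bisect.bisect_left(self._entries, (lo, ""))
        right = bisect.bisect_right(self._entries, (hi, "\uffff"))
        return [m for _, m in self._entries[left:right]]

idx = SortedIndex()
idx.zadd(1234567890, "call/1234567890")
idx.zadd(1234567950, "call/1234567950")
idx.zadd(1234567800, "call/1234567800")
print(idx.zrangebyscore(1234567850, 1234567999))
```

With real Redis you'd ZADD the Riak key on write and ZRANGEBYSCORE (or
ZRANGE with an offset/count) to page chronologically.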

I believe only the post-commit hook is Erlang-only.

-Alexander

@siculars on twitter
http://siculars.posterous.com

Sent from my iPhone

On Apr 20, 2010, at 2:17, Brandon Smith <[email protected]> wrote:

I think I understand your suggestion regarding an additional index in
other buckets.

To use your "keys per time slice" example, if I wanted to get all of
last month's phone call records, I will have already built up a
bucket-key pair that contains a list of keys...

GET /riak/callreport/2010_03

[["call","1234567890"],["call","1234567891"],["call","1234567892"]]


With this list, I could then use it as input for a Map/Reduce job. I
can see how these types of inverted indices can be built up through
the commit hooks (although at present, it seems providing an Erlang
function is the only option, right?).
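
For reference, feeding that key list into a map/reduce job over HTTP
would be a POST to Riak's /mapred endpoint with a JSON job spec; a
sketch of building one (Riak.mapValuesJson is one of Riak's built-in
JavaScript map functions, which returns each object's decoded JSON
value):

```python
import json

# the inverted index fetched from /riak/callreport/2010_03
index = [["call", "1234567890"],
         ["call", "1234567891"],
         ["call", "1234567892"]]

# job spec to POST to /mapred with Content-Type: application/json
job = {
    "inputs": index,              # explicit bucket/key pairs,
                                  # avoiding a full bucket key list
    "query": [
        {"map": {"language": "javascript",
                 "name": "Riak.mapValuesJson"}}
    ],
}
print(json.dumps(job))
```

The point being that the inputs are the index entries themselves, so
the job never has to list the whole bucket.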

Frankly, if I've got it right, it seems that this pattern is ripe for
being baked into Riak by way of some declarative means... or through
some pre-built commit functions.

Nevertheless, there is one conceptual leap that I am failing to make
about Riak...

1) How do I do the equivalent of a LIMIT/OFFSET in Riak?

I will be storing a very large number of bucket-key pairs, of which I
need to retrieve a subset, always ordered chronologically. I presume
that if an inverted index like the one shown above is maintained as an
array, chronological order can be preserved implicitly without having
to apply any kind of forced ordering function.

However, how can I page through the results of a dataset? Even the
inverted index may grow sufficiently large (I estimate a constant
57,600 events a day just for the initial event type Riak will be
taking on; other event types not initially introduced will add more as
the number of customers in our multi-tenant system increases). Is
there a pattern for paging through data in Riak?

Brandon

On Tue, Apr 20, 2010 at 12:04 AM, Alexander Sicular <[email protected]> wrote:
I would go with A. An advantage of your data is that it is immutable.
Since it never changes, you can do extensive m/r pre-computations and
have them run continuously on some frequency. I would also probably
spend some time thinking about the new pre/post hook features to
potentially build additional indexes in other buckets, in particular
keys per time slice.

Do let us know how you proceed!

-Alexander


@siculars on twitter
http://siculars.posterous.com

Sent from my iPhone

On Apr 19, 2010, at 21:36, Brandon Smith <[email protected]> wrote:

I am looking to use Riak as a data store for events in our system.
Namely, I have a handful of event types now and anticipate much more
later.

Consider the following "event":

{
 "type" : "bws.stats",
 "host" : "10.1.55.101",
 "description" : "human friendly description",
 "details" : "string or Object",
 "timestamp" : 1234567890,
 "status" : "SUCCESS|FAIL"
}

A JavaScript client will retrieve events and display them on a webpage
and filter the data based on:

1) show just one event type
2) show all events for a given host
3) show all events for last Wednesday
4) show all events that had a status of SUCCESS
5) show the 10 events before a specific FAIL that matches the same host
and type

There are other ways that we plan to query the data, but I think you
get the idea.

The event data payload is flexible if the above is not optimal for Riak.

Thoughts on how to best implement in Riak?

I've considered several approaches...

A) Store all events in one bucket and build out extensive,
parameterized map/reduce functionality in order to return specialized
data sets as described above
B) Store all events in separate buckets (based on type?)... but where
does that leave me when I need to pivot on other fields?
C) Have as many buckets as I want to build data sets and duplicate
data across the buckets... i.e. a "type" bucket, "host" bucket, and
"status" bucket... except this starts to get a bit painful when I have
to populate buckets that I didn't anticipate
D) Store all events in one bucket and have categorical buckets whose
entries are NOT duplicates, but rather links to the main "event" bucket

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

