Hi, I'm currently using Riak for my project. It works well for single
documents, but I often need to present users with a stream of (loosely)
time-ordered documents. Riak's keys are unordered by nature, so there's no
straightforward way of traversing the data. I came up with the following
approach:

Make a bucket (e.g. "pages") with allow_mult set to true. Inside this
bucket, store a number that points to the "current" page; this number is
initialized to 0, and I call it a cursor. For every "page" of data, create
an object in the same bucket: the first page is associated with the key
page_0, the second page with page_1, etc. These page objects are sets,
modeled using statebox for conflict resolution.
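Concretely, the layout is just one cursor key plus numbered page keys. A
minimal sketch (the key names here are my own convention, nothing Riak
mandates):

```python
# Key scheme for the "pages" bucket (names are illustrative):
CURSOR_KEY = "cursor"

def page_key(n):
    """Key of the n-th page object: page_0, page_1, ..."""
    return "page_%d" % n
```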

When a document is inserted, read the cursor value. Since the cursor can
only increase, we resolve conflicts by choosing the largest value among
the siblings. Next, read the page it points to (if the cursor is 0, read
the key "page_0"; if it is 1, read "page_1", etc.). If the number of
objects in this set exceeds the page size, increment the cursor and create
a new page to insert the object into; otherwise, leave the cursor alone
and insert into this page.
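The insert path looks roughly like this (an in-memory dict stands in for
the bucket, and a list of values mimics allow_mult siblings; this only
sketches the control flow, not real Riak client calls or statebox merges):

```python
PAGE_SIZE = 3  # illustrative; the real limit depends on your object-size budget

# A plain dict stands in for the Riak bucket; the cursor is kept as a
# list of sibling values to mimic allow_mult.
bucket = {"cursor": [0]}

def resolve_cursor(siblings):
    # The cursor only ever grows, so the largest sibling wins.
    return max(siblings)

def insert(doc_id):
    cursor = resolve_cursor(bucket["cursor"])
    page = bucket.setdefault("page_%d" % cursor, set())
    if len(page) >= PAGE_SIZE:
        # Current page is full: advance the cursor and start a new page.
        cursor += 1
        bucket["cursor"] = [cursor]
        page = bucket.setdefault("page_%d" % cursor, set())
    page.add(doc_id)
    return cursor
```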

To retrieve data in reverse chronological order, read the cursor to find
the current page, then read pages backwards from there (so the last page
written is the first page shown to users).
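The read path, again sketched against an in-memory stand-in for the
bucket (sample data and names are illustrative):

```python
# An in-memory dict stands in for the bucket; the cursor is a list of
# sibling values to mimic allow_mult.
bucket = {
    "cursor": [2],                       # siblings; the largest wins
    "page_0": {"id1", "id2", "id3"},
    "page_1": {"id4", "id5", "id6"},
    "page_2": {"id7"},
}

def pages_newest_first():
    """Yield pages from the current one back to page_0."""
    cursor = max(bucket["cursor"])       # resolve siblings by taking the max
    for n in range(cursor, -1, -1):
        # Ids are monotonically increasing, so reverse-sorting a page
        # yields its documents newest first.
        yield sorted(bucket["page_%d" % n], reverse=True)
```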

Currently, my document ids are monotonically increasing, generated with
https://github.com/boundary/flake, so I can sort documents within a page.
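Since flake ids are timestamp-prefixed, plain lexicographic sorting
tracks creation order (made-up ids here, not real flake output):

```python
# Flake-style ids start with a timestamp component, so sorting them
# lexicographically in reverse gives newest-first order.
ids = ["0001-a", "0003-c", "0002-b"]
newest_first = sorted(ids, reverse=True)
```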

I do realize that a page can exceed its size limit (concurrent writers may
each see a nearly-full page and insert into it); I just don't know how bad
this gets as the write rate grows. All I need is some form of bulk get and
chunking without resorting to 2i, whose queries cover the whole cluster.

So, is there any major problem with this approach? Thanks.
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
