Thanks Sean. Looks like option 3 might be the best plan. And pre/post-commit hooks... cool! I didn't see those - that's something I've been looking for (since I'd prefer to keep that kind of stuff happening on the data nodes rather than in the client/app itself).
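For reference, here's roughly how I picture wiring that up from the client side - just a sketch, assuming the Python client's bucket-property calls and a hypothetical Erlang module my_indexer (with a postcommit/1 function) already compiled onto each node; the hook body itself has to be written in Erlang and live on the Riak nodes:

    import riak

    # Sketch only: register an Erlang post-commit hook on a bucket from the
    # client. Assumes a hypothetical my_indexer module is in the code path on
    # every node; Riak then calls my_indexer:postcommit(Object) after each
    # successful write to this bucket.
    client = riak.RiakClient(host='127.0.0.1', port=8098)
    bucket = client.bucket('events')
    bucket.set_property('postcommit', [{'mod': 'my_indexer', 'fun': 'postcommit'}])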
One further question: is there any limitation to how the number of buckets can scale? If you're recommending using them to box data by minute, I'm guessing the number of buckets can increase without worry - but is this still the case if, say, I started binning into buckets by second? (Rough sketches of what options 2 and 3 might look like from the Python client follow below the quoted thread.)

J

On Wed, May 19, 2010 at 1:53 AM, Sean Cribbs <[email protected]> wrote:
> Joel,
>
> Riak's only query mechanism aside from simple key retrieval is map-reduce.
> However, there are a number of strategies you could take, depending on what
> you want to query. I don't know the requirements of your application, but
> here are some options:
>
> 1) Store the data either keyed on the timestamp, or as separate objects
> linked from a timestamp object.
> 2) Create buckets for each time-window you want to track. For example, if I
> wanted to box data by minute, I'd make bucket names that look like:
> 2010-05-18T09.46. Then if I want all the data from that minute, I'd run a
> map-reduce query with that bucket name as the inputs.
> 3) Create your own secondary indexes with a post-commit hook or code in your
> application for year, month, day, etc. The secondary index would be, like
> #1, keys that only contain links to the actual data.
>
> With any of these options (which are by no means exhaustive), your map-reduce
> query will need to sort the data in a reduce phase if you require
> chronological ordering. Also, if you're building your own indexes in separate
> buckets, depending on the write throughput of your application, you might
> want to build in some sort of conflict resolution and turn on allow_mult so
> that concurrent updates are not lost.
>
> Sean Cribbs <[email protected]>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On May 17, 2010, at 8:31 PM, Joel Pitt wrote:
>
>> Hi,
>>
>> I'm trying to work out the best way of storing temporal data in Riak.
>>
>> I've been investigating several NoSQL solutions and originally started
>> out using CouchDB, but I want to move to a db that scales more
>> gradually (CouchDB scales, but you really have to set up the
>> architecture beforehand, and I'd prefer to be able to build a cluster
>> a node at a time).
>>
>> In CouchDB, I use a multi-level key in a map-reduce view to create an
>> index by time. Each reduce level corresponds to year, month, day,
>> time... so I can easily get aggregate data for, say, a month.
>>
>> In addition to Riak I'm investigating Cassandra. In Cassandra the way
>> to store time series is by making the column keys timestamps and
>> sorting columns by TimeUUID. This allows one to do slices across a
>> range of time. This isn't exactly the same as what I have in CouchDB,
>> but by consensus it seems to be the way to store a time index.
>>
>> Any suggestions for working with or creating time indexes in Riak?
>>
>> Ideally I'd be able to query documents within a time range, either to
>> get the documents themselves or to calculate aggregate statistics with
>> a map-reduce task.
>>
>> Any information appreciated :-)
>>
>> Joel Pitt, PhD | http://ferrouswheel.me | +64 21 101 7308
>> NetEmpathy Co-founder | http://netempathy.com
>> OpenCog Developer | http://opencog.org
>> Board member, Humanity+ | http://humanityplus.org
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
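A rough sketch of what option 2 (one bucket per minute) might look like from the Python riak client - this assumes that client's RiakClient/bucket/map-reduce calls and Riak's built-in Riak.mapValuesJson JavaScript helper; the bucket layout, key names, and the sort function are just illustrative:

    import time

    import riak

    client = riak.RiakClient(host='127.0.0.1', port=8098)

    def minute_bucket(ts):
        # Box data by minute with bucket names like "2010-05-18T09.46".
        return time.strftime('%Y-%m-%dT%H.%M', time.gmtime(ts))

    def store_event(event_id, payload, ts=None):
        ts = ts if ts is not None else time.time()
        payload['timestamp'] = ts
        bucket = client.bucket(minute_bucket(ts))
        bucket.new(event_id, data=payload).store()

    def events_for_minute(bucket_name):
        # Map over every object in the minute bucket, then sort chronologically
        # in a reduce phase, since map-reduce results are otherwise unordered.
        query = client.add(bucket_name)
        query.map('Riak.mapValuesJson')
        query.reduce('function(values) {'
                     '  return values.sort(function(a, b) {'
                     '    return a.timestamp - b.timestamp; }); }')
        return query.run()

    store_event('event-0001', {'value': 42})
    recent = events_for_minute(minute_bucket(time.time()))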

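And a sketch of the application-side flavour of option 3: a per-day index object that only holds links back to the real data, with allow_mult turned on so concurrent index updates show up as siblings rather than overwriting each other. Again, the bucket names and payload are made up, this assumes the Python client's API, and a real version would need the sibling-merge step Sean mentions:

    import time

    import riak

    client = riak.RiakClient(host='127.0.0.1', port=8098)
    data_bucket = client.bucket('events')

    # The index bucket holds one object per day whose only content is links
    # to the real event objects (Sean's option 3, done in the application).
    index_bucket = client.bucket('events_by_day')
    index_bucket.set_allow_multiples(True)  # keep concurrent index writes as siblings

    def store_and_index(event_id, payload, ts=None):
        ts = ts if ts is not None else time.time()
        obj = data_bucket.new(event_id, data=payload)
        obj.store()

        day_key = time.strftime('%Y-%m-%d', time.gmtime(ts))
        idx = index_bucket.get(day_key)
        if not idx.exists():
            idx = index_bucket.new(day_key, data={'day': day_key})
        # A production version would detect siblings here and merge their link
        # sets before storing, so that concurrent updates are not lost.
        idx.add_link(obj)
        idx.store()

Reading a day back would then be a link walk from the index object, or a map-reduce whose inputs are the index object's links.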