Thanks Sean. Looks like option 3 might be the best plan. And pre/post-commit hooks... cool! I didn't see those - that's something I've been looking for (since I'd prefer to keep that kind of stuff happening on the data nodes rather than in the client/app itself).
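For reference, here's roughly how I picture wiring that up from the client side - just a sketch, assuming the Python client's bucket-property calls and a hypothetical Erlang module my_indexer (with a postcommit/1 function) already compiled onto each node; the hook body itself has to be written in Erlang and live on the Riak nodes:

    import riak

    # Sketch only: register an Erlang post-commit hook on a bucket from the
    # client. Assumes a hypothetical my_indexer module is in the code path on
    # every node; Riak then calls my_indexer:postcommit(Object) after each
    # successful write to this bucket.
    client = riak.RiakClient(host='127.0.0.1', port=8098)
    bucket = client.bucket('events')
    bucket.set_property('postcommit', [{'mod': 'my_indexer', 'fun': 'postcommit'}])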
One further question: is there any limitation to how the number of buckets can scale? If you're recommending using them to box data by minute, I'm guessing the number of buckets can increase without worry - but is this still the case if, say, I started binning into buckets by second? (Rough sketches of what options 2 and 3 might look like from the Python client follow below the quoted thread.)

J

On Wed, May 19, 2010 at 1:53 AM, Sean Cribbs <[email protected]> wrote:
> Joel,
>
> Riak's only query mechanism aside from simple key retrieval is map-reduce.
> However, there are a number of strategies you could take, depending on what
> you want to query. I don't know the requirements of your application, but
> here are some options:
>
> 1) Store the data either keyed on the timestamp, or as separate objects
> linked from a timestamp object.
> 2) Create buckets for each time-window you want to track. For example, if I
> wanted to box data by minute, I'd make bucket names that look like:
> 2010-05-18T09.46. Then if I want all the data from that minute, I'd run a
> map-reduce query with that bucket name as the inputs.
> 3) Create your own secondary indexes with a post-commit hook or code in your
> application for year, month, day, etc. The secondary index would be, like
> #1, keys that only contain links to the actual data.
>
> With any of these options (which are by no means exhaustive), your map-reduce
> query will need to sort the data in a reduce phase if you require
> chronological ordering. Also, if you're building your own indexes in separate
> buckets, depending on the write throughput of your application, you might
> want to build in some sort of conflict resolution and turn on allow_mult so
> that concurrent updates are not lost.
>
> Sean Cribbs <[email protected]>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On May 17, 2010, at 8:31 PM, Joel Pitt wrote:
>
>> Hi,
>>
>> I'm trying to work out the best way of storing temporal data in Riak.
>>
>> I've been investigating several NoSQL solutions and originally started
>> out using CouchDB, but I want to move to a db that scales more
>> gradually (CouchDB scales, but you really have to set up the
>> architecture beforehand, and I'd prefer to be able to build a cluster
>> a node at a time).
>>
>> In CouchDB, I use a multi-level key in a map-reduce view to create an
>> index by time. Each reduce level corresponds to year, month, day,
>> time... so I can easily get aggregate data for, say, a month.
>>
>> In addition to Riak I'm investigating Cassandra. In Cassandra the way
>> to store time series is by making the column keys timestamps and
>> sorting columns by TimeUUID. This allows one to do slices across a
>> range of time. This isn't exactly the same as what I have in CouchDB,
>> but by consensus it seems to be the way to store a time index.
>>
>> Any suggestions for working with or creating time indexes in Riak?
>>
>> Ideally I'd be able to query documents within a time range, either to
>> get the documents themselves or to calculate aggregate statistics with
>> a map-reduce task.
>>
>> Any information appreciated :-)
>>
>> Joel Pitt, PhD | http://ferrouswheel.me | +64 21 101 7308
>> NetEmpathy Co-founder | http://netempathy.com
>> OpenCog Developer | http://opencog.org
>> Board member, Humanity+ | http://humanityplus.org
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
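A rough sketch of what option 2 (one bucket per minute) might look like from the Python riak client - this assumes that client's RiakClient/bucket/map-reduce calls and Riak's built-in Riak.mapValuesJson JavaScript helper; the bucket layout, key names, and the sort function are just illustrative:

    import time

    import riak

    client = riak.RiakClient(host='127.0.0.1', port=8098)

    def minute_bucket(ts):
        # Box data by minute with bucket names like "2010-05-18T09.46".
        return time.strftime('%Y-%m-%dT%H.%M', time.gmtime(ts))

    def store_event(event_id, payload, ts=None):
        ts = ts if ts is not None else time.time()
        payload['timestamp'] = ts
        bucket = client.bucket(minute_bucket(ts))
        bucket.new(event_id, data=payload).store()

    def events_for_minute(bucket_name):
        # Map over every object in the minute bucket, then sort chronologically
        # in a reduce phase, since map-reduce results are otherwise unordered.
        query = client.add(bucket_name)
        query.map('Riak.mapValuesJson')
        query.reduce('function(values) {'
                     '  return values.sort(function(a, b) {'
                     '    return a.timestamp - b.timestamp; }); }')
        return query.run()

    store_event('event-0001', {'value': 42})
    recent = events_for_minute(minute_bucket(time.time()))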

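And a sketch of the application-side flavour of option 3: a per-day index object that only holds links back to the real data, with allow_mult turned on so concurrent index updates show up as siblings rather than overwriting each other. Again, the bucket names and payload are made up, this assumes the Python client's API, and a real version would need the sibling-merge step Sean mentions:

    import time

    import riak

    client = riak.RiakClient(host='127.0.0.1', port=8098)
    data_bucket = client.bucket('events')

    # The index bucket holds one object per day whose only content is links
    # to the real event objects (Sean's option 3, done in the application).
    index_bucket = client.bucket('events_by_day')
    index_bucket.set_allow_multiples(True)  # keep concurrent index writes as siblings

    def store_and_index(event_id, payload, ts=None):
        ts = ts if ts is not None else time.time()
        obj = data_bucket.new(event_id, data=payload)
        obj.store()

        day_key = time.strftime('%Y-%m-%d', time.gmtime(ts))
        idx = index_bucket.get(day_key)
        if not idx.exists():
            idx = index_bucket.new(day_key, data={'day': day_key})
        # A production version would detect siblings here and merge their link
        # sets before storing, so that concurrent updates are not lost.
        idx.add_link(obj)
        idx.store()

Reading a day back would then be a link walk from the index object, or a map-reduce whose inputs are the index object's links.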