On Sat, Jul 16, 2011 at 7:08 PM, Tristan Seligmann
<[email protected]> wrote:
> I'm trying to model a schema for a logging storage system in
> Cassandra: Log messages consist of a timestamp, message, and some
> other arbitrary key/value pairs. Querying would primarily be done
> based on timestamp ranges; I will probably be doing filtering based on
> matches against the key/value pairs as well, but I expect that will be
> handled by fetching the messages in the desired time range, then
> filtering out the uninteresting ones.

I recommend reading this:
http://blog.insidesystems.net/basic-time-series-with-cassandra

> A supercolumn makes it easy enough to store the key/value pairs as
> columns, but then I end up with all of my log messages in a single
> row, which obviously won't work. On the other hand, if I use the
> timestamp as the row key, I need to use OPP to query on ranges, and
> I'd prefer not to deal with the balancing issues that would raise. I
> suppose I could go halfway; use a prefix of the timestamp (e.g. date +
> hour, or perhaps date + hour + minute) as the key, and then retrieve
> all of the keys in the range I'm interested in when performing a
> query.

Do the latter and avoid OPP.  Chunking by hour should be sufficient in
most cases.
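
A rough sketch of the hour-chunking idea, in case it helps. The
YYYYMMDDHH key format and the function names here are assumptions made
up for illustration; none of this is a Cassandra API. It only shows how
a time range maps onto the set of row keys you would need to fetch:

```python
from datetime import datetime, timedelta

def bucket_key(ts):
    """Row key for the hour containing ts (assumed format: YYYYMMDDHH)."""
    return ts.strftime("%Y%m%d%H")

def bucket_keys(start, end):
    """Return the hourly row keys covering [start, end], in time order."""
    # Round the start down to the top of its hour.
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current <= end:
        keys.append(bucket_key(current))
        current += timedelta(hours=1)
    return keys

if __name__ == "__main__":
    # Query window from 18:30 to 20:05 touches three hourly rows.
    for key in bucket_keys(datetime(2011, 7, 16, 18, 30),
                           datetime(2011, 7, 16, 20, 5)):
        print(key)
```

You would then fetch those rows by key and, assuming the columns in
each row are ordered by timestamp, trim the first and last rows to the
exact start and end of the window.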

-Brandon
