On Wed, Jan 5, 2011 at 4:09 PM, Dave Viner <davevi...@gmail.com> wrote:
> > "a Column Family with the row key being the Unix time divided by 60x60 and > a column key of... pretty much anything unique" > LogCF[hour-day-in-epoch-seconds][timeuuid] = 1 > where 'hour-day-in-epoch-seconds' is something like the first second of the > given hour of the day, so 01/04/2011 19:00:00 (in epoch > seconds: 1294167600); 'timeuuid' is a TimeUUID from cassandra, and '1' is > the value of the entry. > > Then "look at the current row every hour to actually compile the numbers, > and store the count in the same Column Family" > LogCF[hour-day-in-epoch-seconds][total] = x > where 'x' is the sum of the number of timeuuid columns in the row? > This looks correct. In terms of the query you need work out the row keys in order to query the right rows. Obviously the longer the period the longer the query, but the good news is that the query time doesn't increase with total data volume. When I first used this approach I thought it felt a little too brute force. There are ways to make it faster - such as storing the totals at different time resolutions - but unless you are expecting queries over years rather than a few weeks it's probably not worth the complexity introduced.