On Wed, Jan 5, 2011 at 4:09 PM, Dave Viner <davevi...@gmail.com> wrote:
> > "a Column Family with the row key being the Unix time divided by 60x60 and > a column key of... pretty much anything unique" > LogCF[hour-day-in-epoch-seconds][timeuuid] = 1 > where 'hour-day-in-epoch-seconds' is something like the first second of the > given hour of the day, so 01/04/2011 19:00:00 (in epoch > seconds: 1294167600); 'timeuuid' is a TimeUUID from cassandra, and '1' is > the value of the entry. > > Then "look at the current row every hour to actually compile the numbers, > and store the count in the same Column Family" > LogCF[hour-day-in-epoch-seconds][total] = x > where 'x' is the sum of the number of timeuuid columns in the row? > This looks correct. In terms of the query you need work out the row keys in order to query the right rows. Obviously the longer the period the longer the query, but the good news is that the query time doesn't increase with total data volume. When I first used this approach I thought it felt a little too brute force. There are ways to make it faster - such as storing the totals at different time resolutions - but unless you are expecting queries over years rather than a few weeks it's probably not worth the complexity introduced.