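I've built a similar system using a row key of the form: (hash of date)-(date)
The downside is that inserts still hotspot, since every increment for the 
current date lands on the same row. Reads over a range of time do not, because 
the hash prefix scatters consecutive dates across the key space. My use case 
was geared towards speeding up lots of reads.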
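
A minimal sketch of that kind of row key in Python. The MD5 hash, the 
4-character prefix, and the name make_row_key are assumptions for 
illustration, not necessarily what my system used:

```python
import hashlib


def make_row_key(date_str: str, prefix_len: int = 4) -> str:
    """Build a '(hash of date)-(date)' row key.

    date_str is a time bucket such as 'YYYYMMDD' or 'YYYYMMDDHH'.
    The hash function and prefix length are illustrative choices.
    """
    digest = hashlib.md5(date_str.encode("utf-8")).hexdigest()
    # The prefix is deterministic, so a reader can rebuild the key from the date.
    return f"{digest[:prefix_len]}-{date_str}"


key = make_row_key("2014042915")
```

Because the prefix depends only on the date, the same function serves both 
the writer and the reader.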

Column qualifiers are just the collection of items you are aggregating on. 
Values are increments. In your case qualifiers might look like

c:usa, c:usa:sex:m, c:usa:sex:f, c:italy:sex:m, c:italy:sex:f, c:italy

Basically any combination of things you care about. This has the downside that 
you have to determine what filters are available up front and not after the 
fact. The upside is querying should be fast.
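
As a rough sketch, the qualifiers to increment for a single page view could 
be generated like this. The function name, the attribute names, and the 
choice to sort attributes so the qualifier order is stable are all 
assumptions for illustration:

```python
from itertools import combinations


def qualifiers_for_event(country: str, attributes: dict) -> list:
    """Return every qualifier that one page view should increment.

    Produces the bare country qualifier plus one qualifier for each
    non-empty combination of the given attributes.
    """
    base = f"c:{country}"
    items = sorted(attributes.items())  # stable ordering of attribute pairs
    quals = [base]
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            suffix = ":".join(f"{name}:{value}" for name, value in combo)
            quals.append(f"{base}:{suffix}")
    return quals


one_attr = qualifiers_for_event("usa", {"sex": "m"})
two_attrs = qualifiers_for_event("usa", {"sex": "f", "login": "y"})
```

Note the number of qualifiers per event doubles with each extra attribute, 
which is the cost of deciding the filters up front.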

Computing counts over time is a batch of gets, one per row key, which you 
can build from the list of dates/times that you care about.  For each 
qualifier you would then sum across all of the row results.
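
A small sketch of the read side, with plain dicts standing in for the rows 
returned by the batch get. The helper names daily_keys and sum_counts are 
hypothetical, and the sample counts are made up purely to show the shape of 
the data:

```python
from collections import Counter
from datetime import date, timedelta


def daily_keys(start: date, end: date) -> list:
    """List the YYYYMMDD bucket for each day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


def sum_counts(rows: list) -> dict:
    """Sum each qualifier's count across all row results."""
    totals = Counter()
    for row in rows:
        totals.update(row)  # adds counts qualifier by qualifier
    return dict(totals)


buckets = daily_keys(date(2014, 4, 28), date(2014, 4, 30))
# Stand-in for the rows a batch get would return (illustrative numbers only).
rows = [{"c:usa": 3, "c:usa:sex:m": 1}, {"c:usa": 2}]
totals = sum_counts(rows)
```

Each date bucket would be turned into its row key before issuing the gets.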

Hope this gives you some ideas,

Carlos

-----Original Message-----
From: Software Dev [mailto:[email protected]] 
Sent: Tuesday, April 29, 2014 3:51 PM
To: [email protected]
Subject: Re: Help with row and column design

Someone mentioned hotspotting in another post. I guess I could reverse 
the row keys to prevent this?

On Tue, Apr 29, 2014 at 3:34 PM, Software Dev <[email protected]> wrote:
> Hey all. I have some questions regarding row key and column design.
>
> We want to calculate some metrics based on our page views broken down 
> by hour, day, month and year. We also want this broken down by country 
> and have the ability to filter by some other attributes such as the 
> sex of the user or whether or not the user is logged in. Note 
> these will all be increments.
>
> So we have the initial row key design as
>
> YYYY - Row key for yearly totals
> YYYYMM - Row key for monthly totals
> YYYYMMDD - Row key for daily totals
> YYYYMMDDHH - Row key for hourly totals
>
> I think this may make sense as it will be easy to do a range scan over 
> a time period.
>
> Now for my column design. We were thinking along these lines.
>
> daily:US  - Daily counts for the US
> hourly:CA - Hourly counts for Canada
> ... and so on
>
> Now this seems like it would work but fails when we add in the 
> requirement of filtering results based on some other attributes. Say we 
> wanted to be able to filter based on sex (M or F) and/or filter based 
> on logged in status (Online or Offline) and/or filter based on some 
> other attribute, or perform no filtering at all. How would I go about 
> accomplishing this?
>
> Thanks for any input/pointers.
