Thanks, Doug! Writes to HBase would be driven by async events (rather than M/R jobs) fired on user activity, so high 'put' throughput is not strictly a requirement, and neither is exceptional read performance. The TTL would be around 6 months, so I don't envision scanning date ranges longer than 6 months.
Can you send along any leads?

thanks

On Fri, Jul 15, 2011 at 12:40 PM, Doug Meil <[email protected]> wrote:

>
> Hi there-
>
> There was an almost identical question on this subject yesterday and it
> comes up regularly. A lot of this depends on how many users you have,
> data ingest rate, and how dynamic your reports/queries need to be.
>
> One option is creating a table that acts as a secondary index, another is
> creating a summary table of activity via a MR job. These are common
> options, but not the only ones.
>
> Much depends on your specific requirements, though. There isn't a
> one-size-fits-all answer.
>
>
> I'll update the book with something on this topic.
>
>
> Doug
>
>
> On 7/15/11 2:30 PM, "large data" <[email protected]> wrote:
>
> >Designing date range table where I track the userId, the activity and the
> >day activity was performed.
> >
> >Key format is <userId activityId YYDDMM> (using space as separator) to
> >avoid hot-spots by having the date as last part of the key.
> >
> >Now I can easily find the activities done by user 'X' using PrefixFilter.
> >
> >But how do I go about finding user activities between date ranges?
> >
> >thanks
> >
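For the date-range question in the original post, here is a rough sketch of how the secondary-index option mentioned above might look. It is plain Python that only imitates HBase's lexicographic row ordering with a sorted list; it is not HBase client code. The index key layout <userId YYYYMMDD activityId> and the helper names (index_key, scan_bounds, range_scan) are assumptions made for illustration and do not come from the thread. Two points motivate the layout: a YYDDMM date does not sort chronologically as a string (Feb 15 2011 is "111502", which sorts after Mar 1 2011, "110103"), and with the date placed ahead of the activity id, one user's date range becomes a single contiguous slice of rows. The userId still leads the key, so writes stay spread across users.

```python
from bisect import bisect_left
from datetime import date

SEP = " "  # same separator as the key format in the original post


def index_key(user_id: str, day: date, activity_id: str) -> str:
    """Hypothetical index row key: <userId> <YYYYMMDD> <activityId>."""
    return SEP.join([user_id, day.strftime("%Y%m%d"), activity_id])


def scan_bounds(user_id: str, start_day: date, end_day: date):
    """Start row (inclusive) and stop row (exclusive) for start_day..end_day."""
    start_row = SEP.join([user_id, start_day.strftime("%Y%m%d")])
    # '!' (0x21) sorts immediately after the space separator (0x20), so this
    # stop row is greater than every key for end_day and smaller than any key
    # for a later day. Assumes ASCII ids that contain no spaces.
    stop_row = SEP.join([user_id, end_day.strftime("%Y%m%d")]) + "!"
    return start_row, stop_row


def range_scan(sorted_keys, start_row, stop_row):
    """Imitate a start-row/stop-row scan over lexicographically sorted keys."""
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


# Made-up sample rows, kept sorted the way a table's row keys would be.
rows = sorted([
    index_key("u1", date(2011, 6, 30), "login"),
    index_key("u1", date(2011, 7, 1), "search"),
    index_key("u1", date(2011, 7, 15), "login"),
    index_key("u1", date(2011, 7, 16), "purchase"),
    index_key("u10", date(2011, 7, 10), "login"),
    index_key("u2", date(2011, 7, 5), "login"),
])

start_row, stop_row = scan_bounds("u1", date(2011, 7, 1), date(2011, 7, 15))
hits = range_scan(rows, start_row, stop_row)
```

With the sample rows above, hits contains only user u1's rows dated 2011-07-01 through 2011-07-15; rows for u10 and u2 fall outside the bounds. In a real table the same start and stop rows would be handed to a scan, but the exact client calls depend on the HBase version in use.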
