Hi there- I just submitted a patch to the book here...
https://issues.apache.org/jira/browse/HBASE-4110

You can see the contents in the patch.


On 7/15/11 3:48 PM, "large data" <[email protected]> wrote:

>thanks Doug!
>
>Writing to hbase would be driven by async events (rather than M/R jobs)
>fired on user activity, so higher 'put' throughput is not strictly a
>requirement, and neither is exceptional read performance. TTL would be
>around 6 months so I don't envision scan date ranges > 6 months.
>
>Can you send along any leads?
>
>thanks
>
>On Fri, Jul 15, 2011 at 12:40 PM, Doug Meil
><[email protected]> wrote:
>
>>
>> Hi there-
>>
>> There was an almost identical question on this subject yesterday and it
>> comes up regularly. A lot of this depends on how many users you have,
>> data ingest rate, and how dynamic your reports/queries need to be.
>>
>> One option is creating a table that acts as a secondary index, another
>> is creating a summary table of activity via a MR job. These are common
>> options, but not the only ones.
>>
>> Much depends on your specific requirements, though. There isn't a
>> one-size-fits-all answer.
>>
>>
>> I'll update the book with something on this topic.
>>
>>
>> Doug
>>
>>
>> On 7/15/11 2:30 PM, "large data" <[email protected]> wrote:
>>
>> >Designing a date range table where I track the userId, the activity and
>> >the day the activity was performed.
>> >
>> >Key format is <userId activityId YYDDMM> (using space as separator) to
>> >avoid hot-spots by having the date as the last part of the key.
>> >
>> >Now I can easily find the activities done by user 'X' using
>> >PrefixFilter.
>> >
>> >But how do I go about finding user activities between date ranges?
>> >
>> >thanks
>>
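
For the date-range question quoted above, here is a minimal sketch (not taken from the patch) of how the key layout interacts with a range scan. It only simulates lexicographic row ordering in plain Python and does not call the HBase client. Two assumptions are mine: the date is written as YYMMDD, because YYDDMM does not sort chronologically across months, and the stop row is treated as exclusive, so a trailing zero byte is appended to include the last day. The helper names are hypothetical.

```python
from bisect import bisect_left


def make_key(user_id, activity_id, yymmdd):
    # Key layout from the thread: "<userId> <activityId> <date>", space separated.
    # Assumption: the date is YYMMDD so that string order equals date order.
    return f"{user_id} {activity_id} {yymmdd}"


def range_scan(sorted_keys, start_row, stop_row):
    # Simulates a scan over sorted row keys:
    # start_row is inclusive, stop_row is exclusive.
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


def date_range_bounds(user_id, activity_id, first_day, last_day):
    # With the date last in the key, a date range is contiguous only when
    # both userId and activityId are fixed.
    start_row = make_key(user_id, activity_id, first_day)
    # Appending a zero byte makes the exclusive stop row cover last_day itself.
    stop_row = make_key(user_id, activity_id, last_day) + "\x00"
    return start_row, stop_row


# A table is modelled here as a sorted list of row keys.
rows = sorted([
    make_key("u1", "login", "110701"),
    make_key("u1", "login", "110710"),
    make_key("u1", "login", "110715"),
    make_key("u1", "search", "110709"),
    make_key("u2", "login", "110710"),
])

start, stop = date_range_bounds("u1", "login", "110705", "110715")
hits = range_scan(rows, start, stop)
```

With this layout the range covers one user and one activity at a time. Covering all of a user's activities in a date range would take one such scan per activity, or a second table keyed with the date ahead of the activity, which is one form of the secondary-index option mentioned in the thread.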
