Thanks Doug!

Writes to HBase would be driven by async events (rather than M/R jobs) fired
on user activity, so high 'put' throughput is not strictly a requirement, and
neither is exceptional read performance. TTL would be around 6 months, so I
don't envision scans spanning more than 6 months.
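
To make the date-range question concrete, here is a minimal sketch of one
possible approach, not a recommendation from this thread: put the date in the
middle of the key, in a sortable YYYYMMDD form, so that a range of days for one
user maps to a single start/stop row pair. The key layout
<userId YYYYMMDD activityId> and the names row_key, scan_bounds and range_scan
are assumptions made for illustration. The code only simulates a
lexicographically sorted key space with a Python list; it does not call any
HBase API.

```python
from bisect import bisect_left
from datetime import date, timedelta

SEP = " "  # same separator as the key format discussed in this thread


def row_key(user_id, day, activity_id):
    # Hypothetical layout: <userId> <YYYYMMDD> <activityId>.
    # YYYYMMDD sorts chronologically when compared as a string.
    return SEP.join([user_id, day.strftime("%Y%m%d"), activity_id])


def scan_bounds(user_id, first_day, last_day):
    # Start row is inclusive, stop row is exclusive, mirroring the usual
    # start/stop semantics of a range scan. Using the day after last_day
    # as the stop row makes last_day itself inclusive.
    start = SEP.join([user_id, first_day.strftime("%Y%m%d")])
    stop = SEP.join([user_id, (last_day + timedelta(days=1)).strftime("%Y%m%d")])
    return start, stop


def range_scan(sorted_keys, start, stop):
    # Stand-in for a scan over sorted row keys. For ASCII keys, Python's
    # string ordering matches byte-wise lexicographic ordering.
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Made-up sample rows, for illustration only.
keys = sorted([
    row_key("u1", date(2011, 7, 1), "login"),
    row_key("u1", date(2011, 7, 10), "search"),
    row_key("u1", date(2011, 7, 15), "login"),
    row_key("u1", date(2011, 8, 1), "purchase"),
    row_key("u10", date(2011, 7, 12), "login"),
    row_key("u2", date(2011, 7, 10), "login"),
])

start, stop = scan_bounds("u1", date(2011, 7, 5), date(2011, 7, 15))
result = range_scan(keys, start, stop)
print(result)
```

Since userId still leads the key in this sketch, writes remain spread across
users rather than piling onto the latest date. The trade-off is that finding
all occurrences of one activity for a user would then need a filter or a
second table.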

Can you send along any leads?

thanks

On Fri, Jul 15, 2011 at 12:40 PM, Doug Meil
<[email protected]> wrote:

>
> Hi there-
>
> There was an almost identical question on this subject yesterday and it
> comes up regularly.  A lot of this depends on how many users you have,
> data ingest rate, and how dynamic your reports/queries need to be.
>
> One option is creating a table that acts as a secondary index, another is
> creating a summary table of activity via a MR job.  These are common
> options, but not the only ones.
>
> Much depends on your specific requirements, though.  There isn't a
> one-size-fits-all answer.
>
>
> I'll update the book with something on this topic.
>
>
> Doug
>
>
> On 7/15/11 2:30 PM, "large data" <[email protected]> wrote:
>
> >Designing date range table where I track the userId, the activity and the
> >day activity was performed.
> >
> >Key format is <userId activityId YYDDMM> (using space as separator) to avoid
> >hot-spots by having the date as last part of the key.
> >
> >Now I can easily find the activities done by user 'X' using PrefixFilter.
> >
> >But how do I go about finding user activities between date ranges?
> >
> >thanks
>
>
