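Hi there,

Since the userid is in the lead position of the key in both approaches, I'm not sure he'd have a region hotspotting problem. As long as the userids themselves are reasonably well distributed, writes for different users land in different parts of the key space, so the userid should offer some spread across regions.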
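To make the two layouts concrete, here is a rough Python sketch of how the keys from the original question could be built. It is plain string handling, not HBase client code. The function names, the fixed-width userids, and the yyyyMMddHHmmss timestamp encoding are my own assumptions for illustration; the original post doesn't specify any of them.

```python
from datetime import datetime
from typing import Tuple


def tall_key(user_id: str, ts: datetime) -> bytes:
    """Design 1 (tall-narrow): one row per record, key = userid-timestamp.

    Assumption: the timestamp is encoded as yyyyMMddHHmmss so that byte
    order matches chronological order for a given userid.
    """
    return f"{user_id}-{ts:%Y%m%d%H%M%S}".encode()


def wide_key(user_id: str, ts: datetime) -> Tuple[bytes, bytes]:
    """Design 2 (flat-wide): one row per user per day.

    Returns (row key, column qualifier): key = userid-yyyyMMdd,
    qualifier = HHmmss.
    """
    return f"{user_id}-{ts:%Y%m%d}".encode(), f"{ts:%H%M%S}".encode()


def tall_scan_range(user_id: str, start: datetime, stop: datetime) -> Tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) row keys for a design-1 scan
    of one userid over a time range.

    Assumption: userids are fixed width, so one userid is never a prefix
    of another.
    """
    return tall_key(user_id, start), tall_key(user_id, stop)


if __name__ == "__main__":
    ts = datetime(2012, 10, 10, 12, 55, 0)
    # Records written at the same instant by different users.
    keys = sorted(tall_key(u, ts) for u in ("user0003", "user0001", "user0002"))
    for k in keys:
        print(k)
    print(wide_key("user0001", ts))
    print(tall_scan_range("user0001", datetime(2012, 10, 1), datetime(2012, 10, 11)))
```

In both designs the keys sort by userid first, so records written at the same moment by different users are not adjacent in the key space. That is the spread I mean above. The "search a userid over a range of timestamps" query becomes a single contiguous range in design 1, and a range of day rows (plus a qualifier filter on the first and last day) in design 2.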

On 10/10/12 12:55 PM, "Jerry Lam" <[email protected]> wrote:

>Hi:
>
>So you are saying you have ~3TB of data stored per day?
>
>Using the second approach, all data for one day will go to only 1
>regionserver no matter what you do because HBase doesn't split a single
>row.
>
>Using the first approach, data will spread across regionservers but there
>will be hotspotted to each regionserver during write since this is a
>time-series problem.
>
>Best Regards,
>
>Jerry
>
>On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio <[email protected]>
>wrote:
>
>> hi
>> i have a question about key & column design.
>> in my application we have 3,000,000,000 record in every day
>> each record contain : user-id, "time stamp", content(max 1KB).
>> we need to store records for one year, this means we will have about
>> 1,000,000,000,000 after 1 year.
>> we just search a user-id over rang of "time stamp"
>> table can design in two way
>> 1.key=userid-timestamp and column:=content
>> 2.key=userid-yyyyMMdd and column:HHmmss=content
>>
>>
>> in first design we have tall-narrow table but we have very very
>>records, in
>> second design we have flat-wide table.
>> which of them have better performance?
>>
>> thanks.
>>
