Hi: So you are saying you have ~3TB of data stored per day?
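For reference, the ~3TB figure can be checked from the numbers in the quoted message. This is only a back-of-the-envelope sketch: it assumes every record hits the 1KB maximum, and it ignores row key bytes, HBase per-cell overhead, replication and compression. The constant names are mine, chosen for illustration.

```python
# Rough upper bound on raw content volume, using the figures from the quoted message.
RECORDS_PER_DAY = 3_000_000_000   # "3,000,000,000 record in every day"
MAX_RECORD_BYTES = 1024           # "content(max 1KB)", assumed here to mean 1024 bytes
DAYS_RETAINED = 365               # "store records for one year"

daily_bytes = RECORDS_PER_DAY * MAX_RECORD_BYTES
yearly_records = RECORDS_PER_DAY * DAYS_RETAINED
yearly_bytes = daily_bytes * DAYS_RETAINED

print(f"per day : {daily_bytes / 1e12:.2f} TB (upper bound)")
print(f"per year: {yearly_records:,} records, {yearly_bytes / 1e15:.2f} PB (upper bound)")
```

Since 1KB is a maximum, the real volume is probably lower, but the order of magnitude is what matters for the design question.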
Using the second approach, all of a user's data for one day goes into a single row, so it will live on only one regionserver no matter what you do, because HBase doesn't split a single row. Using the first approach, data will spread across regionservers, but writes will still hotspot on individual regionservers, since this is a time-series problem.

Best Regards,

Jerry

On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio <[email protected]> wrote:

> hi
> i have a question about key & column design.
> in my application we have 3,000,000,000 records every day.
> each record contains: user-id, "time stamp", content (max 1KB).
> we need to store records for one year, which means we will have about
> 1,000,000,000,000 records after 1 year.
> we only search for a user-id over a range of "time stamp".
> the table can be designed in two ways:
> 1. key=userid-timestamp and column:=content
> 2. key=userid-yyyyMMdd and column:HHmmss=content
>
> in the first design we have a tall-narrow table, but with a very large
> number of rows; in the second design we have a flat-wide table.
> which of them has better performance?
>
> thanks.
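
To make the two layouts concrete, here is a small sketch of how the keys would be built. It uses plain Python with no HBase client, so it only illustrates the key layout. The function names are hypothetical, and the fixed-width yyyyMMddHHmmss timestamp encoding is my assumption, since the original message does not say how the timestamp is encoded.

```python
from datetime import datetime
from typing import Tuple

def tall_key(user_id: str, ts: datetime) -> bytes:
    """Design 1: one row per record, row key = userid-timestamp.
    Assumes a fixed-width yyyyMMddHHmmss timestamp so keys sort by time."""
    return f"{user_id}-{ts:%Y%m%d%H%M%S}".encode()

def wide_cell(user_id: str, ts: datetime) -> Tuple[bytes, bytes]:
    """Design 2: one row per user per day.
    Returns (row key = userid-yyyyMMdd, column qualifier = HHmmss)."""
    return f"{user_id}-{ts:%Y%m%d}".encode(), f"{ts:%H%M%S}".encode()

def tall_scan_range(user_id: str, start: datetime, end: datetime) -> Tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) row keys for a
    'user-id over a range of time stamps' scan in design 1."""
    return tall_key(user_id, start), tall_key(user_id, end)

# Three records for the same (made-up) user on the same day.
stamps = [datetime(2012, 10, 10, hour, 0, 0) for hour in (1, 9, 23)]

tall_rows = sorted(tall_key("u42", t) for t in stamps)
wide_rows = {wide_cell("u42", t)[0] for t in stamps}
wide_quals = sorted(wide_cell("u42", t)[1] for t in stamps)

print(tall_rows)    # three distinct row keys, sorted by time
print(wide_rows)    # a single row key for the whole day
print(wide_quals)   # three column qualifiers inside that one row
print(tall_scan_range("u42", datetime(2012, 10, 10), datetime(2012, 10, 11)))
```

In design 1 the three records become three rows that a region split can separate, while in design 2 they collapse into one row with three columns, which is the unit HBase will not split. Note that with second-level resolution, two records for the same user in the same second would collide in either layout.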
