Hi Ted,

Thanks a lot. That post is really helpful.
Many thanks.
Bill

On Sun, Jan 19, 2014 at 9:53 PM, Ted Yu <[email protected]> wrote:

> Bill:
> See http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
>
> FYI
>
>
> On Sun, Jan 19, 2014 at 4:02 PM, Bill Q <[email protected]> wrote:
>
> > Hi Amit,
> > Thanks for the reply.
> >
> > If I understand your suggestion correctly, and assuming we have 100 region
> > servers, I would have to do 100 scans and merge the reads if I want to pull
> > any data for a specific date. Is that correct? Is doing 100 scans the most
> > efficient way to deal with this issue?
> >
> > Any thoughts?
> >
> > Many thanks.
> >
> > Bill
> >
> >
> > On Sun, Jan 19, 2014 at 4:02 PM, Amit Sela <[email protected]> wrote:
> >
> > > If you use bulk load to insert your data, you could use the date as the
> > > key prefix and choose the rest of the key in a way that splits each day
> > > evenly. You'll have X regions for every day, i.e. 14X regions for the
> > > two-week window.
> > > On Jan 19, 2014 8:39 PM, "Bill Q" <[email protected]> wrote:
> > >
> > > > Hi,
> > > > I am designing a schema to host a large volume of data on HBase. We
> > > > collect daily trading data for some markets, and we run a moving-window
> > > > analysis to make predictions based on a two-week window.
> > > >
> > > > Since everybody is going to pull the latest two weeks of data every day,
> > > > if we put the date in the leading position of the key, we will have some
> > > > hot regions. So we can use a bucketing (date mod bucket number) approach
> > > > to deal with this situation. However, if we have 200 buckets, we need to
> > > > run 200 scans to extract all the data in the last two weeks.
> > > >
> > > > My questions are:
> > > > 1. What happens when each scan returns its results? Will the scan
> > > > results be sent to a sink-like place that collects and concatenates all
> > > > the scan results?
> > > > 2. Why might having 200 scans be a bad thing compared to having only 10
> > > > scans?
> > > > 3. Any suggestions for the design?
> > > >
> > > > Many thanks.
> > > >
> > > > Bill
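A minimal Java sketch of the bucketed key layout and of what questions 1 and 2 are about, under stated assumptions: the row key is a hypothetical [one bucket byte][date as yyyyMMdd][symbol], the bucket is derived from a hash of the symbol (an assumption, not the thread's exact scheme) so that one day's rows spread over all buckets, and each per-bucket scan is simulated as an in-memory sorted list instead of a call to the HBase client. The names SaltedKeySketch, rowKey, scanRanges and mergeSorted are illustrative only; they are not HBase or HBaseWD API. In this layout it is the client, not a server-side sink, that runs one range scan per bucket and merges the sorted streams.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class SaltedKeySketch {
    // Hypothetical number of buckets (salt values); the thread's example uses 200.
    static final int BUCKETS = 200;

    // Prepends a single bucket byte to the UTF-8 bytes of s.
    static byte[] prefixed(int bucket, String s) {
        byte[] rest = s.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[rest.length + 1];
        key[0] = (byte) bucket;
        System.arraycopy(rest, 0, key, 1, rest.length);
        return key;
    }

    // Assumed row key: [bucket][yyyyMMdd][symbol]; bucket taken from a hash of the symbol.
    static byte[] rowKey(String date, String symbol) {
        int bucket = Math.floorMod(symbol.hashCode(), BUCKETS);
        return prefixed(bucket, date + symbol);
    }

    // One [start, stop) row range per bucket covering dates in [fromDate, toDateExclusive).
    static List<byte[][]> scanRanges(String fromDate, String toDateExclusive) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            ranges.add(new byte[][] {prefixed(b, fromDate), prefixed(b, toDateExclusive)});
        }
        return ranges;
    }

    // Current head of one per-bucket result stream, ordered by the unsalted key.
    static final class Head implements Comparable<Head> {
        final String value;
        final Iterator<String> source;

        Head(String value, Iterator<String> source) {
            this.value = value;
            this.source = source;
        }

        @Override
        public int compareTo(Head other) {
            return value.compareTo(other.value);
        }
    }

    // Client-side k-way merge: each bucket's scan returns rows sorted by date+symbol,
    // so repeatedly taking the smallest head yields one globally ordered result.
    static List<String> mergeSorted(List<Iterator<String>> streams) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (Iterator<String> it : streams) {
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            out.add(h.value);
            if (h.source.hasNext()) heap.add(new Head(h.source.next(), h.source));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("key length = " + rowKey("20140119", "IBM").length); // prints key length = 12
        System.out.println("scans per two-week read = " + scanRanges("20140106", "20140120").size());
        List<Iterator<String>> streams = List.of(
            List.of("20140106AAPL", "20140107AAPL").iterator(),
            List.of("20140106IBM", "20140108IBM").iterator());
        // prints [20140106AAPL, 20140106IBM, 20140107AAPL, 20140108IBM]
        System.out.println(mergeSorted(streams));
    }
}
```

With the real client, each [start, stop) pair would typically become one HBase Scan, and the scans could run in parallel threads before the merge. The trade-off behind question 2 shows up in scanRanges: every extra bucket adds one more scanner, with its own round trips, to each two-week read, in exchange for spreading the daily writes over more regions. If global ordering is not needed, the merge step can be replaced by simply concatenating the per-bucket results.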
