Hi, I am designing a schema to host a large volume of data in HBase. We collect daily trading data for several markets and run a moving-window analysis that makes predictions based on a two-week window.
Since everybody is going to pull the latest two weeks of data every day, putting the date in the leading position of the row key would give us hot regions. We can use a bucketing (salting) approach to deal with this: prefix each row key with a bucket number derived from a hash mod the number of buckets. However, if we have 200 buckets, we need to run 200 scans to extract all the data for the last two weeks. A sketch of the scheme I have in mind follows the questions.

My questions are:

1. What happens when each scan returns its result? Is there a sink-like place that collects and concatenates all the scan results?
2. Why might having 200 scans be a bad thing compared to having only 10?
3. Any suggestions for the design?
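For concreteness, here is roughly what I have in mind, sketched against the HBase 2.x Java client. The table name "trades", the three-digit bucket prefix, the yyyyMMdd date format, and deriving the bucket from the symbol's hash (so that one day's writes spread across all buckets) are just my assumptions for illustration, not settled decisions:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketedScanSketch {

    static final int NUM_BUCKETS = 200;                          // hypothetical bucket count
    static final TableName TABLE = TableName.valueOf("trades");  // hypothetical table name

    // Write-side key layout (assumption): 3-digit zero-padded bucket + yyyyMMdd + symbol.
    // Deriving the bucket from the symbol (not the date) means a single day's rows
    // are spread over all buckets, which is what forces one scan per bucket on read.
    static byte[] rowKey(String symbol, String yyyyMMdd) {
        int bucket = (symbol.hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
        return Bytes.toBytes(String.format("%03d%s%s", bucket, yyyyMMdd, symbol));
    }

    // Read side: pull the window [startDate, stopDate) by issuing one ranged scan
    // per bucket and concatenating the results client-side. As far as I can tell,
    // HBase does not merge these for me; each Scan is an independent RPC
    // conversation with the region server(s) holding that bucket's key range.
    static List<Result> scanWindow(Connection conn, String startDate, String stopDate)
            throws IOException {
        List<Result> all = new ArrayList<>();
        try (Table table = conn.getTable(TABLE)) {
            for (int b = 0; b < NUM_BUCKETS; b++) {
                String prefix = String.format("%03d", b);
                Scan scan = new Scan()
                        .withStartRow(Bytes.toBytes(prefix + startDate))
                        .withStopRow(Bytes.toBytes(prefix + stopDate)); // stop row is exclusive
                scan.setCaching(500); // fetch rows in batches to reduce RPC round trips
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result r : scanner) {
                        all.add(r);
                    }
                }
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // Stop date is exclusive, so this covers Jan 1 through Jan 14: two weeks.
            List<Result> window = scanWindow(conn, "20240101", "20240115");
            System.out.println("rows in window: " + window.size());
        }
    }
}

The per-bucket loop above is exactly the part I am worried about: with 200 buckets it means 200 sequential scanner setups and teardowns for every daily job.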
Many thanks.
Bill