Re: Write TimeSeries Data and Do Time Based Range Scans

Shahab Yunus Mon, 23 Sep 2013 15:51:49 -0700

http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/


The post above covers the discussion, trade-offs, and working code/API (even
for M/R) for this problem and the approach you are trying out.

Regards,
Shahab


On Mon, Sep 23, 2013 at 5:41 PM, anil gupta <[email protected]> wrote:

> Hi All,
>
> I have a secondary index (inverted index) table whose rowkey is based on
> the timestamp of an event. Assume the rowkey is <TimeStamp in Epoch>.
> I also store some extra columns (apart from the main_table rowkey) in that
> table for filtering.
>
> The requirement is to do range-based scans on the time of the
> event, hence the index with this rowkey.
> I cannot use a hashing or MD5 digest solution because then I cannot do
> range-based scans. Also, I already have an index like OpenTSDB in another
> table for the same dataset. (I have many secondary indexes for the same
> data set.)
>
> Problem: When we increase the write workload during a stress test, the time
> secondary index becomes a bottleneck due to the famous region hotspotting
> problem.
> Solution: I am thinking of adding a prefix of { (<TimeStamp in Epoch>%10) =
> bucket} to the rowkey. Then my rowkey will become:
>  <Bucket><TimeStamp in Epoch>
> By using the above rowkey I can at least alleviate the *WRITE* problem. (I
> don't think the problem can be fixed permanently because of the use case
> requirement. I would love to be proven wrong.)
> However, with the above rowkey, when I want to *READ* data, for every
> single range scan I have to read data from 10 different regions. This
> extra read load is scaring me a bit.
>
> I am wondering if anyone has a better suggestion/approach to solve this
> problem given the constraints I have. Looking for feedback from the
> community.
>
> --
> Thanks & Regards,
> Anil Gupta
>
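
For illustration, the <Bucket><TimeStamp in Epoch> rowkey proposed in the
quoted message could be sketched as below. This is only a sketch under
assumptions that are not in the thread: a 1-byte bucket prefix, the timestamp
encoded as a big-endian unsigned 64-bit integer so that byte order matches
time order, and hypothetical helper names (make_rowkey, scan_ranges). It does
not call any HBase API. It only builds the key bytes and the per-bucket
start/stop rows that a range scan would need.

```python
import struct

NUM_BUCKETS = 10  # from the quoted message: bucket = timestamp % 10


def make_rowkey(ts_epoch: int) -> bytes:
    """Build <bucket><timestamp>: 1 prefix byte plus an 8-byte big-endian timestamp.

    The fixed-width big-endian encoding is an assumption; it keeps keys
    inside one bucket sorted by time under byte-wise comparison.
    """
    return struct.pack(">BQ", ts_epoch % NUM_BUCKETS, ts_epoch)


def scan_ranges(start_ts: int, end_ts: int):
    """Return one (start_row, stop_row) pair per bucket for [start_ts, end_ts).

    The stop row is exclusive, so a time-range read becomes NUM_BUCKETS
    separate scans, one per bucket prefix.
    """
    return [
        (struct.pack(">BQ", b, start_ts), struct.pack(">BQ", b, end_ts))
        for b in range(NUM_BUCKETS)
    ]
```

Each timestamp in the requested range falls into exactly one of the returned
ranges, which is the 10-way read fan-out described in the quoted message.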

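On the read side, a range scan over a <Bucket><TimeStamp in Epoch> rowkey
returns one time-sorted result stream per bucket, and the client has to merge
them back into a single time-ordered stream. A minimal sketch of that merge
follows. It assumes the key layout is a 1-byte bucket followed by a
big-endian unsigned 64-bit timestamp, and merge_bucket_scans is a
hypothetical helper name. The per-bucket inputs stand in for scanner results
and are plain Python iterables, not HBase objects.

```python
import heapq
import struct


def merge_bucket_scans(bucket_results):
    """Merge per-bucket scan results into one stream ordered by timestamp.

    bucket_results: iterable of iterables of (rowkey, value) pairs, each
    already sorted by rowkey (as a scan within one bucket would return them).
    Yields (timestamp, value) pairs in ascending timestamp order.
    """
    def decode(rows):
        for rowkey, value in rows:
            # Assumed layout: 1-byte bucket, 8-byte big-endian timestamp.
            _bucket, ts = struct.unpack(">BQ", rowkey)
            yield ts, value

    # heapq.merge performs a lazy k-way merge of already-sorted inputs.
    return heapq.merge(
        *(decode(rows) for rows in bucket_results),
        key=lambda pair: pair[0],
    )
```

The merge is lazy, so it holds only one pending row per bucket in memory.
The extra cost on reads is mainly the multiple scans themselves.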