Re: md5 hash key and splits

Mohit Anchlia Fri, 31 Aug 2012 07:55:32 -0700

On Thu, Aug 30, 2012 at 11:52 PM, Stack <[email protected]> wrote:

> On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia <[email protected]>
> wrote:
> > In general isn't it better to split the regions so that the load can be
> > spread across the cluster to avoid hotspots?
> >
>
> Time series data is a particular case [1] and the sematextians have
> tools to help w/ that particular loading pattern.  Is time series your
> loading pattern?  If so, yes, you need to employ some smarts (tsdb
> schema and write tricks or hbasewd tool) to avoid hotspotting.  But
> hotspotting is an issue apart from splits; you can split all you want
> and if your row keys are time series, splitting won't undo the hotspotting.
>

My data is time series, and to get random distribution while still keeping
a user's keys in the same region, I am thinking of using
md5(userid)+reversetimestamp as the row key. But with this type of key, how
can one do pre-splits? I have 30 nodes.
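
To make the question concrete, here is a minimal sketch of the key layout and one possible way to derive pre-split points for it. It rests on several assumptions that are not stated in the thread: the helper names row_key and split_points are hypothetical, the reverse timestamp is taken as Long.MAX_VALUE minus a millisecond timestamp packed as 8 big-endian bytes, and one region per node (30) is used only as an example. The idea is that MD5 output is close to uniformly distributed, so boundaries spaced evenly over the first few bytes of the key space should give regions of roughly equal size.

```python
import hashlib
import struct

JAVA_LONG_MAX = 2**63 - 1  # Long.MAX_VALUE; assumed base for the reverse timestamp


def row_key(user_id, timestamp_ms):
    """Build md5(userid) + reverse timestamp (hypothetical helper).

    16 bytes of MD5 digest followed by 8 big-endian bytes, so all rows for
    one user share a prefix and newer rows sort before older ones.
    """
    prefix = hashlib.md5(user_id.encode("utf-8")).digest()
    reverse_ts = JAVA_LONG_MAX - timestamp_ms
    return prefix + struct.pack(">q", reverse_ts)


def split_points(num_regions, prefix_len=4):
    """Return num_regions - 1 evenly spaced boundaries (hypothetical helper).

    Boundaries cover the first prefix_len bytes of the key space, which is
    enough because the leading bytes of an MD5 digest are roughly uniform.
    """
    space = 256 ** prefix_len
    return [
        (i * space // num_regions).to_bytes(prefix_len, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Assumption: one region per node on a 30-node cluster.
    for boundary in split_points(30):
        print(boundary.hex())
```

The resulting byte strings are the kind of values that could be supplied as split keys when the table is created. Because HBase orders row keys as raw bytes, a user's rows stay contiguous under the shared 16-byte prefix, although a split boundary chosen later by HBase could still fall inside one user's range.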



> You would split to distribute load over the cluster and HBase should
> be doing this for you w/o need of human intervention (caveat the
> reasons you might want to manually split as listed above by AK and
> Ian).
>
> St.Ack
> 1. http://hbase.apache.org/book.html#rowkey.design
>
