Re: md5 hash key and splits

Stack Thu, 30 Aug 2012 23:52:40 -0700

On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> In general isn't it better to split the regions so that the load can be
> spread across the cluster to avoid HotSpots?
>


Time series data is a particular case [1], and the sematextians have
tools to help with that particular loading pattern.  Is time series your
loading pattern?  If so, yes, you need to employ some smarts (tsdb
schema and write tricks, or the hbasewd tool) to avoid hotspotting.  But
hotspotting is an issue apart from splits; you can split all you want,
and if your row keys are time series, splitting won't undo the hotspot:
new writes still all land in the region holding the latest keys.
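
To make the key-side fix concrete, here is a minimal sketch of the
hash-prefix ("salting") idea, in plain Python rather than HBase client
code.  It is not the hbasewd or tsdb API; NUM_BUCKETS, salted_key and
scan_prefixes are hypothetical names, and the one-byte MD5-derived
prefix is just one assumed way to pick a bucket.

```python
import hashlib

# Hypothetical bucket count; in practice you would size this to the
# number of regions you want writes spread over.
NUM_BUCKETS = 8


def salted_key(original_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a one-byte bucket derived from its MD5 digest.

    The bucket depends only on the key, so the same key always maps to
    the same salted key and point lookups still work.
    """
    bucket = hashlib.md5(original_key).digest()[0] % num_buckets
    return bytes([bucket]) + original_key


def scan_prefixes(num_buckets: int = NUM_BUCKETS) -> list:
    """The cost of salting: a time-range read needs one scan per bucket."""
    return [bytes([b]) for b in range(num_buckets)]


# Monotonically increasing timestamps, the classic hotspotting pattern.
keys = [str(1346370000 + i).encode("ascii") for i in range(1000)]

# Distinct bucket prefixes the sequential keys were spread over.
buckets = {salted_key(k)[0] for k in keys}
```

Sequential keys now sort under different prefixes, so consecutive
writes go to different parts of the key space instead of the tail of
the table.  The trade-off is on the read side: a scan over a time
range has to be run once per prefix and the results merged.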

You would split to distribute load over the cluster, and HBase should
be doing this for you without need of human intervention (the caveat
being the reasons you might want to split manually, as listed above by
AK and Ian).

St.Ack
1. http://hbase.apache.org/book.html#rowkey.design
