On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> In general isn't it better to split the regions so that the load can be
> spread across the cluster to avoid HotSpots?
>

Time series data is a particular case [1], and the sematextians have tools to help with that particular loading pattern. Is time series your loading pattern? If so, yes, you need to employ some smarts (tsdb schema and write tricks, or the hbasewd tool) to avoid hotspotting.

But hotspotting is an issue apart from splits; you can split all you want, and if your row keys are time series, splitting won't undo the hotspot: new writes still land in whichever region holds the latest keys. You split to distribute load over the cluster, and HBase should be doing this for you without need of human intervention (barring the reasons you might want to split manually, as listed above by AK and Ian).

St.Ack

1. http://hbase.apache.org/book.html#rowkey.design
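
To make the row key point concrete, here is a minimal sketch of one common write trick: prefixing each time-ordered key with a small bucket byte so that consecutive timestamps scatter across regions. It is not the hbasewd API or the tsdb schema. The names NUM_BUCKETS, plain_key, salted_key, and scan_prefixes are hypothetical, and the key layout (one bucket byte followed by an 8-byte big-endian timestamp) is assumed purely for illustration.

```python
import struct
import zlib

# Hypothetical bucket count; in practice it would be sized to the cluster.
NUM_BUCKETS = 16


def plain_key(ts_ms: int) -> bytes:
    """Time-ordered key: every new write sorts after the previous one."""
    return struct.pack(">Q", ts_ms)


def salted_key(ts_ms: int, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the time-ordered key with a bucket byte derived from the key.

    Deriving the bucket from the key itself keeps the mapping deterministic,
    so a reader can recompute the prefix for a known timestamp.
    """
    raw = plain_key(ts_ms)
    bucket = zlib.crc32(raw) % num_buckets
    return bytes([bucket]) + raw


def scan_prefixes(num_buckets: int = NUM_BUCKETS) -> list:
    """A time-range read has to scan every bucket prefix and merge results."""
    return [bytes([b]) for b in range(num_buckets)]


if __name__ == "__main__":
    base = 1346342640000  # arbitrary example timestamp in milliseconds
    buckets = {salted_key(base + i)[0] for i in range(1000)}
    print("distinct buckets hit by 1000 consecutive writes:", len(buckets))
```

The trade-off is on the read side: a scan over a time range becomes one scan per bucket prefix, with the results merged by the client.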