On Thu, Aug 30, 2012 at 11:52 PM, Stack <[email protected]> wrote:

> On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia <[email protected]>
> wrote:
> > In general isn't it better to split the regions so that the load can be
> > spread across the cluster to avoid hotspots?
> >
>
> Time series data is a particular case [1] and the sematextians have
> tools to help w/ that particular loading pattern. Is time series your
> loading pattern? If so, yes, you need to employ some smarts (tsdb
> schema and write tricks or hbasewd tool) to avoid hotspotting. But
> hotspotting is an issue apart from splits; you can split all you want
> and if your row keys are time series, splitting won't undo the
> hotspots.
>

My data is time series. To get a random distribution while still keeping
a user's keys in the same region, I am thinking of using
md5(userid)+reversetimestamp as the row key. But with this type of key,
how can one do pre-splits? I have 30 nodes.
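
To make the md5(userid)+reversetimestamp idea concrete, here is a minimal sketch in Python, not a tested HBase recipe. It assumes the key is the raw 16-byte MD5 digest followed by an 8-byte big-endian value of (Long.MAX_VALUE - timestamp), and that one region per node (30) is the target. The function names `row_key` and `split_points` are made up for illustration. Because MD5 output is close to uniformly distributed, evenly spaced boundaries over the first few bytes of the key space should give roughly equal regions.

```python
import hashlib
import struct
from typing import List

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def row_key(user_id: str, timestamp_ms: int) -> bytes:
    """Hypothetical key layout: md5(userid) + reversed timestamp."""
    prefix = hashlib.md5(user_id.encode("utf-8")).digest()  # 16 bytes
    reversed_ts = LONG_MAX - timestamp_ms  # newer rows sort first
    return prefix + struct.pack(">q", reversed_ts)  # 8 bytes, big-endian


def split_points(num_regions: int, prefix_len: int = 4) -> List[bytes]:
    """Evenly spaced boundaries over the first prefix_len bytes of the key.

    num_regions regions need num_regions - 1 split keys.
    """
    space = 256 ** prefix_len
    return [
        (i * space // num_regions).to_bytes(prefix_len, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Assumption: one region per node on a 30-node cluster.
    for point in split_points(30):
        print(point.hex())
```

The resulting byte arrays could then be supplied as the split keys when the table is created (for example through the createTable overload that takes a byte[][] of split keys). All rows for one user share the same 16-byte prefix, so they stay adjacent and normally land in a single region, with the newest rows first.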
> You would split to distribute load over the cluster and HBase should
> be doing this for you w/o need of human intervention (caveat the
> reasons you might want to manually split as listed above by AK and
> Ian).
>
> St.Ack
> 1. http://hbase.apache.org/book.html#rowkey.design
