In general, isn't it better to split the regions so that the load can be spread across the cluster to avoid hotspots?
I read about pre-splitting here: http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/

On Thu, Aug 30, 2012 at 4:30 PM, Amandeep Khurana <[email protected]> wrote:

> Also, you might have read that an initial loading of data can be better
> distributed across the cluster if the table is pre-split rather than
> starting with a single region and splitting (possibly aggressively,
> depending on the throughput) as the data loads in. Once you are in a stable
> state with regions distributed across the cluster, there is really no
> benefit in terms of spreading load by managing splitting manually v/s
> letting HBase do it for you. At that point it's about what Ian mentioned -
> predictability of latencies by avoiding splits happening at a busy time.
>
> On Thu, Aug 30, 2012 at 4:26 PM, Ian Varley <[email protected]> wrote:
>
> > The Facebook devs have mentioned in public talks that they pre-split their
> > tables and don't use automated region splitting. But as far as I remember,
> > the reason for that isn't predictability of spreading load, so much as
> > predictability of uptime & latency (they don't want an automated split to
> > happen at a random busy time). Maybe that's what you mean, Mohit?
> >
> > Ian
> >
> > On Aug 30, 2012, at 5:45 PM, Stack wrote:
> >
> > On Thu, Aug 30, 2012 at 7:35 AM, Mohit Anchlia <[email protected]
> > <mailto:[email protected]>> wrote:
> > From what I've read it's advisable to do manual splits since you are able
> > to spread the load in more predictable way. If I am missing something
> > please let me know.
> >
> >
> > Where did you read that?
> > St.Ack
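
To make the pre-splitting idea in this thread a bit more concrete, here is a minimal sketch of how split points could be computed for a table whose sequential row keys are prefixed with a one-byte bucket, roughly in the spirit of the linked post. It is not code from the thread or from that post: the class name `PreSplitKeys`, the methods `bucketSplitKeys` and `saltedKey`, and the choice of hashing the key to pick a bucket are all assumptions made for illustration. It uses only the JDK, so it does not create a table itself.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of HBase.
final class PreSplitKeys {

    // Returns bucketCount - 1 split points, one single-byte key per bucket
    // boundary, so that each bucket prefix would fall into its own region.
    static byte[][] bucketSplitKeys(int bucketCount) {
        if (bucketCount < 2 || bucketCount > 256) {
            throw new IllegalArgumentException("bucketCount must be in [2, 256]");
        }
        byte[][] splits = new byte[bucketCount - 1][];
        for (int i = 1; i < bucketCount; i++) {
            // The cast wraps values above 127 to negative Java bytes; row keys
            // are compared as unsigned bytes, so the ordering is preserved.
            splits[i - 1] = new byte[] { (byte) i };
        }
        return splits;
    }

    // Prefixes a row key with a bucket byte derived from a hash of the key
    // (an assumed scheme), so sequential keys scatter across the buckets.
    static byte[] saltedKey(byte[] originalKey, int bucketCount) {
        int bucket = Math.floorMod(Arrays.hashCode(originalKey), bucketCount);
        byte[] salted = new byte[originalKey.length + 1];
        salted[0] = (byte) bucket;
        System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
        return salted;
    }

    public static void main(String[] args) {
        // Print the split points for a table pre-split into 4 regions.
        for (byte[] split : bucketSplitKeys(4)) {
            System.out.println(Arrays.toString(split));
        }
    }
}
```

An array like the one returned by `bucketSplitKeys` is the sort of thing you would hand to the `createTable` overload that takes split keys, so the table starts out with one region per bucket instead of a single region. Whether that helps beyond the initial load is exactly the question discussed above.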
