I have GBs of data to be dumped into HBase. After a lot of trials and reading through the mailing list, I figured out that creating regions manually is a good option, because all of the data was initially hitting one node.
My approach to creating regions is as follows: I sampled about 1% of the actual data and created, say, 'n' regions based on this sample. Now, while doing the insertions, the data still hits one node first and then spreads out. Our theory is that the keys encountered during insertion don't fall in the regions that we created (using the sample), and hence the insertion proceeds as it normally would. So, has anyone approached this problem in a smarter way?

Viv
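
For reference, here is a minimal sketch of the sampling approach described above, assuming row keys are plain byte strings compared lexicographically (which is how HBase orders them). The function names `split_keys_from_sample` and `region_index` and the `row%04d` key format are made up for illustration; this is not our actual loader. It picks evenly spaced quantiles of the sorted sample as split points, and then counts how many keys land in each resulting region, which is one way to test the theory that the inserted keys fall outside the sampled ranges.

```python
from bisect import bisect_right
from collections import Counter
from typing import Iterable, List


def split_keys_from_sample(sample_keys: Iterable[bytes], n_regions: int) -> List[bytes]:
    """Pick up to n_regions - 1 split keys at evenly spaced quantiles of the sample."""
    if n_regions < 2:
        return []
    ordered = sorted(set(sample_keys))  # bytes sort lexicographically, byte by byte
    if not ordered:
        return []
    splits: List[bytes] = []
    for i in range(1, n_regions):
        key = ordered[(i * len(ordered)) // n_regions]
        # Skip repeats, which can occur when the sample is smaller than n_regions.
        if not splits or key > splits[-1]:
            splits.append(key)
    return splits


def region_index(splits: List[bytes], row_key: bytes) -> int:
    """Index of the region holding row_key; a region's start key is inclusive."""
    return bisect_right(splits, row_key)


# Hypothetical sample: every 10th key out of 1000 sequential row keys.
sample = [b"row%04d" % i for i in range(0, 1000, 10)]
splits = split_keys_from_sample(sample, 4)

# Count how the full key set would spread across the 4 regions.
all_keys = [b"row%04d" % i for i in range(1000)]
spread = Counter(region_index(splits, k) for k in all_keys)
```

If the spread computed over the real keys (or over the order in which they are inserted) is heavily skewed toward one region, the sample was probably not representative, or the insertion order is sequential so that only one region is written to at any given time. As far as I know, the split keys can be supplied when creating the table, for example through the `SPLITS` option of `create` in the HBase shell.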