I have GBs of data to load into HBase. After lots of trials and reading
through the mailing list, I figured out that creating regions manually is a
good option, because initially all the data was hitting one node.

My approach to creating regions is as follows:
    - I sampled about 1% of the actual data and created 'n' regions
based on this sample.
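For illustration, here is a minimal sketch of one way to turn such a sample into split keys: sort the distinct sampled row keys and take n-1 evenly spaced quantiles as region boundaries. The function name `split_points_from_sample` and the `user%05d` key format are hypothetical, and this assumes row keys are handled as raw bytes (Python compares `bytes` lexicographically as unsigned bytes, the same order HBase uses for row keys). In the Java client, the resulting list would presumably be passed as the `byte[][]` split-keys argument when creating the table.

```python
def split_points_from_sample(sample_keys, n_regions):
    """Hypothetical helper: pick n_regions - 1 split keys at evenly spaced
    quantiles of the sorted, de-duplicated sample.

    Region i covers [splits[i-1], splits[i]); the first region is unbounded
    below and the last is unbounded above.
    """
    if n_regions < 2:
        return []  # a single region needs no split keys
    keys = sorted(set(sample_keys))  # byte-wise lexicographic order
    if len(keys) < n_regions:
        raise ValueError("sample has fewer distinct keys than requested regions")
    # floor(i * m / n) is strictly increasing in i when m >= n,
    # so every split key is distinct.
    return [keys[(i * len(keys)) // n_regions] for i in range(1, n_regions)]


# Assumed example: a 1% sample of keys user00000 .. user09999
sample = [f"user{i:05d}".encode() for i in range(0, 10000, 100)]
print(split_points_from_sample(sample, 4))
# [b'user02500', b'user05000', b'user07500']
```

Quantiles of the sample give each region roughly equal numbers of keys only if the sample is representative of the full key distribution.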

Now, while doing the insertions, it still hits one node first and only
then spreads out.

Our theory is that the keys encountered during insertion don't fall into
the regions we created (using the sample), so they are inserted as they
normally would be.
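One way to test this theory offline is to map each incoming row key to the region it would land in, given the split keys, and count keys per region for a batch. The sketch below is hypothetical (the function names and the example keys are assumptions); it relies on HBase regions covering half-open ranges [start, end), so a key's region index equals the number of split keys less than or equal to it. The same check would also show whether a batch is concentrated in a single region because the input arrives sorted by row key.

```python
import bisect
from collections import Counter


def region_for_key(splits, key):
    # Number of split keys <= key == index of the region whose [start, end)
    # range contains the key (region 0 is everything below splits[0]).
    return bisect.bisect_right(splits, key)


def region_histogram(splits, keys):
    # Count how many keys in a batch would be routed to each region.
    return Counter(region_for_key(splits, k) for k in keys)


# Assumed split keys, e.g. computed from a sample of existing data
splits = [b"user02500", b"user05000", b"user07500"]
# A batch of new keys that all sort after the last split key
new_batch = [f"user{i:05d}".encode() for i in range(10000, 10100)]
print(region_histogram(splits, new_batch))
# Counter({3: 100})
```

If most batches show a single dominant region, the write load is landing on one region server regardless of how many regions were pre-created.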

So, has anyone approached this problem in a smarter way?

Viv
