I have around 20 GB of data to load into an HBase table.

Initially, I had a simple Java program that put the values in batches of
5,000-10,000 records.  I tried concurrent inserts, and each insert took about
15 seconds to write, which is very slow; the full load was taking ages.

My next approach was to use importtsv.  It started off with a set of map
tasks, but after a few minutes I started getting RetriesException errors, and
the job errored out a while later.

From these experiments, I noticed that the master node was handling all the
traffic.  I understand that HBase initially writes the data to one node and
then splits it across multiple nodes as more data comes in.  Is there a way to
split the table across regions from the beginning?
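
To make concrete what I mean by splitting up front, here is a rough sketch of
computing region boundaries before any data is loaded.  It assumes my row keys
start with a roughly uniformly distributed byte (for example a hash or salt
prefix), which may not hold for my real keys.  The class and method names
(PreSplitSketch, evenSplitKeys) are made up for illustration.  The sketch only
computes the split keys; as far as I understand, they would then be passed to
Admin.createTable(descriptor, splitKeys) when creating the table, which is not
called here.

```java
class PreSplitSketch {

    // Returns numRegions - 1 single-byte split keys that divide the
    // first-byte key space 0x00..0xFF into numRegions roughly equal ranges.
    // Assumption: row keys are evenly spread over their first byte.
    static byte[][] evenSplitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in 2..256");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            int boundary = i * 256 / numRegions; // value in 1..255
            splits[i - 1] = new byte[] { (byte) boundary };
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the boundaries for a table pre-split into 4 regions.
        for (byte[] key : evenSplitKeys(4)) {
            System.out.printf("0x%02X%n", key[0] & 0xFF);
        }
        // With the HBase client on the classpath, the array would be handed
        // to Admin.createTable(tableDescriptor, splitKeys) at table creation.
    }
}
```

For 4 regions this gives the boundaries 0x40, 0x80 and 0xC0, so writes would be
spread over four regions from the first insert, provided the keys really are
evenly distributed.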

Or any other thoughts on how to handle inserts of large amounts of data?
Viv
