Bulk Load question.

Vivek Krishna Sat, 19 Mar 2011 08:49:03 -0700

I have around 20 GB of data to be loaded into an HBase table.

Initially, I had a simple Java program that put the values in batches of
5000-10000 records.  I tried concurrent inserts, and each insert took about
15 seconds to write, which is very slow and was taking ages.


The next approach was to use importtsv.  This started off with a set of map
tasks, and after a few minutes I started getting RetriesException; the job
errors out a while later.

From these experiments, I noticed that the master node was handling all the
traffic.  I understand that initially it dumps data in one node and then
splits across multiple nodes as data comes in.  Is there a way to split this
across regions in the beginning?
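
To make concrete what I mean by splitting in the beginning, here is a minimal
sketch that only computes evenly spaced split points.  It assumes (this is an
assumption, not how my data necessarily looks) that row keys start with a
fixed-width lowercase hex prefix that is roughly uniformly distributed.  The
class and method names are made up for illustration.  As far as I can tell,
boundaries like these could be handed to the table-creation call that accepts
a byte[][] of split keys, but that part is left out so the sketch runs without
a cluster.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: computes region boundaries for hex-prefixed row keys.
class SplitKeys {

    // Returns numRegions - 1 boundaries that divide the space of
    // fixed-width lowercase hex strings into numRegions equal ranges.
    static byte[][] hexSplits(int numRegions, int width) {
        if (numRegions < 2 || width < 1) {
            throw new IllegalArgumentException("need numRegions >= 2 and width >= 1");
        }
        BigInteger space = BigInteger.valueOf(16).pow(width);
        if (BigInteger.valueOf(numRegions).compareTo(space) > 0) {
            throw new IllegalArgumentException("more regions than distinct prefixes");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // i-th boundary = floor(space * i / numRegions)
            BigInteger boundary = space
                    .multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(numRegions));
            String hex = boundary.toString(16);
            // Left-pad with zeros so every key has the same width.
            StringBuilder sb = new StringBuilder();
            for (int p = hex.length(); p < width; p++) {
                sb.append('0');
            }
            sb.append(hex);
            splits[i - 1] = sb.toString().getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions over a two-character hex prefix.
        for (byte[] s : hexSplits(4, 2)) {
            System.out.println(new String(s, StandardCharsets.US_ASCII));
        }
    }
}
```

If the real keys are not uniformly distributed, boundaries sampled from the
data itself would presumably be a better choice than evenly spaced ones.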

Or any other thoughts on how to handle inserts of large amounts of data?
Viv
