Thanks J-D.

The only place where I create an HTable is in the constructor of my Mapper.  
The constructor is called only once per map task, right?
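
To make the lifecycle concrete, here is a toy model in plain Java. It uses no Hadoop or HBase classes, and `FakeBufferedTable`, `InsertMapper`, and `runTask` are hypothetical names. It assumes, as I understand Hadoop's behavior, that each map task instantiates its mapper once and then calls setup() (configure() in the old API) once, map() once per record, and cleanup() (close() in the old API) once. With auto-flush disabled, the last partially filled buffer only reaches the server if something flushes it when the task ends.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (no Hadoop or HBase classes) of one map task's lifecycle:
 * the mapper is set up once, map() runs once per record, cleanup() runs once.
 */
public class MapTaskLifecycle {

    /** Hypothetical stand-in for an HTable with auto-flush disabled. */
    public static final class FakeBufferedTable {
        private final int bufferRows;                          // flush after this many rows
        private final List<String> buffer = new ArrayList<>();
        public final List<String> stored = new ArrayList<>();  // rows the "server" received

        public FakeBufferedTable(int bufferRows) {
            this.bufferRows = bufferRows;
        }

        public void put(String row) {
            buffer.add(row);
            if (buffer.size() >= bufferRows) {
                flushCommits();
            }
        }

        public void flushCommits() {
            stored.addAll(buffer);   // "send" everything buffered so far
            buffer.clear();
        }
    }

    /** Hypothetical mapper: one table per task, opened in setup(), reused by map(). */
    public static final class InsertMapper {
        public FakeBufferedTable table;
        public int setupCalls = 0;

        public void setup() {
            setupCalls++;
            table = new FakeBufferedTable(1000);
        }

        public void map(String record) {
            table.put(record);
        }

        public void cleanup() {
            table.flushCommits();    // push the last, partially filled buffer
        }
    }

    /** Drives one task; callCleanup=false shows what is lost without a final flush. */
    public static InsertMapper runTask(int records, boolean callCleanup) {
        InsertMapper mapper = new InsertMapper();  // one mapper instance per task
        mapper.setup();
        for (int i = 0; i < records; i++) {
            mapper.map("row-" + i);
        }
        if (callCleanup) {
            mapper.cleanup();
        }
        return mapper;
    }

    public static void main(String[] args) {
        InsertMapper withCleanup = runTask(2500, true);
        InsertMapper withoutCleanup = runTask(2500, false);
        System.out.println("setup() calls: " + withCleanup.setupCalls);
        System.out.println("rows stored with cleanup:    " + withCleanup.table.stored.size());
        System.out.println("rows stored without cleanup: " + withoutCleanup.table.stored.size());
    }
}
```

Running main() reports one setup() call, 2500 rows stored with cleanup, and 2000 without it. The 500 rows still sitting in the buffer are dropped when the final flush is skipped. In real HBase code the equivalent step would be calling flushCommits() (or closing the table) in cleanup()/close(). I believe HTable.close() also flushes, but check the version you run.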

Han
On Jul 22, 2010, at 4:43 PM, Jean-Daniel Cryans wrote:

> Han,
> 
> This is bad; you must be doing something slow, like creating a new
> HTable for each put call. Also, since you manage the HTable yourself,
> you need to use the write buffer: disable auto-flushing, then set the
> write buffer size on the HTable during the map configuration.
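
A rough illustration of why the write buffer matters: the toy model below is plain Java, not the HBase client, and `roundTrips` plus all of the sizes are assumptions. It counts how many flushes one task issues. With auto-flush on, every put is flushed on its own. With it off, puts accumulate until the buffered bytes reach the write buffer size. In the 2010-era client the corresponding calls are HTable.setAutoFlush(false) and HTable.setWriteBufferSize(long), with flushCommits() at the end of the task.

```java
/**
 * Toy model (not the HBase client) of client-side write buffering.
 * Each flush stands for one batched round trip to the region servers.
 */
public class WriteBufferModel {

    /** Counts the flushes a single task would issue for `puts` rows of `rowBytes` each. */
    public static long roundTrips(long puts, int rowBytes, boolean autoFlush, long writeBufferSize) {
        long pendingBytes = 0;
        long trips = 0;
        for (long i = 0; i < puts; i++) {
            pendingBytes += rowBytes;                       // put() appends to the buffer
            if (autoFlush || pendingBytes >= writeBufferSize) {
                trips++;                                    // flushCommits(): one batched RPC
                pendingBytes = 0;
            }
        }
        if (pendingBytes > 0) {
            trips++;                                        // final flush when the task ends
        }
        return trips;
    }

    public static void main(String[] args) {
        long puts = 1_000_000L;            // assumed rows per task
        int rowBytes = 700;                // assumed average serialized row size
        long buffer = 2L * 1024 * 1024;    // assumed 2 MiB write buffer
        System.out.println("auto-flush on:  " + roundTrips(puts, rowBytes, true, buffer));
        System.out.println("auto-flush off: " + roundTrips(puts, rowBytes, false, buffer));
    }
}
```

With the assumed 1,000,000 rows of 700 bytes and a 2 MiB buffer, main() prints 1000000 round trips with auto-flush on and 334 with it off. The real client's flush threshold and RPC batching differ in detail, so treat this as an order-of-magnitude argument only.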
> 
> Use of the bulk load tool is widespread; you should give it a try if
> you only have one column family.
> 
> J-D
> 
> On Thu, Jul 22, 2010 at 1:06 PM, HAN LIU <[email protected]> wrote:
>> Hi Guys,
>> 
>> I've been doing some data insertion from HDFS into HBase, and the
>> performance seems to be really bad: it took about 3 hours to insert 15 GB
>> of data. The MapReduce job is launched from one machine, which reads the
>> data from HDFS and inserts it into an HTable hosted on 3 other machines
>> (1 master and 2 region servers). There are 17 map tasks in total (no
>> reduce tasks), one for each of 17 files of about 1 GB each. The mapper
>> simply extracts the useful information from each file and inserts it into
>> HBase. In the end about 22 million rows are added to the table, and in my
>> implementation (pretty inefficient, I think) 'table.put(Put p)' is called
>> once per row, so there are 22 million 'table.put()' calls in total.
>> 
>> Does it make sense that this many 'table.put()' calls take 3 hours? I
>> have experimented with my code and determined that these 'table.put()'
>> calls are the bottleneck: if I remove them, the rest of the code (every
>> part of the job except committing the updates via 'table.put()') takes
>> only 2 minutes to run. I am really inexperienced with HBase, so how do
>> you usually do data insertion? What tricks can improve performance?
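
For a sense of scale, the figures in this message work out as follows. This is simple arithmetic on the stated numbers (about 22 million puts in about 3 hours), and `ThroughputEstimate` is a hypothetical helper name.

```java
/** Back-of-the-envelope rates from the figures in the original message. */
public class ThroughputEstimate {
    static final double SECONDS_PER_HOUR = 3600.0;

    /** Aggregate puts per second over the whole job. */
    public static double putsPerSecond(long puts, double hours) {
        return puts / (hours * SECONDS_PER_HOUR);
    }

    /** Wall-clock milliseconds of job time per put, if puts were spread evenly. */
    public static double millisPerPut(long puts, double hours) {
        return hours * SECONDS_PER_HOUR * 1000.0 / puts;
    }

    public static void main(String[] args) {
        long puts = 22_000_000L;   // "about 22 million rows"
        double hours = 3.0;        // "about 3 hours"
        System.out.printf("%.0f puts/s%n", putsPerSecond(puts, hours));
        System.out.printf("%.3f ms of job time per put%n", millisPerPut(puts, hours));
    }
}
```

This comes to roughly 2,037 puts per second across the whole job, or about 0.49 ms of job time per put, compared with 2 minutes for everything else. The message doesn't say how many of the 17 map tasks ran concurrently, so these numbers don't pin down the latency of an individual put.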
>> 
>> I am thinking about using the bulk load feature to batch-insert data into
>> HBase. Is this a popular method in the HBase community?
>> 
>> Sorry for asking for so much help with my problems without helping other
>> people with theirs. I would really like to offer help once I am more
>> experienced with HBase.
>> 
>> Thanks a lot in advance :)
>> 
>> 
>> ----
>> Han Liu
>> SCS & HCI Institute
>> Undergrad. Class of 2012
>> Carnegie Mellon University
>> 
> 
