Han,

This is bad; you must be doing something slow, like creating a new HTable for each put call. Also, since you manage the HTable yourself, you need to use the write buffer: disable auto flushing, then set the write buffer size on the HTable during the map configuration.
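To make the write buffer point concrete, here is a rough toy model in plain Java. It is not HBase code and does not talk to a cluster; it only counts how many round trips a client would make with and without buffering. In the HBase client API of this era the corresponding calls are `HTable.setAutoFlush(false)`, `HTable.setWriteBufferSize(long)` and `HTable.flushCommits()`. The class name `WriteBufferModel`, the 700 bytes per put (roughly 15 GB divided by 22 million rows, so probably an overestimate) and the 2 MB buffer are all assumptions made for illustration.

```java
// Toy model only: it counts round trips and never talks to HBase.
class WriteBufferModel {

    /**
     * Counts the round trips needed to send `puts` writes of `bytesPerPut`
     * bytes each when the client flushes as soon as the buffered bytes
     * exceed `bufferBytes`. A `bufferBytes` of 0 models auto flushing,
     * where every put is sent on its own.
     */
    static long roundTrips(long puts, long bytesPerPut, long bufferBytes) {
        if (puts < 0 || bytesPerPut <= 0 || bufferBytes < 0) {
            throw new IllegalArgumentException("arguments out of range");
        }
        long trips = 0;
        long buffered = 0;
        for (long i = 0; i < puts; i++) {
            buffered += bytesPerPut;
            if (buffered > bufferBytes) {    // buffer is full: flush it
                trips++;
                buffered = 0;
            }
        }
        if (buffered > 0) {                  // final flush of the leftovers
            trips++;
        }
        return trips;
    }

    public static void main(String[] args) {
        long puts = 22_000_000L;             // row count from the original post
        long bytesPerPut = 700L;             // assumption, see the text above
        long bufferBytes = 2L * 1024 * 1024; // assumption: a 2 MB write buffer

        System.out.println("auto flush:   " + roundTrips(puts, bytesPerPut, 0));
        System.out.println("write buffer: " + roundTrips(puts, bytesPerPut, bufferBytes));
    }
}
```

Under these assumptions the buffered run needs several thousand round trips instead of 22 million. The real saving depends on your actual put sizes and on server-side costs that this model ignores.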
The bulk load tool is widely used; you should give it a try if you only have 1 family.

J-D

On Thu, Jul 22, 2010 at 1:06 PM, HAN LIU <[email protected]> wrote:
> Hi Guys,
>
> I've been doing some data insertion from HDFS to HBase and the performance
> seems to be really bad. It took about 3 hours to insert 15 GB of data. The
> mapreduce job is launched from one machine, which grabs data from HDFS and
> inserts it into an HTable located on 3 other machines (1 master and 2
> regionservers). There are 17 map jobs in total (no reduce jobs), representing
> 17 files, each about 1 GB in size. The mapper simply extracts the useful
> information from each of these files and inserts it into HBase. In the end
> there are about 22 million rows added to the table, and with my
> implementation (pretty inefficient, I think), for each of these rows a
> 'table.put(Put p)' method is called once, so in the end there are 22 million
> 'table.put()' calls.
>
> Does it make sense that this many 'table.put' calls takes 3 hours? I
> have played with my code and I have determined that the bottleneck is these
> 'table.put()' calls, because if I remove them, the rest of the code (doing
> every part of the job except for committing the updates via 'table.put()')
> only takes 2 minutes to run. I am really inexperienced in HBase, so how do
> you guys usually do data insertion? What could be the tricks to enhance
> performance?
>
> I am thinking about using the bulk load feature to batch insert data into
> HBase. Is this a popular method out there in the HBase community?
>
> Really sorry about asking for so much help with my problems but not helping
> other people with theirs. I really would like to offer help once I get more
> experienced with HBase.
>
> Thanks a lot in advance :)
>
> ----
> Han Liu
> SCS & HCI Institute
> Undergrad. Class of 2012
> Carnegie Mellon University
