Hi SJ,

Awesome setup. I tested with your configurations and the performance is 6 times 
better. :) Thanks a lot.
What role does 'setWriteToWAL(false)' play here? I hear that if it's set to
false, there will be data loss in case of a RegionServer crash. How would the
performance be affected if I set it to true?
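Here is a toy model of the trade-off. It is not HBase code: ToyRegionServer is a hypothetical class that holds recent edits only in memory and, when writeToWal is true, first appends each edit to a log that is assumed to survive a crash. It deliberately ignores memstore flushes to HFiles (edits that have already been flushed are durable either way), and it says nothing about how large the WAL's cost is on a real cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical class, not HBase code): shows why edits written
// without a write-ahead log can be lost when a server crashes.
class ToyRegionServer {
    // In-memory store of recent edits; lost on crash.
    private final Map<String, String> memstore = new HashMap<>();
    // Stand-in for the write-ahead log; assumed to survive a crash.
    private final List<String[]> wal = new ArrayList<>();

    void put(String row, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(new String[] {row, value}); // log the edit first
        }
        memstore.put(row, value); // then apply it in memory
    }

    // Simulated crash: anything held only in memory disappears.
    void crash() {
        memstore.clear();
    }

    // Simulated recovery: replay every edit that reached the log.
    void recover() {
        for (String[] edit : wal) {
            memstore.put(edit[0], edit[1]);
        }
    }

    String get(String row) {
        return memstore.get(row);
    }

    public static void main(String[] args) {
        ToyRegionServer rs = new ToyRegionServer();
        rs.put("row-with-wal", "a", true);
        rs.put("row-without-wal", "b", false);
        rs.crash();
        rs.recover();
        System.out.println(rs.get("row-with-wal"));    // prints a
        System.out.println(rs.get("row-without-wal")); // prints null
    }
}
```

In this model an edit written with the WAL comes back after crash() and recover(), while one written without it is gone. The speedup from setWriteToWAL(false) comes from skipping that extra log append on every edit.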

Thanks,

Han
On Jul 23, 2010, at 9:57 AM, Samuru Jackson wrote:

> Hi,
> 
> For testing purposes I have to do some bulk loads as well.
> 
> What I do is insert the data in bulks (for instance, 10,000 rows at a 
> time).
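One way to cut the input into such bulks, assuming the records are already in a java.util.List, is a small partition helper. Batches.partition is a hypothetical name, not an HBase API, and the batch size is just a parameter; 10,000 mirrors the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the HBase API.
class Batches {
    // Splits records into consecutive chunks of at most batchSize elements;
    // the last chunk holds whatever is left over.
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }
}
```

Each inner list would then be turned into one List<Put> and passed to hTable.put(...).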
> 
> I create a Put List out of those records:
> 
> List<Put> pList = new ArrayList<Put>();
> 
> where each Put has WriteToWAL set to false:
> 
> p.setWriteToWAL(false);
> pList.add(p);
> 
> Then I set autoflush to false and create a larger write buffer:
> 
> hTable.setAutoFlush(false);
> hTable.setWriteBufferSize(1024*1024*12); // 12 MB client-side buffer
> hTable.put(pList);
> hTable.flushCommits(); // send whatever is still buffered
> hTable.setAutoFlush(true);
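To see why the write buffer matters, here is a toy model of a client-side buffer. ToyBufferedTable is hypothetical and only mimics the idea behind setAutoFlush, setWriteBufferSize and flushCommits. It counts each flush as one round trip (a real client may send one RPC per RegionServer per flush), and the 2 MB starting buffer size is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical class, not the HBase implementation).
// Simplification: every flush counts as one round trip.
class ToyBufferedTable {
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private long writeBufferSize = 2L * 1024 * 1024; // assumed 2 MB default
    private boolean autoFlush = true;
    private int roundTrips = 0;

    void setAutoFlush(boolean autoFlush) {
        this.autoFlush = autoFlush; // does not flush anything by itself
    }

    void setWriteBufferSize(long bytes) {
        this.writeBufferSize = bytes;
    }

    // Buffers one record; flushes immediately with auto flush on,
    // otherwise only once the buffer grows past writeBufferSize.
    void put(byte[] record) {
        buffer.add(record);
        bufferedBytes += record.length;
        if (autoFlush || bufferedBytes > writeBufferSize) {
            flushCommits();
        }
    }

    // Sends everything buffered so far as one simulated round trip.
    void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        roundTrips++;
        buffer.clear();
        bufferedBytes = 0;
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

With 10,000 records of 1,000 bytes each, auto flush on costs 10,000 simulated round trips, one per put. With auto flush off and a 12 MB buffer the whole batch fits in memory, so the final flushCommits() is the only trip; in this model, turning auto flush back on does not by itself send anything still buffered.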
> 
> The following settings have boosted my load performance 5 times.
> Without any other performance tuning and with no special hardware
> configuration, I achieve 8,000 to 9,000 records per second:
> p.setWriteToWAL(false);
> hTable.setAutoFlush(false);
> hTable.setWriteBufferSize(1024*1024*12);
> 
> 
> /SJ
> 
> 
> On Thu, Jul 22, 2010 at 6:31 PM, Jean-Daniel Cryans <[email protected]> 
> wrote:
>> Yes, then you should really look at using the write buffer.
>> 
>> J-D
>> 
>> On Thu, Jul 22, 2010 at 3:22 PM, HAN LIU <[email protected]> wrote:
>>> Thanks J-D.
>>> 
>>> The only place where I create an HTable is in the constructor of my Mapper.
>>> The constructor is called only once for each map task, right?
>>> 
>>> Han
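On the constructor question: in the new MapReduce API, Mapper.run() calls setup() once per task, map() once per input record and cleanup() once at the end, and the Mapper object itself is created once per task, so a client opened in the constructor or in setup() is shared by all records of that task. The sketch models that order with hypothetical classes (ExpensiveClient standing in for an HTable, ToyMapTask for a Mapper) rather than real Hadoop or HBase types.

```java
import java.util.List;

// Hypothetical stand-in for an expensive-to-create client such as an HTable.
class ExpensiveClient {
    static int instancesCreated = 0;
    int writes = 0;

    ExpensiveClient() {
        instancesCreated++;
    }

    void write(String record) {
        writes++;
    }

    void close() {
        // A real client would flush buffered writes and release resources here.
    }
}

// Hypothetical toy map task following the setup -> map -> cleanup order.
class ToyMapTask {
    private ExpensiveClient client;

    void setup() {
        client = new ExpensiveClient(); // once per task
    }

    void map(String record) {
        client.write(record); // once per input record
    }

    void cleanup() {
        client.close(); // once per task
    }

    void run(List<String> records) {
        setup();
        for (String record : records) {
            map(record);
        }
        cleanup();
    }

    int writes() {
        return client.writes;
    }
}
```

Running one task over three records creates a single client and performs three writes through it.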
>>> On Jul 22, 2010, at 4:43 PM, Jean-Daniel Cryans wrote:
>>> 
>>>> Han,
>>>> 
>>>> This is bad; you must be doing something slow, like creating a new
>>>> HTable for each put call. Also, since you manage the HTable yourself,
>>>> you need to use the write buffer (disable auto flushing, then set the
>>>> write buffer size on the HTable during the map configuration).
>>>> 
>>>> The bulk load tool is widely used; you should give it a try if
>>>> you only have one column family.
>>>> 
>>>> J-D
>>>> 
>>>> On Thu, Jul 22, 2010 at 1:06 PM, HAN LIU <[email protected]> wrote:
>>>>> Hi Guys,
>>>>> 
>>>>> I've been doing some data insertion from HDFS to HBase and the
>>>>> performance seems to be really bad. It took about 3 hours to insert 15 GB
>>>>> of data. The MapReduce job is launched from one machine, which grabs data
>>>>> from HDFS and inserts it into an HTable located on 3 other machines (1
>>>>> master and 2 RegionServers). There are 17 map tasks in total (no reduce
>>>>> tasks), one for each of 17 files of about 1 GB each. The mapper simply
>>>>> extracts the useful information from each of these files and inserts it
>>>>> into HBase. In the end about 22 million rows are added to the table, and
>>>>> with my implementation (pretty inefficient, I think) 'table.put(Put p)'
>>>>> is called once for each of these rows, so in the end there are 22 million
>>>>> 'table.put()' calls.
>>>>> 
>>>>> Does it make sense that this many 'table.put()' calls take 3 hours?
>>>>> I have played with my code and determined that the bottleneck is
>>>>> these 'table.put()' calls: if I remove them, the rest of the code
>>>>> (doing every part of the job except committing the updates via
>>>>> 'table.put()') only takes 2 minutes to run. I am really
>>>>> inexperienced with HBase, so how do you usually do data insertion?
>>>>> What tricks can improve performance?
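A quick check of the numbers in this message, as a sketch. LoadMath is just an illustrative class name. The rate is aggregated over all 17 map tasks, and the put() share assumes the non-put work costs the same 2 minutes with or without the puts.

```java
// Hypothetical helper class: back-of-the-envelope figures from this thread.
class LoadMath {
    // Aggregate rate over the whole job, i.e. all map tasks together.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    // Share of the runtime attributable to put(), assuming the non-put
    // work takes the same time with or without the puts.
    static double putShare(double totalSeconds, double withoutPutsSeconds) {
        return (totalSeconds - withoutPutsSeconds) / totalSeconds;
    }

    public static void main(String[] args) {
        double jobSeconds = 3 * 3600.0;       // "about 3 hours"
        double withoutPutsSeconds = 2 * 60.0; // "only takes 2 minutes"
        System.out.printf("%.0f rows/s%n", rowsPerSecond(22_000_000L, jobSeconds));
        System.out.printf("%.1f%% of the time in put()%n",
                100 * putShare(jobSeconds, withoutPutsSeconds));
    }
}
```

That works out to roughly 2,000 rows per second overall, with about 99% of the runtime attributable to the put() calls.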
>>>>> 
>>>>> I am thinking about using the bulk load feature to batch insert data into 
>>>>> HBase. Is this a popular method out there in the HBase community?
>>>>> 
>>>>> Sorry for asking for so much help with my problems without helping
>>>>> other people with theirs. I would really like to offer help once I get
>>>>> more experienced with HBase.
>>>>> 
>>>>> Thanks a lot in advance :)
>>>>> 
>>>>> 
>>>>> ----
>>>>> Han Liu
>>>>> SCS & HCI Institute
>>>>> Undergrad. Class of 2012
>>>>> Carnegie Mellon University
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
> 



