HBase uses a write-ahead log; if you don't write to it, you lose
all the data that's only in the memstores when a region server fails.
Skipping it is useful for importing a lot of data you can re-run if needed.
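
For reference, the client-side batching discussed below (disable auto-flush, set a write buffer size, flush at the end) boils down to collecting puts locally and sending them to the server in batches instead of one round trip per put. Here is a minimal stand-alone sketch of that pattern; the names (BufferedWriter, flushCount) are illustrative only, not HBase API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the client-side write-buffer idea behind
// HTable.setAutoFlush(false) + HTable.setWriteBufferSize(...):
// puts accumulate locally and are shipped in one batch when the
// buffer fills, instead of one RPC per put. Not an HBase class.
class BufferedWriter {
    private final List<String> buffer = new ArrayList<>();
    private final int flushThreshold;
    private int flushCount = 0; // number of batched "RPCs" sent

    BufferedWriter(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String row) {
        buffer.add(row);                      // no network round trip here
        if (buffer.size() >= flushThreshold) {
            flush();                          // buffer full: ship one batch
        }
    }

    void flush() {
        if (buffer.isEmpty()) return;
        // In HBase this would be one batched RPC to the region server(s).
        flushCount++;
        buffer.clear();
    }

    int flushCount() { return flushCount; }
}

public class WriteBufferDemo {
    public static void main(String[] args) {
        BufferedWriter w = new BufferedWriter(10_000);
        for (int i = 0; i < 22_000; i++) {
            w.put("row-" + i);
        }
        w.flush(); // ship the remainder, like re-enabling auto-flush
        System.out.println(w.flushCount()); // 3 batches, not 22,000 round trips
    }
}
```

With a 10,000-row buffer, 22,000 puts go out as 3 batched flushes; cutting per-put round trips this way is where the speedups reported in this thread come from.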

J-D

On Fri, Jul 23, 2010 at 11:33 AM, Han Liu <[email protected]> wrote:
> Hi SJ,
>
> Awesome setup. I tested with your configurations and the performance is 6 
> times better. :) Thanks a lot.
> What role does 'setWriteToWAL(false)' play here? I hear that if it's set to 
> false then there will be data loss in case of RegionServer crash. How would 
> the performance be affected if I set it to true?
>
> Thanks,
>
> Han
> On Jul 23, 2010, at 9:57 AM, Samuru Jackson wrote:
>
>> Hi,
>>
>> For testing purposes I have to make some bulk loads as well.
>>
>> What I do is insert the data in batches (for instance 10,000 rows at a
>> time).
>>
>> I create a Put List out of those records:
>>
>> List<Put> pList = new ArrayList<Put>();
>>
>> where each Put p has writeToWAL set to false:
>>
>> p.setWriteToWAL(false);
>> pList.add(p);
>>
>> Then I set autoflush to false and create a larger writebuffer:
>>
>> hTable.setAutoFlush(false);
>> hTable.setWriteBufferSize(1024*1024*12);
>> hTable.put(pList);
>> hTable.setAutoFlush(true);
>>
>> The following settings have boosted my load performance 5x. Without
>> any further performance tuning and no special HW configuration I
>> achieve 8,000-9,000 records per second:
>> p.setWriteToWAL(false);
>> hTable.setAutoFlush(false);
>> hTable.setWriteBufferSize(1024*1024*12);
>>
>>
>> /SJ
>>
>>
>> On Thu, Jul 22, 2010 at 6:31 PM, Jean-Daniel Cryans <[email protected]> 
>> wrote:
>>> Yes, then you should really look at using the write buffer.
>>>
>>> J-D
>>>
>>> On Thu, Jul 22, 2010 at 3:22 PM, HAN LIU <[email protected]> wrote:
>>>> Thanks J-D.
>>>>
>>>> The only place where I create an HTable is in the constructor of my 
>>>> Mapper.  The constructor is called only once for each map task right?
>>>>
>>>> Han
>>>> On Jul 22, 2010, at 4:43 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> Han,
>>>>>
>>>>> This is bad, you must be doing something slow like creating a new
>>>>> HTable for each put call. Also, since you manage the HTable yourself,
>>>>> you need to use the write buffer (disable auto-flushing, then set the
>>>>> write buffer size on the HTable during the map configuration).
>>>>>
>>>>> The bulk load tool is widely used; you should give it a try if
>>>>> you only have 1 family.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Jul 22, 2010 at 1:06 PM, HAN LIU <[email protected]> wrote:
>>>>>> Hi Guys,
>>>>>>
>>>>>> I've been doing some data insertion from HDFS to HBase and the
>>>>>> performance seems really bad. It took about 3 hours to insert 15
>>>>>> GB of data. The mapreduce job is launched from one machine, which grabs
>>>>>> data from HDFS and inserts it into an HTable located on 3 other
>>>>>> machines (1 master and 2 regionservers). There are 17 map tasks in total
>>>>>> (no reduce tasks), representing 17 files each about 1 GB in size. The
>>>>>> mapper simply extracts the useful information from each of these files
>>>>>> and inserts it into HBase. In the end about 22 million rows are added
>>>>>> to the table, and with my implementation (pretty inefficient, I
>>>>>> think), a 'table.put(Put p)' method is called once for each of these
>>>>>> rows, so in the end there are 22 million 'table.put()' calls.
>>>>>>
>>>>>> Does it make sense that this many 'table.put' calls take 3 hours?
>>>>>> I have played with my code and determined that the bottleneck is
>>>>>> these 'table.put()' calls: if I remove them, the rest of the code
>>>>>> (every part of the job except committing the updates via
>>>>>> 'table.put()') only takes 2 minutes to run. I am really
>>>>>> inexperienced with HBase, so how do you guys usually do data insertion?
>>>>>> What could be the tricks to enhance performance?
>>>>>>
>>>>>> I am thinking about using the bulk load feature to batch insert data 
>>>>>> into HBase. Is this a popular method out there in the HBase community?
>>>>>>
>>>>>> Really sorry about asking so much help for my problems but not helping 
>>>>>> other people with theirs. I really would like to offer help once I get 
>>>>>> more experienced with HBase.
>>>>>>
>>>>>> Thanks a lot in advance :)
>>>>>>
>>>>>>
>>>>>> ----
>>>>>> Han Liu
>>>>>> SCS & HCI Institute
>>>>>> Undergrad. Class of 2012
>>>>>> Carnegie Mellon University
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
