HBase uses a write-ahead log; if you don't write to it, you will lose any data that is only in the memstores when a region server fails. Skipping it is useful when importing a lot of data.

J-D
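As a minimal sketch of that trade-off (hypothetical row/column names, assuming the 0.20-era client classes org.apache.hadoop.hbase.client.Put and org.apache.hadoop.hbase.util.Bytes):

Put p = new Put(Bytes.toBytes("row-00001"));
p.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));

// Default: the edit is appended to the WAL before the memstore,
// so it survives a region server crash.
p.setWriteToWAL(true);

// Import mode: skip the WAL. Faster, but any edit that is still only
// in the memstore is lost if the region server dies before a flush.
p.setWriteToWAL(false);

The second call is the one used in SJ's snippet below; the choice is purely durability versus write throughput.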
On Fri, Jul 23, 2010 at 11:33 AM, Han Liu <[email protected]> wrote:
> Hi SJ,
>
> Awesome setup. I tested with your configurations and the performance is 6
> times better. :) Thanks a lot.
> What role does 'setWriteToWAL(false)' play here? I hear that if it's set to
> false then there will be data loss in case of a RegionServer crash. How would
> the performance be affected if I set it to true?
>
> Thanks,
>
> Han
> On Jul 23, 2010, at 9:57 AM, Samuru Jackson wrote:
>
>> Hi,
>>
>> For testing purposes I have to do some bulk loads as well.
>>
>> What I do is insert the data in batches (for instance 10,000 rows at a
>> time).
>>
>> I create a Put list out of those records:
>>
>> List<Put> pList = new ArrayList<Put>();
>>
>> where each Put has writeToWAL set to false:
>>
>> put.setWriteToWAL(false);
>> pList.add(p);
>>
>> Then I set autoflush to false and create a larger write buffer:
>>
>> hTable.setAutoFlush(false);
>> hTable.setWriteBufferSize(1024*1024*12);
>> hTable.put(pList);
>> hTable.setAutoFlush(true);
>>
>> The following settings have boosted my load performance 5 times -
>> without any bigger performance tuning or special HW configuration I
>> achieve 8000-9000 records per second:
>>
>> p.setWriteToWAL(false);
>> hTable.setAutoFlush(false);
>> hTable.setWriteBufferSize(1024*1024*12);
>>
>> /SJ
>>
>> On Thu, Jul 22, 2010 at 6:31 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>> Yes, then you should really look at using the write buffer.
>>>
>>> J-D
>>>
>>> On Thu, Jul 22, 2010 at 3:22 PM, HAN LIU <[email protected]> wrote:
>>>> Thanks J-D.
>>>>
>>>> The only place where I create an HTable is in the constructor of my
>>>> Mapper. The constructor is called only once for each map task, right?
>>>>
>>>> Han
>>>> On Jul 22, 2010, at 4:43 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> Han,
>>>>>
>>>>> This is bad, you must be doing something slow like creating a new
>>>>> HTable for each put call. You also need to use the write buffer
>>>>> (disable auto flushing, then set the write buffer size on the HTable
>>>>> during the map task's setup), since you manage the HTable yourself.
>>>>>
>>>>> The bulk load tool is widely used, you should give it a try if
>>>>> you only have 1 family.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Jul 22, 2010 at 1:06 PM, HAN LIU <[email protected]> wrote:
>>>>>> Hi Guys,
>>>>>>
>>>>>> I've been doing some data insertion from HDFS to HBase and the
>>>>>> performance seems to be really bad. It took about 3 hours to insert 15
>>>>>> GB of data. The mapreduce job is launched from one machine, which grabs
>>>>>> data from HDFS and inserts it into an HTable located on 3 other
>>>>>> machines (1 master and 2 regionservers). There are 17 map tasks in total
>>>>>> (no reduce tasks), representing 17 files each about 1 GB in size. The
>>>>>> mapper simply extracts the useful information from each of these files
>>>>>> and inserts it into HBase. In the end there are about 22 million rows
>>>>>> added to the table, and with my implementation (pretty inefficient, I
>>>>>> think) 'table.put(Put p)' is called once for each of these rows, so in
>>>>>> the end there are 22 million 'table.put()' calls.
>>>>>>
>>>>>> Does it make sense that this many 'table.put()' calls take 3 hours?
>>>>>> Because I have played with my code and determined that the
>>>>>> bottleneck is these 'table.put()' calls: if I remove them, the
>>>>>> rest of the code (doing every part of the job except committing the
>>>>>> updates via 'table.put()') only takes 2 minutes to run. I am really
>>>>>> inexperienced with HBase, so how do you guys usually do data insertion?
>>>>>> What could be the tricks to enhance performance?
>>>>>>
>>>>>> I am thinking about using the bulk load feature to batch insert data
>>>>>> into HBase. Is this a popular method out there in the HBase community?
>>>>>>
>>>>>> Really sorry about asking for so much help with my problems while not
>>>>>> helping other people with theirs. I really would like to offer help
>>>>>> once I get more experienced with HBase.
>>>>>>
>>>>>> Thanks a lot in advance :)
>>>>>>
>>>>>> ----
>>>>>> Han Liu
>>>>>> SCS & HCI Institute
>>>>>> Undergrad. Class of 2012
>>>>>> Carnegie Mellon University
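Pulling the settings from this thread together, a rough, self-contained sketch of the buffered-write approach (the table name, row keys and column names are made up, and it assumes the 0.20-era HTable client API rather than the bulk load tool):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedPutSketch {
  public static void main(String[] args) throws IOException {
    // Create the HTable once (in a MapReduce job: once per map task,
    // e.g. in the mapper's setup/configure), never once per put() call.
    HTable hTable = new HTable(new HBaseConfiguration(), "mytable");

    // Buffer puts on the client side instead of doing one RPC per row.
    hTable.setAutoFlush(false);
    hTable.setWriteBufferSize(1024 * 1024 * 12); // 12 MB

    List<Put> pList = new ArrayList<Put>();
    for (int i = 0; i < 10000; i++) {
      Put p = new Put(Bytes.toBytes("row-" + i));
      p.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value-" + i));
      p.setWriteToWAL(false); // trade durability for import speed
      pList.add(p);
    }
    hTable.put(pList);

    // With auto flush off, edits can sit in the client-side buffer until it
    // fills up, so flush explicitly before closing.
    hTable.flushCommits();
    hTable.close();
  }
}

The batching is what does the heavy lifting: with auto flush disabled, the client groups edits into large RPCs sized by the write buffer instead of making one round trip per row, which is what turns millions of individual put() calls into something reasonable.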
