Thank you for the quick response Paul. I've switched off autoFlush (haven't increased the write buffer though). And the splits are pretty effective, I think, because each region gets a similar number of requests before it splits further.
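For reference, the client-side settings being discussed look roughly like this with the 0.92-era HTable API (a sketch only: the table name "dump" is taken from the region name in the log below, and the 8 MB buffer size is just an example, not a recommendation from this thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Assumes an HBase 0.92.x client on the classpath and a running cluster.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "dump");

table.setAutoFlush(false);                  // buffer puts client-side instead of one RPC per put
table.setWriteBufferSize(8 * 1024 * 1024);  // example value; the default is 2 MB

// ... worker threads issue their puts here ...

table.flushCommits();                       // push any puts still sitting in the write buffer
table.close();
```

With autoFlush off, puts accumulate until the write buffer fills (or flushCommits() is called), so each RPC carries a batch of mutations rather than a single put.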
As per your suggestion, I tried the same task two more times with an increased handler count (15 and 20) for the regionservers, and the results, though not ideal, are better than they were with the default number of handlers. With 15 handlers per regionserver, the results were actually as bad (multiple NPEs because of RPC call timeouts) as with the default value, but with 20 the NPEs dropped dramatically (still not gone, though). I think I'll try again with 25 and report back with the results.

I've also seen some data loss in the process. I'm not sure it's because of the RPC timeouts, since I had exactly the same number of rows missing in both attempts (once with 10 handlers and once with 20) even though the number of NPEs varied drastically. Any tips on where/what I should be focusing on to uncover the cause of the data loss? (PS: I'm using the primary key of the MySQL table as my rowkey.)

Warm Regards,
Naveen

-----Original Message-----
From: Paul Mackles [mailto:[email protected]]
Sent: Monday, September 24, 2012 8:51 PM
To: [email protected]
Subject: Re: Mass dumping of data has issues

Did you adjust the write buffer to a larger size and/or turn off autoFlush for the HTable? I've found that both of those settings can have a profound impact on write performance. You might also look at adjusting the handler count for the regionservers, which by default is pretty low. You should also confirm that your splits are effective in distributing the writes.

On 9/24/12 11:01 AM, "Naveen" <[email protected]> wrote:

>Hi,
>
>I've come across the following issue for which I'm unable to deduce what
>the root cause could be.
>
>Scenario:
>I'm trying to dump data (8.3M+ records) from MySQL into an HBase table
>using multi-threading (25 threads dumping 10 puts/tuples at a time).
>
>Config:
>hbase v 0.92.0
>hadoop v 1.0
>1 master + 4 slaves
>table is pre-split
>
>Issue:
>Getting an NPE because the RPC call takes longer than the timeout
>(default 60 sec).
>I'm not worried about the NPE (it's been fixed in the 0.92.1+ releases) but
>about what could be causing the RPC call to time out at arbitrary intervals.
>
>Custom printed log: pastebin.com/r85wv8Yt
>
>WARN [Thread-99255] (HConnectionManager.java:1587) - Failed all from
>region=dump,a405cdd9-b5b7-4ec2-9f91-fea98d5cb656,1348331511473.77f13d455fd63c601816759b6ed575e8.,
>hostname=hdslave1.company.com, port=60020
>java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
>        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
>        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
>        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
>        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
>        at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:402)
>        at coprocessor.dump.Dumper.run(Dumper.java:41)
>        at java.lang.Thread.run(Thread.java:662)
>
>Any help or insights are welcome.
>
>Warm Regards,
>Naveen
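For anyone tuning the same knobs: the handler count is a server-side setting in hbase-site.xml on each regionserver (a restart is needed for it to take effect), and the RPC timeout the NPE traces back to is also configurable. The values below are only a sketch matching the numbers tried in this thread, not recommendations:

```xml
<!-- hbase-site.xml on each regionserver -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>25</value> <!-- number of RPC handler threads; the 0.92 default is 10 -->
</property>

<!-- client-side: how long an RPC may run before timing out -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value> <!-- milliseconds; the default is 60000 (60 sec) -->
</property>
```

Raising the timeout only masks slow flushes, of course; it doesn't explain the data loss being asked about above.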
