Thank you for the quick response Paul. I've switched off autoFlush (haven't increased the write buffer though). And the splits are pretty effective, I think, because each region gets a similar number of requests before it splits further.
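For reference, the client-side settings being discussed look roughly like this with the 0.92-era HTable API (a sketch only: the table name "dump" is taken from the region name in the log below, and the 8 MB buffer size is just an example, not a recommendation from this thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Assumes an HBase 0.92.x client on the classpath and a running cluster.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "dump");

table.setAutoFlush(false);                  // buffer puts client-side instead of one RPC per put
table.setWriteBufferSize(8 * 1024 * 1024);  // example value; the default is 2 MB

// ... worker threads issue their puts here ...

table.flushCommits();                       // push any puts still sitting in the write buffer
table.close();
```

With autoFlush off, puts accumulate until the write buffer fills (or flushCommits() is called), so each RPC carries a batch of mutations rather than a single put.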
As per your suggestion, I tried the same task two more times with an increased handler count (15 and 20) for the regionservers, and the results, though not ideal, are better than they were with the default number of handlers. With 15 handlers per regionserver, the results were actually as bad (multiple NPEs because of RPC call timeouts) as with the default value, but with 20 the NPEs dropped dramatically (still not gone, though). I think I'll try again with 25 and report back with the results.

I've also seen some data loss in the process. I'm not sure it's because of the RPC timeouts, since I had exactly the same number of rows missing in both attempts (once with 10 handlers and once with 20) even though the number of NPEs varied drastically. Any tips on where/what I should be focusing on to uncover the cause of the data loss? (PS: I'm using the primary key of the MySQL table as my rowkey.)

Warm Regards,
Naveen

-----Original Message-----
From: Paul Mackles [mailto:[email protected]]
Sent: Monday, September 24, 2012 8:51 PM
To: [email protected]
Subject: Re: Mass dumping of data has issues

Did you adjust the write buffer to a larger size and/or turn off autoFlush for the HTable? I've found that both of those settings can have a profound impact on write performance. You might also look at adjusting the handler count for the regionservers, which by default is pretty low. You should also confirm that your splits are effective in distributing the writes.

On 9/24/12 11:01 AM, "Naveen" <[email protected]> wrote:

>Hi,
>
>I've come across the following issue for which I'm unable to deduce what
>the root cause could be.
>
>Scenario:
>I'm trying to dump data (8.3M+ records) from MySQL into an HBase table
>using multi-threading (25 threads dumping 10 puts/tuples at a time).
>
>Config:
>hbase v 0.92.0
>hadoop v 1.0
>1 master + 4 slaves
>table is pre-split
>
>Issue:
>Getting an NPE because the RPC call takes longer than the timeout
>(default 60 sec).
>I'm not worried about the NPE (it's been fixed in the 0.92.1+ releases) but
>about what could be causing the RPC call to time out at arbitrary intervals.
>
>Custom printed log: pastebin.com/r85wv8Yt
>
>WARN [Thread-99255] (HConnectionManager.java:1587) - Failed all from
>region=dump,a405cdd9-b5b7-4ec2-9f91-fea98d5cb656,1348331511473.77f13d455fd63c601816759b6ed575e8.,
>hostname=hdslave1.company.com, port=60020
>java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
>        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
>        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
>        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
>        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
>        at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:402)
>        at coprocessor.dump.Dumper.run(Dumper.java:41)
>        at java.lang.Thread.run(Thread.java:662)
>
>Any help or insights are welcome.
>
>Warm Regards,
>Naveen
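For anyone tuning the same knobs: the handler count is a server-side setting in hbase-site.xml on each regionserver (a restart is needed for it to take effect), and the RPC timeout the NPE traces back to is also configurable. The values below are only a sketch matching the numbers tried in this thread, not recommendations:

```xml
<!-- hbase-site.xml on each regionserver -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>25</value> <!-- number of RPC handler threads; the 0.92 default is 10 -->
</property>

<!-- client-side: how long an RPC may run before timing out -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value> <!-- milliseconds; the default is 60000 (60 sec) -->
</property>
```

Raising the timeout only masks slow flushes, of course; it doesn't explain the data loss being asked about above.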
