We're seeing problems loading data into HBase using MR

h Tue, 22 Mar 2011 10:34:30 -0700

Hey everyone,

I've got a situation where my data loads to HBase are failing.


The data is sent to an isolated HBase cluster from a different Hadoop cluster.
What I see is that the performance is pretty bad (around 40k inserts/sec in 
bursts, 1k on average, with payloads of about 200 bytes).  If I write a 
standalone Java client to hit the cluster, I can get a sustained 40k ops/sec 
of inserts, and 80k ops/sec if I run a second one in a different window.

The network is all gigE, with a 4GB heap on the region server. Nothing outside 
of the HBase system is running on the cluster.
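
For a rough sense of scale, here is a back-of-the-envelope sketch of what those 
insert rates mean in raw payload bandwidth. It assumes only the ~200 byte 
payload counts (row keys, RPC framing, and WAL traffic are ignored) and uses 
the theoretical gigE line rate of 125 MB/s. The class and method names are 
made up for illustration. Under those assumptions, 40k ops/sec is about 8 MB/s 
and 80k ops/sec about 16 MB/s, so payload volume alone should be well below 
what the network can carry.

```java
// Rough payload-only throughput estimate. This is an illustration, not a
// measurement: it ignores row keys, RPC framing, and WAL/replication traffic.
final class ThroughputEstimate {
    // Approximate payload size quoted above (bytes per insert).
    static final long PAYLOAD_BYTES = 200;

    // Theoretical gigabit Ethernet line rate: 1e9 bits/s = 125e6 bytes/s.
    static final double GIGE_BYTES_PER_SEC = 1_000_000_000.0 / 8.0;

    // Payload bandwidth in MB/s (1 MB = 1e6 bytes) for a given insert rate.
    static double megabytesPerSecond(long opsPerSecond, long payloadBytes) {
        return opsPerSecond * payloadBytes / 1_000_000.0;
    }

    // Fraction of the theoretical gigE line rate used by the payload alone.
    static double fractionOfGigE(long opsPerSecond, long payloadBytes) {
        return opsPerSecond * payloadBytes / GIGE_BYTES_PER_SEC;
    }

    public static void main(String[] args) {
        // Insert rates mentioned above: MR average, one client, two clients.
        long[] rates = {1_000, 40_000, 80_000};
        for (long rate : rates) {
            System.out.printf("%d ops/sec -> %.1f MB/s (%.2f%% of gigE)%n",
                    rate,
                    megabytesPerSecond(rate, PAYLOAD_BYTES),
                    100.0 * fractionOfGigE(rate, PAYLOAD_BYTES));
        }
    }
}
```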

From the MR side we see that the job eventually gets to 50% and then fails 
with no status updates in 600 seconds.  If we write a simple Java MR job 
that shoves about 10 GB of data through 20 reducers, it also chokes and dies.

Is there anything that we should be looking at?  As a point of reference, at 
0.26 we could push 250k ops/sec, with the same jobs averaging in the 150's.  
We also applied the META MEMSTORE_FLUSHSIZE fix 
(http://hbase.apache.org/book/upgrading.html).


Any help is greatly appreciated!

Thanks,
Dirk
