Thanks J-D

> Let's start with...
>
> Are you using HTable directly or are you going through TableOutputFormat?

I'm using
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/mapreduce/IdentityTableReducer.html

> If former, do you use the write buffer?

Not explicitly set by me

> Are you inserting into multiple families?

1 family

> Are you using compression?

LZO

> Did you take a look at the region server logs?

I am now ;)

> If so, do you see a lot of messages in the likes of "Blocking ..."?

Indeed:

memstore size 138.7m is >= than blocking 128.0m size
2010-11-24 17:12:49,136 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region raw_occurrence_record,,1290613896288.841ac149ecacf4b721ac232960e98761.: memstore size 138.7m is >= than blocking 128.0m size
2010-11-24 17:12:49,155 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 10 on 60020' on region raw_occurrence_record,,1290613896288.841ac149ecacf4b721ac232960e98761.: memstore size 146.3m is >= than blocking 128.0m size
2010-11-24 17:12:49,169 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 5 on 60020' on region raw_occurrence_record,,1290613896288.841ac149ecacf4b721ac232960e98761.: memstore size 148.8m is >= than blocking 128.0m size
2010-11-24 17:12:49,193 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 8 on 60020' on region

I guess this is bad, but I could benefit from some guidance...

> Are you monitoring the GCs?
> If so, do you see some pauses longer than a second?

What's the best way to do this, please? I will do so.

> Thx!

Thank you J-D

Tim

>
> J-D
>
> On Wed, Nov 24, 2010 at 11:00 AM, Tim Robertson
> <[email protected]> wrote:
>> Hi all,
>>
>> I am running an MR job that is loading an HBase table in the reduce,
>> and I am seeing hopeless performance - 10 million records of <1Kb in 2
>> hours so far.
>>
>> Please bear in mind I am a software guy, so go easy ;) but here is what
>> I know so far:
>>
>> (http://code.google.com/p/gbif-occurrencestore/wiki/ClusterConfig
>> describes the cluster, and currently 40 reducers are running, all on
>> CDH3)
>>
>> - RS and TT all have load averages way down at 1-2 max
>> - RS and TT CPUs are 398% idle on quad cores, 1598% idle on hyper
>> threading dual quads
>> - RS heap is 4G
>> - there seems no iowait anywhere
>> - free -m shows "swap used 0" on all machines if I am reading it correctly
>>
>> Can anyone please suggest where I can go digging? Please don't assume
>> I have looked at the basics - I'm learning as much as I can as I go.
>>
>> Thanks,
>> Tim
>>
>
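
To get a feel for how often the region server is blocking, the "Blocking updates" lines pasted in the reply above can be tallied per region. The following is only a rough sketch: the function name `summarize_blocking`, the `SAMPLE` constant, and the regular expression are my own assumptions, modelled on the three complete log lines quoted in this message, and other HBase versions may word the message differently.

```python
import re
from collections import defaultdict

# Pattern modelled on the log lines quoted in this thread (an assumption,
# not an official HBase log format definition).
BLOCKING = re.compile(
    r"Blocking updates for '(?P<handler>[^']+)' on region (?P<region>\S+): "
    r"memstore size (?P<size>[\d.]+)m is >= than blocking (?P<limit>[\d.]+)m size"
)

# The three complete lines from the region server log pasted above.
SAMPLE = """\
2010-11-24 17:12:49,136 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region raw_occurrence_record,,1290613896288.841ac149ecacf4b721ac232960e98761.: memstore size 138.7m is >= than blocking 128.0m size
2010-11-24 17:12:49,155 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 10 on 60020' on region raw_occurrence_record,,1290613896288.841ac149ecacf4b721ac232960e98761.: memstore size 146.3m is >= than blocking 128.0m size
2010-11-24 17:12:49,169 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 5 on 60020' on region raw_occurrence_record,,1290613896288.841ac149ecacf4b721ac232960e98761.: memstore size 148.8m is >= than blocking 128.0m size
"""


def summarize_blocking(log_text):
    """Return {region: (event_count, max_memstore_mb, blocking_limit_mb)}."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for match in BLOCKING.finditer(log_text):
        entry = stats[match.group("region")]
        entry[0] += 1                                        # blocking events seen
        entry[1] = max(entry[1], float(match.group("size")))  # largest memstore size
        entry[2] = float(match.group("limit"))                # blocking threshold
    return {region: tuple(values) for region, values in stats.items()}


if __name__ == "__main__":
    for region, (count, biggest, limit) in summarize_blocking(SAMPLE).items():
        print(f"{region}: {count} blocks, max memstore {biggest}m, limit {limit}m")
```

Run against a full region server log (read the file and pass its text in), this would show whether the blocking is confined to a single region, as the excerpt suggests, or spread across many.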
