Hi all, I am running an MR job that loads an HBase table in the reduce phase, and I am seeing hopeless performance: 10 million records of <1Kb each in 2 hours so far.
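
To put that number in perspective, here is a rough back-of-the-envelope sketch using only the figures in this post (10 million records, 2 hours, 40 reducers). It assumes "<1Kb" means at most 1 KiB per record, so the data rate it prints is an upper bound, not a measurement. It comes out at roughly 1,400 records per second overall, or about 35 per second per reducer.

```python
# Figures taken from the post: 10 million records, 2 hours, 40 reducers.
RECORDS = 10_000_000
ELAPSED_SECONDS = 2 * 60 * 60
REDUCERS = 40
MAX_RECORD_BYTES = 1024  # assumption: "<1Kb" read as at most 1 KiB per record


def throughput(records, seconds, reducers, max_record_bytes):
    """Return (records/s overall, records/s per reducer, upper-bound MiB/s)."""
    per_second = records / seconds
    per_reducer = per_second / reducers
    max_mib_per_second = per_second * max_record_bytes / (1024 * 1024)
    return per_second, per_reducer, max_mib_per_second


total, per_reducer, mib = throughput(
    RECORDS, ELAPSED_SECONDS, REDUCERS, MAX_RECORD_BYTES
)
print(f"{total:.0f} records/s overall")
print(f"{per_reducer:.1f} records/s per reducer")
print(f"< {mib:.2f} MiB/s written in total")
```
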
Please bear in mind I am a software guy, so go easy ;) but here is what I know so far (http://code.google.com/p/gbif-occurrencestore/wiki/ClusterConfig describes the cluster; 40 reducers are currently running, all on CDH3):

- RS (region servers) and TT (task trackers) all have load averages way down at 1-2 max
- RS and TT CPUs are 398% idle on quad cores, 1598% idle on hyper-threading dual quads
- RS heap is 4G
- there seems to be no iowait anywhere
- free -m shows "swap used 0" on all machines, if I am reading it correctly

Can anyone please suggest where I can go digging? Please don't assume I have looked at the basics - I'm learning as much as I can as I go.

Thanks,
Tim
