Thank you for the pointer. I'm not sure this is the bug I was encountering. That bug report describes a problem with how load is calculated, whereas the problem I was experiencing seemed to be a real issue that affected performance, not just reporting.
They published a fix on 20100827, but it seems to address only load reporting, not the underlying performance problem. In any case, I've downgraded from Ubuntu Lucid (10.04) to Karmic (9.10) and am seeing load reporting and responsiveness that are far more intuitive. I recommend avoiding Lucid (at least in EC2).

I've also upgraded to the latest release candidate that J-D posted (http://people.apache.org/~jdcryans/hbase-0.89.20100830-candidate-1/); previously I was using CDH3. I'm very happy with the results. Stability is much better. It will take more than a light breeze to knock the cluster over now!

Thank you for your help,

Matthew

On Sep 1, 2010, at 10:17 AM, Gary Helmling wrote:

> On Wed, Sep 1, 2010 at 7:24 AM, Matthew LeMieux <[email protected]> wrote:
>
>> I'm starting to find that EC2 is not reliable enough to support HBase. I'm
>> running into 2 things that might be related:
>>
>> 1) On idle machines that are apparently doing nothing (reports of <3% CPU
>> utilization, no I/O wait) the load is reported as being higher than the
>> number of cores. I don't know if attachments work on the mailing list, but
>> I attached a small image anyway to illustrate this confusing thing. (I've
>> been using m1.large and m2.xlarge running CDH3)
>>
>>
> If you're using AMIs based on the latest Ubuntu (10.04), there's a known
> kernel issue that seems to be causing high loads while idle. More info
> here:
>
> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>
>
> It's possible other distros running 2.6.32 may be showing the same problem
> as well.
