I think it's mostly a matter of cost-efficiency -- HBase *runs* just fine on EC2, and is built to run in a transient environment. It's just not always cost-effective because you have to use pricey instances.

As far as my issue -- it didn't seem to be ZK. I like Andrew's point; I'll bump it up to bigger instances and see what's up.

-B

On Wed, Sep 1, 2010 at 10:04 AM, Jonathan Gray <[email protected]> wrote:
> While I completely agree with much of what you're saying, and am usually one
> of the first to encourage people to not use virtual machines w/ HBase, I know
> of several successful deployments of HBase on EC2. In most instances there
> was some pain encountered, but it does work for some.
>
> I've not seen these specific issues you seem to be running into
> (periodically spiking load but no cpu or iowait).
>
> I'm not sure I know what HBase could do to operate better in these
> environments. I'm not sure I understand exactly what is happening to RS and
> ZooKeeper when EC2 is being weird. You can't talk to ZK because of a
> networking issue? Have you dug into the ZK server logs to see what's up?
>
> HBase is a highly available service, we need to do heartbeating of some kind,
> so loss of network connectivity is a killer.
>
> It could also be that ZK is being starved of IO so that it cannot write to
> its transaction log and that is what is slowing it down.
>
> JG
>
>> -----Original Message-----
>> From: Matthew LeMieux [mailto:[email protected]]
>> Sent: Wednesday, September 01, 2010 7:25 AM
>> To: [email protected]
>> Subject: Re: Slow Inserts on EC2 Cluster
>>
>> I'm starting to find that EC2 is not reliable enough to support HBase.
>> I'm running into 2 things that might be related:
>>
>> 1) On idle machines that are apparently doing nothing (reports of <3%
>> CPU utilization, no I/O wait) the load is reported as being higher
>> than the number of cores. I don't know if attachments work on the
>> mailing list, but I attached a small image anyway to illustrate this
>> confusing thing. (I've been using m1.large and m2.xlarge running CDH3)
>>
>> 2) Every once in a while it seems that somebody hits the pause button
>> on one of my instances, and while the CPU utilization stays low, the
>> load value spikes to a high value. When this happens the region
>> servers decide to close up shop. It appears to be a problem with
>> contacting zookeeper servers (which happen to stay up and running, but
>> perhaps somewhat unresponsive when Amazon decides to hit the pause
>> button). I have extended the timeout for contacting zookeeper servers,
>> but these events continue to occur. One such event happened 8 hours
>> ago, and I still can't get HBase back up and running.
>>
>> I've seen many comments on this list informing users that they are
>> using hardware (or virtual machines) that are simply not big enough,
>> not fast enough, or don't have enough memory. I'd like to offer an
>> alternative point of view. Whether or not EC2 will last is uncertain,
>> but cloud computing environments will definitely be around for a long
>> time. What would it take to make HBase resilient enough to take
>> advantage of those environments? Based on my experience and comments
>> on this list, it seems "HBase in the cloud" is still a rather painful
>> proposition.
>>
>> -Matthew
>
>

--
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
