> > Since we are using EC2 Large instances, it seems
> > unlikely that network or some other virtualization
> > related resources crunch are affecting our
> > performance measurement.
Your assumptions are wrong. It seems only c1.xlarge and m2.4xlarge may be assigned dedicated hardware. Reference: http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/

Their shared disk storage (instance-store) would still be impacted by neighbors. I think the only way you will approach consistent results is if you use the cluster compute instances (cc1.4xlarge). These are a completely different architecture: HVM instead of PV, a dedicated 10GigE network, dedicated physical hosts, etc. With other instance types I see large variance from day to day, even hour to hour.

In short, EC2 is useless for performance benchmarking. It's very handy for a lot of other things, though, like functional or smoke testing.

For additional information see: http://www.comp.nus.edu.sg/~vldb2010/proceedings/files/papers/E02.pdf

- Andy

--- On Thu, 3/3/11, Gary Helmling <[email protected]> wrote:

> From: Gary Helmling <[email protected]>
> Subject: Re: High variance in results for hbase benchmarking
> To: [email protected]
> Cc: "Aditya Sharma" <[email protected]>
> Date: Thursday, March 3, 2011, 11:37 PM
>
> On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma <[email protected]> wrote:
>
> > Since we are using EC2 Large instances, it seems
> > unlikely that network or some other virtualization
> > related resources crunch are affecting our
> > performance measurement.
>
> You are guaranteed to see large variance in results when
> benchmarking on EC2. Welcome to the oversubscribed public
> cloud! You can run the same test twice with the same
> instances and still see massive differences. You should
> expect at least 25% variance between test runs (in practice
> I've seen as much as 100% variance myself).
>
> Two nodes is a very small cluster to be benchmarking on.
> The minimum cluster size typically recommended is something
> like 1 master node (NN, JT, and HBase Master) + 3 slaves (DN,
> TT, and region server).
> But HBase really works best when you start to approach
> 10 slaves or more. [...]
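As a rough way to quantify the run-to-run variance described above, you can record throughput from several identical benchmark runs and compute the coefficient of variation and the best-vs-worst spread. A minimal sketch; the throughput numbers below are made up purely for illustration, not measurements from any real cluster:

```python
import statistics

# Hypothetical per-run throughput (ops/sec) from repeating the
# same benchmark on the same EC2 instances -- illustrative only.
runs = [4120.0, 5310.0, 3980.0, 5650.0, 4470.0]

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)  # sample standard deviation
cv = stdev / mean               # coefficient of variation

# Spread between the best and worst run relative to the worst --
# the kind of "25%-100% variance" figure quoted in the thread.
spread = (max(runs) - min(runs)) / min(runs)

print(f"mean={mean:.0f} ops/s  cv={cv:.1%}  best-vs-worst spread={spread:.1%}")
```

If the spread between runs is a large fraction of the effect you are trying to measure, the benchmark cannot distinguish your change from EC2 noise.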
