> > Since we are using EC2 Large instances, it seems
> > unlikely that network contention or some other
> > virtualization-related resource crunch is affecting our
> > performance measurements.

Your assumptions are wrong. It seems only c1.xlarge and m2.4xlarge may be 
assigned dedicated hardware. Reference: 
http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/
Even then, their shared disk storage (instance-store) would still be impacted 
by neighbors.

I think the only way you will approach consistent results is to use the 
cluster compute instances (cc1.4xlarge). These are a completely different 
architecture: HVM virtualization instead of PV, a dedicated 10GigE network, 
dedicated physical hosts, etc.

With other instance types I see large variance from day to day, even hour to 
hour. In short, EC2 is useless for performance benchmarking. It's very handy 
for a lot of other things, though, like functional or smoke testing.
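If you do benchmark on EC2 anyway, at least make the run-to-run variance explicit instead of trusting a single number. A minimal sketch of what I mean (the `variance_pct` helper and the throughput figures are made up for illustration, not real measurements):

```python
# Repeat the same benchmark several times on the same instances and
# report the spread relative to the mean, rather than a single result.
import statistics

def variance_pct(samples):
    """Spread of samples as a percentage of their mean: (max - min) / mean * 100."""
    mean = statistics.mean(samples)
    return (max(samples) - min(samples)) / mean * 100

# Hypothetical ops/sec from five identical runs:
runs = [41000, 52000, 38000, 61000, 45000]
print(f"{variance_pct(runs):.0f}% variance across runs")  # prints "49% variance across runs"
```

If that number is anywhere near the effect size you are trying to measure, the comparison is meaningless on that hardware.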

For additional information see: 
http://www.comp.nus.edu.sg/~vldb2010/proceedings/files/papers/E02.pdf

   - Andy


--- On Thu, 3/3/11, Gary Helmling <[email protected]> wrote:

> From: Gary Helmling <[email protected]>
> Subject: Re: High variance in results for hbase benchmarking
> To: [email protected]
> Cc: "Aditya Sharma" <[email protected]>
> Date: Thursday, March 3, 2011, 11:37 PM
> On Thu, Mar 3, 2011 at 10:19 PM,
> Aditya Sharma <[email protected]>wrote:
> 
> >
> > Since we are using EC2 Large instances, it seems
> > unlikely that network contention or some other
> > virtualization-related resource crunch is affecting our
> > performance measurements.
> >
> >
> You are guaranteed to see large variance in results when
> benchmarking on EC2.  Welcome to the oversubscribed public
> cloud!  You can run the same test twice with the same
> instances and still see massive differences.  You should
> expect at least 25% variance between test runs (in practice
> I've seen as much as 100% variance myself).
> 
> Two nodes is a very small cluster to be benchmarking on. 
> The minimum cluster size is typically recommended as something
> like 1 master node (NN, JT and HBase Master) + 3 slaves (DN,
> TT and Region Server).  But HBase really works best when you
> start to approach 10 slaves or
> more.
[...]



