Wait, you had two nodes and replication turned up to 3? How does that work?

Dave
-----Original Message-----
From: Aditya Sharma [mailto:[email protected]]
Sent: Sunday, March 06, 2011 10:00 AM
To: [email protected]; [email protected]; [email protected]
Cc: [email protected]
Subject: Re: High variance in results for hbase benchmarking

Mark, Gary, Ted,

Thanks for your responses. I will keep the EC2 issues and the other points
in mind when I get a chance to redo the benchmarking. BTW, is there any
recommendation for an on-demand computing provider for benchmarking
purposes?

@Gary, to answer your questions: I am using the default configuration files
(with the hostnames changed, of course) with Hadoop 0.20.2 and HBase 0.90.1,
and the default replication factor of 3 for HDFS. I am not using EBS because
I was concerned that network latency between the EC2 host and EBS would
affect the benchmark.

@Ted, no, I am actually trying to emulate existing application behaviour, so
the insert/upsert code is single-threaded. However, I used the same
configuration for other datastores such as MongoDB and Cassandra and did not
see any marked drop in performance. That said, this could be because of EC2
hardware variation.

Aditya

On Fri, Mar 4, 2011 at 1:58 PM, Andrew Purtell <[email protected]> wrote:

> > > Since we are using EC2 Large instances, it seems
> > > unlikely that network or some other virtualization
> > > related resource crunch is affecting our
> > > performance measurement.
>
> Your assumptions are wrong. It seems only c1.xlarge and m2.4xlarge may be
> assigned dedicated hardware. Reference:
> http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/
> Their shared disk storage (instance-store) would still be impacted by
> neighbors.
>
> I think the only way you will approach consistent results is if you use
> the cluster compute instances (cc1.4xlarge). These are a completely
> different architecture: HVM instead of PVM, dedicated 10GigE network,
> dedicated physical hosts, etc.
>
> With other instance types I see large variance from day to day, even hour
> to hour. In short, EC2 is useless for performance benchmarking. It's very
> handy for a lot of other things, though, like functional or smoke testing.
>
> For additional information see:
> http://www.comp.nus.edu.sg/~vldb2010/proceedings/files/papers/E02.pdf
>
> - Andy
>
>
> --- On Thu, 3/3/11, Gary Helmling <[email protected]> wrote:
>
> > From: Gary Helmling <[email protected]>
> > Subject: Re: High variance in results for hbase benchmarking
> > To: [email protected]
> > Cc: "Aditya Sharma" <[email protected]>
> > Date: Thursday, March 3, 2011, 11:37 PM
> >
> > On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma
> > <[email protected]> wrote:
> >
> > > Since we are using EC2 Large instances, it seems
> > > unlikely that network or some other virtualization
> > > related resource crunch is affecting our
> > > performance measurement.
> >
> > You are guaranteed to see large variance in results when benchmarking
> > on EC2. Welcome to the oversubscribed public cloud! You can run the
> > same test twice with the same instances and still see massive
> > differences. You should expect at least 25% variance between test runs
> > (in practice I've seen as much as 100% variance myself).
> >
> > Two nodes is a very small cluster to be benchmarking on. The minimum
> > cluster size is typically recommended as something like 1 master node
> > (NN, JT, and HBase Master) + 3 slaves (DN, TT, and RegionServer). But
> > HBase really works best when you start to approach 10 slaves or more.
> [...]
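[For context on Dave's question: HDFS cannot place more replicas of a block
than there are live datanodes, so with two datanodes and a replication
factor of 3, every block is simply held at 2 replicas and flagged as
under-replicated (visible via `hadoop fsck /`). Writes still succeed, which
is why the setup "works". A minimal sketch of matching the factor to the
cluster size in hdfs-site.xml, assuming a two-datanode cluster like the one
described:]

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: set the default block replication factor.
     The default is 3; on a two-datanode cluster a value of 2 avoids
     permanently under-replicated blocks. Applies to files created
     after the change, not to existing blocks. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

[Per-file replication can also be changed after the fact with
`hadoop fs -setrep`, but for benchmarking the cluster-wide setting above is
the relevant knob, since it affects every write the test performs.]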
