Wait, you had two nodes and replication turned up to 3? How does that work?

Dave
-----Original Message-----
From: Aditya Sharma [mailto:[email protected]]
Sent: Sunday, March 06, 2011 10:00 AM
To: [email protected]; [email protected]; [email protected]
Cc: [email protected]
Subject: Re: High variance in results for hbase benchmarking

Mark, Gary, Ted,

Thanks for your responses. I will keep the EC2 issues and the other points
in mind when I get a chance to redo the benchmarking. BTW, is there any
recommendation for an on-demand computing provider for benchmarking
purposes?

@Gary, to answer your questions: I am using the default configuration files
(with the hostnames changed, of course) with Hadoop 0.20.2 and HBase 0.90.1,
and the default replication factor of 3 for HDFS. I am not using EBS because
I was concerned that network latency between the EC2 host and EBS would
affect the benchmark.

@Ted, no, I am actually trying to emulate existing application behaviour, so
the insert/upsert code is single-threaded. However, I used the same
configuration for other datastores such as MongoDB and Cassandra and did not
see any marked drop in performance. That said, this could be because of EC2
hardware variation.

Aditya

On Fri, Mar 4, 2011 at 1:58 PM, Andrew Purtell <[email protected]> wrote:

> > > Since we are using EC2 Large instances, it seems
> > > unlikely that network or some other virtualization
> > > related resource crunch is affecting our
> > > performance measurement.
>
> Your assumptions are wrong. It seems only c1.xlarge and m2.4xlarge may be
> assigned dedicated hardware. Reference:
> http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/
> Their shared disk storage (instance-store) would still be impacted by
> neighbors.
>
> I think the only way you will approach consistent results is if you use
> the cluster compute instances (cc1.4xlarge). These are a completely
> different architecture: HVM instead of PVM, dedicated 10GigE network,
> dedicated physical hosts, etc.
>
> With other instance types I see large variance from day to day, even hour
> to hour. In short, EC2 is useless for performance benchmarking. It's very
> handy for a lot of other things, though, like functional or smoke testing.
>
> For additional information see:
> http://www.comp.nus.edu.sg/~vldb2010/proceedings/files/papers/E02.pdf
>
> - Andy
>
>
> --- On Thu, 3/3/11, Gary Helmling <[email protected]> wrote:
>
> > From: Gary Helmling <[email protected]>
> > Subject: Re: High variance in results for hbase benchmarking
> > To: [email protected]
> > Cc: "Aditya Sharma" <[email protected]>
> > Date: Thursday, March 3, 2011, 11:37 PM
> >
> > On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma
> > <[email protected]> wrote:
> >
> > > Since we are using EC2 Large instances, it seems
> > > unlikely that network or some other virtualization
> > > related resource crunch is affecting our
> > > performance measurement.
> >
> > You are guaranteed to see large variance in results when benchmarking
> > on EC2. Welcome to the oversubscribed public cloud! You can run the
> > same test twice with the same instances and still see massive
> > differences. You should expect at least 25% variance between test runs
> > (in practice I've seen as much as 100% variance myself).
> >
> > Two nodes is a very small cluster to be benchmarking on. The minimum
> > cluster size is typically recommended as something like 1 master node
> > (NN, JT, and HBase Master) + 3 slaves (DN, TT, and RegionServer). But
> > HBase really works best when you start to approach 10 slaves or more.
> [...]
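[For context on Dave's question: HDFS cannot place more replicas of a block
than there are live datanodes, so with two datanodes and a replication
factor of 3, every block is simply held at 2 replicas and flagged as
under-replicated (visible via `hadoop fsck /`). Writes still succeed, which
is why the setup "works". A minimal sketch of matching the factor to the
cluster size in hdfs-site.xml, assuming a two-datanode cluster like the one
described:]

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: set the default block replication factor.
     The default is 3; on a two-datanode cluster a value of 2 avoids
     permanently under-replicated blocks. Applies to files created
     after the change, not to existing blocks. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

[Per-file replication can also be changed after the fact with
`hadoop fs -setrep`, but for benchmarking the cluster-wide setting above is
the relevant knob, since it affects every write the test performs.]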
