On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma <[email protected]> wrote:

>
> Since we are using EC2 Large instances, it seems unlikely that network or
> some other virtualization related resources crunch are affecting our
> performance measurement.
>
>
You are guaranteed to see large variance in results when benchmarking on
EC2.  Welcome to the oversubscribed public cloud!  You can run the same test
twice with the same instances and still see massive differences.  You should
expect at least 25% variance between test runs (in practice I've seen as
much as 100% variance myself).
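To make that concrete, here is a quick sketch (mine, not anything from the HBase tooling) of how you might quantify the run-to-run spread from repeated benchmark timings; the numbers below are made up for illustration:

```python
# Rough sketch: measure the spread between the best and worst of several
# identical benchmark runs, as a fraction of the best run.
def run_to_run_spread(timings):
    """timings: list of wall-clock seconds for repeated identical runs."""
    best, worst = min(timings), max(timings)
    return (worst - best) / best

# Hypothetical timings from five identical runs on the same EC2 instances:
runs = [118.0, 132.5, 151.0, 124.0, 160.2]
print(f"spread: {run_to_run_spread(runs):.0%}")  # well above the 25% floor
```

The point being: unless your spread across repeated runs is small relative to the effect you're trying to measure, the comparison is meaningless on EC2.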

Two nodes is a very small cluster to be benchmarking on.  The typical
recommended minimum is 1 master node (NN, JT and HBase Master) + 3 slaves
(DN, TT and Region Server), but HBase really works best once you approach
10 slaves or more.

You'll need to provide a lot more detail in order to get any meaningful
recommendations on configuration, though.

Are you using instance storage on EC2 or EBS mounts?  If EBS, how many
volumes per instance?

What HBase and Hadoop versions are you running?

What replication factor are you using for HDFS?
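(For reference, this is controlled by `dfs.replication` in hdfs-site.xml, and the default is 3.  An illustrative fragment, not your actual config:

```xml
<!-- hdfs-site.xml: HDFS block replication factor (default 3).
     On a 2-node cluster a factor of 3 can never be satisfied, which
     leaves every block permanently under-replicated. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

If you left it at 3 on a 2-node cluster, that alone will skew your results.)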

What is your hbase-site.xml configuration?

What JVM heap size are you using for the region servers?
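(That's set in hbase-env.sh.  A hedged example, with values that are only a starting point for an m1.large with 7.5 GB of RAM, not a recommendation for your workload:

```shell
# hbase-env.sh: region server heap in MB (illustrative, tune to your instance).
export HBASE_HEAPSIZE=4000
# CMS keeps GC pauses shorter on large heaps than the default collector.
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
```

If you never touched this, you're likely running with the 1000 MB default, which is far too small for any serious benchmark.)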

What are your read patterns?  Are they random reads?  Or is it typically a
smaller active data set that would benefit from the HBase block cache?
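(The block cache's share of the region server heap is set by `hfile.block.cache.size` in hbase-site.xml; an illustrative fragment, showing the default:

```xml
<!-- hbase-site.xml: fraction of region server heap used for the block
     cache (0.2 is the default).  Raising it only helps if the active
     read set actually fits; purely random reads over a large data set
     won't benefit much. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
</property>
```

)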

In general, EC2 instances provide very poor IOPS.  If you're using
m1.large instances, you get 2 instance volumes (I believe); if you bump up
to m1.xlarge or c1.xlarge, you get 4 instance volumes, which should provide
some improvement in throughput.  But again, these are fully virtualized
instances, which comes at a cost in IO throughput (sometimes a high cost if
you get stuck with noisy neighbors).

A bit of googling turned up this post with some interesting results on cloud
IO performance:
http://blog.cloudharmony.com/2010/06/disk-io-benchmarking-in-cloud.html

Ideally, if you want to benchmark, you are best off getting your hands on
some physical hardware, or trying one of the hosting providers that get you
as close to bare metal as possible.

--gh
