On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma <[email protected]> wrote:
>
> Since we are using EC2 Large instances, it seems unlikely that network or
> some other virtualization related resources crunch are affecting our
> performance measurement.
>

You are guaranteed to see large variance in results when benchmarking on EC2. Welcome to the oversubscribed public cloud! You can run the same test twice with the same instances and still see massive differences. You should expect at least 25% variance between test runs (in practice I've seen as much as 100% variance myself).

Two nodes is a very small cluster to be benchmarking on. The minimum cluster size is typically recommended as something like 1 master node (NN, JT and HBase Master) + 3 slaves (DN, TT and Region Server). But HBase really works best when you start to approach 10 slaves or more.

You'll need to provide a lot more detail in order to get any meaningful recommendations on configuration, though.

Are you using instance storage on EC2 or EBS mounts? If EBS, how many volumes per instance?
What HBase and Hadoop versions are you running?
What replication factor are you using for HDFS?
What is your hbase-site.xml configuration?
What JVM heap size are you using for the region servers?

What are your read patterns? Are they random reads? Or is it typically a smaller active data set that would benefit from the HBase block cache?

In general, EC2 instances provide very poor performance in IOPS. If you're using m1.large instances, you get 2 instance volumes (I believe), if you bump up to m1.xlarge or c1.xlarge, then you get 4 instance volumes, which should provide some improvement in throughput. But again these are fully virtualized instances, which comes at a cost in terms of IO throughput (sometimes a high cost if you get stuck with noisy neighbors).
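
One way to put a number on the run-to-run variance discussed above is to repeat each test several times and compare the spread within a configuration against the difference between configurations. The following is only a minimal sketch under assumptions: the function names `summarize_runs` and `difference_exceeds_noise` are hypothetical, the input is assumed to be one throughput figure (for example ops/sec) per run, and the figures in the usage note are made up for illustration, not measurements.

```python
import statistics


def summarize_runs(throughputs):
    """Summarize repeated benchmark runs (one throughput value per run)."""
    if len(throughputs) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.mean(throughputs)
    stdev = statistics.stdev(throughputs)  # sample standard deviation
    return {
        "runs": len(throughputs),
        "mean": mean,
        "stdev": stdev,
        # Coefficient of variation: stdev relative to the mean.
        "cv": stdev / mean,
        # Spread: how much faster the best run was than the worst run.
        "spread": (max(throughputs) - min(throughputs)) / min(throughputs),
    }


def difference_exceeds_noise(baseline, candidate):
    """Return True if the change in mean throughput between two
    configurations is larger than the worst spread seen within either one.

    This is a crude rule of thumb (an assumption of this sketch),
    not a formal significance test.
    """
    b = summarize_runs(baseline)
    c = summarize_runs(candidate)
    relative_change = abs(c["mean"] - b["mean"]) / b["mean"]
    noise = max(b["spread"], c["spread"])
    return relative_change > noise
```

For example, with hypothetical runs of 100 and 125 ops/sec, `summarize_runs([100, 125])["spread"]` is 0.25, i.e. a 25% spread, so a configuration change that moves the mean by only a few percent could not be told apart from noise on that basis.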

A bit of googling turned up this post with some interesting results on cloud IO performance:
http://blog.cloudharmony.com/2010/06/disk-io-benchmarking-in-cloud.html

Ideally if you want to benchmark, you are best off getting your hands on some physical hardware or trying one of the hosting providers that gets you as close to bare metal as possible.

--gh
