Hi,

I think this bit from Matt, and the last bit from Phil about a drive per CPU core, seem like strong arguments in favour of EBS. I don't have a good feel for (or experience with) speed when the storage medium is on the other side of a *fibre* link vs. a completely local disk. The fact that everything is shared, and that the intensity of use by others sharing the resources varies, makes EBS vs. local super hard to compare properly.
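To put rough numbers on the "many small EBS volumes, one per core" idea, here is a minimal sizing sketch. It assumes Matt's figure (quoted further down) of ~100 IOPS per EBS volume. It also assumes aggregate random-read IOPS scales linearly with volume count, which is a best case: the instance's network link to EBS will cap it at some point. The second helper builds the comma-separated directory list that HDFS's `dfs.data.dir` property accepts, one directory per volume. The mount points `/mnt/ebs0`, `/mnt/ebs1`, ... and the `hdfs/data` subdirectory are hypothetical.

```python
# Rough sizing sketch for "one EBS volume per CPU core".
# Assumptions (not taken from any AWS or Hadoop API):
#   - ~100 random-read IOPS per EBS volume (Matt's figure in this thread)
#   - aggregate IOPS scales linearly with volume count (best case only)
#   - volumes are mounted at hypothetical paths /mnt/ebs0, /mnt/ebs1, ...

PER_VOLUME_IOPS = 100  # approximate, per Matt's message


def aggregate_read_iops(num_volumes: int, per_volume_iops: int = PER_VOLUME_IOPS) -> int:
    """Best-case aggregate random-read IOPS across num_volumes EBS volumes."""
    if num_volumes < 1:
        raise ValueError("need at least one volume")
    return num_volumes * per_volume_iops


def data_dir_list(num_volumes: int, mount_prefix: str = "/mnt/ebs",
                  subdir: str = "hdfs/data") -> str:
    """Comma-separated list with one directory per volume, the format
    HDFS's dfs.data.dir property takes (mount paths are hypothetical)."""
    return ",".join(f"{mount_prefix}{i}/{subdir}" for i in range(num_volumes))


if __name__ == "__main__":
    cores = 4  # hypothetical 4-core instance -> 4 volumes
    print(aggregate_read_iops(cores))  # 400
    print(data_dir_list(cores))
```

The same directory list could serve `mapred.local.dir` for MR's intermediate data, if you want map spills spread across the volumes too.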
How about doing this to compare performance and cost:

* create N EC2 instances
* on half of them, configure Hadoop HDFS/MR to use local disk
* on a quarter of them, configure Hadoop HDFS/MR to use 1 EBS volume
* on the remaining quarter, configure Hadoop HDFS/MR to use N EBS volumes
* run your regular MR jobs
* compare performance
* look at the EBS section of the monthly AWS bill

Q1: Does the above sound good, or is there a way to improve it?

Q2: What's the best way to compare the performance of different nodes, other than manually checking various Hadoop UIs to see how long Map and Reduce tasks on different nodes *tend* to take?

The above is really more about HDFS/MR performance on local vs. EBS disks. If each of the above nodes also runs an HBase RegionServer, how would one see which group is the fastest and which the slowest? Is there a "rows per second" sort of metric somewhere that would show how fast different RSs are?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


----- Original Message ----
> From: Matt Corgan <[email protected]>
> To: user <[email protected]>
> Sent: Tue, January 4, 2011 2:36:51 PM
> Subject: Re: HBase / HDFS on EBS?
>
> One nice thing is that you can create many small EBS volumes per instance,
> and since each EBS volume does ~100 IOPS you can get really good aggregate
> random read performance.
>
>
> On Tue, Jan 4, 2011 at 2:05 PM, Phil Whelan <[email protected]> wrote:
>
> > Hi Otis,
> >
> > I have used Hadoop on EBS, but not HBase yet (apologies for not being HBase
> > specific).
> >
> > > * Supposedly ephemeral disks can be faster, but EC2 claims EBS is faster.
> > > People who benchmarked EBS mention its performance varies a lot. Local
> > > disks suffer from the noisy neighbour problem, no?
> >
> > EBS volumes are much faster than the local EC2 image's local disk, in my
> > experience.
> >
> > > * EBS disks are not local. They are far from the CPU.
> > > What happens with data locality if you have data on EBS?
> >
> > Amazon uses a local *fibre* network to connect EBS to the machine, so that
> > is not much of a problem.
> >
> > > * MR jobs typically read and write a lot. I wonder if this ends up being
> > > very expensive?
> >
> > Costs do tend to creep up on AWS. On the plus side, you can roughly
> > calculate how expensive your MR jobs will be. Using your own hardware is
> > definitely more cost effective.
> >
> > > * Data on ephemeral disks is lost when an instance terminates. Do people
> > > really rely purely on having N DNs and a high enough replication factor
> > > to prevent data loss?
> >
> > I found local EC2 image disks far slower than EBS, so I stopped using them.
> > I do not recall losing more than one EBS volume, but I've lost many EC2
> > instances (and the local disk with them). Now I always choose EBS-backed
> > EC2 instances.
> >
> > > * With EBS you could just create a larger volume when you need more disk
> > > space and attach it to your existing DN. If you are running out of disk
> > > space on local disks, what are the options? Do you have to launch more
> > > EC2 instances even if all you need is disk space, not more CPUs?
> >
> > Yes, you cannot increase the local disk space on an EC2 instance without
> > getting a larger instance. As I understand it, it is good for Hadoop to
> > have one disk per CPU core for MR.
> >
> > Thanks,
> > Phil
> >
> > --
> > Twitter : http://www.twitter.com/philwhln
> > LinkedIn : http://ca.linkedin.com/in/philwhln
> > Blog : http://www.philwhln.com
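For Q2 and the cost half of the experiment, one low-tech option is to collect per-task durations per host (for example from the JobTracker web UI's job history pages; how you export them is left open here) and summarize them by disk-layout group. The sketch below assumes a hypothetical host-to-group mapping and records of `(hostname, seconds)`. It reports the median task time per group, which is less sensitive than the mean to the occasional slow task on a noisy shared volume. Standard EBS volumes are billed per GB-month provisioned plus per million I/O requests, so the cost helper takes both prices as parameters; no actual AWS prices are assumed, so plug in the figures from your bill.

```python
# Summarize MR task durations per disk-layout group, plus a rough EBS cost
# estimate. Hostnames, group labels and prices are hypothetical inputs.
from collections import defaultdict
from statistics import median


def median_task_seconds(records, host_group):
    """records: iterable of (hostname, task_seconds) pairs.
    host_group: dict mapping hostname -> group label, e.g. "local",
    "1-ebs", "n-ebs" (hypothetical labels).
    Returns {group: median task duration in seconds}."""
    by_group = defaultdict(list)
    for host, seconds in records:
        group = host_group.get(host)
        if group is not None:  # skip hosts that are not part of the experiment
            by_group[group].append(seconds)
    return {group: median(durations) for group, durations in by_group.items()}


def ebs_monthly_cost(gb_months, io_requests, price_per_gb_month, price_per_million_io):
    """Storage charge plus I/O request charge; both prices are caller-supplied."""
    return gb_months * price_per_gb_month + (io_requests / 1_000_000) * price_per_million_io
```

Running the same job set on each group and comparing the per-group medians side by side with the I/O request counts from the bill gives a rough performance-per-dollar picture. It says nothing about HBase RegionServer throughput, which would need its own measurement.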
