Yeah, EC2 tends to have both low network and disk throughput. We use it because of the flexibility and because throughput isn't a big concern for us.
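For anyone wanting to sanity-check that on their own instances, a rough sketch of the kind of sequential-write test Lars describes below; the target path and the 1 GB size are placeholders, not from this thread:

```shell
# Rough per-directory sequential-write check; point it at each mounted
# volume you care about. The path and count below are placeholders.
# oflag=direct bypasses the page cache so the reported MB/s reflects
# the disk, not RAM.
dd if=/dev/zero of=/mnt/hdfs/ebs1/ddtest bs=1M count=1024 oflag=direct
rm -f /mnt/hdfs/ebs1/ddtest
```

dd prints the elapsed time and throughput on stderr when it finishes, which is the number to compare across instances and volume types.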
Also, fyi, I'm pretty sure from my tests that EBS is on a separate interface from the regular network (like Phil said), so EBS starving your internode bandwidth shouldn't be a concern.

On Wed, Jan 5, 2011 at 11:16 AM, Lars George <[email protected]> wrote:
> Hi,
>
> I ran some tests on various EC2 clusters from c1.medium and c1.xlarge
> to m2.2xlarge with EBS, on 1+10 instances. The instance storage
> usually averages around 2-3 MB/s for writes, and the EBS-backed
> m2.2xlarge did 7-8 MB/s on writes. Reading I think is less of an
> issue, but writing is really bad. On a dedicated cluster I expect to
> see at least 15 MB/s, and have seen 25 MB/s on quite average
> co-located servers. EBS is better, but still bad for large ETL jobs.
> I had one use-case where a single (!) machine with a single-threaded
> app could do an ETL job in about 50 mins, while an EC2 cluster doing
> the same on 1+10 nodes took 30 hrs! Go figure. And I added a
> "dry-run" switch that would do all the reading and parsing, just no
> writing, and those runs finished in 45 mins. So this was definitely
> write-bound.
>
> One takeaway is to watch for, and expect, a huge deviation in
> performance. And a rule of thumb may be: if you have a well-performing
> EC2 cluster, do not shut it down if you can avoid it. Or spin up a
> few, do a burn-in, and select the fastest.
>
> Lars
>
>
> On Wed, Jan 5, 2011 at 4:50 PM, Matt Corgan <[email protected]> wrote:
> > Hi Otis,
> >
> > I think it might be difficult to interpret the results of running
> > all the different nodes in the same cluster. I would recommend
> > running your test once with N nodes using local disk, then again
> > with N nodes using 1 EBS volume, then again with N nodes using X
> > EBS volumes.
> >
> > Do you know if your workload is most likely to be restricted by
> > CPU, memory, disk throughput, disk space, or disk seeks? EBS helps
> > most with the last 2, but don't overlook how expensive it can be.
> > We're mostly disk-seek limited, so we mount 6x100GB EBS volumes on
> > each m1.large server and don't even use the local disks, in order
> > to keep things simple. If that proves not enough and the servers
> > can still handle it, we'll probably add new servers to the cluster
> > with 12x100GB and then slowly remove the old ones. These are not in
> > a RAID configuration like we do for MySQL, just listed in the
> > hdfs-site.xml file:
> >
> > <property>
> >   <name>dfs.data.dir</name>
> >   <value>/mnt/hdfs/ebs1,/mnt/hdfs/ebs2,/mnt/hdfs/ebs3,/mnt/hdfs/ebs4,/mnt/hdfs/ebs5,/mnt/hdfs/ebs6</value>
> > </property>
> >
> > Hope that helps,
> > Matt
> >
> >
> > On Wed, Jan 5, 2011 at 1:44 AM, Otis Gospodnetic <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I think this bit from Matt and the last bit from Phil about a
> >> drive-per-cpu-core seem like strong arguments in favour of EBS.
> >> I don't have a good feel/experience for speed when the storage
> >> medium is on the other side of a *fibre* link vs. a completely
> >> local disk. The fact that everything is shared, and that the
> >> intensity of its use by others sharing the resources varies, makes
> >> EBS vs. local super hard to compare properly.
> >>
> >> How about doing this to compare performance and cost:
> >> * create N EC2 instances
> >> * on half of them configure Hadoop HDFS/MR to use local disk
> >> * on a quarter of them configure Hadoop HDFS/MR to use 1 EBS volume
> >> * on a quarter of them configure Hadoop HDFS/MR to use N EBS volumes
> >> * run your regular MR jobs
> >> * compare performance
> >> * look at the EBS section on the AWS monthly bill
> >>
> >> Q1: does the above sound good, or is there a way to improve it?
> >> Q2: what's the best way to compare performance of different nodes,
> >> other than manually checking various Hadoop UIs to see how long
> >> Map and Reduce tasks on different nodes *tend* to take?
> >> The above is really more about HDFS/MR performance on local vs.
> >> EBS disks. If each of the above nodes also runs an HBase
> >> RegionServer, how would one see which group of them is the
> >> fastest, and which the slowest? Is there a "rows per second" sort
> >> of metric somewhere that would show how fast different RSs are?
> >>
> >> Thanks,
> >> Otis
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> Lucene ecosystem search :: http://search-lucene.com/
> >>
> >>
> >>
> >> ----- Original Message ----
> >> > From: Matt Corgan <[email protected]>
> >> > To: user <[email protected]>
> >> > Sent: Tue, January 4, 2011 2:36:51 PM
> >> > Subject: Re: HBase / HDFS on EBS?
> >> >
> >> > One nice thing is that you can create many small EBS volumes per
> >> > instance, and since each EBS volume does ~100 IOPS you can get
> >> > really good aggregate random read performance.
> >> >
> >> >
> >> > On Tue, Jan 4, 2011 at 2:05 PM, Phil Whelan <[email protected]> wrote:
> >> >
> >> > > Hi Otis,
> >> > >
> >> > > I have used Hadoop on EBS, but not HBase yet (apologies for
> >> > > not being HBase specific).
> >> > >
> >> > > > * Supposedly ephemeral disks can be faster, but EC2 claims
> >> > > > EBS is faster. People who benchmarked EBS mention its
> >> > > > performance varies a lot. Local disks suffer from the noisy
> >> > > > neighbour problem, no?
> >> > >
> >> > > EBS volumes are much faster than the EC2 instance's local
> >> > > disk, in my experience.
> >> > >
> >> > > > * EBS disks are not local. They are far from the CPU. What
> >> > > > happens with data locality if you have data on EBS?
> >> > >
> >> > > Amazon uses a local *fibre* network to connect EBS to the
> >> > > machine, so that is not much of a problem.
> >> > >
> >> > > > * MR jobs typically read and write a lot. I wonder if this
> >> > > > ends up being very expensive?
> >> > >
> >> > > Costs do tend to creep up on AWS. On the plus side, you can
> >> > > roughly calculate how expensive your MR jobs will be. Using
> >> > > your own hardware is definitely more cost-effective.
> >> > >
> >> > > > * Data on ephemeral disks is lost when an instance
> >> > > > terminates. Do people really rely purely on having N DNs and
> >> > > > a high enough replication factor to prevent data loss?
> >> > >
> >> > > I found local EC2 instance disks far slower than EBS, so I
> >> > > stopped using them. I do not recall losing more than one EBS
> >> > > volume, but I've lost many EC2 instances (and the local disks
> >> > > with them). Now I always choose EBS-backed EC2 instances.
> >> > >
> >> > > > * With EBS you could just create a larger volume when you
> >> > > > need more disk space and attach it to your existing DN. If
> >> > > > you are running out of disk space on local disks, what are
> >> > > > the options? Got to launch more EC2 instances even if all
> >> > > > you need is disk space, not more CPUs?
> >> > >
> >> > > Yes, you cannot increase the local disk space on an EC2
> >> > > instance without getting a larger instance. As I understand
> >> > > it, it is good for Hadoop to have one disk per CPU core for MR.
> >> > >
> >> > > Thanks,
> >> > > Phil
> >> > >
> >> > > --
> >> > > Twitter : http://www.twitter.com/philwhln
> >> > > LinkedIn : http://ca.linkedin.com/in/philwhln
> >> > > Blog : http://www.philwhln.com
> >> > >
> >> > > > Thanks,
> >> > > > Otis
> >> > > > ----
> >> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> > > > Lucene ecosystem search :: http://search-lucene.com/
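As an aside on Matt's six-volume layout: getting from freshly attached EBS volumes to the /mnt/hdfs/ebs1 ... /mnt/hdfs/ebs6 directories listed in his dfs.data.dir might look roughly like the sketch below. The device names are assumptions (they vary with instance type and attachment order), and ext3 is just one plausible filesystem choice, not something the thread specifies.

```shell
# Hypothetical provisioning sketch for Matt's layout: one filesystem
# per EBS volume, no RAID, each mounted as its own HDFS data directory.
# Device names are assumptions -- check what the kernel actually
# assigned (e.g. via /proc/partitions) before running anything.
i=1
for dev in /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk; do
    mkfs.ext3 -q "$dev"
    mkdir -p "/mnt/hdfs/ebs$i"
    mount "$dev" "/mnt/hdfs/ebs$i"
    i=$((i + 1))
done
```

With the volumes mounted this way, the dfs.data.dir value from Matt's hdfs-site.xml snippet above matches the mount points one-to-one.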
