Re: HBase / HDFS on EBS?

Matt Corgan Tue, 04 Jan 2011 11:37:18 -0800

One nice thing is that you can create many small EBS volumes per instance,
and since each EBS volume does ~100 IOPS you can get really good aggregate
random read performance.
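
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. The ~100 IOPS per volume figure is the one mentioned above; the
function name, the volume counts, and the assumption that random reads
spread evenly and scale linearly across volumes are only illustrative.
Treat the result as an upper bound, since EBS traffic shares the
instance's network link.

```python
def aggregate_iops(num_volumes, iops_per_volume=100):
    """Estimate best-case aggregate random-read IOPS for one instance.

    Assumes (hypothetically) that reads are spread evenly over all
    attached EBS volumes and that throughput scales linearly.
    """
    if num_volumes < 0 or iops_per_volume < 0:
        raise ValueError("counts must be non-negative")
    return num_volumes * iops_per_volume


if __name__ == "__main__":
    # Compare a single volume with several small volumes on one instance.
    for n in (1, 4, 8, 16):
        print(f"{n:2d} volumes: ~{aggregate_iops(n)} IOPS (best case)")
```

So under these assumptions, eight small volumes would give roughly eight
times the random-read capacity of a single volume.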



On Tue, Jan 4, 2011 at 2:05 PM, Phil Whelan <[email protected]> wrote:

> Hi Otis,
>
> I have used Hadoop on EBS, but not HBase yet (apologies for not being HBase
> specific).
>
> > * Supposedly ephemeral disks can be faster, but EC2 claims EBS is faster.
> > People who benchmarked EBS mention its performance varies a lot.  Local
> > disks suffer from noisy neighbour problem, no?
> >
>
> EBS volumes are much faster than the EC2 instance's local disk, in my
> experience.
>
>
> > * EBS disks are not local.  They are far from the CPU.  What happens with
> > data
> > locality if you have data on EBS?
> >
>
> Amazon uses a local *fibre* network to connect EBS to the machine, so that is
> not much of a problem.
>
>
> > * MR jobs typically read and write a lot.  I wonder if this ends up being
> > very
> > expensive?
> >
>
> Costs do tend to creep up on AWS. On the plus side, you can roughly
> calculate how expensive your MR jobs will be. Using your own hardware is
> definitely more cost effective.
>
>
> > * Data on ephemeral disks is lost when an instance terminates.  Do people
> > really
> > rely purely on having N DNs and high enough replication factor to prevent
> > data
> > loss?
> >
>
> I found local EC2 image disks far slower than EBS, so stopped using them. I
> do not recall losing more than one EBS volume, but I've lost many EC2
> instances (and their local disks with them). Now I always choose EBS-backed EC2
> instances.
>
> > * With EBS you could just create a larger volume when you need more disk
> > space and attach it to your existing DN.  If you are running out of disk
> > space on local disks, what are the options?  Got to launch more EC2
> > instances even if all you need is disk space, not more CPUs?
> >
>
> Yes, you cannot increase the local disk space on an EC2 instance without
> getting a larger instance. As I understand it, it is good for Hadoop to have
> one disk per CPU core for MR.
>
> Thanks,
> Phil
>
> --
> Twitter : http://www.twitter.com/philwhln
> LinkedIn : http://ca.linkedin.com/in/philwhln
> Blog : http://www.philwhln.com
>
>
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
>
