Re: HBase / HDFS on EBS?

Phil Whelan Tue, 04 Jan 2011 11:06:06 -0800

Hi Otis,

I have used Hadoop on EBS, but not HBase yet (apologies for not being HBase
specific).


> * Supposedly ephemeral disks can be faster, but EC2 claims EBS is faster.
> People who benchmarked EBS mention its performance varies a lot.  Local disks
> suffer from noisy neighbour problem, no?
>

EBS volumes are much faster than an EC2 instance's local disk, in my
experience.


> * EBS disks are not local.  They are far from the CPU.  What happens with data
> locality if you have data on EBS?
>

Amazon uses a local *fibre* network to connect EBS to the machine, so that is
not much of a problem.


> * MR jobs typically read and write a lot.  I wonder if this ends up being very
> expensive?
>

Costs do tend to creep up on AWS. On the plus side, you can roughly
calculate how expensive your MR jobs will be. Using your own hardware is
definitely more cost effective.
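
A rough sketch of that kind of calculation, in Python. The function name,
parameters, and the 730-hours-per-month figure are my own placeholders, and it
assumes instances are billed per started hour and EBS per GB-month plus per
million I/O requests. Plug in the current rates from the AWS pricing pages;
none of the numbers here are real prices.

```python
import math

HOURS_PER_MONTH = 730  # assumption: average month length used to prorate storage


def estimate_job_cost(instances, hours, hourly_rate,
                      ebs_gb=0.0, ebs_gb_month_rate=0.0,
                      io_requests=0, io_rate_per_million=0.0):
    """Rough cost of one MR job run. All rates are supplied by the caller."""
    # Assumption: every started instance-hour is billed as a full hour.
    billed_hours = math.ceil(hours)
    compute = instances * billed_hours * hourly_rate
    # EBS storage prorated for the duration of the job.
    storage = ebs_gb * ebs_gb_month_rate * (hours / HOURS_PER_MONTH)
    # EBS I/O charged per million requests.
    io = (io_requests / 1_000_000) * io_rate_per_million
    return compute + storage + io
```

For example, estimate_job_cost(10, 2.5, 0.5) bills 10 instances for 3 hours
each at a made-up rate of 0.5 per hour, giving 15.0 before any EBS charges.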


> * Data on ephemeral disks is lost when an instance terminates.  Do people really
> rely purely on having N DNs and high enough replication factor to prevent data
> loss?
>

I found local EC2 instance disks far slower than EBS, so I stopped using them.
I do not recall losing more than one EBS volume, but I've lost many EC2
instances (and the local disks with them). Now I always choose EBS-backed EC2
instances.

> * With EBS you could just create a larger volume when you need more disk space
> and attach it to your existing DN.  If you are running out of disk space on
> local disks, what are the options?  Got to launch more EC2 instances even if all
> you need is disk space, not more CPUs?
>

Yes, you cannot increase the local disk space on an EC2 instance without
getting a larger instance. As I understand it, it is good for Hadoop to have
one disk per CPU core for MR.
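
If you do attach several volumes to a node, Hadoop takes them as a
comma-separated list of directories (dfs.data.dir for the DataNode and
mapred.local.dir for MR scratch space in the Hadoop versions I have used). A
small Python sketch for building that value follows. The helper name and the
mount points are hypothetical.

```python
def data_dirs(mount_points, subdir):
    """Join one directory per mounted volume into a comma-separated value."""
    # Strip any trailing slash so the paths are joined consistently.
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


# Hypothetical mount points, one per attached volume.
volumes = ["/mnt/ebs1", "/mnt/ebs2"]
dfs_data_dir = data_dirs(volumes, "hdfs/data")
mapred_local_dir = data_dirs(volumes, "mapred/local")
```

The resulting strings go into the matching property values in
hdfs-site.xml and mapred-site.xml.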

Thanks,
Phil

-- 
Twitter : http://www.twitter.com/philwhln
LinkedIn : http://ca.linkedin.com/in/philwhln
Blog : http://www.philwhln.com


> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
