I don't have much recent experience running HBase on EBS, but when I
did, my main issue was that some EBS volumes would occasionally
become unavailable.

The way I see it, EBS adds another moving part to your stack, so
there's a chance it will introduce a new set of problems (compared to
using local disks).

J-D

On Tue, Jan 4, 2011 at 9:43 AM, Otis Gospodnetic
<[email protected]> wrote:
> Hi,
>
> What do people think about running HBase / HDFS off of EBS on EC2?  That is,
> having HBase/HDFS keep the data on EBS.
> I was surprised not to find a lot of discussion around that:
>  http://search-hadoop.com/?q=%2Bebs+%2Bhdfs
>
> Here are my thoughts/questions:
>
> * Supposedly ephemeral disks can be faster, but EC2 claims EBS is faster.
> People who benchmarked EBS mention its performance varies a lot.  Local disks
> suffer from noisy neighbour problem, no?
>
> * EBS disks are not local.  They are far from the CPU.  What happens with data
> locality if you have data on EBS?
>
> * MR jobs typically read and write a lot.  I wonder if this ends up being very
> expensive?
>
> * Data on ephemeral disks is lost when an instance terminates.  Do people
> really rely purely on having N DNs and high enough replication factor to
> prevent data loss?
>
> * With EBS you could just create a larger volume when you need more disk space
> and attach it to your existing DN.  If you are running out of disk space on
> local disks, what are the options?  Got to launch more EC2 instances even if
> all you need is disk space, not more CPUs?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
