I don't have a whole lot of recent HBase-on-EBS experience, but when I did run it, my main issue was that EBS volumes would sometimes become unavailable.
The way I see it is that you have an additional moving part in your whole stack, thus there's a chance it will generate a new set of problems (compared to using local disks).

J-D

On Tue, Jan 4, 2011 at 9:43 AM, Otis Gospodnetic <[email protected]> wrote:
> Hi,
>
> What do people think about running HBase / HDFS off of EBS on EC2? That is,
> having HBase/HDFS keep the data on EBS.
> I was surprised not to find a lot of discussion around that:
> http://search-hadoop.com/?q=%2Bebs+%2Bhdfs
>
> Here are my thoughts/questions:
>
> * Supposedly ephemeral disks can be faster, but EC2 claims EBS is faster.
> People who benchmarked EBS mention its performance varies a lot. Local disks
> suffer from noisy neighbour problem, no?
>
> * EBS disks are not local. They are far from the CPU. What happens with data
> locality if you have data on EBS?
>
> * MR jobs typically read and write a lot. I wonder if this ends up being very
> expensive?
>
> * Data on ephemeral disks is lost when an instance terminates. Do people
> really rely purely on having N DNs and high enough replication factor to
> prevent data loss?
>
> * With EBS you could just create a larger volume when you need more disk space
> and attach it to your existing DN. If you are running out of disk space on
> local disks, what are the options? Got to launch more EC2 instances even if
> all you need is disk space, not more CPUs?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
