Hi,

What do people think about running HBase/HDFS on top of EBS on EC2?  That is,
having HBase/HDFS keep its data on EBS volumes rather than on local disks.
I was surprised not to find a lot of discussion around that:
  http://search-hadoop.com/?q=%2Bebs+%2Bhdfs

Here are my thoughts/questions:

* Supposedly ephemeral disks can be faster, but EC2 claims EBS is faster.
People who have benchmarked EBS mention that its performance varies a lot.
Local disks suffer from the noisy neighbour problem too, no?

* EBS volumes are not local.  They are network-attached, so they are far from
the CPU.  What happens to data locality if your data lives on EBS?

* MR jobs typically read and write a lot.  I wonder whether all that I/O ends
up being very expensive on EBS?

* Data on ephemeral disks is lost when an instance terminates.  Do people
really rely purely on having N DNs and a high enough replication factor to
prevent data loss?

* With EBS you could just create a larger volume when you need more disk space
and attach it to your existing DN.  If you are running out of disk space on
local disks, what are the options?  Do you have to launch more EC2 instances
even if all you need is disk space, not more CPUs?
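
To make the replication question above a bit more concrete, here is a rough
back-of-the-envelope sketch.  It assumes each block's replicas sit on distinct
DNs chosen uniformly at random (no rack awareness), that blocks are placed
independently, and that a given number of DNs disappear at the same moment.
The function names and the numbers are mine, purely for illustration; this is
not how HDFS itself models anything.

```python
from math import comb


def block_loss_probability(num_nodes: int, replication: int, failed_nodes: int) -> float:
    """Chance that ONE block loses every replica when `failed_nodes` DNs vanish at once.

    Assumption: the block's replicas live on `replication` distinct DNs picked
    uniformly at random out of `num_nodes`.
    """
    if failed_nodes < replication:
        # Fewer failures than replicas: at least one copy must survive.
        return 0.0
    # Number of replica placements that fall entirely inside the failed set,
    # divided by the number of all possible placements.
    return comb(failed_nodes, replication) / comb(num_nodes, replication)


def any_block_loss_probability(num_nodes: int, replication: int,
                               failed_nodes: int, num_blocks: int) -> float:
    """Chance that AT LEAST ONE of `num_blocks` blocks is lost.

    Assumption: blocks are placed independently of each other.
    """
    p = block_loss_probability(num_nodes, replication, failed_nodes)
    return 1.0 - (1.0 - p) ** num_blocks


if __name__ == "__main__":
    # Illustrative numbers only: 20 DNs, replication factor 3,
    # 3 instances terminating simultaneously, 100,000 blocks.
    print(block_loss_probability(20, 3, 3))
    print(any_block_loss_probability(20, 3, 3, 100_000))
```

The per-block probability is small, but it gets multiplied across every block
in the cluster, which is why I am asking whether replication alone is what
people actually rely on.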

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
