Hi Otis, I have used Hadoop on EBS, but not HBase yet (apologies for not being HBase specific).

> * Supposedly ephemeral disks can be faster, but EC2 claims EBS is faster.
> People who benchmarked EBS mention its performance varies a lot. Local disks
> suffer from noisy neighbour problem, no?

EBS volumes are much faster than the EC2 image's local disk, in my experience.

> * EBS disks are not local. They are far from the CPU. What happens with data
> locality if you have data on EBS?

Amazon uses a local *fibre* network to connect EBS to the machine, so that is not much of a problem.

> * MR jobs typically read and write a lot. I wonder if this ends up being very
> expensive?

Costs do tend to creep up on AWS. On the plus side, you can roughly calculate how expensive your MR jobs will be. Using your own hardware is definitely more cost effective.

> * Data on ephemeral disks is lost when an instance terminates. Do people really
> rely purely on having N DNs and high enough replication factor to prevent data
> loss?

I found local EC2 image disks far slower than EBS, so I stopped using them. I do not recall losing more than one EBS volume, but I've lost many EC2 instances (and the local disk with each of them). Now I always choose EBS-backed EC2 instances.

> * With EBS you could just create a larger volume when you need more disk space
> and attach it to your existing DN. If you are running out of disk space on
> local disks, what are the options? Got to launch more EC2 instances even if all
> you need is disk space, not more CPUs?

Yes, you cannot increase the local disk space on an EC2 instance without getting a larger instance. As I understand it, Hadoop benefits from having one disk per CPU core for MR.

Thanks,
Phil

--
Twitter : http://www.twitter.com/philwhln
LinkedIn : http://ca.linkedin.com/in/philwhln
Blog : http://www.philwhln.com

> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
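
The rough cost calculation mentioned in the reply could look something like the sketch below. It is only an illustration: the function name `estimate_job_cost`, the three-part pricing model (hourly instance charge, per GB-month volume storage, per million I/O requests), and every number in the usage example are assumptions rather than real AWS prices, so check them against the current price list before relying on the result.

```python
HOURS_PER_MONTH = 730  # approximation used to pro-rate monthly storage charges


def estimate_job_cost(instances, hours, instance_hourly_usd,
                      ebs_gb_per_instance, ebs_gb_month_usd,
                      io_requests_millions, io_per_million_usd):
    """Rough cost of one MR job; all prices are caller-supplied placeholders."""
    # Compute: every instance is billed for the whole duration of the job.
    compute = instances * hours * instance_hourly_usd
    # Storage: monthly per-GB price pro-rated to the hours the volumes exist.
    storage = (instances * ebs_gb_per_instance * ebs_gb_month_usd
               * hours / HOURS_PER_MONTH)
    # I/O: assumed to be billed per million requests across the cluster.
    io = io_requests_millions * io_per_million_usd
    return {"compute": compute, "storage": storage, "io": io,
            "total": compute + storage + io}


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes for 73 hours, 100 GB volume per node.
    cost = estimate_job_cost(
        instances=10, hours=73, instance_hourly_usd=0.50,
        ebs_gb_per_instance=100, ebs_gb_month_usd=0.10,
        io_requests_millions=50, io_per_million_usd=0.10,
    )
    for item, usd in cost.items():
        print(f"{item:>8}: ${usd:.2f}")
```

Keeping the I/O term separate makes it easy to see how much of the bill comes from a read- and write-heavy job as opposed to instance hours.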
