Here are my thoughts: if your data store is meant for long-term use, you may consider putting the HDFS storage directories on EBS volumes rather than local disk (and putting the NameNode storage directory on EBS as well). For this setup, I think we should consider dfs.replication=2 (or even 1), because EBS itself already provides enough reliability.
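To make that concrete, here is a rough sketch of what the relevant part of hdfs-site.xml might look like. Note that the replication setting is spelled `dfs.replication` in the config file, and `dfs.name.dir` / `dfs.data.dir` are the property names used by the Hadoop 0.20-era releases. The `/mnt/ebs/...` paths are only placeholders I made up for wherever you mount the EBS volumes, so adjust them to your own layout:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Lower replication, on the assumption that EBS already adds durability -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- NameNode metadata directory; /mnt/ebs/namenode is a hypothetical EBS mount point -->
  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/ebs/namenode</value>
  </property>
  <!-- DataNode block directory; /mnt/ebs/datanode is a hypothetical EBS mount point -->
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/ebs/datanode</value>
  </property>
</configuration>
```

Whether 2 or 1 is the right value depends on how much you trust a single volume; with 1, losing one volume means losing the blocks on it.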
If your data store is used for ephemeral purposes (say, an EMR computation), you may consider just using the ephemeral disks that EC2 provides.

On Sat, Jun 4, 2011 at 11:27 AM, Jim R. Wilson <[email protected]> wrote:

> Hi HBase community,
>
> What are the current best-practices with respect to starting up an HBase
> cluster in EC2? I don't see any public AMIs newer than 0.89.xxx, and
> starting up that one, it's clear that it's not configured for HDFS or
> clustering (empty hbase-site.xml).
>
> Do people generally keep data in S3 or HDFS? If the latter, is it persisted
> via EBS? Do the hadoop nodes have more than one EBS attached to distinguish
> HDFS from the OS?
>
> Any help is much appreciated. Thanks in advance!
>
> -- Jim R. Wilson (jimbojw)
>

--
--Sean
