On Fri, Nov 11, 2011 at 4:42 PM, Otis Gospodnetic < [email protected]> wrote:
> Hello,
>
> I was wondering if anyone has done an experiment with HBase or HDFS/MR
> where machines in the cluster have heterogeneous underlying file systems?
> e.g.,
> * 10 nodes with xfs
> * 10 nodes with ext3
> * 10 nodes with ext4
>
> The goal being comparing performance of MapReduce jobs reading from and
> writing to HBase (or just HDFS).
>
>
> And does anyone have any reason to believe doing the above would be super
> risky and cause data loss?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/

Since Hadoop abstracts you away from the filesystem internals, the underlying file systems can be mixed and matched. You can even mix and match disks on a single machine.

I have found that ext3 performance degrades noticeably as disks get full. I captured system vitals before and after an ext3-to-ext4 upgrade:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/a_great_reason_to_use

Also, if you want to get the most out of your disks, read this:
http://allthingshadoop.com/2011/05/20/faster-datanodes-with-less-wait-io-using-df-instead-of-du/

XFS is usually described as on par with or slightly better than ext4. However, anecdotally, most hardcore sysadmins I know can tell at least one XFS "I lost my superblock" story :)
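When disks are mixed like this, even on a single machine, it helps to confirm which filesystem actually backs each DataNode data directory before comparing job timings. Below is a minimal Python sketch of that check. It assumes a Linux host that exposes `/proc/mounts`. The directory paths are hypothetical placeholders for the entries in your `dfs.data.dir` setting (`dfs.datanode.data.dir` in later releases), and the helper names are illustrative only, not part of Hadoop.

```python
import os
from typing import Dict, List, Tuple


def parse_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        # Assumption: mount points contain no escaped spaces (\040).
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for_path(path: str, mounts: List[Tuple[str, str]]) -> str:
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.normpath(path)
    best_mp, best_type = "", "unknown"
    for mount_point, fs_type in mounts:
        mp = os.path.normpath(mount_point)
        contains = mp == "/" or path == mp or path.startswith(mp + os.sep)
        # ">=" lets a later mount on the same point win, as the kernel does.
        if contains and len(mp) >= len(best_mp):
            best_mp, best_type = mp, fs_type
    return best_type


def report(data_dirs: List[str], mounts_text: str) -> Dict[str, str]:
    """Map each data directory to the filesystem type it lives on."""
    mounts = parse_mounts(mounts_text)
    return {d: fs_type_for_path(d, mounts) for d in data_dirs}


if __name__ == "__main__":
    # Hypothetical data directories; substitute your own dfs.data.dir values.
    dirs = ["/data1/dfs/dn", "/data2/dfs/dn"]
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for d, fs in report(dirs, f.read()).items():
                print(f"{d}: {fs}")
```

Running something like this on every node before a benchmark run makes it easier to attribute timing differences to the right filesystem rather than to a misconfigured mount.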
