On Fri, Nov 11, 2011 at 4:42 PM, Otis Gospodnetic <
[email protected]> wrote:

> Hello,
>
> I was wondering if anyone has done an experiment with HBase or HDFS/MR
> where machines in the cluster have heterogeneous underlying file systems?
> e.g.,
> * 10 nodes with xfs
> * 10 nodes with ext3
> * 10 nodes with ext4
>
> The goal being comparing performance of MapReduce jobs reading from and
> writing to HBase (or just HDFS).
>
>
> And does anyone have any reason to believe doing the above would be super
> risky and cause data loss?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/


Since Hadoop abstracts you from the filesystem guts, the underlying file
systems can be mixed and matched freely. You can even mix and match file
systems across the disks of a single machine.
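
For example, a datanode's data directories can point at mounts formatted
with different file systems. A minimal hdfs-site.xml sketch (the mount
points here are hypothetical):

```xml
<property>
  <name>dfs.data.dir</name>
  <!-- one xfs-backed disk and one ext4-backed disk on the same node -->
  <value>/mnt/xfs-disk1/dfs/data,/mnt/ext4-disk2/dfs/data</value>
</property>
```

The datanode just round-robins blocks across the listed directories; it
never cares what file system sits underneath each one.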

I have found that ext3 performance degrades noticeably as disks fill up.
I captured system vitals before and after an ext3-to-ext4 upgrade:

http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/a_great_reason_to_use

Also if you want to get the most out of your disks read this:

http://allthingshadoop.com/2011/05/20/faster-datanodes-with-less-wait-io-using-df-instead-of-du/
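
Roughly, the trick in that post is that the datanode periodically shells out
to "du -sk <dir>" to measure per-volume usage, which walks every block file
and hammers the disks. A df-based stand-in answers from the partition's used
space instead, which is near-instant. A hedged sketch (names and paths here
are illustrative, not the post's exact script):

```shell
#!/bin/sh
# fast_du: answer a "how much space is used under <dir>" question from df
# (partition-level used space) instead of du (full directory tree walk).
fast_du() {
    dir="$1"
    # column 3 of POSIX "df -Pk" output is used kilobytes for the partition
    used_kb=$(df -Pk "$dir" | awk 'NR==2 {print $3}')
    # mimic "du -sk" output: "<kilobytes><TAB><dir>"
    printf '%s\t%s\n' "$used_kb" "$dir"
}

fast_du /tmp
```

The trade-off is that df reports usage for the whole partition, so this is
only accurate when the partition is dedicated to HDFS data.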

XFS is usually described as on par with or slightly better than ext4.
Anecdotally, though, most hardcore sysadmins I know can tell at least one
XFS "I lost my superblock" story :)
