Re: ext4 on a hadoop cluster datanodes

Brian C. Huffman Wed, 12 Nov 2014 10:51:13 -0800

Would this set of ext4 parameters be ok for a 500GB HDFS data drive?


Thanks,
Brian

On 10/06/2014 06:09 PM, Travis wrote:

For filesystem creation, we use the following with mkfs.ext4
mkfs.ext4 -T largefile -m 1 -O dir_index,extent,sparse_super -L$HDFS_LABEL /dev/${DEV}1
By default, mkfs creates way too many inodes, so we tune it a bit with the "largefile" option, which modifies the inode_ratio. This gives us ~2 million usable inodes on a 2TB filesystem.
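
A rough sketch of the inode arithmetic behind that figure, applied to the 500GB drive asked about above. It assumes the stock mke2fs.conf, where "largefile" sets inode_ratio to 1048576 bytes per inode and the default ratio is 16384; inode_count is a hypothetical helper, and real mkfs output will differ somewhat because inodes are allocated per block group.

```shell
# Approximate inode count: filesystem size in bytes divided by inode_ratio.
# inode_count is a hypothetical helper, not part of e2fsprogs.
inode_count() {
    fs_bytes=$1
    inode_ratio=$2
    echo $(( fs_bytes / inode_ratio ))
}

inode_count 2000000000000 1048576   # 2TB drive with -T largefile
inode_count 500000000000 1048576    # 500GB drive with -T largefile
inode_count 500000000000 16384      # 500GB drive with the default ratio
```

Under these assumptions a 500GB drive ends up with roughly a quarter of the inodes of the 2TB case, which is still plenty when HDFS blocks are large files.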
As well, by default, mkfs sets the block reserve to 5%, which wastes a fair amount of space, since this space is only accessible to the root user. We tune this down to 1% at mkfs time, but you can use tune2fs at runtime to change it.
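
To put numbers on the reserve, here is a minimal sketch for a 500GB drive. reserved_bytes is a hypothetical helper, the drive size is taken as 500 * 10^9 bytes, and /dev/sdX1 is a placeholder device name; the tune2fs line is left commented out because it needs root and a real device.

```shell
# Bytes held back for root at a given reserve percentage (integer arithmetic).
# reserved_bytes is a hypothetical helper for illustration only.
reserved_bytes() {
    size_bytes=$1
    percent=$2
    echo $(( size_bytes * percent / 100 ))
}

reserved_bytes 500000000000 5   # mkfs default reserve
reserved_bytes 500000000000 1   # reserve after -m 1

# Changing the reserve on an existing filesystem (run as root):
#   tune2fs -m 1 /dev/sdX1
```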
I don't know that I would use writeback. This mode is problematic in the event of a crash because it can allow old data to exist on the FS, but with new metadata. I consider this corruption. Unless you know your environment to be super stable (meaning no OS or hardware-induced crashes) AND you have stable, UPS-backed power, I would steer clear of this.
If you're looking for the utmost in filesystem performance, you're better off looking at the controller card you're using. Right now, we're using LSI 9207-8i and seeing an aggregate 1.6-1.8 GBytes/sec throughput across 12 drives in JBOD. Our older LSI-based cards can only sustain maybe a quarter of that in the same disk configuration.
Travis
On Mon, Oct 6, 2014 at 4:46 PM, Colin Kincaid Williams <[email protected]> wrote:
    Hi,

    I'm trying to figure out what the ideal settings are for using
    ext4 on hadoop cluster datanodes. The hadoop site recommends
    choosing the nodelalloc option in the fstab. Is that
    still a preferred option?

    I read elsewhere to disable the ext4 journal, and use data=writeback.

    http://fenidik.blogspot.com/2010/03/ext4-disable-journal.html

    Finally, in some slides I read to use
    dir_index,sparse_super,extent when creating the filesystem, and
    to mount with noatime and nodiratime.

    
    http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud








--
Travis Campbell
[email protected]
