For filesystem creation, we use the following with mkfs.ext4:

    mkfs.ext4 -T largefile -m 1 -O dir_index,extent,sparse_super -L $HDFS_LABEL /dev/${DEV}1

By default, mkfs creates far more inodes than a datanode disk needs, so we
tune it with the "largefile" usage type, which raises the inode_ratio
(bytes per inode) to 1MB in the stock mke2fs.conf. This gives us ~2
million usable inodes on a 2TB filesystem.
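For reference, here is a rough sketch of the arithmetic behind that number. It assumes the stock mke2fs.conf values (inode_ratio of 16384 by default and 1048576 for "largefile") and a "2TB" drive of 2 * 10^12 bytes; the function name is just for illustration, and your distro's mke2fs.conf may differ.

```shell
#!/bin/bash
# mkfs.ext4 allocates roughly one inode per inode_ratio bytes of filesystem.
estimate_inodes() {
    local fs_bytes=$1
    local inode_ratio=$2
    echo $(( fs_bytes / inode_ratio ))
}

FS_BYTES=2000000000000   # assumption: "2TB" in decimal bytes

estimate_inodes "$FS_BYTES" 16384     # default ratio   -> 122070312
estimate_inodes "$FS_BYTES" 1048576   # -T largefile    -> 1907348
```

After creating the filesystem, "df -i" on the mount point shows the actual inode count.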
Also, by default, mkfs sets the block reserve to 5%, which wastes a fair
amount of space, since this space is only usable by the root user. We
tune this down to 1% at mkfs time (-m 1), but you can also change it later
on an existing filesystem with tune2fs -m.
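To put a number on the reserve, here is a small sketch, again assuming a 2 * 10^12 byte filesystem and decimal GB. The function name and the /dev/sdX1 device in the comments are placeholders, not our actual setup.

```shell
#!/bin/bash
# Space (in decimal GB) held back for root at a given reserve percentage.
reserved_gb() {
    local fs_bytes=$1
    local pct=$2
    echo $(( fs_bytes * pct / 100 / 1000000000 ))
}

reserved_gb 2000000000000 5   # default reserve -> 100
reserved_gb 2000000000000 1   # with -m 1       -> 20

# Changing the reserve on an existing filesystem (placeholder device, run as root):
#   tune2fs -m 1 /dev/sdX1
#   tune2fs -l /dev/sdX1 | grep -i 'reserved block count'
```

So on a 2TB disk, dropping from 5% to 1% gets back roughly 80GB per drive.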
I don't know that I would use writeback. This mode is problematic in the
event of a crash because metadata can be committed before the data it
describes, so a file can come back with new metadata but old data. I
consider this corruption. Unless you know your environment to be super
stable (meaning no OS or hardware-induced crashes) AND you have stable,
UPS-backed power, I would steer clear of this.
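If you want to audit existing nodes, something like the following sketch can flag ext4 mounts that are running in writeback mode. The function name is made up for illustration, and it only catches the option when it shows up in the mount table; a mode set through the superblock's default mount options may not be listed there.

```shell
#!/bin/bash
# Succeeds if a comma-separated mount option string contains data=writeback.
uses_writeback() {
    case ",$1," in
        *,data=writeback,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Walk the mount table and report ext4 mounts in writeback mode.
if [ -r /proc/mounts ]; then
    while read -r dev mnt fstype opts _; do
        if [ "$fstype" = "ext4" ] && uses_writeback "$opts"; then
            echo "WARNING: $mnt ($dev) is mounted with data=writeback"
        fi
    done < /proc/mounts
fi
```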
If you're looking for the utmost in filesystem performance, you're better
off looking at the controller card you're using. Right now, we're using
the LSI 9207-8i and seeing an aggregate 1.6-1.8GBytes/sec throughput across
12 drives in JBOD. Our older LSI-based cards can only sustain maybe a
quarter of that in the same disk configuration.
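Broken down per drive, those figures look like this. This is only back-of-the-envelope arithmetic, assuming 1GByte = 1000MBytes and load spread evenly across all 12 drives; the function name is illustrative.

```shell
#!/bin/bash
# Average per-drive throughput in MBytes/sec (integer division).
per_drive_mb() {
    local aggregate_mb=$1
    local drives=$2
    echo $(( aggregate_mb / drives ))
}

per_drive_mb 1600 12   # newer card, low end   -> 133
per_drive_mb 1800 12   # newer card, high end  -> 150
per_drive_mb 400 12    # older card (a quarter of 1600) -> 33
per_drive_mb 450 12    # older card (a quarter of 1800) -> 37
```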
Travis
On Mon, Oct 6, 2014 at 4:46 PM, Colin Kincaid Williams <[email protected]>
wrote:
> Hi,
>
> I'm trying to figure out what are more ideal settings for using ext4 on
> hadoop cluster datanodes. From the hadoop site its recommended nodelalloc
> option is chosen in the fstab. Is that still a preferred option?
>
> I read elsewhere to disable the ext4 journal, and use data=writeback.
>
> http://fenidik.blogspot.com/2010/03/ext4-disable-journal.html
>
> Finally, in some slides i read to use dir_index,sparse_super,extent when
> creating the filesystem, and mount noatime and nodiratime
>
>
> http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud
>
--
Travis Campbell
[email protected]