Re: Optimal Filesystem (and Settings) for HDFS
Bryan Duxbury wrote: We use XFS for our data drives, and we've had somewhat mixed results.

Thanks for that. I've just created a wiki page to put some of these notes up; extensions and some hard data would be welcome: http://wiki.apache.org/hadoop/DiskSetup

One problem we have for hard data is that we need some different benchmarks for MR jobs. Terasort is good for measuring IO and MR framework performance, but for more CPU-intensive algorithms, or things that need to seek around a bit more, you can't be sure that terasort benchmarks are a good predictor of what's right for you in terms of hardware, filesystem, etc. Contributions in this area would be welcome. I'd like to measure the power consumed on a run too, which is actually possible as far as my laptop is concerned, because you can ask its battery what happened.

-steve
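As an aside on the power-measurement idea: on a Linux laptop the battery can be queried through sysfs. This is only a sketch; the BAT0 name and the power_now attribute are assumptions that vary by machine and kernel (some expose current_now/voltage_now instead).

```shell
# Read instantaneous battery draw, if the kernel exposes it.
# BAT0 and power_now are assumptions about this particular machine.
BATT=/sys/class/power_supply/BAT0/power_now
if [ -r "$BATT" ]; then
  # power_now is reported in microwatts; convert to watts.
  awk '{printf "%.1f W\n", $1 / 1000000}' "$BATT"
fi
```

Sampling this in a loop over a job run gives a rough energy figure, at least for a single-node experiment.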
Re: Optimal Filesystem (and Settings) for HDFS
We use XFS for our data drives, and we've had somewhat mixed results. One of the biggest pros is that XFS leaves more free space than ext3, even with the reserved-space setting turned all the way down to 0. Another is that you can format a 1 TB drive as XFS in about 0 seconds, versus minutes for ext3. This makes it really fast to kickstart our worker nodes.

We have seen some weird stuff happen, though, when machines run out of memory, apparently because the XFS driver does something odd with kernel memory. When this happens, we end up having to do some fscking before we can get that node back online.

As far as outright performance, I actually *did* do some tests of XFS vs ext3 performance on our cluster. If you just look at a single machine's local disk speed, you can write and read noticeably faster when using XFS instead of ext3. However, the reality is that this extra disk performance won't have much of an effect on your overall job completion time, since you will find yourself network-bottlenecked well in advance of even ext3's performance. The long and short of it is that we use XFS to speed up our new machine deployment, and that's it.

-Bryan

On May 18, 2009, at 10:31 AM, Alex Loddengaard wrote:

I believe Yahoo! uses ext3, though I know other people have said that XFS has performed better in various benchmarks. We use ext3, though we haven't done any benchmarks to prove its worth. This question has come up a lot, so I think it'd be worth doing a benchmark and writing up the results. I haven't been able to find a detailed analysis / benchmark writeup comparing various filesystems, unfortunately.

Hope this helps, Alex

On Mon, May 18, 2009 at 8:54 AM, Bob Schulze b.schu...@ecircle.com wrote:

We are currently rebuilding our cluster - has anybody recommendations on the underlaying file system? Just standard Ext3? I could imagine that the block size could be larger than its default... Thx for any tips, Bob
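The format-time difference Bryan describes is easy to reproduce; a hedged sketch follows. The device name is a placeholder, and both commands destroy the partition's contents, so run them only on a scratch drive.

```shell
# Compare format times on the same partition (DESTRUCTIVE).
# /dev/sdb1 is a placeholder for an actual data drive.
DEVICE=/dev/sdb1
time mkfs.xfs -f "$DEVICE"   # XFS writes a handful of metadata blocks: near-instant
time mkfs.ext3 "$DEVICE"     # ext3 initializes all inode tables up front: minutes on 1 TB
```

The gap comes from ext3 writing out every inode table at mkfs time, while XFS allocates its metadata lazily.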
Re: Optimal Filesystem (and Settings) for HDFS
Hi Bryan, thanks for the mail. I have an issue when we use XFS: Hadoop runs du -sk every 10 minutes on my cluster, and sometimes it goes into a loop and the machine hangs. Have you seen this issue, or is it only me? I'd really appreciate it if someone could shed some light on this.

Anshuman

----- Original Message -----
From: Bryan Duxbury br...@rapleaf.com
To: core-user@hadoop.apache.org
Sent: Tuesday, May 19, 2009 2:50:57 PM GMT -08:00 US/Canada Pacific
Subject: Re: Optimal Filesystem (and Settings) for HDFS

We use XFS for our data drives, and we've had somewhat mixed results. [...]
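For reference, the periodic check Anshuman mentions is the DataNode's disk-usage scan, which boils down to a recursive du over the data directory; on a disk holding hundreds of thousands of block files, that is a lot of stat calls. A runnable sketch (the data directory is an assumption, defaulting to the current directory):

```shell
# Roughly what the DataNode runs periodically: total KB used under its
# data directory. DATADIR is an assumption; real clusters point this at
# something like dfs.data.dir.
DATADIR=${DATADIR:-.}
USED_KB=$(du -sk "$DATADIR" | awk '{print $1}')
echo "used: ${USED_KB} KB"
```

On a cold page cache every block file's inode has to be read from disk, which is why this can take a long time on a busy node.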
Re: Optimal Filesystem (and Settings) for HDFS
I always disable atime and its ilk. The deadline scheduler helps with the (non-XFS-hanging) du DataNode timeout issues, but not much; ultimately that is a caching failure in the kernel, due to the Hadoop IO patterns.

Anshu, any luck getting off the PAE kernels? Is this the XFS lockup, or just the du taking too long? At one point, Sagar and I talked about replacing the du call with a script that used df as a rapid and close proxy, to get rid of the du calls; the block report was another problem.

On Tue, May 19, 2009 at 3:59 PM, Anshuman Sachdeva asachd...@attributor.com wrote:

Hi Brian, thanks for the mail. I have an issue when we use xfs. hadoop runs du -sk after every 10 min on my cluster and some times it goes in the loop and machine hangs. [...]

--
Alpha Chapters of my book on Hadoop are available: http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com - a community for Hadoop Professionals
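Two of the knobs above, sketched under assumptions (the sda device name and the data path are placeholders): switching a data disk to the deadline elevator, and the df-based stand-in for du that gets discussed; df reads a per-filesystem summary counter instead of stat-ing every block file, so it returns instantly and cannot hang the way a huge recursive du can.

```shell
# Put a data disk on the deadline I/O scheduler (sda is an assumption;
# needs root, so guarded to be a no-op elsewhere).
SCHED=/sys/block/sda/queue/scheduler
if [ -w "$SCHED" ]; then
  echo deadline > "$SCHED"
fi

# df-based proxy for "du -sk <dir>": used KB on the filesystem holding
# the DataNode directory. Accurate only if the disk is dedicated to HDFS,
# since df counts everything on the filesystem, not just block files.
DATADIR=${DATADIR:-.}
df -Pk "$DATADIR" | awk 'NR == 2 {print $3}'
```

The dedicated-disk caveat is the real design trade-off here: df is fast because it answers a different question, and the two answers only coincide when HDFS owns the whole filesystem.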
Optimal Filesystem (and Settings) for HDFS
We are currently rebuilding our cluster - does anybody have recommendations on the underlying file system? Just standard ext3? I could imagine that the block size could be larger than its default... Thx for any tips, Bob
Re: Optimal Filesystem (and Settings) for HDFS
I believe Yahoo! uses ext3, though I know other people have said that XFS has performed better in various benchmarks. We use ext3, though we haven't done any benchmarks to prove its worth.

This question has come up a lot, so I think it'd be worth doing a benchmark and writing up the results. I haven't been able to find a detailed analysis or benchmark write-up comparing various filesystems, unfortunately.

Hope this helps, Alex

On Mon, May 18, 2009 at 8:54 AM, Bob Schulze b.schu...@ecircle.com wrote:

We are currently rebuilding our cluster - has anybody recommendations on the underlaying file system? [...]
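For the benchmark write-up suggested above, Hadoop ships a basic throughput benchmark, TestDFSIO, which is one reasonable starting point for comparing filesystems under HDFS. A sketch of a typical invocation; the jar name varies by release and the file counts and sizes here are arbitrary:

```shell
# Write then read back 10 files of 1000 MB each; aggregate throughput is
# appended to TestDFSIO_results.log in the working directory.
hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar hadoop-*-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
```

As Steve notes elsewhere in the thread, raw IO numbers like these may not predict performance for seek-heavy or CPU-bound jobs.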
Re: Optimal Filesystem (and Settings) for HDFS
Do not forget 'tune2fs -m 2'. By default this value gets set at 5%. With 1 TB disks we got 33 GB more usable space. Talk about instant savings!

On Mon, May 18, 2009 at 1:31 PM, Alex Loddengaard a...@cloudera.com wrote:

I believe Yahoo! uses ext3, though I know other people have said that XFS has performed better in various benchmarks. [...]
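The arithmetic behind that saving, as a sketch: tune2fs -m sets the percentage of blocks reserved for root on ext2/3. The drive size below is illustrative, and the device name in the commented command is a placeholder; Edward's 33 GB presumably reflects his drives' exact block counts, but the mechanics are the same.

```shell
# Dropping the reserved-block percentage from the default 5% to 2% on a
# ~1 TB (here taken as 1000 GB) drive frees roughly 3% of it.
DISK_GB=1000
RECLAIMED_GB=$(( DISK_GB * (5 - 2) / 100 ))
echo "reclaimed: ${RECLAIMED_GB} GB"   # prints: reclaimed: 30 GB

# The live change itself (placeholder device, needs root):
# tune2fs -m 2 /dev/sdb1
```

The reservation exists so root can still write (and fsck can work) on a full filesystem; on a dedicated HDFS data disk that margin matters much less, which is why shrinking it is considered safe here.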
Re: Optimal Filesystem (and Settings) for HDFS
On 5/18/09 11:33 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

Do not forget 'tune2fs -m 2'. By default this value gets set at 5%. With 1 TB disks we got 33 GB more usable space. Talk about instant savings!

Yup. Although, I think we're using -m 1.

I believe Yahoo! uses ext3,

Yup. They won't buy me enough battery-backed RAM to use a memory file system. ;)