Re: RAID 5 performance issue.
Andrew Clayton wrote:

  On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:

    Can you start a 'vmstat 1' in one window, then start whatever you do to
    get crappy performance. That would be interesting to see.

  In trying to find something simple that can show the problem I'm seeing,
  I think I may have found the culprit. Just testing on my machine at home,
  I made this simple program.

  /* fslattest.c */

  #define _GNU_SOURCE

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/stat.h>
  #include <sys/types.h>
  #include <fcntl.h>
  #include <string.h>

  int main(int argc, char *argv[])
  {
          char file[255];

          if (argc < 2) {
                  printf("Usage: fslattest file\n");
                  exit(1);
          }

          strncpy(file, argv[1], 254);
          printf("Opening %s\n", file);

          while (1) {
                  int testfd = open(file,
                          O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);

                  close(testfd);
                  unlink(file);
                  sleep(1);
          }

          exit(0);
  }

  If I run this program under strace in my home directory (XFS filesystem
  on a (new) disk, no raid involved, all to itself), like

  $ strace -T -e open ./fslattest test

  it doesn't look too bad.

  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.005043
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000212
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.016844

  If I then start up a dd in the same place

  $ dd if=/dev/zero of=bigfile bs=1M count=500

  then I see the problem I'm seeing at work.

  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 2.000348
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.594441
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 2.224636
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.074615

  Doing the same on my other disk, which is Ext3 and contains the root fs,
  it doesn't ever stutter.

  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.015423
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.92
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.93
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.88
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000103
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.96
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.94
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000114
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.91
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000274
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000107

  Somewhere in there was the dd, but you can't tell.

  I've found if I mount the XFS filesystem with nobarrier, the latency is
  reduced to about 0.5 seconds with occasional spikes > 1 second.

  When doing this on the raid array:

  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.009164
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.71
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.002667

  dd kicks in

  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 11.580238
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 3.94
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.63
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 4.297978

  dd finishes

  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.000199
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.013413
  open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.025134

  I guess I should take this to the XFS folks.
Try mounting the filesystem noatime and see if that's part of the problem.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
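[For reference, a quick way to try Bill's suggestion without recreating anything is to remount with the option in place; this is only a sketch, using the /home array from earlier in the thread:]

  # remount the existing XFS array with noatime for the current boot
  mount -o remount,noatime /home
  # and make it persistent by adding noatime to the /dev/md0 line in /etc/fstab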
Re: RAID 5 performance issue.
On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:

  Andrew Clayton wrote:

  [snip - fslattest.c test program and strace output, quoted in full earlier in the thread]
  [snip - strace output on the raid array, quoted in full earlier in the thread]

  Try mounting the filesystem noatime and see if that's part of the problem.

Yeah, it's mounted noatime.

Looks like I tracked this down to an XFS regression.

http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2

Cheers,

Andrew
Re: RAID 5 performance issue.
On Thu, 11 Oct 2007, Andrew Clayton wrote:

  On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:

  [snip - fslattest.c test program and strace output, quoted in full earlier in the thread]
  [snip - strace output on the raid array and the noatime suggestion, quoted above]

  Yeah, it's mounted noatime.

  Looks like I tracked this down to an XFS regression.

  http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2

  Cheers, Andrew

Nice! Thanks for reporting the final result, 1-2 weeks of debugging/discussion, nice you found it.

Justin.
Re: RAID 5 performance issue.
On Sun, 7 Oct 2007, Dean S. Messing wrote: Justin Piszcz wrote: On Fri, 5 Oct 2007, Dean S. Messing wrote: Brendan Conoboy wrote: snip Is the onboard SATA controller real SATA or just an ATA-SATA converter? If the latter, you're going to have trouble getting faster performance than any one disk can give you at a time. The output of 'lspci' should tell you if the onboard SATA controller is on its own bus or sharing space with some other device. Pasting the output here would be useful. snip N00bee question: How does one tell if a machine's disk controller is an ATA-SATA converter? The output of `lspci|fgrep -i sata' is: 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\ (rev 09) suggests a real SATA. These references to ATA in dmesg, however, make me wonder. ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133 ata1.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32) ata1.00: configured for UDMA/133 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133 ata2.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32) ata2.00: configured for UDMA/133 ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133 ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32) ata3.00: configured for UDMA/133 Dean - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html His drives are either really old and do not support NCQ or he is not using AHCI in the BIOS. Sorry, Justin, if I wasn't clear. I was asking the N00bee question about _my_own_ machine. The output of lspci (on my machine) seems to indicate I have a real STAT controller on the Motherboard, but the contents of dmesg, with the references to ATA-7 and UDMA/133, made me wonder if I had just an ATA-SATA converter. Hence my question: how does one tell definitively if one has a real SATA controller on the Mother Board? The output looks like a real (AHCI-capable) SATA controller and your drives are using NCQ/AHCI. Output from one of my machines: [ 23.621462] ata1: SATA max UDMA/133 cmd 0xf8812100 ctl 0x bmdma 0x irq 219 [ 24.078390] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 24.549806] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) As far as why it shows UDMA/133 in the kernel output I am sure there is a reason :) I know in the older SATA drives there was a bridge chip that was used to convert the drive from IDE-SATA maybe it is from those legacy days, not sure. With the newer NCQ/'native' SATA drives, the bridge chip should no longer exist. Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
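[A rough userspace check of whether the controller is running in AHCI mode and whether NCQ is actually active, along the lines Justin describes; the device name below is only an example:]

  # is an AHCI-mode controller present and bound to a driver?
  lspci | grep -i ahci
  dmesg | grep -i -e ahci -e ncq
  # a queue_depth of 1 usually means NCQ is not in use for that drive
  cat /sys/block/sda/device/queue_depth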
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:

  Can you start a 'vmstat 1' in one window, then start whatever you do to
  get crappy performance. That would be interesting to see.

In trying to find something simple that can show the problem I'm seeing,
I think I may have found the culprit. Just testing on my machine at home,
I made this simple program.

[snip - fslattest.c test program and strace output, quoted in full earlier in the thread]

I guess I should take this to the XFS folks.

  John

Cheers,

Andrew
Re: RAID 5 performance issue.
Justin Piszcz wrote: On Fri, 5 Oct 2007, Dean S. Messing wrote: Brendan Conoboy wrote: snip Is the onboard SATA controller real SATA or just an ATA-SATA converter? If the latter, you're going to have trouble getting faster performance than any one disk can give you at a time. The output of 'lspci' should tell you if the onboard SATA controller is on its own bus or sharing space with some other device. Pasting the output here would be useful. snip N00bee question: How does one tell if a machine's disk controller is an ATA-SATA converter? The output of `lspci|fgrep -i sata' is: 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\ (rev 09) suggests a real SATA. These references to ATA in dmesg, however, make me wonder. ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133 ata1.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32) ata1.00: configured for UDMA/133 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133 ata2.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32) ata2.00: configured for UDMA/133 ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133 ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32) ata3.00: configured for UDMA/133 Dean - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html His drives are either really old and do not support NCQ or he is not using AHCI in the BIOS. Sorry, Justin, if I wasn't clear. I was asking the N00bee question about _my_own_ machine. The output of lspci (on my machine) seems to indicate I have a real STAT controller on the Motherboard, but the contents of dmesg, with the references to ATA-7 and UDMA/133, made me wonder if I had just an ATA-SATA converter. Hence my question: how does one tell definitively if one has a real SATA controller on the Mother Board? - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007, Dean S. Messing wrote: Brendan Conoboy wrote: snip Is the onboard SATA controller real SATA or just an ATA-SATA converter? If the latter, you're going to have trouble getting faster performance than any one disk can give you at a time. The output of 'lspci' should tell you if the onboard SATA controller is on its own bus or sharing space with some other device. Pasting the output here would be useful. snip N00bee question: How does one tell if a machine's disk controller is an ATA-SATA converter? The output of `lspci|fgrep -i sata' is: 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\ (rev 09) suggests a real SATA. These references to ATA in dmesg, however, make me wonder. ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133 ata1.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32) ata1.00: configured for UDMA/133 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133 ata2.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32) ata2.00: configured for UDMA/133 ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133 ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32) ata3.00: configured for UDMA/133 Dean - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html His drives are either really old and do not support NCQ or he is not using AHCI in the BIOS. Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:

  Also if it is software raid, when you make the XFS filesystem on it, it
  sets up a proper (and tuned) sunit/swidth, so why would you want to
  change that?

Oh I didn't, the sunit and swidth were set automatically. Do they look
sane? From reading the XFS section of the mount man page, I'm not entirely
sure what they specify and certainly wouldn't have any idea what to set
them to.

  Justin.

Cheers,

Andrew
Re: RAID 5 performance issue.
On Wed, 3 Oct 2007, Andrew Clayton wrote:

  [snip - sunit/swidth question, quoted in full above]

As long as you ran mkfs.xfs /dev/md0 it should have optimized the
filesystem according to the disks beneath it.

Justin.
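[As a sanity check of the automatically chosen values: the sunit/swidth mount options are reported in 512-byte sectors and should follow from the md chunk size and the number of data disks. A small sketch, assuming the 3-disk RAID5 with a 256 KiB chunk discussed in this thread:]

  chunk_kib=256                       # md chunk (stripe unit) size
  data_disks=2                        # 3-disk RAID5 = 2 data + 1 parity
  sunit=$((chunk_kib * 1024 / 512))   # 512 sectors
  swidth=$((sunit * data_disks))      # 1024 sectors
  echo "sunit=$sunit swidth=$swidth"  # matches sunit=512,swidth=1024 in /proc/mounts

[mkfs.xfs reports the same thing in 4 KiB filesystem blocks, i.e. sunit=64, swidth=128.]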
Re: RAID 5 performance issue.
On Sat, 6 Oct 2007, Justin Piszcz wrote:

  [snip - sunit/swidth discussion, quoted in full above]

Also can you provide the smartctl -a /dev/sda /dev/sdb etc for each disk?
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007, Andrew Clayton wrote: On Thu, 04 Oct 2007 12:46:05 -0400, Steve Cousins wrote: Andrew Clayton wrote: On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote: What type (make/model) of the drives? The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100 A couple of things: 1. I thought you had SATA drives 2. ATA-6 would be UDMA/133 The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2 versions do have NCQ. If you do have SATA drives, are they SATA-1 or SATA-2? Not sure, I suspect SATA 1 seeing as we've had them nearly 3 years. Some bits from dmesg ata1: SATA max UDMA/100 cmd 0xc2aa4880 ctl 0xc2aa488a bmdma 0xff ffc2aa4800 irq 19 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) ata1.00: ATA-6: HDS722525VLSA80, V36OA63A, max UDMA/100 ata1.00: 488397168 sectors, multi 16: LBA48 ata1.00: configured for UDMA/100 Steve Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Looks like SATA1 (non-ncq) to me. Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007, Andrew Clayton wrote: On Fri, 5 Oct 2007 06:25:20 -0400 (EDT), Justin Piszcz wrote: So you have 3 SATA 1 disks: Yeah, 3 of them in the array, there is a fourth standalone disk which contains the root fs from which the system boots.. http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D Do you compile your own kernel or use the distribution's kernel? Compile my own. What does cat /proc/interrupts say? This is important to see if your disk controller(s) are sharing IRQs with other devices. $ cat /proc/interrupts CPU0 CPU1 0: 132052 249369403 IO-APIC-edge timer 1:202 52 IO-APIC-edge i8042 8: 0 1 IO-APIC-edge rtc 9: 0 0 IO-APIC-fasteoi acpi 14: 11483172 IO-APIC-edge ide0 16: 180411954798850 IO-APIC-fasteoi sata_sil24 18: 86068930 27 IO-APIC-fasteoi eth0 19: 161276622138177 IO-APIC-fasteoi sata_sil, ohci_hcd:usb1, ohci_hcd:usb2 NMI: 0 0 LOC: 249368914 249368949 ERR: 0 sata_sil24 contains the raid array, sata_sil the root fs disk Also note with only 3 disks in a RAID-5 you will not get stellar performance, but regardless, it should not be 'hanging' as you have mentioned. Just out of sheer curiosity have you tried the AS scheduler? CFQ is supposed to be better for multi-user performance but I would be highly interested if you used the AS scheduler-- would that change the 'hanging' problem you are noticing? I would give it a shot, also try the deadline and noop. I did try them briefly. I'll have another go. You probably want to keep the nr_requessts to 128, the stripe_cache_size to 8mb. The stripe size of 256k is probably optimal. OK. Did you also re-mount the XFS partition with the default mount options (or just take the sunit and swidth)? The /etc/fstab entry for the raid array is currently: /dev/md0/home xfs noatime,logbufs=8 1 2 and mount says /dev/md0 on /home type xfs (rw,noatime,logbufs=8) and /proc/mounts /dev/md0 /home xfs rw,noatime,logbufs=8,sunit=512,swidth=1024 0 0 So I guess mount or the kernel is setting the sunit and swidth values. Justin. Andrew The mount options are from when the filesystem was made for sunit/swidth I believe. -N Causes the file system parameters to be printed out without really creating the file system. You should be able to run mkfs.xfs -N /dev/md0 to get that information. /dev/md3/r1 xfs noatime,nodiratime,logbufs=8,logbsize=262144 0 1 Try using the following options and the AS scheduler and let me know if you still notice any 'hangs' Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
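[For anyone following along, the scheduler and cache settings being discussed are all runtime-tunable through sysfs. A sketch only, with example device names and the values suggested above:]

  # per-device I/O scheduler (anticipatory/deadline/noop/cfq)
  echo anticipatory > /sys/block/sdb/queue/scheduler
  cat /sys/block/sdb/queue/scheduler      # the active one is shown in brackets

  # request queue depth and md stripe cache
  echo 128  > /sys/block/sdb/queue/nr_requests
  echo 8192 > /sys/block/md0/md/stripe_cache_size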
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007, Andrew Clayton wrote: On Fri, 5 Oct 2007 07:08:51 -0400 (EDT), Justin Piszcz wrote: The mount options are from when the filesystem was made for sunit/swidth I believe. -N Causes the file system parameters to be printed out without really creating the file system. You should be able to run mkfs.xfs -N /dev/md0 to get that information. Can't do it while it's mounted. would xfs_info show the same stuff? /dev/md3/r1 xfs noatime,nodiratime,logbufs=8,logbsize=262144 0 1 Try using the following options and the AS scheduler and let me know if you still notice any 'hangs' OK, I've remounted (mount -o remount) with those options. I've set the strip_cache_size to 8192 I've set the nr_requests back to 128 I've set the schedulers to anticipatory. Unfortunately problem remains. I'll try the noop scheduler as I don't think I ever tried that one. Justin. Andrew How are you measuring the problem? How can it be reproduced? Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007 13:53:12 +0100, Andrew Clayton wrote:

  Unfortunately problem remains. I'll try the noop scheduler as I don't
  think I ever tried that one.

Didn't help either, oh well.

If I hit the disk in my workstation with a big dd then in iostat I see it
maxing out at about 40MB/sec with 1 second await. The server seems to hit
this with a much lower rate, 10MB/sec maybe.

I think I'm going to also move the raid disks back onto the onboard
controller (as Goswin von Brederlow said it should have more bandwidth
anyway) as the PCI card doesn't seem to have helped and I'm seeing soft
SATA resets coming from it, e.g.

ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata6.00: irq_stat 0x00020002, device error via D2H FIS
ata6.00: cmd 35/00:00:07:4a:d9/00:04:02:00:00/e0 tag 0 cdb 0x0 data 524288 out
         res 51/84:00:06:4e:d9/00:00:02:00:00/e2 Emask 0x10 (ATA bus error)
ata6: soft resetting port
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: configured for UDMA/100
ata6: EH complete

Just to confirm, I was seeing the problem with the on board controller and
thought moving the disks to the PCI card might help (at £35 it was worth a
shot!)

Cheers,

Andrew
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007 07:08:51 -0400 (EDT), Justin Piszcz wrote: The mount options are from when the filesystem was made for sunit/swidth I believe. -N Causes the file system parameters to be printed out without really creating the file system. You should be able to run mkfs.xfs -N /dev/md0 to get that information. Can't do it while it's mounted. would xfs_info show the same stuff? /dev/md3/r1 xfs noatime,nodiratime,logbufs=8,logbsize=262144 0 1 Try using the following options and the AS scheduler and let me know if you still notice any 'hangs' OK, I've remounted (mount -o remount) with those options. I've set the strip_cache_size to 8192 I've set the nr_requests back to 128 I've set the schedulers to anticipatory. Unfortunately problem remains. I'll try the noop scheduler as I don't think I ever tried that one. Justin. Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:

  Yikes, yeah I would get them off the PCI card, what kind of motherboard
  is it? If you don't have a PCI-e based board it probably won't help THAT
  much but it still should be better than placing 3 drives on a PCI card.

It's a Tyan Thunder K8S Pro S2882. No PCIe.

Though given that simply patching the kernel (on the RAID fs) slows to a
crawl even when there's no other disk activity, which I'm fairly sure it
didn't used to, these app stalls are certainly new. The only trouble is I
don't have any iostat profile from say a year ago when everything was OK.
So I can't be 100% sure the current thing of spikes of iowait and await
etc. didn't actually always happen and it's actually something else that's
wrong.

  Justin.

Andrew
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:

  Yikes, yeah I would get them off the PCI card, what kind of motherboard
  is it? If you don't have a PCI-e based board it probably won't help THAT
  much but it still should be better than placing 3 drives on a PCI card.

Moved the drives back onto the on board controller. While I had the
machine down I ran memtest86+ for about 5 mins, no errors.

I also got the output of mkfs.xfs -f -N /dev/md0

meta-data=/dev/md0             isize=256    agcount=16, agsize=7631168 blks
         =                     sectsz=4096  attr=0
data     =                     bsize=4096   blocks=122097920, imaxpct=25
         =                     sunit=64     swidth=128 blks, unwritten=1
naming   =version 2            bsize=4096
log      =internal log         bsize=4096   blocks=32768, version=2
         =                     sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                 extsz=524288 blocks=0, rtextents=0

  Justin.

Thanks for your help by the way.

Andrew
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007, Andrew Clayton wrote: On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote: Yikes, yeah I would get them off the PCI card, what kind of motherboard is it? If you don't have a PCI-e based board it probably won't help THAT much but it still should be better than placing 3 drives on a PCI card. Moved the drives back onto the on board controller. While I had the machine down I ran memtest86+ for about 5 mins, no errors. I also got the output of mkfs.xfs -f -N /dev/md0 meta-data=/dev/md0 isize=256agcount=16, agsize=7631168 blks = sectsz=4096 attr=0 data = bsize=4096 blocks=122097920, imaxpct=25 = sunit=64 swidth=128 blks, unwritten=1 naming =version 2 bsize=4096 log =internal log bsize=4096 blocks=32768, version=2 = sectsz=4096 sunit=1 blks, lazy-count=0 realtime =none extsz=524288 blocks=0, rtextents=0 Justin. Thanks for your help by the way. Andrew Hm, unfortunately at this point I think I am out of ideas you may need to ask the XFS/linux-raid developers how to run blktrace during those operations to figure out what is going on. BTW: Last thing I can think of, did you make any changes to PREEMPTION in the kernel, or do you disable it (SERVER)? Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
Have you had a look at the smartctl -a outputs of all the drives? Possibly one drive is being slow to respond due to seek errors etc., but I would perhaps expect to be seeing this in the log. If you have a full backup and a spare drive, I would probably rotate it through the array.

Regards,

Richard
Re: RAID 5 performance issue.
On Sat, 6 Oct 2007, Richard Scobie wrote:

  [snip - smartctl suggestion, quoted in full above]

Forgot about that, yeah post the smartctl -a output for each drive please.
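[A simple way to gather that in one go; the device names are an assumption based on the three array members plus the root disk mentioned earlier, adjust as needed:]

  for d in /dev/sd[abcd]; do
      echo "=== $d ==="
      smartctl -a "$d"
  done > smartctl-all.txt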
Re: RAID 5 performance issue.
Andrew == Andrew Clayton [EMAIL PROTECTED] writes: Andrew On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote: Also, did performance just go to crap one day or was it gradual? Andrew IIRC I just noticed one day that firefox and vim was Andrew stalling. That was back in February/March I think. At the time Andrew the server was running a 2.6.18 kernel, since then I've tried Andrew a few kernels in between that and currently 2.6.23-rc9 Andrew Something seems to be periodically causing a lot of activity Andrew that max's out the stripe_cache for a few seconds (when I was Andrew trying to look with blktrace, it seemed pdflush was doing a Andrew lot of activity during this time). Andrew What I had noticed just recently was when I was the only one Andrew doing IO on the server (no NFS running and I was logged in at Andrew the console) even just patching the kernel was crawling to a Andrew halt. How much memory does this system have? Have you checked the output of /proc/mtrr at all? There' have been reports of systems with a bad BIOS that gets the memory map wrong, causing access to memory to slow down drastically. So if you have 2gb of RAM, try booting with mem=1900m or something like that and seeing if things are better for you. Make sure your BIOS is upto the latest level as well. John - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007 12:16:07 -0400 (EDT), Justin Piszcz wrote: Hm, unfortunately at this point I think I am out of ideas you may need to ask the XFS/linux-raid developers how to run blktrace during those operations to figure out what is going on. No problem, cheers. BTW: Last thing I can think of, did you make any changes to PREEMPTION in the kernel, or do you disable it (SERVER)? I normally have it disabled, but did try with voluntary preemption, but with no effect. Justin. Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Fri, 5 Oct 2007 15:02:22 -0400, John Stoffel wrote: How much memory does this system have? Have you checked the output of 2GB /proc/mtrr at all? There' have been reports of systems with a bad $ cat /proc/mtrr reg00: base=0x ( 0MB), size=2048MB: write-back, count=1 BIOS that gets the memory map wrong, causing access to memory to slow down drastically. BIOS-provided physical RAM map: BIOS-e820: - 0009fc00 (usable) BIOS-e820: 0009fc00 - 000a (reserved) BIOS-e820: 000e - 0010 (reserved) BIOS-e820: 0010 - 7fff (usable) BIOS-e820: 7fff - 7000 (ACPI data) BIOS-e820: 7000 - 8000 (ACPI NVS) BIOS-e820: ff78 - 0001 (reserved) full dmesg (from 2.6.21-rc8-git2) at http://digital-domain.net/kernel/sw-raid5-issue/dmesg So if you have 2gb of RAM, try booting with mem=1900m or something Worth a shot. like that and seeing if things are better for you. Make sure your BIOS is upto the latest level as well. Hmm, I'll see whats involved in that. John Cheers, Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
Andrew Clayton wrote: If anyone has any idea's I'm all ears. Hi Andrew, Are you sure your drives are healthy? Try benchmarking each drive individually and see if there is a dramatic performance difference between any of them. One failing drive can slow down an entire array. Only after you have determined that your drives are healthy when accessed individually are combined results particularly meaningful. For a generic SATA 1 drive you should expect a sustained raw read or write in excess of 45 MB/s. Check both read and write (this will destroy data) and make sure your cache is clear prior to the read test and after the write test. If each drive is working at a reasonable rate individually, you're ready to move on. The next question is: What happens when you access more than one device at the same time? You should either get nearly full combined performance, max out CPU, or get throttled by bus bandwidth (An actual kernel bug could also come into play here, but I tend to doubt it). Is the onboard SATA controller real SATA or just an ATA-SATA converter? If the latter, you're going to have trouble getting faster performance than any one disk can give you at a time. The output of 'lspci' should tell you if the onboard SATA controller is on its own bus or sharing space with some other device. Pasting the output here would be useful. Assuming you get good performance out of all 3 drives at the same time, it's time to create a RAID 5 md device with the three, make sure your parity is done building, then benchmark that. It's going to be slower to write and a bit slower to read (especially if your CPU is maxed out), but that is normal. Assuming you get good performance out of your md device, it's time to put your filesystem on the md device and benchmark that. If you use ext3, remember to set the stride parameter per the raid howto. I am unfamiliar with other fs/md interactions, so be sure to check. If you're actually maxing out your bus bandwidth and the onboard sata controller is on a different bus than the pci sata controller, try balancing the drives between the two to get a larger combined pipe. Good luck, -- Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
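[A minimal per-drive read test along the lines Brendan describes might look like the sketch below; it is read-only, but drop the page cache first so the numbers are meaningful, and the device name is only an example:]

  sync
  echo 3 > /proc/sys/vm/drop_caches     # clear cached data (2.6.16+ kernels)
  dd if=/dev/sdb of=/dev/null bs=1M count=1024
  # repeat for each member drive, then for two or three drives at once,
  # to see whether any single disk, the bus, or the controller is the limit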
Re: RAID 5 performance issue.
Andrew On Fri, 5 Oct 2007 15:02:22 -0400, John Stoffel wrote: How much memory does this system have? Have you checked the output of Andrew 2GB /proc/mtrr at all? There' have been reports of systems with a bad Andrew $ cat /proc/mtrr Andrew reg00: base=0x ( 0MB), size=2048MB: write-back, count=1 That looks to be good, all the memory is there all in the same region. Oh well... it was a thought. BIOS that gets the memory map wrong, causing access to memory to slow down drastically. Andrew BIOS-provided physical RAM map: Andrew BIOS-e820: - 0009fc00 (usable) Andrew BIOS-e820: 0009fc00 - 000a (reserved) Andrew BIOS-e820: 000e - 0010 (reserved) Andrew BIOS-e820: 0010 - 7fff (usable) Andrew BIOS-e820: 7fff - 7000 (ACPI data) Andrew BIOS-e820: 7000 - 8000 (ACPI NVS) Andrew BIOS-e820: ff78 - 0001 (reserved) I dunno about this part. Andrew full dmesg (from 2.6.21-rc8-git2) at Andrew http://digital-domain.net/kernel/sw-raid5-issue/dmesg So if you have 2gb of RAM, try booting with mem=1900m or something Andrew Worth a shot. It might make a difference, might not. Do you have any kernel debugging options turned on? That might also be an issue. Check your .config, there are a couple of options which drastically slow down the system. like that and seeing if things are better for you. Make sure your BIOS is upto the latest level as well. Andrew Hmm, I'll see whats involved in that. At this point, I don't suspect the BIOS any more. Can you start a 'vmstat 1' in one window, then start whatever you do to get crappy performance. That would be interesting to see. John - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
Brendan Conoboy wrote: snip Is the onboard SATA controller real SATA or just an ATA-SATA converter? If the latter, you're going to have trouble getting faster performance than any one disk can give you at a time. The output of 'lspci' should tell you if the onboard SATA controller is on its own bus or sharing space with some other device. Pasting the output here would be useful. snip N00bee question: How does one tell if a machine's disk controller is an ATA-SATA converter? The output of `lspci|fgrep -i sata' is: 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\ (rev 09) suggests a real SATA. These references to ATA in dmesg, however, make me wonder. ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133 ata1.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32) ata1.00: configured for UDMA/133 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133 ata2.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32) ata2.00: configured for UDMA/133 ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133 ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32) ata3.00: configured for UDMA/133 Dean - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:

  Not bad, but not that good, either. Try running xfs_fsr from a nightly
  cronjob. By default, it will defrag mounted xfs filesystems for up to 2
  hours. Typically this is enough to keep fragmentation well below 1%.

I ran it last night on the raid array, it got the fragmentation down to
1.07%. Unfortunately that doesn't seem to have helped.

  -Dave

Cheers,

Andrew
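[For completeness, the sort of cron entry David is describing; the binary path and the two-hour limit (-t takes seconds) are assumptions that may differ per distribution:]

  # /etc/cron.d/xfs_fsr - defragment mounted XFS filesystems nightly
  30 2 * * * root /usr/sbin/xfs_fsr -t 7200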
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007, Justin Piszcz wrote: Is NCQ enabled on the drives? On Thu, 4 Oct 2007, Andrew Clayton wrote: On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote: Not bad, but not that good, either. Try running xfs_fsr into a nightly cronjob. By default, it will defrag mounted xfs filesystems for up to 2 hours. Typically this is enough to keep fragmentation well below 1%. I ran it last night on the raid array, it got the fragmentation down to 1.07%. Unfortunately that doesn't seemed to have helped. -Dave Cheers, Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Also, did performance just go to crap one day or was it gradual? Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:

  Is NCQ enabled on the drives?

I don't think the drives are capable of that. I don't see any mention of
NCQ in dmesg.

Andrew
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007, Andrew Clayton wrote: On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote: Is NCQ enabled on the drives? I don't think the drives are capable of that. I don't seen any mention of NCQ in dmesg. Andrew What type (make/model) of the drives? True, the controller may not be able to do it either. What types of disks/controllers again? Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007, Andrew Clayton wrote: On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote: Is NCQ enabled on the drives? I don't think the drives are capable of that. I don't seen any mention of NCQ in dmesg. Andrew BTW You may not see 'NCQ' in the kernel messages unless you enable AHCI. Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote: Also, did performance just go to crap one day or was it gradual? IIRC I just noticed one day that firefox and vim was stalling. That was back in February/March I think. At the time the server was running a 2.6.18 kernel, since then I've tried a few kernels in between that and currently 2.6.23-rc9 Something seems to be periodically causing a lot of activity that max's out the stripe_cache for a few seconds (when I was trying to look with blktrace, it seemed pdflush was doing a lot of activity during this time). What I had noticed just recently was when I was the only one doing IO on the server (no NFS running and I was logged in at the console) even just patching the kernel was crawling to a halt. Justin. Cheers, Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote: What type (make/model) of the drives? The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100 True, the controller may not be able to do it either. What types of disks/controllers again? The RAID disks are currently connected to a Silicon Image PCI card are configured as a software RAID 5 03:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02) Subsystem: Silicon Image, Inc. Unknown device 7124 Flags: bus master, stepping, 66MHz, medium devsel, latency 64, IRQ 16 Memory at feafec00 (64-bit, non-prefetchable) [size=128] Memory at feaf (64-bit, non-prefetchable) [size=32K] I/O ports at bc00 [size=16] Expansion ROM at fea0 [disabled] [size=512K] Capabilities: [64] Power Management version 2 Capabilities: [40] PCI-X non-bridge device Capabilities: [54] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- The problem originated when the disks where connected to the on board Silicon Image 3114 controller. Justin. Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007, Andrew Clayton wrote: On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote: What type (make/model) of the drives? The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100 True, the controller may not be able to do it either. What types of disks/controllers again? The RAID disks are currently connected to a Silicon Image PCI card are configured as a software RAID 5 03:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02) Subsystem: Silicon Image, Inc. Unknown device 7124 Flags: bus master, stepping, 66MHz, medium devsel, latency 64, IRQ 16 Memory at feafec00 (64-bit, non-prefetchable) [size=128] Memory at feaf (64-bit, non-prefetchable) [size=32K] I/O ports at bc00 [size=16] Expansion ROM at fea0 [disabled] [size=512K] Capabilities: [64] Power Management version 2 Capabilities: [40] PCI-X non-bridge device Capabilities: [54] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- The problem originated when the disks where connected to the on board Silicon Image 3114 controller. Justin. Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html 7K250 http://www.itreviews.co.uk/hardware/h912.htm http://techreport.com/articles.x/8362 The T7K250 also supports Native Command Queuing (NCQ). You need to enable AHCI in order to reap the benefits though. Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
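[Independent of whether the controller is in AHCI mode, the drive itself advertises NCQ in its identify data, so something like this shows whether the disk could queue at all; device name is an example:]

  hdparm -I /dev/sdb | grep -i -e queue -e ncq
  # look for "Native Command Queueing (NCQ)" and the reported queue depth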
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007, Andrew Clayton wrote:

  [snip - description of when the stalls started, quoted in full above]

Besides the NCQ issue your problem is a bit perplexing. Just out of
curiosity, have you run memtest86 for at least one pass to make sure there
were no problems with the memory?

Do you have a script showing all of the parameters that you use to
optimize the array?

Also mdadm -D /dev/md0 output please?

What distribution are you running? (not that it should matter, but just
curious)

Justin.
Re: RAID 5 performance issue.
Steve Cousins wrote:

  A couple of things: 1. I thought you had SATA drives 2. ATA-6 would be
  UDMA/133

Number 2 is not correct. Sorry about that.

Steve
Re: RAID 5 performance issue.
Andrew Clayton wrote: On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote: What type (make/model) of the drives? The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100 A couple of things: 1. I thought you had SATA drives 2. ATA-6 would be UDMA/133 The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2 versions do have NCQ. If you do have SATA drives, are they SATA-1 or SATA-2? Steve - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007 12:20:25 -0400 (EDT), Justin Piszcz wrote: On Thu, 4 Oct 2007, Andrew Clayton wrote: On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote: Also, did performance just go to crap one day or was it gradual? IIRC I just noticed one day that firefox and vim were stalling. That was back in February/March I think. At the time the server was running a 2.6.18 kernel; since then I've tried a few kernels in between that and currently 2.6.23-rc9. Something seems to be periodically causing a lot of activity that maxes out the stripe_cache for a few seconds (when I was trying to look with blktrace, it seemed pdflush was doing a lot of activity during this time). What I had noticed just recently was that when I was the only one doing IO on the server (no NFS running and I was logged in at the console), even just patching the kernel was crawling to a halt. Justin. Cheers, Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Besides the NCQ issue, your problem is a bit perplexing. Just out of curiosity, have you run memtest86 for at least one pass to make sure there are no problems with the memory? No I haven't. Do you have a script showing all of the parameters that you use to optimize the array? No script. Nothing that I change really seems to make any difference. Currently I have /sys/block/md0/md/stripe_cache_size set to 16384. It doesn't really seem to matter what I set it to, as stripe_cache_active will periodically reach that value and take a few seconds to come back down. /sys/block/sd[bcd]/queue/nr_requests is set to 512, and readahead is set to 8192 on sd[bcd]. But none of that really seems to make any difference. Also, mdadm -D /dev/md0 output please? http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D What distribution are you running? (not that it should matter, but just curious) Fedora Core 6 (though I'm fairly sure it was happening before upgrading from Fedora Core 5). The iostat output of the drives when the problem occurs looks like the same profile as when the backup is going onto the USB 1.1 hard drive: the IO wait goes up, the cpu % hits 100% and we see multi-second await times. Which is why I thought maybe the on-board controller was a bottleneck (like the USB 1.1 is really slow), and moved the disks onto the PCI card. But when I saw that even patching the kernel was going really slowly, I thought it can't really be the problem, as it didn't used to go that slow. It's a tricky one... Justin. Cheers, Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
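For reference, the tuning described above boils down to something like the following script (a sketch only; the device names and values are just the ones mentioned in this thread, and it needs to be run as root):

#!/bin/sh
# raise the md stripe cache (counted in entries, not bytes)
echo 16384 > /sys/block/md0/md/stripe_cache_size
# deeper per-disk request queues and larger readahead on the raid members
for d in sdb sdc sdd; do
    echo 512 > /sys/block/$d/queue/nr_requests
    blockdev --setra 8192 /dev/$d
done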
Re: RAID 5 performance issue.
On Thu, 04 Oct 2007 12:46:05 -0400, Steve Cousins wrote: Andrew Clayton wrote: On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote: What type (make/model) of the drives? The drives are 250GB Hitachi Deskstar 7K250 series ATA-6 UDMA/100 A couple of things: 1. I thought you had SATA drives 2. ATA-6 would be UDMA/133 The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2 versions do have NCQ. If you do have SATA drives, are they SATA-1 or SATA-2? Not sure, I suspect SATA-1 seeing as we've had them nearly 3 years. Some bits from dmesg ata1: SATA max UDMA/100 cmd 0xc2aa4880 ctl 0xc2aa488a bmdma 0xffffc2aa4800 irq 19 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) ata1.00: ATA-6: HDS722525VLSA80, V36OA63A, max UDMA/100 ata1.00: 488397168 sectors, multi 16: LBA48 ata1.00: configured for UDMA/100 Steve Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
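To pin down whether these are the SATA-1 or SATA-2 variants, and whether the drive itself advertises NCQ, something like this should show it (a sketch only; assumes a reasonably recent hdparm, and /dev/sdb is just an example device):

$ hdparm -I /dev/sdb | egrep -i 'sata|queue'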
Re: RAID 5 performance issue.
On Thu, 4 Oct 2007 12:19:20 -0400 (EDT), Justin Piszcz wrote: 7K250 http://www.itreviews.co.uk/hardware/h912.htm http://techreport.com/articles.x/8362 The T7K250 also supports Native Command Queuing (NCQ). You need to enable AHCI in order to reap the benefits though. Cheers, I'll take a look at that. Justin. Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
Have you checked fragmentation? xfs_db -c frag -f /dev/md3 What does this report? Justin. On Wed, 3 Oct 2007, Andrew Clayton wrote: Hi, Hardware: Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file system) is connected to the onboard Silicon Image 3114 controller. The other 3 (/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 card. I moved the 3 raid disks off the on-board controller onto the card the other day to see if that would help; it didn't. Software: Fedora Core 6, 2.6.23-rc9 kernel. Array/fs details: Filesystems are XFS.
Filesystem     Type    Size  Used Avail Use% Mounted on
/dev/sda2      xfs      20G  5.6G   14G  29% /
/dev/sda5      xfs     213G  3.6G  209G   2% /data
none           tmpfs  1008M     0 1008M   0% /dev/shm
/dev/md0       xfs     466G  237G  229G  51% /home
/dev/md0 is currently mounted with the following options: noatime,logbufs=8,sunit=512,swidth=1024 (sunit and swidth seem to be automatically set). xfs_info shows:
meta-data=/dev/md0               isize=256    agcount=16, agsize=7631168 blks
         =                       sectsz=4096  attr=1
data     =                       bsize=4096   blocks=122097920, imaxpct=25
         =                       sunit=64     swidth=128 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                   extsz=524288 blocks=0, rtextents=0
The array has a 256k chunk size using left-symmetric layout. /sys/block/md0/md/stripe_cache_size is currently at 4096 (upping this from 256 alleviates the problem at best). I also currently have /sys/block/sd[bcd]/queue/nr_requests set to 512 (doesn't seem to have made any difference). Also blockdev --setra 8192 /dev/sd[bcd] (also tried 16384 and 32768). IO scheduler is cfq for all devices. This machine acts as a file server for about 11 workstations. /home (the software RAID 5) is exported over NFS, whereby the clients mount their home directories (using autofs). I set it up about 3 years ago and it has been fine. However, earlier this year we started noticing application stalls, e.g. firefox would become unresponsive and the window would grey out (under Compiz); this typically lasts 2-4 seconds. During these stalls, I see the below iostat activity (taken at 2 second intervals on the file server). High iowait, high awaits. The stripe_cache_active maxes out and things kind of grind to a halt for a few seconds until stripe_cache_active starts shrinking.
avg-cpu:  %user  %nice %system %iowait %steal  %idle
           0.00   0.00    0.00    0.25   0.00  99.75

Device: rrqm/s wrqm/s    r/s     w/s    rkB/s   wkB/s avgrq-sz avgqu-sz   await svctm  %util
sda       0.00   0.00   0.00    5.47     0.00   40.80    14.91     0.05    9.73  7.18   3.93
sdb       0.00   0.00   1.49    1.49     5.97    9.95    10.67     0.06   18.50  9.00   2.69
sdc       0.00   0.00   0.00    2.99     0.00   15.92    10.67     0.01    4.17  4.17   1.24
sdd       0.00   0.00   0.50    2.49     1.99   13.93    10.67     0.02    5.67  5.67   1.69
md0       0.00   0.00   0.00    1.99     0.00    7.96     8.00     0.00    0.00  0.00   0.00
sde       0.00   0.00   0.00    0.00     0.00    0.00     0.00     0.00    0.00  0.00   0.00

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           0.25   0.00    5.24    1.50   0.00  93.02

Device: rrqm/s wrqm/s    r/s     w/s    rkB/s   wkB/s avgrq-sz avgqu-sz   await svctm  %util
sda       0.00   0.00   0.00   12.50     0.00   85.75    13.72     0.12    9.60  6.28   7.85
sdb     182.50 275.00 114.00   17.50   986.00   82.00    16.24   337.03  660.64  6.06  79.70
sdc     171.00 269.50 117.00   20.00  1012.00   94.00    16.15   315.35  677.73  5.86  80.25
sdd     149.00 278.00 107.00   18.50   940.00   84.00    16.32   311.83  705.33  6.33  79.40
md0       0.00   0.00   0.00 1012.00     0.00 8090.00    15.99     0.00    0.00  0.00   0.00
sde       0.00   0.00   0.00    0.00     0.00    0.00     0.00     0.00    0.00  0.00   0.00

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           0.00   0.00    1.50   44.61   0.00  53.88

Device: rrqm/s wrqm/s    r/s     w/s    rkB/s   wkB/s avgrq-sz avgqu-sz   await svctm  %util
sda       0.00   0.00   0.00    1.00     0.00    4.25     8.50     0.00    0.00  0.00   0.00
sdb     168.50  64.00 129.50   58.00  1114.00  508.00    17.30   645.37 1272.90  5.34 100.05
sdc     194.00  76.50 141.50   43.00  1232.00  360.00    17.26   664.01  916.30  5.42 100.05
sdd     172.00  90.50 114.50   50.00   996.00  456.00    17.65   662.54  977.28  6.08 100.05
md0       0.00   0.00   0.50    8.00     2.00   32.00
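One way to watch the two things together while reproducing a stall (a sketch only; assumes sysstat's iostat and the md0 sysfs path used above) is to poll stripe_cache_active next to the extended iostat output:

$ watch -n 2 cat /sys/block/md0/md/stripe_cache_active
$ iostat -xk 2      # in another terminal, same 2 second interval as above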
Re: RAID 5 performance issue.
Also if it is software raid, when you make the XFS filesystem on it, it sets up a proper (and tuned) sunit/swidth, so why would you want to change that? Justin. On Wed, 3 Oct 2007, Justin Piszcz wrote: Have you checked fragmentation? xfs_db -c frag -f /dev/md3 What does this report? Justin. On Wed, 3 Oct 2007, Andrew Clayton wrote: /dev/md0 is currently mounted with the following options: noatime,logbufs=8,sunit=512,swidth=1024 (sunit and swidth seem to be automatically set).
Re: RAID 5 performance issue.
Andrew Clayton [EMAIL PROTECTED] writes: Hi, Hardware: Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file system) is connected to the onboard Silicon Image 3114 controller. The other 3 (/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 card. I moved the 3 raid disks off the on board controller onto the card the other day to see if that would help, it didn't. I would think the onboard controller is connected to the north or south bridge and possibly hooked directly into the HyperTransport. The extra controller is PCI, so you are limited to a theoretical 128MiB/s. For me the onboard chips do much better (though at higher cpu cost) than PCI cards. MfG Goswin - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
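A crude way to test whether the shared PCI bus is the ceiling (a sketch only; reads are safe on a live array but will add load, and it needs root to read the raw devices) is to read all three members in parallel and add up the throughput each dd reports. If the total sits around 100-130MB/s no matter how many disks take part, the bus is the limit:

# for d in sdb sdc sdd; do dd if=/dev/$d of=/dev/null bs=1M count=1000 & done; wait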
Re: RAID 5 performance issue.
On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote: Have you checked fragmentation? You know, that never even occurred to me. I've gotten into the mindset that it's generally not a problem under Linux. xfs_db -c frag -f /dev/md3 What does this report? # xfs_db -c frag -f /dev/md0 actual 1828276, ideal 1708782, fragmentation factor 6.54% Good or bad? Seeing as this filesystem will be three years old in December, that doesn't seem overly bad. I'm currently looking at things like http://lwn.net/Articles/249450/ and http://lwn.net/Articles/242559/ for potential help; fortunately it seems I won't have too long to wait. Justin. Cheers, Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Wed, 3 Oct 2007, Andrew Clayton wrote: On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote: Also if it is software raid, when you make the XFS filesystem on it, it sets up a proper (and tuned) sunit/swidth, so why would you want to change that? Oh I didn't, the sunit and swidth were set automatically. Do they look sane? From reading the XFS section of the mount man page, I'm not entirely sure what they specify and certainly wouldn't have any idea what to set them to. Justin. Cheers, Andrew You should not need to set them as mount options unless you are overriding the defaults. Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
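For what it's worth, the two sets of numbers are the same geometry in different units: mkfs.xfs derives the stripe unit from the md chunk size (256KiB) and the stripe width from the number of data disks (two, for a 3-disk RAID 5). So sunit=512 and swidth=1024 in 512-byte sectors in the mount options correspond to sunit=64 and swidth=128 in 4KiB blocks in xfs_info. Recreating the filesystem by hand with the same geometry would look roughly like this (a sketch only, and destructive, so not something to run on the live array):

# mkfs.xfs -d su=256k,sw=2 /dev/md0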
Re: RAID 5 performance issue.
On Wed, 03 Oct 2007 19:53:08 +0200, Goswin von Brederlow wrote: Andrew Clayton [EMAIL PROTECTED] writes: Hi, Hardware: Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file system) is connected to the onboard Silicon Image 3114 controller. The other 3 (/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 card. I moved the 3 raid disks off the on board controller onto the card the other day to see if that would help, it didn't. I would think the onboard controller is connected to the north or south bridge and possibly hooked directly into the HyperTransport. The extra controller is PCI, so you are limited to a theoretical 128MiB/s. For me the onboard chips do much better (though at higher cpu cost) than PCI cards. Yeah, I was wondering about that. It certainly hasn't improved things; it's unclear if it's made things any worse. MfG Goswin Cheers, Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
What does cat /sys/block/md0/md/mismatch_cnt say? That fragmentation looks normal/fine. Justin. On Wed, 3 Oct 2007, Andrew Clayton wrote: On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote: Have you checked fragmentation? You know, that never even occurred to me. I've gotten into the mind set that it's generally not a problem under Linux. xfs_db -c frag -f /dev/md3 What does this report? # xfs_db -c frag -f /dev/md0 actual 1828276, ideal 1708782, fragmentation factor 6.54% Good or bad? Seeing as this filesystem will be three years old in December, that doesn't seem overly bad. I'm currently looking to things like http://lwn.net/Articles/249450/ and http://lwn.net/Articles/242559/ for potential help, fortunately it seems I won't have too long to wait. Justin. Cheers, Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On 10/3/07, Andrew Clayton [EMAIL PROTECTED] wrote: On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote: Have you checked fragmentation? You know, that never even occurred to me. I've gotten into the mindset that it's generally not a problem under Linux. It's probably not the root cause, but it certainly doesn't help things. At least with XFS you have an easy way to defrag the filesystem without even taking it offline. # xfs_db -c frag -f /dev/md0 actual 1828276, ideal 1708782, fragmentation factor 6.54% Good or bad? Not bad, but not that good, either. Try putting xfs_fsr into a nightly cronjob. By default, it will defrag mounted xfs filesystems for up to 2 hours. Typically this is enough to keep fragmentation well below 1%. -Dave - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
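A nightly run could look something like this (a sketch only; the path and schedule are just examples, and -t 7200 simply spells out the default two-hour limit in seconds):

# cat /etc/cron.d/xfs_fsr
30 2 * * * root /usr/sbin/xfs_fsr -t 7200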
Re: RAID 5 performance issue.
On Wed, 3 Oct 2007 16:35:21 -0400 (EDT), Justin Piszcz wrote: What does cat /sys/block/md0/md/mismatch_cnt say? $ cat /sys/block/md0/md/mismatch_cnt 0 That fragmentation looks normal/fine. Cool. Justin. Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
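For what it's worth, mismatch_cnt only gets updated by a check or repair pass, so a 0 may just mean no pass has run recently. Kicking one off looks like this (it runs in the background and will add I/O load while it goes):

# echo check > /sys/block/md0/md/sync_action
# cat /proc/mdstat                        # shows the check progressing
# cat /sys/block/md0/md/mismatch_cnt      # re-read once it finishes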
Re: RAID 5 performance issue.
Andrew Clayton wrote: Yeah, I was wondering about that. It certainly hasn't improved things, it's unclear if it's made things any worse.. Many 3124 cards are PCI-X, so if you have one of these (and you seem to be using a server board which may well have PCI-X), bus performance is not going to be an issue. Regards, Richard - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID 5 performance issue.
On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote: # xfs_db -c frag -f /dev/md0 actual 1828276, ideal 1708782, fragmentation factor 6.54% Good or bad? Not bad, but not that good, either. Try running xfs_fsr into a nightly cronjob. By default, it will defrag mounted xfs filesystems for up to 2 hours. Typically this is enough to keep fragmentation well below 1%. Worth a shot. -Dave Andrew - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html