Re: RAID 5 performance issue.

2007-10-11 Thread Bill Davidsen

Andrew Clayton wrote:

On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:

  

Can you start a 'vmstat 1' in one window, then start whatever you do
to get crappy performance.  That would be interesting to see.



In trying to find something simple that can show the problem I'm
seeing, I think I may have found the culprit.

Just testing on my machine at home, I made this simple program.

/* fslattest.c */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>


int main(int argc, char *argv[])
{
        char file[255];

        if (argc < 2) {
                printf("Usage: fslattest file\n");
                exit(1);
        }

        strncpy(file, argv[1], 254);
        printf("Opening %s\n", file);

        while (1) {
                int testfd = open(file,
                        O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);

                close(testfd);
                unlink(file);
                sleep(1);
        }

        exit(0);
}


If I run this program under strace in my home directory (an XFS file system
on a (new) disk all to itself, no RAID involved), like

$ strace -T -e open ./fslattest test

It doesn't look too bad.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.005043
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000212
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.016844

If I then start up a dd in the same place.

$ dd if=/dev/zero of=bigfile bs=1M count=500

Then I see the problem I'm seeing at work.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 2.000348
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.594441
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 2.224636
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.074615

Doing the same on my other disk which is Ext3 and contains the root fs,
it doesn't ever stutter

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.015423
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.92
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.93
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.88
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000103
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.96
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.94
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000114
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.91
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000274
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000107


Somewhere in there was the dd, but you can't tell.

I've found that if I mount the XFS filesystem with nobarrier, the
latency is reduced to about 0.5 seconds, with occasional spikes > 1
second.
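A sketch of the corresponding remount (assuming the filesystem in question
is mounted at /home; a fresh mount works too if the kernel refuses to
change barrier settings on a remount):

  mount -o remount,nobarrier /home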

When doing this on the raid array.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.009164
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.71
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.002667

dd kicks in

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 11.580238
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 3.94
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.63
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 4.297978

dd finishes 


open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.000199
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.013413
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.025134


I guess I should take this to the XFS folks.


Try mounting the filesystem noatime and see if that's part of the problem.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-11 Thread Andrew Clayton
On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:

 Andrew Clayton wrote:
  On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:
 
 Can you start a 'vmstat 1' in one window, then start whatever
 you do
  to get crappy performance.  That would be interesting to see.
  
  In trying to find something simple that can show the problem I'm
  seeing. I think I may have found the culprit.
 
  Just testing on my machine at home, I made this simple program.
 
  /* fslattest.c */

  #define _GNU_SOURCE

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/stat.h>
  #include <sys/types.h>
  #include <fcntl.h>
  #include <string.h>


  int main(int argc, char *argv[])
  {
          char file[255];

          if (argc < 2) {
                  printf("Usage: fslattest file\n");
                  exit(1);
          }

          strncpy(file, argv[1], 254);
          printf("Opening %s\n", file);

          while (1) {
                  int testfd = open(file,
                          O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);

                  close(testfd);
                  unlink(file);
                  sleep(1);
          }

          exit(0);
  }
 
 
  If I run this program under strace in my home directory (an XFS file
  system on a (new) disk all to itself, no RAID involved), like
 
  $ strace -T -e open ./fslattest test
 
  It doesn't look too bad.
 
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  0.005043 open(test,
  O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000212
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  0.016844
 
  If I then start up a dd in the same place.
 
  $ dd if=/dev/zero of=bigfile bs=1M count=500
 
  Then I see the problem I'm seeing at work.
 
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  2.000348 open(test,
  O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.594441
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  2.224636 open(test,
  O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.074615
 
  Doing the same on my other disk which is Ext3 and contains the root
  fs, it doesn't ever stutter
 
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  0.015423 open(test,
  O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.92
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  0.93 open(test,
  O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.88
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  0.000103 open(test,
  O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.96
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  0.94 open(test,
  O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000114
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  0.91 open(test,
  O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000274
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
  0.000107
 
 
  Somewhere in there was the dd, but you can't tell.
 
  I've found that if I mount the XFS filesystem with nobarrier, the
  latency is reduced to about 0.5 seconds, with occasional spikes > 1
  second.
 
  When doing this on the raid array.
 
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.009164
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.71
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.002667
 
  dd kicks in
 
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 11.580238
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 3.94
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.63
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 4.297978
 
  dd finishes 
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.000199
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.013413
  open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.025134
 
 
  I guess I should take this to the XFS folks.
 
 Try mounting the filesystem noatime and see if that's part of the
 problem.

Yeah, it's mounted noatime. Looks like I tracked this down to an XFS
regression.

http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2

Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-11 Thread Justin Piszcz



On Thu, 11 Oct 2007, Andrew Clayton wrote:


On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:


Andrew Clayton wrote:

On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:

  Can you start a 'vmstat 1' in one window, then start whatever
  you do

to get crappy performance.  That would be interesting to see.
   

In trying to find something simple that can show the problem I'm
seeing. I think I may have found the culprit.

Just testing on my machine at home, I made this simple program.

/* fslattest.c */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>


int main(int argc, char *argv[])
{
        char file[255];

        if (argc < 2) {
                printf("Usage: fslattest file\n");
                exit(1);
        }

        strncpy(file, argv[1], 254);
        printf("Opening %s\n", file);

        while (1) {
                int testfd = open(file,
                        O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);

                close(testfd);
                unlink(file);
                sleep(1);
        }

        exit(0);
}


If I run this program under strace in my home directory (an XFS file
system on a (new) disk all to itself, no RAID involved), like

$ strace -T -e open ./fslattest test

It doesn't look too bad.

open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
0.005043 open(test,
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000212
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
0.016844

If I then start up a dd in the same place.

$ dd if=/dev/zero of=bigfile bs=1M count=500

Then I see the problem I'm seeing at work.

open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
2.000348 open(test,
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.594441
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
2.224636 open(test,
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.074615

Doing the same on my other disk which is Ext3 and contains the root
fs, it doesn't ever stutter

open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
0.015423 open(test,
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.92
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
0.93 open(test,
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.88
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
0.000103 open(test,
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.96
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
0.94 open(test,
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000114
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
0.91 open(test,
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000274
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
0.000107


Somewhere in there was the dd, but you can't tell.

I've found that if I mount the XFS filesystem with nobarrier, the
latency is reduced to about 0.5 seconds, with occasional spikes > 1
second.

When doing this on the raid array.

open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.009164
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.71
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.002667

dd kicks in

open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 11.580238
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 3.94
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.63
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 4.297978

dd finishes 
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.000199
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.013413
open(test, O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.025134


I guess I should take this to the XFS folks.


Try mounting the filesystem noatime and see if that's part of the
problem.


Yeah, it's mounted noatime. Looks like I tracked this down to an XFS
regression.

http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2

Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Nice!  Thanks for reporting the final result; after 1-2 weeks of
debugging/discussion, it's good that you found it.


Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-08 Thread Justin Piszcz



On Sun, 7 Oct 2007, Dean S. Messing wrote:



Justin Piszcz wrote:

On Fri, 5 Oct 2007, Dean S. Messing wrote:


Brendan Conoboy wrote:
snip

Is the onboard SATA controller real SATA or just an ATA-SATA
converter?  If the latter, you're going to have trouble getting faster
performance than any one disk can give you at a time.  The output of
'lspci' should tell you if the onboard SATA controller is on its own
bus or sharing space with some other device.  Pasting the output here
would be useful.

snip

N00bee question:

How does one tell if a machine's disk controller is an ATA-SATA
converter?

The output of `lspci|fgrep -i sata' is:

00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\
(rev 09)

suggests a real SATA. These references to ATA in dmesg, however,
make me wonder.

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133
ata1.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133
ata2.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133
ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133


Dean
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



His drives are either really old and do not support NCQ or he is not using
AHCI in the BIOS.


Sorry, Justin, if I wasn't clear.  I was asking the N00bee question
about _my_own_ machine.  The output of lspci (on my machine) seems to
indicate I have a real SATA controller on the motherboard, but the
contents of dmesg, with the references to ATA-7 and UDMA/133, made
me wonder if I had just an ATA-SATA converter.  Hence my question: how
does one tell definitively whether one has a real SATA controller on the
motherboard?



The output looks like a real (AHCI-capable) SATA controller and your 
drives are using NCQ/AHCI.


Output from one of my machines:
[   23.621462] ata1: SATA max UDMA/133 cmd 0xf8812100 ctl 0x bmdma 
0x irq 219

[   24.078390] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   24.549806] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

As far as why it shows UDMA/133 in the kernel output I am sure there is a 
reason :)


I know the older SATA drives had a bridge chip that was used to
convert the drive from IDE to SATA; maybe it is from those legacy days, not
sure.


With the newer NCQ/'native' SATA drives, the bridge chip should no longer 
exist.


Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-07 Thread Andrew Clayton
On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:

 Can you start a 'vmstat 1' in one window, then start whatever you do
 to get crappy performance.  That would be interesting to see.

In trying to find something simple that can show the problem I'm
seeing, I think I may have found the culprit.

Just testing on my machine at home, I made this simple program.

/* fslattest.c */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>


int main(int argc, char *argv[])
{
        char file[255];

        if (argc < 2) {
                printf("Usage: fslattest file\n");
                exit(1);
        }

        strncpy(file, argv[1], 254);
        printf("Opening %s\n", file);

        while (1) {
                int testfd = open(file,
                        O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);

                close(testfd);
                unlink(file);
                sleep(1);
        }

        exit(0);
}


If I run this program under strace in my home directory (an XFS file system
on a (new) disk all to itself, no RAID involved), like

$ strace -T -e open ./fslattest test

It doesn't look too bad.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.005043
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000212
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.016844

If I then start up a dd in the same place.

$ dd if=/dev/zero of=bigfile bs=1M count=500

Then I see the problem I'm seeing at work.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 2.000348
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.594441
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 2.224636
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 1.074615

Doing the same on my other disk which is Ext3 and contains the root fs,
it doesn't ever stutter

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.015423
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.92
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.93
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.88
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000103
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.96
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.94
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000114
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.91
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000274
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 0.000107


Somewhere in there was the dd, but you can't tell.

I've found that if I mount the XFS filesystem with nobarrier, the
latency is reduced to about 0.5 seconds, with occasional spikes > 1
second.

When doing this on the raid array.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.009164
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.71
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.002667

dd kicks in

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 11.580238
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 3.94
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.63
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 4.297978

dd finishes 

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.000199
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.013413
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 0.025134


I guess I should take this to the XFS folks.

 John


Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-07 Thread Dean S. Messing

Justin Piszcz wrote:
On Fri, 5 Oct 2007, Dean S. Messing wrote:

 Brendan Conoboy wrote:
 snip
 Is the onboard SATA controller real SATA or just an ATA-SATA
 converter?  If the latter, you're going to have trouble getting faster
 performance than any one disk can give you at a time.  The output of
 'lspci' should tell you if the onboard SATA controller is on its own
 bus or sharing space with some other device.  Pasting the output here
 would be useful.
 snip

 N00bee question:

 How does one tell if a machine's disk controller is an ATA-SATA
 converter?

 The output of `lspci|fgrep -i sata' is:

 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI 
 Controller\
 (rev 09)

 suggests a real SATA. These references to ATA in dmesg, however,
 make me wonder.

 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133
 ata1.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32)
 ata1.00: configured for UDMA/133
 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133
 ata2.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32)
 ata2.00: configured for UDMA/133
 ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133
 ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
 ata3.00: configured for UDMA/133


 Dean
 -
 To unsubscribe from this list: send the line unsubscribe linux-raid in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


His drives are either really old and do not support NCQ or he is not using 
AHCI in the BIOS.

Sorry, Justin, if I wasn't clear.  I was asking the N00bee question
about _my_own_ machine.  The output of lspci (on my machine) seems to
indicate I have a real SATA controller on the motherboard, but the
contents of dmesg, with the references to ATA-7 and UDMA/133, made
me wonder if I had just an ATA-SATA converter.  Hence my question: how
does one tell definitively whether one has a real SATA controller on the
motherboard?
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-06 Thread Justin Piszcz



On Fri, 5 Oct 2007, Dean S. Messing wrote:



Brendan Conoboy wrote:
snip

Is the onboard SATA controller real SATA or just an ATA-SATA
converter?  If the latter, you're going to have trouble getting faster
performance than any one disk can give you at a time.  The output of
'lspci' should tell you if the onboard SATA controller is on its own
bus or sharing space with some other device.  Pasting the output here
would be useful.

snip

N00bee question:

How does one tell if a machine's disk controller is an ATA-SATA
converter?

The output of `lspci|fgrep -i sata' is:

00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\
(rev 09)

suggests a real SATA. These references to ATA in dmesg, however,
make me wonder.

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133
ata1.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133
ata2.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133
ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133


Dean
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



His drives are either really old and do not support NCQ or he is not using 
AHCI in the BIOS.


Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-06 Thread Andrew Clayton
On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:

 Also if it is software raid, when you make the XFS filesystem on it,
 it sets up a proper (and tuned) sunit/swidth, so why would you want
 to change that?

Oh I didn't, the sunit and swidth were set automatically. Do they look
sane? From reading the XFS section of the mount man page, I'm not
entirely sure what they specify and certainly wouldn't have any idea
what to set them to.

 Justin.

Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-06 Thread Justin Piszcz



On Wed, 3 Oct 2007, Andrew Clayton wrote:


On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:


Also if it is software raid, when you make the XFS filesyste, on it,
it sets up a proper (and tuned) sunit/swidth, so why would you want
to change that?


Oh I didn't, the sunit and swidth were set automatically. Do they look
sane?. From reading the XFS section of the mount man page, I'm not
entirely sure what they specify and certainly wouldn't have any idea
what to set them to.


Justin.


Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



As long as you ran mkfs.xfs /dev/md0 it should have optimized the 
filesystem according to the disks beneath it.
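For reference, the geometry mkfs chose can be checked without recreating
anything, e.g. (assuming the array is /dev/md0 and it is mounted at /home):

  xfs_info /home             # sunit/swidth of the live filesystem
  mkfs.xfs -f -N /dev/md0    # dry run; needs the device unmounted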


Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-06 Thread Justin Piszcz



On Sat, 6 Oct 2007, Justin Piszcz wrote:




On Wed, 3 Oct 2007, Andrew Clayton wrote:


On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:


Also if it is software raid, when you make the XFS filesyste, on it,
it sets up a proper (and tuned) sunit/swidth, so why would you want
to change that?


Oh I didn't, the sunit and swidth were set automatically. Do they look
sane?. From reading the XFS section of the mount man page, I'm not
entirely sure what they specify and certainly wouldn't have any idea
what to set them to.


Justin.


Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



As long as you ran mkfs.xfs /dev/md0 it should have optimized the filesystem 
according to the disks beneath it.


Justin.



Also, can you provide the smartctl -a output for each disk (/dev/sda,
/dev/sdb, etc.)?
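For example, a quick way to collect those (assuming the members are sda,
sdb and sdc):

  for d in sda sdb sdc; do smartctl -a /dev/$d > smart-$d.txt; done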
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Justin Piszcz



On Thu, 4 Oct 2007, Andrew Clayton wrote:


On Thu, 04 Oct 2007 12:46:05 -0400, Steve Cousins wrote:


Andrew Clayton wrote:

On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:


  What type (make/model) of the drives?

   

The drives are 250GB  Hitachi Deskstar 7K250 series ATA-6 UDMA/100
  A couple of things:


1. I thought you had SATA drives
2. ATA-6 would be UDMA/133

The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2
versions do have NCQ. If you do have SATA drives, are they SATA-1 or
SATA-2?


Not sure, I suspect SATA 1 seeing as we've had them nearly 3 years.

Some bits from dmesg

ata1: SATA max UDMA/100 cmd 0xc2aa4880 ctl 0xc2aa488a
bmdma 0xff ffc2aa4800 irq 19
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: ATA-6: HDS722525VLSA80, V36OA63A, max UDMA/100
ata1.00: 488397168 sectors, multi 16: LBA48
ata1.00: configured for UDMA/100


Steve


Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Looks like SATA1 (non-ncq) to me.

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Justin Piszcz



On Fri, 5 Oct 2007, Andrew Clayton wrote:


On Fri, 5 Oct 2007 06:25:20 -0400 (EDT), Justin Piszcz wrote:



So you have 3 SATA 1 disks:


Yeah, 3 of them in the array, there is a fourth standalone disk which
contains the root fs from which the system boots..


http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D

Do you compile your own kernel or use the distribution's kernel?


Compile my own.


What does cat /proc/interrupts say? This is important to see if your
disk controller(s) are sharing IRQs with other devices.


$ cat /proc/interrupts
  CPU0   CPU1
 0: 132052  249369403   IO-APIC-edge  timer
 1:202 52   IO-APIC-edge  i8042
 8:  0  1   IO-APIC-edge  rtc
 9:  0  0   IO-APIC-fasteoi   acpi
14:  11483172   IO-APIC-edge  ide0
16:   180411954798850   IO-APIC-fasteoi   sata_sil24
18:   86068930 27   IO-APIC-fasteoi   eth0
19:   161276622138177   IO-APIC-fasteoi   sata_sil, ohci_hcd:usb1, 
ohci_hcd:usb2
NMI:  0  0
LOC:  249368914  249368949
ERR:  0


sata_sil24 contains the raid array, sata_sil the root fs disk



Also note with only 3 disks in a RAID-5 you will not get stellar
performance, but regardless, it should not be 'hanging' as you have
mentioned.  Just out of sheer curiosity have you tried the AS
scheduler? CFQ is supposed to be better for multi-user performance
but I would be highly interested if you used the AS scheduler-- would
that change the 'hanging' problem you are noticing?  I would give it
a shot, also try the deadline and noop.


I did try them briefly. I'll have another go.


You probably want to keep the nr_requests at 128 and the
stripe_cache_size at 8MB.  The stripe size of 256k is probably
optimal.


OK.


Did you also re-mount the XFS partition with the default mount
options (or just take the sunit and swidth)?


The /etc/fstab entry for the raid array is currently:

/dev/md0/home   xfs
noatime,logbufs=8 1 2

and mount says

/dev/md0 on /home type xfs (rw,noatime,logbufs=8)

and /proc/mounts

/dev/md0 /home xfs rw,noatime,logbufs=8,sunit=512,swidth=1024 0 0

So I guess mount or the kernel is setting the sunit and swidth values.


Justin.



Andrew



The sunit/swidth mount options come from when the filesystem was made, I
believe.


   -N Causes the file system parameters  to  be  printed  out  without
  really creating the file system.

You should be able to run mkfs.xfs -N /dev/md0 to get that information.

/dev/md3   /r1   xfs   noatime,nodiratime,logbufs=8,logbsize=262144   0 1

Try using the following options and the AS scheduler and let me know if 
you still notice any 'hangs'


Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Justin Piszcz



On Fri, 5 Oct 2007, Andrew Clayton wrote:


On Fri, 5 Oct 2007 07:08:51 -0400 (EDT), Justin Piszcz wrote:


The mount options are from when the filesystem was made for
sunit/swidth I believe.

-N Causes the file system parameters  to  be  printed
out  without really creating the file system.

You should be able to run mkfs.xfs -N /dev/md0 to get that
information.


Can't do it while it's mounted. would xfs_info show the same stuff?


/dev/md3/r1 xfs
noatime,nodiratime,logbufs=8,logbsize=262144 0 1

Try using the following options and the AS scheduler and let me know
if you still notice any 'hangs'


OK, I've remounted (mount -o remount) with those options.
I've set the stripe_cache_size to 8192
I've set the nr_requests back to 128
I've set the schedulers to anticipatory.

Unfortunately problem remains.

I'll try the noop scheduler as I don't think I ever tried that one.


Justin.


Andrew



How are you measuring the problem?  How can it be reproduced?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Andrew Clayton
On Fri, 5 Oct 2007 13:53:12 +0100, Andrew Clayton wrote:


 Unfortunately problem remains.
 
 I'll try the noop scheduler as I don't think I ever tried that one.

Didn't help either, oh well.

If I hit the disk in my workstation with a big dd, then in iostat I see it
maxing out at about 40MB/sec with > 1 second await. The server seems to
hit this at a much lower rate, maybe < 10MB/sec.
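For anyone reproducing this, the measurement described is roughly (run on
the filesystem under test; device names are whatever the array members are):

  dd if=/dev/zero of=bigfile bs=1M count=500 &
  iostat -x 1     # watch await and %util per device (add -m for MB/s)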

I think I'm going to also move the raid disks back onto the onboard
controller (as Goswin von Brederlow said it should have more bandwidth
anyway) as the PCI card doesn't seem to have helped and I'm seeing soft
SATA resets coming from it.

e.g

ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata6.00: irq_stat 0x00020002, device error via D2H FIS
ata6.00: cmd 35/00:00:07:4a:d9/00:04:02:00:00/e0 tag 0 cdb 0x0 data 524288 out
 res 51/84:00:06:4e:d9/00:00:02:00:00/e2 Emask 0x10 (ATA bus error)
ata6: soft resetting port
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: configured for UDMA/100
ata6: EH complete


Just to confirm, I was seeing the problem with the on board controller
and thought moving the disks to the PCI card might help (at £35 it was
worth a shot!)

Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Andrew Clayton
On Fri, 5 Oct 2007 07:08:51 -0400 (EDT), Justin Piszcz wrote:

 The mount options are from when the filesystem was made for
 sunit/swidth I believe.
 
 -N Causes the file system parameters  to  be  printed
 out  without really creating the file system.
 
 You should be able to run mkfs.xfs -N /dev/md0 to get that
 information.

Can't do it while it's mounted. would xfs_info show the same stuff?

 /dev/md3/r1 xfs
 noatime,nodiratime,logbufs=8,logbsize=262144 0 1
 
 Try using the following options and the AS scheduler and let me know
 if you still notice any 'hangs'

OK, I've remounted (mount -o remount) with those options. 
I've set the stripe_cache_size to 8192
I've set the nr_requests back to 128
I've set the schedulers to anticipatory.

Unfortunately problem remains.

I'll try the noop scheduler as I don't think I ever tried that one.
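For reference, applying those settings amounts to something like the
following (a sketch assuming the array is md0, mounted at /home, with
members sda, sdb and sdc; repeat the per-disk lines for each member):

  mount -o remount,noatime,nodiratime,logbufs=8,logbsize=262144 /home
  echo 8192 > /sys/block/md0/md/stripe_cache_size
  echo 128 > /sys/block/sda/queue/nr_requests
  echo anticipatory > /sys/block/sda/queue/scheduler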
 
 Justin.

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Andrew Clayton
On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:

 Yikes, yeah I would get them off the PCI card, what kind of
 motherboard is it?  If you don't have a PCI-e based board it probably
 won't help THAT much but it still should be better than placing 3
 drives on a PCI card.

It's a Tyan Thunder K8S Pro S2882. No PCIe. Though given that simply
patching the kernel (on the RAID fs) slows to a crawl even when there's
no other disk activity, which I'm fairly sure it didn't use to, these
app stalls are certainly new. The only trouble is I don't have
any iostat profile from say a year ago when everything was OK. So I
can't be 100% sure the current thing of spikes of iowait and await etc
didn't actually always happen and it's actually something else that's
wrong.
 
 Justin.

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Andrew Clayton
On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:

 Yikes, yeah I would get them off the PCI card, what kind of
 motherboard is it?  If you don't have a PCI-e based board it probably
 won't help THAT much but it still should be better than placing 3
 drives on a PCI card.

Moved the drives back onto the on board controller.

While I had the machine down I ran memtest86+ for about 5 mins, no
errors.

I also got the output of mkfs.xfs -f -N /dev/md0

meta-data=/dev/md0   isize=256agcount=16, agsize=7631168 blks
 =   sectsz=4096  attr=0
data =   bsize=4096   blocks=122097920, imaxpct=25
 =   sunit=64 swidth=128 blks, unwritten=1
naming   =version 2  bsize=4096  
log  =internal log   bsize=4096   blocks=32768, version=2
 =   sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none   extsz=524288 blocks=0, rtextents=0

 Justin.

Thanks for your help by the way.

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Justin Piszcz



On Fri, 5 Oct 2007, Andrew Clayton wrote:


On Fri, 5 Oct 2007 10:07:47 -0400 (EDT), Justin Piszcz wrote:


Yikes, yeah I would get them off the PCI card, what kind of
motherboard is it?  If you don't have a PCI-e based board it probably
won't help THAT much but it still should be better than placing 3
drives on a PCI card.


Moved the drives back onto the on board controller.

While I had the machine down I ran memtest86+ for about 5 mins, no
errors.

I also got the output of mkfs.xfs -f -N /dev/md0

meta-data=/dev/md0   isize=256agcount=16, agsize=7631168 blks
=   sectsz=4096  attr=0
data =   bsize=4096   blocks=122097920, imaxpct=25
=   sunit=64 swidth=128 blks, unwritten=1
naming   =version 2  bsize=4096
log  =internal log   bsize=4096   blocks=32768, version=2
=   sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none   extsz=524288 blocks=0, rtextents=0


Justin.


Thanks for your help by the way.

Andrew



Hm, unfortunately at this point I think I am out of ideas; you may need to
ask the XFS/linux-raid developers how to run blktrace during those
operations to figure out what is going on.
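A rough outline of that, assuming blktrace/blkparse are installed and the
array is /dev/md0:

  mount -t debugfs debugfs /sys/kernel/debug   # blktrace needs debugfs
  blktrace -d /dev/md0 -o mdtrace              # reproduce the stall, then Ctrl-C
  blkparse -i mdtrace | less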


BTW: Last thing I can think of, did you make any changes to PREEMPTION in 
the kernel, or do you disable it (SERVER)?


Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Richard Scobie

Have you had a look at the smartctl -a outputs of all the drives?

Possibly one drive is being slow to respond due to seek errors etc. but 
I would perhaps expect to be seeing this in the log.


If you have a full backup and a spare drive, I would probably rotate it 
through the array.
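With a verified backup in place, rotating a spare through the array looks
roughly like this (device names here are assumptions; each swap triggers a
full rebuild):

  mdadm /dev/md0 --add /dev/sdd                     # add the spare
  mdadm /dev/md0 --fail /dev/sdc --remove /dev/sdc  # retire one member
  cat /proc/mdstat                                  # watch the resync finish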


Regards,

Richard
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Justin Piszcz



On Sat, 6 Oct 2007, Richard Scobie wrote:


Have you had a look at the smartctl -a outputs of all the drives?

Possibly one drive is being slow to respond due to seek errors etc. but I 
would perhaps expect to be seeing this in the log.


If you have a full backup and a spare drive, I would probably rotate it 
through the array.


Regards,

Richard
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Forgot about that, yeah post the smartctl -a output for each drive please.

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread John Stoffel
 Andrew == Andrew Clayton [EMAIL PROTECTED] writes:

Andrew On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:


 Also, did performance just go to crap one day or was it gradual?

Andrew IIRC I just noticed one day that firefox and vim was
Andrew stalling. That was back in February/March I think. At the time
Andrew the server was running a 2.6.18 kernel, since then I've tried
Andrew a few kernels in between that and currently 2.6.23-rc9

Andrew Something seems to be periodically causing a lot of activity
Andrew that max's out the stripe_cache for a few seconds (when I was
Andrew trying to look with blktrace, it seemed pdflush was doing a
Andrew lot of activity during this time).
 
Andrew What I had noticed just recently was when I was the only one
Andrew doing IO on the server (no NFS running and I was logged in at
Andrew the console) even just patching the kernel was crawling to a
Andrew halt.

How much memory does this system have?  Have you checked the output of
/proc/mtrr at all?  There have been reports of systems with a bad
BIOS that gets the memory map wrong, causing access to memory to slow
down drastically.

So if you have 2gb of RAM, try booting with mem=1900m or something
like that and seeing if things are better for you.
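On a GRUB (legacy) setup that is just an extra parameter on the kernel
line, e.g. (the kernel path and root device here are assumptions):

  kernel /boot/vmlinuz-2.6.23-rc9 root=/dev/sda1 ro mem=1900M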

Make sure your BIOS is upto the latest level as well.

John
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Andrew Clayton
On Fri, 5 Oct 2007 12:16:07 -0400 (EDT), Justin Piszcz wrote:

 
 Hm, unfortunately at this point I think I am out of ideas you may
 need to ask the XFS/linux-raid developers how to run blktrace during
 those operations to figure out what is going on.

No problem, cheers.
 
 BTW: Last thing I can think of, did you make any changes to
 PREEMPTION in the kernel, or do you disable it (SERVER)?

I normally have it disabled, but did try with voluntary preemption, but
with no effect.

 Justin.


Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Andrew Clayton
On Fri, 5 Oct 2007 15:02:22 -0400, John Stoffel wrote:

 
 How much memory does this system have?  Have you checked the output of

2GB

 /proc/mtrr at all?  There' have been reports of systems with a bad

$ cat /proc/mtrr 
reg00: base=0x (   0MB), size=2048MB: write-back, count=1

 BIOS that gets the memory map wrong, causing access to memory to slow
 down drastically.

BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 7fff (usable)
 BIOS-e820: 7fff - 7000 (ACPI data)
 BIOS-e820: 7000 - 8000 (ACPI NVS)
 BIOS-e820: ff78 - 0001 (reserved)


full dmesg (from 2.6.21-rc8-git2) at
http://digital-domain.net/kernel/sw-raid5-issue/dmesg

 So if you have 2gb of RAM, try booting with mem=1900m or something

Worth a shot.

 like that and seeing if things are better for you.
 
 Make sure your BIOS is upto the latest level as well.

Hmm, I'll see whats involved in that.
 
 John


Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Brendan Conoboy

Andrew Clayton wrote:

If anyone has any ideas I'm all ears.


Hi Andrew,

Are you sure your drives are healthy?  Try benchmarking each drive 
individually and see if there is a dramatic performance difference 
between any of them.  One failing drive can slow down an entire array. 
 Only after you have determined that your drives are healthy when 
accessed individually are combined results particularly meaningful.  For 
a generic SATA 1 drive you should expect a sustained raw read or write 
in excess of 45 MB/s.  Check both read and write (this will destroy 
data) and make sure your cache is clear prior to the read test and after 
the write test.  If each drive is working at a reasonable rate 
individually, you're ready to move on.
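A minimal per-drive check along those lines (sda stands for each member in
turn; the write test destroys data on that drive):

  echo 3 > /proc/sys/vm/drop_caches              # clear the page cache
  dd if=/dev/sda of=/dev/null bs=1M count=1000   # raw read
  dd if=/dev/zero of=/dev/sda bs=1M count=1000   # raw write, DESTRUCTIVE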


The next question is: What happens when you access more than one device 
at the same time?  You should either get nearly full combined 
performance, max out CPU, or get throttled by bus bandwidth (An actual 
kernel bug could also come into play here, but I tend to doubt it).  Is 
the onboard SATA controller real SATA or just an ATA-SATA converter?  If 
the latter, you're going to have trouble getting faster performance than 
any one disk can give you at a time.  The output of 'lspci' should tell 
you if the onboard SATA controller is on its own bus or sharing space 
with some other device.  Pasting the output here would be useful.


Assuming you get good performance out of all 3 drives at the same time, 
it's time to create a RAID 5 md device with the three, make sure your 
parity is done building, then benchmark that.  It's going to be slower 
to write and a bit slower to read (especially if your CPU is maxed out), 
but that is normal.


Assuming you get good performance out of your md device, it's time to 
put your filesystem on the md device and benchmark that.  If you use 
ext3, remember to set the stride parameter per the raid howto.  I am 
unfamiliar with other fs/md interactions, so be sure to check.
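For ext3 that would be something like the following, assuming the 256KiB
chunk size discussed earlier and 4KiB blocks (stride = 256/4 = 64):

  mke2fs -j -b 4096 -E stride=64 /dev/md0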


If you're actually maxing out your bus bandwidth and the onboard sata 
controller is on a different bus than the pci sata controller, try 
balancing the drives between the two to get a larger combined pipe.


Good luck,

--
Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread John Stoffel

Andrew On Fri, 5 Oct 2007 15:02:22 -0400, John Stoffel wrote:
 
 How much memory does this system have?  Have you checked the output of

Andrew 2GB

 /proc/mtrr at all?  There' have been reports of systems with a bad

Andrew $ cat /proc/mtrr 
Andrew reg00: base=0x (   0MB), size=2048MB: write-back, count=1

That looks to be good, all the memory is there all in the same
region.  Oh well... it was a thought. 

 BIOS that gets the memory map wrong, causing access to memory to slow
 down drastically.

Andrew BIOS-provided physical RAM map:
Andrew  BIOS-e820:  - 0009fc00 (usable)
Andrew  BIOS-e820: 0009fc00 - 000a (reserved)
Andrew  BIOS-e820: 000e - 0010 (reserved)
Andrew  BIOS-e820: 0010 - 7fff (usable)
Andrew  BIOS-e820: 7fff - 7000 (ACPI data)
Andrew  BIOS-e820: 7000 - 8000 (ACPI NVS)
Andrew  BIOS-e820: ff78 - 0001 (reserved)

I dunno about this part.  

Andrew full dmesg (from 2.6.21-rc8-git2) at
Andrew http://digital-domain.net/kernel/sw-raid5-issue/dmesg

 So if you have 2gb of RAM, try booting with mem=1900m or something

Andrew Worth a shot.

It might make a difference, might not.  Do you have any kernel
debugging options turned on?  That might also be an issue.  Check your
.config, there are a couple of options which drastically slow down the
system. 

 like that and seeing if things are better for you.
 
 Make sure your BIOS is upto the latest level as well.

Andrew Hmm, I'll see whats involved in that.
 
At this point, I don't suspect the BIOS any more.  

Can you start a 'vmstat 1' in one window, then start whatever you do
to get crappy performance.  That would be interesting to see.

John
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-05 Thread Dean S. Messing

Brendan Conoboy wrote:
snip
 Is the onboard SATA controller real SATA or just an ATA-SATA
 converter?  If the latter, you're going to have trouble getting faster
 performance than any one disk can give you at a time.  The output of
 'lspci' should tell you if the onboard SATA controller is on its own
 bus or sharing space with some other device.  Pasting the output here
 would be useful.
snip

N00bee question: 

How does one tell if a machine's disk controller is an ATA-SATA
converter?

The output of `lspci|fgrep -i sata' is:

00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller\
 (rev 09)

suggests a real SATA. These references to ATA in dmesg, however,
make me wonder.

 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata1.00: ATA-7: WDC WD1600JS-75NCB3, 10.02E04, max UDMA/133
 ata1.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32)
 ata1.00: configured for UDMA/133
 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata2.00: ATA-7: ST3160812AS, 3.ADJ, max UDMA/133
 ata2.00: 31250 sectors, multi 0: LBA48 NCQ (depth 31/32)
 ata2.00: configured for UDMA/133
 ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata3.00: ATA-7: ST3500630NS, 3.AEK, max UDMA/133
 ata3.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
 ata3.00: configured for UDMA/133


Dean
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-04 Thread Andrew Clayton
On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:

 Not bad, but not that good, either. Try running xfs_fsr from a nightly
 cronjob. By default, it will defrag mounted XFS filesystems for up to
 2 hours. Typically this is enough to keep fragmentation well below 1%.

I ran it last night on the RAID array; it got the fragmentation down
to 1.07%. Unfortunately that doesn't seem to have helped.
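For reference, a typical nightly xfs_fsr cron entry of the kind suggested
(the time is arbitrary; -t 7200 just makes the default 2-hour limit
explicit):

  # /etc/cron.d/xfs_fsr
  30 2 * * *  root  /usr/sbin/xfs_fsr -t 7200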
 
 -Dave

Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-04 Thread Justin Piszcz



On Thu, 4 Oct 2007, Justin Piszcz wrote:


Is NCQ enabled on the drives?

On Thu, 4 Oct 2007, Andrew Clayton wrote:


On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:


Not bad, but not that good, either. Try running xfs_fsr into a nightly
cronjob. By default, it will defrag mounted xfs filesystems for up to
2 hours. Typically this is enough to keep fragmentation well below 1%.


I ran it last night on the raid array, it got the fragmentation down
to 1.07%. Unfortunately that doesn't seemed to have helped.


-Dave


Cheers,

Andrew


-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Also, did performance just go to crap one day or was it gradual?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-04 Thread Andrew Clayton
On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:

 Is NCQ enabled on the drives?

I don't think the drives are capable of that. I don't see any mention
of NCQ in dmesg.


Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-04 Thread Justin Piszcz



On Thu, 4 Oct 2007, Andrew Clayton wrote:


On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:


Is NCQ enabled on the drives?


I don't think the drives are capable of that. I don't seen any mention
of NCQ in dmesg.


Andrew



What type (make/model) of the drives?

True, the controller may not be able to do it either.

What types of disks/controllers again?

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-04 Thread Justin Piszcz



On Thu, 4 Oct 2007, Andrew Clayton wrote:


On Thu, 4 Oct 2007 10:09:22 -0400 (EDT), Justin Piszcz wrote:


Is NCQ enabled on the drives?


I don't think the drives are capable of that. I don't seen any mention
of NCQ in dmesg.


Andrew



BTW You may not see 'NCQ' in the kernel messages unless you enable AHCI.

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-04 Thread Andrew Clayton
On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:


 Also, did performance just go to crap one day or was it gradual?

IIRC I just noticed one day that firefox and vim were stalling. That was
back in February/March I think. At the time the server was running a
2.6.18 kernel; since then I've tried a few kernels in between that and
the current 2.6.23-rc9.

Something seems to be periodically causing a lot of activity that
maxes out the stripe_cache for a few seconds (when I was trying
to look with blktrace, it seemed pdflush was doing a lot of activity
during this time).
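The stripe cache pressure can be watched directly while reproducing this
(assuming the array is md0):

  watch -n1 'grep . /sys/block/md0/md/stripe_cache_*'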
 
What I had noticed just recently was when I was the only one doing IO
on the server (no NFS running and I was logged in at the console) even
just patching the kernel was crawling to a halt.

 Justin.

Cheers,

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-04 Thread Andrew Clayton
On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:


 What type (make/model) of the drives?

The drives are 250GB  Hitachi Deskstar 7K250 series ATA-6 UDMA/100

 True, the controller may not be able to do it either.
 
 What types of disks/controllers again?

The RAID disks are currently connected to a Silicon Image PCI card and are
configured as a software RAID 5:

03:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA 
Controller (rev 02)
Subsystem: Silicon Image, Inc. Unknown device 7124
Flags: bus master, stepping, 66MHz, medium devsel, latency 64, IRQ 16
Memory at feafec00 (64-bit, non-prefetchable) [size=128]
Memory at feaf (64-bit, non-prefetchable) [size=32K]
I/O ports at bc00 [size=16]
Expansion ROM at fea0 [disabled] [size=512K]
Capabilities: [64] Power Management version 2
Capabilities: [40] PCI-X non-bridge device
Capabilities: [54] Message Signalled Interrupts: 64bit+ Queue=0/0 
Enable-


The problem originated when the disks were connected to the on-board
Silicon Image 3114 controller.

 Justin.

Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5 performance issue.

2007-10-04 Thread Justin Piszcz



On Thu, 4 Oct 2007, Andrew Clayton wrote:


On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:



What type (make/model) of the drives?


The drives are 250GB  Hitachi Deskstar 7K250 series ATA-6 UDMA/100


True, the controller may not be able to do it either.

What types of disks/controllers again?


The RAID disks are currently connected to a Silicon Image PCI card are
configured as a software RAID 5

03:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA 
Controller (rev 02)
   Subsystem: Silicon Image, Inc. Unknown device 7124
   Flags: bus master, stepping, 66MHz, medium devsel, latency 64, IRQ 16
   Memory at feafec00 (64-bit, non-prefetchable) [size=128]
   Memory at feaf (64-bit, non-prefetchable) [size=32K]
   I/O ports at bc00 [size=16]
   Expansion ROM at fea0 [disabled] [size=512K]
   Capabilities: [64] Power Management version 2
   Capabilities: [40] PCI-X non-bridge device
   Capabilities: [54] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-


The problem originated  when the disks where connected to the on board
Silicon Image 3114 controller.


Justin.


Andrew
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



7K250

http://www.itreviews.co.uk/hardware/h912.htm

http://techreport.com/articles.x/8362
The T7K250 also supports Native Command Queuing (NCQ).

You need to enable AHCI in order to reap the benefits though.
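Whether NCQ actually ended up in use can be checked at runtime, e.g.:

  dmesg | grep -i ncq
  cat /sys/block/sda/device/queue_depth   # 1 means no queueing in use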

Justin.


Re: RAID 5 performance issue.

2007-10-04 Thread Justin Piszcz



On Thu, 4 Oct 2007, Andrew Clayton wrote:


On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:



Also, did performance just go to crap one day or was it gradual?


IIRC I just noticed one day that firefox and vim were stalling. That was
back in February/March I think. At the time the server was running a
2.6.18 kernel; since then I've tried a few kernels in between that and
currently 2.6.23-rc9.

Something seems to be periodically causing a lot of activity that
maxes out the stripe_cache for a few seconds (when I was trying
to look with blktrace, it seemed pdflush was doing a lot of activity
during this time).

What I had noticed just recently was that when I was the only one doing IO
on the server (no NFS running and I was logged in at the console), even
just patching the kernel slowed to a crawl.


Justin.


Cheers,

Andrew



Besides the NCQ issue, your problem is a bit perplexing...

Just out of curiosity have you run memtest86 for at least one pass to make 
sure there were no problems with the memory?


Do you have a script showing all of the parameters that you use to 
optimize the array?


Also mdadm -D /dev/md0 output please?

What distribution are you running? (not that it should matter, but just 
curious)


Justin.


Re: RAID 5 performance issue.

2007-10-04 Thread Steve Cousins

Steve Cousins wrote:

A couple of things:

   1. I thought you had SATA drives
   2. ATA-6 would be UDMA/133


Number 2 is not correct. Sorry about that.

Steve


Re: RAID 5 performance issue.

2007-10-04 Thread Steve Cousins

Andrew Clayton wrote:

On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:


  

What type (make/model) of the drives?



The drives are 250GB  Hitachi Deskstar 7K250 series ATA-6 UDMA/100
  


A couple of things:

   1. I thought you had SATA drives
   2. ATA-6 would be UDMA/133

The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2 versions 
do have NCQ. If you do have SATA drives, are they SATA-1 or SATA-2?


Steve


Re: RAID 5 performance issue.

2007-10-04 Thread Andrew Clayton
On Thu, 4 Oct 2007 12:20:25 -0400 (EDT), Justin Piszcz wrote:

 
 
 On Thu, 4 Oct 2007, Andrew Clayton wrote:
 
  On Thu, 4 Oct 2007 10:10:02 -0400 (EDT), Justin Piszcz wrote:
 
 
  Also, did performance just go to crap one day or was it gradual?
 
  IIRC I just noticed one day that firefox and vim were stalling. That
  was back in February/March I think. At the time the server was
  running a 2.6.18 kernel; since then I've tried a few kernels in
  between that and currently 2.6.23-rc9.
 
  Something seems to be periodically causing a lot of activity that
  maxes out the stripe_cache for a few seconds (when I was trying
  to look with blktrace, it seemed pdflush was doing a lot of activity
  during this time).

  What I had noticed just recently was that when I was the only one doing
  IO on the server (no NFS running and I was logged in at the
  console), even just patching the kernel slowed to a crawl.
 
  Justin.
 
  Cheers,
 
  Andrew
 
 
 Besides the NCQ issue, your problem is a bit perplexing...
 
 Just out of curiosity have you run memtest86 for at least one pass to
 make sure there were no problems with the memory?

No I haven't.

 Do you have a script showing all of the parameters that you use to
 optimize the array?

No script. Nothing that I change really seems to make any difference.

Currently I have set

 /sys/block/md0/md/stripe_cache_size set at 16384

It doesn't really seem to matter what I set it to, as the
stripe_cache_active will periodically reach that value and take a few
seconds to come back down.

/sys/block/sd[bcd]/queue/nr_requests to 512

and set readahead to 8192 on sd[bcd]

But none of that really seems to make any difference.
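For reference, roughly what that amounts to (device names as given above,
commands run as root) is:

echo 16384 > /sys/block/md0/md/stripe_cache_size
for d in sdb sdc sdd; do
    echo 512 > /sys/block/$d/queue/nr_requests
    blockdev --setra 8192 /dev/$d
done

while keeping an eye on the cache with:

watch -n1 cat /sys/block/md0/md/stripe_cache_active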

 Also mdadm -D /dev/md0 output please?

http://digital-domain.net/kernel/sw-raid5-issue/mdadm-D

 What distribution are you running? (not that it should matter, but
 just curious)

Fedora Core 6 (though I'm fairly sure it was happening before
upgrading from Fedora Core 5)

The iostat output of the drives when the problem occurs shows the same
profile as when the backup is going onto the USB 1.1 hard drive: the IO
wait goes up, the cpu % hits 100% and we see multi-second await times.
Which is why I thought maybe the on-board controller was a bottleneck
(in the way that the USB 1.1 link is just really slow) and moved the
disks onto the PCI card. But when I saw that even patching the kernel
was going really slow, I thought it can't really be the problem, as it
never used to go that slow.

It's a tricky one...

 Justin.

Cheers,

Andrew


Re: RAID 5 performance issue.

2007-10-04 Thread Andrew Clayton
On Thu, 04 Oct 2007 12:46:05 -0400, Steve Cousins wrote:

 Andrew Clayton wrote:
  On Thu, 4 Oct 2007 10:39:09 -0400 (EDT), Justin Piszcz wrote:
 
 
 What type (make/model) of the drives?
  
  The drives are 250GB  Hitachi Deskstar 7K250 series ATA-6 UDMA/100
A couple of things:
 
 1. I thought you had SATA drives
 2. ATA-6 would be UDMA/133
 
 The SATA-1 versions of the 7K250's did not have NCQ. The SATA-2
 versions do have NCQ. If you do have SATA drives, are they SATA-1 or
 SATA-2?

Not sure; I suspect SATA-1, seeing as we've had them nearly 3 years.

Some bits from dmesg

ata1: SATA max UDMA/100 cmd 0xc2aa4880 ctl 0xc2aa488a bmdma 0xffffc2aa4800 irq 19
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: ATA-6: HDS722525VLSA80, V36OA63A, max UDMA/100
ata1.00: 488397168 sectors, multi 16: LBA48 
ata1.00: configured for UDMA/100

 Steve

Andrew


Re: RAID 5 performance issue.

2007-10-04 Thread Andrew Clayton
On Thu, 4 Oct 2007 12:19:20 -0400 (EDT), Justin Piszcz wrote:

 
 7K250
 
 http://www.itreviews.co.uk/hardware/h912.htm
 
 http://techreport.com/articles.x/8362
 The T7K250 also supports Native Command Queuing (NCQ).
 
 You need to enable AHCI in order to reap the benefits though.

Cheers, I'll take a look at that.

 Justin.

Andrew



Re: RAID 5 performance issue.

2007-10-03 Thread Justin Piszcz

Have you checked fragmentation?

xfs_db -c frag -f /dev/md3

What does this report?

Justin.

On Wed, 3 Oct 2007, Andrew Clayton wrote:


Hi,

Hardware:

Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file 
system) is connected to the onboard Silicon Image 3114 controller. The other 3 
(/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 card. I 
moved the 3 raid disks off the on board controller onto the card the other day 
to see if that would help, it didn't.

Software:

Fedora Core 6, 2.6.23-rc9 kernel.

Array/fs details:

Filesystems are XFS

FilesystemTypeSize  Used Avail Use% Mounted on
/dev/sda2  xfs 20G  5.6G   14G  29% /
/dev/sda5  xfs213G  3.6G  209G   2% /data
none tmpfs   1008M 0 1008M   0% /dev/shm
/dev/md0   xfs466G  237G  229G  51% /home

/dev/md0 is currently mounted with the following options

noatime,logbufs=8,sunit=512,swidth=1024

sunit and swidth seem to be automatically set.

xfs_info shows

meta-data=/dev/md0               isize=256    agcount=16, agsize=7631168 blks
         =                       sectsz=4096  attr=1
data     =                       bsize=4096   blocks=122097920, imaxpct=25
         =                       sunit=64     swidth=128 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                   extsz=524288 blocks=0, rtextents=0

The array has a 256k chunk size using left-symmetric layout.
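(Those figures are self-consistent for a 3-disk RAID 5 with a 256k chunk: the
mount options are in 512-byte sectors, so sunit = 256 KiB / 512 B = 512 and
swidth = 2 data disks x 512 = 1024, while xfs_info reports the same geometry
in 4 KiB blocks, 64 x 4 KiB = 256 KiB and 128 x 4 KiB = 512 KiB.)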

/sys/block/md0/md/stripe_cache_size is currently at 4096 (upping this from
256 alleviates the problem at best)

I also have currently set /sys/block/sd[bcd]/queue/nr_requests to 512 (doesn't
seem to have made any difference)

Also blockdev --setra 8192 /dev/sd[bcd] also tried 16384 and 32768

IO scheduler is cfq for all devices.


This machine acts as a file server for about 11 workstations. /home (the
software RAID 5) is exported over NFS, whereby the clients mount their home
directories (using autofs).

I set it up about 3 years ago and it has been fine. However, earlier this year
we started noticing application stalls, e.g. firefox would become unresponsive
and the window would grey out (under Compiz); this typically lasts 2-4 seconds.

During these stalls, I see the below iostat activity (taken at 2 second
intervals on the file server). High iowait, high awaits. The
stripe_cache_active maxes out and things kind of grind to a halt for a few
seconds until the stripe_cache_active starts shrinking.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.25    0.00   99.75

Device:  rrqm/s  wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00    0.00    0.00     5.47     0.00    40.80     14.91      0.05     9.73   7.18    3.93
sdb        0.00    0.00    1.49     1.49     5.97     9.95     10.67      0.06    18.50   9.00    2.69
sdc        0.00    0.00    0.00     2.99     0.00    15.92     10.67      0.01     4.17   4.17    1.24
sdd        0.00    0.00    0.50     2.49     1.99    13.93     10.67      0.02     5.67   5.67    1.69
md0        0.00    0.00    0.00     1.99     0.00     7.96      8.00      0.00     0.00   0.00    0.00
sde        0.00    0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00    0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    5.24    1.50    0.00   93.02

Device:  rrqm/s  wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00    0.00    0.00    12.50     0.00    85.75     13.72      0.12     9.60   6.28    7.85
sdb      182.50  275.00  114.00    17.50   986.00    82.00     16.24    337.03   660.64   6.06   79.70
sdc      171.00  269.50  117.00    20.00  1012.00    94.00     16.15    315.35   677.73   5.86   80.25
sdd      149.00  278.00  107.00    18.50   940.00    84.00     16.32    311.83   705.33   6.33   79.40
md0        0.00    0.00    0.00  1012.00     0.00  8090.00     15.99      0.00     0.00   0.00    0.00
sde        0.00    0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00    0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.50   44.61    0.00   53.88

Device:  rrqm/s  wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00    0.00    0.00     1.00     0.00     4.25      8.50      0.00     0.00   0.00    0.00
sdb      168.50   64.00  129.50    58.00  1114.00   508.00     17.30    645.37  1272.90   5.34  100.05
sdc      194.00   76.50  141.50    43.00  1232.00   360.00     17.26    664.01   916.30   5.42  100.05
sdd      172.00   90.50  114.50    50.00   996.00   456.00     17.65    662.54   977.28   6.08  100.05
md0        0.00    0.00    0.50     8.00     2.00    32.00

Re: RAID 5 performance issue.

2007-10-03 Thread Justin Piszcz
Also, if it is software raid, when you make the XFS filesystem on it, it
sets up a proper (and tuned) sunit/swidth, so why would you want to change
that?
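(Purely as an illustration, you would only pass the geometry by hand to
override what mkfs.xfs detects; for a 3-disk, 256k-chunk RAID 5 the explicit
form would be something like:

mkfs.xfs -d su=256k,sw=2 /dev/md0

where su is the chunk size and sw the number of data disks.)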


Justin.

On Wed, 3 Oct 2007, Justin Piszcz wrote:


Have you checked fragmentation?

xfs_db -c frag -f /dev/md3

What does this report?

Justin.

On Wed, 3 Oct 2007, Andrew Clayton wrote:


Hi,

Hardware:

Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file 
system) is connected to the onboard Silicon Image 3114 controller. The 
other 3 (/home) are in a software RAID 5 connected to a PCI Silicon Image 
3124 card. I moved the 3 raid disks off the on board controller onto the 
card the other day to see if that would help, it didn't.


Software:

Fedora Core 6, 2.6.23-rc9 kernel.

Array/fs details:

Filesystems are XFS

FilesystemTypeSize  Used Avail Use% Mounted on
/dev/sda2  xfs 20G  5.6G   14G  29% /
/dev/sda5  xfs213G  3.6G  209G   2% /data
none tmpfs   1008M 0 1008M   0% /dev/shm
/dev/md0   xfs466G  237G  229G  51% /home

/dev/md0 is currently mounted with the following options

noatime,logbufs=8,sunit=512,swidth=1024

sunit and swidth seem to be automatically set.

xfs_info shows

meta-data=/dev/md0               isize=256    agcount=16, agsize=7631168 blks
         =                       sectsz=4096  attr=1
data     =                       bsize=4096   blocks=122097920, imaxpct=25
         =                       sunit=64     swidth=128 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                   extsz=524288 blocks=0, rtextents=0

The array has a 256k chunk size using left-symmetric layout.

/sys/block/md0/md/stripe_cache_size is currently at 4096 (upping this from
256 alleviates the problem at best)

I also have currently set /sys/block/sd[bcd]/queue/nr_requests to 512 (doesn't
seem to have made any difference)

Also blockdev --setra 8192 /dev/sd[bcd] also tried 16384 and 32768

IO scheduler is cfq for all devices.


This machine acts as a file server for about 11 workstations. /home (the
software RAID 5) is exported over NFS, whereby the clients mount their home
directories (using autofs).


I set it up about 3 years ago and it has been fine. However, earlier this
year we started noticing application stalls, e.g. firefox would become
unresponsive and the window would grey out (under Compiz); this typically
lasts 2-4 seconds.


During these stalls, I see the below iostat activity (taken at 2 second
intervals on the file server). High iowait, high awaits. The
stripe_cache_active maxes out and things kind of grind to a halt for a few
seconds until the stripe_cache_active starts shrinking.


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.25    0.00   99.75

Device:  rrqm/s  wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00    0.00    0.00     5.47     0.00    40.80     14.91      0.05     9.73   7.18    3.93
sdb        0.00    0.00    1.49     1.49     5.97     9.95     10.67      0.06    18.50   9.00    2.69
sdc        0.00    0.00    0.00     2.99     0.00    15.92     10.67      0.01     4.17   4.17    1.24
sdd        0.00    0.00    0.50     2.49     1.99    13.93     10.67      0.02     5.67   5.67    1.69
md0        0.00    0.00    0.00     1.99     0.00     7.96      8.00      0.00     0.00   0.00    0.00
sde        0.00    0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00    0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    5.24    1.50    0.00   93.02

Device:  rrqm/s  wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00    0.00    0.00    12.50     0.00    85.75     13.72      0.12     9.60   6.28    7.85
sdb      182.50  275.00  114.00    17.50   986.00    82.00     16.24    337.03   660.64   6.06   79.70
sdc      171.00  269.50  117.00    20.00  1012.00    94.00     16.15    315.35   677.73   5.86   80.25
sdd      149.00  278.00  107.00    18.50   940.00    84.00     16.32    311.83   705.33   6.33   79.40
md0        0.00    0.00    0.00  1012.00     0.00  8090.00     15.99      0.00     0.00   0.00    0.00
sde        0.00    0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00    0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.50   44.61    0.00   53.88

Device:  rrqm/s  wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm   %util
sda        0.00    0.00    0.00     1.00     0.00     4.25      8.50      0.00     0.00   0.00    0.00
sdb      168.50   64.00  129.50    58.00  1114.00   508.00     17.30    645.37  1272.90   5.34  100.05
sdc      194.00   76.50  141.50    43.00  1232.00   360.00     17.26    664.01   916.30   5.42  100.05

Re: RAID 5 performance issue.

2007-10-03 Thread Goswin von Brederlow
Andrew Clayton [EMAIL PROTECTED] writes:

 Hi,

 Hardware:

 Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file 
 system) is connected to the onboard Silicon Image 3114 controller. The other 
 3 (/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 
 card. I moved the 3 raid disks off the on board controller onto the card the 
 other day to see if that would help, it didn't.

I would think the onboard controller is connected to the north or
south bridge and possibly hooked directly into the HyperTransport.
The extra controller is PCI, so you are limited to a theoretical
128MiB/s. For me the onboard chips do much better (though at higher
cpu cost) than PCI cards.
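(For a plain 32-bit/33MHz PCI slot that figure comes from 4 bytes x 33 MHz =
133 MB/s, roughly 128 MiB/s of theoretical bandwidth shared by everything on
the bus, before protocol overhead; a PCI-X slot is considerably faster.)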

MfG
Goswin


Re: RAID 5 performance issue.

2007-10-03 Thread Andrew Clayton
On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:

 Have you checked fragmentation?

You know, that never even occurred to me. I've gotten into the mindset
that it's generally not a problem under Linux.

 xfs_db -c frag -f /dev/md3
 
 What does this report?

# xfs_db -c frag -f /dev/md0
actual 1828276, ideal 1708782, fragmentation factor 6.54%

Good or bad? 

Seeing as this filesystem will be three years old in December, that
doesn't seem overly bad.
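(That factor is just (actual - ideal) / actual: (1828276 - 1708782) / 1828276
= 6.54%, i.e. about 6.5% more extents than an ideal layout would need.)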


I'm currently looking to things like

http://lwn.net/Articles/249450/ and 
http://lwn.net/Articles/242559/

for potential help, fortunately it seems I won't have too long to wait.

 Justin.

Cheers,

Andrew


Re: RAID 5 performance issue.

2007-10-03 Thread Justin Piszcz



On Wed, 3 Oct 2007, Andrew Clayton wrote:


On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:


Also, if it is software raid, when you make the XFS filesystem on it,
it sets up a proper (and tuned) sunit/swidth, so why would you want
to change that?


Oh I didn't; the sunit and swidth were set automatically. Do they look
sane? From reading the XFS section of the mount man page, I'm not
entirely sure what they specify and certainly wouldn't have any idea
what to set them to.


Justin.


Cheers,

Andrew



You should not need to set them as mount options unless you are overriding 
the defaults.


Justin.


Re: RAID 5 performance issue.

2007-10-03 Thread Andrew Clayton
On Wed, 03 Oct 2007 19:53:08 +0200, Goswin von Brederlow wrote:

 Andrew Clayton [EMAIL PROTECTED] writes:
 
  Hi,
 
  Hardware:
 
  Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1
  (root file system) is connected to the onboard Silicon Image 3114
  controller. The other 3 (/home) are in a software RAID 5 connected
  to a PCI Silicon Image 3124 card. I moved the 3 raid disks off the
  on board controller onto the card the other day to see if that
  would help, it didn't.
 
 I would think the onboard controller is connected to the north or
 south bridge and possibly hooked directly into the HyperTransport.
 The extra controller is PCI, so you are limited to a theoretical
 128MiB/s. For me the onboard chips do much better (though at higher
 cpu cost) than PCI cards.

Yeah, I was wondering about that. It certainly hasn't improved things;
it's unclear if it's made things any worse.

 MfG
 Goswin


Cheers,

Andrew


Re: RAID 5 performance issue.

2007-10-03 Thread Justin Piszcz

What does cat /sys/block/md0/md/mismatch_cnt say?
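(mismatch_cnt is updated by a check/repair or resync pass, so if the array
has not been scrubbed recently it may not reflect the current state; a check
can be started with something like

echo check > /sys/block/md0/md/sync_action

and the counter re-read once it completes.)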

That fragmentation looks normal/fine.

Justin.

On Wed, 3 Oct 2007, Andrew Clayton wrote:


On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:


Have you checked fragmentation?


You know, that never even occurred to me. I've gotten into the mindset
that it's generally not a problem under Linux.


xfs_db -c frag -f /dev/md3

What does this report?


# xfs_db -c frag -f /dev/md0
actual 1828276, ideal 1708782, fragmentation factor 6.54%

Good or bad?

Seeing as this filesystem will be three years old in December, that
doesn't seem overly bad.


I'm currently looking to things like

http://lwn.net/Articles/249450/ and
http://lwn.net/Articles/242559/

for potential help, fortunately it seems I won't have too long to wait.


Justin.


Cheers,

Andrew




Re: RAID 5 performance issue.

2007-10-03 Thread David Rees
On 10/3/07, Andrew Clayton [EMAIL PROTECTED] wrote:
 On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:
  Have you checked fragmentation?

 You know, that never even occurred to me. I've gotten into the mind set
 that it's generally not a problem under Linux.

It's probably not the root cause, but certainly doesn't help things.
At least with XFS you have an easy way to defrag the filesystem
without even taking it offline.

 # xfs_db -c frag -f /dev/md0
 actual 1828276, ideal 1708782, fragmentation factor 6.54%

 Good or bad?

Not bad, but not that good, either. Try running xfs_fsr from a nightly
cronjob. By default, it will defrag mounted xfs filesystems for up to
2 hours. Typically this is enough to keep fragmentation well below 1%.
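For example (path assumed; xfs_fsr usually lives in /usr/sbin), a crontab
entry along the lines of:

0 3 * * * /usr/sbin/xfs_fsr -t 7200

runs it at 3am, with the default two-hour limit spelled out explicitly.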

-Dave


Re: RAID 5 performance issue.

2007-10-03 Thread Andrew Clayton
On Wed, 3 Oct 2007 16:35:21 -0400 (EDT), Justin Piszcz wrote:

 What does cat /sys/block/md0/md/mismatch_cnt say?

$ cat /sys/block/md0/md/mismatch_cnt
0

 That fragmentation looks normal/fine.

Cool.

 Justin.

Andrew


Re: RAID 5 performance issue.

2007-10-03 Thread Richard Scobie

Andrew Clayton wrote:


Yeah, I was wondering about that. It certainly hasn't improved things;
it's unclear if it's made things any worse.



Many 3124 cards are PCI-X, so if you have one of these (and you seem to 
be using a server board which may well have PCI-X), bus performance is 
not going to be an issue.


Regards,

Richard


Re: RAID 5 performance issue.

2007-10-03 Thread Andrew Clayton
On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:

  # xfs_db -c frag -f /dev/md0
  actual 1828276, ideal 1708782, fragmentation factor 6.54%
 
  Good or bad?
 
 Not bad, but not that good, either. Try running xfs_fsr into a nightly
 cronjob. By default, it will defrag mounted xfs filesystems for up to
 2 hours. Typically this is enough to keep fragmentation well below 1%.

Worth a shot.

 -Dave

Andrew