Re: [Gelato-technical] Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread Stan Bubrouski
Luck, Tony wrote:

Only a new user would have to pull the whole history ... and for most
uses it is sufficient to just pull the current top of the tree. Linus'
own tree only has a history going back to 2.6.12-rc2 (when he started
using git).
Someday there might be a server daemon that can batch up the changes for
a "pull" to conserve network bandwidth.
There is a mailing list "git@vger.kernel.org" where these issues are
discussed.  Archives are available at marc.theaimsgroup.com and gelato.
Thanks, Tony, I wasn't aware of the list; I'll look there for git info
from now on.
Best Regards,
Stan
-Tony
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



RE: [Gelato-technical] Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread Luck, Tony
>That said, is there any plan to change how this functions in the future 
>to solve these problems?  I.e. have it not use so much disk space and 
>thus use less bandwidth.  Am I wrong in assuming that after, say,
>1000 commits go into the tree it could end up several megs or gigs 
>bigger?
>
>If that is the case might it not be more prudent to sort this out now?

Only a new user would have to pull the whole history ... and for most
uses it is sufficient to just pull the current top of the tree. Linus'
own tree only has a history going back to 2.6.12-rc2 (when he started
using git).

Someday there might be a server daemon that can batch up the changes for
a "pull" to conserve network bandwidth.

There is a mailing list "git@vger.kernel.org" where these issues are
discussed.  Archives are available at marc.theaimsgroup.com and gelato.

-Tony
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Gelato-technical] Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread Stan Bubrouski
Luck, Tony wrote:
Yeah, I'm facing the same issue.  I started playing with git last
night.  Apart from disk-space usage, it's very nice, though I really
hope someone puts together a web-interface on top of git soon so we
can see what changed when and by whom.

Disk space issues?  A complete git repository of the Linux kernel with
all changesets back to 2.4.0 takes just over 3G ... which is big compared
to BK, but 3G of disk only costs about $1 (for IDE ... if you want 15K rpm
SCSI, then you'll pay a lot more).  Network bandwidth is likely to be a
bigger problem.
That said, is there any plan to change how this functions in the future 
to solve these problems?  I.e. have it not use so much disk space and 
thus use less bandwidth.  Am I wrong in assuming that after, say,
1000 commits go into the tree it could end up several megs or gigs 
bigger?

If that is the case might it not be more prudent to sort this out now?
There's a prototype web i/f at http://grmso.net:8090/ that's already looking
fairly slick.
Yes, it is very slick.  Kudos to the creator.
-sb

-Tony

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Gelato-technical] Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread Randy.Dunlap
On Thu, 21 Apr 2005 10:33:29 -0700 David Mosberger wrote:

| > On Thu, 21 Apr 2005 10:19:28 -0700, "Luck, Tony" <[EMAIL PROTECTED]> said:
| 
|   >> I just checked 2.6.12-rc3 and the fls() fix is indeed missing.
|   >> Do you know what happened?
| 
|   Tony> If BitKeeper were still in use, I'd have dropped that patch
|   Tony> into my "release" tree and asked Linus to "pull" ... but it's
|   Tony> not, and I was stalled.  I should have a "git" tree up and
|   Tony> running in the next couple of days.  I'll make sure that the
|   Tony> fls fix goes in early.
| 
| Yeah, I'm facing the same issue.  I started playing with git last
| night.  Apart from disk-space usage, it's very nice, though I really
| hope someone puts together a web-interface on top of git soon so we
| can see what changed when and by whom.

Two people have already done that.  Examples:
http://ehlo.org/~kay/gitweb.pl
and
http://grmso.net:8090/

and the commits mailing list is now working.
A script to show nightly (or daily:) commits and make
a daily patch tarball is also close to ready.

---
~Randy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Gelato-technical] Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread David Mosberger
> On Thu, 21 Apr 2005 10:41:52 -0700, "Luck, Tony" <[EMAIL PROTECTED]> said:

  Tony> Disk space issues?  A complete git repository of the Linux
  Tony> kernel with all changesets back to 2.4.0 takes just over 3G
  Tony> ... which is big compared to BK, but 3G of disk only costs
  Tony> about $1 (for IDE ... if you want 15K rpm SCSI, then you'll
  Tony> pay a lot more).  Network bandwidth is likely to be a bigger
  Tony> problem.

Ever heard that data is a gas?  My disks always fill up in no time at
all, no matter how big they are.  I agree that network bandwidth is a
bigger issue, though.

  Tony> There's a prototype web i/f at http://grmso.net:8090/ that's
  Tony> already looking fairly slick.

Indeed.  Plus it has a cool name, too.  Thanks for the pointer.

--david
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Gelato-technical] Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread Luck, Tony
>Yeah, I'm facing the same issue.  I started playing with git last
>night.  Apart from disk-space usage, it's very nice, though I really
>hope someone puts together a web-interface on top of git soon so we
>can see what changed when and by whom.

Disk space issues?  A complete git repository of the Linux kernel with
all changesets back to 2.4.0 takes just over 3G ... which is big compared
to BK, but 3G of disk only costs about $1 (for IDE ... if you want 15K rpm
SCSI, then you'll pay a lot more).  Network bandwidth is likely to be a
bigger problem.

There's a prototype web i/f at http://grmso.net:8090/ that's already looking
fairly slick.

-Tony
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Gelato-technical] Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread David Mosberger
> On Thu, 21 Apr 2005 10:19:28 -0700, "Luck, Tony" <[EMAIL PROTECTED]> said:

  >> I just checked 2.6.12-rc3 and the fls() fix is indeed missing.
  >> Do you know what happened?

  Tony> If BitKeeper were still in use, I'd have dropped that patch
  Tony> into my "release" tree and asked Linus to "pull" ... but it's
  Tony> not, and I was stalled.  I should have a "git" tree up and
  Tony> running in the next couple of days.  I'll make sure that the
  Tony> fls fix goes in early.

Yeah, I'm facing the same issue.  I started playing with git last
night.  Apart from disk-space usage, it's very nice, though I really
hope someone puts together a web-interface on top of git soon so we
can see what changed when and by whom.

--david
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Gelato-technical] Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread Luck, Tony
>I just checked 2.6.12-rc3 and the fls() fix is indeed missing.  Do you
>know what happened?

If BitKeeper were still in use, I'd have dropped that patch into my
"release" tree and asked Linus to "pull" ... but it's not, and I was
stalled.  I should have a "git" tree up and running in the next couple
of days.  I'll make sure that the fls fix goes in early.

-Tony
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Gelato-technical] Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread David Mosberger
Tony and Andrew,

I just checked 2.6.12-rc3 and the fls() fix is indeed missing.  Do you
know what happened?

--david

> On Thu, 21 Apr 2005 13:30:50 +0200, Andreas Hirstius <[EMAIL PROTECTED]> said:

  Andreas> Hi, The fls() patch from David solves the problem :-))

  Andreas> Do you have an idea, when it will be in the mainline
  Andreas> kernel??

  Andreas> Andreas



  Andreas> Bartlomiej ZOLNIERKIEWICZ wrote:

  >>  Hi!
  >> 
  >>> A small update.
  >>> 
  >>> Patching mm/filemap.c is not necessary in order to get the
  >>> improved performance!  It's sufficient to remove
  >>> roundup_pow_of_two from get_init_ra_size ...
  >>> 
  >>> So a simple one-liner changes the picture dramatically.  But why
  >>> ?!?!?
  >> 
  >> 
  >> roundup_pow_of_two() uses fls() and ia64 has a buggy fls()
  >> implementation [it seems that David fixed it, but the patch is not
  >> in the mainline yet]:
  >> 
  >> http://www.mail-archive.com/linux-ia64@vger.kernel.org/msg01196.html
  >> 
  >> That would also explain why you couldn't reproduce the problem on
  >> ia32 Xeon machines.
  >> 
  >> Bartlomiej
  >> 

  Andreas> ___
  Andreas> Gelato-technical mailing list
  Andreas> [EMAIL PROTECTED]
  Andreas> https://www.gelato.unsw.edu.au/mailman/listinfo/gelato-technical
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread Andreas Hirstius
Hi,
The fls() patch from David solves the problem :-))
Do you have an idea, when it will be in the mainline kernel??
Andreas

Bartlomiej ZOLNIERKIEWICZ wrote:
Hi!
A small update.
Patching mm/filemap.c is not necessary in order to get the improved
performance!
It's sufficient to remove roundup_pow_of_two from get_init_ra_size ...
So a simple one-liner changes the picture dramatically.
But why ?!?!?

roundup_pow_of_two() uses fls() and ia64 has a buggy fls() implementation
[it seems that David fixed it, but the patch is not in the mainline yet]:
http://www.mail-archive.com/linux-ia64@vger.kernel.org/msg01196.html
That would also explain why you couldn't reproduce the problem on ia32 
Xeon machines.

Bartlomiej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread Bartlomiej ZOLNIERKIEWICZ
Hi!
A small update.
Patching mm/filemap.c is not necessary in order to get the improved
performance!
It's sufficient to remove roundup_pow_of_two from get_init_ra_size ...
So a simple one-liner changes the picture dramatically.
But why ?!?!?
roundup_pow_of_two() uses fls() and ia64 has a buggy fls() implementation
[it seems that David fixed it, but the patch is not in the mainline yet]:
http://www.mail-archive.com/linux-ia64@vger.kernel.org/msg01196.html
That would also explain why you couldn't reproduce the problem on ia32 
Xeon machines.
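
To make the dependency concrete, here is a minimal user-space sketch (the
sketch_* names are illustrative stand-ins, not the kernel's arch-optimized
code): if fls() reports the wrong bit index, roundup_pow_of_two() rounds to
the wrong power of two, and that value is what sizes the initial readahead
window in get_init_ra_size().

#include <stdio.h>

/* fls(): 1-based index of the most significant set bit, 0 for x == 0. */
static int sketch_fls(unsigned long x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* roundup_pow_of_two() is built on fls(), so a buggy fls() skews the
 * rounded value and with it the readahead window. */
static unsigned long sketch_roundup_pow_of_two(unsigned long x)
{
	return 1UL << sketch_fls(x - 1);
}

int main(void)
{
	/* e.g. a 5-page request should round up to an 8-page window */
	printf("%lu -> %lu\n", 5UL, sketch_roundup_pow_of_two(5UL));
	return 0;
}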

Bartlomiej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-21 Thread Andreas Hirstius
A small update.
Patching mm/filemap.c is not necessary in order to get the improved 
performance!
It's sufficient to remove roundup_pow_of_two from get_init_ra_size ...

So a simple one-liner changes the picture dramatically.
But why ?!?!?
Andreas
jmerkey wrote:

For 3Ware, you need to change the queue depths, and you will see 
dramatically improved performance. 3Ware can take requests
a lot faster than Linux pushes them out. Try changing this instead, 
you won't be going to sleep all the time waiting on the read/write
request queues to get "unstarved".

/linux/include/linux/blkdev.h
//#define BLKDEV_MIN_RQ 4
//#define BLKDEV_MAX_RQ 128 /* Default maximum */
#define BLKDEV_MIN_RQ 4096
#define BLKDEV_MAX_RQ 8192 /* Default maximum */
Jeff
Andreas Hirstius wrote:
Hi,
We have a rx4640 with 3x 3Ware 9500 SATA controllers and 24x WD740GD 
HDD in a software RAID0 configuration (using md).
With kernel 2.6.11 the read performance on the md is reduced by a 
factor of 20 (!!) compared to previous kernels.
The write rate to the md doesn't change!! (it actually improves a bit).

The config for the kernels are basically identical.
Here is some vmstat output:
kernel 2.6.9: ~1GB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 12672 6592 15914112 0 0 1081344 56 15719 1583 0 11 14 74
1 0 0 12672 6592 15915200 0 0 1130496 0 15996 1626 0 11 14 74
0 1 0 12672 6592 15914112 0 0 1081344 0 15891 1570 0 11 14 74
0 1 0 12480 6592 15914112 0 0 1081344 0 15855 1537 0 11 14 74
1 0 0 12416 6592 15914112 0 0 1130496 0 16006 1586 0 12 14 74
kernel 2.6.11: ~55MB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 24448 37568 15905984 0 0 56934 0 5166 1862 0 1 24 75
0 1 0 20672 37568 15909248 0 0 57280 0 5168 1871 0 1 24 75
0 1 0 22848 37568 15907072 0 0 57306 0 5173 1874 0 1 24 75
0 1 0 25664 37568 15903808 0 0 57190 0 5171 1870 0 1 24 75
0 1 0 21952 37568 15908160 0 0 57267 0 5168 1871 0 1 24 75
Because the filesystem might have an impact on the measurement, "dd" 
on /dev/md0
was used to get information about the performance. This also opens 
the possibility to test with block sizes larger than the page size.
And it appears that the performance with kernel 2.6.11 is closely 
related to the block size.
For example if the block size is exactly a multiple (>2) of the page 
size the performance is back to ~1.1GB/s.
The general behaviour is a bit more complicated:
1. bs <= 1.5 * ps : ~27-57MB/s (differs with ps)
2. bs > 1.5 * ps && bs < 2 * ps : rate increases to max. rate
3. bs = n * ps ; (n >= 2) : ~1.1GB/s (== max. rate)
4. bs > n * ps && bs < ~(n+0.5) * ps ; (n > 2) : ~27-70MB/s (differs 
with ps)
5. bs > ~(n+0.5) * ps && bs < (n+1) * ps ; (n > 2) : increasing rate 
in several, more or
less, distinct steps (e.g. 1/3 of max. rate and then 2/3 of max rate 
for 64k pages)

I've tested all four possible page sizes on Itanium (4k, 8k, 16k and 
64k) and the pattern is always the same!!

With kernel 2.6.9 (any kernel before 2.6.10-bk6) the read rate is 
always at ~1.1GB/s,
independent of the block size.

This simple patch solves the problem, but I have no idea of possible 
side-effects ...

--- linux-2.6.12-rc2_orig/mm/filemap.c 2005-04-04 18:40:05.0 
+0200
+++ linux-2.6.12-rc2/mm/filemap.c 2005-04-20 10:27:42.0 +0200
@@ -719,7 +719,7 @@
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
prev_index = ra.prev_page;
- last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> 
PAGE_CACHE_SHIFT;
+ last_index = (*ppos + desc->count + PAGE_CACHE_SIZE) >> 
PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;

isize = i_size_read(inode);
--- linux-2.6.12-rc2_orig/mm/readahead.c 2005-04-04 
18:40:05.0 +0200
+++ linux-2.6.12-rc2/mm/readahead.c 2005-04-20 18:37:04.0 +0200
@@ -70,7 +70,7 @@
*/
static unsigned long get_init_ra_size(unsigned long size, unsigned 
long max)
{
- unsigned long newsize = roundup_pow_of_two(size);
+ unsigned long newsize = size;

if (newsize <= max / 64)
newsize = newsize * newsize;

In order to keep this mail short, I've created a webpage that 
contains all the detailed information and some plots:
http://www.cern.ch/openlab-debugging/raid

Regards,
Andreas Hirstius
-
To unsubscribe from this list: send the line "unsubscribe 
linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-20 Thread Nick Piggin
On Wed, 2005-04-20 at 10:55 -0600, jmerkey wrote:
> 
> For 3Ware, you need to change the queue depths, and you will see 
> dramatically improved performance. 3Ware can take requests
> a lot faster than Linux pushes them out. Try changing this instead, you 
> won't be going to sleep all the time waiting on the read/write
> request queues to get "unstarved".
> 
> 
> /linux/include/linux/blkdev.h
> 
> //#define BLKDEV_MIN_RQ 4
> //#define BLKDEV_MAX_RQ 128 /* Default maximum */
> #define BLKDEV_MIN_RQ 4096
> #define BLKDEV_MAX_RQ 8192 /* Default maximum */
> 

BTW, don't do this. BLKDEV_MIN_RQ sets the size of the mempool of
reserved requests, which will only get slightly used in low-memory
conditions, so most of that memory will probably be wasted.

Just change /sys/block/xxx/queue/nr_requests
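
For example (a rough user-space sketch; "sda" and the value 512 are
placeholders, to be picked per device and workload):

#include <stdio.h>

/* Raise the request queue depth at run time via sysfs instead of
 * recompiling with a larger BLKDEV_MAX_RQ. */
int main(void)
{
	const char *path = "/sys/block/sda/queue/nr_requests";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "512\n");
	fclose(f);
	return 0;
}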

Nick




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-20 Thread jmerkey
Burst is good.  There's another window in the SCSI layer that limits
bursts to 128-sector runs (this seems to be the behavior
on 3Ware).  I've never changed this, but increasing the max number of 
SCSI requests at this layer may help.  The
bursty behavior is good, BTW.

Jeff
Andreas Hirstius wrote:
I was curious if your patch would change the write rate because I see 
only ~550MB/s (continuous), which is about a factor of two away from the 
capabilities of the disks.
... and got this behaviour (with and without my other patch):

(with single "dd if=/dev/zero of=testxx bs=65536 count=15 &" or 
several of them in parallel on an XFS fs)

"vmstat 1" output
0  0  0  28416  37888 1577836800 0 0 8485  3043  0  0  0 100
6  0  0  22144  37952 1578592000 0 12356 7695  2029  0 61  0 39
7  0  0  20864  38016 1578585600   324 1722240 8046  4159  0 100  0  0
7  0  0  20864  38016 1578476800 0 1261440 8391  5222  0 100  0  0
7  0  0  25984  38016 1578150400 0 2003456 8372  5038  0 100  0  0
0  6  0  22784  38016 1578150400 0 2826624 8397  8423  0 93  7  0
0  0  0  21632  38016 1578368000 0 147840 8572 12114  0  9 17 74
0  0  0  21632  38016 1578368000 052 8586  5185  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8588  5412  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8580  5372  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 7840  5590  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8587  5321  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8569  5575  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8550  5157  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 7963  5640  0  0  0 100
0  0  0  21632  38016 1578368000 032 8583  4434  0  0  0 100
7  0  0  20800  38016 1578476800 0  7424 8404  3638  0 15  0 85
8  0  0  20864  38016 1578694400 0 688768 7357  3221  0 100  0  0
8  0  0  20736  28544 1579424000 0 1978560 8376  4897  0 100  0  0
7  0  0  22208  20736 1579878400 0 1385088 8367  4984  0 100  0  0
6  0  0  22144   6848 158126720056 1291904 8377  4815  0 100  0  0
0  0  0  50240   6848 1580940800   304  3136 8556  5088  1 26  0 74
0  0  0  50304   6848 1580940800 0 0 8572  5181  0  0  0 100

The average rate here is again pretty close to 550MB/s; it just writes 
the blocks in "bursts"...

Andreas
jmerkey wrote:

For 3Ware, you need to change the queue depths, and you will see 
dramatically improved performance. 3Ware can take requests
a lot faster than Linux pushes them out. Try changing this instead, 
you won't be going to sleep all the time waiting on the read/write
request queues to get "unstarved".

/linux/include/linux/blkdev.h
//#define BLKDEV_MIN_RQ 4
//#define BLKDEV_MAX_RQ 128 /* Default maximum */
#define BLKDEV_MIN_RQ 4096
#define BLKDEV_MAX_RQ 8192 /* Default maximum */
Jeff
Andreas Hirstius wrote:
Hi,
We have a rx4640 with 3x 3Ware 9500 SATA controllers and 24x WD740GD 
HDD in a software RAID0 configuration (using md).
With kernel 2.6.11 the read performance on the md is reduced by a 
factor of 20 (!!) compared to previous kernels.
The write rate to the md doesn't change!! (it actually improves a bit).

The config for the kernels are basically identical.
Here is some vmstat output:
kernel 2.6.9: ~1GB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 12672 6592 15914112 0 0 1081344 56 15719 1583 0 11 14 74
1 0 0 12672 6592 15915200 0 0 1130496 0 15996 1626 0 11 14 74
0 1 0 12672 6592 15914112 0 0 1081344 0 15891 1570 0 11 14 74
0 1 0 12480 6592 15914112 0 0 1081344 0 15855 1537 0 11 14 74
1 0 0 12416 6592 15914112 0 0 1130496 0 16006 1586 0 12 14 74
kernel 2.6.11: ~55MB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 24448 37568 15905984 0 0 56934 0 5166 1862 0 1 24 75
0 1 0 20672 37568 15909248 0 0 57280 0 5168 1871 0 1 24 75
0 1 0 22848 37568 15907072 0 0 57306 0 5173 1874 0 1 24 75
0 1 0 25664 37568 15903808 0 0 57190 0 5171 1870 0 1 24 75
0 1 0 21952 37568 15908160 0 0 57267 0 5168 1871 0 1 24 75
Because the filesystem might have an impact on the measurement, "dd" 
on /dev/md0
was used to get information about the performance. This also opens 
the possibility to test with block sizes larger than the page size.
And it appears that the performance with kernel 2.6.11 is closely 
related to the block size.
For example if the block size is exactly a multiple (>2) of the page 
size the performance is back to ~1.1GB/s.
The general behaviour is a bit more complicated:
1. bs <= 1.5 * ps : ~27-57MB/s (differs with ps)
2. bs > 1.5 * ps && bs < 2 * ps : rate increases to max. r

Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-20 Thread Andreas Hirstius
I was curious if your patch would change the write rate because I see 
only ~550MB/s (continuous), which is about a factor of two away from the 
capabilities of the disks.
... and got this behaviour (with and without my other patch):

(with single "dd if=/dev/zero of=testxx bs=65536 count=15 &" or 
several of them in parallel on an XFS fs)

"vmstat 1" output
0  0  0  28416  37888 1577836800 0 0 8485  3043  0  0  0 100
6  0  0  22144  37952 1578592000 0 12356 7695  2029  0 61  0 39
7  0  0  20864  38016 1578585600   324 1722240 8046  4159  0 100  0  0
7  0  0  20864  38016 1578476800 0 1261440 8391  5222  0 100  0  0
7  0  0  25984  38016 1578150400 0 2003456 8372  5038  0 100  0  0
0  6  0  22784  38016 1578150400 0 2826624 8397  8423  0 93  7  0
0  0  0  21632  38016 1578368000 0 147840 8572 12114  0  9 17 74
0  0  0  21632  38016 1578368000 052 8586  5185  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8588  5412  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8580  5372  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 7840  5590  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8587  5321  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8569  5575  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 8550  5157  0  0  0 100
0  0  0  21632  38016 1578368000 0 0 7963  5640  0  0  0 100
0  0  0  21632  38016 1578368000 032 8583  4434  0  0  0 100
7  0  0  20800  38016 1578476800 0  7424 8404  3638  0 15  0 85
8  0  0  20864  38016 1578694400 0 688768 7357  3221  0 100  0  0
8  0  0  20736  28544 1579424000 0 1978560 8376  4897  0 100  0  0
7  0  0  22208  20736 1579878400 0 1385088 8367  4984  0 100  0  0
6  0  0  22144   6848 158126720056 1291904 8377  4815  0 100  0  0
0  0  0  50240   6848 1580940800   304  3136 8556  5088  1 26  0 74
0  0  0  50304   6848 1580940800 0 0 8572  5181  0  0  0 100

The average rate here is again pretty close to 550MB/s; it just writes 
the blocks in "bursts"...

Andreas
jmerkey wrote:

For 3Ware, you need to change the queue depths, and you will see 
dramatically improved performance. 3Ware can take requests
a lot faster than Linux pushes them out. Try changing this instead, 
you won't be going to sleep all the time waiting on the read/write
request queues to get "unstarved".

/linux/include/linux/blkdev.h
//#define BLKDEV_MIN_RQ 4
//#define BLKDEV_MAX_RQ 128 /* Default maximum */
#define BLKDEV_MIN_RQ 4096
#define BLKDEV_MAX_RQ 8192 /* Default maximum */
Jeff
Andreas Hirstius wrote:
Hi,
We have a rx4640 with 3x 3Ware 9500 SATA controllers and 24x WD740GD 
HDD in a software RAID0 configuration (using md).
With kernel 2.6.11 the read performance on the md is reduced by a 
factor of 20 (!!) compared to previous kernels.
The write rate to the md doesn't change!! (it actually improves a bit).

The config for the kernels are basically identical.
Here is some vmstat output:
kernel 2.6.9: ~1GB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 12672 6592 15914112 0 0 1081344 56 15719 1583 0 11 14 74
1 0 0 12672 6592 15915200 0 0 1130496 0 15996 1626 0 11 14 74
0 1 0 12672 6592 15914112 0 0 1081344 0 15891 1570 0 11 14 74
0 1 0 12480 6592 15914112 0 0 1081344 0 15855 1537 0 11 14 74
1 0 0 12416 6592 15914112 0 0 1130496 0 16006 1586 0 12 14 74
kernel 2.6.11: ~55MB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 24448 37568 15905984 0 0 56934 0 5166 1862 0 1 24 75
0 1 0 20672 37568 15909248 0 0 57280 0 5168 1871 0 1 24 75
0 1 0 22848 37568 15907072 0 0 57306 0 5173 1874 0 1 24 75
0 1 0 25664 37568 15903808 0 0 57190 0 5171 1870 0 1 24 75
0 1 0 21952 37568 15908160 0 0 57267 0 5168 1871 0 1 24 75
Because the filesystem might have an impact on the measurement, "dd" 
on /dev/md0
was used to get information about the performance. This also opens 
the possibility to test with block sizes larger than the page size.
And it appears that the performance with kernel 2.6.11 is closely 
related to the block size.
For example if the block size is exactly a multiple (>2) of the page 
size the performance is back to ~1.1GB/s.
The general behaviour is a bit more complicated:
1. bs <= 1.5 * ps : ~27-57MB/s (differs with ps)
2. bs > 1.5 * ps && bs < 2 * ps : rate increases to max. rate
3. bs = n * ps ; (n >= 2) : ~1.1GB/s (== max. rate)
4. bs > n * ps && bs < ~(n+0.5) * ps ; (n > 2) : ~27-70MB/s (differs 
with ps)
5. bs > ~(n+0.5) * ps && bs < (n+1) * ps ; (n > 2) : increasing rate 
in several, more or
less, distinct steps (e.g. 1/3 of max. rate and then 2/3 of max rate 
for 64k p

Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-20 Thread Andreas Hirstius
Just tried it, but the performance problem remains :-(
(actually, why should it change? This part of the code didn't change so 
much between 2.6.10-bk6 and -bk7...)

Andreas

jmerkey wrote:

For 3Ware, you need to change the queue depths, and you will see 
dramatically improved performance. 3Ware can take requests
a lot faster than Linux pushes them out. Try changing this instead, 
you won't be going to sleep all the time waiting on the read/write
request queues to get "unstarved".

/linux/include/linux/blkdev.h
//#define BLKDEV_MIN_RQ 4
//#define BLKDEV_MAX_RQ 128 /* Default maximum */
#define BLKDEV_MIN_RQ 4096
#define BLKDEV_MAX_RQ 8192 /* Default maximum */
Jeff
Andreas Hirstius wrote:
Hi,
We have a rx4640 with 3x 3Ware 9500 SATA controllers and 24x WD740GD 
HDD in a software RAID0 configuration (using md).
With kernel 2.6.11 the read performance on the md is reduced by a 
factor of 20 (!!) compared to previous kernels.
The write rate to the md doesn't change!! (it actually improves a bit).

The config for the kernels are basically identical.
Here is some vmstat output:
kernel 2.6.9: ~1GB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 12672 6592 15914112 0 0 1081344 56 15719 1583 0 11 14 74
1 0 0 12672 6592 15915200 0 0 1130496 0 15996 1626 0 11 14 74
0 1 0 12672 6592 15914112 0 0 1081344 0 15891 1570 0 11 14 74
0 1 0 12480 6592 15914112 0 0 1081344 0 15855 1537 0 11 14 74
1 0 0 12416 6592 15914112 0 0 1130496 0 16006 1586 0 12 14 74
kernel 2.6.11: ~55MB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 24448 37568 15905984 0 0 56934 0 5166 1862 0 1 24 75
0 1 0 20672 37568 15909248 0 0 57280 0 5168 1871 0 1 24 75
0 1 0 22848 37568 15907072 0 0 57306 0 5173 1874 0 1 24 75
0 1 0 25664 37568 15903808 0 0 57190 0 5171 1870 0 1 24 75
0 1 0 21952 37568 15908160 0 0 57267 0 5168 1871 0 1 24 75
Because the filesystem might have an impact on the measurement, "dd" 
on /dev/md0
was used to get information about the performance. This also opens 
the possibility to test with block sizes larger than the page size.
And it appears that the performance with kernel 2.6.11 is closely 
related to the block size.
For example if the block size is exactly a multiple (>2) of the page 
size the performance is back to ~1.1GB/s.
The general behaviour is a bit more complicated:
1. bs <= 1.5 * ps : ~27-57MB/s (differs with ps)
2. bs > 1.5 * ps && bs < 2 * ps : rate increases to max. rate
3. bs = n * ps ; (n >= 2) : ~1.1GB/s (== max. rate)
4. bs > n * ps && bs < ~(n+0.5) * ps ; (n > 2) : ~27-70MB/s (differs 
with ps)
5. bs > ~(n+0.5) * ps && bs < (n+1) * ps ; (n > 2) : increasing rate 
in several, more or
less, distinct steps (e.g. 1/3 of max. rate and then 2/3 of max rate 
for 64k pages)

I've tested all four possible page sizes on Itanium (4k, 8k, 16k and 
64k) and the pattern is always the same!!

With kernel 2.6.9 (any kernel before 2.6.10-bk6) the read rate is 
always at ~1.1GB/s,
independent of the block size.

This simple patch solves the problem, but I have no idea of possible 
side-effects ...

--- linux-2.6.12-rc2_orig/mm/filemap.c 2005-04-04 18:40:05.0 
+0200
+++ linux-2.6.12-rc2/mm/filemap.c 2005-04-20 10:27:42.0 +0200
@@ -719,7 +719,7 @@
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
prev_index = ra.prev_page;
- last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> 
PAGE_CACHE_SHIFT;
+ last_index = (*ppos + desc->count + PAGE_CACHE_SIZE) >> 
PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;

isize = i_size_read(inode);
--- linux-2.6.12-rc2_orig/mm/readahead.c 2005-04-04 
18:40:05.0 +0200
+++ linux-2.6.12-rc2/mm/readahead.c 2005-04-20 18:37:04.0 +0200
@@ -70,7 +70,7 @@
*/
static unsigned long get_init_ra_size(unsigned long size, unsigned 
long max)
{
- unsigned long newsize = roundup_pow_of_two(size);
+ unsigned long newsize = size;

if (newsize <= max / 64)
newsize = newsize * newsize;

In order to keep this mail short, I've created a webpage that 
contains all the detailed information and some plots:
http://www.cern.ch/openlab-debugging/raid

Regards,
Andreas Hirstius
-
To unsubscribe from this list: send the line "unsubscribe 
linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-20 Thread jmerkey

For 3Ware, you need to change the queue depths, and you will see 
dramatically improved performance. 3Ware can take requests
a lot faster than Linux pushes them out. Try changing this instead, you 
won't be going to sleep all the time waiting on the read/write
request queues to get "unstarved".

/linux/include/linux/blkdev.h
//#define BLKDEV_MIN_RQ 4
//#define BLKDEV_MAX_RQ 128 /* Default maximum */
#define BLKDEV_MIN_RQ 4096
#define BLKDEV_MAX_RQ 8192 /* Default maximum */
Jeff
Andreas Hirstius wrote:
Hi,
We have a rx4640 with 3x 3Ware 9500 SATA controllers and 24x WD740GD 
HDD in a software RAID0 configuration (using md).
With kernel 2.6.11 the read performance on the md is reduced by a 
factor of 20 (!!) compared to previous kernels.
The write rate to the md doesn't change!! (it actually improves a bit).

The config for the kernels are basically identical.
Here is some vmstat output:
kernel 2.6.9: ~1GB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 12672 6592 15914112 0 0 1081344 56 15719 1583 0 11 14 74
1 0 0 12672 6592 15915200 0 0 1130496 0 15996 1626 0 11 14 74
0 1 0 12672 6592 15914112 0 0 1081344 0 15891 1570 0 11 14 74
0 1 0 12480 6592 15914112 0 0 1081344 0 15855 1537 0 11 14 74
1 0 0 12416 6592 15914112 0 0 1130496 0 16006 1586 0 12 14 74
kernel 2.6.11: ~55MB/s read
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
1 1 0 24448 37568 15905984 0 0 56934 0 5166 1862 0 1 24 75
0 1 0 20672 37568 15909248 0 0 57280 0 5168 1871 0 1 24 75
0 1 0 22848 37568 15907072 0 0 57306 0 5173 1874 0 1 24 75
0 1 0 25664 37568 15903808 0 0 57190 0 5171 1870 0 1 24 75
0 1 0 21952 37568 15908160 0 0 57267 0 5168 1871 0 1 24 75
Because the filesystem might have an impact on the measurement, "dd" 
on /dev/md0
was used to get information about the performance. This also opens the 
possibility to test with block sizes larger than the page size.
And it appears that the performance with kernel 2.6.11 is closely 
related to the block size.
For example if the block size is exactly a multiple (>2) of the page 
size the performance is back to ~1.1GB/s.
The general behaviour is a bit more complicated:
1. bs <= 1.5 * ps : ~27-57MB/s (differs with ps)
2. bs > 1.5 * ps && bs < 2 * ps : rate increases to max. rate
3. bs = n * ps ; (n >= 2) : ~1.1GB/s (== max. rate)
4. bs > n * ps && bs < ~(n+0.5) * ps ; (n > 2) : ~27-70MB/s (differs 
with ps)
5. bs > ~(n+0.5) * ps && bs < (n+1) * ps ; (n > 2) : increasing rate 
in several, more or
less, distinct steps (e.g. 1/3 of max. rate and then 2/3 of max rate 
for 64k pages)

I've tested all four possible page sizes on Itanium (4k, 8k, 16k and 
64k) and the pattern is always the same!!

With kernel 2.6.9 (any kernel before 2.6.10-bk6) the read rate is 
always at ~1.1GB/s,
independent of the block size.

This simple patch solves the problem, but I have no idea of possible 
side-effects ...

--- linux-2.6.12-rc2_orig/mm/filemap.c 2005-04-04 18:40:05.0 
+0200
+++ linux-2.6.12-rc2/mm/filemap.c 2005-04-20 10:27:42.0 +0200
@@ -719,7 +719,7 @@
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
prev_index = ra.prev_page;
- last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> 
PAGE_CACHE_SHIFT;
+ last_index = (*ppos + desc->count + PAGE_CACHE_SIZE) >> 
PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;

isize = i_size_read(inode);
--- linux-2.6.12-rc2_orig/mm/readahead.c 2005-04-04 18:40:05.0 
+0200
+++ linux-2.6.12-rc2/mm/readahead.c 2005-04-20 18:37:04.0 +0200
@@ -70,7 +70,7 @@
*/
static unsigned long get_init_ra_size(unsigned long size, unsigned 
long max)
{
- unsigned long newsize = roundup_pow_of_two(size);
+ unsigned long newsize = size;

if (newsize <= max / 64)
newsize = newsize * newsize;

In order to keep this mail short, I've created a webpage that contains 
all the detailed information and some plots:
http://www.cern.ch/openlab-debugging/raid

Regards,
Andreas Hirstius
-
To unsubscribe from this list: send the line "unsubscribe 
linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-20 Thread Andreas Hirstius
Hi,
We have a rx4640 with 3x 3Ware 9500 SATA controllers and 24x WD740GD HDD 
in a software RAID0 configuration (using md).
With kernel 2.6.11 the read performance on the md is reduced by a factor 
of 20 (!!) compared to previous kernels.
The write rate to the md doesn't change!! (it actually improves a bit).

The config for the kernels are basically identical.
Here is some vmstat output:
kernel 2.6.9: ~1GB/s read
procs  memory  swap  io  system  cpu
r  b   swpd   free   buff  cache   si   sobibo   incs us sy wa id
1  1  0  12672   6592 1591411200 108134456 15719  1583 0 11 14 74
1  0  0  12672   6592 1591520000 1130496 0 15996  1626 0 11 14 74
0  1  0  12672   6592 1591411200 1081344 0 15891  1570 0 11 14 74
0  1  0  12480   6592 1591411200 1081344 0 15855  1537 0 11 14 74
1  0  0  12416   6592 1591411200 1130496 0 16006  1586 0 12 14 74

kernel 2.6.11: ~55MB/s read
procs  memory  swap  io  system  cpu
r  b   swpd   free   buff  cache   si   sobibo   incs us sy wa id
1  1  0  24448  37568 1590598400 56934 0 5166  1862  0 1 24 75
0  1  0  20672  37568 1590924800 57280 0 5168  1871  0 1 24 75
0  1  0  22848  37568 1590707200 57306 0 5173  1874  0 1 24 75
0  1  0  25664  37568 1590380800 57190 0 5171  1870  0 1 24 75
0  1  0  21952  37568 1590816000 57267 0 5168  1871  0 1 24 75

Because the filesystem might have an impact on the measurement, "dd" on /dev/md0
was used to get information about the performance. 
This also opens the possibility to test with block sizes larger than the page size.
And it appears that the performance with kernel 2.6.11 is closely 
related to the block size.
For example if the block size is exactly a multiple (>2) of the page 
size the performance is back to ~1.1GB/s.
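
The measurement itself is just sequential reads at a chosen block size; a
rough user-space equivalent of the dd runs looks like the sketch below
(device path, block size and block count are placeholders, and the reads
deliberately go through the page cache so the readahead path is exercised):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/md0";
	size_t bs = argc > 2 ? strtoul(argv[2], NULL, 0) : 65536;
	long count = 10000;		/* number of blocks to read */
	char *buf = malloc(bs);
	int fd = open(dev, O_RDONLY);
	struct timespec t0, t1;
	long long total = 0;
	double secs;
	long i;

	if (!buf || fd < 0) {
		perror(dev);
		return 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < count; i++) {
		ssize_t n = read(fd, buf, bs);
		if (n <= 0)
			break;
		total += n;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%lld bytes in %.2f s = %.1f MB/s (bs=%zu)\n",
	       total, secs, total / secs / 1e6, bs);
	close(fd);
	free(buf);
	return 0;
}
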
The general behaviour is a bit more complicated:  

 1. bs <= 1.5 * ps : ~27-57MB/s (differs with ps)
 2. bs > 1.5 * ps && bs < 2 * ps : rate increases to max. rate
 3. bs = n * ps ; (n >= 2) : ~1.1GB/s (== max. rate)
 4. bs > n * ps && bs < ~(n+0.5) * ps ; (n > 2) : ~27-70MB/s (differs 
with ps)
 5. bs > ~(n+0.5) * ps && bs < (n+1) * ps ; (n > 2) : increasing rate 
in several, more or
 less, distinct steps (e.g. 1/3 of max. rate and then 2/3 of max 
rate for 64k pages)

I've tested all four possible page sizes on Itanium (4k, 8k, 16k and 64k) and the pattern is 
always the same!!

With kernel 2.6.9 (any kernel before 2.6.10-bk6) the read rate is always at 
~1.1GB/s,
independent of the block size.
This simple patch solves the problem, but I have no idea of possible 
side-effects ...
--- linux-2.6.12-rc2_orig/mm/filemap.c  2005-04-04 18:40:05.0 +0200
+++ linux-2.6.12-rc2/mm/filemap.c   2005-04-20 10:27:42.0 +0200
@@ -719,7 +719,7 @@
   index = *ppos >> PAGE_CACHE_SHIFT;
   next_index = index;
   prev_index = ra.prev_page;
-   last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
+   last_index = (*ppos + desc->count + PAGE_CACHE_SIZE) >> PAGE_CACHE_SHIFT;
   offset = *ppos & ~PAGE_CACHE_MASK;
   isize = i_size_read(inode);
--- linux-2.6.12-rc2_orig/mm/readahead.c  2005-04-04 18:40:05.0 +0200
+++ linux-2.6.12-rc2/mm/readahead.c 2005-04-20 18:37:04.0 +0200
@@ -70,7 +70,7 @@
 */
static unsigned long get_init_ra_size(unsigned long size, unsigned long max)
{
-   unsigned long newsize = roundup_pow_of_two(size);
+   unsigned long newsize = size;
   if (newsize <= max / 64)
   newsize = newsize * newsize;

In order to keep this mail short, I've created a webpage that contains 
all the detailed information and some plots:
http://www.cern.ch/openlab-debugging/raid

Regards,
  Andreas Hirstius
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/