Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-13 Thread Peter Zijlstra

On Tue, 2007-11-13 at 16:34 +0800, Zhang, Yanmin wrote:

> My new bisect captured 7c9e69faa28027913ee059c285a5ea8382e24b5d,
> which caused the regression in the following iozone runs (the 3rd/4th
> and later runs after mounting the ext3 partition).

Linus just reverted that commit with commit:

commit 0b832a4b93932103d73c0c3f35ef1153e288327b
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date:   Tue Nov 13 08:07:31 2007 -0800

Revert "ext2/ext3/ext4: add block bitmap validation"

This reverts commit 7c9e69faa28027913ee059c285a5ea8382e24b5d, fixing up
conflicts in fs/ext4/balloc.c manually.

The cost of doing the bitmap validation on each lookup - even when the
bitmap is cached - is absolutely prohibitive.  We could, and probably
should, do it only when adding the bitmap to the buffer cache.  However,
right now we are better off just reverting it.

Peter Zijlstra measured the cost of this extra validation as an 85%
decrease in cached iozone, and while I had a patch that took it down to
just 17% by not being _quite_ so stupid in the validation, it was still
a big slowdown that could have been avoided by just doing it right.
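For reference, the "cached iozone" measurement comes from repeating the same run
on a freshly mounted ext3 filesystem. A minimal sketch of that repro, assuming
iozone is installed and /dev/sdb1 and /mnt/test are hypothetical scratch
device/mount names (the regression shows up from the 3rd/4th run onward):

#mount -t ext3 /dev/sdb1 /mnt/test
#cd /mnt/test
#for i in 1 2 3 4 5; do iozone -i 0 -r 4k -s 512m; done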






Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-13 Thread Zhang, Yanmin
On Tue, 2007-11-13 at 10:19 +0800, Zhang, Yanmin wrote:
> On Mon, 2007-11-12 at 17:48 +0100, Peter Zijlstra wrote:
> > On Mon, 2007-11-12 at 17:05 +0200, Benny Halevy wrote:
> > > On Nov. 12, 2007, 15:26 +0200, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > > Single socket, dual core opteron, 2GB memory
> > > > Single SATA disk, ext3
> > > > 
> > > > 2.6.23.1-42.fc8 #1 SMP
> > > > 
> > > >   524288   4  225977  447461
> > > >   524288   4  232595  496848
> > > >   524288   4  220608  478076
> > > >   524288   4  203080  445230
> > > > 
> > > > 2.6.24-rc2 #28 SMP PREEMPT
> > > > 
> > > >   524288   4   54043   83585
> > > >   524288   4   69949  516253
> > > >   524288   4   72343  491416
> > > >   524288   4   71775  492653
> > 
> > 2.6.24-rc2 +
> > patches/wu-reiser.patch
> > patches/writeback-early.patch
> > patches/bdi-task-dirty.patch
> > patches/bdi-sysfs.patch
> > patches/sched-hrtick.patch
> > patches/sched-rt-entity.patch
> > patches/sched-watchdog.patch
> > patches/linus-ext3-blockalloc.patch
> > 
> >   524288   4  179657  487676
> >   524288   4  173989  465682
> >   524288   4  175842  489800
> > 
> > 
> > Linus' patch is the one that makes the difference here. So I'm unsure
> > how you bisected it down to:
> > 
> >   04fbfdc14e5f48463820d6b9807daa5e9c92c51f
> Originally, my test suite just picked up the result of the first run. Your prior
> patch (speed up writeback ramp-up on clean systems) fixed an issue with the
> first-run result regression, so my bisect captured it.
> 
> However, later on, I found the following runs have different results. A moment
> ago, I retested 04fbfdc14e5f48463820d6b9807daa5e9c92c51f by:
> #git checkout 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
> #make
> 
> Then I reversed your patch. It looks like 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
> is not the root cause of the following-run regression. I will change my test
> suite to run many times and do a new bisect.
> 
> > These results seem to point to
> > 
> >   7c9e69faa28027913ee059c285a5ea8382e24b5d
My new bisect captured 7c9e69faa28027913ee059c285a5ea8382e24b5d,
which caused the regression in the following iozone runs (the 3rd/4th and later
runs after mounting the ext3 partition).
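A sketch of how such a multi-run bisect can be driven, with the good/bad
revisions taken from this thread; test-iozone.sh is a hypothetical wrapper that
boots the built kernel and checks the 3rd-run throughput (the build/reboot step
is not something git can automate by itself):

#git bisect start
#git bisect bad v2.6.24-rc1
#git bisect good v2.6.23
#git bisect run ./test-iozone.sh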

Peter,

Where could I download Linus' new patches, especially
patches/linus-ext3-blockalloc.patch?
I couldn't find it in my archive of LKML mail.

yanmin



Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-12 Thread Zhang, Yanmin
On Mon, 2007-11-12 at 17:48 +0100, Peter Zijlstra wrote:
> On Mon, 2007-11-12 at 17:05 +0200, Benny Halevy wrote:
> > On Nov. 12, 2007, 15:26 +0200, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > Single socket, dual core opteron, 2GB memory
> > > Single SATA disk, ext3
> > > 
> > > x86_64 kernel and userland
> > > 
> > > (dirty_background_ratio, dirty_ratio) tunables
> > > 
> > >  (5,10) - default
> > > 
> > > 2.6.23.1-42.fc8 #1 SMP
> > > 
> > >   524288   4   59580   60356
> > >   524288   4   59247   61101
> > >   524288   4   61030   62831
> > > 
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > > 
> > >   524288   4   49277   56582
> > >   524288   4   50728   61056
> > >   524288   4   52027   59758
> > >   524288   4   51520   62426
> > > 
> > > 
> > >  (20,40) - similar to your 8GB
> > > 
> > > 2.6.23.1-42.fc8 #1 SMP
> > > 
> > >   524288   4  225977  447461
> > >   524288   4  232595  496848
> > >   524288   4  220608  478076
> > >   524288   4  203080  445230
> > > 
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > > 
> > >   524288   4   54043   83585
> > >   524288   4   69949  516253
> > >   524288   4   72343  491416
> > >   524288   4   71775  492653
> > > 
> > >  (60,80) - overkill
> > > 
> > > 2.6.23.1-42.fc8 #1 SMP
> > > 
> > >   524288   4  208450  491892
> > >   524288   4  216262  481135
> > >   524288   4  221892  543608
> > >   524288   4  202209  574725
> > >   524288   4  231730  452482
> > > 
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > > 
> > >   524288   4   49091   86471
> > >   524288   4   65071  217566
> > >   524288   4   72238  492172
> > >   524288   4   71818  492433
> > >   524288   4   71327  493954
> > > 
> > > 
> > > While I see that the write speed as reported under .24 ~70MB/s is much
> > > lower than the one reported under .23 ~200MB/s, I find it very hard to
> > > believe my poor single SATA disk could actually do the 200MB/s for
> > > longer than its cache 8/16 MB (not sure).
> > > 
> > > vmstat shows that actual IO is done, even though the whole 512MB could
> > > fit in cache, hence my suspicion that the ~70MB/s is the most realistic
> > > of the two.
> > 
> > Even 70 MB/s seems too high.  What throughput do you see for the
> > raw disk partition/
> > 
> > Also, are the numbers above for successive runs?
> > It seems like you're seeing some caching effects so
> > I'd recommend using a file larger than your cache size and
> > the -e and -c options (to include fsync and close in timings)
> > to try to eliminate them.
> 
> -- iozone -i 0 -r 4k -s 512m -e -c
> 
> .23 (20,40)
> 
>   524288   4   31750   33560
>   524288   4   29786   32114
>   524288   4   29115   31476
> 
> .24 (20,40)
> 
>   524288   4   25022   32411
>   524288   4   25375   31662
>   524288   4   26407   33871
> 
> 
> -- iozone -i 0 -r 4k -s 4g -e -c
> 
> .23 (20,40)
> 
>  4194304   4   39699   35550
>  4194304   4   40225   36099
> 
> 
> .24 (20,40)
> 
>  4194304   4   39961   41656
>  4194304   4   39244   39673
> 
> 
> Yanmin, for that benchmark you ran, what was it meant to measure?
> From what I can make of it, it's just write cache benching.
Yeah, it's quite related to cache. I did more testing on my Stoakley machine
(8 cores, 8GB memory). If I reduce the memory to 4GB, the speed is far slower.

> 
> One thing I don't understand is how the write numbers are so much lower
> than the rewrite numbers. The iozone code (which gives me headaches,
> damn what a mess) seems to suggest that the only thing that is different
> is the lack of block allocation.
It might be a good direction.

> 
> Linus posted a patch yesterday fixing up a regression in the ext3 bitmap
> block allocator, /me goes apply that patch and rerun the tests.
> 
> > >  (20,40) - similar to your 8GB
> > > 
> > > 2.6.23.1-42.fc8 #1 SMP
> > > 
> > >   524288   4  225977  447461
> > >   524288   4  232595  496848
> > >   524288   4  220608  478076
> > >   524288   4  203080  445230
> > > 
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > > 
> > >   524288   4   54043   83585
> > >   524288   4   69949  516253
> > >   524288   4   72343  491416
> > >   524288   4   71775  492653
> 
> 2.6.24-rc2 +
> patches/wu-reiser.patch
> patches/writeback-early.patch
> patches/bdi-task-dirty.patch
> patches/bdi-sysfs.patch
> patches/sched-hrtick.patch
> patches/sched-rt-entity.patch
> patches/sched-watchdog.patch
> patches/linus-ext3-blockalloc.patch
> 
>   524288   4  179657  487676
>   524288   4  173989  465682
>   524288   4  175842  489800
> 
> 
> Linus' patch is the one that makes the difference here. So I'm unsure
> how you bisected it down to:
> 
>   04fbfdc14e5f48463820d6b9807daa5e9c92c51f
Originally, my test suite just picked up the result of the first run. Your prior
patch (speed up writeback ramp-up on clean systems) fixed an issue with the
first-run result regression, so my bisect captured it.

However, later on, I found the following runs have different results. A moment
ago, I retested 04fbfdc14e5f48463820d6b9807daa5e9c92c51f by:
#git checkout 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
#make

Then I reversed your patch. It looks like 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
is not the root cause of the following-run regression. I will change my test
suite to run many times and do a new bisect.
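A sketch of the "reversed your patch" step above, assuming the posted patch was
saved as peter-writeback.patch (a hypothetical filename):

#patch -R -p1 < peter-writeback.patch
#make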

Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-12 Thread Zhang, Yanmin
On Mon, 2007-11-12 at 04:58 -0800, Martin Knoblauch wrote:
> - Original Message 
> > From: "Zhang, Yanmin" <[EMAIL PROTECTED]>
> > To: Martin Knoblauch <[EMAIL PROTECTED]>
> > Cc: [EMAIL PROTECTED]; LKML 
> > Sent: Monday, November 12, 2007 1:45:57 AM
> > Subject: Re: iozone write 50% regression in kernel 2.6.24-rc1
> > 
> > On Fri, 2007-11-09 at 04:36 -0800, Martin Knoblauch wrote:
> > > - Original Message 
> > > > From: "Zhang, Yanmin" 
> > > > To: [EMAIL PROTECTED]
> > > > Cc: LKML 
> > > > Sent: Friday, November 9, 2007 10:47:52 AM
> > > > Subject: iozone write 50% regression in kernel 2.6.24-rc1
> > > > 
> > > > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has a 50%
> > > > regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
> > > > 
> > > > My machine has 8 processor cores and 8GB memory.
> > > > 
> > > > By bisect, I located patch
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> > > > 
> > > > Another behavior: with kernel 2.6.23, if I run iozone many times after
> > > > rebooting the machine, the result looks stable. But with 2.6.24-rc1, the
> > > > first run of iozone got a very small result and the following runs get
> > > > 4x the original result.
> > > > 
> > > > What I reported is the regression of the 2nd/3rd run, because the first
> > > > run has a bigger regression.
> > > > 
> > > > I also tried to change /proc/sys/vm/dirty_ratio,dirty_background_ratio
> > > > and didn't get improvement.
> > >  Could you tell us the exact iozone command you are using?
> > iozone -i 0 -r 4k -s 512m
> > 
> 
>  OK, I definitely do not see the reported effect. On a HP ProLiant with a
> RAID5 on CCISS I get:
> 
> 2.6.19.2: 654-738 MB/sec write, 1126-1154 MB/sec rewrite
> 2.6.24-rc2: 772-820 MB/sec write, 1495-1539 MB/sec rewrite
> 
>  The first run is always slowest; all subsequent runs are faster and the same speed.
Although the first run is always slowest, if we compare 2.6.23 and 2.6.24-rc,
the first-run result of 2.6.23 is 7 times that of 2.6.24-rc.

Originally, my test suite just picked up the result of the first run. I might
change my test suite to run many times.

Now I run the test manually many times after the machine reboots. Comparing
2.6.24-rc with 2.6.23, the 3rd and following runs of 2.6.24-rc have about a 50%
regression.


Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-12 Thread Zhang, Yanmin
On Mon, 2007-11-12 at 12:25 -0500, Mark Lord wrote:
> Peter Zijlstra wrote:
> ..
> > While I see that the write speed as reported under .24 ~70MB/s is much
> > lower than the one reported under .23 ~200MB/s, I find it very hard to
> > believe my poor single SATA disk could actually do the 200MB/s for
> > longer than its cache 8/16 MB (not sure).
> > 
> > vmstat shows that actual IO is done, even though the whole 512MB could
> > fit in cache, hence my suspicion that the ~70MB/s is the most realistic
> > of the two.
> ..
> 
> Yeah, sequential 70MB/sec is quite realistic for a modern SATA drive.
> 
> But significantly faster than that (say, 100MB/sec +) is unlikely at present.
I just use the command '#iozone -i 0 -r 4k -s 512m', without '-e -c'. So if we
consider cache, the speed is very fast. On my machine with 2.6.23, the write
speed is 631MB/s, quite fast. :)
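One way to confirm that the page cache is absorbing the writes, as Peter did, is
to watch actual disk traffic from a second terminal while iozone runs:

#vmstat 1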


Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-12 Thread Mark Lord

Peter Zijlstra wrote:
..

While I see that the write speed as reported under .24 ~70MB/s is much
lower than the one reported under .23 ~200MB/s, I find it very hard to
believe my poor single SATA disk could actually do the 200MB/s for
longer than its cache 8/16 MB (not sure).

vmstat shows that actual IO is done, even though the whole 512MB could
fit in cache, hence my suspicion that the ~70MB/s is the most realistic
of the two.

..

Yeah, sequential 70MB/sec is quite realistic for a modern SATA drive.

But significantly faster than that (say, 100MB/sec +) is unlikely at present.
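A simple raw-throughput check along the lines Benny asked about, reading the
disk directly and bypassing the page cache (/dev/sdb is a placeholder; the read
is non-destructive):

#dd if=/dev/sdb of=/dev/null bs=1M count=512 iflag=direct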


Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-12 Thread Peter Zijlstra

On Mon, 2007-11-12 at 17:05 +0200, Benny Halevy wrote:
> On Nov. 12, 2007, 15:26 +0200, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > Single socket, dual core opteron, 2GB memory
> > Single SATA disk, ext3
> > 
> > x86_64 kernel and userland
> > 
> > (dirty_background_ratio, dirty_ratio) tunables
> > 
> >  (5,10) - default
> > 
> > 2.6.23.1-42.fc8 #1 SMP
> > 
> >   524288   4   59580   60356
> >   524288   4   59247   61101
> >   524288   4   61030   62831
> > 
> > 2.6.24-rc2 #28 SMP PREEMPT
> > 
> >   524288   4   49277   56582
> >   524288   4   50728   61056
> >   524288   4   52027   59758
> >   524288   4   51520   62426
> > 
> > 
> >  (20,40) - similar to your 8GB
> > 
> > 2.6.23.1-42.fc8 #1 SMP
> > 
> >   524288   4  225977  447461
> >   524288   4  232595  496848
> >   524288   4  220608  478076
> >   524288   4  203080  445230
> > 
> > 2.6.24-rc2 #28 SMP PREEMPT
> > 
> >   524288   4   54043   83585
> >   524288   4   69949  516253
> >   524288   4   72343  491416
> >   524288   4   71775  492653
> > 
> >  (60,80) - overkill
> > 
> > 2.6.23.1-42.fc8 #1 SMP
> > 
> >   524288   4  208450  491892
> >   524288   4  216262  481135
> >   524288   4  221892  543608
> >   524288   4  202209  574725
> >   524288   4  231730  452482
> > 
> > 2.6.24-rc2 #28 SMP PREEMPT
> > 
> >   524288   4   49091   86471
> >   524288   4   65071  217566
> >   524288   4   72238  492172
> >   524288   4   71818  492433
> >   524288   4   71327  493954
> > 
> > 
> > While I see that the write speed as reported under .24 ~70MB/s is much
> > lower than the one reported under .23 ~200MB/s, I find it very hard to
> > believe my poor single SATA disk could actually do the 200MB/s for
> > longer than its cache 8/16 MB (not sure).
> > 
> > vmstat shows that actual IO is done, even though the whole 512MB could
> > fit in cache, hence my suspicion that the ~70MB/s is the most realistic
> > of the two.
> 
> Even 70 MB/s seems too high.  What throughput do you see for the
> raw disk partition/
> 
> Also, are the numbers above for successive runs?
> It seems like you're seeing some caching effects so
> I'd recommend using a file larger than your cache size and
> the -e and -c options (to include fsync and close in timings)
> to try to eliminate them.

-- iozone -i 0 -r 4k -s 512m -e -c

.23 (20,40)

  524288   4   31750   33560
  524288   4   29786   32114
  524288   4   29115   31476

.24 (20,40)

  524288   4   25022   32411
  524288   4   25375   31662
  524288   4   26407   33871


-- iozone -i 0 -r 4k -s 4g -e -c

.23 (20,40)

 4194304   4   39699   35550
 4194304   4   40225   36099


.24 (20,40)

 4194304   4   39961   41656
 4194304   4   39244   39673


Yanmin, for that benchmark you ran, what was it meant to measure?
From what I can make of it, it's just write cache benching.

One thing I don't understand is how the write numbers are so much lower
than the rewrite numbers. The iozone code (which gives me headaches,
damn what a mess) seems to suggest that the only thing that is different
is the lack of block allocation.

Linus posted a patch yesterday fixing up a regression in the ext3 bitmap
block allocator, /me goes apply that patch and rerun the tests.

> >  (20,40) - similar to your 8GB
> > 
> > 2.6.23.1-42.fc8 #1 SMP
> > 
> >   524288   4  225977  447461
> >   524288   4  232595  496848
> >   524288   4  220608  478076
> >   524288   4  203080  445230
> > 
> > 2.6.24-rc2 #28 SMP PREEMPT
> > 
> >   524288   4   54043   83585
> >   524288   4   69949  516253
> >   524288   4   72343  491416
> >   524288   4   71775  492653

2.6.24-rc2 +
patches/wu-reiser.patch
patches/writeback-early.patch
patches/bdi-task-dirty.patch
patches/bdi-sysfs.patch
patches/sched-hrtick.patch
patches/sched-rt-entity.patch
patches/sched-watchdog.patch
patches/linus-ext3-blockalloc.patch

  524288   4  179657  487676
  524288   4  173989  465682
  524288   4  175842  489800


Linus' patch is the one that makes the difference here. So I'm unsure
how you bisected it down to:

  04fbfdc14e5f48463820d6b9807daa5e9c92c51f

These results seem to point to

  7c9e69faa28027913ee059c285a5ea8382e24b5d

as being the offending patch.





Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-12 Thread Peter Zijlstra
Single socket, dual core opteron, 2GB memory
Single SATA disk, ext3

x86_64 kernel and userland

(dirty_background_ratio, dirty_ratio) tunables

 (5,10) - default

2.6.23.1-42.fc8 #1 SMP

  524288   4   59580   60356
  524288   4   59247   61101
  524288   4   61030   62831

2.6.24-rc2 #28 SMP PREEMPT

  524288   4   49277   56582
  524288   4   50728   61056
  524288   4   52027   59758
  524288   4   51520   62426


 (20,40) - similar to your 8GB

2.6.23.1-42.fc8 #1 SMP

  524288   4  225977  447461
  524288   4  232595  496848
  524288   4  220608  478076
  524288   4  203080  445230

2.6.24-rc2 #28 SMP PREEMPT

  524288   4   54043   83585
  524288   4   69949  516253
  524288   4   72343  491416
  524288   4   71775  492653

 (60,80) - overkill

2.6.23.1-42.fc8 #1 SMP

  524288   4  208450  491892
  524288   4  216262  481135
  524288   4  221892  543608
  524288   4  202209  574725
  524288   4  231730  452482

2.6.24-rc2 #28 SMP PREEMPT

  524288   4   49091   86471
  524288   4   65071  217566
  524288   4   72238  492172
  524288   4   71818  492433
  524288   4   71327  493954


While I see that the write speed as reported under .24 ~70MB/s is much
lower than the one reported under .23 ~200MB/s, I find it very hard to
believe my poor single SATA disk could actually do the 200MB/s for
longer than its cache 8/16 MB (not sure).

vmstat shows that actual IO is done, even though the whole 512MB could
fit in cache, hence my suspicion that the ~70MB/s is the most realistic
of the two.

I'll have to look into what iozone actually does though and why this
patch makes the output different.

FWIW - because it's a single backing dev it does get to 100% of the dirty
limit after a few runs, so I'm not sure what makes the difference.
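For reference, the (dirty_background_ratio, dirty_ratio) pairs above can be set
like this, e.g. for the (20,40) case:

#echo 20 > /proc/sys/vm/dirty_background_ratio
#echo 40 > /proc/sys/vm/dirty_ratio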




Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-12 Thread Martin Knoblauch
- Original Message 
> From: "Zhang, Yanmin" <[EMAIL PROTECTED]>
> To: Martin Knoblauch <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]; LKML 
> Sent: Monday, November 12, 2007 1:45:57 AM
> Subject: Re: iozone write 50% regression in kernel 2.6.24-rc1
> 
> On Fri, 2007-11-09 at 04:36 -0800, Martin Knoblauch wrote:
> > - Original Message 
> > > From: "Zhang, Yanmin" 
> > > To: [EMAIL PROTECTED]
> > > Cc: LKML 
> > > Sent: Friday, November 9, 2007 10:47:52 AM
> > > Subject: iozone write 50% regression in kernel 2.6.24-rc1
> > > 
> > > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has a 50%
> > > regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
> > > 
> > > My machine has 8 processor cores and 8GB memory.
> > > 
> > > By bisect, I located patch
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> > > 
> > > Another behavior: with kernel 2.6.23, if I run iozone many times after
> > > rebooting the machine, the result looks stable. But with 2.6.24-rc1, the
> > > first run of iozone got a very small result and the following runs get
> > > 4x the original result.
> > > 
> > > What I reported is the regression of the 2nd/3rd run, because the first
> > > run has a bigger regression.
> > > 
> > > I also tried to change /proc/sys/vm/dirty_ratio,dirty_background_ratio
> > > and didn't get improvement.
> >  Could you tell us the exact iozone command you are using?
> iozone -i 0 -r 4k -s 512m
> 

 OK, I definitely do not see the reported effect. On a HP ProLiant with a RAID5
on CCISS I get:

2.6.19.2: 654-738 MB/sec write, 1126-1154 MB/sec rewrite
2.6.24-rc2: 772-820 MB/sec write, 1495-1539 MB/sec rewrite

 The first run is always slowest; all subsequent runs are faster and the same speed.

> 
> >  I would like to repeat it on my setup, because I definitely see the
> > opposite behaviour in 2.6.24-rc1/rc2. The speed there is much better
> > than in 2.6.22 and before (I skipped 2.6.23, because I was waiting for
> > the per-bdi changes). I definitely do not see the difference between 1st
> > and subsequent runs. But then, I do my tests with 5GB file sizes like:
> > 
> > iozone3_283/src/current/iozone -t 5 -F /scratch/X1 /scratch/X2 /scratch/X3 /scratch/X4 /scratch/X5 -s 5000M -r 1024 -c -e -i 0 -i 1
> My machine uses SATA (AHCI) disk.
> 

 4x72GB SCSI disks building a RAID5 on a CCISS controller with battery backed 
write cache. Systems are 2 CPUs (64-bit) with 8 GB memory. I could test on some 
IBM boxes (2x dual core, 8 GB) with RAID5 on "aacraid", but I need some time to 
free up one of the boxes.

Cheers
Martin





Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-12 Thread Zhang, Yanmin
On Mon, 2007-11-12 at 10:45 +0100, Peter Zijlstra wrote:
> On Mon, 2007-11-12 at 10:14 +0800, Zhang, Yanmin wrote:
> 
> > > Subject: mm: speed up writeback ramp-up on clean systems
> >
> > I tested kernels 2.6.23, 2.6.24-rc2, and 2.6.24-rc2_peter (2.6.24-rc2 + this patch).
> > 
> > 1) Comparison among the first/second/following runs:
> > 2.6.23: the second run of iozone gets about a 28% improvement over the first run.
> > Following runs are very stable, like the 2nd run.
> > 2.6.24-rc2: the second run of iozone gets about a 170% improvement over the
> > first run. The 3rd run gets about an 80% improvement over the 2nd. Following
> > runs are very stable, like the 3rd run.
> > 2.6.24-rc2_peter: the second run of iozone gets about a 14% improvement over
> > the first run. Following runs are mostly stable, like the 2nd run.
> > So the new patch really improves the first run result. Comparing with
> > 2.6.24-rc2, 2.6.24-rc2_peter has a 330% improvement on the first run.
> > 
> > 2) Comparison among different kernels (based on the stable highest result):
> > 2.6.24-rc2 has about a 50% regression compared with 2.6.23.
> > 2.6.24-rc2_peter has the same result as 2.6.24-rc2.
> >
> > From this point of view, the above patch has no improvement. :)
> 
> Drat, still good test results though.
> 
> Could you describe your system in detail? That is, you have 8GB of memory
> and 8 cpus (2*quad?).
Yes.

>  How many disks does it have
One machine uses a single AHCI SATA disk. The other machines use hardware RAID0.

>  and are those
> aggregated using md or dm?
No.

>  What filesystem do you use?
Ext3.

I got the regression on a couple of my machines. Pls. try the command
#iozone -i 0 -r 4k -s 512m

-yanmin


Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-12 Thread Peter Zijlstra

On Mon, 2007-11-12 at 10:14 +0800, Zhang, Yanmin wrote:

> > Subject: mm: speed up writeback ramp-up on clean systems
>
> I tested kernels 2.6.23, 2.6.24-rc2, and 2.6.24-rc2_peter (2.6.24-rc2 + this patch).
> 
> 1) Comparison among the first/second/following runs:
> 2.6.23: the second run of iozone gets about a 28% improvement over the first run.
>   Following runs are very stable, like the 2nd run.
> 2.6.24-rc2: the second run of iozone gets about a 170% improvement over the
>   first run. The 3rd run gets about an 80% improvement over the 2nd. Following
>   runs are very stable, like the 3rd run.
> 2.6.24-rc2_peter: the second run of iozone gets about a 14% improvement over
>   the first run. Following runs are mostly stable, like the 2nd run.
> So the new patch really improves the first run result. Comparing with
> 2.6.24-rc2, 2.6.24-rc2_peter has a 330% improvement on the first run.
> 
> 2) Comparison among different kernels (based on the stable highest result):
> 2.6.24-rc2 has about a 50% regression compared with 2.6.23.
> 2.6.24-rc2_peter has the same result as 2.6.24-rc2.
>
> From this point of view, the above patch has no improvement. :)

Drat, still good test results though.

Could you describe your system in detail? That is, you have 8GB of memory
and 8 cpus (2*quad?). How many disks does it have, and are those
aggregated using md or dm? What filesystem do you use?
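For completeness, one way to collect those details on the test box (standard
commands, nothing thread-specific):

#grep -c ^processor /proc/cpuinfo    # number of cpus
#free -m                             # memory size
#cat /proc/mdstat                    # md aggregation, if any
#mount -t ext3                       # ext3 filesystems in use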






Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-11 Thread Zhang, Yanmin
On Fri, 2007-11-09 at 10:54 +0100, Peter Zijlstra wrote:
> On Fri, 2007-11-09 at 17:47 +0800, Zhang, Yanmin wrote:
> > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has a 50%
> > regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
> > 
> > My machine has 8 processor cores and 8GB memory.
> > 
> > By bisect, I located patch
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> > 
> > 
> > Another behavior: with kernel 2.6.23, if I run iozone many times after
> > rebooting the machine, the result looks stable. But with 2.6.24-rc1, the
> > first run of iozone got a very small result and the following runs get
> > 4x the original result.
> 
> So the second run is 4x as fast as the first run?
Pls. see the comments below.

> 
> > What I reported is the regression of the 2nd/3rd run, because the first run
> > has a bigger regression.
> 
> So the 2nd and 3rd run are stable at 50% slower than .23?
Almost. I did more testing today. Pls. see the result list below.

> 
> > I also tried to change /proc/sys/vm/dirty_ratio,dirty_background_ratio and
> > didn't get improvement.
> 
> Could you try:
> 
> ---
> Subject: mm: speed up writeback ramp-up on clean systems
I tested kernels 2.6.23, 2.6.24-rc2, and 2.6.24-rc2_peter (2.6.24-rc2 + this patch).

1) Comparison among the first/second/following runs:
2.6.23: the second run of iozone gets about a 28% improvement over the first run.
Following runs are very stable, like the 2nd run.
2.6.24-rc2: the second run of iozone gets about a 170% improvement over the first
run. The 3rd run gets about an 80% improvement over the 2nd. Following runs are
very stable, like the 3rd run.
2.6.24-rc2_peter: the second run of iozone gets about a 14% improvement over the
first run. Following runs are mostly stable, like the 2nd run.
So the new patch really improves the first run result. Comparing with 2.6.24-rc2,
2.6.24-rc2_peter has a 330% improvement on the first run.

2) Comparison among different kernels (based on the stable highest result):
2.6.24-rc2 has about a 50% regression compared with 2.6.23.
2.6.24-rc2_peter has the same result as 2.6.24-rc2.
From this point of view, the above patch has no improvement. :)

-yanmin


Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-11 Thread Zhang, Yanmin
On Fri, 2007-11-09 at 04:36 -0800, Martin Knoblauch wrote:
> - Original Message 
> > From: "Zhang, Yanmin" <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Cc: LKML 
> > Sent: Friday, November 9, 2007 10:47:52 AM
> > Subject: iozone write 50% regression in kernel 2.6.24-rc1
> > 
> > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has a 50%
> > regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
> > 
> > My machine has 8 processor cores and 8GB memory.
> > 
> > By bisect, I located patch
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> > 
> > Another behavior: with kernel 2.6.23, if I run iozone many times after
> > rebooting the machine, the result looks stable. But with 2.6.24-rc1, the
> > first run of iozone got a very small result and the following runs get
> > 4x the original result.
> > 
> > What I reported is the regression of the 2nd/3rd run, because the first run
> > has a bigger regression.
> > 
> > I also tried to change /proc/sys/vm/dirty_ratio,dirty_background_ratio
> > and didn't get improvement.
>  Could you tell us the exact iozone command you are using?
iozone -i 0 -r 4k -s 512m


>  I would like to repeat it on my setup, because I definitely see the opposite 
> behaviour in 2.6.24-rc1/rc2. The speed there is much better than in 2.6.22 
> and before (I skipped 2.6.23, because I was waiting for the per-bdi changes). 
> I definitely do not see the difference between 1st and subsequent runs. But 
> then, I do my tests with 5GB file sizes like:
> 
> iozone3_283/src/current/iozone -t 5 -F /scratch/X1 /scratch/X2 /scratch/X3 
> /scratch/X4 /scratch/X5 -s 5000M -r 1024 -c -e -i 0 -i 1
My machine uses SATA (AHCI) disk.

-yanmin


Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-11 Thread Zhang, Yanmin
On Fri, 2007-11-09 at 10:54 +0100, Peter Zijlstra wrote:
> On Fri, 2007-11-09 at 17:47 +0800, Zhang, Yanmin wrote:
> > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has a 50%
> > regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
> > 
> > My machine has 8 processor cores and 8GB memory.
> > 
> > By bisecting, I located patch
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> > 
> > Another behavior: with kernel 2.6.23, if I run iozone many times after
> > rebooting the machine, the results look stable. But with 2.6.24-rc1, the
> > first run of iozone got a very small result and the following runs got
> > about 4x the first run's result.
> 
> So the second run is 4x as fast as the first run?
Please see my comments below.

> > What I reported is the regression of the 2nd/3rd run, because the first
> > run has a bigger regression.
> 
> So the 2nd and 3rd runs are stable at 50% slower than .23?
Almost. I did more testing today. Please see the result list below.

> > I also tried to change /proc/sys/vm/dirty_ratio and
> > /proc/sys/vm/dirty_background_ratio and didn't get an improvement.
> 
> Could you try:
> 
> ---
> Subject: mm: speed up writeback ramp-up on clean systems
I tested kernels 2.6.23, 2.6.24-rc2, and 2.6.24-rc2_peter (2.6.24-rc2 + this
patch).

1) Comparing the first/second/following runs:
2.6.23: the second run of iozone gets about a 28% improvement over the first
run. Following runs are very stable, like the 2nd run.
2.6.24-rc2: the second run of iozone gets about a 170% improvement over the
first run. The 3rd run gets about an 80% improvement over the 2nd. Following
runs are very stable, like the 3rd run.
2.6.24-rc2_peter: the second run of iozone gets about a 14% improvement over
the first run. Following runs are mostly stable, like the 2nd run.
So the new patch really improves the first-run result. Comparing with
2.6.24-rc2, 2.6.24-rc2_peter has about a 330% improvement on the first run.

2) Comparing the different kernels (based on the stable highest result):
2.6.24-rc2 has about a 50% regression from 2.6.23.
2.6.24-rc2_peter has the same result as 2.6.24-rc2.
From this point of view, the above patch gives no improvement. :)

-yanmin
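
The percentages above are self-consistent if 2.6.24-rc2 and 2.6.24-rc2_peter
converge to the same stable result, as stated in 2). A small sketch of that
arithmetic (the normalization to a common stable result S is an assumption):

#include <stdio.h>

int main(void)
{
	double S = 1.0;				/* common stable result, normalized */
	double rc2_first = S / (2.70 * 1.80);	/* +170% to the 2nd run, +80% to the 3rd */
	double peter_first = S / 1.14;		/* +14% to the (stable) 2nd run */

	/* first-run gain of 2.6.24-rc2_peter over plain 2.6.24-rc2 */
	printf("first-run improvement: about %.0f%%\n",
	       (peter_first / rc2_first - 1.0) * 100.0);
	return 0;
}

This prints about 326%, matching the "about 330%" figure reported above.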


Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-09 Thread Martin Knoblauch
- Original Message 
> From: "Zhang, Yanmin" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: LKML <linux-kernel@vger.kernel.org>
> Sent: Friday, November 9, 2007 10:47:52 AM
> Subject: iozone write 50% regression in kernel 2.6.24-rc1
> 
> Comparing with 2.6.23, iozone sequential write/rewrite (512M) has a 50%
> regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
> 
> My machine has 8 processor cores and 8GB memory.
> 
> By bisecting, I located patch
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> 
> Another behavior: with kernel 2.6.23, if I run iozone many times after
> rebooting the machine, the results look stable. But with 2.6.24-rc1, the
> first run of iozone got a very small result and the following runs got
> about 4x the first run's result.
> 
> What I reported is the regression of the 2nd/3rd run, because the first
> run has a bigger regression.
> 
> I also tried to change /proc/sys/vm/dirty_ratio and
> /proc/sys/vm/dirty_background_ratio and didn't get an improvement.
> 
> -yanmin
> -
Hi Yanmin,

Could you tell us the exact iozone command you are using? I would like to
repeat it on my setup, because I definitely see the opposite behaviour in
2.6.24-rc1/rc2. The speed there is much better than in 2.6.22 and before (I
skipped 2.6.23, because I was waiting for the per-bdi changes). I definitely
do not see the difference between 1st and subsequent runs. But then, I do my
tests with 5GB file sizes like:

iozone3_283/src/current/iozone -t 5 -F /scratch/X1 /scratch/X2 /scratch/X3
/scratch/X4 /scratch/X5 -s 5000M -r 1024 -c -e -i 0 -i 1

Kind regards
Martin





Re: iozone write 50% regression in kernel 2.6.24-rc1

2007-11-09 Thread Peter Zijlstra
On Fri, 2007-11-09 at 17:47 +0800, Zhang, Yanmin wrote:
> Comparing with 2.6.23, iozone sequential write/rewrite (512M) has a 50%
> regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
> 
> My machine has 8 processor cores and 8GB memory.
> 
> By bisecting, I located patch
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> 
> Another behavior: with kernel 2.6.23, if I run iozone many times after
> rebooting the machine, the results look stable. But with 2.6.24-rc1, the
> first run of iozone got a very small result and the following runs got
> about 4x the first run's result.

So the second run is 4x as fast as the first run?

> What I reported is the regression of the 2nd/3rd run, because the first
> run has a bigger regression.

So the 2nd and 3rd runs are stable at 50% slower than .23?

> I also tried to change /proc/sys/vm/dirty_ratio and
> /proc/sys/vm/dirty_background_ratio and didn't get an improvement.

Could you try:

---
Subject: mm: speed up writeback ramp-up on clean systems

We allow violation of bdi limits if there is a lot of room on the
system. Once we hit half the total limit we start enforcing bdi limits
and bdi ramp-up should happen. Doing it this way avoids many small
writeouts on an otherwise idle system and should also speed up the
ramp-up.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
Reviewed-by: Fengguang Wu <[EMAIL PROTECTED]> 
---
 mm/page-writeback.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c  2007-09-28 10:08:33.937415368 +0200
+++ linux-2.6/mm/page-writeback.c   2007-09-28 10:54:26.018247516 +0200
@@ -355,8 +355,8 @@ get_dirty_limits(long *pbackground, long
  */
 static void balance_dirty_pages(struct address_space *mapping)
 {
-   long bdi_nr_reclaimable;
-   long bdi_nr_writeback;
+   long nr_reclaimable, bdi_nr_reclaimable;
+   long nr_writeback, bdi_nr_writeback;
long background_thresh;
long dirty_thresh;
long bdi_thresh;
@@ -376,11 +376,26 @@ static void balance_dirty_pages(struct a
 
get_dirty_limits(&background_thresh, &dirty_thresh,
&bdi_thresh, bdi);
+
+   nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
+   global_page_state(NR_UNSTABLE_NFS);
+   nr_writeback = global_page_state(NR_WRITEBACK);
+
bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
+
if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
break;
 
+   /*
+* Throttle it only when the background writeback cannot
+* catch-up. This avoids (excessively) small writeouts
+* when the bdi limits are ramping up.
+*/
+   if (nr_reclaimable + nr_writeback <
+   (background_thresh + dirty_thresh) / 2)
+   break;
+
if (!bdi->dirty_exceeded)
bdi->dirty_exceeded = 1;
 



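To make the new check concrete, here is a standalone sketch of the condition
the patch adds, using assumed example values (8GB of memory, 4KB pages,
dirty_ratio=10, dirty_background_ratio=5); the real get_dirty_limits() also
accounts for free and highmem pages, which this ignores:

#include <stdio.h>

int main(void)
{
	long total_pages = 8L * 1024 * 1024 / 4;	/* 8GB / 4KB = 2097152 pages */
	long dirty_thresh = total_pages * 10 / 100;	/* dirty_ratio=10 */
	long background_thresh = total_pages * 5 / 100;	/* dirty_background_ratio=5 */
	long nr_dirty = 100000;				/* example global dirty + writeback */

	/*
	 * The patch skips per-bdi throttling while the global count is
	 * below the midpoint of the two thresholds, so the bdi limits
	 * can ramp up without many small writeouts.
	 */
	if (nr_dirty < (background_thresh + dirty_thresh) / 2)
		printf("%ld < %ld: not throttled yet\n",
		       nr_dirty, (background_thresh + dirty_thresh) / 2);
	else
		printf("throttling against the per-bdi limit\n");
	return 0;
}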


iozone write 50% regression in kernel 2.6.24-rc1

2007-11-09 Thread Zhang, Yanmin
Comparing with 2.6.23, iozone sequential write/rewrite (512M) has a 50%
regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.

My machine has 8 processor cores and 8GB memory.

By bisecting, I located patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.

Another behavior: with kernel 2.6.23, if I run iozone many times after
rebooting the machine, the results look stable. But with 2.6.24-rc1, the
first run of iozone got a very small result and the following runs got
about 4x the first run's result.

What I reported is the regression of the 2nd/3rd run, because the first run
has a bigger regression.

I also tried to change /proc/sys/vm/dirty_ratio and
/proc/sys/vm/dirty_background_ratio and didn't get an improvement.

-yanmin
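
For reference, the two knobs mentioned above are the files
/proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio. A minimal
sketch of setting them from C, with arbitrary example values (equivalent to
"echo 40 > /proc/sys/vm/dirty_ratio" from a root shell):

#include <stdio.h>

static int write_sysctl(const char *path, int val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%d\n", val);
	return fclose(f);
}

int main(void)
{
	/* example values only; the defaults depend on the kernel version */
	if (write_sysctl("/proc/sys/vm/dirty_ratio", 40) ||
	    write_sysctl("/proc/sys/vm/dirty_background_ratio", 10))
		perror("write_sysctl");
	return 0;
}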