Re: iozone write 50% regression in kernel 2.6.24-rc1
On Tue, 2007-11-13 at 16:34 +0800, Zhang, Yanmin wrote:

> My new bisect captured 7c9e69faa28027913ee059c285a5ea8382e24b5d
> which caused the regression of the iozone following runs (3rd/4th... run
> after mounting the ext3 partition).

Linus just reverted that commit with commit:

commit 0b832a4b93932103d73c0c3f35ef1153e288327b
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date:   Tue Nov 13 08:07:31 2007 -0800

    Revert "ext2/ext3/ext4: add block bitmap validation"

    This reverts commit 7c9e69faa28027913ee059c285a5ea8382e24b5d, fixing up
    conflicts in fs/ext4/balloc.c manually.

    The cost of doing the bitmap validation on each lookup - even when the
    bitmap is cached - is absolutely prohibitive.  We could, and probably
    should, do it only when adding the bitmap to the buffer cache.  However,
    right now we are better off just reverting it.

    Peter Zijlstra measured the cost of this extra validation as a 85%
    decrease in cached iozone, and while I had a patch that took it down to
    just 17% by not being _quite_ so stupid in the validation, it was still
    a big slowdown that could have been avoided by just doing it right.
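The commit message above names the cheaper design: validate the bitmap once,
when it first enters the buffer cache, instead of on every lookup. A small
Python model of that idea, for illustration only — the class and function
names are made up, and the real fix would operate on ext3 buffer heads, not
Python objects:

```python
# Toy model of the two validation strategies from the commit message.
# All names here are hypothetical; in the kernel this state would live
# alongside the buffer_head for the bitmap block.

class CachedBitmap:
    def __init__(self, data):
        self.data = data
        self.validated = False   # set once the expensive check has run
        self.checks = 0          # how many times full validation ran

def validate(bitmap):
    """Stand-in for the expensive full scan of the block bitmap."""
    bitmap.checks += 1
    # ... range and consistency checks would go here ...
    return True                  # assume the bitmap is consistent

def lookup_validate_always(bitmap):
    """The reverted patch's behaviour: pay the full cost on every lookup."""
    return bitmap if validate(bitmap) else None

def lookup_validate_once(bitmap):
    """The suggested behaviour: validate only when first cached."""
    if not bitmap.validated:
        if not validate(bitmap):
            return None
        bitmap.validated = True
    return bitmap

a, b = CachedBitmap(b"\xff" * 512), CachedBitmap(b"\xff" * 512)
for _ in range(1000):            # 1000 cached lookups, e.g. a hot iozone run
    lookup_validate_always(a)
    lookup_validate_once(b)

print(f"validate-always ran the scan {a.checks} times")   # 1000
print(f"validate-once ran the scan {b.checks} time(s)")   # 1
```

The difference between the two counters is exactly the per-lookup cost that
showed up as the 85% cached-iozone slowdown.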
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Tue, 2007-11-13 at 10:19 +0800, Zhang, Yanmin wrote:
> On Mon, 2007-11-12 at 17:48 +0100, Peter Zijlstra wrote:
> > On Mon, 2007-11-12 at 17:05 +0200, Benny Halevy wrote:
> > > On Nov. 12, 2007, 15:26 +0200, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > > Single socket, dual core opteron, 2GB memory
> > > > Single SATA disk, ext3
> > > >
> > > > 2.6.23.1-42.fc8 #1 SMP
> > > >
> > > > 524288      4  225977  447461
> > > > 524288      4  232595  496848
> > > > 524288      4  220608  478076
> > > > 524288      4  203080  445230
> > > >
> > > > 2.6.24-rc2 #28 SMP PREEMPT
> > > >
> > > > 524288      4   54043   83585
> > > > 524288      4   69949  516253
> > > > 524288      4   72343  491416
> > > > 524288      4   71775  492653
> >
> > 2.6.24-rc2 +
> >   patches/wu-reiser.patch
> >   patches/writeback-early.patch
> >   patches/bdi-task-dirty.patch
> >   patches/bdi-sysfs.patch
> >   patches/sched-hrtick.patch
> >   patches/sched-rt-entity.patch
> >   patches/sched-watchdog.patch
> >   patches/linus-ext3-blockalloc.patch
> >
> > 524288      4  179657  487676
> > 524288      4  173989  465682
> > 524288      4  175842  489800
> >
> > Linus' patch is the one that makes the difference here. So I'm unsure
> > how you bisected it down to:
> >
> >   04fbfdc14e5f48463820d6b9807daa5e9c92c51f
>
> Originally, my test suite just picked up the result of the first run. Your
> prior patch (speed up writeback ramp-up on clean systems) fixed an issue
> with the first-run result regression, so my bisect captured it.
>
> However, later on I found the following runs have different results. A
> moment ago, I retested 04fbfdc14e5f48463820d6b9807daa5e9c92c51f by:
>
>   #git checkout 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
>   #make
>
> Then I reverted your patch. It looks like
> 04fbfdc14e5f48463820d6b9807daa5e9c92c51f is not the root cause of the
> following-run regression. I will change my test suite to run iozone many
> times and do a new bisect.
> > These results seem to point to
> >
> >   7c9e69faa28027913ee059c285a5ea8382e24b5d

My new bisect captured 7c9e69faa28027913ee059c285a5ea8382e24b5d, which
caused the regression of the iozone following runs (3rd/4th... run after
mounting the ext3 partition).

Peter,

Where could I download Linus' new patches, especially
patches/linus-ext3-blockalloc.patch? I couldn't find it in my archives of
LKML mails.

yanmin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Mon, 2007-11-12 at 17:48 +0100, Peter Zijlstra wrote:
> On Mon, 2007-11-12 at 17:05 +0200, Benny Halevy wrote:
> > On Nov. 12, 2007, 15:26 +0200, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > Single socket, dual core opteron, 2GB memory
> > > Single SATA disk, ext3
> > >
> > > x86_64 kernel and userland
> > >
> > > (dirty_background_ratio, dirty_ratio) tunables
> > >
> > > (5,10) - default
> > >
> > > 2.6.23.1-42.fc8 #1 SMP
> > >
> > > 524288      4   59580   60356
> > > 524288      4   59247   61101
> > > 524288      4   61030   62831
> > >
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > >
> > > 524288      4   49277   56582
> > > 524288      4   50728   61056
> > > 524288      4   52027   59758
> > > 524288      4   51520   62426
> > >
> > > (20,40) - similar to your 8GB
> > >
> > > 2.6.23.1-42.fc8 #1 SMP
> > >
> > > 524288      4  225977  447461
> > > 524288      4  232595  496848
> > > 524288      4  220608  478076
> > > 524288      4  203080  445230
> > >
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > >
> > > 524288      4   54043   83585
> > > 524288      4   69949  516253
> > > 524288      4   72343  491416
> > > 524288      4   71775  492653
> > >
> > > (60,80) - overkill
> > >
> > > 2.6.23.1-42.fc8 #1 SMP
> > >
> > > 524288      4  208450  491892
> > > 524288      4  216262  481135
> > > 524288      4  221892  543608
> > > 524288      4  202209  574725
> > > 524288      4  231730  452482
> > >
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > >
> > > 524288      4   49091   86471
> > > 524288      4   65071  217566
> > > 524288      4   72238  492172
> > > 524288      4   71818  492433
> > > 524288      4   71327  493954
> > >
> > > While I see that the write speed as reported under .24 ~70MB/s is much
> > > lower than the one reported under .23 ~200MB/s, I find it very hard to
> > > believe my poor single SATA disk could actually do the 200MB/s for
> > > longer than its cache 8/16 MB (not sure).
> > >
> > > vmstat shows that actual IO is done, even though the whole 512MB could
> > > fit in cache, hence my suspicion that the ~70MB/s is the most realistic
> > > of the two.
> >
> > Even 70 MB/s seems too high. What throughput do you see for the
> > raw disk partition?
> >
> > Also, are the numbers above for successive runs?
> > It seems like you're seeing some caching effects so
> > I'd recommend using a file larger than your cache size and
> > the -e and -c options (to include fsync and close in timings)
> > to try to eliminate them.
>
> -- iozone -i 0 -r 4k -s 512m -e -c
>
> .23 (20,40)
>
> 524288      4   31750   33560
> 524288      4   29786   32114
> 524288      4   29115   31476
>
> .24 (20,40)
>
> 524288      4   25022   32411
> 524288      4   25375   31662
> 524288      4   26407   33871
>
> -- iozone -i 0 -r 4k -s 4g -e -c
>
> .23 (20,40)
>
> 4194304     4   39699   35550
> 4194304     4   40225   36099
>
> .24 (20,40)
>
> 4194304     4   39961   41656
> 4194304     4   39244   39673
>
> Yanmin, for that benchmark you ran, what was it meant to measure?
> From what I can make of it its just write cache benching.

Yeah. It's quite related to cache. I did more testing on my Stoakley
machine (8 cores, 8GB memory). If I reduce the memory to 4GB, the speed
is far slower.

> One thing I don't understand is how the write numbers are so much lower
> than the rewrite numbers. The iozone code (which gives me headaches,
> damn what a mess) seems to suggest that the only thing that is different
> is the lack of block allocation.

It might be a good direction.

> Linus posted a patch yesterday fixing up a regression in the ext3 bitmap
> block allocator, /me goes apply that patch and rerun the tests.
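A quick back-of-the-envelope check of the (20,40) numbers in this thread
(columns are file size in kB, record size in kB, write kB/s, rewrite kB/s)
makes the shape of the regression explicit: steady-state write throughput
drops by roughly two thirds between .23 and .24, while steady-state rewrite
throughput is essentially unchanged — which is what points at block
allocation rather than writeback. A small Python sketch:

```python
# Steady-state (runs 2-4) write/rewrite throughput in kB/s, taken from
# the (20,40) tables quoted in this thread.
write_23 = [232595, 220608, 203080]      # 2.6.23
write_24 = [69949, 72343, 71775]         # 2.6.24-rc2
rewrite_23 = [496848, 478076, 445230]
rewrite_24 = [516253, 491416, 492653]

def mean(xs):
    return sum(xs) / len(xs)

def regression(old, new):
    """Percentage drop from old to new (positive = slower)."""
    return 100.0 * (mean(old) - mean(new)) / mean(old)

print(f"write regression:   {regression(write_23, write_24):.0f}%")
print(f"rewrite regression: {regression(rewrite_23, rewrite_24):.0f}%")
```

Writes regress about 67%; rewrites actually come out slightly ahead. Since
rewrite skips block allocation, the allocator (and its bitmap validation)
is the prime suspect.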
> > (20,40) - similar to your 8GB
> >
> > 2.6.23.1-42.fc8 #1 SMP
> >
> > 524288      4  225977  447461
> > 524288      4  232595  496848
> > 524288      4  220608  478076
> > 524288      4  203080  445230
> >
> > 2.6.24-rc2 #28 SMP PREEMPT
> >
> > 524288      4   54043   83585
> > 524288      4   69949  516253
> > 524288      4   72343  491416
> > 524288      4   71775  492653

2.6.24-rc2 +
  patches/wu-reiser.patch
  patches/writeback-early.patch
  patches/bdi-task-dirty.patch
  patches/bdi-sysfs.patch
  patches/sched-hrtick.patch
  patches/sched-rt-entity.patch
  patches/sched-watchdog.patch
  patches/linus-ext3-blockalloc.patch

524288      4  179657  487676
524288      4  173989  465682
524288      4  175842  489800

Linus' patch is the one that makes the difference here. So I'm unsure
how you bisected it down to:

  04fbfdc14e5f48463820d6b9807daa5e9c92c51f

These results seem to point to

  7c9e69faa28027913ee059c285a5ea8382e24b5d

as being the offending patch.
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Mon, 2007-11-12 at 04:58 -0800, Martin Knoblauch wrote:
> ----- Original Message ----
> > From: "Zhang, Yanmin" <[EMAIL PROTECTED]>
> > To: Martin Knoblauch <[EMAIL PROTECTED]>
> > Cc: [EMAIL PROTECTED]; LKML
> > Sent: Monday, November 12, 2007 1:45:57 AM
> > Subject: Re: iozone write 50% regression in kernel 2.6.24-rc1
> >
> > On Fri, 2007-11-09 at 04:36 -0800, Martin Knoblauch wrote:
> > > > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has
> > > > 50% regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same
> > > > regression.
> > > >
> > > > My machine has 8 processor cores and 8GB memory.
> > > >
> > > > By bisect, I located patch
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> > > >
> > > > Another behavior: with kernel 2.6.23, if I run iozone many times
> > > > after rebooting the machine, the result looks stable. But with
> > > > 2.6.24-rc1, the first run of iozone got a very small result and
> > > > the following runs have 4 x orig_result.
> > > >
> > > > What I reported is the regression of the 2nd/3rd run, because the
> > > > first run has a bigger regression.
> > > >
> > > > I also tried to change
> > > > /proc/sys/vm/dirty_ratio,dirty_background_ratio and didn't get an
> > > > improvement.
> > > could you tell us the exact iozone command you are using?
> > iozone -i 0 -r 4k -s 512m
>
> OK, I definitely do not see the reported effect. On a HP Proliant with
> a RAID5 on CCISS I get:
>
> 2.6.19.2:   654-738 MB/sec write, 1126-1154 MB/sec rewrite
> 2.6.24-rc2: 772-820 MB/sec write, 1495-1539 MB/sec rewrite
>
> The first run is always slowest, all subsequent runs are faster and the
> same speed.

Although the first run is always the slowest, if we compare 2.6.23 and
2.6.24-rc we find the first-run result of 2.6.23 is 7 times that of
2.6.24-rc.

Originally, my test suite just picked up the result of the first run. I
might change my test suite to run iozone many times. For now, I run the
test manually many times after the machine reboots. Comparing 2.6.24-rc
with 2.6.23, the 3rd and following runs of 2.6.24-rc have about 50%
regression.
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Mon, 2007-11-12 at 12:25 -0500, Mark Lord wrote:
> Peter Zijlstra wrote:
> ..
> > While I see that the write speed as reported under .24 ~70MB/s is much
> > lower than the one reported under .23 ~200MB/s, I find it very hard to
> > believe my poor single SATA disk could actually do the 200MB/s for
> > longer than its cache 8/16 MB (not sure).
> >
> > vmstat shows that actual IO is done, even though the whole 512MB could
> > fit in cache, hence my suspicion that the ~70MB/s is the most realistic
> > of the two.
> ..
>
> Yeah, sequential 70MB/sec is quite realistic for a modern SATA drive.
>
> But significantly faster than that (say, 100MB/sec +) is unlikely at
> present.

I just use the command '#iozone -i 0 -r 4k -s 512m', without '-e -c'. So
if we consider the cache, the speed is very fast. On my machine with
2.6.23, the write speed is 631MB/s, quite fast. :)
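The 631MB/s figure only makes sense as a page-cache number: without -e/-c,
iozone stops its timer before fsync, so a 512MB file that fits entirely in
RAM is timed mostly at memory speed. A rough sanity check — the disk speed
is the ballpark figure quoted in this thread, not a measurement:

```python
# Rough sanity check: how long does writing a 512 MB file take at the
# throughput figures quoted in this thread?
file_mb = 512

apparent_mb_s = 631   # 2.6.23, no -e/-c: timer stops before fsync
disk_mb_s = 70        # realistic sustained rate for a 2007-era SATA disk

t_apparent = file_mb / apparent_mb_s   # under a second: memory-speed copy
t_disk = file_mb / disk_mb_s           # several seconds: what the platter does

# The apparent rate is ~9x what the disk can sustain, so most of the
# "write" must have landed in the page cache, not on the platter.
ratio = apparent_mb_s / disk_mb_s
print(f"apparent/disk throughput ratio: {ratio:.1f}x")
assert ratio > 5, "without fsync in the timing, cache dominates"
```

This is why Benny's suggestion of -e and -c (fsync and close in the
timings) pulls both kernels down to the same ~30-40MB/s range.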
Re: iozone write 50% regression in kernel 2.6.24-rc1
Peter Zijlstra wrote:
..
> While I see that the write speed as reported under .24 ~70MB/s is much
> lower than the one reported under .23 ~200MB/s, I find it very hard to
> believe my poor single SATA disk could actually do the 200MB/s for
> longer than its cache 8/16 MB (not sure).
>
> vmstat shows that actual IO is done, even though the whole 512MB could
> fit in cache, hence my suspicion that the ~70MB/s is the most realistic
> of the two.
..

Yeah, sequential 70MB/sec is quite realistic for a modern SATA drive.

But significantly faster than that (say, 100MB/sec +) is unlikely at
present.
Re: iozone write 50% regression in kernel 2.6.24-rc1
Single socket, dual core opteron, 2GB memory
Single SATA disk, ext3

x86_64 kernel and userland

(dirty_background_ratio, dirty_ratio) tunables

(5,10) - default

2.6.23.1-42.fc8 #1 SMP

524288      4   59580   60356
524288      4   59247   61101
524288      4   61030   62831

2.6.24-rc2 #28 SMP PREEMPT

524288      4   49277   56582
524288      4   50728   61056
524288      4   52027   59758
524288      4   51520   62426

(20,40) - similar to your 8GB

2.6.23.1-42.fc8 #1 SMP

524288      4  225977  447461
524288      4  232595  496848
524288      4  220608  478076
524288      4  203080  445230

2.6.24-rc2 #28 SMP PREEMPT

524288      4   54043   83585
524288      4   69949  516253
524288      4   72343  491416
524288      4   71775  492653

(60,80) - overkill

2.6.23.1-42.fc8 #1 SMP

524288      4  208450  491892
524288      4  216262  481135
524288      4  221892  543608
524288      4  202209  574725
524288      4  231730  452482

2.6.24-rc2 #28 SMP PREEMPT

524288      4   49091   86471
524288      4   65071  217566
524288      4   72238  492172
524288      4   71818  492433
524288      4   71327  493954

While I see that the write speed as reported under .24 ~70MB/s is much
lower than the one reported under .23 ~200MB/s, I find it very hard to
believe my poor single SATA disk could actually do the 200MB/s for
longer than its cache 8/16 MB (not sure).

vmstat shows that actual IO is done, even though the whole 512MB could
fit in cache, hence my suspicion that the ~70MB/s is the most realistic
of the two.

I'll have to look into what iozone actually does though and why this
patch makes the output different.

FWIW - because it's a single backing dev it does get to 100% of the
dirty limit after a few runs, so not sure what makes the difference.
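The (dirty_background_ratio, dirty_ratio) pairs above translate into
absolute dirty-memory thresholds; on the 2GB test box they determine
whether the 512MB iozone file can sit dirty in the page cache before
background and then synchronous writeback kick in. A simplified
calculation — the kernel actually computes these over "dirtyable" memory,
which is somewhat less than total RAM, so these are upper bounds:

```python
# Simplified dirty-threshold arithmetic for the 2GB test machine.
# Real kernels use dirtyable memory (free + reclaimable pages), which is
# somewhat less than total RAM, so these figures are upper bounds.
total_mb = 2048
file_mb = 512

for background, dirty in [(5, 10), (20, 40), (60, 80)]:
    bg_mb = total_mb * background // 100   # background writeback starts here
    limit_mb = total_mb * dirty // 100     # writers get throttled here
    fits = "yes" if file_mb <= limit_mb else "no"
    print(f"({background},{dirty}): background at {bg_mb} MB, "
          f"throttle at {limit_mb} MB, 512 MB file fits dirty: {fits}")
```

With (5,10) the 512MB file blows through the ~204MB throttle limit, so the
run is disk-bound (~60MB/s in the tables); with (20,40) and (60,80) the
whole file fits under the limit, which is why those runs can report
cache-speed numbers.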
Re: iozone write 50% regression in kernel 2.6.24-rc1
----- Original Message ----
> From: "Zhang, Yanmin" <[EMAIL PROTECTED]>
> To: Martin Knoblauch <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]; LKML
> Sent: Monday, November 12, 2007 1:45:57 AM
> Subject: Re: iozone write 50% regression in kernel 2.6.24-rc1
>
> On Fri, 2007-11-09 at 04:36 -0800, Martin Knoblauch wrote:
> > > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has
> > > 50% regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same
> > > regression.
> > >
> > > My machine has 8 processor cores and 8GB memory.
> > >
> > > By bisect, I located patch
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> > >
> > > Another behavior: with kernel 2.6.23, if I run iozone many times
> > > after rebooting the machine, the result looks stable. But with
> > > 2.6.24-rc1, the first run of iozone got a very small result and the
> > > following runs have 4 x orig_result.
> > >
> > > What I reported is the regression of the 2nd/3rd run, because the
> > > first run has a bigger regression.
> > >
> > > I also tried to change
> > > /proc/sys/vm/dirty_ratio,dirty_background_ratio and didn't get an
> > > improvement.
> > could you tell us the exact iozone command you are using?
> iozone -i 0 -r 4k -s 512m

OK, I definitely do not see the reported effect. On a HP Proliant with a
RAID5 on CCISS I get:

2.6.19.2:   654-738 MB/sec write, 1126-1154 MB/sec rewrite
2.6.24-rc2: 772-820 MB/sec write, 1495-1539 MB/sec rewrite

The first run is always slowest, all subsequent runs are faster and the
same speed.

> > I would like to repeat it on my setup, because I definitely see the
> > opposite behaviour in 2.6.24-rc1/rc2. The speed there is much better
> > than in 2.6.22 and before (I skipped 2.6.23, because I was waiting
> > for the per-bdi changes). I definitely do not see the difference
> > between 1st and subsequent runs. But then, I do my tests with 5GB
> > file sizes like:
> >
> > iozone3_283/src/current/iozone -t 5 -F /scratch/X1 /scratch/X2
> > /scratch/X3 /scratch/X4 /scratch/X5 -s 5000M -r 1024 -c -e -i 0 -i 1
> My machine uses a SATA (AHCI) disk.

4x72GB SCSI disks building a RAID5 on a CCISS controller with battery
backed write cache. Systems are 2 CPUs (64-bit) with 8 GB memory.

I could test on some IBM boxes (2x dual core, 8 GB) with RAID5 on
"aacraid", but I need some time to free up one of the boxes.

Cheers
Martin
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Mon, 2007-11-12 at 10:45 +0100, Peter Zijlstra wrote:
> On Mon, 2007-11-12 at 10:14 +0800, Zhang, Yanmin wrote:
> > > Subject: mm: speed up writeback ramp-up on clean systems
> >
> > I tested kernel 2.6.23, 2.6.24-rc2, and 2.6.24-rc2_peter
> > (2.6.24-rc2 + this patch).
> >
> > 1) Comparison among first/second/following runs:
> > 2.6.23: the second run of iozone gets about 28% improvement over the
> > first run. Following runs are very stable, like the 2nd run.
> > 2.6.24-rc2: the second run of iozone gets about 170% improvement over
> > the first run. The 3rd run gets about 80% improvement over the 2nd.
> > Following runs are very stable, like the 3rd run.
> > 2.6.24-rc2_peter: the second run of iozone gets about 14% improvement
> > over the first run. Following runs are mostly stable, like the 2nd run.
> >
> > So the new patch really improves the first-run result. Comparing with
> > 2.6.24-rc2, 2.6.24-rc2_peter has 330% improvement on the first run.
> >
> > 2) Comparison among different kernels (based on the stable highest
> > result):
> > 2.6.24-rc2 has about 50% regression from 2.6.23.
> > 2.6.24-rc2_peter has the same result as 2.6.24-rc2.
> >
> > From this point of view, the above patch is no improvement. :)
>
> Drat, still good test results though.
>
> Could you describe your system in detail, that is, you have 8GB of
> memory and 8 cpus (2*quad?).
Yes.

> How many disks does it have
One machine uses 1 AHCI SATA disk. The other machines use hardware raid0.

> and are those aggregated using md or dm?
No.

> What filesystem do you use?
Ext3.

I got the regression on a couple of my machines. Pls. try command
#iozone -i 0 -r 4k -s 512m

-yanmin
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Mon, 2007-11-12 at 10:14 +0800, Zhang, Yanmin wrote:
> > Subject: mm: speed up writeback ramp-up on clean systems
>
> I tested kernel 2.6.23, 2.6.24-rc2, and 2.6.24-rc2_peter
> (2.6.24-rc2 + this patch).
>
> 1) Comparison among first/second/following runs:
> 2.6.23: the second run of iozone gets about 28% improvement over the
> first run. Following runs are very stable, like the 2nd run.
> 2.6.24-rc2: the second run of iozone gets about 170% improvement over
> the first run. The 3rd run gets about 80% improvement over the 2nd.
> Following runs are very stable, like the 3rd run.
> 2.6.24-rc2_peter: the second run of iozone gets about 14% improvement
> over the first run. Following runs are mostly stable, like the 2nd run.
>
> So the new patch really improves the first-run result. Comparing with
> 2.6.24-rc2, 2.6.24-rc2_peter has 330% improvement on the first run.
>
> 2) Comparison among different kernels (based on the stable highest
> result):
> 2.6.24-rc2 has about 50% regression from 2.6.23.
> 2.6.24-rc2_peter has the same result as 2.6.24-rc2.
>
> From this point of view, the above patch is no improvement. :)

Drat, still good test results though.

Could you describe your system in detail, that is, you have 8GB of
memory and 8 cpus (2*quad?). How many disks does it have and are those
aggregated using md or dm? What filesystem do you use?
Re: iozone write 50% regression in kernel 2.6.24-rc1
- Original Message From: Zhang, Yanmin [EMAIL PROTECTED] To: Martin Knoblauch [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; LKML linux-kernel@vger.kernel.org Sent: Monday, November 12, 2007 1:45:57 AM Subject: Re: iozone write 50% regression in kernel 2.6.24-rc1 On Fri, 2007-11-09 at 04:36 -0800, Martin Knoblauch wrote: - Original Message From: Zhang, Yanmin To: [EMAIL PROTECTED] Cc: LKML Sent: Friday, November 9, 2007 10:47:52 AM Subject: iozone write 50% regression in kernel 2.6.24-rc1 Comparing with 2.6.23, iozone sequential write/rewrite (512M) has 50% regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression. My machine has 8 processor cores and 8GB memory. By bisect, I located patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h = 04fbfdc14e5f48463820d6b9807daa5e9c92c51f. Another behavior: with kernel 2.6.23, if I run iozone for many times after rebooting machine, the result looks stable. But with 2.6.24-rc1, the first run of iozone got a very small result and following run has 4Xorig_result. What I reported is the regression of 2nd/3rd run, because first run has bigger regression. I also tried to change /proc/sys/vm/dirty_ratio,dirty_backgroud_ratio and didn't get improvement. could you tell us the exact iozone command you are using? iozone -i 0 -r 4k -s 512m OK, I definitely do not see the reported effect. On a HP Proliant with a RAID5 on CCISS I get: 2.6.19.2: 654-738 MB/sec write, 1126-1154 MB/sec rewrite 2.6.24-rc2: 772-820 MB/sec write, 1495-1539 MB/sec rewrite The first run is always slowest, all subsequent runs are faster and the same speed. I would like to repeat it on my setup, because I definitely see the opposite behaviour in 2.6.24-rc1/rc2. The speed there is much better than in 2.6.22 and before (I skipped 2.6.23, because I was waiting for the per-bdi changes). I definitely do not see the difference between 1st and subsequent runs. 
> > But then, I do my tests with 5GB file sizes like:
> >
> > iozone3_283/src/current/iozone -t 5 -F /scratch/X1 /scratch/X2 /scratch/X3 /scratch/X4 /scratch/X5 -s 5000M -r 1024 -c -e -i 0 -i 1
>
> My machine uses SATA (AHCI) disk.

4x72GB SCSI disks building a RAID5 on a CCISS controller with battery-backed
write cache. Systems are 2 CPUs (64-bit) with 8 GB memory. I could test on
some IBM boxes (2x dual core, 8 GB) with RAID5 on aacraid, but I need some
time to free up one of the boxes.

Cheers
Martin
Re: iozone write 50% regression in kernel 2.6.24-rc1
Single socket, dual core opteron, 2GB memory
Single SATA disk, ext3
x86_64 kernel and userland

(dirty_background_ratio, dirty_ratio) tunables

(5,10) - default

2.6.23.1-42.fc8 #1 SMP
524288 4 59580 60356
524288 4 59247 61101
524288 4 61030 62831

2.6.24-rc2 #28 SMP PREEMPT
524288 4 49277 56582
524288 4 50728 61056
524288 4 52027 59758
524288 4 51520 62426

(20,40) - similar to your 8GB

2.6.23.1-42.fc8 #1 SMP
524288 4 225977 447461
524288 4 232595 496848
524288 4 220608 478076
524288 4 203080 445230

2.6.24-rc2 #28 SMP PREEMPT
524288 4 54043 83585
524288 4 69949 516253
524288 4 72343 491416
524288 4 71775 492653

(60,80) - overkill

2.6.23.1-42.fc8 #1 SMP
524288 4 208450 491892
524288 4 216262 481135
524288 4 221892 543608
524288 4 202209 574725
524288 4 231730 452482

2.6.24-rc2 #28 SMP PREEMPT
524288 4 49091 86471
524288 4 65071 217566
524288 4 72238 492172
524288 4 71818 492433
524288 4 71327 493954

While I see that the write speed as reported under .24 (~70MB/s) is much lower
than the one reported under .23 (~200MB/s), I find it very hard to believe my
poor single SATA disk could actually do the 200MB/s for longer than its cache
8/16 MB (not sure). vmstat shows that actual IO is done, even though the whole
512MB could fit in cache, hence my suspicion that the ~70MB/s is the most
realistic of the two.

I'll have to look into what iozone actually does though, and why this patch
makes the output different.

FWIW - because it's a single backing dev it does get to 100% of the dirty
limit after a few runs, so not sure what makes the difference.
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Mon, 2007-11-12 at 17:05 +0200, Benny Halevy wrote:
> On Nov. 12, 2007, 15:26 +0200, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > Single socket, dual core opteron, 2GB memory
> > Single SATA disk, ext3
> > x86_64 kernel and userland
> >
> > (dirty_background_ratio, dirty_ratio) tunables
> >
> > (5,10) - default
> >
> > 2.6.23.1-42.fc8 #1 SMP
> > 524288 4 59580 60356
> > 524288 4 59247 61101
> > 524288 4 61030 62831
> >
> > 2.6.24-rc2 #28 SMP PREEMPT
> > 524288 4 49277 56582
> > 524288 4 50728 61056
> > 524288 4 52027 59758
> > 524288 4 51520 62426
> >
> > (20,40) - similar to your 8GB
> >
> > 2.6.23.1-42.fc8 #1 SMP
> > 524288 4 225977 447461
> > 524288 4 232595 496848
> > 524288 4 220608 478076
> > 524288 4 203080 445230
> >
> > 2.6.24-rc2 #28 SMP PREEMPT
> > 524288 4 54043 83585
> > 524288 4 69949 516253
> > 524288 4 72343 491416
> > 524288 4 71775 492653
> >
> > (60,80) - overkill
> >
> > 2.6.23.1-42.fc8 #1 SMP
> > 524288 4 208450 491892
> > 524288 4 216262 481135
> > 524288 4 221892 543608
> > 524288 4 202209 574725
> > 524288 4 231730 452482
> >
> > 2.6.24-rc2 #28 SMP PREEMPT
> > 524288 4 49091 86471
> > 524288 4 65071 217566
> > 524288 4 72238 492172
> > 524288 4 71818 492433
> > 524288 4 71327 493954
> >
> > While I see that the write speed as reported under .24 (~70MB/s) is much
> > lower than the one reported under .23 (~200MB/s), I find it very hard to
> > believe my poor single SATA disk could actually do the 200MB/s for longer
> > than its cache 8/16 MB (not sure). vmstat shows that actual IO is done,
> > even though the whole 512MB could fit in cache, hence my suspicion that
> > the ~70MB/s is the most realistic of the two.
>
> Even 70 MB/s seems too high. What throughput do you see for the raw disk
> partition? Also, are the numbers above for successive runs? It seems like
> you're seeing some caching effects, so I'd recommend using a file larger
> than your cache size and the -e and -c options (to include fsync and close
> in timings) to try to eliminate them.
-- iozone -i 0 -r 4k -s 512m -e -c

.23 (20,40)
524288 4 31750 33560
524288 4 29786 32114
524288 4 29115 31476

.24 (20,40)
524288 4 25022 32411
524288 4 25375 31662
524288 4 26407 33871

-- iozone -i 0 -r 4k -s 4g -e -c

.23 (20,40)
4194304 4 39699 35550
4194304 4 40225 36099

.24 (20,40)
4194304 4 39961 41656
4194304 4 39244 39673

Yanmin, for that benchmark you ran, what was it meant to measure? From what I
can make of it, it's just write-cache benching.

One thing I don't understand is how the write numbers are so much lower than
the rewrite numbers. The iozone code (which gives me headaches, damn what a
mess) seems to suggest that the only thing that is different is the lack of
block allocation.

Linus posted a patch yesterday fixing up a regression in the ext3 bitmap block
allocator, /me goes apply that patch and rerun the tests.

(20,40) - similar to your 8GB

2.6.23.1-42.fc8 #1 SMP
524288 4 225977 447461
524288 4 232595 496848
524288 4 220608 478076
524288 4 203080 445230

2.6.24-rc2 #28 SMP PREEMPT
524288 4 54043 83585
524288 4 69949 516253
524288 4 72343 491416
524288 4 71775 492653

2.6.24-rc2 +
 patches/wu-reiser.patch
 patches/writeback-early.patch
 patches/bdi-task-dirty.patch
 patches/bdi-sysfs.patch
 patches/sched-hrtick.patch
 patches/sched-rt-entity.patch
 patches/sched-watchdog.patch
 patches/linus-ext3-blockalloc.patch

524288 4 179657 487676
524288 4 173989 465682
524288 4 175842 489800

Linus' patch is the one that makes the difference here. So I'm unsure how you
bisected it down to:

  04fbfdc14e5f48463820d6b9807daa5e9c92c51f

These results seem to point to 7c9e69faa28027913ee059c285a5ea8382e24b5d as
being the offending patch.
Re: iozone write 50% regression in kernel 2.6.24-rc1
Peter Zijlstra wrote:
> ..
> While I see that the write speed as reported under .24 (~70MB/s) is much
> lower than the one reported under .23 (~200MB/s), I find it very hard to
> believe my poor single SATA disk could actually do the 200MB/s for longer
> than its cache 8/16 MB (not sure). vmstat shows that actual IO is done,
> even though the whole 512MB could fit in cache, hence my suspicion that
> the ~70MB/s is the most realistic of the two.
> ..

Yeah, sequential 70MB/sec is quite realistic for a modern SATA drive.
But significantly faster than that (say, 100MB/sec +) is unlikely at present.
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Mon, 2007-11-12 at 12:25 -0500, Mark Lord wrote:
> Peter Zijlstra wrote:
> > ..
> > While I see that the write speed as reported under .24 (~70MB/s) is much
> > lower than the one reported under .23 (~200MB/s), I find it very hard to
> > believe my poor single SATA disk could actually do the 200MB/s for longer
> > than its cache 8/16 MB (not sure). vmstat shows that actual IO is done,
> > even though the whole 512MB could fit in cache, hence my suspicion that
> > the ~70MB/s is the most realistic of the two.
> > ..
>
> Yeah, sequential 70MB/sec is quite realistic for a modern SATA drive.
> But significantly faster than that (say, 100MB/sec +) is unlikely at present.

I just use command '#iozone -i 0 -r 4k -s 512m', no '-e -c'. So if we consider
the cache, the speed is very fast. On my machine with 2.6.23, the write speed
is 631 MB/s, quite fast. :)
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Mon, 2007-11-12 at 04:58 -0800, Martin Knoblauch wrote:
> - Original Message
> From: Zhang, Yanmin [EMAIL PROTECTED]
> To: Martin Knoblauch [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; LKML linux-kernel@vger.kernel.org
> Sent: Monday, November 12, 2007 1:45:57 AM
> Subject: Re: iozone write 50% regression in kernel 2.6.24-rc1
>
> > > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has 50%
> > > regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
> > >
> > > My machine has 8 processor cores and 8GB memory.
> > >
> > > By bisect, I located patch
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> > >
> > > Another behavior: with kernel 2.6.23, if I run iozone for many times
> > > after rebooting machine, the result looks stable. But with 2.6.24-rc1,
> > > the first run of iozone got a very small result and following run has
> > > 4Xorig_result.
> > >
> > > What I reported is the regression of 2nd/3rd run, because first run has
> > > bigger regression.
> > >
> > > I also tried to change /proc/sys/vm/dirty_ratio, dirty_background_ratio
> > > and didn't get improvement.
> >
> > could you tell us the exact iozone command you are using?
>
> iozone -i 0 -r 4k -s 512m
>
> OK, I definitely do not see the reported effect. On a HP Proliant with a
> RAID5 on CCISS I get:
>
> 2.6.19.2:   654-738 MB/sec write, 1126-1154 MB/sec rewrite
> 2.6.24-rc2: 772-820 MB/sec write, 1495-1539 MB/sec rewrite
>
> The first run is always slowest, all subsequent runs are faster and the
> same speed.

Although the first run is always slowest, if we compare 2.6.23 and 2.6.24-rc,
we find that the first-run result of 2.6.23 is 7 times that of 2.6.24-rc.

Originally, my test suite just picks up the result of the first run. I might
change my test suite to make it run many times.
Now I run the test manually many times after the machine reboots. Comparing
2.6.24-rc with 2.6.23, the 3rd and following runs of 2.6.24-rc have about 50%
regression.
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Mon, 2007-11-12 at 17:48 +0100, Peter Zijlstra wrote: On Mon, 2007-11-12 at 17:05 +0200, Benny Halevy wrote: On Nov. 12, 2007, 15:26 +0200, Peter Zijlstra [EMAIL PROTECTED] wrote: Single socket, dual core opteron, 2GB memory Single SATA disk, ext3 x86_64 kernel and userland (dirty_background_ratio, dirty_ratio) tunables (5,10) - default 2.6.23.1-42.fc8 #1 SMP 524288 4 59580 60356 524288 4 59247 61101 524288 4 61030 62831 2.6.24-rc2 #28 SMP PREEMPT 524288 4 49277 56582 524288 4 50728 61056 524288 4 52027 59758 524288 4 51520 62426 (20,40) - similar to your 8GB 2.6.23.1-42.fc8 #1 SMP 524288 4 225977 447461 524288 4 232595 496848 524288 4 220608 478076 524288 4 203080 445230 2.6.24-rc2 #28 SMP PREEMPT 524288 4 54043 83585 524288 4 69949 516253 524288 4 72343 491416 524288 4 71775 492653 (60,80) - overkill 2.6.23.1-42.fc8 #1 SMP 524288 4 208450 491892 524288 4 216262 481135 524288 4 221892 543608 524288 4 202209 574725 524288 4 231730 452482 2.6.24-rc2 #28 SMP PREEMPT 524288 4 49091 86471 524288 4 65071 217566 524288 4 72238 492172 524288 4 71818 492433 524288 4 71327 493954 While I see that the write speed as reported under .24 ~70MB/s is much lower than the one reported under .23 ~200MB/s, I find it very hard to believe my poor single SATA disk could actually do the 200MB/s for longer than its cache 8/16 MB (not sure). vmstat shows that actual IO is done, even though the whole 512MB could fit in cache, hence my suspicion that the ~70MB/s is the most realistic of the two. Even 70 MB/s seems too high. What throughput do you see for the raw disk partition/ Also, are the numbers above for successive runs? It seems like you're seeing some caching effects so I'd recommend using a file larger than your cache size and the -e and -c options (to include fsync and close in timings) to try to eliminate them. 
-- iozone -i 0 -r 4k -s 512m -e -c .23 (20,40) 524288 4 31750 33560 524288 4 29786 32114 524288 4 29115 31476 .24 (20,40) 524288 4 25022 32411 524288 4 25375 31662 524288 4 26407 33871 -- iozone -i 0 -r 4k -s 4g -e -c .23 (20,40) 4194304 4 39699 35550 4194304 4 40225 36099 .24 (20,40) 4194304 4 39961 41656 4194304 4 39244 39673 Yanmin, for that benchmark you ran, what was it meant to measure? From what I can make of it its just write cache benching. Yeah. It's quite related to cache. I did more testing on my stoakley machine (8 cores, 8GB mem). If I reduce the memory to 4GB, the speed will be far slower. One thing I don't understand is how the write numbers are so much lower than the rewrite numbers. The iozone code (which gives me headaches, damn what a mess) seems to suggest that the only thing that is different is the lack of block allocation. It might be a good direction. Linus posted a patch yesterday fixing up a regression in the ext3 bitmap block allocator, /me goes apply that patch and rerun the tests. (20,40) - similar to your 8GB 2.6.23.1-42.fc8 #1 SMP 524288 4 225977 447461 524288 4 232595 496848 524288 4 220608 478076 524288 4 203080 445230 2.6.24-rc2 #28 SMP PREEMPT 524288 4 54043 83585 524288 4 69949 516253 524288 4 72343 491416 524288 4 71775 492653 2.6.24-rc2 + patches/wu-reiser.patch patches/writeback-early.patch patches/bdi-task-dirty.patch patches/bdi-sysfs.patch patches/sched-hrtick.patch patches/sched-rt-entity.patch patches/sched-watchdog.patch patches/linus-ext3-blockalloc.patch 524288 4 179657 487676 524288 4 173989 465682 524288 4 175842 489800 Linus' patch is the one that makes the difference here. So I'm unsure how you bisected it down to: 04fbfdc14e5f48463820d6b9807daa5e9c92c51f Originally, my test suite is just to pick up the result of first run. Your prior
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Fri, 2007-11-09 at 10:54 +0100, Peter Zijlstra wrote:
> On Fri, 2007-11-09 at 17:47 +0800, Zhang, Yanmin wrote:
> > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has 50%
> > regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
> >
> > My machine has 8 processor cores and 8GB memory.
> >
> > By bisect, I located patch
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
> >
> > Another behavior: with kernel 2.6.23, if I run iozone for many times after
> > rebooting machine, the result looks stable. But with 2.6.24-rc1, the first
> > run of iozone got a very small result and following run has 4Xorig_result.
>
> So the second run is 4x as fast as the first run?
Pls. see below comments.

> > What I reported is the regression of 2nd/3rd run, because first run has
> > bigger regression.
>
> So the 2nd and 3rd run are stable at 50% slower than .23?
Almost. I did more testing today. Pls. see the result list below.

> > I also tried to change /proc/sys/vm/dirty_ratio, dirty_background_ratio
> > and didn't get improvement.
>
> Could you try:
>
> ---
> Subject: mm: speed up writeback ramp-up on clean systems

I tested kernel 2.6.23, 2.6.24-rc2, and 2.6.24-rc2_peter (2.6.24-rc2 + this patch).

1) Compare among first/second/following runs:

2.6.23: the second run of iozone gets about 28% improvement over the first run.
Following runs are very stable, like the 2nd run.

2.6.24-rc2: the second run of iozone gets about 170% improvement over the first
run. The 3rd run gets about 80% improvement over the 2nd. Following runs are
very stable, like the 3rd run.

2.6.24-rc2_peter: the second run of iozone gets about 14% improvement over the
first run. Following runs are mostly stable, like the 2nd run.

So the new patch really improves the first-run result. Comparing with
2.6.24-rc2, 2.6.24-rc2_peter has 330% improvement on the first run.
2) Compare among different kernels (based on the stable highest result):

2.6.24-rc2 has about 50% regression from 2.6.23. 2.6.24-rc2_peter has the same
result as 2.6.24-rc2. From this point of view, the above patch has no
improvement. :)

-yanmin
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Fri, 2007-11-09 at 04:36 -0800, Martin Knoblauch wrote: > - Original Message > > From: "Zhang, Yanmin" <[EMAIL PROTECTED]> > > To: [EMAIL PROTECTED] > > Cc: LKML > > Sent: Friday, November 9, 2007 10:47:52 AM > > Subject: iozone write 50% regression in kernel 2.6.24-rc1 > > > > Comparing with 2.6.23, iozone sequential write/rewrite (512M) has > > 50% > > > regression > > in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression. > > > > My machine has 8 processor cores and 8GB memory. > > > > By bisect, I located patch > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h= > > 04fbfdc14e5f48463820d6b9807daa5e9c92c51f. > > > > > > Another behavior: with kernel 2.6.23, if I run iozone for many > > times > > > after rebooting machine, > > the result looks stable. But with 2.6.24-rc1, the first run of > > iozone > > > got a very small result and > > following run has 4Xorig_result. > > > > What I reported is the regression of 2nd/3rd run, because first run > > has > > > bigger regression. > > > > I also tried to change > > /proc/sys/vm/dirty_ratio,dirty_backgroud_ratio > > > and didn't get improvement. > could you tell us the exact iozone command you are using? iozone -i 0 -r 4k -s 512m > I would like to repeat it on my setup, because I definitely see the opposite > behaviour in 2.6.24-rc1/rc2. The speed there is much better than in 2.6.22 > and before (I skipped 2.6.23, because I was waiting for the per-bdi changes). > I definitely do not see the difference between 1st and subsequent runs. But > then, I do my tests with 5GB file sizes like: > > iozone3_283/src/current/iozone -t 5 -F /scratch/X1 /scratch/X2 /scratch/X3 > /scratch/X4 /scratch/X5 -s 5000M -r 1024 -c -e -i 0 -i 1 My machine uses SATA (AHCI) disk. 
-yanmin
Re: iozone write 50% regression in kernel 2.6.24-rc1
- Original Message
> From: "Zhang, Yanmin" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: LKML
> Sent: Friday, November 9, 2007 10:47:52 AM
> Subject: iozone write 50% regression in kernel 2.6.24-rc1
>
> Comparing with 2.6.23, iozone sequential write/rewrite (512M) has 50%
> regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
>
> My machine has 8 processor cores and 8GB memory.
>
> By bisect, I located patch
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
>
> Another behavior: with kernel 2.6.23, if I run iozone for many times after
> rebooting machine, the result looks stable. But with 2.6.24-rc1, the first
> run of iozone got a very small result and following run has 4Xorig_result.
>
> What I reported is the regression of 2nd/3rd run, because first run has
> bigger regression.
>
> I also tried to change /proc/sys/vm/dirty_ratio, dirty_background_ratio
> and didn't get improvement.
>
> -yanmin

Hi Yanmin,

could you tell us the exact iozone command you are using? I would like to
repeat it on my setup, because I definitely see the opposite behaviour in
2.6.24-rc1/rc2. The speed there is much better than in 2.6.22 and before (I
skipped 2.6.23, because I was waiting for the per-bdi changes). I definitely
do not see the difference between 1st and subsequent runs. But then, I do my
tests with 5GB file sizes like:

iozone3_283/src/current/iozone -t 5 -F /scratch/X1 /scratch/X2 /scratch/X3 /scratch/X4 /scratch/X5 -s 5000M -r 1024 -c -e -i 0 -i 1

Kind regards
Martin
Re: iozone write 50% regression in kernel 2.6.24-rc1
On Fri, 2007-11-09 at 17:47 +0800, Zhang, Yanmin wrote:
> Comparing with 2.6.23, iozone sequential write/rewrite (512M) has 50%
> regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.
>
> My machine has 8 processor cores and 8GB memory.
>
> By bisect, I located patch
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.
>
> Another behavior: with kernel 2.6.23, if I run iozone for many times after
> rebooting machine, the result looks stable. But with 2.6.24-rc1, the first
> run of iozone got a very small result and following run has 4Xorig_result.

So the second run is 4x as fast as the first run?

> What I reported is the regression of 2nd/3rd run, because first run has
> bigger regression.

So the 2nd and 3rd run are stable at 50% slower than .23?

> I also tried to change /proc/sys/vm/dirty_ratio, dirty_background_ratio
> and didn't get improvement.

Could you try:

---
Subject: mm: speed up writeback ramp-up on clean systems

We allow violation of bdi limits if there is a lot of room on the system. Once
we hit half the total limit we start enforcing bdi limits and bdi ramp-up
should happen. Doing it this way avoids many small writeouts on an otherwise
idle system and should also speed up the ramp-up.
Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
Reviewed-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 mm/page-writeback.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c	2007-09-28 10:08:33.937415368 +0200
+++ linux-2.6/mm/page-writeback.c	2007-09-28 10:54:26.018247516 +0200
@@ -355,8 +355,8 @@ get_dirty_limits(long *pbackground, long
  */
 static void balance_dirty_pages(struct address_space *mapping)
 {
-	long bdi_nr_reclaimable;
-	long bdi_nr_writeback;
+	long nr_reclaimable, bdi_nr_reclaimable;
+	long nr_writeback, bdi_nr_writeback;
 	long background_thresh;
 	long dirty_thresh;
 	long bdi_thresh;
@@ -376,11 +376,26 @@ static void balance_dirty_pages(struct a
 		get_dirty_limits(&background_thresh, &dirty_thresh,
 				&bdi_thresh, bdi);
+
+		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
+					global_page_state(NR_UNSTABLE_NFS);
+		nr_writeback = global_page_state(NR_WRITEBACK);
+
 		bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
 		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
+
 		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
 			break;

+		/*
+		 * Throttle it only when the background writeback cannot
+		 * catch-up. This avoids (excessively) small writeouts
+		 * when the bdi limits are ramping up.
+		 */
+		if (nr_reclaimable + nr_writeback <
+				(background_thresh + dirty_thresh) / 2)
+			break;
+
 		if (!bdi->dirty_exceeded)
 			bdi->dirty_exceeded = 1;
iozone write 50% regression in kernel 2.6.24-rc1
Comparing with 2.6.23, iozone sequential write/rewrite (512M) has 50%
regression in kernel 2.6.24-rc1. 2.6.24-rc2 has the same regression.

My machine has 8 processor cores and 8GB memory.

By bisect, I located patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=04fbfdc14e5f48463820d6b9807daa5e9c92c51f.

Another behavior: with kernel 2.6.23, if I run iozone many times after
rebooting the machine, the result looks stable. But with 2.6.24-rc1, the first
run of iozone got a very small result and following runs have 4Xorig_result.

What I reported is the regression of the 2nd/3rd run, because the first run
has a bigger regression.

I also tried to change /proc/sys/vm/dirty_ratio, dirty_background_ratio and
didn't get improvement.

-yanmin