Re: bdflush/mm performance drop-out defect (more info)
On Tue, 22 May 2001, null wrote:
> Here is some additional info about the 2.4 performance defect.
>
> Only one person offered a suggestion about the use of HIGHMEM.
> I tried with and without HIGHMEM enabled with the same results.
> However, it does appear to take a bit longer to reach
> performance drop-out condition when HIGHMEM is disabled.

I'm seeing this same thing whenever kswapd and bdflush are gobbling up CPU time at full speed without doing anything useful.

At the moment I've only managed a really slight reduction in the phenomenon by just not waking up bdflush when it can't do any work. The real solution will probably consist of some "everybody wait on IO for a moment" mechanism, which will take some time to develop.

Stay on the lookout for patches on: http://www.surriel.com/patches/

cheers,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
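One rough way to watch for the symptom Rik describes (kernel threads accumulating CPU time while no useful I/O happens) is to snapshot their CPU time from `ps` a few seconds apart. A minimal sketch, assuming 2.4-era thread names — `bdflush` no longer exists on modern kernels, so treat this as an illustration rather than a portable tool:

```shell
# Filter `ps -eo comm,time` output down to the suspect kernel threads.
# NR > 1 skips the header line; $1 is the command name, $2 the CPU time.
kthread_cpu() {
    awk 'NR > 1 && ($1 == "kswapd" || $1 == "bdflush") { print $1, $2 }'
}

# Take two snapshots a few seconds apart; if the TIME column keeps
# climbing while vmstat shows no block I/O, you are seeing the drop-out:
#   ps -eo comm,time | kthread_cpu
```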
Re: bdflush/mm performance drop-out defect (more info)
> post I quoted some conversation between Rick Van Riel and Alan Cox

Oops. The least I can do is spell his name right. Sorry, Rik. 8)

Keep up the good work.
Re: bdflush/mm performance drop-out defect (more info)
On Tue, 22 May 2001, Jeffrey W. Baker wrote:
> In short, I'm not seeing this problem.

I appreciate your attempt to duplicate the defect on your system. In my original post I quoted some conversation between Rick Van Riel and Alan Cox where they describe seeing the same symptoms under heavy load. Alan described it then as a "partial mystery". Yesterday some others that I work with confirmed seeing the same defect on their systems. Their devices go almost completely idle under heavy I/O load. What I would like to do now is attract some visibility to this issue by helping to find a repeatable test case.

> May I suggest that the problem may be the driver for your SCSI device?

I can't rule this out yet, but the problem has been confirmed on at least three different low-level SCSI drivers: qlogicfc, qla2x00, and megaraid. And I suspect that Alan and Rik were likely using something else when they saw it.

> I just ran some tests of parallel I/O on a 2 CPU Intel Pentium III 800
> MHz with 2GB main memory

I have a theory that the problem only shows up when there is some pressure on the buffer cache code. In one of your tests, you have a system with 2GB of main memory and you write ten 100MB files with dd. This may not have begun to stress the buffer cache in a way that exposes the defect. If you have the resources, you might try writing larger files with dd. Try something which would exceed your 2GB system memory for some period of time. Or try lowering the visible system memory (mem=xx).

I should point out that the most repeatable test case I've found so far is with large parallel mke2fs processes. I realize most folks don't have extra disk space lying around to do the parallel mkfs test, but it's the one I would recommend at this point. Using vmstat, you should be able to tell when your cache memory is being pushed.

Something I noticed while using vmstat is that my system tends to become completely unresponsive at about the same time as some other process tries to do a read. Here's a theory: the dd and mkfs commands have a pattern best described as sequential write, which may be causing the buffer cache code to settle into some optimization behavior over time; when another process then attempts to read from somewhere not already cached, backing out of that optimization seems to cause severe performance problems. That's just an idea.

Again, thanks for trying to reproduce this. Enough people have seen the symptoms that I'm gaining more confidence that it isn't just my configuration or something I'm doing wrong. I just hope that we can resolve this before 2.4 gets deployed in any kind of intense I/O environment where it could leave a bad impression.
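The heavier dd test suggested above (parallel writers whose combined size exceeds main memory, so the buffer cache comes under pressure) can be sketched as a small script. The sizes, writer count, and target directory here are illustrative assumptions, not the exact commands from the thread:

```shell
# parallel_writers TOTAL_MB NWRITERS DIR
#   writes NWRITERS files of TOTAL_MB/NWRITERS megabytes each, in parallel,
#   then waits for all writers to finish.
parallel_writers() {
    total_mb=$1 nwriters=$2 dir=$3
    per_file_mb=$(( total_mb / nwriters ))
    i=0
    while [ "$i" -lt "$nwriters" ]; do
        dd if=/dev/zero of="$dir/bigfile.$i" bs=1048576 count="$per_file_mb" 2>/dev/null &
        i=$(( i + 1 ))
    done
    wait
}

# On the 2GB machine described above, something like 4096MB total across
# 10 writers should push well past the cache; watch `vmstat 1` in another
# terminal while it runs:
#   parallel_writers 4096 10 /tmp
```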
Re: bdflush/mm performance drop-out defect (more info)
On Tue, 22 May 2001, null wrote:
> Here is some additional info about the 2.4 performance defect.
>
> Only one person offered a suggestion about the use of HIGHMEM. I tried
> with and without HIGHMEM enabled with the same results. However, it does
> appear to take a bit longer to reach performance drop-out condition when
> HIGHMEM is disabled.
>
> The same system degradation also appears when using partitions on a single
> internal SCSI drive, but seems to happen only when performing the I/O in
> parallel processes. It appears that the load must be sustained long
> enough to affect some buffer cache behavior. Parallel dd commands
> (if=/dev/zero) also reveal the problem. I still need to do some
> benchmarks, but it looks like 2.4 kernels achieve roughly 25% (or less?)
> of the throughput of the 2.2 kernels under heavy parallel loads (on
> identical hardware). I've also confirmed the defect on a dual-processor
> Xeon system with 2.4. The defect exists whether drivers are built-in or
> compiled as modules, although the parallel mkfs test duration improves by
> as much as 50% in some cases when using a kernel with built-in SCSI drivers.

That's a very interesting observation. May I suggest that the problem may be the driver for your SCSI device?

I just ran some tests of parallel I/O on a 2 CPU Intel Pentium III 800 MHz with 2GB main memory, on a single Seagate Barracuda ST336704LWV attached to an AIC7896. The system controller is Intel 440GX. The kernel is 2.4.3-ac7:

jwb@windmill:~$ for i in 1 2 3 4 5 6 7 8 9 10; do time dd if=/dev/zero of=/tmp/$i bs=4096 count=25600 & done

This spawns 10 writers of 100MB files on the same filesystem. While all this went on, the system was responsive, and vmstat showed a steady block write of at least 2 blocks/second. Meanwhile this machine also hosts constantly used mysql and postgresql database systems and a few interactive users. The test completed in 19 seconds and 24 seconds on separate runs.

I also performed this test on a machine with 2 Intel Pentium III 933 MHz CPUs, 512MB main memory, an Intel 840 system controller, and a Quantum 10K II 9GB drive attached to an Adaptec 7899P controller, using kernel 2.4.4-ac8. I had no problems there either, and the test completed in 30 seconds (with a nearly full disk). I also didn't see this problem on an Apple Powerbook G4 nor on another Intel machine with a DAC960 RAID.

In short, I'm not seeing this problem.

Regards,
Jeffrey Baker
bdflush/mm performance drop-out defect (more info)
Here is some additional info about the 2.4 performance defect.

Only one person offered a suggestion about the use of HIGHMEM. I tried with and without HIGHMEM enabled with the same results. However, it does appear to take a bit longer to reach the performance drop-out condition when HIGHMEM is disabled.

The same system degradation also appears when using partitions on a single internal SCSI drive, but seems to happen only when performing the I/O in parallel processes. It appears that the load must be sustained long enough to affect some buffer cache behavior. Parallel dd commands (if=/dev/zero) also reveal the problem. I still need to do some benchmarks, but it looks like 2.4 kernels achieve roughly 25% (or less?) of the throughput of the 2.2 kernels under heavy parallel loads (on identical hardware). I've also confirmed the defect on a dual-processor Xeon system with 2.4. The defect exists whether drivers are built in or compiled as modules, although the parallel mkfs test duration improves by as much as 50% in some cases when using a kernel with built-in SCSI drivers.

During the periods when the console is frozen, the activity lights on the device are also idle. These idle periods last from 30 seconds up to several minutes; then there is a burst of about 10 to 15 seconds of I/O activity to the device. The 2.2 kernels appear to have none of these issues.

Maybe someone can confirm this behavior on more systems using the test case below. It's extremely repeatable in all of the environments I've tried. A Windows colleague is beginning to wonder why I don't just downgrade to 2.2 to avoid this problem, but then he reminds me that Windows has better SMP performance than the 2.2 kernels. 8) I've about reached the point where I'm going to have to take his advice.
-- Forwarded message --
Date: Fri, 11 May 2001 10:15:20 -0600 (MDT)
From: null <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: nasty SCSI performance drop-outs (potential test case)

On Tue, 10 Apr 2001, Rik van Riel wrote:
> On Tue, 10 Apr 2001, Alan Cox wrote:
>
> > > Any time I start injecting lots of mail into the qmail queue,
> > > *one* of the two processors gets pegged at 99%, and it takes forever
> > > for anything typed at the console to actually appear (just as you
> > > describe).
> >
> > Yes I've seen this case. Its partially still a mystery
>
> I've seen it too. It could be some interaction between kswapd
> and bdflush ... but I'm not sure what the exact cause would be.
>
> regards,
>
> Rik

Hopefully a repeatable test case will help "flush" this bug out of the kernel. Has anyone tried doing mkfs in parallel? Here are some data points (which are repeatable even up to 2.4.5-pre1).

Unless noted, results are based on:
- stock RedHat 7.1 configuration with 2.4.2 kernel (or any recent 2.4 kernel)
- default RedHat 7.1 (standard?) kupdated parameter settings
- 6-way 700MHz Xeon server, with 1.2GB of system RAM
- SCSI I/O with qlogicfc or qla2x00 low-level driver
- ext2 filesystem

Case 1: Elapsed time to make a single 5GB ext2 filesystem in this configuration is about 2.6 seconds (time mkfs -F /dev/sdx &). Time to mkfs two 5GB LUNs sequentially is then 5.2 seconds. Time to mkfs the same two 5GB LUNs in parallel is 54 seconds. Hmmm. Bandwidth on two CPUs is totally consumed (99.9%) and a third CPU is usually consumed by the kupdated process. Activity lights on the storage device are mostly idle during this time.

Case 2: Elapsed time to make a single 14GB ext2 filesystem is 8.1 seconds. Time to mkfs eight 14GB LUNs sequentially is 1 minute 15 seconds. Time to mkfs the same eight 14GB LUNs in parallel is 57 minutes and 23 seconds. Yikes. Bandwidth of all 6 CPUs is completely consumed and the system becomes completely unresponsive. Can't even log in from the console during this period. Activity lights on the device blink rarely.

For comparison, the same parallel mkfs test on the exact same device and the exact same eight 14GB LUNs can be completed in 1 minute and 40 seconds on a 4-way 550MHz Xeon server running a 2.2.16 kernel (RH6.2), and the system is quite responsive the entire time.

I have not seen data corruption in any of these test cases or with live data. In another test I tried one mkfs to the external device in parallel with a mkfs to an internal SCSI drive (Megaraid controller), with the same drop-out in performance.

Hopefully others can easily repeat this behavior. I suppose that parallel mkfs could represent a rare corner case of sequential writes, but I've seen the same issue with almost any parallel SCSI I/O workload. No idea why sequential mkfs isn't affected, though.

As it stands, for high-traffic server environments, the 2.2 kernels have well beyond an order of magnitude better performance on the same equipment. If others can repeat this result, the 2.4 kernels are not quite buzzword compliant. 8)
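The sequential-versus-parallel timing comparison in the test case above can be driven by a small harness like the following. The workload is passed in as a command that receives an index argument; `mkfs_one` and the device names in the usage comment are hypothetical placeholders, not the exact commands from the report:

```shell
# run_sequential N CMD... : run CMD 0 .. CMD N-1 one after another
run_sequential() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" "$i" || return 1
        i=$(( i + 1 ))
    done
}

# run_parallel N CMD... : launch CMD 0 .. CMD N-1 concurrently, wait for all
run_parallel() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" "$i" &
        i=$(( i + 1 ))
    done
    wait
}

# Example workload (hypothetical LUN naming; adapt before use):
#   mkfs_one() { mkfs -F "/dev/lun$1"; }
#   time run_sequential 8 mkfs_one
#   time run_parallel 8 mkfs_one
```

On a healthy kernel the parallel wall time should be close to the sequential time divided by the effective device parallelism; the report above shows it instead blowing up by more than an order of magnitude.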