Re: amrd disk performance drop after running under high load
Alexey Popov wrote: Kris Kennaway wrote: What is your RAID controller configuration (read ahead/cache/write policy)? I have seen weird/bogus numbers (~100% busy) reported by systat -v when read ahead was enabled on LSI/amr controllers.

I tried to run with Read-ahead disabled, but it didn't help.

I just ran into this myself, and apparently it can be caused by Patrol Reads, where the adapter periodically scans the disks to look for media errors. You can turn this off using -stopPR with the megarc port. Oops, -disPR is the correct command to disable; -stopPR just halts a PR event in progress.

Wow! Disabling Patrol Reads really does solve the problem. Thank you! I have many amrd's, and all of them appear to have Patrol Reads enabled by default. But the problem happens on only three of them. Is this a hardware problem?

I am not sure; maybe for some reason the patrol reads are not interfering with other disk I/O as much (e.g. the hardware prioritises them differently or something). Anyway, glad to hear it was resolved.

Kris
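[For reference, a minimal sketch of the commands being discussed, assuming the sysutils/megarc port is installed and the controller is adapter 0; the -a0 adapter selector is an assumption, so check megarc's usage output for your setup:]

  # disable Patrol Read on adapter 0 so the firmware stops the periodic scans
  megarc -disPR -a0

  # -stopPR only halts a Patrol Read event that is already in progress
  megarc -stopPR -a0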
Re: amrd disk performance drop after running under high load
Kris Kennaway wrote: What is your RAID controller configuration (read ahead/cache/write policy)? I have seen weird/bogus numbers (~100% busy) reported by systat -v when read ahead was enabled on LSI/amr controllers.

I tried to run with Read-ahead disabled, but it didn't help.

I just ran into this myself, and apparently it can be caused by Patrol Reads, where the adapter periodically scans the disks to look for media errors. You can turn this off using -stopPR with the megarc port. Oops, -disPR is the correct command to disable; -stopPR just halts a PR event in progress.

Wow! Disabling Patrol Reads really does solve the problem. Thank you! I have many amrd's, and all of them appear to have Patrol Reads enabled by default. But the problem happens on only three of them. Is this a hardware problem?

With best regards, Alexey Popov
Re: amrd disk performance drop after running under high load
Kris Kennaway wrote: Alexey Popov wrote: Hi. Panagiotis Christias wrote: In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem became less severe. I collected interrupt stats and they show no relation between interrupts and slowdowns. Here it is: http://83.167.98.162/gprof/intr-graph/ Also I have similar statistics on mutex profiling, and they show there's no problem with mutexes. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/ I have no idea what else to check.

I don't know what this graph is showing me :) When precisely is the system behaving poorly? What is your RAID controller configuration (read ahead/cache/write policy)? I have seen weird/bogus numbers (~100% busy) reported by systat -v when read ahead was enabled on LSI/amr controllers.

** Existing Logical Drive Information By LSI Logic Corp.,USA **
[Note: For SATA-2, 4 and 6 channel controllers, please specify Ch=0 Id=0..15 for specifying physical drive (Ch=channel, Id=Target)]

Logical Drive : 0 (Adapter: 0)   Status: OPTIMAL
SpanDepth : 01     RaidLevel: 5     RdAhead : Adaptive   Cache: DirectIo
StripSz   : 064KB  Stripes  : 6     WrPolicy: WriteBack

Logical Drive 0 : SpanLevel_0 Disks
Chnl  Target  StartBlock  Blocks   Physical Target Status
----  ------  ----------  ------   ----------------------
0     00      0x          0x22ec   ONLINE
0     01      0x          0x22ec   ONLINE
0     02      0x          0x22ec   ONLINE
0     03      0x          0x22ec   ONLINE
0     04      0x          0x22ec   ONLINE
0     05      0x          0x22ec   ONLINE

I tried to run with Read-ahead disabled, but it didn't help.

I just ran into this myself, and apparently it can be caused by Patrol Reads, where the adapter periodically scans the disks to look for media errors. You can turn this off using -stopPR with the megarc port.

Kris

Oops, -disPR is the correct command to disable; -stopPR just halts a PR event in progress.

Kris
Re: amrd disk performance drop after running under high load
Alexey Popov wrote: Hi. Panagiotis Christias wrote: In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem became less severe. I collected interrupt stats and they show no relation between interrupts and slowdowns. Here it is: http://83.167.98.162/gprof/intr-graph/ Also I have similar statistics on mutex profiling, and they show there's no problem with mutexes. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/ I have no idea what else to check.

I don't know what this graph is showing me :) When precisely is the system behaving poorly? What is your RAID controller configuration (read ahead/cache/write policy)? I have seen weird/bogus numbers (~100% busy) reported by systat -v when read ahead was enabled on LSI/amr controllers.

** Existing Logical Drive Information By LSI Logic Corp.,USA **
[Note: For SATA-2, 4 and 6 channel controllers, please specify Ch=0 Id=0..15 for specifying physical drive (Ch=channel, Id=Target)]

Logical Drive : 0 (Adapter: 0)   Status: OPTIMAL
SpanDepth : 01     RaidLevel: 5     RdAhead : Adaptive   Cache: DirectIo
StripSz   : 064KB  Stripes  : 6     WrPolicy: WriteBack

Logical Drive 0 : SpanLevel_0 Disks
Chnl  Target  StartBlock  Blocks   Physical Target Status
----  ------  ----------  ------   ----------------------
0     00      0x          0x22ec   ONLINE
0     01      0x          0x22ec   ONLINE
0     02      0x          0x22ec   ONLINE
0     03      0x          0x22ec   ONLINE
0     04      0x          0x22ec   ONLINE
0     05      0x          0x22ec   ONLINE

I tried to run with Read-ahead disabled, but it didn't help.

I just ran into this myself, and apparently it can be caused by Patrol Reads, where the adapter periodically scans the disks to look for media errors. You can turn this off using -stopPR with the megarc port.

Kris
Re: amrd disk performance drop after running under high load
Hi. Panagiotis Christias wrote: In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem became less severe. I collected interrupt stats and they show no relation between interrupts and slowdowns. Here it is: http://83.167.98.162/gprof/intr-graph/ Also I have similar statistics on mutex profiling, and they show there's no problem with mutexes. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/ I have no idea what else to check.

I don't know what this graph is showing me :) When precisely is the system behaving poorly? What is your RAID controller configuration (read ahead/cache/write policy)? I have seen weird/bogus numbers (~100% busy) reported by systat -v when read ahead was enabled on LSI/amr controllers.

** Existing Logical Drive Information By LSI Logic Corp.,USA **
[Note: For SATA-2, 4 and 6 channel controllers, please specify Ch=0 Id=0..15 for specifying physical drive (Ch=channel, Id=Target)]

Logical Drive : 0 (Adapter: 0)   Status: OPTIMAL
SpanDepth : 01     RaidLevel: 5     RdAhead : Adaptive   Cache: DirectIo
StripSz   : 064KB  Stripes  : 6     WrPolicy: WriteBack

Logical Drive 0 : SpanLevel_0 Disks
Chnl  Target  StartBlock  Blocks   Physical Target Status
----  ------  ----------  ------   ----------------------
0     00      0x          0x22ec   ONLINE
0     01      0x          0x22ec   ONLINE
0     02      0x          0x22ec   ONLINE
0     03      0x          0x22ec   ONLINE
0     04      0x          0x22ec   ONLINE
0     05      0x          0x22ec   ONLINE

I tried to run with Read-ahead disabled, but it didn't help.

With best regards, Alexey Popov
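[The controller configuration dump above looks like megarc logical-drive output. For anyone reproducing this, a sketch of how such a report is typically pulled; the -ldInfo/-a0/-Lall spelling is an assumption from LSI's documentation, so check megarc's usage output:]

  # print logical drive info (RAID level, RdAhead, cache and write policy)
  # for all logical drives on adapter 0
  megarc -ldInfo -a0 -Lall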
Re: amrd disk performance drop after running under high load
Hi. Kris Kennaway wrote: In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem became less severe. I collected interrupt stats and they show no relation between interrupts and slowdowns. Here it is: http://83.167.98.162/gprof/intr-graph/ Also I have similar statistics on mutex profiling, and they show there's no problem with mutexes. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/ I have no idea what else to check.

I don't know what this graph is showing me :) When precisely is the system behaving poorly?

Take a look at the Disk Load % picture at http://83.167.98.162/gprof/intr-graph/ At ~17:00, 03:00-04:00, 13:00-14:00, 00:30-01:30 and 11:00-13:00 it shows peaks of disk activity which never really happened. As I said at the beginning of the thread, at these peak moments the disk becomes slow and vmstat shows 100% disk load while performing 10 tps. The other graphs on this page show that there is no relation to the interrupt rate of the amr or em device, which you advised me to check.

When I was using a single-process lighttpd the problem was much worse, as you can see at http://83.167.98.162/gprof/graph/ . In the first picture on that page you can see disk load peaks at 18:00 and 15:00 which led to decreased network output because the disk was too slow.

Back in this thread we suspected UMA mutexes. In order to check this I collected mutex profiling stats and drew graphs over time, and they also didn't show anything interesting. All mutex graphs stayed smooth through the disk load peaks. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/

With best regards, Alexey Popov
Re: amrd disk performance drop after running under high load
On Nov 11, 2007 7:26 PM, Alexey Popov [EMAIL PROTECTED] wrote: Hi. Kris Kennaway wrote: In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem became less severe. I collected interrupt stats and they show no relation between interrupts and slowdowns. Here it is: http://83.167.98.162/gprof/intr-graph/ Also I have similar statistics on mutex profiling, and they show there's no problem with mutexes. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/ I have no idea what else to check.

I don't know what this graph is showing me :) When precisely is the system behaving poorly?

Take a look at the Disk Load % picture at http://83.167.98.162/gprof/intr-graph/ At ~17:00, 03:00-04:00, 13:00-14:00, 00:30-01:30 and 11:00-13:00 it shows peaks of disk activity which never really happened. As I said at the beginning of the thread, at these peak moments the disk becomes slow and vmstat shows 100% disk load while performing 10 tps. The other graphs on this page show that there is no relation to the interrupt rate of the amr or em device, which you advised me to check.

When I was using a single-process lighttpd the problem was much worse, as you can see at http://83.167.98.162/gprof/graph/ . In the first picture on that page you can see disk load peaks at 18:00 and 15:00 which led to decreased network output because the disk was too slow.

Back in this thread we suspected UMA mutexes. In order to check this I collected mutex profiling stats and drew graphs over time, and they also didn't show anything interesting. All mutex graphs stayed smooth through the disk load peaks. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/

With best regards, Alexey Popov

Hello, what is your RAID controller configuration (read ahead/cache/write policy)? I have seen weird/bogus numbers (~100% busy) reported by systat -v when read ahead was enabled on LSI/amr controllers.

Regards, Panagiotis Christias
Re: amrd disk performance drop after running under high load
Hi. Kris Kennaway wrote: In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem became less severe. I collected interrupt stats and they show no relation between interrupts and slowdowns. Here it is: http://83.167.98.162/gprof/intr-graph/ Also I have similar statistics on mutex profiling, and they show there's no problem with mutexes. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/ I have no idea what else to check.

With best regards, Alexey Popov
Re: amrd disk performance drop after running under high load
Alexey Popov wrote: Hi. Kris Kennaway wrote: In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem became less severe. I collected interrupt stats and they show no relation between interrupts and slowdowns. Here it is: http://83.167.98.162/gprof/intr-graph/ Also I have similar statistics on mutex profiling, and they show there's no problem with mutexes. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/ I have no idea what else to check.

With best regards, Alexey Popov

I don't know what this graph is showing me :) When precisely is the system behaving poorly?

Kris
Re: amrd disk performance drop after running under high load
Hi. Kris Kennaway wrote: So I can conclude that FreeBSD has a long-standing bug in the VM that can be triggered when serving large amounts of static data (much bigger than memory size) at high rates. Possibly this only applies to large files like mp3 or video.

It is possible, we have further work to do to conclude this though.

I forgot to mention I have pmc and kgmon profiling for good and bad times. But I don't have enough knowledge to interpret it correctly and am not sure if it can help.

pmc would be useful.

pmc profiling attached.

OK, the pmc traces do seem to show that it's not a lock contention issue. That being the case I don't think the fact that different servers perform better is directly related.

But there was evidence of mbuf lock contention in the mutex profiling, wasn't there? As far as I understand, mutex problems can exist without increasing the CPU load in pmc stats, right?

There is also no evidence of a VM problem. What your vmstat and pmc traces show is that your system really isn't doing much work at all, relatively speaking. There is also still no evidence of a disk problem. In fact your disk seems to be almost idle in both cases you provided, only doing between 1 and 10 operations per second, which is trivial.

The vmstat and network output graphs show that the problem exists. If it is not a disk or network or VM problem, what else could be wrong?

In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

I'll try this, but AFAIR there was no strangeness with the interrupts. I believe the reason for the high interrupt rate in the good case is that the server is sending a lot of traffic.

What there is evidence of is an interrupt aliasing problem between em and USB:

irq16: uhci0   1464547796   1870
irq64: em0     1463513610   1869

I tried disabling USB in the kernel; this issue was gone, but the main problem remained. Also I have this interrupt aliasing issue on many servers without problems.

With best regards, Alexey Popov
Re: amrd disk performance drop after running under high load
Alexey Popov wrote: Hi. Kris Kennaway wrote: So I can conclude that FreeBSD has a long-standing bug in the VM that can be triggered when serving large amounts of static data (much bigger than memory size) at high rates. Possibly this only applies to large files like mp3 or video.

It is possible, we have further work to do to conclude this though.

I forgot to mention I have pmc and kgmon profiling for good and bad times. But I don't have enough knowledge to interpret it correctly and am not sure if it can help.

pmc would be useful.

pmc profiling attached.

OK, the pmc traces do seem to show that it's not a lock contention issue. That being the case I don't think the fact that different servers perform better is directly related.

But there was evidence of mbuf lock contention in the mutex profiling, wasn't there? As far as I understand, mutex problems can exist without increasing the CPU load in pmc stats, right?

No, the lock functions will show up as using a lot of CPU. I guess the lock profiling trace showed high numbers because you ran it for a long time.

There is also no evidence of a VM problem. What your vmstat and pmc traces show is that your system really isn't doing much work at all, relatively speaking. There is also still no evidence of a disk problem. In fact your disk seems to be almost idle in both cases you provided, only doing between 1 and 10 operations per second, which is trivial.

The vmstat and network output graphs show that the problem exists. If it is not a disk or network or VM problem, what else could be wrong?

The vmstat output you provided so far doesn't show anything specific.

In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

I'll try this, but AFAIR there was no strangeness with the interrupts. I believe the reason for the high interrupt rate in the good case is that the server is sending a lot of traffic.

What there is evidence of is an interrupt aliasing problem between em and USB:

irq16: uhci0   1464547796   1870
irq64: em0     1463513610   1869

I tried disabling USB in the kernel; this issue was gone, but the main problem remained. Also I have this interrupt aliasing issue on many servers without problems.

OK.

Kris
Re: amrd disk performance drop after running under high load
Alexey Popov wrote: Hi. Kris Kennaway wrote: So I can conclude that FreeBSD has a long-standing bug in the VM that can be triggered when serving large amounts of static data (much bigger than memory size) at high rates. Possibly this only applies to large files like mp3 or video.

It is possible, we have further work to do to conclude this though.

I forgot to mention I have pmc and kgmon profiling for good and bad times. But I don't have enough knowledge to interpret it correctly and am not sure if it can help.

pmc would be useful.

pmc profiling attached.

Sorry for the delay, I was travelling last weekend and it took a few days to catch up.

OK, the pmc traces do seem to show that it's not a lock contention issue. That being the case I don't think the fact that different servers perform better is directly related. In my tests multithreaded web servers don't seem to perform well anyway.

There is also no evidence of a VM problem. What your vmstat and pmc traces show is that your system really isn't doing much work at all, relatively speaking. There is also still no evidence of a disk problem. In fact your disk seems to be almost idle in both cases you provided, only doing between 1 and 10 operations per second, which is trivial.

In the good case you are getting a much higher interrupt rate but with the data you provided I can't tell where from. You need to run vmstat -i at regular intervals (e.g. every 10 seconds for a minute) during the good and bad times, since it only provides counters and an average rate over the uptime of the system.

What there is evidence of is an interrupt aliasing problem between em and USB:

irq16: uhci0   1464547796   1870
irq64: em0     1463513610   1869

This is a problem on some Intel systems. Basically each em0 interrupt also causes a bogus interrupt to the uhci0 device. This will be causing some overhead and might be contributing to the UMA problems. I am not sure if it is the main issue, although it could be. It is mostly serious when both irqs run under Giant, because they will both fight for it every time one of them interrupts. That is not the case here, but there could be other bad scenarios too. You could try disabling USB support in your kernel since you don't seem to be using it.

Kris
Re: amrd disk performance drop after running under high load
Hi. Scott Long wrote:

interrupt            total    rate
irq6: fdc0               8       0
irq14: ata0             47       0
irq16: uhci0    1428187319    1851   ^^ [1]
irq18: uhci2      12374352      16
irq23: ehci0             3       0
irq46: amr0       11983237      15
irq64: em0      1427141755    1850   ^^ [2]
cpu0: timer     1540896452    1997
cpu1: timer     1542377798    1999
Total           5962960971    7730

[1] and [2] look suspicious to me (the totals and rates are too close to each other and, btw, to the timers). Leave the latter (timers) alone. Do you use any USB device? Can you try another network card? That behaviour seems to be an interrupt storm and/or irq collision.

It's neither. It's a side effect of a feature that FreeBSD abuses for handling interrupts. Note that amr0 and uhci2 are acting similarly. It's mostly harmless, but it does waste CPU cycles. I wouldn't expect this on a recent version of FreeBSD, though, at least not from the e1000 driver.

I have this effect on many servers and I believe it is harmless. Once I was trying to reduce CPU usage on a very loaded server and removed USB from the kernel. This effect disappeared, but there was no significant difference in CPU usage.

I disagree with your words about recent versions. I have this effect on many servers with the latest FreeBSD-6-stable and em. Actually I have more servers with this effect than without it.

With best regards, Alexey Popov
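[A quick way to check for this kind of lockstep counting, as a sketch using only stock tools: take two vmstat -i snapshots and diff the totals, so the per-interval deltas are not masked by the average-over-uptime rate column. The awk assumes the "name total rate" layout shown above.]

  # snapshot the counters, wait, snapshot again, print the 10-second deltas
  vmstat -i > /tmp/intr.0
  sleep 10
  vmstat -i > /tmp/intr.1
  awk 'NR==FNR { if ($NF ~ /^[0-9]+$/) total[$1] = $(NF-1); next }
       ($1 in total) { printf "%-14s %12d\n", $1, $(NF-1) - total[$1] }' \
      /tmp/intr.0 /tmp/intr.1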
Re: amrd disk performance drop after running under high load
Hi! Since nobody has answered so far, here are my two cents. I'm not an expert here, so it's only my humble opinion.

On Wed, 17 Oct 2007 22:52:49 +0400 Alexey Popov wrote:

interrupt            total    rate
irq6: fdc0               8       0
irq14: ata0             47       0
irq16: uhci0    1428187319    1851   ^^ [1]
irq18: uhci2      12374352      16
irq23: ehci0             3       0
irq46: amr0       11983237      15
irq64: em0      1427141755    1850   ^^ [2]
cpu0: timer     1540896452    1997
cpu1: timer     1542377798    1999
Total           5962960971    7730

[1] and [2] look suspicious to me (the totals and rates are too close to each other and, btw, to the timers). Leave the latter (timers) alone. Do you use any USB device? Can you try another network card? That behaviour seems to be an interrupt storm and/or irq collision.

WBR -- bsam
Re: amrd disk performance drop after running under high load
Boris Samorodov wrote: Hi! Since nobody has answered so far, here are my two cents. I'm not an expert here, so it's only my humble opinion.

On Wed, 17 Oct 2007 22:52:49 +0400 Alexey Popov wrote:

interrupt            total    rate
irq6: fdc0               8       0
irq14: ata0             47       0
irq16: uhci0    1428187319    1851   ^^ [1]
irq18: uhci2      12374352      16
irq23: ehci0             3       0
irq46: amr0       11983237      15
irq64: em0      1427141755    1850   ^^ [2]
cpu0: timer     1540896452    1997
cpu1: timer     1542377798    1999
Total           5962960971    7730

[1] and [2] look suspicious to me (the totals and rates are too close to each other and, btw, to the timers). Leave the latter (timers) alone. Do you use any USB device? Can you try another network card? That behaviour seems to be an interrupt storm and/or irq collision.

It's neither. It's a side effect of a feature that FreeBSD abuses for handling interrupts. Note that amr0 and uhci2 are acting similarly. It's mostly harmless, but it does waste CPU cycles. I wouldn't expect this on a recent version of FreeBSD, though, at least not from the e1000 driver.

Scott
Re: amrd disk performance drop after running under high load
On Thu, 18 Oct 2007 15:57:16 -0600 Scott Long wrote: Boris Samorodov wrote: Since nobody has answered so far, here are my two cents. I'm not an expert here, so it's only my humble opinion.

On Wed, 17 Oct 2007 22:52:49 +0400 Alexey Popov wrote:

interrupt            total    rate
irq6: fdc0               8       0
irq14: ata0             47       0
irq16: uhci0    1428187319    1851   ^^ [1]
irq18: uhci2      12374352      16
irq23: ehci0             3       0
irq46: amr0       11983237      15
irq64: em0      1427141755    1850   ^^ [2]
cpu0: timer     1540896452    1997
cpu1: timer     1542377798    1999
Total           5962960971    7730

[1] and [2] look suspicious to me (the totals and rates are too close to each other and, btw, to the timers). Leave the latter (timers) alone. Do you use any USB device? Can you try another network card? That behaviour seems to be an interrupt storm and/or irq collision.

It's neither. It's a side effect of a feature that FreeBSD abuses for handling interrupts. Note that amr0 and uhci2 are acting similarly. It's mostly harmless, but it does waste CPU cycles. I wouldn't expect this on a recent version of FreeBSD, though, at least not from the e1000 driver.

I see. Sorry for the noise. So, as I understand it, _that_ can't be the problem (as in the subject) the OP is seeing?

WBR -- bsam
Re: amrd disk performance drop after running under high load
Alexey Popov wrote: This is very unlikely, because I have 5 other video storage servers of the same hardware and software configuration and they feel good.

Clearly something is different about them, though. If you can characterize exactly what that is then it will help.

I can't see any difference except the date of installation. Really, I compared all the parameters and got nothing interesting. At first glance one could say that the problem is in Dell's x850 series or amr(4), but we run this hardware on many other projects and they work well. Also Linux works on them.

OK, but there is no evidence in what you posted so far that amr is involved in any way. There is convincing evidence that it is the mbuf issue.

Why are you sure this is the mbuf issue?

Because that is the only problem shown in the data you posted.

For example, if there is a real problem with amr or the VM causing a disk slowdown, then when it occurs the network subsystem will have a different load pattern. Instead of just quickly sending large amounts of data, the system will have to accept a large number of simultaneous connections waiting for data. Can this cause high mbuf contention?

I'd expect to see evidence of the main problem.

And a few hours ago I received feedback from Andrzej Tobola; he has the same problem on FreeBSD 7 with a Promise ATA software mirror:

Well, he didn't provide any evidence yet that it is the same problem, so let's not become confused by feelings :)

I think he is talking about 100% disk busy while processing ~5 transfers/sec.

% busy as reported by gstat doesn't mean what you think it does. What is the I/O response time? That's the meaningful statistic for evaluating I/O load. Also, you didn't post about this.

So I can conclude that FreeBSD has a long-standing bug in the VM that can be triggered when serving large amounts of static data (much bigger than memory size) at high rates. Possibly this only applies to large files like mp3 or video.

It is possible, we have further work to do to conclude this though.

I forgot to mention I have pmc and kgmon profiling for good and bad times. But I don't have enough knowledge to interpret it correctly and am not sure if it can help.

pmc would be useful.

Also now I run nginx instead of lighttpd on one of the problematic servers. It seems to work much better: sometimes there are peaks in disk load, but the disk does not become very slow and the network output does not change. The difference with nginx is that it runs as multiple processes, while lighttpd by default has only one process. Now I have configured lighttpd on another server to run with multiple workers. I'll see if it helps.

What else can I try?

Still waiting on the vmstat -z output.

Kris
Re: amrd disk performance drop after running under high load
Kris Kennaway wrote: What else can I try? Still waiting on the vmstat -z output.

Also, can you please obtain vmstat -i, netstat -m and 10 seconds of representative vmstat -w output when the problem is and is not occurring?

Kris
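[A small collection loop along the lines of what is being requested; a sketch using only the stock tools named above, with arbitrary file paths and a 10-second cadence:]

  # sample the requested stats every 10 seconds, timestamped, for one minute
  for i in 1 2 3 4 5 6; do
      date
      vmstat -z
      vmstat -i
      netstat -m
      sleep 10
  done > /var/tmp/stats.`date +%Y%m%d%H%M` 2>&1

  # and separately, ~10 seconds of per-second vmstat output
  vmstat -w 1 | head -11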
Re: amrd disk performance drop after running under high load
Hi. Kris Kennaway wrote: After some time of running under high load, disk performance becomes extremely poor. At those periods 'systat -vm 1' shows something like this:

This web service is similar to YouTube. This server is a video store. I have around 200G of *.flv (flash video) files on the server. I run lighttpd as a web server. Disk load is usually around 50%, network output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle.

This is very unlikely, because I have 5 other video storage servers of the same hardware and software configuration and they feel good.

Clearly something is different about them, though. If you can characterize exactly what that is then it will help.

I can't see any difference except the date of installation. Really, I compared all the parameters and got nothing interesting. At first glance one could say that the problem is in Dell's x850 series or amr(4), but we run this hardware on many other projects and they work well. Also Linux works on them.

OK, but there is no evidence in what you posted so far that amr is involved in any way. There is convincing evidence that it is the mbuf issue.

Why are you sure this is the mbuf issue? For example, if there is a real problem with amr or the VM causing a disk slowdown, then when it occurs the network subsystem will have a different load pattern. Instead of just quickly sending large amounts of data, the system will have to accept a large number of simultaneous connections waiting for data. Can this cause high mbuf contention?

And a few hours ago I received feedback from Andrzej Tobola; he has the same problem on FreeBSD 7 with a Promise ATA software mirror:

Well, he didn't provide any evidence yet that it is the same problem, so let's not become confused by feelings :)

I think he is talking about 100% disk busy while processing ~5 transfers/sec.

So I can conclude that FreeBSD has a long-standing bug in the VM that can be triggered when serving large amounts of static data (much bigger than memory size) at high rates. Possibly this only applies to large files like mp3 or video.

It is possible, we have further work to do to conclude this though.

I forgot to mention I have pmc and kgmon profiling for good and bad times. But I don't have enough knowledge to interpret it correctly and am not sure if it can help.

Also now I run nginx instead of lighttpd on one of the problematic servers. It seems to work much better: sometimes there are peaks in disk load, but the disk does not become very slow and the network output does not change. The difference with nginx is that it runs as multiple processes, while lighttpd by default has only one process. Now I have configured lighttpd on another server to run with multiple workers. I'll see if it helps.

What else can I try?

With best regards, Alexey Popov
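[For reference, the multi-worker lighttpd setup being described is a one-line configuration change; a sketch, with the option name taken from lighttpd 1.4-era documentation, so verify it against the installed version:]

  # /usr/local/etc/lighttpd.conf
  # spawn 10 worker processes instead of the default single process
  server.max-worker = 10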
Re: amrd disk performance drop after running under high load
Hi. Kris Kennaway wrote: And a few hours ago I received feedback from Andrzej Tobola; he has the same problem on FreeBSD 7 with a Promise ATA software mirror:

Well, he didn't provide any evidence yet that it is the same problem, so let's not become confused by feelings :)

I think he is talking about 100% disk busy while processing ~5 transfers/sec.

% busy as reported by gstat doesn't mean what you think it does. What is the I/O response time? That's the meaningful statistic for evaluating I/O load. Also, you didn't post about this.

At the problematic times the disk felt very slow, all processes were in the disk-read state, and vmstat confirmed it with the % numbers.

So I can conclude that FreeBSD has a long-standing bug in the VM that can be triggered when serving large amounts of static data (much bigger than memory size) at high rates. Possibly this only applies to large files like mp3 or video.

It is possible, we have further work to do to conclude this though.

I forgot to mention I have pmc and kgmon profiling for good and bad times. But I don't have enough knowledge to interpret it correctly and am not sure if it can help.

pmc would be useful.

Unfortunately I've lost the pmc profiling results. I'll try to collect them again later. See the vmstat output in the attachment (vmstat -z; netstat -m; vmstat -i; vmstat -w 1 | head -11). Also you can see kgmon profiling results at: http://83.167.98.162/gprof/

With best regards, Alexey Popov

ITEM                 SIZE  LIMIT  USED  FREE  REQUESTS  FAILURES
UMA Kegs:             240, 0, 71, 4, 71, 0
UMA Zones:            376, 0, 71, 9, 71, 0
UMA Slabs:            128, 0, 1011, 62, 243081, 0
UMA RCntSlabs:        128, 0, 361, 1205, 363320, 0
UMA Hash:             256, 0, 4, 11, 7, 0
16 Bucket:            152, 0, 45, 30, 72, 0
32 Bucket:            280, 0, 25, 45, 69, 0
64 Bucket:            536, 0, 17, 25, 55, 53
128 Bucket:           1048, 0, 287, 88, 1200, 95423
VM OBJECT:            224, 0, 5536, 23228, 7675004, 0
MAP:                  352, 0, 7, 15, 7, 0
KMAP ENTRY:           112, 90222, 283, 1037, 1207524, 0
MAP ENTRY:            112, 0, 1396, 419, 72221561, 0
PV ENTRY:             48, 2244600, 17835, 30261, 768591673, 0
DP fakepg:            120, 0, 0, 31, 10, 0
mt_zone:              1024, 0, 170, 6, 170, 0
16:                   16, 0, 3578, 2470, 745206870, 0
32:                   32, 0, 1273, 343, 1750850, 0
64:                   64, 0, 6147, 1693, 487691440, 0
128:                  128, 0, 4659, 387, 1464251, 0
256:                  256, 0, 596, 2539, 7208469, 0
512:                  512, 0, 608, 253, 791295, 0
1024:                 1024, 0, 49, 239, 82867, 0
2048:                 2048, 0, 27, 295, 115362, 0
4096:                 4096, 0, 240, 278, 564659, 0
Files:                120, 0, 544, 324, 263880246, 0
TURNSTILE:            104, 0, 181, 83, 307, 0
PROC:                 856, 0, 82, 82, 308409, 0
THREAD:               608, 0, 169, 11, 24468, 0
KSEGRP:               136, 0, 165, 69, 165, 0
UPCALL:               88, 0, 3, 73, 3, 0
SLEEPQUEUE:           64, 0, 181, 99, 307, 0
VMSPACE:              544, 0, 35, 77, 310929, 0
mbuf_packet:          256, 0, 368, 115, 1331807039, 0
mbuf:                 256, 0, 2016, 2331, 5433003167, 0
mbuf_cluster:         2048, 32768, 483, 239, 1236143964, 0
mbuf_jumbo_pagesize:  4096, 0, 0, 0, 0, 0
mbuf_jumbo_9k:        9216, 0, 0, 0, 0, 0
mbuf_jumbo_16k:       16384, 0, 0, 0, 0, 0
ACL UMA zone:         388, 0, 0, 0, 0, 0
g_bio:                216, 0, 4, 410, 48175991, 0
ata_request:          336, 0, 0, 22, 24, 0
ata_composite:        376, 0, 0, 0, 0, 0
VNODE:
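[One detail worth pulling out of that attachment mechanically is the FAILURES column. A sketch that flags any zone with a nonzero failure count, assuming the comma-separated vmstat -z layout shown above, where the last field is FAILURES; note the "64 Bucket" and "128 Bucket" zones above are the only ones that would match:]

  # list UMA zones that have ever failed an allocation, e.g. the
  # "128 Bucket" zone above shows 95423 failures
  vmstat -z | tr ',' ' ' | awk 'NF > 2 && $NF + 0 > 0 { print }'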
Re: amrd disk performance drop after running under high load
Alexey Popov wrote: Hi. Kris Kennaway wrote: After some time of running under high load, disk performance becomes extremely poor. At those periods 'systat -vm 1' shows something like this:

What does high load mean? You need to explain the system workload more.

This web service is similar to YouTube. This server is a video store. I have around 200G of *.flv (flash video) files on the server. I run lighttpd as a web server. Disk load is usually around 50%, network output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle. As you can see it is a trivial service: sending files to the network via HTTP.

A couple of comments. Does lighttpd actually use HTTP accept filters? Are you using ipfilter and ipfw? You are paying a performance penalty for having them.

You might try increasing BUCKET_MAX in sys/vm/uma_core.c. I don't really understand the code here, but you seem to be hitting a threshold behaviour where you are constantly running out of space in the per-CPU caches. This can happen if your workload is unbalanced between the CPUs and you are always allocating on one but freeing on another, but I wouldn't expect that to happen with your workload. Maybe it can also happen if your turnover is high enough.

What does vmstat -z show during the good and bad times?

Kris
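[On the accept-filter question: on FreeBSD the HTTP accept filter is the stock accf_http module, and checking for it is straightforward. A sketch; lighttpd is expected to request the filter itself via SO_ACCEPTFILTER, which the truss output later in the thread suggests it does:]

  # load the HTTP accept filter module and confirm it is present
  kldload accf_http
  kldstat | grep accf_http

  # to load it at every boot
  echo 'accf_http_load="YES"' >> /boot/loader.conf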
Re: amrd disk performance drop after running under high load
Hi. Krassimir Slavchev wrote: You run apache with mod_perl or php too. How many clients does this apache server handle? Also, under this light load you have locked files! Check script execution times (/server-status may be useful). When you have high load, check swap usage and how many processes are in the lockf state.

Apache is not much used here; it is just for a kind of content management. It is not exposed to external users.

With best regards, Alexey Popov
Re: amrd disk performance drop after running under high load
Alexey Popov wrote: Hi. Kris Kennaway wrote: After some time of running under high load, disk performance becomes extremely poor. At those periods 'systat -vm 1' shows something like this:

What does high load mean? You need to explain the system workload more.

This web service is similar to YouTube. This server is a video store. I have around 200G of *.flv (flash video) files on the server. I run lighttpd as a web server. Disk load is usually around 50%, network output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle. As you can see it is a trivial service: sending files to the network via HTTP.

Disks   amrd0
KB/t    85.39
tps         5
MB/s     0.38
% busy     99

Apart from all that, I tried mutex profiling, and here are the results (sorted by the total number of acquisitions):

Bad case:
102 223514 273977 0 14689  1651568 /usr/src/sys/vm/uma_core.c:2349 (512)
950 263099 273968 0 15004    14427 /usr/src/sys/vm/uma_core.c:2450 (512)
108 150422 175840 0 10978 22988519 /usr/src/sys/vm/uma_core.c:1888 (mbuf)

Here you can see that high UMA activity happens in periods of low disk performance. But I'm not sure whether this is the root of the problem or a consequence.

The extremely high contention there does seem to say you have an mbuf starvation problem and not a disk problem. I don't know why this would be happening off-hand.

But there's no mbuf shortage in `netstat -m`. What else can I try to track down the source of the problem?

Can you also provide more details about the system hardware and configuration?

This is a Dell 2850, 2 x Xeon 3.2, 4Gb RAM, 6x300Gb SCSI RAID5. I'll attach details.

With best regards, Alexey Popov

last pid: 11008;  load averages: 0.07, 0.10, 0.08   up 47+08:32:50  11:46:15
38 processes: 1 running, 37 sleeping
Mem: 46M Active, 3443M Inact, 246M Wired, 144M Cache, 208M Buf, 5596K Free
Swap: 2048M Total, 4K Used, 2048M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME  WCPU COMMAND
56386 root       1   4    0 19856K     1K kqread 1 115:19 2.88% lighttpd
  636 root       1  96    0 18292K  4212K select 0  25:39 0.00% snmpd
  784 root       1  96    0 19668K  2072K select 1   2:31 0.00% sshd
  680 root       1  96    0  7732K  1384K select 0   1:59 0.00% ntpd
 1540 root       1  96    0 35092K  6496K select 0   1:30 0.00% httpd
  769 root       4  20    0 14148K  2632K kserel 0   1:04 0.00% bacula-fd
  755 root       1  96    0  3852K  1060K select 1   0:22 0.00% master
  568 root       1  96    0  3648K   908K select 0   0:18 0.00% syslogd
80663 root       1   8    0  3688K  1016K nanslp 1   0:05 0.00% cron
  760 postfix    1  96    0  3944K  1160K select 0   0:04 0.00% qmgr
89776 www        1  20    0 35180K  6684K lockf  0   0:04 0.00% httpd
89763 www        1  20    0 35180K  6684K lockf  0   0:04 0.00% httpd
89774 www        1  20    0 35180K  6684K lockf  0   0:04 0.00% httpd
89775 www        1  96    0 35180K  6684K select 0   0:04 0.00% httpd
  699 root       1  20    0  7732K  1388K pause  0   0:03 0.00% ntpd
  484 root       1   4    0   652K   220K select 0   0:00 0.00% devd
10904 llp        1  96    0 30616K  3564K select 0   0:00 0.00% sshd
10915 root       1  20    0  3912K  2340K pause  1   0:00 0.00% csh

You run apache with mod_perl or php too. How many clients does this apache server handle? Also, under this light load you have locked files! Check script execution times (/server-status may be useful). When you have high load, check swap usage and how many processes are in the lockf state.

Best Regards
Re: amrd disk performance drop after running under high load
Hi. Kris Kennaway wrote: After some time of running under high load, disk performance becomes extremely poor. At those periods 'systat -vm 1' shows something like this:

What does high load mean? You need to explain the system workload more.

This web service is similar to YouTube. This server is a video store. I have around 200G of *.flv (flash video) files on the server. I run lighttpd as a web server. Disk load is usually around 50%, network output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle. As you can see it is a trivial service: sending files to the network via HTTP.

Does lighttpd actually use HTTP accept filters?

Don't know how to make sure, but it seems to run the appropriate setsockopt (truss output): setsockopt(0x4,0x,0x1000,0x7fffe620,0x100) = 0 (0x0)

Are you using ipfilter and ipfw? You are paying a performance penalty for having them.

I'm using ipfw, and one of the first rules is to pass all established TCP. ipfilter is not used on this server, but it is present in the kernel as it can be used on other servers. I have 95% CPU idle, so I think the packet filters do not produce significant load on this server.

You might try increasing BUCKET_MAX in sys/vm/uma_core.c. I don't really understand the code here, but you seem to be hitting a threshold behaviour where you are constantly running out of space in the per-CPU caches.

Thanks, I'll try this.

This can happen if your workload is unbalanced between the CPUs and you are always allocating on one but freeing on another, but I wouldn't expect that to happen with your workload. Maybe it can also happen if your turnover is high enough.

This is very unlikely, because I have 5 other video storage servers of the same hardware and software configuration and they feel good. On the other hand, all the other servers were put into production before or after the problematic servers and were filled with content in other ways, and therefore they could have a slightly different load pattern.

In total I have faced this bug three times:

1. The first time there was, AFAIR, 5.4-RELEASE on a Dell 2850 with the same configuration as now. It was an mp3 store and I used thttpd as the HTTP server to serve the mp3's. That time the problems were not so frequent, and it also took too long to get back to normal operation, so we had to reboot the servers once a week or so. The problems began when we moved to the new hardware, the Dell 2850. At that time we suspected the amrd driver and had no time to dig in, because all the servers of the project were problematic. Installing Linux helped.

2. The second time it was a server for the static files of a very popular blog. The HTTP server was nginx and the disk contained pictures, mp3's and videos. It was a Dell 1850, 2x146 SCSI mirror. Linux also solved the problem.

3. The problem we see now.

At first glance one could say that the problem is in Dell's x850 series or amr(4), but we run this hardware on many other projects and they work well. Also Linux works on them.

And a few hours ago I received feedback from Andrzej Tobola, who has the same problem on FreeBSD 7 with a Promise ATA software mirror:

===
Subject: Re: amrd disk performance drop after running under high load
Date: Tue, 16 Oct 2007 10:59:34 +0200
From: Andrzej Tobola [EMAIL PROTECTED]
To: Alexey Popov [EMAIL PROTECTED]

skip

Exactly the same here but on a big ata RAID0 with big traffic (~10GB/24h):

amper% df -h /ftp/priv
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/ar0a     744G    679G    4.7G    99%    /ftp/priv
amper% grep ^ar /var/run/dmesg.boot
ar0: 763108MB Promise Fasttrak RAID0 (stripe 64 KB) status: READY
ar0: disk0 READY using ad6 at ata3-master
ar0: disk1 READY using ad4 at ata2-master
amper% uname -a
FreeBSD xxx 7.0-CURRENT-200709 FreeBSD 7.0-CURRENT-200709 #0: Tue Sep 11 04:44:48 UTC 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386

I am rebooting if I reach this state (approx. a week). It is an old bug - a few months ;)

cheers, -a
===

So I can conclude that FreeBSD has a long-standing bug in the VM that can be triggered when serving large amounts of static data (much bigger than memory size) at high rates. Possibly this only applies to large files like mp3 or video.

What does vmstat -z show during the good and bad times?

I'll send this data when the bad times happen next.

With best regards, Alexey Popov
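[The "pass all established TCP early" rule being described is the classic ipfw fast path. A sketch; the rule number 100 is an arbitrary choice, the point is that the rule fires before any more expensive rules in the set:]

  # let packets of already-established TCP connections bypass the rest of
  # the ruleset; only connection-setup traffic falls through to later rules
  ipfw add 100 allow tcp from any to any established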
Re: amrd disk performance drop after running under high load
* Alexey Popov ([EMAIL PROTECTED]) wrote: So I can conclude that FreeBSD has a long-standing bug in the VM that can be triggered when serving large amounts of static data (much bigger than memory size) at high rates. Possibly this only applies to large files like mp3 or video.

I've seen highly dubious VM behavior when reading large files locally; the system ends up swapping out a small but significant amount of various processes, even very small, recently active ones like syslogd, for no apparent reason: http://lists.freebsd.org/pipermail/freebsd-stable/2007-September/036956.html

I've also seen dubious IO behavior from amr(4), where access to one array will interfere with IO from an independent set of spindles that just happen to be attached to the same card: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/114438

Given the blank looks I've received every time I've mentioned these things, I'm guessing they aren't seen by others all that often, but maybe one or both are vaguely relevant to your situation.

-- Thomas 'Freaky' Hurst http://hur.st/
Re: amrd disk performance drop after running under high load
Alexey Popov wrote: Hi. Kris Kennaway wrote: After some time of running under high load, disk performance becomes extremely poor. At those periods 'systat -vm 1' shows something like this:

What does high load mean? You need to explain the system workload more.

This web service is similar to YouTube. This server is a video store. I have around 200G of *.flv (flash video) files on the server. I run lighttpd as a web server. Disk load is usually around 50%, network output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle. As you can see it is a trivial service: sending files to the network via HTTP.

Does lighttpd actually use HTTP accept filters?

Don't know how to make sure, but it seems to run the appropriate setsockopt (truss output): setsockopt(0x4,0x,0x1000,0x7fffe620,0x100) = 0 (0x0)

OK.

Are you using ipfilter and ipfw? You are paying a performance penalty for having them.

I'm using ipfw, and one of the first rules is to pass all established TCP. ipfilter is not used on this server, but it is present in the kernel as it can be used on other servers. I have 95% CPU idle, so I think the packet filters do not produce significant load on this server.

Well, it was not your most serious issue, but from your profiling trace it is definitely burning cycles with every packet processed.

You might try increasing BUCKET_MAX in sys/vm/uma_core.c. I don't really understand the code here, but you seem to be hitting a threshold behaviour where you are constantly running out of space in the per-CPU caches.

Thanks, I'll try this.

This can happen if your workload is unbalanced between the CPUs and you are always allocating on one but freeing on another, but I wouldn't expect that to happen with your workload. Maybe it can also happen if your turnover is high enough.

This is very unlikely, because I have 5 other video storage servers of the same hardware and software configuration and they feel good.

Clearly something is different about them, though. If you can characterize exactly what that is then it will help.

On the other hand, all the other servers were put into production before or after the problematic servers and were filled with content in other ways, and therefore they could have a slightly different load pattern.

In total I have faced this bug three times:

1. The first time there was, AFAIR, 5.4-RELEASE on a Dell 2850 with the same configuration as now. It was an mp3 store and I used thttpd as the HTTP server to serve the mp3's. That time the problems were not so frequent, and it also took too long to get back to normal operation, so we had to reboot the servers once a week or so. The problems began when we moved to the new hardware, the Dell 2850. At that time we suspected the amrd driver and had no time to dig in, because all the servers of the project were problematic. Installing Linux helped.

2. The second time it was a server for the static files of a very popular blog. The HTTP server was nginx and the disk contained pictures, mp3's and videos. It was a Dell 1850, 2x146 SCSI mirror. Linux also solved the problem.

3. The problem we see now.

At first glance one could say that the problem is in Dell's x850 series or amr(4), but we run this hardware on many other projects and they work well. Also Linux works on them.

OK, but there is no evidence in what you posted so far that amr is involved in any way. There is convincing evidence that it is the mbuf issue.

And a few hours ago I received feedback from Andrzej Tobola, who has the same problem on FreeBSD 7 with a Promise ATA software mirror:

Well, he didn't provide any evidence yet that it is the same problem, so let's not become confused by feelings :)

So I can conclude that FreeBSD has a long-standing bug in the VM that can be triggered when serving large amounts of static data (much bigger than memory size) at high rates. Possibly this only applies to large files like mp3 or video.

It is possible, we have further work to do to conclude this though.

Kris
Re: amrd disk performance drop after running under high load
Alexey Popov wrote: After some time of running under high load, disk performance becomes extremely poor. At those periods 'systat -vm 1' shows something like this:

What does high load mean? You need to explain the system workload more.

Disks   amrd0
KB/t    85.39
tps         5
MB/s     0.38
% busy     99

Apart from all that, I tried mutex profiling, and here are the results (sorted by the total number of acquisitions):

Bad case:
102 223514 273977 0 14689  1651568 /usr/src/sys/vm/uma_core.c:2349 (512)
950 263099 273968 0 15004    14427 /usr/src/sys/vm/uma_core.c:2450 (512)
108 150422 175840 0 10978 22988519 /usr/src/sys/vm/uma_core.c:1888 (mbuf)

Here you can see that high UMA activity happens in periods of low disk performance. But I'm not sure whether this is the root of the problem or a consequence.

The extremely high contention there does seem to say you have an mbuf starvation problem and not a disk problem. I don't know why this would be happening off-hand.

Can you also provide more details about the system hardware and configuration?

Kris
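[For completeness, the mutex profiling mentioned above is typically gathered like this on FreeBSD 6. A sketch from memory; the MUTEX_PROFILING kernel option and the debug.mutex.prof.* sysctl names are assumptions, so verify them against the MUTEX_PROFILING(9) man page for the exact release:]

  # kernel config: build with mutex profiling support (assumed option name)
  #   options MUTEX_PROFILING

  # then at runtime:
  sysctl debug.mutex.prof.enable=1      # start collecting
  sleep 60                              # let the workload run
  sysctl debug.mutex.prof.enable=0      # stop collecting
  sysctl debug.mutex.prof.stats         # dump per-lock acquisition stats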