Re: amrd disk performance drop after running under high load

2007-11-23 Thread Alexey Popov

Kris Kennaway wrote:


what is your RAID controller configuration (read ahead/cache/write
policy)? I have seen weird/bogus numbers (~100% busy) reported by
systat -v when read ahead was enabled on LSI/amr controllers.

I tried to run with Read-ahead disabled, but it didn't help.
I just ran into this myself, and apparently it can be caused by 
Patrol Reads, where the adapter periodically scans the disks looking 
for media errors.  You can turn this off using -stopPR with the megarc 
port.
Oops, -disPR is the correct command to disable; -stopPR just halts a PR 
event in progress.

Wow! Disabling Patrol Reads really does solve the problem. Thank you!

I have many amrd's and all of them appear to have Patrol Reads enabled 
by default. But the problem happens only on three of them. Is this a 
hardware problem?
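
For reference, a minimal sketch of the fix discussed above, using the
megarc utility; the -stopPR and -disPR commands are the ones named in
this thread, while the -a0 adapter argument is an assumption for a
single-adapter machine:

  # halt any Patrol Read event currently in progress
  megarc -stopPR -a0
  # then disable the Patrol Read feature itself
  megarc -disPR -a0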


With best regards,
Alexey Popov


Re: amrd disk performance drop after running under high load

2007-11-23 Thread Kris Kennaway

Alexey Popov wrote:

Kris Kennaway wrote:


what is your RAID controller configuration (read ahead/cache/write
policy)? I have seen weird/bogus numbers (~100% busy) reported by
systat -v when read ahead was enabled on LSI/amr controllers.

I tried to run with Read-ahead disabled, but it didn't help.
I just ran into this myself, and apparently it can be caused by 
Patrol Reads, where the adapter periodically scans the disks looking 
for media errors.  You can turn this off using -stopPR with the 
megarc port.
Oops, -disPR is the correct command to disable; -stopPR just halts a 
PR event in progress.

Wow! Disabling Patrol Reads really does solve the problem. Thank you!

I have many amrd's and all of them appear to have Patrol Reads enabled 
by default. But the problem happens only on three of them. Is this a 
hardware problem?


I am not sure; maybe for some reason the Patrol Reads are not 
interfering with the other disk I/O as much (e.g. the hardware prioritises 
them differently or something).


Anyway, glad to hear it was resolved.

Kris



Re: amrd disk performance drop after running under high load

2007-11-21 Thread Kris Kennaway

Kris Kennaway wrote:

Alexey Popov wrote:

Hi.

Panagiotis Christias wrote:

In the good case you are getting a much higher interrupt rate but
with the data you provided I can't tell where from.  You need to run
vmstat -i at regular intervals (e.g. every 10 seconds for a minute)
during the good and bad times, since it only provides counters
and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem is not so big anymore.

I collected interrupt stats and they show no relation between
interrupts and slowdowns. Here it is:
http://83.167.98.162/gprof/intr-graph/

Also I have similar statistics from mutex profiling, and they show
there's no problem in the mutexes.
http://83.167.98.162/gprof/mtx-graph/mtxgifnew/

I have no idea what else to check.

I don't know what this graph is showing me :)  When precisely is the
system behaving poorly?

what is your RAID controller configuration (read ahead/cache/write
policy)? I have seen weird/bogus numbers (~100% busy) reported by
systat -v when read ahead was enabled on LSI/amr controllers.



**
  Existing Logical Drive Information
  By LSI Logic Corp.,USA

**
  [Note: For SATA-2, 4 and 6 channel controllers, please specify
  Ch=0 Id=0..15 for specifying physical drive(Ch=channel,
Id=Target)]


  Logical Drive : 0( Adapter: 0 ):  Status: OPTIMAL
---
SpanDepth :01 RaidLevel: 5  RdAhead : Adaptive  Cache: DirectIo
StripSz   :064KB   Stripes  : 6  WrPolicy: WriteBack

Logical Drive 0 : SpanLevel_0 Disks
Chnl  Target  StartBlock   Blocks  Physical Target Status
  --  --   --  --
0  000x   0x22ec   ONLINE
0  010x   0x22ec   ONLINE
0  020x   0x22ec   ONLINE
0  030x   0x22ec   ONLINE
0  040x   0x22ec   ONLINE
0  050x   0x22ec   ONLINE

I tried to run with Read-ahead disabled, but it didn't help.


I just ran into this myself, and apparently it can be caused by Patrol 
Reads, where the adapter periodically scans the disks looking for media 
errors.  You can turn this off using -stopPR with the megarc port.


Kris



Oops, -disPR is the correct command to disable; -stopPR just halts a PR 
event in progress.


Kris


Re: amrd disk performance drop after running under high load

2007-11-19 Thread Kris Kennaway

Alexey Popov wrote:

Hi.

Panagiotis Christias wrote:

In the good case you are getting a much higher interrupt rate but
with the data you provided I can't tell where from.  You need to run
vmstat -i at regular intervals (e.g. every 10 seconds for a minute)
during the good and bad times, since it only provides counters
and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem is not so big anymore.

I collected interrupt stats and they show no relation between
interrupts and slowdowns. Here it is:
http://83.167.98.162/gprof/intr-graph/

Also I have similar statistics from mutex profiling, and they show
there's no problem in the mutexes.
http://83.167.98.162/gprof/mtx-graph/mtxgifnew/

I have no idea what else to check.

I don't know what this graph is showing me :)  When precisely is the
system behaving poorly?

what is your RAID controller configuration (read ahead/cache/write
policy)? I have seen weird/bogus numbers (~100% busy) reported by
systat -v when read ahead was enabled on LSI/amr controllers.



**
  Existing Logical Drive Information
  By LSI Logic Corp.,USA

**
  [Note: For SATA-2, 4 and 6 channel controllers, please specify
  Ch=0 Id=0..15 for specifying physical drive(Ch=channel,
Id=Target)]


  Logical Drive : 0( Adapter: 0 ):  Status: OPTIMAL
---
SpanDepth :01 RaidLevel: 5  RdAhead : Adaptive  Cache: DirectIo
StripSz   :064KB   Stripes  : 6  WrPolicy: WriteBack

Logical Drive 0 : SpanLevel_0 Disks
Chnl  Target  StartBlock   Blocks  Physical Target Status
  --  --   --  --
0  000x   0x22ec   ONLINE
0  010x   0x22ec   ONLINE
0  020x   0x22ec   ONLINE
0  030x   0x22ec   ONLINE
0  040x   0x22ec   ONLINE
0  050x   0x22ec   ONLINE

I tried to run with Read-ahead disabled, but it didn't help.


I just ran into this myself, and apparently it can be caused by Patrol 
Reads, where the adapter periodically scans the disks looking for media 
errors.  You can turn this off using -stopPR with the megarc port.


Kris


Re: amrd disk performance drop after running under high load

2007-11-12 Thread Alexey Popov

Hi.

Panagiotis Christias wrote:

In the good case you are getting a much higher interrupt rate but
with the data you provided I can't tell where from.  You need to run
vmstat -i at regular intervals (e.g. every 10 seconds for a minute)
during the good and bad times, since it only provides counters
and an average rate over the uptime of the system.

Now I'm running a 10-process lighttpd and the problem is not so big anymore.

I collected interrupt stats and they show no relation between
interrupts and slowdowns. Here it is:
http://83.167.98.162/gprof/intr-graph/

Also I have similar statistics from mutex profiling, and they show
there's no problem in the mutexes.
http://83.167.98.162/gprof/mtx-graph/mtxgifnew/

I have no idea what else to check.

I don't know what this graph is showing me :)  When precisely is the
system behaving poorly?

what is your RAID controller configuration (read ahead/cache/write
policy)? I have seen weird/bogus numbers (~100% busy) reported by
systat -v when read ahead was enabled on LSI/amr controllers.



**
  Existing Logical Drive Information
  By LSI Logic Corp.,USA

**
  [Note: For SATA-2, 4 and 6 channel controllers, please specify
  Ch=0 Id=0..15 for specifying physical drive(Ch=channel,
Id=Target)]


  Logical Drive : 0( Adapter: 0 ):  Status: OPTIMAL
---
SpanDepth :01 RaidLevel: 5  RdAhead : Adaptive  Cache: DirectIo
StripSz   :064KB   Stripes  : 6  WrPolicy: WriteBack

Logical Drive 0 : SpanLevel_0 Disks
Chnl  Target  StartBlock   Blocks  Physical Target Status
  --  --   --  --
0  000x   0x22ec   ONLINE
0  010x   0x22ec   ONLINE
0  020x   0x22ec   ONLINE
0  030x   0x22ec   ONLINE
0  040x   0x22ec   ONLINE
0  050x   0x22ec   ONLINE

I tried to run with Read-ahead disabled, but it didn't help.

With best regards,
Alexey Popov


Re: amrd disk performance drop after running under high load

2007-11-11 Thread Alexey Popov

Hi.

Kris Kennaway wrote:
In the good case you are getting a much higher interrupt rate but 
with the data you provided I can't tell where from.  You need to run 
vmstat -i at regular intervals (e.g. every 10 seconds for a minute) 
during the good and bad times, since it only provides counters 
and an average rate over the uptime of the system.


Now I'm running a 10-process lighttpd and the problem is not so big anymore.

I collected interrupt stats and they show no relation between 
interrupts and slowdowns. Here it is:

http://83.167.98.162/gprof/intr-graph/

Also I have similar statistics from mutex profiling, and they show 
there's no problem in the mutexes. 
http://83.167.98.162/gprof/mtx-graph/mtxgifnew/


I have no idea what else to check.


I don't know what this graph is showing me :)  When precisely is the 
system behaving poorly?
Take a look at the Disk Load % picture at 
http://83.167.98.162/gprof/intr-graph/


At ~17:00, 03:00-04:00, 13:00-14:00, 00:30-01:30 and 11:00-13:00 it shows 
peaks of disk activity that never really happened. As I said at the 
beginning of the thread, at these peak moments the disk becomes slow and 
vmstat shows 100% disk load while performing fewer than 10 tps. The other 
graphs on this page show that there's no relation to the interrupt rate of 
the amr or em device. You advised me to check that.


When I was using a single-process lighttpd the problem was much worse, as 
you can see at http://83.167.98.162/gprof/graph/ . In the first picture on 
that page you can see disk load peaks at 18:00 and 15:00, which led to 
decreased network output because the disk was too slow.


Earlier in this thread we suspected the UMA mutexes. To check that, I 
collected mutex profiling stats and drew graphs over time, and they also 
didn't show anything interesting. All mutex graphs stayed smooth during 
the disk load peaks. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/


With best regards,
Alexey Popov


Re: amrd disk performance drop after running under high load

2007-11-11 Thread Panagiotis Christias
On Nov 11, 2007 7:26 PM, Alexey Popov [EMAIL PROTECTED] wrote:
 Hi.

 Kris Kennaway wrote:
  In the good case you are getting a much higher interrupt rate but
  with the data you provided I can't tell where from.  You need to run
  vmstat -i at regular intervals (e.g. every 10 seconds for a minute)
  during the good and bad times, since it only provides counters
  and an average rate over the uptime of the system.
 
  Now I'm running a 10-process lighttpd and the problem is not so big anymore.
 
  I collected interrupt stats and they show no relation between
  interrupts and slowdowns. Here it is:
  http://83.167.98.162/gprof/intr-graph/
 
  Also I have similar statistics from mutex profiling, and they show
  there's no problem in the mutexes.
  http://83.167.98.162/gprof/mtx-graph/mtxgifnew/
 
  I have no idea what else to check.

  I don't know what this graph is showing me :)  When precisely is the
  system behaving poorly?
 Take a look at the Disk Load % picture at
 http://83.167.98.162/gprof/intr-graph/

 At ~17:00, 03:00-04:00, 13:00-14:00, 00:30-01:30 and 11:00-13:00 it shows
 peaks of disk activity that never really happened. As I said at the
 beginning of the thread, at these peak moments the disk becomes slow and
 vmstat shows 100% disk load while performing fewer than 10 tps. The other
 graphs on this page show that there's no relation to the interrupt rate of
 the amr or em device. You advised me to check that.

 When I was using a single-process lighttpd the problem was much worse, as
 you can see at http://83.167.98.162/gprof/graph/ . In the first picture on
 that page you can see disk load peaks at 18:00 and 15:00, which led to
 decreased network output because the disk was too slow.

 Earlier in this thread we suspected the UMA mutexes. To check that, I
 collected mutex profiling stats and drew graphs over time, and they also
 didn't show anything interesting. All mutex graphs stayed smooth during
 the disk load peaks. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/

 With best regards,
 Alexey Popov

Hello,

what is your RAID controller configuration (read ahead/cache/write
policy)? I have seen weird/bogus numbers (~100% busy) reported by
systat -v when read ahead was enabled on LSI/amr controllers.

Regards,
Panagiotis Christias


Re: amrd disk performance drop after running under high load

2007-11-09 Thread Alexey Popov

Hi.

Kris Kennaway wrote:

In the good case you are getting a much higher interrupt rate but with 
the data you provided I can't tell where from.  You need to run vmstat 
-i at regular intervals (e.g. every 10 seconds for a minute) during the 
good and bad times, since it only provides counters and an average 
rate over the uptime of the system.


Now I'm running a 10-process lighttpd and the problem is not so big anymore.

I collected interrupt stats and they show no relation between 
interrupts and slowdowns. Here it is:

http://83.167.98.162/gprof/intr-graph/

Also I have similar statistics from mutex profiling, and they show 
there's no problem in the mutexes. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/


I have no idea what else to check.

With best regards,
Alexey Popov


Re: amrd disk performance drop after running under high load

2007-11-09 Thread Kris Kennaway

Alexey Popov wrote:

Hi.

Kris Kennaway wrote:

In the good case you are getting a much higher interrupt rate but 
with the data you provided I can't tell where from.  You need to run 
vmstat -i at regular intervals (e.g. every 10 seconds for a minute) 
during the good and bad times, since it only provides counters and 
an average rate over the uptime of the system.


Now I'm running a 10-process lighttpd and the problem is not so big anymore.

I collected interrupt stats and they show no relation between 
interrupts and slowdowns. Here it is:

http://83.167.98.162/gprof/intr-graph/

Also I have similar statistics from mutex profiling, and they show 
there's no problem in the mutexes. http://83.167.98.162/gprof/mtx-graph/mtxgifnew/


I have no idea what else to check.

With best regards,
Alexey Popov




I don't know what this graph is showing me :)  When precisely is the 
system behaving poorly?


Kris


Re: amrd disk performance drop after running under high load

2007-10-31 Thread Alexey Popov

Hi

Kris Kennaway wrote:
So I can conclude that FreeBSD has a long-standing bug in the VM that 
can be triggered when serving a large amount of static data (much 
bigger than memory size) at high rates. Possibly this only applies 
to large files like mp3 or video. 

It is possible; we have further work to do to conclude this, though.
I forgot to mention that I have pmc and kgmon profiling for the good and 
bad times. But I don't have enough knowledge to interpret them correctly, 
and I'm not sure if they can help.

pmc would be useful.

pmc profiling attached.
OK, the pmc traces do seem to show that it's not a lock contention 
issue.  That being the case, I don't think the fact that different 
servers perform better is directly related. 
But there was evidence of mbuf lock contention in the mutex profiling, 
wasn't there? As far as I understand, mutex problems can exist without 
increased CPU load in the pmc stats, right?


There is also no evidence of a VM problem.  What your vmstat and pmc 
traces show is that your system really isn't doing much work at all, 
relatively speaking.
There is also still no evidence of a disk problem.  In fact your disk 
seems to be almost idle in both cases you provided, only doing between 1 
and 10 operations per second, which is trivial.
The vmstat and network output graphs show that the problem exists. If it is 
not a disk, network, or VM problem, what else could be wrong?


In the good case you are getting a much higher interrupt rate but with 
the data you provided I can't tell where from.  You need to run vmstat 
-i at regular intervals (e.g. every 10 seconds for a minute) during the 
good and bad times, since it only provides counters and an average 
rate over the uptime of the system.

I'll try this, but AFAIR there was nothing strange about the interrupts.

I believe the reason for the high interrupt rate in the good cases is that 
the server sends a lot of traffic.


What there is evidence of is an interrupt aliasing problem between em 
and USB:

irq16: uhci0  1464547796   1870
irq64: em0    1463513610   1869
I tried disabling USB in the kernel; this issue went away, but the main 
problem remained. I also see this interrupt aliasing on many servers 
that have no problems.


With best regards,
Alexey Popov


Re: amrd disk performance drop after running under high load

2007-10-31 Thread Kris Kennaway

Alexey Popov wrote:

Hi

Kris Kennaway wrote:
So I can conclude that FreeBSD has a long-standing bug in the VM that 
can be triggered when serving a large amount of static data (much 
bigger than memory size) at high rates. Possibly this only 
applies to large files like mp3 or video. 

It is possible; we have further work to do to conclude this, though.
I forgot to mention that I have pmc and kgmon profiling for the good and 
bad times. But I don't have enough knowledge to interpret them correctly, 
and I'm not sure if they can help.

pmc would be useful.

pmc profiling attached.
OK, the pmc traces do seem to show that it's not a lock contention 
issue.  That being the case, I don't think the fact that different 
servers perform better is directly related. 
But there was evidence of mbuf lock contention in the mutex profiling, 
wasn't there? As far as I understand, mutex problems can exist without 
increased CPU load in the pmc stats, right?


No, the lock functions will show up as using a lot of CPU.  I guess the 
lock profiling trace showed high numbers because you ran it for a long time.


There is also no evidence of a VM problem.  What your vmstat and pmc 
traces show is that your system really isn't doing much work at all, 
relatively speaking.
There is also still no evidence of a disk problem.  In fact your disk 
seems to be almost idle in both cases you provided, only doing between 
1 and 10 operations per second, which is trivial.
The vmstat and network output graphs show that the problem exists. If it is 
not a disk, network, or VM problem, what else could be wrong?


The vmstat output you provided so far doesn't show anything specific.

In the good case you are getting a much higher interrupt rate but 
with the data you provided I can't tell where from.  You need to run 
vmstat -i at regular intervals (e.g. every 10 seconds for a minute) 
during the good and bad times, since it only provides counters and 
an average rate over the uptime of the system.

I'll try this, but AFAIR there was nothing strange about the interrupts.

I believe the reason for the high interrupt rate in the good cases is that 
the server sends a lot of traffic.


What there is evidence of is an interrupt aliasing problem between em 
and USB:

irq16: uhci0  1464547796   1870
irq64: em0    1463513610   1869
I tried disabling USB in the kernel; this issue went away, but the main 
problem remained. I also see this interrupt aliasing on many servers 
that have no problems.


OK.

Kris


Re: amrd disk performance drop after running under high load

2007-10-27 Thread Kris Kennaway

Alexey Popov wrote:

Hi

Kris Kennaway wrote:
So I can conclude that FreeBSD has a long-standing bug in the VM that 
can be triggered when serving a large amount of static data (much 
bigger than memory size) at high rates. Possibly this only applies 
to large files like mp3 or video. 

It is possible; we have further work to do to conclude this, though.
I forgot to mention that I have pmc and kgmon profiling for the good and 
bad times. But I don't have enough knowledge to interpret them correctly, 
and I'm not sure if they can help.

pmc would be useful.

pmc profiling attached.


Sorry for the delay, I was travelling last weekend and it took a few 
days to catch up.


OK, the pmc traces do seem to show that it's not a lock contention 
issue.  That being the case, I don't think the fact that different 
servers perform better is directly related.  In my tests, multithreaded 
web servers don't seem to perform well anyway.


There is also no evidence of a VM problem.  What your vmstat and pmc 
traces show is that your system really isn't doing much work at all, 
relatively speaking.


There is also still no evidence of a disk problem.  In fact your disk 
seems to be almost idle in both cases you provided, only doing between 1 
and 10 operations per second, which is trivial.


In the good case you are getting a much higher interrupt rate but with 
the data you provided I can't tell where from.  You need to run vmstat 
-i at regular intervals (e.g. every 10 seconds for a minute) during the 
good and bad times, since it only provides counters and an average 
rate over the uptime of the system.
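
A minimal sketch of collecting those samples from sh; the log file name
is arbitrary:

  # snapshot vmstat -i every 10 seconds for a minute, with timestamps,
  # so the good and bad periods can be compared afterwards
  for i in 1 2 3 4 5 6; do date; vmstat -i; sleep 10; done > vmstat-i.log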


What there is evidence of is an interrupt aliasing problem between em 
and USB:


irq16: uhci0  1464547796   1870
irq64: em0    1463513610   1869

This is a problem on some Intel systems.  Basically, each em0 interrupt 
is also causing a bogus interrupt to the uhci0 device.  This will 
cause some overhead and might be contributing to the UMA problems.  I 
am not sure if it is the main issue, although it could be.  It is mostly 
serious when both IRQs run under Giant, because they will both fight for 
it every time one of them interrupts.  That is not the case here, but 
there could be other bad scenarios too.  You could try disabling USB 
support in your kernel, since you don't seem to be using it.
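
A sketch of that change, assuming a custom kernel config copied from
GENERIC (MYKERNEL is a placeholder name):

  # comment out the USB devices in /usr/src/sys/i386/conf/MYKERNEL:
  #   device uhci
  #   device ohci
  #   device ehci
  #   device usb
  # then rebuild and install the kernel
  cd /usr/src
  make buildkernel KERNCONF=MYKERNEL
  make installkernel KERNCONF=MYKERNEL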


Kris


Re: amrd disk performance drop after running under high load

2007-10-19 Thread Alexey Popov

Hi

Scott Long wrote:


interrupt  total   rate
irq6: fdc0 8  0
irq14: ata0   47  0
irq16: uhci0  1428187319   1851

^^    [1]

irq18: uhci2    12374352     16
irq23: ehci0   3  0
irq46: amr0 11983237 15
irq64: em0    1427141755   1850

^^    [2]

cpu0: timer   1540896452   1997
cpu1: timer   1542377798   1999
Total 5962960971   7730


[1] and [2] look suspicious to me (the totals and rates are too close to
each other and, by the way, to the timers). Leave the latter (the timers)
alone. Do you use any USB devices? Can you try another network card? That
behaviour looks like an interrupt storm and/or IRQ collision.


It's neither.  It's a side effect of a feature that FreeBSD abuses for
handling interrupts.  Note that amr0 and ehci2 are acting similar.  It's
mostly harmless, but it does waste CPU cycles.  I wouldn't expect this
on a recent version of FreeBSD, though, at least not from the e1000
driver.
I have this effect on many servers and I believe it is harmless. Once I 
was trying to reduce CPU usage on a very loaded server and removed 
USB from the kernel. The effect disappeared, but there was no significant 
difference in CPU usage.


I disagree with your comment about recent versions. I see this effect on 
many servers with the latest FreeBSD 6-STABLE and em. Actually, I have more 
servers with this effect than without it.


With best regards,
Alexey Popov


Re: amrd disk performance drop after running under high load

2007-10-18 Thread Boris Samorodov
Hi!

Since nobody has answered so far, here are my two cents. I'm not an expert
here, so it's only my humble opinion.

On Wed, 17 Oct 2007 22:52:49 +0400 Alexey Popov wrote:

 interrupt  total   rate
 irq6: fdc0 8  0
 irq14: ata0   47  0
 irq16: uhci0  1428187319   1851
^^    [1]
 irq18: uhci2    12374352     16
 irq23: ehci0   3  0
 irq46: amr0 11983237 15
 irq64: em0    1427141755   1850
^^    [2]
 cpu0: timer   1540896452   1997
 cpu1: timer   1542377798   1999
 Total 5962960971   7730

[1] and [2] look suspicious to me (the totals and rates are too close to
each other and, by the way, to the timers). Leave the latter (the timers)
alone. Do you use any USB devices? Can you try another network card? That
behaviour looks like an interrupt storm and/or IRQ collision.


WBR
-- 
bsam


Re: amrd disk performance drop after running under high load

2007-10-18 Thread Scott Long

Boris Samorodov wrote:

Hi!

Since nobody has answered so far, here are my two cents. I'm not an expert
here, so it's only my humble opinion.

On Wed, 17 Oct 2007 22:52:49 +0400 Alexey Popov wrote:


interrupt  total   rate
irq6: fdc0 8  0
irq14: ata0   47  0
irq16: uhci0  1428187319   1851

^^    [1]

irq18: uhci2    12374352     16
irq23: ehci0   3  0
irq46: amr0 11983237 15
irq64: em0    1427141755   1850

^^    [2]

cpu0: timer   1540896452   1997
cpu1: timer   1542377798   1999
Total 5962960971   7730


[1] and [2] look suspicious to me (the totals and rates are too close to
each other and, by the way, to the timers). Leave the latter (the timers)
alone. Do you use any USB devices? Can you try another network card? That
behaviour looks like an interrupt storm and/or IRQ collision.




It's neither.  It's a side effect of a feature that FreeBSD abuses for
handling interrupts.  Note that amr0 and ehci2 are acting similar.  It's
mostly harmless, but it does waste CPU cycles.  I wouldn't expect this
on a recent version of FreeBSD, though, at least not from the e1000
driver.

Scott


Re: amrd disk performance drop after running under high load

2007-10-18 Thread Boris Samorodov
On Thu, 18 Oct 2007 15:57:16 -0600 Scott Long wrote:
 Boris Samorodov wrote:

  Since nobody has answered so far, here are my two cents. I'm not an expert
  here, so it's only my humble opinion.
 
  On Wed, 17 Oct 2007 22:52:49 +0400 Alexey Popov wrote:
 
  interrupt  total   rate
  irq6: fdc0 8  0
  irq14: ata0   47  0
  irq16: uhci0  1428187319   1851
  ^^    [1]
  irq18: uhci2    12374352     16
  irq23: ehci0   3  0
  irq46: amr0 11983237 15
  irq64: em0    1427141755   1850
  ^^    [2]
  cpu0: timer   1540896452   1997
  cpu1: timer   1542377798   1999
  Total 5962960971   7730
 
  [1] and [2] look suspicious to me (the totals and rates are too close to
  each other and, by the way, to the timers). Leave the latter (the timers)
  alone. Do you use any USB devices? Can you try another network card? That
  behaviour looks like an interrupt storm and/or IRQ collision.

 It's neither.  It's a side effect of a feature that FreeBSD abuses for
 handling interrupts.  Note that amr0 and ehci2 are acting similar.  It's
 mostly harmless, but it does waste CPU cycles.  I wouldn't expect this
 on a recent version of FreeBSD, though, at least not from the e1000
 driver.

I see. Sorry for the noise. So, as I understand it, _that_ can't be the
problem (as in the subject) the OP is seeing?


WBR
-- 
bsam


Re: amrd disk performance drop after running under high load

2007-10-17 Thread Kris Kennaway

Alexey Popov wrote:

This is very unlikely, because I have 5 other video storage servers 
with the same hardware and software configuration and they are fine.
Clearly something is different about them, though.  If you can 
characterize exactly what that is then it will help.
I can't see any difference except the installation date. I really did 
compare all the parameters and found nothing interesting.


At first glance one could say the problem is in Dell's x850 series or 
amr(4), but we run this hardware on many other projects and they work 
well. Linux also works on them.


OK but there is no evidence in what you posted so far that amr is 
involved in any way.  There is convincing evidence that it is the mbuf 
issue.

Why are you sure this is the mbuf issue?


Because that is the only problem shown in the data you posted.

 For example, if there is a real
problem with amr or the VM causing a disk slowdown, then when it occurs the 
network subsystem will have a different load pattern. Instead of just quickly 
sending large amounts of data, the system will have to accept a large 
number of simultaneous connections waiting for data. Can this cause high 
mbuf contention?


I'd expect to see evidence of the main problem.

And a few hours ago I received feedback from Andrzej Tobola; he has 
the same problem on FreeBSD 7 with a Promise ATA software mirror:
Well, he didn't provide any evidence yet that it is the same problem, 
so let's not become confused by feelings :)
I think he is talking about 100% disk busy while processing ~5 
transfers/sec.


% busy as reported by gstat doesn't mean what you think it does.  What 
is the I/O response time?  That's the meaningful statistic for 
evaluating I/O load.  Also, you didn't post about this.
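
For example, gstat's per-request latency columns give the response time
directly (assuming gstat's -I refresh interval and -f filter options):

  # ms/r and ms/w are the average times to complete a read/write request;
  # a genuinely overloaded disk shows high ms/r, not just high %busy
  gstat -I 1s -f amrd0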


So I can conclude that FreeBSD has a long-standing bug in the VM that 
can be triggered when serving a large amount of static data (much 
bigger than memory size) at high rates. Possibly this only applies to 
large files like mp3 or video. 

It is possible; we have further work to do to conclude this, though.
I forgot to mention that I have pmc and kgmon profiling for the good and 
bad times. But I don't have enough knowledge to interpret them correctly, 
and I'm not sure if they can help.


pmc would be useful.

Also, I now run nginx instead of lighttpd on one of the problematic 
servers. It seems to work much better - sometimes there are peaks in 
disk load, but the disk does not become very slow and network output does 
not change. The difference with nginx is that it runs in multiple 
processes, while lighttpd by default has only one process. Now I have 
configured lighttpd on another server to run multiple workers. I'll see 
if it helps.


What else can I try?


Still waiting on the vmstat -z output.

Kris



Re: amrd disk performance drop after running under high load

2007-10-17 Thread Alexey Popov

Hi.

Kris Kennaway wrote:
After some time running under high load, disk performance becomes 
extremely poor. During those periods 'systat -vm 1' shows something like 
this:

This web service is similar to YouTube. This server is a video store. I
have around 200G of *.flv (flash video) files on the server.
I run lighttpd as a web server. Disk load is usually around 50%, network
output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle.
This is very unlikely, because I have 5 other video storage servers 
with the same hardware and software configuration and they are fine.
Clearly something is different about them, though.  If you can 
characterize exactly what that is then it will help.
I can't see any difference except the installation date. I really did 
compare all the parameters and found nothing interesting.


At first glance one could say the problem is in Dell's x850 series or 
amr(4), but we run this hardware on many other projects and they work 
well. Linux also works on them.


OK but there is no evidence in what you posted so far that amr is 
involved in any way.  There is convincing evidence that it is the mbuf 
issue.
Why are you sure this is the mbuf issue? For example, if there is a real 
problem with amr or the VM causing a disk slowdown, then when it occurs the 
network subsystem will have a different load pattern. Instead of just quickly 
sending large amounts of data, the system will have to accept a large 
number of simultaneous connections waiting for data. Can this cause high 
mbuf contention?




And a few hours ago I received feedback from Andrzej Tobola; he has the 
same problem on FreeBSD 7 with a Promise ATA software mirror:
Well, he didn't provide any evidence yet that it is the same problem, so 
let's not become confused by feelings :)
I think he is talking about 100% disk busy while processing ~5 
transfers/sec.


So I can conclude that FreeBSD has a long-standing bug in the VM that 
can be triggered when serving a large amount of static data (much 
bigger than memory size) at high rates. Possibly this only applies to 
large files like mp3 or video. 

It is possible; we have further work to do to conclude this, though.
I forgot to mention that I have pmc and kgmon profiling for the good and 
bad times. But I don't have enough knowledge to interpret them correctly, 
and I'm not sure if they can help.


Also, I now run nginx instead of lighttpd on one of the problematic 
servers. It seems to work much better - sometimes there are peaks in 
disk load, but the disk does not become very slow and network output does 
not change. The difference with nginx is that it runs in multiple 
processes, while lighttpd by default has only one process. Now I have 
configured lighttpd on another server to run multiple workers. I'll see 
if it helps.


What else can I try?

With best regards,
Alexey Popov


Re: amrd disk performance drop after running under high load

2007-10-17 Thread Kris Kennaway

Kris Kennaway wrote:


What else can I try?


Still waiting on the vmstat -z output.


Also can you please obtain vmstat -i, netstat -m and 10 seconds of 
representative vmstat -w output when the problem is and is not occurring?
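
A sketch of gathering all of that in one shot, to be run once during a
bad period and once during a good one; the output path is arbitrary:

  # bundle the requested statistics into one timestamped file
  ( date; vmstat -z; netstat -m; vmstat -i; vmstat -w 1 -c 10 ) \
      > /var/tmp/stats.`date +%Y%m%d-%H%M%S` 2>&1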


Kris



Re: amrd disk performance drop after running under high load

2007-10-17 Thread Alexey Popov

Hi

Kris Kennaway wrote:
And a few hours ago I received feedback from Andrzej Tobola; he has 
the same problem on FreeBSD 7 with a Promise ATA software mirror:
Well, he didn't provide any evidence yet that it is the same problem, 
so let's not become confused by feelings :)
I think he is talking about 100% disk busy while processing ~5 
transfers/sec.


% busy as reported by gstat doesn't mean what you think it does.  What 
is the I/O response time?  That's the meaningful statistic for 
evaluating I/O load.  Also, you didn't post about this.
At the problematic times the disk felt very slow, all processes 
were in a disk-read state, and vmstat confirmed it with the % numbers.


So I can conclude that FreeBSD has a long-standing bug in the VM that 
can be triggered when serving a large amount of static data (much 
bigger than memory size) at high rates. Possibly this only applies 
to large files like mp3 or video. 

It is possible; we have further work to do to conclude this, though.
I forgot to mention that I have pmc and kgmon profiling for the good and 
bad times. But I don't have enough knowledge to interpret them correctly, 
and I'm not sure if they can help.

pmc would be useful.
Unfortunately I've lost the pmc profiling results. I'll try to collect 
them again later. See the vmstat output in the attachment (vmstat -z; 
netstat -m; vmstat -i; vmstat -w 1 | head -11).
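
For what it's worth, a sketch of re-collecting the pmc data with
pmcstat(8), assuming the hwpmc(4) driver is available and using the
generic "instructions" event alias from pmc(3):

  kldload hwpmc                               # if not compiled into the kernel
  pmcstat -S instructions -O /tmp/samples.pmc # sample system-wide; ^C to stop
  pmcstat -R /tmp/samples.pmc -g              # write gprof(1)-format profiles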


Also you can see kgmon profiling results at: http://83.167.98.162/gprof/

With best regards,
Alexey Popov

ITEM SIZE LIMIT  USED  FREE  REQUESTS  FAILURES

UMA Kegs: 240,0,   71,4,   71,0
UMA Zones:376,0,   71,9,   71,0
UMA Slabs:128,0, 1011,   62,   243081,0
UMA RCntSlabs:128,0,  361, 1205,   363320,0
UMA Hash: 256,0,4,   11,7,0
16 Bucket:152,0,   45,   30,   72,0
32 Bucket:280,0,   25,   45,   69,0
64 Bucket:536,0,   17,   25,   55,   53
128 Bucket:  1048,0,  287,   88, 1200,95423
VM OBJECT:224,0, 5536,23228,  7675004,0
MAP:  352,0,7,   15,7,0
KMAP ENTRY:   112,90222,  283, 1037,  1207524,0
MAP ENTRY:112,0, 1396,  419, 72221561,0
PV ENTRY:  48,  2244600,17835,30261, 768591673,0
DP fakepg:120,0,0,   31,   10,0
mt_zone: 1024,0,  170,6,  170,0
16:16,0, 3578, 2470, 745206870,0
32:32,0, 1273,  343,  1750850,0
64:64,0, 6147, 1693, 487691440,0
128:  128,0, 4659,  387,  1464251,0
256:  256,0,  596, 2539,  7208469,0
512:  512,0,  608,  253,   791295,0
1024:1024,0,   49,  239,82867,0
2048:2048,0,   27,  295,   115362,0
4096:4096,0,  240,  278,   564659,0
Files:120,0,  544,  324, 263880246,0
TURNSTILE:104,0,  181,   83,  307,0
PROC: 856,0,   82,   82,   308409,0
THREAD:   608,0,  169,   11,24468,0
KSEGRP:   136,0,  165,   69,  165,0
UPCALL:88,0,3,   73,3,0
SLEEPQUEUE:64,0,  181,   99,  307,0
VMSPACE:  544,0,   35,   77,   310929,0
mbuf_packet:  256,0,  368,  115, 1331807039,0
mbuf: 256,0, 2016, 2331, 5433003167,0
mbuf_cluster:2048,32768,  483,  239, 1236143964,0
mbuf_jumbo_pagesize: 4096,0,0,0,0,0
mbuf_jumbo_9k:   9216,0,0,0,0,0
mbuf_jumbo_16k: 16384,0,0,0,0,0
ACL UMA zone: 388,0,0,0,0,0
g_bio:216,0,4,  410, 48175991,0
ata_request:  336,0,0,   22,   24,0
ata_composite:376,0,0,0,0,0
VNODE:

Re: amrd disk performance drop after running under high load

2007-10-16 Thread Krassimir Slavchev

Alexey Popov wrote:
 Hi.
 
 Kris Kennaway wrote:
 
 After some time running under high load, disk performance becomes
 extremely poor. During those periods 'systat -vm 1' shows something like 
 this:
 What does high load mean?  You need to explain the system workload
 more.
 This web service is similar to YouTube. This server is a video store. I
 have around 200G of *.flv (flash video) files on the server.
 
 I run lighttpd as a web server. Disk load is usually around 50%, network
 output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle.
 
 As you can see it is a trivial service - sending files to network via HTTP.
 

 Disks amrd0
 KB/t  85.39
 tps   5
 MB/s   0.38
 % busy   99

 Apart from all that, I tried mutex profiling, and here are the results
 (sorted by the total number of acquisitions):

 Bad case:

  102 223514 273977 0 14689 1651568 /usr/src/sys/vm/uma_core.c:2349 (512)
  950 263099 273968 0 15004 14427 /usr/src/sys/vm/uma_core.c:2450 (512)
  108 150422 175840 0 10978 22988519 /usr/src/sys/vm/uma_core.c:1888
 (mbuf)

   Here you can see that high UMA activity happens in periods of low disk
   performance. But I'm not sure whether this is the root of the problem
   or a consequence.

 The extremely high contention there does seem to say you have an mbuf
 starvation problem and not a disk problem.  I don't know why this
 would be happening off-hand.
 But there's no mbuf shortage in `netstat -m`.
 
 What else can I try to track down the source of the problem?
 
 Can you also provide more details about the system hardware and
 configuration?
 This is a Dell 2850, 2 x Xeon 3.2GHz, 4GB RAM, 6x300GB SCSI RAID5. I'll
 attach details.
 
 With best regards,
 Alexey Popov
 
 
 
 


last pid: 11008;  load averages:  0.07,  0.10,  0.08  up 47+08:32:50
11:46:15
38 processes:  1 running, 37 sleeping

Mem: 46M Active, 3443M Inact, 246M Wired, 144M Cache, 208M Buf, 5596K Free
Swap: 2048M Total, 4K Used, 2048M Free


  PID USERNAME   THR PRI NICE   SIZERES STATE  C   TIME   WCPU COMMAND
56386 root 1   40 19856K 1K kqread 1 115:19  2.88% lighttpd
  636 root 1  960 18292K  4212K select 0  25:39  0.00% snmpd
  784 root 1  960 19668K  2072K select 1   2:31  0.00% sshd
  680 root 1  960  7732K  1384K select 0   1:59  0.00% ntpd
 1540 root 1  960 35092K  6496K select 0   1:30  0.00% httpd
  769 root 4  200 14148K  2632K kserel 0   1:04  0.00% bacula-fd
  755 root 1  960  3852K  1060K select 1   0:22  0.00% master
  568 root 1  960  3648K   908K select 0   0:18  0.00% syslogd
80663 root 1   80  3688K  1016K nanslp 1   0:05  0.00% cron
  760 postfix  1  960  3944K  1160K select 0   0:04  0.00% qmgr
89776 www  1  200 35180K  6684K lockf  0   0:04  0.00% httpd
89763 www  1  200 35180K  6684K lockf  0   0:04  0.00% httpd
89774 www  1  200 35180K  6684K lockf  0   0:04  0.00% httpd
89775 www  1  960 35180K  6684K select 0   0:04  0.00% httpd
  699 root 1  200  7732K  1388K pause  0   0:03  0.00% ntpd
  484 root 1   40   652K   220K select 0   0:00  0.00% devd
10904 llp  1  960 30616K  3564K select 0   0:00  0.00% sshd
10915 root 1  200  3912K  2340K pause  1   0:00  0.00% csh


You run Apache with mod_perl or PHP too. How many clients does this
Apache server handle? Also, under this light load you have locked files! Check
script execution times (/server-status may be useful).
When you have high load, check swap usage and how many processes are in
the lockf state.

Best Regards




Re: amrd disk performance drop after running under high load

2007-10-16 Thread Kris Kennaway

Alexey Popov wrote:

Hi.

Kris Kennaway wrote:

After some time running under high load, disk performance becomes 
extremely poor. During those periods 'systat -vm 1' shows something like
this:
What does high load mean?  You need to explain the system workload 
more.

This web service is similar to YouTube. This server is a video store. I
have around 200G of *.flv (flash video) files on the server.

I run lighttpd as a web server. Disk load is usually around 50%, network
output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle.

As you can see it is a trivial service - sending files to network via HTTP.


A couple of comments.

Does lighttpd actually use HTTP accept filters?

Are you using ipfilter and ipfw?  You are paying a performance penalty 
for having them.


You might try increasing BUCKET_MAX in sys/vm/uma_core.c.  I don't 
really understand the code here, but you seem to be hitting a threshold 
behaviour where you are constantly running out of space in the per CPU 
caches.
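
A sketch of that experiment; the default value of BUCKET_MAX is
version-dependent, so check the source first (MYKERNEL is a placeholder
kernel config name):

  # find the constant, raise it (e.g. double it) in an editor, then rebuild
  grep -n 'define BUCKET_MAX' /usr/src/sys/vm/uma_core.c
  cd /usr/src
  make buildkernel KERNCONF=MYKERNEL && make installkernel KERNCONF=MYKERNEL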


This can happen if your workload is unbalanced between the CPUs and you 
are always allocating on one but freeing on another, but I wouldn't 
expect that to happen with your workload.  Maybe it can also happen if 
your turnover is high enough.  What does vmstat -z show during the good 
and bad times?


Kris




Re: amrd disk performance drop after running under high load

2007-10-16 Thread Alexey Popov

Hi.

Krassimir Slavchev wrote:


You run Apache with mod_perl or PHP too. How many clients does this
Apache server handle? Also, under this light load you have locked files! Check
script execution times (/server-status may be useful).
When you have high load, check swap usage and how many processes are in
the lockf state.
Apache is not used much here; it is just for content management of a sort. 
It is not exposed to external users.


With best regards,
Alexey Popov


Re: amrd disk performance drop after running under high load

2007-10-16 Thread Alexey Popov

Hi.

Kris Kennaway wrote:

After some time running under high load, disk performance becomes 
extremely poor. During those periods 'systat -vm 1' shows something like
this:
What does high load mean?  You need to explain the system workload 
more.

This web service is similar to YouTube. This server is a video store. I
have around 200G of *.flv (flash video) files on the server.

I run lighttpd as a web server. Disk load is usually around 50%, network
output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle.

As you can see, it is a trivial service - sending files to the network 
via HTTP.

Does lighttpd actually use HTTP accept filters?
I don't know how to make sure, but it seems to issue the appropriate 
setsockopt (truss output):


setsockopt(0x4,0x,0x1000,0x7fffe620,0x100) = 0 (0x0)
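
One way to double-check is to confirm the accept filter module is
actually present, since that setsockopt (SO_ACCEPTFILTER) only succeeds
if the filter is available:

  # the HTTP accept filter must be loaded, or built into the kernel
  # with 'options ACCEPT_FILTER_HTTP', for lighttpd to be able to use it
  kldstat | grep accf_http || kldload accf_http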

Are you using ipfilter and ipfw?  You are paying a performance penalty 
for having them.
I'm using ipfw and one of the first rules is to pass all established TCP. 
ipfilter is not used on this server, but it is present in the kernel 
because it may be used on other servers. I have 95% CPU idle, so I 
don't think the packet filters produce significant load on this server.
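
For illustration, such an early rule might look like this; the rule
number is arbitrary:

  # fast-path: pass packets of already-established TCP connections
  # before any more expensive rules are evaluated
  ipfw add 100 allow tcp from any to any established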


You might try increasing BUCKET_MAX in sys/vm/uma_core.c.  I don't 
really understand the code here, but you seem to be hitting a threshold 
behaviour where you are constantly running out of space in the per CPU 
caches.

Thanks, I'll try this.

This can happen if your workload is unbalanced between the CPUs and you 
are always allocating on one but freeing on another, but I wouldn't 
expect that to happen with your workload.  Maybe it can also happen if 
your turnover is high enough.  
This is very unlikely, because I have 5 other video storage servers with 
the same hardware and software configuration and they are fine.


On the other hand, all the other servers were put into production before or 
after the problematic ones and were filled with content in other ways, 
so they could have a slightly different load pattern.


In total I have faced this bug three times:

1. The first time it was, AFAIR, 5.4-RELEASE on a Dell 2850 with the same 
configuration as now. It was an mp3 store and I used thttpd as the HTTP 
server to serve mp3s. That time the problems were less frequent, but it 
also took too long to get back to normal operation, so we had to reboot 
the servers once a week or so.


The problems began when we moved to the new hardware - Dell 2850. At the 
time we suspected the amrd driver and had no time to dig in, because all 
the servers of the project were problematic. Installing Linux helped.


2. The second time it was a server for the static files of a very popular 
blog. The HTTP server was nginx and the disk contained pictures, mp3s and 
videos. It was a Dell 1850 with a 2x146GB SCSI mirror. Linux also solved 
the problem.


3. The problem we see now.

At first glance one could say the problem is in Dell's x850 series or 
amr(4), but we run this hardware on many other projects and they work 
well. Linux also works on them.


And a few hours ago I received feedback from Andrzej Tobola; he has the 
same problem on FreeBSD 7 with a Promise ATA software mirror:


===
Subject: Re: amrd disk performance drop after running under high load
Date: Tue, 16 Oct 2007 10:59:34 +0200
From: Andrzej Tobola [EMAIL PROTECTED]
To: Alexey Popov [EMAIL PROTECTED]

skip

Exactly the same here, but on a big ATA RAID0 with heavy traffic (~10GB/24h):

amper% df -h /ftp/priv
FilesystemSizeUsed   Avail Capacity  Mounted one
/dev/ar0a 744G679G4.7G99%/ftp/priv

amper% grep ^ar /var/run/dmesg.boot
ar0: 763108MB Promise Fasttrak RAID0 (stripe 64 KB) status: READY
ar0: disk0 READY using ad6 at ata3-master
ar0: disk1 READY using ad4 at ata2-master

amper% uname -a
FreeBSD xxx 7.0-CURRENT-200709 FreeBSD
7.0-CURRENT-200709 #0: Tue Sep 11 04:44:48 UTC 2007 
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  i386


I reboot when I reach this state (approx. once a week).
It is an old bug - a few months old ;)

cheers,
-a

===

So I can conclude that FreeBSD has a long-standing bug in the VM that can 
be triggered when serving a large amount of static data (much bigger than 
memory size) at high rates. Possibly this only applies to large files 
like mp3 or video.



What does vmstat -z show during the good and bad times?

I'll send this data the next time the bad times happen.

With best regards,
Alexey Popov


Re: amrd disk performance drop after running under high load

2007-10-16 Thread Thomas Hurst
* Alexey Popov ([EMAIL PROTECTED]) wrote:

 So I can conclude that FreeBSD has a long-standing bug in the VM that
 can be triggered when serving a large amount of static data (much
 bigger than memory size) at high rates. Possibly this only applies to
 large files like mp3 or video.

I've seen highly dubious VM behavior when reading large files locally;
the system ends up swapping out a small but significant amount of
various processes, even very small recently active ones like syslogd,
for no apparent reason:

  http://lists.freebsd.org/pipermail/freebsd-stable/2007-September/036956.html

I've also seen dubious IO behavior from amr(4), where access to one
array will interfere with IO from an independent set of spindles that
just happen to be attached to the same card:

  http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/114438

Given the blank looks I've received every time I've mentioned these
things, I'm guessing they aren't seen by others all that often, but maybe
one or both are vaguely relevant to your situation.

-- 
Thomas 'Freaky' Hurst
http://hur.st/


Re: amrd disk performance drop after running under high load

2007-10-16 Thread Kris Kennaway

Alexey Popov wrote:

Hi.

Kris Kennaway wrote:

After some time running under high load, disk performance becomes 
extremely poor. During those periods 'systat -vm 1' shows something like
this:
What does high load mean?  You need to explain the system workload 
more.

This web service is similar to YouTube. This server is a video store. I
have around 200G of *.flv (flash video) files on the server.

I run lighttpd as a web server. Disk load is usually around 50%, network
output 100Mbit/s, 100 simultaneous connections. CPU is mostly idle.

As you can see, it is a trivial service - sending files to the network 
via HTTP.

Does lighttpd actually use HTTP accept filters?
I don't know how to make sure, but it seems to issue the appropriate 
setsockopt (truss output):


setsockopt(0x4,0x,0x1000,0x7fffe620,0x100) = 0 (0x0)


OK.

Are you using ipfilter and ipfw?  You are paying a performance penalty 
for having them.
I'm using ipfw and one of the first rules is to pass all established TCP. 
ipfilter is not used on this server, but it is present in the kernel 
because it may be used on other servers. I have 95% CPU idle, so I 
don't think the packet filters produce significant load on this server.


Well, it was not your most serious issue, but from your profiling trace 
it is definitely burning cycles with every packet processed.


You might try increasing BUCKET_MAX in sys/vm/uma_core.c.  I don't 
really understand the code here, but you seem to be hitting a 
threshold behaviour where you are constantly running out of space in 
the per CPU caches.

Thanks, I'll try this.

This can happen if your workload is unbalanced between the CPUs and 
you are always allocating on one but freeing on another, but I 
wouldn't expect that to happen with your workload.  Maybe it can also 
happen if your turnover is high enough.  
This is very unlikely, because I have 5 other video storage servers with 
the same hardware and software configuration and they are fine.


Clearly something is different about them, though.  If you can 
characterize exactly what that is then it will help.


On the other hand, all the other servers were put into production before or 
after the problematic ones and were filled with content in other ways, 
so they could have a slightly different load pattern.


In total I have faced this bug three times:

1. The first time it was, AFAIR, 5.4-RELEASE on a Dell 2850 with the same 
configuration as now. It was an mp3 store and I used thttpd as the HTTP 
server to serve mp3s. That time the problems were less frequent, but it 
also took too long to get back to normal operation, so we had to reboot 
the servers once a week or so.


The problems began when we moved to the new hardware - Dell 2850. At the 
time we suspected the amrd driver and had no time to dig in, because all 
the servers of the project were problematic. Installing Linux helped.


2. The second time it was a server for the static files of a very popular 
blog. The HTTP server was nginx and the disk contained pictures, mp3s and 
videos. It was a Dell 1850 with a 2x146GB SCSI mirror. Linux also solved 
the problem.


3. The problem we see now.

At first glance one could say the problem is in Dell's x850 series or 
amr(4), but we run this hardware on many other projects and they work 
well. Linux also works on them.


OK but there is no evidence in what you posted so far that amr is 
involved in any way.  There is convincing evidence that it is the mbuf 
issue.


And a few hours ago I received feedback from Andrzej Tobola; he has the 
same problem on FreeBSD 7 with a Promise ATA software mirror:


Well, he didn't provide any evidence yet that it is the same problem, so 
let's not become confused by feelings :)


So I can conclude that FreeBSD has a long-standing bug in the VM that can 
be triggered when serving a large amount of static data (much bigger than 
memory size) at high rates. Possibly this only applies to large files 
like mp3 or video.


It is possible; we have further work to do to conclude this, though.

Kris



Re: amrd disk performance drop after running under high load

2007-10-15 Thread Kris Kennaway

Alexey Popov wrote:

After some time running under high load, disk performance becomes 
extremely poor. During those periods 'systat -vm 1' shows something like this:


What does high load mean?  You need to explain the system workload more.


Disks amrd0
KB/t  85.39
tps   5
MB/s   0.38
% busy   99


Apart from all that, I tried mutex profiling, and here are the results 
(sorted by the total number of acquisitions):


Bad case:

 102 223514 273977 0 14689 1651568 /usr/src/sys/vm/uma_core.c:2349 (512)
 950 263099 273968 0 15004 14427 /usr/src/sys/vm/uma_core.c:2450 (512)
 108 150422 175840 0 10978 22988519 /usr/src/sys/vm/uma_core.c:1888 (mbuf)


 Here you can see that high UMA activity happens in periods of low disk
 performance. But I'm not sure whether this is the root of the problem
 or a consequence.
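
For reference, a sketch of how such a profile can be gathered on a
kernel built with 'options MUTEX_PROFILING'; the debug.mutex.prof.*
sysctl names come with that option, and the sort column is chosen just
for illustration:

  sysctl debug.mutex.prof.reset=1     # clear old counters
  sysctl debug.mutex.prof.enable=1    # profile while the workload runs
  sleep 60
  sysctl debug.mutex.prof.enable=0
  sysctl debug.mutex.prof.stats | sort -k2 -rn | head -20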

The extremely high contention there does seem to say you have an mbuf 
starvation problem and not a disk problem.  I don't know why this would 
be happening off-hand.


Can you also provide more details about the system hardware and 
configuration?


Kris


