Re: MBuf clusters - what uses them?

2014-04-07 Thread Manuel Bouyer
On Sat, Apr 05, 2014 at 08:10:41AM -0700, Paul Goyette wrote:
 I still have occasional lock-ups on my system which appear to be a result
 of MBuf Cluster exhaustion.  Just looking at netstat output, it shows that
 the network is using about 554.  Yet vmstat -m shows more than 1000 in use.
 
 # netstat -m
 553 mbufs in use:
 541 mbufs allocated to data
 11 mbufs allocated to packet headers
 1 mbufs allocated to socket names and addresses
 98 calls to protocol drain routines
 # vmstat -mW | grep '^[MNm]'
 Memory resource pool statistics
 Name      Size  Requests Fail  Releases   InUse  Avail  Pgreq  Pgrel  Npage PageSz  Hiwat Minpg  Maxpg Idle Flags   Util
 mbpl       512     32661    0     31465    1196     44    391    236    155   4096    287     2    inf    3 0x000  96.5%
 mclpl     2048     27746    0     26731    1015      9   1644   1132    512   4096    865     4 524274    4 0x000  99.1%
 mutex       64   4642962    0   2951894 1691068 604715  36471     30  36441   4096  36471     0    inf    1 0x040  72.5%
 #
 
 
 So, what is using the additional ~450 MBuf Clusters?

I have also found netstat -m to be not very reliable here.

I'm tracking down an mbuf leak (in netbsd-6), and so far I have found
that IFQ_ENQUEUE() can in some cases be called without the KERNEL_LOCK held,
while the queue is only protected by an splfoo() call.
One such path is from ipfilter, and one from carp(4).
Are you using either of these?
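
Roughly, the failing case is an enqueue done with only spl protection; the
fix is to also take the big kernel lock around it.  A minimal sketch, not
the actual ipfilter or carp(4) code (the IFQ_ENQUEUE() argument order below
is the netbsd-6 one and, like the helper name, is only an assumption):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <net/if.h>

/*
 * Sketch only: ifp->if_snd is protected just by spl, so an enqueue
 * done without the big kernel lock can race the rest of the stack
 * and leak or corrupt mbufs.
 */
static int
example_enqueue(struct ifnet *ifp, struct mbuf *m)
{
	int s, error;

	KERNEL_LOCK(1, NULL);	/* this is what the leaking paths are missing */
	s = splnet();
	IFQ_ENQUEUE(&ifp->if_snd, m, NULL, error);
	splx(s);
	KERNEL_UNLOCK_ONE(NULL);

	return error;
}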

You could also build a kernel with
options MBUFTRACE
and look at netstat -mssv (this one seems reliable).
This helped me spot these problems.
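
For reference, enabling it is just adding the option to your kernel config
and rebuilding (the config file name below is only an example), then
looking at the per-owner output:

# in e.g. sys/arch/amd64/conf/MYKERNEL
options 	MBUFTRACE

# after booting the new kernel
netstat -mssv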

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 years of experience will always make the difference
--


Re: MBuf clusters - what uses them?

2014-04-06 Thread Paul Goyette

On Sat, 5 Apr 2014, Greg Troxel wrote:


snip

I see no fails counted.  Why do you think you are out of clusters?  Are
you seeing that in dmesg?  Or is it just a possible lockup explanation?


The mbuf/mbuf-cluster explanation was offered when I first reported
this several months ago.



Please describe the lockup symptoms more precisely.


The most obvious symptom is a sudden loss of network connectivity.  A ping to
another host on the local network fails with a "No buffer space available" error.



Also, look in vmstat -m for anything with fail != 0.


No failures ever appear.

However, I have tracked mbuf usage via netstat and vmstat, and shortly 
before the lockup, both numbers showed a sudden increase in utilization.



you might also save vmstat -m to a file every 5 minutes, and look
before/after the next lockup.


Yeah, I was doing this every minute...


Someone at that time suggested that BitTorrent could have been doing 
something nasty, so I stopped my Transmission server.  The frequency 
of lockups has dropped dramatically, but not to zero.



Another symptom is with postfix...  It receives incoming mail from the 
network, but fails to forward the mail through my local dspam - mailq 
shows lots of messages in the deferred state due to "Resource temporarily 
unavailable" errors.  (As near as I can tell, postfix uses unix-family 
sockets for this...)






-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------


MBuf clusters - what uses them?

2014-04-05 Thread Paul Goyette
I still have occasional lock-ups on my system which appear to be a 
result of MBuf Cluster exhaustion.  Just looking at netstat output, it 
shows that the network is using about 554.  Yet vmstat -m shows more 
than 1000 in use.


# netstat -m
553 mbufs in use:
541 mbufs allocated to data
11 mbufs allocated to packet headers
1 mbufs allocated to socket names and addresses
98 calls to protocol drain routines
# vmstat -mW | grep '^[MNm]'
Memory resource pool statistics
Name      Size  Requests Fail  Releases   InUse  Avail  Pgreq  Pgrel  Npage PageSz  Hiwat Minpg  Maxpg Idle Flags   Util
mbpl       512     32661    0     31465    1196     44    391    236    155   4096    287     2    inf    3 0x000  96.5%
mclpl     2048     27746    0     26731    1015      9   1644   1132    512   4096    865     4 524274    4 0x000  99.1%
mutex       64   4642962    0   2951894 1691068 604715  36471     30  36441   4096  36471     0    inf    1 0x040  72.5%
#


So, what is using the additional ~450 MBuf Clusters?


-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------


Re: MBuf clusters - what uses them?

2014-04-05 Thread Greg Troxel

Paul Goyette p...@whooppee.com writes:

 # netstat -m
 553 mbufs in use:
 541 mbufs allocated to data
 11 mbufs allocated to packet headers
 1 mbufs allocated to socket names and addresses
 98 calls to protocol drain routines
 # vmstat -mW | grep '^[MNm]'
 Memory resource pool statistics
 Name      Size  Requests Fail  Releases   InUse  Avail  Pgreq  Pgrel  Npage PageSz  Hiwat Minpg  Maxpg Idle Flags   Util
 mbpl       512     32661    0     31465    1196     44    391    236    155   4096    287     2    inf    3 0x000  96.5%
 mclpl     2048     27746    0     26731    1015      9   1644   1132    512   4096    865     4 524274    4 0x000  99.1%
 mutex       64   4642962    0   2951894 1691068 604715  36471     30  36441   4096  36471     0    inf    1 0x040  72.5%

An mbuf and an mbuf cluster are not the same thing.  I thought mbufs were
256 bytes, but here they seem to be 512 (they are 256 on my netbsd-6/i386
box).  Either way, each mbuf has a header and can hold some data.

Mbuf clusters are 2048 bytes.  They hold data only, and are attached to
a regular mbuf so that the 2 KB of cluster space is used for data instead
of the small amount left in the 512-byte mbuf after its header.

Clusters are used when data does not fit in a regular mbuf.  Many
Ethernet interfaces pre-allocate clusters and stage them in the receive
ring so that the interface can simply DMA the data into them.  So having
a bunch of clusters in use is pretty normal.  I'm a little fuzzy, but I
think some drivers have clusters preallocated and then get mbufs
themselves to attach to them when processing the receive interrupt.
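
The usual allocation pattern looks roughly like this (a minimal sketch
using the classic mbuf macros, not any particular driver):

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Sketch of the classic receive-buffer setup: get an mbuf, then try
 * to attach a 2 KB cluster so the packet is DMAed into the cluster
 * rather than into the small internal data area of the mbuf itself.
 */
static struct mbuf *
example_rxbuf(void)
{
	struct mbuf *m;

	MGETHDR(m, M_DONTWAIT, MT_DATA);
	if (m == NULL)
		return NULL;

	MCLGET(m, M_DONTWAIT);
	if ((m->m_flags & M_EXT) == 0) {
		/* no cluster available; give the mbuf back */
		m_freem(m);
		return NULL;
	}

	/* m_data now points at the 2048-byte cluster */
	return m;
}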

I see no fails counted.  Why do you think you are out of clusters?  Are
you seeing that in dmesg?  Or is it just a possible lockup explanation?

Please describe the lockup symptoms more precisely.

Also, look in vmstat -m for anything with fail != 0.

you might also save vmstat -m to a file every 5 minutes, and look
before/after the next lockup.
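
A root crontab entry along these lines would do it (the log path is just
an example):

*/5 * * * * { date; /usr/bin/vmstat -m; } >> /var/log/vmstat-m.log 2>&1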



