Re: MBuf clusters - what uses them?
On Sat, Apr 05, 2014 at 08:10:41AM -0700, Paul Goyette wrote:
> I still have occasional lock-ups on my system which appear to be a
> result of MBuf Cluster exhaustion. Just looking at netstat output, it
> shows that the network is using about 554. Yet vmstat -m shows more
> than 1000 in use.
>
> # netstat -m
> 553 mbufs in use:
>         541 mbufs allocated to data
>         11 mbufs allocated to packet headers
>         1 mbufs allocated to socket names and addresses
> 98 calls to protocol drain routines
> # vmstat -mW | grep '^[MNm]'
> Memory resource pool statistics
> Name     Size Requests Fail Releases   InUse  Avail Pgreq Pgrel Npage PageSz Hiwat Minpg  Maxpg Idle Flags  Util
> mbpl      512    32661    0    31465    1196     44   391   236   155   4096   287     2    inf    3 0x000 96.5%
> mclpl    2048    27746    0    26731    1015      9  1644  1132   512   4096   865     4 524274    4 0x000 99.1%
> mutex      64  4642962    0  2951894 1691068 604715 36471    30 36441   4096 36471     0    inf    1 0x040 72.5%
> #
>
> So, what is using the additional ~450 MBuf Clusters?

I found netstat -m to not be so reliable here as well. I'm tracking down
an mbuf leak (in netbsd-6), and so far I have found that IFQ_ENQUEUE()
could eventually be called without KERNEL_LOCK held, while the queue is
only protected by a splfoo() call. One such path is from ipfilter, and
one from carp(4). Are you using one of these?

You could also build a kernel with 'options MBUFTRACE' and look at
netstat -mssv (this one seems reliable). This helped me spot these
problems.

--
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--
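The MBUFTRACE suggestion above amounts to one kernel config option plus a
netstat query after rebooting into the new kernel. A minimal sketch of the
steps (the option and the netstat flags are as given in the thread; the
comment wording is ours):

```
# In the kernel configuration file, enable per-owner mbuf accounting:
options  MBUFTRACE

# After building, installing, and booting the new kernel, query the
# per-owner statistics (the flags suggested in the reply above):
netstat -mssv
```

With MBUFTRACE enabled, each mbuf is tagged with its owner, so the output
can attribute clusters to specific drivers and subsystems rather than the
aggregate counts shown by a plain netstat -m.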
Re: MBuf clusters - what uses them?
On Sat, 5 Apr 2014, Greg Troxel wrote:

> [snip]
>
> I see no fails counted. Why do you think you are out of clusters? Are
> you seeing that in dmesg? Or is it just a possible lockup explanation?

The mbuf/mbuf-cluster explanation was offered when I first reported this
several months ago.

> Please describe the lockup symptoms more precisely.

The most obvious symptom is a sudden lack of network connectivity. A
ping to another host on the local network fails with a "no buffer space"
error.

> Also, look in vmstat -m for anything with fail != 0.

No failures ever appear. However, I have tracked mbuf usage via netstat
and vmstat, and shortly before the lockup, both numbers showed a sudden
increase in utilization.

> you might also save vmstat -m to a file every 5 minutes, and look
> before/after the next lockup.

Yeah, I was doing this every 1 minute...

Someone at that time suggested that bit-torrent could have been doing
something nasty, so I stopped my transmission server. The frequency of
lockup has dropped dramatically, but not to zero.

Another symptom is with postfix... It receives incoming mail from the
network, but fails to forward the mail through my local dspam - mailq
shows lots of messages in the deferred state due to "resources
temporarily unavailable" errors. (As near as I can tell, postfix uses
unix-family sockets for this...)

-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------
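The before/after comparison discussed above (saving vmstat -m periodically
and diffing around a lockup) can be scripted. A sketch, assuming snapshots
of `vmstat -mW` output were saved to files; the function name and file
names are made up, and the InUse column is taken to be field 6 as in the
wide-format output shown in this thread:

```shell
# pool_growth BEFORE AFTER: print every pool whose InUse column ($6 in
# the vmstat -mW wide format) grew between two saved snapshots.
pool_growth() {
    awk 'NR==FNR { before[$1] = $6; next }
         ($1 in before) && ($6 > before[$1]) {
             printf "%s: %d -> %d\n", $1, before[$1], $6
         }' "$1" "$2"
}
```

For example, `pool_growth vmstat-1200.txt vmstat-1205.txt` would flag a
pool such as mclpl whose in-use count jumped in the interval just before
a lockup, without having to eyeball the full tables.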
MBuf clusters - what uses them?
I still have occasional lock-ups on my system which appear to be a
result of MBuf Cluster exhaustion. Just looking at netstat output, it
shows that the network is using about 554. Yet vmstat -m shows more than
1000 in use.

# netstat -m
553 mbufs in use:
        541 mbufs allocated to data
        11 mbufs allocated to packet headers
        1 mbufs allocated to socket names and addresses
98 calls to protocol drain routines
# vmstat -mW | grep '^[MNm]'
Memory resource pool statistics
Name     Size Requests Fail Releases   InUse  Avail Pgreq Pgrel Npage PageSz Hiwat Minpg  Maxpg Idle Flags  Util
mbpl      512    32661    0    31465    1196     44   391   236   155   4096   287     2    inf    3 0x000 96.5%
mclpl    2048    27746    0    26731    1015      9  1644  1132   512   4096   865     4 524274    4 0x000 99.1%
mutex      64  4642962    0  2951894 1691068 604715 36471    30 36441   4096 36471     0    inf    1 0x040 72.5%
#

So, what is using the additional ~450 MBuf Clusters?

-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------
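One quick cross-check on the vmstat numbers above: for a pool with no
recorded failures, InUse should equal Requests minus Releases. A sketch of
that check (the helper name is made up; the column positions and field
values are reconstructed from the run-together wide-format output above):

```shell
# check_pool "LINE": given one pool line from "vmstat -mW" output,
# verify that InUse ($6) equals Requests ($3) minus Releases ($5).
check_pool() {
    echo "$1" | awk '{
        if ($3 - $5 == $6)
            printf "%s: InUse %d consistent\n", $1, $6
        else
            printf "%s: InUse %d, expected %d\n", $1, $6, $3 - $5
    }'
}

check_pool "mclpl 2048 27746 0 26731 1015 9 1644 1132 512 4096 865 4 524274 4 0x000 99.1%"
# prints: mclpl: InUse 1015 consistent
```

The pool's own accounting balances, so the ~450 clusters that netstat -m
does not itemize are held by consumers its per-protocol view does not
cover.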
Re: MBuf clusters - what uses them?
Paul Goyette <p...@whooppee.com> writes:

> # netstat -m
> 553 mbufs in use:
>         541 mbufs allocated to data
>         11 mbufs allocated to packet headers
>         1 mbufs allocated to socket names and addresses
> 98 calls to protocol drain routines
> # vmstat -mW | grep '^[MNm]'
> Memory resource pool statistics
> Name     Size Requests Fail Releases   InUse  Avail Pgreq Pgrel Npage PageSz Hiwat Minpg  Maxpg Idle Flags  Util
> mbpl      512    32661    0    31465    1196     44   391   236   155   4096   287     2    inf    3 0x000 96.5%
> mclpl    2048    27746    0    26731    1015      9  1644  1132   512   4096   865     4 524274    4 0x000 99.1%
> mutex      64  4642962    0  2951894 1691068 604715 36471    30 36441   4096 36471     0    inf    1 0x040 72.5%

mbuf and mbuf cluster are not the same thing. I thought mbufs were 256
bytes, but they seem to be 512 here. (They are 256 on my netbsd-6/i386
box.) Either way, they have a header and can hold some data.

mbuf clusters are 2048 bytes. These are for data only, and are attached
to a regular mbuf so that the 2K space is used instead of the
512-minus-header bytes. Clusters can be used when data does not fit in a
regular mbuf. Many ethernet interfaces pre-allocate clusters and stage
them in the receive ring buffer so that the interface can just DMA the
data. So having a bunch of clusters in use is pretty normal. I'm a
little fuzzy, but I think some drivers have clusters preallocated and
then get mbufs themselves to attach to them when processing the receive
interrupt.

I see no fails counted. Why do you think you are out of clusters? Are
you seeing that in dmesg? Or is it just a possible lockup explanation?

Please describe the lockup symptoms more precisely.

Also, look in vmstat -m for anything with fail != 0.

You might also save vmstat -m to a file every 5 minutes, and look
before/after the next lockup.
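Troxel's point that heavy cluster usage is normal can be checked against
the mclpl line itself: 2048-byte clusters pack two to a 4096-byte page, so
the pool's page count bounds the number of clusters. A quick arithmetic
sketch, using the Npage, PageSz, and InUse values from the vmstat output
quoted above:

```shell
# mclpl capacity: 512 pages * 4096 bytes/page / 2048 bytes/cluster
npage=512; pagesz=4096; clustersz=2048; inuse=1015
capacity=$((npage * pagesz / clustersz))
echo "capacity=$capacity in-use=$inuse util=$((100 * inuse / capacity))%"
# prints: capacity=1024 in-use=1015 util=99%
```

This matches the 99.1% utilization vmstat reports: the cluster pool really
is nearly full, even though netstat -m accounts for only 553 mbufs, which
is why the "what uses them?" question points at consumers netstat does not
break out.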