Re: How to track down a suspected memory leak?
Is anyone still getting crashes after patch 4 in 4.2?

On Dec 2, 2007 9:06 AM, Rolf Sommerhalder [EMAIL PROTECTED] wrote:
> On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:
>> Is this possibly the same memory leak mentioned below?
>> http://marc.info/?l=openbsd-misc&m=119572453509542&w=2
>
> Thanks for your pointer! Indeed, this patch/errata appears to have
> squashed the memory leak. A patched kernel has not lost any memory
> since Monday.
>
> Thanks again,
> Rolf
Re: How to track down a suspected memory leak?
No problems here, I patched around 8 machines and they all stopped
freezing up.

Der Engel wrote:
> Is anyone still getting crashes after patch 4 in 4.2?
>
> On Dec 2, 2007 9:06 AM, Rolf Sommerhalder [EMAIL PROTECTED] wrote:
>> On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:
>>> Is this possibly the same memory leak mentioned below?
>>> http://marc.info/?l=openbsd-misc&m=119572453509542&w=2
>>
>> Thanks for your pointer! Indeed, this patch/errata appears to have
>> squashed the memory leak. A patched kernel has not lost any memory
>> since Monday.
>>
>> Thanks again,
>> Rolf
Re: How to track down a suspected memory leak?
On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:
> Is this possibly the same memory leak mentioned below?
> http://marc.info/?l=openbsd-misc&m=119572453509542&w=2

Thanks for your pointer! Indeed, this patch/errata appears to have
squashed the memory leak. A patched kernel has not lost any memory
since Monday.

Thanks again,
Rolf
Re: How to track down a suspected memory leak?
* Josh [EMAIL PROTECTED] [2007-11-26 21:57]:
> Henning Brauer wrote:
>>> Thanks David for this pointer. It may very well be the same issue.
>>> Even though the two bridged interfaces are em(4) (1 Gb/s), the
>>> Out-of-Band Management (OOBM) interface is fxp(4), which carries two
>>> VLANs, one for pfsync(4) and one for command & control/monitoring.
>>
>> the leak had nothing to do with fxp. it's simply a generic memory leak
>> in a state insertion error path that single firewalls tend to trigger
>> seldom if at all, but pfsync regularly hits.
>>
>>> Still, I will give Henning's patch a try while waiting for results
>>> of the instrumentation with 'vmstat -m', as suggested by the previous
>>> responder.
>>
>> if you're running pfsync i make bets it is that. if you look at
>> vmstat -m and pfstatekeypl has more objects in use than pfstatepl you
>> know it is that.
>
> Yeah your patch thankfully does fix the problem. Just had another pair
> of 4.2 boxes fall over from the same bug this morning.
>
> Is it serious enough to put an errata note up?

assuming no ill effects from the fix show up, yes, soon.

--
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
Re: How to track down a suspected memory leak?
Henning Brauer wrote:
>> Thanks David for this pointer. It may very well be the same issue.
>> Even though the two bridged interfaces are em(4) (1 Gb/s), the
>> Out-of-Band Management (OOBM) interface is fxp(4), which carries two
>> VLANs, one for pfsync(4) and one for command & control/monitoring.
>
> the leak had nothing to do with fxp. it's simply a generic memory leak
> in a state insertion error path that single firewalls tend to trigger
> seldom if at all, but pfsync regularly hits.
>
>> Still, I will give Henning's patch a try while waiting for results
>> of the instrumentation with 'vmstat -m', as suggested by the previous
>> responder.
>
> if you're running pfsync i make bets it is that. if you look at
> vmstat -m and pfstatekeypl has more objects in use than pfstatepl you
> know it is that.

Yeah your patch thankfully does fix the problem. Just had another pair
of 4.2 boxes fall over from the same bug this morning.

Is it serious enough to put an errata note up?
Re: How to track down a suspected memory leak?
On Sun, Nov 25, 2007 at 08:03:11AM +0100, Rolf Sommerhalder wrote:
> Hello list,
>
> I am looking for suggestions on how to identify the source(s) of what
> appears to be a memory leak of approx. 10 MByte/day on a clustered
> pair of filtering bridges. These bridges are running an i386 -current
> snapshot from Nov 2nd. They form the outer, Internet-facing stage of a
> two-stage firewall in an enterprise setup.
> [...]

If I were you, I would collect a few vmstat -m outputs, probably using
cron, at a time when the machines are pretty much idle, then compare
each one with the previous ones and see what's growing. If you're
lucky, that gives you a pretty good indication of which subsystem the
memory leak is in. Then use the source :)

Tobias
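Tobias's approach can be sketched roughly as follows. This is a minimal sketch, not a tested tool: it assumes the pool table in the saved `vmstat -m` output starts with a header line beginning with `Name`, and that it carries either an `InUse` column or both `Requests` and `Releases` columns. The function names and the snapshot file names are hypothetical. It only reads saved text, so it can run on any machine with Python, not necessarily on the firewall itself.

```python
import sys


def parse_pool_in_use(vmstat_text):
    """Return {pool_name: objects_in_use} from the pool table of `vmstat -m` text.

    Assumption: the table header starts with "Name" and contains "Requests".
    """
    in_use = {}
    columns = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Name" and "Requests" in fields:
            columns = fields  # remember the column order from the header
            continue
        if columns is None or len(fields) != len(columns):
            continue  # not a table row (blank line, summary, other sections)
        row = dict(zip(columns, fields))
        try:
            if "InUse" in row:
                count = int(row["InUse"])
            else:
                count = int(row["Requests"]) - int(row["Releases"])
        except (KeyError, ValueError):
            continue  # layout differs from the assumption; skip the row
        in_use[row["Name"]] = count
    return in_use


def growing_pools(older_text, newer_text):
    """Return [(growth, pool_name), ...] for pools that grew, largest first."""
    old = parse_pool_in_use(older_text)
    new = parse_pool_in_use(newer_text)
    grown = [(count - old.get(name, 0), name) for name, count in new.items()]
    return sorted((item for item in grown if item[0] > 0), reverse=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 compare_vmstat.py older.txt newer.txt
    with open(sys.argv[1]) as older, open(sys.argv[2]) as newer:
        for growth, name in growing_pools(older.read(), newer.read()):
            print(f"{name}: +{growth} objects in use")
```

Each snapshot would be the output of `vmstat -m` redirected to a file at an idle time. A pool that keeps growing across several idle-time snapshots is the first candidate to look up in the source.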
Re: How to track down a suspected memory leak?
On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:
> Is this possibly the same memory leak mentioned below?
> http://marc.info/?l=openbsd-misc&m=119572453509542&w=2

Thanks David for this pointer. It may very well be the same issue.
Even though the two bridged interfaces are em(4) (1 Gb/s), the
Out-of-Band Management (OOBM) interface is fxp(4), which carries two
VLANs, one for pfsync(4) and one for command & control/monitoring.

Interestingly, I observe memory depletion at the same rate on both
nodes of these active-passive filtering bridge clusters (both the
sparc64 and i386), i.e. free memory on the passive bridge depletes at
the same rate as on the one that is active. This may hint that the
problem is with the fxp(4) rather than with the bridged em(4)s. Unless
it is somehow related to Rapid Spanning Tree (RSTP), which is running
on both the internal and external em(4)s on both the active and the
passive node.

Maybe it's worth mentioning that on the previous sparc64 platforms
(Sun Blade 100), where I first observed slow memory depletion, the
bridging was between two ports of a quad hme(4) NIC, and the OOBM was
on a third port of the same quad NIC.

Still, I will give Henning's patch a try while waiting for results of
the instrumentation with 'vmstat -m', as suggested by the previous
responder.
Thanks again,
Rolf

[EMAIL PROTECTED]:home]# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
        groups: lo
        inet 127.0.0.1 netmask 0xff00
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d6
        description: brExt_InternetEx
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::210:f3ff:fe0c:fad6%em0 prefixlen 64 scopeid 0x1
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d7
        description: brInt_InternetInt
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::210:f3ff:fe0c:fad7%em1 prefixlen 64 scopeid 0x2
fxp0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d8
        media: Ethernet autoselect (none)
        status: no carrier
fxp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d9
        description: VLAN trunk OOBMgtExt, brSync
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet6 fe80::210:f3ff:fe0c:fad9%fxp1 prefixlen 64 scopeid 0x4
enc0: flags=0<> mtu 1536
vlan21: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d9
        description: brSync
        vlan: 21 priority: 0 parent interface: fxp1
        groups: vlan
        inet6 fe80::210:f3ff:fe0c:fad9%vlan21 prefixlen 64 scopeid 0x7
        inet 192.168.7.13 netmask 0xff00 broadcast 192.168.7.255
vlan71: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d9
        description: OOBMgtExt
        vlan: 71 priority: 0 parent interface: fxp1
        groups: vlan egress
        inet6 fe80::210:f3ff:fe0c:fad9%vlan71 prefixlen 64 scopeid 0x8
        inet 172.16.71.13 netmask 0xff00 broadcast 172.16.71.255
pfsync0: flags=41<UP,RUNNING> mtu 1460
        description: pfSync
        pfsync: syncdev: vlan21 syncpeer: 224.0.0.240 maxupd: 128
        groups: carp pfsync
bridge0: flags=41<UP,RUNNING> mtu 1500
        groups: bridge
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33208
        groups: pflog
[EMAIL PROTECTED]:home]#
bridge0: flags=41<UP,RUNNING>
        priority 28672 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
        em1 flags=cb<LEARNING,DISCOVER,STP,PTP,AUTOPTP>
                port 2 ifpriority 128 ifcost 2 forwarding role designated
        em0 flags=cf<LEARNING,DISCOVER,BLOCKNONIP,STP,PTP,AUTOPTP>
                port 1 ifpriority 128 ifcost 2 forwarding role root
        Addresses (max cache: 100, timeout: 240):
                00:00:5e:00:01:0b em1 1 flags=0<>
                00:11:20:2f:09:54 em0 1 flags=0<>
                00:1d:46:97:5f:0d em1 1 flags=0<>
                00:1d:46:97:5f:03 em0 1 flags=0<>
[EMAIL PROTECTED]:home]#
[EMAIL PROTECTED]:home]# dmesg
OpenBSD 4.2-current (GENERIC) #476: Fri Nov 2 14:41:26 MDT 2007
    [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 2.80GHz (GenuineIntel 686-class) 2.80 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
real mem = 1072197632 (1022MB)
avail mem = 1028968448 (981MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 06/29/06, BIOS32 rev. 0 @ 0xfb250, SMBIOS rev. 2.2 @ 0xf0800 (34 entries)
bios0: vendor Phoenix Technologies, LTD version 6.00 PG date 06/29/2006
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery
Re: How to track down a suspected memory leak?
* Rolf Sommerhalder [EMAIL PROTECTED] [2007-11-25 18:44]:
> On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:
>> Is this possibly the same memory leak mentioned below?
>> http://marc.info/?l=openbsd-misc&m=119572453509542&w=2
>
> Thanks David for this pointer. It may very well be the same issue.
> Even though the two bridged interfaces are em(4) (1 Gb/s), the
> Out-of-Band Management (OOBM) interface is fxp(4), which carries two
> VLANs, one for pfsync(4) and one for command & control/monitoring.

the leak had nothing to do with fxp. it's simply a generic memory leak
in a state insertion error path that single firewalls tend to trigger
seldom if at all, but pfsync regularly hits.

> Still, I will give Henning's patch a try while waiting for results of
> the instrumentation with 'vmstat -m', as suggested by the previous
> responder.

if you're running pfsync i make bets it is that. if you look at
vmstat -m and pfstatekeypl has more objects in use than pfstatepl you
know it is that.

--
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
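Henning's check (more objects in use in pfstatekeypl than in pfstatepl) could be automated along these lines. This is only a sketch under assumptions: it expects saved `vmstat -m` text whose pool table has a header line beginning with `Name` and either an `InUse` column or both `Requests` and `Releases` columns. The helper names and the snapshot file name are hypothetical.

```python
import sys


def pool_in_use(vmstat_text, pool_name):
    """Return the objects in use for one pool, or None if it cannot be read."""
    columns = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Name" and "Requests" in fields:
            columns = fields  # header of the pool table (assumed layout)
            continue
        if columns is None or len(fields) != len(columns):
            continue
        if fields[0] != pool_name:
            continue
        row = dict(zip(columns, fields))
        try:
            if "InUse" in row:
                return int(row["InUse"])
            return int(row["Requests"]) - int(row["Releases"])
        except (KeyError, ValueError):
            return None  # layout differs from the assumption
    return None


def state_key_leak_suspected(vmstat_text):
    """True if pfstatekeypl holds more objects in use than pfstatepl.

    Returns None when either pool is missing from the text.
    """
    keys = pool_in_use(vmstat_text, "pfstatekeypl")
    states = pool_in_use(vmstat_text, "pfstatepl")
    if keys is None or states is None:
        return None
    return keys > states


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: vmstat -m > snapshot.txt; python3 check_pfstate.py snapshot.txt
    with open(sys.argv[1]) as snapshot:
        print(state_key_leak_suspected(snapshot.read()))
```

A result of `True` matches the condition Henning describes. `None` means the two pools could not be found, for example on a machine that is not running pf or whose output layout differs from what is assumed here.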
How to track down a suspected memory leak?
Hello list,

I am looking for suggestions on how to identify the source(s) of what
appears to be a memory leak of approx. 10 MByte/day on a clustered pair
of filtering bridges. These bridges are running an i386 -current
snapshot from Nov 2nd. They form the outer, Internet-facing stage of a
two-stage firewall in an enterprise setup.

Before we received two new i386 servers, the same setup was running on
two sparc64 servers with a snapshot about one month old. Back then, I
observed the same steady decrease of memory, graphing trends using
net-snmp and Cacti. Those old sparc64 servers only had 192 MByte of
RAM; they ran out of memory and stopped working after 10 days or so.

As I had some difficulties getting net-snmp to run at all on sparc64
(see patch posted to this list earlier), I was hoping to get away from
this apparent memory leak once I migrated to the newer i386 servers.
After the migration from sparc64 to i386, the memory consumed did
indeed remain constant during the first few days. Thereafter, however,
the steady decrease of free memory also started to appear on the i386,
much like on the sparc64.

I disabled all SNMP GET operations for a few hours, just to see if the
leak might be caused by net-snmp, but the leakage continued during this
time too. Staring at the output of 'systat vmstat' etc. did not help
either.

The pragmatic work-around for the moment is a cron job that reboots
each of the cluster nodes once a week. There is enough headroom with
1 GByte of RAM on these i386 servers. The two cluster nodes reboot at
different times, so the service is interrupted only for a few seconds
until rapid spanning tree completes fail-over.

At the moment, on a much smaller scale, I am replicating such a
two-stage clustered firewall setup for home use, based on OpenBSD
flashboot, WRAP / ALIX boards from PCengines and surplus Nokia IP120s
which I converted to OpenBSD. Also, because the WRAPs have only
128 MByte of RAM, I would very much like to get to the root cause of
that apparent memory leak in my clustered filtering bridge
configuration.

I am grateful for any hints and suggestions on how to track it down.

Thanks,
Rolf