Re: How to track down a suspected memory leak?

2007-12-03 Thread Der Engel
Is anyone still getting crashes after patch 4 in 4.2?



On Dec 2, 2007 9:06 AM, Rolf Sommerhalder
[EMAIL PROTECTED] wrote:
 On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:
  Is this possibly the same memory leak mentioned below?
 
  http://marc.info/?l=openbsd-misc&m=119572453509542&w=2

 Thanks for your pointer! Indeed, this patch/errata appears to have
 squashed the memory leak. A patched kernel has not lost any memory
 since Monday.

 Thanks again,
 Rolf



Re: How to track down a suspected memory leak?

2007-12-03 Thread Josh
 No problems here, I patched around 8 machines and they all stopped
freezing up.

Der Engel wrote:

  Is anyone still getting crashes after patch 4 in 4.2?

  On Dec 2, 2007 9:06 AM, Rolf Sommerhalder  [EMAIL PROTECTED]   wrote:

On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:

  Is this possibly the same memory leak mentioned below?
  http://marc.info/?l=openbsd-misc&m=119572453509542&w=2

Thanks for your pointer! Indeed, this patch/errata appears to have
squashed the memory leak. A patched kernel has not lost any memory
since Monday.

Thanks again,
Rolf



Re: How to track down a suspected memory leak?

2007-12-02 Thread Rolf Sommerhalder
On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:
 Is this possibly the same memory leak mentioned below?

 http://marc.info/?l=openbsd-misc&m=119572453509542&w=2

Thanks for your pointer! Indeed, this patch/errata appears to have
squashed the memory leak. A patched kernel has not lost any memory
since Monday.

Thanks again,
Rolf



Re: How to track down a suspected memory leak?

2007-11-27 Thread Henning Brauer
* Josh [EMAIL PROTECTED] [2007-11-26 21:57]:
  Henning Brauer wrote:
 
 Thanks David for this pointer. It may very well be the same issue.
 Even though the two bridged interfaces are em(4) (1 Gb/s), the
 Out-of-Band Management (OOBM) interface is fxp(4) that carries two
 VLANs, one for pfsync(4), and one for command&control/monitoring.
 
   the leak had nothing to do with fxp.
   it's simply a generic memory leak in a state insertion error path that 
   single firewalls tend to trigger seldom if at all, but pfsync 
   regularly hits.
 
 Still, I will give Henning's patch a try, while waiting for results
 of the instrumentation with 'vmstat -m', as suggested by the previous
 responder.
 
   if you're running pfsync i make bets it is that.
   if you look at vmstat -m and pfstatekeypl has more objects in use than
   pfstatepl you know it is that.
 
 Yeah, your patch thankfully does fix the problem. Just had another pair
 of 4.2 boxes fall over from the same bug this morning.
 
 Is it serious enough to put an errata note up?

assuming no ill effects from the fix show up, yes, soon.

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam



Re: How to track down a suspected memory leak?

2007-11-26 Thread Josh
 Henning Brauer wrote:

Thanks David for this pointer. It may very well be the same issue.
Even though the two bridged interfaces are em(4) (1 Gb/s), the
Out-of-Band Management (OOBM) interface is fxp(4) that carries two
VLANs, one for pfsync(4), and one for command&control/monitoring.

  the leak had nothing to do with fxp.
  it's simply a generic memory leak in a state insertion error path that 
  single firewalls tend to trigger seldom if at all, but pfsync 
  regularly hits.

Still, I will give Henning's patch a try, while waiting for results
of the instrumentation with 'vmstat -m', as suggested by the previous
responder.

  if you're running pfsync i make bets it is that.
  if you look at vmstat -m and pfstatekeypl has more objects in use than
  pfstatepl you know it is that.

Yeah, your patch thankfully does fix the problem. Just had another pair
of 4.2 boxes fall over from the same bug this morning.

Is it serious enough to put an errata note up?



Re: How to track down a suspected memory leak?

2007-11-25 Thread Tobias Ulmer
On Sun, Nov 25, 2007 at 08:03:11AM +0100, Rolf Sommerhalder wrote:
 Hello list,
 
 I am looking for suggestions on how to identify the source(s) of what
 appears to be a memory leak of approx. 10 MByte/day on a clustered
 pair of filtering bridges. These bridges are running the i386 -current
 snapshot from Nov 2nd. They form the outer, Internet-facing stage of a
 two-stage firewall in an enterprise setup.
 [...]


If I were you, I would collect a few vmstat -m outputs, probably using
cron, at a time when the machines are pretty much idle, then compare
each one with the previous ones and see what's growing. If you're lucky,
it gives you a pretty good indication of which subsystem the memory leak
is in. Then use the source :)
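
The comparison step could be sketched roughly as below. This is only an
illustration, not a tool from this thread: it assumes each vmstat -m
snapshot has already been reduced to a mapping of entry name to its
in-use count (how to extract those counts is left out), and the function
name growing_entries and its min_growth parameter are made up for the
example.

```python
from typing import Dict, List, Tuple


def growing_entries(
    snapshots: List[Dict[str, int]], min_growth: int = 1
) -> List[Tuple[str, int]]:
    """Return (name, growth) for entries that never shrink between
    consecutive snapshots and grow by at least min_growth overall.

    snapshots is ordered oldest first; each one maps an entry name
    (hypothetically, a pool or malloc type) to its in-use count.
    """
    if len(snapshots) < 2:
        return []

    suspects = []
    for name in snapshots[-1]:
        # An entry missing from an older snapshot is treated as 0 in use.
        series = [snap.get(name, 0) for snap in snapshots]
        never_shrinks = all(b >= a for a, b in zip(series, series[1:]))
        growth = series[-1] - series[0]
        if never_shrinks and growth >= min_growth:
            suspects.append((name, growth))

    # Largest total growth first, since that is the most likely culprit.
    suspects.sort(key=lambda item: item[1], reverse=True)
    return suspects
```

An entry that only ever grows across idle-time snapshots is a candidate,
not proof; caches that fill up slowly would show the same pattern for a
while.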

Tobias



Re: How to track down a suspected memory leak?

2007-11-25 Thread Rolf Sommerhalder
On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:

 Is this possibly the same memory leak mentioned below?

 http://marc.info/?l=openbsd-misc&m=119572453509542&w=2

Thanks David for this pointer. It may very well be the same issue.
Even though the two bridged interfaces are em(4) (1 Gb/s), the
Out-of-Band Management (OOBM) interface is fxp(4) that carries two
VLANs, one for pfsync(4), and one for command&control/monitoring.

Interestingly, I observe memory depletion at the same rate on both
nodes of these active-passive filtering bridge clusters (both the
sparc64 and i386), i.e. free memory on the passive bridge depletes at
the same rate as on the active one. This may hint that the problem lies
with the fxp(4) rather than with the bridged em(4)s, unless it is
somehow related to Rapid Spanning Tree (RSTP), which is running on both
the internal and external em(4)s on both the active and the passive
node.

Maybe it's worth mentioning that on the previous sparc64 platforms
(Sun Blade 100), where I observed slow memory depletion first, the
bridging was between two ports of a quad hme(4) NIC, and the OOBM was
on a third port of the same quad NIC.

Still, I will give Henning's patch a try, while waiting for results
of the instrumentation with 'vmstat -m', as suggested by the previous
responder.

Thanks again,
Rolf




[EMAIL PROTECTED]:home]# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
        groups: lo
        inet 127.0.0.1 netmask 0xff00
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d6
        description: brExt_InternetEx
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::210:f3ff:fe0c:fad6%em0 prefixlen 64 scopeid 0x1
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d7
        description: brInt_InternetInt
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::210:f3ff:fe0c:fad7%em1 prefixlen 64 scopeid 0x2
fxp0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d8
        media: Ethernet autoselect (none)
        status: no carrier
fxp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d9
        description: VLAN trunk OOBMgtExt, brSync
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet6 fe80::210:f3ff:fe0c:fad9%fxp1 prefixlen 64 scopeid 0x4
enc0: flags=0<> mtu 1536
vlan21: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d9
        description: brSync
        vlan: 21 priority: 0 parent interface: fxp1
        groups: vlan
        inet6 fe80::210:f3ff:fe0c:fad9%vlan21 prefixlen 64 scopeid 0x7
        inet 192.168.7.13 netmask 0xff00 broadcast 192.168.7.255
vlan71: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:10:f3:0c:fa:d9
        description: OOBMgtExt
        vlan: 71 priority: 0 parent interface: fxp1
        groups: vlan egress
        inet6 fe80::210:f3ff:fe0c:fad9%vlan71 prefixlen 64 scopeid 0x8
        inet 172.16.71.13 netmask 0xff00 broadcast 172.16.71.255
pfsync0: flags=41<UP,RUNNING> mtu 1460
        description: pfSync
        pfsync: syncdev: vlan21 syncpeer: 224.0.0.240 maxupd: 128
        groups: carp pfsync
bridge0: flags=41<UP,RUNNING> mtu 1500
        groups: bridge
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33208
        groups: pflog
[EMAIL PROTECTED]:home]#

bridge0: flags=41<UP,RUNNING>
        priority 28672 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
        em1 flags=cb<LEARNING,DISCOVER,STP,PTP,AUTOPTP>
                port 2 ifpriority 128 ifcost 2 forwarding role designated
        em0 flags=cf<LEARNING,DISCOVER,BLOCKNONIP,STP,PTP,AUTOPTP>
                port 1 ifpriority 128 ifcost 2 forwarding role root
        Addresses (max cache: 100, timeout: 240):
                00:00:5e:00:01:0b em1 1 flags=0<>
                00:11:20:2f:09:54 em0 1 flags=0<>
                00:1d:46:97:5f:0d em1 1 flags=0<>
                00:1d:46:97:5f:03 em0 1 flags=0<>
[EMAIL PROTECTED]:home]#

[EMAIL PROTECTED]:home]# dmesg
OpenBSD 4.2-current (GENERIC) #476: Fri Nov  2 14:41:26 MDT 2007
[EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 2.80GHz (GenuineIntel 686-class) 2.80 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
real mem  = 1072197632 (1022MB)
avail mem = 1028968448 (981MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 06/29/06, BIOS32 rev. 0 @
0xfb250, SMBIOS rev. 2.2 @ 0xf0800 (34 entries)
bios0: vendor Phoenix Technologies, LTD version 6.00 PG date 06/29/2006
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery 

Re: How to track down a suspected memory leak?

2007-11-25 Thread Henning Brauer
* Rolf Sommerhalder [EMAIL PROTECTED] [2007-11-25 18:44]:
 On Nov 25, 2007 5:22 PM, David Higgs [EMAIL PROTECTED] wrote:
 
  Is this possibly the same memory leak mentioned below?
 
  http://marc.info/?l=openbsd-misc&m=119572453509542&w=2
 
 Thanks David for this pointer. It may very well be the same issue.
 Even though the two bridged interfaces are em(4) (1 Gb/s), the
 Out-of-Band Management (OOBM) interface is fxp(4) that carries two
 VLANs, one for pfsync(4), and one for command&control/monitoring.

the leak had nothing to do with fxp.
it's simply a generic memory leak in a state insertion error path that 
single firewalls tend to trigger seldom if at all, but pfsync 
regularly hits.

 Still, I will give Henning's patch a try, while waiting for results
 of the instrumentation with 'vmstat -m', as suggested by the previous
 responder.

if you're running pfsync i make bets it is that.
if you look at vmstat -m and pfstatekeypl has more objects in use than
pfstatepl you know it is that.
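
A rough sketch of that check, for anyone who wants to script it. The
column layout is an assumption, not something shown in this thread: the
pool table in vmstat -m is taken to start with a header line beginning
with "Name" that carries either an "InUse" column or "Requests" and
"Releases" columns (in use = requests - releases). The function names
are invented for the example.

```python
from typing import Dict, Optional


def pool_in_use(vmstat_m_output: str) -> Dict[str, int]:
    """Map pool name -> objects in use, parsed from `vmstat -m` text.

    Assumed layout: a header row starting with "Name", followed by one
    whitespace-separated row per pool, ended by a blank line.
    """
    in_use: Dict[str, int] = {}
    header = None
    for line in vmstat_m_output.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current table
            continue
        if fields[0] == "Name" and "Requests" in fields:
            header = fields
            continue
        if header is None or len(fields) < len(header):
            continue
        row = dict(zip(header, fields))
        try:
            if "InUse" in row:
                count = int(row["InUse"])
            else:
                count = int(row["Requests"]) - int(row["Releases"])
        except (KeyError, ValueError):
            continue  # not a pool row in the layout assumed above
        in_use[row["Name"]] = count
    return in_use


def pfsync_leak_suspected(in_use: Dict[str, int]) -> Optional[bool]:
    """True if pfstatekeypl has more objects in use than pfstatepl.

    Returns None when either pool is missing from the parsed output.
    """
    keys = in_use.get("pfstatekeypl")
    states = in_use.get("pfstatepl")
    if keys is None or states is None:
        return None
    return keys > states
```

Feed pool_in_use() the captured text of vmstat -m and pass the result to
pfsync_leak_suspected(). If the header on your release looks different,
adjust the column names rather than trusting the result.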



-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam



How to track down a suspected memory leak?

2007-11-24 Thread Rolf Sommerhalder
Hello list,

I am looking for suggestions on how to identify the source(s) of what
appears to be a memory leak of approx. 10 MByte/day on a clustered
pair of filtering bridges. These bridges are running the i386 -current
snapshot from Nov 2nd. They form the outer, Internet-facing stage of a
two-stage firewall in an enterprise setup.

Before we received two new i386 servers, the same setup was running on
two sparc64 servers with a snapshot about one month old. Back then, I
observed the same steady decrease of free memory, graphing trends using
net-snmp and Cacti. Those old sparc64 servers only had 192 MByte of
RAM; they ran out of memory and stopped working after 10 days or so.
As I had some difficulty getting net-snmp to run at all on sparc64
(see patch posted to this list earlier), I was hoping to get away from
this apparent memory leak once I migrated to the newer i386 servers.

After the migration from sparc64 to i386, the memory consumed did
indeed remain constant during the first few days. Thereafter, however,
the steady decrease of free memory also started to appear on the i386,
much like on the sparc64. I disabled all SNMP GET operations for a few
hours, just to see if the leak might be caused by net-snmp, but the
leakage continued during this time too. Staring at the output of
'systat vmstat' etc. did not help either.
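
To put a number on such a trend without a graphing tool, a least-squares
line through the free-memory samples gives a leak rate and a rough time
to exhaustion. This is only a sketch: the sample format (time in days,
free memory in MByte) and the function names are assumptions made for
the example, and real depletion need not be linear.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in days, free memory in MByte)


def leak_rate(samples: List[Sample]) -> float:
    """Least-squares slope of free memory over time, in MByte/day.

    A negative result means free memory is shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def days_until_exhausted(samples: List[Sample]) -> Optional[float]:
    """Days until free memory reaches zero, counted from the last sample.

    Divides the most recent free-memory reading by the fitted loss per
    day. Returns None if free memory is not shrinking.
    """
    slope = leak_rate(samples)
    if slope >= 0:
        return None
    latest_free = samples[-1][1]
    return latest_free / -slope
```

Sampling at idle times keeps ordinary load swings from distorting the
fitted slope.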

The pragmatic work-around for the moment is a cron job that reboots each
of the cluster nodes once a week. There is enough headroom with 1
GByte of RAM on these i386 servers. The two cluster nodes reboot at
different times, so the service is interrupted only for a few seconds
until rapid spanning tree completes fail-over.

At the moment, on a much smaller scale, I am replicating such a
two-stage clustered firewall setup for home use, based on OpenBSD
flashboot, WRAP / ALIX boards from PCengines and surplus Nokia IP120s
which I converted to OpenBSD. Also, because the WRAPs have only 128
MByte of RAM, I would very much like to get to the root cause of that
apparent memory leak in my clustered filtering bridge configuration. I
am grateful for any hints and suggestions on how to track it down.

Thanks,
Rolf