Re: OpenBSD 6.4-stable + current "freezes" after 4h [not]
Re,

On 14.01.19 18:40, Theo de Raadt wrote:
> We accept reasonable bug reports from systems with a few changes. You do NOT
> have a few changes, you have a huge pile of them, and therefore you are
> 'responsible for all the pieces'.
...
> Almost assuredly you are being burned by your own changes.

First of all, there will be no irony in the following lines.

Theo, I really appreciate your intention of protecting the devs from
unnecessary work. You were so damn right to stop the assumption I was
following.

Stuart and Hrvoje, thanks for helping with the information about ddb;
that pushed me in the right direction.

Just for the record, and in terms of sharing knowledge (the bad kind,
too): the problem was caused by a really bad doas call that I wasn't
aware of, but which may have crept into my configs at the same time I
updated the integration stage to 6.4.

(a zabbix_agent was periodically calling
'...cmd ksh args -c "/usr/sbin/ospfctl args show neighbor"'
instead of
'...cmd /usr/sbin/ospfctl args show neighbor')

Fixing this doas line let the server run stable again.

So again thanks, and last but not least: sorry for the noise, guys!

Cheers,
Marco
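[Editor's note: the full doas.conf(5) rules are elided in the mail above; the
following is a minimal sketch of what the broken and fixed rules might have
looked like. The user name `_zabbix` and the rule options are assumptions,
not taken from the original report.]

```
# doas.conf sketch -- assumed user name and options, for illustration only

# BROKEN (assumed shape): runs the command through a shell via ksh -c,
# which is what Marco reports eventually wedged the machine
permit nopass _zabbix as root cmd ksh args -c "/usr/sbin/ospfctl show neighbor"

# FIXED: permit exactly one binary with exactly these arguments
permit nopass _zabbix as root cmd /usr/sbin/ospfctl args show neighbor
```

As a general rule, giving a monitoring agent the single binary and argument
list it needs is both safer and easier to debug than granting it a shell.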
Re: OpenBSD 6.4-stable + current "freezes" after 4h
On 14.1.2019. 16:25, Hrvoje Popovski wrote:
> On 14.1.2019. 10:02, Marco Prause wrote:
>> splassert: bstp_notify_rtage: want 2 have 0
>> splassert: bstp_notify_rtage: want 2 have 0
>> splassert: bstp_notify_rtage: want 2 have 0
>> splassert: bstp_notify_rtage: want 2 have 0
>> splassert: bstp_notify_rtage: want 2 have 0
>> splassert: bstp_notify_rtage: want 2 have 0
>
> could you try adding this sysctls
> sysctl kern.splassert=2
> sysctl kern.pool_debug=1
>
> are you getting similar traces ?
>
> splassert: bstp_notify_rtage: want 2 have 0
> Starting stack trace...
> bstp_set_port_tc(668bdd1357c8fcb,4) at bstp_set_port_tc+0x1a0
> bstp_update_tc(fa46532a51d755d) at bstp_update_tc+0xfd
> bstp_tick(809f7c00) at bstp_tick+0x357
> softclock(3c3f171cb53a98a3) at softclock+0x117
> softintr_dispatch(120392a2955eaa7c) at softintr_dispatch+0xfc
> Xsoftclock(0,0,1388,0,800267e0,81ccd6b0) at Xsoftclock+0x1f
> acpicpu_idle() at acpicpu_idle+0x281
> sched_idle(0) at sched_idle+0x245
> end trace frame: 0x0, count: 249
> End of stack trace.
>
> splassert: bstp_notify_rtage: want 2 have 256
> Starting stack trace...
> bstp_set_port_tc(668bdd1357c8fcb,4) at bstp_set_port_tc+0x1a0
> bstp_update_tc(fa46532a51d755d) at bstp_update_tc+0xfd
> bstp_tick(809f7c00) at bstp_tick+0x357
> softclock(3c3f171cb53a98a3) at softclock+0x117
> softintr_dispatch(120392a2955eaa7c) at softintr_dispatch+0xfc
> Xsoftclock(0,0,1388,0,800267e0,81ccd6b0) at Xsoftclock+0x1f
> acpicpu_idle() at acpicpu_idle+0x281
> sched_idle(0) at sched_idle+0x245
> end trace frame: 0x0, count: 249
> End of stack trace.

i'm getting these traces even with

OpenBSD 6.4-current (GENERIC.MP) #499: Mon Dec 10 11:33:10 MST 2018
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

which is before mpi@ commit

Changes by: m...@cvs.openbsd.org    2018/12/12 07:19:15
Modified files:
        sys/net: if_bridge.c bridgectl.c

splassert: bstp_notify_rtage: want 2 have 0
Starting stack trace...
bstp_set_port_tc(233f0d46a06cbcc7,4) at bstp_set_port_tc+0x1a0
bstp_update_tc(cc45a761c76fe6c6) at bstp_update_tc+0xfd
bstp_tick(80663400) at bstp_tick+0x357
softclock(82030e4bce69f3d2) at softclock+0x117
softintr_dispatch(df881ff53c0f4dab) at softintr_dispatch+0xfc
Xsoftclock(0,0,1388,0,800267e0,81ca66b0) at Xsoftclock+0x1f
acpicpu_idle() at acpicpu_idle+0x281
sched_idle(0) at sched_idle+0x245
end trace frame: 0x0, count: 249
End of stack trace.

so, maybe all these traces are noise regarding this problem, or it's
been in the tree for some time
Re: OpenBSD 6.4-stable + current "freezes" after 4h
On Mon, 14 Jan 2019 18:27:56 +0100, Marco Prause wrote:
> Am 14. Januar 2019 16:40:48 MEZ schrieb Theo de Raadt :
> >It sure looks like you have a pile of your own changes which are highly
> >unconventional, and you are very far away from a stock OpenBSD
> >configuration.
>
> Well, that's right so far, because I have decided to use the tool
> resflash to create images (https://stable.rcesoftware.com/resflash/).
>
> That's the "only" changes, that made the system away from a stock
> OpenBSD configuration.
>
> But sure, to get this also out of the way of possible causes, I could
> install current to the server on the hard disc. I just thought resflash
> just did some changes to the boot process and I assume the issue more
> at the bridge-part.

From https://stable.rcesoftware.com/resflash/ sources:

    Resflash is not a supported OpenBSD configuration. Please do not
    email bugs@ or misc@ asking for help. If you have a question or a
    bug to report, please post to our mailing list
    (https://www.freelists.org/list/resflash), submit an issue on GitLab
    (https://gitlab.com/bconway/resflash/issues), or email me directly
    (bconway-at-rcesoftware-dot-com).
Re: OpenBSD 6.4-stable + current "freezes" after 4h
Marco Prause wrote:
> Am 14. Januar 2019 16:40:48 MEZ schrieb Theo de Raadt :
> >It sure looks like you have a pile of your own changes which are highly
> >unconventional, and you are very far away from a stock OpenBSD
> >configuration.
>
> Well, that's right so far, because I have decided to use the tool
> resflash to create images (https://stable.rcesoftware.com/resflash/).

So basically.. you have changed everything.

We accept reasonable bug reports from systems with a few changes. You do
NOT have a few changes, you have a huge pile of them, and therefore you
are 'responsible for all the pieces'.

> That's the "only" changes, that made the system away from a stock
> OpenBSD configuration.

You have 5 MFS filesystems. "only changes"?

Almost assuredly you are being burned by your own changes.
Re: OpenBSD 6.4-stable + current "freezes" after 4h
Am 14. Januar 2019 16:40:48 MEZ schrieb Theo de Raadt :
>It sure looks like you have a pile of your own changes which are highly
>unconventional, and you are very far away from a stock OpenBSD
>configuration.

Well, that's right so far, because I have decided to use the tool
resflash to create images (https://stable.rcesoftware.com/resflash/).

That's the "only" change that moved the system away from a stock
OpenBSD configuration.

But sure, to get this also out of the way of possible causes, I could
install current on the server's hard disc. I just thought resflash only
made some changes to the boot process, and I suspect the issue is more
in the bridge part.

>Having made those decisions, you are responsible for your own issues.
>
>Sorry.

That seems fair enough to me. Let me have a look at the ddb stuff
Stuart mentioned and the splassert stuff Hrvoje mentioned, before I'm
going to reinstall the server with a stock current OpenBSD.

Cheers,
Marco

>> Hi Stuart,
>>
>> thanks for having a look at this.
>>
>> > Is it the same or different hardware type and BIOS version for the
>> > working and hanging machines? (maybe diff the two dmesgs)
>> >
>> > Same or different filesystem mount options? (Are you using softdep?)
>>
>> it's (nearly) the same hardware.
>>
>> But thanks to your hint of diffing the dmesg outputs I found a small
>> difference :
>>
>> * server1:
>>
>> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec200 (78 entries)
>> bios0: vendor American Megatrends Inc. version "4.6.5" date 03/02/2015
>> bios0: INTEL Corporation DENLOW_WS
>>
>> * server2:
>>
>> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec200 (77 entries)
>> bios0: vendor American Megatrends Inc. version "4.6.5" date 03/02/2015
>> bios0: INTEL Corporation DENLOW_WS
>>
>> * server2 has an additional entry, I do not see on server1
>>
>> acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
>>
>> * server2 also seems to have a slightly different memory setup :
>>
>> spdmem0 at iic0 addr 0x50: 8GB DDR3 SDRAM PC3-12800
>>
>> * whereas server1 has :
>>
>> spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-12800
>> spdmem1 at iic0 addr 0x52: 4GB DDR3 SDRAM PC3-12800
>>
>> On the filesystem I can't see any differences :
>>
>> * server1:
>>
>> $ mount
>> /dev/sd0d on / type ffs (local, noatime, nodev, read-only)
>> mfs:14405 on /tmp type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
>> mfs:35803 on /dev type mfs (asynchronous, local, noatime, noexec, size=12288 512-blocks)
>> mfs:30894 on /etc type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
>> mfs:75826 on /var type mfs (asynchronous, local, noatime, nodev, noexec, size=131072 512-blocks)
>> mfs:23894 on /usr/lib type mfs (asynchronous, local, noatime, nodev, nosuid, size=262144 512-blocks)
>> mfs:21714 on /usr/libexec type mfs (asynchronous, local, noatime, nodev, size=262144 512-blocks)
>> $ cat /etc/fstab
>> dd6727251088320b.a /mbr ffs rw,noatime,nodev,noexec,noauto 1 2
>> dd6727251088320b.d / ffs ro,noatime,nodev 1 1
>> dd6727251088320b.f /cfg ffs rw,noatime,nodev,noexec,noauto 1 2
>> dd6727251088320b.i /efi msdos rw,noatime,nodev,noexec,noauto 0 0
>> swap /tmp mfs rw,async,noatime,nodev,nosuid,-s32M 0 0
>> $
>>
>> * server2:
>>
>> $ mount
>> /dev/sd0e on / type ffs (local, noatime, nodev, read-only)
>> mfs:19530 on /tmp type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
>> mfs:65784 on /dev type mfs (asynchronous, local, noatime, noexec, size=12288 512-blocks)
>> mfs:41465 on /etc type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
>> mfs:86708 on /var type mfs (asynchronous, local, noatime, nodev, noexec, size=262144 512-blocks)
>> mfs:90223 on /usr/lib type mfs (asynchronous, local, noatime, nodev, nosuid, size=262144 512-blocks)
>> mfs:22430 on /usr/libexec type mfs (asynchronous, local, noatime, nodev, size=262144 512-blocks)
>> $ cat /etc/fstab
>> 9f97b8d42ceedbf4.a /mbr ffs rw,noatime,nodev,noexec,noauto 1 2
>> 9f97b8d42ceedbf4.e / ffs ro,noatime,nodev 1 1
Re: OpenBSD 6.4-stable + current "freezes" after 4h
It sure looks like you have a pile of your own changes which are highly
unconventional, and you are very far away from a stock OpenBSD
configuration.

Having made those decisions, you are responsible for your own issues.

Sorry.

> Hi Stuart,
>
> thanks for having a look at this.
>
> > Is it the same or different hardware type and BIOS version for the
> > working and hanging machines? (maybe diff the two dmesgs)
> >
> > Same or different filesystem mount options? (Are you using softdep?)
>
> it's (nearly) the same hardware.
>
> But thanks to your hint of diffing the dmesg outputs I found a small
> difference :
>
> * server1:
>
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec200 (78 entries)
> bios0: vendor American Megatrends Inc. version "4.6.5" date 03/02/2015
> bios0: INTEL Corporation DENLOW_WS
>
> * server2:
>
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec200 (77 entries)
> bios0: vendor American Megatrends Inc. version "4.6.5" date 03/02/2015
> bios0: INTEL Corporation DENLOW_WS
>
> * server2 has an additional entry, I do not see on server1
>
> acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
>
> * server2 also seems to have a slightly different memory setup :
>
> spdmem0 at iic0 addr 0x50: 8GB DDR3 SDRAM PC3-12800
>
> * whereas server1 has :
>
> spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-12800
> spdmem1 at iic0 addr 0x52: 4GB DDR3 SDRAM PC3-12800
>
> On the filesystem I can't see any differences :
>
> * server1:
>
> $ mount
> /dev/sd0d on / type ffs (local, noatime, nodev, read-only)
> mfs:14405 on /tmp type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
> mfs:35803 on /dev type mfs (asynchronous, local, noatime, noexec, size=12288 512-blocks)
> mfs:30894 on /etc type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
> mfs:75826 on /var type mfs (asynchronous, local, noatime, nodev, noexec, size=131072 512-blocks)
> mfs:23894 on /usr/lib type mfs (asynchronous, local, noatime, nodev, nosuid, size=262144 512-blocks)
> mfs:21714 on /usr/libexec type mfs (asynchronous, local, noatime, nodev, size=262144 512-blocks)
> $ cat /etc/fstab
> dd6727251088320b.a /mbr ffs rw,noatime,nodev,noexec,noauto 1 2
> dd6727251088320b.d / ffs ro,noatime,nodev 1 1
> dd6727251088320b.f /cfg ffs rw,noatime,nodev,noexec,noauto 1 2
> dd6727251088320b.i /efi msdos rw,noatime,nodev,noexec,noauto 0 0
> swap /tmp mfs rw,async,noatime,nodev,nosuid,-s32M 0 0
> $
>
> * server2:
>
> $ mount
> /dev/sd0e on / type ffs (local, noatime, nodev, read-only)
> mfs:19530 on /tmp type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
> mfs:65784 on /dev type mfs (asynchronous, local, noatime, noexec, size=12288 512-blocks)
> mfs:41465 on /etc type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
> mfs:86708 on /var type mfs (asynchronous, local, noatime, nodev, noexec, size=262144 512-blocks)
> mfs:90223 on /usr/lib type mfs (asynchronous, local, noatime, nodev, nosuid, size=262144 512-blocks)
> mfs:22430 on /usr/libexec type mfs (asynchronous, local, noatime, nodev, size=262144 512-blocks)
> $ cat /etc/fstab
> 9f97b8d42ceedbf4.a /mbr ffs rw,noatime,nodev,noexec,noauto 1 2
> 9f97b8d42ceedbf4.e / ffs ro,noatime,nodev 1 1
> 9f97b8d42ceedbf4.f /cfg ffs rw,noatime,nodev,noexec,noauto 1 2
> 9f97b8d42ceedbf4.i /efi msdos rw,noatime,nodev,noexec,noauto 0 0
> swap /tmp mfs rw,async,noatime,nodev,nosuid,-s32M 0 0
> $
>
> For the other suggestions, let me run the system with
> "sysctl ddb.console=1" and wait until the problem will occur to answer
> your questions as soon I have the additional information.
>
> Cheers,
> Marco
Re: OpenBSD 6.4-stable + current "freezes" after 4h
On 14.1.2019. 10:02, Marco Prause wrote:
> splassert: bstp_notify_rtage: want 2 have 0
> splassert: bstp_notify_rtage: want 2 have 0
> splassert: bstp_notify_rtage: want 2 have 0
> splassert: bstp_notify_rtage: want 2 have 0
> splassert: bstp_notify_rtage: want 2 have 0
> splassert: bstp_notify_rtage: want 2 have 0

could you try adding this sysctls

sysctl kern.splassert=2
sysctl kern.pool_debug=1

are you getting similar traces ?

splassert: bstp_notify_rtage: want 2 have 0
Starting stack trace...
bstp_set_port_tc(668bdd1357c8fcb,4) at bstp_set_port_tc+0x1a0
bstp_update_tc(fa46532a51d755d) at bstp_update_tc+0xfd
bstp_tick(809f7c00) at bstp_tick+0x357
softclock(3c3f171cb53a98a3) at softclock+0x117
softintr_dispatch(120392a2955eaa7c) at softintr_dispatch+0xfc
Xsoftclock(0,0,1388,0,800267e0,81ccd6b0) at Xsoftclock+0x1f
acpicpu_idle() at acpicpu_idle+0x281
sched_idle(0) at sched_idle+0x245
end trace frame: 0x0, count: 249
End of stack trace.

splassert: bstp_notify_rtage: want 2 have 256
Starting stack trace...
bstp_set_port_tc(668bdd1357c8fcb,4) at bstp_set_port_tc+0x1a0
bstp_update_tc(fa46532a51d755d) at bstp_update_tc+0xfd
bstp_tick(809f7c00) at bstp_tick+0x357
softclock(3c3f171cb53a98a3) at softclock+0x117
softintr_dispatch(120392a2955eaa7c) at softintr_dispatch+0xfc
Xsoftclock(0,0,1388,0,800267e0,81ccd6b0) at Xsoftclock+0x1f
acpicpu_idle() at acpicpu_idle+0x281
sched_idle(0) at sched_idle+0x245
end trace frame: 0x0, count: 249
End of stack trace.
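[Editor's note: to make the two debug knobs above survive a reboot, they can
also go into /etc/sysctl.conf. The explanatory comments are an editorial
summary of how these knobs behave as I understand them, not part of the
original mail; check sysctl(2) on your release before relying on them.]

```
# /etc/sysctl.conf fragment -- debugging aids suggested in the thread
# kern.splassert: 1 prints a warning on a failed spl assertion,
#                 2 additionally prints a stack trace (used here)
kern.splassert=2
# kern.pool_debug: 1 enables kernel pool consistency checks
kern.pool_debug=1
```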
Re: OpenBSD 6.4-stable + current "freezes" after 4h
Just a small follow-up to my previous email:

I've just had a look at the hardware that caused the problem before I
exchanged it with the new one, which now also produces the problem.

This server seems to have the same hardware setup as server1, which I
mentioned in the previous email and which is not freezing.

Here I see the same memory setup:

spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-12800
spdmem1 at iic0 addr 0x52: 4GB DDR3 SDRAM PC3-12800

and no

acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x00

which may be produced by the current kernel.
Re: OpenBSD 6.4-stable + current "freezes" after 4h
Hi Stuart,

thanks for having a look at this.

> Is it the same or different hardware type and BIOS version for the
> working and hanging machines? (maybe diff the two dmesgs)
>
> Same or different filesystem mount options? (Are you using softdep?)

it's (nearly) the same hardware.

But thanks to your hint of diffing the dmesg outputs I found a small
difference :

* server1:

bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec200 (78 entries)
bios0: vendor American Megatrends Inc. version "4.6.5" date 03/02/2015
bios0: INTEL Corporation DENLOW_WS

* server2:

bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec200 (77 entries)
bios0: vendor American Megatrends Inc. version "4.6.5" date 03/02/2015
bios0: INTEL Corporation DENLOW_WS

* server2 has an additional entry, I do not see on server1

acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x

* server2 also seems to have a slightly different memory setup :

spdmem0 at iic0 addr 0x50: 8GB DDR3 SDRAM PC3-12800

* whereas server1 has :

spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-12800
spdmem1 at iic0 addr 0x52: 4GB DDR3 SDRAM PC3-12800

On the filesystem I can't see any differences :

* server1:

$ mount
/dev/sd0d on / type ffs (local, noatime, nodev, read-only)
mfs:14405 on /tmp type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
mfs:35803 on /dev type mfs (asynchronous, local, noatime, noexec, size=12288 512-blocks)
mfs:30894 on /etc type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
mfs:75826 on /var type mfs (asynchronous, local, noatime, nodev, noexec, size=131072 512-blocks)
mfs:23894 on /usr/lib type mfs (asynchronous, local, noatime, nodev, nosuid, size=262144 512-blocks)
mfs:21714 on /usr/libexec type mfs (asynchronous, local, noatime, nodev, size=262144 512-blocks)
$ cat /etc/fstab
dd6727251088320b.a /mbr ffs rw,noatime,nodev,noexec,noauto 1 2
dd6727251088320b.d / ffs ro,noatime,nodev 1 1
dd6727251088320b.f /cfg ffs rw,noatime,nodev,noexec,noauto 1 2
dd6727251088320b.i /efi msdos rw,noatime,nodev,noexec,noauto 0 0
swap /tmp mfs rw,async,noatime,nodev,nosuid,-s32M 0 0
$

* server2:

$ mount
/dev/sd0e on / type ffs (local, noatime, nodev, read-only)
mfs:19530 on /tmp type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
mfs:65784 on /dev type mfs (asynchronous, local, noatime, noexec, size=12288 512-blocks)
mfs:41465 on /etc type mfs (asynchronous, local, noatime, nodev, nosuid, size=65536 512-blocks)
mfs:86708 on /var type mfs (asynchronous, local, noatime, nodev, noexec, size=262144 512-blocks)
mfs:90223 on /usr/lib type mfs (asynchronous, local, noatime, nodev, nosuid, size=262144 512-blocks)
mfs:22430 on /usr/libexec type mfs (asynchronous, local, noatime, nodev, size=262144 512-blocks)
$ cat /etc/fstab
9f97b8d42ceedbf4.a /mbr ffs rw,noatime,nodev,noexec,noauto 1 2
9f97b8d42ceedbf4.e / ffs ro,noatime,nodev 1 1
9f97b8d42ceedbf4.f /cfg ffs rw,noatime,nodev,noexec,noauto 1 2
9f97b8d42ceedbf4.i /efi msdos rw,noatime,nodev,noexec,noauto 0 0
swap /tmp mfs rw,async,noatime,nodev,nosuid,-s32M 0 0
$

For the other suggestions, let me run the system with
"sysctl ddb.console=1" and wait until the problem will occur to answer
your questions as soon I have the additional information.

Cheers,
Marco
Re: OpenBSD 6.4-stable + current "freezes" after 4h
On 2019-01-14, Marco Prause wrote:
> after an initial boot, everything is working fine for round about 4 hours.
>
> After 4 hours, it is not possible to login into the backup/secondary
> openbsd-server via ssh or even via serial console, but it seems to still
> forward traffic correctly. Also the ospf adjacencies are up&running as
> well as ipsec security associations and so on.
>
> Monitoring metrics doesn't show any meassured increase of any data.
>
> I've already exchanged the hardware, because it was my first guess, as
> the first server/gateway is running without any problems with the same
> 6.4-stable and config version - but this unfortunately didn't help.

Is it the same or different hardware type and BIOS version for the
working and hanging machines? (maybe diff the two dmesgs)

Same or different filesystem mount options? (Are you using softdep?)

> When I left an serial console login opened, I was able to execute some
> commands and also a top, I've invoked before, was still running at the
> failure-state. But when entering e.g. ifconfig, or trying a
> tab-completion also the serial console freezes.

The "WAIT" column of a running top(1) may include useful information.

If possible, run with "sysctl ddb.console=1" (needs setting
pre-securelevel, add it to sysctl.conf if it's not already there), which
should allow you to enter ddb by sending a BREAK signal over the serial
line (~# in cu(1)). You can try that under normal operation (will
interrupt service; be ready to type "c" and enter to continue to resume)
to check it works.

Then during a hang attempt to enter ddb, if you are successful then
capture at least the following:

ps
trace

Ideally also switch to all other cpus (the number in the ddb prompt
shows the current one; you can do "mach ddbcpu 3" etc to switch to
another) and re-run trace (which is completely per-cpu), ps (the line
marked "*" indicates the currently active process on the currently
selected CPU - for a report there's no need to repeat the entire list N
times but could be useful to indicate the running processes on all
CPUs).

When you are done with these then also fetch:

sh malloc
sh all pools

For the benefit of other readers who don't have serial console,
ctrl+alt+esc on the keyboard will do the same if the keyboard/monitor
are the selected console device, obviously it will be harder to capture
the output in an easily readable format!
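[Editor's note: the procedure above can be condensed into a small crib
sheet. The commands themselves come from the mail; the framing comments are
editorial, and the CPU number in "mach ddbcpu" is just an example.]

```
# /etc/sysctl.conf -- must be in effect before securelevel is raised
ddb.console=1

# From cu(1) on the serial line:
#   ~#               send BREAK -> drops into ddb
#   c                continue (resume normal operation)
#
# During a hang, once in ddb, capture at least:
#   ps               process list ("*" marks the process on this CPU)
#   trace            stack trace of the current CPU
#   mach ddbcpu 1    switch to another CPU, then re-run trace (and ps)
#
# When done, also fetch:
#   sh malloc
#   sh all pools
```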
OpenBSD 6.4-stable + current "freezes" after 4h
Hi all @misc,

1st things 1st: sorry for my long description, but after upgrading from
6.3-stable to 6.4-stable (and later also current) in our integration
stage, I've met a strange problem.

I run OpenBSD in a hub-and-spoke vpn architecture in round about 14
distributed datacenters. 6.3-stable is running fine and stable as
expected. (all versions 6.3-stable, 6.4-stable and current are running
as resflash images)

All locations - including the mentioned integration stage - are running
with the same setup. Each location has two OpenBSD server/gateways,
which run:

- ospf over gre over ipsec
  -- local to each other and to our two main datacenters (hub)
- two bridge-interfaces inside one server
  -- one for tagged frames, one for untagged
  -- both bridge-interfaces are connected with a pair-interface
  -- first server is configured as primary within ospf, stp and carp
- layer-2 redundancy is done by stp on the openbsd-side and mstp
  (instance 0) on the network-gear-side
- layer-3 redundancy is done by ospf and carp
- pf is enabled

The problem can be described as follows: after an initial boot,
everything is working fine for round about 4 hours.

After 4 hours, it is not possible to login into the backup/secondary
openbsd-server via ssh or even via serial console, but it seems to still
forward traffic correctly. Also the ospf adjacencies are up&running, as
are the ipsec security associations and so on.

Monitoring metrics don't show any measured increase of any data.

I've already exchanged the hardware, because it was my first guess, as
the first server/gateway is running without any problems with the same
6.4-stable and config version - but this unfortunately didn't help.

When I left a serial console login open, I was able to execute some
commands, and a top I had invoked before was still running in the
failure state. But when entering e.g. ifconfig, or trying a
tab-completion, the serial console also freezes.

The problem will not occur if I:

- shutdown bridge0 (for tagged frames) or
- shutdown bridge1 (for untagged frames) or
- shutdown pair0 or pair1 (interconnection between the bridges)

Please find attached the commands I was able to execute before
tab-completion or ifconfig in this case:

---cut---
# df -i
Filesystem  512-blocks    Used   Avail Capacity iused  ifree  %iused  Mounted on
/dev/sd0e      3473724 1127852 2172188    34%  14494 219360     6%   /
mfs:64049        63326      12   60148     0%      7   8183     0%   /tmp
mfs:51486        11391      63   10759     1%   1231   1839    40%   /dev
mfs:86629        63326    8552   51608    14%    365   7825     4%   /etc
mfs:35143       253790   11512  229590     5%    236  32530     1%   /var
mfs:6765        253790   76506  164596    32%     45  32721     0%   /usr/lib
mfs:9627        253790    6132  234970     3%     66  32700     0%   /usr/libexec
#
# vmstat 1 10
 procs    memory       page                    disks    traps          cpu
 r   s   avm     fre  flt  re  pi  po  fr  sr sd0 sd1  int   sys   cs us sy id
 0  64  104M   7474M   19   0   0   0   0   0   1   0   73    68  168  0  0 100
 0  64  104M   7474M   20   0   0   0   0   0   0   0   66    60  128  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   48    45   92  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   73    44  146  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   65    47  132  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   37    49   82  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   52    44  107  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   51    44  106  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   52    44  104  0  0 100
 0  64  104M   7474M   12   0   0   0   0   0   0   0   53    47  118  0  0 100
#
# iostat 1 10
      tty            sd0             sd1             cpu
 tin tout  KB/t  t/s  MB/s   KB/t  t/s  MB/s  us ni sy sp in id
   0    2 28.82    0  0.01   0.50    0  0.00   0  0  0  0  0 100
   0  193  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0 100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0 100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0 100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0 100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0 100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0 100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0 100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0 100
   0   64  0.00    0  0.00   0.00    0  0.00   0  0  0  0  0 100
#
# df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd0e      1.7G    551M    1.0G    34%    /
mfs:69819     30.9M    9.0K   29.4M     0%    /tm
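[Editor's note: the two-bridge/pair arrangement described in the report is
unusual enough that a sketch may help readers. The original hostname.if(5)
files are not in the thread, so everything below - interface names, the
member NICs (em0), and the vlan device - is assumed, illustrating only the
topology Marco describes: two bridges joined by a pair(4) interconnect.]

```
# /etc/hostname.pair0 -- one end of the assumed bridge interconnect
up

# /etc/hostname.pair1 -- other end, patched back-to-back to pair0
patch pair0
up

# /etc/hostname.bridge0 -- untagged side (member NIC assumed)
add em0
add pair0
up

# /etc/hostname.bridge1 -- tagged side (vlan device assumed,
# configured separately in its own hostname.vlan file)
add vlan10
add pair1
up
```

With spanning tree enabled on the bridge members, this is the loop that the
report says must stay up for the hang to occur; shutting down either bridge
or either pair interface avoids it.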