On 2021/11/13 21:49, Stuart Henderson wrote: > On 2021/11/13 18:04, Alexander Bluhm wrote: > > Hi, > > > > To make IPsec MP safe we need refcounting for the tdb. The diff > > below is part of something bigger we have at genua. Although it > > does not cover timeouts and the tdb reaper yet, I want to get this > > in as a frist step. > > > > It passes regress but there are setups that are not covered. Bridge > > and pfsync with IPsec and TCP signature need special care. > > > > When testing, please check for tdb leaks. The vmstat -m tdb in use > > culumn must be 0. It it is not, try ipsecctl -F. If ipsecctl -sa > > shows nothing, but vmstat -m shows a used tdb, then it is a bug. > > > > Name Size Requests Fail InUse Pgreq Pgrel Npage Hiwat Minpg Maxpg > > Idle > > tdb 904 56 0 0 7 7 0 7 0 8 > > 0 > > Tested with a box running isakmpd at home. Config is: > > isakmpd_flags=-Kv -D0=29 -D1=49 -D2=10 -D3=30 -D5=20 -D6=30 -D8=30 -D9=30 > -D10=20 > > ike passive esp \ > from xxx.xx.xxx.1 (xxx.xx.xxx.0/27) to 0.0.0.0/0 \ > local xxx.xx.xxx.1 \ > main auth hmac-sha2-256 enc aes group ecp256 \ > quick enc aes-128-gcm group ecp256 \ > srcid xxx > > isakmpd starts at boot, 3 devices connect in and tunnels come up, > FLOWS and SAD show in ipsecctl -sa. > > isakmpd crashes shortly after starting (as described in previous mails; > this is not new, it happens very often on the first start at boot), > FLOWS and SAD remain showing. At this point they still show InUse: > > Name Size Requests Fail InUse Pgreq Pgrel Npage Hiwat Minpg Maxpg > Idle > tdb 1088 9 0 6 1 0 1 1 0 8 > 0 > > After ipsecctl -F they are freed: > > tdb 1088 9 0 0 1 0 1 1 0 8 > 1 > > Restart isakmpd, ipsecctl -f /etc/ipsec.conf, devices connect back in: > > tdb 1088 18 0 6 2 1 1 1 0 8 > 0 > > (And at this point isakmpd is usually stable). > > I will leave a shell loop running vmstat -m every few minutes to check > if it grows. > > (Bad timing for testing TCPMD5 because over the last couple of days I have > just removed bgpd+ospfd from the machines that would have been good > choices to run test diffs..I might be able to get some simple setup > back up and running though)
Ah, that was interesting: Sat Nov 13 21:44:17 GMT 2021 tdb 1088 18 0 6 2 1 1 1 0 8 0 Sat Nov 13 21:54:44 GMT 2021 tdb 1088 18 0 6 2 1 1 1 0 8 0 Sat Nov 13 22:04:44 GMT 2021 tdb 1088 27 0 12 2 1 1 1 0 8 0 and then shortly after, boom. Unfortunately I didn't get much from ddb before it hanged after I tried changing cpu. [playback] Xsyscall() at Xsyscall+0x128 end of kernel end trace frame: 0x7f7ffffb2d20, count: 248 End of stack trace. splassert: tdb_free: want 2 have 0 Starting stack trace... tdb_free(ffff800000f40b88) at tdb_free+0x47 pfkeyv2_send(fffffd80603603b8,ffff800000f44c00,170) at pfkeyv2_send+0x656 pfkeyv2_output(fffffd8064cb5e00,fffffd80603603b8,0,0) at pfkeyv2_output+0x8a pfkeyv2_usrreq(fffffd80603603b8,9,fffffd8064cb5e00,0,0,ffff8000ffff9508) at pfkeyv2_usrreq+0x1b0 sosend(fffffd80603603b8,0,ffff80001fd8f6f0,0,0,0) at sosend+0x3a9 dofilewritev(ffff8000ffff9508,7,ffff80001fd8f6f0,0,ffff80001fd8f7f0) at dofilewritev+0x14d sys_writev(ffff8000ffff9508,ffff80001fd8f790,ffff80001fd8f7f0) at sys_writev+0xd2 syscall(ffff80001fd8f860) at syscall+0x3a9 Xsyscall() at Xsyscall+0x128 end of kernel end trace frame: 0x7f7ffffb2d20, count: 248 End of stack trace. splassert: tdb_free: want 2 have 0 Starting stack trace... tdb_free(ffff800000f40308) at tdb_free+0x47 pfkeyv2_send(fffffd80603603b8,ffff800000eff300,50) at pfkeyv2_send+0x656 pfkeyv2_output(fffffd8062506e00,fffffd80603603b8,0,0) at pfkeyv2_output+0x8a pfkeyv2_usrreq(fffffd80603603b8,9,fffffd8062506e00,0,0,ffff8000ffff9508) at pfkeyv2_usrreq+0x1b0 sosend(fffffd80603603b8,0,ffff80001fd8f6f0,0,0,0) at sosend+0x3a9 dofilewritev(ffff8000ffff9508,7,ffff80001fd8f6f0,0,ffff80001fd8f7f0) at dofilewritev+0x14d sys_writev(ffff8000ffff9508,ffff80001fd8f790,ffff80001fd8f7f0) at sys_writev+0xd2 syscall(ffff80001fd8f860) at syscall+0x3a9 Xsyscall() at Xsyscall+0x128 end of kernel end trace frame: 0x7f7ffffc3170, count: 248 End of stack trace. splassert: ipsp_ids_free: want 2 have 0 Starting stack trace... ipsp_ids_free(ffff800000f30880) at ipsp_ids_free+0x4e tdb_free(ffff800000f40308) at tdb_free+0x13c pfkeyv2_send(fffffd80603603b8,ffff800000eff300,50) at pfkeyv2_send+0x656 pfkeyv2_output(fffffd8062506e00,fffffd80603603b8,0,0) at pfkeyv2_output+0x8a pfkeyv2_usrreq(fffffd80603603b8,9,fffffd8062506e00,0,0,ffff8000ffff9508) at pfkeyv2_usrreq+0x1b0 sosend(fffffd80603603b8,0,ffff80001fd8f6f0,0,0,0) at sosend+0x3a9 dopfainlice:wr iketernv(eflf ffd8i0a0gn0offsftfic9 5a08ss,7er,ftfiofnf 8"0r00ef1cfdnt8 f6!=f 0,~00",f ffaiflfe80d:0 01fiflde8 f7"f/s0ys)/ akter n/kern_ sdyofncilh.ecw"ri,t elivn+e0 x8124d4 Stopped at db_enter+0x10: popq %rbp TID PID UID PRFLAGS PFLAGS CPU COMMAND 185954 5669 68 0x10 0 1 isakmpd 67015 91815 0 0x14000 0x200 3 softnet *318814 6181 0 0x14000 0x40000200 0 softclock db_enter() at db_enter+0x10 panic(ffffffff81e4a37b) at panic+0xbf __assert(ffffffff81eb3b74,ffffffff81e1e19f,338,ffffffff81e4f078) at __assert+0x 25 refcnt_rele(ffff800000f40330) at refcnt_rele+0x6f tdb_delete(ffff800000f40308) at tdb_delete+0x84 tdb_timeout(ffff800000f40308) at tdb_timeout+0x39 softclock_thread(ffff8000fffff500) at softclock_thread+0x16e end trace frame: 0x0, count: 8 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{0}> [-- sthen@localhost attached -- Sat Nov 13 22:23:17 2021] tr db_enter() at db_enter+0x10 panic(ffffffff81e4a37b) at panic+0xbf __assert(ffffffff81eb3b74,ffffffff81e1e19f,338,ffffffff81e4f078) at __assert+0x 25 refcnt_rele(ffff800000f40330) at refcnt_rele+0x6f tdb_delete(ffff800000f40308) at tdb_delete+0x84 tdb_timeout(ffff800000f40308) at tdb_timeout+0x39 softclock_thread(ffff8000fffff500) at softclock_thread+0x16e end trace frame: 0x0, count: -7 ddb{0}> sh reg rdi 0 rsi 0x14 rbp 0xffff80001fbdc6e0 rbx 0 rdx 0xde00000003ff0239 rcx 0x202 rax 0x62 r8 0x101010101010101 r9 0 r10 0 r11 0x254de177953953f9 r12 0xffffffff82196a00 cpu_info_full_primary+0x2a00 r13 0xffffffff82275110 timeout_proc r14 0 r15 0xffffffff81e4a37b cmd0646_9_tim_udma+0x2e00b rip 0xffffffff81dad9f0 db_enter+0x10 cs 0x8 rflags 0x206 rsp 0xffff80001fbdc6e0 ss 0x10 db_enter+0x10: popq %rbp ddb{0}> ps PID TID PPID UID S FLAGS WAIT COMMAND 20274 32066 93874 1000 3 0x100083 nanoslp sleep 93874 445500 10192 1000 3 0x10008b sigsusp ksh 10192 370220 36870 1000 3 0x98 kqread sshd 36870 222408 98260 0 3 0x92 kqread sshd 5669 185954 75457 68 7 0x10 isakmpd 75457 276857 1 0 3 0x80 netio isakmpd 21598 237571 1 0 3 0x100083 ttyin getty 4506 488964 1 0 3 0x100098 kqread cron 39797 52192 15626 846 3 0x100098 kqread dhcpcd 59318 229873 15626 846 3 0x100098 kqread dhcpcd 7830 517562 15626 0 3 0x88 kqread dhcpcd 15626 323417 1 846 3 0x100098 kqread dhcpcd 19237 354049 75741 720 3 0x90 kqread lldpd 75741 66119 1 0 3 0x80 netio lldpd 61311 177515 1 99 3 0x100090 kqread sndiod 11783 282500 1 110 3 0x100090 kqread sndiod 82651 136045 37762 0 3 0x100090 kqread tftp-proxy 37762 379684 1 108 3 0x90 kqread tftp-proxy 1745 324688 1 109 3 0x90 kqread ftp-proxy 39681 116848 88999 95 3 0x100092 kqread smtpd 25384 256665 88999 103 3 0x100092 kqread smtpd 92345 176419 88999 95 3 0x100092 kqread smtpd 55442 232201 88999 95 3 0x100092 kqread smtpd 77443 215878 88999 95 3 0x100092 kqread smtpd 69162 261319 88999 95 3 0x100092 kqread smtpd 88999 111572 1 0 3 0x100080 kqread smtpd 58046 516005 58609 94 3 0x100092 kqread rad 90680 269647 58609 94 3 0x100092 kqread rad 58609 188614 1 0 3 0x100080 kqread rad 3419 227743 1 77 3 0x100090 kqread dhcpd 61767 205183 1 0 3 0x100080 kqread snmpd 51987 257332 1 91 3 0x92 kqread snmpd 98260 56979 1 0 3 0x88 kqread sshd 13528 58314 1 0 3 0x100080 kqread ntpd 18801 320473 51204 83 3 0x100092 kqread ntpd 51204 405642 1 83 3 0x100092 kqread ntpd 53483 209302 30989 53 3 0x90 kqread unbound 30989 499872 1 53 3 0x90 kqread unbound 16457 234403 32406 74 3 0x100092 bpf pflogd 32406 88142 1 0 3 0x80 netio pflogd 80238 516045 79768 73 3 0x100090 kqread syslogd 79768 67472 1 0 3 0x100082 netio syslogd 43550 409362 1 0 3 0x100080 kqread resolvd 442 379277 99464 77 3 0x100092 kqread dhcpleased 47451 452762 99464 77 3 0x100092 kqread dhcpleased 99464 433528 1 0 3 0x80 kqread dhcpleased 21886 318205 0 0 3 0x14200 bored wg_crypt 97050 229388 0 0 3 0x14200 bored wg_crypt 4440 5494 0 0 3 0x14200 bored wg_crypt 72609 120590 0 0 3 0x14200 bored wg_crypt 20025 29295 0 0 3 0x14200 bored wg_handshake 43757 478463 0 0 3 0x14200 bored wg_handshake 55892 346923 51675 115 3 0x100092 kqread slaacd 76843 163407 51675 115 3 0x100092 kqread slaacd 51675 404409 1 0 3 0x100080 kqread slaacd 28362 202426 0 0 3 0x14200 bored smr 82207 376534 0 0 3 0x14200 pgzero zerothread 1524 51196 0 0 3 0x14200 aiodoned aiodoned 22770 341109 0 0 3 0x14200 syncer update 33241 337755 0 0 3 0x14200 cleaner cleaner 27893 102676 0 0 3 0x14200 reaper reaper 79368 439400 0 0 3 0x14200 pgdaemon pagedaemon 98877 12806 0 0 3 0x14200 bored sensors 71274 388091 0 0 3 0x14200 mmctsk sdmmc0 33311 377928 0 0 3 0x14200 usbtsk usbtask 83240 116348 0 0 3 0x14200 usbatsk usbatsk 45461 344150 0 0 3 0x40014200 acpi0 acpi0 94708 135825 0 0 3 0x40014200 idle3 46611 496404 0 0 7 0x40014200 idle2 26280 26936 0 0 3 0x40014200 idle1 91815 67015 0 0 7 0x14200 softnet 90110 476121 0 0 3 0x14200 bored systqmp 12475 134606 0 0 3 0x14200 bored systq * 6181 318814 0 0 7 0x40014200 softclock 38662 5703 0 0 3 0x40014200 idle0 1 335266 0 0 3 0x82 wait init 0 0 -1 0 3 0x10200 scheduler swapper ddb{0}> ps /o TID PID UID PRFLAGS PFLAGS CPU COMMAND 185954 5669 68 0x10 0 1 isakmpd 67015 91815 0 0x14000 0x200 3 softnet *318814 6181 0 0x14000 0x40000200 0 softclock ddb{0}> tr /t 0t185954 ffff800000f40308(ffffffffffffffff,ffffffffffffffff,ffff800000f402e8,0,0,0) at 0 xffff800000f40308 end trace frame: 0x0, count: -1 ddb{0}> mach ddbcpu 1 <ddb hanged here> However, looking at the earlier console log, there were splasserts starting from before it went multiuser: reordering libraries: done.^M starting early daemons: syslogd pflogd unbound ntpd isakmpd.^M starting RPC daemons:.^M savecore: no core dump^M checking quotas: done.^M clearing /tmp^M kern.securelevel: 0 -> 1^M creating runtime link editor directory cache.^M preserving editor files.^M starting network daemons: sshd snmpd dhcpd rad smtpd ftpproxy ftpproxy6(failed) tftpproxy sndiod.^M starting package daemons: lldpd dhcpcdsplassert: tdb_free: want 2 have 0 ^MStarting stack trace... ^Mtdb_free(ffff800000f40b80) at tdb_free+0x47 ^Mpfkeyv2_send(fffffd805f7ace00,ffff800000f35c00,170) at pfkeyv2_send+0x656 ^Mpfkeyv2_output(fffffd8064cb5300,fffffd805f7ace00,0,0) at pfkeyv2_output+0x8a ^Mpfkeyv2_usrreq(fffffd805f7ace00,9,fffffd8064cb5300,0,0,ffff80001fcb22a8) at pfkeyv2_usrreq+0x1b0 ^Msosend(fffffd805f7ace00,0,ffff80001fcd97f0,0,0,0) at sosend+0x3a9 ^Mdofilewritev(ffff80001fcb22a8,7,ffff80001fcd97f0,0,ffff80001fcd98f0) at dofilewritev+0x14d ^Msys_writev(ffff80001fcb22a8,ffff80001fcd9890,ffff80001fcd98f0) at sys_writev+0xd2 ^Msyscall(ffff80001fcd9960) at syscall+0x3a9 ^MXsyscall() at Xsyscall+0x128 ^Mend of kernel ^Mend trace frame: 0x7f7ffffb15b0, count: 248 ^MEnd of stack trace. ^Msplassert: tdb_free: want 2 have 0 ^MStarting stack trace... ^Mtdb_free(ffff800000f40b80) at tdb_free+0x47 ^Mpfkeyv2_send(fffffd805f7ace00,ffff800000f44400,190) at pfkeyv2_send+0x656 ^Mpfkeyv2_output(fffffd806285fe00,fffffd805f7ace00,0,0) at pfkeyv2_output+0x8a ^Mpfkeyv2_usrreq(fffffd805f7ace00,9,fffffd806285fe00,0,0,ffff80001fcb22a8) at pfkeyv2_usrreq+0x1b0 ^Msosend(fffffd805f7ace00,0,ffff80001fcd97f0,0,0,0) at sosend+0x3a9 ^Mdofilewritev(ffff80001fcb22a8,7,ffff80001fcd97f0,0,ffff80001fcd98f0) at dofilewritev+0x14d ^Msys_writev(ffff80001fcb22a8,ffff80001fcd9890,ffff80001fcd98f0) at sys_writev+0xd2 ^Msyscall(ffff80001fcd9960) at syscall+0x3a9 ^MXsyscall() at Xsyscall+0x128 ^Mend of kernel ^Mend trace frame: 0x7f7ffffb15b0, count: 248 ^MEnd of stack trace. ^Msplassert: tdb_free: want 2 have 0 ^MStarting stack trace... ^Mtdb_free(ffff800000f40b80) at tdb_free+0x47 ^Mpfkeyv2_send(fffffd805f7ace00,ffff800000f44400,190) at pfkeyv2_send+0x656 ^Mpfkeyv2_output(fffffd806285fe00,fffffd805f7ace00,0,0) at pfkeyv2_output+0x8a ^Mpfkeyv2_usrreq(fffffd805f7ace00,9,fffffd806285fe00,0,0,ffff80001fcb22a8) at pfkeyv2_usrreq+0x1b0 ^Msosend(fffffd805f7ace00,0,ffff80001fcd97f0,0,0,0) at sosend+0x3a9 ^Mdofilewritev(ffff80001fcb22a8,7,ffff80001fcd97f0,0,ffff80001fcd98f0) at dofilewritev+0x14d ^Msys_writev(ffff80001fcb22a8,ffff80001fcd9890,ffff80001fcd98f0) at sys_writev+0xd2 ^Msyscall(ffff80001fcd9960) at syscall+0x3a9 ^MXsyscall() at Xsyscall+0x128 ^Mend of kernel ^Mend trace frame: 0x7f7ffffb15b0, count: 248 ^MEnd of stack trace. ^Msplassert: tdb_free: want 2 have 0 ^MStarting stack trace... ^Mtdb_free(ffff800000f40b80) at tdb_free+0x47 ^Mpfkeyv2_send(fffffd805f7ace00,ffff800000f35c00,170) at pfkeyv2_send+0x656 ^Mpfkeyv2_output(fffffd8062497e00,fffffd805f7ace00,0,0) at pfkeyv2_output+0x8a ^Mpfkeyv2_usrreq(fffffd805f7ace00,9,fffffd8062497e00,0,0,ffff80001fcb22a8) at pfkeyv2_usrreq+0x1b0 ^Msosend(fffffd805f7ace00,0,ffff80001fcd97f0,0,0,0) at sosend+0x3a9 ^Mdofilewritev(ffff80001fcb22a8,7,ffff80001fcd97f0,0,ffff80001fcd98f0) at dofilewritev+0x14d ^Msys_writev(ffff80001fcb22a8,ffff80001fcd9890,ffff80001fcd98f0) at sys_writev+0xd2 ^Msyscall(ffff80001fcd9960) at syscall+0x3a9 ^MXsyscall() at Xsyscall+0x128 ^Mend of kernel ^Mend trace frame: 0x7f7ffffb15b0, count: 248 ^MEnd of stack trace. ^M.^M starting local daemons: cron.^M Sat Nov 13 21:34:29 GMT 2021^M reorder_kernel: failed -- see /usr/share/relink/kernel/GENERIC.MP/relink.log^M ^M OpenBSD/amd64 (jodrell.spacehopper.org) (tty00)^M ^M login: splassert: tdb_free: want 2 have 0 ^MStarting stack trace... ^Mtdb_free(ffff800000f40b88) at tdb_free+0x47 ^Mpfkeyv2_send(fffffd80603603b8,ffff800000f44600,170) at pfkeyv2_send+0x656 ^Mpfkeyv2_output(fffffd8064cb5a00,fffffd80603603b8,0,0) at pfkeyv2_output+0x8a ^Mpfkeyv2_usrreq(fffffd80603603b8,9,fffffd8064cb5a00,0,0,ffff8000ffff9508) at pfkeyv2_usrreq+0x1b0 ^Msosend(fffffd80603603b8,0,ffff80001fd8f6f0,0,0,0) at sosend+0x3a9 ^Mdofilewritev(ffff8000ffff9508,7,ffff80001fd8f6f0,0,ffff80001fd8f7f0) at dofilewritev+0x14d ^Msys_writev(ffff8000ffff9508,ffff80001fd8f790,ffff80001fd8f7f0) at sys_writev+0xd2 ^Msyscall(ffff80001fd8f860) at syscall+0x3a9 ^MXsyscall() at Xsyscall+0x128 ^Mend of kernel ^Mend trace frame: 0x7f7ffffb2d20, count: 248 ^MEnd of stack trace. ^Msplassert: tdb_free: want 2 have 0 ^MStarting stack trace... ^Mtdb_free(ffff800000f40b88) at tdb_free+0x47 ^Mpfkeyv2_send(fffffd80603603b8,ffff800000f44c00,190) at pfkeyv2_send+0x656 ^Mpfkeyv2_output(fffffd8067cc9e00,fffffd80603603b8,0,0) at pfkeyv2_output+0x8a ^Mpfkeyv2_usrreq(fffffd80603603b8,9,fffffd8067cc9e00,0,0,ffff8000ffff9508) at pfkeyv2_usrreq+0x1b0 ^Msosend(fffffd80603603b8,0,ffff80001fd8f6f0,0,0,0) at sosend+0x3a9 ^Mdofilewritev(ffff8000ffff9508,7,ffff80001fd8f6f0,0,ffff80001fd8f7f0) at dofilewritev+0x14d ^Msys_writev(ffff8000ffff9508,ffff80001fd8f790,ffff80001fd8f7f0) at sys_writev+0xd2 ^Msyscall(ffff80001fd8f860) at syscall+0x3a9 ^MXsyscall() at Xsyscall+0x128 ^Mend of kernel ^Mend trace frame: 0x7f7ffffb2d20, count: 248 ^MEnd of stack trace. ^Msplassert: tdb_free: want 2 have 0 ^MStarting stack trace... ^Mtdb_free(ffff800000f40b88) at tdb_free+0x47 ^Mpfkeyv2_send(fffffd80603603b8,ffff800000f44a00,170) at pfkeyv2_send+0x656 ^Mpfkeyv2_output(fffffd805d1da700,fffffd80603603b8,0,0) at pfkeyv2_output+0x8a ^Mpfkeyv2_usrreq(fffffd80603603b8,9,fffffd805d1da700,0,0,ffff8000ffff9508) at pfkeyv2_usrreq+0x1b0 ^Msosend(fffffd80603603b8,0,ffff80001fd8f6f0,0,0,0) at sosend+0x3a9 ^Mdofilewritev(ffff8000ffff9508,7,ffff80001fd8f6f0,0,ffff80001fd8f7f0) at dofilewritev+0x14d ^Msys_writev(ffff8000ffff9508,ffff80001fd8f790,ffff80001fd8f7f0) at sys_writev+0xd2 ^Msyscall(ffff80001fd8f860) at syscall+0x3a9 ^MXsyscall() at Xsyscall+0x128 ^Mend of kernel ^Mend trace frame: 0x7f7ffffb2d20, count: 248 ^MEnd of stack trace. ^M[-- MARK -- Sat Nov 13 22:00:00 2021]^M splassert: tdb_free: want 2 have 0 [...] I'll see if I can get something else out of it.