Re: forwarding in parallel ipsec workaround
On Sat, Oct 02, 2021 at 03:23:57PM -0700, Chris Cappuccio wrote: > Hrvoje Popovski [hrv...@srce.hr] wrote: > > > > box didn't panic, just stopped forwarding traffic through tunnel. > > any chance any progress has been made here? is there any newer versions > of these diffs floating around? Main problem is that ipsec is not stable and we only have an ugly workaround. One solution is to implement tdb ref counting. I am working on a diff. bluhm
Re: forwarding in parallel ipsec workaround
Hrvoje Popovski [hrv...@srce.hr] wrote: > > box didn't panic, just stopped forwarding traffic through tunnel. any chance any progress has been made here? is there any newer versions of these diffs floating around?
Re: forwarding in parallel ipsec workaround
On 23.7.2021. 16:20, Vitaliy Makkoveev wrote: > On Thu, Jul 22, 2021 at 11:30:02PM +0200, Hrvoje Popovski wrote: >> On 22.7.2021. 22:52, Vitaliy Makkoveev wrote: >>> On Thu, Jul 22, 2021 at 08:38:04PM +0200, Hrvoje Popovski wrote: On 22.7.2021. 12:21, Hrvoje Popovski wrote: > Thank you for explanation.. > > after hitting box all night, box panic and i was able to reproduce it > this morning ... I'm not sure but box panic after hour or more of > sending traffic through iked tunnel .. > I will try to reproduce it through ipsec tunnel .. with isakmpd i've got panic after 5 or more hours r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e kernel: page fault trap, code=0 Stopped at tdb_free+0x9c: movq%rsi,0(%rdi) TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 469292 75215 680x10 02 isakmpd *142487 42745 0 0x14000 0x42000K softclock tdb_free(8118f6e0) at tdb_free+0x9c tdb_timeout(8118f6e0) at tdb_timeout+0x77 softclock_thread(8000efc0) at softclock_thread+0x16e end trace frame: 0x0, count: 12 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. >>> There is the bluhm@'s diff with serialized `ipsec_in_use' access. Can >>> you try it? >> >> Hi, >> >> I'm running this diff with "ipsec crypto no queue" and i'm seeing >> traffic drops. This is little strange because "ipsec crypto no queue" >> diff take care of those drops... >> > > Thanks! > > Did you caught panics? > Could you share "netstat -s -p ipsec" output? > Hi, box didn't panic, just stopped forwarding traffic through tunnel. ipsec: 0 input IPsec packets 3299331 output IPsec packets 0 input bytes 1781638740 output bytes 0 input bytes, decompressed 2235858152 output bytes, uncompressed 0 packets dropped on input 1030761 packets dropped on output 0 packets that failed crypto processing 0 packets for which no XFORM was set in TDB received 447309 packets for which no TDB was found it still little strange to me that with "ipsec crypto no queue" diff i see traffic drops ... so i will do all over again just to be sure that i'm not doing something wrong ..
Re: forwarding in parallel ipsec workaround
On Thu, Jul 22, 2021 at 11:30:02PM +0200, Hrvoje Popovski wrote: > On 22.7.2021. 22:52, Vitaliy Makkoveev wrote: > > On Thu, Jul 22, 2021 at 08:38:04PM +0200, Hrvoje Popovski wrote: > >> On 22.7.2021. 12:21, Hrvoje Popovski wrote: > >>> Thank you for explanation.. > >>> > >>> after hitting box all night, box panic and i was able to reproduce it > >>> this morning ... I'm not sure but box panic after hour or more of > >>> sending traffic through iked tunnel .. > >>> I will try to reproduce it through ipsec tunnel .. > >> > >> with isakmpd i've got panic after 5 or more hours > >> > >> r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e > >> kernel: page fault trap, code=0 > >> Stopped at tdb_free+0x9c: movq%rsi,0(%rdi) > >> TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > >> 469292 75215 680x10 02 isakmpd > >> *142487 42745 0 0x14000 0x42000K softclock > >> tdb_free(8118f6e0) at tdb_free+0x9c > >> tdb_timeout(8118f6e0) at tdb_timeout+0x77 > >> softclock_thread(8000efc0) at softclock_thread+0x16e > >> end trace frame: 0x0, count: 12 > >> https://www.openbsd.org/ddb.html describes the minimum info required in > >> bug reports. Insufficient info makes it difficult to find and fix bugs. > >> > > There is the bluhm@'s diff with serialized `ipsec_in_use' access. Can > > you try it? > > Hi, > > I'm running this diff with "ipsec crypto no queue" and i'm seeing > traffic drops. This is little strange because "ipsec crypto no queue" > diff take care of those drops... > Thanks! Did you caught panics? Could you share "netstat -s -p ipsec" output?
Re: forwarding in parallel ipsec workaround
On 22.7.2021. 22:52, Vitaliy Makkoveev wrote: > On Thu, Jul 22, 2021 at 08:38:04PM +0200, Hrvoje Popovski wrote: >> On 22.7.2021. 12:21, Hrvoje Popovski wrote: >>> Thank you for explanation.. >>> >>> after hitting box all night, box panic and i was able to reproduce it >>> this morning ... I'm not sure but box panic after hour or more of >>> sending traffic through iked tunnel .. >>> I will try to reproduce it through ipsec tunnel .. >> >> with isakmpd i've got panic after 5 or more hours >> >> r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e >> kernel: page fault trap, code=0 >> Stopped at tdb_free+0x9c: movq%rsi,0(%rdi) >> TIDPIDUID PRFLAGS PFLAGS CPU COMMAND >> 469292 75215 680x10 02 isakmpd >> *142487 42745 0 0x14000 0x42000K softclock >> tdb_free(8118f6e0) at tdb_free+0x9c >> tdb_timeout(8118f6e0) at tdb_timeout+0x77 >> softclock_thread(8000efc0) at softclock_thread+0x16e >> end trace frame: 0x0, count: 12 >> https://www.openbsd.org/ddb.html describes the minimum info required in >> bug reports. Insufficient info makes it difficult to find and fix bugs. >> > There is the bluhm@'s diff with serialized `ipsec_in_use' access. Can > you try it? Hi, I'm running this diff with "ipsec crypto no queue" and i'm seeing traffic drops. This is little strange because "ipsec crypto no queue" diff take care of those drops...
Re: forwarding in parallel ipsec workaround
On Thu, Jul 22, 2021 at 08:38:04PM +0200, Hrvoje Popovski wrote: > On 22.7.2021. 12:21, Hrvoje Popovski wrote: > > Thank you for explanation.. > > > > after hitting box all night, box panic and i was able to reproduce it > > this morning ... I'm not sure but box panic after hour or more of > > sending traffic through iked tunnel .. > > I will try to reproduce it through ipsec tunnel .. > > > with isakmpd i've got panic after 5 or more hours > > r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e > kernel: page fault trap, code=0 > Stopped at tdb_free+0x9c: movq%rsi,0(%rdi) > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > 469292 75215 680x10 02 isakmpd > *142487 42745 0 0x14000 0x42000K softclock > tdb_free(8118f6e0) at tdb_free+0x9c > tdb_timeout(8118f6e0) at tdb_timeout+0x77 > softclock_thread(8000efc0) at softclock_thread+0x16e > end trace frame: 0x0, count: 12 > https://www.openbsd.org/ddb.html describes the minimum info required in > bug reports. Insufficient info makes it difficult to find and fix bugs. > There is the bluhm@'s diff with serialized `ipsec_in_use' access. Can you try it? Index: sys/net/if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.643 diff -u -p -r1.643 if.c --- sys/net/if.c20 Jul 2021 16:32:28 - 1.643 +++ sys/net/if.c22 Jul 2021 20:46:19 - @@ -109,6 +109,10 @@ #include #endif +#ifdef IPSEC +#include +#endif + #ifdef MPLS #include #endif @@ -238,7 +242,7 @@ int ifq_congestion; int netisr; -#defineNET_TASKQ 1 +#defineNET_TASKQ 4 struct taskq *nettqmp[NET_TASKQ]; struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); @@ -811,10 +815,13 @@ if_output_local(struct ifnet *ifp, struc return (ifiq_enqueue(ifiq, m) == 0 ? 0 : ENOBUFS); } +struct rwlock ipsec_lock = RWLOCK_INITIALIZER("ipsecl"); + void if_input_process(struct ifnet *ifp, struct mbuf_list *ml) { struct mbuf *m; + int exclusive_lock = 0; if (ml_empty(ml)) return; @@ -835,15 +842,30 @@ if_input_process(struct ifnet *ifp, stru * lists and the socket layer. */ + rw_enter_read(_lock); +#ifdef IPSEC /* * XXXSMP IPsec data structures are not ready to be accessed * by multiple network threads in parallel. In this case * use an exclusive lock. */ - NET_LOCK(); + if (ipsec_in_use) + exclusive_lock = 1; +#endif + if (exclusive_lock) + NET_LOCK(); + else + NET_RLOCK_IN_SOFTNET(); + + rw_exit_read(_lock); + while ((m = ml_dequeue(ml)) != NULL) (*ifp->if_input)(ifp, m); - NET_UNLOCK(); + + if (exclusive_lock) + NET_UNLOCK(); + else + NET_RUNLOCK_IN_SOFTNET(); } void @@ -900,6 +922,12 @@ if_netisr(void *unused) arpintr(); KERNEL_UNLOCK(); } +#endif + if (n & (1 << NETISR_IP)) + ipintr(); +#ifdef INET6 + if (n & (1 << NETISR_IPV6)) + ip6intr(); #endif #if NPPP > 0 if (n & (1 << NETISR_PPP)) { Index: sys/net/if_ethersubr.c === RCS file: /cvs/src/sys/net/if_ethersubr.c,v retrieving revision 1.275 diff -u -p -r1.275 if_ethersubr.c --- sys/net/if_ethersubr.c 7 Jul 2021 20:19:01 - 1.275 +++ sys/net/if_ethersubr.c 22 Jul 2021 20:46:19 - @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct switch (af) { case AF_INET: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in arpresolve() */ error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); eh->ether_type = htons(ETHERTYPE_IP); @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct break; #ifdef INET6 case AF_INET6: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in nd6_resolve() */ error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); eh->ether_type = htons(ETHERTYPE_IPV6); @@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct break; #ifdef INET6 case AF_INET6: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in nd6_resolve() */ error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK();
Re: forwarding in parallel ipsec workaround
On 22.7.2021. 12:21, Hrvoje Popovski wrote: > Thank you for explanation.. > > after hitting box all night, box panic and i was able to reproduce it > this morning ... I'm not sure but box panic after hour or more of > sending traffic through iked tunnel .. > I will try to reproduce it through ipsec tunnel .. with isakmpd i've got panic after 5 or more hours r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e kernel: page fault trap, code=0 Stopped at tdb_free+0x9c: movq%rsi,0(%rdi) TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 469292 75215 680x10 02 isakmpd *142487 42745 0 0x14000 0x42000K softclock tdb_free(8118f6e0) at tdb_free+0x9c tdb_timeout(8118f6e0) at tdb_timeout+0x77 softclock_thread(8000efc0) at softclock_thread+0x16e end trace frame: 0x0, count: 12 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{0}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 75215 469292 76137 68 70x10isakmpd 76137 414524 1 0 30x80 netio isakmpd 96813 295173 1 0 30x100083 ttyin ksh 19120 299871 1 0 30x100098 poll cron 1532 13552 38673 95 30x100092 kqreadsmtpd 26046 463899 38673103 30x100092 kqreadsmtpd 36828 266802 38673 95 30x100092 kqreadsmtpd 70875 110170 38673 95 30x100092 kqreadsmtpd 94519 66859 38673 95 30x100092 kqreadsmtpd 82071 306235 38673 95 30x100092 kqreadsmtpd 38673 196258 1 0 30x100080 kqreadsmtpd 32975 338908 1 0 30x88 selectsshd 83970 285872 1 0 30x100080 poll ntpd 13501 277269 28135 83 30x100092 poll ntpd 28135 154371 1 83 30x100092 poll ntpd 46277 200739 37910 73 30x100090 kqreadsyslogd 37910 178152 1 0 30x100082 netio syslogd 15169 243641 0 0 3 0x14200 bored smr 72867 54384 0 0 3 0x14200 pgzerozerothread 96747 464994 0 0 3 0x14200 aiodoned aiodoned 21420 278055 0 0 3 0x14200 syncerupdate 4733 35757 0 0 3 0x14200 cleaner cleaner 65184 477103 0 0 3 0x14200 reaperreaper 74689 248130 0 0 3 0x14200 pgdaemon pagedaemon 63567 184653 0 0 3 0x14200 bored crynlk 25576 473224 0 0 3 0x14200 bored crypto 59824 205269 0 0 3 0x14200 usbtskusbtask 79880 326022 0 0 3 0x14200 usbatsk usbatsk 92636 507494 0 0 3 0x40014200 acpi0 acpi0 1561 284019 0 0 7 0x40014200idle5 71732 313587 0 0 7 0x40014200idle4 50021 333287 0 0 7 0x40014200idle3 87783 242752 0 0 3 0x40014200idle2 48193 69868 0 0 7 0x40014200idle1 80238 196627 0 0 3 0x14200 bored sensors 36233 88931 0 0 3 0x14200 netlock softnet 76181 390400 0 0 3 0x14200 netlock softnet 12911 400662 0 0 3 0x14200 netlock softnet 23428 46363 0 0 3 0x14200 netlock softnet 22304 109508 0 0 3 0x14200 bored systqmp 97244 473960 0 0 3 0x14200 bored systq *42745 142487 0 0 7 0x40014200softclock 7369 374311 0 0 3 0x40014200idle0 1 360880 0 0 30x82 wait init 0 0 -1 0 3 0x10200 scheduler swapper ddb{0}> ddb{0}> trace /t 0t142487 alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b tdb_free(8118f6e0) at tdb_free+0x9c tdb_timeout(8118f6e0) at tdb_timeout+0x77 softclock_thread(8000efc0) at softclock_thread+0x16e end trace frame: 0x0, count: -4 ddb{0}> ddb{0}> trace /t 0t469292 sleep_finish(800023906db0,1) at sleep_finish+0x11c rw_enter(82187ca8,1) at rw_enter+0x1c2 solock(fd83b1019e10) at solock+0x4b soo_poll(fd83b1989a68,1,8000238fb7a0) at soo_poll+0x34 selscan(8000238fb7a0,800023906fd0,800023906fdc,14,4,8000239071b 0) at selscan+0x14b dopselect(8000238fb7a0,15,bf1473583e0,bf14736a140,0,8000239070e0) at dopselect+0x487 sys_pselect(8000238fb7a0,800023907150,8000239071b0) at sys_pselect+0xdb syscall(800023907220) at syscall+0x3a9 Xsyscall() at Xsyscall+0x128 end of kernel end trace frame: 0x7f7d45b0, count: -9 ddb{0}>
Re: forwarding in parallel ipsec workaround
On Thu, Jul 22, 2021 at 12:21:47PM +0200, Hrvoje Popovski wrote: > On 22.7.2021. 0:39, Alexander Bluhm wrote: > > On Thu, Jul 22, 2021 at 12:06:09AM +0200, Hrvoje Popovski wrote: > >> I'm combining this and last parallel diff and i can't see any drops in > >> traffic. Even sending at high rate, traffic through iked or isakmpd is > >> stable at 150Kpps, which is good .. > > > > Thanks, good news. > > > >> One funny thing is that with top -SHs1 crypto CPU usage is always at 0.00% > >> Could it be because of aes-ni? > > > > I don't use the crypto thread anymore. It is idle. All crypto is > > done in softnet between handling the network part of ipsec. > > > > Here is the flame graph. Crypto is running on top of network. > > http://bluhm.genua.de/perform/results/current/patch-sys-ipsec-noqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10-btrace-kstack.0.svg > > > > bluhm > > > > Thank you for explanation.. > > after hitting box all night, box panic and i was able to reproduce it > this morning ... I'm not sure but box panic after hour or more of > sending traffic through iked tunnel .. > I will try to reproduce it through ipsec tunnel .. Unfortunately this unlocked `ipsec_in_use' check can't work as expected, because concurrent pfkey thread could modify it between the check (1) and netlock (2). You a lucky to hit this :) + /* +* XXXSMP IPsec data structures are not ready to be +* accessed by multiple Network threads in parallel. +*/ + if (ipsec_in_use) /* (1) */ + exclusive_lock = 1; + if (exclusive_lock) + NET_LOCK(); /* (2) */ + else + NET_RLOCK_IN_SOFTNET(); /* (2) */ + + while ((m = ml_dequeue(ml)) != NULL) + (*ifp->if_input)(ifp, m); This could be fixed by introducing the another rwlock(9) which should serialize `ipsec_in_use' modifications and check. It's obvious this lock should be grabbed before netlock and we could keep this order, but I strictly don't like this hack. pfkeyv2_output(struct mbuf *mbuf, struct socket *so, struct sockaddr *dstaddr, struct mbuf *control) { /* ... */ sounlock(so, SL_LOCKED); rw_enter_write(_lock); error = pfkeyv2_send(so, message, mbuf->m_pkthdr.len); rw_exit_write(_lock); solock(so); /* ... */ } if_input_process(struct ifnet *ifp, struct mbuf_list *ml) { /* ... */ /* * XXXSMP IPsec data structures are not ready to be * accessed by multiple Network threads in parallel. */ rw_enter_read(_lock); if (ipsec_in_use) /* (1) */ exclusive_lock = 1; if (exclusive_lock) NET_LOCK(); /* (2) */ else NET_RLOCK_IN_SOFTNET(); /* (2) */ rw_exit_read(_lock); while ((m = ml_dequeue(ml)) != NULL) (*ifp->if_input)(ifp, m); /* ... */ } > > > > > r620-1# uvm_fault(0x821a82d8, 0x137, 0, 2) -> e > kernel: page fault trap, code=0 > Stopped at tdb_free+0x9c: movq%rsi,0(%rdi) > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > *310284 77290 0 0x14000 0x2001K softnet > tdb_free(811832a8) at tdb_free+0x9c > esp_output(fd80a5c2f300,811832a8,0,14,9) at esp_output+0x44f > ipsp_process_packet(fd80a5c2f300,811832a8,2,0) at > ipsp_process_packet+0x466 > ip_output_ipsec_send(811832a8,fd80a5c2f300,800023871cf8,1) > at ip_output_ipsec_send+0x161 > ip_output(fd80a5c2f300,0,800023871cf8,1,0,0) at ip_output+0x8bd > ip_forward(fd80a5c2f300,80087048,fd83b3454bd0,0) at > ip_forward+0x26a > ip_input_if(800023871e38,800023871e44,4,0,80087048) at > ip_input_if+0x353 > ipv4_input(80087048,fd80a5c2f300) at ipv4_input+0x39 > if_input_process(80087048,800023871eb8) at if_input_process+0x92 > ifiq_process(80086c00) at ifiq_process+0x69 > taskq_thread(8002f200) at taskq_thread+0x81 > end trace frame: 0x0, count: 4 > https://www.openbsd.org/ddb.html describes the minimum info required in > bug reports. Insufficient info makes it difficult to find and fix bugs. > > > ddb{1}> ps >PID TID PPIDUID S FLAGS WAIT COMMAND > 94310 423919 1 0 30x100083 ttyin ksh > 77465 202466 1 0 30x100098 poll cron > 65594 481620 35139 95 30x100092 kqreadsmtpd > 2326 44364 35139103 30x100092 kqreadsmtpd > 1225 215234 35139 95 30x100092 kqreadsmtpd > 69590 400378 35139 95 30x100092 kqreadsmtpd > 7267 26043 35139 95 30x100092 kqreadsmtpd > 38120 331881 35139 95 30x100092 kqreadsmtpd > 35139 66391 1 0 30x100080 kqreadsmtpd > 79402 86611 1 0 30x88 selectsshd > 43297 495028 53834
Re: forwarding in parallel ipsec workaround
On 22.7.2021. 0:39, Alexander Bluhm wrote: > On Thu, Jul 22, 2021 at 12:06:09AM +0200, Hrvoje Popovski wrote: >> I'm combining this and last parallel diff and i can't see any drops in >> traffic. Even sending at high rate, traffic through iked or isakmpd is >> stable at 150Kpps, which is good .. > > Thanks, good news. > >> One funny thing is that with top -SHs1 crypto CPU usage is always at 0.00% >> Could it be because of aes-ni? > > I don't use the crypto thread anymore. It is idle. All crypto is > done in softnet between handling the network part of ipsec. > > Here is the flame graph. Crypto is running on top of network. > http://bluhm.genua.de/perform/results/current/patch-sys-ipsec-noqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10-btrace-kstack.0.svg > > bluhm > Thank you for explanation.. after hitting box all night, box panic and i was able to reproduce it this morning ... I'm not sure but box panic after hour or more of sending traffic through iked tunnel .. I will try to reproduce it through ipsec tunnel .. r620-1# uvm_fault(0x821a82d8, 0x137, 0, 2) -> e kernel: page fault trap, code=0 Stopped at tdb_free+0x9c: movq%rsi,0(%rdi) TIDPIDUID PRFLAGS PFLAGS CPU COMMAND *310284 77290 0 0x14000 0x2001K softnet tdb_free(811832a8) at tdb_free+0x9c esp_output(fd80a5c2f300,811832a8,0,14,9) at esp_output+0x44f ipsp_process_packet(fd80a5c2f300,811832a8,2,0) at ipsp_process_packet+0x466 ip_output_ipsec_send(811832a8,fd80a5c2f300,800023871cf8,1) at ip_output_ipsec_send+0x161 ip_output(fd80a5c2f300,0,800023871cf8,1,0,0) at ip_output+0x8bd ip_forward(fd80a5c2f300,80087048,fd83b3454bd0,0) at ip_forward+0x26a ip_input_if(800023871e38,800023871e44,4,0,80087048) at ip_input_if+0x353 ipv4_input(80087048,fd80a5c2f300) at ipv4_input+0x39 if_input_process(80087048,800023871eb8) at if_input_process+0x92 ifiq_process(80086c00) at ifiq_process+0x69 taskq_thread(8002f200) at taskq_thread+0x81 end trace frame: 0x0, count: 4 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{1}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 94310 423919 1 0 30x100083 ttyin ksh 77465 202466 1 0 30x100098 poll cron 65594 481620 35139 95 30x100092 kqreadsmtpd 2326 44364 35139103 30x100092 kqreadsmtpd 1225 215234 35139 95 30x100092 kqreadsmtpd 69590 400378 35139 95 30x100092 kqreadsmtpd 7267 26043 35139 95 30x100092 kqreadsmtpd 38120 331881 35139 95 30x100092 kqreadsmtpd 35139 66391 1 0 30x100080 kqreadsmtpd 79402 86611 1 0 30x88 selectsshd 43297 495028 53834101 30x100010 netlock iked 73854 239338 53834101 30x100090 kqreadiked 79322 243271 53834101 30x100090 kqreadiked 53834 169269 1 0 30x100080 kqreadiked 98497 240479 1 0 30x100080 poll ntpd 93883 433035 46155 83 30x100092 poll ntpd 46155 241594 1 83 30x100092 poll ntpd 60135 75740 14081 73 30x100090 kqreadsyslogd 14081 390346 1 0 30x100082 netio syslogd 95646 201039 0 0 3 0x14200 bored smr 54746 191914 0 0 3 0x14200 pgzerozerothread 42039 120601 0 0 3 0x14200 aiodoned aiodoned 23202 404058 0 0 3 0x14200 syncerupdate 32492 255945 0 0 3 0x14200 cleaner cleaner 63792 183728 0 0 3 0x14200 reaperreaper 85251 251516 0 0 3 0x14200 pgdaemon pagedaemon 70650 399091 0 0 3 0x14200 bored crynlk 93527 443884 0 0 3 0x14200 bored crypto 98145 39889 0 0 3 0x14200 usbtskusbtask 67334 434779 0 0 3 0x14200 usbatsk usbatsk 86238 520690 0 0 3 0x40014200 acpi0 acpi0 52180 314140 0 0 7 0x40014200idle5 23611 433449 0 0 7 0x40014200idle4 40949 492122 0 0 7 0x40014200idle3 1758 355637 0 0 7 0x40014200idle2 24269 99037 0 0 3 0x40014200idle1 26630 422993 0 0 3 0x14200 bored sensors *77290 310284 0 0 7 0x14200softnet 68084 469615 0 0 3 0x14200 bored softnet 68120 339291 0 0 3 0x14200 bored
Re: [External] : Re: forwarding in parallel ipsec workaround
Hello, my statement here is just for the record. we should have a follow up discussion in a different thread, which is yet to be started (when time will come). > > > - Make ARP MP safe. Currently we need the kernel lock there or > > it crashes. This creates latency for all kind of packets. > > - Convert the rwlock in pf to mutex. I think your argument counts > > much more there. But I cannot prove it. > > My argument about what? Didn't we all agree about converting the rwlock > to a mutex for now? > I disagree here. If we will trade current rw-lock, which protects state table in pf(4), we will see significant drop in throughput. great deal of packets is forwarded as readers on pf_state_lock. only those, which don't match state are taking slow path. to turn rw-lock to mutex, we must break single giant tree into smaller buckets protected by mutexes. Or something like that must happen first. thanks and regards sashan
Re: forwarding in parallel ipsec workaround
On Thu, Jul 22, 2021 at 12:06:09AM +0200, Hrvoje Popovski wrote: > I'm combining this and last parallel diff and i can't see any drops in > traffic. Even sending at high rate, traffic through iked or isakmpd is > stable at 150Kpps, which is good .. Thanks, good news. > One funny thing is that with top -SHs1 crypto CPU usage is always at 0.00% > Could it be because of aes-ni? I don't use the crypto thread anymore. It is idle. All crypto is done in softnet between handling the network part of ipsec. Here is the flame graph. Crypto is running on top of network. http://bluhm.genua.de/perform/results/current/patch-sys-ipsec-noqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10-btrace-kstack.0.svg bluhm
Re: forwarding in parallel ipsec workaround
On 21.7.2021. 22:21, Alexander Bluhm wrote: > Ahh, to many diffs in my tree. I have forgotten the cunk > crp->crp_flags = ... | CRYPTO_F_NOQUEUE > > Try this. Still testing it myself, it looks a bit faster. I'm combining this and last parallel diff and i can't see any drops in traffic. Even sending at high rate, traffic through iked or isakmpd is stable at 150Kpps, which is good .. One funny thing is that with top -SHs1 crypto CPU usage is always at 0.00% Could it be because of aes-ni? r620-1# cat /etc/ipsec.conf ike active esp from 192.168.232.0/24 to 192.168.123.0/24 \ local 192.168.42.1 peer 192.168.42.111 \ main auth hmac-sha1 enc aes group modp1024 \ quick enc aes-128-gcm group modp1024 \ psk "123" r620-1# cat /etc/iked.conf ikev2 active esp from 192.168.232.0/24 to 192.168.123.0/24 \ local 192.168.42.1 peer 192.168.42.111 \ ikesa enc aes-128-gcm group modp1024 prf hmac-sha1 \ childsa enc aes-128-gcm group modp1024 \ psk "123"
Re: forwarding in parallel ipsec workaround
On Wed, Jul 21, 2021 at 06:41:30PM +0200, Alexander Bluhm wrote: > On Mon, Jul 19, 2021 at 07:33:55PM +0300, Vitaliy Makkoveev wrote: > > Hi, pipex(4) is also not ready for parallel access. In the chunk below > > it will be accessed through (*ifp->if_input)() -> ether_input() -> > > pipex_pppoe_input(). This looks not fatal but makes at least session > > statistics inconsistent. > > For pipex and pppoe we can put a kernel lock into ether_resolve(). > I also put locks in mpls. > > I am not aware of any other issues. > > Can we commit this now? > Hmm, did I missed locks for mpls? > bluhm > > Index: net/if.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v > retrieving revision 1.643 > diff -u -p -r1.643 if.c > --- net/if.c 20 Jul 2021 16:32:28 - 1.643 > +++ net/if.c 21 Jul 2021 16:07:15 - > @@ -109,6 +109,10 @@ > #include > #endif > > +#ifdef IPSEC > +#include > +#endif > + > #ifdef MPLS > #include > #endif > @@ -238,7 +242,7 @@ int ifq_congestion; > > int netisr; > > -#define NET_TASKQ 1 > +#define NET_TASKQ 4 > struct taskq *nettqmp[NET_TASKQ]; > > struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); > @@ -815,6 +819,7 @@ void > if_input_process(struct ifnet *ifp, struct mbuf_list *ml) > { > struct mbuf *m; > + int exclusive_lock = 0; > > if (ml_empty(ml)) > return; > @@ -835,15 +840,27 @@ if_input_process(struct ifnet *ifp, stru >* lists and the socket layer. >*/ > > +#ifdef IPSEC > /* >* XXXSMP IPsec data structures are not ready to be accessed >* by multiple network threads in parallel. In this case >* use an exclusive lock. >*/ > - NET_LOCK(); > + if (ipsec_in_use) > + exclusive_lock = 1; > +#endif > + if (exclusive_lock) > + NET_LOCK(); > + else > + NET_RLOCK_IN_SOFTNET(); > + > while ((m = ml_dequeue(ml)) != NULL) > (*ifp->if_input)(ifp, m); > - NET_UNLOCK(); > + > + if (exclusive_lock) > + NET_UNLOCK(); > + else > + NET_RUNLOCK_IN_SOFTNET(); > } > > void > @@ -900,6 +917,12 @@ if_netisr(void *unused) > arpintr(); > KERNEL_UNLOCK(); > } > +#endif > + if (n & (1 << NETISR_IP)) > + ipintr(); > +#ifdef INET6 > + if (n & (1 << NETISR_IPV6)) > + ip6intr(); > #endif > #if NPPP > 0 > if (n & (1 << NETISR_PPP)) { > Index: net/if_ethersubr.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v > retrieving revision 1.275 > diff -u -p -r1.275 if_ethersubr.c > --- net/if_ethersubr.c7 Jul 2021 20:19:01 - 1.275 > +++ net/if_ethersubr.c21 Jul 2021 16:34:51 - > @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct > > switch (af) { > case AF_INET: > + KERNEL_LOCK(); > + /* XXXSMP there is a MP race in arpresolve() */ > error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); > + KERNEL_UNLOCK(); > if (error) > return (error); > eh->ether_type = htons(ETHERTYPE_IP); > @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct > break; > #ifdef INET6 > case AF_INET6: > + KERNEL_LOCK(); > + /* XXXSMP there is a MP race in nd6_resolve() */ > error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); > + KERNEL_UNLOCK(); > if (error) > return (error); > eh->ether_type = htons(ETHERTYPE_IPV6); > @@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct > break; > #ifdef INET6 > case AF_INET6: > + KERNEL_LOCK(); > + /* XXXSMP there is a MP race in nd6_resolve() */ > error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); > + KERNEL_UNLOCK(); > if (error) > return (error); > break; > #endif > case AF_INET: > + KERNEL_LOCK(); > + /* XXXSMP there is a MP race in arpresolve() */ > error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); > + KERNEL_UNLOCK(); > if (error) > return (error); > break; > @@ -528,12 +540,14 @@ ether_input(struct ifnet *ifp, struct mb > case ETHERTYPE_PPPOE: > if (m->m_flags & (M_MCAST | M_BCAST)) > goto dropanyway; > + KERNEL_LOCK(); >
Re: forwarding in parallel ipsec workaround
On Wed, Jul 21, 2021 at 07:53:55PM +0200, Hrvoje Popovski wrote: > i've applied this and ipsec crypto no queue diff and i'm getting > splasserts below ... maybe it's something obvious, if not, i will try > diff by diff .. Ahh, to many diffs in my tree. I have forgotten the cunk crp->crp_flags = ... | CRYPTO_F_NOQUEUE Try this. Still testing it myself, it looks a bit faster. bluhm Index: crypto/crypto.c === RCS file: /data/mirror/openbsd/cvs/src/sys/crypto/crypto.c,v retrieving revision 1.84 diff -u -p -r1.84 crypto.c --- crypto/crypto.c 21 Jul 2021 11:11:41 - 1.84 +++ crypto/crypto.c 21 Jul 2021 19:51:14 - @@ -387,23 +387,32 @@ crypto_unregister(u_int32_t driverid, in int crypto_dispatch(struct cryptop *crp) { - struct taskq *tq = crypto_taskq; - int error = 0, s; + int error = 0, lock = 1, s; u_int32_t hid; s = splvm(); hid = (crp->crp_sid >> 32) & 0x; if (hid < crypto_drivers_num) { if (crypto_drivers[hid].cc_flags & CRYPTOCAP_F_MPSAFE) - tq = crypto_taskq_mpsafe; + lock = 0; } splx(s); - if ((crp->crp_flags & CRYPTO_F_NOQUEUE) == 0) { + /* XXXSMP crypto_invoke() is not MP safe */ + lock = 1; + + if (crp->crp_flags & CRYPTO_F_NOQUEUE) { + if (lock) + KERNEL_LOCK(); + error = crypto_invoke(crp); + if (lock) + KERNEL_UNLOCK(); + } else { + struct taskq *tq; + + tq = lock ? crypto_taskq : crypto_taskq_mpsafe; task_set(>crp_task, (void (*))crypto_invoke, crp); task_add(tq, >crp_task); - } else { - error = crypto_invoke(crp); } return error; @@ -424,6 +433,8 @@ crypto_invoke(struct cryptop *crp) if (crp == NULL || crp->crp_callback == NULL) return EINVAL; + KERNEL_ASSERT_LOCKED(); + s = splvm(); if (crp->crp_ndesc < 1 || crypto_drivers == NULL) { crp->crp_etype = EINVAL; @@ -535,11 +546,16 @@ void crypto_done(struct cryptop *crp) { crp->crp_flags |= CRYPTO_F_DONE; + if (crp->crp_flags & CRYPTO_F_NOQUEUE) { /* not from the crypto queue, wakeup the userland process */ crp->crp_callback(crp); } else { + struct taskq *tq; + + tq = (crp->crp_flags & CRYPTO_F_MPSAFE) ? + crypto_taskq_mpsafe : crypto_taskq; task_set(>crp_task, (void (*))crp->crp_callback, crp); - task_add(crypto_taskq, >crp_task); + task_add(tq, >crp_task); } } Index: crypto/cryptodev.h === RCS file: /data/mirror/openbsd/cvs/src/sys/crypto/cryptodev.h,v retrieving revision 1.73 diff -u -p -r1.73 cryptodev.h --- crypto/cryptodev.h 9 Jul 2021 20:43:28 - 1.73 +++ crypto/cryptodev.h 21 Jul 2021 19:48:44 - @@ -171,6 +171,7 @@ struct cryptop { #define CRYPTO_F_IMBUF 0x0001 /* Input/output are mbuf chains, otherwise contig */ #define CRYPTO_F_IOV 0x0002 /* Input/output are uio */ +#define CRYPTO_F_MPSAFE0x0004 /* Do not use kernel lock for callback */ #define CRYPTO_F_NOQUEUE 0x0008 /* Don't use crypto queue/thread */ #define CRYPTO_F_DONE 0x0010 /* request completed */ Index: netinet/ip_ah.c === RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v retrieving revision 1.151 diff -u -p -r1.151 ip_ah.c --- netinet/ip_ah.c 18 Jul 2021 14:38:20 - 1.151 +++ netinet/ip_ah.c 21 Jul 2021 19:51:14 - @@ -684,7 +684,7 @@ ah_input(struct mbuf *m, struct tdb *tdb /* Crypto operation descriptor. */ crp->crp_ilen = m->m_pkthdr.len; /* Total input length. */ - crp->crp_flags = CRYPTO_F_IMBUF; + crp->crp_flags = CRYPTO_F_IMBUF | CRYPTO_F_MPSAFE | CRYPTO_F_NOQUEUE; crp->crp_buf = (caddr_t)m; crp->crp_callback = ipsec_input_cb; crp->crp_sid = tdb->tdb_cryptoid; @@ -699,9 +699,7 @@ ah_input(struct mbuf *m, struct tdb *tdb memcpy(>tc_dst, >tdb_dst, sizeof(union sockaddr_union)); tc->tc_rpl = tdb->tdb_rpl; - KERNEL_LOCK(); error = crypto_dispatch(crp); - KERNEL_UNLOCK(); return error; drop: @@ -1134,7 +1132,7 @@ ah_output(struct mbuf *m, struct tdb *td /* Crypto operation descriptor. */ crp->crp_ilen = m->m_pkthdr.len; /* Total input length. */ - crp->crp_flags = CRYPTO_F_IMBUF; + crp->crp_flags = CRYPTO_F_IMBUF | CRYPTO_F_MPSAFE | CRYPTO_F_NOQUEUE; crp->crp_buf = (caddr_t)m; crp->crp_callback = ipsec_output_cb; crp->crp_sid = tdb->tdb_cryptoid; @@ -1148,9 +1146,7 @@
Re: forwarding in parallel ipsec workaround
On 21.7.2021. 18:41, Alexander Bluhm wrote: > On Mon, Jul 19, 2021 at 07:33:55PM +0300, Vitaliy Makkoveev wrote: >> Hi, pipex(4) is also not ready for parallel access. In the chunk below >> it will be accessed through (*ifp->if_input)() -> ether_input() -> >> pipex_pppoe_input(). This looks not fatal but makes at least session >> statistics inconsistent. > For pipex and pppoe we can put a kernel lock into ether_resolve(). > I also put locks in mpls. > > I am not aware of any other issues. Hi, i've applied this and ipsec crypto no queue diff and i'm getting splasserts below ... maybe it's something obvious, if not, i will try diff by diff .. I'm running iked tunnel ... log on box where traffic goes into tunnel splassert: ipsp_process_done: want 2 have 0 Starting stack trace... ipsp_process_done(fd80a8647400,811832a0) at ipsp_process_done+0x59 esp_output_cb(811832a0,81194e00,fd80a8647400,21c,0) at esp_output_cb+0x41 ipsec_output_cb(fd8397dad440) at ipsec_output_cb+0x177 taskq_thread(8013f480) at taskq_thread+0x81 end trace frame: 0x0, count: 253 End of stack trace. splassert: ip_output: want 2 have 0 Starting stack trace... ip_output(fd80a8647400,0,0,2,0,0) at ip_output+0x88 ipsp_process_done(fd80a8647400,811832a0) at ipsp_process_done+0x384 esp_output_cb(811832a0,81194e00,fd80a8647400,21c,0) at esp_o utput_cb+0x41 ipsec_output_cb(fd8397dad440) at ipsec_output_cb+0x177 taskq_thread(8013f480) at taskq_thread+0x81 end trace frame: 0x0, count: 252 End of stack trace. logs on box where traffic goes from tunnel out splassert: tdb_hash: want 2 have 0 Starting stack trace... tdb_hash(36b50c1c,81199200,32) at tdb_hash+0x5c gettdb_dir(0,36b50c1c,81199200,32,0) at gettdb_dir+0x83 ipsec_input_cb(fd839e0c6ca0) at ipsec_input_cb+0x7a taskq_thread(80140d80) at taskq_thread+0x81 end trace frame: 0x0, count: 253 End of stack trace. splassert: esp_input_cb: want 2 have 0 Starting stack trace... esp_input_cb(8118a5f8,81199200,fd80026f5500,0) at esp_input_cb+0x73 ipsec_input_cb(fd839e0c6ca0) at ipsec_input_cb+0x162 taskq_thread(80140d80) at taskq_thread+0x81 end trace frame: 0x0, count: 254 End of stack trace. splassert: enc_getif: want 2 have 0 Starting stack trace... enc_getif(0,0) at enc_getif+0x54 ipsec_common_input_cb(fd80026f5500,8118a5f8,14,9) at ipsec_common_input_cb+0x934 esp_input_cb(8118a5f8,81199200,fd80026f5500,0) at esp_input_cb+0x456 ipsec_input_cb(fd839e0c6ca0) at ipsec_input_cb+0x162 taskq_thread(80140d80) at taskq_thread+0x81 end trace frame: 0x0, count: 252 End of stack trace.
Re: forwarding in parallel ipsec workaround
On Mon, Jul 19, 2021 at 07:33:55PM +0300, Vitaliy Makkoveev wrote: > Hi, pipex(4) is also not ready for parallel access. In the chunk below > it will be accessed through (*ifp->if_input)() -> ether_input() -> > pipex_pppoe_input(). This looks not fatal but makes at least session > statistics inconsistent. For pipex and pppoe we can put a kernel lock into ether_resolve(). I also put locks in mpls. I am not aware of any other issues. Can we commit this now? bluhm Index: net/if.c === RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v retrieving revision 1.643 diff -u -p -r1.643 if.c --- net/if.c20 Jul 2021 16:32:28 - 1.643 +++ net/if.c21 Jul 2021 16:07:15 - @@ -109,6 +109,10 @@ #include #endif +#ifdef IPSEC +#include +#endif + #ifdef MPLS #include #endif @@ -238,7 +242,7 @@ int ifq_congestion; int netisr; -#defineNET_TASKQ 1 +#defineNET_TASKQ 4 struct taskq *nettqmp[NET_TASKQ]; struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); @@ -815,6 +819,7 @@ void if_input_process(struct ifnet *ifp, struct mbuf_list *ml) { struct mbuf *m; + int exclusive_lock = 0; if (ml_empty(ml)) return; @@ -835,15 +840,27 @@ if_input_process(struct ifnet *ifp, stru * lists and the socket layer. */ +#ifdef IPSEC /* * XXXSMP IPsec data structures are not ready to be accessed * by multiple network threads in parallel. In this case * use an exclusive lock. */ - NET_LOCK(); + if (ipsec_in_use) + exclusive_lock = 1; +#endif + if (exclusive_lock) + NET_LOCK(); + else + NET_RLOCK_IN_SOFTNET(); + while ((m = ml_dequeue(ml)) != NULL) (*ifp->if_input)(ifp, m); - NET_UNLOCK(); + + if (exclusive_lock) + NET_UNLOCK(); + else + NET_RUNLOCK_IN_SOFTNET(); } void @@ -900,6 +917,12 @@ if_netisr(void *unused) arpintr(); KERNEL_UNLOCK(); } +#endif + if (n & (1 << NETISR_IP)) + ipintr(); +#ifdef INET6 + if (n & (1 << NETISR_IPV6)) + ip6intr(); #endif #if NPPP > 0 if (n & (1 << NETISR_PPP)) { Index: net/if_ethersubr.c === RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v retrieving revision 1.275 diff -u -p -r1.275 if_ethersubr.c --- net/if_ethersubr.c 7 Jul 2021 20:19:01 - 1.275 +++ net/if_ethersubr.c 21 Jul 2021 16:34:51 - @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct switch (af) { case AF_INET: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in arpresolve() */ error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); eh->ether_type = htons(ETHERTYPE_IP); @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct break; #ifdef INET6 case AF_INET6: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in nd6_resolve() */ error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); eh->ether_type = htons(ETHERTYPE_IPV6); @@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct break; #ifdef INET6 case AF_INET6: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in nd6_resolve() */ error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); break; #endif case AF_INET: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in arpresolve() */ error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); break; @@ -528,12 +540,14 @@ ether_input(struct ifnet *ifp, struct mb case ETHERTYPE_PPPOE: if (m->m_flags & (M_MCAST | M_BCAST)) goto dropanyway; + KERNEL_LOCK(); #ifdef PIPEX if (pipex_enable) { struct pipex_session *session; if ((session = pipex_pppoe_lookup_session(m)) != NULL) { pipex_pppoe_input(m, session); +
Re: forwarding in parallel ipsec workaround
On 20/07/21(Tue) 15:46, Alexander Bluhm wrote: > On Tue, Jul 20, 2021 at 02:26:02PM +0200, Alexander Bluhm wrote: > > > Note that having multiple threads competing for an exclusive rwlock will > > > generate unnecessary wakeup/sleep cycles every time the lock is released. > > > It is valuable to keep this in mind as it might add extra latency when > > > processing packets. > > > > Of course. What do you recommend? > > We may have another alternative. > > - Always use a shared net lock but also aquire kernel lock. Using an exclusive NET_LOCK() seems better, that's what we have now. My point is just that if we're using an exclusive NET_LOCK() we lose the gain of having multiple softnet threads, so if we could reduce the number of threads so would be better. That said, if it isn't trivial to do so, I'd better spend the time into making IPsec works with parallel threads. Another reason for not using an exclusive lock in the softnet thread is to be able to execute read ioctls at the same time as the forwarding path. This has been reverted due to a bug elsewhere last year and never got activated again. But somebody might want to revisit this, 'cause doing ifconfig(8) on a busy machine was hanging for a very long time.
Re: forwarding in parallel ipsec workaround
On 20/07/21(Tue) 14:26, Alexander Bluhm wrote: > On Tue, Jul 20, 2021 at 10:08:09AM +0200, Martin Pieuchot wrote: > > On 19/07/21(Mon) 17:53, Alexander Bluhm wrote: > > > Hi, > > > > > > I found why the IPsec workaround did not work. > > > > > > At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index + > > > idx), but the workaround modifies net_tq() at runtime. Modifying > > > net_tq() at runtime is bad anyway as task_add() and task_del() could > > > be called with different task queues. > > > > > > So better use exclusive lock if IPsec is in use. For me this is > > > running stable. > > > > Note that having multiple threads competing for an exclusive rwlock will > > generate unnecessary wakeup/sleep cycles every time the lock is released. > > It is valuable to keep this in mind as it might add extra latency when > > processing packets. > > Of course. What do you recommend? What you find easier to make progress with :) > - Develop outside of the tree until all problems are fixed. > - Delay work on parallel forwarding until IPsec is MP safe. > - Accept a possible slowdown of IPsec. In my measurements it gets > faster even with the exclusive lock. > - Concentrate on making IPsec faster. By removing the crypto > queues you gain much more performance than the exclusive lock may > cost. Did you see the massive kernel locks in my graph? > > http://bluhm.genua.de/perform/results/latest/patch-sys-ip-multiqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10_-R-btrace-kstack.0.svg I didn't see this, nice to know. > - Make ARP MP safe. Currently we need the kernel lock there or > it crashes. This creates latency for all kind of packets. > - Convert the rwlock in pf to mutex. I think your argument counts > much more there. But I cannot prove it. My argument about what? Didn't we all agree about converting the rwlock to a mutex for now? > My plan is to commit what we have and improve where most pain is. > This makes incremental steps easier. Makes sense.
Re: forwarding in parallel ipsec workaround
On Tue, Jul 20, 2021 at 02:26:02PM +0200, Alexander Bluhm wrote: > > Note that having multiple threads competing for an exclusive rwlock will > > generate unnecessary wakeup/sleep cycles every time the lock is released. > > It is valuable to keep this in mind as it might add extra latency when > > processing packets. > > Of course. What do you recommend? We may have another alternative. - Always use a shared net lock but also aquire kernel lock. This seems to be a quick fix for all the MP problmes when running in parallel. Then the kernel lock can be removed step by step. I will create a diff an test. bluhm > - Develop outside of the tree until all problems are fixed. > - Delay work on parallel forwarding until IPsec is MP safe. > - Accept a possible slowdown of IPsec. In my measurements it gets > faster even with the exclusive lock. > - Concentrate on making IPsec faster. By removing the crypto > queues you gain much more performance than the exclusive lock may > cost. Did you see the massive kernel locks in my graph? > > http://bluhm.genua.de/perform/results/latest/patch-sys-ip-multiqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10_-R-btrace-kstack.0.svg > - Make ARP MP safe. Currently we need the kernel lock there or > it crashes. This creates latency for all kind of packets. > - Convert the rwlock in pf to mutex. I think your argument counts > much more there. But I cannot prove it. > > My plan is to commit what we have and improve where most pain is. > This makes incremental steps easier. > > bluhm
Re: forwarding in parallel ipsec workaround
On Tue, Jul 20, 2021 at 10:08:09AM +0200, Martin Pieuchot wrote: > On 19/07/21(Mon) 17:53, Alexander Bluhm wrote: > > Hi, > > > > I found why the IPsec workaround did not work. > > > > At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index + > > idx), but the workaround modifies net_tq() at runtime. Modifying > > net_tq() at runtime is bad anyway as task_add() and task_del() could > > be called with different task queues. > > > > So better use exclusive lock if IPsec is in use. For me this is > > running stable. > > Note that having multiple threads competing for an exclusive rwlock will > generate unnecessary wakeup/sleep cycles every time the lock is released. > It is valuable to keep this in mind as it might add extra latency when > processing packets. Of course. What do you recommend? - Develop outside of the tree until all problems are fixed. - Delay work on parallel forwarding until IPsec is MP safe. - Accept a possible slowdown of IPsec. In my measurements it gets faster even with the exclusive lock. - Concentrate on making IPsec faster. By removing the crypto queues you gain much more performance than the exclusive lock may cost. Did you see the massive kernel locks in my graph? http://bluhm.genua.de/perform/results/latest/patch-sys-ip-multiqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10_-R-btrace-kstack.0.svg - Make ARP MP safe. Currently we need the kernel lock there or it crashes. This creates latency for all kind of packets. - Convert the rwlock in pf to mutex. I think your argument counts much more there. But I cannot prove it. My plan is to commit what we have and improve where most pain is. This makes incremental steps easier. bluhm > > Index: net/if.c > > === > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v > > retrieving revision 1.642 > > diff -u -p -r1.642 if.c > > --- net/if.c30 Jun 2021 13:23:33 - 1.642 > > +++ net/if.c19 Jul 2021 14:51:31 - > > @@ -109,6 +109,10 @@ > > #include > > #endif > > > > +#ifdef IPSEC > > +#include > > +#endif > > + > > #ifdef MPLS > > #include > > #endif > > @@ -238,7 +242,7 @@ int ifq_congestion; > > > > int netisr; > > > > -#defineNET_TASKQ 1 > > +#defineNET_TASKQ 4 > > struct taskq *nettqmp[NET_TASKQ]; > > > > struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); > > @@ -815,6 +819,7 @@ void > > if_input_process(struct ifnet *ifp, struct mbuf_list *ml) > > { > > struct mbuf *m; > > + int exclusive_lock = 0; > > > > if (ml_empty(ml)) > > return; > > @@ -834,10 +839,25 @@ if_input_process(struct ifnet *ifp, stru > > * to PF globals, pipex globals, unicast and multicast addresses > > * lists and the socket layer. > > */ > > - NET_LOCK(); > > + > > + /* > > +* XXXSMP IPsec data structures are not ready to be > > +* accessed by multiple Network threads in parallel. > > +*/ > > + if (ipsec_in_use) > > + exclusive_lock = 1; > > + if (exclusive_lock) > > + NET_LOCK(); > > + else > > + NET_RLOCK_IN_SOFTNET(); > > + > > while ((m = ml_dequeue(ml)) != NULL) > > (*ifp->if_input)(ifp, m); > > - NET_UNLOCK(); > > + > > + if (exclusive_lock) > > + NET_UNLOCK(); > > + else > > + NET_RUNLOCK_IN_SOFTNET(); > > } > > > > void > > @@ -895,6 +915,12 @@ if_netisr(void *unused) > > KERNEL_UNLOCK(); > > } > > #endif > > + if (n & (1 << NETISR_IP)) > > + ipintr(); > > +#ifdef INET6 > > + if (n & (1 << NETISR_IPV6)) > > + ip6intr(); > > +#endif > > #if NPPP > 0 > > if (n & (1 << NETISR_PPP)) { > > KERNEL_LOCK(); > > @@ -3311,17 +3337,14 @@ unhandled_af(int af) > > panic("unhandled af %d", af); > > } > > > > -/* > > - * XXXSMP This tunable is here to work around the fact that IPsec > > - * globals aren't ready to be accessed by multiple threads in > > - * parallel. > > - */ > > -int nettaskqs = NET_TASKQ; > > - > > struct taskq * > > net_tq(unsigned int ifindex) > > { > > struct taskq *t = NULL; > > + static int nettaskqs; > > + > > + if (nettaskqs == 0) > > + nettaskqs = min(NET_TASKQ, ncpus); > > > > t = nettqmp[ifindex % nettaskqs]; > > > > Index: net/if_ethersubr.c > > === > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v > > retrieving revision 1.275 > > diff -u -p -r1.275 if_ethersubr.c > > --- net/if_ethersubr.c 7 Jul 2021 20:19:01 - 1.275 > > +++ net/if_ethersubr.c 19 Jul 2021 14:32:48 - > > @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct > > > > switch (af) { > > case AF_INET: > > +
Re: forwarding in parallel ipsec workaround
On 19/07/21(Mon) 17:53, Alexander Bluhm wrote: > Hi, > > I found why the IPsec workaround did not work. > > At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index + > idx), but the workaround modifies net_tq() at runtime. Modifying > net_tq() at runtime is bad anyway as task_add() and task_del() could > be called with different task queues. > > So better use exclusive lock if IPsec is in use. For me this is > running stable. Note that having multiple threads competing for an exclusive rwlock will generate unnecessary wakeup/sleep cycles every time the lock is released. It is valuable to keep this in mind as it might add extra latency when processing packets. > Index: net/if.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v > retrieving revision 1.642 > diff -u -p -r1.642 if.c > --- net/if.c 30 Jun 2021 13:23:33 - 1.642 > +++ net/if.c 19 Jul 2021 14:51:31 - > @@ -109,6 +109,10 @@ > #include > #endif > > +#ifdef IPSEC > +#include > +#endif > + > #ifdef MPLS > #include > #endif > @@ -238,7 +242,7 @@ int ifq_congestion; > > int netisr; > > -#define NET_TASKQ 1 > +#define NET_TASKQ 4 > struct taskq *nettqmp[NET_TASKQ]; > > struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); > @@ -815,6 +819,7 @@ void > if_input_process(struct ifnet *ifp, struct mbuf_list *ml) > { > struct mbuf *m; > + int exclusive_lock = 0; > > if (ml_empty(ml)) > return; > @@ -834,10 +839,25 @@ if_input_process(struct ifnet *ifp, stru >* to PF globals, pipex globals, unicast and multicast addresses >* lists and the socket layer. >*/ > - NET_LOCK(); > + > + /* > + * XXXSMP IPsec data structures are not ready to be > + * accessed by multiple Network threads in parallel. > + */ > + if (ipsec_in_use) > + exclusive_lock = 1; > + if (exclusive_lock) > + NET_LOCK(); > + else > + NET_RLOCK_IN_SOFTNET(); > + > while ((m = ml_dequeue(ml)) != NULL) > (*ifp->if_input)(ifp, m); > - NET_UNLOCK(); > + > + if (exclusive_lock) > + NET_UNLOCK(); > + else > + NET_RUNLOCK_IN_SOFTNET(); > } > > void > @@ -895,6 +915,12 @@ if_netisr(void *unused) > KERNEL_UNLOCK(); > } > #endif > + if (n & (1 << NETISR_IP)) > + ipintr(); > +#ifdef INET6 > + if (n & (1 << NETISR_IPV6)) > + ip6intr(); > +#endif > #if NPPP > 0 > if (n & (1 << NETISR_PPP)) { > KERNEL_LOCK(); > @@ -3311,17 +3337,14 @@ unhandled_af(int af) > panic("unhandled af %d", af); > } > > -/* > - * XXXSMP This tunable is here to work around the fact that IPsec > - * globals aren't ready to be accessed by multiple threads in > - * parallel. > - */ > -int nettaskqs = NET_TASKQ; > - > struct taskq * > net_tq(unsigned int ifindex) > { > struct taskq *t = NULL; > + static int nettaskqs; > + > + if (nettaskqs == 0) > + nettaskqs = min(NET_TASKQ, ncpus); > > t = nettqmp[ifindex % nettaskqs]; > > Index: net/if_ethersubr.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v > retrieving revision 1.275 > diff -u -p -r1.275 if_ethersubr.c > --- net/if_ethersubr.c7 Jul 2021 20:19:01 - 1.275 > +++ net/if_ethersubr.c19 Jul 2021 14:32:48 - > @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct > > switch (af) { > case AF_INET: > + KERNEL_LOCK(); > + /* XXXSMP there is a MP race in arpresolve() */ > error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); > + KERNEL_UNLOCK(); > if (error) > return (error); > eh->ether_type = htons(ETHERTYPE_IP); > @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct > break; > #ifdef INET6 > case AF_INET6: > + KERNEL_LOCK(); > + /* XXXSMP there is a MP race in nd6_resolve() */ > error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); > + KERNEL_UNLOCK(); > if (error) > return (error); > eh->ether_type = htons(ETHERTYPE_IPV6); > Index: net/ifq.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.c,v > retrieving revision 1.44 > diff -u -p -r1.44 ifq.c > --- net/ifq.c 9 Jul 2021 01:22:05 - 1.44 > +++ net/ifq.c 19 Jul 2021 14:32:48 - > @@ -243,7 +243,7 @@ void > ifq_init(struct ifqueue *ifq, struct ifnet *ifp, unsigned int idx) > { > ifq->ifq_if = ifp; > - ifq->ifq_softnet =
Re: forwarding in parallel ipsec workaround
On 19.7.2021. 17:53, Alexander Bluhm wrote: > Hi, > > I found why the IPsec workaround did not work. > > At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index + > idx), but the workaround modifies net_tq() at runtime. Modifying > net_tq() at runtime is bad anyway as task_add() and task_del() could > be called with different task queues. > > So better use exclusive lock if IPsec is in use. For me this is > running stable. Hi, i can't trigger panic with this diff. I've tried with isakmpd and with iked ... With this diff traffic through tunnel seems little slower ...
Re: forwarding in parallel ipsec workaround
On Mon, Jul 19, 2021 at 05:53:40PM +0200, Alexander Bluhm wrote: > Hi, > > I found why the IPsec workaround did not work. > > At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index + > idx), but the workaround modifies net_tq() at runtime. Modifying > net_tq() at runtime is bad anyway as task_add() and task_del() could > be called with different task queues. > > So better use exclusive lock if IPsec is in use. For me this is > running stable. > > bluhm > Hi, pipex(4) is also not ready for parallel access. In the chunk below it will be accessed through (*ifp->if_input)() -> ether_input() -> pipex_pppoe_input(). This looks not fatal but makes at least session statistics inconsistent. > @@ -834,10 +839,25 @@ if_input_process(struct ifnet *ifp, stru >* to PF globals, pipex globals, unicast and multicast addresses >* lists and the socket layer. >*/ > - NET_LOCK(); > + > + /* > + * XXXSMP IPsec data structures are not ready to be > + * accessed by multiple Network threads in parallel. > + */ > + if (ipsec_in_use) ---^ So I like to check at least `pipe_enable' here. Or introduce sessions counter `pipex_in_use' for better performance. > + exclusive_lock = 1; > + if (exclusive_lock) > + NET_LOCK(); > + else > + NET_RLOCK_IN_SOFTNET(); > + > while ((m = ml_dequeue(ml)) != NULL) > (*ifp->if_input)(ifp, m); > - NET_UNLOCK();