Re: [External] : parallel IP forwarding
On Fri, Apr 08, 2022 at 12:56:12PM +0200, Alexander Bluhm wrote:
> Hi,
>
> Is now the right time to commit the parallel forwarding diff?
>
> Known limitations are:
> - Hrvoje has seen a crash with both pfsync and ipsec on his production
>   machine.  But he cannot reproduce it in his lab.
> - TCP processing gets slower as we have an additional queue between
>   IP and protocol layer.
> - Protocol layer may starve as 1 exclusive lock is fighting with 4
>   shared locks.  This happens only when forwarding a lot.
>
> The advantage of committing is that we see how relevant these things
> are in the real world.  But the most important thing is that we learn
> how all the locks behave under MP pressure.  You can add a lot of
> locking, but only when you run in parallel do you see if it is correct.
>
> An alternative could be to commit it with NET_TASKQ 1.  With only
> one softnet thread I would expect to see fewer bugs, but there is
> also less to learn.  NET_TASKQ 1 could be a safe point where we
> could easily switch back.
>
> bluhm

I think it's worth giving it a try.  Maybe I would just split it into
three commits/diffs:

	a diff which adds task queues for forwarding
	a diff which adds a KERNEL_LOCK to deal with arp
	a diff which deals with local delivery to socket

It's up to you.  All changes look good to me.  You have my OK, in case
you want to proceed in one big step.

OK sashan
Re: parallel IP forwarding
On Mon, Apr 18, 2022 at 12:27:23PM +0200, Hrvoje Popovski wrote:
> On 8.4.2022. 12:56, Alexander Bluhm wrote:
> > Is now the right time to commit the parallel forwarding diff?
> >
> > Known limitations are:
> > - Hrvoje has seen a crash with both pfsync and ipsec on his production
> >   machine.  But he cannot reproduce it in his lab.
>
> This is resolved.  At least this panic doesn't happen any more.

Good to hear.  I guess the crash seen before is related to an
uncommitted diff that was tested in Hrvoje's setup.

So we can run IP forwarding in parallel.  I would like to commit
this diff now.  Then we can learn whether there are other limitations
and fix them in tree.

ok?

bluhm

Index: net/if.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
retrieving revision 1.649
diff -u -p -r1.649 if.c
--- net/if.c	25 Feb 2022 23:51:03 -0000	1.649
+++ net/if.c	18 Apr 2022 17:12:59 -0000
@@ -237,7 +237,7 @@ int	ifq_congestion;
 
 int	netisr;
 
-#define	NET_TASKQ	1
+#define	NET_TASKQ	4
 struct taskq	*nettqmp[NET_TASKQ];
 
 struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
@@ -834,15 +834,10 @@ if_input_process(struct ifnet *ifp, stru
 	 * lists and the socket layer.
 	 */
 
-	/*
-	 * XXXSMP IPsec data structures are not ready to be accessed
-	 * by multiple network threads in parallel.  In this case
-	 * use an exclusive lock.
-	 */
-	NET_LOCK();
+	NET_RLOCK_IN_SOFTNET();
 	while ((m = ml_dequeue(ml)) != NULL)
 		(*ifp->if_input)(ifp, m);
-	NET_UNLOCK();
+	NET_RUNLOCK_IN_SOFTNET();
 }
 
 void
@@ -899,6 +894,12 @@ if_netisr(void *unused)
 		arpintr();
 		KERNEL_UNLOCK();
 	}
+#endif
+	if (n & (1 << NETISR_IP))
+		ipintr();
+#ifdef INET6
+	if (n & (1 << NETISR_IPV6))
+		ip6intr();
 #endif
 #if NPPP > 0
 	if (n & (1 << NETISR_PPP)) {
Index: net/if_ethersubr.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.278
diff -u -p -r1.278 if_ethersubr.c
--- net/if_ethersubr.c	22 Feb 2022 01:15:02 -0000	1.278
+++ net/if_ethersubr.c	18 Apr 2022 17:12:59 -0000
@@ -221,7 +221,10 @@ ether_resolve(struct ifnet *ifp, struct 
 	switch (af) {
 	case AF_INET:
+		KERNEL_LOCK();
+		/* XXXSMP there is a MP race in arpresolve() */
 		error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+		KERNEL_UNLOCK();
 		if (error)
 			return (error);
 		eh->ether_type = htons(ETHERTYPE_IP);
@@ -244,7 +247,10 @@ ether_resolve(struct ifnet *ifp, struct 
 		break;
 #ifdef INET6
 	case AF_INET6:
+		KERNEL_LOCK();
+		/* XXXSMP there is a MP race in nd6_resolve() */
 		error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+		KERNEL_UNLOCK();
 		if (error)
 			return (error);
 		eh->ether_type = htons(ETHERTYPE_IPV6);
@@ -270,13 +276,19 @@ ether_resolve(struct ifnet *ifp, struct 
 		break;
 #ifdef INET6
 	case AF_INET6:
+		KERNEL_LOCK();
+		/* XXXSMP there is a MP race in nd6_resolve() */
 		error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+		KERNEL_UNLOCK();
 		if (error)
 			return (error);
 		break;
 #endif
 	case AF_INET:
+		KERNEL_LOCK();
+		/* XXXSMP there is a MP race in arpresolve() */
 		error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+		KERNEL_UNLOCK();
 		if (error)
 			return (error);
 		break;
@@ -528,12 +540,14 @@ ether_input(struct ifnet *ifp, struct mb
 	case ETHERTYPE_PPPOE:
 		if (m->m_flags & (M_MCAST | M_BCAST))
 			goto dropanyway;
+		KERNEL_LOCK();
 #ifdef PIPEX
 		if (pipex_enable) {
 			struct pipex_session *session;
 
 			if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
 				pipex_pppoe_input(m, session);
+				KERNEL_UNLOCK();
 				return;
 			}
 		}
@@ -542,6 +556,7 @@ ether_input(struct ifnet *ifp, struct mb
 			pppoe_disc_input(m);
 		else
 			pppoe_data_input(m);
+		KERNEL_UNLOCK();
Re: parallel IP forwarding
On 8.4.2022. 12:56, Alexander Bluhm wrote:
> Hi,
>
> Is now the right time to commit the parallel forwarding diff?
>
> Known limitations are:
> - Hrvoje has seen a crash with both pfsync and ipsec on his production
>   machine.  But he cannot reproduce it in his lab.

This is resolved.  At least this panic doesn't happen any more.

> - TCP processing gets slower as we have an additional queue between
>   IP and protocol layer.
> - Protocol layer may starve as 1 exclusive lock is fighting with 4
>   shared locks.  This happens only when forwarding a lot.
parallel IP forwarding
Hi,

Is now the right time to commit the parallel forwarding diff?

Known limitations are:
- Hrvoje has seen a crash with both pfsync and ipsec on his production
  machine.  But he cannot reproduce it in his lab.
- TCP processing gets slower as we have an additional queue between
  IP and protocol layer.
- Protocol layer may starve as 1 exclusive lock is fighting with 4
  shared locks.  This happens only when forwarding a lot.

The advantage of committing is that we see how relevant these things
are in the real world.  But the most important thing is that we learn
how all the locks behave under MP pressure.  You can add a lot of
locking, but only when you run in parallel do you see if it is correct.

An alternative could be to commit it with NET_TASKQ 1.  With only
one softnet thread I would expect to see fewer bugs, but there is
also less to learn.  NET_TASKQ 1 could be a safe point where we
could easily switch back.

bluhm

Index: net/if.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
retrieving revision 1.649
diff -u -p -r1.649 if.c
--- net/if.c	25 Feb 2022 23:51:03 -0000	1.649
+++ net/if.c	29 Mar 2022 12:44:05 -0000
@@ -237,7 +237,7 @@ int	ifq_congestion;
 
 int	netisr;
 
-#define	NET_TASKQ	1
+#define	NET_TASKQ	4
 struct taskq	*nettqmp[NET_TASKQ];
 
 struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
@@ -834,15 +834,10 @@ if_input_process(struct ifnet *ifp, stru
 	 * lists and the socket layer.
 	 */
 
-	/*
-	 * XXXSMP IPsec data structures are not ready to be accessed
-	 * by multiple network threads in parallel.  In this case
-	 * use an exclusive lock.
-	 */
-	NET_LOCK();
+	NET_RLOCK_IN_SOFTNET();
 	while ((m = ml_dequeue(ml)) != NULL)
 		(*ifp->if_input)(ifp, m);
-	NET_UNLOCK();
+	NET_RUNLOCK_IN_SOFTNET();
 }
 
 void
@@ -899,6 +894,12 @@ if_netisr(void *unused)
 		arpintr();
 		KERNEL_UNLOCK();
 	}
+#endif
+	if (n & (1 << NETISR_IP))
+		ipintr();
+#ifdef INET6
+	if (n & (1 << NETISR_IPV6))
+		ip6intr();
 #endif
 #if NPPP > 0
 	if (n & (1 << NETISR_PPP)) {
Index: net/if_ethersubr.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.278
diff -u -p -r1.278 if_ethersubr.c
--- net/if_ethersubr.c	22 Feb 2022 01:15:02 -0000	1.278
+++ net/if_ethersubr.c	29 Mar 2022 12:44:05 -0000
@@ -221,7 +221,10 @@ ether_resolve(struct ifnet *ifp, struct 
 	switch (af) {
 	case AF_INET:
+		KERNEL_LOCK();
+		/* XXXSMP there is a MP race in arpresolve() */
 		error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+		KERNEL_UNLOCK();
 		if (error)
 			return (error);
 		eh->ether_type = htons(ETHERTYPE_IP);
@@ -244,7 +247,10 @@ ether_resolve(struct ifnet *ifp, struct 
 		break;
 #ifdef INET6
 	case AF_INET6:
+		KERNEL_LOCK();
+		/* XXXSMP there is a MP race in nd6_resolve() */
 		error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+		KERNEL_UNLOCK();
 		if (error)
 			return (error);
 		eh->ether_type = htons(ETHERTYPE_IPV6);
@@ -270,13 +276,19 @@ ether_resolve(struct ifnet *ifp, struct 
 		break;
 #ifdef INET6
 	case AF_INET6:
+		KERNEL_LOCK();
+		/* XXXSMP there is a MP race in nd6_resolve() */
 		error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+		KERNEL_UNLOCK();
 		if (error)
 			return (error);
 		break;
 #endif
 	case AF_INET:
+		KERNEL_LOCK();
+		/* XXXSMP there is a MP race in arpresolve() */
 		error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+		KERNEL_UNLOCK();
 		if (error)
 			return (error);
 		break;
@@ -528,12 +540,14 @@ ether_input(struct ifnet *ifp, struct mb
 	case ETHERTYPE_PPPOE:
 		if (m->m_flags & (M_MCAST | M_BCAST))
 			goto dropanyway;
+		KERNEL_LOCK();
 #ifdef PIPEX
 		if (pipex_enable) {
 			struct pipex_session *session;
 
 			if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
 				pipex_pppoe_input(m, session);
+				KERNEL_UNLOCK();
 				return;
Re: parallel ip forwarding
> On 30 Dec 2021, at 11:28, YASUOKA Masahiko wrote:
>
> Hi,
>
> On Sat, 25 Dec 2021 21:50:47 +0300
> Vitaliy Makkoveev wrote:
>> On Fri, Dec 24, 2021 at 12:50:23PM +0100, Alexander Bluhm wrote:
>>> On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA Masahiko wrote:
>>>>> - npppd l2pt ipsecflowinfo is not MP safe
>>>>
>>>> Does this mean the things we are discussing on the "Fix
>>>> ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
>>>> another issue.
>>>
>>> In this mail thread I was concerned about things might get worse.
>>>
>>> Currently I see these problems:
>>>
>>> tdb_free() will be called with a shared netlock.  From there
>>> ipsp_ids_free() is called.
>>>
>>>	if (--ids->id_refcount > 0)
>>>		return;
>>>
>>> This ref count needs to be atomic.
>>>
>>>	if (LIST_EMPTY(&ipsp_ids_gc_list))
>>>		timeout_add_sec(&ipsp_ids_gc_timeout, 1);
>>>	LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);
>>>
>>> And some mutex should protect ipsp_ids_gc_list.
>
> Thanks, I suppose I could catch up with the problem.
>
>> The diff below adds `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*'
>> list and trees.  ipsp_ids_lookup() returns `ids' with bumped reference
>> counter.
>
> This direction seems good.
>
> One thing, I found a problem.
>
>> Index: sys/netinet/ip_spd.c
>> ===================================================================
>> RCS file: /cvs/src/sys/netinet/ip_spd.c,v
>> retrieving revision 1.110
>> diff -u -p -r1.110 ip_spd.c
>> --- sys/netinet/ip_spd.c	16 Dec 2021 15:38:03 -0000	1.110
>> +++ sys/netinet/ip_spd.c	25 Dec 2021 18:34:22 -0000
>> @@ -418,6 +418,7 @@ ipsp_spd_lookup(struct mbuf *m, int af, 
>>  		/* Cached entry is good. */
>>  		error = ipsp_spd_inp(m, inp, ipo, tdbout);
>>  		mtx_leave(&ipo_tdb_mtx);
>> +		ipsp_ids_free(ids);
>>  		return error;
>>
>>   nomatchout:
>> @@ -452,6 +453,7 @@ ipsp_spd_lookup(struct mbuf *m, int af, 
>>  		    dignore ? &sdst : &ipo->ipo_dst,
>>  		    ipo->ipo_sproto, ids ? ids : ipo->ipo_ids,
>>  		    &ipo->ipo_addr, &ipo->ipo_mask);
>> +		ipsp_ids_free(ids);
>>  		mtx_enter(&ipo_tdb_mtx);
>>  		if ((tdbp_new != NULL) &&
>>  		    (tdbp_new->tdb_flags & TDBF_DELETED)) {
>
> ids will remain unfreed since there are some code paths which don't
> pass the above lines.

You are right, I missed that.

> I tried to fix that, but adding a lot of ipsp_ids_free() looks a mess.
> Instead, how about changing ipsp_spd_lookup() to take a "struct
> ipsec_ids *ids" as an argument and letting the caller take the
> responsibility for the ids?

Seems reasonable.  ok mvs@

> Index: sys/net/if_bridge.c
> ===================================================================
> RCS file: /disk/cvs/openbsd/src/sys/net/if_bridge.c,v
> retrieving revision 1.362
> diff -u -p -r1.362 if_bridge.c
> --- sys/net/if_bridge.c	23 Dec 2021 12:21:48 -0000	1.362
> +++ sys/net/if_bridge.c	30 Dec 2021 08:12:18 -0000
> @@ -1595,7 +1595,7 @@ bridge_ipsec(struct ifnet *ifp, struct e
>  		}
>  	} else { /* Outgoing from the bridge. */
>  		error = ipsp_spd_lookup(m, af, hlen, IPSP_DIRECTION_OUT,
> -		    NULL, NULL, &tdb, 0);
> +		    NULL, NULL, &tdb, NULL);
>  		if (error == 0 && tdb != NULL) {
>  			/*
>  			 * We don't need to do loop detection, the
> Index: sys/net/if_veb.c
> ===================================================================
> RCS file: /disk/cvs/openbsd/src/sys/net/if_veb.c,v
> retrieving revision 1.21
> diff -u -p -r1.21 if_veb.c
> --- sys/net/if_veb.c	8 Nov 2021 04:15:46 -0000	1.21
> +++ sys/net/if_veb.c	30 Dec 2021 08:12:18 -0000
> @@ -746,7 +746,7 @@ veb_ipsec_proto_out(struct mbuf *m, sa_f
>  #endif
>
>  	tdb = ipsp_spd_lookup(m, af, iphlen, &error, IPSP_DIRECTION_OUT,
> -	    NULL, NULL, 0);
> +	    NULL, NULL, NULL);
>  	if (tdb == NULL)
>  		return (m);
>
> Index: sys/netinet/ip_ipsp.c
> ===================================================================
> RCS file: /disk/cvs/openbsd/src/sys/netinet/ip_ipsp.c,v
> retrieving revision 1.267
> diff -u -p -r1.267 ip_ipsp.c
> --- sys/netinet/ip_ipsp.c	20 Dec 2021 15:59:09 -0000	1.267
> +++ sys/netinet/ip_ipsp.c	30 Dec 2021 08:12:18 -0000
> @@ -47,6 +47,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  #include
>  #include
> @@ -84,6 +86,13 @@ void tdb_hashstats(void);
>  	do { } while (0)
>  #endif
>
> +/*
> + * Locks used to protect global data and struct members:
> + *	F	ipsec_flows_mtx
> + */
> +
> +struct mutex ipsec_flows_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
> +
>  int	tdb_rehash(void);
>  void	tdb_timeout(void *);
>  void	tdb_firstuse(void *);
Re: parallel ip forwarding
Hi,

On Sat, 25 Dec 2021 21:50:47 +0300
Vitaliy Makkoveev wrote:
> On Fri, Dec 24, 2021 at 12:50:23PM +0100, Alexander Bluhm wrote:
>> On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA Masahiko wrote:
>> > > - npppd l2pt ipsecflowinfo is not MP safe
>> >
>> > Does this mean the things we are discussing on the "Fix
>> > ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
>> > another issue.
>>
>> In this mail thread I was concerned about things might get worse.
>>
>> Currently I see these problems:
>>
>> tdb_free() will be called with a shared netlock.  From there
>> ipsp_ids_free() is called.
>>
>> 	if (--ids->id_refcount > 0)
>> 		return;
>>
>> This ref count needs to be atomic.
>>
>> 	if (LIST_EMPTY(&ipsp_ids_gc_list))
>> 		timeout_add_sec(&ipsp_ids_gc_timeout, 1);
>> 	LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);
>>
>> And some mutex should protect ipsp_ids_gc_list.

Thanks, I suppose I could catch up with the problem.

> The diff below adds `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*'
> list and trees.  ipsp_ids_lookup() returns `ids' with bumped reference
> counter.

This direction seems good.

One thing, I found a problem.

> Index: sys/netinet/ip_spd.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet/ip_spd.c,v
> retrieving revision 1.110
> diff -u -p -r1.110 ip_spd.c
> --- sys/netinet/ip_spd.c	16 Dec 2021 15:38:03 -0000	1.110
> +++ sys/netinet/ip_spd.c	25 Dec 2021 18:34:22 -0000
> @@ -418,6 +418,7 @@ ipsp_spd_lookup(struct mbuf *m, int af, 
>  		/* Cached entry is good. */
>  		error = ipsp_spd_inp(m, inp, ipo, tdbout);
>  		mtx_leave(&ipo_tdb_mtx);
> +		ipsp_ids_free(ids);
>  		return error;
>
>   nomatchout:
> @@ -452,6 +453,7 @@ ipsp_spd_lookup(struct mbuf *m, int af, 
>  		    dignore ? &sdst : &ipo->ipo_dst,
>  		    ipo->ipo_sproto, ids ? ids : ipo->ipo_ids,
>  		    &ipo->ipo_addr, &ipo->ipo_mask);
> +		ipsp_ids_free(ids);
>  		mtx_enter(&ipo_tdb_mtx);
>  		if ((tdbp_new != NULL) &&
>  		    (tdbp_new->tdb_flags & TDBF_DELETED)) {

ids will remain unfreed since there are some code paths which don't
pass the above lines.

I tried to fix that, but adding a lot of ipsp_ids_free() looks a mess.
Instead, how about changing ipsp_spd_lookup() to take a "struct
ipsec_ids *ids" as an argument and letting the caller take the
responsibility for the ids?

Index: sys/net/if_bridge.c
===================================================================
RCS file: /disk/cvs/openbsd/src/sys/net/if_bridge.c,v
retrieving revision 1.362
diff -u -p -r1.362 if_bridge.c
--- sys/net/if_bridge.c	23 Dec 2021 12:21:48 -0000	1.362
+++ sys/net/if_bridge.c	30 Dec 2021 08:12:18 -0000
@@ -1595,7 +1595,7 @@ bridge_ipsec(struct ifnet *ifp, struct e
 		}
 	} else { /* Outgoing from the bridge. */
 		error = ipsp_spd_lookup(m, af, hlen, IPSP_DIRECTION_OUT,
-		    NULL, NULL, &tdb, 0);
+		    NULL, NULL, &tdb, NULL);
 		if (error == 0 && tdb != NULL) {
 			/*
 			 * We don't need to do loop detection, the
Index: sys/net/if_veb.c
===================================================================
RCS file: /disk/cvs/openbsd/src/sys/net/if_veb.c,v
retrieving revision 1.21
diff -u -p -r1.21 if_veb.c
--- sys/net/if_veb.c	8 Nov 2021 04:15:46 -0000	1.21
+++ sys/net/if_veb.c	30 Dec 2021 08:12:18 -0000
@@ -746,7 +746,7 @@ veb_ipsec_proto_out(struct mbuf *m, sa_f
 #endif
 
 	tdb = ipsp_spd_lookup(m, af, iphlen, &error, IPSP_DIRECTION_OUT,
-	    NULL, NULL, 0);
+	    NULL, NULL, NULL);
 	if (tdb == NULL)
 		return (m);
 
Index: sys/netinet/ip_ipsp.c
===================================================================
RCS file: /disk/cvs/openbsd/src/sys/netinet/ip_ipsp.c,v
retrieving revision 1.267
diff -u -p -r1.267 ip_ipsp.c
--- sys/netinet/ip_ipsp.c	20 Dec 2021 15:59:09 -0000	1.267
+++ sys/netinet/ip_ipsp.c	30 Dec 2021 08:12:18 -0000
@@ -47,6 +47,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include
 #include
@@ -84,6 +86,13 @@ void tdb_hashstats(void);
 	do { } while (0)
 #endif
 
+/*
+ * Locks used to protect global data and struct members:
+ *	F	ipsec_flows_mtx
+ */
+
+struct mutex ipsec_flows_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
+
 int	tdb_rehash(void);
 void	tdb_timeout(void *);
 void	tdb_firstuse(void *);
@@ -98,16 +107,16 @@ int ipsec_ids_idle = 100;	/* keep free 
 struct pool tdb_pool;
 
 /* Protected by the NET_LOCK(). */
-u_int32_t ipsec_ids_next_flow = 1;	/* may not be zero */
-struct ipsec_ids_tree ipsec_ids_tree;
-struct ipsec_ids_flows
Re: parallel ip forwarding
On 25.12.2021. 18:52, Alexander Bluhm wrote:
> On Sat, Dec 25, 2021 at 09:24:07AM +0100, Hrvoje Popovski wrote:
>> On 24.12.2021. 0:55, Alexander Bluhm wrote:
>>> I think we can remove the ipsec_in_use workaround now.  The IPsec
>>> path is protected with the kernel lock.
>>>
>>> There are some issues left:
>>> - npppd l2pt ipsecflowinfo is not MP safe
>>> - the acquire SA feature is not MP safe
>>> - Hrvoje has seen a panic with sasync
>>>
>>> If you use one of these, the diff below should trigger crashes faster.
>>> If you use only regular IPsec or forwarding, I hope it is stable.
>>
>> Hi,
>>
>> after hitting sasyncd setup with this diff for some time i've run
>> ipsecctl -sa just to see what's going on and box panic
>
> According to ddb output ipsecctl is running in userland and the
> panic is triggered by a kernel timeout.  Could it be coincidence?
> When I run with 4 softnet threads, I see userland starvation.  Maybe
> both timeout and ipsecctl wait for the exclusive netlock and are
> executed shortly after each other.
>
>> r620-1# ipsecctl -sa
>> uvm_fault(0x82200c18, 0x417, 0, 2) -> e
>> kernel: page fault trap, code=0
>> Stopped at      pfsync_delete_tdb+0x84: movq    %rcx,0x8(%rsi)
>>     TID     PID    UID  PRFLAGS   PFLAGS  CPU  COMMAND
>>  290490   40316      0      0x3        0    3  ipsecctl
>>   10869   22801     68     0x10        0    5  sasyncd
>>  504041   13202     68     0x10     0x80    1  isakmpd
>>  476980    6400      0     0x10        0    2  ntpd
>>  224100   72648      0  0x14000    0x200    4  reaper
>> * 75659   10211      0  0x14000   0x4200   0K  softclock
>> pfsync_delete_tdb(812e8008) at pfsync_delete_tdb+0x84
>> tdb_free(812e8008) at tdb_free+0x67
>> tdb_timeout(812e8008) at tdb_timeout+0x7e
>> softclock_thread(8000f260) at softclock_thread+0x13b
>> end trace frame: 0x0, count: 11
>> https://www.openbsd.org/ddb.html describes the minimum info required in
>> bug reports.  Insufficient info makes it difficult to find and fix bugs.
>> ddb{0}>
>
> /home/bluhm/openbsd/cvs/src/sys/net/if_pfsync.c:2519
>     5f30: 49 8b 86 10 04 00 00	mov    0x410(%r14),%rax
>     5f37: 49 8b 8e 18 04 00 00	mov    0x418(%r14),%rcx
>     5f3e: 49 8d 94 24 e8 09 00	lea    0x9e8(%r12),%rdx
>     5f45: 00
>     5f46: 48 8d b0 10 04 00 00	lea    0x410(%rax),%rsi
>     5f4d: 48 85 c0		test   %rax,%rax
> *   5f50: 48 0f 44 f2		cmove  %rdx,%rsi
>     5f54: 48 89 4e 08		mov    %rcx,0x8(%rsi)
>     5f58: 49 8b 86 10 04 00 00	mov    0x410(%r14),%rax
>     5f5f: 49 8b 8e 18 04 00 00	mov    0x418(%r14),%rcx
>     5f66: 48 89 01		mov    %rax,(%rcx)
>     5f69: 49 c7 86 18 04 00 00	movq   $0xffffffffffffffff,0x418(%r14)
>     5f70: ff ff ff ff
>     5f74: 49 c7 86 10 04 00 00	movq   $0xffffffffffffffff,0x410(%r14)
>     5f7b: ff ff ff ff
> /home/bluhm/openbsd/cvs/src/sys/net/if_pfsync.c:2520
>
> 2508	void
> 2509	pfsync_delete_tdb(struct tdb *t)
> 2510	{
> 2511		struct pfsync_softc *sc = pfsyncif;
> 2512		size_t nlen;
> 2513	
> 2514		if (sc == NULL || !ISSET(t->tdb_flags, TDBF_PFSYNC))
> 2515			return;
> 2516	
> 2517		mtx_enter(&sc->sc_tdb_mtx);
> 2518	
> * 2519		TAILQ_REMOVE(&sc->sc_tdb_q, t, tdb_sync_entry);
> 2520		CLR(t->tdb_flags, TDBF_PFSYNC);
> 2521	
> 2522		nlen = sizeof(struct pfsync_tdb);
> 2523		if (TAILQ_EMPTY(&sc->sc_tdb_q))
> 2524			nlen += sizeof(struct pfsync_subheader);
> 2525		atomic_sub_long(&sc->sc_len, nlen);
> 2526	
> 2527		mtx_leave(&sc->sc_tdb_mtx);
> 2528	}
>
> Most sc_tdb_q accesses are protected by sc_tdb_mtx.  But
> pfsync_drop_snapshot() and pfsync_sendout() do not have it.
>
>> rsi 0x40f
>> rcx 0x
>> rax 0x
>
> tqe_next and tqe_prev are -1.  Looks like a double remove.  Can
> pfsync_delete_tdb() be called twice?  The tdb_refcnt should enforce
> that tdb_free() is only called once per TDB.
>
>> flags: d1040
>
> Flags look reasonable.  But a problem could be that they are not
> protected in if_pfsync.  Then set or cleared bits may be lost.
>
> Fix both missing mutexes.  Also put the tdb_sync_entry modification
> and the TDBF_PFSYNC flag under the same sc_tdb_mtx.
>
> I still have no pfsync setup.  Hrvoje, can you try this?

Hi,

i think that this panic is coincidence, but i can't prove it because
it's not so easy to reproduce.  I'm hitting the sasync setup with this
diff for 2 days and i can't panic the boxes...

If i try to reproduce the panic without this diff, would it give you
some more information or hints regarding ipsec and mp forwarding?
Re: parallel ip forwarding
On Fri, Dec 24, 2021 at 12:50:23PM +0100, Alexander Bluhm wrote:
> On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA Masahiko wrote:
> > > - npppd l2pt ipsecflowinfo is not MP safe
> >
> > Does this mean the things we are discussing on the "Fix
> > ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
> > another issue.
>
> In this mail thread I was concerned about things might get worse.
>
> Currently I see these problems:
>
> tdb_free() will be called with a shared netlock.  From there
> ipsp_ids_free() is called.
>
> 	if (--ids->id_refcount > 0)
> 		return;
>
> This ref count needs to be atomic.
>
> 	if (LIST_EMPTY(&ipsp_ids_gc_list))
> 		timeout_add_sec(&ipsp_ids_gc_timeout, 1);
> 	LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);
>
> And some mutex should protect ipsp_ids_gc_list.
>
> bluhm

The diff below adds `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*'
list and trees.  ipsp_ids_lookup() returns `ids' with bumped reference
counter.

Index: sys/netinet/ip_ipsp.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_ipsp.c,v
retrieving revision 1.267
diff -u -p -r1.267 ip_ipsp.c
--- sys/netinet/ip_ipsp.c	20 Dec 2021 15:59:09 -0000	1.267
+++ sys/netinet/ip_ipsp.c	25 Dec 2021 18:34:22 -0000
@@ -47,6 +47,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include
 #include
@@ -84,6 +86,13 @@ void tdb_hashstats(void);
 	do { } while (0)
 #endif
 
+/*
+ * Locks used to protect global data and struct members:
+ *	F	ipsec_flows_mtx
+ */
+
+struct mutex ipsec_flows_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
+
 int	tdb_rehash(void);
 void	tdb_timeout(void *);
 void	tdb_firstuse(void *);
@@ -98,16 +107,16 @@ int ipsec_ids_idle = 100;	/* keep free 
 struct pool tdb_pool;
 
 /* Protected by the NET_LOCK(). */
-u_int32_t ipsec_ids_next_flow = 1;	/* may not be zero */
-struct ipsec_ids_tree ipsec_ids_tree;
-struct ipsec_ids_flows ipsec_ids_flows;
+u_int32_t ipsec_ids_next_flow = 1;	/* [F] may not be zero */
+struct ipsec_ids_tree ipsec_ids_tree;	/* [F] */
+struct ipsec_ids_flows ipsec_ids_flows;	/* [F] */
 struct ipsec_policy_head ipsec_policy_head =
     TAILQ_HEAD_INITIALIZER(ipsec_policy_head);
 
 void ipsp_ids_gc(void *);
 
 LIST_HEAD(, ipsec_ids) ipsp_ids_gc_list =
-    LIST_HEAD_INITIALIZER(ipsp_ids_gc_list);
+    LIST_HEAD_INITIALIZER(ipsp_ids_gc_list);	/* [F] */
 struct timeout ipsp_ids_gc_timeout =
     TIMEOUT_INITIALIZER_FLAGS(ipsp_ids_gc, NULL, TIMEOUT_PROC);
 
@@ -1191,21 +1200,25 @@ ipsp_ids_insert(struct ipsec_ids *ids)
 	struct ipsec_ids *found;
 	u_int32_t start_flow;
 
-	NET_ASSERT_LOCKED();
+	mtx_enter(&ipsec_flows_mtx);
 
 	found = RBT_INSERT(ipsec_ids_tree, &ipsec_ids_tree, ids);
 	if (found) {
 		/* if refcount was zero, then timeout is running */
-		if (found->id_refcount++ == 0) {
+		if (atomic_inc_int_nv(&found->id_refcount) == 1) {
 			LIST_REMOVE(found, id_gc_list);
 			if (LIST_EMPTY(&ipsp_ids_gc_list))
 				timeout_del(&ipsp_ids_gc_timeout);
 		}
+		mtx_leave(&ipsec_flows_mtx);
 		DPRINTF("ids %p count %d", found, found->id_refcount);
 		return found;
 	}
+
+	ids->id_refcount = 1;
 	ids->id_flow = start_flow = ipsec_ids_next_flow;
+
 	if (++ipsec_ids_next_flow == 0)
 		ipsec_ids_next_flow = 1;
 	while (RBT_INSERT(ipsec_ids_flows, &ipsec_ids_flows, ids) != NULL) {
@@ -1214,12 +1227,13 @@ ipsp_ids_insert(struct ipsec_ids *ids)
 			ipsec_ids_next_flow = 1;
 		if (ipsec_ids_next_flow == start_flow) {
 			RBT_REMOVE(ipsec_ids_tree, &ipsec_ids_tree, ids);
+			mtx_leave(&ipsec_flows_mtx);
 			DPRINTF("ipsec_ids_next_flow exhausted %u",
-			    ipsec_ids_next_flow);
+			    start_flow);
 			return NULL;
 		}
 	}
-	ids->id_refcount = 1;
+	mtx_leave(&ipsec_flows_mtx);
 	DPRINTF("new ids %p flow %u", ids, ids->id_flow);
 	return ids;
 }
@@ -1228,11 +1242,16 @@ struct ipsec_ids *
 ipsp_ids_lookup(u_int32_t ipsecflowinfo)
 {
 	struct ipsec_ids	key;
-
-	NET_ASSERT_LOCKED();
+	struct ipsec_ids	*ids;
 
 	key.id_flow = ipsecflowinfo;
-	return RBT_FIND(ipsec_ids_flows, &ipsec_ids_flows, &key);
+
+	mtx_enter(&ipsec_flows_mtx);
+	ids = RBT_FIND(ipsec_ids_flows, &ipsec_ids_flows, &key);
+	atomic_inc_int(&ids->id_refcount);
+	mtx_leave(&ipsec_flows_mtx);
+
+	return ids;
 }
 
 /* free ids only from delayed timeout */
@@ -1241,7 +1260,7 @@ ipsp_ids_gc(void *arg)
 {
 	struct ipsec_ids *ids, *tids;
 
-	NET_LOCK();
+	mtx_enter(&ipsec_flows_mtx);
Re: parallel ip forwarding
On Sat, Dec 25, 2021 at 09:24:07AM +0100, Hrvoje Popovski wrote:
> On 24.12.2021. 0:55, Alexander Bluhm wrote:
> > I think we can remove the ipsec_in_use workaround now.  The IPsec
> > path is protected with the kernel lock.
> >
> > There are some issues left:
> > - npppd l2pt ipsecflowinfo is not MP safe
> > - the acquire SA feature is not MP safe
> > - Hrvoje has seen a panic with sasync
> >
> > If you use one of these, the diff below should trigger crashes faster.
> > If you use only regular IPsec or forwarding, I hope it is stable.
>
> Hi,
>
> after hitting sasyncd setup with this diff for some time i've run
> ipsecctl -sa just to see what's going on and box panic

According to ddb output ipsecctl is running in userland and the
panic is triggered by a kernel timeout.  Could it be coincidence?
When I run with 4 softnet threads, I see userland starvation.  Maybe
both timeout and ipsecctl wait for the exclusive netlock and are
executed shortly after each other.

> r620-1# ipsecctl -sa
> uvm_fault(0x82200c18, 0x417, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at      pfsync_delete_tdb+0x84: movq    %rcx,0x8(%rsi)
>     TID     PID    UID  PRFLAGS   PFLAGS  CPU  COMMAND
>  290490   40316      0      0x3        0    3  ipsecctl
>   10869   22801     68     0x10        0    5  sasyncd
>  504041   13202     68     0x10     0x80    1  isakmpd
>  476980    6400      0     0x10        0    2  ntpd
>  224100   72648      0  0x14000    0x200    4  reaper
> * 75659   10211      0  0x14000   0x4200   0K  softclock
> pfsync_delete_tdb(812e8008) at pfsync_delete_tdb+0x84
> tdb_free(812e8008) at tdb_free+0x67
> tdb_timeout(812e8008) at tdb_timeout+0x7e
> softclock_thread(8000f260) at softclock_thread+0x13b
> end trace frame: 0x0, count: 11
> https://www.openbsd.org/ddb.html describes the minimum info required in
> bug reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}>

/home/bluhm/openbsd/cvs/src/sys/net/if_pfsync.c:2519
    5f30: 49 8b 86 10 04 00 00	mov    0x410(%r14),%rax
    5f37: 49 8b 8e 18 04 00 00	mov    0x418(%r14),%rcx
    5f3e: 49 8d 94 24 e8 09 00	lea    0x9e8(%r12),%rdx
    5f45: 00
    5f46: 48 8d b0 10 04 00 00	lea    0x410(%rax),%rsi
    5f4d: 48 85 c0		test   %rax,%rax
*   5f50: 48 0f 44 f2		cmove  %rdx,%rsi
    5f54: 48 89 4e 08		mov    %rcx,0x8(%rsi)
    5f58: 49 8b 86 10 04 00 00	mov    0x410(%r14),%rax
    5f5f: 49 8b 8e 18 04 00 00	mov    0x418(%r14),%rcx
    5f66: 48 89 01		mov    %rax,(%rcx)
    5f69: 49 c7 86 18 04 00 00	movq   $0xffffffffffffffff,0x418(%r14)
    5f70: ff ff ff ff
    5f74: 49 c7 86 10 04 00 00	movq   $0xffffffffffffffff,0x410(%r14)
    5f7b: ff ff ff ff
/home/bluhm/openbsd/cvs/src/sys/net/if_pfsync.c:2520

2508	void
2509	pfsync_delete_tdb(struct tdb *t)
2510	{
2511		struct pfsync_softc *sc = pfsyncif;
2512		size_t nlen;
2513	
2514		if (sc == NULL || !ISSET(t->tdb_flags, TDBF_PFSYNC))
2515			return;
2516	
2517		mtx_enter(&sc->sc_tdb_mtx);
2518	
* 2519		TAILQ_REMOVE(&sc->sc_tdb_q, t, tdb_sync_entry);
2520		CLR(t->tdb_flags, TDBF_PFSYNC);
2521	
2522		nlen = sizeof(struct pfsync_tdb);
2523		if (TAILQ_EMPTY(&sc->sc_tdb_q))
2524			nlen += sizeof(struct pfsync_subheader);
2525		atomic_sub_long(&sc->sc_len, nlen);
2526	
2527		mtx_leave(&sc->sc_tdb_mtx);
2528	}

Most sc_tdb_q accesses are protected by sc_tdb_mtx.  But
pfsync_drop_snapshot() and pfsync_sendout() do not have it.

> rsi 0x40f
> rcx 0x
> rax 0x

tqe_next and tqe_prev are -1.  Looks like a double remove.  Can
pfsync_delete_tdb() be called twice?  The tdb_refcnt should enforce
that tdb_free() is only called once per TDB.

> flags: d1040

Flags look reasonable.  But a problem could be that they are not
protected in if_pfsync.  Then set or cleared bits may be lost.

Fix both missing mutexes.  Also put the tdb_sync_entry modification
and the TDBF_PFSYNC flag under the same sc_tdb_mtx.

I still have no pfsync setup.  Hrvoje, can you try this?

ok?

bluhm

Index: net/if_pfsync.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_pfsync.c,v
retrieving revision 1.299
diff -u -p -r1.299 if_pfsync.c
--- net/if_pfsync.c	25 Nov 2021 13:46:02 -0000	1.299
+++ net/if_pfsync.c	25 Dec 2021 17:46:06 -0000
@@ -295,7 +295,7 @@ void	pfsync_bulk_update(void *);
 void	pfsync_bulk_fail(void *);
 
 void	pfsync_grab_snapshot(struct pfsync_snapshot *, struct pfsync_softc *);
-void	pfsync_drop_snapshot(struct pfsync_snapshot *);
+void	pfsync_drop_snapshot(struct
Re: parallel ip forwarding
On 24.12.2021. 0:55, Alexander Bluhm wrote:
> I think we can remove the ipsec_in_use workaround now.  The IPsec
> path is protected with the kernel lock.
>
> There are some issues left:
> - npppd l2pt ipsecflowinfo is not MP safe
> - the acquire SA feature is not MP safe
> - Hrvoje has seen a panic with sasync
>
> If you use one of these, the diff below should trigger crashes faster.
> If you use only regular IPsec or forwarding, I hope it is stable.

Hi,

after hitting sasyncd setup with this diff for some time i've run
ipsecctl -sa just to see what's going on and box panic

r620-1# ipsecctl -sa
uvm_fault(0x82200c18, 0x417, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at      pfsync_delete_tdb+0x84: movq    %rcx,0x8(%rsi)
    TID     PID    UID  PRFLAGS   PFLAGS  CPU  COMMAND
 290490   40316      0      0x3        0    3  ipsecctl
  10869   22801     68     0x10        0    5  sasyncd
 504041   13202     68     0x10     0x80    1  isakmpd
 476980    6400      0     0x10        0    2  ntpd
 224100   72648      0  0x14000    0x200    4  reaper
* 75659   10211      0  0x14000   0x4200   0K  softclock
pfsync_delete_tdb(812e8008) at pfsync_delete_tdb+0x84
tdb_free(812e8008) at tdb_free+0x67
tdb_timeout(812e8008) at tdb_timeout+0x7e
softclock_thread(8000f260) at softclock_thread+0x13b
end trace frame: 0x0, count: 11
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}>

ddb{0}> show reg
rdi                   0x4
rsi                 0x40f
rbp       0x800022c4e390
rbx                     0
rdx            0x806b39e8
rcx                    0x
rax                    0x
r8                   0x1f
r9                      0
r10    0xbaf844ce8eec335d
r11    0xb9858c0c287d2c4d
r12            0x806b3000
r13            0x821c5e60    timeout_proc
r14            0x812e8008
r15            0x806b39f8
rip            0x8131a254    pfsync_delete_tdb+0x84
cs                    0x8
rflags            0x10286    __ALIGN_SIZE+0xf286
rsp       0x800022c4e360
ss                   0x10
pfsync_delete_tdb+0x84: movq    %rcx,0x8(%rsi)

ddb{0}> show all tdb
0x812e8008: f9f247f0 192.168.42.100->192.168.42.112:50 #0 000d1040
0x812e8428: 959c114b 192.168.42.112->192.168.42.100:50 #2 1002
0x812e8848: b7eb65bc 192.168.42.113->192.168.42.100:50 #2 1002
0x812e9ce8: 55495192 192.168.42.100->192.168.42.113:50 #3 000d1002

ddb{0}> show tdb /f 0x812e8008
tdb at 0x812e8008
	hnext: 0x0	dnext: 0x0	snext: 0x0	inext: 0x0	onext: 0x0
	xform: 0x0	refcnt: 0
	encalgxform: 0x81f36090	authalgxform: 0x81f36380
	compalgxform: 0x0
	flags: d1040	seq: 8
	exp_allocations: 0	soft_allocations: 0	cur_allocations: 0
	exp_bytes: 0	soft_bytes: 0	cur_bytes: 1272048336570
	exp_timeout: 1200	soft_timeout: 1080
	established: 1640372736	first_use: 1640372754
	soft_first_use: 0	exp_first_use: 0
	last_used: 1640419926	last_marked: 0
	cryptoid: 0
	tdb_spi: f9f247f0	amxkeylen: 20	emxkeylen: 20	ivlen: 8
	sproto: 50	wnd: 16	satype: 2	updates: 158
	dst: 192.168.42.112	src: 192.168.42.100
	amxkey: 0x0	emxkey: 0x0
	rpl: 5256398086	ids: 0x812d8800	ids_swapped: 0	mtu: 0
	mtutimeout: 0	udpencap_port: 0	tag: 0	tap: 0
	rdomain: 0	rdomain_post: 0

ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT     COMMAND
 40316  290490   5050      0  7         0x3           ipsecctl
 22801   10869  51239     68  7        0x10           sasyncd
 51239   39174      1      0  3        0x80  kqread   sasyncd
 13202  504041  43160     68  7        0x90           isakmpd
 43160   53413      1      0  3        0x80  netio    isakmpd
  5050   12722      1      0  3    0x10008b  sigsusp  ksh
 64387  345990      1      0  3    0x100098  poll     cron
 58819  219224  16427     95  3    0x100092  kqread   smtpd
 67207  340852  16427    103  3    0x100092  kqread   smtpd
 97983  457473  16427     95  3    0x100092  kqread   smtpd
  6608   99059  16427     95  3    0x100092  kqread   smtpd
 91670  313820  16427     95  3    0x100092  kqread   smtpd
  7571   97218  16427     95  3    0x100092
Re: parallel ip forwarding
`rtt_link’ list also has missing protection when rtfree() called outside netlock. > On 25 Dec 2021, at 01:34, Alexander Bluhm wrote: > > On Fri, Dec 24, 2021 at 02:04:17PM +0100, Alexander Bluhm wrote: >> On Fri, Dec 24, 2021 at 12:55:04AM +0100, Alexander Bluhm wrote: >>> If you use only regular IPsec or forwarding, I hope it is stable. >> >> false hope >> >> rt_timer_add(fd81b97f5390,814218b0,802040c0,0) at >> rt_timer_ >> add+0xc7 >> icmp_mtudisc_clone(2438040a,0,1) at icmp_mtudisc_clone+0x174 >> ip_output_ipsec_pmtu_update(811e35a0,8000226c0a08,2438040a,0,0) >> at i >> p_output_ipsec_pmtu_update+0x71 >> ip_output_ipsec_send(811e35a0,fd80b8735000,8000226c0a08,1) >> at i >> p_output_ipsec_send+0x231 >> ip_output(fd80b8735000,0,8000226c0a08,1,0,0,457ea326bbcaae85) at >> ip_out >> put+0x7ca >> ip_forward(fd80b8735000,8011e048,fd819089ce70,0) at >> ip_forward+ >> 0x2da >> ip_input_if(8000226c0b48,8000226c0b54,4,0,8011e048) at >> ip_input >> _if+0x353 >> ipv4_input(8011e048,fd80b8735000) at ipv4_input+0x39 >> ether_input(8011e048,fd80b8735000) at ether_input+0x3ad >> if_input_process(8011e048,8000226c0c38) at if_input_process+0x6f >> ifiq_process(8011dc00) at ifiq_process+0x69 >> taskq_thread(8002e200) at taskq_thread+0x100 >> end trace frame: 0x0, count: -12 > > /usr/src/sys/net/route.c:1491 >3773: 49 8b 0fmov(%r15),%rcx >3776: 49 8b 47 08 mov0x8(%r15),%rax >377a: 48 85 c9test %rcx,%rcx >377d: 74 06 je 3785 >377f: 48 83 c1 08 add$0x8,%rcx >3783: eb 08 jmp378d >3785: 49 8b 4f 20 mov0x20(%r15),%rcx >3789: 48 83 c1 18 add$0x18,%rcx >378d: 48 89 01mov%rax,(%rcx) >3790: 49 8b 07mov(%r15),%rax >3793: 49 8b 4f 08 mov0x8(%r15),%rcx > * 3797: 48 89 01mov%rax,(%rcx) >379a: 49 c7 47 08 ff ff ffmovq $0x,0x8(%r15) >37a1: ff >37a2: 49 c7 07 ff ff ff ffmovq $0x,(%r15) > /usr/src/sys/net/route.c:1492 > > 1484 /* > 1485 * If there's already a timer with this action, destroy it > before > 1486 * we add a new one. 
> 1487 */ > 1488 LIST_FOREACH(r, >rt_timer, rtt_link) { > 1489 if (r->rtt_func == func) { > 1490 LIST_REMOVE(r, rtt_link); > * 1491 TAILQ_REMOVE(>rtt_queue->rtq_head, r, > rtt_next); > 1492 if (r->rtt_queue->rtq_count > 0) > 1493 r->rtt_queue->rtq_count--; > 1494 else > 1495 printf("rt_timer_add: rtq_count > reached 0\n"); > 1496 pool_put(_pool, r); > 1497 break; /* only one per list, so we can > quit... */ > 1498 } > 1499 } > > These lists don't look very MP safe. >
Re: parallel ip forwarding
On Fri, Dec 24, 2021 at 02:04:17PM +0100, Alexander Bluhm wrote: > On Fri, Dec 24, 2021 at 12:55:04AM +0100, Alexander Bluhm wrote: > > If you use only regular IPsec or forwarding, I hope it is stable. > > false hope > > rt_timer_add(fd81b97f5390,814218b0,802040c0,0) at > rt_timer_ > add+0xc7 > icmp_mtudisc_clone(2438040a,0,1) at icmp_mtudisc_clone+0x174 > ip_output_ipsec_pmtu_update(811e35a0,8000226c0a08,2438040a,0,0) > at i > p_output_ipsec_pmtu_update+0x71 > ip_output_ipsec_send(811e35a0,fd80b8735000,8000226c0a08,1) at > i > p_output_ipsec_send+0x231 > ip_output(fd80b8735000,0,8000226c0a08,1,0,0,457ea326bbcaae85) at > ip_out > put+0x7ca > ip_forward(fd80b8735000,8011e048,fd819089ce70,0) at > ip_forward+ > 0x2da > ip_input_if(8000226c0b48,8000226c0b54,4,0,8011e048) at > ip_input > _if+0x353 > ipv4_input(8011e048,fd80b8735000) at ipv4_input+0x39 > ether_input(8011e048,fd80b8735000) at ether_input+0x3ad > if_input_process(8011e048,8000226c0c38) at if_input_process+0x6f > ifiq_process(8011dc00) at ifiq_process+0x69 > taskq_thread(8002e200) at taskq_thread+0x100 > end trace frame: 0x0, count: -12 /usr/src/sys/net/route.c:1491 3773: 49 8b 0fmov(%r15),%rcx 3776: 49 8b 47 08 mov0x8(%r15),%rax 377a: 48 85 c9test %rcx,%rcx 377d: 74 06 je 3785 377f: 48 83 c1 08 add$0x8,%rcx 3783: eb 08 jmp378d 3785: 49 8b 4f 20 mov0x20(%r15),%rcx 3789: 48 83 c1 18 add$0x18,%rcx 378d: 48 89 01mov%rax,(%rcx) 3790: 49 8b 07mov(%r15),%rax 3793: 49 8b 4f 08 mov0x8(%r15),%rcx * 3797: 48 89 01mov%rax,(%rcx) 379a: 49 c7 47 08 ff ff ffmovq $0x,0x8(%r15) 37a1: ff 37a2: 49 c7 07 ff ff ff ffmovq $0x,(%r15) /usr/src/sys/net/route.c:1492 1484 /* 1485 * If there's already a timer with this action, destroy it before 1486 * we add a new one. 
1487 */ 1488 LIST_FOREACH(r, >rt_timer, rtt_link) { 1489 if (r->rtt_func == func) { 1490 LIST_REMOVE(r, rtt_link); * 1491 TAILQ_REMOVE(>rtt_queue->rtq_head, r, rtt_next); 1492 if (r->rtt_queue->rtq_count > 0) 1493 r->rtt_queue->rtq_count--; 1494 else 1495 printf("rt_timer_add: rtq_count reached 0\n"); 1496 pool_put(_pool, r); 1497 break; /* only one per list, so we can quit... */ 1498 } 1499 } These lists don't look very MP safe.
Re: parallel ip forwarding
On Fri, Dec 24, 2021 at 12:55:04AM +0100, Alexander Bluhm wrote: > If you use only regular IPsec or forwarding, I hope it is stable. false hope rt_timer_add(fd81b97f5390,814218b0,802040c0,0) at rt_timer_ add+0xc7 icmp_mtudisc_clone(2438040a,0,1) at icmp_mtudisc_clone+0x174 ip_output_ipsec_pmtu_update(811e35a0,8000226c0a08,2438040a,0,0) at i p_output_ipsec_pmtu_update+0x71 ip_output_ipsec_send(811e35a0,fd80b8735000,8000226c0a08,1) at i p_output_ipsec_send+0x231 ip_output(fd80b8735000,0,8000226c0a08,1,0,0,457ea326bbcaae85) at ip_out put+0x7ca ip_forward(fd80b8735000,8011e048,fd819089ce70,0) at ip_forward+ 0x2da ip_input_if(8000226c0b48,8000226c0b54,4,0,8011e048) at ip_input _if+0x353 ipv4_input(8011e048,fd80b8735000) at ipv4_input+0x39 ether_input(8011e048,fd80b8735000) at ether_input+0x3ad if_input_process(8011e048,8000226c0c38) at if_input_process+0x6f ifiq_process(8011dc00) at ifiq_process+0x69 taskq_thread(8002e200) at taskq_thread+0x100 end trace frame: 0x0, count: -12
Re: parallel ip forwarding
On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA Masahiko wrote:
> > - npppd l2pt ipsecflowinfo is not MP safe
>
> Does this mean the things we are discussing on the "Fix
> ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
> another issue.

In this mail thread I was concerned that things might get worse.
Currently I see these problems:

tdb_free() will be called with a shared netlock.  From there
ipsp_ids_free() is called.

	if (--ids->id_refcount > 0)
		return;

This ref count needs to be atomic.

	if (LIST_EMPTY(&ipsp_ids_gc_list))
		timeout_add_sec(&ipsp_ids_gc_timeout, 1);
	LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);

And some mutex should protect ipsp_ids_gc_list.

bluhm
Re: parallel ip forwarding
Hello,

On Fri, 24 Dec 2021 00:55:04 +0100 Alexander Bluhm wrote:
> On Fri, Dec 03, 2021 at 08:35:45PM +0100, Alexander Bluhm wrote:
>> Note that IPsec still has the workaround to disable multiple queues.
>
> I think we can remove the ipsec_in_use workaround now.  The IPsec
> path is protected with the kernel lock.
>
> There are some issues left:
> - npppd l2pt ipsecflowinfo is not MP safe

Does this mean the things we are discussing on the "Fix
ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
another issue.

> - the acquire SA feature is not MP safe
> - Hrvoje has seen a panic with sasync
Re: parallel ip forwarding
On Fri, Dec 03, 2021 at 08:35:45PM +0100, Alexander Bluhm wrote: > Note that IPsec still has the workaround to disable multiple queues. I think we can remove the ipsec_in_use workaround now. The IPsec path is protected with the kernel lock. There are some issues left: - npppd l2pt ipsecflowinfo is not MP safe - the acquire SA feature is not MP safe - Hrvoje has seen a panic with sasync If you use one of these, the diff below should trigger crashes faster. If you use only regular IPsec or forwarding, I hope it is stable. bluhm Index: net/if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.644 diff -u -p -r1.644 if.c --- net/if.c11 Nov 2021 10:03:10 - 1.644 +++ net/if.c23 Dec 2021 23:11:20 - @@ -108,6 +108,10 @@ #include #endif +#ifdef IPSEC +#include +#endif + #ifdef MPLS #include #endif @@ -237,7 +241,7 @@ int ifq_congestion; int netisr; -#defineNET_TASKQ 1 +#defineNET_TASKQ 4 struct taskq *nettqmp[NET_TASKQ]; struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); @@ -834,15 +838,10 @@ if_input_process(struct ifnet *ifp, stru * lists and the socket layer. */ - /* -* XXXSMP IPsec data structures are not ready to be accessed -* by multiple network threads in parallel. In this case -* use an exclusive lock. 
-*/ - NET_LOCK(); + NET_RLOCK_IN_SOFTNET(); while ((m = ml_dequeue(ml)) != NULL) (*ifp->if_input)(ifp, m); - NET_UNLOCK(); + NET_RUNLOCK_IN_SOFTNET(); } void @@ -899,6 +898,12 @@ if_netisr(void *unused) arpintr(); KERNEL_UNLOCK(); } +#endif + if (n & (1 << NETISR_IP)) + ipintr(); +#ifdef INET6 + if (n & (1 << NETISR_IPV6)) + ip6intr(); #endif #if NPPP > 0 if (n & (1 << NETISR_PPP)) { Index: net/if_ethersubr.c === RCS file: /cvs/src/sys/net/if_ethersubr.c,v retrieving revision 1.276 diff -u -p -r1.276 if_ethersubr.c --- net/if_ethersubr.c 19 Aug 2021 10:22:00 - 1.276 +++ net/if_ethersubr.c 23 Dec 2021 23:11:20 - @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct switch (af) { case AF_INET: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in arpresolve() */ error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); eh->ether_type = htons(ETHERTYPE_IP); @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct break; #ifdef INET6 case AF_INET6: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in nd6_resolve() */ error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); eh->ether_type = htons(ETHERTYPE_IPV6); @@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct break; #ifdef INET6 case AF_INET6: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in nd6_resolve() */ error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); break; #endif case AF_INET: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in arpresolve() */ error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); break; @@ -529,12 +541,14 @@ ether_input(struct ifnet *ifp, struct mb case ETHERTYPE_PPPOE: if (m->m_flags & (M_MCAST | M_BCAST)) goto dropanyway; + KERNEL_LOCK(); #ifdef PIPEX if (pipex_enable) { struct pipex_session *session; if ((session = pipex_pppoe_lookup_session(m)) != NULL) { pipex_pppoe_input(m, session); + 
KERNEL_UNLOCK(); return; } } @@ -543,6 +557,7 @@ ether_input(struct ifnet *ifp, struct mb pppoe_disc_input(m); else pppoe_data_input(m); + KERNEL_UNLOCK(); return; #endif #ifdef MPLS Index: net/ifq.c
Re: parallel ip forwarding
On Sat, Dec 04, 2021 at 10:41:02AM +0100, Hrvoje Popovski wrote: > r620-2# uvm_fault(0x8229d4e0, 0x137, 0, 2) -> e > kernel: page fault trap, code=0 > Stopped at ipsp_spd_lookup+0xa2f: movq%rax,0(%rcx) > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > 419237 67407 0 0x14000 0x2000 softnet > *157694 94649 0 0x14000 0x2002K softnet > ipsp_spd_lookup(fd80a4139800,2,14,2,0,0,5b815d966b14b44b,fd80a4139800) > at ipsp_spd_lookup+0xa2f Thanks a lot for the test. It crashes here: /home/bluhm/openbsd/cvs/src/sys/netinet/ip_spd.c:414 cdc: 48 03 0aadd(%rdx),%rcx *cdf: 48 89 01mov%rax,(%rcx) ce2: 49 8b 80 30 01 00 00mov0x130(%r8),%rax ce9: 49 8b 88 38 01 00 00mov0x138(%r8),%rcx cf0: 48 89 01mov%rax,(%rcx) cf3: 49 c7 80 38 01 00 00movq $0x,0x138(%r8) cfa: ff ff ff ff cfe: 49 c7 80 30 01 00 00movq $0x,0x130(%r8) d05: ff ff ff ff /home/bluhm/openbsd/cvs/src/sys/netinet/ip_spd.c:416 nomatchout: /* Cached TDB was not good. */ * TAILQ_REMOVE(>ipo_tdb->tdb_policy_head, ipo, ipo_tdb_next); tdb_unref(ipo->ipo_tdb); ipo->ipo_tdb = NULL; ipo->ipo_last_searched = 0; So mvs@'s concerns are correct, my IPsec workaround is not sufficient. I want to avoid another rwlock in the input path. Maybe I can throw some mutexes into IPsec to make it work. bluhm > ip_output_ipsec_lookup(fd80a4139800,14,0,800022c60228,0) at > ip_output_ipsec_lookup+0x4c > ip_output(fd80a4139800,0,800022c603e8,1,0,0,3ada3367ffb43fe1) at > ip_output+0x39d > ip_forward(fd80a4139800,80087048,fd8394511078,0) at > ip_forward+0x26a > ip_input_if(800022c60528,800022c60534,4,0,80087048) at > ip_input_if+0x353 > ipv4_input(80087048,fd80a4139800) at ipv4_input+0x39 > ether_input(80087048,fd80a4139800) at ether_input+0x3aa > if_input_process(80087048,800022c60618) at if_input_process+0x92 > ifiq_process(80087458) at ifiq_process+0x69 > taskq_thread(8002f080) at taskq_thread+0x81 > end trace frame: 0x0, count: 5
Re: parallel ip forwarding
On 4.12.2021. 10:41, Hrvoje Popovski wrote: > On 3.12.2021. 20:35, Alexander Bluhm wrote: >> Hi, >> >> After implementing MP safe refcounting for IPsec TDB, I wonder how >> stable my diff for forwarding on multiple CPU is. >> >> Note that IPsec still has the workaround to disable multiple queues. >> We do not have a mutex that protects the TDB fields yet. We have >> only fixed the memory management. >> >> My goal is to get real MP pressure on the lower part of the IP >> network stack. Only this will show remaining bugs. > Hi, > > with only plain forwarding it seems that everything works just fine, but > with ipsec i'm getting this panic. i will leave ddb console active if > something else will be needed Now with WITNESS r620-2# uvm_fault(0x8226ee78, 0x137, 0, 2) -> e kernel: page fault trap, code=0 Stopped at ipsp_spd_lookup+0xa2f: movq%rax,0(%rcx) TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 258332 43705 0 0x14000 0x2004 softnet *440260 99913 0 0x14000 0x2003K softnet 63340 24364 0 0x14000 0x2002 softnet ipsp_spd_lookup(fd80a46ed700,2,14,2,0,0,dcfe3c78989b6cd1,fd80a46ed700) at ipsp_spd_lookup+0xa2f ip_output_ipsec_lookup(fd80a46ed700,14,0,800022c71248,0) at ip_output_ipsec_lookup+0x4c ip_output(fd80a46ed700,0,800022c71408,1,0,0,6db6075421f27eb8) at ip_output+0x39d ip_forward(fd80a46ed700,80087048,fd83b4374cb0,0) at ip_forward+0x26a ip_input_if(800022c71548,800022c71554,4,0,80087048) at ip_input_if+0x353 ipv4_input(80087048,fd80a46ed700) at ipv4_input+0x39 ether_input(80087048,fd80a46ed700) at ether_input+0x3aa if_input_process(80087048,800022c71638) at if_input_process+0x92 ifiq_process(80086e00) at ifiq_process+0x69 taskq_thread(80030200) at taskq_thread+0x9f end trace frame: 0x0, count: 5 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{3}> ddb{3}> show reg rdi0 rsi 0x8145d018 rbp 0x800022c71150 rbx0 rdx 0x1 rcx0x137 rax 0x r80xfd839a222d30 r90x81453904 r10 0x8142f240 r11 0xebc2ae563992db8e r12 0x800022c71178 r130 r14 0x1 r150 rip 0x811f56cfipsp_spd_lookup+0xa2f cs 0x8 rflags 0x10213__ALIGN_SIZE+0xf213 rsp 0x800022c71040 ss 0x10 ipsp_spd_lookup+0xa2f: movq%rax,0(%rcx) ddb{3}> ddb{3}> show locks exclusive kernel_lock _lock r = 0 (0x822b3ce8) #0 witness_lock+0x333 #1 kpageflttrap+0x172 #2 kerntrap+0x91 #3 alltraps_kern_meltdown+0x7b #4 ipsp_spd_lookup+0xa2f #5 ip_output_ipsec_lookup+0x4c #6 ip_output+0x39d #7 ip_forward+0x26a #8 ip_input_if+0x353 #9 ipv4_input+0x39 #10 ether_input+0x3aa #11 if_input_process+0x92 #12 ifiq_process+0x69 #13 taskq_thread+0x9f #14 proc_trampoline+0x1c shared rwlock netlock r = 0 (0x8217f598) #0 witness_lock+0x333 #1 rw_enter+0x27f #2 if_input_process+0x75 #3 ifiq_process+0x69 #4 taskq_thread+0x9f #5 proc_trampoline+0x1c shared rwlock softnet r = 0 (0x80030270) #0 witness_lock+0x333 #1 taskq_thread+0x92 #2 proc_trampoline+0x1c ddb{3}> ddb{3}> show all locks Process 43705 (softnet) thread 0x8000e540 (258332) shared rwlock netlock r = 0 (0x8217f598) #0 witness_lock+0x333 #1 rw_enter+0x27f #2 if_input_process+0x75 #3 ifiq_process+0x69 #4 taskq_thread+0x9f #5 proc_trampoline+0x1c shared rwlock softnet r = 0 (0x80030370) #0 witness_lock+0x333 #1 taskq_thread+0x92 #2 proc_trampoline+0x1c Process 99913 (softnet) thread 0x8000e000 (440260) exclusive kernel_lock _lock r = 0 (0x822b3ce8) #0 witness_lock+0x333 #1 kpageflttrap+0x172 #2 kerntrap+0x91 #3 alltraps_kern_meltdown+0x7b #4 ipsp_spd_lookup+0xa2f #5 ip_output_ipsec_lookup+0x4c #6 ip_output+0x39d #7 ip_forward+0x26a #8 ip_input_if+0x353 #9 ipv4_input+0x39 #10 ether_input+0x3aa #11 if_input_process+0x92 #12 ifiq_process+0x69 #13 taskq_thread+0x9f #14 proc_trampoline+0x1c shared rwlock netlock r = 0 (0x8217f598) #0 witness_lock+0x333 #1 rw_enter+0x27f #2 if_input_process+0x75 #3 ifiq_process+0x69 #4 taskq_thread+0x9f #5 
proc_trampoline+0x1c shared rwlock softnet r = 0 (0x80030270) #0 witness_lock+0x333 #1 taskq_thread+0x92 #2 proc_trampoline+0x1c Process 39040 (softnet) thread 0x8000e2a0 (358256) shared rwlock softnet r = 0 (0x80030170) #0 witness_lock+0x333 #1
Re: parallel ip forwarding
On 4.12.2021. 10:41, Hrvoje Popovski wrote:
> Hi,
>
> with only plain forwarding it seems that everything works just fine, but
> with ipsec i'm getting this panic. i will leave ddb console active if
> something else will be needed

regarding performance ... without diff i'm getting 1.1Mpps and with this
diff i'm getting 2Mpps of plain forwarding ..
Re: parallel ip forwarding
On 3.12.2021. 20:35, Alexander Bluhm wrote: > Hi, > > After implementing MP safe refcounting for IPsec TDB, I wonder how > stable my diff for forwarding on multiple CPU is. > > Note that IPsec still has the workaround to disable multiple queues. > We do not have a mutex that protects the TDB fields yet. We have > only fixed the memory management. > > My goal is to get real MP pressure on the lower part of the IP > network stack. Only this will show remaining bugs. Hi, with only plain forwarding it seems that everything works just fine, but with ipsec i'm getting this panic. i will leave ddb console active if something else will be needed r620-2# uvm_fault(0x8229d4e0, 0x137, 0, 2) -> e kernel: page fault trap, code=0 Stopped at ipsp_spd_lookup+0xa2f: movq%rax,0(%rcx) TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 419237 67407 0 0x14000 0x2000 softnet *157694 94649 0 0x14000 0x2002K softnet ipsp_spd_lookup(fd80a4139800,2,14,2,0,0,5b815d966b14b44b,fd80a4139800) at ipsp_spd_lookup+0xa2f ip_output_ipsec_lookup(fd80a4139800,14,0,800022c60228,0) at ip_output_ipsec_lookup+0x4c ip_output(fd80a4139800,0,800022c603e8,1,0,0,3ada3367ffb43fe1) at ip_output+0x39d ip_forward(fd80a4139800,80087048,fd8394511078,0) at ip_forward+0x26a ip_input_if(800022c60528,800022c60534,4,0,80087048) at ip_input_if+0x353 ipv4_input(80087048,fd80a4139800) at ipv4_input+0x39 ether_input(80087048,fd80a4139800) at ether_input+0x3aa if_input_process(80087048,800022c60618) at if_input_process+0x92 ifiq_process(80087458) at ifiq_process+0x69 taskq_thread(8002f080) at taskq_thread+0x81 end trace frame: 0x0, count: 5 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{2}> ddb{2}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 84832 15199 94097 68 30x10 netlock isakmpd 94097 209788 1 0 30x80 netio isakmpd 15214 351483 1 0 30x100083 ttyin ksh 12905 119865 1 0 30x100098 poll cron 84388 106290 61660 95 30x100092 kqreadsmtpd 49054 464016 61660103 30x100092 kqreadsmtpd 24623 388600 61660 95 30x100092 kqreadsmtpd 25783 71877 61660 95 30x100092 kqreadsmtpd 44044 451515 61660 95 30x100092 kqreadsmtpd 74629 163500 61660 95 30x100092 kqreadsmtpd 61660 91566 1 0 30x100080 kqreadsmtpd 44941 436617 1 0 30x88 poll sshd 63116 236112 1 0 30x100080 poll ntpd 67685 455461 13478 83 30x100092 poll ntpd 13478 271069 1 83 30x100092 poll ntpd 4556 497201 93102 73 30x100090 kqreadsyslogd 93102 715 1 0 30x100082 netio syslogd 75807 17005 0 0 3 0x14200 bored smr 19891 296875 0 0 3 0x14200 pgzerozerothread 85805 493633 0 0 3 0x14200 aiodoned aiodoned 30502 291595 0 0 3 0x14200 syncerupdate 70346 429811 0 0 3 0x14200 cleaner cleaner 2149 206582 0 0 3 0x14200 reaperreaper 34585 457270 0 0 3 0x14200 pgdaemon pagedaemon 68729 397903 0 0 3 0x14200 usbtskusbtask 65328 269730 0 0 3 0x14200 usbatsk usbatsk 69367 498053 0 0 3 0x40014200 acpi0 acpi0 5156 405582 0 0 7 0x40014200idle5 53211 338561 0 0 7 0x40014200idle4 19386 131626 0 0 7 0x40014200idle3 22779 421169 0 0 3 0x40014200idle2 27190 363359 0 0 7 0x40014200idle1 1080 80711 0 0 3 0x14200 bored sensors 67407 419237 0 0 7 0x14200softnet 34617 10042 0 0 3 0x14200 netlock softnet 50133 404397 0 0 3 0x14200 netlock softnet *94649 157694 0 0 7 0x14200softnet 51030 188067 0 0 3 0x14200 bored systqmp 46228 294212 0 0 3 0x14200 bored systq 62154 451682 0 0 2 0x40014200softclock 34808 227380 0 0 3 0x40014200idle0 1 24458 0 0 30x82 wait init 0 0 -1 0 3 0x10200 scheduler swapper ddb{2}> trace /t 0t157694 80087048(1,8122bc39,800022c60550,800022c60528,80002 2c60534,4) at 0x80087048
Re: parallel ip forwarding
On Fri, Dec 03, 2021 at 08:35:45PM +0100, Alexander Bluhm wrote:
> Hi,
>
> After implementing MP safe refcounting for IPsec TDB, I wonder how
> stable my diff for forwarding on multiple CPU is.
>
> Note that IPsec still has the workaround to disable multiple queues.
> We do not have a mutex that protects the TDB fields yet.  We have
> only fixed the memory management.
>
> My goal is to get real MP pressure on the lower part of the IP
> network stack.  Only this will show remaining bugs.
>
> bluhm

This unlocked `ipsec_in_use' check doesn't work.  SADB_X_ADDFLOW and
the `ipsec_in_use' increment are netlock serialized.

> if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
> {
> 	struct mbuf *m;
> +	int exclusive_lock = 0;
>
> 	if (ml_empty(ml))
> 		return;
> @@ -834,15 +839,27 @@ if_input_process(struct ifnet *ifp, stru
> 	 * lists and the socket layer.
> 	 */
>
> +#ifdef IPSEC
> 	/*
> 	 * XXXSMP IPsec data structures are not ready to be accessed
> 	 * by multiple network threads in parallel.  In this case
> 	 * use an exclusive lock.
> 	 */
> -	NET_LOCK();
> +	if (ipsec_in_use)
> +		exclusive_lock = 1;
> +#endif

Concurrent SADB_X_ADDFLOW threads could add some SA between your
unlocked `ipsec_in_use' check and the following shared netlock.  This
time the `ipsec_in_use' modification is netlock serialized, but you
could add a new rwlock(9).  Since we are only interested in a non-null
value of `ipsec_in_use', it's enough to add it here and around the
SADB_X_ADDFLOW case within pfkeyv2_send().

I already posted the fix for this diff to your old [1] thread.

> +	if (exclusive_lock)
> +		NET_LOCK();
> +	else
> +		NET_RLOCK_IN_SOFTNET();
> +
> 	while ((m = ml_dequeue(ml)) != NULL)
> 		(*ifp->if_input)(ifp, m);

1.
https://marc.info/?l=openbsd-tech=162698721127298=2 > Index: net/if.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v > retrieving revision 1.644 > diff -u -p -r1.644 if.c > --- net/if.c 11 Nov 2021 10:03:10 - 1.644 > +++ net/if.c 3 Dec 2021 18:51:29 - > @@ -108,6 +108,10 @@ > #include > #endif > > +#ifdef IPSEC > +#include > +#endif > + > #ifdef MPLS > #include > #endif > @@ -237,7 +241,7 @@ int ifq_congestion; > > int netisr; > > -#define NET_TASKQ 1 > +#define NET_TASKQ 4 > struct taskq *nettqmp[NET_TASKQ]; > > struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); > @@ -814,6 +818,7 @@ void > if_input_process(struct ifnet *ifp, struct mbuf_list *ml) > { > struct mbuf *m; > + int exclusive_lock = 0; > > if (ml_empty(ml)) > return; > @@ -834,15 +839,27 @@ if_input_process(struct ifnet *ifp, stru >* lists and the socket layer. >*/ > > +#ifdef IPSEC > /* >* XXXSMP IPsec data structures are not ready to be accessed >* by multiple network threads in parallel. In this case >* use an exclusive lock. 
>*/ > - NET_LOCK(); > + if (ipsec_in_use) > + exclusive_lock = 1; > +#endif > + if (exclusive_lock) > + NET_LOCK(); > + else > + NET_RLOCK_IN_SOFTNET(); > + > while ((m = ml_dequeue(ml)) != NULL) > (*ifp->if_input)(ifp, m); > - NET_UNLOCK(); > + > + if (exclusive_lock) > + NET_UNLOCK(); > + else > + NET_RUNLOCK_IN_SOFTNET(); > } > > void > @@ -899,6 +916,12 @@ if_netisr(void *unused) > arpintr(); > KERNEL_UNLOCK(); > } > +#endif > + if (n & (1 << NETISR_IP)) > + ipintr(); > +#ifdef INET6 > + if (n & (1 << NETISR_IPV6)) > + ip6intr(); > #endif > #if NPPP > 0 > if (n & (1 << NETISR_PPP)) { > Index: net/if_ethersubr.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v > retrieving revision 1.276 > diff -u -p -r1.276 if_ethersubr.c > --- net/if_ethersubr.c19 Aug 2021 10:22:00 - 1.276 > +++ net/if_ethersubr.c3 Dec 2021 18:51:29 - > @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct > > switch (af) { > case AF_INET: > + KERNEL_LOCK(); > + /* XXXSMP there is a MP race in arpresolve() */ > error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); > + KERNEL_UNLOCK(); > if (error) > return (error); > eh->ether_type = htons(ETHERTYPE_IP); > @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct > break; > #ifdef INET6 > case AF_INET6: > + KERNEL_LOCK(); > + /* XXXSMP there is a MP race in nd6_resolve() */ >
parallel ip forwarding
Hi, After implementing MP safe refcounting for IPsec TDB, I wonder how stable my diff for forwarding on multiple CPU is. Note that IPsec still has the workaround to disable multiple queues. We do not have a mutex that protects the TDB fields yet. We have only fixed the memory management. My goal is to get real MP pressure on the lower part of the IP network stack. Only this will show remaining bugs. bluhm Index: net/if.c === RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v retrieving revision 1.644 diff -u -p -r1.644 if.c --- net/if.c11 Nov 2021 10:03:10 - 1.644 +++ net/if.c3 Dec 2021 18:51:29 - @@ -108,6 +108,10 @@ #include #endif +#ifdef IPSEC +#include +#endif + #ifdef MPLS #include #endif @@ -237,7 +241,7 @@ int ifq_congestion; int netisr; -#defineNET_TASKQ 1 +#defineNET_TASKQ 4 struct taskq *nettqmp[NET_TASKQ]; struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL); @@ -814,6 +818,7 @@ void if_input_process(struct ifnet *ifp, struct mbuf_list *ml) { struct mbuf *m; + int exclusive_lock = 0; if (ml_empty(ml)) return; @@ -834,15 +839,27 @@ if_input_process(struct ifnet *ifp, stru * lists and the socket layer. */ +#ifdef IPSEC /* * XXXSMP IPsec data structures are not ready to be accessed * by multiple network threads in parallel. In this case * use an exclusive lock. 
*/ - NET_LOCK(); + if (ipsec_in_use) + exclusive_lock = 1; +#endif + if (exclusive_lock) + NET_LOCK(); + else + NET_RLOCK_IN_SOFTNET(); + while ((m = ml_dequeue(ml)) != NULL) (*ifp->if_input)(ifp, m); - NET_UNLOCK(); + + if (exclusive_lock) + NET_UNLOCK(); + else + NET_RUNLOCK_IN_SOFTNET(); } void @@ -899,6 +916,12 @@ if_netisr(void *unused) arpintr(); KERNEL_UNLOCK(); } +#endif + if (n & (1 << NETISR_IP)) + ipintr(); +#ifdef INET6 + if (n & (1 << NETISR_IPV6)) + ip6intr(); #endif #if NPPP > 0 if (n & (1 << NETISR_PPP)) { Index: net/if_ethersubr.c === RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v retrieving revision 1.276 diff -u -p -r1.276 if_ethersubr.c --- net/if_ethersubr.c 19 Aug 2021 10:22:00 - 1.276 +++ net/if_ethersubr.c 3 Dec 2021 18:51:29 - @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct switch (af) { case AF_INET: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in arpresolve() */ error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); eh->ether_type = htons(ETHERTYPE_IP); @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct break; #ifdef INET6 case AF_INET6: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in nd6_resolve() */ error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); eh->ether_type = htons(ETHERTYPE_IPV6); @@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct break; #ifdef INET6 case AF_INET6: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in nd6_resolve() */ error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); break; #endif case AF_INET: + KERNEL_LOCK(); + /* XXXSMP there is a MP race in arpresolve() */ error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); + KERNEL_UNLOCK(); if (error) return (error); break; @@ -529,12 +541,14 @@ ether_input(struct ifnet *ifp, struct mb case ETHERTYPE_PPPOE: if (m->m_flags & (M_MCAST | M_BCAST)) goto dropanyway; + 
KERNEL_LOCK(); #ifdef PIPEX if (pipex_enable) { struct pipex_session *session; if ((session = pipex_pppoe_lookup_session(m)) != NULL) { pipex_pppoe_input(m, session); + KERNEL_UNLOCK();