Re: arpresolve reduce kernel lock
> On 26 Apr 2023, at 15:30, Hrvoje Popovski wrote: > > On 26.4.2023. 12:15, Alexander Bluhm wrote: >> On Wed, Apr 26, 2023 at 11:17:32AM +0200, Alexander Bluhm wrote: >>> On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote: On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: > Hi, > > Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel > lock is only needed for changing the route rt_flags. > > Of course there is a race between checking and setting rt_flags. > But the other checks of the RTF_REJECT flags were without kernel > lock before. This does not cause trouble, the worst thing that may > happen is to wait another exprire time for ARP retry. My diff does > not make it worse, reading rt_flags and rt_expire is done without > lock anyway. > > The kernel lock is needed to change rt_flags. Testing without > KERNEL_LOCK() caused crashes. > Hi, I'm interesting is the system stable with the diff below? If so, we could avoid kernel lock in the arpresolve(). >>> I could not crash it. >> I was too fast. Just after writing this mail I restarted the test. > > Hi, > > my boxes are still ok with mvs@ diff even if I'm running arp -ad in loop. > Thanks for testing. It seems rtfree(rt->rt_parent) was called twice, so I’m confusing a little with this unlocked RTF_REJECT check.
Re: arpresolve reduce kernel lock
On 26.4.2023. 12:15, Alexander Bluhm wrote: > On Wed, Apr 26, 2023 at 11:17:32AM +0200, Alexander Bluhm wrote: >> On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote: >>> On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: Hi, Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel lock is only needed for changing the route rt_flags. Of course there is a race between checking and setting rt_flags. But the other checks of the RTF_REJECT flags were without kernel lock before. This does not cause trouble, the worst thing that may happen is to wait another exprire time for ARP retry. My diff does not make it worse, reading rt_flags and rt_expire is done without lock anyway. The kernel lock is needed to change rt_flags. Testing without KERNEL_LOCK() caused crashes. >>> Hi, >>> >>> I'm interesting is the system stable with the diff below? If so, we >>> could avoid kernel lock in the arpresolve(). >> I could not crash it. > I was too fast. Just after writing this mail I restarted the test. Hi, my boxes are still ok with mvs@ diff even if I'm running arp -ad in loop.
Re: arpresolve reduce kernel lock
On Wed, Apr 26, 2023 at 11:17:32AM +0200, Alexander Bluhm wrote: > On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote: > > On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: > > > Hi, > > > > > > Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel > > > lock is only needed for changing the route rt_flags. > > > > > > Of course there is a race between checking and setting rt_flags. > > > But the other checks of the RTF_REJECT flags were without kernel > > > lock before. This does not cause trouble, the worst thing that may > > > happen is to wait another exprire time for ARP retry. My diff does > > > not make it worse, reading rt_flags and rt_expire is done without > > > lock anyway. > > > > > > The kernel lock is needed to change rt_flags. Testing without > > > KERNEL_LOCK() caused crashes. > > > > > > > Hi, > > > > I'm interesting is the system stable with the diff below? If so, we > > could avoid kernel lock in the arpresolve(). > > I could not crash it. I was too fast. Just after writing this mail I restarted the test. [0] 0:arp- 1:ksh*"ot31.obsd-lab.genua.d" 12:00 26-Apr-23ESC[mESC(BESC[23;18Hpanic: pool_do_get: art_node free list modified: page 0xfd8747128000; item addr 0xfd8747128410; offset 0x0=0x182f4660f2a7188a != 0x182f4660f2a71889 Stopped at db_enter+0x14: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 45805 80626 0 0x14000 0x2003 reaper 353816 99629 0 0x14000 0x2001 softnet 487701 10647 0 0x14000 0x2002 softnet 152789 43620 0 0x14000 0x2007 softnet *356742 68683 0 0x14000 0x2005 softnet db_enter() at db_enter+0x14 panic(8213f5d0) at panic+0xc3 pool_do_get(824ae060,a,8000247b71b4) at pool_do_get+0x321 pool_get(824ae060,a) at pool_get+0x9a art_get(827ceac0,20) at art_get+0x30 rtable_insert(0,827ceac0,0,8000247b72f0,3,fd8745e4a948) at rtab le_insert+0x1a2 rtrequest(b,8000247b73f8,3,8000247b7498,0) at rtrequest+0x613 rt_clone(8000247b7500,8000247b7558,0) at rt_clone+0x77 rtalloc_mpath(8000247b7558,fd800369aad8,0) at rtalloc_mpath+0x50 in_ouraddr(fd80a94fcd00,8077e048,8000247b75d8) at in_ouraddr+0x 88 ip_input_if(8000247b7678,8000247b7684,4,0,8077e048) at ip_input ipv4_input(8077e048,fd80a94fcd00) at ipv4_input+0x3d ether_input(8077e048,fd80a94fcd00) at ether_input+0x3b5 if_input_process(8077e048,8000247b7768) at if_input_process+0x6f end trace frame: 0x8000247b77b0, count: 0 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{5}> show panic *cpu5: pool_do_get: art_node free list modified: page 0xfd8747128000; item a ddr 0xfd8747128410; offset 0x0=0x182f4660f2a7188a != 0x182f4660f2a71889 ddb{5}> show register rdi0 rsi 0x14 rbp 0x8000247b7060 rbx0 rdx 0xc000 rcx0x286 rax 0x9c r8 0x101010101010101 r9 0 r10 0xcdd678d0954ec026 r11 0x92611e3f4c85263e r12 0x80002252e990 r130 r140 r15 0x8213f5d0cy_pio_rec+0x1ea86 rip 0x81b0f124db_enter+0x14 cs 0x8 rflags 0x282 rsp 0x8000247b7060 ss 0x10 db_enter+0x14: popq%rbp ddb{5}> trace db_enter() at db_enter+0x14 panic(8213f5d0) at panic+0xc3 pool_do_get(824ae060,a,8000247b71b4) at pool_do_get+0x321 pool_get(824ae060,a) at pool_get+0x9a art_get(827ceac0,20) at art_get+0x30 rtable_insert(0,827ceac0,0,8000247b72f0,3,fd8745e4a948) at rtab le_insert+0x1a2 rtrequest(b,8000247b73f8,3,8000247b7498,0) at rtrequest+0x613 rt_clone(8000247b7500,8000247b7558,0) at rt_clone+0x77 rtalloc_mpath(8000247b7558,fd800369aad8,0) at rtalloc_mpath+0x50 in_ouraddr(fd80a94fcd00,8077e048,8000247b75d8) at in_ouraddr+0x 88 ip_input_if(8000247b7678,8000247b7684,4,0,8077e048) at ip_input _if+0x1f0 ipv4_input(8077e048,fd80a94fcd00) at ipv4_input+0x3d ether_input(8077e048,fd80a94fcd00) at ether_input+0x3b5 if_input_process(8077e048,8000247b7768) at if_input_process+0x6f ifiq_process(80782400) at ifiq_process+0x75 taskq_thread(80036000) at taskq_thread+0x100 end trace frame: 0x0, count: -16 ddb{5}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 3208
Re: arpresolve reduce kernel lock
On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote: > On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: > > Hi, > > > > Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel > > lock is only needed for changing the route rt_flags. > > > > Of course there is a race between checking and setting rt_flags. > > But the other checks of the RTF_REJECT flags were without kernel > > lock before. This does not cause trouble, the worst thing that may > > happen is to wait another exprire time for ARP retry. My diff does > > not make it worse, reading rt_flags and rt_expire is done without > > lock anyway. > > > > The kernel lock is needed to change rt_flags. Testing without > > KERNEL_LOCK() caused crashes. > > > > Hi, > > I'm interesting is the system stable with the diff below? If so, we > could avoid kernel lock in the arpresolve(). I could not crash it. Basically I run while :; do arp -nd 10.6.31.33; arp -nd 10.6.16.36; done on the router while forwarding between the two addresses. So ARP routes are constantly created and deleted. But I think the diff is not correct. While an update where rt_flags is changed with kernel lock, cannot sneak into this atomic operation, the ARP code can still corrupt rt_flags changes holding the kernel lock. Of course my stress test only triggers on side of the picture, the problem is not exploited. So if you just wanted to know, whether this kind of atomic operation works, I would say yes. To make it correct all rt_flags updates would have to be changed in that way. I doubt that this is the way to go. Maybe we could split rt_flags in multiple fields each with its own locking strategy. bluhm > Index: sys/netinet/if_ether.c > === > RCS file: /cvs/src/sys/netinet/if_ether.c,v > retrieving revision 1.263 > diff -u -p -r1.263 if_ether.c > --- sys/netinet/if_ether.c25 Apr 2023 16:24:25 - 1.263 > +++ sys/netinet/if_ether.c25 Apr 2023 20:55:08 - > @@ -462,14 +462,20 @@ arpresolve(struct ifnet *ifp, struct rte > mtx_leave(_mtx); > > if (reject == RTF_REJECT && !ISSET(rt->rt_flags, RTF_REJECT)) { > - KERNEL_LOCK(); > - SET(rt->rt_flags, RTF_REJECT); > - KERNEL_UNLOCK(); > + u_int flags; > + > + do { > + flags = rt->rt_flags; > + } while (atomic_cas_uint(>rt_flags, flags, > + flags | RTF_REJECT) != flags); > } > if (reject == ~RTF_REJECT && ISSET(rt->rt_flags, RTF_REJECT)) { > - KERNEL_LOCK(); > - CLR(rt->rt_flags, RTF_REJECT); > - KERNEL_UNLOCK(); > + u_int flags; > + > + do { > + flags = rt->rt_flags; > + } while (atomic_cas_uint(>rt_flags, flags, > + flags & ~RTF_REJECT) != flags); > } > if (refresh) > arprequest(ifp, (rt->rt_ifa->ifa_addr)->sin_addr.s_addr,
Re: arpresolve reduce kernel lock
On 25.4.2023. 22:57, Vitaliy Makkoveev wrote: > On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: >> Hi, >> >> Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel >> lock is only needed for changing the route rt_flags. >> >> Of course there is a race between checking and setting rt_flags. >> But the other checks of the RTF_REJECT flags were without kernel >> lock before. This does not cause trouble, the worst thing that may >> happen is to wait another exprire time for ARP retry. My diff does >> not make it worse, reading rt_flags and rt_expire is done without >> lock anyway. >> >> The kernel lock is needed to change rt_flags. Testing without >> KERNEL_LOCK() caused crashes. >> > Hi, > > I'm interesting is the system stable with the diff below? If so, we > could avoid kernel lock in the arpresolve(). Hi, I've put that diff on production boxes and in lab and for now firewalls are stable. Let's see after few more hours.
Re: arpresolve reduce kernel lock
On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: > Hi, > > Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel > lock is only needed for changing the route rt_flags. > > Of course there is a race between checking and setting rt_flags. > But the other checks of the RTF_REJECT flags were without kernel > lock before. This does not cause trouble, the worst thing that may > happen is to wait another exprire time for ARP retry. My diff does > not make it worse, reading rt_flags and rt_expire is done without > lock anyway. > > The kernel lock is needed to change rt_flags. Testing without > KERNEL_LOCK() caused crashes. > Hi, I'm interesting is the system stable with the diff below? If so, we could avoid kernel lock in the arpresolve(). Index: sys/netinet/if_ether.c === RCS file: /cvs/src/sys/netinet/if_ether.c,v retrieving revision 1.263 diff -u -p -r1.263 if_ether.c --- sys/netinet/if_ether.c 25 Apr 2023 16:24:25 - 1.263 +++ sys/netinet/if_ether.c 25 Apr 2023 20:55:08 - @@ -462,14 +462,20 @@ arpresolve(struct ifnet *ifp, struct rte mtx_leave(_mtx); if (reject == RTF_REJECT && !ISSET(rt->rt_flags, RTF_REJECT)) { - KERNEL_LOCK(); - SET(rt->rt_flags, RTF_REJECT); - KERNEL_UNLOCK(); + u_int flags; + + do { + flags = rt->rt_flags; + } while (atomic_cas_uint(>rt_flags, flags, + flags | RTF_REJECT) != flags); } if (reject == ~RTF_REJECT && ISSET(rt->rt_flags, RTF_REJECT)) { - KERNEL_LOCK(); - CLR(rt->rt_flags, RTF_REJECT); - KERNEL_UNLOCK(); + u_int flags; + + do { + flags = rt->rt_flags; + } while (atomic_cas_uint(>rt_flags, flags, + flags & ~RTF_REJECT) != flags); } if (refresh) arprequest(ifp, (rt->rt_ifa->ifa_addr)->sin_addr.s_addr,
Re: arpresolve reduce kernel lock
On Tue, Apr 25, 2023 at 04:15:49PM +, Klemens Nanni wrote: > A clearer version of this diff would use two new bools `expired' and `reject' > rather than a ternary `reject', but that can be polished and retested later. Or simpler even, use new `expired' and existing `refresh'. Refresh and reject are mutually exclusive actions. This is easier to grok, imho. Index: if_ether.c === RCS file: /cvs/src/sys/netinet/if_ether.c,v retrieving revision 1.263 diff -u -p -r1.263 if_ether.c --- if_ether.c 25 Apr 2023 16:24:25 - 1.263 +++ if_ether.c 25 Apr 2023 16:54:37 - @@ -339,7 +339,7 @@ arpresolve(struct ifnet *ifp, struct rte struct rtentry *rt = NULL; char addr[INET_ADDRSTRLEN]; time_t uptime; - int refresh = 0, reject = 0; + int refresh = 0, expired = 0; if (m->m_flags & M_BCAST) { /* broadcast */ memcpy(desten, etherbroadcastaddr, sizeof(etherbroadcastaddr)); @@ -444,13 +444,12 @@ arpresolve(struct ifnet *ifp, struct rte } #endif if (rt->rt_expire) { - reject = ~RTF_REJECT; + expired = 1; if (la->la_asked == 0 || rt->rt_expire != uptime) { rt->rt_expire = uptime; if (la->la_asked++ < arp_maxtries) refresh = 1; else { - reject = RTF_REJECT; rt->rt_expire += arpt_down; la->la_asked = 0; la->la_refreshed = 0; @@ -461,19 +460,23 @@ arpresolve(struct ifnet *ifp, struct rte } mtx_leave(_mtx); - if (reject == RTF_REJECT && !ISSET(rt->rt_flags, RTF_REJECT)) { - KERNEL_LOCK(); - SET(rt->rt_flags, RTF_REJECT); - KERNEL_UNLOCK(); - } - if (reject == ~RTF_REJECT && ISSET(rt->rt_flags, RTF_REJECT)) { - KERNEL_LOCK(); - CLR(rt->rt_flags, RTF_REJECT); - KERNEL_UNLOCK(); - } - if (refresh) - arprequest(ifp, (rt->rt_ifa->ifa_addr)->sin_addr.s_addr, - (dst)->sin_addr.s_addr, ac->ac_enaddr); + if (expired) { + if (refresh) { + KERNEL_LOCK(); + CLR(rt->rt_flags, RTF_REJECT); + KERNEL_UNLOCK(); + } else { + KERNEL_LOCK(); + SET(rt->rt_flags, RTF_REJECT); + KERNEL_UNLOCK(); + + arprequest(ifp, + (rt->rt_ifa->ifa_addr)->sin_addr.s_addr, + (dst)->sin_addr.s_addr, + ac->ac_enaddr); + } + } + return (EAGAIN); bad:
Re: arpresolve reduce kernel lock
On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: > Hi, > > Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel > lock is only needed for changing the route rt_flags. > > Of course there is a race between checking and setting rt_flags. > But the other checks of the RTF_REJECT flags were without kernel > lock before. This does not cause trouble, the worst thing that may > happen is to wait another exprire time for ARP retry. My diff does > not make it worse, reading rt_flags and rt_expire is done without > lock anyway. This is progress in the right direction, I think. The XXXSMP comment is incorrect and confusing. > > The kernel lock is needed to change rt_flags. Testing without > KERNEL_LOCK() caused crashes. Still not nice, reducing kernel lock scope helps. With this diff in, we couldn't break the kernel with further unlock diffs. A clearer version of this diff would use two new bools `expired' and `reject' rather than a ternary `reject', but that can be polished and retested later. OK kn > > ok? > > bluhm > > Index: netinet/if_ether.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v > retrieving revision 1.262 > diff -u -p -r1.262 if_ether.c > --- netinet/if_ether.c12 Apr 2023 16:14:42 - 1.262 > +++ netinet/if_ether.c24 Apr 2023 14:44:51 - > @@ -339,7 +339,7 @@ arpresolve(struct ifnet *ifp, struct rte > struct rtentry *rt = NULL; > char addr[INET_ADDRSTRLEN]; > time_t uptime; > - int refresh = 0; > + int refresh = 0, reject = 0; > > if (m->m_flags & M_BCAST) { /* broadcast */ > memcpy(desten, etherbroadcastaddr, sizeof(etherbroadcastaddr)); > @@ -411,16 +411,9 @@ arpresolve(struct ifnet *ifp, struct rte > if (ifp->if_flags & (IFF_NOARP|IFF_STATICARP)) > goto bad; > > - KERNEL_LOCK(); > - /* > - * Re-check since we grab the kernel lock after the first check. > - * rtrequest_delete() can be called with shared netlock. From > - * there arp_rtrequest() is reached which touches RTF_LLINFO > - * and rt_llinfo. As this is called with kernel lock we grab the > - * kernel lock here and are safe. XXXSMP > - */ > + mtx_enter(_mtx); > if (!ISSET(rt->rt_flags, RTF_LLINFO)) { > - KERNEL_UNLOCK(); > + mtx_leave(_mtx); > goto bad; > } > la = (struct llinfo_arp *)rt->rt_llinfo; > @@ -451,13 +444,13 @@ arpresolve(struct ifnet *ifp, struct rte > } > #endif > if (rt->rt_expire) { > - rt->rt_flags &= ~RTF_REJECT; > + reject = ~RTF_REJECT; > if (la->la_asked == 0 || rt->rt_expire != uptime) { > rt->rt_expire = uptime; > if (la->la_asked++ < arp_maxtries) > refresh = 1; > else { > - rt->rt_flags |= RTF_REJECT; > + reject = RTF_REJECT; > rt->rt_expire += arpt_down; > la->la_asked = 0; > la->la_refreshed = 0; > @@ -466,8 +459,18 @@ arpresolve(struct ifnet *ifp, struct rte > } > } > } > + mtx_leave(_mtx); > > - KERNEL_UNLOCK(); > + if (reject == RTF_REJECT && !ISSET(rt->rt_flags, RTF_REJECT)) { > + KERNEL_LOCK(); > + SET(rt->rt_flags, RTF_REJECT); > + KERNEL_UNLOCK(); > + } > + if (reject == ~RTF_REJECT && ISSET(rt->rt_flags, RTF_REJECT)) { > + KERNEL_LOCK(); > + CLR(rt->rt_flags, RTF_REJECT); > + KERNEL_UNLOCK(); > + } > if (refresh) > arprequest(ifp, (rt->rt_ifa->ifa_addr)->sin_addr.s_addr, > (dst)->sin_addr.s_addr, ac->ac_enaddr); >
arpresolve reduce kernel lock
Hi, Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel lock is only needed for changing the route rt_flags. Of course there is a race between checking and setting rt_flags. But the other checks of the RTF_REJECT flags were without kernel lock before. This does not cause trouble, the worst thing that may happen is to wait another exprire time for ARP retry. My diff does not make it worse, reading rt_flags and rt_expire is done without lock anyway. The kernel lock is needed to change rt_flags. Testing without KERNEL_LOCK() caused crashes. ok? bluhm Index: netinet/if_ether.c === RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v retrieving revision 1.262 diff -u -p -r1.262 if_ether.c --- netinet/if_ether.c 12 Apr 2023 16:14:42 - 1.262 +++ netinet/if_ether.c 24 Apr 2023 14:44:51 - @@ -339,7 +339,7 @@ arpresolve(struct ifnet *ifp, struct rte struct rtentry *rt = NULL; char addr[INET_ADDRSTRLEN]; time_t uptime; - int refresh = 0; + int refresh = 0, reject = 0; if (m->m_flags & M_BCAST) { /* broadcast */ memcpy(desten, etherbroadcastaddr, sizeof(etherbroadcastaddr)); @@ -411,16 +411,9 @@ arpresolve(struct ifnet *ifp, struct rte if (ifp->if_flags & (IFF_NOARP|IFF_STATICARP)) goto bad; - KERNEL_LOCK(); - /* -* Re-check since we grab the kernel lock after the first check. -* rtrequest_delete() can be called with shared netlock. From -* there arp_rtrequest() is reached which touches RTF_LLINFO -* and rt_llinfo. As this is called with kernel lock we grab the -* kernel lock here and are safe. XXXSMP -*/ + mtx_enter(_mtx); if (!ISSET(rt->rt_flags, RTF_LLINFO)) { - KERNEL_UNLOCK(); + mtx_leave(_mtx); goto bad; } la = (struct llinfo_arp *)rt->rt_llinfo; @@ -451,13 +444,13 @@ arpresolve(struct ifnet *ifp, struct rte } #endif if (rt->rt_expire) { - rt->rt_flags &= ~RTF_REJECT; + reject = ~RTF_REJECT; if (la->la_asked == 0 || rt->rt_expire != uptime) { rt->rt_expire = uptime; if (la->la_asked++ < arp_maxtries) refresh = 1; else { - rt->rt_flags |= RTF_REJECT; + reject = RTF_REJECT; rt->rt_expire += arpt_down; la->la_asked = 0; la->la_refreshed = 0; @@ -466,8 +459,18 @@ arpresolve(struct ifnet *ifp, struct rte } } } + mtx_leave(_mtx); - KERNEL_UNLOCK(); + if (reject == RTF_REJECT && !ISSET(rt->rt_flags, RTF_REJECT)) { + KERNEL_LOCK(); + SET(rt->rt_flags, RTF_REJECT); + KERNEL_UNLOCK(); + } + if (reject == ~RTF_REJECT && ISSET(rt->rt_flags, RTF_REJECT)) { + KERNEL_LOCK(); + CLR(rt->rt_flags, RTF_REJECT); + KERNEL_UNLOCK(); + } if (refresh) arprequest(ifp, (rt->rt_ifa->ifa_addr)->sin_addr.s_addr, (dst)->sin_addr.s_addr, ac->ac_enaddr);