On 27/06/22(Mon) 19:11, Alexander Bluhm wrote: > On Mon, Jun 27, 2022 at 11:49:23AM +0200, Alexander Bluhm wrote: > > On Sat, May 21, 2022 at 10:50:28PM +0300, Vitaliy Makkoveev wrote: > > > This diff looks good, except the re-check after kernel lock. It???s > > > supposed `rt??? could became inconsistent, right? But what stops to > > > make it inconsistent after first unlocked RTF_LLINFO flag check? > > > > > > I think this re-check should gone. > > > > I have copied the re-check from intenal genua code. I am not sure > > if it is really needed. We know from Hrvoje that the diff with > > re-check is stable. And we know that it crashes without kernel > > lock at all. > > > > I have talked with mpi@ about it. The main problem is that we have > > no write lock when we change RTF_LLINFO. Then rt_llinfo can get > > NULL or inconsistent. > > > > Plan is that I put some lock asserts into route add and delete. > > This helps to find the parts that modify RTF_LLINFO and rt_llinfo > > without exclusive lock. > > > > Maybe we need some kernel lock somewhere else. Or we want to use > > some ARP mutex. We could also add some comment and commit the diff > > that I have. We know that it is faster and stable. Pushing the > > kernel lock down or replacing it with something clever can always > > be done later. > > We need the re-check. I have tested it with a printf. It is > triggered by running arp -d in a loop while forwarding. > > The concurrent threads are these: > > rtrequest_delete(ffff8000246b7428,3,ffff800000775048,ffff8000246b7510,0) at > rtrequest_delete+0x67 > rtdeletemsg(fffffd8834a23550,ffff800000775048,0) at rtdeletemsg+0x1ad > rtrequest(b,ffff8000246b7678,3,ffff8000246b7718,0) at rtrequest+0x55c > rt_clone(ffff8000246b7780,ffff8000246b78f8,0) at rt_clone+0x73 > rtalloc_mpath(ffff8000246b78f8,fffffd8003169ad8,0) at rtalloc_mpath+0x4c > ip_forward(fffffd80b8cc7e00,ffff80000077d048,fffffd8834a230f0,0) at > ip_forward+0x137 > ip_input_if(ffff8000246b7a28,ffff8000246b7a34,4,0,ffff80000077d048) at > ip_input_if+0x353 > ipv4_input(ffff80000077d048,fffffd80b8cc7e00) at ipv4_input+0x39 > ether_input(ffff80000077d048,fffffd80b8cc7e00) at ether_input+0x3ad > if_input_process(ffff80000077d048,ffff8000246b7b18) at if_input_process+0x6f > ifiq_process(ffff80000077d458) at ifiq_process+0x69 > taskq_thread(ffff800000036080) at taskq_thread+0x100 > > rtrequest_delete(ffff8000246c8d08,3,ffff800000775048,ffff8000246c8df0,0) at > rtrequest_delete+0x67 > rtdeletemsg(fffffd8834a230f0,ffff800000775048,0) at rtdeletemsg+0x1ad > rtrequest(b,ffff8000246c8f58,3,ffff8000246c8ff8,0) at rtrequest+0x55c > rt_clone(ffff8000246c9060,ffff8000246c90b8,0) at rt_clone+0x73 > rtalloc_mpath(ffff8000246c90b8,fffffd8002c754d8,0) at rtalloc_mpath+0x4c > in_ouraddr(fffffd8094771b00,ffff80000077d048,ffff8000246c9138) at > in_ouraddr+0x84 > ip_input_if(ffff8000246c91d8,ffff8000246c91e4,4,0,ffff80000077d048) at > ip_input_if+0x1cd > ipv4_input(ffff80000077d048,fffffd8094771b00) at ipv4_input+0x39 > ether_input(ffff80000077d048,fffffd8094771b00) at ether_input+0x3ad > if_input_process(ffff80000077d048,ffff8000246c92c8) at if_input_process+0x6f > ifiq_process(ffff800000781400) at ifiq_process+0x69 > taskq_thread(ffff800000036200) at taskq_thread+0x100 > > I have added a comment why kernel lock protects us. I would like > to get this in. It has been tested, reduces the kernel lock and > is faster. A more clever lock can be done later. > > ok?
I don't understand how the KERNEL_LOCK() there prevents rtdeletemsg() from running. rtrequest_delete() seems completely broken it assumes it holds an exclusive lock. To "fix" arp the KERNEL_LOCK() should also be taken in RTM_DELETE and RTM_RESOLVE inside arp_rtrequest(). Or maybe around ifp->if_rtrequest() But it doesn't mean there isn't another problem in rtdeletemsg()... > Index: net/if_ethersubr.c > =================================================================== > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v > retrieving revision 1.281 > diff -u -p -r1.281 if_ethersubr.c > --- net/if_ethersubr.c 26 Jun 2022 21:19:53 -0000 1.281 > +++ net/if_ethersubr.c 27 Jun 2022 16:55:15 -0000 > @@ -221,10 +221,7 @@ ether_resolve(struct ifnet *ifp, struct > > switch (af) { > case AF_INET: > - KERNEL_LOCK(); > - /* XXXSMP there is a MP race in arpresolve() */ > error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); > - KERNEL_UNLOCK(); > if (error) > return (error); > eh->ether_type = htons(ETHERTYPE_IP); > @@ -285,10 +282,7 @@ ether_resolve(struct ifnet *ifp, struct > break; > #endif > case AF_INET: > - KERNEL_LOCK(); > - /* XXXSMP there is a MP race in arpresolve() */ > error = arpresolve(ifp, rt, m, dst, eh->ether_dhost); > - KERNEL_UNLOCK(); > if (error) > return (error); > break; > Index: netinet/if_ether.c > =================================================================== > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v > retrieving revision 1.249 > diff -u -p -r1.249 if_ether.c > --- netinet/if_ether.c 27 Jun 2022 12:47:07 -0000 1.249 > +++ netinet/if_ether.c 27 Jun 2022 16:55:15 -0000 > @@ -352,8 +352,7 @@ arpresolve(struct ifnet *ifp, struct rte > log(LOG_DEBUG, "%s: %s: route contains no arp information\n", > __func__, inet_ntop(AF_INET, &satosin(rt_key(rt))->sin_addr, > addr, sizeof(addr))); > - m_freem(m); > - return (EINVAL); > + goto bad; > } > > sdl = satosdl(rt->rt_gateway); > @@ -391,6 +390,21 @@ arpresolve(struct ifnet *ifp, struct rte > if (ifp->if_flags & (IFF_NOARP|IFF_STATICARP)) > goto bad; > > + KERNEL_LOCK(); > + /* > + * Re-check since we grab the kernel lock after the first check. > + * rtrequest_delete() can be called with shared netlock. From > + * there arp_rtrequest() is reached which touches RTF_LLINFO > + * and rt_llinfo. As this is called with kernel lock we grab the > + * kernel lock here and are safe. XXXSMP > + */ > + if (!ISSET(rt->rt_flags, RTF_LLINFO)) { > + KERNEL_UNLOCK(); > + goto bad; > + } > + la = (struct llinfo_arp *)rt->rt_llinfo; > + KASSERT(la != NULL); > + > /* > * There is an arptab entry, but no ethernet address > * response yet. Insert mbuf in hold queue if below limit > @@ -435,6 +449,7 @@ arpresolve(struct ifnet *ifp, struct rte > } > } > > + KERNEL_UNLOCK(); > return (EAGAIN); > > bad: