> On 18 May 2022, at 18:31, Alexander Bluhm <[email protected]> wrote:
>
> Hi,
>
> For parallel IP forwarding I had put kernel lock around arpresolve()
> as a quick workaround for crashes. Moving the kernel lock inside
> the function makes the hot path lock free. I see slight prerformace
> increase in my test and no lock contention in kstack flamegraph.
>
> http://bluhm.genua.de/perform/results/latest/2022-05-16T00%3A00%3A00Z/btrace/tcpbench_-S1000000_-t10_-n100_10.3.45.35-btrace-kstack.0.svg
> http://bluhm.genua.de/perform/results/latest/patch-sys-arpresolve-kernel-lock.1/btrace/tcpbench_-S1000000_-t10_-n100_10.3.45.35-btrace-kstack.0.svg
>
> Search for kernel_lock. Matched goes from 0.6% to 0.2%
>
> We are running such a diff in our genua code for a while. I think
> route flags need more love. I doubt that all flags and fields are
> consistent when run on multiple CPU. But this diff does not make
> it worse and increases MP pressure.
>
> ok?
>
Why do you re-check `la’ after you grabbed kernel lock? We always
clean RTF_LLINFO flag with exclusive net lock held, so `la’ can’t
be killed concurrently since you got it.
> bluhm
>
> Index: net/if_ethersubr.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
> retrieving revision 1.279
> diff -u -p -r1.279 if_ethersubr.c
> --- net/if_ethersubr.c 22 Apr 2022 12:10:57 -0000 1.279
> +++ net/if_ethersubr.c 17 May 2022 22:26:22 -0000
> @@ -221,10 +221,7 @@ ether_resolve(struct ifnet *ifp, struct
>
> switch (af) {
> case AF_INET:
> - KERNEL_LOCK();
> - /* XXXSMP there is a MP race in arpresolve() */
> error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
> - KERNEL_UNLOCK();
> if (error)
> return (error);
> eh->ether_type = htons(ETHERTYPE_IP);
> @@ -285,10 +282,7 @@ ether_resolve(struct ifnet *ifp, struct
> break;
> #endif
> case AF_INET:
> - KERNEL_LOCK();
> - /* XXXSMP there is a MP race in arpresolve() */
> error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
> - KERNEL_UNLOCK();
> if (error)
> return (error);
> break;
> Index: netinet/if_ether.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v
> retrieving revision 1.248
> diff -u -p -r1.248 if_ether.c
> --- netinet/if_ether.c 28 Apr 2021 21:21:44 -0000 1.248
> +++ netinet/if_ether.c 17 May 2022 22:26:22 -0000
> @@ -347,8 +347,7 @@ arpresolve(struct ifnet *ifp, struct rte
> log(LOG_DEBUG, "%s: %s: route contains no arp information\n",
> __func__, inet_ntop(AF_INET, &satosin(rt_key(rt))->sin_addr,
> addr, sizeof(addr)));
> - m_freem(m);
> - return (EINVAL);
> + goto bad;
> }
>
> sdl = satosdl(rt->rt_gateway);
> @@ -386,6 +385,15 @@ arpresolve(struct ifnet *ifp, struct rte
> if (ifp->if_flags & (IFF_NOARP|IFF_STATICARP))
> goto bad;
>
> + KERNEL_LOCK();
> + /* re-check since we grab the KERNEL lock after the first check */
> + if (!ISSET(rt->rt_flags, RTF_LLINFO)) {
> + KERNEL_UNLOCK();
> + goto bad;
> + }
> + la = (struct llinfo_arp *)rt->rt_llinfo;
> + KASSERT(la != NULL);
> +
> /*
> * There is an arptab entry, but no ethernet address
> * response yet. Insert mbuf in hold queue if below limit
> @@ -430,6 +438,7 @@ arpresolve(struct ifnet *ifp, struct rte
> }
> }
>
> + KERNEL_UNLOCK();
> return (EAGAIN);
>
> bad:
>