Module Name:	src
Committed By:	martin
Date:		Tue Sep 24 18:27:10 UTC 2019
Modified Files:
	src/sys/net [netbsd-8]: if.c if.h if_gif.c if_gif.h if_ipsec.c
	    if_ipsec.h if_l2tp.c if_l2tp.h route.c route.h
	src/sys/netinet [netbsd-8]: in_gif.c in_l2tp.c ip_input.c wqinput.c
	src/sys/netinet6 [netbsd-8]: in6_gif.c in6_l2tp.c ip6_forward.c
	    ip6_input.c
	src/sys/netipsec [netbsd-8]: ipsec_output.c ipsecif.c key.c

Log Message:
Pull up following revision(s) (requested by knakahara in ticket #1385):

	sys/net/if.c				1.461
	sys/net/if.h				1.277
	sys/net/if_gif.c			1.149
	sys/net/if_gif.h			1.33
	sys/net/if_ipsec.c			1.19,1.20,1.24
	sys/net/if_ipsec.h			1.5
	sys/net/if_l2tp.c			1.33,1.36-1.39
	sys/net/if_l2tp.h			1.7,1.8
	sys/net/route.c				1.220,1.221
	sys/net/route.h				1.125
	sys/netinet/in_gif.c			1.95
	sys/netinet/in_l2tp.c			1.17
	sys/netinet/ip_input.c			1.391,1.392
	sys/netinet/wqinput.c			1.6
	sys/netinet6/in6_gif.c			1.94
	sys/netinet6/in6_l2tp.c			1.18
	sys/netinet6/ip6_forward.c		1.97
	sys/netinet6/ip6_input.c		1.210,1.211
	sys/netipsec/ipsec_output.c		1.82,1.83 (patched)
	sys/netipsec/ipsecif.c			1.12,1.13,1.15,1.17 (patched)
	sys/netipsec/key.c			1.259,1.260

ipsecif(4): support an input drop packet counter.

ipsecif(4) should not increment the drop counter for errors that are not
related to if_snd. Pointed out by ozaki-r@n.o, thanks.

Remove unnecessary addresses in the PF_KEY message.

MOBIKE Extensions for PF_KEY, draft-schilcher-mobike-pfkey-extension-01.txt,
says:
====================
5.  SPD Update
// snip
SADB_X_SPDADD:
// snip
sadb_x_ipsecrequest_reqid:
	An ID for that SA can be passed to the kernel in the
	sadb_x_ipsecrequest_reqid field.

If tunnel mode is specified, the sadb_x_ipsecrequest structure is
followed by two sockaddr structures that define the tunnel endpoint
addresses.  In the case that transport mode is used, no additional
addresses are specified.
====================
see: https://tools.ietf.org/html/draft-schilcher-mobike-pfkey-extension-01

ipsecif(4) uses transport mode, so it should not add addresses.
ipsecif(4) supports multiple peers behind the same NAPT. E.g. ipsec0
connects NetBSD_A and NetBSD_B, and ipsec1 connects NetBSD_A and
NetBSD_C in the following figure.

                                       +----------+
                                  +----| NetBSD_B |
+----------+           +------+   |    +----------+
| NetBSD_A |--- ... ---| NAPT |---+
+----------+           +------+   |    +----------+
                                  +----| NetBSD_C |
                                       +----------+

Add ATF later.

l2tp(4): fix the output bytes counter. Pointed out by k-goda@IIJ, thanks.

Remove a variable which is no longer used.

l2tp: initialize mowner variables for MBUFTRACE

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by
the piece to users. If the storages run short, percpu(9) enlarges them by
allocating new larger memory areas, replacing the old ones with them and
destroying the old ones. A percpu storage referenced by a pointer gotten
via percpu_getref can be destroyed by this mechanism after a running
thread sleeps, even if percpu_putref has not been called.

Using a rtcache, i.e., packet processing, typically involves sleepable
operations such as taking an rwlock, so we must avoid dereferencing a
rtcache that is directly stored in a percpu storage during packet
processing. Address this situation by having just a pointer to a rtcache
in the percpu storage instead.

Reviewed by knakahara@ and yamaguchi@

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by
the piece to users. If the storages run short, percpu(9) enlarges them by
allocating new larger memory areas, replacing the old ones with them and
destroying the old ones. A percpu storage referenced by a pointer gotten
via percpu_getref can be destroyed by this mechanism after a running
thread sleeps, even if percpu_putref has not been called.

Input handlers of wqinput normally involve sleepable operations, so we
must avoid dereferencing a percpu data (struct wqinput_worklist) after
executing an input handler.
Address this situation by having just a pointer to the data in the
percpu storage instead.

Reviewed by knakahara@ and yamaguchi@

Add missing #include <sys/kmem.h>

Divide the Tx context of l2tp(4) to improve performance.

It seems the l2tp(4) call path is too long for the instruction cache, so
dividing the l2tp(4) Tx context improves CPU use efficiency. After this
commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

Apply some missing changes lost in the previous commit.

Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by
the piece to users. If the storages run short, percpu(9) enlarges them by
allocating new larger memory areas, replacing the old ones with them and
destroying the old ones. A percpu storage referenced by a pointer gotten
via percpu_getref can be destroyed by this mechanism after a running
thread sleeps, even if percpu_putref has not been called.

Using a rtcache, i.e., packet processing, typically involves sleepable
operations such as taking an rwlock, so we must avoid dereferencing a
rtcache that is directly stored in a percpu storage during packet
processing. Address this situation by having just a pointer to a rtcache
in the percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

l2tp(4): avoid having struct ifqueue directly in a percpu storage.

percpu(9) has a certain memory storage for each CPU and provides it by
the piece to users. If the storages run short, percpu(9) enlarges them by
allocating new larger memory areas, replacing the old ones with them and
destroying the old ones. A percpu storage referenced by a pointer gotten
via percpu_getref can be destroyed by this mechanism after a running
thread sleeps, even if percpu_putref has not been called.

Tx processing of l2tp(4) normally involves sleepable operations, so we
must avoid dereferencing a percpu data (struct ifqueue) after executing
Tx processing.
Address this situation by having just a pointer to the data in the
percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@


To generate a diff of this commit:
cvs rdiff -u -r1.394.2.17 -r1.394.2.18 src/sys/net/if.c
cvs rdiff -u -r1.239.2.7 -r1.239.2.8 src/sys/net/if.h
cvs rdiff -u -r1.126.2.14 -r1.126.2.15 src/sys/net/if_gif.c
cvs rdiff -u -r1.25.8.4 -r1.25.8.5 src/sys/net/if_gif.h
cvs rdiff -u -r1.3.2.11 -r1.3.2.12 src/sys/net/if_ipsec.c
cvs rdiff -u -r1.1.2.4 -r1.1.2.5 src/sys/net/if_ipsec.h
cvs rdiff -u -r1.11.2.10 -r1.11.2.11 src/sys/net/if_l2tp.c
cvs rdiff -u -r1.2.2.3 -r1.2.2.4 src/sys/net/if_l2tp.h
cvs rdiff -u -r1.194.6.13 -r1.194.6.14 src/sys/net/route.c
cvs rdiff -u -r1.112.4.5 -r1.112.4.6 src/sys/net/route.h
cvs rdiff -u -r1.87.8.5 -r1.87.8.6 src/sys/netinet/in_gif.c
cvs rdiff -u -r1.2.8.7 -r1.2.8.8 src/sys/netinet/in_l2tp.c
cvs rdiff -u -r1.355.2.7 -r1.355.2.8 src/sys/netinet/ip_input.c
cvs rdiff -u -r1.3.2.1 -r1.3.2.2 src/sys/netinet/wqinput.c
cvs rdiff -u -r1.85.6.6 -r1.85.6.7 src/sys/netinet6/in6_gif.c
cvs rdiff -u -r1.5.8.7 -r1.5.8.8 src/sys/netinet6/in6_l2tp.c
cvs rdiff -u -r1.87.2.3 -r1.87.2.4 src/sys/netinet6/ip6_forward.c
cvs rdiff -u -r1.178.2.8 -r1.178.2.9 src/sys/netinet6/ip6_input.c
cvs rdiff -u -r1.48.2.3 -r1.48.2.4 src/sys/netipsec/ipsec_output.c
cvs rdiff -u -r1.1.2.8 -r1.1.2.9 src/sys/netipsec/ipsecif.c
cvs rdiff -u -r1.163.2.13 -r1.163.2.14 src/sys/netipsec/key.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Modified files:

Index: src/sys/net/if.c
diff -u src/sys/net/if.c:1.394.2.17 src/sys/net/if.c:1.394.2.18
--- src/sys/net/if.c:1.394.2.17	Mon Aug 19 14:27:16 2019
+++ src/sys/net/if.c	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: if.c,v 1.394.2.17 2019/08/19 14:27:16 martin Exp $	*/
+/*	$NetBSD: if.c,v 1.394.2.18 2019/09/24 18:27:09 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2008 The NetBSD Foundation, Inc.
@@ -90,7 +90,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: if.c,v 1.394.2.17 2019/08/19 14:27:16 martin Exp $");
+__KERNEL_RCSID(0, "$NetBSD: if.c,v 1.394.2.18 2019/09/24 18:27:09 martin Exp $");
 
 #if defined(_KERNEL_OPT)
 #include "opt_inet.h"
@@ -2892,6 +2892,63 @@ if_tunnel_check_nesting(struct ifnet *if
 	return 0;
 }
 
+static void
+if_tunnel_ro_init_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
+{
+	struct tunnel_ro *tro = p;
+
+	tro->tr_ro = kmem_zalloc(sizeof(*tro->tr_ro), KM_SLEEP);
+	tro->tr_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
+}
+
+percpu_t *
+if_tunnel_alloc_ro_percpu(void)
+{
+	percpu_t *ro_percpu;
+
+	ro_percpu = percpu_alloc(sizeof(struct tunnel_ro));
+	percpu_foreach(ro_percpu, if_tunnel_ro_init_pc, NULL);
+
+	return ro_percpu;
+}
+
+static void
+if_tunnel_ro_fini_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
+{
+	struct tunnel_ro *tro = p;
+
+	rtcache_free(tro->tr_ro);
+	kmem_free(tro->tr_ro, sizeof(*tro->tr_ro));
+
+	mutex_obj_free(tro->tr_lock);
+}
+
+void
+if_tunnel_free_ro_percpu(percpu_t *ro_percpu)
+{
+
+	percpu_foreach(ro_percpu, if_tunnel_ro_fini_pc, NULL);
+	percpu_free(ro_percpu, sizeof(struct tunnel_ro));
+}
+
+
+static void
+if_tunnel_rtcache_free_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
+{
+	struct tunnel_ro *tro = p;
+
+	mutex_enter(tro->tr_lock);
+	rtcache_free(tro->tr_ro);
+	mutex_exit(tro->tr_lock);
+}
+
+void if_tunnel_ro_percpu_rtcache_free(percpu_t *ro_percpu)
+{
+
+	percpu_foreach(ro_percpu, if_tunnel_rtcache_free_pc, NULL);
+}
+
+
 /*
  * common
  */
 int
 ifioctl_common(struct ifnet *ifp, u_long cmd, void *data)

Index: src/sys/net/if.h
diff -u src/sys/net/if.h:1.239.2.7 src/sys/net/if.h:1.239.2.8
--- src/sys/net/if.h:1.239.2.7	Fri Jul 13 15:49:55 2018
+++ src/sys/net/if.h	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: if.h,v 1.239.2.7 2018/07/13 15:49:55 martin Exp $	*/
+/*	$NetBSD: if.h,v 1.239.2.8 2019/09/24 18:27:09 martin Exp $	*/
 
 /*-
  * Copyright (c) 1999, 2000, 2001 The NetBSD Foundation, Inc.
@@ -1111,6 +1111,33 @@ void if_acquire(struct ifnet *, struct p
 #define if_release	if_put
 
 int if_tunnel_check_nesting(struct ifnet *, struct mbuf *, int);
+percpu_t *if_tunnel_alloc_ro_percpu(void);
+void if_tunnel_free_ro_percpu(percpu_t *);
+void if_tunnel_ro_percpu_rtcache_free(percpu_t *);
+
+struct tunnel_ro {
+	struct route *tr_ro;
+	kmutex_t *tr_lock;
+};
+
+static inline void
+if_tunnel_get_ro(percpu_t *ro_percpu, struct route **ro, kmutex_t **lock)
+{
+	struct tunnel_ro *tro;
+
+	tro = percpu_getref(ro_percpu);
+	*ro = tro->tr_ro;
+	*lock = tro->tr_lock;
+	mutex_enter(*lock);
+}
+
+static inline void
+if_tunnel_put_ro(percpu_t *ro_percpu, kmutex_t *lock)
+{
+
+	mutex_exit(lock);
+	percpu_putref(ro_percpu);
+}
 
 static inline if_index_t
 if_get_index(const struct ifnet *ifp)

Index: src/sys/net/if_gif.c
diff -u src/sys/net/if_gif.c:1.126.2.14 src/sys/net/if_gif.c:1.126.2.15
--- src/sys/net/if_gif.c:1.126.2.14	Mon Apr 22 09:06:49 2019
+++ src/sys/net/if_gif.c	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: if_gif.c,v 1.126.2.14 2019/04/22 09:06:49 martin Exp $	*/
+/*	$NetBSD: if_gif.c,v 1.126.2.15 2019/09/24 18:27:09 martin Exp $	*/
 /*	$KAME: if_gif.c,v 1.76 2001/08/20 02:01:02 kjc Exp $	*/
 
 /*
@@ -31,7 +31,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: if_gif.c,v 1.126.2.14 2019/04/22 09:06:49 martin Exp $");
+__KERNEL_RCSID(0, "$NetBSD: if_gif.c,v 1.126.2.15 2019/09/24 18:27:09 martin Exp $");
 
 #ifdef _KERNEL_OPT
 #include "opt_inet.h"
@@ -107,9 +107,6 @@ static struct {
 
 struct psref_class *gv_psref_class __read_mostly;
 
-static void	gif_ro_init_pc(void *, void *, struct cpu_info *);
-static void	gif_ro_fini_pc(void *, void *, struct cpu_info *);
-
 static int	gifattach0(struct gif_softc *);
 static int	gif_output(struct ifnet *, struct mbuf *,
 		    const struct sockaddr *, const struct rtentry *);
@@ -274,8 +271,7 @@ gif_clone_create(struct if_clone *ifc, i
 	mutex_init(&sc->gif_lock, MUTEX_DEFAULT, IPL_NONE);
 	sc->gif_psz = pserialize_create();
 
-	sc->gif_ro_percpu = percpu_alloc(sizeof(struct gif_ro));
-	percpu_foreach(sc->gif_ro_percpu, gif_ro_init_pc, NULL);
+	sc->gif_ro_percpu = if_tunnel_alloc_ro_percpu();
 	mutex_enter(&gif_softcs.lock);
 	LIST_INSERT_HEAD(&gif_softcs.list, sc, gif_list);
 	mutex_exit(&gif_softcs.lock);
@@ -312,32 +308,6 @@ gifattach0(struct gif_softc *sc)
 	return 0;
 }
 
-static void
-gif_ro_init_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
-{
-	struct gif_ro *gro = p;
-
-	gro->gr_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
-}
-
-static void
-gif_ro_fini_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
-{
-	struct gif_ro *gro = p;
-
-	rtcache_free(&gro->gr_ro);
-
-	mutex_obj_free(gro->gr_lock);
-}
-
-void
-gif_rtcache_free_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
-{
-	struct gif_ro *gro = p;
-
-	rtcache_free(&gro->gr_ro);
-}
-
 static int
 gif_clone_destroy(struct ifnet *ifp)
 {
@@ -350,8 +320,7 @@ gif_clone_destroy(struct ifnet *ifp)
 	bpf_detach(ifp);
 	if_detach(ifp);
 
-	percpu_foreach(sc->gif_ro_percpu, gif_ro_fini_pc, NULL);
-	percpu_free(sc->gif_ro_percpu, sizeof(struct gif_ro));
+	if_tunnel_free_ro_percpu(sc->gif_ro_percpu);
 
 	pserialize_destroy(sc->gif_psz);
 	mutex_destroy(&sc->gif_lock);

Index: src/sys/net/if_gif.h
diff -u src/sys/net/if_gif.h:1.25.8.4 src/sys/net/if_gif.h:1.25.8.5
--- src/sys/net/if_gif.h:1.25.8.4	Sun Oct 21 11:55:54 2018
+++ src/sys/net/if_gif.h	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: if_gif.h,v 1.25.8.4 2018/10/21 11:55:54 martin Exp $	*/
+/*	$NetBSD: if_gif.h,v 1.25.8.5 2019/09/24 18:27:09 martin Exp $	*/
 /*	$KAME: if_gif.h,v 1.23 2001/07/27 09:21:42 itojun Exp $	*/
 
 /*
@@ -55,11 +55,6 @@ extern struct psref_class *gv_psref_clas
 
 struct encaptab;
 
-struct gif_ro {
-	struct route gr_ro;
-	kmutex_t *gr_lock;
-};
-
 struct gif_variant {
 	struct gif_softc *gv_softc;
 	struct sockaddr *gv_psrc;	/* Physical src addr */
@@ -73,7 +68,7 @@ struct gif_variant {
 
 struct gif_softc {
 	struct ifnet	gif_if;		/* common area - must be at the top */
-	percpu_t *gif_ro_percpu;	/* struct gif_ro */
+	percpu_t *gif_ro_percpu;	/* struct tunnel_ro */
 	struct gif_variant *gif_var;	/*
 					 * reader must use gif_getref_variant()
 					 * instead of direct dereference.
@@ -131,8 +126,6 @@ gif_heldref_variant(struct gif_variant *
 
 /* Prototypes */
 void	gif_input(struct mbuf *, int, struct ifnet *);
-void	gif_rtcache_free_pc(void *, void *, struct cpu_info *);
-
 #ifdef GIF_ENCAPCHECK
 int	gif_encapcheck(struct mbuf *, int, int, void *);
 #endif
@@ -147,8 +140,8 @@ int	gif_encapcheck(struct mbuf *, int, i
  *	- gif_var->gv_psref for reader
  *	gif_softc->gif_var is used for variant values while the gif tunnel
  *	exists.
- * + Each CPU's gif_ro.gr_ro of gif_ro_percpu are protected by
- *   percpu'ed gif_ro.gr_lock.
+ * + Each CPU's tunnel_ro.tr_ro of gif_ro_percpu are protected by
+ *   percpu'ed tunnel_ro.tr_lock.
  *
  * Locking order:
  *     - encap_lock => gif_softc->gif_lock => gif_softcs.lock

Index: src/sys/net/if_ipsec.c
diff -u src/sys/net/if_ipsec.c:1.3.2.11 src/sys/net/if_ipsec.c:1.3.2.12
--- src/sys/net/if_ipsec.c:1.3.2.11	Fri Mar 15 14:47:22 2019
+++ src/sys/net/if_ipsec.c	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: if_ipsec.c,v 1.3.2.11 2019/03/15 14:47:22 martin Exp $	*/
+/*	$NetBSD: if_ipsec.c,v 1.3.2.12 2019/09/24 18:27:09 martin Exp $	*/
 
 /*
  * Copyright (c) 2017 Internet Initiative Japan Inc.
@@ -27,7 +27,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: if_ipsec.c,v 1.3.2.11 2019/03/15 14:47:22 martin Exp $");
+__KERNEL_RCSID(0, "$NetBSD: if_ipsec.c,v 1.3.2.12 2019/09/24 18:27:09 martin Exp $");
 
 #ifdef _KERNEL_OPT
 #include "opt_inet.h"
@@ -80,9 +80,6 @@ __KERNEL_RCSID(0, "$NetBSD: if_ipsec.c,v
 #include <netipsec/ipsec.h>
 #include <netipsec/ipsecif.h>
 
-static void if_ipsec_ro_init_pc(void *, void *, struct cpu_info *);
-static void if_ipsec_ro_fini_pc(void *, void *, struct cpu_info *);
-
 static int if_ipsec_clone_create(struct if_clone *, int);
 static int if_ipsec_clone_destroy(struct ifnet *);
 
@@ -183,8 +180,7 @@ if_ipsec_clone_create(struct if_clone *i
 	sc->ipsec_var = var;
 	mutex_init(&sc->ipsec_lock, MUTEX_DEFAULT, IPL_NONE);
 	sc->ipsec_psz = pserialize_create();
-	sc->ipsec_ro_percpu = percpu_alloc(sizeof(struct ipsec_ro));
-	percpu_foreach(sc->ipsec_ro_percpu, if_ipsec_ro_init_pc, NULL);
+	sc->ipsec_ro_percpu = if_tunnel_alloc_ro_percpu();
 
 	mutex_enter(&ipsec_softcs.lock);
 	LIST_INSERT_HEAD(&ipsec_softcs.list, sc, ipsec_list);
@@ -214,24 +210,6 @@ if_ipsec_attach0(struct ipsec_softc *sc)
 	if_register(&sc->ipsec_if);
 }
 
-static void
-if_ipsec_ro_init_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
-{
-	struct ipsec_ro *iro = p;
-
-	iro->ir_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
-}
-
-static void
-if_ipsec_ro_fini_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
-{
-	struct ipsec_ro *iro = p;
-
-	rtcache_free(&iro->ir_ro);
-
-	mutex_obj_free(iro->ir_lock);
-}
-
 static int
 if_ipsec_clone_destroy(struct ifnet *ifp)
 {
@@ -250,8 +228,7 @@ if_ipsec_clone_destroy(struct ifnet *ifp
 	bpf_detach(ifp);
 	if_detach(ifp);
 
-	percpu_foreach(sc->ipsec_ro_percpu, if_ipsec_ro_fini_pc, NULL);
-	percpu_free(sc->ipsec_ro_percpu, sizeof(struct ipsec_ro));
+	if_tunnel_free_ro_percpu(sc->ipsec_ro_percpu);
 
 	pserialize_destroy(sc->ipsec_psz);
 	mutex_destroy(&sc->ipsec_lock);
@@ -509,6 +486,7 @@ if_ipsec_in_enqueue(struct mbuf *m, int
 		ifp->if_ibytes += pktlen;
 		ifp->if_ipackets++;
 	} else {
+		ifp->if_iqdrops++;
 		m_freem(m);
 	}
 
@@ -1597,14 +1575,7 @@ if_ipsec_add_sp0(struct sockaddr *src, i
 	padlen = PFKEY_UNUNIT64(xpl.sadb_x_policy_len) - sizeof(xpl);
 	if (policy == IPSEC_POLICY_IPSEC) {
 		if_ipsec_add_mbuf(m, &xisr, sizeof(xisr));
-		/*
-		 * secpolicy.req->saidx.{src, dst} must be set port number,
-		 * when it is used for NAT-T.
-		 */
-		if_ipsec_add_mbuf_addr_port(m, src, sport, false);
-		if_ipsec_add_mbuf_addr_port(m, dst, dport, false);
 		padlen -= PFKEY_ALIGN8(sizeof(xisr));
-		padlen -= PFKEY_ALIGN8(src->sa_len + dst->sa_len);
 	}
 	if_ipsec_add_pad(m, padlen);

Index: src/sys/net/if_ipsec.h
diff -u src/sys/net/if_ipsec.h:1.1.2.4 src/sys/net/if_ipsec.h:1.1.2.5
--- src/sys/net/if_ipsec.h:1.1.2.4	Sun Oct 21 11:55:54 2018
+++ src/sys/net/if_ipsec.h	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: if_ipsec.h,v 1.1.2.4 2018/10/21 11:55:54 martin Exp $	*/
+/*	$NetBSD: if_ipsec.h,v 1.1.2.5 2019/09/24 18:27:09 martin Exp $	*/
 
 /*
  * Copyright (c) 2017 Internet Initiative Japan Inc.
@@ -86,14 +86,9 @@ struct ipsec_variant {
 	struct psref_target iv_psref;
 };
 
-struct ipsec_ro {
-	struct route ir_ro;
-	kmutex_t *ir_lock;
-};
-
 struct ipsec_softc {
 	struct ifnet	ipsec_if;	/* common area - must be at the top */
-	percpu_t *ipsec_ro_percpu;	/* struct ipsec_ro */
+	percpu_t *ipsec_ro_percpu;	/* struct tunnel_ro */
 	struct ipsec_variant *ipsec_var; /*
 					  * reader must use ipsec_getref_variant()
 					  * instead of direct dereference.
@@ -220,7 +215,7 @@ int if_ipsec_ioctl(struct ifnet *, u_lon
 *	- ipsec_var->iv_psref for reader
 *	ipsec_softc->ipsec_var is used for variant values while the ipsec tunnel
 *	exists.
- * + struct ipsec_ro->ir_ro is protected by struct ipsec_ro->ir_lock.
+ * + struct tunnel_ro->tr_ro is protected by struct tunnel_ro->tr_lock.
 *    This lock is required to exclude softnet/0 lwp(such as output
 *    processing softint) and processing lwp(such as DAD timer processing).
 * + if_ipsec_share_sp() and if_ipsec_unshare_sp() operations are serialized by

Index: src/sys/net/if_l2tp.c
diff -u src/sys/net/if_l2tp.c:1.11.2.10 src/sys/net/if_l2tp.c:1.11.2.11
--- src/sys/net/if_l2tp.c:1.11.2.10	Sun Oct 21 11:55:54 2018
+++ src/sys/net/if_l2tp.c	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: if_l2tp.c,v 1.11.2.10 2018/10/21 11:55:54 martin Exp $	*/
+/*	$NetBSD: if_l2tp.c,v 1.11.2.11 2019/09/24 18:27:09 martin Exp $	*/
 
 /*
  * Copyright (c) 2017 Internet Initiative Japan Inc.
@@ -31,7 +31,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: if_l2tp.c,v 1.11.2.10 2018/10/21 11:55:54 martin Exp $");
+__KERNEL_RCSID(0, "$NetBSD: if_l2tp.c,v 1.11.2.11 2019/09/24 18:27:09 martin Exp $");
 
 #ifdef _KERNEL_OPT
 #include "opt_inet.h"
@@ -118,8 +118,8 @@ static struct {
 pserialize_t l2tp_psz __read_mostly;
 struct psref_class *lv_psref_class __read_mostly;
 
-static void	l2tp_ro_init_pc(void *, void *, struct cpu_info *);
-static void	l2tp_ro_fini_pc(void *, void *, struct cpu_info *);
+static void	l2tp_ifq_init_pc(void *, void *, struct cpu_info *);
+static void	l2tp_ifq_fini_pc(void *, void *, struct cpu_info *);
 
 static int	l2tp_clone_create(struct if_clone *, int);
 static int	l2tp_clone_destroy(struct ifnet *);
@@ -127,9 +127,12 @@ static int l2tp_clone_destroy(struct ifn
 struct if_clone l2tp_cloner =
     IF_CLONE_INITIALIZER("l2tp", l2tp_clone_create, l2tp_clone_destroy);
 
+static int	l2tp_tx_enqueue(struct l2tp_variant *, struct mbuf *);
 static int	l2tp_output(struct ifnet *, struct mbuf *,
 		    const struct sockaddr *, const struct rtentry *);
+static void	l2tp_sendit(struct l2tp_variant *, struct mbuf *);
 static void	l2tpintr(struct l2tp_variant *);
+static void	l2tpintr_softint(void *);
 
 static void	l2tp_hash_init(void);
 static int	l2tp_hash_fini(void);
@@ -152,6 +155,20 @@ static void l2tp_set_state(struct l2tp_s
 static int	l2tp_encap_attach(struct l2tp_variant *);
 static int	l2tp_encap_detach(struct l2tp_variant *);
 
+static inline struct ifqueue *
+l2tp_ifq_percpu_getref(percpu_t *pc)
+{
+
+	return *(struct ifqueue **)percpu_getref(pc);
+}
+
+static inline void
+l2tp_ifq_percpu_putref(percpu_t *pc)
+{
+
+	percpu_putref(pc);
+}
+
 #ifndef MAX_L2TP_NEST
 /*
  * This macro controls the upper limitation on nesting of l2tp tunnels.
@@ -228,7 +245,10 @@ l2tp_clone_create(struct if_clone *ifc,
 	struct l2tp_softc *sc;
 	struct l2tp_variant *var;
 	int rv;
-
+	u_int si_flags = SOFTINT_NET;
+#ifdef NET_MPSAFE
+	si_flags |= SOFTINT_MPSAFE;
+#endif
 	sc = kmem_zalloc(sizeof(struct l2tp_softc), KM_SLEEP);
 	if_initname(&sc->l2tp_ec.ec_if, ifc->ifc_name, unit);
 	rv = l2tpattach0(sc);
@@ -248,8 +268,11 @@ l2tp_clone_create(struct if_clone *ifc,
 	sc->l2tp_psz = pserialize_create();
 	PSLIST_ENTRY_INIT(sc, l2tp_hash);
 
-	sc->l2tp_ro_percpu = percpu_alloc(sizeof(struct l2tp_ro));
-	percpu_foreach(sc->l2tp_ro_percpu, l2tp_ro_init_pc, NULL);
+	sc->l2tp_ro_percpu = if_tunnel_alloc_ro_percpu();
+
+	sc->l2tp_ifq_percpu = percpu_alloc(sizeof(struct ifqueue *));
+	percpu_foreach(sc->l2tp_ifq_percpu, l2tp_ifq_init_pc, NULL);
+	sc->l2tp_si = softint_establish(si_flags, l2tpintr_softint, sc);
 
 	mutex_enter(&l2tp_softcs.lock);
 	LIST_INSERT_HEAD(&l2tp_softcs.list, sc, l2tp_list);
@@ -278,6 +301,24 @@ l2tpattach0(struct l2tp_softc *sc)
 	sc->l2tp_ec.ec_if.if_transmit = l2tp_transmit;
 	sc->l2tp_ec.ec_if._if_input = ether_input;
 	IFQ_SET_READY(&sc->l2tp_ec.ec_if.if_snd);
+
+#ifdef MBUFTRACE
+	struct ethercom *ec = &sc->l2tp_ec;
+	struct ifnet *ifp = &sc->l2tp_ec.ec_if;
+
+	strlcpy(ec->ec_tx_mowner.mo_name, ifp->if_xname,
+	    sizeof(ec->ec_tx_mowner.mo_name));
+	strlcpy(ec->ec_tx_mowner.mo_descr, "tx",
+	    sizeof(ec->ec_tx_mowner.mo_descr));
+	strlcpy(ec->ec_rx_mowner.mo_name, ifp->if_xname,
+	    sizeof(ec->ec_rx_mowner.mo_name));
+	strlcpy(ec->ec_rx_mowner.mo_descr, "rx",
+	    sizeof(ec->ec_rx_mowner.mo_descr));
+	MOWNER_ATTACH(&ec->ec_tx_mowner);
+	MOWNER_ATTACH(&ec->ec_rx_mowner);
+	ifp->if_mowner = &ec->ec_tx_mowner;
+#endif
+
 	/* XXX
 	 * It may improve performance to use if_initialize()/if_register()
 	 * so that l2tp_input() calls if_input() instead of
@@ -294,21 +335,20 @@ l2tpattach0(struct l2tp_softc *sc)
 }
 
 void
-l2tp_ro_init_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
+l2tp_ifq_init_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
 {
-	struct l2tp_ro *lro = p;
+	struct ifqueue **ifqp = p;
 
-	lro->lr_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
+	*ifqp = kmem_zalloc(sizeof(**ifqp), KM_SLEEP);
+	(*ifqp)->ifq_maxlen = IFQ_MAXLEN;
 }
 
 void
-l2tp_ro_fini_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
+l2tp_ifq_fini_pc(void *p, void *arg __unused, struct cpu_info *ci __unused)
 {
-	struct l2tp_ro *lro = p;
+	struct ifqueue **ifqp = p;
 
-	rtcache_free(&lro->lr_ro);
-
-	mutex_obj_free(lro->lr_lock);
+	kmem_free(*ifqp, sizeof(**ifqp));
 }
 
 static int
@@ -321,13 +361,18 @@ l2tp_clone_destroy(struct ifnet *ifp)
 	l2tp_clear_session(sc);
 	l2tp_delete_tunnel(&sc->l2tp_ec.ec_if);
 	/*
-	 * To avoid for l2tp_transmit() to access sc->l2tp_var after free it.
+	 * To avoid for l2tp_transmit() and l2tpintr_softint() to access
+	 * sc->l2tp_var after free it.
 	 */
 	mutex_enter(&sc->l2tp_lock);
 	var = sc->l2tp_var;
 	l2tp_variant_update(sc, NULL);
 	mutex_exit(&sc->l2tp_lock);
 
+	softint_disestablish(sc->l2tp_si);
+	percpu_foreach(sc->l2tp_ifq_percpu, l2tp_ifq_fini_pc, NULL);
+	percpu_free(sc->l2tp_ifq_percpu, sizeof(struct ifqueue *));
+
 	mutex_enter(&l2tp_softcs.lock);
 	LIST_REMOVE(sc, l2tp_list);
 	mutex_exit(&l2tp_softcs.lock);
@@ -336,8 +381,7 @@ l2tp_clone_destroy(struct ifnet *ifp)
 
 	if_detach(ifp);
 
-	percpu_foreach(sc->l2tp_ro_percpu, l2tp_ro_fini_pc, NULL);
-	percpu_free(sc->l2tp_ro_percpu, sizeof(struct l2tp_ro));
+	if_tunnel_free_ro_percpu(sc->l2tp_ro_percpu);
 
 	kmem_free(var, sizeof(struct l2tp_variant));
 	pserialize_destroy(sc->l2tp_psz);
@@ -348,6 +392,37 @@ l2tp_clone_destroy(struct ifnet *ifp)
 }
 
 static int
+l2tp_tx_enqueue(struct l2tp_variant *var, struct mbuf *m)
+{
+	struct l2tp_softc *sc;
+	struct ifnet *ifp;
+	struct ifqueue *ifq;
+	int s;
+
+	KASSERT(psref_held(&var->lv_psref, lv_psref_class));
+
+	sc = var->lv_softc;
+	ifp = &sc->l2tp_ec.ec_if;
+
+	s = splsoftnet();
+	ifq = l2tp_ifq_percpu_getref(sc->l2tp_ifq_percpu);
+	if (IF_QFULL(ifq)) {
+		ifp->if_oerrors++;
+		l2tp_ifq_percpu_putref(sc->l2tp_ifq_percpu);
+		splx(s);
+		m_freem(m);
+		return ENOBUFS;
+	}
+
+	IF_ENQUEUE(ifq, m);
+	percpu_putref(sc->l2tp_ifq_percpu);
+	softint_schedule(sc->l2tp_si);
+	/* counter is incremented in l2tpintr() */
+	splx(s);
+	return 0;
+}
+
+static int
 l2tp_output(struct ifnet *ifp, struct mbuf *m,
     const struct sockaddr *dst, const struct rtentry *rt)
 {
@@ -389,17 +464,7 @@ l2tp_output(struct ifnet *ifp, struct mb
 	}
 
 	*mtod(m, int *) = dst->sa_family;
 
-	IFQ_ENQUEUE(&ifp->if_snd, m, error);
-	if (error)
-		goto end;
-
-	/*
-	 * direct call to avoid infinite loop at l2tpintr()
-	 */
-	l2tpintr(var);
-
-	error = 0;
-
+	error = l2tp_tx_enqueue(var, m);
 end:
 	l2tp_putref_variant(var, &psref);
 	if (error)
@@ -409,12 +474,54 @@ end:
 }
 
 static void
+l2tp_sendit(struct l2tp_variant *var, struct mbuf *m)
+{
+	int len;
+	int error;
+	struct l2tp_softc *sc;
+	struct ifnet *ifp;
+
+	KASSERT(psref_held(&var->lv_psref, lv_psref_class));
+
+	sc = var->lv_softc;
+	ifp = &sc->l2tp_ec.ec_if;
+
+	len = m->m_pkthdr.len;
+	m->m_flags &= ~(M_BCAST|M_MCAST);
+	bpf_mtap(ifp, m);
+
+	switch (var->lv_psrc->sa_family) {
+#ifdef INET
+	case AF_INET:
+		error = in_l2tp_output(var, m);
+		break;
+#endif
+#ifdef INET6
+	case AF_INET6:
+		error = in6_l2tp_output(var, m);
+		break;
+#endif
+	default:
+		m_freem(m);
+		error = ENETDOWN;
+		break;
+	}
+	if (error) {
+		ifp->if_oerrors++;
+	} else {
+		ifp->if_opackets++;
+		ifp->if_obytes += len;
+	}
+}
+
+static void
 l2tpintr(struct l2tp_variant *var)
 {
 	struct l2tp_softc *sc;
 	struct ifnet *ifp;
 	struct mbuf *m;
-	int error;
+	struct ifqueue *ifq;
+	u_int cpuid = cpu_index(curcpu());
 
 	KASSERT(psref_held(&var->lv_psref, lv_psref_class));
 
@@ -423,44 +530,49 @@ l2tpintr(struct l2tp_variant *var)
 
 	/* output processing */
 	if (var->lv_my_sess_id == 0 || var->lv_peer_sess_id == 0) {
-		IFQ_PURGE(&ifp->if_snd);
+		ifq = l2tp_ifq_percpu_getref(sc->l2tp_ifq_percpu);
+		IF_PURGE(ifq);
+		l2tp_ifq_percpu_putref(sc->l2tp_ifq_percpu);
+		if (cpuid == 0)
+			IFQ_PURGE(&ifp->if_snd);
 		return;
 	}
 
+	/* Currently, l2tpintr() is always called in softint context. */
+	ifq = l2tp_ifq_percpu_getref(sc->l2tp_ifq_percpu);
 	for (;;) {
-		IFQ_DEQUEUE(&ifp->if_snd, m);
-		if (m == NULL)
-			break;
-		m->m_flags &= ~(M_BCAST|M_MCAST);
-		bpf_mtap(ifp, m);
-		switch (var->lv_psrc->sa_family) {
-#ifdef INET
-		case AF_INET:
-			error = in_l2tp_output(var, m);
-			break;
-#endif
-#ifdef INET6
-		case AF_INET6:
-			error = in6_l2tp_output(var, m);
-			break;
-#endif
-		default:
-			m_freem(m);
-			error = ENETDOWN;
+		IF_DEQUEUE(ifq, m);
+		if (m != NULL)
+			l2tp_sendit(var, m);
+		else
 			break;
-		}
+	}
+	l2tp_ifq_percpu_putref(sc->l2tp_ifq_percpu);
 
-		if (error)
-			ifp->if_oerrors++;
-		else {
-			ifp->if_opackets++;
-			/*
-			 * obytes is incremented at ether_output() or
-			 * bridge_enqueue().
-			 */
+	if (cpuid == 0) {
+		for (;;) {
+			IFQ_DEQUEUE(&ifp->if_snd, m);
+			if (m != NULL)
+				l2tp_sendit(var, m);
+			else
+				break;
 		}
 	}
 }
 
+static void
+l2tpintr_softint(void *arg)
+{
+	struct l2tp_variant *var;
+	struct psref psref;
+	struct l2tp_softc *sc = arg;
+
+	var = l2tp_getref_variant(sc, &psref);
+	if (var == NULL)
+		return;
+
+	l2tpintr(var);
+	l2tp_putref_variant(var, &psref);
 }
 
 void
@@ -570,7 +682,7 @@ l2tp_start(struct ifnet *ifp)
 	if (var->lv_psrc == NULL || var->lv_pdst == NULL)
 		return;
 
-	l2tpintr(var);
+	softint_schedule(sc->l2tp_si);
 
 	l2tp_putref_variant(var, &psref);
 }
@@ -596,33 +708,8 @@ l2tp_transmit(struct ifnet *ifp, struct
 	}
 
 	m->m_flags &= ~(M_BCAST|M_MCAST);
-	bpf_mtap(ifp, m);
-	switch (var->lv_psrc->sa_family) {
-#ifdef INET
-	case AF_INET:
-		error = in_l2tp_output(var, m);
-		break;
-#endif
-#ifdef INET6
-	case AF_INET6:
-		error = in6_l2tp_output(var, m);
-		break;
-#endif
-	default:
-		m_freem(m);
-		error = ENETDOWN;
-		break;
-	}
-
-	if (error)
-		ifp->if_oerrors++;
-	else {
-		ifp->if_opackets++;
-		/*
-		 * obytes is incremented at ether_output() or bridge_enqueue().
-		 */
-	}
+	error = l2tp_tx_enqueue(var, m);
 out:
 	l2tp_putref_variant(var, &psref);
 	return error;

Index: src/sys/net/if_l2tp.h
diff -u src/sys/net/if_l2tp.h:1.2.2.3 src/sys/net/if_l2tp.h:1.2.2.4
--- src/sys/net/if_l2tp.h:1.2.2.3	Sun Oct 21 11:55:54 2018
+++ src/sys/net/if_l2tp.h	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: if_l2tp.h,v 1.2.2.3 2018/10/21 11:55:54 martin Exp $	*/
+/*	$NetBSD: if_l2tp.h,v 1.2.2.4 2019/09/24 18:27:09 martin Exp $	*/
 
 /*
  * Copyright (c) 2017 Internet Initiative Japan Inc.
@@ -91,15 +91,10 @@ struct l2tp_variant {
 	struct psref_target lv_psref;
 };
 
-struct l2tp_ro {
-	struct route lr_ro;
-	kmutex_t *lr_lock;
-};
-
 struct l2tp_softc {
 	struct ethercom	l2tp_ec;	/* common area - must be at the top */
 					/* to use ether_input(), we must have this */
-	percpu_t *l2tp_ro_percpu;	/* struct l2tp_ro */
+	percpu_t *l2tp_ro_percpu;	/* struct tunnel_ro */
 	struct l2tp_variant *l2tp_var;	/*
 					 * reader must use l2tp_getref_variant()
 					 * instead of direct dereference.
@@ -107,6 +102,9 @@ struct l2tp_softc {
 	kmutex_t l2tp_lock;	/* writer lock for l2tp_var */
 	pserialize_t l2tp_psz;
 
+	void *l2tp_si;
+	percpu_t *l2tp_ifq_percpu;
+
 	LIST_ENTRY(l2tp_softc) l2tp_list; /* list of all l2tps */
 	struct pslist_entry l2tp_hash;	/* hashed list to lookup by session id */
 };
@@ -192,7 +190,7 @@ struct mbuf *l2tp_tcpmss_clamp(struct if
 *	- l2tp_var->lv_psref for reader
 *	l2tp_softc->l2tp_var is used for variant values while the l2tp tunnel
 *	exists.
- * + struct l2tp_ro->lr_ro is protected by struct l2tp_ro->lr_lock.
+ * + struct l2tp_ro->lr_ro is protected by struct tunnel_ro->tr_lock.
 *    This lock is required to exclude softnet/0 lwp(such as output
 *    processing softint) and processing lwp(such as DAD timer processing).
 *

Index: src/sys/net/route.c
diff -u src/sys/net/route.c:1.194.6.13 src/sys/net/route.c:1.194.6.14
--- src/sys/net/route.c:1.194.6.13	Fri Mar 15 14:44:05 2019
+++ src/sys/net/route.c	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: route.c,v 1.194.6.13 2019/03/15 14:44:05 martin Exp $	*/
+/*	$NetBSD: route.c,v 1.194.6.14 2019/09/24 18:27:09 martin Exp $	*/
 
 /*-
  * Copyright (c) 1998, 2008 The NetBSD Foundation, Inc.
@@ -97,7 +97,7 @@
 #endif
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: route.c,v 1.194.6.13 2019/03/15 14:44:05 martin Exp $");
+__KERNEL_RCSID(0, "$NetBSD: route.c,v 1.194.6.14 2019/09/24 18:27:09 martin Exp $");
 
 #include <sys/param.h>
 #ifdef RTFLUSH_DEBUG
@@ -119,6 +119,7 @@ __KERNEL_RCSID(0, "$NetBSD: route.c,v 1.
 #include <sys/rwlock.h>
 #include <sys/mutex.h>
 #include <sys/cpu.h>
+#include <sys/kmem.h>
 
 #include <net/if.h>
 #include <net/if_dl.h>
@@ -2215,6 +2216,29 @@ rtcache_setdst(struct route *ro, const s
 	return 0;
 }
 
+static void
+rtcache_percpu_init_cpu(void *p, void *arg __unused, struct cpu_info *ci __unused)
+{
+	struct route **rop = p;
+
+	/*
+	 * We can't have struct route as percpu data because it can be destroyed
+	 * over a memory enlargement processing of percpu.
+	 */
+	*rop = kmem_zalloc(sizeof(**rop), KM_SLEEP);
+}
+
+percpu_t *
+rtcache_percpu_alloc(void)
+{
+	percpu_t *pc;
+
+	pc = percpu_alloc(sizeof(struct route *));
+	percpu_foreach(pc, rtcache_percpu_init_cpu, NULL);
+
+	return pc;
+}
+
 const struct sockaddr *
 rt_settag(struct rtentry *rt, const struct sockaddr *tag)
 {

Index: src/sys/net/route.h
diff -u src/sys/net/route.h:1.112.4.5 src/sys/net/route.h:1.112.4.6
--- src/sys/net/route.h:1.112.4.5	Tue Nov  6 14:38:58 2018
+++ src/sys/net/route.h	Tue Sep 24 18:27:09 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: route.h,v 1.112.4.5 2018/11/06 14:38:58 martin Exp $	*/
+/*	$NetBSD: route.h,v 1.112.4.6 2019/09/24 18:27:09 martin Exp $	*/
 
 /*
  * Copyright (c) 1980, 1986, 1993
@@ -42,6 +42,7 @@
 #include <sys/rwlock.h>
 #include <sys/condvar.h>
 #include <sys/pserialize.h>
+#include <sys/percpu.h>
 #endif
 #include <sys/psref.h>
 
@@ -491,6 +492,24 @@ struct rtentry *
 void	rtcache_unref(struct rtentry *, struct route *);
 
+percpu_t *
+	rtcache_percpu_alloc(void);
+
+static inline struct route *
+rtcache_percpu_getref(percpu_t *pc)
+{
+
+	return *(struct route **)percpu_getref(pc);
+}
+
+static inline void
+rtcache_percpu_putref(percpu_t *pc)
+{
+
+	percpu_putref(pc);
+}
+
+
 /* rtsock */
 void	rt_ieee80211msg(struct ifnet *, int, void *, size_t);
 void	rt_ifannouncemsg(struct ifnet *, int);

Index: src/sys/netinet/in_gif.c
diff -u src/sys/netinet/in_gif.c:1.87.8.5 src/sys/netinet/in_gif.c:1.87.8.6
--- src/sys/netinet/in_gif.c:1.87.8.5	Thu May 17 14:07:04 2018
+++ src/sys/netinet/in_gif.c	Tue Sep 24 18:27:10 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: in_gif.c,v 1.87.8.5 2018/05/17 14:07:04 martin Exp $	*/
+/*	$NetBSD: in_gif.c,v 1.87.8.6 2019/09/24 18:27:10 martin Exp $	*/
 /*	$KAME: in_gif.c,v 1.66 2001/07/29 04:46:09 itojun Exp $	*/
 
 /*
@@ -31,7 +31,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: in_gif.c,v 1.87.8.5 2018/05/17 14:07:04 martin Exp $");
+__KERNEL_RCSID(0, "$NetBSD: in_gif.c,v 1.87.8.6 2019/09/24 18:27:10 martin Exp $");
 
 #ifdef _KERNEL_OPT
 #include "opt_inet.h"
@@ -83,12 +83,12 @@ static int
 in_gif_output(struct gif_variant *var, int family, struct mbuf *m)
 {
 	struct rtentry *rt;
-	struct route *ro;
-	struct gif_ro *gro;
 	struct gif_softc *sc;
 	struct sockaddr_in *sin_src;
 	struct sockaddr_in *sin_dst;
 	struct ifnet *ifp;
+	struct route *ro_pc;
+	kmutex_t *lock_pc;
 	struct ip iphdr;	/* capsule IP header, host byte ordered */
 	int proto, error;
 	u_int8_t tos;
@@ -175,30 +175,25 @@ in_gif_output(struct gif_variant *var, i
 	bcopy(&iphdr, mtod(m, struct ip *), sizeof(struct ip));
 
 	sc = var->gv_softc;
-	gro = percpu_getref(sc->gif_ro_percpu);
-	mutex_enter(gro->gr_lock);
-	ro = &gro->gr_ro;
-	if ((rt = rtcache_lookup(ro, var->gv_pdst)) == NULL) {
-		mutex_exit(gro->gr_lock);
-		percpu_putref(sc->gif_ro_percpu);
+	if_tunnel_get_ro(sc->gif_ro_percpu, &ro_pc, &lock_pc);
+	if ((rt = rtcache_lookup(ro_pc, var->gv_pdst)) == NULL) {
+		if_tunnel_put_ro(sc->gif_ro_percpu, lock_pc);
 		m_freem(m);
 		return ENETUNREACH;
 	}
 
 	/* If the route constitutes infinite encapsulation, punt. */
 	if (rt->rt_ifp == ifp) {
-		rtcache_unref(rt, ro);
-		rtcache_free(ro);
-		mutex_exit(gro->gr_lock);
-		percpu_putref(sc->gif_ro_percpu);
+		rtcache_unref(rt, ro_pc);
+		rtcache_free(ro_pc);
+		if_tunnel_put_ro(sc->gif_ro_percpu, lock_pc);
 		m_freem(m);
 		return ENETUNREACH;	/*XXX*/
 	}
-	rtcache_unref(rt, ro);
+	rtcache_unref(rt, ro_pc);
 
-	error = ip_output(m, NULL, ro, 0, NULL, NULL);
-	mutex_exit(gro->gr_lock);
-	percpu_putref(sc->gif_ro_percpu);
+	error = ip_output(m, NULL, ro_pc, 0, NULL, NULL);
+	if_tunnel_put_ro(sc->gif_ro_percpu, lock_pc);
 	return (error);
 }
 
@@ -400,7 +395,7 @@ in_gif_detach(struct gif_variant *var)
 	if (error == 0)
 		var->gv_encap_cookie4 = NULL;
 
-	percpu_foreach(sc->gif_ro_percpu, gif_rtcache_free_pc, NULL);
+	if_tunnel_ro_percpu_rtcache_free(sc->gif_ro_percpu);
 
 	return error;
 }

Index: src/sys/netinet/in_l2tp.c
diff -u src/sys/netinet/in_l2tp.c:1.2.8.7 src/sys/netinet/in_l2tp.c:1.2.8.8
--- src/sys/netinet/in_l2tp.c:1.2.8.7	Mon Sep 10 15:58:47 2018
+++ src/sys/netinet/in_l2tp.c	Tue Sep 24 18:27:10 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: in_l2tp.c,v 1.2.8.7 2018/09/10 15:58:47 martin Exp $	*/
+/*	$NetBSD: in_l2tp.c,v 1.2.8.8 2019/09/24 18:27:10 martin Exp $	*/
 
 /*
  * Copyright (c) 2017 Internet Initiative Japan Inc.
@@ -27,7 +27,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: in_l2tp.c,v 1.2.8.7 2018/09/10 15:58:47 martin Exp $"); +__KERNEL_RCSID(0, "$NetBSD: in_l2tp.c,v 1.2.8.8 2019/09/24 18:27:10 martin Exp $"); #ifdef _KERNEL_OPT #include "opt_l2tp.h" @@ -93,7 +93,9 @@ in_l2tp_output(struct l2tp_variant *var, struct sockaddr_in *sin_dst = satosin(var->lv_pdst); struct ip iphdr; /* capsule IP header, host byte ordered */ struct rtentry *rt; - struct l2tp_ro *lro; + struct route *ro_pc; + kmutex_t *lock_pc; + int error; uint32_t sess_id; @@ -209,26 +211,23 @@ in_l2tp_output(struct l2tp_variant *var, } memcpy(mtod(m, struct ip *), &iphdr, sizeof(struct ip)); - lro = percpu_getref(sc->l2tp_ro_percpu); - mutex_enter(lro->lr_lock); - if ((rt = rtcache_lookup(&lro->lr_ro, var->lv_pdst)) == NULL) { - mutex_exit(lro->lr_lock); - percpu_putref(sc->l2tp_ro_percpu); + if_tunnel_get_ro(sc->l2tp_ro_percpu, &ro_pc, &lock_pc); + if ((rt = rtcache_lookup(ro_pc, var->lv_pdst)) == NULL) { + if_tunnel_put_ro(sc->l2tp_ro_percpu, lock_pc); m_freem(m); error = ENETUNREACH; goto out; } if (rt->rt_ifp == ifp) { - rtcache_unref(rt, &lro->lr_ro); - rtcache_free(&lro->lr_ro); - mutex_exit(lro->lr_lock); - percpu_putref(sc->l2tp_ro_percpu); + rtcache_unref(rt, ro_pc); + rtcache_free(ro_pc); + if_tunnel_put_ro(sc->l2tp_ro_percpu, lock_pc); m_freem(m); error = ENETUNREACH; /*XXX*/ goto out; } - rtcache_unref(rt, &lro->lr_ro); + rtcache_unref(rt, ro_pc); /* * To avoid inappropriate rewrite of checksum, @@ -236,9 +235,8 @@ in_l2tp_output(struct l2tp_variant *var, */ m->m_pkthdr.csum_flags = 0; - error = ip_output(m, NULL, &lro->lr_ro, 0, NULL, NULL); - mutex_exit(lro->lr_lock); - percpu_putref(sc->l2tp_ro_percpu); + error = ip_output(m, NULL, ro_pc, 0, NULL, NULL); + if_tunnel_put_ro(sc->l2tp_ro_percpu, lock_pc); return error; looped: Index: src/sys/netinet/ip_input.c diff -u src/sys/netinet/ip_input.c:1.355.2.7 src/sys/netinet/ip_input.c:1.355.2.8 --- src/sys/netinet/ip_input.c:1.355.2.7 Tue 
Sep 17 18:57:23 2019 +++ src/sys/netinet/ip_input.c Tue Sep 24 18:27:10 2019 @@ -1,4 +1,4 @@ -/* $NetBSD: ip_input.c,v 1.355.2.7 2019/09/17 18:57:23 martin Exp $ */ +/* $NetBSD: ip_input.c,v 1.355.2.8 2019/09/24 18:27:10 martin Exp $ */ /* * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. @@ -91,7 +91,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: ip_input.c,v 1.355.2.7 2019/09/17 18:57:23 martin Exp $"); +__KERNEL_RCSID(0, "$NetBSD: ip_input.c,v 1.355.2.8 2019/09/24 18:27:10 martin Exp $"); #ifdef _KERNEL_OPT #include "opt_inet.h" @@ -342,7 +342,7 @@ ip_init(void) #endif /* MBUFTRACE */ ipstat_percpu = percpu_alloc(sizeof(uint64_t) * IP_NSTATS); - ipforward_rt_percpu = percpu_alloc(sizeof(struct route)); + ipforward_rt_percpu = rtcache_percpu_alloc(); ip_mtudisc_timeout_q = rt_timer_queue_create(ip_mtudisc_timeout); } @@ -1205,16 +1205,16 @@ ip_rtaddr(struct in_addr dst, struct psr sockaddr_in_init(&u.dst4, &dst, 0); - ro = percpu_getref(ipforward_rt_percpu); + ro = rtcache_percpu_getref(ipforward_rt_percpu); rt = rtcache_lookup(ro, &u.dst); if (rt == NULL) { - percpu_putref(ipforward_rt_percpu); + rtcache_percpu_putref(ipforward_rt_percpu); return NULL; } ia4_acquire(ifatoia(rt->rt_ifa), psref); rtcache_unref(rt, ro); - percpu_putref(ipforward_rt_percpu); + rtcache_percpu_putref(ipforward_rt_percpu); return ifatoia(rt->rt_ifa); } @@ -1390,10 +1390,10 @@ ip_forward(struct mbuf *m, int srcrt, st sockaddr_in_init(&u.dst4, &ip->ip_dst, 0); - ro = percpu_getref(ipforward_rt_percpu); + ro = rtcache_percpu_getref(ipforward_rt_percpu); rt = rtcache_lookup(ro, &u.dst); if (rt == NULL) { - percpu_putref(ipforward_rt_percpu); + rtcache_percpu_putref(ipforward_rt_percpu); icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, dest, 0); return; } @@ -1465,13 +1465,13 @@ ip_forward(struct mbuf *m, int srcrt, st m_freem(mcopy); } - percpu_putref(ipforward_rt_percpu); + rtcache_percpu_putref(ipforward_rt_percpu); return; redirect: error: if (mcopy == NULL) { - 
percpu_putref(ipforward_rt_percpu); + rtcache_percpu_putref(ipforward_rt_percpu); return; } @@ -1514,11 +1514,11 @@ error: */ if (mcopy) m_freem(mcopy); - percpu_putref(ipforward_rt_percpu); + rtcache_percpu_putref(ipforward_rt_percpu); return; } icmp_error(mcopy, type, code, dest, destmtu); - percpu_putref(ipforward_rt_percpu); + rtcache_percpu_putref(ipforward_rt_percpu); } void Index: src/sys/netinet/wqinput.c diff -u src/sys/netinet/wqinput.c:1.3.2.1 src/sys/netinet/wqinput.c:1.3.2.2 --- src/sys/netinet/wqinput.c:1.3.2.1 Mon Feb 26 13:32:01 2018 +++ src/sys/netinet/wqinput.c Tue Sep 24 18:27:10 2019 @@ -1,4 +1,4 @@ -/* $NetBSD: wqinput.c,v 1.3.2.1 2018/02/26 13:32:01 martin Exp $ */ +/* $NetBSD: wqinput.c,v 1.3.2.2 2019/09/24 18:27:10 martin Exp $ */ /*- * Copyright (c) 2017 Internet Initiative Japan Inc. @@ -80,7 +80,8 @@ static void wqinput_sysctl_setup(const c static void wqinput_drops(void *p, void *arg, struct cpu_info *ci __unused) { - struct wqinput_worklist *const wwl = p; + struct wqinput_worklist **const wwlp = p; + struct wqinput_worklist *const wwl = *wwlp; int *sum = arg; *sum += wwl->wwl_dropped; @@ -148,6 +149,28 @@ bad: return; } +static struct wqinput_worklist * +wqinput_percpu_getref(percpu_t *pc) +{ + + return *(struct wqinput_worklist **)percpu_getref(pc); +} + +static void +wqinput_percpu_putref(percpu_t *pc) +{ + + percpu_putref(pc); +} + +static void +wqinput_percpu_init_cpu(void *p, void *arg __unused, struct cpu_info *ci __unused) +{ + struct wqinput_worklist **wwlp = p; + + *wwlp = kmem_zalloc(sizeof(**wwlp), KM_SLEEP); +} + struct wqinput * wqinput_create(const char *name, void (*func)(struct mbuf *, int, int)) { @@ -165,7 +188,8 @@ wqinput_create(const char *name, void (* panic("%s: workqueue_create failed (%d)\n", __func__, error); pool_init(&wqi->wqi_work_pool, sizeof(struct wqinput_work), 0, 0, 0, name, NULL, IPL_SOFTNET); - wqi->wqi_worklists = percpu_alloc(sizeof(struct wqinput_worklist)); + wqi->wqi_worklists = 
percpu_alloc(sizeof(struct wqinput_worklist *)); + percpu_foreach(wqi->wqi_worklists, wqinput_percpu_init_cpu, NULL); wqi->wqi_input = func; wqinput_sysctl_setup(name, wqi); @@ -207,7 +231,7 @@ wqinput_work(struct work *wk, void *arg) /* Users expect to run at IPL_SOFTNET */ s = splsoftnet(); /* This also prevents LWP migrations between CPUs */ - wwl = percpu_getref(wqi->wqi_worklists); + wwl = wqinput_percpu_getref(wqi->wqi_worklists); /* We can allow enqueuing another work at this point */ wwl->wwl_wq_is_active = false; @@ -222,7 +246,7 @@ wqinput_work(struct work *wk, void *arg) pool_put(&wqi->wqi_work_pool, work); } - percpu_putref(wqi->wqi_worklists); + wqinput_percpu_putref(wqi->wqi_worklists); splx(s); } @@ -245,7 +269,7 @@ wqinput_input(struct wqinput *wqi, struc struct wqinput_work *work; struct wqinput_worklist *wwl; - wwl = percpu_getref(wqi->wqi_worklists); + wwl = wqinput_percpu_getref(wqi->wqi_worklists); /* Prevent too much work and mbuf from being queued */ if (wwl->wwl_len >= WQINPUT_LIST_MAXLEN) { @@ -274,5 +298,5 @@ wqinput_input(struct wqinput *wqi, struc workqueue_enqueue(wqi->wqi_wq, &wwl->wwl_work, NULL); out: - percpu_putref(wqi->wqi_worklists); + wqinput_percpu_putref(wqi->wqi_worklists); } Index: src/sys/netinet6/in6_gif.c diff -u src/sys/netinet6/in6_gif.c:1.85.6.6 src/sys/netinet6/in6_gif.c:1.85.6.7 --- src/sys/netinet6/in6_gif.c:1.85.6.6 Thu May 17 14:07:03 2018 +++ src/sys/netinet6/in6_gif.c Tue Sep 24 18:27:09 2019 @@ -1,4 +1,4 @@ -/* $NetBSD: in6_gif.c,v 1.85.6.6 2018/05/17 14:07:03 martin Exp $ */ +/* $NetBSD: in6_gif.c,v 1.85.6.7 2019/09/24 18:27:09 martin Exp $ */ /* $KAME: in6_gif.c,v 1.62 2001/07/29 04:27:25 itojun Exp $ */ /* @@ -31,7 +31,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: in6_gif.c,v 1.85.6.6 2018/05/17 14:07:03 martin Exp $"); +__KERNEL_RCSID(0, "$NetBSD: in6_gif.c,v 1.85.6.7 2019/09/24 18:27:09 martin Exp $"); #ifdef _KERNEL_OPT #include "opt_inet.h" @@ -86,13 +86,13 @@ static int 
in6_gif_output(struct gif_variant *var, int family, struct mbuf *m) { struct rtentry *rt; - struct route *ro; - struct gif_ro *gro; struct gif_softc *sc; struct sockaddr_in6 *sin6_src; struct sockaddr_in6 *sin6_dst; struct ifnet *ifp; struct ip6_hdr *ip6; + struct route *ro_pc; + kmutex_t *lock_pc; int proto, error; u_int8_t itos, otos; @@ -181,27 +181,23 @@ in6_gif_output(struct gif_variant *var, ip6->ip6_flow |= htonl((u_int32_t)otos << 20); sc = ifp->if_softc; - gro = percpu_getref(sc->gif_ro_percpu); - mutex_enter(gro->gr_lock); - ro = &gro->gr_ro; - rt = rtcache_lookup(ro, var->gv_pdst); + if_tunnel_get_ro(sc->gif_ro_percpu, &ro_pc, &lock_pc); + rt = rtcache_lookup(ro_pc, var->gv_pdst); if (rt == NULL) { - mutex_exit(gro->gr_lock); - percpu_putref(sc->gif_ro_percpu); + if_tunnel_put_ro(sc->gif_ro_percpu, lock_pc); m_freem(m); return ENETUNREACH; } /* If the route constitutes infinite encapsulation, punt. */ if (rt->rt_ifp == ifp) { - rtcache_unref(rt, ro); - rtcache_free(ro); - mutex_exit(gro->gr_lock); - percpu_putref(sc->gif_ro_percpu); + rtcache_unref(rt, ro_pc); + rtcache_free(ro_pc); + if_tunnel_put_ro(sc->gif_ro_percpu, lock_pc); m_freem(m); return ENETUNREACH; /* XXX */ } - rtcache_unref(rt, ro); + rtcache_unref(rt, ro_pc); #ifdef IPV6_MINMTU /* @@ -209,12 +205,11 @@ in6_gif_output(struct gif_variant *var, * it is too painful to ask for resend of inner packet, to achieve * path MTU discovery for encapsulated packets. 
*/ - error = ip6_output(m, 0, ro, IPV6_MINMTU, NULL, NULL, NULL); + error = ip6_output(m, 0, ro_pc, IPV6_MINMTU, NULL, NULL, NULL); #else - error = ip6_output(m, 0, ro, 0, NULL, NULL, NULL); + error = ip6_output(m, 0, ro_pc, 0, NULL, NULL, NULL); #endif - mutex_exit(gro->gr_lock); - percpu_putref(sc->gif_ro_percpu); + if_tunnel_put_ro(sc->gif_ro_percpu, lock_pc); return (error); } @@ -421,7 +416,7 @@ in6_gif_detach(struct gif_variant *var) if (error == 0) var->gv_encap_cookie6 = NULL; - percpu_foreach(sc->gif_ro_percpu, gif_rtcache_free_pc, NULL); + if_tunnel_ro_percpu_rtcache_free(sc->gif_ro_percpu); return error; } @@ -434,7 +429,8 @@ in6_gif_ctlinput(int cmd, const struct s struct ip6ctlparam *ip6cp = NULL; struct ip6_hdr *ip6; const struct sockaddr_in6 *dst6; - struct route *ro; + struct route *ro_pc; + kmutex_t *lock_pc; struct psref psref; if (sa->sa_family != AF_INET6 || @@ -470,15 +466,15 @@ in6_gif_ctlinput(int cmd, const struct s } gif_putref_variant(var, &psref); - ro = percpu_getref(sc->gif_ro_percpu); - dst6 = satocsin6(rtcache_getdst(ro)); + if_tunnel_get_ro(sc->gif_ro_percpu, &ro_pc, &lock_pc); + dst6 = satocsin6(rtcache_getdst(ro_pc)); /* XXX scope */ if (dst6 == NULL) ; else if (IN6_ARE_ADDR_EQUAL(&ip6->ip6_dst, &dst6->sin6_addr)) - rtcache_free(ro); + rtcache_free(ro_pc); - percpu_putref(sc->gif_ro_percpu); + if_tunnel_put_ro(sc->gif_ro_percpu, lock_pc); return NULL; } Index: src/sys/netinet6/in6_l2tp.c diff -u src/sys/netinet6/in6_l2tp.c:1.5.8.7 src/sys/netinet6/in6_l2tp.c:1.5.8.8 --- src/sys/netinet6/in6_l2tp.c:1.5.8.7 Mon Sep 10 15:58:47 2018 +++ src/sys/netinet6/in6_l2tp.c Tue Sep 24 18:27:09 2019 @@ -1,4 +1,4 @@ -/* $NetBSD: in6_l2tp.c,v 1.5.8.7 2018/09/10 15:58:47 martin Exp $ */ +/* $NetBSD: in6_l2tp.c,v 1.5.8.8 2019/09/24 18:27:09 martin Exp $ */ /* * Copyright (c) 2017 Internet Initiative Japan Inc. 
@@ -27,7 +27,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: in6_l2tp.c,v 1.5.8.7 2018/09/10 15:58:47 martin Exp $"); +__KERNEL_RCSID(0, "$NetBSD: in6_l2tp.c,v 1.5.8.8 2019/09/24 18:27:09 martin Exp $"); #ifdef _KERNEL_OPT #include "opt_l2tp.h" @@ -90,7 +90,8 @@ int in6_l2tp_output(struct l2tp_variant *var, struct mbuf *m) { struct rtentry *rt; - struct l2tp_ro *lro; + struct route *ro_pc; + kmutex_t *lock_pc; struct l2tp_softc *sc; struct ifnet *ifp; struct sockaddr_in6 *sin6_src = satosin6(var->lv_psrc); @@ -201,25 +202,22 @@ in6_l2tp_output(struct l2tp_variant *var return ENOBUFS; memcpy(mtod(m, struct ip6_hdr *), &ip6hdr, sizeof(struct ip6_hdr)); - lro = percpu_getref(sc->l2tp_ro_percpu); - mutex_enter(lro->lr_lock); - if ((rt = rtcache_lookup(&lro->lr_ro, var->lv_pdst)) == NULL) { - mutex_exit(lro->lr_lock); - percpu_putref(sc->l2tp_ro_percpu); + if_tunnel_get_ro(sc->l2tp_ro_percpu, &ro_pc, &lock_pc); + if ((rt = rtcache_lookup(ro_pc, var->lv_pdst)) == NULL) { + if_tunnel_put_ro(sc->l2tp_ro_percpu, lock_pc); m_freem(m); return ENETUNREACH; } /* If the route constitutes infinite encapsulation, punt. 
*/ if (rt->rt_ifp == ifp) { - rtcache_unref(rt, &lro->lr_ro); - rtcache_free(&lro->lr_ro); - mutex_exit(lro->lr_lock); - percpu_putref(sc->l2tp_ro_percpu); + rtcache_unref(rt, ro_pc); + rtcache_free(ro_pc); + if_tunnel_put_ro(sc->l2tp_ro_percpu, lock_pc); m_freem(m); return ENETUNREACH; /* XXX */ } - rtcache_unref(rt, &lro->lr_ro); + rtcache_unref(rt, ro_pc); /* * To avoid inappropriate rewrite of checksum, @@ -227,9 +225,8 @@ in6_l2tp_output(struct l2tp_variant *var */ m->m_pkthdr.csum_flags = 0; - error = ip6_output(m, 0, &lro->lr_ro, 0, NULL, NULL, NULL); - mutex_exit(lro->lr_lock); - percpu_putref(sc->l2tp_ro_percpu); + error = ip6_output(m, 0, ro_pc, 0, NULL, NULL, NULL); + if_tunnel_put_ro(sc->l2tp_ro_percpu, lock_pc); return(error); looped: Index: src/sys/netinet6/ip6_forward.c diff -u src/sys/netinet6/ip6_forward.c:1.87.2.3 src/sys/netinet6/ip6_forward.c:1.87.2.4 --- src/sys/netinet6/ip6_forward.c:1.87.2.3 Fri Mar 30 11:57:13 2018 +++ src/sys/netinet6/ip6_forward.c Tue Sep 24 18:27:09 2019 @@ -1,4 +1,4 @@ -/* $NetBSD: ip6_forward.c,v 1.87.2.3 2018/03/30 11:57:13 martin Exp $ */ +/* $NetBSD: ip6_forward.c,v 1.87.2.4 2019/09/24 18:27:09 martin Exp $ */ /* $KAME: ip6_forward.c,v 1.109 2002/09/11 08:10:17 sakane Exp $ */ /* @@ -31,7 +31,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: ip6_forward.c,v 1.87.2.3 2018/03/30 11:57:13 martin Exp $"); +__KERNEL_RCSID(0, "$NetBSD: ip6_forward.c,v 1.87.2.4 2019/09/24 18:27:09 martin Exp $"); #ifdef _KERNEL_OPT #include "opt_gateway.h" @@ -203,7 +203,7 @@ ip6_forward(struct mbuf *m, int srcrt) } #endif /* IPSEC */ - ro = percpu_getref(ip6_forward_rt_percpu); + ro = rtcache_percpu_getref(ip6_forward_rt_percpu); if (srcrt) { union { struct sockaddr dst; @@ -469,7 +469,7 @@ ip6_forward(struct mbuf *m, int srcrt) #endif rtcache_unref(rt, ro); if (ro != NULL) - percpu_putref(ip6_forward_rt_percpu); + rtcache_percpu_putref(ip6_forward_rt_percpu); if (rcvif != NULL) m_put_rcvif_psref(rcvif, &psref); return; Index: 
src/sys/netinet6/ip6_input.c diff -u src/sys/netinet6/ip6_input.c:1.178.2.8 src/sys/netinet6/ip6_input.c:1.178.2.9 --- src/sys/netinet6/ip6_input.c:1.178.2.8 Tue Sep 17 18:57:23 2019 +++ src/sys/netinet6/ip6_input.c Tue Sep 24 18:27:09 2019 @@ -1,4 +1,4 @@ -/* $NetBSD: ip6_input.c,v 1.178.2.8 2019/09/17 18:57:23 martin Exp $ */ +/* $NetBSD: ip6_input.c,v 1.178.2.9 2019/09/24 18:27:09 martin Exp $ */ /* $KAME: ip6_input.c,v 1.188 2001/03/29 05:34:31 itojun Exp $ */ /* @@ -62,7 +62,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: ip6_input.c,v 1.178.2.8 2019/09/17 18:57:23 martin Exp $"); +__KERNEL_RCSID(0, "$NetBSD: ip6_input.c,v 1.178.2.9 2019/09/24 18:27:09 martin Exp $"); #ifdef _KERNEL_OPT #include "opt_gateway.h" @@ -205,7 +205,7 @@ ip6_init(void) KASSERT(inet6_pfil_hook != NULL); ip6stat_percpu = percpu_alloc(sizeof(uint64_t) * IP6_NSTATS); - ip6_forward_rt_percpu = percpu_alloc(sizeof(struct route)); + ip6_forward_rt_percpu = rtcache_percpu_alloc(); } static void @@ -453,7 +453,7 @@ ip6_input(struct mbuf *m, struct ifnet * goto bad; } - ro = percpu_getref(ip6_forward_rt_percpu); + ro = rtcache_percpu_getref(ip6_forward_rt_percpu); /* * Multicast check */ @@ -639,7 +639,7 @@ ip6_input(struct mbuf *m, struct ifnet * in6_ifstat_inc(rcvif, ifs6_in_discard); #endif rtcache_unref(rt, ro); - percpu_putref(ip6_forward_rt_percpu); + rtcache_percpu_putref(ip6_forward_rt_percpu); return; /* m have already been freed */ } @@ -664,7 +664,7 @@ ip6_input(struct mbuf *m, struct ifnet * ICMP6_PARAMPROB_HEADER, (char *)&ip6->ip6_plen - (char *)ip6); rtcache_unref(rt, ro); - percpu_putref(ip6_forward_rt_percpu); + rtcache_percpu_putref(ip6_forward_rt_percpu); return; } IP6_EXTHDR_GET(hbh, struct ip6_hbh *, m, sizeof(struct ip6_hdr), @@ -672,7 +672,7 @@ ip6_input(struct mbuf *m, struct ifnet * if (hbh == NULL) { IP6_STATINC(IP6_STAT_TOOSHORT); rtcache_unref(rt, ro); - percpu_putref(ip6_forward_rt_percpu); + rtcache_percpu_putref(ip6_forward_rt_percpu); return; } 
KASSERT(IP6_HDR_ALIGNED_P(hbh)); @@ -727,7 +727,7 @@ ip6_input(struct mbuf *m, struct ifnet * if (error != 0) { rtcache_unref(rt, ro); - percpu_putref(ip6_forward_rt_percpu); + rtcache_percpu_putref(ip6_forward_rt_percpu); IP6_STATINC(IP6_STAT_CANTFORWARD); goto bad; } @@ -736,7 +736,7 @@ ip6_input(struct mbuf *m, struct ifnet * goto bad_unref; } else if (!ours) { rtcache_unref(rt, ro); - percpu_putref(ip6_forward_rt_percpu); + rtcache_percpu_putref(ip6_forward_rt_percpu); ip6_forward(m, srcrt); return; } @@ -780,7 +780,7 @@ ip6_input(struct mbuf *m, struct ifnet * rtcache_unref(rt, ro); rt = NULL; } - percpu_putref(ip6_forward_rt_percpu); + rtcache_percpu_putref(ip6_forward_rt_percpu); rh_present = 0; frg_present = 0; @@ -839,7 +839,7 @@ ip6_input(struct mbuf *m, struct ifnet * bad_unref: rtcache_unref(rt, ro); - percpu_putref(ip6_forward_rt_percpu); + rtcache_percpu_putref(ip6_forward_rt_percpu); bad: m_freem(m); return; Index: src/sys/netipsec/ipsec_output.c diff -u src/sys/netipsec/ipsec_output.c:1.48.2.3 src/sys/netipsec/ipsec_output.c:1.48.2.4 --- src/sys/netipsec/ipsec_output.c:1.48.2.3 Sat May 5 19:31:33 2018 +++ src/sys/netipsec/ipsec_output.c Tue Sep 24 18:27:09 2019 @@ -1,4 +1,4 @@ -/* $NetBSD: ipsec_output.c,v 1.48.2.3 2018/05/05 19:31:33 martin Exp $ */ +/* $NetBSD: ipsec_output.c,v 1.48.2.4 2019/09/24 18:27:09 martin Exp $ */ /*- * Copyright (c) 2002, 2003 Sam Leffler, Errno Consulting @@ -29,7 +29,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: ipsec_output.c,v 1.48.2.3 2018/05/05 19:31:33 martin Exp $"); +__KERNEL_RCSID(0, "$NetBSD: ipsec_output.c,v 1.48.2.4 2019/09/24 18:27:09 martin Exp $"); /* * IPsec output processing. 
@@ -118,7 +118,7 @@ ipsec_reinject_ipstack(struct mbuf *m, i KASSERT(af == AF_INET || af == AF_INET6); KERNEL_LOCK_UNLESS_NET_MPSAFE(); - ro = percpu_getref(ipsec_rtcache_percpu); + ro = rtcache_percpu_getref(ipsec_rtcache_percpu); switch (af) { #ifdef INET case AF_INET: @@ -136,7 +136,7 @@ ipsec_reinject_ipstack(struct mbuf *m, i break; #endif } - percpu_putref(ipsec_rtcache_percpu); + rtcache_percpu_putref(ipsec_rtcache_percpu); KERNEL_UNLOCK_UNLESS_NET_MPSAFE(); return rv; @@ -283,6 +283,24 @@ static void ipsec_fill_saidx_bymbuf(struct secasindex *saidx, const struct mbuf *m, const int af) { + struct m_tag *mtag; + u_int16_t natt_src = IPSEC_PORT_ANY; + u_int16_t natt_dst = IPSEC_PORT_ANY; + + /* + * For NAT-T enabled ipsecif(4), set NAT-T port numbers + * even if the saidx uses transport mode. + * + * See also ipsecif[46]_output(). + */ + mtag = m_tag_find(m, PACKET_TAG_IPSEC_NAT_T_PORTS, NULL); + if (mtag) { + u_int16_t *natt_ports; + + natt_ports = (u_int16_t *)(mtag + 1); + natt_src = natt_ports[1]; + natt_dst = natt_ports[0]; + } if (af == AF_INET) { struct sockaddr_in *sin; @@ -292,14 +310,14 @@ ipsec_fill_saidx_bymbuf(struct secasinde sin = &saidx->src.sin; sin->sin_len = sizeof(*sin); sin->sin_family = AF_INET; - sin->sin_port = IPSEC_PORT_ANY; + sin->sin_port = natt_src; sin->sin_addr = ip->ip_src; } if (saidx->dst.sa.sa_len == 0) { sin = &saidx->dst.sin; sin->sin_len = sizeof(*sin); sin->sin_family = AF_INET; - sin->sin_port = IPSEC_PORT_ANY; + sin->sin_port = natt_dst; sin->sin_addr = ip->ip_dst; } } else { @@ -310,7 +328,7 @@ ipsec_fill_saidx_bymbuf(struct secasinde sin6 = (struct sockaddr_in6 *)&saidx->src; sin6->sin6_len = sizeof(*sin6); sin6->sin6_family = AF_INET6; - sin6->sin6_port = IPSEC_PORT_ANY; + sin6->sin6_port = natt_src; sin6->sin6_addr = ip6->ip6_src; if (IN6_IS_SCOPE_LINKLOCAL(&ip6->ip6_src)) { /* fix scope id for comparing SPD */ @@ -323,7 +341,7 @@ ipsec_fill_saidx_bymbuf(struct secasinde sin6 = (struct sockaddr_in6 *)&saidx->dst; 
sin6->sin6_len = sizeof(*sin6); sin6->sin6_family = AF_INET6; - sin6->sin6_port = IPSEC_PORT_ANY; + sin6->sin6_port = natt_dst; sin6->sin6_addr = ip6->ip6_dst; if (IN6_IS_SCOPE_LINKLOCAL(&ip6->ip6_dst)) { /* fix scope id for comparing SPD */ @@ -826,5 +844,5 @@ void ipsec_output_init(void) { - ipsec_rtcache_percpu = percpu_alloc(sizeof(struct route)); + ipsec_rtcache_percpu = rtcache_percpu_alloc(); } Index: src/sys/netipsec/ipsecif.c diff -u src/sys/netipsec/ipsecif.c:1.1.2.8 src/sys/netipsec/ipsecif.c:1.1.2.9 --- src/sys/netipsec/ipsecif.c:1.1.2.8 Wed May 29 15:57:38 2019 +++ src/sys/netipsec/ipsecif.c Tue Sep 24 18:27:09 2019 @@ -1,4 +1,4 @@ -/* $NetBSD: ipsecif.c,v 1.1.2.8 2019/05/29 15:57:38 martin Exp $ */ +/* $NetBSD: ipsecif.c,v 1.1.2.9 2019/09/24 18:27:09 martin Exp $ */ /* * Copyright (c) 2017 Internet Initiative Japan Inc. @@ -27,7 +27,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: ipsecif.c,v 1.1.2.8 2019/05/29 15:57:38 martin Exp $"); +__KERNEL_RCSID(0, "$NetBSD: ipsecif.c,v 1.1.2.9 2019/09/24 18:27:09 martin Exp $"); #ifdef _KERNEL_OPT #include "opt_inet.h" @@ -71,6 +71,7 @@ __KERNEL_RCSID(0, "$NetBSD: ipsecif.c,v #include <net/if_ipsec.h> +static int ipsecif_set_natt_ports(struct ipsec_variant *, struct mbuf *); static void ipsecif4_input(struct mbuf *, int, int, void *); static int ipsecif4_output(struct ipsec_variant *, int, struct mbuf *); static int ipsecif4_filter4(const struct ip *, struct ipsec_variant *, @@ -102,6 +103,32 @@ struct encapsw ipsecif4_encapsw = { static const struct encapsw ipsecif6_encapsw; #endif +static int +ipsecif_set_natt_ports(struct ipsec_variant *var, struct mbuf *m) +{ + + KASSERT(if_ipsec_heldref_variant(var)); + + if (var->iv_sport || var->iv_dport) { + struct m_tag *mtag; + + mtag = m_tag_get(PACKET_TAG_IPSEC_NAT_T_PORTS, + sizeof(uint16_t) + sizeof(uint16_t), M_DONTWAIT); + if (mtag) { + uint16_t *natt_port; + + natt_port = (uint16_t *)(mtag + 1); + natt_port[0] = var->iv_dport; + natt_port[1] = 
var->iv_sport; + m_tag_prepend(m, mtag); + } else { + return ENOBUFS; + } + } + + return 0; +} + static struct mbuf * ipsecif4_prepend_hdr(struct ipsec_variant *var, struct mbuf *m, uint8_t proto, uint8_t tos) @@ -366,10 +393,9 @@ ipsecif4_output(struct ipsec_variant *va KASSERT(sp->policy != IPSEC_POLICY_ENTRUST); KASSERT(sp->policy != IPSEC_POLICY_BYPASS); if(sp->policy != IPSEC_POLICY_IPSEC) { - struct ifnet *ifp = &var->iv_softc->ipsec_if; m_freem(m); - IF_DROP(&ifp->if_snd); - return 0; + error = ENETUNREACH; + goto done; } /* get flowinfo */ @@ -397,6 +423,13 @@ ipsecif4_output(struct ipsec_variant *va if (mtu > 0) return ipsecif4_fragout(var, family, m, mtu); + /* set NAT-T ports */ + error = ipsecif_set_natt_ports(var, m); + if (error) { + m_freem(m); + goto done; + } + /* IPsec output */ IP_STATINC(IP_STAT_LOCALOUT); error = ipsec4_process_packet(m, sp->req, &sa_mtu); @@ -469,7 +502,8 @@ ipsecif6_output(struct ipsec_variant *va { struct ifnet *ifp = &var->iv_softc->ipsec_if; struct ipsec_softc *sc = ifp->if_softc; - struct ipsec_ro *iro; + struct route *ro_pc; + kmutex_t *lock_pc; struct rtentry *rt; struct sockaddr_in6 *sin6_src; struct sockaddr_in6 *sin6_dst; @@ -575,37 +609,41 @@ ipsecif6_output(struct ipsec_variant *va sockaddr_in6_init(&u.dst6, &sin6_dst->sin6_addr, 0, 0, 0); - iro = percpu_getref(sc->ipsec_ro_percpu); - mutex_enter(iro->ir_lock); - if ((rt = rtcache_lookup(&iro->ir_ro, &u.dst)) == NULL) { - mutex_exit(iro->ir_lock); - percpu_putref(sc->ipsec_ro_percpu); + if_tunnel_get_ro(sc->ipsec_ro_percpu, &ro_pc, &lock_pc); + if ((rt = rtcache_lookup(ro_pc, &u.dst)) == NULL) { + if_tunnel_put_ro(sc->ipsec_ro_percpu, lock_pc); m_freem(m); return ENETUNREACH; } if (rt->rt_ifp == ifp) { - rtcache_unref(rt, &iro->ir_ro); - rtcache_free(&iro->ir_ro); - mutex_exit(iro->ir_lock); - percpu_putref(sc->ipsec_ro_percpu); + rtcache_unref(rt, ro_pc); + rtcache_free(ro_pc); + if_tunnel_put_ro(sc->ipsec_ro_percpu, lock_pc); m_freem(m); return ENETUNREACH; } - 
rtcache_unref(rt, &iro->ir_ro); + rtcache_unref(rt, ro_pc); + + /* set NAT-T ports */ + error = ipsecif_set_natt_ports(var, m); + if (error) { + m_freem(m); + goto out; + } /* * force fragmentation to minimum MTU, to avoid path MTU discovery. * it is too painful to ask for resend of inner packet, to achieve * path MTU discovery for encapsulated packets. */ - error = ip6_output(m, 0, &iro->ir_ro, + error = ip6_output(m, 0, ro_pc, ip6_ipsec_pmtu ? 0 : IPV6_MINMTU, 0, NULL, NULL); - if (error) - rtcache_free(&iro->ir_ro); - mutex_exit(iro->ir_lock); - percpu_putref(sc->ipsec_ro_percpu); +out: + if (error) + rtcache_free(ro_pc); + if_tunnel_put_ro(sc->ipsec_ro_percpu, lock_pc); return error; } @@ -887,17 +925,11 @@ ipsecif4_detach(struct ipsec_variant *va int ipsecif6_attach(struct ipsec_variant *var) { - struct sockaddr_in6 mask6; struct ipsec_softc *sc = var->iv_softc; KASSERT(if_ipsec_variant_is_configured(var)); KASSERT(var->iv_encap_cookie6 == NULL); - memset(&mask6, 0, sizeof(mask6)); - mask6.sin6_len = sizeof(struct sockaddr_in6); - mask6.sin6_addr.s6_addr32[0] = mask6.sin6_addr.s6_addr32[1] = - mask6.sin6_addr.s6_addr32[2] = mask6.sin6_addr.s6_addr32[3] = ~0; - var->iv_encap_cookie6 = encap_attach_func(AF_INET6, -1, if_ipsec_encap_func, &ipsecif6_encapsw, sc); if (var->iv_encap_cookie6 == NULL) @@ -907,16 +939,6 @@ ipsecif6_attach(struct ipsec_variant *va return 0; } -static void -ipsecif6_rtcache_free_pc(void *p, void *arg __unused, struct cpu_info *ci __unused) -{ - struct ipsec_ro *iro = p; - - mutex_enter(iro->ir_lock); - rtcache_free(&iro->ir_ro); - mutex_exit(iro->ir_lock); -} - int ipsecif6_detach(struct ipsec_variant *var) { @@ -925,7 +947,7 @@ ipsecif6_detach(struct ipsec_variant *va KASSERT(var->iv_encap_cookie6 != NULL); - percpu_foreach(sc->ipsec_ro_percpu, ipsecif6_rtcache_free_pc, NULL); + if_tunnel_ro_percpu_rtcache_free(sc->ipsec_ro_percpu); var->iv_output = NULL; error = encap_detach(var->iv_encap_cookie6); @@ -941,7 +963,8 @@ 
ipsecif6_ctlinput(int cmd, const struct struct ip6ctlparam *ip6cp = NULL; struct ip6_hdr *ip6; const struct sockaddr_in6 *dst6; - struct ipsec_ro *iro; + struct route *ro_pc; + kmutex_t *lock_pc; if (sa->sa_family != AF_INET6 || sa->sa_len != sizeof(struct sockaddr_in6)) @@ -965,18 +988,16 @@ ipsecif6_ctlinput(int cmd, const struct if (!ip6) return NULL; - iro = percpu_getref(sc->ipsec_ro_percpu); - mutex_enter(iro->ir_lock); - dst6 = satocsin6(rtcache_getdst(&iro->ir_ro)); + if_tunnel_get_ro(sc->ipsec_ro_percpu, &ro_pc, &lock_pc); + dst6 = satocsin6(rtcache_getdst(ro_pc)); /* XXX scope */ if (dst6 == NULL) ; else if (IN6_ARE_ADDR_EQUAL(&ip6->ip6_dst, &dst6->sin6_addr)) /* flush route cache */ - rtcache_free(&iro->ir_ro); + rtcache_free(ro_pc); - mutex_exit(iro->ir_lock); - percpu_putref(sc->ipsec_ro_percpu); + if_tunnel_put_ro(sc->ipsec_ro_percpu, lock_pc); return NULL; } Index: src/sys/netipsec/key.c diff -u src/sys/netipsec/key.c:1.163.2.13 src/sys/netipsec/key.c:1.163.2.14 --- src/sys/netipsec/key.c:1.163.2.13 Tue Sep 10 16:03:53 2019 +++ src/sys/netipsec/key.c Tue Sep 24 18:27:09 2019 @@ -1,4 +1,4 @@ -/* $NetBSD: key.c,v 1.163.2.13 2019/09/10 16:03:53 martin Exp $ */ +/* $NetBSD: key.c,v 1.163.2.14 2019/09/24 18:27:09 martin Exp $ */ /* $FreeBSD: src/sys/netipsec/key.c,v 1.3.2.3 2004/02/14 22:23:23 bms Exp $ */ /* $KAME: key.c,v 1.191 2001/06/27 10:46:49 sakane Exp $ */ @@ -32,7 +32,7 @@ */ #include <sys/cdefs.h> -__KERNEL_RCSID(0, "$NetBSD: key.c,v 1.163.2.13 2019/09/10 16:03:53 martin Exp $"); +__KERNEL_RCSID(0, "$NetBSD: key.c,v 1.163.2.14 2019/09/24 18:27:09 martin Exp $"); /* * This code is referred to RFC 2367 @@ -1965,6 +1965,20 @@ _key_msg2sp(const struct sadb_x_policy * (*p_isr)->level = xisr->sadb_x_ipsecrequest_level; /* set IP addresses if there */ + /* + * NOTE: + * MOBIKE Extensions for PF_KEY draft says: + * If tunnel mode is specified, the sadb_x_ipsecrequest + * structure is followed by two sockaddr structures that + * define the tunnel 
endpoint addresses. In the case that + * transport mode is used, no additional addresses are + * specified. + * see: https://tools.ietf.org/html/draft-schilcher-mobike-pfkey-extension-01 + * + * And then, the IP addresses will be set by + * ipsec_fill_saidx_bymbuf() from packet in transport mode. + * This behavior is used by NAT-T enabled ipsecif(4). + */ if (xisr->sadb_x_ipsecrequest_len > sizeof(*xisr)) { const struct sockaddr *paddr; @@ -4565,13 +4579,13 @@ key_saidx_match( sa1dst = &saidx1->dst.sa; /* * If NAT-T is enabled, check ports for tunnel mode. - * Don't do it for transport mode, as there is no - * port information available in the SP. - * Also don't check ports if they are set to zero + * For ipsecif(4), check ports for transport mode, too. + * Don't check ports if they are set to zero * in the SPD: This means we have a non-generated * SPD which can't know UDP ports. */ - if (saidx1->mode == IPSEC_MODE_TUNNEL) + if (saidx1->mode == IPSEC_MODE_TUNNEL || + saidx1->mode == IPSEC_MODE_TRANSPORT) chkport = PORT_LOOSE; else chkport = PORT_NONE;