Hi [email protected], Thank you for your detailed review!
At first, here is updated patches. http://netbsd.org/~knakahara/if-l2tp-2/01-accept-ifname-include-digit.patch http://netbsd.org/~knakahara/if-l2tp-2/02-if-l2tp.patch And then, each response is below. On 2017/01/20 0:38, Taylor R Campbell wrote: > Date: Thu, 19 Jan 2017 17:58:17 +0900 > From: Kengo NAKAHARA <[email protected]> > A few little comments: > > diff --git a/sys/net/if.c b/sys/net/if.c > index 2386af3..ba63266 100644 > --- a/sys/net/if.c > +++ b/sys/net/if.c > @@ -1599,7 +1613,7 @@ if_clone_lookup(const char *name, int *unitp) > strcpy(ifname, "if_"); > /* separate interface name from unit */ > for (dp = ifname + 3, cp = name; cp - name < IFNAMSIZ && > - *cp && (*cp < '0' || *cp > '9');) > + *cp && !if_is_unit(cp);) > *dp++ = *cp++; > > This changes the generic syntax interface names, perhaps to allow the > `2' in `l2tp', although since this loop skips over the first three > octets that doesn't seem to be necessary. Either way, I don't have a > problem with this, but it should be done in a separate change. I see. But sorry, I want to postpone the fix to reduce unnecessary skip... As a first step, I separate this changes and will commit first. > diff --git a/sys/net/if_l2tp.c b/sys/net/if_l2tp.c > new file mode 100644 > index 0000000..dda8bbd > --- /dev/null > +++ b/sys/net/if_l2tp.c > @@ -0,0 +1,1541 @@ > [...] > +/* > + * l2tp global variable definitions > + */ > +LIST_HEAD(l2tp_sclist, l2tp_softc); > +static struct l2tp_sclist l2tp_softc_list; > +kmutex_t l2tp_list_lock; > + > +#if !defined(L2TP_ID_HASH_SIZE) > +#define L2TP_ID_HASH_SIZE 64 > +#endif > +static u_long l2tp_id_hash_mask; > + > +kmutex_t l2tp_hash_lock; > +static struct pslist_head *l2tp_hashed_list = NULL; > > Consider putting related global state into cacheline-aligned structs? Oh, I forgot it. I should put them into cacheline-aligned structs. In addition, I remove l2tp_id_hash_mask variable and use L2TP_ID_HASH_SIZE to avoid holding lock in fast-path (l2tp_lookup_session_ref() => id_hash_func()). > static struct { > kmutex_t lock; > struct l2tp_sclist list; > } l2tp_softc __cacheline_aligned; > > static struct { > kmutex_t lock; > struct pslist_head *list; > unsigned long mask; > } l2tp_hash __cacheline_aligned; > > +pserialize_t l2tp_psz; > +struct psref_class *lv_psref_class __read_mostly; > > __read_mostly for l2tp_psz? Yes, I add it. > +static int > +l2tpdetach(void) > +{ > + int error; > + > + if (!LIST_EMPTY(&l2tp_softc_list)) > + return EBUSY; > > Need lock here? Need to first set flag preventing new creation? > > mutex_enter(&l2tp_softc.lock); > KASSERT(!l2tp_softc.dying); > l2tp_softc.detaching = true; > if (!LIST_EMPTY(&l2tp_softc.list)) { > l2tp_softc.detaching = false; > mutex_exit(&l2tp_softc.lock); > return EBUSY; > } > mutex_exit(&l2tp_softc.lock); > > Anyone trying to add to l2tp_softc.list must also check > l2tp_softc.detaching before proceeding. You are right. Hmm..., it's seems there are same problems in other module'd interfaces such as pppoe(4), gre(4), and so on. I think module framework could fix this problem, so I will ask [email protected] and [email protected] if they have any idea later. > +static int > +l2tp_clone_destroy(struct ifnet *ifp) > +{ > + struct l2tp_softc *sc = (void *) ifp; > > Use container_of here: > > struct l2tp_softc *sc = container_of(ifp, struct l2tp_softc, > l2tp_ec.ec_if); > > No functional difference, but the compiler type-checks it. I use container_of here and similar codes. > +void > +l2tp_input(struct mbuf *m, struct ifnet *ifp) > +{ > + > + KASSERT(ifp != NULL); > + > + if (0 == (mtod(m, u_long) & 0x03)) { > + /* copy and align head of payload */ > + struct mbuf *m_head; > + int copy_length; > + > +#define L2TP_COPY_LENGTH 60 > +#define L2TP_LINK_HDR_ROOM (MHLEN - L2TP_COPY_LENGTH - 4/*round4(2)*/) > + > + if (m->m_pkthdr.len < L2TP_COPY_LENGTH) { > + copy_length = m->m_pkthdr.len; > + } else { > + copy_length = L2TP_COPY_LENGTH; > + } > + > + if (m->m_len < copy_length) { > + m = m_pullup(m, copy_length); > + if (m == NULL) > + return; > + } > + > + MGETHDR(m_head, M_DONTWAIT, MT_HEADER); > + if (m_head == NULL) { > + m_freem(m); > + return; > + } > + M_COPY_PKTHDR(m_head, m); > + > + m_head->m_data += 2 /* align */ + L2TP_LINK_HDR_ROOM; > + memcpy(m_head->m_data, m->m_data, copy_length); > + m_head->m_len = copy_length; > + m->m_data += copy_length; > + m->m_len -= copy_length; > + > + /* construct chain */ > + if (m->m_len == 0) { > + m_head->m_next = m_free(m); /* not m_freem */ > + } else { > + /* > + * copyed mtag in previous call M_COPY_PKTHDR > + * but don't delete mtag in case cutt of M_PKTHDR > flag > + */ > + m_tag_delete_chain(m, NULL); > + m->m_flags &= ~M_PKTHDR; > + m_head->m_next = m; > + } > + > + /* override m */ > + m = m_head; > + } > > Someone more familiar with the mbuf API than I should review this mbuf > juggling show! I also want someone() to do so... > + case SIOCSIFMTU: > + error = kauth_authorize_generic(kauth_cred_get(), > + KAUTH_GENERIC_ISSUSER, NULL); > > Why the kauth check here and not in any other drivers? Is this kauth > check unnecessary, or does its absence in other drivers indicate a > bug? Likewise in a few other places below. Oh, it must be vestige of old version kernel's manner. I remove it. > + if (error) > + break; > + switch (cmd) { > +#ifdef INET > + case SIOCSIFPHYADDR: > + src = (struct sockaddr *) > + &(((struct in_aliasreq *)data)->ifra_addr); > + dst = (struct sockaddr *) > + &(((struct in_aliasreq > *)data)->ifra_dstaddr); > > Consider using one more local variable instead of multiple levels of > nesting? > > case SIOCSIFPHYADDR: { > struct in_aliasreq *aliasreq = data; > src = (struct sockaddr *)&aliasreq->ifra_data; > dst = (struct sockaddr *)&aliasreq->ifra_dstaddr; > ... > } > > Likewise in a few other places below. Hmm, I think separating SIOCSIFPHYADDR, SIOCSIFPHYADDR_IN6 and SIOCSLIFPHYADDR cases makes simpler. I refactor such way. > + error = encap_lock_enter(); > + if (error) > + goto error; > + > + mutex_enter(&sc->l2tp_lock); > > Document lock order of encap_lock ---> struct l2tp_softc::l2tp_lock? I missed it. I add locking order at the end if_l2tp.h. > + ovar = sc->l2tp_var; > + osrc = ovar->lv_psrc; > + odst = ovar->lv_pdst; > + memcpy(nvar, ovar, sizeof(*nvar)); > > You can just do > > *nvar = *ovar; > > here, since they are both guaranteed to be aligned. I fix it. > +static int id_hash_func(uint32_t id) > +{ > + uint32_t hash; > + > + hash = (id >> 16) ^ id; > + hash = (hash >> 4) ^ hash; > + > + return hash & l2tp_id_hash_mask; > +} > > Is this hash function an essential part of the l2tp protocol, or is it > just something that will more likely involve all the bits of id when > masking with l2tp_id_hash_mask? (Asking so I can know whether it is > safe to replace by, e.g., siphash later, once I get around to adding > the siphash code I've been sitting on for about five years now.) It is not essential of L2TPv3 protocol. So, it is safe to replace other functions :) # The session id would be random value set by userland command/daemon. # And then, kernel just search correct l2tp_softc by the session id. > +/* > + * l2tp_variant update API. > + * > + * Assumption: > + * reader side dereferences sc->l2tp_var in reader critical section only, > + * that is, all of reader sides do not reader the sc->l2tp_var after > + * pserialize_perform(). > + */ > +static void > +l2tp_variant_update(struct l2tp_softc *sc, struct l2tp_variant *nvar) > +{ > + struct ifnet *ifp = &sc->l2tp_ec.ec_if; > + struct l2tp_variant *ovar = sc->l2tp_var; > + > + KASSERT(mutex_owned(&sc->l2tp_lock)); > + > + membar_producer(); > + atomic_swap_ptr(&sc->l2tp_var, nvar); > + pserialize_perform(l2tp_psz); > + psref_target_destroy(&ovar->lv_psref, lv_psref_class); > > No need for atomic_swap_ptr. Just > > sc->l2tp_var = nvar; > > is enough. Nobody else can write to it because we hold the lock. Between writer and writer, it is correct. However, between writer and reader, I think atomic_swap_ptr is required to prevent reader's load before writer's store done. Is this correct? > diff --git a/sys/net/if_l2tp.h b/sys/net/if_l2tp.h > new file mode 100644 > index 0000000..1aae23c > --- /dev/null > +++ b/sys/net/if_l2tp.h > @@ -0,0 +1,206 @@ > [...] > +#include <net/if_ether.h> > +#include <netinet/in.h> > +/* xxx sigh, why route have struct route instead of pointer? */ > > Unclear what this comment refers to? Sorry, garbage comment. # It is vestige of original file... > + > +#define SIOCSL2TPSESSION _IOW('i', 151, struct ifreq) > +#define SIOCDL2TPSESSION _IOW('i', 152, struct ifreq) > +#define SIOCSL2TPCOOKIE _IOW('i', 153, struct ifreq) > +#define SIOCDL2TPCOOKIE _IOW('i', 154, struct ifreq) > +#define SIOCSL2TPSTATE _IOW('i', 155, struct ifreq) > +#define SIOCGL2TP SIOCGIFGENERIC > > Pick tabs or spaces and be consistent? (Makes diffs look nicer. > Usual rule is `#define<TAB>xyz<TAB>'.) I apply "#define<TAB>xyz<TAB>" rule to if_l2tp.h and in_l2tp.h. > Say struct l2tp_req, not struct ifreq, if that's what you mean? I had to miss modification after copy and paste... > +struct l2tp_req { > + int state; > + int my_cookie_len; > + int peer_cookie_len; > > Pick a fixed-width unsigned integer type for this unless you actually > need negative values? They are not required negative values, I use unsigned type. > +#ifdef _KERNEL > +extern struct psref_class *lv_psref_class __read_mostly; > > The __read_mostly attribute matters only for definitions, I believe. Oh, I remove unnecessary attribute. > +struct l2tp_softc { > + struct ethercom l2tp_ec; /* common area - must be at the > top */ > + /* to use ether_input(), we must have this > */ > + percpu_t *l2tp_ro_percpu; > > Mark this with what the type of the per-CPU object is. For example, > > percpu_t *l2tp_ro_percpu; /* struct l2tp_ro */ > > (Obviously this is not as good for type checking as percpu<l2tp_ro> in > C++ or similar, but it's better than nothing for the reader's sake.) Ok, I add that comment. > +static inline bool > +l2tp_heldref_variant(struct l2tp_variant *var) > +{ > + > + if (var == NULL) > + return false; > + return psref_held(&var->lv_psref, lv_psref_class); > +} > > Both users of this first do KASSERT(var != NULL), so there's no need > for the conditional `if (var == NULL)' here. I remove unnecessary condition. > +/* Prototypes */ > +void l2tpattach(int); > +void l2tpattach0(struct l2tp_softc *); > +void l2tp_input(struct mbuf *, struct ifnet *); > +int l2tp_ioctl(struct ifnet *, u_long, void *); > + > +struct l2tp_variant* l2tp_lookup_session_ref(uint32_t, struct psref *); > > KNF: struct l2tp_variant *l2tp_lookup_session_ref(uint32_t, struct psref *); Ah, I missed it... > +/* > + * Locking notes: > + * + l2tp_softc_list is protected by l2tp_list_lock (an adaptive mutex) > + * l2tp_softc_list is list of all l2tp_softcs, and it is used to > avoid > + * wrong unload. > > Instead of `wrong unload', maybe `unload while busy' or something? Yes, I fix my poor English wording. :) > + * + l2tp_hashed_list is protected by > + * - l2tp_hash_lock (an adaptive mutex) for writer > + * - pserialize for reader > + * l2tp_hashed_list is hashed list of all l2tp_softcs, and it is > used by > + * input processing to find appropriate softc. > + * + l2tp_softc->l2tp_var is protected by > + * - l2tp_softc->l2tp_lock (an adaptive mutex) for writer > + * - l2tp_var->lv_psref for reader > + * l2tp_softc->l2tp_var is used for variant values while the l2tp > tunnel > + * exists. > > This looks great! Can you also state any lock order constraints here? > If the only constraint is that no pair of these locks is ever held > simultaneously, so be it -- say that too. It looks like encap_lock > needs to be mentioned, though. I add lock order comment. I found I forgot description about struct l2tp_ro->lr_lock, so I add it, too. > diff --git a/sys/netinet/in_l2tp.c b/sys/netinet/in_l2tp.c > new file mode 100644 > index 0000000..9b2ccd6 > --- /dev/null > +++ b/sys/netinet/in_l2tp.c > @@ -0,0 +1,417 @@ > [...] > +int > +in_l2tp_output(struct l2tp_variant *var, struct mbuf *m) > +{ > [...] > + bzero(&iphdr, sizeof(iphdr)); > > Use memset, not bzero. Ahhhhhh, it is replacement leakage. # original implementation was made for old version... > + if (var->lv_peer_cookie_len == 4) { > + cookie_32 = htonl((uint32_t)var->lv_peer_cookie); > + memcpy(mtod(m, uint32_t *), &cookie_32, > + sizeof(uint32_t)); > > I have the impression that mtod(m, T *) is supposed to be used only > when m is actually aligned for a T. Most uses of memcpy(mtod(m, T *), > ...) use void or uint8_t: > > memcpy(mtod(m, void *), &cookie_32, sizeof(uint32_t)); > > I would suggest doing that, in case anyone ever makes mtod check > alignment -- unless you can guarantee alignment, in which case you can > just do > > *mtod(m, uint32_t *) = cookie_32; I see. I use memcpy(mtod(m, void *), ...). > + error = ip_output(m, NULL, &lro->lr_ro, 0, NULL, NULL); > + mutex_exit(&lro->lr_lock); > + percpu_putref(sc->l2tp_ro_percpu); > > Hope it's safe to call ip_output with this lock held! Is it easy to > prove that ip_output can only at worst put the mbuf on a queue, or > that if it recursively calls in_l2tp_output, the recursion detection > will prevent locking against myself? Sorry to say, it cannot. Because if we call ip_output() without lro->lc_lock, concurrent execution of l2tp output softint and dad timer softint in the same CPU cause panic. That is, + begin dad timer processing (the lwp is softclk/0) - in_l2tp_output() - rtcache_lookup() - hold struct ro->ro_psref - call ip_output() + hardware interrupt arises (would be ethernet Rx interrupt) + call softint handlers by fast softints + begin l2tp output processing (the lwp is softnet/0) - in_l2tp_output() - rtcache_lookup() - hold struct ro->ro_psref - call ip_output() + hardware interrupt arises + resume dad timer processing - resume ip_output() - release struct ro->ro_psref - failure assertion in psref_release()'s KASSERTMSG((psref->psref_lwp == curlwp) At least, locking against myself by calling in_l2tp_output() recursively is prevent by setting MAX_L2TP_NEST to 0. > diff --git a/sys/netinet/in_proto.c b/sys/netinet/in_proto.c > index 5534847..e318a7b 100644 > --- a/sys/netinet/in_proto.c > +++ b/sys/netinet/in_proto.c > @@ -360,6 +360,16 @@ const struct protosw inetsw[] = { > .pr_init = carp_init, > }, > #endif /* NCARP > 0 */ > +{ .pr_type = SOCK_RAW, > + .pr_domain = &inetdomain, > > Should this be conditional on NL2TP > 0? I think no. To build and load libl2tp of rump, libinet should not depend to libl2tp. Thanks, -- ////////////////////////////////////////////////////////////////////// Internet Initiative Japan Inc. Device Engineering Section, IoT Platform Development Department, Network Division, Technology Unit Kengo NAKAHARA <[email protected]>
