Hi,

I'm seeing a panic when I start natd, which I suspect is related to this
commit:

panic: Lock pcbinfohash not exclusively locked @
/usr/src/sys/netinet/in_pcb.c:323

Backtrace:
panic() ...
_rw_assert() at _rw_assert+0x18d
in_pcbbind() at in_pcbbind+0xf5
div_bind() at div_bind+0xb9
sobind() at sobind()+0x79
kern_bind() at kern_bind+0xde
bind() at bind()+0x41
syscallenter() at syscallenter+0x1cb
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xdd

div_bind probably also needs to surround the call to in_pcbbind with
INP_HASHW(UN)LOCK(...)

I'm currently running 222680. I've only now seen the issue, but I've also
just now activated INVARIANTS.

Regards,
Kristof

On 2011-05-30 09:43:55 (+0000), Robert Watson <rwat...@freebsd.org> wrote:
> Author: rwatson
> Date: Mon May 30 09:43:55 2011
> New Revision: 222488
> URL: http://svn.freebsd.org/changeset/base/222488
> 
> Log:
>   Decompose the current single inpcbinfo lock into two locks:
>   
>   - The existing ipi_lock continues to protect the global inpcb list and
>     inpcb counter.  This lock is now relegated to a small number of
>     allocation and free operations, and occasional operations that walk
>     all connections (including, awkwardly, certain UDP multicast receive
>     operations -- something to revisit).
>   
>   - A new ipi_hash_lock protects the two inpcbinfo hash tables for
>     looking up connections and bound sockets, manipulated using new
>     INP_HASH_*() macros.  This lock, combined with inpcb locks, protects
>     the 4-tuple address space.
>   
>   Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
>   connection locks, so may be acquired while manipulating a connection on
>   which a lock is already held, avoiding the need to acquire the inpcbinfo
>   lock preemptively when a binding change might later be required.  As a
>   result, however, lookup operations necessarily go through a reference
>   acquire while holding the lookup lock, later acquiring an inpcb lock --
>   if required.
>   
>   A new function in_pcblookup() looks up connections, and accepts flags
>   indicating how to return the inpcb.  Due to lock order changes, callers
>   no longer need acquire locks before performing a lookup: the lookup
>   routine will acquire the ipi_hash_lock as needed.  In the future, it will
>   also be able to use alternative lookup and locking strategies
>   transparently to callers, such as pcbgroup lookup.  New lookup flags are,
>   supplementing the existing INPLOOKUP_WILDCARD flag:
>   
>     INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
>     INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
>   
>   Callers must pass exactly one of these flags (for the time being).
>   
>   Some notes:
>   
>   - All protocols are updated to work within the new regime; especially,
>     TCP, UDPv4, and UDPv6.  pcbinfo ipi_lock acquisitions are largely
>     eliminated, and global hash lock hold times are dramatically reduced
>     compared to previous locking.
>   - The TCP syncache still relies on the pcbinfo lock, something that we
>     may want to revisit.
>   - Support for reverting to the FreeBSD 7.x locking strategy in TCP input
>     is no longer available -- hash lookup locks are now held only very
>     briefly during inpcb lookup, rather than for potentially extended
>     periods.  However, the pcbinfo ipi_lock will still be acquired if a
>     connection state might change such that a connection is added or
>     removed.
>   - Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
>     due to maintaining their own hash tables.
>   - The interface in6_pcblookup_hash_locked() is maintained, which allows
>     callers to acquire hash locks and perform one or more lookups atomically
>     with 4-tuple allocation: this is required only for TCPv6, as there is no
>     in6_pcbconnect_setup(), which there should be.
>   - UDPv6 locking remains significantly more conservative than UDPv4
>     locking, which relates to source address selection.  This needs
>     attention, as it likely significantly reduces parallelism in this code
>     for multithreaded socket use (such as in BIND).
>   - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
>     somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
>     is no longer sufficient.  A second check once the inpcb lock is held
>     should do the trick, keeping the general case from requiring the inpcb
>     lock for every inpcb visited.
>   - This work reminds us that we need to revisit locking of the v4/v6 flags,
>     which may be accessed lock-free both before and after this change.
>   - Right now, a single lock name is used for the pcbhash lock -- this is
>     undesirable, and probably another argument is required to take care of
>     this (or a char array name field in the pcbinfo?).
>   
>   This is not an MFC candidate for 8.x due to its impact on lookup and
>   locking semantics.  It's possible some of these issues could be worked
>   around with compatibility wrappers, if necessary.
>   
>   Reviewed by:    bz
>   Sponsored by:   Juniper Networks, Inc.
> 
> Modified:
>   head/sys/contrib/pf/net/pf.c
>   head/sys/netinet/in_pcb.c
>   head/sys/netinet/in_pcb.h
>   head/sys/netinet/ip_divert.c
>   head/sys/netinet/ipfw/ip_fw2.c
>   head/sys/netinet/raw_ip.c
>   head/sys/netinet/siftr.c
>   head/sys/netinet/tcp_input.c
>   head/sys/netinet/tcp_subr.c
>   head/sys/netinet/tcp_syncache.c
>   head/sys/netinet/tcp_timer.c
>   head/sys/netinet/tcp_usrreq.c
>   head/sys/netinet/udp_usrreq.c
>   head/sys/netinet6/in6_pcb.c
>   head/sys/netinet6/in6_pcb.h
>   head/sys/netinet6/in6_src.c
>   head/sys/netinet6/udp6_usrreq.c
> 
> Modified: head/sys/contrib/pf/net/pf.c
> ==============================================================================
> --- head/sys/contrib/pf/net/pf.c      Mon May 30 09:41:38 2011        
> (r222487)
> +++ head/sys/contrib/pf/net/pf.c      Mon May 30 09:43:55 2011        
> (r222488)
> @@ -3034,16 +3034,14 @@ pf_socket_lookup(int direction, struct p
>  #ifdef INET
>       case AF_INET:
>  #ifdef __FreeBSD__
> -             INP_INFO_RLOCK(pi);     /* XXX LOR */
> -             inp = in_pcblookup_hash(pi, saddr->v4, sport, daddr->v4,
> -                     dport, 0, NULL);
> +             inp = in_pcblookup(pi, saddr->v4, sport, daddr->v4,
> +                     dport, INPLOOKUP_RLOCKPCB, NULL);
>               if (inp == NULL) {
> -                     inp = in_pcblookup_hash(pi, saddr->v4, sport,
> -                        daddr->v4, dport, INPLOOKUP_WILDCARD, NULL);
> -                     if(inp == NULL) {
> -                             INP_INFO_RUNLOCK(pi);
> +                     inp = in_pcblookup(pi, saddr->v4, sport,
> +                        daddr->v4, dport, INPLOOKUP_WILDCARD |
> +                        INPLOOKUP_RLOCKPCB, NULL);
> +                     if (inp == NULL)
>                               return (-1);
> -                     }
>               }
>  #else
>               inp = in_pcbhashlookup(tb, saddr->v4, sport, daddr->v4, dport);
> @@ -3058,16 +3056,14 @@ pf_socket_lookup(int direction, struct p
>  #ifdef INET6
>       case AF_INET6:
>  #ifdef __FreeBSD__
> -             INP_INFO_RLOCK(pi);
> -             inp = in6_pcblookup_hash(pi, &saddr->v6, sport,
> -                     &daddr->v6, dport, 0, NULL);
> +             inp = in6_pcblookup(pi, &saddr->v6, sport,
> +                     &daddr->v6, dport, INPLOOKUP_RLOCKPCB, NULL);
>               if (inp == NULL) {
> -                     inp = in6_pcblookup_hash(pi, &saddr->v6, sport,
> -                     &daddr->v6, dport, INPLOOKUP_WILDCARD, NULL);
> -                     if (inp == NULL) {
> -                             INP_INFO_RUNLOCK(pi);
> +                     inp = in6_pcblookup(pi, &saddr->v6, sport,
> +                         &daddr->v6, dport, INPLOOKUP_WILDCARD |
> +                         INPLOOKUP_RLOCKPCB, NULL);
> +                     if (inp == NULL)
>                               return (-1);
> -                     }
>               }
>  #else
>               inp = in6_pcbhashlookup(tb, &saddr->v6, sport, &daddr->v6,
> @@ -3085,9 +3081,10 @@ pf_socket_lookup(int direction, struct p
>               return (-1);
>       }
>  #ifdef __FreeBSD__
> +     INP_RLOCK_ASSERT(inp);
>       pd->lookup.uid = inp->inp_cred->cr_uid;
>       pd->lookup.gid = inp->inp_cred->cr_groups[0];
> -     INP_INFO_RUNLOCK(pi);
> +     INP_RUNLOCK(inp);
>  #else
>       pd->lookup.uid = inp->inp_socket->so_euid;
>       pd->lookup.gid = inp->inp_socket->so_egid;
> 
> Modified: head/sys/netinet/in_pcb.c
> ==============================================================================
> --- head/sys/netinet/in_pcb.c Mon May 30 09:41:38 2011        (r222487)
> +++ head/sys/netinet/in_pcb.c Mon May 30 09:43:55 2011        (r222488)
> @@ -127,6 +127,10 @@ static VNET_DEFINE(int, ipport_tcplastco
>  
>  #define      V_ipport_tcplastcount           VNET(ipport_tcplastcount)
>  
> +static struct inpcb  *in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo,
> +                         struct in_addr faddr, u_int fport_arg,
> +                         struct in_addr laddr, u_int lport_arg,
> +                         int lookupflags, struct ifnet *ifp);
>  static void  in_pcbremlists(struct inpcb *inp);
>  
>  #ifdef INET
> @@ -212,11 +216,13 @@ in_pcbinfo_init(struct inpcbinfo *pcbinf
>  {
>  
>       INP_INFO_LOCK_INIT(pcbinfo, name);
> +     INP_HASH_LOCK_INIT(pcbinfo, "pcbinfohash");     /* XXXRW: argument? */
>  #ifdef VIMAGE
>       pcbinfo->ipi_vnet = curvnet;
>  #endif
>       pcbinfo->ipi_listhead = listhead;
>       LIST_INIT(pcbinfo->ipi_listhead);
> +     pcbinfo->ipi_count = 0;
>       pcbinfo->ipi_hashbase = hashinit(hash_nelements, M_PCB,
>           &pcbinfo->ipi_hashmask);
>       pcbinfo->ipi_porthashbase = hashinit(porthash_nelements, M_PCB,
> @@ -234,10 +240,14 @@ void
>  in_pcbinfo_destroy(struct inpcbinfo *pcbinfo)
>  {
>  
> +     KASSERT(pcbinfo->ipi_count == 0,
> +         ("%s: ipi_count = %u", __func__, pcbinfo->ipi_count));
> +
>       hashdestroy(pcbinfo->ipi_hashbase, M_PCB, pcbinfo->ipi_hashmask);
>       hashdestroy(pcbinfo->ipi_porthashbase, M_PCB,
>           pcbinfo->ipi_porthashmask);
>       uma_zdestroy(pcbinfo->ipi_zone);
> +     INP_HASH_LOCK_DESTROY(pcbinfo);
>       INP_INFO_LOCK_DESTROY(pcbinfo);
>  }
>  
> @@ -309,8 +319,8 @@ in_pcbbind(struct inpcb *inp, struct soc
>  {
>       int anonport, error;
>  
> -     INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
>       INP_WLOCK_ASSERT(inp);
> +     INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
>  
>       if (inp->inp_lport != 0 || inp->inp_laddr.s_addr != INADDR_ANY)
>               return (EINVAL);
> @@ -351,8 +361,8 @@ in_pcb_lport(struct inpcb *inp, struct i
>        * Because no actual state changes occur here, a global write lock on
>        * the pcbinfo isn't required.
>        */
> -     INP_INFO_LOCK_ASSERT(pcbinfo);
>       INP_LOCK_ASSERT(inp);
> +     INP_HASH_LOCK_ASSERT(pcbinfo);
>  
>       if (inp->inp_flags & INP_HIGHPORT) {
>               first = V_ipport_hifirstauto;   /* sysctl */
> @@ -473,11 +483,10 @@ in_pcbbind_setup(struct inpcb *inp, stru
>       int error;
>  
>       /*
> -      * Because no actual state changes occur here, a global write lock on
> -      * the pcbinfo isn't required.
> +      * No state changes, so read locks are sufficient here.
>        */
> -     INP_INFO_LOCK_ASSERT(pcbinfo);
>       INP_LOCK_ASSERT(inp);
> +     INP_HASH_LOCK_ASSERT(pcbinfo);
>  
>       if (TAILQ_EMPTY(&V_in_ifaddrhead)) /* XXX broken! */
>               return (EADDRNOTAVAIL);
> @@ -618,8 +627,8 @@ in_pcbconnect(struct inpcb *inp, struct 
>       in_addr_t laddr, faddr;
>       int anonport, error;
>  
> -     INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
>       INP_WLOCK_ASSERT(inp);
> +     INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
>  
>       lport = inp->inp_lport;
>       laddr = inp->inp_laddr.s_addr;
> @@ -907,8 +916,8 @@ in_pcbconnect_setup(struct inpcb *inp, s
>        * Because a global state change doesn't actually occur here, a read
>        * lock is sufficient.
>        */
> -     INP_INFO_LOCK_ASSERT(inp->inp_pcbinfo);
>       INP_LOCK_ASSERT(inp);
> +     INP_HASH_LOCK_ASSERT(inp->inp_pcbinfo);
>  
>       if (oinpp != NULL)
>               *oinpp = NULL;
> @@ -983,8 +992,8 @@ in_pcbconnect_setup(struct inpcb *inp, s
>               if (error)
>                       return (error);
>       }
> -     oinp = in_pcblookup_hash(inp->inp_pcbinfo, faddr, fport, laddr, lport,
> -         0, NULL);
> +     oinp = in_pcblookup_hash_locked(inp->inp_pcbinfo, faddr, fport,
> +         laddr, lport, 0, NULL);
>       if (oinp != NULL) {
>               if (oinpp != NULL)
>                       *oinpp = oinp;
> @@ -1007,8 +1016,8 @@ void
>  in_pcbdisconnect(struct inpcb *inp)
>  {
>  
> -     INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
>       INP_WLOCK_ASSERT(inp);
> +     INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
>  
>       inp->inp_faddr.s_addr = INADDR_ANY;
>       inp->inp_fport = 0;
> @@ -1187,19 +1196,24 @@ void
>  in_pcbdrop(struct inpcb *inp)
>  {
>  
> -     INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
>       INP_WLOCK_ASSERT(inp);
>  
> +     /*
> +      * XXXRW: Possibly we should protect the setting of INP_DROPPED with
> +      * the hash lock...?
> +      */
>       inp->inp_flags |= INP_DROPPED;
>       if (inp->inp_flags & INP_INHASHLIST) {
>               struct inpcbport *phd = inp->inp_phd;
>  
> +             INP_HASH_WLOCK(inp->inp_pcbinfo);
>               LIST_REMOVE(inp, inp_hash);
>               LIST_REMOVE(inp, inp_portlist);
>               if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
>                       LIST_REMOVE(phd, phd_hash);
>                       free(phd, M_PCB);
>               }
> +             INP_HASH_WUNLOCK(inp->inp_pcbinfo);
>               inp->inp_flags &= ~INP_INHASHLIST;
>       }
>  }
> @@ -1328,7 +1342,8 @@ in_pcbpurgeif0(struct inpcbinfo *pcbinfo
>  }
>  
>  /*
> - * Lookup a PCB based on the local address and port.
> + * Lookup a PCB based on the local address and port.  Caller must hold the
> + * hash lock.  No inpcb locks or references are acquired.
>   */
>  #define INP_LOOKUP_MAPPED_PCB_COST   3
>  struct inpcb *
> @@ -1346,7 +1361,7 @@ in_pcblookup_local(struct inpcbinfo *pcb
>       KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0,
>           ("%s: invalid lookup flags %d", __func__, lookupflags));
>  
> -     INP_INFO_LOCK_ASSERT(pcbinfo);
> +     INP_HASH_LOCK_ASSERT(pcbinfo);
>  
>       if ((lookupflags & INPLOOKUP_WILDCARD) == 0) {
>               struct inpcbhead *head;
> @@ -1450,10 +1465,12 @@ in_pcblookup_local(struct inpcbinfo *pcb
>  #undef INP_LOOKUP_MAPPED_PCB_COST
>  
>  /*
> - * Lookup PCB in hash list.
> + * Lookup PCB in hash list, using pcbinfo tables.  This variation assumes
> + * that the caller has locked the hash list, and will not perform any further
> + * locking or reference operations on either the hash list or the connection.
>   */
> -struct inpcb *
> -in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr,
> +static struct inpcb *
> +in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo, struct in_addr faddr,
>      u_int fport_arg, struct in_addr laddr, u_int lport_arg, int lookupflags,
>      struct ifnet *ifp)
>  {
> @@ -1464,7 +1481,7 @@ in_pcblookup_hash(struct inpcbinfo *pcbi
>       KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0,
>           ("%s: invalid lookup flags %d", __func__, lookupflags));
>  
> -     INP_INFO_LOCK_ASSERT(pcbinfo);
> +     INP_HASH_LOCK_ASSERT(pcbinfo);
>  
>       /*
>        * First look for an exact match.
> @@ -1574,6 +1591,56 @@ in_pcblookup_hash(struct inpcbinfo *pcbi
>  
>       return (NULL);
>  }
> +
> +/*
> + * Lookup PCB in hash list, using pcbinfo tables.  This variation locks the
> + * hash list lock, and will return the inpcb locked (i.e., requires
> + * INPLOOKUP_LOCKPCB).
> + */
> +static struct inpcb *
> +in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr,
> +    u_int fport, struct in_addr laddr, u_int lport, int lookupflags,
> +    struct ifnet *ifp)
> +{
> +     struct inpcb *inp;
> +
> +     INP_HASH_RLOCK(pcbinfo);
> +     inp = in_pcblookup_hash_locked(pcbinfo, faddr, fport, laddr, lport,
> +         (lookupflags & ~(INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)), ifp);
> +     if (inp != NULL) {
> +             in_pcbref(inp);
> +             INP_HASH_RUNLOCK(pcbinfo);
> +             if (lookupflags & INPLOOKUP_WLOCKPCB) {
> +                     INP_WLOCK(inp);
> +                     if (in_pcbrele_wlocked(inp))
> +                             return (NULL);
> +             } else if (lookupflags & INPLOOKUP_RLOCKPCB) {
> +                     INP_RLOCK(inp);
> +                     if (in_pcbrele_rlocked(inp))
> +                             return (NULL);
> +             } else
> +                     panic("%s: locking bug", __func__);
> +     } else
> +             INP_HASH_RUNLOCK(pcbinfo);
> +     return (inp);
> +}
> +
> +/*
> + * Public inpcb lookup routines, accepting a 4-tuple.
> + */
> +struct inpcb *
> +in_pcblookup(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport,
> +    struct in_addr laddr, u_int lport, int lookupflags, struct ifnet *ifp)
> +{
> +
> +     KASSERT((lookupflags & ~INPLOOKUP_MASK) == 0,
> +         ("%s: invalid lookup flags %d", __func__, lookupflags));
> +     KASSERT((lookupflags & (INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)) != 0,
> +         ("%s: LOCKPCB not set", __func__));
> +
> +     return (in_pcblookup_hash(pcbinfo, faddr, fport, laddr, lport,
> +         lookupflags, ifp));
> +}
>  #endif /* INET */
>  
>  /*
> @@ -1588,8 +1655,9 @@ in_pcbinshash(struct inpcb *inp)
>       struct inpcbport *phd;
>       u_int32_t hashkey_faddr;
>  
> -     INP_INFO_WLOCK_ASSERT(pcbinfo);
>       INP_WLOCK_ASSERT(inp);
> +     INP_HASH_WLOCK_ASSERT(pcbinfo);
> +
>       KASSERT((inp->inp_flags & INP_INHASHLIST) == 0,
>           ("in_pcbinshash: INP_INHASHLIST"));
>  
> @@ -1645,8 +1713,9 @@ in_pcbrehash(struct inpcb *inp)
>       struct inpcbhead *head;
>       u_int32_t hashkey_faddr;
>  
> -     INP_INFO_WLOCK_ASSERT(pcbinfo);
>       INP_WLOCK_ASSERT(inp);
> +     INP_HASH_WLOCK_ASSERT(pcbinfo);
> +
>       KASSERT(inp->inp_flags & INP_INHASHLIST,
>           ("in_pcbrehash: !INP_INHASHLIST"));
>  
> @@ -1679,12 +1748,14 @@ in_pcbremlists(struct inpcb *inp)
>       if (inp->inp_flags & INP_INHASHLIST) {
>               struct inpcbport *phd = inp->inp_phd;
>  
> +             INP_HASH_WLOCK(pcbinfo);
>               LIST_REMOVE(inp, inp_hash);
>               LIST_REMOVE(inp, inp_portlist);
>               if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
>                       LIST_REMOVE(phd, phd_hash);
>                       free(phd, M_PCB);
>               }
> +             INP_HASH_WUNLOCK(pcbinfo);
>               inp->inp_flags &= ~INP_INHASHLIST;
>       }
>       LIST_REMOVE(inp, inp_list);
> 
> Modified: head/sys/netinet/in_pcb.h
> ==============================================================================
> --- head/sys/netinet/in_pcb.h Mon May 30 09:41:38 2011        (r222487)
> +++ head/sys/netinet/in_pcb.h Mon May 30 09:43:55 2011        (r222488)
> @@ -268,22 +268,22 @@ struct inpcbport {
>   * Global data structure for each high-level protocol (UDP, TCP, ...) in both
>   * IPv4 and IPv6.  Holds inpcb lists and information for managing them.
>   *
> - * Each pcbinfo is protected by ipi_lock, covering mutable global fields 
> (such
> - * as the global pcb list) and hashed lookup tables.  The lock order is:
> + * Each pcbinfo is protected by two locks: ipi_lock and ipi_hash_lock,
> + * the former covering mutable global fields (such as the global pcb list),
> + * and the latter covering the hashed lookup tables.  The lock order is:
>   *
> - *    ipi_lock (before) inpcb locks
> + *    ipi_lock (before) inpcb locks (before) ipi_hash_lock
>   *
>   * Locking key:
>   *
>   * (c) Constant or nearly constant after initialisation
>   * (g) Locked by ipi_lock
> - * (h) Read using either ipi_lock or inpcb lock; write requires both.
> + * (h) Read using either ipi_hash_lock or inpcb lock; write requires both.
>   * (x) Synchronisation properties poorly defined
>   */
>  struct inpcbinfo {
>       /*
> -      * Global lock protecting global inpcb list, inpcb count, hash tables,
> -      * etc.
> +      * Global lock protecting global inpcb list, inpcb count, etc.
>        */
>       struct rwlock            ipi_lock;
>  
> @@ -312,17 +312,22 @@ struct inpcbinfo {
>       struct  uma_zone        *ipi_zone;              /* (c) */
>  
>       /*
> +      * Global lock protecting hash lookup tables.
> +      */
> +     struct rwlock            ipi_hash_lock;
> +
> +     /*
>        * Global hash of inpcbs, hashed by local and foreign addresses and
>        * port numbers.
>        */
> -     struct inpcbhead        *ipi_hashbase;          /* (g) */
> -     u_long                   ipi_hashmask;          /* (g) */
> +     struct inpcbhead        *ipi_hashbase;          /* (h) */
> +     u_long                   ipi_hashmask;          /* (h) */
>  
>       /*
>        * Global hash of inpcbs, hashed by only local port number.
>        */
> -     struct inpcbporthead    *ipi_porthashbase;      /* (g) */
> -     u_long                   ipi_porthashmask;      /* (g) */
> +     struct inpcbporthead    *ipi_porthashbase;      /* (h) */
> +     u_long                   ipi_porthashmask;      /* (h) */
>  
>       /*
>        * Pointer to network stack instance
> @@ -406,6 +411,18 @@ void     inp_4tuple_get(struct inpcb *inp, 
>  #define INP_INFO_WLOCK_ASSERT(ipi)   rw_assert(&(ipi)->ipi_lock, RA_WLOCKED)
>  #define INP_INFO_UNLOCK_ASSERT(ipi)  rw_assert(&(ipi)->ipi_lock, RA_UNLOCKED)
>  
> +#define      INP_HASH_LOCK_INIT(ipi, d) \
> +     rw_init_flags(&(ipi)->ipi_hash_lock, (d), 0)
> +#define      INP_HASH_LOCK_DESTROY(ipi)      
> rw_destroy(&(ipi)->ipi_hash_lock)
> +#define      INP_HASH_RLOCK(ipi)             rw_rlock(&(ipi)->ipi_hash_lock)
> +#define      INP_HASH_WLOCK(ipi)             rw_wlock(&(ipi)->ipi_hash_lock)
> +#define      INP_HASH_RUNLOCK(ipi)           
> rw_runlock(&(ipi)->ipi_hash_lock)
> +#define      INP_HASH_WUNLOCK(ipi)           
> rw_wunlock(&(ipi)->ipi_hash_lock)
> +#define      INP_HASH_LOCK_ASSERT(ipi)       
> rw_assert(&(ipi)->ipi_hash_lock, \
> +                                         RA_LOCKED)
> +#define      INP_HASH_WLOCK_ASSERT(ipi)      
> rw_assert(&(ipi)->ipi_hash_lock, \
> +                                         RA_WLOCKED)
> +
>  #define INP_PCBHASH(faddr, lport, fport, mask) \
>       (((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport))) & (mask))
>  #define INP_PCBPORTHASH(lport, mask) \
> @@ -466,7 +483,16 @@ void     inp_4tuple_get(struct inpcb *inp, 
>  #define      INP_LLE_VALID           0x00000001 /* cached lle is valid */    
>  #define      INP_RT_VALID            0x00000002 /* cached rtentry is valid */
>  
> -#define      INPLOOKUP_WILDCARD      1
> +/*
> + * Flags passed to in_pcblookup*() functions.
> + */
> +#define      INPLOOKUP_WILDCARD      0x00000001      /* Allow wildcard 
> sockets. */
> +#define      INPLOOKUP_RLOCKPCB      0x00000002      /* Return inpcb 
> read-locked. */
> +#define      INPLOOKUP_WLOCKPCB      0x00000004      /* Return inpcb 
> write-locked. */
> +
> +#define      INPLOOKUP_MASK  (INPLOOKUP_WILDCARD | INPLOOKUP_RLOCKPCB | \
> +                         INPLOOKUP_WLOCKPCB)
> +
>  #define      sotoinpcb(so)   ((struct inpcb *)(so)->so_pcb)
>  #define      sotoin6pcb(so)  sotoinpcb(so) /* for KAME src sync over BSD*'s 
> */
>  
> @@ -527,7 +553,7 @@ struct inpcb *
>       in_pcblookup_local(struct inpcbinfo *,
>           struct in_addr, u_short, int, struct ucred *);
>  struct inpcb *
> -     in_pcblookup_hash(struct inpcbinfo *, struct in_addr, u_int,
> +     in_pcblookup(struct inpcbinfo *, struct in_addr, u_int,
>           struct in_addr, u_int, int, struct ifnet *);
>  void in_pcbnotifyall(struct inpcbinfo *pcbinfo, struct in_addr,
>           int, struct inpcb *(*)(struct inpcb *, int));
> 
> Modified: head/sys/netinet/ip_divert.c
> ==============================================================================
> --- head/sys/netinet/ip_divert.c      Mon May 30 09:41:38 2011        
> (r222487)
> +++ head/sys/netinet/ip_divert.c      Mon May 30 09:43:55 2011        
> (r222488)
> @@ -659,9 +659,9 @@ div_pcblist(SYSCTL_HANDLER_ARGS)
>       INP_INFO_WLOCK(&V_divcbinfo);
>       for (i = 0; i < n; i++) {
>               inp = inp_list[i];
> -             INP_WLOCK(inp);
> -             if (!in_pcbrele(inp))
> -                     INP_WUNLOCK(inp);
> +             INP_RLOCK(inp);
> +             if (!in_pcbrele_rlocked(inp))
> +                     INP_RUNLOCK(inp);
>       }
>       INP_INFO_WUNLOCK(&V_divcbinfo);
>  
> 
> Modified: head/sys/netinet/ipfw/ip_fw2.c
> ==============================================================================
> --- head/sys/netinet/ipfw/ip_fw2.c    Mon May 30 09:41:38 2011        
> (r222487)
> +++ head/sys/netinet/ipfw/ip_fw2.c    Mon May 30 09:43:55 2011        
> (r222488)
> @@ -657,7 +657,7 @@ check_uidgid(ipfw_insn_u32 *insn, int pr
>           (struct bsd_ucred *)uc, ugid_lookupp, ((struct mbuf *)inp)->m_skb);
>  #else  /* FreeBSD */
>       struct inpcbinfo *pi;
> -     int wildcard;
> +     int lookupflags;
>       struct inpcb *pcb;
>       int match;
>  
> @@ -682,30 +682,31 @@ check_uidgid(ipfw_insn_u32 *insn, int pr
>       if (*ugid_lookupp == -1)
>               return (0);
>       if (proto == IPPROTO_TCP) {
> -             wildcard = 0;
> +             lookupflags = 0;
>               pi = &V_tcbinfo;
>       } else if (proto == IPPROTO_UDP) {
> -             wildcard = INPLOOKUP_WILDCARD;
> +             lookupflags = INPLOOKUP_WILDCARD;
>               pi = &V_udbinfo;
>       } else
>               return 0;
> +     lookupflags |= INPLOOKUP_RLOCKPCB;
>       match = 0;
>       if (*ugid_lookupp == 0) {
> -             INP_INFO_RLOCK(pi);
>               pcb =  (oif) ?
> -                     in_pcblookup_hash(pi,
> +                     in_pcblookup(pi,
>                               dst_ip, htons(dst_port),
>                               src_ip, htons(src_port),
> -                             wildcard, oif) :
> -                     in_pcblookup_hash(pi,
> +                             lookupflags, oif) :
> +                     in_pcblookup(pi,
>                               src_ip, htons(src_port),
>                               dst_ip, htons(dst_port),
> -                             wildcard, NULL);
> +                             lookupflags, NULL);
>               if (pcb != NULL) {
> +                     INP_RLOCK_ASSERT(pcb);
>                       *uc = crhold(pcb->inp_cred);
>                       *ugid_lookupp = 1;
> +                     INP_RUNLOCK(pcb);
>               }
> -             INP_INFO_RUNLOCK(pi);
>               if (*ugid_lookupp == 0) {
>                       /*
>                        * We tried and failed, set the variable to -1
> @@ -1827,21 +1828,32 @@ do {                                                  
>         \
>                               else
>                                       break;
>  
> +                             /*
> +                              * XXXRW: so_user_cookie should almost
> +                              * certainly be inp_user_cookie?
> +                              */
> +
>                               /* For incomming packet, lookup up the 
>                               inpcb using the src/dest ip/port tuple */
>                               if (inp == NULL) {
> -                                     INP_INFO_RLOCK(pi);
> -                                     inp = in_pcblookup_hash(pi, 
> +                                     inp = in_pcblookup(pi, 
>                                               src_ip, htons(src_port),
>                                               dst_ip, htons(dst_port),
> -                                             0, NULL);
> -                                     INP_INFO_RUNLOCK(pi);
> -                             }
> -                             
> -                             if (inp && inp->inp_socket) {
> -                                     tablearg = 
> inp->inp_socket->so_user_cookie;
> -                                     if (tablearg)
> -                                             match = 1;
> +                                             INPLOOKUP_RLOCKPCB, NULL);
> +                                     if (inp != NULL) {
> +                                             tablearg =
> +                                                 
> inp->inp_socket->so_user_cookie;
> +                                             if (tablearg)
> +                                                     match = 1;
> +                                             INP_RUNLOCK(inp);
> +                                     }
> +                             } else {
> +                                     if (inp->inp_socket) {
> +                                             tablearg =
> +                                                 
> inp->inp_socket->so_user_cookie;
> +                                             if (tablearg)
> +                                                     match = 1;
> +                                     }
>                               }
>                               break;
>                       }
> 
> Modified: head/sys/netinet/raw_ip.c
> ==============================================================================
> --- head/sys/netinet/raw_ip.c Mon May 30 09:41:38 2011        (r222487)
> +++ head/sys/netinet/raw_ip.c Mon May 30 09:43:55 2011        (r222488)
> @@ -226,7 +226,7 @@ rip_append(struct inpcb *last, struct ip
>  {
>       int policyfail = 0;
>  
> -     INP_RLOCK_ASSERT(last);
> +     INP_LOCK_ASSERT(last);
>  
>  #ifdef IPSEC
>       /* check AH/ESP integrity. */
> @@ -834,16 +834,19 @@ rip_detach(struct socket *so)
>  static void
>  rip_dodisconnect(struct socket *so, struct inpcb *inp)
>  {
> +     struct inpcbinfo *pcbinfo;
>  
> -     INP_INFO_WLOCK_ASSERT(inp->inp_pcbinfo);
> -     INP_WLOCK_ASSERT(inp);
> -
> +     pcbinfo = inp->inp_pcbinfo;
> +     INP_INFO_WLOCK(pcbinfo);
> +     INP_WLOCK(inp);
>       rip_delhash(inp);
>       inp->inp_faddr.s_addr = INADDR_ANY;
>       rip_inshash(inp);
>       SOCK_LOCK(so);
>       so->so_state &= ~SS_ISCONNECTED;
>       SOCK_UNLOCK(so);
> +     INP_WUNLOCK(inp);
> +     INP_INFO_WUNLOCK(pcbinfo);
>  }
>  
>  static void
> @@ -854,11 +857,7 @@ rip_abort(struct socket *so)
>       inp = sotoinpcb(so);
>       KASSERT(inp != NULL, ("rip_abort: inp == NULL"));
>  
> -     INP_INFO_WLOCK(&V_ripcbinfo);
> -     INP_WLOCK(inp);
>       rip_dodisconnect(so, inp);
> -     INP_WUNLOCK(inp);
> -     INP_INFO_WUNLOCK(&V_ripcbinfo);
>  }
>  
>  static void
> @@ -869,11 +868,7 @@ rip_close(struct socket *so)
>       inp = sotoinpcb(so);
>       KASSERT(inp != NULL, ("rip_close: inp == NULL"));
>  
> -     INP_INFO_WLOCK(&V_ripcbinfo);
> -     INP_WLOCK(inp);
>       rip_dodisconnect(so, inp);
> -     INP_WUNLOCK(inp);
> -     INP_INFO_WUNLOCK(&V_ripcbinfo);
>  }
>  
>  static int
> @@ -887,11 +882,7 @@ rip_disconnect(struct socket *so)
>       inp = sotoinpcb(so);
>       KASSERT(inp != NULL, ("rip_disconnect: inp == NULL"));
>  
> -     INP_INFO_WLOCK(&V_ripcbinfo);
> -     INP_WLOCK(inp);
>       rip_dodisconnect(so, inp);
> -     INP_WUNLOCK(inp);
> -     INP_INFO_WUNLOCK(&V_ripcbinfo);
>       return (0);
>  }
>  
> @@ -1077,9 +1068,9 @@ rip_pcblist(SYSCTL_HANDLER_ARGS)
>       INP_INFO_WLOCK(&V_ripcbinfo);
>       for (i = 0; i < n; i++) {
>               inp = inp_list[i];
> -             INP_WLOCK(inp);
> -             if (!in_pcbrele(inp))
> -                     INP_WUNLOCK(inp);
> +             INP_RLOCK(inp);
> +             if (!in_pcbrele_rlocked(inp))
> +                     INP_RUNLOCK(inp);
>       }
>       INP_INFO_WUNLOCK(&V_ripcbinfo);
>  
> 
> Modified: head/sys/netinet/siftr.c
> ==============================================================================
> --- head/sys/netinet/siftr.c  Mon May 30 09:41:38 2011        (r222487)
> +++ head/sys/netinet/siftr.c  Mon May 30 09:43:55 2011        (r222488)
> @@ -696,17 +696,16 @@ siftr_findinpcb(int ipver, struct ip *ip
>  
>       /* We need the tcbinfo lock. */
>       INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
> -     INP_INFO_RLOCK(&V_tcbinfo);
>  
>       if (dir == PFIL_IN)
>               inp = (ipver == INP_IPV4 ?
> -                 in_pcblookup_hash(&V_tcbinfo, ip->ip_src, sport, ip->ip_dst,
> -                 dport, 0, m->m_pkthdr.rcvif)
> +                 in_pcblookup(&V_tcbinfo, ip->ip_src, sport, ip->ip_dst,
> +                 dport, INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif)
>                   :
>  #ifdef SIFTR_IPV6
> -                 in6_pcblookup_hash(&V_tcbinfo,
> +                 in6_pcblookup(&V_tcbinfo,
>                   &((struct ip6_hdr *)ip)->ip6_src, sport,
> -                 &((struct ip6_hdr *)ip)->ip6_dst, dport, 0,
> +                 &((struct ip6_hdr *)ip)->ip6_dst, dport, INPLOOKUP_RLOCKPCB,
>                   m->m_pkthdr.rcvif)
>  #else
>                   NULL
> @@ -715,13 +714,13 @@ siftr_findinpcb(int ipver, struct ip *ip
>  
>       else
>               inp = (ipver == INP_IPV4 ?
> -                 in_pcblookup_hash(&V_tcbinfo, ip->ip_dst, dport, ip->ip_src,
> -                 sport, 0, m->m_pkthdr.rcvif)
> +                 in_pcblookup(&V_tcbinfo, ip->ip_dst, dport, ip->ip_src,
> +                 sport, INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif)
>                   :
>  #ifdef SIFTR_IPV6
> -                 in6_pcblookup_hash(&V_tcbinfo,
> +                 in6_pcblookup(&V_tcbinfo,
>                   &((struct ip6_hdr *)ip)->ip6_dst, dport,
> -                 &((struct ip6_hdr *)ip)->ip6_src, sport, 0,
> +                 &((struct ip6_hdr *)ip)->ip6_src, sport, INPLOOKUP_RLOCKPCB,
>                   m->m_pkthdr.rcvif)
>  #else
>                   NULL
> @@ -734,12 +733,7 @@ siftr_findinpcb(int ipver, struct ip *ip
>                       ss->nskip_in_inpcb++;
>               else
>                       ss->nskip_out_inpcb++;
> -     } else {
> -             /* Acquire the inpcb lock. */
> -             INP_UNLOCK_ASSERT(inp);
> -             INP_RLOCK(inp);
>       }
> -     INP_INFO_RUNLOCK(&V_tcbinfo);
>  
>       return (inp);
>  }
> 
> Modified: head/sys/netinet/tcp_input.c
> ==============================================================================
> --- head/sys/netinet/tcp_input.c      Mon May 30 09:41:38 2011        
> (r222487)
> +++ head/sys/netinet/tcp_input.c      Mon May 30 09:43:55 2011        
> (r222488)
> @@ -5,6 +5,7 @@
>   *   Swinburne University of Technology, Melbourne, Australia.
>   * Copyright (c) 2009-2010 Lawrence Stewart <lstew...@freebsd.org>
>   * Copyright (c) 2010 The FreeBSD Foundation
> + * Copyright (c) 2010-2011 Juniper Networks, Inc.
>   * All rights reserved.
>   *
>   * Portions of this software were developed at the Centre for Advanced 
> Internet
> @@ -16,6 +17,9 @@
>   * Internet Architectures, Swinburne University of Technology, Melbourne,
>   * Australia by David Hayes under sponsorship from the FreeBSD Foundation.
>   *
> + * Portions of this software were developed by Robert N. M. Watson under
> + * contract to Juniper Networks, Inc.
> + *
>   * Redistribution and use in source and binary forms, with or without
>   * modification, are permitted provided that the following conditions
>   * are met:
> @@ -197,10 +201,6 @@ SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO,
>      &VNET_NAME(tcp_autorcvbuf_max), 0,
>      "Max size of automatic receive buffer");
>  
> -int  tcp_read_locking = 1;
> -SYSCTL_INT(_net_inet_tcp, OID_AUTO, read_locking, CTLFLAG_RW,
> -    &tcp_read_locking, 0, "Enable read locking strategy");
> -
>  VNET_DEFINE(struct inpcbhead, tcb);
>  #define      tcb6    tcb  /* for KAME src sync over BSD*'s */
>  VNET_DEFINE(struct inpcbinfo, tcbinfo);
> @@ -591,8 +591,7 @@ tcp_input(struct mbuf *m, int off0)
>       char *s = NULL;                 /* address and port logging */
>       int ti_locked;
>  #define      TI_UNLOCKED     1
> -#define      TI_RLOCKED      2
> -#define      TI_WLOCKED      3
> +#define      TI_WLOCKED      2
>  
>  #ifdef TCPDEBUG
>       /*
> @@ -756,30 +755,25 @@ tcp_input(struct mbuf *m, int off0)
>       drop_hdrlen = off0 + off;
>  
>       /*
> -      * Locate pcb for segment, which requires a lock on tcbinfo.
> -      * Optimisticaly acquire a global read lock rather than a write lock
> -      * unless header flags necessarily imply a state change.  There are
> -      * two cases where we might discover later we need a write lock
> -      * despite the flags: ACKs moving a connection out of the syncache,
> -      * and ACKs for a connection in TIMEWAIT.
> +      * Locate pcb for segment; if we're likely to add or remove a
> +      * connection then first acquire pcbinfo lock.  There are two cases
> +      * where we might discover later we need a write lock despite the
> +      * flags: ACKs moving a connection out of the syncache, and ACKs for
> +      * a connection in TIMEWAIT.
>        */
> -     if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0 ||
> -         tcp_read_locking == 0) {
> +     if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0) {
>               INP_INFO_WLOCK(&V_tcbinfo);
>               ti_locked = TI_WLOCKED;
> -     } else {
> -             INP_INFO_RLOCK(&V_tcbinfo);
> -             ti_locked = TI_RLOCKED;
> -     }
> +     } else
> +             ti_locked = TI_UNLOCKED;
>  
>  findpcb:
>  #ifdef INVARIANTS
> -     if (ti_locked == TI_RLOCKED)
> -             INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
> -     else if (ti_locked == TI_WLOCKED)
> +     if (ti_locked == TI_WLOCKED) {
>               INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
> -     else
> -             panic("%s: findpcb ti_locked %d\n", __func__, ti_locked);
> +     } else {
> +             INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
> +     }
>  #endif
>  
>  #ifdef INET
> @@ -797,20 +791,18 @@ findpcb:
>                * Transparently forwarded. Pretend to be the destination.
>                * already got one like this?
>                */
> -             inp = in_pcblookup_hash(&V_tcbinfo,
> -                                     ip->ip_src, th->th_sport,
> -                                     ip->ip_dst, th->th_dport,
> -                                     0, m->m_pkthdr.rcvif);
> +             inp = in_pcblookup(&V_tcbinfo, ip->ip_src, th->th_sport,
> +                 ip->ip_dst, th->th_dport, INPLOOKUP_WLOCKPCB,
> +                 m->m_pkthdr.rcvif);
>               if (!inp) {
> -                     /* It's new.  Try to find the ambushing socket. */
> -                     inp = in_pcblookup_hash(&V_tcbinfo,
> -                                             ip->ip_src, th->th_sport,
> -                                             next_hop->sin_addr,
> -                                             next_hop->sin_port ?
> -                                                 ntohs(next_hop->sin_port) :
> -                                                 th->th_dport,
> -                                             INPLOOKUP_WILDCARD,
> -                                             m->m_pkthdr.rcvif);
> +                     /*
> +                      * It's new.  Try to find the ambushing socket.
> +                      */
> +                     inp = in_pcblookup(&V_tcbinfo, ip->ip_src,
> +                         th->th_sport, next_hop->sin_addr,
> +                         next_hop->sin_port ? ntohs(next_hop->sin_port) :
> +                         th->th_dport, INPLOOKUP_WILDCARD |
> +                         INPLOOKUP_WLOCKPCB, m->m_pkthdr.rcvif);
>               }
>               /* Remove the tag from the packet.  We don't need it anymore. */
>               m_tag_delete(m, fwd_tag);
> @@ -820,21 +812,19 @@ findpcb:
>       {
>  #ifdef INET6
>               if (isipv6)
> -                     inp = in6_pcblookup_hash(&V_tcbinfo,
> -                                              &ip6->ip6_src, th->th_sport,
> -                                              &ip6->ip6_dst, th->th_dport,
> -                                              INPLOOKUP_WILDCARD,
> -                                              m->m_pkthdr.rcvif);
> +                     inp = in6_pcblookup(&V_tcbinfo, &ip6->ip6_src,
> +                         th->th_sport, &ip6->ip6_dst, th->th_dport,
> +                         INPLOOKUP_WILDCARD | INPLOOKUP_WLOCKPCB,
> +                         m->m_pkthdr.rcvif);
>  #endif
>  #if defined(INET) && defined(INET6)
>               else
>  #endif
>  #ifdef INET
> -                     inp = in_pcblookup_hash(&V_tcbinfo,
> -                                             ip->ip_src, th->th_sport,
> -                                             ip->ip_dst, th->th_dport,
> -                                             INPLOOKUP_WILDCARD,
> -                                             m->m_pkthdr.rcvif);
> +                     inp = in_pcblookup(&V_tcbinfo, ip->ip_src,
> +                         th->th_sport, ip->ip_dst, th->th_dport,
> +                         INPLOOKUP_WILDCARD | INPLOOKUP_WLOCKPCB,
> +                         m->m_pkthdr.rcvif);
>  #endif
>       }
>  
> @@ -865,7 +855,7 @@ findpcb:
>               rstreason = BANDLIM_RST_CLOSEDPORT;
>               goto dropwithreset;
>       }
> -     INP_WLOCK(inp);
> +     INP_WLOCK_ASSERT(inp);
>       if (!(inp->inp_flags & INP_HW_FLOWID)
>           && (m->m_flags & M_FLOWID)
>           && ((inp->inp_socket == NULL)
> @@ -906,28 +896,26 @@ findpcb:
>        * legitimate new connection attempt the old INPCB gets removed and
>        * we can try again to find a listening socket.
>        *
> -      * At this point, due to earlier optimism, we may hold a read lock on
> -      * the inpcbinfo, rather than a write lock.  If so, we need to
> -      * upgrade, or if that fails, acquire a reference on the inpcb, drop
> -      * all locks, acquire a global write lock, and then re-acquire the
> -      * inpcb lock.  We may at that point discover that another thread has
> -      * tried to free the inpcb, in which case we need to loop back and
> -      * try to find a new inpcb to deliver to.
> +      * At this point, due to earlier optimism, we may hold only an inpcb
> +      * lock, and not the inpcbinfo write lock.  If so, we need to try to
> +      * acquire it, or if that fails, acquire a reference on the inpcb,
> +      * drop all locks, acquire a global write lock, and then re-acquire
> +      * the inpcb lock.  We may at that point discover that another thread
> +      * has tried to free the inpcb, in which case we need to loop back
> +      * and try to find a new inpcb to deliver to.
> +      *
> +      * XXXRW: It may be time to rethink timewait locking.
>        */
>  relocked:
>       if (inp->inp_flags & INP_TIMEWAIT) {
> -             KASSERT(ti_locked == TI_RLOCKED || ti_locked == TI_WLOCKED,
> -                 ("%s: INP_TIMEWAIT ti_locked %d", __func__, ti_locked));
> -
> -             if (ti_locked == TI_RLOCKED) {
> -                     if (INP_INFO_TRY_UPGRADE(&V_tcbinfo) == 0) {
> +             if (ti_locked == TI_UNLOCKED) {
> +                     if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) {
>                               in_pcbref(inp);
>                               INP_WUNLOCK(inp);
> -                             INP_INFO_RUNLOCK(&V_tcbinfo);
>                               INP_INFO_WLOCK(&V_tcbinfo);
>                               ti_locked = TI_WLOCKED;
>                               INP_WLOCK(inp);
> -                             if (in_pcbrele(inp)) {
> +                             if (in_pcbrele_wlocked(inp)) {
>                                       inp = NULL;
>                                       goto findpcb;
>                               }
> @@ -975,26 +963,24 @@ relocked:
>  
>       /*
>        * We've identified a valid inpcb, but it could be that we need an
> -      * inpcbinfo write lock and have only a read lock.  In this case,
> -      * attempt to upgrade/relock using the same strategy as the TIMEWAIT
> -      * case above.  If we relock, we have to jump back to 'relocked' as
> -      * the connection might now be in TIMEWAIT.
> -      */
> -     if (tp->t_state != TCPS_ESTABLISHED ||
> -         (thflags & (TH_SYN | TH_FIN | TH_RST)) != 0 ||
> -         tcp_read_locking == 0) {
> -             KASSERT(ti_locked == TI_RLOCKED || ti_locked == TI_WLOCKED,
> -                 ("%s: upgrade check ti_locked %d", __func__, ti_locked));
> -
> -             if (ti_locked == TI_RLOCKED) {
> -                     if (INP_INFO_TRY_UPGRADE(&V_tcbinfo) == 0) {
> +      * inpcbinfo write lock but don't hold it.  In this case, attempt to
> +      * acquire using the same strategy as the TIMEWAIT case above.  If we
> +      * relock, we have to jump back to 'relocked' as the connection might
> +      * now be in TIMEWAIT.
> +      */
> +#ifdef INVARIANTS
> +     if ((thflags & (TH_SYN | TH_FIN | TH_RST)) != 0)
> +             INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
> +#endif
> +     if (tp->t_state != TCPS_ESTABLISHED) {
> +             if (ti_locked == TI_UNLOCKED) {
> +                     if (INP_INFO_TRY_WLOCK(&V_tcbinfo) == 0) {
>                               in_pcbref(inp);
>                               INP_WUNLOCK(inp);
> -                             INP_INFO_RUNLOCK(&V_tcbinfo);
>                               INP_INFO_WLOCK(&V_tcbinfo);
>                               ti_locked = TI_WLOCKED;
>                               INP_WLOCK(inp);
> -                             if (in_pcbrele(inp)) {
> +                             if (in_pcbrele_wlocked(inp)) {
>                                       inp = NULL;
>                                       goto findpcb;
>                               }
> @@ -1027,13 +1013,16 @@ relocked:
>       /*
>        * When the socket is accepting connections (the INPCB is in LISTEN
>        * state) we look into the SYN cache if this is a new connection
> -      * attempt or the completion of a previous one.
> +      * attempt or the completion of a previous one.  Because listen
> +      * sockets are never in TCPS_ESTABLISHED, the V_tcbinfo lock will be
> +      * held in this case.
>        */
>       if (so->so_options & SO_ACCEPTCONN) {
>               struct in_conninfo inc;
>  
>               KASSERT(tp->t_state == TCPS_LISTEN, ("%s: so accepting but "
>                   "tp not listening", __func__));
> +             INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>  
>               bzero(&inc, sizeof(inc));
>  #ifdef INET6
> @@ -1371,13 +1360,17 @@ relocked:
>       return;
>  
>  dropwithreset:
> -     if (ti_locked == TI_RLOCKED)
> -             INP_INFO_RUNLOCK(&V_tcbinfo);
> -     else if (ti_locked == TI_WLOCKED)
> +     if (ti_locked == TI_WLOCKED) {
>               INP_INFO_WUNLOCK(&V_tcbinfo);
> -     else
> -             panic("%s: dropwithreset ti_locked %d", __func__, ti_locked);
> -     ti_locked = TI_UNLOCKED;
> +             ti_locked = TI_UNLOCKED;
> +     }
> +#ifdef INVARIANTS
> +     else {
> +             KASSERT(ti_locked == TI_UNLOCKED, ("%s: dropwithreset "
> +                 "ti_locked: %d", __func__, ti_locked));
> +             INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
> +     }
> +#endif
>  
>       if (inp != NULL) {
>               tcp_dropwithreset(m, th, tp, tlen, rstreason);
> @@ -1388,13 +1381,17 @@ dropwithreset:
>       goto drop;
>  
>  dropunlock:
> -     if (ti_locked == TI_RLOCKED)
> -             INP_INFO_RUNLOCK(&V_tcbinfo);
> -     else if (ti_locked == TI_WLOCKED)
> +     if (ti_locked == TI_WLOCKED) {
>               INP_INFO_WUNLOCK(&V_tcbinfo);
> -     else
> -             panic("%s: dropunlock ti_locked %d", __func__, ti_locked);
> -     ti_locked = TI_UNLOCKED;
> +             ti_locked = TI_UNLOCKED;
> +     }
> +#ifdef INVARIANTS
> +     else {
> +             KASSERT(ti_locked == TI_UNLOCKED, ("%s: dropunlock "
> +                 "ti_locked: %d", __func__, ti_locked));
> +             INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
> +     }
> +#endif
>  
>       if (inp != NULL)
>               INP_WUNLOCK(inp);
> @@ -1449,13 +1446,13 @@ tcp_do_segment(struct mbuf *m, struct tc
>               INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>       } else {
>  #ifdef INVARIANTS
> -             if (ti_locked == TI_RLOCKED)
> -                     INP_INFO_RLOCK_ASSERT(&V_tcbinfo);
> 
> *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
> _______________________________________________
> svn-src-head@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/svn-src-head
> To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
> 
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to