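I believe the right way to fix it is to change:

bytes = __offsetof(struct inpcblbgroup, il_inp[size]);

to:

bytes = __offsetof(struct inpcblbgroup, il_inp) +
    sizeof(inpcblbgroup::il_inp[0]) * size;

Adding just size to the offset would not be equivalent: each il_inp element
is a pointer, so the element count has to be multiplied by the element size.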
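
To make the size computation concrete, here is a minimal sketch. It assumes a simplified stand-in struct (lbgroup_sketch) and helper names (lbgroup_bytes, lbgroup_alloc) that are hypothetical and not part of the patch. calloc is used only so that the sketch starts from a zeroed header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct inpcb;  // opaque here; the group only stores pointers to it

// Hypothetical, simplified stand-in for struct inpcblbgroup.
struct lbgroup_sketch {
    uint32_t il_inpsiz;      // capacity of il_inp[]
    uint32_t il_inpcnt;      // slots currently in use
    struct inpcb *il_inp[];  // flexible array member (a GNU extension in C++)
};

// Bytes needed for a group with room for `size` PCB pointers.
// offsetof() only sees the constant member name; the part that depends on
// the runtime value is added separately.
size_t lbgroup_bytes(int size)
{
    assert(size >= 0);
    return offsetof(lbgroup_sketch, il_inp) +
        sizeof(lbgroup_sketch::il_inp[0]) * static_cast<size_t>(size);
}

// Allocate a zeroed group with the given capacity, or return nullptr.
lbgroup_sketch *lbgroup_alloc(int size)
{
    auto *grp = static_cast<lbgroup_sketch *>(calloc(1, lbgroup_bytes(size)));
    if (grp)
        grp->il_inpsiz = static_cast<uint32_t>(size);
    return grp;
}
```

With this layout, lbgroup_bytes(8) is the header size plus eight pointer sizes, whereas "offset + 8" would reserve only eight extra bytes.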

On Sun, Aug 22, 2021 at 9:43 AM Nadav Har'El <[email protected]> wrote:

> Unfortunately, I can't commit this yet, because it fails compilation on my
> gcc 11.2.1:
>
> In file included from bsd/sys/netinet/in_pcb.cc:40:
> bsd/sys/netinet/in_pcb.cc: In function ‘inpcblbgroup* in_pcblbgroup_alloc(inpcblbgrouphead*, u_char, uint16_t, const in_dependaddr*, int)’:
> bsd/sys/netinet/in_pcb.cc:212:56: error: ‘size’ is not a constant expression
>   212 |         bytes = __offsetof(struct inpcblbgroup, il_inp[size]);
>       |                                                        ^~~~
> ./bsd/porting/netport.h:45:59: note: in definition of macro ‘__offsetof’
>    45 | #define __offsetof(type, field)  __builtin_offsetof(type, field)
>       |                                                           ^~~~~
>
>
> Maybe instead of
>         bytes = __offsetof(struct inpcblbgroup, il_inp[size]);
> we should use
>         bytes = __offsetof(struct inpcblbgroup, il_inp[0]) + size;
> ?
> (but I'm not sure, please check it really does the same...)
>
>
> --
> Nadav Har'El
> [email protected]
>
>
> On Sun, Aug 22, 2021 at 7:23 AM Waldemar Kozaczuk <[email protected]>
> wrote:
>
>> This patch is a manual back-port of the original FreeBSD
>> patch https://reviews.freebsd.org/rS334719. The FreeBSD patch
>> adds support for the SO_REUSEPORT_LB socket option, whereas the one
>> below implements the Linux flavor of SO_REUSEPORT, which in effect
>> borrows a good chunk of the FreeBSD implementation.
>>
>> Please note that the FreeBSD committers decided to retain support for the
>> original SO_REUSEPORT option and add a new one, SO_REUSEPORT_LB. The new
>> option exhibits the same behavior as the older one but adds an important
>> new feature: load balancing across listener sockets sharing the same port.
>> The FreeBSD manual states this:
>>
>> "SO_REUSEPORT_LB allows completely duplicate bindings by multiple
>> processes if they all set SO_REUSEPORT_LB before binding the port. Incoming
>> TCP and UDP connections are distributed among the sharing processes based
>> on a hash function of local port number, foreign IP address and port
>> number. A maximum of 256 processes can share one socket."
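
From the application side, the "set the option before binding" rule quoted above can be sketched as follows. This is only an illustration using standard POSIX socket calls and the Linux-style SO_REUSEPORT option; the helper names open_reuseport_listener and bound_port are hypothetical.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Open a TCP listener on 0.0.0.0:port with SO_REUSEPORT set before bind().
// `port` is in host byte order; 0 lets the kernel pick one.
// Returns the socket fd, or -1 on failure.
int open_reuseport_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    // The option must be set on every socket that wants to share the port,
    // and it must be set before bind().
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
        bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Port the socket ended up bound to (host byte order), or 0 on error.
uint16_t bound_port(int fd)
{
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

A server would typically call open_reuseport_listener() once per worker thread with the same port and let each thread accept() on its own socket.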
>>
>> So most of the original patch was back-ported as-is, except for the
>> conditional logic that accounts for both SO_REUSEPORT and
>> SO_REUSEPORT_LB, which is unnecessary for OSv because it implements the
>> Linux API, which only supports the SO_REUSEPORT option. In addition, in
>> some places I had to change the C code to use C++ constructs, just like
>> in other places in in_pcb.cc.
>>
>> The bulk of the patch below adds the definition of struct inpcblbgroup
>> and the functions that allocate, deallocate and manipulate it in order to
>> manage load balancing groups, including adding and removing member
>> sockets, or more specifically their PCBs (Protocol Control Blocks):
>>
>> (Internal API)
>> - struct inpcblbgroup *in_pcblbgroup_alloc() - allocates new LB group
>>
>> - void in_pcblbgroup_free(struct inpcblbgroup *grp) - frees existing LB
>> group
>>
>> - struct inpcblbgroup *in_pcblbgroup_resize(struct inpcblbgrouphead *hdr,
>> struct inpcblbgroup *old_grp, int size) - creates new LB group that is a
>> resized version of the old one
>>
>> - void in_pcblbgroup_reorder(struct inpcblbgrouphead *hdr, struct
>> inpcblbgroup **grpp, int i) - removes the PCB at index 'i' from the group,
>> pulls up the entries after il_inp[i] and shrinks the group if possible
>>
>> (Public API)
>> - int in_pcbinslbgrouphash(struct inpcb *inp) - adds PCB member to the LB
>> group for SO_REUSEPORT option (allocate new LB group if necessary)
>>
>> - void in_pcbremlbgrouphash(struct inpcb *inp) - removes PCB from load
>> balance group (free existing LB group if last member)
>>
>> - struct inpcb *in_pcblookup_lbgroup(const struct inpcbinfo *pcbinfo,
>>   const struct in_addr *laddr, uint16_t lport, const struct in_addr
>> *faddr, uint16_t fport, int lookupflags) - looks up
>>   inpcb in a load balancing group
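
The grow and shrink policy that these functions implement together can be summarized in a small sketch. The constants mirror INPCBLBGROUP_SIZMIN and INPCBLBGROUP_SIZMAX from the patch; the two helper functions are hypothetical and only model the capacity decisions, not the actual copying of PCB pointers.

```cpp
#include <cassert>

constexpr int SIZMIN = 8;    // mirrors INPCBLBGROUP_SIZMIN
constexpr int SIZMAX = 256;  // mirrors INPCBLBGROUP_SIZMAX

// Capacity the group needs before one more member is inserted.
// Returns -1 when the group is full and already at the maximum size,
// in which case the new member is not added to the group.
int capacity_before_insert(int count, int capacity)
{
    assert(count >= 0 && count <= capacity);
    if (count < capacity)
        return capacity;        // still room, no resize
    if (capacity >= SIZMAX)
        return -1;              // limit reached
    return capacity * 2;        // full: double the group
}

// Capacity of the group after one member was removed.
// `count_after` is the member count after the removal.
int capacity_after_remove(int count_after, int capacity)
{
    assert(count_after >= 0 && count_after < capacity);
    if (capacity > SIZMIN && count_after <= capacity / 4)
        return capacity / 2;    // mostly empty: halve the group
    return capacity;
}
```

For example, a full group of 8 grows to 16 on the next insert, and a group of 16 shrinks back to 8 once it holds 4 or fewer members.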
>>
>> The remaining part of the patch modifies the relevant parts of in_pcb.cc to:
>>
>> 1) add logic to add and remove inpcb members to/from LB groups by
>> delegating to in_pcbinslbgrouphash() and in_pcbremlbgrouphash() during
>> setup and teardown of sockets and their PCBs
>>
>> 2) add logic to look up PCBs (and relevant sockets) by delegating to
>> in_pcblookup_lbgroup()
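
As a rough illustration of how the lookup spreads connections over the group, the sketch below restates the INP_PCBLBGROUP_PKTHASH macro from the patch as a function and reduces the hash modulo the member count. The function names pkt_hash and pick_member are hypothetical; addresses and ports are assumed to be in network byte order, as in the PCB code.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same arithmetic as INP_PCBLBGROUP_PKTHASH(faddr, lport, fport).
uint32_t pkt_hash(uint32_t faddr, uint16_t lport, uint16_t fport)
{
    return faddr ^ (faddr >> 16) ^
        ntohs(static_cast<uint16_t>(lport ^ fport));
}

// Index of the group member that should receive this connection.
// The same (faddr, lport, fport) tuple always maps to the same member
// as long as the member count does not change.
size_t pick_member(uint32_t faddr, uint16_t lport, uint16_t fport,
    size_t count)
{
    assert(count > 0);
    return pkt_hash(faddr, lport, fport) % count;
}
```

Because the index depends on the current member count, adding or removing a listener can remap new connections to different sockets.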
>>
>> This patch does not add any new locking, apart from some places
>> that verify certain locks are held when functions are called.
>>
>> Please note that at some point during the review process the original
>> version of the FreeBSD patch contained logic originating from
>> DragonFlyBSD (
>> https://github.com/DragonFlyBSD/DragonFlyBSD/commit/02ad2f0b874fb0a45eb69750219f79f5e8982272
>> )
>> to handle a drawback where crashing processes/threads using SO_REUSEPORT
>> would cause some pending sockets in the completion and incompletion
>> queues to be dropped. But due to concerns about the locking logic, this
>> part was removed from the patch (https://reviews.freebsd.org/D11003#326149)
>> and is therefore also absent from the patch below. I believe Linux also
>> does not handle this drawback correctly as of now.
>>
>> From a practical standpoint, this patch greatly improves the throughput
>> of applications using SO_REUSEPORT. More specifically, this HTTP
>> server example implemented in Rust -
>> https://gist.github.com/alexcrichton/7b97beda66d5e9b10321207cd69afbbc -
>> yields much better performance in SMP mode (the difference is most
>> pronounced with 4 CPUs, at roughly 50% more requests per second):
>>
>> Req/sec BEFORE this patch:
>> 2 CPU - 82199.52
>> 4 CPU - 97982.16
>>
>> AFTER this patch:
>> 2 CPU - 82361.77
>> 4 CPU - 147389.79
>>
>> Finally, note that this patch does not change any non-load-balancing
>> aspects of the SO_REUSEPORT option that were already in place
>> before this patch but inactive. More specifically, these are
>> related to how the SO_REUSEADDR and/or SO_REUSEPORT flags drive
>> the same-address and/or same-port collision logic.
>>
>> Some articles about SO_REUSEPORT:
>> - https://lwn.net/Articles/542629/
>> - https://linuxjedi.co.uk/2020/04/25/socket-so_reuseport-and-kernel-implementations/
>>
>> Fixes #1170
>>
>> Signed-off-by: Waldemar Kozaczuk <[email protected]>
>> ---
>>  bsd/sys/compat/linux/linux.h         |   1 +
>>  bsd/sys/compat/linux/linux_socket.cc |   2 +
>>  bsd/sys/netinet/in_pcb.cc            | 285 +++++++++++++++++++++++++++
>>  bsd/sys/netinet/in_pcb.h             |  32 +++
>>  4 files changed, 320 insertions(+)
>>
>> diff --git a/bsd/sys/compat/linux/linux.h b/bsd/sys/compat/linux/linux.h
>> index 7bc8c509..1e6116aa 100644
>> --- a/bsd/sys/compat/linux/linux.h
>> +++ b/bsd/sys/compat/linux/linux.h
>> @@ -89,6 +89,7 @@ typedef struct {
>>  #define        LINUX_SO_NO_CHECK       11
>>  #define        LINUX_SO_PRIORITY       12
>>  #define        LINUX_SO_LINGER         13
>> +#define        LINUX_SO_REUSEPORT      15
>>  #define        LINUX_SO_PEERCRED       17
>>  #define        LINUX_SO_RCVLOWAT       18
>>  #define        LINUX_SO_SNDLOWAT       19
>> diff --git a/bsd/sys/compat/linux/linux_socket.cc b/bsd/sys/compat/linux/linux_socket.cc
>> index 540b5477..cee3993b 100644
>> --- a/bsd/sys/compat/linux/linux_socket.cc
>> +++ b/bsd/sys/compat/linux/linux_socket.cc
>> @@ -340,6 +340,8 @@ linux_to_bsd_so_sockopt(int opt)
>>                 return (SO_OOBINLINE);
>>         case LINUX_SO_LINGER:
>>                 return (SO_LINGER);
>> +       case LINUX_SO_REUSEPORT:
>> +               return (SO_REUSEPORT);
>>         case LINUX_SO_RCVLOWAT:
>>                 return (SO_RCVLOWAT);
>>         case LINUX_SO_SNDLOWAT:
>> diff --git a/bsd/sys/netinet/in_pcb.cc b/bsd/sys/netinet/in_pcb.cc
>> index 0f62561b..530464c2 100644
>> --- a/bsd/sys/netinet/in_pcb.cc
>> +++ b/bsd/sys/netinet/in_pcb.cc
>> @@ -85,6 +85,9 @@
>>
>>  #include <osv/trace.hh>
>>
>> +#define        INPCBLBGROUP_SIZMIN     8
>> +#define        INPCBLBGROUP_SIZMAX     256
>> +
>>  TRACEPOINT(trace_inpcb_ref, "inp=%x", struct inpcb *);
>>  TRACEPOINT(trace_inpcb_rele, "inp=%x", struct inpcb *);
>>  TRACEPOINT(trace_inpcb_free, "inp=%x", struct inpcb *);
>> @@ -199,6 +202,202 @@ SYSCTL_VNET_INT(_net_inet_ip_portrange, OID_AUTO, randomtime, CTLFLAG_RW,
>>   * functions often modify hash chains or addresses in pcbs.
>>   */
>>
>> +static struct inpcblbgroup *
>> +in_pcblbgroup_alloc(struct inpcblbgrouphead *hdr, u_char vflag,
>> +    uint16_t port, const union in_dependaddr *addr, int size)
>> +{
>> +       struct inpcblbgroup *grp;
>> +       size_t bytes;
>> +
>> +       bytes = __offsetof(struct inpcblbgroup, il_inp[size]);
>> +       grp = (struct inpcblbgroup *)malloc(bytes);
>> +       if (!grp)
>> +               return (NULL);
>> +       grp->il_vflag = vflag;
>> +       grp->il_lport = port;
>> +       grp->il_dependladdr = *addr;
>> +       grp->il_inpsiz = size;
>> +       LIST_INSERT_HEAD(hdr, grp, il_list);
>> +       return (grp);
>> +}
>> +
>> +static void
>> +in_pcblbgroup_free(struct inpcblbgroup *grp)
>> +{
>> +
>> +       LIST_REMOVE(grp, il_list);
>> +       free(grp);
>> +}
>> +
>> +static struct inpcblbgroup *
>> +in_pcblbgroup_resize(struct inpcblbgrouphead *hdr,
>> +    struct inpcblbgroup *old_grp, int size)
>> +{
>> +       struct inpcblbgroup *grp;
>> +       int i;
>> +
>> +       grp = in_pcblbgroup_alloc(hdr, old_grp->il_vflag,
>> +           old_grp->il_lport, &old_grp->il_dependladdr, size);
>> +       if (!grp)
>> +               return (NULL);
>> +
>> +       KASSERT(old_grp->il_inpcnt < grp->il_inpsiz,
>> +           ("invalid new local group size %d and old local group count %d",
>> +            grp->il_inpsiz, old_grp->il_inpcnt));
>> +
>> +       for (i = 0; i < old_grp->il_inpcnt; ++i)
>> +               grp->il_inp[i] = old_grp->il_inp[i];
>> +       grp->il_inpcnt = old_grp->il_inpcnt;
>> +       in_pcblbgroup_free(old_grp);
>> +       return (grp);
>> +}
>> +
>> +/*
>> + * PCB at index 'i' is removed from the group. Pull up the ones below il_inp[i]
>> + * and shrink group if possible.
>> + */
>> +static void
>> +in_pcblbgroup_reorder(struct inpcblbgrouphead *hdr, struct inpcblbgroup **grpp,
>> +    int i)
>> +{
>> +       struct inpcblbgroup *grp = *grpp;
>> +
>> +       for (; i + 1 < grp->il_inpcnt; ++i)
>> +               grp->il_inp[i] = grp->il_inp[i + 1];
>> +       grp->il_inpcnt--;
>> +
>> +       if (grp->il_inpsiz > INPCBLBGROUP_SIZMIN &&
>> +           grp->il_inpcnt <= (grp->il_inpsiz / 4)) {
>> +               /* Shrink this group. */
>> +               struct inpcblbgroup *new_grp =
>> +                       in_pcblbgroup_resize(hdr, grp, grp->il_inpsiz / 2);
>> +               if (new_grp)
>> +                       *grpp = new_grp;
>> +       }
>> +       return;
>> +}
>> +
>> +/*
>> + * Add PCB to load balance group for SO_REUSEPORT option.
>> + */
>> +static int
>> +in_pcbinslbgrouphash(struct inpcb *inp)
>> +{
>> +       struct inpcbinfo *pcbinfo;
>> +       struct inpcblbgrouphead *hdr;
>> +       struct inpcblbgroup *grp;
>> +       uint16_t hashmask, lport;
>> +       uint32_t group_index;
>> +       static int limit_logged = 0;
>> +
>> +       pcbinfo = inp->inp_pcbinfo;
>> +
>> +       INP_LOCK_ASSERT(inp);
>> +       INP_HASH_WLOCK_ASSERT(pcbinfo);
>> +
>> +       if (pcbinfo->ipi_lbgrouphashbase == NULL)
>> +               return (0);
>> +
>> +       hashmask = pcbinfo->ipi_lbgrouphashmask;
>> +       lport = inp->inp_lport;
>> +       group_index = INP_PCBLBGROUP_PORTHASH(lport, hashmask);
>> +       hdr = &pcbinfo->ipi_lbgrouphashbase[group_index];
>> +
>> +#ifdef INET6
>> +       /*
>> +        * Don't allow IPv4 mapped INET6 wild socket.
>> +        */
>> +       if ((inp->inp_vflag & INP_IPV4) &&
>> +           inp->inp_laddr.s_addr == INADDR_ANY &&
>> +           INP_CHECK_SOCKAF(inp->inp_socket, AF_INET6)) {
>> +               return (0);
>> +       }
>> +#endif
>> +
>> +       hdr = &pcbinfo->ipi_lbgrouphashbase[
>> +           INP_PCBLBGROUP_PORTHASH(inp->inp_lport,
>> +               pcbinfo->ipi_lbgrouphashmask)];
>> +       LIST_FOREACH(grp, hdr, il_list) {
>> +               if (grp->il_vflag == inp->inp_vflag &&
>> +                   grp->il_lport == inp->inp_lport &&
>> +                   memcmp(&grp->il_dependladdr,
>> +                       &inp->inp_inc.inc_ie.ie_dependladdr,
>> +                       sizeof(grp->il_dependladdr)) == 0) {
>> +                       break;
>> +               }
>> +       }
>> +       if (grp == NULL) {
>> +               /* Create new load balance group. */
>> +               grp = in_pcblbgroup_alloc(hdr, inp->inp_vflag,
>> +                   inp->inp_lport, &inp->inp_inc.inc_ie.ie_dependladdr,
>> +                   INPCBLBGROUP_SIZMIN);
>> +               if (!grp)
>> +                       return (ENOBUFS);
>> +       } else if (grp->il_inpcnt == grp->il_inpsiz) {
>> +               if (grp->il_inpsiz >= INPCBLBGROUP_SIZMAX) {
>> +                       if (!limit_logged) {
>> +                               limit_logged = 1;
>> +                               printf("lb group port %d, limit reached\n",
>> +                                   ntohs(grp->il_lport));
>> +                       }
>> +                       return (0);
>> +               }
>> +
>> +               /* Expand this local group. */
>> +               grp = in_pcblbgroup_resize(hdr, grp, grp->il_inpsiz * 2);
>> +               if (!grp)
>> +                       return (ENOBUFS);
>> +       }
>> +
>> +       KASSERT(grp->il_inpcnt < grp->il_inpsiz,
>> +                       ("invalid local group size %d and count %d",
>> +                        grp->il_inpsiz, grp->il_inpcnt));
>> +
>> +       grp->il_inp[grp->il_inpcnt] = inp;
>> +       grp->il_inpcnt++;
>> +       return (0);
>> +}
>> +
>> +/*
>> + * Remove PCB from load balance group.
>> + */
>> +static void
>> +in_pcbremlbgrouphash(struct inpcb *inp)
>> +{
>> +       struct inpcbinfo *pcbinfo;
>> +       struct inpcblbgrouphead *hdr;
>> +       struct inpcblbgroup *grp;
>> +       int i;
>> +
>> +       pcbinfo = inp->inp_pcbinfo;
>> +
>> +       INP_LOCK_ASSERT(inp);
>> +       INP_HASH_WLOCK_ASSERT(pcbinfo);
>> +
>> +       if (pcbinfo->ipi_lbgrouphashbase == NULL)
>> +               return;
>> +
>> +       hdr = &pcbinfo->ipi_lbgrouphashbase[
>> +           INP_PCBLBGROUP_PORTHASH(inp->inp_lport,
>> +               pcbinfo->ipi_lbgrouphashmask)];
>> +
>> +       LIST_FOREACH(grp, hdr, il_list) {
>> +               for (i = 0; i < grp->il_inpcnt; ++i) {
>> +                       if (grp->il_inp[i] != inp)
>> +                               continue;
>> +
>> +                       if (grp->il_inpcnt == 1) {
>> +                               /* We are the last, free this local group. */
>> +                               in_pcblbgroup_free(grp);
>> +                       } else {
>> +                               /* Pull up inpcbs, shrink group if possible. */
>> +                               in_pcblbgroup_reorder(hdr, &grp, i);
>> +                       }
>> +                       return;
>> +               }
>> +       }
>> +}
>> +
>>  /*
>>   * Initialize an inpcbinfo -- we should be able to reduce the number of
>>   * arguments in time.
>> @@ -221,6 +420,8 @@ in_pcbinfo_init(struct inpcbinfo *pcbinfo, const char *name,
>>             &pcbinfo->ipi_hashmask);
>>         pcbinfo->ipi_porthashbase = (inpcbporthead *)hashinit(porthash_nelements, 0,
>>             &pcbinfo->ipi_porthashmask);
>> +       pcbinfo->ipi_lbgrouphashbase = (inpcblbgrouphead *)hashinit(hash_nelements, 0,
>> +           &pcbinfo->ipi_lbgrouphashmask);
>>         // FIXME: uma_zone_set_max(pcbinfo->ipi_zone, maxsockets);
>>  }
>>
>> @@ -1090,6 +1291,7 @@ in_pcbdrop(struct inpcb *inp)
>>                 struct inpcbport *phd = inp->inp_phd;
>>
>>                 INP_HASH_WLOCK(inp->inp_pcbinfo);
>> +               in_pcbremlbgrouphash(inp);
>>                 LIST_REMOVE(inp, inp_hash);
>>                 LIST_REMOVE(inp, inp_portlist);
>>                 if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
>> @@ -1340,6 +1542,61 @@ in_pcblookup_local(struct inpcbinfo *pcbinfo, struct in_addr laddr,
>>  }
>>  #undef INP_LOOKUP_MAPPED_PCB_COST
>>
>> +static struct inpcb *
>> +in_pcblookup_lbgroup(const struct inpcbinfo *pcbinfo,
>> +  const struct in_addr *laddr, uint16_t lport, const struct in_addr *faddr,
>> +  uint16_t fport, int lookupflags)
>> +{
>> +       struct inpcb *local_wild = NULL;
>> +       const struct inpcblbgrouphead *hdr;
>> +       struct inpcblbgroup *grp;
>> +       struct inpcblbgroup *grp_local_wild;
>> +
>> +       INP_HASH_LOCK_ASSERT(pcbinfo);
>> +
>> +       hdr = &pcbinfo->ipi_lbgrouphashbase[
>> +                 INP_PCBLBGROUP_PORTHASH(lport, pcbinfo->ipi_lbgrouphashmask)];
>> +
>> +       /*
>> +        * Order of socket selection:
>> +        * 1. non-wild.
>> +        * 2. wild (if lookupflags contains INPLOOKUP_WILDCARD).
>> +        *
>> +        * NOTE:
>> +        * - Load balanced group does not contain jailed sockets
>> +        * - Load balanced group does not contain IPv4 mapped INET6 wild sockets
>> +        */
>> +       LIST_FOREACH(grp, hdr, il_list) {
>> +#ifdef INET6
>> +               if (!(grp->il_vflag & INP_IPV4))
>> +                       continue;
>> +#endif
>> +
>> +               if (grp->il_lport == lport) {
>> +
>> +                       uint32_t idx = 0;
>> +                       int pkt_hash = INP_PCBLBGROUP_PKTHASH(faddr->s_addr,
>> +                           lport, fport);
>> +
>> +                       idx = pkt_hash % grp->il_inpcnt;
>> +
>> +                       if (grp->il_laddr.s_addr == laddr->s_addr) {
>> +                               return (grp->il_inp[idx]);
>> +                       } else {
>> +                               if (grp->il_laddr.s_addr == INADDR_ANY &&
>> +                                       (lookupflags & INPLOOKUP_WILDCARD)) {
>> +                                       local_wild = grp->il_inp[idx];
>> +                                       grp_local_wild = grp;
>> +                               }
>> +                       }
>> +               }
>> +       }
>> +       if (local_wild != NULL) {
>> +               return (local_wild);
>> +       }
>> +       return (NULL);
>> +}
>> +
>>  /*
>>   * Lookup PCB in hash list, using pcbinfo tables.  This variation assumes
>>   * that the caller has locked the hash list, and will not perform any further
>> @@ -1387,6 +1644,18 @@ in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo, struct in_addr faddr,
>>         if (tmpinp != NULL)
>>                 return (tmpinp);
>>
>> +       /*
>> +        * Then look in lb group (for wildcard match).
>> +        */
>> +       if (pcbinfo->ipi_lbgrouphashbase != NULL &&
>> +               (lookupflags & INPLOOKUP_WILDCARD)) {
>> +               inp = in_pcblookup_lbgroup(pcbinfo, &laddr, lport, &faddr,
>> +                   fport, lookupflags);
>> +               if (inp != NULL) {
>> +                       return (inp);
>> +               }
>> +       }
>> +
>>         /*
>>          * Then look for a wildcard match, if requested.
>>          */
>> @@ -1552,6 +1821,18 @@ in_pcbinshash_internal(struct inpcb *inp)
>>         pcbporthash = &pcbinfo->ipi_porthashbase[
>>             INP_PCBPORTHASH(inp->inp_lport, pcbinfo->ipi_porthashmask)];
>>
>> +       /*
>> +        * Add entry to load balance group.
>> +        * Only do this if INP_REUSEPORT is set.
>> +        */
>> +       if (inp->inp_flags2 & INP_REUSEPORT) {
>> +               int ret = in_pcbinslbgrouphash(inp);
>> +               if (ret) {
>> +                       /* pcb lb group malloc fail (ret=ENOBUFS). */
>> +                       return (ret);
>> +               }
>> +       }
>> +
>>         /*
>>          * Go through port list and look for a head for this lport.
>>          */
>> @@ -1642,6 +1923,10 @@ in_pcbremlists(struct inpcb *inp)
>>                 struct inpcbport *phd = inp->inp_phd;
>>
>>                 INP_HASH_WLOCK(pcbinfo);
>> +
>> +               /* XXX: Only do if SO_REUSEPORT set? */
>> +               in_pcbremlbgrouphash(inp);
>> +
>>                 LIST_REMOVE(inp, inp_hash);
>>                 LIST_REMOVE(inp, inp_portlist);
>>                 if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
>> diff --git a/bsd/sys/netinet/in_pcb.h b/bsd/sys/netinet/in_pcb.h
>> index 85df54d6..a3f7a77a 100644
>> --- a/bsd/sys/netinet/in_pcb.h
>> +++ b/bsd/sys/netinet/in_pcb.h
>> @@ -318,6 +318,13 @@ struct inpcbinfo {
>>         struct inpcbporthead    *ipi_porthashbase;      /* (h) */
>>         u_long                   ipi_porthashmask;      /* (h) */
>>
>> +       /*
>> +        * Load balance groups used for the SO_REUSEPORT option,
>> +        * hashed by local port.
>> +        */
>> +       struct  inpcblbgrouphead *ipi_lbgrouphashbase;  /* (h) */
>> +       u_long                   ipi_lbgrouphashmask;   /* (h) */
>> +
>>         /*
>>          * Pointer to network stack instance
>>          */
>> @@ -331,6 +338,27 @@ struct inpcbinfo {
>>
>>  #ifdef _KERNEL
>>
>> +/*
>> + * Load balance groups used for the SO_REUSEPORT socket option. Each group
>> + * (or unique address:port combination) can be re-used at most
>> + * INPCBLBGROUP_SIZMAX (256) times. The inpcbs are stored in il_inp which
>> + * is dynamically resized as processes bind/unbind to that specific group.
>> + */
>> +struct inpcblbgroup {
>> +       LIST_ENTRY(inpcblbgroup) il_list;
>> +       uint16_t        il_lport;                       /* (c) */
>> +       u_char          il_vflag;                       /* (c) */
>> +       u_char          il_pad;
>> +       uint32_t        il_pad2;
>> +       union in_dependaddr il_dependladdr;             /* (c) */
>> +#define        il_laddr        il_dependladdr.id46_addr.ia46_addr4
>> +#define        il6_laddr       il_dependladdr.id6_addr
>> +       uint32_t        il_inpsiz; /* max count in il_inp[] (h) */
>> +       uint32_t        il_inpcnt; /* cur count in il_inp[] (h) */
>> +       struct inpcb    *il_inp[];                      /* (h) */
>> +};
>> +LIST_HEAD(inpcblbgrouphead, inpcblbgroup);
>> +
>>  // No need to do any initialization to the lock, if the inp object was
>>  // created in C++ and the constructor ran (i.e., with new)
>>  //#define INP_LOCK_INIT(inp, d, t) mutex_init(&(inp)->inp_lock)
>> @@ -398,6 +426,10 @@ void       inp_4tuple_get(struct inpcb *inp, uint32_t *laddr, uint16_t *lp,
>>         (((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport))) & (mask))
>>  #define INP_PCBPORTHASH(lport, mask) \
>>         (ntohs((lport)) & (mask))
>> +#define        INP_PCBLBGROUP_PORTHASH(lport, mask) \
>> +       (ntohs((lport)) & (mask))
>> +#define        INP_PCBLBGROUP_PKTHASH(faddr, lport, fport) \
>> +       ((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport)))
>>
>>  /*
>>   * Flags for inp_vflags -- historically version flags only
>> --
>> 2.31.1
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "OSv Development" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/osv-dev/20210822042314.167929-2-jwkozaczuk%40gmail.com
>> .
>>
>

