Has anyone got an opinion on this? I am still interested in doing more
packet capture things on OpenBSD using GRE as a transport, and the idea
of maintaining this out of tree just makes me feel tired.

On Tue, Oct 29, 2019 at 06:34:50PM +1000, David Gwynne wrote:
> i've been toying with this idea of implementing GRE as a datagram
> protocol that userland can use just like UDP. the idea is to make it
> easy to support the implementation of NHRP in userland for mgre(4),
> and also for ERSPAN* support without going down the path linux took**.
> 
> so this is the result of having a go at implementing the idea. the diff
> includes several independent parts, but they all work together to make
> GRE as comfortable to use as UDP. the two main parts are the actual
> protocol implementation in src/sys/netinet/ip_gre.c, and the tweaks to
> getaddrinfo to allow the resolution of gre services. the /etc/services
> chunk gets used by the getaddrinfo bits.
> 
> so, the first chunk lets you do this (as root in userland):
> 
>       int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_GRE);
> 
> that gives you a file descriptor you can then use with bind(),
> connect(), sendto(), recvfrom(), etc. you write a message to the
> kernel and it prepends the GRE and IP headers and pushes it out.
> it is set up so the GRE protocol is handed to the kernel via the
> sin_port or sin6_port member of struct sockaddr_in and sockaddr_in6
> respectively. there are no source and destination protocol fields, just
> one that both ends agree on, so if you connect then bind, your
> sockaddrs have to agree on the proto. unfortunately there's no such
> thing as a wildcard or reserved protocol in GRE, so 0 can't be used
> as a wildcard like it can in udp and tcp.
> 
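
For anyone skimming, here is a rough sketch of what "the protocol goes in
sin_port" looks like from userland. The helper name gre_sockaddr4 and the
GRE_PROTO_TRANSETHER macro are made up for illustration; the only things
taken from the diff are the 0x6558 value and the fact that the protocol is
stored in network byte order, as the getaddrinfo chunk does with htons().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* 0x6558 is the "transether" entry from the /etc/services chunk. */
#define GRE_PROTO_TRANSETHER	0x6558

/*
 * Hypothetical helper (not part of the diff): fill a sockaddr_in whose
 * sin_port carries the GRE protocol type instead of a UDP/TCP port.
 * Returns 0 on success, -1 if addr is not a valid IPv4 literal.
 */
static int
gre_sockaddr4(struct sockaddr_in *sin, const char *addr, uint16_t proto)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(proto);	/* GRE protocol, network byte order */
	if (inet_pton(AF_INET, addr, &sin->sin_addr) != 1)
		return (-1);
	return (0);
}
```

On a kernel with the diff applied, the result would presumably be handed to
connect() or bind() on the SOCK_DGRAM/IPPROTO_GRE socket in the usual way.
On an unpatched kernel the socket() call itself is expected to fail.
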
> the sockets support the configuration of optional GRE headers, as
> defined in RFC 2890, using setsockopt. importantly you can enable
> the key and sequence number headers, which again, the kernel offloads
> for you.
> 
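
To make the "kernel offloads it for you" part concrete, this is roughly the
header layout that the key and sequence options add. It is only a userland
illustration of the wire format in RFC 2890, not the code in ip_gre.c;
gre_build_header is a hypothetical name, and the flag values are copied
from gre_proto.h in the diff.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRE_KP	0x2000	/* Key Present */
#define GRE_SP	0x1000	/* Sequence Present */

/*
 * Hypothetical helper: write a GRE version 0 header into buf, with the
 * optional key and sequence number fields in the order RFC 2890 gives
 * them (key first, then sequence). A NULL key or seq omits that field.
 * Returns the header length, or 0 if buf is too small.
 */
static size_t
gre_build_header(uint8_t *buf, size_t len, uint16_t proto,
    const uint32_t *key, const uint32_t *seq)
{
	size_t need = 4 + (key != NULL ? 4 : 0) + (seq != NULL ? 4 : 0);
	size_t off = 4;
	uint16_t flags = 0, nproto;
	uint32_t word;

	if (len < need)
		return (0);

	if (key != NULL)
		flags |= GRE_KP;
	if (seq != NULL)
		flags |= GRE_SP;

	flags = htons(flags);
	nproto = htons(proto);
	memcpy(buf, &flags, sizeof(flags));
	memcpy(buf + 2, &nproto, sizeof(nproto));

	if (key != NULL) {
		word = htonl(*key);
		memcpy(buf + off, &word, sizeof(word));
		off += sizeof(word);
	}
	if (seq != NULL) {
		word = htonl(*seq);
		memcpy(buf + off, &word, sizeof(word));
		off += sizeof(word);
	}

	return (off);
}
```

With the diff, a program never builds this itself; it would set GRE_KEY or
GRE_SEQ with setsockopt() at the IPPROTO_GRE level and write only the
payload.
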
> the second chunk tweaks getaddrinfo so it lets you specify things other
> than IPPROTO_UDP and IPPROTO_TCP. protocols other than those are now
> looked up in /etc/protocols to get their name, which in turn is used to
> look up entries in /etc/services. while i was there and reading rfcs, i
> noted different behaviour for wildcarded socktypes and protocols, which
> i've tried to implement. eric@ seems generally ok with this stuff, and
> suggested the tweak to pledge to allow access to /etc/protocols using
> the dns pledge. tcp and udp are still special though, and are still
> omgoptimised.
> 
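
A minimal sketch of how a caller might use the getaddrinfo change, assuming
the diff is applied. gre_hints and gre_resolve are hypothetical names; the
service name passed in would be one of the new /etc/services entries, such
as "transether". On an unpatched libc the lookup is expected to fail, most
likely with a bad-hints or unknown-service error.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>

/*
 * Hypothetical helper: hints asking for GRE datagram sockets, for
 * either address family.
 */
static void
gre_hints(struct addrinfo *hints)
{
	memset(hints, 0, sizeof(*hints));
	hints->ai_family = AF_UNSPEC;
	hints->ai_socktype = SOCK_DGRAM;
	hints->ai_protocol = IPPROTO_GRE;
}

/*
 * Hypothetical wrapper: resolve host and a GRE service name. Returns
 * whatever getaddrinfo() returns; the caller frees *res with
 * freeaddrinfo() on success.
 */
static int
gre_resolve(const char *host, const char *serv, struct addrinfo **res)
{
	struct addrinfo hints;

	gre_hints(&hints);
	return (getaddrinfo(host, serv, &hints, res));
}
```

Each returned entry should then carry the GRE protocol number in its
sockaddr's port field, ready to pass to connect() or bind().
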
> all this together lets the program at
> https://mild.embarrassm.net/~dlg/diff/egred.c work. it is a userland
> reimplementation of a simplified egre(4) using tap(4) and a gre socket.
> the io path is literally reading from one fd and writing it to the other,
> everything else is boilerplate.
> 
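
The io path described there amounts to something like the following. This
is a sketch of the idea, not a copy of egred.c; relay_once is a
hypothetical name, and the two descriptors stand in for the tap(4) device
and the GRE socket.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move one message from one descriptor to the
 * other. Returns the number of bytes written, 0 on end of file, or -1
 * on error. With tap(4) and a datagram socket each read() yields one
 * whole frame, so no extra framing is needed.
 */
static ssize_t
relay_once(int from, int to, void *buf, size_t len)
{
	ssize_t n;

	n = read(from, buf, len);
	if (n <= 0)
		return (n);

	return (write(to, buf, (size_t)n));
}
```

A real daemon would call this from an event loop, once per readable
descriptor and in both directions.
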
> i suspect the kernel stuff is a bit rough as i haven't had to test every
> path, but it supports common functionality.
> 
> thoughts? i am pretty pleased with how this has turned out, and would be
> keen to put it in the tree and work on it some more.
> 
> * https://tools.ietf.org/html/draft-foschiano-erspan-03
> ** http://vger.kernel.org/lpc_net2018_talks/erspan-linux-presentation.pdf
> 
> Index: etc/services
> ===================================================================
> RCS file: /cvs/src/etc/services,v
> retrieving revision 1.96
> diff -u -p -r1.96 services
> --- etc/services      27 Jan 2019 20:35:06 -0000      1.96
> +++ etc/services      29 Oct 2019 07:57:44 -0000
> @@ -332,6 +332,21 @@ spamd-cfg        8026/tcp                        # spamd(8) configur
>  dhcpd-sync   8067/udp                        # dhcpd(8) synchronisation
>  hunt         26740/udp                       # hunt(6)
>  #
> +# GRE Protocol Types
> +#
> +keepalive    0/gre                           # 0x0000: IP tunnel keepalive
> +ipv4         2048/gre                        # 0x0800: IPv4
> +nhrp         8193/gre                        # 0x2001: Next Hop Resolution Protocol
> +erspan3              8939/gre                        # 0x22eb: ERSPAN III
> +transether   25944/gre       ethernet        # 0x6558: Trans Ether Bridging
> +ipv6         34525/gre                       # 0x86dd: IPv6
> +wccp         34878/gre                       # 0x883e: Web Cache Communication Protocol
> +mpls         34887/gre                       # 0x8847: MPLS
> +#mpls                34888/gre                       # 0x8848: MPLS Multicast
> +erspan               35006/gre       erspan2         # 0x88be: ERSPAN I/II
> +nsh          35151/gre                       # 0x894f: Network Service Header
> +control              47082/gre                       # 0xb7ea: RFC 8157
> +#
>  # Appletalk
>  #
>  rtmp         1/ddp                           # Routing Table Maintenance Protocol
> Index: lib/libc/asr/getaddrinfo_async.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/asr/getaddrinfo_async.c,v
> retrieving revision 1.56
> diff -u -p -r1.56 getaddrinfo_async.c
> --- lib/libc/asr/getaddrinfo_async.c  3 Nov 2018 09:13:24 -0000       1.56
> +++ lib/libc/asr/getaddrinfo_async.c  29 Oct 2019 07:57:54 -0000
> @@ -34,36 +34,15 @@
>  
>  #include "asr_private.h"
>  
> -struct match {
> -     int family;
> -     int socktype;
> -     int protocol;
> -};
> -
>  static int getaddrinfo_async_run(struct asr_query *, struct asr_result *);
>  static int get_port(const char *, const char *, int);
> +static int get_service(const char *, int, int);
>  static int iter_family(struct asr_query *, int);
>  static int addrinfo_add(struct asr_query *, const struct sockaddr *, const char *);
>  static int addrinfo_from_file(struct asr_query *, int,  FILE *);
>  static int addrinfo_from_pkt(struct asr_query *, char *, size_t);
>  static int addrconfig_setup(struct asr_query *);
>  
> -static const struct match matches[] = {
> -     { PF_INET,      SOCK_DGRAM,     IPPROTO_UDP     },
> -     { PF_INET,      SOCK_STREAM,    IPPROTO_TCP     },
> -     { PF_INET,      SOCK_RAW,       0               },
> -     { PF_INET6,     SOCK_DGRAM,     IPPROTO_UDP     },
> -     { PF_INET6,     SOCK_STREAM,    IPPROTO_TCP     },
> -     { PF_INET6,     SOCK_RAW,       0               },
> -     { -1,           0,              0,              },
> -};
> -
> -#define MATCH_FAMILY(a, b) ((a) == matches[(b)].family || (a) == PF_UNSPEC)
> -#define MATCH_PROTO(a, b) ((a) == matches[(b)].protocol || (a) == 0 || matches[(b)].protocol == 0)
> -/* Do not match SOCK_RAW unless explicitly specified */
> -#define MATCH_SOCKTYPE(a, b) ((a) == matches[(b)].socktype || ((a) == 0 && \
> -                             matches[(b)].socktype != SOCK_RAW))
> -
>  enum {
>       DOM_INIT,
>       DOM_DOMAIN,
> @@ -199,24 +178,27 @@ getaddrinfo_async_run(struct asr_query *
>                       }
>               }
>  
> -             /* Make sure there is at least a valid combination */
> -             for (i = 0; matches[i].family != -1; i++)
> -                     if (MATCH_FAMILY(ai->ai_family, i) &&
> -                         MATCH_SOCKTYPE(ai->ai_socktype, i) &&
> -                         MATCH_PROTO(ai->ai_protocol, i))
> -                             break;
> -             if (matches[i].family == -1) {
> -                     ar->ar_gai_errno = EAI_BADHINTS;
> -                     async_set_state(as, ASR_STATE_HALT);
> -                     break;
> -             }
> -
> -             if (ai->ai_protocol == 0 || ai->ai_protocol == IPPROTO_UDP)
> +             switch (ai->ai_protocol) {
> +             case 0:
>                       as->as.ai.port_udp = get_port(as->as.ai.servname, "udp",
>                           as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> -             if (ai->ai_protocol == 0 || ai->ai_protocol == IPPROTO_TCP)
>                       as->as.ai.port_tcp = get_port(as->as.ai.servname, "tcp",
>                           as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> +                     break;
> +             case IPPROTO_TCP:
> +                     as->as.ai.port_tcp = get_port(as->as.ai.servname, "tcp",
> +                         as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> +                     break;
> +             case IPPROTO_UDP:
> +                     as->as.ai.port_udp = get_port(as->as.ai.servname, "udp",
> +                         as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> +                     break;
> +             default:
> +                     as->as.ai.port_udp = get_service(as->as.ai.servname,
> +                         ai->ai_protocol,
> +                         as->as.ai.hints.ai_flags & AI_NUMERICSERV);
> +                     break;
> +             }
>               if (as->as.ai.port_tcp == -2 || as->as.ai.port_udp == -2 ||
>                   (as->as.ai.port_tcp == -1 && as->as.ai.port_udp == -1) ||
>                   (ai->ai_protocol && (as->as.ai.port_udp == -1 ||
> @@ -491,6 +473,24 @@ get_port(const char *servname, const cha
>       return (port);
>  }
>  
> +static int
> +get_service(const char *servname, int protocol, int numonly)
> +{
> +     struct protoent pe;
> +     struct protoent_data ped;
> +     int rv;
> +
> +     memset(&ped, 0, sizeof(ped));
> +     rv = getprotobynumber_r(protocol, &pe, &ped);
> +     if (rv == -1)
> +             return (-1);
> +
> +     rv = get_port(servname, pe.p_name, numonly);
> +     endprotoent_r(&ped);
> +
> +     return (rv);
> +}
> +
>  /*
>   * Iterate over the address families that are to be queried. Use the
>   * list on the async context, unless a specific family was given in hints.
> @@ -519,65 +519,107 @@ iter_family(struct asr_query *as, int fi
>   * entry per protocol/socktype match.
>   */
>  static int
> -addrinfo_add(struct asr_query *as, const struct sockaddr *sa, const char *cname)
> +addrinfo_add_ai(struct asr_query *as, const struct sockaddr *sa,
> +    const char *cname, int socktype, int proto, int port)
>  {
>       struct addrinfo         *ai;
> -     int                      i, port, proto;
> -
> -     for (i = 0; matches[i].family != -1; i++) {
> -             if (matches[i].family != sa->sa_family ||
> -                 !MATCH_SOCKTYPE(as->as.ai.hints.ai_socktype, i) ||
> -                 !MATCH_PROTO(as->as.ai.hints.ai_protocol, i))
> -                     continue;
> -
> -             proto = as->as.ai.hints.ai_protocol;
> -             if (!proto)
> -                     proto = matches[i].protocol;
> -
> -             if (proto == IPPROTO_TCP)
> -                     port = as->as.ai.port_tcp;
> -             else if (proto == IPPROTO_UDP)
> -                     port = as->as.ai.port_udp;
> -             else
> -                     port = 0;
> +     int                      i;
>  
> -             /* servname specified, but not defined for this protocol */
> -             if (port == -1)
> -                     continue;
> +     if (port == -1)
> +             return (0);
>  
> -             ai = calloc(1, sizeof(*ai) + sa->sa_len);
> -             if (ai == NULL)
> +     ai = calloc(1, sizeof(*ai) + sa->sa_len);
> +     if (ai == NULL)
> +             return (EAI_MEMORY);
> +     ai->ai_family = sa->sa_family;
> +     ai->ai_socktype = socktype;
> +     ai->ai_protocol = proto;
> +     ai->ai_flags = as->as.ai.hints.ai_flags;
> +     ai->ai_addrlen = sa->sa_len;
> +     ai->ai_addr = (void *)(ai + 1);
> +     if (cname &&
> +         as->as.ai.hints.ai_flags & (AI_CANONNAME | AI_FQDN)) {
> +             if ((ai->ai_canonname = strdup(cname)) == NULL) {
> +                     free(ai);
>                       return (EAI_MEMORY);
> -             ai->ai_family = sa->sa_family;
> -             ai->ai_socktype = matches[i].socktype;
> -             ai->ai_protocol = proto;
> -             ai->ai_flags = as->as.ai.hints.ai_flags;
> -             ai->ai_addrlen = sa->sa_len;
> -             ai->ai_addr = (void *)(ai + 1);
> -             if (cname &&
> -                 as->as.ai.hints.ai_flags & (AI_CANONNAME | AI_FQDN)) {
> -                     if ((ai->ai_canonname = strdup(cname)) == NULL) {
> -                             free(ai);
> -                             return (EAI_MEMORY);
> -                     }
>               }
> -             memmove(ai->ai_addr, sa, sa->sa_len);
> -             if (sa->sa_family == PF_INET)
> -                     ((struct sockaddr_in *)ai->ai_addr)->sin_port =
> -                         htons(port);
> -             else if (sa->sa_family == PF_INET6)
> -                     ((struct sockaddr_in6 *)ai->ai_addr)->sin6_port =
> -                         htons(port);
> -
> -             if (as->as.ai.aifirst == NULL)
> -                     as->as.ai.aifirst = ai;
> -             if (as->as.ai.ailast)
> -                     as->as.ai.ailast->ai_next = ai;
> -             as->as.ai.ailast = ai;
> -             as->as_count += 1;
>       }
> +     memmove(ai->ai_addr, sa, sa->sa_len);
> +     if (sa->sa_family == PF_INET)
> +             ((struct sockaddr_in *)ai->ai_addr)->sin_port =
> +                 htons(port);
> +     else if (sa->sa_family == PF_INET6)
> +             ((struct sockaddr_in6 *)ai->ai_addr)->sin6_port =
> +                 htons(port);
> +
> +     if (as->as.ai.aifirst == NULL)
> +             as->as.ai.aifirst = ai;
> +     if (as->as.ai.ailast)
> +             as->as.ai.ailast->ai_next = ai;
> +     as->as.ai.ailast = ai;
> +     as->as_count += 1;
>  
>       return (0);
> +}
> +
> +static int
> +addrinfo_add_proto(struct asr_query *as, const struct sockaddr *sa,
> +    const char *cname, int proto, int port)
> +{
> +     int rv;
> +
> +     switch (as->as.ai.hints.ai_socktype) {
> +     case 0:
> +             rv = addrinfo_add_ai(as, sa, cname, SOCK_STREAM, proto, port);
> +             if (rv != 0)
> +                     break;
> +
> +             rv = addrinfo_add_ai(as, sa, cname, SOCK_DGRAM, proto, port);
> +             if (rv != 0)
> +                     break;
> +
> +             break;
> +
> +     default:
> +             rv = addrinfo_add_ai(as, sa, cname,
> +                 as->as.ai.hints.ai_socktype, proto, port);
> +             break;
> +     }
> +
> +     return (rv);
> +}
> +
> +static int
> +addrinfo_add(struct asr_query *as, const struct sockaddr *sa, const char *cname)
> +{
> +     int rv;
> +
> +     switch (as->as.ai.hints.ai_protocol) {
> +     case 0:
> +             rv = addrinfo_add_proto(as, sa, cname,
> +                 IPPROTO_TCP, as->as.ai.port_tcp);
> +             if (rv != 0)
> +                     break;
> +
> +             rv = addrinfo_add_proto(as, sa, cname,
> +                 IPPROTO_UDP, as->as.ai.port_udp);
> +             if (rv != 0)
> +                     break;
> +
> +             break;
> +
> +     case IPPROTO_TCP:
> +             rv = addrinfo_add_proto(as, sa, cname,
> +                 IPPROTO_TCP, as->as.ai.port_tcp);
> +             break;
> +
> +     default: /* includes IPPROTO_UDP */
> +             rv = addrinfo_add_proto(as, sa, cname,
> +                 as->as.ai.hints.ai_protocol, as->as.ai.port_udp);
> +             break;
> +     }
> +
> +     return (rv);
>  }
>  
>  static int
> Index: sys/conf/files
> ===================================================================
> RCS file: /cvs/src/sys/conf/files,v
> retrieving revision 1.675
> diff -u -p -r1.675 files
> --- sys/conf/files    5 Oct 2019 05:33:14 -0000       1.675
> +++ sys/conf/files    29 Oct 2019 07:57:58 -0000
> @@ -862,7 +862,7 @@ file netinet/tcp_subr.c
>  file netinet/tcp_timer.c
>  file netinet/tcp_usrreq.c
>  file netinet/udp_usrreq.c
> -file netinet/ip_gre.c
> +file netinet/ip_gre.c                        gre
>  file netinet/ip_ipsp.c                       ipsec | tcp_signature
>  file netinet/ip_spd.c                        ipsec | tcp_signature
>  file netinet/ip_ipip.c
> Index: sys/kern/kern_pledge.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_pledge.c,v
> retrieving revision 1.255
> diff -u -p -r1.255 kern_pledge.c
> --- sys/kern/kern_pledge.c    25 Aug 2019 18:46:40 -0000      1.255
> +++ sys/kern/kern_pledge.c    29 Oct 2019 07:57:58 -0000
> @@ -666,7 +666,7 @@ pledge_namei(struct proc *p, struct name
>                       }
>               }
>  
> -             /* DNS needs /etc/{resolv.conf,hosts,services}. */
> +             /* DNS needs /etc/{resolv.conf,hosts,services,protocols}. */
>               if ((ni->ni_pledge == PLEDGE_RPATH) &&
>                   (p->p_p->ps_pledge & PLEDGE_DNS)) {
>                       if (strcmp(path, "/etc/resolv.conf") == 0) {
> @@ -678,6 +678,10 @@ pledge_namei(struct proc *p, struct name
>                               return (0);
>                       }
>                       if (strcmp(path, "/etc/services") == 0) {
> +                             ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
> +                             return (0);
> +                     }
> +                     if (strcmp(path, "/etc/protocols") == 0) {
>                               ni->ni_cnd.cn_flags |= BYPASSUNVEIL;
>                               return (0);
>                       }
> Index: sys/net/if_gre.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_gre.c,v
> retrieving revision 1.152
> diff -u -p -r1.152 if_gre.c
> --- sys/net/if_gre.c  29 Jul 2019 16:28:25 -0000      1.152
> +++ sys/net/if_gre.c  29 Oct 2019 07:57:58 -0000
> @@ -69,6 +69,8 @@
>  #include <netinet/ip_var.h>
>  #include <netinet/ip_ecn.h>
>  
> +#include <netinet/gre_proto.h>
> +
>  #ifdef INET6
>  #include <netinet/ip6.h>
>  #include <netinet6/ip6_var.h>
> @@ -103,28 +105,6 @@
>  /*
>   * packet formats
>   */
> -struct gre_header {
> -     uint16_t                gre_flags;
> -#define GRE_CP                               0x8000  /* Checksum Present */
> -#define GRE_KP                               0x2000  /* Key Present */
> -#define GRE_SP                               0x1000  /* Sequence Present */
> -
> -#define GRE_VERS_MASK                        0x0007
> -#define GRE_VERS_0                   0x0000
> -#define GRE_VERS_1                   0x0001
> -
> -     uint16_t                gre_proto;
> -} __packed __aligned(4);
> -
> -struct gre_h_cksum {
> -     uint16_t                gre_cksum;
> -     uint16_t                gre_reserved1;
> -} __packed __aligned(4);
> -
> -struct gre_h_key {
> -     uint32_t                gre_key;
> -} __packed __aligned(4);
> -
>  #define GRE_EOIP             0x6400
>  
>  struct gre_h_key_eoip {
> @@ -132,13 +112,7 @@ struct gre_h_key_eoip {
>       uint16_t                eoip_tunnel_id; /* little endian */
>  } __packed __aligned(4);
>  
> -#define NVGRE_VSID_RES_MIN   0x000000 /* reserved for future use */
> -#define NVGRE_VSID_RES_MAX   0x000fff
> -#define NVGRE_VSID_NVE2NVE   0xffffff /* vendor specific NVE-to-NVE comms */
> -
> -struct gre_h_seq {
> -     uint32_t                gre_seq;
> -} __packed __aligned(4);
> +#define GRE_WCCP 0x883e
>  
>  struct gre_h_wccp {
>       uint8_t                 wccp_flags;
> @@ -147,7 +121,10 @@ struct gre_h_wccp {
>       uint8_t                 pri_bucket;
>  } __packed __aligned(4);
>  
> -#define GRE_WCCP 0x883e
> +
> +#define NVGRE_VSID_RES_MIN   0x000000 /* reserved for future use */
> +#define NVGRE_VSID_RES_MAX   0x000fff
> +#define NVGRE_VSID_NVE2NVE   0xffffff /* vendor specific NVE-to-NVE comms */
>  
>  #define GRE_HDRLEN (sizeof(struct ip) + sizeof(struct gre_header))
>  
> @@ -289,8 +266,8 @@ static int        gre_up(struct gre_softc *);
>  static int   gre_down(struct gre_softc *);
>  static void  gre_link_state(struct ifnet *, unsigned int);
>  
> -static int   gre_input_key(struct mbuf **, int *, int, int, uint8_t,
> -                 struct gre_tunnel *);
> +static struct mbuf *
> +             gre_if_input(struct mbuf *, int, uint8_t, struct gre_tunnel *);
>  
>  static struct mbuf *
>               gre_ipv4_patch(const struct gre_tunnel *, struct mbuf *,
> @@ -893,10 +870,9 @@ eoip_clone_destroy(struct ifnet *ifp)
>       return (0);
>  }
>  
> -int
> -gre_input(struct mbuf **mp, int *offp, int type, int af)
> +struct mbuf *
> +gre_if4_input(struct mbuf *m, int hlen)
>  {
> -     struct mbuf *m = *mp;
>       struct gre_tunnel key;
>       struct ip *ip;
>  
> @@ -908,17 +884,13 @@ gre_input(struct mbuf **mp, int *offp, i
>       key.t_src4 = ip->ip_dst;
>       key.t_dst4 = ip->ip_src;
>  
> -     if (gre_input_key(mp, offp, type, af, ip->ip_tos, &key) == -1)
> -             return (rip_input(mp, offp, type, af));
> -
> -     return (IPPROTO_DONE);
> +     return (gre_if_input(m, hlen, ip->ip_tos, &key));
>  }
>  
>  #ifdef INET6
> -int
> -gre_input6(struct mbuf **mp, int *offp, int type, int af)
> +struct mbuf *
> +gre_if6_input(struct mbuf *m, int hlen)
>  {
> -     struct mbuf *m = *mp;
>       struct gre_tunnel key;
>       struct ip6_hdr *ip6;
>       uint32_t flow;
> @@ -933,10 +905,7 @@ gre_input6(struct mbuf **mp, int *offp, 
>  
>       flow = bemtoh32(&ip6->ip6_flow);
>  
> -     if (gre_input_key(mp, offp, type, af, flow >> 20, &key) == -1)
> -             return (rip6_input(mp, offp, type, af));
> -
> -     return (IPPROTO_DONE);
> +     return (gre_if_input(m, hlen, flow >> 20, &key));
>  }
>  #endif /* INET6 */
>  
> @@ -996,12 +965,10 @@ gre_input_1(struct gre_tunnel *key, stru
>       return (m);
>  }
>  
> -static int
> -gre_input_key(struct mbuf **mp, int *offp, int type, int af, uint8_t otos,
> -    struct gre_tunnel *key)
> +static struct mbuf *
> +gre_if_input(struct mbuf *m, int iphlen, uint8_t otos, struct gre_tunnel *key)
>  {
> -     struct mbuf *m = *mp;
> -     int iphlen = *offp, hlen, rxprio;
> +     int hlen, rxprio;
>       struct ifnet *ifp;
>       const struct gre_tunnel *tunnel;
>       caddr_t buf;
> @@ -1025,7 +992,7 @@ gre_input_key(struct mbuf **mp, int *off
>  
>       m = m_pullup(m, hlen);
>       if (m == NULL)
> -             return (IPPROTO_DONE);
> +             return (NULL);
>  
>       buf = mtod(m, caddr_t);
>       gh = (struct gre_header *)(buf + iphlen);
> @@ -1038,7 +1005,7 @@ gre_input_key(struct mbuf **mp, int *off
>       case htons(GRE_VERS_1):
>               m = gre_input_1(key, m, gh, otos, iphlen);
>               if (m == NULL)
> -                     return (IPPROTO_DONE);
> +                     return (NULL);
>               /* FALLTHROUGH */
>       default:
>               goto decline;
> @@ -1055,7 +1022,7 @@ gre_input_key(struct mbuf **mp, int *off
>  
>               m = m_pullup(m, hlen);
>               if (m == NULL)
> -                     return (IPPROTO_DONE);
> +                     return (NULL);
>  
>               buf = mtod(m, caddr_t);
>               gh = (struct gre_header *)(buf + iphlen);
> @@ -1071,7 +1038,7 @@ gre_input_key(struct mbuf **mp, int *off
>                   nvgre_input(key, m, hlen, otos) == -1)
>                       goto decline;
>  
> -             return (IPPROTO_DONE);
> +             return (NULL);
>       }
>  
>       ifp = gre_find(key);
> @@ -1148,7 +1115,7 @@ gre_input_key(struct mbuf **mp, int *off
>  
>               m_adj(m, hlen);
>               gre_keepalive_recv(ifp, m);
> -             return (IPPROTO_DONE);
> +             return (NULL);
>  
>       default:
>               goto decline;
> @@ -1162,7 +1129,7 @@ gre_input_key(struct mbuf **mp, int *off
>  
>       m = (*patch)(tunnel, m, &itos, otos);
>       if (m == NULL)
> -             return (IPPROTO_DONE); 
> +             return (NULL); 
>  
>       if (tunnel->t_key_mask == GRE_KEY_ENTROPY) {
>               m->m_pkthdr.ph_flowid = M_FLOWID_VALID |
> @@ -1203,10 +1170,9 @@ gre_input_key(struct mbuf **mp, int *off
>  #endif
>  
>       (*input)(ifp, m);
> -     return (IPPROTO_DONE);
> +     return (NULL);
>  decline:
> -     *mp = m;
> -     return (-1);
> +     return (m);
>  }
>  
>  static struct mbuf *
> Index: sys/netinet/gre_proto.h
> ===================================================================
> RCS file: sys/netinet/gre_proto.h
> diff -N sys/netinet/gre_proto.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ sys/netinet/gre_proto.h   29 Oct 2019 07:57:58 -0000
> @@ -0,0 +1,48 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2019 David Gwynne <d...@openbsd.org>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#ifndef _NETINET_GRE_H_
> +#define _NETINET_GRE_H_
> +
> +struct gre_header {
> +     uint16_t        gre_flags;
> +#define GRE_CP                       0x8000  /* Checksum Present */
> +#define GRE_KP                       0x2000  /* Key Present */
> +#define GRE_SP                       0x1000  /* Sequence Present */
> +
> +#define GRE_VERS_MASK                0x0007
> +#define GRE_VERS_0           0x0000
> +#define GRE_VERS_1           0x0001
> +
> +     uint16_t        gre_proto;
> +};
> +
> +struct gre_h_cksum {
> +     uint16_t        gre_cksum;
> +     uint16_t        gre_reserved1;
> +};
> +
> +struct gre_h_key {
> +     uint32_t        gre_key;
> +};
> +
> +struct gre_h_seq {
> +     uint32_t        gre_seq;
> +};
> +
> +#endif /* _NETINET_GRE_H_ */
> Index: sys/netinet/gre_var.h
> ===================================================================
> RCS file: sys/netinet/gre_var.h
> diff -N sys/netinet/gre_var.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ sys/netinet/gre_var.h     29 Oct 2019 07:57:58 -0000
> @@ -0,0 +1,64 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2019 David Gwynne <d...@openbsd.org>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#ifndef _NETINET_GRE_VAR_H_
> +#define _NETINET_GRE_VAR_H_
> +
> +/*
> + * setsockopt(s, IPPROTO_GRE, ...
> + */
> +
> +#define GRE_CKSUM    1       /* bool; enable GRE checksum headers */
> +#define GRE_KEY              2       /* uint32_t; enable and set GRE key */
> +                             /* NULL; disable GRE key header */ 
> +#define GRE_SEQ              3       /* uint32_t; enable and set GRE seq */
> +                             /* NULL; disable GRE seq header */
> +
> +#define GRE_RECVSEQ  4       /* bool; enable reception of seq numbers */
> +#define GRE_SENDSEQ  GRE_RECVSEQ
> +
> +#ifdef _KERNEL
> +int  gre_raw_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
> +         struct mbuf *, struct proc *);
> +int  gre_sysctl(int *, u_int, void *, size_t *, void *, size_t);
> +
> +void gre_init(void);
> +
> +int  gre_attach(struct socket *, int);
> +int  gre_detach(struct socket *);
> +
> +int  gre_ip4_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
> +         struct mbuf *, struct proc *);
> +int  gre_ip4_ctloutput(int, struct socket *, int, int, struct mbuf *);
> +
> +int  gre_ip4_input(struct mbuf **, int *, int, int);
> +
> +struct mbuf *
> +     gre_if4_input(struct mbuf *, int); /* interface glue */
> +
> +#ifdef INET6
> +int  gre_ip6_usrreq(struct socket *, int, struct mbuf *, struct mbuf *,
> +         struct mbuf *, struct proc *);
> +int  gre_ip6_ctloutput(int, struct socket *, int, int, struct mbuf *);
> +int  gre_ip6_input(struct mbuf **, int *, int, int);
> +
> +struct mbuf *
> +     gre_if6_input(struct mbuf *, int); /* interface glue */
> +#endif /* INET6 */
> +#endif /* _KERNEL */
> +#endif /* _NETINET_GRE_VAR_H_ */
> Index: sys/netinet/in_proto.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet/in_proto.c,v
> retrieving revision 1.93
> diff -u -p -r1.93 in_proto.c
> --- sys/netinet/in_proto.c    15 Jul 2019 12:40:42 -0000      1.93
> +++ sys/netinet/in_proto.c    29 Oct 2019 07:57:58 -0000
> @@ -147,8 +147,7 @@
>  
>  #include "gre.h"
>  #if NGRE > 0
> -#include <netinet/ip_gre.h>
> -#include <net/if_gre.h>
> +#include <netinet/gre_var.h>
>  #endif
>  
>  #include "carp.h"
> @@ -346,11 +345,24 @@ const struct protosw inetsw[] = {
>    .pr_domain = &inetdomain,
>    .pr_protocol       = IPPROTO_GRE,
>    .pr_flags  = PR_ATOMIC|PR_ADDR,
> -  .pr_input  = gre_input,
> +  .pr_input  = gre_ip4_input,
>    .pr_ctloutput      = rip_ctloutput,
> -  .pr_usrreq = gre_usrreq,
> +  .pr_usrreq = gre_raw_usrreq,
>    .pr_attach = rip_attach,
>    .pr_detach = rip_detach,
> +  .pr_sysctl = gre_sysctl
> +},
> +{
> +  .pr_type   = SOCK_DGRAM,
> +  .pr_domain = &inetdomain,
> +  .pr_protocol       = IPPROTO_GRE,
> +  .pr_flags  = PR_ATOMIC|PR_ADDR,
> +  .pr_input  = gre_ip4_input,
> +  .pr_ctloutput      = gre_ip4_ctloutput,
> +  .pr_usrreq = gre_ip4_usrreq,
> +  .pr_attach = gre_attach,
> +  .pr_detach = gre_detach,
> +  .pr_init   = gre_init,
>    .pr_sysctl = gre_sysctl
>  },
>  #endif /* NGRE > 0 */
> Index: sys/netinet/ip_gre.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet/ip_gre.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 ip_gre.c
> --- sys/netinet/ip_gre.c      7 Feb 2018 22:30:59 -0000       1.71
> +++ sys/netinet/ip_gre.c      29 Oct 2019 07:57:58 -0000
> @@ -1,7 +1,23 @@
> -/*      $OpenBSD: ip_gre.c,v 1.71 2018/02/07 22:30:59 dlg Exp $ */
> +/*   $OpenBSD: ip_gre.c,v 1.71 2018/02/07 22:30:59 dlg Exp $ */
>  /*   $NetBSD: ip_gre.c,v 1.9 1999/10/25 19:18:11 drochner Exp $ */
>  
>  /*
> + * Copyright (c) 2019 David Gwynne <d...@openbsd.org>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +/*
>   * Copyright (c) 1998 The NetBSD Foundation, Inc.
>   * All rights reserved.
>   *
> @@ -30,16 +46,6 @@
>   * POSSIBILITY OF SUCH DAMAGE.
>   */
>  
> -/*
> - * decapsulate tunneled packets and send them on
> - * output half is in net/if_gre.[ch]
> - * This currently handles IPPROTO_GRE, IPPROTO_MOBILE
> - */
> -
> -
> -#include "gre.h"
> -#if NGRE > 0
> -
>  #include <sys/param.h>
>  #include <sys/systm.h>
>  #include <sys/mbuf.h>
> @@ -47,24 +53,40 @@
>  #include <sys/socket.h>
>  #include <sys/socketvar.h>
>  #include <sys/sysctl.h>
> +#include <sys/proc.h>
> +#include <sys/atomic.h>
> +#include <sys/pool.h>
> +#include <sys/tree.h>
> +
> +#include <sys/domain.h>
>  
>  #include <net/if.h>
> +#include <net/if_var.h>
>  #include <net/route.h>
>  
>  #include <netinet/in.h>
> +#include <netinet/in_var.h>
>  #include <netinet/ip.h>
>  #include <netinet/ip_var.h>
>  #include <netinet/in_pcb.h>
>  
> +#include <netinet/gre_var.h>
> +#include <netinet/gre_proto.h>
> +
>  #ifdef PIPEX
>  #include <net/pipex.h>
>  #endif
>  
> +
> +/*
> + * socket({AF_INET,AF_INET6}, SOCK_RAW, IPPROTO_GRE);
> + */
> +
>  int
> -gre_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
> +gre_raw_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
>      struct mbuf *control, struct proc *p)
>  {
> -#ifdef  PIPEX 
> +#ifdef  PIPEX
>       struct inpcb *inp = sotoinpcb(so);
>  
>       if (inp != NULL && inp->inp_pipex && req == PRU_SEND) {
> @@ -92,4 +114,1817 @@ gre_usrreq(struct socket *so, int req, s
>  #endif
>       return rip_usrreq(so, req, m, nam, control, p);
>  }
> -#endif /* if NGRE > 0 */
> +
> +/*
> + * support socket({AF_INET,AF_INET6}, SOCK_DGRAM, IPPROTO_GRE);
> + *
> + * GRE datagram sockets provide support for GRE version 0 packets
> + * in the kernel.
> + */
> +
> +#define GRE_VALID_MASK       (GRE_VERS_MASK | GRE_CP | GRE_KP | GRE_SP)
> +
> +struct gre_pcb_key;
> +
> +struct gre_pcb {
> +     struct inpcb             gpcb_inpcb;
> +     unsigned int             gpcb_pflags;   /* pcb flags */
> +#define GREPCB_RECVSEQ                       (1 << 0)
> +     uint16_t                 gpcb_flags;    /* network byteorder */
> +     uint32_t                 gpcb_key;      /* network byteorder */
> +     uint32_t                 gpcb_seq;      /* host byteorder */
> +
> +     struct gre_pcb_key      *gpcb_pcb_key;
> +     int                      gpcb_reuse;
> +     uint8_t                  gpcb_ttl;
> +};
> +TAILQ_HEAD(gre_pcb_list, inpcb);
> +
> +static inline struct gre_pcb *
> +inp_gpcb(struct inpcb *inp)
> +{
> +     return ((struct gre_pcb *)inp);
> +}
> +
> +#define gpcb_inp(_gpcb) (&(_gpcb)->gpcb_inpcb)
> +
> +static inline struct socket *
> +gpcb_so(struct gre_pcb *gpcb)
> +{
> +     return (gpcb_inp(gpcb)->inp_socket);
> +}
> +
> +struct gre_pcb_key {
> +     union inpaddru          gk_laddr;
> +#define gk_laddr4    gk_laddr.iau_a4u.inaddr
> +#define gk_laddr6    gk_laddr.iau_addr6
> +     union inpaddru          gk_faddr;
> +#define gk_faddr4    gk_faddr.iau_a4u.inaddr
> +#define gk_faddr6    gk_faddr.iau_addr6
> +
> +     uint16_t                gk_flags;       /* network byteorder */
> +     uint16_t                gk_proto;       /* network byteorder */
> +     uint32_t                gk_key;         /* network byteorder */
> +
> +     unsigned int            gk_rtableid;
> +     sa_family_t             gk_family;
> +
> +     RBT_ENTRY(gre_pcb_key)  gk_entry;
> +     struct gre_pcb_list     gk_pcbs;
> +     unsigned int            gk_state;
> +#define GRE_S_DISCONNECTED           0
> +#define GRE_S_WILDCARD                       1
> +#define GRE_S_BOUND                  2
> +#define GRE_S_CONNECTED                      3
> +};
> +
> +RBT_HEAD(gre_tree_wildcards, gre_pcb_key);
> +RBT_HEAD(gre_tree_bound, gre_pcb_key);
> +RBT_HEAD(gre_tree_connected, gre_pcb_key);
> +
> +static int   gre_pcb_key_cmp_wildcard(const struct gre_pcb_key *,
> +                 const struct gre_pcb_key *);
> +static int   gre_pcb_key_cmp_bound(const struct gre_pcb_key *,
> +                 const struct gre_pcb_key *);
> +
> +RBT_PROTOTYPE(gre_tree_wildcards, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_wildcard);
> +RBT_PROTOTYPE(gre_tree_bound, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_bound);
> +RBT_PROTOTYPE(gre_tree_connected, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_connected);
> +
> +struct gre_ops {
> +     int     (*op_nametosa)(struct inpcb *, struct mbuf *,
> +                 struct sockaddr **);
> +     int     (*op_is_wildcard)(struct sockaddr *);
> +     int     (*op_is_multicast)(struct sockaddr *);
> +     int     (*op_is_broadcast)(unsigned int, struct sockaddr *);
> +     int     (*op_is_local)(unsigned int, struct sockaddr *);
> +
> +     uint16_t (*op_proto)(struct sockaddr *);
> +     void    (*op_addr)(union inpaddru *, struct sockaddr *);
> +
> +     int     (*op_selsrc)(struct inpcb *, struct sockaddr *,
> +                 union inpaddru *, void *);
> +     int     (*op_control)(struct socket *, u_long, caddr_t, struct ifnet *);
> +     void    (*op_getsockname)(struct inpcb *, struct mbuf *);
> +     void    (*op_getpeername)(struct inpcb *, struct mbuf *);
> +     int     (*op_ctloutput)(int, struct socket *, int, int, struct mbuf *);
> +
> +     void    (*op_sbappend)(struct gre_pcb *, struct mbuf *, int, uint32_t);
> +
> +     /* PRU_SEND is split into two parts, with gre_output in the middle: */
> +
> +     /* 1: the "pre" phase, so ipv6 can do its stupid pktops stuff */
> +     int     (*op_send)(const struct gre_ops *, struct gre_pcb *,
> +                 struct mbuf *, struct mbuf *, struct mbuf *);
> +     /* 2: actually doing the ip encap and output */
> +     int     (*op_output)(struct gre_pcb *, const struct gre_pcb_key *,
> +                 struct mbuf *, void *);
> +
> +     int * const defttl;
> +};
> +struct gre_tree_wildcards gre_wildcards = RBT_INITIALIZER();
> +struct gre_tree_bound gre_bound = RBT_INITIALIZER();
> +struct gre_tree_connected gre_connected = RBT_INITIALIZER();
> +
> +struct pool  gre_pcb_key_pool;
> +struct pool  gre_pcb_pool;
> +
> +static struct mbuf *
> +              gre_ip_input(const struct gre_ops *, struct mbuf *, int,
> +                 uint8_t, struct gre_pcb_key *);
> +
> +static int    gre_usrreq(const struct gre_ops *, struct socket *, int,
> +                 struct mbuf *, struct mbuf *, struct mbuf *,
> +                 struct proc *);
> +static int    gre_ctloutput(const struct gre_ops *, int, struct socket *,
> +                 int, int, struct mbuf *);
> +
> +static int    gre_send(const struct gre_ops *, struct gre_pcb *,
> +                 struct mbuf *, struct mbuf *, struct mbuf *);
> +static int    gre_output(const struct gre_ops *, struct gre_pcb *,
> +                 struct mbuf *, struct mbuf *, struct mbuf *, void *);
> +static int    gre_disconnect(struct gre_pcb *);
> +
> +#define GRE_OPT_EINVAL       ((void *)-1)
> +static void  *gre_opt(struct mbuf *, int, int, socklen_t);
> +
> +unsigned int gre_sendspace = 9216; /* XXX sysctl? */
> +unsigned int gre_recvspace = (40 * 1024);
> +
> +static struct gre_pcb_key *
> +gre_pcb_key_get(const struct gre_pcb *gpcb)
> +{
> +     struct gre_pcb_key *gk;
> +
> +     gk = pool_get(&gre_pcb_key_pool, PR_NOWAIT|PR_ZERO);
> +     if (gk == NULL)
> +             return (NULL);
> +
> +     gk->gk_flags = gpcb->gpcb_flags;
> +     gk->gk_key = gpcb->gpcb_key;
> +
> +     TAILQ_INIT(&gk->gk_pcbs);
> +
> +     return (gk);
> +}
> +
> +static void
> +gre_pcb_key_put(struct gre_pcb_key *gk)
> +{
> +     pool_put(&gre_pcb_key_pool, gk);
> +}
> +
> +static struct gre_pcb_key *
> +gre_pcb_key_insert(unsigned int state, struct gre_pcb_key *gk)
> +{
> +     struct gre_pcb_key *ogk;
> +
> +     gk->gk_state = state;
> +
> +     switch (state) {
> +     case GRE_S_WILDCARD:
> +             ogk = RBT_INSERT(gre_tree_wildcards, &gre_wildcards, gk);
> +             break;
> +     case GRE_S_BOUND:
> +             ogk = RBT_INSERT(gre_tree_bound, &gre_bound, gk);
> +             break;
> +     case GRE_S_CONNECTED:
> +             ogk = RBT_INSERT(gre_tree_connected, &gre_connected, gk);
> +             break;
> +     default:
> +             panic("%s unexpected state %u", __func__, state);
> +     }
> +
> +     return (ogk);
> +}
> +
> +static inline int
> +gre_pcb_empty(struct gre_pcb_list *l)
> +{
> +     return (TAILQ_EMPTY(l));
> +}
> +
> +static inline void
> +gre_pcb_insert(struct gre_pcb_list *l, struct gre_pcb *gpcb)
> +{
> +     TAILQ_INSERT_TAIL(l, gpcb_inp(gpcb), inp_queue);
> +}
> +
> +static inline void
> +gre_pcb_remove(struct gre_pcb_list *l, struct gre_pcb *gpcb)
> +{
> +     TAILQ_REMOVE(l, gpcb_inp(gpcb), inp_queue);
> +}
> +
> +static inline struct gre_pcb *
> +gre_pcb_first(struct gre_pcb_list *l)
> +{
> +     return (inp_gpcb(TAILQ_FIRST(l)));
> +}
> +
> +static inline struct gre_pcb *
> +gre_pcb_next(struct gre_pcb *gpcb)
> +{
> +     return (inp_gpcb(TAILQ_NEXT(gpcb_inp(gpcb), inp_queue)));
> +}
> +
> +/*
> + * INET4
> + */
> +
> +static int   gre_ip4_nametosa(struct inpcb *, struct mbuf *,
> +                 struct sockaddr **);
> +static int   gre_ip4_is_wildcard(struct sockaddr *);
> +static int   gre_ip4_is_multicast(struct sockaddr *);
> +static int   gre_ip4_is_broadcast(unsigned int, struct sockaddr *);
> +static int   gre_ip4_is_local(unsigned int, struct sockaddr *);
> +static uint16_t      gre_ip4_proto(struct sockaddr *);
> +static void  gre_ip4_addr(union inpaddru *, struct sockaddr *);
> +static int   gre_ip4_selsrc(struct inpcb *, struct sockaddr *,
> +                 union inpaddru *, void *);
> +static void  gre_ip4_sbappend(struct gre_pcb *, struct mbuf *, int,
> +                 uint32_t);
> +static int   gre_ip4_send(const struct gre_ops *, struct gre_pcb *,
> +                 struct mbuf *, struct mbuf *, struct mbuf *);
> +static int   gre_ip4_output(struct gre_pcb *, const struct gre_pcb_key *,
> +                 struct mbuf *, void *);
> +
> +static const struct gre_ops gre_ip4_ops = {
> +     .op_nametosa            = gre_ip4_nametosa,
> +     .op_is_wildcard         = gre_ip4_is_wildcard,
> +     .op_is_multicast        = gre_ip4_is_multicast,
> +     .op_is_broadcast        = gre_ip4_is_broadcast,
> +     .op_is_local            = gre_ip4_is_local,
> +
> +     .op_proto               = gre_ip4_proto,
> +     .op_addr                = gre_ip4_addr,
> +
> +     .op_selsrc              = gre_ip4_selsrc,
> +     .op_control             = in_control,
> +     .op_getsockname         = in_setsockaddr,
> +     .op_getpeername         = in_setpeeraddr,
> +     .op_ctloutput           = ip_ctloutput,
> +
> +     .op_sbappend            = gre_ip4_sbappend,
> +     .op_send                = gre_ip4_send,
> +     .op_output              = gre_ip4_output,
> +
> +     .defttl                 = &ip_defttl,
> +};
> +
> +
> +static int
> +gre_ip4_nametosa(struct inpcb *inp, struct mbuf *addr, struct sockaddr **sa)
> +{
> +     return (in_nam2sin(addr, (struct sockaddr_in **)sa));
> +}
> +
> +static int
> +gre_ip4_is_wildcard(struct sockaddr *sa)
> +{
> +     struct sockaddr_in *sin = satosin(sa);
> +     return (sin->sin_addr.s_addr == INADDR_ANY);
> +}
> +
> +static int
> +gre_ip4_is_multicast(struct sockaddr *sa)
> +{
> +     struct sockaddr_in *sin = satosin(sa);
> +     return (IN_MULTICAST(sin->sin_addr.s_addr));
> +}
> +
> +static int
> +gre_ip4_is_broadcast(unsigned int rtableid, struct sockaddr *sa)
> +{
> +     struct sockaddr_in *sin = satosin(sa);
> +
> +     return ((sin->sin_addr.s_addr == INADDR_BROADCAST) ||
> +         in_broadcast(sin->sin_addr, rtableid));
> +}
> +
> +static int
> +gre_ip4_is_local(unsigned int rtableid, struct sockaddr *sa)
> +{
> +     struct sockaddr_in *sin = satosin(sa);
> +     /* cope with ifa_ifwithaddr using memcmp */
> +     struct sockaddr_in needle = {
> +             .sin_len = sin->sin_len,
> +             .sin_family = sin->sin_family,
> +             .sin_addr = sin->sin_addr,
> +     };
> +     struct ifaddr *ifa;
> +
> +     ifa = ifa_ifwithaddr(sintosa(&needle), rtableid);
> +     if (ifa == NULL)
> +             return (EADDRNOTAVAIL);
> +
> +     return (0);
> +}
> +
> +static uint16_t
> +gre_ip4_proto(struct sockaddr *sa)
> +{
> +     struct sockaddr_in *sin = satosin(sa);
> +     return (sin->sin_port);
> +}
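
For anyone reading along: gre_ip4_proto() above takes sin_port as is, so the GRE protocol travels in network byte order, the same way a UDP port would. A rough userland sketch of filling in such an address follows. gre_fill_sin is a made-up helper name, not part of the diff, and 0x6558 in the usage note is only an example protocol value.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not in the diff): fill a sockaddr_in for a
 * GRE datagram socket. The GRE protocol, an ethertype, goes where
 * UDP would put the port number.
 * Returns 0 on success, -1 if the address string does not parse.
 */
static int
gre_fill_sin(struct sockaddr_in *sin, const char *addr, uint16_t proto)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(proto);	/* network byte order, like a port */
	if (inet_pton(AF_INET, addr, &sin->sin_addr) != 1)
		return (-1);
	return (0);
}
```

The result would then be handed to bind(), connect() or sendto() on the SOCK_DGRAM/IPPROTO_GRE socket, e.g. gre_fill_sin(&sin, "192.0.2.1", 0x6558).
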
> +
> +static void
> +gre_ip4_addr(union inpaddru *addr, struct sockaddr *sa)
> +{
> +     struct sockaddr_in *sin = satosin(sa);
> +     addr->iau_a4u.inaddr = sin->sin_addr;
> +}
> +
> +static int
> +gre_ip4_selsrc(struct inpcb *inp, struct sockaddr *sa, union inpaddru *laddr,
> +    void *opts)
> +{
> +     struct in_addr *insrc;
> +     int error;
> +
> +     insrc = gre_opt(opts, IPPROTO_IP, IP_SENDSRCADDR, sizeof(*insrc));
> +     if (insrc == GRE_OPT_EINVAL)
> +             return (EINVAL);
> +
> +     if (insrc == NULL) {
> +             error = in_pcbselsrc(&insrc, satosin(sa), inp);
> +             if (error != 0)
> +                     return (error);
> +     } else {
> +             struct sockaddr_in sin = {
> +                     .sin_len = sizeof(sin),
> +                     .sin_family = AF_INET,
> +                     .sin_addr = *insrc,
> +             };
> +
> +             /* XXX sigh. */
> +             error = in_pcbaddrisavail(inp, &sin, 0, NULL);
> +             if (error != 0)
> +                     return (error);
> +     }
> +
> +     laddr->iau_a4u.inaddr = *insrc;
> +
> +     return (0);
> +}
> +
> +static int
> +gre_ip4_send(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control)
> +{
> +     return (gre_output(ops, gpcb, m, addr, control, control));
> +}
> +
> +static int
> +gre_ip4_output(struct gre_pcb *gpcb, const struct gre_pcb_key *gk,
> +    struct mbuf *m, void *opts)
> +{
> +     struct inpcb *inp = gpcb_inp(gpcb);
> +     struct ip *ip;
> +     int error;
> +     uint32_t ipsecflowinfo = 0;
> +
> +     m = m_prepend(m, sizeof(*ip), M_DONTWAIT);
> +     if (m == NULL) {
> +             error = ENOBUFS;
> +             goto dropped;
> +     }
> +     if (m->m_pkthdr.len > IP_MAXPACKET) {
> +             error = EMSGSIZE;
> +             goto drop;
> +     }
> +
> +#ifdef IPSEC
> +     if (ISSET(inp->inp_flags, INP_IPSECFLOWINFO)) {
> +             uint32_t *p = gre_opt(opts, IPPROTO_IP, IP_IPSECFLOWINFO,
> +                 sizeof(*p));
> +             if (p == GRE_OPT_EINVAL) {
> +                     error = EINVAL;
> +                     goto drop;
> +             }
> +
> +             if (p != NULL)
> +                     ipsecflowinfo = *p;
> +     }
> +#endif /* IPSEC */
> +
> +     ip = mtod(m, struct ip *);
> +     ip->ip_v = IPVERSION;
> +     ip->ip_hl = sizeof(*ip) >> 2;
> +     ip->ip_off = 0; /* XXX nodf? */
> +     ip->ip_tos = 0; /* XXX */
> +     ip->ip_len = htons(m->m_pkthdr.len);
> +     ip->ip_ttl = gpcb->gpcb_ttl;
> +     ip->ip_p = IPPROTO_GRE;
> +     ip->ip_src = gk->gk_laddr4;
> +     ip->ip_dst = gk->gk_faddr4;
> +
> +     error = ip_output(m, inp->inp_options, &inp->inp_route,
> +         gpcb_so(gpcb)->so_options & SO_BROADCAST, inp->inp_moptions, inp,
> +         ipsecflowinfo);
> +
> +     return (error);
> +
> +drop:
> +     m_freem(m);
> +dropped:
> +     return (error);
> +}
> +
> +static void *
> +gre_opt(struct mbuf *control, int level, int type, socklen_t len)
> +{
> +     u_int clen;
> +     struct cmsghdr *cm;
> +     caddr_t cmsgs;
> +     size_t mlen;
> +
> +     if (control == NULL)
> +             return (NULL);
> +
> +     if (control->m_next != NULL)
> +             return (GRE_OPT_EINVAL);
> +
> +     mlen = CMSG_LEN(len);
> +
> +     clen = control->m_len;
> +     cmsgs = mtod(control, caddr_t);
> +     do {
> +             if (clen < CMSG_LEN(0))
> +                     return (GRE_OPT_EINVAL);
> +
> +             cm = (struct cmsghdr *)cmsgs;
> +             if (cm->cmsg_len < CMSG_LEN(0) ||
> +                 CMSG_ALIGN(cm->cmsg_len) > clen)
> +                     return (GRE_OPT_EINVAL);
> +
> +             if (cm->cmsg_level == level &&
> +                 cm->cmsg_type == type &&
> +                 cm->cmsg_len == mlen)
> +                     return (CMSG_DATA(cm));
> +
> +             clen -= CMSG_ALIGN(cm->cmsg_len);
> +             cmsgs += CMSG_ALIGN(cm->cmsg_len);
> +     } while (clen);
> +
> +     return (NULL);
> +}
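
gre_opt() above walks the control mbuf by hand and only returns a message when level, type and length all match. The same lookup in userland, written against the standard CMSG_FIRSTHDR/CMSG_NXTHDR macros, would look roughly like this. It is only a sketch of the matching rule: find_cmsg is a hypothetical name, and unlike gre_opt() it has no separate "malformed" return value.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userland analogue of gre_opt(): return a pointer to
 * the payload of the first control message whose level, type and
 * exact payload length match, or NULL if there is none.
 */
static void *
find_cmsg(struct msghdr *msg, int level, int type, size_t datalen)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm != NULL;
	    cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == level &&
		    cm->cmsg_type == type &&
		    cm->cmsg_len == CMSG_LEN(datalen))
			return (CMSG_DATA(cm));
	}

	return (NULL);
}
```

A message of the right level and type but the wrong size is skipped rather than treated as an error, which matches the cmsg_len == mlen test in the diff.
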
> +
> +static int
> +gre_send(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control)
> +{
> +     int error;
> +
> +     error = (*ops->op_send)(ops, gpcb, m, addr, control);
> +
> +     m_freem(control);
> +
> +     return (error);
> +}
> +
> +static int
> +gre_output(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control, void *opts)
> +{
> +     const struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
> +     struct gre_pcb_key key;
> +     struct gre_header *gh;
> +     int error;
> +
> +     if (addr != NULL) {
> +             struct gre_pcb_key *ogk;
> +             struct inpcb *inp = gpcb_inp(gpcb);
> +             struct sockaddr *sa;
> +             int state = GRE_S_DISCONNECTED;
> +
> +             if (gk != NULL)
> +                     state = gk->gk_state;
> +
> +             if (state == GRE_S_CONNECTED) {
> +                     error = EISCONN;
> +                     goto drop;
> +             }
> +
> +             error = (*ops->op_nametosa)(inp, addr, &sa);
> +             if (error != 0)
> +                     goto drop;
> +
> +             key.gk_family = sa->sa_family;
> +             key.gk_proto = (*ops->op_proto)(sa);
> +
> +             if (state != GRE_S_DISCONNECTED) {
> +                     KASSERT(key.gk_family == gk->gk_family);
> +                     if (key.gk_proto != gk->gk_proto) {
> +                             error = EADDRNOTAVAIL;
> +                             goto drop;
> +                     }
> +             }
> +             if (state == GRE_S_BOUND) {
> +                     key.gk_laddr = gk->gk_laddr;
> +             } else {
> +                     error = (*ops->op_selsrc)(inp, sa, &key.gk_laddr,
> +                         opts);
> +                     if (error != 0)
> +                             goto drop;
> +             }
> +
> +             (*ops->op_addr)(&key.gk_faddr, sa);
> +
> +             key.gk_rtableid = inp->inp_rtableid;
> +             key.gk_flags = gpcb->gpcb_flags;
> +             key.gk_key = gpcb->gpcb_key;
> +
> +             ogk = RBT_FIND(gre_tree_connected, &gre_connected, &key);
> +             if (ogk != NULL) {
> +                     struct gre_pcb *ogpcb;
> +                     int reuse = gpcb->gpcb_reuse;
> +
> +                     ogpcb = gre_pcb_first(&ogk->gk_pcbs);
> +                     if (ogpcb != NULL) {
> +                             struct socket *oso = gpcb_so(ogpcb);
> +                             if (!ISSET(reuse, oso->so_options)) {
> +                                     error = EADDRINUSE;
> +                                     goto drop;
> +                             }
> +                     }
> +             }
> +
> +             gk = &key;
> +     } else {
> +             if (gk == NULL || gk->gk_state != GRE_S_CONNECTED) {
> +                     error = ENOTCONN;
> +                     goto drop;
> +             }
> +     }
> +
> +     if (ISSET(gk->gk_flags, htons(GRE_SP))) {
> +             struct gre_h_seq *gsh;
> +             uint32_t *seqno;
> +
> +             seqno = gre_opt(control, IPPROTO_GRE, GRE_SENDSEQ,
> +                 sizeof(*seqno));
> +             if (seqno == GRE_OPT_EINVAL) {
> +                     error = EINVAL;
> +                     goto drop;
> +             }
> +
> +             m = m_prepend(m, sizeof(*gsh), M_DONTWAIT);
> +             if (m == NULL)
> +                     return (ENOBUFS);
> +
> +             gsh = mtod(m, struct gre_h_seq *);
> +             htobem32(&gsh->gre_seq, seqno != NULL ?
> +                 (gpcb->gpcb_seq = *seqno) : /* keep track of new start */
> +                 atomic_inc_int_nv(&gpcb->gpcb_seq));
> +     }
> +
> +     if (ISSET(gk->gk_flags, htons(GRE_KP))) {
> +             struct gre_h_key *gkh;
> +
> +             m = m_prepend(m, sizeof(*gkh), M_DONTWAIT);
> +             if (m == NULL)
> +                     return (ENOBUFS);
> +
> +             gkh = mtod(m, struct gre_h_key *);
> +             gkh->gre_key = gk->gk_key;
> +     }
> +
> +     if (ISSET(gk->gk_flags, htons(GRE_CP))) {
> +             struct gre_h_cksum *gch;
> +
> +             m = m_prepend(m, sizeof(*gch), M_DONTWAIT);
> +             if (m == NULL)
> +                     return (ENOBUFS);
> +
> +             gch = mtod(m, struct gre_h_cksum *);
> +             gch->gre_cksum = 0; /* XXX need to checksum */
> +             gch->gre_reserved1 = 0;
> +     }
> +
> +     m = m_prepend(m, sizeof(*gh), M_DONTWAIT);
> +     if (m == NULL)
> +             return (ENOBUFS);
> +
> +     gh = mtod(m, struct gre_header *);
> +     gh->gre_flags = gk->gk_flags;
> +     gh->gre_proto = gk->gk_proto;
> +
> +     KASSERT(ISSET(m->m_flags, M_PKTHDR));
> +
> +     m->m_pkthdr.ph_rtableid = gpcb_inp(gpcb)->inp_rtableid;
> +
> +     return ((*ops->op_output)(gpcb, gk, m, opts));
> +
> +drop:
> +     m_freem(m);
> +     return (error);
> +}
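
Since gre_output() prepends the optional headers in reverse (sequence, then key, then checksum, then the base header), the result on the wire is base header, checksum, key, sequence, as in RFC 2890. A flat-buffer sketch of that layout is below. The ex_ names are invented for illustration and are not in the diff, the flag values are the RFC 2784/2890 bit positions, and key and seq are taken in host byte order here, whereas the pcb keeps the key in network byte order.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag bits from RFC 2784/2890; the EX_ names are made up here. */
#define EX_GRE_CP	0x8000	/* checksum present */
#define EX_GRE_KP	0x2000	/* key present */
#define EX_GRE_SP	0x1000	/* sequence number present */

/*
 * Hypothetical helper: write a GRE version 0 header into buf.
 * Returns the header length, or 0 if buf is too small.
 */
static size_t
ex_gre_build(uint8_t *buf, size_t buflen, uint16_t flags, uint16_t proto,
    uint32_t key, uint32_t seq)
{
	size_t need = 4, off = 0;
	uint16_t v16;
	uint32_t v32;

	if (flags & EX_GRE_CP)
		need += 4;
	if (flags & EX_GRE_KP)
		need += 4;
	if (flags & EX_GRE_SP)
		need += 4;
	if (buflen < need)
		return (0);

	/* base header: flags/version, then protocol type */
	v16 = htons(flags);
	memcpy(buf + off, &v16, sizeof(v16));
	off += sizeof(v16);
	v16 = htons(proto);
	memcpy(buf + off, &v16, sizeof(v16));
	off += sizeof(v16);

	if (flags & EX_GRE_CP) {
		/* checksum + reserved1, left as zero like the XXX above */
		memset(buf + off, 0, 4);
		off += 4;
	}
	if (flags & EX_GRE_KP) {
		v32 = htonl(key);
		memcpy(buf + off, &v32, sizeof(v32));
		off += sizeof(v32);
	}
	if (flags & EX_GRE_SP) {
		v32 = htonl(seq);
		memcpy(buf + off, &v32, sizeof(v32));
		off += sizeof(v32);
	}

	return (off);
}
```

With key and sequence enabled the header comes to 12 bytes; the payload would follow immediately after.
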
> +
> +static void
> +gre_sbappend(struct gre_pcb *gpcb, struct sockaddr *sa, struct mbuf *m,
> +    struct mbuf *opts, int hlen, uint32_t seqno)
> +{
> +     struct socket *so = gpcb_so(gpcb);
> +
> +     if (ISSET(gpcb->gpcb_pflags, GREPCB_RECVSEQ)) {
> +             struct mbuf *opt = sbcreatecontrol(&seqno, sizeof(seqno),
> +                 GRE_RECVSEQ, IPPROTO_GRE);
> +             if (opt != NULL) {
> +                     opt->m_next = opts;
> +                     opts = opt;
> +             }
> +     }
> +
> +     m_adj(m, hlen);
> +     if (sbappendaddr(so, &so->so_rcv, sa, m, opts) == 0) {
> +             m_freem(m);
> +             m_freem(opts);
> +             return;
> +     }
> +
> +     sorwakeup(so);
> +}
> +
> +int
> +gre_ip4_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *addr,
> +    struct mbuf *control, struct proc *p)
> +{
> +     return (gre_usrreq(&gre_ip4_ops, so, req, m, addr, control, p));
> +}
> +
> +int
> +gre_ip4_ctloutput(int op, struct socket *so, int level, int optname,
> +    struct mbuf *m)
> +{
> +     return (gre_ctloutput(&gre_ip4_ops, op, so, level, optname, m));
> +}
> +
> +static void
> +gre_ip4_sbappend(struct gre_pcb *gpcb, struct mbuf *m, int hlen, uint32_t seqno)
> +{
> +     struct inpcb *inp = gpcb_inp(gpcb);
> +     struct socket *so = gpcb_so(gpcb);
> +     struct mbuf *opts = NULL;
> +     struct sockaddr_in sin = {
> +             .sin_len = sizeof(sin),
> +             .sin_family = AF_INET,
> +             .sin_port = inp->inp_lport,
> +             .sin_addr = mtod(m, struct ip *)->ip_src,
> +     };
> +
> +     if (ISSET(inp->inp_flags, INP_CONTROLOPTS) ||
> +         ISSET(so->so_options, SO_TIMESTAMP))
> +             ip_savecontrol(inp, &opts, mtod(m, struct ip *), m);
> +
> +     if (ISSET(inp->inp_flags, INP_RECVDSTPORT)) {
> +             struct mbuf *opt;
> +
> +             opt = sbcreatecontrol(&inp->inp_lport, sizeof(inp->inp_lport),
> +                 IP_RECVDSTPORT, IPPROTO_IP);
> +             if (opt != NULL) {
> +                     opt->m_next = opts;
> +                     opts = opt;
> +             }
> +     }
> +
> +     gre_sbappend(gpcb, sintosa(&sin), m, opts, hlen, seqno);
> +}
> +
> +int
> +gre_ip4_input(struct mbuf **mp, int *offp, int proto, int af)
> +{
> +     struct mbuf *m = *mp;
> +     struct gre_pcb_key gk;
> +     struct ip *ip;
> +
> +     m = gre_if4_input(m, *offp);
> +     if (m == NULL)
> +             return (IPPROTO_DONE);
> +
> +     ip = mtod(m, struct ip *);
> +
> +     gk.gk_family = AF_INET;
> +     gk.gk_laddr4 = ip->ip_dst;
> +     gk.gk_faddr4 = ip->ip_src;
> +
> +     m = gre_ip_input(&gre_ip4_ops, m, *offp, ip->ip_ttl, &gk);
> +     if (m == NULL)
> +             return (IPPROTO_DONE);
> +
> +     *mp = m;
> +     return (rip_input(mp, offp, proto, af));
> +}
> +
> +#if INET6
> +#include <netinet6/ip6_var.h>
> +#include <netinet6/in6_var.h>
> +
> +static int   gre_ip6_nametosa(struct inpcb *, struct mbuf *,
> +                 struct sockaddr **);
> +static int   gre_ip6_is_wildcard(struct sockaddr *);
> +static int   gre_ip6_is_multicast(struct sockaddr *);
> +static int   gre_ip6_is_broadcast(unsigned int, struct sockaddr *);
> +static int   gre_ip6_is_local(unsigned int, struct sockaddr *);
> +static uint16_t      gre_ip6_proto(struct sockaddr *);
> +static void  gre_ip6_addr(union inpaddru *, struct sockaddr *);
> +static int   gre_ip6_selsrc(struct inpcb *, struct sockaddr *,
> +                 union inpaddru *, void *);
> +static void  gre_ip6_sbappend(struct gre_pcb *, struct mbuf *, int,
> +                 uint32_t);
> +static int   gre_ip6_send(const struct gre_ops *, struct gre_pcb *,
> +                 struct mbuf *, struct mbuf *, struct mbuf *);
> +static int   gre_ip6_output(struct gre_pcb *, const struct gre_pcb_key *,
> +                 struct mbuf *, void *);
> +
> +static const struct gre_ops gre_ip6_ops = {
> +     .op_nametosa            = gre_ip6_nametosa,
> +     .op_is_wildcard         = gre_ip6_is_wildcard,
> +     .op_is_multicast        = gre_ip6_is_multicast,
> +     .op_is_broadcast        = gre_ip6_is_broadcast,
> +     .op_is_local            = gre_ip6_is_local,
> +
> +     .op_proto               = gre_ip6_proto,
> +     .op_addr                = gre_ip6_addr,
> +
> +     .op_selsrc              = gre_ip6_selsrc,
> +     .op_control             = in6_control,
> +     .op_getsockname         = in6_setsockaddr,
> +     .op_getpeername         = in6_setpeeraddr,
> +     .op_ctloutput           = ip6_ctloutput,
> +     .op_sbappend            = gre_ip6_sbappend,
> +     .op_send                = gre_ip6_send,
> +     .op_output              = gre_ip6_output,
> +
> +     .defttl                 = &ip6_defhlim,
> +};
> +
> +static int
> +gre_ip6_nametosa(struct inpcb *inp, struct mbuf *addr, struct sockaddr **sa)
> +{
> +     struct sockaddr_in6 *sin6;
> +     int error;
> +
> +     error = in6_nam2sin6(addr, &sin6);
> +     if (error != 0)
> +             return (error);
> +
> +     /* reject IPv4 mapped addresses, we have no support for them */
> +     if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
> +             return (EADDRNOTAVAIL);
> +
> +     if (in6_embedscope(&sin6->sin6_addr, sin6, inp) != 0)
> +             return (EINVAL);
> +
> +     *sa = sin6tosa(sin6);
> +     return (0);
> +}
> +
> +static int
> +gre_ip6_is_wildcard(struct sockaddr *sa)
> +{
> +     struct sockaddr_in6 *sin6 = satosin6(sa);
> +     return (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr));
> +}
> +
> +static int
> +gre_ip6_is_multicast(struct sockaddr *sa)
> +{
> +     struct sockaddr_in6 *sin6 = satosin6(sa);
> +     return (IN6_IS_ADDR_MULTICAST(&sin6->sin6_addr));
> +}
> +
> +static int
> +gre_ip6_is_broadcast(unsigned int rtableid, struct sockaddr *sa)
> +{
> +     return (0);
> +}
> +
> +static int
> +gre_ip6_is_local(unsigned int rtableid, struct sockaddr *sa)
> +{
> +     struct sockaddr_in6 *sin6 = satosin6(sa);
> +     struct sockaddr_in6 needle = {
> +             .sin6_len = sin6->sin6_len,
> +             .sin6_family = sin6->sin6_family,
> +             .sin6_addr = sin6->sin6_addr,
> +     };
> +     struct ifaddr *ifa;
> +
> +     ifa = ifa_ifwithaddr(sin6tosa(&needle), rtableid);
> +     if (ifa == NULL)
> +             return (EADDRNOTAVAIL);
> +
> +     /*
> +      * bind to an anycast address might accidentally
> +      * cause sending a packet with an anycast source
> +      * address, so we forbid it.
> +      *
> +      * We should allow to bind to a deprecated address,
> +      * since the application dare to use it.
> +      * But, can we assume that they are careful enough
> +      * to check if the address is deprecated or not?
> +      * Maybe, as a safeguard, we should have a setsockopt
> +      * flag to control the bind(2) behavior against
> +      * deprecated addresses (default: forbid bind(2)).
> +      */
> +     if (ISSET(ifatoia6(ifa)->ia6_flags, IN6_IFF_ANYCAST|
> +         IN6_IFF_TENTATIVE|IN6_IFF_DUPLICATED|IN6_IFF_DETACHED))
> +             return (EADDRNOTAVAIL);
> +
> +     return (0);
> +}
> +
> +static uint16_t
> +gre_ip6_proto(struct sockaddr *sa)
> +{
> +     struct sockaddr_in6 *sin6 = satosin6(sa);
> +     return (sin6->sin6_port);
> +}
> +
> +static void
> +gre_ip6_addr(union inpaddru *addr, struct sockaddr *sa)
> +{
> +     struct sockaddr_in6 *sin6 = satosin6(sa);
> +     addr->iau_addr6 = sin6->sin6_addr;
> +}
> +
> +static int
> +gre_ip6_selsrc(struct inpcb *inp, struct sockaddr *sa, union inpaddru *laddr,
> +    void *opts)
> +{
> +     struct in6_addr *in6src;
> +     int error;
> +
> +     if (opts == NULL)
> +             opts = inp->inp_outputopts6;
> +
> +     error = in6_pcbselsrc(&in6src, satosin6(sa), inp, opts);
> +     if (error != 0)
> +             return (error);
> +
> +     laddr->iau_addr6 = *in6src;
> +
> +     return (0);
> +}
> +
> +int
> +gre_ip6_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *addr,
> +    struct mbuf *control, struct proc *p)
> +{
> +     return (gre_usrreq(&gre_ip6_ops, so, req, m, addr, control, p));
> +}
> +
> +int
> +gre_ip6_ctloutput(int op, struct socket *so, int level, int optname,
> +    struct mbuf *m)
> +{
> +     return (gre_ctloutput(&gre_ip6_ops, op, so, level, optname, m));
> +}
> +
> +static void
> +gre_ip6_sbappend(struct gre_pcb *gpcb, struct mbuf *m, int hlen, uint32_t seqno)
> +{
> +     struct inpcb *inp = gpcb_inp(gpcb);
> +     struct socket *so = gpcb_so(gpcb);
> +     struct mbuf *opts = NULL;
> +     struct sockaddr_in6 sin6 = {
> +             .sin6_len = sizeof(sin6),
> +             .sin6_family = AF_INET6,
> +             .sin6_port = inp->inp_lport,
> +     };
> +
> +     in6_recoverscope(&sin6, &mtod(m, struct ip6_hdr *)->ip6_src);
> +
> +     if (ISSET(inp->inp_flags, IN6P_CONTROLOPTS) ||
> +         ISSET(so->so_options, SO_TIMESTAMP))
> +             ip6_savecontrol(inp, m, &opts);
> +
> +     if (ISSET(inp->inp_flags, IN6P_RECVDSTPORT)) {
> +             struct mbuf *opt;
> +
> +             opt = sbcreatecontrol(&inp->inp_fport, sizeof(inp->inp_fport),
> +                 IPV6_RECVDSTPORT, IPPROTO_IPV6);
> +             if (opt != NULL) {
> +                     opt->m_next = opts;
> +                     opts = opt;
> +             }
> +     }
> +
> +     gre_sbappend(gpcb, sin6tosa(&sin6), m, opts, hlen, seqno);
> +}
> +
> +static int
> +gre_ip6_send(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *m,
> +    struct mbuf *addr, struct mbuf *control)
> +{
> +     struct inpcb *inp = gpcb_inp(gpcb);
> +     struct ip6_pktopts *opts = inp->inp_outputopts6;
> +     struct ip6_pktopts opt;
> +     int error;
> +
> +     if (control != NULL) {
> +             error = ip6_setpktopts(control, &opt, opts, /* priv */ 1,
> +                 IPPROTO_GRE);
> +             if (error != 0) {
> +                     m_freem(m);
> +                     return (error);
> +             }
> +             opts = &opt;
> +     }
> +
> +     error = gre_output(ops, gpcb, m, addr, control, opts);
> +     if (control != NULL)
> +             ip6_clearpktopts(&opt, -1);
> +     return (error);
> +}
> +
> +static int
> +gre_ip6_output(struct gre_pcb *gpcb, const struct gre_pcb_key *gk,
> +    struct mbuf *m, void *opts)
> +{
> +     struct inpcb *inp = gpcb_inp(gpcb);
> +     uint16_t len = m->m_pkthdr.len;
> +     struct ip6_hdr *ip6;
> +     int flags = 0;
> +     int error;
> +
> +     if (len > IP_MAXPACKET) {
> +             error = EMSGSIZE;
> +             goto drop;
> +     }
> +
> +     m = m_prepend(m, sizeof(*ip6), M_DONTWAIT);
> +     if (m == NULL) {
> +             error = ENOBUFS;
> +             goto dropped;
> +     }
> +
> +     ip6 = mtod(m, struct ip6_hdr *);
> +     ip6->ip6_vfc = IPV6_VERSION;
> +     ip6->ip6_plen = htons(len);
> +     ip6->ip6_nxt = IPPROTO_GRE;
> +     ip6->ip6_hlim = gpcb->gpcb_ttl;
> +     ip6->ip6_src = gk->gk_laddr6;
> +     ip6->ip6_dst = gk->gk_faddr6;
> +
> +     if (ISSET(inp->inp_flags, IN6P_MINMTU)) /* wtf */
> +             flags |= IPV6_MINMTU;
> +
> +     error = ip6_output(m, opts, &inp->inp_route6,
> +         flags, inp->inp_moptions6, inp);
> +
> +     return (error);
> +
> +drop:
> +     m_freem(m);
> +dropped:
> +     return (error);
> +}
> +
> +int
> +gre_ip6_input(struct mbuf **mp, int *offp, int proto, int af)
> +{
> +     struct mbuf *m = *mp;
> +     struct gre_pcb_key gk;
> +     struct ip6_hdr *ip6;
> +
> +     m = gre_if6_input(m, *offp);
> +     if (m == NULL)
> +             return (IPPROTO_DONE);
> +
> +     ip6 = mtod(m, struct ip6_hdr *);
> +
> +     gk.gk_family = AF_INET6;
> +     gk.gk_laddr6 = ip6->ip6_dst;
> +     gk.gk_faddr6 = ip6->ip6_src;
> +
> +     m = gre_ip_input(&gre_ip6_ops, m, *offp, ip6->ip6_hlim, &gk);
> +     if (m == NULL)
> +             return (IPPROTO_DONE);
> +
> +     *mp = m;
> +     return (rip6_input(mp, offp, proto, af));
> +}
> +#endif /* INET6 */
> +
> +/*
> + * generic GRE protocol handling
> + */
> +
> +void
> +gre_init(void)
> +{
> +     pool_init(&gre_pcb_key_pool, sizeof(struct gre_pcb_key), 0,
> +         IPL_NONE, PR_WAITOK, "grekey", NULL);
> +     pool_init(&gre_pcb_pool, sizeof(struct gre_pcb), 0,
> +         IPL_NONE, PR_WAITOK, "grepcb", NULL);
> +}
> +
> +int
> +gre_attach(struct socket *so, int proto)
> +{
> +     const struct gre_ops *ops = &gre_ip4_ops;
> +     struct gre_pcb *gpcb;
> +     struct inpcb *inp;
> +     int error;
> +     int flags = 0;
> +
> +     if (so->so_pcb != NULL)
> +             return (EINVAL);
> +
> +     error = suser(curproc);
> +     if (error != 0)
> +             return (error);
> +
> +     error = soreserve(so, gre_sendspace, gre_recvspace);
> +     if (error != 0)
> +             return (error);
> +
> +#ifdef INET6
> +     if (sotopf(so) == PF_INET6) {
> +             ops = &gre_ip6_ops;
> +             flags = INP_IPV6;
> +     }
> +#endif
> +
> +     gpcb = pool_get(&gre_pcb_pool, PR_NOWAIT | PR_ZERO);
> +     if (gpcb == NULL)
> +             return (ENOBUFS);
> +
> +     inp = gpcb_inp(gpcb);
> +     inp->inp_socket = so;
> +     refcnt_init(&inp->inp_refcnt);
> +     inp->inp_seclevel[SL_AUTH] = IPSEC_AUTH_LEVEL_DEFAULT;
> +     inp->inp_seclevel[SL_ESP_TRANS] = IPSEC_ESP_TRANS_LEVEL_DEFAULT;
> +     inp->inp_seclevel[SL_ESP_NETWORK] = IPSEC_ESP_NETWORK_LEVEL_DEFAULT;
> +     inp->inp_seclevel[SL_IPCOMP] = IPSEC_IPCOMP_LEVEL_DEFAULT;
> +     inp->inp_rtableid = curproc->p_p->ps_rtableid;
> +     inp->inp_ip_minttl = 0;
> +     inp->inp_flags |= flags;
> +     inp->inp_hops = *ops->defttl;
> +     inp->inp_ppcb = (caddr_t)gpcb;
> +
> +     so->so_pcb = inp;
> +
> +     return (0);
> +}
> +
> +static void
> +gre_inpdetach(struct gre_pcb *gpcb)
> +{
> +     struct socket *so = gpcb_so(gpcb);
> +     struct inpcb *inp = gpcb_inp(gpcb);
> +
> +     KASSERT(so->so_pcb == inp);
> +
> +     so->so_pcb = NULL;
> +     sofree(so, SL_NOUNLOCK);
> +
> +     m_freem(inp->inp_options);
> +     if (inp->inp_route.ro_rt) {
> +             rtfree(inp->inp_route.ro_rt);
> +             inp->inp_route.ro_rt = NULL;
> +     }
> +
> +     switch (ISSET(inp->inp_flags, INP_IPV6)) {
> +     case 0:
> +             ip_freemoptions(inp->inp_moptions);
> +             break;
> +#ifdef INET6
> +     case INP_IPV6:
> +             ip6_freepcbopts(inp->inp_outputopts6);
> +             ip6_freemoptions(inp->inp_moptions6);
> +             break;
> +#endif
> +     }
> +
> +     KASSERT((struct gre_pcb *)inp->inp_ppcb == (struct gre_pcb *)inp);
> +
> +     (void)gre_disconnect(gpcb);
> +     pool_put(&gre_pcb_pool, gpcb);
> +}
> +
> +int
> +gre_detach(struct socket *so)
> +{
> +     struct inpcb *inp;
> +
> +     soassertlocked(so);
> +
> +     inp = sotoinpcb(so);
> +     if (inp == NULL)
> +             return (EINVAL);
> +
> +     gre_inpdetach(inp_gpcb(inp));
> +
> +     return (0);
> +}
> +
> +static int
> +gre_bind(const struct gre_ops *ops, struct gre_pcb *gpcb, struct mbuf *addr)
> +{
> +     struct inpcb *inp = gpcb_inp(gpcb);
> +     struct socket *so = gpcb_so(gpcb);
> +     struct sockaddr *sa;
> +     unsigned int state = GRE_S_BOUND;
> +     struct gre_pcb_key *gk, *ogk;
> +     int reuse = ISSET(so->so_options, SO_REUSEPORT);
> +     int error;
> +
> +     if (gpcb->gpcb_pcb_key != NULL)
> +             return (EISCONN);
> +
> +     error = (*ops->op_nametosa)(inp, addr, &sa);
> +     if (error != 0)
> +             return (error);
> +
> +     if ((*ops->op_is_wildcard)(sa)) {
> +             state = GRE_S_WILDCARD;
> +     } else if ((*ops->op_is_multicast)(sa)) {
> +             /*
> +              * Treat SO_REUSEADDR as SO_REUSEPORT for multicast;
> +              * allow complete duplication of binding if
> +              * SO_REUSEPORT is set, or if SO_REUSEADDR is set
> +              * and a multicast address is bound on both
> +              * new and duplicated sockets.
> +              */
> +             if (ISSET(so->so_options, SO_REUSEADDR|SO_REUSEPORT))
> +                     reuse = SO_REUSEADDR|SO_REUSEPORT;
> +     } else if (ISSET(so->so_options, SO_BINDANY) ||
> +         (*ops->op_is_broadcast)(inp->inp_rtableid, sa) == 0) {
> +             /*
> +              * we must check that we are binding to an address we
> +              * own except when:
> +              * - SO_BINDANY is set or
> +              * - we are binding a GRE socket to 255.255.255.255 or
> +              * - we are binding a GRE socket to one of our broadcast
> +              *   addresses
> +              */
> +             ;
> +     } else {
> +             error = (*ops->op_is_local)(inp->inp_rtableid, sa);
> +             if (error != 0)
> +                     return (error);
> +     }
> +
> +     gk = gre_pcb_key_get(gpcb);
> +     if (gk == NULL)
> +             return (ENOBUFS);
> +
> +     gk->gk_family = sa->sa_family;
> +     (*ops->op_addr)(&gk->gk_laddr, sa);
> +     gk->gk_proto = (*ops->op_proto)(sa);
> +
> +     ogk = gre_pcb_key_insert(state, gk);
> +     if (ogk != NULL) {
> +             struct gre_pcb *ogpcb;
> +
> +             gre_pcb_key_put(gk);
> +
> +             ogpcb = gre_pcb_first(&ogk->gk_pcbs);
> +             if (ogpcb != NULL) {
> +                     struct socket *oso = gpcb_so(ogpcb);
> +                     if (!ISSET(reuse, oso->so_options))
> +                             return (EADDRINUSE);
> +             }
> +
> +             gk = ogk;
> +     }
> +
> +     /* commit */
> +
> +     gpcb->gpcb_pcb_key = gk;
> +     gre_pcb_insert(&gk->gk_pcbs, gpcb);
> +
> +     inp->inp_laddru = gk->gk_laddr;
> +     inp->inp_lport = gk->gk_proto;
> +
> +     gpcb->gpcb_reuse = reuse;
> +
> +     return (0);
> +}
> +
> +static int
> +gre_connect(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *addr)
> +{
> +     struct inpcb *inp = gpcb_inp(gpcb);
> +     struct socket *so = gpcb_so(gpcb);
> +     struct gre_pcb_key *bgk = gpcb->gpcb_pcb_key;
> +     struct gre_pcb_key *gk, *ogk;
> +     struct sockaddr *sa;
> +     int reuse = ISSET(so->so_options, SO_REUSEPORT);
> +     int error;
> +
> +     if (bgk != NULL && bgk->gk_state == GRE_S_CONNECTED)
> +             return (EISCONN);
> +
> +     error = (*ops->op_nametosa)(inp, addr, &sa);
> +     if (error != 0)
> +             return (error);
> +
> +     /* don't allow connections to wildcard addresses */
> +     if ((*ops->op_is_wildcard)(sa))
> +             return (EADDRNOTAVAIL);
> +
> +     gk = gre_pcb_key_get(gpcb);
> +     if (gk == NULL)
> +             return (ENOBUFS);
> +
> +     if (bgk == NULL || bgk->gk_state == GRE_S_WILDCARD) {
> +             error = (*ops->op_selsrc)(inp, sa, &gk->gk_laddr, NULL);
> +             if (error != 0)
> +                     goto put;
> +
> +             gk->gk_proto = (*ops->op_proto)(sa);
> +     } else {
> +             if (bgk->gk_proto != (*ops->op_proto)(sa)) {
> +                     error = EADDRNOTAVAIL;
> +                     goto put;
> +             }
> +
> +             gk->gk_laddr = bgk->gk_laddr;
> +             gk->gk_proto = bgk->gk_proto;
> +             reuse |= gpcb->gpcb_reuse;
> +     }
> +
> +     gk->gk_family = sa->sa_family;
> +     (*ops->op_addr)(&gk->gk_faddr, sa);
> +
> +     ogk = gre_pcb_key_insert(GRE_S_CONNECTED, gk);
> +     if (ogk != NULL) {
> +             struct gre_pcb *ogpcb;
> +
> +             gre_pcb_key_put(gk);
> +
> +             ogpcb = gre_pcb_first(&ogk->gk_pcbs);
> +             if (ogpcb != NULL) {
> +                     struct socket *oso = gpcb_so(ogpcb);
> +                     KASSERTMSG(oso != NULL, "ogk %p ogpcb %p oso %p",
> +                         ogk, ogpcb, oso);
> +                     if (!ISSET(reuse, oso->so_options))
> +                             return (EADDRINUSE);
> +             }
> +
> +             gk = ogk;
> +     }
> +
> +     /* commit */
> +
> +     gre_disconnect(gpcb);
> +     gpcb->gpcb_pcb_key = gk;
> +     gre_pcb_insert(&gk->gk_pcbs, gpcb);
> +
> +     inp->inp_laddru = gk->gk_laddr;
> +     inp->inp_faddru = gk->gk_faddr;
> +     inp->inp_lport = inp->inp_fport = gk->gk_proto;
> +
> +     gpcb->gpcb_reuse = reuse;
> +     soisconnected(so);
> +     return (0);
> +
> +put:
> +     gre_pcb_key_put(gk);
> +     return (error);
> +}
> +
> +
> +static int
> +gre_getsockname(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *addr)
> +{
> +     struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
> +
> +     if (gk == NULL)
> +             return (ENOTCONN);
> +
> +     (*ops->op_getsockname)(gpcb_inp(gpcb), addr);
> +
> +     return (0);
> +}
> +
> +static int
> +gre_getpeername(const struct gre_ops *ops, struct gre_pcb *gpcb,
> +    struct mbuf *addr)
> +{
> +     struct gre_pcb_key *gk = gpcb->gpcb_pcb_key;
> +
> +     if (gk == NULL || gk->gk_state != GRE_S_CONNECTED)
> +             return (ENOTCONN);
> +
> +     (*ops->op_getpeername)(gpcb_inp(gpcb), addr);
> +
> +     return (0);
> +}
> +
> +static int
> +gre_disconnect(struct gre_pcb *gpcb)
> +{
> +     struct gre_pcb_key *gk;
> +
> +     gk = gpcb->gpcb_pcb_key;
> +     if (gk == NULL) {
> +             return (ENOTCONN);
> +     }
> +
> +     gre_pcb_remove(&gk->gk_pcbs, gpcb);
> +     if (gre_pcb_empty(&gk->gk_pcbs)) {
> +             switch (gk->gk_state) {
> +             case GRE_S_WILDCARD:
> +                     RBT_REMOVE(gre_tree_wildcards, &gre_wildcards, gk);
> +                     break;
> +             case GRE_S_BOUND:
> +                     RBT_REMOVE(gre_tree_bound, &gre_bound, gk);
> +                     break;
> +             case GRE_S_CONNECTED:
> +                     RBT_REMOVE(gre_tree_connected, &gre_connected, gk);
> +                     break;
> +             }
> +             pool_put(&gre_pcb_key_pool, gk);
> +     }
> +
> +     gpcb->gpcb_pcb_key = NULL;
> +
> +     return (0);
> +}
> +
> +static int
> +gre_usrreq(const struct gre_ops *ops, struct socket *so, int req,
> +    struct mbuf *m, struct mbuf *addr, struct mbuf *control, struct proc *p)
> +{
> +     struct inpcb *inp;
> +     struct gre_pcb *gpcb;
> +     int error = 0;
> +
> +     if (req == PRU_CONTROL) {
> +             return ((*ops->op_control)(so, (u_long)m, (caddr_t)addr,
> +                 (struct ifnet *)control));
> +     }
> +
> +     soassertlocked(so);
> +
> +     inp = sotoinpcb(so);
> +     if (inp == NULL) {
> +             error = EINVAL;
> +             goto release;
> +     }
> +     gpcb = inp_gpcb(inp);
> +
> +     switch (req) {
> +     case PRU_BIND:
> +             error = gre_bind(ops, gpcb, addr);
> +             break;
> +
> +     case PRU_LISTEN:
> +             error = EOPNOTSUPP;
> +             break;
> +
> +     case PRU_CONNECT:
> +             error = gre_connect(ops, gpcb, addr);
> +             break;
> +
> +     case PRU_CONNECT2:
> +             error = EOPNOTSUPP;
> +             break;
> +
> +     case PRU_ACCEPT:
> +             error = EOPNOTSUPP;
> +             break;
> +
> +     case PRU_DISCONNECT:
> +             error = gre_disconnect(gpcb);
> +             if (error != 0)
> +                     break;
> +
> +             gpcb->gpcb_reuse = 0;
> +             CLR(so->so_state, SS_ISCONNECTED); /* XXX cos udp_usrreq.c */
> +             memset(&inp->inp_laddru, 0, sizeof(inp->inp_laddru));
> +             memset(&inp->inp_faddru, 0, sizeof(inp->inp_faddru));
> +             inp->inp_lport = inp->inp_fport = 0;
> +             break;
> +
> +     case PRU_SHUTDOWN:
> +             socantsendmore(so);
> +             break;
> +
> +     case PRU_SEND:
> +             return (gre_send(ops, gpcb, m, addr, control));
> +
> +     case PRU_ABORT:
> +             soisdisconnected(so);
> +             gre_inpdetach(gpcb);
> +             break;
> +
> +     case PRU_SOCKADDR:
> +             return (gre_getsockname(ops, gpcb, addr));
> +             break;
> +     case PRU_PEERADDR:
> +             return (gre_getpeername(ops, gpcb, addr));
> +             break;
> +
> +     case PRU_SENSE:
> +             /* stat: don't bother with a block size. */
> +             break;
> +
> +     case PRU_SENDOOB:
> +     case PRU_FASTTIMO:
> +     case PRU_SLOWTIMO:
> +     case PRU_PROTORCV:
> +     case PRU_PROTOSEND:
> +     case PRU_RCVD:
> +     case PRU_RCVOOB:
> +             error = EOPNOTSUPP;
> +             break;
> +
> +     default:
> +             panic("%s req %d", __func__, req);
> +     }
> +
> +release:
> +     switch (req) {
> +     case PRU_RCVD:
> +     case PRU_RCVOOB:
> +     case PRU_SENSE:
> +             break;
> +     default:
> +             m_freem(control);
> +             m_freem(m);
> +             break;
> +     }
> +
> +     return (error);
> +}
> +
> +static int
> +gre_setopt(struct gre_pcb *gpcb, int optname, struct mbuf *m)
> +{
> +     /*
> +      * only support changing these options when the socket is
> +      * completely disconnected. the amount of code needed to try
> +      * changing the gre_pcb_key was "quite large" and arguably
> +      * not worth it.
> +      */
> +     switch (optname) {
> +     case GRE_CKSUM:
> +     case GRE_KEY:
> +     case GRE_SEQ:
> +             if (gpcb->gpcb_pcb_key != NULL)
> +                     return (EISCONN);
> +             break;
> +     }
> +
> +     switch (optname) {
> +     case GRE_CKSUM:
> +             if (m == NULL || m->m_len != sizeof(int))
> +                     return (EINVAL);
> +
> +             if (*mtod(m, int *))
> +                     SET(gpcb->gpcb_flags, htons(GRE_CP));
> +             else
> +                     CLR(gpcb->gpcb_flags, htons(GRE_CP));
> +             break;
> +     case GRE_KEY:
> +             if (m == NULL || m->m_len == 0) { /* disable key */
> +                     CLR(gpcb->gpcb_flags, htons(GRE_KP));
> +                     gpcb->gpcb_key = htonl(0);
> +                     break;
> +             }
> +
> +             if (m->m_len != sizeof(gpcb->gpcb_key))
> +                     return (EINVAL);
> +
> +             SET(gpcb->gpcb_flags, htons(GRE_KP));
> +             htobem32(&gpcb->gpcb_key, *mtod(m, uint32_t *));
> +             break;
> +     case GRE_SEQ:
> +             if (m == NULL || m->m_len == 0) { /* disable seq */
> +                     CLR(gpcb->gpcb_flags, htons(GRE_SP));
> +                     gpcb->gpcb_seq = 0;
> +                     break;
> +             }
> +
> +             if (m->m_len != sizeof(gpcb->gpcb_seq))
> +                     return (EINVAL);
> +
> +             SET(gpcb->gpcb_flags, htons(GRE_SP));
> +             gpcb->gpcb_seq = *mtod(m, uint32_t *);
> +             break;
> +     case GRE_RECVSEQ:
> +             if (m == NULL || m->m_len != sizeof(int))
> +                     return (EINVAL);
> +
> +             if (*mtod(m, int *))
> +                     SET(gpcb->gpcb_pflags, GREPCB_RECVSEQ);
> +             else
> +                     CLR(gpcb->gpcb_pflags, GREPCB_RECVSEQ);
> +             break;
> +     default:
> +             return (ENOPROTOOPT);
> +             break;
> +     }
> +
> +     return (0);
> +}
> +
> +static int
> +gre_getopt(struct gre_pcb *gpcb, int optname, struct mbuf *m)
> +{
> +     switch (optname) {
> +     case GRE_KEY:
> +             if (!ISSET(gpcb->gpcb_flags, htons(GRE_KP)))
> +                     return (ENOTCONN);
> +
> +             m->m_len = sizeof(gpcb->gpcb_key);
> +             *mtod(m, uint32_t *) = bemtoh32(&gpcb->gpcb_key);
> +             break;
> +     case GRE_CKSUM:
> +             m->m_len = sizeof(int);
> +             *mtod(m, int *) = !!ISSET(gpcb->gpcb_flags, htons(GRE_CP));
> +             break;
> +     case GRE_SEQ:
> +             if (!ISSET(gpcb->gpcb_flags, htons(GRE_SP)))
> +                     return (ENOTCONN);
> +
> +             m->m_len = sizeof(gpcb->gpcb_seq);
> +             *mtod(m, uint32_t *) = gpcb->gpcb_seq;
> +             break;
> +     default:
> +             return (ENOPROTOOPT);
> +     }
> +
> +     return (0);
> +}
> +
> +/*
> + * IP socket option processing.
> + */
> +static int
> +gre_ctloutput(const struct gre_ops *ops, int op, struct socket *so,
> +    int level, int optname, struct mbuf *m)
> +{
> +     struct inpcb *inp;
> +     struct gre_pcb *gpcb;
> +     int error;
> +
> +     inp = sotoinpcb(so);
> +     if (inp == NULL)
> +             return (ECONNRESET);
> +     if (level != IPPROTO_GRE)
> +             return ((*ops->op_ctloutput)(op, so, level, optname, m));
> +
> +     gpcb = inp_gpcb(inp);
> +
> +     switch (op) {
> +     case PRCO_SETOPT:
> +             error = gre_setopt(gpcb, optname, m);
> +             break;
> +     case PRCO_GETOPT:
> +             error = gre_getopt(gpcb, optname, m);
> +             break;
> +     default:
> +             panic("%s op %d", __func__, op);
> +     }
> +
> +     return (error);
> +}
> +
> +static void *
> +gre_pullup(struct mbuf **mp, int *offp, int len)
> +{
> +     int hlen = *offp + len;
> +     void *h;
> +
> +     if ((*mp)->m_pkthdr.len < hlen)
> +             return (NULL); /* decline */
> +
> +     *mp = m_pullup(*mp, hlen);
> +     if (*mp == NULL)
> +             return (NULL);
> +
> +     h = mtod(*mp, caddr_t) + *offp;
> +     *offp = hlen;
> +
> +     return (h);
> +}
> +
> +static int
> +gre_candeliver(const struct gre_pcb *gpcb, uint8_t ttl)
> +{
> +     const struct inpcb *inp = gpcb_inp(gpcb);
> +
> +     if (ISSET(inp->inp_socket->so_state, SS_CANTRCVMORE))
> +             return (0);
> +
> +     if (inp->inp_ip_minttl > ttl)
> +             return (0);
> +
> +     return (1);
> +}
> +
> +static struct mbuf *
> +gre_ip_input(const struct gre_ops *ops, struct mbuf *m, int iphlen,
> +    uint8_t ttl, struct gre_pcb_key *key)
> +{
> +     int hlen = iphlen;
> +     struct gre_header *gh;
> +     struct gre_pcb_key *gk;
> +     struct gre_pcb *gpcb, *ngpcb;
> +     uint16_t cksum = 0;
> +     uint32_t seqno = 0;
> +
> +     gh = gre_pullup(&m, &hlen, sizeof(*gh));
> +     if (gh == NULL)
> +             return (m);
> +
> +     if ((gh->gre_flags & htons(GRE_VERS_MASK)) != htons(GRE_VERS_0))
> +             return (m); /* decline */
> +
> +     if (ISSET(gh->gre_flags, ~htons(GRE_VALID_MASK)))
> +             return (m); /* decline */
> +
> +     key->gk_flags = gh->gre_flags;
> +     key->gk_proto = gh->gre_proto;
> +
> +     if (ISSET(key->gk_flags, htons(GRE_CP))) {
> +             struct gre_h_cksum *gch;
> +
> +             gch = gre_pullup(&m, &hlen, sizeof(*gch));
> +             if (gch == NULL)
> +                     return (m);
> +
> +             cksum = gch->gre_cksum;
> +
> +             /* XXX ignore Reserved (Offset) field */
> +     }
> +
> +     if (ISSET(key->gk_flags, htons(GRE_KP))) {
> +             struct gre_h_key *gkh;
> +
> +             gkh = gre_pullup(&m, &hlen, sizeof(*gkh));
> +             if (gkh == NULL)
> +                     return (m);
> +
> +             key->gk_key = gkh->gre_key;
> +     }
> +
> +     if (ISSET(key->gk_flags, htons(GRE_SP))) {
> +             struct gre_h_seq *gsh;
> +
> +             gsh = gre_pullup(&m, &hlen, sizeof(*gsh));
> +             if (gsh == NULL)
> +                     return (m);
> +
> +             seqno = bemtoh32(&gsh->gre_seq);
> +     }
> +
> +     key->gk_rtableid = m->m_pkthdr.ph_rtableid;
> +
> +     gk = RBT_FIND(gre_tree_connected, &gre_connected, key);
> +     if (gk == NULL) {
> +             gk = RBT_FIND(gre_tree_bound, &gre_bound, key);
> +             if (gk == NULL) {
> +                     gk = RBT_FIND(gre_tree_wildcards, &gre_wildcards, key);
> +                     if (gk == NULL)
> +                             return (m); /* decline */
> +             }
> +     }
> +
> +     /* it's ours now */
> +
> +     if (ISSET(key->gk_flags, htons(GRE_CP))) {
> +             /* XXX actually do the checksum calc */
> +     }
> +
> +     gpcb = gre_pcb_first(&gk->gk_pcbs);
> +     while (!gre_candeliver(gpcb, ttl)) {
> +             gpcb = gre_pcb_next(gpcb);
> +             if (gpcb == NULL)
> +                     goto drop;
> +     }
> +
> +     ngpcb = gpcb;
> +     for (;;) {
> +             struct mbuf *mm;
> +
> +             ngpcb = gre_pcb_next(ngpcb);
> +             if (ngpcb == NULL)
> +                     break;
> +
> +             if (!gre_candeliver(ngpcb, ttl))
> +                     continue;
> +
> +             mm = m_dup_pkt(m, 0, M_DONTWAIT);
> +             if (mm == NULL) {
> +                     /* assume further copies will also fail */
> +                     break;
> +             }
> +
> +             (*ops->op_sbappend)(gpcb, mm, hlen, seqno);
> +             gpcb = ngpcb;
> +     }
> +
> +     (*ops->op_sbappend)(gpcb, m, hlen, seqno);
> +
> +     return (NULL);
> +
> +drop:
> +     m_freem(m);
> +     return (NULL);
> +}
> +
> +static int
> +gre_pcb_key_cmp_wildcard(const struct gre_pcb_key *a,
> +    const struct gre_pcb_key *b)
> +{
> +     if (a->gk_proto > b->gk_proto)
> +             return (1);
> +     if (a->gk_proto < b->gk_proto)
> +             return (-1);
> +
> +     if (a->gk_flags > b->gk_flags)
> +             return (1);
> +     if (a->gk_flags < b->gk_flags)
> +             return (-1);
> +
> +     if (ISSET(a->gk_flags, htons(GRE_KP))) {
> +             if (a->gk_key > b->gk_key)
> +                     return (1);
> +             if (a->gk_key < b->gk_key)
> +                     return (-1);
> +     }
> +
> +     if (a->gk_rtableid > b->gk_rtableid)
> +             return (1);
> +     if (a->gk_rtableid < b->gk_rtableid)
> +             return (-1);
> +
> +     if (a->gk_family > b->gk_family)
> +             return (1);
> +     if (a->gk_family < b->gk_family)
> +             return (-1);
> +
> +     return (0);
> +}
> +
> +RBT_GENERATE(gre_tree_wildcards, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_wildcard);
> +
> +static int
> +gre_pcb_key_cmp_bound(const struct gre_pcb_key *a,
> +    const struct gre_pcb_key *b)
> +{
> +     int rv;
> +
> +     rv = gre_pcb_key_cmp_wildcard(a, b);
> +     if (rv != 0)
> +             return (rv);
> +
> +     switch (a->gk_family) {
> +     case AF_INET:
> +             rv = memcmp(&a->gk_laddr4, &b->gk_laddr4, sizeof(a->gk_laddr4));
> +             break;
> +#ifdef INET6
> +     case AF_INET6:
> +             rv = memcmp(&a->gk_laddr6, &b->gk_laddr6, sizeof(a->gk_laddr6));
> +             break;
> +#endif
> +     default:
> +             unhandled_af(a->gk_family);
> +     }
> +
> +     return (rv);
> +}
> +
> +RBT_GENERATE(gre_tree_bound, gre_pcb_key, gk_entry, gre_pcb_key_cmp_bound);
> +
> +static int
> +gre_pcb_key_cmp_connected(const struct gre_pcb_key *a,
> +    const struct gre_pcb_key *b)
> +{
> +     int rv;
> +
> +     rv = gre_pcb_key_cmp_bound(a, b);
> +     if (rv != 0)
> +             return (rv);
> +
> +     switch (a->gk_family) {
> +     case AF_INET:
> +             rv = memcmp(&a->gk_faddr4, &b->gk_faddr4, sizeof(a->gk_faddr4));
> +             break;
> +#ifdef INET6
> +     case AF_INET6:
> +             rv = memcmp(&a->gk_faddr6, &b->gk_faddr6, sizeof(a->gk_faddr6));
> +             break;
> +#endif
> +     default:
> +             unhandled_af(a->gk_family);
> +     }
> +
> +     return (rv);
> +}
> +
> +RBT_GENERATE(gre_tree_connected, gre_pcb_key, gk_entry,
> +    gre_pcb_key_cmp_connected);
> Index: sys/netinet6/in6_proto.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet6/in6_proto.c,v
> retrieving revision 1.104
> diff -u -p -r1.104 in6_proto.c
> --- sys/netinet6/in6_proto.c  13 Jun 2019 08:12:11 -0000      1.104
> +++ sys/netinet6/in6_proto.c  29 Oct 2019 07:57:58 -0000
> @@ -117,7 +117,7 @@
>  
>  #include "gre.h"
>  #if NGRE > 0
> -#include <net/if_gre.h>
> +#include <netinet/gre_var.h>
>  #endif
>  
>  /*
> @@ -340,11 +340,22 @@ const struct protosw inet6sw[] = {
>    .pr_domain = &inet6domain,
>    .pr_protocol       = IPPROTO_GRE,
>    .pr_flags  = PR_ATOMIC|PR_ADDR,
> -  .pr_input  = gre_input6,
> +  .pr_input  = gre_ip6_input,
>    .pr_ctloutput      = rip6_ctloutput,
>    .pr_usrreq = rip6_usrreq,
>    .pr_attach = rip6_attach,
>    .pr_detach = rip6_detach,
> +},
> +{
> +  .pr_type   = SOCK_DGRAM,
> +  .pr_domain = &inet6domain,
> +  .pr_protocol       = IPPROTO_GRE,
> +  .pr_flags  = PR_ATOMIC|PR_ADDR,
> +  .pr_input  = gre_ip6_input,
> +  .pr_ctloutput      = gre_ip6_ctloutput,
> +  .pr_usrreq = gre_ip6_usrreq,
> +  .pr_attach = gre_attach,
> +  .pr_detach = gre_detach,
>  },
>  #endif /* NGRE */
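
For anyone reading along who wants to poke at the header walk in
gre_ip_input() without a kernel, here is a minimal userland sketch of the
same idea: read the 4-byte base header, decline anything that is not
version 0 or that has unexpected flag bits, then step over the optional
checksum, key and sequence words in that order. Everything prefixed with
ex_/EX_ is a hypothetical name made up for this example and is not part of
the diff. The flag values are the C, K and S bits from RFC 2784/2890, and
the sketch assumes the patch's GRE_VALID_MASK covers exactly those three
bits, which this part of the diff does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag bits from RFC 2784/2890, host byte order (hypothetical names). */
#define EX_GRE_CP		0x8000	/* checksum present */
#define EX_GRE_KP		0x2000	/* key present */
#define EX_GRE_SP		0x1000	/* sequence number present */
#define EX_GRE_VERS_MASK	0x0007
/* Assumption: the only flags the receiver is willing to accept. */
#define EX_GRE_VALID_MASK	(EX_GRE_CP | EX_GRE_KP | EX_GRE_SP)

struct ex_gre_info {
	uint16_t flags;		/* host byte order */
	uint16_t proto;		/* host byte order */
	uint32_t key;		/* valid if EX_GRE_KP is set */
	uint32_t seq;		/* valid if EX_GRE_SP is set */
	size_t	 hlen;		/* total GRE header length in bytes */
};

static uint16_t
ex_be16(const uint8_t *p)
{
	return ((uint16_t)(p[0] << 8 | p[1]));
}

static uint32_t
ex_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	    (uint32_t)p[2] << 8 | (uint32_t)p[3]);
}

/* Returns 0 and fills *gi on success, -1 if the packet is declined. */
static int
ex_gre_parse(const uint8_t *buf, size_t len, struct ex_gre_info *gi)
{
	size_t off = 4;		/* flags/version + protocol */

	assert(buf != NULL && gi != NULL);

	if (len < off)
		return (-1);

	memset(gi, 0, sizeof(*gi));
	gi->flags = ex_be16(buf);
	gi->proto = ex_be16(buf + 2);

	if ((gi->flags & EX_GRE_VERS_MASK) != 0)
		return (-1);	/* not GRE version 0 */
	if ((gi->flags & ~EX_GRE_VALID_MASK) != 0)
		return (-1);	/* flags we do not understand */

	if (gi->flags & EX_GRE_CP) {
		/* checksum + reserved word; skipped, not verified */
		if (len < off + 4)
			return (-1);
		off += 4;
	}
	if (gi->flags & EX_GRE_KP) {
		if (len < off + 4)
			return (-1);
		gi->key = ex_be32(buf + off);
		off += 4;
	}
	if (gi->flags & EX_GRE_SP) {
		if (len < off + 4)
			return (-1);
		gi->seq = ex_be32(buf + off);
		off += 4;
	}

	gi->hlen = off;
	return (0);
}
```

Feeding it a buffer that starts with flags 0x3000 (key and sequence
present), followed by a protocol word and two 32-bit words, yields a
12-byte header with the key and sequence number decoded. A truncated
buffer is declined rather than read past its end, which is the same job
gre_pullup() does against m_pkthdr.len in the diff.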
