On Fri, Oct 25, 2019 at 12:27:25PM -0700, Mike Larkin wrote:
> On Fri, Oct 25, 2019 at 06:15:59PM +0000, Reyk Floeter wrote:
> > Hi,
> > 
> > the attached diff is rather large and implements two things for vmd:
> > 
> > 1) Allow configuring static IP address/gateway pairs on local interfaces.
> > 2) Skip statically configured interface names (e.g. tap0) when
> >   allocating dynamic interfaces.
> > 
> > Example:
> > ---snip---
> > vm "foo" {
> >         disable
> >         local interface "tap0" {
> >                 address 192.168.0.10/24 192.168.0.1
> >         }
> >         local interface "tap1"
> >         disk "/home/vm/foo.qcow2"
> > }
> > 
> > vm "bar" {
> >         local interface
> >         disk "/home/vm/bar.qcow2"
> > }
> > ---snap---
> > 
> > 
> > 1) The VM "foo" has two interfaces: The first interface has a fixed
> > IPv4 address with 192.168.0.1/24 on the gateway and 192.168.0.10/24 on
> > the VM.  192.168.0.10/24 is assigned to the VM's first NIC via the
> > built-in DHCP server.  The second VM gets a default 100.64.x.x/31 IP.
> 
> I'm not sure the above description matches what I'm seeing in the vm.conf
> snippet above.
> 
> What's "the gateway" here? Is this the host machine, or the actual
> gateway, perhaps on some other machine? Does this just allow me to specify
> the host-side tap(4) IP address for a corresponding given VM vio(4) interface?
> 

Ah, OK.  I used the terms without explaining them:

With local interfaces, vmd(8) uses two IPs per interface: one for the
tap(4) on the host, one for the vio(4) on the VM.  It configures the
first one on the host and provides the second one via DHCP.  The
host-side IP is the VM's default "gateway" router.

The address syntax is currently reversed:
        address "address/prefix" "gateway"
Maybe I should change it to
        address "gateway" "address/prefix"
or
        address "address/prefix" gateway "gateway"

I also wonder if we could technically use a non-local IP address for
the gateway.  I currently enforce that the prefix lengths match, but I don't
enforce that both addresses are in the same subnet.
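If we decide to enforce it, a same-subnet check could look roughly like
this.  It is a minimal IPv4-only sketch that assumes the VM address,
gateway and prefix length have already been split out of the `address`
option; addr_gw_same_subnet4() and prefix_to_mask4() are hypothetical
helpers, not existing vmd functions.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Convert a prefix length (0-32) into a host-order IPv4 netmask. */
static uint32_t
prefix_to_mask4(int prefixlen)
{
	if (prefixlen <= 0)
		return (0);
	if (prefixlen >= 32)
		return (0xffffffffU);
	return (0xffffffffU << (32 - prefixlen));
}

/*
 * Hypothetical check: 1 if the VM address and the gateway are two
 * different addresses inside the same subnet, 0 if not, -1 if either
 * address or the prefix length is invalid.
 */
static int
addr_gw_same_subnet4(const char *addr, const char *gw, int prefixlen)
{
	struct in_addr	 a, g;
	uint32_t	 mask, ha, hg;

	if (prefixlen < 0 || prefixlen > 32 ||
	    inet_pton(AF_INET, addr, &a) != 1 ||
	    inet_pton(AF_INET, gw, &g) != 1)
		return (-1);

	mask = prefix_to_mask4(prefixlen);
	ha = ntohl(a.s_addr);
	hg = ntohl(g.s_addr);

	/* Same network bits, but not the same address. */
	return (ha != hg && (ha & mask) == (hg & mask));
}
```

With the example from my first mail, "address 192.168.0.10/24
192.168.0.1" passes this check, while a gateway outside the /24 would
be rejected.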

When using the default method, which auto-generates a /31 per
interface from 100.64.0.0/10, it uses the first IP in the /31 as the
gateway and the second IP for the VM.
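For illustration, a minimal sketch of how the default pair could be
derived.  It hardcodes the default 100.64.0.0/10 prefix and the per-VM
/24 (vmid << 8) from vm_priv_addr(); the per-interface offset
(idx + 1) * 2 is an assumption, since that step is not part of the
hunks below, and local_pair_addr() is a hypothetical helper.

```c
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>

#define LOCAL_PREFIX	0x64400000U	/* 100.64.0.0 */
#define LOCAL_MASK	0xffc00000U	/* /10 */

/*
 * Hypothetical sketch of the default address pair for interface idx
 * of VM vmid: the /31 holds the gateway (host tap) first and the VM
 * second.  Returns the address in network byte order, or 0 if the VM
 * id or the interface index does not fit.
 */
static in_addr_t
local_pair_addr(uint32_t vmid, int idx, int isvm)
{
	uint32_t	 addr;

	/* The VM id must fit into the host bits above the last octet. */
	if (vmid > (~LOCAL_MASK >> 8) || idx < 0 || idx >= 0x7f)
		return (0);

	/* Per-VM /24: 100.64.N.0 */
	addr = LOCAL_PREFIX | (vmid << 8);

	/* Assumed per-interface /31, skipping .0 and .1 */
	addr |= (uint32_t)(idx + 1) * 2;

	/* First address for the gateway, second for the VM. */
	if (isvm)
		addr++;

	return (htonl(addr));
}
```

Under that assumed offset, VM id 1 would get 100.64.1.2 on its tap(4)
and 100.64.1.3 via DHCP on its first interface.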

> And did you mean "The second interface" there instead of the "The second VM"?
> (Although I think the description fits for "The second VM" also...)
> 

Yes, both fit; "the second interface" is correct as well.

> I think the idea is sound. As long as we don't end up adding extra command
> line args to vmctl to manually configure this, which it doesn't appear we are
> doing here. :)
> 

I don't want to add it to vmctl either.

> I didn't read the diff in great detail, I'll wait until you say you have a
> final version.
> 

OK, thanks.

Reyk

> -ml
> 
> >     This idea came up when I talked with Mischa at EuroBSDCon about
> > OpenBSDAms: instead of using L2 and an external dhcpd with static
> > assignments for all VMs, one solution could be to use L3 and avoid
> > bridge(4) and dhcpd(8).  But it would need a way to serve static IPs
> > via the internal DHCP server.  Using L3 with vmd is better for
> > performance, routing, PF, etc., but has the drawback that it wastes a
> > subnet and gateway IP per VM (maybe rdomains or other tricks could
> > help here, but this is a problem for later).
> > 
> > 2) The VM "foo" uses two static interface names, tap0 and tap1, and
> > the VM "bar" uses a dynamic interface name (tapX).  Without this diff,
> > vmd would most certainly use tap0 for bar's interface because foo is
> > disabled and not started before bar.  With the diff, the first
> > interface of bar will be tap2 or higher.
> >     The problem was just reported by kn@.  I mixed both things into
> > one diff because I was working on 1) when kn@ reported it.  There are
> > other ways to implement 2), but solving both issues in a similar way
> > made more sense.
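A minimal sketch of the allocation rule, assuming the statically
reserved units and the units currently in use are kept in two sorted
arrays; next_free_unit() and its helpers are hypothetical and not
vmd's actual code path.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Compare unsigned units without subtraction (no overflow). */
static int
unit_cmp(const void *a, const void *b)
{
	unsigned int	 ua = *(const unsigned int *)a;
	unsigned int	 ub = *(const unsigned int *)b;

	if (ua != ub)
		return (ua > ub ? 1 : -1);
	return (0);
}

/* Is unit in the sorted array units[0..nunits-1]? */
static int
unit_listed(unsigned int unit, const unsigned int *units, size_t nunits)
{
	return (nunits > 0 &&
	    bsearch(&unit, units, nunits, sizeof(unit), unit_cmp) != NULL);
}

/*
 * Hypothetical allocator: return the lowest tap(4) unit that is neither
 * reserved by a static name in vm.conf nor currently in use, or
 * UINT_MAX if there is none.  Both arrays must be sorted by unit_cmp().
 */
static unsigned int
next_free_unit(const unsigned int *reserved, size_t nreserved,
    const unsigned int *inuse, size_t ninuse)
{
	unsigned int	 unit;

	for (unit = 0; unit < UINT_MAX; unit++) {
		if (!unit_listed(unit, reserved, nreserved) &&
		    !unit_listed(unit, inuse, ninuse))
			return (unit);
	}
	return (UINT_MAX);
}
```

With foo's tap0 and tap1 reserved and foo disabled, bar's first
interface gets tap2, as described above.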
> > 
> > This is not the final diff.  I still have to clean it up, get
> > feedback, think a little bit about it, and split it into smaller parts
> > for review.  I wanted to share the big picture.
> > 
> > As a side note, I implemented the lookup with sorted tables because it
> > is the most efficient way to do it, but maybe a simple linear lookup
> > (iterating over all the VMs and all the interfaces all the time) would
> > be good enough.  But the current approach has benefits - if I did it
> > right ;)
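To compare with a linear scan, a minimal sketch of the sorted-table
approach using qsort(3) and bsearch(3).  struct ifentry and the helpers
are hypothetical stand-ins for struct vmd_ifconfig and friends, and
plain realloc(3) replaces recallocarray(3) to keep it portable.  The
comparator deliberately avoids subtraction: casting an unsigned int
sentinel such as UINT_MAX to int yields -1, which would sort it first
rather than last.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct vmd_ifconfig. */
struct ifentry {
	uint32_t	 vmid;
	int		 idx;
	unsigned int	 unit;
};

/* Sort by VM id, then by interface index. */
static int
ifentry_cmp(const void *a, const void *b)
{
	const struct ifentry	*ea = a, *eb = b;

	if (ea->vmid != eb->vmid)
		return (ea->vmid > eb->vmid ? 1 : -1);
	if (ea->idx != eb->idx)
		return (ea->idx > eb->idx ? 1 : -1);
	return (0);
}

/* O(log n) lookup in a table that is kept sorted by ifentry_cmp(). */
static struct ifentry *
ifentry_find(struct ifentry *tab, size_t n, uint32_t vmid, int idx)
{
	struct ifentry	 key = { .vmid = vmid, .idx = idx };

	if (n == 0)
		return (NULL);
	return (bsearch(&key, tab, n, sizeof(key), ifentry_cmp));
}

/*
 * Append an entry and re-sort.  Returns the (possibly moved) table,
 * or NULL on allocation failure, in which case tab and *n are unchanged.
 */
static struct ifentry *
ifentry_add(struct ifentry *tab, size_t *n, const struct ifentry *e)
{
	struct ifentry	*ntab;

	if ((ntab = realloc(tab, (*n + 1) * sizeof(*ntab))) == NULL)
		return (NULL);
	ntab[(*n)++] = *e;
	qsort(ntab, *n, sizeof(*ntab), ifentry_cmp);
	return (ntab);
}
```

Each insert re-sorts the whole table (O(n log n)), which is negligible
for the handful of interfaces on a host; the payoff is the O(log n)
lookup that vm_priv_addr() does on every DHCP request.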
> > 
> > Thoughts?
> > 
> > Reyk
> > 
> > Index: usr.sbin/vmd/dhcp.c
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/dhcp.c,v
> > retrieving revision 1.8
> > diff -u -p -u -p -r1.8 dhcp.c
> > --- usr.sbin/vmd/dhcp.c     27 Dec 2018 19:51:30 -0000      1.8
> > +++ usr.sbin/vmd/dhcp.c     25 Oct 2019 18:11:05 -0000
> > @@ -119,8 +119,7 @@ dhcp_request(struct vionet_dev *dev, cha
> >     }
> >  
> >     if ((client_addr.s_addr =
> > -       vm_priv_addr(&env->vmd_cfg,
> > -       dev->vm_vmid, dev->idx, 1)) == 0)
> > +       vm_priv_addr(&env->vmd_cfg, dev->vm_vmid, dev->idx, 1, &mask)) == 0)
> >             return (-1);
> >     memcpy(&resp.yiaddr, &client_addr,
> >         sizeof(client_addr));
> > @@ -129,7 +128,7 @@ dhcp_request(struct vionet_dev *dev, cha
> >     ss2sin(&pc.pc_dst)->sin_port = htons(CLIENT_PORT);
> >  
> >     if ((server_addr.s_addr = vm_priv_addr(&env->vmd_cfg, dev->vm_vmid,
> > -       dev->idx, 0)) == 0)
> > +       dev->idx, 0, &mask)) == 0)
> >             return (-1);
> >     memcpy(&resp.siaddr, &server_addr, sizeof(server_addr));
> >     memcpy(&ss2sin(&pc.pc_src)->sin_addr, &server_addr,
> > Index: usr.sbin/vmd/parse.y
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
> > retrieving revision 1.52
> > diff -u -p -u -p -r1.52 parse.y
> > --- usr.sbin/vmd/parse.y    14 May 2019 06:05:45 -0000      1.52
> > +++ usr.sbin/vmd/parse.y    25 Oct 2019 18:11:05 -0000
> > @@ -120,9 +120,9 @@ typedef struct {
> >  
> >  
> >  %token     INCLUDE ERROR
> > -%token     ADD ALLOW BOOT CDROM DEVICE DISABLE DISK DOWN ENABLE FORMAT GROUP
> > -%token     INET6 INSTANCE INTERFACE LLADDR LOCAL LOCKED MEMORY NET NIFS OWNER
> > -%token     PATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP VM VMID
> > +%token     ADD ADDRESS ALLOW BOOT CDROM DEVICE DISABLE DISK DOWN ENABLE FORMAT
> > +%token     GROUP INET6 INSTANCE INTERFACE LLADDR LOCAL LOCKED MEMORY NET NIFS
> > +%token     OWNER PATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP VM VMID
> >  %token     <v.number>      NUMBER
> >  %token     <v.string>      STRING
> >  %type      <v.lladdr>      lladdr
> > @@ -413,6 +413,12 @@ vm_opts                : disable                       {
> >  
> >                     if ($1)
> >                             vmc.vmc_ifflags[i] |= VMIFF_LOCAL;
> > +                   else if (vmc.vmc_ifflags[i] &
> > +                       (VMIFF_ADDR4|VMIFF_ADDR6)) {
> > +                           yyerror("address on non-local interface");
> > +                           free($3);
> > +                           YYERROR;
> > +                   }
> >                     if ($3 != NULL) {
> >                             if (strcmp($3, "tap") != 0 &&
> >                                 (priv_getiftype($3, type, NULL) == -1 ||
> > @@ -617,7 +623,53 @@ iface_opts_c   : iface_opts_c iface_opts o
> >             | iface_opts
> >             ;
> >  
> > -iface_opts : SWITCH string                 {
> > +iface_opts : ADDRESS STRING STRING                 {
> > +                   struct vmop_address     *vma;
> > +                   unsigned int             i = vcp_nnics;
> > +                   struct address           addr, gw;
> > +                   char                    *gwp = NULL;
> > +                   int                      maxprefixlen = 0;
> > +
> > +                   /* Does the gateway have a /prefix syntax? */
> > +                   gwp = strrchr($3, '/');
> > +
> > +                   if (host($2, &addr) == -1 ||
> > +                       host($3, &gw) == -1 ||
> > +                       addr.ss.ss_family != gw.ss.ss_family) {
> > +                           yyerror("invalid address: %s %s", $2, $3);
> > +                           free($2);
> > +                           free($3);
> > +                           YYERROR;
> > +                   }
> > +                   free($2);
> > +
> > +                   if (gwp == NULL)
> > +                           gw.prefixlen = addr.prefixlen;
> > +                   else if (gw.prefixlen != addr.prefixlen) {
> > +                           yyerror("mismatched gateway prefix: %s", $3);
> > +                           free($3);
> > +                           YYERROR;
> > +                   }
> > +                   free($3);
> > +
> > +                   if (addr.ss.ss_family == AF_INET) {
> > +                           vmc.vmc_ifflags[i] |= VMIFF_ADDR4;
> > +                           vma = &vmc.vmc_ifaddr4[i];
> > +                           maxprefixlen = 31;
> > +                   } else {
> > +                           vmc.vmc_ifflags[i] |= VMIFF_ADDR6;
> > +                           vma = &vmc.vmc_ifaddr6[i];
> > +                           maxprefixlen = 127;
> > +                   }
> > +                   if (maxprefixlen && addr.prefixlen > maxprefixlen) {
> > +                           yyerror("address prefix larger than /%d", maxprefixlen);
> > +                           YYERROR;
> > +                   }
> > +                   memcpy(&vma->vma_addr, &addr.ss, sizeof(addr.ss));
> > +                   memcpy(&vma->vma_gw, &gw.ss, sizeof(gw.ss));
> > +                   vma->vma_prefixlen = addr.prefixlen;
> > +           }
> > +           | SWITCH string                 {
> >                     unsigned int    i = vcp_nnics;
> >  
> >                     /* No need to check if the switch exists */
> > @@ -763,6 +815,7 @@ lookup(char *s)
> >     /* this has to be sorted always */
> >     static const struct keywords keywords[] = {
> >             { "add",                ADD },
> > +           { "address",            ADDRESS },
> >             { "allow",              ALLOW },
> >             { "boot",               BOOT },
> >             { "cdrom",              CDROM },
> > Index: usr.sbin/vmd/priv.c
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/priv.c,v
> > retrieving revision 1.15
> > diff -u -p -u -p -r1.15 priv.c
> > --- usr.sbin/vmd/priv.c     28 Jun 2019 13:32:51 -0000      1.15
> > +++ usr.sbin/vmd/priv.c     25 Oct 2019 18:11:05 -0000
> > @@ -46,6 +46,12 @@
> >  #include "proc.h"
> >  #include "vmd.h"
> >  
> > +static unsigned int *priv_ifunits;
> > +static size_t priv_nifunits;
> > +
> > +static struct vmd_ifconfig *priv_ifs;
> > +static size_t priv_nifs;
> > +
> >  int         priv_dispatch_parent(int, struct privsep_proc *, struct imsg *);
> >  void        priv_run(struct privsep *, struct privsep_proc *, void *);
> >  
> > @@ -91,7 +97,9 @@ priv_dispatch_parent(int fd, struct priv
> >     struct ifaliasreq        ifra;
> >     struct in6_aliasreq      in6_ifra;
> >     struct if_afreq          ifar;
> > +   struct vmd_ifconfig      vifc;
> >     char                     type[IF_NAMESIZE];
> > +   int                      i;
> >  
> >     switch (imsg->hdr.type) {
> >     case IMSG_VMDOP_PRIV_IFDESCR:
> > @@ -112,6 +120,8 @@ priv_dispatch_parent(int fd, struct priv
> >                     fatalx("%s: rejected priv operation on interface: %s",
> >                         __func__, vfr.vfr_name);
> >             break;
> > +   case IMSG_VMDOP_IF_REGISTER:
> > +   case IMSG_VMDOP_IF_UNREGISTER:
> >     case IMSG_VMDOP_CONFIG:
> >     case IMSG_CTL_RESET:
> >             break;
> > @@ -244,6 +254,18 @@ priv_dispatch_parent(int fd, struct priv
> >             if (ioctl(env->vmd_fd6, SIOCAIFADDR_IN6, &in6_ifra) == -1)
> >                     log_warn("SIOCAIFADDR_IN6");
> >             break;
> > +   case IMSG_VMDOP_IF_REGISTER:
> > +           IMSG_SIZE_CHECK(imsg, &vifc);
> > +           memcpy(&vifc, imsg->data, sizeof(vifc));
> > +           if (vm_priv_register(ps, &vifc) == -1)
> > +                   fatalx("%s: failed to register interface",
> > +                       __func__);
> > +           break;
> > +   case IMSG_VMDOP_IF_UNREGISTER:
> > +           IMSG_SIZE_CHECK(imsg, &i);
> > +           memcpy(&i, imsg->data, sizeof(i));
> > +           vm_priv_unregister(ps, imsg->hdr.peerid, i);
> > +           break;
> >     case IMSG_VMDOP_CONFIG:
> >             config_getconfig(env, imsg);
> >             break;
> > @@ -328,8 +350,9 @@ vm_priv_ifconfig(struct privsep *ps, str
> >     struct vmd_switch       *vsw;
> >     unsigned int             i;
> >     struct vmop_ifreq        vfr, vfbr;
> > -   struct sockaddr_in      *sin4;
> > -   struct sockaddr_in6     *sin6;
> > +   struct sockaddr_in      *sin4, *mask4;
> > +   struct sockaddr_in6     *sin6, *mask6;
> > +   uint8_t                  prefixlen;
> >  
> >     for (i = 0; i < VMM_MAX_NICS_PER_VM; i++) {
> >             vif = &vm->vm_ifs[i];
> > @@ -435,24 +458,25 @@ vm_priv_ifconfig(struct privsep *ps, str
> >                     memset(&vfr.vfr_mask, 0, sizeof(vfr.vfr_mask));
> >                     memset(&vfr.vfr_addr, 0, sizeof(vfr.vfr_addr));
> >  
> > -                   /* local IPv4 address with a /31 mask */
> > -                   sin4 = (struct sockaddr_in *)&vfr.vfr_mask;
> > -                   sin4->sin_family = AF_INET;
> > -                   sin4->sin_len = sizeof(*sin4);
> > -                   sin4->sin_addr.s_addr = htonl(0xfffffffe);
> > +                   /* local IPv4 address and netmask */
> > +                   mask4 = ss2sin(&vfr.vfr_mask);
> > +                   mask4->sin_family = AF_INET;
> > +                   mask4->sin_len = sizeof(*mask4);
> >  
> > -                   sin4 = (struct sockaddr_in *)&vfr.vfr_addr;
> > +                   sin4 = ss2sin(&vfr.vfr_addr);
> >                     sin4->sin_family = AF_INET;
> >                     sin4->sin_len = sizeof(*sin4);
> >                     if ((sin4->sin_addr.s_addr =
> >                         vm_priv_addr(&env->vmd_cfg,
> > -                       vm->vm_vmid, i, 0)) == 0)
> > +                       vm->vm_vmid, i, 0, &mask4->sin_addr)) == 0)
> >                             return (-1);
> >  
> >                     inet_ntop(AF_INET, &sin4->sin_addr,
> >                         name, sizeof(name));
> > -                   log_debug("%s: interface %s address %s/31",
> > -                       __func__, vfr.vfr_name, name);
> > +                   prefixlen = mask2prefixlen((struct sockaddr *)mask4);
> > +
> > +                   log_debug("%s: interface %s address %s/%u",
> > +                       __func__, vfr.vfr_name, name, prefixlen);
> >  
> >                     proc_compose(ps, PROC_PRIV, IMSG_VMDOP_PRIV_IFADDR,
> >                         &vfr, sizeof(vfr));
> > @@ -462,24 +486,24 @@ vm_priv_ifconfig(struct privsep *ps, str
> >                     memset(&vfr.vfr_mask, 0, sizeof(vfr.vfr_mask));
> >                     memset(&vfr.vfr_addr, 0, sizeof(vfr.vfr_addr));
> >  
> > -                   /* local IPv6 address with a /96 mask */
> > -                   sin6 = ss2sin6(&vfr.vfr_mask);
> > -                   sin6->sin6_family = AF_INET6;
> > -                   sin6->sin6_len = sizeof(*sin6);
> > -                   memset(&sin6->sin6_addr.s6_addr[0], 0xff, 12);
> > -                   memset(&sin6->sin6_addr.s6_addr[12], 0, 4);
> > +                   /* local IPv6 address and netmask */
> > +                   mask6 = ss2sin6(&vfr.vfr_mask);
> > +                   mask6->sin6_family = AF_INET6;
> > +                   mask6->sin6_len = sizeof(*mask6);
> >  
> >                     sin6 = ss2sin6(&vfr.vfr_addr);
> >                     sin6->sin6_family = AF_INET6;
> >                     sin6->sin6_len = sizeof(*sin6);
> >                     if (vm_priv_addr6(&env->vmd_cfg,
> > -                       vm->vm_vmid, i, 0, &sin6->sin6_addr) == -1)
> > +                       vm->vm_vmid, i, 0, &sin6->sin6_addr,
> > +                       &mask6->sin6_addr) == -1)
> >                             return (-1);
> >  
> >                     inet_ntop(AF_INET6, &sin6->sin6_addr,
> >                         name, sizeof(name));
> > -                   log_debug("%s: interface %s address %s/96",
> > -                       __func__, vfr.vfr_name, name);
> > +                   prefixlen = mask2prefixlen6((struct sockaddr *)mask6);
> > +                   log_debug("%s: interface %s address %s/%u",
> > +                       __func__, vfr.vfr_name, name, prefixlen);
> >  
> >                     proc_compose(ps, PROC_PRIV, IMSG_VMDOP_PRIV_IFADDR6,
> >                         &vfr, sizeof(vfr));
> > @@ -543,11 +567,196 @@ vm_priv_brconfig(struct privsep *ps, str
> >     return (0);
> >  }
> >  
> > +static int
> > +priv_if_cmp(const void *a, const void *b)
> > +{
> > +   const struct vmd_ifconfig *vifca = a;
> > +   const struct vmd_ifconfig *vifcb = b;
> > +
> > +   if (vifca->vifc_vmid != vifcb->vifc_vmid)
> > +           return (vifca->vifc_vmid > vifcb->vifc_vmid ? 1 : -1);
> > +   if (vifca->vifc_idx != vifcb->vifc_idx)
> > +           return (vifca->vifc_idx > vifcb->vifc_idx ? 1 : -1);
> > +
> > +   return (0);
> > +}
> > +
> > +static int
> > +priv_ifunit_cmp(const void *a, const void *b)
> > +{
> > +   unsigned int     ia = *(const unsigned int *)a;
> > +   unsigned int     ib = *(const unsigned int *)b;
> > +
> > +   return (ia > ib ? 1 : (ia < ib ? -1 : 0));
> > +}
> > +
> > +unsigned int *
> > +vm_priv_byunit(unsigned int unit)
> > +{
> > +   return (bsearch(&unit, priv_ifunits, priv_nifunits, sizeof(unit),
> > +       priv_ifunit_cmp));
> > +}
> > +
> > +struct vmd_ifconfig *
> > +vm_priv_byid(uint32_t vmid, int idx)
> > +{
> > +   struct vmd_ifconfig      key;
> > +
> > +   key.vifc_vmid = vmid;
> > +   key.vifc_idx = idx;
> > +   return (bsearch(&key, priv_ifs, priv_nifs, sizeof(key), priv_if_cmp));
> > +}
> > +
> > +/*
> > + * Called to register global interface configuration
> > + * - the associated VM id
> > + * - the relative interface index of the VM
> > + * - the fixed tap(4) interface unit (optional)
> > + * - the fixed IP address (optional)
> > + */
> > +int
> > +vm_priv_register(struct privsep *ps, struct vmd_ifconfig *vifc)
> > +{
> > +   struct vmd_ifconfig     *ifc = NULL;
> > +   unsigned int            *ifu;
> > +
> > +   /* Ignore interfaces that don't have any relevant configuration */
> > +   if (vifc->vifc_flags == 0)
> > +           return (0);
> > +
> > +   if (vifc->vifc_vmid == UINT32_MAX) {
> > +           log_warnx("VM id %u too large", vifc->vifc_vmid);
> > +           goto fail;
> > +   }
> > +
> > +   if (vm_priv_byid(vifc->vifc_vmid, vifc->vifc_idx) != NULL) {
> > +           log_warnx("interface vm %u #%u registered twice",
> > +               vifc->vifc_vmid, vifc->vifc_idx);
> > +           goto fail;
> > +   }
> > +
> > +   /* Append new interface */
> > +   if ((ifc = recallocarray(priv_ifs, priv_nifs,
> > +       priv_nifs + 1, sizeof(*ifc))) == NULL) {
> > +           log_warn("failed to grow interface table");
> > +           goto fail;
> > +   }
> > +   priv_ifs = ifc;
> > +   memcpy(&priv_ifs[priv_nifs], vifc, sizeof(*vifc));
> > +   priv_nifs++;
> > +
> > +   /* Sort table */
> > +   qsort(priv_ifs, priv_nifs, sizeof(*ifc), priv_if_cmp);
> > +
> > +   if (vifc->vifc_flags & VMD_IFC_UNIT) {
> > +           if (vifc->vifc_unit == UINT_MAX) {
> > +                   log_warnx("interface tap%u unit too large",
> > +                       vifc->vifc_unit);
> > +                   goto fail;
> > +           }
> > +
> > +           if (vm_priv_byunit(vifc->vifc_unit) != NULL) {
> > +                   log_warnx("interface tap%u defined twice",
> > +                       vifc->vifc_unit);
> > +                   goto fail;
> > +           }
> > +
> > +           /* Append new interface unit */
> > +           if ((ifu = recallocarray(priv_ifunits, priv_nifunits,
> > +               priv_nifunits + 1, sizeof(*ifu))) == NULL) {
> > +                   log_warn("failed to grow interface unit table");
> > +                   goto fail;
> > +           }
> > +           priv_ifunits = ifu;
> > +           priv_ifunits[priv_nifunits++] = vifc->vifc_unit;
> > +
> > +           /* Sort table */
> > +           qsort(priv_ifunits, priv_nifunits, sizeof(*ifu),
> > +              priv_ifunit_cmp);
> > +
> > +           log_debug("%s: %s registered interface tap%u", __func__,
> > +               ps->ps_title[privsep_process],
> > +               vifc->vifc_unit);
> > +   }
> > +
> > +   return (0);
> > +
> > + fail:
> > +   if (ifc != NULL)
> > +           vm_priv_unregister(ps, vifc->vifc_vmid, vifc->vifc_idx);
> > +   return (-1);
> > +}
> > +
> > +/*
> > + * Called to unregister global interface configuration
> > + */
> > +void
> > +vm_priv_unregister(struct privsep *ps, uint32_t vmid, int idx)
> > +{
> > +   struct vmd_ifconfig      *vifc, *ifc;
> > +   unsigned int            *ifu;
> > +
> > +   if ((vifc = vm_priv_byid(vmid, idx)) == NULL)
> > +           return;
> > +
> > +   if (vifc->vifc_flags & VMD_IFC_UNIT &&
> > +       (ifu = vm_priv_byunit(vifc->vifc_unit)) != NULL) {
> > +           /* Move entry to the end */
> > +           *ifu = UINT_MAX;
> > +           qsort(priv_ifunits, priv_nifunits, sizeof(*ifu),
> > +              priv_ifunit_cmp);
> > +
> > +           /* and remove last entry from the table */
> > +           if ((ifu = recallocarray(priv_ifunits, priv_nifunits,
> > +               priv_nifunits - 1, sizeof(*ifu))) == NULL &&
> > +               priv_nifunits > 1) {
> > +                   log_warn("failed to shrink interface unit table");
> > +                   return;
> > +           }
> > +           priv_ifunits = ifu;
> > +           priv_nifunits--;
> > +
> > +           log_debug("%s: %s unregistered interface tap%u", __func__,
> > +               ps->ps_title[privsep_process],
> > +               vifc->vifc_unit);
> > +   }
> > +
> > +   /* Move entry to the end */
> > +   vifc->vifc_vmid = UINT32_MAX;
> > +   qsort(priv_ifs, priv_nifs, sizeof(*ifc), priv_if_cmp);
> > +
> > +   /* and remove last entry from the table */
> > +   if ((ifc = recallocarray(priv_ifs, priv_nifs,
> > +       priv_nifs - 1, sizeof(*ifc))) == NULL &&
> > +       priv_nifs > 1) {
> > +           log_warn("failed to shrink interface table");
> > +           return;
> > +   }
> > +   priv_ifs = ifc;
> > +   priv_nifs--;
> > +
> > +   log_debug("%s: %s unregistered interface vm %u #%u", __func__,
> > +       ps->ps_title[privsep_process], vmid, idx);
> > +}
> > +
> >  uint32_t
> > -vm_priv_addr(struct vmd_config *cfg, uint32_t vmid, int idx, int isvm)
> > +vm_priv_addr(struct vmd_config *cfg, uint32_t vmid, int idx, int isvm,
> > +    struct in_addr *mask)
> >  {
> >     struct address          *h = &cfg->cfg_localprefix;
> > -   in_addr_t                prefix, mask, addr;
> > +   in_addr_t                prefix, addr;
> > +   struct vmd_ifconfig     *vifc;
> > +
> > +   /* Check if there is a preconfigured address for this interface */
> > +   if ((vifc = vm_priv_byid(vmid, idx)) != NULL &&
> > +       vifc->vifc_flags & VMD_IFC_ADDR4) {
> > +           if (isvm)
> > +                   addr = vifc->vifc_addr4.sin_addr.s_addr;
> > +           else
> > +                   addr = vifc->vifc_gw4.sin_addr.s_addr;
> > +           mask->s_addr = prefixlen2mask(vifc->vifc_prefixlen4);
> > +           return (addr);
> > +   }
> >  
> >     /*
> >      * 1. Set the address prefix and mask, 100.64.0.0/10 by default.
> > @@ -556,7 +765,7 @@ vm_priv_addr(struct vmd_config *cfg, uin
> >         h->prefixlen < 0 || h->prefixlen > 32)
> >             fatal("local prefix");
> >     prefix = ss2sin(&h->ss)->sin_addr.s_addr;
> > -   mask = prefixlen2mask(h->prefixlen);
> > +   mask->s_addr = prefixlen2mask(h->prefixlen);
> >  
> >     /* 2. Encode the VM ID as a per-VM subnet range N, 100.64.N.0/24. */
> >     addr = vmid << 8;
> > @@ -580,7 +789,7 @@ vm_priv_addr(struct vmd_config *cfg, uin
> >      * - the address should not exceed the prefix (eg. VM ID to high).
> >      * - up to 126 interfaces can be encoded per VM.
> >      */
> > -   if (prefix != (addr & mask) || idx >= 0x7f) {
> > +   if (prefix != (addr & mask->s_addr) || idx >= 0x7f) {
> >             log_warnx("%s: dhcp address range exceeded,"
> >                 " vm id %u interface %d", __func__, vmid, idx);
> >             return (0);
> > @@ -591,21 +800,35 @@ vm_priv_addr(struct vmd_config *cfg, uin
> >  
> >  int
> >  vm_priv_addr6(struct vmd_config *cfg, uint32_t vmid,
> > -    int idx, int isvm, struct in6_addr *in6_addr)
> > +    int idx, int isvm, struct in6_addr *in6_addr, struct in6_addr *mask)
> >  {
> >     struct address          *h = &cfg->cfg_localprefix6;
> > -   struct in6_addr          addr, mask;
> > +   struct in6_addr          addr, *addrptr;
> > +   struct vmd_ifconfig     *vifc;
> >     uint32_t                 addr4;
> > +   struct in_addr           mask4;
> > +
> > +   /* Check if there is a preconfigured address for this interface */
> > +   if ((vifc = vm_priv_byid(vmid, idx)) != NULL &&
> > +       vifc->vifc_flags & VMD_IFC_ADDR6) {
> > +           if (isvm)
> > +                   addrptr = &vifc->vifc_addr6.sin6_addr;
> > +           else
> > +                   addrptr = &vifc->vifc_gw6.sin6_addr;
> > +           memcpy(in6_addr, addrptr, sizeof(*in6_addr));
> > +           prefixlen2mask6(vifc->vifc_prefixlen6, mask);
> > +           return (0);
> > +   }
> >  
> >     /* 1. Set the address prefix and mask, fd00::/8 by default. */
> >     if (h->ss.ss_family != AF_INET6 ||
> >         h->prefixlen < 0 || h->prefixlen > 128)
> >             fatal("local prefix6");
> >     addr = ss2sin6(&h->ss)->sin6_addr;
> > -   prefixlen2mask6(h->prefixlen, &mask);
> > +   prefixlen2mask6(h->prefixlen, mask);
> >  
> >     /* 2. Encode the VM IPv4 address as subnet, fd00::NN:NN:0:0/96. */
> > -   if ((addr4 = vm_priv_addr(cfg, vmid, idx, 1)) == 0)
> > +   if ((addr4 = vm_priv_addr(cfg, vmid, idx, 1, &mask4)) == 0)
> >             return (0);
> >     memcpy(&addr.s6_addr[8], &addr4, sizeof(addr4));
> >  
> > Index: usr.sbin/vmd/vm.conf.5
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/vm.conf.5,v
> > retrieving revision 1.44
> > diff -u -p -u -p -r1.44 vm.conf.5
> > --- usr.sbin/vmd/vm.conf.5  14 May 2019 12:47:17 -0000      1.44
> > +++ usr.sbin/vmd/vm.conf.5  25 Oct 2019 18:11:05 -0000
> > @@ -209,6 +209,14 @@ to select a specific one.
> >  .Pp
> >  Valid options are:
> >  .Bl -tag -width Ds
> > +.It Cm address Ar address Ns Li / Ns Ar prefix Ar gateway
> > +If the interface is configured as a
> > +.Cm local
> > +interface,
> > +use a static IP address and gateway.
> > +This option can be specified for IPv4 and for IPv6.
> > +If not specified, the default is to auto-generate the address pair using the
> > +.Cm local Oo Cm inet6 Oc Cm prefix .
> >  .It Cm group Ar group-name
> >  Assign the interface to a specific interface
> >  .Dq group .
> > @@ -258,6 +266,8 @@ A
> >  interface will auto-generate an IPv4 subnet for the interface,
> >  configure a gateway address on the VM host side,
> >  and run a simple DHCP/BOOTP server for the VM.
> > +The address can optionally be configured as a static
> > +.Cm address .
> >  This option can be used for layer 3 mode without configuring a switch.
> >  .Pp
> >  If the global
> > Index: usr.sbin/vmd/vmd.c
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/vmd.c,v
> > retrieving revision 1.116
> > diff -u -p -u -p -r1.116 vmd.c
> > --- usr.sbin/vmd/vmd.c      4 Sep 2019 07:02:03 -0000       1.116
> > +++ usr.sbin/vmd/vmd.c      25 Oct 2019 18:11:05 -0000
> > @@ -1161,6 +1161,8 @@ void
> >  vm_remove(struct vmd_vm *vm, const char *caller)
> >  {
> >     struct privsep  *ps = &env->vmd_ps;
> > +   size_t           i;
> > +   int              idx;
> >  
> >     if (vm == NULL)
> >             return;
> > @@ -1171,6 +1173,16 @@ vm_remove(struct vmd_vm *vm, const char 
> >  
> >     TAILQ_REMOVE(env->vmd_vms, vm, vm_entry);
> >  
> > +   for (i = 0; i < vm->vm_params.vmc_params.vcp_nnics; i++) {
> > +           idx = (int)i;
> > +           vm_priv_unregister(ps, vm->vm_vmid, idx);
> > +           if (privsep_process == PROC_PARENT) {
> > +                   proc_compose_imsg(ps, PROC_PRIV, -1,
> > +                       IMSG_VMDOP_IF_UNREGISTER,
> > +                       vm->vm_vmid, -1, &idx, sizeof(idx));
> > +           }
> > +   }
> > +
> >     user_put(vm->vm_user);
> >     vm_stop(vm, 0, caller);
> >     free(vm);
> > @@ -1211,14 +1223,17 @@ int
> >  vm_register(struct privsep *ps, struct vmop_create_params *vmc,
> >      struct vmd_vm **ret_vm, uint32_t id, uid_t uid)
> >  {
> > -   struct vmd_vm           *vm = NULL, *vm_parent = NULL;
> > +   char                     ifname[IF_NAMESIZE], *s;
> > +   struct vmd_vm           *vm = NULL, *vm_new = NULL, *vm_parent = NULL;
> >     struct vm_create_params *vcp = &vmc->vmc_params;
> >     struct vmop_owner       *vmo = NULL;
> > +   struct vmop_address     *vma;
> >     struct vmd_user         *usr = NULL;
> > +   struct vmd_ifconfig      vifc;
> > +   int                      maxprefixlen;
> >     uint32_t                 nid, rng;
> >     unsigned int             i, j;
> >     struct vmd_switch       *sw;
> > -   char                    *s;
> >  
> >     /* Check if this is an instance of another VM */
> >     if (vm_instance(ps, &vm_parent, vmc, uid) == -1)
> > @@ -1294,7 +1309,7 @@ vm_register(struct privsep *ps, struct v
> >             goto fail;
> >     }
> >  
> > -   if ((vm = calloc(1, sizeof(*vm))) == NULL)
> > +   if ((vm = vm_new = calloc(1, sizeof(*vm))) == NULL)
> >             goto fail;
> >  
> >     memcpy(&vm->vm_params, vmc, sizeof(vm->vm_params));
> > @@ -1305,6 +1320,20 @@ vm_register(struct privsep *ps, struct v
> >     vm->vm_receive_fd = -1;
> >     vm->vm_state &= ~VM_STATE_PAUSED;
> >     vm->vm_user = usr;
> > +   vm->vm_kernel = -1;
> > +   vm->vm_cdrom = -1;
> > +   vm->vm_iev.ibuf.fd = -1;
> > +
> > +   /*
> > +    * Assign a new internal Id if not specified and we succeed in
> > +    * claiming a new Id.
> > +    */
> > +   if (id != 0)
> > +           vm->vm_vmid = id;
> > +   else if (vm_claimid(vcp->vcp_name, uid, &nid) == -1)
> > +           goto fail;
> > +   else
> > +           vm->vm_vmid = nid;
> >  
> >     for (i = 0; i < VMM_MAX_DISKS_PER_VM; i++)
> >             for (j = 0; j < VM_MAX_BASE_PER_DISK; j++)
> > @@ -1333,30 +1362,69 @@ vm_register(struct privsep *ps, struct v
> >                     vcp->vcp_macs[i][4] = rng;
> >                     vcp->vcp_macs[i][5] = rng >> 8;
> >             }
> > -   }
> > -   vm->vm_kernel = -1;
> > -   vm->vm_cdrom = -1;
> > -   vm->vm_iev.ibuf.fd = -1;
> >  
> > -   /*
> > -    * Assign a new internal Id if not specified and we succeed in
> > -    * claiming a new Id.
> > -    */
> > -   if (id != 0)
> > -           vm->vm_vmid = id;
> > -   else if (vm_claimid(vcp->vcp_name, uid, &nid) == -1)
> > -           goto fail;
> > -   else
> > -           vm->vm_vmid = nid;
> > +           /*
> > +            * Store interface in global configuration table
> > +            */
> > +           memset(&vifc, 0, sizeof(vifc));
> > +
> > +           /* Get and check pre-configured interface name */
> > +           s = vmc->vmc_ifnames[i];
> > +           if (*s != '\0' && strcmp("tap", s) != 0 &&
> > +               priv_getiftype(s, ifname, &vifc.vifc_unit) != -1)
> > +                   vifc.vifc_flags |= VMD_IFC_UNIT;
> > +
> > +           maxprefixlen = 0;
> > +           if (vmc->vmc_ifflags[i] & VMIFF_ADDR4) {
> > +                   vma = &vmc->vmc_ifaddr4[i];
> > +                   memcpy(&vifc.vifc_addr4, &vma->vma_addr,
> > +                       sizeof(vifc.vifc_addr4));
> > +                   memcpy(&vifc.vifc_gw4, &vma->vma_gw,
> > +                       sizeof(vifc.vifc_gw4));
> > +                   vifc.vifc_prefixlen4 = vma->vma_prefixlen;
> > +                   vifc.vifc_flags |= VMD_IFC_ADDR4;
> > +                   maxprefixlen = 31;
> > +           }
> > +           if (vmc->vmc_ifflags[i] & VMIFF_ADDR6) {
> > +                   vma = &vmc->vmc_ifaddr6[i];
> > +                   memcpy(&vifc.vifc_addr6, &vma->vma_addr,
> > +                       sizeof(vifc.vifc_addr6));
> > +                   memcpy(&vifc.vifc_gw6, &vma->vma_gw,
> > +                       sizeof(vifc.vifc_gw6));
> > +                   vifc.vifc_prefixlen6 = vma->vma_prefixlen;
> > +                   vifc.vifc_flags |= VMD_IFC_ADDR6;
> > +                   maxprefixlen = 127;
> > +           }
> > +           if (maxprefixlen && vma->vma_prefixlen > maxprefixlen) {
> > +                   log_warnx("address prefix larger than /%d",
> > +                       maxprefixlen);
> > +                   goto fail;
> > +           }
> > +
> > +           vifc.vifc_vmid = vm->vm_vmid;
> > +           vifc.vifc_idx = i;
> > +
> > +           if (vm_priv_register(ps, &vifc) == -1)
> > +                   goto fail;
> > +
> > +           if (privsep_process == PROC_PARENT) {
> > +                   proc_compose_imsg(ps, PROC_PRIV, -1,
> > +                       IMSG_VMDOP_IF_REGISTER, -1, -1,
> > +                       &vifc, sizeof(vifc));
> > +           }
> > +   }
> >  
> >     log_debug("%s: registering vm %d", __func__, vm->vm_vmid);
> >     TAILQ_INSERT_TAIL(env->vmd_vms, vm, vm_entry);
> >  
> >     *ret_vm = vm;
> >     return (0);
> > +
> >   fail:
> > +   free(vm_new);
> >     if (errno == 0)
> >             errno = EINVAL;
> > +
> >     return (-1);
> >  }
> >  
> > @@ -1956,6 +2024,71 @@ get_string(uint8_t *ptr, size_t len)
> >                     break;
> >  
> >     return strndup(ptr, i);
> > +}
> > +
> > +uint8_t
> > +mask2prefixlen(struct sockaddr *sa)
> > +{
> > +   struct sockaddr_in      *sa_in = (struct sockaddr_in *)sa;
> > +   in_addr_t                ina = sa_in->sin_addr.s_addr;
> > +
> > +   if (ina == 0)
> > +           return (0);
> > +   else
> > +           return (33 - ffs(ntohl(ina)));
> > +}
> > +
> > +uint8_t
> > +mask2prefixlen6(struct sockaddr *sa)
> > +{
> > +   struct sockaddr_in6     *sa_in6 = (struct sockaddr_in6 *)sa;
> > +   uint8_t                 *ap, *ep;
> > +   unsigned int             l = 0;
> > +
> > +   /*
> > +    * sin6_len is the size of the sockaddr so subtract the offset of
> > +    * the possibly truncated sin6_addr struct.
> > +    */
> > +   ap = (uint8_t *)&sa_in6->sin6_addr;
> > +   ep = (uint8_t *)sa_in6 + sa_in6->sin6_len;
> > +   for (; ap < ep; ap++) {
> > +           /* this "beauty" is adopted from sbin/route/show.c ... */
> > +           switch (*ap) {
> > +           case 0xff:
> > +                   l += 8;
> > +                   break;
> > +           case 0xfe:
> > +                   l += 7;
> > +                   goto done;
> > +           case 0xfc:
> > +                   l += 6;
> > +                   goto done;
> > +           case 0xf8:
> > +                   l += 5;
> > +                   goto done;
> > +           case 0xf0:
> > +                   l += 4;
> > +                   goto done;
> > +           case 0xe0:
> > +                   l += 3;
> > +                   goto done;
> > +           case 0xc0:
> > +                   l += 2;
> > +                   goto done;
> > +           case 0x80:
> > +                   l += 1;
> > +                   goto done;
> > +           case 0x00:
> > +                   goto done;
> > +           default:
> > +                   fatalx("non contiguous inet6 netmask");
> > +           }
> > +   }
> > +
> > +done:
> > +   if (l > sizeof(struct in6_addr) * 8)
> > +           fatalx("%s: prefixlen %d out of bound", __func__, l);
> > +   return (l);
> >  }
> >  
> >  uint32_t
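For anyone checking the two mask helpers above, here is a standalone sketch of the same conversions. It is not the patch code: v4_prefixlen() and v6_prefixlen() are hypothetical names, the IPv6 variant walks all 16 bytes of a struct in6_addr instead of relying on sin6_len (which not every platform has), and it returns -1 for a non-contiguous mask where the patch calls fatalx().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>

/*
 * IPv4: 33 minus the 1-based position of the lowest set bit of the
 * host-order mask, as in mask2prefixlen(). Like the patch, this does
 * not detect non-contiguous IPv4 masks.
 */
static int
v4_prefixlen(struct in_addr mask)
{
	uint32_t m = ntohl(mask.s_addr);

	if (m == 0)
		return (0);
	return (33 - ffs((int)m));
}

/*
 * IPv6: count leading one bits byte by byte. Returns -1 for a
 * non-contiguous mask (hypothetical; the patch calls fatalx()).
 */
static int
v6_prefixlen(const struct in6_addr *mask)
{
	int	 l = 0, done = 0;
	size_t	 i;

	for (i = 0; i < sizeof(mask->s6_addr); i++) {
		uint8_t b = mask->s6_addr[i];

		if (done) {
			/* every byte after the first partial byte must be 0 */
			if (b != 0)
				return (-1);
			continue;
		}
		if (b == 0xff) {
			l += 8;
			continue;
		}
		/* count the leading one bits of a partial byte */
		while (b & 0x80) {
			l++;
			b = (uint8_t)(b << 1);
		}
		if (b != 0)
			return (-1);	/* a one bit follows a zero bit */
		done = 1;
	}
	return (l);
}
```

With this sketch, 255.255.255.0 maps to 24, ffff:ffff:ffff:ffff:: to 64 and ffff:fe00:: to 23. Unlike mask2prefixlen6(), it also rejects one bits that follow the first zero bit, such as ff00:ff00::.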
> > Index: usr.sbin/vmd/vmd.h
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/vmd.h,v
> > retrieving revision 1.97
> > diff -u -p -u -p -r1.97 vmd.h
> > --- usr.sbin/vmd/vmd.h      7 Sep 2019 09:11:14 -0000       1.97
> > +++ usr.sbin/vmd/vmd.h      25 Oct 2019 18:11:06 -0000
> > @@ -119,6 +119,8 @@ enum imsg_type {
> >     IMSG_VMDOP_PRIV_IFRDOMAIN,
> >     IMSG_VMDOP_VM_SHUTDOWN,
> >     IMSG_VMDOP_VM_REBOOT,
> > +   IMSG_VMDOP_IF_REGISTER,
> > +   IMSG_VMDOP_IF_UNREGISTER,
> >     IMSG_VMDOP_CONFIG,
> >     IMSG_VMDOP_DONE
> >  };
> > @@ -160,6 +162,12 @@ struct vmop_owner {
> >     int64_t                  gid;
> >  };
> >  
> > +struct vmop_address {
> > +   struct sockaddr_storage  vma_addr;
> > +   struct sockaddr_storage  vma_gw;
> > +   int                      vma_prefixlen;
> > +};
> > +
> >  struct vmop_create_params {
> >     struct vm_create_params  vmc_params;
> >     unsigned int             vmc_flags;
> > @@ -185,7 +193,10 @@ struct vmop_create_params {
> >  #define VMIFF_LOCKED               0x02
> >  #define VMIFF_LOCAL                0x04
> >  #define VMIFF_RDOMAIN              0x08
> > -#define VMIFF_OPTMASK              (VMIFF_LOCKED|VMIFF_LOCAL|VMIFF_RDOMAIN)
> > +#define VMIFF_ADDR4                0x10
> > +#define VMIFF_ADDR6                0x20
> > +#define VMIFF_OPTMASK              \
> > +   (VMIFF_LOCKED|VMIFF_LOCAL|VMIFF_RDOMAIN|VMIFF_ADDR4|VMIFF_ADDR6)
> >  
> >     unsigned int             vmc_disktypes[VMM_MAX_DISKS_PER_VM];
> >     unsigned int             vmc_diskbases[VMM_MAX_DISKS_PER_VM];
> > @@ -196,6 +207,8 @@ struct vmop_create_params {
> >     char                     vmc_ifswitch[VMM_MAX_NICS_PER_VM][VM_NAME_MAX];
> >     char                     vmc_ifgroup[VMM_MAX_NICS_PER_VM][IF_NAMESIZE];
> >     unsigned int             vmc_ifrdomain[VMM_MAX_NICS_PER_VM];
> > +   struct vmop_address      vmc_ifaddr4[VMM_MAX_NICS_PER_VM];
> > +   struct vmop_address      vmc_ifaddr6[VMM_MAX_NICS_PER_VM];
> >     struct vmop_owner        vmc_owner;
> >  
> >     /* instance template params */
> > @@ -315,6 +328,26 @@ struct address {
> >  };
> >  TAILQ_HEAD(addresslist, address);
> >  
> > +struct vmd_ifconfig {
> > +   uint32_t                 vifc_vmid;     /* associated VM id */
> > +   unsigned int             vifc_idx;      /* relative interface index */
> > +
> > +   unsigned int             vifc_flags;
> > +#define VMD_IFC_UNIT               0x01            /* has interface tap(4) unit */
> > +#define VMD_IFC_ADDR4              0x02            /* has IPv4 address */
> > +#define VMD_IFC_ADDR6              0x04            /* has IPv6 address */
> > +
> > +   unsigned int             vifc_unit;
> > +
> > +   struct sockaddr_in       vifc_addr4;
> > +   struct sockaddr_in       vifc_gw4;
> > +   int                      vifc_prefixlen4;
> > +
> > +   struct sockaddr_in6      vifc_addr6;
> > +   struct sockaddr_in6      vifc_gw6;
> > +   int                      vifc_prefixlen6;
> > +};
> > +
> >  struct vmd_config {
> >     unsigned int             cfg_flags;
> >  #define VMD_CFG_INET6              0x01
> > @@ -391,6 +424,7 @@ void     vm_stop(struct vmd_vm *, int, cons
> >  void        vm_remove(struct vmd_vm *, const char *);
> >  int         vm_register(struct privsep *, struct vmop_create_params *,
> >         struct vmd_vm **, uint32_t, uid_t);
> > +void        vm_priv_unregister(struct privsep *, uint32_t, int);
> >  int         vm_checkperm(struct vmd_vm *, struct vmop_owner *, uid_t);
> >  int         vm_checkaccess(int, unsigned int, uid_t, int);
> >  int         vm_opentty(struct vmd_vm *);
> > @@ -402,6 +436,8 @@ void     user_put(struct vmd_user *);
> >  void        user_inc(struct vm_create_params *, struct vmd_user *, int);
> >  int         user_checklimit(struct vmd_user *, struct vm_create_params *);
> >  char       *get_string(uint8_t *, size_t);
> > +uint8_t     mask2prefixlen(struct sockaddr *);
> > +uint8_t     mask2prefixlen6(struct sockaddr *);
> >  uint32_t prefixlen2mask(uint8_t);
> >  void        prefixlen2mask6(u_int8_t, struct in6_addr *);
> >  void        getmonotime(struct timeval *);
> > @@ -411,11 +447,15 @@ void   priv(struct privsep *, struct priv
> >  int         priv_getiftype(char *, char *, unsigned int *);
> >  int         priv_findname(const char *, const char **);
> >  int         priv_validgroup(const char *);
> > +int         vm_priv_register(struct privsep *, struct vmd_ifconfig *);
> >  int         vm_priv_ifconfig(struct privsep *, struct vmd_vm *);
> >  int         vm_priv_brconfig(struct privsep *, struct vmd_switch *);
> > -uint32_t vm_priv_addr(struct vmd_config *, uint32_t, int, int);
> > +uint32_t vm_priv_addr(struct vmd_config *, uint32_t, int, int,
> > +       struct in_addr *);
> >  int         vm_priv_addr6(struct vmd_config *, uint32_t, int, int,
> > -       struct in6_addr *);
> > +       struct in6_addr *, struct in6_addr *);
> > +unsigned int *vm_priv_byunit(unsigned int);
> > +struct vmd_ifconfig *vm_priv_byid(uint32_t, int);
> >  
> >  /* vmm.c */
> >  struct iovec;
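To make the address/gateway rules concrete, here is a sketch of the IPv4 checks. It assumes the intent of the maxprefixlen test in vm_register() (the prefix must leave room for two addresses, one on the host tap(4) and one on the VM, so at most /31 for IPv4) plus the rule that the VM address and the gateway share the same prefix. check_addr4() and mask4() are hypothetical names, and rejecting identical address and gateway is an added assumption, not something the diff does.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical helper: host-order netmask for a prefix length 0..32. */
static uint32_t
mask4(int prefixlen)
{
	if (prefixlen <= 0)
		return (0);
	return (0xffffffffU << (32 - prefixlen));
}

/*
 * Hypothetical validation of an "address/prefix gateway" pair:
 * returns 0 if acceptable, -1 otherwise.
 */
static int
check_addr4(const char *addr, const char *gw, int prefixlen)
{
	struct in_addr	 a, g;
	uint32_t	 m;

	/* two hosts (tap(4) and vio(4)) need at least 1 host bit */
	if (prefixlen < 0 || prefixlen > 31)
		return (-1);
	if (inet_pton(AF_INET, addr, &a) != 1 ||
	    inet_pton(AF_INET, gw, &g) != 1)
		return (-1);
	/* the VM address and the gateway must be in the same prefix */
	m = mask4(prefixlen);
	if ((ntohl(a.s_addr) & m) != (ntohl(g.s_addr) & m))
		return (-1);
	/* assumption: host and VM must not share one address */
	if (a.s_addr == g.s_addr)
		return (-1);
	return (0);
}
```

Under these rules the "foo" example (192.168.0.10/24 with gateway 192.168.0.1) passes, while a gateway of 192.168.1.1 or a /32 prefix is rejected.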
> > Index: usr.sbin/vmd/vmm.c
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/vmm.c,v
> > retrieving revision 1.94
> > diff -u -p -u -p -r1.94 vmm.c
> > --- usr.sbin/vmd/vmm.c      25 Oct 2019 09:57:33 -0000      1.94
> > +++ usr.sbin/vmd/vmm.c      25 Oct 2019 18:11:06 -0000
> > @@ -602,6 +602,9 @@ opentap(char *ifname)
> >     char path[PATH_MAX];
> >  
> >     for (i = 0; i < MAX_TAP; i++) {
> > +           /* Skip statically configured interface names (eg. tap0) */
> > +           if (vm_priv_byunit(i) != NULL)
> > +                   continue;
> >             snprintf(path, PATH_MAX, "/dev/tap%d", i);
> >             fd = open(path, O_RDWR | O_NONBLOCK);
> >             if (fd != -1) {
> > 
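The opentap() change boils down to the allocation loop below. This is a simulation rather than vmd code: unit_reserved() stands in for vm_priv_byunit(), the busy[] array models open("/dev/tapN") failing, and MAX_TAP = 256 is an assumed value.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TAP	256	/* assumed limit for this sketch */

/*
 * Hypothetical stand-in for vm_priv_byunit(): true if the unit was
 * claimed by a static local interface name such as "tap0".
 */
static bool
unit_reserved(const unsigned int *reserved, size_t nreserved,
    unsigned int unit)
{
	size_t i;

	for (i = 0; i < nreserved; i++)
		if (reserved[i] == unit)
			return (true);
	return (false);
}

/*
 * Pick the first tap unit that is neither statically reserved nor
 * already busy. Returns -1 if no unit is free.
 */
static int
next_dynamic_unit(const unsigned int *reserved, size_t nreserved,
    const bool *busy)
{
	unsigned int i;

	for (i = 0; i < MAX_TAP; i++) {
		/* skip statically configured interface names */
		if (unit_reserved(reserved, nreserved, i))
			continue;
		/* models open() failing because the unit is in use */
		if (busy[i])
			continue;
		return ((int)i);
	}
	return (-1);
}
```

In the vm.conf example, "foo" names both tap0 and tap1, so the dynamic interface of "bar" would end up on tap2, provided that unit is not already in use.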
