On Fri, Oct 25, 2019 at 07:47:35PM +0000, Reyk Floeter wrote:
> On Fri, Oct 25, 2019 at 12:27:25PM -0700, Mike Larkin wrote:
> > On Fri, Oct 25, 2019 at 06:15:59PM +0000, Reyk Floeter wrote:
> > > Hi,
> > >
> > > the attached diff is rather large and implements two things for vmd:
> > >
> > > 1) Allow configuring static IP address/gateway pairs on local interfaces.
> > > 2) Skip statically configured interface names (eg. tap0) when
> > > allocating dynamic interfaces.
> > >
> > > Example:
> > > ---snip---
> > > vm "foo" {
> > > disable
> > > local interface "tap0" {
> > > address 192.168.0.10/24 192.168.0.1
> > > }
> > > local interface "tap1"
> > > disk "/home/vm/foo.qcow2"
> > > }
> > >
> > > vm "bar" {
> > > local interface
> > > disk "/home/vm/bar.qcow2"
> > > }
> > > ---snap---
> > >
> > >
> > > 1) The VM "foo" has two interfaces: The first interface has a fixed
> > > IPv4 address with 192.168.0.1/24 on the gateway and 192.168.0.10/24 on
> > > the VM. 192.168.0.10/24 is assigned to the VM's first NIC via the
> > > built-in DHCP server. The second VM gets a default 100.64.x.x/31 IP.
> >
> > I'm not sure the above description matches what I'm seeing in the vm.conf
> > snippet above.
> >
> > What's "the gateway" here? Is this the host machine, or the actual
> > gateway, perhaps on some other machine? Does this just allow me to specify
> > the host-side tap(4) IP address for a corresponding given VM vio(4)
> > interface?
> >
>
> Ah, OK. I used the terms without explaining them:
>
> With local interfaces, vmd(8) uses two IPs per interface: one for the
> tap(4) on the host, one for the vio(4) on the VM. It configures the
> first one on the host and provides the second one via DHCP. The IP on
> the host is the default "gateway" router for the VM.
>
Ah, I missed the fact that these are not "-i" style interfaces but rather
*local* interfaces (eg, "-L" style).
> The address syntax is currently reversed:
> address "address/prefix" "gateway"
> Maybe I should change it to
> address "gateway" "address/prefix"
> or
> address "address/prefix" gateway "gateway"
I like the last one, but I probably won't be a heavy user of this ...
-ml
>
> I also wonder if we could technically use a non-local IP address for
> the gateway. I currently enforce that the prefix matches, but I don't
> enforce that both addresses are in the same subnet.
>
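For illustration, the missing same-subnet check could look roughly like
the sketch below (IPv4 only). This is not code from the diff: the
sketch_* names are hypothetical, and it takes dotted-quad strings rather
than the parser's struct address.

```c
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper, not part of the diff: host-order IPv4 netmask. */
static uint32_t
sketch_prefix2mask(unsigned int prefixlen)
{
	if (prefixlen == 0)
		return (0);
	if (prefixlen > 32)
		prefixlen = 32;
	return (0xffffffffU << (32 - prefixlen));
}

/*
 * Hypothetical check: returns 1 if addr and gw share the same
 * /prefixlen network, 0 if they do not, -1 if either fails to parse.
 */
static int
sketch_same_subnet4(const char *addr, const char *gw, unsigned int prefixlen)
{
	struct in_addr a, g;
	uint32_t mask = sketch_prefix2mask(prefixlen);

	if (inet_pton(AF_INET, addr, &a) != 1 ||
	    inet_pton(AF_INET, gw, &g) != 1)
		return (-1);

	/* Compare the network parts in host byte order. */
	return ((ntohl(a.s_addr) & mask) == (ntohl(g.s_addr) & mask));
}
```

With the example from the config snippet, 192.168.0.10 and 192.168.0.1
under /24 would pass; a gateway of 192.168.1.1 would be rejected.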
> When using the default auto-generated 100.64.0.0/31 method, it uses
> the first IP in the subnet as the gateway and the second IP for the
> VM.
>
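The "first IP is the gateway, second IP is the VM" rule for a /31 can be
sketched as below. This only illustrates the pairing; it does not
reproduce how vm_priv_addr() encodes the VM id and interface index, and
the sketch_* names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical pair of host-order IPv4 addresses in one /31. */
struct sketch_pair {
	uint32_t gw;	/* host side, configured on tap(4) */
	uint32_t vm;	/* guest side, offered to vio(4) via DHCP */
};

/* Given any address inside a /31, return its gateway/VM pair. */
static struct sketch_pair
sketch_pair31(uint32_t addr)
{
	struct sketch_pair p;

	p.gw = addr & ~(uint32_t)1;	/* first address in the /31 */
	p.vm = p.gw | 1;		/* second address in the /31 */
	return (p);
}
```

For 100.64.1.2/31 (0x64400102) this gives 100.64.1.2 on the host and
100.64.1.3 in the VM.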
> > And did you mean "The second interface" there instead of the "The second
> > VM"?
> > (Although I think the description fits for "The second VM" also...)
> >
>
> Yes, both, the second interface is correct as well.
>
> > I think the idea is sound. As long as we don't end up adding extra command
> > line args to vmctl to manually configure this, which it doesn't appear we
> > are doing here. :)
> >
>
> I don't want to add it to vmctl either.
>
> > I didn't read the diff in great detail, I'll wait until you say you have a
> > final version.
> >
>
> OK, thanks.
>
> Reyk
>
> > -ml
> >
> > > This idea came up when I talked with Mischa at EuroBSDCon about
> > > OpenBSDAms: instead of using L2 and external static dhcpd for all VMs,
> > > it could be a solution to use L3 and to avoid bridge(4) and dhcpd(8).
> > > But it would need a way to serve static IPs via the internal dhcp
> > > server. Using L3 with vmd is better with performance, routing, PF,
> > > etc., but has the drawback that it wastes a subnet and gateway IP per
> > > VM (maybe rdomains or other tricks could help here, but this is a
> > > problem for later).
> > >
> > > 2) The VM "foo" uses two static interface names, tap0 and tap1, and
> > > the VM "bar" uses a dynamic interface name (tapX). Without this diff,
> > > vmd would most certainly use tap0 for bar's interface because foo is
> > > disabled and not started before bar. With the diff, the first
> > > interface of bar will be tap2 or higher.
> > > The problem was just reported by kn@. I mixed both things into
> > > one diff because I was working on 1) when kn@ reported it. There are
> > > other ways to implement 2) but solving both issues in a similar way
> > > made more sense.
> > >
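To make 2) concrete: picking a dynamic tap(4) unit while skipping the
statically reserved ones could look roughly like this. It is a sketch
under assumptions, not the diff's code: the sketch_* names are
hypothetical, the reserved table must already be sorted, and units that
are merely in use by running VMs are ignored here.

```c
#include <stdlib.h>

/* Compare two unsigned units without signed overflow. */
static int
sketch_unit_cmp(const void *a, const void *b)
{
	unsigned int ua = *(const unsigned int *)a;
	unsigned int ub = *(const unsigned int *)b;

	return ((ua > ub) - (ua < ub));
}

/*
 * Return the lowest unit >= start that does not appear in the sorted
 * table of statically configured units.
 */
static unsigned int
sketch_next_free_unit(const unsigned int *reserved, size_t n,
    unsigned int start)
{
	unsigned int unit = start;

	while (n > 0 && bsearch(&unit, reserved, n, sizeof(unit),
	    sketch_unit_cmp) != NULL)
		unit++;
	return (unit);
}
```

With tap0 and tap1 reserved by "foo", a search starting at 0 yields 2,
which matches the "tap2 or higher" behaviour described above.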
> > > This is not the final diff. I still have to clean it up, get
> > > feedback, think a little bit about it, and split it into smaller parts
> > > for review. I wanted to share the big picture.
> > >
> > > As a side note, I implemented the lookup with sorted tables because it
> > > is the most efficient way to do it, but maybe a simple linear lookup
> > > (iterating over all the VMs and all the interfaces all the time) would
> > > be good enough. But the current approach has benefits - if I did it
> > > right ;)
> > >
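For comparison, the two lookup strategies can be sketched side by side.
The struct and function names are hypothetical stand-ins for the
(vmid, idx) key used in the diff; the sorted variant assumes the table
was passed through qsort(3) with the same comparator first.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the (VM id, interface index) key. */
struct sketch_if {
	uint32_t	vmid;
	unsigned int	idx;
};

/* Order by vmid first, then by idx. */
static int
sketch_if_cmp(const void *a, const void *b)
{
	const struct sketch_if *ia = a;
	const struct sketch_if *ib = b;

	if (ia->vmid != ib->vmid)
		return (ia->vmid > ib->vmid ? 1 : -1);
	if (ia->idx != ib->idx)
		return (ia->idx > ib->idx ? 1 : -1);
	return (0);
}

/* O(log n) lookup; tbl must be sorted with sketch_if_cmp. */
static struct sketch_if *
sketch_find_sorted(struct sketch_if *tbl, size_t n, uint32_t vmid,
    unsigned int idx)
{
	struct sketch_if key = { vmid, idx };

	if (n == 0)
		return (NULL);
	return (bsearch(&key, tbl, n, sizeof(key), sketch_if_cmp));
}

/* O(n) lookup; works on an unsorted table. */
static struct sketch_if *
sketch_find_linear(struct sketch_if *tbl, size_t n, uint32_t vmid,
    unsigned int idx)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].vmid == vmid && tbl[i].idx == idx)
			return (&tbl[i]);
	return (NULL);
}
```

The sorted table pays for a qsort(3) on every insert and removal, so the
linear scan may well be good enough for a handful of VMs.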
> > > Thoughts?
> > >
> > > Reyk
> > >
> > > Index: usr.sbin/vmd/dhcp.c
> > > ===================================================================
> > > RCS file: /cvs/src/usr.sbin/vmd/dhcp.c,v
> > > retrieving revision 1.8
> > > diff -u -p -u -p -r1.8 dhcp.c
> > > --- usr.sbin/vmd/dhcp.c 27 Dec 2018 19:51:30 -0000 1.8
> > > +++ usr.sbin/vmd/dhcp.c 25 Oct 2019 18:11:05 -0000
> > > @@ -119,8 +119,7 @@ dhcp_request(struct vionet_dev *dev, cha
> > > }
> > >
> > > if ((client_addr.s_addr =
> > > - vm_priv_addr(&env->vmd_cfg,
> > > - dev->vm_vmid, dev->idx, 1)) == 0)
> > > + vm_priv_addr(&env->vmd_cfg, dev->vm_vmid, dev->idx, 1, &mask)) == 0)
> > > return (-1);
> > > memcpy(&resp.yiaddr, &client_addr,
> > > sizeof(client_addr));
> > > @@ -129,7 +128,7 @@ dhcp_request(struct vionet_dev *dev, cha
> > > ss2sin(&pc.pc_dst)->sin_port = htons(CLIENT_PORT);
> > >
> > > if ((server_addr.s_addr = vm_priv_addr(&env->vmd_cfg, dev->vm_vmid,
> > > - dev->idx, 0)) == 0)
> > > + dev->idx, 0, &mask)) == 0)
> > > return (-1);
> > > memcpy(&resp.siaddr, &server_addr, sizeof(server_addr));
> > > memcpy(&ss2sin(&pc.pc_src)->sin_addr, &server_addr,
> > > Index: usr.sbin/vmd/parse.y
> > > ===================================================================
> > > RCS file: /cvs/src/usr.sbin/vmd/parse.y,v
> > > retrieving revision 1.52
> > > diff -u -p -u -p -r1.52 parse.y
> > > --- usr.sbin/vmd/parse.y 14 May 2019 06:05:45 -0000 1.52
> > > +++ usr.sbin/vmd/parse.y 25 Oct 2019 18:11:05 -0000
> > > @@ -120,9 +120,9 @@ typedef struct {
> > >
> > >
> > > %token INCLUDE ERROR
> > > -%token ADD ALLOW BOOT CDROM DEVICE DISABLE DISK DOWN ENABLE FORMAT GROUP
> > > -%token INET6 INSTANCE INTERFACE LLADDR LOCAL LOCKED MEMORY NET NIFS OWNER
> > > -%token PATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP VM VMID
> > > +%token ADD ADDRESS ALLOW BOOT CDROM DEVICE DISABLE DISK DOWN ENABLE FORMAT
> > > +%token GROUP INET6 INSTANCE INTERFACE LLADDR LOCAL LOCKED MEMORY NET NIFS
> > > +%token OWNER PATH PREFIX RDOMAIN SIZE SOCKET SWITCH UP VM VMID
> > > %token <v.number> NUMBER
> > > %token <v.string> STRING
> > > %type <v.lladdr> lladdr
> > > @@ -413,6 +413,12 @@ vm_opts : disable
> > > {
> > >
> > > if ($1)
> > > vmc.vmc_ifflags[i] |= VMIFF_LOCAL;
> > > + else if (vmc.vmc_ifflags[i] &
> > > + (VMIFF_ADDR4|VMIFF_ADDR6)) {
> > > + yyerror("address on non-local interface");
> > > + free($3);
> > > + YYERROR;
> > > + }
> > > if ($3 != NULL) {
> > > if (strcmp($3, "tap") != 0 &&
> > > (priv_getiftype($3, type, NULL) == -1 ||
> > > @@ -617,7 +623,53 @@ iface_opts_c : iface_opts_c iface_opts o
> > > | iface_opts
> > > ;
> > >
> > > -iface_opts : SWITCH string {
> > > +iface_opts : ADDRESS STRING STRING {
> > > + struct vmop_address *vma;
> > > + unsigned int i = vcp_nnics;
> > > + struct address addr, gw;
> > > + char *gwp = NULL;
> > > + int maxprefixlen = 0;
> > > +
> > > + /* Does the gateway have a /prefix syntax? */
> > > + gwp = strrchr($3, '/');
> > > +
> > > + if (host($2, &addr) == -1 ||
> > > + host($3, &gw) == -1 ||
> > > + addr.ss.ss_family != gw.ss.ss_family) {
> > > + yyerror("invalid address: %s %s", $2, $3);
> > > + free($2);
> > > + free($3);
> > > + YYERROR;
> > > + }
> > > + free($2);
> > > +
> > > + if (gwp == NULL)
> > > + gw.prefixlen = addr.prefixlen;
> > > + else if (gw.prefixlen != addr.prefixlen) {
> > > + yyerror("mismatched gateway prefix: %s", $3);
> > > + free($3);
> > > + YYERROR;
> > > + }
> > > + free($3);
> > > +
> > > + if (addr.ss.ss_family == AF_INET) {
> > > + vmc.vmc_ifflags[i] |= VMIFF_ADDR4;
> > > + vma = &vmc.vmc_ifaddr4[i];
> > > + maxprefixlen = 31;
> > > + } else {
> > > + vmc.vmc_ifflags[i] |= VMIFF_ADDR6;
> > > + vma = &vmc.vmc_ifaddr6[i];
> > > + maxprefixlen = 127;
> > > + }
> > > + if (maxprefixlen && addr.prefixlen > maxprefixlen) {
> > > + yyerror("address prefix larger than /%d", maxprefixlen);
> > > + YYERROR;
> > > + }
> > > + memcpy(&vma->vma_addr, &addr.ss, sizeof(addr.ss));
> > > + memcpy(&vma->vma_gw, &gw.ss, sizeof(gw.ss));
> > > + vma->vma_prefixlen = addr.prefixlen;
> > > + }
> > > + | SWITCH string {
> > > unsigned int i = vcp_nnics;
> > >
> > > /* No need to check if the switch exists */
> > > @@ -763,6 +815,7 @@ lookup(char *s)
> > > /* this has to be sorted always */
> > > static const struct keywords keywords[] = {
> > > { "add", ADD },
> > > + { "address", ADDRESS },
> > > { "allow", ALLOW },
> > > { "boot", BOOT },
> > > { "cdrom", CDROM },
> > > Index: usr.sbin/vmd/priv.c
> > > ===================================================================
> > > RCS file: /cvs/src/usr.sbin/vmd/priv.c,v
> > > retrieving revision 1.15
> > > diff -u -p -u -p -r1.15 priv.c
> > > --- usr.sbin/vmd/priv.c 28 Jun 2019 13:32:51 -0000 1.15
> > > +++ usr.sbin/vmd/priv.c 25 Oct 2019 18:11:05 -0000
> > > @@ -46,6 +46,12 @@
> > > #include "proc.h"
> > > #include "vmd.h"
> > >
> > > +static unsigned int *priv_ifunits;
> > > +static size_t priv_nifunits;
> > > +
> > > +static struct vmd_ifconfig *priv_ifs;
> > > +static size_t priv_nifs;
> > > +
> > > int priv_dispatch_parent(int, struct privsep_proc *, struct imsg *);
> > > void priv_run(struct privsep *, struct privsep_proc *, void *);
> > >
> > > @@ -91,7 +97,9 @@ priv_dispatch_parent(int fd, struct priv
> > > struct ifaliasreq ifra;
> > > struct in6_aliasreq in6_ifra;
> > > struct if_afreq ifar;
> > > + struct vmd_ifconfig vifc;
> > > char type[IF_NAMESIZE];
> > > + int i;
> > >
> > > switch (imsg->hdr.type) {
> > > case IMSG_VMDOP_PRIV_IFDESCR:
> > > @@ -112,6 +120,8 @@ priv_dispatch_parent(int fd, struct priv
> > > fatalx("%s: rejected priv operation on interface: %s",
> > > __func__, vfr.vfr_name);
> > > break;
> > > + case IMSG_VMDOP_IF_REGISTER:
> > > + case IMSG_VMDOP_IF_UNREGISTER:
> > > case IMSG_VMDOP_CONFIG:
> > > case IMSG_CTL_RESET:
> > > break;
> > > @@ -244,6 +254,18 @@ priv_dispatch_parent(int fd, struct priv
> > > if (ioctl(env->vmd_fd6, SIOCAIFADDR_IN6, &in6_ifra) == -1)
> > > log_warn("SIOCAIFADDR_IN6");
> > > break;
> > > + case IMSG_VMDOP_IF_REGISTER:
> > > + IMSG_SIZE_CHECK(imsg, &vifc);
> > > + memcpy(&vifc, imsg->data, sizeof(vifc));
> > > + if (vm_priv_register(ps, &vifc) == -1)
> > > + fatalx("%s: failed to register interface",
> > > + __func__);
> > > + break;
> > > + case IMSG_VMDOP_IF_UNREGISTER:
> > > + IMSG_SIZE_CHECK(imsg, &i);
> > > + memcpy(&i, imsg->data, sizeof(i));
> > > + vm_priv_unregister(ps, imsg->hdr.peerid, i);
> > > + break;
> > > case IMSG_VMDOP_CONFIG:
> > > config_getconfig(env, imsg);
> > > break;
> > > @@ -328,8 +350,9 @@ vm_priv_ifconfig(struct privsep *ps, str
> > > struct vmd_switch *vsw;
> > > unsigned int i;
> > > struct vmop_ifreq vfr, vfbr;
> > > - struct sockaddr_in *sin4;
> > > - struct sockaddr_in6 *sin6;
> > > + struct sockaddr_in *sin4, *mask4;
> > > + struct sockaddr_in6 *sin6, *mask6;
> > > + uint8_t prefixlen;
> > >
> > > for (i = 0; i < VMM_MAX_NICS_PER_VM; i++) {
> > > vif = &vm->vm_ifs[i];
> > > @@ -435,24 +458,25 @@ vm_priv_ifconfig(struct privsep *ps, str
> > > memset(&vfr.vfr_mask, 0, sizeof(vfr.vfr_mask));
> > > memset(&vfr.vfr_addr, 0, sizeof(vfr.vfr_addr));
> > >
> > > - /* local IPv4 address with a /31 mask */
> > > - sin4 = (struct sockaddr_in *)&vfr.vfr_mask;
> > > - sin4->sin_family = AF_INET;
> > > - sin4->sin_len = sizeof(*sin4);
> > > - sin4->sin_addr.s_addr = htonl(0xfffffffe);
> > > + /* local IPv4 address and netmask */
> > > + mask4 = ss2sin(&vfr.vfr_mask);
> > > + mask4->sin_family = AF_INET;
> > > + mask4->sin_len = sizeof(*mask4);
> > >
> > > - sin4 = (struct sockaddr_in *)&vfr.vfr_addr;
> > > + sin4 = ss2sin(&vfr.vfr_addr);
> > > sin4->sin_family = AF_INET;
> > > sin4->sin_len = sizeof(*sin4);
> > > if ((sin4->sin_addr.s_addr =
> > > vm_priv_addr(&env->vmd_cfg,
> > > - vm->vm_vmid, i, 0)) == 0)
> > > + vm->vm_vmid, i, 0, &mask4->sin_addr)) == 0)
> > > return (-1);
> > >
> > > inet_ntop(AF_INET, &sin4->sin_addr,
> > > name, sizeof(name));
> > > - log_debug("%s: interface %s address %s/31",
> > > - __func__, vfr.vfr_name, name);
> > > + prefixlen = mask2prefixlen((struct sockaddr *)mask4);
> > > +
> > > + log_debug("%s: interface %s address %s/%u",
> > > + __func__, vfr.vfr_name, name, prefixlen);
> > >
> > > proc_compose(ps, PROC_PRIV, IMSG_VMDOP_PRIV_IFADDR,
> > > &vfr, sizeof(vfr));
> > > @@ -462,24 +486,24 @@ vm_priv_ifconfig(struct privsep *ps, str
> > > memset(&vfr.vfr_mask, 0, sizeof(vfr.vfr_mask));
> > > memset(&vfr.vfr_addr, 0, sizeof(vfr.vfr_addr));
> > >
> > > - /* local IPv6 address with a /96 mask */
> > > - sin6 = ss2sin6(&vfr.vfr_mask);
> > > - sin6->sin6_family = AF_INET6;
> > > - sin6->sin6_len = sizeof(*sin6);
> > > - memset(&sin6->sin6_addr.s6_addr[0], 0xff, 12);
> > > - memset(&sin6->sin6_addr.s6_addr[12], 0, 4);
> > > + /* local IPv6 address and netmask */
> > > + mask6 = ss2sin6(&vfr.vfr_mask);
> > > + mask6->sin6_family = AF_INET6;
> > > + mask6->sin6_len = sizeof(*sin6);
> > >
> > > sin6 = ss2sin6(&vfr.vfr_addr);
> > > sin6->sin6_family = AF_INET6;
> > > sin6->sin6_len = sizeof(*sin6);
> > > if (vm_priv_addr6(&env->vmd_cfg,
> > > - vm->vm_vmid, i, 0, &sin6->sin6_addr) == -1)
> > > + vm->vm_vmid, i, 0, &sin6->sin6_addr,
> > > + &mask6->sin6_addr) == -1)
> > > return (-1);
> > >
> > > inet_ntop(AF_INET6, &sin6->sin6_addr,
> > > name, sizeof(name));
> > > - log_debug("%s: interface %s address %s/96",
> > > - __func__, vfr.vfr_name, name);
> > > + prefixlen = mask2prefixlen6((struct sockaddr *)mask6);
> > > + log_debug("%s: interface %s address %s/%u",
> > > + __func__, vfr.vfr_name, name, prefixlen);
> > >
> > > proc_compose(ps, PROC_PRIV, IMSG_VMDOP_PRIV_IFADDR6,
> > > &vfr, sizeof(vfr));
> > > @@ -543,11 +567,196 @@ vm_priv_brconfig(struct privsep *ps, str
> > > return (0);
> > > }
> > >
> > > +static int
> > > +priv_if_cmp(const void *a, const void *b)
> > > +{
> > > + const struct vmd_ifconfig *vifca = a;
> > > + const struct vmd_ifconfig *vifcb = b;
> > > +
> > > + if (vifca->vifc_vmid != vifcb->vifc_vmid)
> > > + return (vifca->vifc_vmid > vifcb->vifc_vmid ? 1 : -1);
> > > + if (vifca->vifc_idx != vifcb->vifc_idx)
> > > + return (vifca->vifc_idx > vifcb->vifc_idx ? 1 : -1);
> > > +
> > > + return (0);
> > > +}
> > > +
> > > +static int
> > > +priv_ifunit_cmp(const void *a, const void *b)
> > > +{
> > > + unsigned int ia = *(const unsigned int *)a;
> > > + unsigned int ib = *(const unsigned int *)b;
> > > +
> > > + return ((ia > ib) - (ia < ib));
> > > +}
> > > +
> > > +unsigned int *
> > > +vm_priv_byunit(unsigned int unit)
> > > +{
> > > + return (bsearch(&unit, priv_ifunits, priv_nifunits, sizeof(unit),
> > > + priv_ifunit_cmp));
> > > +}
> > > +
> > > +struct vmd_ifconfig *
> > > +vm_priv_byid(uint32_t vmid, int idx)
> > > +{
> > > + struct vmd_ifconfig key;
> > > +
> > > + key.vifc_vmid = vmid;
> > > + key.vifc_idx = idx;
> > > + return (bsearch(&key, priv_ifs, priv_nifs, sizeof(key), priv_if_cmp));
> > > +}
> > > +
> > > +/*
> > > + * Called to register global interface configuration
> > > + * - the associated VM id
> > > + * - the relative interface index of the VM
> > > + * - the fixed tap(4) interface unit (optional)
> > > + * - the fixed IP address (optional)
> > > + */
> > > +int
> > > +vm_priv_register(struct privsep *ps, struct vmd_ifconfig *vifc)
> > > +{
> > > + struct vmd_ifconfig *ifc = NULL;
> > > + unsigned int *ifu;
> > > +
> > > + /* Ignore interfaces that don't have any relevant configuration */
> > > + if (vifc->vifc_flags == 0)
> > > + return (0);
> > > +
> > > + if (vifc->vifc_vmid == UINT32_MAX) {
> > > + log_warnx("VM id %u too large", vifc->vifc_vmid);
> > > + goto fail;
> > > + }
> > > +
> > > + if (vm_priv_byid(vifc->vifc_vmid, vifc->vifc_idx) != NULL) {
> > > + log_warnx("interface vm %u #%u registered twice",
> > > + vifc->vifc_vmid, vifc->vifc_idx);
> > > + goto fail;
> > > + }
> > > +
> > > + /* Append new interface */
> > > + if ((ifc = recallocarray(priv_ifs, priv_nifs,
> > > + priv_nifs + 1, sizeof(*ifc))) == NULL) {
> > > + log_warn("failed to grow interface table");
> > > + goto fail;
> > > + }
> > > + priv_ifs = ifc;
> > > + memcpy(&priv_ifs[priv_nifs], vifc, sizeof(*vifc));
> > > + priv_nifs++;
> > > +
> > > + /* Sort table */
> > > + qsort(priv_ifs, priv_nifs, sizeof(*ifc), priv_if_cmp);
> > > +
> > > + if (vifc->vifc_flags & VMD_IFC_UNIT) {
> > > + if (vifc->vifc_unit == UINT_MAX) {
> > > + log_warnx("interface tap%u unit too large",
> > > + vifc->vifc_unit);
> > > + goto fail;
> > > + }
> > > +
> > > + if (vm_priv_byunit(vifc->vifc_unit) != NULL) {
> > > + log_warnx("interface tap%u defined twice",
> > > + vifc->vifc_unit);
> > > + goto fail;
> > > + }
> > > +
> > > + /* Append new interface unit */
> > > + if ((ifu = recallocarray(priv_ifunits, priv_nifunits,
> > > + priv_nifunits + 1, sizeof(*ifu))) == NULL) {
> > > + log_warn("failed to grow interface unit table");
> > > + goto fail;
> > > + }
> > > + priv_ifunits = ifu;
> > > + priv_ifunits[priv_nifunits++] = vifc->vifc_unit;
> > > +
> > > + /* Sort table */
> > > + qsort(priv_ifunits, priv_nifunits, sizeof(*ifu),
> > > + priv_ifunit_cmp);
> > > +
> > > + log_debug("%s: %s registered interface tap%u", __func__,
> > > + ps->ps_title[privsep_process],
> > > + vifc->vifc_unit);
> > > + }
> > > +
> > > + return (0);
> > > +
> > > + fail:
> > > + if (ifc != NULL)
> > > + vm_priv_unregister(ps, vifc->vifc_vmid, vifc->vifc_idx);
> > > + return (-1);
> > > +}
> > > +
> > > +/*
> > > + * Called to unregister global interface configuration
> > > + */
> > > +void
> > > +vm_priv_unregister(struct privsep *ps, uint32_t vmid, int idx)
> > > +{
> > > + struct vmd_ifconfig *vifc, *ifc;
> > > + unsigned int *ifu;
> > > +
> > > + if ((vifc = vm_priv_byid(vmid, idx)) == NULL)
> > > + return;
> > > +
> > > + if (vifc->vifc_flags & VMD_IFC_UNIT &&
> > > + (ifu = vm_priv_byunit(vifc->vifc_unit)) != NULL) {
> > > + /* Move entry to the end */
> > > + *ifu = UINT_MAX;
> > > + qsort(priv_ifunits, priv_nifunits, sizeof(*ifu),
> > > + priv_ifunit_cmp);
> > > +
> > > + /* and remove last entry from the table */
> > > + if ((ifu = recallocarray(priv_ifunits, priv_nifunits,
> > > + priv_nifunits - 1, sizeof(*ifu))) == NULL &&
> > > + priv_nifunits > 1) {
> > > + log_warn("failed to shrink interface unit table");
> > > + return;
> > > + }
> > > + priv_ifunits = ifu;
> > > + priv_nifunits--;
> > > +
> > > + log_debug("%s: %s unregistered interface tap%u", __func__,
> > > + ps->ps_title[privsep_process],
> > > + vifc->vifc_unit);
> > > + }
> > > +
> > > + /* Move entry to the end */
> > > + vifc->vifc_vmid = UINT32_MAX;
> > > + qsort(priv_ifs, priv_nifs, sizeof(*ifc), priv_if_cmp);
> > > +
> > > + /* and remove last entry from the table */
> > > + if ((ifc = recallocarray(priv_ifs, priv_nifs,
> > > + priv_nifs - 1, sizeof(*ifc))) == NULL &&
> > > + priv_nifs > 1) {
> > > + log_warn("failed to shrink interface table");
> > > + return;
> > > + }
> > > + priv_ifs = ifc;
> > > + priv_nifs--;
> > > +
> > > + log_debug("%s: %s unregistered interface vm %u #%u", __func__,
> > > + ps->ps_title[privsep_process], vmid, idx);
> > > +}
> > > +
> > > uint32_t
> > > -vm_priv_addr(struct vmd_config *cfg, uint32_t vmid, int idx, int isvm)
> > > +vm_priv_addr(struct vmd_config *cfg, uint32_t vmid, int idx, int isvm,
> > > + struct in_addr *mask)
> > > {
> > > struct address *h = &cfg->cfg_localprefix;
> > > - in_addr_t prefix, mask, addr;
> > > + in_addr_t prefix, addr;
> > > + struct vmd_ifconfig *vifc;
> > > +
> > > + /* Check if there is a preconfigured address for this interface */
> > > + if ((vifc = vm_priv_byid(vmid, idx)) != NULL &&
> > > + vifc->vifc_flags & VMD_IFC_ADDR4) {
> > > + if (isvm)
> > > + addr = vifc->vifc_addr4.sin_addr.s_addr;
> > > + else
> > > + addr = vifc->vifc_gw4.sin_addr.s_addr;
> > > + mask->s_addr = prefixlen2mask(vifc->vifc_prefixlen4);
> > > + return (addr);
> > > + }
> > >
> > > /*
> > > * 1. Set the address prefix and mask, 100.64.0.0/10 by default.
> > > @@ -556,7 +765,7 @@ vm_priv_addr(struct vmd_config *cfg, uin
> > > h->prefixlen < 0 || h->prefixlen > 32)
> > > fatal("local prefix");
> > > prefix = ss2sin(&h->ss)->sin_addr.s_addr;
> > > - mask = prefixlen2mask(h->prefixlen);
> > > + mask->s_addr = prefixlen2mask(h->prefixlen);
> > >
> > > /* 2. Encode the VM ID as a per-VM subnet range N, 100.64.N.0/24. */
> > > addr = vmid << 8;
> > > @@ -580,7 +789,7 @@ vm_priv_addr(struct vmd_config *cfg, uin
> > > * - the address should not exceed the prefix (eg. VM ID to high).
> > > * - up to 126 interfaces can be encoded per VM.
> > > */
> > > - if (prefix != (addr & mask) || idx >= 0x7f) {
> > > + if (prefix != (addr & mask->s_addr) || idx >= 0x7f) {
> > > log_warnx("%s: dhcp address range exceeded,"
> > > " vm id %u interface %d", __func__, vmid, idx);
> > > return (0);
> > > @@ -591,21 +800,35 @@ vm_priv_addr(struct vmd_config *cfg, uin
> > >
> > > int
> > > vm_priv_addr6(struct vmd_config *cfg, uint32_t vmid,
> > > - int idx, int isvm, struct in6_addr *in6_addr)
> > > + int idx, int isvm, struct in6_addr *in6_addr, struct in6_addr *mask)
> > > {
> > > struct address *h = &cfg->cfg_localprefix6;
> > > - struct in6_addr addr, mask;
> > > + struct in6_addr addr, *addrptr;
> > > + struct vmd_ifconfig *vifc;
> > > uint32_t addr4;
> > > + struct in_addr mask4;
> > > +
> > > + /* Check if there is a preconfigured address for this interface */
> > > + if ((vifc = vm_priv_byid(vmid, idx)) != NULL &&
> > > + vifc->vifc_flags & VMD_IFC_ADDR6) {
> > > + if (isvm)
> > > + addrptr = &vifc->vifc_addr6.sin6_addr;
> > > + else
> > > + addrptr = &vifc->vifc_gw6.sin6_addr;
> > > + memcpy(in6_addr, addrptr, sizeof(*in6_addr));
> > > + prefixlen2mask6(vifc->vifc_prefixlen6, mask);
> > > + return (0);
> > > + }
> > >
> > > /* 1. Set the address prefix and mask, fd00::/8 by default. */
> > > if (h->ss.ss_family != AF_INET6 ||
> > > h->prefixlen < 0 || h->prefixlen > 128)
> > > fatal("local prefix6");
> > > addr = ss2sin6(&h->ss)->sin6_addr;
> > > - prefixlen2mask6(h->prefixlen, &mask);
> > > + prefixlen2mask6(h->prefixlen, mask);
> > >
> > > /* 2. Encode the VM IPv4 address as subnet, fd00::NN:NN:0:0/96. */
> > > - if ((addr4 = vm_priv_addr(cfg, vmid, idx, 1)) == 0)
> > > + if ((addr4 = vm_priv_addr(cfg, vmid, idx, 1, &mask4)) == 0)
> > > return (0);
> > > memcpy(&addr.s6_addr[8], &addr4, sizeof(addr4));
> > >
> > > Index: usr.sbin/vmd/vm.conf.5
> > > ===================================================================
> > > RCS file: /cvs/src/usr.sbin/vmd/vm.conf.5,v
> > > retrieving revision 1.44
> > > diff -u -p -u -p -r1.44 vm.conf.5
> > > --- usr.sbin/vmd/vm.conf.5 14 May 2019 12:47:17 -0000 1.44
> > > +++ usr.sbin/vmd/vm.conf.5 25 Oct 2019 18:11:05 -0000
> > > @@ -209,6 +209,14 @@ to select a specific one.
> > > .Pp
> > > Valid options are:
> > > .Bl -tag -width Ds
> > > +.It Ic address Ar address Ns Li / Ns Ar prefix Ar gateway
> > > +If the interface is configured as a
> > > +.Cm local
> > > +interface,
> > > +use a static IP address and gateway.
> > > +This option can be specified for IPv4 and for IPv6.
> > > +If not specified, the default is to auto-generate the address pair using the
> > > +.Cm local Oo Cm inet6 Oc Cm prefix .
> > > .It Cm group Ar group-name
> > > Assign the interface to a specific interface
> > > .Dq group .
> > > @@ -258,6 +266,8 @@ A
> > > interface will auto-generate an IPv4 subnet for the interface,
> > > configure a gateway address on the VM host side,
> > > and run a simple DHCP/BOOTP server for the VM.
> > > +The address can optionally be configured as a static
> > > +.Cm address .
> > > This option can be used for layer 3 mode without configuring a switch.
> > > .Pp
> > > If the global
> > > Index: usr.sbin/vmd/vmd.c
> > > ===================================================================
> > > RCS file: /cvs/src/usr.sbin/vmd/vmd.c,v
> > > retrieving revision 1.116
> > > diff -u -p -u -p -r1.116 vmd.c
> > > --- usr.sbin/vmd/vmd.c 4 Sep 2019 07:02:03 -0000 1.116
> > > +++ usr.sbin/vmd/vmd.c 25 Oct 2019 18:11:05 -0000
> > > @@ -1161,6 +1161,8 @@ void
> > > vm_remove(struct vmd_vm *vm, const char *caller)
> > > {
> > > struct privsep *ps = &env->vmd_ps;
> > > + size_t i;
> > > + int idx;
> > >
> > > if (vm == NULL)
> > > return;
> > > @@ -1171,6 +1173,16 @@ vm_remove(struct vmd_vm *vm, const char
> > >
> > > TAILQ_REMOVE(env->vmd_vms, vm, vm_entry);
> > >
> > > + for (i = 0; i < vm->vm_params.vmc_params.vcp_nnics; i++) {
> > > + idx = (int)i;
> > > + vm_priv_unregister(ps, vm->vm_vmid, idx);
> > > + if (privsep_process == PROC_PARENT) {
> > > + proc_compose_imsg(ps, PROC_PRIV, -1,
> > > + IMSG_VMDOP_IF_UNREGISTER,
> > > + vm->vm_vmid, -1, &idx, sizeof(idx));
> > > + }
> > > + }
> > > +
> > > user_put(vm->vm_user);
> > > vm_stop(vm, 0, caller);
> > > free(vm);
> > > @@ -1211,14 +1223,17 @@ int
> > > vm_register(struct privsep *ps, struct vmop_create_params *vmc,
> > > struct vmd_vm **ret_vm, uint32_t id, uid_t uid)
> > > {
> > > - struct vmd_vm *vm = NULL, *vm_parent = NULL;
> > > + char ifname[IF_NAMESIZE], *s;
> > > + struct vmd_vm *vm = NULL, *vm_new = NULL, *vm_parent = NULL;
> > > struct vm_create_params *vcp = &vmc->vmc_params;
> > > struct vmop_owner *vmo = NULL;
> > > + struct vmop_address *vma;
> > > struct vmd_user *usr = NULL;
> > > + struct vmd_ifconfig vifc;
> > > + int maxprefixlen;
> > > uint32_t nid, rng;
> > > unsigned int i, j;
> > > struct vmd_switch *sw;
> > > - char *s;
> > >
> > > /* Check if this is an instance of another VM */
> > > if (vm_instance(ps, &vm_parent, vmc, uid) == -1)
> > > @@ -1294,7 +1309,7 @@ vm_register(struct privsep *ps, struct v
> > > goto fail;
> > > }
> > >
> > > - if ((vm = calloc(1, sizeof(*vm))) == NULL)
> > > + if ((vm = vm_new = calloc(1, sizeof(*vm))) == NULL)
> > > goto fail;
> > >
> > > memcpy(&vm->vm_params, vmc, sizeof(vm->vm_params));
> > > @@ -1305,6 +1320,20 @@ vm_register(struct privsep *ps, struct v
> > > vm->vm_receive_fd = -1;
> > > vm->vm_state &= ~VM_STATE_PAUSED;
> > > vm->vm_user = usr;
> > > + vm->vm_kernel = -1;
> > > + vm->vm_cdrom = -1;
> > > + vm->vm_iev.ibuf.fd = -1;
> > > +
> > > + /*
> > > + * Assign a new internal Id if not specified and we succeed in
> > > + * claiming a new Id.
> > > + */
> > > + if (id != 0)
> > > + vm->vm_vmid = id;
> > > + else if (vm_claimid(vcp->vcp_name, uid, &nid) == -1)
> > > + goto fail;
> > > + else
> > > + vm->vm_vmid = nid;
> > >
> > > for (i = 0; i < VMM_MAX_DISKS_PER_VM; i++)
> > > for (j = 0; j < VM_MAX_BASE_PER_DISK; j++)
> > > @@ -1333,30 +1362,69 @@ vm_register(struct privsep *ps, struct v
> > > vcp->vcp_macs[i][4] = rng;
> > > vcp->vcp_macs[i][5] = rng >> 8;
> > > }
> > > - }
> > > - vm->vm_kernel = -1;
> > > - vm->vm_cdrom = -1;
> > > - vm->vm_iev.ibuf.fd = -1;
> > >
> > > - /*
> > > - * Assign a new internal Id if not specified and we succeed in
> > > - * claiming a new Id.
> > > - */
> > > - if (id != 0)
> > > - vm->vm_vmid = id;
> > > - else if (vm_claimid(vcp->vcp_name, uid, &nid) == -1)
> > > - goto fail;
> > > - else
> > > - vm->vm_vmid = nid;
> > > + /*
> > > + * Store interface in global configuration table
> > > + */
> > > + memset(&vifc, 0, sizeof(vifc));
> > > +
> > > + /* Get and check pre-configured interface name */
> > > + s = vmc->vmc_ifnames[i];
> > > + if (*s != '\0' && strcmp("tap", s) != 0 &&
> > > + priv_getiftype(s, ifname, &vifc.vifc_unit) != -1)
> > > + vifc.vifc_flags |= VMD_IFC_UNIT;
> > > +
> > > + maxprefixlen = 0;
> > > + if (vmc->vmc_ifflags[i] & VMIFF_ADDR4) {
> > > + vma = &vmc->vmc_ifaddr4[i];
> > > + memcpy(&vifc.vifc_addr4, &vma->vma_addr,
> > > + sizeof(vifc.vifc_addr4));
> > > + memcpy(&vifc.vifc_gw4, &vma->vma_gw,
> > > + sizeof(vifc.vifc_gw4));
> > > + vifc.vifc_prefixlen4 = vma->vma_prefixlen;
> > > + vifc.vifc_flags |= VMD_IFC_ADDR4;
> > > + maxprefixlen = 31;
> > > + }
> > > + if (vmc->vmc_ifflags[i] & VMIFF_ADDR6) {
> > > + vma = &vmc->vmc_ifaddr6[i];
> > > + memcpy(&vifc.vifc_addr6, &vma->vma_addr,
> > > + sizeof(vifc.vifc_addr6));
> > > + memcpy(&vifc.vifc_gw6, &vma->vma_gw,
> > > + sizeof(vifc.vifc_gw6));
> > > + vifc.vifc_prefixlen6 = vma->vma_prefixlen;
> > > + vifc.vifc_flags |= VMD_IFC_ADDR6;
> > > + maxprefixlen = 127;
> > > + }
> > > + if (maxprefixlen && vma->vma_prefixlen > maxprefixlen) {
> > > + log_warnx("address prefix larger than /%d",
> > > + maxprefixlen);
> > > + goto fail;
> > > + }
> > > +
> > > + vifc.vifc_vmid = vm->vm_vmid;
> > > + vifc.vifc_idx = i;
> > > +
> > > + if (vm_priv_register(ps, &vifc) == -1)
> > > + goto fail;
> > > +
> > > + if (privsep_process == PROC_PARENT) {
> > > + proc_compose_imsg(ps, PROC_PRIV, -1,
> > > + IMSG_VMDOP_IF_REGISTER, -1, -1,
> > > + &vifc, sizeof(vifc));
> > > + }
> > > + }
> > >
> > > log_debug("%s: registering vm %d", __func__, vm->vm_vmid);
> > > TAILQ_INSERT_TAIL(env->vmd_vms, vm, vm_entry);
> > >
> > > *ret_vm = vm;
> > > return (0);
> > > +
> > > fail:
> > > + free(vm_new);
> > > if (errno == 0)
> > > errno = EINVAL;
> > > +
> > > return (-1);
> > > }
> > >
> > > @@ -1956,6 +2024,71 @@ get_string(uint8_t *ptr, size_t len)
> > > break;
> > >
> > > return strndup(ptr, i);
> > > +}
> > > +
> > > +uint8_t
> > > +mask2prefixlen(struct sockaddr *sa)
> > > +{
> > > + struct sockaddr_in *sa_in = (struct sockaddr_in *)sa;
> > > + in_addr_t ina = sa_in->sin_addr.s_addr;
> > > +
> > > + if (ina == 0)
> > > + return (0);
> > > + else
> > > + return (33 - ffs(ntohl(ina)));
> > > +}
> > > +
> > > +uint8_t
> > > +mask2prefixlen6(struct sockaddr *sa)
> > > +{
> > > + struct sockaddr_in6 *sa_in6 = (struct sockaddr_in6 *)sa;
> > > + uint8_t *ap, *ep;
> > > + unsigned int l = 0;
> > > +
> > > + /*
> > > + * sin6_len is the size of the sockaddr so subtract the offset of
> > > + * the possibly truncated sin6_addr struct.
> > > + */
> > > + ap = (uint8_t *)&sa_in6->sin6_addr;
> > > + ep = (uint8_t *)sa_in6 + sa_in6->sin6_len;
> > > + for (; ap < ep; ap++) {
> > > + /* this "beauty" is adopted from sbin/route/show.c ... */
> > > + switch (*ap) {
> > > + case 0xff:
> > > + l += 8;
> > > + break;
> > > + case 0xfe:
> > > + l += 7;
> > > + goto done;
> > > + case 0xfc:
> > > + l += 6;
> > > + goto done;
> > > + case 0xf8:
> > > + l += 5;
> > > + goto done;
> > > + case 0xf0:
> > > + l += 4;
> > > + goto done;
> > > + case 0xe0:
> > > + l += 3;
> > > + goto done;
> > > + case 0xc0:
> > > + l += 2;
> > > + goto done;
> > > + case 0x80:
> > > + l += 1;
> > > + goto done;
> > > + case 0x00:
> > > + goto done;
> > > + default:
> > > + fatalx("non contiguous inet6 netmask");
> > > + }
> > > + }
> > > +
> > > +done:
> > > + if (l > sizeof(struct in6_addr) * 8)
> > > + fatalx("%s: prefixlen %u out of bound", __func__, l);
> > > + return (l);
> > > }
> > >
> > > uint32_t
> > > Index: usr.sbin/vmd/vmd.h
> > > ===================================================================
> > > RCS file: /cvs/src/usr.sbin/vmd/vmd.h,v
> > > retrieving revision 1.97
> > > diff -u -p -u -p -r1.97 vmd.h
> > > --- usr.sbin/vmd/vmd.h 7 Sep 2019 09:11:14 -0000 1.97
> > > +++ usr.sbin/vmd/vmd.h 25 Oct 2019 18:11:06 -0000
> > > @@ -119,6 +119,8 @@ enum imsg_type {
> > > IMSG_VMDOP_PRIV_IFRDOMAIN,
> > > IMSG_VMDOP_VM_SHUTDOWN,
> > > IMSG_VMDOP_VM_REBOOT,
> > > + IMSG_VMDOP_IF_REGISTER,
> > > + IMSG_VMDOP_IF_UNREGISTER,
> > > IMSG_VMDOP_CONFIG,
> > > IMSG_VMDOP_DONE
> > > };
> > > @@ -160,6 +162,12 @@ struct vmop_owner {
> > > int64_t gid;
> > > };
> > >
> > > +struct vmop_address {
> > > + struct sockaddr_storage vma_addr;
> > > + struct sockaddr_storage vma_gw;
> > > + int vma_prefixlen;
> > > +};
> > > +
> > > struct vmop_create_params {
> > > struct vm_create_params vmc_params;
> > > unsigned int vmc_flags;
> > > @@ -185,7 +193,10 @@ struct vmop_create_params {
> > > #define VMIFF_LOCKED 0x02
> > > #define VMIFF_LOCAL 0x04
> > > #define VMIFF_RDOMAIN 0x08
> > > -#define VMIFF_OPTMASK (VMIFF_LOCKED|VMIFF_LOCAL|VMIFF_RDOMAIN)
> > > +#define VMIFF_ADDR4 0x10
> > > +#define VMIFF_ADDR6 0x20
> > > +#define VMIFF_OPTMASK \
> > > + (VMIFF_LOCKED|VMIFF_LOCAL|VMIFF_RDOMAIN|VMIFF_ADDR4|VMIFF_ADDR6)
> > >
> > > unsigned int vmc_disktypes[VMM_MAX_DISKS_PER_VM];
> > > unsigned int vmc_diskbases[VMM_MAX_DISKS_PER_VM];
> > > @@ -196,6 +207,8 @@ struct vmop_create_params {
> > > char vmc_ifswitch[VMM_MAX_NICS_PER_VM][VM_NAME_MAX];
> > > char vmc_ifgroup[VMM_MAX_NICS_PER_VM][IF_NAMESIZE];
> > > unsigned int vmc_ifrdomain[VMM_MAX_NICS_PER_VM];
> > > + struct vmop_address vmc_ifaddr4[VMM_MAX_NICS_PER_VM];
> > > + struct vmop_address vmc_ifaddr6[VMM_MAX_NICS_PER_VM];
> > > struct vmop_owner vmc_owner;
> > >
> > > /* instance template params */
> > > @@ -315,6 +328,26 @@ struct address {
> > > };
> > > TAILQ_HEAD(addresslist, address);
> > >
> > > +struct vmd_ifconfig {
> > > + uint32_t vifc_vmid; /* associated VM id */
> > > + unsigned int vifc_idx; /* relative interface index */
> > > +
> > > + unsigned int vifc_flags;
> > > +#define VMD_IFC_UNIT 0x01 /* has interface tap(4) unit */
> > > +#define VMD_IFC_ADDR4 0x02 /* has IPv4 address */
> > > +#define VMD_IFC_ADDR6 0x04 /* has IPv6 address */
> > > +
> > > + unsigned int vifc_unit;
> > > +
> > > + struct sockaddr_in vifc_addr4;
> > > + struct sockaddr_in vifc_gw4;
> > > + int vifc_prefixlen4;
> > > +
> > > + struct sockaddr_in6 vifc_addr6;
> > > + struct sockaddr_in6 vifc_gw6;
> > > + int vifc_prefixlen6;
> > > +};
> > > +
> > > struct vmd_config {
> > > unsigned int cfg_flags;
> > > #define VMD_CFG_INET6 0x01
> > > @@ -391,6 +424,7 @@ void vm_stop(struct vmd_vm *, int, cons
> > > void vm_remove(struct vmd_vm *, const char *);
> > > int vm_register(struct privsep *, struct vmop_create_params *,
> > > struct vmd_vm **, uint32_t, uid_t);
> > > +void vm_priv_unregister(struct privsep *, uint32_t, int);
> > > int vm_checkperm(struct vmd_vm *, struct vmop_owner *, uid_t);
> > > int vm_checkaccess(int, unsigned int, uid_t, int);
> > > int vm_opentty(struct vmd_vm *);
> > > @@ -402,6 +436,8 @@ void user_put(struct vmd_user *);
> > > void user_inc(struct vm_create_params *, struct vmd_user *, int);
> > > int user_checklimit(struct vmd_user *, struct vm_create_params *);
> > > char *get_string(uint8_t *, size_t);
> > > +uint8_t mask2prefixlen(struct sockaddr *);
> > > +uint8_t mask2prefixlen6(struct sockaddr *);
> > > uint32_t prefixlen2mask(uint8_t);
> > > void prefixlen2mask6(u_int8_t, struct in6_addr *);
> > > void getmonotime(struct timeval *);
> > > @@ -411,11 +447,15 @@ void priv(struct privsep *, struct priv
> > > int priv_getiftype(char *, char *, unsigned int *);
> > > int priv_findname(const char *, const char **);
> > > int priv_validgroup(const char *);
> > > +int vm_priv_register(struct privsep *, struct vmd_ifconfig *);
> > > int vm_priv_ifconfig(struct privsep *, struct vmd_vm *);
> > > int vm_priv_brconfig(struct privsep *, struct vmd_switch *);
> > > -uint32_t vm_priv_addr(struct vmd_config *, uint32_t, int, int);
> > > +uint32_t vm_priv_addr(struct vmd_config *, uint32_t, int, int,
> > > + struct in_addr *);
> > > int vm_priv_addr6(struct vmd_config *, uint32_t, int, int,
> > > - struct in6_addr *);
> > > + struct in6_addr *, struct in6_addr *);
> > > +unsigned int *vm_priv_byunit(unsigned int);
> > > +struct vmd_ifconfig *vm_priv_byid(uint32_t, int);
> > >
> > > /* vmm.c */
> > > struct iovec;
> > > Index: usr.sbin/vmd/vmm.c
> > > ===================================================================
> > > RCS file: /cvs/src/usr.sbin/vmd/vmm.c,v
> > > retrieving revision 1.94
> > > diff -u -p -u -p -r1.94 vmm.c
> > > --- usr.sbin/vmd/vmm.c 25 Oct 2019 09:57:33 -0000 1.94
> > > +++ usr.sbin/vmd/vmm.c 25 Oct 2019 18:11:06 -0000
> > > @@ -602,6 +602,9 @@ opentap(char *ifname)
> > > char path[PATH_MAX];
> > >
> > > for (i = 0; i < MAX_TAP; i++) {
> > > + /* Skip statically configured interface names (eg. tap0) */
> > > + if (vm_priv_byunit(i) != NULL)
> > > + continue;
> > > snprintf(path, PATH_MAX, "/dev/tap%d", i);
> > > fd = open(path, O_RDWR | O_NONBLOCK);
> > > if (fd != -1) {
> > >
>