Re: vmd writes corrupt qcow2 images

2022-04-18 Thread Ori Bernstein
Yep. I can look at it in a few days.

I'm assuming that the vm was stopped before running the check?



On April 18, 2022 3:42:50 PM EDT, Mike Larkin  wrote:
>On Mon, Apr 18, 2022 at 12:21:39PM -0400, Dave Voutila wrote:
>>
>> "Thomas L."  writes:
>>
>> > Hi,
>> >
>> > I recently tried to use qemu-img with qcow2 images of my VMs and
>> > qemu-img finds them corrupted. I can reproduce the issue in the
>> > following way (on -current, but is the same on -stable; tried different
>> > hosts to exclude hardware errors):
>> >
>> > marsden# vmctl create -s 300G test.qcow2
>> > vmctl: qcow2 imagefile created
>> > marsden# qemu-img check test.qcow2
>> > No errors were found on the image.
>> > Image end offset: 262144
>> > marsden# vmctl start -cL -B net -b /bsd.rd -d test.qcow2 test
>> 
>> > marsden# qemu-img check test.qcow2
>> > ERROR cluster 32769 refcount=0 reference=1
>> >
>> > 1 errors were found on the image.
>> > Data may be corrupted, or further writes to the image may corrupt it.
>> > 39422/4915200 = 0.80% allocated, 3.31% fragmented, 0.00% compressed 
>> > clusters
>> > Image end offset: 2610888704
>>
>> I've been able to reproduce the above, but interestingly if I "repair"
>> the qcow2 image using:
>>
>> $ qemu-img -r all test.qcow2
>>
>> It then reports 0 errors. Booting up the vm and installing packages like
>> python3 and git and then cloning some git repos to work the disk, it
>> still reports 0 errors.
>>
>> Interestingly, if I install something like an Alpine 3.13 guest (just an
>> iso I had handy), qemu-img reports 0 errors.
>>
>> I also looked at a handful of qcow2 images on one of my workstations and
>> seems only an Alpine and NixOS guest do not have this "corruption"
>> report. However, I've never experienced any noticeable abnormalities.
>>
>> Since I see this from non-OpenBSD guests (Debian, for instance) it
>> doesn't seem to be our vioblk(4) driver and is probably vmd(8)'s qcow2
>> implementation..
>>
>> Anyone familiar enough with qcow2 that might make heads or tails of
>> this? (cc'd ori@ as he was the original implementer.)
>>
>> -dv
>>
>
>Yeah, Ori would be the right person to answer. If not I can take a look if I
>find the time.
>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: fix and enable vmd(8) diskfmt regress test

2021-06-13 Thread Ori Bernstein
On Sun, 13 Jun 2021 11:09:20 -0400, Dave Voutila  wrote:

> Was about to add a test for something I'm hacking on when I noticed the
> diskfmt regress test wasn't enabled. I took a look and it's rotted. :(
> 
> It was originally written by Ori Bernstein and imported by ccardenas@
> days before the vmctl(8) qcow2 disk creation capability was
> committed by the same team. (8 Sep 2011 and 11 Sep 2011 respectively.)
> 
> The below diff allows it to compile and run properly. It also removes
> the requirement of qemu from ports for making the qcow2 image since
> vmctl(8) can do that now. Since it can now run with just vmctl, I've
> also wired it into the parent Makefile to run with the other vmd regress
> tests.
> 
> OK?


Ok by me.
 
> -dv
> 
> Index: regress/usr.sbin/vmd/Makefile
> ===
> RCS file: /cvs/src/regress/usr.sbin/vmd/Makefile,v
> retrieving revision 1.2
> diff -u -p -r1.2 Makefile
> --- regress/usr.sbin/vmd/Makefile 29 Jan 2019 21:08:12 -  1.2
> +++ regress/usr.sbin/vmd/Makefile 13 Jun 2021 14:42:42 -
> @@ -2,7 +2,7 @@
> 
>  .if ${MACHINE} == "amd64"
>  SUBDIR =
> -SUBDIR += config
> +SUBDIR += config diskfmt
> 
>  .elif make(regress) || make(all)
>  ${.TARGETS}:
> Index: regress/usr.sbin/vmd/diskfmt/Makefile
> ===
> RCS file: /cvs/src/regress/usr.sbin/vmd/diskfmt/Makefile,v
> retrieving revision 1.2
> diff -u -p -r1.2 Makefile
> --- regress/usr.sbin/vmd/diskfmt/Makefile 8 Oct 2018 16:32:01 -   
> 1.2
> +++ regress/usr.sbin/vmd/diskfmt/Makefile 13 Jun 2021 14:42:42 -
> @@ -4,28 +4,26 @@
>  # qcow disk image, and scribbles the same data to both
>  # of them. It verifies that they both have the same
>  # result.
> -#
> -# In order for this test to work, qemu must be installed
> -# in order to create the disk images.
> 
> -VMD_DIR=$(BSDSRCDIR)/usr.sbin/vmd/
> +VMD_DIR=$(BSDSRCDIR)/usr.sbin/vmd
> 
>  PROG=vioscribble
> -SRCS=vioscribble.c vioqcow2.c vioraw.c
> +SRCS=vioscribble.c vioqcow2.c vioraw.c log.c
>  CFLAGS+=-I$(VMD_DIR) -pthread
>  LDFLAGS+=-pthread
> 
>  run-regress-vioscribble: scribble-images
> 
>  scribble-images:
> - rm -f scribble.raw scribble.qc2
> - vmctl create scribble.raw -s 4G
> - qemu-img create -f qcow2 scribble.qc2 4G
> + rm -f scribble.raw scribble.qcow2
> + vmctl create -s 4G scribble.raw
> + vmctl create -s 4G scribble.qcow2
> 
> 
>  .PHONY: ${REGRESS_TARGETS} scribble-images
> 
>  .include 
> 
> -vioqcow2.c vioraw.c: $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c
> - cp $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c .
> +vioqcow2.c vioraw.c log.c: $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c \
> + $(VMD_DIR)/log.c
> + cp $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c $(VMD_DIR)/log.c .
> Index: regress/usr.sbin/vmd/diskfmt/vioscribble.c
> ===
> RCS file: /cvs/src/regress/usr.sbin/vmd/diskfmt/vioscribble.c,v
> retrieving revision 1.2
> diff -u -p -r1.2 vioscribble.c
> --- regress/usr.sbin/vmd/diskfmt/vioscribble.c8 Oct 2018 16:32:01 
> -   1.2
> +++ regress/usr.sbin/vmd/diskfmt/vioscribble.c13 Jun 2021 14:42:42 
> -
> @@ -16,7 +16,7 @@
>   * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>   */
> 
> -/*
> +/*
>   * Quick hack of a program to try to test vioqcow2.c against
>   * vioraw.c.
>   *
> @@ -51,7 +51,6 @@
>  #include 
>  #include 
> 
> -#include "pci.h"
>  #include "vmd.h"
>  #include "vmm.h"
>  #include "virtio.h"
> @@ -65,43 +64,6 @@ struct virtio_backing rawfile;
>  /* We expect the scribble disks to be 4g in size */
>  #define DISKSZ   (4ull*1024ull*1024ull*1024ull)
> 
> -/* functions that io code depends on */
> -
> -void
> -log_debug(const char *emsg, ...)
> -{
> - if (verbose) {
> - va_list ap;
> -
> - va_start(ap, emsg);
> - vfprintf(stdout, emsg, ap);
> - fprintf(stdout, "\n");
> - va_end(ap);
> - }
> -}
> -
> -void
> -log_warnx(const char *emsg, ...)
> -{
> - va_list ap;
> -
> - va_start(ap, emsg);
> - vfprintf(stdout, emsg, ap);
> - fprintf(stdout, "\n");
> - va_end(ap);
> -}
> -
> -void
> -log_warn(const char *emsg, ...)
> -{
> - va_list ap;
> -
> - va_start(ap, emsg);
> - vfprintf(stdout, emsg, ap);
> - fprintf(stdout, "\n");
> - va_end(

Re: Fixes for padding in OpenFlow header match fields

2019-11-26 Thread Ori Bernstein
  instructionslen, padsize;
> @@ -498,27 +486,10 @@ ofp_print_flowmod(const u_char *bp, u_in
>   goto parse_next_instruction;
>   }
>  
> - parse_next_oxm:
> - if (length < sizeof(*oxm)) {
> - printf(" [|OpenFlow]");
> - return;
> - }
> -
> - oxm = (struct ofp_ox_match *)bp;
> - bp += sizeof(*oxm);
> - length -= sizeof(*oxm);
> - if (length < oxm->oxm_length) {
> - printf(" [|OpenFlow]");
> - return;
> - }
> + ofp_print_oxm_field(bp, length, omlen, 0);
>  
> - ofp_print_oxm(oxm, bp, length);
> -
> - bp += oxm->oxm_length;
> - length -= oxm->oxm_length;
> - omlen -= min(sizeof(*oxm) + oxm->oxm_length, omlen);
> - if (omlen)
> - goto parse_next_oxm;
> + bp += omlen;
> + length -= omlen;
>  
>   /* Skip padding if any. */
>   if (padsize) {
> @@ -1017,7 +988,6 @@ void
>  action_print_setfield(const u_char *bp, u_int length)
>  {
>   struct ofp_action_set_field *asf;
> - struct ofp_ox_match *oxm;
>   int  omlen;
>  
>   if (length < (sizeof(*asf) - AH_UNPADDED)) {
> @@ -1030,27 +1000,7 @@ action_print_setfield(const u_char *bp, 
>   if (omlen == 0)
>   return;
>  
> - parse_next_oxm:
> - if (length < sizeof(*oxm)) {
> - printf(" [|OpenFlow]");
> - return;
> - }
> -
> - oxm = (struct ofp_ox_match *)bp;
> - bp += sizeof(*oxm);
> - length -= sizeof(*oxm);
> - if (length < oxm->oxm_length) {
> - printf(" [|OpenFlow]");
> - return;
> - }
> -
> - ofp_print_oxm(oxm, bp, length);
> -
> - bp += oxm->oxm_length;
> - length -= oxm->oxm_length;
> - omlen -= min(sizeof(*oxm) + oxm->oxm_length, omlen);
> - if (omlen)
> - goto parse_next_oxm;
> + ofp_print_oxm_field(bp, length, omlen, 1);
>  }
>  
>  void
> @@ -1094,6 +1044,7 @@ ofp_print_action(struct ofp_action_heade
>   break;
>  
>   case OFP_ACTION_SET_FIELD:
> + action_print_setfield(bp, length);
>   break;
>  
>   case OFP_ACTION_COPY_TTL_OUT:
> 


-- 
Ori Bernstein



Re: Reuse VM ids.

2018-11-23 Thread Ori Bernstein
On Fri, 16 Nov 2018 16:15:14 +0100, Reyk Floeter  wrote:

> On Sat, Oct 27, 2018 at 02:53:16PM -0700, Ori Bernstein wrote:
> > On Fri, 26 Oct 2018 01:57:15 +0200, Reyk Floeter  wrote:
> > 
> > > On Tue, Oct 23, 2018 at 10:21:08PM -0700, Ori Bernstein wrote:
> > > > On Mon, 8 Oct 2018 07:59:15 -0700, Bob Beck  wrote:
> > > > 
> > > > > works here and I like it.  but probably for after unlock
> > > > > 
> > > > 
> > > > It's after unlock -- pinging for OKs.
> > > > 
> > > 
> > > Not yet.  Please include the VM's uid in the claim, e.g.
> > > 
> > >   claim_vmid(const char *name, uid_t uid)
> > > 
> > > It is not a strong protection, but it doesn't make sense that other
> > > users can run a VM with the same name and get the claimed Id.
> > > 
> > > Reyk
> > > 
> > 
> > Updated.
> > 
> > Ok?
> > 
> 
> Sorry for the delay...
> 
> Two minor things:
> 
> - I think config_purge() should clear env->vmd_known as it does with
> vmd_vms and vmd_switches.
>
> - The use of static functions (static uint32_t vm_claimid) is disputable
> but somewhat uncommon in OpenBSD.  But even static functions do need a
> prototype ("All functions are prototyped somewhere." - style(9)).
 
Done. Anyone want to give a second ok?

diff --git usr.sbin/vmd/config.c usr.sbin/vmd/config.c
index 20a16f85442..c662d329eb4 100644
--- usr.sbin/vmd/config.c
+++ usr.sbin/vmd/config.c
@@ -87,7 +87,10 @@ config_init(struct vmd *env)
if (what & CONFIG_VMS) {
if ((env->vmd_vms = calloc(1, sizeof(*env->vmd_vms))) == NULL)
return (-1);
+   if ((env->vmd_known = calloc(1, sizeof(*env->vmd_known))) == 
NULL)
+   return (-1);
TAILQ_INIT(env->vmd_vms);
+   TAILQ_INIT(env->vmd_known);
}
if (what & CONFIG_SWITCHES) {
if ((env->vmd_switches = calloc(1,
@@ -109,6 +112,7 @@ void
 config_purge(struct vmd *env, unsigned int reset)
 {
struct privsep  *ps = >vmd_ps;
+   struct name2id  *n2i;
struct vmd_vm   *vm;
struct vmd_switch   *vsw;
unsigned int what;
@@ -125,6 +129,10 @@ config_purge(struct vmd *env, unsigned int reset)
while ((vm = TAILQ_FIRST(env->vmd_vms)) != NULL) {
vm_remove(vm, __func__);
}
+   while ((n2i = TAILQ_FIRST(env->vmd_known)) != NULL) {
+   TAILQ_REMOVE(env->vmd_known, n2i, entry);
+   free(n2i);
+   }
env->vmd_nvm = 0;
}
if (what & CONFIG_SWITCHES && env->vmd_switches != NULL) {
diff --git usr.sbin/vmd/vmd.c usr.sbin/vmd/vmd.c
index 9423081df1e..5bb751511d0 100644
--- usr.sbin/vmd/vmd.c
+++ usr.sbin/vmd/vmd.c
@@ -62,6 +62,7 @@ intvmd_check_vmh(struct vm_dump_header *);
 int vm_instance(struct privsep *, struct vmd_vm **,
struct vmop_create_params *, uid_t);
 int vm_checkinsflag(struct vmop_create_params *, unsigned int, uid_t);
+uint32_t vm_claimid(const char *, int);
 
 struct vmd *env;
 
@@ -1169,6 +1170,28 @@ vm_remove(struct vmd_vm *vm, const char *caller)
free(vm);
 }
 
+uint32_t
+vm_claimid(const char *name, int uid)
+{
+   struct name2id *n2i = NULL;
+
+   TAILQ_FOREACH(n2i, env->vmd_known, entry)
+   if (strcmp(n2i->name, name) == 0 && n2i->uid == uid)
+   return n2i->id;
+
+   if (++env->vmd_nvm == 0)
+   fatalx("too many vms");
+   if ((n2i = calloc(1, sizeof(struct name2id))) == NULL)
+   fatalx("could not alloc vm name");
+   n2i->id = env->vmd_nvm;
+   n2i->uid = uid;
+   if (strlcpy(n2i->name, name, sizeof(n2i->name)) >= sizeof(n2i->name))
+   fatalx("overlong vm name");
+   TAILQ_INSERT_TAIL(env->vmd_known, n2i, entry);
+
+   return n2i->id;
+}
+
 int
 vm_register(struct privsep *ps, struct vmop_create_params *vmc,
 struct vmd_vm **ret_vm, uint32_t id, uid_t uid)
@@ -1300,11 +1323,8 @@ vm_register(struct privsep *ps, struct 
vmop_create_params *vmc,
vm->vm_cdrom = -1;
vm->vm_iev.ibuf.fd = -1;
 
-   if (++env->vmd_nvm == 0)
-   fatalx("too many vms");
-
/* Assign a new internal Id if not specified */
-   vm->vm_vmid = id == 0 ? env->vmd_nvm : id;
+   vm->vm_vmid = (id == 0) ? vm_claimid(vcp->vcp_name, uid) : id;
 
log_debug("%s: registering vm %d", __func__, vm->vm_vmid);
TAILQ_INSERT_TAIL(env->vmd_vms, vm

Re: Reuse VM ids.

2018-11-03 Thread Ori Bernstein
On Sat, 27 Oct 2018 14:53:16 -0700, Ori Bernstein  wrote:

> On Fri, 26 Oct 2018 01:57:15 +0200, Reyk Floeter  wrote:
> 
> > On Tue, Oct 23, 2018 at 10:21:08PM -0700, Ori Bernstein wrote:
> > > On Mon, 8 Oct 2018 07:59:15 -0700, Bob Beck  wrote:
> > > 
> > > > works here and I like it.  but probably for after unlock
> > > > 
> > > 
> > > It's after unlock -- pinging for OKs.
> > > 
> > 
> > Not yet.  Please include the VM's uid in the claim, e.g.
> > 
> > claim_vmid(const char *name, uid_t uid)
> > 
> > It is not a strong protection, but it doesn't make sense that other
> > users can run a VM with the same name and get the claimed Id.
> > 
> > Reyk
> > 
> 
> Updated.
> 
> Ok?
> 

Ping?

-- 
Ori Bernstein



Re: Qcow2: Clean up logging/error handling

2018-11-03 Thread Ori Bernstein
On Tue, 30 Oct 2018 23:01:50 -0700, Mike Larkin  wrote:

> On Tue, Oct 30, 2018 at 10:41:21PM -0700, o...@eigenstate.org wrote:
> > > On Tue, Oct 30, 2018 at 10:32:37PM -0700, Ori Bernstein wrote:
> > >> On Tue, 30 Oct 2018 22:29:01 -0700, Mike Larkin  
> > >> wrote:
> > >> 
> > >> > > -if (disk->base->clustersz != disk->clustersz) {
> > >> > > -log_warn("%s: all disks must share clustersize",
> > >> > > +fatal("%s: could not open %s", basepath, 
> > >> > > __func__);
> > >> > 
> > >> > is this right?
> > >> > 
> > >> > > +if (qc2_open(disk->base, fds + 1, nfd - 1) == -1)
> > >> > > +fatalx("%s: could not open %s", basepath, 
> > >> > > __func__);
> > >> > 
> > >> > same
> > >> > 
> > >> > > +if (disk->base->clustersz != disk->clustersz)
> > >> > > +fatalx("%s: all disk parts must share 
> > >> > > clustersize",
> > >> > >  __func__);
> > >> > > -    goto error;
> > >> > > -}
> > >> > > -}
> > >> 
> > >> Good question. I think from earlier discussion, we wanted to fail if we 
> > >> could
> > >> not open a disk image -- especially since these ones are actually 
> > >> *parts* of
> > >> a disk image, meaning that we've got a corrupt image.
> > >> 
> > >> I can easily change these back to warns + error returns.
> > >> 
> > >> -- 
> > >> Ori Bernstein
> > > 
> > > I was referring to the argument order.
> > 
> > Oh, that's obviously wrong. Updated.
> > 
> 
> with that fix, go for it. we can hack on it further in-tree.
> 

Anyone want to give me a second ok?

-- 
Ori Bernstein



Re: Qcow2: Clean up logging/error handling

2018-10-30 Thread Ori Bernstein
On Tue, 30 Oct 2018 22:29:01 -0700, Mike Larkin  wrote:

> > -   if (disk->base->clustersz != disk->clustersz) {
> > -   log_warn("%s: all disks must share clustersize",
> > +   fatal("%s: could not open %s", basepath, __func__);
> 
> is this right?
> 
> > +   if (qc2_open(disk->base, fds + 1, nfd - 1) == -1)
> > +   fatalx("%s: could not open %s", basepath, __func__);
> 
> same
> 
> > +   if (disk->base->clustersz != disk->clustersz)
> > +   fatalx("%s: all disk parts must share clustersize",
> > __func__);
> > -   goto error;
> > -   }
> > -   }

Good question. I think from earlier discussion, we wanted to fail if we could
not open a disk image -- especially since these ones are actually *parts* of
a disk image, meaning that we've got a corrupt image.

I can easily change these back to warns + error returns.

-- 
Ori Bernstein



Re: Qcow2: Clean up logging/error handling

2018-10-30 Thread Ori Bernstein
truncate(disk->fd, disk->end) == -1) {
-   perror("ftruncate");
-   goto fail;
-   }
+   if (ftruncate(disk->fd, disk->end) == -1)
+   fatal("%s: ftruncate failed", __func__);
 
/*
 * If we translated, found a L2 entry, but it needed to
 * be copied, copy it.
 */
-   if (orig != 0 && copy_cluster(disk, disk, l2tab, orig) == -1) {
-   perror("move cluster");
-   goto fail;
-   }
+   if (orig != 0)
+   copy_cluster(disk, disk, l2tab, orig);
/* Update l1 -- we flush it later */
disk->l1[l1off] = l2tab | QCOW2_INPLACE;
-   if (inc_refs(disk, l2tab, 1) == -1) {
-   perror("refs");
-   goto fail;
-   }
+   inc_refs(disk, l2tab, 1);
}
l2tab &= ~QCOW2_INPLACE;
 
/* Grow the disk */
if (ftruncate(disk->fd, disk->end + disk->clustersz) < 0)
-   goto fail;
+   fatalx("%s: could not grow disk", __func__);
if (src_phys > 0)
-   if (copy_cluster(disk, base, disk->end, src_phys) == -1)
-   goto fail;
+   copy_cluster(disk, base, disk->end, src_phys);
cluster = disk->end;
disk->end += disk->clustersz;
buf = htobe64(cluster | QCOW2_INPLACE);
if (pwrite(disk->fd, , sizeof(buf), l2tab + l2off * 8) != 8)
-   goto fail;
+   fatalx("%s: could not write cluster", __func__);
 
/* TODO: lazily sync: currently VMD doesn't close things */
buf = htobe64(disk->l1[l1off]);
if (pwrite(disk->fd, , sizeof(buf), disk->l1off + 8 * l1off) != 8)
-   goto fail;
-   if (inc_refs(disk, cluster, 1) == -1)
-   goto fail;
+   fatalx("%s: could not write l1", __func__);
+   inc_refs(disk, cluster, 1);
 
pthread_rwlock_unlock(>lock);
clusteroff = off % disk->clustersz;
+   if (cluster + clusteroff < disk->clustersz)
+   fatalx("write would clobber header");
return cluster + clusteroff;
-
-fail:
-   pthread_rwlock_unlock(>lock);
-   return -1;
 }
 
 /* Copies a cluster containing src to dst. Src and dst need not be aligned. */
-static int
+static void
 copy_cluster(struct qcdisk *disk, struct qcdisk *base, off_t dst, off_t src)
 {
char *scratch;
 
scratch = alloca(disk->clustersz);
if (!scratch)
-   err(1, "out of memory");
+   fatal("out of memory");
src &= ~(disk->clustersz - 1);
dst &= ~(disk->clustersz - 1);
if (pread(base->fd, scratch, disk->clustersz, src) == -1)
-   return -1;
+   fatal("%s: could not read cluster", __func__);
if (pwrite(disk->fd, scratch, disk->clustersz, dst) == -1)
-   return -1;
-   return 0;
+   fatal("%s: could not write cluster", __func__);
 }
 
-static int
+static void
 inc_refs(struct qcdisk *disk, off_t off, int newcluster)
 {
off_t l1off, l1idx, l2idx, l2cluster;
@@ -623,34 +583,28 @@ inc_refs(struct qcdisk *disk, off_t off, int newcluster)
l2idx = (off / disk->clustersz) % nper;
l1off = disk->refoff + 8 * l1idx;
if (pread(disk->fd, , sizeof(buf), l1off) != 8)
-   return -1;
+   fatal("could not read refs");
 
l2cluster = be64toh(buf);
if (l2cluster == 0) {
l2cluster = disk->end;
disk->end += disk->clustersz;
-   if (ftruncate(disk->fd, disk->end) < 0) {
-   log_warn("%s: refs block grow fail", __func__);
-       return -1;
-   }
+   if (ftruncate(disk->fd, disk->end) < 0)
+   fatal("%s: failed to allocate ref block", __func__);
buf = htobe64(l2cluster);
-   if (pwrite(disk->fd, , sizeof(buf), l1off) != 8) {
-   return -1;
-   }
+   if (pwrite(disk->fd, , sizeof(buf), l1off) != 8)
+   fatal("%s: failed to write ref block", __func__);
}
 
refs = 1;
if (!newcluster) {
if (pread(disk->fd, , sizeof(refs),
l2cluster + 2 * l2idx) != 2)
-   return -1;
+   fatal("could not read ref cluster");
refs = be16toh(refs) + 1;
}
refs = htobe16(refs);
-   if (pwrite(disk->fd, , sizeof(refs), l2cluster + 2 * l2idx) != 2) {
-   log_warn("%s: could not write ref block", __func__);
-   return -1;
-   }
-   return 0;
+   if (pwrite(disk->fd, , sizeof(refs), l2cluster + 2 * l2idx) != 2)
+   fatal("%s: could not write ref block", __func__);
 }
 

-- 
Ori Bernstein



Re: Qcow2: Clean up logging/error handling

2018-10-27 Thread Ori Bernstein
On Sat, 27 Oct 2018 16:15:32 -0600, "Theo de Raadt"  wrote:

> I quite dislike when software uses __func__ to tell me what internal
> function had an error.
> 
> Why should a user see an error with the string virtio_qcow2_get_base,
> qc2_open, qc2_pwrite, etc?
> 
> It feels unpolished.
> 

Possibly, but it is consistent with the rest of vmd (16 instances are
from vioqcow2.c)

$ grep log.*__func__ *.c | wc -l
162

But many of them are on lines that wrap. Overcounting a bit:

$ grep __func__ *.c | wc -l
474

If we want to change it, I think it's better as an independent patch that
converts all of vmd.

-- 
Ori Bernstein



Re: Reuse VM ids.

2018-10-27 Thread Ori Bernstein
On Fri, 26 Oct 2018 01:57:15 +0200, Reyk Floeter  wrote:

> On Tue, Oct 23, 2018 at 10:21:08PM -0700, Ori Bernstein wrote:
> > On Mon, 8 Oct 2018 07:59:15 -0700, Bob Beck  wrote:
> > 
> > > works here and I like it.  but probably for after unlock
> > > 
> > 
> > It's after unlock -- pinging for OKs.
> > 
> 
> Not yet.  Please include the VM's uid in the claim, e.g.
> 
>   claim_vmid(const char *name, uid_t uid)
> 
> It is not a strong protection, but it doesn't make sense that other
> users can run a VM with the same name and get the claimed Id.
> 
> Reyk
> 

Updated.

Ok?

diff --git usr.sbin/vmd/config.c usr.sbin/vmd/config.c
index a749e3595b5..31f46bf3c5b 100644
--- usr.sbin/vmd/config.c
+++ usr.sbin/vmd/config.c
@@ -60,7 +60,10 @@ config_init(struct vmd *env)
if (what & CONFIG_VMS) {
if ((env->vmd_vms = calloc(1, sizeof(*env->vmd_vms))) == NULL)
return (-1);
+   if ((env->vmd_known = calloc(1, sizeof(*env->vmd_known))) == 
NULL)
+   return (-1);
TAILQ_INIT(env->vmd_vms);
+   TAILQ_INIT(env->vmd_known);
}
if (what & CONFIG_SWITCHES) {
if ((env->vmd_switches = calloc(1,
diff --git usr.sbin/vmd/vmd.c usr.sbin/vmd/vmd.c
index 8053b02620f..0683812f9f0 100644
--- usr.sbin/vmd/vmd.c
+++ usr.sbin/vmd/vmd.c
@@ -1169,6 +1169,28 @@ vm_remove(struct vmd_vm *vm, const char *caller)
free(vm);
 }
 
+static uint32_t
+vm_claimid(const char *name, int uid)
+{
+   struct name2id *n2i = NULL;
+
+   TAILQ_FOREACH(n2i, env->vmd_known, entry)
+   if (strcmp(n2i->name, name) == 0 && n2i->uid == uid)
+   return n2i->id;
+
+   if (++env->vmd_nvm == 0)
+   fatalx("too many vms");
+   if ((n2i = calloc(1, sizeof(struct name2id))) == NULL)
+   fatalx("could not alloc vm name");
+   n2i->id = env->vmd_nvm;
+   n2i->uid = uid;
+   if (strlcpy(n2i->name, name, sizeof(n2i->name)) >= sizeof(n2i->name))
+   fatalx("overlong vm name");
+   TAILQ_INSERT_TAIL(env->vmd_known, n2i, entry);
+
+   return n2i->id;
+}
+
 int
 vm_register(struct privsep *ps, struct vmop_create_params *vmc,
 struct vmd_vm **ret_vm, uint32_t id, uid_t uid)
@@ -1300,11 +1322,8 @@ vm_register(struct privsep *ps, struct 
vmop_create_params *vmc,
vm->vm_cdrom = -1;
vm->vm_iev.ibuf.fd = -1;
 
-   if (++env->vmd_nvm == 0)
-   fatalx("too many vms");
-
/* Assign a new internal Id if not specified */
-   vm->vm_vmid = id == 0 ? env->vmd_nvm : id;
+   vm->vm_vmid = (id == 0) ? vm_claimid(vcp->vcp_name, uid) : id;
 
log_debug("%s: registering vm %d", __func__, vm->vm_vmid);
TAILQ_INSERT_TAIL(env->vmd_vms, vm, vm_entry);
diff --git usr.sbin/vmd/vmd.h usr.sbin/vmd/vmd.h
index 7ae4e4bd65e..047b5bb3e00 100644
--- usr.sbin/vmd/vmd.h
+++ usr.sbin/vmd/vmd.h
@@ -284,6 +284,14 @@ struct vmd_user {
 };
 TAILQ_HEAD(userlist, vmd_user);
 
+struct name2id {
+   charname[VMM_MAX_NAME_LEN];
+   int uid;
+   int32_t id;
+   TAILQ_ENTRY(name2id)entry;
+};
+TAILQ_HEAD(name2idlist, name2id);
+
 struct address {
struct sockaddr_storage  ss;
int  prefixlen;
@@ -308,6 +316,7 @@ struct vmd {
 
uint32_t vmd_nvm;
    struct vmlist   *vmd_vms;
+   struct name2idlist  *vmd_known;
uint32_t vmd_nswitches;
struct switchlist   *vmd_switches;
struct userlist *vmd_users;

-- 
Ori Bernstein



Re: Qcow2: Clean up logging/error handling

2018-10-27 Thread Ori Bernstein
On Wed, 24 Oct 2018 16:23:29 +0800, Michael Mikonos  wrote:

> On Tue, Oct 23, 2018 at 09:44:24PM -0700, Ori Bernstein wrote:
> > This patch turns most warnings into errors, and uses the
> > appropriate fatal/fatalx so that we don't print bogus error
> > strings. It also adds checks for unsupported refcount sizes
> > and writes that clobber the header.
> > 
> > Ok?
> 
> Hello, two comments below.

Updated. Also changed some fatalx'es to fatals, so we get the errno, and
removed status returns plus associated checks from functions that no longer
had a failure path.


diff --git usr.sbin/vmd/vioqcow2.c usr.sbin/vmd/vioqcow2.c
index 3a215599d49..12301650127 100644
--- usr.sbin/vmd/vioqcow2.c
+++ usr.sbin/vmd/vioqcow2.c
@@ -79,15 +79,15 @@ struct qcdisk {
int   fd;
uint64_t *l1;
off_t end;
-   uint32_t  clustersz;
+   off_t clustersz;
off_t disksz; /* In bytes */
-   uint32_t cryptmethod;
+   uint32_t  cryptmethod;
 
uint32_t l1sz;
off_tl1off;
 
off_trefoff;
-   uint32_t refsz;
+   off_trefsz;
 
uint32_t nsnap;
off_tsnapoff;
@@ -102,9 +102,9 @@ struct qcdisk {
 extern char *__progname;
 
 static off_t xlate(struct qcdisk *, off_t, int *);
-static int copy_cluster(struct qcdisk *, struct qcdisk *, off_t, off_t);
+static void copy_cluster(struct qcdisk *, struct qcdisk *, off_t, off_t);
+static void inc_refs(struct qcdisk *, off_t, int);
 static off_t mkcluster(struct qcdisk *, struct qcdisk *, off_t, off_t);
-static int inc_refs(struct qcdisk *, off_t, int);
 static int qc2_open(struct qcdisk *, int *, size_t);
 static ssize_t qc2_pread(void *, char *, size_t, off_t);
 static ssize_t qc2_pwrite(void *, char *, size_t, off_t);
@@ -137,6 +137,10 @@ virtio_init_qcow2(struct virtio_backing *file, off_t *szp, 
int *fd, size_t nfd)
return 0;
 }
 
+/*
+ * Return the path to the base image given a disk image.
+ * Called from vmctl.
+ */
 ssize_t
 virtio_qcow2_get_base(int fd, char *path, size_t npath, const char *dpath)
 {
@@ -151,7 +155,7 @@ virtio_qcow2_get_base(int fd, char *path, size_t npath, 
const char *dpath)
return -1;
}
if (strncmp(header.magic, VM_MAGIC_QCOW, strlen(VM_MAGIC_QCOW)) != 0) {
-   log_warn("%s: invalid magic numbers", __func__);
+   log_warnx("%s: invalid magic numbers", __func__);
return -1;
}
backingoff = be64toh(header.backingoff);
@@ -160,7 +164,7 @@ virtio_qcow2_get_base(int fd, char *path, size_t npath, 
const char *dpath)
return 0;
 
if (backingsz >= npath - 1) {
-   log_warn("%s: snapshot path too long", __func__);
+   log_warnx("%s: snapshot path too long", __func__);
return -1;
}
if (pread(fd, path, backingsz, backingoff) != backingsz) {
@@ -178,20 +182,19 @@ virtio_qcow2_get_base(int fd, char *path, size_t npath, 
const char *dpath)
if (path[0] == '/') {
if (realpath(path, expanded) == NULL ||
strlcpy(path, expanded, npath) >= npath) {
-   log_warn("unable to resolve %s", path);
+   log_warnx("unable to resolve %s", path);
return -1;
}
} else {
s = dirname(dpath);
if (snprintf(expanded, sizeof(expanded),
"%s/%s", s, path) >= (int)sizeof(expanded)) {
-   log_warn("path too long: %s/%s",
-   s, path);
+   log_warnx("path too long: %s/%s", s, path);
return -1;
}
if (npath < PATH_MAX ||
realpath(expanded, path) == NULL) {
-   log_warn("unable to resolve %s", path);
+   log_warnx("unable to resolve %s", path);
return -1;
}
}
@@ -207,7 +210,7 @@ qc2_open(struct qcdisk *disk, int *fds, size_t nfd)
struct qcheader header;
uint64_t backingoff;
uint32_t backingsz;
-   size_t i;
+   off_t i;
int version, fd;
 
pthread_rwlock_init(>lock, NULL);
@@ -216,15 +219,10 @@ qc2_open(struct qcdisk *disk, int *fds, size_t nfd)
disk->base = NULL;
disk->l1 = NULL;
 
-   if (pread(fd, , sizeof(header), 0) != sizeof(header)) {
-   log_warn("%s: short read on header", __func__);
-   goto error;
-   }
-   if (strncmp(header.magic,
-   VM_MAGIC_QCOW, strlen(VM_MAGIC_QCOW)) != 0) {
-   log_warn("%s: invalid magic numbers", __func__);
-   goto error;
-   }
+   if (pread(fd, , size

Re: Reuse VM ids.

2018-10-23 Thread Ori Bernstein
On Mon, 8 Oct 2018 07:59:15 -0700, Bob Beck  wrote:

> works here and I like it.  but probably for after unlock
> 

It's after unlock -- pinging for OKs.

-- 
    Ori Bernstein



Qcow2: Clean up logging/error handling

2018-10-23 Thread Ori Bernstein
541,54 +516,46 @@ mkcluster(struct qcdisk *disk, struct qcdisk *base, off_t 
off, off_t src_phys)
orig = l2tab & ~QCOW2_INPLACE;
l2tab = disk->end;
disk->end += disk->clustersz;
-   if (ftruncate(disk->fd, disk->end) == -1) {
-   perror("ftruncate");
-   goto fail;
-   }
+   if (ftruncate(disk->fd, disk->end) == -1)
+   fatalx("%s: ftruncate failed", __func__);
 
/*
 * If we translated, found a L2 entry, but it needed to
 * be copied, copy it.
 */
-   if (orig != 0 && copy_cluster(disk, disk, l2tab, orig) == -1) {
-   perror("move cluster");
-   goto fail;
-   }
+   if (orig != 0 && copy_cluster(disk, disk, l2tab, orig) == -1)
+   fatalx("%s: could not move cluster", __func__);
/* Update l1 -- we flush it later */
disk->l1[l1off] = l2tab | QCOW2_INPLACE;
-   if (inc_refs(disk, l2tab, 1) == -1) {
-   perror("refs");
-   goto fail;
-   }
+   if (inc_refs(disk, l2tab, 1) == -1)
+   fatalx("%s: could not inc refs", __func__);
}
l2tab &= ~QCOW2_INPLACE;
 
/* Grow the disk */
if (ftruncate(disk->fd, disk->end + disk->clustersz) < 0)
-   goto fail;
+   fatalx("%s: could not grow disk", __func__);
if (src_phys > 0)
if (copy_cluster(disk, base, disk->end, src_phys) == -1)
-   goto fail;
+   fatalx("%s: could not copy cluster", __func__);
cluster = disk->end;
disk->end += disk->clustersz;
buf = htobe64(cluster | QCOW2_INPLACE);
if (pwrite(disk->fd, , sizeof(buf), l2tab + l2off * 8) != 8)
-   goto fail;
+   fatalx("%s: could not write cluster", __func__);
 
/* TODO: lazily sync: currently VMD doesn't close things */
buf = htobe64(disk->l1[l1off]);
if (pwrite(disk->fd, , sizeof(buf), disk->l1off + 8 * l1off) != 8)
-   goto fail;
+   fatalx("%s: could not write l1", __func__);
if (inc_refs(disk, cluster, 1) == -1)
-   goto fail;
+   fatalx("%s: could not inc refs", __func__);
 
pthread_rwlock_unlock(>lock);
clusteroff = off % disk->clustersz;
+   if (cluster + clusteroff < disk->clustersz)
+   fatalx("write would clobber header");
return cluster + clusteroff;
-
-fail:
-   pthread_rwlock_unlock(>lock);
-   return -1;
 }
 
 /* Copies a cluster containing src to dst. Src and dst need not be aligned. */
@@ -630,7 +597,7 @@ inc_refs(struct qcdisk *disk, off_t off, int newcluster)
l2cluster = disk->end;
disk->end += disk->clustersz;
if (ftruncate(disk->fd, disk->end) < 0) {
-   log_warn("%s: refs block grow fail", __func__);
+   fatalx("%s: refs block grow fail", __func__);
return -1;
}
buf = htobe64(l2cluster);
@@ -648,7 +615,7 @@ inc_refs(struct qcdisk *disk, off_t off, int newcluster)
}
refs = htobe16(refs);
if (pwrite(disk->fd, , sizeof(refs), l2cluster + 2 * l2idx) != 2) {
-   log_warn("%s: could not write ref block", __func__);
+   fatalx("%s: could not write ref block", __func__);
return -1;
}
return 0;

-- 
Ori Bernstein



Reuse VM ids.

2018-10-07 Thread Ori Bernstein
Keep a list of known vms, and reuse the VM IDs. This means that when using
'-L', the IP addresses of the VMs are stable.

diff --git usr.sbin/vmd/config.c usr.sbin/vmd/config.c
index af12b790002..522bae32501 100644
--- usr.sbin/vmd/config.c
+++ usr.sbin/vmd/config.c
@@ -61,7 +61,10 @@ config_init(struct vmd *env)
if (what & CONFIG_VMS) {
if ((env->vmd_vms = calloc(1, sizeof(*env->vmd_vms))) == NULL)
return (-1);
+   if ((env->vmd_known = calloc(1, sizeof(*env->vmd_known))) == 
NULL)
+   return (-1);
TAILQ_INIT(env->vmd_vms);
+   TAILQ_INIT(env->vmd_known);
}
if (what & CONFIG_SWITCHES) {
if ((env->vmd_switches = calloc(1,
diff --git usr.sbin/vmd/vmd.c usr.sbin/vmd/vmd.c
index 18a5e0d3d5d..732691b4381 100644
--- usr.sbin/vmd/vmd.c
+++ usr.sbin/vmd/vmd.c
@@ -1169,6 +1169,27 @@ vm_remove(struct vmd_vm *vm, const char *caller)
free(vm);
 }
 
+static uint32_t
+claim_vmid(const char *name)
+{
+   struct name2id *n2i = NULL;
+
+   TAILQ_FOREACH(n2i, env->vmd_known, entry)
+   if (strcmp(n2i->name, name) == 0)
+   return n2i->id;
+
+   if (++env->vmd_nvm == 0)
+   fatalx("too many vms");
+   if ((n2i = calloc(1, sizeof(struct name2id))) == NULL)
+   fatalx("could not alloc vm name");
+   n2i->id = env->vmd_nvm;
+   if (strlcpy(n2i->name, name, sizeof(n2i->name)) >= sizeof(n2i->name))
+   fatalx("overlong vm name");
+   TAILQ_INSERT_TAIL(env->vmd_known, n2i, entry);
+
+   return n2i->id;
+}
+
 int
 vm_register(struct privsep *ps, struct vmop_create_params *vmc,
 struct vmd_vm **ret_vm, uint32_t id, uid_t uid)
@@ -1300,11 +1321,8 @@ vm_register(struct privsep *ps, struct 
vmop_create_params *vmc,
vm->vm_cdrom = -1;
vm->vm_iev.ibuf.fd = -1;
 
-   if (++env->vmd_nvm == 0)
-   fatalx("too many vms");
-
/* Assign a new internal Id if not specified */
-   vm->vm_vmid = id == 0 ? env->vmd_nvm : id;
+   vm->vm_vmid = (id == 0) ? claim_vmid(vcp->vcp_name) : id;
 
log_debug("%s: registering vm %d", __func__, vm->vm_vmid);
TAILQ_INSERT_TAIL(env->vmd_vms, vm, vm_entry);
diff --git usr.sbin/vmd/vmd.h usr.sbin/vmd/vmd.h
index b7c012854e8..86fad536e59 100644
--- usr.sbin/vmd/vmd.h
+++ usr.sbin/vmd/vmd.h
@@ -276,6 +276,13 @@ struct vmd_user {
 };
 TAILQ_HEAD(userlist, vmd_user);
 
+struct name2id {
+   charname[VMM_MAX_NAME_LEN];
+   int32_t id;
+   TAILQ_ENTRY(name2id)entry;
+};
+TAILQ_HEAD(name2idlist, name2id);
+
 struct address {
struct sockaddr_storage  ss;
int  prefixlen;
@@ -300,6 +307,7 @@ struct vmd {
 
uint32_t vmd_nvm;
struct vmlist   *vmd_vms;
+   struct name2idlist  *vmd_known;
    uint32_t     vmd_nswitches;
struct switchlist   *vmd_switches;
struct userlist *vmd_users;

-- 
Ori Bernstein



Re: Qcow2: External snapshots

2018-10-06 Thread Ori Bernstein
On Wed, 3 Oct 2018 07:27:17 +0100, Jason McIntyre  wrote:

> On Tue, Oct 02, 2018 at 11:13:35PM -0700, Ori Bernstein wrote:
> > 
> > Updated version. Changes from the last diff:
> > 
> > - Merge in syntax changes. 
> > - Don't over-read when getting the base images.
> > - Fix relative paths in base images.
> > - Allow multiple derived images to use a single base image, and allow a user
> >   with only read permisssions to base their images on top of it.
> > - Probe the base image size, use/validate it when craeting disk images.
> > - Fix style a bit (long lines, changing from sizeof foo to sizeof(foo).
> > - Move a define out of vmmvar.h
> > - And update the manpage with these changes.
> > - Improve error checking around creating/resolving base disk paths.
> > 
> 
> morning.
> 
> you should start new sentences on new lines - it forces a double spacing
> between sentences that all man pages have.
> 
> if you run your proposed changes to man pages through "mandoc -Tlint",
> it will pick up on silly things like that.
> 
> note there is also a double space in "Op  Fl b"
> 
> jmc
> 

Forgot to do this in my last update -- done now, and ran it through mandoc 
-Tlint.
Also fixed the bug that Reyk caught.

diff --git regress/usr.sbin/vmd/diskfmt/Makefile 
regress/usr.sbin/vmd/diskfmt/Makefile
index c2a5f42d5f6..1f8673e0e26 100644
--- regress/usr.sbin/vmd/diskfmt/Makefile
+++ regress/usr.sbin/vmd/diskfmt/Makefile
@@ -11,7 +11,7 @@
 VMD_DIR=$(BSDSRCDIR)/usr.sbin/vmd/
 
 PROG=vioscribble
-SRCS=vioscribble.c $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c
+SRCS=vioscribble.c vioqcow2.c vioraw.c
 CFLAGS+=-I$(VMD_DIR) -pthread
 LDFLAGS+=-pthread
 
@@ -26,3 +26,6 @@ scribble-images:
 .PHONY: ${REGRESS_TARGETS} scribble-images
 
 .include 
+
+vioqcow2.c vioraw.c: $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c
+   cp $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c .
diff --git regress/usr.sbin/vmd/diskfmt/vioscribble.c 
regress/usr.sbin/vmd/diskfmt/vioscribble.c
index 14d720db652..1da8efedac7 100644
--- regress/usr.sbin/vmd/diskfmt/vioscribble.c
+++ regress/usr.sbin/vmd/diskfmt/vioscribble.c
@@ -122,16 +122,18 @@ main(int argc, char **argv)
verbose = !!getenv("VERBOSE");
qcfd = open("scribble.qc2", O_RDWR);
rawfd = open("scribble.raw", O_RDWR);
-   if (qcfd == -1 || virtio_init_qcow2(, , qcfd) == -1)
+   if (qcfd == -1)
err(1, "unable to open qcow");
-   if (rawfd == -1 || virtio_init_raw(, , rawfd) == -1)
+   if (virtio_init_qcow2(, , , 1) == -1)
+   err(1, "unable to init qcow");
+   if (rawfd == -1 || virtio_init_raw(, , , 1) == -1)
err(1, "unable to open raw");
 
srandom_deterministic(123);
 
/* scribble to both disks */
printf("scribbling...\n");
-   for (i = 0; i < 16; i++) {
+   for (i = 0; i < 1024*16; i++) {
off = (random() % DISKSZ);
len = random() % sizeof buf + 1;
fill(off, buf, sizeof buf);
diff --git usr.sbin/vmctl/main.c usr.sbin/vmctl/main.c
index 69c5e013f4c..2cfb6848899 100644
--- usr.sbin/vmctl/main.c
+++ usr.sbin/vmctl/main.c
@@ -67,7 +67,8 @@ intctl_receive(struct parse_result *, int, char 
*[]);
 
 struct ctl_command ctl_commands[] = {
{ "console",CMD_CONSOLE,ctl_console,"id" },
-   { "create", CMD_CREATE, ctl_create, "\"path\" -s size", 1 },
+   { "create", CMD_CREATE, ctl_create,
+   "\"path\" [-s size] [-b base]", 1 },
{ "load",   CMD_LOAD,   ctl_load,   "\"path\"" },
{ "log",CMD_LOG,ctl_log,"[verbose|brief]" },
{ "reload", CMD_RELOAD, ctl_reload, "" },
@@ -539,47 +540,55 @@ int
 ctl_create(struct parse_result *res, int argc, char *argv[])
 {
int  ch, ret, type;
-   const char  *paths[2], *disk, *format;
+   const char  *disk, *format, *base;
 
if (argc < 2)
ctl_usage(res->ctl);
 
+   base = NULL;
type = parse_disktype(argv[1], );
 
-   paths[0] = disk;
-   paths[1] = NULL;
-
-   if (unveil(paths[0], "rwc") == -1)
+   if (pledge("stdio rpath wpath cpath unveil", NULL) == -1)
+   err(1, "pledge");
+   if (unveil(disk, "rwc") == -1)
err(1, "unveil");
 
-   if (pledge("stdio rpath wpath cpath", NULL) == -1)
-   err(1, "pledge");
argc--;
argv++;
 
-   while ((ch = getopt(argc, argv, "s

Re: Qcow2: External snapshots

2018-10-04 Thread Ori Bernstein
On Wed, 3 Oct 2018 11:36:53 +0200, Reyk Floeter  wrote:

> On Tue, Oct 02, 2018 at 11:13:35PM -0700, Ori Bernstein wrote:
> > On Mon, 1 Oct 2018 11:24:01 -0700, Ori Bernstein  
> > wrote:
> > 
> > > On Mon, 1 Oct 2018 12:55:12 +0200
> > > Reyk Floeter  wrote:
> > > 
> > > > Hi Ori,
> > > > 
> > > > On Sun, Sep 30, 2018 at 12:27:00PM -0700, Ori Bernstein wrote:
> > > > > I've added support to vmd for external snapshots. That is,
> > > > > snapshots that are derived from a base image. Data lookups
> > > > > start in the derived image, and if the derived image does not
> > > > > contain some data, the search proceeds ot the base image.
> > > > > Multiple derived images may exist off of a single base image.
> > > > > 
> > > > 
> > > > Nice work!  This will be quite useful, thanks.
> > > > 
> > > > I think I broke your diff as my last commit to derive the raw/qcow2
> > > > format introduced some conflicts.  I had posted it on hackers@ and
> > > > forgot that your aren't on the internal list yet - sorry for that.
> > 
> > Updated version. Changes from the last diff:
> > 
> > - Merge in syntax changes. 
> > - Don't over-read when getting the base images.
> > - Fix relative paths in base images.
> > - Allow multiple derived images to use a single base image, and allow a user
> >   with only read permisssions to base their images on top of it.
> > - Probe the base image size, use/validate it when craeting disk images.
> > - Fix style a bit (long lines, changing from sizeof foo to sizeof(foo).
> > - Move a define out of vmmvar.h
 
> Light testing works except of an issue with read-only base images; the
> required fix is in the comments below.
> 
> Other than that, it is really cool to run many VMs from a single base
> image.  In my tests, I installed OpenBSD once and started a few VMs
> using the installed disk as a base.
> 
> More comments below.
> 
> Reyk
 
Thanks, another update based on Reyk's feeback and fixes.

diff --git regress/usr.sbin/vmd/diskfmt/Makefile 
regress/usr.sbin/vmd/diskfmt/Makefile
index c2a5f42d5f6..1f8673e0e26 100644
--- regress/usr.sbin/vmd/diskfmt/Makefile
+++ regress/usr.sbin/vmd/diskfmt/Makefile
@@ -11,7 +11,7 @@
 VMD_DIR=$(BSDSRCDIR)/usr.sbin/vmd/
 
 PROG=vioscribble
-SRCS=vioscribble.c $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c
+SRCS=vioscribble.c vioqcow2.c vioraw.c
 CFLAGS+=-I$(VMD_DIR) -pthread
 LDFLAGS+=-pthread
 
@@ -26,3 +26,6 @@ scribble-images:
 .PHONY: ${REGRESS_TARGETS} scribble-images
 
 .include 
+
+vioqcow2.c vioraw.c: $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c
+   cp $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c .
diff --git regress/usr.sbin/vmd/diskfmt/vioscribble.c 
regress/usr.sbin/vmd/diskfmt/vioscribble.c
index 14d720db652..1da8efedac7 100644
--- regress/usr.sbin/vmd/diskfmt/vioscribble.c
+++ regress/usr.sbin/vmd/diskfmt/vioscribble.c
@@ -122,16 +122,18 @@ main(int argc, char **argv)
verbose = !!getenv("VERBOSE");
qcfd = open("scribble.qc2", O_RDWR);
rawfd = open("scribble.raw", O_RDWR);
-   if (qcfd == -1 || virtio_init_qcow2(, , qcfd) == -1)
+   if (qcfd == -1)
err(1, "unable to open qcow");
-   if (rawfd == -1 || virtio_init_raw(, , rawfd) == -1)
+   if (virtio_init_qcow2(, , , 1) == -1)
+   err(1, "unable to init qcow");
+   if (rawfd == -1 || virtio_init_raw(, , , 1) == -1)
err(1, "unable to open raw");
 
srandom_deterministic(123);
 
/* scribble to both disks */
printf("scribbling...\n");
-   for (i = 0; i < 16; i++) {
+   for (i = 0; i < 1024*16; i++) {
off = (random() % DISKSZ);
len = random() % sizeof buf + 1;
fill(off, buf, sizeof buf);
diff --git usr.sbin/vmctl/main.c usr.sbin/vmctl/main.c
index 8748ecfdedc..a3ab4672370 100644
--- usr.sbin/vmctl/main.c
+++ usr.sbin/vmctl/main.c
@@ -67,7 +67,8 @@ intctl_receive(struct parse_result *, int, char 
*[]);
 
 struct ctl_command ctl_commands[] = {
{ "console",CMD_CONSOLE,ctl_console,"id" },
-   { "create", CMD_CREATE, ctl_create, "\"path\" -s size", 1 },
+   { "create", CMD_CREATE, ctl_create,
+   "\"path\" [-s size] [-b base]", 1 },
{ "load",   CMD_LOAD,   ctl_load,   "\"path\"" },
{ "log",CMD_LOG,ctl_log,"[verbose|brief]" },
{ "reload", CMD_RELOAD, ctl_reload, 

Re: Qcow2: External snapshots

2018-10-03 Thread Ori Bernstein
On Mon, 1 Oct 2018 11:24:01 -0700, Ori Bernstein  wrote:

> On Mon, 1 Oct 2018 12:55:12 +0200
> Reyk Floeter  wrote:
> 
> > Hi Ori,
> > 
> > On Sun, Sep 30, 2018 at 12:27:00PM -0700, Ori Bernstein wrote:
> > > I've added support to vmd for external snapshots. That is,
> > > snapshots that are derived from a base image. Data lookups
> > > start in the derived image, and if the derived image does not
> > > contain some data, the search proceeds ot the base image.
> > > Multiple derived images may exist off of a single base image.
> > > 
> > 
> > Nice work!  This will be quite useful, thanks.
> > 
> > I think I broke your diff as my last commit to derive the raw/qcow2
> > format introduced some conflicts.  I had posted it on hackers@ and
> > forgot that your aren't on the internal list yet - sorry for that.

Updated version. Changes from the last diff:

- Merge in syntax changes. 
- Don't over-read when getting the base images.
- Fix relative paths in base images.
- Allow multiple derived images to use a single base image, and allow a user
  with only read permisssions to base their images on top of it.
- Probe the base image size, use/validate it when craeting disk images.
- Fix style a bit (long lines, changing from sizeof foo to sizeof(foo).
- Move a define out of vmmvar.h
- And update the manpage with these changes.
- Improve error checking around creating/resolving base disk paths.


diff --git regress/usr.sbin/vmd/diskfmt/Makefile 
regress/usr.sbin/vmd/diskfmt/Makefile
index c2a5f42d5f6..1f8673e0e26 100644
--- regress/usr.sbin/vmd/diskfmt/Makefile
+++ regress/usr.sbin/vmd/diskfmt/Makefile
@@ -11,7 +11,7 @@
 VMD_DIR=$(BSDSRCDIR)/usr.sbin/vmd/
 
 PROG=vioscribble
-SRCS=vioscribble.c $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c
+SRCS=vioscribble.c vioqcow2.c vioraw.c
 CFLAGS+=-I$(VMD_DIR) -pthread
 LDFLAGS+=-pthread
 
@@ -26,3 +26,6 @@ scribble-images:
 .PHONY: ${REGRESS_TARGETS} scribble-images
 
 .include 
+
+vioqcow2.c vioraw.c: $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c
+   cp $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c .
diff --git regress/usr.sbin/vmd/diskfmt/vioscribble.c 
regress/usr.sbin/vmd/diskfmt/vioscribble.c
index 14d720db652..1da8efedac7 100644
--- regress/usr.sbin/vmd/diskfmt/vioscribble.c
+++ regress/usr.sbin/vmd/diskfmt/vioscribble.c
@@ -122,16 +122,18 @@ main(int argc, char **argv)
verbose = !!getenv("VERBOSE");
qcfd = open("scribble.qc2", O_RDWR);
rawfd = open("scribble.raw", O_RDWR);
-   if (qcfd == -1 || virtio_init_qcow2(, , qcfd) == -1)
+   if (qcfd == -1)
err(1, "unable to open qcow");
-   if (rawfd == -1 || virtio_init_raw(, , rawfd) == -1)
+   if (virtio_init_qcow2(, , , 1) == -1)
+   err(1, "unable to init qcow");
+   if (rawfd == -1 || virtio_init_raw(, , , 1) == -1)
err(1, "unable to open raw");
 
srandom_deterministic(123);
 
/* scribble to both disks */
printf("scribbling...\n");
-   for (i = 0; i < 16; i++) {
+   for (i = 0; i < 1024*16; i++) {
off = (random() % DISKSZ);
len = random() % sizeof buf + 1;
fill(off, buf, sizeof buf);
diff --git usr.sbin/vmctl/main.c usr.sbin/vmctl/main.c
index 8748ecfdedc..4637256452b 100644
--- usr.sbin/vmctl/main.c
+++ usr.sbin/vmctl/main.c
@@ -67,7 +67,8 @@ intctl_receive(struct parse_result *, int, char 
*[]);
 
 struct ctl_command ctl_commands[] = {
{ "console",CMD_CONSOLE,ctl_console,"id" },
-   { "create", CMD_CREATE, ctl_create, "\"path\" -s size", 1 },
+   { "create", CMD_CREATE, ctl_create, 
+   "\"path\" [-s size] [-b base]", 1 },
{ "load",   CMD_LOAD,   ctl_load,   "\"path\"" },
{ "log",CMD_LOG,ctl_log,"[verbose|brief]" },
{ "reload", CMD_RELOAD, ctl_reload, "" },
@@ -538,47 +539,54 @@ int
 ctl_create(struct parse_result *res, int argc, char *argv[])
 {
int  ch, ret, type;
-   const char  *paths[2], *disk, *format;
+   const char  *disk, *format, *base;
 
if (argc < 2)
ctl_usage(res->ctl);
 
+   base = NULL;
type = parse_disktype(argv[1], );
 
-   paths[0] = disk;
-   paths[1] = NULL;
-
-   if (unveil(paths[0], "rwc") == -1)
+   if (unveil(disk, "rwc") == -1)
err(1, "unveil");
 
-   if (pledge("stdio rpath wpath cpath", NULL) == -1)
-   err(1, "pledge");
argc--;
argv+

Re: Qcow2: External snapshots

2018-10-01 Thread Ori Bernstein
On Mon, 1 Oct 2018 12:55:12 +0200
Reyk Floeter  wrote:

> Hi Ori,
> 
> On Sun, Sep 30, 2018 at 12:27:00PM -0700, Ori Bernstein wrote:
> > I've added support to vmd for external snapshots. That is,
> > snapshots that are derived from a base image. Data lookups
> > start in the derived image, and if the derived image does not
> > contain some data, the search proceeds ot the base image.
> > Multiple derived images may exist off of a single base image.
> > 
> 
> Nice work!  This will be quite useful, thanks.
> 
> I think I broke your diff as my last commit to derive the raw/qcow2
> format introduced some conflicts.  I had posted it on hackers@ and
> forgot that your aren't on the internal list yet - sorry for that.
 
No worries -- I'll rebase it after work.

> but we should be able to derive it from the base as well (there's now
> base in raw images), so the following should work as well: 
> 
>   vmctl create derived.img -s 16G -b base.img

Will fold that in to this change. I also want the creation to
automatically figure out the size here.
 
> > The main implementation change is that we now probe base
> > images before sending the disk FDs to the VM, which means that
> > we can actually open the images.
> > 
> > The base image paths may be relative. If they are relative,
> > they are interpreted relative to the location of the derived
> > image, and not relative to the directory where vmd happens to
> > be running.
> > 
> 
> OK, that needs some care + review.
 
Indeed -- I hadn't gotten it quite right.

> Please be extremely careful with the design here.  Unlike qemu, a vmd
> VM is not able to create new files itself and it should never be able
> to do it.  So when we create snapshots, we need to find a way that the
> parent prepares the file, sends the fd, and asks the VM process to use
> it.

I was not planning on allowing snapshots of running images -- merely
starting instances where all changes were transient. My plan was to make
vmctl create a temporary disk image, and point vmd to that.

-- 
Ori Bernstein 



Re: Qcow2: External snapshots

2018-09-30 Thread Ori Bernstein
On Sun, 30 Sep 2018 12:27:00 -0700, Ori Bernstein  wrote:

> I've added support to vmd for external snapshots. That is,
> snapshots that are derived from a base image. Data lookups
> start in the derived image, and if the derived image does not
> contain some data, the search proceeds ot the base image.
> Multiple derived images may exist off of a single base image.

And, a few fixes (thanks to Anton Lindqvist for testing and pointing
out issues). Please apply on top of the previous diff.

diff --git usr.sbin/vmd/config.c usr.sbin/vmd/config.c
index 21aeb050371..ff9f79b87b2 100644
--- usr.sbin/vmd/config.c
+++ usr.sbin/vmd/config.c
@@ -314,26 +314,30 @@ config_setvm(struct privsep *ps, struct vmd_vm *vm, 
uint32_t peerid, uid_t uid)
goto fail;
}
 
-   /* 
-* Clear the read-write flag for base images. 
+   /*
+* Clear the read-write flag for base images.
 * All writes should go to the top image.
 */
flags = O_RDONLY|O_EXLOCK|O_NONBLOCK;
n = virtio_get_base(diskfds[i][j], base, sizeof base,
vmc->vmc_disktypes[i]);
-   if (n == -1)
-   log_warnx("vm \"%s\" unable to read"
-   "base for disk %s", vcp->vcp_name,
-   vcp->vcp_disks[i]);
if (n == 0)
break;
-   /* 
+   if (n == -1) {
+   log_warnx("vm \"%s\" unable to read "
+   "base for disk %s", vcp->vcp_name,
+   vcp->vcp_disks[i]);
+   goto fail;
+   }
+   /*
 * Relative paths should be interpreted relative
 * to the disk image, rather than relative to the
 * directory vmd happens to be running in, since
 * this is the only userful interpretation.
 */
-   if (base[0] != '/') {
+   if (base[0] == '/') {
+   realpath(base, path);
+   } else {
s = dirname(path);
snprintf(expanded, sizeof expanded,
"%s/%s", s, base);

-- 
Ori Bernstein



Qcow2: External snapshots

2018-09-30 Thread Ori Bernstein
 unsigned int i, j;
 
if (vm == NULL)
return;
@@ -1100,9 +1100,11 @@ vm_stop(struct vmd_vm *vm, int keeptty, const char 
*caller)
close(vm->vm_iev.ibuf.fd);
}
for (i = 0; i < VMM_MAX_DISKS_PER_VM; i++) {
-   if (vm->vm_disks[i] != -1) {
-   close(vm->vm_disks[i]);
-   vm->vm_disks[i] = -1;
+   for (j = 0; j < VMM_MAX_BASE_PER_DISK; j++) {
+   if (vm->vm_disks[i][j] != -1) {
+   close(vm->vm_disks[i][j]);
+   vm->vm_disks[i][j] = -1;
+   }
}
}
for (i = 0; i < VMM_MAX_NICS_PER_VM; i++) {
@@ -1159,7 +1161,7 @@ vm_register(struct privsep *ps, struct vmop_create_params 
*vmc,
struct vmop_owner   *vmo = NULL;
struct vmd_user *usr = NULL;
uint32_t rng;
-   unsigned int i;
+   unsigned int i, j;
struct vmd_switch   *sw;
char*s;
 
@@ -1250,7 +1252,8 @@ vm_register(struct privsep *ps, struct vmop_create_params 
*vmc,
vm->vm_user = usr;
 
for (i = 0; i < VMM_MAX_DISKS_PER_VM; i++)
-   vm->vm_disks[i] = -1;
+   for (j = 0; j < VMM_MAX_BASE_PER_DISK; j++)
+   vm->vm_disks[i][j] = -1;
for (i = 0; i < VMM_MAX_NICS_PER_VM; i++)
vm->vm_ifs[i].vif_fd = -1;
for (i = 0; i < vcp->vcp_nnics; i++) {
diff --git usr.sbin/vmd/vmd.h usr.sbin/vmd/vmd.h
index 4cf0295d946..b803d44340c 100644
--- usr.sbin/vmd/vmd.h
+++ usr.sbin/vmd/vmd.h
@@ -166,6 +166,7 @@ struct vmop_create_params {
 #define VMIFF_OPTMASK  (VMIFF_LOCKED|VMIFF_LOCAL|VMIFF_RDOMAIN)
 
unsigned int vmc_disktypes[VMM_MAX_DISKS_PER_VM];
+   unsigned int vmc_diskbases[VMM_MAX_DISKS_PER_VM];
 #define VMDF_RAW   0x01
 #define VMDF_QCOW2 0x02
 
@@ -238,7 +239,7 @@ struct vmd_vm {
uint32_t vm_vmid;
int  vm_kernel;
int  vm_cdrom;
-   int  vm_disks[VMM_MAX_DISKS_PER_VM];
+   int  
vm_disks[VMM_MAX_DISKS_PER_VM][VMM_MAX_BASE_PER_DISK];
struct vmd_ifvm_ifs[VMM_MAX_NICS_PER_VM];
char*vm_ttyname;
int  vm_tty;
@@ -412,4 +413,7 @@ int  parse_config(const char *);
 int cmdline_symset(char *);
 int host(const char *, struct address *);
 
+/* virtio.c */
+int virtio_get_base(int, char *, size_t, int);
+
 #endif /* VMD_H */
diff --git usr.sbin/vmd/vmm.c usr.sbin/vmd/vmm.c
index 7757856323f..3176fd85713 100644
--- usr.sbin/vmd/vmm.c
+++ usr.sbin/vmd/vmm.c
@@ -608,7 +608,7 @@ vmm_start_vm(struct imsg *imsg, uint32_t *id, pid_t *pid)
struct vmd_vm   *vm;
int  ret = EINVAL;
int  fds[2];
-   size_t   i;
+   size_t   i, j;
 
if ((vm = vm_getbyvmid(imsg->hdr.peerid)) == NULL) {
log_warnx("%s: can't find vm", __func__);
@@ -643,8 +643,11 @@ vmm_start_vm(struct imsg *imsg, uint32_t *id, pid_t *pid)
close(fds[1]);
 
for (i = 0 ; i < vcp->vcp_ndisks; i++) {
-   close(vm->vm_disks[i]);
-   vm->vm_disks[i] = -1;
+   for (j = 0; j < VMM_MAX_BASE_PER_DISK; j++) {
+   if (vm->vm_disks[i][j] != -1)
+   close(vm->vm_disks[i][j]);
+   vm->vm_disks[i][j] = -1;
+   }
}
for (i = 0 ; i < vcp->vcp_nnics; i++) {
close(vm->vm_ifs[i].vif_fd);

-- 
Ori Bernstein



Re: vmd: add some NULL checks after {c,m}alloc()

2018-09-14 Thread Ori Bernstein
On Fri, 14 Sep 2018 13:50:40 +0800, Michael Mikonos  wrote:

> On Thu, Sep 13, 2018 at 10:08:16PM -0700, Ori Bernstein wrote:
> > On Thu, 13 Sep 2018 10:30:54 +0800, Michael Mikonos  wrote:
> > 
> I think a check for !disk->base belongs here as done above for disk->l1.
> 

I think that co

---
 usr.sbin/vmd/vioqcow2.c | 61 +++--
 usr.sbin/vmd/vioraw.c   |  2 ++
 usr.sbin/vmd/virtio.c   | 11 
 usr.sbin/vmd/virtio.h   |  1 +
 usr.sbin/vmd/vm.c   |  3 ++
 5 files changed, 57 insertions(+), 21 deletions(-)

diff --git usr.sbin/vmd/vioqcow2.c usr.sbin/vmd/vioqcow2.c
index 7c7e4857695..b3e4e1ec058 100644
--- usr.sbin/vmd/vioqcow2.c
+++ usr.sbin/vmd/vioqcow2.c
@@ -77,7 +77,6 @@ struct qcdisk {
 
int   fd;
uint64_t *l1;
-   char *scratch;
off_t end;
uint32_t  clustersz;
off_t disksz; /* In bytes */
@@ -161,17 +160,19 @@ qc2_open(struct qcdisk *disk, int fd)
size_t i;
int version;
 
+   pthread_rwlock_init(>lock, NULL);
+   disk->fd = fd;
+   disk->base = NULL;
+   disk->l1 = NULL;
+
if (pread(fd, , sizeof header, 0) != sizeof header) {
log_warn("%s: short read on header", __func__);
-   return -1;
+   goto error;
}
if (strncmp(header.magic, "QFI\xfb", 4) != 0) {
log_warn("%s: invalid magic numbers", __func__);
-   return -1;
+   goto error;
}
-   pthread_rwlock_init(>lock, NULL);
-   disk->fd = fd;
-   disk->base = NULL;
 
disk->clustersz = (1ull << be32toh(header.clustershift));
disk->disksz= be64toh(header.disksz);
@@ -182,6 +183,7 @@ qc2_open(struct qcdisk *disk, int fd)
disk->refoff= be64toh(header.refoff);
disk->nsnap = be32toh(header.snapcount);
disk->snapoff   = be64toh(header.snapsz);
+
/*
 * The additional features here are defined as 0 in the v2 format,
 * so as long as we clear the buffer before parsing, we don't need
@@ -196,23 +198,25 @@ qc2_open(struct qcdisk *disk, int fd)
 * We only know about the dirty or corrupt bits here.
 */
if (disk->incompatfeatures & ~(QCOW2_DIRTY|QCOW2_CORRUPT)) {
-   log_warn("%s: unsupported features %llx", __func__,
+   log_warnx("%s: unsupported features %llx", __func__,
disk->incompatfeatures & ~(QCOW2_DIRTY|QCOW2_CORRUPT));
-   return -1;
+   goto error;
}
 
disk->l1 = calloc(disk->l1sz, sizeof *disk->l1);
-   if (pread(disk->fd, (char*)disk->l1, 8*disk->l1sz, disk->l1off)
+   if (!disk->l1)
+   goto error;
+   if (pread(disk->fd, disk->l1, 8*disk->l1sz, disk->l1off)
!= 8*disk->l1sz) {
-   free(disk->l1);
-   return -1;
+   log_warn("%s: unable to read qcow2 L1 table", __func__);
+   goto error;
}
for (i = 0; i < disk->l1sz; i++)
disk->l1[i] = be64toh(disk->l1[i]);
version = be32toh(header.version);
if (version != 2 && version != 3) {
log_warn("%s: unknown qcow2 version %d", __func__, version);
-   return -1;
+   goto error;
}
 
backingoff = be64toh(header.backingoff);
@@ -223,34 +227,47 @@ qc2_open(struct qcdisk *disk, int fd)
 * otherwise we just crash with a pledge violation.
 */
log_warn("%s: unsupported external snapshot images", __func__);
-   return -1;
+   goto error;
 
if (backingsz >= sizeof basepath - 1) {
log_warn("%s: snapshot path too long", __func__);
-   return -1;
+   goto error;
}
if (pread(fd, basepath, backingsz, backingoff) != backingsz) {
log_warn("%s: could not read snapshot base name",
__func__);
-   return -1;
+   goto error;
}
basepath[backingsz] = 0;
 
disk->base = calloc(1, sizeof(struct qcdisk));
+   if (!disk->base)
+   goto error;
if (qc2_openpath(disk->base, basepath, O_RDONLY) == -1) {
-   free(disk->base);
-   return -1;
+   log_warn("%s: could not open %s", basepath, __func__);
+   goto error;
}
i

Re: vmd: add some NULL checks after {c,m}alloc()

2018-09-13 Thread Ori Bernstein
log_warn("%s: all disks must share clustersize",
__func__);
-   free(disk->base);
-   return -1;
+   goto error;
}
}
-   fstat(fd, );
+   if (fstat(fd, ) == -1) {
+   log_warn("%s: unable to stat disk", __func__);
+   goto error;
+   }
+
disk->end = st.st_size;
+
+   log_debug("opened qcow2 disk version %d:", version);
+   log_debug("size:\t%lld", disk->disksz);
+   log_debug("end:\t%lld", disk->end);
+   log_debug("nsnap:\t%d", disk->nsnap);
return 0;
+error:
+   qc2_close(disk);
+   return -1;
 }
 
 static ssize_t
@@ -362,8 +377,10 @@ qc2_close(void *p)
struct qcdisk *disk;
 
disk = p;
-   pwrite(disk->fd, disk->l1, disk->l1sz, disk->l1off);
+   if (disk->base)
+   qc2_close(disk->base);
close(disk->fd);
+   free(disk->l1);
free(disk);
 }
 
diff --git usr.sbin/vmd/vioraw.c usr.sbin/vmd/vioraw.c
index 4b87c649407..9c76b1356f9 100644
--- usr.sbin/vmd/vioraw.c
+++ usr.sbin/vmd/vioraw.c
@@ -62,6 +62,8 @@ virtio_init_raw(struct virtio_backing *file, off_t *szp, int 
fd)
return -1;
 
fdp = malloc(sizeof(int));
+   if (!fdp)
+   return -1;
*fdp = fd;
file->p = fdp;
file->pread = raw_pread;
diff --git usr.sbin/vmd/virtio.c usr.sbin/vmd/virtio.c
index c774cc7813a..20a94e6fa4d 100644
--- usr.sbin/vmd/virtio.c
+++ usr.sbin/vmd/virtio.c
@@ -2009,6 +2009,17 @@ virtio_init(struct vmd_vm *vm, int child_cdrom, int 
*child_disks,
evtimer_set(, vmmci_timeout, NULL);
 }
 
+void
+virtio_shutdown(struct vmd_vm *vm)
+{
+   int i;
+
+   /* ensure that our disks are synced */
+   vioscsi->file.close(vioscsi->file.p);
+   for (i = 0; i < nr_vioblk; i++)
+   vioblk[i].file.close(vioblk[i].file.p);
+}
+
 int
 vmmci_restore(int fd, uint32_t vm_id)
 {
diff --git usr.sbin/vmd/virtio.h usr.sbin/vmd/virtio.h
index 86ee6d21f9f..574f77d5fe8 100644
--- usr.sbin/vmd/virtio.h
+++ usr.sbin/vmd/virtio.h
@@ -258,6 +258,7 @@ struct ioinfo {
 
 /* virtio.c */
 void virtio_init(struct vmd_vm *, int, int *, int *);
+void virtio_shutdown(struct vmd_vm *);
 int virtio_dump(int);
 int virtio_restore(int, struct vmd_vm *, int, int *, int *);
 uint32_t vring_size(uint32_t);
diff --git usr.sbin/vmd/vm.c usr.sbin/vmd/vm.c
index 046b2be8503..491ca2eae3f 100644
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -371,6 +371,9 @@ start_vm(struct vmd_vm *vm, int fd)
/* Execute the vcpu run loop(s) for this VM */
ret = run_vm(vm->vm_cdrom, vm->vm_disks, nicfds, >vm_params, );
 
+   /* Ensure that any in-flight data is written back */
+   virtio_shutdown(vm);
+
return (ret);
 }
 
-- 
2.18.0


-- 
Ori Bernstein



Re: vmd: add some NULL checks after {c,m}alloc()

2018-09-12 Thread Ori Bernstein
On Wed, 12 Sep 2018 15:36:32 +0800
Michael Mikonos  wrote:

> On Wed, Sep 12, 2018 at 12:13:35AM -0700, Ori Bernstein wrote:
> > On Tue, 11 Sep 2018 23:29:53 -0700, Ori Bernstein  
> > wrote:
> > 
> > >  static ssize_t
> > > @@ -362,8 +377,9 @@ qc2_close(void *p)
> > >   struct qcdisk *disk;
> > >  
> > >   disk = p;
> > > - pwrite(disk->fd, disk->l1, disk->l1sz, disk->l1off);
> > > - close(disk->fd);
> > > + if (disk->base)
> > > + qc2_close(disk->base);
> > > + free(disk->l1);
> > >   free(disk);
> > >  }
> > >  
> > 
> > Er, of course, the close should still exist here.
> 
> I was curious about qc2_close() calling itself for disk->base.

We open it recursively for multi-disk snapshots, where the disk
specifies the base to write deltas off of. We should close it
the same way.

> Also, is it worth setting disk->fd = -1 after close(), and
> adding a check to prevent closing it again?

The data is in a malloc'ed structure we're freeing on the next
line. I'm not sure that there's much point in setting the fd 
to -1 and checking it: if we try closing the same fd again, we've
got bigger problems.

-- 
Ori Bernstein 



Re: vmd: add some NULL checks after {c,m}alloc()

2018-09-12 Thread Ori Bernstein
On Tue, 11 Sep 2018 23:29:53 -0700, Ori Bernstein  wrote:

>  static ssize_t
> @@ -362,8 +377,9 @@ qc2_close(void *p)
>   struct qcdisk *disk;
>  
>   disk = p;
> - pwrite(disk->fd, disk->l1, disk->l1sz, disk->l1off);
> - close(disk->fd);
> + if (disk->base)
> + qc2_close(disk->base);
> + free(disk->l1);
>   free(disk);
>  }
>  

Er, of course, the close should still exist here.

-- 
Ori Bernstein



Re: vmd: add some NULL checks after {c,m}alloc()

2018-09-12 Thread Ori Bernstein
On Wed, 12 Sep 2018 12:04:21 +0800, Michael Mikonos  wrote:

> On Tue, Sep 11, 2018 at 11:25:52AM -0700, Ori Bernstein wrote:
> > On Tue, 11 Sep 2018 15:36:49 +0800
> > Michael Mikonos  wrote:
> > 
> > > Hello,
> > > 
> > > Sometimes vmd doesn't seem to check the result of malloc/calloc.
> > > I tried to preserve the existing behavour w.r.t. return values
> > > for the functions modified; some functions returned 1 on error
> > > while others return -1. Does this look correct?
> > > 
> > > - Michael
> > > 
> > > 
> > 
> > > Index: vioqcow2.c
> > > ===
> > > RCS file: /cvs/src/usr.sbin/vmd/vioqcow2.c,v
> > > retrieving revision 1.2
> > > diff -u -p -u -r1.2 vioqcow2.c
> > > --- vioqcow2.c11 Sep 2018 04:06:32 -  1.2
> > > +++ vioqcow2.c11 Sep 2018 07:29:10 -
> > > @@ -202,6 +202,9 @@ qc2_open(struct qcdisk *disk, int fd)
> > >   }
> > >  
> > >   disk->l1 = calloc(disk->l1sz, sizeof *disk->l1);
> > > + if (disk->l1 == NULL)
> > > + return -1;
> > > +
> > >   if (pread(disk->fd, (char*)disk->l1, 8*disk->l1sz, disk->l1off)
> > >   != 8*disk->l1sz) {
> > >   free(disk->l1);
> > > @@ -237,6 +240,8 @@ qc2_open(struct qcdisk *disk, int fd)
> > >   basepath[backingsz] = 0;
> > >  
> > >   disk->base = calloc(1, sizeof(struct qcdisk));
> > > + if (disk->base == NULL)
> > > + return -1;
> > 
> > This early return leaks disk->l1. The other vioqcow2/vioraw changes
> > look fine to me.
> 
> Thanks for the feedback.
> The other eary returns which currently free disk->base don't free
> disk->l1. I can change those error cases, and this calloc() one, to free
> disk->l1 if that's what is intended.
> 

This is a bit more involved, but I think better:

I moved the init of the various fields in disk up before any
reads are done, and initialized l1, so that qc2_close will
work at any point (at worst, it's a no-op).

I added a virtio_shutdown function that acutally calls the
close function -- it shouldn't matter now, since we basically
sync on every operation, but I think it'll be important once
we start trying to improve speed.

And then, I made all the cleanup use it.

Thanks for finding the issues.

diff --git usr.sbin/vmd/vioqcow2.c usr.sbin/vmd/vioqcow2.c
index 7c7e4857695..ba871006b70 100644
--- usr.sbin/vmd/vioqcow2.c
+++ usr.sbin/vmd/vioqcow2.c
@@ -77,7 +77,6 @@ struct qcdisk {
 
int   fd;
uint64_t *l1;
-   char *scratch;
off_t end;
uint32_t  clustersz;
off_t disksz; /* In bytes */
@@ -161,17 +160,19 @@ qc2_open(struct qcdisk *disk, int fd)
size_t i;
int version;
 
+   pthread_rwlock_init(>lock, NULL);
+   disk->fd = fd;
+   disk->base = NULL;
+   disk->l1 = NULL;
+
if (pread(fd, , sizeof header, 0) != sizeof header) {
log_warn("%s: short read on header", __func__);
-   return -1;
+   goto error;
}
if (strncmp(header.magic, "QFI\xfb", 4) != 0) {
log_warn("%s: invalid magic numbers", __func__);
-   return -1;
+   goto error;
}
-   pthread_rwlock_init(>lock, NULL);
-   disk->fd = fd;
-   disk->base = NULL;
 
disk->clustersz = (1ull << be32toh(header.clustershift));
disk->disksz= be64toh(header.disksz);
@@ -182,6 +183,7 @@ qc2_open(struct qcdisk *disk, int fd)
disk->refoff= be64toh(header.refoff);
disk->nsnap = be32toh(header.snapcount);
disk->snapoff   = be64toh(header.snapsz);
+
/*
 * The additional features here are defined as 0 in the v2 format,
 * so as long as we clear the buffer before parsing, we don't need
@@ -196,23 +198,25 @@ qc2_open(struct qcdisk *disk, int fd)
 * We only know about the dirty or corrupt bits here.
 */
if (disk->incompatfeatures & ~(QCOW2_DIRTY|QCOW2_CORRUPT)) {
-   log_warn("%s: unsupported features %llx", __func__,
+   log_warnx("%s: unsupported features %llx", __func__,
disk->incompatfeatures & ~(QCOW2_DIRTY|QCOW2_CORRUPT));
-   return -1;
+   goto error;
}
 
disk->l1 = calloc(disk->l1sz, sizeof *disk->l1);
-   if (pread(disk->fd, (char*)disk->l1, 8*disk->l1sz, disk->l1off)
+   if (!dis

Re: vmd: add some NULL checks after {c,m}alloc()

2018-09-11 Thread Ori Bernstein
On Tue, 11 Sep 2018 15:36:49 +0800
Michael Mikonos  wrote:

> Hello,
> 
> Sometimes vmd doesn't seem to check the result of malloc/calloc.
> I tried to preserve the existing behavour w.r.t. return values
> for the functions modified; some functions returned 1 on error
> while others return -1. Does this look correct?
> 
> - Michael
> 
> 

> Index: vioqcow2.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/vioqcow2.c,v
> retrieving revision 1.2
> diff -u -p -u -r1.2 vioqcow2.c
> --- vioqcow2.c11 Sep 2018 04:06:32 -  1.2
> +++ vioqcow2.c11 Sep 2018 07:29:10 -
> @@ -202,6 +202,9 @@ qc2_open(struct qcdisk *disk, int fd)
>   }
>  
>   disk->l1 = calloc(disk->l1sz, sizeof *disk->l1);
> + if (disk->l1 == NULL)
> + return -1;
> +
>   if (pread(disk->fd, (char*)disk->l1, 8*disk->l1sz, disk->l1off)
>   != 8*disk->l1sz) {
>   free(disk->l1);
> @@ -237,6 +240,8 @@ qc2_open(struct qcdisk *disk, int fd)
>   basepath[backingsz] = 0;
>  
>   disk->base = calloc(1, sizeof(struct qcdisk));
> + if (disk->base == NULL)
> + return -1;

This early return leaks disk->l1. The other vioqcow2/vioraw changes
look fine to me.

>   if (qc2_openpath(disk->base, basepath, O_RDONLY) == -1) {
>   free(disk->base);
>   return -1;
> Index: vioraw.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/vioraw.c,v
> retrieving revision 1.1
> diff -u -p -u -r1.1 vioraw.c
> --- vioraw.c  25 Aug 2018 04:16:09 -  1.1
> +++ vioraw.c  11 Sep 2018 07:29:10 -
> @@ -62,6 +62,8 @@ virtio_init_raw(struct virtio_backing *f
>   return -1;
>  
>   fdp = malloc(sizeof(int));
> + if (fdp == NULL)
> + return -1;
>   *fdp = fd;
>   file->p = fdp;
>   file->pread = raw_pread;
> 


-- 
Ori Bernstein 



Re: vmd fail fast on unknown disk format

2018-09-10 Thread Ori Bernstein
On Mon, 10 Sep 2018 20:00:16 -0700, Carlos Cardenas  
wrote:

> Howdy.
> 
> Attached patch allows vmd to fail fast on unknown disk format along with
> some minor clean up on logging.
> 
> Comments? Ok?
> 
> +--+
> Carlos

One nit:

refs = htobe16(refs);
if (pwrite(disk->fd, , sizeof refs, l2cluster + 2*l2idx) != 2) {
-   log_warn("could not write ref block");
+   log_warn("%s: could not write ref block", __func__);
}
return 0;
 }

Should probably return -1 in the error case. I think I changed that in
my (WIP) cluster reuse change, but if you're touching the code it might
as well go in here.


-- 
Ori Bernstein



Re: [PATCH] qcow2 disk format

2018-08-31 Thread Ori Bernstein
d.conf
new file mode 100644
index 000..24ef3d8c771
--- /dev/null
+++ regress/usr.sbin/vmd/config/vmd-pass-format-keyword.conf
@@ -0,0 +1,8 @@
+#  $OpenBSD: vmd-pass-cdrom-keyword.conf,v 1.1 2018/01/07 22:59:57 
ccardenas Exp $
+# Pass on cdrom keyword
+
+vm "x" {
+memory 1G
+disk "foo.img" format "raw"
+disable
+}
diff --git regress/usr.sbin/vmd/config/vmd-pass-format-keyword.ok 
regress/usr.sbin/vmd/config/vmd-pass-format-keyword.ok
new file mode 100644
index 000..403d828b763
--- /dev/null
+++ regress/usr.sbin/vmd/config/vmd-pass-format-keyword.ok
@@ -0,0 +1 @@
+configuration OK
diff --git regress/usr.sbin/vmd/diskfmt/Makefile 
regress/usr.sbin/vmd/diskfmt/Makefile
new file mode 100644
index 000..71bb2b8ce52
--- /dev/null
+++ regress/usr.sbin/vmd/diskfmt/Makefile
@@ -0,0 +1,28 @@
+#  $OpenBSD: Makefile,v 1.5 2018/07/20 22:18:49 bluhm Exp $
+
+# This regression test creates a raw disk image and a
+# qcow disk image, and scribbles the same data to both
+# of them. It verifies that they both have the same
+# result.
+#
+# In order for this test to work, qemu must be installed
+# in order to create the disk images.
+
+VMD_DIR=$(BSDSRCDIR)/usr.sbin/vmd/
+
+PROG=vioscribble
+SRCS=vioscribble.c $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c
+CFLAGS+=-I$(VMD_DIR) -pthread
+LDFLAGS+=-pthread
+
+run-regress-vioscribble: scribble-images
+
+scribble-images:
+   rm -f scribble.raw scribble.qc2
+   vmctl create scribble.raw -s 4G
+   qemu-img create -f qcow2 scribble.qc2 4G
+
+
+.PHONY: ${REGRESS_TARGETS} scribble-images
+
+.include 
diff --git regress/usr.sbin/vmd/diskfmt/vioscribble.c 
regress/usr.sbin/vmd/diskfmt/vioscribble.c
new file mode 100644
index 000..3821c3b277b
--- /dev/null
+++ regress/usr.sbin/vmd/diskfmt/vioscribble.c
@@ -0,0 +1,165 @@
+/* $OpenBSD: $ */
+
+/*
+ * Copyright (c) 2018 Ori Bernstein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+/* 
+ * Quick hack of a program to try to test vioqcow2.c against
+ * vioraw.c.
+ *
+ * Compile with:
+ *
+ * cc -pthread -o scribble vioscribble.c vioqcow2.c vioraw.c
+ */
+#include  /* PAGE_SIZE */
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pci.h"
+#include "vmd.h"
+#include "vmm.h"
+#include "virtio.h"
+
+#define CLUSTERSZ 65536
+
+int verbose;
+struct virtio_backing qcowfile;
+struct virtio_backing rawfile;
+
+/* We expect the scribble disks to be 4g in size */
+#define DISKSZ (4ull*1024ull*1024ull*1024ull)
+
+/* functions that io code depends on */
+
+void
+log_debug(const char *emsg, ...)
+{
+   if (verbose) {
+   va_list ap;
+
+   va_start(ap, emsg);
+   vfprintf(stdout, emsg, ap);
+   fprintf(stdout, "\n");
+   va_end(ap);
+   }
+}
+
+void
+log_warnx(const char *emsg, ...)
+{
+   va_list ap;
+
+   va_start(ap, emsg);
+   vfprintf(stdout, emsg, ap);
+   fprintf(stdout, "\n");
+   va_end(ap);
+}
+
+void
+log_warn(const char *emsg, ...)
+{
+   va_list ap;
+
+   va_start(ap, emsg);
+   vfprintf(stdout, emsg, ap);
+   fprintf(stdout, "\n");
+   va_end(ap);
+}
+
+static void
+fill(size_t off, char *buf, size_t len)
+{
+   size_t i;
+
+   /* use the top bits of off, since we can guess at where we went wrong. 
*/
+   for (i = 0; i < len; i++)
+   buf[i] = (off >> 8);
+}
+
+int
+main(int argc, char **argv)
+{
+   int qcfd, rawfd, i;
+   char buf[64*1024], cmp[64*1024];
+   off_t len, off, qcsz, rawsz;
+
+   verbose = !!getenv("VERBOSE");
+   qcfd = open("scribble.qc2", O_RDWR);
+   rawfd = open("scribble.raw", O_RDWR);
+   if (qcfd == -1 || virtio_init_qcow2(, , qcfd) == -1)
+   err(1, "unable to open qcow");
+   if (rawfd == -1 || virtio_init_raw(, , rawfd) == -1)
+   err(1, "unable to open raw");
+
+   srandom_deterministic(123);
+
+

Re: [PATCH] qcow2 disk format

2018-08-31 Thread Ori Bernstein
On Wed, 15 Aug 2018 23:02:16 -0700, Ori Bernstein  wrote:

> Updated style from feedback from off-list. Also added checks and
> erroring for incompatible extensions.

One more update. This revision adds explicit definitions of the
disk format, and adds regress tests for parsing it. The syntax
for it is:

disk "foo" format "qcow2"
disk "bar" format "raw"

If the format is left unspecified, you get a raw image. This
change is fully backwards compatible, so your configs don't
need to change.

Alternatively, you can specify the disk format on the command
line:

vmctl start ... -d raw:mydisk -d qcow2:moredisk \
-d harddiskraw.img

Again, the default format is raw, and qcow2 needs to be specified
explicitly.

Updated patch below:

---
 regress/usr.sbin/vmd/config/Makefile   |   6 +-
 .../usr.sbin/vmd/config/vmd-fail-bad-format.conf   |   8 +
 regress/usr.sbin/vmd/config/vmd-fail-bad-format.ok |   1 +
 .../vmd/config/vmd-fail-missing-format.conf|   8 +
 .../usr.sbin/vmd/config/vmd-fail-missing-format.ok |   1 +
 .../vmd/config/vmd-pass-format-keyword.conf|   8 +
 .../usr.sbin/vmd/config/vmd-pass-format-keyword.ok |   1 +
 regress/usr.sbin/vmd/diskfmt/Makefile  |  28 +
 regress/usr.sbin/vmd/diskfmt/vioscribble.c | 165 ++
 usr.sbin/vmctl/main.c  |  42 +-
 usr.sbin/vmctl/vmctl.8 |  18 +-
 usr.sbin/vmctl/vmctl.c |   8 +-
 usr.sbin/vmctl/vmctl.h |   6 +-
 usr.sbin/vmd/Makefile  |   2 +-
 usr.sbin/vmd/parse.y   |  33 +-
 usr.sbin/vmd/vioqcow2.c| 574 +
 usr.sbin/vmd/virtio.c  |  26 +-
 usr.sbin/vmd/virtio.h  |   3 +-
 usr.sbin/vmd/vm.conf.5 |   9 +-
 usr.sbin/vmd/vmd.h |   5 +
 20 files changed, 917 insertions(+), 35 deletions(-)
 create mode 100644 regress/usr.sbin/vmd/config/vmd-fail-bad-format.conf
 create mode 100644 regress/usr.sbin/vmd/config/vmd-fail-bad-format.ok
 create mode 100644 regress/usr.sbin/vmd/config/vmd-fail-missing-format.conf
 create mode 100644 regress/usr.sbin/vmd/config/vmd-fail-missing-format.ok
 create mode 100644 regress/usr.sbin/vmd/config/vmd-pass-format-keyword.conf
 create mode 100644 regress/usr.sbin/vmd/config/vmd-pass-format-keyword.ok
 create mode 100644 regress/usr.sbin/vmd/diskfmt/Makefile
 create mode 100644 regress/usr.sbin/vmd/diskfmt/vioscribble.c
 create mode 100644 usr.sbin/vmd/vioqcow2.c

diff --git regress/usr.sbin/vmd/config/Makefile 
regress/usr.sbin/vmd/config/Makefile
index 2b41e49ac83..a98012da09c 100644
--- regress/usr.sbin/vmd/config/Makefile
+++ regress/usr.sbin/vmd/config/Makefile
@@ -2,10 +2,12 @@
 
 VMD ?= /usr/sbin/vmd
 
-VMD_PASS=boot-keyword memory-round memory-just-enough cdrom-keyword
+VMD_PASS=boot-keyword memory-round memory-just-enough cdrom-keyword \
+format-keyword
 VMD_FAIL=kernel-keyword too-few-ram vm-name-too-long too-many-ifs \
 boot-name-too-long disk-path-too-long too-many-disks \
-switch-no-interface switch-no-add cdrom-name-too-long
+switch-no-interface switch-no-add cdrom-name-too-long \
+bad-format missing-format
 
 REGRESS_TARGETS=
 
diff --git regress/usr.sbin/vmd/config/vmd-fail-bad-format.conf 
regress/usr.sbin/vmd/config/vmd-fail-bad-format.conf
new file mode 100644
index 000..bd92b765587
--- /dev/null
+++ regress/usr.sbin/vmd/config/vmd-fail-bad-format.conf
@@ -0,0 +1,8 @@
+#  $OpenBSD: vmd-pass-cdrom-keyword.conf,v 1.1 2018/01/07 22:59:57 
ccardenas Exp $
+# Pass on cdrom keyword
+
+vm "x" {
+memory 1G
+disk "foo.img" format "rotten"
+disable
+}
diff --git regress/usr.sbin/vmd/config/vmd-fail-bad-format.ok 
regress/usr.sbin/vmd/config/vmd-fail-bad-format.ok
new file mode 100644
index 000..1f79afafa1e
--- /dev/null
+++ regress/usr.sbin/vmd/config/vmd-fail-bad-format.ok
@@ -0,0 +1 @@
+6: unrecognized disk format rotten
diff --git regress/usr.sbin/vmd/config/vmd-fail-missing-format.conf 
regress/usr.sbin/vmd/config/vmd-fail-missing-format.conf
new file mode 100644
index 000..b4363fc5440
--- /dev/null
+++ regress/usr.sbin/vmd/config/vmd-fail-missing-format.conf
@@ -0,0 +1,8 @@
+#  $OpenBSD: vmd-pass-cdrom-keyword.conf,v 1.1 2018/01/07 22:59:57 
ccardenas Exp $
+# Pass on cdrom keyword
+
+vm "x" {
+memory 1G
+disk "foo.img" format
+disable
+}
diff --git regress/usr.sbin/vmd/config/vmd-fail-missing-format.ok 
regress/usr.sbin/vmd/config/vmd-fail-missing-format.ok
new file mode 100644
index 000..c88cb26bf39
--- /dev/null
+++ regress/usr.sbin/vmd/config/vmd-fail-missing-format.ok
@@ -0,0 +1 @@
+6: syntax error
diff --git regre

Re: [PATCH] Pluggable disk formats for vmd (qcow2 preparation)

2018-08-24 Thread Ori Bernstein
On Tue, 21 Aug 2018 23:03:36 -0700, Carlos Cardenas  
wrote:

> On Sun, Aug 19, 2018 at 11:46:12PM -0700, Ori Bernstein wrote:
> > One more minor update to this patch:
> > 
> > - Remove unused enum
> > - Remove unused function prototype
> > - Move some qcow2-specific headers into the qcow2 patch.
> 
> Ori, it's nice seeing good progress on this.  I have a couple of
> questions inline below.
> 
> Why are we making sz represent the number of 512 byte sectors?  This may
> be true for hd disks but it needs to be *bytes* for ISO/cdrom.

Oops, was focused on hard disks, and misread what CDs did. Moved this division
out to the hard disk initialization. Fixed in both this and the qcow2, but
will see if you have any more comments on the qcow2 patch before I spam the
list with yet another minor revision.

> This has to be the number of bytes shifted to get the number of 2K
> blocks.  As the patch is, this yields the incorrect number of blocks an
> ISO has. This breaks ISO/cdrom support.
 
Solved, see above -- we can now boot from and mount CD correctly.
 
> [snip]
> > +struct virtio_backing {
> > +   void  *p;
> > +   char *path;
> > +   ssize_t  (*pread)(void *p, char *buf, size_t len, off_t off);
> > +   ssize_t  (*pwrite)(void *p, char *buf, size_t len, off_t off);
> > +   void (*close)(void *p);
> > +};
> 
> What is path used for?  I don't see it used anywhere in this patch or in
> the qcow2 patch.

The intent was to use it for better error messages, especially since with
qcow2 snapshots you can smear the disk image across multiple files. I never
hooked it into any errors, and it's probably not worth the trouble.

Updated patch inline below:

---
 usr.sbin/vmd/Makefile  |  2 +-
 usr.sbin/vmd/vioraw.c  | 73 ++
 usr.sbin/vmd/vioscsi.c |  7 +++--
 usr.sbin/vmd/virtio.c  | 56 +-
 usr.sbin/vmd/virtio.h  | 16 ---
 5 files changed, 124 insertions(+), 30 deletions(-)
 create mode 100644 usr.sbin/vmd/vioraw.c

diff --git usr.sbin/vmd/Makefile usr.sbin/vmd/Makefile
index 7ea090f8886..24c1d1b1d4a 100644
--- usr.sbin/vmd/Makefile
+++ usr.sbin/vmd/Makefile
@@ -6,7 +6,7 @@ PROG=   vmd
 SRCS=  vmd.c control.c log.c priv.c proc.c config.c vmm.c
 SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c
 SRCS+= ns8250.c i8253.c vmboot.c ufs.c disklabel.c dhcp.c packet.c
-SRCS+= parse.y atomicio.c vioscsi.c
+SRCS+= parse.y atomicio.c vioscsi.c vioraw.c
 
 CFLAGS+=   -Wall -I${.CURDIR}
 CFLAGS+=   -Wstrict-prototypes -Wmissing-prototypes
diff --git usr.sbin/vmd/vioraw.c usr.sbin/vmd/vioraw.c
new file mode 100644
index 000..0dcb8b3c9c2
--- /dev/null
+++ usr.sbin/vmd/vioraw.c
@@ -0,0 +1,73 @@
+/* $OpenBSD: $ */
+/*
+ * Copyright (c) 2018 Ori Bernstein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+
+#include "vmd.h"
+#include "vmm.h"
+#include "virtio.h"
+
+static ssize_t
+raw_pread(void *file, char *buf, size_t len, off_t off)
+{
+   return pread(*(int *)file, buf, len, off);
+}
+
+static ssize_t
+raw_pwrite(void *file, char *buf, size_t len, off_t off)
+{
+   return pwrite(*(int *)file, buf, len, off);
+}
+
+static void
+raw_close(void *file)
+{
+   close(*(int *)file);
+   free(file);
+}
+
+/*
+ * Initializes a raw disk image backing file from an fd.
+ * Stores the number of 512 byte sectors in *szp,
+ * returning -1 for error, 0 for success.
+ */
+int
+virtio_init_raw(struct virtio_backing *file, off_t *szp, int fd)
+{
+   off_t sz;
+   int *fdp;
+
+   sz = lseek(fd, 0, SEEK_END);
+   if (sz == -1)
+   return -1;
+
+   fdp = malloc(sizeof(int));
+   *fdp = fd;
+   file->p = fdp;
+   file->pread = raw_pread;
+   file->pwrite = raw_pwrite;
+   file->close = raw_close;
+   *szp = sz;
+   return 0;
+}
+
diff --git usr.sbin/vmd/vioscsi.c usr.sbin/vmd/vioscsi.c
index 93867887598..af504f0550d 100644
--- usr.sbin/vmd/vioscsi.c
+++ usr.sbin/vmd/vioscsi.c
@@ -197,7 +19

Re: [patch] switchd(8) on sparc64: panic: trap type 0x34

2018-08-20 Thread Ori Bernstein
On Mon, 20 Aug 2018 23:38:28 +0900 (JST), YASUOKA Masahiko 
 wrote:

> Hi,

Heyho.

> > panic: trap type 0x34 (mem address not aligned): pc=15864ec
> > npc=15864f0 pstate=99820006
> (snip)
> > o...@eigenstate.org and I put together the following diff. I haven't been 
> > able
> > to check with the original reporter, and I don't have a machine to test it 
> > on,
> > so comments and/or tests would be appreciated!
> 
> Do you have a test case of this?

You're responding to a request for a test case. The initial reporter
disappeared after providing a stack trace, so we couldn't confirm if this
makes a difference at all.

Right now, this is an untested stab in the dark. It would be unsurprising if
the fix is wrong.

> As far as I know, switch(4) assumes it receives packets located at 64
> bit aligned memory.  So if the code like below
> 
>   *(uint64_t *)oxm->oxm_value = htobe64(val);
> 
> cause alignment faults, the assumption may be broken.  If so, the
> place we should fix may not be here.

The reason we are suspicious here is because oxm_value comes from struct
ofp_ox_match, which is defined like this:

struct ofp_ox_match {
uint16_toxm_class;
uint8_t oxm_fh;
uint8_t oxm_length;
uint8_t oxm_value[0];
} __packed;

While the packet may be aligned to 8 bytes, my reading is that the oxm_value
itself is not at the start of the packet, and is not aligned within the
ofp_ox_match. I'm not familiar enough with the open flowz to know if
ofp_ox_match is guaranteed to start at a 4-byte alignment (which would
push oxm_value out to an 8 byte alignment), but I suspect that's not the
case.

-- 
Ori Bernstein



Re: [PATCH] qcow2 disk format

2018-08-20 Thread Ori Bernstein
On Sun, 12 Aug 2018 22:51:24 -0700, Ori Bernstein  wrote:

> 

This patch adds preliminary support for creating qcow2 disk images. It gives
the 'vmctl create' command  an option '-f', which can be given arguments of
either 'raw' (defualt) or 'qcow2', and it creates a disk of the appropriate
format.

The qcow2 header is duplicated inline, since it seemed better than the other
options (putting it into a header and messing around with include/source paths 
so
that we could share it with vmd, or creating a libqcow2)

That decision may get revisited when I find time to implement snapshotting,
since creating a snapshot will involve relatively deep modifications to the
disk (adjusting refcounts, rewriting L1/L2 tables, etc).

---
 usr.sbin/vmctl/main.c  |  18 +--
 usr.sbin/vmctl/vmctl.8 |   7 ++-
 usr.sbin/vmctl/vmctl.c | 124 -
 usr.sbin/vmctl/vmctl.h |   3 +-
 4 files changed, 143 insertions(+), 9 deletions(-)

diff --git usr.sbin/vmctl/main.c usr.sbin/vmctl/main.c
index b7674d0c980..f39ccbdbc1f 100644
--- usr.sbin/vmctl/main.c
+++ usr.sbin/vmctl/main.c
@@ -63,7 +63,8 @@ intctl_receive(struct parse_result *, int, char 
*[]);
 
 struct ctl_command ctl_commands[] = {
{ "console",CMD_CONSOLE,ctl_console,"id" },
-   { "create", CMD_CREATE, ctl_create, "\"path\" -s size", 1 },
+   { "create", CMD_CREATE, ctl_create, 
+   "\"path\" -s size [-f fmt]", 1 },
{ "load",   CMD_LOAD,   ctl_load,   "\"path\"" },
{ "log",CMD_LOG,ctl_log,"(verbose|brief)" },
{ "reload", CMD_RELOAD, ctl_reload, "" },
@@ -470,24 +471,28 @@ int
 ctl_create(struct parse_result *res, int argc, char *argv[])
 {
int  ch, ret;
-   const char  *paths[2];
+   const char  *paths[2], *format;
 
if (argc < 2)
ctl_usage(res->ctl);
 
paths[0] = argv[1];
paths[1] = NULL;
+   format = "raw";
if (pledge("stdio rpath wpath cpath", NULL) == -1)
err(1, "pledge");
argc--;
argv++;
 
-   while ((ch = getopt(argc, argv, "s:")) != -1) {
+   while ((ch = getopt(argc, argv, "s:f:")) != -1) {
switch (ch) {
case 's':
if (parse_size(res, optarg, 0) != 0)
errx(1, "invalid size: %s", optarg);
break;
+   case 'f':
+   format = optarg;
+   break;
default:
ctl_usage(res->ctl);
/* NOTREACHED */
@@ -498,7 +503,12 @@ ctl_create(struct parse_result *res, int argc, char 
*argv[])
fprintf(stderr, "missing size argument\n");
ctl_usage(res->ctl);
}
-   ret = create_imagefile(paths[0], res->size);
+   if (strcmp(format, "raw") == 0)
+   ret = create_raw_imagefile(paths[0], res->size);
+   else if (strcmp(format, "qcow2") == 0)
+   ret = create_qc2_imagefile(paths[0], res->size);
+   else
+   errx(1, "unknown image format %s", format);
if (ret != 0) {
errno = ret;
err(1, "create imagefile operation failed");
diff --git usr.sbin/vmctl/vmctl.8 usr.sbin/vmctl/vmctl.8
index 81ecbeb6c1d..358928c7ff6 100644
--- usr.sbin/vmctl/vmctl.8
+++ usr.sbin/vmctl/vmctl.8
@@ -50,12 +50,15 @@ Using
 .Xr cu 1
 connect to the console of the VM with the specified
 .Ar id .
-.It Cm create Ar path Fl s Ar size
+.It Cm create Ar path Fl s Ar size Op Fl f Ar format
 Creates a VM disk image file with the specified
 .Ar path
 and
 .Ar size ,
-rounded to megabytes.
+rounded to megabytes. The disk
+.Ar format
+may be specified as either 'raw' or 'qcow2', defaulting to 'raw'
+if left unspecified.
 .It Cm load Ar filename
 Load additional configuration from the specified file.
 .It Cm log brief
diff --git usr.sbin/vmctl/vmctl.c usr.sbin/vmctl/vmctl.c
index 7ab9a9bc60c..b3a4e6d7b14 100644
--- usr.sbin/vmctl/vmctl.c
+++ usr.sbin/vmctl/vmctl.c
@@ -735,7 +735,7 @@ vm_console(struct vmop_info_result *list, size_t ct)
 }
 
 /*
- * create_imagefile
+ * create_raw_imagefile
  *
  * Create an empty imagefile with the specified path and size.
  *
@@ -749,7 +749,7 @@ vm_console(struct vmop_info_result *list, size_t ct)
  *  E : Various other E errno codes due to other I/O errors
  */
 int
-create_imagefile(const char *imgfile_path, long imgsize)
+create_raw_imagefile(const char *imgfile_path, long imgsize)
 {
int fd, ret;
 
@@ -770,3 +770,123 @@ create_i

Re: [PATCH] qcow2 disk format

2018-08-20 Thread Ori Bernstein
On Sun, 12 Aug 2018 22:51:24 -0700, Ori Bernstein  wrote:

> On Sun, 5 Aug 2018 00:52:32 -0700, Ori Bernstein  wrote:
> 
> And, now something that actually appears to work. You can create a
> disk on OpenBSD using qemu:
> 
>   qemu-img create foo.qc2 16G
> 
> add it to your vm.conf:
> 
>   disk "/path/to/foo.qc2"
> 
> boot and install OpenBSD on it as normal, and if you decide you don't like
> hardware virtualization, you can point qemu at it and run using that:
> 
>   qemu-system-x86_64 -m 1024 -hda foo.qc2 
> 
> Snapshots haven't been tested yet, and tools need to be added, incompatible
> extensions are silently ignored, and there could stand to be a bit more sanity
> checking.
> 
> vioscribble.c should also probably be extracted into a regress test, rather
> than just something that sits beside the I/O code.
> 
> Patch below:

One more update, with some significant differences:

  - External snapshots will work if you comment out the chroot and add rpath
to the pledges. This is a bad idea, so external snapshots will return a
clean error until I can figure out a good way to plumb the fds, shuffle
around the pledges, or do something else to make it possible to open the
backing files from the vm process.
  
  - Internal snapshots seem to work, but you will need qemu to manage them.
  
qemu-img snapshot -c snapname disk.qc2 # create
qemu-img snapshot -a snapname disk.qc2 # revert

These have only been tested lightly, so I wouldn't trust them fully yet.

  - vioscribble has been turned into a regress test, and grew license
information.
  
  - A somewhat embarrassing bug, where I malloced the wrong type, was fixed.

---
 regress/usr.sbin/vmd/diskfmt/Makefile  |  28 ++
 regress/usr.sbin/vmd/diskfmt/vioscribble.c | 165 +
 usr.sbin/vmd/Makefile  |   2 +-
 usr.sbin/vmd/vioqcow2.c| 574 +
 usr.sbin/vmd/virtio.c  |  10 +-
 usr.sbin/vmd/virtio.h  |   1 +
 6 files changed, 776 insertions(+), 4 deletions(-)
 create mode 100644 regress/usr.sbin/vmd/diskfmt/Makefile
 create mode 100644 regress/usr.sbin/vmd/diskfmt/vioscribble.c
 create mode 100644 usr.sbin/vmd/vioqcow2.c

diff --git regress/usr.sbin/vmd/diskfmt/Makefile 
regress/usr.sbin/vmd/diskfmt/Makefile
new file mode 100644
index 000..71bb2b8ce52
--- /dev/null
+++ regress/usr.sbin/vmd/diskfmt/Makefile
@@ -0,0 +1,28 @@
+#  $OpenBSD: Makefile,v 1.5 2018/07/20 22:18:49 bluhm Exp $
+
+# This regression test creates a raw disk image and a
+# qcow disk image, and scribbles the same data to both
+# of them. It verifies that they both have the same
+# result.
+#
+# In order for this test to work, qemu must be installed
+# in order to create the disk images.
+
+VMD_DIR=$(BSDSRCDIR)/usr.sbin/vmd/
+
+PROG=vioscribble
+SRCS=vioscribble.c $(VMD_DIR)/vioqcow2.c $(VMD_DIR)/vioraw.c
+CFLAGS+=-I$(VMD_DIR) -pthread
+LDFLAGS+=-pthread
+
+run-regress-vioscribble: scribble-images
+
+scribble-images:
+   rm -f scribble.raw scribble.qc2
+   vmctl create scribble.raw -s 4G
+   qemu-img create -f qcow2 scribble.qc2 4G
+
+
+.PHONY: ${REGRESS_TARGETS} scribble-images
+
+.include 
diff --git regress/usr.sbin/vmd/diskfmt/vioscribble.c 
regress/usr.sbin/vmd/diskfmt/vioscribble.c
new file mode 100644
index 000..3821c3b277b
--- /dev/null
+++ regress/usr.sbin/vmd/diskfmt/vioscribble.c
@@ -0,0 +1,165 @@
+/* $OpenBSD: $ */
+
+/*
+ * Copyright (c) 2018 Ori Bernstein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+/* 
+ * Quick hack of a program to try to test vioqcow2.c against
+ * vioraw.c.
+ *
+ * Compile with:
+ *
+ * cc -pthread -o scribble vioscribble.c vioqcow2.c vioraw.c
+ */
+#include  /* PAGE_SIZE */
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pci.h"
+#include "vmd.h"
+#include "vmm.h"
+#include "virtio.h"
+
+#define CLUSTERSZ 65536
+
+int verbo

Re: [PATCH] Pluggable disk formats for vmd (qcow2 preparation)

2018-08-20 Thread Ori Bernstein
One more minor update to this patch:

- Remove unused enum
- Remove unused function prototype
- Move some qcow2-specific headers into the qcow2 patch.

---
 usr.sbin/vmd/Makefile  |  2 +-
 usr.sbin/vmd/vioraw.c  | 73 ++
 usr.sbin/vmd/vioscsi.c |  7 +++--
 usr.sbin/vmd/virtio.c  | 55 -
 usr.sbin/vmd/virtio.h  | 17 +---
 5 files changed, 124 insertions(+), 30 deletions(-)
 create mode 100644 usr.sbin/vmd/vioraw.c

diff --git usr.sbin/vmd/Makefile usr.sbin/vmd/Makefile
index 7ea090f8886..24c1d1b1d4a 100644
--- usr.sbin/vmd/Makefile
+++ usr.sbin/vmd/Makefile
@@ -6,7 +6,7 @@ PROG=   vmd
 SRCS=  vmd.c control.c log.c priv.c proc.c config.c vmm.c
 SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c
 SRCS+= ns8250.c i8253.c vmboot.c ufs.c disklabel.c dhcp.c packet.c
-SRCS+= parse.y atomicio.c vioscsi.c
+SRCS+= parse.y atomicio.c vioscsi.c vioraw.c
 
 CFLAGS+=   -Wall -I${.CURDIR}
 CFLAGS+=   -Wstrict-prototypes -Wmissing-prototypes
diff --git usr.sbin/vmd/vioraw.c usr.sbin/vmd/vioraw.c
new file mode 100644
index 000..d377546de67
--- /dev/null
+++ usr.sbin/vmd/vioraw.c
@@ -0,0 +1,73 @@
+/* $OpenBSD: $ */
+/*
+ * Copyright (c) 2018 Ori Bernstein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+
+#include "vmd.h"
+#include "vmm.h"
+#include "virtio.h"
+
+static ssize_t
+raw_pread(void *file, char *buf, size_t len, off_t off)
+{
+   return pread(*(int *)file, buf, len, off);
+}
+
+static ssize_t
+raw_pwrite(void *file, char *buf, size_t len, off_t off)
+{
+   return pwrite(*(int *)file, buf, len, off);
+}
+
+static void
+raw_close(void *file)
+{
+   close(*(int *)file);
+   free(file);
+}
+
+/*
+ * Initializes a raw disk image backing file from an fd.
+ * Stores the number of 512 byte sectors in *szp,
+ * returning -1 for error, 0 for success.
+ */
+int
+virtio_init_raw(struct virtio_backing *file, off_t *szp, int fd)
+{
+   off_t sz;
+   int *fdp;
+
+   sz = lseek(fd, 0, SEEK_END);
+   if (sz == -1)
+   return -1;
+
+   fdp = malloc(sizeof(int));
+   *fdp = fd;
+   file->p = fdp;
+   file->pread = raw_pread;
+   file->pwrite = raw_pwrite;
+   file->close = raw_close;
+   *szp = sz / 512;
+   return 0;
+}
+
diff --git usr.sbin/vmd/vioscsi.c usr.sbin/vmd/vioscsi.c
index 93867887598..af504f0550d 100644
--- usr.sbin/vmd/vioscsi.c
+++ usr.sbin/vmd/vioscsi.c
@@ -197,7 +197,7 @@ vioscsi_start_read(struct vioscsi_dev *dev, off_t block, 
ssize_t n_blocks)
goto nomem;
info->len = n_blocks * VIOSCSI_BLOCK_SIZE_CDROM;
info->offset = block * VIOSCSI_BLOCK_SIZE_CDROM;
-   info->fd = dev->fd;
+   info->file = >file;
 
return info;
 
@@ -210,7 +210,10 @@ nomem:
 static const uint8_t *
 vioscsi_finish_read(struct ioinfo *info)
 {
-   if (pread(info->fd, info->buf, info->len, info->offset) != info->len) {
+   struct virtio_backing *f;
+
+   f = info->file;
+   if (f->pread(f->p, info->buf, info->len, info->offset) != info->len) {
info->error = errno;
log_warn("vioscsi read error");
return NULL;
diff --git usr.sbin/vmd/virtio.c usr.sbin/vmd/virtio.c
index 4622ef4943f..099f1df6a7e 100644
--- usr.sbin/vmd/virtio.c
+++ usr.sbin/vmd/virtio.c
@@ -361,7 +361,7 @@ vioblk_start_read(struct vioblk_dev *dev, off_t sector, 
ssize_t sz)
goto nomem;
info->len = sz;
info->offset = sector * VIRTIO_BLK_SECTOR_SIZE;
-   info->fd = dev->fd;
+   info->file = >file;
 
return info;
 
@@ -375,7 +375,10 @@ nomem:
 static const uint8_t *
 vioblk_finish_read(struct ioinfo *info)
 {
-   if (pread(info->fd, info->buf, info->len, info->offset) != info->len) {
+   struct virtio_backing *file;
+
+   file = info->file;
+   if (file->pread(file->p, info->buf, info->len, info->offset) != 
info->len) {

Re: [PATCH] qcow2 disk format

2018-08-16 Thread Ori Bernstein
Updated style from feedback from off-list. Also added checks and
erroring for incompatible extensions.

---
 usr.sbin/vmd/Makefile  |   2 +-
 usr.sbin/vmd/vioqcow2.c| 546 +
 usr.sbin/vmd/vioqcow2.h|   6 +
 usr.sbin/vmd/vioscribble.c | 143 ++
 usr.sbin/vmd/virtio.c  |  10 +-
 5 files changed, 703 insertions(+), 4 deletions(-)
 create mode 100644 usr.sbin/vmd/vioqcow2.c
 create mode 100644 usr.sbin/vmd/vioqcow2.h
 create mode 100644 usr.sbin/vmd/vioscribble.c

diff --git usr.sbin/vmd/Makefile usr.sbin/vmd/Makefile
index 24c1d1b1d4a..b6db6c782d6 100644
--- usr.sbin/vmd/Makefile
+++ usr.sbin/vmd/Makefile
@@ -6,7 +6,7 @@ PROG=   vmd
 SRCS=  vmd.c control.c log.c priv.c proc.c config.c vmm.c
 SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c
 SRCS+= ns8250.c i8253.c vmboot.c ufs.c disklabel.c dhcp.c packet.c
-SRCS+= parse.y atomicio.c vioscsi.c vioraw.c
+SRCS+= parse.y atomicio.c vioscsi.c vioraw.c vioqcow2.c
 
 CFLAGS+=   -Wall -I${.CURDIR}
 CFLAGS+=   -Wstrict-prototypes -Wmissing-prototypes
diff --git usr.sbin/vmd/vioqcow2.c usr.sbin/vmd/vioqcow2.c
new file mode 100644
index 000..f951fcbd99a
--- /dev/null
+++ usr.sbin/vmd/vioqcow2.c
@@ -0,0 +1,546 @@
+/* $OpenBSD: $ */
+
+/*
+ * Copyright (c) 2018 Ori Bernstein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmd.h"
+#include "vmm.h"
+#include "virtio.h"
+
+#define QCOW2_COMPRESSED   0x4000ull
+#define QCOW2_INPLACE  0x8000ull
+
+#define QCOW2_DIRTY(1 << 0)
+#define QCOW2_CORRUPT  (1 << 1)
+
+enum {
+   ICFEATURE_DIRTY = 1 << 0,
+   ICFEATURE_CORRUPT   = 1 << 1,
+};
+
+enum {
+   ACFEATURE_BITEXT= 1 << 0,
+};
+
+struct qcheader {
+   char magic[4];
+   uint32_t version;
+   uint64_t backingoff;
+   uint32_t backingsz;
+   uint32_t clustershift;
+   uint64_t disksz;
+   uint32_t cryptmethod;
+   uint32_t l1sz;
+   uint64_t l1off;
+   uint64_t refoff;
+   uint32_t refsz;
+   uint32_t snapcount;
+   uint64_t snapsz;
+   /* v3 additions */
+   uint64_t incompatfeatures;
+   uint64_t autoclearfeatures;
+   uint32_t refbits;
+   uint32_t headersz;
+} __packed;
+
+struct qcdisk {
+   pthread_rwlock_t lock;
+   struct qcdisk *base;
+   struct qcheader header;
+
+   int   fd;
+   uint64_t *l1;
+   char *scratch;
+   off_t end;
+   uint32_t  clustersz;
+   off_t disksz; /* in bytes */
+   uint32_t cryptmethod;
+
+   uint32_t l1sz;
+   off_tl1off;
+
+   off_trefoff;
+   uint32_t refsz;
+
+   uint32_t nsnap;
+   off_tsnapoff;
+
+   /* v3 features */
+   uint64_t incompatfeatures;
+   uint64_t autoclearfeatures;
+   uint32_t refssz;
+   uint32_t headersz;
+};
+
+extern char *__progname;
+
+static int move_cluster(struct qcdisk *, off_t, off_t);
+static off_t xlate(struct qcdisk *, off_t, int *);
+static off_t mkcluster(struct qcdisk *, off_t, off_t);
+static int inc_refs(struct qcdisk *, off_t, int);
+static int qc2_openpath(struct qcdisk *, char *, int);
+static int qc2_open(struct qcdisk *, int);
+static ssize_t qc2_pread(void *, char *, size_t, off_t);
+static ssize_t qc2_pwrite(void *, char *, size_t, off_t);
+static void qc2_close(void *);
+
+int
+virtio_init_qcow2(struct virtio_backing *file, off_t *szp, int fd)
+{
+   struct qcdisk *diskp;
+
+   diskp = malloc(sizeof(int*));
+   if (diskp == NULL)
+   return -1;
+   if (qc2_open(diskp, fd) == -1) {
+   free(diskp);
+   return -1;
+   }
+   file->p = diskp;
+   file->pread = qc2_pread;
+   file->pwrite = qc2_pwrite;
+   file->close = qc2_close;
+   *szp = diskp->disksz / 512;
+   return 0;
+}
+
+static int
+qc2_openpath(struct qcdisk *disk, char *path, int flags)
+{
+   int fd;
+
+   fd = open(path, flags);
+   if (f

Re: [PATCH] Pluggable disk formats for vmd (qcow2 preparation)

2018-08-15 Thread Ori Bernstein
Updated version below. This change fixes a few things:

- Removes a 30 second sleep that I'd forgotten to remove
  while debugging.
- Removes a dead seek() call.

---
 usr.sbin/vmd/Makefile  |  2 +-
 usr.sbin/vmd/vioraw.c  | 68 ++
 usr.sbin/vmd/vioscsi.c |  7 +++--
 usr.sbin/vmd/virtio.c  | 55 +++---
 usr.sbin/vmd/virtio.h  | 23 --
 5 files changed, 125 insertions(+), 30 deletions(-)
 create mode 100644 usr.sbin/vmd/vioraw.c

diff --git usr.sbin/vmd/Makefile usr.sbin/vmd/Makefile
index 7ea090f8886..24c1d1b1d4a 100644
--- usr.sbin/vmd/Makefile
+++ usr.sbin/vmd/Makefile
@@ -6,7 +6,7 @@ PROG=   vmd
 SRCS=  vmd.c control.c log.c priv.c proc.c config.c vmm.c
 SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c
 SRCS+= ns8250.c i8253.c vmboot.c ufs.c disklabel.c dhcp.c packet.c
-SRCS+= parse.y atomicio.c vioscsi.c
+SRCS+= parse.y atomicio.c vioscsi.c vioraw.c
 
 CFLAGS+=   -Wall -I${.CURDIR}
 CFLAGS+=   -Wstrict-prototypes -Wmissing-prototypes
diff --git usr.sbin/vmd/vioraw.c usr.sbin/vmd/vioraw.c
new file mode 100644
index 000..409974003f8
--- /dev/null
+++ usr.sbin/vmd/vioraw.c
@@ -0,0 +1,68 @@
+/* $OpenBSD: $ */
+/*
+ * Copyright (c) 2018 Ori Bernstein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+
+#include "vmd.h"
+#include "vmm.h"
+#include "virtio.h"
+
+static ssize_t
+raw_pread(void *file, char *buf, size_t len, off_t off)
+{
+   return pread(*(int *)file, buf, len, off);
+}
+
+static ssize_t
+raw_pwrite(void *file, char *buf, size_t len, off_t off)
+{
+   return pwrite(*(int *)file, buf, len, off);
+}
+
+static void
+raw_close(void *file)
+{
+   close(*(int *)file);
+   free(file);
+}
+
+int
+virtio_init_raw(struct virtio_backing *file, off_t *szp, int fd)
+{
+   off_t sz;
+   int *fdp;
+
+   sz = lseek(fd, 0, SEEK_END);
+   if (sz == -1)
+   return -1;
+
+   fdp = malloc(sizeof(int*));
+   *fdp = fd;
+   file->p = fdp;
+   file->pread = raw_pread;
+   file->pwrite = raw_pwrite;
+   file->close = raw_close;
+   *szp = sz / 512;
+   return 0;
+}
+
diff --git usr.sbin/vmd/vioscsi.c usr.sbin/vmd/vioscsi.c
index 93867887598..af504f0550d 100644
--- usr.sbin/vmd/vioscsi.c
+++ usr.sbin/vmd/vioscsi.c
@@ -197,7 +197,7 @@ vioscsi_start_read(struct vioscsi_dev *dev, off_t block, 
ssize_t n_blocks)
goto nomem;
info->len = n_blocks * VIOSCSI_BLOCK_SIZE_CDROM;
info->offset = block * VIOSCSI_BLOCK_SIZE_CDROM;
-   info->fd = dev->fd;
+   info->file = >file;
 
return info;
 
@@ -210,7 +210,10 @@ nomem:
 static const uint8_t *
 vioscsi_finish_read(struct ioinfo *info)
 {
-   if (pread(info->fd, info->buf, info->len, info->offset) != info->len) {
+   struct virtio_backing *f;
+
+   f = info->file;
+   if (f->pread(f->p, info->buf, info->len, info->offset) != info->len) {
info->error = errno;
log_warn("vioscsi read error");
return NULL;
diff --git usr.sbin/vmd/virtio.c usr.sbin/vmd/virtio.c
index 4622ef4943f..4aa3d081041 100644
--- usr.sbin/vmd/virtio.c
+++ usr.sbin/vmd/virtio.c
@@ -361,7 +361,7 @@ vioblk_start_read(struct vioblk_dev *dev, off_t sector, 
ssize_t sz)
goto nomem;
info->len = sz;
info->offset = sector * VIRTIO_BLK_SECTOR_SIZE;
-   info->fd = dev->fd;
+   info->file = >file;
 
return info;
 
@@ -375,7 +375,10 @@ nomem:
 static const uint8_t *
 vioblk_finish_read(struct ioinfo *info)
 {
-   if (pread(info->fd, info->buf, info->len, info->offset) != info->len) {
+   struct virtio_backing *file;
+
+   file = info->file;
+   if (file->pread(file->p, info->buf, info->len, info->offset) != 
info->len) {
info->error = errno;
log_warn("vioblk read error");
return NULL;
@@ -398,7 +401,7 @@ vioblk_start_write(s

Re: [PATCH] qcow2 disk format

2018-08-12 Thread Ori Bernstein
On Sun, 5 Aug 2018 00:52:32 -0700, Ori Bernstein  wrote:

> This change introduces a 'struct virtio_backing' which makes the
> disk i/o pluggable, providing 'backing->{pread,pwrite}' calls that
> can be replaced by different disk i/o drivers.
> 
> This is necessary preparation for adding qcow2 support, which will
> come as a follow up patch. I'll be posting a preview of it in a follow
> up email.

And, now something that actually appears to work. You can create a
disk on OpenBSD using qemu:

qemu-img create foo.qc2 16G

add it to your vm.conf:

disk "/path/to/foo.qc2"

boot and install OpenBSD on it as normal, and if you decide you don't like
hardware virtualization, you can point qemu at it and run using that:

qemu-system-x86_64 -m 1024 -hda foo.qc2 

Snapshots haven't been tested yet, and tools need to be added, incompatible
extensions are silently ignored, and there could stand to be a bit more sanity
checking.

vioscribble.c should also probably be extracted into a regress test, rather
than just something that sits beside the I/O code.

Patch below:

---
 usr.sbin/vmd/Makefile  |   2 +-
 usr.sbin/vmd/vioqcow2.c| 624 +
 usr.sbin/vmd/vioqcow2.h|   6 +
 usr.sbin/vmd/vioraw.c  |  17 +
 usr.sbin/vmd/vioscribble.c | 143 +
 usr.sbin/vmd/virtio.c  |  10 +-
 6 files changed, 798 insertions(+), 4 deletions(-)
 create mode 100644 usr.sbin/vmd/vioqcow2.c
 create mode 100644 usr.sbin/vmd/vioqcow2.h
 create mode 100644 usr.sbin/vmd/vioscribble.c

diff --git usr.sbin/vmd/Makefile usr.sbin/vmd/Makefile
index 24c1d1b1d4a..b6db6c782d6 100644
--- usr.sbin/vmd/Makefile
+++ usr.sbin/vmd/Makefile
@@ -6,7 +6,7 @@ PROG=   vmd
 SRCS=  vmd.c control.c log.c priv.c proc.c config.c vmm.c
 SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c
 SRCS+= ns8250.c i8253.c vmboot.c ufs.c disklabel.c dhcp.c packet.c
-SRCS+= parse.y atomicio.c vioscsi.c vioraw.c
+SRCS+= parse.y atomicio.c vioscsi.c vioraw.c vioqcow2.c
 
 CFLAGS+=   -Wall -I${.CURDIR}
 CFLAGS+=   -Wstrict-prototypes -Wmissing-prototypes
diff --git usr.sbin/vmd/vioqcow2.c usr.sbin/vmd/vioqcow2.c
new file mode 100644
index 000..fbc2e245495
--- /dev/null
+++ usr.sbin/vmd/vioqcow2.c
@@ -0,0 +1,624 @@
+/* $OpenBSD: $     */
+
+/*
+ * Copyright (c) 2018 Ori Bernstein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include  /* PAGE_SIZE */
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pci.h"
+#include "vmd.h"
+#include "vmm.h"
+#include "virtio.h"
+
+#define QCOW2_COMPRESSED   0x4000ull
+#define QCOW2_INPLACE  0x8000ull
+
+enum {
+   ICFEATURE_DIRTY = 1 << 0,
+   ICFEATURE_CORRUPT   = 1 << 1,
+};
+
+enum {
+   ACFEATURE_BITEXT= 1 << 0,
+};
+
+struct qcdisk {
+   pthread_rwlock_t lock;
+   struct qcdisk *base;
+   int fd;
+
+   char *l1;
+   char *scratch;
+   off_t end;
+
+   uint32_t version;
+
+   uint64_t backingoff;
+   uint32_t backingsz;
+
+   uint32_t clustershift;
+   uint32_t clustersz;
+   off_tsz; /* in bytes */
+   uint32_t cryptmethod;
+
+   uint32_t l1size;
+   off_tl1off;
+
+   off_trefoff;
+   uint32_t refsize;
+
+   uint32_t nsnap;
+   off_tsnapoff;
+
+   /* v3 features */
+   uint64_t incompatfeatures;
+   uint64_t autoclearfeatures;
+   uint32_t refssz;
+   uint32_t headersz;
+};
+
+extern char *__progname;
+
+static int move_cluster(struct qcdisk *, off_t, off_t);
+static off_t xlate(struct qcdisk *, off_t, int *);
+static off_t mkcluster(struct qcdisk *, off_t, off_t);
+static int inc_refs(struct qcdisk *, off_t, int);
+static uint32_t getbe32(char **, char *);
+static uint64_t getbe64(char **, char *);
+static uint16_t unpackbe16(char *);
+static uint32_t unpackbe32(char *);
+static uint64_t unpackbe64(char *);

Re: [PATCH] Pluggable disk formats for vmd (qcow2 preparation)

2018-08-05 Thread Ori Bernstein
And, as promised, the preview of the approach to qcow2.

Current state:

- I can install and boot OpenBSD on qcow2 on OpenBSD.  However, I can't take
  that disk image and give it to qemu, it boots but then dies a few seconds
  after it starts to do file system access:

$ qemu-system-x86_64 -m 2048 -hda obsd.qc2  \
-netdev user,id=mynet0,hostfwd=tcp:127.0.0.1:7922-:22 \
-device e1000,netdev=mynet0 -smp 2

  Gives the following error:

qcow2: Marking image as corrupt: Preventing invalid write on metadata
(overlaps with active L2 table); further corruption events will be 
suppressed

- Snapshots have not been implemented, although the code theoretically won't
  clobber on-disk snapshots made with qemu. This has not been tested.

- There's no tool to create or manage disk images. This still needs to be
  implemented.

- There's no way to specify that an image is a qcow2 image: vmd probes it
  based on the header, which has a very slim theoretical possibility of
  misidentifying a raw disk.

- And so on.

---

diff --git usr.sbin/vmd/Makefile usr.sbin/vmd/Makefile
index 24c1d1b1d4a..b6db6c782d6 100644
--- usr.sbin/vmd/Makefile
+++ usr.sbin/vmd/Makefile
@@ -6,7 +6,7 @@ PROG=   vmd
 SRCS=  vmd.c control.c log.c priv.c proc.c config.c vmm.c
 SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c
 SRCS+= ns8250.c i8253.c vmboot.c ufs.c disklabel.c dhcp.c packet.c
-SRCS+= parse.y atomicio.c vioscsi.c vioraw.c
+SRCS+= parse.y atomicio.c vioscsi.c vioraw.c vioqcow2.c
 
 CFLAGS+=   -Wall -I${.CURDIR}
 CFLAGS+=   -Wstrict-prototypes -Wmissing-prototypes
diff --git usr.sbin/vmd/vioqcow2.c usr.sbin/vmd/vioqcow2.c
new file mode 100644
index 000..553506d467f
--- /dev/null
+++ usr.sbin/vmd/vioqcow2.c
@@ -0,0 +1,612 @@
+/* $OpenBSD: $ */
+
+/*
+ * Copyright (c) 2018 Ori Bernstein 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include  /* PAGE_SIZE */
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pci.h"
+#include "vmd.h"
+#include "vmm.h"
+#include "virtio.h"
+
+#define QCOW2_COMPRESSED   0x4000ull
+#define QCOW2_INPLACE  0x8000ull
+
+enum {
+   ICFEATURE_DIRTY = 1 << 0,
+   ICFEATURE_CORRUPT   = 1 << 1,
+};
+
+enum {
+   ACFEATURE_BITEXT= 1 << 0,
+};
+
+struct qcdisk {
+   pthread_rwlock_t lock;
+   struct qcdisk *base;
+   int fd;
+
+   char *l1;
+   char *scratch;
+   off_t end;
+
+   uint32_t version;
+
+   uint64_t backingoff;
+   uint32_t backingsz;
+
+   uint32_t clustershift;
+   uint32_t clustersz;
+   off_tsz; /* in bytes */
+   uint32_t cryptmethod;
+
+   uint32_t l1size;
+   uint64_t l1off;
+
+   uint64_t refoff;
+   uint32_t refsize;
+
+   uint32_t nsnap;
+   uint64_t snapoff;
+
+   /* v3 features */
+   uint64_t incompatfeatures;
+   uint64_t autoclearfeatures;
+   uint32_t refcountsz;
+   uint32_t headersz;
+};
+
+extern char *__progname;
+
+static int move_cluster(struct qcdisk *, off_t, off_t);
+static off_t xlate(struct qcdisk *, off_t, int *);
+static off_t mkcluster(struct qcdisk *, off_t, off_t);
+static int inc_refcount(struct qcdisk *, off_t, int);
+static uint32_t getbe32(char **, char *);
+static uint64_t getbe64(char **, char *);
+static uint16_t unpackbe16(char *);
+static uint32_t unpackbe32(char *);
+static uint64_t unpackbe64(char *);
+static void packbe16(char *p, uint32_t v);
+//static void packbe32(char *p, uint32_t v);
+static void packbe64(char *, uint64_t);
+static int qc2_openpath(struct qcdisk *, char *, int);
+static int qc2_open(struct qcdisk *, int);
+static ssize_t qc2_pread(void *, char *, size_t, off_t);
+static ssize_t qc2_pwrite(void *, char *, size_t, off_t);
+static void qc2_close(void *);
+
+int
+virtio_init_qcow2(struct virtio_backing *file, off_t *szp, int fd)
+{
+   struct qcdisk *diskp;
+
+   diskp = malloc

[PATCH] Pluggable disk formats for vmd (qcow2 preparation)

2018-08-05 Thread Ori Bernstein
;
return (-1);
}
 
-   vioscsi->fd = child_cdrom;
+   virtio_init_disk(>file, >sz, child_cdrom);
 
return (0);
 }
diff --git usr.sbin/vmd/virtio.h usr.sbin/vmd/virtio.h
index 20327f9915c..6c5f4f0c7aa 100644
--- usr.sbin/vmd/virtio.h
+++ usr.sbin/vmd/virtio.h
@@ -46,6 +46,11 @@
  */
 #define VIRTIO_MAX_QUEUES  3
 
+enum {
+   DISKTYPE_RAW,
+   DISKTYPE_QCOW2
+};
+
 /*
  * This struct stores notifications from a virtio driver. There is
  * one such struct per virtio device.
@@ -61,6 +66,14 @@ struct virtio_io_cfg {
uint8_t isr_status;
 };
 
+struct virtio_backing {
+   void  *p;
+   char *path;
+   ssize_t  (*pread)(void *p, char *buf, size_t len, off_t off);
+   ssize_t  (*pwrite)(void *p, char *buf, size_t len, off_t off);
+   void (*close)(void *p);
+};
+
 /*
  * A virtio device can have several virtqs. For example, vionet has one virtq
  * each for transmitting and receiving packets. This struct describes the state
@@ -147,8 +160,8 @@ struct vioblk_dev {
struct virtio_io_cfg cfg;
 
struct virtio_vq_info vq[VIRTIO_MAX_QUEUES];
+   struct virtio_backing file;
 
-   int fd;
uint64_t sz;
uint32_t max_xfer;
 
@@ -168,9 +181,10 @@ struct vioscsi_dev {
 
struct virtio_vq_info vq[VIRTIO_MAX_QUEUES];
 
+   struct virtio_backing file;
+
/* is the device locked */
int locked;
-   int fd;
/* size of iso file in bytes */
uint64_t sz;
/* last block address read */
@@ -241,10 +255,10 @@ struct vmmci_dev {
 };
 
 struct ioinfo {
+   struct virtio_backing *file;
uint8_t *buf;
ssize_t len;
off_t offset;
-   int fd;
int error;
 };
 
@@ -261,6 +275,9 @@ void viornd_update_qs(void);
 void viornd_update_qa(void);
 int viornd_notifyq(void);
 
+int virtio_init_raw(struct virtio_backing *dev, off_t *sz, int fd);
+int virtio_init_qcow2(struct virtio_backing *dev, off_t *sz, int fd);
+
 int virtio_blk_io(int, uint16_t, uint32_t *, uint8_t *, void *, uint8_t);
 int vioblk_dump(int);
 int vioblk_restore(int, struct vm_create_params *, int *);

-- 
Ori Bernstein



Re: patterns.c question or possible bug

2018-01-29 Thread Ori Bernstein
On Mon, 29 Jan 2018 23:23:18 -0600, Edgar Pettijohn <ed...@pettijohn-web.com> 
wrote:

> I'm trying to use patterns.c for some pattern matching. The manual 
> mentions captures using "()" around what you want to capture.  I don't 
> see how to get at the data though.  Here is a sample program.
> 
> #include 
> #include "patterns.h"
> 
> int
> main(int argc, char *argv[])
> {
>  const char*errstr = NULL;
>  const char*string = "the quick the brown the fox";
>  const char*pattern = "the";
>  intret;
>  struct str_match match;
> 
>  ret = str_match(string, pattern, , );
> 
>  if (errstr != NULL)
>  printf("%s\n", errstr);
>  else
>  printf("number of matches %d\n", match.sm_nmatch);
> 
>  return 0;
> }
> 
> It prints 2 which I was expecting 3. I've tried multiple other patterns 
> and it seems the answer is always 2. Which leads me to believe I'm doing 
> something wrong.  Any assistance appreciated.
> 
> 
> Thanks,
> 
> 
> Edgar

The code is looking for a match of the pattern in the string, not all matches
of the pattern in the string. It also makes the (IMO, surprising) decision
that not having any capture groups in the pattern implies capturing the whole
pattern. The whole string goes into the first match.

So, in your case, you're matching:

"the quick the brown the fox";
 ^^^

Accordingly:

matches.sm_match[0] = "the quick the brown the fox"
matches.sm_match[1] = "the"

If you had 'quick', you'd get similar behavior:

"the quick the brown the fox";
 

Equivalently, putting the whole pattern in '()' will match the same thing:

pattern = "(quick)"

But multiple parens will match their substrings:

pattern = "(qu)ick (the)"

"the quick the brown the fox";
 ^^^^^
        matches.sm_match[0] = "the quick the brown the fox"
matches.sm_match[1] = "qu"
matches.sm_match[2] = "the"

The choice to capture implicitly, I think, is confusing, but the behavior
seems to me to be correct.

-- 
Ori Bernstein



Re: Remove commented out pledge in tsort

2017-11-13 Thread Ori Bernstein
On Sun, 12 Nov 2017 13:31:10 +0100, Marc Espie <es...@nerim.net> wrote:

> I did this in an eager way a while back, was told in no uncertain terms
> "not yet, we're still looking at things".
> 
> Well, see pledge(2). Whenever the dust settles, sure we can kill this line,
> or activate it instead.
> 
> Dust is not yet settled.
> 

In this specific case, the line is already activated. It's just duplicated
in a comment one line before the pledge gets called.

-- 
Ori Bernstein



Re: Httpd support for internal redirects.

2017-10-21 Thread Ori Bernstein
On Sat, 21 Oct 2017 22:18:17 +0200, Anders Andersson <pipat...@gmail.com> wrote:

> Maybe I don't fully understand the problem, but is it not better to
> fix your broken website generator than to add more code to httpd?
> 

It seemed like other people wanted the feature as well, and based
on the contents of the cited thread, it seemed like something that
might be accepted if implemented.

It was easy to implement, so I added it and sent a patch. If
it's not a desirable feature for httpd, no problem.

-- 
    Ori Bernstein



Re: ksh: skip long history lines

2017-10-18 Thread Ori Bernstein
On Thu, 19 Oct 2017 00:22:51 +0200
Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:

> Those variables are static so that error messages are only printed once
> in the shell lifetime.  history reload happens each time the shell
> detects that the histfile has been modified, which can happen often if
> like me you have multiple interactives shells running at the same time.
 
Makes sense, although I had been thinking that it could lead to lost
warnings when changing histfile; I'd hope that a corrupt histfile isn't
a persistent condition.

In any case, this looks pretty good to me.

> > Maybe hoist this outside of the loop.
> 
> I don't see a reason for this.
> 
> >> +  /* no trailing newline? */
> >> +  if (strlen(p) != sizeof(encoded) - 1) {
> >> +  if (!corrupt) {
> >> +  corrupt = 1;
> >> +  bi_errorf("history file is corrupt");
> >> +  }
> >> +  return;
> >
> > Why does `corrupt` need to be recorded at all? There's a
> > return right after it.
> 
> Also to limit the amount of crap printed on screen, though maybe we
> should warn every time in this case.  Here's an updated diff that does
> so and hopefully makes the intents clearer.  It also adds a comment
> suggested by anton@
> 
> If your histfile ends with an overlong line which is also not terminated
> by a newline, the do...while loop would silently ignore this last line.
> I think that's acceptable, but YMMV.
> 
> 
> Index: history.c
> ===
> RCS file: /d/cvs/src/bin/ksh/history.c,v
> retrieving revision 1.72
> diff -u -p -p -u -r1.72 history.c
> --- history.c 18 Oct 2017 15:41:25 -  1.72
> +++ history.c 18 Oct 2017 22:16:43 -
> @@ -739,24 +739,42 @@ history_close(void)
>  static void
>  history_load(Source *s)
>  {
> - char*p, encoded[LINE + 1], line[LINE + 1];
> + char*nl, *p, encoded[LINE + 1], line[LINE + 1];
>  
>   rewind(histfh);
> + line_co = 1;
>  
>   /* just read it all; will auto resize history upon next command */
> - for (line_co = 1; ; line_co++) {
> - p = fgets(encoded, sizeof(encoded), histfh);
> - if (p == NULL || feof(histfh) || ferror(histfh))
> - break;
> - if ((p = strchr(encoded, '\n')) == NULL) {
> - bi_errorf("history file is corrupt");
> - return;
> + while ((p = fgets(encoded, sizeof(encoded), histfh)) != NULL) {
> + if ((nl = strchr(encoded, '\n')) == NULL) {
> + static int warned;
> +
> + /* no trailing newline? */
> + if (strlen(p) != sizeof(encoded) - 1) {
> + bi_errorf("history file is corrupt");
> + return;
> + }
> +
> + if (!warned) {
> + warned = 1;
> + bi_errorf(
> + "ignoring history line(s) longer than %d"
> + " bytes", LINE);
> + }
> +
> + /* discard overlong line */
> + do {
> + p = fgets(encoded, sizeof(encoded), histfh);
> + } while (p != NULL && strchr(p, '\n') == NULL);
> +
> + continue;
>   }
> - *p = '\0';
> +     *nl = '\0';
>   s->line = line_co;
>   s->cmd_offset = line_co;
>   strunvis(line, encoded);
>   histsave(line_co, line, 0);
> + line_co++;
>   }
>  
>   history_write();
> 
> -- 
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
> 


-- 
Ori Bernstein <o...@eigenstate.org>



Re: Httpd support for internal redirects.

2017-10-18 Thread Ori Bernstein
Pinging this patch.

On Tue, 10 Oct 2017 21:31:20 -0700
Ori Bernstein <o...@eigenstate.org> wrote:

> My website generator is a little stupid at times. It generates
> files with .html suffixes, but urls without them.
> 
> I worked around this with some redirects, but it never felt
> quite right doing an extra round trip. Therefore, I added
> internal redirects, processing the rewrite before responding to
> an http request.
> 
> This introduces new syntax to the config file, allowing you to
> do:
> 
>   location match "^(/foo/bar/[%w]+)$" {
>   rewrite-to "/baz/%1.html"
>   }
> 
> Because we don't know what the paths should be relative
> to, all paths rewritten must be absolute.
> 
> It seems like someone else may find it useful[1], so
> I'm submitting it. I've been running a slightly older
> version of this on https://myrlang.org for the last
> day or two, and it's been uneventful. The difference
> is that the syntax used to piggy back off the "block"
> action => 'block internal return 302 "path"'.
> 
> This doesn't currently support chained rewrites. I think
> that it wouldn't be hard to add if it's needed.
> 
> Ok?
> 
> [1] https://github.com/reyk/httpd/issues/27
> ==
> 
> 
> diff --git usr.sbin/httpd/config.c usr.sbin/httpd/config.c
> index 3c31c3d4cd3..7d2982af7b9 100644
> --- usr.sbin/httpd/config.c
> +++ usr.sbin/httpd/config.c
> @@ -448,7 +448,7 @@ config_getserver_config(struct httpd *env, struct server 
> *srv,
>   sizeof(srv_conf->errorlog));
>   }
>  
> - f = SRVFLAG_BLOCK|SRVFLAG_NO_BLOCK;
> + f = SRVFLAG_BLOCK|SRVFLAG_NO_BLOCK|SRVFLAG_REWRITE;
>   if ((srv_conf->flags & f) == 0) {
>   free(srv_conf->return_uri);
>   srv_conf->flags |= parent->flags & f;
> diff --git usr.sbin/httpd/httpd.conf.5 usr.sbin/httpd/httpd.conf.5
> index a3c97629de3..3a00a750537 100644
> --- usr.sbin/httpd/httpd.conf.5
> +++ usr.sbin/httpd/httpd.conf.5
> @@ -454,6 +454,14 @@ instead of the log files.
>  Disable any previous
>  .Ic block
>  in a location.
> +.It Ic rewrite-to Ar path
> +The current request path is rewritten to
> +.Ar  path .
> +using the same macro expansions as
> +.Cm block return
> +rules. After macros are substituted, the resulting paths must be
> +absolute, beginning with a slash.  Rewriting is not done
> +recursively.
>  .It Ic root Ar option
>  Configure the document root and options for the request path.
>  Valid options are:
> diff --git usr.sbin/httpd/httpd.h usr.sbin/httpd/httpd.h
> index 05cbb8e3550..477115ec92d 100644
> --- usr.sbin/httpd/httpd.h
> +++ usr.sbin/httpd/httpd.h
> @@ -394,6 +394,7 @@ SPLAY_HEAD(client_tree, client);
>  #define SRVFLAG_SERVER_MATCH 0x0020
>  #define SRVFLAG_SERVER_HSTS  0x0040
>  #define SRVFLAG_DEFAULT_TYPE 0x0080
> +#define SRVFLAG_REWRITE  0x0100
>  
>  #define SRVFLAG_BITS \
>   "\10\01INDEX\02NO_INDEX\03AUTO_INDEX\04NO_AUTO_INDEX"   \
> diff --git usr.sbin/httpd/parse.y usr.sbin/httpd/parse.y
> index fcf1938c42d..4072ee5b532 100644
> --- usr.sbin/httpd/parse.y
> +++ usr.sbin/httpd/parse.y
> @@ -134,7 +134,7 @@ typedef struct {
>  %token   LISTEN LOCATION LOG LOGDIR MATCH MAXIMUM NO NODELAY OCSP ON 
> PORT PREFORK
>  %token   PROTOCOLS REQUESTS ROOT SACK SERVER SOCKET STRIP STYLE SYSLOG 
> TCP TICKET
>  %token   TIMEOUT TLS TYPE TYPES HSTS MAXAGE SUBDOMAINS DEFAULT PRELOAD 
> REQUEST
> -%token   ERROR INCLUDE AUTHENTICATE WITH BLOCK DROP RETURN PASS
> +%token   ERROR INCLUDE AUTHENTICATE WITH BLOCK DROP RETURN REWRITE PASS
>  %token STRING
>  %token NUMBER
>  %typeport
> @@ -986,6 +986,11 @@ filter   : block RETURN NUMBER optstring {
>   srv_conf->return_uri_len = strlen($4) + 1;
>   }
>   }
> + | REWRITE STRING {
> + srv_conf->flags |= SRVFLAG_REWRITE;
> + srv_conf->return_uri = $2;
> + srv_conf->return_uri_len = strlen($2) + 1;
> + }
>   | block DROP{
>   /* No return code, silently drop the connection */
>   srv_conf->return_code = 0;
> @@ -1255,6 +1260,7 @@ lookup(char *s)
>   { "request",REQUEST },
>   { "requests",   REQUESTS },
>

Re: ksh: skip long history lines

2017-10-18 Thread Ori Bernstein
On Wed, 18 Oct 2017 22:33:51 +0200
Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:

> 
> It would be nice to support arbitrarily long lines, but a first step
> would be to skip them gracefuly.
> 
> The code modifies the loop condition: no tests against ferror(3)/feof(3)
> are performed any more (I don't see their point).
> 
> We could print a warning on ferror(), though, but that would be another
> diff.
> 
> ok?
> 
> 
> Index: history.c
> ===
> RCS file: /d/cvs/src/bin/ksh/history.c,v
> retrieving revision 1.72
> diff -u -p -p -u -r1.72 history.c
> --- history.c 18 Oct 2017 15:41:25 -  1.72
> +++ history.c 18 Oct 2017 20:18:24 -
> @@ -739,24 +739,45 @@ history_close(void)
>  static void
>  history_load(Source *s)
>  {
> - char*p, encoded[LINE + 1], line[LINE + 1];
> + char*nl, *p, encoded[LINE + 1], line[LINE + 1];
>  
>   rewind(histfh);
> + line_co = 1;
>  
>   /* just read it all; will auto resize history upon next command */
> - for (line_co = 1; ; line_co++) {
> - p = fgets(encoded, sizeof(encoded), histfh);
> - if (p == NULL || feof(histfh) || ferror(histfh))
> - break;
> - if ((p = strchr(encoded, '\n')) == NULL) {
> - bi_errorf("history file is corrupt");
> - return;
> + for (p = fgets(encoded, sizeof(encoded), histfh); p != NULL;
> +  p = fgets(encoded, sizeof(encoded), histfh)) {

The redundant calls prevent this code from scanning as well as it
could for me. Perhaps:

for(;;) {
if ((p = fgets(...) == NULL)
break;

Just a minor cosmetic nitpick, it's not a big deal either way.

> + if ((nl = strchr(encoded, '\n')) == NULL) {
> + static int corrupt, toolong;

I'm not sure why these need to be static. I'm not an expert on
the code, but it looks like histsave can be called multiple
times as the shell runs, which triggers a history reload.
Keeping this state seems like it would be wrong.

Maybe hoist this outside of the loop.

> + /* no trailing newline? */
> + if (strlen(p) != sizeof(encoded) - 1) {
> + if (!corrupt) {
> + corrupt = 1;
> + bi_errorf("history file is corrupt");
> + }
> + return;

Why does `corrupt` need to be recorded at all? There's a
return right after it.

> + }
> +
> + if (!toolong) {
> + toolong = 1;
> + bi_errorf(
> + "ignoring history line(s) longer than %d"
> + " bytes", LINE);
> + }
> +
> + do {
> + p = fgets(encoded, sizeof(encoded), histfh);
> + } while (p != NULL && strchr(p, '\n') == NULL);
> +
> + continue;
>   }
> - *p = '\0';
> + *nl = '\0';
>   s->line = line_co;
>   s->cmd_offset = line_co;
>   strunvis(line, encoded);
>   histsave(line_co, line, 0);
> + line_co++;
>   }
>  
>   history_write();
> 
> -- 
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
> 


-- 
Ori Bernstein <o...@eigenstate.org>



Httpd support for internal redirects.

2017-10-10 Thread Ori Bernstein
My website generator is a little stupid at times. It generates
files with .html suffixes, but urls without them.

I worked around this with some redirects, but it never felt
quite right doing an extra round trip. Therefore, I added
internal redirects, processing the rewrite before responding to
an http request.

This introduces new syntax to the config file, allowing you to
do:

location match "^(/foo/bar/[%w]+)$" {
rewrite-to "/baz/%1.html"
}

Because we don't know what the paths should be relative
to, all paths rewritten must be absolute.

It seems like someone else may find it useful[1], so
I'm submitting it. I've been running a slightly older
version of this on https://myrlang.org for the last
day or two, and it's been uneventful. The difference
is that the syntax used to piggy back off the "block"
action => 'block internal return 302 "path"'.

This doesn't currently support chained rewrites. I think
that it wouldn't be hard to add if it's needed.

Ok?

[1] https://github.com/reyk/httpd/issues/27
==


diff --git usr.sbin/httpd/config.c usr.sbin/httpd/config.c
index 3c31c3d4cd3..7d2982af7b9 100644
--- usr.sbin/httpd/config.c
+++ usr.sbin/httpd/config.c
@@ -448,7 +448,7 @@ config_getserver_config(struct httpd *env, struct server 
*srv,
sizeof(srv_conf->errorlog));
}
 
-   f = SRVFLAG_BLOCK|SRVFLAG_NO_BLOCK;
+   f = SRVFLAG_BLOCK|SRVFLAG_NO_BLOCK|SRVFLAG_REWRITE;
if ((srv_conf->flags & f) == 0) {
free(srv_conf->return_uri);
srv_conf->flags |= parent->flags & f;
diff --git usr.sbin/httpd/httpd.conf.5 usr.sbin/httpd/httpd.conf.5
index a3c97629de3..3a00a750537 100644
--- usr.sbin/httpd/httpd.conf.5
+++ usr.sbin/httpd/httpd.conf.5
@@ -454,6 +454,14 @@ instead of the log files.
 Disable any previous
 .Ic block
 in a location.
+.It Ic rewrite-to Ar path
+The current request path is rewritten to
+.Ar  path .
+using the same macro expansions as
+.Cm block return
+rules. After macros are substituted, the resulting paths must be
+absolute, beginning with a slash.  Rewriting is not done
+recursively.
 .It Ic root Ar option
 Configure the document root and options for the request path.
 Valid options are:
diff --git usr.sbin/httpd/httpd.h usr.sbin/httpd/httpd.h
index 05cbb8e3550..477115ec92d 100644
--- usr.sbin/httpd/httpd.h
+++ usr.sbin/httpd/httpd.h
@@ -394,6 +394,7 @@ SPLAY_HEAD(client_tree, client);
 #define SRVFLAG_SERVER_MATCH   0x0020
 #define SRVFLAG_SERVER_HSTS0x0040
 #define SRVFLAG_DEFAULT_TYPE   0x0080
+#define SRVFLAG_REWRITE0x0100
 
 #define SRVFLAG_BITS   \
"\10\01INDEX\02NO_INDEX\03AUTO_INDEX\04NO_AUTO_INDEX"   \
diff --git usr.sbin/httpd/parse.y usr.sbin/httpd/parse.y
index fcf1938c42d..4072ee5b532 100644
--- usr.sbin/httpd/parse.y
+++ usr.sbin/httpd/parse.y
@@ -134,7 +134,7 @@ typedef struct {
 %token LISTEN LOCATION LOG LOGDIR MATCH MAXIMUM NO NODELAY OCSP ON PORT PREFORK
 %token PROTOCOLS REQUESTS ROOT SACK SERVER SOCKET STRIP STYLE SYSLOG TCP TICKET
 %token TIMEOUT TLS TYPE TYPES HSTS MAXAGE SUBDOMAINS DEFAULT PRELOAD REQUEST
-%token ERROR INCLUDE AUTHENTICATE WITH BLOCK DROP RETURN PASS
+%token ERROR INCLUDE AUTHENTICATE WITH BLOCK DROP RETURN REWRITE PASS
 %token   STRING
 %token   NUMBER
 %type  port
@@ -986,6 +986,11 @@ filter : block RETURN NUMBER optstring {
srv_conf->return_uri_len = strlen($4) + 1;
}
}
+   | REWRITE STRING {
+   srv_conf->flags |= SRVFLAG_REWRITE;
+   srv_conf->return_uri = $2;
+   srv_conf->return_uri_len = strlen($2) + 1;
+   }
| block DROP{
/* No return code, silently drop the connection */
srv_conf->return_code = 0;
@@ -1255,6 +1260,7 @@ lookup(char *s)
{ "request",REQUEST },
{ "requests",   REQUESTS },
{ "return", RETURN },
+   { "rewrite-to", REWRITE },
{ "root",   ROOT },
{ "sack",   SACK },
{ "server", SERVER },
diff --git usr.sbin/httpd/server_http.c usr.sbin/httpd/server_http.c
index e64de0d2f9c..c9ea4771037 100644
--- usr.sbin/httpd/server_http.c
+++ usr.sbin/httpd/server_http.c
@@ -1162,10 +1162,34 @@ server_expand_http(struct client *clt, const char *val, 
char *buf,
return (buf);
 }
 
+static int
+server_set_path(struct http_descriptor *desc, char *input)
+{
+   char path[PATH_MAX];
+
+   if (input == NULL || url_decode(input) == NULL)
+   return -1;
+   if 

Awk: Store program name before printing it.

2017-10-08 Thread Ori Bernstein
Just move the assignment up a bit.



diff --git usr.bin/awk/main.c usr.bin/awk/main.c
index 82996bc6f71..ebff17d240a 100644
--- usr.bin/awk/main.c
+++ usr.bin/awk/main.c
@@ -64,13 +64,13 @@ int main(int argc, char *argv[])
setlocale(LC_ALL, "");
setlocale(LC_NUMERIC, "C"); /* for parsing cmdline & prog */
 
+   cmdname = __progname;
if (pledge("stdio rpath wpath cpath proc exec", NULL) == -1) {
fprintf(stderr, "%s: pledge: incorrect arguments\n",
cmdname);
exit(1);
}
 
-   cmdname = __progname;
if (argc == 1) {
fprintf(stderr, "usage: %s [-safe] [-V] [-d[n]] [-F fs] "
"[-v var=value] [prog | -f progfile]\n\tfile ...\n",



Re: Exec pledges

2017-10-08 Thread Ori Bernstein
And pax, because I can




diff --git bin/pax/ar_io.c bin/pax/ar_io.c
index 40a6492405e..ce53a9ae51b 100644
--- bin/pax/ar_io.c
+++ bin/pax/ar_io.c
@@ -1281,6 +1281,11 @@ ar_start_gzip(int fd, const char *path, int wr)
/* System compressors are more likely to use pledge(2) */
putenv("PATH=/usr/bin:/usr/local/bin");
 
+   /* Restrict them to sane pledges */
+   if (pledge(NULL, "stdio rpath wpath cpath fattr chown "
+   "prot_exec") == -1)
+   err(1, "pledge");
+
if (execlp(path, path, gzip_flags, (char *)NULL) < 0)
err(1, "could not exec %s", path);
/* NOTREACHED */



Re: Exec pledges

2017-10-08 Thread Ori Bernstein
Slowcgi. Because if someone could fool it into
running the wrong binary, the outcome may be
suboptimal.




diff --git usr.sbin/slowcgi/slowcgi.8 usr.sbin/slowcgi/slowcgi.8
index d3ab4030bed..f8f07630204 100644
--- usr.sbin/slowcgi/slowcgi.8
+++ usr.sbin/slowcgi/slowcgi.8
@@ -24,6 +24,7 @@
 .Nm
 .Op Fl d
 .Op Fl p Ar path
+.Op Fl P Ar pledge
 .Op Fl s Ar socket
 .Op Fl u Ar user
 .Sh DESCRIPTION
@@ -72,6 +73,9 @@ A
 of
 .Pa /
 effectively disables the chroot.
+.It Fl P Ar pledge
+Restrict all spawned processes to the pledge
+.Ar pledge .
 .It Fl s Ar socket
 Create and bind to alternative local socket at
 .Ar socket .
diff --git usr.sbin/slowcgi/slowcgi.c usr.sbin/slowcgi/slowcgi.c
index a9a90b2db1f..16cfbd1b80a 100644
--- usr.sbin/slowcgi/slowcgi.c
+++ usr.sbin/slowcgi/slowcgi.c
@@ -275,6 +275,7 @@ main(int argc, char *argv[])
struct passwd   *pw;
struct stat  sb;
int  c, fd;
+   const char  *execpledge = NULL;
const char  *chrootpath = NULL;
const char  *slowcgi_user = SLOWCGI_USER;
 
@@ -303,6 +304,9 @@ main(int argc, char *argv[])
case 'p':
chrootpath = optarg;
break;
+   case 'P':
+   execpledge = optarg;
+   break;
case 's':
fcgi_socket = optarg;
break;
@@ -353,7 +357,7 @@ main(int argc, char *argv[])
setresuid(pw->pw_uid, pw->pw_uid, pw->pw_uid))
lerr(1, "unable to revoke privs");
 
-   if (pledge("stdio rpath unix proc exec", NULL) == -1)
+   if (pledge("stdio rpath unix proc exec", execpledge) == -1)
lerr(1, "pledge");
 
SLIST_INIT(_proc.requests);



Re: Exec pledges

2017-10-08 Thread Ori Bernstein
This is my pledge(1). There are many like it, but this one is mine.
When directory pledges land, this should also get support for them.

Usage example:

pledge stdio echo hello world

More complicated, with enough pledges to run awk:

pledge "stdio rpath wpath cpath proc exec prot_exec" \
awk 'BEGIN {print "hi"}'




diff --git usr.bin/Makefile usr.bin/Makefile
index f428c790fe7..d4c506bc918 100644
--- usr.bin/Makefile
+++ usr.bin/Makefile
@@ -18,7 +18,7 @@ SUBDIR= apply arch at aucat audioctl awk banner \
midiplay mixerctl mkdep mklocale mktemp nc netstat \
newsyslog \
nfsstat nice nm nl nohup openssl pagesize passwd paste patch pctr \
-   pkg-config pkill \
+   pkg-config pkill pledge \
pr printenv printf quota radioctl rcs rdist rdistd \
readlink renice rev rpcgen rpcinfo rs rup rusers rwall \
sdiff script sed sendbug shar showmount signify skey \
diff --git usr.bin/pledge/Makefile usr.bin/pledge/Makefile
new file mode 100644
index 000..cdcccf1af61
--- /dev/null
+++ usr.bin/pledge/Makefile
@@ -0,0 +1,6 @@
+#  $OpenBSD$
+
+PROG=  pledge
+CFLAGS+= -Wall -Werror
+
+.include 
diff --git usr.bin/pledge/pledge.1 usr.bin/pledge/pledge.1
new file mode 100644
index 000..e049277fdff
--- /dev/null
+++ usr.bin/pledge/pledge.1
@@ -0,0 +1,44 @@
+.\" $OpenBSD$
+.\"
+.\"Copyright (c) 2017 Ori Bernstein <o...@eigenstate.org>
+.\"
+.\"Permission to use, copy, modify, and distribute this software for any
+.\"purpose with or without fee is hereby granted, provided that the above
+.\"copyright notice and this permission notice appear in all copies.
+.\"
+.\"THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\"WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\"MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\"ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\"WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\"ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\"OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.Dd $Mdocdate: September 4 2017 $
+.Dt PLEDGE 1
+.Os
+.Sh NAME
+.Nm pledge
+.Nd execute commands under a pledge promise
+.Sh SYNOPSIS
+.Nm pledge
+.Ar promise
+.Ar command...
+.Sh DESCRIPTION
+The
+.Nm pledge
+utility executes the given command with the provided pledge
+restrictions. The first argument specifies the
+.Ar promise
+that the
+.Ar command
+will be run under.
+.Sh HISTORY
+The
+.Nm
+command first appeared in
+.Ox 6.2 .
+.Sh BUGS
+This program does not support directory pledges.
+.Sh AUTHORS
+.An Ori Bernstein Aq Mt o...@eigenstate.org
+
diff --git usr.bin/pledge/pledge.c usr.bin/pledge/pledge.c
new file mode 100644
index 000..0431df7c7de
--- /dev/null
+++ usr.bin/pledge/pledge.c
@@ -0,0 +1,15 @@
+#include 
+#include 
+#include 
+
+int
+main(int argc, char **argv)
+{
+   if (argc < 2)
+   errx(1, "%s pledge cmd...\n", argv[0]);
+   if (pledge("stdio exec", argv[1]) == -1)
+   err(1, "pledge");
+   if (execvp(argv[2], [2]) == -1)
+   err(1, "exec");
+   return 0;
+}



Exec pledges

2017-10-08 Thread Ori Bernstein
I wrote some patches to allow pledging across execs.
Currently, the exec pledge passes down the process tree.

The initial version simply inherited the current pledge when
execing with the `pledge("rexec")` promise, but after
discussing with Theo at EuroBSD, a better design was
suggested.  Because directory pledges are going to be their
own system call, we can repurpose the second argument of
pledge as the "exec pledge".

My use is for sandboxing a bad idea, where arbitrary users can
submit code to a sandbox for my pet language, sitting on my
website, which gets compiled and run.  In the base system, it
seems like many programs with pledge("exec") invoke arbitrary
binaries on behalf of the user, but there are others (eg, pax)
that expect specfic behavior from what they exec.

This also could have broader uses -- for example, pledging
entire shell scripts. It also helps mitigate the case where an
attacker might fool us into exec()ing the wrong binary.

This is split into four patches:

- Adding kernel+libc support and docs. This changes the
  signature of pledge, but because everything calls
  plege("promises", NULL), things keep working.

- Add a pledge(1) binary + docs. This allows enforcing
  arbitrary pledges for a process tree. The minimal pledge for
  dynamically linked executables is a bit broad ("stdio rpath
  prot_exec"), due to ld.so, but this is still usefully
  restrictive. For statically linked executables, we do even
  better.  'pledge stdio echo hi' works just fine.

- The third patch adds the ability to pledge programs running
  under slowcgi.

- The fourth patch exec-pledges pax. I'm not sure I got the
  pledges right, so more careful review would be appreciated.
  I grabbed the pledges from compress, which seem to be a
  superset of the bzip2 pledges.

Regress tests would be good to add. I haven't done the work
yet, but it's mostly a matter of figuring out how to arrange
the makefiles to produce a binary that can be exec'ed. The
examples I see all seem to just produce one regress binary.

In an appropriate turn of events, this patch was written
while sitting in a capsicum talk at EuroBSD.



diff --git include/unistd.h include/unistd.h
index ffec1538f44..1faf2543f3d 100644
--- include/unistd.h
+++ include/unistd.h
@@ -522,7 +522,7 @@ int  strtofflags(char **, u_int32_t *, u_int32_t *);
 int swapctl(int cmd, const void *arg, int misc);
 int syscall(int, ...);
 int getentropy(void *, size_t);
-int pledge(const char *, const char **);
+int pledge(const char *, const char *);
 pid_t   __tfork_thread(const struct __tfork *, size_t, void (*)(void *),
void *);
 #endif /* __BSD_VISIBLE */
diff --git lib/libc/sys/pledge.2 lib/libc/sys/pledge.2
index 89884352500..9bfedc3d14b 100644
--- lib/libc/sys/pledge.2
+++ lib/libc/sys/pledge.2
@@ -23,7 +23,7 @@
 .Sh SYNOPSIS
 .In unistd.h
 .Ft int
-.Fn pledge "const char *promises" "const char *paths[]"
+.Fn pledge "const char *promises" "const char *execpromises"
 .Sh DESCRIPTION
 The current process is forced into a restricted-service operating mode.
 A few subsets are available, roughly described as computation, memory
@@ -31,9 +31,16 @@ management, read-write operations on file descriptors, 
opening of files,
 networking.
 In general, these modes were selected by studying the operation
 of many programs using libc and other such interfaces, and setting
-.Ar promises
-or
-.Ar paths .
+.Ar promises.
+.Pp
+Specifying
+.Ar execpromises
+allow the programmer to set the default pledge set for all future
+processes process that are execed. This is allows for inheriting
+.Fn fork
+and
+.Fn exec
+calls.
 .Pp
 Use of
 .Fn pledge
@@ -70,7 +77,7 @@ Passing
 to
 .Fa promises
 or
-.Fa paths
+.Fa execpromises
 specifies to not change the current value.
 .Pp
 Some system calls, when allowed, have restrictions applied to them:
@@ -143,9 +150,7 @@ support:
 system sensor readings.
 .Pp
 .It Fn pledge
-Can only reduce permissions; can only set a list of
-.Pa paths
-once.
+Can only reduce permissions.
 .El
 .Pp
 The
@@ -467,8 +472,8 @@ Allows a process to call
 Coupled with the
 .Va proc
 promise, this allows a process to fork and execute another program.
-The new program starts running without pledge active and hopefully
-makes a new
+The new program starts running with the requested exec pledges active
+and hopefully makes a new
 .Fn pledge .
 .It Va prot_exec
 Allows the use of
@@ -556,10 +561,6 @@ Allow
 operation for statistics collection from a bpf device.
 .El
 .Pp
-A whitelist of permitted paths may be provided in
-.Ar paths .
-All other paths will return
-.Er ENOENT .
 At least one promise is required to be pledged in order to activate a 
whitelist.
 .Sh RETURN VALUES
 .Rv -std
diff --git regress/sys/kern/pledge/generic/manager.c 
regress/sys/kern/pledge/generic/manager.c
index 451a3ecc088..e6af6b3d69e 100644
--- regress/sys/kern/pledge/generic/manager.c
+++ regress/sys/kern/pledge/generic/manager.c
@@ -175,7 +175,7 @@ 

Re: llvm - xor return pointers

2017-08-15 Thread Ori Bernstein
On Sat, 22 Jul 2017 02:25:29 -0400
Todd Mortimer <t...@opennet.ca> wrote:

> xor [rsp], rsp
> 
> at the start of each function, and before every RET.

Wouldn't this break with alloca() or C99 VLAs?
%rbp may work better, if the frame pointer is
retained.

-- 
Ori Bernstein <o...@eigenstate.org>