[PATCH 0/7] Virtio 1.0 in the kernel

2019-01-19 Thread Stefan Fritsch
Hi,

here comes the split up diff for virtio 1.0 support.  Compared to the previous
diff, I have omitted some changes in how the feature bits and negotiation
works. This also means that the feature bit defines in the headers stay the
same and vmd is not affected. I have tested that virtio_mmio on arm64 still
works.

Cheers,
Stefan



Re: Virtio 1.0 for the kernel

2019-01-19 Thread Stefan Fritsch
On Saturday, 12 January 2019 01:49:09 CET Jonathan Gray wrote:
> On Fri, Jan 11, 2019 at 03:21:11PM -0800, William Ahern wrote:
> > On Fri, Jan 11, 2019 at 10:43:25AM +0100, Stefan Fritsch wrote:
> > 
> > 
> > > /* only used for sizeof, not actually allocated */
> > > extern struct virtio_pci_common_cfg ccfg;
> > 
> > 
> > 
> > > #define CREAD(sc, memb)  _cread(sc, \
> > > 
> > > offsetof(struct virtio_pci_common_cfg, memb), sizeof(ccfg.memb))
> > > 
> > > The compiler should optimize this to the same code as the complicated
> > > macro above. You think this variant is acceptable?
> > 
> > Maybe I'm missing something, but these are the more idiomatic constructs
> > 
> >   sizeof ((struct virtio_pci_common_cfg *)0)->memb
> >   sizeof ((struct virtio_pci_common_cfg){ 0 }).memb
> 
> No, expanding the offsetof macro and avoiding __builtin_offsetof misses
> the point of it.

It allows to get rid of the "extern struct virtio_pci_common_cfg ccfg;" 
though. That's an improvement.

Stefan




Re: Virtio 1.0 for the kernel

2019-01-11 Thread Jonathan Gray
On Fri, Jan 11, 2019 at 03:21:11PM -0800, William Ahern wrote:
> On Fri, Jan 11, 2019 at 10:43:25AM +0100, Stefan Fritsch wrote:
>  
> > /* only used for sizeof, not actually allocated */
> > extern struct virtio_pci_common_cfg ccfg;
>  
> > #define CREAD(sc, memb)  _cread(sc, \
> > offsetof(struct virtio_pci_common_cfg, memb), sizeof(ccfg.memb))
> > 
> > The compiler should optimize this to the same code as the complicated 
> > macro above. You think this variant is acceptable?
> 
> Maybe I'm missing something, but these are the more idiomatic constructs
> 
>   sizeof ((struct virtio_pci_common_cfg *)0)->memb
>   sizeof ((struct virtio_pci_common_cfg){ 0 }).memb
> 

No, expanding the offsetof macro and avoiding __builtin_offsetof misses
the point of it.



Re: Virtio 1.0 for the kernel

2019-01-11 Thread William Ahern
On Fri, Jan 11, 2019 at 10:43:25AM +0100, Stefan Fritsch wrote:
 
> /* only used for sizeof, not actually allocated */
> extern struct virtio_pci_common_cfg ccfg;
 
> #define CREAD(sc, memb)  _cread(sc, \
> offsetof(struct virtio_pci_common_cfg, memb), sizeof(ccfg.memb))
> 
> The compiler should optimize this to the same code as the complicated 
> macro above. You think this variant is acceptable?

Maybe I'm missing something, but these are the more idiomatic constructs

  sizeof ((struct virtio_pci_common_cfg *)0)->memb
  sizeof ((struct virtio_pci_common_cfg){ 0 }).memb



Re: Virtio 1.0 for the kernel

2019-01-11 Thread Ted Unangst
Stefan Fritsch wrote:
> On Fri, 11 Jan 2019, Ted Unangst wrote:
> A simple inline function does not work because memb is the name of a 
> member of a struct. But combined with a macro it would work. Like this:

yeah, that looks pretty normal.



Re: Virtio 1.0 for the kernel

2019-01-11 Thread Stefan Fritsch
On Fri, 11 Jan 2019, Ted Unangst wrote:

> Stefan Fritsch wrote:
> > +#define CREAD(sc, memb)
> > \
> > +   ({  
> > \
> > +   struct virtio_pci_common_cfg c; 
> > \
> > +   size_t off = offsetof(struct virtio_pci_common_cfg, memb);  
> > \
> > +   size_t size = sizeof(c.memb);   
> > \
> > +   typeof(c.memb) val; 
> > \
> > +   
> > \
> > +   switch (size) { 
> > \
> > +   case 1: 
> > \
> > +   val = bus_space_read_1(sc->sc_iot, sc->sc_ioh, off);
> > \
> > +   break;  
> > \
> > +   case 2: 
> > \
> > +   val = bus_space_read_2(sc->sc_iot, sc->sc_ioh, off);
> > \
> > +   break;  
> > \
> > +   case 4: 
> > \
> > +   val = bus_space_read_4(sc->sc_iot, sc->sc_ioh, off);
> > \
> > +   break;  
> > \
> > +   case 8: 
> > \
> > +   val = bus_space_read_8(sc->sc_iot, sc->sc_ioh, off);
> > \
> > +   break;  
> > \
> > +   }   
> > \
> > +   DNPRINTF(2, "%s: %d: off %#zx size %#zx read %#llx\n",  
> > \
> > +  __func__, __LINE__, off, size, (unsigned long long)val); 
> > \
> > +   val;
> > \
> > +   })
> 
> I'm not a big fan of statement expressions. I thought we purged most of them
> from the tree in fact. I would say just use an inline. And no typeof.
> 
> (There's only one typeof in sys/ outside of drm, and I added it, but no more
> please.)
> 

A simple inline function does not work because memb is the name of a 
member of a struct. But combined with a macro it would work. Like this:


/* only used for sizeof, not actually allocated */
extern struct virtio_pci_common_cfg ccfg;

static inline
uint64_t _cread(struct virtio_pci_softc *sc, unsigned off, unsigned size)
{
uint64_t val;
switch (size) {
case 1:
val = bus_space_read_1(sc->sc_iot, sc->sc_ioh, off);
break;
case 2:
val = bus_space_read_2(sc->sc_iot, sc->sc_ioh, off);
break;
case 4:
val = bus_space_read_4(sc->sc_iot, sc->sc_ioh, off);
break;
case 8:
val = bus_space_read_8(sc->sc_iot, sc->sc_ioh, off);
break;
}
return val;
}

#define CREAD(sc, memb)  _cread(sc, \
offsetof(struct virtio_pci_common_cfg, memb), sizeof(ccfg.memb))


The compiler should optimize this to the same code as the complicated 
macro above. You think this variant is acceptable?

Cheers,
Stefan



Re: Virtio 1.0 for the kernel

2019-01-11 Thread Ted Unangst
Stefan Fritsch wrote:
> +#define CREAD(sc, memb)  
> \
> + ({  
> \
> + struct virtio_pci_common_cfg c; 
> \
> + size_t off = offsetof(struct virtio_pci_common_cfg, memb);  
> \
> + size_t size = sizeof(c.memb);   
> \
> + typeof(c.memb) val; 
> \
> + 
> \
> + switch (size) { 
> \
> + case 1: 
> \
> + val = bus_space_read_1(sc->sc_iot, sc->sc_ioh, off);
> \
> + break;  
> \
> + case 2: 
> \
> + val = bus_space_read_2(sc->sc_iot, sc->sc_ioh, off);
> \
> + break;  
> \
> + case 4: 
> \
> + val = bus_space_read_4(sc->sc_iot, sc->sc_ioh, off);
> \
> + break;  
> \
> + case 8: 
> \
> + val = bus_space_read_8(sc->sc_iot, sc->sc_ioh, off);
> \
> + break;  
> \
> + }   
> \
> + DNPRINTF(2, "%s: %d: off %#zx size %#zx read %#llx\n",  
> \
> +__func__, __LINE__, off, size, (unsigned long long)val); 
> \
> + val;
> \
> + })

I'm not a big fan of statement expressions. I thought we purged most of them
from the tree in fact. I would say just use an inline. And no typeof.

(There's only one typeof in sys/ outside of drm, and I added it, but no more
please.)



Re: Virtio 1.0 for the kernel

2019-01-11 Thread Stefan Fritsch
Hi Carlos,

> 
> I'm getting ramped back up from being AWOL for several months, so
> apologies in advance for the delay of going through your diff (and the
> virtio header re-order one that you fixed).  It'll be over the weekend
> at the earliest before I can review/test it out.

There is no hurry, take your time. I will split the diff up into smaller 
chunks, so you may wait for that with a detailed review. But opinions 
about the offsetof hackery are welcome before that.

Cheers,
Stefan



Re: Virtio 1.0 for the kernel

2019-01-11 Thread Stefan Fritsch
On Fri, 11 Jan 2019, Jonathan Gray wrote:
> Assuming a qcow2 image created by something along the lines of
> 'qemu-img create -f qcow2 root.qcow2 2g' miniroot64.fs from an arm64
> snapshot and u-boot for qemu_arm64 from the u-boot-aarch64 package:
> 
> doas sh -c "qemu-system-aarch64 -runas $USER \
>   -M virt -serial stdio -m 1024 -cpu cortex-a57 \
>   -bios /usr/local/share/u-boot/qemu_arm64/u-boot.bin \
>   -device virtio-rng-device -netdev tap,id=net0 \
>   -device virtio-net-device,netdev=net0 \
>   -drive file=miniroot64.fs,if=none,id=drive0,format=raw \
>   -drive file=root.qcow2,if=none,id=drive1,format=qcow2 \
>   -device ich9-ahci,id=ahci \
>   -device ide-drive,drive=drive0,bus=ahci.0 \
>   -device ide-drive,drive=drive1,bus=ahci.1"

Thanks, that works. I had tried with some efi firmware before and not with 
u-boot.

Cheers,
Stefan



Re: Virtio 1.0 for the kernel

2019-01-11 Thread Stefan Fritsch
On Thu, 10 Jan 2019, Theo de Raadt wrote:

> arm64 also uses this subsystem, and as a result this diff breaks
> all those kernels.

The diff also breaks vmd. Even if it compiles it will still be broken. As 
I wrote, it's not ready for commit yet.

> 
> You ask how to run arm64 Uhm, you didn't even try to compile it.
> 
> 



Re: Virtio 1.0 for the kernel

2019-01-10 Thread Theo de Raadt
arm64 also uses this subsystem, and as a result this diff breaks
all those kernels.

You ask how to run arm64 Uhm, you didn't even try to compile it.



Re: Virtio 1.0 for the kernel

2019-01-10 Thread Jonathan Gray
On Thu, Jan 10, 2019 at 11:38:38PM +0100, Stefan Fritsch wrote:
> Hi,
> 
> the diff below implements the virtio 1.0 standard in the kernel (0.9 keeps 
> working, of course). It's not ready for commit, yet, but I would like some 
> input.
> 
> For the most part, the changes from 0.9 to 1.0 are not that big, but there 
> are some notable differences. Since some headers are also used by vmd in 
> userspace, how those things are handled in the kernel are also relevant to 
> vmd.
> 
> One change is that the number of feature bits is no longer limited to 32. 
> This means in the long run, defining constants with values that have only 
> the relevant feature bit set won't work anymore. So eventually, we will 
> have to change the constants to contain the bit offsets instead. Then we 
> can add some macros to make handling easier. This is what the attached 
> diff does. However, in the nearer future 64 bits will be sufficient and 
> another possibility is to continue as now and define features as uint64_t 
> values. So, should we make the switch from "#define foo (1<<24)" to 
> "#define foo 24" now or at some later time, when devices start using more 
> than 64 bits?
> 
> The biggest change in 1.0 is the way the resources are found and accessed 
> for virtio_pci devices. This means that all the offsets for the generic 
> configuration registers change between 0.9 and 1.0. One possibility would 
> be to just define a second set of constants and use that for 1.0. However, 
> since the register layout is documented as structs in the standard, I have 
> tried something different and used some macros (CREAD/CWRITE in 
> virtio_pci.c) and offsetof() magic to determine the register offset and 
> width for an access. This has the advantage that the widths are 
> automatically taken care of, whereas in the old style one has to manually 
> pick the correct bus_dma_read/write_* function. However, I don't like the 
> 0.9 code to use the one style and the 1.0 code to use the other style. I 
> also don't know how suitable the offsetof hackery would be for vmd. So I 
> would like input on which way you think is preferable?
> 
> The diff does not implement 1.0 for virtio/mmio, yet. I hope that it does 
> not break 0.9 virtio/mmio, but I have not tested that. So far I did not 
> have success in running openbsd arm or aarch64 on qemu (openbsd panics). 
> Any pointers about this?

Assuming a qcow2 image created by something along the lines of
'qemu-img create -f qcow2 root.qcow2 2g' miniroot64.fs from an arm64
snapshot and u-boot for qemu_arm64 from the u-boot-aarch64 package:

doas sh -c "qemu-system-aarch64 -runas $USER \
-M virt -serial stdio -m 1024 -cpu cortex-a57 \
-bios /usr/local/share/u-boot/qemu_arm64/u-boot.bin \
-device virtio-rng-device -netdev tap,id=net0 \
-device virtio-net-device,netdev=net0 \
-drive file=miniroot64.fs,if=none,id=drive0,format=raw \
-drive file=root.qcow2,if=none,id=drive1,format=qcow2 \
-device ich9-ahci,id=ahci \
-device ide-drive,drive=drive0,bus=ahci.0 \
-device ide-drive,drive=drive1,bus=ahci.1"

arm is similiar with no need to specify cpu.



Virtio 1.0 for the kernel

2019-01-10 Thread Stefan Fritsch
Hi,

the diff below implements the virtio 1.0 standard in the kernel (0.9 keeps 
working, of course). It's not ready for commit, yet, but I would like some 
input.

For the most part, the changes from 0.9 to 1.0 are not that big, but there 
are some notable differences. Since some headers are also used by vmd in 
userspace, how those things are handled in the kernel are also relevant to 
vmd.

One change is that the number of feature bits is no longer limited to 32. 
This means in the long run, defining constants with values that have only 
the relevant feature bit set won't work anymore. So eventually, we will 
have to change the constants to contain the bit offsets instead. Then we 
can add some macros to make handling easier. This is what the attached 
diff does. However, in the nearer future 64 bits will be sufficient and 
another possibility is to continue as now and define features as uint64_t 
values. So, should we make the switch from "#define foo (1<<24)" to 
"#define foo 24" now or at some later time, when devices start using more 
than 64 bits?

The biggest change in 1.0 is the way the resources are found and accessed 
for virtio_pci devices. This means that all the offsets for the generic 
configuration registers change between 0.9 and 1.0. One possibility would 
be to just define a second set of constants and use that for 1.0. However, 
since the register layout is documented as structs in the standard, I have 
tried something different and used some macros (CREAD/CWRITE in 
virtio_pci.c) and offsetof() magic to determine the register offset and 
width for an access. This has the advantage that the widths are 
automatically taken care of, whereas in the old style one has to manually 
pick the correct bus_dma_read/write_* function. However, I don't like the 
0.9 code to use the one style and the 1.0 code to use the other style. I 
also don't know how suitable the offsetof hackery would be for vmd. So I 
would like input on which way you think is preferable?

The diff does not implement 1.0 for virtio/mmio, yet. I hope that it does 
not break 0.9 virtio/mmio, but I have not tested that. So far I did not 
have success in running openbsd arm or aarch64 on qemu (openbsd panics). 
Any pointers about this?

So far, I have only tested the net, block and rng devices. It's possible 
that the other devices need some more minor tweaks, too.

And in case anyone wonders, there is no killer feature in virtio 1.0 that 
we really want to have. It's just that we run on hypervisors that don't 
support 0.9, and we run better on systems where 1.0-only is the default 
(libvirt does that when the newer qemu machine type q35 is used :-( ).

Cheers,
Stefan


diff --git a/share/man/man4/virtio.4 b/share/man/man4/virtio.4
index 5e60e2e0c32..6ee23a7b47d 100644
--- a/share/man/man4/virtio.4
+++ b/share/man/man4/virtio.4
@@ -22,7 +22,7 @@
 .Nd VirtIO support driver
 .Sh SYNOPSIS
 .Cd "virtio* at fdt?"
-.Cd "virtio* at pci?"
+.Cd "virtio* at pci? flags 0x00"
 .Sh DESCRIPTION
 The
 .Nm
@@ -56,7 +56,12 @@ control interface
 The
 .Nm
 driver conforms to the virtio 0.9.5 specification.
-The virtio 1.0 standard is not supported, yet.
+The virtio 1.0 standard is only supported for pci devices.
+.Pp
+By default 0.9 is preferred over 1.0.
+This can be changed by setting the bit 0x4 in the flags.
+.Pp
+Setting the bit 0x8 in the flags disables the 1.0 support completely.
 .Sh SEE ALSO
 .Xr intro 4
 .Sh HISTORY
diff --git a/sys/dev/fdt/virtio_mmio.c b/sys/dev/fdt/virtio_mmio.c
index a48091d7fca..3a74e647c19 100644
--- a/sys/dev/fdt/virtio_mmio.c
+++ b/sys/dev/fdt/virtio_mmio.c
@@ -89,7 +89,7 @@ void  virtio_mmio_write_device_config_2(struct 
virtio_softc *, int, uint16_t);
 void   virtio_mmio_write_device_config_4(struct virtio_softc *, int, 
uint32_t);
 void   virtio_mmio_write_device_config_8(struct virtio_softc *, int, 
uint64_t);
 uint16_t   virtio_mmio_read_queue_size(struct virtio_softc *, uint16_t);
-void   virtio_mmio_setup_queue(struct virtio_softc *, uint16_t, 
uint32_t);
+void   virtio_mmio_setup_queue(struct virtio_softc *, struct virtqueue 
*, uint64_t);
 void   virtio_mmio_set_status(struct virtio_softc *, int);
 uint32_t   virtio_mmio_negotiate_features(struct virtio_softc *, uint32_t,
  const struct virtio_feature_name 
*);
@@ -151,15 +151,18 @@ virtio_mmio_read_queue_size(struct virtio_softc *vsc, 
uint16_t idx)
 }
 
 void
-virtio_mmio_setup_queue(struct virtio_softc *vsc, uint16_t idx, uint32_t addr)
+virtio_mmio_setup_queue(struct virtio_softc *vsc, struct virtqueue *vq,
+uint64_t addr)
 {
struct virtio_mmio_softc *sc = (struct virtio_mmio_softc *)vsc;
-   bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_QUEUE_SEL, idx);
+   bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_QUEUE_SEL,
+   vq->vq_index);