Re: [libvirt PATCH 5/5] qemu: Implement support for vDPA block devices

2023-09-08 Thread Peter Krempa
On Fri, Sep 08, 2023 at 14:51:14 -0500, Jonathon Jongsma wrote:
> On 7/24/23 8:05 AM, Peter Krempa wrote:
> > As I've promised a long time ago I gave your patches some testing in
> > regard to cooperation with blockjobs and snapshots.
> > 
> > Since the new version of the patches was not yet posted on the list I'm
> > replying here including my observations from testing patches from your
> > gitlab branch:
> > 
> > On Tue, Jun 06, 2023 at 16:15:30 -0500, Jonathon Jongsma wrote:
> > > Requires recent qemu with support for the virtio-blk-vhost-vdpa device
> > > and the ability to pass a /dev/fdset/N path for the vdpa path (8.1.0)
> > > 
> > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1900770
> > 
> > Since this is a feature addition the 'Fixes' keyword doesn't make sense.
> > Use e.g. 'Resolves' instead.
> > 
> > Additionally you're missing the DCO certification here.
> > 
> > > ---
> > >   src/qemu/qemu_block.c  | 20 --
> > >   src/qemu/qemu_domain.c | 25 
> > >   src/qemu/qemu_validate.c   | 44 +++---
> > >   tests/qemuxml2argvdata/disk-vhostvdpa.args | 35 +
> > >   tests/qemuxml2argvdata/disk-vhostvdpa.xml  | 21 +++
> > >   tests/qemuxml2argvtest.c   |  2 +
> > >   6 files changed, 139 insertions(+), 8 deletions(-)
> > >   create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.args
> > >   create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.xml
> > 
> > [...]
> > 
> > > diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> > > index 2f6b32e394..119e52a7d7 100644
> > > --- a/src/qemu/qemu_domain.c
> > > +++ b/src/qemu/qemu_domain.c
> > > @@ -11157,6 +11157,28 @@ qemuDomainPrepareStorageSourceFDs(virStorageSource *src,
> > >   }
> > > +static int
> > > +qemuDomainPrepareStorageSourceVDPA(virStorageSource *src,
> > > +   qemuDomainObjPrivate *priv)
> > > +{
> > > +qemuDomainStorageSourcePrivate *srcpriv = NULL;
> > > +virStorageType actualType = virStorageSourceGetActualType(src);
> > > +int vdpafd = -1;
> > > +
> > > +if (actualType != VIR_STORAGE_TYPE_VHOST_VDPA)
> > > +return 0;
> > > +
> > > +if ((vdpafd = qemuVDPAConnect(src->path)) < 0)
> > > +return -1;
> > 
> > This function call directly touches the host filesystem, which is not
> > supposed to be in the *DomainPrepareStorageSource* functions but we
> > rather have a completely separate machinery in
> > qemuProcessPrepareHostStorage.
> > 
> > Unfortunately that one doesn't yet need to handle individual backing
> > chain members though.
> > 
> > This ensures that the code doesn't get accidentally called from tests
> > even without mocking the code as the tests reimplement the functions
> > differently for testing purposes.
> 
> Somehow I missed this comment earlier. Unfortunately, it doesn't seem
> straightforward to move this code. We can't simply move all of the logic to
> qemuProcessPrepareHostStorage() because that function doesn't get called
> during the tests.

That is the exact idea of it. This must not be called from tests because
you must not attempt to rely on host state during tests.

To make tests work, in testCompareXMLToArgvCreateArgs, we then setup a
fake FD to be passed to the command line generator. Similarly other
stuff is mocked there, mostly FD passing-related.
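
Roughly along these lines (an untested sketch reusing the helpers from your
patch; any open FD works as the stand-in, and where exactly this hooks into
testCompareXMLToArgvCreateArgs is up to you):

    if (virStorageSourceGetActualType(disk->src) == VIR_STORAGE_TYPE_VHOST_VDPA) {
        qemuDomainStorageSourcePrivate *srcpriv =
            qemuDomainStorageSourcePrivateFetch(disk->src);
        int fakefd = open("/dev/zero", O_RDWR); /* stand-in for the real vdpa device */

        srcpriv->fdpass = qemuFDPassNew(disk->src->nodestorage, priv);
        qemuFDPassAddFD(srcpriv->fdpass, &fakefd, "-vdpa");
    }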

Additionally qemuProcessPrepareHostStorage() is also skipped inside the
domxmlToNative API, as you don't want to actually attempt opening the
VDPA device when it won't be used, and in that case mocking via
LD_PRELOAD is not possible.

In certain cases (if we care enough about the domxml->native API) we
even set up an alternative syntax (e.g. not using FD passing) so that
users can adapt such a command line and use it as inspiration to
actually run qemu.
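
(That's the command line you get from e.g.:

    $ virsh domxml-to-native qemu-argv --domain vm1

with 'vm1' standing in for whatever domain is being inspected.)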

> I thought I could move only the opening of the fd to the
> PrepareHostStorage() function and then keep the qemuFDPass construction in
> this function, but that doesn't work: the PrepareHostStorage() function
> actually gets called *after* this function. So the fd would not even be open
> yet at the time this function gets called.
> 
> So... it seems that the options are either:
> 
> - leave everything in qemuDomainPrepareStorageSourceVDPA() (as is)

No. That would violate the idea of qemuProcessPrepareHostStorage().

> - move the fd opening to PrepareHostStorage() and then move the rest to a
> different common function that is called after that, such as
> qemuBuildDiskSourceCommandLine()

Inside PrepareHostStorage() you open the FD, create the fdpass
structure and assign it to the private data, so there's no need for any
other location to do anything else.
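
I.e. the helper can move there pretty much as-is, along these lines
(untested sketch; the function name is just a suggestion):

    static int
    qemuProcessPrepareHostStorageSourceVDPA(virStorageSource *src,
                                            qemuDomainObjPrivate *priv)
    {
        qemuDomainStorageSourcePrivate *srcpriv = NULL;
        int vdpafd = -1;

        if (virStorageSourceGetActualType(src) != VIR_STORAGE_TYPE_VHOST_VDPA)
            return 0;

        /* open the host device ... */
        if ((vdpafd = qemuVDPAConnect(src->path)) < 0)
            return -1;

        /* ... and do the fdpass setup right here too */
        srcpriv = qemuDomainStorageSourcePrivateFetch(src);
        srcpriv->fdpass = qemuFDPassNew(src->nodestorage, priv);
        qemuFDPassAddFD(srcpriv->fdpass, &vdpafd, "-vdpa");
        return 0;
    }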

> - a third option (suggestions?)
> 
> It's worth noting that the vdpa *network* device essentially does everything
> (opening the file, creating the qemuFDPass object, etc) in the
> qemuBuildInterfaceCommandLine() function. 

Re: [libvirt PATCH 5/5] qemu: Implement support for vDPA block devices

2023-09-08 Thread Jonathon Jongsma

On 7/24/23 8:05 AM, Peter Krempa wrote:

As I've promised a long time ago I gave your patches some testing in
regard to cooperation with blockjobs and snapshots.

Since the new version of the patches was not yet posted on the list I'm
replying here including my observations from testing patches from your
gitlab branch:

On Tue, Jun 06, 2023 at 16:15:30 -0500, Jonathon Jongsma wrote:

Requires recent qemu with support for the virtio-blk-vhost-vdpa device
and the ability to pass a /dev/fdset/N path for the vdpa path (8.1.0)

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1900770


Since this is a feature addition the 'Fixes' keyword doesn't make sense.
Use e.g. 'Resolves' instead.

Additionally you're missing the DCO certification here.


---
  src/qemu/qemu_block.c  | 20 --
  src/qemu/qemu_domain.c | 25 
  src/qemu/qemu_validate.c   | 44 +++---
  tests/qemuxml2argvdata/disk-vhostvdpa.args | 35 +
  tests/qemuxml2argvdata/disk-vhostvdpa.xml  | 21 +++
  tests/qemuxml2argvtest.c   |  2 +
  6 files changed, 139 insertions(+), 8 deletions(-)
  create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.args
  create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.xml


[...]


diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 2f6b32e394..119e52a7d7 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11157,6 +11157,28 @@ qemuDomainPrepareStorageSourceFDs(virStorageSource *src,
  }
  
  
+static int

+qemuDomainPrepareStorageSourceVDPA(virStorageSource *src,
+   qemuDomainObjPrivate *priv)
+{
+qemuDomainStorageSourcePrivate *srcpriv = NULL;
+virStorageType actualType = virStorageSourceGetActualType(src);
+int vdpafd = -1;
+
+if (actualType != VIR_STORAGE_TYPE_VHOST_VDPA)
+return 0;
+
+if ((vdpafd = qemuVDPAConnect(src->path)) < 0)
+return -1;


This function call directly touches the host filesystem, which is not
supposed to be in the *DomainPrepareStorageSource* functions but we
rather have a completely separate machinery in
qemuProcessPrepareHostStorage.

Unfortunately that one doesn't yet need to handle individual backing
chain members though.

This ensures that the code doesn't get accidentally called from tests
even without mocking the code as the tests reimplement the functions
differently for testing purposes.


Somehow I missed this comment earlier. Unfortunately, it doesn't seem 
straightforward to move this code. We can't simply move all of the logic 
to qemuProcessPrepareHostStorage() because that function doesn't get 
called during the tests. I thought I could move only the opening of the 
fd to the PrepareHostStorage() function and then keep the qemuFDPass 
construction in this function, but that doesn't work: the 
PrepareHostStorage() function actually gets called *after* this 
function. So the fd would not even be open yet at the time this function 
gets called.


So... it seems that the options are either:

- leave everything in qemuDomainPrepareStorageSourceVDPA() (as is)
- move the fd opening to PrepareHostStorage() and then move the rest to 
a different common function that is called after that, such as 
qemuBuildDiskSourceCommandLine()

- a third option (suggestions?)

It's worth noting that the vdpa *network* device essentially does 
everything (opening the file, creating the qemuFDPass object, etc) in 
the qemuBuildInterfaceCommandLine() function. This was done to match 
other network devices that connect to open file descriptors (see 
qemuBuildInterfaceConnect()). But based on your comments above, it 
sounds like this may not be the ideal situation even though the 
function is mocked to not actually open any file descriptors from the 
host filesystem under test.


Jonathon




Re: [libvirt PATCH 5/5] qemu: Implement support for vDPA block devices

2023-08-08 Thread Stefano Garzarella

On Mon, Aug 07, 2023 at 03:41:21PM +0200, Peter Krempa wrote:

On Thu, Aug 03, 2023 at 09:48:01 +0200, Stefano Garzarella wrote:

On Wed, Aug 2, 2023 at 10:33 PM Jonathon Jongsma wrote:
> On 7/24/23 8:05 AM, Peter Krempa wrote:

[...]

> >
> > I've also noticed that using 'qcow2' format for the device doesn't work:
> >
> > error: internal error: process exited while connecting to monitor: 2023-07-24T12:54:15.818631Z qemu-system-x86_64: -blockdev 
{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}:
 Could not read qcow2 header: Invalid argument
> >
> > If that is supposed to work, then qemu devs will probably need to know
> > about that, if that is not supposed to work, libvirt needs to add a
> > check, because the error doesn't tell much. It's also possible I've
> > messed up when formatting the image though, as didn't really try to
> > figure out what's happening.
> >
>
>
> That's a good question, and I don't actually know the answer. Were you
> using an actual vdpa block device for your tests or were you using the
> vdpa block simulator kernel module? How did you set it up? Adding
> Stefano to cc for his thoughts.

Yep, I would also like to understand how you initialized the device
with a qcow2 format.


Naively and originally I've simply used it as 'raw' at first and
formatted it from the guest OS. Then I've shut down the VM and started
it back up, reconfiguring the image format as qcow2. This normally works
with real-file backed storage, and since the vdpa simulator seems to
persist the contents I supposed this would work.


Cool, I'll try that.
Can you try to reboot the VM, use it as `raw`, and read the qcow2 in the
VM from the guest OS?

Note: there could be some bugs in the simulator!




Theoretically, the best use case for vDPA block is that the backend
handles formats, for QEMU it should just be a virtio device, but being
a blockdev, we should be able to use formats anyway, so it should
work.


Yeah, ideally there will be no format driver in qemu used for these
devices (this is not yet the case, I'll need to fix libvirt to stop
using the 'raw' driver if not needed).

Here I'm more interested whether it is supposed to work, in which case
we want to allow using qcow2 as a format in libvirt, or it's not
supposed to work and we should forbid it before the user gets a
suboptimal error message such as now.


This is a good question. We certainly haven't tested it, because it's an
uncommon scenario, but as I said before, maybe it should work. I need to
check it better.





For now, waiting for real hardware, the only way to test vDPA block
support in QEMU is to use the simulator in the kernel or VDUSE.

With the kernel simulator we only have a 128 MB ramdisk available,
with VDUSE you can use QSD with any file:

$ modprobe -a vhost_vdpa vduse
$ qemu-storage-daemon \
    --blockdev file,filename=/path/to/image.qcow2,cache.direct=on,aio=native,node-name=file \
    --blockdev qcow2,file=file,node-name=qcow2 \
    --export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=qcow2,writable=on

$ vdpa dev add name vduse0 mgmtdev vduse

Then you have a /dev/vhost-vdpa-X device that you can use with the
`virtio-blk-vhost-vdpa` blockdev (note: vduse requires QEMU with a
memory-backend with `share=on`), but using raw since the qcow2 is
handled by QSD.
Of course, we should be able to use raw file with QSD and qcow2 on
qemu (although it's not the optimal configuration), but I don't know
how to initialize a `virtio-blk-vhost-vdpa` blockdev with a qcow2
image :-(


With the above qemu storage daemon you should be able to do that by
simply dropping the qcow2 format driver and simply exposing a qcow2
formatted image. It similarly works with NBD:

I've formatted 2 qcow2 images:

# qemu-img create -f qcow2 /root/image1.qcow2 100M
# qemu-img create -f qcow2 /root/image2.qcow2 100M

And then exported them both via vduse and nbd without interpreting
qcow2, thus making the QSD into just a dumb storage device:

# qemu-storage-daemon \
  --blockdev file,filename=/root/image1.qcow2,cache.direct=on,aio=native,node-name=file1 \
  --export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=file1,writable=on \
  --blockdev file,filename=/root/image2.qcow2,cache.direct=on,aio=native,node-name=file2 \
  --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock \
  --export nbd,id=nbd0,node-name=file2,writable=on,name=exportname


Cool! Thanks for sharing!



Now when I start a VM using the NBD export in qcow2 format:

    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source protocol='nbd' name='exportname'>
        <host transport='unix' socket='/tmp/nbd.sock'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>

The VM starts fine, but when using:

    <disk type='vhostvdpa' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source dev='/dev/vhost-vdpa-0'/>
      <target dev='vda' bus='virtio'/>
    </disk>

I get:

error: internal error: QEMU unexpectedly closed the monitor (vm='vdpa'): 2023-08-07T12:34:21.628520Z qemu-system-x86_64: -blockdev 
{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}:
 Could not read qcow2 header: Invalid 

Re: [libvirt PATCH 5/5] qemu: Implement support for vDPA block devices

2023-08-07 Thread Peter Krempa
On Thu, Aug 03, 2023 at 09:48:01 +0200, Stefano Garzarella wrote:
> On Wed, Aug 2, 2023 at 10:33 PM Jonathon Jongsma wrote:
> > On 7/24/23 8:05 AM, Peter Krempa wrote:
> 
> [...]
> 
> > >
> > > I've also noticed that using 'qcow2' format for the device doesn't work:
> > >
> > > error: internal error: process exited while connecting to monitor: 
> > > 2023-07-24T12:54:15.818631Z qemu-system-x86_64: -blockdev 
> > > {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}:
> > >  Could not read qcow2 header: Invalid argument
> > >
> > > If that is supposed to work, then qemu devs will probably need to know
> > > about that, if that is not supposed to work, libvirt needs to add a
> > > check, because the error doesn't tell much. It's also possible I've
> > > messed up when formatting the image though, as didn't really try to
> > > figure out what's happening.
> > >
> >
> >
> > That's a good question, and I don't actually know the answer. Were you
> > using an actual vdpa block device for your tests or were you using the
> > vdpa block simulator kernel module? How did you set it up? Adding
> > Stefano to cc for his thoughts.
> 
> Yep, I would also like to understand how you initialized the device
> with a qcow2 format.

Naively and originally I've simply used it as 'raw' at first and
formatted it from the guest OS. Then I've shut down the VM and started
it back up, reconfiguring the image format as qcow2. This normally works
with real-file backed storage, and since the vdpa simulator seems to
persist the contents I supposed this would work.

> Theoretically, the best use case for vDPA block is that the backend
> handles formats, for QEMU it should just be a virtio device, but being
> a blockdev, we should be able to use formats anyway, so it should
> work.

Yeah, ideally there will be no format driver in qemu used for these
devices (this is not yet the case, I'll need to fix libvirt to stop
using the 'raw' driver if not needed).

Here I'm more interested whether it is supposed to work, in which case
we want to allow using qcow2 as a format in libvirt, or it's not
supposed to work and we should forbid it before the user gets a
suboptimal error message such as now.

> 
> For now, waiting for real hardware, the only way to test vDPA block
> support in QEMU is to use the simulator in the kernel or VDUSE.
> 
> With the kernel simulator we only have a 128 MB ramdisk available,
> with VDUSE you can use QSD with any file:
> 
> $ modprobe -a vhost_vdpa vduse
> $ qemu-storage-daemon \
>     --blockdev file,filename=/path/to/image.qcow2,cache.direct=on,aio=native,node-name=file \
>     --blockdev qcow2,file=file,node-name=qcow2 \
>     --export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=qcow2,writable=on
> 
> $ vdpa dev add name vduse0 mgmtdev vduse
> 
> Then you have a /dev/vhost-vdpa-X device that you can use with the
> `virtio-blk-vhost-vdpa` blockdev (note: vduse requires QEMU with a
> memory-backend with `share=on`), but using raw since the qcow2 is
> handled by QSD.
> Of course, we should be able to use raw file with QSD and qcow2 on
> qemu (although it's not the optimal configuration), but I don't know
> how to initialize a `virtio-blk-vhost-vdpa` blockdev with a qcow2
> image :-(

With the above qemu storage daemon you should be able to do that by
simply dropping the qcow2 format driver and simply exposing a qcow2
formatted image. It similarly works with NBD:

I've formatted 2 qcow2 images:

 # qemu-img create -f qcow2 /root/image1.qcow2 100M
 # qemu-img create -f qcow2 /root/image2.qcow2 100M

And then exported them both via vduse and nbd without interpreting
qcow2, thus making the QSD into just a dumb storage device:

 # qemu-storage-daemon \
   --blockdev file,filename=/root/image1.qcow2,cache.direct=on,aio=native,node-name=file1 \
   --export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=file1,writable=on \
   --blockdev file,filename=/root/image2.qcow2,cache.direct=on,aio=native,node-name=file2 \
   --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock \
   --export nbd,id=nbd0,node-name=file2,writable=on,name=exportname

Now when I start a VM using the NBD export in qcow2 format:

    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source protocol='nbd' name='exportname'>
        <host transport='unix' socket='/tmp/nbd.sock'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>

The VM starts fine, but when using:

    <disk type='vhostvdpa' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source dev='/dev/vhost-vdpa-0'/>
      <target dev='vda' bus='virtio'/>
    </disk>

I get:

error: internal error: QEMU unexpectedly closed the monitor (vm='vdpa'): 
2023-08-07T12:34:21.628520Z qemu-system-x86_64: -blockdev 
{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}:
 Could not read qcow2 header: Invalid argument

What is weird though is that if I attempt to use qemu itself to format the
image, with the VDPA simulator it appears to work the very first time after
the image is formatted, but if I restart the VM it no longer recognizes
the image. Since the simulator's content seems to be preserved 

Re: [libvirt PATCH 5/5] qemu: Implement support for vDPA block devices

2023-08-03 Thread Stefano Garzarella
On Wed, Aug 2, 2023 at 10:33 PM Jonathon Jongsma wrote:
> On 7/24/23 8:05 AM, Peter Krempa wrote:

[...]

> >
> > I've also noticed that using 'qcow2' format for the device doesn't work:
> >
> > error: internal error: process exited while connecting to monitor: 
> > 2023-07-24T12:54:15.818631Z qemu-system-x86_64: -blockdev 
> > {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}:
> >  Could not read qcow2 header: Invalid argument
> >
> > If that is supposed to work, then qemu devs will probably need to know
> > about that, if that is not supposed to work, libvirt needs to add a
> > check, because the error doesn't tell much. It's also possible I've
> > messed up when formatting the image though, as didn't really try to
> > figure out what's happening.
> >
>
>
> That's a good question, and I don't actually know the answer. Were you
> using an actual vdpa block device for your tests or were you using the
> vdpa block simulator kernel module? How did you set it up? Adding
> Stefano to cc for his thoughts.

Yep, I would also like to understand how you initialized the device
with a qcow2 format.

Theoretically, the best use case for vDPA block is that the backend
handles formats, for QEMU it should just be a virtio device, but being
a blockdev, we should be able to use formats anyway, so it should
work.

For now, waiting for real hardware, the only way to test vDPA block
support in QEMU is to use the simulator in the kernel or VDUSE.

With the kernel simulator we only have a 128 MB ramdisk available,
with VDUSE you can use QSD with any file:

$ modprobe -a vhost_vdpa vduse
$ qemu-storage-daemon \
    --blockdev file,filename=/path/to/image.qcow2,cache.direct=on,aio=native,node-name=file \
    --blockdev qcow2,file=file,node-name=qcow2 \
    --export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=qcow2,writable=on

$ vdpa dev add name vduse0 mgmtdev vduse

Then you have a /dev/vhost-vdpa-X device that you can use with the
`virtio-blk-vhost-vdpa` blockdev (note: vduse requires QEMU with a
memory-backend with `share=on`), but using raw since the qcow2 is
handled by QSD.
Of course, we should be able to use raw file with QSD and qcow2 on
qemu (although it's not the optimal configuration), but I don't know
how to initialize a `virtio-blk-vhost-vdpa` blockdev with a qcow2
image :-(

Thanks,
Stefano



Re: [libvirt PATCH 5/5] qemu: Implement support for vDPA block devices

2023-08-02 Thread Jonathon Jongsma

On 7/24/23 8:05 AM, Peter Krempa wrote:

As I've promised a long time ago I gave your patches some testing in
regards of cooperation with blockjobs and snapshots.

Since the new version of the patches was not yet posted on the list I'm
replying here including my observations from testing patches from your
gitlab branch:

On Tue, Jun 06, 2023 at 16:15:30 -0500, Jonathon Jongsma wrote:

Requires recent qemu with support for the virtio-blk-vhost-vdpa device
and the ability to pass a /dev/fdset/N path for the vdpa path (8.1.0)

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1900770


Since this is a feature addition the 'Fixes' keyword doesn't make sense.
Use e.g. 'Resolves' instead.

Additionally you're missing the DCO certification here.


---
  src/qemu/qemu_block.c  | 20 --
  src/qemu/qemu_domain.c | 25 
  src/qemu/qemu_validate.c   | 44 +++---
  tests/qemuxml2argvdata/disk-vhostvdpa.args | 35 +
  tests/qemuxml2argvdata/disk-vhostvdpa.xml  | 21 +++
  tests/qemuxml2argvtest.c   |  2 +
  6 files changed, 139 insertions(+), 8 deletions(-)
  create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.args
  create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.xml


[...]


diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 2f6b32e394..119e52a7d7 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11157,6 +11157,28 @@ qemuDomainPrepareStorageSourceFDs(virStorageSource 
*src,
  }
  
  
+static int

+qemuDomainPrepareStorageSourceVDPA(virStorageSource *src,
+   qemuDomainObjPrivate *priv)
+{
+qemuDomainStorageSourcePrivate *srcpriv = NULL;
+virStorageType actualType = virStorageSourceGetActualType(src);
+int vdpafd = -1;
+
+if (actualType != VIR_STORAGE_TYPE_VHOST_VDPA)
+return 0;
+
+if ((vdpafd = qemuVDPAConnect(src->path)) < 0)
+return -1;


This function call directly touches the host filesystem, which is not
supposed to be in the *DomainPrepareStorageSource* functions but we
rather have a completely separate machinery in
qemuProcessPrepareHostStorage.

Unfortunately that one doesn't yet need to handle individual backing
chain members though.

This ensures that the code doesn't get accidentally called from tests
even without mocking the code as the tests reimplement the functions
differently for testing purposes.


+
+srcpriv = qemuDomainStorageSourcePrivateFetch(src);
+
+srcpriv->fdpass = qemuFDPassNew(src->nodestorage, priv);
+qemuFDPassAddFD(srcpriv->fdpass, , "-vdpa");
+return 0;
+}


[...]


diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 9dce908cfe..67b0944162 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -3220,6 +3220,28 @@ qemuValidateDomainDeviceDefDiskTransient(const 
virDomainDiskDef *disk,
  }
  
  
+static int

+qemuValidateDomainDeviceDefDiskVhost(const virDomainDef *def,
+ virStorageType storagetype,
+ virQEMUCapsFlags flag,
+ virQEMUCaps *qemuCaps)
+{
+const char *vhosttype = virStorageTypeToString(storagetype);
+
+if (!virQEMUCapsGet(qemuCaps, flag)) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+   _("%1$s disk is not supported with this QEMU binary"),
+   vhosttype);
+return -1;


I'd prefer if both things this function does are duplicated inline below
rather than passing it via arguments here. It makes it harder to follow.


+}
+
+if (qemuValidateDomainDefVhostUserRequireSharedMemory(def, vhosttype) < 0)
+return -1;
+
+return 0;
+}


In my testing of the code from the new branch I've observed that
blockjobs and snapshot creation work well thanks to libblkio, so we
don't have to add any additional checks or limitations.

I'll still need to go ahead and finish the series removing the 'raw'
driver when it's not necessary so that the fast-path, once implemented
will be possible. Waiting for that is not necessary for this series as
it works properly even with the 'raw' driver in place.

With your new version of the patches I've noticed the following
problems:

  - After converting to store the vdpa device path in src->vdpadev:

   - rejecting of the empty disk source doesn't work for vdpa. If you use
  in stead of the proper path, the XML will be defined but
 broken.

   - virStorageSourceCopy doesn't copy the new member (as instructed in
 the comment above the struct), thus block job code which uses this
 extensively to work on the inactive definition creates broken
 configurations.

I've also noticed that using 'qcow2' format for the device doesn't work:

error: internal error: process exited while connecting to monitor: 2023-07-24T12:54:15.818631Z qemu-system-x86_64: -blockdev 

Re: [libvirt PATCH 5/5] qemu: Implement support for vDPA block devices

2023-07-24 Thread Peter Krempa
As I've promised a long time ago I gave your patches some testing in
regard to cooperation with blockjobs and snapshots.

Since the new version of the patches was not yet posted on the list I'm
replying here including my observations from testing patches from your
gitlab branch:

On Tue, Jun 06, 2023 at 16:15:30 -0500, Jonathon Jongsma wrote:
> Requires recent qemu with support for the virtio-blk-vhost-vdpa device
> and the ability to pass a /dev/fdset/N path for the vdpa path (8.1.0)
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1900770

Since this is a feature addition the 'Fixes' keyword doesn't make sense.
Use e.g. 'Resolves' instead.

Additionally you're missing the DCO certification here.

> ---
>  src/qemu/qemu_block.c  | 20 --
>  src/qemu/qemu_domain.c | 25 
>  src/qemu/qemu_validate.c   | 44 +++---
>  tests/qemuxml2argvdata/disk-vhostvdpa.args | 35 +
>  tests/qemuxml2argvdata/disk-vhostvdpa.xml  | 21 +++
>  tests/qemuxml2argvtest.c   |  2 +
>  6 files changed, 139 insertions(+), 8 deletions(-)
>  create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.args
>  create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.xml

[...]

> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 2f6b32e394..119e52a7d7 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -11157,6 +11157,28 @@ qemuDomainPrepareStorageSourceFDs(virStorageSource *src,
>  }
>  
>  
> +static int
> +qemuDomainPrepareStorageSourceVDPA(virStorageSource *src,
> +   qemuDomainObjPrivate *priv)
> +{
> +qemuDomainStorageSourcePrivate *srcpriv = NULL;
> +virStorageType actualType = virStorageSourceGetActualType(src);
> +int vdpafd = -1;
> +
> +if (actualType != VIR_STORAGE_TYPE_VHOST_VDPA)
> +return 0;
> +
> +if ((vdpafd = qemuVDPAConnect(src->path)) < 0)
> +return -1;

This function call directly touches the host filesystem, which is not
supposed to be in the *DomainPrepareStorageSource* functions but we
rather have a completely separate machinery in
qemuProcessPrepareHostStorage.

Unfortunately that one doesn't yet need to handle individual backing
chain members though.

This ensures that the code doesn't get accidentally called from tests
even without mocking the code as the tests reimplement the functions
differently for testing purposes.

> +
> +srcpriv = qemuDomainStorageSourcePrivateFetch(src);
> +
> +srcpriv->fdpass = qemuFDPassNew(src->nodestorage, priv);
> +qemuFDPassAddFD(srcpriv->fdpass, &vdpafd, "-vdpa");
> +return 0;
> +}

[...]

> diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
> index 9dce908cfe..67b0944162 100644
> --- a/src/qemu/qemu_validate.c
> +++ b/src/qemu/qemu_validate.c
> @@ -3220,6 +3220,28 @@ qemuValidateDomainDeviceDefDiskTransient(const virDomainDiskDef *disk,
>  }
>  
>  
> +static int
> +qemuValidateDomainDeviceDefDiskVhost(const virDomainDef *def,
> + virStorageType storagetype,
> + virQEMUCapsFlags flag,
> + virQEMUCaps *qemuCaps)
> +{
> +const char *vhosttype = virStorageTypeToString(storagetype);
> +
> +if (!virQEMUCapsGet(qemuCaps, flag)) {
> +virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> +   _("%1$s disk is not supported with this QEMU binary"),
> +   vhosttype);
> +return -1;

I'd prefer if both things this function does are duplicated inline below
rather than passed in via arguments here. It makes it harder to follow.
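
I.e. open-coded for the vdpa case at the call site, roughly (sketch):

    if (disk->src->type == VIR_STORAGE_TYPE_VHOST_VDPA) {
        if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_BLK_VHOST_VDPA)) {
            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                           _("vhostvdpa disk is not supported with this QEMU binary"));
            return -1;
        }

        if (qemuValidateDomainDefVhostUserRequireSharedMemory(def, "vhostvdpa") < 0)
            return -1;
    }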

> +}
> +
> +if (qemuValidateDomainDefVhostUserRequireSharedMemory(def, vhosttype) < 0)
> +return -1;
> +
> +return 0;
> +}

In my testing of the code from the new branch I've observed that
blockjobs and snapshot creation work well thanks to libblkio, so we
don't have to add any additional checks or limitations.

I'll still need to go ahead and finish the series removing the 'raw'
driver when it's not necessary so that the fast-path, once implemented
will be possible. Waiting for that is not necessary for this series as
it works properly even with the 'raw' driver in place.

With your new version of the patches I've noticed the following
problems:

 - After converting to store the vdpa device path in src->vdpadev:

  - rejecting of the empty disk source doesn't work for vdpa. If you use
    an empty <source/> instead of the proper path, the XML will be
    defined but broken.

  - virStorageSourceCopy doesn't copy the new member (as instructed in
    the comment above the struct), thus block job code which uses this
    extensively to work on the inactive definition creates broken
    configurations (see the sketch below).
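
The fix should be a one-liner next to the other string members
(hypothetical, assuming the member stays named 'vdpadev'):

    /* in virStorageSourceCopy(); 'def' is the copy being populated */
    def->vdpadev = g_strdup(src->vdpadev);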

I've also noticed that using 'qcow2' format for the device doesn't work:

error: internal error: process exited while connecting to 

Re: [libvirt PATCH 5/5] qemu: Implement support for vDPA block devices

2023-06-08 Thread Peter Krempa
On Tue, Jun 06, 2023 at 16:15:30 -0500, Jonathon Jongsma wrote:
> Requires recent qemu with support for the virtio-blk-vhost-vdpa device
> and the ability to pass a /dev/fdset/N path for the vdpa path (8.1.0)
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1900770
> ---
>  src/qemu/qemu_block.c  | 20 --
>  src/qemu/qemu_domain.c | 25 
>  src/qemu/qemu_validate.c   | 44 +++---
>  tests/qemuxml2argvdata/disk-vhostvdpa.args | 35 +
>  tests/qemuxml2argvdata/disk-vhostvdpa.xml  | 21 +++
>  tests/qemuxml2argvtest.c   |  2 +
>  6 files changed, 139 insertions(+), 8 deletions(-)
>  create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.args
>  create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.xml

[...]

> diff --git a/tests/qemuxml2argvdata/disk-vhostvdpa.args 
> b/tests/qemuxml2argvdata/disk-vhostvdpa.args
> new file mode 100644
> index 00..878f3b5fce
> --- /dev/null
> +++ b/tests/qemuxml2argvdata/disk-vhostvdpa.args
> @@ -0,0 +1,35 @@
> +LC_ALL=C \
> +PATH=/bin \
> +HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1 \
> +USER=test \
> +LOGNAME=test \
> +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.local/share \
> +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.cache \
> +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.config \
> +/usr/bin/qemu-system-x86_64 \
> +-name guest=QEMUGuest1,debug-threads=on \
> +-S \
> +-object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/master-key.aes
>  \
> +-machine pc,usb=off,dump-guest-core=off \
> +-accel tcg \
> +-m 214 \
> +-overcommit mem-lock=off \
> +-smp 1,sockets=1,cores=1,threads=1 \
> +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
> +-display none \
> +-no-user-config \
> +-nodefaults \
> +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \
> +-mon chardev=charmonitor,id=monitor,mode=control \
> +-rtc base=utc \
> +-no-shutdown \
> +-no-acpi \
> +-boot strict=on \
> +-usb \
> +-add-fd set=0,fd=1732,opaque=libvirt-1-storage-vdpa \
> +-blockdev 
> '{"driver":"virtio-blk-vhost-vdpa","path":"/dev/fdset/0","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}'
>  \
> +-blockdev 
> '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}'
>  \

IIUC the use of the 'raw' driver as an additional layer will prevent the
fast path working when it will eventually be implemented in qemu.

There are multiple options how we could eliminate that, but I think I
should excavate my patches for:

https://bugzilla.redhat.com/show_bug.cgi?id=1838189

which asks for removal of the dummy layer also for other cases when it
is not needed.

Alternative ways would require that we limit vdpa to be the only layer
and disallow blockjobs, as both of those cases expect the format layer
to be present.
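
For illustration, without the dummy layer the two -blockdev lines collapse
into one and the frontend attaches directly to the storage node
(hand-written sketch, not output of any current libvirt):

  -blockdev '{"driver":"virtio-blk-vhost-vdpa","path":"/dev/fdset/0","node-name":"libvirt-1-storage",...}' \
  -device virtio-blk-pci,drive=libvirt-1-storage,...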

> +-device 
> virtio-blk-pci,bus=pci.0,addr=0x2,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on
>  \
> +-audiodev '{"id":"audio1","driver":"none"}' \
> +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
> +-msg timestamp=on



> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
> index d914d8cbea..928375c173 100644
> --- a/tests/qemuxml2argvtest.c
> +++ b/tests/qemuxml2argvtest.c
> @@ -1224,6 +1224,8 @@ mymain(void)
>  DO_TEST_CAPS_VER("disk-vhostuser-numa", "4.2.0");
>  DO_TEST_CAPS_LATEST("disk-vhostuser-numa");
>  DO_TEST_CAPS_LATEST("disk-vhostuser");
> +DO_TEST("disk-vhostvdpa",
> +QEMU_CAPS_DEVICE_VIRTIO_BLK_VHOST_VDPA);

NACK to fake-caps test. I'll regenerate the qemu caps with newer
libblkio and you can then use DO_TEST_CAPS_LATEST here.
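
I.e. the hunk above should then become just:

    DO_TEST_CAPS_LATEST("disk-vhostvdpa");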

Apart from the above the patch looks good.

I'll have my final word after I consider all the blockjob code paths and
snapshots (added by patch 1).



[libvirt PATCH 5/5] qemu: Implement support for vDPA block devices

2023-06-06 Thread Jonathon Jongsma
Requires recent qemu with support for the virtio-blk-vhost-vdpa device
and the ability to pass a /dev/fdset/N path for the vdpa path (8.1.0)

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1900770
---
 src/qemu/qemu_block.c  | 20 --
 src/qemu/qemu_domain.c | 25 
 src/qemu/qemu_validate.c   | 44 +++---
 tests/qemuxml2argvdata/disk-vhostvdpa.args | 35 +
 tests/qemuxml2argvdata/disk-vhostvdpa.xml  | 21 +++
 tests/qemuxml2argvtest.c   |  2 +
 6 files changed, 139 insertions(+), 8 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.args
 create mode 100644 tests/qemuxml2argvdata/disk-vhostvdpa.xml

diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c
index 35312fb7d4..78f6e63aad 100644
--- a/src/qemu/qemu_block.c
+++ b/src/qemu/qemu_block.c
@@ -778,6 +778,20 @@ qemuBlockStorageSourceGetNVMeProps(virStorageSource *src)
 }
 
 
+static virJSONValue *
+qemuBlockStorageSourceGetVhostVdpaProps(virStorageSource *src)
+{
+virJSONValue *ret = NULL;
+qemuDomainStorageSourcePrivate *srcpriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(src);
+
+ignore_value(virJSONValueObjectAdd(&ret,
+   "s:driver", "virtio-blk-vhost-vdpa",
+   "s:path", qemuFDPassGetPath(srcpriv->fdpass),
+   NULL));
+return ret;
+}
+
+
 static int
 qemuBlockStorageSourceGetBlockdevGetCacheProps(virStorageSource *src,
virJSONValue *props)
@@ -874,9 +888,9 @@ qemuBlockStorageSourceGetBackendProps(virStorageSource *src,
 break;
 
 case VIR_STORAGE_TYPE_VHOST_VDPA:
-virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-   _("vhostvdpa disk type not yet supported"));
-return NULL;
+if (!(fileprops = qemuBlockStorageSourceGetVhostVdpaProps(src)))
+return NULL;
+break;
 
 case VIR_STORAGE_TYPE_VHOST_USER:
 virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 2f6b32e394..119e52a7d7 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11157,6 +11157,28 @@ qemuDomainPrepareStorageSourceFDs(virStorageSource *src,
 }
 
 
+static int
+qemuDomainPrepareStorageSourceVDPA(virStorageSource *src,
+   qemuDomainObjPrivate *priv)
+{
+qemuDomainStorageSourcePrivate *srcpriv = NULL;
+virStorageType actualType = virStorageSourceGetActualType(src);
+int vdpafd = -1;
+
+if (actualType != VIR_STORAGE_TYPE_VHOST_VDPA)
+return 0;
+
+if ((vdpafd = qemuVDPAConnect(src->path)) < 0)
+return -1;
+
+srcpriv = qemuDomainStorageSourcePrivateFetch(src);
+
+srcpriv->fdpass = qemuFDPassNew(src->nodestorage, priv);
+qemuFDPassAddFD(srcpriv->fdpass, &vdpafd, "-vdpa");
+return 0;
+}
+
+
 int
 qemuDomainPrepareStorageSourceBlockdevNodename(virDomainDiskDef *disk,
virStorageSource *src,
@@ -11197,6 +11219,9 @@ qemuDomainPrepareStorageSourceBlockdevNodename(virDomainDiskDef *disk,
 if (qemuDomainPrepareStorageSourceFDs(src, priv) < 0)
 return -1;
 
+if (qemuDomainPrepareStorageSourceVDPA(src, priv) < 0)
+return -1;
+
 return 0;
 }
 
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 9dce908cfe..67b0944162 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -3220,6 +3220,28 @@ qemuValidateDomainDeviceDefDiskTransient(const virDomainDiskDef *disk,
 }
 
 
+static int
+qemuValidateDomainDeviceDefDiskVhost(const virDomainDef *def,
+ virStorageType storagetype,
+ virQEMUCapsFlags flag,
+ virQEMUCaps *qemuCaps)
+{
+const char *vhosttype = virStorageTypeToString(storagetype);
+
+if (!virQEMUCapsGet(qemuCaps, flag)) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+   _("%1$s disk is not supported with this QEMU binary"),
+   vhosttype);
+return -1;
+}
+
+if (qemuValidateDomainDefVhostUserRequireSharedMemory(def, vhosttype) < 0)
+return -1;
+
+return 0;
+}
+
+
 int
 qemuValidateDomainDeviceDefDisk(const virDomainDiskDef *disk,
 const virDomainDef *def,
@@ -3281,13 +3303,25 @@ qemuValidateDomainDeviceDefDisk(const virDomainDiskDef *disk,
 }
 
 if (disk->src->type == VIR_STORAGE_TYPE_VHOST_USER) {
-if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VHOST_USER_BLK)) {
-virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-   _("vhostuser disk is not supported with this QEMU 
binary"));
+if (qemuValidateDomainDeviceDefDiskVhost(def, disk->src->type,
+