Re: [PATCH for-9.0 v3] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Fri, Mar 15, 2024 at 11:59 PM Kevin Wolf wrote: > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK > status flag is set; with the current API of the kernel module, it is > impossible to enable the opposite order in our block export code because > userspace is not notified when a virtqueue is enabled. > > This requirement also matches the normal initialisation order as done by > the generic vhost code in QEMU. However, commit 6c482547 accidentally > changed the order for vdpa-dev and broke access to VDUSE devices with > this. > > This changes vdpa-dev to use the normal order again and use the standard > vhost callback .vhost_set_vring_enable for this. VDUSE devices can be > used with vdpa-dev again after this fix. > > vhost_net intentionally avoided enabling the vrings for vdpa and does > this manually later while it does enable them for other vhost backends. > Reflect this in the vhost_net code and return early for vdpa, so that > the behaviour doesn't change for this device.
> > Cc: qemu-sta...@nongnu.org > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98 > Signed-off-by: Kevin Wolf > --- > v2: > - Actually make use of the @enable parameter > - Change vhost_net to preserve the current behaviour > > v3: > - Updated trace point [Stefano] > - Fixed typo in comment [Stefano] > > hw/net/vhost_net.c | 10 ++ > hw/virtio/vdpa-dev.c | 5 + > hw/virtio/vhost-vdpa.c | 29 ++--- > hw/virtio/vhost.c | 8 +++- > hw/virtio/trace-events | 2 +- > 5 files changed, 45 insertions(+), 9 deletions(-) > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > index e8e1661646..fd1a93701a 100644 > --- a/hw/net/vhost_net.c > +++ b/hw/net/vhost_net.c > @@ -541,6 +541,16 @@ int vhost_set_vring_enable(NetClientState *nc, int > enable) > VHostNetState *net = get_vhost_net(nc); > const VhostOps *vhost_ops = net->dev.vhost_ops; > > +/* > + * vhost-vdpa network devices need to enable dataplane virtqueues after > + * DRIVER_OK, so they can recover device state before starting dataplane. > + * Because of that, we don't enable virtqueues here and leave it to > + * net/vhost-vdpa.c. > + */ > +if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { > +return 0; > +} I think we need some inputs from Eugenio, this is only needed for shadow virtqueue during live migration but not other cases. Thanks
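The ordering constraint at the heart of this fix can be sketched as a toy model: a VDUSE-style backend never learns about enables issued after DRIVER_OK, so setting the status bit before enabling the rings silently loses them. All names below are illustrative, not QEMU or kernel API.

```c
#include <assert.h>
#include <stdbool.h>

enum { DRIVER_OK = 0x4 };   /* VIRTIO status bit value, per the virtio spec */

struct toy_dev {
    unsigned status;
    bool vq_enabled;
};

/* A VDUSE-style backend only sees enables that arrive before DRIVER_OK. */
bool toy_set_vring_enable(struct toy_dev *d)
{
    if (d->status & DRIVER_OK) {
        return false;           /* userspace was never notified */
    }
    d->vq_enabled = true;
    return true;
}

/* Order restored by the patch: enable the rings, then set DRIVER_OK. */
bool toy_start_fixed(struct toy_dev *d)
{
    bool ok = toy_set_vring_enable(d);
    d->status |= DRIVER_OK;
    return ok && d->vq_enabled;
}

/* Order introduced by commit 6c482547: DRIVER_OK first, enables are lost. */
bool toy_start_broken(struct toy_dev *d)
{
    d->status |= DRIVER_OK;
    return toy_set_vring_enable(d);
}
```

The fixed path succeeds and the broken path fails, which is exactly the behaviour difference the patch description reports for VDUSE devices.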
Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data
On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer wrote: > > Add support to virtio-pci devices for handling the extra data sent > from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA > transport feature has been negotiated. > > The extra data that's passed to the virtio-pci device when this > feature is enabled varies depending on the device's virtqueue > layout. > > In a split virtqueue layout, this data includes: > - upper 16 bits: shadow_avail_idx > - lower 16 bits: virtqueue index > > In a packed virtqueue layout, this data includes: > - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx > - lower 16 bits: virtqueue index > > Tested-by: Lei Yang > Reviewed-by: Eugenio Pérez > Signed-off-by: Jonah Palmer > --- > hw/virtio/virtio-pci.c | 10 +++--- > hw/virtio/virtio.c | 18 ++ > include/hw/virtio/virtio.h | 1 + > 3 files changed, 26 insertions(+), 3 deletions(-) > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > index cb6940fc0e..0f5c3c3b2f 100644 > --- a/hw/virtio/virtio-pci.c > +++ b/hw/virtio/virtio-pci.c > @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t > addr, uint32_t val) > { > VirtIOPCIProxy *proxy = opaque; > VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus); > -uint16_t vector; > +uint16_t vector, vq_idx; > hwaddr pa; > > switch (addr) { > @@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t > addr, uint32_t val) > vdev->queue_sel = val; > break; > case VIRTIO_PCI_QUEUE_NOTIFY: > -if (val < VIRTIO_QUEUE_MAX) { > -virtio_queue_notify(vdev, val); > +vq_idx = val; > +if (vq_idx < VIRTIO_QUEUE_MAX) { > +if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) { > +virtio_queue_set_shadow_avail_data(vdev, val); > +} > +virtio_queue_notify(vdev, vq_idx); > } > break; > case VIRTIO_PCI_STATUS: > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index d229755eae..bcb9e09df0 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -2255,6 +2255,24 @@ void 
virtio_queue_set_align(VirtIODevice *vdev, int n, > int align) > } > } > > +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data) > +{ > +/* Lower 16 bits is the virtqueue index */ > +uint16_t i = data; > +VirtQueue *vq = &vdev->vq[i]; > + > +if (!vq->vring.desc) { > +return; > +} > + > +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) { > +vq->shadow_avail_wrap_counter = (data >> 31) & 0x1; > +vq->shadow_avail_idx = (data >> 16) & 0x7FFF; > +} else { > +vq->shadow_avail_idx = (data >> 16); Do we need to do a sanity check for this value? Thanks > +} > +} > + > static void virtio_queue_notify_vq(VirtQueue *vq) > { > if (vq->vring.desc && vq->handle_output) { > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index c8f72850bc..53915947a7 100644 > --- a/include/hw/virtio/virtio.h > +++ b/include/hw/virtio/virtio.h > @@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n); > void virtio_init_region_cache(VirtIODevice *vdev, int n); > void virtio_queue_set_align(VirtIODevice *vdev, int n, int align); > void virtio_queue_notify(VirtIODevice *vdev, int n); > +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data); > uint16_t virtio_queue_vector(VirtIODevice *vdev, int n); > void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector); > int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n, > -- > 2.39.3 >
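The bit layout handled in the patch above can be sketched as a standalone decode helper. Struct and function names below are illustrative, not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of the 32-bit VIRTIO_F_NOTIFICATION_DATA payload. */
struct notify_data {
    uint16_t vq_idx;        /* lower 16 bits: virtqueue index */
    uint16_t avail_idx;     /* driver's shadow avail index */
    bool wrap_counter;      /* packed layout only */
};

struct notify_data decode_notify_data(uint32_t data, bool packed)
{
    struct notify_data d;

    d.vq_idx = data & 0xffff;
    if (packed) {
        d.wrap_counter = (data >> 31) & 0x1;        /* top bit */
        d.avail_idx = (data >> 16) & 0x7fff;        /* next 15 bits */
    } else {
        d.wrap_counter = false;
        d.avail_idx = data >> 16;                   /* full upper half */
    }
    return d;
}
```

One possible answer to the sanity-check question raised above would be to range-check the decoded index against the ring size (vq->vring.num) before storing it, since the value comes straight from the guest.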
Re: [PATCH v1 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support
On Tue, Mar 5, 2024 at 3:46 AM Jonah Palmer wrote: > > The goal of these patches is to add support to a variety of virtio and > vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This > feature indicates that a driver will pass extra data (instead of just a > virtqueue's index) when notifying the corresponding device. > > The data passed in by the driver when this feature is enabled varies in > format depending on whether the device is using a split or packed virtqueue > layout: > > Split VQ > - Upper 16 bits: shadow_avail_idx > - Lower 16 bits: virtqueue index > > Packed VQ > - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx > - Lower 16 bits: virtqueue index > > Also, due to the limitations of ioeventfd not being able to carry the > extra data provided by the driver, ioeventfd is left disabled for any devices > using this feature. Is there any method to overcome this? This might help for vhost. Thanks > > A significant aspect of this effort has been to maintain compatibility > across different backends. As such, the feature is offered by backend > devices only when supported, with fallback mechanisms where backend > support is absent.
> > Jonah Palmer (8): > virtio/virtio-pci: Handle extra notification data > virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA > virtio-mmio: Handle extra notification data > virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA > virtio-ccw: Handle extra notification data > virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA > vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits > virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition > > hw/block/vhost-user-blk.c| 1 + > hw/net/vhost_net.c | 2 ++ > hw/s390x/s390-virtio-ccw.c | 16 > hw/s390x/virtio-ccw.c| 6 -- > hw/scsi/vhost-scsi.c | 1 + > hw/scsi/vhost-user-scsi.c| 1 + > hw/virtio/vhost-user-fs.c| 2 +- > hw/virtio/vhost-user-vsock.c | 1 + > hw/virtio/virtio-mmio.c | 15 +++ > hw/virtio/virtio-pci.c | 16 +++- > hw/virtio/virtio.c | 18 ++ > include/hw/virtio/virtio.h | 5 - > net/vhost-vdpa.c | 1 + > 13 files changed, 68 insertions(+), 17 deletions(-) > > -- > 2.39.3 >
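The ioeventfd limitation mentioned in the cover letter follows from eventfd semantics: the kernel keeps nothing but a 64-bit counter that writes add to, so any structure in the written values is destroyed, and the 32-bit notification payload cannot be tunnelled through it. A minimal Linux sketch (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Write two "notification payloads" into an eventfd and read it back.
 * The read returns only the accumulated sum and clears the counter,
 * so the individual values are unrecoverable. */
uint64_t eventfd_roundtrip(uint64_t a, uint64_t b)
{
    int fd = eventfd(0, 0);
    uint64_t v, sum = 0;

    if (fd < 0) {
        return 0;
    }
    v = a;
    if (write(fd, &v, sizeof(v)) != sizeof(v)) {    /* "notification" 1 */
        goto out;
    }
    v = b;
    if (write(fd, &v, sizeof(v)) != sizeof(v)) {    /* "notification" 2 */
        goto out;
    }
    if (read(fd, &sum, sizeof(sum)) != sizeof(sum)) {
        sum = 0;
    }
out:
    close(fd);
    return sum;     /* a + b: payloads merged into one counter */
}
```

This is why the series falls back to the slower MMIO/PIO exit path whenever VIRTIO_F_NOTIFICATION_DATA is negotiated.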
Re: Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Wed, Feb 7, 2024 at 4:47 PM Stefano Garzarella wrote: > > On Wed, Feb 07, 2024 at 11:17:34AM +0800, Jason Wang wrote: > >On Tue, Feb 6, 2024 at 4:31 PM Stefano Garzarella > >wrote: > >> > >> On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote: > >> >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella > >> >wrote: > >> >> > >> >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote: > >> >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK > >> >> >status flag is set; with the current API of the kernel module, it is > >> >> >impossible to enable the opposite order in our block export code > >> >> >because > >> >> >userspace is not notified when a virtqueue is enabled. > >> > > >> >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK? > >> > >> It's not specific to virtio-blk, but to the generic vdpa device we have > >> in QEMU (i.e. vhost-vdpa-device). Yep, after commit > >> 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after > >> DRIVER_OK. > > > >Right. > > > >> > >> >Sepc is not clear about this and that's why we introduce > >> >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK. > >> > >> Ah, I didn't know about this new feature. So after commit > >> 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not > >> complying with the specification, right? > > > >Kind of, but as stated, it's just because spec is unclear about the > >behaviour. There's a chance that spec will explicitly support it in > >the future. > > > >> > >> > > >> >> > >> >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message, > >> > > >> >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is > >> >negotiated. > >> > >> At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not > >> negotiated, should we return an error in vhost-vdpa kernel module if > >> VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set? > > > >I'm not sure if this can break some setups or not. 
It might be better > >to leave it as is? > > As I also answered in the kernel patch, IMHO we have to return an error, > because any setups are broken anyway (see the problem we're fixing with this patch), For VDUSE probably yes, but not for other parents. It's easy to mandate on day 0 but might be hard for now. For mlx5e, it supports the semantic VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK even before the feature bit is introduced. And we don't do other checks like for example setting vq address, size after DRIVER_OK. We can hear from others. > so at this point it's better to return an error, so they > don't spend time figuring out why the VDUSE device doesn't work. > > > > >Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we don't know if > >parent support vq_ready after driver_ok. > > So we have to assume that it doesn't support it, to be more > conservative, right? > > >With VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we know parent support > >vq_ready after driver_ok. > > > >> > >> >If this is truth, it seems a little more complicated, for > >> >example the get_backend_features needs to be forward to the userspace? > >> > >> I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES > >> for this? Or do you mean userspace on the VDUSE side? > > > >Yes, since in this case the parent is in the userspace, there's no way > >for VDUSE to know if user space supports vq_ready after driver_ok or > >not. > > > >As you may have noticed, we don't have a message for vq_ready which > >implies that vq_ready after driver_ok can't be supported. > > Yep, I see. > > > > >> > >> >This seems suboptimal to implement this in the spec first and then we > >> >can leverage the features. Or we can have another parameter for the > >> >ioctl that creates the vduse device. > >> > >> I got a little lost, though in vhost-user, the device can always expect > >> a vring_enable/disable, so I thought it was not complicated in VDUSE. 
> > > >Yes, the problem is assuming we have a message for vq_ready, there > >could be a "legacy" userspace that doesn't support that. So in that > >case, VDUSE needs to know if the userspace parent can support that or > >not. > > I think VDUSE needs to negotiate features (if it doesn't already) as > vhost-user does with the backend. Also for new future functionality. It negotiates virtio features but not vhost backend features. Thanks
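As a sketch of how a frontend could gate its behaviour on the negotiated backend features discussed here: the feature is a bit number in a 64-bit mask returned by VHOST_GET_BACKEND_FEATURES. The bit value is taken from the linux/vhost_types.h uapi header current at the time of this thread, and the helper name is illustrative, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number from linux/vhost_types.h (as of the kernels under discussion). */
#define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK 0x6

/* A frontend may defer enabling vrings until after DRIVER_OK only when
 * the backend acked this feature; otherwise it must enable them first. */
bool can_enable_after_driver_ok(uint64_t acked_backend_features)
{
    return acked_backend_features &
           (1ULL << VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
}
```

As the thread notes, VDUSE currently has no equivalent negotiation channel for these backend bits, which is exactly the gap being discussed.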
Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Tue, Feb 6, 2024 at 4:31 PM Stefano Garzarella wrote: > > On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote: > >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella > >wrote: > >> > >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote: > >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK > >> >status flag is set; with the current API of the kernel module, it is > >> >impossible to enable the opposite order in our block export code because > >> >userspace is not notified when a virtqueue is enabled. > > > >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK? > > It's not specific to virtio-blk, but to the generic vdpa device we have > in QEMU (i.e. vhost-vdpa-device). Yep, after commit > 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after > DRIVER_OK. Right. > > >Sepc is not clear about this and that's why we introduce > >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK. > > Ah, I didn't know about this new feature. So after commit > 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not > complying with the specification, right? Kind of, but as stated, it's just because spec is unclear about the behaviour. There's a chance that spec will explicitly support it in the future. > > > > >> > >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message, > > > >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is > >negotiated. > > At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not > negotiated, should we return an error in vhost-vdpa kernel module if > VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set? I'm not sure if this can break some setups or not. It might be better to leave it as is? Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we don't know if parent support vq_ready after driver_ok. With VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we know parent support vq_ready after driver_ok. 
> > >If this is truth, it seems a little more complicated, for > >example the get_backend_features needs to be forward to the userspace? > > I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES > for this? Or do you mean userspace on the VDUSE side? Yes, since in this case the parent is in the userspace, there's no way for VDUSE to know if user space supports vq_ready after driver_ok or not. As you may have noticed, we don't have a message for vq_ready which implies that vq_ready after driver_ok can't be supported. > > >This seems suboptimal to implement this in the spec first and then we > >can leverage the features. Or we can have another parameter for the > >ioctl that creates the vduse device. > > I got a little lost, though in vhost-user, the device can always expect > a vring_enable/disable, so I thought it was not complicated in VDUSE. Yes, the problem is assuming we have a message for vq_ready, there could be a "legacy" userspace that doesn't support that. So in that case, VDUSE needs to know if the userspace parent can support that or not. > > > > >> I'll start another thread about that, but in the meantime I agree that > >> we should fix QEMU since we need to work properly with old kernels as > >> well. > >> > >> > > >> >This requirement also mathces the normal initialisation order as done by > >> >the generic vhost code in QEMU. However, commit 6c482547 accidentally > >> >changed the order for vdpa-dev and broke access to VDUSE devices with > >> >this. > >> > > >> >This changes vdpa-dev to use the normal order again and use the standard > >> >vhost callback .vhost_set_vring_enable for this. VDUSE devices can be > >> >used with vdpa-dev again after this fix. 
> >> > >> I like this approach and the patch LGTM, but I'm a bit worried about > >> this function in hw/net/vhost_net.c: > >> > >> int vhost_set_vring_enable(NetClientState *nc, int enable) > >> { > >> VHostNetState *net = get_vhost_net(nc); > >> const VhostOps *vhost_ops = net->dev.vhost_ops; > >> > >> nc->vring_enable = enable; > >> > >> if (vhost_ops && vhost_ops->vhost_set_vring_enable) { > >> return vhost_ops->vhost_set_vring_enable(&net->dev, enable); > >> } > >> > >> return 0; > >> } > >> > >> @Eugenio, @Jason, should we change some things there if vhost-vdpa > >> implements the vhost_set_vring_enable callback? > > > >Eugenio may know more, I remember we need to enable cvq first for > >shadow virtqueue to restore some states. > > > >> > >> Do you remember why we didn't implement it from the beginning? > > > >It seems the vrings parameter is introduced after vhost-vdpa is > >implemented. > > Sorry, I mean why we didn't implement the vhost_set_vring_enable > callback for vhost-vdpa from the beginning. Adding Cindy, who wrote that code, for more thoughts. Thanks > > Thanks, > Stefano >
Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella wrote: > > On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote: > >VDUSE requires that virtqueues are first enabled before the DRIVER_OK > >status flag is set; with the current API of the kernel module, it is > >impossible to enable the opposite order in our block export code because > >userspace is not notified when a virtqueue is enabled. Did this mean virtio-blk will enable a virtqueue after DRIVER_OK? Spec is not clear about this and that's why we introduce VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK. > > Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message, I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is negotiated. If this is true, it seems a little more complicated, for example the get_backend_features needs to be forwarded to the userspace? This seems suboptimal to implement this in the spec first and then we can leverage the features. Or we can have another parameter for the ioctl that creates the vduse device. > I'll start another thread about that, but in the meantime I agree that > we should fix QEMU since we need to work properly with old kernels as > well. > > > > >This requirement also matches the normal initialisation order as done by > >the generic vhost code in QEMU. However, commit 6c482547 accidentally > >changed the order for vdpa-dev and broke access to VDUSE devices with > >this. > > > >This changes vdpa-dev to use the normal order again and use the standard > >vhost callback .vhost_set_vring_enable for this. VDUSE devices can be > >used with vdpa-dev again after this fix. 
> > I like this approach and the patch LGTM, but I'm a bit worried about > this function in hw/net/vhost_net.c: > > int vhost_set_vring_enable(NetClientState *nc, int enable) > { > VHostNetState *net = get_vhost_net(nc); > const VhostOps *vhost_ops = net->dev.vhost_ops; > > nc->vring_enable = enable; > > if (vhost_ops && vhost_ops->vhost_set_vring_enable) { > return vhost_ops->vhost_set_vring_enable(&net->dev, enable); > } > > return 0; > } > > @Eugenio, @Jason, should we change some things there if vhost-vdpa > implements the vhost_set_vring_enable callback? Eugenio may know more, I remember we need to enable cvq first for shadow virtqueue to restore some states. > > Do you remember why we didn't implement it from the beginning? It seems the vrings parameter is introduced after vhost-vdpa is implemented. Thanks > > Thanks, > Stefano > > > > >Cc: qemu-sta...@nongnu.org > >Fixes: 6c4825476a4351530bcac17abab72295b75ffe98 > >Signed-off-by: Kevin Wolf > >--- > > hw/virtio/vdpa-dev.c | 5 + > > hw/virtio/vhost-vdpa.c | 17 + > > 2 files changed, 18 insertions(+), 4 deletions(-) > > > >diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c > >index eb9ecea83b..13e87f06f6 100644 > >--- a/hw/virtio/vdpa-dev.c > >+++ b/hw/virtio/vdpa-dev.c > >@@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, > >Error **errp) > > > > s->dev.acked_features = vdev->guest_features; > > > >-ret = vhost_dev_start(&s->dev, vdev, false); > >+ret = vhost_dev_start(&s->dev, vdev, true); > > if (ret < 0) { > > error_setg_errno(errp, -ret, "Error starting vhost"); > > goto err_guest_notifiers; > > } > >-for (i = 0; i < s->dev.nvqs; ++i) { > >-vhost_vdpa_set_vring_ready(&s->vdpa, i); > >-} > > s->started = true; > > > > /* > >diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > >index 3a43beb312..c4574d56c5 100644 > >--- a/hw/virtio/vhost-vdpa.c > >+++ b/hw/virtio/vhost-vdpa.c > >@@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, > >unsigned idx) > > return 
r; > > } > > > >+static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable) > >+{ > >+struct vhost_vdpa *v = dev->opaque; > >+unsigned int i; > >+int ret; > >+ > >+for (i = 0; i < dev->nvqs; ++i) { > >+ret = vhost_vdpa_set_vring_ready(v, i); > >+if (ret < 0) { > >+return ret; > >+} > >+} > >+ > >+return 0; > >+} > >+ > > static int vhost_vdpa_set_config_call(struct vhost_dev *dev, > >int fd) > > { > >@@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = { > > .vhost_set_features = vhost_vdpa_set_features, > > .vhost_reset_device = vhost_vdpa_reset_device, > > .vhost_get_vq_index = vhost_vdpa_get_vq_index, > >+.vhost_set_vring_enable = vhost_vdpa_set_vring_enable, > > .vhost_get_config = vhost_vdpa_get_config, > > .vhost_set_config = vhost_vdpa_set_config, > > .vhost_requires_shm_log = NULL, > >-- > >2.43.0 > > >
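Note that the callback in this v1 patch ignores its enable argument (the v3 changelog quoted earlier lists "Actually make use of the @enable parameter" as a v2 change). A standalone sketch of the intended semantics, with a stub standing in for the per-queue kernel call (all names illustrative, not QEMU or kernel API):

```c
#include <assert.h>
#include <stddef.h>

/* Stub for the per-virtqueue enable operation; records the last state
 * written for each queue. */
#define TOY_NVQS 4
int toy_vq_state[TOY_NVQS];

int toy_set_vring_ready(size_t idx, int enable)
{
    if (idx >= TOY_NVQS) {
        return -1;      /* no such queue */
    }
    toy_vq_state[idx] = enable;
    return 0;
}

/* Walk every queue, honouring @enable and stopping at the first error,
 * which is the shape the fixed callback takes. */
int toy_set_vring_enable(size_t nvqs, int enable)
{
    for (size_t i = 0; i < nvqs; i++) {
        int r = toy_set_vring_ready(i, enable);
        if (r < 0) {
            return r;
        }
    }
    return 0;
}
```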
Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
On Mon, Dec 11, 2023 at 4:29 PM Akihiko Odaki wrote: > > On 2023/12/11 16:26, Jason Wang wrote: > > On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki > > wrote: > >> > >> On 2023/12/11 11:52, Jason Wang wrote: > >>> On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki > >>> wrote: > >>>> > >>>> Introduction > >>>> > >>>> > >>>> This series is based on the RFC series submitted by Yui Washizu[1]. > >>>> See also [2] for the context. > >>>> > >>>> This series enables SR-IOV emulation for virtio-net. It is useful > >>>> to test SR-IOV support on the guest, or to expose several vDPA devices > >>>> in a VM. vDPA devices can also provide L2 switching feature for > >>>> offloading though it is out of scope to allow the guest to configure > >>>> such a feature. > >>>> > >>>> The PF side code resides in virtio-pci. The VF side code resides in > >>>> the PCI common infrastructure, but it is restricted to work only for > >>>> virtio-net-pci because of lack of validation. > >>>> > >>>> User Interface > >>>> -- > >>>> > >>>> A user can configure a SR-IOV capable virtio-net device by adding > >>>> virtio-net-pci functions to a bus. Below is a command line example: > >>>> -netdev user,id=n -netdev user,id=o > >>>> -netdev user,id=p -netdev user,id=q > >>>> -device pcie-root-port,id=b > >>>> -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f > >>>> -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f > >>>> -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f > >>>> -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f > >>>> > >>>> The VFs specify the paired PF with "sriov-pf" property. The PF must be > >>>> added after all VFs. It is user's responsibility to ensure that VFs have > >>>> function numbers larger than one of the PF, and the function numbers > >>>> have a consistent stride. > >>> > >>> This seems not user friendly. Any reason we can't just allow user to > >>> specify the stride here? 
> >> > >> It should be possible to assign addr automatically without requiring > >> user to specify the stride. I'll try that in the next version. > >> > >>> > >>> Btw, I vaguely remember qemu allows the params to be accepted as a > >>> list. If this is true, we can accept a list of netdev here? > >> > >> Yes, rocker does that. But the problem is not just about getting > >> parameters needed for VFs, which I forgot to mention in the cover letter > >> and will explain below. > >> > >>> > >>>> > >>>> Keeping VF instances > >>>> > >>>> > >>>> A problem with SR-IOV emulation is that it needs to hotplug the VFs as > >>>> the guest requests. Previously, this behavior was implemented by > >>>> realizing and unrealizing VFs at runtime. However, this strategy does > >>>> not work well for the proposed virtio-net emulation; in this proposal, > >>>> device options passed in the command line must be maintained as VFs > >>>> are hotplugged, but they are consumed when the machine starts and not > >>>> available after that, which makes realizing VFs at runtime impossible. > >>> > >>> Could we store the device options in the PF? > >> > >> I wrote it's to store the device options, but the problem is actually > >> more about realizing VFs at runtime instead of at the initialization time. > >> > >> Realizing VFs at runtime has two major problems. One is that it delays > >> the validations of options; invalid options will be noticed when the > >> guest requests to realize VFs. > > > > If the PCI spec allows the failure when creating a VF, then it should not be > > a problem. > > I doubt the spec cares about such a failure at all. VF enablement should > always work for real hardware. It's not user-friendly either to report > configuration errors at runtime. I'm not sure which options we should care about. Did you mean netdev options or the virtio-net specific ones? If VFs stick to the same options as the PF (except for the SRIOV), it should be validated
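The "consistent stride" requirement in the cover letter mirrors the Routing-ID arithmetic of the PCIe SR-IOV specification: the n-th VF (1-based) lives at PF_RID + First VF Offset + (n - 1) * VF Stride, so only VF layouts of that shape can be described by the PF's SR-IOV capability. A small sketch (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Routing ID (bus/devfn number) of the n-th VF, 1-based, computed from
 * the SR-IOV capability's First VF Offset and VF Stride fields. */
uint16_t sriov_vf_rid(uint16_t pf_rid, uint16_t first_vf_off,
                      uint16_t vf_stride, unsigned n)
{
    return (uint16_t)(pf_rid + first_vf_off + (n - 1u) * vf_stride);
}
```

For the cover letter's command line (PF at function 0, VFs at functions 1-3 of the same device), First VF Offset and VF Stride are both 1.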
Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki wrote: > > On 2023/12/11 11:52, Jason Wang wrote: > > On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki > > wrote: > >> > >> Introduction > >> > >> > >> This series is based on the RFC series submitted by Yui Washizu[1]. > >> See also [2] for the context. > >> > >> This series enables SR-IOV emulation for virtio-net. It is useful > >> to test SR-IOV support on the guest, or to expose several vDPA devices > >> in a VM. vDPA devices can also provide L2 switching feature for > >> offloading though it is out of scope to allow the guest to configure > >> such a feature. > >> > >> The PF side code resides in virtio-pci. The VF side code resides in > >> the PCI common infrastructure, but it is restricted to work only for > >> virtio-net-pci because of lack of validation. > >> > >> User Interface > >> -- > >> > >> A user can configure a SR-IOV capable virtio-net device by adding > >> virtio-net-pci functions to a bus. Below is a command line example: > >>-netdev user,id=n -netdev user,id=o > >>-netdev user,id=p -netdev user,id=q > >>-device pcie-root-port,id=b > >>-device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f > >>-device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f > >>-device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f > >>-device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f > >> > >> The VFs specify the paired PF with "sriov-pf" property. The PF must be > >> added after all VFs. It is user's responsibility to ensure that VFs have > >> function numbers larger than one of the PF, and the function numbers > >> have a consistent stride. > > > > This seems not user friendly. Any reason we can't just allow user to > > specify the stride here? > > It should be possible to assign addr automatically without requiring > user to specify the stride. I'll try that in the next version. > > > > > Btw, I vaguely remember qemu allows the params to be accepted as a > > list. 
If this is true, we can accept a list of netdev here? > > Yes, rocker does that. But the problem is not just about getting > parameters needed for VFs, which I forgot to mention in the cover letter > and will explain below. > > > > >> > >> Keeping VF instances > >> > >> > >> A problem with SR-IOV emulation is that it needs to hotplug the VFs as > >> the guest requests. Previously, this behavior was implemented by > >> realizing and unrealizing VFs at runtime. However, this strategy does > >> not work well for the proposed virtio-net emulation; in this proposal, > >> device options passed in the command line must be maintained as VFs > >> are hotplugged, but they are consumed when the machine starts and not > >> available after that, which makes realizing VFs at runtime impossible. > > > > Could we store the device options in the PF? > > I wrote it's to store the device options, but the problem is actually > more about realizing VFs at runtime instead of at the initialization time. > > Realizing VFs at runtime has two major problems. One is that it delays > the validations of options; invalid options will be noticed when the > guest requests to realize VFs. If the PCI spec allows the failure when creating a VF, then it should not be a problem. > netdevs also warn that they are not used > at initialization time, not knowing that they will be used by VFs later. We could invent things to calm down this false positive. > References to other QEMU objects in the option may also die before VFs > are realized. Is there anything other than netdev we need to consider? > > The other problem is that QEMU cannot interact with the unrealized VFs. > For example, if you type "device_add virtio-net-pci,id=vf,sriov-pf=pf" > in HMP, you will expect "device_del vf" to work, but it's hard to > implement such behaviors with unrealized VFs. I think hotplug can only be done at the PF level if we do that. 
> > I was first going to compromise and allow such quirky behaviors, but I > realized such a compromise is unnecessary if we reuse the PCI power down > logic so I wrote v2. Haven't checked the code, but anything related to the PM here? Thanks > > Regards, > Akihiko Odaki >
Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki wrote: > > Introduction > > > This series is based on the RFC series submitted by Yui Washizu[1]. > See also [2] for the context. > > This series enables SR-IOV emulation for virtio-net. It is useful > to test SR-IOV support on the guest, or to expose several vDPA devices > in a VM. vDPA devices can also provide L2 switching feature for > offloading though it is out of scope to allow the guest to configure > such a feature. > > The PF side code resides in virtio-pci. The VF side code resides in > the PCI common infrastructure, but it is restricted to work only for > virtio-net-pci because of lack of validation. > > User Interface > -- > > A user can configure a SR-IOV capable virtio-net device by adding > virtio-net-pci functions to a bus. Below is a command line example: > -netdev user,id=n -netdev user,id=o > -netdev user,id=p -netdev user,id=q > -device pcie-root-port,id=b > -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f > -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f > -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f > -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f > > The VFs specify the paired PF with "sriov-pf" property. The PF must be > added after all VFs. It is the user's responsibility to ensure that VFs have > function numbers larger than that of the PF, and the function numbers > have a consistent stride. This seems not user-friendly. Any reason we can't just allow the user to specify the stride here? Btw, I vaguely remember qemu allows the params to be accepted as a list. If this is true, we can accept a list of netdev here? > > Keeping VF instances > > > A problem with SR-IOV emulation is that it needs to hotplug the VFs as > the guest requests. Previously, this behavior was implemented by > realizing and unrealizing VFs at runtime. 
However, this strategy does > not work well for the proposed virtio-net emulation; in this proposal, > device options passed on the command line must be maintained as VFs > are hotplugged, but they are consumed when the machine starts and not > available after that, which makes realizing VFs at runtime impossible. Could we store the device options in the PF? Thanks > > As an alternative strategy to runtime realization/unrealization, this > series proposes to reuse the code to power down PCI Express devices. > When a PCI Express device is powered down, it will be hidden from the > guest but will be kept realized. This effectively implements the > behavior we need for the SR-IOV emulation. > > Summary > --- > > Patch [1, 5] refactors the PCI infrastructure code. > Patch [6, 10] adds user-created SR-IOV VF infrastructure. > Patch 11 makes virtio-pci work as SR-IOV PF for user-created VFs. > Patch 12 allows the user to create SR-IOV VFs with virtio-net-pci. > > [1] > https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.wash...@gmail.com/ > [2] > https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4...@daynix.com/ > > Co-developed-by: Yui Washizu > Signed-off-by: Akihiko Odaki > --- > Changes in v2: > - Changed to keep VF instances.
> - Link to v1: > https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7...@daynix.com > > --- > Akihiko Odaki (12): > hw/pci: Initialize PCI multifunction after realization > hw/pci: Determine if rombar is explicitly enabled > hw/pci: Do not add ROM BAR for SR-IOV VF > vfio: Avoid inspecting option QDict for rombar > hw/qdev: Remove opts member > pcie_sriov: Reuse SR-IOV VF device instances > pcie_sriov: Release VFs failed to realize > pcie_sriov: Ensure PF and VF are mutually exclusive > pcie_sriov: Check PCI Express for SR-IOV PF > pcie_sriov: Allow user to create SR-IOV device > virtio-pci: Implement SR-IOV PF > virtio-net: Implement SR-IOV VF > > docs/pcie_sriov.txt | 8 +- > include/hw/pci/pci.h| 2 +- > include/hw/pci/pci_device.h | 13 +- > include/hw/pci/pcie_sriov.h | 25 ++- > include/hw/qdev-core.h | 4 - > hw/core/qdev.c | 1 - > hw/net/igb.c| 3 +- > hw/nvme/ctrl.c | 3 +- > hw/pci/pci.c| 98 +++- > hw/pci/pci_host.c | 4 +- > hw/pci/pcie.c | 4 +- > hw/pci/pcie_sriov.c | 360 > +--- > hw/vfio/pci.c | 3 +- > hw/virtio/virtio-net-pci.c | 1 + > hw/virtio/virtio-pci.c | 7 + > system/qdev-monitor.c | 12 +- > 16 files changed, 395 insertions(+), 153 deletions(-) > --- > base-commit: 4705fc0c8511d073bee4751c3c974aab2b10a970 > change-id: 20231202-sriov-9402fb262be8 > > Best regards, > -- > Akihiko Odaki >
Re: [PATCH v3 09/14] hw/net/tulip: Finish QOM conversion
On Tue, Feb 28, 2023 at 9:39 PM Philippe Mathieu-Daudé wrote: > > Hi Jason, do you Ack this patch? Yes. Acked-by: Jason Wang Thanks > > On 13/2/23 19:43, Philippe Mathieu-Daudé wrote: > > Use the TULIP() and DEVICE() QOM type-checking macros. > > Remove uses of DO_UPCAST(). > > > > Signed-off-by: Philippe Mathieu-Daudé > > --- > > hw/net/tulip.c | 20 +++- > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > diff --git a/hw/net/tulip.c b/hw/net/tulip.c > > index 915e5fb595..990507859d 100644 > > --- a/hw/net/tulip.c > > +++ b/hw/net/tulip.c > > @@ -19,7 +19,10 @@ > > #include "net/eth.h" > > > > struct TULIPState { > > +/*< private >*/ > > PCIDevice dev; > > +/*< public >*/ > > + > > MemoryRegion io; > > MemoryRegion memory; > > NICConf c; > > @@ -959,7 +962,7 @@ static void tulip_fill_eeprom(TULIPState *s) > > > > static void pci_tulip_realize(PCIDevice *pci_dev, Error **errp) > > { > > -TULIPState *s = DO_UPCAST(TULIPState, dev, pci_dev); > > +TULIPState *s = TULIP(pci_dev); > > uint8_t *pci_conf; > > > > pci_conf = s->dev.config; > > @@ -967,7 +970,7 @@ static void pci_tulip_realize(PCIDevice *pci_dev, Error > > **errp) > > > > qemu_macaddr_default_if_unset(&s->c.macaddr); > > > > -s->eeprom = eeprom93xx_new(&pci_dev->qdev, 64); > > +s->eeprom = eeprom93xx_new(DEVICE(pci_dev), 64); > > tulip_fill_eeprom(s); > > > > memory_region_init_io(&s->io, OBJECT(&s->dev), &tulip_ops, s, > > @@ -983,27 +986,26 @@ static void pci_tulip_realize(PCIDevice *pci_dev, > > Error **errp) > > > > s->nic = qemu_new_nic(&net_tulip_info, &s->c, > > object_get_typename(OBJECT(pci_dev)), > > - pci_dev->qdev.id, s); > > + DEVICE(pci_dev)->id, s); > > qemu_format_nic_info_str(qemu_get_queue(s->nic), s->c.macaddr.a); > > } > > > > static void pci_tulip_exit(PCIDevice *pci_dev) > > { > > -TULIPState *s = DO_UPCAST(TULIPState, dev, pci_dev); > > +TULIPState *s = TULIP(pci_dev); > > > > qemu_del_nic(s->nic); > > qemu_free_irq(s->irq); > > -eeprom93xx_free(&pci_dev->qdev, s->eeprom); > > +eeprom93xx_free(DEVICE(s),
s->eeprom); > > } > > > > static void tulip_instance_init(Object *obj) > > { > > -PCIDevice *pci_dev = PCI_DEVICE(obj); > > -TULIPState *d = DO_UPCAST(TULIPState, dev, pci_dev); > > +TULIPState *s = TULIP(obj); > > > > -device_add_bootindex_property(obj, &d->c.bootindex, > > +device_add_bootindex_property(obj, &s->c.bootindex, > > "bootindex", "/ethernet-phy@0", > > - &pci_dev->qdev); > > + DEVICE(obj)); > > } > > > > static Property tulip_properties[] = { >
Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices
On Mon, Nov 21, 2022 at 6:11 PM Stefano Garzarella wrote: > > Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support") > enabled VIRTIO_F_RING_RESET by default for all virtio devices. > > This feature is not currently emulated by QEMU, so for vhost and > vhost-user devices we need to make sure it is supported by the offloaded > device emulation (in-kernel or in another process). > To do this we need to add VIRTIO_F_RING_RESET to the features bitmap > passed to vhost_get_features(). This way it will be masked if the device > does not support it. > > This issue was initially discovered with vhost-vsock and vhost-user-vsock, > and then also tested with vhost-user-rng which confirmed the same issue. > They fail when sending features through VHOST_SET_FEATURES ioctl or > VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated > by the guest (Linux >= v6.0), but not supported by the device. > > Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support") > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318 > Signed-off-by: Stefano Garzarella Acked-by: Jason Wang > --- > > To prevent this problem in the future, perhaps we should provide a function > (e.g. vhost_device_get_features) where we go to mask all non-device-specific > features (e.g VIRTIO_F_*, VIRTIO_RING_F_*) that are not emulated by QEMU but > we expect them to be emulated by the vhost or vhost-user devices. Adding Eugenio, this might not be true if we want to use shadow virtqueue for emulating some features? Thanks > Then we can call it in all .get_features callbacks just before return the > features. > > What do you think? > > But maybe better to do that for the next release, I will send an RFC. 
> > Thanks, > Stefano > --- > hw/block/vhost-user-blk.c | 1 + > hw/net/vhost_net.c | 1 + > hw/scsi/vhost-scsi.c | 1 + > hw/scsi/vhost-user-scsi.c | 1 + > hw/virtio/vhost-user-fs.c | 1 + > hw/virtio/vhost-user-gpio.c| 1 + > hw/virtio/vhost-user-i2c.c | 1 + > hw/virtio/vhost-user-rng.c | 11 +-- > hw/virtio/vhost-vsock-common.c | 1 + > net/vhost-vdpa.c | 1 + > 10 files changed, 18 insertions(+), 2 deletions(-) > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > index 16ad400889..0d5190accf 100644 > --- a/hw/block/vhost-user-blk.c > +++ b/hw/block/vhost-user-blk.c > @@ -52,6 +52,7 @@ static const int user_feature_bits[] = { > VIRTIO_F_NOTIFY_ON_EMPTY, > VIRTIO_F_RING_PACKED, > VIRTIO_F_IOMMU_PLATFORM, > +VIRTIO_F_RING_RESET, > VHOST_INVALID_FEATURE_BIT > }; > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > index feda448878..26e4930676 100644 > --- a/hw/net/vhost_net.c > +++ b/hw/net/vhost_net.c > @@ -75,6 +75,7 @@ static const int user_feature_bits[] = { > VIRTIO_NET_F_MTU, > VIRTIO_F_IOMMU_PLATFORM, > VIRTIO_F_RING_PACKED, > +VIRTIO_F_RING_RESET, > VIRTIO_NET_F_RSS, > VIRTIO_NET_F_HASH_REPORT, > > diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c > index bdf337a7a2..6a0fd0dfb1 100644 > --- a/hw/scsi/vhost-scsi.c > +++ b/hw/scsi/vhost-scsi.c > @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = { > VIRTIO_RING_F_INDIRECT_DESC, > VIRTIO_RING_F_EVENT_IDX, > VIRTIO_SCSI_F_HOTPLUG, > +VIRTIO_F_RING_RESET, > VHOST_INVALID_FEATURE_BIT > }; > > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c > index bc37317d55..b7a71a802c 100644 > --- a/hw/scsi/vhost-user-scsi.c > +++ b/hw/scsi/vhost-user-scsi.c > @@ -36,6 +36,7 @@ static const int user_feature_bits[] = { > VIRTIO_RING_F_INDIRECT_DESC, > VIRTIO_RING_F_EVENT_IDX, > VIRTIO_SCSI_F_HOTPLUG, > +VIRTIO_F_RING_RESET, > VHOST_INVALID_FEATURE_BIT > }; > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > index 1c40f42045..dc4014cdef 100644 > --- 
a/hw/virtio/vhost-user-fs.c > +++ b/hw/virtio/vhost-user-fs.c > @@ -32,6 +32,7 @@ static const int user_feature_bits[] = { > VIRTIO_F_NOTIFY_ON_EMPTY, > VIRTIO_F_RING_PACKED, > VIRTIO_F_IOMMU_PLATFORM, > +VIRTIO_F_RING_RESET, > > VHOST_INVALID_FEATURE_BIT > }; > diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c > index 677d1c7730..5851cb3bc9 100644 > --- a/hw/virtio/vhost-user-gpio.c > +++ b/hw/virtio/vhost-user-gpio.c > @@ -24,6 +24,7 @@ static const int feature_bits[] = { > VIRTIO_RING_F_INDIRECT_DESC, > VIRTIO_RING_F_EVENT_IDX, > VIRTIO_GPIO_F_IRQ, > +VIRTIO_F_RING_RESET, > VHOST_INVALID_FEATURE_BIT > }; >
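For readers following the feature-bit discussion above: the effect of adding VIRTIO_F_RING_RESET to a feature_bits[] list is that the bit can be masked out of the guest-visible feature set when the offloaded backend does not acknowledge it. The sketch below is a hypothetical, simplified version of that masking loop, not QEMU's actual vhost_get_features(); the macro values are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative placeholders; not QEMU's real definitions. */
#define VIRTIO_F_RING_RESET       40u
#define VHOST_INVALID_FEATURE_BIT ((unsigned)-1)

/* For every bit listed in feature_bits[] (terminated by the sentinel),
 * clear it from `features` unless the backend acknowledged it. Bits not
 * listed pass through untouched. */
static uint64_t mask_features(const unsigned *feature_bits,
                              uint64_t backend_features,
                              uint64_t features)
{
    for (const unsigned *bit = feature_bits;
         *bit != VHOST_INVALID_FEATURE_BIT; bit++) {
        if (!(backend_features & (1ULL << *bit))) {
            features &= ~(1ULL << *bit); /* backend lacks it: hide from guest */
        }
    }
    return features;
}
```

This is why the fix is just adding the bit to each device's list: once listed, an unsupported VIRTIO_F_RING_RESET can no longer leak into the negotiated features.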
Re: Re: [PATCH v2] hw/virtio/vhost-user: support obtain vdpa device's mac address automatically
On Fri, Sep 23, 2022 at 11:55 AM ho...@yusur.tech wrote: > > On Thu, 22 Sep 2022 09:34:41 +0800 Jason Wang wrote: > > > >On Thu, Sep 22, 2022 at 1:58 AM Raphael Norwitz > > wrote: > >> > >> If I read your response on the other thread correctly, this change is > >> intended > >> > >> to prioritize the MAC address exposed by DPDK over the one provided by the > >> > >> QEMU command line? Sounds reasonable in principle, but I would get > >> confirmation > >> > >> from vDPA/vhost-net maintainers. > > >I think the best way is to (and it seems easier) > > >1) have the management layer to make sure the mac came from cli > >matches the underlayer vDPA > > Agreed, that's no problem. > > >2) having a sanity check and fail the device initialization if they don't > >match > > However, one MAC address for the integrity check comes from the cli, while the other > MAC address comes from the vDPA device. > How do we get the latter? VHOST_USER_GET_CONFIG? Thanks > > The current situation is that if the MAC from the cli doesn't match the underlying > vDPA one, the virtual machine still starts without any hint. > > Thanks > > > >Thanks > > >> > >> > >> > >> That said the way you’re hacking the vhost-user code breaks a valuable > >> check for > >> > >> bad vhost-user-blk backends. I would suggest finding another > >> implementation which > >> > >> does not regress functionality for other device types. > >> > >> > >> > >> > >> > >> >From: Hao Chen > >> > >> > > >> > >> >When use dpdk-vdpa tests vdpa device. You need to specify the mac address > >> >to > >> > >> >start the virtual machine through libvirt or qemu, but now, the libvirt or > >> > >> >qemu can call dpdk vdpa vendor driver's ops .get_config through > >> >vhost_net_get_config > >> > >> >to get the mac address of the vdpa hardware without manual configuration. > >> > >> > > >> > >> >v1->v2: > >> > >> >Only copy ETH_ALEN data of netcfg for some vdpa device such as > >> > >> >NVIDIA BLUEFIELD DPU(BF2)'s netcfg->status is not right.
> >> > >> >We only need the mac address and don't care about the status field. > >> > >> > > >> > >> >Signed-off-by: Hao Chen > >> > >> >--- > >> > >> > hw/block/vhost-user-blk.c | 1 - > >> > >> > hw/net/virtio-net.c | 7 +++ > >> > >> > hw/virtio/vhost-user.c| 19 --- > >> > >> > 3 files changed, 7 insertions(+), 20 deletions(-) > >> > >> > > >> > >> >diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > >> > >> >index 9117222456..5dca4eab09 100644 > >> > >> >--- a/hw/block/vhost-user-blk.c > >> > >> >+++ b/hw/block/vhost-user-blk.c > >> > >> >@@ -337,7 +337,6 @@ static int vhost_user_blk_connect(DeviceState *dev, > >> >Error **errp) > >> > >> > > >> > >> > vhost_dev_set_config_notifier(>dev, _ops); > >> > >> > > >> > >> >-s->vhost_user.supports_config = true; > >> > >> > ret = vhost_dev_init(>dev, >vhost_user, > >> > VHOST_BACKEND_TYPE_USER, 0, > >> > >> > errp); > >> > >> > if (ret < 0) { > >> > >> >diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > >> > >> >index dd0d056fde..90405083b1 100644 > >> > >> >--- a/hw/net/virtio-net.c > >> > >> >+++ b/hw/net/virtio-net.c > >> > >> >@@ -166,6 +166,13 @@ static void virtio_net_get_config(VirtIODevice > >> >*vdev, uint8_t *config) > >> > >> > } > >> > >> > memcpy(config, , n->config_size); > >> > >> > } > >> > >> >+} else if (nc->peer && nc->peer->info->type == > >> >NET_CLIENT_DRIVER_VHOST_USER) { > >> > >> >+ret = vhost_net_get_config(get_vhost_net(nc->peer), (uint8_t > >> >*), > >> > >> >+ n->config_size); > &g
Re: [PATCH v2] hw/virtio/vhost-user: support obtain vdpa device's mac address automatically
On Fri, Sep 23, 2022 at 11:33 AM 陈浩 wrote: > > > On 2022/9/22 17:56, Michael S. Tsirkin wrote: > > On Thu, Sep 22, 2022 at 09:34:41AM +0800, Jason Wang wrote: > >> On Thu, Sep 22, 2022 at 1:58 AM Raphael Norwitz > >> wrote: > >>> If I read your response on the other thread correctly, this change is > >>> intended > >>> > >>> to prioritize the MAC address exposed by DPDK over the one provided by the > >>> > >>> QEMU command line? Sounds reasonable in principle, but I would get > >>> confirmation > >>> > >>> from vDPA/vhost-net maintainers. > >> I think the best way is to (and it seems easier) > >> > >> 1) have the management layer to make sure the mac came from cli > >> matches the underlayer vDPA > >> 2) having a sanity check and fail the device initialization if they don't > >> match > >> > >> Thanks > > I think I agree, we should not overwrite user supplied arguments. > > I think the case where we might want to get the mac from VDPA > > and use it is when it's been assigned randomly as opposed to coming from > > cli. > Yes, when the cli passes in a random mac address that is inconsistent > with the vdpa device, qemu still starts the virtual machine as usual, > but in this case, the qemu and vdpa environments do not work correctly. > So I want to get mac address directly from vdpa device instead of cli. Let's teach the management to do that. It has a lot of advantages, more below. Not sure for the DPDK case, but kernel vDPA allows the mgmt to provision the vDPA with a mac address which could be used in this case. > > Having a sanity check and fail the device initialization if they don't > match is a good idea, but it seems more convenient to directly overwrite > the cli mac address. This will confuse the management where it may do a lot of mac related setups. Thanks > > > > > >>> > >>> > >>> That said the way you’re hacking the vhost-user code breaks a valuable > >>> check for > >>> > >>> bad vhost-user-blk backends. 
I would suggest finding another > >>> implementation which > >>> > >>> does not regress functionality for other device types. > >>> > >>> > >>> > >>> > >>> > >>>> From: Hao Chen > >>>> When use dpdk-vdpa tests vdpa device. You need to specify the mac > >>>> address to > >>>> start the virtual machine through libvirt or qemu, but now, the libvirt > >>>> or > >>>> qemu can call dpdk vdpa vendor driver's ops .get_config through > >>>> vhost_net_get_config > >>>> to get the mac address of the vdpa hardware without manual configuration. > >>>> v1->v2: > >>>> Only copy ETH_ALEN data of netcfg for some vdpa device such as > >>>> NVIDIA BLUEFIELD DPU(BF2)'s netcfg->status is not right. > >>>> We only need the mac address and don't care about the status field. > >>>> Signed-off-by: Hao Chen > >>>> --- > >>>> hw/block/vhost-user-blk.c | 1 - > >>>> hw/net/virtio-net.c | 7 +++ > >>>> hw/virtio/vhost-user.c| 19 --- > >>>> 3 files changed, 7 insertions(+), 20 deletions(-) > >>>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > >>>> index 9117222456..5dca4eab09 100644 > >>>> --- a/hw/block/vhost-user-blk.c > >>>> +++ b/hw/block/vhost-user-blk.c > >>>> @@ -337,7 +337,6 @@ static int vhost_user_blk_connect(DeviceState *dev, > >>>> Error **errp) > >>>> vhost_dev_set_config_notifier(>dev, _ops); > >>>> -s->vhost_user.supports_config = true; > >>>> ret = vhost_dev_init(>dev, >vhost_user, > >>>> VHOST_BACKEND_TYPE_USER, 0, > >>>> errp); > >>>> if (ret < 0) { > >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > >>>> index dd0d056fde..90405083b1 100644 > >>>> --- a/hw/net/virtio-net.c > >>>> +++ b/hw/net/virtio-net.c > >>>> @@ -166,6 +166,13 @@ static void virtio_net_get_config(VirtIODevice > >>>> *vdev, uint8_t *config) > >>>> } > >>>> memcpy(config, , n->config_size); > >>>> } > >>>> +} else if
Re: [PATCH v2] hw/virtio/vhost-user: support obtain vdpa device's mac address automatically
On Thu, Sep 22, 2022 at 1:58 AM Raphael Norwitz wrote: > > If I read your response on the other thread correctly, this change is intended > > to prioritize the MAC address exposed by DPDK over the one provided by the > > QEMU command line? Sounds reasonable in principle, but I would get > confirmation > > from vDPA/vhost-net maintainers. I think the best way is to (and it seems easier) 1) have the management layer to make sure the mac came from cli matches the underlayer vDPA 2) having a sanity check and fail the device initialization if they don't match Thanks > > > > That said the way you’re hacking the vhost-user code breaks a valuable check > for > > bad vhost-user-blk backends. I would suggest finding another implementation > which > > does not regress functionality for other device types. > > > > > > >From: Hao Chen > > > > > >When use dpdk-vdpa tests vdpa device. You need to specify the mac address to > > >start the virtual machine through libvirt or qemu, but now, the libvirt or > > >qemu can call dpdk vdpa vendor driver's ops .get_config through > >vhost_net_get_config > > >to get the mac address of the vdpa hardware without manual configuration. > > > > > >v1->v2: > > >Only copy ETH_ALEN data of netcfg for some vdpa device such as > > >NVIDIA BLUEFIELD DPU(BF2)'s netcfg->status is not right. > > >We only need the mac address and don't care about the status field. 
> > > > > >Signed-off-by: Hao Chen > > >--- > > > hw/block/vhost-user-blk.c | 1 - > > > hw/net/virtio-net.c | 7 +++ > > > hw/virtio/vhost-user.c| 19 --- > > > 3 files changed, 7 insertions(+), 20 deletions(-) > > > > > >diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > > >index 9117222456..5dca4eab09 100644 > > >--- a/hw/block/vhost-user-blk.c > > >+++ b/hw/block/vhost-user-blk.c > > >@@ -337,7 +337,6 @@ static int vhost_user_blk_connect(DeviceState *dev, > >Error **errp) > > > > > > vhost_dev_set_config_notifier(>dev, _ops); > > > > > >-s->vhost_user.supports_config = true; > > > ret = vhost_dev_init(>dev, >vhost_user, VHOST_BACKEND_TYPE_USER, > > 0, > > > errp); > > > if (ret < 0) { > > >diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > > >index dd0d056fde..90405083b1 100644 > > >--- a/hw/net/virtio-net.c > > >+++ b/hw/net/virtio-net.c > > >@@ -166,6 +166,13 @@ static void virtio_net_get_config(VirtIODevice *vdev, > >uint8_t *config) > > > } > > > memcpy(config, , n->config_size); > > > } > > >+} else if (nc->peer && nc->peer->info->type == > >NET_CLIENT_DRIVER_VHOST_USER) { > > >+ret = vhost_net_get_config(get_vhost_net(nc->peer), (uint8_t > >*), > > >+ n->config_size); > > >+if (ret != -1) { > > >+ /* Automatically obtain the mac address of the vdpa device > > >+* when using the dpdk vdpa */ > > >+memcpy(config, , ETH_ALEN); > > > } > > > } > > > > > >diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > >index bd24741be8..8b01078249 100644 > > >--- a/hw/virtio/vhost-user.c > > >+++ b/hw/virtio/vhost-user.c > > >@@ -2013,8 +2013,6 @@ static int vhost_user_backend_init(struct vhost_dev > >*dev, void *opaque, > > > } > > > > > > if (virtio_has_feature(features, VHOST_USER_F_PROTOCOL_FEATURES)) { > > >-bool supports_f_config = vus->supports_config || > > >-(dev->config_ops && dev->config_ops->vhost_dev_config_notifier); > > > uint64_t protocol_features; > > > > > > dev->backend_features |= 1ULL << 
VHOST_USER_F_PROTOCOL_FEATURES; > > >@@ -2033,23 +2031,6 @@ static int vhost_user_backend_init(struct vhost_dev > >*dev, void *opaque, > > > */ > > > protocol_features &= VHOST_USER_PROTOCOL_FEATURE_MASK; > > > > > >-if (supports_f_config) { > > >-if (!virtio_has_feature(protocol_features, > > >-VHOST_USER_PROTOCOL_F_CONFIG)) { > > >-error_setg(errp, "vhost-user device expecting " > > >- "VHOST_USER_PROTOCOL_F_CONFIG but the vhost-user > >backend does " > > >- "not support it."); > > >-return -EPROTO; > > >-} > > >-} else { > > >-if (virtio_has_feature(protocol_features, > > >- VHOST_USER_PROTOCOL_F_CONFIG)) { > > >-warn_reportf_err(*errp, "vhost-user backend supports " > > >- "VHOST_USER_PROTOCOL_F_CONFIG but QEMU > >does not."); > > >-protocol_features &= ~(1ULL << > >VHOST_USER_PROTOCOL_F_CONFIG); > > >-} > > >-} > > >- > > > /* final set of protocol features */ > > > dev->protocol_features = protocol_features; > > > err = vhost_user_set_protocol_features(dev, dev->protocol_features); > > >-- > > >2.27.0 > > >
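The sanity check Jason suggests in this thread (fail initialization when the cli MAC and the device-reported MAC disagree, rather than silently overwriting either side) could look roughly like the sketch below. `mac_matches()` and its all-zero special case are hypothetical illustrations, not a QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Compare the MAC supplied on the command line with the one read back from
 * the device config space. An all-zero device MAC is treated here as "the
 * backend reported nothing usable", so there is nothing to validate against;
 * otherwise any mismatch should fail device initialization with a clear
 * error instead of booting a guest that cannot pass traffic. */
static bool mac_matches(const uint8_t cli_mac[ETH_ALEN],
                        const uint8_t dev_mac[ETH_ALEN])
{
    static const uint8_t zero[ETH_ALEN];

    if (memcmp(dev_mac, zero, ETH_ALEN) == 0) {
        return true; /* no device MAC reported: skip the check */
    }
    return memcmp(cli_mac, dev_mac, ETH_ALEN) == 0;
}
```

Failing fast here keeps the management layer authoritative over the MAC, which is the concern raised about silently overwriting the cli value.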
Re: [PATCH] hw/virtio/vhost-user: support obtain vdpa device's mac address automatically
On Tue, Sep 13, 2022 at 5:13 PM Hao Chen wrote: > > When using dpdk-vdpa to test a vdpa device, you need to specify the mac address to > start the virtual machine through libvirt or qemu. Now, however, libvirt or > qemu can call the dpdk vdpa vendor driver's .get_config op through > vhost_net_get_config > to get the mac address of the vdpa hardware without manual configuration. > > Signed-off-by: Hao Chen Adding Cindy for comments. Thanks > --- > hw/block/vhost-user-blk.c | 1 - > hw/net/virtio-net.c | 3 ++- > hw/virtio/vhost-user.c| 19 --- > 3 files changed, 2 insertions(+), 21 deletions(-) > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > index 9117222456..5dca4eab09 100644 > --- a/hw/block/vhost-user-blk.c > +++ b/hw/block/vhost-user-blk.c > @@ -337,7 +337,6 @@ static int vhost_user_blk_connect(DeviceState *dev, Error > **errp) > > vhost_dev_set_config_notifier(&s->dev, &blk_ops); > > -s->vhost_user.supports_config = true; > ret = vhost_dev_init(&s->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0, > errp); > if (ret < 0) { > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index dd0d056fde..274ea84644 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -149,7 +149,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, > uint8_t *config) > * Is this VDPA? No peer means not VDPA: there's no way to > * disconnect/reconnect a VDPA peer.
> */ > -if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { > +if ((nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) || > +(nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER)) { > ret = vhost_net_get_config(get_vhost_net(nc->peer), (uint8_t > *), > n->config_size); > if (ret != -1) { > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > index bd24741be8..8b01078249 100644 > --- a/hw/virtio/vhost-user.c > +++ b/hw/virtio/vhost-user.c > @@ -2013,8 +2013,6 @@ static int vhost_user_backend_init(struct vhost_dev > *dev, void *opaque, > } > > if (virtio_has_feature(features, VHOST_USER_F_PROTOCOL_FEATURES)) { > -bool supports_f_config = vus->supports_config || > -(dev->config_ops && dev->config_ops->vhost_dev_config_notifier); > uint64_t protocol_features; > > dev->backend_features |= 1ULL << VHOST_USER_F_PROTOCOL_FEATURES; > @@ -2033,23 +2031,6 @@ static int vhost_user_backend_init(struct vhost_dev > *dev, void *opaque, > */ > protocol_features &= VHOST_USER_PROTOCOL_FEATURE_MASK; > > -if (supports_f_config) { > -if (!virtio_has_feature(protocol_features, > -VHOST_USER_PROTOCOL_F_CONFIG)) { > -error_setg(errp, "vhost-user device expecting " > - "VHOST_USER_PROTOCOL_F_CONFIG but the vhost-user > backend does " > - "not support it."); > -return -EPROTO; > -} > -} else { > -if (virtio_has_feature(protocol_features, > - VHOST_USER_PROTOCOL_F_CONFIG)) { > -warn_reportf_err(*errp, "vhost-user backend supports " > - "VHOST_USER_PROTOCOL_F_CONFIG but QEMU does > not."); > -protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_CONFIG); > -} > -} > - > /* final set of protocol features */ > dev->protocol_features = protocol_features; > err = vhost_user_set_protocol_features(dev, dev->protocol_features); > -- > 2.27.0 >
Re: [PATCH v3 05/21] block/vhost-user-blk-server: don't expose VHOST_USER_F_PROTOCOL_FEATURES
On 2022/7/27 at 03:21, Alex Bennée wrote: This bit is unused in actual VirtIO feature negotiation and should only appear in the vhost-user messages between master and slave. I might be wrong, but this is actually used between master and slave, not between the device and driver? Thanks [AJB: experiment, this doesn't break the tests but I'm not super confident of the range of tests] Signed-off-by: Alex Bennée --- block/export/vhost-user-blk-server.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c index 3409d9e02e..d31436006d 100644 --- a/block/export/vhost-user-blk-server.c +++ b/block/export/vhost-user-blk-server.c @@ -125,8 +125,7 @@ static uint64_t vu_blk_get_features(VuDev *dev) 1ull << VIRTIO_BLK_F_MQ | 1ull << VIRTIO_F_VERSION_1 | 1ull << VIRTIO_RING_F_INDIRECT_DESC | - 1ull << VIRTIO_RING_F_EVENT_IDX | - 1ull << VHOST_USER_F_PROTOCOL_FEATURES; + 1ull << VIRTIO_RING_F_EVENT_IDX ; if (!vexp->handler.writable) { features |= 1ull << VIRTIO_BLK_F_RO;
Re: [PATCH] aio-posix: fix spurious ->poll_ready() callbacks in main loop
On 2022/2/23 at 11:57 PM, Stefan Hajnoczi wrote: When ->poll() succeeds the AioHandler is placed on the ready list with revents set to the magic value 0. This magic value causes aio_dispatch_handler() to invoke ->poll_ready() instead of ->io_read() for G_IO_IN or ->io_write() for G_IO_OUT. This magic value 0 hack works for the IOThread where AioHandlers are placed on ->ready_list and processed by aio_dispatch_ready_handlers(). It does not work for the main loop where all AioHandlers are processed by aio_dispatch_handlers(), even those that are not ready and have a revents value of 0. As a result the main loop invokes ->poll_ready() on AioHandlers that are not ready. These spurious ->poll_ready() calls waste CPU cycles and could lead to crashes if the code assumes ->poll() must have succeeded before ->poll_ready() is called (a reasonable assumption but I haven't seen it in practice). Stop using revents to track whether ->poll_ready() will be called on an AioHandler. Introduce a separate AioHandler->poll_ready field instead. This eliminates spurious ->poll_ready() calls in the main loop. Fixes: 826cc32423db2a99d184dbf4f507c737d7e7a4ae ("aio-posix: split poll check from ready handler") Signed-off-by: Stefan Hajnoczi Reported-by: Jason Wang Tested-by: Jason Wang Thanks --- util/aio-posix.h | 1 + util/aio-posix.c | 32 ++-- 2 files changed, 19 insertions(+), 14 deletions(-) diff --git a/util/aio-posix.h b/util/aio-posix.h index 7f2c37a684..80b927c7f4 100644 --- a/util/aio-posix.h +++ b/util/aio-posix.h @@ -37,6 +37,7 @@ struct AioHandler { unsigned flags; /* see fdmon-io_uring.c */ #endif int64_t poll_idle_timeout; /* when to stop userspace polling */ +bool poll_ready; /* has polling detected an event?
*/ bool is_external; }; diff --git a/util/aio-posix.c b/util/aio-posix.c index 7b9f629218..be0182a3c6 100644 --- a/util/aio-posix.c +++ b/util/aio-posix.c @@ -23,15 +23,6 @@ #include "trace.h" #include "aio-posix.h" -/* - * G_IO_IN and G_IO_OUT are not appropriate revents values for polling, since - * the handler may not need to access the file descriptor. For example, the - * handler doesn't need to read from an EventNotifier if it polled a memory - * location and a read syscall would be slow. Define our own unique revents - * value to indicate that polling determined this AioHandler is ready. - */ -#define REVENTS_POLL_READY 0 - /* Stop userspace polling on a handler if it isn't active for some time */ #define POLL_IDLE_INTERVAL_NS (7 * NANOSECONDS_PER_SECOND) @@ -49,6 +40,14 @@ void aio_add_ready_handler(AioHandlerList *ready_list, QLIST_INSERT_HEAD(ready_list, node, node_ready); } +static void aio_add_poll_ready_handler(AioHandlerList *ready_list, + AioHandler *node) +{ +QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */ +node->poll_ready = true; +QLIST_INSERT_HEAD(ready_list, node, node_ready); +} + static AioHandler *find_aio_handler(AioContext *ctx, int fd) { AioHandler *node; @@ -76,6 +75,7 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node) } node->pfd.revents = 0; +node->poll_ready = false; /* If the fd monitor has already marked it deleted, leave it alone */ if (QLIST_IS_INSERTED(node, node_deleted)) { @@ -247,7 +247,7 @@ static bool poll_set_started(AioContext *ctx, AioHandlerList *ready_list, /* Poll one last time in case ->io_poll_end() raced with the event */ if (!started && node->io_poll(node->opaque)) { -aio_add_ready_handler(ready_list, node, REVENTS_POLL_READY); +aio_add_poll_ready_handler(ready_list, node); progress = true; } } @@ -282,6 +282,7 @@ bool aio_pending(AioContext *ctx) QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) { int revents; +/* TODO should this check poll ready?
*/ revents = node->pfd.revents & node->pfd.events; if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read && aio_node_check(ctx, node->is_external)) { @@ -323,11 +324,15 @@ static void aio_free_deleted_handlers(AioContext *ctx) static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node) { bool progress = false; +bool poll_ready; int revents; revents = node->pfd.revents & node->pfd.events; node->pfd.revents = 0; +poll_ready = node->poll_ready; +node->poll_ready = false; + /* * Start polling AioHandlers when they become ready because activity is * likely to continue. Note that starvation is theoretically possible when @@ -344,7 +349,7 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *nod
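The core of Stefan's fix can be boiled down to a toy model: track "polling saw an event" with an explicit per-handler flag instead of the magic revents value 0, so a dispatcher that walks all handlers (as the main loop does) never takes the poll-ready path for a handler that merely has revents == 0. All types and names below are illustrative stand-ins, not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int revents;      /* fd readiness reported by the fd monitor */
    bool poll_ready;  /* set when userspace polling detected an event */
} Handler;

enum { DISPATCH_NONE, DISPATCH_IO, DISPATCH_POLL_READY };

/* Consume both readiness markers and report which callback path a real
 * dispatcher would take. With the explicit flag, a handler with neither
 * marker set takes no path at all, instead of being mistaken for
 * poll-ready because revents happens to be 0. */
static int dispatch(Handler *h)
{
    int revents = h->revents;
    bool poll_ready = h->poll_ready;

    h->revents = 0;
    h->poll_ready = false;

    if (poll_ready) {
        return DISPATCH_POLL_READY; /* would call ->poll_ready() */
    }
    if (revents) {
        return DISPATCH_IO;         /* would call ->io_read()/->io_write() */
    }
    return DISPATCH_NONE;           /* not ready: no spurious callback */
}
```

The old scheme collapsed the first and third cases because both looked like revents == 0 to the main loop.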
Re: [RFC PATCH 2/3] hw/sd/sdhci: Prohibit DMA accesses to devices
On Thu, Dec 16, 2021 at 4:57 AM Philippe Mathieu-Daudé wrote: > > From: Philippe Mathieu-Daudé > > The issue reported by OSS-Fuzz produces the following backtrace: > > ==447470==ERROR: AddressSanitizer: heap-buffer-overflow > READ of size 1 at 0x6152a080 thread T0 > #0 0x71766d47 in sdhci_read_dataport hw/sd/sdhci.c:474:18 > #1 0x7175f139 in sdhci_read hw/sd/sdhci.c:1022:19 > #2 0x721b937b in memory_region_read_accessor softmmu/memory.c:440:11 > #3 0x72171e51 in access_with_adjusted_size softmmu/memory.c:554:18 > #4 0x7216f47c in memory_region_dispatch_read1 softmmu/memory.c:1424:16 > #5 0x7216ebb9 in memory_region_dispatch_read softmmu/memory.c:1452:9 > #6 0x7212db5d in flatview_read_continue softmmu/physmem.c:2879:23 > #7 0x7212f958 in flatview_read softmmu/physmem.c:2921:12 > #8 0x7212f418 in address_space_read_full softmmu/physmem.c:2934:18 > #9 0x721305a9 in address_space_rw softmmu/physmem.c:2962:16 > #10 0x7175a392 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12 > #11 0x7175a0ea in dma_memory_rw include/sysemu/dma.h:132:12 > #12 0x71759684 in dma_memory_read include/sysemu/dma.h:152:12 > #13 0x7175518c in sdhci_do_adma hw/sd/sdhci.c:823:27 > #14 0x7174bf69 in sdhci_data_transfer hw/sd/sdhci.c:935:13 > #15 0x7176aaa7 in sdhci_send_command hw/sd/sdhci.c:376:9 > #16 0x717629ee in sdhci_write hw/sd/sdhci.c:1212:9 > #17 0x72172513 in memory_region_write_accessor softmmu/memory.c:492:5 > #18 0x72171e51 in access_with_adjusted_size softmmu/memory.c:554:18 > #19 0x72170766 in memory_region_dispatch_write softmmu/memory.c:1504:16 > #20 0x721419ee in flatview_write_continue softmmu/physmem.c:2812:23 > #21 0x721301eb in flatview_write softmmu/physmem.c:2854:12 > #22 0x7212fca8 in address_space_write softmmu/physmem.c:2950:18 > #23 0x721d9a53 in qtest_process_command softmmu/qtest.c:727:9 > > A DMA descriptor is previously filled in RAM. An I/O access to the > device (frames #22 to #16) start the DMA engine (frame #13). 
The > engine fetches the descriptor and executes the request, which itself > accesses the SDHCI I/O registers (frame #1 and #0), triggering a > re-entrancy issue. > > Fix by prohibiting transactions from the DMA to devices. The DMA engine > is thus restricted to memories. > > Reported-by: OSS-Fuzz (Issue 36391) > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/451 > Signed-off-by: Philippe Mathieu-Daudé > --- > hw/sd/sdhci.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c > index fe2f21f0c37..0e5e988927e 100644 > --- a/hw/sd/sdhci.c > +++ b/hw/sd/sdhci.c > @@ -741,6 +741,7 @@ static void sdhci_do_adma(SDHCIState *s) > { > unsigned int begin, length; > const uint16_t block_size = s->blksize & BLOCK_SIZE_MASK; > +const MemTxAttrs attrs = { .memory = true }; > ADMADescr dscr = {}; > MemTxResult res; > int i; > @@ -794,7 +795,7 @@ static void sdhci_do_adma(SDHCIState *s) > res = dma_memory_write(s->dma_as, dscr.addr, > &s->fifo_buffer[begin], > s->data_count - begin, > - MEMTXATTRS_UNSPECIFIED); > + attrs); > if (res != MEMTX_OK) { > break; > } > @@ -823,7 +824,7 @@ static void sdhci_do_adma(SDHCIState *s) > res = dma_memory_read(s->dma_as, dscr.addr, > &s->fifo_buffer[begin], > s->data_count - begin, > - MEMTXATTRS_UNSPECIFIED); > + attrs); > if (res != MEMTX_OK) { > break; > } I wonder how we can fix this for other devices, as this seems to be a known issue for many years. We've received many reports from the networking side. It looks like this patch simply forbids p2p, which is probably not the case for other devices. I remember there are ideas like using a BH from Paolo or detecting re-entrancy in the memory core; both of them seem more general than this? Thanks > -- > 2.33.1 >
Re: [PATCH v2 03/15] net/vhost-vdpa: Fix device compatibility check
在 2021/10/8 下午9:34, Kevin Wolf 写道: vhost-vdpa works only with specific devices. At startup, it second guesses what the command line option handling will do and error out if it thinks a non-virtio device will attach to them. This second guessing is not only ugly, it can lead to wrong error messages ('-device floppy,netdev=foo' should complain about an unknown property, not about the wrong kind of network device being attached) and completely ignores hotplugging. Drop the old checks and implement .check_peer_type() instead to fix this. As a nice side effect, it also removes one more dependency on the legacy QemuOpts infrastructure and even reduces the code size. Signed-off-by: Kevin Wolf Acked-by: Jason Wang --- net/vhost-vdpa.c | 37 ++--- 1 file changed, 14 insertions(+), 23 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 912686457c..6dc68d8677 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -147,12 +147,26 @@ static bool vhost_vdpa_has_ufo(NetClientState *nc) } +static bool vhost_vdpa_check_peer_type(NetClientState *nc, ObjectClass *oc, + Error **errp) +{ +const char *driver = object_class_get_name(oc); + +if (!g_str_has_prefix(driver, "virtio-net-")) { +error_setg(errp, "vhost-vdpa requires frontend driver virtio-net-*"); +return false; +} + +return true; +} + static NetClientInfo net_vhost_vdpa_info = { .type = NET_CLIENT_DRIVER_VHOST_VDPA, .size = sizeof(VhostVDPAState), .cleanup = vhost_vdpa_cleanup, .has_vnet_hdr = vhost_vdpa_has_vnet_hdr, .has_ufo = vhost_vdpa_has_ufo, +.check_peer_type = vhost_vdpa_check_peer_type, }; static int net_vhost_vdpa_init(NetClientState *peer, const char *device, @@ -179,24 +193,6 @@ static int net_vhost_vdpa_init(NetClientState *peer, const char *device, return ret; } -static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp) -{ -const char *name = opaque; -const char *driver, *netdev; - -driver = qemu_opt_get(opts, "driver"); -netdev = qemu_opt_get(opts, "netdev"); -if (!driver || 
!netdev) { -return 0; -} -if (strcmp(netdev, name) == 0 && -!g_str_has_prefix(driver, "virtio-net-")) { -error_setg(errp, "vhost-vdpa requires frontend driver virtio-net-*"); -return -1; -} -return 0; -} - int net_init_vhost_vdpa(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp) { @@ -204,10 +200,5 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA); opts = &netdev->u.vhost_vdpa; -/* verify net frontend */ -if (qemu_opts_foreach(qemu_find_opts("device"), net_vhost_check_net, - (char *)name, errp)) { -return -1; -} return net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name, opts->vhostdev); }
Re: [PATCH v2 02/15] net/vhost-user: Fix device compatibility check
在 2021/10/8 下午9:34, Kevin Wolf 写道: vhost-user works only with specific devices. At startup, it second guesses what the command line option handling will do and error out if it thinks a non-virtio device will attach to them. This second guessing is not only ugly, it can lead to wrong error messages ('-device floppy,netdev=foo' should complain about an unknown property, not about the wrong kind of network device being attached) and completely ignores hotplugging. Drop the old checks and implement .check_peer_type() instead to fix this. As a nice side effect, it also removes one more dependency on the legacy QemuOpts infrastructure and even reduces the code size. Signed-off-by: Kevin Wolf Acked-by: Jason Wang --- net/vhost-user.c | 41 ++--- 1 file changed, 14 insertions(+), 27 deletions(-) diff --git a/net/vhost-user.c b/net/vhost-user.c index 4a939124d2..b1a0247b59 100644 --- a/net/vhost-user.c +++ b/net/vhost-user.c @@ -198,6 +198,19 @@ static bool vhost_user_has_ufo(NetClientState *nc) return true; } +static bool vhost_user_check_peer_type(NetClientState *nc, ObjectClass *oc, + Error **errp) +{ +const char *driver = object_class_get_name(oc); + +if (!g_str_has_prefix(driver, "virtio-net-")) { +error_setg(errp, "vhost-user requires frontend driver virtio-net-*"); +return false; +} + +return true; +} + static NetClientInfo net_vhost_user_info = { .type = NET_CLIENT_DRIVER_VHOST_USER, .size = sizeof(NetVhostUserState), @@ -207,6 +220,7 @@ static NetClientInfo net_vhost_user_info = { .has_ufo = vhost_user_has_ufo, .set_vnet_be = vhost_user_set_vnet_endianness, .set_vnet_le = vhost_user_set_vnet_endianness, +.check_peer_type = vhost_user_check_peer_type, }; static gboolean net_vhost_user_watch(void *do_not_use, GIOCondition cond, @@ -397,27 +411,6 @@ static Chardev *net_vhost_claim_chardev( return chr; } -static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp) -{ -const char *name = opaque; -const char *driver, *netdev; - -driver = qemu_opt_get(opts, 
"driver"); -netdev = qemu_opt_get(opts, "netdev"); - -if (!driver || !netdev) { -return 0; -} - -if (strcmp(netdev, name) == 0 && -!g_str_has_prefix(driver, "virtio-net-")) { -error_setg(errp, "vhost-user requires frontend driver virtio-net-*"); -return -1; -} - -return 0; -} - int net_init_vhost_user(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp) { @@ -433,12 +426,6 @@ int net_init_vhost_user(const Netdev *netdev, const char *name, return -1; } -/* verify net frontend */ -if (qemu_opts_foreach(qemu_find_opts("device"), net_vhost_check_net, - (char *)name, errp)) { -return -1; -} - queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1; if (queues < 1 || queues > MAX_QUEUE_NUM) { error_setg(errp,
Re: [PATCH v2 01/15] net: Introduce NetClientInfo.check_peer_type()
在 2021/10/8 下午9:34, Kevin Wolf 写道: Some network backends (vhost-user and vhost-vdpa) work only with specific devices. At startup, they second guess what the command line option handling will do and error out if they think a non-virtio device will attach to them. This second guessing is not only ugly, it can lead to wrong error messages ('-device floppy,netdev=foo' should complain about an unknown property, not about the wrong kind of network device being attached) and completely ignores hotplugging. Add a callback where backends can check compatibility with a device when it actually tries to attach, even on hotplug. Signed-off-by: Kevin Wolf Acked-by: Jason Wang --- include/net/net.h| 2 ++ hw/core/qdev-properties-system.c | 6 ++ 2 files changed, 8 insertions(+) diff --git a/include/net/net.h b/include/net/net.h index 5d1508081f..986288eb07 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -62,6 +62,7 @@ typedef struct SocketReadState SocketReadState; typedef void (SocketReadStateFinalize)(SocketReadState *rs); typedef void (NetAnnounce)(NetClientState *); typedef bool (SetSteeringEBPF)(NetClientState *, int); +typedef bool (NetCheckPeerType)(NetClientState *, ObjectClass *, Error **); typedef struct NetClientInfo { NetClientDriver type; @@ -84,6 +85,7 @@ typedef struct NetClientInfo { SetVnetBE *set_vnet_be; NetAnnounce *announce; SetSteeringEBPF *set_steering_ebpf; +NetCheckPeerType *check_peer_type; } NetClientInfo; struct NetClientState { diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index e71f5d64d1..a91f60567a 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -431,6 +431,12 @@ static void set_netdev(Object *obj, Visitor *v, const char *name, goto out; } +if (peers[i]->info->check_peer_type) { +if (!peers[i]->info->check_peer_type(peers[i], obj->class, errp)) { +goto out; +} +} + ncs[i] = peers[i]; ncs[i]->queue_index = i; }
Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy
On Wed, Sep 8, 2021 at 11:19 PM Peter Xu wrote: > > On Wed, Sep 08, 2021 at 09:19:27AM +0100, Dr. David Alan Gilbert wrote: > > * Jason Wang (jasow...@redhat.com) wrote: > > > On Wed, Sep 8, 2021 at 2:32 AM Peter Xu wrote: > > > > > > > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos > > > > wrote: > > > > > > I don't think it is valid to unconditionally enable this feature > > > > > > due to the > > > > > > resource usage implications > > > > > > > > > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html > > > > > > > > > > > > "A zerocopy failure will return -1 with errno ENOBUFS. This > > > > > > happens > > > > > >if the socket option was not set, the socket exceeds its optmem > > > > > >limit or the user exceeds its ulimit on locked pages." > > > > > > > > > > You are correct, I unfortunately missed this part in the docs :( > > > > > > > > > > > The limit on locked pages is something that looks very likely to be > > > > > > exceeded unless you happen to be running a QEMU config that already > > > > > > implies locked memory (eg PCI assignment) > > > > > > > > > > Do you mean the limit an user has on locking memory? > > > > > > > > > > If so, that makes sense. I remember I needed to set the upper limit > > > > > of locked > > > > > memory for the user before using it, or adding a capability to qemu > > > > > before. > > > > > > > > So I'm a bit confused on why MSG_ZEROCOPY requires checking > > > > RLIMIT_MEMLOCK. > > > > > > > > The thing is IIUC that's accounting for pinned pages only with either > > > > mlock() > > > > (FOLL_MLOCK) or vfio (FOLL_PIN). > > > > > > > > I don't really think MSG_ZEROCOPY is doing that at all... I'm looking > > > > at > > > > __zerocopy_sg_from_iter() -> iov_iter_get_pages(). > > > > > > It happens probably here: > > > > > > E.g > > > > > > __ip_append_data() > > > msg_zerocopy_realloc() > > > mm_account_pinned_pages() > > > > When do they get uncounted? i.e. 
is it just the data that's in flight > > that's marked as pinned? > > I think so - there's __msg_zerocopy_callback() -> mm_unaccount_pinned_pages() > correspondingly. Thanks, Yes, and actually the memory that could be pinned is limited by the sndbuf of TCP socket. So we are fine with rlimit (e.g we don't need to pin all guest pages). Thanks > > -- > Peter Xu >
Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy
On Wed, Sep 8, 2021 at 11:24 AM Peter Xu wrote: > > On Wed, Sep 08, 2021 at 10:59:57AM +0800, Jason Wang wrote: > > On Wed, Sep 8, 2021 at 2:32 AM Peter Xu wrote: > > > > > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos > > > wrote: > > > > > I don't think it is valid to unconditionally enable this feature due > > > > > to the > > > > > resource usage implications > > > > > > > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html > > > > > > > > > > "A zerocopy failure will return -1 with errno ENOBUFS. This happens > > > > >if the socket option was not set, the socket exceeds its optmem > > > > >limit or the user exceeds its ulimit on locked pages." > > > > > > > > You are correct, I unfortunately missed this part in the docs :( > > > > > > > > > The limit on locked pages is something that looks very likely to be > > > > > exceeded unless you happen to be running a QEMU config that already > > > > > implies locked memory (eg PCI assignment) > > > > > > > > Do you mean the limit an user has on locking memory? > > > > > > > > If so, that makes sense. I remember I needed to set the upper limit of > > > > locked > > > > memory for the user before using it, or adding a capability to qemu > > > > before. > > > > > > So I'm a bit confused on why MSG_ZEROCOPY requires checking > > > RLIMIT_MEMLOCK. > > > > > > The thing is IIUC that's accounting for pinned pages only with either > > > mlock() > > > (FOLL_MLOCK) or vfio (FOLL_PIN). > > > > > > I don't really think MSG_ZEROCOPY is doing that at all... I'm looking at > > > __zerocopy_sg_from_iter() -> iov_iter_get_pages(). > > > > It happens probably here: > > > > E.g > > > > __ip_append_data() > > msg_zerocopy_realloc() > > mm_account_pinned_pages() > > Right. 
:) > > But my previous question is more about the reason behind it - I thought that's > a common GUP and it shouldn't rely on locked_vm because it should still be > temporary GUP rather than a long time GUP, IMHO that's how we use locked_vm > (we > don't account for temp GUPs but only longterm ones). IOW, I'm wondering > whether > all the rest of iov_iter_get_pages() callers should check locked_vm too, and > AFAIU they're not doing that right now.. You are probably right, but I'm not sure if it's too late to fix that. (E.g it will break existing userspace) Thanks > > Thanks, > > -- > Peter Xu >
Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy
On Wed, Sep 8, 2021 at 2:32 AM Peter Xu wrote: > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos wrote: > > > I don't think it is valid to unconditionally enable this feature due to > > > the > > > resource usage implications > > > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html > > > > > > "A zerocopy failure will return -1 with errno ENOBUFS. This happens > > >if the socket option was not set, the socket exceeds its optmem > > >limit or the user exceeds its ulimit on locked pages." > > > > You are correct, I unfortunately missed this part in the docs :( > > > > > The limit on locked pages is something that looks very likely to be > > > exceeded unless you happen to be running a QEMU config that already > > > implies locked memory (eg PCI assignment) > > > > Do you mean the limit an user has on locking memory? > > > > If so, that makes sense. I remember I needed to set the upper limit of > > locked > > memory for the user before using it, or adding a capability to qemu before. > > So I'm a bit confused on why MSG_ZEROCOPY requires checking RLIMIT_MEMLOCK. > > The thing is IIUC that's accounting for pinned pages only with either mlock() > (FOLL_MLOCK) or vfio (FOLL_PIN). > > I don't really think MSG_ZEROCOPY is doing that at all... I'm looking at > __zerocopy_sg_from_iter() -> iov_iter_get_pages(). It happens probably here: E.g __ip_append_data() msg_zerocopy_realloc() mm_account_pinned_pages() Thanks > > Say, I think there're plenty of iov_iter_get_pages() callers and normal GUPs, > I > think most of them do no need such accountings. > > -- > Peter Xu >
Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy
在 2021/9/1 下午11:35, Peter Xu 写道: On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote: On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote: On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote: On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote: Call qio_channel_set_zerocopy(true) in the start of every multifd thread. Change the send_write() interface of multifd, allowing it to pass down flags for qio_channel_write*(). Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the other data being sent at the default copying approach. Signed-off-by: Leonardo Bras --- migration/multifd-zlib.c | 7 --- migration/multifd-zstd.c | 7 --- migration/multifd.c | 9 ++--- migration/multifd.h | 3 ++- 4 files changed, 16 insertions(+), 10 deletions(-) @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque) } if (used) { -ret = multifd_send_state->ops->send_write(p, used, _err); +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY, + _err); I don't think it is valid to unconditionally enable this feature due to the resource usage implications https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html "A zerocopy failure will return -1 with errno ENOBUFS. This happens if the socket option was not set, the socket exceeds its optmem limit or the user exceeds its ulimit on locked pages." The limit on locked pages is something that looks very likely to be exceeded unless you happen to be running a QEMU config that already implies locked memory (eg PCI assignment) Yes it would be great to be a migration capability in parallel to multifd. At initial phase if it's easy to be implemented on multi-fd only, we can add a dependency between the caps. In the future we can remove that dependency when the code is ready to go without multifd. Thanks, Also, I'm wondering how zerocopy support interacts with kernel support for kTLS and multipath-TCP, both of which we want to be able to use with migration. 
Copying Jason Wang for net implications between these features on the kernel side. Note that MSG_ZEROCOPY is contributed by Google :) and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS). I think they can. Anyway, the kernel can choose to do data copy when necessary. Note that "zerocopy" is probably not the correct term here. What's better is "Enable MSG_ZEROCOPY", since: 1) the kernel supports various kinds of zerocopy; for TX, it has supported sendfile() for many years. 2) MSG_ZEROCOPY is only used for TX, not RX. 3) TCP RX zerocopy is only supported via mmap(), which requires some extra configuration, e.g. 4K MTU and driver support for header split. [1] https://www.youtube.com/watch?v=_ZfiQGWFvg0 Thanks From the safe side we may want to only enable one of them until we prove they'll work together, I guess.. Not an immediate concern, as I don't really think any of them is really explicitly supported in qemu. kTLS may be implicitly included by a new gnutls, but we need to mark TLS and ZEROCOPY mutually exclusive anyway, because at least the userspace TLS code of gnutls won't have a way to maintain the TLS buffers used by zerocopy. So at least we need some knob to detect whether kTLS is enabled in gnutls.
Re: [PATCH v6 4/6] qmp: add QMP command x-debug-virtio-queue-status
在 2021/8/26 下午2:25, Jonah Palmer 写道: Hi Jason, could I get your thoughts on this implementation question below? I'm not too sure on how I should proceed determining if vhost is active or not. Thank you! Jonah On 7/26/21 5:33 AM, Jonah Palmer wrote: On 7/22/21 5:22 AM, Jason Wang wrote: 在 2021/7/21 下午4:59, Jonah Palmer 写道: On 7/13/21 10:37 PM, Jason Wang wrote: 在 2021/7/12 下午6:35, Jonah Palmer 写道: From: Laurent Vivier This new command shows internal status of a VirtQueue. (vrings and indexes). Signed-off-by: Laurent Vivier Signed-off-by: Jonah Palmer --- hw/virtio/virtio-stub.c | 6 +++ hw/virtio/virtio.c | 37 ++ qapi/virtio.json | 102 3 files changed, 145 insertions(+) [Jonah: Added 'device-type' field to VirtQueueStatus and qmp command x-debug-virtio-queue-status.] diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c index ddb592f..3c1bf17 100644 --- a/hw/virtio/virtio-stub.c +++ b/hw/virtio/virtio-stub.c @@ -17,3 +17,9 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* path, Error **errp) { return qmp_virtio_unsupported(errp); } + +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, + uint16_t queue, Error **errp) +{ + return qmp_virtio_unsupported(errp); +} diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 81a0ee8..ccd4371 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -3935,6 +3935,43 @@ static VirtIODevice *virtio_device_find(const char *path) return NULL; } +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, + uint16_t queue, Error **errp) +{ + VirtIODevice *vdev; + VirtQueueStatus *status; + + vdev = virtio_device_find(path); + if (vdev == NULL) { + error_setg(errp, "Path %s is not a VirtIO device", path); + return NULL; + } + + if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) { + error_setg(errp, "Invalid virtqueue number %d", queue); + return NULL; + } + + status = g_new0(VirtQueueStatus, 1); + status->device_type = qapi_enum_parse(_lookup, vdev->name, + 
VIRTIO_TYPE_UNKNOWN, NULL); + status->queue_index = vdev->vq[queue].queue_index; + status->inuse = vdev->vq[queue].inuse; + status->vring_num = vdev->vq[queue].vring.num; + status->vring_num_default = vdev->vq[queue].vring.num_default; + status->vring_align = vdev->vq[queue].vring.align; + status->vring_desc = vdev->vq[queue].vring.desc; + status->vring_avail = vdev->vq[queue].vring.avail; + status->vring_used = vdev->vq[queue].vring.used; + status->last_avail_idx = vdev->vq[queue].last_avail_idx; As mentioned in previous versions. We need add vhost support otherwise the value here is wrong. Got it. I'll add a case to determine if vhost is active for a given device. So, in the case that vhost is active, should I just not print out the value or would I substitute it with another value (whatever that might be)? You can query the vhost for those index. (vhost_get_vring_base()) Same question for shadow_avail_idx below as well. It's an implementation specific. I think we can simply not show it if vhost is enabled. Thanks Ah I see, thank you! So, it appears to me that it's not very easy to get the struct vhost_dev pointer from struct VirtIODevice to indicate whether or not vhost is active, e.g. there's no virtio class-independent way to get struct vhost_dev. I was thinking of adding an op/callback function to struct VirtioDeviceClass, e.g. bool has_vhost(VirtIODevice *vdev), and implement it for each virtio class (net, scsi, blk, etc.). For example, for virtio-net, maybe it'd be something like: bool has_vhost(VirtIODevice *vdev) { VirtIONet *n = VIRTIO_NET(vdev); NetClientState *nc = qemu_get_queue(n->nic); return nc->peer ? get_vhost_net(nc->peer) : false; } Something like this, yes. Also, for getting the last_avail_idx, I was also thinking of adding another op/callback to struct VirtioDeviceClass, e.g. 
unsigned int get_last_avail_idx(VirtIODevice *vdev, unsigned int vq_idx) that finds if vhost is active or not and either gets last_avail_idx from virtio directly or through vhost (e.g. vhost_dev->vhost_ops->vhost_get_vring_base()). So I think instead of has_vhost, we probably need get_vhost() to have a pointer to vhost_dev. Then we can do anything we want other than a dedicated interface just for avail index. Thanks I wanted to run this by you and get your opinion on this before I started implementing it in code. Let me know what you think about this. Jonah Jonah + status->shadow_avail_idx = vdev->vq[queue].shadow_avail_idx; The shadow index is something that is implementation specific e.g in the case of vhost it's kind of meaningless.
Re: [PATCH v6 4/6] qmp: add QMP command x-debug-virtio-queue-status
在 2021/7/21 下午4:59, Jonah Palmer 写道: On 7/13/21 10:37 PM, Jason Wang wrote: 在 2021/7/12 下午6:35, Jonah Palmer 写道: From: Laurent Vivier This new command shows internal status of a VirtQueue. (vrings and indexes). Signed-off-by: Laurent Vivier Signed-off-by: Jonah Palmer --- hw/virtio/virtio-stub.c | 6 +++ hw/virtio/virtio.c | 37 ++ qapi/virtio.json | 102 3 files changed, 145 insertions(+) [Jonah: Added 'device-type' field to VirtQueueStatus and qmp command x-debug-virtio-queue-status.] diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c index ddb592f..3c1bf17 100644 --- a/hw/virtio/virtio-stub.c +++ b/hw/virtio/virtio-stub.c @@ -17,3 +17,9 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* path, Error **errp) { return qmp_virtio_unsupported(errp); } + +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, + uint16_t queue, Error **errp) +{ + return qmp_virtio_unsupported(errp); +} diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 81a0ee8..ccd4371 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -3935,6 +3935,43 @@ static VirtIODevice *virtio_device_find(const char *path) return NULL; } +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, + uint16_t queue, Error **errp) +{ + VirtIODevice *vdev; + VirtQueueStatus *status; + + vdev = virtio_device_find(path); + if (vdev == NULL) { + error_setg(errp, "Path %s is not a VirtIO device", path); + return NULL; + } + + if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) { + error_setg(errp, "Invalid virtqueue number %d", queue); + return NULL; + } + + status = g_new0(VirtQueueStatus, 1); + status->device_type = qapi_enum_parse(_lookup, vdev->name, + VIRTIO_TYPE_UNKNOWN, NULL); + status->queue_index = vdev->vq[queue].queue_index; + status->inuse = vdev->vq[queue].inuse; + status->vring_num = vdev->vq[queue].vring.num; + status->vring_num_default = vdev->vq[queue].vring.num_default; + status->vring_align = vdev->vq[queue].vring.align; + 
status->vring_desc = vdev->vq[queue].vring.desc; + status->vring_avail = vdev->vq[queue].vring.avail; + status->vring_used = vdev->vq[queue].vring.used; + status->last_avail_idx = vdev->vq[queue].last_avail_idx; As mentioned in previous versions. We need add vhost support otherwise the value here is wrong. Got it. I'll add a case to determine if vhost is active for a given device. So, in the case that vhost is active, should I just not print out the value or would I substitute it with another value (whatever that might be)? You can query the vhost for those index. (vhost_get_vring_base()) Same question for shadow_avail_idx below as well. It's an implementation specific. I think we can simply not show it if vhost is enabled. Thanks Jonah + status->shadow_avail_idx = vdev->vq[queue].shadow_avail_idx; The shadow index is something that is implementation specific e.g in the case of vhost it's kind of meaningless. Thanks + status->used_idx = vdev->vq[queue].used_idx; + status->signalled_used = vdev->vq[queue].signalled_used; + status->signalled_used_valid = vdev->vq[queue].signalled_used_valid; + + return status; +} + #define CONVERT_FEATURES(type, map) \ ({ \ type *list = NULL; \ diff --git a/qapi/virtio.json b/qapi/virtio.json index 78873cd..7007e0c 100644 --- a/qapi/virtio.json +++ b/qapi/virtio.json @@ -406,3 +406,105 @@ 'data': { 'path': 'str' }, 'returns': 'VirtioStatus' } + +## +# @VirtQueueStatus: +# +# Status of a VirtQueue +# +# @device-type: VirtIO device type +# +# @queue-index: VirtQueue queue_index +# +# @inuse: VirtQueue inuse +# +# @vring-num: VirtQueue vring.num +# +# @vring-num-default: VirtQueue vring.num_default +# +# @vring-align: VirtQueue vring.align +# +# @vring-desc: VirtQueue vring.desc +# +# @vring-avail: VirtQueue vring.avail +# +# @vring-used: VirtQueue vring.used +# +# @last-avail-idx: VirtQueue last_avail_idx +# +# @shadow-avail-idx: VirtQueue shadow_avail_idx +# +# @used-idx: VirtQueue used_idx +# +# @signalled-used: VirtQueue 
signalled_used +# +# @signalled-used-valid: VirtQueue signalled_used_valid +# +# Since: 6.1 +# +## + +{ 'struct': 'VirtQueueStatus', + 'data': { + 'device-type': 'VirtioType', + 'queue-index': 'uint16', + 'inuse': 'uint32', + 'vring-num': 'int', + 'vring-num-default': 'int', + 'vring-align': 'int', + 'vring-desc': 'uint64', + 'vring-avail': 'uint64', + 'vring-used': 'uint64', + 'last-avail-i
Re: [PATCH v6 6/6] hmp: add virtio commands
On 2021/7/21 5:11 PM, Jonah Palmer wrote: On 7/13/21 10:40 PM, Jason Wang wrote: On 2021/7/12 6:35 PM, Jonah Palmer wrote: +void hmp_virtio_queue_status(Monitor *mon, const QDict *qdict) +{ + Error *err = NULL; + const char *path = qdict_get_try_str(qdict, "path"); + int queue = qdict_get_int(qdict, "queue"); + VirtQueueStatus *s = qmp_x_debug_virtio_queue_status(path, queue, &err); + + if (err != NULL) { + hmp_handle_error(mon, err); + return; + } + + monitor_printf(mon, "%s:\n", path); + monitor_printf(mon, " device_type: %s\n", + VirtioType_str(s->device_type)); + monitor_printf(mon, " index: %d\n", s->queue_index); + monitor_printf(mon, " inuse: %d\n", s->inuse); + monitor_printf(mon, " last_avail_idx: %d (%"PRId64" %% %"PRId64")\n", + s->last_avail_idx, s->last_avail_idx % s->vring_num, + s->vring_num); + monitor_printf(mon, " shadow_avail_idx: %d (%"PRId64" %% %"PRId64")\n", + s->shadow_avail_idx, s->shadow_avail_idx % s->vring_num, + s->vring_num); + monitor_printf(mon, " used_idx: %d (%"PRId64" %% %"PRId64")\n", + s->used_idx, s->used_idx % s->vring_num, s->vring_num); The modulo information does not apply to the packed ring, where the queue size does not have to be a power of 2. Doesn't modulo work for any integer, regardless of whether it's a power of 2 or not? Could you clarify this for me? For packed ring, the index doesn't increase freely; it's always smaller than the virtqueue size. So showing the modulo arithmetic seems useless, since the device or driver doesn't use modulo for calculating the real offset. Thanks Thank you,
Re: [PATCH v6 0/6] hmp, qmp: Add some commands to introspect virtio devices
在 2021/7/21 下午4:53, Jonah Palmer 写道: Hi Jason. My apologies for the delayed response, several work-related things came up recently, but they're slowing down now so I'm turning my attention these patches to get taken care of. A few questions and comments below (and in other following patches): On 7/13/21 10:42 PM, Jason Wang wrote: 在 2021/7/12 下午6:35, Jonah Palmer 写道: Dump the information of the head element of the third queue of virtio-scsi: (qemu) virtio queue-element /machine/peripheral-anon/device[3]/virtio-backend 3 index: 122 ndescs: 3 descs: addr 0x7302d000 len 4096 (write), addr 0x3c951763 len 108 (write, next), addr 0x3c951728 len 51 (next) I think it would be nice if we can show driver area and device area as well here. Sure thing. And I apologize if it's obvious (I'm relatively new to virtio), but how can I expose the driver area? So the spec defines three parts: the device area, the driver area, and the descriptor area. And they are all located in the guest memory. I understand that virtio devices are part of the Qemu process, but I also thought that virtio drivers are in the guest's kernel, which I don't believe I can see into from Qemu (or, at least, it's not obvious to me). It works like how you access the descriptor ring (descriptor area). Thanks Jonah Thanks
Re: [PATCH v6 0/6] hmp, qmp: Add some commands to introspect virtio devices
在 2021/7/12 下午6:35, Jonah Palmer 写道: Dump the information of the head element of the third queue of virtio-scsi: (qemu) virtio queue-element /machine/peripheral-anon/device[3]/virtio-backend 3 index: 122 ndescs: 3 descs: addr 0x7302d000 len 4096 (write), addr 0x3c951763 len 108 (write, next), addr 0x3c951728 len 51 (next) I think it would be nice if we can show driver area and device area as well here. Thanks
Re: [PATCH v6 6/6] hmp: add virtio commands
On 2021/7/12 6:35 PM, Jonah Palmer wrote: +void hmp_virtio_queue_status(Monitor *mon, const QDict *qdict) +{ +Error *err = NULL; +const char *path = qdict_get_try_str(qdict, "path"); +int queue = qdict_get_int(qdict, "queue"); +VirtQueueStatus *s = qmp_x_debug_virtio_queue_status(path, queue, &err); + +if (err != NULL) { +hmp_handle_error(mon, err); +return; +} + +monitor_printf(mon, "%s:\n", path); +monitor_printf(mon, " device_type: %s\n", + VirtioType_str(s->device_type)); +monitor_printf(mon, " index:%d\n", s->queue_index); +monitor_printf(mon, " inuse:%d\n", s->inuse); +monitor_printf(mon, " last_avail_idx: %d (%"PRId64" %% %"PRId64")\n", + s->last_avail_idx, s->last_avail_idx % s->vring_num, + s->vring_num); +monitor_printf(mon, " shadow_avail_idx: %d (%"PRId64" %% %"PRId64")\n", + s->shadow_avail_idx, s->shadow_avail_idx % s->vring_num, + s->vring_num); +monitor_printf(mon, " used_idx: %d (%"PRId64" %% %"PRId64")\n", + s->used_idx, s->used_idx % s->vring_num, s->vring_num); The modulo information does not apply to the packed ring, where the queue size does not have to be a power of 2. Thanks +monitor_printf(mon, " signalled_used: %d (%"PRId64" %% %"PRId64")\n", + s->signalled_used, s->signalled_used % s->vring_num, + s->vring_num); +monitor_printf(mon, " signalled_used_valid: %d\n", s->signalled_used_valid); +monitor_printf(mon, " VRing:\n"); +monitor_printf(mon, "num: %"PRId64"\n", s->vring_num); +monitor_printf(mon, "num_default: %"PRId64"\n", s->vring_num_default); +monitor_printf(mon, "align: %"PRId64"\n", s->vring_align); +monitor_printf(mon, "desc:0x%016"PRIx64"\n", s->vring_desc); +monitor_printf(mon, "avail: 0x%016"PRIx64"\n", s->vring_avail); +monitor_printf(mon, "used:0x%016"PRIx64"\n", s->vring_used); + +qapi_free_VirtQueueStatus(s);
Re: [PATCH v6 4/6] qmp: add QMP command x-debug-virtio-queue-status
On 2021/7/12 6:35 PM, Jonah Palmer wrote: From: Laurent Vivier This new command shows internal status of a VirtQueue. (vrings and indexes). Signed-off-by: Laurent Vivier Signed-off-by: Jonah Palmer --- hw/virtio/virtio-stub.c | 6 +++ hw/virtio/virtio.c | 37 ++ qapi/virtio.json | 102 3 files changed, 145 insertions(+) [Jonah: Added 'device-type' field to VirtQueueStatus and qmp command x-debug-virtio-queue-status.] diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c index ddb592f..3c1bf17 100644 --- a/hw/virtio/virtio-stub.c +++ b/hw/virtio/virtio-stub.c @@ -17,3 +17,9 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* path, Error **errp) { return qmp_virtio_unsupported(errp); } + +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, + uint16_t queue, Error **errp) +{ +return qmp_virtio_unsupported(errp); +} diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 81a0ee8..ccd4371 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -3935,6 +3935,43 @@ static VirtIODevice *virtio_device_find(const char *path) return NULL; } +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, + uint16_t queue, Error **errp) +{ +VirtIODevice *vdev; +VirtQueueStatus *status; + +vdev = virtio_device_find(path); +if (vdev == NULL) { +error_setg(errp, "Path %s is not a VirtIO device", path); +return NULL; +} + +if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) { +error_setg(errp, "Invalid virtqueue number %d", queue); +return NULL; +} + +status = g_new0(VirtQueueStatus, 1); +status->device_type = qapi_enum_parse(&VirtioType_lookup, vdev->name, + VIRTIO_TYPE_UNKNOWN, NULL); +status->queue_index = vdev->vq[queue].queue_index; +status->inuse = vdev->vq[queue].inuse; +status->vring_num = vdev->vq[queue].vring.num; +status->vring_num_default = vdev->vq[queue].vring.num_default; +status->vring_align = vdev->vq[queue].vring.align; +status->vring_desc = vdev->vq[queue].vring.desc; +status->vring_avail = vdev->vq[queue].vring.avail;
+status->vring_used = vdev->vq[queue].vring.used; +status->last_avail_idx = vdev->vq[queue].last_avail_idx; As mentioned in previous versions, we need to add vhost support; otherwise the value here is wrong. +status->shadow_avail_idx = vdev->vq[queue].shadow_avail_idx; The shadow index is something that is implementation specific, e.g. in the case of vhost it's kind of meaningless. Thanks +status->used_idx = vdev->vq[queue].used_idx; +status->signalled_used = vdev->vq[queue].signalled_used; +status->signalled_used_valid = vdev->vq[queue].signalled_used_valid; + +return status; +} + #define CONVERT_FEATURES(type, map)\ ({ \ type *list = NULL; \ diff --git a/qapi/virtio.json b/qapi/virtio.json index 78873cd..7007e0c 100644 --- a/qapi/virtio.json +++ b/qapi/virtio.json @@ -406,3 +406,105 @@ 'data': { 'path': 'str' }, 'returns': 'VirtioStatus' } + +## +# @VirtQueueStatus: +# +# Status of a VirtQueue +# +# @device-type: VirtIO device type +# +# @queue-index: VirtQueue queue_index +# +# @inuse: VirtQueue inuse +# +# @vring-num: VirtQueue vring.num +# +# @vring-num-default: VirtQueue vring.num_default +# +# @vring-align: VirtQueue vring.align +# +# @vring-desc: VirtQueue vring.desc +# +# @vring-avail: VirtQueue vring.avail +# +# @vring-used: VirtQueue vring.used +# +# @last-avail-idx: VirtQueue last_avail_idx +# +# @shadow-avail-idx: VirtQueue shadow_avail_idx +# +# @used-idx: VirtQueue used_idx +# +# @signalled-used: VirtQueue signalled_used +# +# @signalled-used-valid: VirtQueue signalled_used_valid +# +# Since: 6.1 +# +## + +{ 'struct': 'VirtQueueStatus', + 'data': { +'device-type': 'VirtioType', +'queue-index': 'uint16', +'inuse': 'uint32', +'vring-num': 'int', +'vring-num-default': 'int', +'vring-align': 'int', +'vring-desc': 'uint64', +'vring-avail': 'uint64', +'vring-used': 'uint64', +'last-avail-idx': 'uint16', +'shadow-avail-idx': 'uint16', +'used-idx': 'uint16', +'signalled-used': 'uint16', +'signalled-used-valid': 'uint16' + } +} + +## +# 
@x-debug-virtio-queue-status: +# +# Return the status of a given VirtQueue +# +# @path: QOBject path of the VirtIODevice +# +# @queue: queue number to examine +# +# Returns: Status of the VirtQueue +# +# Since: 6.1 +# +# Example: +# +# -> { "execute": "x-debug-virtio-queue-status", +# "arguments": { +# "path": "/machine/peripheral-anon/device[3]/virtio-backend", +# "queue": 0 +# } +# } +# <- { "return": { +#
Re: [PATCH 3/3] virtio-net: Constify VirtIOFeature feature_sizes[]
On 2021/5/11 6:41 PM, Philippe Mathieu-Daudé wrote: Signed-off-by: Philippe Mathieu-Daudé --- hw/net/virtio-net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 66b9ff45118..6b7e8dd04ef 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -89,7 +89,7 @@ VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \ VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) -static VirtIOFeature feature_sizes[] = { +static const VirtIOFeature feature_sizes[] = { {.flags = 1ULL << VIRTIO_NET_F_MAC, .end = endof(struct virtio_net_config, mac)}, {.flags = 1ULL << VIRTIO_NET_F_STATUS, Acked-by: Jason Wang
Re: [PATCH 2/3] virtio-blk: Constify VirtIOFeature feature_sizes[]
On 2021/5/11 6:41 PM, Philippe Mathieu-Daudé wrote: Signed-off-by: Philippe Mathieu-Daudé --- hw/block/virtio-blk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index d28979efb8d..f139cd7cc9c 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -40,7 +40,7 @@ * Starting from the discard feature, we can use this array to properly * set the config size depending on the features enabled. */ -static VirtIOFeature feature_sizes[] = { +static const VirtIOFeature feature_sizes[] = { {.flags = 1ULL << VIRTIO_BLK_F_DISCARD, .end = endof(struct virtio_blk_config, discard_sector_alignment)}, {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES, Acked-by: Jason Wang
Re: [PATCH 1/3] hw/virtio: Pass virtio_feature_get_config_size() a const argument
On 2021/5/11 6:41 PM, Philippe Mathieu-Daudé wrote: The VirtIOFeature structure isn't modified; mark it const. Signed-off-by: Philippe Mathieu-Daudé Acked-by: Jason Wang --- include/hw/virtio/virtio.h | 2 +- hw/virtio/virtio.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index b7ece7a6a89..8bab9cfb750 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -43,7 +43,7 @@ typedef struct VirtIOFeature { size_t end; } VirtIOFeature; -size_t virtio_feature_get_config_size(VirtIOFeature *features, +size_t virtio_feature_get_config_size(const VirtIOFeature *features, uint64_t host_features); typedef struct VirtQueue VirtQueue; diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 9e13cb9e3ad..e02544b2df7 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -2981,7 +2981,7 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val) return ret; } -size_t virtio_feature_get_config_size(VirtIOFeature *feature_sizes, +size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes, uint64_t host_features) { size_t config_size = 0;
Re: [PATCH 15/23] net: Avoid dynamic stack allocation
在 2021/5/6 上午5:10, Philippe Mathieu-Daudé 写道: Use autofree heap allocation instead of variable-length array on the stack. Signed-off-by: Philippe Mathieu-Daudé Acked-by: Jason Wang --- hw/net/fsl_etsec/rings.c | 9 - hw/net/rocker/rocker_of_dpa.c | 2 +- net/dump.c| 2 +- net/tap.c | 2 +- 4 files changed, 7 insertions(+), 8 deletions(-) diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c index 8f084464155..1abdcb5a29c 100644 --- a/hw/net/fsl_etsec/rings.c +++ b/hw/net/fsl_etsec/rings.c @@ -381,8 +381,6 @@ static void fill_rx_bd(eTSEC *etsec, uint16_t to_write; hwaddr bufptr = bd->bufptr + ((hwaddr)(etsec->regs[TBDBPH].value & 0xF) << 32); -uint8_t padd[etsec->rx_padding]; -uint8_t rem; RING_DEBUG("eTSEC fill Rx buffer @ 0x%016" HWADDR_PRIx " size:%zu(padding + crc:%u) + fcb:%u\n", @@ -423,11 +421,12 @@ static void fill_rx_bd(eTSEC *etsec, /* The remaining bytes are only for padding which is not actually * allocated in the data buffer. */ - -rem = MIN(etsec->regs[MRBLR].value - bd->length, etsec->rx_padding); +uint8_t rem = MIN(etsec->regs[MRBLR].value - bd->length, + etsec->rx_padding); if (rem > 0) { -memset(padd, 0x0, sizeof(padd)); +g_autofree uint8_t *padd = g_malloc0(etsec->rx_padding); + etsec->rx_padding -= rem; *size -= rem; bd->length+= rem; diff --git a/hw/net/rocker/rocker_of_dpa.c b/hw/net/rocker/rocker_of_dpa.c index b3b8c5bb6d4..3e400ceaef7 100644 --- a/hw/net/rocker/rocker_of_dpa.c +++ b/hw/net/rocker/rocker_of_dpa.c @@ -1043,7 +1043,7 @@ static void of_dpa_flow_ig_tbl(OfDpaFlowContext *fc, uint32_t tbl_id) static ssize_t of_dpa_ig(World *world, uint32_t pport, const struct iovec *iov, int iovcnt) { -struct iovec iov_copy[iovcnt + 2]; +g_autofree struct iovec *iov_copy = g_new(struct iovec, iovcnt + 2); OfDpaFlowContext fc = { .of_dpa = world_private(world), .in_pport = pport, diff --git a/net/dump.c b/net/dump.c index 4d538d82a69..b830302e27d 100644 --- a/net/dump.c +++ b/net/dump.c @@ -68,7 +68,7 @@ static ssize_t 
dump_receive_iov(DumpState *s, const struct iovec *iov, int cnt) int64_t ts; int caplen; size_t size = iov_size(iov, cnt); -struct iovec dumpiov[cnt + 1]; +g_autofree struct iovec *dumpiov = g_new(struct iovec, cnt + 1); /* Early return in case of previous error. */ if (s->fd < 0) { diff --git a/net/tap.c b/net/tap.c index bae895e2874..2b9ed8a2cd8 100644 --- a/net/tap.c +++ b/net/tap.c @@ -120,7 +120,7 @@ static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec *iov, { TAPState *s = DO_UPCAST(TAPState, nc, nc); const struct iovec *iovp = iov; -struct iovec iov_copy[iovcnt + 1]; +g_autofree struct iovec *iov_copy = g_new(struct iovec, iovcnt + 1); struct virtio_net_hdr_mrg_rxbuf hdr = { }; if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
Re: [PATCH 09/23] hw/net/e1000e_core: Use definition to avoid dynamic stack allocation
On 2021/5/6 5:10 AM, Philippe Mathieu-Daudé wrote: The compiler isn't clever enough to figure out that 'min_buf_size' is a constant, so help it by using a definition instead. Signed-off-by: Philippe Mathieu-Daudé Acked-by: Jason Wang --- hw/net/e1000e_core.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index b75f2ab8fc1..4b1d4521a50 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -1621,15 +1621,16 @@ e1000e_rx_fix_l4_csum(E1000ECore *core, struct NetRxPkt *pkt) } } +/* Min. octets in an ethernet frame sans FCS */ +#define MIN_BUF_SIZE 60 + ssize_t e1000e_receive_iov(E1000ECore *core, const struct iovec *iov, int iovcnt) { static const int maximum_ethernet_hdr_len = (14 + 4); -/* Min. octets in an ethernet frame sans FCS */ -static const int min_buf_size = 60; uint32_t n = 0; -uint8_t min_buf[min_buf_size]; +uint8_t min_buf[MIN_BUF_SIZE]; struct iovec min_iov; uint8_t *filter_buf; size_t size, orig_size;
Re: [PATCH 0/6] Add debug interface to kick/call on purpose
On 2021/4/8 1:51 PM, Dongli Zhang wrote: On 4/6/21 7:20 PM, Jason Wang wrote: On 2021/4/7 7:27 AM, Dongli Zhang wrote: This will answer your question that "Can it bypass the masking?". For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, writing to the eventfd is not able to bypass masking because masking unregisters the eventfd, so a write to the eventfd does not take effect. However, it is possible to bypass masking for vhost-net because vhost-net registered a specific masked_notifier eventfd in order to mask irq. A write to the original eventfd still takes effect. We may leave the user to decide whether to write to 'masked_notifier' or the original 'guest_notifier' for vhost-net. My fault here. To write to masked_notifier will always be masked:( Only when there's no bug in the qemu. If it is EventNotifier level, we will not care whether the EventNotifier is masked or not. It just provides an interface to write to EventNotifier. Yes. To dump the MSI-x table for both virtio and vfio will help confirm if the vector is masked. That would be helpful as well. It's probably better to extend the "info pci" command. Thanks I will see whether to add this to "info pci" (introduce a new arg option to "info pci") or to introduce a new command. It's better to just reuse "info pci". About the EventNotifier, I will classify them as guest notifier or host notifier so that it will be much easier for the user to tell if the eventfd is for injecting IRQ or kicking the doorbell. Sounds good. Thank you very much for all suggestions! Dongli Zhang Thanks Thank you very much! Dongli Zhang
Re: [PATCH 0/6] Add debug interface to kick/call on purpose
在 2021/4/7 上午7:27, Dongli Zhang 写道: This will answer your question that "Can it bypass the masking?". For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, to write to eventfd is not able to bypass masking because masking is to unregister the eventfd. To write to eventfd does not take effect. However, it is possible to bypass masking for vhost-net because vhost-net registered a specific masked_notifier eventfd in order to mask irq. To write to original eventfd still takes effect. We may leave the user to decide whether to write to 'masked_notifier' or original 'guest_notifier' for vhost-net. My fault here. To write to masked_notifier will always be masked:( Only when there's no bug in the qemu. If it is EventNotifier level, we will not care whether the EventNotifier is masked or not. It just provides an interface to write to EventNotifier. Yes. To dump the MSI-x table for both virtio and vfio will help confirm if the vector is masked. That would be helpful as well. It's probably better to extend "info pci" command. Thanks Thank you very much! Dongli Zhang
Re: [PATCH 0/6] Add debug interface to kick/call on purpose
在 2021/4/6 下午4:43, Dongli Zhang 写道: On 4/5/21 6:55 PM, Jason Wang wrote: 在 2021/4/6 上午4:00, Dongli Zhang 写道: On 4/1/21 8:47 PM, Jason Wang wrote: 在 2021/3/30 下午3:29, Dongli Zhang 写道: On 3/28/21 8:56 PM, Jason Wang wrote: 在 2021/3/27 上午5:16, Dongli Zhang 写道: Hi Jason, On 3/26/21 12:24 AM, Jason Wang wrote: 在 2021/3/26 下午1:44, Dongli Zhang 写道: The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to the loss of doorbell kick, e.g., https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$ ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler"). This patch introduces a new debug interface 'DeviceEvent' to DeviceClass to help narrow down if the issue is due to loss of irq/kick. So far the new interface handles only two events: 'call' and 'kick'. Any device (e.g., virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X or legacy IRQ). The 'call' is to inject irq on purpose by admin for a specific device (e.g., vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell on purpose by admin at QEMU/host side for a specific device. This device can be used as a workaround if call/kick is lost due to virtualization software (e.g., kernel or QEMU) issue. We may also implement the interface for VFIO PCI, e.g., to write to VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM on purpose. This is considered future work once the virtio part is done. Below is from live crash analysis. Initially, the queue=2 has count=15 for 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no used available. We suspect this is because vhost-scsi was not notified by VM. In order to narrow down and analyze the issue, we use live crash to dump the current counter of eventfd for queue=2. 
crash> eventfd_ctx 8f67f6bbe700 struct eventfd_ctx { kref = { refcount = { refs = { counter = 4 } } }, wqh = { lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } }, head = { next = 0x8f841dc08e18, prev = 0x8f841dc08e18 } }, count = 15, ---> eventfd is 15 !!! flags = 526336 } Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic with this interface. { "execute": "x-debug-device-event", "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick", "queue": 2 } } The counter is increased to 16. Suppose the hang issue is resolved, it indicates something bad is in software that the 'kick' is lost. What do you mean by "software" here? And it looks to me you're testing whether event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not sure how much value could we gain from a dedicated debug interface like this consider there're a lot of exisinting general purpose debugging method like tracing or gdb. I'd say the path from virtio_queue_notify() to event_notifier_set() is only a very small fraction of the process of virtqueue kick which is unlikey to be buggy. Consider usually the ioeventfd will be offloaded to KVM, it's more a chance that something is wrong in setuping ioeventfd instead of here. Irq is even more complicated. Thank you very much! I am not testing whether event_notifier_set() is called by virtio_queue_notify(). The 'software' indicates the data processing and event notification mechanism involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event() erroneously returns false due to corrupted ring buffer status. So there could be several factors that may block the notification: 1) eventfd bug (ioeventfd vs irqfd) 2) wrong virtqueue state (either driver or device) 3) missing barriers (either driver or device) 4) Qemu bug (irqchip and routing) ... This is not only about whether notification is blocked. 
It can also be used to help narrow down and understand if there is any suspicious issue in blk-mq/scsi/netdev/napi code. The PV drivers are not only drivers following virtio spec. It is closely related to many of other kernel components. Suppose IO was recovered after we inject an IRQ to vhost-scsi on purpose, we will be able to analyze what may happen along the IO completion path starting from when /where the IRQ is injected ... perhaps the root cause is not with virtio but blk-mq/scsi (this is just an example). In addition, this idea should help for vfi
Re: [PATCH 0/6] Add debug interface to kick/call on purpose
在 2021/4/6 上午4:00, Dongli Zhang 写道: On 4/1/21 8:47 PM, Jason Wang wrote: 在 2021/3/30 下午3:29, Dongli Zhang 写道: On 3/28/21 8:56 PM, Jason Wang wrote: 在 2021/3/27 上午5:16, Dongli Zhang 写道: Hi Jason, On 3/26/21 12:24 AM, Jason Wang wrote: 在 2021/3/26 下午1:44, Dongli Zhang 写道: The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to the loss of doorbell kick, e.g., https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$ ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler"). This patch introduces a new debug interface 'DeviceEvent' to DeviceClass to help narrow down if the issue is due to loss of irq/kick. So far the new interface handles only two events: 'call' and 'kick'. Any device (e.g., virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X or legacy IRQ). The 'call' is to inject irq on purpose by admin for a specific device (e.g., vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell on purpose by admin at QEMU/host side for a specific device. This device can be used as a workaround if call/kick is lost due to virtualization software (e.g., kernel or QEMU) issue. We may also implement the interface for VFIO PCI, e.g., to write to VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM on purpose. This is considered future work once the virtio part is done. Below is from live crash analysis. Initially, the queue=2 has count=15 for 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no used available. We suspect this is because vhost-scsi was not notified by VM. In order to narrow down and analyze the issue, we use live crash to dump the current counter of eventfd for queue=2. 
crash> eventfd_ctx 8f67f6bbe700 struct eventfd_ctx { kref = { refcount = { refs = { counter = 4 } } }, wqh = { lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } }, head = { next = 0x8f841dc08e18, prev = 0x8f841dc08e18 } }, count = 15, ---> eventfd is 15 !!! flags = 526336 } Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic with this interface. { "execute": "x-debug-device-event", "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick", "queue": 2 } } The counter is increased to 16. Suppose the hang issue is resolved, it indicates something bad is in software that the 'kick' is lost. What do you mean by "software" here? And it looks to me you're testing whether event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not sure how much value could we gain from a dedicated debug interface like this consider there're a lot of exisinting general purpose debugging method like tracing or gdb. I'd say the path from virtio_queue_notify() to event_notifier_set() is only a very small fraction of the process of virtqueue kick which is unlikey to be buggy. Consider usually the ioeventfd will be offloaded to KVM, it's more a chance that something is wrong in setuping ioeventfd instead of here. Irq is even more complicated. Thank you very much! I am not testing whether event_notifier_set() is called by virtio_queue_notify(). The 'software' indicates the data processing and event notification mechanism involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event() erroneously returns false due to corrupted ring buffer status. So there could be several factors that may block the notification: 1) eventfd bug (ioeventfd vs irqfd) 2) wrong virtqueue state (either driver or device) 3) missing barriers (either driver or device) 4) Qemu bug (irqchip and routing) ... This is not only about whether notification is blocked. 
It can also be used to help narrow down and understand if there is any suspicious issue in blk-mq/scsi/netdev/napi code. The PV drivers are not only drivers following virtio spec. It is closely related to many of other kernel components. Suppose IO was recovered after we inject an IRQ to vhost-scsi on purpose, we will be able to analyze what may happen along the IO completion path starting from when /where the IRQ is injected ... perhaps the root cause is not with virtio but blk-mq/scsi (this is just an example). In addition, this idea should help for vfio-pci. Suppose the developer for a specific device driver suspects IO/networking hangs because of lo
Re: [PATCH 0/6] Add debug interface to kick/call on purpose
在 2021/3/30 下午3:29, Dongli Zhang 写道: On 3/28/21 8:56 PM, Jason Wang wrote: 在 2021/3/27 上午5:16, Dongli Zhang 写道: Hi Jason, On 3/26/21 12:24 AM, Jason Wang wrote: 在 2021/3/26 下午1:44, Dongli Zhang 写道: The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to the loss of doorbell kick, e.g., https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$ ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler"). This patch introduces a new debug interface 'DeviceEvent' to DeviceClass to help narrow down if the issue is due to loss of irq/kick. So far the new interface handles only two events: 'call' and 'kick'. Any device (e.g., virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X or legacy IRQ). The 'call' is to inject irq on purpose by admin for a specific device (e.g., vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell on purpose by admin at QEMU/host side for a specific device. This device can be used as a workaround if call/kick is lost due to virtualization software (e.g., kernel or QEMU) issue. We may also implement the interface for VFIO PCI, e.g., to write to VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM on purpose. This is considered future work once the virtio part is done. Below is from live crash analysis. Initially, the queue=2 has count=15 for 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no used available. We suspect this is because vhost-scsi was not notified by VM. In order to narrow down and analyze the issue, we use live crash to dump the current counter of eventfd for queue=2. 
crash> eventfd_ctx 8f67f6bbe700 struct eventfd_ctx { kref = { refcount = { refs = { counter = 4 } } }, wqh = { lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } }, head = { next = 0x8f841dc08e18, prev = 0x8f841dc08e18 } }, count = 15, ---> eventfd is 15 !!! flags = 526336 } Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic with this interface. { "execute": "x-debug-device-event", "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick", "queue": 2 } } The counter is increased to 16. Suppose the hang issue is resolved, it indicates something bad is in software that the 'kick' is lost. What do you mean by "software" here? And it looks to me you're testing whether event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not sure how much value could we gain from a dedicated debug interface like this consider there're a lot of exisinting general purpose debugging method like tracing or gdb. I'd say the path from virtio_queue_notify() to event_notifier_set() is only a very small fraction of the process of virtqueue kick which is unlikey to be buggy. Consider usually the ioeventfd will be offloaded to KVM, it's more a chance that something is wrong in setuping ioeventfd instead of here. Irq is even more complicated. Thank you very much! I am not testing whether event_notifier_set() is called by virtio_queue_notify(). The 'software' indicates the data processing and event notification mechanism involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event() erroneously returns false due to corrupted ring buffer status. So there could be several factors that may block the notification: 1) eventfd bug (ioeventfd vs irqfd) 2) wrong virtqueue state (either driver or device) 3) missing barriers (either driver or device) 4) Qemu bug (irqchip and routing) ... This is not only about whether notification is blocked. 
It can also be used to help narrow down and understand if there is any suspicious issue in blk-mq/scsi/netdev/napi code. The PV drivers are not only drivers following virtio spec. It is closely related to many of other kernel components. Suppose IO was recovered after we inject an IRQ to vhost-scsi on purpose, we will be able to analyze what may happen along the IO completion path starting from when /where the IRQ is injected ... perhaps the root cause is not with virtio but blk-mq/scsi (this is just an example). In addition, this idea should help for vfio-pci. Suppose the developer for a specific device driver suspects IO/networking hangs because of loss if IRQ, we will be able to verify if that assumption is correct by injecting an IRQ on purpose. T
Re: [PATCH 0/6] Add debug interface to kick/call on purpose
在 2021/3/27 上午5:16, Dongli Zhang 写道: Hi Jason, On 3/26/21 12:24 AM, Jason Wang wrote: 在 2021/3/26 下午1:44, Dongli Zhang 写道: The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to the loss of doorbell kick, e.g., https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$ ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler"). This patch introduces a new debug interface 'DeviceEvent' to DeviceClass to help narrow down if the issue is due to loss of irq/kick. So far the new interface handles only two events: 'call' and 'kick'. Any device (e.g., virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X or legacy IRQ). The 'call' is to inject irq on purpose by admin for a specific device (e.g., vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell on purpose by admin at QEMU/host side for a specific device. This device can be used as a workaround if call/kick is lost due to virtualization software (e.g., kernel or QEMU) issue. We may also implement the interface for VFIO PCI, e.g., to write to VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM on purpose. This is considered future work once the virtio part is done. Below is from live crash analysis. Initially, the queue=2 has count=15 for 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no used available. We suspect this is because vhost-scsi was not notified by VM. In order to narrow down and analyze the issue, we use live crash to dump the current counter of eventfd for queue=2. 
crash> eventfd_ctx 8f67f6bbe700 struct eventfd_ctx { kref = { refcount = { refs = { counter = 4 } } }, wqh = { lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } }, head = { next = 0x8f841dc08e18, prev = 0x8f841dc08e18 } }, count = 15, ---> eventfd is 15 !!! flags = 526336 } Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic with this interface. { "execute": "x-debug-device-event", "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick", "queue": 2 } } The counter is increased to 16. Suppose the hang issue is resolved, it indicates something bad is in software that the 'kick' is lost. What do you mean by "software" here? And it looks to me you're testing whether event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not sure how much value could we gain from a dedicated debug interface like this consider there're a lot of exisinting general purpose debugging method like tracing or gdb. I'd say the path from virtio_queue_notify() to event_notifier_set() is only a very small fraction of the process of virtqueue kick which is unlikey to be buggy. Consider usually the ioeventfd will be offloaded to KVM, it's more a chance that something is wrong in setuping ioeventfd instead of here. Irq is even more complicated. Thank you very much! I am not testing whether event_notifier_set() is called by virtio_queue_notify(). The 'software' indicates the data processing and event notification mechanism involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event() erroneously returns false due to corrupted ring buffer status. So there could be several factors that may block the notification: 1) eventfd bug (ioeventfd vs irqfd) 2) wrong virtqueue state (either driver or device) 3) missing barriers (either driver or device) 4) Qemu bug (irqchip and routing) ... 
Consider we want to debug a virtio issue, only 2) or 3) is what we really care about. So for kick you did (assume vhost is on): virtio_device_event_kick() virtio_queue_notify() event_notifier_set() It looks to me you're actually testing if ioeventfd is correctly set by Qemu. For call you did: virtio_device_event_call() event_notifier_set() A test of whether irqfd is correctly set by Qemu. So all of those are not virtio-specific stuff, but you introduce a virtio-specific command to debug non-virtio functions. In the case of what you mentioned for vring_need_event(), what we really want is to dump the virtqueue state from the guest. This might require some work of extending the virtio spec (e.g. to dump device status like indices, event, wrap counters). This was initially proposed for vhost only and I was going to export ioeventfd/irqfd from vhost to admin via sysfs. Finally, I realized I would better impl
Re: [PATCH 0/6] Add debug interface to kick/call on purpose
On 2021/3/26 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to the loss of a doorbell kick, e.g., https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html ... or due to the loss of an IRQ, e.g., as fixed by Linux kernel commit fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass to help narrow down whether the issue is due to loss of irq/kick. So far the new interface handles only two events: 'call' and 'kick'. Any device (e.g., virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X or legacy IRQ).

The 'call' is to inject an irq on purpose by the admin for a specific device (e.g., vhost-scsi) from QEMU/host to the VM, while the 'kick' is to kick the doorbell on purpose by the admin at the QEMU/host side for a specific device. This interface can be used as a workaround if a call/kick is lost due to a virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI; e.g., writing to VFIOPCIDevice->msi_vectors[i].interrupt would be able to inject an IRQ into the VM on purpose. This is considered future work once the virtio part is done.

Below is from live crash analysis. Initially, queue=2 has count=15 for the 'kick' eventfd_ctx. Suppose there is data in the vring avail ring while nothing is used. We suspect this is because vhost-scsi was not notified by the VM. In order to narrow down and analyze the issue, we use live crash to dump the current eventfd counter for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = { refcount = { refs = { counter = 4 } } },
  wqh = {
    lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } },
    head = { next = 0x8f841dc08e18, prev = 0x8f841dc08e18 }
  },
  count = 15, ---> eventfd is 15 !!!
  flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostics with this interface.
{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
                 "event": "kick", "queue": 2 } }

The counter is increased to 16. If the hang issue is then resolved, it indicates something bad is in software and that the 'kick' was lost.

What do you mean by "software" here? And it looks to me you're testing whether event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not sure how much value we could gain from a dedicated debug interface like this, considering there are a lot of existing general-purpose debugging methods like tracing or gdb. I'd say the path from virtio_queue_notify() to event_notifier_set() is only a very small fraction of the virtqueue kick process and is unlikely to be buggy. Considering that the ioeventfd will usually be offloaded to KVM, it's more likely that something is wrong in setting up the ioeventfd than here. Irq is even more complicated. I think we could not gain much by introducing a dedicated mechanism for such a corner case.

Thanks

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = { refcount = { refs = { counter = 4 } } },
  wqh = {
    lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } },
    head = { next = 0x8f841dc08e18, prev = 0x8f841dc08e18 }
  },
  count = 16, ---> eventfd incremented to 16 !!!
  flags = 526336
}

Original RFC link: https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg03441.html

Changed since RFC:
- add support for more virtio/vhost pci devices
- add log (toggled by DEBUG_VIRTIO_EVENT) to virtio.c to say that this mischievous command had been used
- fix grammar error (s/lost/loss/)
- change version to 6.1
- fix incorrect example in qapi/qdev.json
- manage event types with enum/array, instead of hard coding

Dongli Zhang (6):
  qdev: introduce qapi/hmp command for kick/call event
  virtio: introduce helper function for kick/call device event
  virtio-blk-pci: implement device event interface for kick/call
  virtio-scsi-pci: implement device event interface for kick/call
  vhost-scsi-pci: implement device event interface for kick/call
  virtio-net-pci: implement device event interface for kick/call

 hmp-commands.hx             | 14
 hw/block/virtio-blk.c       |  9 +
 hw/net/virtio-net.c         |  9 +
 hw/scsi/vhost-scsi.c        |  6
 hw/scsi/virtio-scsi.c       |  9 +
 hw/virtio/vhost-scsi-pci.c  | 10 ++
 hw/virtio/virtio-blk-pci.c  | 10 ++
 hw/virtio/virtio-net-pci.c  | 10 ++
 hw/virtio/virtio-scsi-pci.c | 10 ++
 hw/virtio/virtio.c
Re: [RFC PATCH 00/12] hw: Forbid DMA write accesses to MMIO regions
On 2020/9/4 3:46 AM, Edgar E. Iglesias wrote:

On Thu, Sep 03, 2020 at 07:53:33PM +0200, Paolo Bonzini wrote:

On 03/09/20 17:50, Edgar E. Iglesias wrote: Hmm, I guess it would make sense to have a configurable option in KVM to isolate passthrough devices so they only can DMA to guest RAM...

Passthrough devices are always protected by the IOMMU, anything else would be obviously insane^H^H^Hecure. :)

Really? To always do that blindly seems wrong. I'm referring to the passthrough device not being able to reach registers of other passthrough devices within the same guest.

Ah okay; sorry, I misunderstood. That makes more sense now! Multiple devices are put in the same IOMMU "container" (page table basically), and that takes care of reaching registers of other passthrough devices.

Thanks, yes, that's a sane default. What I was trying to say before is that it may make sense to allow the user to "harden" the setup by selectively putting certain passthrough devs in a separate group that can *only* DMA access guest RAM (not other device regs).

This makes sense, but it requires knowledge from the management layer of whether P2P is needed, which is probably not easy.

Thanks

Some devs need access to other devices' regs but many passthrough devs don't need DMA access to anything else but RAM (e.g. an Ethernet MAC). That could mitigate the damage caused by wild DMA pointers...

Cheers, Edgar
Re: [PATCH v2 0/2] util/hexdump: Cleanup qemu_hexdump()
On 2020/8/23 2:09 AM, Philippe Mathieu-Daudé wrote:

- Pass const void* buffer
- Reorder arguments

Supersedes: <20200822150457.1322519-1-f4...@amsat.org>

Philippe Mathieu-Daudé (2):
  util/hexdump: Convert to take a void pointer argument
  util/hexdump: Reorder qemu_hexdump() arguments

 include/qemu-common.h    | 3 ++-
 hw/dma/xlnx_dpdma.c      | 2 +-
 hw/net/fsl_etsec/etsec.c | 2 +-
 hw/net/fsl_etsec/rings.c | 2 +-
 hw/sd/sd.c               | 2 +-
 hw/usb/redirect.c        | 2 +-
 net/colo-compare.c       | 24
 net/net.c                | 2 +-
 util/hexdump.c           | 4 +++-
 util/iov.c               | 2 +-
 10 files changed, 24 insertions(+), 21 deletions(-)

Applied. Thanks
Re: [PATCH v2 4/7] vhost: involve device backends in feature negotiation
On 2020/6/10 下午2:11, Michael S. Tsirkin wrote: On Wed, Jun 10, 2020 at 01:53:57PM +0800, Jason Wang wrote: On 2020/6/10 下午12:15, Michael S. Tsirkin wrote: On Wed, Jun 10, 2020 at 11:21:50AM +0800, Jason Wang wrote: On 2020/6/10 上午2:07, Michael S. Tsirkin wrote: +/* + * Default vhost_get_features() feature bits for existing device types that do + * not define their own. + * + * This is a workaround for existing device types, do not use this in new vhost + * device types. Explicitly define a list of feature bits instead. + * + * The following feature bits are excluded because libvhost-user device + * backends did not advertise them for a long time. Therefore we cannot detect + * their presence. Instead we assume they are always supported by the device + * backend: + * VIRTIO_F_NOTIFY_ON_EMPTY + * VIRTIO_F_ANY_LAYOUT + * VIRTIO_F_VERSION_1 + * VIRTIO_RING_F_INDIRECT_DESC + * VIRTIO_RING_F_EVENT_IDX Weird. I remember that it's common for vhost-user not to set VIRTIO_RING_F_INDIRECT_DESC - they have huge queues so don't need it and inline descriptors give them better performance. So what's going on here? I guess one reason is to support live migration between vhost-user and vhost-net. Thanks But how can we force-enable features backend doesn't want to enable? We can't and the code just forces qemu to validate VIRTIO_RING_F_INDIRECT_DESC for each vhost backends instead of assuming the support silently. Thanks So why does the comment above say: Instead we assume they are always supported by the device backend Sorry for being unclear. I meant, for any new type of vhost devices, they should advertise features as what spec said and qemu must validate VIRTIO_RING_F_INDIRECT_DESC. Thanks This may or may not break backends ... I would rather just be strict and ask backends to fix their feature bits. See user_feature_bits in hw/net/vhost-net.c which supports all these features.
Re: [PATCH v2 4/7] vhost: involve device backends in feature negotiation
On 2020/6/10 下午12:15, Michael S. Tsirkin wrote: On Wed, Jun 10, 2020 at 11:21:50AM +0800, Jason Wang wrote: On 2020/6/10 上午2:07, Michael S. Tsirkin wrote: +/* + * Default vhost_get_features() feature bits for existing device types that do + * not define their own. + * + * This is a workaround for existing device types, do not use this in new vhost + * device types. Explicitly define a list of feature bits instead. + * + * The following feature bits are excluded because libvhost-user device + * backends did not advertise them for a long time. Therefore we cannot detect + * their presence. Instead we assume they are always supported by the device + * backend: + * VIRTIO_F_NOTIFY_ON_EMPTY + * VIRTIO_F_ANY_LAYOUT + * VIRTIO_F_VERSION_1 + * VIRTIO_RING_F_INDIRECT_DESC + * VIRTIO_RING_F_EVENT_IDX Weird. I remember that it's common for vhost-user not to set VIRTIO_RING_F_INDIRECT_DESC - they have huge queues so don't need it and inline descriptors give them better performance. So what's going on here? I guess one reason is to support live migration between vhost-user and vhost-net. Thanks But how can we force-enable features backend doesn't want to enable? We can't and the code just forces qemu to validate VIRTIO_RING_F_INDIRECT_DESC for each vhost backends instead of assuming the support silently. Thanks This may or may not break backends ... I would rather just be strict and ask backends to fix their feature bits. See user_feature_bits in hw/net/vhost-net.c which supports all these features.
Re: [PATCH v2 4/7] vhost: involve device backends in feature negotiation
On 2020/6/10 2:07 AM, Michael S. Tsirkin wrote:

+/*
+ * Default vhost_get_features() feature bits for existing device types that do
+ * not define their own.
+ *
+ * This is a workaround for existing device types, do not use this in new vhost
+ * device types. Explicitly define a list of feature bits instead.
+ *
+ * The following feature bits are excluded because libvhost-user device
+ * backends did not advertise them for a long time. Therefore we cannot detect
+ * their presence. Instead we assume they are always supported by the device
+ * backend:
+ * VIRTIO_F_NOTIFY_ON_EMPTY
+ * VIRTIO_F_ANY_LAYOUT
+ * VIRTIO_F_VERSION_1
+ * VIRTIO_RING_F_INDIRECT_DESC
+ * VIRTIO_RING_F_EVENT_IDX

Weird. I remember that it's common for vhost-user not to set VIRTIO_RING_F_INDIRECT_DESC - they have huge queues so don't need it and inline descriptors give them better performance. So what's going on here?

I guess one reason is to support live migration between vhost-user and vhost-net.

Thanks
Re: [PATCH 2/5] vhost: involve device backends in feature negotiation
On 2020/5/29 下午9:56, Stefan Hajnoczi wrote: On Fri, May 29, 2020 at 03:04:54PM +0800, Jason Wang wrote: On 2020/5/23 上午1:17, Stefan Hajnoczi wrote: Many vhost devices in QEMU currently do not involve the device backend in feature negotiation. This seems fine at first glance for device types without their own feature bits (virtio-net has many but other device types have none). This overlooks the fact that QEMU's virtqueue implementation and the device backend's implementation may support different features. QEMU must not report features to the guest that the the device backend doesn't support. For example, QEMU supports VIRTIO 1.1 packed virtqueues while many existing vhost device backends do not. When the user sets packed=on the device backend breaks. This should have been handled gracefully by feature negotiation instead. Introduce vhost_get_default_features() and update all vhost devices in QEMU to involve the device backend in feature negotiation. This patch fixes the following error: $ x86_64-softmmu/qemu-system-x86_64 \ -drive if=virtio,file=test.img,format=raw \ -chardev socket,path=/tmp/vhost-user-blk.sock,id=char0 \ -device vhost-user-blk-pci,chardev=char0,packed=on \ -object memory-backend-memfd,size=1G,share=on,id=ram0 \ -M accel=kvm,memory-backend=ram0 qemu-system-x86_64: Failed to set msg fds. qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Success (0) It looks to me this could be fixed simply by adding VIRTIO_F_RING_PACKED into user_feature_bits in vhost-user-blk.c. And for the rest, we need require them to call vhost_get_features() with the proper feature bits that needs to be acked by the backend. There is a backwards-compatibility problem: we cannot call vhost_get_features() with the full set of feature bits that each device type supports because existing vhost-user backends don't advertise features properly. QEMU disables features not advertised by the vhost-user backend. 
For example, if we add VIRTIO_RING_F_EVENT_IDX then it will always be disabled for existing libvhost-user backends because they don't advertise this bit :(. Agree. The reason I introduced vhost_get_default_features() is to at least salvage VIRTIO_F_IOMMU_PLATFORM and VIRTIO_F_RING_PACKED. These bits can be safely negotiated for all devices. Any new transport/vring VIRTIO feature bits can also be added to vhost_get_default_features() safely. If a vhost-user device wants to use device type-specific feature bits then it really needs to call vhost_get_features() directly as you suggest. But it's important to check whether existing vhost-user backends actually advertise them - because if they don't advertise them but rely on them then we'll break existing backends. Would you prefer a different approach? I get you now so I think not. Maybe we need document the expected behavior of VHOST_USER_GET_FEATURES e.g which set of features that must be advertised explicitly. And for the set you mention here, we probably need to add: VIRTIO_F_ORDER_PLATFORM, VIRTIO_F_ANY_LAYOUT (not sure if it was too late). And VIRTIO_F_IN_ORDER diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index aff98a0ede..f8a144dcd0 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -48,6 +48,23 @@ static unsigned int used_memslots; static QLIST_HEAD(, vhost_dev) vhost_devices = QLIST_HEAD_INITIALIZER(vhost_devices); +/* + * Feature bits that device backends must explicitly report. Feature bits not + * listed here maybe set by QEMU without checking with the device backend. + * Ideally all feature bits would be listed here but existing vhost device + * implementations do not explicitly report bits like VIRTIO_F_VERSION_1, so we + * can only assume they are supported. For most backends, we care about the features for datapath. So this is not true at least for networking device, since VIRTIO_F_VERSION_1 have impact on the length of vnet header length. 
The net device is in good shape and doesn't use vhost_default_feature_bits[]. vhost_default_feature_bits[] is for devices that haven't been negotiating feature bits properly. The goal is start involving them in feature negotiation without breaking existing backends. Would you like me to rephrase this comment in some way? That will be better. + * + * New feature bits added to the VIRTIO spec should usually be included here + * so that existing vhost device backends that do not support them yet continue + * to work. It actually depends on the type of backend. Kernel vhost-net does not validate GSO features since qemu can talk directly to TAP and vhost doesn't report those features. But for vhost-user GSO features must be validated by qemu since qemu can't see what is behind vhost-user. Maybe the comment should say "New transport/vring feature bits". I'm thinking about things like VIRTIO_F_RING_PACKED that are not device-specific but require backend support. The GSO fe
Re: [PATCH 5/5] virtio: enable VIRTIO_F_RING_PACKED for all devices
On 2020/5/23 上午1:17, Stefan Hajnoczi wrote: The packed virtqueue layout was introduced in VIRTIO 1.1. It is a single ring instead of a split avail/used ring design. There are CPU cache advantages to this layout and it is also suited better to hardware implementation. The vhost-net backend has already supported packed virtqueues for some time. Performance benchmarks show that virtio-blk performance on NVMe drives is also improved. Go ahead and enable this feature for all VIRTIO devices. Keep it disabled for QEMU 5.0 and earlier machine types. Signed-off-by: Stefan Hajnoczi --- include/hw/virtio/virtio.h | 2 +- hw/core/machine.c | 18 +- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index b69d517496..fd5b4a2044 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -292,7 +292,7 @@ typedef struct VirtIORNGConf VirtIORNGConf; DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ VIRTIO_F_IOMMU_PLATFORM, false), \ DEFINE_PROP_BIT64("packed", _state, _field, \ - VIRTIO_F_RING_PACKED, false) + VIRTIO_F_RING_PACKED, true) hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n); bool virtio_queue_enabled(VirtIODevice *vdev, int n); diff --git a/hw/core/machine.c b/hw/core/machine.c index bb3a7b18b1..3598c3c825 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -28,7 +28,23 @@ #include "hw/mem/nvdimm.h" #include "migration/vmstate.h" -GlobalProperty hw_compat_5_0[] = {}; +GlobalProperty hw_compat_5_0[] = { +{ "vhost-user-blk", "packed", "off" }, +{ "vhost-user-fs-device", "packed", "off" }, +{ "vhost-vsock-device", "packed", "off" }, +{ "virtio-9p-device", "packed", "off" }, +{ "virtio-balloon-device", "packed", "off" }, +{ "virtio-blk-device", "packed", "off" }, +{ "virtio-crypto-device", "packed", "off" }, +{ "virtio-gpu-device", "packed", "off" }, +{ "virtio-input-device", "packed", "off" }, +{ "virtio-iommu-device", "packed", "off" }, +{ "virtio-net-device", 
"packed", "off" },
+{ "virtio-pmem", "packed", "off" },
+{ "virtio-rng-device", "packed", "off" },
+{ "virtio-scsi-common", "packed", "off" },
+{ "virtio-serial-device", "packed", "off" },

Missing "vhost-user-gpu" here? I tried to do something like this in the past but gave up since I ended up with a similar list. It would be better to consider something smarter, which probably needs some refactoring for a common parent class.

Thanks

+};
const size_t hw_compat_5_0_len = G_N_ELEMENTS(hw_compat_5_0);
GlobalProperty hw_compat_4_2[] = {
Re: [PATCH 2/5] vhost: involve device backends in feature negotiation
On 2020/5/23 上午1:17, Stefan Hajnoczi wrote: Many vhost devices in QEMU currently do not involve the device backend in feature negotiation. This seems fine at first glance for device types without their own feature bits (virtio-net has many but other device types have none). This overlooks the fact that QEMU's virtqueue implementation and the device backend's implementation may support different features. QEMU must not report features to the guest that the the device backend doesn't support. For example, QEMU supports VIRTIO 1.1 packed virtqueues while many existing vhost device backends do not. When the user sets packed=on the device backend breaks. This should have been handled gracefully by feature negotiation instead. Introduce vhost_get_default_features() and update all vhost devices in QEMU to involve the device backend in feature negotiation. This patch fixes the following error: $ x86_64-softmmu/qemu-system-x86_64 \ -drive if=virtio,file=test.img,format=raw \ -chardev socket,path=/tmp/vhost-user-blk.sock,id=char0 \ -device vhost-user-blk-pci,chardev=char0,packed=on \ -object memory-backend-memfd,size=1G,share=on,id=ram0 \ -M accel=kvm,memory-backend=ram0 qemu-system-x86_64: Failed to set msg fds. qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Success (0) It looks to me this could be fixed simply by adding VIRTIO_F_RING_PACKED into user_feature_bits in vhost-user-blk.c. And for the rest, we need require them to call vhost_get_features() with the proper feature bits that needs to be acked by the backend. 
The vhost-user-blk backend failed as follows: $ ./vhost-user-blk --socket-path=/tmp/vhost-user-blk.sock -b test2.img vu_panic: virtio: zero sized buffers are not allowed virtio-blk request missing headers Signed-off-by: Stefan Hajnoczi --- include/hw/virtio/vhost.h| 1 + include/hw/virtio/virtio-gpu.h | 2 ++ include/sysemu/cryptodev-vhost.h | 11 +++ backends/cryptodev-vhost.c | 19 +++ hw/display/vhost-user-gpu.c | 17 + hw/display/virtio-gpu-base.c | 2 +- hw/input/vhost-user-input.c | 9 + hw/virtio/vhost-user-fs.c| 5 +++-- hw/virtio/vhost-vsock.c | 5 +++-- hw/virtio/vhost.c| 22 ++ hw/virtio/virtio-crypto.c| 3 ++- 11 files changed, 90 insertions(+), 6 deletions(-) diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index 085450c6f8..d2e54dd4a8 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -112,6 +112,7 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n, bool mask); uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits, uint64_t features); +uint64_t vhost_get_default_features(struct vhost_dev *hdev, uint64_t features); void vhost_ack_features(struct vhost_dev *hdev, const int *feature_bits, uint64_t features); bool vhost_has_free_slot(void); diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h index 6dd57f2025..41d270d80e 100644 --- a/include/hw/virtio/virtio-gpu.h +++ b/include/hw/virtio/virtio-gpu.h @@ -192,6 +192,8 @@ bool virtio_gpu_base_device_realize(DeviceState *qdev, void virtio_gpu_base_reset(VirtIOGPUBase *g); void virtio_gpu_base_fill_display_info(VirtIOGPUBase *g, struct virtio_gpu_resp_display_info *dpy_info); +uint64_t virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features, + Error **errp); /* virtio-gpu.c */ void virtio_gpu_ctrl_response(VirtIOGPU *g, diff --git a/include/sysemu/cryptodev-vhost.h b/include/sysemu/cryptodev-vhost.h index f42824fbde..e629446bfb 100644 --- a/include/sysemu/cryptodev-vhost.h +++ 
b/include/sysemu/cryptodev-vhost.h @@ -122,6 +122,17 @@ int cryptodev_vhost_start(VirtIODevice *dev, int total_queues); */ void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues); +/** + * cryptodev_vhost_get_features: + * @dev: the virtio crypto object + * @requested_features: the features being offered + * + * Returns: the requested features bits that are supported by the vhost device, + * or the original request feature bits if vhost is disabled + * + */ +uint64_t cryptodev_vhost_get_features(VirtIODevice *dev, uint64_t features); + /** * cryptodev_vhost_virtqueue_mask: * @dev: the virtio crypto object diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c index 8337c9a495..5f5a4fda7b 100644 --- a/backends/cryptodev-vhost.c +++ b/backends/cryptodev-vhost.c @@ -266,6 +266,20 @@ void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues) assert(r >= 0); } +uint64_t cryptodev_vhost_get_features(VirtIODevice *dev, uint64_t features) +{ +VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev); +
Re: [PATCH v3 2/2] vhost-user-blk: delay vhost_user_blk_disconnect
On 2020/5/20 11:53 PM, Dima Stepanov wrote:

A socket write during vhost-user communication may trigger a disconnect event, calling vhost_user_blk_disconnect() and clearing all the vhost_dev structures holding data that vhost-user functions expect to remain valid to roll back initialization correctly. Delay the cleanup to keep the vhost_dev structure valid.

There are two possible states to handle:
1. RUN_STATE_PRELAUNCH: skip the bh oneshot call and perform disconnect in the caller routine.
2. RUN_STATE_RUNNING: delay by using a bh

The BH changes are based on the similar changes for the vhost-user-net device: commit e7c83a885f865128ae3cf1946f8cb538b63cbfba "vhost-user: delay vhost_user_stop"

It's better to explain why we don't need to deal with case 1 in the vhost-user-net case. And do we still need patch 1 if we have this patch?

Thanks

Signed-off-by: Dima Stepanov
---
 hw/block/vhost-user-blk.c | 49 +--
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9d8c0b3..447fc9c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -337,11 +337,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);

-    if (!s->connected) {
-        return;
-    }
-    s->connected = false;
-
     if (s->dev.started) {
         vhost_user_blk_stop(vdev);
     }
@@ -349,6 +344,19 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
     vhost_dev_cleanup(&s->dev);
 }

+static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
+
+static void vhost_user_blk_chr_closed_bh(void *opaque)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    vhost_user_blk_disconnect(dev);
+    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
+                             NULL, opaque, NULL, true);
+}
+
 static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
 {
     DeviceState *dev = opaque;
@@ -363,7 +371,28 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
         }
         break;
     case CHR_EVENT_CLOSED:
-        vhost_user_blk_disconnect(dev);
+        /*
+         * A close event may happen during a read/write, but vhost
+         * code assumes the vhost_dev remains setup, so delay the
+         * stop & clear. There are two possible paths to hit this
+         * disconnect event:
+         * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
+         *    vhost_user_blk_device_realize() is a caller.
+         * 2. In the main loop phase after VM start.
+         *
+         * For p2 the disconnect event will be delayed. We can't
+         * do the same for p1, because we are not running the loop
+         * at this moment. So just skip this step and perform
+         * disconnect in the caller function.
+         */
+        if (s->connected && runstate_is_running()) {
+            AioContext *ctx = qemu_get_current_aio_context();
+
+            qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
+                                     NULL, NULL, false);
+            aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
+        }
+        s->connected = false;
         break;
     case CHR_EVENT_BREAK:
     case CHR_EVENT_MUX_IN:
@@ -428,6 +457,14 @@ reconnect:
     ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
                                sizeof(struct virtio_blk_config));
+    if (!s->connected) {
+        /*
+         * Perform disconnect before making reconnect. A more detailed
+         * comment on why it was delayed is in the vhost_user_blk_event()
+         * routine.
+         */
+        vhost_user_blk_disconnect(dev);
+    }
     if (ret < 0) {
         error_report("vhost-user-blk: get block config failed");
         goto reconnect;
Re: [RFC v3 4/6] qmp: add QMP command x-debug-virtio-queue-status
On 2020/5/15 下午11:16, Laurent Vivier wrote: On 08/05/2020 04:57, Jason Wang wrote: On 2020/5/7 下午7:49, Laurent Vivier wrote: This new command shows internal status of a VirtQueue. (vrings and indexes). Signed-off-by: Laurent Vivier It looks to me that packed virtqueue is not supported. It's better to add them in the future. I agree, it's why the series still remains an "RFC". ... diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 59bf6ef651a6..57552bf05014 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -3877,6 +3877,41 @@ static VirtIODevice *virtio_device_find(const char *path) return NULL; } +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, + uint16_t queue, Error **errp) +{ + VirtIODevice *vdev; + VirtQueueStatus *status; + + vdev = virtio_device_find(path); + if (vdev == NULL) { + error_setg(errp, "Path %s is not a VirtIO device", path); + return NULL; + } + + if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) { + error_setg(errp, "Invalid virtqueue number %d", queue); + return NULL; + } + + status = g_new0(VirtQueueStatus, 1); + status->queue_index = vdev->vq[queue].queue_index; + status->inuse = vdev->vq[queue].inuse; + status->vring_num = vdev->vq[queue].vring.num; + status->vring_num_default = vdev->vq[queue].vring.num_default; + status->vring_align = vdev->vq[queue].vring.align; + status->vring_desc = vdev->vq[queue].vring.desc; + status->vring_avail = vdev->vq[queue].vring.avail; + status->vring_used = vdev->vq[queue].vring.used; + status->last_avail_idx = vdev->vq[queue].last_avail_idx; This might not be correct when vhost is used. We may consider to sync last_avail_idx from vhost backends here? Yes, but I don't know how to do that. Where can I find the information? It could be synced through vhost ops vhost_get_vring_base(), see vhost_virtqueue_stop(). Thanks Thanks, Laurent
Re: [PATCH v2 5/5] vhost: add device started check in migration set log
On 2020/5/16 11:20 AM, Li Feng wrote:

Hi, Dima. This abort is what I mentioned in my previous email. I triggered this crash without any fix a week ago. And I have written a test patch to let vhost_log_global_start return int and propagate the error to the upper layer. However, my change is a little large, because the original callback returns void and doesn't do any rollback. After testing, the migration could migrate to the dst successfully, and fio is still running perfectly, but the src vm is still stuck here, no crash. Is it right to return this error to the upper layer?

That could be a solution or we may ask David for more suggestions. Another thing that might be useful is to block reconnection during migration.

Thanks

Thanks, Feng Li

Dima Stepanov wrote on Sat, May 16, 2020 at 12:55 AM:

On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:

On 2020/5/13 5:47 PM, Dima Stepanov wrote:

    case CHR_EVENT_CLOSED:
        /* a close event may happen during a read/write, but vhost
         * code assumes the vhost_dev remains setup, so delay the
         * stop & clear to idle.
         * FIXME: better handle failure in vhost code, remove bh
         */
        if (s->watch) {
            AioContext *ctx = qemu_get_current_aio_context();

            g_source_remove(s->watch);
            s->watch = 0;
            qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
                                     NULL, NULL, false);

            aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
        }
        break;

I think it's time we dropped the FIXME and moved the handling to common code. Jason? Marc-André?

I agree. Just to confirm, do you prefer bh or doing changes like what is done in this series? It looks to me bh can have easier code.

Could it be a good idea just to do the disconnect in the char device but postpone the clean up in the vhost-user-blk (or any other vhost-user device) itself? So we are moving the postpone logic and decision from the char device to the vhost-user device.
One of the ideas I have is as follows:
- Put ourselves in the INITIALIZATION state
- Start these vhost-user "handshake" commands
- If we get a disconnect error, perform the disconnect, but don't clean up the device (it will be cleaned up on the roll back). It can be done by checking the state in the vhost_user_..._disconnect routine or something like it

Any issue you saw with just using the aio bh as Michael posted above? Then we don't need to deal with the silent vhost_dev_stop() and we will have code that is much easier to understand.

I've implemented this solution inside hw/block/vhost-user-blk.c:vhost_user_blk_event() in a similar way by using the s->connected field.

Looks good and a more correct fix ).

I have two questions here before I rework the fixes:
1. Is it okay to make the similar fix inside vhost_user_blk_event() or are we looking for a more generic vhost-user solution? What do you think?
2. For migration we need additional information that for the vhost-user device this isn't an error, because I'm triggering the following assert error:

Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
(gdb) bt
#0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x5648ea376ee6 in vhost_log_global_start (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
#3  0x5648ea2dde7e in memory_global_dirty_log_start () at ./memory.c:2611
#4  0x5648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0) at ./migration/ram.c:2305
#5  0x5648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 ) at ./migration/ram.c:2323
#6  0x5648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00, opaque=0x5648eb1f0f20 ) at ./migration/ram.c:2436
#7  0x5648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at migration/savevm.c:1176
#8  0x5648ea674511 in migration_thread (opaque=0x5648ec031ff0) at migration/migration.c:3416
#9  0x5648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at util/qemu-thread-posix.c:519
#10 0x7fb56eac56ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x7fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) frame 2
#2  0x5648ea376ee6 in vhost_log_global_start (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
857             abort();
(gdb) list
852     {
853         int r;
854
855         r = vhost_migration_log(listener, true);
856         if (r < 0) {
857             abort();
858         }
859     }
860
861     s
Re: [PATCH v2 5/5] vhost: add device started check in migration set log
On 2020/5/16 上午12:54, Dima Stepanov wrote: On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote: On 2020/5/13 下午5:47, Dima Stepanov wrote:

case CHR_EVENT_CLOSED:
    /* a close event may happen during a read/write, but vhost
     * code assumes the vhost_dev remains setup, so delay the
     * stop & clear to idle.
     * FIXME: better handle failure in vhost code, remove bh
     */
    if (s->watch) {
        AioContext *ctx = qemu_get_current_aio_context();

        g_source_remove(s->watch);
        s->watch = 0;
        qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
                                 NULL, NULL, false);

        aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
    }
    break;

I think it's time we dropped the FIXME and moved the handling to common code. Jason? Marc-André?

I agree. Just to confirm, do you prefer a bh, or changes like what is done in this series? It looks to me like the bh approach leads to simpler code.

Could it be a good idea to perform the disconnect in the char device but postpone the clean up in the vhost-user-blk (or any other vhost-user) device itself? That way we move the postpone logic and decision from the char device to the vhost-user device. One of the ideas I have is as follows:
- Put ourselves in the INITIALIZATION state
- Start the vhost-user "handshake" commands
- If we get a disconnect error, perform the disconnect, but don't clean up the device (it will be cleaned up on the roll back). This can be done by checking the state in the vhost_user_..._disconnect routine or something like it

Any issue you saw with just using the aio bh as Michael posted above? Then we don't need to deal with the silent vhost_dev_stop() and we will have code that is much easier to understand.

I've implemented this solution inside hw/block/vhost-user-blk.c:vhost_user_blk_event() in a similar way by using the s->connected field. Looks good, and a more correct fix :). I have two questions here before I rework the fixes:

1. Is it okay to make a similar fix inside vhost_user_blk_event(), or are we looking for a more generic vhost-user solution? What do you think?
I think I agree with Michael, it's better to have a generic vhost-user solution. But if it turns out not to be easy, we can start by fixing vhost-user-blk.

2. For migration we require additional information that for the vhost-user device this isn't an error, because I'm triggering the following assert error:

Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
(gdb) bt
#0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x5648ea376ee6 in vhost_log_global_start (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
#3  0x5648ea2dde7e in memory_global_dirty_log_start () at ./memory.c:2611
#4  0x5648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0) at ./migration/ram.c:2305
#5  0x5648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 ) at ./migration/ram.c:2323
#6  0x5648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00, opaque=0x5648eb1f0f20 ) at ./migration/ram.c:2436
#7  0x5648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at migration/savevm.c:1176
#8  0x5648ea674511 in migration_thread (opaque=0x5648ec031ff0) at migration/migration.c:3416
#9  0x5648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at util/qemu-thread-posix.c:519
#10 0x7fb56eac56ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x7fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) frame 2
#2  0x5648ea376ee6 in vhost_log_global_start (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
857         abort();
(gdb) list
852     {
853         int r;
854
855         r = vhost_migration_log(listener, true);
856         if (r < 0) {
857             abort();
858         }
859     }
860
861     static void vhost_log_global_stop(MemoryListener *listener)

Since the bh postpones the clean up, we can't use the ->started field.
Do we have any mechanism to get the device type/state in the common vhost_migration_log() routine? So for example for the vhost-user/disconnect device we will be able to return 0. Or should we implement it and introduce it in this patch set? This requires more thought, I will reply in Feng's mail. Thanks Thanks, Dima. Thank - vhost-user command returns error back to the _start() routine - Rollback in one place in the start() routine, by
Re: [PATCH v2 5/5] vhost: add device started check in migration set log
On 2020/5/13 下午5:47, Dima Stepanov wrote:

case CHR_EVENT_CLOSED:
    /* a close event may happen during a read/write, but vhost
     * code assumes the vhost_dev remains setup, so delay the
     * stop & clear to idle.
     * FIXME: better handle failure in vhost code, remove bh
     */
    if (s->watch) {
        AioContext *ctx = qemu_get_current_aio_context();

        g_source_remove(s->watch);
        s->watch = 0;
        qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
                                 NULL, NULL, false);

        aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
    }
    break;

I think it's time we dropped the FIXME and moved the handling to common code. Jason? Marc-André?

I agree. Just to confirm, do you prefer a bh, or changes like what is done in this series? It looks to me like the bh approach leads to simpler code.

Could it be a good idea to perform the disconnect in the char device but postpone the clean up in the vhost-user-blk (or any other vhost-user) device itself? That way we move the postpone logic and decision from the char device to the vhost-user device. One of the ideas I have is as follows:
- Put ourselves in the INITIALIZATION state
- Start the vhost-user "handshake" commands
- If we get a disconnect error, perform the disconnect, but don't clean up the device (it will be cleaned up on the roll back). This can be done by checking the state in the vhost_user_..._disconnect routine or something like it

Any issue you saw with just using the aio bh as Michael posted above? Then we don't need to deal with the silent vhost_dev_stop() and we will have code that is much easier to understand.

Thanks.
- vhost-user command returns error back to the _start() routine
- Rollback in one place in the start() routine, by calling this postponed clean up for the disconnect
Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
On 2020/5/13 下午5:36, Dima Stepanov wrote: On Wed, May 13, 2020 at 11:00:38AM +0800, Jason Wang wrote: On 2020/5/12 下午5:08, Dima Stepanov wrote: On Tue, May 12, 2020 at 11:26:11AM +0800, Jason Wang wrote: On 2020/5/11 下午5:11, Dima Stepanov wrote: On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote: On 2020/4/30 下午9:36, Dima Stepanov wrote: Since disconnect can happen at any time during initialization not all vring buffers (for instance used vring) can be intialized successfully. If the buffer was not initialized then vhost_memory_unmap call will lead to SIGSEGV. Add checks for the vring address value before calling unmap. Also add assert() in the vhost_memory_unmap() routine. Signed-off-by: Dima Stepanov --- hw/virtio/vhost.c | 27 +-- 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index ddbdc53..3ee50c4 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer, hwaddr len, int is_write, hwaddr access_len) { +assert(buffer); + if (!vhost_dev_has_iommu(dev)) { cpu_physical_memory_unmap(buffer, len, is_write, access_len); } @@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, vhost_vq_index); } -vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx), - 1, virtio_queue_get_used_size(vdev, idx)); -vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx), - 0, virtio_queue_get_avail_size(vdev, idx)); -vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx), - 0, virtio_queue_get_desc_size(vdev, idx)); +/* + * Since the vhost-user disconnect can happen during initialization + * check if vring was initialized, before making unmap. 
+ */ +if (vq->used) { +vhost_memory_unmap(dev, vq->used, + virtio_queue_get_used_size(vdev, idx), + 1, virtio_queue_get_used_size(vdev, idx)); +} +if (vq->avail) { +vhost_memory_unmap(dev, vq->avail, + virtio_queue_get_avail_size(vdev, idx), + 0, virtio_queue_get_avail_size(vdev, idx)); +} +if (vq->desc) { +vhost_memory_unmap(dev, vq->desc, + virtio_queue_get_desc_size(vdev, idx), + 0, virtio_queue_get_desc_size(vdev, idx)); +} Any reason not checking hdev->started instead? vhost_dev_start() will set it to true if virtqueues were correctly mapped. Thanks Well i see it a little bit different: - vhost_dev_start() sets hdev->started to true before starting virtqueues - vhost_virtqueue_start() maps all the memory If we hit the vhost disconnect at the start of the vhost_virtqueue_start(), for instance for this call: r = dev->vhost_ops->vhost_set_vring_base(dev, ); Then we will call vhost_user_blk_disconnect: vhost_user_blk_disconnect()-> vhost_user_blk_stop()-> vhost_dev_stop()-> vhost_virtqueue_stop() As a result we will come in this routine with the hdev->started still set to true, but if used/avail/desc fields still uninitialized and set to 0. I may miss something, but consider both vhost_dev_start() and vhost_user_blk_disconnect() were serialized in main loop. Can this really happen? 
Yes, consider the case when we start the vhost-user-blk device: vhost_dev_start-> vhost_virtqueue_start And we got a disconnect in the middle of vhost_virtqueue_start() routine, for instance: 1000 vq->num = state.num = virtio_queue_get_num(vdev, idx); 1001 r = dev->vhost_ops->vhost_set_vring_num(dev, ); 1002 if (r) { 1003 VHOST_OPS_DEBUG("vhost_set_vring_num failed"); 1004 return -errno; 1005 } --> Here we got a disconnect <-- 1006 1007 state.num = virtio_queue_get_last_avail_idx(vdev, idx); 1008 r = dev->vhost_ops->vhost_set_vring_base(dev, ); 1009 if (r) { 1010 VHOST_OPS_DEBUG("vhost_set_vring_base failed"); 1011 return -errno; 1012 } As a result call to vhost_set_vring_base will call the disconnect routine. The backtrace log for SIGSEGV is as follows: Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x72ea9700 (LWP 183150)] 0x74d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt #0 0x74d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x5590fd90 in flatview_write_continue (fv=0x7fffec4a2600, addr=0, attrs=..., ptr=0x0, len=1028, ad
Re: [PATCH v2 5/5] vhost: add device started check in migration set log
On 2020/5/13 下午12:15, Michael S. Tsirkin wrote: On Tue, May 12, 2020 at 12:35:30PM +0300, Dima Stepanov wrote: On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote: On 2020/5/11 下午5:25, Dima Stepanov wrote: On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote: On 2020/4/30 下午9:36, Dima Stepanov wrote: If vhost-user daemon is used as a backend for the vhost device, then we should consider a possibility of disconnect at any moment. If such disconnect happened in the vhost_migration_log() routine the vhost device structure will be clean up. At the start of the vhost_migration_log() function there is a check: if (!dev->started) { dev->log_enabled = enable; return 0; } To be consistent with this check add the same check after calling the vhost_dev_set_log() routine. This in general help not to break a migration due the assert() message. But it looks like that this code should be revised to handle these errors more carefully. In case of vhost-user device backend the fail paths should consider the state of the device. In this case we should skip some function calls during rollback on the error paths, so not to get the NULL dereference errors. Signed-off-by: Dima Stepanov --- hw/virtio/vhost.c | 39 +++ 1 file changed, 35 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 3ee50c4..d5ab96d 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev, static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log) { int r, i, idx; + +if (!dev->started) { +/* + * If vhost-user daemon is used as a backend for the + * device and the connection is broken, then the vhost_dev + * structure will be reset all its values to 0. + * Add additional check for the device state. 
+ */ +return -1; +} + r = vhost_dev_set_features(dev, enable_log); if (r < 0) { goto err_features; @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log) } return 0; err_vq: -for (; i >= 0; --i) { +/* + * Disconnect with the vhost-user daemon can lead to the + * vhost_dev_cleanup() call which will clean up vhost_dev + * structure. + */ +for (; dev->started && (i >= 0); --i) { idx = dev->vhost_ops->vhost_get_vq_index( Why need the check of dev->started here, can started be modified outside mainloop? If yes, I don't get the check of !dev->started in the beginning of this function. No dev->started can't change outside the mainloop. The main problem is only for the vhost_user_blk daemon. Consider the case when we successfully pass the dev->started check at the beginning of the function, but after it we hit the disconnect on the next call on the second or third iteration: r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log); The unix socket backend device will call the disconnect routine for this device and reset the structure. So the structure will be reset (and dev->started set to false) inside this set_addr() call. I still don't get here. I think the disconnect can not happen in the middle of vhost_dev_set_log() since both of them were running in mainloop. And even if it can, we probably need other synchronization mechanism other than simple check here. Disconnect isn't happened in the separate thread it is happened in this routine inside vhost_dev_set_log. When for instance vhost_user_write() call failed: vhost_user_set_log_base() vhost_user_write() vhost_user_blk_disconnect() vhost_dev_cleanup() vhost_user_backend_cleanup() So the point is that if we somehow got a disconnect with the vhost-user-blk daemon before the vhost_user_write() call then it will continue clean up by running vhost_user_blk_disconnect() function. 
I wrote a more detailed backtrace stack in the separate thread, which is pretty similar to what we have here: Re: [PATCH v2 4/5] vhost: check vring address before calling unmap The places are different but the problem is pretty similar. So if vhost-user commands handshake then everything is fine and reconnect will work as expected. The only problem is how to handle reconnect properly between vhost-user command send/receive. So vhost net had this problem too. commit e7c83a885f865128ae3cf1946f8cb538b63cbfba Author: Marc-André Lureau Date: Mon Feb 27 14:49:56 2017 +0400 vhost-user: delay vhost_user_stop Since commit b0a335e351103bf92f3f9d0bd5759311be8156ac, a socket write may trigger a disconnect events, calling vhost_user_stop() and clearing all the vhost_dev strutures holding data that vhost.c functions expect to remain valid. Delay the cleanup to keep the vhost_dev structure valid during the vhost.c f
Re: [PATCH v2 5/5] vhost: add device started check in migration set log
On 2020/5/12 下午5:35, Dima Stepanov wrote: On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote: On 2020/5/11 下午5:25, Dima Stepanov wrote: On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote: On 2020/4/30 下午9:36, Dima Stepanov wrote: If vhost-user daemon is used as a backend for the vhost device, then we should consider a possibility of disconnect at any moment. If such disconnect happened in the vhost_migration_log() routine the vhost device structure will be clean up. At the start of the vhost_migration_log() function there is a check: if (!dev->started) { dev->log_enabled = enable; return 0; } To be consistent with this check add the same check after calling the vhost_dev_set_log() routine. This in general help not to break a migration due the assert() message. But it looks like that this code should be revised to handle these errors more carefully. In case of vhost-user device backend the fail paths should consider the state of the device. In this case we should skip some function calls during rollback on the error paths, so not to get the NULL dereference errors. Signed-off-by: Dima Stepanov --- hw/virtio/vhost.c | 39 +++ 1 file changed, 35 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 3ee50c4..d5ab96d 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev, static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log) { int r, i, idx; + +if (!dev->started) { +/* + * If vhost-user daemon is used as a backend for the + * device and the connection is broken, then the vhost_dev + * structure will be reset all its values to 0. + * Add additional check for the device state. 
+ */ +return -1; +} + r = vhost_dev_set_features(dev, enable_log); if (r < 0) { goto err_features; @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log) } return 0; err_vq: -for (; i >= 0; --i) { +/* + * Disconnect with the vhost-user daemon can lead to the + * vhost_dev_cleanup() call which will clean up vhost_dev + * structure. + */ +for (; dev->started && (i >= 0); --i) { idx = dev->vhost_ops->vhost_get_vq_index( Why need the check of dev->started here, can started be modified outside mainloop? If yes, I don't get the check of !dev->started in the beginning of this function. No dev->started can't change outside the mainloop. The main problem is only for the vhost_user_blk daemon. Consider the case when we successfully pass the dev->started check at the beginning of the function, but after it we hit the disconnect on the next call on the second or third iteration: r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log); The unix socket backend device will call the disconnect routine for this device and reset the structure. So the structure will be reset (and dev->started set to false) inside this set_addr() call. I still don't get here. I think the disconnect can not happen in the middle of vhost_dev_set_log() since both of them were running in mainloop. And even if it can, we probably need other synchronization mechanism other than simple check here. Disconnect isn't happened in the separate thread it is happened in this routine inside vhost_dev_set_log. When for instance vhost_user_write() call failed: vhost_user_set_log_base() vhost_user_write() vhost_user_blk_disconnect() vhost_dev_cleanup() vhost_user_backend_cleanup() So the point is that if we somehow got a disconnect with the vhost-user-blk daemon before the vhost_user_write() call then it will continue clean up by running vhost_user_blk_disconnect() function. 
I wrote a more detailed backtrace in the separate thread, which is pretty similar to what we have here: Re: [PATCH v2 4/5] vhost: check vring address before calling unmap. The places are different but the problem is pretty similar. Yes. So if the vhost-user command handshake completes then everything is fine and reconnect will work as expected. The only problem is how to handle reconnect properly between vhost-user command send/receive. As I wrote, we have a test:
- run src VM with a vhost-user-blk daemon
- run fio inside it
- perform a reconnect every X seconds (just kill and restart the daemon), X is random
- run dst VM
- perform migration
- fio should complete in the dst VM
And we cycle this test indefinitely. It fails about once per ~25 iterations. By adding some delays inside qemu we were able to make the race window larger. It would be better if we can draft some qtest for this. Thanks
Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
On 2020/5/12 下午5:08, Dima Stepanov wrote: On Tue, May 12, 2020 at 11:26:11AM +0800, Jason Wang wrote: On 2020/5/11 下午5:11, Dima Stepanov wrote: On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote: On 2020/4/30 下午9:36, Dima Stepanov wrote: Since disconnect can happen at any time during initialization not all vring buffers (for instance used vring) can be intialized successfully. If the buffer was not initialized then vhost_memory_unmap call will lead to SIGSEGV. Add checks for the vring address value before calling unmap. Also add assert() in the vhost_memory_unmap() routine. Signed-off-by: Dima Stepanov --- hw/virtio/vhost.c | 27 +-- 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index ddbdc53..3ee50c4 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer, hwaddr len, int is_write, hwaddr access_len) { +assert(buffer); + if (!vhost_dev_has_iommu(dev)) { cpu_physical_memory_unmap(buffer, len, is_write, access_len); } @@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, vhost_vq_index); } -vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx), - 1, virtio_queue_get_used_size(vdev, idx)); -vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx), - 0, virtio_queue_get_avail_size(vdev, idx)); -vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx), - 0, virtio_queue_get_desc_size(vdev, idx)); +/* + * Since the vhost-user disconnect can happen during initialization + * check if vring was initialized, before making unmap. 
+ */ +if (vq->used) { +vhost_memory_unmap(dev, vq->used, + virtio_queue_get_used_size(vdev, idx), + 1, virtio_queue_get_used_size(vdev, idx)); +} +if (vq->avail) { +vhost_memory_unmap(dev, vq->avail, + virtio_queue_get_avail_size(vdev, idx), + 0, virtio_queue_get_avail_size(vdev, idx)); +} +if (vq->desc) { +vhost_memory_unmap(dev, vq->desc, + virtio_queue_get_desc_size(vdev, idx), + 0, virtio_queue_get_desc_size(vdev, idx)); +} Any reason not checking hdev->started instead? vhost_dev_start() will set it to true if virtqueues were correctly mapped. Thanks Well i see it a little bit different: - vhost_dev_start() sets hdev->started to true before starting virtqueues - vhost_virtqueue_start() maps all the memory If we hit the vhost disconnect at the start of the vhost_virtqueue_start(), for instance for this call: r = dev->vhost_ops->vhost_set_vring_base(dev, ); Then we will call vhost_user_blk_disconnect: vhost_user_blk_disconnect()-> vhost_user_blk_stop()-> vhost_dev_stop()-> vhost_virtqueue_stop() As a result we will come in this routine with the hdev->started still set to true, but if used/avail/desc fields still uninitialized and set to 0. I may miss something, but consider both vhost_dev_start() and vhost_user_blk_disconnect() were serialized in main loop. Can this really happen? 
Yes, consider the case when we start the vhost-user-blk device: vhost_dev_start-> vhost_virtqueue_start And we got a disconnect in the middle of vhost_virtqueue_start() routine, for instance: 1000 vq->num = state.num = virtio_queue_get_num(vdev, idx); 1001 r = dev->vhost_ops->vhost_set_vring_num(dev, ); 1002 if (r) { 1003 VHOST_OPS_DEBUG("vhost_set_vring_num failed"); 1004 return -errno; 1005 } --> Here we got a disconnect <-- 1006 1007 state.num = virtio_queue_get_last_avail_idx(vdev, idx); 1008 r = dev->vhost_ops->vhost_set_vring_base(dev, ); 1009 if (r) { 1010 VHOST_OPS_DEBUG("vhost_set_vring_base failed"); 1011 return -errno; 1012 } As a result call to vhost_set_vring_base will call the disconnect routine. The backtrace log for SIGSEGV is as follows: Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x72ea9700 (LWP 183150)] 0x74d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt #0 0x74d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x5590fd90 in flatview_write_continue (fv=0x7fffec4a2600, addr=0, attrs=..., ptr=0x0, len=1028, addr1=0, l=1028, mr=0x56b1b310) at ./exec.c:3142 #2 0x5590fe98 in flatview_write (f
Re: [PATCH v2 5/5] vhost: add device started check in migration set log
On 2020/5/11 下午5:25, Dima Stepanov wrote: On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote: On 2020/4/30 下午9:36, Dima Stepanov wrote: If vhost-user daemon is used as a backend for the vhost device, then we should consider a possibility of disconnect at any moment. If such disconnect happened in the vhost_migration_log() routine the vhost device structure will be clean up. At the start of the vhost_migration_log() function there is a check: if (!dev->started) { dev->log_enabled = enable; return 0; } To be consistent with this check add the same check after calling the vhost_dev_set_log() routine. This in general help not to break a migration due the assert() message. But it looks like that this code should be revised to handle these errors more carefully. In case of vhost-user device backend the fail paths should consider the state of the device. In this case we should skip some function calls during rollback on the error paths, so not to get the NULL dereference errors. Signed-off-by: Dima Stepanov --- hw/virtio/vhost.c | 39 +++ 1 file changed, 35 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 3ee50c4..d5ab96d 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev, static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log) { int r, i, idx; + +if (!dev->started) { +/* + * If vhost-user daemon is used as a backend for the + * device and the connection is broken, then the vhost_dev + * structure will be reset all its values to 0. + * Add additional check for the device state. 
+ */ +return -1; +} + r = vhost_dev_set_features(dev, enable_log); if (r < 0) { goto err_features; @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log) } return 0; err_vq: -for (; i >= 0; --i) { +/* + * Disconnect with the vhost-user daemon can lead to the + * vhost_dev_cleanup() call which will clean up vhost_dev + * structure. + */ +for (; dev->started && (i >= 0); --i) { idx = dev->vhost_ops->vhost_get_vq_index( Why need the check of dev->started here, can started be modified outside mainloop? If yes, I don't get the check of !dev->started in the beginning of this function. No, dev->started can't change outside the mainloop. The main problem is only for the vhost_user_blk daemon. Consider the case when we successfully pass the dev->started check at the beginning of the function, but after it we hit the disconnect on the next call on the second or third iteration: r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log); The unix socket backend device will call the disconnect routine for this device and reset the structure. So the structure will be reset (and dev->started set to false) inside this set_addr() call. I still don't get it. I think the disconnect can not happen in the middle of vhost_dev_set_log() since both of them run in the mainloop. And even if it can, we probably need another synchronization mechanism rather than a simple check here. So we shouldn't make the clean up calls, because these virtqueues were already cleaned up in the disconnect call. But we should protect these calls somehow, so we will not hit SIGSEGV and will be able to pass migration. Just to summarize it: for vhost-user-blk devices we can hit the clean up calls twice in case of a vhost disconnect: 1. The first time during the disconnect process; the clean up is called inside it. 2. The second time during the roll back clean up. So if that is the case we should skip step 2.
dev, dev->vq_index + i); vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, dev->log_enabled); } -vhost_dev_set_features(dev, dev->log_enabled); +if (dev->started) { +vhost_dev_set_features(dev, dev->log_enabled); +} err_features: return r; } @@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable) } else { vhost_dev_log_resize(dev, vhost_get_log_size(dev)); r = vhost_dev_set_log(dev, true); -if (r < 0) { +/* + * The dev log resize can fail, because of disconnect + * with the vhost-user-blk daemon. Check the device + * state before calling the vhost_dev_set_log() + * function. + * Don't return error if device isn't started to be + * consistent with the check above. + */ +if (dev->started && r < 0) { return r; } } @@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev) fail_log: vhost_log_put(hdev, false); fail_vq
Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
On 2020/5/11 下午5:11, Dima Stepanov wrote: On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote: On 2020/4/30 下午9:36, Dima Stepanov wrote: Since disconnect can happen at any time during initialization not all vring buffers (for instance used vring) can be intialized successfully. If the buffer was not initialized then vhost_memory_unmap call will lead to SIGSEGV. Add checks for the vring address value before calling unmap. Also add assert() in the vhost_memory_unmap() routine. Signed-off-by: Dima Stepanov --- hw/virtio/vhost.c | 27 +-- 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index ddbdc53..3ee50c4 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer, hwaddr len, int is_write, hwaddr access_len) { +assert(buffer); + if (!vhost_dev_has_iommu(dev)) { cpu_physical_memory_unmap(buffer, len, is_write, access_len); } @@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, vhost_vq_index); } -vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx), - 1, virtio_queue_get_used_size(vdev, idx)); -vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx), - 0, virtio_queue_get_avail_size(vdev, idx)); -vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx), - 0, virtio_queue_get_desc_size(vdev, idx)); +/* + * Since the vhost-user disconnect can happen during initialization + * check if vring was initialized, before making unmap. 
+ */ +if (vq->used) { +vhost_memory_unmap(dev, vq->used, + virtio_queue_get_used_size(vdev, idx), + 1, virtio_queue_get_used_size(vdev, idx)); +} +if (vq->avail) { +vhost_memory_unmap(dev, vq->avail, + virtio_queue_get_avail_size(vdev, idx), + 0, virtio_queue_get_avail_size(vdev, idx)); +} +if (vq->desc) { +vhost_memory_unmap(dev, vq->desc, + virtio_queue_get_desc_size(vdev, idx), + 0, virtio_queue_get_desc_size(vdev, idx)); +} Any reason not checking hdev->started instead? vhost_dev_start() will set it to true if virtqueues were correctly mapped. Thanks Well i see it a little bit different: - vhost_dev_start() sets hdev->started to true before starting virtqueues - vhost_virtqueue_start() maps all the memory If we hit the vhost disconnect at the start of the vhost_virtqueue_start(), for instance for this call: r = dev->vhost_ops->vhost_set_vring_base(dev, ); Then we will call vhost_user_blk_disconnect: vhost_user_blk_disconnect()-> vhost_user_blk_stop()-> vhost_dev_stop()-> vhost_virtqueue_stop() As a result we will come in this routine with the hdev->started still set to true, but if used/avail/desc fields still uninitialized and set to 0. I may miss something, but consider both vhost_dev_start() and vhost_user_blk_disconnect() were serialized in main loop. Can this really happen? Thanks } static void vhost_eventfd_add(MemoryListener *listener,
Re: [PATCH v2 5/5] vhost: add device started check in migration set log
On 2020/4/30 下午9:36, Dima Stepanov wrote: If a vhost-user daemon is used as a backend for the vhost device, then we should consider the possibility of a disconnect at any moment. If such a disconnect happens in the vhost_migration_log() routine, the vhost device structure will be cleaned up. At the start of the vhost_migration_log() function there is a check: if (!dev->started) { dev->log_enabled = enable; return 0; } To be consistent with this check, add the same check after calling the vhost_dev_set_log() routine. In general this helps not to break a migration due to the assert() message. But it looks like this code should be revised to handle these errors more carefully. In the case of a vhost-user device backend, the fail paths should consider the state of the device. In this case we should skip some function calls during rollback on the error paths, so as not to get NULL dereference errors. Signed-off-by: Dima Stepanov --- hw/virtio/vhost.c | 39 +++ 1 file changed, 35 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 3ee50c4..d5ab96d 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev, static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log) { int r, i, idx; + +if (!dev->started) { +/* + * If vhost-user daemon is used as a backend for the + * device and the connection is broken, then the vhost_dev + * structure will be reset all its values to 0. + * Add additional check for the device state. + */ +return -1; +} + r = vhost_dev_set_features(dev, enable_log); if (r < 0) { goto err_features; @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log) } return 0; err_vq: -for (; i >= 0; --i) { +/* + * Disconnect with the vhost-user daemon can lead to the + * vhost_dev_cleanup() call which will clean up vhost_dev + * structure.
+ */ +for (; dev->started && (i >= 0); --i) { idx = dev->vhost_ops->vhost_get_vq_index( Why do we need the check of dev->started here? Can started be modified outside the main loop? If yes, I don't get the check of !dev->started at the beginning of this function. dev, dev->vq_index + i); vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, dev->log_enabled); } -vhost_dev_set_features(dev, dev->log_enabled); +if (dev->started) { +vhost_dev_set_features(dev, dev->log_enabled); +} err_features: return r; } @@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, int enable) } else { vhost_dev_log_resize(dev, vhost_get_log_size(dev)); r = vhost_dev_set_log(dev, true); -if (r < 0) { +/* + * The dev log resize can fail, because of disconnect + * with the vhost-user-blk daemon. Check the device + * state before calling the vhost_dev_set_log() + * function. + * Don't return error if device isn't started to be + * consistent with the check above. + */ +if (dev->started && r < 0) { return r; } } @@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev) fail_log: vhost_log_put(hdev, false); fail_vq: -while (--i >= 0) { +/* + * Disconnect with the vhost-user daemon can lead to the + * vhost_dev_cleanup() call which will clean up vhost_dev + * structure. + */ +while ((--i >= 0) && (hdev->started)) { vhost_virtqueue_stop(hdev, vdev, hdev->vqs + i, This should be a separate patch. Thanks
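The pattern under review — re-checking a liveness flag on every step of the rollback loop — reduces to a toy model (hypothetical names, not the QEMU API): once a backend disconnect has torn the device down, the rollback must stop touching per-queue state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device whose backing connection may drop mid-operation. */
struct toy_dev {
    bool started;
};

static int rollback_steps;

/*
 * Roll back queues i..0, but bail out as soon as the device was torn
 * down by a disconnect (started cleared by the cleanup path).
 */
static void toy_rollback(struct toy_dev *dev, int i)
{
    for (; dev->started && i >= 0; --i) {
        rollback_steps++;   /* stands in for vhost_virtqueue_set_addr() */
    }
}
```

Jason's question still applies to the real code: if the cleanup path is serialized with this function in the main loop, started cannot change underneath the loop and the extra check is dead code; the sketch only shows what the check would buy if it could.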
Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
On 2020/4/30 9:36 PM, Dima Stepanov wrote: Since a disconnect can happen at any time during initialization, not all vring buffers (for instance the used vring) can be initialized successfully. If a buffer was not initialized, then a vhost_memory_unmap call will lead to SIGSEGV. Add checks for the vring address value before calling unmap. Also add assert() in the vhost_memory_unmap() routine. Signed-off-by: Dima Stepanov --- hw/virtio/vhost.c | 27 +-- 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index ddbdc53..3ee50c4 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer, hwaddr len, int is_write, hwaddr access_len) { +assert(buffer); + if (!vhost_dev_has_iommu(dev)) { cpu_physical_memory_unmap(buffer, len, is_write, access_len); } @@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, vhost_vq_index); } -vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx), - 1, virtio_queue_get_used_size(vdev, idx)); -vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx), - 0, virtio_queue_get_avail_size(vdev, idx)); -vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx), - 0, virtio_queue_get_desc_size(vdev, idx)); +/* + * Since the vhost-user disconnect can happen during initialization + * check if vring was initialized, before making unmap. + */ +if (vq->used) { +vhost_memory_unmap(dev, vq->used, + virtio_queue_get_used_size(vdev, idx), + 1, virtio_queue_get_used_size(vdev, idx)); +} +if (vq->avail) { +vhost_memory_unmap(dev, vq->avail, + virtio_queue_get_avail_size(vdev, idx), + 0, virtio_queue_get_avail_size(vdev, idx)); +} +if (vq->desc) { +vhost_memory_unmap(dev, vq->desc, + virtio_queue_get_desc_size(vdev, idx), + 0, virtio_queue_get_desc_size(vdev, idx)); +} Any reason not to check hdev->started instead?
vhost_dev_start() will set it to true if virtqueues were correctly mapped. Thanks } static void vhost_eventfd_add(MemoryListener *listener,
Re: [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device
On 2020/4/30 下午9:36, Dima Stepanov wrote: Introduce new wrappers to set/reset guest notifiers for the virtio device in the vhost device module: vhost_dev_assign_guest_notifiers ->set_guest_notifiers(..., ..., true); vhost_dev_drop_guest_notifiers ->set_guest_notifiers(..., ..., false); This is a preliminary step to refactor code, Maybe I miss something, I don't see any add-on patch to modify the new wrapper in this series? so the set_guest_notifiers methods could be called based on the vhost device state. Update all vhost used devices to use these wrappers instead of direct method call. Signed-off-by: Dima Stepanov --- backends/cryptodev-vhost.c | 26 +++--- backends/vhost-user.c | 16 +--- hw/block/vhost-user-blk.c | 15 +-- hw/net/vhost_net.c | 30 +- hw/scsi/vhost-scsi-common.c | 15 +-- hw/virtio/vhost-user-fs.c | 17 +++-- hw/virtio/vhost-vsock.c | 18 -- hw/virtio/vhost.c | 38 ++ hw/virtio/virtio.c | 13 + include/hw/virtio/vhost.h | 4 include/hw/virtio/virtio.h | 1 + 11 files changed, 118 insertions(+), 75 deletions(-) diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c index 8337c9a..4522195 100644 --- a/backends/cryptodev-vhost.c +++ b/backends/cryptodev-vhost.c @@ -169,16 +169,13 @@ vhost_set_vring_enable(CryptoDevBackendClient *cc, int cryptodev_vhost_start(VirtIODevice *dev, int total_queues) { VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev); -BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev))); -VirtioBusState *vbus = VIRTIO_BUS(qbus); -VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus); int r, e; int i; CryptoDevBackend *b = vcrypto->cryptodev; CryptoDevBackendVhost *vhost_crypto; CryptoDevBackendClient *cc; -if (!k->set_guest_notifiers) { +if (!virtio_device_guest_notifiers_initialized(dev)) { error_report("binding does not support guest notifiers"); return -ENOSYS; } @@ -198,9 +195,13 @@ int cryptodev_vhost_start(VirtIODevice *dev, int total_queues) } } -r = k->set_guest_notifiers(qbus->parent, total_queues, true); +/* + * Since all the states 
are handled by one vhost device, + * use the first one in array. + */ +vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0); +r = vhost_dev_assign_guest_notifiers(&vhost_crypto->dev, dev, total_queues); if (r < 0) { -error_report("error binding guest notifier: %d", -r); goto err; } @@ -232,7 +233,8 @@ err_start: vhost_crypto = cryptodev_get_vhost(cc, b, i); cryptodev_vhost_stop_one(vhost_crypto, dev); } -e = k->set_guest_notifiers(qbus->parent, total_queues, false); +vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0); +e = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues); if (e < 0) { error_report("vhost guest notifier cleanup failed: %d", e); } @@ -242,9 +244,6 @@ err: void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues) { -BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev))); -VirtioBusState *vbus = VIRTIO_BUS(qbus); -VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus); VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev); CryptoDevBackend *b = vcrypto->cryptodev; CryptoDevBackendVhost *vhost_crypto; @@ -259,7 +258,12 @@ void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues) cryptodev_vhost_stop_one(vhost_crypto, dev); } -r = k->set_guest_notifiers(qbus->parent, total_queues, false); +/* + * Since all the states are handled by one vhost device, + * use the first one in array.
+ */ +vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0); +r = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues); if (r < 0) { error_report("vhost guest notifier cleanup failed: %d", r); } diff --git a/backends/vhost-user.c b/backends/vhost-user.c index 2bf3406..e116bc6 100644 --- a/backends/vhost-user.c +++ b/backends/vhost-user.c @@ -60,15 +60,13 @@ vhost_user_backend_dev_init(VhostUserBackend *b, VirtIODevice *vdev, void vhost_user_backend_start(VhostUserBackend *b) { -BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev))); -VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); int ret, i ; if (b->started) { return; } -if (!k->set_guest_notifiers) { +if (!virtio_device_guest_notifiers_initialized(b->vdev)) { error_report("binding does not support guest notifiers"); return; } @@ -78,9 +76,8 @@ vhost_user_backend_start(VhostUserBackend *b) return; } -ret = k->set_guest_notifiers(qbus->parent, b->dev.nvqs,
Re: [RFC v3 5/6] qmp: add QMP command x-debug-virtio-queue-element
On 2020/5/7 下午7:49, Laurent Vivier wrote: This new command shows the information of a VirtQueue element. Signed-off-by: Laurent Vivier --- hw/virtio/virtio-stub.c | 9 +++ hw/virtio/virtio.c | 130 qapi/virtio.json| 94 + 3 files changed, 233 insertions(+) diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c index 3c1bf172acf6..8275e31430e7 100644 --- a/hw/virtio/virtio-stub.c +++ b/hw/virtio/virtio-stub.c @@ -23,3 +23,12 @@ VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, { return qmp_virtio_unsupported(errp); } + +VirtioQueueElement *qmp_x_debug_virtio_queue_element(const char* path, + uint16_t queue, + bool has_index, + uint16_t index, + Error **errp) +{ +return qmp_virtio_unsupported(errp); +} diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 57552bf05014..66dc2cef1b39 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -4033,6 +4033,136 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* path, Error **errp) return status; } +static VirtioRingDescFlagsList *qmp_decode_vring_desc_flags(uint16_t flags) +{ +VirtioRingDescFlagsList *list = NULL; +VirtioRingDescFlagsList *node; +int i; +struct { +uint16_t flag; +VirtioRingDescFlags value; +} map[] = { +{ VRING_DESC_F_NEXT, VIRTIO_RING_DESC_FLAGS_NEXT }, +{ VRING_DESC_F_WRITE, VIRTIO_RING_DESC_FLAGS_WRITE }, +{ VRING_DESC_F_INDIRECT, VIRTIO_RING_DESC_FLAGS_INDIRECT }, +{ 1 << VRING_PACKED_DESC_F_AVAIL, VIRTIO_RING_DESC_FLAGS_AVAIL }, +{ 1 << VRING_PACKED_DESC_F_USED, VIRTIO_RING_DESC_FLAGS_USED }, +{ 0, -1 } +}; + +for (i = 0; map[i].flag; i++) { +if ((map[i].flag & flags) == 0) { +continue; +} +node = g_malloc0(sizeof(VirtioRingDescFlagsList)); +node->value = map[i].value; +node->next = list; +list = node; +} + +return list; +} + +VirtioQueueElement *qmp_x_debug_virtio_queue_element(const char* path, + uint16_t queue, + bool has_index, + uint16_t index, + Error **errp) +{ +VirtIODevice *vdev; +VirtQueue *vq; +VirtioQueueElement *element = NULL; + +vdev = 
virtio_device_find(path); +if (vdev == NULL) { +error_setg(errp, "Path %s is not a VirtIO device", path); +return NULL; +} + +if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) { +error_setg(errp, "Invalid virtqueue number %d", queue); +return NULL; +} +vq = &vdev->vq[queue]; + +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) { +error_setg(errp, "Packed ring not supported"); +return NULL; +} else { +unsigned int head, i, max; +VRingMemoryRegionCaches *caches; +MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID; +MemoryRegionCache *desc_cache; +VRingDesc desc; +VirtioRingDescList *list = NULL; +VirtioRingDescList *node; +int rc; + +RCU_READ_LOCK_GUARD(); + +max = vq->vring.num; + +if (!has_index) { +head = vring_avail_ring(vq, vq->last_avail_idx % vq->vring.num); +} else { +head = vring_avail_ring(vq, index % vq->vring.num); +} +i = head; + +caches = vring_get_region_caches(vq); +if (!caches) { +error_setg(errp, "Region caches not initialized"); +return NULL; +} + +if (caches->desc.len < max * sizeof(VRingDesc)) { +error_setg(errp, "Cannot map descriptor ring"); +return NULL; +} + +desc_cache = &caches->desc; +vring_split_desc_read(vdev, &desc, desc_cache, i); +if (desc.flags & VRING_DESC_F_INDIRECT) { +int64_t len; + +len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as, + desc.addr, desc.len, false); +desc_cache = &indirect_desc_cache; +if (len < desc.len) { +error_setg(errp, "Cannot map indirect buffer"); +goto done; +} +i = 0; +vring_split_desc_read(vdev, &desc, desc_cache, i); +} + +element = g_new0(VirtioQueueElement, 1); +element->index = head; +element->ndescs = 0; + +do { +node = g_new0(VirtioRingDescList, 1); +node->value = g_new0(VirtioRingDesc, 1); +node->value->addr = desc.addr; +
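The element dump above walks a split-ring descriptor chain: read the head descriptor, optionally switch to an indirect table, then follow next while the NEXT flag is set. A simplified, self-contained version of that walk (flag value per the virtio spec; indirect descriptors omitted):

```c
#include <assert.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT 1   /* chain continues via the next field */

/* Split-ring descriptor layout, as defined by the virtio spec. */
struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/*
 * Count the descriptors in the chain starting at head; bounded by max
 * to guard against a malformed ring that loops back on itself.
 */
static unsigned chain_len(const struct vring_desc *table,
                          unsigned head, unsigned max)
{
    unsigned n = 0, i = head;

    while (n < max) {
        n++;
        if (!(table[i].flags & VRING_DESC_F_NEXT)) {
            break;
        }
        i = table[i].next;
    }
    return n;
}
```

The QMP command does the same traversal but reads each descriptor through a MemoryRegionCache and collects addr/len/flags into the returned VirtioRingDesc list instead of merely counting.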
Re: [RFC v3 4/6] qmp: add QMP command x-debug-virtio-queue-status
On 2020/5/7 下午7:49, Laurent Vivier wrote: This new command shows internal status of a VirtQueue. (vrings and indexes). Signed-off-by: Laurent Vivier It looks to me that packed virtqueue is not supported. It's better to add them in the future. --- hw/virtio/virtio-stub.c | 6 +++ hw/virtio/virtio.c | 35 +++ qapi/virtio.json| 98 + 3 files changed, 139 insertions(+) diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c index ddb592f72eee..3c1bf172acf6 100644 --- a/hw/virtio/virtio-stub.c +++ b/hw/virtio/virtio-stub.c @@ -17,3 +17,9 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* path, Error **errp) { return qmp_virtio_unsupported(errp); } + +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, + uint16_t queue, Error **errp) +{ +return qmp_virtio_unsupported(errp); +} diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 59bf6ef651a6..57552bf05014 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -3877,6 +3877,41 @@ static VirtIODevice *virtio_device_find(const char *path) return NULL; } +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path, + uint16_t queue, Error **errp) +{ +VirtIODevice *vdev; +VirtQueueStatus *status; + +vdev = virtio_device_find(path); +if (vdev == NULL) { +error_setg(errp, "Path %s is not a VirtIO device", path); +return NULL; +} + +if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) { +error_setg(errp, "Invalid virtqueue number %d", queue); +return NULL; +} + +status = g_new0(VirtQueueStatus, 1); +status->queue_index = vdev->vq[queue].queue_index; +status->inuse = vdev->vq[queue].inuse; +status->vring_num = vdev->vq[queue].vring.num; +status->vring_num_default = vdev->vq[queue].vring.num_default; +status->vring_align = vdev->vq[queue].vring.align; +status->vring_desc = vdev->vq[queue].vring.desc; +status->vring_avail = vdev->vq[queue].vring.avail; +status->vring_used = vdev->vq[queue].vring.used; +status->last_avail_idx = vdev->vq[queue].last_avail_idx; This might not be 
correct when vhost is used. We may consider to sync last_avail_idx from vhost backends here? Thanks +status->shadow_avail_idx = vdev->vq[queue].shadow_avail_idx; +status->used_idx = vdev->vq[queue].used_idx; +status->signalled_used = vdev->vq[queue].signalled_used; +status->signalled_used_valid = vdev->vq[queue].signalled_used_valid; + +return status; +} + #define CONVERT_FEATURES(type, map)\ ({ \ type *list = NULL; \ diff --git a/qapi/virtio.json b/qapi/virtio.json index 69dd107d0c9b..43c234a9fc69 100644 --- a/qapi/virtio.json +++ b/qapi/virtio.json @@ -308,3 +308,101 @@ 'data': { 'path': 'str' }, 'returns': 'VirtioStatus' } + +## +# @VirtQueueStatus: +# +# Status of a VirtQueue +# +# @queue-index: VirtQueue queue_index +# +# @inuse: VirtQueue inuse +# +# @vring-num: VirtQueue vring.num +# +# @vring-num-default: VirtQueue vring.num_default +# +# @vring-align: VirtQueue vring.align +# +# @vring-desc: VirtQueue vring.desc +# +# @vring-avail: VirtQueue vring.avail +# +# @vring-used: VirtQueue vring.used +# +# @last-avail-idx: VirtQueue last_avail_idx +# +# @shadow-avail-idx: VirtQueue shadow_avail_idx +# +# @used-idx: VirtQueue used_idx +# +# @signalled-used: VirtQueue signalled_used +# +# @signalled-used-valid: VirtQueue signalled_used_valid +# +# Since: 5.1 +# +## + +{ 'struct': 'VirtQueueStatus', + 'data': { +'queue-index': 'uint16', +'inuse': 'uint32', +'vring-num': 'int', +'vring-num-default': 'int', +'vring-align': 'int', +'vring-desc': 'uint64', +'vring-avail': 'uint64', +'vring-used': 'uint64', +'last-avail-idx': 'uint16', +'shadow-avail-idx': 'uint16', +'used-idx': 'uint16', +'signalled-used': 'uint16', +'signalled-used-valid': 'uint16' + } +} + +## +# @x-debug-virtio-queue-status: +# +# Return the status of a given VirtQueue +# +# @path: QOBject path of the VirtIODevice +# +# @queue: queue number to examine +# +# Returns: Status of the VirtQueue +# +# Since: 5.1 +# +# Example: +# +# -> { "execute": "x-debug-virtio-queue-status", +# "arguments": { +# "path": 
"/machine/peripheral-anon/device[3]/virtio-backend", +# "queue": 0 +# } +# } +# <- { "return": { +# "signalled-used": 373, +# "inuse": 0, +# "vring-desc": 864411648, +# "vring-num-default": 256, +# "signalled-used-valid": 1, +# "vring-avail": 864415744, +# "last-avail-idx": 373, +# "queue-index": 0, +# "vring-used": 864416320, +# "shadow-avail-idx": 619, +# "used-idx": 373, +# "vring-num": 256, +#
Re: [RFC v3 3/6] qmp: decode feature bits in virtio-status
On 2020/5/7 7:49 PM, Laurent Vivier wrote: Display feature names instead of a features bitmap for host, guest and backend. Decode features according to the device type; transport features are on the first line. Undecoded bits (if any) are stored in a separate field. Signed-off-by: Laurent Vivier This is really useful. I wonder whether we need something similar on the driver side; the current sysfs can only display magic binary numbers. Thanks
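The decoding this patch implements is essentially a table walk over known feature bits, with whatever remains reported as an undecoded mask. A sketch of that scheme, using a few real transport bit numbers from the virtio spec (the table here is illustrative, not the full list the patch decodes):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct feature_name {
    unsigned bit;
    const char *name;
};

/* A few transport feature bits from the virtio spec, for illustration. */
static const struct feature_name transport_features[] = {
    { 28, "VIRTIO_F_RING_INDIRECT_DESC" },
    { 29, "VIRTIO_F_RING_EVENT_IDX" },
    { 32, "VIRTIO_F_VERSION_1" },
    { 33, "VIRTIO_F_IOMMU_PLATFORM" },
};

/* Append decoded names to buf; return the mask of bits we could not name. */
static uint64_t decode_features(uint64_t features, char *buf, size_t buflen)
{
    uint64_t undecoded = features;
    size_t i, n = sizeof(transport_features) / sizeof(transport_features[0]);

    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        uint64_t mask = 1ULL << transport_features[i].bit;
        if (features & mask) {
            strncat(buf, transport_features[i].name, buflen - strlen(buf) - 1);
            strncat(buf, " ", buflen - strlen(buf) - 1);
            undecoded &= ~mask;
        }
    }
    return undecoded;
}
```

The patch layers a per-device-type table on top of the transport table, which is why the first output line holds the transport features and the rest are device specific.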
Re: [PATCH v7 0/4] colo: Add support for continuous replication
On 2020/2/20 9:38 AM, Zhang, Chen wrote: Hi Jason, I noticed this series can't be merged or queued; did you meet some problem with it? Thanks Zhang Chen No, I've queued this. Thanks
Re: [PATCH v7 0/4] colo: Add support for continuous replication
On 2019/11/19 8:28 PM, Zhang, Chen wrote: -Original Message- From: Lukas Straub Sent: Thursday, November 14, 2019 12:36 AM To: qemu-devel Cc: Zhang, Chen ; Jason Wang ; Wen Congyang ; Xie Changlong ; Kevin Wolf ; Max Reitz ; qemu-block Subject: Re: [PATCH v7 0/4] colo: Add support for continuous replication On Fri, 25 Oct 2019 19:06:31 +0200 Lukas Straub wrote: Hello Everyone, These patches add support for continuous replication to colo. This means that after the Primary fails and the Secondary did a failover, the Secondary can then become Primary and resume replication to a new Secondary. Regards, Lukas Straub v7: - clarify meaning of IPs in documentation and note that active and hidden images just need to be created once - reverted removal of bdrv_is_root_node(top_bs) in replication and adjusted the top-id= parameter in documentation accordingly v6: - documented the position= and insert= options - renamed replication test - clarified documentation by using different IPs for primary and secondary - added Reviewed-by tags v5: - change syntax for the position= parameter - fix spelling mistake v4: - fix checkpatch.pl warnings v3: - add test for replication changes - check if the filter to be inserted before/behind belongs to the same interface - fix the error message for the position= parameter - rename term "after" -> "behind" and variable "insert_before" -> "insert_before_flag" - document the quorum node on the secondary side - simplify quorum parameters in documentation - remove trailing spaces in documentation - clarify the testing procedure in documentation v2: - fix email formatting - fix checkpatch.pl warnings - fix patchew error - clearer commit messages Lukas Straub (4): block/replication.c: Ignore requests after failover tests/test-replication.c: Add test for secondary node continuing replication net/filter.c: Add Options to insert filters anywhere in the filter list colo: Update Documentation for continuous replication block/replication.c| 35 +-
docs/COLO-FT.txt | 224 +++-- docs/block-replication.txt | 28 +++-- include/net/filter.h | 2 + net/filter.c | 92 ++- qemu-options.hx| 31 - tests/test-replication.c | 52 + 7 files changed, 389 insertions(+), 75 deletions(-) Hello Everyone, So I guess this is ready for merging or will that have to wait until the 4.2 release is done? Due to Qemu 4.2 release schedule, after soft feature freeze(Oct29) the master branch does not accept feature changes. But I don't know if sub-maintainer(block or net) can queue this series first then merge it after 4.2 release? Thanks Zhang Chen Will try to queue this series. Thanks Regards, Lukas Straub
Re: [PATCH v6 0/9] Packed virtqueue for virtio
On 2019/10/25 上午1:13, Eugenio Pérez wrote: Hi: This is an updated version of packed virtqueue support based on Wei and Jason's V5, mainly solving the clang leak detector error CI gave. Please review. Changes from V5: - Fix qemu's CI asan error. - Move/copy rcu comments. - Merge duplicated vdev->broken check between split and packet version. Eugenio Pérez (3): virtio: Free rng and blk virqueues virtio: add some rcu comments virtio: Move vdev->broken check to dispatch drop_all Jason Wang (4): virtio: basic packed virtqueue support virtio: event suppression support for packed ring vhost_net: enable packed ring support virtio: add property to enable packed virtqueue Wei Xu (2): virtio: basic structure for packed ring virtio: device/driverr area size calculation refactor for split ring Looks good to me. Just two nits: I tend to squash patch 8 and patch 9 into the patch that introduces those issues and split patch 3 into two parts. Btw, if you wish you can add your s-o-b to the series. Do you want to post a new version or I can tweak them by myself? Thanks hw/block/virtio-blk.c |7 +- hw/char/virtio-serial-bus.c |2 +- hw/net/vhost_net.c |2 + hw/scsi/virtio-scsi.c |3 +- hw/virtio/virtio-rng.c |1 + hw/virtio/virtio.c | 1154 ++- include/hw/virtio/virtio.h | 14 +- 7 files changed, 1045 insertions(+), 138 deletions(-)
[Qemu-block] [PATCH V4 01/10] virtio: convert to use DMA api
Currently, all virtio devices bypass IOMMU completely. This is because address_space_memory is assumed and used during DMA emulation. This patch converts the virtio core API to use DMA API. This idea is - introducing a new transport specific helper to query the dma address space. (only pci version is implemented). - query and use this address space during virtio device guest memory accessing when iommu platform (VIRTIO_F_IOMMU_PLATFORM) was enabled for this device. Cc: Michael S. Tsirkin <m...@redhat.com> Cc: Stefan Hajnoczi <stefa...@redhat.com> Cc: Kevin Wolf <kw...@redhat.com> Cc: Amit Shah <amit.s...@redhat.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Jason Wang <jasow...@redhat.com> --- hw/block/virtio-blk.c | 2 +- hw/char/virtio-serial-bus.c | 3 ++- hw/scsi/virtio-scsi.c | 4 ++- hw/virtio/virtio-bus.c| 8 ++ hw/virtio/virtio-pci.c| 14 ++ hw/virtio/virtio.c| 57 +-- include/hw/virtio/virtio-access.h | 31 ++--- include/hw/virtio/virtio-bus.h| 1 + include/hw/virtio/virtio.h| 9 --- 9 files changed, 93 insertions(+), 36 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 0c5fd27..12657ff 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -857,7 +857,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f, } } -req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq)); +req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq)); virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req); req->next = s->rq; s->rq = req; diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index 7975c2c..d544cd9 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -732,6 +732,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque) static int fetch_active_ports_list(QEMUFile *f, VirtIOSerial *s, uint32_t nr_active_ports) { +VirtIODevice *vdev = VIRTIO_DEVICE(s); uint32_t i; s->post_load = g_malloc0(sizeof(*s->post_load)); @@ 
-765,7 +766,7 @@ static int fetch_active_ports_list(QEMUFile *f, qemu_get_be64s(f, >iov_offset); port->elem = -qemu_get_virtqueue_element(f, sizeof(VirtQueueElement)); +qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement)); /* * Port was throttled on source machine. Let's diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index 34bba35..3da9d62 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -198,12 +198,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq) SCSIBus *bus = sreq->bus; VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus); VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s); +VirtIODevice *vdev = VIRTIO_DEVICE(s); VirtIOSCSIReq *req; uint32_t n; qemu_get_be32s(f, ); assert(n < vs->conf.num_queues); -req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size); +req = qemu_get_virtqueue_element(vdev, f, + sizeof(VirtIOSCSIReq) + vs->cdb_size); virtio_scsi_init_req(s, vs->cmd_vqs[n], req); if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size, diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c index d6c0c72..d31cc00 100644 --- a/hw/virtio/virtio-bus.c +++ b/hw/virtio/virtio-bus.c @@ -28,6 +28,7 @@ #include "hw/qdev.h" #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio.h" +#include "exec/address-spaces.h" /* #define DEBUG_VIRTIO_BUS */ @@ -61,6 +62,13 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp) if (klass->device_plugged != NULL) { klass->device_plugged(qbus->parent, errp); } + +if (klass->get_dma_as != NULL && +virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) { +vdev->dma_as = klass->get_dma_as(qbus->parent); +} else { +vdev->dma_as = _space_memory; +} } /* Reset the virtio_bus */ diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 21c2b9d..213d57e 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -1144,6 +1144,14 @@ static int virtio_pci_query_nvectors(DeviceState *d) return proxy->nvectors; } 
+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d) +{ +VirtIOPCIProxy *proxy = VIRTIO_PCI(d); +PCIDevice *dev = &proxy->pci_dev; + +return pci_get_address_space(dev); +} + static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
[Qemu-block] [PATCH V3 01/10] virtio: convert to use DMA api
Currently, all virtio devices bypass IOMMU completely. This is because address_space_memory is assumed and used during DMA emulation. This patch converts the virtio core API to use DMA API. This idea is - introducing a new transport specific helper to query the dma address space. (only pci version is implemented). - query and use this address space during virtio device guest memory accessing when iommu platform (VIRTIO_F_IOMMU_PLATFORM) was enabled for this device. Cc: Michael S. Tsirkin <m...@redhat.com> Cc: Stefan Hajnoczi <stefa...@redhat.com> Cc: Kevin Wolf <kw...@redhat.com> Cc: Amit Shah <amit.s...@redhat.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Jason Wang <jasow...@redhat.com> --- hw/block/virtio-blk.c | 2 +- hw/char/virtio-serial-bus.c | 3 ++- hw/scsi/virtio-scsi.c | 4 ++- hw/virtio/virtio-bus.c| 8 ++ hw/virtio/virtio-pci.c| 14 ++ hw/virtio/virtio.c| 57 +-- include/hw/virtio/virtio-access.h | 31 ++--- include/hw/virtio/virtio-bus.h| 1 + include/hw/virtio/virtio.h| 9 --- 9 files changed, 93 insertions(+), 36 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 37fe72b..6a2dbaf 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -860,7 +860,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f, } } -req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq)); +req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq)); virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req); req->next = s->rq; s->rq = req; diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index 7975c2c..d544cd9 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -732,6 +732,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque) static int fetch_active_ports_list(QEMUFile *f, VirtIOSerial *s, uint32_t nr_active_ports) { +VirtIODevice *vdev = VIRTIO_DEVICE(s); uint32_t i; s->post_load = g_malloc0(sizeof(*s->post_load)); @@ 
-765,7 +766,7 @@ static int fetch_active_ports_list(QEMUFile *f, qemu_get_be64s(f, >iov_offset); port->elem = -qemu_get_virtqueue_element(f, sizeof(VirtQueueElement)); +qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement)); /* * Port was throttled on source machine. Let's diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index 4762f05..1519cee 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -198,12 +198,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq) SCSIBus *bus = sreq->bus; VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus); VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s); +VirtIODevice *vdev = VIRTIO_DEVICE(s); VirtIOSCSIReq *req; uint32_t n; qemu_get_be32s(f, ); assert(n < vs->conf.num_queues); -req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size); +req = qemu_get_virtqueue_element(vdev, f, + sizeof(VirtIOSCSIReq) + vs->cdb_size); virtio_scsi_init_req(s, vs->cmd_vqs[n], req); if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size, diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c index 11f65bd..8597762 100644 --- a/hw/virtio/virtio-bus.c +++ b/hw/virtio/virtio-bus.c @@ -28,6 +28,7 @@ #include "hw/qdev.h" #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio.h" +#include "exec/address-spaces.h" /* #define DEBUG_VIRTIO_BUS */ @@ -61,6 +62,13 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp) if (klass->device_plugged != NULL) { klass->device_plugged(qbus->parent, errp); } + +if (klass->get_dma_as != NULL && +virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) { +vdev->dma_as = klass->get_dma_as(qbus->parent); +} else { +vdev->dma_as = _space_memory; +} } /* Reset the virtio_bus */ diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 06831de..6ceb43e 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -1168,6 +1168,14 @@ static int virtio_pci_query_nvectors(DeviceState *d) return proxy->nvectors; } 
+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d) +{ +VirtIOPCIProxy *proxy = VIRTIO_PCI(d); +PCIDevice *dev = &proxy->pci_dev; + +return pci_get_address_space(dev); +} + static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
Re: [Qemu-block] [PATCH V2 02/11] virtio: convert to use DMA api
On 2016-11-04 03:46, Michael S. Tsirkin wrote: @@ -244,6 +245,7 @@ int virtio_queue_empty(VirtQueue *vq) > static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem, > unsigned int len) > { >+AddressSpace *dma_as = virtio_get_dma_as(vq->vdev); > unsigned int offset; > int i; Can't we use vdev->dma_as here? I'd like to avoid a query on the data path. Same in most other places below. Will use vdev->dma_as in the next version. Thanks
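The suggestion here — use vdev->dma_as instead of calling virtio_get_dma_as() per request — is the usual resolve-once-at-setup pattern: look the address space up when the device is plugged and cache the pointer in the device structure, so the data path is a plain field load. A stripped-down sketch with hypothetical types:

```c
#include <assert.h>
#include <stddef.h>

struct address_space { const char *name; };

static struct address_space address_space_memory = { "memory" };
static struct address_space pci_iommu_as = { "pci-iommu" };

struct toy_vdev {
    int iommu_platform;              /* VIRTIO_F_IOMMU_PLATFORM negotiated */
    struct address_space *dma_as;    /* cached at plug time */
};

/* Done once, at device-plug time (cf. virtio_bus_device_plugged()). */
static void device_plugged(struct toy_vdev *vdev)
{
    vdev->dma_as = vdev->iommu_platform ? &pci_iommu_as
                                        : &address_space_memory;
}

/* Data path: no query, just use the cached pointer. */
static struct address_space *get_dma_as(struct toy_vdev *vdev)
{
    return vdev->dma_as;
}
```

This matches the shape of the merged code: the transport's get_dma_as hook is consulted once in virtio_bus_device_plugged(), and everything in virtqueue processing reads vdev->dma_as directly.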
[Qemu-block] [PATCH V2 02/11] virtio: convert to use DMA api
Currently, all virtio devices bypass the IOMMU completely. This is because
address_space_memory is assumed and used during DMA emulation. This patch
converts the virtio core API to use the DMA API. The idea is:

- introducing a new transport-specific helper to query the DMA address
  space (only the PCI version is implemented).
- querying and using this address space during virtio device guest memory
  accesses when the IOMMU platform feature (VIRTIO_F_IOMMU_PLATFORM) was
  enabled for this device.

Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Amit Shah <amit.s...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c             |  2 +-
 hw/char/virtio-serial-bus.c       |  3 ++-
 hw/scsi/virtio-scsi.c             |  4 ++-
 hw/virtio/virtio-bus.c            |  8 ++
 hw/virtio/virtio-pci.c            | 14 ++
 hw/virtio/virtio.c                | 57 +--
 include/hw/virtio/virtio-access.h | 36 ++---
 include/hw/virtio/virtio-bus.h    |  1 +
 include/hw/virtio/virtio.h        |  9 ---
 9 files changed, 98 insertions(+), 36 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 37fe72b..6a2dbaf 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -860,7 +860,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
         }
     }

-    req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
+    req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
     virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req);
     req->next = s->rq;
     s->rq = req;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7975c2c..d544cd9 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -732,6 +732,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 static int fetch_active_ports_list(QEMUFile *f, VirtIOSerial *s,
                                    uint32_t nr_active_ports)
 {
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
     uint32_t i;

     s->post_load = g_malloc0(sizeof(*s->post_load));
@@ -765,7 +766,7 @@ static int fetch_active_ports_list(QEMUFile *f,
             qemu_get_be64s(f, &port->iov_offset);

             port->elem =
-                qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
+                qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement));

             /*
              *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 4762f05..1519cee 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -198,12 +198,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
     SCSIBus *bus = sreq->bus;
     VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
     VirtIOSCSIReq *req;
     uint32_t n;

     qemu_get_be32s(f, &n);
     assert(n < vs->conf.num_queues);
-    req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+    req = qemu_get_virtqueue_element(vdev, f,
+                                     sizeof(VirtIOSCSIReq) + vs->cdb_size);
     virtio_scsi_init_req(s, vs->cmd_vqs[n], req);

     if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 11f65bd..8597762 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -28,6 +28,7 @@
 #include "hw/qdev.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio.h"
+#include "exec/address-spaces.h"

 /* #define DEBUG_VIRTIO_BUS */

@@ -61,6 +62,13 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
     if (klass->device_plugged != NULL) {
         klass->device_plugged(qbus->parent, errp);
     }
+
+    if (klass->get_dma_as != NULL &&
+        virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+        vdev->dma_as = klass->get_dma_as(qbus->parent);
+    } else {
+        vdev->dma_as = &address_space_memory;
+    }
 }

 /* Reset the virtio_bus */
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 06831de..6ceb43e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1168,6 +1168,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
     return proxy->nvectors;
 }
+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
+{
+    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+    PCIDevice *dev = &proxy->pci_dev;
+
+    return pci_get_address_space(dev);
+}
+
 static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
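The transport hook added above is just a per-bus-class function pointer with a safe fallback. The following is a rough, self-contained model of that control flow; the types and names are simplified stand-ins, not QEMU's real QOM machinery:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU types; names are illustrative only. */
typedef struct AddressSpace { const char *name; } AddressSpace;

static AddressSpace address_space_memory = { "memory" };
static AddressSpace pci_iommu_as = { "pci-iommu" };

typedef struct VirtioBusClass {
    /* Optional per-transport override, like the new get_dma_as hook. */
    AddressSpace *(*get_dma_as)(void *bus_parent);
} VirtioBusClass;

static AddressSpace *pci_get_dma_as(void *bus_parent)
{
    (void)bus_parent;   /* the real code derives this from the PCI proxy */
    return &pci_iommu_as;
}

/* Mirrors the patch's logic: use the transport's DMA address space only
 * when the hook exists and VIRTIO_F_IOMMU_PLATFORM was negotiated;
 * otherwise fall back to address_space_memory, as before the patch. */
static AddressSpace *virtio_get_dma_as(const VirtioBusClass *k,
                                       int iommu_platform)
{
    if (iommu_platform && k->get_dma_as) {
        return k->get_dma_as(NULL);
    }
    return &address_space_memory;
}
```

A transport that never supports an IOMMU simply leaves `get_dma_as` NULL and keeps the old bypass behaviour.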
Re: [Qemu-block] [Qemu-devel] [PATCH for 2.8 02/11] virtio: convert to use DMA api
On 2016-09-05 10:26, Wei Xu wrote:
> On 2016-08-30 11:06, Jason Wang wrote:
>> @@ -1587,6 +1595,11 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
>>      }
>>
>>      if (legacy) {
>> +        if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
>> +            error_setg(errp, "VIRTIO_F_IOMMU_PLATFORM was supported by "
>> +                       "neither legacy nor transitional device.");
>> +            return;
>> +        }
>
> I am not sure I understand this correctly; "transitional device" may be
> a bit hard to understand here, since "transitional" is defined by the
> spec. Just a tip for your convenience: besides saying what is denied,
> can we add what kind of device is supported to the message, i.e. modern
> devices only? For example: "VIRTIO_F_IOMMU_PLATFORM is supported by
> modern devices only; it is not supported by either legacy or
> transitional devices."

Ok.

>>          /* legacy and transitional */
>>          pci_set_word(config + PCI_SUBSYSTEM_VENDOR_ID,
>>                       pci_get_word(config + PCI_VENDOR_ID));
Re: [Qemu-block] [PATCH for 2.8 02/11] virtio: convert to use DMA api
On 2016-08-30 15:31, Cornelia Huck wrote:
> On Tue, 30 Aug 2016 11:06:50 +0800
> Jason Wang <jasow...@redhat.com> wrote:
>
>> Currently, all virtio devices bypass the IOMMU completely. This is
>> because address_space_memory is assumed and used during DMA emulation.
>> This patch converts the virtio core API to use the DMA API. The idea is:
>>
>> - introducing a new transport-specific helper to query the DMA address
>>   space (only the PCI version is implemented).
>> - querying and using this address space during virtio device guest
>>   memory accesses when the IOMMU platform feature
>>   (VIRTIO_F_IOMMU_PLATFORM) was enabled for this device.
>>
>> Cc: Michael S. Tsirkin <m...@redhat.com>
>> Cc: Stefan Hajnoczi <stefa...@redhat.com>
>> Cc: Kevin Wolf <kw...@redhat.com>
>> Cc: Amit Shah <amit.s...@redhat.com>
>> Cc: Paolo Bonzini <pbonz...@redhat.com>
>> Cc: qemu-block@nongnu.org
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>> ---
>>  hw/block/virtio-blk.c             |  2 +-
>>  hw/char/virtio-serial-bus.c       |  3 +-
>>  hw/scsi/virtio-scsi.c             |  4 ++-
>>  hw/virtio/virtio-pci.c            | 14 +
>>  hw/virtio/virtio.c                | 62 ---
>>  include/hw/virtio/virtio-access.h | 43 ---
>>  include/hw/virtio/virtio-bus.h    |  1 +
>>  include/hw/virtio/virtio.h        |  8 +++--
>>  8 files changed, 98 insertions(+), 39 deletions(-)
>>
>> diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
>> index 440b455..4071dad 100644
>> --- a/include/hw/virtio/virtio-access.h
>> +++ b/include/hw/virtio/virtio-access.h
>> @@ -17,12 +17,25 @@
>>  #define QEMU_VIRTIO_ACCESS_H
>>
>>  #include "hw/virtio/virtio.h"
>> +#include "hw/virtio/virtio-bus.h"
>>  #include "exec/address-spaces.h"
>>
>>  #if defined(TARGET_PPC64) || defined(TARGET_ARM)
>>  #define LEGACY_VIRTIO_IS_BIENDIAN 1
>>  #endif
>>
>> +static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev)
>> +{
>> +    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
>> +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>> +
>> +    if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) &&
>> +        k->get_dma_as) {
>> +        return k->get_dma_as(qbus->parent);
>> +    }
>> +    return &address_space_memory;
>> +}
>
> One thing I'm a bit worried about is that we're introducing a check
> even if we know that the device will never support
> VIRTIO_F_IOMMU_PLATFORM (i.e. virtio-ccw). The qom incantations will
> add cycles to any invocation of this.
>
> Is the address space likely to change during device lifetime? Can we
> cache it in some way?

I think so; we can cache it in vdev and set it when features are set.
Then we can avoid the QOM lookups in the data path.
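Cornelia's concern and Jason's caching answer can be sketched as follows: resolve the address space once, at plug or feature-set time, so the data path is a plain field read instead of a class lookup. The types below are hypothetical simplifications, not the real QEMU structures:

```c
#include <assert.h>
#include <stddef.h>

typedef struct AddressSpace { const char *name; } AddressSpace;

static AddressSpace address_space_memory = { "memory" };
static AddressSpace iommu_as = { "iommu" };

typedef struct VirtioBusClass {
    AddressSpace *(*get_dma_as)(void *bus_parent);
} VirtioBusClass;

typedef struct VirtIODevice {
    AddressSpace *dma_as;   /* cached at plug/feature-set time */
} VirtIODevice;

static AddressSpace *bus_get_iommu_as(void *bus_parent)
{
    (void)bus_parent;
    return &iommu_as;
}

/* Slow path, run once: resolve and cache the DMA address space. */
static void virtio_device_plugged(VirtIODevice *vdev,
                                  const VirtioBusClass *k,
                                  int iommu_platform)
{
    if (iommu_platform && k->get_dma_as) {
        vdev->dma_as = k->get_dma_as(NULL);
    } else {
        vdev->dma_as = &address_space_memory;
    }
}

/* Fast path: no class lookup per access, just the cached pointer. */
static AddressSpace *virtio_get_dma_as(const VirtIODevice *vdev)
{
    return vdev->dma_as;
}
```

This matches the direction the series eventually took: the later posting caches `vdev->dma_as` in virtio_bus_device_plugged() rather than querying the bus class on every access.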
[Qemu-block] [PATCH for 2.8 02/11] virtio: convert to use DMA api
Currently, all virtio devices bypass the IOMMU completely. This is because
address_space_memory is assumed and used during DMA emulation. This patch
converts the virtio core API to use the DMA API. The idea is:

- introducing a new transport-specific helper to query the DMA address
  space (only the PCI version is implemented).
- querying and using this address space during virtio device guest memory
  accesses when the IOMMU platform feature (VIRTIO_F_IOMMU_PLATFORM) was
  enabled for this device.

Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Amit Shah <amit.s...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c             |  2 +-
 hw/char/virtio-serial-bus.c       |  3 +-
 hw/scsi/virtio-scsi.c             |  4 ++-
 hw/virtio/virtio-pci.c            | 14 +
 hw/virtio/virtio.c                | 62 ---
 include/hw/virtio/virtio-access.h | 43 ---
 include/hw/virtio/virtio-bus.h    |  1 +
 include/hw/virtio/virtio.h        |  8 +++--
 8 files changed, 98 insertions(+), 39 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 331d766..8fd6df7 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -856,7 +856,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
         }
     }

-    req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
+    req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
     virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req);
     req->next = s->rq;
     s->rq = req;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index db57a38..94f19ba 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -682,6 +682,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 static int fetch_active_ports_list(QEMUFile *f, VirtIOSerial *s,
                                    uint32_t nr_active_ports)
 {
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
     uint32_t i;

     s->post_load = g_malloc0(sizeof(*s->post_load));
@@ -715,7 +716,7 @@ static int fetch_active_ports_list(QEMUFile *f,
             qemu_get_be64s(f, &port->iov_offset);

             port->elem =
-                qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
+                qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement));

             /*
              *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index ce57ef6..4cc7627 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -197,12 +197,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
     SCSIBus *bus = sreq->bus;
     VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
     VirtIOSCSIReq *req;
     uint32_t n;

     qemu_get_be32s(f, &n);
     assert(n < vs->conf.num_queues);
-    req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+    req = qemu_get_virtqueue_element(vdev, f,
+                                     sizeof(VirtIOSCSIReq) + vs->cdb_size);
     virtio_scsi_init_req(s, vs->cmd_vqs[n], req);

     if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 755f921..c10bf55 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1162,6 +1162,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
     return proxy->nvectors;
 }

+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
+{
+    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+    PCIDevice *dev = &proxy->pci_dev;
+
+    return pci_get_address_space(dev);
+}
+
 static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
                                   struct virtio_pci_cap *cap)
 {
@@ -1587,6 +1595,11 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
     }

     if (legacy) {
+        if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+            error_setg(errp, "VIRTIO_F_IOMMU_PLATFORM was supported by "
+                       "neither legacy nor transitional device.");
+            return;
+        }
         /* legacy and transitional */
         pci_set_word(config + PCI_SUBSYSTEM_VENDOR_ID,
                      pci_get_word(config + PCI_VENDOR_ID));
@@ -2452,6 +2465,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
     k->ioeventfd_disabled = virtio_pci_ioeventfd_disabled;
     k->ioeventfd_set_disabled = virtio_pci_ioeventfd_set_disabled;
     k->ioeventfd_assign = virtio_pci_ioeventfd_assign;
+    k->get_dma_as = vi
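The legacy gate above boils down to a single feature-bit test at plug time: a device exposing the legacy PCI interface can never honour VIRTIO_F_IOMMU_PLATFORM, so the plug fails before the guest sees the device. A minimal model of that check (the helper name and return convention are illustrative; the bit number is the one the published virtio spec later assigned):

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_F_IOMMU_PLATFORM 33  /* bit number per the virtio spec */

/* Returns 0 when the device may expose a legacy PCI interface, -1 when
 * VIRTIO_F_IOMMU_PLATFORM was requested, which only modern (1.0)
 * devices can honour. Mirrors the error path added to
 * virtio_pci_device_plugged() in the patch. */
static int check_legacy_features(uint64_t host_features)
{
    if (host_features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) {
        return -1;  /* error_setg(...) in the real code */
    }
    return 0;
}
```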
[Qemu-block] [RFC PATCH 1/8] virtio: convert to use DMA api
Currently, all virtio devices bypass the IOMMU completely. This is because
address_space_memory is assumed and used during DMA emulation. This patch
converts the virtio core API to use the DMA API. The idea is:

- introducing a new transport-specific helper to query the DMA address
  space (only the PCI version is implemented).
- querying and using this address space during virtio device guest memory
  accesses.

With this, virtio devices will not bypass the IOMMU anymore.

Tested with intel_iommu=on/strict with:
- the virtio guest DMA series posted at https://lkml.org/lkml/2015/10/28/64.
- vfio (unsafe interrupt mode) dpdk l2fwd in the guest

TODO:
- Feature bit for this
- Implement this for all transports

Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Amit Shah <amit.s...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c             |  2 +-
 hw/char/virtio-serial-bus.c       |  3 +-
 hw/scsi/virtio-scsi.c             |  4 ++-
 hw/virtio/virtio-pci.c            |  9 ++
 hw/virtio/virtio.c                | 58 +++
 include/hw/virtio/virtio-access.h | 42 +---
 include/hw/virtio/virtio-bus.h    |  1 +
 include/hw/virtio/virtio.h        |  4 +--
 8 files changed, 85 insertions(+), 38 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cb710f1..9411f99 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -829,7 +829,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
     while (qemu_get_sbyte(f)) {
         VirtIOBlockReq *req;

-        req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
+        req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
         virtio_blk_init_request(s, req);
         req->next = s->rq;
         s->rq = req;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 99cb683..bdc5393 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -687,6 +687,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 static int fetch_active_ports_list(QEMUFile *f, int version_id,
                                    VirtIOSerial *s, uint32_t nr_active_ports)
 {
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
     uint32_t i;

     s->post_load = g_malloc0(sizeof(*s->post_load));
@@ -722,7 +723,7 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
             qemu_get_be64s(f, &port->iov_offset);

             port->elem =
-                qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
+                qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement));

             /*
              *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 0c30d2e..26ce701 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -196,12 +196,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
     SCSIBus *bus = sreq->bus;
     VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
     VirtIOSCSIReq *req;
     uint32_t n;

     qemu_get_be32s(f, &n);
     assert(n < vs->conf.num_queues);
-    req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+    req = qemu_get_virtqueue_element(vdev, f,
+                                     sizeof(VirtIOSCSIReq) + vs->cdb_size);
     virtio_scsi_init_req(s, vs->cmd_vqs[n], req);

     if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 0dadb66..5508b1c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1211,6 +1211,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
     return proxy->nvectors;
 }

+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
+{
+    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+    PCIDevice *dev = &proxy->pci_dev;
+
+    return pci_get_address_space(dev);
+}
+
 static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
                                   struct virtio_pci_cap *cap)
 {
@@ -2495,6 +2503,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
     k->device_plugged = virtio_pci_device_plugged;
     k->device_unplugged = virtio_pci_device_unplugged;
     k->query_nvectors = virtio_pci_query_nvectors;
+    k->get_dma_as = virtio_pci_get_dma_as;
 }

 static const TypeInfo virtio_pci_bus_info = {
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 08275a9..37c9951 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -21,6 +21,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "migration/mig
[Qemu-block] [PATCH V4 3/3] virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device
Chapter 6.3 of the spec says:

    Transitional devices MUST offer, and if offered by the device
    transitional drivers MUST accept the following:
    VIRTIO_F_ANY_LAYOUT (27)

So this patch only clears VIRTIO_F_ANY_LAYOUT for legacy devices.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9acbc3a..1d3f26c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -731,7 +731,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
     virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
     virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
     virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
-    virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
     if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
         if (s->conf.scsi) {
             error_setg(errp, "Virtio 1.0 does not support scsi passthrough!");
@@ -739,6 +738,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
         }
         virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
     } else {
+        virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
         virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
     }
-- 
2.1.4
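The V4 logic above reduces to simple bit operations on the 64-bit feature word. A self-contained sketch of the resulting offer matrix follows; the helper names mimic QEMU's, but the implementation here is an illustrative re-creation, and the error path for scsi=on with 1.0 is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative re-implementations of QEMU's feature-bit helpers. */
#define VIRTIO_BLK_F_SCSI     7
#define VIRTIO_F_ANY_LAYOUT  27
#define VIRTIO_F_VERSION_1   32

static void virtio_add_feature(uint64_t *features, unsigned int fbit)
{
    *features |= 1ULL << fbit;
}

static void virtio_clear_feature(uint64_t *features, unsigned int fbit)
{
    *features &= ~(1ULL << fbit);
}

static bool virtio_has_feature(uint64_t features, unsigned int fbit)
{
    return features & (1ULL << fbit);
}

/* Offer matrix after the patch: 1.0 devices offer ANY_LAYOUT and never
 * BLK_F_SCSI; legacy devices get ANY_LAYOUT cleared and may offer
 * BLK_F_SCSI when scsi=on. */
static uint64_t blk_offered_features(uint64_t features, bool scsi_conf)
{
    if (virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
        virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
    } else {
        virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
        if (scsi_conf) {
            virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
        }
    }
    return features;
}
```

This makes the spec requirement visible in code: a transitional device keeps offering ANY_LAYOUT on its modern (1.0) interface while the legacy interface keeps its historical feature set.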
[Qemu-block] [PATCH V3 3/3] virtio-blk: set VIRTIO_F_ANY_LAYOUT when 1.0 is supported
Chapter 6.3 of the spec says:

    Transitional devices MUST offer, and if offered by the device
    transitional drivers MUST accept the following:
    VIRTIO_F_ANY_LAYOUT (27)

So this patch sets VIRTIO_F_ANY_LAYOUT when 1.0 is supported.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 4716c3e..2e7a190 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -736,6 +736,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
             error_setg(errp, "Virtio 1.0 does not support scsi passthrough!");
             return 0;
         }
+        virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
     } else {
         virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
     }
-- 
2.1.4
Re: [Qemu-block] [PATCH V2 2/5] virtio-blk: advertise scsi only when scsi is set
On 07/15/2015 03:57 PM, Paolo Bonzini wrote:
> On 15/07/2015 07:29, Jason Wang wrote:
>> Cc: Stefan Hajnoczi <stefa...@redhat.com>
>> Cc: Kevin Wolf <kw...@redhat.com>
>> Cc: qemu-block@nongnu.org
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>> ---
>>  hw/block/virtio-blk.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>> index 4c27974..761d763 100644
>> --- a/hw/block/virtio-blk.c
>> +++ b/hw/block/virtio-blk.c
>> @@ -731,7 +731,9 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
>>      virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
>>      virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
>>      virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
>> -    virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
>> +    if (s->conf.scsi) {
>> +        virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
>> +    }
>
> This must be done for newer machine types only, or you change the guest
> ABI for scsi=off. Effectively you have to split it into two properties,
> scsi and always_set_f_scsi.
>
> Paolo

And always_set_f_scsi is true only for legacy machine types?
[Qemu-block] [PATCH V2 3/5] virtio-blk: disable scsi passthrough by default
Disable scsi passthrough by default, since it is incompatible with virtio
1.0. For legacy machine types, keep it on by default.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 2 +-
 include/hw/compat.h   | 6 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 761d763..362fe53 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -964,7 +964,7 @@ static Property virtio_blk_properties[] = {
     DEFINE_PROP_STRING("serial", VirtIOBlock, conf.serial),
     DEFINE_PROP_BIT("config-wce", VirtIOBlock, conf.config_wce, 0, true),
 #ifdef __linux__
-    DEFINE_PROP_BIT("scsi", VirtIOBlock, conf.scsi, 0, true),
+    DEFINE_PROP_BIT("scsi", VirtIOBlock, conf.scsi, 0, false),
 #endif
     DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
                     true),
diff --git a/include/hw/compat.h b/include/hw/compat.h
index 4a43466..56039d8 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -2,7 +2,11 @@
 #define HW_COMPAT_H
 
 #define HW_COMPAT_2_3 \
-    /* empty */
+    {\
+        .driver   = "virtio-blk-pci",\
+        .property = "scsi",\
+        .value    = "on",\
+    },
 
 #define HW_COMPAT_2_2 \
     /* empty */
-- 
2.1.4
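The HW_COMPAT_2_3 entry above works through QEMU's global-property machinery: new machine types get the new built-in default (scsi=off), while 2.3-and-older machines carry a driver/property/value triple that restores the old default so the guest ABI is preserved across migration. A toy model of that resolution (the struct layout mirrors the compat entry; the lookup function is illustrative, not the real qdev code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct GlobalProperty {
    const char *driver;
    const char *property;
    const char *value;
} GlobalProperty;

/* Model of HW_COMPAT_2_3: machine types 2.3 and older flip
 * virtio-blk-pci.scsi back to "on". */
static const GlobalProperty hw_compat_2_3[] = {
    { "virtio-blk-pci", "scsi", "on" },
};

/* Resolve the effective default for virtio-blk-pci.scsi: a matching
 * compat entry wins over the device's built-in default. */
static bool effective_scsi_default(const GlobalProperty *compat, size_t n,
                                   bool builtin_default)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(compat[i].driver, "virtio-blk-pci") &&
            !strcmp(compat[i].property, "scsi")) {
            return !strcmp(compat[i].value, "on");
        }
    }
    return builtin_default;
}
```

A modern machine passes an empty compat list and sees the new false default; a 2.3 machine passes `hw_compat_2_3` and still sees true.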
[Qemu-block] [PATCH V2 4/5] virtio-blk: fail the init when both 1.0 and scsi is set
Scsi passthrough is no longer supported in 1.0, so fail the
initialization when the user wants both features.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 362fe53..4a0ef68 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -732,6 +732,10 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
     virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
     virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
     if (s->conf.scsi) {
+        if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
+            error_setg(errp, "Virtio 1.0 does not support scsi passthrough");
+            return 0;
+        }
         virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
     }
-- 
2.1.4
[Qemu-block] [PATCH V2 2/5] virtio-blk: advertise scsi only when scsi is set
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 4c27974..761d763 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -731,7 +731,9 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
     virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
     virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
     virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
-    virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+    if (s->conf.scsi) {
+        virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+    }
 
     if (s->conf.config_wce) {
         virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
-- 
2.1.4
Re: [Qemu-block] [Qemu-devel] [PATCH 2/5] virtio-blk: disable scsi passthrough for 1.0 device
On 07/13/2015 03:46 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 13, 2015 at 01:46:48PM +0800, Jason Wang wrote:
>> VIRTIO_BLK_F_SCSI is no longer supported in 1.0. So disable it.
>>
>> Cc: Stefan Hajnoczi <stefa...@redhat.com>
>> Cc: Kevin Wolf <kw...@redhat.com>
>> Cc: qemu-block@nongnu.org
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>
> Interesting, I noticed we have a field scsi - see
>
> commit 1ba1f2e319afdcb485963cd3f426fdffd1b725f2
> Author: Paolo Bonzini <pbonz...@redhat.com>
> Date:   Fri Dec 23 15:39:03 2011 +0100
>
>     virtio-blk: refuse SG_IO requests with scsi=off
>
> but it doesn't seem to be propagated to guest features in any way.
> Maybe we should fix that, making that flag AutoOnOff?

Looks ok, but auto may need some compat work since the default is true.

> Then, if the user explicitly requested scsi=on with a modern interface,
> we can error out cleanly. Given the scsi flag is currently ignored, I
> think this can be a patch on top.

Looks like virtio_blk_handle_scsi_req() checks this:

    if (!blk->conf.scsi) {
        status = VIRTIO_BLK_S_UNSUPP;
        goto fail;
    }

>> ---
>>  hw/block/virtio-blk.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>> index 6aefda4..f30ad25 100644
>> --- a/hw/block/virtio-blk.c
>> +++ b/hw/block/virtio-blk.c
>> @@ -730,7 +730,8 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features)
>>      virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
>>      virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
>>      virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
>> -    virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
>> +    if (!__virtio_has_feature(features, VIRTIO_F_VERSION_1))
>> +        virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
>>
>>      if (s->conf.config_wce) {
>>          virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
[Qemu-block] [PATCH 2/5] virtio-blk: disable scsi passthrough for 1.0 device
VIRTIO_BLK_F_SCSI is no longer supported in 1.0. So disable it.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 6aefda4..f30ad25 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -730,7 +730,8 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features)
     virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
     virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
     virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
-    virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+    if (!__virtio_has_feature(features, VIRTIO_F_VERSION_1))
+        virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 
     if (s->conf.config_wce) {
         virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
-- 
2.1.4
[Qemu-block] [PATCH 3/5] virtio-blk: set VIRTIO_F_ANY_LAYOUT when 1.0 is supported
Chapter 6.3 of the spec says:

    Transitional devices MUST offer, and if offered by the device
    transitional drivers MUST accept the following:
    VIRTIO_F_ANY_LAYOUT (27)

So this patch sets VIRTIO_F_ANY_LAYOUT when 1.0 is supported.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f30ad25..0f07e25 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -732,6 +732,8 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features)
     virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
     if (!__virtio_has_feature(features, VIRTIO_F_VERSION_1))
         virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+    else
+        virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
 
     if (s->conf.config_wce) {
         virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
-- 
2.1.4