Re: [PATCH for-9.0 v3] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-03-17 Thread Jason Wang
On Fri, Mar 15, 2024 at 11:59 PM Kevin Wolf  wrote:
>
> VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> status flag is set; with the current API of the kernel module, it is
> impossible to enable the opposite order in our block export code because
> userspace is not notified when a virtqueue is enabled.
>
> This requirement also matches the normal initialisation order as done by
> the generic vhost code in QEMU. However, commit 6c482547 accidentally
> changed the order for vdpa-dev and broke access to VDUSE devices with
> this.
>
> This changes vdpa-dev to use the normal order again and use the standard
> vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> used with vdpa-dev again after this fix.
>
> vhost_net intentionally avoided enabling the vrings for vdpa and does
> this manually later while it does enable them for other vhost backends.
> Reflect this in the vhost_net code and return early for vdpa, so that
> the behaviour doesn't change for this device.
>
> Cc: qemu-sta...@nongnu.org
> Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> Signed-off-by: Kevin Wolf 
> ---
> v2:
> - Actually make use of the @enable parameter
> - Change vhost_net to preserve the current behaviour
>
> v3:
> - Updated trace point [Stefano]
> - Fixed typo in comment [Stefano]
>
>  hw/net/vhost_net.c | 10 ++
>  hw/virtio/vdpa-dev.c   |  5 +
>  hw/virtio/vhost-vdpa.c | 29 ++---
>  hw/virtio/vhost.c  |  8 +++-
>  hw/virtio/trace-events |  2 +-
>  5 files changed, 45 insertions(+), 9 deletions(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e8e1661646..fd1a93701a 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -541,6 +541,16 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
>  VHostNetState *net = get_vhost_net(nc);
>  const VhostOps *vhost_ops = net->dev.vhost_ops;
>
> +/*
> + * vhost-vdpa network devices need to enable dataplane virtqueues after
> + * DRIVER_OK, so they can recover device state before starting dataplane.
> + * Because of that, we don't enable virtqueues here and leave it to
> + * net/vhost-vdpa.c.
> + */
> +if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> +return 0;
> +}

I think we need some inputs from Eugenio, this is only needed for
shadow virtqueue during live migration but not other cases.

Thanks




Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-13 Thread Jason Wang
On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  wrote:
>
> Add support to virtio-pci devices for handling the extra data sent
> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> transport feature has been negotiated.
>
> The extra data that's passed to the virtio-pci device when this
> feature is enabled varies depending on the device's virtqueue
> layout.
>
> In a split virtqueue layout, this data includes:
>  - upper 16 bits: shadow_avail_idx
>  - lower 16 bits: virtqueue index
>
> In a packed virtqueue layout, this data includes:
>  - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
>  - lower 16 bits: virtqueue index
>
> Tested-by: Lei Yang 
> Reviewed-by: Eugenio Pérez 
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio-pci.c | 10 +++---
>  hw/virtio/virtio.c | 18 ++
>  include/hw/virtio/virtio.h |  1 +
>  3 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index cb6940fc0e..0f5c3c3b2f 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  {
>  VirtIOPCIProxy *proxy = opaque;
> VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> -uint16_t vector;
> +uint16_t vector, vq_idx;
>  hwaddr pa;
>
>  switch (addr) {
> @@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  vdev->queue_sel = val;
>  break;
>  case VIRTIO_PCI_QUEUE_NOTIFY:
> -if (val < VIRTIO_QUEUE_MAX) {
> -virtio_queue_notify(vdev, val);
> +vq_idx = val;
> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> +virtio_queue_set_shadow_avail_data(vdev, val);
> +}
> +virtio_queue_notify(vdev, vq_idx);
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index d229755eae..bcb9e09df0 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align)
>  }
>  }
>
> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)
> +{
> +/* Lower 16 bits is the virtqueue index */
> +uint16_t i = data;
> +VirtQueue *vq = &vdev->vq[i];
> +
> +if (!vq->vring.desc) {
> +return;
> +}
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
> +vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
> +} else {
> +vq->shadow_avail_idx = (data >> 16);

Do we need to do a sanity check for this value?

Thanks

> +}
> +}
> +
>  static void virtio_queue_notify_vq(VirtQueue *vq)
>  {
>  if (vq->vring.desc && vq->handle_output) {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c8f72850bc..53915947a7 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n);
>  void virtio_init_region_cache(VirtIODevice *vdev, int n);
>  void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
>  void virtio_queue_notify(VirtIODevice *vdev, int n);
> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data);
>  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
>  void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
>  int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
> --
> 2.39.3
>




Re: [PATCH v1 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-05 Thread Jason Wang
On Tue, Mar 5, 2024 at 3:46 AM Jonah Palmer  wrote:
>
> The goal of these patches are to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
> feature indicates that a driver will pass extra data (instead of just a
> virtqueue's index) when notifying the corresponding device.
>
> The data passed in by the driver when this feature is enabled varies in
> format depending on if the device is using a split or packed virtqueue
> layout:
>
>  Split VQ
>   - Upper 16 bits: shadow_avail_idx
>   - Lower 16 bits: virtqueue index
>
>  Packed VQ
>   - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
>   - Lower 16 bits: virtqueue index
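
[Editor's note: for reference, decoding the 32-bit notify value described
above might look like the sketch below. The field layout is taken from this
cover letter; the function and variable names are illustrative, not QEMU code.]

    #include <stdbool.h>
    #include <stdint.h>

    /* Unpack a VIRTIO_F_NOTIFICATION_DATA payload: lower 16 bits are the
     * virtqueue index, upper 16 bits carry the shadow avail index, with
     * bit 31 used as the wrap counter for packed virtqueues. */
    static void decode_notification_data(uint32_t data, bool packed,
                                         uint16_t *vq_idx, uint16_t *avail,
                                         bool *wrap)
    {
        *vq_idx = data & 0xFFFF;
        if (packed) {
            *wrap  = (data >> 31) & 0x1;      /* 1-bit wrap counter */
            *avail = (data >> 16) & 0x7FFF;   /* 15-bit shadow_avail_idx */
        } else {
            *wrap  = false;
            *avail = data >> 16;              /* 16-bit shadow_avail_idx */
        }
    }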
>
> Also, due to the limitations of ioeventfd not being able to carry the
> extra data provided by the driver, ioeventfd is left disabled for any
> devices using this feature.

Is there any method to overcome this? This might help for vhost.

Thanks
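
[Editor's note: the ioeventfd limitation follows from how eventfds work; the
sketch below is illustrative, not QEMU code. A KVM ioeventfd turns a matched
doorbell write into an eventfd signal, and an eventfd transports only a
counter, so the value the guest wrote is already gone when userspace wakes:]

    /* What the consumer of an ioeventfd sees after a guest notification
     * (assumes <unistd.h>; ioeventfd_fd is the eventfd file descriptor): */
    uint64_t count;
    read(ioeventfd_fd, &count, sizeof(count));  /* count = number of kicks */
    /* The 32-bit notification payload was never transported, which is why
     * VIRTIO_F_NOTIFICATION_DATA is incompatible with ioeventfd here. */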

>
> A significant aspect of this effort has been to maintain compatibility
> across different backends. As such, the feature is offered by backend
> devices only when supported, with fallback mechanisms where backend
> support is absent.
>
> Jonah Palmer (8):
>   virtio/virtio-pci: Handle extra notification data
>   virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
>   virtio-mmio: Handle extra notification data
>   virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
>   virtio-ccw: Handle extra notification data
>   virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
>   vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
>   virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition
>
>  hw/block/vhost-user-blk.c|  1 +
>  hw/net/vhost_net.c   |  2 ++
>  hw/s390x/s390-virtio-ccw.c   | 16 
>  hw/s390x/virtio-ccw.c|  6 --
>  hw/scsi/vhost-scsi.c |  1 +
>  hw/scsi/vhost-user-scsi.c|  1 +
>  hw/virtio/vhost-user-fs.c|  2 +-
>  hw/virtio/vhost-user-vsock.c |  1 +
>  hw/virtio/virtio-mmio.c  | 15 +++
>  hw/virtio/virtio-pci.c   | 16 +++-
>  hw/virtio/virtio.c   | 18 ++
>  include/hw/virtio/virtio.h   |  5 -
>  net/vhost-vdpa.c |  1 +
>  13 files changed, 68 insertions(+), 17 deletions(-)
>
> --
> 2.39.3
>




Re: Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-07 Thread Jason Wang
On Wed, Feb 7, 2024 at 4:47 PM Stefano Garzarella  wrote:
>
> On Wed, Feb 07, 2024 at 11:17:34AM +0800, Jason Wang wrote:
> >On Tue, Feb 6, 2024 at 4:31 PM Stefano Garzarella  
> >wrote:
> >>
> >> On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote:
> >> >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella  
> >> >wrote:
> >> >>
> >> >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote:
> >> >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> >> >> >status flag is set; with the current API of the kernel module, it is
> >> >> >impossible to enable the opposite order in our block export code 
> >> >> >because
> >> >> >userspace is not notified when a virtqueue is enabled.
> >> >
> >> >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK?
> >>
> >> It's not specific to virtio-blk, but to the generic vdpa device we have
> >> in QEMU (i.e. vhost-vdpa-device). Yep, after commit
> >> 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after
> >> DRIVER_OK.
> >
> >Right.
> >
> >>
> >> >Spec is not clear about this and that's why we introduce
> >> >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK.
> >>
> >> Ah, I didn't know about this new feature. So after commit
> >> 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not
> >> complying with the specification, right?
> >
> >Kind of, but as stated, it's just because spec is unclear about the
> >behaviour. There's a chance that spec will explicitly support it in
> >the future.
> >
> >>
> >> >
> >> >>
> >> >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message,
> >> >
> >> >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is
> >> >negotiated.
> >>
> >> At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not
> >> negotiated, should we return an error in vhost-vdpa kernel module if
> >> VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set?
> >
> >I'm not sure if this can break some setups or not. It might be better
> >to leave it as is?
>
> As I also answered in the kernel patch, IMHO we have to return an error,
> because any setups are broken anyway (see the problem we're fixing with
> this patch),

For VDUSE probably yes, but not for other parents. It's easy to
mandate on day 0 but might be hard for now.

For mlx5e, it supports the semantic
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK even before the feature bit is
introduced.

And we don't do other checks either, like for example setting the vq
address or size after DRIVER_OK.

We can hear from others.

> so at this point it's better to return an error, so they
> don't spend time figuring out why the VDUSE device doesn't work.
>
> >
> >Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we don't know if
> >the parent supports vq_ready after driver_ok.
>
> So we have to assume that it doesn't support it, to be more
> conservative, right?
>
> >With VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we know the parent
> >supports vq_ready after driver_ok.
> >
> >>
> >> >If this is true, it seems a little more complicated, for
> >> >example the get_backend_features needs to be forward to the userspace?
> >>
> >> I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES
> >> for this? Or do you mean userspace on the VDUSE side?
> >
> >Yes, since in this case the parent is in the userspace, there's no way
> >for VDUSE to know if user space supports vq_ready after driver_ok or
> >not.
> >
> >As you may have noticed, we don't have a message for vq_ready which
> >implies that vq_ready after driver_ok can't be supported.
>
> Yep, I see.
>
> >
> >>
> >> >This seems suboptimal to implement this in the spec first and then we
> >> >can leverage the features. Or we can have another parameter for the
> >> >ioctl that creates the vduse device.
> >>
> >> I got a little lost, though in vhost-user, the device can always expect
> >> a vring_enable/disable, so I thought it was not complicated in VDUSE.
> >
> >Yes, the problem is assuming we have a message for vq_ready, there
> >could be a "legacy" userspace that doesn't support that. So in that
> >case, VDUSE needs to know if the userspace parent can support that or
> >not.
>
> I think VDUSE needs to negotiate features (if it doesn't already) as
> vhost-user does with the backend. Also for new future functionality.

It negotiates virtio features but not vhost backend features.

Thanks




Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-06 Thread Jason Wang
On Tue, Feb 6, 2024 at 4:31 PM Stefano Garzarella  wrote:
>
> On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote:
> >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella  
> >wrote:
> >>
> >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote:
> >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> >> >status flag is set; with the current API of the kernel module, it is
> >> >impossible to enable the opposite order in our block export code because
> >> >userspace is not notified when a virtqueue is enabled.
> >
> >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK?
>
> It's not specific to virtio-blk, but to the generic vdpa device we have
> in QEMU (i.e. vhost-vdpa-device). Yep, after commit
> 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after
> DRIVER_OK.

Right.

>
> >Spec is not clear about this and that's why we introduce
> >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK.
>
> Ah, I didn't know about this new feature. So after commit
> 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not
> complying with the specification, right?

Kind of, but as stated, it's just because spec is unclear about the
behaviour. There's a chance that spec will explicitly support it in
the future.

>
> >
> >>
> >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message,
> >
> >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is
> >negotiated.
>
> At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not
> negotiated, should we return an error in vhost-vdpa kernel module if
> VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set?

I'm not sure if this can break some setups or not. It might be better
to leave it as is?

Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we don't know if the
parent supports vq_ready after driver_ok.
With VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we know the parent supports
vq_ready after driver_ok.
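
[Editor's note: for illustration, the userspace side of this check could look
like the sketch below. VHOST_GET_BACKEND_FEATURES and the
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK bit come from the kernel vhost UAPI
(<linux/vhost.h>); the surrounding code is hypothetical.]

    uint64_t backend_features = 0;

    if (ioctl(vhost_fd, VHOST_GET_BACKEND_FEATURES, &backend_features) == 0 &&
        (backend_features & (1ULL << VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK))) {
        /* The parent supports enabling vrings after DRIVER_OK. */
    } else {
        /* Be conservative: enable all vrings before setting DRIVER_OK. */
    }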

>
> >If this is true, it seems a little more complicated, for
> >example the get_backend_features needs to be forward to the userspace?
>
> I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES
> for this? Or do you mean userspace on the VDUSE side?

Yes, since in this case the parent is in the userspace, there's no way
for VDUSE to know if user space supports vq_ready after driver_ok or
not.

As you may have noticed, we don't have a message for vq_ready which
implies that vq_ready after driver_ok can't be supported.

>
> >This seems suboptimal to implement this in the spec first and then we
> >can leverage the features. Or we can have another parameter for the
> >ioctl that creates the vduse device.
>
> I got a little lost, though in vhost-user, the device can always expect
> a vring_enable/disable, so I thought it was not complicated in VDUSE.

Yes, the problem is assuming we have a message for vq_ready, there
could be a "legacy" userspace that doesn't support that. So in that
case, VDUSE needs to know if the userspace parent can support that or
not.

>
> >
> >> I'll start another thread about that, but in the meantime I agree that
> >> we should fix QEMU since we need to work properly with old kernels as
> >> well.
> >>
> >> >
> >> >This requirement also matches the normal initialisation order as done by
> >> >the generic vhost code in QEMU. However, commit 6c482547 accidentally
> >> >changed the order for vdpa-dev and broke access to VDUSE devices with
> >> >this.
> >> >
> >> >This changes vdpa-dev to use the normal order again and use the standard
> >> >vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> >> >used with vdpa-dev again after this fix.
> >>
> >> I like this approach and the patch LGTM, but I'm a bit worried about
> >> this function in hw/net/vhost_net.c:
> >>
> >>  int vhost_set_vring_enable(NetClientState *nc, int enable)
> >>  {
> >>  VHostNetState *net = get_vhost_net(nc);
> >>  const VhostOps *vhost_ops = net->dev.vhost_ops;
> >>
> >>  nc->vring_enable = enable;
> >>
> >>  if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
> >>  return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
> >>  }
> >>
> >>  return 0;
> >>  }
> >>
> >> @Eugenio, @Jason, should we change some things there if vhost-vdpa
> >> implements the vhost_set_vring_enable callback?
> >
> >Eugenio may know more, I remember we need to enable cvq first for
> >shadow virtqueue to restore some states.
> >
> >>
> >> Do you remember why we didn't implement it from the beginning?
> >
> >It seems the vrings parameter was introduced after vhost-vdpa was
> >implemented.
>
> Sorry, I mean why we didn't implement the vhost_set_vring_enable
> callback for vhost-vdpa from the beginning.

Adding Cindy who writes those codes for more thoughts.

Thanks

>
> Thanks,
> Stefano
>




Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-05 Thread Jason Wang
On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella  wrote:
>
> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote:
> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> >status flag is set; with the current API of the kernel module, it is
> >impossible to enable the opposite order in our block export code because
> >userspace is not notified when a virtqueue is enabled.

Did this mean virtio-blk will enable a virtqueue after DRIVER_OK? Spec
is not clear about this and that's why we introduce
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK.

>
> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message,

I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is
negotiated. If this is true, it seems a little more complicated, for
example the get_backend_features needs to be forward to the userspace?
This seems suboptimal to implement this in the spec first and then we
can leverage the features. Or we can have another parameter for the
ioctl that creates the vduse device.

> I'll start another thread about that, but in the meantime I agree that
> we should fix QEMU since we need to work properly with old kernels as
> well.
>
> >
> >This requirement also matches the normal initialisation order as done by
> >the generic vhost code in QEMU. However, commit 6c482547 accidentally
> >changed the order for vdpa-dev and broke access to VDUSE devices with
> >this.
> >
> >This changes vdpa-dev to use the normal order again and use the standard
> >vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> >used with vdpa-dev again after this fix.
>
> I like this approach and the patch LGTM, but I'm a bit worried about
> this function in hw/net/vhost_net.c:
>
>  int vhost_set_vring_enable(NetClientState *nc, int enable)
>  {
>  VHostNetState *net = get_vhost_net(nc);
>  const VhostOps *vhost_ops = net->dev.vhost_ops;
>
>  nc->vring_enable = enable;
>
>  if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
>  return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
>  }
>
>  return 0;
>  }
>
> @Eugenio, @Jason, should we change some things there if vhost-vdpa
> implements the vhost_set_vring_enable callback?

Eugenio may know more, I remember we need to enable cvq first for
shadow virtqueue to restore some states.

>
> Do you remember why we didn't implement it from the beginning?

It seems the vrings parameter was introduced after vhost-vdpa was implemented.

Thanks

>
> Thanks,
> Stefano
>
> >
> >Cc: qemu-sta...@nongnu.org
> >Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> >Signed-off-by: Kevin Wolf 
> >---
> > hw/virtio/vdpa-dev.c   |  5 +
> > hw/virtio/vhost-vdpa.c | 17 +
> > 2 files changed, 18 insertions(+), 4 deletions(-)
> >
> >diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> >index eb9ecea83b..13e87f06f6 100644
> >--- a/hw/virtio/vdpa-dev.c
> >+++ b/hw/virtio/vdpa-dev.c
> >@@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp)
> >
> > s->dev.acked_features = vdev->guest_features;
> >
> >-ret = vhost_dev_start(&s->dev, vdev, false);
> >+ret = vhost_dev_start(&s->dev, vdev, true);
> > if (ret < 0) {
> > error_setg_errno(errp, -ret, "Error starting vhost");
> > goto err_guest_notifiers;
> > }
> >-for (i = 0; i < s->dev.nvqs; ++i) {
> >-vhost_vdpa_set_vring_ready(&s->vdpa, i);
> >-}
> > s->started = true;
> >
> > /*
> >diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >index 3a43beb312..c4574d56c5 100644
> >--- a/hw/virtio/vhost-vdpa.c
> >+++ b/hw/virtio/vhost-vdpa.c
> >@@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx)
> > return r;
> > }
> >
> >+static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
> >+{
> >+struct vhost_vdpa *v = dev->opaque;
> >+unsigned int i;
> >+int ret;
> >+
> >+for (i = 0; i < dev->nvqs; ++i) {
> >+ret = vhost_vdpa_set_vring_ready(v, i);
> >+if (ret < 0) {
> >+return ret;
> >+}
> >+}
> >+
> >+return 0;
> >+}
> >+
> > static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> >int fd)
> > {
> >@@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
> > .vhost_set_features = vhost_vdpa_set_features,
> > .vhost_reset_device = vhost_vdpa_reset_device,
> > .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> >+.vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
> > .vhost_get_config  = vhost_vdpa_get_config,
> > .vhost_set_config = vhost_vdpa_set_config,
> > .vhost_requires_shm_log = NULL,
> >--
> >2.43.0
> >
>




Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation

2023-12-11 Thread Jason Wang
On Mon, Dec 11, 2023 at 4:29 PM Akihiko Odaki  wrote:
>
> On 2023/12/11 16:26, Jason Wang wrote:
> > On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/12/11 11:52, Jason Wang wrote:
> >>> On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki  
> >>> wrote:
> >>>>
> >>>> Introduction
> >>>> 
> >>>>
> >>>> This series is based on the RFC series submitted by Yui Washizu[1].
> >>>> See also [2] for the context.
> >>>>
> >>>> This series enables SR-IOV emulation for virtio-net. It is useful
> >>>> to test SR-IOV support on the guest, or to expose several vDPA devices
> >>>> in a VM. vDPA devices can also provide L2 switching feature for
> >>>> offloading though it is out of scope to allow the guest to configure
> >>>> such a feature.
> >>>>
> >>>> The PF side code resides in virtio-pci. The VF side code resides in
> >>>> the PCI common infrastructure, but it is restricted to work only for
> >>>> virtio-net-pci because of lack of validation.
> >>>>
> >>>> User Interface
> >>>> --
> >>>>
> >>>> A user can configure a SR-IOV capable virtio-net device by adding
> >>>> virtio-net-pci functions to a bus. Below is a command line example:
> >>>> -netdev user,id=n -netdev user,id=o
> >>>> -netdev user,id=p -netdev user,id=q
> >>>> -device pcie-root-port,id=b
> >>>> -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
> >>>> -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
> >>>> -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
> >>>> -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
> >>>>
> >>>> The VFs specify the paired PF with "sriov-pf" property. The PF must be
> >>>> added after all VFs. It is the user's responsibility to ensure that VFs
> >>>> have function numbers larger than that of the PF, and that the function
> >>>> numbers have a consistent stride.
> >>>
> >>> This seems not user friendly. Any reason we can't just allow user to
> >>> specify the stride here?
> >>
> >> It should be possible to assign addr automatically without requiring
> >> user to specify the stride. I'll try that in the next version.
> >>
> >>>
> >>> Btw, I vaguely remember qemu allows the params to be accepted as a
> >>> list. If this is true, we can accept a list of netdev here?
> >>
> >> Yes, rocker does that. But the problem is not just about getting
> >> parameters needed for VFs, which I forgot to mention in the cover letter
> >> and will explain below.
> >>
> >>>
> >>>>
> >>>> Keeping VF instances
> >>>> 
> >>>>
> >>>> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
> >>>> the guest requests. Previously, this behavior was implemented by
> >>>> realizing and unrealizing VFs at runtime. However, this strategy does
> >>>> not work well for the proposed virtio-net emulation; in this proposal,
> >>>> device options passed in the command line must be maintained as VFs
> >>>> are hotplugged, but they are consumed when the machine starts and not
> >>>> available after that, which makes realizing VFs at runtime impossible.
> >>>
> >>> Could we store the device options in the PF?
> >>
> >> I wrote it's to store the device options, but the problem is actually
> >> more about realizing VFs at runtime instead of at the initialization time.
> >>
> >> Realizing VFs at runtime has two major problems. One is that it delays
> >> the validations of options; invalid options will be noticed when the
> >> guest requests to realize VFs.
> >
> > If the PCI spec allows failure when creating a VF, then it should not be
> > a problem.
>
> I doubt the spec cares about such a failure at all. VF enablement should
> always work for real hardware. It's not user-friendly either to report
> configuration errors at runtime.

I'm not sure which options we should care about. Did you mean netdev
options or the virtio-net specific ones?

If the VFs stick to the same options as the PF (except for SR-IOV), they
should be validated

Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation

2023-12-10 Thread Jason Wang
On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki  wrote:
>
> On 2023/12/11 11:52, Jason Wang wrote:
> > On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki  
> > wrote:
> >>
> >> Introduction
> >> 
> >>
> >> This series is based on the RFC series submitted by Yui Washizu[1].
> >> See also [2] for the context.
> >>
> >> This series enables SR-IOV emulation for virtio-net. It is useful
> >> to test SR-IOV support on the guest, or to expose several vDPA devices
> >> in a VM. vDPA devices can also provide L2 switching feature for
> >> offloading though it is out of scope to allow the guest to configure
> >> such a feature.
> >>
> >> The PF side code resides in virtio-pci. The VF side code resides in
> >> the PCI common infrastructure, but it is restricted to work only for
> >> virtio-net-pci because of lack of validation.
> >>
> >> User Interface
> >> --
> >>
> >> A user can configure a SR-IOV capable virtio-net device by adding
> >> virtio-net-pci functions to a bus. Below is a command line example:
> >>-netdev user,id=n -netdev user,id=o
> >>-netdev user,id=p -netdev user,id=q
> >>-device pcie-root-port,id=b
> >>-device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
> >>-device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
> >>-device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
> >>-device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
> >>
> >> The VFs specify the paired PF with "sriov-pf" property. The PF must be
> >> added after all VFs. It is the user's responsibility to ensure that VFs
> >> have function numbers larger than that of the PF, and that the function
> >> numbers have a consistent stride.
> >
> > This seems not user friendly. Any reason we can't just allow user to
> > specify the stride here?
>
> It should be possible to assign addr automatically without requiring
> user to specify the stride. I'll try that in the next version.
>
> >
> > Btw, I vaguely remember qemu allows the params to be accepted as a
> > list. If this is true, we can accept a list of netdev here?
>
> Yes, rocker does that. But the problem is not just about getting
> parameters needed for VFs, which I forgot to mention in the cover letter
> and will explain below.
>
> >
> >>
> >> Keeping VF instances
> >> 
> >>
> >> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
> >> the guest requests. Previously, this behavior was implemented by
> >> realizing and unrealizing VFs at runtime. However, this strategy does
> >> not work well for the proposed virtio-net emulation; in this proposal,
> >> device options passed in the command line must be maintained as VFs
> >> are hotplugged, but they are consumed when the machine starts and not
> >> available after that, which makes realizing VFs at runtime impossible.
> >
> > Could we store the device options in the PF?
>
> I wrote it's to store the device options, but the problem is actually
> more about realizing VFs at runtime instead of at the initialization time.
>
> Realizing VFs at runtime has two major problems. One is that it delays
> the validations of options; invalid options will be noticed when the
> guest requests to realize VFs.

If the PCI spec allows failure when creating a VF, then it should not be
a problem.

> netdevs also warn that they are not used
> at initialization time, not knowing that they will be used by VFs later.

We could invent things to calm down this false positive.

> References to other QEMU objects in the option may also die before VFs
> are realized.

Is there any other thing than netdev we need to consider?

>
> The other problem is that QEMU cannot interact with the unrealized VFs.
> For example, if you type "device_add virtio-net-pci,id=vf,sriov-pf=pf"
> in HMP, you will expect "device_del vf" works, but it's hard to
> implement such behaviors with unrealized VFs.

I think hotplug can only be done at PF level if we do that.

>
> I was first going to compromise and allow such quirky behaviors, but I
> realized such a compromise is unnecessary if we reuse the PCI power down
> logic so I wrote v2.

Haven't checked the code, but anything related to the PM here?

Thanks

>
> Regards,
> Akihiko Odaki
>




Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation

2023-12-10 Thread Jason Wang
On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki  wrote:
>
> Introduction
> 
>
> This series is based on the RFC series submitted by Yui Washizu[1].
> See also [2] for the context.
>
> This series enables SR-IOV emulation for virtio-net. It is useful
> to test SR-IOV support on the guest, or to expose several vDPA devices
> in a VM. vDPA devices can also provide L2 switching feature for
> offloading though it is out of scope to allow the guest to configure
> such a feature.
>
> The PF side code resides in virtio-pci. The VF side code resides in
> the PCI common infrastructure, but it is restricted to work only for
> virtio-net-pci because of lack of validation.
>
> User Interface
> --
>
> A user can configure a SR-IOV capable virtio-net device by adding
> virtio-net-pci functions to a bus. Below is a command line example:
>   -netdev user,id=n -netdev user,id=o
>   -netdev user,id=p -netdev user,id=q
>   -device pcie-root-port,id=b
>   -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>   -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>   -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>   -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>
> The VFs specify the paired PF with "sriov-pf" property. The PF must be
> added after all VFs. It is the user's responsibility to ensure that VFs
> have function numbers larger than that of the PF, and that the function
> numbers have a consistent stride.
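
[Editor's note: the consistent-stride requirement mirrors how the SR-IOV
capability derives VF routing IDs; the rule below is from the PCIe SR-IOV
specification, with the concrete numbers taken from the example above.]

    /* VF routing ID rule (PCIe SR-IOV):
     *   vf_devfn(n) = pf_devfn + First VF Offset + (n - 1) * VF Stride
     * In the example above, the PF sits at 0x0.0x0 with offset 1 and
     * stride 1, so the VFs land at functions 0x0.0x1, 0x0.0x2 and 0x0.0x3. */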

This seems not user friendly. Any reason we can't just allow user to
specify the stride here?

Btw, I vaguely remember qemu allows the params to be accepted as a
list. If this is true, we can accept a list of netdev here?

>
> Keeping VF instances
> 
>
> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
> the guest requests. Previously, this behavior was implemented by
> realizing and unrealizing VFs at runtime. However, this strategy does
> not work well for the proposed virtio-net emulation; in this proposal,
> device options passed in the command line must be maintained as VFs
> are hotplugged, but they are consumed when the machine starts and not
> available after that, which makes realizing VFs at runtime impossible.

Could we store the device options in the PF?

Thanks

>
> As an alternative strategy to runtime realization/unrealization, this
> series proposes to reuse the code to power down PCI Express devices.
> When a PCI Express device is powered down, it will be hidden from the
> guest but will be kept realized. This effectively implements the
> behavior we need for the SR-IOV emulation.
>
> Summary
> ---
>
> Patch [1, 5] refactors the PCI infrastructure code.
> Patch [6, 10] adds user-created SR-IOV VF infrastructure.
> Patch 11 makes virtio-pci work as SR-IOV PF for user-created VFs.
> Patch 12 allows user to create SR-IOV VFs with virtio-net-pci.
>
> [1] 
> https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.wash...@gmail.com/
> [2] 
> https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4...@daynix.com/
>
> Co-developed-by: Yui Washizu 
> Signed-off-by: Akihiko Odaki 
> ---
> Changes in v2:
> - Changed to keep VF instances.
> - Link to v1: 
> https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7...@daynix.com
>
> ---
> Akihiko Odaki (12):
>   hw/pci: Initialize PCI multifunction after realization
>   hw/pci: Determine if rombar is explicitly enabled
>   hw/pci: Do not add ROM BAR for SR-IOV VF
>   vfio: Avoid inspecting option QDict for rombar
>   hw/qdev: Remove opts member
>   pcie_sriov: Reuse SR-IOV VF device instances
>   pcie_sriov: Release VFs failed to realize
>   pcie_sriov: Ensure PF and VF are mutually exclusive
>   pcie_sriov: Check PCI Express for SR-IOV PF
>   pcie_sriov: Allow user to create SR-IOV device
>   virtio-pci: Implement SR-IOV PF
>   virtio-net: Implement SR-IOV VF
>
>  docs/pcie_sriov.txt |   8 +-
>  include/hw/pci/pci.h|   2 +-
>  include/hw/pci/pci_device.h |  13 +-
>  include/hw/pci/pcie_sriov.h |  25 ++-
>  include/hw/qdev-core.h  |   4 -
>  hw/core/qdev.c  |   1 -
>  hw/net/igb.c|   3 +-
>  hw/nvme/ctrl.c  |   3 +-
>  hw/pci/pci.c|  98 +++-
>  hw/pci/pci_host.c   |   4 +-
>  hw/pci/pcie.c   |   4 +-
>  hw/pci/pcie_sriov.c | 360 
> +---
>  hw/vfio/pci.c   |   3 +-
>  hw/virtio/virtio-net-pci.c  |   1 +
>  hw/virtio/virtio-pci.c  |   7 +
>  system/qdev-monitor.c   |  12 +-
>  16 files changed, 395 insertions(+), 153 deletions(-)
> ---
> base-commit: 4705fc0c8511d073bee4751c3c974aab2b10a970
> change-id: 20231202-sriov-9402fb262be8
>
> Best regards,
> --
> Akihiko Odaki 
>




Re: [PATCH v3 09/14] hw/net/tulip: Finish QOM conversion

2023-03-05 Thread Jason Wang
On Tue, Feb 28, 2023 at 9:39 PM Philippe Mathieu-Daudé
 wrote:
>
> Hi Jason, do you Ack this patch?

Yes.

Acked-by: Jason Wang 

Thanks

>
> On 13/2/23 19:43, Philippe Mathieu-Daudé wrote:
> > Use the TULIP() and DEVICE() QOM type-checking macros.
> > Remove uses of DO_UPCAST().
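
[Editor's note: for readers unfamiliar with QOM, TULIP() is the usual
instance-check macro generated from boilerplate like the sketch below; the
OBJECT_DECLARE_SIMPLE_TYPE form is illustrative, and the driver may use the
equivalent DECLARE_INSTANCE_CHECKER instead.]

    #define TYPE_TULIP "tulip"
    OBJECT_DECLARE_SIMPLE_TYPE(TULIPState, TULIP)
    /* TULIP(obj) now performs a checked cast to TULIPState *,
     * replacing DO_UPCAST(TULIPState, dev, pci_dev). */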
> >
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >   hw/net/tulip.c | 20 +++-
> >   1 file changed, 11 insertions(+), 9 deletions(-)
> >
> > diff --git a/hw/net/tulip.c b/hw/net/tulip.c
> > index 915e5fb595..990507859d 100644
> > --- a/hw/net/tulip.c
> > +++ b/hw/net/tulip.c
> > @@ -19,7 +19,10 @@
> >   #include "net/eth.h"
> >
> >   struct TULIPState {
> > +/*< private >*/
> >   PCIDevice dev;
> > +/*< public >*/
> > +
> >   MemoryRegion io;
> >   MemoryRegion memory;
> >   NICConf c;
> > @@ -959,7 +962,7 @@ static void tulip_fill_eeprom(TULIPState *s)
> >
> >   static void pci_tulip_realize(PCIDevice *pci_dev, Error **errp)
> >   {
> > -TULIPState *s = DO_UPCAST(TULIPState, dev, pci_dev);
> > +TULIPState *s = TULIP(pci_dev);
> >   uint8_t *pci_conf;
> >
> >   pci_conf = s->dev.config;
> > @@ -967,7 +970,7 @@ static void pci_tulip_realize(PCIDevice *pci_dev, Error **errp)
> >
> >   qemu_macaddr_default_if_unset(&s->c.macaddr);
> >
> > -s->eeprom = eeprom93xx_new(&pci_dev->qdev, 64);
> > +s->eeprom = eeprom93xx_new(DEVICE(pci_dev), 64);
> >   tulip_fill_eeprom(s);
> >
> >   memory_region_init_io(&s->io, OBJECT(&s->dev), &tulip_ops, s,
> > @@ -983,27 +986,26 @@ static void pci_tulip_realize(PCIDevice *pci_dev, Error **errp)
> >
> >   s->nic = qemu_new_nic(&net_tulip_info, &s->c,
> > object_get_typename(OBJECT(pci_dev)),
> > -  pci_dev->qdev.id, s);
> > +  DEVICE(pci_dev)->id, s);
> >   qemu_format_nic_info_str(qemu_get_queue(s->nic), s->c.macaddr.a);
> >   }
> >
> >   static void pci_tulip_exit(PCIDevice *pci_dev)
> >   {
> > -TULIPState *s = DO_UPCAST(TULIPState, dev, pci_dev);
> > +TULIPState *s = TULIP(pci_dev);
> >
> >   qemu_del_nic(s->nic);
> >   qemu_free_irq(s->irq);
> > -eeprom93xx_free(&pci_dev->qdev, s->eeprom);
> > +eeprom93xx_free(DEVICE(s), s->eeprom);
> >   }
> >
> >   static void tulip_instance_init(Object *obj)
> >   {
> > -PCIDevice *pci_dev = PCI_DEVICE(obj);
> > -TULIPState *d = DO_UPCAST(TULIPState, dev, pci_dev);
> > +TULIPState *s = TULIP(obj);
> >
> > -device_add_bootindex_property(obj, &d->c.bootindex,
> > +device_add_bootindex_property(obj, &s->c.bootindex,
> > "bootindex", "/ethernet-phy@0",
> > -  &pci_dev->qdev);
> > +  DEVICE(obj));
> >   }
> >
> >   static Property tulip_properties[] = {
>




Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-21 Thread Jason Wang
On Mon, Nov 21, 2022 at 6:11 PM Stefano Garzarella  wrote:
>
> Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> enabled VIRTIO_F_RING_RESET by default for all virtio devices.
>
> This feature is not currently emulated by QEMU, so for vhost and
> vhost-user devices we need to make sure it is supported by the offloaded
> device emulation (in-kernel or in another process).
> To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
> passed to vhost_get_features(). This way it will be masked if the device
> does not support it.
>
> This issue was initially discovered with vhost-vsock and vhost-user-vsock,
> and then also tested with vhost-user-rng which confirmed the same issue.
> They fail when sending features through VHOST_SET_FEATURES ioctl or
> VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
> by the guest (Linux >= v6.0), but not supported by the device.
>
> Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
> Signed-off-by: Stefano Garzarella 

Acked-by: Jason Wang 

> ---
>
> To prevent this problem in the future, perhaps we should provide a function
> (e.g. vhost_device_get_features) where we go to mask all non-device-specific
> features (e.g VIRTIO_F_*, VIRTIO_RING_F_*) that are not emulated by QEMU but
> we expect them to be emulated by the vhost or vhost-user devices.

Adding Eugenio, this might not be true if we want to use shadow
virtqueue for emulating some features?

Thanks

> Then we can call it in all .get_features callbacks just before returning the
> features.
>
> What do you think?
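
[Editor's note: a sketch of the proposed helper, built on the existing
vhost_get_features(); the name vhost_device_get_features and the exact bit
list are Stefano's proposal plus this editor's assumptions, not merged code.]

    /* Transport features QEMU does not emulate for vhost backends; only
     * offer them when the backend itself supports them. */
    static const int vhost_transport_feature_bits[] = {
        VIRTIO_F_NOTIFY_ON_EMPTY,
        VIRTIO_F_RING_PACKED,
        VIRTIO_F_IOMMU_PLATFORM,
        VIRTIO_F_RING_RESET,
        VHOST_INVALID_FEATURE_BIT
    };

    static uint64_t vhost_device_get_features(struct vhost_dev *dev,
                                              uint64_t features)
    {
        return vhost_get_features(dev, vhost_transport_feature_bits, features);
    }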
>
> But maybe better to do that for the next release, I will send an RFC.
>
> Thanks,
> Stefano
> ---
>  hw/block/vhost-user-blk.c  |  1 +
>  hw/net/vhost_net.c |  1 +
>  hw/scsi/vhost-scsi.c   |  1 +
>  hw/scsi/vhost-user-scsi.c  |  1 +
>  hw/virtio/vhost-user-fs.c  |  1 +
>  hw/virtio/vhost-user-gpio.c|  1 +
>  hw/virtio/vhost-user-i2c.c |  1 +
>  hw/virtio/vhost-user-rng.c | 11 +--
>  hw/virtio/vhost-vsock-common.c |  1 +
>  net/vhost-vdpa.c   |  1 +
>  10 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 16ad400889..0d5190accf 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -52,6 +52,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_NOTIFY_ON_EMPTY,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
> +VIRTIO_F_RING_RESET,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index feda448878..26e4930676 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -75,6 +75,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_NET_F_MTU,
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_PACKED,
> +VIRTIO_F_RING_RESET,
>  VIRTIO_NET_F_RSS,
>  VIRTIO_NET_F_HASH_REPORT,
>
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index bdf337a7a2..6a0fd0dfb1 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
> +VIRTIO_F_RING_RESET,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index bc37317d55..b7a71a802c 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
> +VIRTIO_F_RING_RESET,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 1c40f42045..dc4014cdef 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -32,6 +32,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_NOTIFY_ON_EMPTY,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
> +VIRTIO_F_RING_RESET,
>
>  VHOST_INVALID_FEATURE_BIT
>  };
> diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
> index 677d1c7730..5851cb3bc9 100644
> --- a/hw/virtio/vhost-user-gpio.c
> +++ b/hw/virtio/vhost-user-gpio.c
> @@ -24,6 +24,7 @@ static const int feature_bits[] = {
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_GPIO_F_IRQ,
> +VIRTIO_F_RING_RESET,
>  VHOST_INVALID_FEATURE_BIT
>  };

Re: Re: [PATCH v2] hw/virtio/vhost-user: support obtain vdpa device's mac address automatically

2022-09-22 Thread Jason Wang
On Fri, Sep 23, 2022 at 11:55 AM ho...@yusur.tech  wrote:
>
> On Thu, 22 Sep 2022 09:34:41 +0800 Jason Wang  wrote:
>
>
> >On Thu, Sep 22, 2022 at 1:58 AM Raphael Norwitz
> > wrote:
> >>
> >> If I read your response on the other thread correctly, this change is
> >> intended to prioritize the MAC address exposed by DPDK over the one
> >> provided by the QEMU command line? Sounds reasonable in principle, but
> >> I would get confirmation from vDPA/vhost-net maintainers.
>
> >I think the best way is to (and it seems easier)
>
> >1) have the management layer to make sure the mac came from cli
> >matches the underlayer vDPA
>
>  Agreed, that's no problem.
>
> >2) having a sanity check and fail the device initialization if they don't 
> >match
>
> However, one MAC address for the integrity check comes from the cli, and
> the other MAC address is from the vDPA device. How do we get it?

VHOST_USER_GET_CONFIG?

Thanks
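
[Editor's note: the sanity check being discussed could be sketched as below,
reusing virtio-net's existing fields; the error handling and placement are
assumptions, not a merged implementation.]

    /* Fail device initialization when the CLI MAC and the MAC reported by
     * the backend (e.g. via VHOST_USER_GET_CONFIG) disagree. */
    struct virtio_net_config netcfg;
    int ret = vhost_net_get_config(get_vhost_net(nc->peer),
                                   (uint8_t *)&netcfg, n->config_size);
    if (ret != -1 &&
        memcmp(netcfg.mac, n->nic_conf.macaddr.a, ETH_ALEN) != 0) {
        error_setg(errp, "MAC on the command line does not match the device");
        return;
    }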

>
> The current situation is that if the MAC from the cli doesn't match the
> underlayer vDPA, the virtual machine can still start without any hints.
>
> Thanks
>
>
> >Thanks
>
> >>
> >> That said the way you’re hacking the vhost-user code breaks a valuable
> >> check for bad vhost-user-blk backends. I would suggest finding another
> >> implementation which does not regress functionality for other device
> >> types.
> >>
> >> >From: Hao Chen 
> >> >
> >> >When using dpdk-vdpa to test a vdpa device, you need to specify the mac
> >> >address to start the virtual machine through libvirt or qemu. But now,
> >> >libvirt or qemu can call the dpdk vdpa vendor driver's .get_config ops
> >> >through vhost_net_get_config to get the mac address of the vdpa hardware
> >> >without manual configuration.
> >> >
> >> >v1->v2:
> >> >Only copy ETH_ALEN data of netcfg, since for some vdpa devices such as
> >> >the NVIDIA BLUEFIELD DPU (BF2), netcfg->status is not right.
> >> >We only need the mac address and don't care about the status field.
> >> >
> >> >Signed-off-by: Hao Chen 
> >> >---
> >> > hw/block/vhost-user-blk.c |  1 -
> >> > hw/net/virtio-net.c   |  7 +++
> >> > hw/virtio/vhost-user.c| 19 ---
> >> > 3 files changed, 7 insertions(+), 20 deletions(-)
> >> >
> >> >diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> >> >index 9117222456..5dca4eab09 100644
> >> >--- a/hw/block/vhost-user-blk.c
> >> >+++ b/hw/block/vhost-user-blk.c
> >> >@@ -337,7 +337,6 @@ static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
> >> >
> >> > vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> >> >
> >> >-s->vhost_user.supports_config = true;
> >> > ret = vhost_dev_init(&s->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0,
> >> >  errp);
> >> > if (ret < 0) {
> >> >diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> >index dd0d056fde..90405083b1 100644
> >> >--- a/hw/net/virtio-net.c
> >> >+++ b/hw/net/virtio-net.c
> >> >@@ -166,6 +166,13 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
> >> > }
> >> > memcpy(config, &netcfg, n->config_size);
> >> > }
> >> >+} else if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
> >> >+ret = vhost_net_get_config(get_vhost_net(nc->peer), (uint8_t *)&netcfg,
> >> >+   n->config_size);

Re: [PATCH v2] hw/virtio/vhost-user: support obtain vdpa device's mac address automatically

2022-09-22 Thread Jason Wang
On Fri, Sep 23, 2022 at 11:33 AM 陈浩  wrote:
>
>
> On 2022/9/22 17:56, Michael S. Tsirkin wrote:
> > On Thu, Sep 22, 2022 at 09:34:41AM +0800, Jason Wang wrote:
> >> On Thu, Sep 22, 2022 at 1:58 AM Raphael Norwitz
> >>  wrote:
> >>> If I read your response on the other thread correctly, this change is
> >>> intended to prioritize the MAC address exposed by DPDK over the one
> >>> provided by the QEMU command line? Sounds reasonable in principle, but
> >>> I would get confirmation from vDPA/vhost-net maintainers.
> >> I think the best way is to (and it seems easier)
> >>
> >> 1) have the management layer to make sure the mac came from cli
> >> matches the underlayer vDPA
> >> 2) having a sanity check and fail the device initialization if they don't 
> >> match
> >>
> >> Thanks
> > I think I agree, we should not overwrite user supplied arguments.
> > I think the case where we might want to get the mac from VDPA
> > and use it is when it's been assigned randomly as opposed to coming from 
> > cli.
> Yes, when the cli passes in a random mac address that is inconsistent
> with the vdpa device, qemu still starts the virtual machine as usual,
> but in this case, the qemu and vdpa environments do not work correctly.
> So I want to get mac address directly from vdpa device instead of cli.

Let's teach the management to do that. It has a lot of advantages, more below.

Not sure for the DPDK case, but kernel vDPA allows the mgmt to
provision the vDPA with a mac address which could be used in this
case.

>
> Having a sanity check and fail the device initialization if they don't
> match is a good idea, but it seems more convenient to directly overwrite
> the cli mac address.

This will confuse the management layer, which may do a lot of mac-related setups.

Thanks

> >
> >
> >>>
> >>> That said the way you’re hacking the vhost-user code breaks a valuable
> >>> check for bad vhost-user-blk backends. I would suggest finding another
> >>> implementation which does not regress functionality for other device
> >>> types.
> >>>
> >>>> From: Hao Chen 
> >>>> When using dpdk-vdpa to test a vdpa device, you need to specify the mac
> >>>> address to start the virtual machine through libvirt or qemu. But now,
> >>>> libvirt or qemu can call the dpdk vdpa vendor driver's .get_config ops
> >>>> through vhost_net_get_config to get the mac address of the vdpa hardware
> >>>> without manual configuration.
> >>>> v1->v2:
> >>>> Only copy ETH_ALEN data of netcfg, since for some vdpa devices such as
> >>>> the NVIDIA BLUEFIELD DPU (BF2), netcfg->status is not right.
> >>>> We only need the mac address and don't care about the status field.
> >>>> Signed-off-by: Hao Chen 
> >>>> ---
> >>>> hw/block/vhost-user-blk.c |  1 -
> >>>> hw/net/virtio-net.c   |  7 +++
> >>>> hw/virtio/vhost-user.c| 19 ---
> >>>> 3 files changed, 7 insertions(+), 20 deletions(-)
> >>>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> >>>> index 9117222456..5dca4eab09 100644
> >>>> --- a/hw/block/vhost-user-blk.c
> >>>> +++ b/hw/block/vhost-user-blk.c
> >>>> @@ -337,7 +337,6 @@ static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
> >>>>  vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> >>>> -s->vhost_user.supports_config = true;
> >>>>  ret = vhost_dev_init(&s->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0,
> >>>>   errp);
> >>>>  if (ret < 0) {
> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>> index dd0d056fde..90405083b1 100644
> >>>> --- a/hw/net/virtio-net.c
> >>>> +++ b/hw/net/virtio-net.c
> >>>> @@ -166,6 +166,13 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
> >>>>  }
> >>>>  memcpy(config, &netcfg, n->config_size);
> >>>>  }
> >>>> +} else if

Re: [PATCH v2] hw/virtio/vhost-user: support obtain vdpa device's mac address automatically

2022-09-21 Thread Jason Wang
On Thu, Sep 22, 2022 at 1:58 AM Raphael Norwitz
 wrote:
>
> If I read your response on the other thread correctly, this change is
> intended to prioritize the MAC address exposed by DPDK over the one
> provided by the QEMU command line? Sounds reasonable in principle, but I
> would get confirmation from vDPA/vhost-net maintainers.

I think the best way is to (and it seems easier)

1) have the management layer to make sure the mac came from cli
matches the underlayer vDPA
2) having a sanity check and fail the device initialization if they don't match

Thanks

>
> That said the way you’re hacking the vhost-user code breaks a valuable
> check for bad vhost-user-blk backends. I would suggest finding another
> implementation which does not regress functionality for other device
> types.
>
> >From: Hao Chen 
> >
> >When using dpdk-vdpa to test a vdpa device, you need to specify the mac
> >address to start the virtual machine through libvirt or qemu. But now,
> >libvirt or qemu can call the dpdk vdpa vendor driver's .get_config ops
> >through vhost_net_get_config to get the mac address of the vdpa hardware
> >without manual configuration.
> >
> >v1->v2:
> >Only copy ETH_ALEN data of netcfg, since for some vdpa devices such as
> >the NVIDIA BLUEFIELD DPU (BF2), netcfg->status is not right.
> >We only need the mac address and don't care about the status field.
> >
> >Signed-off-by: Hao Chen 
> >---
> > hw/block/vhost-user-blk.c |  1 -
> > hw/net/virtio-net.c   |  7 +++
> > hw/virtio/vhost-user.c| 19 ---
> > 3 files changed, 7 insertions(+), 20 deletions(-)
> >
> >diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> >index 9117222456..5dca4eab09 100644
> >--- a/hw/block/vhost-user-blk.c
> >+++ b/hw/block/vhost-user-blk.c
> >@@ -337,7 +337,6 @@ static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
> >
> > vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> >
> >-s->vhost_user.supports_config = true;
> > ret = vhost_dev_init(&s->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0,
> >  errp);
> > if (ret < 0) {
> >diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >index dd0d056fde..90405083b1 100644
> >--- a/hw/net/virtio-net.c
> >+++ b/hw/net/virtio-net.c
> >@@ -166,6 +166,13 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
> > }
> > memcpy(config, &netcfg, n->config_size);
> > }
> >+} else if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
> >+ret = vhost_net_get_config(get_vhost_net(nc->peer), (uint8_t *)&netcfg,
> >+   n->config_size);
> >+if (ret != -1) {
> >+   /* Automatically obtain the mac address of the vdpa device
> >+* when using the dpdk vdpa */
> >+memcpy(config, &netcfg, ETH_ALEN);
> > }
> > }
> >
> >diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> >index bd24741be8..8b01078249 100644
> >--- a/hw/virtio/vhost-user.c
> >+++ b/hw/virtio/vhost-user.c
> >@@ -2013,8 +2013,6 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque,
> > }
> >
> > if (virtio_has_feature(features, VHOST_USER_F_PROTOCOL_FEATURES)) {
> >-bool supports_f_config = vus->supports_config ||
> >-(dev->config_ops && dev->config_ops->vhost_dev_config_notifier);
> > uint64_t protocol_features;
> >
> > dev->backend_features |= 1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
> >@@ -2033,23 +2031,6 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque,
> >  */
> > protocol_features &= VHOST_USER_PROTOCOL_FEATURE_MASK;
> >
> >-if (supports_f_config) {
> >-if (!virtio_has_feature(protocol_features,
> >-VHOST_USER_PROTOCOL_F_CONFIG)) {
> >-error_setg(errp, "vhost-user device expecting "
> >-   "VHOST_USER_PROTOCOL_F_CONFIG but the vhost-user backend does "
> >-   "not support it.");
> >-return -EPROTO;
> >-}
> >-} else {
> >-if (virtio_has_feature(protocol_features,
> >-   VHOST_USER_PROTOCOL_F_CONFIG)) {
> >-warn_reportf_err(*errp, "vhost-user backend supports "
> >- "VHOST_USER_PROTOCOL_F_CONFIG but QEMU does not.");
> >-protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_CONFIG);
> >-}
> >-}
> >-
> > /* final set of protocol features */
> > dev->protocol_features = protocol_features;
> > err = vhost_user_set_protocol_features(dev, dev->protocol_features);
> >--
> >2.27.0
> >

Re: [PATCH] hw/virtio/vhost-user: support obtain vdpa device's mac address automatically

2022-09-19 Thread Jason Wang
On Tue, Sep 13, 2022 at 5:13 PM Hao Chen  wrote:
>
> When using dpdk-vdpa to test a vdpa device, you need to specify the mac
> address to start the virtual machine through libvirt or qemu. But now,
> libvirt or qemu can call the dpdk vdpa vendor driver's .get_config ops
> through vhost_net_get_config to get the mac address of the vdpa hardware
> without manual configuration.
>
> Signed-off-by: Hao Chen 

Adding Cindy for comments.

Thanks

> ---
>  hw/block/vhost-user-blk.c |  1 -
>  hw/net/virtio-net.c   |  3 ++-
>  hw/virtio/vhost-user.c| 19 ---
>  3 files changed, 2 insertions(+), 21 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 9117222456..5dca4eab09 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -337,7 +337,6 @@ static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
>
>  vhost_dev_set_config_notifier(&s->dev, &blk_ops);
>
> -s->vhost_user.supports_config = true;
>  ret = vhost_dev_init(&s->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0,
>   errp);
>  if (ret < 0) {
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index dd0d056fde..274ea84644 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -149,7 +149,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
>   * Is this VDPA? No peer means not VDPA: there's no way to
>   * disconnect/reconnect a VDPA peer.
>   */
> -if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> +if ((nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) ||
> +(nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER)) {
>  ret = vhost_net_get_config(get_vhost_net(nc->peer), (uint8_t *)&netcfg,
>     n->config_size);
>  if (ret != -1) {
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index bd24741be8..8b01078249 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -2013,8 +2013,6 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque,
>  }
>
>  if (virtio_has_feature(features, VHOST_USER_F_PROTOCOL_FEATURES)) {
> -bool supports_f_config = vus->supports_config ||
> -(dev->config_ops && dev->config_ops->vhost_dev_config_notifier);
>  uint64_t protocol_features;
>
>  dev->backend_features |= 1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
> @@ -2033,23 +2031,6 @@ static int vhost_user_backend_init(struct vhost_dev 
> *dev, void *opaque,
>   */
>  protocol_features &= VHOST_USER_PROTOCOL_FEATURE_MASK;
>
> -if (supports_f_config) {
> -if (!virtio_has_feature(protocol_features,
> -VHOST_USER_PROTOCOL_F_CONFIG)) {
> -error_setg(errp, "vhost-user device expecting "
> -   "VHOST_USER_PROTOCOL_F_CONFIG but the vhost-user 
> backend does "
> -   "not support it.");
> -return -EPROTO;
> -}
> -} else {
> -if (virtio_has_feature(protocol_features,
> -   VHOST_USER_PROTOCOL_F_CONFIG)) {
> -warn_reportf_err(*errp, "vhost-user backend supports "
> - "VHOST_USER_PROTOCOL_F_CONFIG but QEMU does 
> not.");
> -protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_CONFIG);
> -}
> -}
> -
>  /* final set of protocol features */
>  dev->protocol_features = protocol_features;
>  err = vhost_user_set_protocol_features(dev, dev->protocol_features);
> --
> 2.27.0
>
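For reference, a minimal sketch of the flow the patch enables, reusing the
names from the diff above (error handling abbreviated; this is an
illustration, not the committed code):

/* Sketch: the virtio-net front-end asks the vhost back-end for its
 * config space and the MAC arrives with it. 'nc', 'n' and 'config'
 * are as in virtio_net_get_config() above. */
struct virtio_net_config netcfg;
int ret;

ret = vhost_net_get_config(get_vhost_net(nc->peer),
                           (uint8_t *)&netcfg, n->config_size);
if (ret != -1) {
    memcpy(config, &netcfg, n->config_size);   /* includes netcfg.mac */
}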




Re: [PATCH v3 05/21] block/vhost-user-blk-server: don't expose VHOST_USER_F_PROTOCOL_FEATURES

2022-07-28 Thread Jason Wang



On 2022/7/27 03:21, Alex Bennée wrote:

This bit is unused in actual VirtIO feature negotiation and should
only appear in the vhost-user messages between master and slave.



I might be wrong, but this is actually used between master and slave,
not between the device and driver?


Thanks
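To make the distinction concrete, a small hedged sketch (the masking step is
illustrative, not existing code; the bit value follows the vhost-user spec):

/* VHOST_USER_F_PROTOCOL_FEATURES (bit 30 in the vhost-user spec) is
 * only meaningful on the vhost-user protocol between master and
 * slave; a front-end strips it before exposing features to the
 * guest driver. */
#define VHOST_USER_F_PROTOCOL_FEATURES 30

uint64_t backend_features = vu_blk_get_features(dev);
uint64_t guest_features =
    backend_features & ~(1ull << VHOST_USER_F_PROTOCOL_FEATURES);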




[AJB: experiment, this doesn't break the tests but I'm not super
confident of the range of tests]

Signed-off-by: Alex Bennée 
---
  block/export/vhost-user-blk-server.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index 3409d9e02e..d31436006d 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -125,8 +125,7 @@ static uint64_t vu_blk_get_features(VuDev *dev)
 1ull << VIRTIO_BLK_F_MQ |
 1ull << VIRTIO_F_VERSION_1 |
 1ull << VIRTIO_RING_F_INDIRECT_DESC |
-   1ull << VIRTIO_RING_F_EVENT_IDX |
-   1ull << VHOST_USER_F_PROTOCOL_FEATURES;
+   1ull << VIRTIO_RING_F_EVENT_IDX ;
  
  if (!vexp->handler.writable) {

  features |= 1ull << VIRTIO_BLK_F_RO;





Re: [PATCH] aio-posix: fix spurious ->poll_ready() callbacks in main loop

2022-02-24 Thread Jason Wang



On 2022/2/23 11:57 PM, Stefan Hajnoczi wrote:

When ->poll() succeeds the AioHandler is placed on the ready list with
revents set to the magic value 0. This magic value causes
aio_dispatch_handler() to invoke ->poll_ready() instead of ->io_read()
for G_IO_IN or ->io_write() for G_IO_OUT.

This magic value 0 hack works for the IOThread where AioHandlers are
placed on ->ready_list and processed by aio_dispatch_ready_handlers().
It does not work for the main loop where all AioHandlers are processed
by aio_dispatch_handlers(), even those that are not ready and have a
revents value of 0.

As a result the main loop invokes ->poll_ready() on AioHandlers that are
not ready. These spurious ->poll_ready() calls waste CPU cycles and
could lead to crashes if the code assumes ->poll() must have succeeded
before ->poll_ready() is called (a reasonable asumption but I haven't
seen it in practice).

Stop using revents to track whether ->poll_ready() will be called on an
AioHandler. Introduce a separate AioHandler->poll_ready field instead.
This eliminates spurious ->poll_ready() calls in the main loop.

Fixes: 826cc32423db2a99d184dbf4f507c737d7e7a4ae ("aio-posix: split poll check from 
ready handler")
Signed-off-by: Stefan Hajnoczi 



Reported-by: Jason Wang 

Tested-by: Jason Wang 

Thanks



---
  util/aio-posix.h |  1 +
  util/aio-posix.c | 32 ++--
  2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/util/aio-posix.h b/util/aio-posix.h
index 7f2c37a684..80b927c7f4 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -37,6 +37,7 @@ struct AioHandler {
  unsigned flags; /* see fdmon-io_uring.c */
  #endif
  int64_t poll_idle_timeout; /* when to stop userspace polling */
+bool poll_ready; /* has polling detected an event? */
  bool is_external;
  };
  
diff --git a/util/aio-posix.c b/util/aio-posix.c

index 7b9f629218..be0182a3c6 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -23,15 +23,6 @@
  #include "trace.h"
  #include "aio-posix.h"
  
-/*

- * G_IO_IN and G_IO_OUT are not appropriate revents values for polling, since
- * the handler may not need to access the file descriptor. For example, the
- * handler doesn't need to read from an EventNotifier if it polled a memory
- * location and a read syscall would be slow. Define our own unique revents
- * value to indicate that polling determined this AioHandler is ready.
- */
-#define REVENTS_POLL_READY 0
-
  /* Stop userspace polling on a handler if it isn't active for some time */
  #define POLL_IDLE_INTERVAL_NS (7 * NANOSECONDS_PER_SECOND)
  
@@ -49,6 +40,14 @@ void aio_add_ready_handler(AioHandlerList *ready_list,

  QLIST_INSERT_HEAD(ready_list, node, node_ready);
  }
  
+static void aio_add_poll_ready_handler(AioHandlerList *ready_list,

+   AioHandler *node)
+{
+QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
+node->poll_ready = true;
+QLIST_INSERT_HEAD(ready_list, node, node_ready);
+}
+
  static AioHandler *find_aio_handler(AioContext *ctx, int fd)
  {
  AioHandler *node;
@@ -76,6 +75,7 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler 
*node)
  }
  
  node->pfd.revents = 0;

+node->poll_ready = false;
  
  /* If the fd monitor has already marked it deleted, leave it alone */

  if (QLIST_IS_INSERTED(node, node_deleted)) {
@@ -247,7 +247,7 @@ static bool poll_set_started(AioContext *ctx, 
AioHandlerList *ready_list,
  
  /* Poll one last time in case ->io_poll_end() raced with the event */

  if (!started && node->io_poll(node->opaque)) {
-aio_add_ready_handler(ready_list, node, REVENTS_POLL_READY);
+aio_add_poll_ready_handler(ready_list, node);
  progress = true;
  }
  }
@@ -282,6 +282,7 @@ bool aio_pending(AioContext *ctx)
  QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
  int revents;
  
+/* TODO should this check poll ready? */

  revents = node->pfd.revents & node->pfd.events;
  if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read &&
  aio_node_check(ctx, node->is_external)) {
@@ -323,11 +324,15 @@ static void aio_free_deleted_handlers(AioContext *ctx)
  static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
  {
  bool progress = false;
+bool poll_ready;
  int revents;
  
  revents = node->pfd.revents & node->pfd.events;

  node->pfd.revents = 0;
  
+poll_ready = node->poll_ready;

+node->poll_ready = false;
+
  /*
   * Start polling AioHandlers when they become ready because activity is
   * likely to continue.  Note that starvation is theoretically possible 
when
@@ -344,7 +349,7 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)

Re: [RFC PATCH 2/3] hw/sd/sdhci: Prohibit DMA accesses to devices

2021-12-15 Thread Jason Wang
On Thu, Dec 16, 2021 at 4:57 AM Philippe Mathieu-Daudé
 wrote:
>
> From: Philippe Mathieu-Daudé 
>
> The issue reported by OSS-Fuzz produces the following backtrace:
>
>   ==447470==ERROR: AddressSanitizer: heap-buffer-overflow
>   READ of size 1 at 0x6152a080 thread T0
>   #0 0x71766d47 in sdhci_read_dataport hw/sd/sdhci.c:474:18
>   #1 0x7175f139 in sdhci_read hw/sd/sdhci.c:1022:19
>   #2 0x721b937b in memory_region_read_accessor softmmu/memory.c:440:11
>   #3 0x72171e51 in access_with_adjusted_size softmmu/memory.c:554:18
>   #4 0x7216f47c in memory_region_dispatch_read1 softmmu/memory.c:1424:16
>   #5 0x7216ebb9 in memory_region_dispatch_read softmmu/memory.c:1452:9
>   #6 0x7212db5d in flatview_read_continue softmmu/physmem.c:2879:23
>   #7 0x7212f958 in flatview_read softmmu/physmem.c:2921:12
>   #8 0x7212f418 in address_space_read_full softmmu/physmem.c:2934:18
>   #9 0x721305a9 in address_space_rw softmmu/physmem.c:2962:16
>   #10 0x7175a392 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
>   #11 0x7175a0ea in dma_memory_rw include/sysemu/dma.h:132:12
>   #12 0x71759684 in dma_memory_read include/sysemu/dma.h:152:12
>   #13 0x7175518c in sdhci_do_adma hw/sd/sdhci.c:823:27
>   #14 0x7174bf69 in sdhci_data_transfer hw/sd/sdhci.c:935:13
>   #15 0x7176aaa7 in sdhci_send_command hw/sd/sdhci.c:376:9
>   #16 0x717629ee in sdhci_write hw/sd/sdhci.c:1212:9
>   #17 0x72172513 in memory_region_write_accessor softmmu/memory.c:492:5
>   #18 0x72171e51 in access_with_adjusted_size softmmu/memory.c:554:18
>   #19 0x72170766 in memory_region_dispatch_write softmmu/memory.c:1504:16
>   #20 0x721419ee in flatview_write_continue softmmu/physmem.c:2812:23
>   #21 0x721301eb in flatview_write softmmu/physmem.c:2854:12
>   #22 0x7212fca8 in address_space_write softmmu/physmem.c:2950:18
>   #23 0x721d9a53 in qtest_process_command softmmu/qtest.c:727:9
>
> A DMA descriptor is previously filled in RAM. An I/O access to the
> device (frames #22 to #16) start the DMA engine (frame #13). The
> engine fetch the descriptor and execute the request, which itself
> accesses the SDHCI I/O registers (frame #1 and #0), triggering a
> re-entrancy issue.
>
> Fix by prohibiting transactions from the DMA to devices. The DMA engine
> is thus restricted to memories.
>
> Reported-by: OSS-Fuzz (Issue 36391)
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/451
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/sd/sdhci.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
> index fe2f21f0c37..0e5e988927e 100644
> --- a/hw/sd/sdhci.c
> +++ b/hw/sd/sdhci.c
> @@ -741,6 +741,7 @@ static void sdhci_do_adma(SDHCIState *s)
>  {
>  unsigned int begin, length;
>  const uint16_t block_size = s->blksize & BLOCK_SIZE_MASK;
> +const MemTxAttrs attrs = { .memory = true };
>  ADMADescr dscr = {};
>  MemTxResult res;
>  int i;
> @@ -794,7 +795,7 @@ static void sdhci_do_adma(SDHCIState *s)
>  res = dma_memory_write(s->dma_as, dscr.addr,
> >fifo_buffer[begin],
> s->data_count - begin,
> -   MEMTXATTRS_UNSPECIFIED);
> +   attrs);
>  if (res != MEMTX_OK) {
>  break;
>  }
> @@ -823,7 +824,7 @@ static void sdhci_do_adma(SDHCIState *s)
>  res = dma_memory_read(s->dma_as, dscr.addr,
>>fifo_buffer[begin],
>s->data_count - begin,
> -  MEMTXATTRS_UNSPECIFIED);
> +  attrs);
>  if (res != MEMTX_OK) {
>  break;
>  }

I wonder how we can fix this for other devices, as this has been a
known issue for many years. We've received many reports from the
networking side.

It looks like this patch simply forbids p2p, which is probably not the
case for other devices.

I remember there are ideas like using a bh from Paolo or detecting
reentrancy in the memory core; both of them seem more general than
this?

Thanks

> --
> 2.33.1
>
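For context, the bottom-half idea mentioned above would look roughly like the
sketch below; the transfer_bh field and the split are hypothetical, not the
actual fix:

static void sdhci_data_transfer_bh(void *opaque)
{
    SDHCIState *s = opaque;

    sdhci_do_adma(s);            /* now runs outside the MMIO dispatch */
}

/* during realize: */
s->transfer_bh = qemu_bh_new(sdhci_data_transfer_bh, s);

/* in the MMIO write path, instead of calling the DMA engine directly: */
qemu_bh_schedule(s->transfer_bh);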




Re: [PATCH v2 03/15] net/vhost-vdpa: Fix device compatibility check

2021-10-13 Thread Jason Wang



On 2021/10/8 9:34 PM, Kevin Wolf wrote:

vhost-vdpa works only with specific devices. At startup, it second
guesses what the command line option handling will do and error out if
it thinks a non-virtio device will attach to them.

This second guessing is not only ugly, it can lead to wrong error
messages ('-device floppy,netdev=foo' should complain about an unknown
property, not about the wrong kind of network device being attached) and
completely ignores hotplugging.

Drop the old checks and implement .check_peer_type() instead to fix
this. As a nice side effect, it also removes one more dependency on the
legacy QemuOpts infrastructure and even reduces the code size.

Signed-off-by: Kevin Wolf 



Acked-by: Jason Wang 



---
  net/vhost-vdpa.c | 37 ++---
  1 file changed, 14 insertions(+), 23 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 912686457c..6dc68d8677 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -147,12 +147,26 @@ static bool vhost_vdpa_has_ufo(NetClientState *nc)
  
  }
  
+static bool vhost_vdpa_check_peer_type(NetClientState *nc, ObjectClass *oc,

+   Error **errp)
+{
+const char *driver = object_class_get_name(oc);
+
+if (!g_str_has_prefix(driver, "virtio-net-")) {
+error_setg(errp, "vhost-vdpa requires frontend driver virtio-net-*");
+return false;
+}
+
+return true;
+}
+
  static NetClientInfo net_vhost_vdpa_info = {
  .type = NET_CLIENT_DRIVER_VHOST_VDPA,
  .size = sizeof(VhostVDPAState),
  .cleanup = vhost_vdpa_cleanup,
  .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
  .has_ufo = vhost_vdpa_has_ufo,
+.check_peer_type = vhost_vdpa_check_peer_type,
  };
  
  static int net_vhost_vdpa_init(NetClientState *peer, const char *device,

@@ -179,24 +193,6 @@ static int net_vhost_vdpa_init(NetClientState *peer, const 
char *device,
  return ret;
  }
  
-static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)

-{
-const char *name = opaque;
-const char *driver, *netdev;
-
-driver = qemu_opt_get(opts, "driver");
-netdev = qemu_opt_get(opts, "netdev");
-if (!driver || !netdev) {
-return 0;
-}
-if (strcmp(netdev, name) == 0 &&
-!g_str_has_prefix(driver, "virtio-net-")) {
-error_setg(errp, "vhost-vdpa requires frontend driver virtio-net-*");
-return -1;
-}
-return 0;
-}
-
  int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
  NetClientState *peer, Error **errp)
  {
@@ -204,10 +200,5 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
  
  assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);

  opts = &netdev->u.vhost_vdpa;
-/* verify net frontend */
-if (qemu_opts_foreach(qemu_find_opts("device"), net_vhost_check_net,
-  (char *)name, errp)) {
-return -1;
-}
  return net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name, opts->vhostdev);
  }





Re: [PATCH v2 02/15] net/vhost-user: Fix device compatibility check

2021-10-13 Thread Jason Wang



On 2021/10/8 9:34 PM, Kevin Wolf wrote:

vhost-user works only with specific devices. At startup, it second
guesses what the command line option handling will do and error out if
it thinks a non-virtio device will attach to them.

This second guessing is not only ugly, it can lead to wrong error
messages ('-device floppy,netdev=foo' should complain about an unknown
property, not about the wrong kind of network device being attached) and
completely ignores hotplugging.

Drop the old checks and implement .check_peer_type() instead to fix
this. As a nice side effect, it also removes one more dependency on the
legacy QemuOpts infrastructure and even reduces the code size.

Signed-off-by: Kevin Wolf 



Acked-by: Jason Wang 



---
  net/vhost-user.c | 41 ++---
  1 file changed, 14 insertions(+), 27 deletions(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 4a939124d2..b1a0247b59 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -198,6 +198,19 @@ static bool vhost_user_has_ufo(NetClientState *nc)
  return true;
  }
  
+static bool vhost_user_check_peer_type(NetClientState *nc, ObjectClass *oc,

+   Error **errp)
+{
+const char *driver = object_class_get_name(oc);
+
+if (!g_str_has_prefix(driver, "virtio-net-")) {
+error_setg(errp, "vhost-user requires frontend driver virtio-net-*");
+return false;
+}
+
+return true;
+}
+
  static NetClientInfo net_vhost_user_info = {
  .type = NET_CLIENT_DRIVER_VHOST_USER,
  .size = sizeof(NetVhostUserState),
@@ -207,6 +220,7 @@ static NetClientInfo net_vhost_user_info = {
  .has_ufo = vhost_user_has_ufo,
  .set_vnet_be = vhost_user_set_vnet_endianness,
  .set_vnet_le = vhost_user_set_vnet_endianness,
+.check_peer_type = vhost_user_check_peer_type,
  };
  
  static gboolean net_vhost_user_watch(void *do_not_use, GIOCondition cond,

@@ -397,27 +411,6 @@ static Chardev *net_vhost_claim_chardev(
  return chr;
  }
  
-static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)

-{
-const char *name = opaque;
-const char *driver, *netdev;
-
-driver = qemu_opt_get(opts, "driver");
-netdev = qemu_opt_get(opts, "netdev");
-
-if (!driver || !netdev) {
-return 0;
-}
-
-if (strcmp(netdev, name) == 0 &&
-!g_str_has_prefix(driver, "virtio-net-")) {
-error_setg(errp, "vhost-user requires frontend driver virtio-net-*");
-return -1;
-}
-
-return 0;
-}
-
  int net_init_vhost_user(const Netdev *netdev, const char *name,
  NetClientState *peer, Error **errp)
  {
@@ -433,12 +426,6 @@ int net_init_vhost_user(const Netdev *netdev, const char 
*name,
  return -1;
  }
  
-/* verify net frontend */

-if (qemu_opts_foreach(qemu_find_opts("device"), net_vhost_check_net,
-  (char *)name, errp)) {
-return -1;
-}
-
  queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
  if (queues < 1 || queues > MAX_QUEUE_NUM) {
  error_setg(errp,





Re: [PATCH v2 01/15] net: Introduce NetClientInfo.check_peer_type()

2021-10-13 Thread Jason Wang



On 2021/10/8 9:34 PM, Kevin Wolf wrote:

Some network backends (vhost-user and vhost-vdpa) work only with
specific devices. At startup, they second guess what the command line
option handling will do and error out if they think a non-virtio device
will attach to them.

This second guessing is not only ugly, it can lead to wrong error
messages ('-device floppy,netdev=foo' should complain about an unknown
property, not about the wrong kind of network device being attached) and
completely ignores hotplugging.

Add a callback where backends can check compatibility with a device when
it actually tries to attach, even on hotplug.

Signed-off-by: Kevin Wolf 



Acked-by: Jason Wang 



---
  include/net/net.h| 2 ++
  hw/core/qdev-properties-system.c | 6 ++
  2 files changed, 8 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 5d1508081f..986288eb07 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -62,6 +62,7 @@ typedef struct SocketReadState SocketReadState;
  typedef void (SocketReadStateFinalize)(SocketReadState *rs);
  typedef void (NetAnnounce)(NetClientState *);
  typedef bool (SetSteeringEBPF)(NetClientState *, int);
+typedef bool (NetCheckPeerType)(NetClientState *, ObjectClass *, Error **);
  
  typedef struct NetClientInfo {

  NetClientDriver type;
@@ -84,6 +85,7 @@ typedef struct NetClientInfo {
  SetVnetBE *set_vnet_be;
  NetAnnounce *announce;
  SetSteeringEBPF *set_steering_ebpf;
+NetCheckPeerType *check_peer_type;
  } NetClientInfo;
  
  struct NetClientState {

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index e71f5d64d1..a91f60567a 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -431,6 +431,12 @@ static void set_netdev(Object *obj, Visitor *v, const char 
*name,
  goto out;
  }
  
+if (peers[i]->info->check_peer_type) {

+if (!peers[i]->info->check_peer_type(peers[i], obj->class, errp)) {
+goto out;
+}
+}
+
  ncs[i] = peers[i];
  ncs[i]->queue_index = i;
  }





Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-08 Thread Jason Wang
On Wed, Sep 8, 2021 at 11:19 PM Peter Xu  wrote:
>
> On Wed, Sep 08, 2021 at 09:19:27AM +0100, Dr. David Alan Gilbert wrote:
> > * Jason Wang (jasow...@redhat.com) wrote:
> > > On Wed, Sep 8, 2021 at 2:32 AM Peter Xu  wrote:
> > > >
> > > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos 
> > > > wrote:
> > > > > > I don't think it is valid to unconditionally enable this feature 
> > > > > > due to the
> > > > > > resource usage implications
> > > > > >
> > > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > > >
> > > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This 
> > > > > > happens
> > > > > >if the socket option was not set, the socket exceeds its optmem
> > > > > >limit or the user exceeds its ulimit on locked pages."
> > > > >
> > > > > You are correct, I unfortunately missed this part in the docs :(
> > > > >
> > > > > > The limit on locked pages is something that looks very likely to be
> > > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > > implies locked memory (eg PCI assignment)
> > > > >
> > > > > Do you mean the limit an user has on locking memory?
> > > > >
> > > > > If so, that makes sense. I remember I needed to set the upper limit 
> > > > > of locked
> > > > > memory for the user before using it, or adding a capability to qemu 
> > > > > before.
> > > >
> > > > So I'm a bit confused on why MSG_ZEROCOPY requires checking 
> > > > RLIMIT_MEMLOCK.
> > > >
> > > > The thing is IIUC that's accounting for pinned pages only with either 
> > > > mlock()
> > > > (FOLL_MLOCK) or vfio (FOLL_PIN).
> > > >
> > > > I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking 
> > > > at
> > > > __zerocopy_sg_from_iter() -> iov_iter_get_pages().
> > >
> > > It happens probably here:
> > >
> > > E.g
> > >
> > > __ip_append_data()
> > > msg_zerocopy_realloc()
> > > mm_account_pinned_pages()
> >
> > When do they get uncounted?  i.e. is it just the data that's in flight
> > that's marked as pinned?
>
> I think so - there's __msg_zerocopy_callback() -> mm_unaccount_pinned_pages()
> correspondingly.  Thanks,

Yes, and actually the memory that can be pinned is limited by the
sndbuf of the TCP socket. So we are fine with the rlimit (e.g. we don't
need to pin all guest pages).

Thanks

>
> --
> Peter Xu
>
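As a side note for readers, the send path under discussion reduces to the
following self-contained sketch (Linux-specific; a real user must also reap
completion notifications from the socket error queue before reusing the
buffer):

/* Sketch of the MSG_ZEROCOPY flow, assuming 'fd' is a connected TCP
 * socket. ENOBUFS is the failure mode quoted above: optmem limit or
 * locked-page ulimit exceeded, so fall back to a copying send. */
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

static ssize_t send_zerocopy(int fd, const void *buf, size_t len)
{
    int one = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) < 0) {
        return send(fd, buf, len, 0);   /* kernel too old: plain send */
    }

    ssize_t n = send(fd, buf, len, MSG_ZEROCOPY);
    if (n < 0 && errno == ENOBUFS) {
        n = send(fd, buf, len, 0);      /* limit hit: copying fallback */
    }
    return n;
}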




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-07 Thread Jason Wang
On Wed, Sep 8, 2021 at 11:24 AM Peter Xu  wrote:
>
> On Wed, Sep 08, 2021 at 10:59:57AM +0800, Jason Wang wrote:
> > On Wed, Sep 8, 2021 at 2:32 AM Peter Xu  wrote:
> > >
> > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos 
> > > wrote:
> > > > > I don't think it is valid to unconditionally enable this feature due 
> > > > > to the
> > > > > resource usage implications
> > > > >
> > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > >
> > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > > > >if the socket option was not set, the socket exceeds its optmem
> > > > >limit or the user exceeds its ulimit on locked pages."
> > > >
> > > > You are correct, I unfortunately missed this part in the docs :(
> > > >
> > > > > The limit on locked pages is something that looks very likely to be
> > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > implies locked memory (eg PCI assignment)
> > > >
> > > > Do you mean the limit an user has on locking memory?
> > > >
> > > > If so, that makes sense. I remember I needed to set the upper limit of 
> > > > locked
> > > > memory for the user before using it, or adding a capability to qemu 
> > > > before.
> > >
> > > So I'm a bit confused on why MSG_ZEROCOPY requires checking 
> > > RLIMIT_MEMLOCK.
> > >
> > > The thing is IIUC that's accounting for pinned pages only with either 
> > > mlock()
> > > (FOLL_MLOCK) or vfio (FOLL_PIN).
> > >
> > > I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking at
> > > __zerocopy_sg_from_iter() -> iov_iter_get_pages().
> >
> > It happens probably here:
> >
> > E.g
> >
> > __ip_append_data()
> > msg_zerocopy_realloc()
> > mm_account_pinned_pages()
>
> Right. :)
>
> But my previous question is more about the reason behind it - I thought that's
> a common GUP and it shouldn't rely on locked_vm because it should still be
> temporary GUP rather than a long time GUP, IMHO that's how we use locked_vm 
> (we
> don't account for temp GUPs but only longterm ones). IOW, I'm wondering 
> whether
> all the rest of iov_iter_get_pages() callers should check locked_vm too, and
> AFAIU they're not doing that right now..

You are probably right, but I'm not sure if it's too late to fix that.
(E.g. it would break existing userspace.)

Thanks

>
> Thanks,
>
> --
> Peter Xu
>




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-07 Thread Jason Wang
On Wed, Sep 8, 2021 at 2:32 AM Peter Xu  wrote:
>
> On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos wrote:
> > > I don't think it is valid to unconditionally enable this feature due to 
> > > the
> > > resource usage implications
> > >
> > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > >
> > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > >if the socket option was not set, the socket exceeds its optmem
> > >limit or the user exceeds its ulimit on locked pages."
> >
> > You are correct, I unfortunately missed this part in the docs :(
> >
> > > The limit on locked pages is something that looks very likely to be
> > > exceeded unless you happen to be running a QEMU config that already
> > > implies locked memory (eg PCI assignment)
> >
> > Do you mean the limit an user has on locking memory?
> >
> > If so, that makes sense. I remember I needed to set the upper limit of 
> > locked
> > memory for the user before using it, or adding a capability to qemu before.
>
> So I'm a bit confused on why MSG_ZEROCOPY requires checking RLIMIT_MEMLOCK.
>
> The thing is IIUC that's accounting for pinned pages only with either mlock()
> (FOLL_MLOCK) or vfio (FOLL_PIN).
>
> I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking at
> __zerocopy_sg_from_iter() -> iov_iter_get_pages().

It happens probably here:

E.g

__ip_append_data()
msg_zerocopy_realloc()
mm_account_pinned_pages()

Thanks

>
> Say, I think there're plenty of iov_iter_get_pages() callers and normal GUPs, 
> I
> think most of them do no need such accountings.
>
> --
> Peter Xu
>
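A quick userspace check of the limit discussed in this thread, since
MSG_ZEROCOPY pins are charged against RLIMIT_MEMLOCK:

/* Standalone: print the locked-memory limit that bounds zerocopy pins. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0) {
        printf("RLIMIT_MEMLOCK: cur=%llu max=%llu bytes\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
    }
    return 0;
}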




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Jason Wang



On 2021/9/1 11:35 PM, Peter Xu wrote:

On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote:

On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:

On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:

On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:

Call qio_channel_set_zerocopy(true) in the start of every multifd thread.

Change the send_write() interface of multifd, allowing it to pass down
flags for qio_channel_write*().

Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
other data being sent at the default copying approach.

Signed-off-by: Leonardo Bras 
---
  migration/multifd-zlib.c | 7 ---
  migration/multifd-zstd.c | 7 ---
  migration/multifd.c  | 9 ++---
  migration/multifd.h  | 3 ++-
  4 files changed, 16 insertions(+), 10 deletions(-)
@@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
  }
  
  if (used) {

-ret = multifd_send_state->ops->send_write(p, used, &local_err);
+ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
+  &local_err);

I don't think it is valid to unconditionally enable this feature due to the
resource usage implications

https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html

   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
if the socket option was not set, the socket exceeds its optmem
limit or the user exceeds its ulimit on locked pages."

The limit on locked pages is something that looks very likely to be
exceeded unless you happen to be running a QEMU config that already
implies locked memory (eg PCI assignment)

Yes it would be great to be a migration capability in parallel to multifd. At
initial phase if it's easy to be implemented on multi-fd only, we can add a
dependency between the caps.  In the future we can remove that dependency when
the code is ready to go without multifd.  Thanks,

Also, I'm wondering how zerocopy support interacts with kernel support
for kTLS and multipath-TCP, both of which we want to be able to use
with migration.

Copying Jason Wang for net implications between these features on kernel side



Note that the MSG_ZEROCOPY is contributed by Google :)



and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS).



I think they can. Anyway, the kernel can choose to do a data copy when necessary.

Note that the "zerocopy" is probably not correct here. What's better is 
"Enable MSG_ZEROCOPY" since:


1) kernel supports various kinds of zerocopy, for TX, it has supported 
sendfile() for many years.

2) MSG_ZEROCOPY is only used for TX but not RX
3) TCP rx zerocopy is only supported via mmap() which requires some 
extra configurations e.g 4K MTU, driver support for header split etc.


[1] https://www.youtube.com/watch?v=_ZfiQGWFvg0

Thanks




 To be on the safe side we may want to enable only one of them until we prove
they'll work together, I guess.

Not a immediate concern as I don't really think any of them is really
explicitly supported in qemu.

KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
ZEROCOPY mutually exclusive anyway, because at least the userspace TLS code of
gnutls won't have a way to maintain the TLS buffers used by zerocopy.  So at
least we need some knob to detect whether kTLS is enabled in gnutls.






Re: [PATCH v6 4/6] qmp: add QMP command x-debug-virtio-queue-status

2021-08-26 Thread Jason Wang



On 2021/8/26 2:25 PM, Jonah Palmer wrote:


Hi Jason, could I get your thoughts on this implementation question below?

I'm not too sure how I should proceed with determining whether vhost is
active or not.


Thank you!


Jonah

On 7/26/21 5:33 AM, Jonah Palmer wrote:



On 7/22/21 5:22 AM, Jason Wang wrote:


On 2021/7/21 4:59 PM, Jonah Palmer wrote:



On 7/13/21 10:37 PM, Jason Wang wrote:


On 2021/7/12 6:35 PM, Jonah Palmer wrote:

From: Laurent Vivier 

This new command shows internal status of a VirtQueue.
(vrings and indexes).

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio-stub.c |   6 +++
  hw/virtio/virtio.c  |  37 ++
  qapi/virtio.json    | 102 


  3 files changed, 145 insertions(+)

  [Jonah: Added 'device-type' field to VirtQueueStatus and
  qmp command x-debug-virtio-queue-status.]

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index ddb592f..3c1bf17 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -17,3 +17,9 @@ VirtioStatus *qmp_x_debug_virtio_status(const 
char* path, Error **errp)

  {
  return qmp_virtio_unsupported(errp);
  }
+
+VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue, Error **errp)
+{
+    return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 81a0ee8..ccd4371 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3935,6 +3935,43 @@ static VirtIODevice 
*virtio_device_find(const char *path)

  return NULL;
  }
  +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char 
*path,

+ uint16_t queue, Error **errp)
+{
+    VirtIODevice *vdev;
+    VirtQueueStatus *status;
+
+    vdev = virtio_device_find(path);
+    if (vdev == NULL) {
+    error_setg(errp, "Path %s is not a VirtIO device", path);
+    return NULL;
+    }
+
+    if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, 
queue)) {

+    error_setg(errp, "Invalid virtqueue number %d", queue);
+    return NULL;
+    }
+
+    status = g_new0(VirtQueueStatus, 1);
+    status->device_type = qapi_enum_parse(_lookup, 
vdev->name,

+ VIRTIO_TYPE_UNKNOWN, NULL);
+    status->queue_index = vdev->vq[queue].queue_index;
+    status->inuse = vdev->vq[queue].inuse;
+    status->vring_num = vdev->vq[queue].vring.num;
+    status->vring_num_default = vdev->vq[queue].vring.num_default;
+    status->vring_align = vdev->vq[queue].vring.align;
+    status->vring_desc = vdev->vq[queue].vring.desc;
+    status->vring_avail = vdev->vq[queue].vring.avail;
+    status->vring_used = vdev->vq[queue].vring.used;
+    status->last_avail_idx = vdev->vq[queue].last_avail_idx;



As mentioned in previous versions, we need to add vhost support,
otherwise the value here is wrong.
Got it. I'll add a case to determine if vhost is active for a given 
device.
So, in the case that vhost is active, should I just not print out 
the value or would I substitute it with

another value (whatever that might be)?



You can query the vhost device for those indexes.

(vhost_get_vring_base())



  Same question for shadow_avail_idx below as well.



It's implementation specific. I think we can simply not show it
if vhost is enabled.


Thanks


Ah I see, thank you!

So, it appears to me that it's not very easy to get the struct 
vhost_dev pointer from struct VirtIODevice to indicate whether or not 
vhost is active, e.g. there's no virtio class-independent way to get 
struct vhost_dev.


I was thinking of adding an op/callback function to struct 
VirtioDeviceClass, e.g. bool has_vhost(VirtIODevice *vdev), and 
implement it for each virtio class (net, scsi, blk, etc.).


For example, for virtio-net, maybe it'd be something like:

bool has_vhost(VirtIODevice *vdev) {
   VirtIONet *n = VIRTIO_NET(vdev);
   NetClientState *nc = qemu_get_queue(n->nic);
   return nc->peer ? get_vhost_net(nc->peer) : false;
}



Something like this, yes.


Also, for getting the last_avail_idx, I was also thinking of adding 
another op/callback to struct VirtioDeviceClass, e.g. unsigned int 
get_last_avail_idx(VirtIODevice *vdev, unsigned int vq_idx) that 
finds if vhost is active or not and either gets last_avail_idx from 
virtio directly or through vhost (e.g. 
vhost_dev->vhost_ops->vhost_get_vring_base()).




So I think instead of has_vhost, we probably need get_vhost() to get a
pointer to the vhost_dev. Then we can do anything we want, rather than
having a dedicated interface just for the avail index.


Thanks
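A sketch of what such a hook could look like; the names are illustrative,
not a final API:

/* in VirtioDeviceClass: */
struct vhost_dev *(*get_vhost)(VirtIODevice *vdev);

/* a possible virtio-net implementation: */
static struct vhost_dev *virtio_net_get_vhost(VirtIODevice *vdev)
{
    VirtIONet *n = VIRTIO_NET(vdev);
    NetClientState *nc = qemu_get_queue(n->nic);
    VHostNetState *net = nc->peer ? get_vhost_net(nc->peer) : NULL;

    return net ? &net->dev : NULL;
}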


I wanted to run this by you and get your opinion on this before I 
started implementing it in code. Let me know what you think about this.



Jonah






Jonah




+    status->shadow_avail_idx = vdev->vq[queue].shadow_avail_idx;



The shadow index is implementation specific, e.g. in the case of vhost
it's kind of meaningless.



Re: [PATCH v6 4/6] qmp: add QMP command x-debug-virtio-queue-status

2021-07-22 Thread Jason Wang



On 2021/7/21 4:59 PM, Jonah Palmer wrote:



On 7/13/21 10:37 PM, Jason Wang wrote:


On 2021/7/12 6:35 PM, Jonah Palmer wrote:

From: Laurent Vivier 

This new command shows internal status of a VirtQueue.
(vrings and indexes).

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio-stub.c |   6 +++
  hw/virtio/virtio.c  |  37 ++
  qapi/virtio.json    | 102 


  3 files changed, 145 insertions(+)

  [Jonah: Added 'device-type' field to VirtQueueStatus and
  qmp command x-debug-virtio-queue-status.]

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index ddb592f..3c1bf17 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -17,3 +17,9 @@ VirtioStatus *qmp_x_debug_virtio_status(const 
char* path, Error **errp)

  {
  return qmp_virtio_unsupported(errp);
  }
+
+VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue, 
Error **errp)

+{
+    return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 81a0ee8..ccd4371 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3935,6 +3935,43 @@ static VirtIODevice *virtio_device_find(const 
char *path)

  return NULL;
  }
  +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue, 
Error **errp)

+{
+    VirtIODevice *vdev;
+    VirtQueueStatus *status;
+
+    vdev = virtio_device_find(path);
+    if (vdev == NULL) {
+    error_setg(errp, "Path %s is not a VirtIO device", path);
+    return NULL;
+    }
+
+    if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, 
queue)) {

+    error_setg(errp, "Invalid virtqueue number %d", queue);
+    return NULL;
+    }
+
+    status = g_new0(VirtQueueStatus, 1);
+    status->device_type = qapi_enum_parse(&VirtioType_lookup, vdev->name,
+ VIRTIO_TYPE_UNKNOWN, NULL);
+    status->queue_index = vdev->vq[queue].queue_index;
+    status->inuse = vdev->vq[queue].inuse;
+    status->vring_num = vdev->vq[queue].vring.num;
+    status->vring_num_default = vdev->vq[queue].vring.num_default;
+    status->vring_align = vdev->vq[queue].vring.align;
+    status->vring_desc = vdev->vq[queue].vring.desc;
+    status->vring_avail = vdev->vq[queue].vring.avail;
+    status->vring_used = vdev->vq[queue].vring.used;
+    status->last_avail_idx = vdev->vq[queue].last_avail_idx;



As mentioned in previous versions, we need to add vhost support,
otherwise the value here is wrong.

Got it. I'll add a case to determine if vhost is active for a given device.
So, in the case that vhost is active, should I just not print out the value or 
would I substitute it with
another value (whatever that might be)?



You can query the vhost device for those indexes.

(vhost_get_vring_base())



  Same question for shadow_avail_idx below as well.



It's implementation specific. I think we can simply not show it if
vhost is enabled.


Thanks




Jonah




+    status->shadow_avail_idx = vdev->vq[queue].shadow_avail_idx;



The shadow index is implementation specific, e.g. in the case of vhost
it's kind of meaningless.


Thanks



+    status->used_idx = vdev->vq[queue].used_idx;
+    status->signalled_used = vdev->vq[queue].signalled_used;
+    status->signalled_used_valid = 
vdev->vq[queue].signalled_used_valid;

+
+    return status;
+}
+
  #define CONVERT_FEATURES(type, map)    \
  ({   \
  type *list = NULL; \
diff --git a/qapi/virtio.json b/qapi/virtio.json
index 78873cd..7007e0c 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -406,3 +406,105 @@
    'data': { 'path': 'str' },
    'returns': 'VirtioStatus'
  }
+
+##
+# @VirtQueueStatus:
+#
+# Status of a VirtQueue
+#
+# @device-type: VirtIO device type
+#
+# @queue-index: VirtQueue queue_index
+#
+# @inuse: VirtQueue inuse
+#
+# @vring-num: VirtQueue vring.num
+#
+# @vring-num-default: VirtQueue vring.num_default
+#
+# @vring-align: VirtQueue vring.align
+#
+# @vring-desc: VirtQueue vring.desc
+#
+# @vring-avail: VirtQueue vring.avail
+#
+# @vring-used: VirtQueue vring.used
+#
+# @last-avail-idx: VirtQueue last_avail_idx
+#
+# @shadow-avail-idx: VirtQueue shadow_avail_idx
+#
+# @used-idx: VirtQueue used_idx
+#
+# @signalled-used: VirtQueue signalled_used
+#
+# @signalled-used-valid: VirtQueue signalled_used_valid
+#
+# Since: 6.1
+#
+##
+
+{ 'struct': 'VirtQueueStatus',
+  'data': {
+    'device-type': 'VirtioType',
+    'queue-index': 'uint16',
+    'inuse': 'uint32',
+    'vring-num': 'int',
+    'vring-num-default': 'int',
+    'vring-align': 'int',
+    'vring-desc': 'uint64',
+    'vring-avail': 'uint64',
+    'vring-used': 'uint64',
+    'last-avail-i

Re: [PATCH v6 6/6] hmp: add virtio commands

2021-07-22 Thread Jason Wang



On 2021/7/21 5:11 PM, Jonah Palmer wrote:



On 7/13/21 10:40 PM, Jason Wang wrote:


On 2021/7/12 6:35 PM, Jonah Palmer wrote:

+void hmp_virtio_queue_status(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *path = qdict_get_try_str(qdict, "path");
+    int queue = qdict_get_int(qdict, "queue");
+    VirtQueueStatus *s = qmp_x_debug_virtio_queue_status(path, 
queue, );

+
+    if (err != NULL) {
+    hmp_handle_error(mon, err);
+    return;
+    }
+
+    monitor_printf(mon, "%s:\n", path);
+    monitor_printf(mon, "  device_type:  %s\n",
+   VirtioType_str(s->device_type));
+    monitor_printf(mon, "  index:    %d\n", 
s->queue_index);

+    monitor_printf(mon, "  inuse:    %d\n", s->inuse);
+    monitor_printf(mon, "  last_avail_idx:   %d (%"PRId64" %% 
%"PRId64")\n",
+   s->last_avail_idx, s->last_avail_idx % 
s->vring_num,

+   s->vring_num);
+    monitor_printf(mon, "  shadow_avail_idx: %d (%"PRId64" %% 
%"PRId64")\n",
+   s->shadow_avail_idx, s->shadow_avail_idx % 
s->vring_num,

+   s->vring_num);
+    monitor_printf(mon, "  used_idx: %d (%"PRId64" %% 
%"PRId64")\n",
+   s->used_idx, s->used_idx % s->vring_num, 
s->vring_num);



The modulo information does not apply to the packed ring, where the
queue size does not have to be a power of 2.

Doesn't modulo work for any integer, regardless of whether it's a power of 2?
Could you clarify this for me?



For the packed ring, the index doesn't increase freely; it's always
smaller than the virtqueue size.


So showing the modulo arithmetic seems useless since the device or 
driver doesn't use modulo for calculating the real offset.


Thanks
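So the printing could be made ring-type aware, roughly like this (a sketch;
it assumes the handler can tell whether the ring is packed, e.g. via the
negotiated feature bit):

if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
    /* packed ring: the index already wraps below vring.num */
    monitor_printf(mon, "  last_avail_idx:   %d\n", s->last_avail_idx);
} else {
    monitor_printf(mon, "  last_avail_idx:   %d (%"PRId64" %% %"PRId64")\n",
                   s->last_avail_idx, s->last_avail_idx % s->vring_num,
                   s->vring_num);
}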




Thank you,





Re: [PATCH v6 0/6] hmp, qmp: Add some commands to introspect virtio devices

2021-07-22 Thread Jason Wang



On 2021/7/21 4:53 PM, Jonah Palmer wrote:


Hi Jason. My apologies for the delayed response; several work-related 
things came up recently, but they're slowing down now, so I'm turning 
my attention to getting these patches taken care of.


A few questions and comments below (and in other following patches):


On 7/13/21 10:42 PM, Jason Wang wrote:


On 2021/7/12 6:35 PM, Jonah Palmer wrote:
 Dump the information of the head element of the third queue 
of virtio-scsi:


 (qemu) virtio queue-element 
/machine/peripheral-anon/device[3]/virtio-backend 3

 index: 122
 ndescs: 3
 descs: addr 0x7302d000 len 4096 (write), addr 0x3c951763 
len 108 (write, next),

    addr 0x3c951728 len 51 (next)



I think it would be nice if we can show driver area and device area 
as well here.

Sure thing. And I apologize if it's obvious (I'm relatively new to virtio), but 
how can I expose the driver area?



So the spec defines three parts: the device area, the driver area, and 
the descriptor area. And they are all located in the guest memory.




I understand that virtio devices are part of the QEMU process, but I also 
thought that virtio drivers are in the guest's kernel, which I don't believe 
I can see into from QEMU (or, at least, it's not obvious to me).
it's not obvious to me).



It works the same way as accessing the descriptor ring (descriptor area).

Thanks
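For the split ring, that amounts to reading the avail ring header straight
from guest memory at vring.avail, e.g. (a sketch; endianness helpers as in
virtio-access.h, IOMMU details elided):

hwaddr avail = vq->vring.avail;
uint16_t flags = virtio_lduw_phys(vdev, avail);     /* avail->flags */
uint16_t idx   = virtio_lduw_phys(vdev, avail + 2); /* avail->idx   */

monitor_printf(mon, "avail.flags: 0x%x  avail.idx: %u\n", flags, idx);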




Jonah


Thanks






Re: [PATCH v6 0/6] hmp, qmp: Add some commands to introspect virtio devices

2021-07-13 Thread Jason Wang



On 2021/7/12 6:35 PM, Jonah Palmer wrote:

 Dump the information of the head element of the third queue of 
virtio-scsi:

 (qemu) virtio queue-element 
/machine/peripheral-anon/device[3]/virtio-backend 3
 index: 122
 ndescs: 3
 descs: addr 0x7302d000 len 4096 (write), addr 0x3c951763 len 108 
(write, next),
addr 0x3c951728 len 51 (next)



I think it would be nice if we can show driver area and device area as 
well here.


Thanks




Re: [PATCH v6 6/6] hmp: add virtio commands

2021-07-13 Thread Jason Wang



On 2021/7/12 6:35 PM, Jonah Palmer wrote:

+void hmp_virtio_queue_status(Monitor *mon, const QDict *qdict)
+{
+Error *err = NULL;
+const char *path = qdict_get_try_str(qdict, "path");
+int queue = qdict_get_int(qdict, "queue");
+VirtQueueStatus *s = qmp_x_debug_virtio_queue_status(path, queue, );
+
+if (err != NULL) {
+hmp_handle_error(mon, err);
+return;
+}
+
+monitor_printf(mon, "%s:\n", path);
+monitor_printf(mon, "  device_type:  %s\n",
+   VirtioType_str(s->device_type));
+monitor_printf(mon, "  index:%d\n", s->queue_index);
+monitor_printf(mon, "  inuse:%d\n", s->inuse);
+monitor_printf(mon, "  last_avail_idx:   %d (%"PRId64" %% 
%"PRId64")\n",
+   s->last_avail_idx, s->last_avail_idx % s->vring_num,
+   s->vring_num);
+monitor_printf(mon, "  shadow_avail_idx: %d (%"PRId64" %% 
%"PRId64")\n",
+   s->shadow_avail_idx, s->shadow_avail_idx % s->vring_num,
+   s->vring_num);
+monitor_printf(mon, "  used_idx: %d (%"PRId64" %% 
%"PRId64")\n",
+   s->used_idx, s->used_idx % s->vring_num, s->vring_num);



The modulo information does not apply to the packed ring, where the queue
size does not have to be a power of 2.


Thanks



+monitor_printf(mon, "  signalled_used:   %d (%"PRId64" %% 
%"PRId64")\n",
+   s->signalled_used, s->signalled_used % s->vring_num,
+   s->vring_num);
+monitor_printf(mon, "  signalled_used_valid: %d\n", 
s->signalled_used_valid);
+monitor_printf(mon, "  VRing:\n");
+monitor_printf(mon, "num: %"PRId64"\n", s->vring_num);
+monitor_printf(mon, "num_default: %"PRId64"\n", s->vring_num_default);
+monitor_printf(mon, "align:   %"PRId64"\n", s->vring_align);
+monitor_printf(mon, "desc:0x%016"PRIx64"\n", s->vring_desc);
+monitor_printf(mon, "avail:   0x%016"PRIx64"\n", s->vring_avail);
+monitor_printf(mon, "used:0x%016"PRIx64"\n", s->vring_used);
+
+qapi_free_VirtQueueStatus(s);





Re: [PATCH v6 4/6] qmp: add QMP command x-debug-virtio-queue-status

2021-07-13 Thread Jason Wang



On 2021/7/12 6:35 PM, Jonah Palmer wrote:

From: Laurent Vivier 

This new command shows internal status of a VirtQueue.
(vrings and indexes).

Signed-off-by: Laurent Vivier 
Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio-stub.c |   6 +++
  hw/virtio/virtio.c  |  37 ++
  qapi/virtio.json| 102 
  3 files changed, 145 insertions(+)

  [Jonah: Added 'device-type' field to VirtQueueStatus and
  qmp command x-debug-virtio-queue-status.]

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index ddb592f..3c1bf17 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -17,3 +17,9 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* path, 
Error **errp)
  {
  return qmp_virtio_unsupported(errp);
  }
+
+VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue, Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 81a0ee8..ccd4371 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3935,6 +3935,43 @@ static VirtIODevice *virtio_device_find(const char *path)
  return NULL;
  }
  
+VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,

+ uint16_t queue, Error **errp)
+{
+VirtIODevice *vdev;
+VirtQueueStatus *status;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIO device", path);
+return NULL;
+}
+
+if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) {
+error_setg(errp, "Invalid virtqueue number %d", queue);
+return NULL;
+}
+
+status = g_new0(VirtQueueStatus, 1);
+status->device_type = qapi_enum_parse(&VirtioType_lookup, vdev->name,
+  VIRTIO_TYPE_UNKNOWN, NULL);
+status->queue_index = vdev->vq[queue].queue_index;
+status->inuse = vdev->vq[queue].inuse;
+status->vring_num = vdev->vq[queue].vring.num;
+status->vring_num_default = vdev->vq[queue].vring.num_default;
+status->vring_align = vdev->vq[queue].vring.align;
+status->vring_desc = vdev->vq[queue].vring.desc;
+status->vring_avail = vdev->vq[queue].vring.avail;
+status->vring_used = vdev->vq[queue].vring.used;
+status->last_avail_idx = vdev->vq[queue].last_avail_idx;



As mentioned in previous versions, we need to add vhost support,
otherwise the value here is wrong.




+status->shadow_avail_idx = vdev->vq[queue].shadow_avail_idx;



The shadow index is implementation specific, e.g. in the case of vhost
it's kind of meaningless.


Thanks



+status->used_idx = vdev->vq[queue].used_idx;
+status->signalled_used = vdev->vq[queue].signalled_used;
+status->signalled_used_valid = vdev->vq[queue].signalled_used_valid;
+
+return status;
+}
+
  #define CONVERT_FEATURES(type, map)\
  ({   \
  type *list = NULL; \
diff --git a/qapi/virtio.json b/qapi/virtio.json
index 78873cd..7007e0c 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -406,3 +406,105 @@
'data': { 'path': 'str' },
'returns': 'VirtioStatus'
  }
+
+##
+# @VirtQueueStatus:
+#
+# Status of a VirtQueue
+#
+# @device-type: VirtIO device type
+#
+# @queue-index: VirtQueue queue_index
+#
+# @inuse: VirtQueue inuse
+#
+# @vring-num: VirtQueue vring.num
+#
+# @vring-num-default: VirtQueue vring.num_default
+#
+# @vring-align: VirtQueue vring.align
+#
+# @vring-desc: VirtQueue vring.desc
+#
+# @vring-avail: VirtQueue vring.avail
+#
+# @vring-used: VirtQueue vring.used
+#
+# @last-avail-idx: VirtQueue last_avail_idx
+#
+# @shadow-avail-idx: VirtQueue shadow_avail_idx
+#
+# @used-idx: VirtQueue used_idx
+#
+# @signalled-used: VirtQueue signalled_used
+#
+# @signalled-used-valid: VirtQueue signalled_used_valid
+#
+# Since: 6.1
+#
+##
+
+{ 'struct': 'VirtQueueStatus',
+  'data': {
+'device-type': 'VirtioType',
+'queue-index': 'uint16',
+'inuse': 'uint32',
+'vring-num': 'int',
+'vring-num-default': 'int',
+'vring-align': 'int',
+'vring-desc': 'uint64',
+'vring-avail': 'uint64',
+'vring-used': 'uint64',
+'last-avail-idx': 'uint16',
+'shadow-avail-idx': 'uint16',
+'used-idx': 'uint16',
+'signalled-used': 'uint16',
+'signalled-used-valid': 'uint16'
+  }
+}
+
+##
+# @x-debug-virtio-queue-status:
+#
+# Return the status of a given VirtQueue
+#
+# @path: QOBject path of the VirtIODevice
+#
+# @queue: queue number to examine
+#
+# Returns: Status of the VirtQueue
+#
+# Since: 6.1
+#
+# Example:
+#
+# -> { "execute": "x-debug-virtio-queue-status",
+#  "arguments": {
+#  "path": "/machine/peripheral-anon/device[3]/virtio-backend",
+#  "queue": 0
+#  }
+#   }
+# <- { "return": {
+#  

Re: [PATCH 3/3] virtio-net: Constify VirtIOFeature feature_sizes[]

2021-05-11 Thread Jason Wang



On 2021/5/11 6:41 PM, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/net/virtio-net.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 66b9ff45118..6b7e8dd04ef 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -89,7 +89,7 @@
   VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
   VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
  
-static VirtIOFeature feature_sizes[] = {

+static const VirtIOFeature feature_sizes[] = {
  {.flags = 1ULL << VIRTIO_NET_F_MAC,
   .end = endof(struct virtio_net_config, mac)},
  {.flags = 1ULL << VIRTIO_NET_F_STATUS,



Acked-by: Jason Wang 





Re: [PATCH 2/3] virtio-blk: Constify VirtIOFeature feature_sizes[]

2021-05-11 Thread Jason Wang



On 2021/5/11 6:41 PM, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/block/virtio-blk.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index d28979efb8d..f139cd7cc9c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -40,7 +40,7 @@
   * Starting from the discard feature, we can use this array to properly
   * set the config size depending on the features enabled.
   */
-static VirtIOFeature feature_sizes[] = {
+static const VirtIOFeature feature_sizes[] = {
  {.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
   .end = endof(struct virtio_blk_config, discard_sector_alignment)},
  {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,



Acked-by: Jason Wang 





Re: [PATCH 1/3] hw/virtio: Pass virtio_feature_get_config_size() a const argument

2021-05-11 Thread Jason Wang



On 2021/5/11 6:41 PM, Philippe Mathieu-Daudé wrote:

The VirtIOFeature structure isn't modified, mark it const.

Signed-off-by: Philippe Mathieu-Daudé 



Acked-by: Jason Wang 



---
  include/hw/virtio/virtio.h | 2 +-
  hw/virtio/virtio.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b7ece7a6a89..8bab9cfb750 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -43,7 +43,7 @@ typedef struct VirtIOFeature {
  size_t end;
  } VirtIOFeature;
  
-size_t virtio_feature_get_config_size(VirtIOFeature *features,

+size_t virtio_feature_get_config_size(const VirtIOFeature *features,
uint64_t host_features);
  
  typedef struct VirtQueue VirtQueue;

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9e13cb9e3ad..e02544b2df7 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2981,7 +2981,7 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
  return ret;
  }
  
-size_t virtio_feature_get_config_size(VirtIOFeature *feature_sizes,

+size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes,
uint64_t host_features)
  {
  size_t config_size = 0;





Re: [PATCH 15/23] net: Avoid dynamic stack allocation

2021-05-06 Thread Jason Wang



On 2021/5/6 5:10 AM, Philippe Mathieu-Daudé wrote:

Use autofree heap allocation instead of variable-length
array on the stack.

Signed-off-by: Philippe Mathieu-Daudé 



Acked-by: Jason Wang 



---
  hw/net/fsl_etsec/rings.c  | 9 -
  hw/net/rocker/rocker_of_dpa.c | 2 +-
  net/dump.c| 2 +-
  net/tap.c | 2 +-
  4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c
index 8f084464155..1abdcb5a29c 100644
--- a/hw/net/fsl_etsec/rings.c
+++ b/hw/net/fsl_etsec/rings.c
@@ -381,8 +381,6 @@ static void fill_rx_bd(eTSEC  *etsec,
  uint16_t to_write;
  hwaddr   bufptr = bd->bufptr +
  ((hwaddr)(etsec->regs[TBDBPH].value & 0xF) << 32);
-uint8_t  padd[etsec->rx_padding];
-uint8_t  rem;
  
  RING_DEBUG("eTSEC fill Rx buffer @ 0x%016" HWADDR_PRIx

 " size:%zu(padding + crc:%u) + fcb:%u\n",
@@ -423,11 +421,12 @@ static void fill_rx_bd(eTSEC  *etsec,
  /* The remaining bytes are only for padding which is not actually
   * allocated in the data buffer.
   */
-
-rem = MIN(etsec->regs[MRBLR].value - bd->length, etsec->rx_padding);
+uint8_t  rem = MIN(etsec->regs[MRBLR].value - bd->length,
+   etsec->rx_padding);
  
  if (rem > 0) {

-memset(padd, 0x0, sizeof(padd));
+g_autofree uint8_t *padd = g_malloc0(etsec->rx_padding);
+
  etsec->rx_padding -= rem;
  *size -= rem;
  bd->length+= rem;
diff --git a/hw/net/rocker/rocker_of_dpa.c b/hw/net/rocker/rocker_of_dpa.c
index b3b8c5bb6d4..3e400ceaef7 100644
--- a/hw/net/rocker/rocker_of_dpa.c
+++ b/hw/net/rocker/rocker_of_dpa.c
@@ -1043,7 +1043,7 @@ static void of_dpa_flow_ig_tbl(OfDpaFlowContext *fc, 
uint32_t tbl_id)
  static ssize_t of_dpa_ig(World *world, uint32_t pport,
   const struct iovec *iov, int iovcnt)
  {
-struct iovec iov_copy[iovcnt + 2];
+g_autofree struct iovec *iov_copy = g_new(struct iovec, iovcnt + 2);
  OfDpaFlowContext fc = {
  .of_dpa = world_private(world),
  .in_pport = pport,
diff --git a/net/dump.c b/net/dump.c
index 4d538d82a69..b830302e27d 100644
--- a/net/dump.c
+++ b/net/dump.c
@@ -68,7 +68,7 @@ static ssize_t dump_receive_iov(DumpState *s, const struct 
iovec *iov, int cnt)
  int64_t ts;
  int caplen;
  size_t size = iov_size(iov, cnt);
-struct iovec dumpiov[cnt + 1];
+g_autofree struct iovec *dumpiov = g_new(struct iovec, cnt + 1);
  
  /* Early return in case of previous error. */

  if (s->fd < 0) {
diff --git a/net/tap.c b/net/tap.c
index bae895e2874..2b9ed8a2cd8 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -120,7 +120,7 @@ static ssize_t tap_receive_iov(NetClientState *nc, const 
struct iovec *iov,
  {
  TAPState *s = DO_UPCAST(TAPState, nc, nc);
  const struct iovec *iovp = iov;
-struct iovec iov_copy[iovcnt + 1];
+g_autofree struct iovec *iov_copy = g_new(struct iovec, iovcnt + 1);
  struct virtio_net_hdr_mrg_rxbuf hdr = { };
  
  if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {





Re: [PATCH 09/23] hw/net/e1000e_core: Use definition to avoid dynamic stack allocation

2021-05-05 Thread Jason Wang



On 2021/5/6 5:10 AM, Philippe Mathieu-Daudé wrote:

The compiler isn't clever enough to figure out that 'min_buf_size'
is a constant, so help it by using a definition instead.

Signed-off-by: Philippe Mathieu-Daudé 



Acked-by: Jason Wang 



---
  hw/net/e1000e_core.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index b75f2ab8fc1..4b1d4521a50 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1621,15 +1621,16 @@ e1000e_rx_fix_l4_csum(E1000ECore *core, struct NetRxPkt 
*pkt)
  }
  }
  
+/* Min. octets in an ethernet frame sans FCS */

+#define MIN_BUF_SIZE 60
+
  ssize_t
  e1000e_receive_iov(E1000ECore *core, const struct iovec *iov, int iovcnt)
  {
  static const int maximum_ethernet_hdr_len = (14 + 4);
-/* Min. octets in an ethernet frame sans FCS */
-static const int min_buf_size = 60;
  
  uint32_t n = 0;

-uint8_t min_buf[min_buf_size];
+uint8_t min_buf[MIN_BUF_SIZE];
  struct iovec min_iov;
  uint8_t *filter_buf;
  size_t size, orig_size;





Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-08 Thread Jason Wang



On 2021/4/8 1:51 PM, Dongli Zhang wrote:


On 4/6/21 7:20 PM, Jason Wang wrote:

On 2021/4/7 7:27 AM, Dongli Zhang wrote:

This will answer your question of "Can it bypass the masking?".

For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, writing to the eventfd
cannot bypass masking, because masking unregisters the eventfd, so the
write does not take effect.

However, it is possible to bypass masking for vhost-net because vhost-net
registered a specific masked_notifier eventfd in order to mask the irq. A write
to the original eventfd still takes effect.

We may leave it to the user to decide whether to write to 'masked_notifier' or
the original 'guest_notifier' for vhost-net.

My fault here. A write to masked_notifier will always be masked :(


Only when there's no bug in QEMU.



If it is EventNotifier level, we will not care whether the EventNotifier is
masked or not. It just provides an interface to write to EventNotifier.


Yes.
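For illustration, writing at the EventNotifier level reduces to something
like this sketch, assuming the virtqueue has already been looked up:

EventNotifier *call = virtio_queue_get_guest_notifier(vq);
EventNotifier *kick = virtio_queue_get_host_notifier(vq);

event_notifier_set(call);   /* "call": inject an interrupt towards the guest */
event_notifier_set(kick);   /* "kick": ring the doorbell towards the backend */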



Dumping the MSI-X table for both virtio and vfio will help confirm whether
the vector is masked.


That would be helpful as well. It's probably better to extend the
"info pci" command.

Thanks

I will try to add it to "info pci" (introduce a new arg option to "info pci"),
or to introduce a new command.



It's better to just reuse "info pci".




About the EventNotifier, I will classify them as guest notifier or host notifier
so that it will be much easier for the user to tell if the eventfd is for
injecting an IRQ or kicking the doorbell.



Sounds good.




Thank you very much for all suggestions!

Dongli Zhang



Thanks





Thank you very much!

Dongli Zhang






Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Jason Wang



On 2021/4/7 7:27 AM, Dongli Zhang wrote:

This will answer your question "Can it bypass the masking?".

For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, writing to the
eventfd cannot bypass masking, because masking unregisters the eventfd, so
the write does not take effect.

However, it is possible to bypass masking for vhost-net, because vhost-net
registered a specific masked_notifier eventfd in order to mask the irq, so a
write to the original eventfd still takes effect.

We may leave it to the user to decide whether to write to 'masked_notifier'
or the original 'guest_notifier' for vhost-net.

My fault here. A write to masked_notifier will always be masked :(



Only when there's no bug in QEMU.




If it is at the EventNotifier level, we will not care whether the EventNotifier
is masked or not; it just provides an interface to write to the EventNotifier.



Yes.




Dumping the MSI-X table for both virtio and vfio will help confirm whether the
vector is masked.



That would be helpful as well. It's probably better to extend the "info pci" 
command.


Thanks




Thank you very much!

Dongli Zhang






Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Jason Wang



On 2021/4/6 4:43 PM, Dongli Zhang wrote:


On 4/5/21 6:55 PM, Jason Wang wrote:

On 2021/4/6 4:00 AM, Dongli Zhang wrote:

On 4/1/21 8:47 PM, Jason Wang wrote:

On 2021/3/30 3:29 PM, Dongli Zhang wrote:

On 3/28/21 8:56 PM, Jason Wang wrote:

On 2021/3/27 5:16 AM, Dongli Zhang wrote:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

On 2021/3/26 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html




... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device
(e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
   kref = {
     refcount = {
   refs = {
     counter = 4
   }
     }
   },
   wqh = {
     lock = {
   {
     rlock = {
   raw_lock = {
     val = {
   counter = 0
     }
   }
     }
   }
     },
     head = {
   next = 0x8f841dc08e18,
   prev = 0x8f841dc08e18
     }
   },
   count = 15, ---> eventfd is 15 !!!
   flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
   "arguments": { "dev": "/machine/peripheral/vscsi0",
  "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved; it
indicates something bad in the software caused the 'kick' to be lost.

What do you mean by "software" here? And it looks to me you're testing
whether event_notifier_set() is called by virtio_queue_notify() here. If so,
I'm not sure how much value we could gain from a dedicated debug interface
like this, considering there are a lot of existing general-purpose debugging
methods like tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the process of a
virtqueue kick, which is unlikely to be buggy. Considering the ioeventfd will
usually be offloaded to KVM, it's more likely that something is wrong in
setting up the ioeventfd instead of here. IRQ is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with the virtio/vhost PV driver frontend. E.g., while the VM is
waiting for an extra IRQ, the vhost side did not trigger the IRQ; suppose
vring_need_event() erroneously returns false due to a corrupted ring buffer
status.

So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

This is not only about whether notification is blocked.

It can also be used to help narrow down and understand if there is any
suspicious issue in blk-mq/scsi/netdev/napi code. The PV drivers are not only
drivers following the virtio spec; they are closely related to many other
kernel components.

Suppose IO recovers after we inject an IRQ into vhost-scsi on purpose: we
will be able to analyze what may happen along the IO completion path starting
from when/where the IRQ is injected ... perhaps the root cause is not in
virtio but in blk-mq/scsi (this is just an example).


In addition, this idea should help for vfio-pci. Suppose the developer for a
specific device driver suspects IO/networking hangs because of loss of IRQ, we
will be able to verify if that assumption is correct by injecting an IRQ on
purpose.

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-05 Thread Jason Wang



On 2021/4/6 4:00 AM, Dongli Zhang wrote:


On 4/1/21 8:47 PM, Jason Wang wrote:

On 2021/3/30 3:29 PM, Dongli Zhang wrote:

On 3/28/21 8:56 PM, Jason Wang wrote:

On 2021/3/27 5:16 AM, Dongli Zhang wrote:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

On 2021/3/26 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html



... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = {
    refcount = {
  refs = {
    counter = 4
  }
    }
  },
  wqh = {
    lock = {
  {
    rlock = {
  raw_lock = {
    val = {
  counter = 0
    }
  }
    }
  }
    },
    head = {
  next = 0x8f841dc08e18,
  prev = 0x8f841dc08e18
    }
  },
  count = 15, ---> eventfd is 15 !!!
  flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
     "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved; it
indicates something bad in the software caused the 'kick' to be lost.

What do you mean by "software" here? And it looks to me you're testing
whether event_notifier_set() is called by virtio_queue_notify() here. If so,
I'm not sure how much value we could gain from a dedicated debug interface
like this, considering there are a lot of existing general-purpose debugging
methods like tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the process of a
virtqueue kick, which is unlikely to be buggy. Considering the ioeventfd will
usually be offloaded to KVM, it's more likely that something is wrong in
setting up the ioeventfd instead of here. IRQ is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with the virtio/vhost PV driver frontend. E.g., while the VM is
waiting for an extra IRQ, the vhost side did not trigger the IRQ; suppose
vring_need_event() erroneously returns false due to a corrupted ring buffer
status.

So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

This is not only about whether notification is blocked.

It can also be used to help narrow down and understand if there is any
suspicious issue in blk-mq/scsi/netdev/napi code. The PV drivers are not only
drivers following the virtio spec; they are closely related to many other
kernel components.

Suppose IO recovers after we inject an IRQ into vhost-scsi on purpose: we
will be able to analyze what may happen along the IO completion path starting
from when/where the IRQ is injected ... perhaps the root cause is not in
virtio but in blk-mq/scsi (this is just an example).


In addition, this idea should help for vfio-pci. Suppose the developer for a
specific device driver suspects IO/networking hangs because of loss of IRQ, we
will be able to verify if that assumption is correct by injecting an IRQ on
purpose.

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-01 Thread Jason Wang



On 2021/3/30 3:29 PM, Dongli Zhang wrote:


On 3/28/21 8:56 PM, Jason Wang wrote:

On 2021/3/27 5:16 AM, Dongli Zhang wrote:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

On 2021/3/26 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html


... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
     kref = {
   refcount = {
     refs = {
   counter = 4
     }
   }
     },
     wqh = {
   lock = {
     {
   rlock = {
     raw_lock = {
   val = {
     counter = 0
   }
     }
   }
     }
   },
   head = {
     next = 0x8f841dc08e18,
     prev = 0x8f841dc08e18
   }
     },
     count = 15, ---> eventfd is 15 !!!
     flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
     "arguments": { "dev": "/machine/peripheral/vscsi0",
    "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved; it
indicates something bad in the software caused the 'kick' to be lost.

What do you mean by "software" here? And it looks to me you're testing
whether event_notifier_set() is called by virtio_queue_notify() here. If so,
I'm not sure how much value we could gain from a dedicated debug interface
like this, considering there are a lot of existing general-purpose debugging
methods like tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the process of a
virtqueue kick, which is unlikely to be buggy. Considering the ioeventfd will
usually be offloaded to KVM, it's more likely that something is wrong in
setting up the ioeventfd instead of here. IRQ is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by 
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with the virtio/vhost PV driver frontend. E.g., while the VM is
waiting for an extra IRQ, the vhost side did not trigger the IRQ; suppose
vring_need_event() erroneously returns false due to a corrupted ring buffer
status.


So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

This is not only about whether notification is blocked.

It can also be used to help narrow down and understand if there is any
suspicious issue in blk-mq/scsi/netdev/napi code. The PV drivers are not only
drivers following the virtio spec; they are closely related to many other
kernel components.

Suppose IO recovers after we inject an IRQ into vhost-scsi on purpose: we
will be able to analyze what may happen along the IO completion path starting
from when/where the IRQ is injected ... perhaps the root cause is not in
virtio but in blk-mq/scsi (this is just an example).


In addition, this idea should help for vfio-pci. Suppose the developer for a
specific device driver suspects IO/networking hangs because of loss of IRQ, we
will be able to verify if that assumption is correct by injecting an IRQ on 
purpose.

T

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-28 Thread Jason Wang



On 2021/3/27 5:16 AM, Dongli Zhang wrote:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

On 2021/3/26 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
    kref = {
  refcount = {
    refs = {
  counter = 4
    }
  }
    },
    wqh = {
  lock = {
    {
  rlock = {
    raw_lock = {
  val = {
    counter = 0
  }
    }
  }
    }
  },
  head = {
    next = 0x8f841dc08e18,
    prev = 0x8f841dc08e18
  }
    },
    count = 15, ---> eventfd is 15 !!!
    flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
    "arguments": { "dev": "/machine/peripheral/vscsi0",
   "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved; it
indicates something bad in the software caused the 'kick' to be lost.

What do you mean by "software" here? And it looks to me you're testing
whether event_notifier_set() is called by virtio_queue_notify() here. If so,
I'm not sure how much value we could gain from a dedicated debug interface
like this, considering there are a lot of existing general-purpose debugging
methods like tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the process of a
virtqueue kick, which is unlikely to be buggy. Considering the ioeventfd will
usually be offloaded to KVM, it's more likely that something is wrong in
setting up the ioeventfd instead of here. IRQ is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by 
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with the virtio/vhost PV driver frontend. E.g., while the VM is
waiting for an extra IRQ, the vhost side did not trigger the IRQ; suppose
vring_need_event() erroneously returns false due to a corrupted ring buffer
status.



So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

Considering we want to debug a virtio issue, only 2) or 3) are what we really 
care about.


So for kick you did (assume vhost is on):

virtio_device_event_kick()
    virtio_queue_notify()
        event_notifier_set()

It looks to me you're actually testing if the ioeventfd is correctly set by 
QEMU.


For call you did:

virtio_device_event_call()
    event_notifier_set()

A test that the irqfd is correctly set by QEMU. So all of those are not 
virtio-specific, but you introduce a virtio-specific command to debug 
non-virtio functions.
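
For context on why both paths bottom out in the same primitive: an eventfd kick or call is just an 8-byte write that increments the kernel-side counter (the 'count' field seen in the crash dumps above). A simplified sketch of what event_notifier_set() boils down to (not QEMU's exact code):

#include <stdint.h>
#include <unistd.h>

/* increment the eventfd counter by one; the consumer (KVM for an
 * ioeventfd, the irq path for an irqfd) picks the event up from there */
static int notifier_set(int efd)
{
    uint64_t value = 1;
    ssize_t n = write(efd, &value, sizeof(value));
    return n == sizeof(value) ? 0 : -1;
}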


In the case of what you mentioned for vring_need_event(), what we really 
want is to dump the virtqueue state from the guest. This might require 
some work to extend the virtio spec (e.g. to dump device status like 
indices, event, wrap counters).




This was initially proposed for vhost only and I was going to export
ioeventfd/irqfd from vhost to admin via sysfs. Finally, I realized I would
better impl

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-26 Thread Jason Wang



On 2021/3/26 1:44 PM, Dongli Zhang wrote:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
   kref = {
 refcount = {
   refs = {
 counter = 4
   }
 }
   },
   wqh = {
 lock = {
   {
 rlock = {
   raw_lock = {
 val = {
   counter = 0
 }
   }
 }
   }
 },
 head = {
   next = 0x8f841dc08e18,
   prev = 0x8f841dc08e18
 }
   },
   count = 15, ---> eventfd is 15 !!!
   flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
   "arguments": { "dev": "/machine/peripheral/vscsi0",
  "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved; it
indicates something bad in the software caused the 'kick' to be lost.



What do you mean by "software" here? And it looks to me you're testing 
whether event_notifier_set() is called by virtio_queue_notify() here. If 
so, I'm not sure how much value we could gain from a dedicated debug 
interface like this, considering there are a lot of existing general-purpose 
debugging methods like tracing or gdb. I'd say the path from 
virtio_queue_notify() to event_notifier_set() is only a very small 
fraction of the process of a virtqueue kick, which is unlikely to be buggy. 
Considering the ioeventfd will usually be offloaded to KVM, it's more 
likely that something is wrong in setting up the ioeventfd instead of here. 
IRQ is even more complicated.


I don't think we would gain much from introducing a dedicated mechanism 
for such a corner case.


Thanks




crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
   kref = {
 refcount = {
   refs = {
 counter = 4
   }
 }
   },
   wqh = {
 lock = {
   {
 rlock = {
   raw_lock = {
 val = {
   counter = 0
 }
   }
 }
   }
 },
 head = {
   next = 0x8f841dc08e18,
   prev = 0x8f841dc08e18
 }
   },
   count = 16, ---> eventfd incremented to 16 !!!
   flags = 526336
}


Original RFC link:

https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg03441.html

Changed since RFC:
   - add support for more virtio/vhost pci devices
   - add log (toggled by DEBUG_VIRTIO_EVENT) to virtio.c to say that this
 mischievous command had been used
   - fix grammar error (s/lost/loss/)
   - change version to 6.1
   - fix incorrect example in qapi/qdev.json
   - manage event types with enum/array, instead of hard coding


Dongli Zhang (6):
qdev: introduce qapi/hmp command for kick/call event
virtio: introduce helper function for kick/call device event
virtio-blk-pci: implement device event interface for kick/call
virtio-scsi-pci: implement device event interface for kick/call
vhost-scsi-pci: implement device event interface for kick/call
virtio-net-pci: implement device event interface for kick/call

  hmp-commands.hx | 14 
  hw/block/virtio-blk.c   |  9 +
  hw/net/virtio-net.c |  9 +
  hw/scsi/vhost-scsi.c|  6 
  hw/scsi/virtio-scsi.c   |  9 +
  hw/virtio/vhost-scsi-pci.c  | 10 ++
  hw/virtio/virtio-blk-pci.c  | 10 ++
  hw/virtio/virtio-net-pci.c  | 10 ++
  hw/virtio/virtio-scsi-pci.c | 10 ++
  hw/virtio/virtio.c

Re: [RFC PATCH 00/12] hw: Forbid DMA write accesses to MMIO regions

2020-09-03 Thread Jason Wang



On 2020/9/4 上午3:46, Edgar E. Iglesias wrote:

On Thu, Sep 03, 2020 at 07:53:33PM +0200, Paolo Bonzini wrote:

On 03/09/20 17:50, Edgar E. Iglesias wrote:

Hmm, I guess it would make sense to have a configurable option in KVM
to isolate passthrough devices so they can only DMA to guest RAM...

Passthrough devices are always protected by the IOMMU, anything else
would be obviously insane^H^H^Hecure. :)

Really? To always do that blindly seems wrong.

I'm referring to the passthrough device not being able to reach registers
of other passthrough devices within the same guest.

Ah okay; sorry, I misunderstood.  That makes more sense now!

Multiple devices are put in the same IOMMU "container" (page table
basically), and that takes care of reaching registers of other
passthrough devices.

Thanks, yes, that's a sane default. What I was trying to say before is that
it may make sense to allow the user to "harden" the setup by selectively
putting certain passthrough devs on a separate group that can *only*
DMA access guest RAM (not other device regs).



This makes sense, but it requires knowledge from the management layer 
of whether P2P is needed, which is probably not easy.


Thanks




Some devices need access to other devices' regs, but many passthrough devices
don't need DMA access to anything but RAM (e.g. an Ethernet MAC).

That could mitigate the damage caused by wild DMA pointers...

Cheers,
Edgar






Re: [PATCH v2 0/2] util/hexdump: Cleanup qemu_hexdump()

2020-08-31 Thread Jason Wang



On 2020/8/23 上午2:09, Philippe Mathieu-Daudé wrote:

- Pass const void* buffer
- Reorder arguments

Supersedes: <20200822150457.1322519-1-f4...@amsat.org>

Philippe Mathieu-Daudé (2):
   util/hexdump: Convert to take a void pointer argument
   util/hexdump: Reorder qemu_hexdump() arguments

  include/qemu-common.h|  3 ++-
  hw/dma/xlnx_dpdma.c  |  2 +-
  hw/net/fsl_etsec/etsec.c |  2 +-
  hw/net/fsl_etsec/rings.c |  2 +-
  hw/sd/sd.c   |  2 +-
  hw/usb/redirect.c|  2 +-
  net/colo-compare.c   | 24 
  net/net.c|  2 +-
  util/hexdump.c   |  4 +++-
  util/iov.c   |  2 +-
  10 files changed, 24 insertions(+), 21 deletions(-)



Applied.

Thanks




Re: [PATCH v2 4/7] vhost: involve device backends in feature negotiation

2020-06-10 Thread Jason Wang



On 2020/6/10 下午2:11, Michael S. Tsirkin wrote:

On Wed, Jun 10, 2020 at 01:53:57PM +0800, Jason Wang wrote:

On 2020/6/10 下午12:15, Michael S. Tsirkin wrote:

On Wed, Jun 10, 2020 at 11:21:50AM +0800, Jason Wang wrote:

On 2020/6/10 上午2:07, Michael S. Tsirkin wrote:

+/*
+ * Default vhost_get_features() feature bits for existing device types that do
+ * not define their own.
+ *
+ * This is a workaround for existing device types, do not use this in new vhost
+ * device types. Explicitly define a list of feature bits instead.
+ *
+ * The following feature bits are excluded because libvhost-user device
+ * backends did not advertise them for a long time. Therefore we cannot detect
+ * their presence. Instead we assume they are always supported by the device
+ * backend:
+ * VIRTIO_F_NOTIFY_ON_EMPTY
+ * VIRTIO_F_ANY_LAYOUT
+ * VIRTIO_F_VERSION_1
+ * VIRTIO_RING_F_INDIRECT_DESC
+ * VIRTIO_RING_F_EVENT_IDX

Weird. I remember that it's common for vhost-user not to set
VIRTIO_RING_F_INDIRECT_DESC - they have huge queues so
don't need it and inline descriptors give them better
performance.

So what's going on here?

I guess one reason is to support live migration between vhost-user and
vhost-net.

Thanks


But how can we force-enable features the backend doesn't want to enable?


We can't, and the code just forces QEMU to validate
VIRTIO_RING_F_INDIRECT_DESC for each vhost backend instead of silently
assuming support.

Thanks

So why does the comment above say:

  Instead we assume they are always supported by the device backend



Sorry for being unclear. I meant that any new type of vhost device 
should advertise features as the spec says, and QEMU must validate 
VIRTIO_RING_F_INDIRECT_DESC.


Thanks








This may or may not break backends ...
I would rather just be strict and ask backends to fix their feature
bits. See user_feature_bits in hw/net/vhost-net.c which supports
all these features.






Re: [PATCH v2 4/7] vhost: involve device backends in feature negotiation

2020-06-10 Thread Jason Wang



On 2020/6/10 下午12:15, Michael S. Tsirkin wrote:

On Wed, Jun 10, 2020 at 11:21:50AM +0800, Jason Wang wrote:

On 2020/6/10 上午2:07, Michael S. Tsirkin wrote:

+/*
+ * Default vhost_get_features() feature bits for existing device types that do
+ * not define their own.
+ *
+ * This is a workaround for existing device types, do not use this in new vhost
+ * device types. Explicitly define a list of feature bits instead.
+ *
+ * The following feature bits are excluded because libvhost-user device
+ * backends did not advertise them for a long time. Therefore we cannot detect
+ * their presence. Instead we assume they are always supported by the device
+ * backend:
+ * VIRTIO_F_NOTIFY_ON_EMPTY
+ * VIRTIO_F_ANY_LAYOUT
+ * VIRTIO_F_VERSION_1
+ * VIRTIO_RING_F_INDIRECT_DESC
+ * VIRTIO_RING_F_EVENT_IDX

Weird. I remember that it's common for vhost-user not to set
VIRTIO_RING_F_INDIRECT_DESC - they have huge queues so
don't need it and inline descriptors give them better
performance.

So what's going on here?


I guess one reason is to support live migration between vhost-user and
vhost-net.

Thanks


But how can we force-enable features the backend doesn't want to enable?



We can't, and the code just forces QEMU to validate 
VIRTIO_RING_F_INDIRECT_DESC for each vhost backend instead of silently 
assuming support.
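
Concretely, "validate" here means masking: vhost_get_features() walks a per-device list of feature bits and clears any bit the backend did not advertise. A rough sketch of that masking logic (simplified from hw/virtio/vhost.c; the -1 terminator stands in for VHOST_INVALID_FEATURE_BIT):

static uint64_t get_features_sketch(const int *feature_bits,
                                    uint64_t backend_features,
                                    uint64_t features)
{
    /* keep a requested bit only if the backend advertised it */
    for (const int *bit = feature_bits; *bit >= 0; bit++) {
        if (!(backend_features & (1ULL << *bit))) {
            features &= ~(1ULL << *bit);
        }
    }
    return features;
}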


Thanks



This may or may not break backends ...
I would rather just be strict and ask backends to fix their feature
bits. See user_feature_bits in hw/net/vhost-net.c which supports
all these features.






Re: [PATCH v2 4/7] vhost: involve device backends in feature negotiation

2020-06-09 Thread Jason Wang



On 2020/6/10 上午2:07, Michael S. Tsirkin wrote:

+/*
+ * Default vhost_get_features() feature bits for existing device types that do
+ * not define their own.
+ *
+ * This is a workaround for existing device types, do not use this in new vhost
+ * device types. Explicitly define a list of feature bits instead.
+ *
+ * The following feature bits are excluded because libvhost-user device
+ * backends did not advertise them for a long time. Therefore we cannot detect
+ * their presence. Instead we assume they are always supported by the device
+ * backend:
+ * VIRTIO_F_NOTIFY_ON_EMPTY
+ * VIRTIO_F_ANY_LAYOUT
+ * VIRTIO_F_VERSION_1
+ * VIRTIO_RING_F_INDIRECT_DESC
+ * VIRTIO_RING_F_EVENT_IDX

Weird. I remember that it's common for vhost-user not to set
VIRTIO_RING_F_INDIRECT_DESC - they have huge queues so
don't need it and inline descriptors give them better
performance.

So what's going on here?



I guess one reason is to support live migration between vhost-user and 
vhost-net.


Thanks





Re: [PATCH 2/5] vhost: involve device backends in feature negotiation

2020-06-01 Thread Jason Wang



On 2020/5/29 下午9:56, Stefan Hajnoczi wrote:

On Fri, May 29, 2020 at 03:04:54PM +0800, Jason Wang wrote:

On 2020/5/23 上午1:17, Stefan Hajnoczi wrote:

Many vhost devices in QEMU currently do not involve the device backend
in feature negotiation. This seems fine at first glance for device types
without their own feature bits (virtio-net has many but other device
types have none).

This overlooks the fact that QEMU's virtqueue implementation and the
device backend's implementation may support different features.  QEMU
must not report features to the guest that the the device backend
doesn't support.

For example, QEMU supports VIRTIO 1.1 packed virtqueues while many
existing vhost device backends do not. When the user sets packed=on the
device backend breaks. This should have been handled gracefully by
feature negotiation instead.

Introduce vhost_get_default_features() and update all vhost devices in
QEMU to involve the device backend in feature negotiation.

This patch fixes the following error:

$ x86_64-softmmu/qemu-system-x86_64 \
-drive if=virtio,file=test.img,format=raw \
-chardev socket,path=/tmp/vhost-user-blk.sock,id=char0 \
-device vhost-user-blk-pci,chardev=char0,packed=on \
-object memory-backend-memfd,size=1G,share=on,id=ram0 \
-M accel=kvm,memory-backend=ram0
qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Success (0)


It looks to me this could be fixed simply by adding VIRTIO_F_RING_PACKED
into user_feature_bits in vhost-user-blk.c.

And for the rest, we need to require them to call vhost_get_features() with the
proper feature bits that need to be acked by the backend.

There is a backwards-compatibility problem: we cannot call
vhost_get_features() with the full set of feature bits that each device
type supports because existing vhost-user backends don't advertise
features properly. QEMU disables features not advertised by the
vhost-user backend.

For example, if we add VIRTIO_RING_F_EVENT_IDX then it will always be
disabled for existing libvhost-user backends because they don't
advertise this bit :(.



Agree.




The reason I introduced vhost_get_default_features() is to at least
salvage VIRTIO_F_IOMMU_PLATFORM and VIRTIO_F_RING_PACKED. These bits can
be safely negotiated for all devices.

Any new transport/vring VIRTIO feature bits can also be added to
vhost_get_default_features() safely.

If a vhost-user device wants to use device type-specific feature bits
then it really needs to call vhost_get_features() directly as you
suggest. But it's important to check whether existing vhost-user
backends actually advertise them - because if they don't advertise them
but rely on them then we'll break existing backends.

Would you prefer a different approach?



I get you now so I think not.

Maybe we need to document the expected behavior of VHOST_USER_GET_FEATURES, 
e.g. which set of features must be advertised explicitly.


And for the set you mention here, we probably need to add:

VIRTIO_F_ORDER_PLATFORM,
VIRTIO_F_ANY_LAYOUT (not sure if it was too late).

And

VIRTIO_F_IN_ORDER





diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index aff98a0ede..f8a144dcd0 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -48,6 +48,23 @@ static unsigned int used_memslots;
   static QLIST_HEAD(, vhost_dev) vhost_devices =
   QLIST_HEAD_INITIALIZER(vhost_devices);
+/*
+ * Feature bits that device backends must explicitly report. Feature bits not
+ * listed here maybe set by QEMU without checking with the device backend.
+ * Ideally all feature bits would be listed here but existing vhost device
+ * implementations do not explicitly report bits like VIRTIO_F_VERSION_1, so we
+ * can only assume they are supported.


For most backends, we care about the features for the datapath. So this is not
true at least for the networking device, since VIRTIO_F_VERSION_1 has an impact
on the vnet header length.

The net device is in good shape and doesn't use vhost_default_feature_bits[].

vhost_default_feature_bits[] is for devices that haven't been
negotiating feature bits properly. The goal is start involving them in
feature negotiation without breaking existing backends.

Would you like me to rephrase this comment in some way?



That will be better.





+ *
+ * New feature bits added to the VIRTIO spec should usually be included here
+ * so that existing vhost device backends that do not support them yet continue
+ * to work.


It actually depends on the type of backend.

Kernel vhost-net does not validate GSO features since qemu can talk directly
to TAP and vhost doesn't report those features. But for vhost-user GSO
features must be validated by qemu since qemu can't see what is behind
vhost-user.

Maybe the comment should say "New transport/vring feature bits". I'm
thinking about things like VIRTIO_F_RING_PACKED that are not
device-specific but require backend support.

The GSO fe

Re: [PATCH 5/5] virtio: enable VIRTIO_F_RING_PACKED for all devices

2020-05-29 Thread Jason Wang



On 2020/5/23 上午1:17, Stefan Hajnoczi wrote:

The packed virtqueue layout was introduced in VIRTIO 1.1. It is a single
ring instead of a split avail/used ring design. There are CPU cache
advantages to this layout and it is also suited better to hardware
implementation.

The vhost-net backend has already supported packed virtqueues for some
time. Performance benchmarks show that virtio-blk performance on NVMe
drives is also improved.

Go ahead and enable this feature for all VIRTIO devices. Keep it
disabled for QEMU 5.0 and earlier machine types.

Signed-off-by: Stefan Hajnoczi 
---
  include/hw/virtio/virtio.h |  2 +-
  hw/core/machine.c  | 18 +-
  2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b69d517496..fd5b4a2044 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -292,7 +292,7 @@ typedef struct VirtIORNGConf VirtIORNGConf;
  DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
VIRTIO_F_IOMMU_PLATFORM, false), \
  DEFINE_PROP_BIT64("packed", _state, _field, \
-  VIRTIO_F_RING_PACKED, false)
+  VIRTIO_F_RING_PACKED, true)
  
  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);

  bool virtio_queue_enabled(VirtIODevice *vdev, int n);
diff --git a/hw/core/machine.c b/hw/core/machine.c
index bb3a7b18b1..3598c3c825 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -28,7 +28,23 @@
  #include "hw/mem/nvdimm.h"
  #include "migration/vmstate.h"
  
-GlobalProperty hw_compat_5_0[] = {};
+GlobalProperty hw_compat_5_0[] = {
+{ "vhost-user-blk", "packed", "off" },
+{ "vhost-user-fs-device", "packed", "off" },
+{ "vhost-vsock-device", "packed", "off" },
+{ "virtio-9p-device", "packed", "off" },
+{ "virtio-balloon-device", "packed", "off" },
+{ "virtio-blk-device", "packed", "off" },
+{ "virtio-crypto-device", "packed", "off" },
+{ "virtio-gpu-device", "packed", "off" },
+{ "virtio-input-device", "packed", "off" },
+{ "virtio-iommu-device", "packed", "off" },
+{ "virtio-net-device", "packed", "off" },
+{ "virtio-pmem", "packed", "off" },
+{ "virtio-rng-device", "packed", "off" },
+{ "virtio-scsi-common", "packed", "off" },
+{ "virtio-serial-device", "packed", "off" },



Missing "vhost-user-gpu" here?

I tried to do something like this in the past but gave up since I ended up 
with a similar list.


It would be better to consider something smarter; it probably needs some 
refactoring into a common parent class.


Thanks



+};
  const size_t hw_compat_5_0_len = G_N_ELEMENTS(hw_compat_5_0);
  
  GlobalProperty hw_compat_4_2[] = {





Re: [PATCH 2/5] vhost: involve device backends in feature negotiation

2020-05-29 Thread Jason Wang



On 2020/5/23 上午1:17, Stefan Hajnoczi wrote:

Many vhost devices in QEMU currently do not involve the device backend
in feature negotiation. This seems fine at first glance for device types
without their own feature bits (virtio-net has many but other device
types have none).

This overlooks the fact that QEMU's virtqueue implementation and the
device backend's implementation may support different features.  QEMU
must not report features to the guest that the device backend
doesn't support.

For example, QEMU supports VIRTIO 1.1 packed virtqueues while many
existing vhost device backends do not. When the user sets packed=on the
device backend breaks. This should have been handled gracefully by
feature negotiation instead.

Introduce vhost_get_default_features() and update all vhost devices in
QEMU to involve the device backend in feature negotiation.

This patch fixes the following error:

   $ x86_64-softmmu/qemu-system-x86_64 \
   -drive if=virtio,file=test.img,format=raw \
   -chardev socket,path=/tmp/vhost-user-blk.sock,id=char0 \
   -device vhost-user-blk-pci,chardev=char0,packed=on \
   -object memory-backend-memfd,size=1G,share=on,id=ram0 \
   -M accel=kvm,memory-backend=ram0
   qemu-system-x86_64: Failed to set msg fds.
   qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Success (0)



It looks to me this could be fixed simply by adding VIRTIO_F_RING_PACKED 
into user_feature_bits in vhost-user-blk.c.


And for the rest, we need to require them to call vhost_get_features() with 
the proper feature bits that need to be acked by the backend.
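
For the first part, a sketch of what that one-line fix could look like (the surrounding entries are illustrative; only the added line matters):

static const int user_feature_bits[] = {
    VIRTIO_F_VERSION_1,
    VIRTIO_F_RING_PACKED,    /* new: let the backend ack or deny packed rings */
    /* ... existing device feature bits unchanged ... */
    VHOST_INVALID_FEATURE_BIT
};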





The vhost-user-blk backend failed as follows:

   $ ./vhost-user-blk --socket-path=/tmp/vhost-user-blk.sock -b test2.img
   vu_panic: virtio: zero sized buffers are not allowed
   virtio-blk request missing headers

Signed-off-by: Stefan Hajnoczi 
---
  include/hw/virtio/vhost.h|  1 +
  include/hw/virtio/virtio-gpu.h   |  2 ++
  include/sysemu/cryptodev-vhost.h | 11 +++
  backends/cryptodev-vhost.c   | 19 +++
  hw/display/vhost-user-gpu.c  | 17 +
  hw/display/virtio-gpu-base.c |  2 +-
  hw/input/vhost-user-input.c  |  9 +
  hw/virtio/vhost-user-fs.c|  5 +++--
  hw/virtio/vhost-vsock.c  |  5 +++--
  hw/virtio/vhost.c| 22 ++
  hw/virtio/virtio-crypto.c|  3 ++-
  11 files changed, 90 insertions(+), 6 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 085450c6f8..d2e54dd4a8 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -112,6 +112,7 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n,
bool mask);
  uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
  uint64_t features);
+uint64_t vhost_get_default_features(struct vhost_dev *hdev, uint64_t features);
  void vhost_ack_features(struct vhost_dev *hdev, const int *feature_bits,
  uint64_t features);
  bool vhost_has_free_slot(void);
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 6dd57f2025..41d270d80e 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -192,6 +192,8 @@ bool virtio_gpu_base_device_realize(DeviceState *qdev,
  void virtio_gpu_base_reset(VirtIOGPUBase *g);
  void virtio_gpu_base_fill_display_info(VirtIOGPUBase *g,
  struct virtio_gpu_resp_display_info *dpy_info);
+uint64_t virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features,
+  Error **errp);
  
  /* virtio-gpu.c */

  void virtio_gpu_ctrl_response(VirtIOGPU *g,
diff --git a/include/sysemu/cryptodev-vhost.h b/include/sysemu/cryptodev-vhost.h
index f42824fbde..e629446bfb 100644
--- a/include/sysemu/cryptodev-vhost.h
+++ b/include/sysemu/cryptodev-vhost.h
@@ -122,6 +122,17 @@ int cryptodev_vhost_start(VirtIODevice *dev, int total_queues);
   */
  void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues);
  
+/**

+ * cryptodev_vhost_get_features:
+ * @dev: the virtio crypto object
+ * @requested_features: the features being offered
+ *
+ * Returns: the requested features bits that are supported by the vhost device,
+ * or the original request feature bits if vhost is disabled
+ *
+ */
+uint64_t cryptodev_vhost_get_features(VirtIODevice *dev, uint64_t features);
+
  /**
   * cryptodev_vhost_virtqueue_mask:
   * @dev: the virtio crypto object
diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
index 8337c9a495..5f5a4fda7b 100644
--- a/backends/cryptodev-vhost.c
+++ b/backends/cryptodev-vhost.c
@@ -266,6 +266,20 @@ void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)
  assert(r >= 0);
  }
  
+uint64_t cryptodev_vhost_get_features(VirtIODevice *dev, uint64_t features)

+{
+VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
+

Re: [PATCH v3 2/2] vhost-user-blk: delay vhost_user_blk_disconnect

2020-05-24 Thread Jason Wang



On 2020/5/20 下午11:53, Dima Stepanov wrote:

A socket write during vhost-user communication may trigger a disconnect
event, calling vhost_user_blk_disconnect() and clearing all the
vhost_dev structures holding data that vhost-user functions expect to
remain valid to roll back initialization correctly. Delay the cleanup to
keep the vhost_dev structure valid.
There are two possible states to handle:
1. RUN_STATE_PRELAUNCH: skip bh oneshot call and perform disconnect in
the caller routine.
2. RUN_STATE_RUNNING: delay by using bh

BH changes are based on the similar changes for the vhost-user-net
device:
   commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
   "vhost-user: delay vhost_user_stop"



It's better to explain why we don't need to deal with case 1 in the 
vhost-user-net case.


And do we still need patch 1 if we have this patch?

Thanks




Signed-off-by: Dima Stepanov 
---
  hw/block/vhost-user-blk.c | 49 +--
  1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9d8c0b3..447fc9c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -337,11 +337,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
  VHostUserBlk *s = VHOST_USER_BLK(vdev);
  
-if (!s->connected) {
-return;
-}
-s->connected = false;
-
  if (s->dev.started) {
  vhost_user_blk_stop(vdev);
  }
@@ -349,6 +344,19 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
  vhost_dev_cleanup(&s->dev);
  }
  
+static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);

+
+static void vhost_user_blk_chr_closed_bh(void *opaque)
+{
+DeviceState *dev = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+vhost_user_blk_disconnect(dev);
+qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
+NULL, opaque, NULL, true);
+}
+
  static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
  {
  DeviceState *dev = opaque;
@@ -363,7 +371,28 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
  }
  break;
  case CHR_EVENT_CLOSED:
-vhost_user_blk_disconnect(dev);
+/*
+ * A close event may happen during a read/write, but vhost
+ * code assumes the vhost_dev remains setup, so delay the
+ * stop & clear. There are two possible paths to hit this
+ * disconnect event:
+ * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
+ * vhost_user_blk_device_realize() is a caller.
+ * 2. In the main loop phase after VM start.
+ *
+ * For p2 the disconnect event will be delayed. We can't
+ * do the same for p1, because we are not running the loop
+ * at this moment. So just skip this step and perform
+ * disconnect in the caller function.
+ */
+if (s->connected && runstate_is_running()) {
+AioContext *ctx = qemu_get_current_aio_context();
+
+qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
+NULL, NULL, false);
+aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
+}
+s->connected = false;
  break;
  case CHR_EVENT_BREAK:
  case CHR_EVENT_MUX_IN:
@@ -428,6 +457,14 @@ reconnect:
  
  ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,

 sizeof(struct virtio_blk_config));
+if (!s->connected) {
+/*
+ * Perform disconnect before making reconnect. More detailed
+ * comment why it was delayed is in the vhost_user_blk_event()
+ * routine.
+ */
+vhost_user_blk_disconnect(dev);
+}
  if (ret < 0) {
  error_report("vhost-user-blk: get block config failed");
  goto reconnect;





Re: [RFC v3 4/6] qmp: add QMP command x-debug-virtio-queue-status

2020-05-17 Thread Jason Wang



On 2020/5/15 下午11:16, Laurent Vivier wrote:

On 08/05/2020 04:57, Jason Wang wrote:

On 2020/5/7 下午7:49, Laurent Vivier wrote:

This new command shows the internal status of a VirtQueue
(vrings and indexes).

Signed-off-by: Laurent Vivier 


It looks to me that packed virtqueue is not supported. It's better to
add it in the future.

I agree, that's why the series still remains an "RFC".

...

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 59bf6ef651a6..57552bf05014 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3877,6 +3877,41 @@ static VirtIODevice *virtio_device_find(const char *path)
   return NULL;
   }
   +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue,
Error **errp)
+{
+    VirtIODevice *vdev;
+    VirtQueueStatus *status;
+
+    vdev = virtio_device_find(path);
+    if (vdev == NULL) {
+    error_setg(errp, "Path %s is not a VirtIO device", path);
+    return NULL;
+    }
+
+    if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev,
queue)) {
+    error_setg(errp, "Invalid virtqueue number %d", queue);
+    return NULL;
+    }
+
+    status = g_new0(VirtQueueStatus, 1);
+    status->queue_index = vdev->vq[queue].queue_index;
+    status->inuse = vdev->vq[queue].inuse;
+    status->vring_num = vdev->vq[queue].vring.num;
+    status->vring_num_default = vdev->vq[queue].vring.num_default;
+    status->vring_align = vdev->vq[queue].vring.align;
+    status->vring_desc = vdev->vq[queue].vring.desc;
+    status->vring_avail = vdev->vq[queue].vring.avail;
+    status->vring_used = vdev->vq[queue].vring.used;
+    status->last_avail_idx = vdev->vq[queue].last_avail_idx;


This might not be correct when vhost is used.

We may consider syncing last_avail_idx from the vhost backend here?

Yes, but I don't know how to do that. Where can I find the information?



It could be synced through the vhost op vhost_get_vring_base(); see 
vhost_virtqueue_stop().
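
A minimal sketch of that sync, modeled on vhost_virtqueue_stop() (error handling elided; 'hdev', 'queue' and 'status' are the obvious locals from the patch context):

struct vhost_vring_state state = {
    .index = queue,
};

/* ask the vhost backend for its current avail index instead of
 * reporting QEMU's possibly stale shadow copy */
if (hdev->vhost_ops->vhost_get_vring_base(hdev, &state) == 0) {
    status->last_avail_idx = state.num;
}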


Thanks




Thanks,
Laurent





Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-17 Thread Jason Wang



On 2020/5/16 上午11:20, Li Feng wrote:

Hi, Dima.
This abort is what I mentioned in my previous email.
I triggered this crash without any fix a week ago,
and I have written a test patch to let vhost_log_global_start return
int and propagate the error to the upper layer.
However, my change is a little large, because the original callback
returns void and doesn't do any rollback.
After the test, the migration to the destination succeeds, and fio
is still running perfectly, but the src VM is still stuck here, no
crash.

Is it right to return this error to the upper layer?



That could be a solution, or we may ask David for more suggestions.

Another thing that might be useful is to block reconnection during 
migration.


Thanks




Thanks,
Feng Li

On Sat, May 16, 2020 at 12:55 AM, Dima Stepanov wrote:

On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:

On 2020/5/13 下午5:47, Dima Stepanov wrote:

 case CHR_EVENT_CLOSED:
 /* a close event may happen during a read/write, but vhost
  * code assumes the vhost_dev remains setup, so delay the
  * stop & clear to idle.
  * FIXME: better handle failure in vhost code, remove bh
  */
 if (s->watch) {
 AioContext *ctx = qemu_get_current_aio_context();

 g_source_remove(s->watch);
 s->watch = 0;
 qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
  NULL, NULL, false);

 aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
 }
 break;

I think it's time we dropped the FIXME and moved the handling to common
code. Jason? Marc-André?

I agree. Just to confirm, do you prefer bh or doing changes like what is
done in this series? It looks to me the bh approach gives simpler code.

Could it be a good idea just to make the disconnect in the char device but
postpone the cleanup in the vhost-user-blk (or any other vhost-user
device) itself? So we are moving the postpone logic and decision from
the char device to the vhost-user device. One of the ideas I have is as
follows:
   - Put ourselves in the INITIALIZATION state
   - Start these vhost-user "handshake" commands
   - If we got a disconnect error, perform the disconnect, but don't clean up
 the device (it will be cleaned up on the rollback). It can be done by
 checking the state in the vhost_user_..._disconnect routine or something like it


Any issue you saw with just using the aio bh as Michael posted above?

Then we don't need to deal with the silent vhost_dev_stop() and we will have
code that is much easier to understand.

I've implemented this solution inside
hw/block/vhost-user-blk.c:vhost_user_blk_event() in a similar way by
using the s->connected field. Looks like a good and more correct fix :). I
have two questions here before I rework the fixes:
1. Is it okay to make the similar fix inside vhost_user_blk_event(), or
are we looking for a more generic vhost-user solution? What do you think?
2. For migration we require additional information that for the
vhost-user device it isn't an error, because I'm triggering the
following abort:
   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults 
-no-user-config -M q35,sata=false'.
   Program terminated with signal SIGABRT, Aborted.
   #0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]

   (gdb) bt
   #0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
   #1  0x7fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
   #2  0x5648ea376ee6 in vhost_log_global_start
   (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
   #3  0x5648ea2dde7e in memory_global_dirty_log_start ()
   at ./memory.c:2611
   #4  0x5648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
   at ./migration/ram.c:2305
   #5  0x5648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 )
   at ./migration/ram.c:2323
   #6  0x5648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
   opaque=0x5648eb1f0f20 )
   at ./migration/ram.c:2436
   #7  0x5648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
   migration/savevm.c:1176
   #8  0x5648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
   migration/migration.c:3416
   #9  0x5648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
   util/qemu-thread-posix.c:519
   #10 0x7fb56eac56ba in start_thread () from
   /lib/x86_64-linux-gnu/libpthread.so.0
   #11 0x7fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
   (gdb) frame 2
   #2  0x5648ea376ee6 in vhost_log_global_start
  (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
   857 abort();
   (gdb) list
   852 {
   853 int r;
   854
   855 r = vhost_migration_log(listener, true);
   856 if (r < 0) {
   857 abort();
   858 }
   859 }
   860
   861 static void vhost_log_global_stop(MemoryListener *listener)

Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-17 Thread Jason Wang



On 2020/5/16 上午12:54, Dima Stepanov wrote:

On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:

On 2020/5/13 下午5:47, Dima Stepanov wrote:

 case CHR_EVENT_CLOSED:
 /* a close event may happen during a read/write, but vhost
  * code assumes the vhost_dev remains setup, so delay the
  * stop & clear to idle.
  * FIXME: better handle failure in vhost code, remove bh
  */
 if (s->watch) {
 AioContext *ctx = qemu_get_current_aio_context();

 g_source_remove(s->watch);
 s->watch = 0;
 qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
  NULL, NULL, false);

 aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
 }
 break;

I think it's time we dropped the FIXME and moved the handling to common
code. Jason? Marc-André?

I agree. Just to confirm, do you prefer bh or doing changes like what is
done in this series? It looks to me the bh approach gives simpler code.

Could it be a good idea just to make the disconnect in the char device but
postpone the cleanup in the vhost-user-blk (or any other vhost-user
device) itself? So we are moving the postpone logic and decision from
the char device to the vhost-user device. One of the ideas I have is as
follows:
   - Put ourselves in the INITIALIZATION state
   - Start these vhost-user "handshake" commands
   - If we got a disconnect error, perform the disconnect, but don't clean up
 the device (it will be cleaned up on the rollback). It can be done by
 checking the state in the vhost_user_..._disconnect routine or something like it


Is there any issue with just using the aio bh as Michael posted above?

Then we don't need to deal with the silent vhost_dev_stop() and we will have
code that is much easier to understand.
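For reference, a minimal sketch of the bh-based deferral being discussed,
loosely based on the CHR_EVENT_CLOSED handling quoted above (the handler
re-registration arguments are an assumption, not taken from this thread):

    /* Runs later from the main loop, outside the chardev read/write path
     * that delivered the CLOSED event. */
    static void chr_closed_bh(void *opaque)
    {
        DeviceState *dev = opaque;
        VirtIODevice *vdev = VIRTIO_DEVICE(dev);
        VHostUserBlk *s = VHOST_USER_BLK(vdev);

        vhost_user_blk_disconnect(dev);   /* the actual teardown */
        qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, vhost_user_blk_event,
                                 NULL, opaque, NULL, true);
    }

    /* In the event handler, CHR_EVENT_CLOSED then only schedules the work: */
    case CHR_EVENT_CLOSED:
        aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                chr_closed_bh, opaque);
        break;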

I've implemented this solution inside
hw/block/vhost-user-blk.c:vhost_user_blk_event() in a similar way by
using the s->connected field. It looks like a cleaner and more correct fix
(a sketch follows below). I have two questions here before I rework the fixes:
1. Is it okay to make a similar fix inside vhost_user_blk_event(), or
are we looking for a more generic vhost-user solution? What do you think?
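A sketch of the s->connected guard described above, assuming the fields of
VHostUserBlk from hw/block/vhost-user-blk.c (illustrative only, not the
final patch):

    static void vhost_user_blk_disconnect(DeviceState *dev)
    {
        VirtIODevice *vdev = VIRTIO_DEVICE(dev);
        VHostUserBlk *s = VHOST_USER_BLK(vdev);

        if (!s->connected) {
            return;   /* already torn down, or teardown was deferred */
        }
        s->connected = false;

        vhost_user_blk_stop(vdev);
        vhost_dev_cleanup(&s->dev);
    }

The guard makes the routine idempotent, so a disconnect that fires in the
middle of a vhost-user command cannot clean the device up twice.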



I think I agree with Michael: it's better to have a generic vhost-user 
solution. But if it turns out not to be easy, we can start by fixing 
vhost-user-blk.




2. For migration we require additional information: for the
vhost-user device this isn't an error, because I'm triggering the
following assert:
   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults 
-no-user-config -M q35,sata=false'.
   Program terminated with signal SIGABRT, Aborted.
   #0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]

   (gdb) bt
   #0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
   #1  0x7fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
   #2  0x5648ea376ee6 in vhost_log_global_start
   (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
   #3  0x5648ea2dde7e in memory_global_dirty_log_start ()
   at ./memory.c:2611
   #4  0x5648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
   at ./migration/ram.c:2305
   #5  0x5648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 )
   at ./migration/ram.c:2323
   #6  0x5648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
   opaque=0x5648eb1f0f20 )
   at ./migration/ram.c:2436
   #7  0x5648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
   migration/savevm.c:1176
   #8  0x5648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
   migration/migration.c:3416
   #9  0x5648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
   util/qemu-thread-posix.c:519
   #10 0x7fb56eac56ba in start_thread () from
   /lib/x86_64-linux-gnu/libpthread.so.0
   #11 0x7fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
   (gdb) frame 2
   #2  0x5648ea376ee6 in vhost_log_global_start
  (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
   857 abort();
   (gdb) list
   852 {
   853 int r;
   854
   855 r = vhost_migration_log(listener, true);
   856 if (r < 0) {
   857 abort();
   858 }
   859 }
   860
   861 static void vhost_log_global_stop(MemoryListener *listener)
Since the bh postpones the clean up, we can't use the ->started field.
Do we have any mechanism to get the device type/state in the common
vhost_migration_log() routine? Then, for example, for a disconnected
vhost-user device we would be able to return 0. Or should we implement
such a mechanism and introduce it in this patch set?
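One possible shape for such a guard, shown here only as a compressed
sketch of vhost_migration_log() (the real function also manages the log
itself; error handling is elided):

    static int vhost_migration_log(MemoryListener *listener, int enable)
    {
        struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                             memory_listener);
        int r;

        if (!dev->started) {
            dev->log_enabled = enable;
            return 0;
        }
        if (!enable) {
            r = vhost_dev_set_log(dev, false);
        } else {
            vhost_dev_log_resize(dev, vhost_get_log_size(dev));
            r = vhost_dev_set_log(dev, true);
        }
        /* A vhost-user disconnect inside the calls above tears the device
         * down and clears dev->started; treat that as "nothing to log"
         * instead of failing migration setup. */
        if (!dev->started) {
            return 0;
        }
        if (r < 0) {
            return r;
        }
        dev->log_enabled = enable;
        return 0;
    }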



This requires more thought, I will reply in Feng's mail.

Thanks




Thanks, Dima.


Thanks



   - vhost-user command returns error back to the _start() routine
   - Rollback in one place in the start() routine, by

Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-14 Thread Jason Wang



On 2020/5/13 下午5:47, Dima Stepanov wrote:

 case CHR_EVENT_CLOSED:
 /* a close event may happen during a read/write, but vhost
  * code assumes the vhost_dev remains setup, so delay the
  * stop & clear to idle.
  * FIXME: better handle failure in vhost code, remove bh
  */
 if (s->watch) {
 AioContext *ctx = qemu_get_current_aio_context();

 g_source_remove(s->watch);
 s->watch = 0;
 qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
  NULL, NULL, false);

 aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
 }
 break;

I think it's time we dropped the FIXME and moved the handling to common
code. Jason? Marc-André?

I agree. Just to confirm, do you prefer the bh approach or changes like what is
done in this series? It looks to me that the bh approach gives simpler code.

Could it be a good idea just to do the disconnect in the char device but
postpone the clean up in the vhost-user-blk (or any other vhost-user
device) itself? So we are moving the postpone logic and decision from
the char device to the vhost-user device. One of the ideas I have is as
follows:
   - Put ourselves in the INITIALIZATION state
   - Start these vhost-user "handshake" commands
   - If we get a disconnect error, perform the disconnect, but don't clean up
 the device (it will be cleaned up on the rollback). It can be done by
 checking the state in the vhost_user_..._disconnect routine or something like that



Is there any issue with just using the aio bh as Michael posted above?

Then we don't need to deal with the silent vhost_dev_stop() and we will 
have code that is much easier to understand.


Thanks



   - vhost-user command returns error back to the _start() routine
   - Rollback in one place in the start() routine, by calling this
 postponed clean up for the disconnect






Re: [PATCH v2 4/5] vhost: check vring address before calling unmap

2020-05-14 Thread Jason Wang



On 2020/5/13 下午5:36, Dima Stepanov wrote:

On Wed, May 13, 2020 at 11:00:38AM +0800, Jason Wang wrote:

On 2020/5/12 下午5:08, Dima Stepanov wrote:

On Tue, May 12, 2020 at 11:26:11AM +0800, Jason Wang wrote:

On 2020/5/11 下午5:11, Dima Stepanov wrote:

On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:

On 2020/4/30 下午9:36, Dima Stepanov wrote:

Since a disconnect can happen at any time during initialization, not all
vring buffers (for instance the used vring) may be initialized successfully.
If a buffer was not initialized then the vhost_memory_unmap call will lead
to SIGSEGV. Add checks for the vring address values before calling unmap.
Also add an assert() in the vhost_memory_unmap() routine.

Signed-off-by: Dima Stepanov
---
  hw/virtio/vhost.c | 27 +--
  1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddbdc53..3ee50c4 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void 
*buffer,
 hwaddr len, int is_write,
 hwaddr access_len)
  {
+assert(buffer);
+
  if (!vhost_dev_has_iommu(dev)) {
  cpu_physical_memory_unmap(buffer, len, is_write, access_len);
  }
@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
  vhost_vq_index);
  }
-vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
-   1, virtio_queue_get_used_size(vdev, idx));
-vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
-   0, virtio_queue_get_avail_size(vdev, idx));
-vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
-   0, virtio_queue_get_desc_size(vdev, idx));
+/*
+ * Since the vhost-user disconnect can happen during initialization
+ * check if vring was initialized, before making unmap.
+ */
+if (vq->used) {
+vhost_memory_unmap(dev, vq->used,
+   virtio_queue_get_used_size(vdev, idx),
+   1, virtio_queue_get_used_size(vdev, idx));
+}
+if (vq->avail) {
+vhost_memory_unmap(dev, vq->avail,
+   virtio_queue_get_avail_size(vdev, idx),
+   0, virtio_queue_get_avail_size(vdev, idx));
+}
+if (vq->desc) {
+vhost_memory_unmap(dev, vq->desc,
+   virtio_queue_get_desc_size(vdev, idx),
+   0, virtio_queue_get_desc_size(vdev, idx));
+}

Any reason for not checking hdev->started instead? vhost_dev_start() will set it
to true if the virtqueues were correctly mapped.

Thanks

Well, I see it a little bit differently:
  - vhost_dev_start() sets hdev->started to true before starting
virtqueues
  - vhost_virtqueue_start() maps all the memory
If we hit the vhost disconnect at the start of
vhost_virtqueue_start(), for instance during this call:
   r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
Then we will call vhost_user_blk_disconnect:
   vhost_user_blk_disconnect()->
 vhost_user_blk_stop()->
   vhost_dev_stop()->
 vhost_virtqueue_stop()
As a result we will come into this routine with hdev->started still
set to true, but with the used/avail/desc fields still uninitialized and
set to 0.

I may be missing something, but consider that both vhost_dev_start() and
vhost_user_blk_disconnect() are serialized in the main loop. Can this really
happen?

Yes, consider the case when we start the vhost-user-blk device:
   vhost_dev_start->
 vhost_virtqueue_start
And we get a disconnect in the middle of the vhost_virtqueue_start()
routine, for instance:
   1000 vq->num = state.num = virtio_queue_get_num(vdev, idx);
   1001 r = dev->vhost_ops->vhost_set_vring_num(dev, &state);
   1002 if (r) {
   1003 VHOST_OPS_DEBUG("vhost_set_vring_num failed");
   1004 return -errno;
   1005 }
   --> Here we got a disconnect <--
   1006
   1007 state.num = virtio_queue_get_last_avail_idx(vdev, idx);
   1008 r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
   1009 if (r) {
   1010 VHOST_OPS_DEBUG("vhost_set_vring_base failed");
   1011 return -errno;
   1012 }
As a result the call to vhost_set_vring_base will invoke the disconnect
routine. The backtrace log for the SIGSEGV is as follows:
   Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
   [Switching to Thread 0x72ea9700 (LWP 183150)]
   0x74d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
   (gdb) bt
   #0  0x74d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
   #1  0x5590fd90 in flatview_write_continue (fv=0x7fffec4a2600,
   addr=0, attrs=..., ptr=0x0, len=1028, ad

Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-12 Thread Jason Wang



On 2020/5/13 下午12:15, Michael S. Tsirkin wrote:

On Tue, May 12, 2020 at 12:35:30PM +0300, Dima Stepanov wrote:

On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote:

On 2020/5/11 下午5:25, Dima Stepanov wrote:

On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:

On 2020/4/30 下午9:36, Dima Stepanov wrote:

If a vhost-user daemon is used as a backend for the vhost device, then we
should consider the possibility of a disconnect at any moment. If such a
disconnect happens in the vhost_migration_log() routine, the vhost
device structure will be cleaned up.
At the start of the vhost_migration_log() function there is a check:
   if (!dev->started) {
   dev->log_enabled = enable;
   return 0;
   }
To be consistent with this check, add the same check after calling the
vhost_dev_set_log() routine. In general this helps not to break a
migration due to the assert() message. But it looks like this code
should be revised to handle these errors more carefully.

In the case of a vhost-user device backend, the fail paths should consider
the state of the device. In this case we should skip some function calls
during rollback on the error paths, so as not to get NULL dereference
errors.

Signed-off-by: Dima Stepanov
---
  hw/virtio/vhost.c | 39 +++
  1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 3ee50c4..d5ab96d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
  {
  int r, i, idx;
+
+if (!dev->started) {
+/*
+ * If vhost-user daemon is used as a backend for the
+ * device and the connection is broken, then the vhost_dev
+ * structure will be reset all its values to 0.
+ * Add additional check for the device state.
+ */
+return -1;
+}
+
  r = vhost_dev_set_features(dev, enable_log);
  if (r < 0) {
  goto err_features;
@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool 
enable_log)
  }
  return 0;
  err_vq:
-for (; i >= 0; --i) {
+/*
+ * Disconnect with the vhost-user daemon can lead to the
+ * vhost_dev_cleanup() call which will clean up vhost_dev
+ * structure.
+ */
+for (; dev->started && (i >= 0); --i) {
  idx = dev->vhost_ops->vhost_get_vq_index(

Why do we need the check of dev->started here; can started be modified
outside the mainloop? If yes, I don't get the check of !dev->started at the
beginning of this function.


No, dev->started can't change outside the mainloop. The main problem is
only with the vhost_user_blk daemon. Consider the case where we
successfully pass the dev->started check at the beginning of the
function, but after that we hit the disconnect on the next call, on the
second or third iteration:
  r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
The unix socket backend device will call the disconnect routine for this
device and reset the structure. So the structure will be reset (and
dev->started set to false) inside this set_addr() call.

I still don't get it. I think the disconnect cannot happen in the middle
of vhost_dev_set_log() since both of them run in the mainloop. And even
if it can, we probably need a synchronization mechanism other than a
simple check here.

The disconnect doesn't happen in a separate thread; it happens in this
routine, inside vhost_dev_set_log, when for instance a vhost_user_write()
call fails:
   vhost_user_set_log_base()
 vhost_user_write()
   vhost_user_blk_disconnect()
 vhost_dev_cleanup()
   vhost_user_backend_cleanup()
So the point is that if we somehow get a disconnect from the
vhost-user-blk daemon before the vhost_user_write() call, then it will
continue the clean up by running the vhost_user_blk_disconnect() function. I
described a more detailed backtrace in the separate thread, which is
pretty similar to what we have here:
   Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
The places are different but the problem is pretty similar.

So if the vhost-user command handshake completes then everything is fine and
reconnect will work as expected. The only problem is how to handle a
reconnect properly between a vhost-user command send and receive.


So vhost net had this problem too.

commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
Author: Marc-André Lureau
Date:   Mon Feb 27 14:49:56 2017 +0400

 vhost-user: delay vhost_user_stop
 
 Since commit b0a335e351103bf92f3f9d0bd5759311be8156ac, a socket write

  may trigger disconnect events, calling vhost_user_stop() and clearing
  all the vhost_dev structures holding data that vhost.c functions expect
 to remain valid. Delay the cleanup to keep the vhost_dev structure
 valid during the vhost.c f

Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-12 Thread Jason Wang



On 2020/5/12 下午5:35, Dima Stepanov wrote:

On Tue, May 12, 2020 at 11:32:50AM +0800, Jason Wang wrote:

On 2020/5/11 下午5:25, Dima Stepanov wrote:

On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:

On 2020/4/30 下午9:36, Dima Stepanov wrote:

If a vhost-user daemon is used as a backend for the vhost device, then we
should consider the possibility of a disconnect at any moment. If such a
disconnect happens in the vhost_migration_log() routine, the vhost
device structure will be cleaned up.
At the start of the vhost_migration_log() function there is a check:
   if (!dev->started) {
   dev->log_enabled = enable;
   return 0;
   }
To be consistent with this check, add the same check after calling the
vhost_dev_set_log() routine. In general this helps not to break a
migration due to the assert() message. But it looks like this code
should be revised to handle these errors more carefully.

In the case of a vhost-user device backend, the fail paths should consider
the state of the device. In this case we should skip some function calls
during rollback on the error paths, so as not to get NULL dereference
errors.

Signed-off-by: Dima Stepanov
---
  hw/virtio/vhost.c | 39 +++
  1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 3ee50c4..d5ab96d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
  {
  int r, i, idx;
+
+if (!dev->started) {
+/*
+ * If vhost-user daemon is used as a backend for the
+ * device and the connection is broken, then the vhost_dev
+ * structure will be reset all its values to 0.
+ * Add additional check for the device state.
+ */
+return -1;
+}
+
  r = vhost_dev_set_features(dev, enable_log);
  if (r < 0) {
  goto err_features;
@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool 
enable_log)
  }
  return 0;
  err_vq:
-for (; i >= 0; --i) {
+/*
+ * Disconnect with the vhost-user daemon can lead to the
+ * vhost_dev_cleanup() call which will clean up vhost_dev
+ * structure.
+ */
+for (; dev->started && (i >= 0); --i) {
  idx = dev->vhost_ops->vhost_get_vq_index(

Why do we need the check of dev->started here; can started be modified
outside the mainloop? If yes, I don't get the check of !dev->started at the
beginning of this function.


No, dev->started can't change outside the mainloop. The main problem is
only with the vhost_user_blk daemon. Consider the case where we
successfully pass the dev->started check at the beginning of the
function, but after that we hit the disconnect on the next call, on the
second or third iteration:
  r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
The unix socket backend device will call the disconnect routine for this
device and reset the structure. So the structure will be reset (and
dev->started set to false) inside this set_addr() call.

I still don't get it. I think the disconnect cannot happen in the middle
of vhost_dev_set_log() since both of them run in the mainloop. And even
if it can, we probably need a synchronization mechanism other than a
simple check here.

The disconnect doesn't happen in a separate thread; it happens in this
routine, inside vhost_dev_set_log, when for instance a vhost_user_write()
call fails:
   vhost_user_set_log_base()
 vhost_user_write()
   vhost_user_blk_disconnect()
 vhost_dev_cleanup()
   vhost_user_backend_cleanup()
So the point is that if we somehow get a disconnect from the
vhost-user-blk daemon before the vhost_user_write() call, then it will
continue the clean up by running the vhost_user_blk_disconnect() function. I
described a more detailed backtrace in the separate thread, which is
pretty similar to what we have here:
   Re: [PATCH v2 4/5] vhost: check vring address before calling unmap
The places are different but the problem is pretty similar.



Yes.




So if the vhost-user command handshake completes then everything is fine and
reconnect will work as expected. The only problem is how to handle a
reconnect properly between a vhost-user command send and receive.

As I wrote, we have a test:
   - run the src VM with the vhost-user-blk daemon
   - run fio inside it
   - perform a reconnect every X seconds (just kill and restart the daemon),
 X is random
   - run the dst VM
   - perform migration
   - fio should complete in the dst VM
And we cycle this test more or less forever.
It fails roughly once per ~25 iterations. By adding some delays inside qemu
(see the sketch below) we were able to make the race window larger.
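The kind of delay injection mentioned above might look like the following
debug-only sketch; vhost_user_write_delayed and VHOST_DEBUG_DELAY_US are
made-up names, not anything from the series:

    /* Widen the window between two vhost-user messages so that killing the
     * daemon lands between them more often.  Debugging aid only. */
    #define VHOST_DEBUG_DELAY_US (50 * 1000)

    static int vhost_user_write_delayed(struct vhost_dev *dev,
                                        VhostUserMsg *msg,
                                        int *fds, int fd_num)
    {
        g_usleep(VHOST_DEBUG_DELAY_US);  /* give the test time to kill the daemon */
        return vhost_user_write(dev, msg, fds, fd_num);
    }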



It would be better if we could draft a qtest for this.

Thanks









Re: [PATCH v2 4/5] vhost: check vring address before calling unmap

2020-05-12 Thread Jason Wang



On 2020/5/12 下午5:08, Dima Stepanov wrote:

On Tue, May 12, 2020 at 11:26:11AM +0800, Jason Wang wrote:

On 2020/5/11 下午5:11, Dima Stepanov wrote:

On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:

On 2020/4/30 下午9:36, Dima Stepanov wrote:

Since a disconnect can happen at any time during initialization, not all
vring buffers (for instance the used vring) may be initialized successfully.
If a buffer was not initialized then the vhost_memory_unmap call will lead
to SIGSEGV. Add checks for the vring address values before calling unmap.
Also add an assert() in the vhost_memory_unmap() routine.

Signed-off-by: Dima Stepanov 
---
  hw/virtio/vhost.c | 27 +--
  1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddbdc53..3ee50c4 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void 
*buffer,
 hwaddr len, int is_write,
 hwaddr access_len)
  {
+assert(buffer);
+
  if (!vhost_dev_has_iommu(dev)) {
  cpu_physical_memory_unmap(buffer, len, is_write, access_len);
  }
@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
  vhost_vq_index);
  }
-vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
-   1, virtio_queue_get_used_size(vdev, idx));
-vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
-   0, virtio_queue_get_avail_size(vdev, idx));
-vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
-   0, virtio_queue_get_desc_size(vdev, idx));
+/*
+ * Since the vhost-user disconnect can happen during initialization
+ * check if vring was initialized, before making unmap.
+ */
+if (vq->used) {
+vhost_memory_unmap(dev, vq->used,
+   virtio_queue_get_used_size(vdev, idx),
+   1, virtio_queue_get_used_size(vdev, idx));
+}
+if (vq->avail) {
+vhost_memory_unmap(dev, vq->avail,
+   virtio_queue_get_avail_size(vdev, idx),
+   0, virtio_queue_get_avail_size(vdev, idx));
+}
+if (vq->desc) {
+vhost_memory_unmap(dev, vq->desc,
+   virtio_queue_get_desc_size(vdev, idx),
+   0, virtio_queue_get_desc_size(vdev, idx));
+}

Any reason for not checking hdev->started instead? vhost_dev_start() will set it
to true if the virtqueues were correctly mapped.

Thanks

Well, I see it a little bit differently:
  - vhost_dev_start() sets hdev->started to true before starting
virtqueues
  - vhost_virtqueue_start() maps all the memory
If we hit the vhost disconnect at the start of
vhost_virtqueue_start(), for instance during this call:
   r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
Then we will call vhost_user_blk_disconnect:
   vhost_user_blk_disconnect()->
 vhost_user_blk_stop()->
   vhost_dev_stop()->
 vhost_virtqueue_stop()
As a result we will come into this routine with hdev->started still
set to true, but with the used/avail/desc fields still uninitialized and
set to 0.


I may be missing something, but consider that both vhost_dev_start() and
vhost_user_blk_disconnect() are serialized in the main loop. Can this really
happen?

Yes, consider the case when we start the vhost-user-blk device:
   vhost_dev_start->
 vhost_virtqueue_start
And we get a disconnect in the middle of the vhost_virtqueue_start()
routine, for instance:
   1000 vq->num = state.num = virtio_queue_get_num(vdev, idx);
   1001 r = dev->vhost_ops->vhost_set_vring_num(dev, &state);
   1002 if (r) {
   1003 VHOST_OPS_DEBUG("vhost_set_vring_num failed");
   1004 return -errno;
   1005 }
   --> Here we got a disconnect <--
   1006
   1007 state.num = virtio_queue_get_last_avail_idx(vdev, idx);
   1008 r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
   1009 if (r) {
   1010 VHOST_OPS_DEBUG("vhost_set_vring_base failed");
   1011 return -errno;
   1012 }
As a result the call to vhost_set_vring_base will invoke the disconnect
routine. The backtrace log for the SIGSEGV is as follows:
   Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
   [Switching to Thread 0x72ea9700 (LWP 183150)]
   0x74d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
   (gdb) bt
   #0  0x74d60840 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
   #1  0x5590fd90 in flatview_write_continue (fv=0x7fffec4a2600,
   addr=0, attrs=..., ptr=0x0, len=1028, addr1=0,
   l=1028, mr=0x56b1b310) at ./exec.c:3142
   #2  0x5590fe98 in flatview_write (f

Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-11 Thread Jason Wang



On 2020/5/11 下午5:25, Dima Stepanov wrote:

On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:

On 2020/4/30 下午9:36, Dima Stepanov wrote:

If a vhost-user daemon is used as a backend for the vhost device, then we
should consider the possibility of a disconnect at any moment. If such a
disconnect happens in the vhost_migration_log() routine, the vhost
device structure will be cleaned up.
At the start of the vhost_migration_log() function there is a check:
   if (!dev->started) {
   dev->log_enabled = enable;
   return 0;
   }
To be consistent with this check, add the same check after calling the
vhost_dev_set_log() routine. In general this helps not to break a
migration due to the assert() message. But it looks like this code
should be revised to handle these errors more carefully.

In the case of a vhost-user device backend, the fail paths should consider
the state of the device. In this case we should skip some function calls
during rollback on the error paths, so as not to get NULL dereference
errors.

Signed-off-by: Dima Stepanov 
---
  hw/virtio/vhost.c | 39 +++
  1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 3ee50c4..d5ab96d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
  {
  int r, i, idx;
+
+if (!dev->started) {
+/*
+ * If vhost-user daemon is used as a backend for the
+ * device and the connection is broken, then the vhost_dev
+ * structure will be reset all its values to 0.
+ * Add additional check for the device state.
+ */
+return -1;
+}
+
  r = vhost_dev_set_features(dev, enable_log);
  if (r < 0) {
  goto err_features;
@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool 
enable_log)
  }
  return 0;
  err_vq:
-for (; i >= 0; --i) {
+/*
+ * Disconnect with the vhost-user daemon can lead to the
+ * vhost_dev_cleanup() call which will clean up vhost_dev
+ * structure.
+ */
+for (; dev->started && (i >= 0); --i) {
  idx = dev->vhost_ops->vhost_get_vq_index(


Why do we need the check of dev->started here; can started be modified
outside the mainloop? If yes, I don't get the check of !dev->started at the
beginning of this function.


No, dev->started can't change outside the mainloop. The main problem is
only with the vhost_user_blk daemon. Consider the case where we
successfully pass the dev->started check at the beginning of the
function, but after that we hit the disconnect on the next call, on the
second or third iteration:
  r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
The unix socket backend device will call the disconnect routine for this
device and reset the structure. So the structure will be reset (and
dev->started set to false) inside this set_addr() call.



I still don't get it. I think the disconnect cannot happen in the middle
of vhost_dev_set_log() since both of them run in the mainloop. And even
if it can, we probably need a synchronization mechanism other than a
simple check here.




So we shouldn't make the clean up calls, because these virtqueues were
already cleaned up in the disconnect call. But we should protect these calls
somehow, so that we don't hit SIGSEGV and are able to pass migration.

Just to summarize it:
for the vhost-user-blk devices we can hit the clean up calls twice in case
of a vhost disconnect:
1. The first time during the disconnect process, where the clean up is
called from inside it.
2. The second time during the rollback clean up.
So when that is the case we should skip point 2.


dev, dev->vq_index + i);
  vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
   dev->log_enabled);
  }
-vhost_dev_set_features(dev, dev->log_enabled);
+if (dev->started) {
+vhost_dev_set_features(dev, dev->log_enabled);
+}
  err_features:
  return r;
  }
@@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, 
int enable)
  } else {
  vhost_dev_log_resize(dev, vhost_get_log_size(dev));
  r = vhost_dev_set_log(dev, true);
-if (r < 0) {
+/*
+ * The dev log resize can fail, because of disconnect
+ * with the vhost-user-blk daemon. Check the device
+ * state before calling the vhost_dev_set_log()
+ * function.
+ * Don't return error if device isn't started to be
+ * consistent with the check above.
+ */
+if (dev->started && r < 0) {
  return r;
  }
  }
@@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice 
*vdev)
  fail_log:
  vhost_log_put(hdev, false);
  fail_vq

Re: [PATCH v2 4/5] vhost: check vring address before calling unmap

2020-05-11 Thread Jason Wang



On 2020/5/11 下午5:11, Dima Stepanov wrote:

On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:

On 2020/4/30 下午9:36, Dima Stepanov wrote:

Since a disconnect can happen at any time during initialization, not all
vring buffers (for instance the used vring) may be initialized successfully.
If a buffer was not initialized then the vhost_memory_unmap call will lead
to SIGSEGV. Add checks for the vring address values before calling unmap.
Also add an assert() in the vhost_memory_unmap() routine.

Signed-off-by: Dima Stepanov 
---
  hw/virtio/vhost.c | 27 +--
  1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddbdc53..3ee50c4 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void 
*buffer,
 hwaddr len, int is_write,
 hwaddr access_len)
  {
+assert(buffer);
+
  if (!vhost_dev_has_iommu(dev)) {
  cpu_physical_memory_unmap(buffer, len, is_write, access_len);
  }
@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
  vhost_vq_index);
  }
-vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
-   1, virtio_queue_get_used_size(vdev, idx));
-vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
-   0, virtio_queue_get_avail_size(vdev, idx));
-vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
-   0, virtio_queue_get_desc_size(vdev, idx));
+/*
+ * Since the vhost-user disconnect can happen during initialization
+ * check if vring was initialized, before making unmap.
+ */
+if (vq->used) {
+vhost_memory_unmap(dev, vq->used,
+   virtio_queue_get_used_size(vdev, idx),
+   1, virtio_queue_get_used_size(vdev, idx));
+}
+if (vq->avail) {
+vhost_memory_unmap(dev, vq->avail,
+   virtio_queue_get_avail_size(vdev, idx),
+   0, virtio_queue_get_avail_size(vdev, idx));
+}
+if (vq->desc) {
+vhost_memory_unmap(dev, vq->desc,
+   virtio_queue_get_desc_size(vdev, idx),
+   0, virtio_queue_get_desc_size(vdev, idx));
+}


Any reason for not checking hdev->started instead? vhost_dev_start() will set it
to true if the virtqueues were correctly mapped.

Thanks

Well, I see it a little bit differently:
  - vhost_dev_start() sets hdev->started to true before starting
virtqueues
  - vhost_virtqueue_start() maps all the memory
If we hit the vhost disconnect at the start of
vhost_virtqueue_start(), for instance during this call:
   r = dev->vhost_ops->vhost_set_vring_base(dev, &state);
Then we will call vhost_user_blk_disconnect:
   vhost_user_blk_disconnect()->
 vhost_user_blk_stop()->
   vhost_dev_stop()->
 vhost_virtqueue_stop()
As a result we will come into this routine with hdev->started still
set to true, but with the used/avail/desc fields still uninitialized and
set to 0.



I may be missing something, but consider that both vhost_dev_start() and 
vhost_user_blk_disconnect() are serialized in the main loop. Can this 
really happen?


Thanks







  }
  static void vhost_eventfd_add(MemoryListener *listener,





Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-10 Thread Jason Wang



On 2020/4/30 下午9:36, Dima Stepanov wrote:

If a vhost-user daemon is used as a backend for the vhost device, then we
should consider the possibility of a disconnect at any moment. If such a
disconnect happens in the vhost_migration_log() routine, the vhost
device structure will be cleaned up.
At the start of the vhost_migration_log() function there is a check:
   if (!dev->started) {
   dev->log_enabled = enable;
   return 0;
   }
To be consistent with this check, add the same check after calling the
vhost_dev_set_log() routine. In general this helps not to break a
migration due to the assert() message. But it looks like this code
should be revised to handle these errors more carefully.

In the case of a vhost-user device backend, the fail paths should consider
the state of the device. In this case we should skip some function calls
during rollback on the error paths, so as not to get NULL dereference
errors.

Signed-off-by: Dima Stepanov 
---
  hw/virtio/vhost.c | 39 +++
  1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 3ee50c4..d5ab96d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
  {
  int r, i, idx;
+
+if (!dev->started) {
+/*
+ * If vhost-user daemon is used as a backend for the
+ * device and the connection is broken, then the vhost_dev
+ * structure will be reset all its values to 0.
+ * Add additional check for the device state.
+ */
+return -1;
+}
+
  r = vhost_dev_set_features(dev, enable_log);
  if (r < 0) {
  goto err_features;
@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool 
enable_log)
  }
  return 0;
  err_vq:
-for (; i >= 0; --i) {
+/*
+ * Disconnect with the vhost-user daemon can lead to the
+ * vhost_dev_cleanup() call which will clean up vhost_dev
+ * structure.
+ */
+for (; dev->started && (i >= 0); --i) {
  idx = dev->vhost_ops->vhost_get_vq_index(



Why do we need the check of dev->started here; can started be modified 
outside the mainloop? If yes, I don't get the check of !dev->started at 
the beginning of this function.




dev, dev->vq_index + i);
  vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
   dev->log_enabled);
  }
-vhost_dev_set_features(dev, dev->log_enabled);
+if (dev->started) {
+vhost_dev_set_features(dev, dev->log_enabled);
+}
  err_features:
  return r;
  }
@@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, 
int enable)
  } else {
  vhost_dev_log_resize(dev, vhost_get_log_size(dev));
  r = vhost_dev_set_log(dev, true);
-if (r < 0) {
+/*
+ * The dev log resize can fail, because of disconnect
+ * with the vhost-user-blk daemon. Check the device
+ * state before calling the vhost_dev_set_log()
+ * function.
+ * Don't return error if device isn't started to be
+ * consistent with the check above.
+ */
+if (dev->started && r < 0) {
  return r;
  }
  }
@@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice 
*vdev)
  fail_log:
  vhost_log_put(hdev, false);
  fail_vq:
-while (--i >= 0) {
+/*
+ * Disconnect with the vhost-user daemon can lead to the
+ * vhost_dev_cleanup() call which will clean up vhost_dev
+ * structure.
+ */
+while ((--i >= 0) && (hdev->started)) {
  vhost_virtqueue_stop(hdev,
   vdev,
   hdev->vqs + i,



This should be a separate patch.

Thanks




Re: [PATCH v2 4/5] vhost: check vring address before calling unmap

2020-05-10 Thread Jason Wang



On 2020/4/30 下午9:36, Dima Stepanov wrote:

Since a disconnect can happen at any time during initialization, not all
vring buffers (for instance the used vring) may be initialized successfully.
If a buffer was not initialized then the vhost_memory_unmap call will lead
to SIGSEGV. Add checks for the vring address values before calling unmap.
Also add an assert() in the vhost_memory_unmap() routine.

Signed-off-by: Dima Stepanov 
---
  hw/virtio/vhost.c | 27 +--
  1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddbdc53..3ee50c4 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void 
*buffer,
 hwaddr len, int is_write,
 hwaddr access_len)
  {
+assert(buffer);
+
  if (!vhost_dev_has_iommu(dev)) {
  cpu_physical_memory_unmap(buffer, len, is_write, access_len);
  }
@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
  vhost_vq_index);
  }
  
-vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),

-   1, virtio_queue_get_used_size(vdev, idx));
-vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
-   0, virtio_queue_get_avail_size(vdev, idx));
-vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
-   0, virtio_queue_get_desc_size(vdev, idx));
+/*
+ * Since the vhost-user disconnect can happen during initialization
+ * check if vring was initialized, before making unmap.
+ */
+if (vq->used) {
+vhost_memory_unmap(dev, vq->used,
+   virtio_queue_get_used_size(vdev, idx),
+   1, virtio_queue_get_used_size(vdev, idx));
+}
+if (vq->avail) {
+vhost_memory_unmap(dev, vq->avail,
+   virtio_queue_get_avail_size(vdev, idx),
+   0, virtio_queue_get_avail_size(vdev, idx));
+}
+if (vq->desc) {
+vhost_memory_unmap(dev, vq->desc,
+   virtio_queue_get_desc_size(vdev, idx),
+   0, virtio_queue_get_desc_size(vdev, idx));
+}



Any reason for not checking hdev->started instead? vhost_dev_start() will 
set it to true if the virtqueues were correctly mapped.


Thanks



  }
  
  static void vhost_eventfd_add(MemoryListener *listener,





Re: [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device

2020-05-10 Thread Jason Wang



On 2020/4/30 下午9:36, Dima Stepanov wrote:

Introduce new wrappers to set/reset guest notifiers for the virtio
device in the vhost device module:
   vhost_dev_assign_guest_notifiers
 ->set_guest_notifiers(..., ..., true);
   vhost_dev_drop_guest_notifiers
 ->set_guest_notifiers(..., ..., false);
This is a preliminary step to refactor code,



Maybe I'm missing something, but I don't see any add-on patch in this 
series that modifies the new wrappers? (A sketch of what they presumably 
look like appears after the Signed-off-by below.)




  so the set_guest_notifiers
methods could be called based on the vhost device state.
Update all vhost used devices to use these wrappers instead of direct
method call.

Signed-off-by: Dima Stepanov 
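The wrappers themselves are not visible in this excerpt. Judging from the
call sites in the hunks below, they presumably look roughly like this
sketch (the error_report() placement is a guess based on the removed
cryptodev code):

    int vhost_dev_assign_guest_notifiers(struct vhost_dev *hdev,
                                         VirtIODevice *vdev, int nvqs)
    {
        BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
        int r;

        /* hdev is passed in so that a later patch can consult the vhost
         * device state before touching the notifiers */
        r = k->set_guest_notifiers(qbus->parent, nvqs, true);
        if (r < 0) {
            error_report("error binding guest notifier: %d", -r);
        }
        return r;
    }

    int vhost_dev_drop_guest_notifiers(struct vhost_dev *hdev,
                                       VirtIODevice *vdev, int nvqs)
    {
        BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);

        return k->set_guest_notifiers(qbus->parent, nvqs, false);
    }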
---
  backends/cryptodev-vhost.c  | 26 +++---
  backends/vhost-user.c   | 16 +---
  hw/block/vhost-user-blk.c   | 15 +--
  hw/net/vhost_net.c  | 30 +-
  hw/scsi/vhost-scsi-common.c | 15 +--
  hw/virtio/vhost-user-fs.c   | 17 +++--
  hw/virtio/vhost-vsock.c | 18 --
  hw/virtio/vhost.c   | 38 ++
  hw/virtio/virtio.c  | 13 +
  include/hw/virtio/vhost.h   |  4 
  include/hw/virtio/virtio.h  |  1 +
  11 files changed, 118 insertions(+), 75 deletions(-)

diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
index 8337c9a..4522195 100644
--- a/backends/cryptodev-vhost.c
+++ b/backends/cryptodev-vhost.c
@@ -169,16 +169,13 @@ vhost_set_vring_enable(CryptoDevBackendClient *cc,
  int cryptodev_vhost_start(VirtIODevice *dev, int total_queues)
  {
  VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-VirtioBusState *vbus = VIRTIO_BUS(qbus);
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
  int r, e;
  int i;
  CryptoDevBackend *b = vcrypto->cryptodev;
  CryptoDevBackendVhost *vhost_crypto;
  CryptoDevBackendClient *cc;
  
-if (!k->set_guest_notifiers) {

+if (!virtio_device_guest_notifiers_initialized(dev)) {
  error_report("binding does not support guest notifiers");
  return -ENOSYS;
  }
@@ -198,9 +195,13 @@ int cryptodev_vhost_start(VirtIODevice *dev, int 
total_queues)
  }
   }
  
-r = k->set_guest_notifiers(qbus->parent, total_queues, true);

+/*
+ * Since all the states are handled by one vhost device,
+ * use the first one in array.
+ */
+vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
+r = vhost_dev_assign_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
  if (r < 0) {
-error_report("error binding guest notifier: %d", -r);
  goto err;
  }
  
@@ -232,7 +233,8 @@ err_start:

  vhost_crypto = cryptodev_get_vhost(cc, b, i);
  cryptodev_vhost_stop_one(vhost_crypto, dev);
  }
-e = k->set_guest_notifiers(qbus->parent, total_queues, false);
+vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
+e = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
  if (e < 0) {
  error_report("vhost guest notifier cleanup failed: %d", e);
  }
@@ -242,9 +244,6 @@ err:
  
  void cryptodev_vhost_stop(VirtIODevice *dev, int total_queues)

  {
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
-VirtioBusState *vbus = VIRTIO_BUS(qbus);
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
  VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(dev);
  CryptoDevBackend *b = vcrypto->cryptodev;
  CryptoDevBackendVhost *vhost_crypto;
@@ -259,7 +258,12 @@ void cryptodev_vhost_stop(VirtIODevice *dev, int 
total_queues)
  cryptodev_vhost_stop_one(vhost_crypto, dev);
  }
  
-r = k->set_guest_notifiers(qbus->parent, total_queues, false);

+/*
+ * Since all the states are handled by one vhost device,
+ * use the first one in array.
+ */
+vhost_crypto = cryptodev_get_vhost(b->conf.peers.ccs[0], b, 0);
+r = vhost_dev_drop_guest_notifiers(&vhost_crypto->dev, dev, total_queues);
  if (r < 0) {
  error_report("vhost guest notifier cleanup failed: %d", r);
  }
diff --git a/backends/vhost-user.c b/backends/vhost-user.c
index 2bf3406..e116bc6 100644
--- a/backends/vhost-user.c
+++ b/backends/vhost-user.c
@@ -60,15 +60,13 @@ vhost_user_backend_dev_init(VhostUserBackend *b, 
VirtIODevice *vdev,
  void
  vhost_user_backend_start(VhostUserBackend *b)
  {
-BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
  int ret, i ;
  
  if (b->started) {

  return;
  }
  
-if (!k->set_guest_notifiers) {

+if (!virtio_device_guest_notifiers_initialized(b->vdev)) {
  error_report("binding does not support guest notifiers");
  return;
  }
@@ -78,9 +76,8 @@ vhost_user_backend_start(VhostUserBackend *b)
  return;
  }
  
-ret = k->set_guest_notifiers(qbus->parent, b->dev.nvqs, 

Re: [RFC v3 5/6] qmp: add QMP command x-debug-virtio-queue-element

2020-05-07 Thread Jason Wang



On 2020/5/7 下午7:49, Laurent Vivier wrote:

This new command shows the information of a VirtQueue element.

Signed-off-by: Laurent Vivier 
---
  hw/virtio/virtio-stub.c |   9 +++
  hw/virtio/virtio.c  | 130 
  qapi/virtio.json|  94 +
  3 files changed, 233 insertions(+)

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index 3c1bf172acf6..8275e31430e7 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -23,3 +23,12 @@ VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char 
*path,
  {
  return qmp_virtio_unsupported(errp);
  }
+
+VirtioQueueElement *qmp_x_debug_virtio_queue_element(const char* path,
+ uint16_t queue,
+ bool has_index,
+ uint16_t index,
+ Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 57552bf05014..66dc2cef1b39 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -4033,6 +4033,136 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* 
path, Error **errp)
  return status;
  }
  
+static VirtioRingDescFlagsList *qmp_decode_vring_desc_flags(uint16_t flags)

+{
+VirtioRingDescFlagsList *list = NULL;
+VirtioRingDescFlagsList *node;
+int i;
+struct {
+uint16_t flag;
+VirtioRingDescFlags value;
+} map[] = {
+{ VRING_DESC_F_NEXT, VIRTIO_RING_DESC_FLAGS_NEXT },
+{ VRING_DESC_F_WRITE, VIRTIO_RING_DESC_FLAGS_WRITE },
+{ VRING_DESC_F_INDIRECT, VIRTIO_RING_DESC_FLAGS_INDIRECT },
+{ 1 << VRING_PACKED_DESC_F_AVAIL, VIRTIO_RING_DESC_FLAGS_AVAIL },
+{ 1 << VRING_PACKED_DESC_F_USED, VIRTIO_RING_DESC_FLAGS_USED },
+{ 0, -1 }
+};
+
+for (i = 0; map[i].flag; i++) {
+if ((map[i].flag & flags) == 0) {
+continue;
+}
+node = g_malloc0(sizeof(VirtioRingDescFlagsList));
+node->value = map[i].value;
+node->next = list;
+list = node;
+}
+
+return list;
+}
+
+VirtioQueueElement *qmp_x_debug_virtio_queue_element(const char* path,
+ uint16_t queue,
+ bool has_index,
+ uint16_t index,
+ Error **errp)
+{
+VirtIODevice *vdev;
+VirtQueue *vq;
+VirtioQueueElement *element = NULL;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIO device", path);
+return NULL;
+}
+
+if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) {
+error_setg(errp, "Invalid virtqueue number %d", queue);
+return NULL;
+}
+vq = &vdev->vq[queue];
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+error_setg(errp, "Packed ring not supported");
+return NULL;
+} else {
+unsigned int head, i, max;
+VRingMemoryRegionCaches *caches;
+MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+MemoryRegionCache *desc_cache;
+VRingDesc desc;
+VirtioRingDescList *list = NULL;
+VirtioRingDescList *node;
+int rc;
+
+RCU_READ_LOCK_GUARD();
+
+max = vq->vring.num;
+
+if (!has_index) {
+head = vring_avail_ring(vq, vq->last_avail_idx % vq->vring.num);
+} else {
+head = vring_avail_ring(vq, index % vq->vring.num);
+}
+i = head;
+
+caches = vring_get_region_caches(vq);
+if (!caches) {
+error_setg(errp, "Region caches not initialized");
+return NULL;
+}
+
+if (caches->desc.len < max * sizeof(VRingDesc)) {
+error_setg(errp, "Cannot map descriptor ring");
+return NULL;
+}
+
+desc_cache = >desc;
+vring_split_desc_read(vdev, &desc, desc_cache, i);
+if (desc.flags & VRING_DESC_F_INDIRECT) {
+int64_t len;
+
+len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as,
+   desc.addr, desc.len, false);
+desc_cache = &indirect_desc_cache;
+if (len < desc.len) {
+error_setg(errp, "Cannot map indirect buffer");
+goto done;
+}
+i = 0;
+vring_split_desc_read(vdev, &desc, desc_cache, i);
+}
+
+element = g_new0(VirtioQueueElement, 1);
+element->index = head;
+element->ndescs = 0;
+
+do {
+node = g_new0(VirtioRingDescList, 1);
+node->value = g_new0(VirtioRingDesc, 1);
+node->value->addr = desc.addr;
+   

Re: [RFC v3 4/6] qmp: add QMP command x-debug-virtio-queue-status

2020-05-07 Thread Jason Wang



On 2020/5/7 下午7:49, Laurent Vivier wrote:

This new command shows internal status of a VirtQueue.
(vrings and indexes).

Signed-off-by: Laurent Vivier 



It looks to me that the packed virtqueue is not supported. It would be 
better to add that in the future.




---
  hw/virtio/virtio-stub.c |  6 +++
  hw/virtio/virtio.c  | 35 +++
  qapi/virtio.json| 98 +
  3 files changed, 139 insertions(+)

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index ddb592f72eee..3c1bf172acf6 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -17,3 +17,9 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* path, 
Error **errp)
  {
  return qmp_virtio_unsupported(errp);
  }
+
+VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue, Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 59bf6ef651a6..57552bf05014 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3877,6 +3877,41 @@ static VirtIODevice *virtio_device_find(const char *path)
  return NULL;
  }
  
+VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,

+ uint16_t queue, Error **errp)
+{
+VirtIODevice *vdev;
+VirtQueueStatus *status;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIO device", path);
+return NULL;
+}
+
+if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) {
+error_setg(errp, "Invalid virtqueue number %d", queue);
+return NULL;
+}
+
+status = g_new0(VirtQueueStatus, 1);
+status->queue_index = vdev->vq[queue].queue_index;
+status->inuse = vdev->vq[queue].inuse;
+status->vring_num = vdev->vq[queue].vring.num;
+status->vring_num_default = vdev->vq[queue].vring.num_default;
+status->vring_align = vdev->vq[queue].vring.align;
+status->vring_desc = vdev->vq[queue].vring.desc;
+status->vring_avail = vdev->vq[queue].vring.avail;
+status->vring_used = vdev->vq[queue].vring.used;
+status->last_avail_idx = vdev->vq[queue].last_avail_idx;



This might not be correct when vhost is used.

We may consider syncing last_avail_idx from the vhost backend here (see
the sketch at the end of this message)?

Thanks



+status->shadow_avail_idx = vdev->vq[queue].shadow_avail_idx;
+status->used_idx = vdev->vq[queue].used_idx;
+status->signalled_used = vdev->vq[queue].signalled_used;
+status->signalled_used_valid = vdev->vq[queue].signalled_used_valid;
+
+return status;
+}
+
  #define CONVERT_FEATURES(type, map)\
  ({   \
  type *list = NULL; \
diff --git a/qapi/virtio.json b/qapi/virtio.json
index 69dd107d0c9b..43c234a9fc69 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -308,3 +308,101 @@
'data': { 'path': 'str' },
'returns': 'VirtioStatus'
  }
+
+##
+# @VirtQueueStatus:
+#
+# Status of a VirtQueue
+#
+# @queue-index: VirtQueue queue_index
+#
+# @inuse: VirtQueue inuse
+#
+# @vring-num: VirtQueue vring.num
+#
+# @vring-num-default: VirtQueue vring.num_default
+#
+# @vring-align: VirtQueue vring.align
+#
+# @vring-desc: VirtQueue vring.desc
+#
+# @vring-avail: VirtQueue vring.avail
+#
+# @vring-used: VirtQueue vring.used
+#
+# @last-avail-idx: VirtQueue last_avail_idx
+#
+# @shadow-avail-idx: VirtQueue shadow_avail_idx
+#
+# @used-idx: VirtQueue used_idx
+#
+# @signalled-used: VirtQueue signalled_used
+#
+# @signalled-used-valid: VirtQueue signalled_used_valid
+#
+# Since: 5.1
+#
+##
+
+{ 'struct': 'VirtQueueStatus',
+  'data': {
+'queue-index': 'uint16',
+'inuse': 'uint32',
+'vring-num': 'int',
+'vring-num-default': 'int',
+'vring-align': 'int',
+'vring-desc': 'uint64',
+'vring-avail': 'uint64',
+'vring-used': 'uint64',
+'last-avail-idx': 'uint16',
+'shadow-avail-idx': 'uint16',
+'used-idx': 'uint16',
+'signalled-used': 'uint16',
+'signalled-used-valid': 'uint16'
+  }
+}
+
+##
+# @x-debug-virtio-queue-status:
+#
+# Return the status of a given VirtQueue
+#
+# @path: QOBject path of the VirtIODevice
+#
+# @queue: queue number to examine
+#
+# Returns: Status of the VirtQueue
+#
+# Since: 5.1
+#
+# Example:
+#
+# -> { "execute": "x-debug-virtio-queue-status",
+#  "arguments": {
+#  "path": "/machine/peripheral-anon/device[3]/virtio-backend",
+#  "queue": 0
+#  }
+#   }
+# <- { "return": {
+#  "signalled-used": 373,
+#  "inuse": 0,
+#  "vring-desc": 864411648,
+#  "vring-num-default": 256,
+#  "signalled-used-valid": 1,
+#  "vring-avail": 864415744,
+#  "last-avail-idx": 373,
+#  "queue-index": 0,
+#  "vring-used": 864416320,
+#  "shadow-avail-idx": 619,
+#  "used-idx": 373,
+#  "vring-num": 256,
+#  
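Regarding the last_avail_idx comment above: a hedged sketch of what syncing
from the vhost backend could look like. virtio_lookup_vhost_device() is a
hypothetical helper (finding the vhost_dev behind a VirtIODevice is exactly
the non-trivial part):

    struct vhost_dev *hdev = virtio_lookup_vhost_device(vdev); /* hypothetical */

    if (hdev && hdev->started) {
        struct vhost_vring_state state = { .index = queue };

        /* ask the backend for its view of the avail index */
        if (hdev->vhost_ops->vhost_get_vring_base(hdev, &state) == 0) {
            status->last_avail_idx = state.num;
        }
    }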

Re: [RFC v3 3/6] qmp: decode feature bits in virtio-status

2020-05-07 Thread Jason Wang



On 2020/5/7 下午7:49, Laurent Vivier wrote:

Display feature names instead of a feature bitmap for the host, guest
and backend.

Decode features according to the device type; transport features are
on the first line. Undecoded bits (if any) are stored in a separate
field.

Signed-off-by: Laurent Vivier



This is really useful. I wonder whether we need something similar on the 
driver side; the current sysfs attribute can only display a magic binary 
number (a userspace sketch follows below).


Thanks
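A small standalone sketch of such a driver-side decoder, assuming the Linux
sysfs convention that /sys/bus/virtio/devices/virtioN/features prints one
'0'/'1' character per feature bit, bit 0 first (the name table is a small
hand-picked subset):

    #include <stdio.h>
    #include <string.h>

    static const char *feature_name(int bit)
    {
        switch (bit) {
        case 24: return "VIRTIO_F_NOTIFY_ON_EMPTY";
        case 28: return "VIRTIO_RING_F_INDIRECT_DESC";
        case 29: return "VIRTIO_RING_F_EVENT_IDX";
        case 32: return "VIRTIO_F_VERSION_1";
        case 33: return "VIRTIO_F_IOMMU_PLATFORM";
        case 34: return "VIRTIO_F_RING_PACKED";
        default: return "unknown";
        }
    }

    int main(int argc, char **argv)
    {
        size_t bit;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <contents of .../virtioN/features>\n",
                    argv[0]);
            return 1;
        }
        for (bit = 0; bit < strlen(argv[1]); bit++) {
            if (argv[1][bit] == '1') {
                printf("bit %2zu: %s\n", bit, feature_name((int)bit));
            }
        }
        return 0;
    }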




Re: [PATCH v7 0/4] colo: Add support for continuous replication

2020-02-19 Thread Jason Wang



On 2020/2/20 上午9:38, Zhang, Chen wrote:

Hi Jason,

I noticed this series hasn't been merged or queued yet; did you meet 
some problem with it?



Thanks

Zhang Chen



No, I've queued this.

Thanks





Re: [PATCH v7 0/4] colo: Add support for continuous replication

2019-11-19 Thread Jason Wang



On 2019/11/19 下午8:28, Zhang, Chen wrote:



-Original Message-
From: Lukas Straub 
Sent: Thursday, November 14, 2019 12:36 AM
To: qemu-devel 
Cc: Zhang, Chen ; Jason Wang
; Wen Congyang ;
Xie Changlong ; Kevin Wolf
; Max Reitz ; qemu-block

Subject: Re: [PATCH v7 0/4] colo: Add support for continuous replication

On Fri, 25 Oct 2019 19:06:31 +0200
Lukas Straub  wrote:


Hello Everyone,
These patches add support for continuous replication to COLO. This
means that after the Primary fails and the Secondary does a failover,
the Secondary can then become the Primary and resume replication to a new
Secondary.

Regards,
Lukas Straub

v7:
  - clarify meaning of ip's in documentation and note that active and hidden
images just need to be created once
  - reverted removal of bdrv_is_root_node(top_bs) in replication and

adjusted

the top-id= parameter in documentation acordingly

v6:
  - documented the position= and insert= options
  - renamed replication test
  - clarified documentation by using different ip's for primary and
secondary
  - added Reviewed-by tags

v5:
  - change syntax for the position= parameter
  - fix spelling mistake

v4:
  - fix checkpatch.pl warnings

v3:
  - add test for replication changes
  - check if the filter to be inserted before/behind belongs to the
same interface
  - fix the error message for the position= parameter
  - rename term "after" -> "behind" and variable "insert_before" ->
"insert_before_flag"

  - document the quorum node on the secondary side
  - simplify quorum parameters in documentation
  - remove trailing spaces in documentation
  - clarify the testing procedure in documentation

v2:
  - fix email formating
  - fix checkpatch.pl warnings
  - fix patchew error
  - clearer commit messages


Lukas Straub (4):
   block/replication.c: Ignore requests after failover
   tests/test-replication.c: Add test for for secondary node continuing
 replication
   net/filter.c: Add Options to insert filters anywhere in the filter
 list
   colo: Update Documentation for continuous replication

  block/replication.c|  35 +-
  docs/COLO-FT.txt   | 224 +++--
  docs/block-replication.txt |  28 +++--
  include/net/filter.h   |   2 +
  net/filter.c   |  92 ++-
  qemu-options.hx|  31 -
  tests/test-replication.c   |  52 +
  7 files changed, 389 insertions(+), 75 deletions(-)


Hello Everyone,
So I guess this is ready for merging or will that have to wait until the 4.2
release is done?

Due to the QEMU 4.2 release schedule, after the soft feature freeze (Oct 29)
the master branch does not accept feature changes.
But I don't know if a sub-maintainer (block or net) can queue this series
first and then merge it after the 4.2 release?

Thanks
Zhang Chen



Will try to queue this series.

Thanks





Regards,
Lukas Straub





Re: [PATCH v6 0/9] Packed virtqueue for virtio

2019-10-24 Thread Jason Wang



On 2019/10/25 上午1:13, Eugenio Pérez wrote:

Hi:

This is an updated version of packed virtqueue support based on Wei and Jason's
V5, mainly solving the clang leak detector error that CI reported.

Please review.

Changes from V5:
- Fix qemu's CI asan error.
- Move/copy rcu comments.
- Merge duplicated vdev->broken check between split and packed versions.

Eugenio Pérez (3):
   virtio: Free rng and blk virqueues
   virtio: add some rcu comments
   virtio: Move vdev->broken check to dispatch drop_all

Jason Wang (4):
   virtio: basic packed virtqueue support
   virtio: event suppression support for packed ring
   vhost_net: enable packed ring support
   virtio: add property to enable packed virtqueue

Wei Xu (2):
   virtio: basic structure for packed ring
   virtio: device/driverr area size calculation refactor for split ring



Looks good to me.

Just two nits:

I tend to squash patches 8 and 9 into the patches that introduce 
those issues, and to split patch 3 into two parts.


Btw, if you wish you can add your s-o-b to the series.

Do you want to post a new version, or should I tweak them myself?

Thanks




  hw/block/virtio-blk.c   |7 +-
  hw/char/virtio-serial-bus.c |2 +-
  hw/net/vhost_net.c  |2 +
  hw/scsi/virtio-scsi.c   |3 +-
  hw/virtio/virtio-rng.c  |1 +
  hw/virtio/virtio.c  | 1154 ++-
  include/hw/virtio/virtio.h  |   14 +-
  7 files changed, 1045 insertions(+), 138 deletions(-)






[Qemu-block] [PATCH V4 01/10] virtio: convert to use DMA api

2016-12-30 Thread Jason Wang
Currently, all virtio devices bypass the IOMMU completely. This is because
address_space_memory is assumed and used during DMA emulation. This
patch converts the virtio core API to use the DMA API. The idea is:

- introduce a new transport-specific helper to query the DMA address
  space (only the PCI version is implemented);
- query and use this address space during virtio device guest memory
  accesses when the IOMMU platform feature (VIRTIO_F_IOMMU_PLATFORM) is
  enabled for the device.

Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Amit Shah <amit.s...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c |  2 +-
 hw/char/virtio-serial-bus.c   |  3 ++-
 hw/scsi/virtio-scsi.c |  4 ++-
 hw/virtio/virtio-bus.c|  8 ++
 hw/virtio/virtio-pci.c| 14 ++
 hw/virtio/virtio.c| 57 +--
 include/hw/virtio/virtio-access.h | 31 ++---
 include/hw/virtio/virtio-bus.h|  1 +
 include/hw/virtio/virtio.h|  9 ---
 9 files changed, 93 insertions(+), 36 deletions(-)
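As an illustration of the second bullet, a guest-memory accessor in
include/hw/virtio/virtio-access.h presumably ends up shaped like this
sketch once vdev->dma_as is populated (hedged reconstruction, not a hunk
from the patch):

    static inline uint16_t virtio_lduw_phys(VirtIODevice *vdev, hwaddr pa)
    {
        AddressSpace *dma_as = vdev->dma_as;  /* IOMMU-aware when negotiated */

        if (virtio_access_is_big_endian(vdev)) {
            return lduw_be_phys(dma_as, pa);
        }
        return lduw_le_phys(dma_as, pa);
    }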

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 0c5fd27..12657ff 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -857,7 +857,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 }
 }
 
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
+req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
 virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req);
 req->next = s->rq;
 s->rq = req;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7975c2c..d544cd9 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -732,6 +732,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 static int fetch_active_ports_list(QEMUFile *f,
VirtIOSerial *s, uint32_t nr_active_ports)
 {
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 uint32_t i;
 
 s->post_load = g_malloc0(sizeof(*s->post_load));
@@ -765,7 +766,7 @@ static int fetch_active_ports_list(QEMUFile *f,
 qemu_get_be64s(f, &port->iov_offset);
 
 port->elem =
-qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
+qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement));
 
 /*
  *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 34bba35..3da9d62 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -198,12 +198,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, 
SCSIRequest *sreq)
 SCSIBus *bus = sreq->bus;
 VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 VirtIOSCSIReq *req;
 uint32_t n;
 
 qemu_get_be32s(f, &n);
 assert(n < vs->conf.num_queues);
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+req = qemu_get_virtqueue_element(vdev, f,
+ sizeof(VirtIOSCSIReq) + vs->cdb_size);
 virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
 
 if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index d6c0c72..d31cc00 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -28,6 +28,7 @@
 #include "hw/qdev.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio.h"
+#include "exec/address-spaces.h"
 
 /* #define DEBUG_VIRTIO_BUS */
 
@@ -61,6 +62,13 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error 
**errp)
 if (klass->device_plugged != NULL) {
 klass->device_plugged(qbus->parent, errp);
 }
+
+if (klass->get_dma_as != NULL &&
+virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+vdev->dma_as = klass->get_dma_as(qbus->parent);
+} else {
+vdev->dma_as = &address_space_memory;
+}
 }
 
 /* Reset the virtio_bus */
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 21c2b9d..213d57e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1144,6 +1144,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
 return proxy->nvectors;
 }
 
+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
+{
+VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+PCIDevice *dev = &proxy->pci_dev;
+
+return pci_get_address_space(dev);
+}
+
static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
  struct virtio_pci_cap *cap)

[Qemu-block] [PATCH V3 01/10] virtio: convert to use DMA api

2016-11-06 Thread Jason Wang
Currently, all virtio devices bypass IOMMU completely. This is because
address_space_memory is assumed and used during DMA emulation. This
patch converts the virtio core API to use DMA API. This idea is

- introducing a new transport specific helper to query the dma address
  space. (only pci version is implemented).
- query and use this address space during virtio device guest memory
  accessing when iommu platform (VIRTIO_F_IOMMU_PLATFORM) was enabled
  for this device.

Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Amit Shah <amit.s...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c |  2 +-
 hw/char/virtio-serial-bus.c   |  3 ++-
 hw/scsi/virtio-scsi.c |  4 ++-
 hw/virtio/virtio-bus.c|  8 ++
 hw/virtio/virtio-pci.c| 14 ++
 hw/virtio/virtio.c| 57 +--
 include/hw/virtio/virtio-access.h | 31 ++---
 include/hw/virtio/virtio-bus.h|  1 +
 include/hw/virtio/virtio.h|  9 ---
 9 files changed, 93 insertions(+), 36 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 37fe72b..6a2dbaf 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -860,7 +860,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 }
 }
 
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
+req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
 virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req);
 req->next = s->rq;
 s->rq = req;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7975c2c..d544cd9 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -732,6 +732,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 static int fetch_active_ports_list(QEMUFile *f,
VirtIOSerial *s, uint32_t nr_active_ports)
 {
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 uint32_t i;
 
 s->post_load = g_malloc0(sizeof(*s->post_load));
@@ -765,7 +766,7 @@ static int fetch_active_ports_list(QEMUFile *f,
 qemu_get_be64s(f, &port->iov_offset);
 
 port->elem =
-qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
+qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement));
 
 /*
  *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 4762f05..1519cee 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -198,12 +198,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, 
SCSIRequest *sreq)
 SCSIBus *bus = sreq->bus;
 VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 VirtIOSCSIReq *req;
 uint32_t n;
 
 qemu_get_be32s(f, &n);
 assert(n < vs->conf.num_queues);
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+req = qemu_get_virtqueue_element(vdev, f,
+ sizeof(VirtIOSCSIReq) + vs->cdb_size);
 virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
 
 if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 11f65bd..8597762 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -28,6 +28,7 @@
 #include "hw/qdev.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio.h"
+#include "exec/address-spaces.h"
 
 /* #define DEBUG_VIRTIO_BUS */
 
@@ -61,6 +62,13 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error 
**errp)
 if (klass->device_plugged != NULL) {
 klass->device_plugged(qbus->parent, errp);
 }
+
+if (klass->get_dma_as != NULL &&
+virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+vdev->dma_as = klass->get_dma_as(qbus->parent);
+} else {
+vdev->dma_as = &address_space_memory;
+}
 }
 
 /* Reset the virtio_bus */
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 06831de..6ceb43e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1168,6 +1168,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
 return proxy->nvectors;
 }
 
+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
+{
+VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+PCIDevice *dev = &proxy->pci_dev;
+
+return pci_get_address_space(dev);
+}
+
static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
  struct virtio_pci_cap *cap)

Re: [Qemu-block] [PATCH V2 02/11] virtio: convert to use DMA api

2016-11-04 Thread Jason Wang



On 2016-11-04 03:46, Michael S. Tsirkin wrote:

@@ -244,6 +245,7 @@ int virtio_queue_empty(VirtQueue *vq)
>  static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
> unsigned int len)
>  {
>+AddressSpace *dma_as = virtio_get_dma_as(vq->vdev);
>  unsigned int offset;
>  int i;

Can't we use

vdev->dma_as

here?

I'd like to avoid a query on the data path.

Same in most other places below.




Will use vdev->dma_as in next version.
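
Concretely, something like this in virtqueue_unmap_sg() and the other
data-path users (a sketch of the intended change, not the exact hunk):

-AddressSpace *dma_as = virtio_get_dma_as(vq->vdev);
+AddressSpace *dma_as = vq->vdev->dma_as;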

Thanks



[Qemu-block] [PATCH V2 02/11] virtio: convert to use DMA api

2016-11-03 Thread Jason Wang
Currently, all virtio devices bypass IOMMU completely. This is because
address_space_memory is assumed and used during DMA emulation. This
patch converts the virtio core API to use DMA API. This idea is

- introducing a new transport specific helper to query the dma address
  space. (only pci version is implemented).
- query and use this address space during virtio device guest memory
  accessing when iommu platform (VIRTIO_F_IOMMU_PLATFORM) was enabled
  for this device.

Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Amit Shah <amit.s...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c |  2 +-
 hw/char/virtio-serial-bus.c   |  3 ++-
 hw/scsi/virtio-scsi.c |  4 ++-
 hw/virtio/virtio-bus.c|  8 ++
 hw/virtio/virtio-pci.c| 14 ++
 hw/virtio/virtio.c| 57 +--
 include/hw/virtio/virtio-access.h | 36 ++---
 include/hw/virtio/virtio-bus.h|  1 +
 include/hw/virtio/virtio.h|  9 ---
 9 files changed, 98 insertions(+), 36 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 37fe72b..6a2dbaf 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -860,7 +860,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 }
 }
 
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
+req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
 virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req);
 req->next = s->rq;
 s->rq = req;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7975c2c..d544cd9 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -732,6 +732,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 static int fetch_active_ports_list(QEMUFile *f,
VirtIOSerial *s, uint32_t nr_active_ports)
 {
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 uint32_t i;
 
 s->post_load = g_malloc0(sizeof(*s->post_load));
@@ -765,7 +766,7 @@ static int fetch_active_ports_list(QEMUFile *f,
 qemu_get_be64s(f, &port->iov_offset);
 
 port->elem =
-qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
+qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement));
 
 /*
  *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 4762f05..1519cee 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -198,12 +198,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, 
SCSIRequest *sreq)
 SCSIBus *bus = sreq->bus;
 VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 VirtIOSCSIReq *req;
 uint32_t n;
 
 qemu_get_be32s(f, &n);
 assert(n < vs->conf.num_queues);
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+req = qemu_get_virtqueue_element(vdev, f,
+ sizeof(VirtIOSCSIReq) + vs->cdb_size);
 virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
 
 if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 11f65bd..8597762 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -28,6 +28,7 @@
 #include "hw/qdev.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio.h"
+#include "exec/address-spaces.h"
 
 /* #define DEBUG_VIRTIO_BUS */
 
@@ -61,6 +62,13 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error 
**errp)
 if (klass->device_plugged != NULL) {
 klass->device_plugged(qbus->parent, errp);
 }
+
+if (klass->get_dma_as != NULL &&
+virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+vdev->dma_as = klass->get_dma_as(qbus->parent);
+} else {
+vdev->dma_as = &address_space_memory;
+}
 }
 
 /* Reset the virtio_bus */
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 06831de..6ceb43e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1168,6 +1168,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
 return proxy->nvectors;
 }
 
+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
+{
+VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+PCIDevice *dev = &proxy->pci_dev;
+
+return pci_get_address_space(dev);
+}
+
static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
  struct virtio_pci_cap *cap)

Re: [Qemu-block] [Qemu-devel] [PATCH for 2.8 02/11] virtio: convert to use DMA api

2016-09-06 Thread Jason Wang



On 2016-09-05 10:26, Wei Xu wrote:

On 2016-08-30 11:06, Jason Wang wrote:
@@ -1587,6 +1595,11 @@ static void 
virtio_pci_device_plugged(DeviceState *d, Error **errp)

  }

  if (legacy) {
+if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+error_setg(errp, "VIRTIO_F_IOMMU_PLATFORM was supported by "
+   "neither legacy nor transitional device.");
+return;
+}


Not sure if I understand it correctly; the transitional device here
may be a bit hard to understand.


"Transitional" is defined by the spec.


Just a tip for your convenience:
besides refusing the request, can we add what kind of device is supported
to the message? Such as modern devices only, like this:


"VIRTIO_F_IOMMU_PLATFORM is supported by modern device only, it
is not supported by either legacy or transitional device."


Ok.




  /* legacy and transitional */
  pci_set_word(config + PCI_SUBSYSTEM_VENDOR_ID,
   pci_get_word(config + PCI_VENDOR_ID));
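
Something like this, then (a sketch of the reworded check; the exact wording
that gets merged may differ):

if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
    /* Reworded as suggested above. */
    error_setg(errp, "VIRTIO_F_IOMMU_PLATFORM is supported by modern "
               "devices only; it is not supported by either legacy or "
               "transitional devices.");
    return;
}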







Re: [Qemu-block] [PATCH for 2.8 02/11] virtio: convert to use DMA api

2016-08-30 Thread Jason Wang



On 2016-08-30 15:31, Cornelia Huck wrote:

On Tue, 30 Aug 2016 11:06:50 +0800
Jason Wang <jasow...@redhat.com> wrote:


Currently, all virtio devices bypass IOMMU completely. This is because
address_space_memory is assumed and used during DMA emulation. This
patch converts the virtio core API to use DMA API. This idea is

- introducing a new transport specific helper to query the dma address
   space. (only pci version is implemented).
- query and use this address space during virtio device guest memory
   accessing when iommu platform (VIRTIO_F_IOMMU_PLATFORM) was enabled
   for this device.

Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Amit Shah <amit.s...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
  hw/block/virtio-blk.c |  2 +-
  hw/char/virtio-serial-bus.c   |  3 +-
  hw/scsi/virtio-scsi.c |  4 ++-
  hw/virtio/virtio-pci.c| 14 +
  hw/virtio/virtio.c| 62 ---
  include/hw/virtio/virtio-access.h | 43 ---
  include/hw/virtio/virtio-bus.h|  1 +
  include/hw/virtio/virtio.h|  8 +++--
  8 files changed, 98 insertions(+), 39 deletions(-)

diff --git a/include/hw/virtio/virtio-access.h 
b/include/hw/virtio/virtio-access.h
index 440b455..4071dad 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -17,12 +17,25 @@
  #define QEMU_VIRTIO_ACCESS_H

  #include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-bus.h"
  #include "exec/address-spaces.h"

  #if defined(TARGET_PPC64) || defined(TARGET_ARM)
  #define LEGACY_VIRTIO_IS_BIENDIAN 1
  #endif

+static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev)
+{
+BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) &&
+k->get_dma_as) {
+return k->get_dma_as(qbus->parent);
+}
+return &address_space_memory;
+}

One thing I'm a bit worried about is that we're introducing a check
even if we know that the device will never support
VIRTIO_F_IOMMU_PLATFORM (i.e. virtio-ccw). The QOM incantations will
add cycles to any invocation of this.

Is the address space likely to change during device lifetime? Can we
cache it in some way?



I think so; we can cache it in vdev and set it when features are set.

Then we can avoid QOM stuff on the data path.
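
That is, roughly (a sketch; the V2 and later postings of this series, above,
end up doing exactly this once in virtio_bus_device_plugged() rather than at
feature negotiation time):

if (klass->get_dma_as != NULL &&
    virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
    vdev->dma_as = klass->get_dma_as(qbus->parent);
} else {
    vdev->dma_as = &address_space_memory;
}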




[Qemu-block] [PATCH for 2.8 02/11] virtio: convert to use DMA api

2016-08-29 Thread Jason Wang
Currently, all virtio devices bypass IOMMU completely. This is because
address_space_memory is assumed and used during DMA emulation. This
patch converts the virtio core API to use DMA API. This idea is

- introducing a new transport specific helper to query the dma address
  space. (only pci version is implemented).
- query and use this address space during virtio device guest memory
  accessing when iommu platform (VIRTIO_F_IOMMU_PLATFORM) was enabled
  for this device.

Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Amit Shah <amit.s...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c |  2 +-
 hw/char/virtio-serial-bus.c   |  3 +-
 hw/scsi/virtio-scsi.c |  4 ++-
 hw/virtio/virtio-pci.c| 14 +
 hw/virtio/virtio.c| 62 ---
 include/hw/virtio/virtio-access.h | 43 ---
 include/hw/virtio/virtio-bus.h|  1 +
 include/hw/virtio/virtio.h|  8 +++--
 8 files changed, 98 insertions(+), 39 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 331d766..8fd6df7 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -856,7 +856,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 }
 }
 
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
+req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
 virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req);
 req->next = s->rq;
 s->rq = req;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index db57a38..94f19ba 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -682,6 +682,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 static int fetch_active_ports_list(QEMUFile *f,
VirtIOSerial *s, uint32_t nr_active_ports)
 {
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 uint32_t i;
 
 s->post_load = g_malloc0(sizeof(*s->post_load));
@@ -715,7 +716,7 @@ static int fetch_active_ports_list(QEMUFile *f,
 qemu_get_be64s(f, &port->iov_offset);
 
 port->elem =
-qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
+qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement));
 
 /*
  *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index ce57ef6..4cc7627 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -197,12 +197,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, 
SCSIRequest *sreq)
 SCSIBus *bus = sreq->bus;
 VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 VirtIOSCSIReq *req;
 uint32_t n;
 
 qemu_get_be32s(f, &n);
 assert(n < vs->conf.num_queues);
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+req = qemu_get_virtqueue_element(vdev, f,
+ sizeof(VirtIOSCSIReq) + vs->cdb_size);
 virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
 
 if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 755f921..c10bf55 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1162,6 +1162,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
 return proxy->nvectors;
 }
 
+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
+{
+VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+PCIDevice *dev = &proxy->pci_dev;
+
+return pci_get_address_space(dev);
+}
+
 static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
struct virtio_pci_cap *cap)
 {
@@ -1587,6 +1595,11 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 }
 
 if (legacy) {
+if (virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+error_setg(errp, "VIRTIO_F_IOMMU_PLATFORM was supported by "
+   "neither legacy nor transitional device.");
+return;
+}
 /* legacy and transitional */
 pci_set_word(config + PCI_SUBSYSTEM_VENDOR_ID,
  pci_get_word(config + PCI_VENDOR_ID));
@@ -2452,6 +2465,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, 
void *data)
 k->ioeventfd_disabled = virtio_pci_ioeventfd_disabled;
 k->ioeventfd_set_disabled = virtio_pci_ioeventfd_set_disabled;
 k->ioeventfd_assign = virtio_pci_ioeventfd_assign;
+k->get_dma_as = virtio_pci_get_dma_as;

[Qemu-block] [RFC PATCH 1/8] virtio: convert to use DMA api

2016-03-25 Thread Jason Wang
Currently, all virtio devices bypass IOMMU completely. This is because
address_space_memory is assumed and used during DMA emulation. This
patch converts the virtio core API to use DMA API. This idea is

- introducing a new transport specific helper to query the dma address
  space. (only pci version is implemented).
- query and use this address space during virtio device guest memory
  accessing

With this, virtio devices will not bypass the IOMMU anymore. Tested with
intel_iommu=on/strict with:

- virtio guest DMA series posted in https://lkml.org/lkml/2015/10/28/64.
- vfio (unsafe interrupt mode) dpdk l2fwd in guest

TODO:
- Feature bit for this
- Implement this for all transports

Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Amit Shah <amit.s...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c |  2 +-
 hw/char/virtio-serial-bus.c   |  3 +-
 hw/scsi/virtio-scsi.c |  4 ++-
 hw/virtio/virtio-pci.c|  9 ++
 hw/virtio/virtio.c| 58 +++
 include/hw/virtio/virtio-access.h | 42 +---
 include/hw/virtio/virtio-bus.h|  1 +
 include/hw/virtio/virtio.h|  4 +--
 8 files changed, 85 insertions(+), 38 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cb710f1..9411f99 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -829,7 +829,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 
 while (qemu_get_sbyte(f)) {
 VirtIOBlockReq *req;
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
+req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
 virtio_blk_init_request(s, req);
 req->next = s->rq;
 s->rq = req;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 99cb683..bdc5393 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -687,6 +687,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 static int fetch_active_ports_list(QEMUFile *f, int version_id,
VirtIOSerial *s, uint32_t nr_active_ports)
 {
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 uint32_t i;
 
 s->post_load = g_malloc0(sizeof(*s->post_load));
@@ -722,7 +723,7 @@ static int fetch_active_ports_list(QEMUFile *f, int 
version_id,
 qemu_get_be64s(f, &port->iov_offset);
 
 port->elem =
-qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
+qemu_get_virtqueue_element(vdev, f, 
sizeof(VirtQueueElement));
 
 /*
  *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 0c30d2e..26ce701 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -196,12 +196,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, 
SCSIRequest *sreq)
 SCSIBus *bus = sreq->bus;
 VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 VirtIOSCSIReq *req;
 uint32_t n;
 
 qemu_get_be32s(f, &n);
 assert(n < vs->conf.num_queues);
-req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+req = qemu_get_virtqueue_element(vdev, f,
+ sizeof(VirtIOSCSIReq) + vs->cdb_size);
 virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
 
 if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 0dadb66..5508b1c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1211,6 +1211,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
 return proxy->nvectors;
 }
 
+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
+{
+VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+PCIDevice *dev = &proxy->pci_dev;
+
+return pci_get_address_space(dev);
+}
+
 static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
struct virtio_pci_cap *cap)
 {
@@ -2495,6 +2503,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, 
void *data)
 k->device_plugged = virtio_pci_device_plugged;
 k->device_unplugged = virtio_pci_device_unplugged;
 k->query_nvectors = virtio_pci_query_nvectors;
+k->get_dma_as = virtio_pci_get_dma_as;
 }
 
 static const TypeInfo virtio_pci_bus_info = {
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 08275a9..37c9951 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -21,6 +21,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "migration/mig

[Qemu-block] [PATCH V4 3/3] virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

2015-07-27 Thread Jason Wang
Chapter 6.3 of the spec says:


Transitional devices MUST offer, and if offered by the device
transitional drivers MUST accept the following:

VIRTIO_F_ANY_LAYOUT (27)


So this patch only clears VIRTIO_F_ANY_LAYOUT for legacy devices.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9acbc3a..1d3f26c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -731,7 +731,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
 virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
-virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
 if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
 if (s->conf.scsi) {
 error_setg(errp, "Virtio 1.0 does not support scsi passthrough!");
@@ -739,6 +738,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 }
 virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
 } else {
+virtio_clear_feature(&features, VIRTIO_F_ANY_LAYOUT);
 virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 }
 
-- 
2.1.4




[Qemu-block] [PATCH V3 3/3] virtio-blk: set VIRTIO_F_ANY_LAYOUT when 1.0 is supported

2015-07-22 Thread Jason Wang
Chapter 6.3 of the spec says:


Transitional devices MUST offer, and if offered by the device
transitional drivers MUST accept the following:

VIRTIO_F_ANY_LAYOUT (27)


So this patch sets VIRTIO_F_ANY_LAYOUT when 1.0 is supported.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 4716c3e..2e7a190 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -736,6 +736,7 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 error_setg(errp, "Virtio 1.0 does not support scsi passthrough!");
 return 0;
 }
+virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
 } else {
 virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 }
-- 
2.1.4




Re: [Qemu-block] [PATCH V2 2/5] virtio-blk: advertise scsi only when scsi is set

2015-07-15 Thread Jason Wang


On 07/15/2015 03:57 PM, Paolo Bonzini wrote:

 On 15/07/2015 07:29, Jason Wang wrote:
 Cc: Stefan Hajnoczi <stefa...@redhat.com>
 Cc: Kevin Wolf <kw...@redhat.com>
 Cc: qemu-block@nongnu.org
 Signed-off-by: Jason Wang <jasow...@redhat.com>
 ---
  hw/block/virtio-blk.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
 index 4c27974..761d763 100644
 --- a/hw/block/virtio-blk.c
 +++ b/hw/block/virtio-blk.c
 @@ -731,7 +731,9 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
 *vdev, uint64_t features,
  virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
  virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
  virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
 -virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 +if (s->conf.scsi) {
 +virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 +}
 This must be done for newer machine types only, or you change the guest
 ABI for scsi=off.  Effectively you have to split it into two properties,
 scsi and always_set_f_scsi.

 Paolo

And always_set_f_scsi is true only for legacy machine types?
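
I.e. something like this (a sketch of the suggested split; the property and
field names are hypothetical, taken from the suggestion above, and this is
not what V2 ends up doing: V2 keeps the single "scsi" property and flips its
default back to on for old machine types via HW_COMPAT_2_3 instead):

DEFINE_PROP_BIT("scsi", VirtIOBlock, conf.scsi, 0, false),
/* set to true only for legacy machine types via compat properties: */
DEFINE_PROP_BIT("always-set-f-scsi", VirtIOBlock, conf.always_set_f_scsi,
                0, false),

and then in virtio_blk_get_features():

if (s->conf.scsi || s->conf.always_set_f_scsi) {
    virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
}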




[Qemu-block] [PATCH V2 3/5] virtio-blk: disable scsi passthrough by default

2015-07-14 Thread Jason Wang
Disable SCSI passthrough by default since it is incompatible with
virtio 1.0. For legacy machine types, keep it on by default.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 2 +-
 include/hw/compat.h   | 6 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 761d763..362fe53 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -964,7 +964,7 @@ static Property virtio_blk_properties[] = {
 DEFINE_PROP_STRING("serial", VirtIOBlock, conf.serial),
 DEFINE_PROP_BIT("config-wce", VirtIOBlock, conf.config_wce, 0, true),
 #ifdef __linux__
-DEFINE_PROP_BIT("scsi", VirtIOBlock, conf.scsi, 0, true),
+DEFINE_PROP_BIT("scsi", VirtIOBlock, conf.scsi, 0, false),
 #endif
 DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
 true),
diff --git a/include/hw/compat.h b/include/hw/compat.h
index 4a43466..56039d8 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -2,7 +2,11 @@
 #define HW_COMPAT_H
 
 #define HW_COMPAT_2_3 \
-/* empty */
+{\
+.driver   = "virtio-blk-pci",\
+.property = "scsi",\
+.value    = "on",\
+},
 
 #define HW_COMPAT_2_2 \
 /* empty */
-- 
2.1.4




[Qemu-block] [PATCH V2 4/5] virtio-blk: fail the init when both 1.0 and scsi is set

2015-07-14 Thread Jason Wang
SCSI passthrough is no longer supported in 1.0, so fail the
initialization when the user wants both features.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 362fe53..4a0ef68 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -732,6 +732,10 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
*vdev, uint64_t features,
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
 virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
 if (s->conf.scsi) {
+if (__virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
+error_setg(errp, "Virtio 1.0 does not support scsi passthrough");
+return 0;
+}
 virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 }
 
-- 
2.1.4




[Qemu-block] [PATCH V2 2/5] virtio-blk: advertise scsi only when scsi is set

2015-07-14 Thread Jason Wang
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 4c27974..761d763 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -731,7 +731,9 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
 virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
-virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+if (s->conf.scsi) {
+virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+}
 
 if (s->conf.config_wce) {
 virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
-- 
2.1.4




Re: [Qemu-block] [Qemu-devel] [PATCH 2/5] virtio-blk: disable scsi passthrough for 1.0 device

2015-07-13 Thread Jason Wang


On 07/13/2015 03:46 PM, Michael S. Tsirkin wrote:
 On Mon, Jul 13, 2015 at 01:46:48PM +0800, Jason Wang wrote:
 VIRTIO_BLK_F_SCSI is no longer supported in 1.0, so disable it.

 Cc: Stefan Hajnoczi <stefa...@redhat.com>
 Cc: Kevin Wolf <kw...@redhat.com>
 Cc: qemu-block@nongnu.org
 Signed-off-by: Jason Wang <jasow...@redhat.com>
 Interesting, I noticed we have a field scsi - see
   commit 1ba1f2e319afdcb485963cd3f426fdffd1b725f2
   Author: Paolo Bonzini pbonz...@redhat.com
   Date:   Fri Dec 23 15:39:03 2011 +0100

   virtio-blk: refuse SG_IO requests with scsi=off

 but it doesn't seem to be propagated to guest features in
 any way.

 Maybe we should fix that, making that flag AutoOnOff?

Looks OK, but auto may need some compat work since the default is true.

 Then, if the user explicitly requested scsi=on with a modern
 interface, we can error out cleanly.

 Given the scsi flag is currently ignored, I think
 this can be a patch on top.

Looks like virtio_blk_handle_scsi_req() checks this:

if (!blk->conf.scsi) {
status = VIRTIO_BLK_S_UNSUPP;
goto fail;
}


 ---
  hw/block/virtio-blk.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
 index 6aefda4..f30ad25 100644
 --- a/hw/block/virtio-blk.c
 +++ b/hw/block/virtio-blk.c
 @@ -730,7 +730,8 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
 *vdev, uint64_t features)
  virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
  virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
  virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
 -virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 +if (!__virtio_has_feature(features, VIRTIO_F_VERSION_1))
 +virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
  
  if (s->conf.config_wce) {
  virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
 -- 
 2.1.4




[Qemu-block] [PATCH 2/5] virtio-blk: disable scsi passthrough for 1.0 device

2015-07-12 Thread Jason Wang
VIRTIO_BLK_F_SCSI is no longer supported in 1.0, so disable it.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 6aefda4..f30ad25 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -730,7 +730,8 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features)
 virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
 virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
-virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+if (!__virtio_has_feature(features, VIRTIO_F_VERSION_1))
+virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 
 if (s->conf.config_wce) {
 virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
-- 
2.1.4




[Qemu-block] [PATCH 3/5] virtio-blk: set VIRTIO_F_ANY_LAYOUT when 1.0 is supported

2015-07-12 Thread Jason Wang
Chapter 6.3 of the spec says:


Transitional devices MUST offer, and if offered by the device
transitional drivers MUST accept the following:

VIRTIO_F_ANY_LAYOUT (27)


So this patch sets VIRTIO_F_ANY_LAYOUT when 1.0 is supported.

Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 hw/block/virtio-blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f30ad25..0f07e25 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -732,6 +732,8 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features)
 virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
 if (!__virtio_has_feature(features, VIRTIO_F_VERSION_1))
 virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
+else
+virtio_add_feature(&features, VIRTIO_F_ANY_LAYOUT);
 
 if (s->conf.config_wce) {
 virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
-- 
2.1.4