Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps

2022-04-01 Thread Michael Qiu




On 2022/4/2 10:38, Jason Wang wrote:


On 2022/4/1 7:06 PM, Michael Qiu wrote:

Currently in the vhost framework, vhost_reset_device() is misnamed.
Actually, it should be vhost_reset_owner().

In vhost-user, it is kept compatible with the reset device op, but
the vhost kernel backend is not compatible with it, and vhost-vdpa
only implements the reset device action.

So we need to separate the function into vhost_reset_owner() and
vhost_reset_device(), so that each backend can use the correct
function.



I see no reason why RESET_OWNER needs to be done for the kernel backend.



In kernel vhost, RESET_OWNER indeed does a vhost device level reset:
vhost_net_reset_owner()


static long vhost_net_reset_owner(struct vhost_net *n)
{
        [...]
        err = vhost_dev_check_owner(&n->dev);
        if (err)
                goto done;
        umem = vhost_dev_reset_owner_prepare();
        if (!umem) {
                err = -ENOMEM;
                goto done;
        }
        vhost_net_stop(n, &tx_sock, &rx_sock);
        vhost_net_flush(n);
        vhost_dev_stop(&n->dev);
        vhost_dev_reset_owner(&n->dev, umem);
        vhost_net_vq_reset(n);
        [...]
}

In the history of QEMU, there is a commit:
commit d1f8b30ec8dde0318fd1b98d24a64926feae9625
Author: Yuanhan Liu 
Date:   Wed Sep 23 12:19:57 2015 +0800

vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE

Quote from Michael:

We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE.

But it was finally reverted by the author:
commit 60915dc4691768c4dc62458bb3e16c843fab091d
Author: Yuanhan Liu 
Date:   Wed Nov 11 21:24:37 2015 +0800

vhost: rename RESET_DEVICE backto RESET_OWNER

This patch basically reverts commit d1f8b30e.

It turned out that it breaks stuff, so revert it:

http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html

It seems the kernel takes RESET_OWNER as a reset, but QEMU never calls
this function to do a reset.


And if I understand the code correctly, vhost-user "abuses" RESET_OWNER 
for reset. So the current code looks fine?





Signed-off-by: Michael Qiu 
---
  hw/scsi/vhost-user-scsi.c |  6 +-
  hw/virtio/vhost-backend.c |  4 ++--
  hw/virtio/vhost-user.c    | 22 ++
  include/hw/virtio/vhost-backend.h |  2 ++
  4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 1b2f7ee..f179626 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
         return;
     }

-    if (dev->vhost_ops->vhost_reset_device) {
+    if (virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
+        dev->vhost_ops->vhost_reset_device) {
         dev->vhost_ops->vhost_reset_device(dev);
+    } else if (dev->vhost_ops->vhost_reset_owner) {
+        dev->vhost_ops->vhost_reset_owner(dev);



Actually, I fail to understand why we need an indirection via vhost_ops. 
It's guaranteed to be vhost_user_ops.




  }
  }
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index e409a86..abbaa8b 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)

  return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
  }
-static int vhost_kernel_reset_device(struct vhost_dev *dev)
+static int vhost_kernel_reset_owner(struct vhost_dev *dev)
  {
  return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
  }
@@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
  .vhost_get_features = vhost_kernel_get_features,
  .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
  .vhost_set_owner = vhost_kernel_set_owner,
-    .vhost_reset_device = vhost_kernel_reset_device,
+    .vhost_reset_owner = vhost_kernel_reset_owner,



I think we can delete the current vhost_reset_device() since it is not
used in any code path.




I planned to use it for vDPA reset, and vhost-user-scsi also uses device 
reset.


Thanks,
Michael


Thanks



  .vhost_get_vq_index = vhost_kernel_get_vq_index,
  #ifdef CONFIG_VHOST_VSOCK
  .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 6abbc9d..4412008 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct vhost_dev *dev,

  return 0;
  }
+static int vhost_user_reset_owner(struct vhost_dev *dev)
+{
+    VhostUserMsg msg = {
+    .hdr.request = VHOST_USER_RESET_OWNER,
+    .hdr.flags = VHOST_USER_VERSION,
+    };
+
+    return vhost_user_write(dev, &msg, NULL, 0);
+}
+
  static int vhost_user_reset_device(struct vhost_dev *dev)
  {
  VhostUserMsg msg = {
+    .hdr.request = VHOST_USER_RESET_DEVICE,
    .hdr.flags = VHOST_USER_VERSION,

Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()

2022-04-01 Thread Michael Qiu




On 2022/4/2 10:20, Jason Wang wrote:

Adding Michael.

On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu  wrote:




On 3/31/2022 7:53 PM, Jason Wang wrote:

On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu  wrote:

Currently, when a VM powers off, it triggers a vdpa
device (such as a mlx bluefield2 VF) reset many times (with 1 datapath
queue pair and one control queue, it is triggered 3 times), which
leads to the issue below:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This is because vhost_net_stop() stops all vhost devices bound to
this virtio device, and in vhost_dev_stop(), qemu tries to stop the
device, then stop the queue: vhost_virtqueue_stop().

In vhost_dev_stop(), it resets the device, which clears some flags
in the low level driver, and in the next loop (stopping the other
vhost backends), qemu tries to stop the queue corresponding to that
vhost backend; the driver finds that the VQ is invalid. This is the
root cause.

To solve the issue, vdpa should set the vring unready, and
remove the reset ops in device stop: vhost_dev_start(hdev, false).

and implement a new function vhost_dev_reset, only reset backend
device after all vhost(per-queue) stoped.

Typo.


Signed-off-by: Michael Qiu
Acked-by: Jason Wang 

Rethinking this patch: consider that there are devices that don't support
set_vq_ready(). I wonder if we need

1) uAPI to tell the user space whether or not it supports set_vq_ready()

I guess what's more relevant here is to define the uAPI semantics for
unready i.e. set_vq_ready(0) for resuming/stopping virtqueue processing,
as starting vq is comparatively less ambiguous.


Yes.


Considering the
likelihood that this interface may be used for live migration, it would
be nice to come up with variants such as 1) discard inflight request
v.s. 2) waiting for inflight processing to be done,


Or inflight descriptor reporting (which seems to be tricky). But we
can start from net that a discarding may just work.


and 3) timeout in
waiting.


Actually, that's the plan and Eugenio is proposing something like this
via virtio spec:

https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
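
To make the variants concrete, they could be spelled out roughly as below.
This is purely a hypothetical sketch of a possible uAPI shape, not an
existing interface and not the spec proposal linked above:

/* Hypothetical illustration of the stop semantics discussed above;
 * these names are placeholders, not an existing or proposed uAPI. */
enum vhost_vq_stop_mode {
    VHOST_VQ_STOP_DISCARD_INFLIGHT, /* 1) discard in-flight requests */
    VHOST_VQ_STOP_WAIT_INFLIGHT,    /* 2) wait for in-flight processing */
    VHOST_VQ_STOP_WAIT_TIMEOUT,     /* 3) wait, bounded by a timeout */
};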




2) userspace will call SET_VRING_ENABLE() when the device supports
otherwise it will use RESET.

Are you looking to making virtqueue resume-able through the new
SET_VRING_ENABLE() uAPI?

I think RESET is inevitable in some cases, e.g. when the guest initiates
a device reset by writing 0 to the status register.


Yes, that's all my plan.


For suspend/resume and
live migration use cases, indeed RESET can be substituted with
SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
accommodate this change. Although I'm all for it, it'd be the best to
lay out the plan for multiple phases rather than overload this single
patch too much. You can count my time on this endeavor if you don't mind. :)


You're welcome, I agree we should choose a way to go first:

1) manage to use SET_VRING_ENABLE (more like a workaround anyway)
2) go with virtio-spec (may take a while)
3) don't wait for the spec, have a vDPA specific uAPI first. Note that
I've chatted with most of the vendors and they seem to be fine with
the _S_STOP. If we go this way, we can still provide the forward
compatibility of _S_STOP
4) or do them all (in parallel)

Any thoughts?



virtio-spec should be long-term: not only does the spec move very 
slowly, but hardware upgrades would also be a problem.


For the short term, better to take the first one?

Thanks,
Michael

Thanks





And for safety, I suggest tagging this as 7.1.

+1

Regards,
-Siwei




---
v4 --> v3
  Nothing changed. Because of an issue with mimecast,
  when the From: tag is different from the sender,
  some mail clients will take the patch as an
  attachment; RESEND v3 did not work, so resend
  the patch as v4

v3 --> v2:
  Call vhost_dev_reset() at the end of vhost_net_stop().

  Since the vDPA device needs to re-add the status bits
  VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
  simply add them inside vhost_vdpa_reset_device(); the
  only caller of vhost_vdpa_reset_device() is
  vhost_net_stop(), so it keeps the same behavior as before.

v2 --> v1:
 Implement a new function vhost_dev_reset(),
 resetting the backend kernel device last.
---
   hw/net/vhost_net.c| 24 +---
   hw/virtio/vhost-vdpa.c| 15 +--
   hw/virtio/vhost.c | 15 ++-
   include/hw/virtio/vhost.h |  1 +
   4 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..422c9bf 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
   int total_notifiers = data_queue_pairs * 2 + cvq;
   VirtIONet *n = VIRTIO_NET(dev);
   int nvhosts = data_queue_pairs + cvq;
-struct vhost_net *net;
+struct vhost_net *net = NULL;
   int r, e, i, index_end = data_queue_pairs * 2;
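
For context, a minimal sketch of the vhost_dev_reset() described in the
changelog, assuming only the semantics stated there (reset the backend
once, after every per-queue vhost device has stopped); the actual patch
may differ:

/* Sketch only: reset the backend exactly once, intended to be called
 * at the end of vhost_net_stop() after vhost_dev_stop() has run for
 * every queue pair. */
int vhost_dev_reset(struct vhost_dev *hdev)
{
    int ret = 0;

    if (hdev->vhost_ops->vhost_reset_device) {
        ret = hdev->vhost_ops->vhost_reset_device(hdev);
    }

    return ret;
}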
  

Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()

2022-04-01 Thread Michael Qiu




On 2022/4/2 9:48, Jason Wang wrote:

On Fri, Apr 1, 2022 at 11:22 AM Michael Qiu  wrote:




On 2022/4/1 10:53, Jason Wang wrote:

On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu  wrote:


Currently, when a VM powers off, it triggers a vdpa
device (such as a mlx bluefield2 VF) reset many times (with 1 datapath
queue pair and one control queue, it is triggered 3 times), which
leads to the issue below:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This is because vhost_net_stop() stops all vhost devices bound to
this virtio device, and in vhost_dev_stop(), qemu tries to stop the
device, then stop the queue: vhost_virtqueue_stop().

In vhost_dev_stop(), it resets the device, which clears some flags
in the low level driver, and in the next loop (stopping the other
vhost backends), qemu tries to stop the queue corresponding to that
vhost backend; the driver finds that the VQ is invalid. This is the
root cause.

To solve the issue, vdpa should set the vring unready, and
remove the reset ops in device stop: vhost_dev_start(hdev, false).

and implement a new function vhost_dev_reset, only reset backend
device after all vhost(per-queue) stoped.


Typo.



Signed-off-by: Michael Qiu
Acked-by: Jason Wang 


Rethinking this patch: consider that there are devices that don't support
set_vq_ready(). I wonder if we need

1) uAPI to tell the user space whether or not it supports set_vq_ready()
2) userspace will call SET_VRING_ENABLE() when the device supports
otherwise it will use RESET.


if the device does not support set_vq_ready() in the kernel, it will trigger
a kernel oops; at least the current kernel does not check whether
set_vq_ready has been implemented.

And I checked all vdpa drivers in the kernel; all of them have implemented
this op.


Actually, it's not about whether or not the set_vq_ready() is
implemented. It's about whether the parent supports it correctly:

The ability to suspend and resume a virtqueue is currently beyond the
ability of some transports (e.g. PCI).



OK, Got it


For IFCVF:

static void ifcvf_vdpa_set_vq_ready(struct vdpa_device *vdpa_dev,
 u16 qid, bool ready)
{
 struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);

 vf->vring[qid].ready = ready;
}

It seems to follow the PCI transport, so if you just set it to zero,
it simply doesn't work at all. I can see some tricks that are used in
the DPDK driver, maybe we can use the same to "fix" this.

For VDUSE, we are basically the same:

static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa,
 u16 idx, bool ready)
{
 struct vduse_dev *dev = vdpa_to_vduse(vdpa);
 struct vduse_virtqueue *vq = &dev->vqs[idx];

 vq->ready = ready;
}

It can't be stopped correctly if we just set it to zero.

For vp_vdpa, it basically wants to abuse the queue_enable, which may
result in a warning in Qemu (and the device isn't stopped).

static void vp_vdpa_set_vq_ready(struct vdpa_device *vdpa,
  u16 qid, bool ready)
{
 struct virtio_pci_modern_device *mdev = vdpa_to_mdev(vdpa);

 vp_modern_set_queue_enable(mdev, qid, ready);
}

ENI did a trick in writing 0 to virtqueue address, so it works for
stop but not the start.

static void eni_vdpa_set_vq_ready(struct vdpa_device *vdpa, u16 qid,
   bool ready)
{
 struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);

 /* ENI is a legacy virtio-pci device. This is not supported
  * by specification. But we can disable virtqueue by setting
  * address to 0.
  */
 if (!ready)
 vp_legacy_set_queue_address(ldev, qid, 0);
}

mlx5 call suspend_vq() which should be fine.

Simulator is probably fine.

So I worry if we use the set_vq_ready(0) it won't work correctly and
will have other issues. The idea is:

- advertise the suspend/resume capability via uAPI, then mlx5_vdpa and
simulator can go with set_vq_ready()
- others can still go with reset(), and we can try to fix them
gradually (and introduce this in the virtio spec).



Totally agree.



So I think it is OK to call set_vq_ready without check.



And for safety, I suggest tagging this as 7.1.


---
v4 --> v3
  Nothing changed. Because of an issue with mimecast,
  when the From: tag is different from the sender,
  some mail clients will take the patch as an
  attachment; RESEND v3 did not work, so resend
  the patch as v4

v3 --> v2:
  Call vhost_dev_reset() at the end of vhost_net_stop().

  Since the vDPA device needs to re-add the status bits
  VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
  simply add them inside vhost_vdpa_reset_device(); the
  only caller of vhost_vdpa_reset_device() is
  vhost_net_stop(), so it keeps the same behavior as before.

v2 --> v1:
 Implement a new function vhost_dev_reset(),
 resetting the backend kernel device last.
---
   hw/net/vhost_net.c| 24 

Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps

2022-04-01 Thread Jason Wang



On 2022/4/1 7:06 PM, Michael Qiu wrote:

Currently in the vhost framework, vhost_reset_device() is misnamed.
Actually, it should be vhost_reset_owner().

In vhost-user, it is kept compatible with the reset device op, but
the vhost kernel backend is not compatible with it, and vhost-vdpa
only implements the reset device action.

So we need to separate the function into vhost_reset_owner() and
vhost_reset_device(), so that each backend can use the correct
function.



I see no reason why RESET_OWNER needs to be done for the kernel backend.

And if I understand the code correctly, vhost-user "abuses" RESET_OWNER 
for reset. So the current code looks fine?





Signed-off-by: Michael Qiu 
---
  hw/scsi/vhost-user-scsi.c |  6 +-
  hw/virtio/vhost-backend.c |  4 ++--
  hw/virtio/vhost-user.c| 22 ++
  include/hw/virtio/vhost-backend.h |  2 ++
  4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 1b2f7ee..f179626 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
  return;
  }
  
-    if (dev->vhost_ops->vhost_reset_device) {
+    if (virtio_has_feature(dev->protocol_features,
+                           VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
+        dev->vhost_ops->vhost_reset_device) {
         dev->vhost_ops->vhost_reset_device(dev);
+    } else if (dev->vhost_ops->vhost_reset_owner) {
+        dev->vhost_ops->vhost_reset_owner(dev);



Actually, I fail to understand why we need an indirection via vhost_ops. 
It's guaranteed to be vhost_user_ops.




  }
  }
  
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c

index e409a86..abbaa8b 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
  return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
  }
  
-static int vhost_kernel_reset_device(struct vhost_dev *dev)

+static int vhost_kernel_reset_owner(struct vhost_dev *dev)
  {
  return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
  }
@@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
  .vhost_get_features = vhost_kernel_get_features,
  .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
  .vhost_set_owner = vhost_kernel_set_owner,
-.vhost_reset_device = vhost_kernel_reset_device,
+.vhost_reset_owner = vhost_kernel_reset_owner,



I think we can delete the current vhost_reset_device() since it is not
used in any code path.


Thanks



  .vhost_get_vq_index = vhost_kernel_get_vq_index,
  #ifdef CONFIG_VHOST_VSOCK
  .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 6abbc9d..4412008 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct vhost_dev *dev,
  return 0;
  }
  
+static int vhost_user_reset_owner(struct vhost_dev *dev)

+{
+VhostUserMsg msg = {
+.hdr.request = VHOST_USER_RESET_OWNER,
+.hdr.flags = VHOST_USER_VERSION,
+};
+
+return vhost_user_write(dev, &msg, NULL, 0);
+}
+
  static int vhost_user_reset_device(struct vhost_dev *dev)
  {
  VhostUserMsg msg = {
+.hdr.request = VHOST_USER_RESET_DEVICE,
  .hdr.flags = VHOST_USER_VERSION,
  };
  
-msg.hdr.request = virtio_has_feature(dev->protocol_features,

- VHOST_USER_PROTOCOL_F_RESET_DEVICE)
-? VHOST_USER_RESET_DEVICE
-: VHOST_USER_RESET_OWNER;
+/* Caller must ensure the backend has VHOST_USER_PROTOCOL_F_RESET_DEVICE
+ * support */
+if (!virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
+return -EPERM;
+}
  
  return vhost_user_write(dev, &msg, NULL, 0);

  }
@@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
  .vhost_set_features = vhost_user_set_features,
  .vhost_get_features = vhost_user_get_features,
  .vhost_set_owner = vhost_user_set_owner,
+.vhost_reset_owner = vhost_user_reset_owner,
  .vhost_reset_device = vhost_user_reset_device,
  .vhost_get_vq_index = vhost_user_get_vq_index,
  .vhost_set_vring_enable = vhost_user_set_vring_enable,
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index 81bf310..affeeb0 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
   uint64_t *features);
  typedef int (*vhost_set_backend_cap_op)(struct vhost_dev *dev);
  typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
+typedef int (*vhost_reset_owner_op)(struct vhost_dev *dev);

Re: [PATCH v5 3/4] hw/intc/exynos4210: replace 'qemu_split_irq' in combiner and gic

2022-04-01 Thread Zongyuan Li
On Fri, Apr 1, 2022 at 9:35 PM Peter Maydell  wrote:
>
> On Thu, 24 Mar 2022 at 18:16, Zongyuan Li  wrote:
> >
> > Signed-off-by: Zongyuan Li 
> > ---
> >  hw/arm/exynos4210.c   | 26 +++
> >  hw/intc/exynos4210_combiner.c | 81 +++
> >  hw/intc/exynos4210_gic.c  | 36 +---
> >  include/hw/arm/exynos4210.h   | 11 ++---
> >  include/hw/core/split-irq.h   |  5 +--
> >  5 files changed, 126 insertions(+), 33 deletions(-)
>
> Looking at this patch, I think it's ended up quite complicated
> because the exynos4210 code as it stands today is doing
> some rather odd things with interrupts. I'm going to have a
> go at some refactoring patches which try to clean some of that
> oddness up into code more like what we'd write today, which
> should make getting rid of the use of qemu_split_irq() a bit easier.

Should we wait for the refactoring to be done and rebase this patch set on it,
or just remove the exynos4210-related code? I'd be glad to help with the
refactoring if there's anything I can do.

Thanks for your review.
Li



Re: [PATCH v3 1/4] hw/arm/virt: Consider SMP configuration in CPU topology

2022-04-01 Thread wangyanan (Y)

On 2022/3/30 20:50, Igor Mammedov wrote:

On Sat, 26 Mar 2022 02:49:59 +0800
Gavin Shan  wrote:


Hi Igor,

On 3/25/22 9:19 PM, Igor Mammedov wrote:

On Wed, 23 Mar 2022 15:24:35 +0800
Gavin Shan  wrote:

Currently, the SMP configuration isn't considered when the CPU
topology is populated. In this case, it's impossible to provide
the default CPU-to-NUMA mapping or association based on the socket
ID of the given CPU.

This takes account of SMP configuration when the CPU topology
is populated. The die ID for the given CPU isn't assigned since
it's not supported on arm/virt machine yet. Besides, the cluster
ID for the given CPU is assigned because it has been supported
on arm/virt machine.

Signed-off-by: Gavin Shan 
---
   hw/arm/virt.c | 11 +++
   qapi/machine.json |  6 --
   2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d2e5ecd234..064eac42f7 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2505,6 +2505,7 @@ static const CPUArchIdList 
*virt_possible_cpu_arch_ids(MachineState *ms)
   int n;
   unsigned int max_cpus = ms->smp.max_cpus;
   VirtMachineState *vms = VIRT_MACHINE(ms);
+MachineClass *mc = MACHINE_GET_CLASS(vms);
   
   if (ms->possible_cpus) {

   assert(ms->possible_cpus->len == max_cpus);
@@ -2518,6 +2519,16 @@ static const CPUArchIdList 
*virt_possible_cpu_arch_ids(MachineState *ms)
   ms->possible_cpus->cpus[n].type = ms->cpu_type;
   ms->possible_cpus->cpus[n].arch_id =
   virt_cpu_mp_affinity(vms, n);
+
+assert(!mc->smp_props.dies_supported);
+ms->possible_cpus->cpus[n].props.has_socket_id = true;
+ms->possible_cpus->cpus[n].props.socket_id =
+n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
+ms->possible_cpus->cpus[n].props.has_cluster_id = true;
+ms->possible_cpus->cpus[n].props.cluster_id =
+n / (ms->smp.cores * ms->smp.threads);

is there any relation between the cluster values and number of clusters here
and what virt_cpu_mp_affinity() calculates?
   

They're different clusters. The cluster returned by virt_cpu_mp_affinity()
is reflected in the MPIDR_EL1 system register, which is mainly used by the
VGIC2/3 interrupt controller to send group interrupts to the CPU cluster. It's
notable that the value returned from virt_cpu_mp_affinity() is always
overridden by KVM. It means this value is only used by TCG for the emulated
GIC2/GIC3.

The cluster in 'ms->possible_cpus' is passed to ACPI PPTT table to populate
the CPU topology.



+ms->possible_cpus->cpus[n].props.has_core_id = true;
+ms->possible_cpus->cpus[n].props.core_id = n / ms->smp.threads;
   

   ms->possible_cpus->cpus[n].props.has_thread_id = true;
   ms->possible_cpus->cpus[n].props.thread_id = n;

Of course the target has the right to decide how to allocate IDs, and mgmt
is supposed to query these IDs before using them.
But:
   * IDs within 'props' are supposed to be arch defined.
     (on x86 IDs are in range [0-smp.foo_id), on ppc it's something different)
     The question is what real hardware does here in the ARM case (i.e.
     how .../cores/threads are described on bare-metal)?


On an ARM64 bare-metal machine, the core/cluster ID assignment is pretty arbitrary.
I checked the CPU topology on my bare-metal machine, which has the following SMP
configuration.

  # lscpu
:
  Thread(s) per core: 4
  Core(s) per socket: 28
  Socket(s):  2

  smp.sockets  = 2
  smp.clusters = 1
  smp.cores= 56   (28 per socket)
  smp.threads  = 4

  // CPU0-111 belongs to socket0 or package0
  // CPU112-223 belongs to socket1 or package1
  # cat /sys/devices/system/cpu/cpu0/topology/package_cpus
  ,,,,,,
  # cat /sys/devices/system/cpu/cpu111/topology/package_cpus
  ,,,,,,
  # cat /sys/devices/system/cpu/cpu112/topology/package_cpus
  ,,,,,,
  # cat /sys/devices/system/cpu/cpu223/topology/package_cpus
  ,,,,,,

  // core/cluster ID spans from 0 to 27 on socket0
  # for i in `seq 0 27`; do cat 
/sys/devices/system/cpu/cpu$i/topology/core_id; done
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
  # for i in `seq 28 55`; do cat 
/sys/devices/system/cpu/cpu$i/topology/core_id; done
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
  # for i in `seq 0 27`; do cat 
/sys/devices/system/cpu/cpu$i/topology/cluster_id; done
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
  # for i in `seq 28 55`; do cat 
/sys/devices/system/cpu/cpu$i/topology/cluster_id; done
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()

2022-04-01 Thread Jason Wang
Adding Michael.

On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu  wrote:
>
>
>
> On 3/31/2022 7:53 PM, Jason Wang wrote:
> > On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu  wrote:
> >> Currently, when a VM powers off, it triggers a vdpa
> >> device (such as a mlx bluefield2 VF) reset many times (with 1 datapath
> >> queue pair and one control queue, it is triggered 3 times), which
> >> leads to the issue below:
> >>
> >> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
> >>
> >> This is because vhost_net_stop() stops all vhost devices bound to
> >> this virtio device, and in vhost_dev_stop(), qemu tries to stop the
> >> device, then stop the queue: vhost_virtqueue_stop().
> >>
> >> In vhost_dev_stop(), it resets the device, which clears some flags
> >> in the low level driver, and in the next loop (stopping the other
> >> vhost backends), qemu tries to stop the queue corresponding to that
> >> vhost backend; the driver finds that the VQ is invalid. This is the
> >> root cause.
> >>
> >> To solve the issue, vdpa should set the vring unready, and
> >> remove the reset ops in device stop: vhost_dev_start(hdev, false).
> >>
> >> and implement a new function vhost_dev_reset, only reset backend
> >> device after all vhost(per-queue) stoped.
> > Typo.
> >
> >> Signed-off-by: Michael Qiu
> >> Acked-by: Jason Wang 
> > Rethink this patch, consider there're devices that don't support
> > set_vq_ready(). I wonder if we need
> >
> > 1) uAPI to tell the user space whether or not it supports set_vq_ready()
> I guess what's more relevant here is to define the uAPI semantics for
> unready i.e. set_vq_ready(0) for resuming/stopping virtqueue processing,
> as starting vq is comparatively less ambiguous.

Yes.

> Considering the
> likelihood that this interface may be used for live migration, it would
> be nice to come up with variants such as 1) discard inflight request
> v.s. 2) waiting for inflight processing to be done,

Or inflight descriptor reporting (which seems to be tricky). But we
can start from net that a discarding may just work.

>and 3) timeout in
> waiting.

Actually, that's the plan and Eugenio is proposing something like this
via virtio spec:

https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html

>
> > 2) userspace will call SET_VRING_ENABLE() when the device supports
> > otherwise it will use RESET.
> Are you looking to making virtqueue resume-able through the new
> SET_VRING_ENABLE() uAPI?
>
> I think RESET is inevitable in some cases, e.g. when the guest initiates
> a device reset by writing 0 to the status register.

Yes, that's all my plan.

> For suspend/resume and
> live migration use cases, indeed RESET can be substituted with
> SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
> accommodate this change. Although I'm all for it, it'd be the best to
> lay out the plan for multiple phases rather than overload this single
> patch too much. You can count my time on this endeavor if you don't mind. :)

You're welcome, I agree we should choose a way to go first:

1) manage to use SET_VRING_ENABLE (more like a workaround anyway)
2) go with virtio-spec (may take a while)
3) don't wait for the spec, have a vDPA specific uAPI first. Note that
I've chatted with most of the vendors and they seem to be fine with
the _S_STOP. If we go this way, we can still provide the forward
compatibility of _S_STOP
4) or do them all (in parallel)

Any thoughts?

Thanks

>
> >
> > And for safety, I suggest tagging this as 7.1.
> +1
>
> Regards,
> -Siwei
>
> >
> >> ---
> >> v4 --> v3
> >>  Nothing changed. Because of an issue with mimecast,
> >>  when the From: tag is different from the sender,
> >>  some mail clients will take the patch as an
> >>  attachment; RESEND v3 did not work, so resend
> >>  the patch as v4
> >>
> >> v3 --> v2:
> >>  Call vhost_dev_reset() at the end of vhost_net_stop().
> >>
> >>  Since the vDPA device needs to re-add the status bits
> >>  VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
> >>  simply add them inside vhost_vdpa_reset_device(); the
> >>  only caller of vhost_vdpa_reset_device() is
> >>  vhost_net_stop(), so it keeps the same behavior as before.
> >>
> >> v2 --> v1:
> >> Implement a new function vhost_dev_reset(),
> >> resetting the backend kernel device last.
> >> ---
> >>   hw/net/vhost_net.c| 24 +---
> >>   hw/virtio/vhost-vdpa.c| 15 +--
> >>   hw/virtio/vhost.c | 15 ++-
> >>   include/hw/virtio/vhost.h |  1 +
> >>   4 files changed, 45 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >> index 30379d2..422c9bf 100644
> >> --- a/hw/net/vhost_net.c
> >> +++ b/hw/net/vhost_net.c
> >> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
> >> *ncs,
> >>   int total_notifiers = data_queue_pairs * 2 + cvq;
> >>   VirtIONet *n = VIRTIO_NET(dev);
> >>   int nvhosts = data_queue_pairs + cvq;
> >> -struct vh

Re: [PATCH v3 1/4] hw/arm/virt: Consider SMP configuration in CPU topology

2022-04-01 Thread wangyanan (Y)

Hi Gavin,

On 2022/3/23 15:24, Gavin Shan wrote:

Currently, the SMP configuration isn't considered when the CPU
topology is populated. In this case, it's impossible to provide
the default CPU-to-NUMA mapping or association based on the socket
ID of the given CPU.

This takes account of SMP configuration when the CPU topology
is populated. The die ID for the given CPU isn't assigned since
it's not supported on arm/virt machine yet. Besides, the cluster
ID for the given CPU is assigned because it has been supported
on arm/virt machine.

Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 11 +++
  qapi/machine.json |  6 --
  2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d2e5ecd234..064eac42f7 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2505,6 +2505,7 @@ static const CPUArchIdList 
*virt_possible_cpu_arch_ids(MachineState *ms)
  int n;
  unsigned int max_cpus = ms->smp.max_cpus;
  VirtMachineState *vms = VIRT_MACHINE(ms);
+MachineClass *mc = MACHINE_GET_CLASS(vms);
  
  if (ms->possible_cpus) {

  assert(ms->possible_cpus->len == max_cpus);
@@ -2518,6 +2519,16 @@ static const CPUArchIdList 
*virt_possible_cpu_arch_ids(MachineState *ms)
  ms->possible_cpus->cpus[n].type = ms->cpu_type;
  ms->possible_cpus->cpus[n].arch_id =
  virt_cpu_mp_affinity(vms, n);
+
+assert(!mc->smp_props.dies_supported);
+ms->possible_cpus->cpus[n].props.has_socket_id = true;
+ms->possible_cpus->cpus[n].props.socket_id =
+n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
+ms->possible_cpus->cpus[n].props.has_cluster_id = true;
+ms->possible_cpus->cpus[n].props.cluster_id =
+n / (ms->smp.cores * ms->smp.threads);
+ms->possible_cpus->cpus[n].props.has_core_id = true;
+ms->possible_cpus->cpus[n].props.core_id = n / ms->smp.threads;
  ms->possible_cpus->cpus[n].props.has_thread_id = true;
  ms->possible_cpus->cpus[n].props.thread_id = n;
  }
diff --git a/qapi/machine.json b/qapi/machine.json
index 42fc68403d..99c945f258 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -868,10 +868,11 @@
  # @node-id: NUMA node ID the CPU belongs to
  # @socket-id: socket number within node/board the CPU belongs to
  # @die-id: die number within socket the CPU belongs to (since 4.1)
-# @core-id: core number within die the CPU belongs to
+# @cluster-id: cluster number within die the CPU belongs to
+# @core-id: core number within cluster the CPU belongs to
  # @thread-id: thread number within core the CPU belongs to
  #
-# Note: currently there are 5 properties that could be present
+# Note: currently there are 6 properties that could be present
  #   but management should be prepared to pass through other
  #   properties with device_add command to allow for future
  #   interface extension. This also requires the filed names to be kept in
@@ -883,6 +884,7 @@
'data': { '*node-id': 'int',
  '*socket-id': 'int',
  '*die-id': 'int',
+'*cluster-id': 'int',
  '*core-id': 'int',
  '*thread-id': 'int'
}
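
As a worked example of the ID arithmetic in the hunk above (illustrative
numbers, not from the patch): with -smp 8,sockets=2,clusters=2,cores=2,threads=1,
CPU index n = 5 gets socket_id = 5 / (2 * 2 * 1) = 1, cluster_id = 5 / (2 * 1) = 2,
core_id = 5 / 1 = 5 and thread_id = 5. Note that cluster_id and core_id computed
this way are global indexes across the machine rather than restarting from 0
within each socket or cluster.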

Since a new cluster-id is introduced, you may want to check whether to
update machine_set_cpu_numa_node() and hmp_hotpluggable_cpus()
accordingly, which both deal with topo-ids. If we need to update them,
it would be easier to review to make the whole cluster-id introduction
a separate patch.

Thanks,
Yanan



Re: [PATCH v2 0/4] Dirty ring and auto converge optimization

2022-04-01 Thread Chongyun Wu

Thanks for the review.

On 4/1/2022 9:13 PM, Peter Xu wrote:

Chongyun,

On Mon, Mar 28, 2022 at 09:32:10AM +0800, wuc...@chinatelecom.cn wrote:

From: Chongyun Wu 

v2:
-patch 1: remove patch_1

v1:
-rebase to qemu/master

Overview

This series of patches is to optimize the performance of
online migration using dirty ring and autoconverge.

Mainly through the following aspects to do optimization:
1. Dynamically adjust the dirty ring collection thread to
reduce the occurrence of ring full, thereby reducing the
impact on customers, improving the efficiency of dirty
page collection, and thus improving the migration efficiency.

2. When collecting dirty pages from KVM,
kvm_cpu_synchronize_kick_all is not called if the rate is
limited, and it is called only once before suspending the
virtual machine. Because kvm_cpu_synchronize_kick_all will
become very time-consuming when the CPU is limited, and
there will not be too many dirty pages, so it only needs
to be called once before suspending the virtual machine to
ensure that dirty pages will not be lost and the efficiency
of migration is guaranteed.

3. Based on the characteristic of collecting dirty pages
in the dirty ring, a new dirty page rate calculation method
is proposed to obtain a more accurate dirty page rate.

4. Use a more accurate dirty page rate and calculate the
matched speed limit throttle required to complete the
migration according to the current system bandwidth and
parameters, instead of the current time-consuming method
of trying to get a speed limit, greatly reducing migration
time.
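
As an illustration of point 4, the throttle can in principle be computed
directly from the measured rates. A rough sketch with an assumed formula,
for illustration only (not the patch code):

/* Rough sketch: derive the CPU throttle percentage from the measured
 * dirty page rate and the available migration bandwidth, instead of
 * probing for a workable value step by step. */
static int compute_throttle_pct(uint64_t dirty_rate_bps,
                                uint64_t bandwidth_bps)
{
    if (dirty_rate_bps <= bandwidth_bps) {
        return 0; /* already converging, no throttle needed */
    }
    /* Scale guest CPU time down until the residual dirty rate fits
     * within the available migration bandwidth. */
    int pct = 100 - (int)((bandwidth_bps * 100) / dirty_rate_bps);
    return pct > 99 ? 99 : pct;
}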


Thanks for the patches.

I'm curious what's the relationship between this series and Yong's?

I personally think it is a complementary relationship. Yong's can limit 
per-vcpu. In scenarios where the memory pressure comes from threads on 
certain vcpus, the restriction on the other vcpus is very small, and the 
impact on customers during the migration process will be smaller. The 
auto-converge optimization of the last two patches in this patch series 
can cope with scenarios where the memory pressure is balanced across the 
vcpus. Each has its own advantages, and customers can choose the 
appropriate mode according to their own application scenarios. 
The first two patches are for the dirty ring, and both the auto converge 
and Yong's modes can improve performance.




If talking about throttling, I do think the old auto-converge was kind of
inefficient compared to the new per-vcpu ways of throttling, at least in
either granularity or on read tolerances (e.g., dirty ring based solution
will not block vcpu readers even if the thread is heavily throttled).
Yes, I agree with that. Through research on the dirty ring and a lot of 
testing, we found some points that may limit the advantages of the dirty 
ring, so we made some optimizations, which testing and verification showed 
to be effective.
In this patch series, only the last two patches are optimizations for 
auto converge. The first two patches apply to all situations where the 
dirty ring is used, including Yong's, and there is no conflict with his. 
Among them, "kvm: Dynamically adjust the rate of dirty ring reaper thread" 
is proposed to take advantage of the dirty ring. When the memory pressure 
is high, speeding up the rate at which the reaper thread collects dirty 
pages can effectively solve the problem that frequent ring-full events 
lead to frequent guest exits and degraded guestperf performance. When the 
migration thread migrates data, it also completes the synchronization of 
most dirty pages, so when the dirty ring migration thread synchronizes the 
dirty pages it takes less time, which also speeds up the migration. These 
two patches should make Yong's test results better, and the two 
optimization points are different.



We've got quite a few techniques taking care of migration convergence
issues (didn't mention postcopy yet..), and I'm wondering whether at some
point we should be more focused and make a chosen one better, rather than
building different blocks servicing the same purpose.
I'm sorry, maybe I should have separated these patch series to avoid 
misunderstandings. These patches and Yong's should be complementary, and two 
of them can also help Yong's series get some performance improvements.


Thanks,



--
Best Regard,
Chongyun Wu



Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa

2022-04-01 Thread Jason Wang
On Sat, Apr 2, 2022 at 6:32 AM Si-Wei Liu  wrote:
>
>
>
> On 3/31/2022 1:39 AM, Jason Wang wrote:
> > On Wed, Mar 30, 2022 at 11:48 PM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/30/2022 2:00 AM, Jason Wang wrote:
> >>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu  wrote:
>  With MQ enabled vdpa device and non-MQ supporting guest e.g.
>  booting vdpa with mq=on over OVMF of single vqp, below assert
>  failure is seen:
> 
>  ../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx 
>  >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
> 
>  0  0x7f8ce3ff3387 in raise () at /lib64/libc.so.6
>  1  0x7f8ce3ff4a78 in abort () at /lib64/libc.so.6
>  2  0x7f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>  3  0x7f8ce3fec252 in  () at /lib64/libc.so.6
>  4  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>  5  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>  6  0x558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
>  7  0x558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false) at ../hw/virtio/virtio-pci.c:974
>  8  0x558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
>  9  0x558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1) at ../hw/net/vhost_net.c:361
>  10 0x558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
>  11 0x558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
>  12 0x558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
>  13 0x558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>  14 0x558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...) at ../softmmu/memory.c:492
>  15 0x558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
>  16 0x558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...) at ../softmmu/memory.c:1504
>  17 0x558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
>  18 0x558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
>  19 0x558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>) at ../softmmu/physmem.c:2914
>  20 0x558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
>  21 0x558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>  22 0x558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>  23 0x558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
>  24 0x7f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>  25 0x7f8ce40bb9fd in clone () at /lib64/libc.so.6
> 
>  The cause of the assert failure is that the vhost_dev index
>  for the ctrl vq was not aligned with the actual one in use by the guest.
>  Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
>  if the guest doesn't support multiqueue, the guest vq layout shrinks
>  to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
>  This results in ctrl_vq taking a different vhost_dev group index than
>  the default. We can map a vq to the correct vhost_dev group by checking
>  if MQ is supported by the guest and successfully negotiated. Since the
>  MQ feature is only present along with CTRL_VQ, we make sure
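
A sketch of the mapping idea described above, illustrative only (the helper
name is invented, this is not the patch itself): with a non-MQ guest the
control vq sits at index 2 rather than at 2 * max_queue_pairs, so the lookup
has to take the negotiated MQ feature into account.

/* Illustrative only: guest index of the control vq depends on whether
 * VIRTIO_NET_F_MQ was successfully negotiated. */
static int virtio_net_ctrl_vq_index(VirtIONet *n, VirtIODevice *vdev)
{
    if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ)) {
        return n->max_queue_pairs * 2; /* rx/tx pairs come first */
    }
    return 2; /* single queue pair layout: rx = 0, tx = 1, ctrl = 2 */
}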

Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps

2022-04-01 Thread Michael Qiu




On 2022/4/2 8:44, Si-Wei Liu wrote:



On 4/1/2022 4:06 AM, Michael Qiu wrote:

Currently in the vhost framework, vhost_reset_device() is misnamed.
Actually, it should be vhost_reset_owner().

In vhost-user, it is kept compatible with the reset device op, but
the vhost kernel backend is not compatible with it, and vhost-vdpa
only implements the reset device action.

So we need to separate the function into vhost_reset_owner() and
vhost_reset_device(), so that each backend can use the correct
function.

Signed-off-by: Michael Qiu 
---
  hw/scsi/vhost-user-scsi.c |  6 +-
  hw/virtio/vhost-backend.c |  4 ++--
  hw/virtio/vhost-user.c    | 22 ++
  include/hw/virtio/vhost-backend.h |  2 ++
  4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 1b2f7ee..f179626 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
  return;
  }
-    if (dev->vhost_ops->vhost_reset_device) {
+    if (virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
This line change is not needed. VHOST_USER_PROTOCOL_F_RESET_DEVICE is 
guaranteed to be available if we get here.

+   dev->vhost_ops->vhost_reset_device) {
  dev->vhost_ops->vhost_reset_device(dev);
+    } else if (dev->vhost_ops->vhost_reset_owner) {
+    dev->vhost_ops->vhost_reset_owner(dev);
Nope, drop these two lines. The caller of vhost_user_scsi_reset() 
doesn't expect vhost_reset_owner to be called in case vhost_reset_device 
is not implemented.




You are right, I will drop these two lines and remove the
VHOST_USER_PROTOCOL_F_RESET_DEVICE check.




  }
  }
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index e409a86..abbaa8b 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev 
*dev)

  return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
  }
-static int vhost_kernel_reset_device(struct vhost_dev *dev)
+static int vhost_kernel_reset_owner(struct vhost_dev *dev)
  {
  return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
  }
@@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
  .vhost_get_features = vhost_kernel_get_features,
  .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
  .vhost_set_owner = vhost_kernel_set_owner,
-    .vhost_reset_device = vhost_kernel_reset_device,
+    .vhost_reset_owner = vhost_kernel_reset_owner,
  .vhost_get_vq_index = vhost_kernel_get_vq_index,
  #ifdef CONFIG_VHOST_VSOCK
  .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 6abbc9d..4412008 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct 
vhost_dev *dev,

  return 0;
  }
+static int vhost_user_reset_owner(struct vhost_dev *dev)
+{
+    VhostUserMsg msg = {
+    .hdr.request = VHOST_USER_RESET_OWNER,
+    .hdr.flags = VHOST_USER_VERSION,
+    };
+
+    return vhost_user_write(dev, &msg, NULL, 0);
+}
+
  static int vhost_user_reset_device(struct vhost_dev *dev)
  {
  VhostUserMsg msg = {
+    .hdr.request = VHOST_USER_RESET_DEVICE,
  .hdr.flags = VHOST_USER_VERSION,
  };
-    msg.hdr.request = virtio_has_feature(dev->protocol_features,
-                                         VHOST_USER_PROTOCOL_F_RESET_DEVICE)
-                    ? VHOST_USER_RESET_DEVICE
-                    : VHOST_USER_RESET_OWNER;
+    /* Caller must ensure the backend has VHOST_USER_PROTOCOL_F_RESET_DEVICE
+     * support */
+    if (!virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
+    return -EPERM;
+    }
I think we can safely remove this check, since the caller already 
guarantees VHOST_USER_PROTOCOL_F_RESET_DEVICE is around, as your 
comment mentions.




I think it is probably worth checking, because vhost_net_stop() does not 
check this flag; otherwise we would have to check that the backend is 
vhost-user with this flag enabled.


The previous branch condition is to reuse the vhost_reset_device op for 
two different ends, but there's no actual user for 
VHOST_USER_RESET_OWNER historically.


Thanks,
-Siwei


  return vhost_user_write(dev, &msg, NULL, 0);
  }
@@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
  .vhost_set_features = vhost_user_set_features,
  .vhost_get_features = vhost_user_get_features,
  .vhost_set_owner = vhost_user_set_owner,
+    .vhost_reset_owner = vhost_user_reset_owner,
  .vhost_reset_device = vhost_user_reset_device,
  .vhost_get_vq_index = vhost_user_get_vq_index,
  .vhost_set_vring_enable = vhost_user_set_vring_enable,
diff --gi

Re: [PATCH v3 2/4] hw/arm/virt: Fix CPU's default NUMA node ID

2022-04-01 Thread wangyanan (Y)



On 2022/3/23 15:24, Gavin Shan wrote:

When CPU-to-NUMA association isn't explicitly provided by users,
the default on is given by mc->get_default_cpu_node_id(). However,

s/on/one

the CPU topology isn't fully considered in the default association
and this causes CPU topology broken warnings on booting Linux guest.

For example, the following warning messages are observed when the
Linux guest is booted with the following command lines.

   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
   -accel kvm -machine virt,gic-version=host   \
   -cpu host   \
   -smp 6,sockets=2,cores=3,threads=1  \
   -m 1024M,slots=16,maxmem=64G\
   -object memory-backend-ram,id=mem0,size=128M\
   -object memory-backend-ram,id=mem1,size=128M\
   -object memory-backend-ram,id=mem2,size=128M\
   -object memory-backend-ram,id=mem3,size=128M\
   -object memory-backend-ram,id=mem4,size=128M\
   -object memory-backend-ram,id=mem4,size=384M\
   -numa node,nodeid=0,memdev=mem0 \
   -numa node,nodeid=1,memdev=mem1 \
   -numa node,nodeid=2,memdev=mem2 \
   -numa node,nodeid=3,memdev=mem3 \
   -numa node,nodeid=4,memdev=mem4 \
   -numa node,nodeid=5,memdev=mem5
  :
   alternatives: patching kernel code
   BUG: arch topology borken
   the CLS domain not a subset of the MC domain
   
   BUG: arch topology borken
   the DIE domain not a subset of the NODE domain

With the current implementation of mc->get_default_cpu_node_id(),
CPU#0 to CPU#5 are associated with NODE#0 to NODE#5 separately.
That's incorrect because CPU#0/1/2 should be associated with the same
NUMA node, since they're seated in the same socket.

This fixes the issue by considering the socket ID when the default
CPU-to-NUMA association is provided in virt_possible_cpu_arch_ids().
With this applied, no more CPU topology broken warnings are seen
from the Linux guest. The 6 CPUs are associated with NODE#0/1, but
there are no CPUs associated with NODE#2/3/4/5.

Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Yanan Wang 

Thanks,
Yanan

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 064eac42f7..3286915229 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2497,7 +2497,9 @@ virt_cpu_index_to_props(MachineState *ms, unsigned 
cpu_index)
  
  static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)

  {
-return idx % ms->numa_state->num_nodes;
+int64_t socket_id = ms->possible_cpus->cpus[idx].props.socket_id;
+
+return socket_id % ms->numa_state->num_nodes;
  }
  
  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
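
As a concrete check of the fix, using the example command line quoted above:
with -smp 6,sockets=2,cores=3,threads=1, CPUs 0-2 get socket_id 0 and CPUs 3-5
get socket_id 1, so with 6 NUMA nodes the mapping becomes CPU0-2 -> node 0 and
CPU3-5 -> node 1, leaving nodes 2-5 without CPUs, exactly as the commit
message describes.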





Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled

2022-04-01 Thread Jason Wang
On Sat, Apr 2, 2022 at 4:37 AM Si-Wei Liu  wrote:
>
>
>
> On 3/31/2022 1:36 AM, Jason Wang wrote:
> > On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 3/30/2022 2:14 AM, Jason Wang wrote:
> >>> On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu  wrote:
>  Previous commit prevents vhost-user and vhost-vdpa from using
>  userland vq handler via disable_ioeventfd_handler. The same
>  needs to be done for host notifier cleanup too, as the
>  virtio_queue_host_notifier_read handler still tends to read
>  pending events left behind on the ioeventfd and attempts to handle
>  outstanding kicks from the QEMU userland vq.
> 
>  If the vq handler is not disabled on cleanup, it may lead to a sigsegv
>  with a recursive virtio_net_set_status call on the control vq:
> 
>  0  0x7f8ce3ff3387 in raise () at /lib64/libc.so.6
>  1  0x7f8ce3ff4a78 in abort () at /lib64/libc.so.6
>  2  0x7f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>  3  0x7f8ce3fec252 in  () at /lib64/libc.so.6
>  4  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>  5  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>  6  0x558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
> >>> I feel it's probably a bug elsewhere, e.g. when we fail to start
> >>> vhost-vDPA, it's QEMU's job to poll the host notifier and we
> >>> will fall back to the userspace vq handler.
> >> Apologies, an incorrect stack trace was pasted which actually came from
> >> patch #1. I will post a v2 with the corresponding one as below:
> >>
> >> 0  0x55f800df1780 in qdev_get_parent_bus (dev=0x0) at
> >> ../hw/core/qdev.c:376
> >> 1  0x55f800c68ad8 in virtio_bus_device_iommu_enabled
> >> (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
> >> 2  0x55f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at ../hw/virtio/vhost.c:318
> >> 3  0x55f800d70d7f in vhost_memory_unmap (dev=<optimized out>, buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at ../hw/virtio/vhost.c:336
> >> 4  0x55f800d71867 in vhost_virtqueue_stop
> >> (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590,
> >> vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
> >> 5  0x55f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30,
> >> vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
> >> 6  0x55f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30,
> >> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
> >> 7  0x55f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
> >> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
> >> cvq=cvq@entry=1)
> >>  at ../hw/net/vhost_net.c:423
> >> 8  0x55f800d4e628 in virtio_net_set_status (status=<optimized out>, n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
> >> 9  0x55f800d4e628 in virtio_net_set_status
> >> (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at
> >> ../hw/net/virtio-net.c:370
> > I don't understand why virtio_net_handle_ctrl() calls
> > virtio_net_set_status()...
> The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ
> command, i.e. in virtio_net_handle_mq():

Completely forgot that the code was actually written by me :\

>
> 1413 n->curr_queue_pairs = queue_pairs;
> 1414 /* stop the backend before changing the number of queue_pairs
> to avoid handling a
> 1415  * disabled queue */
> 1416 virtio_net_set_status(vdev, vdev->status);
> 1417 virtio_net_set_queue_pairs(n);
>
> Note that before the vdpa multiqueue support, there was never a vhost_dev
> for ctrl_vq exposed, i.e. there's no host notifier set up for the
> ctrl_vq on vhost_kernel as it is emulated in QEMU software.
>
> >
> >> 10 0x55f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at ../hw/net/virtio-net.c:1408
> >> 11 0x55f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590,
> >> vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
> >> 12 0x55f800d69f37 in virtio_queue_host_notifier_read
> >> (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
> >> 13 0x55f800d69f37 in virtio_queue_host_notifier_read
> >> (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
> >> 14 0x55f800c688e6 in virtio_bus_cleanup_host_notifier
> >> (bus=, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
> >> 15 0x55f800d73106 in vhost_dev_disable_notifiers
> >> (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
> >>  at ../../../include/hw/virtio/virtio-bus.h:35
> >> 16 0x55f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0,
> >> dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
> >> 17 0x55f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
> >> ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
> >> cvq=cvq@entry=1)

Re: [PATCH V2 4/4] intel-iommu: PASID support

2022-04-01 Thread Jason Wang
On Fri, Apr 1, 2022 at 9:43 PM Yi Liu  wrote:
>
>
>
> On 2022/3/29 12:54, Jason Wang wrote:
> >
> > On 2022/3/28 4:45 PM, Yi Liu wrote:
> >>
> >>
> >> On 2022/3/21 13:54, Jason Wang wrote:
> >>> This patch introduce ECAP_PASID via "x-pasid-mode". Based on the
> >>> existing support for scalable mode, we need to implement the following
> >>> missing parts:
> >>>
> >>> 1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation
> >>> with PASID
> >>
> >> should it be tagging with bdf+pasid?
> >
> >
> > The problem is that BDF is programmable by the guest, so we may end up
> > with duplicated BDFs. That's why the code uses struct PCIBus.
>
> how about the devfn? will it also change?

The code already uses devfn, doesn't it?

struct vtd_as_key {
PCIBus *bus;
uint8_t devfn;
uint32_t pasid;
};
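
For illustration, a sketch of how such a key could be used for the address
space lookup; the hash table field and helper name are assumptions, not
quoted from the patch:

/* Sketch: look up the VTDAddressSpace for a (bus, devfn, pasid) tuple.
 * The vtd_address_spaces hash table name is assumed for illustration. */
static VTDAddressSpace *vtd_find_as(IntelIOMMUState *s, PCIBus *bus,
                                    uint8_t devfn, uint32_t pasid)
{
    struct vtd_as_key key = {
        .bus = bus,
        .devfn = devfn,
        .pasid = pasid,
    };

    return g_hash_table_lookup(s->vtd_address_spaces, &key);
}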

Thanks

>  Tagging the address space with
> BDF+PASID mostly suits the spec since the PASID table is per-bdf. If
> the bus number may change, using PCIBus is fine.
>
> Regards,
> Yi Liu
>




Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()

2022-04-01 Thread Jason Wang
On Fri, Apr 1, 2022 at 11:22 AM Michael Qiu  wrote:
>
>
>
> On 2022/4/1 10:53, Jason Wang wrote:
> > On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu  wrote:
> >>
> >> Currently, when a VM powers off, it triggers a vdpa
> >> device (such as a mlx bluefield2 VF) reset many times (with 1 datapath
> >> queue pair and one control queue, it is triggered 3 times), which
> >> leads to the issue below:
> >>
> >> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
> >>
> >> This is because vhost_net_stop() stops all vhost devices bound to
> >> this virtio device, and in vhost_dev_stop(), qemu tries to stop the
> >> device, then stop the queue: vhost_virtqueue_stop().
> >>
> >> In vhost_dev_stop(), it resets the device, which clears some flags
> >> in the low level driver, and in the next loop (stopping the other
> >> vhost backends), qemu tries to stop the queue corresponding to that
> >> vhost backend; the driver finds that the VQ is invalid. This is the
> >> root cause.
> >>
> >> To solve the issue, vdpa should set the vring unready, and
> >> remove the reset ops in device stop: vhost_dev_start(hdev, false).
> >>
> >> and implement a new function vhost_dev_reset, only reset backend
> >> device after all vhost(per-queue) stoped.
> >
> > Typo.
> >
> >>
> >> Signed-off-by: Michael Qiu
> >> Acked-by: Jason Wang 
> >
> > Rethink this patch, consider there're devices that don't support
> > set_vq_ready(). I wonder if we need
> >
> > 1) uAPI to tell the user space whether or not it supports set_vq_ready()
> > 2) userspace will call SET_VRING_ENABLE() when the device supports
> > otherwise it will use RESET.
>
> if the device does not support set_vq_ready() in the kernel, it will
> trigger a kernel oops; at least the current kernel does not check whether
> set_vq_ready has been implemented.
>
> And I checked all the vdpa drivers in the kernel; all of them have
> implemented this op.

Actually, it's not about whether or not set_vq_ready() is
implemented. It's about whether the parent supports it correctly:

The ability to suspend and resume a virtqueue is currently beyond the
ability of some transports (e.g. PCI).

For IFCVF:

static void ifcvf_vdpa_set_vq_ready(struct vdpa_device *vdpa_dev,
                                    u16 qid, bool ready)
{
    struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);

    vf->vring[qid].ready = ready;
}

It seems to follow the PCI transport, so if you just set it to zero,
it simply doesn't work at all. I can see some tricks that are used in
the DPDK driver, maybe we can use the same to "fix" this.

For VDUSE, we are basically the same:

static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa,
                                    u16 idx, bool ready)
{
    struct vduse_dev *dev = vdpa_to_vduse(vdpa);
    struct vduse_virtqueue *vq = &dev->vqs[idx];

    vq->ready = ready;
}

It can't be stopped correctly if we just set it to zero.

For vp_vdpa, it basically wants to abuse queue_enable, which may
result in a warning in QEMU (and the device isn't stopped).

static void vp_vdpa_set_vq_ready(struct vdpa_device *vdpa,
                                 u16 qid, bool ready)
{
    struct virtio_pci_modern_device *mdev = vdpa_to_mdev(vdpa);

    vp_modern_set_queue_enable(mdev, qid, ready);
}

ENI does a trick by writing 0 to the virtqueue address, so it works for
stop but not for start.

static void eni_vdpa_set_vq_ready(struct vdpa_device *vdpa, u16 qid,
                                  bool ready)
{
    struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);

    /* ENI is a legacy virtio-pci device. This is not supported
     * by specification. But we can disable virtqueue by setting
     * address to 0.
     */
    if (!ready)
        vp_legacy_set_queue_address(ldev, qid, 0);
}

mlx5 calls suspend_vq(), which should be fine.

The simulator is probably fine.

So I worry that if we use set_vq_ready(0) it won't work correctly and
will have other issues. The idea is (sketched in code below):

- advertise the suspend/resume capability via uAPI, then mlx5_vdpa and
the simulator can go with set_vq_ready()
- others can still go with reset(), and we can try to fix them
gradually (and introduce this in the virtio spec).
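
For illustration, userspace could probe such a capability and pick the
stop path accordingly. A minimal sketch, assuming a hypothetical
VHOST_BACKEND_F_SUSPEND backend-feature bit (not part of the uAPI at the
time of writing):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

#define HYPOTHETICAL_F_SUSPEND (1ULL << 4) /* assumed capability bit */

static int stop_vring(int vdpa_fd, unsigned int qid)
{
    uint64_t features = 0;
    uint8_t status = 0;

    if (ioctl(vdpa_fd, VHOST_GET_BACKEND_FEATURES, &features) < 0) {
        return -1;
    }

    if (features & HYPOTHETICAL_F_SUSPEND) {
        /* Parent can suspend/resume: just disable the vring. */
        struct vhost_vring_state state = { .index = qid, .num = 0 };
        return ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
    }

    /* Fallback: full device reset by writing 0 to the status. */
    return ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status);
}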

>
> So I think it is OK to call set_vq_ready without a check.
>
> >
> > And for safety, I suggest tagging this as 7.1.
> >
> >> ---
> >> v4 --> v3
> >>  Nothing changed. Because of an issue with mimecast,
> >>  when the From: tag is different from the sender,
> >>  some mail clients will take the patch as an
> >>  attachment; RESEND v3 does not work, so resend
> >>  the patch as v4
> >>
> >> v3 --> v2:
> >>  Call vhost_dev_reset() at the end of vhost_net_stop().
> >>
> >>  Since the vDPA device need re-add the status bit
> >>  VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
> >>  simply, add them inside vhost_vdpa_reset_device, and
> >>  the only way calling vhost_vdpa_reset_device is in
> >>  vhost_net_stop(), so it keeps the same behavior as before.
> >>

Re: [PATCH 7/7] vhost-vdpa: backend feature should set only once

2022-04-01 Thread Jason Wang
On Fri, Apr 1, 2022 at 12:18 PM Si-Wei Liu  wrote:
>
>
>
> On 3/31/2022 7:39 PM, Jason Wang wrote:
> > On Thu, Mar 31, 2022 at 5:20 PM Eugenio Perez Martin
> >  wrote:
> >> On Thu, Mar 31, 2022 at 10:54 AM Jason Wang  wrote:
> >>>
> >>> On 2022/3/31 4:02 PM, Eugenio Perez Martin wrote:
>  On Thu, Mar 31, 2022 at 1:03 AM Si-Wei Liu  wrote:
> >
> > On 3/30/2022 12:01 PM, Eugenio Perez Martin wrote:
> >> On Wed, Mar 30, 2022 at 8:33 AM Si-Wei Liu  
> >> wrote:
> >>> The vhost_vdpa_one_time_request() branch in
> >>> vhost_vdpa_set_backend_cap() incorrectly sends down
> >>> ioctls on a vhost_dev with non-zero index. This may
> >>> end up with multiple VHOST_SET_BACKEND_FEATURES
> >>> ioctl calls sent down on the vhost-vdpa fd that is
> >>> shared between all these vhost_dev's.
> >>>
> >> Not only that. This means that qemu thinks the device supports iotlb
> >> batching as long as the device does not have a cvq. If vdpa does not
> >> support batching, it will return an error later with no possibility of
> >> doing it right.
> > I think the implicit assumption here is that the caller should back off
> > to where it was if it runs into an error, i.e. once the first
> > vhost_dev_set_features call gets an error, vhost_dev_start() will fail
> > straight away.
>  Sorry, I don't follow you here, and maybe my message was not clear 
>  enough.
> 
>  What I meant is that your patch fixes another problem not stated in
>  the message: it is not possible to initialize a net vdpa device that
>  does not have cvq and does not support iotlb batches without it. Qemu
>  will assume that the device supports batching, so the write of
>  VHOST_IOTLB_BATCH_BEGIN will fail. I didn't test what happens next but
>  it probably cannot continue.
> >>>
> >>> So you mean we actually didn't call VHOST_SET_BACKEND_CAP in this case.
> >>> Fortunately, the kernel doesn't check the backend cap when accepting
> >>> batching hints.
> >>>
> >>> We are probably fine?
> >>>
> >> We're fine as long as the vdpa driver in the kernel effectively
> >> supports batching. If not, qemu will try to batch, and it will fail.
> >>
> >> It was introduced in v5.9, so qemu has not supported kernel <5.9 since
> >> we introduced multiqueue support (I didn't test). Unless we apply this
> >> patch. That's the reason it should be marked as fixed and backported
> >> to stable IMO.
> > Ok, so it looks to me we have more issues.
> >
> > In vhost_vdpa_set_backend_cap() we fail when
> > VHOST_VDPA_GET_BACKEND_FEATURES fails. This breaks the older kernel
> > since that ioctl is introduced in
> >
> > 653055b9acd4 ("vhost-vdpa: support get/set backend features")
> Yep, the GET/SET_BACKEND ioctl pair got introduced together in this
> exact commit.
> >
> > We should:
> >
> > 1) make it work by not failing the vhost_vdpa_set_backend_cap() and
> > assuming MSG_V2.
> This issue is orthogonal to my fix; it was pre-existing before the
> multiqueue support. I believe there should be a separate patch to
> fix QEMU for pre-GET/SET_BACKEND kernels.

Right.

>
> > 2) check the batching support in vhost_vdpa_listener_begin_batch()
> > instead of trying to set VHOST_IOTLB_BATCH_BEGIN uncondtionally
> This is non-issue since VHOST_BACKEND_F_IOTLB_BATCH is already validated
> in the caller vhost_vdpa_iotlb_batch_begin_once().

Yes, I missed that optimization.

Thanks
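
For reference, the guard in question looks roughly like this (paraphrased
sketch of QEMU's vhost_vdpa_iotlb_batch_begin_once(), simplified):

static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
{
    if (v->iotlb_batch_begin_sent) {
        return;
    }
    if (!(v->dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
        /* Backend never advertised batching: don't send BATCH_BEGIN. */
        return;
    }
    /* ... send the VHOST_IOTLB_BATCH_BEGIN message here ... */
    v->iotlb_batch_begin_sent = true;
}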

>
> -Siwei
> >
> > Thanks
> >
> >> Thanks!
> >>
> >>> Thanks
> >>>
> >>>
>  In that regard, this commit needs to be marked as "Fixes: ...", either
>  ("a5bd058 vhost-vdpa: batch updating IOTLB mappings") or maybe better
>  ("4d191cf vhost-vdpa: classify one time request"). We have a
>  regression if we introduce both, or the second one and the support of
>  any other backend feature.
> 
> > Note that the VHOST_SET_BACKEND_FEATURES ioctl is not per-vq,
> > and it doesn't even need to be. There seems to me to be no possibility
> > for it to fail in the way described here. The catch is that IOTLB
> > batching is at least a vdpa device level backend feature, if not
> > per-kernel. Same as IOTLB_MSG_V2.
> >
>  At this moment it is per-kernel, yes. With your patch there is no need
>  to fail because of the lack of _F_IOTLB_BATCH, the code should handle
>  this case ok.
> 
>  But if VHOST_GET_BACKEND_FEATURES returns no support for
>  VHOST_BACKEND_F_IOTLB_MSG_V2, the qemu code will happily send v2
>  messages anyway. This has nothing to do with the patch, I'm just
>  noting it here.
> 
>  In that case, maybe it is better to return something like -ENOTSUP?
> 
>  Thanks!
> 
> > -Siwei
> >
> >> Some open questions:
> >>
> >> Should we make the vdpa driver return error as long as a feature is
> >> used but not set by qemu, or leave it as undefined? I guess we have to
> >> keep the batching at least without checking so the ker

Re: [PATCH] vhost: Fix bad return of descriptors to SVQ

2022-04-01 Thread Jason Wang
On Fri, Apr 1, 2022 at 3:31 PM Eugenio Perez Martin  wrote:
>
> On Fri, Apr 1, 2022 at 4:30 AM Jason Wang  wrote:
> >
> > On Fri, Apr 1, 2022 at 2:14 AM Eugenio Pérez  wrote:
> > >
> > > Only the first one of them was properly enqueued back.
> > >
> > > Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
> > > Signed-off-by: Eugenio Pérez 
> > > ---
> > >  hw/virtio/vhost-shadow-virtqueue.c | 17 +++--
> > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > index b232803d1b..c17506df20 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -333,13 +333,25 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
> > >  svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> > >  }
> > >
> > > +static uint16_t vhost_svq_last_desc_of_chain(VhostShadowVirtqueue *svq,
> > > + uint16_t i)
> > > +{
> > > +vring_desc_t *descs = svq->vring.desc;
> > > +
> > > +while (le16_to_cpu(descs[i].flags) & VRING_DESC_F_NEXT) {
> > > +i = le16_to_cpu(descs[i].next);
> >
> >
> > This seems to be a guest-triggerable infinite loop?
> >
>
> This is the list of the SVQ vring. We could consider an infinite loop
> triggerable by the device if it can write the vring directly.
>

Ok.

> I can add a counter in the loop, or maintain an internal copy of
> the vring so it's completely hardened against malicious/bad devices.
> It should be done for the packed vring anyway.

Yes, let's do that. It would be better if we don't trust the device.

Thanks
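
For illustration, the counter-based hardening mentioned above could look
like this (editor's sketch, not the posted patch; it bounds the walk by
the ring size):

static uint16_t vhost_svq_last_desc_of_chain_bounded(VhostShadowVirtqueue *svq,
                                                     uint16_t num, uint16_t i)
{
    const vring_desc_t *descs = svq->vring.desc;
    uint16_t steps = 0;

    while (le16_to_cpu(descs[i].flags) & VRING_DESC_F_NEXT) {
        if (++steps >= num) {
            /* Chain longer than the ring: corrupted or malicious device. */
            break;
        }
        i = le16_to_cpu(descs[i].next);
    }

    return i;
}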

>
> Thanks!
>
> > Thanks
> >
> >
> > > +}
> > > +
> > > +return i;
> > > +}
> > > +
> > >  static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
> > > uint32_t *len)
> > >  {
> > >  vring_desc_t *descs = svq->vring.desc;
> > >  const vring_used_t *used = svq->vring.used;
> > >  vring_used_elem_t used_elem;
> > > -uint16_t last_used;
> > > +uint16_t last_used, last_used_chain;
> > >
> > >  if (!vhost_svq_more_used(svq)) {
> > >  return NULL;
> > > @@ -365,7 +377,8 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
> > >  return NULL;
> > >  }
> > >
> > > -descs[used_elem.id].next = svq->free_head;
> > > +last_used_chain = vhost_svq_last_desc_of_chain(svq, used_elem.id);
> > > +descs[last_used_chain].next = svq->free_head;
> > >  svq->free_head = used_elem.id;
> > >
> > >  *len = used_elem.len;
> > > --
> > > 2.27.0
> > >
> >
>




Re: [PATCH 2/3] vhost: add vhost_dev_reset()

2022-04-01 Thread Si-Wei Liu




On 4/1/2022 4:06 AM, Michael Qiu wrote:

Not all vhost-user backends support ops->vhost_reset_device(). Instead
of adding a backend check and calling the backend ops directly, it's
better to implement a function in the vhost framework, so that it can
hide the vhost_ops details.

Signed-off-by: Michael Qiu 
---
  hw/virtio/vhost.c | 14 ++
  include/hw/virtio/vhost.h |  1 +
  2 files changed, 15 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b643f42..26667ae 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1854,3 +1854,17 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
  
  return -ENOSYS;

  }
+
+int vhost_dev_reset(struct vhost_dev *hdev)

Maybe vhost_user_scsi_reset() can call this function instead? (A sketch
follows the diff below.)

-Siwei

+{
+int ret = 0;
+
+/* should only be called after backend is connected */
+assert(hdev->vhost_ops);
+
+if (hdev->vhost_ops->vhost_reset_device) {
+ret = hdev->vhost_ops->vhost_reset_device(hdev);
+}
+
+return ret;
+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 58a73e7..b8b7c20 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
  void vhost_dev_cleanup(struct vhost_dev *hdev);
  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+int vhost_dev_reset(struct vhost_dev *hdev);
  int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
  void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
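
For illustration, Si-Wei's suggestion above could look like this in
hw/scsi/vhost-user-scsi.c (editor's sketch, not a posted patch):

static void vhost_user_scsi_reset(VirtIODevice *vdev)
{
    VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
    struct vhost_dev *dev = &vsc->dev;

    if (!virtio_has_feature(dev->protocol_features,
                            VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
        return;
    }

    /* The helper hides the vhost_ops details. */
    vhost_dev_reset(dev);
}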
  





Re: [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps

2022-04-01 Thread Si-Wei Liu




On 4/1/2022 4:06 AM, Michael Qiu wrote:

Currently in the vhost framework, vhost_reset_device() is misnamed.
Actually, it should be vhost_reset_owner().

In vhost-user it is kept compatible with the reset device ops, but
vhost kernel is not compatible with it, and vhost vdpa only
implements the reset device action.

So we need to separate the function into vhost_reset_owner() and
vhost_reset_device(), so that different backends can use the
correct function.

Signed-off-by: Michael Qiu 
---
  hw/scsi/vhost-user-scsi.c |  6 +-
  hw/virtio/vhost-backend.c |  4 ++--
  hw/virtio/vhost-user.c| 22 ++
  include/hw/virtio/vhost-backend.h |  2 ++
  4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 1b2f7ee..f179626 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -80,8 +80,12 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
  return;
  }
  
-if (dev->vhost_ops->vhost_reset_device) {

+if (virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_RESET_DEVICE) &&
This line change is not needed. VHOST_USER_PROTOCOL_F_RESET_DEVICE is
guaranteed to be available if we get here.

+   dev->vhost_ops->vhost_reset_device) {
  dev->vhost_ops->vhost_reset_device(dev);
+} else if (dev->vhost_ops->vhost_reset_owner) {
+dev->vhost_ops->vhost_reset_owner(dev);
Nope, drop these two lines. The caller of vhost_user_scsi_reset() 
doesn't expect vhost_reset_owner to be called in case vhost_reset_device 
is not implemented.



  }
  }
  
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c

index e409a86..abbaa8b 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -191,7 +191,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
  return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
  }
  
-static int vhost_kernel_reset_device(struct vhost_dev *dev)

+static int vhost_kernel_reset_owner(struct vhost_dev *dev)
  {
  return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
  }
@@ -317,7 +317,7 @@ const VhostOps kernel_ops = {
  .vhost_get_features = vhost_kernel_get_features,
  .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
  .vhost_set_owner = vhost_kernel_set_owner,
-.vhost_reset_device = vhost_kernel_reset_device,
+.vhost_reset_owner = vhost_kernel_reset_owner,
  .vhost_get_vq_index = vhost_kernel_get_vq_index,
  #ifdef CONFIG_VHOST_VSOCK
  .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 6abbc9d..4412008 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1475,16 +1475,29 @@ static int vhost_user_get_max_memslots(struct vhost_dev *dev,
  return 0;
  }
  
+static int vhost_user_reset_owner(struct vhost_dev *dev)

+{
+VhostUserMsg msg = {
+.hdr.request = VHOST_USER_RESET_OWNER,
+.hdr.flags = VHOST_USER_VERSION,
+};
+
+return vhost_user_write(dev, &msg, NULL, 0);
+}
+
  static int vhost_user_reset_device(struct vhost_dev *dev)
  {
  VhostUserMsg msg = {
+.hdr.request = VHOST_USER_RESET_DEVICE,
  .hdr.flags = VHOST_USER_VERSION,
  };
  
-msg.hdr.request = virtio_has_feature(dev->protocol_features,

- VHOST_USER_PROTOCOL_F_RESET_DEVICE)
-? VHOST_USER_RESET_DEVICE
-: VHOST_USER_RESET_OWNER;
+/* Caller must ensure the backend has VHOST_USER_PROTOCOL_F_RESET_DEVICE
+ * support */
+if (!virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
+return -EPERM;
+}
I think we can safely remove this check, since the caller already 
guarantees VHOST_USER_PROTOCOL_F_RESET_DEVICE is around as what your 
comment mentions.


The previous branch condition reused the vhost_reset_device op for
two different ends, but historically there has been no actual user of
VHOST_USER_RESET_OWNER.


Thanks,
-Siwei

  
  return vhost_user_write(dev, &msg, NULL, 0);

  }
@@ -2548,6 +2561,7 @@ const VhostOps user_ops = {
  .vhost_set_features = vhost_user_set_features,
  .vhost_get_features = vhost_user_get_features,
  .vhost_set_owner = vhost_user_set_owner,
+.vhost_reset_owner = vhost_user_reset_owner,
  .vhost_reset_device = vhost_user_reset_device,
  .vhost_get_vq_index = vhost_user_get_vq_index,
  .vhost_set_vring_enable = vhost_user_set_vring_enable,
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 81bf310..affeeb0 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -77,6 +77,7 @@ typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
   uin

Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()

2022-04-01 Thread Si-Wei Liu




On 3/31/2022 7:53 PM, Jason Wang wrote:

On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu  wrote:

Currently, when the VM powers off, it triggers a vdpa
device (such as a mlx bluefield2 VF) reset many times (with 1 datapath
queue pair and one control queue it is triggered 3 times), which
leads to the issue below:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This is because vhost_net_stop() stops all vhost devices bound to
this virtio device, and in vhost_dev_stop() qemu tries to stop the
device, then stop the queue: vhost_virtqueue_stop().

In vhost_dev_stop(), it resets the device, which clears some flags
in the low-level driver, and in the next loop (stopping the other
vhost backends) qemu tries to stop the queue corresponding to that
vhost backend; the driver finds that the VQ is invalid. This is the
root cause.

To solve the issue, vdpa should set the vring unready and
remove the reset ops in device stop: vhost_dev_start(hdev, false),

and implement a new function vhost_dev_reset, which resets the backend
device only after all the vhost (per-queue) devices are stopped.

Typo.


Signed-off-by: Michael Qiu
Acked-by: Jason Wang 

Rethink this patch, consider there're devices that don't support
set_vq_ready(). I wonder if we need

1) uAPI to tell the user space whether or not it supports set_vq_ready()
I guess what's more relevant here is to define the uAPI semantics of
unready, i.e. set_vq_ready(0), for stopping/resuming virtqueue processing,
as starting a vq is comparatively less ambiguous. Considering the
likelihood that this interface may be used for live migration, it would
be nice to come up with variants such as 1) discard inflight requests
vs. 2) wait for inflight processing to be done, and 3) time out while
waiting.



2) userspace will call SET_VRING_ENABLE() when the device supports
otherwise it will use RESET.
Are you looking to make the virtqueue resumable through the new
SET_VRING_ENABLE() uAPI?


I think RESET is inevitable in some cases, e.g. when the guest initiates
a device reset by writing 0 to the status register. For suspend/resume
and live migration use cases, indeed RESET can be substituted with
SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
accommodate this change. Although I'm all for it, it'd be best to lay
out the plan in multiple phases rather than overload this single patch
too much. You can count on my time for this endeavor if you don't mind. :)




And for safety, I suggest tagging this as 7.1.

+1

Regards,
-Siwei




---
v4 --> v3
 Nothing changed. Because of an issue with mimecast,
 when the From: tag is different from the sender,
 some mail clients will take the patch as an
 attachment; RESEND v3 does not work, so resend
 the patch as v4

v3 --> v2:
 Call vhost_dev_reset() at the end of vhost_net_stop().

 Since the vDPA device need re-add the status bit
 VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
 simply, add them inside vhost_vdpa_reset_device, and
 the only way calling vhost_vdpa_reset_device is in
 vhost_net_stop(), so it keeps the same behavior as before.

v2 --> v1:
Implement a new function vhost_dev_reset,
reset the backend kernel device at last.
---
  hw/net/vhost_net.c| 24 +---
  hw/virtio/vhost-vdpa.c| 15 +--
  hw/virtio/vhost.c | 15 ++-
  include/hw/virtio/vhost.h |  1 +
  4 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..422c9bf 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
  int total_notifiers = data_queue_pairs * 2 + cvq;
  VirtIONet *n = VIRTIO_NET(dev);
  int nvhosts = data_queue_pairs + cvq;
-struct vhost_net *net;
+struct vhost_net *net = NULL;
  int r, e, i, index_end = data_queue_pairs * 2;
  NetClientState *peer;

@@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
  err_start:
  while (--i >= 0) {
  peer = qemu_get_peer(ncs , i);
-vhost_net_stop_one(get_vhost_net(peer), dev);
+
+net = get_vhost_net(peer);
+
+vhost_net_stop_one(net, dev);
  }
+
+/* We only reset backend vdpa device */
+if (net && net->dev.vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
+vhost_dev_reset(&net->dev);
+}
+
  e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
  if (e < 0) {
  fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", e);
@@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
  VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
  VirtIONet *n = VIRTIO_NET(dev);
  NetClientState *peer;
+struct vhost_net *net = NULL;
  int total_notifiers = data_queue_pairs * 2 + cvq;
  int nvhosts = data_queue_pairs + cvq;
  int i, r;
@@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs

[RFC PATCH v1 8/8] qapi: golang: document skip function visit_array_types

2022-04-01 Thread Victor Toso
Signed-off-by: Victor Toso 
---
 scripts/qapi/golang.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/scripts/qapi/golang.py b/scripts/qapi/golang.py
index 5d3395514d..9a775d0691 100644
--- a/scripts/qapi/golang.py
+++ b/scripts/qapi/golang.py
@@ -321,7 +321,12 @@ def visit_enum_type(self: QAPISchemaGenGolangVisitor,
 '''
 
 def visit_array_type(self, name, info, ifcond, element_type):
-pass
+# TLDR: We don't need any extra boilerplate in Go to handle Arrays.
+#
+# This function is implemented just to be sure that:
+# 1. Every array type ends with List
+# 2. Every array type's element is the array type without 'List'
+assert name.endswith("List") and name[:-4] == element_type.name
 
 def visit_command(self,
   name: str,
-- 
2.35.1




[RFC PATCH v1 6/8] qapi: golang: Generate qapi's command types in Go

2022-04-01 Thread Victor Toso
This patch handles QAPI command types and generates data structures in
Go that decode from a QMP JSON object to a Go data structure and vice
versa.

At the time of this writing, it generates 210 structures
(208 commands)

This is very similar to previous commit, that handles QAPI union types
in Go.

Each QAPI command will generate a Go struct with the suffix 'Command'.
Its fields, if any, are the mandatory or optional arguments defined in
the QAPI command.

Similar to Event, this patch adds two structures to handle the QAPI
specification for command types: 'Command' and 'CommandBase'.

'CommandBase' contains @Id and @Name. 'Command' extends 'CommandBase'
with an @Arg field of type 'Any'.

The 'Command' type implements the Unmarshaler to decode QMP JSON
Object into the correct Golang (command) struct.

Marshal example:
```go
cmdArg := SetPasswordCommand{}
cmdArg.Protocol = DisplayProtocolVnc
cmdArg.Password = "secret"
cmd := Command{}
cmd.Name = "set_password"
cmd.Arg = cmdArg

b, _ := json.Marshal(&cmd)
// string(b) == 
`{"execute":"set_password","arguments":{"protocol":"vnc","password":"secret"}}`
```

Unmarshal example:
```go
qmpCommand := `
{
"execute": "set_password",
"arguments":{
"protocol": "vnc",
"password": "secret"
}
}`
cmd := Command{}
_ = json.Unmarshal([]byte(qmpCommand), &cmd)
// cmd.Name == "set_password"
// cmd.Arg.(SetPasswordCommand).Protocol == DisplayProtocolVnc
```

Signed-off-by: Victor Toso 
---
 scripts/qapi/golang.py | 49 ++
 1 file changed, 45 insertions(+), 4 deletions(-)

diff --git a/scripts/qapi/golang.py b/scripts/qapi/golang.py
index 3bb66d07c7..0b9c19babb 100644
--- a/scripts/qapi/golang.py
+++ b/scripts/qapi/golang.py
@@ -31,10 +31,11 @@ class QAPISchemaGenGolangVisitor(QAPISchemaVisitor):
 
 def __init__(self, prefix: str):
 super().__init__()
-self.target = {name: "" for name in ["alternate", "enum", "event", 
"helper", "struct", "union"]}
+self.target = {name: "" for name in ["alternate", "command", "enum", 
"event", "helper", "struct", "union"]}
 self.objects_seen = {}
 self.schema = None
 self.events = {}
+self.commands = {}
 self._docmap = {}
 self.golang_package_name = "qapi"
 
@@ -76,6 +77,19 @@ def visit_end(self):
 '''
 self.target["event"] += generate_marshal_methods('Event', self.events)
 
+self.target["command"] += '''
+type CommandBase struct {
+Id   string `json:"id,omitempty"`
+Name string `json:"execute"`
+}
+
+type Command struct {
+CommandBase
+Arg Any`json:"arguments,omitempty"`
+}
+'''
+self.target["command"] += generate_marshal_methods('Command', 
self.commands)
+
 self.target["helper"] += '''
 // Creates a decoder that errors on unknown Fields
 // Returns nil if @from was successfully decoded @into the given type,
@@ -295,7 +309,29 @@ def visit_command(self,
   allow_oob: bool,
   allow_preconfig: bool,
   coroutine: bool) -> None:
-pass
+assert name == info.defn_name
+type_name = qapi_to_go_type_name(name, info.defn_meta)
+self.commands[name] = type_name
+
+doc = self._docmap.get(name, None)
+self_contained = True if not arg_type or not arg_type.name.startswith("q_obj") else False
+content = ""
+if boxed or self_contained:
+args = "" if not arg_type else "\n" + arg_type.name
+doc_struct, _ = qapi_to_golang_struct_docs(doc)
+content = generate_struct_type(type_name, args, doc_struct)
+else:
+assert isinstance(arg_type, QAPISchemaObjectType)
+content = qapi_to_golang_struct(name,
+doc,
+arg_type.info,
+arg_type.ifcond,
+arg_type.features,
+arg_type.base,
+arg_type.members,
+arg_type.variants)
+
+self.target["command"] += content
 
 def visit_event(self, name, info, ifcond, features, arg_type, boxed):
 assert name == info.defn_name
@@ -391,7 +427,7 @@ def generate_marshal_methods_enum(members: List[QAPISchemaEnumMember]) -> str:
 }}
 '''
 
-# Marshal methods for Event and Union types
+# Marshal methods for Event, Command and Union types
 def generate_marshal_methods(type: str,
  type_dict: Dict[str, str],
  discriminator: str = "",
@@ -404,6 +440,11 @@ def generate_marshal_methods(type: str,
 discriminator = "base.Name"
 struct_field = "Arg"
 json_field =

[RFC PATCH v1 3/8] qapi: golang: Generate qapi's struct types in Go

2022-04-01 Thread Victor Toso
This patch handles QAPI struct types and generates the equivalent
types in Go.

At the time of this writing, it generates 375 structures.

The highlights of this implementation are:

1. When generating a Go struct that requires a @base type, the @base
   type is embedded in this Go structure.

   Example: see InetSocketAddress with its base InetSocketAddressBase

2. Differently from the previous two types ('enum' and 'alternate'), the
   generated QAPI struct types do not need to implement the Marshaler and
   Unmarshaler interfaces. These generated structures will naturally
   match JSON objects.

3. About the Go struct's fields:

  i) They can be either by Value or Reference.

  ii) Every field that is marked as optional in the QAPI specification
  is translated to a Reference field in its Go structure. This design
  decision is the most straightforward way to check if a given field
  was set or not.

  iii) Mandatory fields are always by Value with the exception of QAPI
  arrays, which are handled by Reference (to a block of memory) by Go.

  iv) All the fields are named with Uppercase due to Golang's export
  convention.

  v) In order to avoid any kind of issue when encoding or decoding, to
  or from JSON, we mark all fields with their @name and, when the
  member is optional, with @omitempty (see the sketch below)
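
For illustration, a generated struct following rules i) to v) could look
like this (hand-written sketch with a hypothetical type, not actual
generator output):

```go
type ExampleBackup struct {
	// Mandatory member: held by value, tagged with its QAPI name.
	Device string `json:"device"`
	// Optional member: held by reference, so nil means "not set".
	Speed *int64 `json:"speed,omitempty"`
}
```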

Signed-off-by: Victor Toso 
---
 scripts/qapi/golang.py | 79 --
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/scripts/qapi/golang.py b/scripts/qapi/golang.py
index 8be31bd902..50e39f8925 100644
--- a/scripts/qapi/golang.py
+++ b/scripts/qapi/golang.py
@@ -31,7 +31,7 @@ class QAPISchemaGenGolangVisitor(QAPISchemaVisitor):
 
 def __init__(self, prefix: str):
 super().__init__()
-self.target = {name: "" for name in ["alternate", "enum", "helper"]}
+self.target = {name: "" for name in ["alternate", "enum", "helper", 
"struct"]}
 self.objects_seen = {}
 self.schema = None
 self._docmap = {}
@@ -82,7 +82,31 @@ def visit_object_type(self: QAPISchemaGenGolangVisitor,
   members: List[QAPISchemaObjectTypeMember],
   variants: Optional[QAPISchemaVariants]
   ) -> None:
-pass
+# Do not handle anything besides structs
+if (name == self.schema.the_empty_object_type.name or
+not isinstance(name, str) or
+info.defn_meta not in ["struct"]):
+return
+
+assert name not in self.objects_seen
+self.objects_seen[name] = True
+
+# visit all inner objects as well, they are not going to be
+# called by python's generator.
+if variants:
+for var in variants.variants:
+assert isinstance(var.type, QAPISchemaObjectType)
+self.visit_object_type(self,
+   var.type.name,
+   var.type.info,
+   var.type.ifcond,
+   var.type.base,
+   var.type.local_members,
+   var.type.variants)
+
+doc = self._docmap.get(info.defn_name, None)
+self.target[info.defn_meta] += qapi_to_golang_struct(name, doc, info,
+ifcond, features, base, members, variants)
 
 def visit_alternate_type(self: QAPISchemaGenGolangVisitor,
  name: str,
@@ -276,6 +300,14 @@ def gen_golang(schema: QAPISchema,
 schema.visit(vis)
 vis.write(output_dir)
 
+# Helper function for boxed or self contained structures.
+def generate_struct_type(type_name, args="", doc_struct="") -> str:
+args =  args if len(args) == 0 else f"\n{args}\n"
+return f'''
+{doc_struct}
+type {type_name} struct {{{args}}}
+'''
+
 def generate_marshal_methods_enum(members: List[QAPISchemaEnumMember]) -> str:
 type = qapi_to_go_type_name(members[0].defined_in, "enum")
 
@@ -345,6 +377,46 @@ def qapi_to_golang_struct_docs(doc: QAPIDoc) -> (str, Dict[str, str]):
 
 return main, fields
 
+# Helper function that is used for most of QAPI types
+def qapi_to_golang_struct(name: str,
+  doc: QAPIDoc,
+  info: Optional[QAPISourceInfo],
+  ifcond: QAPISchemaIfCond,
+  features: List[QAPISchemaFeature],
+  base: Optional[QAPISchemaObjectType],
+  members: List[QAPISchemaObjectTypeMember],
+  variants: Optional[QAPISchemaVariants]) -> str:
+
+type_name = qapi_to_go_type_name(name, info.defn_meta)
+doc_struct, doc_fields = qapi_to_golang_struct_docs(doc)
+
+base_fields = ""
+if base:
+base_type_name = qapi_to_go_type_name(base.name, base.meta)
+base_fields = f"\t// Base type for this struct\n\t{base_type_name}\n"
+
+

[RFC PATCH v1 7/8] qapi: golang: Add CommandResult type to Go

2022-04-01 Thread Victor Toso
This patch adds a struct type in Go that will handle return values for
QAPI's command types.

The return value of a Command is encouraged to be one of QAPI's complex
types or an array of those.

The 'CommandResult' type acts in a similar fashion to 'Event' and
'Command', in order to map the right return data structure based on
the issued 'Command'.

This patch also adds a syntax sugar method to 'Command' to return the
'CommandResult' struct to use when receiving the return data.

Example:
```go
cmd := Command{}
cmd.Name = `query-tpm-models`
// bytes, _ := json.Marshal(&cmd)
// send bytes ...
received := `{"return":["tpm-tis","tpm-crb","tpm-spapr"]}`
cmdRet := cmd.GetReturnType()
_ = json.Unmarshal([]byte(received), &cmdRet)
// cmdRet.Value.([]TpmModel)[2] == TpmModelTpmSpapr
```

Signed-off-by: Victor Toso 
---
 scripts/qapi/golang.py | 42 --
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/scripts/qapi/golang.py b/scripts/qapi/golang.py
index 0b9c19babb..5d3395514d 100644
--- a/scripts/qapi/golang.py
+++ b/scripts/qapi/golang.py
@@ -36,6 +36,7 @@ def __init__(self, prefix: str):
 self.schema = None
 self.events = {}
 self.commands = {}
+self.command_results = {}
 self._docmap = {}
 self.golang_package_name = "qapi"
 
@@ -90,6 +91,32 @@ def visit_end(self):
 '''
 self.target["command"] += generate_marshal_methods('Command', 
self.commands)
 
+self.target["command"] += '''
+type CommandResult struct {
+   CommandBase
+   Value   Any `json:"return,omitempty"`
+}
+
+func (s Command) GetReturnType() CommandResult {
+   return CommandResult{
+   CommandBase: s.CommandBase,
+   }
+}
+
+// In order to evaluate nil value to empty JSON object
+func (s *CommandResult) MarshalJSON() ([]byte, error) {
+   if s.Value == nil {
+   return []byte(`{"return":{}}`), nil
+   }
+   tmp := struct {
+   Value Any `json:"return"`
+   }{Value: s.Value}
+
+   return json.Marshal(&tmp)
+}
+'''
+self.target["command"] += generate_marshal_methods('CommandResult', 
self.command_results)
+
 self.target["helper"] += '''
 // Creates a decoder that errors on unknown Fields
 // Returns nil if @from was successfully decoded @into the given type,
@@ -312,6 +339,9 @@ def visit_command(self,
 assert name == info.defn_name
 type_name = qapi_to_go_type_name(name, info.defn_meta)
 self.commands[name] = type_name
+if ret_type:
+ret_type_name = qapi_schema_type_to_go_type(ret_type.name)
+self.command_results[name] = ret_type_name
 
 doc = self._docmap.get(name, None)
 self_contained = True if not arg_type or not arg_type.name.startswith("q_obj") else False
@@ -445,6 +475,11 @@ def generate_marshal_methods(type: str,
 discriminator = "base.Name"
 struct_field = "Arg"
 json_field = "arguments"
+elif type == "CommandResult":
+base = "CommandBase"
+discriminator = "s.Name"
+struct_field = "Value"
+json_field = "return"
 else:
 assert base != ""
 discriminator = "base." + discriminator
@@ -527,14 +562,17 @@ def generate_marshal_methods(type: str,
 return []byte(result), nil
 }}
 '''
-unmarshal_base = f'''
+unmarshal_base = ""
+unmarshal_default_warn = ""
+if type != "CommandResult":
+unmarshal_base = f'''
 var base {base}
 if err := json.Unmarshal(data, &base); err != nil {{
 return err
 }}
 s.{base} = base
 '''
-unmarshal_default_warn = f'''
+unmarshal_default_warn = f'''
 default:
 fmt.Println("Failed to decode {type}", {discriminator})'''
 
-- 
2.35.1




[RFC PATCH v1 4/8] qapi: golang: Generate qapi's union types in Go

2022-04-01 Thread Victor Toso
This patch handles QAPI union types and generates the equivalent data
structures and methods in Go to handle them.

At the moment of this writing, it generates 67 structures.

The QAPI union type can be summarized by its common members, which are
defined in a @base struct, and a @value. The @value type can vary and
depends on the @base field that we call the @discriminator. The
@discriminator is always an Enum type.

Golang does not have Unions. The generation of QAPI union type in Go
with this patch, follows similar approach to what is done for QAPI
struct types and QAPI alternate types.

Similarly to the Go implementation of QAPI alternate types, we will
implement the Marshaler and Unmarshaler interfaces to seamlessly decode
from JSON objects to Golang structs and vice versa.

Similarly to the Go implementation of QAPI struct types, we will need to
tag the @base fields accordingly.

The embedded documentation in the Golang structures and fields is
particularly important here, to help developers know which types to use
for @value. Runtime checks help too.
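
A usage sketch (illustrative; it reuses the BlockdevOptions names from the
alternate patch's example, where Driver is the discriminator from @base):

```go
opts := BlockdevOptions{}
opts.Driver = BlockdevDriverQcow2
opts.Value = BlockdevOptionsQcow2{
	File: BlockdevRef{Value: "/some/place/my-image"},
}

b, _ := json.Marshal(&opts)
// string(b) == `{"driver":"qcow2","file":"/some/place/my-image"}`
```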

Signed-off-by: Victor Toso 
---
 scripts/qapi/golang.py | 124 +++--
 1 file changed, 119 insertions(+), 5 deletions(-)

diff --git a/scripts/qapi/golang.py b/scripts/qapi/golang.py
index 50e39f8925..0a1bf430ba 100644
--- a/scripts/qapi/golang.py
+++ b/scripts/qapi/golang.py
@@ -31,7 +31,7 @@ class QAPISchemaGenGolangVisitor(QAPISchemaVisitor):
 
 def __init__(self, prefix: str):
 super().__init__()
-self.target = {name: "" for name in ["alternate", "enum", "helper", 
"struct"]}
+self.target = {name: "" for name in ["alternate", "enum", "helper", 
"struct", "union"]}
 self.objects_seen = {}
 self.schema = None
 self._docmap = {}
@@ -82,10 +82,10 @@ def visit_object_type(self: QAPISchemaGenGolangVisitor,
   members: List[QAPISchemaObjectTypeMember],
   variants: Optional[QAPISchemaVariants]
   ) -> None:
-# Do not handle anything besides structs
+# Do not handle anything besides struct and unions.
 if (name == self.schema.the_empty_object_type.name or
 not isinstance(name, str) or
-info.defn_meta not in ["struct"]):
+info.defn_meta not in ["struct", "union"]):
 return
 
 assert name not in self.objects_seen
@@ -351,6 +351,93 @@ def generate_marshal_methods_enum(members: List[QAPISchemaEnumMember]) -> str:
 }}
 '''
 
+# Marshal methods for Union types
+def generate_marshal_methods(type: str,
+ type_dict: Dict[str, str],
+ discriminator: str = "",
+ base: str = "") -> str:
+assert base != ""
+discriminator = "base." + discriminator
+
+switch_case_format = '''
+case {name}:
+value := {case_type}{{}}
+if err := json.Unmarshal(data, &value); err != nil {{
+return err
+}}
+s.Value = value'''
+
+if_supported_types = ""
+added = {}
+switch_cases = ""
+for name in sorted(type_dict):
+case_type = type_dict[name]
+isptr = "*" if case_type[0] not in "*[" else ""
+switch_cases += switch_case_format.format(name = name,
+  case_type = case_type)
+if case_type not in added:
+if_supported_types += f'''typestr != "{case_type}" &&\n\t\t'''
+added[case_type] = True
+
+marshalfn = f'''
+func (s {type}) MarshalJSON() ([]byte, error) {{
+   base, err := json.Marshal(s.{base})
+   if err != nil {{
+   return nil, err
+   }}
+
+typestr := fmt.Sprintf("%T", s.Value)
+typestr = typestr[strings.LastIndex(typestr, ".")+1:]
+
+// "The branches need not cover all possible enum values"
+// This means that on Marshal, we can safely ignore empty values
+if typestr == "" {{
+return []byte(base), nil
+}}
+
+// Runtime check for supported value types
+if {if_supported_types[:-6]} {{
+return nil, errors.New(fmt.Sprintf("Type is not supported: %s", 
typestr))
+}}
+   value, err := json.Marshal(s.Value)
+   if err != nil {{
+   return nil, err
+   }}
+
+// Workaround to avoid checking s.Value being empty
+if string(value) == "{{}}" {{
+return []byte(base), nil
+}}
+
+// Removes the last '}}' from base and the first '{{' from value, in order to
+// return a single JSON object.
+result := fmt.Sprintf("%s,%s", base[:len(base)-1], value[1:])
+return []byte(result), nil
+}}
+'''
+unmarshal_base = f'''
+var base {base}
+if err := json.Unmarshal(data, &base); err != nil {{
+return err
+}}
+s.{base} = base
+'''
+unmarshal_default_warn = f'''
+default:
+fmt.Println("Failed to decode {type}", {discriminator})'''
+
+return f'''{marshalfn}
+func 

[RFC PATCH v1 2/8] qapi: golang: Generate qapi's alternate types in Go

2022-04-01 Thread Victor Toso
This patch handles QAPI alternate types and generates data structures
in Go that handle them.

At this moment, there are 5 alternates in qemu/qapi, they are:
 * BlockDirtyBitmapMergeSource
 * Qcow2OverlapChecks
 * BlockdevRef
 * BlockdevRefOrNull
 * StrOrNull

Alternate types are similar to unions but have no discriminator that
can be used to identify the underlying value on the wire, so it has to
be inferred. That can't be easily mapped in Go.

For each Alternate type, we will be using an Any type to hold the
value. 'Any' is an alias type for interface{} (similar to void* in C).

Similarly to the Enum types (see previous commit), we will implement
Marshaler and Unmarshaler interfaces for the Alternate types and in
those MarshalJSON() and UnmarshalJSON() methods is where we are going
to put the logic to read/set alternate's value.

Note that in UnmarshalJSON(), a helper function called StrictDecode()
will be used. This function is the main logic to infer whether a given
JSON object fits in a given Go struct (a sketch of such an
UnmarshalJSON() follows the example below). Because we only have 5
alternate types, it is not hard to validate the unmarshaling logic, but
we might need to improve it in the future if Alternates with branches
that have similar fields appear.

Examples:
 * BlockdevRef
```go
// Data to set in BlockdevOptions
qcow2 := BlockdevOptionsQcow2{}
// BlockdevRef using a string
qcow2.File = BlockdevRef{Value: "/some/place/my-image"}
opt := BlockdevOptions{}
opt.Driver = BlockdevDriverQcow2
opt.Value = qcow2

b, _ := json.Marshal(&opt)
// string(b) == `{"driver":"qcow2","file":"/some/place/my-image"}`
```
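
As a sketch of how StrictDecode() can drive the inference for the simplest
alternate (hand-written; the Value field layout is assumed from this
series, not taken from the actual generator output):

```go
func (s *StrOrNull) UnmarshalJSON(data []byte) error {
	// A JSON null selects the null branch.
	if string(data) == "null" {
		s.Value = nil
		return nil
	}
	// Otherwise try the string branch; StrictDecode returns the
	// decoding error if the JSON doesn't fit.
	var str string
	if err := StrictDecode(&str, data); err != nil {
		return err
	}
	s.Value = str
	return nil
}
```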

Signed-off-by: Victor Toso 
---
 scripts/qapi/golang.py | 157 -
 1 file changed, 155 insertions(+), 2 deletions(-)

diff --git a/scripts/qapi/golang.py b/scripts/qapi/golang.py
index 070d4cbbae..8be31bd902 100644
--- a/scripts/qapi/golang.py
+++ b/scripts/qapi/golang.py
@@ -31,7 +31,8 @@ class QAPISchemaGenGolangVisitor(QAPISchemaVisitor):
 
 def __init__(self, prefix: str):
 super().__init__()
-self.target = {name: "" for name in ["enum"]}
+self.target = {name: "" for name in ["alternate", "enum", "helper"]}
+self.objects_seen = {}
 self.schema = None
 self._docmap = {}
 self.golang_package_name = "qapi"
@@ -43,6 +44,10 @@ def visit_begin(self, schema):
 for target in self.target:
 self.target[target] = f"package {self.golang_package_name}\n"
 
+self.target["helper"] += f'''
+// Alias for go version lower than 1.18
+type Any = interface{{}}'''
+
 # Iterate once in schema.docs to map doc objects to its name
 for doc in schema.docs:
 if doc.symbol is None:
@@ -52,6 +57,22 @@ def visit_begin(self, schema):
 def visit_end(self):
 self.schema = None
 
+self.target["helper"] += '''
+// Creates a decoder that errors on unknown Fields
+// Returns nil if @from was successfully decoded @into the given type,
+// or the decoding error otherwise
+func StrictDecode(into interface{}, from []byte) error {
+   dec := json.NewDecoder(strings.NewReader(string(from)))
+   dec.DisallowUnknownFields()
+
+if err := dec.Decode(into); err != nil {
+return err
+}
+   return nil
+}
+'''
+
 def visit_object_type(self: QAPISchemaGenGolangVisitor,
   name: str,
   info: Optional[QAPISourceInfo],
@@ -70,7 +91,123 @@ def visit_alternate_type(self: QAPISchemaGenGolangVisitor,
  features: List[QAPISchemaFeature],
  variants: QAPISchemaVariants
  ) -> None:
-pass
+assert name not in self.objects_seen
+self.objects_seen[name] = True
+
+# Alternate marshal logic
+#
+# To avoid programming errors by users of this generated Go module,
+# we add a runtime check to error out in case the underlying Go type
+# doesn't match any of the supported types of the Alternate type.
+#
+# Also, Golang's json.Marshal will include, as a JSON object, the
+# wrapper we use to hold the Go struct (Value Any -> `Value: {...}`).
+# This would not be a valid QMP message, so we work around it by
+# calling the RemoveValueObject function.
+doc = self._docmap.get(name, None)
+doc_struct, doc_fields = qapi_to_golang_struct_docs(doc)
+
+members_doc = '''// Options are:'''
+if_supported_types = ""
+for var in variants.variants:
+field_doc = doc_fields.get(var.name, "")
+field_go_type = qapi_schema_type_to_go_type(var.type.name)
+members_doc += f'''\n// * {var.name} ({field_go_type}):{field_doc[3:]}'''
+
+if field_go_type == "nil":
+field_go_type = "*string"
+
+i

[RFC PATCH v1 5/8] qapi: golang: Generate qapi's event types in Go

2022-04-01 Thread Victor Toso
This patch handles QAPI event types and generates data structures in
Go that handle them.

At the moment of this writing, it generates 51 structures (49 events)

In Golang, each event is handled as a Go structure and there is no big
difference, in the Go generated code, between what is a QAPI event
type and what is a QAPI struct.

Each QAPI event has the suffix 'Event' in its Golang data structure
and contains the fields, mandatory and optional, that can be
sent or received.

In addition, there are two structures added to handle QAPI
specification for event types: 'Event' and 'EventBase'.

'EventBase' contains @Name and @Timestamp fields and then 'Event'
extends 'EventBase' with an @Arg field of type 'Any'.

The 'Event' type implements the Unmarshaler to decode the QMP JSON
Object into the correct Golang (event) struct. The goal here is to
facilitate receiving Events.

A TODO for this type is to implement Marshaler for 'Event'. It'll
contain runtime checks to validate the struct before transforming it
into a JSON object.

Example:
```go
qmpMsg := `{
"event" : "MIGRATION",
"timestamp":{
"seconds": 1432121972,
"microseconds": 744001
},
"data":{
"status": "completed"
}
}`

e := Event{}
_ = json.Unmarshal([]byte(qmpMsg), &e)
// e.Name == "MIGRATION"
// e.Arg.(MigrationEvent).Status == MigrationStatusCompleted
```

Signed-off-by: Victor Toso 
---
 scripts/qapi/golang.py | 92 ++
 1 file changed, 84 insertions(+), 8 deletions(-)

diff --git a/scripts/qapi/golang.py b/scripts/qapi/golang.py
index 0a1bf430ba..3bb66d07c7 100644
--- a/scripts/qapi/golang.py
+++ b/scripts/qapi/golang.py
@@ -31,9 +31,10 @@ class QAPISchemaGenGolangVisitor(QAPISchemaVisitor):
 
 def __init__(self, prefix: str):
 super().__init__()
-self.target = {name: "" for name in ["alternate", "enum", "helper", 
"struct", "union"]}
+self.target = {name: "" for name in ["alternate", "enum", "event", 
"helper", "struct", "union"]}
 self.objects_seen = {}
 self.schema = None
+self.events = {}
 self._docmap = {}
 self.golang_package_name = "qapi"
 
@@ -57,6 +58,24 @@ def visit_begin(self, schema):
 def visit_end(self):
 self.schema = None
 
+# EventBase and Event are not specified in the QAPI schema,
+# so we need to generate it ourselves.
+self.target["event"] += '''
+type EventBase struct {
+Name  string `json:"event"`
+Timestamp struct {
+Seconds  int64 `json:"seconds"`
+Microseconds int64 `json:"microseconds"`
+} `json:"timestamp"`
+}
+
+type Event struct {
+EventBase
+Arg   Any`json:"data,omitempty"`
+}
+'''
+self.target["event"] += generate_marshal_methods('Event', self.events)
+
 self.target["helper"] += '''
 // Creates a decoder that errors on unknown Fields
 // Returns nil if @from was successfully decoded @into the given type,
@@ -279,7 +298,28 @@ def visit_command(self,
 pass
 
 def visit_event(self, name, info, ifcond, features, arg_type, boxed):
-pass
+assert name == info.defn_name
+type_name = qapi_to_go_type_name(name, info.defn_meta)
+self.events[name] = type_name
+
+doc = self._docmap.get(name, None)
+self_contained = True if not arg_type or not 
arg_type.name.startswith("q_obj") else False
+content = ""
+if self_contained:
+doc_struct, _ = qapi_to_golang_struct_docs(doc)
+content = generate_struct_type(type_name, "", doc_struct)
+else:
+assert isinstance(arg_type, QAPISchemaObjectType)
+content = qapi_to_golang_struct(name,
+doc,
+arg_type.info,
+arg_type.ifcond,
+arg_type.features,
+arg_type.base,
+arg_type.members,
+arg_type.variants)
+
+self.target["event"] += content
 
 def write(self, output_dir: str) -> None:
 for module_name, content in self.target.items():
@@ -351,15 +391,41 @@ def generate_marshal_methods_enum(members: List[QAPISchemaEnumMember]) -> str:
 }}
 '''
 
-# Marshal methods for Union types
+# Marshal methods for Event and Union types
 def generate_marshal_methods(type: str,
  type_dict: Dict[str, str],
  discriminator: str = "",
  base: str = "") -> str:
-assert base != ""
-discriminator = "base." + discriminator
-
-switch_case_format = '''
+type_is_union = False
+json_field = ""
+struct_field = ""
+if type == "Event":
+base = type + "Base"
+discriminator = "base.Name"
+

[RFC PATCH v1 0/8] qapi: add generator for Golang interface

2022-04-01 Thread Victor Toso
Hi,

Happy 1st April. Not a joke :) /* ugh, took me too long to send */

This series is about adding a generator in scripts/qapi to produce
Go data structures that can be used to communicate with QEMU over
QMP.


* Why Go?

There are quite a few Go projects that interact with QEMU over QMP,
and they end up using a mix of different libraries with their own
code.


** Which projects?

The ones I've found so far:

- podman machine
  https://github.com/containers/podman/tree/main/pkg/machine/qemu

- kata-containers (govmm)
  
https://github.com/kata-containers/kata-containers/tree/main/src/runtime/pkg/govmm

- lxd
  https://github.com/lxc/lxd/tree/master/lxd/instance/drivers

- kubevirt (plain json strings)
  https://github.com/kubevirt/kubevirt

(let me know if you know others)


* But Why?

I'm particularly interested in 3 out of 4 of the projects above and
only Kubevirt uses libvirt to handle QEMU. That means that for every
QEMU release where a QMP command, event or other data struct is
added, removed or changed, those projects need to check what changed
in QEMU and then address those changes in their projects, if needed.

The idea behind generating Go data structures is that we can keep a
Go module which can have releases that follow QEMU releases.

A project that uses this Go module only needs to bump the module
version to receive all the changes in its own vendored
code base.


* Status

There are a few rough edges to work on but this is usable. The major
thing I forgot to add is handling Error from Commands. It'll be the
first thing I'll work on next week.

If you want to start using this today, you can fetch it at

https://gitlab.com/victortoso/qapi-go/

There are quite a few tests that I took from the examples in the
qapi schema. Coverage using go's cover tool is giving `28.6% of
statements`

I've uploaded a statically generated godoc output of the above Go
module here:


https://fedorapeople.org/~victortoso/qapi-go/rfc/victortoso.com/qapi-go/pkg/qapi/


* License

While the generator (golang.py in this series) is GPL v2, the
generated code needs to be compatible with other Golang projects,
such as the ones mentioned above. My intention is to keep a Go
module with an MIT license.


* Disclaimer to reviewers

This is my first serious python project, so there'll be lots of
suggestions that I'll be happy to take and learn from.


Thanks for taking a look, let me know if you have questions, ideas
or suggestions.

Cheers,
Victor

Victor Toso (8):
  qapi: golang: Generate qapi's enum types in Go
  qapi: golang: Generate qapi's alternate types in Go
  qapi: golang: Generate qapi's struct types in Go
  qapi: golang: Generate qapi's union types in Go
  qapi: golang: Generate qapi's event types in Go
  qapi: golang: Generate qapi's command types in Go
  qapi: golang: Add CommandResult type to Go
  qapi: golang: document skip function visit_array_types

 qapi/meson.build   |   1 +
 scripts/qapi/golang.py | 727 +
 scripts/qapi/main.py   |   2 +
 3 files changed, 730 insertions(+)
 create mode 100644 scripts/qapi/golang.py

-- 
2.35.1




[RFC PATCH v1 1/8] qapi: golang: Generate qapi's enum types in Go

2022-04-01 Thread Victor Toso
This patch handles QAPI enum types and generates their equivalents in Go.

The highlights of this implementation are:

1. For each QAPI enum, we will define an int32 type in Go to be the
   assigned type of this specific enum

2. While in the Go codebase we can use the generated enum values, the
   specification requires that, on the wire, the enumeration type's
   value be represented by its string name.

   For this reason, each Go type that represents a QAPI enum will be
   implementing the Marshaler[0] and Unmarshaler[1] interfaces to
   seamlessly handle QMP's string to Go int32 value and vice-versa
   (see the sketch after the references below).

3. Naming: CamelCase will be used in any identifier that we want to
   export [2], which is everything in this patch.

[0] https://pkg.go.dev/encoding/json#Marshaler
[1] https://pkg.go.dev/encoding/json#Unmarshaler
[2] https://go.dev/ref/spec#Exported_identifiers
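
A hand-written sketch of that pattern for a hypothetical two-value enum
(assuming encoding/json and errors are imported; the generator emits the
equivalent for every QAPI enum):

```go
type ExampleStatus int32

const (
	ExampleStatusOff ExampleStatus = iota
	ExampleStatusOn
)

func (s ExampleStatus) MarshalJSON() ([]byte, error) {
	switch s {
	case ExampleStatusOff:
		return json.Marshal("off")
	case ExampleStatusOn:
		return json.Marshal("on")
	}
	return nil, errors.New("unknown ExampleStatus value")
}

func (s *ExampleStatus) UnmarshalJSON(data []byte) error {
	var name string
	if err := json.Unmarshal(data, &name); err != nil {
		return err
	}
	switch name {
	case "off":
		*s = ExampleStatusOff
	case "on":
		*s = ExampleStatusOn
	default:
		return errors.New("unknown ExampleStatus name: " + name)
	}
	return nil
}
```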

Signed-off-by: Victor Toso 
---
 qapi/meson.build   |   1 +
 scripts/qapi/golang.py | 225 +
 scripts/qapi/main.py   |   2 +
 3 files changed, 228 insertions(+)
 create mode 100644 scripts/qapi/golang.py

diff --git a/qapi/meson.build b/qapi/meson.build
index 656ef0e039..0951692332 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -90,6 +90,7 @@ qapi_nonmodule_outputs = [
   'qapi-init-commands.h', 'qapi-init-commands.c',
   'qapi-events.h', 'qapi-events.c',
   'qapi-emit-events.c', 'qapi-emit-events.h',
+  'qapibara.go',
 ]
 
 # First build all sources
diff --git a/scripts/qapi/golang.py b/scripts/qapi/golang.py
new file mode 100644
index 00..070d4cbbae
--- /dev/null
+++ b/scripts/qapi/golang.py
@@ -0,0 +1,225 @@
+"""
+Golang QAPI generator
+"""
+# Copyright (c) 2021 Red Hat Inc.
+#
+# Authors:
+#  Victor Toso 
+# 
+# This work is licensed under the terms of the GNU GPL, version 2.
+# See the COPYING file in the top-level directory.
+
+# Just for type hint on self
+from __future__ import annotations
+
+import os
+from typing import Dict, List, Optional
+
+from .schema import (
+QAPISchema,
+QAPISchemaVisitor,
+QAPISchemaEnumMember,
+QAPISchemaFeature,
+QAPISchemaIfCond,
+QAPISchemaObjectType,
+QAPISchemaObjectTypeMember,
+QAPISchemaVariants,
+)
+from .source import QAPISourceInfo
+
+class QAPISchemaGenGolangVisitor(QAPISchemaVisitor):
+
+def __init__(self, prefix: str):
+super().__init__()
+self.target = {name: "" for name in ["enum"]}
+self.schema = None
+self._docmap = {}
+self.golang_package_name = "qapi"
+
+def visit_begin(self, schema):
+self.schema = schema
+
+# Every Go file needs to reference its package name
+for target in self.target:
+self.target[target] = f"package {self.golang_package_name}\n"
+
+# Iterate once in schema.docs to map doc objects to its name
+for doc in schema.docs:
+if doc.symbol is None:
+continue
+self._docmap[doc.symbol] = doc
+
+def visit_end(self):
+self.schema = None
+
+def visit_object_type(self: QAPISchemaGenGolangVisitor,
+  name: str,
+  info: Optional[QAPISourceInfo],
+  ifcond: QAPISchemaIfCond,
+  features: List[QAPISchemaFeature],
+  base: Optional[QAPISchemaObjectType],
+  members: List[QAPISchemaObjectTypeMember],
+  variants: Optional[QAPISchemaVariants]
+  ) -> None:
+pass
+
+def visit_alternate_type(self: QAPISchemaGenGolangVisitor,
+ name: str,
+ info: Optional[QAPISourceInfo],
+ ifcond: QAPISchemaIfCond,
+ features: List[QAPISchemaFeature],
+ variants: QAPISchemaVariants
+ ) -> None:
+pass
+
+def visit_enum_type(self: QAPISchemaGenGolangVisitor,
+name: str,
+info: Optional[QAPISourceInfo],
+ifcond: QAPISchemaIfCond,
+features: List[QAPISchemaFeature],
+members: List[QAPISchemaEnumMember],
+prefix: Optional[str]
+) -> None:
+doc = self._docmap.get(name, None)
+doc_struct, doc_fields = qapi_to_golang_struct_docs(doc)
+
+value = qapi_to_field_name_enum(members[0].name)
+fields = f"\t{name}{value} {name} = iota\n"
+for member in members[1:]:
+field_doc = " " + doc_fields.get(member.name, "") if doc_fields 
else ""
+value = qapi_to_field_name_enum(member.name)
+fields += f"\t{name}{value}{field_doc}\n"
+
+self.target["enum"] += f'''
+{doc_struct}
+type {name} int32
+const (
+{fields[:-1]}
+)
+{generate_marshal_methods_enum(members)}
+''

Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa

2022-04-01 Thread Si-Wei Liu




On 3/31/2022 1:39 AM, Jason Wang wrote:

On Wed, Mar 30, 2022 at 11:48 PM Si-Wei Liu  wrote:



On 3/30/2022 2:00 AM, Jason Wang wrote:

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu  wrote:

With MQ enabled vdpa device and non-MQ supporting guest e.g.
booting vdpa with mq=on over OVMF of single vqp, below assert
failure is seen:

../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index 
&& idx < dev->vq_index + dev->nvqs' failed.

0  0x7f8ce3ff3387 in raise () at /lib64/libc.so.6
1  0x7f8ce3ff4a78 in abort () at /lib64/libc.so.6
2  0x7f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
3  0x7f8ce3fec252 in  () at /lib64/libc.so.6
4  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>,
idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
5  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>,
idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
6  0x558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800,
vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
7  0x558f52c6b89a in virtio_pci_set_guest_notifier
(d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true,
with_irqfd=with_irqfd@entry=false)
 at ../hw/virtio/virtio-pci.c:974
8  0x558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60,
nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
9  0x558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0,
ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
 at ../hw/net/vhost_net.c:361
10 0x558f52d4e5e7 in virtio_net_set_status (status=<optimized out>,
n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
11 0x558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15
'\017') at ../hw/net/virtio-net.c:370
12 0x558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0,
val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
13 0x558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
14 0x558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20,
value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>,
attrs=...)
 at ../softmmu/memory.c:492
15 0x558f52d127de in access_with_adjusted_size (addr=addr@entry=20,
value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>,
access_size_max=<optimized out>, access_fn=0x558f52d15cf0
<memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
16 0x558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20,
data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
 at ../softmmu/memory.c:1504
17 0x558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90,
addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1,
addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at
/home/opc/qemu-upstream/include/qemu/host-utils.h:165
18 0x558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124,
attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
19 0x558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
 at ../softmmu/physmem.c:2914
20 0x558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
 attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>,
is_write=<optimized out>) at ../softmmu/physmem.c:2924
21 0x558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at 
../accel/kvm/kvm-all.c:2903
22 0x558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at 
../accel/kvm/kvm-accel-ops.c:49
23 0x558f52f9f04a in qemu_thread_start (args=) at 
../util/qemu-thread-posix.c:556
24 0x7f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
25 0x7f8ce40bb9fd in clone () at /lib64/libc.so.6

The cause of the assert failure is that the vhost_dev index for the
ctrl vq was not aligned with the actual one in use by the guest.
Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
if the guest doesn't support multiqueue, the guest vq layout shrinks
to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
This results in the ctrl_vq taking a different vhost_dev group index
than the default. We can map a vq to the correct vhost_dev group by
checking whether MQ is supported by the guest and was successfully
negotiated. Since the MQ feature is only present along with CTRL_VQ,
we make sure index 2 is only meant for the control vq while MQ is not
supported by the guest.
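
(Illustrative aside, not part of the patch: a minimal, self-contained
sketch of the mapping just described. The helper name and exact
conditions are assumptions for illustration; the real change is in the
diff below.)

    #include <assert.h>
    #include <stdbool.h>

    /*
     * Map a guest vq index to the vhost_dev (queue-pair) slot backing
     * it. Without VIRTIO_NET_F_MQ the layout is rx(0), tx(1), ctrl(2),
     * so index 2 must be the control vq, backed by the only vhost_dev.
     */
    static int vq_index_to_vhost_dev(int vq_index, bool mq_negotiated)
    {
        if (!mq_negotiated) {
            assert(vq_index <= 2);  /* rx, tx, ctrl only */
            return 0;               /* single vhost_dev group */
        }
        return vq_index / 2;        /* ctrl vq at 2 * N maps to group N */
    }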

Note that if QEMU or the guest doesn't support the control vq, there
is no point in exposing a vhost_dev and guest notifier for it. Since
vhost_net_start/stop implies DRIVER_OK is set in the device status,
feature negotiation should be complete when reaching
virtio_net_vhost_status().

Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
Suggested-by: Jason Wang 
Signed-off-by: Si-Wei Liu 
---
   hw/net/virtio-net.c | 19 ---
   1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1067e72..484b215 100644
--- a/

Re: [PATCH] linux-user/ppc: Narrow type of ccr in save_user_regs

2022-04-01 Thread Cédric Le Goater

On 4/1/22 21:16, Richard Henderson wrote:

Coverity warns that we shift a 32-bit value by N, and then
accumulate it into a 64-bit type (target_ulong on ppc64).

The ccr is always 8 * 4-bit fields, and thus is always a
32-bit quantity; narrow the type to avoid the warning.

Fixes: Coverity CID 1487223
Signed-off-by: Richard Henderson 


Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  linux-user/ppc/signal.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
index ec0b9c0df3..ce5a4682cd 100644
--- a/linux-user/ppc/signal.c
+++ b/linux-user/ppc/signal.c
@@ -229,7 +229,7 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
  {
  target_ulong msr = env->msr;
  int i;
-target_ulong ccr = 0;
+uint32_t ccr = 0;
  
   /* In general, the kernel attempts to be intelligent about what it
      needs to save for Altivec/FP/SPE registers.  We don't care that




Re: [PATCH v3 0/3] block/dirty-bitmaps: fix and improve bitmap merge

2022-04-01 Thread Vladimir Sementsov-Ogievskiy

01.04.2022 23:17, Eric Blake wrote:

On Fri, Apr 01, 2022 at 01:08:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:

v3: rebase on master, one patch is already merged.

Vladimir Sementsov-Ogievskiy (3):
   block: block_dirty_bitmap_merge(): fix error path
   block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap
   block: simplify handling of try to merge different sized bitmaps


Is any of this series worth getting into 7.0, or are we safe letting
it slide to 7.1?



Let's check.

Only first patch is a fix.

The anon bitmap is created on the same bs where dst is found, so they
should be compatible in size.

But bdrv_merge_dirty_bitmap() also does some checks on dst status,
which may actually fail.

So, in the bad case we set errp, but return a non-NULL dst bitmap.

Look at the callers of block_dirty_bitmap_merge():

1. qmp_block_dirty_bitmap_merge() is OK, it ignores the return value.

2. qmp_transaction() uses local_err to detect errors, so we'll go
through the error path; that's good. state->bitmap is set, but that
doesn't really matter. What matters is whether state->backup is set
or not.

state->backup is initialized to zero, as qmp_transaction() uses
g_malloc0 to allocate the state buffer.

And bdrv_merge_dirty_bitmap() does all its checks prior to calling
bdrv_dirty_bitmap_merge_internal(), which is what actually sets
@backup. So state->backup is not set in our bad case.

So it should all be OK to postpone for 7.1.

--
Best regards,
Vladimir



[PATCH 1/1] xlnx-bbram: hw/nvram: Fix uninitialized Error *

2022-04-01 Thread Tong Ho
This adds the required initialization of an Error * variable.
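
(Context, not part of the patch: QEMU's error API requires the
destination pointer to start out NULL, since error_setg_errno() and
friends assert that *errp is NULL before filling it in. A minimal
usage sketch, with an illustrative function name:)

    #include "qapi/error.h"

    static void report_backstore_error(int rc)
    {
        Error *err = NULL;   /* must start NULL; error_setg_errno()
                              * asserts *errp == NULL */
        error_setg_errno(&err, -rc, "BBRAM backstore read failed");
        error_report_err(err);   /* reports and frees err */
    }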

Signed-off-by: Tong Ho 
---
 hw/nvram/xlnx-bbram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvram/xlnx-bbram.c b/hw/nvram/xlnx-bbram.c
index b70828e5bf..6ed32adad9 100644
--- a/hw/nvram/xlnx-bbram.c
+++ b/hw/nvram/xlnx-bbram.c
@@ -89,7 +89,7 @@ static bool bbram_pgm_enabled(XlnxBBRam *s)
 
 static void bbram_bdrv_error(XlnxBBRam *s, int rc, gchar *detail)
 {
-Error *errp;
+Error *errp = NULL;
 
 error_setg_errno(&errp, -rc, "%s: BBRAM backstore %s failed.",
  blk_name(s->blk), detail);
-- 
2.25.1




Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled

2022-04-01 Thread Si-Wei Liu




On 3/31/2022 1:36 AM, Jason Wang wrote:

On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu  wrote:



On 3/30/2022 2:14 AM, Jason Wang wrote:

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu  wrote:

The previous commit prevents vhost-user and vhost-vdpa from using
the userland vq handler via disable_ioeventfd_handler. The same
needs to be done for host notifier cleanup too, as the
virtio_queue_host_notifier_read handler still tends to read pending
events left behind on the ioeventfd and attempts to handle
outstanding kicks from the QEMU userland vq.

If the vq handler is not disabled on cleanup, it may lead to a
SIGSEGV with a recursive virtio_net_set_status call on the control vq:

0  0x7f8ce3ff3387 in raise () at /lib64/libc.so.6
1  0x7f8ce3ff4a78 in abort () at /lib64/libc.so.6
2  0x7f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
3  0x7f8ce3fec252 in  () at /lib64/libc.so.6
4  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=, 
idx=) at ../hw/virtio/vhost-vdpa.c:563
5  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=, 
idx=) at ../hw/virtio/vhost-vdpa.c:558
6  0x558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, 
vdev=0x558f568f91f0, n=2, mask=) at ../hw/virtio/vhost.c:1557

I feel it's probably a bug elsewhere, e.g. when we fail to start
vhost-vDPA, it's QEMU's job to poll the host notifier and we will
fall back to the userspace vq handler.

Apologies, an incorrect stack trace was pasted which actually came from
patch #1. I will post a v2 with the corresponding one as below:

0  0x55f800df1780 in qdev_get_parent_bus (dev=0x0) at
../hw/core/qdev.c:376
1  0x55f800c68ad8 in virtio_bus_device_iommu_enabled
(vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
2  0x55f800d70d7f in vhost_memory_unmap (dev=) at
../hw/virtio/vhost.c:318
3  0x55f800d70d7f in vhost_memory_unmap (dev=,
buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at
../hw/virtio/vhost.c:336
4  0x55f800d71867 in vhost_virtqueue_stop
(dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590,
vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
5  0x55f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30,
vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
6  0x55f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30,
dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
7  0x55f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
cvq=cvq@entry=1)
 at ../hw/net/vhost_net.c:423
8  0x55f800d4e628 in virtio_net_set_status (status=,
n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
9  0x55f800d4e628 in virtio_net_set_status
(vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at
../hw/net/virtio-net.c:370

I don't understand why virtio_net_handle_ctrl() calls virtio_net_set_status()...
The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ 
command, i.e. in virtio_net_handle_mq():


1413 n->curr_queue_pairs = queue_pairs;
1414 /* stop the backend before changing the number of queue_pairs to avoid handling a
1415  * disabled queue */
1416 virtio_net_set_status(vdev, vdev->status);
1417 virtio_net_set_queue_pairs(n);

Note that before the vdpa multiqueue support, there was never a
vhost_dev exposed for the ctrl_vq, i.e. there's no host notifier set
up for the ctrl_vq on vhost_kernel, as it is emulated in QEMU
software.





10 0x55f800d534d8 in virtio_net_handle_ctrl (iov_cnt=, iov=, cmd=0 '\000', n=0x55f8044ec590) at
../hw/net/virtio-net.c:1408
11 0x55f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590,
vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
12 0x55f800d69f37 in virtio_queue_host_notifier_read
(vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
13 0x55f800d69f37 in virtio_queue_host_notifier_read
(n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
14 0x55f800c688e6 in virtio_bus_cleanup_host_notifier
(bus=, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
15 0x55f800d73106 in vhost_dev_disable_notifiers
(hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590)
 at ../../../include/hw/virtio/virtio-bus.h:35
16 0x55f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0,
dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
17 0x55f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590,
ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7,
cvq=cvq@entry=1)
 at ../hw/net/vhost_net.c:423
18 0x55f800d4e628 in virtio_net_set_status (status=,
n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
19 0x55f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590,
status=15 '\017') at ../hw/net/virtio-net.c:370
20 0x55f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590,
val=) at ../hw/virtio/virtio.c:1945
21 0x55f800d11d9d in vm_state_notify (running=running@entry=false,
state=state@entry=RUN_STATE_SHUTDOWN) at ../softmmu/runstate.c:333
22 0x55f800d04e7a in do_vm_stop
(state=state@entry=RUN_STATE_SH

Re: [PATCH v3 0/3] block/dirty-bitmaps: fix and improve bitmap merge

2022-04-01 Thread Eric Blake
On Fri, Apr 01, 2022 at 01:08:01PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> v3: rebase on master, one patch is already merged.
> 
> Vladimir Sementsov-Ogievskiy (3):
>   block: block_dirty_bitmap_merge(): fix error path
>   block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap
>   block: simplify handling of try to merge different sized bitmaps

Is any of this series worth getting into 7.0, or are we safe letting
it slide to 7.1?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 1/3] softfloat: Fix declaration of partsN_compare

2022-04-01 Thread Richard Henderson

On 4/1/22 12:12, Peter Maydell wrote:

The types are compatible, but it's weird that the compiler doesn't
warn that the prototype and definition are different types: it
seems like the kind of "technically valid but usually a bug" that
a warning would be nice for.


Good point. Submitted

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105131
https://github.com/llvm/llvm-project/issues/54706


r~



Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-01 Thread Andy Lutomirski
On Fri, Apr 1, 2022, at 7:59 AM, Quentin Perret wrote:
> On Thursday 31 Mar 2022 at 09:04:56 (-0700), Andy Lutomirski wrote:


> To answer your original question about memory 'conversion', the key
> thing is that the pKVM hypervisor controls the stage-2 page-tables for
> everyone in the system, all guests as well as the host. As such, a page
> 'conversion' is nothing more than a permission change in the relevant
> page-tables.
>

So I can see two different ways to approach this.

One is that you split the whole address space in half and, just like SEV and 
TDX, allocate one bit to indicate the shared/private status of a page.  This 
makes it work a lot like SEV and TDX.
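
(Self-contained sketch of that scheme; the bit position is an
assumption for illustration only:)

    #include <stdbool.h>
    #include <stdint.h>

    #define IPA_SHARED_BIT (1ULL << 51)   /* illustrative position */

    static bool ipa_is_shared(uint64_t ipa)
    {
        return (ipa & IPA_SHARED_BIT) != 0;
    }

    static uint64_t ipa_address(uint64_t ipa)
    {
        return ipa & ~IPA_SHARED_BIT;     /* strip the attribute bit */
    }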

The other is to have shared and private pages be distinguished only by their 
hypercall history and the (protected) page tables.  This saves some address 
space and some page table allocations, but it opens some cans of worms too.  In 
particular, the guest and the hypervisor need to coordinate, in a way that the 
guest can trust, to ensure that the guest's idea of which pages are private 
match the host's.  This model seems a bit harder to support nicely with the 
private memory fd model, but not necessarily impossible.

Also, what are you trying to accomplish by having the host userspace mmap 
private pages?  Is the idea that multiple guest could share the same page until 
such time as one of them tries to write to it?  That would be kind of like 
having a third kind of memory that's visible to host and guests but is 
read-only for everyone.  TDX and SEV can't support this at all (a private page 
belongs to one guest and one guest only, at least in SEV and in the current TDX 
SEAM spec).  I imagine that this could be supported with private memory fds 
with some care without mmap, though -- the host could still populate the page 
with memcpy.  Or I suppose a memslot could support using MAP_PRIVATE fds and 
have approximately the right semantics.

--Andy





Re: [PATCH] ppc/pnv: Fix number of registers in the PCIe controller on POWER9

2022-04-01 Thread Daniel Henrique Barboza




On 4/1/22 06:19, Frederic Barrat wrote:

The spec defines 3 registers, even though only indexes 0 and 2 are
valid on POWER9. The same model is used on POWER10. Register 1 is
defined there but we currently don't use it in skiboot. So we can keep
reporting an error on write.

Reported by Coverity (CID 1487176).

Fixes: 4f9924c4d4cf ("ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge")
Suggested-by: Benjamin Herrenschmidt 
Signed-off-by: Frederic Barrat 
---


Reviewed-by: Daniel Henrique Barboza 


  include/hw/pci-host/pnv_phb4.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index b02ecdceaa..19dcbd6f87 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -180,7 +180,7 @@ struct PnvPhb4PecState {
  MemoryRegion nest_regs_mr;
  
  /* PCI registers, excluding per-stack */

-#define PHB4_PEC_PCI_REGS_COUNT 0x2
+#define PHB4_PEC_PCI_REGS_COUNT 0x3
  uint64_t pci_regs[PHB4_PEC_PCI_REGS_COUNT];
  MemoryRegion pci_regs_mr;
  




[PATCH 0/1] xlnx-bbram: hw/nvram: Fix Coverity CID 1487233

2022-04-01 Thread Tong Ho
This patch addresses Coverity CID 1487233 by adding the required
initialization of a local variable of type Error *.

Tong Ho (1):
  xlnx-bbram: hw/nvram: Fix uninitialized Error *

 hw/nvram/xlnx-bbram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.25.1




[PATCH] target/s390x: Fix the accumulation of ccm in op_icm

2022-04-01 Thread Richard Henderson
Coverity rightly reports that 0xff << pos can overflow.
This would affect the ICMH instruction.
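
(Illustrative aside: a self-contained demonstration of the problem.
0xff is a 32-bit int, so the shift is done in 32 bits; ICMH can ask
for positions up to 56, where the shift count exceeds the width of
int and is undefined behaviour.)

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t ccm = 0;
        int pos = 56;           /* a byte position ICMH can use */

        /* ccm |= 0xff << pos;     BUG: a 32-bit shift by 56 is UB and
         * the intended high bits are lost before widening to ccm. */
        ccm |= 0xffull << pos;  /* fix: force a 64-bit shift */

        printf("ccm = %016" PRIx64 "\n", ccm); /* ff00000000000000 */
        return 0;
    }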

Fixes: Coverity CID 1487161
Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 5acfc0ff9b..ea7baf0832 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2622,7 +2622,7 @@ static DisasJumpType op_icm(DisasContext *s, DisasOps *o)
 tcg_gen_qemu_ld8u(tmp, o->in2, get_mem_index(s));
 tcg_gen_addi_i64(o->in2, o->in2, 1);
 tcg_gen_deposit_i64(o->out, o->out, tmp, pos, 8);
-ccm |= 0xff << pos;
+ccm |= 0xffull << pos;
 }
 m3 = (m3 << 1) & 0xf;
 pos -= 8;
-- 
2.25.1




[PATCH] linux-user/ppc: Narrow type of ccr in save_user_regs

2022-04-01 Thread Richard Henderson
Coverity warns that we shift a 32-bit value by N, and then
accumulate it into a 64-bit type (target_ulong on ppc64).

The ccr is always 8 * 4-bit fields, and thus is always a
32-bit quantity; narrow the type to avoid the warning.

Fixes: Coverity CID 1487223
Signed-off-by: Richard Henderson 
---
 linux-user/ppc/signal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
index ec0b9c0df3..ce5a4682cd 100644
--- a/linux-user/ppc/signal.c
+++ b/linux-user/ppc/signal.c
@@ -229,7 +229,7 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
 {
 target_ulong msr = env->msr;
 int i;
-target_ulong ccr = 0;
+uint32_t ccr = 0;
 
   /* In general, the kernel attempts to be intelligent about what it
      needs to save for Altivec/FP/SPE registers.  We don't care that
-- 
2.25.1




[PATCH] plugins: Assert mmu_idx in range before use in qemu_plugin_get_hwaddr

2022-04-01 Thread Richard Henderson
Coverity reports out-of-bound accesses here.  This should be a
false positive due to how the index is decoded from MemOpIdx.
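
(For reference, the encoding that makes this a false positive, sketched
after QEMU's MemOpIdx scheme; the MemOp typedef is a stand-in for the
real enum so the snippet is self-contained:)

    #include <stdint.h>

    typedef unsigned MemOp;     /* stand-in for the real enum */
    typedef uint32_t MemOpIdx;

    /* The mmu index lives in the low 4 bits, so it is always < 16. */
    static inline MemOpIdx make_memop_idx(MemOp op, unsigned idx)
    {
        return (op << 4) | idx;
    }

    static inline unsigned get_mmuidx(MemOpIdx oi)
    {
        return oi & 15;
    }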

Fixes: Coverity CID 1487201
Signed-off-by: Richard Henderson 
---
 plugins/api.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/plugins/api.c b/plugins/api.c
index 7bf71b189d..2078b16edb 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -289,6 +289,8 @@ struct qemu_plugin_hwaddr 
*qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
 enum qemu_plugin_mem_rw rw = get_plugin_meminfo_rw(info);
 hwaddr_info.is_store = (rw & QEMU_PLUGIN_MEM_W) != 0;
 
+assert(mmu_idx < NB_MMU_MODES);
+
 if (!tlb_plugin_lookup(cpu, vaddr, mmu_idx,
hwaddr_info.is_store, &hwaddr_info)) {
 error_report("invalid use of qemu_plugin_get_hwaddr");
-- 
2.25.1




[PATCH] target/i386: Suppress coverity warning on fsave/frstor

2022-04-01 Thread Richard Henderson
Coverity warns that 14 << data32 may overflow with respect
to the target_ulong to which it is subsequently added.
We know this wasn't true because data32 is in [1,2],
but the suggested fix is perfectly fine.

Fixes: Coverity CID 1487135, 1487256
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/fpu_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index ebf5e73df9..30bc44fcf8 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -2466,7 +2466,7 @@ static void do_fsave(CPUX86State *env, target_ulong ptr, 
int data32,
 
 do_fstenv(env, ptr, data32, retaddr);
 
-ptr += (14 << data32);
+ptr += (target_ulong)14 << data32;
 for (i = 0; i < 8; i++) {
 tmp = ST(i);
 do_fstt(env, tmp, ptr, retaddr);
@@ -2488,7 +2488,7 @@ static void do_frstor(CPUX86State *env, target_ulong ptr, 
int data32,
 int i;
 
 do_fldenv(env, ptr, data32, retaddr);
-ptr += (14 << data32);
+ptr += (target_ulong)14 << data32;
 
 for (i = 0; i < 8; i++) {
 tmp = do_fldt(env, ptr, retaddr);
-- 
2.25.1




Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-01 Thread Sean Christopherson
On Fri, Apr 01, 2022, Quentin Perret wrote:
> On Friday 01 Apr 2022 at 17:14:21 (+), Sean Christopherson wrote:
> > On Fri, Apr 01, 2022, Quentin Perret wrote:
> > I assume there is a scenario where a page can be converted from
> > shared=>private?
> > If so, is there a use case where that happens post-boot _and_ the
> > contents of the page are preserved?
> 
> I think most of our use-cases are private=>shared, but how is that
> different?

Ah, it's not really different.  What I really was trying to understand
is if there are post-boot conversions that preserve data.  I asked about
shared=>private because there are known pre-boot conversions, e.g.
populating the initial guest image, but AFAIK there are no use cases for
post-boot conversions, which might be more needy in terms of performance.

> > > We currently don't allow the host punching holes in the guest IPA space.
> > 
> > The hole doesn't get punched in guest IPA space, it gets punched in
> > the private backing store, which is host PA space.
> 
> Hmm, in a previous message I thought that you mentioned when a hole
> gets punched in the fd KVM will go and unmap the page in the private
> SPTEs, which will cause a fatal error for any subsequent access from
> the guest to the corresponding IPA?

Oooh, that was in the context of TDX.  Mixing VMX and arm64
terminology... TDX has two separate stage-2 roots, one for private IPAs
and one for shared IPAs.  The guest selects private/shared by toggling a
bit stolen from the guest IPA space.  Upon conversion, KVM will remove
from one stage-2 tree and insert into the other.

But even then, subsequent accesses to the wrong IPA won't be fatal, as KVM will
treat them as implicit conversions.  I wish they could be fatal, but that's not
"allowed" given the guest/host contract dictated by the TDX specs.

> If that's correct, I meant that we currently don't support that - the
> host can't unmap anything from the guest stage-2, it can only tear it
> down entirely. But again, I'm not too worried about that, we could
> certainly implement that part without too many issues.

I believe for the pKVM case it wouldn't be unmapping, it would be a PFN change.

> > > Once it has donated a page to a guest, it can't have it back until the
> > > guest has been entirely torn down (at which point all of memory is
> > > poisoned by the hypervisor obviously).
> > 
> > The guest doesn't have to know that it was handed back a different
> > page.  It will require defining the semantics to state that the
> > trusted hypervisor will clear that page on conversion, but IMO the
> > trusted hypervisor should be doing that anyways.  IMO, forcing on the
> > guest to correctly zero pages on conversion is unnecessarily risky
> > because converting private=>shared and preserving the contents should
> > be a very, very rare scenario, i.e. it's just one more thing for the
> > guest to get wrong.
> 
> I'm not sure I agree. The guest is going to communicate with an
> untrusted entity via that shared page, so it better be careful. Guest
> hardening in general is a major topic, and of all problems, zeroing the
> page before sharing is probably one of the simplest to solve.

Yes, for private=>shared you're correct, the guest needs to be paranoid as
there are no guarantees as to what data may be in the shared page.

I was thinking more in the context of shared=>private conversions, e.g.
the guest is done sharing a page and wants it back.  In that case,
forcing the guest to zero the private page upon re-acceptance is dicey.
Hmm, but if the guest needs to explicitly re-accept the page, then
putting the onus on the guest to zero the page isn't a big deal.  The
pKVM contract would just need to make it clear that the guest cannot
make any assumptions about the state of private data

Oh, now I remember why I'm biased toward the trusted entity doing the work.
IIRC, thanks to TDX's lovely memory poisoning and cache aliasing behavior, the
guest can't be trusted to properly initialize private memory with the guest key,
i.e. the guest could induce a #MC and crash the host.

Anywho, I agree that for performance reasons, requiring the guest to
zero private pages is preferable so long as the guest must explicitly
accept/initiate conversions.

> Also, note that in pKVM all the hypervisor code at EL2 runs with
> preemption disabled, which is a strict constraint. As such one of the
> main goals is the spend as little time as possible in that context.
> We're trying hard to keep the amount of zeroing/memcpy-ing to an
> absolute minimum. And that's especially true as we introduce support for
> huge pages. So, we'll take every opportunity we get to have the guest
> or the host do that work.

FWIW, TDX has the exact same constraints (they're actually worse as the
trusted entity runs with _all_ interrupts blocked).  And yeah, it needs
to be careful when dealing with huge pages, e.g. many flows force the
guest/host to do 512 * 4kb operations.

Re: [PATCH 1/3] softfloat: Fix declaration of partsN_compare

2022-04-01 Thread Peter Maydell
On Fri, 1 Apr 2022 at 19:08, Richard Henderson
 wrote:
>
> On 4/1/22 07:22, Richard Henderson wrote:
> > The declaration used 'int', while the definition used 'FloatRelation'.
> > This should have resulted in a compiler error, but mysteriously didn't.
>
> Bah, of course there's no error -- this is C not C++.
>
> The enumeration has values -1 ... 2, which means that the enumeration is 
> compatible with
> an implementation defined signed integer type, which for our set of hosts 
> will be 'int'.
> So, no conflict.

The types are compatible, but it's weird that the compiler doesn't
warn that the prototype and definition are different types: it
seems like the kind of "technically valid but usually a bug" that
a warning would be nice for.

-- PMM



Re: [PATCH 1/3] softfloat: Fix declaration of partsN_compare

2022-04-01 Thread Richard Henderson

On 4/1/22 07:22, Richard Henderson wrote:

The declaration used 'int', while the definition used 'FloatRelation'.
This should have resulted in a compiler error, but mysteriously didn't.


Bah, of course there's no error -- this is C not C++.

The enumeration has values -1 ... 2, which means that the enumeration is compatible with 
an implementation defined signed integer type, which for our set of hosts will be 'int'. 
So, no conflict.
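
(A self-contained illustration of the point: this translation unit is
accepted silently by gcc and clang, because the enum is compatible
with int in C, unlike in C++.)

    typedef enum {
        float_relation_less      = -1,
        float_relation_equal     =  0,
        float_relation_greater   =  1,
        float_relation_unordered =  2,
    } FloatRelation;

    int parts_compare(int which);           /* declaration uses int */

    FloatRelation parts_compare(int which)  /* definition uses the enum */
    {
        /* Compatible types in C, so no diagnostic is required. */
        return which ? float_relation_greater : float_relation_equal;
    }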


I'll edit the commit message.


r~



Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-01 Thread Quentin Perret
On Friday 01 Apr 2022 at 17:14:21 (+), Sean Christopherson wrote:
> On Fri, Apr 01, 2022, Quentin Perret wrote:
> > The typical flow is as follows:
> > 
> >  - the host asks the hypervisor to run a guest;
> > 
> >  - the hypervisor does the context switch, which includes switching
> >stage-2 page-tables;
> > 
> >  - initially the guest has an empty stage-2 (we don't require
> >pre-faulting everything), which means it'll immediately fault;
> > 
> >  - the hypervisor switches back to host context to handle the guest
> >fault;
> > 
> >  - the host handler finds the corresponding memslot and does the
> >ipa->hva conversion. In our current implementation it uses a longterm
> >GUP pin on the corresponding page;
> > 
> >  - once it has a page, the host handler issues a hypercall to donate the
> >page to the guest;
> > 
> >  - the hypervisor does a bunch of checks to make sure the host owns the
> >page, and if all is fine it will unmap it from the host stage-2 and
> >map it in the guest stage-2, and do some bookkeeping as it needs to
> >track page ownership, etc;
> > 
> >  - the guest can then proceed to run, and possibly faults in many more
> >pages;
> > 
> >  - when it wants to, the guest can then issue a hypercall to share a
> >page back with the host;
> > 
> >  - the hypervisor checks the request, maps the page back in the host
> >stage-2, does more bookkeeping and returns back to the host to notify
> >it of the share;
> > 
> >  - the host kernel at that point can exit back to userspace to relay
> >that information to the VMM;
> > 
> >  - rinse and repeat.
> 
> I assume there is a scenario where a page can be converted from
> shared=>private?
> If so, is there a use case where that happens post-boot _and_ the
> contents of the page are preserved?

I think most of our use-cases are private=>shared, but how is that
different?

> > We currently don't allow the host punching holes in the guest IPA space.
> 
> The hole doesn't get punched in guest IPA space, it gets punched in
> the private backing store, which is host PA space.

Hmm, in a previous message I thought that you mentioned when a hole
gets punched in the fd KVM will go and unmap the page in the private
SPTEs, which will cause a fatal error for any subsequent access from the
guest to the corresponding IPA?

If that's correct, I meant that we currently don't support that - the
host can't unmap anything from the guest stage-2, it can only tear it
down entirely. But again, I'm not too worried about that, we could
certainly implement that part without too many issues.

> > Once it has donated a page to a guest, it can't have it back until the
> > guest has been entirely torn down (at which point all of memory is
> > poisoned by the hypervisor obviously).
> 
> The guest doesn't have to know that it was handed back a different page.  It 
> will
> require defining the semantics to state that the trusted hypervisor will clear
> that page on conversion, but IMO the trusted hypervisor should be doing that
> anyways.  IMO, forcing on the guest to correctly zero pages on conversion is
> unnecessarily risky because converting private=>shared and preserving the 
> contents
> should be a very, very rare scenario, i.e. it's just one more thing for the 
> guest
> to get wrong.

I'm not sure I agree. The guest is going to communicate with an
untrusted entity via that shared page, so it better be careful. Guest
hardening in general is a major topic, and of all problems, zeroing the
page before sharing is probably one of the simplest to solve.

Also, note that in pKVM all the hypervisor code at EL2 runs with
preemption disabled, which is a strict constraint. As such one of the
main goals is the spend as little time as possible in that context.
We're trying hard to keep the amount of zeroing/memcpy-ing to an
absolute minimum. And that's especially true as we introduce support for
huge pages. So, we'll take every opportunity we get to have the guest
or the host do that work.



Re: [PATCH] accel/tcg: Remove ATOMIC_MMU_IDX

2022-04-01 Thread Peter Maydell
On Fri, 1 Apr 2022 at 18:11, Richard Henderson
 wrote:
>
> The last use of this macro was removed in f3e182b10013
> ("accel/tcg: Push trace info building into atomic_common.c.inc")
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [qemu.qmp PATCH 09/13] [FIXME] move PACKAGE.rst to README.rst and update

2022-04-01 Thread John Snow
On Fri, Apr 1, 2022, 12:40 PM Kashyap Chamarthy  wrote:

> On Wed, Mar 30, 2022 at 02:24:20PM -0400, John Snow wrote:
> > The README here will reflect both what is shown on GitLab and on the
> > PyPI landing page. Update it accordingly, and freshen it up.
> >
> > FIXME: Update URLs when pushing to the production repo.
> >
> > Suggested-by: Kashyap Chamarthy 
> > Signed-off-by: John Snow 
> > ---
>
> Hi,
>
> This version reads good to me. :)
>
> [...]
>
> > +
> > +Who isn't this library for?
> > +---------------------------
> > +
> > +It is not designed for anyone looking for a turn-key solution for VM
> > +management. QEMU is a low-level component that resembles a particularly
> > +impressive Swiss Army knife. This library does not manage that
> > +complexity and is largely "VM-ignorant". It's not a replacement for
> > +projects like `libvirt `_, `virt-manager
> > +`_, `GNOME Boxes
> > +`_, etc.
> > +
> > +
> > +Installing
> > +----------
> > +
> > +This package can be installed from PyPI with pip: ``> pip3 install
> > +qemu.qmp``.
> >
> > +
> > +Usage
> > +-----
> > +
> > +Launch QEMU with a monitor, e.g.::
> > +
> > +  > qemu-system-x86_64 -qmp unix:qmp.sock,server=on,wait=off
> > +
> > +
> > +Then, at its simplest, script-style usage looks like this::
> > +
> > +  import asyncio
> > +  from qemu.qmp import QMPClient
> > +
> > +  async def main():
> > +  qmp = QMPClient('my-vm-nickname')
> > +  await qmp.connect('qmp.sock')
> > +
> > +  res = await qmp.execute('query-status')
> > +  print(f"VM status: {res['status']}")
> > +
> > +  await qmp.disconnect()
> > +
> > +  asyncio.run(main())
>
> Tested the example; this works!
>
> > +The above script will connect to the UNIX socket located at
> > +``qmp.sock``, query the VM's runstate, then print it out
> > +to the terminal::
> > +
> > +  > python3 example.py
> > +  VM status: running
> > +
> > +
> > +For more complex usages, especially those that take full advantage of
> > +monitoring asynchronous events, refer to the `online documentation
> > +`_ or type ``import qemu.qmp;
> > +help(qemu.qmp)`` in your Python terminal of choice.
> > +
> > +
> > +Contributing
> > +------------
> > +
> > +Contributions are quite welcome! Please file bugs using the `GitLab
> > +issue tracker `_. This
> > +project will accept GitLab merge requests, but due to the close
> > +association with the QEMU project, there are some additional guidelines:
> > +
> > +1. Please use the "Signed-off-by" tag in your commit messages. See
> > +   https://wiki.linuxfoundation.org/dco for more information on this
> > +   requirement.
> > +
> > +2. This repository won't squash merge requests into a single commit on
> > +   pull; each commit should seek to be self-contained (within reason).
> > +
> > +3. Owing to the above, each commit sent as part of a merge request
> > +   should not introduce any temporary regressions, even if fixed later
> > +   in the same merge request. This is done to preserve bisectability.
> > +
> > +4. Please associate every merge request with at least one `GitLab issue
> > +   `_. This helps with
> > +   generating Changelog text and staying organized. Thank you 🙇
>
> /me didn't miss the Japanese bow.
>
> > +Developing
> > +^^^^^^^^^^
> > +
> > +Optional packages necessary for running code quality analysis for this
> > +package can be installed with the optional dependency group "devel":
> > +``pip install qemu.qmp[devel]``.
> > +
> > +``make develop`` can be used to install this package in editable mode
> > +(to the current environment) *and* bring in testing dependencies in one
> > +command.
> > +
> > +``make check`` can be used to run the available tests. Consult ``make
> > +help`` for other targets and tests that make sense for different
> > +occasions.
> > +
> > +Before submitting a pull request, consider running ``make check-tox &&
> > +make check-pipenv`` locally to spot any issues that will cause the CI to
> > +fail. These checks use their own virtual environments and won't pollute
> > +your working space.
>
> Nit: Consider hyper-linking "virtual environments" to:
> https://docs.python.org/3/library/venv.html#module-venv
>
> (I realize, within context people will recognize the term "virtual
> environment" is not a virtual guest environment, but a Python venv.  I
> have a mild preference for being explicit here.)
>

OK, it's tough: there's a dual audience in mind here: people who are
familiar with Python development, and QEMU devs who are not!

A link would suffice, but how to install, develop, and use python packages
can get a little complex. Longer than a paragraph, anyway.

Hard to know which complexities to omit and which to front-load.

¯\_(ツ)_/¯


> FWIW:
>
> Reviewed-by: Kashyap Chamarthy 


Thanks for your review prior to sending, and

Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-01 Thread Sean Christopherson
On Fri, Apr 01, 2022, Quentin Perret wrote:
> The typical flow is as follows:
> 
>  - the host asks the hypervisor to run a guest;
> 
>  - the hypervisor does the context switch, which includes switching
>stage-2 page-tables;
> 
>  - initially the guest has an empty stage-2 (we don't require
>pre-faulting everything), which means it'll immediately fault;
> 
>  - the hypervisor switches back to host context to handle the guest
>fault;
> 
>  - the host handler finds the corresponding memslot and does the
>ipa->hva conversion. In our current implementation it uses a longterm
>GUP pin on the corresponding page;
> 
>  - once it has a page, the host handler issues a hypercall to donate the
>page to the guest;
> 
>  - the hypervisor does a bunch of checks to make sure the host owns the
>page, and if all is fine it will unmap it from the host stage-2 and
>map it in the guest stage-2, and do some bookkeeping as it needs to
>track page ownership, etc;
> 
>  - the guest can then proceed to run, and possibly faults in many more
>pages;
> 
>  - when it wants to, the guest can then issue a hypercall to share a
>page back with the host;
> 
>  - the hypervisor checks the request, maps the page back in the host
>stage-2, does more bookkeeping and returns back to the host to notify
>it of the share;
> 
>  - the host kernel at that point can exit back to userspace to relay
>that information to the VMM;
> 
>  - rinse and repeat.

I assume there is a scenario where a page can be converted from shared=>private?
If so, is there a use case where that happens post-boot _and_ the contents of 
the
page are preserved?

> We currently don't allow the host punching holes in the guest IPA space.

The hole doesn't get punched in guest IPA space, it gets punched in the private
backing store, which is host PA space.

> Once it has donated a page to a guest, it can't have it back until the
> guest has been entirely torn down (at which point all of memory is
> poisoned by the hypervisor obviously).

The guest doesn't have to know that it was handed back a different page.
It will require defining the semantics to state that the trusted
hypervisor will clear that page on conversion, but IMO the trusted
hypervisor should be doing that anyways.  IMO, forcing on the guest to
correctly zero pages on conversion is unnecessarily risky because
converting private=>shared and preserving the contents should be a very,
very rare scenario, i.e. it's just one more thing for the guest to get
wrong.

If there is a use case where the page contents need to be preserved,
then that can and should be an explicit request from the guest, and can
be handled through export/import style functions.  Export/import would
be slow-ish due to memcpy(), which is why I asked if there's a need to
do this specific action frequently (or at all).



[PATCH] accel/tcg: Remove ATOMIC_MMU_IDX

2022-04-01 Thread Richard Henderson
The last use of this macro was removed in f3e182b10013
("accel/tcg: Push trace info building into atomic_common.c.inc")

Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c| 1 -
 accel/tcg/user-exec.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 2035b2ac0a..dd45e0467b 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2552,7 +2552,6 @@ void cpu_stq_le_mmu(CPUArchState *env, target_ulong addr, 
uint64_t val,
 glue(glue(glue(cpu_atomic_ ## X, SUFFIX), END), _mmu)
 
 #define ATOMIC_MMU_CLEANUP
-#define ATOMIC_MMU_IDX   get_mmuidx(oi)
 
 #include "atomic_common.c.inc"
 
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 8edf0bbaa1..ac57324d4f 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -506,7 +506,6 @@ static void *atomic_mmu_lookup(CPUArchState *env, 
target_ulong addr,
 #define ATOMIC_NAME(X) \
 glue(glue(glue(cpu_atomic_ ## X, SUFFIX), END), _mmu)
 #define ATOMIC_MMU_CLEANUP do { clear_helper_retaddr(); } while (0)
-#define ATOMIC_MMU_IDX MMU_USER_IDX
 
 #define DATA_SIZE 1
 #include "atomic_template.h"
-- 
2.25.1




[PATCH] accel/tcg: Assert mmu_idx in range before use in cputlb

2022-04-01 Thread Richard Henderson
Coverity reports out-of-bound accesses within cputlb.c.
This should be a false positive due to how the index is
decoded from MemOpIdx.  To be fair, nothing is checking
the correct bounds during encoding either.

Assert index in range before use, both to catch user errors
and to pacify static analysis.

Fixes: Coverity CID 1487120, 1487127, 1487170, 1487196, 1487215, 1487238
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 40 +++-
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index dd45e0467b..f90f4312ea 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1761,7 +1761,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, 
target_ulong addr,
MemOpIdx oi, int size, int prot,
uintptr_t retaddr)
 {
-size_t mmu_idx = get_mmuidx(oi);
+uintptr_t mmu_idx = get_mmuidx(oi);
 MemOp mop = get_memop(oi);
 int a_bits = get_alignment_bits(mop);
 uintptr_t index;
@@ -1769,6 +1769,8 @@ static void *atomic_mmu_lookup(CPUArchState *env, 
target_ulong addr,
 target_ulong tlb_addr;
 void *hostaddr;
 
+tcg_debug_assert(mmu_idx < NB_MMU_MODES);
+
 /* Adjust the given return address.  */
 retaddr -= GETPC_ADJ;
 
@@ -1908,18 +1910,20 @@ load_helper(CPUArchState *env, target_ulong addr, 
MemOpIdx oi,
 uintptr_t retaddr, MemOp op, bool code_read,
 FullLoadHelper *full_load)
 {
-uintptr_t mmu_idx = get_mmuidx(oi);
-uintptr_t index = tlb_index(env, mmu_idx, addr);
-CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-target_ulong tlb_addr = code_read ? entry->addr_code : entry->addr_read;
 const size_t tlb_off = code_read ?
 offsetof(CPUTLBEntry, addr_code) : offsetof(CPUTLBEntry, addr_read);
 const MMUAccessType access_type =
 code_read ? MMU_INST_FETCH : MMU_DATA_LOAD;
-unsigned a_bits = get_alignment_bits(get_memop(oi));
+const unsigned a_bits = get_alignment_bits(get_memop(oi));
+const size_t size = memop_size(op);
+uintptr_t mmu_idx = get_mmuidx(oi);
+uintptr_t index;
+CPUTLBEntry *entry;
+target_ulong tlb_addr;
 void *haddr;
 uint64_t res;
-size_t size = memop_size(op);
+
+tcg_debug_assert(mmu_idx < NB_MMU_MODES);
 
 /* Handle CPU specific unaligned behaviour */
 if (addr & ((1 << a_bits) - 1)) {
@@ -1927,6 +1931,10 @@ load_helper(CPUArchState *env, target_ulong addr, 
MemOpIdx oi,
  mmu_idx, retaddr);
 }
 
+index = tlb_index(env, mmu_idx, addr);
+entry = tlb_entry(env, mmu_idx, addr);
+tlb_addr = code_read ? entry->addr_code : entry->addr_read;
+
 /* If the TLB entry is for a different page, reload and try again.  */
 if (!tlb_hit(tlb_addr, addr)) {
 if (!victim_tlb_hit(env, mmu_idx, index, tlb_off,
@@ -2310,14 +2318,16 @@ static inline void QEMU_ALWAYS_INLINE
 store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
  MemOpIdx oi, uintptr_t retaddr, MemOp op)
 {
-uintptr_t mmu_idx = get_mmuidx(oi);
-uintptr_t index = tlb_index(env, mmu_idx, addr);
-CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-target_ulong tlb_addr = tlb_addr_write(entry);
 const size_t tlb_off = offsetof(CPUTLBEntry, addr_write);
-unsigned a_bits = get_alignment_bits(get_memop(oi));
+const unsigned a_bits = get_alignment_bits(get_memop(oi));
+const size_t size = memop_size(op);
+uintptr_t mmu_idx = get_mmuidx(oi);
+uintptr_t index;
+CPUTLBEntry *entry;
+target_ulong tlb_addr;
 void *haddr;
-size_t size = memop_size(op);
+
+tcg_debug_assert(mmu_idx < NB_MMU_MODES);
 
 /* Handle CPU specific unaligned behaviour */
 if (addr & ((1 << a_bits) - 1)) {
@@ -2325,6 +2335,10 @@ store_helper(CPUArchState *env, target_ulong addr, 
uint64_t val,
  mmu_idx, retaddr);
 }
 
+index = tlb_index(env, mmu_idx, addr);
+entry = tlb_entry(env, mmu_idx, addr);
+tlb_addr = tlb_addr_write(entry);
+
 /* If the TLB entry is for a different page, reload and try again.  */
 if (!tlb_hit(tlb_addr, addr)) {
 if (!victim_tlb_hit(env, mmu_idx, index, tlb_off,
-- 
2.25.1




Re: who's maintaining amd_iommu.c these days?

2022-04-01 Thread Wei Huang




On 4/1/22 11:14, Peter Maydell wrote:

On Fri, 1 Apr 2022 at 17:11, Wei Huang  wrote:




On 3/31/22 21:09, Jason Wang wrote:

On Fri, Apr 1, 2022 at 2:30 AM Peter Xu  wrote:


On Thu, Mar 31, 2022 at 05:01:52PM +0100, Peter Maydell wrote:

(4) The claimed bit layout of the event structure doesn't
match up with the one in the spec document I found. This
could be because I found a document for some other bit
of hardware, of course.

Could you elaborate? Are you referring to amdvi_setevent_bits(evt, info,
55, 8)?


https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
was the spec I found through random googling, but the event
structure layouts in chapter 2.5 of that document aren't
at all like the one that amdvi_encode_event() is using.
Maybe that's the wrong spec, as I say.


The spec is up-to-date. But it seems amdvi_setevent_bits(evt, info, 55, 
8) is problematic. Use amdvi_log_illegaldevtab_error() as an example:


info |= AMDVI_EVENT_ILLEGAL_DEVTAB_ENTRY;
amdvi_encode_event(evt, devid, addr, info);
amdvi_log_event(s, evt);

info is encoded with AMDVI_EVENT_ILLEGAL_DEVTAB_ENTRY (1U << 12) and is 
used by amdvi_encode_event() via amdvi_setevent_bits(evt, info, 55, 8) 
into event log bits[63:55]. This should instead be written into 
bits[63:48] to match ILLEGAL_DEV_TABLE_ENTRY event format in Chapter 2.5.2.
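
(Hedged sketch of the suggested fix, with the field width and position
taken from the paragraph above; the helper name is illustrative, not
the amd_iommu.c function:)

    #include <stdint.h>

    /* Write the 16-bit event code/flags into bits[63:48] of the first
     * 64-bit word of the 16-byte event log entry, rather than an
     * 8-bit field at bits[63:55]. */
    static void encode_event_code(uint64_t evt[2], uint16_t info)
    {
        evt[0] &= ~(0xffffULL << 48);
        evt[0] |= (uint64_t)info << 48;
    }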




thanks
-- PMM




Re: [qemu.qmp PATCH 02/13] fork qemu.qmp from qemu.git

2022-04-01 Thread Kashyap Chamarthy
On Wed, Mar 30, 2022 at 02:24:13PM -0400, John Snow wrote:
> Split python/ from qemu.git, using these commands:
> 
> > git subtree split -P python/ -b python-split-v3
> > mkdir ~/src/tmp
> > cd ~/src/tmp
> > git clone --no-local --branch python-split-v3 --single-branch ~/src/qemu
> > cd qemu
> > git filter-repo --path qemu/machine/   \
>   --path qemu/utils/ \
>   --path tests/iotests-mypy.sh   \
>   --path tests/iotests-pylint.sh \
>   --invert-paths
> 
> This commit, however, only performs some minimum cleanup to reflect the
> deletion of the other subpackages. It is not intended to be exhaustive,
> and further edits are made in forthcoming commits.
> 
> These fixes are broken apart into micro-changes to facilitate mailing
> list review subject-by-subject. They *could* be squashed into a single
> larger commit on merge if desired, but due to the nature of the fork,
> bisectability across the fork boundary is going to be challenging
> anyway. It may be better value to just leave these initial commits
> as-is.
> 
> Signed-off-by: John Snow 
> ---
>  .gitignore |  2 +-
>  Makefile   | 16 
>  setup.cfg  | 24 +---
>  setup.py   |  2 +-
>  4 files changed, 11 insertions(+), 33 deletions(-)

The changes here look fine to me (and thanks for making it a "micro
change").  I'll let sharper eyes than mine to give a closer look at the
`git filter-repo` surgery.  Although, that looks fine to me too.

[...]

>  .PHONY: distclean
>  distclean: clean
> - rm -rf qemu.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/
> + rm -rf qemu.qmp.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/
>   rm -f .coverage .coverage.*
>   rm -rf htmlcov/
> diff --git a/setup.cfg b/setup.cfg
> index e877ea5..4ffab73 100644
> --- a/setup.cfg
> +++ b/setup.cfg
> @@ -1,5 +1,5 @@
>  [metadata]
> -name = qemu
> +name = qemu.qmp
>  version = file:VERSION
>  maintainer = QEMU Developer Team

In the spirit of patch 04 ("update maintainer metadata"), do you also
want to update here too? s/QEMU Developer Team/QEMU Project?

FWIW:

Reviewed-by: Kashyap Chamarthy 

[...]

-- 
/kashyap




Re: [qemu.qmp PATCH 07/13] add a couple new trove classifiers

2022-04-01 Thread Beraldo Leal
On Wed, Mar 30, 2022 at 02:24:18PM -0400, John Snow wrote:
> Signed-off-by: John Snow 
> ---
>  setup.cfg | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/setup.cfg b/setup.cfg
> index 776f4f1..44f913d 100644
> --- a/setup.cfg
> +++ b/setup.cfg
> @@ -14,6 +14,7 @@ long_description = file:PACKAGE.rst
>  long_description_content_type = text/x-rst
>  classifiers =
>  Development Status :: 3 - Alpha
> +Intended Audience :: Developers
>  License :: OSI Approved :: GNU General Public License v2 (GPLv2)
>  Natural Language :: English
>  Operating System :: OS Independent
> @@ -23,6 +24,7 @@ classifiers =
>  Programming Language :: Python :: 3.8
>  Programming Language :: Python :: 3.9
>  Programming Language :: Python :: 3.10
> +Topic :: System :: Emulators
>  Typing :: Typed
>  
>  [options]
> -- 
> 2.34.1
> 

Reviewed-by: Beraldo Leal 

--
Beraldo




Re: [qemu.qmp PATCH 05/13] update project description

2022-04-01 Thread Beraldo Leal
On Wed, Mar 30, 2022 at 02:24:16PM -0400, John Snow wrote:
> Signed-off-by: John Snow 
> ---
>  setup.cfg | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/setup.cfg b/setup.cfg
> index f06f944..c21f2ce 100644
> --- a/setup.cfg
> +++ b/setup.cfg
> @@ -7,7 +7,7 @@ maintainer = John Snow
>  maintainer_email = js...@redhat.com
>  url = https://www.qemu.org/
>  download_url = https://www.qemu.org/download/
> -description = QEMU Python Build, Debug and SDK tooling.
> +description = QEMU Monitor Protocol library
>  long_description = file:PACKAGE.rst
>  long_description_content_type = text/x-rst
>  classifiers =

Reviewed-by: Beraldo Leal 

--
Beraldo




Re: who's maintaining amd_iommu.c these days?

2022-04-01 Thread Wei Huang




On 3/31/22 21:09, Jason Wang wrote:

On Fri, Apr 1, 2022 at 2:30 AM Peter Xu  wrote:


On Thu, Mar 31, 2022 at 05:01:52PM +0100, Peter Maydell wrote:

Coverity points out some problems with hw/i386/amd_iommu.c's event
logging code -- specifically, CID 1487115 1487116 1487190 1487200
1487232 1487258 are all the same basic problem, which is that various
functions declare a local "uint64_t evt[4]", populate only some
bits of it and then write it to guest memory, so we end up using
uninitialized host data and leaking it to the guest. I was going to
write a fix for this, but in looking at the code I noticed that
it has more extensive problems:

(1) these functions allocate an array of 4 64-bit values,
but we only copy 2 to the guest, because AMDVI_EVENT_LEN is 16.
Looking at the spec, I think that the length is right and it's
really 4 32-bit values (or 2 64-bit values, if you like).


Yes, the AMD event log entry size is 16 bytes. This should be easy to fix.



(2) There are host-endianness bugs, because we assemble the
event as a set of host-endianness values but then write them
to guest memory as a bag-of-bytes with dma_memory_write()


I doubt people will enable AMD IOMMUs in QEMU on a non-x86 host.
Nevertheless, this can be problematic and should be addressed.




(3) amdvi_encode_event() is throwing away most of its
"addr" argument, because it calls
   amdvi_setevent_bits(evt, addr, 63, 64) apparently intending
that to write 64 bits starting at 63 bits into the packet, but
the amdvi_setevent_bits() function only ever updates one
uint64_t in the array, so it will in fact write bit 63 and
nothing else.


Agreed



(4) The claimed bit layout of the event structure doesn't
match up with the one in the spec document I found. This
could be because I found a document for some other bit
of hardware, of course.
Could you elaborate? Are you referring to amdvi_setevent_bits(evt, info, 
55, 8)?




Anyway, adding all these up, the event logging probably
needs a bit of a restructuring, and that should ideally be
done by somebody who (a) knows the hardware we're emulating
here and (b) is in a position to test things. Any volunteers?


Copying some AMD developers (from where I saw the last patches from)...


Btw, the AMD IOMMU seems not to work for a while (just boot it with
virtio-blk and it still doesn't work).



This may be related to the address space notifier problem I
encountered. We will take care of the issues mentioned in this email
thread.



Thanks



--
Peter Xu







Re: [qemu.qmp PATCH 09/13] [FIXME] move PACKAGE.rst to README.rst and update

2022-04-01 Thread Kashyap Chamarthy
On Wed, Mar 30, 2022 at 02:24:20PM -0400, John Snow wrote:
> The README here will reflect both what is shown on GitLab and on the
> PyPI landing page. Update it accordingly, and freshen it up.
> 
> FIXME: Update URLs when pushing to the production repo.
> 
> Suggested-by: Kashyap Chamarthy 
> Signed-off-by: John Snow 
> ---

Hi,

This version reads good to me. :)

[...]

> +
> +Who isn't this library for?
> +---------------------------
> +
> +It is not designed for anyone looking for a turn-key solution for VM
> +management. QEMU is a low-level component that resembles a particularly
> +impressive Swiss Army knife. This library does not manage that
> +complexity and is largely "VM-ignorant". It's not a replacement for
> +projects like `libvirt `_, `virt-manager
> +`_, `GNOME Boxes
> +`_, etc.
> +
> +
> +Installing
> +----------
> +
> +This package can be installed from PyPI with pip: ``> pip3 install
> +qemu.qmp``.
> 
> +
> +Usage
> +-----
> +
> +Launch QEMU with a monitor, e.g.::
> +
> +  > qemu-system-x86_64 -qmp unix:qmp.sock,server=on,wait=off
> +
> +
> +Then, at its simplest, script-style usage looks like this::
> +
> +  import asyncio
> +  from qemu.qmp import QMPClient
> +
> +  async def main():
> +  qmp = QMPClient('my-vm-nickname')
> +  await qmp.connect('qmp.sock')
> +
> +  res = await qmp.execute('query-status')
> +  print(f"VM status: {res['status']}")
> +
> +  await qmp.disconnect()
> +
> +  asyncio.run(main())

Tested the example; this works!

> +The above script will connect to the UNIX socket located at
> +``qmp.sock``, query the VM's runstate, then print it out
> +to the terminal::
> +
> +  > python3 example.py
> +  VM status: running
> +
> +
> +For more complex usages, especially those that take full advantage of
> +monitoring asynchronous events, refer to the `online documentation
> +`_ or type ``import qemu.qmp;
> +help(qemu.qmp)`` in your Python terminal of choice.
> +
> +
> +Contributing
> +------------
> +
> +Contributions are quite welcome! Please file bugs using the `GitLab
> +issue tracker `_. This
> +project will accept GitLab merge requests, but due to the close
> +association with the QEMU project, there are some additional guidelines:
> +
> +1. Please use the "Signed-off-by" tag in your commit messages. See
> +   https://wiki.linuxfoundation.org/dco for more information on this
> +   requirement.
> +
> +2. This repository won't squash merge requests into a single commit on
> +   pull; each commit should seek to be self-contained (within reason).
> +
> +3. Owing to the above, each commit sent as part of a merge request
> +   should not introduce any temporary regressions, even if fixed later
> +   in the same merge request. This is done to preserve bisectability.
> +
> +4. Please associate every merge request with at least one `GitLab issue
> +   `_. This helps with
> +   generating Changelog text and staying organized. Thank you 🙇

/me didn't miss the Japanese bow.

> +Developing
> +^^^^^^^^^^
> +
> +Optional packages necessary for running code quality analysis for this
> +package can be installed with the optional dependency group "devel":
> +``pip install qemu.qmp[devel]``.
> +
> +``make develop`` can be used to install this package in editable mode
> +(to the current environment) *and* bring in testing dependencies in one
> +command.
> +
> +``make check`` can be used to run the available tests. Consult ``make
> +help`` for other targets and tests that make sense for different
> +occasions.
> +
> +Before submitting a pull request, consider running ``make check-tox &&
> +make check-pipenv`` locally to spot any issues that will cause the CI to
> +fail. These checks use their own virtual environments and won't pollute
> +your working space.

Nit: Consider hyper-linking "virtual environments" to:
https://docs.python.org/3/library/venv.html#module-venv

(I realize, within context people will recognize the term "virtual
environment" is not a virtual guest environment, but a Python venv.  I
have a mild preference for being explicit here.)

FWIW:

Reviewed-by: Kashyap Chamarthy  

[...]
 

-- 
/kashyap




Re: [PULL 0/2] riscv-to-apply queue

2022-04-01 Thread Peter Maydell
On Fri, 1 Apr 2022 at 00:50, Alistair Francis
 wrote:
>
> From: Alistair Francis 
>
> The following changes since commit d5341e09135b871199073572f53bc11ae9b44897:
>
>   Merge tag 'pull-tcg-20220331' of https://gitlab.com/rth7680/qemu into 
> staging (2022-03-31 18:36:08 +0100)
>
> are available in the Git repository at:
>
>   g...@github.com:alistair23/qemu.git tags/pull-riscv-to-apply-20220401
>
> for you to fetch changes up to 8ff8ac63298611c8373b294ec936475b1a33f63f:
>
>   target/riscv: rvv: Add missing early exit condition for whole register 
> load/store (2022-04-01 08:40:55 +1000)
>
> 
> Sixth RISC-V PR for QEMU 7.0
>
> This is a last minute RISC-V PR for 7.0.
>
> It includes a fix to avoid leaking no-translation TLB entries, which
> incorrectly cached uncacheable bare-metal entries. This would break Linux
> boot while single-stepping. As the fix is pretty straightforward (flush
> the cache more often) it's being pulled in for 7.0.
>
> At the same time I have included a RISC-V vector extension fixup patch.
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



Re: who's maintaining amd_iommu.c these days?

2022-04-01 Thread Peter Maydell
On Fri, 1 Apr 2022 at 17:11, Wei Huang  wrote:
>
>
>
> On 3/31/22 21:09, Jason Wang wrote:
> > On Fri, Apr 1, 2022 at 2:30 AM Peter Xu  wrote:
> >>
> >> On Thu, Mar 31, 2022 at 05:01:52PM +0100, Peter Maydell wrote:
> >>> (4) The claimed bit layout of the event structure doesn't
> >>> match up with the one in the spec document I found. This
> >>> could be because I found a document for some other bit
> >>> of hardware, of course.
> Could you elaborate? Are you referring to amdvi_setevent_bits(evt, info,
> 55, 8)?

https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
was the spec I found through random googling, but the event
structure layouts in chapter 2.5 of that document aren't
at all like the one that amdvi_encode_event() is using.
Maybe that's the wrong spec, as I say.

thanks
-- PMM



Re: [PATCH v2 5/7] block/block-copy: block_copy(): add timeout_ns parameter

2022-04-01 Thread Vladimir Sementsov-Ogievskiy

01.04.2022 16:16, Hanna Reitz wrote:

On 01.04.22 11:19, Vladimir Sementsov-Ogievskiy wrote:

Add possibility to limit block_copy() call in time. To be used in the
next commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-copy.c | 26 +++---
  block/copy-before-write.c  |  2 +-
  include/block/block-copy.h |  2 +-
  3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index ec46775ea5..b47cb188dd 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c


[...]


@@ -894,12 +902,16 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t 
start, int64_t bytes,
  .max_workers = BLOCK_COPY_MAX_WORKERS,
  };
-    return block_copy_common(&call_state);
-}
+    ret = qemu_co_timeout(block_copy_async_co_entry, call_state, timeout_ns,
+  g_free);


A direct path for timeout_ns == 0 might still be nice to have.


+    if (ret < 0) {
+    /* Timeout. call_state will be freed by running coroutine. */


Maybe assert(ret == -ETIMEDOUT);?


OK




+    return ret;


If I’m right in understanding how qemu_co_timeout() works, block_copy_common() 
will continue to run here.  Shouldn’t we at least cancel it by setting 
call_state->cancelled to true?


Agree



(Besides this, I think that letting block_copy_common() run in the 
background should be OK.  I’m not sure what the implications are if we do 
cancel the call here, while on-cbw-error is break-guest-write, though.  Should 
be fine, I guess, because block_copy_common() will still correctly keep track 
of what it has successfully copied and what it hasn’t?)


Hmm. I now think that we should at least wait for such cancelled background
requests before block_copy_state_free() in cbw_close(). But in "[PATCH v5 00/45]
Transactional block-graph modifying API" I want to detach children from the CBW
filter before calling .close(). So a possible solution is to wait for all
cancelled requests in .bdrv_co_drain_begin().

Or alternatively, maybe just increase bs->in_flight in the CBW filter for each
cancelled background request, and decrease it when the request finishes? For
this we should add a kind of callback to be called when the timed-out coroutine
entry finishes.
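
(To make that idea concrete, a minimal hypothetical sketch; qemu_co_timeout()
has no completion-callback parameter today, so qemu_co_timeout_cb() below is
an imagined variant, not existing API:)

static void cbw_background_done(void *opaque)
{
    BlockDriverState *bs = opaque;

    bdrv_dec_in_flight(bs);             /* pairs with the inc below */
}

static int coroutine_fn cbw_copy_with_timeout(BlockDriverState *bs,
                                              BlockCopyCallState *call_state,
                                              int64_t timeout_ns)
{
    /* Keep bs "in flight" while an abandoned background call may still
     * run, so draining waits for it before the filter goes away. */
    bdrv_inc_in_flight(bs);

    return qemu_co_timeout_cb(block_copy_async_co_entry, call_state,
                              timeout_ns, g_free,
                              cbw_background_done, bs);
}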




+    }
-static void coroutine_fn block_copy_async_co_entry(void *opaque)
-{
-    block_copy_common(opaque);
+    ret = call_state->ret;
+
+    return ret;


But here we still need to free call_state, right?


  }
  BlockCopyCallState *block_copy_async(BlockCopyState *s,





--
Best regards,
Vladimir



Re: [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5

2022-04-01 Thread Fabiano Rosas
"Aneesh Kumar K.V"  writes:

> David Gibson  writes:
>
>> On Mon, Mar 14, 2022 at 07:10:10PM -0300, Fabiano Rosas wrote:
>>> David Gibson  writes:
>>> 
>>> > On Tue, Mar 08, 2022 at 10:23:59PM -0300, Fabiano Rosas wrote:
>>>
>
> ...
>
>>> To satisfy TCG we could keep a spapr capability as ON and usually the
>>> guest would pass cap-gtse=off when running with KVM. However this
>>> doesn't work because this crash happens precisely because the nested
>>> guest doesn't know that it needs to use cap-rpt-invalidate=on. Another
>>> cap wouldn't help.
>>> 
>>> So I think the only way to have a spapr capability for this is if TCG
>>> always defaults to ON and KVM always defaults to OFF. But then we would
>>> be changing guest visible behaviour depending on host properties.
>>
>> Ok, I'd forgotten we already have cap-rpt-invalidate.  It still
>> defaults to OFF for now, which might help us.
>>
>> What's clear is that we should never disable GTSE if
>> cap-rpt-invalidate is off - qemu should enforce that before even
>> starting the guest if at all possible.
>>
>> What's less clear to me is if we want to enable GTSE by default or
>> not, in the cases where we're able to choose.  Would always disabling
>> GTSE when cap-rpt-invalidate=on be ok?  Or do we want to be able to
>> control GTSE separately.  In that case we might need a second cap, but
>> it would need inverted sense, so e.g. cap-disable-gtse.
>
>
> GTSE and cap-rpt-invalidate can be looked at as independent such that we
> can do GTSE=1 or GTSE=0 with cap-rpt-invalidate=on. But GTSE=0 with
> cap-rpt-invalidate=off is not allowed/possible. GTSE value is what is
> negotiated via CAS so we should let the hypervisor inform the guest whether it
> can do GTSE 0 or 1. The challenge IIUC is that QEMU always assumed GTSE=1,
> which is not true in the case of nested virt where the L1 guest is booted
> with GTSE=0.
>
> With cap-disable-gtse, how would one interpret that? Whether the hypervisor
> has the capability to disable GTSE?

The spapr capability would mean "disable GTSE if KVM allows it", although I'd
prefer using cap-gtse= because it gives us more flexibility if we ever want to
change the default value.

On the KVM side I am testing a KVM_CAP_PPC_GTSE_DISABLE with the
semantics of "whether QEMU is allowed to disable GTSE". It reports the
inverse of MMU_FTR_GTSE. So if L1 runs with GTSE=0, then the capability
returns 1 and therefore QEMU can disable GTSE. If the capability is not
present, then QEMU is not allowed to disable GTSE.

With David's idea of disallowing cap-rpt-invalidate=off,cap-gtse=off we
can simply deny the nested guest command line if it doesn't include
cap-rpt-invalidate=on when KVM L1 reports KVM_CAP_PPC_GTSE_DISABLE. That
way cap-gtse can default to ON to keep TCG working.
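
(For concreteness, a rough sketch of that check; KVM_CAP_PPC_GTSE_DISABLE is
still a prototype and the spapr cap accessor name used here is illustrative:)

    if (kvm_enabled() &&
        kvm_check_extension(kvm_state, KVM_CAP_PPC_GTSE_DISABLE) &&
        !spapr_get_cap(spapr, SPAPR_CAP_RPT_INVALIDATE)) {
        /* L1 runs with GTSE=0: the nested guest must use the new ABI */
        error_setg(errp, "cap-rpt-invalidate=on is required when the host "
                   "disables GTSE");
    }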

On a first look, I think the above works. I'm still running some tests
with different QEMU/kernel versions.





Re: use of uninitialized variable involving visit_type_uint32() and friends

2022-04-01 Thread Paolo Bonzini

On 4/1/22 15:11, Markus Armbruster wrote:

If it can do really serious interprocedural analysis, it _might_ be able
to see through the visitor constructor and know that the "value = *obj"
is not initialized (e.g. "all callers of object_property_set use an
input visitor").  I doubt that honestly, but a man can dream.


I'm wary of arguments based on "a sufficiently smart compiler can"...


Absolutely.


Because it communicates what the caller expects: "I have left this
uninitialized because I expect my "v" argument to be the kind of visitor
that fills it in".  It's this argument that gives me the confidence
needed to shut up Coverity's false positives.

Embedding the visitor type in the signature makes it impossible not to
pass it, unlike e.g. an assertion in every getter or setter.


I think we got two kinds of code calling visitor methods:

1. Code for use with one kind of visitor only

We get to pass a literal argument to the additional parameter you
propose.

2. Code for use with arbitrary visitors (such as qapi-visit*.c)

We need to pass v->type, where @v is the existing visitor argument.
Except we can't: struct Visitor and VisitorType are private, defined
in <qapi/visitor-impl.h>.  Easy enough to work around, but has a distinct
"this design is falling apart" smell, at least to me.


Hmm, maybe that's a feature though.  If we only need v->type in .c files 
for the generated visit_type_* functions, then it's not a huge deal that 
they will have to include <qapi/visitor-impl.h>.  All callers outside 
generated type visitors (which includes for example QMP command 
marshaling), instead, would _have_ to pass visitor type constants and 
make it clear what direction the visit is going.
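
(For concreteness, a hypothetical sketch of such a call site; today's
visit_type_uint32() takes no VisitorType argument, so the extra parameter is
the proposal under discussion, not existing API:)

    uint32_t value = 0;
    Error *local_err = NULL;

    /* The caller now states the direction of the visit explicitly. */
    visit_type_uint32(v, "cpu-index", &value, VISITOR_INPUT, &local_err);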



Note that "intent explicit in every method call" is sufficient, but not
necessary for "intent is locally explicit, which lets us dismiss false
positives with confidence".  We could do "every function that calls
methods".  Like checking a precondition.  We already have
visit_is_input().  We could have visit_is_output().

The sane way to make output intent explicit is of course passing the
thing by value rather than by reference.  To get that, we could generate
even more code.  So, if the amount of code we currently generate isn't
disgusting enough, ...


Yeah, that would be ugly.  Or, we could generate the same code plus some 
static inline wrappers that take a


  struct InputVisitor {
      Visitor dont_use_me_it_hurts;
  };
  struct OutputVisitor {
      Visitor dont_use_me_it_hurts;
  };

That would be zero-cost abstraction at runtime.
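
(A minimal sketch of what one generated wrapper could look like; the names
are illustrative, nothing like this exists in the tree today:)

static inline void visit_type_uint32_input(InputVisitor *iv, const char *name,
                                           uint32_t *obj, Error **errp)
{
    /* The C type system now guarantees an input visitor, at zero runtime
     * cost: the wrapper compiles down to the plain call. */
    visit_type_uint32(&iv->dont_use_me_it_hurts, name, obj, errp);
}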

Paolo



Re: [PATCH v4 10/11] tests/tcg/s390x: Tests for Vector Enhancements Facility 2

2022-04-01 Thread Christian Borntraeger

Am 01.04.22 um 17:02 schrieb David Miller:

vrr is almost a perfect match (it is for this, larger than imm4 would
need to be split).

.long : this would be uglier.
use enough to be filled with nops after?
or use a 32b and a 16b instead; if it's in .text it should make no difference.


I will let Richard or David decide what they prefer.



[PULL 4/6] meson.build: Fix dependency of page-vary-common.c to config-poison.h

2022-04-01 Thread Thomas Huth




Re: [PATCH 2/3] i386: factor out x86_firmware_configure()

2022-04-01 Thread Xiaoyao Li

On 4/1/2022 6:36 PM, Philippe Mathieu-Daudé wrote:

On 1/4/22 07:28, Xiaoyao Li wrote:

On 4/1/2022 1:08 PM, Gerd Hoffmann wrote:

   if (sev_enabled()) {


 ^^^



Can we remove the SEV check ...



+    pc_system_parse_ovmf_flash(ptr, size);
+
+    if (sev_enabled()) {


... because we are still checking SEV here.


Well, the two checks have slightly different purposes.  The first check
will probably become "if (sev || tdx)" soon, 


Not soon for TDX since the hacky pflash interface to load TDVF is 
rejected.


You can still convince us you need a pflash for TDX, and particularly
"a pflash that doesn't behave like pflash". 


I'm fine with "-bios" option to load TDVF. :)


Also, see the comment in
the next patch of this series:

+ * [...] there is no need to register
+ * the firmware as rom to properly re-initialize on reset.
+ * Just go for a straight file load instead.
+ */


Yes, Gerd's this series make it easier for TDX to load TDVF via -bios.


whereas the second will
become "if (sev) { ... } if (tdx) { ... }".

We could remove the first.  pc_system_parse_ovmf_flash() would run
unconditionally then.  Not needed, but should not have any bad side
effects.
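
(Putting the two points together, the end state being discussed would look
roughly like this; tdx_enabled() is a hypothetical predicate here, since the
TDX series is not merged:)

static void x86_firmware_configure(void *ptr, int size)
{
    /* Parsing the OVMF metadata is harmless without SEV or TDX, so the
     * surrounding check can be dropped. */
    pc_system_parse_ovmf_flash(ptr, size);

    if (sev_enabled()) {
        /* SEV-specific firmware setup */
    }
    if (tdx_enabled()) {    /* hypothetical */
        /* TDX-specific firmware setup */
    }
}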


OK, then:
Reviewed-by: Philippe Mathieu-Daudé 







[PULL 5/6] 9p: move P9_XATTR_SIZE_MAX from 9p.h to 9p.c

2022-04-01 Thread Thomas Huth




[PULL 3/6] target/s390x: Fix determination of overflow condition code after subtraction

2022-04-01 Thread Thomas Huth




[PULL 2/6] target/s390x: Fix determination of overflow condition code after addition

2022-04-01 Thread Thomas Huth




[PULL 6/6] trace: fix compilation with lttng-ust >= 2.13

2022-04-01 Thread Thomas Huth
From: Marc-André Lureau 

On Fedora 36, with lttng-ust 2.13.1, compilation fails with:

In file included from trace/trace-ust-all.h:49085,
 from trace/trace-ust-all.c:13:
/usr/include/lttng/tracepoint-event.h:67:10: error: #include expects "FILENAME" or <FILENAME>
   67 | #include LTTNG_UST_TRACEPOINT_INCLUDE
  |  ^~~~

In lttng-ust commit 41858e2b6e8 ("Fix: don't do macro expansion in
tracepoint file name") from 2012, starting from lttng-ust 2.1, the API
was changed to expect TRACEPOINT_INCLUDE to be defined as a string.

In lttng-ust commit d2966b4b0b2 ("Remove TRACEPOINT_INCLUDE_FILE
macro"), in 2021, the compatibility macro was removed.

Use the "new" API from 2012, and bump the version requirement to 2.1 to
fix compilation with >= 2.13.

According to repology, all distributions we support have >= 2.1 (centos
8 has oldest with 2.8.1 afaict)

Signed-off-by: Marc-André Lureau 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20220328084717.367993-2-marcandre.lur...@redhat.com>
Signed-off-by: Thomas Huth 
---
 meson.build  | 4 ++--
 scripts/tracetool/format/ust_events_h.py | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/meson.build b/meson.build
index 04ce33fef1..861de93c4f 100644
--- a/meson.build
+++ b/meson.build
@@ -455,8 +455,8 @@ if 'CONFIG_GIO' in config_host
 endif
 lttng = not_found
 if 'ust' in get_option('trace_backends')
-  lttng = dependency('lttng-ust', required: true, method: 'pkg-config',
- kwargs: static_kwargs)
+  lttng = dependency('lttng-ust', required: true, version: '>= 2.1',
+ method: 'pkg-config', kwargs: static_kwargs)
 endif
 pixman = not_found
 if have_system or have_tools
diff --git a/scripts/tracetool/format/ust_events_h.py 
b/scripts/tracetool/format/ust_events_h.py
index 6ce559f6cc..b99fe6896b 100644
--- a/scripts/tracetool/format/ust_events_h.py
+++ b/scripts/tracetool/format/ust_events_h.py
@@ -29,8 +29,8 @@ def generate(events, backend, group):
 '#undef TRACEPOINT_PROVIDER',
 '#define TRACEPOINT_PROVIDER qemu',
 '',
-'#undef TRACEPOINT_INCLUDE_FILE',
-'#define TRACEPOINT_INCLUDE_FILE ./%s' % include,
+'#undef TRACEPOINT_INCLUDE',
+'#define TRACEPOINT_INCLUDE "./%s"' % include,
 '',
 '#if !defined (TRACE_%s_GENERATED_UST_H) || \\'  % group.upper(),
 ' defined(TRACEPOINT_HEADER_MULTI_READ)',
-- 
2.27.0




[PULL 1/6] misc: Fixes MAINTAINERS's path .github/workflows/lockdown.yml

2022-04-01 Thread Thomas Huth
From: Yonggang Luo 

Signed-off-by: Yonggang Luo 
Message-Id: <20220323080755.156-4-luoyongg...@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Thomas Huth 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index cc364afef7..d8b2601981 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3615,7 +3615,7 @@ M: Thomas Huth 
 R: Wainer dos Santos Moschetta 
 R: Beraldo Leal 
 S: Maintained
-F: .github/lockdown.yml
+F: .github/workflows/lockdown.yml
 F: .gitlab-ci.yml
 F: .gitlab-ci.d/
 F: .travis.yml
-- 
2.27.0




[PULL 0/6] Misc fixes for 7.0

2022-04-01 Thread Thomas Huth
The following changes since commit 9b617b1bb4056e60b39be4c33be20c10928a6a5c:

  Merge tag 'trivial-branch-for-7.0-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-04-01 10:23:27 +0100)

are available in the Git repository at:

  https://gitlab.com/thuth/qemu.git tags/pull-request-2022-04-01

for you to fetch changes up to e32aaa5a19e24233180042f84a0235a209de71cc:

  trace: fix compilation with lttng-ust >= 2.13 (2022-04-01 13:06:07 +0200)


* Fix some compilation issues
* Fix overflow calculation in s390x emulation
* Update location of lockdown.yml in MAINTAINERS file


Bruno Haible (2):
  target/s390x: Fix determination of overflow condition code after addition
  target/s390x: Fix determination of overflow condition code after 
subtraction

Marc-André Lureau (1):
  trace: fix compilation with lttng-ust >= 2.13

Thomas Huth (1):
  meson.build: Fix dependency of page-vary-common.c to config-poison.h

Will Cohen (1):
  9p: move P9_XATTR_SIZE_MAX from 9p.h to 9p.c

Yonggang Luo (1):
  misc: Fixes MAINTAINERS's path .github/workflows/lockdown.yml

 meson.build  |  6 +++---
 hw/9pfs/9p.h | 18 --
 hw/9pfs/9p.c | 28 +++-
 target/s390x/tcg/cc_helper.c |  8 
 MAINTAINERS  |  2 +-
 scripts/tracetool/format/ust_events_h.py |  4 ++--
 6 files changed, 33 insertions(+), 33 deletions(-)




Re: [PATCH v3 0/3] block/dirty-bitmaps: fix and improve bitmap merge

2022-04-01 Thread Hanna Reitz

On 01.04.22 12:08, Vladimir Sementsov-Ogievskiy wrote:

v3: rebase on master, one patch is already merged.

Vladimir Sementsov-Ogievskiy (3):
   block: block_dirty_bitmap_merge(): fix error path
   block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap
   block: simplify handling of try to merge different sized bitmaps

  include/block/block_int-io.h|  2 +-
  include/qemu/hbitmap.h  | 15 ++---
  block/backup.c  |  6 ++
  block/dirty-bitmap.c| 26 ++
  block/monitor/bitmap-qmp-cmds.c | 38 +
  util/hbitmap.c  | 25 ++
  6 files changed, 43 insertions(+), 69 deletions(-)


Reviewed-by: Hanna Reitz 




Re: [PATCH v4 10/11] tests/tcg/s390x: Tests for Vector Enhancements Facility 2

2022-04-01 Thread David Miller
vrr is almost a perfect match (it is for this, larger than imm4 would
need to be split).

.long : this would be uglier.
use enough to be filled with nops after?
or use a 32b and a 16b instead; if it's in .text it should make no difference.


On Fri, Apr 1, 2022 at 2:42 AM Christian Borntraeger
 wrote:
>
>
>
> Am 01.04.22 um 04:15 schrieb David Miller:
> > Hi,
> >
> > There is some issue with instruction sub/alt encodings not matching,
> > but I worked around it easily.
> >
> > I'm dropping the updated patch for the tests in here.
> > I know I should resend the entire patch series as a higher version
> > really, and will do so.
> > I'm hoping someone can tell me if it's ok to use .insn vrr  in place
> > of vri(-d) as it doesn't match vri.
> > [https://sourceware.org/binutils/docs-2.37/as/s390-Formats.html]
> >
> > .insn doesn't deal with sub encodings and there is no good alternative
> > that I know of.
> >
> > example:
> >
> >  /* vri-d as vrr */
> >  asm volatile(".insn vrr, 0xE786, %[v1], %[v2], %[v3], 0, %[I], 
> > 0\n"
> >  : [v1] "=v" (v1->v)
> >  : [v2]  "v" (v2->v)
> >  , [v3]  "v" (v3->v)
> >  , [I]   "i" (I & 7));
> >
> > Patch is attached
>
> Yes, vri sucks and does not work with vrsd. Maybe just use .long which is 
> probably
> better than using a "wrong" format.
> Opinions?



[PULL 6/6] target/arm: Don't use DISAS_NORETURN in STXP !HAVE_CMPXCHG128 codegen

2022-04-01 Thread Peter Maydell
In gen_store_exclusive(), if the host does not have a cmpxchg128
primitive then we generate bad code for STXP for storing two 64-bit
values.  We generate a call to the exit_atomic helper, which never
returns, and set is_jmp to DISAS_NORETURN.  However, this is
forgetting that we have already emitted a brcond that jumps over this
call for the case where we don't hold the exclusive.  The effect is
that we don't generate any code to end the TB for the
exclusive-not-held execution path, which falls into the "exit with
TB_EXIT_REQUESTED" code that gen_tb_end() emits.  This then causes an
assert at runtime when cpu_loop_exec_tb() sees an EXIT_REQUESTED TB
return that wasn't for an interrupt or icount.

In particular, you can hit this case when using the clang sanitizers
and trying to run the xlnx-versal-virt acceptance test in 'make
check-acceptance'.  This bug was masked until commit 848126d11e93ff
("meson: move int128 checks from configure") because we used to set
CONFIG_CMPXCHG128=1 and avoid the buggy codepath, but after that we
do not.

Fix the bug by not setting is_jmp.  The code after the exit_atomic
call up to the fail_label is dead, but TCG is smart enough to
eliminate it.  We do need to set 'tmp' to some valid value, though
(in the same way the exit_atomic-using code in tcg/tcg-op.c does).

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/953
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220331150858.96348-1-peter.mayd...@linaro.org
---
 target/arm/translate-a64.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d1a59fad9c2..9333d7be41a 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2470,7 +2470,12 @@ static void gen_store_exclusive(DisasContext *s, int rd, 
int rt, int rt2,
 } else if (tb_cflags(s->base.tb) & CF_PARALLEL) {
 if (!HAVE_CMPXCHG128) {
 gen_helper_exit_atomic(cpu_env);
-s->base.is_jmp = DISAS_NORETURN;
+/*
+ * Produce a result so we have a well-formed opcode
+ * stream when the following (dead) code uses 'tmp'.
+ * TCG will remove the dead ops for us.
+ */
+tcg_gen_movi_i64(tmp, 0);
 } else if (s->be_data == MO_LE) {
 gen_helper_paired_cmpxchg64_le_parallel(tmp, cpu_env,
 cpu_exclusive_addr,
-- 
2.25.1




[PULL 2/6] target/arm: Check VSTCR.SW when assigning the stage 2 output PA space

2022-04-01 Thread Peter Maydell
From: Idan Horowitz 

As per the AArch64.SS2OutputPASpace() pseudo-code in the ARMv8 ARM, when the
PA space of the IPA is non-secure, the output PA space is secure if and only
if all of the bits VTCR.<NSW, NSA> and VSTCR.<SW, SA> are not set.

Signed-off-by: Idan Horowitz 
Reviewed-by: Richard Henderson 
Message-id: 20220327093427.1548629-2-idan.horow...@gmail.com
Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 3aeaea40683..a65b39625db 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12697,7 +12697,7 @@ bool get_phys_addr(CPUARMState *env, target_ulong 
address,
 } else {
 attrs->secure =
 !((env->cp15.vtcr_el2.raw_tcr & (VTCR_NSA | VTCR_NSW))
-|| (env->cp15.vstcr_el2.raw_tcr & VSTCR_SA));
+|| (env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | 
VSTCR_SW)));
 }
 }
 return 0;
-- 
2.25.1




[PULL 1/6] target/arm: Fix MTE access checks for disabled SEL2

2022-04-01 Thread Peter Maydell
From: Idan Horowitz 

While not mentioned anywhere in the actual specification text, the
HCR_EL2.ATA bit is treated as '1' when EL2 is disabled at the current
security state. This can be observed in the pseudo-code implementation
of AArch64.AllocationTagAccessIsEnabled().

Signed-off-by: Idan Horowitz 
Reviewed-by: Richard Henderson 
Message-id: 20220328173107.311267-1-idan.horow...@gmail.com
Signed-off-by: Peter Maydell 
---
 target/arm/internals.h | 2 +-
 target/arm/helper.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index a34be2e4595..7f696cd36a8 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1094,7 +1094,7 @@ static inline bool 
allocation_tag_access_enabled(CPUARMState *env, int el,
 && !(env->cp15.scr_el3 & SCR_ATA)) {
 return false;
 }
-if (el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+if (el < 2 && arm_is_el2_enabled(env)) {
 uint64_t hcr = arm_hcr_el2_eff(env);
 if (!(hcr & HCR_ATA) && (!(hcr & HCR_E2H) || !(hcr & HCR_TGE))) {
 return false;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 812ca591f4e..3aeaea40683 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7176,7 +7176,7 @@ static CPAccessResult access_mte(CPUARMState *env, const 
ARMCPRegInfo *ri,
 {
 int el = arm_current_el(env);
 
-if (el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+if (el < 2 && arm_is_el2_enabled(env)) {
 uint64_t hcr = arm_hcr_el2_eff(env);
 if (!(hcr & HCR_ATA) && (!(hcr & HCR_E2H) || !(hcr & HCR_TGE))) {
 return CP_ACCESS_TRAP_EL2;
-- 
2.25.1




[PULL 3/6] target/arm: Take VSTCR.SW, VTCR.NSW into account in final stage 2 walk

2022-04-01 Thread Peter Maydell
From: Idan Horowitz 

As per the AArch64.SS2InitialTTWState() pseudo-code in the ARMv8 ARM, the
initial PA space used for stage 2 table walks is assigned based on the SW
and NSW bits of the VSTCR and VTCR registers.
This was already implemented for the recursive stage 2 page table walks
in S1_ptw_translate(), but was missing for the final stage 2 walk.

Signed-off-by: Idan Horowitz 
Reviewed-by: Richard Henderson 
Message-id: 20220327093427.1548629-3-idan.horow...@gmail.com
Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index a65b39625db..6fd5c27340e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12657,6 +12657,16 @@ bool get_phys_addr(CPUARMState *env, target_ulong 
address,
 return ret;
 }
 
+if (arm_is_secure_below_el3(env)) {
+if (attrs->secure) {
+attrs->secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
+} else {
+attrs->secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
+}
+} else {
+assert(!attrs->secure);
+}
+
 s2_mmu_idx = attrs->secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 is_el0 = mmu_idx == ARMMMUIdx_E10_0 || mmu_idx == ARMMMUIdx_SE10_0;
 
-- 
2.25.1




[PULL 5/6] MAINTAINERS: change Fred Konrad's email address

2022-04-01 Thread Peter Maydell
From: Frederic Konrad 

frederic.kon...@adacore.com and kon...@adacore.com will stop working starting
2022-04-01.

Use my personal email instead.

Signed-off-by: Frederic Konrad 
Reviewed-by: Fabien Chouteau 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 1648643217-15811-1-git-send-email-frederic.kon...@adacore.com
Signed-off-by: Peter Maydell 
---
 .mailmap| 3 ++-
 MAINTAINERS | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/.mailmap b/.mailmap
index 09dcd8c2169..2976a675ea5 100644
--- a/.mailmap
+++ b/.mailmap
@@ -56,7 +56,8 @@ Alexander Graf  
 Anthony Liguori  Anthony Liguori 
 Christian Borntraeger  
 Filip Bozuta  
-Frederic Konrad  
+Frederic Konrad  
+Frederic Konrad  
 Greg Kurz  
 Huacai Chen  
 Huacai Chen  
diff --git a/MAINTAINERS b/MAINTAINERS
index cc364afef73..68142340bd1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1533,7 +1533,7 @@ F: include/hw/rtc/sun4v-rtc.h
 
 Leon3
 M: Fabien Chouteau 
-M: KONRAD Frederic 
+M: Frederic Konrad 
 S: Maintained
 F: hw/sparc/leon3.c
 F: hw/*/grlib*
-- 
2.25.1




[PULL 0/6] target-arm queue

2022-04-01 Thread Peter Maydell
Some small arm bug fixes for rc3.

-- PMM

The following changes since commit 9b617b1bb4056e60b39be4c33be20c10928a6a5c:

  Merge tag 'trivial-branch-for-7.0-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-04-01 10:23:27 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20220401

for you to fetch changes up to a5b1e1ab662aa6dc42d5a913080fccbb8bf82e9b:

  target/arm: Don't use DISAS_NORETURN in STXP !HAVE_CMPXCHG128 codegen 
(2022-04-01 15:35:49 +0100)


target-arm queue:
 * target/arm: Fix some bugs in secure EL2 handling
 * target/arm: Fix assert when !HAVE_CMPXCHG128
 * MAINTAINERS: change Fred Konrad's email address


Frederic Konrad (1):
  MAINTAINERS: change Fred Konrad's email address

Idan Horowitz (4):
  target/arm: Fix MTE access checks for disabled SEL2
  target/arm: Check VSTCR.SW when assigning the stage 2 output PA space
  target/arm: Take VSTCR.SW, VTCR.NSW into account in final stage 2 walk
  target/arm: Determine final stage 2 output PA space based on original IPA

Peter Maydell (1):
  target/arm: Don't use DISAS_NORETURN in STXP !HAVE_CMPXCHG128 codegen

 target/arm/internals.h |  2 +-
 target/arm/helper.c| 18 +++---
 target/arm/translate-a64.c |  7 ++-
 .mailmap   |  3 ++-
 MAINTAINERS|  2 +-
 5 files changed, 25 insertions(+), 7 deletions(-)



[PULL 4/6] target/arm: Determine final stage 2 output PA space based on original IPA

2022-04-01 Thread Peter Maydell
From: Idan Horowitz 

As per the AArch64.S2Walk() pseudo-code in the ARMv8 ARM, the final
decision as to the output address's PA space based on the SA/SW/NSA/NSW
bits needs to take the input IPA's PA space into account, and not the
PA space of the result of the stage 2 walk itself.

Signed-off-by: Idan Horowitz 
Reviewed-by: Richard Henderson 
Message-id: 20220327093427.1548629-4-idan.horow...@gmail.com
[PMM: fixed commit message typo]
Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 6fd5c27340e..7d14650615c 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12644,6 +12644,7 @@ bool get_phys_addr(CPUARMState *env, target_ulong 
address,
 hwaddr ipa;
 int s2_prot;
 int ret;
+bool ipa_secure;
 ARMCacheAttrs cacheattrs2 = {};
 ARMMMUIdx s2_mmu_idx;
 bool is_el0;
@@ -12657,14 +12658,15 @@ bool get_phys_addr(CPUARMState *env, target_ulong 
address,
 return ret;
 }
 
+ipa_secure = attrs->secure;
 if (arm_is_secure_below_el3(env)) {
-if (attrs->secure) {
+if (ipa_secure) {
 attrs->secure = !(env->cp15.vstcr_el2.raw_tcr & VSTCR_SW);
 } else {
 attrs->secure = !(env->cp15.vtcr_el2.raw_tcr & VTCR_NSW);
 }
 } else {
-assert(!attrs->secure);
+assert(!ipa_secure);
 }
 
 s2_mmu_idx = attrs->secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
@@ -12701,7 +12703,7 @@ bool get_phys_addr(CPUARMState *env, target_ulong 
address,
 
 /* Check if IPA translates to secure or non-secure PA space. */
 if (arm_is_secure_below_el3(env)) {
-if (attrs->secure) {
+if (ipa_secure) {
 attrs->secure =
 !(env->cp15.vstcr_el2.raw_tcr & (VSTCR_SA | VSTCR_SW));
 } else {
-- 
2.25.1




Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-01 Thread Quentin Perret
On Thursday 31 Mar 2022 at 09:04:56 (-0700), Andy Lutomirski wrote:
> On Wed, Mar 30, 2022, at 10:58 AM, Sean Christopherson wrote:
> > On Wed, Mar 30, 2022, Quentin Perret wrote:
> >> On Wednesday 30 Mar 2022 at 09:58:27 (+0100), Steven Price wrote:
> >> > On 29/03/2022 18:01, Quentin Perret wrote:
> >> > > Is implicit sharing a thing? E.g., if a guest makes a memory access in
> >> > > the shared gpa range at an address that doesn't have a backing memslot,
> >> > > will KVM check whether there is a corresponding private memslot at the
> >> > > right offset with a hole punched and report a KVM_EXIT_MEMORY_ERROR? Or
> >> > > would that just generate an MMIO exit as usual?
> >> > 
> >> > My understanding is that the guest needs some way of tagging whether a
> >> > page is expected to be shared or private. On the architectures I'm aware
> >> > of this is done by effectively stealing a bit from the IPA space and
> >> > pretending it's a flag bit.
> >> 
> >> Right, and that is in fact the main point of divergence we have I think.
> >> While I understand this might be necessary for TDX and the likes, this
> >> makes little sense for pKVM. This would effectively embed into the IPA a
> >> purely software-defined non-architectural property/protocol although we
> >> don't actually need to: we (pKVM) can reasonably expect the guest to
> >> explicitly issue hypercalls to share pages in-place. So I'd be really
> >> keen to avoid baking in assumptions about that model too deep in the
> >> host mm bits if at all possible.
> >
> > There is no assumption about stealing PA bits baked into this API.  Even 
> > within
> > x86 KVM, I consider it a hard requirement that the common flows not assume 
> > the
> > private vs. shared information is communicated through the PA.
> 
> Quentin, I think we might need a clarification.  The API in this patchset 
> indeed has no requirement that a PA bit distinguish between private and 
> shared, but I think it makes at least a weak assumption that *something*, a 
> priori, distinguishes them.  In particular, there are private memslots and 
> shared memslots, so the logical flow of resolving a guest memory access looks 
> like:
> 
> 1. guest accesses a GVA
> 
> 2. read guest paging structures
> 
> 3. determine whether this is a shared or private access
> 
> 4. read host (KVM memslots and anything else, EPT, NPT, RMP, etc) structures 
> accordingly.  In particular, the memslot to reference is different depending 
> on the access type.
> 
> For TDX, this maps on to the fd-based model perfectly: the host-side paging 
> structures for the shared and private slots are completely separate.  For 
> SEV, the structures are shared and KVM will need to figure out what to do in 
> case a private and shared memslot overlap.  Presumably it's sufficient to 
> declare that one of them wins, although actually determining which one is 
> active for a given GPA may involve checking whether the backing store for a 
> given page actually exists.
> 
> But I don't understand pKVM well enough to understand how it fits in.  
> Quentin, how is the shared vs private mode of a memory access determined?  
> How do the paging structures work?  Can a guest switch between shared and 
> private by issuing a hypercall without changing any guest-side paging 
> structures or anything else?

My apologies, I've indeed shared very little details about how pKVM
works. We'll be posting patches upstream really soon that will hopefully
help with this, but in the meantime, here is the idea.

pKVM is designed around MMU-based protection as opposed to encryption as
is the case for many confidential computing solutions. It's probably
worth mentioning that, although it targets arm64, pKVM is distinct from
the Arm CC-A stuff and requires no fancy hardware extensions -- it is
applicable all the way back to Arm v8.0 which makes it an interesting
solution for mobile.

Another particularity of the pKVM approach is that the code of the
hypervisor itself lives in the kernel source tree (see
arch/arm64/kvm/hyp/nvhe/). The hypervisor is built with the rest of the
kernel but as a self-sufficient object, and ends up in its own dedicated
ELF section (.hyp.*) in the kernel image. The main requirement for pKVM
(and KVM on arm64 in general) is to have the bootloader enter the kernel
at the hypervisor exception level (a.k.a EL2). The boot procedure is a
bit involved, but eventually the hypervisor object is installed at EL2,
and the kernel is deprivileged to EL1 and proceeds to boot. From that
point on the hypervisor no longer trusts the kernel and will enable the
stage-2 MMU to impose access-control restrictions to all memory accesses
from the host.

All that to say: the pKVM approach offers a great deal of flexibility
when it comes to hypervisor behaviour. We have control over the
hypervisor code and can change it as we see fit. Since both the
hypervisor and the host kernel are part of the same image, the ABI
between them is very much *not* stable and

[PATCH v5 3/3] avocado/vnc: add test_change_listen

2022-04-01 Thread Vladimir Sementsov-Ogievskiy
Add simple test-case for new display-update qmp command.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Daniel P. Berrangé 
---
 tests/avocado/vnc.py | 63 
 1 file changed, 63 insertions(+)

diff --git a/tests/avocado/vnc.py b/tests/avocado/vnc.py
index 096432988f..187fd3febc 100644
--- a/tests/avocado/vnc.py
+++ b/tests/avocado/vnc.py
@@ -8,9 +8,48 @@
 # This work is licensed under the terms of the GNU GPL, version 2 or
 # later.  See the COPYING file in the top-level directory.
 
+import socket
+from typing import List
+
 from avocado_qemu import QemuSystemTest
 
 
+VNC_ADDR = '127.0.0.1'
+VNC_PORT_START = 32768
+VNC_PORT_END = VNC_PORT_START + 1024
+
+
+def check_bind(port: int) -> bool:
+with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
+try:
+sock.bind((VNC_ADDR, port))
+except OSError:
+return False
+
+return True
+
+
+def check_connect(port: int) -> bool:
+with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
+try:
+sock.connect((VNC_ADDR, port))
+except ConnectionRefusedError:
+return False
+
+return True
+
+
+def find_free_ports(count: int) -> List[int]:
+result = []
+for port in range(VNC_PORT_START, VNC_PORT_END):
+if check_bind(port):
+result.append(port)
+if len(result) >= count:
+break
+assert len(result) == count
+return result
+
+
 class Vnc(QemuSystemTest):
 """
 :avocado: tags=vnc,quick
@@ -51,3 +90,27 @@ def test_change_password(self):
 set_password_response = self.vm.qmp('change-vnc-password',
 password='new_password')
 self.assertEqual(set_password_response['return'], {})
+
+def test_change_listen(self):
+a, b, c = find_free_ports(3)
+self.assertFalse(check_connect(a))
+self.assertFalse(check_connect(b))
+self.assertFalse(check_connect(c))
+
+self.vm.add_args('-nodefaults', '-S', '-vnc', f'{VNC_ADDR}:{a - 5900}')
+self.vm.launch()
+self.assertEqual(self.vm.qmp('query-vnc')['return']['service'], str(a))
+self.assertTrue(check_connect(a))
+self.assertFalse(check_connect(b))
+self.assertFalse(check_connect(c))
+
+res = self.vm.qmp('display-update', type='vnc',
+  addresses=[{'type': 'inet', 'host': VNC_ADDR,
+  'port': str(b)},
+ {'type': 'inet', 'host': VNC_ADDR,
+  'port': str(c)}])
+self.assertEqual(res['return'], {})
+self.assertEqual(self.vm.qmp('query-vnc')['return']['service'], str(b))
+self.assertFalse(check_connect(a))
+self.assertTrue(check_connect(b))
+self.assertTrue(check_connect(c))
-- 
2.35.1




[PATCH v5 0/3] qapi/ui: change vnc addresses

2022-04-01 Thread Vladimir Sementsov-Ogievskiy
v5: tiny tweaks, add r-bs

Recently our customer requested the possibility to change the VNC listen port
dynamically.

Happily, in RHEL7-based QEMU we already have this possibility: through the
deprecated "change" qmp command.

But since 6.0 the "change" qmp command has been removed, with a recommendation
to use change-vnc-password or blockdev-change-medium instead. Of course,
neither of them allows changing the VNC listen port.

So, let's reimplement this functionality.

Vladimir Sementsov-Ogievskiy (3):
  ui/vnc: refactor arrays of addresses to SocketAddressList
  qapi/ui: add 'display-update' command for changing listen address
  avocado/vnc: add test_change_listen

 docs/about/removed-features.rst |   3 +-
 qapi/ui.json|  65 ++
 include/ui/console.h|   1 +
 monitor/qmp-cmds.c  |  15 
 ui/vnc.c| 152 
 tests/avocado/vnc.py|  63 +
 6 files changed, 220 insertions(+), 79 deletions(-)

-- 
2.35.1




[PATCH v5 2/3] qapi/ui: add 'display-update' command for changing listen address

2022-04-01 Thread Vladimir Sementsov-Ogievskiy
Add the possibility to change the addresses where the VNC server listens for
new connections. Prior to 6.0 this functionality was available through the
'change' qmp command, which was deleted.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Daniel P. Berrangé 
---
 docs/about/removed-features.rst |  3 +-
 qapi/ui.json| 65 +
 include/ui/console.h|  1 +
 monitor/qmp-cmds.c  | 15 
 ui/vnc.c| 23 
 5 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 4b831ea291..b367418ca7 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -355,7 +355,8 @@ documentation of ``query-hotpluggable-cpus`` for additional 
details.
 ``change`` (removed in 6.0)
 '''
 
-Use ``blockdev-change-medium`` or ``change-vnc-password`` instead.
+Use ``blockdev-change-medium`` or ``change-vnc-password`` or
+``display-update`` instead.
 
 ``query-events`` (removed in 6.0)
 '
diff --git a/qapi/ui.json b/qapi/ui.json
index a810ed680c..d3bf9e72de 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1447,3 +1447,68 @@
 { 'command': 'display-reload',
   'data': 'DisplayReloadOptions',
   'boxed' : true }
+
+##
+# @DisplayUpdateType:
+#
+# Available DisplayUpdate types.
+#
+# @vnc: VNC display
+#
+# Since: 7.1
+#
+##
+{ 'enum': 'DisplayUpdateType',
+  'data': ['vnc'] }
+
+##
+# @DisplayUpdateOptionsVNC:
+#
+# Specify the VNC reload options.
+#
+# @addresses: If specified, change set of addresses
+# to listen for connections. Addresses configured
+# for websockets are not touched.
+#
+# Since: 7.1
+#
+##
+{ 'struct': 'DisplayUpdateOptionsVNC',
+  'data': { '*addresses': ['SocketAddress'] } }
+
+##
+# @DisplayUpdateOptions:
+#
+# Options of the display configuration reload.
+#
+# @type: Specify the display type.
+#
+# Since: 7.1
+#
+##
+{ 'union': 'DisplayUpdateOptions',
+  'base': {'type': 'DisplayUpdateType'},
+  'discriminator': 'type',
+  'data': { 'vnc': 'DisplayUpdateOptionsVNC' } }
+
+##
+# @display-update:
+#
+# Update display configuration.
+#
+# Returns: Nothing on success.
+#
+# Since: 7.1
+#
+# Example:
+#
+# -> { "execute": "display-update",
+#  "arguments": { "type": "vnc", "addresses":
+# [ { "type": "inet", "host": "0.0.0.0",
+# "port": "5901" } ] } }
+# <- { "return": {} }
+#
+##
+{ 'command': 'display-update',
+  'data': 'DisplayUpdateOptions',
+  'boxed' : true }
diff --git a/include/ui/console.h b/include/ui/console.h
index 0f84861933..c44b28a972 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -518,6 +518,7 @@ int vnc_display_pw_expire(const char *id, time_t expires);
 void vnc_parse(const char *str);
 int vnc_init_func(void *opaque, QemuOpts *opts, Error **errp);
 bool vnc_display_reload_certs(const char *id,  Error **errp);
+bool vnc_display_update(DisplayUpdateOptionsVNC *arg, Error **errp);
 
 /* input.c */
 int index_from_key(const char *key, size_t key_length);
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 0b04855ce8..22aa91cc8d 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -347,6 +347,21 @@ void qmp_display_reload(DisplayReloadOptions *arg, Error 
**errp)
 }
 }
 
+void qmp_display_update(DisplayUpdateOptions *arg, Error **errp)
+{
+switch (arg->type) {
+case DISPLAY_UPDATE_TYPE_VNC:
+#ifdef CONFIG_VNC
+vnc_display_update(&arg->u.vnc, errp);
+#else
+error_setg(errp, "vnc is invalid, missing 'CONFIG_VNC'");
+#endif
+break;
+default:
+abort();
+}
+}
+
 static int qmp_x_query_rdma_foreach(Object *obj, void *opaque)
 {
 RdmaProvider *rdma;
diff --git a/ui/vnc.c b/ui/vnc.c
index 57cbf18ce6..ead85d1794 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3981,6 +3981,29 @@ static int vnc_display_listen(VncDisplay *vd,
 return 0;
 }
 
+bool vnc_display_update(DisplayUpdateOptionsVNC *arg, Error **errp)
+{
+VncDisplay *vd = vnc_display_find(NULL);
+
+if (!vd) {
+error_setg(errp, "Can not find vnc display");
+return false;
+}
+
+if (arg->has_addresses) {
+if (vd->listener) {
+qio_net_listener_disconnect(vd->listener);
+object_unref(OBJECT(vd->listener));
+vd->listener = NULL;
+}
+
+if (vnc_display_listen(vd, arg->addresses, NULL, errp) < 0) {
+return false;
+}
+}
+
+return true;
+}
 
 void vnc_display_open(const char *id, Error **errp)
 {
-- 
2.35.1




[PATCH v5 1/3] ui/vnc: refactor arrays of addresses to SocketAddressList

2022-04-01 Thread Vladimir Sementsov-Ogievskiy
Let's use SocketAddressList instead of dynamic arrays.
Benefits:
 - Automatic cleanup: we don't need a specific freeing function and can drop
   some gotos.
 - Less indirection: no triple asterisk anymore!
 - Prepare for the following commit, which will reuse the new interface of
   vnc_display_listen().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Daniel P. Berrangé 
---
 ui/vnc.c | 129 ++-
 1 file changed, 51 insertions(+), 78 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 310a873c21..57cbf18ce6 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3820,30 +3820,19 @@ static int vnc_display_get_address(const char *addrstr,
 return ret;
 }
 
-static void vnc_free_addresses(SocketAddress ***retsaddr,
-   size_t *retnsaddr)
-{
-size_t i;
-
-for (i = 0; i < *retnsaddr; i++) {
-qapi_free_SocketAddress((*retsaddr)[i]);
-}
-g_free(*retsaddr);
-
-*retsaddr = NULL;
-*retnsaddr = 0;
-}
-
 static int vnc_display_get_addresses(QemuOpts *opts,
  bool reverse,
- SocketAddress ***retsaddr,
- size_t *retnsaddr,
- SocketAddress ***retwsaddr,
- size_t *retnwsaddr,
+ SocketAddressList **saddr_list_ret,
+ SocketAddressList **wsaddr_list_ret,
  Error **errp)
 {
 SocketAddress *saddr = NULL;
 SocketAddress *wsaddr = NULL;
+g_autoptr(SocketAddressList) saddr_list = NULL;
+SocketAddressList **saddr_tail = &saddr_list;
+SocketAddress *single_saddr = NULL;
+g_autoptr(SocketAddressList) wsaddr_list = NULL;
+SocketAddressList **wsaddr_tail = &wsaddr_list;
 QemuOptsIter addriter;
 const char *addr;
 int to = qemu_opt_get_number(opts, "to", 0);
@@ -3852,23 +3841,16 @@ static int vnc_display_get_addresses(QemuOpts *opts,
 bool ipv4 = qemu_opt_get_bool(opts, "ipv4", false);
 bool ipv6 = qemu_opt_get_bool(opts, "ipv6", false);
 int displaynum = -1;
-int ret = -1;
-
-*retsaddr = NULL;
-*retnsaddr = 0;
-*retwsaddr = NULL;
-*retnwsaddr = 0;
 
 addr = qemu_opt_get(opts, "vnc");
 if (addr == NULL || g_str_equal(addr, "none")) {
-ret = 0;
-goto cleanup;
+return 0;
 }
 if (qemu_opt_get(opts, "websocket") &&
 !qcrypto_hash_supports(QCRYPTO_HASH_ALG_SHA1)) {
 error_setg(errp,
"SHA1 hash support is required for websockets");
-goto cleanup;
+return -1;
 }
 
 qemu_opt_iter_init(&addriter, opts, "vnc");
@@ -3879,7 +3861,7 @@ static int vnc_display_get_addresses(QemuOpts *opts,
  ipv4, ipv6,
  &saddr, errp);
 if (rv < 0) {
-goto cleanup;
+return -1;
 }
 /* Historical compat - first listen address can be used
  * to set the default websocket port
@@ -3887,13 +3869,16 @@ static int vnc_display_get_addresses(QemuOpts *opts,
 if (displaynum == -1) {
 displaynum = rv;
 }
-*retsaddr = g_renew(SocketAddress *, *retsaddr, *retnsaddr + 1);
-(*retsaddr)[(*retnsaddr)++] = saddr;
+QAPI_LIST_APPEND(saddr_tail, saddr);
 }
 
-/* If we had multiple primary displays, we don't do defaults
- * for websocket, and require explicit config instead. */
-if (*retnsaddr > 1) {
+if (saddr_list && !saddr_list->next) {
+single_saddr = saddr_list->value;
+} else {
+/*
+ * If we had multiple primary displays, we don't do defaults
+ * for websocket, and require explicit config instead.
+ */
 displaynum = -1;
 }
 
@@ -3903,57 +3888,50 @@ static int vnc_display_get_addresses(QemuOpts *opts,
 has_ipv4, has_ipv6,
 ipv4, ipv6,
 &wsaddr, errp) < 0) {
-goto cleanup;
+return -1;
 }
 
 /* Historical compat - if only a single listen address was
  * provided, then this is used to set the default listen
  * address for websocket too
  */
-if (*retnsaddr == 1 &&
-(*retsaddr)[0]->type == SOCKET_ADDRESS_TYPE_INET &&
+if (single_saddr &&
+single_saddr->type == SOCKET_ADDRESS_TYPE_INET &&
 wsaddr->type == SOCKET_ADDRESS_TYPE_INET &&
 g_str_equal(wsaddr->u.inet.host, "") &&
-!g_str_equal((*retsaddr)[0]->u.inet.host, "")) {
+!g_str_equal(single_saddr->u.inet.host, "")) {
 g_free(wsaddr->u.inet.host);
-wsaddr->u.inet.host = g_strdup((*retsaddr)[0]->u.inet.host);
+wsaddr-

Re: [PULL 06/22] hw/nvram: Introduce Xilinx battery-backed ram

2022-04-01 Thread Peter Maydell
On Thu, 30 Sept 2021 at 16:12, Peter Maydell  wrote:
>
> From: Tong Ho 
>
> This device is present in Versal and ZynqMP product
> families to store a 256-bit encryption key.

Hi; Coverity points out a bug in this change (CID 1487233):

> +static void bbram_bdrv_error(XlnxBBRam *s, int rc, gchar *detail)
> +{
> +Error *errp;
> +
> +error_setg_errno(&errp, -rc, "%s: BBRAM backstore %s failed.",
> + blk_name(s->blk), detail);

The Error** argument to error_setg() and error_setg_errno()
needs to (assuming it's not one of the special cases
&error_abort or &error_fatal) point to an Error* that has
been initialized to NULL. (The comment on error_setg() in
include/qapi/error.h explains this.)

So this function needs to start "Error *errp = NULL;"
to avoid a probable assert() inside error_setg().

> +error_report("%s", error_get_pretty(errp));
> +error_free(errp);
> +
> +g_free(detail);
> +}
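
(In other words, the function needs to start with an initialized pointer; a
minimal sketch of the suggested fix, using the existing error_report_err()
helper, which reports and frees in one step:)

static void bbram_bdrv_error(XlnxBBRam *s, int rc, gchar *detail)
{
    Error *err = NULL;      /* must start out NULL for error_setg_errno() */

    error_setg_errno(&err, -rc, "%s: BBRAM backstore %s failed.",
                     blk_name(s->blk), detail);
    error_report_err(err);  /* reports the error, then frees it */

    g_free(detail);
}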

thanks
-- PMM



[PATCH 1/2] bsd-user: Add missing semicolon

2022-04-01 Thread Maya Rashish
Signed-off-by: Maya Rashish 
---
 bsd-user/netbsd/target_os_siginfo.h  | 2 +-
 bsd-user/openbsd/target_os_siginfo.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/bsd-user/netbsd/target_os_siginfo.h 
b/bsd-user/netbsd/target_os_siginfo.h
index 667c19cc7c..36c2323cbe 100644
--- a/bsd-user/netbsd/target_os_siginfo.h
+++ b/bsd-user/netbsd/target_os_siginfo.h
@@ -14,7 +14,7 @@ typedef struct target_sigaltstack {
 
 typedef struct {
 uint32_t __bits[TARGET_NSIG_WORDS];
-} target_sigset_t
+} target_sigset_t;
 
 struct target_sigaction {
 abi_ulong   _sa_handler;
diff --git a/bsd-user/openbsd/target_os_siginfo.h 
b/bsd-user/openbsd/target_os_siginfo.h
index baf646a5ab..b5332a83f9 100644
--- a/bsd-user/openbsd/target_os_siginfo.h
+++ b/bsd-user/openbsd/target_os_siginfo.h
@@ -14,7 +14,7 @@ typedef struct target_sigaltstack {
 
 typedef struct {
 uint32_t __bits[TARGET_NSIG_WORDS];
-} target_sigset_t
+} target_sigset_t;
 
 struct target_sigaction {
 abi_ulong   _sa_handler;
-- 
2.35.1




[PATCH 17/17] tests/tcg: fix non-static build

2022-04-01 Thread Paolo Bonzini
If linking with -static fails at configure time, -static should not be used
at build time either.  Do not include BUILD_STATIC in $config_target_mak.

Signed-off-by: Paolo Bonzini 
Message-Id: <20220328140240.40798-16-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/tcg/configure.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 75603fee6d..691d90abac 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -261,7 +261,6 @@ for target in $target_list; do
   if do_compiler "$target_compiler" $target_compiler_cflags \
  -o $TMPE $TMPC ; then
   got_cross_cc=yes
-  echo "BUILD_STATIC=y" >> $config_target_mak
   echo "CC=$target_compiler" >> $config_target_mak
   fi
   else
-- 
2.35.1




Re: [PATCH v2 4/7] util: add qemu-co-timeout

2022-04-01 Thread Vladimir Sementsov-Ogievskiy

01.04.2022 16:13, Hanna Reitz wrote:

On 01.04.22 11:19, Vladimir Sementsov-Ogievskiy wrote:

Add a new API to make a time-limited call of a coroutine.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/qemu/coroutine.h | 13 ++
  util/meson.build |  1 +
  util/qemu-co-timeout.c   | 89 
  3 files changed, 103 insertions(+)
  create mode 100644 util/qemu-co-timeout.c


I don’t really understand what this does.  Intuitively, I would have assumed 
this makes the first yield of the respective coroutine not return when the 
timeout has elapsed, and instead, we switch back to qemu_co_timeout(), which 
returns to its callers.

But it looks like when this yield happens and we switch back to 
qemu_co_timeout(), the coroutine actually isn’t cancelled, and will just 
continue running, actually.  Is that right?  On first look, this looks like 
it’ll be quite difficult to think about what happens when this is used, because 
the coroutine in question is no longer run in sequence with its caller, but 
actually might run in parallel (even though it’s still a coroutine, so it’ll 
remain cooperative multitasking).



Yes, the coroutine continues execution in parallel. That's a generic
interface, and there is no way to "cancel" a generic coroutine. So the caller
should take care of that.
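
(Sketching the caller-side pattern this implies, based on the block-copy user
earlier in the series; call_state and its cancelled flag belong to that
caller and are not part of the qemu_co_timeout() API itself:)

    ret = qemu_co_timeout(block_copy_async_co_entry, call_state, timeout_ns,
                          g_free);
    if (ret < 0) {
        assert(ret == -ETIMEDOUT);
        /* The entry coroutine is still running: ask it to stop via the
         * cancelled flag, then leave call_state alone; the still-running
         * coroutine frees it on exit. */
        call_state->cancelled = true;
        return ret;
    }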


--
Best regards,
Vladimir



[PATCH 16/17] tests/docker: remove SKIP_DOCKER_BUILD

2022-04-01 Thread Paolo Bonzini
It is now unused.

Signed-off-by: Paolo Bonzini 
Message-Id: <20220328140240.40798-15-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/docker/Makefile.include | 12 +---
 tests/docker/docker.py| 57 ---
 2 files changed, 1 insertion(+), 68 deletions(-)

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index eb100c294f..ca2157db46 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -33,15 +33,7 @@ $(DOCKER_SRC_COPY):
 
 docker-qemu-src: $(DOCKER_SRC_COPY)
 
-# General rule for building docker images. If we are a sub-make
-# invoked with SKIP_DOCKER_BUILD we still check the image is up to date
-# though
-ifdef SKIP_DOCKER_BUILD
-docker-image-%: $(DOCKER_FILES_DIR)/%.docker
-   $(call quiet-command, \
-   $(DOCKER_SCRIPT) check --quiet qemu/$* $<, \
-   "CHECK", "$*")
-else
+# General rule for building docker images.
 docker-image-%: $(DOCKER_FILES_DIR)/%.docker
$(call quiet-command,\
$(DOCKER_SCRIPT) build -t qemu/$* -f $< \
@@ -77,8 +69,6 @@ docker-binfmt-image-debian-%: 
$(DOCKER_FILES_DIR)/debian-bootstrap.docker
{ echo "You will need to build $(EXECUTABLE)"; exit 
1;},\
"CHECK", "debian-$* exists"))
 
-endif
-
 # Enforce dependencies for composite images
 # we don't run tests on intermediate images (used as base by another image)
 DOCKER_PARTIAL_IMAGES := debian10 debian11
diff --git a/tests/docker/docker.py b/tests/docker/docker.py
index 78dd13171e..d0af2861b8 100755
--- a/tests/docker/docker.py
+++ b/tests/docker/docker.py
@@ -676,63 +676,6 @@ def run(self, args, argv):
 as_user=True)
 
 
-class CheckCommand(SubCommand):
-"""Check if we need to re-build a docker image out of a dockerfile.
-Arguments:  """
-name = "check"
-
-def args(self, parser):
-parser.add_argument("tag",
-help="Image Tag")
-parser.add_argument("dockerfile", default=None,
-help="Dockerfile name", nargs='?')
-parser.add_argument("--checktype", choices=["checksum", "age"],
-default="checksum", help="check type")
-parser.add_argument("--olderthan", default=60, type=int,
-help="number of minutes")
-
-def run(self, args, argv):
-tag = args.tag
-
-try:
-dkr = Docker()
-except subprocess.CalledProcessError:
-print("Docker not set up")
-return 1
-
-info = dkr.inspect_tag(tag)
-if info is None:
-print("Image does not exist")
-return 1
-
-if args.checktype == "checksum":
-if not args.dockerfile:
-print("Need a dockerfile for tag:%s" % (tag))
-return 1
-
-dockerfile = _read_dockerfile(args.dockerfile)
-
-if dkr.image_matches_dockerfile(tag, dockerfile):
-if not args.quiet:
-print("Image is up to date")
-return 0
-else:
-print("Image needs updating")
-return 1
-elif args.checktype == "age":
-timestr = dkr.get_image_creation_time(info).split(".")[0]
-created = datetime.strptime(timestr, "%Y-%m-%dT%H:%M:%S")
-past = datetime.now() - timedelta(minutes=args.olderthan)
-if created < past:
-print ("Image created @ %s more than %d minutes old" %
-   (timestr, args.olderthan))
-return 1
-else:
-if not args.quiet:
-print ("Image less than %d minutes old" % (args.olderthan))
-return 0
-
-
 def main():
 global USE_ENGINE
 
-- 
2.35.1





[PATCH 10/17] tests/tcg: remove CONFIG_LINUX_USER from config-target.mak

2022-04-01 Thread Paolo Bonzini
Just check the target name instead.

Signed-off-by: Paolo Bonzini 
Message-Id: <20220328140240.40798-10-pbonz...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Paolo Bonzini 
---
 tests/tcg/configure.sh  | 2 --
 tests/tcg/multiarch/Makefile.target | 2 +-
 tests/tcg/x86_64/Makefile.target| 2 +-
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 8927a2b260..57026b5899 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -227,12 +227,10 @@ for target in $target_list; do
   case $target in
 *-linux-user)
   echo "CONFIG_USER_ONLY=y" >> $config_target_mak
-  echo "CONFIG_LINUX_USER=y" >> $config_target_mak
   echo "QEMU=$PWD/qemu-$arch" >> $config_target_mak
   ;;
 *-bsd-user)
   echo "CONFIG_USER_ONLY=y" >> $config_target_mak
-  echo "CONFIG_BSD_USER=y" >> $config_target_mak
   echo "QEMU=$PWD/qemu-$arch" >> $config_target_mak
   ;;
 *-softmmu)
diff --git a/tests/tcg/multiarch/Makefile.target 
b/tests/tcg/multiarch/Makefile.target
index dec401e67f..6bba523729 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -10,7 +10,7 @@ MULTIARCH_SRC=$(SRC_PATH)/tests/tcg/multiarch
 # Set search path for all sources
 VPATH += $(MULTIARCH_SRC)
 MULTIARCH_SRCS =  $(notdir $(wildcard $(MULTIARCH_SRC)/*.c))
-ifneq ($(CONFIG_LINUX_USER),)
+ifeq ($(filter %-linux-user, $(TARGET)),$(TARGET))
 VPATH += $(MULTIARCH_SRC)/linux
 MULTIARCH_SRCS += $(notdir $(wildcard $(MULTIARCH_SRC)/linux/*.c))
 endif
diff --git a/tests/tcg/x86_64/Makefile.target b/tests/tcg/x86_64/Makefile.target
index 17cf168f0a..f9fcd31caf 100644
--- a/tests/tcg/x86_64/Makefile.target
+++ b/tests/tcg/x86_64/Makefile.target
@@ -8,7 +8,7 @@
 
 include $(SRC_PATH)/tests/tcg/i386/Makefile.target
 
-ifneq ($(CONFIG_LINUX_USER),)
+ifeq ($(filter %-linux-user, $(TARGET)),$(TARGET))
 X86_64_TESTS += vsyscall
 TESTS=$(MULTIARCH_TESTS) $(X86_64_TESTS) test-x86_64
 else
-- 
2.35.1





[PATCH 13/17] tests/tcg: list test targets in Makefile.prereqs

2022-04-01 Thread Paolo Bonzini
Omit the rules altogether for targets that do not have a compiler.
Makefile.qemu now is only invoked if the tests are actually built/run.

Signed-off-by: Paolo Bonzini 
Message-Id: <20220328140240.40798-13-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/Makefile.include  | 14 +++---
 tests/tcg/Makefile.qemu | 11 ---
 tests/tcg/configure.sh  | 17 -
 3 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index b5d0d6bc98..091ca8513f 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -36,19 +36,16 @@ export SRC_PATH
 
 SPEED = quick
 
-# Build up our target list from the filtered list of ninja targets
-TARGETS=$(patsubst libqemu-%.fa, %, $(filter libqemu-%.fa, $(ninja-targets)))
-
 -include tests/tcg/Makefile.prereqs
 config-host.mak: $(SRC_PATH)/tests/tcg/configure.sh
 tests/tcg/Makefile.prereqs: config-host.mak
 
 # Per guest TCG tests
-BUILD_TCG_TARGET_RULES=$(patsubst %,build-tcg-tests-%, $(TARGETS))
-CLEAN_TCG_TARGET_RULES=$(patsubst %,clean-tcg-tests-%, $(TARGETS))
-RUN_TCG_TARGET_RULES=$(patsubst %,run-tcg-tests-%, $(TARGETS))
+BUILD_TCG_TARGET_RULES=$(patsubst %,build-tcg-tests-%, $(TCG_TESTS_TARGETS))
+CLEAN_TCG_TARGET_RULES=$(patsubst %,clean-tcg-tests-%, $(TCG_TESTS_TARGETS))
+RUN_TCG_TARGET_RULES=$(patsubst %,run-tcg-tests-%, $(TCG_TESTS_TARGETS))
 
-$(foreach TARGET,$(TARGETS), \
+$(foreach TARGET,$(TCG_TESTS_TARGETS), \
 $(eval $(BUILD_DIR)/tests/tcg/config-$(TARGET).mak: config-host.mak))
 
 $(BUILD_TCG_TARGET_RULES): build-tcg-tests-%: $(if $(CONFIG_PLUGIN),test-plugins)
@@ -84,6 +81,9 @@ clean-tcg: $(CLEAN_TCG_TARGET_RULES)
 
 .PHONY: check-venv check-avocado check-acceptance check-acceptance-deprecated-warning
 
+# Build up our target list from the filtered list of ninja targets
+TARGETS=$(patsubst libqemu-%.fa, %, $(filter libqemu-%.fa, $(ninja-targets)))
+
 TESTS_VENV_DIR=$(BUILD_DIR)/tests/venv
 TESTS_VENV_REQ=$(SRC_PATH)/tests/requirements.txt
 TESTS_RESULTS_DIR=$(BUILD_DIR)/tests/results
diff --git a/tests/tcg/Makefile.qemu b/tests/tcg/Makefile.qemu
index 84c8543878..cda5e5a33e 100644
--- a/tests/tcg/Makefile.qemu
+++ b/tests/tcg/Makefile.qemu
@@ -95,7 +95,6 @@ all:
 
 .PHONY: guest-tests
 
-ifneq ($(GUEST_BUILD),)
 guest-tests: $(GUEST_BUILD)
 
 run-guest-tests: guest-tests
@@ -105,16 +104,6 @@ run-guest-tests: guest-tests
SRC_PATH="$(SRC_PATH)" SPEED=$(SPEED) run), \
"RUN", "tests for $(TARGET_NAME)")
 
-else
-guest-tests:
-   $(call quiet-command, true, "BUILD", \
-   "$(TARGET) guest-tests SKIPPED")
-
-run-guest-tests:
-   $(call quiet-command, true, "RUN", \
-   "tests for $(TARGET) SKIPPED")
-endif
-
 # It doesn't matter if these don't exist
 .PHONY: clean-guest-tests
 clean-guest-tests:
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 904c351029..e51cd56b60 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -81,7 +81,9 @@ fi
 : ${cross_ld_tricore="tricore-ld"}
 
 makefile=tests/tcg/Makefile.prereqs
-: > $makefile
+echo "# Automatically generated by configure - do not modify" > $makefile
+
+tcg_tests_targets=
 for target in $target_list; do
   arch=${target%%-*}
 
@@ -228,6 +230,7 @@ for target in $target_list; do
   echo "target=$target" >> $config_target_mak
   case $target in
 *-softmmu)
+  test -f $source_path/tests/tcg/$arch/Makefile.softmmu-target || continue
   qemu="qemu-system-$arch"
   ;;
 *-linux-user|*-bsd-user)
@@ -235,11 +238,7 @@ for target in $target_list; do
   ;;
   esac
 
-  echo "run-tcg-tests-$target: $qemu\$(EXESUF)" >> $makefile
-
   eval "target_compiler_cflags=\${cross_cc_cflags_$arch}"
-  echo "QEMU=$PWD/$qemu" >> $config_target_mak
-  echo "CROSS_CC_GUEST_CFLAGS=$target_compiler_cflags" >> $config_target_mak
 
   got_cross_cc=no
 
@@ -362,8 +361,16 @@ for target in $target_list; do
   echo "CROSS_CC_HAS_I386_NOPIE=y" >> $config_target_mak
   ;;
   esac
+  got_cross_cc=yes
   break
   fi
   done
   fi
+  if test $got_cross_cc = yes; then
+  echo "QEMU=$PWD/$qemu" >> $config_target_mak
+  echo "CROSS_CC_GUEST_CFLAGS=$target_compiler_cflags" >> 
$config_target_mak
+  echo "run-tcg-tests-$target: $qemu\$(EXESUF)" >> $makefile
+  tcg_tests_targets="$tcg_tests_targets $target"
+  fi
 done
+echo "TCG_TESTS_TARGETS=$tcg_tests_targets" >> $makefile
-- 
2.35.1





[PATCH 14/17] tests/tcg: invoke Makefile.target directly from QEMU's makefile

2022-04-01 Thread Paolo Bonzini
Build the "docker.py cc" invocation directly in tests/tcg/configure.sh, and
remove the Makefile.qemu wrapper around Makefile.target.  The config-*.mak
files now include the actual variables used when building the tests, rather
than the CROSS_* variables that Makefile.qemu used to "translate".

This is a first step towards generalizing the cross-compilation infrastructure
so that it can be used for firmware as well.
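Expanded by hand for one target, the new build rule boils down to an
invocation along these lines (target and path are illustrative):

    make -C tests/tcg/x86_64-linux-user -f ../Makefile.target \
         TARGET=x86_64-linux-user SRC_PATH=/path/to/qemu

make changes into the per-target build directory first and then reads
../Makefile.target relative to it, so a single makefile serves every
target without the Makefile.qemu indirection.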

Signed-off-by: Paolo Bonzini 
Message-Id: <20220328140240.40798-14-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/Makefile.include|  36 +++--
 tests/tcg/Makefile.qemu   | 110 --
 tests/tcg/Makefile.target |   3 +-
 tests/tcg/configure.sh|  28 +-
 4 files changed, 33 insertions(+), 144 deletions(-)
 delete mode 100644 tests/tcg/Makefile.qemu

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 091ca8513f..ec84b2ebc0 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -48,25 +48,27 @@ RUN_TCG_TARGET_RULES=$(patsubst %,run-tcg-tests-%, $(TCG_TESTS_TARGETS))
 $(foreach TARGET,$(TCG_TESTS_TARGETS), \
 $(eval $(BUILD_DIR)/tests/tcg/config-$(TARGET).mak: config-host.mak))
 
-$(BUILD_TCG_TARGET_RULES): build-tcg-tests-%: $(if $(CONFIG_PLUGIN),test-plugins)
-   $(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) \
-   -f $(SRC_PATH)/tests/tcg/Makefile.qemu \
-   SRC_PATH=$(SRC_PATH) \
-   V="$(V)" TARGET="$*" guest-tests, \
-   "BUILD", "TCG tests for $*")
+.PHONY: $(TCG_TESTS_TARGETS:%=build-tcg-tests-%)
+$(TCG_TESTS_TARGETS:%=build-tcg-tests-%): build-tcg-tests-%: $(BUILD_DIR)/tests/tcg/config-%.mak
+   $(call quiet-command, \
+$(MAKE) -C tests/tcg/$* -f ../Makefile.target $(SUBDIR_MAKEFLAGS) \
+DOCKER_SCRIPT="$(DOCKER_SCRIPT)" \
+TARGET="$*" SRC_PATH="$(SRC_PATH)", \
+"BUILD","$* guest-tests")
 
-$(RUN_TCG_TARGET_RULES): run-tcg-tests-%: build-tcg-tests-% all
-   $(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) \
-   -f $(SRC_PATH)/tests/tcg/Makefile.qemu \
-   SRC_PATH=$(SRC_PATH) SPEED="$(SPEED)" \
-   V="$(V)" TARGET="$*" run-guest-tests, \
-   "RUN", "TCG tests for $*")
+.PHONY: $(TCG_TESTS_TARGETS:%=run-tcg-tests-%)
+$(TCG_TESTS_TARGETS:%=run-tcg-tests-%): run-tcg-tests-%: build-tcg-tests-% $(if $(CONFIG_PLUGIN),test-plugins)
+   $(call quiet-command, \
+   $(MAKE) -C tests/tcg/$* -f ../Makefile.target $(SUBDIR_MAKEFLAGS) \
+TARGET="$*" SRC_PATH="$(SRC_PATH)" SPEED=$(SPEED) run, 
\
+"RUN", "$* guest-tests")
 
-$(CLEAN_TCG_TARGET_RULES): clean-tcg-tests-%:
-   $(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) \
-   -f $(SRC_PATH)/tests/tcg/Makefile.qemu \
-   SRC_PATH=$(SRC_PATH) TARGET="$*" clean-guest-tests, \
-   "CLEAN", "TCG tests for $*")
+.PHONY: $(TCG_TESTS_TARGETS:%=clean-tcg-tests-%)
+$(TCG_TESTS_TARGETS:%=clean-tcg-tests-%): clean-tcg-tests-%:
+   $(call quiet-command, \
+   $(MAKE) -C tests/tcg/$* -f ../Makefile.target $(SUBDIR_MAKEFLAGS) \
+TARGET="$*" SRC_PATH="$(SRC_PATH)" clean, \
+"CLEAN", "$* guest-tests")
 
 .PHONY: build-tcg
 build-tcg: $(BUILD_TCG_TARGET_RULES)
diff --git a/tests/tcg/Makefile.qemu b/tests/tcg/Makefile.qemu
deleted file mode 100644
index cda5e5a33e..0000000000
--- a/tests/tcg/Makefile.qemu
+++ /dev/null
@@ -1,110 +0,0 @@
-# -*- Mode: makefile -*-
-#
-# TCG tests (per-target rules)
-#
-# This Makefile fragment is included from the build-tcg target, once
-# for each target we build. We have two options for compiling, either
-# using a configured guest compiler or calling one of our docker images
-# to do it for us.
-#
-
-# The configure script fills in extra information about
-# useful docker images or alternative compiler flags.
-
-# Usage: $(call quiet-command,command and args,"NAME","args to print")
-# This will run "command and args", and either:
-#  if V=1 just print the whole command and args
-#  otherwise print the 'quiet' output in the format "  NAME args to print"
-# NAME should be a short name of the command, 7 letters or fewer.
-# If called with only a single argument, will print nothing in quiet mode.
-quiet-command-run = $(if $(V),,$(if $2,printf "  %-7s %s\n" $2 $3 && ))$1
-quiet-@ = $(if $(V),,@)
-quiet-command = $(quiet-@)$(call quiet-command-run,$1,$2,$3)
-
-CROSS_CC_GUEST:=
-CROSS_AS_GUEST:=
-CROSS_LD_GUEST:=
-DOCKER_IMAGE:=
-
--include tests/tcg/config-$(TARGET).mak
-
-GUEST_BUILD=
-TCG_MAKE=../Makefile.target
-
-# We also need the Docker make rules to depend on
-SKIP_DOCKER_BUILD=1
-include $(SRC_PATH)/tests/docker/Makefile.include
-
-# Support installed Cross Compilers
-
-ifdef CROSS_CC_GUEST
-
-.PHONY: cross-build-guest-tests
-cross-build-guest-tests:
-   $(call quiet-command, \
-  (mkdir -p tests/tcg/$(TARGET) && 

[PATCH 2/2] bsd-user: correct notion of return value

2022-04-01 Thread Maya Rashish
memcpy_to_target() returns 0 on success and non-zero on failure, so the
negated check treated every successful copy as an error. Fix EFAULT at
startup.

Signed-off-by: Maya Rashish 
---
 bsd-user/netbsd/target_os_stack.h  | 2 +-
 bsd-user/openbsd/target_os_stack.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/bsd-user/netbsd/target_os_stack.h b/bsd-user/netbsd/target_os_stack.h
index 503279c1a9..f3cfbe8626 100644
--- a/bsd-user/netbsd/target_os_stack.h
+++ b/bsd-user/netbsd/target_os_stack.h
@@ -40,7 +40,7 @@ static inline int setup_initial_stack(struct bsd_binprm *bprm, abi_ulong *p,
 for (i = 0; i < MAX_ARG_PAGES; i++) {
 if (bprm->page[i]) {
 info->rss++;
-if (!memcpy_to_target(stack_base, bprm->page[i],
+if (memcpy_to_target(stack_base, bprm->page[i],
 TARGET_PAGE_SIZE)) {
 errno = EFAULT;
 return -1;
diff --git a/bsd-user/openbsd/target_os_stack.h b/bsd-user/openbsd/target_os_stack.h
index 4b37955d3b..3f799ef5d1 100644
--- a/bsd-user/openbsd/target_os_stack.h
+++ b/bsd-user/openbsd/target_os_stack.h
@@ -40,7 +40,7 @@ static inline int setup_initial_stack(struct bsd_binprm *bprm, abi_ulong *p,
 for (i = 0; i < MAX_ARG_PAGES; i++) {
 if (bprm->page[i]) {
 info->rss++;
-if (!memcpy_to_target(stack_base, bprm->page[i],
+if (memcpy_to_target(stack_base, bprm->page[i],
 TARGET_PAGE_SIZE)) {
 errno = EFAULT;
 return -1;
-- 
2.35.1




[PATCH 07/17] tests/docker: simplify docker-TEST@IMAGE targets

2022-04-01 Thread Paolo Bonzini
No need to go through the shell when we already have the test and image
names at the point where the targets are declared.
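A cut-down sketch of the $(foreach)/$(eval) pattern this relies on, with
made-up image and test names and a stub image rule so it runs standalone:

IMAGES := debian11 fedora
TESTS  := test-quick test-build

# Stub so the generated prerequisites resolve in this sketch.
docker-image-%: ; @echo "  BUILD   image $*"

$(foreach i,$(IMAGES), \
    $(foreach t,$(TESTS), \
        $(eval .PHONY: docker-$t@$i) \
        $(eval docker-$t@$i: docker-image-$i; @echo "  RUN     $t in $i")))

Each generated docker-TEST@IMAGE rule now names its docker-image-%
prerequisite directly and carries its own recipe, instead of bouncing
through a docker-run-% pattern rule that re-derives the test and image
names with sed.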

Signed-off-by: Paolo Bonzini 
Message-Id: <20220328140240.40798-8-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/docker/Makefile.include | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 3b5ebd5567..2a187cb5a2 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -204,7 +204,7 @@ DOCKER_TESTS := $(if $(TESTS), $(filter $(TESTS), $(__TESTS)), $(__TESTS))
 $(foreach i,$(filter-out $(DOCKER_PARTIAL_IMAGES),$(DOCKER_IMAGES)), \
$(foreach t,$(DOCKER_TESTS), \
$(eval .PHONY: docker-$t@$i) \
-   $(eval docker-$t@$i: docker-image-$i docker-run-$t@$i) \
+   $(eval docker-$t@$i: docker-image-$i; @$(MAKE) docker-run TEST=$t IMAGE=$i) \
) \
$(foreach t,$(DOCKER_TESTS), \
$(eval docker-all-tests: docker-$t@$i) \
@@ -263,7 +263,7 @@ DOCKER_CCACHE_DIR := $$HOME/.cache/qemu-docker-ccache
 
 # This rule is for directly running against an arbitrary docker target.
 # It is called by the expanded docker targets (e.g. make
-# docker-test-foo@bar) which will do additional verification.
+# docker-test-foo@bar) which will also ensure the image is up to date.
 #
 # For example: make docker-run TEST="test-quick" IMAGE="debian:arm64" EXECUTABLE=./aarch64-linux-user/qemu-aarch64
 #
@@ -298,14 +298,6 @@ docker-run: docker-qemu-src
$(call quiet-command, rm -r $(DOCKER_SRC_COPY), \
"  CLEANUP $(DOCKER_SRC_COPY)")
 
-# Run targets:
-#
-# Of the form docker-TEST-FOO@IMAGE-BAR which will then be expanded into a call to "make docker-run"
-docker-run-%: CMD = $(shell echo '$@' | sed -e 's/docker-run-\([^@]*\)@\(.*\)/\1/')
-docker-run-%: IMAGE = $(shell echo '$@' | sed -e 's/docker-run-\([^@]*\)@\(.*\)/\2/')
-docker-run-%:
-   @$(MAKE) docker-run TEST=$(CMD) IMAGE=qemu/$(IMAGE)
-
 docker-image: ${DOCKER_IMAGES:%=docker-image-%}
 
 docker-clean:
-- 
2.35.1





[PATCH 11/17] tests/tcg: remove CONFIG_USER_ONLY from config-target.mak

2022-04-01 Thread Paolo Bonzini
Just check the target name instead.
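The test here is the inverse of the previous patch's: $(filter %-softmmu,
$(TARGET)) expands to the empty string for user-mode targets, so comparing
against empty selects the user-mode branch. A standalone sketch with a
made-up target name:

TARGET := aarch64-linux-user

ifeq ($(filter %-softmmu, $(TARGET)),)
$(info $(TARGET): user-mode emulation)
else
$(info $(TARGET): system emulation)
endif

all: ;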

Signed-off-by: Paolo Bonzini 
Message-Id: <20220328140240.40798-11-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/tcg/Makefile.target |  8 
 tests/tcg/configure.sh| 12 +++-
 2 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
index acda5bcec2..c75e8d983f 100644
--- a/tests/tcg/Makefile.target
+++ b/tests/tcg/Makefile.target
@@ -34,7 +34,7 @@ all:
 -include ../config-$(TARGET).mak
 
 # Get semihosting definitions for user-mode emulation
-ifeq ($(CONFIG_USER_ONLY),y)
+ifeq ($(filter %-softmmu, $(TARGET)),)
 -include $(SRC_PATH)/configs/targets/$(TARGET).mak
 endif
 
@@ -44,7 +44,7 @@ COMMA := ,
 quiet-command = $(if $(V),$1,$(if $(2),@printf "  %-7s %s\n" $2 $3 && $1, @$1))
 
 # $1 = test name, $2 = cmd, $3 = desc
-ifdef CONFIG_USER_ONLY
+ifeq ($(filter %-softmmu, $(TARGET)),)
 run-test = $(call quiet-command, timeout --foreground $(TIMEOUT) $2 > $1.out, \
"TEST",$3)
 else
@@ -91,7 +91,7 @@ QEMU_OPTS=
 #   90s with --enable-tcg-interpreter
 TIMEOUT=90
 
-ifdef CONFIG_USER_ONLY
+ifeq ($(filter %-softmmu, $(TARGET)),)
 # The order we include is important. We include multiarch first and
 # then the target. If there are common tests shared between
 # sub-targets (e.g. ARM & AArch64) then it is up to
@@ -153,7 +153,7 @@ extract-plugin = $(wordlist 2, 2, $(subst -with-, ,$1))
 
 RUN_TESTS+=$(EXTRA_RUNS)
 
-ifdef CONFIG_USER_ONLY
+ifeq ($(filter %-softmmu, $(TARGET)),)
 run-%: %
$(call run-test, $<, $(QEMU) $(QEMU_OPTS) $<, "$< on $(TARGET_NAME)")
 
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 57026b5899..0d864c24fc 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -225,18 +225,12 @@ for target in $target_list; do
   echo "TARGET_NAME=$arch" >> $config_target_mak
   echo "target=$target" >> $config_target_mak
   case $target in
-*-linux-user)
-  echo "CONFIG_USER_ONLY=y" >> $config_target_mak
-  echo "QEMU=$PWD/qemu-$arch" >> $config_target_mak
-  ;;
-*-bsd-user)
-  echo "CONFIG_USER_ONLY=y" >> $config_target_mak
-  echo "QEMU=$PWD/qemu-$arch" >> $config_target_mak
-  ;;
 *-softmmu)
-  echo "CONFIG_SOFTMMU=y" >> $config_target_mak
   echo "QEMU=$PWD/qemu-system-$arch" >> $config_target_mak
   ;;
+*-linux-user|*-bsd-user)
+  echo "QEMU=$PWD/qemu-$arch" >> $config_target_mak
+  ;;
   esac
 
   eval "target_compiler_cflags=\${cross_cc_cflags_$arch}"
-- 
2.35.1





[PATCH 15/17] tests/tcg: isolate from QEMU's config-host.mak

2022-04-01 Thread Paolo Bonzini
Do not include variables for QEMU's own compiler, as they are not
necessarily related to the cross compiler used for tests/tcg.
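The tests get their own tests/tcg/config-host.mak instead; its generated
content amounts to something like this (values illustrative):

# Automatically generated by configure - do not modify
SRC_PATH=/path/to/qemu
HOST_CC=cc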

Signed-off-by: Paolo Bonzini 
---
 configure | 3 +--
 tests/tcg/Makefile.target | 3 +--
 tests/tcg/configure.sh| 5 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index 7c08c18358..e8786d478e 100755
--- a/configure
+++ b/configure
@@ -2937,7 +2937,6 @@ echo "GENISOIMAGE=$genisoimage" >> $config_host_mak
 echo "MESON=$meson" >> $config_host_mak
 echo "NINJA=$ninja" >> $config_host_mak
 echo "CC=$cc" >> $config_host_mak
-echo "HOST_CC=$host_cc" >> $config_host_mak
 echo "AR=$ar" >> $config_host_mak
 echo "AS=$as" >> $config_host_mak
 echo "CCAS=$ccas" >> $config_host_mak
@@ -3057,7 +3056,7 @@ done
 (for i in $cross_cc_vars; do
   export $i
 done
-export target_list source_path use_containers cpu
+export target_list source_path use_containers cpu host_cc
 $source_path/tests/tcg/configure.sh)
 
 # temporary config to build submodules
diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
index 95499d8c9b..f427a0304e 100644
--- a/tests/tcg/Makefile.target
+++ b/tests/tcg/Makefile.target
@@ -30,7 +30,7 @@
 #
 
 all:
--include ../../../config-host.mak
+-include ../config-host.mak
 -include ../config-$(TARGET).mak
 
 # Get semihosting definitions for user-mode emulation
@@ -77,7 +77,6 @@ EXTRA_TESTS=
 
 # Start with a blank slate, the build targets get to add stuff first
 CFLAGS=
-QEMU_CFLAGS=
 LDFLAGS=
 
 QEMU_OPTS=
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index a577dd7ece..75603fee6d 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -83,6 +83,11 @@ fi
 makefile=tests/tcg/Makefile.prereqs
 echo "# Automatically generated by configure - do not modify" > $makefile
 
+config_host_mak=tests/tcg/config-host.mak
+echo "# Automatically generated by configure - do not modify" > 
$config_host_mak
+echo "SRC_PATH=$source_path" >> $config_host_mak
+echo "HOST_CC=$host_cc" >> $config_host_mak
+
 tcg_tests_targets=
 for target in $target_list; do
   arch=${target%%-*}
-- 
2.35.1





[PATCH 12/17] tests/tcg: prepare Makefile.prereqs at configure time

2022-04-01 Thread Paolo Bonzini
List the dependencies of the build-tcg-tests-* and run-tcg-tests-*
targets in a Makefile fragment, without going through Makefile.prereqs's
"parsing" of config-*.mak.

Signed-off-by: Paolo Bonzini 
Message-Id: <20220328140240.40798-12-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/Makefile.include |  9 ++---
 tests/tcg/Makefile.prereqs | 18 --
 tests/tcg/configure.sh | 10 --
 3 files changed, 14 insertions(+), 23 deletions(-)
 delete mode 100644 tests/tcg/Makefile.prereqs

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 05c534ea56..b5d0d6bc98 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -39,14 +39,17 @@ SPEED = quick
 # Build up our target list from the filtered list of ninja targets
 TARGETS=$(patsubst libqemu-%.fa, %, $(filter libqemu-%.fa, $(ninja-targets)))
 
+-include tests/tcg/Makefile.prereqs
+config-host.mak: $(SRC_PATH)/tests/tcg/configure.sh
+tests/tcg/Makefile.prereqs: config-host.mak
+
 # Per guest TCG tests
 BUILD_TCG_TARGET_RULES=$(patsubst %,build-tcg-tests-%, $(TARGETS))
 CLEAN_TCG_TARGET_RULES=$(patsubst %,clean-tcg-tests-%, $(TARGETS))
 RUN_TCG_TARGET_RULES=$(patsubst %,run-tcg-tests-%, $(TARGETS))
 
-# Probe for the Docker Builds needed for each build
-$(foreach PROBE_TARGET,$(TARGET_DIRS), \
-   $(eval -include $(SRC_PATH)/tests/tcg/Makefile.prereqs))
+$(foreach TARGET,$(TARGETS), \
+$(eval $(BUILD_DIR)/tests/tcg/config-$(TARGET).mak: config-host.mak))
 
 $(BUILD_TCG_TARGET_RULES): build-tcg-tests-%: $(if $(CONFIG_PLUGIN),test-plugins)
$(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) \
diff --git a/tests/tcg/Makefile.prereqs b/tests/tcg/Makefile.prereqs
deleted file mode 100644
index 9a29604a83..0000000000
--- a/tests/tcg/Makefile.prereqs
+++ /dev/null
@@ -1,18 +0,0 @@
-# -*- Mode: makefile -*-
-#
-# TCG Compiler Probe
-#
-# This Makefile fragment is included multiple times in the main make
-# script to probe for available compilers. This is used to build up a
-# selection of required docker targets before we invoke a sub-make for
-# each target.
-
-DOCKER_IMAGE:=
-
--include $(BUILD_DIR)/tests/tcg/config-$(PROBE_TARGET).mak
-
-ifneq ($(DOCKER_IMAGE),)
-build-tcg-tests-$(PROBE_TARGET): docker-image-$(DOCKER_IMAGE)
-endif
-$(BUILD_DIR)/tests/tcg/config_$(PROBE_TARGET).mak: config-host.mak
-config-host.mak: $(SRC_PATH)/tests/tcg/configure.sh
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 0d864c24fc..904c351029 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -80,6 +80,8 @@ fi
 : ${cross_as_tricore="tricore-as"}
 : ${cross_ld_tricore="tricore-ld"}
 
+makefile=tests/tcg/Makefile.prereqs
+: > $makefile
 for target in $target_list; do
   arch=${target%%-*}
 
@@ -226,14 +228,17 @@ for target in $target_list; do
   echo "target=$target" >> $config_target_mak
   case $target in
 *-softmmu)
-  echo "QEMU=$PWD/qemu-system-$arch" >> $config_target_mak
+  qemu="qemu-system-$arch"
   ;;
 *-linux-user|*-bsd-user)
-  echo "QEMU=$PWD/qemu-$arch" >> $config_target_mak
+  qemu="qemu-$arch"
   ;;
   esac
 
+  echo "run-tcg-tests-$target: $qemu\$(EXESUF)" >> $makefile
+
   eval "target_compiler_cflags=\${cross_cc_cflags_$arch}"
+  echo "QEMU=$PWD/$qemu" >> $config_target_mak
   echo "CROSS_CC_GUEST_CFLAGS=$target_compiler_cflags" >> $config_target_mak
 
   got_cross_cc=no
@@ -329,6 +334,7 @@ for target in $target_list; do
   test -n "$container_image"; then
   for host in $container_hosts; do
   if test "$host" = "$cpu"; then
+  echo "build-tcg-tests-$target: docker-image-$container_image" >> 
$makefile
   echo "DOCKER_IMAGE=$container_image" >> $config_target_mak
   echo "DOCKER_CROSS_CC_GUEST=$container_cross_cc" >> \
$config_target_mak
-- 
2.35.1





[PATCH 04/17] tests/docker: remove unnecessary default definitions

2022-04-01 Thread Paolo Bonzini
The definitions of DOCKER_IMAGES and DOCKER_TESTS already cope with an
empty value of $(IMAGES) and $(TESTS); there is no need to force them to
"%" if undefined.
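A standalone sketch of why the "%" default was redundant (test names made
up; the DOCKER_TESTS definition is the one visible as context in a later
patch of this series):

__TESTS := test-quick test-build test-clang

# TESTS unset: $(if) takes the else branch and keeps the full list.
DOCKER_TESTS := $(if $(TESTS), $(filter $(TESTS), $(__TESTS)), $(__TESTS))
$(info DOCKER_TESTS = $(DOCKER_TESTS))

all: ;

The old TESTS ?= % default only worked because $(filter %, list) returns
the list unchanged; the $(if) already covers the unset case.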

Signed-off-by: Paolo Bonzini 
Message-Id: <20220328140240.40798-5-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/docker/Makefile.include | 5 -
 1 file changed, 5 deletions(-)

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index c8d0ec3c66..751151d7e9 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -8,11 +8,6 @@ COMMA := ,
 
 HOST_ARCH = $(if $(ARCH),$(ARCH),$(shell uname -m))
 
-# These variables can be set by the user to limit the set of docker
-# images and tests to a more restricted subset
-TESTS ?= %
-IMAGES ?= %
-
 DOCKER_FILES_DIR := $(SRC_PATH)/tests/docker/dockerfiles
 # we don't run tests on intermediate images (used as base by another image)
 DOCKER_PARTIAL_IMAGES := debian10 debian11
-- 
2.35.1




