date:20240314

Re: [PATCH v2] target/s390x: improve cpu compatibility check error message

2024-03-14 Thread Philippe Mathieu-Daudé


On 14/3/24 22:37, Claudio Fontana wrote:

some users were confused by this message showing under TCG:

Selected CPU generation is too new. Maximum supported model
in the configuration: 'xyz'


(Note for the maintainer queuing this patch: consider adding
 a few extra spaces to indent the previously displayed output.)


Clarify that the maximum can depend on the accel, and add a
hint to try a different one.

Also add a hint for features mismatch to suggest trying
different accel, QEMU and kernel versions.

Signed-off-by: Claudio Fontana 
---
  target/s390x/cpu_models.c | 22 +++---
  1 file changed, 15 insertions(+), 7 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] docs/s390: clarify even more that cpu-topology is KVM-only

2024-03-14 Thread Philippe Mathieu-Daudé


On 14/3/24 18:22, Claudio Fontana wrote:

At least for now cpu-topology is implemented only for KVM.

We already say this, but this tries to be more explicit,
and also show it in the examples.

This adds a new reference in the introduction that we can point to,
whenever we need to reference accelerators and how to select them.

Signed-off-by: Claudio Fontana 
---
  docs/system/introduction.rst   |  2 ++
  docs/system/s390x/cpu-topology.rst | 14 --
  2 files changed, 10 insertions(+), 6 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 1/5] roms/efi: clean up edk2 build config

2024-03-14 Thread Philippe Mathieu-Daudé


On 14/3/24 12:52, Gerd Hoffmann wrote:

Needed to avoid stale toolchain configurations breaking firmware builds.

Signed-off-by: Gerd Hoffmann 
---
  roms/Makefile | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 3/5] roms/efi: exclude efi shell from secure boot builds

2024-03-14 Thread Philippe Mathieu-Daudé


On 14/3/24 12:53, Gerd Hoffmann wrote:

Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4641
Signed-off-by: Gerd Hoffmann 
---
  roms/edk2-build.config | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-14 Thread Jason Wang

On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:
>
> On setups with one or more virtio-net devices with vhost on,
> the cost of each dirty tracking iteration grows with the number
> of queues that are set up, e.g. on idle guest migration the
> following is observed with virtio-net with vhost=on:
>
> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
>
> With high memory rates the symptom is lack of convergence as soon
> as the guest has a vhost device with a sufficiently high number of
> queues, or a sufficient number of vhost devices.
>
> On every migration iteration (every 100msecs) it will redundantly
> query the *shared log* as many times as the number of queues
> configured with vhost that exist in the guest. For the virtqueue data,
> this is necessary, but not for the memory sections which are the same.
> So essentially we end up scanning the dirty log too often.
>
> To fix that, select a vhost device responsible for scanning the
> log with regards to memory sections dirty tracking. It is selected
> when we enable the logger (during migration) and cleared when we
> disable the logger. If the vhost logger device goes away for some
> reason, the logger will be re-selected from the rest of vhost
> devices.
>
> After making mem-section logger a singleton instance, constant cost
> of 7%-9% (like the 1 queue report) will be seen, no matter how many
> queues or how many vhost devices are configured:
>
> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
>
> Co-developed-by: Joao Martins 
> Signed-off-by: Joao Martins 
> Signed-off-by: Si-Wei Liu 
>
> ---
> v3 -> v4:
>   - add comment to clarify effect on cache locality and
> performance
>
> v2 -> v3:
>   - add after-fix benchmark to commit log
>   - rename vhost_log_dev_enabled to vhost_dev_should_log
>   - remove unneeded comparisons for backend_type
>   - use QLIST array instead of single flat list to store vhost
> logger devices
>   - simplify logger election logic
> ---
>  hw/virtio/vhost.c         | 67 ++-
>  include/hw/virtio/vhost.h |  1 +
>  2 files changed, 62 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 612f4db..58522f1 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -45,6 +45,7 @@
>
>  static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
>  static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> +static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
>
>  /* Memslots used by backends that support private memslots (without an fd). */
>  static unsigned int used_memslots;
> @@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
>  }
>  }
>
> +static inline bool vhost_dev_should_log(struct vhost_dev *dev)
> +{
> +assert(dev->vhost_ops);
> +assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
> +assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
> +
> +return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);

A dumb question: why not simply check

dev->log == vhost_log_shm[dev->vhost_ops->backend_type]

?

Thanks

[PATCH for 9.1 v9 09/11] hw/pci: Use UINT32_MAX as a default value for rombar

2024-03-14 Thread Akihiko Odaki

Currently there is no way to distinguish the case that rombar is
explicitly specified as 1 and the case that rombar is not specified.

Set rombar to UINT32_MAX by default to distinguish these cases just as it
is done for addr and romsize. It was confirmed that changing the default
value to UINT32_MAX will not change the behavior by looking at
occurrences of rom_bar.

$ git grep -w rom_bar
hw/display/qxl.c:328:QXLRom *rom = memory_region_get_ram_ptr(>rom_bar);
hw/display/qxl.c:431:qxl_set_dirty(>rom_bar, 0, qxl->rom_size);
hw/display/qxl.c:1048:QXLRom *rom = memory_region_get_ram_ptr(>rom_bar);
hw/display/qxl.c:2131:memory_region_init_rom(>rom_bar, OBJECT(qxl), "qxl.vrom",
hw/display/qxl.c:2154: PCI_BASE_ADDRESS_SPACE_MEMORY, >rom_bar);
hw/display/qxl.h:101:MemoryRegion   rom_bar;
hw/pci/pci.c:74:DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
hw/pci/pci.c:2329:if (!pdev->rom_bar) {
hw/vfio/pci.c:1019:if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
hw/xen/xen_pt_load_rom.c:29:if (dev->romfile || !dev->rom_bar) {
include/hw/pci/pci_device.h:150:uint32_t rom_bar;

rom_bar refers to a different variable in qxl. It is only tested if
the value is 0 or not in the other places.

If a user explicitly sets rombar to UINT32_MAX, we still cannot
distinguish that from the implicit default. However, it is unlikely to be
a problem as nobody would type a literal UINT32_MAX (0xffffffff or
4294967295) by chance.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 84df07a2789b..cb5ac46e9f27 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -71,7 +71,7 @@ static Property pci_props[] = {
 DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
 DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
 DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, UINT32_MAX),
-DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
+DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, UINT32_MAX),
 DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
 QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
 DEFINE_PROP_BIT("x-pcie-lnksta-dllla", PCIDevice, cap_present,

-- 
2.44.0

[PATCH for 9.1 v9 08/11] hw/pci: Replace -1 with UINT32_MAX for romsize

2024-03-14 Thread Akihiko Odaki

romsize is a uint32_t variable. Specifying -1 as a uint32_t value is an
obscure way to denote UINT32_MAX.

Worse, if int is wider than 32 bits, it will change the behavior of a
construct like the following:
romsize = -1;
if (romsize != -1) {
...
}

When -1 is assigned to romsize, -1 will be implicitly converted to
uint32_t, resulting in UINT32_MAX. On the contrary, when evaluating
romsize != -1, romsize will be converted to int, so the comparison is
between UINT32_MAX and -1, which are not equal.

Replace -1 with UINT32_MAX for statements involving the variable to
clarify the intent and prevent potential breakage.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Markus Armbruster 
---
 hw/pci/pci.c | 8 
 hw/xen/xen_pt_load_rom.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 54b375da2d26..84df07a2789b 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -70,7 +70,7 @@ static bool pcie_has_upstream_port(PCIDevice *dev);
 static Property pci_props[] = {
 DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
 DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
-DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, -1),
+DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, UINT32_MAX),
 DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
 DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
 QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
@@ -2073,7 +2073,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
  g_cmp_uint32, NULL);
 }
 
-if (pci_dev->romsize != -1 && !is_power_of_2(pci_dev->romsize)) {
+if (pci_dev->romsize != UINT32_MAX && !is_power_of_2(pci_dev->romsize)) {
 error_setg(errp, "ROM size %u is not a power of two", 
pci_dev->romsize);
 return;
 }
@@ -2359,7 +2359,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
 return;
 }
 
-if (load_file || pdev->romsize == -1) {
+if (load_file || pdev->romsize == UINT32_MAX) {
 path = qemu_find_file(QEMU_FILE_TYPE_BIOS, pdev->romfile);
 if (path == NULL) {
 path = g_strdup(pdev->romfile);
@@ -2378,7 +2378,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
pdev->romfile);
 return;
 }
-if (pdev->romsize != -1) {
+if (pdev->romsize != UINT32_MAX) {
 if (size > pdev->romsize) {
 error_setg(errp, "romfile \"%s\" (%u bytes) "
"is too large for ROM size %u",
diff --git a/hw/xen/xen_pt_load_rom.c b/hw/xen/xen_pt_load_rom.c
index 03422a8a7148..6bc64acd3352 100644
--- a/hw/xen/xen_pt_load_rom.c
+++ b/hw/xen/xen_pt_load_rom.c
@@ -53,7 +53,7 @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev,
 }
 fseek(fp, 0, SEEK_SET);
 
-if (dev->romsize != -1) {
+if (dev->romsize != UINT32_MAX) {
 if (st.st_size > dev->romsize) {
 error_report("ROM BAR \"%s\" (%ld bytes) is too large for ROM size 
%u",
  rom_file, (long) st.st_size, dev->romsize);

-- 
2.44.0

[PATCH for 9.1 v9 10/11] hw/pci: Determine if rombar is explicitly enabled

2024-03-14 Thread Akihiko Odaki

vfio determines if rombar is explicitly enabled by inspecting QDict.
Inspecting QDict is not nice because QDict is untyped and depends on the
details of the external interface. Add infrastructure to hw/pci to
determine if rombar is explicitly enabled.

This changes the semantics of UINT32_MAX: it has always been a valid
value to explicitly say rombar is enabled, and it now denotes the
implicit default value. Nobody should have set rombar to UINT32_MAX,
however, considering that its meaning was no different from 1 and typing
a literal UINT32_MAX (0xffffffff or 4294967295) is more troublesome.

Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pci_device.h | 5 +
 hw/vfio/pci.c   | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index ca151325085d..6be0f989ebe0 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -205,6 +205,11 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
 return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
 }
 
+static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev)
+{
+return dev->rom_bar && dev->rom_bar != UINT32_MAX;
+}
+
 static inline void pci_set_power(PCIDevice *pci_dev, bool state)
 {
 /*
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64780d1b7933..8708d2c1e2a2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1012,7 +1012,6 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 {
 uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
 off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
-DeviceState *dev = DEVICE(vdev);
 char *name;
 int fd = vdev->vbasedev.fd;
 
@@ -1046,7 +1045,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 }
 
 if (vfio_opt_rom_in_denylist(vdev)) {
-if (dev->opts && qdict_haskey(dev->opts, "rombar")) {
+if (pci_rom_bar_explicitly_enabled(&vdev->pdev)) {
 warn_report("Device at %s is known to cause system instability"
 " issues during option rom execution",
 vdev->vbasedev.name);

-- 
2.44.0

[PATCH for 9.1 v9 02/11] pcie_sriov: Do not manually unrealize

2024-03-14 Thread Akihiko Odaki

A device gets automatically unrealized when being unparented.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index e9b23221d713..499becd5273f 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -204,11 +204,7 @@ static void unregister_vfs(PCIDevice *dev)
 trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
PCI_FUNC(dev->devfn), num_vfs);
 for (i = 0; i < num_vfs; i++) {
-Error *err = NULL;
 PCIDevice *vf = dev->exp.sriov_pf.vf[i];
-if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) {
-error_reportf_err(err, "Failed to unplug: ");
-}
 object_unparent(OBJECT(vf));
 object_unref(OBJECT(vf));
 }

-- 
2.44.0

[PATCH for 9.1 v9 01/11] hw/pci: Rename has_power to enabled

2024-03-14 Thread Akihiko Odaki

The renamed state will not only represent the power state of PFs, but
also represent SR-IOV VF enablement in the future.

Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pci.h|  7 ++-
 include/hw/pci/pci_device.h |  2 +-
 hw/pci/pci.c| 14 +++---
 hw/pci/pci_host.c   |  4 ++--
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index eaa3fc99d884..6c92b2f70008 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -642,6 +642,11 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
 }
 
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
-void pci_set_power(PCIDevice *pci_dev, bool state);
+void pci_set_enabled(PCIDevice *pci_dev, bool state);
+
+static inline void pci_set_power(PCIDevice *pci_dev, bool state)
+{
+pci_set_enabled(pci_dev, state);
+}
 
 #endif
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b273..d57f9ce83884 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -56,7 +56,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 struct PCIDevice {
 DeviceState qdev;
 bool partially_hotplugged;
-bool has_power;
+bool enabled;
 
 /* PCI config space */
 uint8_t *config;
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e7a39cb203ae..8bde13f7cd1e 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1525,7 +1525,7 @@ static void pci_update_mappings(PCIDevice *d)
 continue;
 
 new_addr = pci_bar_address(d, i, r->type, r->size);
-if (!d->has_power) {
+if (!d->enabled) {
 new_addr = PCI_BAR_UNMAPPED;
 }
 
@@ -1613,7 +1613,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
addr, uint32_t val_in, int
 pci_update_irq_disabled(d, was_irq_disabled);
 memory_region_set_enabled(&d->bus_master_enable_region,
   (pci_get_word(d->config + PCI_COMMAND)
-   & PCI_COMMAND_MASTER) && d->has_power);
+   & PCI_COMMAND_MASTER) && d->enabled);
 }
 
 msi_write_config(d, addr, val_in, l);
@@ -2811,18 +2811,18 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int 
vector)
 return msg;
 }
 
-void pci_set_power(PCIDevice *d, bool state)
+void pci_set_enabled(PCIDevice *d, bool state)
 {
-if (d->has_power == state) {
+if (d->enabled == state) {
 return;
 }
 
-d->has_power = state;
+d->enabled = state;
 pci_update_mappings(d);
 memory_region_set_enabled(&d->bus_master_enable_region,
   (pci_get_word(d->config + PCI_COMMAND)
-   & PCI_COMMAND_MASTER) && d->has_power);
-if (!d->has_power) {
+   & PCI_COMMAND_MASTER) && d->enabled);
+if (!d->enabled) {
 pci_device_reset(d);
 }
 }
diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index dfe6fe618401..0d82727cc9dd 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -86,7 +86,7 @@ void pci_host_config_write_common(PCIDevice *pci_dev, 
uint32_t addr,
  * allowing direct removal of unexposed functions.
  */
 if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
-!pci_dev->has_power || is_pci_dev_ejected(pci_dev)) {
+!pci_dev->enabled || is_pci_dev_ejected(pci_dev)) {
 return;
 }
 
@@ -111,7 +111,7 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, 
uint32_t addr,
  * allowing direct removal of unexposed functions.
  */
 if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
-!pci_dev->has_power || is_pci_dev_ejected(pci_dev)) {
+!pci_dev->enabled || is_pci_dev_ejected(pci_dev)) {
 return ~0x0;
 }
 

-- 
2.44.0

[PATCH for 9.1 v9 03/11] pcie_sriov: Ensure VF function number does not overflow

2024-03-14 Thread Akihiko Odaki

pci_new() aborts when creating a VF with a function number equal to or
greater than PCI_DEVFN_MAX.

Signed-off-by: Akihiko Odaki 
---
 docs/pcie_sriov.txt |  8 +---
 include/hw/pci/pcie_sriov.h |  5 +++--
 hw/net/igb.c| 13 ++---
 hw/nvme/ctrl.c  | 24 
 hw/pci/pcie_sriov.c | 19 +--
 5 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index a47aad0bfab0..ab2142807f79 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -52,9 +52,11 @@ setting up a BAR for a VF.
   ...
 
   /* Add and initialize the SR/IOV capability */
-  pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
-   vf_devid, initial_vfs, total_vfs,
-   fun_offset, stride);
+  if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+  vf_devid, initial_vfs, total_vfs,
+  fun_offset, stride, errp)) {
+ return;
+  }
 
   /* Set up individual VF BARs (parameters as for normal BARs) */
   pcie_sriov_pf_init_vf_bar( ... )
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index b77eb7bf58ac..3e16a269f526 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -27,10 +27,11 @@ struct PCIESriovVF {
 uint16_t vf_number; /* Logical VF number of this function */
 };
 
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 const char *vfname, uint16_t vf_dev_id,
 uint16_t init_vfs, uint16_t total_vfs,
-uint16_t vf_offset, uint16_t vf_stride);
+uint16_t vf_offset, uint16_t vf_stride,
+Error **errp);
 void pcie_sriov_pf_exit(PCIDevice *dev);
 
 /* Set up a VF bar in the SR/IOV bar area */
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 9b37523d6df8..907259fd8b3b 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -447,9 +447,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 
 pcie_ari_init(pci_dev, 0x150);
 
-pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
-IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
-IGB_VF_OFFSET, IGB_VF_STRIDE);
+if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET,
+TYPE_IGBVF, IGB_82576_VF_DEV_ID,
+IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
+IGB_VF_OFFSET, IGB_VF_STRIDE,
+errp)) {
+pcie_cap_exit(pci_dev);
+igb_cleanup_msix(s);
+msi_uninit(pci_dev);
+return;
+}
 
 pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX,
 PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index c2b17de9872c..bd31d5432654 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8048,7 +8048,8 @@ out:
 return pow2ceil(bar_size);
 }
 
-static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
+static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
+Error **errp)
 {
 uint16_t vf_dev_id = n->params.use_intel_id ?
  PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
@@ -8057,12 +8058,17 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice 
*pci_dev, uint16_t offset)
   le16_to_cpu(cap->vifrsm),
   NULL, NULL);
 
-pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
-   n->params.sriov_max_vfs, n->params.sriov_max_vfs,
-   NVME_VF_OFFSET, NVME_VF_STRIDE);
+if (!pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
+n->params.sriov_max_vfs, n->params.sriov_max_vfs,
+NVME_VF_OFFSET, NVME_VF_STRIDE,
+errp)) {
+return false;
+}
 
 pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
   PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
+
+return true;
 }
 
 static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
@@ -8155,6 +8161,12 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 return false;
 }
 
+if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs &&
+!nvme_init_sriov(n, pci_dev, 0x120, errp)) {
+msix_uninit(pci_dev, &n->bar0, &n->bar0);
+return false;
+}
+
 nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
 
 if (n->params.cmb_size_mb) {
@@ -8165,10 +8177,6 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 nvme_init_pmr(n, pci_dev);
 }
 
-if (!pci_is_vf(pci_dev) &&

[PATCH for 9.1 v9 11/11] hw/qdev: Remove opts member

2024-03-14 Thread Akihiko Odaki

It is no longer used.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Markus Armbruster 
---
 include/hw/qdev-core.h |  4 
 hw/core/qdev.c |  1 -
 system/qdev-monitor.c  | 12 +++-
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9228e96c87e9..5954404dcbfe 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -237,10 +237,6 @@ struct DeviceState {
  * @pending_deleted_expires_ms: optional timeout for deletion events
  */
 int64_t pending_deleted_expires_ms;
-/**
- * @opts: QDict of options for the device
- */
-QDict *opts;
 /**
  * @hotplugged: was device added after PHASE_MACHINE_READY?
  */
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index c68d0f7c512f..7349c9a86be8 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -706,7 +706,6 @@ static void device_finalize(Object *obj)
 dev->canonical_path = NULL;
 }
 
-qobject_unref(dev->opts);
 g_free(dev->id);
 }
 
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index c1243891c38f..6bcf5e23e6de 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -624,6 +624,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
 char *id;
 DeviceState *dev = NULL;
 BusState *bus = NULL;
+QDict *properties;
 
 driver = qdict_get_try_str(opts, "driver");
 if (!driver) {
@@ -704,13 +705,14 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
 }
 
 /* set properties */
-dev->opts = qdict_clone_shallow(opts);
-qdict_del(dev->opts, "driver");
-qdict_del(dev->opts, "bus");
-qdict_del(dev->opts, "id");
+properties = qdict_clone_shallow(opts);
+qdict_del(properties, "driver");
+qdict_del(properties, "bus");
+qdict_del(properties, "id");
 
-object_set_properties_from_keyval(&dev->parent_obj, dev->opts, from_json,
+object_set_properties_from_keyval(&dev->parent_obj, properties, from_json,
   errp);
+qobject_unref(properties);
 if (*errp) {
 goto err_del_dev;
 }

-- 
2.44.0

[PATCH for 9.1 v9 06/11] pcie_sriov: Remove num_vfs from PCIESriovPF

2024-03-14 Thread Akihiko Odaki

num_vfs is not migrated so use PCI_SRIOV_CTRL_VFE and PCI_SRIOV_NUM_VF
instead.

Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pcie_sriov.h |  1 -
 hw/pci/pcie_sriov.c | 28 
 hw/pci/trace-events |  2 +-
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 4b1133f79e15..793d03c5f12e 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -16,7 +16,6 @@
 #include "hw/pci/pci.h"
 
 struct PCIESriovPF {
-uint16_t num_vfs;   /* Number of virtual functions created */
 uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
 PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */
 };
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 9bd7f8acc3f4..fae6acea4acb 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -57,7 +57,6 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
 offset, PCI_EXT_CAP_SRIOV_SIZEOF);
 dev->exp.sriov_cap = offset;
-dev->exp.sriov_pf.num_vfs = 0;
 dev->exp.sriov_pf.vf = NULL;
 
 pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
@@ -186,6 +185,12 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int 
region_num,
 }
 }
 
+static void clear_ctrl_vfe(PCIDevice *dev)
+{
+uint8_t *ctrl = dev->config + dev->exp.sriov_cap + PCI_SRIOV_CTRL;
+pci_set_word(ctrl, pci_get_word(ctrl) & ~PCI_SRIOV_CTRL_VFE);
+}
+
 static void register_vfs(PCIDevice *dev)
 {
 uint16_t num_vfs;
@@ -195,6 +200,7 @@ static void register_vfs(PCIDevice *dev)
 assert(sriov_cap > 0);
 num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
 if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
+clear_ctrl_vfe(dev);
 return;
 }
 
@@ -203,20 +209,18 @@ static void register_vfs(PCIDevice *dev)
 for (i = 0; i < num_vfs; i++) {
 pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
 }
-dev->exp.sriov_pf.num_vfs = num_vfs;
 }
 
 static void unregister_vfs(PCIDevice *dev)
 {
-uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
 uint16_t i;
+uint8_t *cfg = dev->config + dev->exp.sriov_cap;
 
 trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
-   PCI_FUNC(dev->devfn), num_vfs);
-for (i = 0; i < num_vfs; i++) {
+   PCI_FUNC(dev->devfn));
+for (i = 0; i < pci_get_word(cfg + PCI_SRIOV_TOTAL_VF); i++) {
 pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
 }
-dev->exp.sriov_pf.num_vfs = 0;
 }
 
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
@@ -242,6 +246,9 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t 
address,
 } else {
 unregister_vfs(dev);
 }
+} else if (range_covers_byte(off, len, PCI_SRIOV_NUM_VF)) {
+clear_ctrl_vfe(dev);
+unregister_vfs(dev);
 }
 }
 
@@ -304,7 +311,7 @@ PCIDevice *pcie_sriov_get_pf(PCIDevice *dev)
 PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
 {
 assert(!pci_is_vf(dev));
-if (n < dev->exp.sriov_pf.num_vfs) {
+if (n < pcie_sriov_num_vfs(dev)) {
 return dev->exp.sriov_pf.vf[n];
 }
 return NULL;
@@ -312,5 +319,10 @@ PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int 
n)
 
 uint16_t pcie_sriov_num_vfs(PCIDevice *dev)
 {
-return dev->exp.sriov_pf.num_vfs;
+uint16_t sriov_cap = dev->exp.sriov_cap;
+uint8_t *cfg = dev->config + sriov_cap;
+
+return sriov_cap &&
+   (pci_get_word(cfg + PCI_SRIOV_CTRL) & PCI_SRIOV_CTRL_VFE) ?
+   pci_get_word(cfg + PCI_SRIOV_NUM_VF) : 0;
 }
diff --git a/hw/pci/trace-events b/hw/pci/trace-events
index 19643aa8c6b0..e98f575a9d19 100644
--- a/hw/pci/trace-events
+++ b/hw/pci/trace-events
@@ -14,7 +14,7 @@ msix_write_config(char *name, bool enabled, bool masked) "dev 
%s enabled %d mask
 
 # hw/pci/pcie_sriov.c
 sriov_register_vfs(const char *name, int slot, int function, int num_vfs) "%s 
%02x:%x: creating %d vf devs"
-sriov_unregister_vfs(const char *name, int slot, int function, int num_vfs) 
"%s %02x:%x: Unregistering %d vf devs"
+sriov_unregister_vfs(const char *name, int slot, int function) "%s %02x:%x: 
Unregistering vf devs"
 sriov_config_write(const char *name, int slot, int fun, uint32_t offset, 
uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x val 0x%x len %d"
 
 # pcie.c

-- 
2.44.0

[PATCH for 9.1 v9 07/11] pcie_sriov: Register VFs after migration

2024-03-14 Thread Akihiko Odaki

pcie_sriov doesn't have code to restore its state after migration, but
igb, which uses pcie_sriov, naively claimed its migration capability.

Add code to register VFs after migration and fix igb migration.

Fixes: 3a977deebe6b ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pcie_sriov.h | 2 ++
 hw/pci/pci.c| 7 +++
 hw/pci/pcie_sriov.c | 7 +++
 3 files changed, 16 insertions(+)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 793d03c5f12e..d576a8c6be19 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -57,6 +57,8 @@ void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t 
opt_sup_pgsize);
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
  uint32_t val, int len);
 
+void pcie_sriov_pf_post_load(PCIDevice *dev);
+
 /* Reset SR/IOV */
 void pcie_sriov_pf_reset(PCIDevice *dev);
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 750c2ba696d1..54b375da2d26 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -733,10 +733,17 @@ static bool migrate_is_not_pcie(void *opaque, int 
version_id)
 return !pci_is_express((PCIDevice *)opaque);
 }
 
+static int pci_post_load(void *opaque, int version_id)
+{
+pcie_sriov_pf_post_load(opaque);
+return 0;
+}
+
 const VMStateDescription vmstate_pci_device = {
 .name = "PCIDevice",
 .version_id = 2,
 .minimum_version_id = 1,
+.post_load = pci_post_load,
 .fields = (const VMStateField[]) {
 VMSTATE_INT32_POSITIVE_LE(version_id, PCIDevice),
 VMSTATE_BUFFER_UNSAFE_INFO_TEST(config, PCIDevice,
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index fae6acea4acb..56523ab4e833 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -252,6 +252,13 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t 
address,
 }
 }
 
+void pcie_sriov_pf_post_load(PCIDevice *dev)
+{
+if (dev->exp.sriov_cap) {
+register_vfs(dev);
+}
+}
+
 
 /* Reset SR/IOV */
 void pcie_sriov_pf_reset(PCIDevice *dev)

-- 
2.44.0

[PATCH for 9.1 v9 04/11] pcie_sriov: Reuse SR-IOV VF device instances

2024-03-14 Thread Akihiko Odaki

Disable SR-IOV VF devices by reusing code to power down PCI devices
instead of removing them when the guest requests to disable VFs. This
allows realizing VF devices and reporting VF realization errors at PF
realization time.

Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pci.h|  5 ---
 include/hw/pci/pci_device.h | 15 +++
 include/hw/pci/pcie_sriov.h |  1 -
 hw/pci/pci.c|  2 +-
 hw/pci/pcie_sriov.c | 95 +++--
 5 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6c92b2f70008..442017b4865d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -644,9 +644,4 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
 void pci_set_enabled(PCIDevice *pci_dev, bool state);
 
-static inline void pci_set_power(PCIDevice *pci_dev, bool state)
-{
-pci_set_enabled(pci_dev, state);
-}
-
 #endif
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d57f9ce83884..ca151325085d 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -205,6 +205,21 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
 return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
 }
 
+static inline void pci_set_power(PCIDevice *pci_dev, bool state)
+{
+/*
+ * Don't change the enabled state of VFs when powering on/off the device.
+ *
+ * When powering on, VFs must not be enabled immediately but they must
+ * wait until the guest configures SR-IOV.
+ * When powering off, their corresponding PFs will be reset and disable
+ * VFs.
+ */
+if (!pci_is_vf(pci_dev)) {
+pci_set_enabled(pci_dev, state);
+}
+}
+
 uint16_t pci_requester_id(PCIDevice *dev);
 
 /* DMA access functions */
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 3e16a269f526..4b1133f79e15 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -18,7 +18,6 @@
 struct PCIESriovPF {
 uint16_t num_vfs;   /* Number of virtual functions created */
 uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
-const char *vfname; /* Reference to the device type used for the VFs */
 PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */
 };
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8bde13f7cd1e..750c2ba696d1 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2822,7 +2822,7 @@ void pci_set_enabled(PCIDevice *d, bool state)
 memory_region_set_enabled(&d->bus_master_enable_region,
   (pci_get_word(d->config + PCI_COMMAND)
& PCI_COMMAND_MASTER) && d->enabled);
-if (!d->enabled) {
+if (d->qdev.realized) {
 pci_device_reset(d);
 }
 }
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index f0bde0d3fc79..faadb0d2ea85 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -20,9 +20,16 @@
 #include "qapi/error.h"
 #include "trace.h"
 
-static PCIDevice *register_vf(PCIDevice *pf, int devfn,
-  const char *name, uint16_t vf_num);
-static void unregister_vfs(PCIDevice *dev);
+static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs)
+{
+for (uint16_t i = 0; i < total_vfs; i++) {
+PCIDevice *vf = dev->exp.sriov_pf.vf[i];
+object_unparent(OBJECT(vf));
+object_unref(OBJECT(vf));
+}
+g_free(dev->exp.sriov_pf.vf);
+dev->exp.sriov_pf.vf = NULL;
+}
 
 bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 const char *vfname, uint16_t vf_dev_id,
@@ -30,6 +37,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 uint16_t vf_offset, uint16_t vf_stride,
 Error **errp)
 {
+BusState *bus = qdev_get_parent_bus(&dev->qdev);
+int32_t devfn = dev->devfn + vf_offset;
 uint8_t *cfg = dev->config + offset;
 uint8_t *wmask;
 
@@ -49,7 +58,6 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 offset, PCI_EXT_CAP_SRIOV_SIZEOF);
 dev->exp.sriov_cap = offset;
 dev->exp.sriov_pf.num_vfs = 0;
-dev->exp.sriov_pf.vfname = g_strdup(vfname);
 dev->exp.sriov_pf.vf = NULL;
 
 pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
@@ -83,14 +91,34 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 
 qdev_prop_set_bit(&dev->qdev, "multifunction", true);
 
+dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs);
+
+for (uint16_t i = 0; i < total_vfs; i++) {
+PCIDevice *vf = pci_new(devfn, vfname);
+vf->exp.sriov_vf.pf = dev;
+vf->exp.sriov_vf.vf_number = i;
+
+if (!qdev_realize(&vf->qdev, bus, errp)) {
+unparent_vfs(dev, i);
+return false;
+}
+
+/* set vid/did according to sr/iov spec - they are not used */
+

[PATCH for 9.1 v9 05/11] pcie_sriov: Release VFs failed to realize

2024-03-14 Thread Akihiko Odaki

Release VFs that failed to realize, just as we do in unregister_vfs().

Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index faadb0d2ea85..9bd7f8acc3f4 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -99,6 +99,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 vf->exp.sriov_vf.vf_number = i;
 
 if (!qdev_realize(&vf->qdev, bus, errp)) {
+object_unparent(OBJECT(vf));
+object_unref(vf);
 unparent_vfs(dev, i);
 return false;
 }

-- 
2.44.0
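
The cleanup rule this patch adds (release the VF that failed to realize, then unwind the ones that were already realized) can be illustrated with a small standalone sketch. This is not QEMU code: `toy_vf`, `toy_pf_init()`, `toy_vf_release()` and the `fail_at` knob are hypothetical names made up for the illustration, and plain `malloc()`/`free()` stand in for QOM object lifetime handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a VF device object. */
struct toy_vf {
    int number;
};

/* Number of VFs that were allocated and not yet released. */
int toy_live_vfs;

struct toy_vf *toy_vf_new(int number)
{
    struct toy_vf *vf = malloc(sizeof(*vf));
    assert(vf);
    vf->number = number;
    toy_live_vfs++;
    return vf;
}

void toy_vf_release(struct toy_vf *vf)
{
    free(vf);
    toy_live_vfs--;
}

/* Pretend that realizing the VF numbered fail_at fails. */
bool toy_vf_realize(const struct toy_vf *vf, int fail_at)
{
    return vf->number != fail_at;
}

/* Returns the VF array on success, NULL (with nothing leaked) on failure. */
struct toy_vf **toy_pf_init(int total_vfs, int fail_at)
{
    struct toy_vf **vfs;

    assert(total_vfs > 0);
    vfs = calloc((size_t)total_vfs, sizeof(*vfs));
    assert(vfs);

    for (int i = 0; i < total_vfs; i++) {
        struct toy_vf *vf = toy_vf_new(i);

        if (!toy_vf_realize(vf, fail_at)) {
            /* Release the VF that failed; it is not in vfs[] yet. */
            toy_vf_release(vf);
            /* Then unwind the VFs that were realized before it. */
            for (int j = 0; j < i; j++) {
                toy_vf_release(vfs[j]);
            }
            free(vfs);
            return NULL;
        }
        vfs[i] = vf;
    }
    return vfs;
}
```

Without the first `toy_vf_release(vf)` call in the error path, the failed VF would stay allocated, which is the kind of leak the two added lines address.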

[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and improvements

2024-03-14 Thread Akihiko Odaki

I submitted a RFC series[1] to add support for SR-IOV emulation to
virtio-net-pci. During the development of the series, I fixed some
trivial bugs and made improvements that I think are independently
useful. This series extracts those fixes and improvements from the RFC
series.

[1]: https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6d...@daynix.com/

Signed-off-by: Akihiko Odaki 
---
Changes in v9:
- Rebased.
- Restored '#include "qapi/error.h"' (Michael S. Tsirkin)
- Added patch "pcie_sriov: Ensure VF function number does not overflow"
  to fix abortion with wrong PF addr.
- Link to v8: 
https://lore.kernel.org/r/20240228-reuse-v8-0-282660281...@daynix.com

Changes in v8:
- Clarified that "hw/pci: Replace -1 with UINT32_MAX for romsize" is
  not a bug fix. (Markus Armbruster)
- Squashed patch "vfio: Avoid inspecting option QDict for rombar" into
  "hw/pci: Determine if rombar is explicitly enabled".
  (Markus Armbruster)
- Noted the minor semantics change for patch "hw/pci: Determine if
  rombar is explicitly enabled". (Markus Armbruster)
- Link to v7: 
https://lore.kernel.org/r/20240224-reuse-v7-0-29c14bcb9...@daynix.com

Changes in v7:
- Replaced -1 with UINT32_MAX when expressing uint32_t.
  (Markus Armbruster)
- Added patch "hw/pci: Replace -1 with UINT32_MAX for romsize".
- Link to v6: 
https://lore.kernel.org/r/20240220-reuse-v6-0-2e42a28b0...@daynix.com

Changes in v6:
- Fixed migration.
- Added patch "pcie_sriov: Do not manually unrealize".
- Restored patch "pcie_sriov: Release VFs failed to realize" that was
  missed in v5.
- Link to v5: 
https://lore.kernel.org/r/20240218-reuse-v5-0-e4fc1c19b...@daynix.com

Changes in v5:
- Added patch "hw/pci: Always call pcie_sriov_pf_reset()".
- Added patch "pcie_sriov: Reset SR-IOV extended capability".
- Removed a reference to PCI_SRIOV_CTRL_VFE in hw/nvme.
  (Michael S. Tsirkin)
- Noted the impact on the guest of patch "pcie_sriov: Do not reset
  NumVFs after unregistering VFs". (Michael S. Tsirkin)
- Changed to use pcie_sriov_num_vfs().
- Restored pci_set_power() and changed it to call pci_set_enabled() only
  for PFs with an explanation. (Michael S. Tsirkin)
- Reordered patches.
- Link to v4: 
https://lore.kernel.org/r/20240214-reuse-v4-0-89ad093a0...@daynix.com

Changes in v4:
- Reverted the change to pci_rom_bar_explicitly_enabled().
  (Michael S. Tsirkin)
- Added patch "pcie_sriov: Do not reset NumVFs after unregistering VFs".
- Added patch "hw/nvme: Refer to dev->exp.sriov_pf.num_vfs".
- Link to v3: 
https://lore.kernel.org/r/20240212-reuse-v3-0-8017b689c...@daynix.com

Changes in v3:
- Extracted patch "hw/pci: Use -1 as a default value for rombar" from
  patch "hw/pci: Determine if rombar is explicitly enabled"
  (Philippe Mathieu-Daudé)
- Added an audit result of PCIDevice::rom_bar to the message of patch
  "hw/pci: Use -1 as a default value for rombar"
  (Philippe Mathieu-Daudé)
- Link to v2: 
https://lore.kernel.org/r/20240210-reuse-v2-0-24ba2a502...@daynix.com

Changes in v2:
- Reset after enabling a function so that NVMe VF state gets updated.
- Link to v1: 
https://lore.kernel.org/r/20240203-reuse-v1-0-5be8c5ce6...@daynix.com

---
Akihiko Odaki (11):
  hw/pci: Rename has_power to enabled
  pcie_sriov: Do not manually unrealize
  pcie_sriov: Ensure VF function number does not overflow
  pcie_sriov: Reuse SR-IOV VF device instances
  pcie_sriov: Release VFs failed to realize
  pcie_sriov: Remove num_vfs from PCIESriovPF
  pcie_sriov: Register VFs after migration
  hw/pci: Replace -1 with UINT32_MAX for romsize
  hw/pci: Use UINT32_MAX as a default value for rombar
  hw/pci: Determine if rombar is explicitly enabled
  hw/qdev: Remove opts member

 docs/pcie_sriov.txt |   8 ++-
 include/hw/pci/pci.h|   2 +-
 include/hw/pci/pci_device.h |  22 ++-
 include/hw/pci/pcie_sriov.h |   9 +--
 include/hw/qdev-core.h  |   4 --
 hw/core/qdev.c  |   1 -
 hw/net/igb.c|  13 +++-
 hw/nvme/ctrl.c  |  24 ---
 hw/pci/pci.c|  31 +
 hw/pci/pci_host.c   |   4 +-
 hw/pci/pcie_sriov.c | 149 
 hw/vfio/pci.c   |   3 +-
 hw/xen/xen_pt_load_rom.c|   2 +-
 system/qdev-monitor.c   |  12 ++--
 hw/pci/trace-events |   2 +-
 15 files changed, 172 insertions(+), 114 deletions(-)
---
base-commit: ba49d760eb04630e7b15f423ebecf6c871b8f77b
change-id: 20240129-reuse-faae22b11934

Best regards,
-- 
Akihiko Odaki

Re: [PATCH v4 1/2] vhost: dirty log should be per backend type

2024-03-14 Thread Jason Wang

On Fri, Mar 15, 2024 at 5:39 AM Si-Wei Liu  wrote:
>
> There could be a mix of both vhost-user and vhost-kernel clients
> in the same QEMU process, where separate vhost loggers for the
> specific vhost type have to be used. Make the vhost logger per
> backend type, and have them properly reference counted.

It would be better to describe the advantage of doing this.

>
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Si-Wei Liu 
>
> ---
> v3->v4:
>   - remove checking NULL return value from vhost_log_get
>
> v2->v3:
>   - remove ineffective assertion that can never be reached
>   - do not return NULL from vhost_log_get()
>   - add necessary assertions to vhost_log_get()
> ---
>  hw/virtio/vhost.c | 45 +
>  1 file changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 2c9ac79..612f4db 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -43,8 +43,8 @@
>  do { } while (0)
>  #endif
>
> -static struct vhost_log *vhost_log;
> -static struct vhost_log *vhost_log_shm;
> +static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> +static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
>
>  /* Memslots used by backends that support private memslots (without an fd). 
> */
>  static unsigned int used_memslots;
> @@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev *dev,
>  r = -1;
>  }
>
> +if (r == 0) {
> +assert(dev->vhost_ops->backend_type == backend_type);
> +}
> +

Under which condition could we hit this? It doesn't seem good to assert
on local logic.

Thanks

[RFC PATCH v3 3/3] migration: Add fd to FileMigrationArgs

2024-03-14 Thread Fabiano Rosas

The fd: URI has supported migration to a file or socket since before
QEMU 8.2. In 8.2 we added the file: URI that supported migration to a
file. So now we have two ways (three if you count exec:>cat) to
migrate to a file. Fine.

However,

In 8.2 we also added the new qmp_migrate API that uses a JSON channel
list instead of the URI. It added two migration transports SOCKET and
FILE. It was decided that the new API would classify the fd migration
as a type of socket migration, neglecting the fact that the fd.c code
also supported file migrations.

In 9.0 we're adding support for fd + multifd + mapped-ram, which is
tied to the file migration. This was implemented in fd.c, which is
only reachable when the SOCKET address type is used.

The result of this is that we're asking users of the new API to create   (1)
something called a "socket" to perform migration to a plain file. And
creating something called a "file" provides no way of passing in a
file descriptor. This is confusing.

The new API also parses the old-style URI into the new-style data
structures, so the old and correct fd: URI is now also treated as
socket-related only.

Unlike the other migration addresses, the fd comes already set up from
the user; it's not just a string that QEMU will use to start a
connection. We need to actually fetch the fd from the monitor before
even being able to poke at it to know if it is a socket.

Aside from the issue (1) above, the current approach of parsing the fd
into a SOCKET and only later deciding if it is file-backed doesn't
work well now that we're adding multifd support for fd: and file:
migration via the mapped-ram feature. With a larger number of
combinations, we need to be able to know upfront when an fd is backed
by a plain file vs. a socket.
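
As an illustration of what "knowing upfront" can look like, here is a minimal POSIX sketch that classifies an already-open descriptor with fstat(). It is only one common way to do it and is not necessarily how QEMU's fd_is_socket() is implemented; `toy_classify_fd()` and the `toy_fd_kind` enum are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical classification result, for illustration only. */
enum toy_fd_kind {
    TOY_FD_INVALID,
    TOY_FD_SOCKET,
    TOY_FD_FILE,
    TOY_FD_OTHER,
};

/* Inspect an open descriptor and report what it is backed by. */
enum toy_fd_kind toy_classify_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0) {
        return TOY_FD_INVALID;      /* e.g. EBADF for a closed fd */
    }
    if (S_ISSOCK(st.st_mode)) {
        return TOY_FD_SOCKET;
    }
    if (S_ISREG(st.st_mode)) {
        return TOY_FD_FILE;
    }
    return TOY_FD_OTHER;            /* pipe, tty, device node, ... */
}
```

The point of the sketch is that the answer is only available once the descriptor itself is in hand, which is why the classification has to happen after the fd has been fetched from the monitor.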

We're currently using a trick of allowing socket channels to pass some   (2)
validation that they shouldn't, to only later verify if the fd was
indeed a socket.

To clean this up I'm proposing we start requiring users of the new API
to use the "file" structure when the migration stream will end up in a
file and use the "socket" structure when the fd points to an actual
socket. This improves (1).

To keep backward compatibility, I'm still accepting everything that
was accepted before and only changing the structures internally to
allow the rest of the code to rely on the MIGRATION_ADDRESS_TYPE. This
addresses (2).

We can then slowly deprecate the wrong way, i.e. using type socket for
fd + file migration.

In this patch you'll find:

- a new function migrate_resolve_fd() that fetches the fd from the
  monitor and converts the MigrationAddress into the correct type;

- the removal of the hacks for (2);

- a helper function to convert the resolved fd that's represented as a
  string for storage in the MigrationAddress back into an integer for
  consumption.
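
The last item, storing a resolved fd as a string and turning it back into an integer, could be sketched roughly as follows. The helper names `toy_fd_to_str()` and `toy_fd_str_to_int()` are hypothetical and the error convention (returning -1) is an assumption; the helper in the patch itself reports failures through an Error **errp argument instead.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a descriptor number into buf; returns 0 on success, -1 if it does not fit. */
int toy_fd_to_str(int fd, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%d", fd);

    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Parse a string produced above back into a descriptor number, or -1 on error. */
int toy_fd_str_to_int(const char *str)
{
    char *end;
    long val;

    if (str == NULL || *str == '\0') {
        return -1;
    }
    errno = 0;
    val = strtol(str, &end, 10);
    /* Reject overflow, trailing garbage and negative numbers. */
    if (errno != 0 || *end != '\0' || val < 0 || val > INT_MAX) {
        return -1;
    }
    return (int)val;
}
```

Strict parsing matters here because the same string field may also hold a monitor fd name supplied by the user, so anything that is not a clean non-negative integer should be rejected rather than silently truncated.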

Signed-off-by: Fabiano Rosas 
---
 migration/fd.c|  15 +--
 migration/file.c  |  14 +++---
 migration/migration.c | 100 --
 migration/migration.h |   1 +
 qapi/migration.json   |  11 -
 5 files changed, 107 insertions(+), 34 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index fe0d096abd..2d4414c7ea 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -20,9 +20,6 @@
 #include "fd.h"
 #include "file.h"
 #include "migration.h"
-#include "monitor/monitor.h"
-#include "io/channel-file.h"
-#include "io/channel-socket.h"
 #include "io/channel-util.h"
 #include "options.h"
 #include "trace.h"
@@ -48,8 +45,7 @@ void fd_cleanup_outgoing_migration(void)
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error 
**errp)
 {
 QIOChannel *ioc;
-int fd = monitor_get_fd(monitor_cur(), fdname, errp);
-int newfd;
+int newfd, fd = migrate_fd_str_to_int(fdname, errp);
 
 if (fd == -1) {
 return;
@@ -91,7 +87,7 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
 void fd_start_incoming_migration(const char *fdname, Error **errp)
 {
 QIOChannel *ioc;
-int fd = monitor_fd_param(monitor_cur(), fdname, errp);
+int fd = migrate_fd_str_to_int(fdname, errp);
 if (fd == -1) {
 return;
 }
@@ -105,13 +101,6 @@ void fd_start_incoming_migration(const char *fdname, Error 
**errp)
 }
 
 if (migrate_multifd()) {
-if (fd_is_socket(fd)) {
-error_setg(errp,
-   "Multifd migration to a socket FD is not supported");
-object_unref(ioc);
-return;
-}
-
 file_create_incoming_channels(ioc, errp);
 } else {
 qio_channel_set_name(ioc, "migration-fd-incoming");
diff --git a/migration/file.c b/migration/file.c
index b6e8ba13f2..b29521ffa7 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -59,12 +59,6 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
 int fd = fd_args_get_fd();
 
 if (fd && fd != -1) {
-if (fd_is_socket(fd)) {
-error_setg(errp,
-

[PATCH v3 2/3] migration/multifd: Duplicate the fd for the outgoing_args

2024-03-14 Thread Fabiano Rosas

We currently store the file descriptor used during the main outgoing
channel creation to use it again when creating the multifd
channels.

Since this fd is used for the first iochannel, there's a risk that the
QIOChannel gets freed and the fd closed while outgoing_args.fd still
refers to it. This could lead to an fd-reuse bug.

Duplicate the outgoing_args fd to avoid this issue.

Suggested-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/fd.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index c07030f715..fe0d096abd 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -49,8 +49,7 @@ void fd_start_outgoing_migration(MigrationState *s, const 
char *fdname, Error **
 {
 QIOChannel *ioc;
 int fd = monitor_get_fd(monitor_cur(), fdname, errp);
-
-outgoing_args.fd = -1;
+int newfd;
 
 if (fd == -1) {
 return;
@@ -63,7 +62,17 @@ void fd_start_outgoing_migration(MigrationState *s, const 
char *fdname, Error **
 return;
 }
 
-outgoing_args.fd = fd;
+/*
+ * This is dup()ed just to avoid referencing an fd that might
+ * be already closed by the iochannel.
+ */
+newfd = dup(fd);
+if (newfd == -1) {
+error_setg_errno(errp, errno, "Could not dup FD %d", fd);
+object_unref(ioc);
+return;
+}
+outgoing_args.fd = newfd;
 
 qio_channel_set_name(ioc, "migration-fd-outgoing");
 migration_channel_connect(s, ioc, NULL, NULL);
-- 
2.35.3
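
The effect of the dup() in this patch can be seen in isolation with a small POSIX sketch: once a descriptor has been duplicated, closing the original (as the owner of the first channel eventually does) leaves the duplicate usable. `toy_keep_fd()` and `toy_fd_is_open()` are hypothetical helper names used only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Keep a private reference to fd; returns -1 with errno set on failure. */
int toy_keep_fd(int fd)
{
    return dup(fd);
}

/* Returns nonzero if fd currently refers to an open file description. */
int toy_fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```

A caller that stores only the original number would, after the close, hold a value the kernel is free to hand out again for an unrelated file, which is the fd-reuse hazard the commit message describes.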

[PATCH v3 1/3] migration/multifd: Ensure we're not given a socket for file migration

2024-03-14 Thread Fabiano Rosas

When doing migration using the fd: URI, QEMU will fetch the file
descriptor passed in via the monitor at
fd_start_outgoing|incoming_migration(), which means the checks at
migration_channels_and_transport_compatible() happen too soon and we
don't know at that point whether the FD refers to a plain file or a
socket.

For this reason, we've been allowing a migration channel of type
SOCKET_ADDRESS_TYPE_FD to pass the initial verifications in scenarios
where the socket migration is not supported, such as with fd + multifd.

The commit decdc76772 ("migration/multifd: Add mapped-ram support to
fd: URI") was supposed to add a second check prior to starting
migration to make sure a socket fd is not passed instead of a file fd,
but failed to do so.

Add the missing verification and update the comment explaining this
situation which is currently incorrect.

Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Signed-off-by: Fabiano Rosas 
---
 migration/fd.c| 8 
 migration/file.c  | 7 +++
 migration/migration.c | 6 +++---
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index 39a52e5c90..c07030f715 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -22,6 +22,7 @@
 #include "migration.h"
 #include "monitor/monitor.h"
 #include "io/channel-file.h"
+#include "io/channel-socket.h"
 #include "io/channel-util.h"
 #include "options.h"
 #include "trace.h"
@@ -95,6 +96,13 @@ void fd_start_incoming_migration(const char *fdname, Error 
**errp)
 }
 
 if (migrate_multifd()) {
+if (fd_is_socket(fd)) {
+error_setg(errp,
+   "Multifd migration to a socket FD is not supported");
+object_unref(ioc);
+return;
+}
+
 file_create_incoming_channels(ioc, errp);
 } else {
 qio_channel_set_name(ioc, "migration-fd-incoming");
diff --git a/migration/file.c b/migration/file.c
index ddde0ca818..b6e8ba13f2 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -15,6 +15,7 @@
 #include "file.h"
 #include "migration.h"
 #include "io/channel-file.h"
+#include "io/channel-socket.h"
 #include "io/channel-util.h"
 #include "options.h"
 #include "trace.h"
@@ -58,6 +59,12 @@ bool file_send_channel_create(gpointer opaque, Error **errp)
 int fd = fd_args_get_fd();
 
 if (fd && fd != -1) {
+if (fd_is_socket(fd)) {
+error_setg(errp,
+   "Multifd migration to a socket FD is not supported");
+goto out;
+}
+
 ioc = qio_channel_file_new_dupfd(fd, errp);
 } else {
 ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
diff --git a/migration/migration.c b/migration/migration.c
index 644e073b7d..f60bd371e3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -166,9 +166,9 @@ static bool transport_supports_seeking(MigrationAddress 
*addr)
 }
 
 /*
- * At this point, the user might not yet have passed the file
- * descriptor to QEMU, so we cannot know for sure whether it
- * refers to a plain file or a socket. Let it through anyway.
+ * At this point QEMU has not yet fetched the fd passed in by the
+ * user, so we cannot know for sure whether it refers to a plain
+ * file or a socket. Let it through anyway and check at fd.c.
  */
 if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
 return addr->u.socket.type == SOCKET_ADDRESS_TYPE_FD;
-- 
2.35.3

[PATCH v3 0/3] migration mapped-ram fixes

2024-03-14 Thread Fabiano Rosas

Hi,

In this v3:

patch 1 - The fd_is_socket() verification and an update to the comment
  in the code;

patch 2 - The fix for the fd-reuse bug in outgoing_args;

patch 3 - A proposal on how to fix the fd-socket vs. fd-file
  issue. I'm basically moving the fd_is_socket() call earlier
  to be able to do the checks properly.

based-on: https://gitlab.com/peterx/qemu/-/commits/migration-stable

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1214405210

Fabiano Rosas (3):
  migration/multifd: Ensure we're not given a socket for file migration
  migration/multifd: Duplicate the fd for the outgoing_args
  migration: Add fd to FileMigrationArgs

 migration/fd.c|  20 ++---
 migration/file.c  |   9 
 migration/migration.c | 100 --
 migration/migration.h |   1 +
 qapi/migration.json   |  11 -
 5 files changed, 119 insertions(+), 22 deletions(-)

-- 
2.35.3

Re: [PATCH] input-linux: Add option to not grab a device upon guest startup

2024-03-14 Thread Justinien Bouron

Just a ping to make sure this patch hasn't been lost in the noise.
Any chance to get this merged? Should I send a v2 with a revised commit message?

Regards,
Justinien

Re: [RFC PATCH 5/5] cxl/core: add poison injection event handler

2024-03-14 Thread Shiyang Ruan via





On 2024/2/14 0:51, Jonathan Cameron wrote:



+
+void cxl_event_handle_record(struct cxl_memdev *cxlmd,
+enum cxl_event_log_type type,
+enum cxl_event_type event_type,
+const uuid_t *uuid, union cxl_event *evt)
+{
+   if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
trace_cxl_general_media(cxlmd, type, &evt->gen_media);
-   else if (event_type == CXL_CPER_EVENT_DRAM)
+   /* handle poison event */
+   if (type == CXL_EVENT_TYPE_FAIL)
+   cxl_event_handle_poison(cxlmd, &evt->gen_media);


I'm not 100% convinced this is necessarily poison-causing.  Also
the text tells us we should see 'an appropriate event'.
The DRAM one seems likely to be chosen by some vendors.


I think it's right to use the DRAM Event Record for a volatile memdev, but 
should poison on a persistent memdev also use the DRAM Event Record? 
Its 'Physical Address' field has the 'Volatile' bit too, the same as the 
General Media Event Record, so I am a bit confused about this.




The fatal check maybe makes it a little more likely (maybe though
I'm not sure anything says a device must log it to the failure log)
but it might be Memory Event Type 1, which is the host tried to
access an invalid address.  Sure poison might be returned to that
error but what would the main kernel memory handling do with it?
Something is very wrong
but it's not corrupted device memory.  TE state violations are in there
as well. Sure poison is returned on reads (I think - haven't checked).

IF the aim here is to say 'maybe there is poison, better check the
poison list'. Then that is reasonable but we should ensure things
like timer expiry are definitely ruled out and rename the function
to make it clear it might not find poison.


I forgot to distinguish the 'Transaction Type' here. Host Inject Poison 
is 04h, and the other types should each have their own handling as well.



--
Thanks,
Ruan.



Jonathan

Re: [PATCH v9 0/7] QEMU CXL Provide mock CXL events and irq support

2024-03-14 Thread Yuquan Wang

Hello, Jonathan

While testing the QMP commands for CXL events, such as
"cxl-inject-general-media-event",
I got confused about the argument "flags". According to "qapi/cxl.json" in QEMU,
this argument represents "Event Record Flags" in the Common Event Record Format.
However, it seems that the specific 'Event Record Severity' in this field can be
different from the value of 'Event Status' in the "Event Status Register".

For instance (taking an injection example from the cover letter):

{ "execute": "cxl-inject-general-media-event",
"arguments": {
"path": "/machine/peripheral/cxl-mem0",
"log": "informational",
"flags": 1,
"dpa": 1000,
"descriptor": 3,
"type": 3,
"transaction-type": 192,
"channel": 3,
"device": 5,
"component-id": "iras mem"
}}

In my understanding, the 'Event Status' here is informational while the
'Event Record Severity' is a Warning event, which means these two arguments are
independent of each other. Is my understanding correct?

Many thanks
Yuquan

Re: [PATCH V8 6/8] physmem: Add helper function to destroy CPU AddressSpace

2024-03-14 Thread zhukeqian via

Hi Salil,

[...]

+void cpu_address_space_destroy(CPUState *cpu, int asidx) {
+CPUAddressSpace *cpuas;
+
+assert(cpu->cpu_ases);
+assert(asidx >= 0 && asidx < cpu->num_ases);
+/* KVM cannot currently support multiple address spaces. */
+assert(asidx == 0 || !kvm_enabled());
+
+cpuas = &cpu->cpu_ases[asidx];
+if (tcg_enabled()) {
+memory_listener_unregister(&cpuas->tcg_as_listener);
+}
+
+address_space_destroy(cpuas->as);
+g_free_rcu(cpuas->as, rcu);

address_space_destroy() calls call_rcu1() on cpuas->as, which sets
do_address_space_destroy() as the RCU callback.
g_free_rcu() then also calls call_rcu1() on cpuas->as, which overwrites the
RCU callback with g_free().

So I think g_free() may be called twice in the RCU thread; please verify that.

The source code of call_rcu1:

void call_rcu1(struct rcu_head *node, void (*func)(struct rcu_head *node))
{
node->func = func;
enqueue(node);
qatomic_inc(&rcu_call_count);
qemu_event_set(&rcu_call_ready_event);
}
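
To make the concern concrete, here is a deliberately simplified toy model of the same pattern. It is not QEMU's RCU implementation: the queue is a plain array rather than QEMU's linked list, there is no grace period, and all `toy_*` names are hypothetical. It only shows what happens when one node is queued twice with two different callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct rcu_head: it stores a single callback. */
struct toy_rcu_head {
    void (*func)(struct toy_rcu_head *node);
};

#define TOY_QUEUE_MAX 8
struct toy_rcu_head *toy_queue[TOY_QUEUE_MAX];
size_t toy_queue_len;

int destroy_calls;   /* models do_address_space_destroy() running */
int free_calls;      /* models g_free() running */

void toy_destroy_cb(struct toy_rcu_head *node)
{
    (void)node;
    destroy_calls++;
}

void toy_free_cb(struct toy_rcu_head *node)
{
    (void)node;
    free_calls++;
}

/* Same shape as call_rcu1(): store the callback in the node, then queue it. */
void toy_call_rcu(struct toy_rcu_head *node,
                  void (*func)(struct toy_rcu_head *node))
{
    assert(toy_queue_len < TOY_QUEUE_MAX);
    node->func = func;
    toy_queue[toy_queue_len++] = node;
}

/* Models the RCU thread running every queued callback. */
void toy_drain(void)
{
    for (size_t i = 0; i < toy_queue_len; i++) {
        toy_queue[i]->func(toy_queue[i]);
    }
    toy_queue_len = 0;
}

/* Queue the same node twice, as described in the report above. */
void toy_double_enqueue_demo(void)
{
    static struct toy_rcu_head head;

    destroy_calls = 0;
    free_calls = 0;
    toy_call_rcu(&head, toy_destroy_cb);
    toy_call_rcu(&head, toy_free_cb);   /* overwrites head.func */
    toy_drain();
}
```

In this model the second callback runs twice and the first never runs, because the node has room for only one function pointer. Whether the real code behaves the same way depends on details of QEMU's queue that this sketch does not reproduce.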

Thanks,
Keqian

+
+if (asidx == 0) {
+/* reset the convenience alias for address space 0 */
+cpu->as = NULL;
+}
+
+if (--cpu->cpu_ases_count == 0) {
+g_free(cpu->cpu_ases);
+cpu->cpu_ases = NULL;
+}
+}
+
 AddressSpace *cpu_get_address_space(CPUState *cpu, int asidx)  {
 /* Return the AddressSpace corresponding to the specified index */
--
2.34.1

Re: Unmapping KVM Guest Memory from Host Kernel

2024-03-14 Thread Manwaring, Derek

On Fri, 8 Mar 2024 15:22:50 -0800, Sean Christopherson wrote:
> On Fri, Mar 08, 2024, James Gowans wrote:
> > We are also aware of ongoing work on guest_memfd. The current
> > implementation unmaps guest memory from VMM address space, but leaves it
> > in the kernel’s direct map. We’re not looking at unmapping from VMM
> > userspace yet; we still need guest RAM there for PV drivers like virtio
> > to continue to work. So KVM’s gmem doesn’t seem like the right solution?
>
> We (and by "we", I really mean the pKVM folks) are also working on allowing
> userspace to mmap() guest_memfd[*].  pKVM aside, the long term vision I have 
> for
> guest_memfd is to be able to use it for non-CoCo VMs, precisely for the 
> security
> and robustness benefits it can bring.
>
> What I am hoping to do with guest_memfd is get userspace to only map memory it
> needs, e.g. for emulated/synthetic devices, on-demand.  I.e. to get to a state
> where guest memory is mapped only when it needs to be.

Thank you for the direction, this is super helpful.

We are new to the guest_memfd space, and for simplicity we'd prefer to
leave guest_memfd completely mapped in userspace. Even in the long term,
we actually don't have any use for unmapping from host userspace. The
current form of marking pages shared doesn't quite align with what we're
trying to do either since it also shares the pages with the host kernel.

What are your thoughts on a flag for KVM_CREATE_GUEST_MEMFD that only
removes from the host kernel's direct map, but leaves everything mapped
in userspace?

Derek

[PATCH v4 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-14 Thread Si-Wei Liu

On setups with one or more virtio-net devices with vhost on,
the cost of each dirty tracking iteration grows with the number
of queues that are set up. For example, when migrating an idle guest,
the following is observed with virtio-net with vhost=on:

48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14

With high memory rates the symptom is a lack of convergence as soon
as the guest has a vhost device with a sufficiently high number of
queues, or a sufficient number of vhost devices.

On every migration iteration (every 100msecs) it will redundantly
query the *shared log* once for each queue configured with vhost
that exists in the guest. For the virtqueue data, this is necessary,
but not for the memory sections, which are the same. So essentially
we end up scanning the dirty log too often.

To fix that, select a vhost device responsible for scanning the
log with regards to memory sections dirty tracking. It is selected
when we enable the logger (during migration) and cleared when we
disable the logger. If the vhost logger device goes away for some
reason, the logger will be re-selected from the rest of vhost
devices.
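
The election rule described above can be sketched outside of QEMU with a tiny singly linked list: the head of the list is the elected logger, newcomers are linked in right after the head so they never displace it, and removing the head promotes the next device. The `toy_*` names are hypothetical, and a single list stands in for the per-backend-type QLIST array used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vhost_dev. */
struct toy_dev {
    int id;
    bool inserted;
    struct toy_dev *next;
};

/* Head of the list; the head is the elected memory-section logger. */
struct toy_dev *toy_log_devs;

bool toy_dev_should_log(const struct toy_dev *dev)
{
    return dev == toy_log_devs;
}

void toy_dev_elect_mem_logger(struct toy_dev *dev, bool add)
{
    if (add && !dev->inserted) {
        if (toy_log_devs == NULL) {
            /* First device becomes the logger. */
            dev->next = NULL;
            toy_log_devs = dev;
        } else {
            /* Insert after the head so the current logger stays elected. */
            dev->next = toy_log_devs->next;
            toy_log_devs->next = dev;
        }
        dev->inserted = true;
    } else if (!add && dev->inserted) {
        struct toy_dev **link = &toy_log_devs;

        /* Find the link pointing at dev and unlink it. */
        while (*link != dev) {
            link = &(*link)->next;
        }
        *link = dev->next;
        dev->next = NULL;
        dev->inserted = false;
    }
}
```

With this shape, devices that come and go quickly never change which device scans the memory sections; only removing the logger itself triggers a re-election.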

After making mem-section logger a singleton instance, constant cost
of 7%-9% (like the 1 queue report) will be seen, no matter how many
queues or how many vhost devices are configured:

48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14

Co-developed-by: Joao Martins 
Signed-off-by: Joao Martins 
Signed-off-by: Si-Wei Liu 

---
v3 -> v4:
  - add comment to clarify effect on cache locality and
performance

v2 -> v3:
  - add after-fix benchmark to commit log
  - rename vhost_log_dev_enabled to vhost_dev_should_log
  - remove unneeded comparisons for backend_type
  - use QLIST array instead of single flat list to store vhost
logger devices
  - simplify logger election logic
---
 hw/virtio/vhost.c | 67 ++-
 include/hw/virtio/vhost.h |  1 +
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 612f4db..58522f1 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -45,6 +45,7 @@
 
 static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
 static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
+static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
 
 /* Memslots used by backends that support private memslots (without an fd). */
 static unsigned int used_memslots;
@@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
 }
 }
 
+static inline bool vhost_dev_should_log(struct vhost_dev *dev)
+{
+assert(dev->vhost_ops);
+assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
+assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
+
+return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
+}
+
+static inline void vhost_dev_elect_mem_logger(struct vhost_dev *hdev, bool add)
+{
+VhostBackendType backend_type;
+
+assert(hdev->vhost_ops);
+
+backend_type = hdev->vhost_ops->backend_type;
+assert(backend_type > VHOST_BACKEND_TYPE_NONE);
+assert(backend_type < VHOST_BACKEND_TYPE_MAX);
+
+if (add && !QLIST_IS_INSERTED(hdev, logdev_entry)) {
+if (QLIST_EMPTY(&vhost_log_devs[backend_type])) {
+QLIST_INSERT_HEAD(&vhost_log_devs[backend_type],
+  hdev, logdev_entry);
+} else {
+/*
+ * The first vhost_device in the list is selected as the shared
+ * logger to scan memory sections. Put new entry next to the head
+ * to avoid inadvertent change to the underlying logger device.
+ * This is done in order to get better cache locality and to avoid
+ * performance churn on the hot path for log scanning. Even when
+ * new devices come and go quickly, it wouldn't end up changing
+ * the active leading logger device at all.
+ */
+QLIST_INSERT_AFTER(QLIST_FIRST(&vhost_log_devs[backend_type]),
+   hdev, logdev_entry);
+}
+} else if (!add && QLIST_IS_INSERTED(hdev, logdev_entry)) {
+QLIST_REMOVE(hdev, logdev_entry);
+}
+}
+
 static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
MemoryRegionSection *section,
hwaddr first,
@@ -166,12 +208,14 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
 start_addr = MAX(first, start_addr);
 end_addr = MIN(last, end_addr);
 
-for (i = 0; i < dev->mem->nregions; ++i) {
-struct vhost_memory_region *reg = dev->mem->regions + i;
-vhost_dev_sync_region(dev, section, start_addr, end_addr,
-

[PATCH v4 1/2] vhost: dirty log should be per backend type

2024-03-14 Thread Si-Wei Liu

There could be a mix of both vhost-user and vhost-kernel clients
in the same QEMU process, where separate vhost loggers for the
specific vhost type have to be used. Make the vhost logger per
backend type, and have them properly reference counted.
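
A minimal sketch of the idea, assuming nothing beyond what the commit message says: one shared, reference-counted log per backend type, so that users of different backend types never share a log. The `toy_*` names and the enum values are hypothetical, and calloc()/free() stand in for the real log allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical backend types, mirroring the NONE ... MAX layout. */
enum toy_backend_type {
    TOY_BACKEND_NONE = 0,
    TOY_BACKEND_KERNEL,
    TOY_BACKEND_USER,
    TOY_BACKEND_MAX,
};

struct toy_log {
    uint64_t size;
    int refcnt;
};

/* One shared log slot per backend type. */
struct toy_log *toy_logs[TOY_BACKEND_MAX];

struct toy_log *toy_log_get(enum toy_backend_type type, uint64_t size)
{
    struct toy_log *log;

    assert(type > TOY_BACKEND_NONE);
    assert(type < TOY_BACKEND_MAX);

    log = toy_logs[type];
    if (!log || log->size != size) {
        /* No log of this size yet for this backend type: allocate one. */
        log = calloc(1, sizeof(*log));
        assert(log);
        log->size = size;
        log->refcnt = 1;
        toy_logs[type] = log;
    } else {
        /* Reuse the existing log and take another reference. */
        ++log->refcnt;
    }
    return log;
}

void toy_log_put(enum toy_backend_type type, struct toy_log *log)
{
    if (!log) {
        return;
    }
    if (--log->refcnt == 0) {
        /* Clear the slot only if it still points at this log. */
        if (toy_logs[type] == log) {
            toy_logs[type] = NULL;
        }
        free(log);
    }
}
```

Two users of the same backend type end up sharing one log and one reference count, while a user of another backend type gets its own, which is the separation the patch is after.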

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Si-Wei Liu 

---
v3->v4:
  - remove checking NULL return value from vhost_log_get

v2->v3:
  - remove ineffective assertion that can never be reached
  - do not return NULL from vhost_log_get()
  - add necessary assertions to vhost_log_get()
---
 hw/virtio/vhost.c | 45 +
 1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2c9ac79..612f4db 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -43,8 +43,8 @@
 do { } while (0)
 #endif
 
-static struct vhost_log *vhost_log;
-static struct vhost_log *vhost_log_shm;
+static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
+static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
 
 /* Memslots used by backends that support private memslots (without an fd). */
 static unsigned int used_memslots;
@@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev *dev,
 r = -1;
 }
 
+if (r == 0) {
+assert(dev->vhost_ops->backend_type == backend_type);
+}
+
 return r;
 }
 
@@ -319,16 +323,22 @@ static struct vhost_log *vhost_log_alloc(uint64_t size, 
bool share)
 return log;
 }
 
-static struct vhost_log *vhost_log_get(uint64_t size, bool share)
+static struct vhost_log *vhost_log_get(VhostBackendType backend_type,
+   uint64_t size, bool share)
 {
-struct vhost_log *log = share ? vhost_log_shm : vhost_log;
+struct vhost_log *log;
+
+assert(backend_type > VHOST_BACKEND_TYPE_NONE);
+assert(backend_type < VHOST_BACKEND_TYPE_MAX);
+
+log = share ? vhost_log_shm[backend_type] : vhost_log[backend_type];
 
 if (!log || log->size != size) {
 log = vhost_log_alloc(size, share);
 if (share) {
-vhost_log_shm = log;
+vhost_log_shm[backend_type] = log;
 } else {
-vhost_log = log;
+vhost_log[backend_type] = log;
 }
 } else {
 ++log->refcnt;
@@ -340,11 +350,20 @@ static struct vhost_log *vhost_log_get(uint64_t size, 
bool share)
 static void vhost_log_put(struct vhost_dev *dev, bool sync)
 {
 struct vhost_log *log = dev->log;
+VhostBackendType backend_type;
 
 if (!log) {
 return;
 }
 
+assert(dev->vhost_ops);
+backend_type = dev->vhost_ops->backend_type;
+
+if (backend_type == VHOST_BACKEND_TYPE_NONE ||
+backend_type >= VHOST_BACKEND_TYPE_MAX) {
+return;
+}
+
 --log->refcnt;
 if (log->refcnt == 0) {
 /* Sync only the range covered by the old log */
@@ -352,13 +371,13 @@ static void vhost_log_put(struct vhost_dev *dev, bool 
sync)
 vhost_log_sync_range(dev, 0, dev->log_size * VHOST_LOG_CHUNK - 1);
 }
 
-if (vhost_log == log) {
+if (vhost_log[backend_type] == log) {
 g_free(log->log);
-vhost_log = NULL;
-} else if (vhost_log_shm == log) {
+vhost_log[backend_type] = NULL;
+} else if (vhost_log_shm[backend_type] == log) {
 qemu_memfd_free(log->log, log->size * sizeof(*(log->log)),
 log->fd);
-vhost_log_shm = NULL;
+vhost_log_shm[backend_type] = NULL;
 }
 
 g_free(log);
@@ -376,7 +395,8 @@ static bool vhost_dev_log_is_shared(struct vhost_dev *dev)
 
 static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
 {
-struct vhost_log *log = vhost_log_get(size, vhost_dev_log_is_shared(dev));
+struct vhost_log *log = vhost_log_get(dev->vhost_ops->backend_type,
+  size, vhost_dev_log_is_shared(dev));
 uint64_t log_base = (uintptr_t)log->log;
 int r;
 
@@ -2037,7 +2057,8 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice 
*vdev, bool vrings)
 uint64_t log_base;
 
 hdev->log_size = vhost_get_log_size(hdev);
-hdev->log = vhost_log_get(hdev->log_size,
+hdev->log = vhost_log_get(hdev->vhost_ops->backend_type,
+  hdev->log_size,
   vhost_dev_log_is_shared(hdev));
 log_base = (uintptr_t)hdev->log->log;
 r = hdev->vhost_ops->vhost_set_log_base(hdev,
-- 
1.8.3.1
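
The per-backend-type bookkeeping in the patch above can be sketched as a small standalone model. This is only an illustration under stated assumptions: BackendType, Log, log_slots, log_get and log_put are hypothetical stand-ins for VhostBackendType, struct vhost_log and the vhost_log[] array, and the shared-memory variant and the dirty-log syncing are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QEMU's VhostBackendType values. */
typedef enum {
    BACKEND_NONE = 0,
    BACKEND_KERNEL,
    BACKEND_USER,
    BACKEND_MAX,
} BackendType;

/* Simplified log object: only the size and the reference count. */
typedef struct Log {
    uint64_t size;
    int refcnt;
} Log;

/* One slot per backend type instead of a single global pointer. */
static Log *log_slots[BACKEND_MAX];

/* Return the log for this backend type, sharing it when the size matches. */
static Log *log_get(BackendType type, uint64_t size)
{
    assert(type > BACKEND_NONE);
    assert(type < BACKEND_MAX);

    Log *log = log_slots[type];
    if (!log || log->size != size) {
        /* No log yet, or wrong size: allocate a new one for this slot. */
        log = calloc(1, sizeof(*log));
        assert(log);
        log->size = size;
        log->refcnt = 1;
        log_slots[type] = log;
    } else {
        ++log->refcnt;
    }
    return log;
}

/* Drop one reference; free the log and clear its slot on the last put. */
static void log_put(BackendType type, Log *log)
{
    if (!log || type <= BACKEND_NONE || type >= BACKEND_MAX) {
        return;
    }
    if (--log->refcnt == 0) {
        if (log_slots[type] == log) {
            log_slots[type] = NULL;
        }
        free(log);
    }
}
```

In this model two devices of the same backend type share one log and bump its reference count, while a device of another backend type gets its own log, which is the separation the commit message asks for.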

[PATCH v2] target/s390x: improve cpu compatibility check error message

2024-03-14 Thread Claudio Fontana

some users were confused by this message showing under TCG:

Selected CPU generation is too new. Maximum supported model
in the configuration: 'xyz'

Clarify that the maximum can depend on the accel, and add a
hint to try a different one.

Also add a hint for features mismatch to suggest trying
different accel, QEMU and kernel versions.

Signed-off-by: Claudio Fontana 
---
 target/s390x/cpu_models.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 1a1c096122..8ed3bb6a27 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -500,6 +500,16 @@ static void error_prepend_missing_feat(const char *name, 
void *opaque)
 error_prepend((Error **) opaque, "%s ", name);
 }
 
+static void check_compat_model_failed(Error **errp,
+  const S390CPUModel *max_model,
+  const char *msg)
+{
+error_setg(errp, "%s. Maximum supported model in the current 
configuration: \'%s\'",
+   msg, max_model->def->name);
+error_append_hint(errp, "Consider a different accelerator, try \"-accel 
help\"\n");
+return;
+}
+
 static void check_compatibility(const S390CPUModel *max_model,
 const S390CPUModel *model, Error **errp)
 {
@@ -507,15 +517,11 @@ static void check_compatibility(const S390CPUModel 
*max_model,
 S390FeatBitmap missing;
 
 if (model->def->gen > max_model->def->gen) {
-error_setg(errp, "Selected CPU generation is too new. Maximum "
-   "supported model in the configuration: \'%s\'",
-   max_model->def->name);
+check_compat_model_failed(errp, max_model, "Selected CPU generation is 
too new");
 return;
 } else if (model->def->gen == max_model->def->gen &&
model->def->ec_ga > max_model->def->ec_ga) {
-error_setg(errp, "Selected CPU GA level is too new. Maximum "
-   "supported model in the configuration: \'%s\'",
-   max_model->def->name);
+check_compat_model_failed(errp, max_model, "Selected CPU GA level is 
too new");
 return;
 }
 
@@ -537,7 +543,9 @@ static void check_compatibility(const S390CPUModel 
*max_model,
 error_setg(errp, " ");
 s390_feat_bitmap_to_ascii(missing, errp, error_prepend_missing_feat);
 error_prepend(errp, "Some features requested in the CPU model are not "
-  "available in the configuration: ");
+  "available in the current configuration: ");
+error_append_hint(errp,
+  "Consider a different accelerator, QEMU, or kernel 
version\n");
 }
 
 S390CPUModel *get_max_cpu_model(Error **errp)
-- 
2.26.2
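
The shape of the change above (one helper that builds the common message and attaches a hint, called from both "too new" branches) can be tried outside QEMU with a toy error object. This is a sketch under assumptions: ToyError, toy_compat_failed and toy_check_compat are hypothetical names, the model name "z14" is only sample input, and QEMU's real Error API (error_setg, error_append_hint) is not used here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy error object; QEMU's real Error type is opaque and richer. */
typedef struct {
    char msg[256];   /* single-phrase message, no trailing newline */
    char hint[256];  /* optional hint, newline-terminated */
} ToyError;

/* Build the shared message once, so both callers stay in sync. */
static void toy_compat_failed(ToyError *err, const char *reason,
                              const char *max_model_name)
{
    snprintf(err->msg, sizeof(err->msg),
             "%s. Maximum supported model in the current configuration: '%s'",
             reason, max_model_name);
    snprintf(err->hint, sizeof(err->hint),
             "Consider a different accelerator, try \"-accel help\"\n");
}

/* Return 0 if the requested model fits, -1 (with err filled in) otherwise. */
static int toy_check_compat(int gen, int ec_ga, int max_gen, int max_ec_ga,
                            const char *max_name, ToyError *err)
{
    if (gen > max_gen) {
        toy_compat_failed(err, "Selected CPU generation is too new", max_name);
        return -1;
    }
    if (gen == max_gen && ec_ga > max_ec_ga) {
        toy_compat_failed(err, "Selected CPU GA level is too new", max_name);
        return -1;
    }
    return 0;
}
```

Keeping the message on one source line, as the patch does, trades line length for the ability to grep for the exact text a user reports.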

Regression in v7.2.10 - ui-dbus.so requires -fPIC

2024-03-14 Thread Olaf Hering

ui-dbus.so is a shared library. But it is apparently handled differently
than all the other shared libraries: it is not compiled with -fPIC.

As a result it fails to link. Not sure why this happens only here.
Everything up to v7.2.9 was fine.

Looking at some random other library like libui-spice-core.a,
every object is compiled with -fPIC. 

But ui/dbus-display1.c is compiled with -fPIE instead.

Is this intentional? 

Olaf

ld: ui/libdbus-display1.a.p/meson-generated_.._dbus-display1.c.o: warning: 
relocation against `qemu_dbus_display1_audio_get_type' in read-only section 
`.text'



Re: [PATCH v5 06/13] hw/mem/cxl_type3: Add host backend and address space handling for DC regions

2024-03-14 Thread fan

On Wed, Mar 06, 2024 at 04:28:16PM +, Jonathan Cameron wrote:
> On Mon,  4 Mar 2024 11:34:01 -0800
> nifan@gmail.com wrote:
> 
> > From: Fan Ni 
> > 
> > Add (file/memory backed) host backend, all the dynamic capacity regions
> > will share a single, large enough host backend. Set up address space for
> > DC regions to support read/write operations to dynamic capacity for DCD.
> > 
> > With the change, following supports are added:
> > 1. Add a new property to type3 device "volatile-dc-memdev" to point to host
> >memory backend for dynamic capacity. Currently, all dc regions share one
> >host backend.
> > 2. Add namespace for dynamic capacity for read/write support;
> > 3. Create cdat entries for each dynamic capacity region;
> > 4. Fix dvsec range registers to include DC regions.
> > 
> > Signed-off-by: Fan Ni 
> Hi Fan, 
> 
> This one has a few more significant comments inline.
> 
> thanks,
> 
> Jonathan
> 
> > ---
> 
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index c045fee32d..2b380a260b 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -45,7 +45,8 @@ enum {
> >  
> >  static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> >int dsmad_handle, uint64_t size,
> > -  bool is_pmem, uint64_t dpa_base)
> > +  bool is_pmem, bool is_dynamic,
> > +  uint64_t dpa_base)
> >  {
> >  g_autofree CDATDsmas *dsmas = NULL;
> >  g_autofree CDATDslbis *dslbis0 = NULL;
> 
> There is a fixlet going through for these as the autofree doesn't do anything.
> Will require a rebase.  I'll do it on my tree, but might not push that out for a
> few days so this is just a heads up for anyone using these.
> 
> https://lore.kernel.org/qemu-devel/20240304104406.59855-1-th...@redhat.com/
> 
> It went in clean for me, so may not even be something anyone notices!
> 
> > @@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader 
> > **cdat_table,
> >  .length = sizeof(*dsmas),
> >  },
> >  .DSMADhandle = dsmad_handle,
> > -.flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> > +.flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
> > + (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
> >  .DPA_base = dpa_base,
> >  .DPA_length = size,
> >  };
> > @@ -149,12 +151,13 @@ static int ct3_build_cdat_table(CDATSubHeader 
> > ***cdat_table, void *priv)
> >  g_autofree CDATSubHeader **table = NULL;
> >  
> >  
> > @@ -176,21 +179,55 @@ static int ct3_build_cdat_table(CDATSubHeader 
> > ***cdat_table, void *priv)
> >  pmr_size = memory_region_size(nonvolatile_mr);
> >  }
> >  
> > +if (ct3d->dc.num_regions) {
> > +if (ct3d->dc.host_dc) {
> > +dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > +if (!dc_mr) {
> > +return -EINVAL;
> > +}
> > +len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
> > +} else {
> > +return -EINVAL;
> 
> Flip logic to get the error out the way first and reduce indent.
> 
>  if (ct3d->dc.num_regions) {
> if (!ct3d->dc.host_dc) {
> return -EINVAL;
> }
> dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> if (!dc_mr) {
> return -EINVAL;
> }
> len += CT3...
>  }
> 
> > +}
> > +}
> > +
> 
> >  
> >  *cdat_table = g_steal_pointer();
> > @@ -300,11 +337,24 @@ static void build_dvsecs(CXLType3Dev *ct3d)
> >  range2_size_hi = ct3d->hostpmem->size >> 32;
> >  range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> >   (ct3d->hostpmem->size & 0xF000);
> > +} else if (ct3d->dc.host_dc) {
> > +range2_size_hi = ct3d->dc.host_dc->size >> 32;
> > +range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > + (ct3d->dc.host_dc->size & 0xF000);
> >  }
> > -} else {
> > +} else if (ct3d->hostpmem) {
> >  range1_size_hi = ct3d->hostpmem->size >> 32;
> >  range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> >   (ct3d->hostpmem->size & 0xF000);
> > +if (ct3d->dc.host_dc) {
> > +range2_size_hi = ct3d->dc.host_dc->size >> 32;
> > +range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > + (ct3d->dc.host_dc->size & 0xF000);
> > +}
> > +} else {
> > +range1_size_hi = ct3d->dc.host_dc->size >> 32;
> > +range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
> > + (ct3d->dc.host_dc->size & 0xF000);
> 
> I've forgotten if we ever closed out on the right thing to do
> with the legacy range registers.   Maybe, just ignoring DC is the
> right option for now?  So

Re: [PATCH] target/s390x: improve cpu compatibility check error message

2024-03-14 Thread Claudio Fontana

On 3/14/24 21:10, Claudio Fontana wrote:
> On 3/14/24 20:44, Nina Schoetterl-Glausch wrote:
>> On Thu, 2024-03-14 at 20:00 +0100, Claudio Fontana wrote:
>>> some users were confused by this message showing under TCG:
>>>
>>> Selected CPU generation is too new. Maximum supported model
>>> in the configuration: 'xyz'
>>>
>>> Try to clarify that the maximum can depend on the accel by
>>> adding also the current accelerator to the message as such:
>>>
>>> Selected CPU generation is too new. Maximum supported model
>>> in the accelerator 'tcg' configuration: 'xyz'
>>>
>>> Signed-off-by: Claudio Fontana 
>>> ---
>>>  target/s390x/cpu_models.c | 11 ++-
>>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
>>> index 1a1c096122..0d6d8fc727 100644
>>> --- a/target/s390x/cpu_models.c
>>> +++ b/target/s390x/cpu_models.c
>>> @@ -508,14 +508,14 @@ static void check_compatibility(const S390CPUModel 
>>> *max_model,
>>>  
>>>  if (model->def->gen > max_model->def->gen) {
>>>  error_setg(errp, "Selected CPU generation is too new. Maximum "
>>> -   "supported model in the configuration: \'%s\'",
>>> -   max_model->def->name);
>>> +   "supported model in the accelerator \'%s\' 
>>> configuration: \'%s\'",
>>> +   current_accel_name(), max_model->def->name);
>>>  return;
>>>  } else if (model->def->gen == max_model->def->gen &&
>>> model->def->ec_ga > max_model->def->ec_ga) {
>>>  error_setg(errp, "Selected CPU GA level is too new. Maximum "
>>> -   "supported model in the configuration: \'%s\'",
>>> -   max_model->def->name);
>>> +   "supported model in the accelerator \'%s\' 
>>> configuration: \'%s\'",
>>> +   current_accel_name(), max_model->def->name);
>>>  return;
>>>  }
>>>  
>>> @@ -537,7 +537,8 @@ static void check_compatibility(const S390CPUModel 
>>> *max_model,
>>>  error_setg(errp, " ");
>>>  s390_feat_bitmap_to_ascii(missing, errp, error_prepend_missing_feat);
>>>  error_prepend(errp, "Some features requested in the CPU model are not "
>>> -  "available in the configuration: ");
>>> +  "available in the accelerator \'%s\' configuration: ",
>>> +  current_accel_name());
>>>  }
>>
>> I wonder if these might not be confusing in other circumstances, e.g. when
>> running with KVM and the Linux version lacks support for some feature.
> 
> Here you are referencing specifically the last hunk right? Ie the "Some 
> features requested..." message.
> 
>> I think something along the lines of:
>>
>> error_...(errp, "... supported by the current configuration ...", ...);
>> error_append_hint(errp, "Consider using a different accelerator, a different 
>> QEMU version or, when using KVM, a different kernel");
>>
>> would be better.
> 
> Interesting I'll try something along these lines.
> 
>>
>> I'm not sure about line breaks in error message, I like the better 
>> grepability
>> of unbroken lines but the coding style guide doesn't mention anything.
> 
> better greppability in the log (as the error message in the log), or in the 
> source code (or both)?
> I am generally in favor of both, but there might be constraints on line 
> length, although scripts/checkpatch.pl did not complain when I attempted this 
> (I wonder if bug or feature).
> 
> docs/devel/style.rst on the code line length topic says:
> "Lines should be 80 characters; try not to make them longer..."
> 
> and it does talk about exceptions. In the case of error message strings I 
> think this could be one of those exceptions.
> 
> In terms of logs, I did not find anything either, the most pertinent section 
> should be "Error handling and reporting" in the same file,
> but there is nothing about breaking up [or not] a single message in errors 
> with newlines.

Ah I forgot the mythical include/qapi/error.h:

for error_setg we have:

"The resulting message should be a single phrase, with no newline or trailing 
punctuation."

so this helps.

> 
>>>  
>>>  S390CPUModel *get_max_cpu_model(Error **errp)
>>
> 
> Thanks,
> 
> Claudio

Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-14 Thread Jonah Palmer





On 3/14/24 3:05 PM, Eugenio Perez Martin wrote:

On Thu, Mar 14, 2024 at 5:06 PM Jonah Palmer  wrote:




On 3/14/24 10:55 AM, Eugenio Perez Martin wrote:

On Thu, Mar 14, 2024 at 1:16 PM Jonah Palmer  wrote:




On 3/13/24 11:01 PM, Jason Wang wrote:

On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  wrote:


Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
- upper 16 bits: shadow_avail_idx
- lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
- upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
- lower 16 bits: virtqueue index
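
The two layouts listed above can be decoded with a few shifts and masks. The following is a minimal sketch, assuming the 32-bit value arrives as written by the driver; NotifyData and decode_notify_data are hypothetical names and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the extra notification data (hypothetical type). */
typedef struct {
    uint16_t vq_index;     /* lower 16 bits in both layouts */
    uint16_t avail_idx;    /* 16 bits (split) or 15 bits (packed) */
    bool wrap_counter;     /* only meaningful for the packed layout */
} NotifyData;

static NotifyData decode_notify_data(uint32_t data, bool packed)
{
    NotifyData d;

    d.vq_index = (uint16_t)(data & 0xFFFF);
    if (packed) {
        /* Bit 31 is the wrap counter, bits 30..16 the available index. */
        d.wrap_counter = (data >> 31) & 0x1;
        d.avail_idx = (data >> 16) & 0x7FFF;
    } else {
        /* Split layout: the whole upper half is the available index. */
        d.wrap_counter = false;
        d.avail_idx = (uint16_t)(data >> 16);
    }
    return d;
}
```

For the same raw value 0x80050003, the split decode yields index 0x8005 for queue 3, while the packed decode yields index 5 with the wrap counter set.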

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
hw/virtio/virtio-pci.c | 10 +++---
hw/virtio/virtio.c | 18 ++
include/hw/virtio/virtio.h |  1 +
3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index cb6940fc0e..0f5c3c3b2f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
{
VirtIOPCIProxy *proxy = opaque;
VirtIODevice *vdev = virtio_bus_get_device(>bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
hwaddr pa;

switch (addr) {
@@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
vdev->queue_sel = val;
break;
case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+vq_idx = val;
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, val);
+}
+virtio_queue_notify(vdev, vq_idx);
}
break;
case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..bcb9e09df0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, 
int align)
}
}

+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)


Maybe I didn't explain well, but I think it is better to pass directly
idx to a VirtQueue *. That way only the caller needs to check for a
valid vq idx, and (my understanding is) the virtio.c interface is
migrating to VirtQueue * use anyway.



Oh, are you saying to just pass in a VirtQueue *vq instead of
VirtIODevice *vdev and get rid of the vq->vring.desc check in the function?



No, that needs to be kept. I meant the access to vdev->vq[i] without
checking for a valid i.



Ahh okay I see what you mean. But I thought the following was checking 
for a valid VQ index:


if (vq_idx < VIRTIO_QUEUE_MAX)

Of course the virtio device may not have up to VIRTIO_QUEUE_MAX 
virtqueues, so maybe we should be checking for validity like this?


if (vdev->vq[i].vring.num == 0)

Or was there something else you had in mind? Apologies for the confusion.


You can get the VirtQueue in the caller with virtio_get_queue. Which
also does not check for a valid index, but that way is clearer the
caller needs to check it.



Roger, I'll use this instead for clarity.


As a side note, the check for desc != 0 is widespread in QEMU but the
driver may use 0 address for desc, so it's not 100% valid. But to
change that now requires a deeper change out of the scope of this
series, so let's keep it for now :).

Thanks!


I'll add it to the todo list =]


+{
+/* Lower 16 bits is the virtqueue index */
+uint16_t i = data;
+VirtQueue *vq = >vq[i];
+
+if (!vq->vring.desc) {
+return;
+}
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
+vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->shadow_avail_idx = (data >> 16);


Do we need to do a sanity check for this value?

Thanks



It can't hurt, right? What kind of check did you have in mind?

if (vq->shadow_avail_idx >= vq->vring.num)



I'm a little bit lost too. shadow_avail_idx can take all uint16_t
values. Maybe you meant checking for a valid vq index, Jason?

Thanks!


Or something else?


+}
+}
+
static void virtio_queue_notify_vq(VirtQueue *vq)
{
if (vq->vring.desc && vq->handle_output) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..53915947a7 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -335,6 +335,7 @@ void

Re: [PATCH] target/s390x: improve cpu compatibility check error message

2024-03-14 Thread Claudio Fontana

On 3/14/24 20:44, Nina Schoetterl-Glausch wrote:
> On Thu, 2024-03-14 at 20:00 +0100, Claudio Fontana wrote:
>> some users were confused by this message showing under TCG:
>>
>> Selected CPU generation is too new. Maximum supported model
>> in the configuration: 'xyz'
>>
>> Try to clarify that the maximum can depend on the accel by
>> adding also the current accelerator to the message as such:
>>
>> Selected CPU generation is too new. Maximum supported model
>> in the accelerator 'tcg' configuration: 'xyz'
>>
>> Signed-off-by: Claudio Fontana 
>> ---
>>  target/s390x/cpu_models.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
>> index 1a1c096122..0d6d8fc727 100644
>> --- a/target/s390x/cpu_models.c
>> +++ b/target/s390x/cpu_models.c
>> @@ -508,14 +508,14 @@ static void check_compatibility(const S390CPUModel 
>> *max_model,
>>  
>>  if (model->def->gen > max_model->def->gen) {
>>  error_setg(errp, "Selected CPU generation is too new. Maximum "
>> -   "supported model in the configuration: \'%s\'",
>> -   max_model->def->name);
>> +   "supported model in the accelerator \'%s\' 
>> configuration: \'%s\'",
>> +   current_accel_name(), max_model->def->name);
>>  return;
>>  } else if (model->def->gen == max_model->def->gen &&
>> model->def->ec_ga > max_model->def->ec_ga) {
>>  error_setg(errp, "Selected CPU GA level is too new. Maximum "
>> -   "supported model in the configuration: \'%s\'",
>> -   max_model->def->name);
>> +   "supported model in the accelerator \'%s\' 
>> configuration: \'%s\'",
>> +   current_accel_name(), max_model->def->name);
>>  return;
>>  }
>>  
>> @@ -537,7 +537,8 @@ static void check_compatibility(const S390CPUModel 
>> *max_model,
>>  error_setg(errp, " ");
>>  s390_feat_bitmap_to_ascii(missing, errp, error_prepend_missing_feat);
>>  error_prepend(errp, "Some features requested in the CPU model are not "
>> -  "available in the configuration: ");
>> +  "available in the accelerator \'%s\' configuration: ",
>> +  current_accel_name());
>>  }
> 
> I wonder if these might not be confusing in other circumstances, e.g. when
> running with KVM and the Linux version lacks support for some feature.

Here you are referencing specifically the last hunk, right? I.e. the "Some 
features requested..." message.

> I think something along the lines of:
> 
> error_...(errp, "... supported by the current configuration ...", ...);
> error_append_hint(errp, "Consider using a different accelerator, a different 
> QEMU version or, when using KVM, a different kernel");
> 
> would be better.

Interesting I'll try something along these lines.

> 
> I'm not sure about line breaks in error message, I like the better grepability
> of unbroken lines but the coding style guide doesn't mention anything.

better greppability in the log (as the error message in the log), or in the 
source code (or both)?
I am generally in favor of both, but there might be constraints on line length, 
although scripts/checkpatch.pl did not complain when I attempted this (I wonder 
if bug or feature).

docs/devel/style.rst on the code line length topic says:
"Lines should be 80 characters; try not to make them longer..."

and it does talk about exceptions. In the case of error message strings I think 
this could be one of those exceptions.

In terms of logs, I did not find anything either, the most pertinent section 
should be "Error handling and reporting" in the same file,
but there is nothing about breaking up [or not] a single message in errors with 
newlines.

>>  
>>  S390CPUModel *get_max_cpu_model(Error **errp)
> 

Thanks,

Claudio

Re: [PATCH] target/s390x: improve cpu compatibility check error message

2024-03-14 Thread Nina Schoetterl-Glausch

On Thu, 2024-03-14 at 20:00 +0100, Claudio Fontana wrote:
> some users were confused by this message showing under TCG:
> 
> Selected CPU generation is too new. Maximum supported model
> in the configuration: 'xyz'
> 
> Try to clarify that the maximum can depend on the accel by
> adding also the current accelerator to the message as such:
> 
> Selected CPU generation is too new. Maximum supported model
> in the accelerator 'tcg' configuration: 'xyz'
> 
> Signed-off-by: Claudio Fontana 
> ---
>  target/s390x/cpu_models.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
> index 1a1c096122..0d6d8fc727 100644
> --- a/target/s390x/cpu_models.c
> +++ b/target/s390x/cpu_models.c
> @@ -508,14 +508,14 @@ static void check_compatibility(const S390CPUModel 
> *max_model,
>  
>  if (model->def->gen > max_model->def->gen) {
>  error_setg(errp, "Selected CPU generation is too new. Maximum "
> -   "supported model in the configuration: \'%s\'",
> -   max_model->def->name);
> +   "supported model in the accelerator \'%s\' configuration: 
> \'%s\'",
> +   current_accel_name(), max_model->def->name);
>  return;
>  } else if (model->def->gen == max_model->def->gen &&
> model->def->ec_ga > max_model->def->ec_ga) {
>  error_setg(errp, "Selected CPU GA level is too new. Maximum "
> -   "supported model in the configuration: \'%s\'",
> -   max_model->def->name);
> +   "supported model in the accelerator \'%s\' configuration: 
> \'%s\'",
> +   current_accel_name(), max_model->def->name);
>  return;
>  }
>  
> @@ -537,7 +537,8 @@ static void check_compatibility(const S390CPUModel 
> *max_model,
>  error_setg(errp, " ");
>  s390_feat_bitmap_to_ascii(missing, errp, error_prepend_missing_feat);
>  error_prepend(errp, "Some features requested in the CPU model are not "
> -  "available in the configuration: ");
> +  "available in the accelerator \'%s\' configuration: ",
> +  current_accel_name());
>  }

I wonder if these might not be confusing in other circumstances, e.g. when
running with KVM and the Linux version lacks support for some feature.
I think something along the lines of:

error_...(errp, "... supported by the current configuration ...", ...);
error_append_hint(errp, "Consider using a different accelerator, a different 
QEMU version or, when using KVM, a different kernel");

would be better.

I'm not sure about line breaks in error message, I like the better grepability
of unbroken lines but the coding style guide doesn't mention anything.
>  
>  S390CPUModel *get_max_cpu_model(Error **errp)

Re: [PATCH v7 4/4] target/riscv: Enable sdtrig for Ventana's Veyron CPUs

2024-03-14 Thread Andrew Jones

On Fri, Mar 15, 2024 at 12:29:57AM +0530, Himanshu Chauhan wrote:
> Ventana's Veyron CPUs support sdtrig ISA extension. By default, enable
> the sdtrig extension and disable the debug property for these CPUs.

You still have the 'and disable the debug property' here...

> 
> Signed-off-by: Himanshu Chauhan 
> ---
>  target/riscv/cpu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 4231f36c1b..c9dda73748 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -569,6 +569,7 @@ static void rv64_veyron_v1_cpu_init(Object *obj)
>  cpu->cfg.cbom_blocksize = 64;
>  cpu->cfg.cboz_blocksize = 64;
>  cpu->cfg.ext_zicboz = true;
> +cpu->cfg.ext_sdtrig = true;
>  cpu->cfg.ext_smaia = true;
>  cpu->cfg.ext_ssaia = true;
>  cpu->cfg.ext_sscofpmf = true;
> -- 
> 2.34.1
>

Re: [PATCH v7 3/4] target/riscv: Expose sdtrig ISA extension

2024-03-14 Thread Andrew Jones

On Fri, Mar 15, 2024 at 12:29:56AM +0530, Himanshu Chauhan wrote:
> This patch adds "sdtrig" in the ISA string when sdtrig extension is enabled.
> The sdtrig extension may or may not be implemented in a system. Therefore, the
>-cpu rv64,sdtrig=
> option can be used to dynamically turn sdtrig extension on or off.
> 
> By default, the sdtrig extension is disabled and debug property enabled as 
> usual.
> 
> Signed-off-by: Himanshu Chauhan 
> ---
>  target/riscv/cpu.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index ab631500ac..4231f36c1b 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -175,6 +175,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
>  ISA_EXT_DATA_ENTRY(zvkt, PRIV_VERSION_1_12_0, ext_zvkt),
>  ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
>  ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
> +ISA_EXT_DATA_ENTRY(sdtrig, PRIV_VERSION_1_12_0, ext_sdtrig),

So we're sure this should be 1.12? Or do we need to introduce
PRIV_VERSION_1_13_0?

>  ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
>  ISA_EXT_DATA_ENTRY(smepmp, PRIV_VERSION_1_12_0, ext_smepmp),
>  ISA_EXT_DATA_ENTRY(smstateen, PRIV_VERSION_1_12_0, ext_smstateen),
> @@ -1485,6 +1486,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
>  MULTI_EXT_CFG_BOOL("zvfhmin", ext_zvfhmin, false),
>  MULTI_EXT_CFG_BOOL("sstc", ext_sstc, true),
>  
> +MULTI_EXT_CFG_BOOL("sdtrig", ext_sdtrig, false),
>  MULTI_EXT_CFG_BOOL("smaia", ext_smaia, false),
>  MULTI_EXT_CFG_BOOL("smepmp", ext_smepmp, false),
>  MULTI_EXT_CFG_BOOL("smstateen", ext_smstateen, false),
> -- 
> 2.34.1
>

Thanks,
drew

Re: [PATCH v7 2/4] target/riscv: Enable mcontrol6 triggers only when sdtrig is selected

2024-03-14 Thread Andrew Jones

On Fri, Mar 15, 2024 at 12:29:55AM +0530, Himanshu Chauhan wrote:
> The mcontrol6 triggers are not defined in debug specification v0.13
> These triggers are defined in sdtrig ISA extension.
> 
> This patch:
>* Adds ext_sdtrig capability which is used to select mcontrol6 triggers
>* Keeps the debug property. All triggers that are defined in v0.13 are
>  exposed.
> 
> Signed-off-by: Himanshu Chauhan 
> ---
>  target/riscv/cpu.c |  5 +
>  target/riscv/cpu_cfg.h |  1 +
>  target/riscv/debug.c   | 30 +-
>  3 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index c160b9216b..ab631500ac 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1008,6 +1008,11 @@ static void riscv_cpu_reset_hold(Object *obj)
>  set_default_nan_mode(1, >fp_status);
>  
>  #ifndef CONFIG_USER_ONLY
> +if (!cpu->cfg.debug && cpu->cfg.ext_sdtrig) {
> +warn_report("Enabling 'debug' since 'sdtrig' is enabled.");
> +cpu->cfg.debug = true;
> +}
> +
>  if (cpu->cfg.debug) {
>  riscv_trigger_reset_hold(env);
>  }
> diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
> index 2040b90da0..0c57e1acd4 100644
> --- a/target/riscv/cpu_cfg.h
> +++ b/target/riscv/cpu_cfg.h
> @@ -114,6 +114,7 @@ struct RISCVCPUConfig {
>  bool ext_zvfbfwma;
>  bool ext_zvfh;
>  bool ext_zvfhmin;
> +bool ext_sdtrig;
>  bool ext_smaia;
>  bool ext_ssaia;
>  bool ext_sscofpmf;
> diff --git a/target/riscv/debug.c b/target/riscv/debug.c
> index 5f14b39b06..c40e727e12 100644
> --- a/target/riscv/debug.c
> +++ b/target/riscv/debug.c
> @@ -100,13 +100,16 @@ static trigger_action_t 
> get_trigger_action(CPURISCVState *env,
>  target_ulong tdata1 = env->tdata1[trigger_index];
>  int trigger_type = get_trigger_type(env, trigger_index);
>  trigger_action_t action = DBG_ACTION_NONE;
> +const RISCVCPUConfig *cfg = riscv_cpu_cfg(env);
>  
>  switch (trigger_type) {
>  case TRIGGER_TYPE_AD_MATCH:
>  action = (tdata1 & TYPE2_ACTION) >> 12;
>  break;
>  case TRIGGER_TYPE_AD_MATCH6:
> -action = (tdata1 & TYPE6_ACTION) >> 12;
> +if (cfg->ext_sdtrig) {
> +action = (tdata1 & TYPE6_ACTION) >> 12;
> +}
>  break;
>  case TRIGGER_TYPE_INST_CNT:
>  case TRIGGER_TYPE_INT:
> @@ -727,7 +730,12 @@ void tdata_csr_write(CPURISCVState *env, int 
> tdata_index, target_ulong val)
>  type2_reg_write(env, env->trigger_cur, tdata_index, val);
>  break;
>  case TRIGGER_TYPE_AD_MATCH6:
> -type6_reg_write(env, env->trigger_cur, tdata_index, val);
> +if (riscv_cpu_cfg(env)->ext_sdtrig) {
> +type6_reg_write(env, env->trigger_cur, tdata_index, val);
> +} else {
> +qemu_log_mask(LOG_UNIMP, "trigger type: %d is not supported\n",
> +  trigger_type);
> +}
>  break;
>  case TRIGGER_TYPE_INST_CNT:
>  itrigger_reg_write(env, env->trigger_cur, tdata_index, val);
> @@ -750,9 +758,13 @@ void tdata_csr_write(CPURISCVState *env, int 
> tdata_index, target_ulong val)
>  
>  target_ulong tinfo_csr_read(CPURISCVState *env)
>  {
> -/* assume all triggers support the same types of triggers */
> -return BIT(TRIGGER_TYPE_AD_MATCH) |
> -   BIT(TRIGGER_TYPE_AD_MATCH6);
> +target_ulong ts = BIT(TRIGGER_TYPE_AD_MATCH);
> +
> +if (riscv_cpu_cfg(env)->ext_sdtrig) {
> +ts |= BIT(TRIGGER_TYPE_AD_MATCH6);
> +}
> +
> +return ts;
>  }
>  
>  void riscv_cpu_debug_excp_handler(CPUState *cs)
> @@ -803,6 +815,10 @@ bool riscv_cpu_debug_check_breakpoint(CPUState *cs)
>  }
>  break;
>  case TRIGGER_TYPE_AD_MATCH6:
> +if (!cpu->cfg.ext_sdtrig) {
> +break;
> +}
> +
>  ctrl = env->tdata1[i];
>  pc = env->tdata2[i];
>  
> @@ -869,6 +885,10 @@ bool riscv_cpu_debug_check_watchpoint(CPUState *cs, 
> CPUWatchpoint *wp)
>  }
>  break;
>  case TRIGGER_TYPE_AD_MATCH6:
> +if (!cpu->cfg.ext_sdtrig) {
> +break;
> +}
> +
>  ctrl = env->tdata1[i];
>  addr = env->tdata2[i];
>  flags = 0;
> -- 
> 2.34.1
>

Reviewed-by: Andrew Jones

Re: [PATCH v3 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-14 Thread Eugenio Perez Martin

On Thu, Mar 14, 2024 at 7:35 PM Si-Wei Liu  wrote:
>
>
>
> On 3/14/2024 8:34 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 14, 2024 at 9:38 AM Si-Wei Liu  wrote:
> >> On setups with one or more virtio-net devices with vhost on,
> >> the cost of each dirty tracking iteration grows with the number
> >> of queues that are set up, e.g. on idle guests migration the
> >> following is observed with virtio-net with vhost=on:
> >>
> >> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> >> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> >> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> >> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
> >>
> >> With high memory rates the symptom is lack of convergence as soon
> >> as it has a vhost device with a sufficiently high number of queues,
> >> or a sufficient number of vhost devices.
> >>
> >> On every migration iteration (every 100msecs) it will redundantly
> >> query the *shared log* as many times as the number of queues
> >> configured with vhost in the guest. For the virtqueue data, this is necessary,
> >> but not for the memory sections which are the same. So essentially
> >> we end up scanning the dirty log too often.
> >>
> >> To fix that, select a vhost device responsible for scanning the
> >> log with regards to memory sections dirty tracking. It is selected
> >> when we enable the logger (during migration) and cleared when we
> >> disable the logger. If the vhost logger device goes away for some
> >> reason, the logger will be re-selected from the rest of vhost
> >> devices.
> >>
> >> After making mem-section logger a singleton instance, constant cost
> >> of 7%-9% (like the 1 queue report) will be seen, no matter how many
> >> queues or how many vhost devices are configured:
> >>
> >> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> >> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
> >>
> >> Co-developed-by: Joao Martins 
> >> Signed-off-by: Joao Martins 
> >> Signed-off-by: Si-Wei Liu 
> >> ---
> >> v2 -> v3:
> >>- add after-fix benchmark to commit log
> >>- rename vhost_log_dev_enabled to vhost_dev_should_log
> >>- remove unneeded comparisons for backend_type
> >>- use QLIST array instead of single flat list to store vhost
> >>  logger devices
> >>- simplify logger election logic
> >>
> >> ---
> >>   hw/virtio/vhost.c | 63 
> >> ++-
> >>   include/hw/virtio/vhost.h |  1 +
> >>   2 files changed, 58 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index efe2f74..d91858b 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -45,6 +45,7 @@
> >>
> >>   static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> >>   static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> >> +static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
> >>
> >>   /* Memslots used by backends that support private memslots (without an 
> >> fd). */
> >>   static unsigned int used_memslots;
> >> @@ -149,6 +150,43 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
> >>   }
> >>   }
> >>
> >> +static inline bool vhost_dev_should_log(struct vhost_dev *dev)
> >> +{
> >> +assert(dev->vhost_ops);
> >> +assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
> >> +assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
> >> +
> >> +return dev == 
> >> QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
> >> +}
> >> +
> >> +static inline void vhost_dev_elect_mem_logger(struct vhost_dev *hdev, 
> >> bool add)
> >> +{
> >> +VhostBackendType backend_type;
> >> +
> >> +assert(hdev->vhost_ops);
> >> +
> >> +backend_type = hdev->vhost_ops->backend_type;
> >> +assert(backend_type > VHOST_BACKEND_TYPE_NONE);
> >> +assert(backend_type < VHOST_BACKEND_TYPE_MAX);
> >> +
> >> +if (add && !QLIST_IS_INSERTED(hdev, logdev_entry)) {
> >> +if (QLIST_EMPTY(&vhost_log_devs[backend_type])) {
> >> +QLIST_INSERT_HEAD(&vhost_log_devs[backend_type],
> >> +  hdev, logdev_entry);
> >> +} else {
> >> +/*
> >> + * The first vhost_device in the list is selected as the 
> >> shared
> >> + * logger to scan memory sections. Put new entry next to the 
> >> head
> >> + * to avoid inadvertent change to the underlying logger 
> >> device.
> >> + */
> > Why is changing the logger device a problem? All the code paths are
> > either changing the QLIST or logging, isn't it?
> Changing logger device doesn't affect functionality for sure, but may
> have inadvertent effect on cache locality, particularly it's relevant to
> the log scanning process in the hot path. The code makes sure there's no
> churn on the leading logger selection as a result of adding new vhost
> device, unless the selected logger device will be gone and a re-election
> of

Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-14 Thread Eugenio Perez Martin

On Thu, Mar 14, 2024 at 5:06 PM Jonah Palmer  wrote:
>
>
>
> On 3/14/24 10:55 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 14, 2024 at 1:16 PM Jonah Palmer  
> > wrote:
> >>
> >>
> >>
> >> On 3/13/24 11:01 PM, Jason Wang wrote:
> >>> On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  
> >>> wrote:
> 
>  Add support to virtio-pci devices for handling the extra data sent
>  from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
>  transport feature has been negotiated.
> 
>  The extra data that's passed to the virtio-pci device when this
>  feature is enabled varies depending on the device's virtqueue
>  layout.
> 
>  In a split virtqueue layout, this data includes:
> - upper 16 bits: shadow_avail_idx
> - lower 16 bits: virtqueue index
> 
>  In a packed virtqueue layout, this data includes:
> - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
> - lower 16 bits: virtqueue index
> 
>  Tested-by: Lei Yang 
>  Reviewed-by: Eugenio Pérez 
>  Signed-off-by: Jonah Palmer 
>  ---
> hw/virtio/virtio-pci.c | 10 +++---
> hw/virtio/virtio.c | 18 ++
> include/hw/virtio/virtio.h |  1 +
> 3 files changed, 26 insertions(+), 3 deletions(-)
> 
>  diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>  index cb6940fc0e..0f5c3c3b2f 100644
>  --- a/hw/virtio/virtio-pci.c
>  +++ b/hw/virtio/virtio-pci.c
>  @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, 
>  uint32_t addr, uint32_t val)
> {
> VirtIOPCIProxy *proxy = opaque;
> VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
>  -uint16_t vector;
>  +uint16_t vector, vq_idx;
> hwaddr pa;
> 
> switch (addr) {
>  @@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, 
>  uint32_t addr, uint32_t val)
> vdev->queue_sel = val;
> break;
> case VIRTIO_PCI_QUEUE_NOTIFY:
>  -if (val < VIRTIO_QUEUE_MAX) {
>  -virtio_queue_notify(vdev, val);
>  +vq_idx = val;
>  +if (vq_idx < VIRTIO_QUEUE_MAX) {
>  +if (virtio_vdev_has_feature(vdev, 
>  VIRTIO_F_NOTIFICATION_DATA)) {
>  +virtio_queue_set_shadow_avail_data(vdev, val);
>  +}
>  +virtio_queue_notify(vdev, vq_idx);
> }
> break;
> case VIRTIO_PCI_STATUS:
>  diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>  index d229755eae..bcb9e09df0 100644
>  --- a/hw/virtio/virtio.c
>  +++ b/hw/virtio/virtio.c
>  @@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, 
>  int n, int align)
> }
> }
> 
>  +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t 
>  data)
> >
> > Maybe I didn't explain well, but I think it is better to pass directly
> > idx to a VirtQueue *. That way only the caller needs to check for a
> > valid vq idx, and (my understanding is) the virtio.c interface is
> > migrating to VirtQueue * use anyway.
> >
>
> Oh, are you saying to just pass in a VirtQueue *vq instead of
> VirtIODevice *vdev and get rid of the vq->vring.desc check in the function?
>

No, that needs to be kept. I meant the access to vdev->vq[i] without
checking for a valid i.

You can get the VirtQueue in the caller with virtio_get_queue. Which
also does not check for a valid index, but that way is clearer the
caller needs to check it.

As a side note, the check for desc != 0 is widespread in QEMU but the
driver may use 0 address for desc, so it's not 100% valid. But to
change that now requires a deeper change out of the scope of this
series, so let's keep it for now :).

Thanks!

>  +{
>  +/* Lower 16 bits is the virtqueue index */
>  +uint16_t i = data;
>  +VirtQueue *vq = &vdev->vq[i];
>  +
>  +if (!vq->vring.desc) {
>  +return;
>  +}
>  +
>  +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
>  +vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
>  +vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
>  +} else {
>  +vq->shadow_avail_idx = (data >> 16);
> >>>
> >>> Do we need to do a sanity check for this value?
> >>>
> >>> Thanks
> >>>
> >>
> >> It can't hurt, right? What kind of check did you have in mind?
> >>
> >> if (vq->shadow_avail_idx >= vq->vring.num)
> >>
> >
> > I'm a little bit lost too. shadow_avail_idx can take all uint16_t
> > values. Maybe you meant checking for a valid vq index, Jason?
> >
> > Thanks!
> >
> >> Or something else?
> >>
>  +}
>  +}
>  +
> static void virtio_queue_notify_vq(VirtQueue *vq)
> {
> if (vq->vring.desc && vq->handle_output) {
>  diff --git
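
The split and packed layouts described in the quoted commit message can be sketched as a standalone decoder. This is only an illustration under stated assumptions: `struct notify_data` and `decode_notify_data` are hypothetical names, not QEMU API, and the bit positions are taken from the quoted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields; not a QEMU type. */
struct notify_data {
    uint16_t vq_idx;     /* lower 16 bits: virtqueue index */
    uint16_t avail_idx;  /* split: 16-bit index, packed: 15-bit index */
    bool wrap_counter;   /* packed only: bit 31 of the notification */
};

/* Decode the 32-bit value written by the driver on notification. */
static struct notify_data decode_notify_data(uint32_t data, bool packed)
{
    struct notify_data nd;

    nd.vq_idx = data & 0xFFFF;
    if (packed) {
        /* upper 16 bits: 1-bit wrap counter and 15-bit avail index */
        nd.wrap_counter = (data >> 31) & 0x1;
        nd.avail_idx = (data >> 16) & 0x7FFF;
    } else {
        /* upper 16 bits: the full avail index */
        nd.wrap_counter = false;
        nd.avail_idx = data >> 16;
    }
    return nd;
}
```

As discussed in the thread, the caller would still have to check that vq_idx is below the queue limit before using it to index anything.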

Re: [PATCH] docs/s390: clarify even more that cpu-topology is KVM-only

2024-03-14 Thread Nina Schoetterl-Glausch

On Thu, 2024-03-14 at 18:22 +0100, Claudio Fontana wrote:
> At least for now cpu-topology is implemented only for KVM.
> 
> We already say this, but this tries to be more explicit,
> and also show it in the examples.
> 
> This adds a new reference in the introduction that we can point to,
> whenever we need to reference accelerators and how to select them.
> 
> Signed-off-by: Claudio Fontana 

Reviewed-by: Nina Schoetterl-Glausch 
Tested-by: Nina Schoetterl-Glausch 
(meaning I ran make html)

> ---
>  docs/system/introduction.rst   |  2 ++
>  docs/system/s390x/cpu-topology.rst | 14 --
>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/docs/system/introduction.rst b/docs/system/introduction.rst
> index 51ac132d6c..746707eb00 100644
> --- a/docs/system/introduction.rst
> +++ b/docs/system/introduction.rst
> @@ -1,6 +1,8 @@
>  Introduction
>  
>  
> +.. _Accelerators:
> +
>  Virtualisation Accelerators
>  ---
>  
> diff --git a/docs/system/s390x/cpu-topology.rst 
> b/docs/system/s390x/cpu-topology.rst
> index 5133fdc362..ca344e273c 100644
> --- a/docs/system/s390x/cpu-topology.rst
> +++ b/docs/system/s390x/cpu-topology.rst
> @@ -25,17 +25,19 @@ monitor polarization changes, see 
> ``docs/devel/s390-cpu-topology.rst``.
>  Prerequisites
>  -
>  
> -To use the CPU topology, you need to run with KVM on a s390x host that
> -uses the Linux kernel v6.0 or newer (which provide the so-called
> +To use the CPU topology, you currently need to choose the KVM accelerator.
> +See :ref:`Accelerators` for more details about accelerators and how to 
> select them.
> +
> +The s390x host needs to use a Linux kernel v6.0 or newer (which provides the 
> so-called
>  ``KVM_CAP_S390_CPU_TOPOLOGY`` capability that allows QEMU to signal the
>  CPU topology facility via the so-called STFLE bit 11 to the VM).
>  
>  Enabling CPU topology
>  -
>  
> -Currently, CPU topology is only enabled in the host model by default.
> +Currently, CPU topology is enabled by default only in the "host" cpu model.
>  
> -Enabling CPU topology in a CPU model is done by setting the CPU flag
> +Enabling CPU topology in another CPU model is done by setting the CPU flag
>  ``ctop`` to ``on`` as in:
>  
>  .. code-block:: bash
> @@ -132,7 +134,7 @@ In the following machine we define 8 sockets with 4 cores 
> each.
>  
>  .. code-block:: bash
>  
> -  $ qemu-system-s390x -m 2G \
> +  $ qemu-system-s390x -accel kvm -m 2G \
>  -cpu gen16b,ctop=on \
>  -smp cpus=5,sockets=8,cores=4,maxcpus=32 \
>  -device host-s390x-cpu,core-id=14 \
> @@ -227,7 +229,7 @@ with vertical high entitlement.
>  
>  .. code-block:: bash
>  
> -  $ qemu-system-s390x -m 2G \
> +  $ qemu-system-s390x -accel kvm -m 2G \
>  -cpu gen16b,ctop=on \
>  -smp cpus=1,sockets=8,cores=4,maxcpus=32 \
>  \

[PATCH v7 3/4] target/riscv: Expose sdtrig ISA extension

2024-03-14 Thread Himanshu Chauhan

This patch adds "sdtrig" in the ISA string when sdtrig extension is enabled.
The sdtrig extension may or may not be implemented in a system. Therefore, the
   -cpu rv64,sdtrig=
option can be used to dynamically turn sdtrig extension on or off.

By default, the sdtrig extension is disabled and debug property enabled as 
usual.

Signed-off-by: Himanshu Chauhan 
---
 target/riscv/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ab631500ac..4231f36c1b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -175,6 +175,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zvkt, PRIV_VERSION_1_12_0, ext_zvkt),
 ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
 ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
+ISA_EXT_DATA_ENTRY(sdtrig, PRIV_VERSION_1_12_0, ext_sdtrig),
 ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
 ISA_EXT_DATA_ENTRY(smepmp, PRIV_VERSION_1_12_0, ext_smepmp),
 ISA_EXT_DATA_ENTRY(smstateen, PRIV_VERSION_1_12_0, ext_smstateen),
@@ -1485,6 +1486,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
 MULTI_EXT_CFG_BOOL("zvfhmin", ext_zvfhmin, false),
 MULTI_EXT_CFG_BOOL("sstc", ext_sstc, true),
 
+MULTI_EXT_CFG_BOOL("sdtrig", ext_sdtrig, false),
 MULTI_EXT_CFG_BOOL("smaia", ext_smaia, false),
 MULTI_EXT_CFG_BOOL("smepmp", ext_smepmp, false),
 MULTI_EXT_CFG_BOOL("smstateen", ext_smstateen, false),
-- 
2.34.1

[PATCH v7 2/4] target/riscv: Enable mcontrol6 triggers only when sdtrig is selected

2024-03-14 Thread Himanshu Chauhan

The mcontrol6 triggers are not defined in debug specification v0.13.
These triggers are defined in the sdtrig ISA extension.

This patch:
   * Adds ext_sdtrig capability which is used to select mcontrol6 triggers
   * Keeps the debug property. All triggers that are defined in v0.13 are
 exposed.

Signed-off-by: Himanshu Chauhan 
---
 target/riscv/cpu.c |  5 +
 target/riscv/cpu_cfg.h |  1 +
 target/riscv/debug.c   | 30 +-
 3 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c160b9216b..ab631500ac 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1008,6 +1008,11 @@ static void riscv_cpu_reset_hold(Object *obj)
 set_default_nan_mode(1, &env->fp_status);
 
 #ifndef CONFIG_USER_ONLY
+if (!cpu->cfg.debug && cpu->cfg.ext_sdtrig) {
+warn_report("Enabling 'debug' since 'sdtrig' is enabled.");
+cpu->cfg.debug = true;
+}
+
 if (cpu->cfg.debug) {
 riscv_trigger_reset_hold(env);
 }
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 2040b90da0..0c57e1acd4 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -114,6 +114,7 @@ struct RISCVCPUConfig {
 bool ext_zvfbfwma;
 bool ext_zvfh;
 bool ext_zvfhmin;
+bool ext_sdtrig;
 bool ext_smaia;
 bool ext_ssaia;
 bool ext_sscofpmf;
diff --git a/target/riscv/debug.c b/target/riscv/debug.c
index 5f14b39b06..c40e727e12 100644
--- a/target/riscv/debug.c
+++ b/target/riscv/debug.c
@@ -100,13 +100,16 @@ static trigger_action_t get_trigger_action(CPURISCVState 
*env,
 target_ulong tdata1 = env->tdata1[trigger_index];
 int trigger_type = get_trigger_type(env, trigger_index);
 trigger_action_t action = DBG_ACTION_NONE;
+const RISCVCPUConfig *cfg = riscv_cpu_cfg(env);
 
 switch (trigger_type) {
 case TRIGGER_TYPE_AD_MATCH:
 action = (tdata1 & TYPE2_ACTION) >> 12;
 break;
 case TRIGGER_TYPE_AD_MATCH6:
-action = (tdata1 & TYPE6_ACTION) >> 12;
+if (cfg->ext_sdtrig) {
+action = (tdata1 & TYPE6_ACTION) >> 12;
+}
 break;
 case TRIGGER_TYPE_INST_CNT:
 case TRIGGER_TYPE_INT:
@@ -727,7 +730,12 @@ void tdata_csr_write(CPURISCVState *env, int tdata_index, 
target_ulong val)
 type2_reg_write(env, env->trigger_cur, tdata_index, val);
 break;
 case TRIGGER_TYPE_AD_MATCH6:
-type6_reg_write(env, env->trigger_cur, tdata_index, val);
+if (riscv_cpu_cfg(env)->ext_sdtrig) {
+type6_reg_write(env, env->trigger_cur, tdata_index, val);
+} else {
+qemu_log_mask(LOG_UNIMP, "trigger type: %d is not supported\n",
+  trigger_type);
+}
 break;
 case TRIGGER_TYPE_INST_CNT:
 itrigger_reg_write(env, env->trigger_cur, tdata_index, val);
@@ -750,9 +758,13 @@ void tdata_csr_write(CPURISCVState *env, int tdata_index, 
target_ulong val)
 
 target_ulong tinfo_csr_read(CPURISCVState *env)
 {
-/* assume all triggers support the same types of triggers */
-return BIT(TRIGGER_TYPE_AD_MATCH) |
-   BIT(TRIGGER_TYPE_AD_MATCH6);
+target_ulong ts = BIT(TRIGGER_TYPE_AD_MATCH);
+
+if (riscv_cpu_cfg(env)->ext_sdtrig) {
+ts |= BIT(TRIGGER_TYPE_AD_MATCH6);
+}
+
+return ts;
 }
 
 void riscv_cpu_debug_excp_handler(CPUState *cs)
@@ -803,6 +815,10 @@ bool riscv_cpu_debug_check_breakpoint(CPUState *cs)
 }
 break;
 case TRIGGER_TYPE_AD_MATCH6:
+if (!cpu->cfg.ext_sdtrig) {
+break;
+}
+
 ctrl = env->tdata1[i];
 pc = env->tdata2[i];
 
@@ -869,6 +885,10 @@ bool riscv_cpu_debug_check_watchpoint(CPUState *cs, 
CPUWatchpoint *wp)
 }
 break;
 case TRIGGER_TYPE_AD_MATCH6:
+if (!cpu->cfg.ext_sdtrig) {
+break;
+}
+
 ctrl = env->tdata1[i];
 addr = env->tdata2[i];
 flags = 0;
-- 
2.34.1

[PATCH v7 1/4] target/riscv: Check for valid itimer pointer before free

2024-03-14 Thread Himanshu Chauhan

Check if each element of array of pointers for itimer contains a non-null
pointer before freeing.

Signed-off-by: Himanshu Chauhan 
---
 target/riscv/debug.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/riscv/debug.c b/target/riscv/debug.c
index e30d99cc2f..5f14b39b06 100644
--- a/target/riscv/debug.c
+++ b/target/riscv/debug.c
@@ -938,7 +938,10 @@ void riscv_trigger_reset_hold(CPURISCVState *env)
 env->tdata3[i] = 0;
 env->cpu_breakpoint[i] = NULL;
 env->cpu_watchpoint[i] = NULL;
-timer_del(env->itrigger_timer[i]);
+if (env->itrigger_timer[i]) {
+timer_del(env->itrigger_timer[i]);
+env->itrigger_timer[i] = NULL;
+}
 }
 
 env->mcontext = 0;
-- 
2.34.1

[PATCH v7 4/4] target/riscv: Enable sdtrig for Ventana's Veyron CPUs

2024-03-14 Thread Himanshu Chauhan

Ventana's Veyron CPUs support sdtrig ISA extension. By default, enable
the sdtrig extension and disable the debug property for these CPUs.

Signed-off-by: Himanshu Chauhan 
---
 target/riscv/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 4231f36c1b..c9dda73748 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -569,6 +569,7 @@ static void rv64_veyron_v1_cpu_init(Object *obj)
 cpu->cfg.cbom_blocksize = 64;
 cpu->cfg.cboz_blocksize = 64;
 cpu->cfg.ext_zicboz = true;
+cpu->cfg.ext_sdtrig = true;
 cpu->cfg.ext_smaia = true;
 cpu->cfg.ext_ssaia = true;
 cpu->cfg.ext_sscofpmf = true;
-- 
2.34.1

[PATCH] target/s390x: improve cpu compatibility check error message

2024-03-14 Thread Claudio Fontana

some users were confused by this message showing under TCG:

Selected CPU generation is too new. Maximum supported model
in the configuration: 'xyz'

Try to clarify that the maximum can depend on the accel by
adding also the current accelerator to the message as such:

Selected CPU generation is too new. Maximum supported model
in the accelerator 'tcg' configuration: 'xyz'

Signed-off-by: Claudio Fontana 
---
 target/s390x/cpu_models.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 1a1c096122..0d6d8fc727 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -508,14 +508,14 @@ static void check_compatibility(const S390CPUModel 
*max_model,
 
 if (model->def->gen > max_model->def->gen) {
 error_setg(errp, "Selected CPU generation is too new. Maximum "
-   "supported model in the configuration: \'%s\'",
-   max_model->def->name);
+   "supported model in the accelerator \'%s\' configuration: 
\'%s\'",
+   current_accel_name(), max_model->def->name);
 return;
 } else if (model->def->gen == max_model->def->gen &&
model->def->ec_ga > max_model->def->ec_ga) {
 error_setg(errp, "Selected CPU GA level is too new. Maximum "
-   "supported model in the configuration: \'%s\'",
-   max_model->def->name);
+   "supported model in the accelerator \'%s\' configuration: 
\'%s\'",
+   current_accel_name(), max_model->def->name);
 return;
 }
 
@@ -537,7 +537,8 @@ static void check_compatibility(const S390CPUModel 
*max_model,
 error_setg(errp, " ");
 s390_feat_bitmap_to_ascii(missing, errp, error_prepend_missing_feat);
 error_prepend(errp, "Some features requested in the CPU model are not "
-  "available in the configuration: ");
+  "available in the accelerator \'%s\' configuration: ",
+  current_accel_name());
 }
 
 S390CPUModel *get_max_cpu_model(Error **errp)
-- 
2.26.2

[PATCH v7 0/4] Introduce sdtrig ISA extension

2024-03-14 Thread Himanshu Chauhan

Not all CPUs implement the debug triggers. Some CPUs
may implement only debug specification v0.13 and not the sdtrig ISA
extension.

This patchset adds sdtrig as an ISA extension which can be turned on or off by
the sdtrig= option. It is turned off by default.

When debug is true and sdtrig is false, the behaviour is as defined in debug
specification v0.13. If sdtrig is turned on, the behaviour is as defined
in the sdtrig ISA extension.
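
A minimal sketch of how the set of advertised trigger types depends on the sdtrig setting, mirroring the tinfo_csr_read() change in patch 2/4. The function name and the enum are hypothetical; the numeric type values (2 for mcontrol, 6 for mcontrol6) are an assumption taken from the RISC-V debug specification rather than from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed trigger type numbers: 2 = mcontrol, 6 = mcontrol6. */
enum {
    TYPE_MCONTROL = 2,
    TYPE_MCONTROL6 = 6,
};

/* Bitmask of trigger types to report; mcontrol6 only with sdtrig. */
static uint64_t supported_trigger_types(bool ext_sdtrig)
{
    uint64_t ts = UINT64_C(1) << TYPE_MCONTROL;

    if (ext_sdtrig) {
        ts |= UINT64_C(1) << TYPE_MCONTROL6;
    }
    return ts;
}
```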

The "sdtrig" string is concatenated to ISA string when debug or sdtrig is 
enabled.

Changes from v1:
  - Replaced the debug property with ext_sdtrig
  - Marked it experimental by naming it x-sdtrig
  - x-sdtrig is added to ISA string
  - Reversed the patch order

Changes from v2:
  - Mark debug property as deprecated and replace internally with sdtrig 
extension
  - setting/unsetting debug property shows warning and sets/unsets ext_sdtrig
  - sdtrig is added to ISA string as RISC-V debug specification is frozen

Changes from v3:
  - debug property is not deprecated but it is superseded by sdtrig extension
  - Mcontrol6 support is not published when only debug property is turned
on as debug spec v0.13 doesn't define mcontrol6 match triggers.
  - Enabling sdtrig extension turns off debug property and a warning is printed.
This doesn't break debug specification implementation since sdtrig is
backward compatible with debug specification.
  - Disable debug property and enable sdtrig by default for Ventana's Veyron
CPUs.

Changes from v4:
  - Enable debug flag if sdtrig was enabled but debug was disabled.
  - Other cosmetic changes.

Changes from v5:
  - Addressed comments from Andrew Jones

Changes from v6:
  - Cosmetic changes. All patches were run through checkpatch.pl.
No errors/warning.
  - Remove all debug || ext_sdtrig references. All decisions are based on
debug flag alone
  - Added null check before itimers are deleted. Without this check a
crash is observed.

Himanshu Chauhan (4):
  target/riscv: Check for valid itimer pointer before free
  target/riscv: Enable mcontrol6 triggers only when sdtrig is selected
  target/riscv: Expose sdtrig ISA extension
  target/riscv: Enable sdtrig for Ventana's Veyron CPUs

 target/riscv/cpu.c |  8 
 target/riscv/cpu_cfg.h |  1 +
 target/riscv/debug.c   | 35 +--
 3 files changed, 38 insertions(+), 6 deletions(-)

-- 
2.34.1

Re: [PATCH v3 1/2] vhost: dirty log should be per backend type

2024-03-14 Thread Si-Wei Liu





On 3/14/2024 8:25 AM, Eugenio Perez Martin wrote:

On Thu, Mar 14, 2024 at 9:38 AM Si-Wei Liu  wrote:

There could be a mix of both vhost-user and vhost-kernel clients
in the same QEMU process, where separate vhost loggers for the
specific vhost type have to be used. Make the vhost logger per
backend type, and have them properly reference counted.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Si-Wei Liu 
---
v2->v3:
   - remove non-effective assertion that can never be reached
   - do not return NULL from vhost_log_get()
   - add necessary assertions to vhost_log_get()

---
  hw/virtio/vhost.c | 50 ++
  1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2c9ac79..efe2f74 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -43,8 +43,8 @@
  do { } while (0)
  #endif

-static struct vhost_log *vhost_log;
-static struct vhost_log *vhost_log_shm;
+static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
+static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];

  /* Memslots used by backends that support private memslots (without an fd). */
  static unsigned int used_memslots;
@@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev *dev,
  r = -1;
  }

+if (r == 0) {
+assert(dev->vhost_ops->backend_type == backend_type);
+}
+
  return r;
  }

@@ -319,16 +323,22 @@ static struct vhost_log *vhost_log_alloc(uint64_t size, 
bool share)
  return log;
  }

-static struct vhost_log *vhost_log_get(uint64_t size, bool share)
+static struct vhost_log *vhost_log_get(VhostBackendType backend_type,
+   uint64_t size, bool share)
  {
-struct vhost_log *log = share ? vhost_log_shm : vhost_log;
+struct vhost_log *log;
+
+assert(backend_type > VHOST_BACKEND_TYPE_NONE);
+assert(backend_type < VHOST_BACKEND_TYPE_MAX);
+
+log = share ? vhost_log_shm[backend_type] : vhost_log[backend_type];

  if (!log || log->size != size) {
  log = vhost_log_alloc(size, share);
  if (share) {
-vhost_log_shm = log;
+vhost_log_shm[backend_type] = log;
  } else {
-vhost_log = log;
+vhost_log[backend_type] = log;
  }
  } else {
  ++log->refcnt;
@@ -340,11 +350,20 @@ static struct vhost_log *vhost_log_get(uint64_t size, 
bool share)
  static void vhost_log_put(struct vhost_dev *dev, bool sync)
  {
  struct vhost_log *log = dev->log;
+VhostBackendType backend_type;

  if (!log) {
  return;
  }

+assert(dev->vhost_ops);
+backend_type = dev->vhost_ops->backend_type;
+
+if (backend_type == VHOST_BACKEND_TYPE_NONE ||
+backend_type >= VHOST_BACKEND_TYPE_MAX) {
+return;
+}
+
  --log->refcnt;
  if (log->refcnt == 0) {
  /* Sync only the range covered by the old log */
@@ -352,13 +371,13 @@ static void vhost_log_put(struct vhost_dev *dev, bool 
sync)
  vhost_log_sync_range(dev, 0, dev->log_size * VHOST_LOG_CHUNK - 1);
  }

-if (vhost_log == log) {
+if (vhost_log[backend_type] == log) {
  g_free(log->log);
-vhost_log = NULL;
-} else if (vhost_log_shm == log) {
+vhost_log[backend_type] = NULL;
+} else if (vhost_log_shm[backend_type] == log) {
  qemu_memfd_free(log->log, log->size * sizeof(*(log->log)),
  log->fd);
-vhost_log_shm = NULL;
+vhost_log_shm[backend_type] = NULL;
  }

  g_free(log);
@@ -376,7 +395,8 @@ static bool vhost_dev_log_is_shared(struct vhost_dev *dev)

  static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
  {
-struct vhost_log *log = vhost_log_get(size, vhost_dev_log_is_shared(dev));
+struct vhost_log *log = vhost_log_get(dev->vhost_ops->backend_type,
+  size, vhost_dev_log_is_shared(dev));
  uint64_t log_base = (uintptr_t)log->log;
  int r;

@@ -2037,8 +2057,14 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice 
*vdev, bool vrings)
  uint64_t log_base;

  hdev->log_size = vhost_get_log_size(hdev);
-hdev->log = vhost_log_get(hdev->log_size,
+hdev->log = vhost_log_get(hdev->vhost_ops->backend_type,
+  hdev->log_size,
vhost_dev_log_is_shared(hdev));
+if (!hdev->log) {

I thought vhost_log_get couldn't return NULL :).

Sure, missed that. Will post a revised v4.

-Siwei


Other than that,

Acked-by: Eugenio Pérez 


+VHOST_OPS_DEBUG(r, "vhost_log_get failed");
+goto fail_vq;
+}
+
  log_base = (uintptr_t)hdev->log->log;
  r = hdev->vhost_ops->vhost_set_log_base(hdev,
  hdev->log_size ? log_base :

Re: [PATCH v3 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-14 Thread Si-Wei Liu





On 3/14/2024 8:34 AM, Eugenio Perez Martin wrote:

On Thu, Mar 14, 2024 at 9:38 AM Si-Wei Liu  wrote:

On setups with one or more virtio-net devices with vhost on,
the cost of each dirty tracking iteration grows with the number
of queues that are set up, e.g. on idle guests migration the
following is observed with virtio-net with vhost=on:

48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14

With high memory rates the symptom is lack of convergence as soon
as it has a vhost device with a sufficiently high number of queues,
or a sufficient number of vhost devices.

On every migration iteration (every 100msecs) it will redundantly
query the *shared log* as many times as the number of queues
configured with vhost in the guest. For the virtqueue data, this is necessary,
but not for the memory sections which are the same. So essentially
we end up scanning the dirty log too often.

To fix that, select a vhost device responsible for scanning the
log with regards to memory sections dirty tracking. It is selected
when we enable the logger (during migration) and cleared when we
disable the logger. If the vhost logger device goes away for some
reason, the logger will be re-selected from the rest of vhost
devices.

After making mem-section logger a singleton instance, constant cost
of 7%-9% (like the 1 queue report) will be seen, no matter how many
queues or how many vhost devices are configured:

48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14

Co-developed-by: Joao Martins 
Signed-off-by: Joao Martins 
Signed-off-by: Si-Wei Liu 
---
v2 -> v3:
   - add after-fix benchmark to commit log
   - rename vhost_log_dev_enabled to vhost_dev_should_log
   - remove unneeded comparisons for backend_type
   - use QLIST array instead of single flat list to store vhost
 logger devices
   - simplify logger election logic

---
  hw/virtio/vhost.c | 63 ++-
  include/hw/virtio/vhost.h |  1 +
  2 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index efe2f74..d91858b 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -45,6 +45,7 @@

  static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
  static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
+static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];

  /* Memslots used by backends that support private memslots (without an fd). */
  static unsigned int used_memslots;
@@ -149,6 +150,43 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
  }
  }

+static inline bool vhost_dev_should_log(struct vhost_dev *dev)
+{
+assert(dev->vhost_ops);
+assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
+assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
+
+return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
+}
+
+static inline void vhost_dev_elect_mem_logger(struct vhost_dev *hdev, bool add)
+{
+VhostBackendType backend_type;
+
+assert(hdev->vhost_ops);
+
+backend_type = hdev->vhost_ops->backend_type;
+assert(backend_type > VHOST_BACKEND_TYPE_NONE);
+assert(backend_type < VHOST_BACKEND_TYPE_MAX);
+
+if (add && !QLIST_IS_INSERTED(hdev, logdev_entry)) {
+if (QLIST_EMPTY(&vhost_log_devs[backend_type])) {
+QLIST_INSERT_HEAD(&vhost_log_devs[backend_type],
+  hdev, logdev_entry);
+} else {
+/*
+ * The first vhost_device in the list is selected as the shared
+ * logger to scan memory sections. Put new entry next to the head
+ * to avoid inadvertent change to the underlying logger device.
+ */

Why is changing the logger device a problem? All the code paths are
either changing the QLIST or logging, aren't they?

Changing the logger device doesn't affect functionality, but it may
have an inadvertent effect on cache locality, which is particularly
relevant to the log scanning process in the hot path. The code makes
sure there is no churn in the leading logger selection when a new vhost
device is added, unless the selected logger device goes away and another
logger has to be re-elected.


-Siwei
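The election rule discussed above can be sketched outside QEMU as follows. This is only an illustration: `struct dev`, `struct dev_list`, `elect_add()`, `elect_remove()` and `should_log()` are hypothetical names, and a plain singly-linked list stands in for the QLIST macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vhost_dev; only the list link matters. */
struct dev {
    int id;
    bool inserted;
    struct dev *next;
};

/* Hypothetical per-backend list; the first entry is the elected logger. */
struct dev_list {
    struct dev *first;
};

/* Add a device without displacing the current head, if there is one. */
static void elect_add(struct dev_list *list, struct dev *d)
{
    assert(list != NULL && d != NULL);
    if (d->inserted) {
        return;
    }
    if (list->first == NULL) {
        d->next = NULL;
        list->first = d;
    } else {
        /* Insert right after the head so the logger stays the same. */
        d->next = list->first->next;
        list->first->next = d;
    }
    d->inserted = true;
}

/* Remove a device; if it was the head, the next entry becomes the logger. */
static void elect_remove(struct dev_list *list, struct dev *d)
{
    struct dev **pp;

    assert(list != NULL && d != NULL);
    if (!d->inserted) {
        return;
    }
    for (pp = &list->first; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == d) {
            *pp = d->next;
            break;
        }
    }
    d->next = NULL;
    d->inserted = false;
}

/* A device logs only while it is the head of its list. */
static bool should_log(const struct dev_list *list, const struct dev *d)
{
    return list->first == d;
}
```

With this shape, adding devices never changes which one logs; only removing the current head triggers a re-election.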




+QLIST_INSERT_AFTER(QLIST_FIRST(&vhost_log_devs[backend_type]),
+   hdev, logdev_entry);
+}
+} else if (!add && QLIST_IS_INSERTED(hdev, logdev_entry)) {
+QLIST_REMOVE(hdev, logdev_entry);
+}
+}
+
  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
 MemoryRegionSection *section,
 hwaddr first,
@@ -166,12 +204,14 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
  start_addr = MAX(first,

Re: [PATCH v2 0/2] migration mapped-ram fixes

2024-03-14 Thread Peter Xu

On Wed, Mar 13, 2024 at 06:28:22PM -0300, Fabiano Rosas wrote:
> Hi,
> 
> In this v2:
> 
> patch 1 - The fix for the ioc leaks, now including the main channel
> 
> patch 2 - A fix for an fd: migration case I thought I had written code
>   for, but obviously didn't.

The two issues are separate.  I assume patch 2 will need a rework, but I
queued patch 1 first.  Thanks.

-- 
Peter Xu

[PATCH for 9.0 v15 06/10] target/riscv/vector_helpers: do early exit when vstart >= vl

2024-03-14 Thread Daniel Henrique Barboza

We're going to make changes that will require each helper to be
responsible for the 'vstart' management, i.e. we will relax the
'vstart < vl' assumption that helpers have today.

Helpers are usually able to deal with vstart >= vl, i.e. doing nothing
aside from setting vstart = 0 at the end, but the tail update functions
will update the tail regardless of vstart being valid or not. Unifying
the tail update process in a single function that would handle the
vstart >= vl case isn't trivial (see [1] for more info).

This patch takes a blunt approach: do an early exit in every single
vector helper if vstart >= vl, unless the helper is guarded with
vstart_eq_zero in the translation. For those cases the helper is ready
to deal with cases where vl might be zero, i.e. throwing exceptions
based on it like vcpop_m() and vfirst_m().

Helpers that weren't changed:

- vcpop_m(), vfirst_m(), vmsetm(), GEN_VEXT_VIOTA_M(): these are guarded
  directly with vstart_eq_zero;

- GEN_VEXT_VCOMPRESS_VM(): guarded with vcompress_vm_check() that checks
  vstart_eq_zero;

- GEN_VEXT_RED(): guarded with either reduction_check() or
  reduction_widen_check(), both check vstart_eq_zero;

- GEN_VEXT_FRED(): guarded with either freduction_check() or
  freduction_widen_check(), both check vstart_eq_zero.

Another exception is vext_ldst_whole(), which operates on effective vector
length regardless of the current settings in vtype and vl.

[1] 
https://lore.kernel.org/qemu-riscv/1590234b-0291-432a-a0fa-c5a687609...@linux.alibaba.com/

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/vcrypto_helper.c   | 32 
 target/riscv/vector_helper.c| 66 +
 target/riscv/vector_internals.c |  4 ++
 target/riscv/vector_internals.h |  9 +
 4 files changed, 111 insertions(+)

diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index e2d719b13b..f7423df226 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -222,6 +222,8 @@ static inline void xor_round_key(AESState *round_state, 
AESState *round_key)
 uint32_t total_elems = vext_get_total_elems(env, desc, 4);\
 uint32_t vta = vext_vta(desc);\
   \
+VSTART_CHECK_EARLY_EXIT(env); \
+  \
 for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {\
 AESState round_key;   \
 round_key.d[0] = *((uint64_t *)vs2 + H8(i * 2 + 0));  \
@@ -246,6 +248,8 @@ static inline void xor_round_key(AESState *round_state, 
AESState *round_key)
 uint32_t total_elems = vext_get_total_elems(env, desc, 4);\
 uint32_t vta = vext_vta(desc);\
   \
+VSTART_CHECK_EARLY_EXIT(env); \
+  \
 for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {\
 AESState round_key;   \
 round_key.d[0] = *((uint64_t *)vs2 + H8(0));  \
@@ -305,6 +309,8 @@ void HELPER(vaeskf1_vi)(void *vd_vptr, void *vs2_vptr, 
uint32_t uimm,
 uint32_t total_elems = vext_get_total_elems(env, desc, 4);
 uint32_t vta = vext_vta(desc);
 
+VSTART_CHECK_EARLY_EXIT(env);
+
 uimm &= 0b1111;
 if (uimm > 10 || uimm == 0) {
 uimm ^= 0b1000;
@@ -351,6 +357,8 @@ void HELPER(vaeskf2_vi)(void *vd_vptr, void *vs2_vptr, 
uint32_t uimm,
 uint32_t total_elems = vext_get_total_elems(env, desc, 4);
 uint32_t vta = vext_vta(desc);
 
+VSTART_CHECK_EARLY_EXIT(env);
+
 uimm &= 0b1111;
 if (uimm > 14 || uimm < 2) {
 uimm ^= 0b1000;
@@ -457,6 +465,8 @@ void HELPER(vsha2ms_vv)(void *vd, void *vs1, void *vs2, 
CPURISCVState *env,
 uint32_t total_elems;
 uint32_t vta = vext_vta(desc);
 
+VSTART_CHECK_EARLY_EXIT(env);
+
 for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
 if (sew == MO_32) {
 vsha2ms_e32(((uint32_t *)vd) + i * 4, ((uint32_t *)vs1) + i * 4,
@@ -572,6 +582,8 @@ void HELPER(vsha2ch32_vv)(void *vd, void *vs1, void *vs2, 
CPURISCVState *env,
 uint32_t total_elems;
 uint32_t vta = vext_vta(desc);
 
+VSTART_CHECK_EARLY_EXIT(env);
+
 for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
 vsha2c_32(((uint32_t *)vs2) + 4 * i, ((uint32_t *)vd) + 4 * i,
   ((uint32_t *)vs1) + 4 * i + 2);
@@ -590,6 +602,8 @@ void HELPER(vsha2ch64_vv)(void *vd, void *vs1, void *vs2, 
CPURISCVState *env,
 uint32_t

[PATCH for 9.0 v15 09/10] target/riscv: enable 'vstart_eq_zero' in the end of insns

2024-03-14 Thread Daniel Henrique Barboza

From: Ivan Klokov 

The vstart_eq_zero flag is updated at the beginning of the translation
phase from the env->vstart variable. During the execution phase all
functions will set env->vstart = 0 after a successful execution, but the
vstart_eq_zero flag remains the same as at the start of the block. This
will wrongly cause SIGILLs in translations that require env->vstart = 0
and might be reading vstart_eq_zero = false.

This patch adds a new finalize_rvv_inst() helper that is called at the
end of each vector instruction that will both update vstart_eq_zero and
do a mark_vs_dirty().

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1976
Signed-off-by: Ivan Klokov 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
 target/riscv/insn_trans/trans_rvbf16.c.inc |  6 +-
 target/riscv/insn_trans/trans_rvv.c.inc| 83 --
 target/riscv/insn_trans/trans_rvvk.c.inc   | 12 ++--
 target/riscv/translate.c   |  6 ++
 4 files changed, 59 insertions(+), 48 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvbf16.c.inc 
b/target/riscv/insn_trans/trans_rvbf16.c.inc
index a842e76a6b..0a9cd1ec31 100644
--- a/target/riscv/insn_trans/trans_rvbf16.c.inc
+++ b/target/riscv/insn_trans/trans_rvbf16.c.inc
@@ -83,7 +83,7 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, 
arg_vfncvtbf16_f_f_w *a)
ctx->cfg_ptr->vlenb,
ctx->cfg_ptr->vlenb, data,
gen_helper_vfncvtbf16_f_f_w);
-mark_vs_dirty(ctx);
+finalize_rvv_inst(ctx);
 return true;
 }
 return false;
@@ -108,7 +108,7 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, 
arg_vfwcvtbf16_f_f_v *a)
ctx->cfg_ptr->vlenb,
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwcvtbf16_f_f_v);
-mark_vs_dirty(ctx);
+finalize_rvv_inst(ctx);
 return true;
 }
 return false;
@@ -135,7 +135,7 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, 
arg_vfwmaccbf16_vv *a)
ctx->cfg_ptr->vlenb,
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwmaccbf16_vv);
-mark_vs_dirty(ctx);
+finalize_rvv_inst(ctx);
 return true;
 }
 return false;
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 401ee939b8..7d84e7d812 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -167,7 +167,7 @@ static bool do_vsetvl(DisasContext *s, int rd, int rs1, 
TCGv s2)
 
 gen_helper_vsetvl(dst, tcg_env, s1, s2);
 gen_set_gpr(s, rd, dst);
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 
 gen_update_pc(s, s->cur_insn_len);
 lookup_and_goto_ptr(s);
@@ -187,7 +187,7 @@ static bool do_vsetivli(DisasContext *s, int rd, TCGv s1, 
TCGv s2)
 
 gen_helper_vsetvl(dst, tcg_env, s1, s2);
 gen_set_gpr(s, rd, dst);
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 gen_update_pc(s, s->cur_insn_len);
 lookup_and_goto_ptr(s);
 s->base.is_jmp = DISAS_NORETURN;
@@ -657,6 +657,7 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, 
uint32_t data,
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
 }
 
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -812,6 +813,7 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, 
uint32_t rs2,
 
 fn(dest, mask, base, stride, tcg_env, desc);
 
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -913,6 +915,7 @@ static bool ldst_index_trans(uint32_t vd, uint32_t rs1, 
uint32_t vs2,
 
 fn(dest, mask, base, index, tcg_env, desc);
 
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1043,7 +1046,7 @@ static bool ldff_trans(uint32_t vd, uint32_t rs1, 
uint32_t data,
 
 fn(dest, mask, base, tcg_env, desc);
 
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1100,6 +1103,7 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, 
uint32_t nf,
 
 fn(dest, base, tcg_env, desc);
 
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1189,7 +1193,7 @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn 
*gvec_fn,
tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
 }
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1240,7 +1244,7 @@ static bool opivx_trans(uint32_t vd, uint32_t rs1, 
uint32_t vs2, uint32_t vm,
 
 fn(dest, mask, src1, src2, tcg_env, desc);
 
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1265,7 +1269,7 @@ do_opivx_gvec(DisasContext *s, arg_rmrr *a, GVecGen2sFn 
*gvec_fn,
 gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2),
 src1, MAXSZ(s), MAXSZ(s));
 
-mark_vs_dirty(s);
+finalize_rvv_inst(s);

[PATCH for 9.0 v15 08/10] trans_rvv.c.inc: remove redundant mark_vs_dirty() calls

2024-03-14 Thread Daniel Henrique Barboza

trans_vmv_v_i, trans_vfmv_v_f and the trans_##NAME macro from
GEN_VMV_WHOLE_TRANS() are calling mark_vs_dirty() in both branches of
their 'if' conditionals.

Call it just once at the end, as other functions are doing.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 7931fb2f3f..401ee939b8 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -2065,7 +2065,6 @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a)
 if (s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) {
 tcg_gen_gvec_dup_imm(s->sew, vreg_ofs(s, a->rd),
  MAXSZ(s), MAXSZ(s), simm);
-mark_vs_dirty(s);
 } else {
 TCGv_i32 desc;
 TCGv_i64 s1;
@@ -2083,9 +2082,8 @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a)
   s->cfg_ptr->vlenb, data));
 tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, a->rd));
 fns[s->sew](dest, s1, tcg_env, desc);
-
-mark_vs_dirty(s);
 }
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -2612,7 +2610,6 @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f 
*a)
 
 tcg_gen_gvec_dup_i64(s->sew, vreg_ofs(s, a->rd),
  MAXSZ(s), MAXSZ(s), t1);
-mark_vs_dirty(s);
 } else {
 TCGv_ptr dest;
 TCGv_i32 desc;
@@ -2635,9 +2632,8 @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f 
*a)
 tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, a->rd));
 
 fns[s->sew - 1](dest, t1, tcg_env, desc);
-
-mark_vs_dirty(s);
 }
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -3560,12 +3556,11 @@ static bool trans_##NAME(DisasContext *s, arg_##NAME * 
a)   \
 if (s->vstart_eq_zero) {\
 tcg_gen_gvec_mov(s->sew, vreg_ofs(s, a->rd),\
  vreg_ofs(s, a->rs2), maxsz, maxsz);\
-mark_vs_dirty(s);   \
 } else {\
 tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), \
tcg_env, maxsz, maxsz, 0, gen_helper_vmvr_v); \
-mark_vs_dirty(s);   \
 }   \
+mark_vs_dirty(s);   \
 return true;\
 }   \
 return false;   \
-- 
2.44.0

Re: [PATCH v2 2/2] migration/multifd: Ensure we're not given a socket for file migration

2024-03-14 Thread Peter Xu

On Thu, Mar 14, 2024 at 01:50:07PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Thu, Mar 14, 2024 at 11:10:12AM -0400, Peter Xu wrote:
> >> On Wed, Mar 13, 2024 at 06:28:24PM -0300, Fabiano Rosas wrote:
> >> > When doing migration using the fd: URI, the incoming migration starts
> >> > before the user has passed the file descriptor to QEMU. This means
> >> > that the checks at migration_channels_and_transport_compatible()
> >> > happen too soon and we need to allow a migration channel of type
> >> > SOCKET_ADDRESS_TYPE_FD even though socket migration is not supported
> >> > with multifd.
> >> 
> >> Hmm, bear with me if this is a stupid one.. why can the incoming migration
> >> start _before_ the user passed in the fd?
> >> 
> >> IOW, why can't we rely on a single fd_is_socket() check for
> >> SOCKET_ADDRESS_TYPE_FD in transport_supports_multi_channels()?
> >> 
> >> > 
> >> > The commit decdc76772 ("migration/multifd: Add mapped-ram support to
> >> > fd: URI") was supposed to add a second check prior to starting
> >> > migration to make sure a socket fd is not passed instead of a file fd,
> >> > but failed to do so.
> >> > 
> >> > Add the missing verification.
> >> > 
> >> > Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: 
> >> > URI")
> >> > Signed-off-by: Fabiano Rosas 
> >> > ---
> >> >  migration/fd.c   | 8 
> >> >  migration/file.c | 7 +++
> >> >  2 files changed, 15 insertions(+)
> >> > 
> >> > diff --git a/migration/fd.c b/migration/fd.c
> >> > index 39a52e5c90..c07030f715 100644
> >> > --- a/migration/fd.c
> >> > +++ b/migration/fd.c
> >> > @@ -22,6 +22,7 @@
> >> >  #include "migration.h"
> >> >  #include "monitor/monitor.h"
> >> >  #include "io/channel-file.h"
> >> > +#include "io/channel-socket.h"
> >> >  #include "io/channel-util.h"
> >> >  #include "options.h"
> >> >  #include "trace.h"
> >> > @@ -95,6 +96,13 @@ void fd_start_incoming_migration(const char *fdname, 
> >> > Error **errp)
> >> >  }
> >> >  
> >> >  if (migrate_multifd()) {
> >> > +if (fd_is_socket(fd)) {
> >> > +error_setg(errp,
> >> > +   "Multifd migration to a socket FD is not 
> >> > supported");
> >> > +object_unref(ioc);
> >> > +return;
> >> > +}
> >
> > And... I just noticed this is forbidding multifd+socket+fd in general?  But
> > isn't that the majority of multifd usage when with libvirt over sockets?
> 
> I didn't think multifd supported socket fds, does it? I don't see code
> to create the multiple channels anywhere. How would that work? Multiple
> threads writing to a single socket fd? I'm a bit confused.

You're probably right.

I somehow had the assumption that Libvirt always passed fds over to
QEMU for migration, but indeed multifd at least shouldn't support it as I
read the code again..  It'd be good if Dan could help clarify when fds
are used in migrations.

-- 
Peter Xu

[PATCH for 9.0 v15 02/10] trans_rvv.c.inc: set vstart = 0 in int scalar move insns

2024-03-14 Thread Daniel Henrique Barboza

trans_vmv_x_s, trans_vmv_s_x, trans_vfmv_f_s and trans_vfmv_s_f aren't
setting vstart = 0 after execution. This is usually done by a helper in
vector_helper.c but these functions don't use helpers.

We'll set vstart after any potential 'over' brconds, and that will also
mandate a mark_vs_dirty().

Fixes: dedc53cbc9 ("target/riscv: rvv-1.0: integer scalar move instructions")
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index e42728990e..8c16a9f5b3 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -3373,6 +3373,8 @@ static bool trans_vmv_x_s(DisasContext *s, arg_vmv_x_s *a)
 vec_element_loadi(s, t1, a->rs2, 0, true);
 tcg_gen_trunc_i64_tl(dest, t1);
 gen_set_gpr(s, a->rd, dest);
+tcg_gen_movi_tl(cpu_vstart, 0);
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -3399,8 +3401,9 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
 s1 = get_gpr(s, a->rs1, EXT_NONE);
 tcg_gen_ext_tl_i64(t1, s1);
 vec_element_storei(s, a->rd, 0, t1);
-mark_vs_dirty(s);
 gen_set_label(over);
+tcg_gen_movi_tl(cpu_vstart, 0);
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -3427,6 +3430,8 @@ static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s 
*a)
 }
 
 mark_fs_dirty(s);
+tcg_gen_movi_tl(cpu_vstart, 0);
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -3452,8 +3457,9 @@ static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f 
*a)
 do_nanbox(s, t1, cpu_fpr[a->rs1]);
 
 vec_element_storei(s, a->rd, 0, t1);
-mark_vs_dirty(s);
 gen_set_label(over);
+tcg_gen_movi_tl(cpu_vstart, 0);
+mark_vs_dirty(s);
 return true;
 }
 return false;
-- 
2.44.0

[PATCH for 9.0 v15 07/10] target/riscv: remove 'over' brconds from vector trans

2024-03-14 Thread Daniel Henrique Barboza

All helpers that rely on vstart >= vl are now doing early exits using
the VSTART_CHECK_EARLY_EXIT() macro. This macro will not only exit the
helper but also clear vstart.

We're still left with brconds that are skipping the helper, which is the
only place where we're clearing vstart. The pattern goes like this:

tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
(... calls helper that clears vstart ...)
gen_set_label(over);
return true;

This means that every time we jump to 'over' we're not clearing vstart,
which is an oversight that we're making across the board.

Instead of setting vstart = 0 manually after each 'over' jump, remove
those brconds that are skipping helpers. The exception will be
trans_vmv_s_x() and trans_vfmv_s_f(): they don't use a helper and are
already clearing vstart manually in the 'over' label.

While we're at it, remove the (vl == 0) brconds from trans_rvbf16.c.inc
too since they're unneeded.

Suggested-by: Richard Henderson 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
 target/riscv/insn_trans/trans_rvbf16.c.inc | 12 ---
 target/riscv/insn_trans/trans_rvv.c.inc| 99 --
 target/riscv/insn_trans/trans_rvvk.c.inc   | 18 
 3 files changed, 129 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvbf16.c.inc 
b/target/riscv/insn_trans/trans_rvbf16.c.inc
index 8ee99df3f3..a842e76a6b 100644
--- a/target/riscv/insn_trans/trans_rvbf16.c.inc
+++ b/target/riscv/insn_trans/trans_rvbf16.c.inc
@@ -71,11 +71,8 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, 
arg_vfncvtbf16_f_f_w *a)
 
 if (opfv_narrow_check(ctx, a) && (ctx->sew == MO_16)) {
 uint32_t data = 0;
-TCGLabel *over = gen_new_label();
 
 gen_set_rm_chkfrm(ctx, RISCV_FRM_DYN);
-tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
-tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
 
 data = FIELD_DP32(data, VDATA, VM, a->vm);
 data = FIELD_DP32(data, VDATA, LMUL, ctx->lmul);
@@ -87,7 +84,6 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, 
arg_vfncvtbf16_f_f_w *a)
ctx->cfg_ptr->vlenb, data,
gen_helper_vfncvtbf16_f_f_w);
 mark_vs_dirty(ctx);
-gen_set_label(over);
 return true;
 }
 return false;
@@ -100,11 +96,8 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, 
arg_vfwcvtbf16_f_f_v *a)
 
 if (opfv_widen_check(ctx, a) && (ctx->sew == MO_16)) {
 uint32_t data = 0;
-TCGLabel *over = gen_new_label();
 
 gen_set_rm_chkfrm(ctx, RISCV_FRM_DYN);
-tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
-tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
 
 data = FIELD_DP32(data, VDATA, VM, a->vm);
 data = FIELD_DP32(data, VDATA, LMUL, ctx->lmul);
@@ -116,7 +109,6 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, 
arg_vfwcvtbf16_f_f_v *a)
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwcvtbf16_f_f_v);
 mark_vs_dirty(ctx);
-gen_set_label(over);
 return true;
 }
 return false;
@@ -130,11 +122,8 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, 
arg_vfwmaccbf16_vv *a)
 if (require_rvv(ctx) && vext_check_isa_ill(ctx) && (ctx->sew == MO_16) &&
 vext_check_dss(ctx, a->rd, a->rs1, a->rs2, a->vm)) {
 uint32_t data = 0;
-TCGLabel *over = gen_new_label();
 
 gen_set_rm_chkfrm(ctx, RISCV_FRM_DYN);
-tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
-tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
 
 data = FIELD_DP32(data, VDATA, VM, a->vm);
 data = FIELD_DP32(data, VDATA, LMUL, ctx->lmul);
@@ -147,7 +136,6 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, 
arg_vfwmaccbf16_vv *a)
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwmaccbf16_vv);
 mark_vs_dirty(ctx);
-gen_set_label(over);
 return true;
 }
 return false;
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 1366445e1f..7931fb2f3f 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -616,9 +616,6 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, 
uint32_t data,
 TCGv base;
 TCGv_i32 desc;
 
-TCGLabel *over = gen_new_label();
-tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
 dest = tcg_temp_new_ptr();
 mask = tcg_temp_new_ptr();
 base = get_gpr(s, rs1, EXT_NONE);
@@ -660,7 +657,6 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, 
uint32_t data,
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
 }
 
-gen_set_label(over);
 return true;
 }
 
@@ -802,9 +798,6 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, 
uint32_t rs2,
 TCGv

[PATCH for 9.0 v15 00/10] target/riscv: vector fixes

2024-03-14 Thread Daniel Henrique Barboza

Hi,

The series was renamed to reflect that at this point we're fixing more
things than just vstart management.

In this new version a couple fixes were added:

- patch 3 (new) fixes the memcpy endianness in 'vmvr_v', as suggested by
  Richard;

- patch 5 (new) fixes ldst_whole insns to now clear vstart in all cases.
  The fix was proposed by Max.

Another notable change was made in patch 6 (patch 4 from v14). We're not
doing early exits in helpers that are gated by vstart_eq_zero. This was
found to cause side-effects with insns that want to send faults if vl =
0, and for the rest it becomes a moot check since vstart is guaranteed to
be zero beforehand.

Series based on master.

Patches missing acks: 3, 4, 5

Changes from v14:
- patch 3 (new):
  - make 'vmvr_v' big endian compliant
- patch 5 (new):
  - make ldst_whole insns clear vstart in all code paths
- patch 6 (patch 4 from v14):
  - do not add early exits on helpers that are gated with vstart_eq_zero
- v14 link: 
https://lore.kernel.org/qemu-riscv/20240313220141.427730-1-dbarb...@ventanamicro.com/

Daniel Henrique Barboza (9):
  target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX()
  trans_rvv.c.inc: set vstart = 0 in int scalar move insns
  target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess
  target/riscv: always clear vstart in whole vec move insns
  target/riscv: always clear vstart for ldst_whole insns
  target/riscv/vector_helpers: do early exit when vstart >= vl
  target/riscv: remove 'over' brconds from vector trans
  trans_rvv.c.inc: remove redundant mark_vs_dirty() calls
  target/riscv/vector_helper.c: optimize loops in ldst helpers

Ivan Klokov (1):
  target/riscv: enable 'vstart_eq_zero' in the end of insns

 target/riscv/insn_trans/trans_rvbf16.c.inc |  18 +-
 target/riscv/insn_trans/trans_rvv.c.inc| 244 ++---
 target/riscv/insn_trans/trans_rvvk.c.inc   |  30 +--
 target/riscv/translate.c   |   6 +
 target/riscv/vcrypto_helper.c  |  32 +++
 target/riscv/vector_helper.c   |  93 +++-
 target/riscv/vector_internals.c|   4 +
 target/riscv/vector_internals.h|   9 +
 8 files changed, 220 insertions(+), 216 deletions(-)

-- 
2.44.0

[PATCH for 9.0 v15 10/10] target/riscv/vector_helper.c: optimize loops in ldst helpers

2024-03-14 Thread Daniel Henrique Barboza

Change the for loops in ldst helpers to do a single increment of the
counter and assign it to env->vstart, to avoid re-reading vstart
every time.

Suggested-by: Richard Henderson 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/vector_helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 63a1083f03..fa139040f8 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -209,7 +209,7 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
 
 VSTART_CHECK_EARLY_EXIT(env);
 
-for (i = env->vstart; i < env->vl; i++, env->vstart++) {
+for (i = env->vstart; i < env->vl; env->vstart = ++i) {
 k = 0;
 while (k < nf) {
 if (!vm && !vext_elem_mask(v0, i)) {
@@ -277,7 +277,7 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState 
*env, uint32_t desc,
 VSTART_CHECK_EARLY_EXIT(env);
 
 /* load bytes from guest memory */
-for (i = env->vstart; i < evl; i++, env->vstart++) {
+for (i = env->vstart; i < evl; env->vstart = ++i) {
 k = 0;
 while (k < nf) {
 target_ulong addr = base + ((i * nf + k) << log2_esz);
@@ -393,7 +393,7 @@ vext_ldst_index(void *vd, void *v0, target_ulong base,
 VSTART_CHECK_EARLY_EXIT(env);
 
 /* load bytes from guest memory */
-for (i = env->vstart; i < env->vl; i++, env->vstart++) {
+for (i = env->vstart; i < env->vl; env->vstart = ++i) {
 k = 0;
 while (k < nf) {
 if (!vm && !vext_elem_mask(v0, i)) {
-- 
2.44.0

[PATCH for 9.0 v15 05/10] target/riscv: always clear vstart for ldst_whole insns

2024-03-14 Thread Daniel Henrique Barboza

Commit 8ff8ac6329 added a conditional to guard the vext_ldst_whole()
helper if vstart >= evl. But by skipping the helper we're also not
setting vstart = 0 at the end of the insns, which is incorrect.

We'll move the conditional to vext_ldst_whole(), in line with the
removal of all 'vstart >= vl' brconds that the next patch will do. The
idea is to make the helpers responsible for their own vstart management.

Fix ldst_whole insns by:

- removing the brcond that skips the helper if vstart is >= evl;

- making vext_ldst_whole() do an early exit with the same check, where
  evl = (vlenb * nf) >> log2_esz, while also clearing vstart.

The 'width' param is now unneeded in ldst_whole_trans() and is also
removed. It was used for the evl calculation for the brcond and has no
other use now.  The 'width' is reflected in vext_ldst_whole() via
log2_esz, which is encoded by GEN_VEXT_LD_WHOLE() as
"ctzl(sizeof(ETYPE))".
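The equivalence between the old 'width' division and the log2_esz shift can be checked with a few lines of standalone C. The function names below are hypothetical, and `log2_of_size()` is a portable stand-in for what ctzl() yields on a power-of-two size.

```c
#include <assert.h>
#include <stdint.h>

/* Old translation-time form: divide by the element width in bytes. */
static uint32_t evl_from_width(uint32_t vlenb, uint32_t nf, uint32_t width)
{
    assert(width != 0);
    return vlenb * nf / width;
}

/* Helper form: shift right by log2 of the element size. */
static uint32_t evl_from_log2_esz(uint32_t vlenb, uint32_t nf,
                                  uint32_t log2_esz)
{
    return (vlenb * nf) >> log2_esz;
}

/* log2 of a power-of-two size, standing in for ctzl(sizeof(ETYPE)). */
static uint32_t log2_of_size(uint32_t size)
{
    uint32_t n = 0;

    assert(size != 0 && (size & (size - 1)) == 0);
    while ((size >> n) != 1) {
        n++;
    }
    return n;
}
```

For example, with vlenb = 16 and nf = 2, a 4-byte element gives evl = 32 / 4 = 8 in the old form and 32 >> 2 = 8 in the helper form.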

Suggested-by: Max Chou 
Fixes: 8ff8ac6329 ("target/riscv: rvv: Add missing early exit condition for 
whole register load/store")
Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 52 +++--
 target/riscv/vector_helper.c|  5 +++
 2 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 52c26a7834..1366445e1f 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -1097,13 +1097,9 @@ GEN_VEXT_TRANS(vle64ff_v, MO_64, r2nfvm, ldff_op, 
ld_us_check)
 typedef void gen_helper_ldst_whole(TCGv_ptr, TCGv, TCGv_env, TCGv_i32);
 
 static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
- uint32_t width, gen_helper_ldst_whole *fn,
+ gen_helper_ldst_whole *fn,
  DisasContext *s)
 {
-uint32_t evl = s->cfg_ptr->vlenb * nf / width;
-TCGLabel *over = gen_new_label();
-tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, evl, over);
-
 TCGv_ptr dest;
 TCGv base;
 TCGv_i32 desc;
@@ -1120,8 +1116,6 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, 
uint32_t nf,
 
 fn(dest, base, tcg_env, desc);
 
-gen_set_label(over);
-
 return true;
 }
 
@@ -1129,42 +1123,42 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, 
uint32_t nf,
  * load and store whole register instructions ignore vtype and vl setting.
  * Thus, we don't need to check vill bit. (Section 7.9)
  */
-#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF, WIDTH)   \
+#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF)\
 static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
 { \
 if (require_rvv(s) && \
 QEMU_IS_ALIGNED(a->rd, ARG_NF)) { \
-return ldst_whole_trans(a->rd, a->rs1, ARG_NF, WIDTH, \
+return ldst_whole_trans(a->rd, a->rs1, ARG_NF,\
 gen_helper_##NAME, s);\
 } \
 return false; \
 }
 
-GEN_LDST_WHOLE_TRANS(vl1re8_v,  1, 1)
-GEN_LDST_WHOLE_TRANS(vl1re16_v, 1, 2)
-GEN_LDST_WHOLE_TRANS(vl1re32_v, 1, 4)
-GEN_LDST_WHOLE_TRANS(vl1re64_v, 1, 8)
-GEN_LDST_WHOLE_TRANS(vl2re8_v,  2, 1)
-GEN_LDST_WHOLE_TRANS(vl2re16_v, 2, 2)
-GEN_LDST_WHOLE_TRANS(vl2re32_v, 2, 4)
-GEN_LDST_WHOLE_TRANS(vl2re64_v, 2, 8)
-GEN_LDST_WHOLE_TRANS(vl4re8_v,  4, 1)
-GEN_LDST_WHOLE_TRANS(vl4re16_v, 4, 2)
-GEN_LDST_WHOLE_TRANS(vl4re32_v, 4, 4)
-GEN_LDST_WHOLE_TRANS(vl4re64_v, 4, 8)
-GEN_LDST_WHOLE_TRANS(vl8re8_v,  8, 1)
-GEN_LDST_WHOLE_TRANS(vl8re16_v, 8, 2)
-GEN_LDST_WHOLE_TRANS(vl8re32_v, 8, 4)
-GEN_LDST_WHOLE_TRANS(vl8re64_v, 8, 8)
+GEN_LDST_WHOLE_TRANS(vl1re8_v,  1)
+GEN_LDST_WHOLE_TRANS(vl1re16_v, 1)
+GEN_LDST_WHOLE_TRANS(vl1re32_v, 1)
+GEN_LDST_WHOLE_TRANS(vl1re64_v, 1)
+GEN_LDST_WHOLE_TRANS(vl2re8_v,  2)
+GEN_LDST_WHOLE_TRANS(vl2re16_v, 2)
+GEN_LDST_WHOLE_TRANS(vl2re32_v, 2)
+GEN_LDST_WHOLE_TRANS(vl2re64_v, 2)
+GEN_LDST_WHOLE_TRANS(vl4re8_v,  4)
+GEN_LDST_WHOLE_TRANS(vl4re16_v, 4)
+GEN_LDST_WHOLE_TRANS(vl4re32_v, 4)
+GEN_LDST_WHOLE_TRANS(vl4re64_v, 4)
+GEN_LDST_WHOLE_TRANS(vl8re8_v,  8)
+GEN_LDST_WHOLE_TRANS(vl8re16_v, 8)
+GEN_LDST_WHOLE_TRANS(vl8re32_v, 8)
+GEN_LDST_WHOLE_TRANS(vl8re64_v, 8)
 
 /*
  * The vector whole register store instructions are encoded similar to
  * unmasked unit-stride store of elements with EEW=8.
  */
-GEN_LDST_WHOLE_TRANS(vs1r_v, 1, 1)
-GEN_LDST_WHOLE_TRANS(vs2r_v, 2, 1)
-GEN_LDST_WHOLE_TRANS(vs4r_v, 4, 1)
-GEN_LDST_WHOLE_TRANS(vs8r_v, 8, 1)
+GEN_LDST_WHOLE_TRANS(vs1r_v, 1)
+GEN_LDST_WHOLE_TRANS(vs2r_v, 2)
+GEN_LDST_WHOLE_TRANS(vs4r_v, 4)

[PATCH for 9.0 v15 01/10] target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX()

2024-03-14 Thread Daniel Henrique Barboza

The helper isn't setting env->vstart = 0 after its execution, as is
expected of every vector instruction that completes successfully.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
 target/riscv/vector_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index fe56c007d5..ca79571ae2 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4781,6 +4781,7 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, 
void *vs2, \
 } \
 *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));  \
 } \
+env->vstart = 0;  \
 /* set tail elements to 1s */ \
 vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);  \
 }
-- 
2.44.0

[PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess

2024-03-14 Thread Daniel Henrique Barboza

vmvr_v isn't handling the case where the host might be big endian and
the bytes to be copied aren't sequential.

Suggested-by: Richard Henderson 
Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/vector_helper.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ca79571ae2..34ac4aa808 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -5075,9 +5075,17 @@ void HELPER(vmvr_v)(void *vd, void *vs2, CPURISCVState 
*env, uint32_t desc)
 uint32_t startb = env->vstart * sewb;
 uint32_t i = startb;
 
+if (HOST_BIG_ENDIAN && i % 8 != 0) {
+uint32_t j = ROUND_UP(i, 8);
+memcpy((uint8_t *)vd + H1(j - 1),
+   (uint8_t *)vs2 + H1(j - 1),
+   j - i);
+i = j;
+}
+
 memcpy((uint8_t *)vd + H1(i),
(uint8_t *)vs2 + H1(i),
-   maxsz - startb);
+   maxsz - i);
 
 env->vstart = 0;
 }
-- 
2.44.0
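
Editorial sketch, not part of the patch: why the bytes "aren't sequential" on a big-endian host. It assumes that H1() on such a host mirrors the byte index inside each 8-byte chunk (x ^ 7); TOY_H1_BE, TOY_ROUND_UP and both functions are hypothetical names. Only the partial leading chunk is modelled: copying it with one memcpy starting at H1(j - 1) touches the same host bytes as copying elements i..j-1 one at a time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: big-endian-host byte mapping within an 8-byte chunk. */
#define TOY_H1_BE(x)        ((x) ^ 7u)
#define TOY_ROUND_UP(n, d)  ((((n) + (d) - 1) / (d)) * (d))

/* Reference: copy element bytes [i, end) one by one through the mapping. */
static void toy_copy_bytewise(uint8_t *vd, const uint8_t *vs2,
                              uint32_t i, uint32_t end)
{
    for (; i < end; i++) {
        vd[TOY_H1_BE(i)] = vs2[TOY_H1_BE(i)];
    }
}

/*
 * Copy the partial leading chunk with a single memcpy and return the next
 * 8-byte-aligned element index. Elements i..j-1 occupy the host bytes from
 * H1(j - 1) upwards, which is the start of the chunk.
 */
static uint32_t toy_copy_head(uint8_t *vd, const uint8_t *vs2, uint32_t i)
{
    assert(vd != NULL && vs2 != NULL);

    if (i % 8 != 0) {
        uint32_t j = TOY_ROUND_UP(i, 8u);
        memcpy(vd + TOY_H1_BE(j - 1), vs2 + TOY_H1_BE(j - 1), j - i);
        i = j;
    }
    return i;
}
```

With i = 3, elements 3..7 map to host offsets 4..0, so the single memcpy covers host bytes 0..4 and leaves bytes 5..7 (elements 0..2) alone.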

[PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns

2024-03-14 Thread Daniel Henrique Barboza

These insns have 2 paths: we'll either have vstart already cleared if
vstart_eq_zero or we'll do a brcond to check if vstart >= maxsz to call
the 'vmvr_v' helper. The helper will clear vstart if it executes until
the end, or if vstart >= vl.

For starters, the check itself is wrong: we're checking vstart >= maxsz,
when in fact we should compare vstart in bytes, or 'startb' as 'vmvr_v'
calls it. But even after fixing the comparison we'd still need to clear
vstart at the end, which isn't happening either.

We want to make the helpers responsible for managing vstart, including
these corner cases, precisely to avoid these situations. To that end:

- remove the wrong vstart >= maxsz cond from the translation;
- add a 'startb >= maxsz' cond in 'vmvr_v', and clear vstart if that
  happens.

This way we can be sure that vstart is cleared at the end of
execution, regardless of the path taken.

Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 3 ---
 target/riscv/vector_helper.c| 5 +
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 8c16a9f5b3..52c26a7834 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -3664,12 +3664,9 @@ static bool trans_##NAME(DisasContext *s, arg_##NAME * 
a)   \
  vreg_ofs(s, a->rs2), maxsz, maxsz);\
 mark_vs_dirty(s);   \
 } else {\
-TCGLabel *over = gen_new_label();   \
-tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, maxsz, over);  \
 tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), \
tcg_env, maxsz, maxsz, 0, gen_helper_vmvr_v); \
 mark_vs_dirty(s);   \
-gen_set_label(over);\
 }   \
 return true;\
 }   \
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 34ac4aa808..bcc553c0e2 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -5075,6 +5075,11 @@ void HELPER(vmvr_v)(void *vd, void *vs2, CPURISCVState 
*env, uint32_t desc)
 uint32_t startb = env->vstart * sewb;
 uint32_t i = startb;
 
+if (startb >= maxsz) {
+env->vstart = 0;
+return;
+}
+
 if (HOST_BIG_ENDIAN && i % 8 != 0) {
 uint32_t j = ROUND_UP(i, 8);
 memcpy((uint8_t *)vd + H1(j - 1),
-- 
2.44.0
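
Editorial sketch, not part of the patch: a little-endian-only model of the control flow described in the commit message. The helper converts vstart to bytes, returns early when nothing is left to copy, and clears vstart on both paths. ToyEnv and toy_vmvr are hypothetical names, and the big-endian handling is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the vstart field of CPURISCVState. */
typedef struct {
    uint32_t vstart;
} ToyEnv;

/* Whole-register move where the byte index equals the host offset. */
static void toy_vmvr(ToyEnv *env, uint8_t *vd, const uint8_t *vs2,
                     uint32_t maxsz, uint32_t sewb)
{
    assert(env != NULL);

    /* vstart counts elements; the comparison has to be done in bytes. */
    uint32_t startb = env->vstart * sewb;

    if (startb >= maxsz) {
        /* Nothing to copy, but vstart must still be reset. */
        env->vstart = 0;
        return;
    }

    memcpy(vd + startb, vs2 + startb, maxsz - startb);
    env->vstart = 0;
}
```

With maxsz = 16 and 4-byte elements, vstart = 8 is below maxsz as an element count yet already past the end in bytes (32), which is the case the old translation-time comparison got wrong.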

Re: [PATCH v2 2/2] migration/multifd: Ensure we're not given a socket for file migration

2024-03-14 Thread Peter Xu

On Thu, Mar 14, 2024 at 01:44:13PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Wed, Mar 13, 2024 at 06:28:24PM -0300, Fabiano Rosas wrote:
> >> When doing migration using the fd: URI, the incoming migration starts
> >> before the user has passed the file descriptor to QEMU. This means
> >> that the checks at migration_channels_and_transport_compatible()
> >> happen too soon and we need to allow a migration channel of type
> >> SOCKET_ADDRESS_TYPE_FD even though socket migration is not supported
> >> with multifd.
> >
> > Hmm, bear with me if this is a stupid one.. why can the incoming migration
> > start _before_ the user passed in the fd?
> 
> It's been a while since I looked at this. Looking into it once more
> today, I think the issue is actually that we only fetch the fds from the
> monitor at fd_start_outgoing|incoming_migration().

Yes that looks more reasonable.

It means we may want to touch up transport_supports_seeking()'s comment on
this if needed.

> 
> >
> > IOW, why can't we rely on a single fd_is_socket() check for
> > SOCKET_ADDRESS_TYPE_FD in transport_supports_multi_channels()?
> >
> 
> There's no fd at that point. Just a string.
> 
> I think the right fix here would be to move the
> monitor_fd_get/monitor_fd_param (why two different functions?)

Yes this is all confusing.

I think it makes sense to use the same guideline for both sides on the fd:
protocol, perhaps it means we should always use monitor_fd_param() to be
consistent and compatible.

> earlier into migrate_uri_parse. And possibly also extend
> FileMigrationArgs to contain an fd. Not sure how easy would that be.

But if so IIUC the 'filename' parameter will need to be optional if "fd"
exists, and that would break QAPI for 8.2.

I think it's fine if we keep what we do right now (with the above comment
fixed to explain why that check needs to be delayed, though), or unify the
check if you find an easy, non-destructive way to do so.

-- 
Peter Xu

Re: [PATCH v2 0/2] migration mapped-ram fixes

2024-03-14 Thread Peter Xu

On Thu, Mar 14, 2024 at 01:55:31PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Wed, Mar 13, 2024 at 06:28:22PM -0300, Fabiano Rosas wrote:
> >> Hi,
> >> 
> >> In this v2:
> >> 
> >> patch 1 - The fix for the ioc leaks, now including the main channel
> >> 
> >> patch 2 - A fix for an fd: migration case I thought I had written code
> >>   for, but obviously didn't.
> >
> > Maybe I found one more issue.. I'm looking at fd_start_outgoing_migration().
> >
> > ioc = qio_channel_new_fd(fd, errp);  <- here the fd is consumed and
> > then owned by the IOC
> > if (!ioc) {
> > close(fd);
> > return;
> > }
> >
> > outgoing_args.fd = fd;   <- here we use the fd again,
> > and "owned" by outgoing_args
> > even if it shouldn't?
> >
> > The problem is outgoing_args.fd will be cleaned up with a close().  I had a
> > feeling that it's possible it will close() something else if the fd is reused
> > before that close() but after the IOC's.  We may want yet another dup() for
> > outgoing_args.fd?
> 
> I think the right fix is to not close() it at
> fd_cleanup_outgoing_migration(). That fd is already owned by the ioc.

But outgoing_args.fd can point to other things if the IOC (along with the
ioc->fd) is released.  Keeping outgoing_args.fd pointing to that fd index
should be dangerous because the integer can be reused.
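
Editorial sketch, not from the thread: a small POSIX demonstration of the integer reuse described above. It relies on the rule that a new descriptor gets the lowest free number, and assumes no other thread opens descriptors in between. toy_fd_number_is_reused is a hypothetical name; the pipe merely stands in for the migration fd.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Returns 1 if a descriptor number released by its owner is handed out
 * again by the next dup(), 0 if not, and -1 on error.
 */
static int toy_fd_number_is_reused(void)
{
    int p[2];

    if (pipe(p) < 0) {
        return -1;
    }

    int stale = p[1];       /* a second holder keeps the bare integer */
    close(p[1]);            /* the real owner releases the descriptor */

    int other = dup(p[0]);  /* an unrelated descriptor is created later */
    if (other < 0) {
        close(p[0]);
        return -1;
    }

    /* A close(stale) at this point would close `other` instead. */
    int reused = (other == stale);

    close(other);
    close(p[0]);
    return reused;
}
```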

> 
> >
> > If you agree, we may also want to avoid doing:
> >
> > outgoing_args.fd = -1;
> 
> We will always need this. This is just initialization of the field
> because 0 is a valid fd value. Otherwise the file.c code can't know if
> we're actually using an fd at all.

I meant avoid setting it to -1 only in fd_start_outgoing_migration().
Using -1 to represent "no fd" is fine.

> 
> @file_send_channel_create:
> 
> int fd = fd_args_get_fd();
> 
> if (fd && fd != -1) {
> 
> } else {
> 
> }
> 
> >
> > We could assert it instead making sure no fd leak.
> >
> >> 
> >> Thank you for your patience.
> >> 
> >> based-on: https://gitlab.com/peterx/qemu/-/commits/migration-stable
> >> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1212483701
> >> 
> >> Fabiano Rosas (2):
> >>   migration: Fix iocs leaks during file and fd migration
> >>   migration/multifd: Ensure we're not given a socket for file migration
> >> 
> >>  migration/fd.c   | 35 +++---
> >>  migration/file.c | 65 
> >>  migration/file.h |  1 +
> >>  3 files changed, 60 insertions(+), 41 deletions(-)
> >> 
> >> -- 
> >> 2.35.3
> >> 
> 

-- 
Peter Xu

Re: [PATCH v2] block: Use LVM tools for LV block device truncation

2024-03-14 Thread Daniel P . Berrangé

On Thu, Mar 14, 2024 at 06:25:00PM +0100, Alexander Ivanov wrote:
> 
> 
> On 3/14/24 13:44, Daniel P. Berrangé wrote:
> > On Wed, Mar 13, 2024 at 11:43:27AM +0100, Alexander Ivanov wrote:
> > > If a block device is an LVM logical volume we can resize it using
> > > standard LVM tools.
> > > 
> > > Add a helper to detect if a device is a DM device. In raw_co_truncate()
> > > check if the block device is DM and resize it executing lvresize.
> > > 
> > > Signed-off-by: Alexander Ivanov 
> > > ---
> > >   block/file-posix.c | 61 ++
> > >   1 file changed, 61 insertions(+)
> > > 
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 35684f7e21..5f07d98aa5 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -2642,6 +2642,38 @@ raw_regular_truncate(BlockDriverState *bs, int fd, 
> > > int64_t offset,
> > >   return raw_thread_pool_submit(handle_aiocb_truncate, &acb);
> > >   }
> > >   static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t 
> > > offset,
> > >   bool exact, PreallocMode 
> > > prealloc,
> > >   BdrvRequestFlags flags, Error 
> > > **errp)
> > > @@ -2670,6 +2702,35 @@ static int coroutine_fn 
> > > raw_co_truncate(BlockDriverState *bs, int64_t offset,
> > >   if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
> > >   int64_t cur_length = raw_getlength(bs);
> > > +/*
> > > + * Try to resize an LVM device using LVM tools.
> > > + */
> > > +if (device_is_dm() && offset > 0) {
> > > +int spawn_flags = G_SPAWN_SEARCH_PATH | 
> > > G_SPAWN_STDOUT_TO_DEV_NULL;
> > > +int status;
> > > +bool success;
> > > +char *err;
> > > +GError *gerr = NULL;
> > > +g_autofree char *size_str = g_strdup_printf("%ldB", offset);
> > offset is 64-bit, but '%ld' is not guaranteed to be 64-bit. I expect
> > this will break on 32-bit platforms. Try PRId64 instead.
> > 
> > > +const char *cmd[] = {"lvresize", "-f", "-L",
> > > + size_str, bs->filename, NULL};
> > > +
> > > +success = g_spawn_sync(NULL, (gchar **)cmd, NULL, spawn_flags,
> > > +   NULL, NULL, NULL, &err, &status, &gerr);
> > > +
> > > +if (success && WEXITSTATUS(status) == 0) {
> > > +return 0;
> > > +}
> > > We should probably check 'g_spawn_check_wait_status' rather than
> > > WEXITSTATUS, as this then gives us further error message details
> > > that
> Thank you.
> I think it would be better to use 'g_spawn_check_exit_status' because there
> is no
> 'g_spawn_check_wait_status' in glib before 2.70 and even in 2.78 it leads to
> 'g_spawn_check_wait_status is deprecated: Not available before 2.70' error.

Ah yes, well spotted.
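
Editorial sketch, not from the thread: the PRId64 suggestion made earlier in this review, shown with plain snprintf() instead of g_strdup_printf() so that it builds without glib. toy_format_size is a hypothetical name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats a byte count as "<n>B", the size form the patch builds for -L. */
static int toy_format_size(char *buf, size_t len, int64_t offset)
{
    assert(buf != NULL && len > 0);

    /* PRId64 expands to the right conversion for int64_t on every platform. */
    return snprintf(buf, len, "%" PRId64 "B", offset);
}
```

The same format string works unchanged with g_strdup_printf(), which takes printf-style formats.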


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2] block: Use LVM tools for LV block device truncation

2024-03-14 Thread Alexander Ivanov

On 3/14/24 13:44, Daniel P. Berrangé wrote:

On Wed, Mar 13, 2024 at 11:43:27AM +0100, Alexander Ivanov wrote:

If a block device is an LVM logical volume we can resize it using
standard LVM tools.

Add a helper to detect if a device is a DM device. In raw_co_truncate()
check if the block device is DM and resize it executing lvresize.

Signed-off-by: Alexander Ivanov 
---
  block/file-posix.c | 61 ++
  1 file changed, 61 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 35684f7e21..5f07d98aa5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2642,6 +2642,38 @@ raw_regular_truncate(BlockDriverState *bs, int fd, 
int64_t offset,
  return raw_thread_pool_submit(handle_aiocb_truncate, &acb);
  }
  static int coroutine_fn raw_co_truncate(BlockDriverState *bs, int64_t offset,
  bool exact, PreallocMode prealloc,
  BdrvRequestFlags flags, Error **errp)
@@ -2670,6 +2702,35 @@ static int coroutine_fn raw_co_truncate(BlockDriverState 
*bs, int64_t offset,
  if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
  int64_t cur_length = raw_getlength(bs);
  
+/*

+ * Try to resize an LVM device using LVM tools.
+ */
+if (device_is_dm() && offset > 0) {
+int spawn_flags = G_SPAWN_SEARCH_PATH | G_SPAWN_STDOUT_TO_DEV_NULL;
+int status;
+bool success;
+char *err;
+GError *gerr = NULL;
+g_autofree char *size_str = g_strdup_printf("%ldB", offset);

offset is 64-bit, but '%ld' is not guaranteed to be 64-bit. I expect
this will break on 32-bit platforms. Try PRId64 instead.


+const char *cmd[] = {"lvresize", "-f", "-L",
+ size_str, bs->filename, NULL};
+
+success = g_spawn_sync(NULL, (gchar **)cmd, NULL, spawn_flags,
+   NULL, NULL, NULL, &err, &status, &gerr);
+
+if (success && WEXITSTATUS(status) == 0) {
+return 0;
+}

We should probably check 'g_spawn_check_wait_status' rather than
WEXITSTATUS, as this then gives us further error message details
that

Thank you.
I think it would be better to use 'g_spawn_check_exit_status' because there
is no 'g_spawn_check_wait_status' in glib before 2.70 and even in 2.78 it
leads to a
'g_spawn_check_wait_status is deprecated: Not available before 2.70' error.



+
+if (!success) {
+error_setg(errp, "lvresize execution error: %s", 
gerr->message);
+} else {
+error_setg(errp, "%s", err);

...we would also include here, such as the exit code or terminal
signal.


+}
+
+return -EINVAL;
+}
+
  if (offset != cur_length && exact) {
  error_setg(errp, "Cannot resize device files");
  return -ENOTSUP;
--
2.40.1



With regards,
Daniel


--
Best regards,
Alexander Ivanov
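
Editorial sketch, not from the thread: what "the exit code or terminal signal" amounts to when a wait status is decoded by hand with the POSIX macros. The glib helpers discussed above do this decoding and fill in a GError; this version avoids glib so it stays self-contained. toy_describe_wait_status and toy_run are hypothetical names, and system() is used only to obtain a real wait status.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Writes a short description of a wait status; returns 0 only on exit 0. */
static int toy_describe_wait_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);
        snprintf(buf, len, "exited with status %d", code);
        return code == 0 ? 0 : -1;
    }
    if (WIFSIGNALED(status)) {
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
        return -1;
    }
    snprintf(buf, len, "unexpected wait status 0x%x", (unsigned)status);
    return -1;
}

/* Runs a shell command and describes how it ended. */
static int toy_run(const char *shell_cmd, char *buf, size_t len)
{
    int status = system(shell_cmd);

    if (status == -1) {
        snprintf(buf, len, "could not run command");
        return -1;
    }
    return toy_describe_wait_status(status, buf, len);
}
```

Checking WEXITSTATUS() alone, as in the reviewed hunk, would report a child killed by a signal as exit status 0.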

Re: question on s390x topology: KVM only, or also TCG?

2024-03-14 Thread Claudio Fontana

On 3/14/24 17:44, Nina Schoetterl-Glausch wrote:
> On Thu, 2024-03-14 at 16:54 +0100, Thomas Huth wrote:
>> On 14/03/2024 16.49, Claudio Fontana wrote:
>>> Hello Pierre, Ilya,
>>>
>>> I have a question on the s390x "topology" feature and examples.
>>>
>>> Mainly, is this feature supposed to be KVM accelerator-only, or also 
>>> available when using the TCG accelerator?
>>
>>   Hi Claudio!
>>
>> Pierre left IBM, please CC: Nina with regards to s390x topology instead.
>>
>> But with regards to your question, I think I can answer that, too: The 
>> topology feature is currently working with KVM only, yes. It hasn't been 
>> implemented for TCG yet.
>>
>>> (docs/devel/s390-cpu-topology.rst vs 
>>> https://www.qemu.org/docs/master/system/s390x/cpu-topology.html)
>>>
>>> I see stsi-topology.c in target/s390x/kvm/ , so that part is clearly 
>>> KVM-specific,
>>>
>>> but in hw/s390x/cpu-topology.c I read:
>>>
>>> "
>>>   * - The first part in this file is taking care of all common functions
>>>   *   used by KVM and TCG to create and modify the topology.
> 
> What Thomas said. Read this as the code in the file being independent of
> the accelerator; it's just that TCG support is missing.
>  
> [...]
>>>
>>> So I would assume this is KVM-only, but then in the "Examples" section 
>>> below I see the example:
>>>
>>> "
>>> $ qemu-system-s390x -m 2G \
>>>-cpu gen16b,ctop=on \
> 
> TCG doesn't support this cpu ^ and so will refuse to run.
> 
>>>-smp cpus=5,sockets=8,cores=4,maxcpus=32 \
> 
> When running with TCG, drawers & books are supported by -smp also, but well, 
> you cannot do anything
> with that.
> 
> [...]
>>
> 

Thank you for your responses Thomas and Nina,
I have just sent a patch that tries to make it even more explicit.

"[PATCH] docs/s390: clarify even more that cpu-topology is KVM-only"

Thanks,

Claudio

[PATCH] docs/s390: clarify even more that cpu-topology is KVM-only

2024-03-14 Thread Claudio Fontana

At least for now cpu-topology is implemented only for KVM.

We already say this, but this patch tries to be more explicit,
and also shows it in the examples.

This adds a new reference in the introduction that we can point to,
whenever we need to reference accelerators and how to select them.

Signed-off-by: Claudio Fontana 
---
 docs/system/introduction.rst   |  2 ++
 docs/system/s390x/cpu-topology.rst | 14 --
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/docs/system/introduction.rst b/docs/system/introduction.rst
index 51ac132d6c..746707eb00 100644
--- a/docs/system/introduction.rst
+++ b/docs/system/introduction.rst
@@ -1,6 +1,8 @@
 Introduction
 
 
+.. _Accelerators:
+
 Virtualisation Accelerators
 ---
 
diff --git a/docs/system/s390x/cpu-topology.rst 
b/docs/system/s390x/cpu-topology.rst
index 5133fdc362..ca344e273c 100644
--- a/docs/system/s390x/cpu-topology.rst
+++ b/docs/system/s390x/cpu-topology.rst
@@ -25,17 +25,19 @@ monitor polarization changes, see 
``docs/devel/s390-cpu-topology.rst``.
 Prerequisites
 -
 
-To use the CPU topology, you need to run with KVM on a s390x host that
-uses the Linux kernel v6.0 or newer (which provide the so-called
+To use the CPU topology, you currently need to choose the KVM accelerator.
+See :ref:`Accelerators` for more details about accelerators and how to select 
them.
+
+The s390x host needs to use a Linux kernel v6.0 or newer (which provides the 
so-called
 ``KVM_CAP_S390_CPU_TOPOLOGY`` capability that allows QEMU to signal the
 CPU topology facility via the so-called STFLE bit 11 to the VM).
 
 Enabling CPU topology
 -
 
-Currently, CPU topology is only enabled in the host model by default.
+Currently, CPU topology is enabled by default only in the "host" cpu model.
 
-Enabling CPU topology in a CPU model is done by setting the CPU flag
+Enabling CPU topology in another CPU model is done by setting the CPU flag
 ``ctop`` to ``on`` as in:
 
 .. code-block:: bash
@@ -132,7 +134,7 @@ In the following machine we define 8 sockets with 4 cores 
each.
 
 .. code-block:: bash
 
-  $ qemu-system-s390x -m 2G \
+  $ qemu-system-s390x -accel kvm -m 2G \
 -cpu gen16b,ctop=on \
 -smp cpus=5,sockets=8,cores=4,maxcpus=32 \
 -device host-s390x-cpu,core-id=14 \
@@ -227,7 +229,7 @@ with vertical high entitlement.
 
 .. code-block:: bash
 
-  $ qemu-system-s390x -m 2G \
+  $ qemu-system-s390x -accel kvm -m 2G \
 -cpu gen16b,ctop=on \
 -smp cpus=1,sockets=8,cores=4,maxcpus=32 \
 \
-- 
2.26.2

Re: [PATCH v6 3/3] target/riscv: Enable sdtrig for Ventana's Veyron CPUs

2024-03-14 Thread Andrew Jones

On Thu, Mar 14, 2024 at 05:05:10PM +0530, Himanshu Chauhan wrote:
> Ventana's Veyron CPUs support sdtrig ISA extension. By default, enable
> the sdtrig extension and disable the debug property for these CPUs.

The commit message needs to be updated to remove the 'and disable the
debug property'.

> 
> Signed-off-by: Himanshu Chauhan 
> ---
>  target/riscv/cpu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 66c91fffd6..3c7ad1c903 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -569,6 +569,7 @@ static void rv64_veyron_v1_cpu_init(Object *obj)
>  cpu->cfg.cbom_blocksize = 64;
>  cpu->cfg.cboz_blocksize = 64;
>  cpu->cfg.ext_zicboz = true;
> +cpu->cfg.ext_sdtrig = true;
>  cpu->cfg.ext_smaia = true;
>  cpu->cfg.ext_ssaia = true;
>  cpu->cfg.ext_sscofpmf = true;
> -- 
> 2.34.1
>

Thanks,
drew

Re: [PATCH v6 2/3] target/riscv: Expose sdtrig ISA extension

2024-03-14 Thread Andrew Jones

On Thu, Mar 14, 2024 at 05:05:09PM +0530, Himanshu Chauhan wrote:
> This patch adds "sdtrig" in the ISA string when sdtrig extension is enabled.
> The sdtrig extension may or may not be implemented in a system. Therefore, the
>-cpu rv64,sdtrig=
> option can be used to dynamically turn sdtrig extension on or off.
> 
> Since the sdtrig ISA extension is a superset of the debug specification, disable
> the debug property when sdtrig is enabled. A warning is printed when this is
> done.
> 
> By default, the sdtrig extension is disabled and debug property enabled as 
> usual.
> 
> Signed-off-by: Himanshu Chauhan 
> ---
>  target/riscv/cpu.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 2602aae9f5..66c91fffd6 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -175,6 +175,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
>  ISA_EXT_DATA_ENTRY(zvkt, PRIV_VERSION_1_12_0, ext_zvkt),
>  ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
>  ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
> +ISA_EXT_DATA_ENTRY(sdtrig, PRIV_VERSION_1_12_0, ext_sdtrig),

sdtrig isn't in 1.12, is it? I think it's 1.13. Hmm, I wonder if we don't
need to audit all our recently added extensions to make sure they're
actually 1.12, since we don't have PRIV_VERSION_1_13_0 defined...

>  ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
>  ISA_EXT_DATA_ENTRY(smepmp, PRIV_VERSION_1_12_0, ext_smepmp),
>  ISA_EXT_DATA_ENTRY(smstateen, PRIV_VERSION_1_12_0, ext_smstateen),
> @@ -1008,6 +1009,11 @@ static void riscv_cpu_reset_hold(Object *obj)
>  set_default_nan_mode(1, &env->fp_status);
>  
>  #ifndef CONFIG_USER_ONLY
> +if (!cpu->cfg.debug && cpu->cfg.ext_sdtrig) {
> +warn_report("Enabling 'debug' since 'sdtrig' is enabled.");
> +cpu->cfg.debug = true;
> +}
> +
>  if (cpu->cfg.debug || cpu->cfg.ext_sdtrig) {
>  riscv_trigger_reset_hold(env);
>  }
> @@ -1480,6 +1486,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
>  MULTI_EXT_CFG_BOOL("zvfhmin", ext_zvfhmin, false),
>  MULTI_EXT_CFG_BOOL("sstc", ext_sstc, true),
>  
> +MULTI_EXT_CFG_BOOL("sdtrig", ext_sdtrig, false),
>  MULTI_EXT_CFG_BOOL("smaia", ext_smaia, false),
>  MULTI_EXT_CFG_BOOL("smepmp", ext_smepmp, false),
>  MULTI_EXT_CFG_BOOL("smstateen", ext_smstateen, false),
> -- 
> 2.34.1
> 

Thanks,
drew

Re: [PATCH v6 1/3] target/riscv: Enable mcontrol6 triggers only when sdtrig is selected

2024-03-14 Thread Andrew Jones

On Thu, Mar 14, 2024 at 05:05:08PM +0530, Himanshu Chauhan wrote:
> The mcontrol6 triggers are not defined in debug specification v0.13
> These triggers are defined in sdtrig ISA extension.
> 
> This patch:
>* Adds ext_sdtrig capability which is used to select mcontrol6 triggers
>* Keeps the debug property. All triggers that are defined in v0.13 are
>  exposed.
> 
> Signed-off-by: Himanshu Chauhan 
> ---
>  target/riscv/cpu.c |  4 +-
>  target/riscv/cpu_cfg.h |  1 +
>  target/riscv/csr.c |  2 +-
>  target/riscv/debug.c   | 90 +-
>  4 files changed, 57 insertions(+), 40 deletions(-)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index c160b9216b..2602aae9f5 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1008,7 +1008,7 @@ static void riscv_cpu_reset_hold(Object *obj)
>  set_default_nan_mode(1, &env->fp_status);
>  
>  #ifndef CONFIG_USER_ONLY
> -if (cpu->cfg.debug) {
> +if (cpu->cfg.debug || cpu->cfg.ext_sdtrig) {

I still don't see the point of adding '|| cpu->cfg.ext_sdtrig'. debug must
be true when ext_sdtrig is true.

>  riscv_trigger_reset_hold(env);
>  }
>  
> @@ -1168,7 +1168,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
> **errp)
>  riscv_cpu_register_gdb_regs_for_features(cs);
>  
>  #ifndef CONFIG_USER_ONLY
> -if (cpu->cfg.debug) {
> +if (cpu->cfg.debug || cpu->cfg.ext_sdtrig) {
>  riscv_trigger_realize(&cpu->env);
>  }
>  #endif
> diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
> index 2040b90da0..0c57e1acd4 100644
> --- a/target/riscv/cpu_cfg.h
> +++ b/target/riscv/cpu_cfg.h
> @@ -114,6 +114,7 @@ struct RISCVCPUConfig {
>  bool ext_zvfbfwma;
>  bool ext_zvfh;
>  bool ext_zvfhmin;
> +bool ext_sdtrig;
>  bool ext_smaia;
>  bool ext_ssaia;
>  bool ext_sscofpmf;
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 726096444f..26623d3640 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -546,7 +546,7 @@ static RISCVException have_mseccfg(CPURISCVState *env, 
> int csrno)
>  
>  static RISCVException debug(CPURISCVState *env, int csrno)
>  {
> -if (riscv_cpu_cfg(env)->debug) {
> +if (riscv_cpu_cfg(env)->debug || riscv_cpu_cfg(env)->ext_sdtrig) {
>  return RISCV_EXCP_NONE;
>  }
>  
> diff --git a/target/riscv/debug.c b/target/riscv/debug.c
> index e30d99cc2f..674223e966 100644
> --- a/target/riscv/debug.c
> +++ b/target/riscv/debug.c
> @@ -100,13 +100,15 @@ static trigger_action_t 
> get_trigger_action(CPURISCVState *env,
>  target_ulong tdata1 = env->tdata1[trigger_index];
>  int trigger_type = get_trigger_type(env, trigger_index);
>  trigger_action_t action = DBG_ACTION_NONE;
> +const RISCVCPUConfig *cfg = riscv_cpu_cfg(env);
>  
>  switch (trigger_type) {
>  case TRIGGER_TYPE_AD_MATCH:
>  action = (tdata1 & TYPE2_ACTION) >> 12;
>  break;
>  case TRIGGER_TYPE_AD_MATCH6:
> -action = (tdata1 & TYPE6_ACTION) >> 12;
> +if (cfg->ext_sdtrig)
> +action = (tdata1 & TYPE6_ACTION) >> 12;

QEMU requires {}, even for single line blocks. I'm not sure if QEMU's
checkpatch is smart enough to complain about that, but if you haven't
run checkpatch, then you probably should.

>  break;
>  case TRIGGER_TYPE_INST_CNT:
>  case TRIGGER_TYPE_INT:
> @@ -727,7 +729,12 @@ void tdata_csr_write(CPURISCVState *env, int 
> tdata_index, target_ulong val)
>  type2_reg_write(env, env->trigger_cur, tdata_index, val);
>  break;
>  case TRIGGER_TYPE_AD_MATCH6:
> -type6_reg_write(env, env->trigger_cur, tdata_index, val);
> +if (riscv_cpu_cfg(env)->ext_sdtrig) {
> +type6_reg_write(env, env->trigger_cur, tdata_index, val);
> +} else {
> +qemu_log_mask(LOG_UNIMP, "trigger type: %d is not supported\n",
> +  trigger_type);
> +}
>  break;
>  case TRIGGER_TYPE_INST_CNT:
>  itrigger_reg_write(env, env->trigger_cur, tdata_index, val);
> @@ -750,9 +757,14 @@ void tdata_csr_write(CPURISCVState *env, int 
> tdata_index, target_ulong val)
>  
>  target_ulong tinfo_csr_read(CPURISCVState *env)
>  {
> -/* assume all triggers support the same types of triggers */
> -return BIT(TRIGGER_TYPE_AD_MATCH) |
> -   BIT(TRIGGER_TYPE_AD_MATCH6);
> +target_ulong ts = 0;

Useless initialization to zero since it's assigned in the next line.
Actually, should just do

  target_ulong ts = BIT(TRIGGER_TYPE_AD_MATCH);

> +
> +ts = BIT(TRIGGER_TYPE_AD_MATCH);
> +
> +if (riscv_cpu_cfg(env)->ext_sdtrig)
> +ts |= BIT(TRIGGER_TYPE_AD_MATCH6);

Need {}

> +
> +return ts;
>  }
>  
>  void riscv_cpu_debug_excp_handler(CPUState *cs)
> @@ -803,19 +815,21 @@ bool riscv_cpu_debug_check_breakpoint(CPUState *cs)
>  }
>  break;
>  case TRIGGER_TYPE_AD_MATCH6:
> -
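
Editorial sketch, not from the thread: the shape suggested in the review for tinfo_csr_read(), with the variable initialized directly and braces around the single-statement block. The type numbers (2 for the address/data match trigger, 6 for its mcontrol6 variant) are an assumption; TOY_BIT, the enum names and toy_tinfo are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_BIT(n)  (UINT64_C(1) << (n))

/* Assumed trigger type numbers. */
enum {
    TOY_TRIGGER_TYPE_AD_MATCH  = 2,
    TOY_TRIGGER_TYPE_AD_MATCH6 = 6,
};

/* Builds the "supported trigger types" bitmask reported through tinfo. */
static uint64_t toy_tinfo(bool ext_sdtrig)
{
    uint64_t ts = TOY_BIT(TOY_TRIGGER_TYPE_AD_MATCH);

    if (ext_sdtrig) {
        ts |= TOY_BIT(TOY_TRIGGER_TYPE_AD_MATCH6);
    }

    assert(ts != 0);
    return ts;
}
```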

[PATCH for-9.0 2/2] iotests: Add test for reset/AioContext switches with NBD exports

2024-03-14 Thread Kevin Wolf

This replicates the scenario in which the bug was reported.
Unfortunately this relies on actually executing a guest (so that the
firmware initialises the virtio-blk device and moves it to its
configured iothread), so this can't make use of the qtest accelerator
like most other test cases. I tried to find a different easy way to
trigger the bug, but couldn't find one.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/tests/iothreads-nbd-export | 66 +++
 .../tests/iothreads-nbd-export.out| 19 ++
 2 files changed, 85 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/iothreads-nbd-export
 create mode 100644 tests/qemu-iotests/tests/iothreads-nbd-export.out

diff --git a/tests/qemu-iotests/tests/iothreads-nbd-export 
b/tests/qemu-iotests/tests/iothreads-nbd-export
new file mode 100755
index 00..63cac8fdbf
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-nbd-export
@@ -0,0 +1,66 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Copyright (C) 2024 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+# Creator/Owner: Kevin Wolf 
+
+import asyncio
+import iotests
+import qemu
+import time
+
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_platforms=['linux'])
+
+with iotests.FilePath('disk1.img') as path, \
+ iotests.FilePath('nbd.sock', base_dir=iotests.sock_dir) as nbd_sock, \
+ qemu.machine.QEMUMachine(iotests.qemu_prog) as vm:
+
+img_size = '10M'
+
+iotests.log('Preparing disk...')
+iotests.qemu_img_create('-f', iotests.imgfmt, path, img_size)
+vm.add_args('-blockdev', f'file,node-name=disk-file,filename={path}')
+vm.add_args('-blockdev', f'qcow2,node-name=disk,file=disk-file')
+vm.add_args('-object', 'iothread,id=iothread0')
+vm.add_args('-device', 
'virtio-blk,drive=disk,iothread=iothread0,share-rw=on')
+
+iotests.log('Launching VM...')
+vm.add_args('-accel', 'kvm', '-accel', 'tcg')
+#vm.add_args('-accel', 'qtest')
+vm.launch()
+
+iotests.log('Exporting to NBD...')
+iotests.log(vm.qmp('nbd-server-start',
+   addr={'type': 'unix', 'data': {'path': nbd_sock}}))
+iotests.log(vm.qmp('block-export-add', type='nbd', id='exp0',
+   node_name='disk', writable=True))
+
+iotests.log('Connecting qemu-img...')
+qemu_io = iotests.QemuIoInteractive('-f', 'raw',
+f'nbd+unix:///disk?socket={nbd_sock}')
+
+iotests.log('Moving the NBD export to a different iothread...')
+for i in range(0, 10):
+iotests.log(vm.qmp('system_reset'))
+time.sleep(0.1)
+
+iotests.log('Checking that it is still alive...')
+iotests.log(vm.qmp('query-status'))
+
+qemu_io.close()
+vm.shutdown()
diff --git a/tests/qemu-iotests/tests/iothreads-nbd-export.out 
b/tests/qemu-iotests/tests/iothreads-nbd-export.out
new file mode 100644
index 00..bc514e35e5
--- /dev/null
+++ b/tests/qemu-iotests/tests/iothreads-nbd-export.out
@@ -0,0 +1,19 @@
+Preparing disk...
+Launching VM...
+Exporting to NBD...
+{"return": {}}
+{"return": {}}
+Connecting qemu-img...
+Moving the NBD export to a different iothread...
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+Checking that it is still alive...
+{"return": {"running": true, "status": "running"}}
-- 
2.44.0

[PATCH for-9.0 0/2] nbd: Fix server crash on reset with iothreads

2024-03-14 Thread Kevin Wolf

Kevin Wolf (2):
  nbd/server: Fix race in draining the export
  iotests: Add test for reset/AioContext switches with NBD exports

 nbd/server.c  | 15 ++---
 tests/qemu-iotests/tests/iothreads-nbd-export | 66 +++
 .../tests/iothreads-nbd-export.out| 19 ++
 3 files changed, 92 insertions(+), 8 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/iothreads-nbd-export
 create mode 100644 tests/qemu-iotests/tests/iothreads-nbd-export.out

-- 
2.44.0

[PATCH for-9.0 1/2] nbd/server: Fix race in draining the export

2024-03-14 Thread Kevin Wolf

When draining an NBD export, nbd_drained_begin() first sets
client->quiescing so that nbd_client_receive_next_request() won't start
any new request coroutines. Then nbd_drained_poll() tries to make sure
that we wait for any existing request coroutines by checking that
client->nb_requests has become 0.

However, there is a small window between creating a new request
coroutine and increasing client->nb_requests. If a coroutine is in this
state, it won't be waited for and drain returns too early.

In the context of switching to a different AioContext, this means that
blk_aio_attached() will see client->recv_coroutine != NULL and fail its
assertion.

Fix this by increasing client->nb_requests immediately when starting the
coroutine. Doing this after the checks if we should create a new
coroutine is okay because client->lock is held.

Cc: qemu-sta...@nongnu.org
Fixes: fd6afc501a019682d1b8468b562355a2887087bd
Signed-off-by: Kevin Wolf 
---
 nbd/server.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 941832f178..c3484cc1eb 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -3007,8 +3007,8 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 /* Owns a reference to the NBDClient passed as opaque.  */
 static coroutine_fn void nbd_trip(void *opaque)
 {
-NBDClient *client = opaque;
-NBDRequestData *req = NULL;
+NBDRequestData *req = opaque;
+NBDClient *client = req->client;
 NBDRequest request = { 0 };/* GCC thinks it can be used uninitialized 
*/
 int ret;
 Error *local_err = NULL;
@@ -3037,8 +3037,6 @@ static coroutine_fn void nbd_trip(void *opaque)
 goto done;
 }
 
-req = nbd_request_get(client);
-
 /*
  * nbd_co_receive_request() returns -EAGAIN when nbd_drained_begin() has
  * set client->quiescing but by the time we get back nbd_drained_end() may
@@ -3112,9 +3110,7 @@ static coroutine_fn void nbd_trip(void *opaque)
 }
 
 done:
-if (req) {
-nbd_request_put(req);
-}
+nbd_request_put(req);
 
 qemu_mutex_unlock(&client->lock);
 
@@ -3143,10 +3139,13 @@ disconnect:
  */
 static void nbd_client_receive_next_request(NBDClient *client)
 {
+NBDRequestData *req;
+
 if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
 !client->quiescing) {
 nbd_client_get(client);
-client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
+req = nbd_request_get(client);
+client->recv_coroutine = qemu_coroutine_create(nbd_trip, req);
 aio_co_schedule(client->exp->common.ctx, client->recv_coroutine);
 }
 }
-- 
2.44.0
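
The ordering problem described in the commit message can be modelled without coroutines. The following is only a minimal sketch under simplifying assumptions: ModelClient, schedule_old(), schedule_new() and drained_poll() are hypothetical names, not code from nbd/server.c. It shows why a request that is counted only once its coroutine starts running is invisible to the drain poll, while one counted at scheduling time is not.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a client; not the real NBDClient. */
typedef struct {
    int nb_requests;   /* requests the drain logic knows about */
    bool scheduled;    /* a request coroutine was created but has not run yet */
} ModelClient;

/* Old ordering: the coroutine is created first and would only count
 * itself once it starts running, so nb_requests is still unchanged here. */
static void schedule_old(ModelClient *c)
{
    c->scheduled = true;
}

/* New ordering: the request is counted before the coroutine is created. */
static void schedule_new(ModelClient *c)
{
    c->nb_requests++;
    c->scheduled = true;
}

/* Models the drain poll: true means "still busy, keep waiting". */
static bool drained_poll(const ModelClient *c)
{
    return c->nb_requests > 0;
}
```

With the old ordering, drained_poll() reports "not busy" although a coroutine is pending; with the new ordering the pending coroutine keeps the drain waiting.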

Re: [PATCH v2 0/2] migration mapped-ram fixes

2024-03-14 Thread Fabiano Rosas

Peter Xu  writes:

> On Wed, Mar 13, 2024 at 06:28:22PM -0300, Fabiano Rosas wrote:
>> Hi,
>> 
>> In this v2:
>> 
>> patch 1 - The fix for the ioc leaks, now including the main channel
>> 
>> patch 2 - A fix for an fd: migration case I thought I had written code
>>   for, but obviously didn't.
>
> Maybe I found one more issue.. I'm looking at fd_start_outgoing_migration().
>
> ioc = qio_channel_new_fd(fd, errp);  <- here the fd is consumed and
> then owned by the IOC
> if (!ioc) {
> close(fd);
> return;
> }
>
> outgoing_args.fd = fd;   <- here we use the fd again,
> and "owned" by outgoing_args
> even if it shouldn't?
>
> The problem is outgoing_args.fd will be cleaned up with a close().  I had a
> feeling that it's possible it will close() something else if the fd is reused
> before that close() but after the IOC's.  We may want yet another dup() for
> outgoing_args.fd?

I think the right fix is to not close() it at
fd_cleanup_outgoing_migration(). That fd is already owned by the ioc.
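
To make the two options concrete, here is a minimal sketch of the ownership rule being discussed, using only POSIX dup(), close() and fcntl(). FdOwner, owner_take_dup(), owner_release() and fd_is_open() are hypothetical names for illustration, not QEMU code. A second owner either gets its own duplicate that it may close, or it records nothing and leaves the close() to the real owner.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical owner record; stands in for anything that close()s "its" fd. */
typedef struct {
    int fd;   /* -1 means "owns nothing" */
} FdOwner;

/* Give the owner a private duplicate, so that its later close() cannot hit
 * a descriptor number that was already released and reused elsewhere. */
static int owner_take_dup(FdOwner *o, int fd)
{
    o->fd = dup(fd);
    return o->fd < 0 ? -1 : 0;
}

/* Close only what this owner holds, then forget it. */
static void owner_release(FdOwner *o)
{
    if (o->fd >= 0) {
        close(o->fd);
        o->fd = -1;
    }
}

/* Returns nonzero if fd currently refers to an open descriptor. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```

The alternative suggested above (not closing the fd at cleanup at all) corresponds to never calling owner_take_dup() and leaving o->fd at -1, in which case owner_release() is a no-op.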

>
> If you agree, we may also want to avoid doing:
>
> outgoing_args.fd = -1;

We will always need this. This is just initialization of the field
because 0 is a valid fd value. Otherwise the file.c code can't know if
we're actually using an fd at all.

@file_send_channel_create:

int fd = fd_args_get_fd();

if (fd && fd != -1) {

} else {

}
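
As a hedged illustration of the -1 initialization, the sketch below uses a hypothetical OutgoingArgs struct and helpers (not the real outgoing_args). It only shows the sentinel convention: -1 means "no fd was given", and any non-negative value, including 0, counts as a descriptor. Note that the check quoted above additionally treats 0 as "no fd", which this sketch does not do.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the outgoing arguments; only the fd matters here. */
typedef struct {
    int fd;
} OutgoingArgs;

/* Initialize to the sentinel; 0 cannot be used because it is a valid fd. */
static void outgoing_args_init(OutgoingArgs *a)
{
    a->fd = -1;
}

/* True if a descriptor was stored after initialization. */
static bool outgoing_args_has_fd(const OutgoingArgs *a)
{
    return a->fd >= 0;
}
```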

>
> We could assert it instead, making sure there is no fd leak.
>
>> 
>> Thank you for your patience.
>> 
>> based-on: https://gitlab.com/peterx/qemu/-/commits/migration-stable
>> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1212483701
>> 
>> Fabiano Rosas (2):
>>   migration: Fix iocs leaks during file and fd migration
>>   migration/multifd: Ensure we're not given a socket for file migration
>> 
>>  migration/fd.c   | 35 +++---
>>  migration/file.c | 65 
>>  migration/file.h |  1 +
>>  3 files changed, 60 insertions(+), 41 deletions(-)
>> 
>> -- 
>> 2.35.3
>>

Re: [PATCH v2 2/2] migration/multifd: Ensure we're not given a socket for file migration

2024-03-14 Thread Fabiano Rosas

Peter Xu  writes:

> On Thu, Mar 14, 2024 at 11:10:12AM -0400, Peter Xu wrote:
>> On Wed, Mar 13, 2024 at 06:28:24PM -0300, Fabiano Rosas wrote:
>> > When doing migration using the fd: URI, the incoming migration starts
>> > before the user has passed the file descriptor to QEMU. This means
>> > that the checks at migration_channels_and_transport_compatible()
>> > happen too soon and we need to allow a migration channel of type
>> > SOCKET_ADDRESS_TYPE_FD even though socket migration is not supported
>> > with multifd.
>> 
>> Hmm, bear with me if this is a stupid one.. why the incoming migration can
>> start _before_ the user passed in the fd?
>> 
>> IOW, why can't we rely on a single fd_is_socket() check for
>> SOCKET_ADDRESS_TYPE_FD in transport_supports_multi_channels()?
>> 
>> > 
>> > The commit decdc76772 ("migration/multifd: Add mapped-ram support to
>> > fd: URI") was supposed to add a second check prior to starting
>> > migration to make sure a socket fd is not passed instead of a file fd,
>> > but failed to do so.
>> > 
>> > Add the missing verification.
>> > 
>> > Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
>> > Signed-off-by: Fabiano Rosas 
>> > ---
>> >  migration/fd.c   | 8 
>> >  migration/file.c | 7 +++
>> >  2 files changed, 15 insertions(+)
>> > 
>> > diff --git a/migration/fd.c b/migration/fd.c
>> > index 39a52e5c90..c07030f715 100644
>> > --- a/migration/fd.c
>> > +++ b/migration/fd.c
>> > @@ -22,6 +22,7 @@
>> >  #include "migration.h"
>> >  #include "monitor/monitor.h"
>> >  #include "io/channel-file.h"
>> > +#include "io/channel-socket.h"
>> >  #include "io/channel-util.h"
>> >  #include "options.h"
>> >  #include "trace.h"
>> > @@ -95,6 +96,13 @@ void fd_start_incoming_migration(const char *fdname, 
>> > Error **errp)
>> >  }
>> >  
>> >  if (migrate_multifd()) {
>> > +if (fd_is_socket(fd)) {
>> > +error_setg(errp,
>> > +   "Multifd migration to a socket FD is not 
>> > supported");
>> > +object_unref(ioc);
>> > +return;
>> > +}
>
> And... I just noticed this is forbidding multifd+socket+fd in general?  But
> isn't that the majority of multifd usage with libvirt over sockets?

I didn't think multifd supported socket fds, does it? I don't see code
to create the multiple channels anywhere. How would that work? Multiple
threads writing to a single socket fd? I'm a bit confused.

>
> Shouldn't it be about the fd being seekable or not when mapped-ram is enabled
> (IOW, migration_needs_seekable_channel() only)?

Yes, that could be a validation to be done if we actually get the fd at
the right moment.

>
>> > +
>> >  file_create_incoming_channels(ioc, errp);
>> >  } else {
>> >  qio_channel_set_name(ioc, "migration-fd-incoming");
>> > diff --git a/migration/file.c b/migration/file.c
>> > index ddde0ca818..b6e8ba13f2 100644
>> > --- a/migration/file.c
>> > +++ b/migration/file.c
>> > @@ -15,6 +15,7 @@
>> >  #include "file.h"
>> >  #include "migration.h"
>> >  #include "io/channel-file.h"
>> > +#include "io/channel-socket.h"
>> >  #include "io/channel-util.h"
>> >  #include "options.h"
>> >  #include "trace.h"
>> > @@ -58,6 +59,12 @@ bool file_send_channel_create(gpointer opaque, Error 
>> > **errp)
>> >  int fd = fd_args_get_fd();
>> >  
>> >  if (fd && fd != -1) {
>> > +if (fd_is_socket(fd)) {
>> > +error_setg(errp,
>> > +   "Multifd migration to a socket FD is not 
>> > supported");
>> > +goto out;
>> > +}
>> > +
>> >  ioc = qio_channel_file_new_dupfd(fd, errp);
>> >  } else {
>> >  ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, 
>> > errp);
>> > -- 
>> > 2.35.3
>> > 
>> 
>> -- 
>> Peter Xu

Re: question on s390x topology: KVM only, or also TCG?

2024-03-14 Thread Nina Schoetterl-Glausch

On Thu, 2024-03-14 at 16:54 +0100, Thomas Huth wrote:
> On 14/03/2024 16.49, Claudio Fontana wrote:
> > Hello Pierre, Ilya,
> > 
> > I have a question on the s390x "topology" feature and examples.
> > 
> > Mainly, is this feature supposed to be KVM accelerator-only, or also 
> > available when using the TCG accelerator?
> 
>   Hi Claudio!
> 
> Pierre left IBM, please CC: Nina with regards to s390x topology instead.
> 
> But with regards to your question, I think I can answer that, too: The 
> topology feature is currently working with KVM only, yes. It hasn't been 
> implemented for TCG yet.
> 
> > (docs/devel/s390-cpu-topology.rst vs 
> > https://www.qemu.org/docs/master/system/s390x/cpu-topology.html)
> > 
> > I see stsi-topology.c in target/s390x/kvm/ , so that part is clearly 
> > KVM-specific,
> > 
> > but in hw/s390x/cpu-topology.c I read:
> > 
> > "
> >   * - The first part in this file is taking care of all common functions
> >   *   used by KVM and TCG to create and modify the topology.

What Thomas said. Read this as the code in this file being independent of the
accelerator; it's just that TCG support is missing.
 
[...]
> > 
> > So I would assume this is KVM-only, but then in the "Examples" section 
> > below I see the example:
> > 
> > "
> > $ qemu-system-s390x -m 2G \
> >-cpu gen16b,ctop=on \

TCG doesn't support this cpu ^ and so will refuse to run.

> >-smp cpus=5,sockets=8,cores=4,maxcpus=32 \

When running with TCG, drawers & books are supported by -smp also, but well,
you cannot do anything with that.

[...]
>

Re: [PATCH v2 2/2] migration/multifd: Ensure we're not given a socket for file migration

2024-03-14 Thread Fabiano Rosas

Peter Xu  writes:

> On Wed, Mar 13, 2024 at 06:28:24PM -0300, Fabiano Rosas wrote:
>> When doing migration using the fd: URI, the incoming migration starts
>> before the user has passed the file descriptor to QEMU. This means
>> that the checks at migration_channels_and_transport_compatible()
>> happen too soon and we need to allow a migration channel of type
>> SOCKET_ADDRESS_TYPE_FD even though socket migration is not supported
>> with multifd.
>
> Hmm, bear with me if this is a stupid one.. why the incoming migration can
> start _before_ the user passed in the fd?

It's been a while since I looked at this. Looking into it once more
today, I think the issue is actually that we only fetch the fds from the
monitor at fd_start_outgoing|incoming_migration().

>
> IOW, why can't we rely on a single fd_is_socket() check for
> SOCKET_ADDRESS_TYPE_FD in transport_supports_multi_channels()?
>

There's no fd at that point. Just a string.

I think the right fix here would be to move the
monitor_fd_get/monitor_fd_param (why two different functions?) earlier
into migrate_uri_parse. And possibly also extend FileMigrationArgs to
contain an fd. Not sure how easy that would be.

>> 
>> The commit decdc76772 ("migration/multifd: Add mapped-ram support to
>> fd: URI") was supposed to add a second check prior to starting
>> migration to make sure a socket fd is not passed instead of a file fd,
>> but failed to do so.
>> 
>> Add the missing verification.
>> 
>> Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  migration/fd.c   | 8 
>>  migration/file.c | 7 +++
>>  2 files changed, 15 insertions(+)
>> 
>> diff --git a/migration/fd.c b/migration/fd.c
>> index 39a52e5c90..c07030f715 100644
>> --- a/migration/fd.c
>> +++ b/migration/fd.c
>> @@ -22,6 +22,7 @@
>>  #include "migration.h"
>>  #include "monitor/monitor.h"
>>  #include "io/channel-file.h"
>> +#include "io/channel-socket.h"
>>  #include "io/channel-util.h"
>>  #include "options.h"
>>  #include "trace.h"
>> @@ -95,6 +96,13 @@ void fd_start_incoming_migration(const char *fdname, 
>> Error **errp)
>>  }
>>  
>>  if (migrate_multifd()) {
>> +if (fd_is_socket(fd)) {
>> +error_setg(errp,
>> +   "Multifd migration to a socket FD is not supported");
>> +object_unref(ioc);
>> +return;
>> +}
>> +
>>  file_create_incoming_channels(ioc, errp);
>>  } else {
>>  qio_channel_set_name(ioc, "migration-fd-incoming");
>> diff --git a/migration/file.c b/migration/file.c
>> index ddde0ca818..b6e8ba13f2 100644
>> --- a/migration/file.c
>> +++ b/migration/file.c
>> @@ -15,6 +15,7 @@
>>  #include "file.h"
>>  #include "migration.h"
>>  #include "io/channel-file.h"
>> +#include "io/channel-socket.h"
>>  #include "io/channel-util.h"
>>  #include "options.h"
>>  #include "trace.h"
>> @@ -58,6 +59,12 @@ bool file_send_channel_create(gpointer opaque, Error 
>> **errp)
>>  int fd = fd_args_get_fd();
>>  
>>  if (fd && fd != -1) {
>> +if (fd_is_socket(fd)) {
>> +error_setg(errp,
>> +   "Multifd migration to a socket FD is not supported");
>> +goto out;
>> +}
>> +
>>  ioc = qio_channel_file_new_dupfd(fd, errp);
>>  } else {
>>  ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, 
>> errp);
>> -- 
>> 2.35.3
>>

Re: [PATCH V4 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-14 Thread Richard Henderson


On 3/13/24 15:33, Xianglai Li wrote:

+if (unlikely((level == 0) || (level > 4))) {
+return base;
+}
+
+if (FIELD_EX64(base, TLBENTRY, HUGE)) {
+if (FIELD_EX64(base, TLBENTRY, LEVEL)) {
+return base;
+} else {
+return  FIELD_DP64(base, TLBENTRY, LEVEL, level);
+}
+
+if (unlikely(level == 4)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "Attempted use of level %lu huge page\n", level);
+}


This block is unreachable, because you've already returned.
Perhaps it would be worthwhile to add another for the level==0 or > 4 case 
above?


@@ -530,20 +553,34 @@ void helper_ldpte(CPULoongArchState *env, target_ulong 
base, target_ulong odd,
  CPUState *cs = env_cpu(env);
  target_ulong phys, tmp0, ptindex, ptoffset0, ptoffset1, ps, badv;
  int shift;
-bool huge = (base >> LOONGARCH_PAGE_HUGE_SHIFT) & 0x1;
  uint64_t ptbase = FIELD_EX64(env->CSR_PWCL, CSR_PWCL, PTBASE);
  uint64_t ptwidth = FIELD_EX64(env->CSR_PWCL, CSR_PWCL, PTWIDTH);
+uint64_t dir_base, dir_width;
  
  base = base & TARGET_PHYS_MASK;

+if (FIELD_EX64(base, TLBENTRY, HUGE)) {
+/*
+ * Gets the huge page level and Gets huge page size
+ * Clears the huge page level information in the address
+ * Clears huge page bit
+ */
+get_dir_base_width(env, &dir_base, &dir_width,
+   FIELD_EX64(base, TLBENTRY, LEVEL));
+
+FIELD_DP64(base, TLBENTRY, LEVEL, 0);
+FIELD_DP64(base, TLBENTRY, HUGE, 0);
+if (FIELD_EX64(base, TLBENTRY, HG)) {
+FIELD_DP64(base, TLBENTRY, HG, 0);
+FIELD_DP64(base, TLBENTRY, G, 1);


FIELD_DP64 returns a value.  You need

base = FIELD_DP64(base, ...);
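
To illustrate the point, here is a minimal standalone sketch of a deposit-style helper. sketch_deposit64() is a hypothetical function written for this example, not the FIELD_DP64 macro itself. Because the value is passed by copy, the caller's variable only changes if the result is assigned back.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a field-deposit helper: returns a copy of
 * 'value' with 'length' bits starting at bit 'start' replaced by 'field'.
 * Requires 0 < length <= 64 and start + length <= 64.
 */
static uint64_t sketch_deposit64(uint64_t value, unsigned start,
                                 unsigned length, uint64_t field)
{
    uint64_t mask = (~0ULL >> (64 - length)) << start;

    return (value & ~mask) | ((field << start) & mask);
}
```

Calling sketch_deposit64(base, ...) and discarding the result leaves base untouched, which is the same pattern as the FIELD_DP64 lines quoted above.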


r~

Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-14 Thread Jonah Palmer





On 3/14/24 10:55 AM, Eugenio Perez Martin wrote:

On Thu, Mar 14, 2024 at 1:16 PM Jonah Palmer  wrote:




On 3/13/24 11:01 PM, Jason Wang wrote:

On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  wrote:


Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
   - upper 16 bits: shadow_avail_idx
   - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
   - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
   - lower 16 bits: virtqueue index
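
The two layouts listed above can be decoded as in the following sketch. NotifyData and decode_notify_data() are hypothetical names for illustration only; the bit positions follow the description in this commit message and the shifts and masks used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded notification data. */
typedef struct {
    uint16_t vq_index;     /* lower 16 bits in both layouts */
    uint16_t avail_idx;    /* 16 bits (split) or 15 bits (packed) */
    bool wrap_counter;     /* only meaningful for the packed layout */
} NotifyData;

static NotifyData decode_notify_data(uint32_t data, bool packed)
{
    NotifyData d;

    d.vq_index = data & 0xFFFF;
    if (packed) {
        /* Bit 31 is the wrap counter, bits 30..16 the available index. */
        d.wrap_counter = (data >> 31) & 0x1;
        d.avail_idx = (data >> 16) & 0x7FFF;
    } else {
        /* All upper 16 bits are the available index. */
        d.wrap_counter = false;
        d.avail_idx = data >> 16;
    }
    return d;
}
```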

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
   hw/virtio/virtio-pci.c | 10 +++---
   hw/virtio/virtio.c | 18 ++
   include/hw/virtio/virtio.h |  1 +
   3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index cb6940fc0e..0f5c3c3b2f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
   {
   VirtIOPCIProxy *proxy = opaque;
   VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
   hwaddr pa;

   switch (addr) {
@@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
   vdev->queue_sel = val;
   break;
   case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+vq_idx = val;
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, val);
+}
+virtio_queue_notify(vdev, vq_idx);
   }
   break;
   case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..bcb9e09df0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, 
int align)
   }
   }

+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)


Maybe I didn't explain well, but I think it is better to pass directly
idx to a VirtQueue *. That way only the caller needs to check for a
valid vq idx, and (my understanding is) the virtio.c interface is
migrating to VirtQueue * use anyway.



Oh, are you saying to just pass in a VirtQueue *vq instead of 
VirtIODevice *vdev and get rid of the vq->vring.desc check in the function?



+{
+/* Lower 16 bits is the virtqueue index */
+uint16_t i = data;
+VirtQueue *vq = &vdev->vq[i];
+
+if (!vq->vring.desc) {
+return;
+}
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
+vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->shadow_avail_idx = (data >> 16);


Do we need to do a sanity check for this value?

Thanks



It can't hurt, right? What kind of check did you have in mind?

if (vq->shadow_avail_idx >= vq->vring.num)



I'm a little bit lost too. shadow_avail_idx can take all uint16_t
values. Maybe you meant checking for a valid vq index, Jason?

Thanks!


Or something else?


+}
+}
+
   static void virtio_queue_notify_vq(VirtQueue *vq)
   {
   if (vq->vring.desc && vq->handle_output) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..53915947a7 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n);
   void virtio_init_region_cache(VirtIODevice *vdev, int n);
   void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
   void virtio_queue_notify(VirtIODevice *vdev, int n);
+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data);
   uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
   void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
   int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
--
2.39.3

Re: question on s390x topology: KVM only, or also TCG?

2024-03-14 Thread Thomas Huth


On 14/03/2024 16.49, Claudio Fontana wrote:

Hello Pierre, Ilya,

I have a question on the s390x "topology" feature and examples.

Mainly, is this feature supposed to be KVM accelerator-only, or also available 
when using the TCG accelerator?


 Hi Claudio!

Pierre left IBM, please CC: Nina with regards to s390x topology instead.

But with regards to your question, I think I can answer that, too: The 
topology feature is currently working with KVM only, yes. It hasn't been 
implemented for TCG yet.



(docs/devel/s390-cpu-topology.rst vs 
https://www.qemu.org/docs/master/system/s390x/cpu-topology.html)

I see stsi-topology.c in target/s390x/kvm/ , so that part is clearly 
KVM-specific,

but in hw/s390x/cpu-topology.c I read:

"
  * - The first part in this file is taking care of all common functions
  *   used by KVM and TCG to create and modify the topology.
  *
  * - The second part, building the topology information data for the
  *   guest with CPU and KVM specificity will be implemented inside
  *   the target/s390/kvm sub tree.
"

In the docs/devel/s390-cpu-topology.rst

I see the example command:

  qemu-system-s390x \
 -enable-kvm \
 -cpu z14,ctop=on \
 -smp 1,drawers=3,books=3,sockets=2,cores=2,maxcpus=36 \
 -device z14-s390x-cpu,core-id=19,entitlement=high \
 -device z14-s390x-cpu,core-id=11,entitlement=low \
 -device z14-s390x-cpu,core-id=12,entitlement=high \
...


which uses KVM only.

In https://www.qemu.org/docs/master/system/s390x/cpu-topology.html

I read:

"Prerequisites:
To use the CPU topology, you need to run with KVM on a s390x host that uses the 
Linux kernel v6.0 or newer (which provide the so-called 
KVM_CAP_S390_CPU_TOPOLOGY capability that allows QEMU to signal the CPU 
topology facility via the so-called STFLE bit 11 to the VM).
"

So I would assume this is KVM-only, but then in the "Examples" section below I 
see the example:

"
$ qemu-system-s390x -m 2G \
   -cpu gen16b,ctop=on \
   -smp cpus=5,sockets=8,cores=4,maxcpus=32 \
   -device host-s390x-cpu,core-id=14 \
"

and

"
qemu-system-s390x -m 2G \
   -cpu gen16b,ctop=on \
   -smp cpus=1,sockets=8,cores=4,maxcpus=32 \
   \
   -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=1 \
   -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=2 \
   -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=3 \
   \
   -device gen16b-s390x-cpu,drawer-id=0,book-id=0,socket-id=0,core-id=9 \
   -device gen16b-s390x-cpu,drawer-id=0,book-id=0,socket-id=0,core-id=14 \
   \
   -device gen16b-s390x-cpu,core-id=4,dedicated=on,entitlement=high
"

We received questions about this, so I hope you can shed some light, maybe it 
would be good to just update the web page to include -accel kvm or -enable-kvm 
everywhere for clarity?


Yes, it would be better to include "-accel kvm" in those examples. Would you 
like to send a patch?


 Thanks,
  Thomas

question on s390x topology: KVM only, or also TCG?

2024-03-14 Thread Claudio Fontana

Hello Pierre, Ilya,

I have a question on the s390x "topology" feature and examples.

Mainly, is this feature supposed to be KVM accelerator-only, or also available 
when using the TCG accelerator?

(docs/devel/s390-cpu-topology.rst vs 
https://www.qemu.org/docs/master/system/s390x/cpu-topology.html)

I see stsi-topology.c in target/s390x/kvm/ , so that part is clearly 
KVM-specific,

but in hw/s390x/cpu-topology.c I read:

"
 * - The first part in this file is taking care of all common functions
 *   used by KVM and TCG to create and modify the topology.
 *
 * - The second part, building the topology information data for the
 *   guest with CPU and KVM specificity will be implemented inside
 *   the target/s390/kvm sub tree.
"

In the docs/devel/s390-cpu-topology.rst

I see the example command:

 qemu-system-s390x \
-enable-kvm \
-cpu z14,ctop=on \
-smp 1,drawers=3,books=3,sockets=2,cores=2,maxcpus=36 \
-device z14-s390x-cpu,core-id=19,entitlement=high \
-device z14-s390x-cpu,core-id=11,entitlement=low \
-device z14-s390x-cpu,core-id=12,entitlement=high \
   ...


which uses KVM only.

In https://www.qemu.org/docs/master/system/s390x/cpu-topology.html

I read:

"Prerequisites:
To use the CPU topology, you need to run with KVM on a s390x host that uses the 
Linux kernel v6.0 or newer (which provide the so-called 
KVM_CAP_S390_CPU_TOPOLOGY capability that allows QEMU to signal the CPU 
topology facility via the so-called STFLE bit 11 to the VM).
"

So I would assume this is KVM-only, but then in the "Examples" section below I 
see the example:

"
$ qemu-system-s390x -m 2G \
  -cpu gen16b,ctop=on \
  -smp cpus=5,sockets=8,cores=4,maxcpus=32 \
  -device host-s390x-cpu,core-id=14 \
"

and

"
qemu-system-s390x -m 2G \
  -cpu gen16b,ctop=on \
  -smp cpus=1,sockets=8,cores=4,maxcpus=32 \
  \
  -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=1 \
  -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=2 \
  -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=3 \
  \
  -device gen16b-s390x-cpu,drawer-id=0,book-id=0,socket-id=0,core-id=9 \
  -device gen16b-s390x-cpu,drawer-id=0,book-id=0,socket-id=0,core-id=14 \
  \
  -device gen16b-s390x-cpu,core-id=4,dedicated=on,entitlement=high
"

We received questions about this, so I hope you can shed some light, maybe it 
would be good to just update the web page to include -accel kvm or -enable-kvm 
everywhere for clarity?

Thanks for your help on this,

Claudio

-- 
Claudio Fontana
Engineering Manager Virtualization, SUSE Labs Core

SUSE Software Solutions Italy Srl

Re: [PATCH v2 3/4] tests/avocado: use OpenBSD 7.4 for sbsa-ref

2024-03-14 Thread Marcin Juszkiewicz


W dniu 14.03.2024 o 15:56, Alex Bennée pisze:

If we are not going to delete the entries then at least use a @skip
instead of commenting. Maybe:
@skip("Potential un-diagnosed upstream bug?")

Daniel or Peter suggested to open a GitLab issue and use

 @skip("https://gitlab.com/qemu-project/qemu/-/issues/xyz")

to track progress.

That's a good idea. Are you going to respin?


Opened https://gitlab.com/qemu-project/qemu/-/issues/2224 to track the
problem. Subscribed to the arm@openbsd mailing list.

I will walk the dog and then mail them about the problem.

I will respin the patch series tomorrow.

Re: [PATCH v2 2/2] migration/multifd: Ensure we're not given a socket for file migration

2024-03-14 Thread Peter Xu

On Thu, Mar 14, 2024 at 11:10:12AM -0400, Peter Xu wrote:
> On Wed, Mar 13, 2024 at 06:28:24PM -0300, Fabiano Rosas wrote:
> > When doing migration using the fd: URI, the incoming migration starts
> > before the user has passed the file descriptor to QEMU. This means
> > that the checks at migration_channels_and_transport_compatible()
> > happen too soon and we need to allow a migration channel of type
> > SOCKET_ADDRESS_TYPE_FD even though socket migration is not supported
> > with multifd.
> 
> Hmm, bear with me if this is a stupid one.. why the incoming migration can
> start _before_ the user passed in the fd?
> 
> IOW, why can't we rely on a single fd_is_socket() check for
> SOCKET_ADDRESS_TYPE_FD in transport_supports_multi_channels()?
> 
> > 
> > The commit decdc76772 ("migration/multifd: Add mapped-ram support to
> > fd: URI") was supposed to add a second check prior to starting
> > migration to make sure a socket fd is not passed instead of a file fd,
> > but failed to do so.
> > 
> > Add the missing verification.
> > 
> > Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
> > Signed-off-by: Fabiano Rosas 
> > ---
> >  migration/fd.c   | 8 
> >  migration/file.c | 7 +++
> >  2 files changed, 15 insertions(+)
> > 
> > diff --git a/migration/fd.c b/migration/fd.c
> > index 39a52e5c90..c07030f715 100644
> > --- a/migration/fd.c
> > +++ b/migration/fd.c
> > @@ -22,6 +22,7 @@
> >  #include "migration.h"
> >  #include "monitor/monitor.h"
> >  #include "io/channel-file.h"
> > +#include "io/channel-socket.h"
> >  #include "io/channel-util.h"
> >  #include "options.h"
> >  #include "trace.h"
> > @@ -95,6 +96,13 @@ void fd_start_incoming_migration(const char *fdname, 
> > Error **errp)
> >  }
> >  
> >  if (migrate_multifd()) {
> > +if (fd_is_socket(fd)) {
> > +error_setg(errp,
> > +   "Multifd migration to a socket FD is not 
> > supported");
> > +object_unref(ioc);
> > +return;
> > +}

And... I just noticed this is forbidding multifd+socket+fd in general?  But
isn't that the majority of multifd usage with libvirt over sockets?

Shouldn't it be about the fd being seekable or not when mapped-ram is enabled
(IOW, migration_needs_seekable_channel() only)?

> > +
> >  file_create_incoming_channels(ioc, errp);
> >  } else {
> >  qio_channel_set_name(ioc, "migration-fd-incoming");
> > diff --git a/migration/file.c b/migration/file.c
> > index ddde0ca818..b6e8ba13f2 100644
> > --- a/migration/file.c
> > +++ b/migration/file.c
> > @@ -15,6 +15,7 @@
> >  #include "file.h"
> >  #include "migration.h"
> >  #include "io/channel-file.h"
> > +#include "io/channel-socket.h"
> >  #include "io/channel-util.h"
> >  #include "options.h"
> >  #include "trace.h"
> > @@ -58,6 +59,12 @@ bool file_send_channel_create(gpointer opaque, Error 
> > **errp)
> >  int fd = fd_args_get_fd();
> >  
> >  if (fd && fd != -1) {
> > +if (fd_is_socket(fd)) {
> > +error_setg(errp,
> > +   "Multifd migration to a socket FD is not 
> > supported");
> > +goto out;
> > +}
> > +
> >  ioc = qio_channel_file_new_dupfd(fd, errp);
> >  } else {
> >  ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, 
> > errp);
> > -- 
> > 2.35.3
> > 
> 
> -- 
> Peter Xu

-- 
Peter Xu

Re: [PATCH v3 2/2] vhost: Perform memory section dirty scans once per iteration

2024-03-14 Thread Eugenio Perez Martin

On Thu, Mar 14, 2024 at 9:38 AM Si-Wei Liu  wrote:
>
> On setups with one or more virtio-net devices with vhost on, the cost
> of each dirty tracking iteration grows with the number of queues that
> are set up. E.g. on idle guest migration the following is observed
> with virtio-net with vhost=on:
>
> 48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
> 8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
> 1 queue -> 6.89% [.] vhost_dev_sync_region.isra.13
> 2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14
>
> With high memory rates the symptom is lack of convergence as soon
> as there is a vhost device with a sufficiently high number of queues,
> or a sufficient number of vhost devices.
>
> On every migration iteration (every 100msecs) it will redundantly
> query the *shared log* once per queue configured with vhost
> that exists in the guest. For the virtqueue data, this is necessary,
> but not for the memory sections which are the same. So essentially
> we end up scanning the dirty log too often.
>
> To fix that, select a vhost device responsible for scanning the
> log with regards to memory sections dirty tracking. It is selected
> when we enable the logger (during migration) and cleared when we
> disable the logger. If the vhost logger device goes away for some
> reason, the logger will be re-selected from the rest of vhost
> devices.
>
> After making mem-section logger a singleton instance, constant cost
> of 7%-9% (like the 1 queue report) will be seen, no matter how many
> queues or how many vhost devices are configured:
>
> 48 queues -> 8.71%[.] vhost_dev_sync_region.isra.13
> 2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14
>
> Co-developed-by: Joao Martins 
> Signed-off-by: Joao Martins 
> Signed-off-by: Si-Wei Liu 
> ---
> v2 -> v3:
>   - add after-fix benchmark to commit log
>   - rename vhost_log_dev_enabled to vhost_dev_should_log
>   - remove unneeded comparisons for backend_type
>   - use QLIST array instead of single flat list to store vhost
> logger devices
>   - simplify logger election logic
>
> ---
>  hw/virtio/vhost.c | 63 ++-
>  include/hw/virtio/vhost.h |  1 +
>  2 files changed, 58 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index efe2f74..d91858b 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -45,6 +45,7 @@
>
>  static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
>  static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
> +static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
>
>  /* Memslots used by backends that support private memslots (without an fd). 
> */
>  static unsigned int used_memslots;
> @@ -149,6 +150,43 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
>  }
>  }
>
> +static inline bool vhost_dev_should_log(struct vhost_dev *dev)
> +{
> +assert(dev->vhost_ops);
> +assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
> +assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
> +
> +return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
> +}
> +
> +static inline void vhost_dev_elect_mem_logger(struct vhost_dev *hdev, bool 
> add)
> +{
> +VhostBackendType backend_type;
> +
> +assert(hdev->vhost_ops);
> +
> +backend_type = hdev->vhost_ops->backend_type;
> +assert(backend_type > VHOST_BACKEND_TYPE_NONE);
> +assert(backend_type < VHOST_BACKEND_TYPE_MAX);
> +
> +if (add && !QLIST_IS_INSERTED(hdev, logdev_entry)) {
> +if (QLIST_EMPTY(&vhost_log_devs[backend_type])) {
> +QLIST_INSERT_HEAD(&vhost_log_devs[backend_type],
> +  hdev, logdev_entry);
> +} else {
> +/*
> + * The first vhost_device in the list is selected as the shared
> + * logger to scan memory sections. Put new entry next to the head
> + * to avoid inadvertent change to the underlying logger device.
> + */

Why is changing the logger device a problem? All the code paths are
either changing the QLIST or logging, isn't it?

> +QLIST_INSERT_AFTER(QLIST_FIRST(&vhost_log_devs[backend_type]),
> +   hdev, logdev_entry);
> +}
> +} else if (!add && QLIST_IS_INSERTED(hdev, logdev_entry)) {
> +QLIST_REMOVE(hdev, logdev_entry);
> +}
> +}
> +
>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> MemoryRegionSection *section,
> hwaddr first,
> @@ -166,12 +204,14 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev 
> *dev,
>  start_addr = MAX(first, start_addr);
>  end_addr = MIN(last, end_addr);
>
> -for (i = 0; i < dev->mem->nregions; ++i) {
> -struct vhost_memory_region *reg = dev->mem->regions + i;
> -vhost_dev_sync_region(dev, section, start_addr, end_addr,
> -

Re: [PATCH v4 00/21] Workaround Windows failing to find 64bit SMBIOS entry point with SeaBIOS

2024-03-14 Thread Michael S. Tsirkin

On Thu, Mar 14, 2024 at 04:22:41PM +0100, Igor Mammedov wrote:
> Changelog:
>  v4:
>* rebase on top of current master due to conflict with
>  new SMBIOS table 9 commits
>* add extra patch with comments about obscure legacy entries counting


Thanks a lot for the quick turnaround Igor!

>  v3:
>* whitespace missed by checkpatch
>* fix indent in QAPI
>* reorder 17/20 before 1st 'auto' can be used
>* pick up acks
>  v2:
>* QAPI style fixes (Markus Armbruster )
>* squash 11/19 into 10/19 (Ani Sinha )
>* split '[PATCH 09/19] smbios: build legacy mode code only for 'pc' 
> machine'
>  in 3 smaller patches, to make it more readable
>smbios: add smbios_add_usr_blob_size() helper  
> 
>smbios: rename/expose structures/bitmaps used by both legacy and 
> modern code  
>smbios: build legacy mode code only for 'pc' machine
>* pick up acks
> 
> Windows (10) bootloader when running on top of SeaBIOS, fails to find
> SMBIOSv3 entry point. Tracing it shows that it looks for v2 anchor markers
> only and not v3. Tricking it into believing that entry point is found
> lets Windows successfully locate and parse SMBIOSv3 tables. Whether it
> will be fixed on Windows side is not clear so here goes a workaround.
> 
> The idea is to try to build v2 tables if the QEMU configuration permits,
> and fall back to v3 tables otherwise. That will mask the Windows issue
> for the majority of users.
> However, if the VM configuration can't be described by v2 tables
> (typically large VMs), QEMU will use SMBIOSv3 and Windows will hit the
> issue again. In this case complain to Microsoft and/or use UEFI instead
> of SeaBIOS (requires reinstall).
> 
> Default compat setting of smbios-entry-point-type after series
> for pc/q35 machines:
>   * 9.0-newer: 'auto'
>   * 8.1-8.2: '64'
>   * 8.0-older: '32'
> 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2008
> 
> Igor Mammedov (21):
>   tests: smbios: make it possible to write SMBIOS only test
>   tests: smbios: add test for -smbios type=11 option
>   tests: smbios: add test for legacy mode CLI options
>   smbios: cleanup smbios_get_tables() from legacy handling
>   smbios: get rid of smbios_smp_sockets global
>   smbios: get rid of smbios_legacy global
>   smbios: avoid mangling user provided tables
>   smbios: don't check type4 structures in legacy mode
>   smbios: add smbios_add_usr_blob_size() helper
>   smbios: rename/expose structures/bitmaps used by both legacy and
> modern code
>   smbios: build legacy mode code only for 'pc' machine
>   smbios: handle errors consistently
>   smbios: get rid of global smbios_ep_type
>   smbios: clear smbios_type4_count before building tables
>   smbios: extend smbios-entry-point-type with 'auto' value
>   smbios: in case of entry point is 'auto' try to build v2 tables 1st
>   smbios: error out when building type 4 table is not possible
>   tests: acpi/smbios: whitelist expected blobs
>   pc/q35: set SMBIOS entry point type to 'auto' by default
>   tests: acpi: update expected SSDT.dimmpxm blob
>   smbios: add extra comments to smbios_get_table_legacy()
> 
>  hw/i386/fw_cfg.h |   3 +-
>  include/hw/firmware/smbios.h |  28 +-
>  hw/arm/virt.c|   6 +-
>  hw/i386/Kconfig  |   1 +
>  hw/i386/fw_cfg.c |  14 +-
>  hw/i386/pc.c |   4 +-
>  hw/i386/pc_piix.c|   4 +
>  hw/i386/pc_q35.c |   3 +
>  hw/loongarch/virt.c  |   7 +-
>  hw/riscv/virt.c  |   6 +-
>  hw/smbios/Kconfig|   2 +
>  hw/smbios/meson.build|   4 +
>  hw/smbios/smbios.c   | 483 +++
>  hw/smbios/smbios_legacy.c| 192 +++
>  hw/smbios/smbios_legacy_stub.c   |  15 +
>  qapi/machine.json|   5 +-
>  tests/data/acpi/q35/SSDT.dimmpxm | Bin 1815 -> 1815 bytes
>  tests/data/smbios/type11_blob| Bin 0 -> 11 bytes
>  tests/data/smbios/type11_blob.legacy | Bin 0 -> 10 bytes
>  tests/qtest/bios-tables-test.c   |  81 -
>  20 files changed, 541 insertions(+), 317 deletions(-)
>  create mode 100644 hw/smbios/smbios_legacy.c
>  create mode 100644 hw/smbios/smbios_legacy_stub.c
>  create mode 100644 tests/data/smbios/type11_blob
>  create mode 100644 tests/data/smbios/type11_blob.legacy
> 
> --
> 2.39.3

Re: [PATCH v3 1/2] vhost: dirty log should be per backend type

2024-03-14 Thread Eugenio Perez Martin

On Thu, Mar 14, 2024 at 9:38 AM Si-Wei Liu  wrote:
>
> There could be a mix of both vhost-user and vhost-kernel clients
> in the same QEMU process, where separate vhost loggers for the
> specific vhost type have to be used. Make the vhost logger per
> backend type, and have them properly reference counted.
>
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Si-Wei Liu 
> ---
> v2->v3:
>   - remove ineffective assertion that can never be reached
>   - do not return NULL from vhost_log_get()
>   - add necessary assertions to vhost_log_get()
>
> ---
>  hw/virtio/vhost.c | 50 ++
>  1 file changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 2c9ac79..efe2f74 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -43,8 +43,8 @@
>  do { } while (0)
>  #endif
>
> -static struct vhost_log *vhost_log;
> -static struct vhost_log *vhost_log_shm;
> +static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
> +static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
>
>  /* Memslots used by backends that support private memslots (without an fd). 
> */
>  static unsigned int used_memslots;
> @@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev *dev,
>  r = -1;
>  }
>
> +if (r == 0) {
> +assert(dev->vhost_ops->backend_type == backend_type);
> +}
> +
>  return r;
>  }
>
> @@ -319,16 +323,22 @@ static struct vhost_log *vhost_log_alloc(uint64_t size, 
> bool share)
>  return log;
>  }
>
> -static struct vhost_log *vhost_log_get(uint64_t size, bool share)
> +static struct vhost_log *vhost_log_get(VhostBackendType backend_type,
> +   uint64_t size, bool share)
>  {
> -struct vhost_log *log = share ? vhost_log_shm : vhost_log;
> +struct vhost_log *log;
> +
> +assert(backend_type > VHOST_BACKEND_TYPE_NONE);
> +assert(backend_type < VHOST_BACKEND_TYPE_MAX);
> +
> +log = share ? vhost_log_shm[backend_type] : vhost_log[backend_type];
>
>  if (!log || log->size != size) {
>  log = vhost_log_alloc(size, share);
>  if (share) {
> -vhost_log_shm = log;
> +vhost_log_shm[backend_type] = log;
>  } else {
> -vhost_log = log;
> +vhost_log[backend_type] = log;
>  }
>  } else {
>  ++log->refcnt;
> @@ -340,11 +350,20 @@ static struct vhost_log *vhost_log_get(uint64_t size, 
> bool share)
>  static void vhost_log_put(struct vhost_dev *dev, bool sync)
>  {
>  struct vhost_log *log = dev->log;
> +VhostBackendType backend_type;
>
>  if (!log) {
>  return;
>  }
>
> +assert(dev->vhost_ops);
> +backend_type = dev->vhost_ops->backend_type;
> +
> +if (backend_type == VHOST_BACKEND_TYPE_NONE ||
> +backend_type >= VHOST_BACKEND_TYPE_MAX) {
> +return;
> +}
> +
>  --log->refcnt;
>  if (log->refcnt == 0) {
>  /* Sync only the range covered by the old log */
> @@ -352,13 +371,13 @@ static void vhost_log_put(struct vhost_dev *dev, bool 
> sync)
>  vhost_log_sync_range(dev, 0, dev->log_size * VHOST_LOG_CHUNK - 
> 1);
>  }
>
> -if (vhost_log == log) {
> +if (vhost_log[backend_type] == log) {
>  g_free(log->log);
> -vhost_log = NULL;
> -} else if (vhost_log_shm == log) {
> +vhost_log[backend_type] = NULL;
> +} else if (vhost_log_shm[backend_type] == log) {
>  qemu_memfd_free(log->log, log->size * sizeof(*(log->log)),
>  log->fd);
> -vhost_log_shm = NULL;
> +vhost_log_shm[backend_type] = NULL;
>  }
>
>  g_free(log);
> @@ -376,7 +395,8 @@ static bool vhost_dev_log_is_shared(struct vhost_dev *dev)
>
>  static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
>  {
> -struct vhost_log *log = vhost_log_get(size, 
> vhost_dev_log_is_shared(dev));
> +struct vhost_log *log = vhost_log_get(dev->vhost_ops->backend_type,
> +  size, 
> vhost_dev_log_is_shared(dev));
>  uint64_t log_base = (uintptr_t)log->log;
>  int r;
>
> @@ -2037,8 +2057,14 @@ int vhost_dev_start(struct vhost_dev *hdev, 
> VirtIODevice *vdev, bool vrings)
>  uint64_t log_base;
>
>  hdev->log_size = vhost_get_log_size(hdev);
> -hdev->log = vhost_log_get(hdev->log_size,
> +hdev->log = vhost_log_get(hdev->vhost_ops->backend_type,
> +  hdev->log_size,
>vhost_dev_log_is_shared(hdev));
> +if (!hdev->log) {

I thought vhost_log_get couldn't return NULL :).

Other than that,

Acked-by: Eugenio Pérez 

> +VHOST_OPS_DEBUG(r, "vhost_log_get failed");
> +goto fail_vq;
> +}
> +
>  log_base = (uintptr_t)hdev->log->log;
>  r =
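
A minimal sketch of the per-backend-type, reference-counted log that the patch description talks about, assuming a simplified log object with no shared-memory variant. All demo_* names and the enum values are hypothetical; only the shape (one slot per backend type, refcount on get/put) follows the quoted diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical backend enumeration: NONE < valid types < MAX. */
enum demo_backend {
    DEMO_BACKEND_NONE = 0,
    DEMO_BACKEND_KERNEL,
    DEMO_BACKEND_USER,
    DEMO_BACKEND_MAX
};

struct demo_log {
    uint64_t size;
    int refcnt;
};

/* One current log per backend type instead of a single global. */
static struct demo_log *demo_logs[DEMO_BACKEND_MAX];

/* Reuse the slot's log if the size matches, otherwise allocate a new one. */
static struct demo_log *demo_log_get(enum demo_backend type, uint64_t size)
{
    struct demo_log *log;

    assert(type > DEMO_BACKEND_NONE && type < DEMO_BACKEND_MAX);
    log = demo_logs[type];

    if (!log || log->size != size) {
        log = calloc(1, sizeof(*log));
        assert(log);
        log->size = size;
        log->refcnt = 1;
        demo_logs[type] = log;   /* older log, if any, lives on via its holders */
    } else {
        ++log->refcnt;
    }
    return log;
}

/* Drop one reference; free the log and clear its slot on the last put. */
static void demo_log_put(enum demo_backend type, struct demo_log *log)
{
    if (!log) {
        return;
    }
    assert(type > DEMO_BACKEND_NONE && type < DEMO_BACKEND_MAX);

    if (--log->refcnt == 0) {
        if (demo_logs[type] == log) {
            demo_logs[type] = NULL;
        }
        free(log);
    }
}
```

Two devices of the same backend type and log size share one object, while a device of another backend type gets its own, which is the separation the commit message asks for.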

[PATCH v4 19/21] pc/q35: set SMBIOS entry point type to 'auto' by default

2024-03-14 Thread Igor Mammedov

Use smbios-entry-point-type='auto' for newer machine types as a workaround
for Windows not detecting SMBIOS tables. This makes QEMU pick SMBIOS tables
based on the configuration (2.x is preferred, with a fallback to 3.x if the
former isn't compatible with the configuration).

Default compat setting of smbios-entry-point-type after series
for pc/q35 machines:
  * 9.0-newer: 'auto'
  * 8.1-8.2: '64'
  * 8.0-older: '32'

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2008
Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
Tested-by: Fiona Ebner 
---
 hw/i386/pc.c  | 2 +-
 hw/i386/pc_piix.c | 4 
 hw/i386/pc_q35.c  | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 44eb073abd..e80f02bef4 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1832,7 +1832,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->nvdimm_supported = true;
 mc->smp_props.dies_supported = true;
 mc->default_ram_id = "pc.ram";
-pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_64;
+pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_AUTO;
 
 object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
 pc_machine_get_max_ram_below_4g, pc_machine_set_max_ram_below_4g,
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index c9a6c0aa68..18ba076609 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -525,12 +525,16 @@ DEFINE_I440FX_MACHINE(v9_0, "pc-i440fx-9.0", NULL,
 
 static void pc_i440fx_8_2_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+
 pc_i440fx_9_0_machine_options(m);
 m->alias = NULL;
 m->is_default = false;
 
 compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
 compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
+/* For pc-i440fx-8.2 and 8.1, use SMBIOS 3.X by default */
+pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_64;
 }
 
 DEFINE_I440FX_MACHINE(v8_2, "pc-i440fx-8.2", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 8a427c4647..b5922b44af 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -376,11 +376,14 @@ DEFINE_Q35_MACHINE(v9_0, "pc-q35-9.0", NULL,
 
 static void pc_q35_8_2_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_9_0_machine_options(m);
 m->alias = NULL;
 m->max_cpus = 1024;
 compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
 compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
+/* For pc-q35-8.2 and 8.1, use SMBIOS 3.X by default */
+pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_64;
 }
 
 DEFINE_Q35_MACHINE(v8_2, "pc-q35-8.2", NULL,
-- 
2.39.3
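
The compat defaults listed in the commit message above can be read as a simple version lookup. The following is only an illustrative sketch: QEMU itself sets the default through chained per-machine option functions, as the diff shows, and the demo_* names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical mirror of the three entry point settings named in the patch. */
enum demo_default_ep {
    DEMO_DEFAULT_EP_32,
    DEMO_DEFAULT_EP_64,
    DEMO_DEFAULT_EP_AUTO
};

/* Map a pc/q35 machine version to its default smbios-entry-point-type. */
static enum demo_default_ep demo_default_ep_for(int major, int minor)
{
    if (major >= 9) {
        return DEMO_DEFAULT_EP_AUTO;   /* 9.0 and newer: 'auto' */
    }
    if (major == 8 && minor >= 1) {
        return DEMO_DEFAULT_EP_64;     /* 8.1 and 8.2: '64' */
    }
    return DEMO_DEFAULT_EP_32;         /* 8.0 and older: '32' */
}
```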

[PATCH v4 03/21] tests: smbios: add test for legacy mode CLI options

2024-03-14 Thread Igor Mammedov

Unfortunately, having the 2.0 machine type deprecated is not enough
to get rid of legacy SMBIOS handling, since 'isapc' also uses
it and is staying around.

Hence add a test for CLI option handling to make sure that it
doesn't get broken during SMBIOS code refactoring.

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
Tested-by: Fiona Ebner 
---
 tests/data/smbios/type11_blob.legacy | Bin 0 -> 10 bytes
 tests/qtest/bios-tables-test.c   |  17 +
 2 files changed, 17 insertions(+)
 create mode 100644 tests/data/smbios/type11_blob.legacy

diff --git a/tests/data/smbios/type11_blob.legacy 
b/tests/data/smbios/type11_blob.legacy
new file mode 100644
index 
..aef463aab903405958b0a85f85c5980671c08bee
GIT binary patch
literal 10
Rcmd;PW!S(N;u;*n000Tp0s;U4

literal 0
HcmV?d1

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index a116f88e1d..d1ff4db7a2 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -2106,6 +2106,21 @@ static void test_acpi_pc_smbios_blob(void)
 free_test_data();
 }
 
+static void test_acpi_isapc_smbios_legacy(void)
+{
+uint8_t req_type11[] = { 1, 11 };
+test_data data = {
+.machine = "isapc",
+.variant = ".pc_smbios_legacy",
+.required_struct_types = req_type11,
+.required_struct_types_len = ARRAY_SIZE(req_type11),
+};
+
+test_smbios("-smbios file=tests/data/smbios/type11_blob.legacy "
+"-smbios type=1,family=TEST", &data);
+free_test_data();
+}
+
 static void test_oem_fields(test_data *data)
 {
 int i;
@@ -2261,6 +2276,8 @@ int main(int argc, char *argv[])
test_acpi_pc_smbios_options);
 qtest_add_func("acpi/piix4/smbios-blob",
test_acpi_pc_smbios_blob);
+qtest_add_func("acpi/piix4/smbios-legacy",
+   test_acpi_isapc_smbios_legacy);
 }
 if (qtest_has_machine(MACHINE_Q35)) {
 qtest_add_func("acpi/q35", test_acpi_q35_tcg);
-- 
2.39.3

[PATCH v4 17/21] smbios: error out when building type 4 table is not possible

2024-03-14 Thread Igor Mammedov

If SMBIOS v2 is requested but the number of cores/threads
is more than can be described with v2, error out
instead of silently ignoring the fact and filling the core/thread
count with bogus values.

This will help the caller decide whether it should fall back to
SMBIOSv3 when smbios-entry-point-type='auto'.

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
Tested-by: Fiona Ebner 
---
 hw/smbios/smbios.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index d9dda226e6..be64919def 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -669,7 +669,8 @@ static void smbios_build_type_3_table(void)
 }
 
 static void smbios_build_type_4_table(MachineState *ms, unsigned instance,
-  SmbiosEntryPointType ep_type)
+  SmbiosEntryPointType ep_type,
+  Error **errp)
 {
 char sock_str[128];
 size_t tbl_len = SMBIOS_TYPE_4_LEN_V28;
@@ -723,6 +724,12 @@ static void smbios_build_type_4_table(MachineState *ms, 
unsigned instance,
 if (tbl_len == SMBIOS_TYPE_4_LEN_V30) {
 t->core_count2 = t->core_enabled2 = cpu_to_le16(cores_per_socket);
 t->thread_count2 = cpu_to_le16(threads_per_socket);
+} else if (t->core_count == 0xFF || t->thread_count == 0xFF) {
+error_setg(errp, "SMBIOS 2.0 doesn't support number of processor "
+ "cores/threads more than 255, use "
+ "-machine smbios-entry-point-type=64 option to enable "
+ "SMBIOS 3.0 support");
+return;
 }
 
 SMBIOS_BUILD_TABLE_POST;
@@ -,7 +1118,10 @@ static bool smbios_get_tables_ep(MachineState *ms,
 assert(ms->smp.sockets >= 1);
 
 for (i = 0; i < ms->smp.sockets; i++) {
-smbios_build_type_4_table(ms, i, ep_type);
+smbios_build_type_4_table(ms, i, ep_type, errp);
+if (*errp) {
+goto err_exit;
+}
 }
 
 smbios_build_type_8_table();
-- 
2.39.3
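
A rough sketch of how the error introduced above could drive the 'auto' fallback, under stated assumptions: the one-byte v2 count fields treat 0xFF as an overflow marker (matching the check on 0xFF in the diff), and the caller resolves 'auto' by trying v2 first. The demo_* names are hypothetical and this is not the code from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the requested entry point type. */
enum demo_ep_kind {
    DEMO_KIND_32,    /* SMBIOS 2.x, 32-bit entry point */
    DEMO_KIND_64,    /* SMBIOS 3.x, 64-bit entry point */
    DEMO_KIND_AUTO   /* prefer 2.x, fall back to 3.x */
};

/* True if both counts fit the one-byte v2 fields; 0xFF is reserved. */
static bool demo_fits_v2(unsigned cores, unsigned threads)
{
    return cores < 0xFF && threads < 0xFF;
}

/*
 * Resolve the requested kind into the one that would actually be built.
 * Returns false when an explicit v2 request cannot describe the topology.
 */
static bool demo_pick_ep_kind(enum demo_ep_kind requested,
                              unsigned cores, unsigned threads,
                              enum demo_ep_kind *picked)
{
    switch (requested) {
    case DEMO_KIND_64:
        *picked = DEMO_KIND_64;
        return true;
    case DEMO_KIND_32:
        if (!demo_fits_v2(cores, threads)) {
            return false;            /* error out instead of bogus counts */
        }
        *picked = DEMO_KIND_32;
        return true;
    case DEMO_KIND_AUTO:
        *picked = demo_fits_v2(cores, threads) ? DEMO_KIND_32 : DEMO_KIND_64;
        return true;
    }
    return false;
}
```

The point of the design is that only an explicit v2 request can fail; 'auto' always resolves to something that can describe the configuration.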

[PATCH v4 04/21] smbios: cleanup smbios_get_tables() from legacy handling

2024-03-14 Thread Igor Mammedov

smbios_get_tables() bails out right away if legacy mode is enabled
and won't generate any SMBIOS tables. At the same time, the x86-specific
fw_cfg_build_smbios() will generate legacy tables, then proceed
to prepare a temporary mem_array for a useless call to
smbios_get_tables(), and then discard it.

Drop the legacy related check in smbios_get_tables() and return from
fw_cfg_build_smbios() early if legacy tables were built, without
proceeding to the non-legacy part of the function.

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
Tested-by: Fiona Ebner 
---
 hw/i386/fw_cfg.c   | 1 +
 hw/smbios/smbios.c | 6 --
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
index 98a478c276..a635234e68 100644
--- a/hw/i386/fw_cfg.c
+++ b/hw/i386/fw_cfg.c
@@ -74,6 +74,7 @@ void fw_cfg_build_smbios(PCMachineState *pcms, FWCfgState 
*fw_cfg)
 if (smbios_tables) {
 fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES,
  smbios_tables, smbios_tables_len);
+return;
 }
 
 /* build the array of physical mem area from e820 table */
diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index e3d5d8f2e2..22369b49fb 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -1229,12 +1229,6 @@ void smbios_get_tables(MachineState *ms,
 {
 unsigned i, dimm_cnt, offset;
 
-if (smbios_legacy) {
-*tables = *anchor = NULL;
-*tables_len = *anchor_len = 0;
-return;
-}
-
 if (!smbios_immutable) {
 smbios_build_type_0_table();
 smbios_build_type_1_table();
-- 
2.39.3

[PATCH v4 01/21] tests: smbios: make it possible to write SMBIOS only test

2024-03-14 Thread Igor Mammedov

Currently it is not possible to run the SMBIOS test without the ACPI one,
which gets in the way when testing ACPI-less configs.

Extract SMBIOS testing into separate routines that can also
be run without the ACPI dependency, and use them for testing SMBIOS.

As the first user, add the "acpi/piix4/smbios-options" test case.

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
Tested-by: Fiona Ebner 
---
 tests/qtest/bios-tables-test.c | 47 +++---
 1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 21811a1ab5..b2992bafa8 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -858,16 +858,8 @@ static void test_vm_prepare(const char *params, test_data 
*data)
 g_free(args);
 }
 
-static void process_acpi_tables_noexit(test_data *data)
+static void process_smbios_tables_noexit(test_data *data)
 {
-test_acpi_load_tables(data);
-
-if (getenv(ACPI_REBUILD_EXPECTED_AML)) {
-dump_aml_files(data, true);
-} else {
-test_acpi_asl(data);
-}
-
 /*
  * TODO: make SMBIOS tests work with UEFI firmware,
  * Bug on uefi-test-tools to provide entry point:
@@ -879,6 +871,27 @@ static void process_acpi_tables_noexit(test_data *data)
 }
 }
 
+static void test_smbios(const char *params, test_data *data)
+{
+test_vm_prepare(params, data);
+boot_sector_test(data->qts);
+process_smbios_tables_noexit(data);
+qtest_quit(data->qts);
+}
+
+static void process_acpi_tables_noexit(test_data *data)
+{
+test_acpi_load_tables(data);
+
+if (getenv(ACPI_REBUILD_EXPECTED_AML)) {
+dump_aml_files(data, true);
+} else {
+test_acpi_asl(data);
+}
+
+process_smbios_tables_noexit(data);
+}
+
 static void process_acpi_tables(test_data *data)
 {
 process_acpi_tables_noexit(data);
@@ -2064,6 +2077,20 @@ static void test_acpi_q35_pvpanic_isa(void)
 free_test_data();
 }
 
+static void test_acpi_pc_smbios_options(void)
+{
+uint8_t req_type11[] = { 11 };
+test_data data = {
+.machine = MACHINE_PC,
+.variant = ".pc_smbios_options",
+.required_struct_types = req_type11,
+.required_struct_types_len = ARRAY_SIZE(req_type11),
+};
+
+test_smbios("-smbios type=11,value=TEST", &data);
+free_test_data();
+}
+
 static void test_oem_fields(test_data *data)
 {
 int i;
@@ -2215,6 +2242,8 @@ int main(int argc, char *argv[])
 #ifdef CONFIG_POSIX
 qtest_add_func("acpi/piix4/acpierst", test_acpi_piix4_acpi_erst);
 #endif
+qtest_add_func("acpi/piix4/smbios-options",
+   test_acpi_pc_smbios_options);
 }
 if (qtest_has_machine(MACHINE_Q35)) {
 qtest_add_func("acpi/q35", test_acpi_q35_tcg);
-- 
2.39.3

[PATCH v4 10/21] smbios: rename/expose structures/bitmaps used by both legacy and modern code

2024-03-14 Thread Igor Mammedov

As a preparation to move legacy handling into a separate file,
add prefix 'smbios_' to type0/type1/have_binfile_bitmap/have_fields_bitmap
and expose them in smbios.h so that they can be reused in
legacy and modern code.

Doing it as a separate patch to avoid rename cluttering follow-up
patch which will move legacy code into a separate file.

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
---
 include/hw/firmware/smbios.h |  16 +
 hw/smbios/smbios.c   | 113 ---
 2 files changed, 69 insertions(+), 60 deletions(-)

diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
index 0f0dca8f83..05707c6341 100644
--- a/include/hw/firmware/smbios.h
+++ b/include/hw/firmware/smbios.h
@@ -2,6 +2,7 @@
 #define QEMU_SMBIOS_H
 
 #include "qapi/qapi-types-machine.h"
+#include "qemu/bitmap.h"
 
 /*
  * SMBIOS Support
@@ -16,8 +17,23 @@
  *
  */
 
+typedef struct {
+const char *vendor, *version, *date;
+bool have_major_minor, uefi;
+uint8_t major, minor;
+} smbios_type0_t;
+extern smbios_type0_t smbios_type0;
+
+typedef struct {
+const char *manufacturer, *product, *version, *serial, *sku, *family;
+/* uuid is in qemu_uuid */
+} smbios_type1_t;
+extern smbios_type1_t smbios_type1;
 
 #define SMBIOS_MAX_TYPE 127
+extern DECLARE_BITMAP(smbios_have_binfile_bitmap, SMBIOS_MAX_TYPE + 1);
+extern DECLARE_BITMAP(smbios_have_fields_bitmap, SMBIOS_MAX_TYPE + 1);
+
 #define offsetofend(TYPE, MEMBER) \
(offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))
 
diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index d530667a9d..e93b8f9cb1 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -78,19 +78,11 @@ static int smbios_type4_count = 0;
 static bool smbios_have_defaults;
 static uint32_t smbios_cpuid_version, smbios_cpuid_features;
 
-static DECLARE_BITMAP(have_binfile_bitmap, SMBIOS_MAX_TYPE+1);
-static DECLARE_BITMAP(have_fields_bitmap, SMBIOS_MAX_TYPE+1);
+DECLARE_BITMAP(smbios_have_binfile_bitmap, SMBIOS_MAX_TYPE + 1);
+DECLARE_BITMAP(smbios_have_fields_bitmap, SMBIOS_MAX_TYPE + 1);
 
-static struct {
-const char *vendor, *version, *date;
-bool have_major_minor, uefi;
-uint8_t major, minor;
-} type0;
-
-static struct {
-const char *manufacturer, *product, *version, *serial, *sku, *family;
-/* uuid is in qemu_uuid */
-} type1;
+smbios_type0_t smbios_type0;
+smbios_type1_t smbios_type1;
 
 static struct {
 const char *manufacturer, *product, *version, *serial, *asset, *location;
@@ -599,36 +591,36 @@ static void smbios_maybe_add_str(int type, int offset, 
const char *data)
 static void smbios_build_type_0_fields(void)
 {
 smbios_maybe_add_str(0, offsetof(struct smbios_type_0, vendor_str),
- type0.vendor);
+ smbios_type0.vendor);
 smbios_maybe_add_str(0, offsetof(struct smbios_type_0, bios_version_str),
- type0.version);
+ smbios_type0.version);
 smbios_maybe_add_str(0, offsetof(struct smbios_type_0,
  bios_release_date_str),
- type0.date);
-if (type0.have_major_minor) {
+ smbios_type0.date);
+if (smbios_type0.have_major_minor) {
 smbios_add_field(0, offsetof(struct smbios_type_0,
  system_bios_major_release),
- &type0.major, 1);
+ &smbios_type0.major, 1);
 smbios_add_field(0, offsetof(struct smbios_type_0,
  system_bios_minor_release),
- &type0.minor, 1);
+ &smbios_type0.minor, 1);
 }
 }
 
 static void smbios_build_type_1_fields(void)
 {
 smbios_maybe_add_str(1, offsetof(struct smbios_type_1, manufacturer_str),
- type1.manufacturer);
+ smbios_type1.manufacturer);
 smbios_maybe_add_str(1, offsetof(struct smbios_type_1, product_name_str),
- type1.product);
+ smbios_type1.product);
 smbios_maybe_add_str(1, offsetof(struct smbios_type_1, version_str),
- type1.version);
+ smbios_type1.version);
 smbios_maybe_add_str(1, offsetof(struct smbios_type_1, serial_number_str),
- type1.serial);
+ smbios_type1.serial);
 smbios_maybe_add_str(1, offsetof(struct smbios_type_1, sku_number_str),
- type1.sku);
+ smbios_type1.sku);
 smbios_maybe_add_str(1, offsetof(struct smbios_type_1, family_str),
- type1.family);
+ smbios_type1.family);
 if (qemu_uuid_set) {
 /* We don't encode the UUID in the "wire format" here because this
  * function is for legacy mode and needs to keep the guest ABI, and
@@ -646,14 +638,14 @@ uint8_t *smbios_get_table_legacy(size_t

[PATCH v4 08/21] smbios: don't check type4 structures in legacy mode

2024-03-14 Thread Igor Mammedov

Legacy mode doesn't support structures of type 2 and above,
and the CLI has a check for the '-smbios type' option; however, it's
still possible to sneak in type4 as a blob with the '-smbios file'
option. Doing the latter breaks the SMBIOS tables,
since SeaBIOS doesn't expect that.

Rather than trying to add support for type4 to legacy code
(both QEMU and SeaBIOS), simplify smbios_get_table_legacy()
by dropping the irrelevant check in legacy code and error out
on a type4 blob.

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
Tested-by: Fiona Ebner 
---
 * The issue affects 'isapc' and pc-i440fx-2.0. The latter is
   in deprecated state and is to be dropped in the near future.
 * Possibly the same issue applies to other SMBIOS types above type 1,
   but I haven't tested that, and tables that aren't
   generated by SeaBIOS can be added just fine
   (tested type11 blob). So I went with a minimal change
   to fix up only type4, which I'm touching. Leaving the rest
   for another time or for when someone complains about it, which is
   very unlikely given that the only remaining user is the isapc machine.

   I'd very much prefer to deprecate 'isapc' and then drop
   all legacy related code (it will benefit not only SMBIOS
   but other code as well).
   BTW: 'isapc' is semi-dead, I can't boot RHEL6 on it
   with KVM enabled anymore (RHEL9 host), TCG still boots though.
   One more reason to deprecate it.
---
 include/hw/firmware/smbios.h |  2 +-
 hw/i386/fw_cfg.c |  3 +--
 hw/smbios/smbios.c   | 18 ++
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
index 7b42e7b4ac..0f0dca8f83 100644
--- a/include/hw/firmware/smbios.h
+++ b/include/hw/firmware/smbios.h
@@ -313,7 +313,7 @@ void smbios_set_defaults(const char *manufacturer, const 
char *product,
  const char *version,
  bool uuid_encoded, SmbiosEntryPointType ep_type);
 void smbios_set_default_processor_family(uint16_t processor_family);
-uint8_t *smbios_get_table_legacy(uint32_t expected_t4_count, size_t *length);
+uint8_t *smbios_get_table_legacy(size_t *length);
 void smbios_get_tables(MachineState *ms,
const struct smbios_phys_mem_area *mem_array,
const unsigned int mem_array_size,
diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
index c1e9c0fd9c..d1281066f4 100644
--- a/hw/i386/fw_cfg.c
+++ b/hw/i386/fw_cfg.c
@@ -71,8 +71,7 @@ void fw_cfg_build_smbios(PCMachineState *pcms, FWCfgState 
*fw_cfg)
 smbios_set_cpuid(cpu->env.cpuid_version, cpu->env.features[FEAT_1_EDX]);
 
 if (pcmc->smbios_legacy_mode) {
-smbios_tables = smbios_get_table_legacy(ms->smp.cpus,
-&smbios_tables_len);
+smbios_tables = smbios_get_table_legacy(&smbios_tables_len);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES,
  smbios_tables, smbios_tables_len);
 return;
diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index db422f4fb0..a96e5cd839 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -545,14 +545,17 @@ opts_init(smbios_register_config);
  */
 #define SMBIOS_21_MAX_TABLES_LEN 0x
 
-static void smbios_validate_table(uint32_t expected_t4_count)
+static void smbios_check_type4_count(uint32_t expected_t4_count)
 {
 if (smbios_type4_count && smbios_type4_count != expected_t4_count) {
 error_report("Expected %d SMBIOS Type 4 tables, got %d instead",
  expected_t4_count, smbios_type4_count);
 exit(1);
 }
+}
 
+static void smbios_validate_table(void)
+{
 if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_32 &&
 smbios_tables_len > SMBIOS_21_MAX_TABLES_LEN) {
 error_report("SMBIOS 2.1 table length %zu exceeds %d",
@@ -637,7 +640,7 @@ static void smbios_build_type_1_fields(void)
 }
 }
 
-uint8_t *smbios_get_table_legacy(uint32_t expected_t4_count, size_t *length)
+uint8_t *smbios_get_table_legacy(size_t *length)
 {
 int i;
 size_t usr_offset;
@@ -650,6 +653,12 @@ uint8_t *smbios_get_table_legacy(uint32_t 
expected_t4_count, size_t *length)
 exit(1);
 }
 
+if (test_bit(4, have_binfile_bitmap)) {
+error_report("can't process table for smbios "
+ "type 4 on machine versions < 2.1!");
+exit(1);
+}
+
 g_free(smbios_entries);
 smbios_entries_len = sizeof(uint16_t);
 smbios_entries = g_malloc0(smbios_entries_len);
@@ -676,7 +685,7 @@ uint8_t *smbios_get_table_legacy(uint32_t 
expected_t4_count, size_t *length)
 
 smbios_build_type_0_fields();
 smbios_build_type_1_fields();
-smbios_validate_table(expected_t4_count);
+smbios_validate_table();
 *length = smbios_entries_len;
 return smbios_entries;
 }
@@ -1304,7 +1313,8 @@ void smbios_get_tables(MachineState *ms,
 smbios_build_type_41_table(errp);
 smbios_build_type_127_table();
 
-
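
The new test_bit(4, ...) guard in smbios_get_table_legacy() amounts to "reject the configuration if any user-supplied blob is of type 4". A minimal sketch of that idea with a hand-rolled bitmap follows; the demo_* names are hypothetical and the bitmap helper is an assumption standing in for QEMU's bitmap API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_MAX_TYPE 127
#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS \
    ((DEMO_MAX_TYPE + 1 + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD)

/* One bit per SMBIOS structure type that was supplied as a binary blob. */
struct demo_bitmap {
    unsigned long words[DEMO_WORDS];
};

/* Record that a blob of the given type was passed on the command line. */
static void demo_set_bit(struct demo_bitmap *bm, unsigned type)
{
    assert(type <= DEMO_MAX_TYPE);
    bm->words[type / DEMO_BITS_PER_WORD] |= 1UL << (type % DEMO_BITS_PER_WORD);
}

static bool demo_test_bit(const struct demo_bitmap *bm, unsigned type)
{
    assert(type <= DEMO_MAX_TYPE);
    return (bm->words[type / DEMO_BITS_PER_WORD] >>
            (type % DEMO_BITS_PER_WORD)) & 1UL;
}

/* Legacy mode accepts other blob types (e.g. type 11) but not type 4. */
static bool demo_legacy_blobs_ok(const struct demo_bitmap *binfile_types)
{
    return !demo_test_bit(binfile_types, 4);
}
```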

[PATCH v4 06/21] smbios: get rid of smbios_legacy global

2024-03-14 Thread Igor Mammedov

Clean up smbios_set_defaults(), which is reused by legacy
and non-legacy machines, so that it is no longer aware of the 'legacy'
notion and the need to turn it off. Push legacy handling up to
the PC machine code where it's relevant.

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
Acked-by: Daniel Henrique Barboza 
Tested-by: Fiona Ebner 
---
PS: I've moved/kept legacy smbios_entries to smbios_get_tables()
but it at least is not visible to API users. To get rid of it
as well, it would be necessary to change how '-smbios' CLI
option is processed. Which is done later in the series.
---
 include/hw/firmware/smbios.h |  2 +-
 hw/arm/virt.c|  2 +-
 hw/i386/fw_cfg.c |  7 ---
 hw/loongarch/virt.c  |  2 +-
 hw/riscv/virt.c  |  2 +-
 hw/smbios/smbios.c   | 35 +++
 6 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
index 36744b6cc9..7b42e7b4ac 100644
--- a/include/hw/firmware/smbios.h
+++ b/include/hw/firmware/smbios.h
@@ -310,7 +310,7 @@ struct smbios_type_127 {
 void smbios_entry_add(QemuOpts *opts, Error **errp);
 void smbios_set_cpuid(uint32_t version, uint32_t features);
 void smbios_set_defaults(const char *manufacturer, const char *product,
- const char *version, bool legacy_mode,
+ const char *version,
  bool uuid_encoded, SmbiosEntryPointType ep_type);
 void smbios_set_default_processor_family(uint16_t processor_family);
 uint8_t *smbios_get_table_legacy(uint32_t expected_t4_count, size_t *length);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e5cd935232..b634c908a7 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1650,7 +1650,7 @@ static void virt_build_smbios(VirtMachineState *vms)
 }
 
 smbios_set_defaults("QEMU", product,
-vmc->smbios_old_sys_ver ? "1.0" : mc->name, false,
+vmc->smbios_old_sys_ver ? "1.0" : mc->name,
 true, SMBIOS_ENTRY_POINT_TYPE_64);
 
 /* build the array of physical mem area from base_memmap */
diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
index fcb4fb0769..c1e9c0fd9c 100644
--- a/hw/i386/fw_cfg.c
+++ b/hw/i386/fw_cfg.c
@@ -63,15 +63,16 @@ void fw_cfg_build_smbios(PCMachineState *pcms, FWCfgState 
*fw_cfg)
 if (pcmc->smbios_defaults) {
 /* These values are guest ABI, do not change */
 smbios_set_defaults("QEMU", mc->desc, mc->name,
-pcmc->smbios_legacy_mode, 
pcmc->smbios_uuid_encoded,
+pcmc->smbios_uuid_encoded,
 pcms->smbios_entry_point_type);
 }
 
 /* tell smbios about cpuid version and features */
 smbios_set_cpuid(cpu->env.cpuid_version, cpu->env.features[FEAT_1_EDX]);
 
-smbios_tables = smbios_get_table_legacy(ms->smp.cpus, &smbios_tables_len);
-if (smbios_tables) {
+if (pcmc->smbios_legacy_mode) {
+smbios_tables = smbios_get_table_legacy(ms->smp.cpus,
+&smbios_tables_len);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES,
  smbios_tables, smbios_tables_len);
 return;
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index efce112310..53bfdcee61 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -355,7 +355,7 @@ static void virt_build_smbios(LoongArchMachineState *lams)
 return;
 }
 
-smbios_set_defaults("QEMU", product, mc->name, false,
+smbios_set_defaults("QEMU", product, mc->name,
 true, SMBIOS_ENTRY_POINT_TYPE_64);
 
 smbios_get_tables(ms, NULL, 0, _tables, _tables_len,
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index a094af97c3..535fd047ba 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1275,7 +1275,7 @@ static void virt_build_smbios(RISCVVirtState *s)
 product = "KVM Virtual Machine";
 }
 
-smbios_set_defaults("QEMU", product, mc->name, false,
+smbios_set_defaults("QEMU", product, mc->name,
 true, SMBIOS_ENTRY_POINT_TYPE_64);
 
 if (riscv_is_32bit(&s->soc[0])) {
diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 50617fa585..1bae36a6e0 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -54,7 +54,6 @@ struct smbios_table {
 
 static uint8_t *smbios_entries;
 static size_t smbios_entries_len;
-static bool smbios_legacy = true;
 static bool smbios_uuid_encoded = true;
 /* end: legacy structures & constants for <= 2.0 machines */
 
@@ -633,9 +632,16 @@ static void smbios_build_type_1_fields(void)
 
 uint8_t *smbios_get_table_legacy(uint32_t expected_t4_count, size_t *length)
 {
-if (!smbios_legacy) {
-*length = 0;
-return NULL;
+/* drop unwanted version of command-line file blob(s) */
+g_free(smbios_tables);
+smbios_tables = NULL;
+
+/* also complain if fields were given for types > 1 */
+

[PATCH v4 20/21] tests: acpi: update expected SSDT.dimmpxm blob

2024-03-14 Thread Igor Mammedov

The address shift is caused by the switch to the 32-bit SMBIOS entry point,
which has a slightly different size from the 64-bit one and therefore
results in a slightly different memory layout.

Expected diff:

-Name (MEMA, 0x07FFE000)
+Name (MEMA, 0x07FFF000)

Signed-off-by: Igor Mammedov 
Acked-by: Ani Sinha 
---
 tests/qtest/bios-tables-test-allowed-diff.h |   1 -
 tests/data/acpi/q35/SSDT.dimmpxm| Bin 1815 -> 1815 bytes
 2 files changed, 1 deletion(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index 81148a604f..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,2 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/q35/SSDT.dimmpxm",
diff --git a/tests/data/acpi/q35/SSDT.dimmpxm b/tests/data/acpi/q35/SSDT.dimmpxm
index 70f133412f5e0aa128ab210245a8de7304eeb843..9ea4e0d0ceaa8a5cbd706afb6d49de853fafe654 100644
GIT binary patch
delta 23
ecmbQvH=U0wIM^jboSlJzam_|9E_UV*|JeaVTLvQl

delta 23
ecmbQvH=U0wIM^jboSlJzanD9BE_UVz|JeaVy9Ofw

-- 
2.39.3

Re: [RFC PATCH 3/5] cxl/core: introduce cxl_mem_report_poison()

2024-03-14 Thread Shiyang Ruan via





On 2024/2/10 14:46, Dan Williams wrote:

Shiyang Ruan wrote:

If poison is detected (reported by the CXL memdev), the OS should be
notified so that it can handle it.  Introduce this function, which:
   1. translates the DPA to an HPA;
   2. constructs an MCE instance (TODO: more details need to be filled in);
   3. logs it into the MCE event queue.

After that, the MCE mechanism can walk its notifier chain to execute
specific handlers.

Signed-off-by: Shiyang Ruan 
---
  arch/x86/kernel/cpu/mce/core.c |  1 +
  drivers/cxl/core/mbox.c| 33 +
  2 files changed, 34 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index bc39252bc54f..a64c0aceb7e0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -131,6 +131,7 @@ void mce_setup(struct mce *m)
m->ppin = cpu_data(m->extcpu).ppin;
m->microcode = boot_cpu_data.microcode;
  }
+EXPORT_SYMBOL_GPL(mce_setup);


No, mce_setup() is x86 specific and the CXL subsystem is CPU
architecture independent. My expectation is that CXL should translate
errors for edac similar to how the ACPI GHES code does it. See usage of
edac_raw_mc_handle_error() and memory_failure_queue().

Otherwise an MCE is a CPU consumption of poison event, and CXL is
reporting device-side discovery of poison.


Yes, I misunderstood here.  I meant to use MCE to eventually call
memory_failure(). I think memory_failure_queue() is what I need.


  void memory_failure_queue(unsigned long pfn, int flags)

But it can only queue one PFN at a time; we may need to extend it to
support queuing a range of PFNs.
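
As a rough illustration of the range question, the following userspace sketch covers a PFN range by calling a single-PFN queue function once per page. It is only a sketch under stated assumptions: demo_memory_failure_queue() is a hypothetical stand-in that mirrors the signature quoted above and merely counts calls, demo_queue_pfn_range() is a hypothetical helper name, and queue capacity is ignored. It is not the kernel change under discussion.

```c
/*
 * Hypothetical stand-in for the kernel's memory_failure_queue().
 * It only counts calls and remembers the last PFN, so that the loop
 * below can be exercised in userspace.
 */
unsigned long demo_queued_count;
unsigned long demo_last_pfn;

void demo_memory_failure_queue(unsigned long pfn, int flags)
{
    (void)flags;            /* unused in this stand-in */
    demo_queued_count++;
    demo_last_pfn = pfn;
}

/*
 * Hypothetical helper: queue every PFN in the half-open range
 * [start_pfn, start_pfn + nr_pages), one call per page.
 */
void demo_queue_pfn_range(unsigned long start_pfn, unsigned long nr_pages,
                          int flags)
{
    for (unsigned long i = 0; i < nr_pages; i++) {
        demo_memory_failure_queue(start_pfn + i, flags);
    }
}
```

A caller-side loop like this keeps the existing interface unchanged, at the cost of one queue entry per page; a native range interface would avoid that, which is the trade-off raised above.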



--
Thanks,
Ruan.

[PATCH v4 21/21] smbios: add extra comments to smbios_get_table_legacy()

2024-03-14 Thread Igor Mammedov

Signed-off-by: Igor Mammedov 
---
 hw/smbios/smbios_legacy.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/smbios/smbios_legacy.c b/hw/smbios/smbios_legacy.c
index 06907cd16c..c37a8ee821 100644
--- a/hw/smbios/smbios_legacy.c
+++ b/hw/smbios/smbios_legacy.c
@@ -151,6 +151,9 @@ uint8_t *smbios_get_table_legacy(size_t *length, Error **errp)
 smbios_entries_len = sizeof(uint16_t);
 smbios_entries = g_malloc0(smbios_entries_len);
 
+/*
+ * build a set of legacy smbios_table entries using user provided blobs
+ */
 for (i = 0, usr_offset = 0; usr_blobs_sizes && i < usr_blobs_sizes->len;
  i++)
 {
@@ -166,6 +169,10 @@ uint8_t *smbios_get_table_legacy(size_t *length, Error **errp)
 table->header.length = cpu_to_le16(sizeof(*table) + size);
 memcpy(table->data, header, size);
 smbios_entries_len += sizeof(*table) + size;
+/*
+ * update number of entries in the blob,
+ * see SeaBIOS: qemu_cfg_legacy():QEMU_CFG_SMBIOS_ENTRIES
+ */
 (*(uint16_t *)smbios_entries) =
 cpu_to_le16(le16_to_cpu(*(uint16_t *)smbios_entries) + 1);
 usr_offset += size;
-- 
2.39.3
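
The comments added in this patch describe a blob that starts with a 16-bit little-endian entry count, which is incremented each time a table is appended. A minimal, portable sketch of that counter update follows; the demo_* helper names are hypothetical and are not QEMU functions (QEMU itself uses cpu_to_le16() and le16_to_cpu(), as shown in the diff).

```c
#include <stdint.h>

/* Read a 16-bit little-endian value from a byte buffer. */
uint16_t demo_load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a 16-bit value to a byte buffer in little-endian order. */
void demo_store_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Increment the entry counter stored in the first two bytes of the
 * blob, leaving the rest of the blob untouched.
 */
void demo_bump_entry_count(uint8_t *blob)
{
    demo_store_le16(blob, (uint16_t)(demo_load_le16(blob) + 1));
}
```

Working on individual bytes keeps the result identical on big-endian and little-endian hosts and avoids the unaligned 16-bit access that a pointer cast could cause.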

[PATCH v4 12/21] smbios: handle errors consistently

2024-03-14 Thread Igor Mammedov

The current code uses a mix of error_report()+exit(1)
and error_setg() to handle errors.
Use the newer error_setg() everywhere. Besides consistency,
this makes it possible to detect an error condition without killing
QEMU and to attempt a switch-over to SMBIOS 3.x tables/entry point
in a follow-up patch.

While at it, clear the smbios_tables pointer after freeing it.
That avoids a double free if smbios_get_tables() is called
multiple times.
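
The conversion follows a common pattern: a checker no longer exits the process but records the failure through an out-parameter and returns false, so the caller decides what to do. A minimal self-contained sketch in plain C, assuming a toy DemoError type and demo_error_set() helper as stand-ins for QEMU's Error and error_setg() (both names are hypothetical, and the toy uses a single pointer rather than QEMU's Error **errp):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's Error object; not the real API. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Record a message if the caller asked for one (errp may be NULL). */
void demo_error_set(DemoError *errp, const char *msg)
{
    if (errp) {
        snprintf(errp->msg, sizeof(errp->msg), "%s", msg);
    }
}

/*
 * Same condition as smbios_check_type4_count() in the patch, reported
 * through the toy error type: returns true on success, false on failure.
 * A count of zero means no Type 4 tables were supplied and is accepted.
 */
bool demo_check_type4_count(uint32_t have, uint32_t expected,
                            DemoError *errp)
{
    if (have && have != expected) {
        demo_error_set(errp, "unexpected number of SMBIOS Type 4 tables");
        return false;
    }
    return true;
}
```

Because the checker only reports, one caller can treat the failure as fatal while another can fall back to a different configuration, which is the flexibility the commit message is after.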

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
---
 include/hw/firmware/smbios.h |  4 ++--
 hw/i386/fw_cfg.c |  3 ++-
 hw/smbios/smbios.c   | 34 ++
 hw/smbios/smbios_legacy.c| 22 ++
 4 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
index ccc51e72f5..d4b91d5a14 100644
--- a/include/hw/firmware/smbios.h
+++ b/include/hw/firmware/smbios.h
@@ -326,7 +326,7 @@ struct smbios_type_127 {
 struct smbios_structure_header header;
 } QEMU_PACKED;
 
-void smbios_validate_table(void);
+bool smbios_validate_table(Error **errp);
 void smbios_add_usr_blob_size(size_t size);
 void smbios_entry_add(QemuOpts *opts, Error **errp);
 void smbios_set_cpuid(uint32_t version, uint32_t features);
@@ -334,7 +334,7 @@ void smbios_set_defaults(const char *manufacturer, const char *product,
  const char *version,
  bool uuid_encoded, SmbiosEntryPointType ep_type);
 void smbios_set_default_processor_family(uint16_t processor_family);
-uint8_t *smbios_get_table_legacy(size_t *length);
+uint8_t *smbios_get_table_legacy(size_t *length, Error **errp);
 void smbios_get_tables(MachineState *ms,
const struct smbios_phys_mem_area *mem_array,
const unsigned int mem_array_size,
diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
index d1281066f4..e387bf50d0 100644
--- a/hw/i386/fw_cfg.c
+++ b/hw/i386/fw_cfg.c
@@ -71,7 +71,8 @@ void fw_cfg_build_smbios(PCMachineState *pcms, FWCfgState *fw_cfg)
 smbios_set_cpuid(cpu->env.cpuid_version, cpu->env.features[FEAT_1_EDX]);
 
 if (pcmc->smbios_legacy_mode) {
-smbios_tables = smbios_get_table_legacy(_tables_len);
+smbios_tables = smbios_get_table_legacy(_tables_len,
+_fatal);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES,
  smbios_tables, smbios_tables_len);
 return;
diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index f9efe01233..512ecd46f3 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -19,7 +19,6 @@
 #include "qemu/units.h"
 #include "qapi/error.h"
 #include "qemu/config-file.h"
-#include "qemu/error-report.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "sysemu/sysemu.h"
@@ -511,23 +510,25 @@ opts_init(smbios_register_config);
  */
 #define SMBIOS_21_MAX_TABLES_LEN 0x
 
-static void smbios_check_type4_count(uint32_t expected_t4_count)
+static bool smbios_check_type4_count(uint32_t expected_t4_count, Error **errp)
 {
 if (smbios_type4_count && smbios_type4_count != expected_t4_count) {
-error_report("Expected %d SMBIOS Type 4 tables, got %d instead",
- expected_t4_count, smbios_type4_count);
-exit(1);
+error_setg(errp, "Expected %d SMBIOS Type 4 tables, got %d instead",
+   expected_t4_count, smbios_type4_count);
+return false;
 }
+return true;
 }
 
-void smbios_validate_table(void)
+bool smbios_validate_table(Error **errp)
 {
 if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_32 &&
 smbios_tables_len > SMBIOS_21_MAX_TABLES_LEN) {
-error_report("SMBIOS 2.1 table length %zu exceeds %d",
- smbios_tables_len, SMBIOS_21_MAX_TABLES_LEN);
-exit(1);
+error_setg(errp, "SMBIOS 2.1 table length %zu exceeds %d",
+   smbios_tables_len, SMBIOS_21_MAX_TABLES_LEN);
+return false;
 }
+return true;
 }
 
 bool smbios_skip_table(uint8_t type, bool required_table)
@@ -1151,15 +1152,18 @@ void smbios_get_tables(MachineState *ms,
 smbios_build_type_41_table(errp);
 smbios_build_type_127_table();
 
-smbios_check_type4_count(ms->smp.sockets);
-smbios_validate_table();
+if (!smbios_check_type4_count(ms->smp.sockets, errp)) {
+goto err_exit;
+}
+if (!smbios_validate_table(errp)) {
+goto err_exit;
+}
 smbios_entry_point_setup();
 
 /* return tables blob and entry point (anchor), and their sizes */
 *tables = smbios_tables;
 *tables_len = smbios_tables_len;
 *anchor = (uint8_t *)
-
 /* calculate length based on anchor string */
 if (!strncmp((char *), "_SM_", 4)) {
 *anchor_len = sizeof(struct smbios_21_entry_point);
@@ -1168,6 +1172,12 @@ void smbios_get_tables(MachineState *ms,
 } else {
 abort();
 }
+
+return;
+err_exit:
+g_free(smbios_tables);
+
