Re: [PATCH v1 03/15] libvhost-user: Bump up VHOST_USER_MAX_RAM_SLOTS to 509

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> Let's support up to 509 mem slots, just like vhost in the kernel usually
> does and the rust vhost-user implementation recently [1] started doing.
> This is required to properly support memory hotplug, either using
> multiple DIMMs (ACPI supports up to 256) or using virtio-mem.
>
> The 509 used to be the KVM limit, it supported 512, but 3 were
> used for internal purposes. Currently, KVM supports more than 512, but
> it usually doesn't make use of more than ~260 (i.e., 256 DIMMs + boot
> memory), except when other memory devices like PCI devices with BARs are
> used. So, 509 seems to work well for vhost in the kernel.
>
> Details can be found in the QEMU change that made virtio-mem consume
> up to 256 mem slots across all virtio-mem devices. [2]
>
> 509 mem slots implies 509 VMAs/mappings in the worst case (even though,
> in practice with virtio-mem we won't be seeing more than ~260 in most
> setups).
>
> With max_map_count under Linux defaulting to 64k, 509 mem slots
> still correspond to less than 1% of the maximum number of mappings.
> There are plenty left for the application to consume.
>
> [1] https://github.com/rust-vmm/vhost/pull/224
> [2] https://lore.kernel.org/all/20230926185738.277351-1-da...@redhat.com/
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.h | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.h 
> b/subprojects/libvhost-user/libvhost-user.h
> index c882b4e3a2..deb40e77b3 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -31,10 +31,12 @@
>  #define VHOST_MEMORY_BASELINE_NREGIONS 8
>
>  /*
> - * Set a reasonable maximum number of ram slots, which will be supported by
> - * any architecture.
> + * vhost in the kernel usually supports 509 mem slots. 509 used to be the
> + * KVM limit, it supported 512, but 3 were used for internal purposes. This
> + * limit is sufficient to support many DIMMs and virtio-mem in
> + * "dynamic-memslots" mode.
>   */
> -#define VHOST_USER_MAX_RAM_SLOTS 32
> +#define VHOST_USER_MAX_RAM_SLOTS 509
>
>  #define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
>
> --
> 2.43.0
>
>



[PATCH] loongarch: Change the UEFI loading mode to loongarch

2024-02-03 Thread Xianglai Li
The UEFI loading mode in loongarch is very different
from that in other architectures:loongarch's UEFI code
is in rom, while other architectures' UEFI code is in flash.

loongarch UEFI can be loaded as follows:
-machine virt,pflash=pflash0-format
-bios ./QEMU_EFI.fd

Other architectures load UEFI using the following methods:
-machine virt,pflash0=pflash0-format,pflash1=pflash1-format

loongarch's UEFI loading method makes qemu and libvirt incompatible
when using NVRAM, and the cost of loongarch's current loading method
far outweighs the benefits, so we decided to use the same UEFI loading
scheme as other architectures.

Cc: Andrea Bolognani 
Cc: maob...@loongson.cn
Cc: Philippe Mathieu-Daudé 
Cc: Song Gao 
Cc: zhaotian...@loongson.cn
Signed-off-by: Xianglai Li 
---
 hw/loongarch/acpi-build.c   |  29 +--
 hw/loongarch/virt.c | 101 ++--
 include/hw/loongarch/virt.h |   8 +--
 3 files changed, 106 insertions(+), 32 deletions(-)

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 730bc4a748..308a233e47 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -314,16 +314,39 @@ static void build_pci_device_aml(Aml *scope, 
LoongArchMachineState *lams)
 static void build_flash_aml(Aml *scope, LoongArchMachineState *lams)
 {
 Aml *dev, *crs;
+MemoryRegion *flash_mem;
 
-hwaddr flash_base = VIRT_FLASH_BASE;
-hwaddr flash_size = VIRT_FLASH_SIZE;
+hwaddr flash0_base;
+hwaddr flash0_size;
+
+hwaddr flash1_base;
+hwaddr flash1_size;
+
+flash_mem = pflash_cfi01_get_memory(lams->flash[0]);
+flash0_base = flash_mem->addr;
+flash0_size = flash_mem->size;
+
+flash_mem = pflash_cfi01_get_memory(lams->flash[1]);
+flash1_base = flash_mem->addr;
+flash1_size = flash_mem->size;
 
 dev = aml_device("FLS0");
 aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
 aml_append(dev, aml_name_decl("_UID", aml_int(0)));
 
 crs = aml_resource_template();
-aml_append(crs, aml_memory32_fixed(flash_base, flash_size, 
AML_READ_WRITE));
+aml_append(crs, aml_memory32_fixed(flash0_base, flash0_size,
+   AML_READ_WRITE));
+aml_append(dev, aml_name_decl("_CRS", crs));
+aml_append(scope, dev);
+
+dev = aml_device("FLS1");
+aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
+aml_append(dev, aml_name_decl("_UID", aml_int(1)));
+
+crs = aml_resource_template();
+aml_append(crs, aml_memory32_fixed(flash1_base, flash1_size,
+   AML_READ_WRITE));
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 }
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index c9a680e61a..79dff497da 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -54,7 +54,9 @@ struct loaderparams {
 const char *initrd_filename;
 };
 
-static void virt_flash_create(LoongArchMachineState *lams)
+static PFlashCFI01 *virt_flash_create1(LoongArchMachineState *lams,
+   const char *name,
+   const char *alias_prop_name)
 {
 DeviceState *dev = qdev_new(TYPE_PFLASH_CFI01);
 
@@ -66,45 +68,78 @@ static void virt_flash_create(LoongArchMachineState *lams)
 qdev_prop_set_uint16(dev, "id1", 0x18);
 qdev_prop_set_uint16(dev, "id2", 0x00);
 qdev_prop_set_uint16(dev, "id3", 0x00);
-qdev_prop_set_string(dev, "name", "virt.flash");
-object_property_add_child(OBJECT(lams), "virt.flash", OBJECT(dev));
-object_property_add_alias(OBJECT(lams), "pflash",
+qdev_prop_set_string(dev, "name", name);
+object_property_add_child(OBJECT(lams), name, OBJECT(dev));
+object_property_add_alias(OBJECT(lams), alias_prop_name,
   OBJECT(dev), "drive");
+return PFLASH_CFI01(dev);
+}
 
-lams->flash = PFLASH_CFI01(dev);
+static void virt_flash_create(LoongArchMachineState *lams)
+{
+lams->flash[0] = virt_flash_create1(lams, "virt.flash0", "pflash0");
+lams->flash[1] = virt_flash_create1(lams, "virt.flash1", "pflash1");
 }
 
-static void virt_flash_map(LoongArchMachineState *lams,
-   MemoryRegion *sysmem)
+static void virt_flash_map1(PFlashCFI01 *flash,
+hwaddr base, hwaddr size,
+MemoryRegion *sysmem)
 {
-PFlashCFI01 *flash = lams->flash;
 DeviceState *dev = DEVICE(flash);
-hwaddr base = VIRT_FLASH_BASE;
-hwaddr size = VIRT_FLASH_SIZE;
+BlockBackend *blk;
+hwaddr real_size = size;
+
+blk = pflash_cfi01_get_blk(flash);
+if (blk) {
+real_size = blk_getlength(blk);
+assert(real_size && real_size <= size);
+}
 
-assert(QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE));
-assert(size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX);
+assert(QEMU_IS_ALIGNED(real_size, VIRT_FLASH_SECTOR_SIZE));
+assert(real_size / 

Re: [PATCH v1 14/15] libvhost-user: Dynamically remap rings after (temporarily?) removing memory regions

2024-02-03 Thread Raphael Norwitz
Someone else with more knowledge of the VQ mapping code should also review.

On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> Currently, we try to remap all rings whenever we add a single new memory
> region. That doesn't quite make sense, because we already map rings when
> setting the ring address, and panic if that goes wrong. Likely, that
> handling was simply copied from set_mem_table code, where we actually
> have to remap all rings.
>
> Remapping all rings might require us to walk quite a lot of memory
> regions to perform the address translations. Ideally, we'd simply remove
> that remapping.
>
> However, let's be a bit careful. There might be some weird corner cases
> where we might temporarily remove a single memory region (e.g., resize
> it), that would have worked for now. Further, a ring might be located on
> hotplugged memory, and as the VM reboots, we might unplug that memory, to
> hotplug memory before resetting the ring addresses.
>
> So let's unmap affected rings as we remove a memory region, and try
> dynamically mapping the ring again when required.
>
> Signed-off-by: David Hildenbrand 

Acked-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 107 --
>  1 file changed, 78 insertions(+), 29 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index febeb2eb89..738e84ab63 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -283,10 +283,75 @@ vu_remove_all_mem_regs(VuDev *dev)
>  dev->nregions = 0;
>  }
>
> +static bool
> +map_ring(VuDev *dev, VuVirtq *vq)
> +{
> +vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
> +vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
> +vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
> +
> +DPRINT("Setting virtq addresses:\n");
> +DPRINT("vring_desc  at %p\n", vq->vring.desc);
> +DPRINT("vring_used  at %p\n", vq->vring.used);
> +DPRINT("vring_avail at %p\n", vq->vring.avail);
> +
> +return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
> +}
> +
>  static bool

Consider changing the function name to indicate that it may actually map a vq?

Maybe vu_maybe_map_vq()?

>  vu_is_vq_usable(VuDev *dev, VuVirtq *vq)
>  {
> -return likely(!dev->broken) && likely(vq->vring.avail);
> +if (unlikely(dev->broken)) {
> +return false;
> +}
> +
> +if (likely(vq->vring.avail)) {
> +return true;
> +}
> +
> +/*
> + * In corner cases, we might temporarily remove a memory region that
> + * mapped a ring. When removing a memory region we make sure to
> + * unmap any rings that would be impacted. Let's try to remap if we
> + * already succeeded mapping this ring once.
> + */
> +if (!vq->vra.desc_user_addr || !vq->vra.used_user_addr ||
> +!vq->vra.avail_user_addr) {
> +return false;
> +}
> +if (map_ring(dev, vq)) {
> +vu_panic(dev, "remapping queue on access");
> +return false;
> +}
> +return true;
> +}
> +
> +static void
> +unmap_rings(VuDev *dev, VuDevRegion *r)
> +{
> +int i;
> +
> +for (i = 0; i < dev->max_queues; i++) {
> +VuVirtq *vq = >vq[i];
> +const uintptr_t desc = (uintptr_t)vq->vring.desc;
> +const uintptr_t used = (uintptr_t)vq->vring.used;
> +const uintptr_t avail = (uintptr_t)vq->vring.avail;
> +
> +if (desc < r->mmap_addr || desc >= r->mmap_addr + r->size) {
> +continue;
> +}
> +if (used < r->mmap_addr || used >= r->mmap_addr + r->size) {
> +continue;
> +}
> +if (avail < r->mmap_addr || avail >= r->mmap_addr + r->size) {
> +continue;
> +}
> +
> +DPRINT("Unmapping rings of queue %d\n", i);
> +vq->vring.desc = NULL;
> +vq->vring.used = NULL;
> +vq->vring.avail = NULL;
> +}
>  }
>
>  static size_t
> @@ -784,21 +849,6 @@ vu_reset_device_exec(VuDev *dev, VhostUserMsg *vmsg)
>  return false;
>  }
>
> -static bool
> -map_ring(VuDev *dev, VuVirtq *vq)
> -{
> -vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
> -vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
> -vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
> -
> -DPRINT("Setting virtq addresses:\n");
> -DPRINT("vring_desc  at %p\n", vq->vring.desc);
> -DPRINT("vring_used  at %p\n", vq->vring.used);
> -DPRINT("vring_avail at %p\n", vq->vring.avail);
> -
> -return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
> -}
> -
>  static bool
>  generate_faults(VuDev *dev) {
>  unsigned int i;
> @@ -882,7 +932,6 @@ generate_faults(VuDev *dev) {
>
>  static bool
>  vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> -int i;
>  VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = 
>
>  if 

Re: [PATCH v1 15/15] libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand  wrote:
>
> We already use MADV_NORESERVE to deal with sparse memory regions. Let's
> also set madvise(MADV_DONTDUMP), otherwise a crash of the process can
> result in us allocating all memory in the mmap'ed region for dumping
> purposes.
>
> This change implies that the mmap'ed rings won't be included in a
> coredump. If ever required for debugging purposes, we could mark only
> the mapped rings MADV_DODUMP.
>
> Ignore errors during madvise() for now.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 738e84ab63..26c289518c 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -458,6 +458,12 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion 
> *msg_region, int fd)
>  DPRINT("mmap_addr:   0x%016"PRIx64"\n",
> (uint64_t)(uintptr_t)mmap_addr);
>
> +#if defined(__linux__)
> +/* Don't include all guest memory in a coredump. */
> +madvise(mmap_addr, msg_region->memory_size + mmap_offset,
> +MADV_DONTDUMP);
> +#endif
> +
>  /* Shift all affected entries by 1 to open a hole at idx. */
>  r = >regions[idx];
>  memmove(r + 1, r, sizeof(VuDevRegion) * (dev->nregions - idx));
> --
> 2.43.0
>
>



Re: [PATCH v2 0/2] hw/smbios: Fix option validation

2024-02-03 Thread Michael Tokarev

29.01.2024 11:03, Akihiko Odaki:

This fixes qemu_smbios_type8_opts and qemu_smbios_type11_opts to have
list terminators and elements for the type option.



Akihiko Odaki (2):
   hw/smbios: Fix OEM strings table option validation
   hw/smbios: Fix port connector option validation

  hw/smbios/smbios.c | 12 
  1 file changed, 12 insertions(+)


Should I pick this up via the trivial-patches tree perhaps?
The changes are trivial enough.  Picking up there, unless there's
a pull request pending for smbios area.

Thanks,

/mjt



[PATCH v2 1/2] target/riscv: Register vendors CSR

2024-02-03 Thread LIU Zhiwei
riscv specification allows custom CSRs in decode area. So we should
register all vendor CSRs in cpu realize stage.

Signed-off-by: LIU Zhiwei 
---
1) Use int index to quiet the  Werror for "i < 0".
---
 target/riscv/cpu.c |  3 +++
 target/riscv/tcg/tcg-cpu.c | 18 ++
 target/riscv/tcg/tcg-cpu.h |  1 +
 3 files changed, 22 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8cbfc7e781..2dcbc9ff32 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1113,6 +1113,9 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
+if (tcg_enabled()) {
+riscv_tcg_cpu_register_vendor_csr(cpu);
+}
 riscv_cpu_register_gdb_regs_for_features(cs);
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index 994ca1cdf9..559bf373f3 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -871,6 +871,24 @@ static void riscv_cpu_validate_profiles(RISCVCPU *cpu)
 }
 }
 
+void riscv_tcg_cpu_register_vendor_csr(RISCVCPU *cpu)
+{
+static const struct {
+bool (*guard_func)(const RISCVCPUConfig *);
+riscv_csr_operations *csr_ops;
+} vendors[] = {
+};
+for (int i = 0; i < ARRAY_SIZE(vendors); ++i) {
+if (!vendors[i].guard_func(>cfg)) {
+continue;
+}
+for (size_t j = 0; j < CSR_TABLE_SIZE &&
+   vendors[i].csr_ops[j].name; j++) {
+csr_ops[j] = vendors[i].csr_ops[j];
+}
+}
+}
+
 void riscv_tcg_cpu_finalize_features(RISCVCPU *cpu, Error **errp)
 {
 CPURISCVState *env = >env;
diff --git a/target/riscv/tcg/tcg-cpu.h b/target/riscv/tcg/tcg-cpu.h
index f7b32417f8..3daaebf4fb 100644
--- a/target/riscv/tcg/tcg-cpu.h
+++ b/target/riscv/tcg/tcg-cpu.h
@@ -25,5 +25,6 @@
 void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp);
 void riscv_tcg_cpu_finalize_features(RISCVCPU *cpu, Error **errp);
 bool riscv_cpu_tcg_compatible(RISCVCPU *cpu);
+void riscv_tcg_cpu_register_vendor_csr(RISCVCPU *cpu);
 
 #endif
-- 
2.25.1




[PATCH v2 2/2] target/riscv: Support xtheadmaee for thead-c906

2024-02-03 Thread LIU Zhiwei
This patch set fix the regression on kernel pointed by Björn Töpel in
https://www.mail-archive.com/qemu-devel@nongnu.org/msg1018232.html.

thead-c906 uses some flags in pte [60-63] bits. It has history reasons that
SVPBMT didn't exist when thead-c906 came to wotrld. We named this feature as
xtheadmaee[1]. this feature is controlled by an custom CSR named mxstatus,
whose maee field encodes whether enable the pte [60-63] bits.

[1]:https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadmaee.adoc

Signed-off-by: LIU Zhiwei 
---
v1->v2:
1) Remove mxstatus user mode access
2) Add reference documentation to the commit log
---
 target/riscv/cpu.c |  6 
 target/riscv/cpu.h |  9 ++
 target/riscv/cpu_bits.h|  6 
 target/riscv/cpu_cfg.h |  4 ++-
 target/riscv/cpu_helper.c  | 25 ---
 target/riscv/meson.build   |  1 +
 target/riscv/tcg/tcg-cpu.c |  7 +++-
 target/riscv/xthead_csr.c  | 65 ++
 8 files changed, 110 insertions(+), 13 deletions(-)
 create mode 100644 target/riscv/xthead_csr.c

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 2dcbc9ff32..bfdbb0539a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -171,6 +171,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(xtheadmemidx, PRIV_VERSION_1_11_0, ext_xtheadmemidx),
 ISA_EXT_DATA_ENTRY(xtheadmempair, PRIV_VERSION_1_11_0, ext_xtheadmempair),
 ISA_EXT_DATA_ENTRY(xtheadsync, PRIV_VERSION_1_11_0, ext_xtheadsync),
+ISA_EXT_DATA_ENTRY(xtheadmaee, PRIV_VERSION_1_11_0, ext_xtheadmaee),
 ISA_EXT_DATA_ENTRY(xventanacondops, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),
 
 DEFINE_PROP_END_OF_LIST(),
@@ -506,6 +507,7 @@ static void rv64_thead_c906_cpu_init(Object *obj)
 
 cpu->cfg.mvendorid = THEAD_VENDOR_ID;
 #ifndef CONFIG_USER_ONLY
+cpu->cfg.ext_xtheadmaee = true;
 set_satp_mode_max_supported(cpu, VM_1_10_SV39);
 #endif
 
@@ -949,6 +951,9 @@ static void riscv_cpu_reset_hold(Object *obj)
 }
 
 pmp_unlock_entries(env);
+if (riscv_cpu_cfg(env)->ext_xtheadmaee) {
+env->th_mxstatus |= TH_MXSTATUS_MAEE;
+}
 #endif
 env->xl = riscv_cpu_mxl(env);
 riscv_cpu_update_mask(env);
@@ -1439,6 +1444,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_vendor_exts[] = {
 MULTI_EXT_CFG_BOOL("xtheadmemidx", ext_xtheadmemidx, false),
 MULTI_EXT_CFG_BOOL("xtheadmempair", ext_xtheadmempair, false),
 MULTI_EXT_CFG_BOOL("xtheadsync", ext_xtheadsync, false),
+MULTI_EXT_CFG_BOOL("xtheadmaee", ext_xtheadmaee, false),
 MULTI_EXT_CFG_BOOL("xventanacondops", ext_XVentanaCondOps, false),
 
 DEFINE_PROP_END_OF_LIST(),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 5f3955c38d..1bacf40355 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -412,6 +412,14 @@ struct CPUArchState {
 target_ulong cur_pmmask;
 target_ulong cur_pmbase;
 
+union {
+/* Custom CSR for Xuantie CPU */
+struct {
+#ifndef CONFIG_USER_ONLY
+target_ulong th_mxstatus;
+#endif
+};
+};
 /* Fields from here on are preserved across CPU reset. */
 QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
 QEMUTimer *vstimer; /* Internal timer for VS-mode interrupt */
@@ -799,6 +807,7 @@ void riscv_add_satp_mode_properties(Object *obj);
 bool riscv_cpu_accelerator_compatible(RISCVCPU *cpu);
 
 /* CSR function table */
+extern riscv_csr_operations th_csr_ops[CSR_TABLE_SIZE];
 extern riscv_csr_operations csr_ops[CSR_TABLE_SIZE];
 
 extern const bool valid_vm_1_10_32[], valid_vm_1_10_64[];
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index e116f6c252..67ebb1cefe 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -897,4 +897,10 @@ typedef enum RISCVException {
 /* JVT CSR bits */
 #define JVT_MODE   0x3F
 #define JVT_BASE   (~0x3F)
+
+/* Xuantie custom CSRs */
+#define CSR_TH_MXSTATUS 0x7c0
+
+#define TH_MXSTATUS_MAEE_SHIFT  21
+#define TH_MXSTATUS_MAEE(0x1 << TH_MXSTATUS_MAEE_SHIFT)
 #endif
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 780ae6ef17..3735c69fd6 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -136,6 +136,7 @@ struct RISCVCPUConfig {
 bool ext_xtheadmemidx;
 bool ext_xtheadmempair;
 bool ext_xtheadsync;
+bool ext_xtheadmaee;
 bool ext_XVentanaCondOps;
 
 uint32_t pmu_mask;
@@ -176,7 +177,8 @@ static inline bool has_xthead_p(const RISCVCPUConfig *cfg)
cfg->ext_xtheadcondmov ||
cfg->ext_xtheadfmemidx || cfg->ext_xtheadfmv ||
cfg->ext_xtheadmac || cfg->ext_xtheadmemidx ||
-   cfg->ext_xtheadmempair || cfg->ext_xtheadsync;
+   cfg->ext_xtheadmempair || cfg->ext_xtheadsync ||
+   cfg->ext_xtheadmaee;
 }
 
 #define MATERIALISE_EXT_PREDICATE(ext) \
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 

Re: [PATCH v1 05/15] libvhost-user: Merge vu_set_mem_table_exec_postcopy() into vu_set_mem_table_exec()

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> Let's reduce some code duplication and prepare for further changes.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 119 +++---
>  1 file changed, 39 insertions(+), 80 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index d5b3468e43..d9e2214ad2 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -937,95 +937,23 @@ vu_get_shared_object(VuDev *dev, VhostUserMsg *vmsg)
>  }
>
>  static bool
> -vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
> +vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>  {
> -unsigned int i;
>  VhostUserMemory m = vmsg->payload.memory, *memory = 
> -dev->nregions = memory->nregions;
> -
> -DPRINT("Nregions: %u\n", memory->nregions);
> -for (i = 0; i < dev->nregions; i++) {
> -void *mmap_addr;
> -VhostUserMemoryRegion *msg_region = >regions[i];
> -VuDevRegion *dev_region = >regions[i];
> -
> -DPRINT("Region %d\n", i);
> -DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> -   msg_region->guest_phys_addr);
> -DPRINT("memory_size: 0x%016"PRIx64"\n",
> -   msg_region->memory_size);
> -DPRINT("userspace_addr   0x%016"PRIx64"\n",
> -   msg_region->userspace_addr);
> -DPRINT("mmap_offset  0x%016"PRIx64"\n",
> -   msg_region->mmap_offset);
> -
> -dev_region->gpa = msg_region->guest_phys_addr;
> -dev_region->size = msg_region->memory_size;
> -dev_region->qva = msg_region->userspace_addr;
> -dev_region->mmap_offset = msg_region->mmap_offset;
> +int prot = PROT_READ | PROT_WRITE;
> +unsigned int i;
>
> -/* We don't use offset argument of mmap() since the
> - * mapped address has to be page aligned, and we use huge
> - * pages.
> +if (dev->postcopy_listening) {
> +/*
>   * In postcopy we're using PROT_NONE here to catch anyone
>   * accessing it before we userfault
>   */
> -mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_NONE, MAP_SHARED | MAP_NORESERVE,
> - vmsg->fds[i], 0);
> -
> -if (mmap_addr == MAP_FAILED) {
> -vu_panic(dev, "region mmap error: %s", strerror(errno));
> -} else {
> -dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> -DPRINT("mmap_addr:   0x%016"PRIx64"\n",
> -   dev_region->mmap_addr);
> -}
> -
> -/* Return the address to QEMU so that it can translate the ufd
> - * fault addresses back.
> - */
> -msg_region->userspace_addr = dev_region->mmap_addr +
> - dev_region->mmap_offset;
> -close(vmsg->fds[i]);
> -}
> -
> -/* Send the message back to qemu with the addresses filled in */
> -vmsg->fd_num = 0;
> -if (!vu_send_reply(dev, dev->sock, vmsg)) {
> -vu_panic(dev, "failed to respond to set-mem-table for postcopy");
> -return false;
> -}
> -
> -/* Wait for QEMU to confirm that it's registered the handler for the
> - * faults.
> - */
> -if (!dev->read_msg(dev, dev->sock, vmsg) ||
> -vmsg->size != sizeof(vmsg->payload.u64) ||
> -vmsg->payload.u64 != 0) {
> -vu_panic(dev, "failed to receive valid ack for postcopy 
> set-mem-table");
> -return false;
> +prot = PROT_NONE;
>  }
>
> -/* OK, now we can go and register the memory and generate faults */
> -(void)generate_faults(dev);
> -
> -return false;
> -}
> -
> -static bool
> -vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> -{
> -unsigned int i;
> -VhostUserMemory m = vmsg->payload.memory, *memory = 
> -
>  vu_remove_all_mem_regs(dev);
>  dev->nregions = memory->nregions;
>
> -if (dev->postcopy_listening) {
> -return vu_set_mem_table_exec_postcopy(dev, vmsg);
> -}
> -
>  DPRINT("Nregions: %u\n", memory->nregions);
>  for (i = 0; i < dev->nregions; i++) {
>  void *mmap_addr;
> @@ -1051,8 +979,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>   * mapped address has to be page aligned, and we use huge
>   * pages.  */
>  mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
> - vmsg->fds[i], 0);
> + prot, MAP_SHARED | MAP_NORESERVE, vmsg->fds[i], 0);
>
>  if (mmap_addr == MAP_FAILED) {
>  vu_panic(dev, "region mmap error: %s", strerror(errno));
> @@ -1062,9 +989,41 @@ vu_set_mem_table_exec(VuDev *dev, 

Re: [PATCH v1 04/15] libvhost-user: Factor out removing all mem regions

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> Let's factor it out. Note that the check for MAP_FAILED was wrong as
> we never set mmap_addr if mmap() failed. We'll remove the NULL check
> separately.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 34 ---
>  1 file changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 8a5a7a2295..d5b3468e43 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -240,6 +240,22 @@ qva_to_va(VuDev *dev, uint64_t qemu_addr)
>  return NULL;
>  }
>
> +static void
> +vu_remove_all_mem_regs(VuDev *dev)
> +{
> +unsigned int i;
> +
> +for (i = 0; i < dev->nregions; i++) {
> +VuDevRegion *r = >regions[i];
> +void *ma = (void *)(uintptr_t)r->mmap_addr;
> +
> +if (ma) {
> +munmap(ma, r->size + r->mmap_offset);
> +}
> +}
> +dev->nregions = 0;
> +}
> +
>  static void
>  vmsg_close_fds(VhostUserMsg *vmsg)
>  {
> @@ -1003,14 +1019,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>  unsigned int i;
>  VhostUserMemory m = vmsg->payload.memory, *memory = 
>
> -for (i = 0; i < dev->nregions; i++) {
> -VuDevRegion *r = >regions[i];
> -void *ma = (void *) (uintptr_t) r->mmap_addr;
> -
> -if (ma) {
> -munmap(ma, r->size + r->mmap_offset);
> -}
> -}
> +vu_remove_all_mem_regs(dev);
>  dev->nregions = memory->nregions;
>
>  if (dev->postcopy_listening) {
> @@ -2112,14 +2121,7 @@ vu_deinit(VuDev *dev)
>  {
>  unsigned int i;
>
> -for (i = 0; i < dev->nregions; i++) {
> -VuDevRegion *r = >regions[i];
> -void *m = (void *) (uintptr_t) r->mmap_addr;
> -if (m != MAP_FAILED) {
> -munmap(m, r->size + r->mmap_offset);
> -}
> -}
> -dev->nregions = 0;
> +vu_remove_all_mem_regs(dev);
>
>  for (i = 0; i < dev->max_queues; i++) {
>  VuVirtq *vq = >vq[i];
> --
> 2.43.0
>
>



Re: [PATCH 0/4] Consolidate the use of device_class_set_parent_realize()

2024-02-03 Thread Michael Tokarev

[Trimming list of addresses]

01.02.2024 11:40, Zhao Liu :

From: Zhao Liu 

Hi list,

Now we already have the device_class_set_parent_realize() to set
parent realize(), thus clean up the places where that helper was
forgotten.


Applied to trivial-patches tree, thanks!

/mjt



Re: [PATCH 2/2] target/riscv: Support xtheadmaee for thead-c906

2024-02-03 Thread LIU Zhiwei



On 2024/1/30 19:43, Christoph Müllner wrote:

On Tue, Jan 30, 2024 at 12:12 PM LIU Zhiwei
 wrote:

thead-c906 uses some flags in pte [60-63] bits. It has history reasons that
SVPBMT didn't exist when thead-c906 came to world.

We named this feature as xtheadmaee. this feature is controlled by an custom
CSR named mxstatus, whose maee field encodes whether enable the pte [60-63] 
bits.

The sections "5.2.2.1 Page table structure" and "15.1.7.1 M-mode extension
status register (MXSTATUS)" in document[1] give the detailed information
about its design.

I would prefer if we would not define an extension like XTheadMaee
without a specification.
The linked document defines the bit MAEE in a custom CSR, but the
scope of XTheadMaee
is not clearly defined (the term XTheadMaee is not even part of the PDF).

We have all the XThead* extensions well described here:
   https://github.com/T-head-Semi/thead-extension-spec/tree/master
And it would not be much effort to add XTheadMaee there as well.


Done.  Thanks for the comment. The XTheadMaee specification is here:

https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadmaee.adoc

Thanks,
Zhiwei



For those who don't know the context of this patch, here is the c906
boot regression report from Björn:
   https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg04766.html

BR
Christoph

Christoph


[1]:https://occ-intl-prod.oss-ap-southeast-1.aliyuncs.com/resource//1699265191641/XuanTie-Openc906-UserManual.pdf
Signed-off-by: LIU Zhiwei 
---
  target/riscv/cpu.c |  6 
  target/riscv/cpu.h |  9 ++
  target/riscv/cpu_bits.h|  6 
  target/riscv/cpu_cfg.h |  4 ++-
  target/riscv/cpu_helper.c  | 25 ---
  target/riscv/meson.build   |  1 +
  target/riscv/tcg/tcg-cpu.c |  9 ++
  target/riscv/xthead_csr.c  | 63 ++
  8 files changed, 105 insertions(+), 18 deletions(-)
  create mode 100644 target/riscv/xthead_csr.c

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 2dcbc9ff32..bfdbb0539a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -171,6 +171,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(xtheadmemidx, PRIV_VERSION_1_11_0, ext_xtheadmemidx),
  ISA_EXT_DATA_ENTRY(xtheadmempair, PRIV_VERSION_1_11_0, ext_xtheadmempair),
  ISA_EXT_DATA_ENTRY(xtheadsync, PRIV_VERSION_1_11_0, ext_xtheadsync),
+ISA_EXT_DATA_ENTRY(xtheadmaee, PRIV_VERSION_1_11_0, ext_xtheadmaee),
  ISA_EXT_DATA_ENTRY(xventanacondops, PRIV_VERSION_1_12_0, 
ext_XVentanaCondOps),

  DEFINE_PROP_END_OF_LIST(),
@@ -506,6 +507,7 @@ static void rv64_thead_c906_cpu_init(Object *obj)

  cpu->cfg.mvendorid = THEAD_VENDOR_ID;
  #ifndef CONFIG_USER_ONLY
+cpu->cfg.ext_xtheadmaee = true;
  set_satp_mode_max_supported(cpu, VM_1_10_SV39);
  #endif

@@ -949,6 +951,9 @@ static void riscv_cpu_reset_hold(Object *obj)
  }

  pmp_unlock_entries(env);
+if (riscv_cpu_cfg(env)->ext_xtheadmaee) {
+env->th_mxstatus |= TH_MXSTATUS_MAEE;
+}
  #endif
  env->xl = riscv_cpu_mxl(env);
  riscv_cpu_update_mask(env);
@@ -1439,6 +1444,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_vendor_exts[] = {
  MULTI_EXT_CFG_BOOL("xtheadmemidx", ext_xtheadmemidx, false),
  MULTI_EXT_CFG_BOOL("xtheadmempair", ext_xtheadmempair, false),
  MULTI_EXT_CFG_BOOL("xtheadsync", ext_xtheadsync, false),
+MULTI_EXT_CFG_BOOL("xtheadmaee", ext_xtheadmaee, false),
  MULTI_EXT_CFG_BOOL("xventanacondops", ext_XVentanaCondOps, false),

  DEFINE_PROP_END_OF_LIST(),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 5f3955c38d..1bacf40355 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -412,6 +412,14 @@ struct CPUArchState {
  target_ulong cur_pmmask;
  target_ulong cur_pmbase;

+union {
+/* Custom CSR for Xuantie CPU */
+struct {
+#ifndef CONFIG_USER_ONLY
+target_ulong th_mxstatus;
+#endif
+};
+};
  /* Fields from here on are preserved across CPU reset. */
  QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
  QEMUTimer *vstimer; /* Internal timer for VS-mode interrupt */
@@ -799,6 +807,7 @@ void riscv_add_satp_mode_properties(Object *obj);
  bool riscv_cpu_accelerator_compatible(RISCVCPU *cpu);

  /* CSR function table */
+extern riscv_csr_operations th_csr_ops[CSR_TABLE_SIZE];
  extern riscv_csr_operations csr_ops[CSR_TABLE_SIZE];

  extern const bool valid_vm_1_10_32[], valid_vm_1_10_64[];
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index e116f6c252..67ebb1cefe 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -897,4 +897,10 @@ typedef enum RISCVException {
  /* JVT CSR bits */
  #define JVT_MODE   0x3F
  #define JVT_BASE   (~0x3F)
+
+/* Xuantie custom CSRs */
+#define CSR_TH_MXSTATUS 0x7c0
+
+#define TH_MXSTATUS_MAEE_SHIFT  21
+#define TH_MXSTATUS_MAEE(0x1 << 

Re: [PATCH v1 06/15] libvhost-user: Factor out adding a memory region

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> Let's factor it out, reducing quite some code duplication and perparing
> for further changes.
>
> If we fail to mmap a region and panic, we now simply don't add that
> (broken) region.
>
> Note that we now increment dev->nregions as we are successfully
> adding memory regions, and don't increment dev->nregions if anything went
> wrong.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 168 --
>  1 file changed, 60 insertions(+), 108 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index d9e2214ad2..a2baefe84b 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -256,6 +256,61 @@ vu_remove_all_mem_regs(VuDev *dev)
>  dev->nregions = 0;
>  }
>
> +static void
> +_vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> +{
> +int prot = PROT_READ | PROT_WRITE;
> +VuDevRegion *r;
> +void *mmap_addr;
> +
> +DPRINT("Adding region %d\n", dev->nregions);
> +DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> +   msg_region->guest_phys_addr);
> +DPRINT("memory_size: 0x%016"PRIx64"\n",
> +   msg_region->memory_size);
> +DPRINT("userspace_addr   0x%016"PRIx64"\n",
> +   msg_region->userspace_addr);
> +DPRINT("mmap_offset  0x%016"PRIx64"\n",
> +   msg_region->mmap_offset);
> +
> +if (dev->postcopy_listening) {
> +/*
> + * In postcopy we're using PROT_NONE here to catch anyone
> + * accessing it before we userfault
> + */
> +prot = PROT_NONE;
> +}
> +
> +/*
> + * We don't use offset argument of mmap() since the mapped address has
> + * to be page aligned, and we use huge pages.
> + */
> +mmap_addr = mmap(0, msg_region->memory_size + msg_region->mmap_offset,
> + prot, MAP_SHARED | MAP_NORESERVE, fd, 0);
> +if (mmap_addr == MAP_FAILED) {
> +vu_panic(dev, "region mmap error: %s", strerror(errno));
> +return;
> +}
> +DPRINT("mmap_addr:   0x%016"PRIx64"\n",
> +   (uint64_t)(uintptr_t)mmap_addr);
> +
> +r = >regions[dev->nregions];
> +r->gpa = msg_region->guest_phys_addr;
> +r->size = msg_region->memory_size;
> +r->qva = msg_region->userspace_addr;
> +r->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> +r->mmap_offset = msg_region->mmap_offset;
> +dev->nregions++;
> +
> +if (dev->postcopy_listening) {
> +/*
> + * Return the address to QEMU so that it can translate the ufd
> + * fault addresses back.
> + */
> +msg_region->userspace_addr = r->mmap_addr + r->mmap_offset;
> +}
> +}
> +
>  static void
>  vmsg_close_fds(VhostUserMsg *vmsg)
>  {
> @@ -727,10 +782,7 @@ generate_faults(VuDev *dev) {
>  static bool
>  vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  int i;
> -bool track_ramblocks = dev->postcopy_listening;
>  VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = 
> -VuDevRegion *dev_region = >regions[dev->nregions];
> -void *mmap_addr;
>
>  if (vmsg->fd_num != 1) {
>  vmsg_close_fds(vmsg);
> @@ -760,69 +812,20 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>   * we know all the postcopy client bases have been received, and we
>   * should start generating faults.
>   */
> -if (track_ramblocks &&
> +if (dev->postcopy_listening &&
>  vmsg->size == sizeof(vmsg->payload.u64) &&
>  vmsg->payload.u64 == 0) {
>  (void)generate_faults(dev);
>  return false;
>  }
>
> -DPRINT("Adding region: %u\n", dev->nregions);
> -DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> -   msg_region->guest_phys_addr);
> -DPRINT("memory_size: 0x%016"PRIx64"\n",
> -   msg_region->memory_size);
> -DPRINT("userspace_addr   0x%016"PRIx64"\n",
> -   msg_region->userspace_addr);
> -DPRINT("mmap_offset  0x%016"PRIx64"\n",
> -   msg_region->mmap_offset);
> -
> -dev_region->gpa = msg_region->guest_phys_addr;
> -dev_region->size = msg_region->memory_size;
> -dev_region->qva = msg_region->userspace_addr;
> -dev_region->mmap_offset = msg_region->mmap_offset;
> -
> -/*
> - * We don't use offset argument of mmap() since the
> - * mapped address has to be page aligned, and we use huge
> - * pages.
> - */
> -if (track_ramblocks) {
> -/*
> - * In postcopy we're using PROT_NONE here to catch anyone
> - * accessing it before we userfault.
> - */
> -mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_NONE, MAP_SHARED | MAP_NORESERVE,
> - vmsg->fds[0], 0);
> -

Re: [PATCH v1 08/15] libvhost-user: Don't zero out memory for memory regions

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand  wrote:
>
> dev->nregions always covers only valid entries. Stop zeroing out other
> array elements that are unused.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index f99c888b48..e1a1b9df88 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -888,13 +888,9 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>
>  munmap((void *)(uintptr_t)r->mmap_addr, r->size + 
> r->mmap_offset);
>
> -/*
> - * Shift all affected entries by 1 to close the hole at index i 
> and
> - * zero out the last entry.
> - */
> +/* Shift all affected entries by 1 to close the hole at index. */
>  memmove(dev->regions + i, dev->regions + i + 1,
>  sizeof(VuDevRegion) * (dev->nregions - i - 1));
> -memset(dev->regions + dev->nregions - 1, 0, sizeof(VuDevRegion));
>  DPRINT("Successfully removed a region\n");
>  dev->nregions--;
>  i--;
> @@ -2119,7 +2115,6 @@ vu_init(VuDev *dev,
>  DPRINT("%s: failed to malloc mem regions\n", __func__);
>  return false;
>  }
> -memset(dev->regions, 0, VHOST_USER_MAX_RAM_SLOTS * 
> sizeof(dev->regions[0]));
>
>  dev->vq = malloc(max_queues * sizeof(dev->vq[0]));
>  if (!dev->vq) {
> --
> 2.43.0
>
>



Re: [PATCH v1 07/15] libvhost-user: No need to check for NULL when unmapping

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> We never add a memory region if mmap() failed. Therefore, no need to check
> for NULL.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index a2baefe84b..f99c888b48 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -247,11 +247,8 @@ vu_remove_all_mem_regs(VuDev *dev)
>
>  for (i = 0; i < dev->nregions; i++) {
>  VuDevRegion *r = >regions[i];
> -void *ma = (void *)(uintptr_t)r->mmap_addr;
>
> -if (ma) {
> -munmap(ma, r->size + r->mmap_offset);
> -}
> +munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
>  }
>  dev->nregions = 0;
>  }
> @@ -888,11 +885,8 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  for (i = 0; i < dev->nregions; i++) {
>  if (reg_equal(>regions[i], msg_region)) {
>  VuDevRegion *r = >regions[i];
> -void *ma = (void *) (uintptr_t) r->mmap_addr;
>
> -if (ma) {
> -munmap(ma, r->size + r->mmap_offset);
> -}
> +munmap((void *)(uintptr_t)r->mmap_addr, r->size + 
> r->mmap_offset);
>
>  /*
>   * Shift all affected entries by 1 to close the hole at index i 
> and
> --
> 2.43.0
>
>



Re: [PATCH] loongarch: Change the UEFI loading mode to loongarch

2024-02-03 Thread maobibo




On 2024/2/4 上午9:54, Xianglai Li wrote:

The UEFI loading mode in loongarch is very different
from that in other architectures:loongarch's UEFI code
is in rom, while other architectures' UEFI code is in flash.

loongarch UEFI can be loaded as follows:
-machine virt,pflash=pflash0-format
-bios ./QEMU_EFI.fd

Other architectures load UEFI using the following methods:
-machine virt,pflash0=pflash0-format,pflash1=pflash1-format

loongarch's UEFI loading method makes qemu and libvirt incompatible
when using NVRAM, and the cost of loongarch's current loading method
far outweighs the benefits, so we decided to use the same UEFI loading
scheme as other architectures.

Cc: Andrea Bolognani 
Cc: maob...@loongson.cn
Cc: Philippe Mathieu-Daudé 
Cc: Song Gao 
Cc: zhaotian...@loongson.cn
Signed-off-by: Xianglai Li 
---
  hw/loongarch/acpi-build.c   |  29 +--
  hw/loongarch/virt.c | 101 ++--
  include/hw/loongarch/virt.h |   8 +--
  3 files changed, 106 insertions(+), 32 deletions(-)

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 730bc4a748..308a233e47 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -314,16 +314,39 @@ static void build_pci_device_aml(Aml *scope, 
LoongArchMachineState *lams)
  static void build_flash_aml(Aml *scope, LoongArchMachineState *lams)
  {
  Aml *dev, *crs;
+MemoryRegion *flash_mem;
  
-hwaddr flash_base = VIRT_FLASH_BASE;

-hwaddr flash_size = VIRT_FLASH_SIZE;
+hwaddr flash0_base;
+hwaddr flash0_size;
+
+hwaddr flash1_base;
+hwaddr flash1_size;
+
+flash_mem = pflash_cfi01_get_memory(lams->flash[0]);
+flash0_base = flash_mem->addr;
+flash0_size = flash_mem->size;
+
+flash_mem = pflash_cfi01_get_memory(lams->flash[1]);
+flash1_base = flash_mem->addr;
+flash1_size = flash_mem->size;
  
  dev = aml_device("FLS0");

  aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
  aml_append(dev, aml_name_decl("_UID", aml_int(0)));
  
  crs = aml_resource_template();

-aml_append(crs, aml_memory32_fixed(flash_base, flash_size, 
AML_READ_WRITE));
+aml_append(crs, aml_memory32_fixed(flash0_base, flash0_size,
+   AML_READ_WRITE));
+aml_append(dev, aml_name_decl("_CRS", crs));
+aml_append(scope, dev);
+
+dev = aml_device("FLS1");
+aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0015")));
+aml_append(dev, aml_name_decl("_UID", aml_int(1)));
+
+crs = aml_resource_template();
+aml_append(crs, aml_memory32_fixed(flash1_base, flash1_size,
+   AML_READ_WRITE));
  aml_append(dev, aml_name_decl("_CRS", crs));
  aml_append(scope, dev);
  }
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index c9a680e61a..79dff497da 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -54,7 +54,9 @@ struct loaderparams {
  const char *initrd_filename;
  };
  
-static void virt_flash_create(LoongArchMachineState *lams)

+static PFlashCFI01 *virt_flash_create1(LoongArchMachineState *lams,
+   const char *name,
+   const char *alias_prop_name)
  {
  DeviceState *dev = qdev_new(TYPE_PFLASH_CFI01);
  
@@ -66,45 +68,78 @@ static void virt_flash_create(LoongArchMachineState *lams)

  qdev_prop_set_uint16(dev, "id1", 0x18);
  qdev_prop_set_uint16(dev, "id2", 0x00);
  qdev_prop_set_uint16(dev, "id3", 0x00);
-qdev_prop_set_string(dev, "name", "virt.flash");
-object_property_add_child(OBJECT(lams), "virt.flash", OBJECT(dev));
-object_property_add_alias(OBJECT(lams), "pflash",
+qdev_prop_set_string(dev, "name", name);
+object_property_add_child(OBJECT(lams), name, OBJECT(dev));
+object_property_add_alias(OBJECT(lams), alias_prop_name,
OBJECT(dev), "drive");
+return PFLASH_CFI01(dev);
+}
  
-lams->flash = PFLASH_CFI01(dev);

+static void virt_flash_create(LoongArchMachineState *lams)
+{
+lams->flash[0] = virt_flash_create1(lams, "virt.flash0", "pflash0");
+lams->flash[1] = virt_flash_create1(lams, "virt.flash1", "pflash1");
  }
  
-static void virt_flash_map(LoongArchMachineState *lams,

-   MemoryRegion *sysmem)
+static void virt_flash_map1(PFlashCFI01 *flash,
+hwaddr base, hwaddr size,
+MemoryRegion *sysmem)
  {
-PFlashCFI01 *flash = lams->flash;
  DeviceState *dev = DEVICE(flash);
-hwaddr base = VIRT_FLASH_BASE;
-hwaddr size = VIRT_FLASH_SIZE;
+BlockBackend *blk;
+hwaddr real_size = size;
+
+blk = pflash_cfi01_get_blk(flash);
+if (blk) {
+real_size = blk_getlength(blk);
+assert(real_size && real_size <= size);
+}
  
-assert(QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE));

-assert(size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX);
+

[PATCH] MAINTAINERS: Switch to my Enfabrica email

2024-02-03 Thread Raphael Norwitz
I'd prefer to use my new work email so this change updates MAINTAINERS
with it.

Signed-off-by: Raphael Norwitz 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2f9741b898..a12b58abe2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2556,7 +2556,7 @@ F: include/hw/virtio/virtio-gpu.h
 F: docs/system/devices/virtio-gpu.rst
 
 vhost-user-blk
-M: Raphael Norwitz 
+M: Raphael Norwitz 
 S: Maintained
 F: contrib/vhost-user-blk/
 F: contrib/vhost-user-scsi/
-- 
2.43.0




[PATCH v2 0/2] target/riscv: Support mxstatus CSR for thead-c906

2024-02-03 Thread LIU Zhiwei
This patch set fix the regression on kernel pointed by Björn Töpel in
https://www.mail-archive.com/qemu-devel@nongnu.org/msg1018232.html.

We first add a framework for vendor CSRs in patch 1. After that we add
one thead-c906 CSR mxstatus, which is used for mmu extension xtheadmaee.

thead-c906 uses some flags in pte [60-63] bits. It has history reasons that
SVPBMT didn't exist when thead-c906 came to wotrld. We named this feature as
xtheadmaee[1]. this feature is controlled by an custom CSR named mxstatus,
whose maee field encodes whether enable the pte [60-63] bits.

[1]:https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadmaee.adoc

LIU Zhiwei (2):
  target/riscv: Register vendors CSR
  target/riscv: Support xtheadmaee for thead-c906

 target/riscv/cpu.c |  9 ++
 target/riscv/cpu.h |  9 ++
 target/riscv/cpu_bits.h|  6 
 target/riscv/cpu_cfg.h |  4 ++-
 target/riscv/cpu_helper.c  | 25 ---
 target/riscv/meson.build   |  1 +
 target/riscv/tcg/tcg-cpu.c | 25 ++-
 target/riscv/tcg/tcg-cpu.h |  1 +
 target/riscv/xthead_csr.c  | 65 ++
 9 files changed, 132 insertions(+), 13 deletions(-)
 create mode 100644 target/riscv/xthead_csr.c

-- 
2.25.1




Re: [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation

2024-02-03 Thread Raphael Norwitz
As a heads up, I've left Nutanix and updated it in MAINTAINERS. Will
be updating it again shortly so tagging these with my new work email.

On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> We barely had mmap_offset set in the past. With virtio-mem and
> dynamic-memslots that will change.
>
> In vu_add_mem_reg() and vu_set_mem_table_exec_postcopy(), we are
> performing pointer arithmetics, which is wrong. Let's simply
> use dev_region->mmap_addr instead of "void *mmap_addr".
>
> Fixes: ec94c8e621de ("Support adding individual regions in libvhost-user")
> Fixes: 9bb38019942c ("vhost+postcopy: Send address back to qemu")
> Cc: Raphael Norwitz 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index a3b158c671..7e515ed15d 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -800,8 +800,8 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>   * Return the address to QEMU so that it can translate the ufd
>   * fault addresses back.
>   */
> -msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> - dev_region->mmap_offset);
> +msg_region->userspace_addr = dev_region->mmap_addr +
> + dev_region->mmap_offset;
>
>  /* Send the message back to qemu with the addresses filled in. */
>  vmsg->fd_num = 0;
> @@ -969,8 +969,8 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg 
> *vmsg)
>  /* Return the address to QEMU so that it can translate the ufd
>   * fault addresses back.
>   */
> -msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> - dev_region->mmap_offset);
> +msg_region->userspace_addr = dev_region->mmap_addr +
> + dev_region->mmap_offset;
>  close(vmsg->fds[i]);
>  }
>
> --
> 2.43.0
>
>



Re: [PATCH v1 02/15] libvhost-user: Dynamically allocate memory for memory slots

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> Let's prepare for increasing VHOST_USER_MAX_RAM_SLOTS by dynamically
> allocating dev->regions. We don't have any ABI guarantees (not
> dynamically linked), so we can simply change the layout of VuDev.
>
> Let's zero out the memory, just as we used to do.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 11 +++
>  subprojects/libvhost-user/libvhost-user.h |  2 +-
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 7e515ed15d..8a5a7a2295 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -2171,6 +2171,8 @@ vu_deinit(VuDev *dev)
>
>  free(dev->vq);
>  dev->vq = NULL;
> +free(dev->regions);
> +dev->regions = NULL;
>  }
>
>  bool
> @@ -2205,9 +2207,18 @@ vu_init(VuDev *dev,
>  dev->backend_fd = -1;
>  dev->max_queues = max_queues;
>
> +dev->regions = malloc(VHOST_USER_MAX_RAM_SLOTS * 
> sizeof(dev->regions[0]));
> +if (!dev->regions) {
> +DPRINT("%s: failed to malloc mem regions\n", __func__);
> +return false;
> +}
> +memset(dev->regions, 0, VHOST_USER_MAX_RAM_SLOTS * 
> sizeof(dev->regions[0]));
> +
>  dev->vq = malloc(max_queues * sizeof(dev->vq[0]));
>  if (!dev->vq) {
>  DPRINT("%s: failed to malloc virtqueues\n", __func__);
> +free(dev->regions);
> +dev->regions = NULL;
>  return false;
>  }
>
> diff --git a/subprojects/libvhost-user/libvhost-user.h 
> b/subprojects/libvhost-user/libvhost-user.h
> index c2352904f0..c882b4e3a2 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -398,7 +398,7 @@ typedef struct VuDevInflightInfo {
>  struct VuDev {
>  int sock;
>  uint32_t nregions;
> -VuDevRegion regions[VHOST_USER_MAX_RAM_SLOTS];
> +VuDevRegion *regions;
>  VuVirtq *vq;
>  VuDevInflightInfo inflight_info;
>  int log_call_fd;
> --
> 2.43.0
>
>



Re: [PATCH v1 10/15] libvhost-user: Factor out search for memory region by GPA and simplify

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> Memory regions cannot overlap, and if we ever hit that case something
> would be really flawed.
>
> For example, when vhost code in QEMU decides to increase the size of memory
> regions to cover full huge pages, it makes sure to never create overlaps,
> and if there would be overlaps, it would bail out.
>
> QEMU commits 48d7c9757749 ("vhost: Merge sections added to temporary
> list"), c1ece84e7c93 ("vhost: Huge page align and merge") and
> e7b94a84b6cb ("vhost: Allow adjoining regions") added and clarified that
> handling and how overlaps are impossible.
>
> Consequently, each GPA can belong to at most one memory region, and
> everything else doesn't make sense. Let's factor out our search to prepare
> for further changes.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 79 +--
>  1 file changed, 45 insertions(+), 34 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 22154b217f..d036b54ed0 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -195,30 +195,47 @@ vu_panic(VuDev *dev, const char *msg, ...)
>   */
>  }
>
> +/* Search for a memory region that covers this guest physical address. */
> +static VuDevRegion *
> +vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
> +{
> +unsigned int i;
> +
> +/*
> + * Memory regions cannot overlap in guest physical address space. Each
> + * GPA belongs to exactly one memory region, so there can only be one
> + * match.
> + */
> +for (i = 0; i < dev->nregions; i++) {
> +VuDevRegion *cur = >regions[i];
> +
> +if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
> +return cur;
> +}
> +}
> +return NULL;
> +}
> +
>  /* Translate guest physical address to our virtual address.  */
>  void *
>  vu_gpa_to_va(VuDev *dev, uint64_t *plen, uint64_t guest_addr)
>  {
> -unsigned int i;
> +VuDevRegion *r;
>
>  if (*plen == 0) {
>  return NULL;
>  }
>
> -/* Find matching memory region.  */
> -for (i = 0; i < dev->nregions; i++) {
> -VuDevRegion *r = >regions[i];
> -
> -if ((guest_addr >= r->gpa) && (guest_addr < (r->gpa + r->size))) {
> -if ((guest_addr + *plen) > (r->gpa + r->size)) {
> -*plen = r->gpa + r->size - guest_addr;
> -}
> -return (void *)(uintptr_t)
> -guest_addr - r->gpa + r->mmap_addr + r->mmap_offset;
> -}
> +r = vu_gpa_to_mem_region(dev, guest_addr);
> +if (!r) {
> +return NULL;
>  }
>
> -return NULL;
> +if ((guest_addr + *plen) > (r->gpa + r->size)) {
> +*plen = r->gpa + r->size - guest_addr;
> +}
> +return (void *)(uintptr_t)guest_addr - r->gpa + r->mmap_addr +
> +   r->mmap_offset;
>  }
>
>  /* Translate qemu virtual address to our virtual address.  */
> @@ -854,8 +871,8 @@ static inline bool reg_equal(VuDevRegion *vudev_reg,
>  static bool
>  vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = 
> -unsigned int i;
> -bool found = false;
> +unsigned int idx;
> +VuDevRegion *r;
>
>  if (vmsg->fd_num > 1) {
>  vmsg_close_fds(vmsg);
> @@ -882,28 +899,22 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  DPRINT("mmap_offset  0x%016"PRIx64"\n",
> msg_region->mmap_offset);
>
> -for (i = 0; i < dev->nregions; i++) {
> -if (reg_equal(>regions[i], msg_region)) {
> -VuDevRegion *r = >regions[i];
> -
> -munmap((void *)(uintptr_t)r->mmap_addr, r->size + 
> r->mmap_offset);
> -
> -/* Shift all affected entries by 1 to close the hole at index. */
> -memmove(dev->regions + i, dev->regions + i + 1,
> -sizeof(VuDevRegion) * (dev->nregions - i - 1));
> -DPRINT("Successfully removed a region\n");
> -dev->nregions--;
> -i--;
> -
> -found = true;
> -break;
> -}
> -}
> -
> -if (!found) {
> +r = vu_gpa_to_mem_region(dev, msg_region->guest_phys_addr);
> +if (!r || !reg_equal(r, msg_region)) {
> +vmsg_close_fds(vmsg);
>  vu_panic(dev, "Specified region not found\n");
> +return false;
>  }
>
> +munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
> +
> +idx = r - dev->regions;
> +assert(idx < dev->nregions);
> +/* Shift all affected entries by 1 to close the hole. */
> +memmove(r, r + 1, sizeof(VuDevRegion) * (dev->nregions - idx - 1));
> +DPRINT("Successfully removed a region\n");
> +dev->nregions--;
> +
>  vmsg_close_fds(vmsg);
>
>  return false;
> --
> 2.43.0
>
>



Re: [PATCH v1 09/15] libvhost-user: Don't search for duplicates when removing memory regions

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand  wrote:
>
> We cannot have duplicate memory regions, something would be deeply
> flawed elsewhere. Let's just stop the search once we found an entry.
>
> We'll add more sanity checks when adding memory regions later.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index e1a1b9df88..22154b217f 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -896,8 +896,7 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  i--;
>
>  found = true;
> -
> -/* Continue the search for eventual duplicates. */
> +break;
>  }
>  }
>
> --
> 2.43.0
>
>



Re: [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()

2024-02-03 Thread Raphael Norwitz
One comment on this one.

On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand  wrote:
>
> Let's speed up GPA to memory region / virtual address lookup. Store the
> memory regions ordered by guest physical addresses, and use binary
> search for address translation, as well as when adding/removing memory
> regions.
>
> Most importantly, this will speed up GPA->VA address translation when we
> have many memslots.
>
> Signed-off-by: David Hildenbrand 
> ---
>  subprojects/libvhost-user/libvhost-user.c | 49 +--
>  1 file changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index d036b54ed0..75e47b7bb3 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -199,19 +199,30 @@ vu_panic(VuDev *dev, const char *msg, ...)
>  static VuDevRegion *
>  vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
>  {
> -unsigned int i;
> +int low = 0;
> +int high = dev->nregions - 1;
>
>  /*
>   * Memory regions cannot overlap in guest physical address space. Each
>   * GPA belongs to exactly one memory region, so there can only be one
>   * match.
> + *
> + * We store our memory regions ordered by GPA and can simply perform a
> + * binary search.
>   */
> -for (i = 0; i < dev->nregions; i++) {
> -VuDevRegion *cur = >regions[i];
> +while (low <= high) {
> +unsigned int mid = low + (high - low) / 2;
> +VuDevRegion *cur = >regions[mid];
>
>  if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
>  return cur;
>  }
> +if (guest_addr >= cur->gpa + cur->size) {
> +low = mid + 1;
> +}
> +if (guest_addr < cur->gpa) {
> +high = mid - 1;
> +}
>  }
>  return NULL;
>  }
> @@ -273,9 +284,14 @@ vu_remove_all_mem_regs(VuDev *dev)
>  static void
>  _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
>  {
> +const uint64_t start_gpa = msg_region->guest_phys_addr;
> +const uint64_t end_gpa = start_gpa + msg_region->memory_size;
>  int prot = PROT_READ | PROT_WRITE;
>  VuDevRegion *r;
>  void *mmap_addr;
> +int low = 0;
> +int high = dev->nregions - 1;
> +unsigned int idx;
>
>  DPRINT("Adding region %d\n", dev->nregions);
>  DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> @@ -295,6 +311,29 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion 
> *msg_region, int fd)
>  prot = PROT_NONE;
>  }
>
> +/*
> + * We will add memory regions into the array sorted by GPA. Perform a
> + * binary search to locate the insertion point: it will be at the low
> + * index.
> + */
> +while (low <= high) {
> +unsigned int mid = low + (high - low)  / 2;
> +VuDevRegion *cur = >regions[mid];
> +
> +/* Overlap of GPA addresses. */

Looks like this check will only catch if the new region is fully
contained within an existing region. I think we need to check whether
either start or end region are in the range, i.e.:

if ((start_gpa > curr_gpa && start_gpa < cur->gpa + curr_size ) ||
(end_gpa > currr->gpa && end_gpa < cur->gpa + curr->size)  )


> +if (start_gpa < cur->gpa + cur->size && cur->gpa < end_gpa) {
> +vu_panic(dev, "regions with overlapping guest physical 
> addresses");
> +return;
> +}
> +if (start_gpa >= cur->gpa + cur->size) {
> +low = mid + 1;
> +}
> +if (start_gpa < cur->gpa) {
> +high = mid - 1;
> +}
> +}
> +idx = low;
> +
>  /*
>   * We don't use offset argument of mmap() since the mapped address has
>   * to be page aligned, and we use huge pages.
> @@ -308,7 +347,9 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion 
> *msg_region, int fd)
>  DPRINT("mmap_addr:   0x%016"PRIx64"\n",
> (uint64_t)(uintptr_t)mmap_addr);
>
> -r = >regions[dev->nregions];
> +/* Shift all affected entries by 1 to open a hole at idx. */
> +r = >regions[idx];
> +memmove(r + 1, r, sizeof(VuDevRegion) * (dev->nregions - idx));
>  r->gpa = msg_region->guest_phys_addr;
>  r->size = msg_region->memory_size;
>  r->qva = msg_region->userspace_addr;
> --
> 2.43.0
>
>



Re: [PATCH v1 12/15] libvhost-user: Use most of mmap_offset as fd_offset

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> In the past, QEMU would create memory regions that could partially cover
> hugetlb pages, making mmap() fail if we would use the mmap_offset as an
> fd_offset. For that reason, we never used the mmap_offset as an offset into
> the fd and instead always mapped the fd from the very start.
>
> However, that can easily result in us mmap'ing a lot of unnecessary
> parts of an fd, possibly repeatedly.
>
> QEMU nowadays does not create memory regions that partially cover huge
> pages -- it never really worked with postcopy. QEMU handles merging of
> regions that partially cover huge pages (due to holes in boot memory) since
> 2018 in c1ece84e7c93 ("vhost: Huge page align and merge").
>
> Let's be a bit careful and not unconditionally convert the
> mmap_offset into an fd_offset. Instead, let's simply detect the hugetlb
> size and pass as much as we can as fd_offset, making sure that we call
> mmap() with a properly aligned offset.
>
> With QEMU and a virtio-mem device that is fully plugged (50GiB using 50
> memslots) the qemu-storage daemon process consumes in the VA space
> 1281GiB before this change and 58GiB after this change.
>
> Example debug output:
>    Vhost user message 
>   Request: VHOST_USER_ADD_MEM_REG (37)
>   Flags:   0x9
>   Size:40
>   Fds: 59
>   Adding region 50
>   guest_phys_addr: 0x000d8000
>   memory_size: 0x4000
>   userspace_addr   0x7f54ebffe000
>   mmap_offset  0x000c
>   fd_offset:   0x000c
>   new mmap_offset: 0x
>   mmap_addr:   0x7f7ecc00
>   Successfully added new region
>    Vhost user message 
>   Request: VHOST_USER_ADD_MEM_REG (37)
>   Flags:   0x9
>   Size:40
>   Fds: 59
>   Adding region 51
>   guest_phys_addr: 0x000dc000
>   memory_size: 0x4000
>   userspace_addr   0x7f552bffe000
>   mmap_offset  0x000c4000
>   fd_offset:   0x000c4000
>   new mmap_offset: 0x
>   mmap_addr:   0x7f7e8c00
>   Successfully added new region
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 50 ---
>  1 file changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 75e47b7bb3..7d8293dc84 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -43,6 +43,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
>  #ifdef __NR_userfaultfd
>  #include 
> @@ -281,12 +283,36 @@ vu_remove_all_mem_regs(VuDev *dev)
>  dev->nregions = 0;
>  }
>
> +static size_t
> +get_fd_pagesize(int fd)
> +{
> +static size_t pagesize;
> +#if defined(__linux__)
> +struct statfs fs;
> +int ret;
> +
> +do {
> +ret = fstatfs(fd, );
> +} while (ret != 0 && errno == EINTR);
> +
> +if (!ret && fs.f_type == HUGETLBFS_MAGIC) {
> +return fs.f_bsize;
> +}
> +#endif
> +
> +if (!pagesize) {
> +pagesize = getpagesize();
> +}
> +return pagesize;
> +}
> +
>  static void
>  _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
>  {
>  const uint64_t start_gpa = msg_region->guest_phys_addr;
>  const uint64_t end_gpa = start_gpa + msg_region->memory_size;
>  int prot = PROT_READ | PROT_WRITE;
> +uint64_t mmap_offset, fd_offset;
>  VuDevRegion *r;
>  void *mmap_addr;
>  int low = 0;
> @@ -335,11 +361,25 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion 
> *msg_region, int fd)
>  idx = low;
>
>  /*
> - * We don't use offset argument of mmap() since the mapped address has
> - * to be page aligned, and we use huge pages.
> + * Convert most of msg_region->mmap_offset to fd_offset. In almost all
> + * cases, this will leave us with mmap_offset == 0, mmap()'ing only
> + * what we really need. Only if a memory region would partially cover
> + * hugetlb pages, we'd get mmap_offset != 0, which usually doesn't happen
> + * anymore (i.e., modern QEMU).
> + *
> + * Note that mmap() with hugetlb would fail if the offset into the file
> + * is not aligned to the huge page size.
>   */
> -mmap_addr = mmap(0, msg_region->memory_size + msg_region->mmap_offset,
> - prot, MAP_SHARED | MAP_NORESERVE, fd, 0);
> +fd_offset = ALIGN_DOWN(msg_region->mmap_offset, get_fd_pagesize(fd));
> +mmap_offset = msg_region->mmap_offset - fd_offset;
> +
> +DPRINT("fd_offset:   0x%016"PRIx64"\n",
> +   fd_offset);
> +DPRINT("adj mmap_offset: 0x%016"PRIx64"\n",
> +   mmap_offset);
> +
> +mmap_addr = mmap(0, msg_region->memory_size + 

Re: [PATCH v1 13/15] libvhost-user: Factor out vq usability check

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> Let's factor it out to prepare for further changes.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 24 +++
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 7d8293dc84..febeb2eb89 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -283,6 +283,12 @@ vu_remove_all_mem_regs(VuDev *dev)
>  dev->nregions = 0;
>  }
>
> +static bool
> +vu_is_vq_usable(VuDev *dev, VuVirtq *vq)
> +{
> +return likely(!dev->broken) && likely(vq->vring.avail);
> +}
> +
>  static size_t
>  get_fd_pagesize(int fd)
>  {
> @@ -2378,8 +2384,7 @@ vu_queue_get_avail_bytes(VuDev *dev, VuVirtq *vq, 
> unsigned int *in_bytes,
>  idx = vq->last_avail_idx;
>
>  total_bufs = in_total = out_total = 0;
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  goto done;
>  }
>
> @@ -2494,8 +2499,7 @@ vu_queue_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned 
> int in_bytes,
>  bool
>  vu_queue_empty(VuDev *dev, VuVirtq *vq)
>  {
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return true;
>  }
>
> @@ -2534,8 +2538,7 @@ vring_notify(VuDev *dev, VuVirtq *vq)
>
>  static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
>  {
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return;
>  }
>
> @@ -2860,8 +2863,7 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
>  unsigned int head;
>  VuVirtqElement *elem;
>
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return NULL;
>  }
>
> @@ -3018,8 +3020,7 @@ vu_queue_fill(VuDev *dev, VuVirtq *vq,
>  {
>  struct vring_used_elem uelem;
>
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return;
>  }
>
> @@ -3048,8 +3049,7 @@ vu_queue_flush(VuDev *dev, VuVirtq *vq, unsigned int 
> count)
>  {
>  uint16_t old, new;
>
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return;
>  }
>
> --
> 2.43.0
>
>



Re: [PATCH] loongarch: Change the UEFI loading mode to loongarch

2024-02-03 Thread lixianglai

Hi  maobibo:




diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 6ef9a92394..d1fba1204e 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -20,8 +20,10 @@
  #define VIRT_BIOS_BASE  0x1c00UL
  #define VIRT_BIOS_SIZE  (4 * MiB)
  #define VIRT_FLASH_SECTOR_SIZE  (128 * KiB)
-#define VIRT_FLASH_BASE 0x1d00UL
-#define VIRT_FLASH_SIZE (16 * MiB)
+#define VIRT_FLASH0_BASE    VIRT_BIOS_BASE
+#define VIRT_FLASH0_SIZE    VIRT_BIOS_SIZE


Xianglai,

If there are two flash, what is size for flash0 16M (VIRT_FLASH1_BASE 
- VIRT_FLASH0_BASE) or 4M (VIRT_BIOS_SIZE) ?


IIRC it should be 16M (VIRT_FLASH1_BASE -  VIRT_FLASH0_BASE).

Regards
Bibo Mao


I referred to Philippe's last review suggestion:


https://lore.kernel.org/qemu-devel/b62401b2-3a12-e89d-6953-b40dd170b...@linaro.org/

@@ -20,6 +21,9 @@  > #define VIRT_FWCFG_BASE 0x1e02UL > #define VIRT_BIOS_BASE 
0x1c00UL > #define VIRT_BIOS_SIZE (4 * MiB) > +#define 
VIRT_FLASH_SECTOR_SIZE (128 * KiB) > +#define VIRT_FLASH0_BASE 
(VIRT_BIOS_BASE + VIRT_BIOS_SIZE)

Do you really want the flash base addr to depend of the ROM size?
It could be safer/simpler to start with a fixed address, leaving
room for a bigger ROM if you think you might have to use one.


Thanks!

Xianglai



+#define VIRT_FLASH1_BASE 0x1d00UL
+#define VIRT_FLASH1_SIZE    (16 * MiB)
    #define VIRT_LOWMEM_BASE    0
  #define VIRT_LOWMEM_SIZE    0x1000
@@ -49,7 +51,7 @@ struct LoongArchMachineState {
  int  fdt_size;
  DeviceState *platform_bus_dev;
  PCIBus   *pci_bus;
-    PFlashCFI01  *flash;
+    PFlashCFI01  *flash[2];
  MemoryRegion system_iocsr;
  MemoryRegion iocsr_mem;
  AddressSpace as_iocsr;





Re: [PATCH] loongarch: Change the UEFI loading mode to loongarch

2024-02-03 Thread lixianglai


Hi  maobibo:



diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 6ef9a92394..d1fba1204e 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -20,8 +20,10 @@
  #define VIRT_BIOS_BASE  0x1c00UL
  #define VIRT_BIOS_SIZE  (4 * MiB)
  #define VIRT_FLASH_SECTOR_SIZE  (128 * KiB)
-#define VIRT_FLASH_BASE 0x1d00UL
-#define VIRT_FLASH_SIZE (16 * MiB)
+#define VIRT_FLASH0_BASE    VIRT_BIOS_BASE
+#define VIRT_FLASH0_SIZE    VIRT_BIOS_SIZE


Xianglai,

If there are two flash, what is size for flash0 16M (VIRT_FLASH1_BASE 
- VIRT_FLASH0_BASE) or 4M (VIRT_BIOS_SIZE) ?


IIRC it should be 16M (VIRT_FLASH1_BASE -  VIRT_FLASH0_BASE).

Regards
Bibo Mao


I referred to Philippe's last review suggestion:


https://lore.kernel.org/qemu-devel/b62401b2-3a12-e89d-6953-b40dd170b...@linaro.org/

> @@ -20,6 +21,9 @@ > #define VIRT_FWCFG_BASE 0x1e02UL > #define VIRT_BIOS_BASE 
0x1c00UL > #define VIRT_BIOS_SIZE (4 * MiB) > +#define 
VIRT_FLASH_SECTOR_SIZE (128 * KiB) > +#define VIRT_FLASH0_BASE 
(VIRT_BIOS_BASE + VIRT_BIOS_SIZE)

Do you really want the flash base addr to depend of the ROM size?
It could be safer/simpler to start with a fixed address, leaving
room for a bigger ROM if you think you might have to use one.


Ok, I think I misunderstood what he meant, I will keep the changes to 
VIRT_FLASH0_BASE


and change the VIRT_BIOS_SIZE to 16MB, like this:

 - #define VIRT_BIOS_SIZE  (4 * MiB)

  +#define VIRT_BIOS_SIZE  (16 * MiB)

  #define VIRT_FLASH_SECTOR_SIZE  (128 * KiB)
-#define VIRT_FLASH_BASE 0x1d00UL
-#define VIRT_FLASH_SIZE (16 * MiB)
+#define VIRT_FLASH0_BASE    VIRT_BIOS_BASE
+#define VIRT_FLASH0_SIZE    VIRT_BIOS_SIZE

+#define VIRT_FLASH1_BASE0x1d00UL
+#define VIRT_FLASH1_SIZE(16 * MiB)

Thanks,

Xianglai




Thanks!

Xianglai



+#define VIRT_FLASH1_BASE 0x1d00UL
+#define VIRT_FLASH1_SIZE    (16 * MiB)
    #define VIRT_LOWMEM_BASE    0
  #define VIRT_LOWMEM_SIZE    0x1000
@@ -49,7 +51,7 @@ struct LoongArchMachineState {
  int  fdt_size;
  DeviceState *platform_bus_dev;
  PCIBus   *pci_bus;
-    PFlashCFI01  *flash;
+    PFlashCFI01  *flash[2];
  MemoryRegion system_iocsr;
  MemoryRegion iocsr_mem;
  AddressSpace as_iocsr;





[PATCH] target/riscv: Enable xtheadsync under user mode

2024-02-03 Thread LIU Zhiwei
According to xtheadsync[1][2] documentation, it can be used in user mode and
the behavior is same with other priviledges.

[1]:https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadsync/sync.adoc
[2]:https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadsync/sync_i.adoc

Signed-off-by: LIU Zhiwei 
---
 target/riscv/insn_trans/trans_xthead.c.inc | 10 --
 1 file changed, 10 deletions(-)

diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index dbb6411239..22488412d4 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -992,7 +992,6 @@ static bool trans_th_sfence_vmas(DisasContext *ctx, 
arg_th_sfence_vmas *a)
 #endif
 }
 
-#ifndef CONFIG_USER_ONLY
 static void gen_th_sync_local(DisasContext *ctx)
 {
 /*
@@ -1003,14 +1002,12 @@ static void gen_th_sync_local(DisasContext *ctx)
 tcg_gen_exit_tb(NULL, 0);
 ctx->base.is_jmp = DISAS_NORETURN;
 }
-#endif
 
 static bool trans_th_sync(DisasContext *ctx, arg_th_sync *a)
 {
 (void) a;
 REQUIRE_XTHEADSYNC(ctx);
 
-#ifndef CONFIG_USER_ONLY
 REQUIRE_PRIV_MSU(ctx);
 
 /*
@@ -1019,9 +1016,6 @@ static bool trans_th_sync(DisasContext *ctx, arg_th_sync 
*a)
 gen_th_sync_local(ctx);
 
 return true;
-#else
-return false;
-#endif
 }
 
 static bool trans_th_sync_i(DisasContext *ctx, arg_th_sync_i *a)
@@ -1029,7 +1023,6 @@ static bool trans_th_sync_i(DisasContext *ctx, 
arg_th_sync_i *a)
 (void) a;
 REQUIRE_XTHEADSYNC(ctx);
 
-#ifndef CONFIG_USER_ONLY
 REQUIRE_PRIV_MSU(ctx);
 
 /*
@@ -1038,9 +1031,6 @@ static bool trans_th_sync_i(DisasContext *ctx, 
arg_th_sync_i *a)
 gen_th_sync_local(ctx);
 
 return true;
-#else
-return false;
-#endif
 }
 
 static bool trans_th_sync_is(DisasContext *ctx, arg_th_sync_is *a)
-- 
2.25.1




Re: [PATCH v11 0/3] gdbstub and TCG plugin improvements

2024-02-03 Thread Akihiko Odaki

On 2024/02/03 22:58, Alex Bennée wrote:

Akihiko Odaki  writes:


On 2024/02/03 20:08, Alex Bennée wrote:

Akihiko Odaki  writes:


This series extracts fixes and refactorings that can be applied
independently from "[PATCH v9 00/23] plugins: Allow to read registers".

The patch "target/riscv: Move MISA limits to class" was replaced with
patch "target/riscv: Move misa_mxl_max to class" since I found instances
may have different misa_ext_mask.

As this is re-based on Alistair's riscv-to-apply.next tree I'll wait
for
this to go through the RiscV trees and then re-base the plugin patches
and dropping the merged riscv patches from my tree.
In the meantime feel free to review:
Message-Id: <20240122145610.413836-1-alex.ben...@linaro.org>
Date: Mon, 22 Jan 2024 14:55:49 +
Subject: [PATCH v3 00/21] plugin updates (register access) for 9.0 (pre-PR?)
From: =?UTF-8?q?Alex=20Benn=C3=A9e?= 
For:
contrib/plugins: extend execlog to track register changes
gdbstub: expose api to find registers
So I can add this to my maintainer omnibus series for the next PR I
send.


I added one trivial comment to: "gdbstub: expose api to find registers"

"contrib/plugins: extend execlog to track register changes" depends on
"plugins: add an API to read registers". The comments for the patch in
the following email are not addressed yet:
https://lore.kernel.org/all/4b2156ed-688d-4617-b52d-200413f01...@daynix.com/


I don't think we need to serialise with the BQL as the structures are
per-CPU (and created on vCPU creation).


qemu_plugin_get_registers() has vcpu parameter, which can refer to a 
different vcpu the caller is on (or the caller may not be in a vcpu 
context at all).




As far as the restructuring we can move it into gdbstub later if there
is a need to. At the moment the structure is just housekeeping for
plugins.


Certainly we can move it later, but adding the code in the plugin 
infrastructure now won't help in that case.




RE: [PATCH v2 3/3] hw: Set virtio-iommu aw-bits default value on pc_q35_9.0 and arm virt

2024-02-03 Thread Duan, Zhenzhong


>-Original Message-
>From: Eric Auger 
>Subject: Re: [PATCH v2 3/3] hw: Set virtio-iommu aw-bits default value on
>pc_q35_9.0 and arm virt
>
>Hi Zhenzhong,
>
>On 2/2/24 07:51, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>>> -Original Message-
>>> From: Eric Auger 
>>> pc_q35_9.0 and arm virt
>>>
>>> Currently the default input range can extend to 64 bits. On x86,
>>> when the virtio-iommu protects vfio devices, the physical iommu
>>> may support only 39 bits. Let's set the default to 39, as done
>>> for the intel-iommu. On ARM we set 48b as a default (matching
>>> SMMUv3 SMMU_IDR5.VAX == 0).
>>>
>>> We use hw_compat_8_2 to handle the compatibility for machines
>>> before 9.0 which used to have a virtio-iommu default input range
>>> of 64 bits.
>>>
>>> Of course if aw-bits is set from the command line, the default
>>> is overriden.
>>>
>>> Signed-off-by: Eric Auger 
>>>
>>> ---
>>>
>>> v1 -> v2:
>>> - set aw-bits to 48b on ARM
>>> - use hw_compat_8_2 to handle the compat for older machines
>>>  which used 64b as a default
>>> ---
>>> hw/arm/virt.c| 6 ++
>>> hw/core/machine.c| 5 -
>>> hw/i386/pc.c | 6 ++
>>> hw/virtio/virtio-iommu.c | 2 +-
>>> 4 files changed, 17 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index e6ead2c5c8..56539f2fc5 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -2718,10 +2718,16 @@ static void
>>> virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>>> } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MD_PCI)) {
>>> virtio_md_pci_pre_plug(VIRTIO_MD_PCI(dev),
>MACHINE(hotplug_dev),
>>> errp);
>>> } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI))
>{
>>> +uint8_t aw_bits = object_property_get_uint(OBJECT(dev),
>>> +   "aw-bits", NULL);
>>> hwaddr db_start = 0, db_end = 0;
>>> QList *reserved_regions;
>>> char *resv_prop_str;
>>>
>>> +if (!aw_bits) {
>>> +qdev_prop_set_uint8(dev, "aw-bits", 48);
>>> +}
>>> +
>>> if (vms->iommu != VIRT_IOMMU_NONE) {
>>> error_setg(errp, "virt machine does not support multiple
>IOMMUs");
>>> return;
>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>> index fb5afdcae4..70ac96954c 100644
>>> --- a/hw/core/machine.c
>>> +++ b/hw/core/machine.c
>>> @@ -30,9 +30,12 @@
>>> #include "exec/confidential-guest-support.h"
>>> #include "hw/virtio/virtio-pci.h"
>>> #include "hw/virtio/virtio-net.h"
>>> +#include "hw/virtio/virtio-iommu.h"
>>> #include "audio/audio.h"
>>>
>>> -GlobalProperty hw_compat_8_2[] = {};
>>> +GlobalProperty hw_compat_8_2[] = {
>>> +{ TYPE_VIRTIO_IOMMU_PCI, "aw-bits", "64" },
>>> +};
>>> const size_t hw_compat_8_2_len = G_N_ELEMENTS(hw_compat_8_2);
>>>
>>> GlobalProperty hw_compat_8_1[] = {
>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>> index 803244e5cc..0e2bcb4840 100644
>>> --- a/hw/i386/pc.c
>>> +++ b/hw/i386/pc.c
>>> @@ -1458,6 +1458,8 @@ static void
>>> pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>>> } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MD_PCI)) {
>>> virtio_md_pci_pre_plug(VIRTIO_MD_PCI(dev),
>MACHINE(hotplug_dev),
>>> errp);
>>> } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI))
>{
>>> +uint8_t aw_bits = object_property_get_uint(OBJECT(dev),
>>> +   "aw-bits", NULL);
>>> /* Declare the APIC range as the reserved MSI region */
>>> char *resv_prop_str = g_strdup_printf("0xfee0:0xfeef:%d",
>>>   VIRTIO_IOMMU_RESV_MEM_T_MSI);
>>> @@ -1466,6 +1468,10 @@ static void
>>> pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>>> qlist_append_str(reserved_regions, resv_prop_str);
>>> qdev_prop_set_array(dev, "reserved-regions", reserved_regions);
>>>
>>> +if (!aw_bits) {
>>> +qdev_prop_set_uint8(dev, "aw-bits", 39);
>>> +}
>>> +
>>> g_free(resv_prop_str);
>>> }
>>>
>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>> index 7870bdbeee..c468e9b13b 100644
>>> --- a/hw/virtio/virtio-iommu.c
>>> +++ b/hw/virtio/virtio-iommu.c
>>> @@ -1529,7 +1529,7 @@ static Property virtio_iommu_properties[] = {
>>> DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus,
>>>  TYPE_PCI_BUS, PCIBus *),
>>> DEFINE_PROP_BOOL("boot-bypass", VirtIOIOMMU, boot_bypass, true),
>>> -DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 64),
>>> +DEFINE_PROP_UINT8("aw-bits", VirtIOIOMMU, aw_bits, 0),
>> Not clear if virtio-iommu support other archs besides x86 and arm.
>> It looks on those archs, aw_bits is default 0 on machine 9.0 above
>> and will fails the check in realize?
>
>At the moment the virtio-iommu only is supported along with q35 and
>arm-virt.
>Only those 

[PATCH 1/4] chardev/parallel: Don't close stdin on inappropriate device

2024-02-03 Thread Markus Armbruster
The __linux__ version of qemu_chr_open_pp_fd() tries to claim the
parport device with a PPCLAIM ioctl().  On success, it stores the file
descriptor in the chardev object, and returns success.  On failure, it
closes the file descriptor, and returns failure.

chardev_new() then passes the Chardev to object_unref().  This duly
calls char_parallel_finalize(), which closes the file descriptor
stored in the chardev object.  Since qemu_chr_open_pp_fd() didn't
store it, it's still zero, so this closes standard input.  Ooopsie.

To demonstate, add a unit test.  With the bug above unfixed, running
this test closes standard input.  char_hotswap_test() happens to run
next.  It opens a socket, duly gets file descriptor 0, and since it
tests for success with > 0 instead of >= 0, it fails.

The test needs to be conditional exactly like the chardev it tests.
Since the condition is rather complicated, steal the solution from the
serial chardev: define HAVE_CHARDEV_PARALLEL in qemu/osdep.h.  This
also permits simplifying chardev/meson.build a bit.

The bug fix is easy enough: store the file descriptor, and leave
closing it to char_parallel_finalize().

Signed-off-by: Markus Armbruster 
---
 include/qemu/osdep.h|  9 -
 chardev/char-parallel.c |  7 +--
 tests/unit/test-char.c  | 21 +
 chardev/meson.build |  4 +---
 4 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index c9692cc314..3b0f3389d3 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -508,11 +508,18 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 
 #ifdef _WIN32
 #define HAVE_CHARDEV_SERIAL 1
-#elif defined(__linux__) || defined(__sun__) || defined(__FreeBSD__)\
+#define HAVE_CHARDEV_PARALLEL 1
+#else
+#if defined(__linux__) || defined(__sun__) || defined(__FreeBSD__)   \
 || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__DragonFly__) \
 || defined(__GLIBC__) || defined(__APPLE__)
 #define HAVE_CHARDEV_SERIAL 1
 #endif
+#if defined(__linux__) || defined(__FreeBSD__) \
+|| defined(__FreeBSD_kernel__) || defined(__DragonFly__)
+#define HAVE_CHARDEV_PARALLEL 1
+#endif
+#endif
 
 #if defined(__HAIKU__)
 #define SIGIO SIGPOLL
diff --git a/chardev/char-parallel.c b/chardev/char-parallel.c
index a5164f975a..78697d7522 100644
--- a/chardev/char-parallel.c
+++ b/chardev/char-parallel.c
@@ -164,13 +164,13 @@ static void qemu_chr_open_pp_fd(Chardev *chr,
 {
 ParallelChardev *drv = PARALLEL_CHARDEV(chr);
 
+drv->fd = fd;
+
 if (ioctl(fd, PPCLAIM) < 0) {
 error_setg_errno(errp, errno, "not a parallel port");
-close(fd);
 return;
 }
 
-drv->fd = fd;
 drv->mode = IEEE1284_MODE_COMPAT;
 }
 #endif /* __linux__ */
@@ -238,6 +238,7 @@ static void qemu_chr_open_pp_fd(Chardev *chr,
 }
 #endif
 
+#ifdef HAVE_CHARDEV_PARALLEL
 static void qmp_chardev_open_parallel(Chardev *chr,
   ChardevBackend *backend,
   bool *be_opened,
@@ -306,3 +307,5 @@ static void register_types(void)
 }
 
 type_init(register_types);
+
+#endif  /* HAVE_CHARDEV_PARALLEL */
diff --git a/tests/unit/test-char.c b/tests/unit/test-char.c
index 649fdf64e1..76946e6f90 100644
--- a/tests/unit/test-char.c
+++ b/tests/unit/test-char.c
@@ -1203,6 +1203,24 @@ static void char_serial_test(void)
 }
 #endif
 
+#if defined(HAVE_CHARDEV_PARALLEL) && !defined(WIN32)
+static void char_parallel_test(void)
+{
+QemuOpts *opts;
+Chardev *chr;
+
+opts = qemu_opts_create(qemu_find_opts("chardev"), "parallel-id",
+1, _abort);
+qemu_opt_set(opts, "backend", "parallel", _abort);
+qemu_opt_set(opts, "path", "/dev/null", _abort);
+
+chr = qemu_chr_new_from_opts(opts, NULL, NULL);
+g_assert_null(chr);
+
+qemu_opts_del(opts);
+}
+#endif
+
 #ifndef _WIN32
 static void char_file_fifo_test(void)
 {
@@ -1544,6 +1562,9 @@ int main(int argc, char **argv)
 g_test_add_func("/char/udp", char_udp_test);
 #if defined(HAVE_CHARDEV_SERIAL) && !defined(WIN32)
 g_test_add_func("/char/serial", char_serial_test);
+#endif
+#if defined(HAVE_CHARDEV_PARALLEL) && !defined(WIN32)
+g_test_add_func("/char/parallel", char_parallel_test);
 #endif
 g_test_add_func("/char/hotswap", char_hotswap_test);
 g_test_add_func("/char/websocket", char_websock_test);
diff --git a/chardev/meson.build b/chardev/meson.build
index c80337d15f..b39028b7b0 100644
--- a/chardev/meson.build
+++ b/chardev/meson.build
@@ -21,11 +21,9 @@ if host_os == 'windows'
 else
   chardev_ss.add(files(
   'char-fd.c',
+'char-parallel.c',
   'char-pty.c',
 ), util)
-  if host_os in ['linux', 'gnu/kfreebsd', 'freebsd', 'dragonfly']
-chardev_ss.add(files('char-parallel.c'))
-  endif
 endif
 
 chardev_ss = chardev_ss.apply({})
-- 
2.43.0




[PATCH 4/4] qapi/char: Deprecate backend type "memory"

2024-02-03 Thread Markus Armbruster
It's an alias for "ringbuf" we kept for backward compatibility; see
commit 3a1da42eb35 (qapi: Rename ChardevBackend member "memory" to
"ringbuf").  Deprecation is long overdue.

Signed-off-by: Markus Armbruster 
---
 docs/about/deprecated.rst | 8 
 qapi/char.json| 8 +---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index d4492b9460..ae1a520c26 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -371,6 +371,14 @@ Specifying the iSCSI password in plain text on the command 
line using the
 used instead, to refer to a ``--object secret...`` instance that provides
 a password via a file, or encrypted.
 
+Character device options
+
+
+Backend ``memory`` (since 9.0)
+^^
+
+``memory`` is a deprecated synonym for ``ringbuf``.
+
 CPU device properties
 '
 
diff --git a/qapi/char.json b/qapi/char.json
index 2d74e66746..75a7e057f0 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -468,6 +468,10 @@
 #
 # @memory: Since 1.5
 #
+# Features:
+#
+# @deprecated: Member @memory is deprecated.  Use @ringbuf instead.
+#
 # Since: 1.4
 ##
 { 'enum': 'ChardevBackendKind',
@@ -492,8 +496,7 @@
 { 'name': 'dbus', 'if': 'CONFIG_DBUS_DISPLAY' },
 'vc',
 'ringbuf',
-# next one is just for compatibility
-'memory' ] }
+{ 'name': 'memory', 'features': [ 'deprecated' ] } ] }
 
 ##
 # @ChardevFileWrapper:
@@ -642,7 +645,6 @@
   'if': 'CONFIG_DBUS_DISPLAY' },
 'vc': 'ChardevVCWrapper',
 'ringbuf': 'ChardevRingbufWrapper',
-# next one is just for compatibility
 'memory': 'ChardevRingbufWrapper' } }
 
 ##
-- 
2.43.0




[PATCH 2/4] tests/unit/test-char: Fix qemu_socket(), make_udp_socket() check

2024-02-03 Thread Markus Armbruster
qemu_socket() and make_udp_socket() return a file descriptor on
success, -1 on failure.  The check misinterprets 0 as failure.  Fix
that.

Signed-off-by: Markus Armbruster 
---
 tests/unit/test-char.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/unit/test-char.c b/tests/unit/test-char.c
index 76946e6f90..e3b783c06b 100644
--- a/tests/unit/test-char.c
+++ b/tests/unit/test-char.c
@@ -556,7 +556,7 @@ static int make_udp_socket(int *port)
 socklen_t alen = sizeof(addr);
 int ret, sock = qemu_socket(PF_INET, SOCK_DGRAM, 0);
 
-g_assert_cmpint(sock, >, 0);
+g_assert_cmpint(sock, >=, 0);
 addr.sin_family = AF_INET ;
 addr.sin_addr.s_addr = htonl(INADDR_ANY);
 addr.sin_port = 0;
@@ -1401,7 +1401,7 @@ static void char_hotswap_test(void)
 
 int port;
 int sock = make_udp_socket();
-g_assert_cmpint(sock, >, 0);
+g_assert_cmpint(sock, >=, 0);
 
 chr_args = g_strdup_printf("udp:127.0.0.1:%d", port);
 
-- 
2.43.0




[PATCH 3/4] qapi/char: Make backend types properly conditional

2024-02-03 Thread Markus Armbruster
Character backends are actually QOM types.  When a backend's
compile-time conditional QOM type is not compiled in, creation fails
with "'FOO' is not a valid char driver name".  Okay, except
introspecting chardev-add with query-qmp-schema doesn't work then: the
backend type is there even though the QOM type isn't.

A management application can work around this issue by using
qom-list-types instead.

Fix the issue anyway: add the conditionals to the QAPI schema.

Signed-off-by: Markus Armbruster 
---
 qapi/char.json | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/qapi/char.json b/qapi/char.json
index 6c6ad3b10c..2d74e66746 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -472,8 +472,8 @@
 ##
 { 'enum': 'ChardevBackendKind',
   'data': [ 'file',
-'serial',
-'parallel',
+{ 'name': 'serial', 'if': 'HAVE_CHARDEV_SERIAL' },
+{ 'name': 'parallel', 'if': 'HAVE_CHARDEV_PARALLEL' },
 'pipe',
 'socket',
 'udp',
@@ -482,10 +482,10 @@
 'mux',
 'msmouse',
 'wctablet',
-'braille',
+{ 'name': 'braille', 'if': 'CONFIG_BRLAPI' },
 'testdev',
 'stdio',
-'console',
+{ 'name': 'console', 'if': 'CONFIG_WIN32' },
 { 'name': 'spicevmc', 'if': 'CONFIG_SPICE' },
 { 'name': 'spiceport', 'if': 'CONFIG_SPICE' },
 { 'name': 'qemu-vdagent', 'if': 'CONFIG_SPICE_PROTOCOL' },
@@ -614,8 +614,10 @@
   'base': { 'type': 'ChardevBackendKind' },
   'discriminator': 'type',
   'data': { 'file': 'ChardevFileWrapper',
-'serial': 'ChardevHostdevWrapper',
-'parallel': 'ChardevHostdevWrapper',
+'serial': { 'type': 'ChardevHostdevWrapper',
+'if': 'HAVE_CHARDEV_SERIAL' },
+'parallel': { 'type': 'ChardevHostdevWrapper',
+  'if': 'HAVE_CHARDEV_PARALLEL' },
 'pipe': 'ChardevHostdevWrapper',
 'socket': 'ChardevSocketWrapper',
 'udp': 'ChardevUdpWrapper',
@@ -624,10 +626,12 @@
 'mux': 'ChardevMuxWrapper',
 'msmouse': 'ChardevCommonWrapper',
 'wctablet': 'ChardevCommonWrapper',
-'braille': 'ChardevCommonWrapper',
+'braille': { 'type': 'ChardevCommonWrapper',
+ 'if': 'CONFIG_BRLAPI' },
 'testdev': 'ChardevCommonWrapper',
 'stdio': 'ChardevStdioWrapper',
-'console': 'ChardevCommonWrapper',
+'console': { 'type': 'ChardevCommonWrapper',
+ 'if': 'CONFIG_WIN32' },
 'spicevmc': { 'type': 'ChardevSpiceChannelWrapper',
   'if': 'CONFIG_SPICE' },
 'spiceport': { 'type': 'ChardevSpicePortWrapper',
-- 
2.43.0




[PATCH 0/4] char: Minor fixes, and a tighter QAPI schema

2024-02-03 Thread Markus Armbruster
Markus Armbruster (4):
  chardev/parallel: Don't close stdin on inappropriate device
  tests/unit/test-char: Fix qemu_socket(), make_udp_socket() check
  qapi/char: Make backend types properly conditional
  qapi/char: Deprecate backend type "memory"

 docs/about/deprecated.rst |  8 
 qapi/char.json| 28 +---
 include/qemu/osdep.h  |  9 -
 chardev/char-parallel.c   |  7 +--
 tests/unit/test-char.c| 25 +++--
 chardev/meson.build   |  4 +---
 6 files changed, 62 insertions(+), 19 deletions(-)

-- 
2.43.0




Re: [PATCH 3/3] monitor: only run coroutine commands in qemu_aio_context

2024-02-03 Thread Michael Tokarev

16.01.2024 22:00, Stefan Hajnoczi пишет:

monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
polled during nested event loops. The coroutine currently reschedules
itself in the main loop's qemu_aio_context AioContext, which is polled
during nested event loops. One known problem is that QMP device-add
calls drain_call_rcu(), which temporarily drops the BQL, leading to all
sorts of havoc like other vCPU threads re-entering device emulation code
while another vCPU thread is waiting in device emulation code with
aio_poll().

Paolo Bonzini suggested running non-coroutine QMP handlers in the
iohandler AioContext. This avoids trouble with nested event loops. His
original idea was to move coroutine rescheduling to
monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
because we don't know if the QMP handler needs to run in coroutine
context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
been nicer since it's associated with the monitor implementation and not
as general as qmp_dispatch(), which is also used by qemu-ga.

A number of qemu-iotests need updated .out files because the order of
QMP events vs QMP responses has changed.

Solves Issue #1933.

Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use drain_call_rcu in 
in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Buglink: https://issues.redhat.com/browse/RHEL-17369
Signed-off-by: Stefan Hajnoczi 


This change broke something in 7.2. I'm still debugging it, will
come with a follow-up once some more details are found, I'll also
check current master with and without this commit.

The prob happens with multiple suspend-resume cycles, - with this
change applied, guest does not work as expected after *second*
suspend-resume.

JFYI for now.

Thanks,

/mjt



[RFC 4/6] target/i386: Add support for Intel Thread Director feature

2024-02-03 Thread Zhao Liu
From: Zhao Liu 

Intel Thread Director (ITD) is the extension of HFI, and it extends
the HFI to provide performance and energy efficiency data for advanced
classes of instructions [1].

>From Alder Lake, Intel's client products support ITD, and this feature
can be used in VM to optimize scheduling on hybrid architectures.

Like HFI, ITD virtualization also has the same topology limitations
(only 1 die and 1 socket) because ITD's virtualization support is based
on HFI.

In order to avoid potential contention problems caused by multiple
virtual-packages/dies, add the following restrictions to the ITD feature
bit:

1. Mark ITD as no_autoenable_flags and it won't be enabled by default.
2. ITD can't be enabled for the case with multiple packages/dies.

ITD feature depends on HFI, so also add its dependency.

[1]: SDM, vol. 3B, section 15.6 HARDWARE FEEDBACK INTERFACE AND INTEL
 THREAD DIRECTOR

Tested-by: Yanting Jiang 
Co-developed-by: Zhuocheng Ding 
Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
---
 target/i386/cpu.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e3eb361436c9..55287d0a3e73 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1118,17 +1118,18 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, "hfi",
-NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, "itd",
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 },
 .cpuid = { .eax = 6, .reg = R_EAX, },
 .tcg_features = TCG_6_EAX_FEATURES,
 /*
- * PTS and HFI shouldn't be enabled by default since they have
+ * PTS, HFI and ITD shouldn't be enabled by default since they have
  * requirement for cpu topology.
  */
-.no_autoenable_flags = CPUID_6_EAX_PTS | CPUID_6_EAX_HFI,
+.no_autoenable_flags = CPUID_6_EAX_PTS | CPUID_6_EAX_HFI |
+   CPUID_6_EAX_ITD,
 },
 [FEAT_XSAVE_XCR0_LO] = {
 .type = CPUID_FEATURE_WORD,
@@ -1569,6 +1570,10 @@ static FeatureDep feature_dependencies[] = {
 .from = { FEAT_6_EAX,   CPUID_6_EAX_PTS },
 .to = { FEAT_6_EAX, CPUID_6_EAX_HFI },
 },
+{
+.from = { FEAT_6_EAX,   CPUID_6_EAX_HFI },
+.to = { FEAT_6_EAX, CPUID_6_EAX_ITD },
+},
 };
 
 typedef struct X86RegisterInfo32 {
@@ -7468,10 +7473,10 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 return;
 }
 
-if (env->features[FEAT_6_EAX] & CPUID_6_EAX_HFI &&
+if (env->features[FEAT_6_EAX] & (CPUID_6_EAX_HFI | CPUID_6_EAX_ITD) &&
 (ms->smp.dies > 1 || ms->smp.sockets > 1)) {
 error_setg(errp,
-   "HFI currently only supports die/package, "
+   "HFI/ITD currently only supports die/package, "
"please set by \"-smp ...,sockets=1,dies=1\"");
 return;
 }
-- 
2.34.1




[RFC 5/6] target/i386: Add support for HRESET feature

2024-02-03 Thread Zhao Liu
From: Zhuocheng Ding 

HRESET provides an HRESET instruction to reset the ITD related history
accumulated on the current logical processor it is executing on [1].

HRESET feature not only needs to have the feature bit of 0x07.0x01.eax
[bit 22] in the CPUID, but also the associated 0x20 leaf, thus, we also
fill HRESET related info provided by KVM into Guest's 0x20 leaf.

Because currently HRESET is only used to reset ITD's history and ITD has
been marked as no_autoenable_flags, mark the HRESET feature bit as
no_autoenable_flags, too.

Additionally, add MSR_IA32_HW_HRESET_ENABLE save/load support since it's
emulated in KVM. This MSR is used to control the enabling of ITD's
history reset.

[1]: SDM, vol. 3B, section 15.6.11 Logical Processor Scope History

Tested-by: Yanting Jiang 
Co-developed-by: Zhuocheng Ding 
Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
---
 target/i386/cpu.c | 28 +++-
 target/i386/cpu.h |  6 ++
 target/i386/kvm/kvm.c | 14 ++
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 55287d0a3e73..3b26b471b861 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -966,7 +966,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 NULL, NULL, "fzrm", "fsrs",
 "fsrc", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
-NULL, "amx-fp16", NULL, "avx-ifma",
+NULL, "amx-fp16", "hreset", "avx-ifma",
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 },
@@ -976,6 +976,11 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .reg = R_EAX,
 },
 .tcg_features = TCG_7_1_EAX_FEATURES,
+/*
+ * Currently HRESET is only used for ITD history reset. ITD is not
+ * autoenable, so also don't enable HRESET by default.
+ */
+.no_autoenable_flags = CPUID_7_1_EAX_HRESET,
 },
 [FEAT_7_1_EDX] = {
 .type = CPUID_FEATURE_WORD,
@@ -6502,6 +6507,22 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 }
 break;
 }
+case 0x20: {
+/* Processor History Reset */
+if (kvm_enabled() &&
+env->features[FEAT_7_1_EAX] & CPUID_7_1_EAX_HRESET) {
+*eax = kvm_arch_get_supported_cpuid(cs->kvm_state, 0x20,
+count, R_EAX);
+*ebx = kvm_arch_get_supported_cpuid(cs->kvm_state, 0x20,
+count, R_EBX);
+} else {
+*eax = 0;
+*ebx = 0;
+}
+*ecx = 0;
+*edx = 0;
+break;
+}
 case 0x4000:
 /*
  * CPUID code in kvm_arch_init_vcpu() ignores stuff
@@ -7147,6 +7168,11 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
 if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_SGX) {
 x86_cpu_adjust_level(cpu, >cpuid_min_level, 0x12);
 }
+
+/* HRESET requires CPUID[0x20] */
+if (env->features[FEAT_7_1_EAX] & CPUID_7_1_EAX_HRESET) {
+x86_cpu_adjust_level(cpu, >cpuid_min_level, 0x20);
+}
 }
 
 /* Set cpuid_*level* based on cpuid_min_*level, if not explicitly set */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index b54a2ccd6a6e..a68c9d8a8660 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -535,6 +535,7 @@ typedef enum X86Seg {
 
 #define MSR_IA32_HW_FEEDBACK_CONFIG 0x17d0
 #define MSR_IA32_HW_FEEDBACK_PTR0x17d1
+#define MSR_IA32_HW_HRESET_ENABLE   0x17da
 
 #define MSR_IA32_VMX_BASIC  0x0480
 #define MSR_IA32_VMX_PINBASED_CTLS  0x0481
@@ -933,6 +934,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_1_EAX_FSRC  (1U << 12)
 /* Support Tile Computational Operations on FP16 Numbers */
 #define CPUID_7_1_EAX_AMX_FP16  (1U << 21)
+/* HISTORY RESET */
+#define CPUID_7_1_EAX_HRESET(1U << 22)
 /* Support for VPMADD52[H,L]UQ */
 #define CPUID_7_1_EAX_AVX_IFMA  (1U << 23)
 
@@ -1786,6 +1789,9 @@ typedef struct CPUArchState {
 uint64_t hfi_config;
 uint64_t hfi_ptr;
 
+/* Per-VCPU HRESET MSR */
+uint64_t hreset_enable;
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 694aa20afc67..e490126f23ca 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -141,6 +141,7 @@ static bool has_msr_pkrs;
 static bool has_msr_therm;
 static bool has_msr_pkg_therm;
 static bool has_msr_hfi;
+static bool has_msr_hreset;
 
 static uint32_t has_architectural_pmu_version;
 static uint32_t num_architectural_pmu_gp_counters;
@@ -2471,6 +2472,9 @@ static int kvm_get_supported_msrs(KVMState *s)
 case MSR_IA32_HW_FEEDBACK_PTR:
 has_msr_hfi = true;
 

[RFC 3/6] target/i386: Add support for Hardware Feedback Interface feature

2024-02-03 Thread Zhao Liu
From: Zhuocheng Ding 

Hardware Feedback Interface (HFI) is a basic dependency of ITD.

HFI is the feature to allow hardware to provide guidance to the
Operating System (OS) scheduler to perform optimal workload scheduling
through a hardware feedback interface structure in memory [1]. Currently
in virtualization scenario it is used to help enable ITD.

HFI feature provides the basic HFI information in CPUID.0x06:
* 0x06.eax[bit 19]: HFI feature bit
* 0x06.ecx[bits 08-15]: Number of HFI/ITD supported classes
* 0x06.edx[bits 00-07]: Bitmap of supported HFI capabilities
* 0x06.edx[bits 08-11]: Enumerates the size of the HFI table in number
of 4 KB pages
* 0x06.edx[bits 16-31]: HFI table index of processor

Here the basic information is generated by KVM based on the virtual HFI
table that can be emulated, and QEMU needs to specify the HFI table
index for each vCPU.

HFI feature also provides 2 package level MSRs:
MSR_IA32_HW_FEEDBACK_CONFIG and MSR_IA32_HW_FEEDBACK_PTR.

They're emulated in KVM, but currently KVM hasn't supported msr
topology.

Thus, like PTS MSRs, the emulation of these 2 package-level HFI MSRs are
only supported at the whole VM-level, and all vCPUs share these two
MSRs, so that the emulation of these two MSRs does not distinguish
between the different virtual-packages.

And HFI driver maintains per die HFI instances, so this can also cause
problems with access to HFI MSRs when multiple virtual-dies exist.

In order to avoid potential contention problems caused by multiple
virtual-packages/dies, add the following restrictions to the HFI feature
bit:

1. Mark HFI as no_autoenable_flags and it won't be enabled by default.
2. HFI can't be enabled for the case with multiple packages/dies.
3. HFI can't be enabled if ITD is not set for Guest, since currently HFI
   is only used to help enable ITD in virtualization scenario.

HFI feature depends on ACPI, TM and PTS, also add their dependencies.

Additionally, add save/load support for 2 HFI related MSRs.

[1]: SDM, vol. 3B, section 15.6 HARDWARE FEEDBACK INTERFACE AND INTEL
 THREAD DIRECTOR

Tested-by: Yanting Jiang 
Co-developed-by: Zhuocheng Ding 
Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
---
 target/i386/cpu.c | 49 ++-
 target/i386/cpu.h |  8 ++-
 target/i386/kvm/kvm.c | 21 +++
 3 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e772d35d9403..e3eb361436c9 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1117,7 +1117,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 NULL, NULL, "pts", NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
-NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, "hfi",
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -1125,10 +1125,10 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .cpuid = { .eax = 6, .reg = R_EAX, },
 .tcg_features = TCG_6_EAX_FEATURES,
 /*
- * PTS shouldn't be enabled by default since it has
+ * PTS and HFI shouldn't be enabled by default since they have
  * requirement for cpu topology.
  */
-.no_autoenable_flags = CPUID_6_EAX_PTS,
+.no_autoenable_flags = CPUID_6_EAX_PTS | CPUID_6_EAX_HFI,
 },
 [FEAT_XSAVE_XCR0_LO] = {
 .type = CPUID_FEATURE_WORD,
@@ -1557,6 +1557,18 @@ static FeatureDep feature_dependencies[] = {
 .from = { FEAT_VMX_SECONDARY_CTLS,  
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE },
 .to = { FEAT_7_0_ECX,   CPUID_7_0_ECX_WAITPKG },
 },
+{
+.from = { FEAT_1_EDX,   CPUID_ACPI },
+.to = { FEAT_6_EAX, CPUID_6_EAX_HFI },
+},
+{
+.from = { FEAT_1_EDX,   CPUID_TM },
+.to = { FEAT_6_EAX, CPUID_6_EAX_HFI },
+},
+{
+.from = { FEAT_6_EAX,   CPUID_6_EAX_PTS },
+.to = { FEAT_6_EAX, CPUID_6_EAX_HFI },
+},
 };
 
 typedef struct X86RegisterInfo32 {
@@ -6158,6 +6170,25 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *ebx = 0;
 *ecx = 0;
 *edx = 0;
+/*
+ * KVM only supports HFI virtualization with ITD, so
+ * set the HFI information only if the ITD is enabled.
+ */
+if (*eax & CPUID_6_EAX_ITD) {
+if (kvm_enabled()) {
+*ecx = kvm_arch_get_supported_cpuid(cs->kvm_state, 0x6,
+count, R_ECX);
+/*
+ * No need to adjust the number of pages since the default
+ * 1 4KB page is enough to hold the HFI entries of max_cpus
+ * (1024) supported by i386 machine (q35).
+ */
+*edx 

[RFC 1/6] target/i386: Add support for save/load of ACPI thermal MSRs

2024-02-03 Thread Zhao Liu
From: Zhuocheng Ding 

The CPUID_ACPI (CPUID.0x01.edx[bit 22]) feature bit has been
introduced as the TCG feature. Currently, based on KVM's ACPI emulation,
add related ACPI support in QEMU.

>From SDM [1], ACPI feature means:

"The ACPI flag (bit 22) of the CPUID feature flags indicates the
presence of the IA32_THERM_STATUS, IA32_THERM_INTERRUPT,
IA32_CLOCK_MODULATION MSRs, and the xAPIC thermal LVT entry."

With the emulation of ACPI in KVM, add the support for save/load of ACPI
thermal MSRs: MSR_IA32_THERM_CONTROL, MSR_IA32_THERM_INTERRUPT and
MSR_IA32_THERM_STATUS.

[1]: SDM, vol. 3B, section 15.8.4.1, Detection of Software Controlled
 Clock Modulation Extension.

Tested-by: Yanting Jiang 
Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
---
 target/i386/cpu.h |  9 +
 target/i386/kvm/kvm.c | 25 +
 2 files changed, 34 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7f0786e8b98f..e453b3f010e2 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -526,6 +526,10 @@ typedef enum X86Seg {
 #define MSR_IA32_XSS0x0da0
 #define MSR_IA32_UMWAIT_CONTROL 0xe1
 
+#define MSR_IA32_THERM_CONTROL  0x019a
+#define MSR_IA32_THERM_INTERRUPT0x019b
+#define MSR_IA32_THERM_STATUS   0x019c
+
 #define MSR_IA32_VMX_BASIC  0x0480
 #define MSR_IA32_VMX_PINBASED_CTLS  0x0481
 #define MSR_IA32_VMX_PROCBASED_CTLS 0x0482
@@ -1758,6 +1762,11 @@ typedef struct CPUArchState {
 uint64_t msr_lbr_depth;
 LBREntry lbr_records[ARCH_LBR_NR_ENTRIES];
 
+/* Per-VCPU thermal MSRs */
+uint64_t therm_control;
+uint64_t therm_interrupt;
+uint64_t therm_status;
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 76a66246eb72..3bf57b35bfcd 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -138,6 +138,7 @@ static bool has_msr_ucode_rev;
 static bool has_msr_vmx_procbased_ctls2;
 static bool has_msr_perf_capabs;
 static bool has_msr_pkrs;
+static bool has_msr_therm;
 
 static uint32_t has_architectural_pmu_version;
 static uint32_t num_architectural_pmu_gp_counters;
@@ -2455,6 +2456,11 @@ static int kvm_get_supported_msrs(KVMState *s)
 case MSR_IA32_PKRS:
 has_msr_pkrs = true;
 break;
+case MSR_IA32_THERM_CONTROL:
+case MSR_IA32_THERM_INTERRUPT:
+case MSR_IA32_THERM_STATUS:
+has_msr_therm = true;
+break;
 }
 }
 }
@@ -3302,6 +3308,11 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (has_msr_virt_ssbd) {
 kvm_msr_entry_add(cpu, MSR_VIRT_SSBD, env->virt_ssbd);
 }
+if (has_msr_therm) {
+kvm_msr_entry_add(cpu, MSR_IA32_THERM_CONTROL, env->therm_control);
+kvm_msr_entry_add(cpu, MSR_IA32_THERM_INTERRUPT, env->therm_interrupt);
+kvm_msr_entry_add(cpu, MSR_IA32_THERM_STATUS, env->therm_status);
+}
 
 #ifdef TARGET_X86_64
 if (lm_capable_kernel) {
@@ -3774,6 +3785,11 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, MSR_IA32_TSC, 0);
 env->tsc_valid = !runstate_is_running();
 }
+if (has_msr_therm) {
+kvm_msr_entry_add(cpu, MSR_IA32_THERM_CONTROL, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_THERM_INTERRUPT, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_THERM_STATUS, 0);
+}
 
 #ifdef TARGET_X86_64
 if (lm_capable_kernel) {
@@ -4255,6 +4271,15 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_ARCH_LBR_INFO_0 ... MSR_ARCH_LBR_INFO_0 + 31:
 env->lbr_records[index - MSR_ARCH_LBR_INFO_0].info = msrs[i].data;
 break;
+case MSR_IA32_THERM_CONTROL:
+env->therm_control = msrs[i].data;
+break;
+case MSR_IA32_THERM_INTERRUPT:
+env->therm_interrupt = msrs[i].data;
+break;
+case MSR_IA32_THERM_STATUS:
+env->therm_status = msrs[i].data;
+break;
 }
 }
 
-- 
2.34.1




[RFC 6/6] i386: Add a new property to set ITD related feature bits for Guest

2024-02-03 Thread Zhao Liu
From: Zhao Liu 

The property enable-itd will be used to set ITD related feature bits
for Guest, which includes PTS, HFI, ITD and HRESET.

Now PTS, HFI, ITD and HRESET are marked as no_autoenable_flags, since
PTS, HFI and ITD have additional restrictions on CPU topology, and
HRESET is only used in ITD case. If user wants to enable ITD for Guest,
he need to specify PTS, HFI, ITD and HRESET explicitly in the -cpu
command.

Thus it's necessary to introduce "-cpu enable-itd" to help set these
feature bits.

Tested-by: Yanting Jiang 
Signed-off-by: Zhao Liu 
---
 target/i386/cpu.c | 20 +++-
 target/i386/cpu.h |  3 +++
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3b26b471b861..070f7ff43a1b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7304,6 +7304,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
  */
 x86_cpu_hyperv_realize(cpu);
 
+if (cpu->enable_itd) {
+env->features[FEAT_6_EAX] |= CPUID_6_EAX_PTS | CPUID_6_EAX_HFI |
+ CPUID_6_EAX_ITD;
+env->features[FEAT_7_1_EAX] |= CPUID_7_1_EAX_HRESET;
+}
+
 x86_cpu_expand_features(cpu, _err);
 if (local_err) {
 goto out;
@@ -7494,22 +7500,25 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 
 if (env->features[FEAT_6_EAX] & CPUID_6_EAX_PTS && ms->smp.sockets > 1) {
 error_setg(errp,
-   "PTS currently only supports 1 package, "
-   "please set by \"-smp ...,sockets=1\"");
+   "%s currently only supports 1 package, "
+   "please set by \"-smp ...,sockets=1\"",
+   cpu->enable_itd ? "enable-itd" : "PTS");
 return;
 }
 
 if (env->features[FEAT_6_EAX] & (CPUID_6_EAX_HFI | CPUID_6_EAX_ITD) &&
 (ms->smp.dies > 1 || ms->smp.sockets > 1)) {
 error_setg(errp,
-   "HFI/ITD currently only supports die/package, "
-   "please set by \"-smp ...,sockets=1,dies=1\"");
+   "%s currently only supports 1 die/package, "
+   "please set by \"-smp ...,sockets=1,dies=1\"",
+   cpu->enable_itd ? "enable-itd" : "HFI/ITD");
 return;
 }
 
 if (env->features[FEAT_6_EAX] & (CPUID_6_EAX_PTS | CPUID_6_EAX_HFI) &&
 !(env->features[FEAT_6_EAX] & CPUID_6_EAX_ITD)) {
-error_setg(errp,
+error_setg(errp, "%s", cpu->enable_itd ?
+   "Host doesn't support ITD" :
"In the absence of ITD, Guest does "
"not need PTS/HFI");
 return;
@@ -8003,6 +8012,7 @@ static Property x86_cpu_properties[] = {
  false),
 DEFINE_PROP_BOOL("x-intel-pt-auto-level", X86CPU, intel_pt_auto_level,
  true),
+DEFINE_PROP_BOOL("enable-itd", X86CPU, enable_itd, false),
 DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index a68c9d8a8660..009ec66dead0 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2071,6 +2071,9 @@ struct ArchCPU {
 int32_t hv_max_vps;
 
 bool xen_vapic;
+
+/* Set ITD and related feature bits (PTS, HFI and HRESET) for Guest. */
+bool enable_itd;
 };
 
 typedef struct X86CPUModel X86CPUModel;
-- 
2.34.1




[RFC 0/6] Intel Thread Director Virtualization Support in QEMU

2024-02-03 Thread Zhao Liu
From: Zhao Liu 

Hi list,

This is our refreshed RFC to support our ITD virtualization patch
series [1] in KVM, and bases on bd2e12310b18 ("Merge tag
'qga-pull-2024-01-30' of https://github.com/kostyanf14/qemu into
staging").

ITD is Intel's client specific feature to optimize scheduling on Intel
hybrid platforms. Though this feature depends on hybrid topology
details, in our parctice on Win11 Guest, ITD works with hyrbid topolohy
and CPU affinity can achieve the most performance improvement in Win11
Guest (for example, on i9-13900K, up to 14%+ improvement on
3DMARK). More data or details, can be found in [1]. Thus, the ITD for
Win11 is also a typical use case of hybrid topology.


Welcome your feedback!


1. Background and Motivation


ITD allows the hardware to provide scheduling hints to the OS to help
optimize scheduling performance, and under the Intel hybrid
architecture, since Core and Atom have different capabilities
(performance, energy effency, etc.),  scheduling based on hardware
hints can take full advantage of this hybrid architecture. This is also
the most ideal scheduling model for intel hybrid architecture.

Therefore, we want to virtualize the ITD feature so that ITD can benefit
performance of the virtual machines on the hybrid machines as well.

Currently, our ITD virtualization is a software virtualization solution.


2. Introduction to HFI and ITD
==

Intel provides Hardware Feedback Interface (HFI) feature to allow
hardware to provide guidance to the OS scheduler to perform optimal
workload scheduling through a hardware feedback interface structure in
memory [2]. This hfi structure is called HFI table.

As for now, the guidance includes performance and energy enficency hints,
and it could update via thermal interrupt as the actual operating
conditions of the processor change during run time.

And Intel Thread Director (ITD) feature extends the HFI to provide
performance and energy efficiency data for advanced classes of
instructions.

The virtual HFI table is maintained in KVM, and for QEMU, we just need
to handle HFI/ITD/HRESET (and their dependent features: ACPI, TM and
PTS) related CPUIDs and MSRs.


3. Package level MSRs handling
==

PTS, HFI and ITD are all have package level features, such as package
level MSRs and package level HFI tables. But since KVM hasn't
support msr-topology and it just handle these package-level MSRs and
HFI table at VM level, in order to avoid potential contention problems
caused by multiple virtual-packages, we restrict VMs to be able to
enable PTC/HFI/ITD iff there's only 1 package (and only 1 die for
ITD/HFI).


4. HFI/ITD related info in CPUID


KVM provides some basic HFI info in CPUID.0x06 leaf, which is associated
with the virtual HFI table in KVM.

QEMU should configure HFI table index for each vCPU. Here we set the HFI
table index to vCPU index so that different vCPUs have different HFI
entries to avoid unnecessary competition problems.


5. Compatibility issues
===

HFI is supported in both server (SPR) and client (ADL/RPL/MTL) platform
products while ITD is the client specific feature.

For client platform, ITD (with HFI) could be enabled in Guest to improve
scheduling, but for server platform, HFI (without ITD) is only useful
on Host and Guest doesn't need it.

To simplify the enabling logic and avoid impacting the common topology
of the Guest, we set PTS, HFI, and ITD as feature bits that are not
automatically enabled.

Only when the user actively specifies these features, QEMU will check
and decide whether to enable them based on the topology constraints and
the ITD constraints.


6. New option "enable-itd"


ITD-related features include PTS, HFI, ITD, and HRESET.

To make it easier for users to enable ITD for Guest without specifying
the above feature bits one by one, we provide a new option "enable-itd"
to set the above feature bits for Guest all at once.

"enable-itd" does not guarantee that ITD will be enabled for Guest.
The success of enabling ITD for guest depends on topology constraints,
platform support, etc., which are checked in QEMU.


7. Patch Summary


Patch 1: Add support save/load for ACPI feature related thermal MSRs
 since ACPI feature CPUID has been added in QEMU.
Patch 2: Add support for PTS (package) thermal MSRs and its CPUID
Patch 3: Add support for HFI MSRs and its CPUID
Patch 4: Add support ITD CPUID and MSR_IA32_HW_FEEDBACK_THREAD_CONFIG.
Patch 5: Add support HRESET CPUID and MSR_IA32_HW_HRESET_ENABLE.
Patch 6: Add "enable-itd" to help user set ITD related feature bits.

# 8. References

[1]: KVM RFC: [RFC 00/26] Intel Thread Director Virtualization
 
https://lore.kernel.org/kvm/20240203091214.411862-1-zhao1@linux.intel.com/T/#t
[2]: SDM, vol. 3B, section 15.6 HARDWARE FEEDBACK INTERFACE AND INTEL
 THREAD DIRECTOR


[RFC 2/6] target/i386: Add support for Package Thermal Management feature

2024-02-03 Thread Zhao Liu
From: Zhuocheng Ding 

PTS feature (Package Thermal Management) is a dependency of ITD.

PTS provides 2 package level MSRs: MSR_IA32_PACKAGE_THERM_STATUS and
MSR_IA32_PACKAGE_THERM_INTERRUPT.

They're emulated in KVM, but currently KVM hasn't supported msr
topology.

Thus the emulation of these 2 package-level MSRs are only supported at
the whole VM-level, and all vCPUs share these two MSRs, so that the
emulation of these two MSRs does not distinguish between the different
virtual-packages.

In order to avoid potential contention problems caused by multiple
virtual-packages, add the following restrictions to the PTS feature bit:

1. Mark PTS as no_autoenable_flags and it won't be enabled by default.
2. PTS can't be enabled for the case with multiple packages.
3. PTS can't be enabled if ITD is not set for Guest, since currently PTS
   is only used to help enable ITD in virtualization scenario.

Additionally, add save/load support for 2 PTS related MSRs.

Tested-by: Yanting Jiang 
Co-developed-by: Zhuocheng Ding 
Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
---
 target/i386/cpu.c | 22 +-
 target/i386/cpu.h | 13 +
 target/i386/kvm/kvm.c | 24 
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 03822d9ba8ee..e772d35d9403 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1114,7 +1114,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
 NULL, NULL, "arat", NULL,
-NULL, NULL, NULL, NULL,
+NULL, NULL, "pts", NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -1124,6 +1124,11 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 },
 .cpuid = { .eax = 6, .reg = R_EAX, },
 .tcg_features = TCG_6_EAX_FEATURES,
+/*
+ * PTS shouldn't be enabled by default since it has
+ * requirement for cpu topology.
+ */
+.no_autoenable_flags = CPUID_6_EAX_PTS,
 },
 [FEAT_XSAVE_XCR0_LO] = {
 .type = CPUID_FEATURE_WORD,
@@ -7424,6 +7429,21 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 goto out;
 }
 }
+
+if (env->features[FEAT_6_EAX] & CPUID_6_EAX_PTS && ms->smp.sockets > 1) {
+error_setg(errp,
+   "PTS currently only supports 1 package, "
+   "please set by \"-smp ...,sockets=1\"");
+return;
+}
+
+if (env->features[FEAT_6_EAX] & CPUID_6_EAX_PTS &&
+!(env->features[FEAT_6_EAX] & CPUID_6_EAX_ITD)) {
+error_setg(errp,
+   "In the absence of ITD, Guest does "
+   "not need PTS");
+return;
+}
 #endif
 
 mce_init(cpu);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e453b3f010e2..a8c247b2ef89 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -530,6 +530,9 @@ typedef enum X86Seg {
 #define MSR_IA32_THERM_INTERRUPT0x019b
 #define MSR_IA32_THERM_STATUS   0x019c
 
+#define MSR_IA32_PACKAGE_THERM_STATUS0x01b1
+#define MSR_IA32_PACKAGE_THERM_INTERRUPT 0x01b2
+
 #define MSR_IA32_VMX_BASIC  0x0480
 #define MSR_IA32_VMX_PINBASED_CTLS  0x0481
 #define MSR_IA32_VMX_PROCBASED_CTLS 0x0482
@@ -982,6 +985,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_XSAVE_XSAVES (1U << 3)
 
 #define CPUID_6_EAX_ARAT   (1U << 2)
+#define CPUID_6_EAX_PTS(1U << 6)
+#define CPUID_6_EAX_ITD(1U << 23)
 
 /* CPUID[0x8007].EDX flags: */
 #define CPUID_APM_INVTSC   (1U << 8)
@@ -1767,6 +1772,14 @@ typedef struct CPUArchState {
 uint64_t therm_interrupt;
 uint64_t therm_status;
 
+/*
+ * Although these are package level MSRs, for the PTS feature, we
+ * temporarily limit it to be enabled for only 1 package, so the value
+ * of each vCPU is same and it's enough to support the save/load.
+ */
+uint64_t pkg_therm_interrupt;
+uint64_t pkg_therm_status;
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3bf57b35bfcd..258591535fd5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -139,6 +139,7 @@ static bool has_msr_vmx_procbased_ctls2;
 static bool has_msr_perf_capabs;
 static bool has_msr_pkrs;
 static bool has_msr_therm;
+static bool has_msr_pkg_therm;
 
 static uint32_t has_architectural_pmu_version;
 static uint32_t num_architectural_pmu_gp_counters;
@@ -2461,6 +2462,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 case MSR_IA32_THERM_STATUS:
 has_msr_therm = true;
 break;
+case MSR_IA32_PACKAGE_THERM_STATUS:
+case MSR_IA32_PACKAGE_THERM_INTERRUPT:
+

Re: [RFC 0/6] Intel Thread Director Virtualization Support in QEMU

2024-02-03 Thread Zhao Liu
On Sat, Feb 03, 2024 at 05:30:48PM +0800, Zhao Liu wrote:
> Date: Sat, 3 Feb 2024 17:30:48 +0800
> From: Zhao Liu 
> Subject: [RFC 0/6] Intel Thread Director Virtualization Support in QEMU
> X-Mailer: git-send-email 2.34.1
> 
> From: Zhao Liu 
> 
> Hi list,
> 
> This is our refreshed RFC to support our ITD virtualization patch
> series [1] in KVM, and bases on bd2e12310b18 ("Merge tag
> 'qga-pull-2024-01-30' of https://github.com/kostyanf14/qemu into
> staging").
> 
> ITD is Intel's client specific feature to optimize scheduling on Intel
> hybrid platforms. Though this feature depends on hybrid topology

s/depends/doesn't depend/

Regards,
Zhao

> details, in our parctice on Win11 Guest, ITD works with hyrbid topolohy
> and CPU affinity can achieve the most performance improvement in Win11
> Guest (for example, on i9-13900K, up to 14%+ improvement on
> 3DMARK). More data or details, can be found in [1]. Thus, the ITD for
> Win11 is also a typical use case of hybrid topology.
> 
> 
> Welcome your feedback!
> 
> 
> 1. Background and Motivation
> 
> 
> ITD allows the hardware to provide scheduling hints to the OS to help
> optimize scheduling performance, and under the Intel hybrid
> architecture, since Core and Atom have different capabilities
> (performance, energy effency, etc.),  scheduling based on hardware
> hints can take full advantage of this hybrid architecture. This is also
> the most ideal scheduling model for intel hybrid architecture.
> 
> Therefore, we want to virtualize the ITD feature so that ITD can benefit
> performance of the virtual machines on the hybrid machines as well.
> 
> Currently, our ITD virtualization is a software virtualization solution.
> 
> 
> 2. Introduction to HFI and ITD
> ==
> 
> Intel provides Hardware Feedback Interface (HFI) feature to allow
> hardware to provide guidance to the OS scheduler to perform optimal
> workload scheduling through a hardware feedback interface structure in
> memory [2]. This hfi structure is called HFI table.
> 
> As for now, the guidance includes performance and energy enficency hints,
> and it could update via thermal interrupt as the actual operating
> conditions of the processor change during run time.
> 
> And Intel Thread Director (ITD) feature extends the HFI to provide
> performance and energy efficiency data for advanced classes of
> instructions.
> 
> The virtual HFI table is maintained in KVM, and for QEMU, we just need
> to handle HFI/ITD/HRESET (and their dependent features: ACPI, TM and
> PTS) related CPUIDs and MSRs.
> 
> 
> 3. Package level MSRs handling
> ==
> 
> PTS, HFI and ITD are all have package level features, such as package
> level MSRs and package level HFI tables. But since KVM hasn't
> support msr-topology and it just handle these package-level MSRs and
> HFI table at VM level, in order to avoid potential contention problems
> caused by multiple virtual-packages, we restrict VMs to be able to
> enable PTC/HFI/ITD iff there's only 1 package (and only 1 die for
> ITD/HFI).
> 
> 
> 4. HFI/ITD related info in CPUID
> 
> 
> KVM provides some basic HFI info in CPUID.0x06 leaf, which is associated
> with the virtual HFI table in KVM.
> 
> QEMU should configure HFI table index for each vCPU. Here we set the HFI
> table index to vCPU index so that different vCPUs have different HFI
> entries to avoid unnecessary competition problems.
> 
> 
> 5. Compatibility issues
> ===
> 
> HFI is supported in both server (SPR) and client (ADL/RPL/MTL) platform
> products while ITD is the client specific feature.
> 
> For client platform, ITD (with HFI) could be enabled in Guest to improve
> scheduling, but for server platform, HFI (without ITD) is only useful
> on Host and Guest doesn't need it.
> 
> To simplify the enabling logic and avoid impacting the common topology
> of the Guest, we set PTS, HFI, and ITD as feature bits that are not
> automatically enabled.
> 
> Only when the user actively specifies these features, QEMU will check
> and decide whether to enable them based on the topology constraints and
> the ITD constraints.
> 
> 
> 6. New option "enable-itd"
> 
> 
> ITD-related features include PTS, HFI, ITD, and HRESET.
> 
> To make it easier for users to enable ITD for Guest without specifying
> the above feature bits one by one, we provide a new option "enable-itd"
> to set the above feature bits for Guest all at once.
> 
> "enable-itd" does not guarantee that ITD will be enabled for Guest.
> The success of enabling ITD for guest depends on topology constraints,
> platform support, etc., which are checked in QEMU.
> 
> 
> 7. Patch Summary
> 
> 
> Patch 1: Add support save/load for ACPI feature related thermal MSRs
>  since ACPI feature CPUID has been added in QEMU.
> Patch 2: Add support for PTS (package) thermal MSRs and its CPUID
> Patch 

[PATCH 5/6] pcie_sriov: Reuse SR-IOV VF device instances

2024-02-03 Thread Akihiko Odaki
Disable SR-IOV VF devices by reusing code to power down PCI devices
instead of removing them when the guest requests to disable VFs. This
allows to realize devices and report VF realization errors at PF
realization time.

Signed-off-by: Akihiko Odaki 
---
 docs/pcie_sriov.txt |   8 ++--
 include/hw/pci/pci.h|   2 +-
 include/hw/pci/pci_device.h |   2 +-
 include/hw/pci/pcie_sriov.h |   6 +--
 hw/net/igb.c|  13 --
 hw/nvme/ctrl.c  |  29 +++--
 hw/pci/pci.c|  18 
 hw/pci/pci_host.c   |   4 +-
 hw/pci/pcie.c   |   4 +-
 hw/pci/pcie_sriov.c | 100 
 10 files changed, 97 insertions(+), 89 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index a47aad0bfab0..ab2142807f79 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -52,9 +52,11 @@ setting up a BAR for a VF.
   ...
 
   /* Add and initialize the SR/IOV capability */
-  pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
-   vf_devid, initial_vfs, total_vfs,
-   fun_offset, stride);
+  if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+  vf_devid, initial_vfs, total_vfs,
+  fun_offset, stride, errp)) {
+ return;
+  }
 
   /* Set up individual VF BARs (parameters as for normal BARs) */
   pcie_sriov_pf_init_vf_bar( ... )
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index fa6313aabc43..fae83b9b723c 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -643,6 +643,6 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
 }
 
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
-void pci_set_power(PCIDevice *pci_dev, bool state);
+void pci_set_enabled(PCIDevice *pci_dev, bool state);
 
 #endif
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index 54fa0676abf1..f5aba8ae2675 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -56,7 +56,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 struct PCIDevice {
 DeviceState qdev;
 bool partially_hotplugged;
-bool has_power;
+bool is_enabled;
 
 /* PCI config space */
 uint8_t *config;
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 095fb0c9edf9..d9a39daccac4 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -18,7 +18,6 @@
 struct PCIESriovPF {
 uint16_t num_vfs;   /* Number of virtual functions created */
 uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
-const char *vfname; /* Reference to the device type used for the VFs */
 PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */
 };
 
@@ -27,10 +26,11 @@ struct PCIESriovVF {
 uint16_t vf_number; /* Logical VF number of this function */
 };
 
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 const char *vfname, uint16_t vf_dev_id,
 uint16_t init_vfs, uint16_t total_vfs,
-uint16_t vf_offset, uint16_t vf_stride);
+uint16_t vf_offset, uint16_t vf_stride,
+Error **errp);
 void pcie_sriov_pf_exit(PCIDevice *dev);
 
 /* Set up a VF bar in the SR/IOV bar area */
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 0b5c31a58bba..1079a33d4000 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -447,9 +447,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 
 pcie_ari_init(pci_dev, 0x150);
 
-pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
-IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
-IGB_VF_OFFSET, IGB_VF_STRIDE);
+if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET,
+TYPE_IGBVF, IGB_82576_VF_DEV_ID,
+IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
+IGB_VF_OFFSET, IGB_VF_STRIDE,
+errp)) {
+pcie_cap_exit(pci_dev);
+igb_cleanup_msix(s);
+msi_uninit(pci_dev);
+return;
+}
 
 pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX,
 PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f026245d1e9e..a356a452ad47 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8031,7 +8031,8 @@ static uint64_t nvme_bar_size(unsigned total_queues, 
unsigned total_irqs,
 return bar_size;
 }
 
-static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
+static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
+Error **errp)
 {
 uint16_t vf_dev_id = n->params.use_intel_id ?
  PCI_DEVICE_ID_INTEL_NVME : 

[PATCH 4/6] pcie_sriov: Validate NumVFs

2024-02-03 Thread Akihiko Odaki
The guest may write NumVFs greater than TotalVFs and that can lead
to buffer overflow in VF implementations.

Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization 
(SR/IOV)")
Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index a1fe65f5d801..da209b7f47fd 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -176,6 +176,9 @@ static void register_vfs(PCIDevice *dev)
 
 assert(sriov_cap > 0);
 num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
+if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
+return;
+}
 
 dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
 

-- 
2.43.0




[PATCH 2/6] vfio: Avoid inspecting option QDict for rombar

2024-02-03 Thread Akihiko Odaki
Use pci_rom_bar_explicitly_enabled() to determine if rombar is explicitly
enabled.

Signed-off-by: Akihiko Odaki 
---
 hw/vfio/pci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d7fe06715c4b..44178ac9355f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1010,7 +1010,6 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 {
 uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
 off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
-DeviceState *dev = DEVICE(vdev);
 char *name;
 int fd = vdev->vbasedev.fd;
 
@@ -1044,7 +1043,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 }
 
 if (vfio_opt_rom_in_denylist(vdev)) {
-if (dev->opts && qdict_haskey(dev->opts, "rombar")) {
+if (pci_rom_bar_explicitly_enabled(>pdev)) {
 warn_report("Device at %s is known to cause system instability"
 " issues during option rom execution",
 vdev->vbasedev.name);

-- 
2.43.0




[PATCH 1/6] hw/pci: Determine if rombar is explicitly enabled

2024-02-03 Thread Akihiko Odaki
vfio determines if rombar is explicitly enabled by inspecting QDict.
Inspecting QDict is not nice because QDict is untyped and depends on the
details on the external interface. Add an infrastructure to determine if
rombar is explicitly enabled to hw/pci. PCIDevice::rom_bar is changed to
have -1 by the default to tell rombar is explicitly enabled. It is
consistent with other properties like addr and romsize.

Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pci_device.h | 5 +
 hw/pci/pci.c| 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b273..54fa0676abf1 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -205,6 +205,11 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
 return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
 }
 
+static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev)
+{
+return dev->rom_bar && dev->rom_bar != -1;
+}
+
 uint16_t pci_requester_id(PCIDevice *dev);
 
 /* DMA access functions */
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 76080af580d7..d08548d8ffe9 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -71,7 +71,7 @@ static Property pci_props[] = {
 DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
 DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
 DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, -1),
-DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
+DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, -1),
 DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
 QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
 DEFINE_PROP_BIT("x-pcie-lnksta-dllla", PCIDevice, cap_present,

-- 
2.43.0




Re: [PATCH v11 0/3] gdbstub and TCG plugin improvements

2024-02-03 Thread Akihiko Odaki

On 2024/02/03 20:08, Alex Bennée wrote:

Akihiko Odaki  writes:


This series extracts fixes and refactorings that can be applied
independently from "[PATCH v9 00/23] plugins: Allow to read registers".

The patch "target/riscv: Move MISA limits to class" was replaced with
patch "target/riscv: Move misa_mxl_max to class" since I found instances
may have different misa_ext_mask.


As this is re-based on Alistair's riscv-to-apply.next tree I'll wait for
this to go through the RiscV trees and then re-base the plugin patches
and dropping the merged riscv patches from my tree.

In the meantime feel free to review:

   Message-Id: <20240122145610.413836-1-alex.ben...@linaro.org>
   Date: Mon, 22 Jan 2024 14:55:49 +
   Subject: [PATCH v3 00/21] plugin updates (register access) for 9.0 (pre-PR?)
   From: =?UTF-8?q?Alex=20Benn=C3=A9e?= 

For:

   contrib/plugins: extend execlog to track register changes
   gdbstub: expose api to find registers

So I can add this to my maintainer omnibus series for the next PR I
send.


I added one trivial comment to: "gdbstub: expose api to find registers"

"contrib/plugins: extend execlog to track register changes" depends on 
"plugins: add an API to read registers". The comments for the patch in 
the following email are not addressed yet:

https://lore.kernel.org/all/4b2156ed-688d-4617-b52d-200413f01...@daynix.com/

Please check them out.



Re: [PATCH v3 16/21] gdbstub: expose api to find registers

2024-02-03 Thread Akihiko Odaki

On 2024/01/22 23:56, Alex Bennée wrote:

Expose an internal API to QEMU to return all the registers for a vCPU.
The list containing the details required to called gdb_read_register().

Based-on: <20231025093128.33116-15-akihiko.od...@daynix.com>
Cc: Akihiko Odaki 
Message-Id: <20240103173349.398526-38-alex.ben...@linaro.org>
Signed-off-by: Alex Bennée 

---
v3
   - rm unused api functions left over
---
  include/exec/gdbstub.h | 28 
  gdbstub/gdbstub.c  | 27 ++-
  2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index da9ddfe54c5..eb14b91139b 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -111,6 +111,34 @@ void gdb_feature_builder_end(const GDBFeatureBuilder 
*builder);
   */
  const GDBFeature *gdb_find_static_feature(const char *xmlname);
  
+/**

+ * gdb_read_register() - Read a register associated with a CPU.
+ * @cpu: The CPU associated with the register.
+ * @buf: The buffer that the read register will be appended to.
+ * @reg: The register's number returned by gdb_find_feature_register().
+ *
+ * Return: The number of read bytes.
+ */
+int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
+
+/**
+ * typedef GDBRegDesc - a register description from gdbstub
+ */
+typedef struct {


nit: Add struct name; docs/devel/style.rst says struct has a CamelCase 
name *and* corresponding typedef, though this rule is apparently not 
strictly enforced.




Re: [PATCH v3 16/21] gdbstub: expose api to find registers

2024-02-03 Thread Akihiko Odaki

On 2024/02/03 20:44, Alex Bennée wrote:

Akihiko Odaki  writes:


On 2024/01/22 23:56, Alex Bennée wrote:

Expose an internal API to QEMU to return all the registers for a vCPU.
The list containing the details required to called gdb_read_register().
Based-on: <20231025093128.33116-15-akihiko.od...@daynix.com>
Cc: Akihiko Odaki 
Message-Id: <20240103173349.398526-38-alex.ben...@linaro.org>
Signed-off-by: Alex Bennée 
---
v3
- rm unused api functions left over
---
   include/exec/gdbstub.h | 28 
   gdbstub/gdbstub.c  | 27 ++-
   2 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index da9ddfe54c5..eb14b91139b 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -111,6 +111,34 @@ void gdb_feature_builder_end(const GDBFeatureBuilder 
*builder);
*/
   const GDBFeature *gdb_find_static_feature(const char *xmlname);
   +/**
+ * gdb_read_register() - Read a register associated with a CPU.
+ * @cpu: The CPU associated with the register.
+ * @buf: The buffer that the read register will be appended to.
+ * @reg: The register's number returned by gdb_find_feature_register().
+ *
+ * Return: The number of read bytes.
+ */
+int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
+
+/**
+ * typedef GDBRegDesc - a register description from gdbstub
+ */
+typedef struct {


nit: Add struct name; docs/devel/style.rst says struct has a CamelCase
name *and* corresponding typedef, though this rule is apparently not
strictly enforced.


I think the wording is a little ambiguous here, especially with the
reference to typedefs.h where the anonymous structure typedefs are held.
In this case we don't need the structname because there is no internal
reference to itself.



The majority of structure typedefs have tag names though; grep "typedef 
struct".


That said, I'm fine even without a tag name. As I mentioned earlier, 
this convention in QEMU is not consistent, and I have no other reason to 
have a tag name.




Re: [PULL 00/36] target-arm queue

2024-02-03 Thread Peter Maydell
On Fri, 2 Feb 2024 at 15:36, Peter Maydell  wrote:
>
> The following changes since commit c3709fde5955d13f6d4f86ab46ef3cc2288ca65e:
>
>   Merge tag 'pull-aspeed-20240201' of https://github.com/legoater/qemu into 
> staging (2024-02-01 14:42:11 +)
>
> are available in the Git repository at:
>
>   https://git.linaro.org/people/pmaydell/qemu-arm.git 
> tags/pull-target-arm-20240202
>
> for you to fetch changes up to f09c2b7ba9908714a3e2f1decd989462536cf731:
>
>   hw/arm: Connect SPI Controller to BCM2835 (2024-02-02 13:51:59 +)
>
> 
> target/arm: fix exception syndrome for AArch32 bkpt insn
> pci, vmbus, adb, s390x/css-bridge: Switch buses to 3-phase reset
> system/vl.c: Fix handling of '-serial none -serial something'
> target/arm: Add ID_AA64ZFR0_EL1.B16B16 to the exposed-to-userspace set
> tests/qtest/xlnx-versal-trng-test.c: Drop use of variable length array
> target/arm: Reinstate "vfp" property on AArch32 CPUs
> doc/sphinx/hxtool.py: add optional label argument to SRST directive
> hw/arm: Check for CPU types in machine_run_board_init() for various boards
> pci-host: designware: Limit value range of iATU viewport register
> hw/arm: Convert some DPRINTF macros to trace events and guest errors
> hw/arm: NPCM7XX SoC: Add GMAC ethernet controller devices
> hw/arm: Implement BCM2835 SPI Controller
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM



ui/dbus-display1.[ch] are not properly listed in dependencies

2024-02-03 Thread Michael Tokarev

Doing a build of just a single target with --enable modules results in build
error:


 rm -rf b; cd b
 ../configure --enable-modules --target-list=x86_64-softmmu
 ninja qemu-system-x86_64

  In file included from ../ui/dbus-chardev.c:34:
 ../ui/dbus.h:34:10: fatal error: ui/dbus-display1.h: No such file or directory
34 | #include "ui/dbus-display1.h"
   |  ^~~~
 compilation terminated.


When building without modules, or when not specifying a single target,
the build succeeded.  So I'm concluding not all deps for ui/dbus-display1.h
are specified, - dbus_ss should depend on this file I guess..

/mjt



Re: [PATCH v11 0/3] gdbstub and TCG plugin improvements

2024-02-03 Thread Alex Bennée
Akihiko Odaki  writes:

> On 2024/02/03 20:08, Alex Bennée wrote:
>> Akihiko Odaki  writes:
>> 
>>> This series extracts fixes and refactorings that can be applied
>>> independently from "[PATCH v9 00/23] plugins: Allow to read registers".
>>>
>>> The patch "target/riscv: Move MISA limits to class" was replaced with
>>> patch "target/riscv: Move misa_mxl_max to class" since I found instances
>>> may have different misa_ext_mask.
>> As this is re-based on Alistair's riscv-to-apply.next tree I'll wait
>> for
>> this to go through the RiscV trees and then re-base the plugin patches
>> and dropping the merged riscv patches from my tree.
>> In the meantime feel free to review:
>>Message-Id: <20240122145610.413836-1-alex.ben...@linaro.org>
>>Date: Mon, 22 Jan 2024 14:55:49 +
>>Subject: [PATCH v3 00/21] plugin updates (register access) for 9.0 
>> (pre-PR?)
>>From: =?UTF-8?q?Alex=20Benn=C3=A9e?= 
>> For:
>>contrib/plugins: extend execlog to track register changes
>>gdbstub: expose api to find registers
>> So I can add this to my maintainer omnibus series for the next PR I
>> send.
>
> I added one trivial comment to: "gdbstub: expose api to find registers"
>
> "contrib/plugins: extend execlog to track register changes" depends on
> "plugins: add an API to read registers". The comments for the patch in
> the following email are not addressed yet:
> https://lore.kernel.org/all/4b2156ed-688d-4617-b52d-200413f01...@daynix.com/

I don't think we need to serialise with the BQL as the structures are
per-CPU (and created on vCPU creation).

As far as the restructuring we can move it into gdbstub later if there
is a need to. At the moment the structure is just housekeeping for
plugins.


>
> Please check them out.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v3 16/21] gdbstub: expose api to find registers

2024-02-03 Thread Alex Bennée
Akihiko Odaki  writes:

> On 2024/01/22 23:56, Alex Bennée wrote:
>> Expose an internal API to QEMU to return all the registers for a vCPU.
>> The list containing the details required to called gdb_read_register().
>> Based-on: <20231025093128.33116-15-akihiko.od...@daynix.com>
>> Cc: Akihiko Odaki 
>> Message-Id: <20240103173349.398526-38-alex.ben...@linaro.org>
>> Signed-off-by: Alex Bennée 
>> ---
>> v3
>>- rm unused api functions left over
>> ---
>>   include/exec/gdbstub.h | 28 
>>   gdbstub/gdbstub.c  | 27 ++-
>>   2 files changed, 54 insertions(+), 1 deletion(-)
>> diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
>> index da9ddfe54c5..eb14b91139b 100644
>> --- a/include/exec/gdbstub.h
>> +++ b/include/exec/gdbstub.h
>> @@ -111,6 +111,34 @@ void gdb_feature_builder_end(const GDBFeatureBuilder 
>> *builder);
>>*/
>>   const GDBFeature *gdb_find_static_feature(const char *xmlname);
>>   +/**
>> + * gdb_read_register() - Read a register associated with a CPU.
>> + * @cpu: The CPU associated with the register.
>> + * @buf: The buffer that the read register will be appended to.
>> + * @reg: The register's number returned by gdb_find_feature_register().
>> + *
>> + * Return: The number of read bytes.
>> + */
>> +int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
>> +
>> +/**
>> + * typedef GDBRegDesc - a register description from gdbstub
>> + */
>> +typedef struct {
>
> nit: Add struct name; docs/devel/style.rst says struct has a CamelCase
> name *and* corresponding typedef, though this rule is apparently not
> strictly enforced.

I think the wording is a little ambiguous here, especially with the
reference to typedefs.h where the anonymous structure typedefs are held.
In this case we don't need the structname because there is no internal
reference to itself.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v11 0/3] gdbstub and TCG plugin improvements

2024-02-03 Thread Alex Bennée
Akihiko Odaki  writes:

> This series extracts fixes and refactorings that can be applied
> independently from "[PATCH v9 00/23] plugins: Allow to read registers".
>
> The patch "target/riscv: Move MISA limits to class" was replaced with
> patch "target/riscv: Move misa_mxl_max to class" since I found instances
> may have different misa_ext_mask.

As this is re-based on Alistair's riscv-to-apply.next tree I'll wait for
this to go through the RiscV trees and then re-base the plugin patches
and dropping the merged riscv patches from my tree.

In the meantime feel free to review:

  Message-Id: <20240122145610.413836-1-alex.ben...@linaro.org>
  Date: Mon, 22 Jan 2024 14:55:49 +
  Subject: [PATCH v3 00/21] plugin updates (register access) for 9.0 (pre-PR?)
  From: =?UTF-8?q?Alex=20Benn=C3=A9e?= 

For:

  contrib/plugins: extend execlog to track register changes
  gdbstub: expose api to find registers

So I can add this to my maintainer omnibus series for the next PR I
send.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 0/4] char: Minor fixes, and a tighter QAPI schema

2024-02-03 Thread Marc-André Lureau
Hi

On Sat, Feb 3, 2024 at 12:02 PM Markus Armbruster  wrote:
>
> Markus Armbruster (4):
>   chardev/parallel: Don't close stdin on inappropriate device
>   tests/unit/test-char: Fix qemu_socket(), make_udp_socket() check
>   qapi/char: Make backend types properly conditional
>   qapi/char: Deprecate backend type "memory"

Reviewed-by: Marc-André Lureau 

>
>  docs/about/deprecated.rst |  8 
>  qapi/char.json| 28 +---
>  include/qemu/osdep.h  |  9 -
>  chardev/char-parallel.c   |  7 +--
>  tests/unit/test-char.c| 25 +++--
>  chardev/meson.build   |  4 +---
>  6 files changed, 62 insertions(+), 19 deletions(-)
>
> --
> 2.43.0
>




[PULL 0/5] QAPI patches patches for 2024-02-03

2024-02-03 Thread Markus Armbruster
The following changes since commit 29b008927ef6e3fbb70e6607b25d3fcae26a5190:

  Merge tag 'pull-nic-config-2-20240202' of 
git://git.infradead.org/users/dwmw2/qemu into staging (2024-02-02 16:47:36 
+)

are available in the Git repository at:

  https://repo.or.cz/qemu/armbru.git tags/pull-qapi-2024-02-03

for you to fetch changes up to 3424ed6caf9759eb57405d965537fd5f3d70026b:

  qga/qapi-schema: Move command description right after command name 
(2024-02-03 09:20:33 +0100)


QAPI patches patches for 2024-02-03


Markus Armbruster (5):
  qapi: Drop redundant documentation of inherited members
  qapi: Drop redundant documentation of conditional
  qapi: Elide "Potential additional modes" from generated docs
  qga: Move type description right after type name
  qga/qapi-schema: Move command description right after command name

 qapi/block-core.json   |  6 ++
 qapi/block-export.json | 11 ++-
 qapi/introspect.json   |  2 --
 qapi/misc-target.json  |  2 --
 qga/qapi-schema.json   | 18 +-
 5 files changed, 17 insertions(+), 22 deletions(-)

-- 
2.43.0




[PULL 1/5] qapi: Drop redundant documentation of inherited members

2024-02-03 Thread Markus Armbruster
Documentation generated for SchemaInfo looks like

The members of "SchemaInfoBuiltin" when "meta-type" is ""builtin""
The members of "SchemaInfoEnum" when "meta-type" is ""enum""
The members of "SchemaInfoArray" when "meta-type" is ""array""
The members of "SchemaInfoObject" when "meta-type" is ""object""
The members of "SchemaInfoAlternate" when "meta-type" is ""alternate""
The members of "SchemaInfoCommand" when "meta-type" is ""command""
The members of "SchemaInfoEvent" when "meta-type" is ""event""
Additional members depend on the value of "meta-type".

The last line became redundant when commit 88f63467c57 (qapi2texi:
Generate reference to base type members) added the lines preceding it.
Drop it.

BlockdevOptions has the same issue.  Drop

Remaining options are determined by the block driver.

Signed-off-by: Markus Armbruster 
Message-ID: <20240129115008.674248-2-arm...@redhat.com>
Reviewed-by: Eric Blake 
---
 qapi/block-core.json | 2 --
 qapi/introspect.json | 2 --
 2 files changed, 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 48c181e55d..530c4af50f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4665,8 +4665,6 @@
 # @force-share: force share all permission on added nodes.  Requires
 # read-only=true.  (Since 2.10)
 #
-# Remaining options are determined by the block driver.
-#
 # Since: 2.9
 ##
 { 'union': 'BlockdevOptions',
diff --git a/qapi/introspect.json b/qapi/introspect.json
index 8df1ce85ed..b041b02ba8 100644
--- a/qapi/introspect.json
+++ b/qapi/introspect.json
@@ -93,8 +93,6 @@
 # particular order.  (since 4.1 for object types, 4.2 for
 # commands, 5.0 for the rest)
 #
-# Additional members depend on the value of @meta-type.
-#
 # Since: 2.5
 ##
 { 'union': 'SchemaInfo',
-- 
2.43.0




[PULL 4/5] qga: Move type description right after type name

2024-02-03 Thread Markus Armbruster
Documentation of type BlockdevOptionsIscsi describes the type's
purpose after its members.  Everywhere else, we do it the other way
round.  Move it for consistency.

Signed-off-by: Markus Armbruster 
Message-ID: <20240129115008.674248-5-arm...@redhat.com>
Reviewed-by: Konstantin Kostiuk 
---
 qapi/block-core.json | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 530c4af50f..781c9bd03e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4070,6 +4070,8 @@
 ##
 # @BlockdevOptionsIscsi:
 #
+# Driver specific block device options for iscsi
+#
 # @transport: The iscsi transport type
 #
 # @portal: The address of the iscsi portal
@@ -4094,8 +4096,6 @@
 # @timeout: Timeout in seconds after which a request will timeout.  0
 # means no timeout and is the default.
 #
-# Driver specific block device options for iscsi
-#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsIscsi',
-- 
2.43.0




[PULL 3/5] qapi: Elide "Potential additional modes" from generated docs

2024-02-03 Thread Markus Armbruster
Documentation of BlockExportRemoveMode has

Potential additional modes to be added in the future:

hide: Just hide export from new clients, leave existing connections
as is.  Remove export after all clients are disconnected.

soft: Hide export from new clients, answer with ESHUTDOWN for all
further requests from existing clients.

I think this is useful only for developers.  Elide it from generated
documentation by turning it into a TODO section.

This effectively reverts my own commit b71fd73cc45 (Revert "qapi:
BlockExportRemoveMode: move comments to TODO").  At the time, I was
about to elide TODO sections from the generated manual, I wasn't sure
about this one, and decided to avoid change.  And now I've made up my
mind.

Signed-off-by: Markus Armbruster 
Message-ID: <20240129115008.674248-4-arm...@redhat.com>
Reviewed-by: Eric Blake 
---
 qapi/block-export.json | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index 7874a49ba7..e063e9255a 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -266,13 +266,14 @@
 #
 # @hard: Drop all connections immediately and remove export.
 #
-# Potential additional modes to be added in the future:
+# TODO: Potential additional modes to be added in the future:
 #
-# hide: Just hide export from new clients, leave existing connections
-# as is.  Remove export after all clients are disconnected.
+# - hide: Just hide export from new clients, leave existing
+#   connections as is.  Remove export after all clients are
+#   disconnected.
 #
-# soft: Hide export from new clients, answer with ESHUTDOWN for all
-# further requests from existing clients.
+# - soft: Hide export from new clients, answer with ESHUTDOWN for
+#   all further requests from existing clients.
 #
 # Since: 2.12
 ##
-- 
2.43.0




[PULL 2/5] qapi: Drop redundant documentation of conditional

2024-02-03 Thread Markus Armbruster
Documentation generated for dump-skeys contains

This command is only supported on s390 architecture.

and

If
~~

"TARGET_S390X"

The former became redundant in commit 901a34a400a (qapi: add 'If:'
section to generated documentation) added the latter.  Drop the
former.

Signed-off-by: Markus Armbruster 
Message-ID: <20240129115008.674248-3-arm...@redhat.com>
Reviewed-by: Eric Blake 
---
 qapi/misc-target.json | 2 --
 1 file changed, 2 deletions(-)

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 9195e7d26b..03e83c053f 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -237,8 +237,6 @@
 #
 # @filename: the path to the file to dump to
 #
-# This command is only supported on s390 architecture.
-#
 # Since: 2.5
 #
 # Example:
-- 
2.43.0




Re: [PATCH 3/3] monitor: only run coroutine commands in qemu_aio_context

2024-02-03 Thread Michael Tokarev

03.02.2024 12:01, Michael Tokarev wrote:
...

This change broke something in 7.2. I'm still debugging it, will
come with a follow-up once some more details are found, I'll also
check current master with and without this commit.

The prob happens with multiple suspend-resume cycles, - with this
change applied, guest does not work as expected after *second*
suspend-resume.


So, it turned out the prob here exists on master too, and manifests
itself the same way on 7.2.9 or on 8.2.1, - in all cases where we
have this change applied it works (or breaks) equally.

A (simple) reproducer so far is a hibernate test, - it fails *only*
after suspend-to-ram, but works fine after just hibernate.

I used just an initrd (with a drive image used for swap -
for hibernation space).

 qemu-img create s.img 256M
 mkswap s.img
 qemu-system-x86_64 \
  -serial stdio -vga none -display none -parallel none -net none \
  -machine q35 \
  -drive file=s.img,if=ide,format=raw \
  -m 256 \
  -monitor unix:ttyS0,server,nowait \
  -kernel /boot/vmlinuz-6.1.0-15-amd64 \
  -initrd /boot/initrd.img-6.1.0-15-amd64 \
  -append "shell=/bin/sh console=ttyS0 root=none"

 There, in the guest (it has busybox only here):
 # swapon /dev/sda
 # echo mem > /sys/power/state
 (system_wakeup on the monitor)
 # echo disk > /sys/power/state

The system will hibernate but *not* turn off power, qemu
will continue running, while all console messages are the
same as when it works fine.  qemu process is spinning up
with 100% cpu usage at this stage.

Without the intermediate suspend-to-ram or without the
commit in question, qemu process will exit normally at
this stage.

This is a somewhat patalogical test case, but I see it as an
indicator of something else being wrong, like we aren't saving
or restoring some state now which we should do.

The tight loop also suggests we're not having success in there.

Thanks,

/mjt



[PULL 5/5] qga/qapi-schema: Move command description right after command name

2024-02-03 Thread Markus Armbruster
Documentation of commands guest-ssh-get-authorized-keys,
guest-ssh-add-authorized-keys, and guest-ssh-remove-authorized-keys
describes the command's purpose after its arguments.  Everywhere else,
we do it the other way round.  Move it for consistency.

Signed-off-by: Markus Armbruster 
Message-ID: <20240129115008.674248-6-arm...@redhat.com>
Reviewed-by: Konstantin Kostiuk 
---
 qga/qapi-schema.json | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 876e2a8ea8..50b0a558c7 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1565,11 +1565,11 @@
 ##
 # @guest-ssh-get-authorized-keys:
 #
-# @username: the user account to add the authorized keys
-#
 # Return the public keys from user .ssh/authorized_keys on Unix
 # systems (not implemented for other systems).
 #
+# @username: the user account to add the authorized keys
+#
 # Returns: @GuestAuthorizedKeys
 #
 # Since: 5.2
@@ -1582,6 +1582,9 @@
 ##
 # @guest-ssh-add-authorized-keys:
 #
+# Append public keys to user .ssh/authorized_keys on Unix systems (not
+# implemented for other systems).
+#
 # @username: the user account to add the authorized keys
 #
 # @keys: the public keys to add (in OpenSSH/sshd(8) authorized_keys
@@ -1589,9 +1592,6 @@
 #
 # @reset: ignore the existing content, set it with the given keys only
 #
-# Append public keys to user .ssh/authorized_keys on Unix systems (not
-# implemented for other systems).
-#
 # Returns: Nothing on success.
 #
 # Since: 5.2
@@ -1603,15 +1603,15 @@
 ##
 # @guest-ssh-remove-authorized-keys:
 #
+# Remove public keys from the user .ssh/authorized_keys on Unix
+# systems (not implemented for other systems). It's not an error if
+# the key is already missing.
+#
 # @username: the user account to remove the authorized keys
 #
 # @keys: the public keys to remove (in OpenSSH/sshd(8) authorized_keys
 # format)
 #
-# Remove public keys from the user .ssh/authorized_keys on Unix
-# systems (not implemented for other systems). It's not an error if
-# the key is already missing.
-#
 # Returns: Nothing on success.
 #
 # Since: 5.2
-- 
2.43.0




[PATCH 0/6] hw/pci: SR-IOV related fixes and improvements

2024-02-03 Thread Akihiko Odaki
I submitted a RFC series[1] to add support for SR-IOV emulation to
virtio-net-pci. During the development of the series, I fixed some
trivial bugs and made improvements that I think are independently
useful. This series extracts those fixes and improvements from the RFC
series. Below is an explanation of the patches:

Patch 1 adds a function to check if ROM BAR is explicitly enabled. It
is used in the RFC series to report an error if the user requests to
enable ROM BAR for SR-IOV VF. Patch 2 and 3 use it for vfio to remove
hacky device option dictionary inspection.

Patch 4 adds SR-IOV NumVFs validation to fix potential buffer overflow.

Patch 5 changes to realize SR-IOV VFs when the PF is being realized to
validate VF configuration.

Patch 6 fixes memory leak that occurs if a SR-IOV VF fails to realize.

[1]: https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6d...@daynix.com/

Signed-off-by: Akihiko Odaki 
---
Akihiko Odaki (6):
  hw/pci: Determine if rombar is explicitly enabled
  vfio: Avoid inspecting option QDict for rombar
  hw/qdev: Remove opts member
  pcie_sriov: Validate NumVFs
  pcie_sriov: Reuse SR-IOV VF device instances
  pcie_sriov: Release VFs failed to realize

 docs/pcie_sriov.txt |   8 ++--
 include/hw/pci/pci.h|   2 +-
 include/hw/pci/pci_device.h |   7 ++-
 include/hw/pci/pcie_sriov.h |   6 +--
 include/hw/qdev-core.h  |   4 --
 hw/core/qdev.c  |   1 -
 hw/net/igb.c|  13 --
 hw/nvme/ctrl.c  |  29 ++--
 hw/pci/pci.c|  20 +
 hw/pci/pci_host.c   |   4 +-
 hw/pci/pcie.c   |   4 +-
 hw/pci/pcie_sriov.c | 105 +---
 hw/vfio/pci.c   |   3 +-
 system/qdev-monitor.c   |  12 ++---
 14 files changed, 116 insertions(+), 102 deletions(-)
---
base-commit: 4a4efae44f19528589204581e9e2fab69c5d39aa
change-id: 20240129-reuse-faae22b11934

Best regards,
-- 
Akihiko Odaki 




[PATCH 3/6] hw/qdev: Remove opts member

2024-02-03 Thread Akihiko Odaki
It is no longer used.

Signed-off-by: Akihiko Odaki 
---
 include/hw/qdev-core.h |  4 
 hw/core/qdev.c |  1 -
 system/qdev-monitor.c  | 12 +++-
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 151d9682380d..6befbca31117 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -237,10 +237,6 @@ struct DeviceState {
  * @pending_deleted_expires_ms: optional timeout for deletion events
  */
 int64_t pending_deleted_expires_ms;
-/**
- * @opts: QDict of options for the device
- */
-QDict *opts;
 /**
  * @hotplugged: was device added after PHASE_MACHINE_READY?
  */
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 43d863b0c5b0..c98691a90d48 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -706,7 +706,6 @@ static void device_finalize(Object *obj)
 dev->canonical_path = NULL;
 }
 
-qobject_unref(dev->opts);
 g_free(dev->id);
 }
 
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index a13db763e5dd..71c00f62ee38 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -625,6 +625,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
 char *id;
 DeviceState *dev = NULL;
 BusState *bus = NULL;
+QDict *properties;
 
 driver = qdict_get_try_str(opts, "driver");
 if (!driver) {
@@ -705,13 +706,14 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
 }
 
 /* set properties */
-dev->opts = qdict_clone_shallow(opts);
-qdict_del(dev->opts, "driver");
-qdict_del(dev->opts, "bus");
-qdict_del(dev->opts, "id");
+properties = qdict_clone_shallow(opts);
+qdict_del(properties, "driver");
+qdict_del(properties, "bus");
+qdict_del(properties, "id");
 
-object_set_properties_from_keyval(>parent_obj, dev->opts, from_json,
+object_set_properties_from_keyval(>parent_obj, properties, from_json,
   errp);
+qobject_unref(properties);
 if (*errp) {
 goto err_del_dev;
 }

-- 
2.43.0




[PATCH 6/6] pcie_sriov: Release VFs failed to realize

2024-02-03 Thread Akihiko Odaki
Release VFs failed to realize just as we do in unregister_vfs().

Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization 
(SR/IOV)")
Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 9ba34cf8f8ed..9d668b8d6c17 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -91,6 +91,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 vf->exp.sriov_vf.vf_number = i;
 
 if (!qdev_realize(>qdev, bus, errp)) {
+object_unparent(OBJECT(vf));
+object_unref(vf);
 unrealize_vfs(dev, i);
 return false;
 }

-- 
2.43.0




[PATCH v11 2/3] target/riscv: Move misa_mxl_max to class

2024-02-03 Thread Akihiko Odaki
misa_mxl_max is common for all instances of a RISC-V CPU class so they
are better put into class.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.h |   4 +-
 hw/riscv/boot.c|   3 +-
 target/riscv/cpu.c | 160 -
 target/riscv/gdbstub.c |  12 ++--
 target/riscv/kvm/kvm-cpu.c |  10 +--
 target/riscv/machine.c |   7 +-
 target/riscv/tcg/tcg-cpu.c |  12 ++--
 target/riscv/translate.c   |   3 +-
 8 files changed, 112 insertions(+), 99 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 51381877273e..5c9577f1be7b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -189,7 +189,6 @@ struct CPUArchState {
 
 /* RISCVMXL, but uint32_t for vmstate migration */
 uint32_t misa_mxl;  /* current mxl */
-uint32_t misa_mxl_max;  /* max mxl for this cpu */
 uint32_t misa_ext;  /* current extensions */
 uint32_t misa_ext_mask; /* max ext for this cpu */
 uint32_t xl;/* current xlen */
@@ -471,6 +470,7 @@ struct RISCVCPUClass {
 
 DeviceRealize parent_realize;
 ResettablePhases parent_phases;
+uint32_t misa_mxl_max;  /* max mxl for this cpu */
 };
 
 static inline int riscv_has_ext(CPURISCVState *env, target_ulong ext)
@@ -783,7 +783,7 @@ enum riscv_pmu_event_idx {
 /* used by tcg/tcg-cpu.c*/
 void isa_ext_update_enabled(RISCVCPU *cpu, uint32_t ext_offset, bool en);
 bool isa_ext_is_enabled(RISCVCPU *cpu, uint32_t ext_offset);
-void riscv_cpu_set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext);
+void riscv_cpu_set_misa_ext(CPURISCVState *env, uint32_t ext);
 bool riscv_cpu_is_vendor(Object *cpu_obj);
 
 typedef struct RISCVCPUMultiExtConfig {
diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index 0ffca05189f0..12f9792245a4 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -36,7 +36,8 @@
 
 bool riscv_is_32bit(RISCVHartArrayState *harts)
 {
-return harts->harts[0].env.misa_mxl_max == MXL_RV32;
+RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(>harts[0]);
+return mcc->misa_mxl_max == MXL_RV32;
 }
 
 /*
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 88e8cc868144..c9d09d175510 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -301,9 +301,8 @@ const char *riscv_cpu_get_trap_name(target_ulong cause, 
bool async)
 }
 }
 
-void riscv_cpu_set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
+void riscv_cpu_set_misa_ext(CPURISCVState *env, uint32_t ext)
 {
-env->misa_mxl_max = env->misa_mxl = mxl;
 env->misa_ext_mask = env->misa_ext = ext;
 }
 
@@ -416,11 +415,7 @@ static void riscv_any_cpu_init(Object *obj)
 {
 RISCVCPU *cpu = RISCV_CPU(obj);
 CPURISCVState *env = >env;
-#if defined(TARGET_RISCV32)
-riscv_cpu_set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
-#elif defined(TARGET_RISCV64)
-riscv_cpu_set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
-#endif
+riscv_cpu_set_misa_ext(env, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
 
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj),
@@ -441,19 +436,17 @@ static void riscv_max_cpu_init(Object *obj)
 {
 RISCVCPU *cpu = RISCV_CPU(obj);
 CPURISCVState *env = >env;
-RISCVMXL mlx = MXL_RV64;
 
 cpu->cfg.mmu = true;
 cpu->cfg.pmp = true;
 
-#ifdef TARGET_RISCV32
-mlx = MXL_RV32;
-#endif
-riscv_cpu_set_misa(env, mlx, 0);
 env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
-set_satp_mode_max_supported(RISCV_CPU(obj), mlx == MXL_RV32 ?
-VM_1_10_SV32 : VM_1_10_SV57);
+#ifdef TARGET_RISCV32
+set_satp_mode_max_supported(cpu, VM_1_10_SV32);
+#else
+set_satp_mode_max_supported(cpu, VM_1_10_SV57);
+#endif
 #endif
 }
 
@@ -466,8 +459,6 @@ static void rv64_base_cpu_init(Object *obj)
 cpu->cfg.mmu = true;
 cpu->cfg.pmp = true;
 
-/* We set this in the realise function */
-riscv_cpu_set_misa(env, MXL_RV64, 0);
 /* Set latest version of privileged specification */
 env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
@@ -479,8 +470,7 @@ static void rv64_sifive_u_cpu_init(Object *obj)
 {
 RISCVCPU *cpu = RISCV_CPU(obj);
 CPURISCVState *env = >env;
-riscv_cpu_set_misa(env, MXL_RV64,
-   RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
+riscv_cpu_set_misa_ext(env, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV39);
@@ -498,7 +488,7 @@ static void rv64_sifive_e_cpu_init(Object *obj)
 CPURISCVState *env = _CPU(obj)->env;
 RISCVCPU *cpu = RISCV_CPU(obj);
 
-riscv_cpu_set_misa(env, MXL_RV64, RVI | RVM | RVA | RVC | RVU);
+riscv_cpu_set_misa_ext(env, RVI | RVM | RVA | RVC | RVU);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ 

[PATCH v11 1/3] target/riscv: Remove misa_mxl validation

2024-02-03 Thread Akihiko Odaki
It is initialized with a simple assignment and there is little room for
error. In fact, the validation is even more complex.

Signed-off-by: Akihiko Odaki 
Acked-by: LIU Zhiwei 
Reviewed-by: Daniel Henrique Barboza 
Acked-by: Alistair Francis 
---
 target/riscv/tcg/tcg-cpu.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index da437975b429..94dca7e446eb 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -268,7 +268,7 @@ static void riscv_cpu_validate_misa_priv(CPURISCVState 
*env, Error **errp)
 }
 }
 
-static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
+static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu)
 {
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
 CPUClass *cc = CPU_CLASS(mcc);
@@ -288,11 +288,6 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, 
Error **errp)
 default:
 g_assert_not_reached();
 }
-
-if (env->misa_mxl_max != env->misa_mxl) {
-error_setg(errp, "misa_mxl_max must be equal to misa_mxl");
-return;
-}
 }
 
 static void riscv_cpu_validate_v(CPURISCVState *env, RISCVCPUConfig *cfg,
@@ -908,7 +903,6 @@ static bool riscv_cpu_is_generic(Object *cpu_obj)
 static bool riscv_tcg_cpu_realize(CPUState *cs, Error **errp)
 {
 RISCVCPU *cpu = RISCV_CPU(cs);
-Error *local_err = NULL;
 
 if (!riscv_cpu_tcg_compatible(cpu)) {
 g_autofree char *name = riscv_cpu_get_name(cpu);
@@ -917,14 +911,11 @@ static bool riscv_tcg_cpu_realize(CPUState *cs, Error 
**errp)
 return false;
 }
 
-riscv_cpu_validate_misa_mxl(cpu, _err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-return false;
-}
+riscv_cpu_validate_misa_mxl(cpu);
 
 #ifndef CONFIG_USER_ONLY
 CPURISCVState *env = >env;
+Error *local_err = NULL;
 
 CPU(cs)->tcg_cflags |= CF_PCREL;
 

-- 
2.43.0




[PATCH v11 0/3] gdbstub and TCG plugin improvements

2024-02-03 Thread Akihiko Odaki
This series extracts fixes and refactorings that can be applied
independently from "[PATCH v9 00/23] plugins: Allow to read registers".

The patch "target/riscv: Move MISA limits to class" was replaced with
patch "target/riscv: Move misa_mxl_max to class" since I found instances
may have different misa_ext_mask.

V6 -> V7:
  Rebased.

V5 -> V6:
  Added patch "default-configs: Add TARGET_XML_FILES definition".
  Rebased.

V4 -> V5:
  Added patch "hw/riscv: Use misa_mxl instead of misa_mxl_max".

V3 -> V4:
  Added patch "gdbstub: Check if gdb_regs is NULL".

V2 -> V3:
  Restored patch sets from the previous version.
  Rebased to commit 800485762e6564e04e2ab315132d477069562d91.

V1 -> V2:
  Added patch "target/riscv: Do not allow MXL_RV32 for TARGET_RISCV64".
  Added patch "target/riscv: Initialize gdb_core_xml_file only once".
  Dropped patch "target/riscv: Remove misa_mxl validation".
  Dropped patch "target/riscv: Move misa_mxl_max to class".
  Dropped patch "target/riscv: Validate misa_mxl_max only once".

Signed-off-by: Akihiko Odaki 
---
Changes in v11:
- Rebased on: https://github.com/alistair23/qemu/tree/riscv-to-apply.next
- Link to v10: 
https://lore.kernel.org/r/20240128-riscv-v10-0-fdbe59397...@daynix.com

Changes in v10:
- Dropped patch "hw/riscv: Use misa_mxl instead of misa_mxl_max" due to
  invalid assumption that the relevant code is only used for kernel
  loading.
- Link to v9: 
https://lore.kernel.org/r/20240115-riscv-v9-0-ff171e1ae...@daynix.com

Changes in v9:
- Rebased to commit 977542ded7e6b28d2bc077bcda24568c716e393c.
- Link to v8: 
https://lore.kernel.org/r/20231218-riscv-v8-0-c9bf2b158...@daynix.com

Changes in v8:
- Added a more detailed explanation for patch "hw/riscv: Use misa_mxl
  instead of misa_mxl_max". (Alistair Francis)
- Link to v7: 
https://lore.kernel.org/r/20231213-riscv-v7-0-a760156a3...@daynix.com

---
Akihiko Odaki (3):
  target/riscv: Remove misa_mxl validation
  target/riscv: Move misa_mxl_max to class
  target/riscv: Validate misa_mxl_max only once

 target/riscv/cpu.h |   4 +-
 hw/riscv/boot.c|   3 +-
 target/riscv/cpu.c | 181 ++---
 target/riscv/gdbstub.c |  12 ++-
 target/riscv/kvm/kvm-cpu.c |  10 +--
 target/riscv/machine.c |   7 +-
 target/riscv/tcg/tcg-cpu.c |  44 ++-
 target/riscv/translate.c   |   3 +-
 8 files changed, 133 insertions(+), 131 deletions(-)
---
base-commit: 0c9d286cf791cdda76fd57e4562e2cb18d4a79e2
change-id: 20231213-riscv-fcc9640986cf

Best regards,
-- 
Akihiko Odaki 




[PATCH v11 3/3] target/riscv: Validate misa_mxl_max only once

2024-02-03 Thread Akihiko Odaki
misa_mxl_max is now a class member and initialized only once for each
class. This also moves the initialization of gdb_core_xml_file which
will be referenced before realization in the future.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.c | 21 +
 target/riscv/tcg/tcg-cpu.c | 23 ---
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c9d09d175510..12a69efe89c4 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1344,6 +1344,26 @@ static const MISAExtInfo misa_ext_info_arr[] = {
 MISA_EXT_INFO(RVB, "x-b", "Bit manipulation (Zba_Zbb_Zbs)")
 };
 
+static void riscv_cpu_validate_misa_mxl(RISCVCPUClass *mcc)
+{
+CPUClass *cc = CPU_CLASS(mcc);
+
+/* Validate that MISA_MXL is set properly. */
+switch (mcc->misa_mxl_max) {
+#ifdef TARGET_RISCV64
+case MXL_RV64:
+case MXL_RV128:
+cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
+break;
+#endif
+case MXL_RV32:
+cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
+break;
+default:
+g_assert_not_reached();
+}
+}
+
 static int riscv_validate_misa_info_idx(uint32_t bit)
 {
 int idx;
@@ -2303,6 +2323,7 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
*data)
 RISCVCPUClass *mcc = RISCV_CPU_CLASS(c);
 
 mcc->misa_mxl_max = (uint32_t)(uintptr_t)data;
+riscv_cpu_validate_misa_mxl(mcc);
 }
 
 static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str,
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index 067f1493fea7..e5a60c2e8b60 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -268,27 +268,6 @@ static void riscv_cpu_validate_misa_priv(CPURISCVState 
*env, Error **errp)
 }
 }
 
-static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu)
-{
-RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
-CPUClass *cc = CPU_CLASS(mcc);
-
-/* Validate that MISA_MXL is set properly. */
-switch (mcc->misa_mxl_max) {
-#ifdef TARGET_RISCV64
-case MXL_RV64:
-case MXL_RV128:
-cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
-break;
-#endif
-case MXL_RV32:
-cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
-break;
-default:
-g_assert_not_reached();
-}
-}
-
 static void riscv_cpu_validate_v(CPURISCVState *env, RISCVCPUConfig *cfg,
  Error **errp)
 {
@@ -911,8 +890,6 @@ static bool riscv_tcg_cpu_realize(CPUState *cs, Error 
**errp)
 return false;
 }
 
-riscv_cpu_validate_misa_mxl(cpu);
-
 #ifndef CONFIG_USER_ONLY
 CPURISCVState *env = >env;
 Error *local_err = NULL;

-- 
2.43.0




Re: ehci: more than 16 ITDs

2024-02-03 Thread BALATON Zoltan

On Sat, 3 Feb 2024, Christian Gudrian wrote:

Hello!

I am trying to access a USB audio hardware (a Korg Kronos synthesizer)
running an ancient 2.6.32.11 kernel (due to the binary only Korg driver
modules) via USB pass through. Loading the audio driver immediately
leads to a reset of the EHCI controller due to a "processing error"
caused by the 17th ITD being fetched.

Entirely ignorant of what I'm doing I've patched hcd-ehci.c to accept
this 17th ITD. Needless to say that this attempt didn't work: the
refcount warning in the qh_get function (in ehci-mem.c) triggers and the
system halts.

Is this a direct consequence of me allowing 17 ITDs or is this caused by
another part of the Korg driver misbehaving or a bug in the EHCI
emulation of QEMU? How can I find out, what's going wrong?


It's hard to tell, I don't know EHCI but there's a TODO comment near that 
error so maybe there's some EHCI feature not emulated that you may need to 
implement in QEMU there. I guess you'd need to check the EHCI 
specification or the docs of the actual chip that is emulated for info on 
how this should work.


Also to get more debugging info to see what's happening you can add 
--trace enable="usb*" (or see qemu/hw/usb/trace-events for the list of 
trace points available that could be enabled individually) in case you 
were not aware of that. There are several ways to enable trace points, see 
QEMU docs on that. Apart from the simple enable=pattern these can also be 
controlled from QEMU monitor or put the list in a text file and use that 
with --trace to enable several of these.


Regards,
BALATON Zoltan



Re: ehci: more than 16 ITDs

2024-02-03 Thread Christian Gudrian



On 03.02.2024 21:06, BALATON Zoltan wrote:


It's hard to tell, I don't know EHCI but there's a TODO comment near
that error so maybe there's some EHCI feature not emulated that you
may need to implement in QEMU there. I guess you'd need to check the
EHCI specification or the docs of the actual chip that is emulated
for info on how this should work.


I'll have a look at it. Meanwhile I've added dedicated debug output and
found out, that the 17th ITD only occurs occasionally. Could it be the
16 ITD limit has been arbitrarily chosen? The host machine is not
particularly powerful so that buffers might fill up to a higher level
than usually expected.


Also to get more debugging info to see what's happening you can add
--trace enable="usb*" (or see qemu/hw/usb/trace-events for the list
of trace points available that could be enabled individually) in case
you were not aware of that.


Thanks for the hint. I indeed didn't know that. The audio hardware,
however, is particularly time sensitive during initialization and
requires a power cycle if anything appears strange to it. But I'll have
a try!

The refcount warning is triggered by inconsistent data structures in the
EHCI driver of the Linux kernel: while the 'periodic' array encodes a
structure type of 'ehci_qh' the pointer in the 'pshadow' array actually
points to a structure of type 'ehci_itd'. My hunch is I'm running into a
kernel race condition due to massively skewed timing conditions compared
with the original bare metal hardware (where the kernel runs just fine).

Christian



Re: [PATCH] tests/tcg: Fix the /proc/self/mem probing in the PROT_NONE gdbstub test

2024-02-03 Thread Michael Tokarev

01.02.2024 01:02, Ilya Leoshkevich wrote:

The `if not probe_proc_self_mem` check never passes, because
probe_proc_self_mem is a function object, which is a truthy value.
Add parentheses in order to perform a function call.

Fixes: dc84d50a7f9b ("tests/tcg: Add the PROT_NONE gdbstub test")


FWIW (it's too late already and this commit has landed in master),
commit "tests/tcg: Add the PROT_NONE gdbstub test" is 82607a73f8
not dc84d50a7f9b.

/mjt



Re: [PULL v2 00/58] tcg patch queue

2024-02-03 Thread Peter Maydell
On Sat, 3 Feb 2024 at 07:07, Richard Henderson
 wrote:
>
> v2: Rebase and resolve target/loongarch conflicts.
> Include linux-user/aarch64 vdso fix.
>
> r~
>
> The following changes since commit 29b008927ef6e3fbb70e6607b25d3fcae26a5190:
>
>   Merge tag 'pull-nic-config-2-20240202' of 
> git://git.infradead.org/users/dwmw2/qemu into staging (2024-02-02 16:47:36 
> +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20240202-2
>
> for you to fetch changes up to 6400be014f80e4c2c246eb8be709ea3a96428233:
>
>   linux-user/aarch64: Add padding before __kernel_rt_sigreturn (2024-02-03 
> 16:46:10 +1000)
>
> 
> tests/tcg: Fix multiarch/gdbstub/prot-none.py
> hw/core: Convert cpu_mmu_index to a CPUClass hook
> tcg/loongarch64: Set vector registers call clobbered
> target/sparc: floating-point cleanup
> linux-user/aarch64: Add padding before __kernel_rt_sigreturn
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM



Re: [PULL 0/5] QAPI patches patches for 2024-02-03

2024-02-03 Thread Peter Maydell
On Sat, 3 Feb 2024 at 08:34, Markus Armbruster  wrote:
>
> The following changes since commit 29b008927ef6e3fbb70e6607b25d3fcae26a5190:
>
>   Merge tag 'pull-nic-config-2-20240202' of 
> git://git.infradead.org/users/dwmw2/qemu into staging (2024-02-02 16:47:36 
> +)
>
> are available in the Git repository at:
>
>   https://repo.or.cz/qemu/armbru.git tags/pull-qapi-2024-02-03
>
> for you to fetch changes up to 3424ed6caf9759eb57405d965537fd5f3d70026b:
>
>   qga/qapi-schema: Move command description right after command name 
> (2024-02-03 09:20:33 +0100)
>
> 
> QAPI patches patches for 2024-02-03
>
> 
> Markus Armbruster (5):
>   qapi: Drop redundant documentation of inherited members
>   qapi: Drop redundant documentation of conditional
>   qapi: Elide "Potential additional modes" from generated docs
>   qga: Move type description right after type name
>   qga/qapi-schema: Move command description right after command name


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM



ehci: more than 16 ITDs

2024-02-03 Thread Christian Gudrian

Hello!

I am trying to access a USB audio hardware (a Korg Kronos synthesizer)
running an ancient 2.6.32.11 kernel (due to the binary only Korg driver
modules) via USB pass through. Loading the audio driver immediately
leads to a reset of the EHCI controller due to a "processing error"
caused by the 17th ITD being fetched.

Entirely ignorant of what I'm doing I've patched hcd-ehci.c to accept
this 17th ITD. Needless to say that this attempt didn't work: the
refcount warning in the qh_get function (in ehci-mem.c) triggers and the
system halts.

Is this a direct consequence of me allowing 17 ITDs or is this caused by
another part of the Korg driver misbehaving or a bug in the EHCI
emulation of QEMU? How can I find out, what's going wrong?

Thanks for any hints!

Christian



Re: ehci: more than 16 ITDs

2024-02-03 Thread Christian Gudrian



On 03.02.2024 21:06, BALATON Zoltan wrote:


It's hard to tell, I don't know EHCI but there's a TODO comment near
that error so maybe there's some EHCI feature not emulated that you
may need to implement in QEMU there. I guess you'd need to check the
EHCI specification or the docs of the actual chip that is emulated
for info on how this should work.


I'll have a look at it. Meanwhile I've added dedicated debug output and
found out, that the 17th ITD only occurs occasionally. Could it be the
16 ITD limit has been arbitrarily chosen? The host machine is not
particularly powerful so that buffers might fill up to a higher level
than usually expected.


Also to get more debugging info to see what's happening you can add
--trace enable="usb*" (or see qemu/hw/usb/trace-events for the list
of trace points available that could be enabled individually) in case
you were not aware of that.


Thanks for the hint. I indeed didn't know that. The audio hardware,
however, is particularly time sensitive during initialization and
requires a power cycle if anything appears strange to it. But I'll have
a try!

The refcount warning is triggered by inconsistent data structures in the
EHCI driver of the Linux kernel: while the 'periodic' array encodes a
structure type of 'ehci_qh' the pointer in the 'pshadow' array actually
points to a structure of type 'ehci_itd'. My hunch is I'm running into a
kernel race condition due to massively skewed timing conditions compared
with the original bare metal hardware (where the kernel runs just fine).

Christian



Re: [PATCH v4 1/1] oslib-posix: initialize backend memory objects in parallel

2024-02-03 Thread Dongli Zhang



On 1/31/24 08:53, Mark Kanda wrote:
> QEMU initializes preallocated backend memory as the objects are parsed from
> the command line. This is not optimal in some cases (e.g. memory spanning
> multiple NUMA nodes) because the memory objects are initialized in series.
> 
> Allow the initialization to occur in parallel (asynchronously). In order to
> ensure optimal thread placement, asynchronous initialization requires prealloc
> context threads to be in use.
> 
> Signed-off-by: Mark Kanda 
> Signed-off-by: David Hildenbrand 
> ---
>  backends/hostmem.c |   7 ++-
>  hw/virtio/virtio-mem.c |   4 +-
>  include/hw/qdev-core.h |   5 ++
>  include/qemu/osdep.h   |  18 +-
>  system/vl.c|   9 +++
>  util/oslib-posix.c | 131 +++--
>  util/oslib-win32.c |   8 ++-
>  7 files changed, 145 insertions(+), 37 deletions(-)
> 
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 30f69b2cb5..17221e422a 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -20,6 +20,7 @@
>  #include "qom/object_interfaces.h"
>  #include "qemu/mmap-alloc.h"
>  #include "qemu/madvise.h"
> +#include "hw/qdev-core.h"
>  
>  #ifdef CONFIG_NUMA
>  #include 
> @@ -237,7 +238,7 @@ static void host_memory_backend_set_prealloc(Object *obj, 
> bool value,
>  uint64_t sz = memory_region_size(>mr);
>  
>  if (!qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
> -   backend->prealloc_context, errp)) {
> +   backend->prealloc_context, false, errp)) {
>  return;
>  }
>  backend->prealloc = true;
> @@ -323,6 +324,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
> Error **errp)
>  HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
>  void *ptr;
>  uint64_t sz;
> +bool async = !phase_check(PHASE_LATE_BACKENDS_CREATED);
>  
>  if (!bc->alloc) {
>  return;
> @@ -398,7 +400,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
> Error **errp)
>  if (backend->prealloc && 
> !qemu_prealloc_mem(memory_region_get_fd(>mr),
>  ptr, sz,
>  backend->prealloc_threads,
> -backend->prealloc_context, 
> errp)) {
> +backend->prealloc_context,
> +async, errp)) {
>  return;
>  }
>  }
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 99ab989852..ffd119ebac 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -605,7 +605,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, 
> uint64_t start_gpa,
>  int fd = memory_region_get_fd(>memdev->mr);
>  Error *local_err = NULL;
>  
> -if (!qemu_prealloc_mem(fd, area, size, 1, NULL, _err)) {
> +if (!qemu_prealloc_mem(fd, area, size, 1, NULL, false, _err)) {
>  static bool warned;
>  
>  /*
> @@ -1248,7 +1248,7 @@ static int virtio_mem_prealloc_range_cb(VirtIOMEM 
> *vmem, void *arg,
>  int fd = memory_region_get_fd(>memdev->mr);
>  Error *local_err = NULL;
>  
> -if (!qemu_prealloc_mem(fd, area, size, 1, NULL, _err)) {
> +if (!qemu_prealloc_mem(fd, area, size, 1, NULL, false, _err)) {
>  error_report_err(local_err);
>  return -ENOMEM;
>  }
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 151d968238..83dd9e2485 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -1071,6 +1071,11 @@ typedef enum MachineInitPhase {
>   */
>  PHASE_ACCEL_CREATED,
>  
> +/*
> + * Late backend objects have been created and initialized.
> + */
> +PHASE_LATE_BACKENDS_CREATED,
> +
>  /*
>   * machine_class->init has been called, thus creating any embedded
>   * devices and validating machine properties.  Devices created at
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index c9692cc314..7d359dabc4 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -680,6 +680,8 @@ typedef struct ThreadContext ThreadContext;
>   * @area: start address of the are to preallocate
>   * @sz: the size of the area to preallocate
>   * @max_threads: maximum number of threads to use
> + * @tc: prealloc context threads pointer, NULL if not in use
> + * @async: request asynchronous preallocation, requires @tc
>   * @errp: returns an error if this function fails
>   *
>   * Preallocate memory (populate/prefault page tables writable) for the 
> virtual
> @@ -687,10 +689,24 @@ typedef struct ThreadContext ThreadContext;
>   * each page in the area was faulted in writable at least once, for example,
>   * after allocating file blocks for mapped files.
>   *
> + * When setting @async, allocation might be performed asynchronously.
> + * 

Re: [PATCH v3 25/33] tests/tcg: Extend file in linux-madvise.c

2024-02-03 Thread Richard Henderson

On 1/30/24 23:13, Ilya Leoshkevich wrote:

Is there a way to set the guest page size from
the command line?


No.  Before this patch set, guest page size was a compile-time constant.  Afterward, it 
would be possible for any TARGET_PAGE_BITS_VARY target.


I refrained from adding such a command-line switch because, other than selecting the host 
page size, it only leads to breakage.  That doesn't seem helpful, even for developers.



r~