date:20240131

Re: Dynamic & heterogeneous machines, initial configuration: problems

2024-01-31 Thread Zhao Liu

Hi Markus,

On Wed, Jan 31, 2024 at 09:14:21PM +0100, Markus Armbruster wrote:
> Date: Wed, 31 Jan 2024 21:14:21 +0100
> From: Markus Armbruster 
> Subject: Dynamic & heterogeneous machines, initial configuration: problems
> 
> This memo is the fruit of discussions with Philippe Mathieu-Daudé.
> Its errors are mine.
> 
> QEMU defines machines statically in C code.  We've long wished we could
> define them dynamically in some suitable DSL.  This is what we call
> "dynamic machines".
> 
> There's a need for machines that contain more than one target's CPUs.
> This is what we call "heterogeneous machines".  They require a single
> binary capable of any of the targets involved.
> 
> There's substantial overlap with a seemingly unrelated problem:
> machine-friendly initial configuration.
> 
> To keep the memo's length in check (sort of), it focuses on (known)
> problems.
> 
> 
> = Problem 1: Initial configuration =
> 
> Previously discussed in
> 
> Subject: Redesign of QEMU startup & initial configuration
> Date: Thu, 02 Dec 2021 07:57:38 +0100
> Message-ID: <87lf13cx3x@dusky.pond.sub.org>
> 
> 
> == What users want for initial configuration ==
> 
> 1. QMP only
> 
>    Management applications need to use QMP for monitoring anyway.  They
>    may want to use it for initial configuration, too.  Libvirt does.
> 
>    They still need to bootstrap a QMP monitor, and for that, CLI is fine
>    as long as it's simple and stable.
> 
> 2. CLI and configuration files
> 
>    Human users want a CLI and configuration files.
> 
>    CLI is good for quick tweaks, and to explore.
> 
>    For more permanent, non-trivial configuration, configuration files
>    are more suitable, because they are easier to read, edit, and
>    document than long command lines.
> 
> 
> == What we have for initial configuration ==
> 
> Half of 1. and half of 2., satisfying nobody's needs.
> 
> Management applications need to do a lot of non-trivial initial
> configuration with the CLI.
> 
> Human users struggle with inconsistent syntax, insufficiently expressive
> configuration files, and huge command lines.
> 
> 
> = Problem 2: Defining machines =
> 
> This is how I understand the problem.  Please correct me where I'm off.
> 
> 
> == How we'd like to build machines ==
> 
> We want to build machines declaratively, by configuring devices and
> their connections.
> 
> We want to build composite devices the same way.
> 
> The non-composite devices are provided by the QEMU binary.
> 
> Users want to build machines as variations of canned machine types
> shipped with QEMU.  Some users may want to build their own machines from
> scratch.
> 
> To enable all this, machine configuration needs to be composable and
> dynamic.
> 
> Composable means configuration can be assembled from components,
> recursively.
> 
> Dynamic means it can be done during qemu-system-FOO initial
> configuration.
> 
> 
> == What we have for defining machines ==
> 
> A QEMU binary provides a fixed set of device types, some of them
> composite, and a fixed set of machine types.
> 
> Machines are QOM objects: instance of a concrete subtype of "machine".
> 
> Devices are usually QOM objects: instance of a concrete subtype of
> "device".  Exceptions remain in old code nobody can be bothered to
> update.
> 
> Both machine types and composite devices are built from devices
> by code, i.e. imperatively, not declaratively.
> 
> The code can be parameterized.  For QOM objects, parameters should be
> QOM properties, but machine type code additionally uses global old-style
> configuration such as -drive and -serial.
> 
> Code may create default backends for convenience.  Machine type code may
> also create backends to honor global old-style configuration.  Only some
> backends are QOM objects.
> 
> Machine types split their code between object creation (QOM methods
> .instance_init() and .instance_post_init()) and machine initialization
> (MachineClass method .init()).  However, basically everything is done in
> the latter.
> 
> QOM device types split their code between object creation and device
> realization (qdev method .realize()).  The actual split varies widely
> between devices.  Developers are commonly unsure what to put where.
> 
> After machine type code is done, the resulting machine can still be
> adjusted with device cold plug and unplug: -device, device_add,
> device_del.  Only works for a subset of the devices.
> 
> Related, but out of scope here: hot plug and unplug.
> 
> 
> = Common sub-problem: qemu-system initial startup =
> 
> QAPI/QMP is our most capable, flexible, and mature configuration
> interface.  We need to offer machine-friendly initial configuration via
> QMP, and we'd very much like to have a QAPI-based CLI and configuration
> files (see "What users want for initial configuration" above).
> 
> Dynamic machine configuration happens during initial startup.  This
> makes it part of the larger initial configuration problem.  We want an
> integrated

[PULL 1/1] target/loongarch: Fix qtest test-hmp error when KVM-only build

2024-01-31 Thread Song Gao

cc->sysemu_ops->get_phys_page_debug is NULL in a KVM-only build, which
makes the qtest test-hmp case fail.  This patch fixes it by always
providing the hook.
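
A minimal sketch of the failure mode, assuming a simplified ops table.
All names below (DemoSysemuOps, demo_phys_page, ...) are hypothetical
and are not QEMU's definitions; the guarded caller is shown only for
contrast with an unguarded one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-CPU ops table; not QEMU's SysemuCPUOps. */
typedef struct DemoSysemuOps {
    int64_t (*get_phys_page_debug)(uint64_t vaddr);
} DemoSysemuOps;

/* Toy translation: identity-map the 4 KiB page that contains vaddr. */
static int64_t demo_get_phys_page_debug(uint64_t vaddr)
{
    return (int64_t)(vaddr & ~UINT64_C(0xfff));
}

/* What the table looks like when the hook is compiled out by an #ifdef. */
static const DemoSysemuOps demo_ops_hook_missing = {
    .get_phys_page_debug = NULL,
};

/* What the table looks like when the hook is always provided. */
static const DemoSysemuOps demo_ops_hook_present = {
    .get_phys_page_debug = demo_get_phys_page_debug,
};

/*
 * A caller that guards against a missing hook and reports -1.  A caller
 * without this guard would jump through a NULL function pointer.
 */
static int64_t demo_phys_page(const DemoSysemuOps *ops, uint64_t vaddr)
{
    assert(ops != NULL);
    if (ops->get_phys_page_debug == NULL) {
        return -1;
    }
    return ops->get_phys_page_debug(vaddr);
}
```

Providing the hook unconditionally, as this patch does, means callers
do not need such a guard.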

Signed-off-by: Song Gao 
Tested-by: Bibo Mao 
Message-Id: <20240125061401.52526-1-gaos...@loongson.cn>
---
 target/loongarch/cpu.c|   2 -
 target/loongarch/cpu_helper.c | 231 ++
 target/loongarch/internals.h  |  20 ++-
 target/loongarch/meson.build  |   1 +
 target/loongarch/tcg/tlb_helper.c | 230 -
 5 files changed, 250 insertions(+), 234 deletions(-)
 create mode 100644 target/loongarch/cpu_helper.c

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index fb8dde7def..76eb4961d5 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -752,9 +752,7 @@ static const TCGCPUOps loongarch_tcg_ops = {
 #include "hw/core/sysemu-cpu-ops.h"
 
 static const struct SysemuCPUOps loongarch_sysemu_ops = {
-#ifdef CONFIG_TCG
 .get_phys_page_debug = loongarch_cpu_get_phys_page_debug,
-#endif
 };
 
 static int64_t loongarch_cpu_get_arch_id(CPUState *cs)
diff --git a/target/loongarch/cpu_helper.c b/target/loongarch/cpu_helper.c
new file mode 100644
index 0000000000..f68d63f466
--- /dev/null
+++ b/target/loongarch/cpu_helper.c
@@ -0,0 +1,231 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * LoongArch CPU helpers for qemu
+ *
+ * Copyright (c) 2024 Loongson Technology Corporation Limited
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "internals.h"
+#include "cpu-csr.h"
+
+static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
+   int *prot, target_ulong address,
+   int access_type, int index, int mmu_idx)
+{
+LoongArchTLB *tlb = &env->tlb[index];
+uint64_t plv = mmu_idx;
+uint64_t tlb_entry, tlb_ppn;
+uint8_t tlb_ps, n, tlb_v, tlb_d, tlb_plv, tlb_nx, tlb_nr, tlb_rplv;
+
+if (index >= LOONGARCH_STLB) {
+tlb_ps = FIELD_EX64(tlb->tlb_misc, TLB_MISC, PS);
+} else {
+tlb_ps = FIELD_EX64(env->CSR_STLBPS, CSR_STLBPS, PS);
+}
+n = (address >> tlb_ps) & 0x1;/* Odd or even */
+
+tlb_entry = n ? tlb->tlb_entry1 : tlb->tlb_entry0;
+tlb_v = FIELD_EX64(tlb_entry, TLBENTRY, V);
+tlb_d = FIELD_EX64(tlb_entry, TLBENTRY, D);
+tlb_plv = FIELD_EX64(tlb_entry, TLBENTRY, PLV);
+if (is_la64(env)) {
+tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_64, PPN);
+tlb_nx = FIELD_EX64(tlb_entry, TLBENTRY_64, NX);
+tlb_nr = FIELD_EX64(tlb_entry, TLBENTRY_64, NR);
+tlb_rplv = FIELD_EX64(tlb_entry, TLBENTRY_64, RPLV);
+} else {
+tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_32, PPN);
+tlb_nx = 0;
+tlb_nr = 0;
+tlb_rplv = 0;
+}
+
+/* Remove sw bit between bit12 -- bit PS*/
+tlb_ppn = tlb_ppn & ~(((0x1UL << (tlb_ps - 12)) -1));
+
+/* Check access rights */
+if (!tlb_v) {
+return TLBRET_INVALID;
+}
+
+if (access_type == MMU_INST_FETCH && tlb_nx) {
+return TLBRET_XI;
+}
+
+if (access_type == MMU_DATA_LOAD && tlb_nr) {
+return TLBRET_RI;
+}
+
+if (((tlb_rplv == 0) && (plv > tlb_plv)) ||
+((tlb_rplv == 1) && (plv != tlb_plv))) {
+return TLBRET_PE;
+}
+
+if ((access_type == MMU_DATA_STORE) && !tlb_d) {
+return TLBRET_DIRTY;
+}
+
+*physical = (tlb_ppn << R_TLBENTRY_64_PPN_SHIFT) |
+(address & MAKE_64BIT_MASK(0, tlb_ps));
+*prot = PAGE_READ;
+if (tlb_d) {
+*prot |= PAGE_WRITE;
+}
+if (!tlb_nx) {
+*prot |= PAGE_EXEC;
+}
+return TLBRET_MATCH;
+}
+
+/*
+ * One tlb entry holds an adjacent odd/even pair, the vpn is the
+ * content of the virtual page number divided by 2. So the
+ * compare vpn is bit[47:15] for 16KiB page. while the vppn
+ * field in tlb entry contains bit[47:13], so need adjust.
+ * virt_vpn = vaddr[47:13]
+ */
+bool loongarch_tlb_search(CPULoongArchState *env, target_ulong vaddr,
+  int *index)
+{
+LoongArchTLB *tlb;
+uint16_t csr_asid, tlb_asid, stlb_idx;
+uint8_t tlb_e, tlb_ps, tlb_g, stlb_ps;
+int i, compare_shift;
+uint64_t vpn, tlb_vppn;
+
+csr_asid = FIELD_EX64(env->CSR_ASID, CSR_ASID, ASID);
+stlb_ps = FIELD_EX64(env->CSR_STLBPS, CSR_STLBPS, PS);
+vpn = (vaddr & TARGET_VIRT_MASK) >> (stlb_ps + 1);
+stlb_idx = vpn & 0xff; /* VA[25:15] <==> TLBIDX.index for 16KiB Page */
+compare_shift = stlb_ps + 1 - R_TLB_MISC_VPPN_SHIFT;
+
+/* Search STLB */
+for (i = 0; i < 8; ++i) {
+tlb = &env->tlb[i * 256 + stlb_idx];
+tlb_e = FIELD_EX64(tlb->tlb_misc, TLB_MISC, E);
+if (tlb_e) {
+tlb_vppn = FIELD_EX64(tlb->tlb_misc, TLB_MISC, VPPN);
+tlb_asid = FIELD_EX64(tlb->tlb_misc, TLB_MISC, ASID);
+tlb_g = FIELD_EX64(tlb->tlb_entry0, TLBENTRY, G);
+
+if ((tlb_g == 1 || tlb_asid == csr_asid) &&
+(vpn == (tlb_vppn >>

[PULL 0/1] loongarch-to-apply queue

2024-01-31 Thread Song Gao

The following changes since commit bd2e12310b18b51aefbf834e6d54989fd175976f:

  Merge tag 'qga-pull-2024-01-30' of https://github.com/kostyanf14/qemu into staging (2024-01-30 15:53:46 +0000)

are available in the Git repository at:

  https://gitlab.com/gaosong/qemu.git tags/pull-loongarch-20240201

for you to fetch changes up to 27edd5040cae63bfa92c68f69883ba81aa3b6cda:

  target/loongarch: Fix qtest test-hmp error when KVM-only build (2024-02-01 15:29:40 +0800)


pull-loongarch-20240201


Song Gao (1):
  target/loongarch: Fix qtest test-hmp error when KVM-only build

 target/loongarch/cpu.c|   2 -
 target/loongarch/cpu_helper.c | 231 ++
 target/loongarch/internals.h  |  20 +++-
 target/loongarch/meson.build  |   1 +
 target/loongarch/tcg/tlb_helper.c | 230 -
 5 files changed, 250 insertions(+), 234 deletions(-)
 create mode 100644 target/loongarch/cpu_helper.c

[PATCH rfcv2 13/18] intel_iommu: Extract out vtd_cap_init to initialize cap/ecap

2024-01-31 Thread Zhenzhong Duan

This is a prerequisite for host cap/ecap sync.

No functional change intended.

Reviewed-by: Eric Auger 
Signed-off-by: Zhenzhong Duan 
---
 hw/i386/intel_iommu.c | 93 ---
 1 file changed, 51 insertions(+), 42 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 9b62441439..ffa1ad6429 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4003,30 +4003,10 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
 return;
 }
 
-/* Do the initialization. It will also be called when reset, so pay
- * attention when adding new initialization stuff.
- */
-static void vtd_init(IntelIOMMUState *s)
+static void vtd_cap_init(IntelIOMMUState *s)
 {
 X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
 
-memset(s->csr, 0, DMAR_REG_SIZE);
-memset(s->wmask, 0, DMAR_REG_SIZE);
-memset(s->w1cmask, 0, DMAR_REG_SIZE);
-memset(s->womask, 0, DMAR_REG_SIZE);
-
-s->root = 0;
-s->root_scalable = false;
-s->dmar_enabled = false;
-s->intr_enabled = false;
-s->iq_head = 0;
-s->iq_tail = 0;
-s->iq = 0;
-s->iq_size = 0;
-s->qi_enabled = false;
-s->iq_last_desc_type = VTD_INV_DESC_NONE;
-s->iq_dw = false;
-s->next_frcd_reg = 0;
 s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
  VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
  VTD_CAP_MGAW(s->aw_bits);
@@ -4043,27 +4023,6 @@ static void vtd_init(IntelIOMMUState *s)
 }
 s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
-/*
- * Rsvd field masks for spte
- */
-vtd_spte_rsvd[0] = ~0ULL;
-vtd_spte_rsvd[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits,
-  x86_iommu->dt_supported);
-vtd_spte_rsvd[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
-vtd_spte_rsvd[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
-vtd_spte_rsvd[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
-
-vtd_spte_rsvd_large[2] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits,
- x86_iommu->dt_supported);
-vtd_spte_rsvd_large[3] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits,
- x86_iommu->dt_supported);
-
-if (s->scalable_mode || s->snoop_control) {
-vtd_spte_rsvd[1] &= ~VTD_SPTE_SNP;
-vtd_spte_rsvd_large[2] &= ~VTD_SPTE_SNP;
-vtd_spte_rsvd_large[3] &= ~VTD_SPTE_SNP;
-}
-
 if (x86_iommu_ir_supported(x86_iommu)) {
 s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
 if (s->intr_eim == ON_OFF_AUTO_ON) {
@@ -4096,6 +4055,56 @@ static void vtd_init(IntelIOMMUState *s)
 if (s->pasid) {
 s->ecap |= VTD_ECAP_PASID;
 }
+}
+
+/*
+ * Do the initialization. It will also be called when reset, so pay
+ * attention when adding new initialization stuff.
+ */
+static void vtd_init(IntelIOMMUState *s)
+{
+X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
+
+memset(s->csr, 0, DMAR_REG_SIZE);
+memset(s->wmask, 0, DMAR_REG_SIZE);
+memset(s->w1cmask, 0, DMAR_REG_SIZE);
+memset(s->womask, 0, DMAR_REG_SIZE);
+
+s->root = 0;
+s->root_scalable = false;
+s->dmar_enabled = false;
+s->intr_enabled = false;
+s->iq_head = 0;
+s->iq_tail = 0;
+s->iq = 0;
+s->iq_size = 0;
+s->qi_enabled = false;
+s->iq_last_desc_type = VTD_INV_DESC_NONE;
+s->iq_dw = false;
+s->next_frcd_reg = 0;
+
+vtd_cap_init(s);
+
+/*
+ * Rsvd field masks for spte
+ */
+vtd_spte_rsvd[0] = ~0ULL;
+vtd_spte_rsvd[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits,
+  x86_iommu->dt_supported);
+vtd_spte_rsvd[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
+vtd_spte_rsvd[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
+vtd_spte_rsvd[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
+
+vtd_spte_rsvd_large[2] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits,
+x86_iommu->dt_supported);
+vtd_spte_rsvd_large[3] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits,
+x86_iommu->dt_supported);
+
+if (s->scalable_mode || s->snoop_control) {
+vtd_spte_rsvd[1] &= ~VTD_SPTE_SNP;
+vtd_spte_rsvd_large[2] &= ~VTD_SPTE_SNP;
+vtd_spte_rsvd_large[3] &= ~VTD_SPTE_SNP;
+}
 
 vtd_reset_caches(s);
 
-- 
2.34.1

[PATCH rfcv2 11/18] intel_iommu: Add set/unset_iommu_device callback

2024-01-31 Thread Zhenzhong Duan

From: Yi Liu 

This adds a set/unset_iommu_device() implementation to the Intel vIOMMU.
In the set call, a pointer to the host IOMMU device info is stored in a
hash table indexed by PCI BDF.
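
The bookkeeping can be sketched without GLib as a small fixed-size
table keyed by (bus pointer, devfn).  This is only an illustration under
stated assumptions: DemoDevKey, DemoDevTable and the demo_* helpers are
hypothetical names, and the patch itself uses a GHashTable instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: opaque bus pointer plus devfn; not QEMU's vtd_as_key. */
typedef struct DemoDevKey {
    const void *bus;
    uint8_t devfn;
} DemoDevKey;

#define DEMO_SLOTS 16

typedef struct DemoDevTable {
    DemoDevKey keys[DEMO_SLOTS];
    void *vals[DEMO_SLOTS];
    bool used[DEMO_SLOTS];
} DemoDevTable;

/* Two keys match only if both the bus and the devfn match. */
static bool demo_key_equal(const DemoDevKey *a, const DemoDevKey *b)
{
    return a->bus == b->bus && a->devfn == b->devfn;
}

/* Hashing the pointer value itself is intended here. */
static size_t demo_key_hash(const DemoDevKey *k)
{
    return (size_t)((((uintptr_t)k->bus) >> 4) ^ k->devfn) % DEMO_SLOTS;
}

/* Return the slot holding the key, or -1.  Scans every slot, so removal
 * needs no tombstones. */
static int demo_find(const DemoDevTable *t, const DemoDevKey *k)
{
    size_t start = demo_key_hash(k);
    for (size_t n = 0; n < DEMO_SLOTS; n++) {
        size_t i = (start + n) % DEMO_SLOTS;
        if (t->used[i] && demo_key_equal(&t->keys[i], k)) {
            return (int)i;
        }
    }
    return -1;
}

/* Register val; -1 if the key already exists, -2 if the table is full. */
static int demo_set(DemoDevTable *t, const void *bus, uint8_t devfn, void *val)
{
    DemoDevKey k = { .bus = bus, .devfn = devfn };
    size_t start = demo_key_hash(&k);

    assert(val != NULL);
    if (demo_find(t, &k) >= 0) {
        return -1;
    }
    for (size_t n = 0; n < DEMO_SLOTS; n++) {
        size_t i = (start + n) % DEMO_SLOTS;
        if (!t->used[i]) {
            t->keys[i] = k;
            t->vals[i] = val;
            t->used[i] = true;
            return 0;
        }
    }
    return -2;
}

/* Look up the registered value, or NULL. */
static void *demo_get(const DemoDevTable *t, const void *bus, uint8_t devfn)
{
    DemoDevKey k = { .bus = bus, .devfn = devfn };
    int i = demo_find(t, &k);
    return i >= 0 ? t->vals[i] : NULL;
}

/* Forget the entry; a missing entry is silently ignored. */
static void demo_unset(DemoDevTable *t, const void *bus, uint8_t devfn)
{
    DemoDevKey k = { .bus = bus, .devfn = devfn };
    int i = demo_find(t, &k);
    if (i >= 0) {
        t->used[i] = false;
        t->vals[i] = NULL;
    }
}
```

A second set for the same (bus, devfn) is rejected, which mirrors the
-EEXIST path in the patch.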

Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 hw/i386/intel_iommu_internal.h | 14 +++
 include/hw/i386/intel_iommu.h  |  2 +
 hw/i386/intel_iommu.c  | 74 ++
 3 files changed, 90 insertions(+)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f8cf99bddf..3301f54b35 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -28,6 +28,8 @@
 #ifndef HW_I386_INTEL_IOMMU_INTERNAL_H
 #define HW_I386_INTEL_IOMMU_INTERNAL_H
 #include "hw/i386/intel_iommu.h"
+#include "sysemu/host_iommu_device.h"
+#include "hw/vfio/vfio-common.h"
 
 /*
  * Intel IOMMU register specification
@@ -537,4 +539,16 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_SL_IGN_COM  0xbff0000000000000ULL
 #define VTD_SL_TM   (1ULL << 62)
 
+
+typedef struct VTDHostIOMMUDevice {
+IntelIOMMUState *iommu_state;
+PCIBus *bus;
+uint8_t devfn;
+union {
+HostIOMMUDevice *dev;
+IOMMULegacyDevice *ldev;
+IOMMUFDDevice *idev;
+};
+QLIST_ENTRY(VTDHostIOMMUDevice) next;
+} VTDHostIOMMUDevice;
 #endif
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 7fa0a695c8..bbc7b96add 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -292,6 +292,8 @@ struct IntelIOMMUState {
 /* list of registered notifiers */
 QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
 
+GHashTable *vtd_host_iommu_dev; /* VTDHostIOMMUDevice */
+
 /* interrupt remapping */
 bool intr_enabled;  /* Whether guest enabled IR */
 dma_addr_t intr_root;   /* Interrupt remapping table pointer */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1a07faddb4..9b62441439 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -237,6 +237,13 @@ static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
(key1->pasid == key2->pasid);
 }
 
+static gboolean vtd_as_idev_equal(gconstpointer v1, gconstpointer v2)
+{
+const struct vtd_as_key *key1 = v1;
+const struct vtd_as_key *key2 = v2;
+
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+}
 /*
  * Note that we use pointer to PCIBus as the key, so hashing/shifting
  * based on the pointer value is intended. Note that we deal with
@@ -3812,6 +3819,68 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
 return vtd_dev_as;
 }
 
+static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
+HostIOMMUDevice *base_dev, Error **errp)
+{
+IntelIOMMUState *s = opaque;
+VTDHostIOMMUDevice *vtd_hdev;
+struct vtd_as_key key = {
+.bus = bus,
+.devfn = devfn,
+};
+struct vtd_as_key *new_key;
+
+assert(base_dev);
+
+vtd_iommu_lock(s);
+
+vtd_hdev = g_hash_table_lookup(s->vtd_host_iommu_dev, &key);
+
+if (vtd_hdev) {
+error_setg(errp, "IOMMUFD device already exist");
+vtd_iommu_unlock(s);
+return -EEXIST;
+}
+
+vtd_hdev = g_malloc0(sizeof(VTDHostIOMMUDevice));
+vtd_hdev->bus = bus;
+vtd_hdev->devfn = (uint8_t)devfn;
+vtd_hdev->iommu_state = s;
+vtd_hdev->dev = base_dev;
+
+new_key = g_malloc(sizeof(*new_key));
+new_key->bus = bus;
+new_key->devfn = devfn;
+
+g_hash_table_insert(s->vtd_host_iommu_dev, new_key, vtd_hdev);
+
+vtd_iommu_unlock(s);
+
+return 0;
+}
+
+static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
+{
+IntelIOMMUState *s = opaque;
+VTDHostIOMMUDevice *vtd_hdev;
+struct vtd_as_key key = {
+.bus = bus,
+.devfn = devfn,
+};
+
+vtd_iommu_lock(s);
+
+vtd_hdev = g_hash_table_lookup(s->vtd_host_iommu_dev, &key);
+if (!vtd_hdev) {
+vtd_iommu_unlock(s);
+return;
+}
+
+g_hash_table_remove(s->vtd_host_iommu_dev, &key);
+
+vtd_iommu_unlock(s);
+}
+
 /* Unmap the whole range in the notifier's scope. */
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 {
@@ -4107,6 +4176,8 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
 static PCIIOMMUOps vtd_iommu_ops = {
 .get_address_space = vtd_host_dma_iommu,
+.set_iommu_device = vtd_dev_set_iommu_device,
+.unset_iommu_device = vtd_dev_unset_iommu_device,
 };
 
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
@@ -4230,6 +4301,9 @@ static void vtd_realize(DeviceState *dev, Error **errp)
  g_free, g_free);
 s->vtd_address_spaces = g_hash_table_new_full(vtd_as_hash, vtd_as_equal,
   g_free, g_free);
+

[PATCH rfcv2 16/18] intel_iommu: Implement check and sync mechanism in iommufd mode

2024-01-31 Thread Zhenzhong Duan

We use cap_frozen to mark whether cap/ecap are still writable or have
become read-only.  At init stage, we allow cap/ecap to be updated based
on the host IOMMU's cap/ecap.  Once machine creation is done, cap_frozen
is set and we only allow checking cap/ecap for compatibility.

Currently only stage-2 translation is supported, which is backed by a
shadow page table on the host side, so we don't need an exact match of
every cap/ecap bit between vIOMMU and host.  However, we can at least
ensure that the address widths are compatible, i.e. vIOMMU's
mgaw <= host IOMMU's mgaw, a check that was missing before.

When stage-1 translation (a.k.a. scalable modern mode) is supported in
the future, this mechanism will be extended to check more bits.
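
The mgaw check can be sketched as follows.  This is a hedged
illustration, not the patch's code: the demo_* names are hypothetical,
and the sketch assumes the VT-d convention that the 6-bit MGAW field at
bits 21:16 of the capability register holds "width - 1".  It applies
that encoding in both directions rather than reproducing the exact
arithmetic of the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MGAW_SHIFT 16
#define DEMO_MGAW_MASK  (UINT64_C(0x3f) << DEMO_MGAW_SHIFT)

/* Encode an address width in bits into the MGAW field (width - 1). */
static uint64_t demo_cap_mgaw(unsigned aw)
{
    assert(aw >= 1 && aw <= 64);
    return ((uint64_t)(aw - 1) & 0x3f) << DEMO_MGAW_SHIFT;
}

/* Decode the MGAW field back into an address width in bits. */
static unsigned demo_mgaw_from_cap(uint64_t cap)
{
    return (unsigned)((cap >> DEMO_MGAW_SHIFT) & 0x3f) + 1;
}

/*
 * Make sure the vIOMMU width does not exceed the host width.
 * Before the freeze, the vIOMMU width is lowered to the host width.
 * After the freeze, a mismatch is an error and *vcap is left untouched.
 * Returns 0 on success, -1 on an incompatible frozen configuration.
 */
static int demo_sync_mgaw(uint64_t *vcap, uint64_t host_cap, bool frozen)
{
    unsigned viommu_mgaw = demo_mgaw_from_cap(*vcap);
    unsigned host_mgaw = demo_mgaw_from_cap(host_cap);

    if (viommu_mgaw <= host_mgaw) {
        return 0;
    }
    if (frozen) {
        return -1;
    }
    *vcap = (*vcap & ~DEMO_MGAW_MASK) | demo_cap_mgaw(host_mgaw);
    return 0;
}
```

Only the MGAW field is rewritten; all other capability bits are kept as
they are.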

Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 hw/i386/intel_iommu_internal.h |  1 +
 include/hw/i386/intel_iommu.h  |  1 +
 hw/i386/intel_iommu.c  | 29 +
 3 files changed, 31 insertions(+)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 3301f54b35..33d2298dce 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -206,6 +206,7 @@
 #define VTD_DOMAIN_ID_MASK  ((1UL << VTD_DOMAIN_ID_SHIFT) - 1)
 #define VTD_CAP_ND  (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
 #define VTD_ADDRESS_SIZE(aw)        (1ULL << (aw))
+#define VTD_CAP_MGAW_MASK   (0x3fULL << 16)
 #define VTD_CAP_MGAW(aw)            ((((aw) - 1) & 0x3fULL) << 16)
 #define VTD_MAMV                    18ULL
 #define VTD_CAP_MAMV                (VTD_MAMV << 48)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index c71a133820..a0b530ebc6 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -47,6 +47,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(IntelIOMMUState, INTEL_IOMMU_DEVICE)
 #define VTD_HOST_AW_48BIT   48
 #define VTD_HOST_ADDRESS_WIDTH  VTD_HOST_AW_39BIT
 #define VTD_HAW_MASK(aw)            ((1ULL << (aw)) - 1)
+#define VTD_MGAW_FROM_CAP(cap)  (((cap >> 16) & 0x3fULL) + 1)
 
 #define DMAR_REPORT_F_INTR  (1)
 
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 7ed2b79669..409f8a59c3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -35,6 +35,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/dma.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/iommufd.h"
 #include "hw/i386/apic_internal.h"
 #include "kvm/kvm_i386.h"
 #include "migration/vmstate.h"
@@ -3830,6 +3831,34 @@ static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
   IOMMUFDDevice *idev,
   Error **errp)
 {
+struct iommu_hw_info_vtd vtd;
+enum iommu_hw_info_type type = IOMMU_HW_INFO_TYPE_INTEL_VTD;
+long host_mgaw, viommu_mgaw = VTD_MGAW_FROM_CAP(s->cap);
+uint64_t tmp_cap = s->cap;
+int ret;
+
+ret = iommufd_device_get_info(idev, &type, sizeof(vtd), &vtd, errp);
+if (ret) {
+return ret;
+}
+
+if (type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
+error_setg(errp, "IOMMU hardware is not compatible");
+return -EINVAL;
+}
+
+host_mgaw = VTD_MGAW_FROM_CAP(vtd.cap_reg);
+if (viommu_mgaw > host_mgaw) {
+if (s->cap_frozen) {
+error_setg(errp, "mgaw %" PRId64 " > host mgaw %" PRId64,
+   viommu_mgaw, host_mgaw);
+return -EINVAL;
+}
+tmp_cap &= ~VTD_CAP_MGAW_MASK;
+tmp_cap |= VTD_CAP_MGAW(host_mgaw + 1);
+}
+
+s->cap = tmp_cap;
 return 0;
 }
 
-- 
2.34.1

[PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated

2024-01-31 Thread Zhenzhong Duan

When a VFIO device is present and the vIOMMU cap/ecap is updated based
on the host IOMMU cap/ecap, migration should be blocked.

Signed-off-by: Zhenzhong Duan 
---
 hw/i386/intel_iommu.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 72cc8b2c71..7f9ff653b2 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -39,6 +39,7 @@
 #include "hw/i386/apic_internal.h"
 #include "kvm/kvm_i386.h"
 #include "migration/vmstate.h"
+#include "migration/blocker.h"
 #include "trace.h"
 
 #define S_AW_BITS (VTD_MGAW_FROM_CAP(s->cap) + 1)
@@ -3829,6 +3830,8 @@ static int vtd_check_legacy_hdev(IntelIOMMUState *s,
 return 0;
 }
 
+static Error *vtd_mig_blocker;
+
 static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
   IOMMUFDDevice *idev,
   Error **errp)
@@ -3860,8 +3863,17 @@ static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
 tmp_cap |= VTD_CAP_MGAW(host_mgaw + 1);
 }
 
-s->cap = tmp_cap;
-return 0;
+if (s->cap != tmp_cap) {
+if (vtd_mig_blocker == NULL) {
+error_setg(&vtd_mig_blocker,
+   "cap/ecap update from host IOMMU block migration");
+ret = migrate_add_blocker(&vtd_mig_blocker, errp);
+}
+if (!ret) {
+s->cap = tmp_cap;
+}
+}
+return ret;
 }
 
 static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
-- 
2.34.1

[PATCH rfcv2 01/18] Introduce a common abstract struct HostIOMMUDevice

2024-01-31 Thread Zhenzhong Duan

HostIOMMUDevice is a common abstract base struct.  It is currently
inherited by two subclasses, legacy and iommufd.

Introduce a helper function host_iommu_base_device_init() to initialize it.
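
How a subclass might embed and initialize the base can be sketched like
this.  The Demo* types and their extra fields (group_fd, devid) are
hypothetical and chosen only for illustration; the sketch assumes the
base is the first member so that a pointer to the base can be cast back
to the subclass.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoHostDevType {
    DEMO_HID_LEGACY,
    DEMO_HID_IOMMUFD,
} DemoHostDevType;

/* Common base: records which subclass this is and how big it is. */
typedef struct DemoHostDev {
    DemoHostDevType type;
    size_t size;
} DemoHostDev;

/* Hypothetical subclasses; the base must stay the first member. */
typedef struct DemoLegacyDev {
    DemoHostDev base;
    int group_fd;
} DemoLegacyDev;

typedef struct DemoIommufdDev {
    DemoHostDev base;
    int devid;
} DemoIommufdDev;

/* Mirrors the shape of the helper introduced by this patch. */
static void demo_base_init(DemoHostDev *dev, DemoHostDevType type, size_t size)
{
    assert(dev != NULL);
    dev->type = type;
    dev->size = size;
}

/* Checked downcast: NULL unless dev really is an iommufd device. */
static DemoIommufdDev *demo_as_iommufd(DemoHostDev *dev)
{
    assert(dev != NULL);
    if (dev->type != DEMO_HID_IOMMUFD || dev->size != sizeof(DemoIommufdDev)) {
        return NULL;
    }
    return (DemoIommufdDev *)dev;
}
```

Storing the size next to the type lets generic code sanity-check or copy
a device without knowing the concrete subclass.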

Suggested-by: Eric Auger 
Signed-off-by: Zhenzhong Duan 
---
 include/sysemu/host_iommu_device.h | 22 ++
 1 file changed, 22 insertions(+)
 create mode 100644 include/sysemu/host_iommu_device.h

diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
new file mode 100644
index 0000000000..fe80ab25fb
--- /dev/null
+++ b/include/sysemu/host_iommu_device.h
@@ -0,0 +1,22 @@
+#ifndef HOST_IOMMU_DEVICE_H
+#define HOST_IOMMU_DEVICE_H
+
+typedef enum HostIOMMUDevice_Type {
+HID_LEGACY,
+HID_IOMMUFD,
+HID_MAX,
+} HostIOMMUDevice_Type;
+
+typedef struct HostIOMMUDevice {
+HostIOMMUDevice_Type type;
+size_t size;
+} HostIOMMUDevice;
+
+static inline void host_iommu_base_device_init(HostIOMMUDevice *dev,
+   HostIOMMUDevice_Type type,
+   size_t size)
+{
+dev->type = type;
+dev->size = size;
+}
+#endif
-- 
2.34.1

[PATCH rfcv2 05/18] vfio: Remove redundant iommufd and devid elements in VFIODevice

2024-01-31 Thread Zhenzhong Duan

iommufd and devid in VFIODevice are redundant with the ones
in IOMMUFDDevice, so remove them.

Suggested-by: Eric Auger 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h |  2 --
 hw/vfio/ap.c  |  2 +-
 hw/vfio/ccw.c |  2 +-
 hw/vfio/common.c  |  2 +-
 hw/vfio/helpers.c |  2 +-
 hw/vfio/iommufd.c | 26 ++
 hw/vfio/pci.c |  2 +-
 hw/vfio/platform.c|  3 ++-
 8 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1bbad003ee..24e3eaaf3d 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -131,8 +131,6 @@ typedef struct VFIODevice {
 OnOffAuto pre_copy_dirty_page_tracking;
 bool dirty_pages_supported;
 bool dirty_tracking;
-int devid;
-IOMMUFDBackend *iommufd;
 union {
 HostIOMMUDevice base_hdev;
 IOMMULegacyDevice legacy_dev;
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index e157aa1ff7..11526d93d4 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -198,7 +198,7 @@ static void vfio_ap_unrealize(DeviceState *dev)
 static Property vfio_ap_properties[] = {
 DEFINE_PROP_STRING("sysfsdev", VFIOAPDevice, vdev.sysfsdev),
 #ifdef CONFIG_IOMMUFD
-DEFINE_PROP_LINK("iommufd", VFIOAPDevice, vdev.iommufd,
+DEFINE_PROP_LINK("iommufd", VFIOAPDevice, vdev.iommufd_dev.iommufd,
  TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
 #endif
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 90e4a53437..b1b75ffa2a 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -667,7 +667,7 @@ static Property vfio_ccw_properties[] = {
 DEFINE_PROP_STRING("sysfsdev", VFIOCCWDevice, vdev.sysfsdev),
 DEFINE_PROP_BOOL("force-orb-pfch", VFIOCCWDevice, force_orb_pfch, false),
 #ifdef CONFIG_IOMMUFD
-DEFINE_PROP_LINK("iommufd", VFIOCCWDevice, vdev.iommufd,
+DEFINE_PROP_LINK("iommufd", VFIOCCWDevice, vdev.iommufd_dev.iommufd,
  TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
 #endif
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 059bfdc07a..8b3b575c9d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1505,7 +1505,7 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
 const VFIOIOMMUClass *ops =
 VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
 
-if (vbasedev->iommufd) {
+if (vbasedev->iommufd_dev.iommufd) {
 ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
 }
 
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 6789870802..e5457ca326 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -626,7 +626,7 @@ int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
 vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
 }
 } else {
-if (!vbasedev->iommufd) {
+if (!vbasedev->iommufd_dev.iommufd) {
 error_setg(errp, "Use FD passing only with iommufd backend");
 return -EINVAL;
 }
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 9bfddc1360..5d50549713 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -65,7 +65,7 @@ static void iommufd_cdev_kvm_device_del(VFIODevice *vbasedev)
 
 static int iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
 {
-IOMMUFDBackend *iommufd = vbasedev->iommufd;
+IOMMUFDBackend *iommufd = vbasedev->iommufd_dev.iommufd;
 struct vfio_device_bind_iommufd bind = {
 .argsz = sizeof(bind),
 .flags = 0,
@@ -96,9 +96,10 @@ static int iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
 goto err_bind;
 }
 
-vbasedev->devid = bind.out_devid;
+vbasedev->iommufd_dev.devid = bind.out_devid;
 trace_iommufd_cdev_connect_and_bind(bind.iommufd, vbasedev->name,
-vbasedev->fd, vbasedev->devid);
+vbasedev->fd,
+vbasedev->iommufd_dev.devid);
 return ret;
 err_bind:
 iommufd_cdev_kvm_device_del(vbasedev);
@@ -111,7 +112,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
 {
 /* Unbind is automatically conducted when device fd is closed */
 iommufd_cdev_kvm_device_del(vbasedev);
-iommufd_backend_disconnect(vbasedev->iommufd);
+iommufd_backend_disconnect(vbasedev->iommufd_dev.iommufd);
 }
 
 static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
@@ -181,7 +182,7 @@ out_free_path:
 static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
  Error **errp)
 {
-int ret, iommufd = vbasedev->iommufd->fd;
+int ret, iommufd = vbasedev->iommufd_dev.iommufd->fd;
 struct vfio_device_attach_iommufd_pt attach_data = {
 .argsz = sizeof(attach_data),

[PATCH rfcv2 10/18] hw/pci: Introduce pci_device_set/unset_iommu_device()

2024-01-31 Thread Zhenzhong Duan

From: Yi Liu 

This adds pci_device_set/unset_iommu_device() to set/unset the
HostIOMMUDevice for a given PCIe device. The caller of set should
fail if the set operation fails.

Extract pci_device_get_iommu_bus_devfn() to facilitate the
implementation of pci_device_set/unset_iommu_device().
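
The optional-callback dispatch can be sketched as below.  The names
(DemoIommuOps, DemoBus, demo_*) are hypothetical and the signatures are
simplified; in particular the toy vIOMMU's policy of accepting only
devfn 0 is an assumption made purely to have a failing case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ops table; both callbacks are optional. */
typedef struct DemoIommuOps {
    int (*set_iommu_device)(void *opaque, int devfn, void *hdev);
    void (*unset_iommu_device)(void *opaque, int devfn);
} DemoIommuOps;

typedef struct DemoBus {
    const DemoIommuOps *ops;   /* NULL when no vIOMMU is attached */
    void *opaque;
} DemoBus;

/* Forward to the vIOMMU if it implements the callback, else succeed. */
static int demo_set_iommu_device(DemoBus *bus, int devfn, void *hdev)
{
    assert(hdev != NULL);
    if (bus && bus->ops && bus->ops->set_iommu_device) {
        return bus->ops->set_iommu_device(bus->opaque, devfn, hdev);
    }
    return 0;
}

static void demo_unset_iommu_device(DemoBus *bus, int devfn)
{
    if (bus && bus->ops && bus->ops->unset_iommu_device) {
        bus->ops->unset_iommu_device(bus->opaque, devfn);
    }
}

/* Toy vIOMMU: counts registered devices, accepts only devfn 0. */
static int demo_viommu_set(void *opaque, int devfn, void *hdev)
{
    int *count = opaque;
    (void)hdev;
    if (devfn != 0) {
        return -1;
    }
    (*count)++;
    return 0;
}

static void demo_viommu_unset(void *opaque, int devfn)
{
    int *count = opaque;
    (void)devfn;
    (*count)--;
}

static const DemoIommuOps demo_viommu_ops = {
    .set_iommu_device = demo_viommu_set,
    .unset_iommu_device = demo_viommu_unset,
};
```

A caller is expected to propagate a nonzero return from the set helper
and abort its own setup, while a bus without a vIOMMU simply reports
success.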

Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Nicolin Chen 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/pci/pci.h | 38 ++-
 hw/pci/pci.c | 62 +---
 2 files changed, 96 insertions(+), 4 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index fa6313aabc..5b471fd380 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -3,6 +3,7 @@
 
 #include "exec/memory.h"
 #include "sysemu/dma.h"
+#include "sysemu/host_iommu_device.h"
 
 /* PCI includes legacy ISA access.  */
 #include "hw/isa/isa.h"
@@ -384,10 +385,45 @@ typedef struct PCIIOMMUOps {
  *
  * @devfn: device and function number
  */
-   AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
+    AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
+/**
+ * @set_iommu_device: set iommufd device for a PCI device to vIOMMU
+ *
+ * Optional callback, if not implemented in vIOMMU, then vIOMMU can't
+ * utilize iommufd specific features.
+ *
+ * Return true if iommufd device is accepted, or else return false with
+ * errp set.
+ *
+ * @bus: the #PCIBus of the PCI device.
+ *
+ * @opaque: the data passed to pci_setup_iommu().
+ *
+ * @devfn: device and function number of the PCI device.
+ *
+ * @dev: the data structure representing host assigned device.
+ *
+ */
+int (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
+HostIOMMUDevice *dev, Error **errp);
+/**
+ * @unset_iommu_device: unset iommufd device for a PCI device from vIOMMU
+ *
+ * Optional callback.
+ *
+ * @bus: the #PCIBus of the PCI device.
+ *
+ * @opaque: the data passed to pci_setup_iommu().
+ *
+ * @devfn: device and function number of the PCI device.
+ */
+void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
 } PCIIOMMUOps;
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
+int pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *base_dev,
+Error **errp);
+void pci_device_unset_iommu_device(PCIDevice *dev);
 
 /**
  * pci_setup_iommu: Initialize specific IOMMU handlers for a PCIBus
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 76080af580..8078307963 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2672,11 +2672,14 @@ static void pci_device_class_base_init(ObjectClass *klass, void *data)
 }
 }
 
-AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
+static void pci_device_get_iommu_bus_devfn(PCIDevice *dev,
+   PCIBus **aliased_bus,
+   PCIBus **piommu_bus,
+   int *aliased_devfn)
 {
 PCIBus *bus = pci_get_bus(dev);
 PCIBus *iommu_bus = bus;
-uint8_t devfn = dev->devfn;
+int devfn = dev->devfn;
 
 while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {
 PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);
@@ -2717,13 +2720,66 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 
 iommu_bus = parent_bus;
 }
-if (!pci_bus_bypass_iommu(bus) && iommu_bus->iommu_ops) {
+
+assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+assert(iommu_bus);
+
+if (pci_bus_bypass_iommu(bus) || !iommu_bus->iommu_ops) {
+iommu_bus = NULL;
+}
+
+*piommu_bus = iommu_bus;
+
+if (aliased_bus) {
+*aliased_bus = bus;
+}
+
+if (aliased_devfn) {
+*aliased_devfn = devfn;
+}
+}
+
+AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
+{
+PCIBus *bus;
+PCIBus *iommu_bus;
+int devfn;
+
+pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+if (iommu_bus) {
 return iommu_bus->iommu_ops->get_address_space(bus,
  iommu_bus->iommu_opaque, devfn);
 }
 return &address_space_memory;
 }
 
+int pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *base_dev,
+Error **errp)
+{
+PCIBus *iommu_bus;
+
+pci_device_get_iommu_bus_devfn(dev, NULL, &iommu_bus, NULL);
+if (iommu_bus && iommu_bus->iommu_ops->set_iommu_device) {
+return iommu_bus->iommu_ops->set_iommu_device(pci_get_bus(dev),
+  iommu_bus->iommu_opaque,
+  dev->devfn, base_dev,
+  errp);
+}
+return 0;
+}
+
+void pci_device_unset_iommu_device(PCIDevice *dev)
+{
+PCIBus *iommu_bus;
+

[PATCH rfcv2 14/18] intel_iommu: Add a framework to check and sync host IOMMU cap/ecap

2024-01-31 Thread Zhenzhong Duan

From: Yi Liu 

Add a framework to check and synchronize host IOMMU cap/ecap with
vIOMMU cap/ecap.

The sequence will be:

vtd_cap_init() initializes iommu->cap/ecap.
vtd_check_hdev() updates iommu->cap/ecap based on host cap/ecap.
iommu->cap_frozen is set when machine creation is done; from then on
iommu->cap/ecap are read-only.

Implementation details for different backends will be in following patches.
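
To make the sequence above concrete, here is a minimal standalone sketch of
the init / check-and-sync / freeze lifecycle. It is not the intel_iommu code:
DemoIOMMU, demo_check_hdev() and demo_freeze() are hypothetical names, and
the sync policy shown (dropping capability bits the host lacks) is only an
assumed example of what a backend might do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cap and cap_frozen fields of the vIOMMU. */
typedef struct {
    uint64_t cap;
    bool cap_frozen;
} DemoIOMMU;

/*
 * Check host capabilities against the vIOMMU's.
 * Before the freeze, capability bits the host lacks are dropped (sync).
 * After the freeze, a mismatch is an error and cap is left untouched.
 * Returns 0 on success, -1 on failure.
 */
static int demo_check_hdev(DemoIOMMU *s, uint64_t host_cap)
{
    uint64_t missing = s->cap & ~host_cap;

    if (!missing) {
        return 0;               /* host supports everything we expose */
    }
    if (s->cap_frozen) {
        return -1;              /* e.g. hotplug after machine creation */
    }
    s->cap &= host_cap;         /* assumed sync policy: intersect */
    return 0;
}

/* Corresponds to the machine-done step: cap becomes read-only. */
static void demo_freeze(DemoIOMMU *s)
{
    s->cap_frozen = true;
}
```

With this shape, a coldplugged device can still narrow the exposed
capabilities, while a hotplugged device that needs a narrower set is rejected.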

Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/i386/intel_iommu.h |  1 +
 hw/i386/intel_iommu.c | 41 ++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index bbc7b96add..c71a133820 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -283,6 +283,7 @@ struct IntelIOMMUState {
 
 uint64_t cap;   /* The value of capability reg */
 uint64_t ecap;  /* The value of extended capability reg */
+bool cap_frozen; /* cap/ecap become read-only after frozen */
 
 uint32_t context_cache_gen; /* Should be in [1,MAX] */
 GHashTable *iotlb;  /* IOTLB */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ffa1ad6429..7ed2b79669 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3819,6 +3819,31 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
 return vtd_dev_as;
 }
 
+static int vtd_check_legacy_hdev(IntelIOMMUState *s,
+ IOMMULegacyDevice *ldev,
+ Error **errp)
+{
+return 0;
+}
+
+static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
+  IOMMUFDDevice *idev,
+  Error **errp)
+{
+return 0;
+}
+
+static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
+  Error **errp)
+{
+HostIOMMUDevice *base_dev = vtd_hdev->dev;
+
+if (base_dev->type == HID_LEGACY) {
+return vtd_check_legacy_hdev(s, vtd_hdev->ldev, errp);
+}
+return vtd_check_iommufd_hdev(s, vtd_hdev->idev, errp);
+}
+
 static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
 HostIOMMUDevice *base_dev, Error **errp)
 {
@@ -3829,6 +3854,7 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
 .devfn = devfn,
 };
 struct vtd_as_key *new_key;
+int ret;
 
 assert(base_dev);
 
@@ -3848,6 +3874,13 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
 vtd_hdev->iommu_state = s;
 vtd_hdev->dev = base_dev;
 
+ret = vtd_check_hdev(s, vtd_hdev, errp);
+if (ret) {
+g_free(vtd_hdev);
+vtd_iommu_unlock(s);
+return ret;
+}
+
 new_key = g_malloc(sizeof(*new_key));
 new_key->bus = bus;
 new_key->devfn = devfn;
@@ -4083,7 +4116,9 @@ static void vtd_init(IntelIOMMUState *s)
 s->iq_dw = false;
 s->next_frcd_reg = 0;
 
-vtd_cap_init(s);
+if (!s->cap_frozen) {
+vtd_cap_init(s);
+}
 
 /*
  * Rsvd field masks for spte
@@ -4254,6 +4289,10 @@ static int vtd_machine_done_notify_one(Object *child, void *unused)
 
 static void vtd_machine_done_hook(Notifier *notifier, void *unused)
 {
+IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
+
+iommu->cap_frozen = true;
+
 object_child_foreach_recursive(object_get_root(),
vtd_machine_done_notify_one, NULL);
 }
-- 
2.34.1

[PATCH rfcv2 03/18] vfio: Introduce IOMMULegacyDevice

2024-01-31 Thread Zhenzhong Duan

Similar to IOMMUFDDevice, IOMMULegacyDevice represents a device in
legacy mode and can be used as a communication interface between
devices (e.g., VFIO, VDPA) and vIOMMU.

Currently it includes nothing legacy specific, but could be extended
with any wanted info of legacy mode when necessary.

IOMMULegacyDevice is deliberately not a QOM object because we don't want
it to be visible from the user interface.

Suggested-by: Eric Auger 
Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9b7ef7d02b..8bfb9cbe94 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -31,6 +31,7 @@
 #endif
 #include "sysemu/sysemu.h"
 #include "hw/vfio/vfio-container-base.h"
+#include "sysemu/host_iommu_device.h"
 
 #define VFIO_MSG_PREFIX "vfio %s: "
 
@@ -97,6 +98,11 @@ typedef struct VFIOIOMMUFDContainer {
 uint32_t ioas_id;
 } VFIOIOMMUFDContainer;
 
+/* Abstraction of host IOMMU legacy device */
+typedef struct IOMMULegacyDevice {
+HostIOMMUDevice base;
+} IOMMULegacyDevice;
+
 typedef struct VFIODeviceOps VFIODeviceOps;
 
 typedef struct VFIODevice {
-- 
2.34.1

[PATCH rfcv2 17/18] intel_iommu: Use mgaw instead of s->aw_bits

2024-01-31 Thread Zhenzhong Duan

Because vIOMMU mgaw can be updated based on host IOMMU mgaw, s->aw_bits
doesn't necessarily represent the final mgaw now, but the mgaw field in
s->cap does.

Replace references to s->aw_bits with a macro S_AW_BITS that fetches mgaw
from s->cap. There are two exceptions: the aw_bits value sanity check
and the s->cap initialization.

The ACPI DMAR table is also updated with the right mgaw value.
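
As a worked illustration of the mgaw/aw_bits relationship used here, the
sketch below extracts the field with the '(cap >> 16) & 0x3f' expression
mentioned in the changelog and adds the '+1'. DEMO_MGAW_FROM_CAP,
DEMO_CAP_MGAW and demo_aw_bits are hypothetical names for this example only,
not the macros in intel_iommu_internal.h.

```c
#include <stdint.h>

/* MGAW field: bits [21:16] of the capability register, stored as width - 1. */
#define DEMO_MGAW_FROM_CAP(cap) (((cap) >> 16) & 0x3fULL)

/* Inverse helper for the example: encode an address width into the field. */
#define DEMO_CAP_MGAW(aw_bits)  ((((uint64_t)(aw_bits)) - 1) << 16)

/* Address width in bits as seen through cap, i.e. mgaw + 1. */
static unsigned demo_aw_bits(uint64_t cap)
{
    return (unsigned)DEMO_MGAW_FROM_CAP(cap) + 1;
}
```

For a 39-bit address width the field holds 38, so demo_aw_bits() returns 39,
while the DMAR "Host Address Width" byte takes the raw field value, which
matches the old 'aw_bits - 1' expression.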

Signed-off-by: Zhenzhong Duan 
---
 hw/i386/acpi-build.c  |  3 ++-
 hw/i386/intel_iommu.c | 44 ++-
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index edc979379c..6467157686 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2159,7 +2159,8 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, const char *oem_id,
 
 acpi_table_begin(&table, table_data);
 /* Host Address Width */
-build_append_int_noprefix(table_data, intel_iommu->aw_bits - 1, 1);
+build_append_int_noprefix(table_data,
+  VTD_MGAW_FROM_CAP(intel_iommu->cap), 1);
 build_append_int_noprefix(table_data, dmar_flags, 1); /* Flags */
 g_array_append_vals(table_data, rsvd10, sizeof(rsvd10)); /* Reserved */
 
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 409f8a59c3..72cc8b2c71 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -41,6 +41,8 @@
 #include "migration/vmstate.h"
 #include "trace.h"
 
+#define S_AW_BITS (VTD_MGAW_FROM_CAP(s->cap) + 1)
+
 /* context entry operations */
 #define VTD_CE_GET_RID2PASID(ce) \
 ((ce)->val[1] & VTD_SM_CONTEXT_ENTRY_RID2PASID_MASK)
@@ -1409,13 +1411,13 @@ static int vtd_root_entry_rsvd_bits_check(IntelIOMMUState *s,
 {
 /* Legacy Mode reserved bits check */
 if (!s->root_scalable &&
-(re->hi || (re->lo & VTD_ROOT_ENTRY_RSVD(s->aw_bits))))
+(re->hi || (re->lo & VTD_ROOT_ENTRY_RSVD(S_AW_BITS))))
 goto rsvd_err;
 
 /* Scalable Mode reserved bits check */
 if (s->root_scalable &&
-((re->lo & VTD_ROOT_ENTRY_RSVD(s->aw_bits)) ||
- (re->hi & VTD_ROOT_ENTRY_RSVD(s->aw_bits))))
+((re->lo & VTD_ROOT_ENTRY_RSVD(S_AW_BITS)) ||
+ (re->hi & VTD_ROOT_ENTRY_RSVD(S_AW_BITS))))
 goto rsvd_err;
 
 return 0;
@@ -1432,7 +1434,7 @@ static inline int vtd_context_entry_rsvd_bits_check(IntelIOMMUState *s,
 {
 if (!s->root_scalable &&
 (ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI ||
- ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(s->aw_bits))) {
+ ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(S_AW_BITS))) {
 error_report_once("%s: invalid context entry: hi=%"PRIx64
   ", lo=%"PRIx64" (reserved nonzero)",
   __func__, ce->hi, ce->lo);
@@ -1440,7 +1442,7 @@ static inline int vtd_context_entry_rsvd_bits_check(IntelIOMMUState *s,
 }
 
 if (s->root_scalable &&
-(ce->val[0] & VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(s->aw_bits) ||
+(ce->val[0] & VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(S_AW_BITS) ||
  ce->val[1] & VTD_SM_CONTEXT_ENTRY_RSVD_VAL1 ||
  ce->val[2] ||
  ce->val[3])) {
@@ -1571,7 +1573,7 @@ static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
 .hook_fn = vtd_sync_shadow_page_hook,
 .private = (void *)&vtd_as->iommu,
 .notify_unmap = true,
-.aw = s->aw_bits,
+.aw = S_AW_BITS,
 .as = vtd_as,
 .domain_id = vtd_get_domain_id(s, ce, vtd_as->pasid),
 };
@@ -1990,7 +1992,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
 }
 
 ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level,
-   &reads, &writes, s->aw_bits, pasid);
+   &reads, &writes, S_AW_BITS, pasid);
 if (ret_fr) {
 vtd_report_fault(s, -ret_fr, is_fpd_set, source_id,
  addr, is_write, pasid != PCI_NO_PASID, pasid);
@@ -2004,7 +2006,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
 out:
 vtd_iommu_unlock(s);
 entry->iova = addr & page_mask;
-entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
+entry->translated_addr = vtd_get_slpte_addr(slpte, S_AW_BITS) & page_mask;
 entry->addr_mask = ~page_mask;
 entry->perm = access_flags;
 return true;
@@ -2021,7 +2023,7 @@ error:
 static void vtd_root_table_setup(IntelIOMMUState *s)
 {
 s->root = vtd_get_quad_raw(s, DMAR_RTADDR_REG);
-s->root &= VTD_RTADDR_ADDR_MASK(s->aw_bits);
+s->root &= VTD_RTADDR_ADDR_MASK(S_AW_BITS);
 
 vtd_update_scalable_state(s);
 
@@ -2039,7 +2041,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 uint64_t value = 0;
 value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
 s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
-s->intr_root = value & VTD_IRTA_ADDR_MASK(s->aw_bits);
+s->intr_root = value & VTD_IRTA_ADDR_MASK(S_AW_BITS);
 s->intr_eime = value & VTD_IRTA_EIME;

[PATCH rfcv2 15/18] backends/iommufd: Introduce helper function iommufd_device_get_info()

2024-01-31 Thread Zhenzhong Duan

Introduce a helper function iommufd_device_get_info() to get
host IOMMU related information through iommufd uAPI.

Signed-off-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/sysemu/iommufd.h |  4 
 backends/iommufd.c   | 25 -
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index c3f3469760..ec8b80d8d9 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -4,6 +4,7 @@
 #include "qom/object.h"
 #include "exec/hwaddr.h"
 #include "exec/cpu-common.h"
+#include <linux/iommufd.h>
 #include "sysemu/host_iommu_device.h"
 
 #define TYPE_IOMMUFD_BACKEND "iommufd"
@@ -47,4 +48,7 @@ typedef struct IOMMUFDDevice {
 } IOMMUFDDevice;
 
 void iommufd_device_init(IOMMUFDDevice *idev);
+int iommufd_device_get_info(IOMMUFDDevice *idev,
+enum iommu_hw_info_type *type,
+uint32_t len, void *data, Error **errp);
 #endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index d92791bba9..1b0b991747 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -20,7 +20,6 @@
 #include "monitor/monitor.h"
 #include "trace.h"
 #include <sys/ioctl.h>
-#include <linux/iommufd.h>
 
 static void iommufd_backend_init(Object *obj)
 {
@@ -237,3 +236,27 @@ void iommufd_device_init(IOMMUFDDevice *idev)
 host_iommu_base_device_init(&idev->base, HID_IOMMUFD,
 sizeof(IOMMUFDDevice));
 }
+
+int iommufd_device_get_info(IOMMUFDDevice *idev,
+enum iommu_hw_info_type *type,
+uint32_t len, void *data, Error **errp)
+{
+struct iommu_hw_info info = {
+.size = sizeof(info),
+.flags = 0,
+.dev_id = idev->devid,
+.data_len = len,
+.__reserved = 0,
+.data_uptr = (uintptr_t)data,
+};
+int ret;
+
+ret = ioctl(idev->iommufd->fd, IOMMU_GET_HW_INFO, &info);
+if (ret) {
+error_setg_errno(errp, errno, "Failed to get hardware info");
+} else {
+*type = info.out_data_type;
+}
+
+return ret;
+}
-- 
2.34.1

[PATCH rfcv2 07/18] vfio/container: Implement host_iommu_device_init callback in legacy mode

2024-01-31 Thread Zhenzhong Duan

This callback will be used to initialize base and public elements
in IOMMULegacyDevice.

Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/container.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index bd25b9fbad..8fafd4b4e5 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -1120,6 +1120,12 @@ out_single:
 return ret;
 }
 
+static void vfio_legacy_host_iommu_device_init(VFIODevice *vbasedev)
+{
+host_iommu_base_device_init(&vbasedev->base_hdev, HID_LEGACY,
+sizeof(IOMMULegacyDevice));
+}
+
 static void vfio_iommu_legacy_class_init(ObjectClass *klass, void *data)
 {
 VFIOIOMMUClass *vioc = VFIO_IOMMU_CLASS(klass);
@@ -1132,6 +1138,7 @@ static void vfio_iommu_legacy_class_init(ObjectClass *klass, void *data)
 vioc->set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking;
 vioc->query_dirty_bitmap = vfio_legacy_query_dirty_bitmap;
 vioc->pci_hot_reset = vfio_legacy_pci_hot_reset;
+vioc->host_iommu_device_init = vfio_legacy_host_iommu_device_init;
 };
 
 static const TypeInfo types[] = {
-- 
2.34.1

[PATCH rfcv2 12/18] vfio: Initialize host IOMMU device and pass to vIOMMU

2024-01-31 Thread Zhenzhong Duan

Initialize the host IOMMU device in vfio and pass it to vIOMMU, so that
vIOMMU can get hw IOMMU information.

Support both iommufd and legacy backend.

Originally-by: Yi Liu 
Signed-off-by: Nicolin Chen 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/pci.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dedb64fc08..b23c5ea790 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3112,11 +3112,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 vfio_bars_register(vdev);
 
-ret = vfio_add_capabilities(vdev, errp);
+ret = pci_device_set_iommu_device(pdev, &vbasedev->base_hdev, errp);
 if (ret) {
+error_prepend(errp, "Failed to set iommu_device: ");
 goto out_teardown;
 }
 
+ret = vfio_add_capabilities(vdev, errp);
+if (ret) {
+goto out_unset_idev;
+}
+
 if (vdev->vga) {
 vfio_vga_quirk_setup(vdev);
 }
@@ -3133,7 +3139,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 error_setg(errp,
"cannot support IGD OpRegion feature on hotplugged "
"device");
-goto out_teardown;
+goto out_unset_idev;
 }
 
 ret = vfio_get_dev_region_info(vbasedev,
@@ -3142,13 +3148,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 if (ret) {
 error_setg_errno(errp, -ret,
  "does not support requested IGD OpRegion feature");
-goto out_teardown;
+goto out_unset_idev;
 }
 
 ret = vfio_pci_igd_opregion_init(vdev, opregion, errp);
 g_free(opregion);
 if (ret) {
-goto out_teardown;
+goto out_unset_idev;
 }
 }
 
@@ -3234,6 +3240,8 @@ out_deregister:
 if (vdev->intx.mmap_timer) {
 timer_free(vdev->intx.mmap_timer);
 }
+out_unset_idev:
+pci_device_unset_iommu_device(pdev);
 out_teardown:
 vfio_teardown_msi(vdev);
 vfio_bars_exit(vdev);
@@ -3262,6 +3270,7 @@ static void vfio_instance_finalize(Object *obj)
 static void vfio_exitfn(PCIDevice *pdev)
 {
 VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+VFIODevice *vbasedev = &vdev->vbasedev;
 
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
@@ -3276,7 +3285,8 @@ static void vfio_exitfn(PCIDevice *pdev)
 vfio_teardown_msi(vdev);
 vfio_pci_disable_rp_atomics(vdev);
 vfio_bars_exit(vdev);
-vfio_migration_exit(&vdev->vbasedev);
+vfio_migration_exit(vbasedev);
+pci_device_unset_iommu_device(pdev);
 }
 
 static void vfio_pci_reset(DeviceState *dev)
-- 
2.34.1

[PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU

2024-01-31 Thread Zhenzhong Duan

Hi,

This enables vIOMMU to get host IOMMU cap/ecap information by introducing
a new set/unset_iommu_device interface, then vIOMMU could check or sync
with vIOMMU's own cap/ecap config.

It works by having the device side, e.g. VFIO, register either an
IOMMULegacyDevice or an IOMMUFDDevice with vIOMMU, which includes the
necessary data to achieve that. Currently only VFIO devices are supported,
but it could also be used for other devices, e.g., VDPA.

For a coldplugged device, we can get its host IOMMU cap/ecap during qemu
init, then check and sync them into vIOMMU cap/ecap.
For a hotplugged device, vIOMMU cap/ecap is frozen, so we can only check
against vIOMMU cap/ecap and are not allowed to update it. If the check
fails, the hotplug fails.

This is also a prerequisite for incoming iommufd nesting series:
'intel_iommu: Enable stage-1 translation'.

I didn't implement cap/ecap sync for the legacy VFIO backend; I would like
to see what Eric wants to put in IOMMULegacyDevice for virtio-iommu and
whether I can utilize some of it.

PATCH1-3: Introduce HostIOMMUDevice and two subclasses
PATCH4-5: Define HostIOMMUDevice instance in VFIODevice
PATCH6-9: Introduce host_iommu_device_init callback to initialize HostIOMMUDevice
PATCH10-12: Introduce set/unset_iommu_device to pass HostIOMMUDevice to vIOMMU
PATCH13-18: Implement cap/ecap check and sync in intel_iommu

Qemu code can be found at:
https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_rfcv2

Thanks
Zhenzhong


Changelog:
rfcv2:
- introduce common abstract HostIOMMUDevice and sub struct for different BEs (Eric, Cédric)
- remove iommufd_device.[ch] (Cédric)
- remove duplicate iommufd/devid define from VFIODevice (Eric)
- drop the p in aliased_pbus and aliased_pdevfn (Eric)
- assert devfn and iommu_bus in pci_device_get_iommu_bus_devfn (Cédric, Eric)
- use errp in iommufd_device_get_info (Eric)
- split and simplify cap/ecap check/sync code in intel_iommu.c (Cédric)
- move VTDHostIOMMUDevice declaration to intel_iommu_internal.h (Cédric)
- make '(vtd->cap_reg >> 16) & 0x3fULL' a MACRO and add missed '+1' (Cédric)
- block migration if vIOMMU cap/ecap updated based on host IOMMU cap/ecap
- add R-B


Yi Liu (3):
  hw/pci: Introduce pci_device_set/unset_iommu_device()
  intel_iommu: Add set/unset_iommu_device callback
  intel_iommu: Add a framework to check and sync host IOMMU cap/ecap

Zhenzhong Duan (15):
  Introduce a common abstract struct HostIOMMUDevice
  backends/iommufd: Introduce IOMMUFDDevice
  vfio: Introduce IOMMULegacyDevice
  vfio: Add host iommu device instance into VFIODevice
  vfio: Remove redundant iommufd and devid elements in VFIODevice
  vfio: Introduce host_iommu_device_init callback
  vfio/container: Implement host_iommu_device_init callback in legacy
mode
  vfio/iommufd: Implement host_iommu_device_init callback in iommufd
mode
  vfio/pci: Initialize host iommu device instance after attachment
  vfio: Initialize host IOMMU device and pass to vIOMMU
  intel_iommu: Extract out vtd_cap_init to initialize cap/ecap
  backends/iommufd: Introduce helper function iommufd_device_get_info()
  intel_iommu: Implement check and sync mechanism in iommufd mode
  intel_iommu: Use mgaw instead of s->aw_bits
  intel_iommu: Block migration if cap is updated

 hw/i386/intel_iommu_internal.h|  15 ++
 include/hw/i386/intel_iommu.h |   4 +
 include/hw/pci/pci.h  |  38 +++-
 include/hw/vfio/vfio-common.h |  20 +-
 include/hw/vfio/vfio-container-base.h |   1 +
 include/sysemu/host_iommu_device.h|  22 ++
 include/sysemu/iommufd.h  |  18 ++
 backends/iommufd.c|  31 ++-
 hw/i386/acpi-build.c  |   3 +-
 hw/i386/intel_iommu.c | 279 --
 hw/pci/pci.c  |  62 +-
 hw/vfio/ap.c  |   2 +-
 hw/vfio/ccw.c |   2 +-
 hw/vfio/common.c  |  10 +-
 hw/vfio/container.c   |   7 +
 hw/vfio/helpers.c |   2 +-
 hw/vfio/iommufd.c |  32 +--
 hw/vfio/pci.c |  25 ++-
 hw/vfio/platform.c|   3 +-
 19 files changed, 488 insertions(+), 88 deletions(-)
 create mode 100644 include/sysemu/host_iommu_device.h

-- 
2.34.1

[PATCH rfcv2 08/18] vfio/iommufd: Implement host_iommu_device_init callback in iommufd mode

2024-01-31 Thread Zhenzhong Duan

This callback will be used to initialize base and public elements
in IOMMUFDDevice, with the exception of iommufd and devid which
are initialized early in attachment.

Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/iommufd.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 5d50549713..7d39d7a5fa 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -621,6 +621,11 @@ out_single:
 return ret;
 }
 
+static void vfio_cdev_host_iommu_device_init(VFIODevice *vbasedev)
+{
+iommufd_device_init(&vbasedev->iommufd_dev);
+}
+
 static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
 {
 VFIOIOMMUClass *vioc = VFIO_IOMMU_CLASS(klass);
@@ -630,6 +635,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
 vioc->attach_device = iommufd_cdev_attach;
 vioc->detach_device = iommufd_cdev_detach;
 vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
+vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
 };
 
 static const TypeInfo types[] = {
-- 
2.34.1

[PATCH rfcv2 09/18] vfio/pci: Initialize host iommu device instance after attachment

2024-01-31 Thread Zhenzhong Duan

Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d1e1b8cb89..dedb64fc08 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3006,6 +3006,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 goto error;
 }
 
+/* Initialize host iommu device after attachment succeeds */
+host_iommu_device_init(vbasedev);
+
 vfio_populate_device(vdev, &err);
 if (err) {
 error_propagate(errp, err);
-- 
2.34.1

[PATCH rfcv2 06/18] vfio: Introduce host_iommu_device_init callback

2024-01-31 Thread Zhenzhong Duan

Introduce host_iommu_device_init callback and a wrapper for it.

Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h | 1 +
 include/hw/vfio/vfio-container-base.h | 1 +
 hw/vfio/common.c  | 8 
 3 files changed, 10 insertions(+)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 24e3eaaf3d..9c4b60c906 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -216,6 +216,7 @@ struct vfio_device_info *vfio_get_device_info(int fd);
 int vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
 void vfio_detach_device(VFIODevice *vbasedev);
+void host_iommu_device_init(VFIODevice *vbasedev);
 
 int vfio_kvm_device_add_fd(int fd, Error **errp);
 int vfio_kvm_device_del_fd(int fd, Error **errp);
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index b2813b0c11..c71f4abb2d 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -120,6 +120,7 @@ struct VFIOIOMMUClass {
 int (*attach_device)(const char *name, VFIODevice *vbasedev,
  AddressSpace *as, Error **errp);
 void (*detach_device)(VFIODevice *vbasedev);
+void (*host_iommu_device_init)(VFIODevice *vbasedev);
 /* migration feature */
 int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
bool start);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8b3b575c9d..f7f85160be 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1521,3 +1521,11 @@ void vfio_detach_device(VFIODevice *vbasedev)
 }
 vbasedev->bcontainer->ops->detach_device(vbasedev);
 }
+
+void host_iommu_device_init(VFIODevice *vbasedev)
+{
+const VFIOIOMMUClass *ops = vbasedev->bcontainer->ops;
+
+assert(ops->host_iommu_device_init);
+ops->host_iommu_device_init(vbasedev);
+}
-- 
2.34.1

[PATCH rfcv2 02/18] backends/iommufd: Introduce IOMMUFDDevice

2024-01-31 Thread Zhenzhong Duan

IOMMUFDDevice represents a device in iommufd and can be used as
a communication interface between devices (e.g., VFIO, VDPA) and
vIOMMU.

Currently it includes only the public iommufd handle and device id,
which vIOMMU can use to get hw IOMMU information.

In future there will also be some elements in the private section,
e.g., capability bits for dirty tracking. When nested translation
is supported, vIOMMU is going to need more iommufd related operations
such as allocating a hwpt for a device, attaching/detaching a hwpt, etc.
IOMMUFDDevice will be extended further for those needs.

IOMMUFDDevice is deliberately not a QOM object because we don't want
it to be visible from the user interface.

Introduce a helper iommufd_device_init to initialize IOMMUFDDevice.

Originally-by: Yi Liu 
Signed-off-by: Yi Sun 
Signed-off-by: Zhenzhong Duan 
---
 include/sysemu/iommufd.h | 14 ++
 backends/iommufd.c   |  6 ++
 2 files changed, 20 insertions(+)

diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 9af27ebd6c..c3f3469760 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -4,6 +4,7 @@
 #include "qom/object.h"
 #include "exec/hwaddr.h"
 #include "exec/cpu-common.h"
+#include "sysemu/host_iommu_device.h"
 
 #define TYPE_IOMMUFD_BACKEND "iommufd"
 OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)
@@ -33,4 +34,17 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
 ram_addr_t size, void *vaddr, bool readonly);
 int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
   hwaddr iova, ram_addr_t size);
+
+
+/* Abstraction of host IOMMUFD device */
+typedef struct IOMMUFDDevice {
+HostIOMMUDevice base;
+/* private: */
+
+/* public: */
+IOMMUFDBackend *iommufd;
+uint32_t devid;
+} IOMMUFDDevice;
+
+void iommufd_device_init(IOMMUFDDevice *idev);
 #endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 1ef683c7b0..d92791bba9 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -231,3 +231,9 @@ static void register_types(void)
 }
 
 type_init(register_types);
+
+void iommufd_device_init(IOMMUFDDevice *idev)
+{
+host_iommu_base_device_init(&idev->base, HID_IOMMUFD,
+sizeof(IOMMUFDDevice));
+}
-- 
2.34.1

[PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice

2024-01-31 Thread Zhenzhong Duan

Embed either an IOMMULegacyDevice or an IOMMUFDDevice in VFIODevice,
never both.
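
The layout trick can be sketched in isolation as follows. All Demo* names
are hypothetical, and the fields of the base struct (a type tag and a size)
are an assumption inferred from the host_iommu_base_device_init() calls in
this series, not a copy of host_iommu_device.h.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_HID_LEGACY, DEMO_HID_IOMMUFD } DemoHIDType;

/* Assumed shape of the common base: a type tag plus the subclass size. */
typedef struct { DemoHIDType type; size_t size; } DemoHostDev;

typedef struct { DemoHostDev base; } DemoLegacyDev;
typedef struct { DemoHostDev base; int fd; uint32_t devid; } DemoFDDev;

typedef struct {
    int other_state;
    union {                       /* exactly one member is live at a time */
        DemoHostDev base_hdev;
        DemoLegacyDev legacy_dev;
        DemoFDDev iommufd_dev;
    };
} DemoDevice;

/* Every subclass must start with the base at the same offset. */
_Static_assert(offsetof(DemoDevice, legacy_dev.base) ==
               offsetof(DemoDevice, base_hdev), "legacy base misplaced");
_Static_assert(offsetof(DemoDevice, iommufd_dev.base) ==
               offsetof(DemoDevice, base_hdev), "iommufd base misplaced");

/* A consumer holding only the base pointer can downcast via the tag. */
static uint32_t demo_devid(const DemoHostDev *hdev)
{
    if (hdev->type != DEMO_HID_IOMMUFD) {
        return 0;                 /* legacy devices carry no devid here */
    }
    return ((const DemoFDDev *)hdev)->devid;
}
```

Because the base sits first in every union member, a pointer to base_hdev
can be handed to vIOMMU regardless of which backend filled it in.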

Signed-off-by: Zhenzhong Duan 
---
 include/hw/vfio/vfio-common.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8bfb9cbe94..1bbad003ee 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -32,6 +32,7 @@
 #include "sysemu/sysemu.h"
 #include "hw/vfio/vfio-container-base.h"
 #include "sysemu/host_iommu_device.h"
+#include "sysemu/iommufd.h"
 
 #define VFIO_MSG_PREFIX "vfio %s: "
 
@@ -132,8 +133,18 @@ typedef struct VFIODevice {
 bool dirty_tracking;
 int devid;
 IOMMUFDBackend *iommufd;
+union {
+HostIOMMUDevice base_hdev;
+IOMMULegacyDevice legacy_dev;
+IOMMUFDDevice iommufd_dev;
+};
 } VFIODevice;
 
+QEMU_BUILD_BUG_ON(offsetof(VFIODevice, legacy_dev.base) !=
+  offsetof(VFIODevice, base_hdev));
+QEMU_BUILD_BUG_ON(offsetof(VFIODevice, iommufd_dev.base) !=
+  offsetof(VFIODevice, base_hdev));
+
 struct VFIODeviceOps {
 void (*vfio_compute_needs_reset)(VFIODevice *vdev);
 int (*vfio_hot_reset_multi)(VFIODevice *vdev);
-- 
2.34.1

Re: [PULL 13/17] hw/fsi: Aspeed APB2OPB & On-chip peripheral bus

2024-01-31 Thread Cédric Le Goater


Here is an updated version with a fix. I will include it in the next PR.

Thanks,

C.




From: Ninad Palsule 

This is part of a patchset introducing IBM's Flexible Service
Interface (FSI).

An APB-to-OPB bridge enabling access to the OPB from the ARM core in
the AST2600. Hardware limitations prevent the OPB from being directly
mapped into APB, so all accesses are indirect through the bridge.

The On-Chip Peripheral Bus (OPB): A low-speed bus typically found in
POWER processors. This now makes an appearance in the ASPEED SoC due
to tight integration of the FSI master IP with the OPB, mainly the
existence of an MMIO-mapping of the CFAM address straight onto a
sub-region of the OPB address space.

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
Reviewed-by: Cédric Le Goater 
[ clg: - moved FSIMasterState under AspeedAPB2OPBState
   - modified fsi_opb_fsi_master_address() and
 fsi_opb_opb2fsi_address()
   - introduced fsi_aspeed_apb2opb_init()
   - reworked fsi_aspeed_apb2opb_realize()
   - removed FSIMasterState object and fsi_opb_realize()
   - simplified OPBus
   - introduced fsi_aspeed_apb2opb_rw to fix endianness issue ]
Signed-off-by: Cédric Le Goater 
---
 include/hw/fsi/aspeed_apb2opb.h |  46 
 hw/fsi/aspeed_apb2opb.c | 367 
 hw/arm/Kconfig  |   1 +
 hw/fsi/Kconfig  |   5 +
 hw/fsi/meson.build  |   1 +
 hw/fsi/trace-events |   2 +
 6 files changed, 422 insertions(+)
 create mode 100644 include/hw/fsi/aspeed_apb2opb.h
 create mode 100644 hw/fsi/aspeed_apb2opb.c

diff --git a/include/hw/fsi/aspeed_apb2opb.h b/include/hw/fsi/aspeed_apb2opb.h
new file mode 100644
index ..f6a2387abf28
--- /dev/null
+++ b/include/hw/fsi/aspeed_apb2opb.h
@@ -0,0 +1,46 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2024 IBM Corp.
+ *
+ * ASPEED APB2OPB Bridge
+ * IBM On-Chip Peripheral Bus
+ */
+#ifndef FSI_ASPEED_APB2OPB_H
+#define FSI_ASPEED_APB2OPB_H
+
+#include "exec/memory.h"
+#include "hw/fsi/fsi-master.h"
+#include "hw/sysbus.h"
+
+#define TYPE_FSI_OPB "fsi.opb"
+
+#define TYPE_OP_BUS "opb"
+OBJECT_DECLARE_SIMPLE_TYPE(OPBus, OP_BUS)
+
+typedef struct OPBus {
+BusState bus;
+
+MemoryRegion mr;
+AddressSpace as;
+} OPBus;
+
+#define TYPE_ASPEED_APB2OPB "aspeed.apb2opb"
+OBJECT_DECLARE_SIMPLE_TYPE(AspeedAPB2OPBState, ASPEED_APB2OPB)
+
+#define ASPEED_APB2OPB_NR_REGS ((0xe8 >> 2) + 1)
+
+#define ASPEED_FSI_NUM 2
+
+typedef struct AspeedAPB2OPBState {
+SysBusDevice parent_obj;
+
+MemoryRegion iomem;
+
+uint32_t regs[ASPEED_APB2OPB_NR_REGS];
+qemu_irq irq;
+
+OPBus opb[ASPEED_FSI_NUM];
+FSIMasterState fsi[ASPEED_FSI_NUM];
+} AspeedAPB2OPBState;
+
+#endif /* FSI_ASPEED_APB2OPB_H */
diff --git a/hw/fsi/aspeed_apb2opb.c b/hw/fsi/aspeed_apb2opb.c
new file mode 100644
index ..ea50718b6a2b
--- /dev/null
+++ b/hw/fsi/aspeed_apb2opb.c
@@ -0,0 +1,367 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2024 IBM Corp.
+ *
+ * ASPEED APB-OPB FSI interface
+ * IBM On-chip Peripheral Bus
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qom/object.h"
+#include "qapi/error.h"
+#include "trace.h"
+
+#include "hw/fsi/aspeed_apb2opb.h"
+#include "hw/qdev-core.h"
+
+#define TO_REG(x) (x >> 2)
+
+#define APB2OPB_VERSION                    TO_REG(0x00)
+#define APB2OPB_TRIGGER                    TO_REG(0x04)
+
+#define APB2OPB_CONTROL                    TO_REG(0x08)
+#define   APB2OPB_CONTROL_OFF              BE_GENMASK(31, 13)
+
+#define APB2OPB_OPB2FSI                    TO_REG(0x0c)
+#define   APB2OPB_OPB2FSI_OFF              BE_GENMASK(31, 22)
+
+#define APB2OPB_OPB0_SEL                   TO_REG(0x10)
+#define APB2OPB_OPB1_SEL                   TO_REG(0x28)
+#define   APB2OPB_OPB_SEL_EN               BIT(0)
+
+#define APB2OPB_OPB0_MODE                  TO_REG(0x14)
+#define APB2OPB_OPB1_MODE                  TO_REG(0x2c)
+#define   APB2OPB_OPB_MODE_RD              BIT(0)
+
+#define APB2OPB_OPB0_XFER                  TO_REG(0x18)
+#define APB2OPB_OPB1_XFER                  TO_REG(0x30)
+#define   APB2OPB_OPB_XFER_FULL            BIT(1)
+#define   APB2OPB_OPB_XFER_HALF            BIT(0)
+
+#define APB2OPB_OPB0_ADDR                  TO_REG(0x1c)
+#define APB2OPB_OPB0_WRITE_DATA            TO_REG(0x20)
+
+#define APB2OPB_OPB1_ADDR                  TO_REG(0x34)
+#define APB2OPB_OPB1_WRITE_DATA            TO_REG(0x38)
+
+#define APB2OPB_IRQ_STS                    TO_REG(0x48)
+#define   APB2OPB_IRQ_STS_OPB1_TX_ACK  BIT(17)
+#define   APB2OPB_IRQ_STS_OPB0_TX_ACK  BIT(16)
+
+#define APB2OPB_OPB0_WRITE_WORD_ENDIAN TO_REG(0x4c)
+#define   APB2OPB_OPB0_WRITE_WORD_ENDIAN_BE 0x0011101b
+#define APB2OPB_OPB0_WRITE_BYTE_ENDIAN TO_REG(0x50)
+#define   APB2OPB_OPB0_WRITE_BYTE_ENDIAN_BE 0x0c330f3f
+#define APB2OPB_OPB1_WRITE_WORD_ENDIAN TO_REG(0x54)
+#define

Re: [PATCH v3 4/4] tests/tcg/s390x: Test CONVERT TO BINARY

2024-01-31 Thread Thomas Huth


On 01/02/2024 00.07, Ilya Leoshkevich wrote:

Check the CVB's and CVBG's corner cases.

Co-developed-by: Pavel Zbitskiy 
Signed-off-by: Ilya Leoshkevich 
---
  tests/tcg/s390x/Makefile.target |  1 +
  tests/tcg/s390x/cvb.c   | 47 +
  2 files changed, 48 insertions(+)
  create mode 100644 tests/tcg/s390x/cvb.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 04e4bddd83d..e2aba2ec274 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -46,6 +46,7 @@ TESTS+=laalg
  TESTS+=add-logical-with-carry
  TESTS+=lae
  TESTS+=cvd
+TESTS+=cvb
  
  cdsg: CFLAGS+=-pthread

  cdsg: LDFLAGS+=-pthread
diff --git a/tests/tcg/s390x/cvb.c b/tests/tcg/s390x/cvb.c
new file mode 100644
index 000..47b7a7965f4
--- /dev/null
+++ b/tests/tcg/s390x/cvb.c
@@ -0,0 +1,47 @@
+/*
+ * Test the CONVERT TO BINARY instruction.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+static int32_t cvb(uint64_t x)
+{
+uint32_t ret;
+
+asm("cvb %[ret],%[x]" : [ret] "=r" (ret) : [x] "R" (x));
+
+return ret;
+}
+
+static int64_t cvbg(__uint128_t x)
+{
+int64_t ret;
+
+asm("cvbg %[ret],%[x]" : [ret] "=r" (ret) : [x] "T" (x));
+
+return ret;
+}


Just to be on the safe side, could you please add a check for CVBY, too?

 Thanks,
  Thomas

Re: [PATCH v3 3/4] target/s390x: implement CVB, CVBY and CVBG

2024-01-31 Thread Thomas Huth


On 01/02/2024 00.07, Ilya Leoshkevich wrote:

From: Pavel Zbitskiy 

Convert to Binary - counterparts of the already implemented Convert
to Decimal (CVD*) instructions.
Example from the Principles of Operation: 25594C becomes 63FA.

[iii: Use separate functions for CVB and CVBG for simplicity].

Signed-off-by: Pavel Zbitskiy 
---
  target/s390x/helper.h|  1 +
  target/s390x/tcg/insn-data.h.inc |  4 
  target/s390x/tcg/int_helper.c| 40 
  target/s390x/tcg/translate.c | 12 ++
  4 files changed, 57 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 332a9a9c632..3c607f4e437 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -88,6 +88,7 @@ DEF_HELPER_FLAGS_3(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i128, i64)
  DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
  DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
  DEF_HELPER_FLAGS_2(sqxb, TCG_CALL_NO_WG, i128, env, i128)
+DEF_HELPER_FLAGS_3(cvb, TCG_CALL_NO_WG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
  DEF_HELPER_FLAGS_1(cvdg, TCG_CALL_NO_RWG_SE, i128, s64)
  DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 388dcb8dbbc..9eb998d4c25 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -293,6 +293,10 @@
  D(0xec73, CLFIT,   RIE_a, GIE, r1_32u, i2_16u, 0, 0, ct, 0, 1)
  D(0xec71, CLGIT,   RIE_a, GIE, r1_o, i2_16u, 0, 0, ct, 0, 1)
  
+/* CONVERT TO BINARY */

+C(0x4f00, CVB, RX_a,  Z,   la2, 0, new, r1_32, cvb, 0)
+C(0xe306, CVBY,RXY_a, LD,  la2, 0, new, r1_32, cvb, 0)
+C(0xe30e, CVBG,RXY_a, Z,   la2, 0, r1, 0, cvbg, 0)
  /* CONVERT TO DECIMAL */
  C(0x4e00, CVD, RX_a,  Z,   r1_o, a2, 0, 0, cvd, 0)
  C(0xe326, CVDY,RXY_a, LD,  r1_o, a2, 0, 0, cvd, 0)
diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
index 121e3006a65..002d4b52dda 100644
--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -25,6 +25,7 @@
  #include "exec/exec-all.h"
  #include "qemu/host-utils.h"
  #include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
  
  /* #define DEBUG_HELPER */

  #ifdef DEBUG_HELPER
@@ -98,6 +99,45 @@ Int128 HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t b)
  tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
  }
  
+uint64_t HELPER(cvb)(CPUS390XState *env, uint64_t src, uint32_t n)

+{
+int64_t dec, sign = 0, digit, val = 0, pow10 = 0;
+const uintptr_t ra = GETPC();
+uint64_t tmpsrc;
+int i, j;
+
+for (i = 0; i < n; i++) {
+tmpsrc = wrap_address(env, src + (n - i - 1) * 8);
+dec = cpu_ldq_data_ra(env, tmpsrc, ra);
+for (j = 0; j < 16; j++, dec >>= 4) {
+if (i == 0 && j == 0) {
+sign = dec & 0xf;
+if (sign < 0xa) {
+tcg_s390_data_exception(env, 0, ra);
+}
+continue;
+}
+digit = dec & 0xf;
+if (digit > 0x9) {
+tcg_s390_data_exception(env, 0, ra);
+}
+if (i == 0 && j == 1) {
+if (sign == 0xb || sign == 0xd) {
+val = -digit;
+pow10 = -10;
+} else {
+val = digit;
+pow10 = 10;
+}
+} else {
+val += digit * pow10;
+pow10 *= 10;
+}
+}
+}
+return val;
+}


I just noticed that there was even a v5 of Pavel's patch where David noted 
that the fixed-point-divide exception checks are still missing, see:


https://patchwork.kernel.org/project/qemu-devel/patch/20180902003322.3428-4-pavel.zbits...@gmail.com/ 



Could you add those, too, please?

 Thanks,
  Thomas
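
For reference, the checks requested above can be modelled in portable C. This is only a sketch under stated assumptions, not the QEMU helper: toy_cvb() and the toy_cvb_status codes are hypothetical names, the operand is passed by value instead of being loaded from guest memory, and errors are reported through a return code where the real instruction would raise a data or fixed-point-divide exception. The result is left untouched on error, which is a simplification.

```c
#include <stdint.h>

/* Hypothetical status codes, loosely mirroring the two program exceptions. */
enum toy_cvb_status {
    TOY_CVB_OK = 0,
    TOY_CVB_DATA,          /* invalid digit or sign code */
    TOY_CVB_FIXPT_DIVIDE   /* value does not fit into 32 bits */
};

/*
 * Convert a 64-bit packed decimal (15 digits, sign code in the low nibble)
 * into a signed 32-bit integer.
 */
static enum toy_cvb_status toy_cvb(uint64_t packed, int32_t *result)
{
    unsigned sign = packed & 0xf;
    int64_t val = 0;
    int shift;

    /* 0xa..0xf are sign codes; anything below is a digit in the wrong place. */
    if (sign < 0xa) {
        return TOY_CVB_DATA;
    }

    /* Walk the 15 digits from most to least significant. */
    for (shift = 60; shift >= 4; shift -= 4) {
        unsigned digit = (packed >> shift) & 0xf;

        if (digit > 9) {
            return TOY_CVB_DATA;
        }
        val = val * 10 + digit;   /* at most 15 digits, fits into int64_t */
    }

    /* 0xb and 0xd mean minus, the remaining sign codes mean plus. */
    if (sign == 0xb || sign == 0xd) {
        val = -val;
    }

    /* This is the range check the review asks for. */
    if (val < INT32_MIN || val > INT32_MAX) {
        return TOY_CVB_FIXPT_DIVIDE;
    }

    *result = (int32_t)val;
    return TOY_CVB_OK;
}
```

With this model, 0x2147483647c converts to INT32_MAX, while 0x2147483648c is rejected as out of range.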

Re: [PATCH 1/3] ui/gtk: skip drawing guest scanout when associated VC is invisible

2024-01-31 Thread Marc-André Lureau

Hi

On Wed, Jan 31, 2024 at 10:56 PM Kim, Dongwon  wrote:
>
> Hi Marc-André,
>
> > https://docs.gtk.org/gtk3/method.Widget.is_visible.html
>
> This is what we had tried first but it didn't seem to work for the case of 
> window minimization.
> I see the visible flag for the GTK widget didn't seem to be toggled for some 
> reason. And when

Right, because minimize != visible. You can still get window preview
with alt-tab and other compositor drawings.

IOW, it should keep rendering even when minimized.

> closing window, vc->window widget is destroyed so it is not possible to check 
> the flag using
> this GTK function. Having extra flag bound to VC was most intuitive for the 
> logic I wanted to
> implement.
>
> Thanks!!
> DW
>
> > Subject: Re: [PATCH 1/3] ui/gtk: skip drawing guest scanout when associated
> > VC is invisible
> >
> > Hi Dongwon
> >
> > On Wed, Jan 31, 2024 at 3:50 AM  wrote:
> > >
> > > From: Dongwon Kim 
> > >
> > > A new flag "visible" is added to show visibility status of the gfx 
> > > console.
> > > The flag is set to 'true' when the VC is visible but set to 'false'
> > > when it is hidden or closed. When the VC is invisible, drawing guest
> > > frames should be skipped as it will never be completed and it would
> > > potentially lock up the guest display especially when blob scanout is 
> > > used.
> >
> > Can't it skip drawing when the widget is not visible instead?
> > https://docs.gtk.org/gtk3/method.Widget.is_visible.html
> >
> > >
> > > Cc: Marc-André Lureau 
> > > Cc: Gerd Hoffmann 
> > > Cc: Vivek Kasireddy 
> > >
> > > Signed-off-by: Dongwon Kim 
> > > ---
> > >  include/ui/gtk.h |  1 +
> > >  ui/gtk-egl.c |  8 
> > >  ui/gtk-gl-area.c |  8 
> > >  ui/gtk.c | 10 +-
> > >  4 files changed, 26 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/ui/gtk.h b/include/ui/gtk.h index
> > > aa3d637029..2de38e5724 100644
> > > --- a/include/ui/gtk.h
> > > +++ b/include/ui/gtk.h
> > > @@ -57,6 +57,7 @@ typedef struct VirtualGfxConsole {
> > >  bool y0_top;
> > >  bool scanout_mode;
> > >  bool has_dmabuf;
> > > +bool visible;
> > >  #endif
> > >  } VirtualGfxConsole;
> > >
> > > diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c index 3af5ac5bcf..993c283191
> > > 100644
> > > --- a/ui/gtk-egl.c
> > > +++ b/ui/gtk-egl.c
> > > @@ -265,6 +265,10 @@ void
> > gd_egl_scanout_dmabuf(DisplayChangeListener
> > > *dcl,  #ifdef CONFIG_GBM
> > >  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
> > >
> > > +if (!vc->gfx.visible) {
> > > +return;
> > > +}
> > > +
> > >  eglMakeCurrent(qemu_egl_display, vc->gfx.esurface,
> > > vc->gfx.esurface, vc->gfx.ectx);
> > >
> > > @@ -363,6 +367,10 @@ void gd_egl_flush(DisplayChangeListener *dcl,
> > >  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
> > >  GtkWidget *area = vc->gfx.drawing_area;
> > >
> > > +if (!vc->gfx.visible) {
> > > +return;
> > > +}
> > > +
> > >  if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf-
> > >draw_submitted) {
> > >  graphic_hw_gl_block(vc->gfx.dcl.con, true);
> > >  vc->gfx.guest_fb.dmabuf->draw_submitted = true; diff --git
> > > a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c index 52dcac161e..04e07bd7ee
> > > 100644
> > > --- a/ui/gtk-gl-area.c
> > > +++ b/ui/gtk-gl-area.c
> > > @@ -285,6 +285,10 @@ void
> > > gd_gl_area_scanout_flush(DisplayChangeListener *dcl,  {
> > >  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
> > >
> > > +if (!vc->gfx.visible) {
> > > +return;
> > > +}
> > > +
> > >  if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf-
> > >draw_submitted) {
> > >  graphic_hw_gl_block(vc->gfx.dcl.con, true);
> > >  vc->gfx.guest_fb.dmabuf->draw_submitted = true; @@ -299,6
> > > +303,10 @@ void gd_gl_area_scanout_dmabuf(DisplayChangeListener *dcl,
> > > #ifdef CONFIG_GBM
> > >  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
> > >
> > > +if (!vc->gfx.visible) {
> > > +return;
> > > +}
> > > +
> > >  gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
> > >  egl_dmabuf_import_texture(dmabuf);
> > >  if (!dmabuf->texture) {
> > > diff --git a/ui/gtk.c b/ui/gtk.c
> > > index 810d7fc796..02eb667d8a 100644
> > > --- a/ui/gtk.c
> > > +++ b/ui/gtk.c
> > > @@ -1312,15 +1312,20 @@ static void gd_menu_quit(GtkMenuItem *item,
> > > void *opaque)  static void gd_menu_switch_vc(GtkMenuItem *item, void
> > > *opaque)  {
> > >  GtkDisplayState *s = opaque;
> > > -VirtualConsole *vc = gd_vc_find_by_menu(s);
> > > +VirtualConsole *vc;
> > >  GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
> > >  gint page;
> > >
> > > +vc = gd_vc_find_current(s);
> > > +vc->gfx.visible = false;
> > > +
> > > +vc = gd_vc_find_by_menu(s);
> > >  gtk_release_modifiers(s);
> > >  if (vc) {
> > >  page =

Re: [PATCH v3 4/4] tests/tcg/s390x: Test CONVERT TO BINARY

2024-01-31 Thread Thomas Huth


On 01/02/2024 00.07, Ilya Leoshkevich wrote:

Check the CVB's and CVBG's corner cases.

Co-developed-by: Pavel Zbitskiy 
Signed-off-by: Ilya Leoshkevich 
---
  tests/tcg/s390x/Makefile.target |  1 +
  tests/tcg/s390x/cvb.c   | 47 +
  2 files changed, 48 insertions(+)
  create mode 100644 tests/tcg/s390x/cvb.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 04e4bddd83d..e2aba2ec274 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -46,6 +46,7 @@ TESTS+=laalg
  TESTS+=add-logical-with-carry
  TESTS+=lae
  TESTS+=cvd
+TESTS+=cvb
  
  cdsg: CFLAGS+=-pthread

  cdsg: LDFLAGS+=-pthread
diff --git a/tests/tcg/s390x/cvb.c b/tests/tcg/s390x/cvb.c
new file mode 100644
index 000..47b7a7965f4
--- /dev/null
+++ b/tests/tcg/s390x/cvb.c
@@ -0,0 +1,47 @@
+/*
+ * Test the CONVERT TO BINARY instruction.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+static int32_t cvb(uint64_t x)
+{
+uint32_t ret;
+
+asm("cvb %[ret],%[x]" : [ret] "=r" (ret) : [x] "R" (x));
+
+return ret;
+}
+
+static int64_t cvbg(__uint128_t x)
+{
+int64_t ret;
+
+asm("cvbg %[ret],%[x]" : [ret] "=r" (ret) : [x] "T" (x));
+
+return ret;
+}
+
+int main(void)
+{
+__uint128_t m = (((__uint128_t)0x9223372036854775) << 16) | 0x8070;
+
+assert(cvb(0xc) == 0);
+assert(cvb(0x1c) == 1);
+assert(cvb(0x25594c) == 25594);
+assert(cvb(0x1d) == -1);
+assert(cvb(0x2147483647c) == 0x7fffffff);
+assert(cvb(0x2147483647d) == -0x7fffffff);
+
+assert(cvbg(0xc) == 0);
+assert(cvbg(0x1c) == 1);
+assert(cvbg(0x25594c) == 25594);
+assert(cvbg(0x1d) == -1);
+assert(cvbg(m | 0xc) == 0x7fffffffffffffff);
+assert(cvbg(m | 0xd) == -0x7fffffffffffffff);
+
+return EXIT_SUCCESS;
+}


Reviewed-by: Thomas Huth

Re: [PATCH v3 2/4] tests/tcg/s390x: Test CONVERT TO DECIMAL

2024-01-31 Thread Thomas Huth


On 01/02/2024 00.07, Ilya Leoshkevich wrote:

Check the CVD's and CVDG's corner cases.

Signed-off-by: Ilya Leoshkevich 
---
  tests/tcg/s390x/Makefile.target |  1 +
  tests/tcg/s390x/cvd.c   | 45 +
  2 files changed, 46 insertions(+)
  create mode 100644 tests/tcg/s390x/cvd.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 30994dcf9c2..04e4bddd83d 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -45,6 +45,7 @@ TESTS+=clc
  TESTS+=laalg
  TESTS+=add-logical-with-carry
  TESTS+=lae
+TESTS+=cvd
  
  cdsg: CFLAGS+=-pthread

  cdsg: LDFLAGS+=-pthread
diff --git a/tests/tcg/s390x/cvd.c b/tests/tcg/s390x/cvd.c
new file mode 100644
index 000..c1fb63ca9a6
--- /dev/null
+++ b/tests/tcg/s390x/cvd.c
@@ -0,0 +1,45 @@
+/*
+ * Test the CONVERT TO DECIMAL instruction.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+static uint64_t cvd(int32_t x)
+{
+uint64_t ret;
+
+asm("cvd %[x],%[ret]" : [ret] "=R" (ret) : [x] "r" (x));
+
+return ret;
+}
+
+static __uint128_t cvdg(int64_t x)
+{
+__uint128_t ret;
+
+asm("cvdg %[x],%[ret]" : [ret] "=T" (ret) : [x] "r" (x));
+
+return ret;
+}
+
+int main(void)
+{
+__uint128_t m = (((__uint128_t)0x9223372036854775) << 16) | 0x8070;
+
+assert(cvd(0) == 0xc);
+assert(cvd(1) == 0x1c);
+assert(cvd(-1) == 0x1d);
+assert(cvd(0x7fffffff) == 0x2147483647c);
+assert(cvd(-0x7fffffff) == 0x2147483647d);
+
+assert(cvdg(0) == 0xc);
+assert(cvdg(1) == 0x1c);
+assert(cvdg(-1) == 0x1d);
+assert(cvdg(0x7fffffffffffffff) == (m | 0xc));
+assert(cvdg(-0x7fffffffffffffff) == (m | 0xd));
+
+return EXIT_SUCCESS;
+}


Reviewed-by: Thomas Huth
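
As a cross-check for the expected values in the test, the conversion can be modelled in portable C. This is a minimal sketch, not the QEMU helper: toy_cvd() is a hypothetical name, and it returns the packed value directly where the real instruction stores it to memory. It assumes the sign codes the test expects (0xc for plus, 0xd for minus).

```c
#include <stdint.h>

/*
 * Convert a signed 32-bit integer into a 64-bit packed decimal:
 * decimal digits in successive nibbles, sign code in the low nibble.
 */
static uint64_t toy_cvd(int32_t x)
{
    /* Widen first so that negating INT32_MIN cannot overflow. */
    int64_t v = x;
    uint64_t mag = v < 0 ? (uint64_t)-v : (uint64_t)v;
    uint64_t packed = v < 0 ? 0xd : 0xc;
    int shift;

    /* Emit the digits from least to most significant. */
    for (shift = 4; mag != 0; shift += 4, mag /= 10) {
        packed |= (mag % 10) << shift;
    }
    return packed;
}
```

For example, toy_cvd(25594) yields 0x25594c, the inverse of the Principles of Operation example quoted in the CVB patch.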

Re: [PATCH 14/14] migration/multifd: Forbid spurious wakeups

2024-01-31 Thread Peter Xu

On Wed, Jan 31, 2024 at 06:31:11PM +0800, pet...@redhat.com wrote:
> From: Peter Xu 
> 
> Now multifd's logic is designed to have no spurious wakeup.  I still
> remember a talk to Juan and he seems to agree we should drop it now, and if
> my memory was right it was there because multifd used to hit that when
> still debugging.
> 
> Let's drop it and see what can explode; as long as it's not reaching
> soft-freeze.
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/multifd.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 0f22646f95..bd0e3ea1a5 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -766,9 +766,6 @@ static void *multifd_send_thread(void *opaque)
>  p->pending_sync = false;
>  qemu_mutex_unlock(&p->mutex);
>  qemu_sem_post(&p->sem_sync);
> -} else {
> -qemu_mutex_unlock(&p->mutex);
> -/* sometimes there are spurious wakeups */
>  }
>  }
>  
> -- 
> 2.43.0
> 

While removing this is still the goal, I just noticed that _if_ a spurious
wakeup does happen, this will not crash QEMU; instead it will leave the
mutex locked forever and deadlock.

A deadlock is worse than a crash in this case, so when I repost, I'll make
sure it crashes, and crashes hard, by squashing something like this in:


diff --git a/migration/multifd.c b/migration/multifd.c
index bd0e3ea1a5..89011f75d9 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -751,7 +751,9 @@ static void *multifd_send_thread(void *opaque)
 p->next_packet_size = 0;
 p->pending_job = false;
 qemu_mutex_unlock(&p->mutex);
-} else if (p->pending_sync) {
+} else {
+/* If not a normal job, must be a sync request */
+assert(p->pending_sync);
 p->flags = MULTIFD_FLAG_SYNC;
 multifd_send_fill_packet(p);
 ret = qio_channel_write_all(p->c, (void *)p->packet,


Fabiano, I'll keep your ACK, but let me know otherwise..

-- 
Peter Xu
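
The difference between the two outcomes can be shown with a toy model. This is a sketch under loud assumptions: ToyMutex, ToyChannel and the toy_* functions are hypothetical stand-ins (a plain flag instead of QemuMutex, counters instead of real I/O) and only mirror the shape of the branches discussed above, not the actual multifd_send_thread() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for QemuMutex: only records whether the lock is held. */
typedef struct ToyMutex {
    bool held;
} ToyMutex;

typedef struct ToyChannel {
    ToyMutex mutex;
    bool pending_job;
    bool pending_sync;
    int jobs_sent;
    int syncs_sent;
} ToyChannel;

static void toy_lock(ToyMutex *m)
{
    assert(!m->held);   /* a second lock attempt would deadlock */
    m->held = true;
}

static void toy_unlock(ToyMutex *m)
{
    assert(m->held);
    m->held = false;
}

/*
 * Shape of the loop body once the trailing "else" is dropped: a wakeup
 * with neither flag set falls through without unlocking.
 */
static void toy_step_no_else(ToyChannel *p)
{
    toy_lock(&p->mutex);
    if (p->pending_job) {
        p->pending_job = false;
        toy_unlock(&p->mutex);
        p->jobs_sent++;
    } else if (p->pending_sync) {
        p->pending_sync = false;
        toy_unlock(&p->mutex);
        p->syncs_sent++;
    }
    /* Spurious wakeup: the mutex is still held when we get here. */
}

/* Shape of the squashed-in change: anything that is not a job is a sync. */
static void toy_step_assert(ToyChannel *p)
{
    toy_lock(&p->mutex);
    if (p->pending_job) {
        p->pending_job = false;
        toy_unlock(&p->mutex);
        p->jobs_sent++;
    } else {
        /* A spurious wakeup now aborts here, with no silent hang. */
        assert(p->pending_sync);
        p->pending_sync = false;
        toy_unlock(&p->mutex);
        p->syncs_sent++;
    }
}
```

In the first variant a spurious wakeup returns with the lock still held, so the next iteration would block forever; in the second it trips the assertion immediately.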

Re: [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups

2024-01-31 Thread Peter Xu

On Wed, Jan 31, 2024 at 07:49:51PM -0300, Fabiano Rosas wrote:
> pet...@redhat.com writes:
> 
> > From: Peter Xu 
> >
> > This patchset contains quite a few refactorings to current multifd:
> >
> >   - It picked up some patches from an old series of mine [0] (the last
> > patches were dropped, though; I did the cleanup slightly differently):
> >
> > I still managed to include one patch to split pending_job, but I
> > rewrote the patch here.
> >
> >   - It tries to cleanup multiple multifd paths here and there, the ultimate
> > goal is to redefine send_prepare() to be something like:
> >
> >   p->pages --->  send_prepare() -> IOVs
> >
> > So that there's no obvious change yet on multifd_ops besides redefined
> > interface for send_prepare().  We may want a separate OPs for file
> > later.
> >
> > For 2), one benefit is already presented by Fabiano in his other series [1]
> > on cleaning up zero copy, but this patchset addressed it quite differently,
> > and hopefully also more gradually.  The other benefit is for sure if we
> > have a more concrete API for send_prepare() and if we can reach an initial
> > consensus, then we can have the recent compression accelerators rebased on
> > top of this one.
> >
> > This also prepares for the case where the input is no longer limited to
> > p->pages but can be arbitrary data (like VFIO's potential use case in the
> > future?).  But that is also for later, even if reasonable.
> >
> > Please have a look.  Thanks,
> >
> > [0] https://lore.kernel.org/r/20231022201211.452861-1-pet...@redhat.com
> > [1] 
> > https://lore.kernel.org/qemu-devel/20240126221943.26628-1-faro...@suse.de
> >
> > Peter Xu (14):
> >   migration/multifd: Drop stale comment for multifd zero copy
> >   migration/multifd: multifd_send_kick_main()
> >   migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
> >   migration/multifd: Postpone reset of MultiFDPages_t
> >   migration/multifd: Drop MultiFDSendParams.normal[] array
> >   migration/multifd: Separate SYNC request with normal jobs
> >   migration/multifd: Simplify locking in sender thread
> >   migration/multifd: Drop pages->num check in sender thread
> >   migration/multifd: Rename p->num_packets and clean it up
> >   migration/multifd: Move total_normal_pages accounting
> >   migration/multifd: Move trace_multifd_send|recv()
> >   migration/multifd: multifd_send_prepare_header()
> >   migration/multifd: Move header prepare/fill into send_prepare()
> >   migration/multifd: Forbid spurious wakeups
> >
> >  migration/multifd.h  |  34 +++--
> >  migration/multifd-zlib.c |  11 +-
> >  migration/multifd-zstd.c |  11 +-
> >  migration/multifd.c  | 291 +++
> >  4 files changed, 182 insertions(+), 165 deletions(-)
> 
> This series didn't survive my  iterations test on the opensuse
> machine.
> 
> # Running /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
> ...
> kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
> 
> 
> #0  0x5575dda06399 in qemu_mutex_lock_impl (mutex=0x18, file=0x5575ddce9cc3 "../util/qemu-thread-posix.c", line=275) at ../util/qemu-thread-posix.c:92
> #1  0x5575dda06a94 in qemu_sem_post (sem=0x18) at ../util/qemu-thread-posix.c:275
> #2  0x5575dd56a512 in multifd_send_thread (opaque=0x5575df054ef8) at ../migration/multifd.c:720
> #3  0x5575dda0709b in qemu_thread_start (args=0x7fd404001d50) at ../util/qemu-thread-posix.c:541
> #4  0x7fd45e8a26ea in start_thread (arg=0x7fd3faffd700) at pthread_create.c:477
> #5  0x7fd45cd2150f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> The multifd thread is posting channels_ready with an already freed
> multifd_send_state.
> 
> This is the bug Avihai has hit. We're going into multifd_save_cleanup()
> so early that multifd_new_send_channel_async() hasn't even had the
> chance to set p->running. So it misses the join and frees everything up
> while a second multifd thread is just starting.

Thanks for doing that.

Would this series make that bug easier to trigger?  I didn't test it much;
it only survived the smoke test and the CI job I kicked off.  I think we
can still decide to fix that issue separately, but if this series makes it
easier to hit, then that's definitely bad..

-- 
Peter Xu
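
A side note on reading the backtrace: a tiny pointer value such as sem=0x18 is typically what a member address looks like when the base pointer is NULL, which would fit a state pointer that has already been torn down. The sketch below only illustrates that arithmetic. ToySendState is a hypothetical layout chosen so that the member lands at offset 0x18; it is NOT the real multifd_send_state definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical semaphore type, just big enough to be a struct member. */
typedef struct ToySem {
    uint64_t count;
} ToySem;

/* Hypothetical layout: three 8-byte fields precede the semaphore. */
typedef struct ToySendState {
    uint64_t params;
    uint64_t pages;
    uint64_t packet_num;
    ToySem channels_ready;   /* offset 3 * 8 = 0x18 */
} ToySendState;

/*
 * Address that &state->channels_ready evaluates to for a given base
 * address, computed with offsetof() so that no pointer is dereferenced.
 */
static uintptr_t toy_member_address(uintptr_t base)
{
    return base + offsetof(ToySendState, channels_ready);
}
```

With a NULL base, toy_member_address(0) is 0x18, matching the shape of the sem=0x18 argument in frame #1.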

Re: [PATCH v2 2/2] e1000e: fix link state on resume

2024-01-31 Thread Jason Wang

On Wed, Jan 24, 2024 at 6:40 PM Laurent Vivier  wrote:
>
> On resume e1000e_vm_state_change() always calls e1000e_autoneg_resume()
> that sets link_down to false, and thus activates the link even
> if we have disabled it.
>
> The problem can be reproduced starting qemu in paused state (-S) and
> then set the link to down. When we resume the machine the link appears
> to be up.
>
> Reproducer:
>
># qemu-system-x86_64 ... -device e1000e,netdev=netdev0,id=net0 -S
>
>{"execute": "qmp_capabilities" }
>{"execute": "set_link", "arguments": {"name": "net0", "up": false}}
>{"execute": "cont" }
>
> To fix the problem, merge the content of e1000e_vm_state_change()
> into e1000e_core_post_load() as e1000 does.
>
> Buglink: https://issues.redhat.com/browse/RHEL-21867
> Fixes: 6f3fbe4ed06a ("net: Introduce e1000e device emulation")
> Suggested-by: Akihiko Odaki 
> Signed-off-by: Laurent Vivier 
> ---
>

I've queued this.

Thanks

Re: [PATCH v2 1/2] igb: fix link state on resume

2024-01-31 Thread Jason Wang

On Wed, Jan 24, 2024 at 6:30 PM Laurent Vivier  wrote:
>
> On resume igb_vm_state_change() always calls igb_autoneg_resume()
> that sets link_down to false, and thus activates the link even
> if we have disabled it.
>
> The problem can be reproduced starting qemu in paused state (-S) and
> then set the link to down. When we resume the machine the link appears
> to be up.
>
> Reproducer:
>
># qemu-system-x86_64 ... -device igb,netdev=netdev0,id=net0 -S
>
>{"execute": "qmp_capabilities" }
>{"execute": "set_link", "arguments": {"name": "net0", "up": false}}
>{"execute": "cont" }
>
> To fix the problem, merge the content of igb_vm_state_change()
> into igb_core_post_load() as e1000 does.
>
> Buglink: https://issues.redhat.com/browse/RHEL-21867
> Fixes: 3a977deebe6b ("Intrdocue igb device emulation")
> Cc: akihiko.od...@daynix.com
> Suggested-by: Akihiko Odaki 
> Signed-off-by: Laurent Vivier 
> ---
>
> Notes:
> v2: Add Fixes: and a comment about igb_intrmgr_resume() purpose.
>

Queued.

Thanks

Re: [PATCH 0/2] Move net backend cleanup to NIC cleanup

2024-01-31 Thread Jason Wang

On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez  wrote:
>
> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net
> structures if peer nic is present") effectively delayed the backend
> cleanup, allowing the frontend or the guest to access its resources as
> long as the frontend NIC is still visible to the guest.
>
> However it does not clean up the resources until the qemu process is
> over.  This causes an effective leak if the device is deleted with
> device_del, as there is no way to close the vdpa device.  This makes
> it impossible to re-add that device to this or other QEMU instances until
> the first instance of QEMU is finished.
>
> Move the cleanup from qemu_cleanup to the NIC deletion.
>
> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures 
> if peer nic is present")
> Acked-by: Jason Wang 
> Reported-by: Lei Yang 
> Signed-off-by: Eugenio Pérez 
>
> Eugenio Pérez (2):
>   net: parameterize the removing client from nc list
>   net: move backend cleanup to NIC cleanup
>
>  net/net.c| 30 --
>  net/vhost-vdpa.c |  8 
>  2 files changed, 20 insertions(+), 18 deletions(-)
>
> --

Queued.

Thanks

Re: [PATCH v3 06/20] util/dsa: Add dependency idxd.

2024-01-31 Thread Peter Xu

On Thu, Jan 04, 2024 at 12:44:38AM +, Hao Xiang wrote:
> Idxd is the device driver for DSA (Intel Data Streaming
> Accelerator). The driver is fully functioning since Linux
> kernel 5.19. This change adds the driver's header file used
> for userspace development.
> 
> Signed-off-by: Hao Xiang 
> ---
>  linux-headers/linux/idxd.h | 356 +
>  1 file changed, 356 insertions(+)
>  create mode 100644 linux-headers/linux/idxd.h

This can be addressed and posted separately.  I see that we already updated
it to v6.7-rc5.

Did you check scripts/update-linux-headers.sh?  Please check and see the
usage.  If idxd.h is not pulled in for some reason, we may want to address
that.

-- 
Peter Xu

Re: [PATCH v2 02/14] plugins: scoreboard API

2024-01-31 Thread Pierrick Bouvier


On 1/31/24 11:44, Pierrick Bouvier wrote:

On 1/26/24 19:14, Alex Bennée wrote:

+need_realloc = TRUE;
+}
+plugin.scoreboard_size = cpu->cpu_index + 1;
+g_assert(plugin.scoreboard_size <= plugin.scoreboard_alloc_size);
+
+if (g_hash_table_size(plugin.scoreboards) == 0) {
+/* nothing to do, we just updated sizes for future scoreboards */
+return;
+}
+
+if (need_realloc) {
+#ifdef CONFIG_USER_ONLY
+/**
+ * cpus must be stopped, as some tb might still use an existing
+ * scoreboard.
+ */
+start_exclusive();
+#endif


Hmm, it seems wrong for this to be USER_ONLY. While we don't expect to
resize in system mode, if we did, we would certainly want to do it during
an exclusive period.



After investigation, the current_cpu TLS variable is not set in
cpu-common.c at this point.

Indeed we are not on any cpu_exec path, but in the cpu_realize_fn when
calling this (through qemu_plugin_vcpu_init_hook).

One obvious fix is to check if it's NULL or not, like:
--- a/cpu-common.c
+++ b/cpu-common.c
@@ -193,7 +193,7 @@ void start_exclusive(void)
   CPUState *other_cpu;
   int running_cpus;

-if (current_cpu->exclusive_context_count) {
+if (current_cpu && current_cpu->exclusive_context_count) {
   current_cpu->exclusive_context_count++;
   return;
   }

Can anyone suggest another possible fix (like defining current_cpu
somewhere, or moving the qemu_plugin_vcpu_init_hook call)?


Running init_hook asynchronously on cpu works and solves the problem, 
without any need to modify start/end exclusive code.

Re: [PATCH v3 15/20] migration/multifd: Add test hook to set normal page ratio.

2024-01-31 Thread Peter Xu

On Thu, Jan 04, 2024 at 12:44:47AM +, Hao Xiang wrote:
> +# @multifd-normal-page-ratio: Test hook setting the normal page ratio.
> +# (Since 8.2)

Please remember to update all of them to 9.0 when you repost, thanks.

-- 
Peter Xu

Re: [PATCH v3 03/20] multifd: Zero pages transmission

2024-01-31 Thread Peter Xu

On Thu, Jan 04, 2024 at 12:44:35AM +, Hao Xiang wrote:
> From: Juan Quintela 
> 
> This implements the zero page detection and handling.
> 
> Signed-off-by: Juan Quintela 
> ---
>  migration/multifd.c | 41 +++--
>  migration/multifd.h |  5 +
>  2 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 5a1f50c7e8..756673029d 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  #include "qemu/rcu.h"
>  #include "exec/target_page.h"
>  #include "sysemu/sysemu.h"
> @@ -279,6 +280,12 @@ static void multifd_send_fill_packet(MultiFDSendParams 
> *p)
>  
>  packet->offset[i] = cpu_to_be64(temp);
>  }
> +for (i = 0; i < p->zero_num; i++) {
> +/* there are architectures where ram_addr_t is 32 bit */
> +uint64_t temp = p->zero[i];
> +
> +packet->offset[p->normal_num + i] = cpu_to_be64(temp);
> +}
>  }

I think changes like this need to be moved into the previous patch.  I got
quite confused when reading the previous one and only now understood what
happens.  Fabiano, if you're going to pick these out and post them
separately, please consider this too.  Perhaps squash them together?

-- 
Peter Xu
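
To make the packet layout in the quoted hunk easier to follow: the normal page offsets come first, and the zero page offsets are appended right after them, all converted to big-endian. The sketch below models only that layout. The toy_* names are hypothetical, toy_cpu_to_be64() is a portable stand-in for QEMU's cpu_to_be64(), and plain arrays replace the MultiFDSendParams and packet structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for cpu_to_be64(): host order to big-endian. */
static uint64_t toy_cpu_to_be64(uint64_t v)
{
    uint8_t bytes[8];
    uint64_t out;
    int i;

    for (i = 0; i < 8; i++) {
        bytes[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    memcpy(&out, bytes, sizeof(out));
    return out;
}

/* Inverse helper, used here only to inspect the result. */
static uint64_t toy_be64_to_cpu(uint64_t v)
{
    uint8_t bytes[8];
    uint64_t out = 0;
    int i;

    memcpy(bytes, &v, sizeof(bytes));
    for (i = 0; i < 8; i++) {
        out = (out << 8) | bytes[i];
    }
    return out;
}

/*
 * Fill offset[] with normal_num normal page offsets followed by zero_num
 * zero page offsets; offset[] must have room for both.
 */
static void toy_fill_offsets(uint64_t *offset,
                             const uint64_t *normal, size_t normal_num,
                             const uint64_t *zero, size_t zero_num)
{
    size_t i;

    for (i = 0; i < normal_num; i++) {
        offset[i] = toy_cpu_to_be64(normal[i]);
    }
    for (i = 0; i < zero_num; i++) {
        offset[normal_num + i] = toy_cpu_to_be64(zero[i]);
    }
}
```

A receiver following this layout would treat the first normal_num entries as pages with payload and the remaining zero_num entries as pages to be cleared, which is why both counts have to travel in the packet header.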

Re: [External] Re: [PATCH v3 01/20] multifd: Add capability to enable/disable zero_page

2024-01-31 Thread Peter Xu

On Tue, Jan 23, 2024 at 12:10:55PM -0300, Fabiano Rosas wrote:
> Hao Xiang  writes:
> 
> > On Sun, Jan 14, 2024 at 10:02 PM Shivam Kumar  
> > wrote:
> >>
> >>
> >>
> >> > On 04-Jan-2024, at 6:14 AM, Hao Xiang  wrote:
> >> >
> >> > From: Juan Quintela 
> >> >
> >> > We have to enable it by default until we introduce the new code.
> >> >
> >> > Signed-off-by: Juan Quintela 
> >> > ---
> >> > migration/options.c | 15 +++
> >> > migration/options.h |  1 +
> >> > qapi/migration.json |  8 +++-
> >> > 3 files changed, 23 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/migration/options.c b/migration/options.c
> >> > index 8d8ec73ad9..0f6bd78b9f 100644
> >> > --- a/migration/options.c
> >> > +++ b/migration/options.c
> >> > @@ -204,6 +204,8 @@ Property migration_properties[] = {
> >> > DEFINE_PROP_MIG_CAP("x-switchover-ack",
> >> > MIGRATION_CAPABILITY_SWITCHOVER_ACK),
> >> > DEFINE_PROP_MIG_CAP("x-dirty-limit", 
> >> > MIGRATION_CAPABILITY_DIRTY_LIMIT),
> >> > +DEFINE_PROP_MIG_CAP("main-zero-page",
> >> > +MIGRATION_CAPABILITY_MAIN_ZERO_PAGE),
> >> > DEFINE_PROP_END_OF_LIST(),
> >> > };
> >> >
> >> > @@ -284,6 +286,19 @@ bool migrate_multifd(void)
> >> > return s->capabilities[MIGRATION_CAPABILITY_MULTIFD];
> >> > }
> >> >
> >> > +bool migrate_use_main_zero_page(void)
> >> > +{
> >> > +/* MigrationState *s; */
> >> > +
> >> > +/* s = migrate_get_current(); */
> >> > +
> >> > +/*
> >> > + * We will enable this when we add the right code.
> >> > + * return 
> >> > s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
> >> > + */
> >> > +return true;
> >> > +}
> >> > +
> >> > bool migrate_pause_before_switchover(void)
> >> > {
> >> > MigrationState *s = migrate_get_current();
> >> > diff --git a/migration/options.h b/migration/options.h
> >> > index 246c160aee..c901eb57c6 100644
> >> > --- a/migration/options.h
> >> > +++ b/migration/options.h
> >> > @@ -88,6 +88,7 @@ int migrate_multifd_channels(void);
> >> > MultiFDCompression migrate_multifd_compression(void);
> >> > int migrate_multifd_zlib_level(void);
> >> > int migrate_multifd_zstd_level(void);
> >> > +bool migrate_use_main_zero_page(void);
> >> > uint8_t migrate_throttle_trigger_threshold(void);
> >> > const char *migrate_tls_authz(void);
> >> > const char *migrate_tls_creds(void);
> >> > diff --git a/qapi/migration.json b/qapi/migration.json
> >> > index eb2f883513..80c4b13516 100644
> >> > --- a/qapi/migration.json
> >> > +++ b/qapi/migration.json
> >> > @@ -531,6 +531,12 @@
> >> > # and can result in more stable read performance.  Requires KVM
> >> > # with accelerator property "dirty-ring-size" set.  (Since 8.1)
> >> > #
> >> > +#
> >> > +# @main-zero-page: If enabled, the detection of zero pages will be
> >> > +#  done on the main thread.  Otherwise it is done on
> >> > +#  the multifd threads.
> >> > +#  (since 8.2)
> >> > +#
> >> Should the capability name be something like "zero-page-detection" or just 
> >> “zero-page”?
> >> CC: Fabiano Rosas
> >
> > I think the same concern was brought up last time Juan sent out the
> > original patchset. Right now, the zero page detection is done in the
> > main migration thread and it is always "ON". This change added a
> > functionality to move the zero page detection from the main thread to
> > the multifd sender threads. Now "main-zero-page" is turned "OFF" by
> > default, and zero page checking is done in the multifd sender thread
> > (much better performance). If user wants to run the zero page
> > detection in the main thread (keep current behavior), user can change
> > "main-zero-page" to "ON".
> >
> > Renaming it to "zero-page-detection" or just “zero-page” can not
> > differentiate the old behavior and the new behavior.
> 
> Yes, the main point here is what happens when we try to migrate from
> different QEMU versions that have/don't have this code. We need some way
> to maintain the compatibility. In this case Juan chose to keep this
> capability with the semantics of "old behavior" so that we can enable it
> on the new QEMU to match with the old binary that doesn't expect to see
> zero pages on the packet/stream.
> 
> > Here are the options:
> > 1) Keep the current behavior. "main-zero-page" is OFF by default and
> > zero page detection runs on the multifd thread by default. User can
> > turn the switch to "ON" if they want old behavior.
> > 2) Make "main-zero-page" switch ON as default. This would keep the
> > current behavior by default. User can set it to "OFF" for better
> > performance.
> 
> 3) Make multifd-zero-page ON by default. User can set it to OFF to get
> the old behavior. There was some consideration about how libvirt works
> that would make this one unusable, but I don't understand what's that
> about.
> 
> I would make this a default ON parameter instead of a capability.

If we want to add a knob for zero page,

Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression

2024-01-31 Thread Peter Xu

On Wed, Jan 31, 2024 at 10:14:58AM -0300, Fabiano Rosas wrote:
> > I am thinking the p->normal is mostly redundant.. at least on the sender
> > side that I just read.  Since I'll be preparing a new spin of the multifd
> > cleanup series I posted, maybe I can append one more to try dropping
> > p->normal[] completely.
> 
> Just for reference, you don't have to use it, but I have this patch:
> 
> https://gitlab.com/farosas/qemu/-/commit/4316e145ae7e7bf378ef7fde64c2b02260362847

Oops, I missed that even though I did glance over your branch (only
the final state, though); otherwise I could have picked it up, sorry.  But
it's also good news, since it means it's probably the right thing to do.

-- 
Peter Xu

Re: [PATCH] hw/intc: Handle the error of IOAPICCommonClass.realize()

2024-01-31 Thread Zhao Liu

Hi Philippe,

On Wed, Jan 31, 2024 at 05:48:24PM +0100, Philippe Mathieu-Daudé wrote:
> Date: Wed, 31 Jan 2024 17:48:24 +0100
> From: Philippe Mathieu-Daudé 
> Subject: Re: [PATCH] hw/intc: Handle the error of
>  IOAPICCommonClass.realize()
> 
> Hi Zhao,
> 
> On 31/1/24 15:29, Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > > IOAPICCommonClass implements its own private realize(), and this private
> > > realize() can fail.
> > > 
> > > Therefore, return directly if IOAPICCommonClass.realize() reports an error.
> > 
> > Signed-off-by: Zhao Liu 
> > ---
> >   hw/intc/ioapic_common.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/hw/intc/ioapic_common.c b/hw/intc/ioapic_common.c
> > index cb9bf6214608..3772863377c2 100644
> > --- a/hw/intc/ioapic_common.c
> > +++ b/hw/intc/ioapic_common.c
> > @@ -162,6 +162,9 @@ static void ioapic_common_realize(DeviceState *dev, 
> > Error **errp)
> >   info = IOAPIC_COMMON_GET_CLASS(s);
> >   info->realize(dev, errp);
> > +if (*errp) {
> > +return;
> > +}
> 
> Could be clearer to deviate from DeviceRealize and let the
> handler return a boolean:
> 
> -- >8 --
> diff --git a/hw/intc/ioapic_internal.h b/hw/intc/ioapic_internal.h
> index 37b8565539..9664bb3e00 100644
> --- a/hw/intc/ioapic_internal.h
> +++ b/hw/intc/ioapic_internal.h
> @@ -92,3 +92,3 @@ struct IOAPICCommonClass {
> 
> -DeviceRealize realize;
> +bool (*realize)(DeviceState *dev, Error **errp);

What if I change the name of this interface?

Maybe to ioapic_realize(), to distinguish it from DeviceClass.realize().

>  DeviceUnrealize unrealize;

Additionally, if I change the signature of realize(), should I also avoid
the DeviceUnrealize typedef for symmetry's sake and just declare a similar
function pointer, as you suggested?

Further, do you think it's necessary to introduce InternalRealize and
InternalUnrealize typedefs in qdev to wrap these special realize/unrealize
hooks and differentiate them from the normal DeviceRealize/DeviceUnrealize?

I ask because this pattern of realize() (i.e. registering the child class's
realize() in the parent class instead of in DeviceClass, and then calling
the registered realize() from the parent's realize()) is also widely used
elsewhere:

* xen_block_realize()
* virtser_port_device_realize()
* x86_iommu_realize()
* virtio_input_device_realize()
* apic_common_realize()
* pc_dimm_realize()
* virtio_device_realize()
...

I'm not sure whether this is an accepted general pattern, although it looks
like it could easily be confused with DeviceClass.realize().

> diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> index 409d0c8c76..96747ef2b8 100644
> --- a/hw/i386/kvm/ioapic.c
> +++ b/hw/i386/kvm/ioapic.c
> @@ -121,3 +121,3 @@ static void kvm_ioapic_set_irq(void *opaque, int irq, int level)
> 
> -static void kvm_ioapic_realize(DeviceState *dev, Error **errp)
> +static bool kvm_ioapic_realize(DeviceState *dev, Error **errp)
>  {
> @@ -133,2 +133,4 @@ static void kvm_ioapic_realize(DeviceState *dev, Error **errp)
>  qdev_init_gpio_in(dev, kvm_ioapic_set_irq, IOAPIC_NUM_PINS);
> +
> +return true;
>  }
> diff --git a/hw/intc/ioapic_common.c b/hw/intc/ioapic_common.c
> index cb9bf62146..beab65be04 100644
> --- a/hw/intc/ioapic_common.c
> +++ b/hw/intc/ioapic_common.c
> @@ -163,3 +163,5 @@ static void ioapic_common_realize(DeviceState *dev, Error **errp)
>  info = IOAPIC_COMMON_GET_CLASS(s);
> -info->realize(dev, errp);
> +if (!info->realize(dev, errp)) {
> +return;
> +}
> 
> ---
> 
> What do you think?

I'm OK with the change here, but I'm not sure whether the return type of the
private realize() hooks should be changed elsewhere as well.

Thanks,
Zhao
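
For reference, a minimal sketch of the bool-returning hook pattern discussed
above. It is not QEMU code: Error, DeviceState, ChildRealize and the
functions below are hypothetical stand-ins that only mirror the shape of
the proposal. One point it illustrates is that checking the return value
keeps working when the caller passes errp == NULL, whereas a bare
"if (*errp)" check would dereference a NULL pointer in that case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's Error and DeviceState types. */
typedef struct Error { const char *msg; } Error;
typedef struct DeviceState { int num_pins; bool parent_ready; } DeviceState;

/* Hypothetical child-class hook: true on success, false with *errp set. */
typedef bool (*ChildRealize)(DeviceState *dev, Error **errp);

static Error bad_pins = { "invalid pin count" };

static bool child_realize(DeviceState *dev, Error **errp)
{
    if (dev->num_pins <= 0) {
        if (errp) {             /* callers may pass NULL to ignore errors */
            *errp = &bad_pins;
        }
        return false;
    }
    return true;
}

/* Parent realize: checks the return value and never dereferences errp. */
static void parent_realize(DeviceState *dev, ChildRealize hook, Error **errp)
{
    if (!hook(dev, errp)) {
        return;                 /* child failed, skip the parent's own setup */
    }
    dev->parent_ready = true;   /* stands in for the rest of the parent setup */
}
```

Calling parent_realize() with a failing child and a NULL errp simply returns
early, which is the behavior the boolean return value is meant to guarantee.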

Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64

2024-01-31 Thread Peter Xu

On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote:
> If we ask for KVM and it falls back to TCG, we need a cpu that supports
> both. We don't have that. I've put some command-line combinations at the
> end of the email[1], take a look.

Thanks a lot, Fabiano.  I think I have a better picture now.

Now the question is whether it's worthwhile for us (migration) to explicitly
add code to work around this issue in qtest, or whether we wait until the ARM
side has a CPU model that is both stable and supports KVM+TCG.

I actually personally prefer to wait - it's not too bad after all, because
it only affects the new "n-1" migration test.  Most of the migration
functionality will still be covered there in CI for ARM.

Meanwhile, AFAIU we do have a plan upstream to have a stable aarch64 cpu
model sooner or later that at least supports KVM.  If that can also
support TCG, then the goal is achieved.  The same holds the other way round,
if we are able to add KVM support to some stable TCG-only cores (like
neoverse-n1).

Do we have a plan in this area?  Copy both Peter & Eric.

If we can have that in 9.0, that would be perfect; we could then start to
switch the migration tests to use that cpu model.

As of now, maybe we can (1) fix the gic-version in migration-test.c to be
stable; this seems to be a separate issue, just to be prepared for when a new
model comes, then (2) document the above decision in the
migration-compat-aarch64 test in .gitlab-ci.d/, if we can reach consensus.
Then we rely only on x86 for "n-1" migration tests until later.

-- 
Peter Xu

Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU

2024-01-31 Thread Zhao Liu

Hi Daniel,

On Wed, Jan 31, 2024 at 10:28:42AM +0000, Daniel P. Berrangé wrote:
> Date: Wed, 31 Jan 2024 10:28:42 +0000
> From: "Daniel P. Berrangé" 
> Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> 
> On Wed, Jan 31, 2024 at 06:13:29PM +0800, Zhao Liu wrote:
> > From: Zhao Liu 

[snip]

> > However, after digging deeper into the description and use cases of
> > cluster in the device tree [3], I realized that the essential
> > difference between clusters and modules is that cluster is an extremely
> > abstract concept:
> >   * Cluster supports nesting though currently QEMU doesn't support
> > nested cluster topology. However, modules will not support nesting.
> >   * Also due to nesting, there is great flexibility in sharing resources
> > on clusters, rather than narrowing cluster down to sharing L2 (and
> > L3 tags) as the lowest topology level that contains cores.
> >   * Flexible nesting of cluster allows it to correspond to any level
> > between the x86 package and core.
> > 
> > Based on the above considerations, and in order to eliminate the naming
> > confusion caused by the mapping between general cluster and x86 module
> > in v7, we now formally introduce smp.modules as the new topology level.
> 
> What is the Linux kernel calling this topology level on x86 ?
> It will be pretty unfortunate if Linux and QEMU end up with
> different names for the same topology level.
> 

Intel's engineers working on the Linux kernel are now starting to use
"module" to refer to this layer of topology [4] to avoid confusion, whereas
previously the scheduler developers referred to the shared-L2 hierarchy
collectively as "cluster".

Looking at it this way, it makes more sense for QEMU to use "module"
for x86.

[4]: 
https://lore.kernel.org/lkml/20231116142245.1233485-3-kan.li...@linux.intel.com/

Thanks,
Zhao

Re: [PATCH v5 0/6] Pointer Masking update for Zjpm v0.8

2024-01-31 Thread Alistair Francis

On Tue, Jan 30, 2024 at 5:23 AM Alexey Baturo  wrote:
>
> From: Alexey Baturo 
>
> Hi,
>
> This patch series targets Zjpm v0.8 extension.
> The spec itself could be found here: 
> https://github.com/riscv/riscv-j-extension/blob/8088461d8d66a7676872b61c908cbeb7cf5c5d1d/zjpm-spec.pdf
> This patch series is updated after the suggested comments:
> - add "x-" to the extension names to indicate experimental

Do you mind rebasing this on
https://github.com/alistair23/qemu/tree/riscv-to-apply.next ?

Alistair

>
> [v4]:
> Patch series updated after the suggested comments:
> - removed J-letter extension as it's unused
> - renamed and fixed function to detect if address should be sign-extended
> - zeroed unused context variables and moved computation logic to another patch
> - bumped pointer masking version_id and minimum_version_id by 1
>
> Thanks
>
> [v3]:
> These patches are updated after Richard's comments:
> - moved new tb flags to the end
> - used tcg_gen_(s)extract to get the final address
> - properly handle CONFIG_USER_ONLY
>
> Thanks
>
> [v2]:
> As per Richard's suggestion I made the pmm field part of tb_flags.
> This allowed me to get rid of the global variable that stored pmlen.
> It also simplified all the machinery around it.
>
> Thanks
>
> [v1]:
> Hi all,
>
> It looks like Zjpm v0.8 is almost frozen and we don't expect it to change
> drastically anymore.
> Compared to the original implementation with explicit base and mask CSRs, we
> now only have several fixed options for the number of masked bits, which are
> set using existing CSRs.
> The changes have been tested with handwritten assembly tests and the LLVM
> HWASAN test suite.
>
> Thanks
>
> Alexey Baturo (6):
>   target/riscv: Remove obsolete pointer masking extension code.
>   target/riscv: Add new CSR fields for S{sn,mn,m}pm extensions as part
> of Zjpm v0.8
>   target/riscv: Add helper functions to calculate current number of
> masked bits for pointer masking
>   target/riscv: Add pointer masking tb flags
>   target/riscv: Update address modify functions to take into account
> pointer masking
>   target/riscv: Enable updates for pointer masking variables and thus
> enable pointer masking extension
>
>  target/riscv/cpu.c   |  22 ++-
>  target/riscv/cpu.h   |  46 +++--
>  target/riscv/cpu_bits.h  |  90 +-
>  target/riscv/cpu_cfg.h   |   3 +
>  target/riscv/cpu_helper.c|  97 +-
>  target/riscv/csr.c   | 337 ++-
>  target/riscv/machine.c   |  20 +--
>  target/riscv/pmp.c   |  13 +-
>  target/riscv/pmp.h   |  11 +-
>  target/riscv/tcg/tcg-cpu.c   |   5 +-
>  target/riscv/translate.c |  46 ++---
>  target/riscv/vector_helper.c |  15 +-
>  12 files changed, 158 insertions(+), 547 deletions(-)
>
> --
> 2.34.1
>
>

Re: Disk migration from qcow2 to SPDK

2024-01-31 Thread 陈孚

Can someone help me?

From: 陈孚
Sent: January 30, 2024 10:51
To: 'qemu-bl...@nongnu.org' 
Cc: 'qemu-devel@nongnu.org' 
Subject: Disk migration from qcow2 to SPDK

Hello everyone,

We are looking to switch our VMs' disks from the qemu driver with the 
qcow2 format to a vhost-user-blk driver based on SPDK. Currently, we are able 
to copy data into an SPDK-based target using the blockcopy method. Is it 
theoretically possible to switch the virtio backend from qemu to 
vhost-user-blk online? If so, what would need to be done?

Best regards

Re: [PATCH] tcg: Fixes set const_args[i] wrong value when instructions imm is 0

2024-01-31 Thread gaosong


On 2024/2/1 5:16 AM, Richard Henderson wrote:

On 1/31/24 17:27, Song Gao wrote:

It seems that tcg_reg_alloc_op() sets const_args[i] to the wrong value
when the instruction's immediate is 0. The LoongArch tcg_out_vec_op()
cmp_vec case then uses the wrong const_args[2].
e.g.
    The wrong const_args[2] is 0.
    IN: vslti.w v5, v4, 0x0   OUT: vslt.w  v1, v1, v0

    The right const_args[2] is 1.
    IN: vslti.w v5, v4, 0x0   OUT: vslti.w v1, v1, 0x0

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2136
Signed-off-by: Song Gao 
---
  tcg/tcg.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index e2c38f6d11..5b290123bc 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -4808,7 +4808,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)

  arg_ct = &def->args_ct[i];
  ts = arg_temp(arg);
  -    if (ts->val_type == TEMP_VAL_CONST
+    if ((ts->val_type == TEMP_VAL_CONST || ts->kind == TEMP_CONST)
  && tcg_target_const_match(ts->val, ts->type, arg_ct->ct, TCGOP_VECE(op))) {

  /* constant is OK for instruction */
  const_args[i] = 1;


This is wrong.

I strongly suspect that the TEMP_CONST value 0 has been loaded into a 
register for use in another operation, and the register allocator sees 
that it is still there.



Ah, I'm not familiar with this piece of code; I was just trying to fix the bug.
Thanks for your suggestion.

Thanks.
Song Gao


r~

Re: [External] Re: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work

2024-01-31 Thread Hao Xiang

On Wed, Jan 31, 2024 at 5:19 AM Fabiano Rosas  wrote:
>
> Peter Xu  writes:
>
> > On Mon, Jan 29, 2024 at 09:51:06AM -0300, Fabiano Rosas wrote:
> >> Peter Xu  writes:
> >>
> > >> > On Mon, Jan 29, 2024 at 01:41:01AM +0000, Liu, Yuan1 wrote:
> > >> >> Because this change has an impact on the previous "live migration
> > >> >> with IAA" patch, does the next version need to be based on this
> > >> >> change?
> >> >
> >> > I'd say hold off a little while until we're more certain on the planned
> >> > interface changes, to avoid you rebase your code back and forth; unless
> >> > you're pretty confident that this will be the right approach.
> >> >
> > >> > I apologize for not having looked at any of the QAT/IAA compression / zero
> > >> > detection series posted on the list; I do plan to read them very soon too
> > >> > after Fabiano.  So I may not have a complete picture here yet;
> > >> > please bear with me.
> >> >
> >> > If this series is trying to provide a base ground for all the efforts,
> >> > it'll be great if we can thoroughly discuss here and settle an approach
> >> > soon that will satisfy everyone.
> >>
> >> Just a summary if it helps:
> >>
> >> For compression work (IAA/QPL, QAT) the discussion is around having a
> >> new "compression acceleration" option that enables the accelerators and
> >> is complementary to the existing zlib compression method. We'd choose
> >> those automatically based on availability and we'd make HW accelerated
> >> compression produce a stream that is compatible with QEMU's zlib stream
> >> so we could migrate between solutions.
> >>
> > >> For zero page work and zero page acceleration (DSA), the question is how
> > >> to fit zero page detection into multifd and whether we need a new hook
> > >> multifd_ops->zero_page_detect() (or similar) to allow client code to
> > >> provide its own zero page detection methods. My worry here is that
> > >> teaching multifd to recognize zero pages is one more coupling to the
> > >> "pages" data type. Ideally we'd find a way to include that operation as
> > >> a prepare() responsibility and the client code would deal with it.
> >
> > Thanks Fabiano.

Hi Fabiano,

Your current refactoring assumes that compression ops and multifd
socket ops are mutually exclusive. Both of them need to implement the
entire MultiFDMethods interface. I think this works fine for now. Once
we introduce multifd zero page checking and we add a new interface for
that, we are adding a new method zero_page_detect() on the
MultiFDMethods interface. If we do that, zero_page_detect() needs to
be implemented in multifd_socket_ops and it also needs to be
implemented in zlib and zstd. On top of that, if we add an accelerator
to offload zero_page_detect(), that accelerator configuration can
co-exist with compression or socket. That makes things quite
complicated in my opinion.

Can we create an instance of MultiFDMethods at runtime and fill in each
method depending on the configuration? If a method is not filled in, we
fall back to the default implementation (like what socket.c provides).
For instance, if zstd is enabled and zero page checking uses the CPU,
the interface will be filled with all the functions zstd currently
implements, and since zstd doesn't implement zero_page_detect(), we
fall back to filling zero_page_detect() with the default multifd zero
page checking implementation.
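
A minimal sketch of that idea, to make the fallback rule concrete. The
Methods struct, its two hooks, and all function names below are
hypothetical and heavily trimmed; they only illustrate "start from the
defaults, override what the configuration provides" and are not the
actual MultiFDMethods interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down method table (the real one has more hooks). */
typedef struct Methods {
    int (*send_prepare)(const unsigned char *page, size_t len);
    bool (*zero_page_detect)(const unsigned char *page, size_t len);
} Methods;

/* Default (socket-like) implementations. */
static int default_send_prepare(const unsigned char *page, size_t len)
{
    (void)page;
    (void)len;
    return 0;                   /* 0: page is sent uncompressed */
}

static bool default_zero_page_detect(const unsigned char *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* A compression backend that only overrides send_prepare. */
static int compress_send_prepare(const unsigned char *page, size_t len)
{
    (void)page;
    (void)len;
    return 1;                   /* 1: page is sent compressed */
}

static const Methods default_methods = {
    .send_prepare = default_send_prepare,
    .zero_page_detect = default_zero_page_detect,
};

/* Build the effective table: hooks set in cfg win, the rest use defaults. */
static Methods methods_build(const Methods *cfg)
{
    Methods m = default_methods;

    if (cfg) {
        if (cfg->send_prepare) {
            m.send_prepare = cfg->send_prepare;
        }
        if (cfg->zero_page_detect) {
            m.zero_page_detect = cfg->zero_page_detect;
        }
    }
    return m;
}
```

With a zstd-like configuration that only sets send_prepare, the built table
uses the compression hook and the default zero page check; an accelerator
would only need to set zero_page_detect.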

> >
> > Since I'm preparing the old series to post for some fundamental cleanups
> > around multifd, and when I'm looking around the code, I noticed that
> > _maybe_ it'll also be easier to apply such a series if we can clean up more
> > things and then move towards a clean base to add more accelerators.
> >
> > I agree with many ideas in this series of yours, but I may address them
> > slightly differently (e.g., I want to avoid send(), but you can consider
> > that in the fixed-ram series instead); also it'll come after some other
> > cleanup I plan to take a stab at, which is not yet covered in this series.
> > I hope I can add your "Co-developed-by" in some of the patches there.  If
> > you haven't spent more time on a new version of this series, please wait
> > 1-2 days so I can post my thoughts.
>
> Sure, go ahead.
>

Re: [PATCH v5 6/6] target/riscv: Enable updates for pointer masking variables and thus enable pointer masking extension

2024-01-31 Thread Alistair Francis

On Tue, Jan 30, 2024 at 5:24 AM Alexey Baturo  wrote:
>
> From: Alexey Baturo 
>
> Signed-off-by: Alexey Baturo 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index d8de1f1890..bf431ab728 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -153,6 +153,9 @@ const RISCVIsaExtData isa_edata_arr[] = {
>  ISA_EXT_DATA_ENTRY(svinval, PRIV_VERSION_1_12_0, ext_svinval),
>  ISA_EXT_DATA_ENTRY(svnapot, PRIV_VERSION_1_12_0, ext_svnapot),
>  ISA_EXT_DATA_ENTRY(svpbmt, PRIV_VERSION_1_12_0, ext_svpbmt),
> +ISA_EXT_DATA_ENTRY(ssnpm, PRIV_VERSION_1_12_0, ext_ssnpm),
> +ISA_EXT_DATA_ENTRY(smnpm, PRIV_VERSION_1_12_0, ext_smnpm),
> +ISA_EXT_DATA_ENTRY(smmpm, PRIV_VERSION_1_12_0, ext_smmpm),
>  ISA_EXT_DATA_ENTRY(xtheadba, PRIV_VERSION_1_11_0, ext_xtheadba),
>  ISA_EXT_DATA_ENTRY(xtheadbb, PRIV_VERSION_1_11_0, ext_xtheadbb),
>  ISA_EXT_DATA_ENTRY(xtheadbs, PRIV_VERSION_1_11_0, ext_xtheadbs),
> @@ -1395,6 +1398,12 @@ const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = {
>  MULTI_EXT_CFG_BOOL("x-zvfbfmin", ext_zvfbfmin, false),
>  MULTI_EXT_CFG_BOOL("x-zvfbfwma", ext_zvfbfwma, false),
>
> +/* Zjpm v0.8 extensions */
> +MULTI_EXT_CFG_BOOL("x-ssnpm", ext_ssnpm, false),
> +MULTI_EXT_CFG_BOOL("x-smnpm", ext_smnpm, false),
> +MULTI_EXT_CFG_BOOL("x-smmpm", ext_smmpm, false),
> +
> +
>  DEFINE_PROP_END_OF_LIST(),
>  };
>
> --
> 2.34.1
>
>

Re: [PATCH v5 3/6] target/riscv: Add helper functions to calculate current number of masked bits for pointer masking

2024-01-31 Thread Alistair Francis

On Tue, Jan 30, 2024 at 5:24 AM Alexey Baturo  wrote:
>
> From: Alexey Baturo 
>
> Signed-off-by: Alexey Baturo 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.h|  4 +++
>  target/riscv/cpu_helper.c | 58 +++
>  2 files changed, 62 insertions(+)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index c9bed5c9fc..1c8979c1c8 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -671,6 +671,10 @@ static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
>  void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
>uint64_t *cs_base, uint32_t *pflags);
>
> +bool riscv_cpu_virt_mem_enabled(CPURISCVState *env);
> +RISCVPmPmm riscv_pm_get_pmm(CPURISCVState *env);
> +int riscv_pm_get_pmlen(RISCVPmPmm pmm);
> +
>  RISCVException riscv_csrrw(CPURISCVState *env, int csrno,
> target_ulong *ret_value,
> target_ulong new_value, target_ulong write_mask);
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index a3d477d226..9640e4c2c5 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -139,6 +139,64 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
>  *pflags = flags;
>  }
>
> +RISCVPmPmm riscv_pm_get_pmm(CPURISCVState *env)
> +{
> +int pmm = 0;
> +#ifndef CONFIG_USER_ONLY
> +int priv_mode = cpu_address_mode(env);
> +/* Get current PMM field */
> +switch (priv_mode) {
> +case PRV_M:
> +pmm = riscv_cpu_cfg(env)->ext_smmpm ?
> +  get_field(env->mseccfg, MSECCFG_PMM) : PMM_FIELD_DISABLED;
> +break;
> +case PRV_S:
> +pmm = riscv_cpu_cfg(env)->ext_smnpm ?
> +  get_field(env->menvcfg, MENVCFG_PMM) : PMM_FIELD_DISABLED;
> +break;
> +case PRV_U:
> +pmm = riscv_cpu_cfg(env)->ext_ssnpm ?
> +  get_field(env->senvcfg, SENVCFG_PMM) : PMM_FIELD_DISABLED;
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +#endif
> +return pmm;
> +}
> +
> +bool riscv_cpu_virt_mem_enabled(CPURISCVState *env)
> +{
> +bool virt_mem_en = false;
> +#ifndef CONFIG_USER_ONLY
> +int satp_mode = 0;
> +int priv_mode = cpu_address_mode(env);
> +/* Get current PMM field */
> +if (riscv_cpu_mxl(env) == MXL_RV32) {
> +satp_mode = get_field(env->satp, SATP32_MODE);
> +} else {
> +satp_mode = get_field(env->satp, SATP64_MODE);
> +}
> +virt_mem_en = ((satp_mode != VM_1_10_MBARE) && (priv_mode != PRV_M));
> +#endif
> +return virt_mem_en;
> +}
> +
> +int riscv_pm_get_pmlen(RISCVPmPmm pmm)
> +{
> +switch (pmm) {
> +case PMM_FIELD_DISABLED:
> +return 0;
> +case PMM_FIELD_PMLEN7:
> +return 7;
> +case PMM_FIELD_PMLEN16:
> +return 16;
> +default:
> +g_assert_not_reached();
> +}
> +return -1;
> +}
> +
>  #ifndef CONFIG_USER_ONLY
>
>  /*
> --
> 2.34.1
>
>
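
To illustrate how a PMLEN value such as the 0, 7 or 16 returned by
riscv_pm_get_pmlen() might be applied to a 64-bit address, here is a
small sketch. It is an assumption-laden illustration, not the code from
this series: apply_pointer_mask() is a hypothetical helper, and the choice
between sign extension and zero extension is passed in as a flag rather
than derived from CPU state.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: ignore the top 'pmlen' bits of a 64-bit address.
 * The ignored bits are replaced either by copies of the highest kept bit
 * (sign_extend == true) or by zeros (sign_extend == false).
 * Assumes 0 <= pmlen < 64.
 */
static uint64_t apply_pointer_mask(uint64_t addr, unsigned pmlen,
                                   bool sign_extend)
{
    if (pmlen == 0) {
        return addr;                        /* masking disabled */
    }

    unsigned keep = 64 - pmlen;             /* number of low bits kept */
    uint64_t low = addr & ((UINT64_C(1) << keep) - 1);

    if (sign_extend && ((low >> (keep - 1)) & 1)) {
        low |= ~UINT64_C(0) << keep;        /* replicate the top kept bit */
    }
    return low;
}
```

For example, with pmlen = 16 and zero extension, a tagged pointer
0x1234000000005678 becomes 0x0000000000005678.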

Re: [PATCH v4 7/8] STM32L4x5: Use the RCC Sysclk

2024-01-31 Thread Alistair Francis

On Wed, Jan 31, 2024 at 3:21 AM Arnaud Minier
 wrote:
>
> Now that we can generate reliable clock frequencies from the RCC, remove
> the hacky definition of the sysclk in the b_l475e_iot01a initialisation
> code and use the correct RCC clock.
>
> Signed-off-by: Arnaud Minier 
> Signed-off-by: Inès Varhol 

Acked-by: Alistair Francis 

Alistair

> ---
>  hw/arm/b-l475e-iot01a.c| 10 +-
>  hw/arm/stm32l4x5_soc.c | 33 -
>  include/hw/arm/stm32l4x5_soc.h |  3 ---
>  3 files changed, 5 insertions(+), 41 deletions(-)
>
> diff --git a/hw/arm/b-l475e-iot01a.c b/hw/arm/b-l475e-iot01a.c
> index 6ecde2db15..d862aa43fc 100644
> --- a/hw/arm/b-l475e-iot01a.c
> +++ b/hw/arm/b-l475e-iot01a.c
> @@ -26,27 +26,19 @@
>  #include "qapi/error.h"
>  #include "hw/boards.h"
>  #include "hw/qdev-properties.h"
> -#include "hw/qdev-clock.h"
>  #include "qemu/error-report.h"
>  #include "hw/arm/stm32l4x5_soc.h"
>  #include "hw/arm/boot.h"
>
> -/* Main SYSCLK frequency in Hz (80MHz) */
> -#define MAIN_SYSCLK_FREQ_HZ 80000000ULL
> +/* B-L475E-IOT01A implementation is derived from netduinoplus2 */
>
>  static void b_l475e_iot01a_init(MachineState *machine)
>  {
>  const Stm32l4x5SocClass *sc;
>  DeviceState *dev;
> -Clock *sysclk;
> -
> -/* This clock doesn't need migration because it is fixed-frequency */
> -sysclk = clock_new(OBJECT(machine), "SYSCLK");
> -clock_set_hz(sysclk, MAIN_SYSCLK_FREQ_HZ);
>
>  dev = qdev_new(TYPE_STM32L4X5XG_SOC);
>  object_property_add_child(OBJECT(machine), "soc", OBJECT(dev));
> -qdev_connect_clock_in(dev, "sysclk", sysclk);
>  sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
>
>  sc = STM32L4X5_SOC_GET_CLASS(dev);
> diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c
> index d5c04b446d..347a5377e5 100644
> --- a/hw/arm/stm32l4x5_soc.c
> +++ b/hw/arm/stm32l4x5_soc.c
> @@ -85,9 +85,6 @@ static void stm32l4x5_soc_initfn(Object *obj)
>  object_initialize_child(obj, "exti", &s->exti, TYPE_STM32L4X5_EXTI);
>  object_initialize_child(obj, "syscfg", &s->syscfg, TYPE_STM32L4X5_SYSCFG);
>  object_initialize_child(obj, "rcc", &s->rcc, TYPE_STM32L4X5_RCC);
> -
> -s->sysclk = qdev_init_clock_in(DEVICE(s), "sysclk", NULL, NULL, 0);
> -s->refclk = qdev_init_clock_in(DEVICE(s), "refclk", NULL, NULL, 0);
>  }
>
>  static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
> @@ -99,30 +96,6 @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
>  DeviceState *armv7m;
>  SysBusDevice *busdev;
>
> -/*
> - * We use s->refclk internally and only define it with qdev_init_clock_in()
> - * so it is correctly parented and not leaked on an init/deinit; it is not
> - * intended as an externally exposed clock.
> - */
> -if (clock_has_source(s->refclk)) {
> -error_setg(errp, "refclk clock must not be wired up by the board code");
> -return;
> -}
> -
> -if (!clock_has_source(s->sysclk)) {
> -error_setg(errp, "sysclk clock must be wired up by the board code");
> -return;
> -}
> -
> -/*
> - * TODO: ideally we should model the SoC RCC and its ability to
> - * change the sysclk frequency and define different sysclk sources.
> - */
> -
> -/* The refclk always runs at frequency HCLK / 8 */
> -clock_set_mul_div(s->refclk, 8, 1);
> -clock_set_source(s->refclk, s->sysclk);
> -
>  if (!memory_region_init_rom(&s->flash, OBJECT(dev_soc), "flash",
>  sc->flash_size, errp)) {
>  return;
> @@ -152,8 +125,10 @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
>  qdev_prop_set_uint32(armv7m, "num-prio-bits", 4);
>  qdev_prop_set_string(armv7m, "cpu-type", ARM_CPU_TYPE_NAME("cortex-m4"));
>  qdev_prop_set_bit(armv7m, "enable-bitband", true);
> -qdev_connect_clock_in(armv7m, "cpuclk", s->sysclk);
> -qdev_connect_clock_in(armv7m, "refclk", s->refclk);
> +qdev_connect_clock_in(armv7m, "cpuclk",
> +qdev_get_clock_out(DEVICE(&(s->rcc)), "cortex-fclk-out"));
> +qdev_connect_clock_in(armv7m, "refclk",
> +qdev_get_clock_out(DEVICE(&(s->rcc)), "cortex-refclk-out"));
>  object_property_set_link(OBJECT(&s->armv7m), "memory",
>   OBJECT(system_memory), &error_abort);
>  if (!sysbus_realize(SYS_BUS_DEVICE(&s->armv7m), errp)) {
> diff --git a/include/hw/arm/stm32l4x5_soc.h b/include/hw/arm/stm32l4x5_soc.h
> index e480fcc976..1f71298b45 100644
> --- a/include/hw/arm/stm32l4x5_soc.h
> +++ b/include/hw/arm/stm32l4x5_soc.h
> @@ -50,9 +50,6 @@ struct Stm32l4x5SocState {
>  MemoryRegion sram2;
>  MemoryRegion flash;
>  MemoryRegion flash_alias;
> -
> -Clock *sysclk;
> -Clock *refclk;
>  };
>
>  struct Stm32l4x5SocClass {
> --
> 2.34.1
>
>

Re: [PATCH v4 3/8] Add an internal PLL Clock object

2024-01-31 Thread Alistair Francis

On Wed, Jan 31, 2024 at 2:09 AM Arnaud Minier
 wrote:
>
> This object represents the PLLs and their channels. The PLLs allow for
> more fine-grained control of the clock frequencies.
>
> I wasn't sure how to handle the reset and the migration, so I used the
> same approach as the BCM2835 CPRMAN.
>
> Signed-off-by: Arnaud Minier 
> Signed-off-by: Inès Varhol 
> ---
>  hw/misc/stm32l4x5_rcc.c   | 175 ++
>  hw/misc/trace-events  |   5 +
>  include/hw/misc/stm32l4x5_rcc.h   |  40 +
>  include/hw/misc/stm32l4x5_rcc_internals.h |  22 +++
>  4 files changed, 242 insertions(+)
>
> diff --git a/hw/misc/stm32l4x5_rcc.c b/hw/misc/stm32l4x5_rcc.c
> index ed10832f88..fb0233c3e9 100644
> --- a/hw/misc/stm32l4x5_rcc.c
> +++ b/hw/misc/stm32l4x5_rcc.c
> @@ -162,6 +162,156 @@ static void clock_mux_set_source(RccClockMuxState *mux, RccClockMuxSource src)
>  clock_mux_update(mux);
>  }
>
> +static void pll_update(RccPllState *pll)
> +{
> +uint64_t vco_freq, old_channel_freq, channel_freq;
> +int i;
> +
> +/* The common PLLM factor is handled by the PLL mux */
> +vco_freq = muldiv64(clock_get_hz(pll->in), pll->vco_multiplier, 1);
> +
> +for (i = 0; i < RCC_NUM_CHANNEL_PLL_OUT; i++) {
> +if (!pll->channel_exists[i]) {
> +continue;
> +}
> +
> +old_channel_freq = clock_get_hz(pll->channels[i]);
> +if (!pll->enabled ||
> +!pll->channel_enabled[i] ||
> +!pll->channel_divider[i]) {
> +channel_freq = 0;
> +} else {
> +channel_freq = muldiv64(vco_freq,
> +1,
> +pll->channel_divider[i]);
> +}
> +
> +/* No change, early continue to avoid log spam and useless propagation */
> +if (old_channel_freq == channel_freq) {
> +continue;
> +}
> +
> +clock_update_hz(pll->channels[i], channel_freq);
> +trace_stm32l4x5_rcc_pll_update(pll->id, i, vco_freq,
> +old_channel_freq, channel_freq);
> +}
> +}
> +
> +static void pll_src_update(void *opaque, ClockEvent event)
> +{
> +RccPllState *s = opaque;
> +pll_update(s);
> +}
> +
> +static void pll_init(Object *obj)
> +{
> +RccPllState *s = RCC_PLL(obj);
> +size_t i;
> +
> +s->in = qdev_init_clock_in(DEVICE(s), "in",
> +   pll_src_update, s, ClockUpdate);
> +
> +const char *names[] = {
> +"out-p", "out-q", "out-r",
> +};
> +
> +for (i = 0; i < RCC_NUM_CHANNEL_PLL_OUT; i++) {
> +s->channels[i] = qdev_init_clock_out(DEVICE(s), names[i]);
> +}
> +}
> +
> +static void pll_reset_hold(Object *obj)
> +{ }
> +
> +static const VMStateDescription pll_vmstate = {
> +.name = TYPE_RCC_PLL,
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT32(id, RccPllState),
> +VMSTATE_CLOCK(in, RccPllState),
> +VMSTATE_ARRAY_CLOCK(channels, RccPllState,
> +RCC_NUM_CHANNEL_PLL_OUT),
> +VMSTATE_BOOL(enabled, RccPllState),
> +VMSTATE_UINT32(vco_multiplier, RccPllState),
> +VMSTATE_BOOL_ARRAY(channel_enabled, RccPllState, RCC_NUM_CHANNEL_PLL_OUT),
> +VMSTATE_BOOL_ARRAY(channel_exists, RccPllState, RCC_NUM_CHANNEL_PLL_OUT),
> +VMSTATE_UINT32_ARRAY(channel_divider, RccPllState, RCC_NUM_CHANNEL_PLL_OUT),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
> +static void pll_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +ResettableClass *rc = RESETTABLE_CLASS(klass);
> +
> +rc->phases.hold = pll_reset_hold;
> +dc->vmsd = &pll_vmstate;
> +}
> +
> +static void pll_set_vco_multiplier(RccPllState *pll, uint32_t vco_multiplier)
> +{
> +if (pll->vco_multiplier == vco_multiplier) {
> +return;
> +}
> +
> +if (vco_multiplier < 8 || vco_multiplier > 86) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +"%s: VCO multiplier is out of bound (%u) for PLL %u\n",
> +__func__, vco_multiplier, pll->id);

Should we bail out with an invalid value?

Alistair
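
For illustration, one possible shape of "bailing out" is sketched below. It
is a simplified, hypothetical model (a single output channel, a plain Pll
struct, no QEMU clock API), and whether real hardware ignores or applies an
out-of-range value is not established in this thread. The 8..86 range is
the one checked in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-channel PLL model, not the RccPllState from the patch. */
typedef struct Pll {
    uint64_t in_hz;             /* input clock frequency */
    uint32_t vco_multiplier;    /* valid range assumed to be 8..86 */
    uint32_t divider;           /* output channel divider, 0 means "off" */
    bool enabled;
    uint64_t out_hz;            /* resulting channel frequency */
} Pll;

/* out = in * multiplier / divider, or 0 when disabled or divider is 0. */
static void pll_update(Pll *pll)
{
    uint64_t vco_hz = pll->in_hz * pll->vco_multiplier;

    pll->out_hz = (pll->enabled && pll->divider) ? vco_hz / pll->divider : 0;
}

/* Returns false and leaves the PLL untouched when the value is invalid. */
static bool pll_set_vco_multiplier(Pll *pll, uint32_t vco_multiplier)
{
    if (vco_multiplier < 8 || vco_multiplier > 86) {
        return false;           /* bail out: keep the previous multiplier */
    }
    if (pll->vco_multiplier == vco_multiplier) {
        return true;            /* nothing to do */
    }
    pll->vco_multiplier = vco_multiplier;
    pll_update(pll);
    return true;
}
```

With a 4 MHz input, a multiplier of 40 and a divider of 2, the channel runs
at 80 MHz; a later attempt to set the multiplier to 100 is rejected and the
channel frequency stays at 80 MHz.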

> +}
> +
> +trace_stm32l4x5_rcc_pll_set_vco_multiplier(pll->id,
> +pll->vco_multiplier, vco_multiplier);
> +
> +pll->vco_multiplier = vco_multiplier;
> +pll_update(pll);
> +}
> +
> +static void pll_set_enable(RccPllState *pll, bool enabled)
> +{
> +if (pll->enabled == enabled) {
> +return;
> +}
> +
> +pll->enabled = enabled;
> +pll_update(pll);
> +}
> +
> +static void pll_set_channel_enable(RccPllState *pll,
> +   PllCommonChannels channel,
> +   bool enabled)
> +{
> +if (pll->channel_enabled[channel] == enabled) {
> +return;
> +}
> +
> +if (enabled) {
> +

Re: Call for GSoC/Outreachy internship project ideas

2024-01-31 Thread Stefan Hajnoczi

On Wed, Jan 31, 2024, 18:55 Gurchetan Singh 
wrote:

>
>
> On Wed, Jan 24, 2024 at 4:51 AM Stefan Hajnoczi 
> wrote:
>
>> On Tue, 23 Jan 2024 at 22:47, Gurchetan Singh
>>  wrote:
>> > Title:
>> > - Improve display integration for upstream virtualized graphics
>> >
>> > Summary:
>> > - The Rutabaga Virtual Graphics interface's UI integration upstream is
>> > very simple, but in deployment it will be complex.  This project aims to
>> > bridge the gap between downstream consumers and upstream QEMU.
>> >
>> > Looking for someone interested in Rust + system level graphics to help
>> > realize the next steps.
>>
>> Hi Gurchetan,
>> It's unclear what this project idea entails.
>>
>> Based on your email my guess is you're looking for someone to help
>> upstream code into QEMU, but I'm not sure. Last year there was a
>> project to upstream bsd-user emulation code into QEMU and I think that
>> type of project can work well.
>>
>> Or maybe you're looking for someone to write a QEMU UI code that uses
>> rutabaga_gfx.
>>
>> Can you describe the next steps in more detail?
>>
>> The project description should contain enough information for someone
>> who knows how to program but has no domain knowledge in Rutabaga,
>> virtio-gpu, or QEMU.
>>
>> > Note: developers should be willing to sign Google CLA, here:
>> >
>> > https://cla.developers.google.com/about/google-individual
>> >
>> > But everything will be FOSS.
>>
>> Which codebase will this project touch? If a CLA is required then it
>> sounds like it's not qemu.git?
>>
>
> Good points. I think we need to think about this a bit more. I hereby
> withdraw the idea!
>

The project doesn't necessarily need to be in qemu.git. Just something that
is related to QEMU. Past projects have included rust-vmm crates, libnbd,
libblkio, etc.

If the code is used in conjunction with a QEMU guest, then it is probably
in scope.

We can discuss further if you like.

Stefan


>
>>
>> > Links
>> > - https://crosvm.dev/book/appendix/rutabaga_gfx.html
>> > -
>> https://patchew.org/QEMU/20230421011223.718-1-gurchetansi...@chromium.org/
>> >
>> > Skills
>> >  - Level: Advanced
>> >  - Rust, Vulkan, virtualization, cross-platform graphics
>>
>

Re: [PATCH v4 1/8] Implement STM32L4x5_RCC skeleton

2024-01-31 Thread Alistair Francis

On Wed, Jan 31, 2024 at 2:09 AM Arnaud Minier
 wrote:
>
> Add the necessary files to add a simple RCC implementation with just
> reads from and writes to registers. Also instantiate the RCC in the
> STM32L4x5_SoC. It is needed for accurate emulation of all the SoC
> clocks and timers.
>
> Signed-off-by: Arnaud Minier 
> Signed-off-by: Inès Varhol 

Acked-by: Alistair Francis 

Alistair

> ---
>  MAINTAINERS   |   5 +-
>  docs/system/arm/b-l475e-iot01a.rst|   2 +-
>  hw/arm/Kconfig|   1 +
>  hw/arm/stm32l4x5_soc.c|  12 +-
>  hw/misc/Kconfig   |   3 +
>  hw/misc/meson.build   |   1 +
>  hw/misc/stm32l4x5_rcc.c   | 433 ++
>  hw/misc/trace-events  |   4 +
>  include/hw/arm/stm32l4x5_soc.h|   2 +
>  include/hw/misc/stm32l4x5_rcc.h   |  80 
>  include/hw/misc/stm32l4x5_rcc_internals.h | 286 ++
>  11 files changed, 826 insertions(+), 3 deletions(-)
>  create mode 100644 hw/misc/stm32l4x5_rcc.c
>  create mode 100644 include/hw/misc/stm32l4x5_rcc.h
>  create mode 100644 include/hw/misc/stm32l4x5_rcc_internals.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dfaca8323e..50ab2982bb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1128,7 +1128,10 @@ M: Inès Varhol 
>  L: qemu-...@nongnu.org
>  S: Maintained
>  F: hw/arm/stm32l4x5_soc.c
> -F: include/hw/arm/stm32l4x5_soc.h
> +F: hw/misc/stm32l4x5_exti.c
> +F: hw/misc/stm32l4x5_syscfg.c
> +F: hw/misc/stm32l4x5_rcc.c
> +F: include/hw/*/stm32l4x5_*.h
>
>  B-L475E-IOT01A IoT Node
>  M: Arnaud Minier 
> diff --git a/docs/system/arm/b-l475e-iot01a.rst 
> b/docs/system/arm/b-l475e-iot01a.rst
> index 1a021b306a..b857a56ca4 100644
> --- a/docs/system/arm/b-l475e-iot01a.rst
> +++ b/docs/system/arm/b-l475e-iot01a.rst
> @@ -17,13 +17,13 @@ Currently B-L475E-IOT01A machine's only supports the 
> following devices:
>  - Cortex-M4F based STM32L4x5 SoC
>  - STM32L4x5 EXTI (Extended interrupts and events controller)
>  - STM32L4x5 SYSCFG (System configuration controller)
> +- STM32L4x5 RCC (Reset and clock control)
>
>  Missing devices
>  """
>
>  The B-L475E-IOT01A does *not* support the following devices:
>
> -- Reset and clock control (RCC)
>  - Serial ports (UART)
>  - General-purpose I/Os (GPIO)
>  - Analog to Digital Converter (ADC)
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index f927878152..92b72d56dc 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -465,6 +465,7 @@ config STM32L4X5_SOC
>  select OR_IRQ
>  select STM32L4X5_SYSCFG
>  select STM32L4X5_EXTI
> +select STM32L4X5_RCC
>
>  config XLNX_ZYNQMP_ARM
>  bool
> diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c
> index f470ff74ec..d5c04b446d 100644
> --- a/hw/arm/stm32l4x5_soc.c
> +++ b/hw/arm/stm32l4x5_soc.c
> @@ -75,6 +75,8 @@ static const int exti_irq[NUM_EXTI_IRQ] = {
>  1,  /* PVM4 wakeup */
>  78  /* LCD wakeup, Direct  */
>  };
> +#define RCC_BASE_ADDRESS 0x40021000
> +#define RCC_IRQ 5
>
>  static void stm32l4x5_soc_initfn(Object *obj)
>  {
> @@ -82,6 +84,7 @@ static void stm32l4x5_soc_initfn(Object *obj)
>
>  object_initialize_child(obj, "exti", &s->exti, TYPE_STM32L4X5_EXTI);
>  object_initialize_child(obj, "syscfg", &s->syscfg, 
> TYPE_STM32L4X5_SYSCFG);
> +object_initialize_child(obj, "rcc", &s->rcc, TYPE_STM32L4X5_RCC);
>
>  s->sysclk = qdev_init_clock_in(DEVICE(s), "sysclk", NULL, NULL, 0);
>  s->refclk = qdev_init_clock_in(DEVICE(s), "refclk", NULL, NULL, 0);
> @@ -184,6 +187,14 @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, 
> Error **errp)
>qdev_get_gpio_in(DEVICE(&s->exti), i));
>  }
>
> +/* RCC device */
> +busdev = SYS_BUS_DEVICE(&s->rcc);
> +if (!sysbus_realize(busdev, errp)) {
> +return;
> +}
> +sysbus_mmio_map(busdev, 0, RCC_BASE_ADDRESS);
> +sysbus_connect_irq(busdev, 0, qdev_get_gpio_in(armv7m, RCC_IRQ));
> +
>  /* APB1 BUS */
>  create_unimplemented_device("TIM2",  0x40000000, 0x400);
>  create_unimplemented_device("TIM3",  0x40000400, 0x400);
> @@ -246,7 +257,6 @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, 
> Error **errp)
>  create_unimplemented_device("DMA1",  0x40020000, 0x400);
>  create_unimplemented_device("DMA2",  0x40020400, 0x400);
>  /* RESERVED:0x40020800, 0x800 */
> -create_unimplemented_device("RCC",   0x40021000, 0x400);
>  /* RESERVED:0x40021400, 0xC00 */
>  create_unimplemented_device("FLASH", 0x40022000, 0x400);
>  /* RESERVED:0x40022400, 0xC00 */
> diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
> index 4fc6b29b43..727386fa4b 100644
> --- a/hw/misc/Kconfig
> +++ b/hw/misc/Kconfig
> @@ -93,6 +93,9 @@ config STM32L4X5_EXTI
>  config STM32L4X5_SYSCFG
>  bool
>
>

Re: Call for GSoC/Outreachy internship project ideas

2024-01-31 Thread Gurchetan Singh

On Wed, Jan 24, 2024 at 4:51 AM Stefan Hajnoczi  wrote:

> On Tue, 23 Jan 2024 at 22:47, Gurchetan Singh
>  wrote:
> > Title:
> > - Improve display integration for upstream virtualized graphics
> >
> > Summary:
> > - The Rutabaga Virtual Graphics interface's UI integration upstream is
> very simple, but in deployment it will be complex.  This project aims to
> bridge the gap between downstream consumers and upstream QEMU.
> >
> > Looking for someone interested in Rust + system level graphics to help
> realize the next steps.
>
> Hi Gurchetan,
> It's unclear what this project idea entails.
>
> Based on your email my guess is you're looking for someone to help
> upstream code into QEMU, but I'm not sure. Last year there was a
> project to upstream bsd-user emulation code into QEMU and I think that
> type of project can work well.
>
> Or maybe you're looking for someone to write QEMU UI code that uses
> rutabaga_gfx.
>
> Can you describe the next steps in more detail?
>
> The project description should contain enough information for someone
> who knows how to program but has no domain knowledge in Rutabaga,
> virtio-gpu, or QEMU.
>
> > Note: developers should be willing to sign Google CLA, here:
> >
> > https://cla.developers.google.com/about/google-individual
> >
> > But everything will be FOSS.
>
> Which codebase will this project touch? If a CLA is required then it
> sounds like it's not qemu.git?
>

Good points. I think we need to think about this a bit more. I hereby
withdraw the idea!


>
> > Links
> > - https://crosvm.dev/book/appendix/rutabaga_gfx.html
> > -
> https://patchew.org/QEMU/20230421011223.718-1-gurchetansi...@chromium.org/
> >
> > Skills
> >  - Level: Advanced
> >  - Rust, Vulkan, virtualization, cross-platform graphics
>

Re: [PATCH 03/10] disas/riscv: Clean up includes

2024-01-31 Thread Alistair Francis

On Fri, Jan 26, 2024 at 4:04 AM Peter Maydell  wrote:
>
> This commit was created with scripts/clean-includes:
>  ./scripts/clean-includes --git disas/riscv disas/riscv*[ch]
>
> All .c should include qemu/osdep.h first.  The script performs three
> related cleanups:
>
> * Ensure .c files include qemu/osdep.h first.
> * Including it in a .h is redundant, since the .c  already includes
>   it.  Drop such inclusions.
> * Likewise, including headers qemu/osdep.h includes is redundant.
>   Drop these, too.
>
> Signed-off-by: Peter Maydell 

Reviewed-by: Alistair Francis 

Alistair
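
For illustration, the first two cleanups listed in the commit message can be
sketched as a small check. This is only a sketch under stated assumptions: it
uses a simplified regular expression for #include lines, and first_include(),
c_file_ok() and header_ok() are hypothetical names that are not part of
scripts/clean-includes.

```python
import re
from typing import Optional

# Simplified matcher for '#include "x"' and '#include <x>' lines.
# Assumption: comments and preprocessor conditionals are not handled.
_INCLUDE_RE = re.compile(
    r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"\n]+)[>"]', re.MULTILINE
)


def first_include(source: str) -> Optional[str]:
    """Return the first included header name, or None if there is none."""
    match = _INCLUDE_RE.search(source)
    return match.group(1) if match else None


def c_file_ok(source: str) -> bool:
    """Hypothetical check: a .c file includes qemu/osdep.h first."""
    return first_include(source) == "qemu/osdep.h"


def header_ok(source: str) -> bool:
    """Hypothetical check: a .h file does not include qemu/osdep.h."""
    return "qemu/osdep.h" not in _INCLUDE_RE.findall(source)
```

Applied to the hunks quoted below, c_file_ok() would reject the old
disas/riscv-xthead.c and accept the patched one, while header_ok() would
reject the old disas/riscv.h.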

> ---
>  disas/riscv.h  | 1 -
>  disas/riscv-xthead.c   | 1 +
>  disas/riscv-xventana.c | 1 +
>  3 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/disas/riscv.h b/disas/riscv.h
> index 19e5ed2ce63..16a08e4895c 100644
> --- a/disas/riscv.h
> +++ b/disas/riscv.h
> @@ -7,7 +7,6 @@
>  #ifndef DISAS_RISCV_H
>  #define DISAS_RISCV_H
>
> -#include "qemu/osdep.h"
>  #include "target/riscv/cpu_cfg.h"
>
>  /* types */
> diff --git a/disas/riscv-xthead.c b/disas/riscv-xthead.c
> index 99da679d16c..fcca326d1c3 100644
> --- a/disas/riscv-xthead.c
> +++ b/disas/riscv-xthead.c
> @@ -4,6 +4,7 @@
>   * SPDX-License-Identifier: GPL-2.0-or-later
>   */
>
> +#include "qemu/osdep.h"
>  #include "disas/riscv.h"
>  #include "disas/riscv-xthead.h"
>
> diff --git a/disas/riscv-xventana.c b/disas/riscv-xventana.c
> index a0224d1fb31..cd694f15f32 100644
> --- a/disas/riscv-xventana.c
> +++ b/disas/riscv-xventana.c
> @@ -4,6 +4,7 @@
>   * SPDX-License-Identifier: GPL-2.0-or-later
>   */
>
> +#include "qemu/osdep.h"
>  #include "disas/riscv.h"
>  #include "disas/riscv-xventana.h"
>
> --
> 2.34.1
>
>

[PATCH v3 3/4] target/s390x: implement CVB, CVBY and CVBG

2024-01-31 Thread Ilya Leoshkevich

From: Pavel Zbitskiy 

Convert to Binary - counterparts of the already implemented Convert
to Decimal (CVD*) instructions.
Example from the Principles of Operation: 25594C becomes 63FA.

[iii: Use separate functions for CVB and CVBG for simplicity].

Signed-off-by: Pavel Zbitskiy 
---
 target/s390x/helper.h|  1 +
 target/s390x/tcg/insn-data.h.inc |  4 
 target/s390x/tcg/int_helper.c| 40 
 target/s390x/tcg/translate.c | 12 ++
 4 files changed, 57 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 332a9a9c632..3c607f4e437 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -88,6 +88,7 @@ DEF_HELPER_FLAGS_3(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i128, 
i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqxb, TCG_CALL_NO_WG, i128, env, i128)
+DEF_HELPER_FLAGS_3(cvb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
 DEF_HELPER_FLAGS_1(cvdg, TCG_CALL_NO_RWG_SE, i128, s64)
 DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 388dcb8dbbc..9eb998d4c25 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -293,6 +293,10 @@
 D(0xec73, CLFIT,   RIE_a, GIE, r1_32u, i2_16u, 0, 0, ct, 0, 1)
 D(0xec71, CLGIT,   RIE_a, GIE, r1_o, i2_16u, 0, 0, ct, 0, 1)
 
+/* CONVERT TO BINARY */
+C(0x4f00, CVB, RX_a,  Z,   la2, 0, new, r1_32, cvb, 0)
+C(0xe306, CVBY,RXY_a, LD,  la2, 0, new, r1_32, cvb, 0)
+C(0xe30e, CVBG,RXY_a, Z,   la2, 0, r1, 0, cvbg, 0)
 /* CONVERT TO DECIMAL */
 C(0x4e00, CVD, RX_a,  Z,   r1_o, a2, 0, 0, cvd, 0)
 C(0xe326, CVDY,RXY_a, LD,  r1_o, a2, 0, 0, cvd, 0)
diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
index 121e3006a65..002d4b52dda 100644
--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -25,6 +25,7 @@
 #include "exec/exec-all.h"
 #include "qemu/host-utils.h"
 #include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
 
 /* #define DEBUG_HELPER */
 #ifdef DEBUG_HELPER
@@ -98,6 +99,45 @@ Int128 HELPER(divu64)(CPUS390XState *env, uint64_t ah, 
uint64_t al, uint64_t b)
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
+uint64_t HELPER(cvb)(CPUS390XState *env, uint64_t src, uint32_t n)
+{
+int64_t dec, sign = 0, digit, val = 0, pow10 = 0;
+const uintptr_t ra = GETPC();
+uint64_t tmpsrc;
+int i, j;
+
+for (i = 0; i < n; i++) {
+tmpsrc = wrap_address(env, src + (n - i - 1) * 8);
+dec = cpu_ldq_data_ra(env, tmpsrc, ra);
+for (j = 0; j < 16; j++, dec >>= 4) {
+if (i == 0 && j == 0) {
+sign = dec & 0xf;
+if (sign < 0xa) {
+tcg_s390_data_exception(env, 0, ra);
+}
+continue;
+}
+digit = dec & 0xf;
+if (digit > 0x9) {
+tcg_s390_data_exception(env, 0, ra);
+}
+if (i == 0 && j == 1) {
+if (sign == 0xb || sign == 0xd) {
+val = -digit;
+pow10 = -10;
+} else {
+val = digit;
+pow10 = 10;
+}
+} else {
+val += digit * pow10;
+pow10 *= 10;
+}
+}
+}
+return val;
+}
+
 uint64_t HELPER(cvd)(int32_t reg)
 {
 /* positive 0 */
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index c2fdc920a50..43216571b44 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2223,6 +2223,18 @@ static DisasJumpType op_csp(DisasContext *s, DisasOps *o)
 }
 #endif
 
+static DisasJumpType op_cvb(DisasContext *s, DisasOps *o)
+{
+gen_helper_cvb(o->out, tcg_env, o->addr1, tcg_constant_i32(1));
+return DISAS_NEXT;
+}
+
+static DisasJumpType op_cvbg(DisasContext *s, DisasOps *o)
+{
+gen_helper_cvb(o->out, tcg_env, o->addr1, tcg_constant_i32(2));
+return DISAS_NEXT;
+}
+
 static DisasJumpType op_cvd(DisasContext *s, DisasOps *o)
 {
 TCGv_i64 t1 = tcg_temp_new_i64();
-- 
2.43.0
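
As a cross-check of the conversion described in the commit message, here is a
minimal reference model in Python. It is a sketch, not the QEMU helper:
packed_to_int() is a hypothetical name, the operand is passed as one integer
instead of being loaded from guest memory, and the overflow case (a value that
does not fit the target register) is not modelled.

```python
def packed_to_int(packed: int) -> int:
    """Convert a packed-decimal value to a signed integer.

    The lowest nibble is the sign code (0xA-0xF); every other nibble is
    one decimal digit (0-9), least significant digit first.
    """
    sign = packed & 0xF
    if sign < 0xA:
        raise ValueError("invalid sign code")

    value = 0
    scale = 1
    rest = packed >> 4
    while rest:
        digit = rest & 0xF
        if digit > 9:
            raise ValueError("invalid digit code")
        value += digit * scale
        scale *= 10
        rest >>= 4

    # Sign codes 0xB and 0xD mean minus; the others mean plus.
    return -value if sign in (0xB, 0xD) else value
```

With this model, the Principles of Operation example reads
packed_to_int(0x25594C) == 0x63FA.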

[PATCH v3 1/4] target/s390x: Emulate CVDG

2024-01-31 Thread Ilya Leoshkevich

CVDG is the same as CVD, except that it converts 64 bits into 128,
rather than 32 into 64. Create a new helper, which uses Int128
wrappers.

Reported-by: Ido Plat 
Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
---
 target/s390x/helper.h|  1 +
 target/s390x/tcg/insn-data.h.inc |  1 +
 target/s390x/tcg/int_helper.c| 21 +
 target/s390x/tcg/translate.c |  8 
 4 files changed, 31 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 05102578fc9..332a9a9c632 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -89,6 +89,7 @@ DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqxb, TCG_CALL_NO_WG, i128, env, i128)
 DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
+DEF_HELPER_FLAGS_1(cvdg, TCG_CALL_NO_RWG_SE, i128, s64)
 DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(pka, TCG_CALL_NO_WG, void, env, i64, i64, i32)
 DEF_HELPER_FLAGS_4(pku, TCG_CALL_NO_WG, void, env, i64, i64, i32)
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 2f07f39d9cb..388dcb8dbbc 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -296,6 +296,7 @@
 /* CONVERT TO DECIMAL */
 C(0x4e00, CVD, RX_a,  Z,   r1_o, a2, 0, 0, cvd, 0)
 C(0xe326, CVDY,RXY_a, LD,  r1_o, a2, 0, 0, cvd, 0)
+C(0xe32e, CVDG,RXY_a, Z,   r1_o, a2, 0, 0, cvdg, 0)
 /* CONVERT TO FIXED */
 F(0xb398, CFEBR,   RRF_e, Z,   0, e2, new, r1_32, cfeb, 0, IF_BFP)
 F(0xb399, CFDBR,   RRF_e, Z,   0, f2, new, r1_32, cfdb, 0, IF_BFP)
diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
index eb8e6dd1b57..121e3006a65 100644
--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -118,6 +118,27 @@ uint64_t HELPER(cvd)(int32_t reg)
 return dec;
 }
 
+Int128 HELPER(cvdg)(int64_t reg)
+{
+/* positive 0 */
+Int128 dec = int128_make64(0x0c);
+Int128 bin = int128_makes64(reg);
+Int128 base = int128_make64(10);
+int shift;
+
+if (!int128_nonneg(bin)) {
+bin = int128_neg(bin);
+dec = int128_make64(0x0d);
+}
+
+for (shift = 4; (shift < 128) && int128_nz(bin); shift += 4) {
+dec = int128_or(dec, int128_lshift(int128_remu(bin, base), shift));
+bin = int128_divu(bin, base);
+}
+
+return dec;
+}
+
 uint64_t HELPER(popcnt)(uint64_t val)
 {
 /* Note that we don't fold past bytes. */
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index a5fd9cccaa5..c2fdc920a50 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2233,6 +2233,14 @@ static DisasJumpType op_cvd(DisasContext *s, DisasOps *o)
 return DISAS_NEXT;
 }
 
+static DisasJumpType op_cvdg(DisasContext *s, DisasOps *o)
+{
+TCGv_i128 t = tcg_temp_new_i128();
+gen_helper_cvdg(t, o->in1);
+tcg_gen_qemu_st_i128(t, o->in2, get_mem_index(s), MO_TE | MO_128);
+return DISAS_NEXT;
+}
+
 static DisasJumpType op_ct(DisasContext *s, DisasOps *o)
 {
 int m3 = get_field(s, m3);
-- 
2.43.0

[PATCH v3 4/4] tests/tcg/s390x: Test CONVERT TO BINARY

2024-01-31 Thread Ilya Leoshkevich

Check the CVB's and CVBG's corner cases.

Co-developed-by: Pavel Zbitskiy 
Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/s390x/Makefile.target |  1 +
 tests/tcg/s390x/cvb.c   | 47 +
 2 files changed, 48 insertions(+)
 create mode 100644 tests/tcg/s390x/cvb.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 04e4bddd83d..e2aba2ec274 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -46,6 +46,7 @@ TESTS+=laalg
 TESTS+=add-logical-with-carry
 TESTS+=lae
 TESTS+=cvd
+TESTS+=cvb
 
 cdsg: CFLAGS+=-pthread
 cdsg: LDFLAGS+=-pthread
diff --git a/tests/tcg/s390x/cvb.c b/tests/tcg/s390x/cvb.c
new file mode 100644
index 00000000000..47b7a7965f4
--- /dev/null
+++ b/tests/tcg/s390x/cvb.c
@@ -0,0 +1,47 @@
+/*
+ * Test the CONVERT TO BINARY instruction.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+static int32_t cvb(uint64_t x)
+{
+uint32_t ret;
+
+asm("cvb %[ret],%[x]" : [ret] "=r" (ret) : [x] "R" (x));
+
+return ret;
+}
+
+static int64_t cvbg(__uint128_t x)
+{
+int64_t ret;
+
+asm("cvbg %[ret],%[x]" : [ret] "=r" (ret) : [x] "T" (x));
+
+return ret;
+}
+
+int main(void)
+{
+__uint128_t m = (((__uint128_t)0x9223372036854775) << 16) | 0x8070;
+
+assert(cvb(0xc) == 0);
+assert(cvb(0x1c) == 1);
+assert(cvb(0x25594c) == 25594);
+assert(cvb(0x1d) == -1);
+assert(cvb(0x2147483647c) == 0x7fffffff);
+assert(cvb(0x2147483647d) == -0x7fffffff);
+
+assert(cvbg(0xc) == 0);
+assert(cvbg(0x1c) == 1);
+assert(cvbg(0x25594c) == 25594);
+assert(cvbg(0x1d) == -1);
+assert(cvbg(m | 0xc) == 0x7fffffffffffffff);
+assert(cvbg(m | 0xd) == -0x7fffffffffffffff);
+
+return EXIT_SUCCESS;
+}
-- 
2.43.0

[PATCH v3 0/4] target/s390x: Emulate CVDG and CVB*

2024-01-31 Thread Ilya Leoshkevich

v2: https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05048.html
v2 -> v3: Resurrect an old CVB* patch (Thomas).
  Add Richard's R-b.

v1: https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02865.html
v1 -> v2: Fix !CONFIG_INT128 builds (Richard).

Hi,

Ido reported that we are missing the CVDG emulation (which is very
similar to the existing CVD emulation). This series adds it along with
a test.

Best regards,
Ilya

Ilya Leoshkevich (3):
  target/s390x: Emulate CVDG
  tests/tcg/s390x: Test CONVERT TO DECIMAL
  tests/tcg/s390x: Test CONVERT TO BINARY

Pavel Zbitskiy (1):
  target/s390x: implement CVB, CVBY and CVBG

 target/s390x/helper.h|  2 ++
 target/s390x/tcg/insn-data.h.inc |  5 +++
 target/s390x/tcg/int_helper.c| 61 
 target/s390x/tcg/translate.c | 20 +++
 tests/tcg/s390x/Makefile.target  |  2 ++
 tests/tcg/s390x/cvb.c| 47 
 tests/tcg/s390x/cvd.c| 45 +++
 7 files changed, 182 insertions(+)
 create mode 100644 tests/tcg/s390x/cvb.c
 create mode 100644 tests/tcg/s390x/cvd.c

-- 
2.43.0

[PATCH v3 2/4] tests/tcg/s390x: Test CONVERT TO DECIMAL

2024-01-31 Thread Ilya Leoshkevich

Check the CVD's and CVDG's corner cases.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/s390x/Makefile.target |  1 +
 tests/tcg/s390x/cvd.c   | 45 +
 2 files changed, 46 insertions(+)
 create mode 100644 tests/tcg/s390x/cvd.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 30994dcf9c2..04e4bddd83d 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -45,6 +45,7 @@ TESTS+=clc
 TESTS+=laalg
 TESTS+=add-logical-with-carry
 TESTS+=lae
+TESTS+=cvd
 
 cdsg: CFLAGS+=-pthread
 cdsg: LDFLAGS+=-pthread
diff --git a/tests/tcg/s390x/cvd.c b/tests/tcg/s390x/cvd.c
new file mode 100644
index 00000000000..c1fb63ca9a6
--- /dev/null
+++ b/tests/tcg/s390x/cvd.c
@@ -0,0 +1,45 @@
+/*
+ * Test the CONVERT TO DECIMAL instruction.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+static uint64_t cvd(int32_t x)
+{
+uint64_t ret;
+
+asm("cvd %[x],%[ret]" : [ret] "=R" (ret) : [x] "r" (x));
+
+return ret;
+}
+
+static __uint128_t cvdg(int64_t x)
+{
+__uint128_t ret;
+
+asm("cvdg %[x],%[ret]" : [ret] "=T" (ret) : [x] "r" (x));
+
+return ret;
+}
+
+int main(void)
+{
+__uint128_t m = (((__uint128_t)0x9223372036854775) << 16) | 0x8070;
+
+assert(cvd(0) == 0xc);
+assert(cvd(1) == 0x1c);
+assert(cvd(-1) == 0x1d);
+assert(cvd(0x7fffffff) == 0x2147483647c);
+assert(cvd(-0x7fffffff) == 0x2147483647d);
+
+assert(cvdg(0) == 0xc);
+assert(cvdg(1) == 0x1c);
+assert(cvdg(-1) == 0x1d);
+assert(cvdg(0x7fffffffffffffff) == (m | 0xc));
+assert(cvdg(-0x7fffffffffffffff) == (m | 0xd));
+
+return EXIT_SUCCESS;
+}
-- 
2.43.0

Re: [PATCH v2 09/19] qapi/schema: allow resolve_type to be used for built-in types

2024-01-31 Thread John Snow

On Mon, Jan 22, 2024 at 8:12 AM Markus Armbruster  wrote:
>
> John Snow  writes:
>
> > On Tue, Jan 16, 2024 at 6:09 AM Markus Armbruster  wrote:
> >>
> >> John Snow  writes:
> >>
> >> > allow resolve_type to be used for both built-in and user-specified
> >> > type definitions. In the event that the type cannot be resolved, assert
> >> > that 'info' and 'what' were both provided in order to create a usable
> >> > QAPISemError.
> >> >
> >> > In practice, 'info' will only be None for built-in definitions, which
> >> > *should not fail* type lookup.
> >> >
> >> > As a convenience, allow the 'what' and 'info' parameters to be elided
> >> > entirely so that it can be used as a can-not-fail version of
> >> > lookup_type.
> >>
> >> The convenience remains unused until the next patch.  It should be added
> >> there.
> >
> > Okie-ducky.
> >
> >>
> >> > Note: there are only three callsites to resolve_type at present where
> >> > "info" is perceived to be possibly None:
> >> >
> >> > 1) QAPISchemaArrayType.check()
> >> > 2) QAPISchemaObjectTypeMember.check()
> >> > 3) QAPISchemaEvent.check()
> >> >
> >> > Of those three, only the first actually ever passes None;
> >>
> >> Yes.  More below.
> >
> > Scary...
>
> I know...
>
> >> >   the other 
> >> > two
> >> > are limited by their base class initializers which accept info=None, 
> >> > but
> >>
> >> They do?
> >
> > In the case of QAPISchemaObjectTypeMember, the parent class
> > QAPISchemaMember allows initialization with info=None. I can't fully
> > trace all of the callsites, but one of them at least is in types.py:
> >
> >> enum_members = members + [QAPISchemaEnumMember('_MAX', None)]
>
> I see.
>
> We may want to do the _MAX thingy differently.  Not now.
>
> > which necessitates, for now, info-less QAPISchemaEnumMember, which
> > necessitates info-less QAPISchemaMember. There are others, etc.
>
> Overriding an inherited attribute of type Optional[T] so it's
> non-optional T makes mypy unhappy?
>
> >> > neither actually use it in practice.
> >> >
> >> > Signed-off-by: John Snow 
> >>
> >> Hmm.
> >
> > Scary.
> >
> >>
> >> We look up types by name in two ways:
> >>
> >> 1. Failure is a semantic error
> >>
> >>Use .resolve_type(), passing real @info and @what.
> >>
> >>Users:
> >>
> >>* QAPISchemaArrayType.check() resolving the element type
> >>
> >>  Fine print: when the array type is built-in, we pass None @info and
> >>  @what.  The built-in array type's element type must exist for
> >>  .resolve_type() to work.  This commit changes .resolve_type() to
> >>  assert it does.
> >>
> >>* QAPISchemaObjectType.check() resolving the base type
> >>
> >>* QAPISchemaObjectTypeMember.check() resolving the member type
> >>
> >>* QAPISchemaCommand.check() resolving argument type (if named) and
> >>  return type (which is always named).
> >>
> >>* QAPISchemaEvent.check() resolving argument type (if named).
> >>
> >>Note all users are in .check() methods.  That's where type names get
> >>resolved.
> >>
> >> 2. Handle failure
> >>
> >>Use .lookup_type(), which returns None when the named type doesn't
> >>exist.
> >>
> >>Users:
> >>
> >>* QAPISchemaVariants.check(), to look up the base type containing the
> >>  tag member for error reporting purposes.  Failure would be a
> >>  programming error.
> >>
> >>* .resolve_type(), which handles failure as semantic error
> >>
> >>* ._make_array_type(), which uses it as "type exists already"
> >>   predicate.
> >>
> >>* QAPISchemaGenIntrospectVisitor._use_type(), to look up certain
> >>  built-in types.  Failure would be a programming error.
> >>
> >> The next commit switches the uses where failure would be a programming
> >> error from .lookup_type() to .resolve_type() without @info and @what, so
> >> failure trips its assertion.  I don't like it, because it overloads
> >> .resolve_type() to serve two rather different use cases:
> >>
> >> 1. Failure is a semantic error; pass @info and @what
> >>
> >> 2. Failure is a programming error; don't pass @info and @what
> >>
> >> The odd one out is of course QAPISchemaArrayType.check(), which wants to
> >> use 1. for the user's types and 2. for built-in types.  Let's ignore it
> >> for a second.
> >
> > "Let's ignore what motivated this patch" aww...
>
> Just for a second, I swear!
>
> >> I prefer to do 2. like typ = .lookup_type(); assert typ.  We can factor
> >> this out into its own helper if that helps (pardon the pun).
> >>
> >> Back to QAPISchemaArrayType.check().  Its need to resolve built-in
> >> element types, which have no info, necessitates .resolve_type() taking
> >> Optional[QAPISourceInfo].  This might bother you.  It doesn't bother me,
> >> unless it leads to mypy complications I can't see.
> >
> > Well, with this patch I allowed it to take Optional[QAPISourceInfo] -
> > just keep in mind that

Re: [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> This patchset contains quite a few refactorings to current multifd:
>
>   - It picked up some patches from an old series of mine [0] (the last
> patches were dropped, though; I did the cleanup slightly differently):
>
> I still managed to include one patch to split pending_job, but I
> rewrote the patch here.
>
>   - It tries to clean up multiple multifd paths here and there, the ultimate
> goal is to redefine send_prepare() to be something like:
>
>   p->pages --->  send_prepare() -> IOVs
>
> So that there's no obvious change yet on multifd_ops besides redefined
> interface for send_prepare().  We may want a separate OPs for file
> later.
>
> For 2), one benefit is already presented by Fabiano in his other series [1]
> on cleaning up zero copy, but this patchset addressed it quite differently,
> and hopefully also more gradually.  The other benefit is for sure if we
> have a more concrete API for send_prepare() and if we can reach an initial
> consensus, then we can have the recent compression accelerators rebased on
> top of this one.
>
> This also prepares for the case where the input is not p->pages at all, but
> arbitrary data (like VFIO's potential use case in the future?).  But that
> will be for later, even if reasonable.
>
> Please have a look.  Thanks,
>
> [0] https://lore.kernel.org/r/20231022201211.452861-1-pet...@redhat.com
> [1] https://lore.kernel.org/qemu-devel/20240126221943.26628-1-faro...@suse.de
>
> Peter Xu (14):
>   migration/multifd: Drop stale comment for multifd zero copy
>   migration/multifd: multifd_send_kick_main()
>   migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
>   migration/multifd: Postpone reset of MultiFDPages_t
>   migration/multifd: Drop MultiFDSendParams.normal[] array
>   migration/multifd: Separate SYNC request with normal jobs
>   migration/multifd: Simplify locking in sender thread
>   migration/multifd: Drop pages->num check in sender thread
>   migration/multifd: Rename p->num_packets and clean it up
>   migration/multifd: Move total_normal_pages accounting
>   migration/multifd: Move trace_multifd_send|recv()
>   migration/multifd: multifd_send_prepare_header()
>   migration/multifd: Move header prepare/fill into send_prepare()
>   migration/multifd: Forbid spurious wakeups
>
>  migration/multifd.h  |  34 +++--
>  migration/multifd-zlib.c |  11 +-
>  migration/multifd-zstd.c |  11 +-
>  migration/multifd.c  | 291 +++
>  4 files changed, 182 insertions(+), 165 deletions(-)

This series didn't survive my  iterations test on the opensuse
machine.

# Running /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
...
kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core 
dumped)


#0  0x5575dda06399 in qemu_mutex_lock_impl (mutex=0x18, file=0x5575ddce9cc3 
"../util/qemu-thread-posix.c", line=275) at ../util/qemu-thread-posix.c:92
#1  0x5575dda06a94 in qemu_sem_post (sem=0x18) at 
../util/qemu-thread-posix.c:275
#2  0x5575dd56a512 in multifd_send_thread (opaque=0x5575df054ef8) at 
../migration/multifd.c:720
#3  0x5575dda0709b in qemu_thread_start (args=0x7fd404001d50) at 
../util/qemu-thread-posix.c:541
#4  0x7fd45e8a26ea in start_thread (arg=0x7fd3faffd700) at 
pthread_create.c:477
#5  0x7fd45cd2150f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The multifd thread is posting channels_ready with an already freed
multifd_send_state.

This is the bug Avihai has hit. We're going into multifd_save_cleanup()
so early that multifd_new_send_channel_async() hasn't even had the
chance to set p->running. So it misses the join and frees everything up
while a second multifd thread is just starting.

Re: [PATCH v2 09/19] qapi/schema: allow resolve_type to be used for built-in types

2024-01-31 Thread John Snow

On Mon, Jan 22, 2024 at 8:12 AM Markus Armbruster  wrote:
>
> John Snow  writes:
>
> > On Tue, Jan 16, 2024 at 6:09 AM Markus Armbruster  wrote:
> >>
> >> John Snow  writes:
> >>
> >> > allow resolve_type to be used for both built-in and user-specified
> >> > type definitions. In the event that the type cannot be resolved, assert
> >> > that 'info' and 'what' were both provided in order to create a usable
> >> > QAPISemError.
> >> >
> >> > In practice, 'info' will only be None for built-in definitions, which
> >> > *should not fail* type lookup.
> >> >
> >> > As a convenience, allow the 'what' and 'info' parameters to be elided
> >> > entirely so that it can be used as a can-not-fail version of
> >> > lookup_type.
> >>
> >> The convenience remains unused until the next patch.  It should be added
> >> there.
> >
> > Okie-ducky.
> >
> >>
> >> > Note: there are only three callsites to resolve_type at present where
> >> > "info" is perceived to be possibly None:
> >> >
> >> > 1) QAPISchemaArrayType.check()
> >> > 2) QAPISchemaObjectTypeMember.check()
> >> > 3) QAPISchemaEvent.check()
> >> >
> >> > Of those three, only the first actually ever passes None;
> >>
> >> Yes.  More below.
> >
> > Scary...
>
> I know...
>
> >> >   the other 
> >> > two
> >> > are limited by their base class initializers which accept info=None, 
> >> > but
> >>
> >> They do?
> >
> > In the case of QAPISchemaObjectTypeMember, the parent class
> > QAPISchemaMember allows initialization with info=None. I can't fully
> > trace all of the callsites, but one of them at least is in types.py:
> >
> >> enum_members = members + [QAPISchemaEnumMember('_MAX', None)]
>
> I see.
>
> We may want to do the _MAX thingy differently.  Not now.
>
> > which necessitates, for now, info-less QAPISchemaEnumMember, which
> > necessitates info-less QAPISchemaMember. There are others, etc.
>
> Overriding an inherited attribute of type Optional[T] so it's
> non-optional T makes mypy unhappy?

Yeah, it considers it to be improper OO - it remembers only the
broadest type from the base class, which is Optional[T]. We aren't
overriding the property itself, we've just redefined a different
initializer, which doesn't carry through to the actual object.

(i.e. the initializer takes a T, the core object has an Optional[T],
there's no problem - but the field remains Optional[T].)
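
A minimal sketch of that situation, assuming simplified stand-ins (SourceInfo,
Member and ObjectTypeMember are illustrative names, not the real QAPI
classes):

```python
from typing import Optional


class SourceInfo:
    """Stand-in for source location information."""

    def __init__(self, fname: str, line: int) -> None:
        self.fname = fname
        self.line = line


class Member:
    def __init__(self, name: str, info: Optional[SourceInfo]) -> None:
        self.name = name
        # The attribute is declared here, with the broadest type.
        self.info = info


class ObjectTypeMember(Member):
    # The initializer insists on a real SourceInfo ...
    def __init__(self, name: str, info: SourceInfo) -> None:
        super().__init__(name, info)

    def require_info(self) -> SourceInfo:
        # ... but a type checker is expected to still treat self.info as
        # Optional[SourceInfo], so it has to be narrowed explicitly.
        assert self.info is not None
        return self.info
```

The assertion in require_info() never fires for objects built through the
subclass initializer; it is only there to narrow the type.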

>
> >> > neither actually use it in practice.
> >> >
> >> > Signed-off-by: John Snow 
> >>
> >> Hmm.
> >
> > Scary.
> >
> >>
> >> We look up types by name in two ways:
> >>
> >> 1. Failure is a semantic error
> >>
> >>    Use .resolve_type(), passing real @info and @what.
> >>
> >>    Users:
> >>
> >>    * QAPISchemaArrayType.check() resolving the element type
> >>
> >>      Fine print: when the array type is built-in, we pass None @info and
> >>      @what.  The built-in array type's element type must exist for
> >>      .resolve_type() to work.  This commit changes .resolve_type() to
> >>      assert it does.
> >>
> >>    * QAPISchemaObjectType.check() resolving the base type
> >>
> >>    * QAPISchemaObjectTypeMember.check() resolving the member type
> >>
> >>    * QAPISchemaCommand.check() resolving argument type (if named) and
> >>      return type (which is always named).
> >>
> >>    * QAPISchemaEvent.check() resolving argument type (if named).
> >>
> >>    Note all users are in .check() methods.  That's where type names get
> >>    resolved.
> >>
> >> 2. Handle failure
> >>
> >>    Use .lookup_type(), which returns None when the named type doesn't
> >>    exist.
> >>
> >>    Users:
> >>
> >>    * QAPISchemaVariants.check(), to look up the base type containing the
> >>      tag member for error reporting purposes.  Failure would be a
> >>      programming error.
> >>
> >>    * .resolve_type(), which handles failure as semantic error
> >>
> >>    * ._make_array_type(), which uses it as "type exists already"
> >>      predicate.
> >>
> >>    * QAPISchemaGenIntrospectVisitor._use_type(), to look up certain
> >>      built-in types.  Failure would be a programming error.
> >>
> >> The next commit switches the uses where failure would be a programming
> >> error from .lookup_type() to .resolve_type() without @info and @what, so
> >> failure trips its assertion.  I don't like it, because it overloads
> >> .resolve_type() to serve two rather different use cases:
> >>
> >> 1. Failure is a semantic error; pass @info and @what
> >>
> >> 2. Failure is a programming error; don't pass @info and @what
> >>
> >> The odd one out is of course QAPISchemaArrayType.check(), which wants to
> >> use 1. for the user's types and 2. for built-in types.  Let's ignore it
> >> for a second.
> >
> > "Let's ignore what motivated this patch" aww...
>
> Just for a second, I swear!
>
> >> I prefer to do 2. like typ = .lookup_type(); assert typ.  We can factor
> >> this out into its own helper if that helps (pardon the pun).
> >>
> >>
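
The two lookup styles discussed above can be sketched as follows. This is a toy type table, not the real QAPISchema; the class names and the helper name lookup_type_or_die are assumptions made for illustration only.

```python
from typing import Dict, Optional


class SemanticError(Exception):
    """Hypothetical stand-in for a schema semantic error."""


class Schema:
    """Toy type table; not the real QAPISchema."""

    def __init__(self) -> None:
        self._types: Dict[str, str] = {
            "str": "builtin str",
            "int": "builtin int",
        }

    def lookup_type(self, name: str) -> Optional[str]:
        # Returns None when the named type does not exist.
        return self._types.get(name)

    def resolve_type(self, name: str, info: str, what: str) -> str:
        # Case 1: failure is a semantic error reported to the user.
        typ = self.lookup_type(name)
        if typ is None:
            raise SemanticError(f"{info}: {what} uses unknown type '{name}'")
        return typ

    def lookup_type_or_die(self, name: str) -> str:
        # Case 2: failure is a programming error (hypothetical helper name).
        typ = self.lookup_type(name)
        assert typ is not None, f"type '{name}' must exist"
        return typ
```

Keeping the asserting variant separate leaves resolve_type() with a single job, which is the design preference argued for in the quoted review.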

[PATCH] tests/tcg: Fix the /proc/self/mem probing in the PROT_NONE gdbstub test

2024-01-31 Thread Ilya Leoshkevich

The `if not probe_proc_self_mem` check never passes, because
probe_proc_self_mem is a function object, which is a truthy value.
Add parentheses in order to perform a function call.

Fixes: dc84d50a7f9b ("tests/tcg: Add the PROT_NONE gdbstub test")
Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/multiarch/gdbstub/prot-none.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/multiarch/gdbstub/prot-none.py b/tests/tcg/multiarch/gdbstub/prot-none.py
index e829d3ebc5f..7e264589cb8 100644
--- a/tests/tcg/multiarch/gdbstub/prot-none.py
+++ b/tests/tcg/multiarch/gdbstub/prot-none.py
@@ -20,7 +20,7 @@ def probe_proc_self_mem():
 
 def run_test():
     """Run through the tests one by one"""
-    if not probe_proc_self_mem:
+    if not probe_proc_self_mem():
         print("SKIP: /proc/self/mem is not usable")
         exit(0)
     gdb.Breakpoint("break_here")
-- 
2.43.0
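
The bug class fixed by the patch above can be reproduced in isolation. This is a minimal sketch; probe() is a hypothetical function, not the test's real probe_proc_self_mem().

```python
def probe():
    """Hypothetical probe that reports the feature as unavailable."""
    return False


# A function object is always truthy, so `not probe` is always False
# and a skip branch guarded by it can never run.
skipped_without_call = not probe

# Calling the function evaluates its result instead.
skipped_with_call = not probe()
```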

Re: [PATCH 00/22] target/sparc: floating-point cleanup

2024-01-31 Thread Mark Cave-Ayland


On 28/01/2024 06:49, Richard Henderson wrote:


On 11/4/23 03:38, Richard Henderson wrote:

Major changes:

(1) Get rid of the env->qt[01] temporaries and use TCGv_i128 for float128.
(2) Perform ieee exception check within the helpers, before any writeback
 to the floating point registers.
(3) Split env->fsr into pieces to simplify update, especially compares.


r~


Based-on: 20231101041132.174501-1-richard.hender...@linaro.org
("[PATCH v2 00/21] target/sparc: Cleanup condition codes etc")


Ping.

Prerequisites are upstream, and it rebases cleanly on master.
For reference,

   https://gitlab.com/rth7680/qemu/-/commits/tgt-sparc-fp


r~


I've tested the above branch on my SPARC32 and SPARC64 images, and whilst I don't 
think they particularly exercise FP instructions, I don't see any regressions so:


Tested-by: Mark Cave-Ayland 
Acked-by: Mark Cave-Ayland 

I'm happy for you to take this via tcg-next if that's easiest for you.


Richard Henderson (22):
   target/sparc: Use tcg_gen_qemu_{ld,st}_i128 for ASI_M_BCOPY
   target/sparc: Use tcg_gen_qemu_{ld,st}_i128 for ASI_M_BFILL
   target/sparc: Remove gen_dest_fpr_F
   target/sparc: Introduce gen_{load,store}_fpr_Q
   target/sparc: Inline FNEG, FABS
   target/sparc: Use i128 for FSQRTq
   target/sparc: Use i128 for FADDq, FSUBq, FMULq, FDIVq
   target/sparc: Use i128 for FqTOs, FqTOi
   target/sparc: Use i128 for FqTOd, FqTOx
   target/sparc: Use i128 for FCMPq, FCMPEq
   target/sparc: Use i128 for FsTOq, FiTOq
   target/sparc: Use i128 for FdTOq, FxTOq
   target/sparc: Use i128 for Fdmulq
   target/sparc: Remove qt0, qt1 temporaries
   target/sparc: Introduce cpu_get_fsr, cpu_put_fsr
   target/sparc: Split ver from env->fsr
   target/sparc: Clear cexc and ftt in do_check_ieee_exceptions
   target/sparc: Merge check_ieee_exceptions with FPop helpers
   target/sparc: Split cexc and ftt from env->fsr
   target/sparc: Remove cpu_fsr
   target/sparc: Split fcc out of env->fsr
   target/sparc: Remove FSR_FTT_NMASK, FSR_FTT_CEXC_NMASK

  target/sparc/cpu.h  |  39 +-
  target/sparc/helper.h   | 116 ++
  linux-user/sparc/cpu_loop.c |   2 +-
  linux-user/sparc/signal.c   |  14 +-
  target/sparc/cpu.c  |  32 +-
  target/sparc/fop_helper.c   | 510 +--
  target/sparc/gdbstub.c  |   8 +-
  target/sparc/ldst_helper.c  |   3 -
  target/sparc/machine.c  |  38 +-
  target/sparc/translate.c    | 799 
  10 files changed, 680 insertions(+), 881 deletions(-)



ATB,

Mark.

Re: [PATCH 13/14] migration/multifd: Move header prepare/fill into send_prepare()

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> This patch redefines the interfacing of ->send_prepare().  It further
> simplifies multifd_send_thread() especially on zero copy.
>
> Now with the new interface, we require the hook to do all the work for
> preparing the IOVs to send.  After it's completed, the IOVs should be ready
> to be dumped into the specific multifd QIOChannel later.
>
> So now the API looks like:
>
>   p->pages --->  send_prepare() -> IOVs
>
> This also prepares for the case where the input can be extended to even not
> any p->pages.  But that's for later.
>
> This patch will achieve similar goal of what Fabiano used to propose here:
>
> https://lore.kernel.org/r/20240126221943.26628-1-faro...@suse.de
>
> However the send() interface may not be necessary.  I'm boldly attaching a

So should I drop send() for fixed-ram as well? Or do you still want a
separate layer just for send()?

> "Co-developed-by" for Fabiano.
>
> Co-developed-by: Fabiano Rosas 
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH 14/14] migration/multifd: Forbid spurious wakeups

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> Now multifd's logic is designed to have no spurious wakeup.  I still
> remember a talk to Juan and he seems to agree we should drop it now, and if
> my memory was right it was there because multifd used to hit that when
> still debugging.
>
> Let's drop it and see what can explode; as long as it's not reaching
> soft-freeze.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH 12/14] migration/multifd: multifd_send_prepare_header()

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> Introduce a helper multifd_send_prepare_header() to setup the header packet
> for multifd sender.
>
> It's fine to setup the IOV[0] _before_ send_prepare() because the packet
> buffer is already ready, even if the content is to be filled in.
>
> With this helper, we can already slightly clean up the zero copy path.
>
> Note that I explicitly put it into multifd.h, because I want it inlined
> directly into multifd*.c where necessary later.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas 

nit below:

> ---
>  migration/multifd.h |  8 
>  migration/multifd.c | 16 
>  2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 2e4ad0dc56..4ec005f53f 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -209,5 +209,13 @@ typedef struct {
>  
>  void multifd_register_ops(int method, MultiFDMethods *ops);
>  
> +static inline void multifd_send_prepare_header(MultiFDSendParams *p)
> +{
> +    p->iov[0].iov_len = p->packet_len;
> +    p->iov[0].iov_base = p->packet;
> +    p->iovs_num++;
> +}
> +
> +
>  #endif
>  
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 8d4b80f365..1b0035787e 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -707,10 +707,14 @@ static void *multifd_send_thread(void *opaque)
>          if (p->pending_job) {
>              MultiFDPages_t *pages = p->pages;
>  
> -            if (use_zero_copy_send) {
> -                p->iovs_num = 0;
> -            } else {
> -                p->iovs_num = 1;
> +            p->iovs_num = 0;
> +
> +            if (!use_zero_copy_send) {
> +                /*
> +                 * Only !zero_copy needs the header in IOV; zerocopy will
> +                 * send it separately.

Could use the same spelling for both mentions of zero copy.

> +                 */
> +                multifd_send_prepare_header(p);
>              }
>  
>              assert(pages->num);
> @@ -730,10 +734,6 @@ static void *multifd_send_thread(void *opaque)
>                  if (ret != 0) {
>                      break;
>                  }
> -            } else {
> -                /* Send header using the same writev call */
> -                p->iov[0].iov_len = p->packet_len;
> -                p->iov[0].iov_base = p->packet;
>              }
>  
>              ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
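
The IOV layout that the patch quoted above sets up can be modelled roughly as follows. This is only a Python sketch of the ordering, with a hypothetical function name and byte strings standing in for the header packet and the page buffers; it is not the QEMU implementation.

```python
from typing import List


def build_iovs(packet: bytes, pages: List[bytes], zero_copy: bool) -> List[bytes]:
    """Return the buffers in the order they would be handed to writev."""
    iovs: List[bytes] = []
    if not zero_copy:
        # Without zero copy the header occupies slot 0, ahead of the pages.
        iovs.append(packet)
    # With zero copy the header is sent separately, so only pages remain.
    iovs.extend(pages)
    return iovs
```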

Re: [PATCH 11/14] migration/multifd: Move trace_multifd_send|recv()

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> Move them into fill/unfill of packets.  With that, we can further cleanup
> the send/recv thread procedure, and remove one more temp var.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH 10/14] migration/multifd: Move total_normal_pages accounting

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> Just like the previous patch, move the accounting for total_normal_pages on
> both src/dst sides into the packet fill/unfill procedures.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH 09/14] migration/multifd: Rename p->num_packets and clean it up

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> This field, no matter whether on src or dest, is only used for debugging
> purpose.
>
> They can even be removed already, unless they still more or less provide some
> accounting on "how many packets are sent/recved for this thread".  The
> other more important one is called packet_num, which is embedded in the
> multifd packet headers (MultiFDPacket_t).
>
> So let's keep them for now, but make them much easier to understand, by
> doing below:
>
>   - Rename both of them to packets_sent / packets_recved, the old
>   name (num_packets) is waaay too confusing when we already have
>   MultiFDPacket_t.packet_num.
>
>   - Avoid worrying on the "initial packet": we know we will send it, that's
>   good enough.  The accounting won't matter a great deal to start with 0 or
>   with 1.
>
>   - Move them to where we send/recv the packets.  They're:
>
> - multifd_send_fill_packet() for senders.
> - multifd_recv_unfill_packet() for receivers.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH 08/14] migration/multifd: Drop pages->num check in sender thread

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> Now with a split SYNC handler, we always have pages->num set for
> pending_job==true.  Assert it instead.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH] tcg: Fixes set const_args[i] wrong value when instructions imm is 0

2024-01-31 Thread Richard Henderson


On 1/31/24 17:27, Song Gao wrote:

It seems that tcg_reg_alloc_op() set const_args[i] wrong value
when instructions imm is 0. The LoongArch tcg_out_vec_op() cmp_vec
use the wrong const_args[2].
e.g
The wrong const_args[2] is 0.
IN: vslti.w v5, v4, 0x0   OUT: vslt.w  v1, v1, v0

The right const_args[2] is 1.
IN: vslti.w v5, v4, 0x0   OUT: vslti.w v1, v1, 0x0

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2136
Signed-off-by: Song Gao 
---
  tcg/tcg.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index e2c38f6d11..5b290123bc 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -4808,7 +4808,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
         arg_ct = &def->args_ct[i];
         ts = arg_temp(arg);
 
-        if (ts->val_type == TEMP_VAL_CONST
+        if ((ts->val_type == TEMP_VAL_CONST || ts->kind == TEMP_CONST)
             && tcg_target_const_match(ts->val, ts->type, arg_ct->ct, TCGOP_VECE(op))) {
             /* constant is OK for instruction */
             const_args[i] = 1;


This is wrong.

I strongly suspect that the TEMP_CONST value 0 has been loaded into a register for use in 
another operation, and the register allocator sees that it is still there.



r~

QEMU 8.2.0 aarch64 sve ldff1b returning 1 byte when 16 are expected

2024-01-31 Thread Mark Charney OS

Using QEMU v8.2.0 (and also the HEAD of the git master branch), I
encountered an unexpected situation: an ldff1b is returning 1 byte
when I run with the QEMU user level plugin (and setting FFR as if
there was a fault).

However the ldff1b actually loads 16 bytes when: (a) I run this same
test natively on a system with SVE support (no QEMU involved) or (b)
when I run this test interactively (logged in to a console) in GDB
running on the QEMU (with no plugin involved).

I was wondering if this one-byte-per-ldff1b was a known/expected
behavior with plugins?  I guess it is legal to only return one byte,
but I was wondering why QEMU did this and if there was some way to get
QEMU to return 16 bytes in the absence of faults (or as many bytes as
it can up until the fault).

There is *no* page boundary being crossed in the examples of interest,
and no MMIO, so a partial data return is not expected. The page
referenced is mapped and previously referenced.

Talking to Alex Bennee, he pointed out:

> I'm wondering if this is a result of the fix in 6d03226b422
> (plugins: force slow path when plugins instrument memory ops). This
> will always force the slow path which is where we instrument the
> operation.

I attempted to revert this commit locally and no longer got memop
callbacks for any SVE load operations, first fault, nonfault or not
"normal" predicated SVE operations. But I believe ldff1b are returning
16 bytes (judging by the control flow).

Our goal is to use QEMU for tracing with a home-grown plugin.  For our
purposes, we were expecting to observe control flow like what we see
on SVE-enabled hardware where ldff1b returns 16 bytes in the absence
of faults.

If necessary, I can provide a reproducer, that includes:
  - a sve strcpy loop from one of Alex's talks.
  - a simple user level plugin

[PATCH 3/3] tests/tcg: Add two follow-fork-mode tests

2024-01-31 Thread Ilya Leoshkevich

Add follow-fork-mode child and follow-fork-mode parent tests.
Check for the obvious pitfalls, such as lingering breakpoints,
catchpoints, and single-step mode.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/multiarch/Makefile.target   | 17 +-
 tests/tcg/multiarch/follow-fork-mode.c| 56 +++
 .../gdbstub/follow-fork-mode-child.py | 40 +
 .../gdbstub/follow-fork-mode-parent.py| 16 ++
 4 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/multiarch/follow-fork-mode.c
 create mode 100644 tests/tcg/multiarch/gdbstub/follow-fork-mode-child.py
 create mode 100644 tests/tcg/multiarch/gdbstub/follow-fork-mode-parent.py

diff --git a/tests/tcg/multiarch/Makefile.target b/tests/tcg/multiarch/Makefile.target
index e10951a8016..b8b70c81860 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -115,6 +115,20 @@ run-gdbstub-catch-syscalls: catch-syscalls
--bin $< --test $(MULTIARCH_SRC)/gdbstub/catch-syscalls.py, \
hitting a syscall catchpoint)
 
+run-gdbstub-follow-fork-mode-child: follow-fork-mode
+   $(call run-test, $@, $(GDB_SCRIPT) \
+   --gdb $(GDB) \
+   --qemu $(QEMU) --qargs "$(QEMU_OPTS)" \
+   --bin $< --test $(MULTIARCH_SRC)/gdbstub/follow-fork-mode-child.py, \
+   following children on fork)
+
+run-gdbstub-follow-fork-mode-parent: follow-fork-mode
+   $(call run-test, $@, $(GDB_SCRIPT) \
+   --gdb $(GDB) \
+   --qemu $(QEMU) --qargs "$(QEMU_OPTS)" \
+   --bin $< --test $(MULTIARCH_SRC)/gdbstub/follow-fork-mode-parent.py, \
+   following parents on fork)
+
 else
 run-gdbstub-%:
$(call skip-test, "gdbstub test $*", "need working gdb with $(patsubst -%,,$(TARGET_NAME)) support")
@@ -122,7 +136,8 @@ endif
 EXTRA_RUNS += run-gdbstub-sha1 run-gdbstub-qxfer-auxv-read \
  run-gdbstub-proc-mappings run-gdbstub-thread-breakpoint \
  run-gdbstub-registers run-gdbstub-prot-none \
- run-gdbstub-catch-syscalls
+ run-gdbstub-catch-syscalls run-gdbstub-follow-fork-mode-child \
+ run-gdbstub-follow-fork-mode-parent
 
 # ARM Compatible Semi Hosting Tests
 #
diff --git a/tests/tcg/multiarch/follow-fork-mode.c b/tests/tcg/multiarch/follow-fork-mode.c
new file mode 100644
index 000..cb6b032b388
--- /dev/null
+++ b/tests/tcg/multiarch/follow-fork-mode.c
@@ -0,0 +1,56 @@
+/*
+ * Test GDB's follow-fork-mode.
+ *
+ * fork() a chain of processes.
+ * Parents send one byte to their children, and children return their
+ * position in the chain, in order to prove that they survived GDB's fork()
+ * handling.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include 
+#include 
+#include 
+#include 
+
+void break_after_fork(void)
+{
+}
+
+int main(void)
+{
+    int depth = 42, err, i, fd[2], status;
+    pid_t child, pid;
+    ssize_t n;
+    char b;
+
+    for (i = 0; i < depth; i++) {
+        err = pipe(fd);
+        assert(err == 0);
+        child = fork();
+        break_after_fork();
+        assert(child != -1);
+        if (child == 0) {
+            close(fd[1]);
+
+            n = read(fd[0], &b, 1);
+            close(fd[0]);
+            assert(n == 1);
+            assert(b == (char)i);
+        } else {
+            close(fd[0]);
+
+            b = (char)i;
+            n = write(fd[1], &b, 1);
+            close(fd[1]);
+            assert(n == 1);
+
+            pid = waitpid(child, &status, 0);
+            assert(pid == child);
+            assert(WIFEXITED(status));
+            return WEXITSTATUS(status) - 1;
+        }
+    }
+
+    return depth;
+}
diff --git a/tests/tcg/multiarch/gdbstub/follow-fork-mode-child.py b/tests/tcg/multiarch/gdbstub/follow-fork-mode-child.py
new file mode 100644
index 000..72a6e440c08
--- /dev/null
+++ b/tests/tcg/multiarch/gdbstub/follow-fork-mode-child.py
@@ -0,0 +1,40 @@
+"""Test GDB's follow-fork-mode child.
+
+SPDX-License-Identifier: GPL-2.0-or-later
+"""
+from test_gdbstub import main, report
+
+
+def run_test():
+    """Run through the tests one by one"""
+    gdb.execute("set follow-fork-mode child")
+    # Check that the parent breakpoints are unset.
+    gdb.execute("break break_after_fork")
+    # Check that the parent syscall catchpoints are unset.
+    # Skip this check on the architectures that don't have them.
+    have_fork_syscall = False
+    for fork_syscall in ("fork", "clone", "clone2", "clone3"):
+        try:
+            gdb.execute("catch syscall {}".format(fork_syscall))
+        except gdb.error:
+            pass
+        else:
+            have_fork_syscall = True
+    gdb.execute("continue")
+    for i in range(42):
+        if have_fork_syscall:
+            # syscall entry.
+            if i % 2 == 0:
+                # Check that the parent single-stepping is turned off.
+                gdb.execute("si")
+

[PATCH 2/3] gdbstub: Implement follow-fork-mode child

2024-01-31 Thread Ilya Leoshkevich

Currently it's not possible to use gdbstub for debugging linux-user
code that runs in a forked child, which is normally done using the `set
follow-fork-mode child` GDB command. Purely on the protocol level, the
missing piece is the fork-events feature.

However, a deeper problem is supporting $Hg switching between different
processes - right now it can do only threads. Implementing this for the
general case would be quite complicated, but, fortunately, for the
follow-fork-mode case there are a few factors that greatly simplify
things: fork() happens in the exclusive section, there are only two
processes involved, and before one of them is resumed, the second one
is detached.

This makes it possible to implement a simplified scheme: the parent and
the child share the gdbserver socket, it's used only by one of them at
any given time, which is coordinated through a separate socketpair. The
processes can read from the gdbserver socket only one byte at a time,
which is not great for performance, but, fortunately, the
follow-fork-mode involves only a few messages.

Add the hooks for the user-specific handling of $qSupported, $Hg, and
$D. Advertise the fork-events support, and remember whether GDB has it
as well. Implement the state machine that is initialized on fork(),
decides the current owner of the gdbserver socket, and is terminated
when one of the two processes is detached. The logic for the parent and
the child is the same, only the initial state is different.

Handle the `stepi` of a syscall corner case by disabling the
single-stepping in detached processes.

Signed-off-by: Ilya Leoshkevich 
---
 gdbstub/gdbstub.c   |  29 --
 gdbstub/internals.h |   3 +
 gdbstub/user.c  | 210 +++-
 3 files changed, 234 insertions(+), 8 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 7e73e916bdc..46f5dd47e9e 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -991,6 +991,12 @@ static void handle_detach(GArray *params, void *user_ctx)
         pid = get_param(params, 0)->val_ul;
     }
 
+#ifdef CONFIG_USER_ONLY
+    if (gdb_handle_detach_user(pid)) {
+        return;
+    }
+#endif
+
     process = gdb_get_process(pid);
     gdb_process_breakpoint_remove_all(process);
     process->attached = false;
@@ -1066,6 +1072,7 @@ static void handle_cont_with_sig(GArray *params, void *user_ctx)
 
 static void handle_set_thread(GArray *params, void *user_ctx)
 {
+    uint32_t pid, tid;
     CPUState *cpu;
 
     if (params->len != 2) {
@@ -1083,8 +1090,14 @@ static void handle_set_thread(GArray *params, void *user_ctx)
         return;
     }
 
-    cpu = gdb_get_cpu(get_param(params, 1)->thread_id.pid,
-                      get_param(params, 1)->thread_id.tid);
+    pid = get_param(params, 1)->thread_id.pid;
+    tid = get_param(params, 1)->thread_id.tid;
+#ifdef CONFIG_USER_ONLY
+    if (gdb_handle_set_thread_user(pid, tid)) {
+        return;
+    }
+#endif
+    cpu = gdb_get_cpu(pid, tid);
     if (!cpu) {
         gdb_put_packet("E22");
         return;
@@ -1599,6 +1612,7 @@ static void handle_query_thread_extra(GArray *params, void *user_ctx)
 
 static void handle_query_supported(GArray *params, void *user_ctx)
 {
+    const char *gdb_supported;
     CPUClass *cc;
 
     g_string_printf(gdbserver_state.str_buf, "PacketSize=%x", MAX_PACKET_LENGTH);
@@ -1622,9 +1636,14 @@ static void handle_query_supported(GArray *params, void *user_ctx)
     g_string_append(gdbserver_state.str_buf, ";qXfer:exec-file:read+");
 #endif
 
-    if (params->len &&
-        strstr(get_param(params, 0)->data, "multiprocess+")) {
-        gdbserver_state.multiprocess = true;
+    if (params->len) {
+        gdb_supported = get_param(params, 0)->data;
+        if (strstr(gdb_supported, "multiprocess+")) {
+            gdbserver_state.multiprocess = true;
+        }
+#if defined(CONFIG_USER_ONLY)
+        gdb_handle_query_supported_user(gdb_supported);
+#endif
     }
 
     g_string_append(gdbserver_state.str_buf, ";vContSupported+;multiprocess+");
diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 56b7c13b750..b4724598384 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -196,6 +196,9 @@ void gdb_handle_v_file_pread(GArray *params, void *user_ctx); /* user */
 void gdb_handle_v_file_readlink(GArray *params, void *user_ctx); /* user */
 void gdb_handle_query_xfer_exec_file(GArray *params, void *user_ctx); /* user */
 void gdb_handle_set_catch_syscalls(GArray *params, void *user_ctx); /* user */
+void gdb_handle_query_supported_user(const char *gdb_supported); /* user */
+bool gdb_handle_set_thread_user(uint32_t pid, uint32_t tid); /* user */
+bool gdb_handle_detach_user(uint32_t pid); /* user */
 
 void gdb_handle_query_attached(GArray *params, void *user_ctx); /* both */
 
diff --git a/gdbstub/user.c b/gdbstub/user.c
index 120eb7fc117..962f4cb74e7 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu/osdep.h"

[PATCH 0/3] gdbstub: Implement follow-fork-mode child

2024-01-31 Thread Ilya Leoshkevich

Based-on: <20240116094411.216665-1-...@linux.ibm.com>

Hi,

I needed to debug a linux-user crash between fork() and exec() [1] and
realized that gdbstub does not allow this. This series lifts this
restriction (one still cannot debug past exec() though). Patch 1 is a
preliminary refactoring, I can split it if necessary. Patch 2 is the
implementation, and patch 3 is the test.

[1] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg06424.html

Best regards,
Ilya

Ilya Leoshkevich (3):
  gdbstub: Refactor fork() handling
  gdbstub: Implement follow-fork-mode child
  tests/tcg: Add two follow-fork-mode tests

 bsd-user/freebsd/os-proc.h|   6 +-
 bsd-user/main.c   |   8 +-
 bsd-user/qemu.h   |   2 +-
 gdbstub/gdbstub.c |  29 ++-
 gdbstub/internals.h   |   3 +
 gdbstub/user.c| 225 +-
 include/gdbstub/user.h|  11 +-
 linux-user/main.c |   8 +-
 linux-user/syscall.c  |   4 +-
 linux-user/user-internals.h   |   2 +-
 tests/tcg/multiarch/Makefile.target   |  17 +-
 tests/tcg/multiarch/follow-fork-mode.c|  56 +
 .../gdbstub/follow-fork-mode-child.py |  40 
 .../gdbstub/follow-fork-mode-parent.py|  16 ++
 14 files changed, 403 insertions(+), 24 deletions(-)
 create mode 100644 tests/tcg/multiarch/follow-fork-mode.c
 create mode 100644 tests/tcg/multiarch/gdbstub/follow-fork-mode-child.py
 create mode 100644 tests/tcg/multiarch/gdbstub/follow-fork-mode-parent.py

-- 
2.43.0

[PATCH 1/3] gdbstub: Refactor fork() handling

2024-01-31 Thread Ilya Leoshkevich

Prepare for implementing follow-fork-mode child:
* Introduce gdbserver_fork_start(), which for now is a no-op.
* Rename gdbserver_fork() to gdbserver_fork_end(), call it in both
  parent and child processes, and pass the fork()'s return value to it.
* Factor out disable_gdbstub().
* Update ts_tid in the forked child.

Signed-off-by: Ilya Leoshkevich 
---
 bsd-user/freebsd/os-proc.h  |  6 +++---
 bsd-user/main.c |  8 ++--
 bsd-user/qemu.h |  2 +-
 gdbstub/user.c  | 25 +++--
 include/gdbstub/user.h  | 11 ---
 linux-user/main.c   |  8 ++--
 linux-user/syscall.c|  4 ++--
 linux-user/user-internals.h |  2 +-
 8 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/bsd-user/freebsd/os-proc.h b/bsd-user/freebsd/os-proc.h
index d6418780344..3003c8cb637 100644
--- a/bsd-user/freebsd/os-proc.h
+++ b/bsd-user/freebsd/os-proc.h
@@ -208,7 +208,7 @@ static inline abi_long do_freebsd_fork(void *cpu_env)
      */
     set_second_rval(cpu_env, child_flag);
 
-    fork_end(child_flag);
+    fork_end(ret);
 
     return ret;
 }
@@ -252,7 +252,7 @@ static inline abi_long do_freebsd_rfork(void *cpu_env, abi_long flags)
      * value: 0 for parent process, 1 for child process.
      */
     set_second_rval(cpu_env, child_flag);
-    fork_end(child_flag);
+    fork_end(ret);
 
     return ret;
 
@@ -285,7 +285,7 @@ static inline abi_long do_freebsd_pdfork(void *cpu_env, abi_ulong target_fdp,
      * value: 0 for parent process, 1 for child process.
      */
     set_second_rval(cpu_env, child_flag);
-    fork_end(child_flag);
+    fork_end(ret);
 
     return ret;
 }
diff --git a/bsd-user/main.c b/bsd-user/main.c
index e5efb7b8458..8ecfa395cc5 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -106,10 +106,13 @@ void fork_start(void)
     start_exclusive();
     cpu_list_lock();
     mmap_fork_start();
+    gdbserver_fork_start();
 }
 
-void fork_end(int child)
+void fork_end(abi_long pid)
 {
+    int child = pid == 0;
+
     if (child) {
         CPUState *cpu, *next_cpu;
         /*
@@ -127,10 +130,11 @@ void fork_end(int child)
          * state, so we don't need to end_exclusive() here.
          */
         qemu_init_cpu_list();
-        gdbserver_fork(thread_cpu);
+        gdbserver_fork_end(pid);
     } else {
         mmap_fork_end(child);
         cpu_list_unlock();
+        gdbserver_fork_end(pid);
         end_exclusive();
     }
 }
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index dc842fffa7d..2414a87559b 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -180,7 +180,7 @@ void cpu_loop(CPUArchState *env);
 char *target_strerror(int err);
 int get_osversion(void);
 void fork_start(void);
-void fork_end(int child);
+void fork_end(abi_long pid);
 
 #include "qemu/log.h"
 
diff --git a/gdbstub/user.c b/gdbstub/user.c
index 766f7c08848..120eb7fc117 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -356,16 +356,29 @@ int gdbserver_start(const char *port_or_path)
     return -1;
 }
 
+void gdbserver_fork_start(void)
+{
+}
+
+static void disable_gdbstub(void)
+{
+    CPUState *cpu;
+
+    close(gdbserver_user_state.fd);
+    gdbserver_user_state.fd = -1;
+    CPU_FOREACH(cpu) {
+        cpu_breakpoint_remove_all(cpu, BP_GDB);
+        /* no cpu_watchpoint_remove_all for user-mode */
+    }
+}
+
 /* Disable gdb stub for child processes.  */
-void gdbserver_fork(CPUState *cpu)
+void gdbserver_fork_end(pid_t pid)
 {
-    if (!gdbserver_state.init || gdbserver_user_state.fd < 0) {
+    if (pid != 0 || !gdbserver_state.init || gdbserver_user_state.fd < 0) {
         return;
     }
-    close(gdbserver_user_state.fd);
-    gdbserver_user_state.fd = -1;
-    cpu_breakpoint_remove_all(cpu, BP_GDB);
-    /* no cpu_watchpoint_remove_all for user-mode */
+    disable_gdbstub();
 }
 
 /*
diff --git a/include/gdbstub/user.h b/include/gdbstub/user.h
index 68b6534130c..1694d4fd330 100644
--- a/include/gdbstub/user.h
+++ b/include/gdbstub/user.h
@@ -46,10 +46,15 @@ static inline int gdb_handlesig(CPUState *cpu, int sig)
 void gdb_signalled(CPUArchState *as, int sig);
 
 /**
- * gdbserver_fork() - disable gdb stub for child processes.
- * @cs: CPU
+ * gdbserver_fork_start() - inform gdb of the upcoming fork()
+ */
+void gdbserver_fork_start(void);
+
+/**
+ * gdbserver_fork_end() - disable gdb stub for child processes.
+ * @pid: 0 if in child process, -1 if fork failed, child process pid otherwise
  */
-void gdbserver_fork(CPUState *cs);
+void gdbserver_fork_end(pid_t pid);
 
 /**
  * gdb_syscall_entry() - inform gdb of syscall entry and yield control to it
diff --git a/linux-user/main.c b/linux-user/main.c
index c9470eeccfc..b42c8f36a1d 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -144,10 +144,13 @@ void fork_start(void)
     mmap_fork_start();
     cpu_list_lock();
     qemu_plugin_user_prefork_lock();
+    gdbserver_fork_start();
 }
 
-void fork_end(int child)
+void fork_end(abi_long pid)
 {
+    int child = pid

Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread

2024-01-31 Thread Stefan Hajnoczi

On Fri, Jan 26, 2024 at 04:24:49PM +0100, Hanna Czenczek wrote:
> On 26.01.24 14:18, Kevin Wolf wrote:
> > Am 25.01.2024 um 18:32 hat Hanna Czenczek geschrieben:
> > > On 23.01.24 18:10, Kevin Wolf wrote:
> > > > Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
> > > > > On 21.12.23 22:23, Kevin Wolf wrote:
> > > > > > From: Stefan Hajnoczi
> > > > > > 
> > > > > > Stop depending on the AioContext lock and instead access
> > > > > > SCSIDevice->requests from only one thread at a time:
> > > > > > - When the VM is running only the BlockBackend's AioContext may access
> > > > > >  the requests list.
> > > > > > - When the VM is stopped only the main loop may access the requests
> > > > > >  list.
> > > > > > 
> > > > > > These constraints protect the requests list without the need for locking
> > > > > > in the I/O code path.
> > > > > > 
> > > > > > Note that multiple IOThreads are not supported yet because the code
> > > > > > assumes all SCSIRequests are executed from a single AioContext. Leave
> > > > > > that as future work.
> > > > > > Signed-off-by: Stefan Hajnoczi
> > > > > > Reviewed-by: Eric Blake
> > > > > > Message-ID:<20231204164259.1515217-2-stefa...@redhat.com>
> > > > > > Signed-off-by: Kevin Wolf
> > > > > > ---
> > > > > > include/hw/scsi/scsi.h |   7 +-
> > > > > > hw/scsi/scsi-bus.c | 181 
> > > > > > -
> > > > > > 2 files changed, 131 insertions(+), 57 deletions(-)
> > > > > My reproducer for https://issues.redhat.com/browse/RHEL-3934 now
> > > > > breaks more often because of this commit than because of the
> > > > > original bug, i.e. when repeatedly hot-plugging and unplugging a
> > > > > virtio-scsi and a scsi-hd device, this tends to happen when
> > > > > unplugging the scsi-hd:
> 
> Note: We (on issues.redhat.com) have a separate report that seems to be
> concerning this very problem: https://issues.redhat.com/browse/RHEL-19381
> 
> > > > > {"execute":"device_del","arguments":{"id":"stg0"}}
> > > > > {"return": {}}
> > > > > qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
> > > > > Assertion `ctx == blk->ctx' failed.
> > > [...]
> > > 
> > > > > I don’t know anything about the problem yet, but as usual, I like
> > > > > speculation and discovering how wrong I was later on, so one thing I
> > > > > came across that’s funny about virtio-scsi is that requests can
> > > > > happen even while a disk is being attached or detached.  That is,
> > > > > Linux seems to probe all LUNs when a new virtio-scsi device is being
> > > > > attached, and it won’t stop just because a disk is being attached or
> > > > > removed.  So maybe that’s part of the problem, that we get a request
> > > > > while the BB is being detached, and temporarily in an inconsistent
> > > > > state (BDS context differs from BB context).
> > > > I don't know anything about the problem either, but since you already
> > > > speculated about the cause, let me speculate about the solution:
> > > > Can we hold the graph writer lock for the tran_commit() call in
> > > > bdrv_try_change_aio_context()? And of course take the reader lock for
> > > > blk_get_aio_context(), but that should be completely unproblematic.
> > > Actually, now that completely unproblematic part is giving me trouble.  I
> > > wanted to just put a graph lock into blk_get_aio_context() (making it a
> > > coroutine with a wrapper)
> > Which is the first thing I neglected and already not great. We have
> > calls of blk_get_aio_context() in the SCSI I/O path, and creating a
> > coroutine and doing at least two context switches simply for this call
> > is a lot of overhead...
> > 
> > > but callers of blk_get_aio_context() generally assume the context is
> > > going to stay the BB’s context for as long as their AioContext *
> > > variable is in scope.
> > I'm not so sure about that. And taking another step back, I'm actually
> > also not sure how much it still matters now that they can submit I/O
> > from any thread.
> 
> That’s my impression, too, but “not sure” doesn’t feel great. :)
> scsi_device_for_each_req_async_bh() specifically double-checks whether it’s
> still in the right context before invoking the specified function, so it
> seems there was some intention to continue to run in the context associated
> with the BB.
> 
> (Not judging whether that intent makes sense or not, yet.)
> 
> > Maybe the correct solution is to remove the assertion from
> > blk_get_aio_context() and just always return blk->ctx. If it's in the
> > middle of a change, you'll either get the old one or the new one. Either
> > one is fine to submit I/O from, and if you care about changes for other
> > reasons (like SCSI does), then you need explicit code to protect it
> > anyway (which SCSI apparently has, but it doesn't work).
> 
> I think most

Re: [PATCH 07/14] migration/multifd: Simplify locking in sender thread

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> The sender thread will yield the p->mutex before IO starts, trying to not
> block the requester thread.  This may be unnecessary lock optimizations,
> because the requester can already read pending_job safely even without the
> lock, because the requester is currently the only one who can assign a
> task.

What about the coroutine yield at qio_channel_writev_full_all()? Is it
safe to yield while holding a lock? Could the main loop dispatch the
cleanup function, which calls join on the multifd thread, and deadlock?

>
> Drop that lock complication on both sides:
>
>   (1) in the sender thread, always take the mutex until job done
>   (2) in the requester thread, check pending_job clear lockless
>
> Signed-off-by: Peter Xu 
> ---
>  migration/multifd.c | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 6a4863edd2..4dc5af0a15 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -429,7 +429,9 @@ static int multifd_send_pages(void)
>  return -1;
>  }
>  
> +/* We wait here, until at least one channel is ready */
>  qemu_sem_wait(&multifd_send_state->channels_ready);
> +
>  /*
>   * next_channel can remain from a previous migration that was
>   * using more channels, so ensure it doesn't overflow if the
> @@ -441,17 +443,26 @@ static int multifd_send_pages(void)
>  return -1;
>  }
>  p = &multifd_send_state->params[i];
> -qemu_mutex_lock(&p->mutex);
> +/*
> + * Lockless read to p->pending_job is safe, because only multifd
> + * sender thread can clear it.
> + */
>  if (!p->pending_job) {

The worst that could happen is that we read at the same time the thread
is clearing it and we loop to the next channel. So it doesn't need to be
atomic either.

> -p->pending_job = true;
>  next_channel = (i + 1) % migrate_multifd_channels();
>  break;
>  }
> -qemu_mutex_unlock(&p->mutex);
>  }
> +
> +qemu_mutex_lock(&p->mutex);

What data does this lock protect now? Everything below here only happens
after this thread sees pending_job==false. It seems we would only need a
barrier on the multifd thread to make sure p->pending_job=false is
ordered after everything.

Even for the "sync" case, it appears the lock is not needed as well?

We might need to remove p->running first and move the kick from
multifd_send_terminate_threads() into multifd_save_cleanup() like I
suggested, but it seems like we could remove this lock.

Which would make sense, because there's nothing another thread would
want to do with a channel's MultiFDSendParams unless the channel is idle
waiting for work.
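
To make the ordering argument concrete, here is a minimal sketch in
portable C11 atomics rather than QEMU's own helpers. It is not code from
the patch: struct channel, requester_try_submit() and sender_finish_job()
are hypothetical names, and the sketch assumes exactly one requester
thread and leaves out the semaphore that wakes the sender.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-channel state. */
struct channel {
    atomic_bool pending_job;  /* set by the requester, cleared by the sender */
    int pages;                /* job data handed over to the sender */
};

/*
 * Requester side: with a single requester, nobody else can set
 * pending_job, so reading "false" means the channel is idle.
 * Returns true if the job was queued on this channel.
 */
static bool requester_try_submit(struct channel *c, int pages)
{
    /* Acquire pairs with the sender's release store below. */
    if (atomic_load_explicit(&c->pending_job, memory_order_acquire)) {
        return false;         /* busy: the caller moves to the next channel */
    }
    c->pages = pages;
    /* Release publishes c->pages before the sender can see the flag. */
    atomic_store_explicit(&c->pending_job, true, memory_order_release);
    return true;
}

/* Sender side: finish all work first, then clear the flag last. */
static int sender_finish_job(struct channel *c)
{
    int sent;

    /* The sender only runs this when a job has been queued. */
    assert(atomic_load_explicit(&c->pending_job, memory_order_acquire));
    sent = c->pages;
    c->pages = 0;
    /* Release orders everything above before the flag is cleared. */
    atomic_store_explicit(&c->pending_job, false, memory_order_release);
    return sent;
}
```

The only cross-thread handshake here is the flag itself, which is the
point being argued: once the clear is ordered after the rest of the
sender's work, no other thread has a reason to touch the channel.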

> +/*
> + * Double check on pending_job==false with the lock.  In the future if
> + * we can have >1 requester thread, we can replace this with a "goto
> + * retry", but that is for later.
> + */
> +assert(p->pending_job == false);
> +p->pending_job = true;
>  assert(!p->pages->num);
>  assert(!p->pages->block);
> -
>  p->packet_num = multifd_send_state->packet_num++;

I noticed this line cannot be here. If the channel thread takes long to
wake up, the "sync" code will increment once more and overwrite this
field. This and the identical line at multifd_send_sync_main() should go
into multifd_send_fill_packet() I think.

>  multifd_send_state->pages = p->pages;
>  p->pages = pages;
> @@ -704,8 +715,6 @@ static void *multifd_send_thread(void *opaque)
>  multifd_send_fill_packet(p);
>  p->num_packets++;
>  p->total_normal_pages += pages->num;
> -qemu_mutex_unlock(&p->mutex);
> -
>  trace_multifd_send(p->id, packet_num, pages->num, p->flags,
> p->next_packet_size);
>  
> @@ -725,6 +734,7 @@ static void *multifd_send_thread(void *opaque)
>  ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
>0, p->write_flags, &local_err);
>  if (ret != 0) {
> +qemu_mutex_unlock(&p->mutex);
>  break;
>  }
>  
> @@ -733,7 +743,6 @@ static void *multifd_send_thread(void *opaque)
>  
>  multifd_pages_reset(p->pages);
>  p->next_packet_size = 0;
> -qemu_mutex_lock(&p->mutex);
>  p->pending_job = false;
>  qemu_mutex_unlock(&p->mutex);
>  } else if (p->pending_sync) {

Dynamic & heterogeneous machines, initial configuration: problems

2024-01-31 Thread Markus Armbruster

This memo is the fruit of discussions with Philippe Mathieu-Daudé.
Its errors are mine.

QEMU defines machines statically in C code.  We've long wished we could
define them dynamically in some suitable DSL.  This is what we call
"dynamic machines".

There's a need for machines that contain more than one target's CPUs.
This is what we call "heterogeneous machines".  They require a single
binary capable of any of the targets involved.

There's substantial overlap with a seemingly unrelated problem:
machine-friendly initial configuration.

To keep the memo's length in check (sort of), it focuses on (known)
problems.


= Problem 1: Initial configuration =

Previously discussed in

Subject: Redesign of QEMU startup & initial configuration
Date: Thu, 02 Dec 2021 07:57:38 +0100
Message-ID: <87lf13cx3x@dusky.pond.sub.org>


== What users want for initial configuration ==

1. QMP only

   Management applications need to use QMP for monitoring anyway.  They
   may want to use it for initial configuration, too.  Libvirt does.

   They still need to bootstrap a QMP monitor, and for that, CLI is fine
   as long as it's simple and stable.

2. CLI and configuration files

   Human users want a CLI and configuration files.

   CLI is good for quick tweaks, and to explore.

   For more permanent, non-trivial configuration, configuration files
   are more suitable, because they are easier to read, edit, and
   document than long command lines.


== What we have for initial configuration ==

Half of 1. and half of 2., satisfying nobody's needs.

Management applications need to do a lot of non-trivial initial
configuration with the CLI.

Human users struggle with inconsistent syntax, insufficiently expressive
configuration files, and huge command lines.


= Problem 2: Defining machines =

This is how I understand the problem.  Please correct me where I'm off.


== How we'd like to build machines ==

We want to build machines declaratively, by configuring devices and
their connections.

We want to build composite devices the same way.

The non-composite devices are provided by the QEMU binary.

Users want to build machines as variations of canned machine types
shipped with QEMU.  Some users may want to build their own machines from
scratch.

To enable all this, machine configuration needs to be composable and
dynamic.

Composable means configuration can be assembled from components,
recursively.

Dynamic means it can be done during qemu-system-FOO initial
configuration.


== What we have for defining machines ==

A QEMU binary provides a fixed set of device types, some of them
composite, and a fixed set of machine types.

Machines are QOM objects: instance of a concrete subtype of "machine".

Devices are usually QOM objects: instance of a concrete subtype of
"device".  Exceptions remain in old code nobody can be bothered to
update.

Both machine types and composite devices are built from devices
by code, i.e. imperatively, not declaratively.

The code can be parameterized.  For QOM objects, parameters should be
QOM properties, but machine type code additionally uses global old-style
configuration such as -drive and -serial.

Code may create default backends for convenience.  Machine type code may
also create backends to honor global old-style configuration.  Only some
backends are QOM objects.

Machine types split their code between object creation (QOM methods
.instance_init() and .instance_post_init()) and machine initialization
(MachineClass method .init()).  However, basically everything is done in
the latter.

QOM device types split their code between object creation and device
realization (qdev method .realize()).  The actual split varies widely
between devices.  Developers are commonly unsure what to put where.

After machine type code is done, the resulting machine can still be
adjusted with device cold plug and unplug: -device, device_add,
device_del.  Only works for a subset of the devices.

Related, but out of scope here: hot plug and unplug.


= Common sub-problem: qemu-system initial startup =

QAPI/QMP is our most capable, flexible, and mature configuration
interface.  We need to offer machine-friendly initial configuration via
QMP, and we'd very much like to have a QAPI-based CLI and configuration
files (see "What users want for initial configuration" above).

Dynamic machine configuration happens during initial startup.  This
makes it part of the larger initial configuration problem.  We want an
integrated solution for the larger configuration problem that includes
machine configuration.

Traditionally, QMP becomes available quite late, long after machine
initialization.  This precludes use of QMP for most parts of initial
configuration, including dynamic machine configuration.

To enable a bit of machine configuration via QMP, experimental CLI
option -preconfig delays part of initial startup including machine
initialization by moving it into QMP command x-exit-preconfig.  Only
selected

Re: [PATCH] blkio: Respect memory-alignment for bounce buffer allocations

2024-01-31 Thread Stefan Hajnoczi

On Wed, Jan 31, 2024 at 06:31:40PM +0100, Kevin Wolf wrote:
> blkio_alloc_mem_region() requires that the requested buffer size is a
> multiple of the memory-alignment property. If it isn't, the allocation
> fails with a return value of -EINVAL.
> 
> Fix the call in blkio_resize_bounce_pool() to make sure the requested
> size is properly aligned.
> 
> I observed this problem with vhost-vdpa, which requires page aligned
> memory. As the virtio-blk device behind it still had 512 byte blocks, we
> got bs->bl.request_alignment = 512, but actually any request that needed
> a bounce buffer and was not aligned to 4k would fail without this fix.
> 
> Suggested-by: Stefano Garzarella 
> Signed-off-by: Kevin Wolf 
> ---
>  block/blkio.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Stefan Hajnoczi 
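
For readers following along, the rounding that the commit message
describes can be sketched as below. This is an illustration only, not the
code from the patch: round_up_to_alignment() is a hypothetical helper, and
the 512 and 4096 figures are the ones quoted in the description.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (align must be > 0). */
static size_t round_up_to_alignment(size_t size, size_t align)
{
    assert(align > 0);
    return (size + align - 1) / align * align;
}
```

With a 4096-byte memory alignment, a 512-byte bounce buffer request
becomes a 4096-byte allocation, so the size always satisfies the
multiple-of-alignment requirement.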



Re: [PATCH 0/2] Enable -Wvla, forbidding use of variable length arrays

2024-01-31 Thread Peter Maydell

On Wed, 31 Jan 2024 at 14:56, Thomas Huth  wrote:
> There's still a vla left in the ppc kvm code:
>
>   https://gitlab.com/thuth/qemu/-/jobs/6063230079#L2005
>
> ../target/ppc/kvm.c: In function ‘kvmppc_save_htab’:
> ../target/ppc/kvm.c:2691:5: error: ISO C90 forbids variable length array
> ‘buf’ [-Werror=vla]
>   2691 | uint8_t buf[bufsize];
>| ^~~
> ../target/ppc/kvm.c: In function ‘kvmppc_read_hptes’:
> ../target/ppc/kvm.c:2773:9: error: ISO C90 forbids variable length array
> ‘buf’ [-Werror=vla]
>   2773 | char buf[sizeof(*hdr) + m * HASH_PTE_SIZE_64];
>| ^~~~
> cc1: all warnings being treated as errors

Thanks for catching that -- it being in code built only on
ppc hosts I missed it.

kvmppc_save_htab() is called twice, and in both cases the
bufsize passed in is MAX_KVM_BUF_SIZE. So we could drop
that argument and have the buf[] array always be MAX_KVM_BUF_SIZE.

kvmppc_read_hptes() does this:
int m = n < HPTES_PER_GROUP ? n : HPTES_PER_GROUP;
char buf[sizeof(*hdr) + m * HASH_PTE_SIZE_64];

HPTES_PER_GROUP is 8 and HASH_PTE_SIZE_64 is 16, so we aren't
saving many bytes of stack by trying to make the buf smaller
based on the value of n. So we could have the buf always
be [sizeof(*hdr) + HPTES_PER_GROUP * HASH_PTE_SIZE_64].
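
A minimal sketch of that second suggestion, with stand-in definitions:
struct hpte_header is a hypothetical placeholder for whatever *hdr points
to, and hpte_bytes_used() is an illustrative helper, not a function in
target/ppc/kvm.c. The two constants use the values given above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values as stated above. */
#define HPTES_PER_GROUP   8
#define HASH_PTE_SIZE_64  16

/* Hypothetical stand-in for the header type that *hdr points to. */
struct hpte_header {
    uint32_t index;
    uint16_t n_valid;
    uint16_t n_invalid;
};

/* Worst-case size: a compile-time constant, so no VLA is needed. */
#define HPTE_BUF_MAX \
    (sizeof(struct hpte_header) + HPTES_PER_GROUP * HASH_PTE_SIZE_64)

/* Bytes of the buffer actually used for a request of n HPTEs (n >= 0). */
static size_t hpte_bytes_used(int n)
{
    int m = n < HPTES_PER_GROUP ? n : HPTES_PER_GROUP;
    char buf[HPTE_BUF_MAX];   /* fixed-size array, accepted by -Wvla */
    size_t used;

    assert(n >= 0);
    used = sizeof(struct hpte_header) + (size_t)m * HASH_PTE_SIZE_64;
    assert(used <= sizeof(buf));
    memset(buf, 0, used);     /* only the first 'used' bytes are touched */
    return used;
}
```

The fixed array costs at most 8 * 16 = 128 bytes of stack on top of the
header, whatever n is.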

thanks
-- PMM

Re: [PULL 0/4] Misc changes guest agent

2024-01-31 Thread Peter Maydell

On Tue, 30 Jan 2024 at 10:59, Konstantin Kostiuk  wrote:
>
> The following changes since commit 11be70677c70fdccd452a3233653949b79e97908:
>
>   Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into 
> staging (2024-01-29 10:53:56 +)
>
> are available in the Git repository at:
>
>   https://github.com/kostyanf14/qemu.git tags/qga-pull-2024-01-30
>
> for you to fetch changes up to b3e0f64487a4b937d871ce4ce9c259e02ec02191:
>
>   qga: Solaris has net/if_arp.h and netinet/if_ether.h but not ETHER_ADDR_LEN 
> (2024-01-30 12:14:28 +0200)
>
> 
> qga-pull-2024-01-30


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

Re: [PULL 00/31] tcg patch queue

2024-01-31 Thread Peter Maydell

On Mon, 29 Jan 2024 at 23:01, Richard Henderson
 wrote:
>
> The following changes since commit 7a1dc45af581d2b643cdbf33c01fd96271616fbd:
>
>   Merge tag 'pull-target-arm-20240126' of 
> https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-01-26 
> 18:16:35 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20240130
>
> for you to fetch changes up to ec1d32af123e7f13d98754a72bcaa7aa8c8e9d27:
>
>   target/i386: Extract x86_cpu_exec_halt() from accel/tcg/ (2024-01-29 
> 21:04:10 +1000)
>
> 
> linux-user: Allow gdbstub to ignore page protection
> cpu-exec: simplify jump cache management
> include/exec: Cleanups toward building accel/tcg once


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

Re: [PATCH 3/6] target/riscv: add remaining named features

2024-01-31 Thread Daniel Henrique Barboza





On 1/29/24 22:10, Alistair Francis wrote:

On Fri, Jan 26, 2024 at 5:54 AM Daniel Henrique Barboza
 wrote:


The RVA22U64 and RVA22S64 profiles mandate certain extensions that,
until now, we were implying that they were available.

We can't do this anymore since named features also have a riscv,isa
entry.  Let's add them to riscv_cpu_named_features[].

They will also need to be explicitly enabled in both profile
descriptions. TCG will enable the named features it already implements,
other accelerators are free to handle it as they like.

After this patch, here's the riscv,isa from a buildroot using the
'rva22s64' CPU:

  # cat /proc/device-tree/cpus/cpu@0/riscv,isa
rv64imafdc_zic64b_zicbom_zicbop_zicboz_ziccamoa_ziccif_zicclsm_ziccrse_
zicntr_zicsr_zifencei_zihintpause_zihpm_za64rs_zfhmin_zca_zcd_zba_zbb_
zbs_zkt_sscounterenw_sstvala_sstvecd_svade_svinval_svpbmt#

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 41 +-
  target/riscv/cpu_cfg.h |  9 +
  target/riscv/tcg/tcg-cpu.c | 19 +-
  3 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 28d3cfa8ce..1ecd8a57ed 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -101,6 +101,10 @@ const RISCVIsaExtData isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zicbom, PRIV_VERSION_1_12_0, ext_zicbom),
  ISA_EXT_DATA_ENTRY(zicbop, PRIV_VERSION_1_12_0, ext_zicbop),
  ISA_EXT_DATA_ENTRY(zicboz, PRIV_VERSION_1_12_0, ext_zicboz),
+ISA_EXT_DATA_ENTRY(ziccamoa, PRIV_VERSION_1_11_0, ext_ziccamoa),
+ISA_EXT_DATA_ENTRY(ziccif, PRIV_VERSION_1_11_0, ext_ziccif),
+ISA_EXT_DATA_ENTRY(zicclsm, PRIV_VERSION_1_11_0, ext_zicclsm),
+ISA_EXT_DATA_ENTRY(ziccrse, PRIV_VERSION_1_11_0, ext_ziccrse),
  ISA_EXT_DATA_ENTRY(zicond, PRIV_VERSION_1_12_0, ext_zicond),
  ISA_EXT_DATA_ENTRY(zicntr, PRIV_VERSION_1_12_0, ext_zicntr),
  ISA_EXT_DATA_ENTRY(zicsr, PRIV_VERSION_1_10_0, ext_zicsr),
@@ -109,6 +113,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zihintpause, PRIV_VERSION_1_10_0, ext_zihintpause),
  ISA_EXT_DATA_ENTRY(zihpm, PRIV_VERSION_1_12_0, ext_zihpm),
  ISA_EXT_DATA_ENTRY(zmmul, PRIV_VERSION_1_12_0, ext_zmmul),
+ISA_EXT_DATA_ENTRY(za64rs, PRIV_VERSION_1_12_0, ext_za64rs),
  ISA_EXT_DATA_ENTRY(zacas, PRIV_VERSION_1_12_0, ext_zacas),
  ISA_EXT_DATA_ENTRY(zawrs, PRIV_VERSION_1_12_0, ext_zawrs),
  ISA_EXT_DATA_ENTRY(zfa, PRIV_VERSION_1_12_0, ext_zfa),
@@ -170,8 +175,12 @@ const RISCVIsaExtData isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(smepmp, PRIV_VERSION_1_12_0, ext_smepmp),
  ISA_EXT_DATA_ENTRY(smstateen, PRIV_VERSION_1_12_0, ext_smstateen),
  ISA_EXT_DATA_ENTRY(ssaia, PRIV_VERSION_1_12_0, ext_ssaia),
+ISA_EXT_DATA_ENTRY(ssccptr, PRIV_VERSION_1_11_0, ext_ssccptr),
  ISA_EXT_DATA_ENTRY(sscofpmf, PRIV_VERSION_1_12_0, ext_sscofpmf),
+ISA_EXT_DATA_ENTRY(sscounterenw, PRIV_VERSION_1_12_0, ext_sscounterenw),
  ISA_EXT_DATA_ENTRY(sstc, PRIV_VERSION_1_12_0, ext_sstc),
+ISA_EXT_DATA_ENTRY(sstvala, PRIV_VERSION_1_12_0, ext_sstvala),
+ISA_EXT_DATA_ENTRY(sstvecd, PRIV_VERSION_1_12_0, ext_sstvecd),
  ISA_EXT_DATA_ENTRY(svade, PRIV_VERSION_1_11_0, ext_svade),
  ISA_EXT_DATA_ENTRY(svadu, PRIV_VERSION_1_12_0, ext_svadu),
  ISA_EXT_DATA_ENTRY(svinval, PRIV_VERSION_1_12_0, ext_svinval),
@@ -1523,6 +1532,22 @@ const RISCVCPUMultiExtConfig riscv_cpu_named_features[] 
= {
  MULTI_EXT_CFG_BOOL("svade", ext_svade, true),
  MULTI_EXT_CFG_BOOL("zic64b", ext_zic64b, true),

+/*
+ * cache-related extensions that are always enabled
+ * since QEMU RISC-V does not have a cache model.
+ */
+MULTI_EXT_CFG_BOOL("za64rs", ext_za64rs, true),
+MULTI_EXT_CFG_BOOL("ziccif", ext_ziccif, true),
+MULTI_EXT_CFG_BOOL("ziccrse", ext_ziccrse, true),
+MULTI_EXT_CFG_BOOL("ziccamoa", ext_ziccamoa, true),
+MULTI_EXT_CFG_BOOL("zicclsm", ext_zicclsm, true),
+MULTI_EXT_CFG_BOOL("ssccptr", ext_ssccptr, true),
+
+/* Other named features that QEMU TCG always implements */
+MULTI_EXT_CFG_BOOL("sstvecd", ext_sstvecd, true),
+MULTI_EXT_CFG_BOOL("sstvala", ext_sstvala, true),
+MULTI_EXT_CFG_BOOL("sscounterenw", ext_sscounterenw, true),
+
  DEFINE_PROP_END_OF_LIST(),
  };

@@ -2116,13 +2141,8 @@ static const PropertyInfo prop_marchid = {
  };

  /*
- * RVA22U64 defines some 'named features' or 'synthetic extensions'
- * that are cache related: Za64rs, Zic64b, Ziccif, Ziccrse, Ziccamoa
- * and Zicclsm. We do not implement caching in QEMU so we'll consider
- * all these named features as always enabled.
- *
- * There's no riscv,isa update for them (nor for zic64b, despite it
- * having a cfg offset) at this moment.
+ * RVA22U64 defines some cache related extensions: Za64rs,
+ * Ziccif, Ziccrse, Ziccamoa and Zicclsm.
   */
  static RISCVCPUProfile RVA22U64 = {
  .parent = NULL,
@@ -2139,7 +2159,9

RE: [PATCH 2/3] ui/gtk: set the ui size to 0 when invisible

2024-01-31 Thread Kim, Dongwon

Hi Marc-André,

> -Original Message-
> From: Marc-André Lureau 
> Sent: Tuesday, January 30, 2024 11:13 PM
> To: Kim, Dongwon 
> Cc: qemu-devel@nongnu.org
> Subject: Re: [PATCH 2/3] ui/gtk: set the ui size to 0 when invisible
> 
> Hi
> 
> On Wed, Jan 31, 2024 at 3:50 AM  wrote:
> >
> > From: Dongwon Kim 
> >
> > UI size is set to 0 when the VC is invisible, which will prevent the
> > further scanout update by notifying the guest that the display is not
> > in active state. Then it is restored to the original size whenever the
> > VC becomes visible again.
> 
> This can have unwanted results on multi monitor setups, such as moving
> windows or icons around on different monitors.

[Kim, Dongwon]  You are right. This is just a choice we made.
> 
> Switching tabs or minimizing the display window shouldn't cause a guest
> display reconfiguration.
> 
> What is the benefit of disabling the monitor here? Is it for performance 
> reasons?

[Kim, Dongwon] Not sure if you recognized it, but this patch series was part of
the VM display hot-plug feature we submitted a few months ago. There, we added
a new param called connectors to pin individual VM displays (in a multi-display
env) to different physical displays, and made a VM display disconnect when the
associated physical one is disconnected. We just wanted tab switching and
window minimization to behave similarly, to keep all of this logic consistent.

However, if it makes more sense to keep those displays connected even when
they are not shown, except in the hot-plug case, we could change the logic.
But as you mentioned, there will be some waste of bandwidth and perf since the
guest will keep sending out scan-out frames that would just be dumped.

This might be a minor thing, but another concern is about tab switching.
Initially, the guest will detect only one display even if max-output is set
to N (other than 1). Multiple displays will be detected once you detach or
switch to another tab. Then if you move back to the original tab or close the
detached window, the guest won't go back to a single-display setup. All the
displays will remain in its setup, and this doesn’t look consistent to me.

> 
> >
> > Cc: Marc-André Lureau 
> > Cc: Gerd Hoffmann 
> > Cc: Vivek Kasireddy 
> > Signed-off-by: Dongwon Kim 
> > ---
> >  ui/gtk.c | 15 ++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/ui/gtk.c b/ui/gtk.c
> > index 02eb667d8a..651ed3492f 100644
> > --- a/ui/gtk.c
> > +++ b/ui/gtk.c
> > @@ -1314,10 +1314,12 @@ static void gd_menu_switch_vc(GtkMenuItem
> *item, void *opaque)
> >  GtkDisplayState *s = opaque;
> >  VirtualConsole *vc;
> >  GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
> > +GdkWindow *window;
> >  gint page;
> >
> >  vc = gd_vc_find_current(s);
> >  vc->gfx.visible = false;
> > +gd_set_ui_size(vc, 0, 0);
> >
> >  vc = gd_vc_find_by_menu(s);
> >  gtk_release_modifiers(s);
> > @@ -1325,6 +1327,9 @@ static void gd_menu_switch_vc(GtkMenuItem
> *item, void *opaque)
> >  page = gtk_notebook_page_num(nb, vc->tab_item);
> >  gtk_notebook_set_current_page(nb, page);
> >  gtk_widget_grab_focus(vc->focus);
> > +window = gtk_widget_get_window(vc->gfx.drawing_area);
> > +gd_set_ui_size(vc, gdk_window_get_width(window),
> > +   gdk_window_get_height(window));
> >  vc->gfx.visible = true;
> >  }
> >  }
> > @@ -1356,6 +1361,7 @@ static gboolean gd_tab_window_close(GtkWidget
> *widget, GdkEvent *event,
> >  GtkDisplayState *s = vc->s;
> >
> >  vc->gfx.visible = false;
> > +gd_set_ui_size(vc, 0, 0);
> >  gtk_widget_set_sensitive(vc->menu_item, true);
> >  gd_widget_reparent(vc->window, s->notebook, vc->tab_item);
> >  gtk_notebook_set_tab_label_text(GTK_NOTEBOOK(s->notebook),
> > @@ -1391,6 +1397,7 @@ static gboolean gd_win_grab(void *opaque)
> > static void gd_menu_untabify(GtkMenuItem *item, void *opaque)  {
> >  GtkDisplayState *s = opaque;
> > +GdkWindow *window;
> >  VirtualConsole *vc = gd_vc_find_current(s);
> >
> >  if (vc->type == GD_VC_GFX &&
> > @@ -1429,6 +1436,10 @@ static void gd_menu_untabify(GtkMenuItem
> *item, void *opaque)
> >  gd_update_geometry_hints(vc);
> >  gd_update_caption(s);
> >  }
> > +
> > +window = gtk_widget_get_window(vc->gfx.drawing_area);
> > +gd_set_ui_size(vc, gdk_window_get_width(window),
> > +   gdk_window_get_height(window));
> >  vc->gfx.visible = true;
> >  }
> >
> > @@ -1753,7 +1764,9 @@ static gboolean gd_configure(GtkWidget *widget,
> > {
> >  VirtualConsole *vc = opaque;
> >
> > -gd_set_ui_size(vc, cfg->width, cfg->height);
> > +if (vc->gfx.visible) {
> > +gd_set_ui_size(vc, cfg->width, cfg->height);
> > +}
> >  return FALSE;
> >  }
> >
> > --
> > 2.34.1
> >
> >
> 
> 
> --
> Marc-André Lureau

RE: [PATCH 1/3] ui/gtk: skip drawing guest scanout when associated VC is invisible

2024-01-31 Thread Kim, Dongwon

Hi Marc-André,

> https://docs.gtk.org/gtk3/method.Widget.is_visible.html

This is what we had tried first, but it didn't seem to work for the case of
window minimization. The visible flag of the GTK widget didn't seem to be
toggled for some reason. And when closing the window, the vc->window widget is
destroyed, so it is not possible to check the flag using this GTK function.
Having an extra flag bound to the VC was the most intuitive way to implement
the logic I wanted.

Thanks!!
DW

> Subject: Re: [PATCH 1/3] ui/gtk: skip drawing guest scanout when associated
> VC is invisible
> 
> Hi Dongwon
> 
> On Wed, Jan 31, 2024 at 3:50 AM  wrote:
> >
> > From: Dongwon Kim 
> >
> > A new flag "visible" is added to show visibility status of the gfx console.
> > The flag is set to 'true' when the VC is visible but set to 'false'
> > when it is hidden or closed. When the VC is invisible, drawing guest
> > frames should be skipped as it will never be completed and it would
> > potentially lock up the guest display especially when blob scanout is used.
> 
> Can't it skip drawing when the widget is not visible instead?
> https://docs.gtk.org/gtk3/method.Widget.is_visible.html
> 
> >
> > Cc: Marc-André Lureau 
> > Cc: Gerd Hoffmann 
> > Cc: Vivek Kasireddy 
> >
> > Signed-off-by: Dongwon Kim 
> > ---
> >  include/ui/gtk.h |  1 +
> >  ui/gtk-egl.c |  8 
> >  ui/gtk-gl-area.c |  8 
> >  ui/gtk.c | 10 +-
> >  4 files changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/ui/gtk.h b/include/ui/gtk.h index
> > aa3d637029..2de38e5724 100644
> > --- a/include/ui/gtk.h
> > +++ b/include/ui/gtk.h
> > @@ -57,6 +57,7 @@ typedef struct VirtualGfxConsole {
> >  bool y0_top;
> >  bool scanout_mode;
> >  bool has_dmabuf;
> > +bool visible;
> >  #endif
> >  } VirtualGfxConsole;
> >
> > diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c index 3af5ac5bcf..993c283191
> > 100644
> > --- a/ui/gtk-egl.c
> > +++ b/ui/gtk-egl.c
> > @@ -265,6 +265,10 @@ void
> gd_egl_scanout_dmabuf(DisplayChangeListener
> > *dcl,  #ifdef CONFIG_GBM
> >  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
> >
> > +if (!vc->gfx.visible) {
> > +return;
> > +}
> > +
> >  eglMakeCurrent(qemu_egl_display, vc->gfx.esurface,
> > vc->gfx.esurface, vc->gfx.ectx);
> >
> > @@ -363,6 +367,10 @@ void gd_egl_flush(DisplayChangeListener *dcl,
> >  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
> >  GtkWidget *area = vc->gfx.drawing_area;
> >
> > +if (!vc->gfx.visible) {
> > +return;
> > +}
> > +
> >  if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf-
> >draw_submitted) {
> >  graphic_hw_gl_block(vc->gfx.dcl.con, true);
> >  vc->gfx.guest_fb.dmabuf->draw_submitted = true; diff --git
> > a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c index 52dcac161e..04e07bd7ee
> > 100644
> > --- a/ui/gtk-gl-area.c
> > +++ b/ui/gtk-gl-area.c
> > @@ -285,6 +285,10 @@ void
> > gd_gl_area_scanout_flush(DisplayChangeListener *dcl,  {
> >  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
> >
> > +if (!vc->gfx.visible) {
> > +return;
> > +}
> > +
> >  if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf-
> >draw_submitted) {
> >  graphic_hw_gl_block(vc->gfx.dcl.con, true);
> >  vc->gfx.guest_fb.dmabuf->draw_submitted = true; @@ -299,6
> > +303,10 @@ void gd_gl_area_scanout_dmabuf(DisplayChangeListener *dcl,
> > #ifdef CONFIG_GBM
> >  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
> >
> > +if (!vc->gfx.visible) {
> > +return;
> > +}
> > +
> >  gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
> >  egl_dmabuf_import_texture(dmabuf);
> >  if (!dmabuf->texture) {
> > diff --git a/ui/gtk.c b/ui/gtk.c
> > index 810d7fc796..02eb667d8a 100644
> > --- a/ui/gtk.c
> > +++ b/ui/gtk.c
> > @@ -1312,15 +1312,20 @@ static void gd_menu_quit(GtkMenuItem *item,
> > void *opaque)  static void gd_menu_switch_vc(GtkMenuItem *item, void
> > *opaque)  {
> >  GtkDisplayState *s = opaque;
> > -VirtualConsole *vc = gd_vc_find_by_menu(s);
> > +VirtualConsole *vc;
> >  GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
> >  gint page;
> >
> > +vc = gd_vc_find_current(s);
> > +vc->gfx.visible = false;
> > +
> > +vc = gd_vc_find_by_menu(s);
> >  gtk_release_modifiers(s);
> >  if (vc) {
> >  page = gtk_notebook_page_num(nb, vc->tab_item);
> >  gtk_notebook_set_current_page(nb, page);
> >  gtk_widget_grab_focus(vc->focus);
> > +vc->gfx.visible = true;
> >  }
> >  }
> >
> > @@ -1350,6 +1355,7 @@ static gboolean gd_tab_window_close(GtkWidget
> *widget, GdkEvent *event,
> >  VirtualConsole *vc = opaque;
> >  GtkDisplayState *s = vc->s;
> >
> > +vc->gfx.visible = false;
> >  gtk_widget_set_sensitive(vc->menu_item, true);
> >  gd_widget_reparent(vc->window, s->notebook,

Re: [PATCH] target/riscv: Use RISCVException as return type for all csr ops

2024-01-31 Thread Daniel Henrique Barboza





On 1/30/24 08:08, LIU Zhiwei wrote:

The real return value type has been converted to RISCVException,
but some function declarations still have not. This patch makes all
csr operation declarations use RISCVException.

Signed-off-by: LIU Zhiwei 
---


There's a trivial conflict down there due to the vlen->vlenb changes that
got merged recently in riscv-to-apply.next.

As for the patch:

Reviewed-by: Daniel Henrique Barboza 
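
For anyone skimming the diff, the change is purely about declared types.
A minimal sketch of the before and after shape, using a stand-in enum:
ExampleException and both predicate functions are hypothetical names, and
only two exception values are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the exception enum; only two values are modelled. */
typedef enum {
    EXCP_NONE = -1,
    EXCP_ILLEGAL_INST = 2,
} ExampleException;

/* The "no exception" value must not collide with a real exception code. */
static_assert(EXCP_NONE < 0, "EXCP_NONE must be negative");

/* Before: declared as int, although every return is an enum value. */
static int predicate_old(bool ext_enabled)
{
    return ext_enabled ? EXCP_NONE : EXCP_ILLEGAL_INST;
}

/* After: same body, but the declared type matches what is returned. */
static ExampleException predicate_new(bool ext_enabled)
{
    return ext_enabled ? EXCP_NONE : EXCP_ILLEGAL_INST;
}
```

Behaviour is identical in both versions; the enum return type only makes
the contract visible in the declaration.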


  target/riscv/csr.c | 117 -
  1 file changed, 74 insertions(+), 43 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 674ea075a4..ac9a856cc5 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -242,7 +242,7 @@ static RISCVException any32(CPURISCVState *env, int csrno)
  
  }
  
-static int aia_any(CPURISCVState *env, int csrno)

+static RISCVException aia_any(CPURISCVState *env, int csrno)
  {
  if (!riscv_cpu_cfg(env)->ext_smaia) {
  return RISCV_EXCP_ILLEGAL_INST;
@@ -251,7 +251,7 @@ static int aia_any(CPURISCVState *env, int csrno)
  return any(env, csrno);
  }
  
-static int aia_any32(CPURISCVState *env, int csrno)

+static RISCVException aia_any32(CPURISCVState *env, int csrno)
  {
  if (!riscv_cpu_cfg(env)->ext_smaia) {
  return RISCV_EXCP_ILLEGAL_INST;
@@ -269,7 +269,7 @@ static RISCVException smode(CPURISCVState *env, int csrno)
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
-static int smode32(CPURISCVState *env, int csrno)

+static RISCVException smode32(CPURISCVState *env, int csrno)
  {
  if (riscv_cpu_mxl(env) != MXL_RV32) {
  return RISCV_EXCP_ILLEGAL_INST;
@@ -278,7 +278,7 @@ static int smode32(CPURISCVState *env, int csrno)
  return smode(env, csrno);
  }
  
-static int aia_smode(CPURISCVState *env, int csrno)

+static RISCVException aia_smode(CPURISCVState *env, int csrno)
  {
  if (!riscv_cpu_cfg(env)->ext_ssaia) {
  return RISCV_EXCP_ILLEGAL_INST;
@@ -287,7 +287,7 @@ static int aia_smode(CPURISCVState *env, int csrno)
  return smode(env, csrno);
  }
  
-static int aia_smode32(CPURISCVState *env, int csrno)

+static RISCVException aia_smode32(CPURISCVState *env, int csrno)
  {
  if (!riscv_cpu_cfg(env)->ext_ssaia) {
  return RISCV_EXCP_ILLEGAL_INST;
@@ -496,7 +496,7 @@ static RISCVException pointer_masking(CPURISCVState *env, int csrno)
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
-static int aia_hmode(CPURISCVState *env, int csrno)

+static RISCVException aia_hmode(CPURISCVState *env, int csrno)
  {
  if (!riscv_cpu_cfg(env)->ext_ssaia) {
  return RISCV_EXCP_ILLEGAL_INST;
@@ -505,7 +505,7 @@ static int aia_hmode(CPURISCVState *env, int csrno)
   return hmode(env, csrno);
  }
  
-static int aia_hmode32(CPURISCVState *env, int csrno)

+static RISCVException aia_hmode32(CPURISCVState *env, int csrno)
  {
  if (!riscv_cpu_cfg(env)->ext_ssaia) {
  return RISCV_EXCP_ILLEGAL_INST;
@@ -681,7 +681,8 @@ static RISCVException read_vl(CPURISCVState *env, int csrno,
  return RISCV_EXCP_NONE;
  }
  
-static int read_vlenb(CPURISCVState *env, int csrno, target_ulong *val)

+static RISCVException read_vlenb(CPURISCVState *env, int csrno,
+ target_ulong *val)
  {
  *val = riscv_cpu_cfg(env)->vlen >> 3;
  return RISCV_EXCP_NONE;
@@ -742,13 +743,15 @@ static RISCVException write_vstart(CPURISCVState *env, int csrno,
  return RISCV_EXCP_NONE;
  }
  
-static int read_vcsr(CPURISCVState *env, int csrno, target_ulong *val)

+static RISCVException read_vcsr(CPURISCVState *env, int csrno,
+target_ulong *val)
  {
  *val = (env->vxrm << VCSR_VXRM_SHIFT) | (env->vxsat << VCSR_VXSAT_SHIFT);
  return RISCV_EXCP_NONE;
  }
  
-static int write_vcsr(CPURISCVState *env, int csrno, target_ulong val)

+static RISCVException write_vcsr(CPURISCVState *env, int csrno,
+ target_ulong val)
  {
  #if !defined(CONFIG_USER_ONLY)
  env->mstatus |= MSTATUS_VS;
@@ -798,13 +801,15 @@ static RISCVException read_timeh(CPURISCVState *env, int csrno,
  return RISCV_EXCP_NONE;
  }
  
-static int read_hpmcounter(CPURISCVState *env, int csrno, target_ulong *val)

+static RISCVException read_hpmcounter(CPURISCVState *env, int csrno,
+  target_ulong *val)
  {
  *val = get_ticks(false);
  return RISCV_EXCP_NONE;
  }
  
-static int read_hpmcounterh(CPURISCVState *env, int csrno, target_ulong *val)

+static RISCVException read_hpmcounterh(CPURISCVState *env, int csrno,
+   target_ulong *val)
  {
  *val = get_ticks(true);
  return RISCV_EXCP_NONE;
@@ -812,7 +817,8 @@ static int read_hpmcounterh(CPURISCVState *env, int csrno, target_ulong *val)
  
  #else /* CONFIG_USER_ONLY */
  
-static int read_mhpmevent(CPURISCVState *env, int csrno, target_ulong *val)

+static RISCVException read_mhpmevent(CPURISCVState *env, int csrno,
+

Re: [PATCH 2/2] target/riscv: Support xtheadmaee for thead-c906

2024-01-31 Thread Conor Dooley

On Tue, Jan 30, 2024 at 12:43:25PM +0100, Christoph Müllner wrote:
> On Tue, Jan 30, 2024 at 12:12 PM LIU Zhiwei
>  wrote:
> >
> > thead-c906 uses some flags in PTE bits [60-63]. This is for historical
> > reasons: SVPBMT didn't exist when thead-c906 was released.
> >
> > We named this feature xtheadmaee. It is controlled by a custom CSR
> > named mxstatus, whose maee field encodes whether PTE bits [60-63] are
> > enabled.
> >
> > The sections "5.2.2.1 Page table structure" and "15.1.7.1 M-mode extension
> > status register (MXSTATUS)" in document[1] give the detailed information
> > about its design.
> 
> I would prefer if we would not define an extension like XTheadMaee
> without a specification.
> The linked document defines the bit MAEE in a custom CSR, but the
> scope of XTheadMaee
> is not clearly defined (the term XTheadMaee is not even part of the PDF).
> 
> We have all the XThead* extensions well described here:
>   https://github.com/T-head-Semi/thead-extension-spec/tree/master
> And it would not be much effort to add XTheadMaee there as well.

Yeah, I was gonna request exactly this, so glad to see you beat me to
the punch. It would be really great if this was done, particularly if
this xmaee is going to appear in devicetrees and sooner or later is
gonna want to be documented in a binding.

Cheers,
Conor.

> For those who don't know the context of this patch, here is the c906
> boot regression report from Björn:
>   https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg04766.html



Re: [PATCH 06/14] migration/multifd: Separate SYNC request with normal jobs

2024-01-31 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> Multifd provides a threaded model for processing jobs.  On the sender side,
> there can be two kinds of job: (1) a list of pages to send, or (2) a sync
> request.
>
> The sync request is a very special kind of job.  It never contains a page
> array, but only a multifd packet telling the dest side to synchronize with
> sent pages.
>
> Before this patch, both kinds of request use the pending_job field: any
> request increments pending_job, and the multifd sender thread decrements
> it after it finishes one job.
>
> However, this looks racy, because SYNC is special in that it needs to
> set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request.
> Consider a sequence of operations where:
>
>   - migration thread enqueues a job to send some pages, pending_job++ (0->1)
>
>   - [...before the selected multifd sender thread wakes up...]
>
>   - migration thread enqueues another job to sync, pending_job++ (1->2),
>     sets p->flags=MULTIFD_FLAG_SYNC
>
>   - multifd sender thread wakes up, finds pending_job==2
>     - sends the 1st packet with MULTIFD_FLAG_SYNC and list of pages
>     - sends the 2nd packet with flags==0 and no pages
>
> This is not expected, because MULTIFD_FLAG_SYNC should ideally be handled
> only after all the pages are received.  Meanwhile, the 2nd packet is
> completely useless, as it contains no information.
>
> I didn't verify the above, but I think this issue is still benign in that
> at least on the recv side we always receive pages before handling
> MULTIFD_FLAG_SYNC.  However, that's not always guaranteed and is fragile.
>
> Another reason I want to separate it is that using p->flags to communicate
> between the two threads is not clearly defined; it's very hard to read
> and understand why accessing p->flags is always safe.  See the current impl
> of multifd_send_thread(), where we tried to cache only p->flags.  It doesn't
> need to be that complicated.
>
> This patch introduces pending_sync, a separate flag just to show that the
> requester needs a sync.  Alongside, we remove the tricky caching of
> p->flags, because after this patch p->flags is only used by the multifd
> sender thread, so it is always thread safe to access p->flags.
>
> With that, we can also safely convert the pending_job into a boolean,
> because we don't support >1 pending jobs anyway.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas
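
For readers following the race described in the commit message, here is a
minimal single-threaded model of it in C. It is only a sketch, not the
multifd code: SketchPacket, OldChannel, NewChannel and the helper functions
are hypothetical names made up for illustration, SKETCH_FLAG_SYNC stands in
for MULTIFD_FLAG_SYNC, and the two "threads" are modelled as plain function
calls made in the problematic order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_SYNC 0x1u   /* stand-in for MULTIFD_FLAG_SYNC */

/* What the sender would put on the wire (hypothetical, simplified). */
typedef struct {
    uint32_t flags;
    bool has_pages;
} SketchPacket;

/* Old scheme: one counter for both kinds of job, flags shared. */
typedef struct {
    int pending_job;
    uint32_t flags;
    bool pages_queued;
} OldChannel;

/* Migration thread side: queue a list of pages. */
static void old_queue_pages(OldChannel *c)
{
    c->pages_queued = true;
    c->pending_job++;
}

/* Migration thread side: queue a sync request. */
static void old_queue_sync(OldChannel *c)
{
    c->flags |= SKETCH_FLAG_SYNC;
    c->pending_job++;
}

/* Sender side: emit one packet per pending job, return the packet count. */
static int old_drain(OldChannel *c, SketchPacket *out, int max)
{
    int n = 0;

    while (c->pending_job > 0 && n < max) {
        out[n].flags = c->flags;          /* picks up SYNC too early */
        out[n].has_pages = c->pages_queued;
        c->flags = 0;
        c->pages_queued = false;
        c->pending_job--;
        n++;
    }
    return n;
}

/* New scheme: one boolean per kind of request, flags owned by the sender. */
typedef struct {
    bool pending_job;
    bool pending_sync;
} NewChannel;

/* Sender side: pages first, then the sync packet. */
static int new_drain(NewChannel *c, SketchPacket *out, int max)
{
    int n = 0;

    if (c->pending_job && n < max) {
        out[n].flags = 0;
        out[n].has_pages = true;
        c->pending_job = false;
        n++;
    }
    if (c->pending_sync && n < max) {
        out[n].flags = SKETCH_FLAG_SYNC;
        out[n].has_pages = false;
        c->pending_sync = false;
        n++;
    }
    return n;
}
```

In this model, queuing pages and then a sync before the old sender runs
yields a first packet carrying both the pages and the sync flag, followed by
an empty packet. With separate booleans the sender emits the pages first and
the sync packet second.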

[PATCH] RISC-V: Report the QEMU vendor/arch IDs on virtual CPUs

2024-01-31 Thread Palmer Dabbelt

Right now we just report 0 for marchid/mvendorid in QEMU.  That's legal,
but it's tricky for users that want to check if they're running on QEMU
to do so.  This sets marchid to 42, which I've proposed as the QEMU
architecture ID (mvendorid remains 0, just explicitly set, as that's how
the ISA handles open source implementations).

Link: https://github.com/riscv/riscv-isa-manual/pull/1213
Signed-off-by: Palmer Dabbelt 
---
 target/riscv/cpu.c  | 16 
 target/riscv/cpu_vendorid.h |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8cbfc7e781..1aef186f87 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -415,6 +415,9 @@ static void riscv_any_cpu_init(Object *obj)
 cpu->cfg.ext_zicsr = true;
 cpu->cfg.mmu = true;
 cpu->cfg.pmp = true;
+
+cpu->cfg.mvendorid = QEMU_MVENDORID;
+cpu->cfg.marchid = QEMU_MARCHID;
 }
 
 static void riscv_max_cpu_init(Object *obj)
@@ -432,6 +435,8 @@ static void riscv_max_cpu_init(Object *obj)
 set_satp_mode_max_supported(RISCV_CPU(obj), mlx == MXL_RV32 ?
 VM_1_10_SV32 : VM_1_10_SV57);
 #endif
+cpu->cfg.mvendorid = QEMU_MVENDORID;
+cpu->cfg.marchid = QEMU_MARCHID;
 }
 
 #if defined(TARGET_RISCV64)
@@ -445,6 +450,8 @@ static void rv64_base_cpu_init(Object *obj)
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV57);
 #endif
+cpu->cfg.mvendorid = QEMU_MVENDORID;
+cpu->cfg.marchid = QEMU_MARCHID;
 }
 
 static void rv64_sifive_u_cpu_init(Object *obj)
@@ -569,6 +576,8 @@ static void rv128_base_cpu_init(Object *obj)
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV57);
 #endif
+cpu->cfg.mvendorid = QEMU_MVENDORID;
+cpu->cfg.marchid = QEMU_MARCHID;
 }
 
 static void rv64i_bare_cpu_init(Object *obj)
@@ -591,6 +600,8 @@ static void rv64i_bare_cpu_init(Object *obj)
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV64);
 #endif
+cpu->cfg.mvendorid = QEMU_MVENDORID;
+cpu->cfg.marchid = QEMU_MARCHID;
 }
 #else
 static void rv32_base_cpu_init(Object *obj)
@@ -603,6 +614,8 @@ static void rv32_base_cpu_init(Object *obj)
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV32);
 #endif
+cpu->cfg.mvendorid = QEMU_MVENDORID;
+cpu->cfg.marchid = QEMU_MARCHID;
 }
 
 static void rv32_sifive_u_cpu_init(Object *obj)
@@ -672,6 +685,9 @@ static void rv32_imafcu_nommu_cpu_init(Object *obj)
 cpu->cfg.ext_zifencei = true;
 cpu->cfg.ext_zicsr = true;
 cpu->cfg.pmp = true;
+
+cpu->cfg.mvendorid = QEMU_MVENDORID;
+cpu->cfg.marchid = QEMU_MARCHID;
 }
 #endif
 
diff --git a/target/riscv/cpu_vendorid.h b/target/riscv/cpu_vendorid.h
index 96b6b9c2cb..486832cd53 100644
--- a/target/riscv/cpu_vendorid.h
+++ b/target/riscv/cpu_vendorid.h
@@ -7,4 +7,7 @@
 #define VEYRON_V1_MIMPID    0x111
 #define VEYRON_V1_MVENDORID 0x61f
 
+#define QEMU_MVENDORID 0
+#define QEMU_MARCHID   42
+
 #endif /*  TARGET_RISCV_CPU_VENDORID_H */
-- 
2.43.0
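
As a rough illustration of the check the commit message has in mind, a
consumer that has already obtained the mvendorid and marchid values could
compare them as below. This is a sketch under assumptions: the helper name
and the SKETCH_* macros are hypothetical, how the values are read (firmware,
kernel interface, M-mode code) is left out, and 42 is only the architecture
ID proposed in the linked pull request, not a ratified value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch; 42 is a proposal, not a ratified ID. */
#define SKETCH_QEMU_MVENDORID 0u
#define SKETCH_QEMU_MARCHID   42u

/*
 * Return true when the ID pair matches what the patch makes QEMU report.
 * mvendorid 0 alone is not enough, since it is shared by every
 * non-commercial implementation, so both values are compared.
 */
static bool sketch_looks_like_qemu(uint64_t mvendorid, uint64_t marchid)
{
    return mvendorid == SKETCH_QEMU_MVENDORID &&
           marchid == SKETCH_QEMU_MARCHID;
}
```

Before this patch QEMU reports 0 for both registers, so a check like this
cannot distinguish it from other implementations that leave marchid
unimplemented.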

Re: [PATCH v4 1/1] oslib-posix: initialize backend memory objects in parallel

2024-01-31 Thread David Hildenbrand


On 31.01.24 17:53, Mark Kanda wrote:

QEMU initializes preallocated backend memory as the objects are parsed from
the command line. This is not optimal in some cases (e.g. memory spanning
multiple NUMA nodes) because the memory objects are initialized in series.

Allow the initialization to occur in parallel (asynchronously). In order to
ensure optimal thread placement, asynchronous initialization requires prealloc
context threads to be in use.

Signed-off-by: Mark Kanda 
Signed-off-by: David Hildenbrand 
---


So, this LGTM. There might be ways to not rely on phases to achieve what 
we want to achieve (e.g., let the machine set an internal property on 
memory backends we create from the cmdline), but this should do as well.


I'll wait a bit for more feedback. If there is none, I'll route this 
through my tree (after doing a quick sanity test).


Thanks!

--
Cheers,

David / dhildenb

Re: Call for GSoC/Outreachy internship project ideas

2024-01-31 Thread Stefan Hajnoczi

On Wed, 31 Jan 2024 at 10:59, Palmer Dabbelt  wrote:
>
> On Wed, 31 Jan 2024 06:39:25 PST (-0800), stefa...@gmail.com wrote:
> > On Tue, 30 Jan 2024 at 14:40, Palmer Dabbelt  wrote:
> >> On Mon, 15 Jan 2024 08:32:59 PST (-0800), stefa...@gmail.com wrote:
> >> I'm not 100% sure this is a sane GSoC idea, as it's a bit open ended and
> >> might have some tricky parts.  That said it's tripping some people up
> >> and as far as I know nobody's started looking at it, so I figured I'd
> >> write something up.
> >
> > Hi Palmer,
> > Your idea has been added:
> > https://wiki.qemu.org/Google_Summer_of_Code_2024#RISC-V_Vector_TCG_Frontend_Optimization
> >
> > I added links to the vector extension specification and the RISC-V TCG
> > frontend source code.
> >
> > Please add concrete tasks (e.g. specific optimizations the intern
> > should implement and benchmark) by Feb 21st. Thank you!
>
> OK.  We've got a few examples starting to filter in, I'll keep updating
> the bug until we get some nice concrete reproducers for slowdowns of
> decent vectorized code.  Then I'll take a look at what's inside them,
> with any luck it'll be simple to figure out which vector instructions
> are commonly used and slow -- there's a bunch of stuff in the RVV
> translation that doesn't map cleanly, so I'm guessing it'll be in there.
>
> If that all goes smoothly then I think we should have a reasonably
> actionable intern project, but LMK if you were thinking of something
> else?

That's great!

Thanks,
Stefan

Re: [PATCH] blkio: Respect memory-alignment for bounce buffer allocations

2024-01-31 Thread Stefano Garzarella


On Wed, Jan 31, 2024 at 06:31:40PM +0100, Kevin Wolf wrote:

blkio_alloc_mem_region() requires that the requested buffer size is a
multiple of the memory-alignment property. If it isn't, the allocation
fails with a return value of -EINVAL.

Fix the call in blkio_resize_bounce_pool() to make sure the requested
size is properly aligned.

I observed this problem with vhost-vdpa, which requires page aligned
memory. As the virtio-blk device behind it still had 512 byte blocks, we
got bs->bl.request_alignment = 512, but actually any request that needed
a bounce buffer and was not aligned to 4k would fail without this fix.

Suggested-by: Stefano Garzarella 
Signed-off-by: Kevin Wolf 
---
block/blkio.c | 3 +++
1 file changed, 3 insertions(+)


Thanks for fixing this!

Reviewed-by: Stefano Garzarella 



diff --git a/block/blkio.c b/block/blkio.c
index 0a0a6c0f5f..b989617608 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -89,6 +89,9 @@ static int blkio_resize_bounce_pool(BDRVBlkioState *s, int64_t bytes)
/* Pad size to reduce frequency of resize calls */
bytes += 128 * 1024;

+/* Align the pool size to avoid blkio_alloc_mem_region() failure */
+bytes = QEMU_ALIGN_UP(bytes, s->mem_region_alignment);
+
WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
int ret;

--
2.43.0

Re: [PATCH] hw/pci: migration: Skip config space check for vendor specific capability during restore/load

2024-01-31 Thread Alex Williamson

On Wed, 31 Jan 2024 15:22:59 +0530
Vinayak Kale  wrote:

> On 31/01/24 12:28 am, Alex Williamson wrote:
> > 
> > On Tue, 30 Jan 2024 23:32:26 +0530
> > Vinayak Kale  wrote:
> >   
> >> Missed adding Michael, Marcel, Alex and Avihai earlier, apologies.
> >>
> >> Regards,
> >> Vinayak
> >>
> >> On 30/01/24 3:26 pm, Vinayak Kale wrote:  
> >>> In case of migration, during the restore operation, QEMU checks the
> >>> config space of the PCI device against the config space in the
> >>> migration stream captured during the save operation. If the config
> >>> space data mismatches, the restore operation fails.
> >>>
> >>> The config space check is done in get_pci_config_device(). By default,
> >>> the VSC (vendor-specific capability) in config space is checked.
> >>>
> >>> Ideally QEMU should not check the VSC during restore/load. This patch
> >>> skips the check by not setting pdev->cmask[] for VSC offsets in
> >>> pci_add_capability(). If cmask[] is not set for an offset, QEMU skips
> >>> the config space check for that offset.
> >>>
> >>> Signed-off-by: Vinayak Kale 
> >>> ---
> >>>hw/pci/pci.c | 7 +--
> >>>1 file changed, 5 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>> index 76080af580..32429109df 100644
> >>> --- a/hw/pci/pci.c
> >>> +++ b/hw/pci/pci.c
> >>> @@ -2485,8 +2485,11 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> >>>memset(pdev->used + offset, 0xFF, QEMU_ALIGN_UP(size, 4));
> >>>/* Make capability read-only by default */
> >>>memset(pdev->wmask + offset, 0, size);
> >>> -/* Check capability by default */
> >>> -memset(pdev->cmask + offset, 0xFF, size);
> >>> +
> >>> +if (cap_id != PCI_CAP_ID_VNDR) {
> >>> +/* Check non-vendor specific capability by default */
> >>> +memset(pdev->cmask + offset, 0xFF, size);
> >>> +}
> >>>return offset;
> >>>}
> >>>  
> >>  
> > 
> > If there is a possibility that the data within the vendor specific cap
> > can be consumed by the driver or diagnostic tools, then it's part of
> > the device ABI and should be consistent across migration.  A mismatch
> > can certainly cause a migration failure, but why shouldn't it?  
> 
> Sure, the device ABI should be consistent across migration. In case of 
> VSC, it should represent same format on source and destination. But 
> shouldn't VSC content check or its interpretation be left to vendor 
> driver instead of qemu?

By "vendor driver" here, are you suggesting that QEMU device models (ex.
hw/net/{e1000*,igb*,rtl8139*}) should perform that validation?  If so,
where's the patch that introduces any sort of validation hooks for
vendors to provide?  Where is this validation going to happen in the
case of migratable vfio-pci variant devices?  Nothing about this
patch suggests that it's deferring responsibility to some other code
entity, it only indicates "checking this breaks, let's not do it".

It's possible that the device you care about only reports volatile
diagnostic information through the vendor specific capability, but
another device might use it to report information relative to the
internal hardware configuration.  Without knowing what the vendor
specific capability contains, QEMU needs to take the most conservative
approach by default.  Thanks,

Alex
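
To make the role of cmask[] in this discussion concrete, here is a
simplified sketch in C of a mask-driven config space comparison. It is not
the QEMU implementation: sketch_config_check() is a hypothetical helper, and
the real get_pci_config_device() also takes the write masks into account,
which is left out here. The sketch only shows that bits whose cmask bit is
clear are never compared, which is why clearing cmask for a capability
silently disables the consistency check for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare the config space saved in the migration stream with the one on
 * the destination. Only bits set in cmask[] take part in the comparison.
 * Returns 0 when all checked bits match, -1 on the first mismatch.
 */
static int sketch_config_check(const uint8_t *saved, const uint8_t *current,
                               const uint8_t *cmask, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((saved[i] ^ current[i]) & cmask[i]) {
            return -1;
        }
    }
    return 0;
}
```

With cmask set to 0xFF over a capability (the current default in
pci_add_capability()), any difference in that capability fails the load.
With cmask left at 0, as the patch proposes for PCI_CAP_ID_VNDR, the same
difference goes unnoticed.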

Re: [PATCH 0/5] buses: switch to 3-phase-reset

2024-01-31 Thread Cédric Le Goater


On 1/22/24 15:19, Cédric Le Goater wrote:

Hello,

On 1/22/24 03:06, Peter Xu wrote:

Hi, Peter,

On Fri, Jan 19, 2024 at 04:35:07PM +, Peter Maydell wrote:

I wrote this ages ago and recently picked it back up because of a
recent PCI related reset ordering problem noted by Peter Xu.  I'm not
sure if this patchset is necessary as a part of fixing that ordering
problem (it might even be possible now to have the intel_iommu device
use 3-phase reset and put the relevant parts of its reset into the
'exit' phase), but either way we really ought to do this cleanup
to reduce the amount of legacy/transitional handling we have.


The VFIO issue I was working on may not directly benefit from this series
iiuc, as it's more of a special ordering issue involving both (1) the VFIO
special-case reset path using qemu_register_reset(), and (2) the VT-d device
not being put at the right place in the QOM hierarchy [1].

That said, thanks a lot for posting the patches; they all look reasonable
and good cleanups to the reset infrastructure, afaict.



Yes. I took the series in my vfio testing environment (x86_64 and s390x) and
didn't see any issue. I will keep it for further testing.


Acked-by: Cédric Le Goater 
Tested-by: Cédric Le Goater 

Thanks,

C.

[PATCH] blkio: Respect memory-alignment for bounce buffer allocations

2024-01-31 Thread Kevin Wolf

blkio_alloc_mem_region() requires that the requested buffer size is a
multiple of the memory-alignment property. If it isn't, the allocation
fails with a return value of -EINVAL.

Fix the call in blkio_resize_bounce_pool() to make sure the requested
size is properly aligned.

I observed this problem with vhost-vdpa, which requires page aligned
memory. As the virtio-blk device behind it still had 512 byte blocks, we
got bs->bl.request_alignment = 512, but actually any request that needed
a bounce buffer and was not aligned to 4k would fail without this fix.

Suggested-by: Stefano Garzarella 
Signed-off-by: Kevin Wolf 
---
 block/blkio.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blkio.c b/block/blkio.c
index 0a0a6c0f5f..b989617608 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -89,6 +89,9 @@ static int blkio_resize_bounce_pool(BDRVBlkioState *s, int64_t bytes)
 /* Pad size to reduce frequency of resize calls */
 bytes += 128 * 1024;
 
+/* Align the pool size to avoid blkio_alloc_mem_region() failure */
+bytes = QEMU_ALIGN_UP(bytes, s->mem_region_alignment);
+
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 int ret;
 
-- 
2.43.0
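
The arithmetic behind the fix can be sketched in standalone C as follows.
This is an illustration only: sketch_align_up() and
sketch_bounce_pool_size() are hypothetical helpers that mimic the padding
and round-up steps from the hunk above, not the QEMU macro or function
themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of align (align must be positive). */
static int64_t sketch_align_up(int64_t n, int64_t align)
{
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/*
 * Mimic the size computation in the patched function: pad the request
 * by 128 KiB, then round up to the memory region alignment.
 */
static int64_t sketch_bounce_pool_size(int64_t bytes, int64_t alignment)
{
    bytes += 128 * 1024;
    return sketch_align_up(bytes, alignment);
}
```

For example, a 512-byte request becomes 512 + 131072 = 131584 bytes after
padding, which is not a multiple of 4096. Rounding up gives 135168 bytes
(33 pages), a size that satisfies a page-alignment requirement such as the
one described for vhost-vdpa.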

[PATCH] seccomp: report EPERM instead of killing process for spawn set

2024-01-31 Thread Daniel P . Berrangé

When something tries to run one of the spawn syscalls (eg clone),
our seccomp deny filter is set to cause a fatal trap which kills
the process.

This is found to be unhelpful when QEMU has loaded the nvidia
GL library. This tries to spawn a process to modprobe the nvidia
kmod. This is a dubious thing to do, but at the same time, the
code will gracefully continue if this fails. Our seccomp filter
rightly blocks the spawning, but prevents the code from continuing
gracefully.

Switching to reporting EPERM will make QEMU behave more gracefully
without impacting the level of protection we have.

https://gitlab.com/qemu-project/qemu/-/issues/2116
Signed-off-by: Daniel P. Berrangé 
---
 system/qemu-seccomp.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/system/qemu-seccomp.c b/system/qemu-seccomp.c
index 4d7439e7f7..98ffce075c 100644
--- a/system/qemu-seccomp.c
+++ b/system/qemu-seccomp.c
@@ -74,7 +74,7 @@ const struct scmp_arg_cmp sched_setscheduler_arg[] = {
 
 #define RULE_CLONE_FLAG(flag) \
 { SCMP_SYS(clone),  QEMU_SECCOMP_SET_SPAWN, \
-  ARRAY_SIZE(clone_arg ## flag), clone_arg ## flag, SCMP_ACT_TRAP }
+  ARRAY_SIZE(clone_arg ## flag), clone_arg ## flag, SCMP_ACT_ERRNO(EPERM) }
 
 /* If no CLONE_* flags are set, except CSIGNAL, deny */
 const struct scmp_arg_cmp clone_arg_none[] = {
@@ -214,13 +214,13 @@ static const struct QemuSeccompSyscall denylist[] = {
   0, NULL, SCMP_ACT_TRAP },
 /* spawn */
 { SCMP_SYS(fork),   QEMU_SECCOMP_SET_SPAWN,
-  0, NULL, SCMP_ACT_TRAP },
+  0, NULL, SCMP_ACT_ERRNO(EPERM) },
 { SCMP_SYS(vfork),  QEMU_SECCOMP_SET_SPAWN,
-  0, NULL, SCMP_ACT_TRAP },
+  0, NULL, SCMP_ACT_ERRNO(EPERM) },
 { SCMP_SYS(execve), QEMU_SECCOMP_SET_SPAWN,
-  0, NULL, SCMP_ACT_TRAP },
+  0, NULL, SCMP_ACT_ERRNO(EPERM) },
 { SCMP_SYS(clone),  QEMU_SECCOMP_SET_SPAWN,
-  ARRAY_SIZE(clone_arg_none), clone_arg_none, SCMP_ACT_TRAP },
+  ARRAY_SIZE(clone_arg_none), clone_arg_none, SCMP_ACT_ERRNO(EPERM) },
 RULE_CLONE_FLAG(CLONE_VM),
 RULE_CLONE_FLAG(CLONE_FS),
 RULE_CLONE_FLAG(CLONE_FILES),
-- 
2.43.0

[PATCH v4 1/1] oslib-posix: initialize backend memory objects in parallel

2024-01-31 Thread Mark Kanda

QEMU initializes preallocated backend memory as the objects are parsed from
the command line. This is not optimal in some cases (e.g. memory spanning
multiple NUMA nodes) because the memory objects are initialized in series.

Allow the initialization to occur in parallel (asynchronously). In order to
ensure optimal thread placement, asynchronous initialization requires prealloc
context threads to be in use.

Signed-off-by: Mark Kanda 
Signed-off-by: David Hildenbrand 
---
 backends/hostmem.c |   7 ++-
 hw/virtio/virtio-mem.c |   4 +-
 include/hw/qdev-core.h |   5 ++
 include/qemu/osdep.h   |  18 +-
 system/vl.c|   9 +++
 util/oslib-posix.c | 131 +++--
 util/oslib-win32.c |   8 ++-
 7 files changed, 145 insertions(+), 37 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 30f69b2cb5..17221e422a 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -20,6 +20,7 @@
 #include "qom/object_interfaces.h"
 #include "qemu/mmap-alloc.h"
 #include "qemu/madvise.h"
+#include "hw/qdev-core.h"
 
 #ifdef CONFIG_NUMA
 #include 
@@ -237,7 +238,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
 uint64_t sz = memory_region_size(&backend->mr);
 
 if (!qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
-   backend->prealloc_context, errp)) {
+   backend->prealloc_context, false, errp)) {
 return;
 }
 backend->prealloc = true;
@@ -323,6 +324,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
 void *ptr;
 uint64_t sz;
+bool async = !phase_check(PHASE_LATE_BACKENDS_CREATED);
 
 if (!bc->alloc) {
 return;
@@ -398,7 +400,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 if (backend->prealloc && !qemu_prealloc_mem(memory_region_get_fd(&backend->mr),
 ptr, sz,
 backend->prealloc_threads,
-backend->prealloc_context, errp)) {
+backend->prealloc_context,
+async, errp)) {
 return;
 }
 }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 99ab989852..ffd119ebac 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -605,7 +605,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
 int fd = memory_region_get_fd(&vmem->memdev->mr);
 Error *local_err = NULL;
 
-if (!qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err)) {
+if (!qemu_prealloc_mem(fd, area, size, 1, NULL, false, &local_err)) {
 static bool warned;
 
 /*
@@ -1248,7 +1248,7 @@ static int virtio_mem_prealloc_range_cb(VirtIOMEM *vmem, void *arg,
 int fd = memory_region_get_fd(&vmem->memdev->mr);
 Error *local_err = NULL;
 
-if (!qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err)) {
+if (!qemu_prealloc_mem(fd, area, size, 1, NULL, false, &local_err)) {
 error_report_err(local_err);
 return -ENOMEM;
 }
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 151d968238..83dd9e2485 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -1071,6 +1071,11 @@ typedef enum MachineInitPhase {
  */
 PHASE_ACCEL_CREATED,
 
+/*
+ * Late backend objects have been created and initialized.
+ */
+PHASE_LATE_BACKENDS_CREATED,
+
 /*
  * machine_class->init has been called, thus creating any embedded
  * devices and validating machine properties.  Devices created at
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index c9692cc314..7d359dabc4 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -680,6 +680,8 @@ typedef struct ThreadContext ThreadContext;
  * @area: start address of the are to preallocate
  * @sz: the size of the area to preallocate
  * @max_threads: maximum number of threads to use
+ * @tc: prealloc context threads pointer, NULL if not in use
+ * @async: request asynchronous preallocation, requires @tc
  * @errp: returns an error if this function fails
  *
  * Preallocate memory (populate/prefault page tables writable) for the virtual
@@ -687,10 +689,24 @@ typedef struct ThreadContext ThreadContext;
  * each page in the area was faulted in writable at least once, for example,
  * after allocating file blocks for mapped files.
  *
+ * When setting @async, allocation might be performed asynchronously.
+ * qemu_finish_async_prealloc_mem() must be called to finish any asynchronous
+ * preallocation.
+ *
  * Return: true on success, else false setting @errp with error.
  */
 bool qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-   ThreadContext *tc, Error **errp);
+

[PATCH v4 0/1] Initialize backend memory objects in parallel

2024-01-31 Thread Mark Kanda

v4:
- remove unneeded async check from host_memory_backend_set_prealloc()
- rename qemu_finish_async_mem_prealloc -> qemu_finish_async_prealloc_mem
- use new phase PHASE_LATE_BACKENDS_CREATED for async

v3:
- squash into a single patch
- use global context list for async handling only (MT capability)
- add BQL asserts to guard against concurrent async prealloc requests
- clean up qemu_finish_async_mem_prealloc() error handling

Includes David's suggested restructuring [1] (with David's SoB).

[1] 
https://lore.kernel.org/qemu-devel/c15161eb-f52c-4a82-8b4b-0ba038421...@redhat.com/

v2:
- require MADV_POPULATE_WRITE (simplify the implementation)
- require prealloc context threads to ensure optimal thread placement
- use machine phase 'initialized' to determine when to allow parallel init

QEMU initializes preallocated backend memory when parsing the corresponding
objects from the command line. In certain scenarios, such as memory being
preallocated across multiple numa nodes, this approach is not optimal due to
the unnecessary serialization.

This series addresses this issue by initializing the backend memory objects in
parallel.

Mark Kanda (1):
  oslib-posix: initialize backend memory objects in parallel

 backends/hostmem.c |   7 ++-
 hw/virtio/virtio-mem.c |   4 +-
 include/hw/qdev-core.h |   5 ++
 include/qemu/osdep.h   |  18 +-
 system/vl.c|   9 +++
 util/oslib-posix.c | 131 +++--
 util/oslib-win32.c |   8 ++-
 7 files changed, 145 insertions(+), 37 deletions(-)

-- 
2.39.3
